
Gammon Forum

Parsing XML documents



Posted by Nick Gammon   Australia  (21,322 posts)   Forum Administrator
Date Sun 13 Nov 2005 01:17 AM (UTC)

Amended on Sun 13 Nov 2005 01:26 AM (UTC) by Nick Gammon

Version 3.69 of MUSHclient adds a new function (xmlread) to the "utils" table, which uses MUSHclient's internal XML parser to parse an XML string you supply. This effectively lets you parse triggers, aliases etc. that you have copied to the clipboard as text (or created with the ExportXML script routine), and see exactly what each value is set to. Or, by reading a MUSHclient world file into memory as a string, you could parse that.

The XML parser is not necessarily 100% industry-standard XML parsing; however, it is the method MUSHclient uses for its own XML documents, and should be reasonably compatible with standard XML unless you use some of the fancier XML extensions. It should certainly parse the XML output by MUSHclient itself (eg. triggers, aliases, world files, plugins), as that is the same routine it uses to read them in.

You pass the parser a single string, which is the XML to be parsed. If the parsing is successful, three results are returned:


  • The root node (all other nodes are children of this node)

  • The root document name (eg. "muclient")

  • A table of custom entities in the document, or nil if no custom entities


If the parsing fails, three results are returned:


  • nil - to indicate failure

  • The error reason

  • The line the error occurred at



You can pass the first 2 results to "assert" to quickly check if the parsing was successful.

Each node consists of a table with the following entries:


  • name - name of the node (eg. <trigger>foo</trigger> - the name is "trigger")

  • content - contents of the node (eg. <trigger>foo</trigger> - the content is "foo")

  • empty - boolean to indicate if the node is empty. (eg. <br/> is an empty node)

  • line - which line in the XML string the node occurred on (eg. line 5)

  • attributes - a table of attributes for this node, keyed by the attribute name

    (eg. "world_file_version"="15"). Attribute names have to be unique, so we can use a keyed lookup to find them.

    The attributes table is not present if there are no attributes defined.

  • nodes - a table of child nodes, keyed by ascending number (the order they appeared in). Each child node has the same contents as described above.

    Children are not necessarily unique (eg. there may be more than one <trigger> node in a document) so they are keyed by number, and not by node name.

    The nodes table is not present if there are no children of this node.
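For readers more used to C, the node layout above can be pictured as the struct below. This is only an illustration: `xml_node`, `xml_attr`, `xml_attr_get` and `xml_count_children` are hypothetical names invented here, not part of MUSHclient, and a NULL pointer with a zero count stands in for a Lua key that is "not present". In Lua itself you would simply write node.attributes.width or loop with ipairs (node.nodes).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Hypothetical C picture of one node table from utils.xmlread.
   NULL pointers with zero counts model Lua keys that are "not present". */
struct xml_attr {
    const char *name;   /* attribute name, unique within its node */
    const char *value;  /* attribute value, always text */
};

struct xml_node {
    const char *name;                   /* "" for the unnamed root */
    const char *content;                /* text content of the node */
    bool empty;                         /* true for nodes like <br/> */
    int line;                           /* line the node occurred on */
    const struct xml_attr *attributes;  /* NULL if no attributes */
    size_t attr_count;
    const struct xml_node *nodes;       /* children in document order, NULL if none */
    size_t node_count;
};

/* Like node.attributes[name] in Lua: NULL if the attribute is absent. */
const char *xml_attr_get(const struct xml_node *node, const char *name)
{
    for (size_t i = 0; i < node->attr_count; i++)
        if (strcmp(node->attributes[i].name, name) == 0)
            return node->attributes[i].value;
    return NULL;
}

/* Children are kept in an ordered array, not keyed by name, because
   names may repeat; count how many direct children have a given name. */
size_t xml_count_children(const struct xml_node *node, const char *name)
{
    size_t count = 0;
    for (size_t i = 0; i < node->node_count; i++)
        if (strcmp(node->nodes[i].name, name) == 0)
            count++;
    return count;
}
```

The asymmetry mirrors the description: attributes can be found by name because they are unique, whereas children sit in an ordered array because a name such as "trigger" may occur many times.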



Example of use:


a, b, c = utils.xmlread [[
<foo width="1" height="2">
contents of foo
  <bar west="true" fish="bicycle">
  child of foo
  </bar>
</foo>
<goat blood="100">eep</goat>
]]

if not a then 
  print ("error on line = ", c) 
end -- if

assert (a, b)

tprint (a)


Output:


"line"=0
"name"=""
"content"=""
"nodes":
  1:
    "line"=2
    "name"="foo"
    "nodes":
      1:
        "line"=4
        "name"="bar"
        "content"="
  child of foo
  "
        "attributes":
          "fish"="bicycle"
          "west"="true"
    "content"="
contents of foo
  
"
    "attributes":
      "height"="2"
      "width"="1"
  2:
    "line"=8
    "name"="goat"
    "content"="eep"
    "attributes":
      "blood"="100"



You can see from the above that the "root" node is really just an unnamed node which is the placeholder for the top level nodes (ie. the first "real" node is a child of the root node). In this case the node "foo" is the first child of the root node. The node "goat" is the 2nd child of the root node.

Custom entities are declared in the <!DOCTYPE> directive, like this:


<!DOCTYPE muclient [
  <!ENTITY afk_command "afk" > 
  <!ENTITY timer_mins "5" > 
  <!ENTITY timer_secs "0" > 
  <!ENTITY afk_message "You are now afk." > 
  <!ENTITY not_afk_message "You are no longer afk." > 
]>


If your XML document contains such entries, they will appear in the "custom entities" table returned as the 3rd result from utils.xmlread.

Note that custom entities are automatically replaced in the body of the document, so it is not possible to reconstruct from the nodes where they occurred.
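The reason is that substitution is one-way: once a reference like &afk_command; has been replaced by "afk", the node content holds only the replacement text. A minimal sketch of that kind of textual substitution in C, assuming a plain &name; syntax and a hypothetical `expand_entities` helper (this is not MUSHclient's actual parser code, and it deliberately leaves unknown references such as &amp; alone):

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* One custom entity from the DOCTYPE, eg. afk_command -> "afk". */
struct entity {
    const char *name;   /* without the leading '&' and trailing ';' */
    const char *value;
};

/* Copy src into dst (capacity cap), replacing each &name; whose name is
   in ents. Unknown references (eg. &amp;) are copied unchanged.
   Returns the output length, or (size_t)-1 if dst is too small. */
size_t expand_entities(const char *src, const struct entity *ents,
                       size_t n_ents, char *dst, size_t cap)
{
    size_t out = 0;

    if (cap == 0)
        return (size_t)-1;

    while (*src != '\0') {
        const char *piece = src;  /* default: copy one character */
        size_t piece_len = 1;
        size_t consumed = 1;

        if (*src == '&') {
            const char *semi = strchr(src + 1, ';');
            if (semi != NULL) {
                size_t name_len = (size_t)(semi - (src + 1));
                for (size_t i = 0; i < n_ents; i++) {
                    if (strlen(ents[i].name) == name_len &&
                        strncmp(ents[i].name, src + 1, name_len) == 0) {
                        piece = ents[i].value;
                        piece_len = strlen(piece);
                        consumed = name_len + 2;  /* '&' + name + ';' */
                        break;
                    }
                }
            }
        }

        if (out + piece_len + 1 > cap)  /* +1 for the terminator */
            return (size_t)-1;
        memcpy(dst + out, piece, piece_len);
        out += piece_len;
        src += consumed;
    }
    dst[out] = '\0';
    return out;
}
```

After expansion, "send &afk_command;" and "send afk" are indistinguishable, which is exactly why the nodes cannot tell you where an entity was used.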

Another example, parsing a standard alias:


<aliases>
  <alias
   name="test"
   match="eat"
   sequence="100"
  >
  <send>eat food</send>
  </alias>
</aliases>


Using tprint to print the result gives this:


"line"=0
"name"=""
"nodes":
  1:
    "line"=2
    "name"="aliases"
    "nodes":
      1:
        "line"=3
        "name"="alias"
        "nodes":
          1:
            "line"=8
            "content"="eat food"
            "name"="send"
        "content"="
  
  "
        "attributes":
          "sequence"="100"
          "name"="test"
          "match"="eat"
    "content"="
  
"
"content"=""


Here you can see the first child node (key 1, name "aliases") is the "all aliases" node. Under that (a child of that) is the node (key 1, name "alias") for the first alias. That also has a child, the <send> node, which is what the alias sends.

Thus the hierarchy is:

root (unnamed) -> aliases -> alias -> send
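Because every node keeps its children in an ordered array, a path like the one above can be found with an ordinary depth-first search. The sketch below shows this in C with a hypothetical, cut-down `tree_node` struct (name and children only) and a hypothetical `node_path` helper; it labels the unnamed root as "root", just as the hierarchy is written here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Cut-down, hypothetical node: just a name and ordered children. */
struct tree_node {
    const char *name;               /* "" for the unnamed root */
    const struct tree_node *nodes;  /* children in document order */
    size_t node_count;
};

/* Append text at buf[*len], keeping room for the terminator. */
static bool append(char *buf, size_t cap, size_t *len, const char *text)
{
    size_t n = strlen(text);
    if (*len + n + 1 > cap)
        return false;
    memcpy(buf + *len, text, n + 1);  /* copies the '\0' too */
    *len += n;
    return true;
}

/* Depth-first search; len is how much of buf the ancestors have used,
   so a failed branch is simply overwritten by the next sibling. */
static bool find_path(const struct tree_node *node, const char *target,
                      char *buf, size_t cap, size_t len)
{
    const char *label = node->name[0] ? node->name : "root";
    if (len > 0 && !append(buf, cap, &len, " -> "))
        return false;
    if (!append(buf, cap, &len, label))
        return false;
    if (strcmp(node->name, target) == 0)
        return true;
    for (size_t i = 0; i < node->node_count; i++)
        if (find_path(&node->nodes[i], target, buf, cap, len))
            return true;
    return false;
}

/* Writes eg. "root -> aliases -> alias -> send" into buf.
   Returns false if target is missing or buf is too small. */
bool node_path(const struct tree_node *root, const char *target,
               char *buf, size_t cap)
{
    if (cap == 0)
        return false;
    buf[0] = '\0';
    return find_path(root, target, buf, cap, 0);
}
```

The search visits children in key order (1, 2, ...), so with repeated names it reports the first matching node in document order.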


- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (21,322 posts)   Forum Administrator
Date Reply #1 on Sun 13 Nov 2005 03:18 AM (UTC)

Amended on Sun 13 Nov 2005 03:33 AM (UTC) by Nick Gammon

As an example of using the returned XML information, this is a simple Lua function that re-writes its contents as XML again. It does not handle every case (such as the doctype and entities); however, it shows the general idea ...


function writenode (node)

  -- root node won't have a name
  if node.name ~= "" then

    -- show node name followed by attributes (if any)
    Tell ("<" .. node.name)

    if node.attributes then
      print ""
      for k, v in pairs (node.attributes) do
        print ("  " .. k .. '="' .. FixupHTML (v) .. '"')
      end -- doing attributes
    end -- if

    if node.empty then
      print ("/>")
      return  -- no closing tag
    else
      Tell (">")
    end -- if

  end -- if have a node name

  -- print node contents
  Tell (FixupHTML (node.content))

  -- do children
  if node.nodes then
    for k, v in ipairs (node.nodes) do
      writenode (v)
    end -- for
  end -- of having children

  -- root node won't have a name
  if node.name ~= "" then

    -- closing tag
    print ("</" .. node.name .. ">")

  end -- if have a node name
  
end -- writenode 



If we parse the alias above with utils.xmlread (so that a holds the root node, as in the first example) and then call this, like this:


writenode (a)


We get the following XML output, which is similar to what we had in the first place:


<aliases>
  
<alias
  sequence="100"
  name="test"
  match="eat"
>
  
  <send>eat food</send>
</alias>
</aliases>
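writenode passes both attribute values and content through FixupHTML so that characters such as < or & in the data do not break the regenerated XML. Assuming FixupHTML replaces <, >, & and " with the corresponding entities, a C equivalent might look like this (`xml_escape` is a hypothetical name, not a MUSHclient function):

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Hypothetical analogue of FixupHTML: escape the characters that would
   otherwise break the XML that writenode emits. Returns the output
   length, or (size_t)-1 if dst (capacity cap) is too small. */
size_t xml_escape(const char *src, char *dst, size_t cap)
{
    size_t out = 0;

    if (cap == 0)
        return (size_t)-1;

    for (; *src != '\0'; src++) {
        const char *rep;
        char single[2] = { *src, '\0' };  /* unchanged character */

        switch (*src) {
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '&': rep = "&amp;";  break;
        case '"': rep = "&quot;"; break;
        default:  rep = single;   break;
        }

        size_t n = strlen(rep);
        if (out + n + 1 > cap)  /* +1 for the terminator */
            return (size_t)-1;
        memcpy(dst + out, rep, n);
        out += n;
    }
    dst[out] = '\0';
    return out;
}
```

Escaping the double quote matters for attributes in particular, since writenode wraps each value in "...".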



- Nick Gammon


Posted by Ked   Russia  (524 posts)
Date Reply #2 on Sun 13 Nov 2005 07:29 AM (UTC)
Would it be possible to also add a string splitting function, and possibly a microsecond timestamping one to the same table? I've found this code for splitting in PIL and compiled it as per your instructions on extending Lua. It seems to work OK, though I can't tell if it's really up to speed or not:


#define LUA_API __declspec(dllexport)

#pragma comment( lib, "lua.lib" )
#pragma comment( lib, "lualib.lib" )

#include <string.h>  /* for strchr */

#include "lua.h"

#include "lauxlib.h"
#include "lualib.h"

static int l_split (lua_State *L) {
  const char *s = luaL_checkstring(L, 1);
  const char *sep = luaL_checkstring(L, 2);
  const char *e;
  int i = 1;

  lua_newtable(L);  /* result */

  /* repeat for each separator */
  while ((e = strchr(s, *sep)) != NULL) {
    lua_pushlstring(L, s, e-s);  /* push substring */
    lua_rawseti(L, -2, i++);
    s = e + 1;  /* skip separator */
  }

  /* push last substring */
  lua_pushstring(L, s);
  lua_rawseti(L, -2, i);

  return 1;  /* return the table */
}

static const luaL_reg strlib[] = 
{
  {"split", l_split},
  {NULL, NULL}
};

/*
** Open test library
*/
LUALIB_API int luaopen_strlib (lua_State *L)
 {
  luaL_openlib(L, "strlib", strlib, 0);
  return 1;
 }




Posted by Nick Gammon   Australia  (21,322 posts)   Forum Administrator
Date Reply #3 on Sun 13 Nov 2005 07:55 PM (UTC)
I tried out your code and it seems to work OK. You realise that it does something similar to what I describe at:

http://www.gammon.com.au/forum/bbshowpost.php?bbsubject_id=6034

It is certainly faster, taking 11 seconds in my test compared to 34 seconds the Lua way (to do 1000000 iterations).

It also behaves slightly differently, returning 1 more element in the case of the speedwalk example:


1="north"
2="north"
3="north"
4="north"
5="north"
6="north"
7="east"
8="east"
9="east"
10="south"
11="south"
12="south"
13="south"
14="ne"
15="ne"
16="ne"
17=""


Your entry 17 seems to be the "empty" text beyond the final newline. I'm not sure whether it should really be there, although you might argue it should be. A test you could add is:



     /* push last substring, if it exists */
     if (*s)
       {
       lua_pushstring(L, s);
       lua_rawseti(L, -2, i);
       }
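To see the difference the if (*s) test makes without building a Lua DLL, here is a standalone C sketch of the same splitting loop. It records pointer/length pairs instead of pushing Lua strings; `split_pieces` and its `keep_trailing_empty` flag are hypothetical names used only for this illustration. It also refuses a '\0' separator, because strchr would then match the string's own terminator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* One piece of the split: points into the original string. */
struct piece {
    const char *start;
    size_t len;
};

/* Same loop as l_split, storing (start, len) pairs instead of Lua strings.
   When keep_trailing_empty is false, an empty final piece (nothing after
   the last separator) is dropped, as the "if (*s)" test does.
   Returns the number of pieces stored, at most max. */
size_t split_pieces(const char *s, char sep, bool keep_trailing_empty,
                    struct piece *out, size_t max)
{
    size_t n = 0;
    const char *e;

    if (sep == '\0')  /* strchr would match the terminator */
        return 0;

    while (n < max && (e = strchr(s, sep)) != NULL) {
        out[n].start = s;
        out[n].len = (size_t)(e - s);
        n++;
        s = e + 1;  /* skip separator */
    }

    if (n < max && (keep_trailing_empty || *s != '\0')) {
        out[n].start = s;
        out[n].len = strlen(s);
        n++;
    }
    return n;
}
```

With input "north\neast\n" split on '\n', keeping the trailing piece gives three entries (the last one empty, like entry 17 above), and dropping it gives two. Empty pieces in the middle, as in "a,,b", are kept either way.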



Anyway, apart from the speed increase, this doesn't really add anything that can't be done in straight Lua, and I am a bit reluctant to expand the library with useful utilities that can already be done reasonably quickly and easily in straight Lua. However feel free to argue for its inclusion, it isn't much extra code.

This contrasts with the new things I recently added:


  • Directory scanner

  • XML parser


The directory scanner probably simply wasn't possible in MUSHclient and Lua, and the XML parser would have been tedious to write in Lua.


Quote:

... possibly a microsecond timestamping one to the same table?


You mean, like GetInfo (232)?

http://www.gammon.com.au/scripts/doc.php?function=GetInfo

That is already available from the above function call.

- Nick Gammon


Posted by Ked   Russia  (524 posts)
Date Reply #4 on Mon 14 Nov 2005 01:49 AM (UTC)
As for GetInfo(232) - must've missed it, but it works great. Lua's timing was so bad that I had to Google for a way to cram RDTSC into a dll.

As for the split function... I think it's needed. Why Lua's string library doesn't have it, especially since their documentation has the code for it, is beyond me, but this is a very basic and a very useful thing. Lua has table.concat but is missing string.split for some absurd reason. Basically, if you want to accomplish something as trivial as checking a trigger name in a function by splitting it, you need to either copy and paste the split() function into your script, require a file that has it, or compile it yourself and load it as a library.

What I am getting at is that this is a pretty standard thing. I grew used to it first in vbs and then in Python, and Lua not having it out of the box is a bit of a shock. I understand your concerns about it not fitting ideologically together with XML and directory parsing, but at the same time it seems to be a very small and a very useful thing that could maybe slip by.

Posted by Nick Gammon   Australia  (21,322 posts)   Forum Administrator
Date Reply #5 on Mon 14 Nov 2005 02:34 AM (UTC)

Amended on Mon 14 Nov 2005 02:47 AM (UTC) by Nick Gammon

OK, I agree it is rather asymmetric of Lua to provide a function to turn a table into a string (table.concat), but not vice-versa.

I'll add that as another "utils" function.

I have added a test that the separator be a single character. With your code you could conceivably pass a multi-character string (only its first character would be used, giving unexpected results), or an empty string, which would fail in strange ways.
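The "strange ways" come from strchr: the C standard treats the terminating null character as part of the string, so strchr(s, '\0') returns a pointer to the terminator rather than NULL. With an empty separator the loop in l_split therefore never sees NULL, sets s past the end of the string and reads beyond the buffer, which is undefined behaviour. A multi-character separator, on the other hand, is silently reduced to its first character. The sketch below demonstrates both effects and a one-character check; `find_sep` and `separator_ok` are hypothetical helpers, not MUSHclient's code.

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* The lookup l_split performs: only the first character of sep is used.
   If sep is "", *sep is '\0' and strchr finds the terminator. */
const char *find_sep(const char *s, const char *sep)
{
    return strchr(s, *sep);
}

/* A minimal check in the spirit described above: accept the separator
   only if it is exactly one character long. Returns 1 if valid. */
int separator_ok(const char *sep)
{
    return sep != NULL && sep[0] != '\0' && sep[1] == '\0';
}
```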

I have also added a "split count" as the optional 3rd argument. A couple of MUSHclient callbacks (MXP, I think) are passed something like:

arg=value

Where "value" might contain any characters, including the "=" symbol. Thus in this case you would split with a count of 1, so that only the first "=" is treated as a separator.
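The effect of a split count can be sketched in C as follows. This is an illustration only: `split_limited` and its parameters are hypothetical, not the actual argument handling of the new utils function. Here a count of 0 means "no limit", and whatever follows the last permitted split, including any further separators, becomes the final piece.

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

struct piece {
    const char *start;  /* points into the original string */
    size_t len;
};

/* Split s on sep, doing at most max_splits splits (0 = no limit).
   The text after the last permitted split, separators and all, becomes
   the final piece. Returns the number of pieces written to out
   (capacity max_out), or 0 if sep is '\0' or max_out is 0. */
size_t split_limited(const char *s, char sep, size_t max_splits,
                     struct piece *out, size_t max_out)
{
    size_t n = 0;
    const char *e;

    if (sep == '\0' || max_out == 0)
        return 0;

    /* n + 1 < max_out keeps one slot free for the remainder */
    while ((max_splits == 0 || n < max_splits) && n + 1 < max_out &&
           (e = strchr(s, sep)) != NULL) {
        out[n].start = s;
        out[n].len = (size_t)(e - s);
        n++;
        s = e + 1;  /* skip separator */
    }

    out[n].start = s;  /* remainder, may still contain sep */
    out[n].len = strlen(s);
    return n + 1;
}
```

With a count of 1, "colour=red=ish" splits into "colour" and "red=ish", so the value keeps its own "=" characters.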


- Nick Gammon


The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.
