[Home] [Downloads] [Search] [Help/forum]

Gammon Forum

See www.mushclient.com/spam for dealing with forum spam. Please read the MUSHclient FAQ!

[Folder]  Entire forum
-> [Folder]  MUSHclient
. -> [Folder]  Lua
. . -> [Subject]  Loading delimited file with multi-line string
Home  |  Users  |  Search  |  FAQ
Username:
Register forum user name
Password:
Forgotten password?

Loading delimited file with multi-line string

It is now over 60 days since the last post. This thread is closed.     [Refresh] Refresh page


Posted by LezChap   (36 posts)  [Biography] bio
Date Mon 20 Sep 2010 11:00 PM (UTC)
Message
I've got a comma-delimited file that I exported from Access that I'm trying to read, and split into table items. Problem is one of the fields contains multiple lines, and I'm not sure how to process it.

An example of one exported record from the database:
34172,"The Hitching Post","","Flail Smith Guild","This is the local tavern. There are round tables up near the front, with
the bar at the back just in front of a small kitchen. Hefty mugs are set on
the bar top, large enough to quench a powerful thirst. There is a space
clear for dancing at the center of the tavern.
",0,-1,34172,0,-1200,-21840,0,0,0,0,0,0,0,11812261,-1,11,1,,,,0,,0,0,0,1/3/2006 19:29:31,114,-1,1/3/2006 19:29:31,0



  local filename = utils.filepicker ("ObjectTbl File", "ObjectTbl.txt", "txt", { txt = "Text files", ["*"] = "All files" }, false)
  
  if not filename then
    return -- they cancelled
  end
  
  rooms = {}
  local num = 0
  for line in io.lines (filename) do
    num = num + 1
    line = string.gsub (line, '"', '')
    rooms[num] = utils.split (line, ",")
  end 


Now I'm hitting two problems. First, the code above isn't processing the multi-line records properly. Second, the commas in the strings for some of the fields causes unwanted splits when ran through utils.split. If using a different delimiter would be the easiest solution for the second problem, that's easily done...but I can't figure out how to get around the first. Any hints or suggestions?

This is for my attempt to make a dirty Zmud to MUSHclient mapper conversion script, which I'm more than happy to share when I finish.
[Go to top] top

Posted by Twisol   USA  (2,257 posts)  [Biography] bio
Date Reply #1 on Mon 20 Sep 2010 11:18 PM (UTC)

Amended on Mon 20 Sep 2010 11:19 PM (UTC) by Twisol

Message
A naive solution would see if the number of "'s in a line is even or odd, and start/end a line or add to an existing one based on that. Here's the even/odds for the "'s in your example row:

ODD
EVEN
EVEN
EVEN
ODD


So given this, everything from the last quote in an ODD line to the first quote in the next ODD line is part of a single string. A newline on an EVEN line that's not part of a single string marks the end of a row.

Caveat: CSV allows you to escape quotes within strings, so you might see "A sign says \"Beware Of Dog\" in bright red", for example. More complex logic is required to handle that case.

[EDIT]: Caveat two: if one string is ended on the same line as another string is started, it will appear as EVEN. Tricky business, I'm not sure how to handle that. Overall I'd prefer a state machine approach.

'Soludra' on Achaea

Blog: http://jonathan.com/
GitHub: http://github.com/Twisol
[Go to top] top

Posted by Nick Gammon   Australia  (21,322 posts)  [Biography] bio   Forum Administrator
Date Reply #2 on Mon 20 Sep 2010 11:27 PM (UTC)
Message
Let's move away from comma-delimited, that is causing the problems.

http://www.gammon.com.au/forum/bbshowpost.php?id=6324

You can use the techniques described there to get at your Access database in Lua, then the issue of multiple lines, and commas, just goes away.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by LezChap   (36 posts)  [Biography] bio
Date Reply #3 on Mon 20 Sep 2010 11:27 PM (UTC)
Message
Adding a few more examples of ones that contain quotes:

299,"The Caemlyn Road",,,"A sudden narrowing in the highway weaves in and out between the brown
hills. Weary travelers are often seen plodding here along on their journeys
to Caemlyn, or part of the wagon groups commuting between Lugard, Caemlyn
and Whitebridge. A few rotten trees stand huddled in a copse, with a sign 
nailed on one showing how it can be used to ""block"" or ""clear"" ""the road"".
",0,-1,299,0,3840,-5520,0,0,0,0,0,0,0,32896,-1,11,1,,,,0,,0,0,0,,4,-1,,

3404,"Edge of the Mire",,,"More mud covers the ground as far as the eye can see. But to call it ""ground""
is to use the term loosely indeed; it is neither firm nor safe nor nourishing
nor least of all natural. There is an oppressive air of anxiety that hangs 
over the region. Progress south is impossible; the Mire is far too watery.
",0,-1,3404,0,-12240,16680,0,0,0,0,0,0,0,536870911,-1,11,1,,,,0,,0,0,0,,15,-1,,

3617,"Downward Tunnel",,,"The tunnel suddenly seems warm here in contrast to the slight chill of the
main cave, and its floor begins to slope downward as evidenced by the path
of free-falling water drops which land with a ""splatt"" which echoes dully
inside the tunnel.
",0,-1,3617,0,-22800,13320,0,0,0,0,0,0,0,8421504,-1,11,1,,,,0,,0,0,0,,15,-1,,

4013,"Old Road",,,"The road becomes rather monotonous as it trudges along in its unchanging path.
The trees of the Westwood stand off to the west of the road, offering some
relief from the dusty boredom of the road. ""Old Road"" the people of the area
call it. An apt name; it seems not to have changed in the memorable past.
",0,-1,4013,0,-19920,27960,0,0,0,0,0,0,0,32896,-1,11,1,,,,0,,0,0,0,,15,-1,,


So far looks like that logic will work...I'll look at more of the records to be sure.
[Go to top] top

Posted by Twisol   USA  (2,257 posts)  [Biography] bio
Date Reply #4 on Mon 20 Sep 2010 11:32 PM (UTC)

Amended on Mon 20 Sep 2010 11:34 PM (UTC) by Twisol

Message
I'd agree with Nick in general, it's easier to get at the data directly if possible. That said, it looks like I was wrong about the quoting! It seems that Access is exporting it so that "'s are escaped to "". That means the only quote-counting issue is the one where a string ends and another string begins on the same line. Given the format you've presented, I don't think that will be a major issue here.

'Soludra' on Achaea

Blog: http://jonathan.com/
GitHub: http://github.com/Twisol
[Go to top] top

Posted by LezChap   (36 posts)  [Biography] bio
Date Reply #5 on Tue 21 Sep 2010 12:13 AM (UTC)
Message
Looking good so far...though I had to figure out how to create an ODBC source to connect through via the Windows Control Panel...

Just so others can find the information/steps to do so:

Under Control Panel, open "Administrative Tools".
Then open "Data Sources(ODBC)".
Select an appropriate driver (I selected "Driver do Microsoft Access (*.mdb)")
Enter a Name --This is what you'll use in the "con = assert (env:connect ("Name"))" line.
Enter a Description --not sure if it's needed
Click "Select" under "Database" and find your MS Access file.

After you OK and get out of those windows, you should have access via the ODBC code that Nick linked above.
[Go to top] top

Posted by Nick Gammon   Australia  (21,322 posts)  [Biography] bio   Forum Administrator
Date Reply #6 on Tue 21 Sep 2010 01:15 AM (UTC)
Message
I'm presuming you will ending up writing back to SQLite3 database, as SQLite3 support is built into MUSHclient. There might be an easier way, just get Access to export the SQL and import that into sqlite3. However the syntax may be slightly different.

Another approach that might work is to simply get at the Access database from your scripts - after all, you have worked out how to do it.

But for future compatibility, and ease of distribution to other players, converting to SQLite3 is probably better.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by WillFa   USA  (525 posts)  [Biography] bio
Date Reply #7 on Tue 21 Sep 2010 03:07 AM (UTC)

Amended on Tue 21 Sep 2010 03:10 AM (UTC) by WillFa

Message
I wrote a quick function in lpeg to parse the CSV...



function ParseCSV(str)
    local Ct, C, P, digit, V, S, Cc = lpeg.Ct, lpeg.C, lpeg.P, lpeg.digit, lpeg.V, lpeg.S, lpeg.Cc
    local grammar = P{  Ct( V("record")^1 ) ,
                    record = Ct(  V"Value" * (V"delim" * (V"Value" + Cc""))^0 * V"EOR") ,
                    Value = ( V"Date" + V"Digit" + V"String"),
                    delim = P(","),
                    EOR = P"\n" + P(-1),
                    Digit = C(P("-")^-1 * digit^1) / tonumber ,
                    String = P'"' * C(( P'""' + P'\"' + 1-P'"')^0) * P'"',
                    Date = C( digit^-4 * P"/" * digit^-2 * P"/" * digit^-4 *
                    P" " *
                    digit^-2 * P":" * digit^-2 * P":" * digit^-2 )
                    }
        return grammar:match(str)
end


I'll leave it up to you to script out opening the file and doing a read all to get One Big String (tm!) with the CSV... Usage otherwise is:

--open and read file into a string Foo

AreasTbl = ParseCSV(Foo)
tprint(AreasTbl)


I didn't test this extensively, but it does parse the first record in your sample above correctly...
1:
  1=34172
  2="The Hitching Post"
  3=""
  4="Flail Smith Guild"
  5="This is the local tavern. There are round tables up near the front, with
the bar at the back just in front of a small kitchen. Hefty mugs are set on
the bar top, large enough to quench a powerful thirst. There is a space
clear for dancing at the center of the tavern.
"
  6=0
  7=-1
  8=34172
  9=0
  10=-1200
  11=-21840
  12=0
  13=0
  14=0
  15=0
  16=0
  17=0
  18=0
  19=11812261
  20=-1
  21=11
  22=1
  23=""
  24=""
  25=""
  26=0
  27=""
  28=0
  29=0
  30=0
  31="1/3/2006 19:29:31"
  32=114
  33=-1
  34="1/3/2006 19:29:31"
  35=0

Oh, I had to make a small concession and jump through a small hoop... Make sure the first field in the record isn't blank. Other than that, it'll parse multiline strings with embedded quotes correctly
[Go to top] top

Posted by LezChap   (36 posts)  [Biography] bio
Date Reply #8 on Tue 21 Sep 2010 04:01 AM (UTC)

Amended on Tue 21 Sep 2010 10:48 PM (UTC) by LezChap

Message
Nick Gammon said:

I'm presuming you will ending up writing back to SQLite3 database, as SQLite3 support is built into MUSHclient. There might be an easier way, just get Access to export the SQL and import that into sqlite3. However the syntax may be slightly different.

Another approach that might work is to simply get at the Access database from your scripts - after all, you have worked out how to do it.

But for future compatibility, and ease of distribution to other players, converting to SQLite3 is probably better.


That was my plan...using the built-in SQLite support for portability to other players with the Mapper plugin itself. This is just an added "conversion" utility to help people from other Muds convert Zmud databases they may have access to so it will be compatible with the more universal mapping plugin.

That, and the Zmud database would need to be translated for UIDs on MUDs that don't send such data (via ATCP or other means), among other things. Just seems simpler to convert the database once than trying to reprocess it on every Mapper load.
[Go to top] top

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.


7,295 views.

It is now over 60 days since the last post. This thread is closed.     [Refresh] Refresh page

Go to topic:           Search the forum


[Go to top] top

Quick links: MUSHclient. MUSHclient help. Forum shortcuts. Posting templates. Lua modules. Lua documentation.

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.

[Home]


Written by Nick Gammon - 5K   profile for Nick Gammon on Stack Exchange, a network of free, community-driven Q&A sites   Marriage equality

Comments to: Gammon Software support
[RH click to get RSS URL] Forum RSS feed ( https://gammon.com.au/rss/forum.xml )

[Best viewed with any browser - 2K]    [Hosted at FutureQuest]