Gammon Forum

Use LPEG (and "re") to iterate over a string


Posted by Nick Gammon (Australia), Forum Administrator
Date Fri 17 Jun 2016 09:20 PM (UTC)

Amended on Sun 19 Jun 2016 10:04 PM (UTC) by Nick Gammon

Following on from my post about LPEG: http://www.gammon.com.au/forum/?id=8683

In its simple form, an LPEG match is anchored to the start of the string: the pattern must match beginning at the first character.

Example:


local re = require "re"  -- LPEG regular expression module

local target = "the quick brown fox jumped over the lazy dog"
local grammar = "{%a+}"

print (re.match (target, grammar))   --> the 


In this case the rule is to match one or more alphabetic characters (%a+) and capture the result (hence the braces). Because the match is anchored, only the first word, "the", is returned.
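A short sketch of what "anchored" means in practice. It assumes a reasonably recent LPeg whose "re" module provides re.find alongside re.match; the two leading spaces in the target are only there to make the anchoring visible.

```lua
local re = require "re"  -- LPEG regular expression module

local target = "  the quick brown fox"   -- note the two leading spaces

-- re.match is anchored at position 1, so the leading spaces make it fail
local anchored = re.match (target, "{%a+}")        -- nil

-- matching the spaces explicitly lets the anchored match succeed
local skipped = re.match (target, "%s* {%a+}")     -- "the"

-- re.find searches forward for the first place the pattern matches and
-- returns the start and end positions of that occurrence
local first, last = re.find (target, "%a+")        -- 3, 5

print (anchored, skipped, first, last)
```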

string.gmatch equivalent


But what if we want to match anywhere in the line, preferably more than once, like string.gmatch does?
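For comparison, this is the built-in behaviour being imitated: string.gmatch with an ordinary Lua pattern, which needs no LPEG at all.

```lua
local target = "the quick brown fox jumped over the lazy dog"

-- string.gmatch returns an iterator; each call yields the next match
local words = {}
for word in string.gmatch (target, "%a+") do
  words [#words + 1] = word
end

print (#words, words [1], words [#words])   -- 9 words, from "the" to "dog"
```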

To do this we make up a grammar that looks for the matching word, but if it fails, consumes one character, and then tries again. This is what this rule does:


  line              <- (word / .)*


That is saying that the rule called "line" tries to match a word (defined below), and if that doesn't match it takes the alternative branch (indicated by a "/" symbol, LPEG's ordered choice) and matches any single character (".") instead. This moves the match one character further into the string. The parentheses and "*" symbol then repeat the expression as many times as it will match, which walks through the whole string and so matches every word along the way.

The second rule:


  word              <-  %a+ -> gotWord


... matches on one or more letters. If it matches, the matched text is passed to the function "gotWord" (LPEG calls it once the overall match has succeeded). For the grammar to know that gotWord exists, it must be supplied in a table, which is the second argument to re.compile.
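To illustrate why the table is needed, here is a minimal sketch in which compiling the rule without the table fails, because re.compile cannot resolve the name after "->". The exact error text depends on the LPeg version, so only the failure itself is checked. The callback here records the word instead of printing it.

```lua
local re = require "re"

-- without a definitions table, "gotWord" is an unknown name, so compiling fails
local ok = pcall (re.compile, "word <- %a+ -> gotWord")

-- with the table, the name resolves to the function we supply
local seen
local grammar = re.compile ("word <- %a+ -> gotWord",
                            { gotWord = function (which) seen = which end })
grammar:match ("hello world")

print (ok, seen)   -- ok is false, seen is "hello"
```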



local re = require "re"

local target = "the quick brown fox jumped over the lazy dog"

local function gotWord (which)
  print (which)
end -- gotWord

local grammar = re.compile ([[
  line              <- (word / .)*
  word              <-  %a+ -> gotWord
]], { gotWord = gotWord } )

-- run grammar on target text
grammar:match (target)


The result is printed:


the
quick
brown
fox
jumped
over
the
lazy
dog
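To see what the "/ ." alternative contributes, here is a small sketch that runs the grammar with and without it and counts how many words reach the callback. The helper name countWords is made up for this illustration.

```lua
local re = require "re"

-- compile grammarText, run it on subject, and count the words passed to gotWord
local function countWords (grammarText, subject)
  local count = 0
  local defs = { gotWord = function () count = count + 1 end }
  re.compile (grammarText, defs):match (subject)
  return count
end

local subject = "the quick brown fox"

-- no fallback: the repetition stops at the first space
local withoutFallback = countWords ([[
  line <- word*
  word <- %a+ -> gotWord
]], subject)

-- with the fallback, a non-letter is skipped and matching carries on
local withFallback = countWords ([[
  line <- (word / .)*
  word <- %a+ -> gotWord
]], subject)

print (withoutFallback, withFallback)   -- 1 and 4
```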


Since most of that is static, we can make a function to do it:


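local re = require "re"

-- call func for every match of pattern (an "re" pattern) anywhere in str;
-- the pattern is wrapped in parentheses so that "-> func" applies to all of it
function gmatch (str, pattern, func)
  re.compile  ( "line  <- (word / .)*  word <-  (" .. pattern .. ") -> func", 
                { func = func} ):match (str)
end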

local function gotWord (which)
  print (which)
end -- gotWord

gmatch ("the quick brown fox jumped over the lazy dog", "%a+", gotWord)


Now we just call "gmatch" passing down the string we want to look at, the pattern, and the function to be called for each match.
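Because the callback is an ordinary Lua function, it can do more than print. This sketch restates the same gmatch idea so the example runs on its own (with the pattern wrapped in parentheses so that "-> func" covers all of it), and uses it to collect every number in a string into a table. The sample text is invented for the example.

```lua
local re = require "re"

-- call func for every match of pattern (an "re" pattern) anywhere in str
local function gmatch (str, pattern, func)
  re.compile ("line <- (word / .)*  word <- (" .. pattern .. ") -> func",
              { func = func }):match (str)
end

-- collect the numbers instead of printing them
local numbers = {}
gmatch ("10 green bottles, 2 red cans and 355 ml", "%d+", function (n)
  numbers [#numbers + 1] = tonumber (n)
end)

print (table.concat (numbers, ", "))   --> 10, 2, 355
```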


string.gsub equivalent


Now we can make something equivalent to string.gsub by using captures.

This time we change the first rule to be:


  line              <- {| (word / {.})*  |}


This adds two extra things. First, we "capture" anything that is not a word by putting braces around the period. Second, we put the entire expression inside {| ... |}, which makes a "table capture": every capture made inside it is collected into a table, in order, and that table is what the match returns. Concatenating the table gives back the whole string, with each word replaced by whatever gotWord returned for it.

Finally we change the gotWord function to convert the matching word to upper case (as an example).


local re = require "re"

local target = "the quick brown fox jumped over the lazy dog"

local function gotWord (which)
  return which:upper ()
end -- gotWord

local grammar = re.compile ([[
  line              <- {| (word / {.})*  |}
  word              <- %a+ -> gotWord
]], { gotWord = gotWord } )

-- run grammar on target text
local result = grammar:match (target)

-- concatenate all captures
print (table.concat (result))


Output from the above:


THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG
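It can help to look at the table capture before it is concatenated. In this sketch string.upper is passed straight in as gotWord, and the pieces are joined with "|" so the individual entries are visible.

```lua
local re = require "re"

local grammar = re.compile ([[
  line              <- {| (word / {.})*  |}
  word              <- %a+ -> gotWord
]], { gotWord = string.upper } )

local parts = grammar:match ("hi, you")

-- words arrive already converted by gotWord; everything else is one
-- captured character per entry
print (#parts)                     --> 4
print (table.concat (parts, "|"))  --> HI|,| |YOU
```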


This can also be turned into a general function:


local re = require "re"

-- the pattern is wrapped in parentheses so that "-> func" applies to all of it
function gsub (str, pattern, func)
  return table.concat (re.compile  ( 
                       "line <- {| (word / {.})* |} word <- (" .. pattern .. ") -> func", 
                       { func = func} ):match (str))
end

print (gsub ("the quick brown fox jumped over the lazy dog", "%a+", string.upper))


In this case I am passing string.upper as the handling function, so that just converts every matching word to upper case. You could pass other functions, like string.reverse:


print (gsub ("the quick brown fox jumped over the lazy dog", "%a+", string.reverse))


Output:


eht kciuq nworb xof depmuj revo eht yzal god
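Any function that takes the matched text and returns a string will work as the handler. As a sketch, here is a table-driven replacement; the colours table and translate function are invented for the example. Note that translate returns the word unchanged when there is no entry: a handler that returned nil would put a nil into the table capture, leaving a hole that table.concat cannot handle reliably.

```lua
local re = require "re"

-- general gsub as described above (pattern wrapped in parentheses so that
-- "-> func" applies to all of it)
local function gsub (str, pattern, func)
  return table.concat (re.compile (
                       "line <- {| (word / {.})* |} word <- (" .. pattern .. ") -> func",
                       { func = func }):match (str))
end

-- hypothetical lookup table for the example
local colours = { red = "rouge", green = "vert", blue = "bleu" }

-- replace known colour names, and return any other word unchanged so that
-- every match still contributes a string to the table capture
local function translate (word)
  return colours [word] or word
end

print (gsub ("the red fox and the blue dog", "%a+", translate))
--> the rouge fox and the bleu dog
```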


re.gsub can also be used


The "re" library already has a "gsub" function, so the above example can be written as:


print (re.gsub ("the quick brown fox jumped over the lazy dog", "%a+", string.reverse))


The replacement (third argument) can be a string, function, or table.
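A short sketch of the three kinds of replacement. The behaviour shown follows LPeg's documented rules for the "/" operator, which re.gsub applies to each match: a function is called with the matched text, a table is indexed by it (a missing key leaves the text unchanged), and in a string "%0" stands for the whole match.

```lua
local re = require "re"

local text = "the quick brown fox"

-- function: called with each match, its result replaces the match
local shouted = re.gsub (text, "%a+", string.upper)

-- table: the match is used as a key; words with no entry are left alone
local swapped = re.gsub (text, "%a+", { quick = "slow", fox = "cat" })

-- string: %0 is replaced by the whole match
local bracketed = re.gsub (text, "%a+", "<%0>")

print (shouted)    --> THE QUICK BROWN FOX
print (swapped)    --> the slow brown cat
print (bracketed)  --> <the> <quick> <brown> <fox>
```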





- Nick Gammon

www.gammon.com.au, www.mushclient.com
Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.
