Gammon Forum : MUSHclient : Lua : Use LPEG (and "re") to iterate over a string

Posted by Nick Gammon (Australia), Forum Administrator

Date: Fri 17 Jun 2016 09:20 PM (UTC)

Amended on Sun 19 Jun 2016 10:04 PM (UTC) by Nick Gammon

Following on from my post about LPEG: http://www.gammon.com.au/forum/?id=8683

In its simple form an LPEG match is anchored to the start of the subject string.

Example:


local re = require "re"  -- LPEG regular expression module

local target = "the quick brown fox jumped over the lazy dog"
local grammar = "{%a+}"

print (re.match (target, grammar))   --> the

In this case the rule is to match one or more alphabetic characters (%a+) and capture the result (hence the braces).
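To make the anchoring concrete, here is a small sketch (the test strings are my own, not from the original post). It assumes the standard "re" module that ships with LPEG, which also provides re.find for searching from any position:

```lua
local re = require "re"

-- The match is anchored, so leading spaces make the same pattern fail.
local atStart  = re.match ("the quick", "{%a+}")
local notStart = re.match ("  the quick", "{%a+}")
print (atStart)    --> the
print (notStart)   --> nil

-- re.find searches forward and returns the start and end positions.
local first, last = re.find ("  the quick", "%a+")
print (first, last)   --> 3 and 5
```

So a plain re.match only helps when the text of interest is at the very start of the string.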

string.gmatch equivalent

But what if we want to match anywhere in the line, preferably more than once, like string.gmatch does?

To do this we make up a grammar that looks for the matching word, but if it fails, consumes one character, and then tries again. This is what this rule does:


  line              <- (word / .)*

This says that the rule called "line" first tries to match a word (defined below). If that fails, it takes the alternative after the "/" symbol (an ordered choice) and matches a single character instead, which moves the match position one character further into the string. The brackets and the "*" symbol then repeat the whole expression zero or more times, so every word in the string gets matched.
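As a rough illustration of the "word or else one character" idea, the sketch below (with a made-up test string) compares the bare pattern with the repeated choice. With no captures, a match returns the position just after the matched text:

```lua
local re = require "re"

local subject = "12 ab"   -- 5 characters, starting with digits

-- Anchored word match: fails at once because "1" is not a letter.
local anchored = re.match (subject, "%a+")
print (anchored)   --> nil

-- Word-or-any-character, repeated: steps over the digits and the space,
-- matches "ab", and ends up just past the end of the string.
local scanned = re.match (subject, "(%a+ / .)*")
print (scanned)    --> 6
```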

The second rule:


  word              <-  %a+ -> gotWord

... matches one or more letters. When it matches, the matched text is passed to the function "gotWord". For the grammar to know that gotWord exists, the function is supplied in a table of definitions, which is the second argument to re.compile.


local re = require "re"

local target = "the quick brown fox jumped over the lazy dog"

local function gotWord (which)
  print (which)
end -- gotWord

local grammar = re.compile ([[
  line              <- (word / .)*
  word              <-  %a+ -> gotWord
]], { gotWord = gotWord } )

-- run grammar on target text
grammar:match (target)

The result is printed:


the
quick
brown
fox
jumped
over
the
lazy
dog

Since most of that is static, we can make a function to do it:


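local re = require "re"

function gmatch (str, pattern, func)
  -- brackets around the pattern make "-> func" apply to the whole of it,
  -- not just to its last item
  re.compile  ( "line  <- (word / .)*  word <-  (" .. pattern .. ") -> func", 
                { func = func} ):match (str)
end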

local function gotWord (which)
  print (which)
end -- gotWord

gmatch ("the quick brown fox jumped over the lazy dog", "%a+", gotWord)

Now we just call "gmatch" passing down the string we want to look at, the pattern, and the function to be called for each match.
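If you would rather have the matches in a table than handle them one at a time, the callback can be a closure that appends to a list. This is only a sketch: "collect_matches" and "remember" are hypothetical names of mine, and the pattern is wrapped in brackets on the assumption that it may consist of more than one item:

```lua
local re = require "re"

-- Hypothetical helper: returns a list of every match of "pattern" in "str".
local function collect_matches (str, pattern)
  local found = {}
  local function remember (which)
    found [#found + 1] = which   -- returns nothing, so no capture is produced
  end
  re.compile ("line <- (word / .)*  word <- (" .. pattern .. ") -> func",
              { func = remember }):match (str)
  return found
end

local words = collect_matches ("the quick brown fox", "%a+")
print (#words)                      --> 4
print (table.concat (words, ","))   --> the,quick,brown,fox
```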

string.gsub equivalent

Now we can make something equivalent to string.gsub by using captures.

This time we change the first rule to be:


  line              <- {| (word / {.})*  |}

This adds two extra things. First we "capture" anything that is not a word by putting braces around the period. Second we put the entire expression inside {| ... |} which makes a "table capture": all the captures made inside it are collected, in order, into a table, and that table is returned by the match. If we concatenate that table together we get everything we matched on (and captured).

Finally we change the gotWord function to return the matching word converted to upper case (as an example). Whatever the function returns becomes the captured value for that word.


local re = require "re"

local target = "the quick brown fox jumped over the lazy dog"

local function gotWord (which)
  return which:upper ()
end -- gotWord

local grammar = re.compile ([[
  line              <- {| (word / {.})*  |}
  word              <- %a+ -> gotWord
]], { gotWord = gotWord } )

-- run grammar on target text
local result = grammar:match (target)

-- concatenate all captures
print (table.concat (result))

Output from the above:


THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG
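To see what the table capture holds before it is concatenated, here is a small sketch on a shorter string of my own. The definition name "upper" is just an illustrative choice:

```lua
local re = require "re"

local grammar = re.compile ([[
  line <- {| (word / {.})* |}
  word <- %a+ -> upper
]], { upper = string.upper })

-- Each converted word and each non-word character is a separate entry.
local pieces = grammar:match ("hi, yo")
print (#pieces)                      --> 4
print (table.concat (pieces, "|"))   --> HI|,| |YO
```

The entries arrive in the order they were matched, which is why a plain table.concat rebuilds the string.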

This can also be turned into a general function:


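local re = require "re"

function gsub (str, pattern, func)
  -- brackets around the pattern make "-> func" apply to the whole of it,
  -- not just to its last item
  return table.concat (re.compile  ( 
                       "line <- {| (word / {.})* |} word <- (" .. pattern .. ") -> func", 
                       { func = func} ):match (str))
end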

print (gsub ("the quick brown fox jumped over the lazy dog", "%a+", string.upper))

In this case I am passing string.upper as the handling function, so that just converts every matching word to upper case. You could pass other functions, like string.reverse:


print (gsub ("the quick brown fox jumped over the lazy dog", "%a+", string.reverse))

Output:


eht kciuq nworb xof depmuj revo eht yzal god
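The handling function does not have to be a library function. As a sketch, if the pattern contains its own captures, those captures (rather than the whole match) are passed to the function. "gsub_all" and "swap" are hypothetical names, and the pattern is wrapped in brackets here so that "-> func" covers all of it:

```lua
local re = require "re"

-- Hypothetical variant of the gsub helper, with brackets around the pattern.
local function gsub_all (str, pattern, func)
  return table.concat (re.compile (
           "line <- {| (word / {.})* |}  word <- (" .. pattern .. ") -> func",
           { func = func }):match (str))
end

-- Receives the two captures from the pattern and swaps them around.
local function swap (key, value)
  return value .. "=" .. key
end

local swapped = gsub_all ("x=1, y=22", "{%a+} '=' {%d+}", swap)
print (swapped)   --> 1=x, 22=y
```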

re.gsub can also be used

The "re" library already has a "gsub" function, so the above example can be written as:


print (re.gsub ("the quick brown fox jumped over the lazy dog", "%a+", string.reverse))

The replacement (third argument) can be a string, function, or table.
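A short sketch of the three kinds of replacement, using a test string of my own. As I understand the LPEG capture rules, "%0" in a replacement string stands for the whole match, and a table is indexed by the matched text, with text left unchanged when the key is missing:

```lua
local re = require "re"

local s = "cat and dog"

-- String replacement: %0 is the whole match.
local bracketed = re.gsub (s, "%a+", "<%0>")
print (bracketed)   --> <cat> <and> <dog>

-- Table replacement: words without an entry are kept as they are.
local renamed = re.gsub (s, "%a+", { cat = "feline", dog = "canine" })
print (renamed)     --> feline and canine

-- Function replacement, as in the example above.
local shouted = re.gsub (s, "%a+", string.upper)
print (shouted)     --> CAT AND DOG
```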

- Nick Gammon

www.gammon.com.au, www.mushclient.com


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.