Posted by Nick Gammon (Australia), Forum Administrator

Following on from my post about LPEG: http://www.gammon.com.au/forum/?id=8683
In its simplest form an LPEG pattern is anchored: it only matches at the start of the subject string.
Example:
require "re" -- LPEG regular expression module
local target = "the quick brown fox jumped over the lazy dog"
local grammar = "{%a+}"
print (re.match (target, grammar)) --> the
In this case the rule matches one or more alphabetic characters (%a+) and captures the result (hence the braces).
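To see the anchoring in action, here is a minimal sketch. The subject string "123 abc" is made up for illustration, and the code assumes a version of re.lua whose require call returns the module table. The anchored match fails because the subject starts with digits, whereas re.find, which searches along the string, reports where the letters are:

```lua
local re = require "re"  -- assumes require returns the module table

local subject = "123 abc"  -- hypothetical input that does not start with a letter

-- anchored: %a+ has to match at position 1, so this fails
local anchored = re.match (subject, "{%a+}")
print (anchored)  --> nil

-- re.find looks for the first position where the pattern matches
local first, last = re.find (subject, "%a+")
print (first, last)  --> 5 7
```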
string.gmatch equivalent
But what if we want to match anywhere in the line, preferably more than once, like string.gmatch does?
To do this we make up a grammar that looks for the matching word and, if that fails, consumes one character and tries again. This is what this rule does:
line <- (word / .)*
This says that the rule called "line" tries to match a word (defined below) and, if that doesn't match, takes the "or" branch (indicated by the "/" symbol) and matches a single character instead. This moves the match one character further into the string. The parentheses and the "*" symbol then cause the expression to be repeated zero or more times, thus matching multiple words.
The second rule:
word <- %a+ -> gotWord
... matches one or more letters. If it matches, it calls the function "gotWord". For the grammar to know that gotWord exists, it is listed in a table, which is the second argument to re.compile.
require "re"
local target = "the quick brown fox jumped over the lazy dog"
local function gotWord (which)
  print (which)
end -- gotWord
local grammar = re.compile ([[
  line <- (word / .)*
  word <- %a+ -> gotWord
]], { gotWord = gotWord } )
-- run grammar on target text
grammar:match (target)
The result is printed:
the
quick
brown
fox
jumped
over
the
lazy
dog
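The handler does not have to print. As a minimal sketch (the names "words" and "collect" are made up for this example, and a shorter target string is used), the same grammar can gather the matches into a table for later use:

```lua
local re = require "re"  -- assumes require returns the module table

local words = {}  -- hypothetical accumulator for the matches

local function collect (which)
  words [#words + 1] = which  -- append each matched word
end -- collect

local grammar = re.compile ([[
  line <- (word / .)*
  word <- %a+ -> collect
]], { collect = collect } )

-- run grammar on a shorter target text
grammar:match ("the quick brown fox")

print (table.concat (words, ","))  --> the,quick,brown,fox
```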
Since most of the grammar is static, we can wrap it in a function:
require "re"
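function gmatch (str, pattern, func)
  -- the pattern is parenthesized so that "-> func" applies to all of it,
  -- even if it contains a sequence or a "/" choice
  re.compile ( "line <- (word / .)* word <- (" .. pattern .. ") -> func",
               { func = func} ):match (str)
end
local function gotWord (which)
  print (which)
end -- gotWord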
gmatch ("the quick brown fox jumped over the lazy dog", "%a+", gotWord)
Now we just call "gmatch", passing the string we want to search, the pattern, and the function to be called for each match.
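As a further sketch of how such a helper might be used, the example below sums the numbers found in a string. The input text and the "total" variable are invented for illustration, and the helper is repeated (as a local, with the pattern parenthesized) so the snippet stands alone:

```lua
local re = require "re"  -- assumes require returns the module table

-- same idea as the gmatch helper, repeated here so the snippet is self-contained
local function gmatch (str, pattern, func)
  re.compile ("line <- (word / .)* word <- (" .. pattern .. ") -> func",
              { func = func }):match (str)
end

local total = 0  -- hypothetical running sum

-- %d+ matches one or more digits; each match arrives as a string
gmatch ("3 apples, 14 pears and 25 plums", "%d+",
        function (n) total = total + tonumber (n) end)

print (total)  --> 42
```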
string.gsub equivalent
Now we can make something equivalent to string.gsub by using captures.
This time we change the first rule to be:
line <- {| (word / {.})* |}
This adds two extra things. First we "capture" anything that is not a word by putting braces around the period. Second we put the entire expression inside {| ... |}, which makes a "table capture". The match returns this table, holding every captured piece in order. If we concatenate the table we get back everything we matched on (and captured).
Finally we change the gotWord function to convert the matching word to upper case (as an example).
require "re"
local target = "the quick brown fox jumped over the lazy dog"
local function gotWord (which)
  return which:upper ()
end -- gotWord
local grammar = re.compile ([[
  line <- {| (word / {.})* |}
  word <- %a+ -> gotWord
]], { gotWord = gotWord } )
-- run grammar on target text
local result = grammar:match (target)
-- concatenate all captures
print (table.concat (result))
Output from the above:
THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG
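To make the role of the table capture more concrete, here is a small sketch that looks at the individual entries before they are concatenated. The short input "a fox!" and the name "pieces" are made up for this example. Each converted word and each non-word character ends up as its own entry:

```lua
local re = require "re"  -- assumes require returns the module table

local grammar = re.compile ([[
  line <- {| (word / {.})* |}
  word <- %a+ -> upper
]], { upper = string.upper } )

local pieces = grammar:match ("a fox!")  -- hypothetical short input

print (#pieces)                      --> 4
print (table.concat (pieces, "|"))   --> A| |FOX|!
```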
The substitution grammar can also be turned into a general function:
require "re"
function gsub (str, pattern, func)
  -- the pattern is parenthesized so that "-> func" applies to all of it
  return table.concat (re.compile (
    "line <- {| (word / {.})* |} word <- (" .. pattern .. ") -> func",
    { func = func} ):match (str))
end
print (gsub ("the quick brown fox jumped over the lazy dog", "%a+", string.upper))
In this case I am passing string.upper as the handling function, so that just converts every matching word to upper case. You could pass other functions, like string.reverse:
print (gsub ("the quick brown fox jumped over the lazy dog", "%a+", string.reverse))
Output:
eht kciuq nworb xof depmuj revo eht yzal god
re.gsub can also be used
The "re" library already has a "gsub" function, so the above example can be written as:
print (re.gsub ("the quick brown fox jumped over the lazy dog", "%a+", string.reverse))
The replacement (third argument) can be a string, function, or table.
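A short sketch of the three replacement forms, using a made-up subject string. It relies on my understanding of LPEG's capture rules: in a string replacement %0 stands for the whole match, and with a table the matched text is used as the key, with unmatched keys leaving the text as it was.

```lua
local re = require "re"  -- assumes require returns the module table

local subject = "the quick brown fox"  -- hypothetical input

-- string replacement: %0 stands for the whole match
print (re.gsub (subject, "%a+", "<%0>"))         --> <the> <quick> <brown> <fox>

-- function replacement: called with the matched text
print (re.gsub (subject, "%a+", string.upper))   --> THE QUICK BROWN FOX

-- table replacement: the matched text is the key;
-- words with no entry in the table are left unchanged
print (re.gsub (subject, "%a+", { quick = "slow", brown = "red" }))  --> the slow red fox
```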
- Nick Gammon
www.gammon.com.au, www.mushclient.com