Searches a string for a pattern
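Finds the first match of the pattern "pattern" in "str", starting at position "index" (which defaults to 1). It returns the start and end positions of the match, followed by any captures, or nil if nothing matches. Lua patterns resemble regular expressions but use their own, simpler syntax, described below.
Related functions:
string.match, which operates in a similar way but does not return the start and end positions
string.gmatch, which iterates over a string, letting you take action on each match (eg. on each word)
string.gsub, which lets you make replacements on matching elements (for example, replace one word with another, or make certain things all upper-case)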
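As a minimal sketch of the basic call (the subject string here is purely illustrative), the return values are the start position, the end position, and then any captures:

```lua
-- Illustrative subject string (an assumption, not taken from a real session)
local s = "the quick brown fox"

-- A plain word: returns the start and end positions of the first match
print (string.find (s, "quick"))        --> 5 9

-- Start searching at position 10; the capture follows the two positions
print (string.find (s, "(%a+)", 10))    --> 11 15 brown

-- No match at all returns nil
print (string.find (s, "wolf"))         --> nil
```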
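The standard patterns (character classes) you can search for are:
. - (a dot) any character
%a - letters
%c - control characters
%d - digits
%l - lowercase letters
%p - punctuation characters
%s - space characters
%u - uppercase letters
%w - alphanumeric characters
%x - hexadecimal digits
%z - the character with representation 0 (Lua 5.1)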
Important! - the uppercase versions of the above represent the complement of the class. eg. %U represents everything except uppercase letters, %D represents everything except digits.
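The following sketch shows a few character classes and one complement in action. The subject string is made up for illustration:

```lua
-- Illustrative subject string (an assumption)
local s = "Room 42: Exits north, EAST"

print (string.find (s, "%d+"))   --> 6 7   (first run of digits: "42")
print (string.find (s, "%u+"))   --> 1 1   (first run of uppercase letters: "R")
print (string.find (s, "%s"))    --> 5 5   (first space)
print (string.find (s, "%p"))    --> 8 8   (first punctuation character: ":")

-- %D is the complement of %d: everything up to the first digit
print (string.find (s, "%D+"))   --> 1 5
```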
Also important! If you are using string.find (or string.match etc.) in MUSHclient, and inside "send to script" in a trigger or alias, then the % sign has special meaning there (it is used to identify wildcards, for example, %1 is wildcard 1). Thus the % signs in string.find need to be doubled or they won't work properly (so use %%d instead of %d in "send to script"). This does not apply if you are scripting in a script file, because the expansion of wildcards does not apply there.
There are some "magic characters" (such as %) that have special meanings. These are:
^ $ ( ) % . [ ] * + - ?
If you want to use those in a pattern (as themselves) you must precede them by a % symbol.
eg. %% would match a single % (also see note above about "send to script")
In practice, it is safe to put % in front of any non-alphanumeric character. If in doubt, put a % in front of a special character.
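A short sketch of escaping magic characters, assuming an illustrative subject string. The last line uses the optional fourth argument of string.find (a "plain" flag), which turns off pattern matching altogether and can be simpler when the whole search text is literal:

```lua
-- Illustrative subject string (an assumption)
local s = "Price: $5.00 (50% off)"

-- "$" and "." are magic, so each is escaped with %
print (string.find (s, "%$%d+%.%d+"))           --> 8 12

-- "%%" matches a literal percent sign
print (string.find (s, "%d+%%"))                --> 15 17

-- Plain search: no characters are magic, so nothing needs escaping
print (string.find (s, "(50% off)", 1, true))   --> 14 22
```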
Quotes and backslashes
The arguments to string.find (and string.match, etc.) are just normal Lua strings. Thus, to put a backslash or quote inside such a string you still need to "escape" it with a backslash in the usual way.
eg. string.find (str, "\\") -- find a single backslash
You can build your own pattern classes (sets) by using square brackets, eg. [abc] matches any one of "a", "b" or "c".
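A minimal sketch of a set, using an illustrative subject string:

```lua
-- Illustrative subject string (an assumption)
local s = "the quick brown fox"

-- First vowel in the string
print (string.find (s, "[aeiou]"))   --> 3 3

-- First "q" or "k"
print (string.find (s, "[qk]"))      --> 5 5
```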
You can use pattern classes in the form %x in the set. If you use other characters (like periods and brackets, etc.) they are simply themselves.
You can specify a range of characters inside a set by using simple characters (not pattern classes like %a) separated by a hyphen. For example, [A-Z] or [0-9]. These can be combined with other things. For example [A-Z0-9] or [A-Z,.].
The end-points of a range must be given in ascending order. That is, [A-Z] would match upper-case letters, but [Z-A] would not match anything.
A hyphen at the start or end of a set is itself (matches a hyphen).
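You can negate a set by starting it with a "^" symbol, thus [^0-9] is everything except the digits 0 to 9. The negation applies to the whole set, so [^%a%d] would match anything except letters or digits. Anywhere except the first position of a set, the "^" symbol is simply itself.
Inside a set (that is a sequence delimited by square brackets) the only "magic" characters are:
^ - negates the set, but only as its first character
- - specifies a range, when placed between two characters
% - introduces a pattern class (such as %a) or escapes the next character (such as %])
] - ends the set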
Thus, inside a set, characters like "." and "?" are just themselves.
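The set rules above (ranges, negation, a trailing hyphen, and literal punctuation) can be sketched as follows. The subject string is an illustrative assumption:

```lua
-- Illustrative subject string (an assumption)
local s = "Item-7 costs 3.50?"

print (string.find (s, "[A-Z]"))        --> 1 1     (range of uppercase letters)
print (string.find (s, "[0-9.]+", 7))   --> 14 17   (digits or a literal period: "3.50")
print (string.find (s, "[^%a%d]"))      --> 5 5     (first non-letter, non-digit: "-")
print (string.find (s, "[%a-]+"))       --> 1 5     (hyphen at the end is itself: "Item-")
print (string.find (s, "[?]"))          --> 18 18   ("?" is just itself in a set)
print (string.find (s, "[Z-A]"))        --> nil     (descending range matches nothing)
```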
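The repetition characters, which can follow a character, class or set, are:
+ - 1 or more repetitions (greedy)
* - 0 or more repetitions (greedy)
- - 0 or more repetitions (non-greedy)
? - optional (0 or 1 occurrence)
A "greedy" match will match on as many characters as possible; a non-greedy one will match on as few as possible.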
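The difference between greedy and non-greedy repetition is easiest to see side by side. This sketch assumes an illustrative subject string containing two ">" characters:

```lua
-- Illustrative subject string (an assumption)
local s = "<b>bold</b>"

-- Greedy: ".*" runs to the last ">" it can reach
print (string.find (s, "<.*>"))   --> 1 11

-- Non-greedy: ".-" stops at the first ">"
print (string.find (s, "<.->"))   --> 1 3

-- "?" makes the preceding character optional
print (string.find ("colour", "colou?r"))   --> 1 6
print (string.find ("color", "colou?r"))    --> 1 5
```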
Anchor to start and/or end of string
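The standard "anchor" characters apply:
^ - anchors the match to the start of the string (only when it is the first character of the pattern)
$ - anchors the match to the end of the string (only when it is the last character of the pattern)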
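A brief sketch of anchoring, with illustrative subject strings:

```lua
-- "^" requires the match to begin at the start of the string
print (string.find ("hello world", "^hello"))   --> 1 5
print (string.find ("hello world", "^world"))   --> nil

-- "$" requires the match to finish at the end of the string
print (string.find ("hello world", "world$"))   --> 7 11

-- Both together: the whole string must consist of letters
print (string.find ("hello", "^%a+$"))          --> 1 5
print (string.find ("hello world", "^%a+$"))    --> nil (the space is not a letter)
```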
You can also use round brackets to specify "captures":
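A minimal sketch, assuming an illustrative subject string:

```lua
-- The text matched by (.*) is returned after the start and end positions
print (string.find ("You see dogs and dogs", "You see (.*)"))
--> 1 21 dogs and dogs
```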
Here, whatever matches (.*) becomes the first capture.
You can also refer to matched substrings (captures) later on in an expression:
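A minimal sketch of a back-reference, assuming an illustrative subject string. Here %1 stands for whatever the first capture matched:

```lua
-- %1 must match the same text as the first capture
print (string.find ("You see dogs and dogs", "You see (.*) and %1"))
--> 1 21 dogs

-- A different second word fails to match
print (string.find ("You see dogs and cats", "You see (.*) and %1"))
--> nil
```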
This example shows how you can look for a repetition of a word matched earlier, whatever that word was ("dogs" in this case).
As a special case, an empty capture, written (), captures the current position in the string (a number) rather than a substring. eg.
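A minimal sketch of a position capture, assuming an illustrative subject string:

```lua
-- "()" captures the position reached so far, as a number
print (string.find ("You see dogs and dogs", "You see ()(.*)"))
--> 1 21 9 dogs and dogs
```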
What this is saying is that the word "dogs" starts at column 9.
There is a limit of 32 captures that can be returned.
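The limit can be probed with a sketch like the one below, which builds patterns of 32 and 33 position captures. It assumes a standard Lua build, where exceeding the limit raises an error that pcall can trap:

```lua
-- 32 captures: allowed
local ok32 = pcall (string.find, "x", string.rep ("()", 32))
print (ok32)   --> true

-- 33 captures: raises an error, so pcall returns false
local ok33 = pcall (string.find, "x", string.rep ("()", 33))
print (ok33)   --> false
```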
Finally you can look for nested "balanced" things (such as parentheses) by using %b, like this:
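A minimal sketch of %b with parentheses, assuming an illustrative subject string that contains a nested pair:

```lua
-- Illustrative subject string (an assumption)
local s = "I see a (big fish (swimming) in the pond) here"

-- %b() matches from "(" to its balancing ")"
local first, last = string.find (s, "%b()")
print (first, last)                    --> 9 41
print (string.sub (s, first, last))    --> (big fish (swimming) in the pond)
```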
After %b you put 2 characters, which indicate the start and end of the balanced pair. If it finds a nested pair, it keeps going until the nesting is back at the top level. In this case the matching string was "(big fish (swimming) in the pond)".
A "frontier" (or boundary) pattern is used to assert a transition from one set of characters to another (eg. non-letters to letters, or non-digits to digits). This can be useful to detect words, such as "log" but omit "blog" or "logging".
A frontier is specified as %f[set] and matches on a transition from not-in-set to in-set. For example, to match "log" on its own:
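A minimal sketch of a whole-word match, assuming the interpreter supports %f as described here. The subject strings are illustrative:

```lua
-- Frontier before the word (non-letter to letter) and after it (letter to non-letter)
local pattern = "%f[%a]log%f[%A]"

print (string.find ("this is a log file", pattern))   --> 11 13
print (string.find ("check the log", pattern))        --> 11 13 (works at the end of the string too)
print (string.find ("my blog entry", pattern))        --> nil
print (string.find ("logging in", pattern))           --> nil
```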
The first frontier ("%f[%a]") matches on the transition from not-letters to letters. The second frontier ("%f[%A]") matches on letters to not-letters. Effectively this gives you a word boundary match.
See Also ...
string.byte - Converts a character into its ASCII (decimal) equivalent
Lua base functions
(Help topic: lua=string.find)
Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.