Notice: Any messages purporting to come from this site telling you that your password has expired, or that you need to verify your details, confirm your email, resolve issues, making threats, or asking for money, are
spam. We do not email users with any such messages. If you have lost your password you can obtain a new one by using the
password reset link.
Due to spam on this forum, all posts now need moderator approval.
Entire forum
➜ MUSHclient
➜ General
➜ fun with regex
It is now over 60 days since the last post. This thread is closed.
Refresh page
Posted by
| Tsunami
USA (204 posts) Bio
|
Date
| Sat 27 Nov 2004 08:19 PM (UTC) |
Message
| I'm afraid this isn't exactly MC related, but I had some problems with a regex. It's purpose is to match on parts of a string where that string is to split. The string will be an equation, and I want to split it into it's component parts. For example "5+-2 / 7" should matched on the empty strings in between 5, +, and -2, and the spaces between -2, /, and 7. The end result of the split should be ["5","+","-2","/","7"]. The tricky part of course is the -2. I have a basic regex so far, which I will post as soon as I find it. I'm using back references to split before operators, but I am puzzeled as how to split after operators while taking into account a possible negative sign. Any help would be appreciated! Thanks, Tsunami. | Top |
|
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #1 on Sat 27 Nov 2004 10:19 PM (UTC) |
Message
| If I'm not mistaken, this kind of work is impossible to do correctly with regexes - you would need a parser to actually get this done properly. You could hack together a regex to get it working in most simple cases, but as you've already discovered you rapidly run into very serious problems. |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
Posted by
| Flannel
USA (1,230 posts) Bio
|
Date
| Reply #2 on Sat 27 Nov 2004 10:54 PM (UTC) Amended on Sat 27 Nov 2004 10:56 PM (UTC) by Flannel
|
Message
| Yeah, your best bet is to write your own routine.
Actually, Why do you need to do this? Are you trying to evaluate the string?
You can use Execute (in VB) to 'run' the string. Just tack on some assignment to the beginning. Im sure there are comperable things in other languages. Heck, I know TI calculators do. |
~Flannel
Messiah of Rose
Eternity's Trials.
Clones are people two. | Top |
|
Posted by
| Nick Gammon
Australia (23,140 posts) Bio
Forum Administrator |
Date
| Reply #3 on Sat 27 Nov 2004 11:07 PM (UTC) |
Message
| |
Posted by
| Tsunami
USA (204 posts) Bio
|
Date
| Reply #4 on Sun 28 Nov 2004 04:01 PM (UTC) |
Message
| I realize that there are parsers, I have already made one, I justed wanted this as an excersise for myself. I actually think it may be possible. Using one backreference lets you match before an operator, and a think using two nested forereferences will allow matching after a backreference. Then you just need to combine them with an or '|'. | Top |
|
Posted by
| Nick Gammon
Australia (23,140 posts) Bio
Forum Administrator |
Date
| Reply #5 on Sun 28 Nov 2004 06:58 PM (UTC) Amended on Sun 28 Nov 2004 07:03 PM (UTC) by Nick Gammon
|
Message
| Ah, I sense a challenge here. :)
Your first problem really is that a single regexp, executed once, will not return an indefinite number of matching patterns.
I assume therefore that you are prepared to execute the rexexp repeatedly on the same string, similar to the "repeat on same line" option in MUSHclient, where you keep executing the rexexp starting at the column where the previous execution left off.
To make this useful, I'll demonstrate it in Lua (however this is using the usual PCRE regexp), where you use a callback (function) for each execution, so you can process it.
rex.new ("([ ]{0,}[+\\-]{0,1}[0-9]{1,}[ ]{0,}[+\\-*/]{0,1})"):
gmatch ("5+-2 / 7",
function (m, t)
Note ("match = ", m)
end)
-- output:
match = 5+
match = -2 /
match = 7
What this is looking for is:
- Optional leading spaces
- Optional leading sign (one or zero of)
- One or more digits (0-9)
- Optional spaces after the digits
- Optional arithmetic operator (+-*/) (one or zero of)
As you can see from the output this appears to split the supplied text into batches of (signed) numbers followed by an operator.
For each batch you could then apply a different regexp to extract the operator (or simply remove trailing spaces and get the last character). |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Nick Gammon
Australia (23,140 posts) Bio
Forum Administrator |
Date
| Reply #6 on Sun 28 Nov 2004 07:02 PM (UTC) |
Message
| However if you don't want to process the string, but merely test it, you can probably do it with one regexp parse, by adding {0,} to the end of the one I did. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #7 on Sun 28 Nov 2004 08:55 PM (UTC) |
Message
| Well, if you actually think it's possible, let me know if it works out. I actually think it's impossible and I could probably prove it to you formally (as in, using the theory of computer science) - the regular expression language is limited and one has to move to the context free language most of the time to do interesting things.
A single reg-ex, for instance, cannot match indefinite and complex levels of nesting. You could probably run regular expressions multiple times to piece away the parentheses one nest-level at a time, but then technically you're not using a regular expression anymore.
Also, how is a regex going to split apart precedence? How is it going to know that 3*5-1/5 is actually (3*5)-(1/5)? I guess it all depends on what you're using this for - perhaps you don't need to know about precedence.
I'm half tempted to go prove it's impossible formally just for the challenge of it. In fact I'm pretty sure we proved things just like this last Spring in class...
But really, if you do figure it out, let me know. :-) |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
Posted by
| Nick Gammon
Australia (23,140 posts) Bio
Forum Administrator |
Date
| Reply #8 on Sun 28 Nov 2004 09:39 PM (UTC) |
Message
| Doesn't sound like he wants precedence, and the nesting might be done with a recursive regexp. Interesting to try ... |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Tsunami
USA (204 posts) Bio
|
Date
| Reply #9 on Mon 29 Nov 2004 06:56 PM (UTC) Amended on Tue 30 Nov 2004 12:51 PM (UTC) by Tsunami
|
Message
| You're right, I don't want precedence. That would just be a nightmare, not to mention impossible. I'm not using MC for this, so ignore making it global, just assume that will happen in any case.
I'm afraid you may have misunderstood me however. :P I do NOT want to match numbers or operators, I want to match that which is not a number or operator, because that is where the string will be split. If I use parens to demonstrate where I want to match:
5()+()-2( )/( )7
I think Nick did a good job matching the numbers, although I'm planning to include a possible decimal point in there (.?). It seems much harder to match non-operators or non-numbers though.
An initial regex I came up with (I apologize if some of the syntax in the assertions are wrong, I did this off the top of my head):
To match before operators:
(?<=\d|\.).{0,0}(?=\-|\+|\*|\/)
I'm not sure if the ".{0,0}" is necessary to match on nothing? I'm thinking two nested positive assertions would make sure that an operator is followed by a possible negative sign, in turn followed by a number or decimal point. Thanks!
P.S. Ksilyan, if you could prove that, I'd be very interested seeing your proof! | Top |
|
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #10 on Mon 29 Nov 2004 07:49 PM (UTC) |
Message
| The problem is one of nesting and context. Regular expressions have no way to express contextual importance.
For instance, can you write a regular expression that represents the language:
S -> aSa | B
B -> b
Or, in English:
pairs of 'a' on the outside, with finally a 'b' at the innermost level.
Regular expressions cannot express the context necessary to represent this "language". I hold that it is not possible to write a single regular expression that can yield this language.
I think that intuitively you see the problem with regular expressions and nesting, but when I have the time I'll rummage through my notes from last year a bit and see if I can find a more formal proof. I'm pretty sure we did one, on almost exactly this topic... |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).
To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.
26,531 views.
It is now over 60 days since the last post. This thread is closed.
Refresh page
top