[Home] [Downloads] [Search] [Help/forum]


Register forum user name Search FAQ

Gammon Forum

[Folder]  Entire forum
-> [Folder]  MUSHclient
. -> [Folder]  General
. . -> [Subject]  fun with regex

fun with regex

It is now over 60 days since the last post. This thread is closed.     [Refresh] Refresh page


Posted by Tsunami   USA  (204 posts)  [Biography] bio
Date Sat 27 Nov 2004 08:19 PM (UTC)
Message
I'm afraid this isn't exactly MC related, but I had some problems with a regex. It's purpose is to match on parts of a string where that string is to split. The string will be an equation, and I want to split it into it's component parts. For example "5+-2 / 7" should matched on the empty strings in between 5, +, and -2, and the spaces between -2, /, and 7. The end result of the split should be ["5","+","-2","/","7"]. The tricky part of course is the -2. I have a basic regex so far, which I will post as soon as I find it. I'm using back references to split before operators, but I am puzzeled as how to split after operators while taking into account a possible negative sign. Any help would be appreciated! Thanks, Tsunami.
[Go to top] top

Posted by David Haley   USA  (3,881 posts)  [Biography] bio
Date Reply #1 on Sat 27 Nov 2004 10:19 PM (UTC)
Message
If I'm not mistaken, this kind of work is impossible to do correctly with regexes - you would need a parser to actually get this done properly. You could hack together a regex to get it working in most simple cases, but as you've already discovered you rapidly run into very serious problems.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

Posted by Flannel   USA  (1,230 posts)  [Biography] bio
Date Reply #2 on Sat 27 Nov 2004 10:54 PM (UTC)

Amended on Sat 27 Nov 2004 10:56 PM (UTC) by Flannel

Message
Yeah, your best bet is to write your own routine.

Actually, Why do you need to do this? Are you trying to evaluate the string?
You can use Execute (in VB) to 'run' the string. Just tack on some assignment to the beginning. Im sure there are comperable things in other languages. Heck, I know TI calculators do.

~Flannel

Messiah of Rose
Eternity's Trials.

Clones are people two.
[Go to top] top

Posted by Nick Gammon   Australia  (22,975 posts)  [Biography] bio   Forum Administrator
Date Reply #3 on Sat 27 Nov 2004 11:07 PM (UTC)
Message
An example of a recursive-descent parser is in the expression evaluator I posted a little while back:

http://www.gammon.com.au/forum/bbshowpost.php?bbsubject_id=4649

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by Tsunami   USA  (204 posts)  [Biography] bio
Date Reply #4 on Sun 28 Nov 2004 04:01 PM (UTC)
Message
I realize that there are parsers, I have already made one, I justed wanted this as an excersise for myself. I actually think it may be possible. Using one backreference lets you match before an operator, and a think using two nested forereferences will allow matching after a backreference. Then you just need to combine them with an or '|'.
[Go to top] top

Posted by Nick Gammon   Australia  (22,975 posts)  [Biography] bio   Forum Administrator
Date Reply #5 on Sun 28 Nov 2004 06:58 PM (UTC)

Amended on Sun 28 Nov 2004 07:03 PM (UTC) by Nick Gammon

Message
Ah, I sense a challenge here. :)

Your first problem really is that a single regexp, executed once, will not return an indefinite number of matching patterns.

I assume therefore that you are prepared to execute the rexexp repeatedly on the same string, similar to the "repeat on same line" option in MUSHclient, where you keep executing the rexexp starting at the column where the previous execution left off.

To make this useful, I'll demonstrate it in Lua (however this is using the usual PCRE regexp), where you use a callback (function) for each execution, so you can process it.


rex.new ("([ ]{0,}[+\\-]{0,1}[0-9]{1,}[ ]{0,}[+\\-*/]{0,1})"):
   gmatch ("5+-2 / 7", 
       function (m, t)  
        Note ("match = ", m) 
       end)

-- output:

match = 5+
match = -2 /
match =  7


What this is looking for is:


  • Optional leading spaces

  • Optional leading sign (one or zero of)

  • One or more digits (0-9)

  • Optional spaces after the digits

  • Optional arithmetic operator (+-*/) (one or zero of)


As you can see from the output this appears to split the supplied text into batches of (signed) numbers followed by an operator.

For each batch you could then apply a different regexp to extract the operator (or simply remove trailing spaces and get the last character).

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by Nick Gammon   Australia  (22,975 posts)  [Biography] bio   Forum Administrator
Date Reply #6 on Sun 28 Nov 2004 07:02 PM (UTC)
Message
However if you don't want to process the string, but merely test it, you can probably do it with one regexp parse, by adding {0,} to the end of the one I did.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by David Haley   USA  (3,881 posts)  [Biography] bio
Date Reply #7 on Sun 28 Nov 2004 08:55 PM (UTC)
Message
Well, if you actually think it's possible, let me know if it works out. I actually think it's impossible and I could probably prove it to you formally (as in, using the theory of computer science) - the regular expression language is limited and one has to move to the context free language most of the time to do interesting things.

A single reg-ex, for instance, cannot match indefinite and complex levels of nesting. You could probably run regular expressions multiple times to piece away the parentheses one nest-level at a time, but then technically you're not using a regular expression anymore.

Also, how is a regex going to split apart precedence? How is it going to know that 3*5-1/5 is actually (3*5)-(1/5)? I guess it all depends on what you're using this for - perhaps you don't need to know about precedence.

I'm half tempted to go prove it's impossible formally just for the challenge of it. In fact I'm pretty sure we proved things just like this last Spring in class...

But really, if you do figure it out, let me know. :-)

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

Posted by Nick Gammon   Australia  (22,975 posts)  [Biography] bio   Forum Administrator
Date Reply #8 on Sun 28 Nov 2004 09:39 PM (UTC)
Message
Doesn't sound like he wants precedence, and the nesting might be done with a recursive regexp. Interesting to try ...

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by Tsunami   USA  (204 posts)  [Biography] bio
Date Reply #9 on Mon 29 Nov 2004 06:56 PM (UTC)

Amended on Tue 30 Nov 2004 12:51 PM (UTC) by Tsunami

Message
You're right, I don't want precedence. That would just be a nightmare, not to mention impossible. I'm not using MC for this, so ignore making it global, just assume that will happen in any case.

I'm afraid you may have misunderstood me however. :P I do NOT want to match numbers or operators, I want to match that which is not a number or operator, because that is where the string will be split. If I use parens to demonstrate where I want to match:

5()+()-2( )/( )7

I think Nick did a good job matching the numbers, although I'm planning to include a possible decimal point in there (.?). It seems much harder to match non-operators or non-numbers though.

An initial regex I came up with (I apologize if some of the syntax in the assertions are wrong, I did this off the top of my head):

To match before operators:
(?<=\d|\.).{0,0}(?=\-|\+|\*|\/)

I'm not sure if the ".{0,0}" is necessary to match on nothing? I'm thinking two nested positive assertions would make sure that an operator is followed by a possible negative sign, in turn followed by a number or decimal point. Thanks!

P.S. Ksilyan, if you could prove that, I'd be very interested seeing your proof!
[Go to top] top

Posted by David Haley   USA  (3,881 posts)  [Biography] bio
Date Reply #10 on Mon 29 Nov 2004 07:49 PM (UTC)
Message
The problem is one of nesting and context. Regular expressions have no way to express contextual importance.

For instance, can you write a regular expression that represents the language:

S -> aSa | B
B -> b

Or, in English:
pairs of 'a' on the outside, with finally a 'b' at the innermost level.

Regular expressions cannot express the context necessary to represent this "language". I hold that it is not possible to write a single regular expression that can yield this language.

I think that intuitively you see the problem with regular expressions and nesting, but when I have the time I'll rummage through my notes from last year a bit and see if I can find a more formal proof. I'm pretty sure we did one, on almost exactly this topic...

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.


22,362 views.

It is now over 60 days since the last post. This thread is closed.     [Refresh] Refresh page

Go to topic:           Search the forum


[Go to top] top

Quick links: MUSHclient. MUSHclient help. Forum shortcuts. Posting templates. Lua modules. Lua documentation.

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.

[Home]


Written by Nick Gammon - 5K   profile for Nick Gammon on Stack Exchange, a network of free, community-driven Q&A sites   Marriage equality

Comments to: Gammon Software support
[RH click to get RSS URL] Forum RSS feed ( https://gammon.com.au/rss/forum.xml )

[Best viewed with any browser - 2K]    [Hosted at HostDash]