regex again

cormullion · October 12, 2007, 01:21:56 PM

Struggling again with these. I'm getting confused as to how many of the backslashes should be escaped. I'm trying to convert some like these from Perl, inside quotes:

"^(([ ]{0,3}([*+-])[ \t]+)(?s:.+?) (\z|\n{2,}(?=\S)(?![ \t]*[*+-][ \t]+)))"

Is there a rough rule of thumb for converting Perl regexen to newLISP: what should be escaped and what shouldn't?

jrh · October 12, 2007, 02:09:34 PM

Quote from: "cormullion"
Is there a rough rule of thumb for when converting Perl regexen to newLISP

Yes, don't. Use awk. It may be wordy but at least it is comprehensible.

cormullion · October 12, 2007, 02:32:37 PM

What would be the benefit of using regexen via Awk rather than in PCRE via newLISP? Looks very much the same sort of thing to me...

jrh · October 12, 2007, 02:40:46 PM

With awk you can break it up into program steps. Better yet is to get rid of all those awful special characters in regular expressions. Here is a discussion of a possible LISPy way out of Perl/regex hell:

http://c2.com/cgi/wiki?AlternativesToRegularExpressions

cormullion · October 12, 2007, 03:08:15 PM

Hmm - some interesting musings there, but nothing practical or immediately useful. I'd rather struggle a bit with regular expressions, imperfect though they may be, than completely fail in an attempt to write a context-free grammar parser or whatever is needed.

HPW · October 12, 2007, 10:22:57 PM

You may have a look at the regex-coach:

http://weitz.de/regex-coach/

(Written in LispWorks Common Lisp)

m i c h a e l · October 13, 2007, 03:13:35 AM

Hi cormullion!

Since I know you're using a Mac and Hans-Peter's recommendation runs on Windows only, I'll mention my favorite regex helper: RegExhibit (http://homepage.mac.com/roger_jolly/software/index.html#regexhibit).

http://homepage.mac.com/roger_jolly/software/pics/RegExhibit_Screen.png

This is the fourth one I've tried so far, and I think this regex helper is head and shoulders above the others.

Also, if you will be doing a lot of regexing (or cssing, htmling, or even javascripting), I highly recommend The VisiBone Browser Book (http://www.visibone.com/products/browserbook.html). I used this heavily during the edit of the newLISP manual and while making the neglOOk website. When I think how useful it's been to me, I'm sorry I didn't mention it earlier!

m i c h a e l

Jeff · October 13, 2007, 05:42:54 AM

There are two-inch-thick books on the differences between the different regex flavors. PCRE is something of a standard, and that's the library newLISP uses. newLISP has better regex support than most other languages. The only thing it lacks is named groups, which I think were invented for Python and are not standard; but it does have recursion, I believe, using ?R syntax.

I usually put regular expressions between {} so I don't have to double-escape. Your expression should be able to be used verbatim between [text] tags (since it uses curly braces, curly braces would need to be escaped).
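A minimal sketch of the escaping difference described above, assuming standard newLISP string syntax: inside double quotes each regex backslash is written twice, while between {} the pattern reaches PCRE as typed. The symbol names, the short pattern, and the sample text are made up for illustration.

```newlisp
; Quoted string: newLISP processes escapes first, so every regex
; backslash has to be doubled.
(set 'quoted "^[ \\t]*[*+-][ \\t]+")

; Curly braces: no escape processing, so the pattern is written
; the same way it would appear in Perl.
(set 'braced {^[ \t]*[*+-][ \t]+})

; Compare the two spellings; they should be the same pattern string.
(println (= quoted braced))

; Try the pattern on a Markdown-style list line.
(println (regex braced "  * item one"))
```

Only the string delimiters differ between the two forms; the pattern handed to the regex engine is the same.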

cormullion · October 13, 2007, 07:36:28 AM

thanks - that's the sort of help i'd been hoping for!

rickyboy · October 13, 2007, 07:53:54 AM

Quote from: "cormullion"
Struggling again with these. I'm getting confused as to how many of the backslashes should be escaped. I'm trying to convert some like these from Perl, inside quotes:

"^(([ ]{0,3}([*+-])[ \t]+)(?s:.+?) (\z|\n{2,}(?=\S)(?![ \t]*[*+-][ \t]+)))"

Holy Schneikees! What is *that* supposed to do? :-)

Seriously, whenever I write monstrosities like that, either I put a massive amount of comments explaining what each piece does, or I break up the regex string into smaller strings and assign them to meaningful symbol names (if only for my own benefit when I look at the code more than a week later). But you probably got this from non-cormullion code, right?

Sorry I can't help -- I always have to look it up in a book. I was doing OK until I got to (?s:.+?). :-)
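The idea of splitting a long regex into named pieces could look roughly like this in newLISP. This is only a sketch: the symbol names and the sample text are hypothetical, the backslashes are written as they would appear in the Perl source, and the pattern is assumed to be a Markdown list matcher.

```newlisp
; Each piece of the pattern gets a descriptive name. Curly-brace
; strings are used so the backslashes need no doubling.
(set 'marker {[*+-]})                                  ; bullet: *, + or -
(set 'item-start (string {[ ]{0,3}} marker {[ \t]+}))  ; indent, bullet, gap
(set 'next-item  (string {[ \t]*} marker {[ \t]+}))    ; another list item

; A list ends at end of text, or at a blank line followed by
; non-space text that is not another list item.
(set 'list-end (string {\z|\n{2,}(?=\S)(?!} next-item {)}))

; Assemble the whole pattern. The literal space before the last group
; in the original is left out here, on the assumption that the Perl
; source ran under /x, where whitespace in the pattern is ignored.
(set 'list-pattern
     (string {^((} item-start {)(?s:.+?)(} list-end {))}))

(println list-pattern)
(println (regex list-pattern "* one\n* two\n\nnext paragraph"))
```

Printing the assembled pattern makes it easy to compare against the Perl original character by character.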

cormullion · October 13, 2007, 11:49:18 AM

Yes, horrible things. I think it was Markdown lists or headers I was trying to match. It's those positive and negative lookaheads that hurt my brain the most.

I had originally hoped to just copy these regexes without looking 'inside them', but they didn't work first time... And then you start tinkering with them, adding backslashes etc. and it all starts to come unstuck ;-(

cormullion · October 13, 2007, 11:53:42 AM

Quote from: "Jeff"
Your expression should be able to be used verbatim between [text] tags (since it uses curly braces, curly braces would need to be escaped).

It's possible that curly braces might work - the manual quoth "Balanced nested curly brackets may be used within a string. This aids in writing regular expressions or short sections of HTML."

I might investigate, when I'm feeling more optimistic about regexes...

Thanks Jeff!

HPW · October 24, 2007, 11:20:39 PM

Another WIN-only tool for regex:

http://www.regexbuddy.com

Not free, but affordable at 29.95 €.

Very easy to make and analyse a regex.

Also, its big brother PowerGREP is worth a look (but not cheap).

jrh · October 27, 2007, 08:23:25 AM

Quote from: "rickyboy"
Holy Schneikees! What is *that* supposed to do?

Only a nitwit would write something like that or spend time figuring out what it did. I must say that the excessive use of macros in LISP is another example of this syndrome. I don't care how rich he is, Paul Graham is dead wrong.

Clear, concise, and simple is what I strive for. Complexity for its own sake is horse shit.

HPW · October 27, 2007, 09:01:02 AM

Quote
Only a nitwit would write something like that or spend time figuring out what it did.

Paste it into RegexBuddy and let it explain it:

HTML:

http://hpwsoft.de/anmeldung/html1/newLISP/NewLispRegexPost.htm

Hardcopy:

http://hpwsoft.de/anmeldung/html1/newLISP/NewLispRegexPost.png

newLISP Fan Club
