regex-comp ?BUG?

Started by newdep, July 28, 2009, 10:13:40 AM


newdep

Using regex-comp under Windows (didn't test it under Unix).





Btw, I think the manual should provide a better example of how to use regex-comp. I only know how this works because I know it from Python.





This works ->


> (find-all {\d+} "1 2 3 4 numbers upto 5 6 7 8 in it." )
("1" "2" "3" "4" "5" "6" "7" "8")






This doesn't ->


(setq p1 (regex-comp {\d+} ))
"ERCP1000000000000040000000000000000d000000(000300000000000000000000000000P0005,06B000500"

> (find-all p1 "1 2 3 4 numbers upto 5 6 7 8 in it." 0x10000 )
()




And replace does work directly with a regex-comp pattern:


> (replace p1 "1 2 3 4 numbers upto 5 6 7 8 in it." "+" 0x10000 )
"+ + + + numbers upto + + + + in it."




To recap: either only replace works with a compiled pattern, or my example is wrong.

Anyway, it took me far too much time to get a simple find-all working with regex-comp. I actually got irritated by it, because I needed something quickly, and spending an hour on this isn't really quick. So, is it a bug?
-- (define? (Cornflakes))

Lutz

#1
Your syntax is not correct: find-all takes a processing expression before the option number:



http://www.newlisp.org/downloads/newlisp_manual.html#find-all


> (find-all p1 "1 2 3 4 numbers upto 5 6 7 8 in it." $0 0x10000 )
("1" "2" "3" "4" "5" "6" "7" "8")
>


Also, using a precompiled pattern in the above example is redundant, as newLISP automatically caches the compilation of the last pattern used.



http://www.newlisp.org/downloads/newlisp_manual.html#regex-comp
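To make the argument order concrete, here is a minimal sketch, assuming a newLISP 10.x build with PCRE as used in this thread. find-all only reads the option number from the fourth position, so a compiled pattern needs a processing expression such as $0 in the third slot. The variable names text, plain and compiled are just illustrative.

```newlisp
; Argument order: (find-all pattern text [exp [option]])
(set 'text "1 2 3 4 numbers upto 5 6 7 8 in it.")

; Source pattern: no expression or option needed
(set 'plain (find-all {\d+} text))

; Compiled pattern: $0 (the whole match) fills the expression slot,
; so 0x10000 ends up in the option slot where find-all looks for it
(set 'p1 (regex-comp {\d+}))
(set 'compiled (find-all p1 text $0 0x10000))

(println plain)     ; ("1" "2" "3" "4" "5" "6" "7" "8")
(println compiled)  ; ("1" "2" "3" "4" "5" "6" "7" "8")
```

In the original call, 0x10000 sat in the third position, so it was taken as the processing expression and the compiled string was presumably treated as an ordinary source pattern, which would explain the empty result.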

newdep

#2
..Yes.. I actually still don't get it ;-)



In the regex-comp part of the manual, the 0x10000 option is mentioned as the way to signal that the pattern is a compiled regex.



The normal (find-all {\d+} ...) does work without the $0.



In the find-all part of the manual, I still don't see any link between the use of $0 and the compiled regex...



If (find-all {\d+} "131numbers234") results in a list, then logically I don't have to think any further and would use



(find-all p1 "1234more2341" 0x10000)



or even



(find-all p1 "1234more2341")



to get it working with a compiled regex.



..And actually, right here, at the point where it stops being logical (in my case), is exactly where it matters most to the user.





I just think the regex-based functions, in their default setup, don't return results the way one would logically expect.



Working on complex functions automatically creates a mind-set of complex/simple results.



But a simple function calls for simple handling and a simple result.



Now the question is "what is logical" ;-) I'm just writing from the user's perspective.





edited ->



Btw.. even this works out of the box ->

> (setq p2 {\d+})

"\\d+"

> (find-all p2 "1 2 3 4 numbers upto 5 6 7 8 in it.")

("1" "2" "3" "4" "5" "6" "7" "8")



Whether p is a compiled regex or a string, it simply should not make a difference to newLISP.
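One way to get close to that today is a small wrapper. The sketch below is hypothetical (find-all-compiled is not a newLISP built-in): it supplies $0 as the processing expression so the caller only passes the compiled pattern and the text.

```newlisp
; Hypothetical helper, not part of newLISP: runs find-all with a
; pattern returned by regex-comp. $0 is passed as the processing
; expression so that 0x10000 lands in the option position.
(define (find-all-compiled pat text)
  (find-all pat text $0 0x10000))

(set 'p1 (regex-comp {\d+}))
(println (find-all-compiled p1 "1234more2341"))  ; ("1234" "2341")
```

The caller still has to know whether a pattern is compiled; the wrapper only hides the placeholder expression.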

Lutz

#3
'find-all' uses regular expressions by default, which makes sense, because a non-regex 'find-all' would just return the same element multiple times in a list.



So using a regex option is relatively rare. It is much more frequent that you want to process the found element, so the first optional parameter in 'find-all' is not a regex option number but a processing expression.



So the most frequent use, "just return the found results in a list", also has the simplest syntax pattern. The first option is then an expression to process each result.
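A short sketch of that processing expression in use (the input string is made up for illustration): $1 and $2 hold the parenthesized sub-matches, and whatever the expression returns is what ends up in the result list.

```newlisp
; Collect key/value pairs, converting each value to an integer.
; The expression (list $1 (int $2)) is evaluated once per match.
(set 'pairs (find-all {(\w+)=(\d+)} "a=1 b=22 c=333" (list $1 (int $2))))
(println pairs)  ; (("a" 1) ("b" 22) ("c" 333))
```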



I believe if you start using 'find-all' more often, you will agree ;-)

newdep

#4
Too bad newLISP isn't written in newLISP, otherwise I would have rebuilt this ;-)

But you leave me no other choice than fighting regex with regex.. Yes, I agree. I need to use find-all more, but only when I'm in need of finding it all more ;-)

newdep

#5
Lutz,



Wondering.. How easy is it to de-compile the regex-comp result?



If someone provided me a compiled regex for use in newLISP but I wanted to adjust it, then it would be useful to be able to de-compile it.

Or even just to know what's inside..



Because, for now, nobody can read this:
"ERCP16000000016000000000000000000000000000000(000300000000000000000000000000P00t1921"21t21h21i21s21 21i21s21 21a21 21h21i21d21d21e21n21 21m21e21s21s21a21g21e21 21n21o21t21 21b21e21i21n21g21 21d21e21-21c21o21m21p21i21l21e21d21 21f21o21r21 21n21o21w21"B00t00"







edited..



Actually, there is a way to decode this in newLISP with its own logic, but only for this regex-comp result... others I'm unable to decode ;-)



--> (find-all {[a-zA-Z' '"]+} secret) <--

Lutz

#6
Perhaps what is needed is a program translating a regex source pattern into plain English. Regex patterns on the net are all published as source, and they are hard enough to read already in source format.



There are really relatively few cases where you would want to pre-compile regex patterns. newLISP compiles patterns automatically and always caches the last one. So you need repeatedly alternating patterns to really take advantage of the 'regex-comp' function.
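A sketch of that alternating case, assuming two patterns applied in turn to every line (the data is illustrative). Since only the last pattern is cached, switching between two source patterns would recompile on every switch; compiling both once with 'regex-comp' and passing option 0x10000 avoids that.

```newlisp
; Compile both patterns once, up front
(set 'p-num  (regex-comp {\d+}))
(set 'p-word (regex-comp {[a-zA-Z]+}))

(set 'lines '("abc 12 de 345" "x1 y22"))

; For each line, alternate between the two compiled patterns
(set 'result
  (map (fn (s)
         (list (find-all p-num  s $0 0x10000)
               (find-all p-word s $0 0x10000)))
       lines))

(println result)  ; ((("12" "345") ("abc" "de")) (("1" "22") ("x" "y")))
```

Whether this is measurably faster depends on the patterns and the amount of text; for a handful of lines the difference is unlikely to matter.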



I wonder if "ERCP" in the compiled string means "PCRE" backwards? In the PCRE source this string doesn't occur once, and the function pcre_compile() is about 1200 lines of code (one! function).



I wouldn't worry too much about pre-compiling regex patterns, except if you have to address a specific performance bottleneck.

newdep

#7
Quote: "Perhaps what is needed is a program translating a regex source pattern into plain English"




..Perhaps not fully, but this is what I was trying to work out.

I want a more natural search inside my code, but there are different ways of actually doing this: using a 'define-macro or a FOOP-style class.

Then there is, of course, the content handling: yes, regex.





Using this has the advantage of not needing to eval the args,
..but that's also a disadvantage.

(define (return:return) (find-all (string (args 1) (args 0)) (args -1)))



Then there is the question of "readability"..




Is this more readable?

    (return only letters from line)

or this
   
    (return:letters line)


   



Now an example..






(setq line "This is a line with letters, digits 23141234 and numbers 12354125")
(setq machine "you have $10.00 in deposit")

(constant 'only "+")
(constant 'all only)
(constant 'numbers "[0-9]")
(constant 'letters "[a-zA-Z]")
; \$ and \. match a literal "$" and "."; a bare $ would be an end-of-line anchor
(constant 'money {\$[0-9.]+})

(define (return:return) (find-all (string (args 1) (args 0)) (args -1)))
(define (return:letters) (find-all {[a-zA-Z]+} (args -1)))
(define (return:numbers) (find-all {[0-9]+} (args -1)))
(define (return:money) (find-all {\$[0-9.]+} (args -1)))
(define (return:time)  (find-all {\d{1,2}:\d{1,2} | \d{1,2}:\d{1,2}:\d{1,2}} (args -1)))
(define (return:ip)    (find-all {\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}} (args -1)))


(return only letters from line)
> ("This" "is" "a" "line" "with" "letters" "digits" "and" "numbers")

(return all money from machine)
> ("$10.00")

(return:letters line)
>("This" "is" "a" "line" "with" "letters" "digits" "and" "numbers")

(return:money   machine)
>("$10.00")

(return:numbers {n0th1ng h3re})
>("0" "1" "3")

(return:time    {yesterday at 10:00 AM or 22:00:23})
>("10:00 " " 22:00:23")

(return:ip      {Elvis did use 172.172.172.172 But Joe used 1.1.1.1})
>("172.172.172.172" "1.1.1.1")
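For comparison, here is a hedged variant of the helpers above that compiles each pattern only once with regex-comp. The names return-patterns and return-compiled are hypothetical, and the money pattern assumes a literal "$" followed by digits and dots.

```newlisp
; Hypothetical table: symbol -> pattern compiled once with regex-comp
(set 'return-patterns
  (list (list 'letters (regex-comp {[a-zA-Z]+}))
        (list 'numbers (regex-comp {[0-9]+}))
        (list 'money   (regex-comp {\$[0-9.]+}))))

; lookup returns the last element of the matching association,
; i.e. the compiled pattern; $0 fills the expression slot so that
; 0x10000 is read as the option
(define (return-compiled kind text)
  (find-all (lookup kind return-patterns) text $0 0x10000))

(println (return-compiled 'money "you have $10.00 in deposit"))  ; ("$10.00")
(println (return-compiled 'numbers "n0th1ng h3re"))              ; ("0" "1" "3")
```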
