regex

Started by Sammo, May 24, 2005, 03:14:04 PM

Sammo

This



(replace "(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))" "1234567" "," 0)



returns



1,234567



instead of



1,234,567



The expression does the right thing (i.e., returns 1,234,567) at this test site:



http://www.fileformat.info/tool/regex.htm



Looks like (replace ...) is replacing only the first occurrence instead of all occurrences in v.8.5.9.



Thanks,

-- Sam
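
For readers who want to cross-check the pattern outside newLISP, here is a minimal sketch in Python. The `re` module is only a stand-in for the PCRE engine that newLISP uses, and the helper name `group_digits` is made up for this illustration. The pattern itself is copied unchanged from the post above.

```python
import re

# Pattern copied from the post: a position that has a digit behind it and
# a whole number of three-digit groups ahead of it, up to the last digit.
THOUSANDS = re.compile(r"(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))")

def group_digits(number_text, separator=","):
    """Insert `separator` at every zero-length match of THOUSANDS."""
    return THOUSANDS.sub(separator, number_text)

print(group_digits("1234567"))  # 1,234,567
```

Every match of this pattern is zero-length, so a replace-all routine has to keep scanning after an empty match to reach the second insertion point.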

Lutz

#1
This happens when the first character sequence matched is of 0 (zero) length. This is fixed in 8.5.10.



As this is a very rare instance, I want to wait until the weekend before posting a fixed version (the problem is already fixed).



If this fix is urgent for you or anybody else, please let me know and I can post it right away.



Lutz
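
To illustrate why zero-length matches need special care in a replace-all loop, here is a rough sketch in Python. It is not the newLISP source and makes no claim about how the 8.5.10 fix is implemented. It shows one common approach, and the function name `replace_all` is hypothetical. The key step is that after an empty match the loop copies one character and moves forward by one position, so it neither stalls nor stops early.

```python
import re

def replace_all(pattern, text, replacement):
    """Replace every match of `pattern` in `text`, including empty matches."""
    regex = re.compile(pattern)
    out = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        out.append(text[pos:m.start()])   # unmatched text before the match
        out.append(replacement)
        if m.end() == m.start():
            # Zero-length match: copy the next character (if any) and step
            # past it, otherwise the same empty match would be found again.
            if m.start() < len(text):
                out.append(text[m.start()])
            pos = m.start() + 1
        else:
            pos = m.end()
    out.append(text[pos:])                # remaining tail, possibly empty
    return "".join(out)

print(replace_all(r"(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))", "1234567", ","))
```

Under these assumptions the call prints 1,234,567, the result expected in the original report.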

Sammo

#2
Hi Lutz,



A fix isn't urgently needed. I can wait.



Thanks for looking into it.

-- Sam

Lutz

#3
Thanks for catching this, Sam. Regular expressions are a feature we have to rely on 100%.



Lutz

Sammo

#4
Thank you, Lutz, for fixing the problem with 'replace' in 8.5.10. It does, indeed, work correctly now.

-- Sam

Lutz

#5
Yes, this bug affected all zero-length boundary patterns using "", "^", "$" and "\b"; it is a wonder that nobody tripped over it before you. I have bookmarked that test site, but I am still looking for a longer test suite, not for pattern searching itself, which is all PCRE, but for repetitive replacement, which is coded inside newLISP. Your number formatting pattern had all the critical elements in it. I also found the following patterns, which are now included in the 'qa_dot' test suite (dot for decimal point versus comma):



; must all evaluate to true

(= (replace "" "abc" "x" 0) "xaxbxcx")
(= (replace "$" "abc" "x" 0) "abcx")
(= (replace "^" "abc" "x" 0) "xabc")
(= (replace "\b" "abc" "x" 0) "xabcx")


Most of these even have some practical usage.



Lutz
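
As a cross-check against a different engine, the four zero-length patterns above can be run through Python's `re.sub`. This is only a sketch: Python's `re` is not PCRE, and the names `CASES` and `run_cases` are invented for this example. The expected strings are the ones given in the checks above.

```python
import re

# (pattern, expected result) pairs mirroring the four checks in the post,
# each applied to "abc" with "x" as the replacement.
CASES = [
    (r"", "xaxbxcx"),
    (r"$", "abcx"),
    (r"^", "xabc"),
    (r"\b", "xabcx"),
]

def run_cases(text="abc", replacement="x"):
    """Return (pattern, actual, expected) for every case."""
    return [(p, re.sub(p, replacement, text), expected) for p, expected in CASES]

for pattern, got, expected in run_cases():
    print(repr(pattern), got, got == expected)
```

With Python 3.7 or later, all four lines should end in True.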