replace bug?

Started by rrq, August 20, 2014, 05:37:59 AM

rrq

I ran into the following, which seems like a bug in the regex pattern handling, illustrated in the following example:
> (map char (explode (replace "[‘’]" "‘" "x" 0)))
(120 120 120)
> (map char (explode (replace "‘" "‘" "x" 0)))
(120)
> (map char (explode (replace "[‘’]" "‘" "x" 2048)))
(120)


Thus, when the pattern is within brackets, the replacement for character U+2018 (decimal 8216) gets applied to each of its three source bytes, whereas without the brackets the "proper" single replacement occurs. The replacement is also proper with the flags code 2048 rather than 0.



newLISP v.10.6.0 32-bit on Linux IPv4/6 UTF-8 libffi.

Lutz

#1
The behavior is correct. When using UTF-8 characters in PCRE character classes without specifying the UTF-8 option (either 2048, or the letter "u" in version 10.6.1), each byte of a UTF-8 multibyte character matched by the character class is replaced separately. Character classes are taken byte-wise unless UTF-8 mode is specified.



http://www.newlisp.org/downloads/pcrepattern.html#SEC7
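
To make the byte-wise reading easier to see, here is a minimal sketch, assuming a UTF-8 enabled newLISP build like the one in the original post. The symbol names lq, rq and cls are made up for illustration, and the quotes are built with char so the example does not depend on how the source file is encoded.

```newlisp
; Sketch only: assumes a UTF-8 enabled newLISP build.
; lq, rq and cls are illustrative names, not from the thread.
(set 'lq (char 8216))              ; U+2018, a multibyte character in UTF-8
(set 'rq (char 8217))              ; U+2019
(set 'cls (string "[" lq rq "]"))  ; character class holding both quotes

(println (length lq))              ; length counts bytes
(println (utf8len lq))             ; utf8len counts characters

; Option 0: the class is taken byte-wise, so each byte of lq matches on its own.
(println (replace cls (copy lq) "x" 0))

; Option 2048 (UTF-8 mode): the class holds two characters, so lq matches once.
(println (replace cls (copy lq) "x" 2048))
```

Going by the results posted above, the first replace should print three x characters and the second a single x.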

rrq

#2
Ah. Yes, of course!



And it is a bit much to expect newLISP mode in Emacs to know and show this difference, so the dumbbell at the keyboard can go on thinking about nothing...