(replace) doesn't seem to work with "^".

Started by gregben, October 23, 2004, 06:20:24 PM


gregben

I'm trying to replace the initial character of a string

(actually, I'm trying to do something more complex, but this is the gist of it).

I tried this under newLISP 8.2.0 (Sparc Solaris 8) and got the following:



> (replace "a" "aaa" (upper-case $0) 0)

"AAA"

> (replace "^a" "aaa" (upper-case $0) 0)

"AAA"

> (replace "a$" "aaa" (upper-case $0) 0)

"aaA"

> (replace "^a" "aaa" (upper-case $0) 0)

"AAA"

> (replace "xa" "aaa" (upper-case $0) 0)

"aaa"



Note that I expected

(replace "^a" "aaa" (upper-case $0) 0) to yield "Aaa", not "AAA".

By contrast, "a$" works as expected.

I put in "^a" just for grins, but I expected "aaa" from it.



I suspect this is my fault, but perhaps it is a bug in pcre.c

or newLisp?

Lutz

#1
The newLISP function 'replace' will replace _all_ occurrences of the pattern found. To find just the first match, use:



(regex "a" "aaa")



which will give you just the first occurrence:



("a" 0 1) ; offset 0 length 1



Lutz

Lutz

#2
(replace "^a" "aaa" (upper-case $0) 0)



With this pattern, 'replace' first matches the leading "a", then takes the remaining "aa" and matches again, then takes the remaining "a" and matches again.



So 'replace' replaces, then takes the rest of the string and repeats the whole procedure. Most of the time this behaviour is OK, for example when the input looks like this:



(replace "^a" "axaa" (upper-case $0) 0)



"Axaa"



But it fails when there are multiple matches one directly after the other. Unless newLISP were to start parsing the pattern itself, the only workaround I see at the moment is to use 'regex' or 'find', catch the first position/length, and then do only one replace.
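The 'regex' workaround could be sketched roughly as follows. This is only an illustration: replace-first is a hypothetical helper name, not a built-in, and it assumes 'regex' returns the matched string, offset and length in the ("a" 0 1) form shown earlier, or nil when nothing matches.

```newlisp
; replace-first is a hypothetical helper, not part of newLISP.
; It replaces only the first match of pattern in str with (func match).
(define (replace-first pattern str func)
  (let ((m (regex pattern str)))
    (if m
      (append (slice str 0 (nth 1 m))                ; text before the match
              (func (nth 0 m))                       ; transformed match
              (slice str (+ (nth 1 m) (nth 2 m))))   ; text after the match
      str)))                                         ; no match: return str unchanged

(replace-first "^a" "aaa" upper-case)   ; => "Aaa"
```

Because the pattern is applied to the whole string exactly once, "^" can only match at the real start of the string.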



Perhaps we need an additional bit in the option flag to force 'only one replace'.



Lutz

Lutz

#3
and this would also work:



(replace "(a)(.*)" "aaa" (append (upper-case $1) $2) 0)



=> "Aaa"



because the search pattern is 'greedy' and 'eats' the whole string.



Lutz

gregben

#4
Try this in perl:



#!/usr/bin/perl



$a = "aaa";

$a =~ s/^\w/x/;

print "\$a=$a\n";



You will get $a="xaa". This is correct in Perl,

and is what I want newLisp to do.



To anyone who's interested, the pattern

/^\w/ means match any single "word" character

at the beginning of the string. This is the correct

function of the "^" character.

Lutz

#5
Yes, I understand. newLISP tries to do multiple replaces: it replaces the first "a", then tries again with the remaining string "aa", then with "a", as outlined in the previous posts. To do the replace only once, use the workarounds described.



And I can offer to add a flag in the 'option' field to tell 'replace' to do only one replace in the next version. There are still several bits left in the options flag (see the manual for regex and replace).



Lutz

gregben

#6
Thanks, Lutz for all your feedback. :-)



Here's a little function I wrote that is sort of useful

without using regex or replace.



#!/usr/bin/newlisp



(define (initial-capital str)

  (nth-set 0 str (upper-case (nth 0 str)))

  str

)





(println (initial-capital "abc"))

(exit)

Lutz

#7
Hi Greg,



I just finished the additional option for 'replace':



(replace "a" "aaa" (upper-case $0) 0x8000) => "Aaa"



(replace "^\\w" "aaa" (upper-case $0) 0x8000) => "Aaa"



(replace {^\w} "aaa" (upper-case $0) 0x8000) => "Aaa"



Of course, it can be combined with any other option for regex.
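As a rough sketch of such a combination, the flags can be OR-ed together. This assumes that option value 1 selects case-insensitive matching (PCRE_CASELESS), as listed in the manual's option table for regex; check the manual for your version before relying on it.

```newlisp
; 1      = case-insensitive matching (assumed from the regex option table)
; 0x8000 = replace only once (the new flag described above)
(replace "^A" "aaa" (upper-case $0) (| 1 0x8000))
```

If the flags combine as described, only the leading character should be upper-cased, even though the pattern is written with a capital "A".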



Lutz