replace: does it not update the $ variables?

Started by cormullion, October 18, 2007, 10:13:35 AM


cormullion

More confusion here at Cormullion Towers...   Take the following code (please :-!), which distills my possible misunderstanding of the correct operation of replace:


(dotimes (x 10)
 (set 't {this is some text I think})
 (replace {(th).*(so).*(xt).*(nk)} t (for (n 1 15) (if ($ n) (print { $} n { } ($ n)))) 0)
 (println)
 (set 't {that was some text for sure})
 (replace {(th).*(so).*(xt).*} t (for (n 1 15) (if ($ n) (print { $} n { } ($ n)))) 0)
 (println)
)

 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk
 $1 th $2 so $3 xt $4 nk


Now, the second replace operation only has three capture groups, so I would have expected the fourth match variable ($4) to be empty afterwards. But it remembers the value from the previous replace. So after the second replace, $4 is still 'nk', even though 'nk' doesn't appear anywhere in the second string.



What am I doing wrong this time? -)

m i c h a e l

#1
Hi Cormullion!



The $n globals are not reset to nil each time regex or replace is used. Simply set the $n variables to nil before calling the next function that sets them. When I needed this, I just defined a function that sets all the $n variables to nil for me.
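
A minimal sketch of what such a helper might look like. The name clear-globals and the body are assumptions for illustration, not michael's actual code, and it assumes $0 to $15 can be assigned with set as the post describes:

```newlisp
; Hypothetical helper: reset the regex system variables $0..$15 to nil.
(define (clear-globals)
  (set '$0 nil '$1 nil '$2 nil '$3 nil
       '$4 nil '$5 nil '$6 nil '$7 nil
       '$8 nil '$9 nil '$10 nil '$11 nil
       '$12 nil '$13 nil '$14 nil '$15 nil))

(set 't {this is some text I think})
(replace {(th).*(so).*(xt).*(nk)} t "" 0)  ; four groups: sets $1..$4

(clear-globals)                            ; forget the previous match

(set 't {that was some text for sure})
(replace {(th).*(so).*(xt).*} t "" 0)      ; three groups: sets $1..$3
(println ($ 4))                            ; should now be nil, not the stale "nk"
```

Spelling out each symbol avoids building the names at run time, so the helper should behave the same whichever context it is called from.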



m i c h a e l

cormullion

#2
Thanks for that, michael, I'm glad you see what I mean! The workaround will be useful too.



But it seems to me that the current behaviour is not right. I don't have any hard evidence to say why I think each replace op should reset the $ variables that don't match, I just think I'd expect it to. If there was no match for $4, then $4 should be empty. I suppose I feel that system variables ought to be bang up to date...



What do other languages do in this respect?

Jeff

#3
Perl works the same way.  In most other languages, external variables are not set by regex operations.  For example, in Python, back-references are only available inside the re module's own calls, such as the replacement string passed to sub:


import re

regexp = re.compile("(foo)bar")
text = "foobar"
# sub needs both the replacement and the input; \1 refers to the first group
new_str = regexp.sub(r"\1", text) # <-- "foo"
Jeff

=====

Old programmers don't die. They just parse on...

Artful code: http://artfulcode.net

Lutz

#4
Quote: I suppose I feel that system variables ought to be bang up to date...


The performance hit on 'replace' is too big when always resetting all system variables to nil. In the rare case where this problem occurs, Michael's solution is the best.



Lutz

Jeff

#5
Plus, if you are matching the same expression against a large number of strings, you don't have to store a "last match".
Jeff

cormullion

#6
OK, thanks! I'm wiser... :-)



Ironically it was a translation of some Perl code and the subsequent puzzling over the differences that made me raise the question originally. I probably didn't understand the Perl original very well - perhaps there was some phraseology that resets the $ thingies...



I'll use michael's workaround.



Thanks!



PS: Why does nearly everything in Perl start with dollar signs...?



PPS: Don't answer that question... I really don't want to know the answer! (and I could look it up if I did :-)

Jeff

#7
The dollar sign is one of a number of glyphs that describe the context in which a variable is accessed.  The same variable name can refer to several different data types in Perl.  The glyph (e.g. $, @, or %) tells the interpreter to access the variable as a scalar, array, or hash.  When you access an element of an array or hash, you use the scalar $ because the value attached to the symbol is scalar.  When you access the array or hash as a whole, you use @ or %, because the symbol is pointing to the array or hash data type.



The global "magic" symbols point to symbols used by Perl to describe the environment.  Perl began as a replacement for awk, which really had no concept of scope; it was a text processing tool.  Perl uses things like $_ and $/, et al., so the interpreter doesn't pollute the global namespace or overwrite variables that the programmer creates (like __foo__ methods in Python or __methods() in PHP).
Jeff

cormullion

#8
No, hang on just a sec... Where do I reset the $ variables? It's got to happen between each replacement, hasn't it?


(replace
  {(t.*?s).*(x.)}
  {this is some text
  that was some terrific text
  this is some text}
  (println { $1 } $1 { $2 } $2 { $3 } $3 { $4 } $4)
  0
)

m i c h a e l

#9
With a function defined, say clear-globals, simply call the function before calling one of the global-setting functions where this behavior is desired. The pattern usually goes like this:


   ...
   (replace ...)
   (clear-globals)
   (replace ...)
   (clear-globals)
   (replace ...)
   ...


At least it was in my circumstances ;-)



m i c h a e l