replace

Started by eddier, August 15, 2003, 12:45:56 PM


eddier

In what variable does replace return the part of the match captured by "()" in the regular expression?



I want to do something like the following



(replace {(%[0-9A-F][0-9A-F])} request (char (integer (append "0x" (trim $1 "%")))))



where $1 is the match inside the {()}.



Or do you know a neat way of converting the %hex values in a CGI request to ASCII?



Eddie

eddier

#1
This function works but is brute force! Not very pretty!



(define (replace-hex-codes w)
  (join (map (lambda (x) (append (char (integer (append "0x" (slice x 0 2)))) (slice x 2))) (rest (parse text "%")))))



Eddie

eddier

#2
The argument "w" should be "text" in the function above.

Lutz

#3
First of all, don't forget that 'replace' on strings only works in regular expression mode if you add the last options parameter, e.g.:



(set 'str "abydxfg")

(replace "x|y"  str "e") ;; will not work

(replace "x|y" str "e" 0) ;; will work nicely

(replace "x|y" str "e" 1) ;; will work case insensitive



Replace with regex subexpressions is not yet implemented, and I hope I will get around to doing it one day. Unfortunately a 'replace' mode does not seem to be part of the PCRE package I am using to build newLISP regular expressions.



But you could use 'regex'  to hack something together, because 'regex' returns parenthesized subexpressions:



(set 'str "abcdxfg")

(regex "(..)x" str) -> ("cdx" 2 3 "cd" 2 2)



Now you could use 'slice', 'nth' and 'concat' to compose your new string. But I am not sure that in the case of URL hex translation it will be smaller or faster than your solution, which looks quite elegant (lisp'ish) to me.



There is also a function for hex-code translation in cgi.lsp of the distribution, but your solution seems to be a lot shorter already.



Lutz

eddier

#4
I thought about a hack on regex but it isn't possible without recursing over a string because regex doesn't have a global option.
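To make the idea of walking over the string with a non-global matcher concrete, here is a minimal sketch. It is written in Python rather than newLISP, purely so that it is easy to run, and it uses a loop instead of recursion. The function name `decode_by_repeated_search` is made up for illustration, and accepting lowercase hex digits is an assumption that goes beyond the `[0-9A-F]` pattern above.

```python
import re

# Assumption: lowercase hex digits are accepted as well as uppercase.
HEX_CODE = re.compile(r"%([0-9A-Fa-f]{2})")

def decode_by_repeated_search(text):
    """Decode %XX codes using only a 'find the next match' primitive."""
    pieces = []
    pos = 0
    while True:
        m = HEX_CODE.search(text, pos)   # one match at a time, no global option
        if m is None:
            pieces.append(text[pos:])    # no more codes: keep the tail as-is
            break
        pieces.append(text[pos:m.start()])        # text before the code
        pieces.append(chr(int(m.group(1), 16)))   # the decoded character
        pos = m.end()                    # continue after this code
    return "".join(pieces)
```

Each pass consumes one match and resumes the search just past it, which is the same work a recursive version would do on the remainder of the string.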



I had one problem in the code above: the string has to start with a hex code. The version below prepends a fake code "%00", and the 'rest' on the parse skips it.



(define (replace-hex-codes w)
  (join
    (map
      (lambda (x)
        (append (char (integer (append "0x" (slice x 0 2)))) (slice x 2)))
      (rest (parse (append "%00" w) "%")))))


This seems to work in all cases.



The next function below always works. I haven't profiled the functions yet to see which is better.



(define (replace-hex-codes w)
  (let ((s (match "*%??*" w)))
    (if s
      (append (first s)
              (char (integer (append "0x" (trim (nth 1 s) "%"))))
              (replace-hex-codes (last s)))
      w)))


Why does match with "*%??*" work when match with "*%*" doesn't?



Eddie

Lutz

#5
In 'match', strings are broken at the stars '*':



(match "*%??*" "abc%40def") =>  ("abc" "%40" "def")



The '?' means: any one character. But:



(match "*%*" "abc%40def") => ("abc" "%" "40def")



You could also use '#', which stands for any one digit:



(match "*%##*" "abc%40def") =>  ("abc" "%40" "def")



Lutz

eddier

#6
Thanks for the explanation.



I needed the "?"s because of the hex values "A-F."



Eddie

Lutz

#7
I suspect the first of the last two URL-hex translations you posted will be the faster one, but I haven't measured it.



I am pretty much done with everything for the next release, 7.1. I just finished a Bayesian spam detector in newLISP, which I want to slip into the next release. Less than 300 lines of newLISP code, *real* Bayesian, no shortcuts, and running on the nuevatec mail server right now. It is a nice workout for the PCRE functions in newLISP and a good demo of how newLISP can handle big symbol spaces. The database of words/tokens is kept in a newLISP context as symbols. I also updated to the latest 4.3 version of PCRE and things seem to work fine.



I would love to have regex replace with subexpressions in newLISP, but no easy solution for this seems to be available. The PCRE code doesn't offer it. Any ideas, anybody?



Lutz

eddier

#8
I'm using the first one with a

(replace "+" w " ") on the first line to kill the "+"s.



I know that it is possible with PCRE because Python uses this library and supports regular expression substitution. You can also use an external function for the replacement. Like you say, it may not be easy to implement.
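For reference, this is roughly what that looks like on the Python side: a minimal sketch using the standard `re` module, where `re.sub` calls a function for every match and that function receives the captured subexpression. The name `decode_query` is made up for illustration, and accepting lowercase hex digits is an assumption.

```python
import re

def decode_query(text):
    """Turn '+' into spaces, then decode each %XX code via a callback."""
    text = text.replace("+", " ")
    # The lambda is the "external function": it gets the match object and
    # returns the replacement text for that one match.
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)),
                  text)

print(decode_query("name=John+Doe%21"))
```

The "+" replacement has to run before the hex decoding, otherwise a literal plus sign encoded as %2B would be turned into a space. For real CGI work Python's standard library also has `urllib.parse.unquote_plus`, which does the same job.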



I have just finished a new College calendar. I had constructed one a year or so ago for just faculty and staff but they wanted one to be displayed from the College's front page.



It is composed of 4 newLISP programs.



c.cgi visible to the world at http://www.bmc.edu (click the calendar link). (Carla hasn't had time to add the events yet. Meryl Lee hasn't had time to add all the academic events either.)

a.cgi to add academic events

e.cgi to add monthly events

p.cgi to view a month and year in printable form



I'm the new webmaster of the College and have been asked to do a major makeover of the site. Unfortunately, I have to implement a committee's design which I think is horrible.



Eddie

Lutz

#9
Just visited the site with your calendar, very nice!



Yes, Python uses PCRE and does regex replace, and so does PHP. I guess I have to sit down and do it too. Once you are into using regular expressions it feels like an essential thing. Perhaps I will find some time ...



Lutz

eddier

#10
Maybe I should have started a new thread.



If you would like the calendar cgi programs I could e-mail them. Maybe some of the definitions would be useful. I'm sure there are some things that could have been coded a lot better, since some of it was a hack due to the rush to get the calendar in place. Some of the definitions are used in all four programs and could have been separated out in a separate file and loaded in during a run. I had my reasons for not separating the common definitions out.



I'm excusing myself of any warranty or liabilities of any kind, though, if others want to change or adapt the code to their own use.



Eddie