I have a series of lists which contain something like:
"Telef&#243;nica to Sell Spanish Assets"
I want to be able to quickly change all these codes to the characters they stand for.
(set 'x "Telef&#243;nica to Sell Spanish Assets")
(find {&#(.*);} x 0) ; reference only
(regex {&#(.*);} x)
; > ("&#243;" 5 6 "243" 7 3)
$1
; > "243"
(char (int $1))
; > "ó"
My question:
How can I do the above in one quick pass, where I get all the {&#(.*);} and replace them with (char (int $1)) ... without doing a loop through everything over and over until the (regex) finds nothing?
I feel like I am missing some (map) or (replace) iterative operator.
(setq text "Telef&#243;nica to Sell Spanish Assets")
; regex flag = 0
(replace "&#.*?;" text (char (int (2 -1 $it))) 0)
(println text)
;-> Telefónica to Sell Spanish Assets
(exit)
-- xytroxon
Quote from: "kanen"
I feel like I am missing some (map) or (replace) iterative operator.
In other words, $it is the iterative operator I was missing. :)
d'oh!
Quote from: "xytroxon"
(setq text "Telef&#243;nica to Sell Spanish Assets")
; regex flag = 0
(replace "&#.*?;" text (char (int (2 -1 $it))) 0)
(println text)
;-> Telefónica to Sell Spanish Assets
(exit)
-- xytroxon
Quote
In other words, $it is the iterative operator I was missing. :)
No, this one would work too:
> (set 'x "Telef&#243;nica to Sell Spanish Assets")
"Telef&#243;nica to Sell Spanish Assets"
> (replace {&#(\d+);} x (char (int $1)) 0)
"Telefónica to Sell Spanish Assets"
>
'$it' is just a replacement for '$0', 'replace' always iterates through all occurrences found:
http://www.newlisp.org/downloads/newlisp_manual.html#replace
and here about the anaphoric '$it':
http://www.newlisp.org/downloads/newlisp_manual.html#system_symbols
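To illustrate the point that 'replace' walks through every match by itself, here is a minimal sketch with two entities in one string. The sample string is made up for the example, and the extra arguments to 'int' (default 0, base 10) are my own addition so that a code with a leading zero is not read as octal. It assumes a UTF-8 enabled build of newLISP, where 'char' turns a code point into a UTF-8 string.

```newlisp
; Hypothetical sample input with two decimal entities.
(set 'x "Telef&#243;nica and Espa&#241;a")

; One call is enough: with a regex option (0 here) replace visits
; every match, and $1 holds the captured digits for each of them.
(replace {&#(\d+);} x (char (int $1 0 10)) 0)

(println x)
```

On a UTF-8 build both entities should come out decoded after this single call, with no explicit loop around 'regex'.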
It's a little more complicated than what I posted from memory... Handling all the HTML special entity codes in HTML or RSS docs is a pain!
Besides decimal codes, there are...
hexadecimal codes -> &#x00; ... &#xFF; ... &0x150; etc...
common codes -> &amp; &lt; &gt; etc...
foreign language -> &iexcl; &iquest; &euro; etc...
http://tlt.its.psu.edu/suggestions/international/web/codehtml.html
http://webdesign.about.com/library/bl_htmlcodes.htm
A more general solution:
(define (HTML-special-chars str)
; code here
)
; the pattern has no capture group, so pass the whole match ($it) instead of $1
(replace "&.*?;" text (HTML-special-chars $it) 0)
And I've seen &amp; in RSS docs, requiring a second pass...
-- xytroxon
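For what it's worth, the stub above could be filled in roughly like this. This is only a sketch, not the code from the post: the 'named-entities' table is a hypothetical, deliberately tiny lookup list, the tighter pattern {&#?\w+;} is my own choice, and a UTF-8 enabled build of newLISP is assumed so that 'char' returns multi-byte characters. Anything the function does not recognize is handed back unchanged.

```newlisp
; Hypothetical table of named entities; a real one would be much longer.
(set 'named-entities (list
  '("amp" "&") '("lt" "<") '("gt" ">") '("quot" "\"")
  (list "iexcl" (char 161))
  (list "iquest" (char 191))
  (list "euro" (char 8364))))

; Takes one whole entity, e.g. "&#243;", "&#xF3;" or "&amp;", and
; returns its replacement, or the entity itself if it is not recognized.
(define (HTML-special-chars str)
  (let (ent (1 -1 str) n nil)          ; drop the leading & and trailing ;
    (cond
      ((or (starts-with ent "#x") (starts-with ent "#X"))
        (set 'n (int (2 ent) nil 16))  ; hexadecimal code, nil if invalid
        (if n (char n) str))
      ((starts-with ent "#")
        (set 'n (int (1 ent) nil 10))  ; decimal code, nil if invalid
        (if n (char n) str))
      (true
        (or (lookup ent named-entities) str)))))

; Usage: every entity is passed to the function as $it.
(set 'text "Telef&#243;nica &amp; Caf&#xE9; &bogus;")
(replace {&#?\w+;} text (HTML-special-chars $it) 0)
(println text)
```

Since 'replace' does not rescan text it has already substituted, double-encoded input such as "&amp;#243;" would still need a second call, which matches the second pass mentioned above.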