URL-encoding Chinese characters raises an error

Started by xmftlg, February 21, 2013, 06:54:55 AM


xmftlg

newLISP v.10.4.5 on Win32 IPv4/6 UTF-8 libffi, execute 'newlisp -h' for more info.



(define (url-encode str)
  (replace {([^a-zA-Z0-9])} str (format "%%%2X" (char $1)) 0))



(url-encode "倒")



ERR: invalid UTF8 string in function char



Need help.

Lutz

#1
You have to set the UTF-8 option for regular expressions:



(define (url-encode str)
    (replace {([^a-zA-Z0-9])} str (format "%%%2X" (char $1)) 2048))

> (url-encode "爱")
"%7231"
> $1
"爱"
> (char 0x7231)
"爱"


See here for all options: http://www.newlisp.org/downloads/newlisp_manual.html#regex
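
A small sketch of why the first version fails, assuming a UTF-8 enabled newLISP like the one in the banner above. Without option 2048 (PCRE_UTF8) the pattern is matched byte by byte, so $1 holds only the first byte of a multi-byte character, and `char` cannot read that as a complete UTF-8 character. The helper name `match-bytes` is made up for this illustration.

```newlisp
;; Sketch only; assumes a UTF-8 enabled newLISP and UTF-8 encoded source.
(length "爱")     ;=> 3  (bytes)
(utf8len "爱")    ;=> 1  (characters)

;; match-bytes is a hypothetical helper: byte length of the first match
(define (match-bytes str opt)
  (length (first (regex {([^a-zA-Z0-9])} str opt))))

(match-bytes "爱" 0)      ;=> 1  only one byte of the character is matched
(match-bytes "爱" 2048)   ;=> 3  the whole character is matched
```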

xmftlg

#2
Many thanks, Lutz.

But in a URL,

 (url-encode "爱")

should give %e7%88%b1 (the percent-encoded UTF-8 bytes).

How can I do that?

Lutz

#3

(define (url-encode str)
    (join (map (fn (chr) (format "%%%02x" chr)) (unpack (dup "b" (length str)) str))))

(url-encode "所有的愛是公平的")
;=> "%e6%89%80%e6%9c%89%e7%9a%84%e6%84%9b%e6%98%af%e5%85%ac%e5%b9%b3%e7%9a%84"
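
The steps in this version can be spelled out as follows: `length` counts bytes, `dup` builds one "b" format character per byte, `unpack` returns each byte as an unsigned integer, and `format` turns each integer into a two-digit hex escape. Note that every byte is escaped this way, plain ASCII letters and digits included. A sketch, assuming UTF-8 encoded input; `utf8-bytes` is a hypothetical helper name:

```newlisp
;; Sketch only; utf8-bytes is a made-up name for the unpack step.
(define (utf8-bytes str)
  (unpack (dup "b" (length str)) str))

(utf8-bytes "爱")
;=> (231 136 177)

(map (fn (b) (format "%02x" b)) (utf8-bytes "爱"))
;=> ("e7" "88" "b1")

;; an ASCII byte is escaped as well
(format "%%%02x" 97)
;=> "%61"
```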


PS: you can now also find this, together with a url-decode for UTF-8, here:

http://www.newlisp.org/index.cgi?page=Code_Snippets
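
For the reverse direction, a minimal sketch of a byte-wise decoder. This is an illustration and not necessarily the code on the snippets page. It assumes the input contains well-formed %XX escapes, and it uses `pack` with "b" so that each escape becomes one raw byte rather than a Unicode code point.

```newlisp
;; Sketch only; option 1 is PCRE_CASELESS, so %E7 and %e7 both match.
(define (url-decode str)
  (replace "%([0-9a-f][0-9a-f])" str (pack "b" (int $1 0 16)) 1))

(url-decode "%e7%88%b1")
;=> "爱"
```

Building the result from raw bytes matters on a UTF-8 build, where `(char 0xe7)` would produce the two-byte encoding of U+00E7 instead of the single byte 0xE7.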

xmftlg

#4
Thank you Lutz.

It works.



I should keep on learning newLISP.

PS: I found this on Google:



https://github.com/kosh04/newlisp.snippet/blob/master/net.lsp



;; URL translation of hex codes with dynamic replacement
(define (url-encode url (literal ""))
  (join (map (lambda (c)
               (if (or (regex "[-A-Za-z0-9$_.+!*'(|),]" (char c))
                       (member (char c) literal))
                   (char c)
                   (format "%%%02X" c)))
             ;; 8-bit clean
             (unpack (dup "b" (length url)) url))))



I haven't tested it.