newLISP Fan Club

Forum => newLISP in the real world => Topic started by: dexter on January 02, 2014, 09:04:29 PM

Title: Force nth to not use utf8?
Post by: dexter on January 02, 2014, 09:04:29 PM
newLISP supports UTF-8.

(nth index utf8str) returns a UTF-8 character as a string.

(slice utf8str 0 1) returns part of a UTF-8 string, in this case one byte.

Why does nth work like that?

And how can I force nth to work without the UTF-8 behaviour, like slice does, without recompiling newLISP?
Title: Re: Force nth to not use utf8?
Post by: dexter on January 03, 2014, 02:08:53 AM
I recompiled newLISP.

This is not the first time -DSUPPORT_UTF8 has made a mess for me.

I really don't think newLISP should support UTF-8 natively. It could rely on the system and leave UTF-8 to the system.

Wouldn't that be better?
Title: Re: Force nth to not use utf8?
Post by: Lutz on January 03, 2014, 07:35:41 AM
Most people in the world speak languages which need Unicode characters, so UTF-8 should be the standard in newLISP rather than the exception ;-) But we also need to process binary information, so newLISP has some functions that process UTF-8 characters and others that process binary string buffers. For each string processing function, the manual indicates which of the two it does.

All functions working on UTF-8 multi-byte characters are also marked with a "utf8" suffix in the reference section. And there is also a table collecting them all:

http://www.newlisp.org/downloads/newlisp_manual.html#utf8_capable
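
As a rough illustration of that split, here is a minimal sketch, assuming a UTF-8 enabled build of newLISP and a UTF-8 terminal. The two-character sample string is made up for the example. nth and utf8len count characters, while length, slice and unpack count bytes, so a single byte can be read without recompiling:

```newlisp
; assumption: UTF-8 enabled newLISP build; the sample string is illustrative only
(set 'str "我能")                     ; two characters, three bytes each in UTF-8

(println (length str))               ; byte count: 6
(println (utf8len str))              ; character count: 2

(println (nth 0 str))                ; character-oriented: the whole first character
(println (length (slice str 0 1)))   ; byte-oriented: a one-byte string, so 1

; read the first byte as a number instead of using nth
(println (first (unpack "b" str)))   ; 230
```

When raw bytes are needed, slice or unpack can stand in for nth on the same string.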
Title: Re: Force nth to not use utf8?
Post by: dexter on January 09, 2014, 11:58:21 PM
I think that if these UTF-8 functions had a "utf8_" prefix in their names, there would be less of a mess.

I know people speak different languages, but today most operating systems support multiple languages natively, and quite well.

If I split a UTF-8 string into binary, a character will be three bytes, and if I then print out those bytes, they will be shown as a normal UTF-8 string.

Just a thought :)
Title: Re: Force nth to not use utf8?
Post by: Lutz on January 10, 2014, 01:28:53 PM

> (set  'str "我能吞下玻璃而不伤身体。")
"我能吞下玻璃而不伤身体。"

; split into 8-bit bytes

> (unpack (dup "b" (length str)) str)
(230 136 145 232 131 189 229 144 158 228 184 139 231 142 187 231 146 131 232 128
 140 228 184 141 228 188 164 232 186 171 228 189 147 227 128 130)

; show a char for each 8-bit byte
> (map char (unpack (dup "b" (length str)) str))
("æ" "ˆ" "‘" "è" "ƒ" "½" "å" "" "ž" "ä" "¸" "‹" "ç" "Ž" "»" "ç"
 "’" "ƒ" "è" "€" "Œ" "ä" "¸" "" "ä" "¼" "¤" "è" "º" "«" "ä" "½"
 "“" "ã" "€" "‚")
>

Or do this if you want a Unicode number for each Chinese character:

> (explode str)
("我" "能" "吞" "下" "玻" "璃" "而" "不" "伤" "身" "体" "。")
> (map char (explode str))
(25105 33021 21534 19979 29627 29827 32780 19981 20260 36523 20307 12290)
>
Title: Re: Force nth to not use utf8?
Post by: winger on February 08, 2014, 10:24:03 PM
> (println(join  ( 0 4 (explode "好汉不吃眼前亏"))) " ")
好汉
" "

This trick is very cool!
Title: Re: Force nth to not use utf8?
Post by: Lutz on February 09, 2014, 07:17:56 AM
You mean this:
> (println(join  ( 0 4 (explode "好汉不吃眼前亏"))) " ")
好汉不吃
" "
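
The same character/byte split shows up in the implicit slice used here. A minimal sketch, assuming a UTF-8 enabled build: slicing the exploded list counts characters, whereas implicit slicing applied directly to the string counts bytes, as far as I understand the manual.

```newlisp
; assumption: UTF-8 enabled newLISP build
(set 'str "好汉不吃眼前亏")

; slice the list of characters: four list elements, i.e. four characters
(println (join (0 4 (explode str))))   ; 好汉不吃

; slice the string itself: offset and length are in bytes
(println (length (0 4 str)))           ; 4 (bytes, not characters)
```

Slicing the string by bytes can cut through the middle of a multi-byte character, which is why the explode step matters for text.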
Title: Re: Force nth to not use utf8?
Post by: TedWalther on February 09, 2014, 11:07:51 AM
Since UTF-8 should be the default: for every function that has a binary equivalent, does the manual mention what to use for binary mode instead of UTF-8 mode?

Instead of the utf8_ prefix proposal, I propose the opposite: a bin_ prefix for all the functions, for when you want them to work byte by byte instead of char by char.

Knowing Lutz, he probably has something even better and more fun up his sleeve.