Title: How to do string like binary?
Post by: dexter on November 16, 2011, 12:46:49 AM

I set a string containing Chinese (CJK) characters, like this:

(setq cn "中文abc")

How can I cut this string into a byte array, the way I would treat cn in C?

I need to putchar this string, but in newLISP, if I use slice like this:

Code:
> (char (slice cn 0 1))
16384
> (char (slice cn 1 1))
184
> (char (slice cn 2 1))
173
> (char (slice cn 3 1))
24576

I don't think these are the right code values, right?

Title: Re: How to do string like binary?
Post by: dexter on November 16, 2011, 01:11:16 AM

Done: turn off UTF-8 support.

---------------------------------------------

Turn off UTF-8 support in the makefile and rebuild newLISP without UTF-8.

You will see -DSUPPORT_UTF8 in

makefile_build

makefile_linuxLP64_utf8

....

I just deleted -DSUPPORT_UTF8.

Now, if I (setq cn "中文"), the result is:

Code:
> (setq cn "中文")
"\228\184\173\230\150\135"

Passing 20013 or the like will cause a putchar (FCGI_putchar) error.

The right codes for 中文 are the bytes above, 228...., like Lutz said.

:)
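
A sketch of why 20013 cannot go through a byte-oriented output call, written in Python for illustration. put_byte is a hypothetical stand-in for putchar/FCGI_putchar and does not reproduce their real behavior: it rejects values outside 0..255 instead of converting them, and it writes to an in-memory buffer rather than to stdout.

```python
import io


def put_byte(stream, value):
    """Write a single byte; bytes([value]) raises ValueError outside 0..255."""
    stream.write(bytes([value]))


out = io.BytesIO()
for b in "中文".encode("utf-8"):  # the six UTF-8 bytes, one per call
    put_byte(out, b)

print(list(out.getvalue()))  # [228, 184, 173, 230, 150, 135]

try:
    put_byte(out, 20013)  # a code point, not a byte
except ValueError as err:
    print("rejected:", err)
```

The point is only that output has to be fed the encoded bytes, not the Unicode code points.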

Title: Re: How to do string like binary?
Post by: sunmountain on November 16, 2011, 03:04:13 AM

Could you please tell the rest of us what exactly you did?

BTW, the correct codes should be:

中 20013

文 25991

a 97

b 98

c 99

(verified by Python 2.7.2).

There you have to explicitly mark a string as Unicode via u'the string' (this changed in Python 3.x, where all strings are Unicode by default).
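
For comparison, a minimal Python 3 sketch of the same check. It also lists the UTF-8 bytes, which are what a byte-oriented function gets to see; the variable names are only illustrative.

```python
text = "中文abc"

# One entry per character: Unicode code points.
code_points = [ord(ch) for ch in text]

# One entry per byte of the UTF-8 encoding.
utf8_bytes = list(text.encode("utf-8"))

print(code_points)  # [20013, 25991, 97, 98, 99]
print(utf8_bytes)   # [228, 184, 173, 230, 150, 135, 97, 98, 99]
```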

I'm asking because disabling Unicode support while using Unicode strings and then getting correct results seems a bit strange.

Perhaps you could post the code you wrote.

Me wants to learn :-)

Title: Re: How to do string like binary?
Post by: Lutz on November 16, 2011, 06:38:51 AM

In UTF-8 versions of newLISP, indexing on strings works on character rather than single-byte boundaries. Although 'slice' slices bytes, 'char' will try to convert its argument to Unicode on UTF-8 versions of newLISP. To get the raw bytes, use 'unpack':

Code:
> (unpack (dup "b" (length cn)) cn)
(228 184 173 230 150 135 97 98 99)
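
The odd values in the first post can be reproduced by hand. The following Python sketch (Python rather than newLISP, since it was already used in this thread for verification) is an inference from the numbers posted here, not taken from the newLISP source: it assumes that 'char' applied to a lone lead byte decodes only the payload bits of that byte. The helper names are hypothetical.

```python
def lead_payload(lead):
    """Code-point bits carried by a 3-byte UTF-8 lead byte (1110xxxx) alone."""
    if lead & 0xF0 != 0xE0:
        raise ValueError("not a 3-byte UTF-8 lead byte")
    return (lead & 0x0F) << 12


def decode3(b0, b1, b2):
    """Decode a complete 3-byte UTF-8 sequence into its code point."""
    return lead_payload(b0) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


print(lead_payload(228))       # 16384, as in (char (slice cn 0 1))
print(lead_payload(230))       # 24576, as in (char (slice cn 3 1))
print(decode3(228, 184, 173))  # 20013
print(decode3(230, 150, 135))  # 25991
```

So 16384 and 24576 look like the top four bits of 中 and 文 with the two continuation bytes missing.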

In the manual, all functions working on UTF-8 character boundaries are marked with a utf8 after the red function name.

There is a list of all of these functions in this chapter:

http://www.newlisp.org/downloads/newlisp_manual.html#unicode_utf8

ps: run this to see how it works:

Code:
(set 'str "中文abc")
(println (unpack (dup "b" (length str)) str))
(println (explode str))
(dotimes (i (utf8len str))
    (print (str i) " -> ")
    (println (char (str i))))

gives you this output:

Code:
(228 184 173 230 150 135 97 98 99)
("中" "文" "a" "b" "c")
中 -> 20013
文 -> 25991
a -> 97
b -> 98
c -> 99

newLISP Fan Club

Forum => newLISP in the real world => Topic started by: dexter on November 16, 2011, 12:46:49 AM