about (directory)

Started by qinying, July 16, 2017, 06:11:07 AM

qinying

Chinese file names are not displayed correctly on my 64-bit Windows 7 computer.

Is this a bug?

TedWalther

#1
To help with debugging, can you run this newlisp code and paste the results?



(define str->bytes (lambda (s) (unpack (dup "b" (length s)) s)))
(str->bytes "the real filename")
(str->bytes "the filename directory returns")


Don't paste the code verbatim; the two strings are placeholders. Define str->bytes, then call it twice: once with the file name as you see it in Windows Explorer, and once with the name that (directory) returns. Then we can compare the byte sequences.
Cavemen in bearskins invaded the ivory towers of Artificial Intelligence.  Nine months later, they left with a baby named newLISP.  The women of the ivory towers wept and wailed.  "Abomination!" they cried.

qinying

#2
So glad to receive a reply, thanks!

> (define str->bytes (lambda (s) (unpack (dup "b" (length s)) s)))
(lambda (s) (unpack (dup "b" (length s)) s))
> (directory)
("." ".." "guiserver" "index.html" "newlisp.exe" "鏂板缓鏂囨湰鏂囨。.txt")

> (str->bytes "新建文本文档")
(208 194 189 168 206 196 177 190 206 196 181 181)
> (str->bytes "鏂板缓鏂囨湰鏂囨。")
(230 150 176 229 187 186 230 150 135 230 156 172 230 150 135 230 161 163)
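
A quick check of the two byte lists above, sketched in Python rather than newLISP purely as a neutral way to inspect bytes. It assumes the 12-byte list is the name in code page 936 (GBK) and the 18-byte list is the same name in UTF-8; the variable names are illustrative only.

```python
# Byte values copied from the str->bytes output above.
typed_bytes = bytes([208, 194, 189, 168, 206, 196, 177, 190, 206, 196, 181, 181])
directory_bytes = bytes([230, 150, 176, 229, 187, 186, 230, 150, 135,
                         230, 156, 172, 230, 150, 135, 230, 161, 163])

# Assumption: the console delivers typed text in code page 936 (GBK).
typed_name = typed_bytes.decode("gbk")

# Assumption: (directory) returns UTF-8 in the UTF-8 build of newLISP.
directory_name = directory_bytes.decode("utf-8")

# Reading the UTF-8 bytes as GBK, two bytes per character, gives the garbled name.
garbled_name = directory_bytes.decode("gbk")

print(typed_name)
print(directory_name)
print(garbled_name)
```

If both assumptions hold, the first two lines printed are the same name and the third is the garbled form seen in the (directory) output. That would mean the file name itself is intact and only the encoding used to display it differs.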

TedWalther

#3
Can you run (set-locale) and tell us the results?

qinying

#4

> (set-locale)
("Chinese (Simplified)_People's Republic of China.936" ".")

rrq

#5
Hmm, I get the following:
> (length "新建文本文档")
18
> (unpack (dup "b" 18) "新建文本文档")
(230 150 176 229 187 186 230 150 135 230 156 172 230 150 135 230 161 163)
I.e., my byte sequence for the first string (copy-and-paste from this forum) is the same as your byte sequence for the second string.


> (length "鏂板缓鏂囨湰鏂囨。")
27
> (unpack (dup "b" 27) "鏂板缓鏂囨湰鏂囨。")
(233 143 130 230 157 191 231 188 147 233 143 130 229 155 168 230 185 176 233 143 130 229 155 168 227 128 130)

Though I have
> (set-locale)
("en_AU.UTF-8" ".")


Not much help I'm afraid.
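
One possible explanation for the two lengths, as a Python sketch (not newLISP). It assumes the forum page is UTF-8 and that qinying's console uses code page 936 (GBK). Under that assumption the garbled name copied from the forum is nine characters of three UTF-8 bytes each, and mapping those nine characters back through GBK recovers the 18 bytes of the real name.

```python
garbled = "鏂板缓鏂囨湰鏂囨。"  # copied from the (directory) output in post #2

# Nine characters, three UTF-8 bytes each, which matches the length of 27.
garbled_utf8 = list(garbled.encode("utf-8"))

# Assumption: the garbling came from showing UTF-8 bytes in a GBK console.
# Undo it by encoding back to GBK bytes and decoding those bytes as UTF-8.
recovered = garbled.encode("gbk").decode("utf-8")

print(len(garbled_utf8))
print(recovered)
```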

TedWalther

#6
Might have something to do with locale; perhaps the output of (directory) is coming in the current locale, which may NOT be UTF-8?  So, how do you convert from his locale encoding into UTF8?  What text encoding is he using if not UTF8?  Some sort of non-UTF multi-byte encoding.

TedWalther

#7
So, the problem is that his system is using code page 936 (Simplified Chinese), but newlisp is using UTF8.  To convert between encodings, perhaps I need to write some bindings for libiconv, which can convert between different character encodings.  But I don't see myself having time to write the bindings soon.  If he is comfortable with the C interface for newlisp modules, he can install the iconv DLL and write the bindings, and that will fix the problem.
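
The kind of conversion libiconv would perform can be sketched as follows. This is Python, not newLISP, and it is not the bindings discussed above. `to_utf8` and `from_utf8` are hypothetical helper names, and the source encoding is assumed to be code page 936 as reported by (set-locale).

```python
def to_utf8(raw: bytes, source_encoding: str = "cp936") -> bytes:
    """Re-encode bytes from source_encoding (assumed cp936) as UTF-8."""
    # Decode to Unicode text first, then encode that text as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


def from_utf8(raw: bytes, target_encoding: str = "cp936") -> bytes:
    """Re-encode UTF-8 bytes in target_encoding (assumed cp936)."""
    return raw.decode("utf-8").encode(target_encoding)


# The 12 bytes qinying got when typing the name into the console.
cp936_name = bytes([208, 194, 189, 168, 206, 196, 177, 190, 206, 196, 181, 181])
utf8_name = to_utf8(cp936_name)
print(list(utf8_name))
```

If the assumption about the encodings is right, the printed list should match the 18 byte values qinying reported for the name that (directory) returned.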

qinying

#8
I'm new to programming and can't solve this problem by myself, so I can only look forward to a new version.

qinying

#9
I tried the non-UTF-8 version:

newLISP v.10.7.2 64-bit on Windows IPv4/6 libffi, options: newlisp -h

> (set-locale)
("C" ".")
> (directory)
("." ".." "newlisp.exe" "\208\194\189\168\206\196\177\190\206\196\181\181.txt")
> (println (last (directory)))
新建文本文档.txt
"\208\194\189\168\206\196\177\190\206\196\181\181.txt"
> (append-file (last (directory)) "中文可以吗?")
12
> (read-file (last (directory)))
"\214\208\206\196\191\201\210\212\194\240\163\191"
> (println (read-file (last (directory))))
中文可以吗?
"\214\208\206\196\191\201\210\212\194\240\163\191"


Then I changed the locale:



> (set-locale "Chinese (Simplified)_People's Republic of China.936" ".")
("Chinese (Simplified)_People's Republic of China.936" ".")
> (directory)
("." ".." "newlisp.exe" "新建文本文档.txt")
> (println (read-file (last (directory))))
中文可以吗?
"中文可以吗?"

TedWalther

#10
Does that mean the non-UTF8 version of newlisp is working for you?

qinying

#11
Quote from: TedWalther
Does that mean the non-UTF8 version of newlisp is working for you?


It can only handle Chinese directory and file names; it cannot parse strings correctly.
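
One plausible reading of this, offered as an assumption rather than a diagnosis: a build without UTF-8 support treats strings as raw bytes, so lengths and slices count bytes and can cut a two-byte code page 936 character in half. A Python sketch of that effect on bytes (not newLISP code):

```python
name = "新建文本文档.txt"
raw = name.encode("gbk")  # assumed code page 936 bytes, as in post #9

char_count = len(name)  # 10 characters
byte_count = len(raw)   # 16 bytes: six 2-byte characters plus ".txt"

# A byte-based slice of length 3 ends in the middle of the second character.
prefix = raw[:3]
try:
    prefix.decode("gbk")
    split_cleanly = True
except UnicodeDecodeError:
    split_cleanly = False

print(char_count, byte_count, split_cleanly)
```

If this is the cause, code that slices or searches such strings would need to work on two-byte boundaries, or convert to UTF-8 and use a UTF-8 build.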