reading utf16 files?

Started by cormullion, May 28, 2007, 02:56:29 PM

m35

#15
Quote from: "jp"
set the code page with the following command, chcp 65001

Thanks jp! I wasn't aware of that one.



Now here is that same process after changing the code page.

F:\temp>chcp 65001
Active code page: 65001

...

newLISP v.9.1.1 on Win32 UTF-8, execute 'newlisp -h' for more info.

> (print "\230\162\182\230\181\166\231\148\177\232\168\152")
梶浦由記""
> (write-file "\230\162\182\230\181\166\231\148\177\232\168\152" "Hello Unicode")
13
> (directory)
("." ".." "")
> !dir /w
 Volume in drive F has no label.
 Volume Serial Number is C458-D3A7

 Directory of F:\temp

[.]            [..]           æ¢¶æµ¦ç”±è¨˜
               1 File(s)             13 bytes
               2 Dir(s)      12,066,816 bytes free
> (read-file "\230\162\182\230\181\166\231\148\177\232\168\152")
"Hello Unicode"
>
Note that the 梶浦由記 characters appear as rectangles in the console (I assume that's just because the Lucida Console font doesn't have those glyphs).
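
For anyone following along outside newLISP, here is a small Python sketch (Python is used purely for illustration, it is not part of this thread's setup). It treats the twelve three-digit decimal groups in the string passed to print above as byte values and checks that they are the UTF-8 encoding of the four characters 梶浦由記.

```python
# The twelve three-digit decimal groups from the transcript, taken as byte values.
digits = "230162182230181166231148177232168152"
raw = bytes(int(digits[i:i + 3]) for i in range(0, len(digits), 3))

# Interpret those bytes as UTF-8 text.
name = raw.decode("utf-8")

print(name)                 # the decoded name
print(len(raw), len(name))  # byte count vs. character count
```

The byte count is 12 and the character count is 4, which is why a byte-oriented length reports 12 for a four-character name.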



The behavior of that (directory) entry is interesting...
> (directory)
("." ".." "")
> (length ((directory) 2))
12
> (setq s ((directory) 2))
""
> s
""
> (length s)
12
> (source 's)
"(set 's \"\")\r\n\r\n"
> (print s)
梶浦由記""


Unfortunately I'm still left with the æ¢¶æµ¦ç”±è¨˜ file, and not the proper Unicode one.
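
One plausible explanation for that file name, sketched in Python. Two assumptions here are not confirmed anywhere in this thread: that the UTF-8 bytes reach a byte-oriented (ANSI) file API unchanged, and that the system ANSI code page is Windows-1252. Under those assumptions every UTF-8 byte is read as one Latin character.

```python
name = "梶浦由記"
utf8_bytes = name.encode("utf-8")  # 12 bytes, as in the transcript

# Assumption: the bytes are reinterpreted as Windows-1252, one character per byte.
garbled = utf8_bytes.decode("cp1252")
print(garbled)
print(len(garbled))

# The damage is reversible as long as no byte was dropped or replaced.
restored = garbled.encode("cp1252").decode("utf-8")
print(restored == name)
```

If this is what happens, the name on disk is a 12-character Latin string rather than the 4-character Japanese one, and changing the console code page would not affect it.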



Since I'm not having any luck, I went ahead and implemented UTF-16 versions of functions that refer to path names (using the Win32 API). I'll post them on the "newlisp for Win" board when I'm done.
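
That implementation isn't shown in this thread. As a rough sketch only, the first step such wrappers would need is converting UTF-8 bytes into the NUL-terminated UTF-16LE buffer that the wide-character Win32 functions take for path arguments. The helper name `to_wide` is made up for this illustration, and Python stands in for the real code.

```python
def to_wide(utf8_bytes: bytes) -> bytes:
    """Hypothetical helper: UTF-8 bytes -> NUL-terminated UTF-16LE bytes."""
    text = utf8_bytes.decode("utf-8")
    # Wide strings end with a 16-bit zero, hence two zero bytes.
    return text.encode("utf-16-le") + b"\x00\x00"

wide = to_wide("梶浦由記".encode("utf-8"))
print(wide.hex(" "))  # two bytes per character, low byte first
print(len(wide))      # 4 characters * 2 bytes + 2-byte terminator
```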

jp

#16
Quote from: "m35"
Unfortunately I'm still left with the æ¢¶æµ¦ç”±è¨˜ file, and not the proper Unicode one.


Perhaps it is worth mentioning that on Windows 2000 and above, the internal string representation is Unicode (UTF-16LE). One can change the DOS (console) code page arbitrarily, but in Windows proper the internal character representation remains fixed.

Also, the name 梶浦由記 strikes me as a Japanese name (Kajiura Yuki) rather than a Chinese one. Either way, Windows needs its Chinese/Japanese fonts enabled in order to render those characters properly.

m35

#17
Quote from: "jp"
Also, the name 梶浦由記 strikes me as a Japanese name (Kajiura Yuki) rather than a Chinese one.


Good eye, jp. Do you read Japanese? Other languages?



PS: I'm a big fan of Yuki Kajiura's work (http://www.fictionjunction.com/index2.html) :)

jp

#18
Quote from: "m35"
Good eye, jp. Do you read Japanese? Other languages?

Pleased to oblige!

Yes indeed, I read Japanese. And I believe you are a native Japanese speaker, since you inadvertently swapped the L for an R in your login summary.

m35

#19
ご免なさい I know only a little Japanese because I work with Japanese people (and like あにめ ^_^). The カリフォニア typo is part 日本語 accent, and part Arnold Schwarzenegger accent (´∀`)

newdep

#20
Quote
And I believe you are a native Japanese speaker, since you inadvertently swapped the L for an R in your login summary.


Speaking about good eyes? That must be a secret hint. I was indeed wondering why he misspelled California ;-) No offence, by the way. It just caught my eye too, and I did not know there was perhaps a reason for it.
-- (define? (Cornflakes))

jp

#21
Quote
Speaking about good eyes? That must be a secret hint.

Well, there is nothing too esoteric about it!

Japanese has no phonetic equivalents of the English L and R consonants, but it has a single consonant that sits somewhere between those two sounds. Hence, even when they have known a common place name perfectly well since childhood, Japanese speakers are often at a loss when writing names containing L or R in English, because that phonetic distinction is missing for them, even though they know the names assuredly in Japanese.