UTF8 bug

Started by ptroev, January 31, 2009, 12:52:21 AM

ptroev

The non-Latin UTF-8 character 0xd098 (0x98d0) is turned into 0xd03f (0x3fd0)

when a file is saved in newlisp-edit. Any regular expression also throws an exception on that character.

Maybe this is a Java issue. Any suggestions?
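To make the byte values above concrete, here is a small Python sketch (Python is used only for illustration; it is not part of newLISP or newlisp-edit). It decodes the original two bytes as UTF-8 and checks what the saved bytes look like. The variable names are made up for this example.

```python
# The two byte sequences quoted above, written out byte by byte.
original = bytes([0xD0, 0x98])
saved = bytes([0xD0, 0x3F])

# 0xD0 0x98 is a valid two-byte UTF-8 sequence for a single code point.
char = original.decode("utf-8")
print(hex(ord(char)))  # code point of the affected character

# In the saved file the second byte has become 0x3F, the ASCII question mark.
print(chr(saved[1]))

# 0xD0 must be followed by a continuation byte (0x80-0xBF), so the saved
# pair is no longer valid UTF-8.
try:
    saved.decode("utf-8")
    saved_is_valid = True
except UnicodeDecodeError:
    saved_is_valid = False
print(saved_is_valid)
```

So only the second byte is lost, and it is replaced by a literal '?', which usually points at a character-set conversion step rather than at the font.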

Lutz

#1
On Mac OS X I see Chinese (or Korean?) characters. On Ubuntu Linux and Windows XP the characters are displayed as a box, but they save and reload without change on all 3 platforms. I tried all characters from your post.



On Windows and Linux I also tried other UTF8 characters (Greek) which display fine and I don't see any change when saving and reloading.



Perhaps an error specific to your locale? What language are your Windows version and Java localized to?



Also, make sure you have replaced newlisp.exe with a UTF8-enabled executable from here: http://www.newlisp.org/downloads/UTF-8_win32/

ptroev

re
#2
That's strange, because I have the same issue on 2 computers (JRE 1.6.0_05 and 1.5.x).

newLISP v.10.0.1 on Win32 IPv4 UTF-8



That character is the Cyrillic letter "И" (mirrored N) in UTF-8,

and it is the only one that is saved and regexed incorrectly.

During editing and saving it looks OK, but when the file is closed and reopened

it shows up as a square and '?'.



Maybe this is a regex issue in the highlighting.

Lutz

#3
'regex' is fine for me on all three platforms. It must be a problem with the localization of either Windows or Java in your language, or just the font. If I understand you correctly, it's just one character showing these problems? If your language is Russian, there are several other users on this forum who may be able to comment.





ps: using Java 1.5 on Mac OS X and Java 1.6 on Windows

ptroev

re
#4
yes, just 1 character, i'm puzzled.



I don't think this is a localization (cp1251) or font problem. That character works in other apps,

and it's saved correctly as UTF-8 in Windows Notepad.



Thanks anyway,

I'll try to contact those people, maybe there's a workaround.



BTW, I was mistaken: nothing is wrong with regex. I checked it, and it works with UTF-8 text files in Russian.

It fails only if that character is in .lsp source, inside [text][/text].



PS: that's funny, according to Google there is something special about that character and UTF-8.

So maybe this is a problem with the Java text area and/or base64 + UTF-8.
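One guess that would fit the symptoms, and it is only an assumption that nobody in this thread has confirmed: somewhere between the text area and the file, the UTF-8 bytes pass through the Windows default code page (cp1251). In Python's cp1251 codec, byte 0x98 has no character assigned, so a round trip through that code page loses exactly that byte. The sketch below simulates such a round trip in Python; it does not reproduce what newlisp-edit or Java actually do, and the helper name is made up for the example.

```python
# ASSUMPTION: the editor treats the UTF-8 bytes as cp1251 at some point.
utf8_bytes = "\u0418".encode("utf-8")  # the affected letter, bytes D0 98

# Strict decoding fails because 0x98 is unassigned in Python's cp1251 table.
try:
    utf8_bytes.decode("cp1251")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient round trip replaces the unmappable byte, which ends up as '?'.
as_text = utf8_bytes.decode("cp1251", errors="replace")
written = as_text.encode("cp1251", errors="replace")
print(written.hex())


def survives_cp1251_round_trip(ch):
    """Return True if the UTF-8 bytes of ch survive cp1251 decode/encode."""
    raw = ch.encode("utf-8")
    try:
        return raw.decode("cp1251").encode("cp1251") == raw
    except UnicodeError:
        return False


# Check the basic Russian alphabet block, U+0410 through U+044F.
letters = [chr(cp) for cp in range(0x0410, 0x0450)]
broken = [ch for ch in letters if not survives_cp1251_round_trip(ch)]
print(broken)
```

If this guess is right, it would also explain why only one letter is affected: in that range it is the only one whose UTF-8 encoding contains the byte 0x98. A possible check on the Java side would be to see which charset is used when the text is converted to bytes, but that is untested here.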