Upper-case, UTF-8 and Windows won't work together

Started by Fritz, October 25, 2009, 09:50:52 AM

Fritz

I have a Russian Windows with the native Win-1251 and DOS CP866 encodings. For some strange reason the "upper-case" function does not work:


(println (upper-case "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"))

Result:


"ࡢ㤥¸槨骫쭮ﰱ㴵縹콾"

Expected:


"АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"

Screenshots:

http://img7.imageshost.ru/imgs/091025/a6a681a191/e2e93.png

http://img7.imageshost.ru/imgs/091025/b357b8e2bc/44605.jpg



Version: newLISP v.10.1.5 on Win32 IPv4 UTF-8



Btw, in Ubuntu "upper-case" works all right:

http://img7.imageshost.ru/imgs/091025/35fb59f3ce/31ca8.png

m35

#1
According to nl-string.c in newLISP 10.1.5:
/* Note that on many platforms towupper/towlower
do not work correctly for non-ascii unicodes */

Have you tried (upper-case) with the regular (non-UTF-8) newLISP?



Given that Windows doesn't cater to UTF-8 by default, this may require a bit of platform-specific code.
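
One way to avoid depending on the platform's towupper at all is to map the Russian letters by code point. Below is a minimal C sketch of that idea. It is not newLISP's actual implementation, the names cyr_toupper and cyr_upper_utf8 are hypothetical, and it assumes the input is valid UTF-8 and that only the 33 letters of the Russian alphabet need converting.

```c
#include <stddef.h>
#include <stdint.h>

/* Upper-case one Unicode code point of the Russian alphabet.
   Every other code point is returned unchanged. */
uint32_t cyr_toupper(uint32_t cp)
{
    if (cp >= 0x0430 && cp <= 0x044F)   /* U+0430..U+044F -> U+0410..U+042F */
        return cp - 0x20;
    if (cp == 0x0451)                   /* small io -> capital io (U+0401) */
        return 0x0401;
    return cp;
}

/* Upper-case Russian letters in a NUL-terminated UTF-8 string, in place.
   These letters are two-byte sequences whose upper-case forms are also
   two bytes long, so the length of the string never changes. */
void cyr_upper_utf8(char *s)
{
    unsigned char *p = (unsigned char *)s;

    while (*p) {
        /* A two-byte sequence is 110xxxxx followed by 10xxxxxx. */
        if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            uint32_t cp = ((uint32_t)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
            cp = cyr_toupper(cp);
            p[0] = (unsigned char)(0xC0 | (cp >> 6));
            p[1] = (unsigned char)(0x80 | (cp & 0x3F));
            p += 2;
        } else {
            p++;    /* ASCII or part of a longer sequence: leave as is */
        }
    }
}
```

ASCII letters pass through untouched here, so a real upper-case routine would still have to combine this with the usual toupper for Latin text.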



Links for reference

http://msdn.microsoft.com/en-us/library/45119yx3%28VS.71%29.aspx

http://www.lingoport.com/gi/help/gihelp/unsafeMethod/cpp_toupper.htm

Fritz

#2
The non-UTF-8 upper-case and lower-case just do nothing with Russian letters: (lower-case "А") -> "А", (upper-case "а") -> "а".



But that is not really important: newLISP is comfortable enough that any encoding function can be written in coffee-cup time. I had to write my own functions anyway to decode Russian letters in URLs, POST queries, RTF files, etc.


(set 'cyr-alphabet (list "а" "б" "в" "г" "д" "е" "ё" "ж" "з" "и" "й" "к" "л" "м" "н" "о" "п" "р" "с" "т" "у" "ф" "х" "ц" "ч" "ш" "щ" "ъ" "ы" "ь" "э" "ю" "я" "А" "Б" "В" "Г" "Д" "Е" "Ё" "Ж" "З" "И" "Й" "К" "Л" "М" "Н" "О" "П" "Р" "С" "Т" "У" "Ф" "Х" "Ц" "Ч" "Ш" "Щ" "Ъ" "Ы" "Ь" "Э" "Ю" "Я"))

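; Lower-case the Russian letters of a UTF-8 string; other characters pass through.
(define (cyr-low linea)
  (let (menudo "" letra "" idx nil)
    ; pop returns "" once the string is exhausted
    (while (!= (set 'letra (pop linea)) "")
      (set 'idx (find letra cyr-alphabet))
      ; indices 33..65 hold the capitals; the small letter sits 33 places earlier
      (if (and idx (> idx 32))
        (push (cyr-alphabet (- idx 33)) menudo -1)
        (push letra menudo -1)))
    menudo))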


Screenshot:

http://img7.imageshost.ru/imgs/091027/6515c1b902/812d6.jpg