UTF-8 - Unicode in development version 8.0.8

Started by Lutz, July 08, 2004, 04:39:00 PM


Lutz

Just posted the first compilable Unicode/UTF-8 development version. I tested with Cyrillic/Greek/Hebrew/Russian character sets, but could not test on a platform with keyboard support for those characters, or on platforms that rely heavily on multibyte character sets like Chinese/Japanese/Indian/Arabic and also take input in them from the keyboard.



I believe JP (Jean Pierre) on this board is running Japanese Windows?



There could be (but shouldn't be) differences between running the Tcl/Tk frontend and running newlisp.exe or newlisp (the Linux binary) alone. The Tcl/Tk frontend switches fine on Linux, but I could not test it on Win32. This is not only about display but also about the correct working of the UTF-8 versions of specific string functions (see the CHANGES file and the manual chapter about UTF-8), like 'trim', 'nth', 'upper-case', etc.



Any feedback about this is appreciated.



Lutz

HPW

#1
Running Turtle.lsp with the UTF-8 EXE gives an error ' Bad screen distance "302,1612092" '.
Hans-Peter
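
The value in the error message, "302,1612092", contains a decimal comma, while Tk expects screen distances with a decimal point. One possible explanation, not confirmed anywhere in this thread, is that floats are being formatted according to the German locale. The Python sketch below only illustrates that failure mode; `to_tk_number` is a hypothetical helper, not part of newLISP or Tcl/Tk.

```python
def to_tk_number(text):
    """Normalize a decimal comma to a decimal point (hypothetical helper)."""
    # Assumption: the input has no thousands separators, only a decimal comma.
    return text.replace(",", ".")


def parses_as_float(text):
    """Return True if text is a number written with a decimal point."""
    try:
        float(text)
        return True
    except ValueError:
        return False


bad = "302,1612092"  # value copied from the error message above
print(parses_as_float(bad))                # False
print(parses_as_float(to_tk_number(bad)))  # True
```

If this reading is right, the calculations themselves are fine and only the text form of the number handed to the frontend is affected.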

HPW

#2
When I use a German umlaut in a string with the trim command, I get a strange result.


Quote
> (trim "Höhe;;" ";")

"Hö¨¥–"


Do I need to give the trim command a UTF-8 string where the umlaut is encoded in a compatible way?
Hans-Peter

Lutz

#3
I don't think you should use the UTF-8 version on Win32 in Germany, where Windows is localized with German as a one-byte-character language, probably with code page ISO-8859.



Windows in Germany and other European countries will display Unicode in notepad.exe and other applications, but is otherwise not a Unicode-enabled OS.



Lutz
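
To make the one-byte versus UTF-8 distinction concrete, here is a small Python sketch (an illustration only, not newLISP code). It assumes the one-byte code page behaves like Latin-1 for the character 'ö'; the helper name `is_valid_utf8` is made up for this example.

```python
text = "Höhe;;"

latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # 'ö' takes two bytes

print(len(latin1_bytes), len(utf8_bytes))  # 6 7


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A UTF-8 aware function handed one-byte input sees an invalid sequence.
print(is_valid_utf8(latin1_bytes))  # False
print(is_valid_utf8(utf8_bytes))    # True
```

This is one way a string typed under a one-byte code page could confuse a function that expects UTF-8 input.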

Lutz

#4
just found out:



(trim "Höhe;;" ";") => "Höhe"



works fine on newlisp-utf8.exe in the command shell; it is only together with the Tcl/Tk frontend that it gets confused. I wonder if, on Win32, Tcl/Tk has to be compiled as a Unicode application with unicows.dll/lib etc. On Linux it switches on startup.



Lutz

HPW

#5
Thanks for the info.



There is a typo in the utf8 documentation:



'The utf8 function is used top convert from UCS-4 to UTF-8'



We all know that newLISP is top but I think it should read 'to'.
Hans-Peter

HPW

#6
Maybe the trim example in the documentation could be clearer:



(trim "00012340" "0")            => "1234"
(trim "00012340" "0" "")         => "12340"
(trim "01234000" "" "0")         => "01234"
Hans-Peter
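
The three cases above (trim both sides, left only, right only) can be mimicked in Python as a rough sketch. The function below is an approximation written for illustration; it is not how newLISP implements 'trim', and its parameter names are assumptions.

```python
def trim(text, left=" ", right=None):
    """Strip `left` from the start and `right` from the end of text.

    With only one trim character given, it is used on both sides.
    An empty string means "do not trim this side".
    """
    if right is None:
        right = left
    if left:
        text = text.lstrip(left)
    if right:
        text = text.rstrip(right)
    return text


print(trim("00012340", "0"))      # 1234
print(trim("00012340", "0", ""))  # 12340
print(trim("01234000", "", "0"))  # 01234
```

Because Python strings are sequences of characters rather than bytes, `trim("Höhe;;", ";")` gives "Höhe" here without any special handling.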

jp

#7
Lutz



Strangely enough, I tried your newlisp-utf8.exe and found that it breaks code when used with UTF-8 strings, while the regular newLISP does not.

Example: if you run the strings ...



(trim "     日本語が難しい             ")  ;; "Japanese is difficult" (UTF-8, Japanese)

(trim "     Er ist ein großer Schwätzer  ")  ;; Er ist ein grosser Schwaetzer (UTF-8, German)



Both strings are broken by newlisp-utf8.exe but left intact by newLISP.



Jean-Pierre
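
UTF-8 strings copied from a Windows console often look scrambled, because the console shows each byte through its own code page. The Python sketch below shows the effect for the German test string, assuming the console uses code page 437; it is an illustration only and says nothing about what newlisp-utf8.exe does internally.

```python
original = "Er ist ein großer Schwätzer"

# What the UTF-8 bytes look like when each byte is shown via code page 437.
as_seen = original.encode("utf-8").decode("cp437")
print(as_seen)  # Er ist ein gro├ƒer Schw├ñtzer

# The display is reversible as long as no byte was lost on the way.
restored = as_seen.encode("cp437").decode("utf-8")
print(restored == original)  # True
```

The same round trip works for the Japanese string, where every character occupies three bytes and therefore shows up as three console characters.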

jp

#8
Quote from: "Lutz"I don't think you should use the UTF-8 version on Win32 in Germany, where Windows is localized with German as a one-byte-character language, probably with code page ISO-8859.



Lutz


Indeed: under XP with the default code page (chcp 437), running the following at the command prompt

echo (trim "Höhe;;" ";")  > test.txt

and then opening the result with notepad test.txt shows that we have an ANSI-coded file.



Jean-Pierre

Lutz

#9
It seems that Windows uses Unicode only internally and otherwise translates to one-byte-character code pages. But loading a UTF-8 file into notepad.exe works correctly. You can also read this file in newlisp-utf8.exe, upper-case the string, and write it back, and it will be fine in notepad.exe. 'upper-case' in newLISP converts the UTF-8 string to 4-byte Unicode, calls the Borland/Windows or Linux library function towupper(), and then converts back to UTF-8. notepad.exe also has a save-as option for UTF-8.



I wonder if all you need is a UTF-8-compiled cmd.exe, as is the case on Linux with Xterm, and I thought that perhaps Japanese Windows would be like this. Did you run your experiment on US WinXP or on a Japanese localized version?



Lutz
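
The three steps described for 'upper-case' (UTF-8 to wide characters, a per-character upper-case call, back to UTF-8) can be sketched in Python as follows. This is a rough model for illustration, not the newLISP source; Python's `str.upper()` stands in for towupper(), and the function name is an assumption.

```python
def upper_case_utf8(data):
    """Upper-case a UTF-8 byte string by going through code points."""
    # Step 1: UTF-8 bytes -> one integer code point per character.
    code_points = [ord(ch) for ch in data.decode("utf-8")]

    # Step 2: map each code point on its own, as a towupper()-style call would.
    mapped = []
    for cp in code_points:
        upper = chr(cp).upper()
        # Keep the original when the mapping is not one-to-one (e.g. 'ß').
        mapped.append(ord(upper) if len(upper) == 1 else cp)

    # Step 3: code points -> UTF-8 bytes.
    return "".join(chr(cp) for cp in mapped).encode("utf-8")


print(upper_case_utf8("Höhe".encode("utf-8")).decode("utf-8"))  # HÖHE
```

Working on code points rather than bytes is what keeps the two bytes of 'ö' together during the conversion.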

jp

#10
Localization won't matter under Win2k or XP, since all internal representations are in Unicode (UTF-16LE). Strictly speaking, UTF-8 is not Unicode itself but an encoding that lends itself readily to conversion to and from the Unicode encodings. The disparity between newlisp-utf8.exe and its UNIX counterpart could come from the fact that under UNIX Unicode is not little-endian but big-endian, while Windows requires little-endian code and will otherwise mess up the subsequent conversion to UTF-8.



Jean-Pierre
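
As a point of reference for the byte-order question: byte order applies to the 16-bit and 32-bit encodings, whereas UTF-8 is defined as a plain byte sequence with a single layout. The short Python sketch below shows the bytes for one character; the helper name `layouts` is made up for this example.

```python
def layouts(ch):
    """Return the hex bytes of one character in three encodings."""
    return {
        "utf-16-le": ch.encode("utf-16-le").hex(),
        "utf-16-be": ch.encode("utf-16-be").hex(),
        "utf-8": ch.encode("utf-8").hex(),
    }


for name, hex_bytes in layouts("ö").items():  # 'ö' is U+00F6
    print(name, hex_bytes)
# utf-16-le f600
# utf-16-be 00f6
# utf-8 c3b6
```

So a byte-order mismatch could matter where a 16-bit or 32-bit intermediate form is exchanged between components, but the UTF-8 bytes themselves are identical on Windows and UNIX.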

jp

#11
Quote from: "HPW"Running Turtle.lsp with the UTF-8 EXE gives an error ' Bad screen distance "302,1612092" '.


Running an equivalent program, the UTF-8 EXE was able to carry out all its calculations and display Japanese without any problems.

Jean-Pierre



========= Kame.lsp

;; Kame.lsp - graphics
;; written by Jean-Pierre Berard
;;
;; 1 rad = 180/3.1415927 = 57.29578 deg
;; 1 deg = 0.017453292 rad

(set! color "blue")
(set! width  500)
(set! height 500)

(define (convert angle) (mul angle 0.017453292))
(define (adjacent-cos angle hypo) (mul hypo (cos (convert angle))))
(define (adjacent-tan angle opposite) (div opposite (tan (convert angle))))
(define (hypo-sin angle opposite) (div opposite (sin (convert angle))))
(define (hypo-cos angle adjacent) (div adjacent (cos (convert angle))))
(define (opposite-sin angle hypo) (mul hypo (sin (convert angle))))
(define (opposite-tan angle adjacent) (mul adjacent (tan (convert angle))))
(define (outer inner-angle) (sub 180 inner-angle))

(define (rectangular angle radius)
 (set! x (adjacent-cos angle radius))
 (set! y (opposite-sin angle radius))
 (println "x=" x " y=" y)
 true
)

(define (polar x y)
 (set! angle (div (atan (div y x)) 0.017453292))
 (set! radius (root (add (pow x 2) (pow y 2)) 2))
 (println "angle=" angle " radius=" radius)
 true
)

(define (triangulation side side-size)
 (set! y (div side-size 2))
 (set! angle (div 360 side 2))
 (set! x (adjacent-tan angle y))
 (set! radius (hypo-sin angle y))
 (println "angle=" angle " radius=" radius)
 (println "x=" x " y=" y)
 (pen 'yellow)
 (forward y)
 (right 90)
 (forward x)
 (right (sub 180 angle))
 (forward radius)
 true
)

(define (pseudo-polygon side n)
  (set! ratio (div 360 side))
  (dotimes (x side)
    (forward n)
    (right ratio))
  (left ratio)
  )

(define (polygon side n)
  (dotimes (x side)
    (forward n)
    (right (div 360 side))
  ))

(define (oval x y)
 (set! Y (sub lastY (div y 2)))
 (tk ".kw.canvas create oval "
     (join (map string (list lastX Y (add lastX x) (add Y y))) " ")
     " -outline " color)
 (round (div direction 0.017453292))
)

(define (circle n)
 (set! X 0)
 (set! x (round lastX))
 (set! y (round lastY))
 (set 'direction -1.570796327)
 (set! ratio (mul (div 57.29578 n) 2))
 (until (and (= x X) (= y (round lastY)))
   (set! X (round lastX))
   (forward 1)
   (right ratio))
)

(define (cercle n)
 (set! x (round lastX))
 (set! y (round lastY))
 (set! lastX (+ x n))
 (for (t 0 2 0.005) ;; from 0 to 2 rad
    (set! newX (mul n (cos (mul pi t))))
    (set! newY (mul n (sin (mul pi t))))
    (set! newX (add newX x))
    (set! newY (add newY y))
    (tk ".kw.canvas create line "
        (join (map string (list lastX lastY newX newY)) " ")
       " -fill " color)
    (set 'lastX newX)
    (set 'lastY newY))
 (set 'lastX x)
 (set 'lastY y)
 (round (div direction 0.017453292))
)

(define (rose clr)
  (set 'color clr)
  (dotimes (x 90)
   (pseudo-polygon 4 60)
   (right 2))
 )

(define (square n)
  (dotimes (x 4)
    (forward n)
    (right 90))
   )

(define (squirl n)
  (dotimes (x (/ n 3))
    (forward n)
    (right 90)
    (set! n (- n 2)))
  (round (div direction 0.017453292))
 )

(define (dragon sign level)
  (if (= 0 level)
    (forward 4)
    (begin
      (dec 'level)
      (right (sign 45))
      (dragon - level)
      (left (sign 90))
      (dragon + level)
      (right (sign 45))
     )))

(define (dragon-curve n clr)
  (set 'color clr)
  (dragon + n)
  )

(define (right d)
  (set 'direction (add direction (mul d 0.017453292)))
  (round (div direction 0.017453292)))

(define (left d)
  (set 'direction (sub direction (mul d 0.017453292)))
  (round (div direction 0.017453292)))

(define (forward d)
  (set 'newX (add lastX (mul (cos direction) d)))
  (set 'newY (add lastY (mul (sin direction) d)))
  (tk ".kw.canvas create line "
       (join (map string (list lastX lastY newX newY)) " ")
      " -fill " color)
  (tk "update idletasks")
  (set 'lastX newX)
  (set 'lastY newY)
  (round (div direction 0.017453292))
 )

(define (backward d)
 (set! direction (mul -1 direction))
 (forward d)
)

(define (pen clr) (set! color (string clr)))

(define (clear)                                ;; upper left and lower right
 (tk ".kw.canvas create rectangle 0 0 "
       (join (map string (list width height)) " ")
       " -fill black -tag clear")
 (center)
)

(define (center)
  (set 'lastX (/ width 2))
  (set 'lastY (/ height 2))
  (set 'direction -1.570796327))

(define (start x y)
  (set 'lastX x)
  (set 'lastY y)
  (set 'direction -1.570796327))

(define (goto x y)
  (set 'lastX x)
  (set 'lastY y)
  (round (div direction 0.017453292))
  )

(begin
 (set! today (parse (date (apply date-value (now)))))
 (println (car today) " " (cadr today) " " (caddr today))
 (set! nihongo {\u4e80\u3000\u4f5c\u56f3})
 (tk "if {[winfo exists .kw] == 1} {destroy .kw}")
 (tk "toplevel .kw")
 (tk "canvas .kw.canvas -width " width " -height " height " -bg black")
 (tk "pack .kw.canvas")
 (tk "wm geometry .kw +290+25")
 (tk "wm title .kw { Kame.lsp}")
 (tk "bind .kw exit")
 (start 50 450)
 (squirl 400)
 (rose "red")
 (tk ".kw.canvas create text 130 380 "
     "-fill white -font {Times 22 normal} -text " nihongo)
 )

(define (help)
 (println "outer inner-angle")
 (println "adjacent-cos angle hypo")
 (println "adjacent-tan angle opposite")
 (println "hypo-sin angle opposite")
 (println "hypo-cos angle adjacent")
 (println "opposite-sin angle hypo")
 (println "opposite-tan angle adjacent")
 (println "triangulation side side-size")
 (println "rectangular angle radius")
 (println "polar x y")
 true
)

jp

#12
Quote from: "jp"Running an equivalent program, the UTF-8 EXE was able to carry out all its calculations and display Japanese without any problems.

Jean-Pierre



========= Kame.lsp



Sorry, the second statement after (begin

(println (car today) " " (cadr today) " " (caddr today))

has to be replaced with

(println (nth 0 today) " " (nth 1 today) " " (nth 3 today))



One also has to add the function

(define (round n) (floor (add n 0.5)))

to make it a standard newLISP script.



Jean-Pierre
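
The 'round' helper above rounds by adding 0.5 and taking the floor. For comparison, here is the same formula as a Python sketch; the function name `round_half_up` is an assumption, and it returns an integer, which may differ from the type newLISP's floor returns.

```python
import math


def round_half_up(n):
    """Round to the nearest integer; exact halves go toward +infinity."""
    return math.floor(n + 0.5)


print(round_half_up(302.1612092))  # 302
print(round_half_up(2.5))          # 3
print(round_half_up(-2.5))         # -2
print(round(2.5))                  # 2 (Python's built-in rounds halves to even)
```

For the pixel coordinates used in Kame.lsp the difference between the two rounding rules only shows up on exact halves.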

Lutz

#13
Thanks Jean-Pierre and Hans-Peter for all the input about the UTF-8 version on Win32. It seems that things are working ok, except for the UTF-8 version of 'trim'.



I found the problem with 'trim' and fixed it for the next development version 8.0.9, probably out tomorrow. I still want to retest all other UTF-8 enabled functions on Windows, which is a bit tedious, because I have to write all strings before and after manipulation to a file and then view them with notepad.exe, to see whether they work correctly on character rather than byte borders and do not change things they shouldn't.



Lutz
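
The difference between character borders and byte borders can be checked mechanically. The Python sketch below is an illustration only, not the fix that went into 8.0.9; it lists the byte offsets at which a UTF-8 string may be cut without splitting a character. The function names are assumptions.

```python
def char_boundaries(data):
    """Byte offsets where a UTF-8 character starts, plus the end offset."""
    # Continuation bytes have the bit pattern 10xxxxxx; all others start a character.
    starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    return starts + [len(data)]


def splits_character(data, offset):
    """Return True if cutting at this byte offset would split a character."""
    return offset not in char_boundaries(data)


data = "Höhe".encode("utf-8")     # bytes: 48 c3 b6 68 65
print(char_boundaries(data))       # [0, 1, 3, 4, 5]
print(splits_character(data, 2))   # True: offset 2 is inside 'ö'
print(splits_character(data, 3))   # False
```

A string function that counts or cuts in bytes is safe only at the offsets this check accepts.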

jp

#14
Lutz



Well done, everything seems to be fixed except for the best feature of all: the ability of newLISP to communicate directly with the clipboard. newlisp-utf8.exe seems to completely disable the clipboard on both Unicode-based and non-Unicode-based Windows (Win98/ME).

Also, newlisp-utf8.dll and, alas, newlisp.dll as well cannot run processes with the exec function; only the '!' shell-out function works.



Jean-Pierre