guiserver does not work with UTF8 encoding

Started by unya, August 06, 2010, 03:30:33 AM


unya

Hello,



Java (guiserver.jar) needs the start option "-Dfile.encoding=UTF8".



I tried adding setProperty() in guiserver (main), but it does NOT WORK. (A Java bug?)


System.setProperty("file.encoding", "UTF8") and "UTF-8"

After I added the Java start option as shown in the change below, the font-name demo worked.



---- guiserver.lsp changed ----
(define (init (portIn 47011) (host "127.0.0.1") manual)
        ; check for server portIn and if this was started by java
        (if (main-args 2) (set 'portIn (int (main-args 2) portIn)))
        ; if guiserver.jar did not start this process then guiserver.jar
        ; still has to be started, except when manual parameter is true
        (if (and (not (= (main-args 3) "javastart")) (not manual))
                (if (= ostype "Win32")
                        ;(process (string "cmd /c " server-path " " portIn))
                        (process (string "javaw.exe -Dfile.encoding=UTF8 -jar " server-path " " portIn))

                        (= ostype "OSX")
                        ;(process (string "/usr/bin/java -jar " server-path " " portIn))
                        (process (string "/usr/bin/java -Dfile.encoding=UTF8 -jar " server-path " " portIn))

                        (env "JAVA_HOME")
                        ;(process (string (env "JAVA_HOME") "/bin/java -jar " server-path " " portIn))
                        (process (string (env "JAVA_HOME") "/bin/java -Dfile.encoding=UTF8 -jar " server-path " " portIn))

                        ;(process (string "/usr/bin/java -jar " server-path " " portIn))
                        (process (string "/usr/bin/java -Dfile.encoding=UTF8 -jar " server-path " " portIn))

                )
        )
   ....

   ....
----


thanks,

Lutz

#1
newLISP-GS switches character sets during run time, not at Java start-up! So when you specify '-Dfile.encoding=UTF8' during Java start-up, it will get overwritten by 'init' in guiserver.lsp to match whatever flavor of newLISP is running. In many of the Java files you find the following lines:


if(guiserver.UTF8)
            try {
            text = new String(text.getBytes(), "UTF-8");
            } catch (UnsupportedEncodingException ee) {}

The 'guiserver.UTF8' variable is set during the 'init' function of guiserver.lsp, which looks for the built-in 'utf8' primitive function in the newLISP process. If 'utf8' is not 'nil' then 'guiserver.UTF8' is set to 'true'. The following snippet from guiserver.lsp occurs in the definition of 'init':


(define (init (portIn 47011) (host "127.0.0.1") manual)
...
(gs:set-utf8 (primitive? MAIN:utf8))
)

The function 'gs:set-utf8' sets the 'guiserver.UTF8' variable in Java. So guiserver.lsp does the switching during run-time automatically depending on newLISP version running.  As an alternative, one could use System.setProperty() during run time.



The best would be to ship only the newLISP UTF8 version for MS Windows, as done on Mac OS X, but many European countries use special flavors of the 8-bit ISO 8859 character set. As many people use editors and text files encoded in these character sets, they wouldn't be able to work with the UTF8 version of newLISP. Countries using multibyte UNICODE character sets in Windows (mostly in Asia) can download the UTF8 versions at http://www.newlisp.org/downloads/UTF-8_win32/ .

unya

#2
I know, but the socket port encoding of Java (guiserver.jar) is 'MS932'.



My newLISP is compiled and running with UTF8 on Windows XP.

newLISP v.10.2.12 on Win32 IPv4/6 UTF-8, execute 'newlisp -h' for more info.



java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b07)
Java HotSpot(TM) Client VM (build 17.0-b17, mixed mode, sharing)


allfonts-demo.lsp(added gs:set-utf8)

...
(gs:init)
(gs:set-trace true)

(gs:set-utf8 true)

(gs:frame 'AllFontsDemo 100 100 500 400)
(gs:set-background 'AllFontsDemo 1 1 1)
(gs:get-fonts)
(gs:panel 'FontPanel)
(gs:set-grid-layout 'FontPanel (length gs:fonts) 1 0 0)
...


guiserver.java (prints the socket encoding)

...
out = new PrintWriter(new OutputStreamWriter(socket.getOutputStream()));

System.out.println("server connected");

// puts Server socket Encoding.
System.out.println("portIn: " +
  new InputStreamReader(client.getInputStream()).getEncoding());
System.out.println("portOut: " +
  new OutputStreamWriter(socket.getOutputStream()).getEncoding());

Dispatcher.init();

String cmd = null;
try {
while(listening)
...


Not working: the Japanese font names cannot be seen.

C:localHomenewlisp-10.2.12guiserver>java -jar guiserver.jar 20000 allfonts-demo.lsp
newLISP-GS v.1.38 on Windows XP
 double buffering not supported.
guiserver starting newLISP "newlisp allfonts-demo.lsp 20000 javastart &"
guiserver finished exec
 listening on 20000
 accepted connection from 0.0.0.0
 connecting to 0.0.0.0:20001
server connected
portIn: MS932
portOut: MS932
-> set-utf8 System true
-> frame MAIN:AllFontsDemo 100 100 500 400
-> set-color MAIN:AllFontsDemo 1 1 1 1
-> get-fonts System
-> panel MAIN:FontPanel
-> set-grid-layout MAIN:FontPanel 108 1 0 0
-> set-text MAIN:AllFontsDemo QWxsIDEwOCBmb250cyBvbiB0aGlzIHN5c3RlbQ==
-> label label-0 QXJpYWw=
Arial


Working: the Japanese font names can be seen.

C:localHomenewlisp-10.2.12guiserver>java -Dfile.encoding=UTF8 -jar guiserver.jar 20000 allfonts-demo.lsp
newLISP-GS v.1.38 on Windows XP
 double buffering not supported.
guiserver starting newLISP "newlisp allfonts-demo.lsp 20000 javastart &"
guiserver finished exec
 listening on 20000
 accepted connection from 0.0.0.0
 connecting to 0.0.0.0:20001
server connected
portIn: UTF8
portOut: UTF8
-> set-utf8 System true
-> frame MAIN:AllFontsDemo 100 100 500 400
-> set-color MAIN:AllFontsDemo 1 1 1 1
-> get-fonts System
-> panel MAIN:FontPanel
-> set-grid-layout MAIN:FontPanel 108 1 0 0
-> set-text MAIN:AllFontsDemo QWxsIDEwOCBmb250cyBvbiB0aGlzIHN5c3RlbQ==
-> label label-0 QXJpYWw=
Arial
...


In a Japanese Windows environment, Java often has trouble with UTF8, and Q&A posts can be found saying that the "-Dfile.encoding=UTF8" option is necessary.
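Editor's note: the "portIn"/"portOut" lines in the logs above come from readers and writers created without a charset argument, so they follow the JVM default. As a minimal sketch (not guiserver's actual code), the charset can also be pinned when the stream is wrapped. The class name StreamEncodingDemo and the in-memory streams standing in for the socket are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of guiserver.jar.
class StreamEncodingDemo {

    // Encoding name a reader reports when no charset is given: the JVM default
    // (reported as MS932 in the first log of this post, UTF8 in the second).
    static String implicitEncoding() {
        return new InputStreamReader(new ByteArrayInputStream(new byte[0])).getEncoding();
    }

    // Encoding name a reader reports when UTF-8 is requested explicitly.
    static String explicitEncoding() {
        return new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), StandardCharsets.UTF_8).getEncoding();
    }

    // Write text through a writer pinned to UTF-8 and return the raw bytes.
    // The ByteArrayOutputStream stands in for socket.getOutputStream().
    static byte[] writeUtf8(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStreamWriter w = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("implicit: " + implicitEncoding());
        System.out.println("explicit: " + explicitEncoding());
        // U+3042 (hiragana "a") occupies three bytes in UTF-8.
        System.out.println("bytes: " + writeUtf8("\u3042").length);
    }
}
```

With the charset fixed in code, the bytes on the wire no longer depend on the start option or the Windows locale.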



Thank you,

Lutz

#3
Is other Japanese text in the editor and output fine? With Russian UTF8 I cannot repeat the problem in either WinXP, Linux or Mac OS X, but have no font names with utf8 to test.



I wonder if this problem is only in gs:get-fonts. Does the following fix for getFonts() in guiserver.java solve the problem ?


public void getFonts(StringTokenizer tokens)
    {
    GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
    String[] fontNames = ge.getAvailableFontFamilyNames();
    String item;

    guiserver.out.print("(set 'gs:fonts '( ");
    for(int i = 0; i < fontNames.length; i++)
        {
        item = fontNames[i];
        if(guiserver.UTF8)
            try { item = new String(item.getBytes("UTF-8"));
            } catch (UnsupportedEncodingException exc) { }
        guiserver.out.print("\"" + Base64Coder.encodeString(item) + "\" ");
        }
    guiserver.out.println(")) ");
    guiserver.out.flush();
    }

unya

#4
I tried only Label/ListBox widget.

I'll check...



I tried your patch.

The font names can be seen (read) in the Windows terminal, but UTF8 font names should not be readable there.

guiserver returns the font names in MS932 when Java is started without the option.



P.S.

I tried System.setProperty("file.encoding", "UTF8"); it has NO EFFECT on the encoding in this Java version.

In Java 1.6, "-Dfile.encoding" seems to be effective.

The CJK versions of Java seem to treat "file.encoding" differently.
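Editor's note: the observation that setProperty() has no effect fits how the JVM treats the default charset, which is determined at start-up and then cached. The following is a small sketch for checking this on a given JVM; the class name DefaultCharsetDemo is made up for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of guiserver.jar.
class DefaultCharsetDemo {

    // Returns {default charset before, default charset after} a runtime
    // change of the "file.encoding" system property.
    static Charset[] changePropertyAtRuntime() {
        Charset before = Charset.defaultCharset();
        String saved = System.getProperty("file.encoding");

        // Pick a value that differs from the current default.
        String other = before.name().equals("UTF-8") ? "ISO-8859-1" : "UTF-8";
        System.setProperty("file.encoding", other);

        // The JVM keeps using the charset it determined earlier.
        Charset after = Charset.defaultCharset();

        // Restore the property so the demo leaves no trace.
        if (saved != null) {
            System.setProperty("file.encoding", saved);
        } else {
            System.clearProperty("file.encoding");
        }
        return new Charset[] { before, after };
    }

    public static void main(String[] args) {
        Charset[] result = changePropertyAtRuntime();
        System.out.println("before: " + result[0] + ", after: " + result[1]);
    }
}
```

Both values should be the same, which is why the option has to be given on the command line (or the charset named explicitly in code) rather than set from main().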

unya

#5
I found the problem.



Java's String class uses the default encoding of the environment.

Naming UTF8 explicitly for the string should solve the problem.



I tried changing LabelWidget.java and guiserver.java as follows:



LabelWidget.java (diff)

$ diff -u LabelWidget.java.org  LabelWidget.java
--- LabelWidget.java.org        2010-08-06 05:07:02 +0900
+++ LabelWidget.java    2010-08-10 14:27:38 +0900
@@ -74,7 +74,8 @@

        if(guiserver.UTF8)
                try {
-               text = new String(text.getBytes(), "UTF-8");
+                   text = new String(text.getBytes("UTF-8"), "UTF-8");
+                   System.out.println(text); // deleted from source file.
                } catch (UnsupportedEncodingException ee) {}

        label.setText(text);
@@ -86,7 +87,7 @@
        String text = label.getText();
        if(guiserver.UTF8)
                try {
-                       text = new String(text.getBytes("UTF-8"));
+                       text = new String(text.getBytes("UTF-8"), "UTF-8");
                        }
                catch (UnsupportedEncodingException e) {}




guiserver.java

public void getFonts(StringTokenizer tokens)
{
    GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
    String[] fontNames = ge.getAvailableFontFamilyNames();
    String item;

    guiserver.out.print("(set 'gs:fonts '( ");
    for(int i = 0; i < fontNames.length; i++)
        {
        item = fontNames[i];
        if(guiserver.UTF8)
            try {
                item = new String(item.getBytes("UTF-8"), "UTF-8");
            } catch (UnsupportedEncodingException exc) { }
        guiserver.out.print("\"" + Base64Coder.encodeString(item) + "\" ");
        }
    guiserver.out.println(")) ");
    guiserver.out.flush();
    }


With the socket stream still in MS932, the font names can be read and guiserver works fine.

I think the other widgets will work too if their String construction is changed the same way.



Please give me some time to check all the widgets.



Thanks,

Lutz

#6
This works only for getting/displaying the fonts (retrieving Java-internal strings)


new String(text.getBytes("UTF-8"), "UTF-8")
in all other situations either
new String(text.getBytes(), "UTF-8") ; for setText()
or
new String(text.getBytes("UTF-8")) ; for getText()
should be used.
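Editor's note: the three forms differ only in which charset is used on each side of the byte array. The sketch below makes the charsets explicit so the effect is visible on any machine. ISO-8859-1 is used as a stand-in for the platform default, which is an assumption made for illustration; the class and method names are hypothetical as well.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of guiserver.jar.
class ReencodeDemo {

    // Stand-in for the platform default charset (assumption): ISO-8859-1
    // exists on every JVM and maps all 256 byte values one-to-one.
    static final Charset PLATFORM = StandardCharsets.ISO_8859_1;

    // Models text arriving from newLISP: UTF-8 bytes decoded with the platform charset.
    static String receivedFromSocket(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), PLATFORM);
    }

    // Models new String(text.getBytes(), "UTF-8"), the setText() direction.
    static String forSetText(String received) {
        return new String(received.getBytes(PLATFORM), StandardCharsets.UTF_8);
    }

    // Models new String(text.getBytes("UTF-8")), the getText() direction.
    static String forGetText(String javaText) {
        return new String(javaText.getBytes(StandardCharsets.UTF_8), PLATFORM);
    }

    // Same charset on both sides: yields an equal string for well-formed text.
    static String sameCharset(String javaText) {
        return new String(javaText.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u65E5\u672C\u8A9E"; // three kanji, written as escapes
        String received = receivedFromSocket(original);
        System.out.println(forSetText(received).equals(original));  // prints true
        System.out.println(forGetText(original).equals(received));  // prints true
        System.out.println(sameCharset(original).equals(original)); // prints true
    }
}
```

The setText() repair is lossless here because ISO-8859-1 can represent every byte. With a multibyte default such as MS932, some UTF-8 byte sequences may not survive the first decoding step, which could be related to the behavior reported in this thread, though that is a guess.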



A new newlisp-10.2.12.tgz can be found here:



http://www.newlisp.org/downloads/development/inprogress/



it contains the fix for 'gs:get-fonts' when font names contain UTF-8 characters, but it doesn't contain any other changes, as those will break UTF-8 behavior, at least in my tests on Mac OS X and Win32 XP and using widget-demo-ru.lsp and of course a UTF-8 version of newlisp.exe.



newlisp-x.x.x/guiserver/widget-demo-ru.lsp is a widget test program for the Russian UTF-8 characters. I wonder if you could make a widget-demo-jp.lsp for the Japanese language, which could be useful for testing and could be included in the distribution?



Also, I made the following strange observation on MS Windows. Starting on the command line with: "java -Dfile.encoding=UTF8 -jar guiserver.jar ..." is equivalent to starting newLISP-GS clicking the installed desktop icon.



Certain UTF-8 features would not work doing: "java -jar guiserver.jar ..." without the "-Dfile.encoding=UTF8", but they would work when starting with the desktop icon. If you look into the link-icon properties of the desktop icon you see the application started like: "guiserver.jar 47011 newlisp-edit.lsp", as if ".jar" is registered to be started with java.exe or javaw.exe. It looks similar to the command line but has a different effect.

unya

#7
I created a Japanese version of the test, "widget-demo-jp.lsp", and attached it.



In addition, I added images.

The images of the Russian and Japanese versions show the result of running with -Dfile.encoding=UTF8.

Lutz

#8
Runs fine on Mac OS X 10.6 and UBUNTU Linux 10.4 under newLISP 10.2.8 and 10.2.12 and with all of the following methods and without the gs:set-utf8 statement in widgets-demo-jp.lsp :



- newlisp widgets-demo-jp.lsp

- java -jar /usr/share/newlisp/guiserver.jar 47011 widgets-demo-jp.lsp

- start newLISP-GS from desktop icon, then load widgets-demo-jp.lsp



On Windows, I don't have the Japanese font installed, but the Russian version runs.



I wonder what it is with either the MS932 font setting or some other specialty of your Windows installation, that it only works with java -Dfile.encoding=UTF8 on your system?



Perhaps other Japanese users (e.g. Johu ?) can help us with their experience?



In any case, many thanks for your help Unya.

johu

#9
Thanks for this work, unya and Lutz.

At last, I know how to run newLISP-GS in UTF8 mode.


Quote: I wonder what it is with either the MS932 font setting or some other specialty of your Windows installation, that it only works with java -Dfile.encoding=UTF8 on your system?


If that means a widgets-demo-jp.lsp that runs in both Shift-JIS (MS932) and UTF8 encodings, it may be the one at the following link:



http://cid-23a9a25e1aec3626.office.live.com/self.aspx/.Public/widgets-demo-jp.zip



P.S.

Sorry, I had uploaded the wrong file.

I have corrected it now.

unya

#10
Thank you, Lutz and johu,

I have one more idea.



Because Java is aware of the encoding of the system environment it runs in, it may convert the character stream.

I investigated: newLISP is UTF8 and Java is MS932, so a mismatch occurs.

(Java returns the font names in MS932, Base64-encoded.)

So for Java to work, -Dfile.encoding=UTF8 might be required.



With the changes below, allfonts-demo.lsp worked without -Dfile.encoding=UTF8.



I do not know which method is better/best.



Replace encodeString() and decodeString()



I think String.getBytes() depends on the system locale environment. (comment added afterwards)



// Base64Coder.java

import java.io.UnsupportedEncodingException ;

...

//public static String encodeString (String s) {
//   return new String(encode(s.getBytes())); }
public static String encodeString (String s) {
    byte [] bs ;
    try {
        bs = s.getBytes("UTF-8") ;
        String r = new String(encode(bs)) ;
//      System.out.println(r) ;
        return r ;
    } catch (UnsupportedEncodingException ee) {}
    return new String("") ;
}

...

//public static String decodeString (String s) {
// if(s.equals("nil")) return("");
//   return new String(decode(s)); }
public static String decodeString(String s) {
    System.out.println("decode in : " + s) ;
    if(s.equals("nil")) return("");
    try {
        String r = new String(decode(s), "UTF-8") ;
//      System.out.println("decode : " + r) ;
        return r ;
    } catch (UnsupportedEncodingException ee) {}
    return ("") ;
}
...
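Editor's note: the idea in the listing above is that only Base64 text (plain ASCII) crosses the socket, and the UTF-8 step is named explicitly on both ends. Here is a self-contained sketch of the same idea. It uses java.util.Base64 (available since Java 8) in place of the Base64Coder class from the newLISP sources, and the class name Utf8Base64Demo is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; java.util.Base64 stands in for Base64Coder.
class Utf8Base64Demo {

    // String -> UTF-8 bytes -> Base64 text, independent of the platform default.
    static String encodeString(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    // Base64 text -> bytes -> String decoded as UTF-8; "nil" maps to the
    // empty string, as in the listing above.
    static String decodeString(String s) {
        if (s.equals("nil")) {
            return "";
        }
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same value as the "label label-0" line in the logs earlier in the thread.
        System.out.println(encodeString("Arial")); // prints QXJpYWw=

        // Full-width "MS" followed by katakana, written as escapes.
        String fontName = "\uFF2D\uFF33 \u30B4\u30B7\u30C3\u30AF";
        System.out.println(decodeString(encodeString(fontName)).equals(fontName)); // prints true
    }
}
```

Because the charset is fixed on both sides, the result is the same whether or not Java was started with -Dfile.encoding=UTF8.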



Without having to care about the text encoding, Base64-encoded UTF8 text passes through the stream (socket).

// guiserver.java

public void getFonts(StringTokenizer tokens)
{
    GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
    String[] fontNames = ge.getAvailableFontFamilyNames();
    String item;

    guiserver.out.print("(set 'gs:fonts '( ");
    for(int i = 0; i < fontNames.length; i++)
        {
        item = fontNames[i];
        guiserver.out.print("\"" + Base64Coder.encodeString(item) + "\" ");
        }
    guiserver.out.println(")) ");
    guiserver.out.flush();
    }





The label widget likewise receives Base64-encoded UTF8 text through the stream (socket), without having to care about the text encoding.

// LabelWidget.java
...
public void setText(StringTokenizer tokens)
{
    String base64text = tokens.nextToken() ;
//  System.out.println("Label setText : " + base64text) ;
    String text = Base64Coder.decodeString(base64text) ;
    label.setText(text);
}
...


Thank you for using your precious time.

Lutz

#11
Thanks Unya and Johu.



I have changed Base64Coder.java to include Unya's encodeStringUTF8(String) and decodeStringUTF8(String), and all widgets now use these and avoid String.getBytes():



http://www.newlisp.org/downloads/development/inprogress/



I hope in newlisp-10.2.13.tgz character encoding MS932 now works without -Dfile.encoding=UTF8



ps: updated newlisp-10.2.13.tgz, a few things were still missing

unya

#12
Thank you Lutz and Johu.



newlisp-10.2.13.tgz is working without -Dfile.encoding=UTF8.



As a test, I deleted "if (guiserver.UTF8) { ... }" from all the Java sources (based on 10.2.12).



It works without "-Dfile.encoding=UTF8".

Users can then select the Java runtime environment: UTF8 (with "-Dfile.encoding=UTF8") or the system locale (without the option).



java/*.java

guiserver.jar



I added Dump.java for hex-dump debugging.

Lutz

#13
Quote: ... which method is better/best.

Both work well now, but using "if (guiserver.UTF8)" we can switch UTF-8 mode on/off from newLISP. This way we can start guiserver.jar first, as required when starting with the desktop icon on Windows or on Mac OS X. When loading "guiserver.lsp", newLISP then automatically switches to UTF-8 in 'gs:init', if required.

unya

#14
Thank you, Lutz and Johu.