What's the best way to check whether the current newLISP session is running in UTF8 mode? And is it then possible to switch between UTF8 and non-UTF8 functions, such as utf8len <-> length?
I'm trying to make sure that a program runs OK on both types of newLISP, but I'm not sure how it can be done.
BTW: who isn't using UTF8 these days?
Quote from: "cormullion"
BTW: who isn't using UTF8 these days?
ME ;)
It's easier to use NON-UTF-8 with data generated by and for Windows legacy apps... (Fewer surprises!)
-- xytroxon
Quote
BTW: who isn't using UTF8 these days?
Me too; users of the DLL use it in an environment which also does not support it.
I don't use UTF-8; actually, you don't want to know my opinion on UTF-8 ;-)
If a user has 2 binaries, you could execute a pre-check script based on UTF-8
..
But i liked your check in the GS color gadget...
PS: Who uses UTF-8 these days?
Voice is the gadget people..:) (like it was 15 years ago ;-)
If this works:
(if-not unicode (begin (println "need UTF version") (exit)))
I'll use something similar.
Hard for me to test, of course... :)
My understanding had been that a Unicode version of newLISP can process all data generated by non-Unicode systems... but that non-Unicode versions of newLISP can't process data generated by Unicode applications. Perhaps I'd got it wrong.
I think that is indeed correct: the UTF-8 version can handle non-UTF-8 code.
Here are some ways to test if the running newLISP is UTF-8 enabled:
(= (char (char 1000)) 1000) => true on UTF-8 versions
(primitive? utf8) => true on UTF-8
(primitive? unicode) => true on UTF-8
; or simply check if the utf8 or unicode function is there
(if utf8 true nil) => true on UTF-8
(if unicode true nil) => true on UTF-8
The difference between the two versions is the working of several functions dealing with strings:
http://www.newlisp.org/downloads/newlisp_manual.html#utf8_capable
and the addition of three functions: 'utf8', 'utf8len' and 'unicode'. The UTF-8 version can handle plain ASCII (non-UTF-8) strings without a problem. A problem could occur when processing one-byte character sets that encode characters in the range 128-255. Portions of such text could be mistaken for UTF-8 by the UTF-8 version of newLISP when using the functions in the above link. This could occur with popular Windows one-byte ISO-8859 character sets that use the bytes beyond 127.
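As a rough sketch of how the tests above could answer the original question (switching between utf8len and length), something like the following might work. The names utf8-build? and char-count are made up for this example and are not part of newLISP; the sketch assumes that 'unicode' evaluates to nil on a non-UTF-8 build, as the tests above imply.

```newlisp
; Hypothetical helper (not a newLISP built-in):
; true when running on a UTF-8 enabled newLISP, nil otherwise.
(define (utf8-build?)
  (primitive? unicode))

; Hypothetical helper (not a newLISP built-in):
; count characters on a UTF-8 build, bytes on a one-byte build.
; The utf8len branch is only evaluated when the function exists.
(define (char-count str)
  (if (utf8-build?)
      (utf8len str)
      (length str)))

; Plain ASCII text gives the same count on both builds.
(println (char-count "abc"))
```

The rest of the program can then call char-count everywhere instead of choosing between utf8len and length by hand. Note that this only picks the counting function; it does not protect against one-byte text in the 128-255 range being misread on the UTF-8 version.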