(directory) in Windows and UTF8

Started by Fritz, March 27, 2010, 11:44:02 AM


Fritz

I have just noticed that the newLISP directory operator returns its data in UTF-8, while the native Windows XP code page (in Russia) is windows-1251 (and the newlisp-edit.lsp module is in windows-1251 encoding too).



That is not a serious problem, because I can always use a construction like


(map decode-utf8-to-cp1251 (directory "."))

Fortunately, the Russian alphabet has only 33 letters, so decoding is easy. I just want to ask about future plans: will newLISP get an operator like the *nix iconv?
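For reference, here is a minimal sketch of what a hand-written UTF-8 to windows-1251 decoder for the Russian alphabet can look like. It is written in C because the byte arithmetic is the same in any language. The name utf8_to_cp1251 is hypothetical: Fritz's decode-utf8-to-cp1251 is his own newLISP function and is not shown in the thread. The sketch relies on the fact that А..я (U+0410..U+044F) map one-to-one onto bytes 0xC0..0xFF in windows-1251. Ё and ё are handled separately, and every other non-ASCII character becomes '?'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: hand-written UTF-8 -> windows-1251 decoder that only
   knows the Russian alphabet. Writes at most outsize-1 bytes plus a NUL and
   returns the number of bytes written (without the NUL). */
size_t utf8_to_cp1251(const char *in, char *out, size_t outsize)
{
    assert(in != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    if (outsize == 0)
        return 0;
    while (*p != '\0' && n + 1 < outsize) {
        unsigned char c;
        if (*p < 0x80) {                          /* ASCII passes through */
            c = *p++;
        } else if ((*p & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            /* two-byte UTF-8 sequence: decode the code point */
            unsigned cp = ((unsigned)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
            p += 2;
            if (cp >= 0x0410 && cp <= 0x044F)     /* А..я: contiguous block */
                c = (unsigned char)(cp - 0x0410 + 0xC0);
            else if (cp == 0x0401)                /* Ё */
                c = 0xA8;
            else if (cp == 0x0451)                /* ё */
                c = 0xB8;
            else
                c = '?';
        } else {
            /* longer sequences or stray bytes: skip the lead byte and its
               continuation bytes, emit a placeholder */
            p++;
            while ((*p & 0xC0) == 0x80)
                p++;
            c = '?';
        }
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

A table this small is only practical because the target alphabet fits in one contiguous Unicode block. For anything broader, a library such as iconv is the better tool.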

Cyril

#1
Quote from: "Fritz": I just want to ask about future politics: will newLISP in the future have an operator like *nix iconv


You can use iconv on Windows too. A lot of Unix-like applications for Windows come with the iconv library today: on my system there are... let me count... six instances of iconv.dll (yes, this IS DLL hell!). If you have none, download it from GnuWin32 (http://gnuwin32.sourceforge.net/packages/libiconv.htm); the library file is named libiconv2.dll in this package, but it is the same library. The library interface is a bit low-level, but newLISP does a very good job of accessing low-level interfaces. The following code is only a demo, but it works for me:


(import "iconv.dll" "libiconv_open")        ; see 1
(import "iconv.dll" "libiconv")
(import "iconv.dll" "libiconv_close")

(setq cd (libiconv_open "cp866" "utf-8"))   ; see 2, 3

(setq in ((directory ".") -1))              ; see 4
(setq out (pack "n1024"))                   ; see 5

(setq inbuf (pack "lu" (address in)))
(setq inlen (pack "lu" (length in)))
(setq outbuf (pack "lu" (address out)))
(setq outlen (pack "lu" (length out)))

(libiconv cd (address inbuf) (address inlen) (address outbuf) (address outlen))

(libiconv_close cd)

(println out)

(exit)



Some comments:



1) Put iconv.dll in the current directory, or give the full path in the import statements. Replace "iconv.dll" with "libiconv2.dll" if you downloaded it from the location mentioned above; everything else stays the same.

2) I assume you are using the UTF-8-enabled build of newLISP; in the plain 8-bit build, (directory) just returns cp1251.

3) I convert the name to cp866, not cp1251, because the console window uses cp866; with cp1251 it works the same way.

4) I created the file "привет.txt" just for this demo, so the name of the last file in the directory is the one converted.

5) I hope a 1024-byte buffer is enough.

6) There is absolutely NO error checking in this demo; production code must have some.



This demo happily prints "привет.txt". I hope you can develop it into production-quality code. ;-)



Update (two hours later):



The code above was incomplete: the out buffer still contained a 1024-byte string, most of which was zeros. Such a string can be printed, but it is not of much use otherwise. To extract the converted string from the buffer, subtract the outlen value left by the libiconv function (it is in fact the number of bytes left unused) from the original buffer length. So, instead of printing out directly, write the following at the end:


(setq result (slice out 0 (- (length out) (get-int outlen))))

(println result)
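The same call sequence, sketched in C against the iconv.h API (iconv_open, iconv, iconv_close), makes the "bytes left" arithmetic explicit. iconv advances both buffer pointers and decrements both counters, so the converted length is the original buffer size minus the final outleft. The helper name convert_fixed is hypothetical. The sketch assumes a glibc-style prototype where the input pointer is char ** (some libiconv builds declare it const char **). It also adds the minimal error checks that comment 6 asks for.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: converts the NUL-terminated string `in` from
   `fromcode` to `tocode` into the caller's buffer `out` (outsize bytes).
   Returns the number of converted bytes (NOT NUL-terminated), or
   (size_t)-1 on error with errno set by iconv_open or iconv. */
size_t convert_fixed(const char *tocode, const char *fromcode,
                     const char *in, char *out, size_t outsize)
{
    assert(in != NULL && out != NULL);
    iconv_t cd = iconv_open(tocode, fromcode);   /* target first, then source */
    if (cd == (iconv_t)-1)
        return (size_t)-1;                       /* e.g. EINVAL: unsupported pair */

    char *inbuf = (char *)in;      /* iconv advances inbuf and outbuf ... */
    size_t inleft = strlen(in);    /* ... and decrements inleft and outleft */
    char *outbuf = out;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    int saved_errno = errno;       /* iconv_close may overwrite errno */
    iconv_close(cd);
    if (rc == (size_t)-1) {
        errno = saved_errno;       /* EILSEQ, EINVAL or E2BIG */
        return (size_t)-1;
    }
    /* outleft is the "bytes left" value: converted = original size - left */
    return outsize - outleft;
}
```

Calling convert_fixed("CP866", "UTF-8", name, buf, sizeof buf) roughly corresponds to the demo plus the update above: the returned length plays the role of (- (length out) (get-int outlen)).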


Hope this helps.
With newLISP you can grow your lists from the right side!

Fritz

#2
Quote from: "Cyril": (import "iconv.dll" "libiconv")


The iconv.dll that came with GIMP did the trick for me! Thanks!

Lutz

#3
You could simplify this further and speed it up a little:


(setq inbuf (pack "lu" in))
(setq inlen (pack "lu" (length in)))
(setq outbuf (pack "lu" out))
(setq outlen (pack "lu" (length out)))

(libiconv cd inbuf inlen outbuf outlen)


Strings are automatically passed by their address to 'pack' and to imported functions, and therefore don't need the 'address' operator.

Cyril

#4
A bit late, but: a newLISP wrapper for iconv exists on Dmitry Chernyak's site (http://en.feautec.pp.ru/store/libs/doc/index.html). It was written to be Unix-specific, but it is easy to adapt for Windows: just change the library name (from ".so" to ".dll") and add the prefix "lib" to all three imported function names. Warning: the module works fine for me when correct arguments are passed, but its error handling seems to be broken. At least during my evaluation, it either went into an infinite loop or returned ordinary-looking results instead of an error on wrong (unconvertible) arguments. And no, I have not put much effort into investigating this, just a quick glance, sorry.
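One plausible cause of such an infinite loop is retrying after EILSEQ without advancing the input pointer. This is a guess: the module's code is not reproduced here. The C sketch below shows one way to avoid that pattern; it is not a patch for that module. On EILSEQ or EINVAL it drops one input byte and writes '?'. On E2BIG it doubles the output buffer, which also removes the fixed 1024-byte guess from the demo. The name convert_lenient is hypothetical. The sketch assumes a stateless target encoding such as cp866 or cp1251, so it makes no final flush call.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. Converts the NUL-terminated string `in` and returns a
   malloc'd, NUL-terminated result (the caller frees it), or NULL on failure.
   Assumes a stateless target encoding (no final flush call is made). */
char *convert_lenient(const char *tocode, const char *fromcode, const char *in)
{
    assert(in != NULL);
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1)
        return NULL;

    char *inbuf = (char *)in;
    size_t inleft = strlen(in);
    size_t cap = inleft + 16;          /* first guess, doubled when too small */
    size_t used = 0;
    int need_room = 0;
    char *out = malloc(cap);

    while (out != NULL && inleft > 0) {
        if (need_room || cap - used < 2) {
            char *bigger = realloc(out, cap * 2);
            if (bigger == NULL) {
                free(out);
                out = NULL;
                break;
            }
            out = bigger;
            cap *= 2;
            need_room = 0;
        }
        char *outbuf = out + used;
        size_t outleft = cap - used - 1;            /* keep one byte for the NUL */
        size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
        used = (size_t)(outbuf - out);
        if (rc != (size_t)-1)
            continue;                               /* all input consumed */

        if (errno == E2BIG) {
            need_room = 1;                          /* output buffer is full */
        } else if (errno == EILSEQ || errno == EINVAL) {
            /* Malformed, unconvertible or truncated input: skip ONE byte so
               the next call makes progress instead of looping forever. */
            if (cap - used < 2) {
                need_room = 1;                      /* no room for '?' yet */
                continue;
            }
            out[used++] = '?';
            inbuf++;
            inleft--;
        } else {
            free(out);                              /* unexpected error */
            out = NULL;
        }
    }
    iconv_close(cd);
    if (out != NULL)
        out[used] = '\0';
    return out;
}
```

Replacing byte by byte means a single unconvertible multi-byte character can turn into several '?' characters. That is acceptable for file names on a console, but it may not suit every use.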

IVShilov

#5
Hello, I have a problem importing iconv on WinXP (an old notebook): newLISP terminates right after the first function call, libiconv_open.





Importing from kernel32.dll and user32.dll works fine, but with iconv from GnuWin32 I have no luck.



How can I debug FFI calls? Calling MessageBoxA without parameters ends in the same process crash, but with the right set of parameters it works; so maybe
(libiconv_open "cp866" "utf-8")
is the wrong call?

How can I find out for certain which parameters, and of which types, I have to pass to which FFI function?

I have the "Dependency Walker" utility; do I need something more?



With the other versions of iconv.dll found on the system, newLISP behaves the same way: it crashes.
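One way to answer the which-parameters question is to read the C prototypes in iconv.h and check the library outside newLISP first. In GNU libiconv, the header maps iconv_open, iconv and iconv_close to the exported symbols libiconv_open, libiconv and libiconv_close. That is why those names appear in the import statements. The small probe below calls iconv_open the way the newLISP code does, with the target encoding first; probe_iconv is a hypothetical name. If the probe succeeds against the same DLL while newLISP still crashes, the problem is more likely on the FFI side than in the library.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>

/* The prototypes an FFI caller has to mirror, as declared by <iconv.h>:
     iconv_t iconv_open(const char *tocode, const char *fromcode);
     size_t  iconv(iconv_t cd, char **inbuf, size_t *inbytesleft,
                   char **outbuf, size_t *outbytesleft);
     int     iconv_close(iconv_t cd);
   iconv_t is an opaque pointer-sized handle; both iconv_open arguments are
   plain NUL-terminated char pointers, target encoding first. */

/* Hypothetical probe: returns 0 if the encoding pair can be opened,
   otherwise the errno reported by iconv_open (EINVAL = unsupported pair). */
int probe_iconv(const char *tocode, const char *fromcode)
{
    assert(tocode != NULL && fromcode != NULL);
    errno = 0;
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1) {
        int err = errno;
        fprintf(stderr, "iconv_open(\"%s\", \"%s\") failed: errno %d\n",
                tocode, fromcode, err);
        return err;
    }
    iconv_close(cd);
    return 0;
}
```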

rrq

#6
This problem might be caused by the passed-in strings being temporary, so that their allocated space is reclaimed too early, before the actual call is made. If that's the case, a wrapping like the following might do the trick:
(let ((IN "CP866") (OUT "UTF-8")) (libiconv_open (address IN) (address OUT)))
You might also get the same effect by declaring the imported function in full, with parameters as char*.



As far as I can tell from a cursory glance at nl-import.c, the parameters are deleted before the actual function call is made, though I may well have misread or misunderstood it, so don't hold your breath.