How can I make 'read-file' work with /dev/stdin? When my application doesn't have a file specified by the user, I want to fall back to stdin to slurp my data in. The input is potentially UTF-8 data, and I will want to iterate over each character in the stream.
Writing to /dev/stdout works just fine.
Perhaps the simplest solution is to let 'read-file' etc. accept file descriptors (integers) as well as file name strings?
Example code:
(setq from-format (sym (or (env "FROM") "utf8")))
(setq to-format (sym (or (env "TO") "neo-paleo-hebrew")))
(setq output-file (or (env "SAVEAS") "/dev/stdout"))
(setq input-file (or (env "OPEN") "/dev/stdin"))
(dostring (c (read-file input-file))
  (append-file output-file (char c)))
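Until 'read-file' can open /dev/stdin, one possible workaround is to slurp stdin line by line and only call 'read-file' when a file name was actually given. This is a hedged sketch, not the official solution; it assumes a UTF-8 enabled newLISP, and note that 'read-line' strips the line feed, so the sketch re-appends one (which may add a spurious trailing newline to the last line):

```newlisp
; sketch: fall back to line-wise stdin reading when OPEN is not set
(setq input-file (env "OPEN"))

(setq data
  (if input-file
      (read-file input-file)
      ; slurp all of stdin; read-line strips the line feed, so put it back
      (let (buf "" line nil)
        (while (setq line (read-line))
          (setq buf (append buf line "\n")))
        buf)))

; iterate over UTF-8 characters (in a UTF-8 enabled build)
(dolist (c (explode data))
  (print c))
```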
'read-file' can only be used to read from named files.
You could use 'read-char' to read from stdin, using 0 as the device; returned values will be one byte at a time. Or use 'read-line' and then 'explode' the line into UTF-8 multi-byte characters. In both cases processing does not start until a line feed is received.
#!/usr/bin/newlisp
(while (!= (setq ch (read-char 0)) (char "q"))
  (println "->" ch))
(exit)
Or use 'read-key', which processes immediately after each keystroke:
#!/usr/bin/newlisp
(while (!= (setq ch (read-key)) (char "q"))
  (println "->" ch))
(exit)
Lutz, could it be time to rename 'read-char' to 'read-byte', and make 'read-char' do the UTF-8 thing?
Would it be hard to make 'read-file' slurp in fd 0? Generally, I cat a file into the program, and that is no different from reading a regular file. The only difference between 'read-file' and 'read-char' is that one takes an already opened file descriptor, while the other wants to do the open itself. Once 'read-file' had accepted all available input from fd 0, it would make sense for it to close the descriptor.
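The behavior asked for above can be approximated in user code. Here is a hedged sketch of a hypothetical 'slurp-stdin' helper (not part of newLISP) that emulates 'read-file' on fd 0 using 'read-char'. It uses 'pack' with the "b" (unsigned byte) specifier to keep each octet a single byte, because in a UTF-8 enabled build (char b) would expand values above 127 into multi-byte sequences:

```newlisp
; hypothetical helper: slurp all octets from fd 0 into a string
(define (slurp-stdin)
  (let (buf "" b nil)
    (while (setq b (read-char 0))            ; read-char returns nil at EOF
      (setq buf (append buf (pack "b" b))))  ; append the raw octet unchanged
    buf))
```

A program could then use (or (env "OPEN") nil) to decide between (read-file ...) and (slurp-stdin).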
I see 'read-char', 'read-buffer' and 'read-line' as the low-level API, using file descriptors and dealing with octets, and 'read-file'/'write-file'/'append-file' as the high-level API, dealing with file names.
A future version of newLISP will have a 'read-utf8', which works like 'read-char' but reads UTF-8 characters from a file handle or stdin. Until then I suggest using (explode (read-line ...)) instead; 'explode' splits a string on multi-byte character borders in UTF-8 enabled versions of newLISP.
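To make the suggestion concrete, here is a small sketch (assuming a UTF-8 enabled newLISP, where 'explode' splits on character borders rather than bytes):

```newlisp
; 'explode' on a UTF-8 string yields whole characters, not octets
(explode "שלום")  ; four Hebrew characters, each a multi-byte string

; so reading stdin character-wise could look like this:
(while (setq line (read-line))
  (dolist (c (explode line))
    (println "->" c)))
```

In a non-UTF-8 build the same 'explode' would instead return one element per byte.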
Quote from: "Lutz"
A future version of newLISP will have a 'read-utf8', which works like 'read-char' but reads UTF-8 characters from a file handle or stdin. Until then I suggest using (explode (read-line ...)) instead; 'explode' splits a string on multi-byte character borders in UTF-8 enabled versions of newLISP.
Thank you for the explanation. I'm meditating now on how the whole UTF-8 vs. byte-oriented paradigm works in newLISP. I should have spent more time meditating on the 'pack' and 'unpack' functions.
Does (char ...) use (pack ...) internally? Should it? I recently had some problems when trying to output some Hebrew characters in code page ISO-8859-8; I think (char ...) was interpreting them as UTF-8 characters.
'char' does not use 'pack' or 'unpack' internally, but 'char' is UTF-8-sensitive when using the UTF-8 enabled version of newLISP. If a Windows installation is working with ISO-8859-x character sets, then one should not use the UTF-8 enabled version for Hebrew and most Eastern European (Cyrillic) character sets.
When the UTF-8 enabled version of newLISP sees Hebrew ISO-8859-8 characters (greater than 127), it sees them not as Hebrew but as something else. All characters below 128 will be interpreted as ASCII in both ISO-8859-x and UTF-8/Unicode. All above 127 are one-byte-wide characters in ISO-8859-x character sets but could start different one- or multi-byte-wide characters in UTF-8.
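A quick way to see the difference is 'char' itself. This is an illustrative sketch: code point 224 is aleph in ISO-8859-8, but as a Unicode code point it is "à", which UTF-8 encodes in two bytes ('length' in newLISP counts bytes):

```newlisp
; non-UTF-8 build: (char 224) is a single byte, 0xE0
; UTF-8 build:     (char 224) is the two-byte UTF-8 sequence for "à"
(length (char 224))  ; 1 in a non-UTF-8 build, 2 in a UTF-8 build
```

So the same code point number produces different byte sequences depending on which build of newLISP is running.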
Here is a list of all functions sensitive to UTF-8 characters when using the UTF-8 enabled version: http://www.newlisp.org/newlisp_manual.html#utf8_capable
PS: by default the Windows installer shipped is non-UTF-8, and the Mac OS X installer is UTF-8 enabled.
Lutz,
Why do you make a difference in function names between UTF-8 and non-UTF-8?
It's somewhat awkward to have that inside a language...
I would expect global functionality in one function instead of separate behaviour.
Just a thought...
Some functions work differently in the UTF-8 and non-UTF-8 versions, like these: http://www.newlisp.org/newlisp_manual.html#utf8_capable They are meant to work strictly on displayable strings and can show this global behavior. But there are also cases where you need both versions, like 'read-char' and 'read-utf8'. If you let 'read-char' switch behavior like the others in that link, you would not be able to read binary files or ISO-8859 files correctly.