binary data through stdin

Started by m35, March 21, 2007, 05:00:25 PM


m35

I currently have a need to read binary data that is piped in from stdin in Windows (in particular I'm writing some cgi scripts). I've had this need before when running transformations on binary data piped from the command-line.



Unfortunately, since this is Windows, and since stdin is open in text mode, all "\r\n" byte combinations are read as "\n".



Since I couldn't think of any other good solutions to this problem, I tried adding a new primitive function (set-mode). I gave it a shot and it seems to be working well.



#ifdef WINCC
/* (set-mode handle mode): mode 0 = text, 1 = binary.
   Returns the previous mode (0 or 1), or nil on error. */
CELL * p_setmode(CELL * params)
{
    UINT handle;
    UINT mode;
    int result;

    params = getInteger(params, &handle);
    getInteger(params, &mode);

    if (mode == 0)
        result = setmode( handle, O_TEXT );
    else if (mode == 1)
        result = setmode( handle, O_BINARY );
    else
        return(nilCell);

    /* setmode() returns the previous translation mode, or -1 on failure */
    if (result == O_TEXT)
        return(stuffInteger(0));
    else if (result == O_BINARY)
        return(stuffInteger(1));
    else
        return(nilCell);
}
#endif


Is there any other way around this short of adding a new function to change the mode?

Lutz

#1
You can use 'read-buffer' or 'read-char' on 0 (zero) as a file handle for stdin and read binary data from stdin.



Lutz

m35

#2
Thanks for the reply Lutz. I had tried read-buffer, but unfortunately it still seemed to read it in text mode.



Here's some code to create a file full of "\r\n"
(set 'handle (open "aFile.ext" "write"))
(write-buffer handle (dup "\r\n" 10))
(close handle)


Looking at the file "aFile.ext" in a hex editor, I correctly see a 20 byte long file
0d 0a 0d 0a 0d 0a 0d 0a 0d 0a 0d 0a 0d 0a 0d 0a 0d 0a 0d 0a


Then if I create a file (e.g. "stdin.lsp") with the code
(println (read-buffer 0 'buff 50))
(print (source 'buff))


and run the command

newlisp.exe stdin.lsp < aFile.ext



I get the output of
10
(set 'buff "\n\n\n\n\n\n\n\n\n\n")


When the output should be
20
(set 'buff "\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n")


Edit:



Just tried read-char too with the stdin.lsp file
(setq buff '())
(while (setq ch (read-char 0))
    (push ch buff -1))
(print (source 'buff))


with the incorrect output of
(set 'buff '(10 10 10 10 10 10 10 10 10 10))

Lutz

#3
Don't use 'print', which will use text mode. Use 'write-buffer' to 1 (stdout) instead. Here is a short demo program working as a binary straight-through pipe:


#!/usr/bin/newlisp

(while (> (read-buffer 0 'buff 1024) 0)
        (write-buffer 1  buff))

(exit)


you can pipe a binary through it:



~> cat newlisp-9.1.1.tgz | ./binary-pipe > file.tgz
~> cmp newlisp-9.1.1.tgz file.tgz
~> ls -ltr newlisp-9.1.1.tgz file.tgz
-rw-r--r--   1 lutzmuel  lutzmuel  810061 Mar 21 23:39 newlisp-9.1.1.tgz
-rw-r--r--   1 lutzmuel  lutzmuel  810061 Mar 21 23:39 file.tgz
~>


Lutz



ps: this is on MacOS X but shouldn't be different on Win32, use type instead of cat



ps2: 'print' and 'println' will also stop at binary 0's while 'read-buffer', 'write-buffer' let all characters 0-255 pass.

m35

#4
Thanks for taking the time to provide all your feedback, and for the many very speedy replies.



Using a straight through pipe unfortunately doesn't tell me very much because it's both reading as text, then writing as text. The mangling translation through stdin is negated when translated again through stdout. If I perform any process with the data instead of just piping it back out, I find the data is different from the original file. In addition, if you would examine my code closely, you would notice that I was not using (print) to write any binary data (it was merely for feedback).





For better examples, I created a file called "filefullofcrlf.bin" that looks like this in a hex editor
0d 0a 0d 0a 0d 0a 0d 0a 0d 0a 0d 0a
It contains 6 pairs of "\r\n", for a file length of 12 bytes.



Then I used a variation of your code that instead writes directly to a file
#!/usr/bin/newlisp
; test1.lsp

(set 'fh (open "whatwasread.bin" "write"))
(while (> (read-buffer 0 'buff 1024) 0)
        (write-buffer fh buff))
(close fh)

(exit)


When I pipe in "filefullofcrlf.bin" by either

C:\> type filefullofcrlf.bin | newlisp.exe test1.lsp

or

C:\> newlisp.exe test1.lsp < filefullofcrlf.bin



I get the output file "whatwasread.bin" containing exactly 6 bytes, all of which are 0x0a ("\n").





Here's another test
#!/usr/bin/newlisp
; test2.lsp

(while (set 'ch (read-char 0))
    (if (= ch 13) ; (char 13) == "\r"
        (write-char 1 42))) ; Write a "*" for every "\r" in the file
                            ; (char 42) == "*"

(exit)


If I feed in "filefullofcrlf.bin", then this code produces no output at all.



Just to make sure I'm not crazy, I created another file "filefullofcr.bin" that contains 12 "\r" bytes and piped it into test2.lsp. The output was 12 stars ("*").





One final test
#!/usr/bin/newlisp
; test3.lsp

(while (read-char 0)
    (write-char 1 42)) ; Write a "*" for every byte in the file
                       ; (char 42) == "*"

(exit)


After feeding in "filefullofcrlf.bin", stdout shows me only 6 stars ("*") when it should be showing me 12.



All this strongly suggests that reading from stdin via (read-buffer) or (read-char) translates "\r\n" into "\n".
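The byte counts in the three tests are what a text-mode read would produce. The following is a minimal sketch in C that mimics that translation on an in-memory buffer. The helper name `collapse_crlf` is hypothetical and is not part of newLISP or the C runtime; it only illustrates why 12 bytes of CR LF pairs come back as 6 bytes while 12 lone CR bytes pass through unchanged.

```c
#include <stddef.h>

/* Hypothetical helper: mimics what a text-mode read does to a buffer
   by collapsing every CR LF pair into a single LF. A CR that is not
   followed by LF is kept. The buffer is rewritten in place and the
   new length is returned. */
size_t collapse_crlf(unsigned char *buf, size_t len)
{
    size_t in, out = 0;

    for (in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;               /* drop the CR; the LF is copied next */
        buf[out++] = buf[in];
    }
    return out;
}
```

Calling it on the contents of "filefullofcrlf.bin" leaves six 0x0a bytes, which matches what test1.lsp wrote to "whatwasread.bin".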



-----------------------------------------------------------



I also tested if reading binary 0's caused problems.



I created a 7 byte file "abc_nil_def.bin" which when viewed in a hex editor looks like
61 62 63 00 64 65 66
 a  b  c     d  e  f


I piped this file into test1.lsp and test3.lsp with the following results:

test1.lsp: "whatwasread.bin" contained 7 bytes that matched "abc_nil_def.bin" exactly.

test3.lsp: The output was 7 stars ("*")



This seems to suggest that (read-buffer) and (read-char) are unaffected by binary 0's.
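This result is what one would expect if the data is handled with an explicit byte count rather than as a NUL-terminated string. That is an assumption about the implementation, not something confirmed in this thread. A small C sketch of the difference, using the same 7 bytes as "abc_nil_def.bin" (the function names are made up for illustration):

```c
#include <stddef.h>
#include <string.h>

/* The 7 bytes of "abc_nil_def.bin": a b c NUL d e f */
static const unsigned char sample[7] = { 'a', 'b', 'c', 0, 'd', 'e', 'f' };

/* Length seen by code that treats the data as a C string:
   strlen() stops at the first NUL byte. */
size_t length_as_c_string(void)
{
    return strlen((const char *)sample);
}

/* Length seen by code that carries the byte count alongside the data. */
size_t length_as_buffer(void)
{
    return sizeof sample;
}
```

The first function reports 3 and the second reports 7, so anything built on the string view would lose "def".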



Edit:

All tests run on Windows XP Home and Pro with newLISP v.9.1.1 on Win32

Lutz

#5
This seems to be specific to Windows. In my experiments there is no CR/LF translation happening, and I think I had to deal with this in one of my programs before. I am travelling today but will try this on Windows tomorrow.



Lutz

Lutz

#6
Windows does drop CR's when going from stdin to memory and adds them back when going from memory to stdout.



I had this problem in httpd (a newLISP coded web-server shipped up to 9.0.0, now http mode is included in the newLISP executable). I solved this back then by just doing a (replace "\r\r\n" text "\r\n"), but there is a better solution:
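The doubled CR comes from the output side: a text-mode write puts a CR in front of every LF, including an LF that already has one. Here is a rough C sketch of that expansion on a buffer. The name `expand_lf` is hypothetical and only models the behaviour; it is not a runtime function.

```c
#include <stddef.h>

/* Hypothetical helper: mimics a text-mode write by emitting CR LF
   for every LF in src. dst must have room for 2 * len bytes.
   Returns the number of bytes written to dst. */
size_t expand_lf(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i, out = 0;

    for (i = 0; i < len; i++) {
        if (src[i] == '\n')
            dst[out++] = '\r';      /* inserted even if a CR precedes the LF */
        dst[out++] = src[i];
    }
    return out;
}
```

Feeding it a buffer that already holds CR LF yields CR CR LF, which is the sequence the replace call above was cleaning up.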



You can 'import' the setmode function you mentioned (it carries an underscore in front!). This is what I did.

 

create a file with crlf's:


(write-file "crlf.txt" (dup "\r\n" 10))


import _setmode from a Win32 DLL:


; - filter -

(import "msvcrt.dll" "_setmode")

(define O_BINARY 0x8000)
(define O_TEXT 0x4000)

(_setmode 0 O_BINARY)

(while (> (read-buffer 0 'buff 1024) 0)
    (println (map char (explode buff))))

(exit)


now the CR's don't get dropped:


C:\> newlisp filter < crlf.txt
(13 10 13 10 13 10 13 10 13 10 13 10 13 10 13 10 13 10 13 10)


see also here: http://newlisp.org/downloads/newlisp_manual.html#import



Lutz



ps: this will be added to http://newlisp.org/index.cgi?page=Code_Snippets

m35

#7
Ah ha! I didn't know the source of the setmode() function was in a DLL. That really does make it easy.



It's a simple function, Windows specific, and I assume that the C code I posted originally was just calling the DLL anyway. Having a conditional import (only when in Windows) solves the problem and keeps the code cross-platform.
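The same "only on Windows" idea can be sketched at the C level with a preprocessor guard. This is an illustration under assumptions, not how newLISP does it: the function name `stdio_set_binary` is made up, and the block relies on the Microsoft C runtime's `_setmode`, `_fileno` and `_O_BINARY`, which only exist on Windows builds.

```c
#include <stdio.h>
#ifdef _WIN32
#include <io.h>      /* _setmode */
#include <fcntl.h>   /* _O_BINARY */
#endif

/* Hypothetical helper: switch stdin and stdout to binary mode where
   the platform distinguishes text from binary streams. On other
   platforms there is nothing to do. Returns 0 on success, -1 on error. */
int stdio_set_binary(void)
{
#ifdef _WIN32
    if (_setmode(_fileno(stdin), _O_BINARY) == -1)
        return -1;
    if (_setmode(_fileno(stdout), _O_BINARY) == -1)
        return -1;
#endif
    return 0;
}
```

A program would call this once at startup, before reading or writing any data, so that no bytes pass through the streams in text mode first.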



Thank you Lutz for helping me with my problem and finding a nice solution, even though you've been busy :)

m35

#8
For the sake of completeness, I just found that the MinGW wiki covered this very issue.



http://www.mingw.org/MinGWiki/index.php/binary



Edit:

I really wouldn't mind if newlisp set binary as the default for Windows. It would make scripts a little more cross-platform.

rickyboy

#9
Quote from: "Lutz"
Windows does drop CR's when going from stdin to memory and adds them back when going from memory to stdout.

This blows.
(λx. x x) (λx. x x)