endian tricks?

Started by sunsetandlabrea, March 16, 2005, 07:49:10 AM


sunsetandlabrea

Hello,



I'm just learning newLisp (very impressed so far) and I'm new to Lisp-style languages.



I want to read in a binary file but deal with it both on Windows and OS X and of course there is an endian issue.



I'm reading it in something like:



(read-buffer dat 'example 2)

(first(unpack "f" example))



of course I can just do:



(first (unpack "f" (reverse example)))



which works fine.



Ideally, though, I would like to write a function that deals with this whole issue: I pass it an unpack-style string and get back a list, i.e. something like:



(my-read dat "s13fdff")



So that would give me a list containing a 13 character string, a float, an integer and two more floats plus it would swap the bytes round if required on certain platforms.



Anyone any suggestions how I go about writing that?



Many thanks,



Richard

Lutz

#1
Welcome to the group sunsetandlabrea!



First be aware that the "f" format is for packing/unpacking 12bit floats, so your code would have to be:



(read-buffer dat 'example 4)    ; read 4 bytes: "f" is a 32-bit float

(first(unpack "f" example))



The endian issue will be automatically resolved when you stay on the same platform. So a number packed/written on little-endian will come out fine when read back on little-endian, and the same is true for writing/reading on a big-endian OS.



You can have format for several datatypes in one shot:



(set 'fmt "s13fdff")



and could do the following (copied from an interactive session):

> (set 'fmt "s13fdff")
"s13fdff"

> (set 'fbuff (pack fmt "1234567890" 1.23 456 7.8 8.9))
"1234567890\000\000\000\164p\157?\200\001\154\153\249@ff\014A"

> (unpack fmt fbuff)
("1234567890\000\000\000" 1.230000019 456 7.800000191 8.899999619)
>

So fbuff could be written to a file, read back, and unpacked correctly, as long as the reading platform has the same endianness as the writing one.
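
A minimal sketch of that round trip, assuming the standard write-file and read-file functions; the file name record.dat is made up for illustration:

```newlisp
; Sketch: round-trip one packed record through a file on the same platform.
; "record.dat" is a hypothetical file name.
(set 'fmt "s13fdff")
(set 'fbuff (pack fmt "1234567890" 1.23 456 7.8 8.9))

(write-file "record.dat" fbuff)        ; write the raw packed bytes
(set 'back (read-file "record.dat"))   ; read them back into a string buffer

(println (length back))                ; 13 + 4 + 2 + 4 + 4 = 27 bytes, pack adds no padding
(println (unpack fmt back))            ; same list as unpacking fbuff directly
```

This only demonstrates the same-platform case; it does nothing about byte order.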



But be aware of CPU alignment issues. Suppose the data above was written in C with the following structure:

typedef struct {
 char name[13];
 float numf;
 short numd;
 float numf_2;
 float numf_3;
} mystruct;

numf will probably not sit at offset 13 but most likely at offset 16, to align it on a 32-bit boundary on a 32-bit CPU. The same goes for numf_2, which comes after a 16-bit short and would otherwise not be aligned to a 32-bit boundary.



You could work around it by putting filler bytes in the reading format, i.e.:



"s13cccfdccff"
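
As a rough sketch of how the filler format could be used: the buffer below is built with pack to imitate a padded C record (3 filler bytes after the name, 2 after the short), and the filler values are then dropped with select. The variable names are invented for this example, and real padding depends on the compiler and CPU.

```newlisp
; Sketch: imitate a padded C record and unpack it with filler "c" bytes.
(set 'padded-fmt "s13cccfdccff")
(set 'buf (pack padded-fmt "1234567890" 0 0 0 1.23 456 0 0 7.8 8.9))
(println (length buf))   ; 13 + 3 + 4 + 2 + 2 + 4 + 4 = 32 bytes

(set 'fields (unpack padded-fmt buf))
; indices 1 2 3 and 6 7 hold the filler bytes; keep only the real fields
(set 'record (select fields 0 4 5 8 9))
(println record)
```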



See also the related chapter "Unevenly aligned data structures" in http://www.newlisp.org/index.cgi?page=Compiling_and_Importing_Libraries



Lutz

sunsetandlabrea

#2
Hello,



Many thanks for your reply and the warm welcome!


Quote: First be aware that the "f" format is for packing/unpacking 12bit floats, so your code would have to be:


Yes, sorry, reading 2 bytes was a typo. I presume you meant 16bit floats?


Quote: The endian issue will be automatically resolved when you stay on the same platform. So a number packed/written on little-endian will come out fine when read back on little-endian, and the same is true for writing/reading on a big-endian OS.


The problem is the file is always created on a PC, but may be read back on OS X which is obviously big-endian. So I have a requirement to read a little-endian file on a big-endian platform.



I'm more familiar with Python, which has a struct module where you can prepend an optional byte-order character to the format string but which otherwise works similarly to unpack. So you have = for native, < for little-endian, and > for big-endian. I guess I'm looking for something similar. I was hoping to create a single function which could do all this for me in a simple manner, for instance:



(myread dat "< s13 f f d d")



Where dat would be an open file handle, and the string would be the unpack format. I guess I need to look at parsing the string and work from there.
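
One way the parsing idea might be sketched is below. It is not a built-in facility: host-little-endian?, token-size and unpack-le are names invented here, the format tokens are assumed to be separated by single spaces, and the function takes an already-read buffer instead of a file handle to keep the example short. It assumes the data is always little-endian and reverses the bytes of each numeric field only when the host is big-endian.

```newlisp
; Sketch only: unpack little-endian data on any host. All names are hypothetical.

(define (host-little-endian?)
  ; on a little-endian host the low byte of a packed 16-bit 1 comes first
  (= 1 (first (unpack "c" (pack "d" 1)))))

(define (token-size tok)
  ; byte width of one format token such as "s13", "f" or "d"
  (cond
    ((starts-with tok "s") (int (slice tok 1) 0 10))   ; "s13" -> 13
    ((= tok "lf") 8)
    ((member tok '("f" "ld" "lu")) 4)
    ((member tok '("d" "u")) 2)
    (true 1)))                                         ; "c" and "b"

(define (unpack-le fmt buf)
  (let ((swap (not (host-little-endian?)))
        (pos 0) (size 0) (field "") (result '()))
    (dolist (tok (parse fmt " "))
      (set 'size (token-size tok))
      (set 'field (slice buf pos size))
      ; strings keep their byte order, numeric fields are swapped if needed
      (if (and swap (not (starts-with tok "s")))
          (set 'field (reverse field)))
      (push (first (unpack tok field)) result -1)
      (set 'pos (+ pos size)))
    result))

; bytes C8 01 are the little-endian encoding of 456
(println (unpack-le "s3 d" "abc\200\001"))   ; ("abc" 456)
```

A file-based version would read (token-size tok) bytes per token from the open handle instead of slicing a buffer.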



Thanks for the pack/unpack and alignment tips; I shall keep those in mind.



Also a quick license question. I am interested in newLisp also as a dll to add scripting to a commercial application. From my understanding of the GPL as long as I don't link the code into my application that would be ok? Is my understanding correct? What would happen if I used newLisp's link.lsp to embed one of my scripts within the dll would I still be ok?



Many thanks again,



Richard

Lutz

#3
The little-endian/big-endian control character in Python looks like a good idea! I will put this on my to-do list. Look for it in a later version!



About licensing. As long as you don't package GPLd newlisp.dll together with closed source products there is no problem. You would have to install newlisp.dll separately.



The upcoming release installs newlisp.dll in a fixed location, C:/Program Files/newlisp or whatever the Win32 environment variable PROGRAMFILES indicates in a specific locale/non-English-speaking country. This is already the case in the development version 8.4.4. In the next development version 8.4.5 (due end of this week) the Windows installer will also modify the path to include the location of newlisp.dll/newlisp.exe and will make the installation of the newlisp-tk frontend optional.



So this will give you an easy way for your customers to install a slim version of newLISP on their system and use it from a closed source application.



If your product is open source you can package/link newlisp.dll with your code, commercial or not. Note that the GPL is not about commercializing/profiting from code but rather about the closed/open aspect of it. In either case the GPL wants you to reference the usage of newlisp.dll in your documentation (installed separately or not).



Lutz

sunsetandlabrea

#4
Hello again,



Thanks for the consideration of adding endian support, that will be a useful feature.



The license situation is unfortunate, it means I cannot use it for my project at work. Although it doesn't sound too bad having to run a separate installer it would be impractical for us, and add a layer of unnecessary complexity.



:-(



I presume you have chosen the GPL to inhibit inclusion into commercial products? If not perhaps you could consider a dual license with another open source license which would allow easier usage within a commercial project? (http://www.opensource.org/licenses/).



Obviously I can understand your reasons for releasing under the GPL but for me personally I would have to choose a scripting language with a more liberal license - such as Python, Perl, TCL, etc.



Thanks,



Richard

Lutz

#5
>>

I presume you have chosen the GPL to inhibit inclusion into commercial products?

>>



The GPL doesn't inhibit inclusion into commercial products, only packaging with a closed-source product. A commercial closed-source product can still be built using newLISP when newlisp.dll is installed separately.



Lutz

Lutz

#6
Just finished adding endian support for 'pack' and 'unpack'; look for it in the development release by the end of this week.
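
The syntax is not spelled out in this thread. As a sketch, assuming the "<" (little-endian) and ">" (big-endian) switches that the newLISP manual documents for pack and unpack format strings, usage might look like this:

```newlisp
; Sketch, assuming "<" and ">" switch byte order inside the format string.
; Little-endian bytes C8 01 and big-endian bytes 01 C8 both encode 456.
(println (unpack "<d" "\200\001"))   ; (456)
(println (unpack ">d" "\001\200"))   ; (456)

; The record format from this thread, forced to little-endian on any host.
(set 'le-fmt "<s13 f d f f")
(set 'buf (pack le-fmt "1234567890" 1.23 456 7.8 8.9))
(println (unpack le-fmt buf))
```

With such a switch, a file written on a PC could be read on a big-endian machine using one format string and no manual byte reversal.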



Lutz

sunsetandlabrea

#7
Excellent, thanks for adding that, very useful.



Unfortunately having a separate installer pretty much inhibits me using newLisp in this particular commercial project. We take great pains to reduce the complexity of installation of our software and this would just add to it.



Thanks anyway,



Richard