HowTo using record structure in newLisp

Started by c.ln, October 12, 2004, 03:01:06 PM


c.ln

I have several lists of records to manage (thousands upon thousands). These records form regular sequences produced by sequential processing.

In the past, and still today, I have used C or Pascal for this. But could LISP, given its etymology ("list processing"), be a better choice?

I am very impressed by the apparent ease of use of this scripting language, but when I skim the documentation, I don't see anything about record/struct/union declarations with field types and lengths.

Is it possible to handle an elementary structure as a pseudo dotted pair?

That is, car = nameOfField and cdr = dataOfField, with the record being a list of such pairs?

I also don't see advanced file functions such as sharing, locking, etc.

Even if I call the Windows API to provide that, I need to get the result in a form that newLisp can manipulate.

Thanks in advance, and sorry for my naive questions!

HPW

#1
Hello c.ln,

welcome to the board,



You have several choices:



A plain list:

(setq MyDb (list (list dataOfField1 dataOfField2)(list ...)))


An assoc list:

(setq MyDb (list (list (list nameOfField1 dataOfField1)(list nameOfField2 dataOfField2))(list ...)))


An XML file:

See the documentation for the XML functions.



An SQLite database:

http://www.newlisp.org/index.cgi?page=Libraries

Maybe this is the right tool for your needs (thousands upon thousands of records).



I think there are more options; these are just the few that came to mind. A look at some Lisp books may turn up even more.
Hans-Peter

Lutz

#2
Are these data packed into binary data? Look for 'unpack' and 'pack' in the manual. These are used to pack/unpack binary data structures.



If this doesn't help can you give us a concrete example of a record?



Lutz

c.ln

#3
Below this text you will find a fragment of an interface unit.

This interface declares 3 types of record that can be found in a sequential file. Typically, we have the following sequence:

deblotSimac

  remiseSimac

    chequeSimac

    chequeSimac

    chequeSimac

    ...

  remiseSimac

    chequeSimac

    chequeSimac

    chequeSimac

  ...

deblotSimac

...

and so on



This kind of file is a plain ASCII file (apart from the CR/LF at the end of each record).

The standard operations on this kind of file are:

  to verify

    the integrity of fields (by tables or by key computation),

    the integrity of the sequence;

  to inject or modify some fields;

  to identify duplicate references, and so on;

  to merge or split multiple sources.
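The sequence-integrity check listed above can be sketched in C as a small state machine over the record codes. This is only an illustration under stated assumptions: the record code is taken to be the second character of each line (after cte_aZero, as in rec_imgSimac), and the rules used here (a file starts with a deblot, a remise follows a deblot or a cheque, a cheque follows a remise or a cheque) are guessed from the example sequence, not from a specification. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Record codes taken from the Pascal unit in this post. */
#define COD_DEBLOT '1'
#define COD_REMISE '2'
#define COD_CHEQUE '3'

/* Assumption: the record code is the second character of the line
   (cte_aZero comes first). Returns 0 if the line is too short. */
char record_code(const char *line)
{
    assert(line != NULL);
    return strlen(line) >= 2 ? line[1] : 0;
}

/* Returns 0 if the order of records is acceptable, otherwise the
   1-based position of the first record that breaks the sequence.
   The ordering rules are assumptions derived from the example. */
size_t check_sequence(const char *const *lines, size_t n)
{
    char prev = 0;                       /* 0 = nothing read yet */
    for (size_t i = 0; i < n; i++) {
        char cur = record_code(lines[i]);
        int ok = 0;
        if (cur == COD_DEBLOT)
            ok = (prev == 0 || prev == COD_CHEQUE);
        else if (cur == COD_REMISE)
            ok = (prev == COD_DEBLOT || prev == COD_CHEQUE);
        else if (cur == COD_CHEQUE)
            ok = (prev == COD_REMISE || prev == COD_CHEQUE);
        if (!ok)
            return i + 1;
        prev = cur;
    }
    return 0;
}
```

Field-level checks (tables, check keys) would hook in at the same place where the record code is read.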



If I build lists from external calls, as HPW (Horse PoWer :) suggests, I'm not sure that is the best form in which to manipulate the data.



Consider what I have described as a trivial exercise; the next steps mix multiple streams to produce new ones.



One constraint is that input and output files must stay ASCII encoded.



Well, I hope I'm not being too boring, and I'm grateful for your help!



Cordially

Christian



(***********************************************************)

unit               dclSimac2;



(***********************************************************)

interface



(***********************************************************)

uses

  utypes,

  utypchar,

  utypbyte,

  utypdate,

  stddcl;



(***********************************************************)

const

  cod_debLotSimac=   '1';

  cod_remiseSimac=   '2';

  cod_chequeSimac=   '3';



(***********************************************************)

type

  rec_lotSynergie    =  packed record

    case byte of

  1:  (  a                :  _a7char );

  2:  (  cod_EDS          :  _a3char;

        cte_4             :  char;

        dat_LotQQQ        :  _a3char );

    end;



(***********************************************************)

type

  ptr_debLotSimac    =  ^rec_debLotSimac;

  rec_debLotSimac    =  packed record

  //cte_aZero          ,

  //cod_enreg          :  char;              (* record code *)

    num_lotSyn         :  rec_lotSynergie;

    num_remise         : _a7char;

    dat_numeriz        :  rec_jjmmssaa;

    tim_telecol        :  rec_hhmm;

    cod_agence         :  _a5char;

    cod_scanner        :  _a4char;

    nbr_remises        ,

    nbr_cheques        :  _a5char;

    mtt_totLot         :  rec_montantVfix;

    buf_z4             :  _a7char;

    buf_z3             ,

    buf_z2             :  _a12char;

    buf_z1             :  rec_montantPack;

    num_image          :  _a12char;

    cod_valid          :  _a2char;

    cod_deviz          :  char;

    end;



(***********************************************************)

type

  ptr_remiseSimac    =  ^rec_remiseSimac;

  rec_remiseSimac    =  packed record

  //cte_aZero          ,

  //cod_enreg          :  char;              (* record code *)

    num_lotSyn         :  rec_lotSynergie;

    num_remise         : _a7char;

    dat_numeriz        :  rec_jjmmssaa;

    tim_telecol        :  rec_hhmm;

    nbr_cheques        :  _a5char;

    mtt_remise         :  rec_montantVfix;

    cpt_remise         :  _a11char;

    buf_z4             :  _a7char;

    buf_z3             ,

    buf_z2             :  _a12char;

    buf_z1             : rec_montantPack;

    fil_01             :  _a3char;

    num_image          :  _a12char;

    cod_valid          :  _a2char;

    cod_deviz          :  char;

    end;



(***********************************************************)

type

  ptr_chequeSimac    =  ^rec_chequeSimac;

  rec_chequeSimac    =   packed record

  //cte_aZero          ,

  //cod_enreg          :  char;              (* record code *)

    num_lotSyn         :  rec_lotSynergie;

    num_remise         : _a7char;

    mtt_cheque         :  rec_montantVfix;

    buf_z4             :  _a7char;

    buf_z3             ,

    buf_z2             :  _a12char;

    buf_z1             : rec_montantPack;

    cod_banque         :  _a5char;

    fil_01             :  _a3char;

    cle_RLMC           :  _a2char;

    fil_02             :  _a21char;

    num_image          :  _a12char;

    cod_valid          :  _a2char;

    cod_deviz          :  char;

    end;



(***********************************************************)

type

  rec_imgSimac      =  packed record

    cte_aZero          :  char;

  case  cod_enreg          : char of

    cod_debLotSimac    :  (  debLot            :  rec_debLotSimac  );

    cod_remiseSimac    :  (  remise            :  rec_remiseSimac  );

    cod_chequeSimac    :  (  cheque            :  rec_chequeSimac  );

    end;



type

  ptr_Simac          =  ^rec_Simac;

  rec_Simac          =  packed record

    case byte of

      1: (carte        : rec_carte118);  (* "physical" representation    *)

      2: (image        :  rec_imgSimac);  (* "logical" representation     *)

    end;



(***********************************************************)

procedure          init_Simac  (var enr: rec_Simac; cod: char; pad: char = ' ');



(***********************************************************)

Lutz

#4
This is how you could pack/unpack ASCII data:



;; the data
(set 'nme "john doe")
(set 'phone "123-5555")
(set 'balance 1000.23)

;; pack data in fixed size ASCII record
(set 'rec (format "%-25s%8s%10.2f" nme phone balance))

=> "john doe                 123-5555   1000.23"

;; unpack data in rec
(set 'lst (unpack "s25 s8 s10" rec))

=> ("john doe                 " "123-5555" "   1000.23")

;; convert back into fields
(set 'nme (trim (first lst))) => "john doe"

(set 'phone (nth 1 lst)) => "123-5555"

(set 'balance (float (last lst))) => 1000.23


The 'format' statement works similarly to 'printf' in 'C' or other languages. You could set up a special 'format' spec string and an 'unpack' spec string for each record format, and then write a function that hides the details.
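For readers coming from C, here is a rough sketch of the same fixed-size ASCII record built with snprintf() and split again by offset and width. The field widths (25, 8, 10) follow the example above; the names pack_record and get_field are hypothetical and only meant to show the idea of one function hiding the format details.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_W    25
#define PHONE_W    8
#define BALANCE_W 10
#define REC_W     (NAME_W + PHONE_W + BALANCE_W)

/* Pack three fields into one fixed-size ASCII record.
   Returns 0 on success, -1 if the result does not fit the layout. */
int pack_record(char out[REC_W + 1], const char *name,
                const char *phone, double balance)
{
    assert(out != NULL && name != NULL && phone != NULL);
    /* %-*.*s pads to the width and also cuts over-long strings. */
    int n = snprintf(out, REC_W + 1, "%-*.*s%*.*s%*.2f",
                     NAME_W, NAME_W, name,
                     PHONE_W, PHONE_W, phone,
                     BALANCE_W, balance);
    return n == REC_W ? 0 : -1;
}

/* Copy `width` characters starting at `offset` into dst (which must
   hold width + 1 bytes) and strip trailing blanks. */
void get_field(char *dst, const char *rec, size_t offset, size_t width)
{
    assert(dst != NULL && rec != NULL);
    memcpy(dst, rec + offset, width);
    dst[width] = '\0';
    while (width > 0 && dst[width - 1] == ' ')
        dst[--width] = '\0';
}
```

A numeric field such as the balance would then be converted with strtod() on the extracted text, much like (float (last lst)) above.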



Lutz

c.ln

#5
If I understand Lutz's explanation, to access my data I need 3 copies of it ('rec, 'lst, 'fld).



Is it possible to implement a syntax like 'rec.fld to get at the data more quickly?



recType could be declared as a list of 'fldAttr (fldName

,fldOffset // 0= calculate from (previous fldOffset fldSize)

,fldType

[, fldSize])



fldType could be declared as a list of 'valAttr (size //0= must be specified

,readable

,writable

,recType //if any to find common subStructures

,context //if any to find context:methods,functions,constants for fldType

,...)



//pseudo declaration of type char

(constant 'char (cons 0 true true nil 'CHRFONC))



//pseudo declaration of type rec_lotSynergie

(constant 'rec_lotSynergie (cons ((cons 'a 1 'char 7) (cons 'cod_EDS 1 'char 3) (cons 'cte_4 0 'char 1) (cons 'dat_lotQQQ 0 'char 3)))



//read data from flux previously open

(read-buffer handle 'buff 'rec_lotSynergie.size)



//pseudo access to a field

(set 'qqq ((rec_lotSynergie buff).dat_lotQQQ)))

or

(set 'lotSynergie (rec_lotSynergie buff))

(set 'qqq ('lotSynergie.dat_lotQQQ))



As you can see, I don't hesitate to type-cast newLisp data referenced by the symbol 'buff! lol



Forgive me for all the atrocities I write; I'm a newbie in Lisp & newLisp (for the blind who cannot see that :)



cordially,

Christian

Lutz

#6
My example was meant to simply explain the workings of 'format' and 'unpack'. But of course you could program something along the lines you are suggesting in your post: describing a record structure with a nested list with field names, types etc. And that would be reusable for other types of records, as you seem to have in mind.
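To make the idea of a reusable record description concrete for readers who think in C or Pascal, here is a minimal sketch of a field table with names and sizes, where offsets are computed from the preceding sizes and a field is reached in place, without copying the record. The layout is variant 2 of rec_lotSynergie from the Pascal unit, assuming _a3char is an array of 3 characters; FieldDef and field_ptr are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;   /* field name                  */
    size_t      size;   /* field width in characters   */
} FieldDef;

/* rec_lotSynergie, variant 2 (assumes _a3char = 3 characters). */
static const FieldDef lot_synergie[] = {
    { "cod_EDS",    3 },
    { "cte_4",      1 },
    { "dat_LotQQQ", 3 },
};

/* Look a field up by name. The offset is the sum of the sizes of the
   fields before it. Returns a pointer INTO rec (no copy) and stores
   the field width in *size; returns NULL for an unknown name. */
const char *field_ptr(const FieldDef *defs, size_t ndefs,
                      const char *rec, const char *name, size_t *size)
{
    assert(defs != NULL && rec != NULL && name != NULL && size != NULL);
    size_t offset = 0;
    for (size_t i = 0; i < ndefs; i++) {
        if (strcmp(defs[i].name, name) == 0) {
            *size = defs[i].size;
            return rec + offset;
        }
        offset += defs[i].size;
    }
    return NULL;
}
```

The same table could drive reading, writing and validation, so one description per record type is enough.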



What I did not understand in your code were expressions like (cons 0 true true nil 'CHRFONC). Perhaps you meant just (0 true true nil CHRFONC).



You don't need an operator/function in LISP to form a list, you can just say:

(0 true true nil CHRFONC) and then also omit the ' quote before the CHRFONC symbol. Or assign it: (set 'myvar '(0 true true nil CHRFONC)) etc.



You would only need the list function if you wanted the list members evaluated before the list is formed, e.g.:



(list nme phone balance) => ("john doe" "123-5555" 1000.23)



'cons' also forms lists but only takes two arguments (see manual).



If you are a newbie to newLISP, I suggest reading some of the many examples in the manual and on the newlisp.org website. If you have studied other LISPs before, be aware that newLISP differs in many respects, but requires fewer concepts to understand and study before you can get into it.



You seem to have a pretty good understanding of what LISP is about in general, and I am sure newLISP will help you get into it quickly.



Lutz

c.ln

#7
Because I was worried about having 3 copies of my data before working on it and another three to save the result, I took a closer look at the source, especially the p_readBuffer function.



Until now, the newLisp sources had only served me for compiling under Cygwin... :)



I was very surprised to see that each call performs 2 allocMemory, 1 memset, 1 memcpy, 2 freeMemory and 1 deleteList.



The code below is untested; it is only for discussion.

The source is only briefly documented, so please forgive any conceptual mistakes.



The main purpose of this attempt is to reuse the allocated data and avoid heap management, which is always a source of delay.



The dots in front of statements are there to keep this editor from trimming the leading whitespace.



A fixed-width font also makes it easier to read.



//newLisp syntax: ( read-buffer int-file sym-buffer [ int-size ])

//params= list of CELL ( CELL-intFile CELL-symBuffer CELL-intSize )

//result= nilCELL or newCELL-nbrBytesRead



//function actions

//- getParams && #ifdef _SAFETY_ checkParams #endif :)

//- (re)format/keep alive CELL-symBuffer

//- perform read on handle into buff

//- return expected result ( as far as possible :)



typedef

unsigned char.....* uchar;



CELL..............* p_readBuffer........(

..CELL..............* params )

{

CELL..............* CELL_intFile,

..................* CELL_symBuffer,

..................* CELL_intSize,

..................* CELL_dummy;

int.................handle............= -1,

....................size;

int.................bytesRead;

uchar.............* buff..............= NULL;

CELL..............* strCell...........= NULL;

SYMBOL............* readSptr..........= NULL;



CELL_intFile......= params;

#ifdef _SAFETY_

if ( ! CELL_intFile || CELL_intFile == nilCell )

..return ( errorProcArgs ( ERR_MISSING_ARGUMENT, params ));

#endif



CELL_symBuffer....= getInteger ( CELL_intFile, ( UINT * ) &handle );

#ifdef _SAFETY_

if ( ! CELL_symBuffer || CELL_symBuffer == nilCell )

..return ( errorProcArgs ( ERR_MISSING_ARGUMENT, params ));

#endif



CELL_intSize......= getSymbol  ( CELL_symBuffer, &readSptr );



//this part is open to discussion, as read-buffer could be called on an anonymous (unreferenced) buffer

#ifdef _SAFETY_

if ( ! readSptr || readSptr == nilSymbol )

..return ( errorProcArgs ( ERR_SYMBOL_EXPECTED, params ));

#endif



if ( isProtected ( readSptr->flags ))

..return ( errorProcExt2 ( ERR_SYMBOL_PROTECTED, stuffSymbol ( readSptr )));



strCell           = ( CELL * ) readSptr->contents;

if ( strCell )

{

..if ( strCell->type != CELL_STRING )

..{

....readSptr..........->contents..........= ( UINT ) nilCell;

....deleteList ( strCell );

....strCell...........= NULL;

..}

..else

....buff..............= ( uchar * ) strCell->contents;

}



if ( ! CELL_intSize || CELL_intSize == nilCell )

{

#ifdef _SAFETY_

..if ( ! strCell )

....return ( errorProcArgs ( ERR_MISSING_ARGUMENT, params ));

#endif

..size..............= strCell->aux - 1;

#ifdef _SAFETY_

//optional check, ( read handle buff 0 ) may be useful ?

..if ( ! size )

....return ( errorProcArgs ( ERR_INVALID_PARAMETER_0, params ));

#endif

}

else

{

..CELL_dummy........= getInteger ( CELL_intSize, ( UINT * ) &size );

#ifdef _SAFETY_

..if ( CELL_dummy != NULL )

....return ( errorProcArgs ( ERR_EXTRA_ARGUMENT, params )); // a new one, lol :)

#endif

..if ( strCell && size != strCell->aux - 1 )

..{

....freeMemory ( buff );

....buff..............= NULL;

..}

}



if ( ! buff )

..buff..............= allocMemory ( size + 1 );



if ( ! strCell )

..strCell...........= getCell ( CELL_STRING );



strCell...........->aux...............= size + 1;

strCell...........->contents..........= ( UINT ) buff;



readSptr..........->contents..........= ( UINT ) strCell;



if (( bytesRead = read ( handle, buff, size )) == -1 )

{

//make an empty zero-terminated string

..* buff............= 0;

..return ( nilCell );

}



//make a zero-terminated string

* ( buff + bytesRead ) = 0;



// create a new CELL with bytesRead in field contents

return ( stuffInteger ( bytesRead ));

}



Thanks to everyone brave enough to read this message! :)



Cordially,

Christian

c.ln

#8
Is there a team of developers for newLisp?

I would be very happy to have more information to help me understand the code:

the conceptual principles behind the basic structures and how they relate to the newLisp script syntax.



cordially,

Christian





Correction to the code in my previous message: it should read

typedef unsigned char uchar;

and not

typedef unsigned char * uchar;



Here is the original code that I rearranged in the previous message:



CELL * p_readBuffer(CELL * params)

{

int handle, size;

int bytesRead;

unsigned char * buffer;

unsigned char * buff;

CELL * strCell;

SYMBOL * readSptr;



params = getInteger(params, (UINT*)&handle);

params = getSymbol(params, &readSptr);

getInteger(params, (UINT*)&size);



if(isProtected(readSptr->flags))

  return(errorProcExt2(ERR_SYMBOL_PROTECTED, stuffSymbol(readSptr)));



buffer = allocMemory((int)size+1);

memset(buffer, 0, (int)size+1);



if((bytesRead = read(handle, buffer, size)) == -1)

{

  free(buffer);

  return(nilCell);

}



buff = allocMemory((int)bytesRead+1);

memcpy(buff, buffer, bytesRead);

*(buff + bytesRead) = 0;

freeMemory(buffer);



strCell = getCell(CELL_STRING);

strCell->aux = bytesRead + 1;

strCell->contents = (UINT)buff;



deleteList((CELL *)readSptr->contents);

readSptr->contents = (UINT)strCell;

return(stuffInteger(bytesRead));

}

Lutz

#9
You don't really need all the "#ifdef _SAFETY_" blocks. The getInteger(), getString() etc. routines check the parameter types and issue error messages.



Introducing CELL_intFile, CELL_symBuffer, CELL_intSize and CELL_dummy is then really unnecessary and only wastes stack space.



But your comments prompted me to review the code, and I found that the second allocMemory() can be replaced with a realloc(). This saves another local variable, makes the code about 1.6% faster and saves 3 lines of code. This change will be in the next release/development version.
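The realloc() idea can be sketched outside of newLISP as follows. This is not the actual newLISP source: it uses plain malloc()/realloc() and POSIX read() instead of allocMemory() and the CELL machinery, and read_block is a hypothetical name. It reads into one buffer sized for the request and then shrinks that same buffer to the number of bytes actually read, so no second allocation, memset() or memcpy() is needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Read up to `size` bytes from fd into a freshly allocated,
   zero-terminated buffer. On success the buffer is shrunk to
   bytes read + 1 and returned; the caller frees it.
   Returns NULL on allocation or read failure. */
char *read_block(int fd, size_t size, ssize_t *bytes_read)
{
    assert(bytes_read != NULL);
    char *buf = malloc(size + 1);
    if (buf == NULL)
        return NULL;

    ssize_t n = read(fd, buf, size);
    if (n == -1) {
        free(buf);
        return NULL;
    }
    buf[n] = '\0';

    /* Shrink in place instead of allocating a second buffer and
       copying. If realloc fails, the original block is still valid. */
    char *shrunk = realloc(buf, (size_t)n + 1);
    if (shrunk != NULL)
        buf = shrunk;

    *bytes_read = n;
    return buf;
}
```

The explicit terminator after the read replaces the memset() of the whole buffer.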



Some general info about the code base and contributing/improving: with the exception of some files (see the README file), I am about 95% the sole developer, but code contributions/improvements/comments have been made in the past and are welcome.



When contributing code, please don't change the indenting style unless an entire file is contributed. E.g. the osx-xxxx files are entirely contributed by Peter O'Gorman and display his own indenting style. Phil Hazel's PCRE files happen to have a style similar to mine, but that is accidental. For changed functions, please contribute them in the current indenting style.



There are a few policies about how newLISP's 'C' code is written: it should compile with the -Wall -pedantic options on all platforms. The uninitialized warning is turned off because some compilers on some platforms report it erroneously. It should link with only the libraries you see in the current makefiles. The readline() libraries are a 'grey' area and have caused, and still cause, some grief because some Linux distributions don't install them by default; but command-line editing is just too important a feature in an interactive LISP.



The three goals of 'tight', 'fast' and 'memory-conserving' code are way up on the list. If you can save a CPU cycle here and a line of code there, do it. Normally this also makes the code smaller. Sometimes the 3 goals conflict with each other and a decision has to be taken case by case. Try to think 'assembler' when you write code: what will the compiler do with your code? I benchmark each function I write or change, to get a feel for the impact it has on the whole thing. In the early days I profiled newLISP quite a lot to identify the bottlenecks.



Last but not least: DOCUMENTATION. This is my weakest point, but one of the strongest and most important features of newLISP. I wrote all of the docs, but they have been heavily edited by many users on this board who speak/write better English than I do. My German accent even shows in writing :-). Typically, doc improvements are posted in the news section on this board. There is a thread for this.



One or two people who seem to be professional writers went through the whole manual and sent me edited versions of the HTML. The newlisp_manual.html is hand edited, simple HTML, readable and translatable by the oldest browsers, PDF translators etc. I use the WinMerge software for diffing when people edit the manual. Most corrections are just posted as ASCII on the board, which is totally fine with me.



Please don't feel discouraged by all these rules, I just try to conserve time and maintain quality and portability.



Lutz

c.ln

#10
Thanks (a lot) Lutz for your help!



But (I understand you may have missed it) in the sample code I posted two messages back, only ONE memoryAlloc is performed for an infinite reading (as long as you use the same buffer with the same size, which is common when you read fixed-size records). I also avoid the memset() and memcpy(), which are unnecessary.

The redundant/unnecessary CELL_xxx declarations are only there to make the code easier to understand, and believe me, pushing 4 more integers/pointers onto the stack changes loop performance by about 0.0000000000000001 % :), which is nothing compared with the cost of re-reading and understanding the source.
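The buffer-reuse argument can be illustrated with a small stand-alone C sketch. It is not the p_readBuffer code from the thread: it uses plain realloc() and POSIX read(), and RecBuf and recbuf_read are hypothetical names. The buffer is (re)allocated only when the requested size changes, so a loop reading fixed-size records allocates once.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct {
    char  *data;       /* zero-terminated contents, or NULL     */
    size_t capacity;   /* usable bytes, terminator not counted  */
} RecBuf;

/* Read up to `size` bytes from fd into rb->data. Memory is only
   (re)allocated when `size` differs from the current capacity.
   Returns the number of bytes read, or -1 on failure. */
ssize_t recbuf_read(RecBuf *rb, int fd, size_t size)
{
    assert(rb != NULL);
    if (rb->data == NULL || rb->capacity != size) {
        char *p = realloc(rb->data, size + 1);
        if (p == NULL)
            return -1;
        rb->data = p;
        rb->capacity = size;
    }
    ssize_t n = read(fd, rb->data, size);
    rb->data[n < 0 ? 0 : n] = '\0';   /* empty string on read error */
    return n;
}
```

The trade-off is that each read overwrites the previous contents, so anything that must outlive the next read has to be copied out first.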



Of course we could reuse the same variable "params" to walk through the different cells, but if you disassemble the code generated for that kind of variable (a parameter used as a working variable), you will see that it is less compact and slower than with real local variables. This is because the compiler makes different assumptions about the two kinds of variable (the main one being that a parameter cannot live entirely in a register (elementary, my dear Dr Watson :)).



If I understand your last message, I have 2 choices:

 - I rewrite the entire nl-filesys unit.

 - I keep the previous author's style when modifying parts of routines.



But what is the previous style?

Where are the standard comments for each function?



And if I take the first option, what source style would be a help or an improvement over the previous one? (Imposing my own style is not necessarily a good idea, except for those who like the mosaic style, lol)



Don't think I am trying to lecture anyone; my intention is to help and to contribute to the project.



The documentation I need is:

 - the documentation of the basic structures like CELL, SYMBOL, etc.

 - the use/type casting of their fields like aux, contents, etc.

 - the general assumptions/principles behind what a CELL represents and how CELLs are organized.



Whatever happens, I'm going to propose a review of this unit in the coming weeks (my time is shared between my job, my wife, my children, my sport, ... and newLisp, lol :)



cordially,

Christian

Lutz

#11
>>>

... only ONE memoryAlloc is performed for an infinite reading ...

>>>



You cannot do this, because any data read by readBuffer eventually has to end up in an object consuming memory, which then gets passed up the call stack to eventually be assigned or destroyed.



With the change I implemented (see my last post), this one allocation per object will happen in readBuffer. If you used one and the same buffer for all reads, you would still have to allocate the individual memory objects. So it is best to do it right away in readBuffer().



If you want to code for newLISP, instead of rewriting existing pieces I suggest getting involved in a new area. One of the things needed is a Unix-like fork() on the Win32 platform. It would be like a Win32 process, but a true copy of the parent's memory, environment and call stack space. Win32 threads don't work for this because they run in the parent's process space.



As far as I know, the only people who have done this are the CYGWIN folks. You can find the sources at Red Hat/CYGWIN and could create a file nl-fork.c which implements a 'C' fork() and waitpid(). I don't think it is a big task coding-wise, but more a matter of understanding Win32 and/or Unix internals. I imagine that much can be achieved just by studying the CYGWIN sources.



Another area would be exploring new possibilities for multi-platform GUIs in newLISP: some sort of library or executable with hooks similar to newlisp-tk.exe or GTK-server.exe, but self-contained and callable/importable by newLISP on all major platforms (Win32, Linux/UNIX, Mac OS X). It would *not* be a new newlisp-tk.exe which controls newLISP, but rather the other way around, where newLISP is the controller, in the style in which GTK-server is implemented.



Another area urgently needed in newLISP is interfaces to existing 'C' libraries and Win32 DLLs, much in the style of mysql.lsp, odbc.lsp or sqlite.lsp. E.g. we need a sqlite3.lsp. Other possibilities are interfaces to neural network libraries (e.g. FANN), IMAP mail servers, mail attachment decoders, etc. Whatever you feel could be of interest to newLISP users or to yourself.



Lutz

c.ln

#12
OK Lutz!



You have your mountains to climb, and I have to learn newLisp's concepts & implementation (to evaluate its rating :)



I am keeping the multi-threading problem in mind.

The first idea that comes to me, if your problem is the shared memory area, is to use the Win32 API memory functions to allocate a separate heap for each thread:

 - GlobalAlloc(), GlobalReAlloc(), etc.

 - LocalAlloc(), LocalReAlloc(), etc. (same effect under Zindowz)

 - HeapCreate(), HeapAlloc(), etc. (more interesting if you do more than one GlobalAlloc per thread)

 - in conjunction with GetProcessHeaps() to retrieve the handles.



My question is: what kind of fork do you need?

 - multi-processing (CreateProcess() Win32 API)

or

 - multi-threading (CreateThread() Win32 API)



The standard Unix fork() is more similar to the first option, but under Zindowz the second option is more powerful.



Do you imagine a newLisp tree with thread branches, or a newLisp forest?

And what kind of newLisp syntax would allow multi-tasking to be introduced?



I must warn you that Cygwin tries to be POSIX compliant, which is a complication and is not useful for running a program on a Zindowz target.



Thanks for the attention I am asking of you.



cordially,

Christian

Lutz

#13
CreateProcess is currently used in the MinGW and CYGWIN builds to implement newLISP's 'process', and 'process' with stdio mapped to pipes (see win32-util.c and nl-filesys.c). But neither this function nor CreateThread is directly usable. A fork must somehow be built on top of CreateProcess(), creating an exact copy (not a share!) of the parent's memory and stack environment using Win32 memory facilities.



What I need is the classic UNIX fork() as currently used in newLISP for the CYGWIN, Linux, FreeBSD, Mac OS X and Solaris builds. It may be the POSIX or the Solaris type; they all work the same for newLISP's purposes, making an exact copy of the parent process and inheriting file descriptors, semaphores, shared memory pages etc.
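The copy semantics described here can be seen in a few lines of POSIX C. This sketch only demonstrates fork() and waitpid() on a Unix-like system; it says nothing about how such a call could be emulated on Win32, and fork_copy_demo is a hypothetical name. The child changes its own copy of a variable and reports it through its exit status, while the parent's variable stays untouched.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that increments ITS copy of *value and exits with the
   new value as status (so keep the value below 256). Returns the
   child's exit status, or -1 on failure. The parent's *value is not
   modified, because the child works on a copy of the parent's memory. */
int fork_copy_demo(int *value)
{
    assert(value != NULL);
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {              /* child process */
        *value += 1;             /* changes the child's copy only */
        _exit(*value);
    }

    int status = 0;              /* parent process */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A thread created in the same process would instead see and modify the parent's variable directly, which is why threads do not fit this requirement.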



CYGWIN fork() seems to work fine, at least when newLISP is compiled as a CYGWIN app; newLISP 'fork' and 'wait-pid' seem to work. But I want the same in a MinGW-compiled executable, so that the user does not have to install extra CYGWIN libraries. It's the last piece missing to make a MinGW version on Win32 work just like the Linux/UNIX compiled versions. 8.2.3 has 'pipe', 'process' with mapped pipes, 'share' for memory and 'semaphore', all for Win32. The only thing missing is a fork(). Threads or pthreads don't serve me; I need a separate process image derived from the parent process.



Lutz



PS: does anybody else know where to get this, apart from CYGWIN?

pjot

#14
Well, if Cygwin works fine, you could look into the Cygwin sources, and see how they did it.



I know somebody who used to work as a programmer for Microsoft, I will ask him.