Load and Save of contexts using encrypt

Started by CaveGuy, October 21, 2002, 02:05:53 PM


CaveGuy

Is there any possibility of getting an encrypt srt-pad wedged into the load and save subbers? That way, application-sensitive data could easily be stored on disk as a loadable context file, without compromising the privacy of the individual the context represents.



Want a killer feature deserving of a major version number? Optionally encrypted binary load and save functions, along with a kill-context feature.



For now, the ability to put the existing load and save stream through the existing encrypt subber would be great :)
Bob the Caveguy aka Lord High Fixer.

Lutz

#1
Just use the following to encrypt a source file:



(write-file "myfile.enc" (encrypt (read-file "myfile") "my-secret-password"))



To execute from the encrypted source file, do:



(eval-string (encrypt (read-file "myfile.enc") "my-secret-password"))
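
Both one-liners rely on 'encrypt' being its own inverse: it XORs the text with the key, so applying it a second time with the same key gives back the original. A minimal sketch of that round trip (the string and key are only for illustration):

```newlisp
;; encrypt XORs the source with the key, so the same call also decrypts.
(set 'secret (encrypt "(set 'x 42)" "my-secret-password"))
(set 'plain  (encrypt secret "my-secret-password"))
(println (= plain "(set 'x 42)"))   ; the round trip restores the text
```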



See also the feature in "link.lsp" to encrypt the source and link it to the binary executable. Or you could distribute a newLISP linked to just a "loader" consisting of the above line and distribute separate encrypted source files.



For deleting contexts, just do the following:



(map delete (symbols 'MyContext))



Only the symbol of the context itself would remain. All symbols in 'MyContext and their contents would be deleted and freed from memory.





Lutz

CaveGuy

#2
Let's look at an example where newLISP is fired off with an input file to evaluate and a context file that contains a corpus of knowledge, dynamically updated based on information gleaned from parsing the input file, using modification rules that are also personalized to and saved in the same context structure.



I perform a load of the previously saved context, then parse the input against the knowledge or rule set contained in the loaded context. If the context was changed as a result of the operation, it saves itself back to disk to be referenced again in the future. These save files range in size up to around a meg or so; 500k to 700k is common.



These contexts are personality profiles and often contain 50,000+ symbols with their associated values.

Each of these profiles or contexts contain detailed accumulated knowledge regarding a living breathing person, who would not like their personal corpus documented in a human readable text file.



It is not the program itself that I am protecting, it is these personal contexts, built and maintained by the application, on the behalf of a carbon based life form with the expectation of privacy.



I can use mysave and myload to perform this function, but it's slow. The goal here is a tradeoff between speed, flexibility, and security.

With an average file size of 500k with 40K symbols to eval, the mysave and myload approach drastically limits the number of context swaps per minute.



In a past life, I used an encrypted load and save to protect the integrity of engineering data and manufacturing rules. In that case anyone could read and use them, but the generation and modification of rule sets were severely restricted.



The faster I can load a context, parse a small (2k - 30k) input file, save the context back to disk and return the better. :)

Lutz

#3
I did some playing around with contexts containing larger numbers of symbols (10k to 100k). The bottleneck seems not to be loading and saving: 100,000 symbols and their values save in about 2.6 seconds and load in 0.6 seconds on my 1.4 GHz Celeron, and the file containing these symbols and their 'set' functions is about 3 MB in size.
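
Timings like these can be reproduced with a rough sketch along the following lines. The context name BIG, the file name big.lsp and the symbol count are made up for illustration, and the numbers will differ by machine and newLISP version:

```newlisp
;; Hypothetical benchmark: BIG, big.lsp and the symbol count are made up.
(context 'BIG)          ; create the context at the top level
(context MAIN)

;; Fill the context with generated symbols s0, s1, ...
(dotimes (i 10000)
  (set (sym (string "s" i) BIG) (string "value " i)))

;; 'time' reports the elapsed milliseconds of the enclosed expression.
(println "save ms: " (time (save "big.lsp" 'BIG)))
(println "load ms: " (time (load "big.lsp")))
(delete-file "big.lsp")
```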



The long time is taken by deleting (not nil'ing, but totally eliminating the symbol). While 10,000 symbols take about 10 seconds to delete, the time goes up geometrically, and with 100,000 symbols it takes too long to wait for. The reason it takes so long to delete a symbol, and that the time increases geometrically, is that each symbol to be deleted has to be dereferenced in other data structures which may refer to it. This means identifying all cells which refer to it and replacing the reference with 'nil'.



So my question is the following: do you have to totally eliminate the symbols, or is it enough to just 'nil' their contents, because the symbols used are the same in all personal contexts. The 'nil'ling is pretty quick (~ 0.17 seconds for 100,000 symbols) and it makes all the memory used by symbol contents available again.
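
A minimal sketch of the 'nil'ing approach, as opposed to 'delete'. MyContext is the placeholder name from earlier in the thread, and the symbols 'likes' and 'visits' are invented for the example:

```newlisp
;; Sketch: clear every symbol's contents but keep the symbols themselves.
(context 'MyContext)
(set 'likes "example" 'visits 3)
(context MAIN)

;; (symbols MyContext) lists the symbols; setting each to nil frees its contents.
(dolist (s (symbols MyContext))
  (set s nil))

(println MyContext:likes)   ; the symbol still exists, its contents are gone
```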



If you need to delete the entire symbol because they could be different in any personal context, then consider a different storage organization: not in symbols, but in lists etc. Another possibility would be to save the changed context, then let the whole application die, then reload newLISP with the new data; that would be a lot faster for eliminating 100,000 symbols than doing it at runtime.



By the way, what kind of application is this? (or perhaps you cannot talk about it).



Lutz

CaveGuy

#4
I would be happy to discuss the application in more detail in a more personal communication, but I am limited in what I am willing to say at this time in an open public forum. Even if there are only about 6 of us here at this time, I expect that number to creep up as application info slips out and more people get involved in this project.



The current logic loads a single context, processes one or more input files against that context, saves the context if applicable, and exits. This works well in what I will call the CGI mode. I have taken a pass on the idea of unloading contexts for now; exiting and reloading is faster at this time. I use the existence or nonexistence of a (symbol? as a data state, just as I use (NaN? and (list?. I looked at nil'ing out old symbols, but quickly found myself in a situation where 200 users with 30K+ symbols each means 6,000,000 symbols, up to 40 or so characters in length, cluttering the heap. In many cases the symbol itself is longer than the data assigned to it.



Your native (load) is very fast, and (save) speed is quite acceptable all considered. On the other hand, my save gets quite slow when it includes encrypting. My first hacks at a decrypting load function left much to be desired.



I feel that at the "C" level you should be able to apply the keypad at a location very close to the actual file access with very little additional overhead, as compared to the (dolist (symbols) (encrypt .... approach to building an encrypted data set at the application level.

Lutz

#5
To weave it into the existing load/save functions isn't trivial, because loading is done in the getToken() function of the parser, which would then get too much overhead.



But I think I found an acceptable solution only a little slower than the unencrypted 'load' and 'save'.



These are the benchmarks on a context containing 100,000 initialized symbols and a save file size of 3,010,315 bytes:



save             2.688 seconds
load             0.641 seconds

save-encrypted   3.015 seconds  - 12% overhead
load-encrypted   0.906 seconds  - 41% overhead





While in saving there is almost no difference, the overhead on the faster load is still acceptable:



;;
;; save a context in ctx to file-name encrypted with key
;;
;; example:
;;
;; (save-encrypted "mycontext.enc" "secret-key-word" 'MyCTX)
;;
(define (save-encrypted file-name key ctx)
  (save ".enctmp" ctx)
  (write-file file-name (encrypt (read-file ".enctmp") key))
  (delete-file ".enctmp"))

;;
;; load/evaluate an encrypted lisp source file
;;
;; example:
;;
;; (load-encrypted "mycontext.enc" "secret-key-word")
;;
(define (load-encrypted file-name key)
  (write-file ".enctmp" (encrypt (read-file file-name) key))
  (load ".enctmp")
  (delete-file ".enctmp"))



Note that the context loaded should not contain the 'load-encrypted' function. Overloading a function which is executing at the same time crashes the system. Also, for a short time the data lies in a file ".enctmp" in unencrypted form while saving or loading, but that file is deleted afterwards.



Lutz

CaveGuy

#6
That will work, provided I can get a unique tempfile name from the system.

As this application is CGI triggered, I can and will have numerous overlapping iterations.
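
One way the temp-file clash between overlapping runs might be avoided is sketched below. It is the same approach as 'save-encrypted' and 'load-encrypted', but every call picks its own temp file. The names 'tmp-name', 'save-encrypted-u' and 'load-encrypted-u' are hypothetical, and the sketch assumes a newLISP release that provides the (uuid) function, which may not be the case for the version discussed in this thread:

```newlisp
;; Sketch only: per-call temp file names so concurrent CGI runs do not collide.
;; Assumes (uuid) is available; tmp-name and the -u functions are made-up names.
(define (tmp-name) (string ".enctmp-" (uuid)))

(define (save-encrypted-u file-name key ctx , tmp)
  (set 'tmp (tmp-name))
  (save tmp ctx)                                        ; plain text save
  (write-file file-name (encrypt (read-file tmp) key))  ; encrypted copy
  (delete-file tmp))

(define (load-encrypted-u file-name key , tmp)
  (set 'tmp (tmp-name))
  (write-file tmp (encrypt (read-file file-name) key))  ; decrypt to temp file
  (load tmp)
  (delete-file tmp))
```

The plain text still touches the disk briefly, so this only addresses the naming clash, not the extra file passes.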



I do think this is a lot of work: open file, read 500K, close file, decrypt data, open file, write 500K, close file. Only now can we begin loading the 500K file, using yet one more open, read, parse, close. We complete the operation with a final file deletion. This is a lot of system overhead just to evaluate a symbol in a saved, protected context.



The save operation takes a similar path. Save the file utilizing an open, compose, write, close. Then we open it back up, read it back in, close the file. Encrypt it, and open a write file, write it back out, close and delete.



I much prefer open - read - decrypt - parse - close.

The save steps would be open - compose - encrypt - write - close.



5 steps vs. 12 steps, and the data passes to and from the file system once, not three times. Now that is a solution that scales well :)

Lutz

#7
Whenever I think about the encryption stuff, the topic 'compression' comes to my mind. It would be nice to have this combined, because lisp source could be compressed to almost 25% of the original.



At this moment I use a self-knitted file-stream interface on which I work with raw read/writes. This is why load is so fast and why there are no limits on the file size for loading. Using the 'C' file stream functions would make things easier, e.g. for using UNIX crypt and compression library functions, but would also slow down lisp source I/O several times.





Lutz

CaveGuy

#8
In my case, the native file compression provided by NT/W2k is sufficient for my short-term needs. Good idea, though.



My order of importance is, security, speed, then file size.



I understand the desire to optimize based on size and speed; I do not feel they have to be compromised.



Security through obscurity just does not cut it anymore. Application developers are finding data security concerns are becoming the norm not the exception these days.



On the other hand, cadLISP uses a rotating mask to "Protect" their native binary save and load. They have a separate text based save. The load subber flags on a binary header and loads either one. I have an unprotector written in C for the cadLISP protection, I would have to get a 1.2 floppy working on something around here again. :)

CaveGuy

#9
;; This is driving me nuts, it should work !!!
;;

(context 'TEST)
(set 'a '("string" 1.1))
(set 'b '("string" 2.2))
(set 'c '("string" 3.3))

(context 'MAIN)

(define (run)
    (mysave 'TEST)
    (myload 'NEW)
    (context 'NEW)
    (symbols)
    )

(define (mysave mc , fn fh)
    (context mc)
    (set 'fn "test.lsp")
    (set 'fh (open fn "w"))
    (dolist (nxt (symbols))
       (write-line (append "(set '" (string nxt) " '" (string (eval nxt)) ")" ) fh) )
    (close fh)
    (set 'fn "test.dat")
    (set 'fh (open fn "w"))
    (dolist (nxt (symbols))
       (write-line (encrypt (append "(set '" (string nxt) " '" (string (eval nxt)) ")" ) "my-key-string") fh) )
    (close fh) )

(define (myload mc, fn)
    (context mc)
    (set 'fn "test.dat")
    (set 'fh (open fn "r"))
    (while (read-line fh)
       (set 'is (encrypt (current-line) "my-key-string"))
       (print is "\n")  ; first it needs to look ok
;;  (eval-string is)  ; before we can try to eval it.
       )
    (close fh) )

;; end of test

Lutz

#10
The 'context' primitive works like a compile directive controlling the creation of symbols in a certain context while parsing newLISP source code. Use the 'context' primitive only on the toplevel, but don't switch context at runtime. If you have to refer to a symbol in a different context during runtime use, e.g.:



(set 'MYCTX:var 123)



this could be embedded in a function without a problem



or



(define (OTHER:proc) .......)



would define proc in the context OTHER, but variables used in OTHER:proc would all refer to the context under which the (define (OTHER:proc) ...) was compiled.



or



(symbols 'ACTX) ; get all the symbols contained in context 'ACTX.



Imagine the following program:



(context 'MAIN)

(define (run) (context 'CTX) (set 'var 123))



MAIN:var => 123



When you run this program, the command prompt will come back in the context CTX, because the statement (context 'CTX) switched the "current" context to CTX. But the variable "var" was still compiled under the context 'MAIN when the function was loaded from a file or entered on the command line. The symbol "var" will be part of MAIN and not CTX.



But now let's load the following program (or enter it on the command line):



(context 'MAIN)

(define (run) (context 'CTX) (eval-string "(set 'var 123)"))



CTX:var => 123



This time "var" will be created in the context CTX and not in MAIN, because "var" doesn't get compiled until runtime of the function 'run': the compilation is done by the 'eval-string' statement. Now at runtime the "compile context" gets switched to CTX and has an effect on the 'eval-string' statement.



Another more complete example:



(context 'CTX)    ;; step A

(set 'var 123)   ;; step B

(context 'MAIN)   ;; step C

(define (run) (context 'CTX) (set 'var 999))   ;; step D

(run)   ;; step E



CTX:var => 123

MAIN:var => 999



step B: At this time CTX:var will be set to 123.

step D: At this time MAIN:var will be created

step E: At this time MAIN:var will be set to 999 and CTX:var stays 123

Also, after E the current compile context is switched to CTX (not after step D, but after step E).



The function 'symbols' always reports the symbols of the "compile context"; that is why:



(define (run) (context 'CTX) (symbols))



will report the symbols in CTX. You can use (symbols 'CTX) to accomplish the same without switching contexts.



The background of all this is the fact that newLISP is not an interpreted language, but rather a compiled language. One S-expression at a time gets read in, compiled and then evaluated.



When you read in a function definition with 'define', all symbols occurring in that 'define' statement get compiled into the current context, which had been switched by a '(context 'BLAH) on the top level before. After that, the 'define' statement gets executed. Execution here means a lambda expression is produced for the symbol defined. It does not mean that the '(context 'XYZ)' statement, which happens to be part of the 'define' statement, has been executed. It doesn't go into effect until the defined function is called from somewhere.



If you want to switch contexts during function execution time, you really have to know what you are doing (changing the compile context). That is why I recommend using 'context' only on the toplevel. Functions like 'symbols', 'symbol' and 'save' can take an optional context argument.
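
A small sketch of that recommendation: the function below reads and modifies another context without ever calling 'context' at runtime. CTX and 'var' are the placeholder names from the examples above; 'touch-ctx' is a made-up name:

```newlisp
;; Sketch: work on another context from inside a function
;; without switching contexts at runtime.
(context 'CTX)
(set 'var 123)
(context MAIN)

(define (touch-ctx)
  (set 'CTX:var 999)      ; fully qualified symbol, no context switch needed
  (symbols 'CTX))         ; the optional argument selects the context to list

(println (touch-ctx))
(println CTX:var)
```

After the call the current context is still MAIN, so code entered afterwards is compiled where it is expected to be.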





Lutz



PS: There is also a chapter about 'context' issues in the manual. This chapter talks about the fact that symbols may already have been created in MAIN by the time a context gets loaded and compiled. In this case symbols occurring in the context will refer to MAIN and not be newly created in the context. When in doubt what symbol is referred to, always prefix with the context name.

CaveGuy

#11
I understand what you are saying about contexts. My problem is not with the context games I am playing; it is a more basic problem, namely the fact that



(!= (write-line (encrypt string key) file)

     (encrypt (current-line) key) => true

when (read-line file) is used.



For some reason the string read is not the same as the string written. (encrypt will take a perfectly valid char string and turn it into a binary string that is very likely to blow up in the string parser during the read-line or write-line, for any one of a number of reasons.



I do understand one needs to be very careful when swapping, saving and loading contexts on the fly.

Lutz

#12
'write-line' and 'read-line' are text-based functions doing (1) CR-LF translation depending on the current platform and (2) breaking lines when reading in at LFs and CR-LFs.



When encrypting a line, LFs may be generated where there were none before. The 'read-line' process then reads a shorter line, giving garbage when decrypted.
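
A tiny illustration of how a line feed can appear out of nowhere, assuming 'encrypt' XORs byte by byte as described. The one-character text and key are chosen only to make the effect visible:

```newlisp
;; "A" is byte 65 and "K" is byte 75; 65 XOR 75 = 10, the line-feed byte.
(set 'cipher (encrypt "A" "K"))
(println (= cipher "\n"))        ; the encrypted byte is an LF
(println (encrypt cipher "K"))   ; decrypting restores the original text
```

A 'read-line' on a file containing that byte would stop right there, which is why the decrypted line comes back truncated.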



In your case, where the encryption is producing 'binary' content, it is better to use the functions 'read-buffer' and 'write-buffer' when dealing with encrypted content.



You could do something like the following:

;; encrypt a file with buffer reads
;;
(set 'infile (open "myfile" "read"))
(set 'outfile (open "myfile.enc" "write"))
(while (!= (read-buffer infile 'buffer 256) 0)
  (set 'buffer (encrypt buffer "secret"))
  (write-buffer outfile 'buffer 256))
(close infile)
(close outfile)

;; decrypt a file with buffer reads
;;
(set 'infile (open "myfile.enc" "read"))
(while (!= (read-buffer infile 'buffer 256) 0)
  (print (encrypt buffer "secret")))
(close infile)


Note that the code is not checking for errors in 'read-buffer'/'write-buffer', which would be indicated by a return value of 'nil'.



Lutz



PS: This week and the following I may not be as quick to respond as usual, because of lots of other stuff, I have to deal with.