I have an archive:
E:\dict\enrufull>7za l enrufull.dic.dos.7z
7-Zip (A) 4.20 Copyright (c) 1999-2005 Igor Pavlov 2005-05-30
Listing archive: enrufull.dic.dos.7z
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------
2006-02-15 19:15:18 ....A 37193160 10484024 enrufull.dic.dos
------------------- ----- ------------ ------------ ------------
37193160 10484024 1 files
I try to unpack it into memory:
(setq my-dict (exec "7za e -so E:\dict\enrufull\enrufull.dic.dos.7z"))
(read-line) # "pause"
and in point "pause" my newlisp-program get more then 70 Mb of RAM
It is lots of memory(IMHO)
How I can avoid the problem?
Can you show us the return list of the 'exec' statement? What is in 'my-dict' and what does '7za e -so' do?
The original file is already about 35 MB. If '7za' is some kind of extraction utility, then 70 MB, including the overhead of putting every line into a list, does not seem like too much. What is '7za e -so' for?
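For example, something along these lines would show what came back (just a sketch to illustrate what I am asking for):
(println "lines returned: " (length my-dict))  ; exec returns a list of output lines
(println "first line: " (first my-dict))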
Lutz
enrufull.dic.dos is a big text file.
The command "7za e -so enrufull.dic.dos.7z" extracts it to standard output.
I think the problem may show up with any "external" command.
Example 2:
I have a text file eng1.txt (10119386 bytes), and after the command
(setq my-dict (exec "type eng1.txt"))
before
(read-line)
my newLISP program takes nearly 20 MB of RAM.
That is twice the size of eng1.txt :(
I'm not Lutz ;-) But it seems to be normal:
the first memory allocation occurs when exec collects the program's output;
the second occurs when the output is copied to the new symbol my-dict.
After that, the first piece of memory is internally freed and will be reused the next time.
It just can't be released back to Windows.
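You can watch the interpreter side of this from inside newLISP (a sketch; sys-info reports Lisp cell usage, not what the Windows task manager shows, so it only illustrates the reuse part):
(println "cells before: " (sys-info 0))         ; number of Lisp cells in use
(setq my-dict (exec "type eng1.txt"))
(println "cells after 1st exec: " (sys-info 0))
(setq my-dict (exec "type eng1.txt"))           ; same call again
; the count should stay about the same instead of doubling,
; because the internally freed cells are reused for the new list
(println "cells after 2nd exec: " (sys-info 0))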
That is not good, if it is so... :-(
Do we have such problems under Linux, or not?
Hmm... As far as I can see, under Linux newLISP doesn't free large memory allocations for symbols - it just reuses them.
Also, as far as I can see, the memory allocated by exec is completely freed.
So I may be wrong in my previous conclusion...
But, incidentally, I just found another issue:
I loaded a very large file consisting of small (10-character) lines. After exec's output was parsed into a list (as described in the documentation), I got a large memory overhead, much more than a factor of two, and I suspect it is caused by the internal structure of lists. Your dictionary file may have a similar issue.
Try an exec statement that returns a small number of very long lines and compare the memory results.
...oh, my English isn't very clear :-)
I think the list's cell storage overhead is not an "issue" - just logically expected behavior.
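A rough back-of-envelope estimate (all numbers here are my assumptions: a 32-bit build where each list element costs one 16-byte Lisp cell, plus a guessed ~16 bytes of allocator overhead per separately stored string):
(setq lines 1000000)       ; one million short lines
(setq cell 16)             ; assumed size of one Lisp cell per list element
(setq str-alloc 16)        ; guessed per-string allocation overhead
(setq content 10)          ; the ten characters of each line
(println (* lines (+ cell str-alloc content)))  ; → 42000000
That is ~42 MB of structure for ~10 MB of actual text - the same order of magnitude as the ~60 MB measured in the tests below.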
I created files short.txt and long.txt:
(write-file "short.txt" (dup "aaaaaaaaan" 1000000))
(write-file "long.txt" (dup (append (dup "b" 1000000) "n") 10) )
and ran these tests:
(setq dict (exec "type long.txt")) #test1
#(setq dict (exec "type short.txt")) #test2
#(setq dict (parse (read-file "long.txt") "\n")) #test3
#(setq dict (parse (read-file "short.txt") "\n")) #test4
(read-line) # "pause"
(exit)
Amount of memory:
in test1: about 11 MB
in test2: about 60 MB
in test3: about 11 MB
in test4: about 60 MB
Conclusion: a long list = big memory. Both files hold roughly the same 10 MB of text; only the number of list elements differs (10 vs 1000000).
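So, for the original dictionary problem, one possible workaround (a sketch, not tested against the real archive) is to avoid building the big list at all: extract the archive to disk once, then stream the file line by line, so only one line lives in memory at a time. process-line here is a hypothetical handler standing in for whatever is done with each entry:
(exec "7za e -y enrufull.dic.dos.7z")      ; extract to a plain file first (-y: assume Yes on prompts)
(setq fh (open "enrufull.dic.dos" "read"))
(while (read-line fh)                      ; one line per iteration, nil at end of file
    (process-line (current-line)))         ; process-line is hypothetical, not built-in
(close fh)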