Lots of memory

Started by alex, February 17, 2006, 09:44:16 AM


alex

I have archive:

E:\dict\enrufull>7za l enrufull.dic.dos.7z
7-Zip (A) 4.20  Copyright (c) 1999-2005 Igor Pavlov  2005-05-30
Listing archive: enrufull.dic.dos.7z

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------
2006-02-15 19:15:18 ....A     37193160     10484024  enrufull.dic.dos
------------------- ----- ------------ ------------  ------------
                              37193160     10484024  1 files

I try to unpack it into memory:

(setq my-dict (exec "7za e -so E:\\dict\\enrufull\\enrufull.dic.dos.7z"))
(read-line)  # "pause"

and at the "pause" point my newLISP program uses more than 70 MB of RAM.

That is a lot of memory (IMHO).

How can I avoid this problem?
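
One way to avoid holding the whole file in memory as one big list is to extract the archive to disk first and then read it line by line through a file device. A minimal sketch, assuming the extraction step succeeds and with process-line standing in for whatever hypothetical per-line work is needed:

(exec "7za e -y E:\\dict\\enrufull\\enrufull.dic.dos.7z")  # extract to a file first
(setq fh (open "enrufull.dic.dos" "read"))
(device fh)                       # make the file the current I/O device
(while (read-line)                # read one line at a time
  (process-line (current-line)))  # process-line is a hypothetical handler
(device 0)                        # restore standard input
(close fh)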

Lutz

#1
Can you show us the return list of the 'exec' statement? What is in 'my-dict' and what does '7za e -so' do?



The original file is already almost 37 MB; if '7za' is some kind of extraction utility, then 70 MB, with the extra overhead of putting every line into a list, does not seem like too much. What is '7za e -so' for?



Lutz

alex

#2
enrufull.dic.dos is a big text file.

The command "7za e -so enrufull.dic.dos.7z" extracts it to standard output.

I think the problem may occur with any "external" command.

Example 2:

I have a text file eng1.txt (10119386 bytes), and after the command

(setq my-dict (exec "type eng1.txt"))

and before

(read-line)

my newLISP program uses nearly 20 MB of RAM.

That is twice the size of eng1.txt :(
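
To separate newLISP's own cell allocation from what Windows reports for the process, sys-info can be checked around the exec call. A small sketch (sys-info 0 returns the number of Lisp cells in use, not bytes, so the process size still has to be watched in the Task Manager):

(println "cells before: " (sys-info 0))
(setq my-dict (exec "type eng1.txt"))
(println "cells after: " (sys-info 0) ", lines: " (length my-dict))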

Dmi

#3
I'm not Lutz ;-) But it seems to be normal:

The first memory allocation occurs when exec collects the program's output.

The second occurs when the result is copied to the new symbol my-dict.

After that, the first piece of memory is freed internally and will be reused the next time. It just can't be released back to Windows.
WBR, Dmi
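
One way to check this description is to allocate a large string, drop it, and allocate again while watching the process footprint from the outside. A sketch, assuming the Task Manager (or similar) is used to compare the sizes:

(setq big (dup "x" 30000000))    # allocate a ~30 MB string
(setq big nil)                   # the old string is freed internally...
(setq big2 (dup "y" 30000000))   # ...and this allocation should reuse that space
(read-line)                      # pause here and compare the process size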

alex

#4
That is not good, if it is so... :-(

alex

#5
Do we have such problems under Linux too, or not?

Dmi

#6
Hmm... As far as I can see, under Linux newLISP doesn't free large memory allocations for symbols either - it just reuses them.

Also, as far as I can see, the memory allocated by exec is completely freed.

So I may be wrong in my previous explanation...



But, incidentally, I just found another issue:

I loaded a very large file consisting of small (10-character) lines. After exec's output was parsed into a list (as the documentation describes), I got a large memory overhead, much more than a factor of two, and I suspect it is caused by the internal structure of a list. Your dictionary file may have a similar issue.

Try an exec statement that returns a small number of very long lines and compare the memory results.
WBR, Dmi

Dmi

#7
...oh, my English isn't very clear :-)

I think the list's per-cell storage overhead is not an "issue" - it's just logically expected behavior.
WBR, Dmi

alex

#8
I created the files short.txt and long.txt:

(write-file "short.txt" (dup "aaaaaaaaan" 1000000))
(write-file "long.txt" (dup (append (dup "b" 1000000) "n") 10) )

and tests

(setq dict (exec "type long.txt"))              #test1
#(setq dict (exec "type short.txt"))              #test2
#(setq dict (parse (read-file "long.txt") "n"))  #test3
#(setq dict (parse (read-file "short.txt") "n")) #test4
(read-line)  # "pause"
(exit)

Memory usage:

  in test1, about 11 MB

  in test2, about 60 MB

  in test3, about 11 MB

  in test4, about 60 MB



Conclusion: long list = big memory
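
That matches the explanation above: short.txt parses into about a million separate one-line strings, one list cell each, while long.txt parses into only ten. If I recall the newLISP documentation correctly, a cell is 16 bytes on 32-bit systems, and each short string also carries its own allocation overhead, so a million tiny lines cost far more than the 10 MB of raw text. A quick way to see the cell counts, as a sketch (sys-info 0 reports the number of Lisp cells currently in use):

(setq dict (parse (read-file "short.txt") "\n"))
(println "cells in use: " (sys-info 0))  # roughly 1,000,000, one cell per line

(setq dict (parse (read-file "long.txt") "\n"))
(println "cells in use: " (sys-info 0))  # only a handful of cells now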