I have an archive:
E:\dict\enrufull>7za l enrufull.dic.dos.7z
7-Zip (A) 4.20 Copyright (c) 1999-2005 Igor Pavlov 2005-05-30
Listing archive: enrufull.dic.dos.7z
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------
2006-02-15 19:15:18 ....A 37193160 10484024 enrufull.dic.dos
------------------- ----- ------------ ------------ ------------
37193160 10484024 1 files
I try to unpack it into memory:
(setq my-dict (exec "7za e -so E:\dict\enrufull\enrufull.dic.dos.7z"))
(read-line) # "pause"
and in point "pause" my newlisp-program get more then 70 Mb of RAM
It is lots of memory(IMHO)
How I can avoid the problem?
Can you show us the return list of the 'exec' statement? What is in 'my-dict' and what does '7za e -so' do?
The original file is already about 35 MB. If '7za' is some kind of extraction utility, then 70 MB, including the overhead of putting every line into a list, does not seem like too much. What is '7za e -so' for?
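For example, something along these lines would show what came back (just a sketch to illustrate what I am asking for):
(println "lines returned: " (length my-dict))  ; exec returns a list of output lines
(println "first line: " (first my-dict))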
Lutz
enrufull.dic.dos is a big text file.
The command "7za e -so enrufull.dic.dos.7z" extracts it to standard output.
I think the problem may show up with any "external" command.
Example 2:
I have a text file eng1.txt (10119386 bytes), and after the command
(setq my-dict (exec "type eng1.txt"))
before
(read-line)
my newLISP program takes nearly 20 MB of RAM.
That is twice the size of eng1.txt :(
I'm not Lutz ;-) But it seems to be normal:
the first memory allocation occurs when exec collects the program's output;
the second occurs when the output is copied to the new symbol my-dict.
After that, the first piece of memory is internally freed and will be reused the next time.
It just can't be released back to Windows.
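You can watch the interpreter side of this from inside newLISP (a sketch; sys-info reports Lisp cell usage, not what the Windows task manager shows, so it only illustrates the reuse part):
(println "cells before: " (sys-info 0))         ; number of Lisp cells in use
(setq my-dict (exec "type eng1.txt"))
(println "cells after 1st exec: " (sys-info 0))
(setq my-dict (exec "type eng1.txt"))           ; same call again
; the count should stay about the same instead of doubling,
; because the internally freed cells are reused for the new list
(println "cells after 2nd exec: " (sys-info 0))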
That is not good, if it is so... :-(
Do we have such problems under Linux, or not?
Hmm... As far as I can see, under Linux newLISP doesn't free large memory allocations for symbols - it just reuses them.
Also, as far as I can see, the memory allocated by exec is completely freed.
So I may be wrong in my previous conclusion...
But, incidentally, I just found another issue:
I loaded a very large file consisting of small (10-character) lines. After exec's output was parsed into a list (as described in the documentation), I got a large memory overhead, much more than a factor of two, and I suspect it is caused by the internal structure of lists. Your dictionary file may have a similar issue.
Try an exec statement that returns a small number of very long lines and compare the memory results.
...oh, my English isn't very clear :-)
I think the list's cell storage overhead is not an "issue" - just logically expected behavior.
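A rough back-of-envelope estimate (all numbers here are my assumptions: a 32-bit build where each list element costs one 16-byte Lisp cell, plus a guessed ~16 bytes of allocator overhead per separately stored string):
(setq lines 1000000)       ; one million short lines
(setq cell 16)             ; assumed size of one Lisp cell per list element
(setq str-alloc 16)        ; guessed per-string allocation overhead
(setq content 10)          ; the ten characters of each line
(println (* lines (+ cell str-alloc content)))  ; → 42000000
That is ~42 MB of structure for ~10 MB of actual text - the same order of magnitude as the ~60 MB measured in the tests below.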
I created files short.txt and long.txt:
(write-file "short.txt" (dup "aaaaaaaaan" 1000000))
(write-file "long.txt" (dup (append (dup "b" 1000000) "n") 10) )
and ran these tests:
(setq dict (exec "type long.txt")) #test1
#(setq dict (exec "type short.txt")) #test2
#(setq dict (parse (read-file "long.txt") "\n")) #test3
#(setq dict (parse (read-file "short.txt") "\n")) #test4
(read-line) # "pause"
(exit)
Amount of memory:
in test1: about 11 MB
in test2: about 60 MB
in test3: about 11 MB
in test4: about 60 MB
Conclusion: a long list = big memory. Both files hold roughly the same 10 MB of text; only the number of list elements differs (10 vs 1000000).
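So, for the original dictionary problem, one possible workaround (a sketch, not tested against the real archive) is to avoid building the big list at all: extract the archive to disk once, then stream the file line by line, so only one line lives in memory at a time. process-line here is a hypothetical handler standing in for whatever is done with each entry:
(exec "7za e -y enrufull.dic.dos.7z")      ; extract to a plain file first (-y: assume Yes on prompts)
(setq fh (open "enrufull.dic.dos" "read"))
(while (read-line fh)                      ; one line per iteration, nil at end of file
    (process-line (current-line)))         ; process-line is hypothetical, not built-in
(close fh)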