newLISP speed

Started by HPW, May 06, 2004, 06:49:06 AM

Previous topic - Next topic

HPW

Today a co-worker complained about the speed of newLISP running on our Solaris host. He had started a parser process and waited around 4 minutes for it to return.



I asked him for details, and it turns out he parses a 40 MB file with 648,000 lines, where every line is parsed and a 7,200-line result file is written.



So I think that is quite good for an interpreter, and it runs rock-solid.



:-)
Hans-Peter

Lutz

#1
Actually, I think it could be much faster, but I don't know what the code really does, so I am not sure; it is just a guess.



Lutz

HPW

#2
You may have a look at the code; maybe you will see some points where optimization is possible.



(setq starttime (now))
;-----------------------------------------------------------------------
;Repeats str num times
;-----------------------------------------------------------------------
(define (repstr str num    newstr)
(setq newstr "") (dotimes (x num) (setq newstr (append newstr str))))
;-----------------------------------------------------------------------
;Main
;-----------------------------------------------------------------------
(if (=(length(main-args))4)
(begin
 (setq in-file(open (nth 2(main-args))"read"))
 (setq out-file(open (nth 3(main-args))"write"))
 (if (> (last (sys-info)) 5)
(setq lineendstr "")   ;WIN = 6
(setq lineendstr "\r") ;Solaris = 4
 )
(write-line "Starte Konvertierung Protokoldatei zu CSV")
(setq NextLineHpos nil
NewOutString nil
KombiEndCount 0
UposCount 0
MaxUpos 40
)
(while (setq linestr(read-line in-file))
(setq linelst (parse linestr " "))
(if NextLineHpos
(setq hposlst (replace " " linestr  "")
hposlst (parse hposlst "|")
NewOutString (string NewOutString (nth 0 hposlst)";"
(nth 1 hposlst)";"
(nth 2 hposlst)";")
NextLineHpos nil
)
)
(if (=(string(nth 0 linelst))"UPos")
(begin
(if upospreisneeded
(setq uposlst (replace "[ ]+" linestr "|" 1)
uposlst (parse uposlst "|")
NewOutString (string NewOutString "-99999;"
(nth 2 uposlst)";"
(nth 4 uposlst)";"
(nth 6 uposlst)";"))
(setq uposlst (replace "[ ]+" linestr "|" 1)
uposlst (parse uposlst "|")
NewOutString (string NewOutString (nth 2 uposlst)";"
(nth 4 uposlst)";"
(nth 6 uposlst)";"))
)
(inc 'UposCount)
(setq upospreisneeded true)
)
)
(if (=(string(nth 0 linelst))"UPos-Einzelpr.=")
(setq uposlst (replace "[ ]+" linestr "|" 1)
uposlst (parse uposlst "|")
NewOutString (string NewOutString (replace "."(nth 1 uposlst)",")";")
upospreisneeded nil
)
)
(if (and(=(string(nth 0 linelst))"MbiKatexFctFwUndRundung:")
(=(string(nth 1 linelst))"Input-Preis=")
(=(string(last linelst))"bKombi=1")
(= KombiEndCount 0))
(setq KombiEndCount 1)
)
(if (and(=(string(nth 0 linelst))"MbiKatexFctFwUndRundung:")
(=(string(nth 1 linelst))"Output-Preis=")
(= KombiEndCount 1))
(setq uposlst (replace "[ ]+" linestr "|" 1)
uposlst (parse uposlst "|")
NewOutString (string NewOutString
(repstr ";"(*(- MaxUpos UposCount)4)))
NewOutString (string NewOutString (replace "."(last uposlst)",")";")
KombiEndCount 2
)
)
(if (and(=(string(nth 0 linelst))"HPos")(=(string(last linelst))"|tstik_varid"))
(begin
(setq NextLineHpos true
KombiEndCount 0
UposCount 0
)
(if NewOutString
(write-line (append
NewOutString
lineendstr
)
out-file
)
)
(setq NewOutString "")
)
)
)
(if NewOutString
(write-line (append
NewOutString
lineendstr
)
out-file
)
)
(close out-file)
(close in-file)
(write-line "Ende Konvertierung Protokoldatei zu CSV")
(setq endtime (now))
(write-line (string starttime))
(write-line (string endtime))
(exit)
)
(begin
(write-line "Aufruf: newlisp parsepreisprot.lsp in.txt out.csv")
(exit)
)
)
Hans-Peter

Lutz

#3
You should definitely replace (repstr ...) with this much faster version:



(define (repstr str num , lst)
   (dotimes (x num) (push str lst))
   (join lst))


When num is a few thousand, this routine is a hundred times faster.



Letting a string grow by repeatedly appending to it is !very! expensive. It is much faster to push the pieces onto a list and then join them.
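
A rough way to see the difference is to time both variants with the built-in 'time'; repstr-append and repstr-join below are just labels for the two versions, and the numbers will of course vary by machine:

; build the string by appending in a loop
(define (repstr-append str num , newstr)
   (setq newstr "")
   (dotimes (x num) (setq newstr (append newstr str)))
   newstr)

; build a list of pieces and join once at the end
(define (repstr-join str num , lst)
   (dotimes (x num) (push str lst))
   (join lst))

(println "append: " (time (repstr-append ";" 5000)) " ms")
(println "join:   " (time (repstr-join ";" 5000)) " ms")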



Otherwise it looks OK to me.



Lutz

HPW

#4
Thanks Lutz,



I will test it in the morning when I am back in the office.

I will post the speedup.

(num was at most 160 this time)
Hans-Peter

HPW

#5
I made a test file by hand on my home system with a similar amount of data.

(This time 648,000 lines in 42 MB.)



Surprise: no time difference. Both 'repstr' versions take around 110 sec on my 1.8 GHz WIN XP machine. Tomorrow I will compare on Solaris, but I expect it to be slower, because the process only uses 1 of its 8 processors, and that one is probably a bit slower than my home PC.



In the end I took repstr out completely and the run still took exactly the same time.

So repstr takes no significant time.
Hans-Peter

Lutz

#6
Looks like (repstr ...) takes only a little time in the overall process. Still, keep this kind of optimization in mind for the future; it has saved me many times when doing repeated appends on a string.



The only other suggestion I have is to shorten the code that does the regex work, but I am not sure how far that is applicable in your case, and complex regular expressions can also take their time ...
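
For example, each replace-then-parse pair could probably be collapsed into a single regex-based parse, assuming your newLISP version supports the regex option of 'parse'. The sample line below is made up, and the token positions would need checking against the real log data (a line with leading blanks yields an extra empty first token):

; one step instead of (replace "[ ]+" linestr "|" 1) followed by (parse ... "|")
(setq linestr "UPos   1   Item   2,50")     ; made-up sample line
(setq uposlst (parse linestr "[ ]+" 0))     ; split on runs of spaces, regex option 0
(println uposlst)                           ; => ("UPos" "1" "Item" "2,50")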



Lutz

HPW

#7
Now tested in the office (same file and code):



Solaris host, 850 MHz (using 1 of 8 CPUs): 164 sec

WIN XP, 2.8 GHz: 58 sec



Then I stripped the loop down to do nothing, on WIN XP: 47 sec

So just reading the lines takes most of the time.
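
Something like this bare loop is enough to isolate the read-line cost ("in.txt" is a placeholder for the test file):

(setq in-file (open "in.txt" "read"))
(while (read-line in-file) true)   ; read every line, do nothing with it
(close in-file)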
Hans-Peter

Lutz

#8
Interesting; it looks like I/O is the limiting factor in this case, not the processing itself. You could of course try:



(dolist (line (parse (read-file "thedatafile") "\r\n"))
    (process-the-line line))



This might speed up I/O, but it costs a lot of memory.
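
In your script that would roughly mean replacing the while/read-line loop with something like the following; the separator assumes Windows-style \r\n line ends, and the per-line body stays the same:

(dolist (linestr (parse (read-file (nth 2 (main-args))) "\r\n"))
    (setq linelst (parse linestr " "))
    ; ... the same per-line processing as in the while loop ...
)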



Lutz

HPW

#9
Interesting idea:



Home WIN XP, 1.8 GHz / 512 MB RAM, with the 42 MB test file



With read-line: 104 sec, memory load < 1 MB

With read-file: 29 sec, memory load 109 MB



So when the file fits in memory, reading the whole file this way is preferable.
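
One way to make that choice automatic would be to check the size first. A rough sketch, where 'file-info' returns the file size as its first element, process-line stands in for the per-line work, and the 64 MB threshold is just an example value:

(setq fname "in.txt")                              ; placeholder file name
(if (< (first (file-info fname)) (* 64 1024 1024))
    ; small enough: slurp the whole file and split into lines
    (dolist (linestr (parse (read-file fname) "\r\n"))
        (process-line linestr))
    ; otherwise stream it line by line to keep memory low
    (begin
        (setq in-file (open fname "read"))
        (while (setq linestr (read-line in-file))
            (process-line linestr))
        (close in-file)))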
Hans-Peter

Lutz

#10
Almost a 4x speed improvement ... pretty good ...



And if what they say about Sun workstations is true, it should perform even better there, because they are supposed to be much faster at I/O transfer than PCs. Perhaps a Sun machine can swallow that 45 MByte chunk in 10 seconds; that would make your Solaris folks happy :)



Lutz