Optimizing a function

Started by Jeff, September 02, 2008, 09:39:10 AM

Previous topic - Next topic

Jeff

I have been playing with some of the shootout functions and trying to get decent results with newlisp.  Here is the fastest implementation I have so far for the sieve of eratosthenes:


(define (sieve size , flags total)
  (set 'flags (array (+ size 1))
       'total 0)
  (for (i 2 (- size 1))
    (when (not (flags i))
      (for (k (+ i i) size i (> k size))
        (nth-set (flags k) 1))
      (inc 'total)))
  total)

(define (m n)
  (* (pow 2 n) 10000))

(for (i 9 7)
  (let (n (m i))
    (println (format "Primes up to %8d %8d" n (sieve n)))))


It's roughly the same as the incrementing algorithm in the nsieve programs on the shootout site.  I'm able to run it in around 20 seconds on my machine (which is pretty fast).  The PHP implementation runs quite a bit faster than that.  That shouldn't be right... PHP isn't really optimized for this kind of work.  Any ideas on how to speed this up?[/code]
Jeff

=====

Old programmers don\'t die. They just parse on...



http://artfulcode.net\">Artful code

cormullion

#1
14 seconds here...



(iMac 2GHz Core Duo)



for comparison, the Ruby version (http://shootout.alioth.debian.org/gp4/benchmark.php?test=nsieve&lang=ruby&id=1">//http://shootout.alioth.debian.org/gp4/benchmark.php?test=nsieve&lang=ruby&id=1) took 48 seconds.



I don't see any obvious ways to improve it...



Lutz used character vectors and cpymem for his... http://www.newlisp.org/syntax.cgi?benchmarks/nsieve.newlisp.txt">//http://www.newlisp.org/syntax.cgi?benchmarks/nsieve.newlisp.txt

xytroxon

#2
To really optimise code , you have to use "small iron" cpus to see the problems ;)



I tried nsieve on an old Win98 laptop (48mb ram 120 mhz) machine with PHP 5.2.2, LUA 5.1, and Python 2.5...



-------

->"C:Program Filesphp5php.exe" test.php

Primes up to    10000     1229

Primes up to        0        0

Primes up to        0        0

------

->"C:Lualua5.1.exe" test.lua

Primes up to    40000     4203

Primes up to    20000     2262

Primes up to    10000     1229

------

->C:Python25PYTHON.EXE test.py

Traceback (most recent call last):

  File "test.py", line 18, in <module>

    nsieve((1 << (int(sys.argv[1]) - k)) * 10000)

IndexError: list index out of range

------



PHP and Lua both loaded and ran in a few seconds... Until hitting memory limits and returning partial results...



Python crashes, I think for not being able to solce the problem due to lack of memory space... Required indentation makes copy and paste Python programming at best a time consuming pain and at worst, a dangerous practice!



Your newLISP code Jeff, "runs forever" at 100% CPU usage, I killed it after 10 minutes... It continually accesses the hard drive to use the hard drive virtual memory... (The functional language/VN disk thrash record on this machine is held by PLT Scheme's Dr. Scheme IDE, which takes almost 50 minutes to load and run!)



So you might be having memory "juggling" issues that don't show up on large memory machines, but requires more time than expected for your code to execute...



Swans and Lisp(s) look gracefully smooth on the surface, while paddling like heck under the surface!



-- xytroxon
\"Many computers can print only capital letters, so we shall not use lowercase letters.\"

-- Let\'s Talk Lisp (c) 1976

Jeff

#3
Well, look at the numbers.  It's making arrays of 512 * 10000 elements.  Run it with smaller sizes and multiply the results.  The array is used for constant access speeds.



The same with the other languages.



Xytroxon - it ran consistently at 20-21 seconds on my 2.5 ghz quad-core ppc with 4 gigs of ram.
Jeff

=====

Old programmers don\'t die. They just parse on...



http://artfulcode.net\">Artful code

xytroxon

#4
Quote from: "Jeff"Well, look at the numbers.  It's making arrays of 512 * 10000 elements.  Run it with smaller sizes and multiply the results.  The array is used for constant access speeds.



The same with the other languages.



Xytroxon - it ran consistently at 20-21 seconds on my 2.5 ghz quad-core ppc with 4 gigs of ram.


Backed it down to 1000



D:MinGW - 18:14:04.58 Tue 09-02-2008

->"c:program filesphp5php.exe" test.php

Primes up to     1000      168

Primes up to        0        0

Primes up to        0        0



D:MinGW - 18:14:30.78 Tue 09-02-2008

->nl test.nl

Primes up to   512000    42445

Primes up to   256000    22525

Primes up to   128000    11987



D:MinGW - 18:16:20.52 Tue 09-02-2008

->lua test.lua

"C:Lualua5.1.exe" test.lua

Primes up to     4000      550

Primes up to     2000      303

Primes up to     1000      168

----

The spaces all look right, but Python is still MIA...

-----



While newLISP takes a few minutes to run... It looks like it is doing the calculations on this machine! Amazing!!!



Is thw PHP code the problem? Or is the Zend engine kicking in to help the speed???



This was fun... Timw for supper!!!
\"Many computers can print only capital letters, so we shall not use lowercase letters.\"

-- Let\'s Talk Lisp (c) 1976

Kazimir Majorinc

#5
(setq sieve (lambda(size)
              (let ((flags (array (+ size 1)))
                    (sqrtsize (sqrt size))
                    (total 0))
                   
                   (for (i 2 sqrtsize)
                     (when (not (flags i))
                           (for (k (* i i) size i)
                             (nth-set (flags k) 0))
                           (inc 'total)))
                           
                   (for (i (+ sqrtsize 1) size)
                     (unless (flags i)
                             (inc 'total)))
                     
                   total)))


This one is ~ 1.5 times faster, but trick is mathematical, not programming.
http://kazimirmajorinc.com/\">WWW site; http://kazimirmajorinc.blogspot.com\">blog.

cormullion

#6
I can do it in 11 seconds (as opposed to 14) if I get spawn on the case.



Kasimir's function and spawn get me down to 6 seconds... :)

unixtechie

#7
.. what is it I am doing wrong?

running your snippet on a 1.8GHz single-core Intel compaq/HP laptop with 1GB of memory takes 13 seconds under Linux

cormullion

#8
Looks like an unsurprising time to me. My 2Ghz Core Duo runs Jeff's code in 14 seconds (presumably without engaging both CPUs). Allowing for the way I'm running scripts (not from the command line but from an editor) that seems to match yours. Jeff's using PPCs rather than Intels - are they a bit slower?

Jeff

#9
For math and graphics, PPCs should be faster.
Jeff

=====

Old programmers don\'t die. They just parse on...



http://artfulcode.net\">Artful code

cormullion

#10
Interesting - and puzzling too, that a "2.5 ghz quad-core ppc with 4 gigs of ram" manages 20 seconds whereas a "2 ghz intel core duo with 1 gig of ram" manages 14... Still, the results are in the same general area - not 0.3 seconds, for example.