Print Page - Optimizing a function

Title: Optimizing a function
Post by: Jeff on September 02, 2008, 09:39:10 AM

I have been playing with some of the shootout functions and trying to get decent results with newlisp. Here is the fastest implementation I have so far for the sieve of eratosthenes:

Code Select Expand
(define (sieve size , flags total)
  (set 'flags (array (+ size 1))
       'total 0)
  (for (i 2 (- size 1))
    (when (not (flags i))
      (for (k (+ i i) size i (> k size))
        (nth-set (flags k) 1))
      (inc 'total)))
  total)

(define (m n)
  (* (pow 2 n) 10000))

(for (i 9 7)
  (let (n (m i))
    (println (format "Primes up to %8d %8d" n (sieve n)))))

It's roughly the same as the incrementing algorithm in the nsieve programs on the shootout site. I'm able to run it in around 20 seconds on my machine (which is pretty fast). The PHP implementation runs quite a bit faster than that. That shouldn't be right... PHP isn't really optimized for this kind of work. Any ideas on how to speed this up?[/code]

Title:
Post by: cormullion on September 02, 2008, 03:15:29 PM

14 seconds here...

(iMac 2GHz Core Duo)

for comparison, the Ruby version (//http://shootout.alioth.debian.org/gp4/benchmark.php?test=nsieve&lang=ruby&id=1) took 48 seconds.

I don't see any obvious ways to improve it...

Lutz used character vectors and cpymem for his... //http://www.newlisp.org/syntax.cgi?benchmarks/nsieve.newlisp.txt

Title:
Post by: xytroxon on September 02, 2008, 04:49:21 PM

To really optimise code , you have to use "small iron" cpus to see the problems ;)

I tried nsieve on an old Win98 laptop (48mb ram 120 mhz) machine with PHP 5.2.2, LUA 5.1, and Python 2.5...

-------

->"C:Program Filesphp5php.exe" test.php

Primes up to 10000 1229

Primes up to 0 0

Primes up to 0 0

------

->"C:Lualua5.1.exe" test.lua

Primes up to 40000 4203

Primes up to 20000 2262

Primes up to 10000 1229

------

->C:Python25PYTHON.EXE test.py

Traceback (most recent call last):

File "test.py", line 18, in <module>

nsieve((1 << (int(sys.argv[1]) - k)) * 10000)

IndexError: list index out of range

------

PHP and Lua both loaded and ran in a few seconds... Until hitting memory limits and returning partial results...

Python crashes, I think for not being able to solce the problem due to lack of memory space... Required indentation makes copy and paste Python programming at best a time consuming pain and at worst, a dangerous practice!

Your newLISP code Jeff, "runs forever" at 100% CPU usage, I killed it after 10 minutes... It continually accesses the hard drive to use the hard drive virtual memory... (The functional language/VN disk thrash record on this machine is held by PLT Scheme's Dr. Scheme IDE, which takes almost 50 minutes to load and run!)

So you might be having memory "juggling" issues that don't show up on large memory machines, but requires more time than expected for your code to execute...

Swans and Lisp(s) look gracefully smooth on the surface, while paddling like heck under the surface!

-- xytroxon

Title:
Post by: Jeff on September 02, 2008, 05:03:58 PM

Well, look at the numbers. It's making arrays of 512 * 10000 elements. Run it with smaller sizes and multiply the results. The array is used for constant access speeds.

The same with the other languages.

Xytroxon - it ran consistently at 20-21 seconds on my 2.5 ghz quad-core ppc with 4 gigs of ram.

Title:
Post by: xytroxon on September 02, 2008, 05:29:43 PM

~~Quote from: "Jeff"~~Well, look at the numbers. It's making arrays of 512 * 10000 elements. Run it with smaller sizes and multiply the results. The array is used for constant access speeds.

The same with the other languages.

Xytroxon - it ran consistently at 20-21 seconds on my 2.5 ghz quad-core ppc with 4 gigs of ram.

Backed it down to 1000

D:MinGW - 18:14:04.58 Tue 09-02-2008

->"c:program filesphp5php.exe" test.php

Primes up to 1000 168

Primes up to 0 0

Primes up to 0 0

D:MinGW - 18:14:30.78 Tue 09-02-2008

->nl test.nl

Primes up to 512000 42445

Primes up to 256000 22525

Primes up to 128000 11987

D:MinGW - 18:16:20.52 Tue 09-02-2008

->lua test.lua

"C:Lualua5.1.exe" test.lua

Primes up to 4000 550

Primes up to 2000 303

Primes up to 1000 168

----

The spaces all look right, but Python is still MIA...

-----

While newLISP takes a few minutes to run... It looks like it is doing the calculations on this machine! Amazing!!!

Is thw PHP code the problem? Or is the Zend engine kicking in to help the speed???

This was fun... Timw for supper!!!

Title:
Post by: Kazimir Majorinc on September 02, 2008, 06:32:23 PM

Code Select Expand
(setq sieve (lambda(size)
              (let ((flags (array (+ size 1)))
                    (sqrtsize (sqrt size))
                    (total 0))
                    
                   (for (i 2 sqrtsize)
                     (when (not (flags i))
                           (for (k (* i i) size i)
                             (nth-set (flags k) 0))
                           (inc 'total)))
                           
                   (for (i (+ sqrtsize 1) size)
                     (unless (flags i)
                             (inc 'total)))
                     
                   total)))

This one is ~ 1.5 times faster, but trick is mathematical, not programming.

Title:
Post by: cormullion on September 03, 2008, 11:55:41 AM

I can do it in 11 seconds (as opposed to 14) if I get spawn on the case.

Kasimir's function and spawn get me down to 6 seconds... :)

Title:
Post by: unixtechie on September 06, 2008, 12:26:33 AM

.. what is it I am doing wrong?

running your snippet on a 1.8GHz single-core Intel compaq/HP laptop with 1GB of memory takes 13 seconds under Linux

Title:
Post by: cormullion on September 06, 2008, 01:15:27 AM

Looks like an unsurprising time to me. My 2Ghz Core Duo runs Jeff's code in 14 seconds (presumably without engaging both CPUs). Allowing for the way I'm running scripts (not from the command line but from an editor) that seems to match yours. Jeff's using PPCs rather than Intels - are they a bit slower?

Title:
Post by: Jeff on September 06, 2008, 04:32:46 AM

For math and graphics, PPCs should be faster.

Title:
Post by: cormullion on September 06, 2008, 09:56:26 AM

Interesting - and puzzling too, that a "2.5 ghz quad-core ppc with 4 gigs of ram" manages 20 seconds whereas a "2 ghz intel core duo with 1 gig of ram" manages 14... Still, the results are in the same general area - not 0.3 seconds, for example.

newLISP Fan Club

Forum => Anything else we might add? => Topic started by: Jeff on September 02, 2008, 09:39:10 AM