sorting

Started by eddier, March 09, 2005, 01:16:09 PM

Previous topic - Next topic

eddier

Lutz,



In a particular CGI program I am writing the following problem has come up.



I will have to write a sort based on a comparison function. For example, I need to ignore case of letters and I need to sort by a different element of lists of lists as in

(sort (("12" "Bb") ("32" "ab")) f)

where f is a function to compare the 2nd elements without case.

I was able to

(define (sort-by L col)
  (map (fn (y) (swap 0 col y))
       (sort
(map (fn (x) (swap 0 col x))
    L))))

but I still have the problem of a case sensitive sort.  If I code a sort myself in newlisp, I have the problem of setting the stack. Because lists may get rather long, how do I force a cgi to make newlisp change it's stack size?

Maybe I should write a c function store it in a library and use it?



Eddie

Lutz

#1
you can put comandline parameters for newlisp in the first line of your script:



#!/usr/bin/newlisp -s 10000



or



#!/usr/bin/newlisp -s10000



newLISP can parse both, but it also depends on your platform, in my experiments BSD and Linux work fine with both. I always use this as a test script:



#!/usr/bin/newlisp -s10000



(println (main-args))

(println (sys-info))



Lutz

eddier

#2
Thanks, then I have the following solution. I know merge sort uses a bit of memory but it works for now.



(define (merge-sort X f)
  (let (h (/ (length X) 2))
    (if (= (length X) 1)
X
      (merge
       (merge-sort (slice X 0 h) f)
       (merge-sort (slice X h) f))))
 
(define (merge X Y f)
  (cond ((empty? X) Y)
((empty? Y) X)
((f X Y)
(cons (first X) (merge (rest X) Y f)))
(true (cons (first Y) (merge X (rest Y) f)))))


where f is a comparison function. I kind of like to send a (fn (x y) ...)



Eddie

eddier

#3
After testing I have come to no conclusion. Which is faster (= (length X) 1) or (empty? (rest X))?



Sometimes the former is faster sometimes the latter is faster.



Eddie

Lutz

#4
Nice merge sort! In my measurements (empty? (rest X)) is about 10% faster than (= (length X) 1).



Lutz

eddier

#5
Thanks Lutz! Everytime I've tried to time the two I get such different results that I couldn't figure out which was faster.



The previous sort had a problem with the contents passed to the compare function. The following fixes the problem and adds the faster compare.



(define (merge-sort X f)
  (println X)
  (let (h (/ (length X) 2))
    (if (empty? (rest X))
X
      (merge
       (merge-sort (slice X 0 h) f)
       (merge-sort (slice X h) f)))))
 
(define (merge X Y)
  (cond ((empty? X) Y)
((empty? Y) X)
((f (first X) (first Y))
(cons (first X) (merge (rest X) Y)))
(true (cons (first Y) (merge X (rest Y))))))


After trying it out, I've found that this is too slow and uses too much memory for my CGI application. I'm going to have to make a library sort and import it. I can use one of the already tweeked quick sorts with a lisp comparison operation (I'll loose the benifits of the tweek from the lisp comparison). I'll get back to this after class.



Eddie

eddier

#6
Oops! leave the (println X) out.



Eddie

Lutz

#7
You could dothe following:



(set 'data '(("12" "Bb") ("32" "ab") ("43" "bx")) )



(1) extract the upper-cases sort-term together with an index into the list



=> (("BB" 0) ("AB" 1) ("BX" 2))



(2) sort the converted list:



=> (("AB" 1) ("BB" 0) ("BX" 2))



(3) extract the indices:



=> (1 0 2)



(4) select in sort order:



(select data '(2 0 3)) => (("32" "ab") ("12" "Bb") ("43" "bx"))



Lutz

eddier

#8
Thanks. I will see if this gives me the performance and flexibility I need.



Eddie

eddier

#9
Your solution works great! Yep this small and fast enough :)



(define (sort-by L col)
  (let (k -1)
    (select L
  (map last (sort
     (map (fn (x) (list (upper-case (nth col x)) (inc 'k)))
  L))))))


Eddie

eddier

#10
Lutz, The CGI works pretty fast connected straight to our server. I would like to know how fast it responds offsite. I'm not sure if this is intuitive to use so please feel free to suggest improvements.



[/url]http://www.bmc.edu/cgi-bin/alum.cgi">http://www.bmc.edu/cgi-bin/alum.cgi[/url]



Eddie

newdep

#11
I have a 300 Kbyte per second downstream line.

Here are the 10x results:





> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi">http://www.bmc.edu/cgi-bin/alum.cgi" ))

4797

> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi">http://www.bmc.edu/cgi-bin/alum.cgi" ))

3345

> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi">http://www.bmc.edu/cgi-bin/alum.cgi" ))

3755

> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi">http://www.bmc.edu/cgi-bin/alum.cgi" ))

3375

> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi">http://www.bmc.edu/cgi-bin/alum.cgi" ))

3715

> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi">http://www.bmc.edu/cgi-bin/alum.cgi" ))

4236

> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi">http://www.bmc.edu/cgi-bin/alum.cgi" ))

4727

> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi">http://www.bmc.edu/cgi-bin/alum.cgi" ))

4727

> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi">http://www.bmc.edu/cgi-bin/alum.cgi" ))

3985

> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi">http://www.bmc.edu/cgi-bin/alum.cgi" ))

3274





For a speed improvement you can create the page in a buffer

and then display the buffer in 1 step to the user, instead of

building it while downloading the page. But perhpas that is

just what the point is ;-)



Pretty good function ;-)



Regards, Norman.
-- (define? (Cornflakes))

eddier

#12
Thanks Norman :)



Connected directly to the network, Im getting

> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi"))
409
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi"))
403
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi"))
428
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi"))
399
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi"))
399
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi"))
413
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi"))
397


But I notice from home things are much slower.


QuoteFor a speed improvement you can create the page in a buffer

and then display the buffer in 1 step to the user, instead of

building it while downloading the page. But perhpas that is

just what the point is ;-)


Exactly! I build the page on the fly because I have different programs on two other clients that update the data and I'm going to use this same prototype CGI (with a few modifications) to genereate all the table listings on our cite. I'll have to modify it a small bit for context so that it will know which data to grab and make it a bit more generic.



The main part of the code is to sort data by the column title clicked and generate an index for quick scrolling to the data items. I'm not sure I like the way graduation dates are indexed. Click the column with "Class of" and notice how it does the hyphenated ones. What do you think about ?



Eddie[/quote]

newdep

#13
Hi Eddie,



Its a pretty nice CGI composition..quiet fast, but i think not fast enough because i have a fast internet connection and others perhpas dont. If its

for internal network use only its fast enough..



The "bottle-neck" looks to be the collection of the data by the scripts

behind the cgi script. The CGI scripts itself only generates the webpage (which is fast). If you have a problem with the script running behind the cgi-script

then the webuser needs to wait and wait.. i guess..If you can generate the output webpage in a memory buffer in newlisp (using [text] [/text]) and then dump it to the webuser is much faster.. it is possible doing this with dynamic content also..



Another option is perhpas building a little Cashing inside the CGI script..

So befor you display you must i.e. already have loaded 2 KB of data from

you database befor displaying..This way you get bigger steps in displaying the content to the webuser.



Regards, Norman.
-- (define? (Cornflakes))

eddier

#14
The way data is collected is through the (load mechanism). The client programs create a list of lists from the a database and then ftp the file to the server. I'm now thinking of changing their scripts to just dump 4 html pages onto the server instead of going through CGI. That way the response will be nearly instantaneous.



Eddie