newLISP Fan Club

Forum => newLISP in the real world => Topic started by: cormullion on February 27, 2011, 09:21:47 AM

Title: This week's challenge
Post by: cormullion on February 27, 2011, 09:21:47 AM
Write a function that transforms a list of strings:


(set 'l '(
"aa"
"a12"
"aaaa123"
"aaa12"
"aa1112"
"b"
"ba"
"b12"
"bbaa123"
"baa12"
"ba1112"
"c1"
"c2"
"c1313"
"c1121aa"
"caababb"
; and so on
))

into a list of lists of strings:
'(("a12" "aa" "aa1112" "aaa12" "aaaa123")
  ("b" "b12" "ba" "ba1112" "baa12" "bbaa123")
  ("c1" "c1121aa" "c1313" "c2" "caababb"))
Title: Re: This week's challenge
Post by: Sammo on February 27, 2011, 08:53:06 PM
Probably not elegant but it works:
(define (magoo list-of-strings)
(magoo-aux (sort list-of-strings) '()))

(define (magoo-aux list-of-strings list-of-lists)
(if (empty? list-of-strings)
list-of-lists
  ;else
    (local (newlist)
(push (pop list-of-strings) newlist)
(while (and
(not (empty? list-of-strings))
  (= ((list-of-strings 0) 0) ((newlist 0) 0))
(push (pop list-of-strings) newlist -1))
)
(magoo-aux list-of-strings (push newlist list-of-lists -1))
)
)
)
Title: Re: This week's challenge
Post by: kosh on February 27, 2011, 10:16:14 PM
It works :)


(define (func lst)
  (map (lambda (c)
         (filter (lambda (elem)
                   (starts-with elem c))
                 (sort lst)))
       (unique (map first lst))))

(func l)
;=> (("a12" "aa" "aa1112" "aaa12" "aaaa123")
;    ("b" "b12" "ba" "ba1112" "baa12" "bbaa123")
;    ("c1" "c1121aa" "c1313" "c2" "caababb"))
Title: Re: This week's challenge
Post by: cormullion on February 28, 2011, 01:27:54 AM
Good solutions! Kosh's solution is neater and quick for small lists, but Sammo's solution is quicker for large lists...
Title: Re: This week's challenge
Post by: johu on February 28, 2011, 02:19:58 AM
Belatedly, I try by using dolist.


(define (grouped-list lst (func (fn (x y) (= (x 0) (y 0)))))
  (sort lst)
  (local (res tmp)
    (dolist (x lst)
      (unless (or (not tmp) (func x (tmp -1)))
        (push tmp res -1) (setq tmp '()))
      (push x tmp -1))
  (push tmp res -1)))

This is used as following:


> (grouped-list l)
(("a12" "aa" "aa1112" "aaa12" "aaaa123") ("b" "b12" "ba" "ba1112" "baa12" "bbaa123")
 ("c1" "c1121aa" "c1313" "c2" "caababb"))
> (setq nums (map int (map (curry mul 1000) (random 0 1 100))))
(81 806 855 124 307 573 472 762 191 727 995 15 647 691 174 72 274 676 838 556 371
 779 232 638 231 204 971 281 22 769 151 46 9 348 643 343 414 416 90 545 699 337 657
 876 535 570 53 465 259 426 863 486 87 565 709 778 864 850 293 864 644 746 828 765
 871 908 912 521 440 107 354 823 427 169 0 657 773 63 89 385 683 807 154 142 48 346
 18 978 647 53 914 637 431 21 242 391 654 781 22 826)
> (grouped-list nums (fn (x y) (= (length (string x)) (length (string y)))))
((0 9) (15 18 21 22 22 46 48 53 53 63 72 81 87 89 90) (107 124 142 151 154 169 174
  191 204 231 232 242 259 274 281 293 307 337 343 346 348 354 371 385 391 414 416
  426 427 431 440 465 472 486 521 535 545 556 565 570 573 637 638 643 644 647 647
  654 657 657 676 683 691 699 709 727 746 762 765 769 773 778 779 781 806 807 823
  826 828 838 850 855 863 864 864 871 876 908 912 914 971 978 995))
>
Title: Re: This week's challenge
Post by: cormullion on February 28, 2011, 04:29:26 AM
Excellent - very quick and flexible!
Title: Re: This week's challenge
Post by: cormullion on February 28, 2011, 05:00:30 AM
I was hoping for some success from a hash table approach:


(new Tree 'D)
(dolist (i l)
   (set 'd (D (first i)))
   (if d
       (D (first i) (cons i d))
       (D (first i) (list i))))


but this is very slow. Perhaps I'm copying lists too much? :)



Results so far for a 30000 element list:


grouped-list:   161.906
magoo:          998.97
func:          4875.455
tree:         28030.138
Title: Re: This week's challenge
Post by: Ormente on February 28, 2011, 08:43:42 AM
here's my code :



(define (do-it l , acc p c)
(setf
acc '()
p -1
c   ""
)
(dolist (i (sort l))
(when (!= c (first i))
(setf c (first i))
(inc p)
(push '() acc -1)
)
(push i (acc p) -1)
)
acc
)


How does it perform against your big list ?
Title: Re: This week's challenge
Post by: cormullion on February 28, 2011, 12:05:23 PM
The best so far!


do-it:          81.024
grouped-list:   158.632
Magoo:          992.97
Func:           4829.411
tree:           27848.937
Title: Re: This week's challenge
Post by: Ormente on February 28, 2011, 10:46:44 PM
Cool :-)



I started initialy by doing it "lispishly", but it apears to be extremely ineficient (but more elegant), due do a lot of copies newlisp have to make and carry along :

(define (heads l)
(unique (map first l))
)

(define (members l class)
(filter (fn (x) (= class (first x))) l)
)

(define (doit l)
(map (fn (c) (members l c)) (heads l))
)


So, i still thinks newlisp is pragmatic, practical and useful, and allow elegant "functional style" solutions, but it is often far far more efficient when doing things in the "imperative style". I like it, but like Haskell too ;-)



BTW, would you mind sharing your test list ?
Title: Re: This week's challenge
Post by: cormullion on March 01, 2011, 01:43:37 AM
The list is - as usual - the text of a Sherlock Holmes novel. Download "The Hound of the Baskervilles" from Project Gutenberg (http://www.gutenberg.org/ebooks/2852), then strip the adminstrivia from the beginning. Now:


(set 'l (0 30000 (map lower-case (clean empty? (parse (read-file "/Users/me/hound-of-baskervilles.txt") "\W" 0)))))


and you have my list, precisely.



As Sherlock Holmes might have said: "Sadly, Watson, it is not always the most elegant solution that is the most efficient." :)
Title: Re: This week's challenge
Post by: Ormente on March 01, 2011, 03:23:32 AM
Yes, Sherlock, that's so true !



Thanks for the dataset.
Title: Re: This week's challenge
Post by: cormullion on March 02, 2011, 01:52:29 AM
I made a couple more attempts to move up from the bottom of the speed table :) - since I couldn't see how to speed up the dictionary approach.



I tried find:


(sort L)
(set 'marker 0)
(dolist (c (explode "0123456789abcdefghijklmnopqrstuvwxyz"))
    (set 'start (find c L (fn (x y) (starts-with y x))))
    (when start
          (push (marker (- start marker) L) res -1)
          (set 'marker start)))


Then using find-all:


(dolist (chr (explode "0123456789abcdefghijklmnopqrstuvwxyz"))
        (push (find-all chr L $it (fn (x y) (starts-with y x))) result -1))


But there's an obvious problem with this approach: some elements will be tested up to 36 times! So why not remove the results from the list as we go, using a 'set difference'?


(dolist (chr (explode "0123456789abcdefghijklmnopqrstuvwxyz"))
        (push (find-all chr L $it (fn (x y) (starts-with y x))) result1 -1)
        (set 'L (difference L result1)))


Better. But still nowhere near as fast as the fastest, though!


do-it:                     85.468
grouped-list:             178.692
find-all set difference:  598.032
find:                     869.883
Magoo:                   1060.935
find-all:                2050.081
Func:                    5439.64
tree:                   30974.859
Title: Re: This week's challenge
Post by: cormullion on March 03, 2011, 02:48:53 AM
If only Lutz had been here to show me what I was doing wrong... :) It turns out that Tree is in fact the fastest method, not the slowest, and I'd been puzzling as to its poor showing. But I'd just coded it carelessly.



This:


(new Tree 'D)
(dolist (i L)
    (set 'c (first i))
    (if (D c)
        (push i (D c) -1)
        (D c (list i))))


is, at 50ms, slightly faster than the broadly similar do-it's time of 80ms.
Title: Re: This week's challenge
Post by: Ormente on March 03, 2011, 03:32:50 AM
nice move ;-)



I tried something similar, but keeping "sort l" made it nearly the same speed as do-it.



Whithout the sort, you're faster. You don't get the exact same output (not a list), but having the result in a hashtable make accessing individual sublists fast and easy. Cool.
Title: Re: This week's challenge
Post by: Lutz on March 03, 2011, 05:21:29 AM
You could extract the list of lists from the hash namespace D like this:


(map last (D))

Ps: Don't have much time for the forum these days, as I am busy moving back from Florida to California.