This week's challenge

Started by cormullion, February 27, 2011, 09:21:47 AM

Previous topic - Next topic

cormullion

Write a function that transforms a list of strings:


(set 'l '(
"aa"
"a12"
"aaaa123"
"aaa12"
"aa1112"
"b"
"ba"
"b12"
"bbaa123"
"baa12"
"ba1112"
"c1"
"c2"
"c1313"
"c1121aa"
"caababb"
; and so on
))

into a list of lists of strings:
'(("a12" "aa" "aa1112" "aaa12" "aaaa123")
  ("b" "b12" "ba" "ba1112" "baa12" "bbaa123")
  ("c1" "c1121aa" "c1313" "c2" "caababb"))

Sammo

#1
Probably not elegant but it works:
(define (magoo list-of-strings)
(magoo-aux (sort list-of-strings) '()))

(define (magoo-aux list-of-strings list-of-lists)
(if (empty? list-of-strings)
list-of-lists
  ;else
    (local (newlist)
(push (pop list-of-strings) newlist)
(while (and
(not (empty? list-of-strings))
  (= ((list-of-strings 0) 0) ((newlist 0) 0))
(push (pop list-of-strings) newlist -1))
)
(magoo-aux list-of-strings (push newlist list-of-lists -1))
)
)
)

kosh

#2
It works :)


(define (func lst)
  (map (lambda (c)
         (filter (lambda (elem)
                   (starts-with elem c))
                 (sort lst)))
       (unique (map first lst))))

(func l)
;=> (("a12" "aa" "aa1112" "aaa12" "aaaa123")
;    ("b" "b12" "ba" "ba1112" "baa12" "bbaa123")
;    ("c1" "c1121aa" "c1313" "c2" "caababb"))

cormullion

#3
Good solutions! Kosh's solution is neater and quick for small lists, but Sammo's solution is quicker for large lists...

johu

#4
Belatedly, I try by using dolist.


(define (grouped-list lst (func (fn (x y) (= (x 0) (y 0)))))
  (sort lst)
  (local (res tmp)
    (dolist (x lst)
      (unless (or (not tmp) (func x (tmp -1)))
        (push tmp res -1) (setq tmp '()))
      (push x tmp -1))
  (push tmp res -1)))

This is used as following:


> (grouped-list l)
(("a12" "aa" "aa1112" "aaa12" "aaaa123") ("b" "b12" "ba" "ba1112" "baa12" "bbaa123")
 ("c1" "c1121aa" "c1313" "c2" "caababb"))
> (setq nums (map int (map (curry mul 1000) (random 0 1 100))))
(81 806 855 124 307 573 472 762 191 727 995 15 647 691 174 72 274 676 838 556 371
 779 232 638 231 204 971 281 22 769 151 46 9 348 643 343 414 416 90 545 699 337 657
 876 535 570 53 465 259 426 863 486 87 565 709 778 864 850 293 864 644 746 828 765
 871 908 912 521 440 107 354 823 427 169 0 657 773 63 89 385 683 807 154 142 48 346
 18 978 647 53 914 637 431 21 242 391 654 781 22 826)
> (grouped-list nums (fn (x y) (= (length (string x)) (length (string y)))))
((0 9) (15 18 21 22 22 46 48 53 53 63 72 81 87 89 90) (107 124 142 151 154 169 174
  191 204 231 232 242 259 274 281 293 307 337 343 346 348 354 371 385 391 414 416
  426 427 431 440 465 472 486 521 535 545 556 565 570 573 637 638 643 644 647 647
  654 657 657 676 683 691 699 709 727 746 762 765 769 773 778 779 781 806 807 823
  826 828 838 850 855 863 864 864 871 876 908 912 914 971 978 995))
>

cormullion

#5
Excellent - very quick and flexible!

cormullion

#6
I was hoping for some success from a hash table approach:


(new Tree 'D)
(dolist (i l)
   (set 'd (D (first i)))
   (if d
       (D (first i) (cons i d))
       (D (first i) (list i))))


but this is very slow. Perhaps I'm copying lists too much? :)



Results so far for a 30000 element list:


grouped-list:   161.906
magoo:          998.97
func:          4875.455
tree:         28030.138

Ormente

#7
here's my code :



(define (do-it l , acc p c)
(setf
acc '()
p -1
c   ""
)
(dolist (i (sort l))
(when (!= c (first i))
(setf c (first i))
(inc p)
(push '() acc -1)
)
(push i (acc p) -1)
)
acc
)


How does it perform against your big list ?

cormullion

#8
The best so far!


do-it:          81.024
grouped-list:   158.632
Magoo:          992.97
Func:           4829.411
tree:           27848.937

Ormente

#9
Cool :-)



I started initialy by doing it "lispishly", but it apears to be extremely ineficient (but more elegant), due do a lot of copies newlisp have to make and carry along :

(define (heads l)
(unique (map first l))
)

(define (members l class)
(filter (fn (x) (= class (first x))) l)
)

(define (doit l)
(map (fn (c) (members l c)) (heads l))
)


So, i still thinks newlisp is pragmatic, practical and useful, and allow elegant "functional style" solutions, but it is often far far more efficient when doing things in the "imperative style". I like it, but like Haskell too ;-)



BTW, would you mind sharing your test list ?

cormullion

#10
The list is - as usual - the text of a Sherlock Holmes novel. Download "The Hound of the Baskervilles" from Project Gutenberg (http://www.gutenberg.org/ebooks/2852">http://www.gutenberg.org/ebooks/2852), then strip the adminstrivia from the beginning. Now:


(set 'l (0 30000 (map lower-case (clean empty? (parse (read-file "/Users/me/hound-of-baskervilles.txt") "\W" 0)))))


and you have my list, precisely.



As Sherlock Holmes might have said: "Sadly, Watson, it is not always the most elegant solution that is the most efficient." :)

Ormente

#11
Yes, Sherlock, that's so true !



Thanks for the dataset.

cormullion

#12
I made a couple more attempts to move up from the bottom of the speed table :) - since I couldn't see how to speed up the dictionary approach.



I tried find:


(sort L)
(set 'marker 0)
(dolist (c (explode "0123456789abcdefghijklmnopqrstuvwxyz"))
    (set 'start (find c L (fn (x y) (starts-with y x))))
    (when start
          (push (marker (- start marker) L) res -1)
          (set 'marker start)))


Then using find-all:


(dolist (chr (explode "0123456789abcdefghijklmnopqrstuvwxyz"))
        (push (find-all chr L $it (fn (x y) (starts-with y x))) result -1))


But there's an obvious problem with this approach: some elements will be tested up to 36 times! So why not remove the results from the list as we go, using a 'set difference'?


(dolist (chr (explode "0123456789abcdefghijklmnopqrstuvwxyz"))
        (push (find-all chr L $it (fn (x y) (starts-with y x))) result1 -1)
        (set 'L (difference L result1)))


Better. But still nowhere near as fast as the fastest, though!


do-it:                     85.468
grouped-list:             178.692
find-all set difference:  598.032
find:                     869.883
Magoo:                   1060.935
find-all:                2050.081
Func:                    5439.64
tree:                   30974.859

cormullion

#13
If only Lutz had been here to show me what I was doing wrong... :) It turns out that Tree is in fact the fastest method, not the slowest, and I'd been puzzling as to its poor showing. But I'd just coded it carelessly.



This:


(new Tree 'D)
(dolist (i L)
    (set 'c (first i))
    (if (D c)
        (push i (D c) -1)
        (D c (list i))))


is, at 50ms, slightly faster than the broadly similar do-it's time of 80ms.

Ormente

#14
nice move ;-)



I tried something similar, but keeping "sort l" made it nearly the same speed as do-it.



Whithout the sort, you're faster. You don't get the exact same output (not a list), but having the result in a hashtable make accessing individual sublists fast and easy. Cool.