Lists within lists and finding things...

Started by kanen, August 19, 2013, 08:33:28 PM

Previous topic - Next topic

kanen

I'm trying to figure out if any of the items in a bigger list are in a smaller list. I've tried a bunch of options, but none of them are fast enough.


(setf big-list '( ((5 0 0 0 3 14 0 1 0 0 1 0 0 0 0 0 0) 1006) ((8 97 108 45 106 105 110 97 110 3 110 101 116 0 0 1 0 1) 1001) ((8 97 49 45 106 105 110 97 110 3 110 101 116 0 0 1 0 1) 1001) ((6 97 114 100 100 114 97 4 104 111 115 116 2 115 107 0 0 1 0 1) 1001) ((3 119 119 119 5 106 111 45 117 102 3 110 101 116 0 0 1 0 1) 1001) ((3 119 119 119 12 106 111 102 112 109 117 121 116 114 118 99 102 3 99 111 109 0 0 1 0 1) 1001) ((71 69 84 32) 1001) ((46 46 47 46 46 47 46 46 47) 232) ((205 128 232 215 255 255 255) 232) ((7 10 0 0 0 75 0 208 0) 286) ((7 0 0 0 0 75 0 208 0) 286) ((0 0 0 0 3 97 112 105 7) 1010) ))
;
(setf small-list '(59 126 1 0 0 1 0 0 0 0 0 0 3 97 112 105 7 116 119 105 116 116 101 114 3 99 111 109 0 0 1 0 1) )


Is there a REALLY FAST way to see if an item in the big-list is contained within the small-list and return the value associated with the big-list content?  For example;


> (last big-list)
((0 0 0 0 3 97 112 105 7) 1010)


I can now do something like:


(intersect ((last big-list) 0) small-list)
; (0 3 97 112 105 7)


If the above was present in the small-list (which it is in this case), I'd want to return 1010, which is the "value" of that line in the big-list.



The only problem with this way of doing it is the "0" gets squashed into one value, so I don't get a "this is an exact match". Also, intersect seems to be slower than I'd like.



Thanks!
. Kanen Flowers http://kanen.me[/url] .

rickyboy

#1
Kanen,



If you want to find an exact sublist in a longer list, then this might do:


(define (sublist? SUBLIST SUPERLIST)
  "Check if SUBLIST occurs in SUPERLIST, for instance
   (sublist? '(1 2 3) '(0 1 2 3 4)) => true."
  (let (len-superlist (length SUPERLIST)
        len-sublist (length SUBLIST))
    (for (i 0 (- len-superlist len-sublist) 1
            (= SUBLIST (slice SUPERLIST i len-sublist)))
      nil)))

(define (second xs) (xs 1))

Then the following expression answers part of your question, namely "... see if an item in the big-list is contained within the small-list and return the value associated with the big-list content?"  This will check all big-list items against small-list and return only the values.


> (map second (filter (fn (b) (sublist? (b 0) small-list)) big-list))
(1010)

Looks like there is only one "record" in big-list that contains a sublist of small-list: the one associated to the value 1010.



sublist? is not as fast as intersect, but it's correct, whereas the intersect usage is not (if I understand what you are trying to do).  Maybe you can make sublist? faster.  Cheers!
(λx. x x) (λx. x x)

rickyboy

#2
Hello Kanen,



When I wrote my last message, I had to guess (a little) as to what exactly you needed.  Then today, I thought of another meaning regarding what you said.



When I coded sublist?, I assumed that you wanted to see if a big-list item (which was a list) was a sublist of small-list.  Since order matters in lists, sublist? obeys the ordering of the elements of the lists involved.   It also assumes that adjacent elements in the corresponding lists are preserved.  For instance:


> ;; Order matters
> (sublist? '(3 4) '(0 1 2 3 4))
true
> (sublist? '(4 3) '(0 1 2 3 4))
nil
> ;; Adjacency matters
> (sublist? '(3 4) '(0 1 2 3 3.5 4))
nil

But what if you had really meant for the "containment" to not necessarily respect element order or adjacency?  Then, a natural way to do that would be to treat the lists as http://planetmath.org/multiset">multisets.  In this case, the test then changes from sublist? to subset? (where "set" in this context means multiset, a generalization of proper sets).



If that is what you were really after then please consider the following definitions.


(define (subset? A B)
  "Is A a subset of B? (A and B are lists that act as multisets)."
  (if (empty? A) true
      (empty? B) nil
      (letn (sizeA0 (length A)        ; size of original A
             sizeB0 (length B)        ; size of original B
             dsize0 (- sizeB0 sizeA0) ; original delta of sizes
             sizeA sizeA0 sizeB sizeB0 dsize dsize0)
        (while (and (>= dsize 0) (= dsize0 dsize)
                    (not (empty? A)))
          (letn (elt (first A)
                 cnt (length (find-all elt A)))
            (setq A (remove cnt elt A))
            (setq B (remove cnt elt B))
            (setq sizeA (length A))
            (setq sizeB (length B))
            (setq dsize (- sizeB sizeA))))
        (empty? A))))

for which you will need the following definition.


(define (remove HOW-MANY ELT LST)
  "Remove as many as HOW-MANY occurences of ELT in LST."
  (let (elt-position true)
    (while (and elt-position (> HOW-MANY 0))
      (when (setq elt-position (find ELT LST))
        (pop LST elt-position)
        (dec HOW-MANY)))
    LST))

Now, neither order nor adjacency matter.


> ;; Order doesn't matter now.
> (subset? '(3 4) '(0 1 2 3 4))
true
> (subset? '(4 3) '(0 1 2 3 4))
true
> ;; Adjacency doesn't matter either.
> (subset? '(3 4) '(0 1 2 3 3.5 4))
true

It just so happens that, with the given definitions of big-list and small-list, using subset? as the test yields the same answer as using sublist? as the test.


> (map second (filter (fn (b) (subset? (b 0) small-list)) big-list))
(1010)

And as before, the functions I defined might not meet your performance standards, so please http://www.panix.com/~ilaine/jargon/jargon.cgi?bum">bum the code as appropriate. :)
(λx. x x) (λx. x x)