I'm trying to figure out if any of the items in a bigger list are in a smaller list. I've tried a bunch of options, but none of them are fast enough.
(setf big-list '( ((5 0 0 0 3 14 0 1 0 0 1 0 0 0 0 0 0) 1006) ((8 97 108 45 106 105 110 97 110 3 110 101 116 0 0 1 0 1) 1001) ((8 97 49 45 106 105 110 97 110 3 110 101 116 0 0 1 0 1) 1001) ((6 97 114 100 100 114 97 4 104 111 115 116 2 115 107 0 0 1 0 1) 1001) ((3 119 119 119 5 106 111 45 117 102 3 110 101 116 0 0 1 0 1) 1001) ((3 119 119 119 12 106 111 102 112 109 117 121 116 114 118 99 102 3 99 111 109 0 0 1 0 1) 1001) ((71 69 84 32) 1001) ((46 46 47 46 46 47 46 46 47) 232) ((205 128 232 215 255 255 255) 232) ((7 10 0 0 0 75 0 208 0) 286) ((7 0 0 0 0 75 0 208 0) 286) ((0 0 0 0 3 97 112 105 7) 1010) ))
;
(setf small-list '(59 126 1 0 0 1 0 0 0 0 0 0 3 97 112 105 7 116 119 105 116 116 101 114 3 99 111 109 0 0 1 0 1) )
Is there a REALLY FAST way to see if an item in the big-list is contained within the small-list and return the value associated with the big-list content? For example;
> (last big-list)
((0 0 0 0 3 97 112 105 7) 1010)
I can now do something like:
(intersect ((last big-list) 0) small-list)
; (0 3 97 112 105 7)
If the above was present in the small-list (which it is in this case), I'd want to return 1010, which is the "value" of that line in the big-list.
The only problem with this way of doing it is the "0" gets squashed into one value, so I don't get a "this is an exact match". Also, intersect seems to be slower than I'd like.
Thanks!
Kanen,
If you want to find an exact sublist in a longer list, then this might do:
(define (sublist? SUBLIST SUPERLIST)
"Check if SUBLIST occurs in SUPERLIST, for instance
(sublist? '(1 2 3) '(0 1 2 3 4)) => true."
(let (len-superlist (length SUPERLIST)
len-sublist (length SUBLIST))
(for (i 0 (- len-superlist len-sublist) 1
(= SUBLIST (slice SUPERLIST i len-sublist)))
nil)))
(define (second xs) (xs 1))
Then the following expression answers part of your question, namely "... see if an item in the big-list is contained within the small-list and return the value associated with the big-list content?" This will check all big-list items against small-list and return only the values.
> (map second (filter (fn (b) (sublist? (b 0) small-list)) big-list))
(1010)
Looks like there is only one "record" in big-list that contains a sublist of small-list: the one associated to the value 1010.
sublist? is not as fast as intersect, but it's correct, whereas the intersect usage is not (if I understand what you are trying to do). Maybe you can make sublist? faster. Cheers!
Hello Kanen,
When I wrote my last message, I had to guess (a little) as to what exactly you needed. Then today, I thought of another meaning regarding what you said.
When I coded sublist?, I assumed that you wanted to see if a big-list item (which was a list) was a sublist of small-list. Since order matters in lists, sublist? obeys the ordering of the elements of the lists involved. It also assumes that adjacent elements in the corresponding lists are preserved. For instance:
> ;; Order matters
> (sublist? '(3 4) '(0 1 2 3 4))
true
> (sublist? '(4 3) '(0 1 2 3 4))
nil
> ;; Adjacency matters
> (sublist? '(3 4) '(0 1 2 3 3.5 4))
nil
But what if you had really meant for the "containment" to not necessarily respect element order or adjacency? Then, a natural way to do that would be to treat the lists as multisets (//http). In this case, the test then changes from sublist? to subset? (where "set" in this context means multiset, a generalization of proper sets).
If that is what you were really after then please consider the following definitions.
(define (subset? A B)
"Is A a subset of B? (A and B are lists that act as multisets)."
(if (empty? A) true
(empty? B) nil
(letn (sizeA0 (length A) ; size of original A
sizeB0 (length B) ; size of original B
dsize0 (- sizeB0 sizeA0) ; original delta of sizes
sizeA sizeA0 sizeB sizeB0 dsize dsize0)
(while (and (>= dsize 0) (= dsize0 dsize)
(not (empty? A)))
(letn (elt (first A)
cnt (length (find-all elt A)))
(setq A (remove cnt elt A))
(setq B (remove cnt elt B))
(setq sizeA (length A))
(setq sizeB (length B))
(setq dsize (- sizeB sizeA))))
(empty? A))))
for which you will need the following definition.
(define (remove HOW-MANY ELT LST)
"Remove as many as HOW-MANY occurences of ELT in LST."
(let (elt-position true)
(while (and elt-position (> HOW-MANY 0))
(when (setq elt-position (find ELT LST))
(pop LST elt-position)
(dec HOW-MANY)))
LST))
Now, neither order nor adjacency matter.
> ;; Order doesn't matter now.
> (subset? '(3 4) '(0 1 2 3 4))
true
> (subset? '(4 3) '(0 1 2 3 4))
true
> ;; Adjacency doesn't matter either.
> (subset? '(3 4) '(0 1 2 3 3.5 4))
true
It just so happens that, with the given definitions of big-list and small-list, using subset? as the test yields the same answer as using sublist? as the test.
> (map second (filter (fn (b) (subset? (b 0) small-list)) big-list))
(1010)
And as before, the functions I defined might not meet your performance standards, so please bum (//http) the code as appropriate. :)