Faster than intersect?

Started by kanen, April 12, 2011, 05:33:03 PM

Previous topic - Next topic

kanen

While I am asking about speed...



I am trying to find a set of values within another, larger set of values:



(I removed some of the code formatting so bold items appear properly)



The big string is:

(set 'big '(1 1 5 10 82 222 118 135 82 222 120 79 10 235 83 241 225 229 147 140 238 0 61 97 224 64 31 205 40 134 125 48 169 148 82 132 66 51 32 246 120 99 236 162 100 96 169 220 227 31 11 99 42 103 41 161 227 204 234 48 49 137 236 124 132 152 255 89 121 201 120 199 2 105 6 126 52 120 23 157 57 90 212 125 158 47 33 57 4 200 126 118) )



The substrings are:



(set 'subs '((89 71 0 11 0 0 0 0 0 18 0 0 0 0)

 (92 0 116 0 101 0 115 0 116 0 46 0 112 0 119 0 100)

 (97 37 97 37 97 37 97 37 97 37 97 37 97 37)

 (101 32 99 108 105 101 110 116 32 105 110 102 111 114 109 97)

 (101 122 105 112 58 47 47 98 108 97 47 98 108 97 63 83 78 61 98 108 97 63 80 78 61 98 108 97 63 85 78 61 98 108 97)

 (103 0 115 0 101 0 99 0 100 0 117 0 109 0 112 0 46 0 101 0 120 0 101)

 (108 0 115 0 114 0 101 0 109 0 111 0 114 0 97)

 (133 0 0 1 0 1 0 1)

 (171 205 9 128 0 0 0 1 0 0 0 0 0 0 1 0 1)

 (186 213 25 93 134 103 213 142 127 188 208 60 110 216 226 23 22 232 58 159 207 89 184 123 246)

 (200 79 50 75 112 22 211 1 18 120 90 71 191 110 225 136)

 (201 120 199 2 105 6 126 52 120 23)

 (211 200 197 204 204 197 216 197 195 213 212 197)

 (243 232 229 236 236 229 248 229 227 245 244 229)) )



I have tried several different ways to to this:

(map (fn (c) (if (= c (intersect c big true)) true nil)) subs)
; (nil nil nil nil nil nil true nil nil nil nil nil nil nil nil nil nil true nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil true nil nil)
(find true
 (map (fn (c) (if (= c (intersect c big true)) true nil)) subs))
; 6, 17
;
; But, the time is 100 milliseconds or more...


What am I missing? There has to be a faster way.
. Kanen Flowers http://kanen.me[/url] .

Lutz

#1
Convert everything back to byte buffers then use 'find'.


; convert number lists to byte buffers
(set 'bigs (pack (dup "b" (length big)) big))
(set 'subss (map (fn (l) (pack (dup "b" (length l)) l)) subs))

; use find
(clean nil? (map (fn (s) (find s bigs)) subss)) ;=> (69)


As you are dealing with binary byte buffers (network packets), I would never leave that domain and do everything with byte buffers from the beginning. Then you also don't need the first to lines converting from number lists to binary byte buffers. The last statement is about 80 micro seconds on a Mac Mini, more than a 1000 times faster then the other method.



I would use numbers only when you have to visualize data for debugging purpose.