Faster than find?

Started by kanen, April 12, 2011, 05:13:17 PM

Previous topic - Next topic

kanen

I have the following issue:



> (dolist (s (sequence 1 10000)) (push (rand 300 4) y -1))
> (set 'z (y -2))  
; this is the second to last {number} :)
> (time (find z y))
; 8.x ms


Why is this a problem? I need a faster search through these numbers. 8+ milliseconds is way too long.



The numbers must be in this format, or something very close. Any extra processing to convert the input (which is z in this case) could take too long.



I can happily convert the y to a better format, if it improves the search speed.



Is there a faster find I should be considering?
. Kanen Flowers http://kanen.me[/url] .

cormullion

#1
Hmm. Tricky to know where to make the trade-offs. Are the numbers/lists unique? If so, you could use a dictionary, which is probably much faster for recall, although the building of said dictionary could be more time-consuming, although perhaps it could be done during the more contemplative parts of your code.

kanen

#2
The large list can be pre-built.



The thing-to-find can be dynamically assembled in the same format as the large list.



Elaborate on your dictionary idea. I would love to hear your thoughts.


Quote from: "cormullion"Hmm. Tricky to know where to make the trade-offs. Are the numbers/lists unique? If so, you could use a dictionary, which is probably much faster for recall, although the building of said dictionary could be more time-consuming, although perhaps it could be done during the more contemplative parts of your code.
. Kanen Flowers http://kanen.me[/url] .

cormullion

#3
Well, imagine that you stored the list of four numbers with a string in a dictionary:


(new Tree 'H)
(dolist (s (sequence 1 10000))
    (set 'n (rand 300 4))
    (H (format "%03d%03d%03d%03d" n) n))


Now H is:
("003263145157" (3 263 145 157))
("005053174019" (5 53 174 19))
..


I would expect lookups to be faster:


(set 'z ((H) -2))
(H (format "%03d%03d%03d%03d" (last z)))


since dictionaries are faster than lists for looking up values. At least, I think they should be.  But of course there is some additional overhead...



The question of uniqueness of keys is still pertinent, though... :)

kanen

#4
Genius.



I should have considered it, but I didn't think it would be that much faster.



Turns out, I was totally wrong. It is way faster.



(new Tree 'H)
(dolist (s (sequence 1 10000))
    (set 'n (rand 300 4))
    (H (format "%03d%03d%03d%03d" n) n))

(164 274 162 184)

> (time (H "164274162184"))
0.011


Pretty fast! Let's try another one...



> ((H) -2)
("299285172263" (299 285 172 263))

> (time (H "299285172263"))


Now that you've shared... remind me to write something about my contexts as multiple timers hack... :)
. Kanen Flowers http://kanen.me[/url] .

xytroxon

#5
If these are IPv4 addresses, the numbers are always 8 bit bytes (000 - 255 decimal ie: 00-FF hexadecimal) and can be turned into shorter (faster to search) hex char strings (8 chars for hexadecimal vs 12 chars for decimal)...



   (set 'n (rand 256 4))
    (H (format "%02X%02X%02X%02X" n) 0)


n = ( 000 000 000 000) -> 00000000

n = ( 255 128 005 000) -> FF080500

n = ( 255 255 255 255) -> FFFFFFFF



-- xytroxon
\"Many computers can print only capital letters, so we shall not use lowercase letters.\"

-- Let\'s Talk Lisp (c) 1976