Faster than find?

kanen · April 12, 2011, 05:13:17 PM

I have the following issue:

Code Select Expand

> (dolist (s (sequence 1 10000)) (push (rand 300 4) y -1))
> (set 'z (y -2))   
; this is the second to last {number} :)
> (time (find z y))
; 8.x ms

Why is this a problem? I need a faster search through these numbers. 8+ milliseconds is way too long.

The numbers must be in this format, or something very close. Any extra processing to convert the input (which is z in this case) could take too long.

I can happily convert the y to a better format, if it improves the search speed.

Is there a faster find I should be considering?

cormullion · April 13, 2011, 12:18:21 PM

Hmm. Tricky to know where to make the trade-offs. Are the numbers/lists unique? If so, you could use a dictionary, which is probably much faster for recall, although the building of said dictionary could be more time-consuming, although perhaps it could be done during the more contemplative parts of your code.

kanen · April 13, 2011, 12:28:54 PM

The large list can be pre-built.

The thing-to-find can be dynamically assembled in the same format as the large list.

Elaborate on your dictionary idea. I would love to hear your thoughts.

~~Quote from: "cormullion"~~Hmm. Tricky to know where to make the trade-offs. Are the numbers/lists unique? If so, you could use a dictionary, which is probably much faster for recall, although the building of said dictionary could be more time-consuming, although perhaps it could be done during the more contemplative parts of your code.

cormullion · April 13, 2011, 02:01:43 PM

Well, imagine that you stored the list of four numbers with a string in a dictionary:

Code Select Expand
(new Tree 'H)
(dolist (s (sequence 1 10000)) 
    (set 'n (rand 300 4))
    (H (format "%03d%03d%03d%03d" n) n))

Now H is:

Code Select Expand
("003263145157" (3 263 145 157))
("005053174019" (5 53 174 19))
..

I would expect lookups to be faster:

Code Select Expand
(set 'z ((H) -2))
(H (format "%03d%03d%03d%03d" (last z)))

since dictionaries are faster than lists for looking up values. At least, I think they should be. But of course there is some additional overhead...

The question of uniqueness of keys is still pertinent, though... :)

kanen · April 14, 2011, 09:52:52 PM

Genius.

I should have considered it, but I didn't think it would be that much faster.

Turns out, I was totally wrong. It is way faster.

Code Select Expand

(new Tree 'H)
(dolist (s (sequence 1 10000)) 
    (set 'n (rand 300 4))
    (H (format "%03d%03d%03d%03d" n) n))

(164 274 162 184)

> (time (H "164274162184"))
0.011

Pretty fast! Let's try another one...

Code Select Expand

> ((H) -2)
("299285172263" (299 285 172 263))

> (time (H "299285172263"))

Now that you've shared... remind me to write something about my contexts as multiple timers hack... :)

xytroxon · April 15, 2011, 09:33:21 AM

If these are IPv4 addresses, the numbers are always 8 bit bytes (000 - 255 decimal ie: 00-FF hexadecimal) and can be turned into shorter (faster to search) hex char strings (8 chars for hexadecimal vs 12 chars for decimal)...

Code Select Expand

    (set 'n (rand 256 4))
    (H (format "%02X%02X%02X%02X" n) 0)

n = ( 000 000 000 000) -> 00000000

n = ( 255 128 005 000) -> FF080500

n = ( 255 255 255 255) -> FFFFFFFF

-- xytroxon

newLISP Fan Club

News:

Faster than find?

kanen

cormullion

kanen

cormullion

kanen

xytroxon