I have the following issue:
> (dolist (s (sequence 1 10000)) (push (rand 300 4) y -1))
> (set 'z (y -2))
; this is the second to last {number} :)
> (time (find z y))
; 8.x ms
Why is this a problem? I need a faster search through these numbers. 8+ milliseconds is way too long.
The numbers must be in this format, or something very close. Any extra processing to convert the input (which is z in this case) could take too long.
I can happily convert the y to a better format, if it improves the search speed.
Is there a faster find I should be considering?
Hmm. Tricky to know where to make the trade-offs. Are the numbers/lists unique? If so, you could use a dictionary, which is probably much faster for recall, although the building of said dictionary could be more time-consuming, although perhaps it could be done during the more contemplative parts of your code.
The large list can be pre-built.
The thing-to-find can be dynamically assembled in the same format as the large list.
Elaborate on your dictionary idea. I would love to hear your thoughts.
Quote from: "cormullion"
Hmm. Tricky to know where to make the trade-offs. Are the numbers/lists unique? If so, you could use a dictionary, which is probably much faster for recall, although the building of said dictionary could be more time-consuming, although perhaps it could be done during the more contemplative parts of your code.
Well, imagine that you stored the list of four numbers with a string in a dictionary:
(new Tree 'H)
(dolist (s (sequence 1 10000))
(set 'n (rand 300 4))
(H (format "%03d%03d%03d%03d" n) n))
Now H is:
("003263145157" (3 263 145 157))
("005053174019" (5 53 174 19))
..
I would expect lookups to be faster:
(set 'z ((H) -2))
(H (format "%03d%03d%03d%03d" (last z)))
since dictionaries are faster than lists for looking up values. At least, I think they should be. But of course there is some additional overhead...
The question of uniqueness of keys is still pertinent, though... :)
Genius.
I should have considered it, but I didn't think it would be that much faster.
Turns out, I was totally wrong. It is way faster.
(new Tree 'H)
(dolist (s (sequence 1 10000))
(set 'n (rand 300 4))
(H (format "%03d%03d%03d%03d" n) n))
(164 274 162 184)
> (time (H "164274162184"))
0.011
Pretty fast! Let's try another one...
> ((H) -2)
("299285172263" (299 285 172 263))
> (time (H "299285172263"))
Now that you've shared... remind me to write something about my contexts as multiple timers hack... :)
If these are IPv4 addresses, the numbers are always 8 bit bytes (000 - 255 decimal ie: 00-FF hexadecimal) and can be turned into shorter (faster to search) hex char strings (8 chars for hexadecimal vs 12 chars for decimal)...
(set 'n (rand 256 4))
(H (format "%02X%02X%02X%02X" n) 0)
n = ( 000 000 000 000) -> 00000000
n = ( 255 128 005 000) -> FF080500
n = ( 255 255 255 255) -> FFFFFFFF
-- xytroxon