Interesting Challenge...

Started by kanen, April 29, 2011, 06:42:27 PM

Previous topic - Next topic

kanen

I was reading a post on Craig's List where a company was asking for resumes, but they wanted you to solve a specific problem first.



As I do a ton of work like this, I thought it would be interesting to share and talk about.



I have settled on a way to do this without building a matrix, but I want to make sure it works for all 250,000 words in the word-list (below) before I post the results.



(it's in code form so you can copy and paste into your program ;) )


;
; Two words are friends if they have a Levenshtein distance of 1.
; That is, you can add, remove, or substitute exactly one letter in
; word X to create word Y.
; A words social network consists of all of its friends, plus all of
; their friends, and all of their friends friends, and so on.
;
; Write a program to tell us how big the
; social network for the word causes is, using this word list:
; https://github.com/causes/puzzles/raw/master/word_friends/word.list
;
; http://en.wikipedia.org/wiki/Levenshtein_distance
;
; tons of implementations in several languages
; http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance
;
; a working HTML version
; http://www.miislita.com/searchito/levenshtein-edit-distance.html
;
. Kanen Flowers http://kanen.me[/url] .

kanen

#1
I posted some work-in-progress code, with a ton of options and some fun stuff in it for this challenge.



https://gist.github.com/954811">https://gist.github.com/954811
. Kanen Flowers http://kanen.me[/url] .

kanen

#2
I've posted an update.



It is quite a lot faster.



https://gist.github.com/963774">//https://gist.github.com/963774



Suggestions?
. Kanen Flowers http://kanen.me[/url] .

rickyboy

#3
Hello Kanen,



Do you really want suggestions? :)  Before that, I wanted to say great job on the improved routine.  OK, here goes.



First: you should change the name of your function from FastEditDistance to something which better describes what it is doing, like get-friends.



Second: fix the bugs which cause word to appear in the output list.  This is caused by word being regenerated multiple times by the swap and replace loops.  I put changes in an updated version of the code, here below, to stop this from happening.



Third: cover your free variables with a let or something.  I think your free variables are new-words and tmpWord.  On a related note, I also brought the binding for alphabet into the function definition.  But I am relying on the assumption that it is not needed anywhere else. I may be wrong.



Fourth: if the problem calls for using the Levenshtein metric as a basis for the definition of "friend", then you should remove the swap loop.  (I have, in the code below.)  According to your wiki cite, the Levenshtein doesn't allow the swap adjacent operation, but there is a variant called Damerau–Levenshtein which does.



Here is an updated version of the code which covers the issues discussed above:


(define (get-friends word word-list)
  (let ((new-words '())
        (alphabet (explode "abcdefghijklmnopqrstuvwxyz"))
        (tmpWord ""))

    ;; Deletes (removing one letter)
    (for (i 0 (- (length word) 1))
      (setf tmpWord word)
      (pop tmpWord i)
      (push tmpWord new-words -1))

    ;; Modifies (one letter to another)
    (for (i 0 (- (length word) 1))
      (set 'tmpWord word)
      (dolist (a alphabet)
        (when (not (= (word i) a))
          (setf (tmpWord i) a)
          (push tmpWord new-words -1))))

    ;; Inserts (add a letter)
    (for (i 0 (length word))
      (dolist (a alphabet)
        (set 'tmpWord word)
        (push (push a tmpWord i) new-words -1)))

    (intersect new-words word-list)))

All in all, great job optimizing the routine.  I imagine it takes a little while to get unwrapped around the axle from the initial notion that you needed to use the Levenshtein metric, when you didn't.  I don't think I would have realized it as quickly as you.  My hat's off to you, Sir! --Rick



P.S. -- Here is the updated swap loop code, should you need it.  The index i has to end at length(word)-2.  When i is length(word)-1, the code will regenerate word.


   ;; Swaps (swap adjacent letters)
    (for (i 0 (- (length word) 2))
      (set 'tmpWord word)
      (push (push (pop tmpWord i) tmpWord (+ 1 i)) new-words -1))
(λx. x x) (λx. x x)

rickyboy

#4
Kanen (or anyone else),



I have a solution to the problem in newLISP, which computes the answer in under 2 minutes (about 114 seconds), on my Macbook Pro*.  Kanen, is that the type of run time you are seeing also with yours?



Curious, --Rick



* -- It has a 2.16 GHz Intel Core 2 Duo with 1 GB (667 MHz DDR2 SDRAM) memory.  I bought it in 2006.
(λx. x x) (λx. x x)

kanen

#5
If you are talking about a solution which solves the original problem (at the top of this thread) in under 2 minutes (and gets just over 78,000 friends) -- you should post the results.



And, yes, that is what I am seeing.


Quote from: "rickyboy"Kanen (or anyone else),



I have a solution to the problem in newLISP, which computes the answer in under 2 minutes (about 114 seconds), on my Macbook Pro*.  Kanen, is that the type of run time you are seeing also with yours?
. Kanen Flowers http://kanen.me[/url] .

rickyboy

#6
OK, well here's the code.  It's in a link, vice posting the code directly in here, in case someone wants to work on it and doesn't want a "spoiler".  Just click it when you're ready.



http://www.rickhanson.org/code/causes.lsp.pdf">//http://www.rickhanson.org/code/causes.lsp.pdf



And I hope you enjoy working on the problem.  I did! :-)  Thanks again, Kanen, for the heads up on the problem and starting the discussion.



Yours, --Rick
(λx. x x) (λx. x x)