suggestions for better string mangling

Started by cormullion, September 21, 2008, 02:03:31 AM

Previous topic - Next topic

cormullion

I wrote this yesterday, converting it as best I could from Python:


(define (edits1 word)
 (let ((l (length word)) (res '()))
    (for (i 0 (- l 1)) (push (replace (nth i (string word)) (string word) "") res -1))  ; deletion
    (for (i 0 (- l 2)) (push (swap i (+ i 1) (string word)) res -1))                    ; transposition
    (for (i 0 (- l 1)) (for (c (char "a") (char "z"))
        (push (replace (nth i (string word)) (string word) (char c)) res -1)            ; alteration
        (push (replace (nth i (string word)) (string word) (string (char c) (nth i (string word)) )) res -1))) ; insertion
    res))


The idea was to take a word and to generate a set of most of the possible single-letter modifications of that word. So given "newlisp", you'd get deletions: "ewlisp" "nwlisp" "nelisp"...,  transpositions:  "enwlisp" "nwelisp" ..., insertions: "anewlisp"  "bnewlisp" ..., and alterations: "aewlisp" "bewlisp"....



Lutz said I was using 'string casts'  - I think he meant those (string word) functions; they're to stop replace modifying the original word too soon. But I'm sure there's a better way of doing the job. Anyone got any bright ideas?

Lutz

#1
The function (edits1 word) is entered with 'word' already being a string. This means we don't need the first 'string' cast in 'replace'. We still need the second cast, because we want 'replace' to be non-destructive, as we need the same word in following 'replace' statements again:


; current code in edits1
(for (i 0 (- l 1))
  (push (replace (nth i (string word)) (string word) "") res -1))

; leaving out one cast and implicit indexing
(for (i 0 (- l 1))
  (push (replace (word i) (string word) "") res -1))


This is still the same logic as before but slightly shorter and faster.



But here is another approach for extracting the single letters using the select function and much faster:


; new logic with select and almost 3 times as fast as the original
(for (i 0 (- l 1))
  (push (select word (replace i (sequence 0 (- l 1)))) res -1))


The 'replace' takes out one number of (0 1 2 3 4 5 ...) then uses the remaining as a parameter to the 'select' statement.

cormullion

#2
That's clever! It's surprising that it's so much faster, too.



Thanks!