Stemming in newLISP

Started by methodic, March 12, 2009, 09:14:19 AM


methodic

Coded this real quick for a project I've been working on.


;; stemmer for newLISP, relies on the following code:
;; http://tartarus.org/~martin/PorterStemmer/c_thread_safe.txt
;;
;; download, rename to stemmer.c and compile with:
;; gcc -fPIC -c stemmer.c
;; gcc -shared -o libstemmer.so stemmer.o
;;
;; this way was faster than porting the cLISP one :)

(constant 'STEMLIB "/home/tony/wiki/libstemmer.so")

(import STEMLIB "create_stemmer")
(import STEMLIB "stem")
(import STEMLIB "free_stemmer")

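;; stem each word in the list and return a new list of stems
;; note: the C stemmer edits the string buffer in place and expects lowercase input
(define (stemmer words)
  ;; create the stemmer once and keep all variables local
  (let (s (create_stemmer) new_words '())
    (dolist (w words)
      ;; stem returns the index of the last character of the stem
      (let (len (stem s w (- (length w) 1)))
        (push (slice w 0 (+ len 1)) new_words -1)))
    (free_stemmer s)
    new_words))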

(set 'sentence "Martin Scorsese directed the film Taxi Driver")
(set 'new_sentence (join (stemmer (parse sentence " ")) " "))

(println new_sentence)
(exit)


[tony@lcars ~/wiki]$ ./stemmer.lsp

Martin Scorses direct the film Taxi Driver


Of course, you can move this into its own context and so on; I just wanted to show a quick example.
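
For anyone who wants the context version, here is a minimal sketch. It assumes the same libstemmer.so path and the same three imported functions as above. The context name Stemmer, the lib symbol, and the default-functor layout are illustrative choices rather than anything from the original script, and this has not been run against the library.

```newlisp
;; hypothetical context wrapper around the same three C functions
(context 'Stemmer)

;; assumed path: same shared library as in the script above
(set 'lib "/home/tony/wiki/libstemmer.so")

;; imports made here create Stemmer:create_stemmer, Stemmer:stem, Stemmer:free_stemmer
(import lib "create_stemmer")
(import lib "stem")
(import lib "free_stemmer")

;; default functor, so the context can be called as (Stemmer words)
(define (Stemmer:Stemmer words)
  (let (s (create_stemmer) result '())
    (dolist (w words)
      ;; stem rewrites w in place and returns the index of its last character
      (let (len (stem s w (- (length w) 1)))
        (push (slice w 0 (+ len 1)) result -1)))
    (free_stemmer s)
    result))

(context MAIN)

;; usage: same sentence as in the original example
(set 'sentence "Martin Scorsese directed the film Taxi Driver")
(println (join (Stemmer (parse sentence " ")) " "))
```

Keeping the imports inside the context means names like stem and create_stemmer do not leak into MAIN, so they will not clash with anything else in a larger project.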