Web Crawler

Started by kanen, April 15, 2010, 01:57:20 PM


kanen

Has anyone written a web crawler in newLISP?



I have looked, but cannot find such a beast.



Any pointers would be greatly appreciated.
. Kanen Flowers http://kanen.me .

cormullion

#1
I'd start looking here: http://www.nodep.nl/downloads/newlisp/?direct ...

Fritz

#2
Quote from: "kanen": Has anyone written a web crawler in newLISP?


In late 2009 I wrote a crawler of sorts to gather information from one big government site.



A pretty simple thing, just a few hundred lines of code: "cgi.lsp" + regular expressions + a lot of cookie wrangling. If you can skip the cookie handling, I think a simple crawler can be written in one evening.
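
For reference, a minimal sketch of that kind of crawler, leaving cookies aside (this is not Fritz's code; the seed URL, the 100-page cap, and the link regex are placeholder assumptions):

; crawl.lsp -- breadth-first crawl, capped at 100 pages
(set 'queue '("http://example.com/"))   ; seed URL (placeholder)
(set 'seen '())                         ; URLs already fetched

; pull absolute href values out of the raw HTML with a regex
(define (extract-links page)
  (find-all {href="(http[^"]+)"} page $1))

(while (and queue (< (length seen) 100))
  (set 'url (pop queue))
  (unless (member url seen)
    (push url seen)
    (set 'page (get-url url 5000))        ; 5-second timeout
    (unless (starts-with page "ERR:")     ; get-url reports errors as "ERR: ..." strings
      (dolist (link (extract-links page))
        (push link queue -1)))))          ; append to the queue: breadth-first order

(println "visited " (length seen) " pages")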

kanen

#3
I had forgotten, after using Ruby and Python (and, of course, C) for a few years, just how fetching awesome newLISP is.



I did indeed write the simple crawler in one evening and it turns out to be quite fast.


Quote from: "Fritz": If you can skip the cookie handling, I think a simple crawler can be written in one evening.
. Kanen Flowers http://kanen.me .

hilti

#4
Hi Kanen



I wrote some simple tools for analysing websites in newLISP and used curl for fetching URLs, because (get-url) doesn't support "https". You can simply invoke curl with the exec function.
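
For example (a sketch; the URL is a placeholder), keeping in mind that exec returns the command's stdout as a list of lines:

; fetch an https page by shelling out to curl,
; then re-join the list of output lines into one string
(set 'page (join (exec "curl -s https://example.com/") "\n"))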



Parsing the returned HTML into SXML is then the easiest approach; newLISP's xml-parse can produce SXML-style output directly.
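
Something like this, assuming the fetched HTML is well-formed enough for the built-in parser (real-world pages often need a pass through a cleaner such as tidy first):

; option flags: 1 skip whitespace, 2 skip empty attribute lists,
; 4 skip comments, 8 translate tags into SXML-style symbols
(set 'sxml (xml-parse page (+ 1 2 4 8)))
(if sxml
    (println (length sxml) " top-level nodes")
    (println "parse failed: " (xml-error)))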



Cheers

Hilti
--()o Dragonfly web framework for newLISP

http://dragonfly.apptruck.de