Web Crawler

Started by kanen, April 15, 2010, 01:57:20 PM


kanen

Has anyone written a web crawler in newLISP?



I have looked, but cannot find such a beast.



Any pointers would be greatly appreciated.
. Kanen Flowers http://kanen.me .

cormullion

#1
I'd start looking here: http://www.nodep.nl/downloads/newlisp/?direct ...

Fritz

#2
Quote from: "kanen": Has anyone written a web crawler in newLISP?


In late 2009 I wrote a crawler of sorts to gather information from one big government site.



A pretty simple thing, just a few hundred lines of code: "cgi.lsp" + regular expressions + a lot of cookie wrangling. If you can skip the cookie handling, I think a simple crawler can be written in one evening.
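
For reference, a minimal sketch of that kind of crawler, leaving cookies aside (this is not Fritz's code; the seed URL, the 100-page cap, and the link regex are placeholder assumptions):

; crawl.lsp -- breadth-first crawl, capped at 100 pages
(set 'queue '("http://example.com/"))   ; seed URL (placeholder)
(set 'seen '())                         ; URLs already fetched

; pull absolute href values out of the raw HTML with a regex
(define (extract-links page)
  (find-all {href="(http[^"]+)"} page $1))

(while (and queue (< (length seen) 100))
  (set 'url (pop queue))
  (unless (member url seen)
    (push url seen)
    (set 'page (get-url url 5000))        ; 5-second timeout
    (unless (starts-with page "ERR:")     ; get-url reports errors as "ERR: ..." strings
      (dolist (link (extract-links page))
        (push link queue -1)))))          ; append to the queue: breadth-first order

(println "visited " (length seen) " pages")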

kanen

#3
I had forgotten, after using Ruby and Python (and, of course, C) for a few years, just how fetching awesome newLISP is.



I did indeed write the simple crawler in one evening and it turns out to be quite fast.


Quote from: "Fritz": If you can skip the cookie handling, I think a simple crawler can be written in one evening.
. Kanen Flowers http://kanen.me .

hilti

#4
Hi Kanen



I wrote some simple tools for analysing websites in newLISP and used curl for fetching URLs, because (get-url) doesn't support "https". You can simply invoke curl with the exec function.
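
For example (a sketch; the URL is a placeholder), keeping in mind that exec returns the command's stdout as a list of lines:

; fetch an https page by shelling out to curl,
; then re-join the list of output lines into one string
(set 'page (join (exec "curl -s https://example.com/") "\n"))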



Parsing the returned HTML into SXML is then the easiest approach; newLISP's xml-parse can produce SXML-style output directly.
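
Something like this, assuming the fetched HTML is well-formed enough for the built-in parser (real-world pages often need a pass through a cleaner such as tidy first):

; option flags: 1 skip whitespace, 2 skip empty attribute lists,
; 4 skip comments, 8 translate tags into SXML-style symbols
(set 'sxml (xml-parse page (+ 1 2 4 8)))
(if sxml
    (println (length sxml) " top-level nodes")
    (println "parse failed: " (xml-error)))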



Cheers

Hilti
--()o Dragonfly web framework for newLISP

http://dragonfly.apptruck.de