newLISP Fan Club

Forum => newLISP in the real world => Topic started by: kanen on April 15, 2010, 01:57:20 PM

Title: Web Crawler
Post by: kanen on April 15, 2010, 01:57:20 PM
Has anyone written a web crawler in newLISP?



I have looked, but cannot find such a beast.



Any pointers would be greatly appreciated.
Title: Re: Web Crawler
Post by: cormullion on April 15, 2010, 02:32:19 PM
I'd start looking here...
Title: Re: Web Crawler
Post by: Fritz on April 17, 2010, 02:02:47 AM
Quote from: "kanen"Has anyone written a web crawler in newLISP?


Back in 2009 I wrote a sort of crawler to gather information from one big government site.



Pretty simple thing, just several hundred lines of code: "cgi.lsp" + regular expressions + a lot of cookie handling. If you are going to build a crawler without cookies, I think a simple one can be developed in one evening.
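
A minimal sketch of that kind of evening-sized crawler (not Fritz's actual code; assumes plain-HTTP pages, no cookies, and naive regex link extraction, with example.com as a placeholder):

; minimal crawler sketch, not production code
(set 'visited '())

(define (extract-links page)
  ; find-all collects every absolute href the regex matches
  (find-all {href="(http[^"]+)"} page $1))

(define (crawl url depth)
  (when (> depth 0)
    (let (page (get-url url 5000))      ; 5000 ms timeout
      (dolist (link (extract-links page))
        (unless (find link visited)
          (push link visited)
          (println link)
          (crawl link (- depth 1)))))))

(crawl "http://www.example.com/" 2)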
Title: Re: Web Crawler
Post by: kanen on April 17, 2010, 07:54:00 AM
I had forgotten, after using Ruby and Python (and, of course, C) for a few years, just how fetching awesome newLISP is.



I did indeed write the simple crawler in one evening and it turns out to be quite fast.


Quote from: "Fritz"
Quote from: "kanen"Has anyone written a web crawler in newLISP?


Back in 2009 I wrote a sort of crawler to gather information from one big government site.



Pretty simple thing, just several hundred lines of code: "cgi.lsp" + regular expressions + a lot of cookie handling. If you are going to build a crawler without cookies, I think a simple one can be developed in one evening.
Title: Re: Web Crawler
Post by: hilti on April 27, 2010, 11:01:54 PM
Hi Kanen



I wrote some simple tools for analysing websites in newLISP and used CURL for fetching URLs, because (get-url) doesn't support "https". You can simply invoke CURL with the exec function.
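
For example, a small sketch (assumes the curl binary is on the PATH; the URL is a placeholder):

; fetch an https URL through curl, since (get-url) has no TLS support
(define (fetch-https url)
  ; exec returns the command's standard output as a list of lines
  (join (exec (format "curl -s '%s'" url)) "\n"))

(println (fetch-https "https://www.example.com/"))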



Then using SXML for parsing the returned HTML would be the easiest approach.
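
Something like this (a sketch; note that xml-parse expects well-formed XML, so messy real-world HTML may need tidying first):

; parse (X)HTML into SXML-style nested lists
; option bits: 1 = skip whitespace, 2 = skip empty attribute lists,
; 4 = skip comments, 8 = turn string tags into symbols
(xml-type-tags nil nil nil nil)  ; suppress newLISP's default type tags
(set 'sxml (xml-parse
  "<html><body><a href=\"/x\">link</a></body></html>"
  (+ 1 2 4 8)))
(println sxml)
; => ((html (body (a (@ (href "/x")) "link"))))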



Cheers

Hilti