newLISP Fan Club

Forum => newLISP in the real world => Topic started by: kanen on April 15, 2010, 01:57:20 PM

Title: Web Crawler
Post by: kanen on April 15, 2010, 01:57:20 PM
Has anyone written a web crawler in newLISP?



I have looked, but cannot find such a beast.



Any pointers would be greatly appreciated.
Title: Re: Web Crawler
Post by: cormullion on April 15, 2010, 02:32:19 PM
I'd start looking here...
Title: Re: Web Crawler
Post by: Fritz on April 17, 2010, 02:02:47 AM
Quote from: "kanen"Has anyone written a web crawler in newLISP?


Back in 2009 I wrote a sort of crawler to gather information from one big government site.



Pretty simple thing, just several hundred lines of code: "cgi.lsp" + regular expressions + a lot of cookie handling. If you are going to build a crawler without cookies, I think a simple one can be developed in one evening.
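
A minimal sketch of that kind of evening-sized crawler (not Fritz's actual code; assumes plain-HTTP pages, no cookies, and naive regex link extraction, with example.com as a placeholder):

; minimal crawler sketch, not production code
(set 'visited '())

(define (extract-links page)
  ; find-all collects every absolute href the regex matches
  (find-all {href="(http[^"]+)"} page $1))

(define (crawl url depth)
  (when (> depth 0)
    (let (page (get-url url 5000))      ; 5000 ms timeout
      (dolist (link (extract-links page))
        (unless (find link visited)
          (push link visited)
          (println link)
          (crawl link (- depth 1)))))))

(crawl "http://www.example.com/" 2)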
Title: Re: Web Crawler
Post by: kanen on April 17, 2010, 07:54:00 AM
I had forgotten, after using Ruby and Python (and, of course, C) for a few years, just how fetching awesome newLISP is.



I did indeed write the simple crawler in one evening and it turns out to be quite fast.


Quote from: "Fritz"
Quote from: "kanen"Has anyone written a web crawler in newLISP?


Back in 2009 I wrote a sort of crawler to gather information from one big government site.



Pretty simple thing, just several hundred lines of code: "cgi.lsp" + regular expressions + a lot of cookie handling. If you are going to build a crawler without cookies, I think a simple one can be developed in one evening.
Title: Re: Web Crawler
Post by: hilti on April 27, 2010, 11:01:54 PM
Hi Kanen



I wrote some simple tools for analysing websites in newLISP and used CURL for fetching URLs, because (get-url) doesn't support "https". You can simply invoke CURL with the exec function.
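
For example, a small sketch (assumes the curl binary is on the PATH; the URL is a placeholder):

; fetch an https URL through curl, since (get-url) has no TLS support
(define (fetch-https url)
  ; exec returns the command's standard output as a list of lines
  (join (exec (format "curl -s '%s'" url)) "\n"))

(println (fetch-https "https://www.example.com/"))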



Then using SXML for parsing the returned HTML would be the easiest approach.
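
Something like this (a sketch; note that xml-parse expects well-formed XML, so messy real-world HTML may need tidying first):

; parse (X)HTML into SXML-style nested lists
; option bits: 1 = skip whitespace, 2 = skip empty attribute lists,
; 4 = skip comments, 8 = turn string tags into symbols
(xml-type-tags nil nil nil nil)  ; suppress newLISP's default type tags
(set 'sxml (xml-parse
  "<html><body><a href=\"/x\">link</a></body></html>"
  (+ 1 2 4 8)))
(println sxml)
; => ((html (body (a (@ (href "/x")) "link"))))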



Cheers

Hilti