get-url -> ERR: HTTP document empty

Started by Darth.Severus, March 26, 2013, 03:52:39 PM


Darth.Severus

I'm using some code in a program to get the links of a website, including its subpages. But it's not working; I often get "ERR: HTTP document empty". I put an until loop into the code so that it retries several times, always after a few minutes. My first thought was that I'm blocked by the server, but that doesn't seem to be the case: if I open a newLISP shell and write (get-url url), I get the site, while I still have the same IP.



(define (pick-links url)
  (setq page (get-url url))
  (println (1 20 page))                  ; testing
  (write-file "page" page)               ; also testing
  (until (not (starts-with page "ERR: HTTP document empty"))
    (and (sleep 600000) (setq page (get-url url))))
  (setq linklist (join (find-all "<a href=([^>]+)>([^>]*)</a>" page) "<br>\n"))
  (setq linklist (replace {"} linklist "`"))
  (setq parsedlist (parse linklist "\n"))
  (setq page nil))
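
For comparison, here is a minimal sketch of the same retry idea with a per-request timeout and a capped number of attempts, so the function gives up instead of looping forever. The 10-second timeout, the 60-second pause, the retry count and the name fetch-page are all just assumptions for illustration, not part of the original code:

(define (fetch-page url (retries 5))
  (let (page (get-url url 10000))        ; 10-second timeout per request
    (while (and (starts-with page "ERR:") (> retries 0))
      (sleep 60000)                      ; wait a minute before retrying
      (dec retries)
      (setq page (get-url url 10000)))
    (if (starts-with page "ERR:") nil page)))  ; nil when every attempt failed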

cormullion

#1
It worked on the first 6 sites I tried it on. (I removed the 'write-file' line.) Perhaps it's site-specific...?

Darth.Severus

#2
Ahhh, my usual error: overthinking it instead of trying the obvious. I always tested it with the same website... -> facepalm.



However, it works now. I'm using the dump option of the w3m browser to save the sites to my disk before I process them.
(eval-string (string {(exec "w3m } url { -dump_source -T text/html > temppage.html")}))
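
For reference, the same call can be made without eval-string by building the command string directly; this is only a sketch, assuming url contains nothing that would need shell quoting, and it reuses the same temppage.html file:

(exec (string "w3m " url " -dump_source -T text/html > temppage.html"))
(setq page (read-file "temppage.html"))  ; read the dumped source back in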

I think the problem was a security measure on the website; maybe it blocks non-browser clients when they try to fetch more than a couple of pages.
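
If that is the cause, another thing worth trying is sending a custom request header with get-url, so the request looks more like a browser. This is just a sketch; the timeout value and the User-Agent string below are only examples:

(setq page (get-url url 10000 "User-Agent: Mozilla/5.0 (X11; Linux x86_64)\r\n"))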



Thanks, anyway.