dolist or not dolist

Started by joejoe, November 16, 2010, 08:07:29 AM


joejoe

Hi -



I still know next to nothing about nL (newLISP), but I have patched this together with help:


(set 'result (unique (sort
    (find-all {[a-zA-Z]+}
        (replace "<[^>]+.*+>" (get-url "http://mysite.com/") "" 0) )
)))
(println result)
(exit)


I am now trying to pull from several of my sites, so I want to get-url mysite.com as well as mysite0.com and mysite1.com.



I thought this should be done with dolist. I still want to unique and sort the combined results that I retrieve from those urls.



So I tried this:


(set 'result (unique (sort (dolist 123
    (find-all {[a-zA-Z]+}
        (replace "<[^>]+.*+>" (get-url "http://mysite.com") "" 0) )
    (find-all {[a-zA-Z]+}
        (replace "<[^>]+.*+>" (get-url "http://site.com/") "" 0) )
    (find-all {[a-zA-Z]+}
        (replace "<[^>]+.*+>" (get-url "http://newsite.com/") "" 0) )
))))

(println result)

(exit)


But it says a list is expected in dolist.



How would I make dolist hold off on unique and sort until the 'result list has been fully compiled?



Thanks!

cormullion

#1
Without knowing what the find-all stuff is doing, I'm guessing that this is one way of writing the sort of thing you want:


(dolist (url '("http://mysite.com" "http://mysite.com" "http://mysite.com"))
    (push (find-all {[a-zA-Z]+} (replace "<[^>]+.*+>" (get-url url) "" 0)) result))

(println (unique (sort result)))
(exit)


You need a list to iterate through. The results are accumulated in the result list.
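As a minimal sketch of that form, with no network access: dolist takes a (symbol list) pair, binds the symbol to each element in turn, and runs the body once per element. The site names come from the question; the `sites` and `seen` symbols are only illustrative.

```newlisp
; Illustrative names: sites and seen are not part of the original script.
(set 'sites '("http://mysite.com" "http://mysite0.com" "http://mysite1.com"))

(set 'seen '())
(dolist (url sites)                 ; url is bound to each string in turn
    (push (length url) seen -1))    ; -1 appends at the end of seen

(println seen)                      ; (17 18 18)
```

The original attempt failed because `(dolist 123 ...)` gives dolist a number where it expects that (symbol list) pair.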



hth

joejoe

#2
Quote from: "cormullion"Without knowing what the find-all stuff is doing, I'm guessing that this is one way of writing the sort of thing you want:


(dolist (url '("http://mysite.com" "http://mysite.com" "http://mysite.com"))
    (push (find-all {[a-zA-Z]+} (replace "<[^>]+.*+>" (get-url url) "" 0)) result))

(println (unique (sort result)))
(exit)


You need a list to iterate through. The results are accumulated in the result list.



hth


That is a work of beauty. I can't get over how effective and to the point nL is.



You all are angels. Thanks big cormullion!

joejoe

#3
I can't figure out why it is not applying unique and sort to the results list:


#!/usr/bin/newlisp

(dolist (url '("http://www.newlisp.org" "http://newlisp.nfshost.com/wiki/"))
    (push (find-all {[a-zA-Z]+} (replace "<[^>]+.*+>" (get-url url) "" 0)) result))

(println (unique (sort result)))
(exit)


I am seeing duplicate words, and the output is unsorted. I must be missing something simple?


[...] "script" "body" "body" "html"))

I feel like a child who knows what he wants to say but can't get the words out. :)

I promise I am learning and will love to help once I have learned. ;)



thanks again for any tip or suggestion!

cormullion

#4
Keep at it! Sorry - the code I supplied wasn't really a solution, rather it was a suggestion of an approach you could try.



Perhaps what you're forgetting is that lists can be simple or structured/nested/hierarchical:


(3 2 1) ; simple
((0 3) (2 1) (14)) ; structured


A list element can be a string or symbol or number, but it could also be a list. You need to be aware of the nature of the lists you're storing your data in.



In your code, you're using push to add the results of a find-all to an existing list. But find-all returns a list. So your result is a list of lists, where each 'sub-list' contains the words found in a single HTML page (I think).

Both unique and sort will work on structured lists, but they won't flatten or merge the lists for you - they'll maintain the list structure. So unique on your list looks like it will effectively do nothing (since the elements are almost certain to be different), and sort will probably reorganize the result list so that the shorter lists appear earlier than longer ones.
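A small offline sketch of that effect, assuming two hand-written word lists in place of real find-all results (`page-a`, `page-b` and `nested` are made-up names):

```newlisp
; Hypothetical word lists standing in for two find-all results.
(set 'page-a '("html" "body" "script"))
(set 'page-b '("body" "html"))

(set 'nested '())
(push page-a nested)          ; push adds each list as ONE element
(push page-b nested)          ; (at the front, by default)

(println nested)              ; (("body" "html") ("html" "body" "script"))
(println (length nested))     ; 2 - two sub-lists, not five words
(println (unique nested))     ; same as nested: the two sub-lists differ
```

Here "body" and "html" each appear twice, but unique compares whole sub-lists, so nothing is dropped.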



There are two obvious solutions. Either flatten the result list first, to remove the structure (non-destructively):


(unique (sort (flat result)))

Or, more instructively, change the way you build the results list:


(set 'result '())
(dolist (url '("http://www.newlisp.org" "http://newlisp.nfshost.com/wiki/"))
    (extend result (find-all {[a-zA-Z]+} (replace "<[^>]+.*+>" (get-url url) "" 0))))
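Both routes can be compared side by side without touching the network. This is only a sketch: the `pages` strings are hypothetical stand-ins for what get-url would return, and the other symbol names are made up for the comparison.

```newlisp
; Hypothetical stand-ins for the pages that get-url would fetch.
(set 'pages '("red green" "green blue"))

; Approach 1: push builds a nested list; flat removes the nesting afterwards.
(set 'nested '())
(dolist (page pages)
    (push (find-all {[a-zA-Z]+} page) nested))
(set 'words-a (unique (sort (flat nested))))

; Approach 2: extend appends the words themselves, so no nesting arises.
(set 'flat-words '())
(dolist (page pages)
    (extend flat-words (find-all {[a-zA-Z]+} page)))
(set 'words-b (unique (sort flat-words)))

(println words-a)   ; ("blue" "green" "red")
(println words-b)   ; ("blue" "green" "red")
```

Either way, unique and sort only run once, after the loop has finished collecting words.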

joejoe

#5
Quote from: "cormullion"Perhaps what you're forgetting is that lists can be simple or structured/nested/hierarchical


That is precisely what I was missing.



I went back to your Introduction

http://en.wikibooks.org/wiki/Introduction_to_newLISP



and re-read the section on flat.

http://en.wikibooks.org/wiki/Introduction_to_newLISP/Lists#flat


Quote from: "cormullion"There are two obvious solutions. Either flatten the result list first, to remove the structure (non-destructively):


(unique (sort (flat result)))

Or, more instructively, change the way you build the results list:


(set 'result '())
(dolist (url '("http://www.newlisp.org" "http://newlisp.nfshost.com/wiki/"))
    (extend result (find-all {[a-zA-Z]+} (replace "<[^>]+.*+>" (get-url url) "" 0))))


Wow, incredible. I'm fascinated again and again with nL and its community.



Thank you again on this, cormullion. Works beautifully and I get it.

cormullion

#6
Cool! If you're using the wikibooks' Intro to newLISP and find any errors or obscurities, please make a note or something. You can add comments to the 'discussion' page of each page if you like.



The wikibooks version will be updated for newLISP 10.3 soon, especially if Lutz promises not to break any existing code till 10.3.1... :)



Sadly, syntax highlighting in wikibooks is still unavailable until they update their version of GeSHi to the latest.