getting nL symbols into the find-all regex?

Started by joejoe, June 04, 2012, 08:16:46 PM


joejoe

Hi -



This is my working code to grab titles from my page:


(set 'titles (find-all {<h2>.*</h2>} page $0))

Sometimes they are not <h2> tags that I have to look into to find my titles.



Sometimes they are <a href=...> tags that are inside of <h2> tags. Tough! :0)



I have defined two symbols, enclosure-tag-open and enclosure-tag-close.



Now I am trying to use these symbols in my find-all. I have tried all of these:


(set 'titles (find-all {enclosure-tag-open[.*]enclosure-tag-close} page $0))

(set 'titles (find-all {(eval enclosure-tag-open).*(eval enclosure-tag-close)} page $0))

(set 'titles (find-all {(println enclosure-tag-open)[.*](println enclosure-tag-close)} page $0))


I have even tried to make my own find-all string and then somehow run it:


(println "find-all {"(println enclosure-tag-open)".*"(println enclosure-tag-close)"} page $0")

Is there an easy way to have symbols replace the <h2> and </h2> below:


(set 'titles (find-all {<h2>.*</h2>} page $0))

Thanks very much!

Lutz

#1
You could use format to embed variable enclosure strings:



http://www.newlisp.org/downloads/newlisp_manual.html#format
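A minimal sketch of the format approach, for illustration only. The values given to enclosure-tag-open and enclosure-tag-close and the short page string are assumptions, since the original post does not show them; the pattern symbol is also just a name picked for this example.

```newlisp
; Assumed values for the two symbols (not shown in the original post).
(set 'enclosure-tag-open "<h2>")
(set 'enclosure-tag-close "</h2>")

; Small stand-in for the real page text.
(set 'page "<h2>First</h2> text <h2>Second</h2>")

; format replaces each %s with the value of the matching argument,
; so the regex is built from the symbols' contents.
(set 'pattern (format {%s.*?%s} enclosure-tag-open enclosure-tag-close))

(set 'titles (find-all pattern page))
(println pattern)
(println titles)
```

If the tag strings ever contain characters that are special in a regex, they would probably need escaping before being embedded this way.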

cormullion

#2
Quote:
(set 'titles (find-all {enclosure-tag-open[.*]enclosure-tag-close} page $0))

By putting the strings 'enclosure-tag-open' and 'enclosure-tag-close' into a string, you've prevented newLISP seeing that there is anything special about them.


Quote:
(set 'titles (find-all {(eval enclosure-tag-open).*(eval enclosure-tag-close)} page $0))

... similarly, here you've put characters that might be newLISP code into a string. But newLISP will treat the characters as ordinary strings.


Quote:
(set 'titles (find-all {(println enclosure-tag-open)[.*](println enclosure-tag-close)} page $0))

... again, these characters are just strings. newLISP treats them as such.



What you need is something like this:


(set 'titles (find-all (string enclosure-tag-open {.*?} enclosure-tag-close) page))


where the string function evaluates symbols and builds a string from their 'contents'.
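To make the difference concrete, here is a small sketch that puts the brace-quoted version next to the one built with string. The tag values, the page text, and the names literal and built are assumptions made up for this illustration.

```newlisp
; Assumed values for the two symbols (not shown in the original post).
(set 'enclosure-tag-open "<h2>")
(set 'enclosure-tag-close "</h2>")
(set 'page "<h2>First</h2> text <h2>Second</h2>")

; Inside braces the symbol names are only text; nothing is evaluated.
(set 'literal {enclosure-tag-open.*?enclosure-tag-close})

; string evaluates each argument and joins the results into one string.
(set 'built (string enclosure-tag-open {.*?} enclosure-tag-close))

(println literal)   ; the symbol names, character for character
(println built)     ; the regex made from the symbols' values
(println (find-all built page))
```

The first pattern can only match the literal text "enclosure-tag-open" in the page, which is why the earlier attempts found no titles.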



I'm just wondering whether you're using an editor that does syntax highlighting? If you were, you could see that your symbols were being treated as plain old strings once inside string delimiters (e.g. as you see in wikibooks, http://en.wikibooks.org/wiki/Introduction_to_newLISP/Strings, with the new syntax highlighting). If you're currently not using syntax highlighting, life will be much easier once you do...

joejoe

#3
string - !!!



I see, yes. Super! Thank you cormullion!



I am using the Geany text editor, and it has a LISP document filetype, which I use.



I attached what it looks like on my screen. I do not see it highlighting symbols. These were already defined further up the page, enclosure-tag-open and enclosure-tag-close.



Are you saying the symbols should be highlighted in some color w/ a good editor?

xytroxon

#4
I'm at lunch, so here is a quick and dirty post!



Notes:

The process-tags function collects the tags it finds into the clean_tags list.

Add "magic" \ before the / in the regex of closing tags, e.g. <\/h2> (optional here, since / is not a special character in these patterns).

Add ? to make the regex stop at the first ending tag match.

Use find-all flag 5 to process upper or lower case tags and to include newlines that may occur between tags.



(setq page
[text]
<html>
<head><title>Test HTML</title></head>
<body>

<h1>Header H1</h1>
    blah blah blah blah
<h2>Header H2</h2>
  blah blah blah blah
<h3>Header H3</h3>
          blah blah blah blah
<h4>Header H4</h4>
 blah blah blah blah
<h5>Header H5</h5>
blah blah blah blah
<h6>Header H6</h6>

<H2><a href="http://www.newlisp.org">   newLISP Main Page  </a></H2>
<h2><a href="http://newlispfanclub.alh.net/forum/"> newLISP Forum</a></H2>

</body>
</html>
[/text]
)

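(define (process-tags str)
  (println "tag: " str)
  ; strip the enclosing tags; option 5 = case-insensitive, dot matches newlines
  (replace "<h2>" str "" 5)
  (replace "</h2>" str "" 5)
  ; strip any anchor tags nested inside
  (replace "<a.*?>" str "" 5)
  (replace "</a>" str "" 5)
  (setq str (trim str))
  ; append the cleaned title to the global list clean_tags
  (push str clean_tags -1)
)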

; (println page)

(find-all {<h2>.*?</h2>} page (process-tags $0) 5)
(println)
(println clean_tags)
(exit)



>"c:\program files (x86)\newlisp\newlisp.exe" "C:\Users\Programming\zx.nl"
tag: <h2>Header H2</h2>
tag: <H2><a href="http://www.newlisp.org">   newLISP Main Page  </a></H2>
tag: <h2><a href="http://newlispfanclub.alh.net/forum/"> newLISP Forum</a></H2>

("Header H2" "newLISP Main Page" "newLISP Forum")
>Exit code: 0


-- xytroxon
"Many computers can print only capital letters, so we shall not use lowercase letters."

-- Let's Talk Lisp (c) 1976

joejoe

#5
xytroxon,



You got me in business! :0)



I will combine the find-all w/ nL symbols technique cormullion taught above with your process-tags, [a.k.a. the laundromat :]
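A rough sketch of what that combination might look like, assuming the two symbols hold "<h2>" and "</h2>" (their real values are not shown in this thread) and using a cut-down page. Unlike xytroxon's version, this process-tags returns the cleaned string so that find-all collects the results itself; that is a choice made for this illustration, not something taken from the posts above.

```newlisp
; Assumed tag values; swap in whatever encloses the titles.
(set 'enclosure-tag-open "<h2>")
(set 'enclosure-tag-close "</h2>")

; Cut-down stand-in for the real page.
(set 'page [text]
<h2>Header H2</h2>
<H2><a href="http://www.newlisp.org">   newLISP Main Page  </a></H2>
[/text])

(define (process-tags str)
  ; strip the enclosing tags held in the symbols (option 5 as in post #4)
  (replace enclosure-tag-open str "" 5)
  (replace enclosure-tag-close str "" 5)
  ; strip nested anchor tags
  (replace "<a.*?>" str "" 5)
  (replace "</a>" str "" 5)
  ; the trimmed string is the return value that find-all collects
  (trim str))

(set 'titles
  (find-all (string enclosure-tag-open {.*?} enclosure-tag-close)
            page (process-tags $0) 5))
(println titles)
```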



Big kickin! :D