newLISP Fan Club

Forum => newLISP in the real world => Topic started by: paweston2003 on January 26, 2013, 04:26:08 PM

Title: Newbie question
Post by: paweston2003 on January 26, 2013, 04:26:08 PM
I'm new to newLISP and to LISP in general.



I am trying to return a slice of an xml-parsed nested list, but I can't work out the syntax. Essentially I'm trying to isolate the part of the list between the first and last horizontal rules. I have made a kludge which seems to work, but I would like to know how to do this correctly.



Here is what I have come up with so far:
;Try to open the file given in the main args
(if (not (set 'myxml (read-file (main-args -1))))
    (begin (println (rest (sys-error))) (exit 1)))

;Set the flags so that the XML parser doesn't return "Element" "CType", etc.
(xml-type-tags nil nil nil nil)

;Parse the text into an s-expression list. The numbers at the end of xml-parse are the parser options:
;1 = suppress whitespace text, 2 = suppress empty attribute lists, 8 = translate tags into symbols.
(if (not (set 'myxml (xml-parse myxml (+ 1 2 8))))
    (begin (println (first (xml-error))) (exit 1)))

;Isolate the <body>
(set 'myxml (myxml 0 2))

;Discard everything before the first <HR>
(set 'myxml (slice myxml (inc (first (ref 'HR myxml)))))

;Discard everything after the remaining <HR>
(set 'myxml (slice myxml 0 (first (ref 'HR myxml))))

;Output to console to be piped or redirected
(println myxml)

(exit)


Any help, or direction to help, would be appreciated.



-Peter Weston
Title: Re: Newbie question
Post by: Lutz on January 26, 2013, 07:41:13 PM
Looks like the right approach to me, using 'ref' and manipulating multidimensional indices, but there are others with more hands-on XML experience than I have.



In any case: Welcome to newLISP
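
For readers following along, here is a minimal sketch of the 'ref' idea on a small hand-written list. The list below is a hypothetical stand-in for the parsed body, not the real document, and the names body, hits, start, stop and between are made up for illustration. It uses 'ref-all' to collect every index path at once, so the span between the first and last HR can be cut with a single 'slice'.

```newlisp
;; Hypothetical stand-in for the parsed <body>; not the real page.
(set 'body '(BODY (H1 "title") (HR) (P "keep 1") (P "keep 2") (HR) (P "footer")))

;; ref-all returns one index path per match. A path such as (2 0) means
;; "element 2 of body, item 0 of that element".
(set 'hits (ref-all 'HR body))

;; The first number of each path is the position of the enclosing
;; element in body. Start just after the first HR, stop at the last one.
(set 'start (+ 1 (first (first hits))))
(set 'stop (first (last hits)))

;; slice takes a start index and a length, not an end index.
(set 'between (slice body start (- stop start)))

(println between) ; prints ((P "keep 1") (P "keep 2"))
```

This assumes at least two HR elements are present. With exactly two it should give the same result as slicing twice; with more than two it keeps everything up to the last one.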
Title: Re: Newbie question
Post by: hilti on January 27, 2013, 12:50:56 AM
Welcome Peter!



Would you please post an example of your XML? It's easier to help then.



Thanks

- Marc
Title: Re: Newbie question
Post by: cormullion on January 27, 2013, 01:51:40 AM
Well, if it works reliably on your test examples, it's 'good enough' code! :)



I think there's no real alternative to carefully slicing it up like you're doing. Although by slicing the XML into pieces presumably it's no longer valid XML when it comes out...?
Title: Re: Newbie question
Post by: paweston2003 on January 27, 2013, 08:09:07 AM
Quote from: "cormullion"Well, if it works reliably on your test examples, it's 'good enough' code! :)



I think there's no real alternative to carefully slicing it up like you're doing. Although by slicing the XML into pieces presumably it's no longer valid XML when it comes out...?


That's right. I was thinking of outputting some basic markdown, and then importing it into the Retext program. After I posted this I thought of chopping up the text before feeding it into xml-parse. However it will definitely be malformed then. I will see if that works when I get home from work. I can always paste on html and body tags to reform it.


Quote from: "hilti"Welcome Peter!



Would you please post an example of your XML? It's easier to help then.



Thanks

- Marc


It is the "Introduction to Scheme" at uTexas. I saw that it was in the "garbage" folder there. I want to read it, but the format is awful. Retext will output this as pdf. I found it while trying to figure out "(+1- var)" syntax. I definitely prefer "dec".



ftp://ftp.cs.utexas.edu/pub/garbage/cs345/schintro-v14/schintro_12.html#SEC12



Thanks-

Peter
Title: Re: Newbie question
Post by: rickyboy on January 27, 2013, 11:30:27 AM
Obliquely related, instapaper seems to do a nice job on it.  (I picked the narrowest margin setting — from the top-right corner button — to get the code snippets out at the right width. You might have to set that too.)  



Check it out: http://www.instapaper.com/text?u=ftp%3A%2F%2Fftp.cs.utexas.edu%2Fpub%2Fgarbage%2Fcs345%2Fschintro-v14%2Fschintro_12.html%23SEC12
Title: Re: Newbie question
Post by: rickyboy on January 27, 2013, 12:27:47 PM
About the approach, Lutz is right: that's basically the approach I and others use.  One comment though, and you've probably seen this already: xml-parse will almost surely balk at any HTML which predates XHTML, like this one.  In general, it's a good idea to pre-tidy your input for this reason.



In the case of this Scheme book snippet, I could NOT get xml-parse to eat it, as is.


> (xml-parse (read-file "schintro_12.html") (+ 1 2 8))
nil
> (xml-error)
("closing tag doesn't match" 3228)

As you no doubt found out, xml-parse hit the </BODY> tag scanning the input and said "Hey! You're closing the BODY element but I have two HR elements that need to be closed first. Game over, man!"  :)
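
The same failure can be reproduced in miniature. This is only a sketch: the two strings are hypothetical cut-down inputs, not the real Scheme page, and the names html-style, xhtml-style, bad and good are made up for illustration. The first string leaves its HR elements open, as old HTML does; the second closes them the XHTML way.

```newlisp
;; Hypothetical miniature inputs, not taken from the real page.
(set 'html-style "<BODY><P>a</P><HR><P>b</P><HR><P>c</P></BODY>")
(set 'xhtml-style "<body><p>a</p><hr /><p>b</p><hr /><p>c</p></body>")

(xml-type-tags nil nil nil nil)

;; The open HR elements should make the parser give up at </BODY>,
;; so bad is expected to be nil and xml-error to say why.
(set 'bad (xml-parse html-style (+ 1 2 8)))
(println "HTML style:  " bad " " (xml-error))

;; With self-closed hr elements the parse is expected to succeed.
(set 'good (xml-parse xhtml-style (+ 1 2 8)))
(println "XHTML style: " good)
```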



However, after popping off to the command line and running tidy on the input, I had me some XHTML which xml-parse happily processed.  (Of course, the <HR>s were converted to <hr />s, and all was well with the world.)
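
For anyone who would rather stay inside newLISP, a rough sketch of the same tidy-then-parse step follows. It assumes HTML Tidy is installed as "tidy", a Unix-like shell, and the page saved locally as schintro_12.html. The flags (-q, -asxhtml, -numeric) are one plausible choice and not necessarily the ones used above; the names tidy-lines, xhtml and tree are made up for illustration.

```newlisp
;; Assumptions: "tidy" is on the PATH, the shell understands 2>/dev/null,
;; and schintro_12.html sits in the current directory.
(set 'tidy-lines (exec "tidy -q -asxhtml -numeric schintro_12.html 2>/dev/null"))

;; exec returns the command's output as a list of lines; glue them back
;; into one string for xml-parse.
(set 'xhtml (join tidy-lines "\n"))

(xml-type-tags nil nil nil nil)
(set 'tree (xml-parse xhtml (+ 1 2 8)))

(if tree
    (println "parsed " (length tree) " top-level item(s)")
    (println "still not parseable: " (xml-error)))
```

Note that with option 8 the tags tidy writes in lower case should come back as lower-case symbols, so a later search would look for 'hr rather than 'HR.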