Newbie question

Started by paweston2003, January 26, 2013, 04:26:08 PM


paweston2003

I'm new to newLISP and to LISP in general.



I am trying to return a slice of an xml-parsed nested list, but I can't work out the syntax. Essentially, I'm trying to isolate the part of the list between the first and last horizontal rules. I have made a kludge which seems to work, but I would like to know how to do this correctly.



Here is what I have come up with so far:
;Try to open the file given in the main args
(if (not (set 'myxml (read-file (main-args -1))))
((println (rest (sys-error)))(exit 1)))

;Set the flags so that the XML parser doesn't return "Element" "CType", etc.
(xml-type-tags nil nil nil nil)

;Hopefully parse the text into an s-expression list. The numbers at the end of xml-parse are the parser settings.
(if (not (set 'myxml (xml-parse myxml (+ 1 2 8))))
((println (first (xml-error))) (exit 1)))

;Isolate the <body>
(set 'myxml (myxml 0 2))

;Discard everything before the first <HR>
(set 'myxml (slice myxml (inc (first (ref 'HR myxml)))))

;Discard everything after the remaining <HR>
(set 'myxml (slice myxml 0 (first (ref 'HR myxml))))

;Output to console to be piped or redirected
(println myxml)

(exit)


Any help, or direction to help, would be appreciated.



-Peter Weston

Lutz

#1
Looks like the right approach to me - using 'ref' and manipulating multi dimensional indices - but there are others with more hands-on XML experience than I have.
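
To make the 'ref' idea concrete, here is a minimal sketch of how the index vectors work. The sample list is made up for illustration and only stands in for a parsed <body>; the variable names are hypothetical too. It uses 'ref-all' to locate both rules in one pass and then takes a single slice, which is one possible variation on the posted code, not the only way to do it.

```newlisp
; Hypothetical sample data standing in for a parsed <body>; not from the real file.
(set 'body '((H1 "Title") (HR) (P "first") (P "second") (HR) (P "footer")))

; ref returns the full index vector of the first match:
; (1 0) means element 1 of body, position 0 inside that element.
(set 'first-rule (ref 'HR body))       ; (1 0)

; ref-all returns the index vectors of every match.
(set 'rules (ref-all 'HR body))        ; ((1 0) (4 0))

; Slice from just after the first rule up to, but not including, the last one.
; slice takes a start index and a length.
(set 'start-idx (+ 1 (first (first rules))))
(set 'end-idx (first (last rules)))
(set 'between (slice body start-idx (- end-idx start-idx)))

(println between)                      ; ((P "first") (P "second"))
```

Only the first number of each index vector is used here because the rules are direct children of the list being sliced. If they were nested deeper, the rest of the vector would matter.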



In any case: Welcome to newLISP

hilti

#2
Welcome Peter!



Would you please post an example of your XML? It's easier to help then.



Thanks

- Marc
--()o Dragonfly web framework for newLISP

http://dragonfly.apptruck.de

cormullion

#3
Well, if it works reliably on your test examples, it's 'good enough' code! :)



I think there's no real alternative to carefully slicing it up like you're doing. Although by slicing the XML into pieces presumably it's no longer valid XML when it comes out...?

paweston2003

#4
Quote from: "cormullion"
Well, if it works reliably on your test examples, it's 'good enough' code! :)



I think there's no real alternative to carefully slicing it up like you're doing. Although by slicing the XML into pieces presumably it's no longer valid XML when it comes out...?


That's right. I was thinking of outputting some basic markdown, and then importing it into the Retext program. After I posted this I thought of chopping up the text before feeding it into xml-parse. However it will definitely be malformed then. I will see if that works when I get home from work. I can always paste on html and body tags to reform it.


Quote from: "hilti"
Welcome Peter!



Would you please post an example of your XML? It's easier to help then.



Thanks

- Marc


It is the "Introduction to Scheme" at uTexas. I saw that it was in the "garbage" folder there, I want to read it, but the format is awful. Retext will output this as pdf. I found it while trying to figure out "(+1- var)" syntax. I definitely prefer "dec".



ftp://ftp.cs.utexas.edu/pub/garbage/cs345/schintro-v14/schintro_12.html#SEC12



Thanks-

Peter

rickyboy

#5
Obliquely related, instapaper seems to do a nice job on it.  (I picked the narrowest margin setting — from the top-right corner button — to get the code snippets out at the right width. You might have to set that too.)  



Check it out: http://www.instapaper.com/text?u=ftp%3A%2F%2Fftp.cs.utexas.edu%2Fpub%2Fgarbage%2Fcs345%2Fschintro-v14%2Fschintro_12.html%23SEC12
(λx. x x) (λx. x x)

rickyboy

#6
About the approach, Lutz is right: that's basically the approach I and others use.  One comment though, and you've probably seen this already: xml-parse will almost surely balk at any HTML which predates XHTML, like this one.  For this reason, it's generally a good idea to pre-tidy your input.



In the case of this Scheme book snippet, I could NOT get xml-parse to eat it, as is.


> (xml-parse (read-file "schintro_12.html") (+ 1 2 8))
nil
> (xml-error)
("closing tag doesn't match" 3228)

As you no doubt found out, xml-parse hit the </BODY> tag scanning the input and said "Hey! You're closing the BODY element but I have two HR elements that need to be closed first. Game over, man!"  :)



However, after popping off to the command line and running tidy on the input, I had me some XHTML which xml-parse happily processed.  (Of course, the <HR>s were converted to <hr />s, and all was well with the world.)
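
For anyone who would rather stay inside newLISP, the tidy step can probably be driven through 'exec' instead of a separate trip to the command line. This is only a sketch under a few assumptions: the HTML Tidy command-line tool is installed and on the PATH, the shell understands the 2>/dev/null redirection, and the variable names are made up for illustration. The tidy flags used are -q (quiet), -asxhtml (convert to XHTML) and -numeric (numeric character entities).

```newlisp
; Assumes HTML Tidy is installed and on the PATH; file name as in the session above.
(set 'html-file "schintro_12.html")

; exec returns the command's standard output as a list of lines.
; Warnings go to stderr, which is discarded here (Unix-style shell assumed).
(set 'out-lines (exec (string "tidy -q -asxhtml -numeric " html-file " 2>/dev/null")))

; Rejoin the lines; fall back to an empty list if exec returned nothing.
(set 'xhtml (join (or out-lines '()) "\n"))

; Same parser options as before: 1 = drop whitespace text nodes,
; 2 = drop empty attribute lists, 8 = tags as symbols.
(set 'tree (xml-parse xhtml (+ 1 2 8)))

(if tree
    (println "parsed; hr elements at: " (ref-all 'hr tree))
    (println "could not parse: " (xml-error)))
```

Note that tidy lowercases the tags, so the symbol to search for becomes 'hr rather than 'HR, and the position of the body inside the parsed list may differ from the (myxml 0 2) used in the original post, so it is worth checking the indices again.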