Possible bug in xml-parse

Started by Jeff, September 05, 2007, 01:45:43 PM


Jeff

Lutz,



When using xml-parse to read xhtml (which is valid xml), any attribute that contains quotation marks, even escaped ones or ones inside a CDATA block, causes an error:


<a href="/some/path" title="something" onclick="alert("Hello world")">Looky here</a>

This should validate. It would be impossible to make this render correctly using anything other than single quotes, and that would mean translating double quotes directly into single quotes, which might corrupt some javascript (which can use both in the same string). The following example should be valid xml and demonstrates the problem:


<a href="/some/path/" title="something" onclick="alert("Hello" + 'world')">Looky here</a>

I could get it to validate using the html entity, but it would not function when rendered as a string.  Escaping the quotes should be valid; I can't find anything forbidding this in the xhtml spec.



PS Excuse the visible entities - can't figure out any other way to display the contents of an html tag, since the BB seems to be stripping any attributes off.
Jeff

=====

Old programmers don't die. They just parse on...



Artful code: http://artfulcode.net

Lutz

#1
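When you use XML without DTD validation (xml-parse does no validation; it only checks that the XML is well-formed), the XML spec does not allow the following characters to appear literally where they would be read as markup, for example a double quote inside a double-quoted attribute value: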


greater than (>): &gt;
less than (<): &lt;
ampersand (&): &amp;
quote ("): &quot;
apostrophe ('): &apos;


Use entities to encode them.
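A minimal sketch of "use entities" in newLISP. The helper name xml-escape is hypothetical (it is not a built-in); it relies only on the built-in replace, which is destructive on the function's local copy of the string. The ampersand is replaced first so that the ampersands in the inserted entities are not escaped a second time. Applied to Jeff's onclick value, the result can sit inside a double-quoted attribute.

```newlisp
;; Hypothetical helper, not part of newLISP: replace the five characters
;; listed above with their predefined XML entities.
(define (xml-escape str)
  ;; "&" must go first, otherwise "&lt;" etc. would become "&amp;lt;"
  (replace "&" str "&amp;")
  (replace "<" str "&lt;")
  (replace ">" str "&gt;")
  (replace "\"" str "&quot;")
  (replace "'" str "&apos;")
  ;; replace modified the local copy in place; return it
  str)

;; {...} is a newLISP string literal that may contain both quote styles
(println (xml-escape {alert("Hello" + 'world')}))
;; prints: alert(&quot;Hello&quot; + &apos;world&apos;)
```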



When using CDATA, newLISP will process correctly:



>  (xml-type-tags nil nil nil nil)
(nil nil nil nil)
> (xml-parse {<data><![CDATA[<>&"']]></data>} 15)
((data "<>&"'"))
>


But XSLT will translate all special characters in CDATA into entities, so there is no safe way to use special characters in a CDATA block. The best approach is simply to base64-encode all CDATA strings.
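A sketch of the base64 approach, using newLISP's built-in base64-enc and base64-dec with the same xml-type-tags setting and xml-parse option 15 as the session above. The variable names are illustrative only. Base64 output consists of letters, digits, +, / and =, so nothing inside the CDATA block needs escaping, and the original string is recovered after parsing.

```newlisp
;; Suppress type tags, as in the session above
(xml-type-tags nil nil nil nil)

;; Illustrative payload mixing quotes, ampersands and angle brackets
(set 'payload {alert("Hello" + 'world') && 1 < 2})

;; Store only the base64 form inside the CDATA section
(set 'doc (string "<data><![CDATA[" (base64-enc payload) "]]></data>"))

;; With option 15 the result has the shape ((data "<base64 text>"))
(set 'parsed (xml-parse doc 15))

;; Take the text of the first element and decode it again
(set 'decoded (base64-dec (last (first parsed))))
(println (= decoded payload))
```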



Lutz



PS: note that xml-parse is an XML parser, not an XHTML parser with HTML DTD validation.

Jeff

#2
I don't need DTD validation; I wasn't going that far with it. Checking for well-formed markup is all I am after.
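For a plain well-formedness check, a small wrapper around xml-parse may be enough: xml-parse returns nil when it cannot parse its input, and (xml-error) then returns a list with the error text and the scan position. The name well-formed? is hypothetical. The two calls reuse Jeff's markup, once with &quot; entities and once with raw quotes inside the attribute.

```newlisp
;; Hypothetical helper: true if xml-parse accepts the string,
;; otherwise print what xml-error reports and return nil.
(define (well-formed? str)
  (if (xml-parse str)
      true
      (begin
        (println "not well-formed: " (xml-error))
        nil)))

;; Quotes inside the attribute written as entities
(println (well-formed? {<a href="/some/path/" onclick="alert(&quot;Hello&quot;)">Looky here</a>}))

;; Raw double quotes inside a double-quoted attribute
(println (well-formed? {<a href="/some/path/" onclick="alert("Hello")">Looky here</a>}))
```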
Jeff
