parse internal tokenizer and double quotes

Started by Dmi, October 22, 2005, 03:54:23 PM


Dmi

I found that 'parse', when called with its internal tokenizer, takes double-quote matching into account (cool!). But it doesn't include the quotes in the returned tokens:
newLISP v.8.7.0 on linux, execute 'newlisp -h' for more info.

> (parse "\"")

string token too long in function parse : "" 0708x2070801"

> (parse "\"abc\"")
("abc")
>

Is there a way to get ("\"abc\"") instead of ("abc")?



I am trying to write a function that reads un-evaluated s-expressions.

If parse's internal tokenizer included the double quotes where they occur in tokens, a quite good read function could be as small as 13 lines (in my version).



Also, imho, it's good when things like 'parse' have universal rather than special-case behavior. I.e., in the case of 'parse' it's easy to strip quotes when they are unwanted, but practically impossible to restore them once they have been stripped.
WBR, Dmi

Lutz

#1
Use 'parse' with a break string, and put a regular expression in that break string. The break string is the specification that tells 'parse' where to break up the string.



If you use a regular expression in the break string you have to specify an option number, which is 0 (zero) in the simplest case, 1 for case-insensitive matching, etc.



For example:

> (parse {this "is" a    sentence} {\s+} 0)
("this" "\"is\"" "a" "sentence")
>


The curly braces are used as string delimiters so I can freely use quotes " inside the string. The regular expression pattern {\s+} says to break up the string at one or more white space characters (spaces, linefeeds, tabs, etc.).



The 0 (zero) says that {\s+} should be taken as a regular expression and not as a literal string. There are numbers other than 0 that you would use in other situations. See 'regex' for the other option numbers and their meaning.
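
To make the difference between the three call forms concrete, here is a minimal sketch. The sample string and the variable names (line, by-space, by-regex, by-word) are made up for illustration; only 'parse' with a break string and the option numbers 0 and 1 come from the explanation above.

```newlisp
; Hypothetical sample input: curly braces let the string contain double quotes.
(set 'line {say "Hello" TO the    world})

; 1. Break string without an option number: taken as a literal string,
;    so every single space is a break point and runs of spaces
;    produce empty-string tokens.
(set 'by-space (parse line " "))

; 2. Break string with option 0: taken as a regular expression,
;    so a whole run of white space counts as one break.
(set 'by-regex (parse line {\s+} 0))

; 3. Break string with option 1: regular expression, case-insensitive,
;    so the pattern "to" also matches "TO" in the input.
(set 'by-word (parse line {\s+to\s+} 1))

(println by-space)
(println by-regex)
(println by-word)
```

In the second and third forms the double quotes around Hello stay inside the token, because the internal tokenizer is not involved once a break string is given.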



Lutz