Identifying strings with parse

Started by Jeff, May 21, 2008, 12:51:22 PM


Jeff

When using the internal parser with parse, strings in the target are not distinguished from atoms:


(parse [text](println "Hello world")[/text])

...results in:


'("(" "println" "Hello world" ")")

The only way to check them would be contextually, which would be more difficult than is reasonable in newLISP (i.e., ML-style inference), or by testing against the current symbol set. The latter has the disadvantage that if the string is equal to the string value of an existing symbol, it will not be identified as a string. It also cannot see symbols in other contexts unless the application has been tracking context creation all along.



Can parse be modified to use newLISP's parsing rules but identify strings correctly?  Or perhaps identify ", {, }, [text], and [/text] all as tokens?
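For reference, the symbol-set check mentioned above can be sketched like this. It leans on 'sym' with a nil third argument, which looks a symbol up without creating it; as noted, it fails whenever a string happens to equal an existing symbol's name, and it only sees the context you hand it:

```lisp
;; fragile heuristic: a token "might be a string" if no symbol of
;; that name already exists in the given context
(define (maybe-string? tok)
    (not (sym tok MAIN nil)))

(maybe-string? "println")      ; nil -- println names a symbol
(maybe-string? "Hello world")  ; true -- no such symbol in MAIN
```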
Jeff

=====

Old programmers don't die. They just parse on...



Artful code: http://artfulcode.net

Lutz

#1
You can use 'find-all' with one big regular expression, e.g. like this:


(set 'newlisp {!=|\^|\(|\)|[a-zA-Z]+|\[text\]|\[/text\]})

> (find-all newlisp "(foo [text]hello world[/text]) (!= x y)")
("(" "foo" "[text]" "hello" "world" "[/text]" ")" "(" "!=" "x" "y" ")")
>


You could add an optional expression to preprocess each token  before it goes into the return list:


> (find-all newlisp "(foo [text]hello world[/text]) (!= x y)" (println $0))
(
foo
[text]
hello
world
[/text]
)
(
!=
x
y
)
("(" "foo" "[text]" "hello" "world" "[/text]" ")" "(" "!=" "x" "y" ")")
>


Instead of (println $0) you could use any other expression transforming $0 into something else, e.g. adding a type number, etc. What goes into the list is the return value of that expression:


> (define (xform) (upper-case $0))
(lambda () (upper-case $0))
> (find-all newlisp "(foo [text]hello world[/text]) (!= x y)" (xform))
("(" "FOO" "[TEXT]" "HELLO" "WORLD" "[/TEXT]" ")" "(" "!=" "X" "Y" ")")
>

Jeff

#2
The goal is to avoid costly regexes. I'm trying to write some pre-processing code, and too much regex matching would definitely hurt load times.
Jeff


rickyboy

#3
Hey Jeff,



Just guessing here, but if the text is lisp code, eval-string might help.


(define (text2sexp text-lisp-exp)
    (eval-string (append "'" text-lisp-exp)))

(text2sexp [text](println "Hello world")[/text])
   ;; => (println "Hello world")

Then you can crawl the answer and discover that println is a symbol and "Hello world" a string.  Hope that helps.
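Crawling the result could look something like this (a minimal sketch, assuming the parsed code contains only lists, symbols, and strings):

```lisp
;; tag every atom in an evaluated expression with its type
(define (classify e)
    (cond
        ((string? e) (list 'STR e))
        ((symbol? e) (list 'SYM e))
        ((list? e) (map classify e))
        (true (list 'OTHER e))))

(classify '(println "Hello world"))
;; => ((SYM println) (STR "Hello world"))
```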
(λx. x x) (λx. x x)

Jeff

#4
No, what I'm doing is writing a pre-processor for true macros.  See my other post about template expansion.  Rather than running macros as a sort of lazily evaluating function, I am trying to use them more like CL, that is, as a way of writing larger pieces of code more tersely.



Rather than using letex, which doesn't have any way of expanding '(+ '(1 2 3)) into '(+ 1 2 3), I'm adding [*] and [**] and trying to kludge up the same effect as a Common Lisp backquote expression.  But I don't want to perform expansions inside of strings, so I was hoping to be able to identify them in parsed token lists.  Apparently parse does not *quite* use newLISP's parser, though, because newLISP itself can obviously identify strings.
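One way around the parse limitation is to do the expansion on the read structure rather than on a token list: once the code has been read, string literals are real strings and can simply be passed through untouched. A sketch, where expand-token stands in for whatever [*]/[**] expansion logic applies (it is hypothetical, not part of newLISP):

```lisp
;; walk an expression, expanding everything except string literals
(define (expand-skipping-strings e)
    (cond
        ((string? e) e)                        ; never expand inside strings
        ((list? e) (map expand-skipping-strings e))
        (true (expand-token e))))              ; hypothetical expander
```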
Jeff
