Identifying strings with parse

Started by Jeff, May 21, 2008, 12:51:22 PM


Jeff

When using the internal parser with parse, strings in the target are not distinguished from atoms:


(parse [text](println "Hello world")[/text])

...results in:


'("(" "println" "Hello world" ")")

The only way to check them would be contextually, which would be more difficult than is reasonable in newLISP (i.e., ML-style inference), or by testing against the current symbol set. The latter has the disadvantage that if the string is equal to the string value of an existing symbol, it will not be identified as a string. It also cannot see symbols in other contexts unless the application has been tracking context creation all along.



Can parse be modified to use newLISP's parsing rules but identify strings correctly?  Or perhaps identify ", {, }, [text], and [/text] all as tokens?
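For reference, the symbol-set check mentioned above can be sketched like this. It leans on 'sym' with a nil third argument, which looks a symbol up without creating it; as noted, it fails whenever a string happens to equal an existing symbol's name, and it only sees the context you hand it:

```lisp
;; fragile heuristic: a token "might be a string" if no symbol of
;; that name already exists in the given context
(define (maybe-string? tok)
    (not (sym tok MAIN nil)))

(maybe-string? "println")      ; nil -- println names a symbol
(maybe-string? "Hello world")  ; true -- no such symbol in MAIN
```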
Jeff

=====

Old programmers don't die. They just parse on...



Artful code: http://artfulcode.net

Lutz

#1
You can use 'find-all' with one big regular expression, e.g. like this:


(set 'newlisp {!=|\^|\(|\)|[a-zA-Z]+|\[text\]|\[/text\]})

> (find-all newlisp "(foo [text]hello world[/text]) (!= x y)")
("(" "foo" "[text]" "hello" "world" "[/text]" ")" "(" "!=" "x" "y" ")")
>


You could add an optional expression to preprocess each token  before it goes into the return list:


> (find-all newlisp "(foo [text]hello world[/text]) (!= x y)" (println $0))
(
foo
[text]
hello
world
[/text]
)
(
!=
x
y
)
("(" "foo" "[text]" "hello" "world" "[/text]" ")" "(" "!=" "x" "y" ")")
>


Instead of (println $0) you could use any other expression transforming $0 into something else, e.g. adding a type number, etc. What goes into the list is the return value of that expression:


> (define (xform) (upper-case $0))
(lambda () (upper-case $0))
> (find-all newlisp "(foo [text]hello world[/text]) (!= x y)" (xform))
("(" "FOO" "[TEXT]" "HELLO" "WORLD" "[/TEXT]" ")" "(" "!=" "X" "Y" ")")
>

Jeff

#2
The goal is to avoid costly regexes. I'm trying to write some pre-processing code, and too much regex matching would definitely hurt load times.
Jeff


rickyboy

#3
Hey Jeff,



Just guessing here, but if the text is lisp code, eval-string might help.


(define (text2sexp text-lisp-exp)
    (eval-string (append "'" text-lisp-exp)))

(text2sexp [text](println "Hello world")[/text])
   ;; => (println "Hello world")

Then you can crawl the answer and discover that println is a symbol and "Hello world" a string.  Hope that helps.
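Crawling the result could look something like this (a minimal sketch, assuming the parsed code contains only lists, symbols, and strings):

```lisp
;; tag every atom in an evaluated expression with its type
(define (classify e)
    (cond
        ((string? e) (list 'STR e))
        ((symbol? e) (list 'SYM e))
        ((list? e) (map classify e))
        (true (list 'OTHER e))))

(classify '(println "Hello world"))
;; => ((SYM println) (STR "Hello world"))
```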
(λx. x x) (λx. x x)

Jeff

#4
No, what I'm doing is writing a pre-processor for true macros.  See my other post about template expansion.  Rather than running macros as a sort of lazily evaluating function, I am trying to use them more like CL, that is, as a way of writing larger pieces of code more tersely.



Rather than using letex, which doesn't have any way of expanding '(+ '(1 2 3)) into '(+ 1 2 3), I'm adding [*] and [**] and trying to kludge up the same effect as a Common Lisp backquote expression.  But I don't want to perform expansions inside of strings, so I was hoping to be able to identify them in parsed token lists.  Apparently parse does not *quite* use newLISP's parser, though, because newLISP itself can obviously identify strings.
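One way around the parse limitation is to do the expansion on the read structure rather than on a token list: once the code has been read, string literals are real strings and can simply be passed through untouched. A sketch, where expand-token stands in for whatever [*]/[**] expansion logic applies (it is hypothetical, not part of newLISP):

```lisp
;; walk an expression, expanding everything except string literals
(define (expand-skipping-strings e)
    (cond
        ((string? e) e)                        ; never expand inside strings
        ((list? e) (map expand-skipping-strings e))
        (true (expand-token e))))              ; hypothetical expander
```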
Jeff
