newLISP Fan Club

Forum => Anything else we might add? => Topic started by: Jeff on May 21, 2008, 12:51:22 PM

Title: Identifying strings with parse
Post by: Jeff on May 21, 2008, 12:51:22 PM
When using the internal parser with parse, strings in the target are not distinguished from other atoms:


(parse [text](println "Hello world")[/text])

...results in:


'("(" "println" "Hello world" ")")

The only ways to check them would be contextually, which would be more difficult than is reasonable in newLISP (i.e. ML-style inference), or by testing against the current symbol set.  The latter has the disadvantage that if a string is equal to the name of an existing symbol, it will not be identified as a string.  It also cannot see symbols in other contexts unless the application has been tracking context creation.
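
To make that symbol-set test concrete, here is a minimal sketch (the name known-symbol? is mine, nothing built in, and it assumes MAIN is the only context of interest); (sym name context nil) looks a symbol up without creating it:


(define (known-symbol? tok)
  ;; true only if a symbol named tok already exists in MAIN
  (true? (sym tok MAIN nil)))

(known-symbol? "println")  ; => true

A string token whose contents happen to be "println" passes the same test, which is exactly the false positive described above.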



Can parse be modified to use newLISP's parsing rules but identify strings correctly?  Or perhaps to identify ", {, }, [text], and [/text] all as tokens?
Title:
Post by: Lutz on May 21, 2008, 01:40:14 PM
You can use 'find-all' with one big regular expression, e.g. like this:


(set 'newlisp {!=|\^|\(|\)|[a-zA-Z]+|\[text\]|\[/text\]})

> (find-all newlisp "(foo [text]hello world[/text]) (!= x y)")
("(" "foo" "[text]" "hello" "world" "[/text]" ")" "(" "!=" "x" "y" ")")
>


You could add an optional expression to preprocess each token before it goes into the return list:


> (find-all newlisp "(foo [text]hello world[/text]) (!= x y)" (println $0))
(
foo
[text]
hello
world
[/text]
)
(
!=
x
y
)
("(" "foo" "[text]" "hello" "world" "[/text]" ")" "(" "!=" "x" "y" ")")
>


Instead of (println $0) you could use any other expression transforming $0 into something else, e.g. adding a type number, etc. What goes into the list is the return value of that expression:


> (define (xform) (upper-case $0))
(lambda () (upper-case $0))
> (find-all newlisp "(foo [text]hello world[/text]) (!= x y)" (xform))
("(" "FOO" "[TEXT]" "HELLO" "WORLD" "[/TEXT]" ")" "(" "!=" "X" "Y" ")")
>
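
Following the "add a type number" idea, a hypothetical tagger (tag-token and its categories are just an illustration, not anything built in) could pair each token with a rough type:


(define (tag-token)
  ;; classify the current match held in $0
  (cond
    ((= $0 "(") (list 'open $0))
    ((= $0 ")") (list 'close $0))
    ((regex "^[a-zA-Z]+$" $0) (list 'word $0))
    (true (list 'other $0))))

> (find-all newlisp "(foo [text]hi[/text])" (tag-token))
((open "(") (word "foo") (other "[text]") (word "hi") (other "[/text]") (close ")"))
>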
Title:
Post by: Jeff on May 21, 2008, 01:47:48 PM
The goal is to avoid costly regexes.  I'm trying to write some pre-processing code and too much regex matching would definitely hurt load times.
Title:
Post by: rickyboy on May 21, 2008, 02:34:33 PM
Hey Jeff,



Just guessing here, but if the text is lisp code, eval-string might help.


(define (text2sexp text-lisp-exp)
  (eval-string (append "'" text-lisp-exp)))

(text2sexp [text](println "Hello world")[/text])
   ;; => (println "Hello world")

Then you can crawl the answer and discover that println is a symbol, "Hello world" a string.  Hope that helps.
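
A quick sketch of that crawl, using the standard string? and symbol? predicates (the flat dolist is just for brevity; nested code would need a recursive walk):


(dolist (e (text2sexp [text](println "Hello world")[/text]))
  (println e ": " (cond ((string? e) "string")
                        ((symbol? e) "symbol")
                        (true "other"))))

;; prints:
;; println: symbol
;; Hello world: string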
Title:
Post by: Jeff on May 21, 2008, 05:02:20 PM
No, what I'm doing is writing a pre-processor for true macros.  See my other post about template expansion.  Rather than running macros as a sort of lazily evaluated function, I am trying to use them more like CL does; that is, as a way of writing larger pieces of code more tersely.



Rather than using letex, which doesn't have any way of expanding '(+ '(1 2 3)) into '(+ 1 2 3), I'm adding [*] and [**] and trying to kludge up the same effect as a Common Lisp back-tick expression.  But I don't want to perform expansions inside of strings, so I was hoping to be able to identify them in parsed token lists; apparently parse does not *quite* use newLISP's parsing rules, though, because newLISP itself can obviously identify strings.
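
To see the letex limitation at the REPL: the bound list is substituted as a single element, and there is nothing like CL's ,@ to splice it in:


> (letex (args '(1 2 3)) '(+ args))
(+ (1 2 3))
>

The list goes in whole; there is no way to get '(+ 1 2 3) out of it without splicing, which is the effect [*] and [**] are meant to kludge up.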