processing newLISP source code

Started by cormullion, September 18, 2008, 09:30:10 AM

Previous topic - Next topic

cormullion

I've been wondering about ways of processing newLISP source code - so that it could be processed/analyzed/modified with some of newLISP's super list-processing functions such as ref-all or match. If only there were a tokenize command in newLISP that could take some source and produce a nested list as output... :)



I tried to write a tokenizer once but it wasn't very good and stumbled over a few things (as does parse, which throws away comments and gets confused by colons). Also, it didn't capture the list structure of the code - it just produced a flat stream of tokens.
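
For example (typed from memory, so the output might not be exactly right):


(parse "(+ 1 2) ; add them")
("(" "+" "1" "2" ")")


The comment has simply vanished, so there's nothing to put back when you reformat the code.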



Is it technically possible to store newLISP source code in list form? Or as XML, come to that?

DrDave

#1
Quote from: "cormullion"
Is it technically possible to store newLISP source code in list form? Or as XML, come to that?


Hmmm, I thought a basic tenet of LISP is that *everything* is a list, no? If so, doesn't that immediately answer the question?
...it is better to first strive for clarity and correctness and to make programs efficient only if really needed.

\"Getting Started with Erlang\"  version 5.6.2

cormullion

#2
Yes, I see what you're saying. I think I should have phrased the question more like: "Can newLISP code be represented as a nested list structure rather than as a flat list?" which is slightly more interesting.



Obviously a simple flat list-representation of code is possible now:


(parse (read-file "/usr/share/newlisp/util/link.lsp"))
("(" "define" "(" "link" "orgExe" "newExeName" "lispSourceName" ")" "(" "set" "'"
 "size" "(" "first" "(" "file-info" "orgExe" ")" ")" ")" "(" "copy-file" "orgExe"
 "newExeName" ")" "(" "set" "'" "buff" "(" "pack" "ld" "size" ")" ")" "(" "set" "'"
 "handle" "(" "open" "newExeName" "u" ")" ")" "(" "search" "handle" "@@@@@@@@" ")"
 "(" "write-buffer" "handle" "'" "buff" "4" ")" "(" "set" "'" "buff" "(" "read-file"
 "lispSourceName" ")" ")" "(" "set" "'" "keylen" "(" "pack" "ld" "(" "length" "buff"
 ")" ")" ")" "(" "write-buffer" "handle" "'" "keylen" "4" ")" "(" "seek" "handle"
 "size" ")" "(" "set" "'" "buff" "(" "encrypt" "buff" "(" "string" "(" "length" "buff"
 ")" ")" ")" ")" "(" "write-buffer" "handle" "'" "buff" "(" "length" "buff" ")" ")"
 "(" "close" "handle" ")" ")")


- with the problems I mentioned above. It's not the best start when you want to write a formatting utility...!



Edit:  I did write a simple tokenizer once. It turned code like this:


(define (edits1 word)
 (let ((l (length word)) (res '()))
    (for (i 0 (- l 1)) (push (select word (replace i (sequence 0 (- l 1)))) res))
    (for (i 0 (- l 2)) (push (swap i (+ i 1) (string word)) res))
    (for (i 0 (- l 1)) (for (c (char "a") (char "z"))
        (push (replace (word i) (string word) (char c)) res)
        (push (replace (word i) (string word) (string (char c) (word i))) res)))
    res))


into this"


(("left-paren" "(")
 ("code" "define")
 ("left-paren" "(")
 ("code" "edits1")
 ("code" "word")
 ("right-paren" ")")
 ("left-paren" "(")
 ("code" "let")
 ("left-paren" "(")
 ("left-paren" "(")
...etc


But I didn't know how to do the nested structure - and it was rubbishy old code too. I thought it would be so much easier for the newLISP interpreter to generate something better, since it already knows how to do it. If you use the debugger, it's quite happy moving up and down through the hierarchy of your code...
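
Edit 2: thinking about it some more, maybe a stack of open lists would do the nesting. Something like this untested sketch, which takes a flat token list like the parse output above, assumes the parentheses are balanced, and ignores quotes, strings and comments completely:


(define (nest tokens)
  (let (stack (list '()))                  ; element 0 is the innermost open list
    (dolist (tok tokens)
      (if (= tok "(")
          (push '() stack)                 ; open a new level
          (= tok ")")
          (push (pop stack) stack 0 -1)    ; close the level and append it to its parent
          (push tok stack 0 -1)))          ; ordinary token: append to the current level
    (first stack)))
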

DrDave

#3
Cormullion,



Maybe 'read-expr' can help you out, considering, as Lutz explained here,

http://www.alh.net/newlisp/phpbb/viewtopic.php?t=2463, that it processes the first expression from the input string. So you can move through the string and evaluate/extract expression by expression, doing some additional processing on each one if you want before moving on to the next.
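
One quick-and-dirty variation (untested, just a sketch): wrap the whole file in an extra pair of parentheses, and a single call to 'read-expr' will hand back every top-level expression as one nested list, ready for ref-all and friends:


(set 'src (read-file "/usr/share/newlisp/util/link.lsp"))
(set 'tree (read-expr (string "(" src ")")))  ; one big expression wrapping the whole file

(length tree)        ; number of top-level expressions in the file
(ref-all 'set tree)  ; index paths of every 'set' in the source
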
...it is better to first strive for clarity and correctness and to make programs efficient only if really needed.

\"Getting Started with Erlang\"  version 5.6.2

cormullion

#4
Ah yes.  When Lutz first announced it I had hopes that it would do this job. But the example in the manual clearly shows that it doesn't preserve white space or comments, so it couldn't be used for formatting or modifying newLISP source code... Which was my original goal. I'll get round to it eventually :)

DrDave

#5
Quote from: "cormullion"Ah yes.  When Lutz first announced it I had hopes that it would do this job. But the example in the manual clearly shows that it doesn't preserve white space or comments, so it couldn't be used for formatting or modifying newLISP source code... Which was my original goal. I'll get round to it eventually :)


I don't see that either apparent shortcoming is a big problem. First, in the case you mention of wanting to format source code, does it really matter if the original white space isn't preserved? Won't you need to adjust the whitespace on the fly anyway to meet your formatting standard?



Second, I think the loss of comments is really minor. You can easily scan the source for them and check the tokens or expression to the left of the ';' to see where the comment belongs. Comments are terminated by a newline, aren't they? If so, you know exactly where each comment starts and ends. If nothing else, you can search for the semicolon and get back an index into the source for where the comment starts, which you can use to position it in your reformatted version. A bit of a kludge, but not too bad.
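
Something along these lines (untested, and it will be fooled by a ';' inside a string literal) would at least collect the comments and tell you where the first one starts:


(set 'src (read-file "/usr/share/newlisp/util/link.lsp"))

; a comment runs from the semicolon to the end of its line
; (the dot in the regex doesn't match a newline)
(set 'comments (find-all ";.*" src))

; index of the first comment in the source; the matched text lands in $0
(find ";.*" src 0)
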
...it is better to first strive for clarity and correctness and to make programs efficient only if really needed.

\"Getting Started with Erlang\"  version 5.6.2

cormullion

#6
Hi again. Thanks for making this topic a dialog... :)



In fact I have managed some kludgy source code processing before - see

http://unbalanced-parentheses.nfshost.com/formatting. But at the risk of flogging this particular dead horse yet again, here's my 'argument'.



1: newLISP is super-cool because it contains so much useful stuff built-in - http server, regular expressions, xml parsing, prolog unifier, bayesian analysis, network stuff - how Lutz does it I don't know!



2: newLISP has lots of ultra-cool list analysis functions such as ref and set-ref-all, and they're great for working with nested lists, particularly XML-derived ones. I've recently written a short script that filters an RSS feed from a forum (not this one) to eliminate particular topics. It was only 8 lines - plus another 15 lines to write the XML back out as RSS. Easy with newLISP - there's a rough sketch of the general idea after this list.



3: Lisps are extra-cool because you can handle code as data and vice versa.



4: So: it would be mega-cool if there were a built-in facility for working with lists that represent its own source code (if it's even possible). I don't know the technical side, but I'm kind of assuming that, since newLISP can already read and analyze source code pretty well (!), and generate s-expressions too, it wouldn't be too much of an effort for it to do a complete job. And it would do a better job than anyone else... :)
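
(For what it's worth, here's the flavour of the RSS processing mentioned in point 2 - not the actual script, just an untested sketch along the same lines, with a made-up file name, that lists every title in a parsed feed:)


(xml-type-tags nil nil nil nil)              ; don't wrap everything in "ELEMENT"/"TEXT" tags
(set 'feed (xml-parse (read-file "feed.xml") (+ 1 2 4 8 16)))

; ref-all with match finds every (title ...) list, however deeply it's nested
(dolist (p (ref-all '(title *) feed match))
    (println (last (feed p))))               ; (feed p) indexes into the nested list with the path
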



Anyway, I'll probably do some tinkering some time.