current-char to match current-line?

Started by TedWalther, January 15, 2012, 11:04:15 PM


TedWalther

Can you add in current-char to match the functionality of current-line?  Or else have read-char put the read char into current-line?
Cavemen in bearskins invaded the ivory towers of Artificial Intelligence.  Nine months later, they left with a baby named newLISP.  The women of the ivory towers wept and wailed.  "Abomination!" they cried.

Lutz

#1
Do you have a concrete application for this or is it for the sake of consistency? Or, perhaps 'read-key' can help you out?



Then there is also (read handle chr 1). With handle = 0 for STDIN.

TedWalther

#2
My application reads the contents of stdin a character at a time, parsing as it goes.  I'm finally making a parser for newlisp so I can mingle newlisp code into a language with a slightly different format, and output well-formed newlisp.



And yeah, consistency is nice. :)  I got used to (current-line)

TedWalther

#3
Is there a railway diagram for newlisp somewhere?

Lutz

#4
I would not read character by character. Instead, read the source file in one piece using 'read-file' and then tokenize it, either with 'find-all' and a regular expression describing what tokens look like, or with 'parse' and a regular expression describing the space in between tokens.
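
A minimal sketch of the two tokenizing approaches, assuming a toy token grammar (parentheses plus runs of non-space characters) that is far simpler than real newLISP syntax. The variable names and the sample strings are illustrative only; in practice src would come from 'read-file':

```newlisp
; Toy input standing in for the contents returned by read-file.
(set 'src "(foo (bar 1))")

; find-all: the regular expression describes the tokens themselves.
(set 'tokens (find-all {\(|\)|[^\s()]+} src))
; tokens -> ("(" "foo" "(" "bar" "1" ")" ")")

; parse: the regular expression describes the space between tokens.
; The third argument (0) switches parse to regular-expression mode.
(set 'words (parse "foo   bar\tbaz" {\s+} 0))
; words -> ("foo" "bar" "baz")
```

The 'find-all' form is usually the easier one to grow into a real tokenizer, since each new token type becomes another alternative in the pattern.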

cormullion

#5
I did some newLISP parsing - it's on GitHub. Please make it better!

TedWalther

#6
Thanks cormullion.  Is your parser up to date with the latest version of newlisp?



Lutz, can read-file take an integer file handle or a FILE* as well as a string filename?  If so, then it would be much easier; I could just use (read-file 0).  It is important to me to let users pipe things into my scripts instead of always specifying them on the command line.

cormullion

#7
Well, it didn't fail when running the 10.3.2/qa-specific-tests/qa-bench file. I've not been keeping up with changes, but unless there's something wildly different it should be OK. Of course, it doesn't have to have an up-to-date list of functions, because everything is either a symbol, a string, a number, etc...

Lutz

#8
All of these work with pipes and stdin redirection:



(while (read 0 chr 1)
    (print chr))


or this:



(while (set 'chr (read-char 0))
    (print (char chr)))


this one would read UTF8 characters:



(while (set 'chr (read-utf8 0))
    (print (char chr)))


this one would read a file <= 1Mbyte in one swoop:



(read 0 theFile 1000000)
(print theFile)


and you could use:



(dolist (chr (explode theFile))
...
)


'explode' would recognize UTF8 chars too on UTF8 enabled versions.
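
When the input may be larger than a fixed buffer, the single 'read' can be replaced by a loop. This is only a sketch: the 4096-byte chunk size and the names src and chunk are arbitrary choices, and it relies on 'read' returning nil at end of input:

```newlisp
; Collect all of stdin (handle 0) into src, one chunk at a time.
(set 'src "")
(while (read 0 chunk 4096)
    (extend src chunk))

; Then walk the collected text one character at a time.
(dolist (chr (explode src))
    (print chr))
```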



Cormullion:



for a quick check of version compatibility look at the last section in any of the release notes since 10.3:



http://www.newlisp.org/downloads/previous-release-notes/

cormullion

#9
Thanks. I don't think there was much that would break a parser...

TedWalther

#10
Lutz, thank you!  I am so used to C idioms that I completely didn't realize (read 0 chr 1) could be used.  That solves my problem.  Although, back to the principle of least surprise...



Here is what I was originally getting at.  I am used to an idiom of doing a read, and if the read fails, terminating the loop.  But if the read SUCCEEDS, then I need to go through the state machine and do stuff based on the value of the read.


(while (read 0 c 1) (do-stuff-with c))

With the (read 0 chr 1) syntax, this is nice.  With the (read-line 0) syntax, the value is stuffed into current-line. Again, this is nice.



But there is nothing analogous for read-file, read-char, and read-utf8.  I see two different syntaxes here, the read way and the read-line way, for doing the same thing.



The documentation says the file handle is optional for read-line, but not read-utf8 or read-char.  Is that true and intentional?  If read-char and read-utf8 have to have the integer argument, then can we add an argument to the end to stuff the value into?  That way the return value is good for noting the success/failure mode, but the value is still stored.
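
Until something like that exists, the behaviour can be approximated in user code. The helper below, read-char-into, is hypothetical and not part of newLISP; it is a sketch that stores the byte in a caller-supplied symbol and returns it, so the return value still signals success (a number) or end of input (nil):

```newlisp
; Hypothetical helper: read one byte from handle (default 0 = stdin),
; store it in the symbol passed as target, and return it (nil at EOF).
(define (read-char-into target (handle 0))
    (set target (read-char handle)))

; The loop ends when the read fails; otherwise c holds the byte.
(while (read-char-into 'c)
    (print (char c)))
```

Note that the symbol has to be quoted at the call site, unlike the buffer argument of 'read'.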

TedWalther

#11
Cormullion, does your parser have a hook so I can "take over" the parsing at a certain point, and return to the calling evaluator, where my parsing ended, and it should resume?



What I want is this: parse like regular newlisp code until constructs that I define are matched (e.g., [music], ,@ @( or (para).



So once my opening delimiters are matched, I parse until the end of my stuff, then the calling parser resumes.

TedWalther

#12
I envision using it something like this:



(parse str ((match-fn-1 parse-fn-1) (match-fn-2 parse-fn-2) ...))


At every char in the string, the match-fn will be run.



match-fn's will take the following arguments:



(match-fn str $idx)



str is the string

$idx is the index into the string



RETURN: match-fn returns true or nil



parse fn's will take the following arguments:



(parse-fn  str $idx)



str is the string and $idx is the index where the match occurred, so the parse-fn works on the rest of the string starting at $idx



RETURN parse-fn returns a list of the form (expr idx)



Where expr is the properly formed newlisp expression resulting from the alien code, and idx is the index of the char in the string after the end of what was parsed.



This is pretty much how I'd like read-expr to work.
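
A rough sketch of such a driver, under several assumptions: parse-with-hooks, music? and parse-music are made-up names, the rules are passed as a list of (match-fn parse-fn) pairs holding the functions themselves, idx is used instead of $idx because $idx is a protected system variable, and unmatched characters are simply skipped rather than handed to a regular newLISP parser:

```newlisp
; Hypothetical driver: try every rule at every position of str.
(define (parse-with-hooks str rules)
    (let (idx 0 out '() hit nil res nil)
        (while (< idx (length str))
            (set 'hit nil)
            (dolist (rule rules hit)             ; stop at the first rule that matched
                (when (apply (first rule) (list str idx))
                    (set 'res (apply (last rule) (list str idx)))
                    (push (first res) out -1)    ; collect expr
                    (set 'idx (last res))        ; resume after the parsed span
                    (set 'hit true)))
            (if (not hit) (inc idx)))            ; no rule matched: skip one char
        out))

; Example rule pair for a made-up [music]...[/music] construct.
(define (music? str idx)
    (starts-with (slice str idx) "[music]"))

(define (parse-music str idx)
    (letn (tail (slice str idx)
           stop (+ (find "[/music]" tail) 8))    ; assumes the closing tag is present
        (list (list 'music (slice tail 7 (- stop 15)))
              (+ idx stop))))

(parse-with-hooks "a [music]abc[/music] b" (list (list music? parse-music)))
; -> ((music "abc"))
```

Indexes here are byte offsets, so on UTF8 enabled versions multi-byte characters would need extra care.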

cormullion

#13
Hi Ted. It's been a while since I looked at it. Since I'm not a programmer, I didn't think about that feature, but you could probably modify the rules easily by hacking the code. For example, if [text] and [/text] are properly matched, perhaps other tags could be inserted in that part of the code. It was hard enough coping with the many different patterns that newLISP code can currently take. Perhaps callback functions could be added - I worked out how to do those recently.



(I always wanted someone else to write it, but they never did.. :)