Parsing file gives ERR: string token too long :

Started by joejoe, October 10, 2010, 01:17:32 PM


joejoe

I can't figure out what I should be doing differently to parse a file that apparently has a looong string token that newLISP (nL) doesn't like.



I have this code:


(set 'words (parse (read-file "myfile")))
(exit)


and I get the error:


ERR: string token too long : "></div>\r\n\t\t\t\t\t\n\t\t\t\t</div>\n\t\t\t</div> \n\t\t "

It's an HTML source file I am trying to pull pieces from into lists.



Thank you for any direction!

Lutz

#1
When you use 'parse' without the optional break string parameter, the text is tokenized as if it were newLISP source being read. In newLISP source, strings quoted with "..." are limited to 2048 characters; longer strings must be enclosed in [text] and [/text] tags. Symbol tokens are limited too: they cannot be longer than 255 characters.



Use a break pattern (either a simple string or a regular expression), and the problem will go away.
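For example, here is a minimal sketch of both kinds of break pattern. The short HTML string stands in for (read-file "myfile"), and the names html, lines and pieces are made up for illustration. With a break pattern, parse only splits the text and does not tokenize it as newLISP source, so the quoted-string length limit no longer comes into play.

```newlisp
; Stand-in for (read-file "myfile") so this sketch runs on its own
(set 'html "<div>first</div>\n<div>second</div>")

; Simple string break: split the text on newlines
(set 'lines (parse html "\n"))

; Regular-expression break (option 0): split on anything that looks
; like a tag, then keep only pieces that contain a non-whitespace character
(set 'pieces
  (filter (fn (s) (find {\S} s 0))
          (parse html "<[^>]+>" 0)))

(println lines)
(println pieces)
```

When running this as a script, end it with (exit) as in the original snippet; otherwise newLISP stays at the interactive prompt.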

joejoe

#2
Thank you, Lutz, for pointing out how to use a string break. I didn't know what that meant before.



May I also ask which function I should use to pick out just a section of the file (using a regular expression), instead of having parse turn the entire file into strings before I get to the elements I'm after?



I first thought of using find or regex, but most of these functions seem to expect a newLISP string that has already been created.



Thank you.

cormullion

#3
I think you can use search for this.


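(set 'file (open "program.c" "r"))
; search looks for the pattern in the open file; true leaves the file
; position after the match so the loop moves forward, and with
; regex option 0, $1 holds the first captured group
(while (search file "#define (.*)" true 0)
   (println $1))
(close file)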


I don't know if this avoids reading the whole file into memory or not...
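If the goal is to be sure the whole file is never held in memory at once, a minimal sketch is to read it one line at a time with read-line and test each line with find. This is an alternative to search, not a description of how search works internally; the file name sample.c and its contents are made up so the sketch runs on its own, and a match cannot span more than one line.

```newlisp
; Create a small sample file (hypothetical; in practice use the real file)
(write-file "sample.c" "#include <stdio.h>\n#define MAX 10\nint x;\n#define MIN 1\n")

(set 'defines '())
(set 'file (open "sample.c" "read"))

; read-line returns nil at end of file; current-line is the line just read.
; find with regex option 0 sets $1 to the first captured group.
(while (read-line file)
  (if (find "#define (.*)" (current-line) 0)
      (push $1 defines -1)))   ; -1 appends to the end of the list

(close file)
(delete-file "sample.c")
(println defines)
```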

joejoe

#4
Thanks cormullion :)



I don't yet know enough to make use of the code in your post, but I used the example above it from the manual, and it works great with no errors.



I appreciate the help. Thanks cormullion and Lutz!