(parse) oddness

Started by kanen, April 20, 2010, 01:39:54 PM


kanen

I have a problem, where parse is returning an extra item.



I have attached code for this problem, with comments.
;; sites-sm.txt
[text]
;; copy below to sites-sm.txt
1  google.com
2  facebook.com
3  yahoo.com
4  youtube.com
5  live.com
6  wikipedia.org
7  blogger.com
8  baidu.com
9  msn.com
10 yahoo.co.jp
[/text]

(set 'sites (parse (read-file "sites-sm.txt") "\n"))
;; ("1\tgoogle.com" "2\tfacebook.com" "3\tyahoo.com" "4\tyoutube.com"
;; "5\tlive.com" "6\twikipedia.org" "7\tblogger.com" "8\tbaidu.com"
;; "9\tmsn.com" "10\tyahoo.co.jp" "")

(println "Sites has " (length sites) " entries") ; ->
;; Sites has 11 entries

(println (slice (sites 0) (+ 1 (find "\t" (sites 0)))))
;; google.com

(dolist (x sites)
   (println (slice x (+ 1 (find "\t" x)))))
;; prints: google.com ... yahoo.co.jp
;;
;; ERR: value expected : (find "\t" x)

(exit)

As you can see, parse returns everything in the file, plus an extra "" item in the list.



This causes everything from the parse to start failing, for various reasons.



* 11 items, when there are only 10 in the list (or, should only be 10)

* The dolist fails on the "" item, because there is no "\t" character to find



Obviously, I can correct this with something like:
(dolist (x sites)
   (if (> (length x) 0) (push x newsites -1))  ; -1 appends, keeping the original order
)
(set 'sites newsites)


I feel like I am missing something fundamental here, but to my mind, adding this extra list item (an empty string) seems like a bug or a failure. Can someone help me understand this issue clearly?
Kanen Flowers http://kanen.me

Sammo

#1
A trailing "\n" in "sites-sm.txt" is the culprit. Try removing the trailing newlines before parsing:
(set 'sites (parse (trim (read-file "sites-sm.txt") "" "\n") "\n"))

Edit: Corrected typo
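
To make the effect easier to see without the file, here is a minimal sketch using a literal string in place of (read-file ...). The symbol name sample and its two-line content are made up for illustration; clean and empty? are standard newLISP functions, but this is only one possible way to drop the empty item, not necessarily the best one.

```newlisp
;; Hypothetical sample data: a string that ends with the delimiter,
;; as a file typically does when its last line is newline-terminated.
(set 'sample "1\tgoogle.com\n2\tfacebook.com\n")

;; parse treats "\n" as a separator, so whatever follows the last "\n"
;; (here an empty string) is returned as a final item.
(println (parse sample "\n"))

;; Alternative to trimming the input: filter empty strings out of the result.
(println (clean empty? (parse sample "\n")))
```

The first println should show the same trailing "" item as in the original post, and the second should show the list without it.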

kanen

#2
I thought the same thing, but... unless I completely misunderstand how (parse) works, I can tell you there are no trailing newlines in the file.



Opening it in vim or another editor shows only a newline after each entry, but not an extra, trailing newline.



Yes, your solution fixes the problem, but the problem does not actually exist because there's no blank (or trailing) newline at the end of the file.



Am I still missing something?



Sammo

#3
If there isn't a trailing "\n" in "sites-sm.txt", then is one being appended in (read-file "sites-sm.txt")? It seems that it would have to be in order for trim to correct the problem.

-- Sam

kanen

#4
Makes sense as to what is happening - (read-file) adding a trailing newline.



I consider this to be a bug, though, unless someone can correct my thinking.



Perhaps I should be reading the file differently?



Sammo

#5
On Win XP running newLISP 10.2.1, I dup'd your file (complete with \n instead of \r\n) but could not duplicate your symptom.
(parse (read-file {C:\MyDirectory\sites-sm.txt}) "\n") returns
("1\tgoogle.com" "2\tfacebook.com" "3\tyahoo.com" "4\tyoutube.com" "5\tlive.com"
 "6\twikipedia.org" "7\tblogger.com" "8\tbaidu.com" "9\tmsn.com" "10\tyahoo.co.jp")

The problem is either in a newLISP version later than 10.2.1 or in a non-Windows version.

Lutz

#6
As on Win XP with 10.2.1, it also runs correctly on UNIX with all versions of newLISP.



Vi adds extra line feed character(s) on Windows and on UNIX, even if not typed in and not visible in vi.



But instead of stripping the extra trailing line-feed, here is another way to parse using 'find-all':


> (find-all {[^\n]+} (read-file "example.txt"))
("1  google.com" "2  facebook.com" "3  yahoo.com" "4  youtube.com" "5  live.com"
 "6  wikipedia.org" "7  blogger.com" "8  baidu.com" "9  msn.com" "10 yahoo.co.jp")


While 'parse' defines the border between items, the regex in 'find-all' defines the item content. I used curly braces instead of quotes in the above example, so I don't have to double escape the line-feed character.



'find-all' also takes extra options, which let you process each item found, and by default always uses regular expressions, which 'parse' does not.
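
As a rough illustration of the extra expression argument, the sketch below pulls out only the domain from each line. The sample string and the pattern with two capture groups are assumptions made for this example (a rank, whitespace, then a domain); $2 refers to the second capture group of the current match.

```newlisp
;; Hypothetical sample data in the same shape as sites-sm.txt.
(set 'sample "1\tgoogle.com\n2\tfacebook.com\n")

;; The third argument is evaluated once per match; $2 holds the text
;; captured by the second group, i.e. the domain after the whitespace.
;; Curly braces keep the backslashes intact for the regex engine.
(println (find-all {(\d+)\s+(\S+)} sample $2))
```

Because the pattern describes item content rather than separators, a trailing line feed in the input should not produce an empty item here.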



Regarding a raw-packet option, from another post:


Quote from: "Kanen"Now, if I could just convince you to create a (raw-packet) option in newLISP, my problems would be solved. ;P


I tried to create a 'net-packet' a few years back and used example code from here:



http://mixter.void.ru/rawip.html



but could not get it to work (that was with OSX on PPC G4 and FreeBSD on Intel x368), not even the example code. Perhaps it was some simple thing I missed. What I need is a working C example.

kanen

#7
Quote from: "Lutz"As on Win XP with 10.2.1, it also runs correctly on UNIX with all versions of newLISP.



Vi adds extra line feed character(s) on Windows and on UNIX, even if not typed in and not visible in vi.


Yes, it does; that is why vi added a ":set binary" setting and a "-b" command-line option. This might be enough to migrate me to vile, just to not deal with it. Frustrating!


Quote from: "Lutz"But instead of stripping the extra trailing line-feed, here is another way to parse using 'find-all':


> (find-all {[^\n]+} (read-file "example.txt"))
("1  google.com" "2  facebook.com" "3  yahoo.com" "4  youtube.com" "5  live.com"
 "6  wikipedia.org" "7  blogger.com" "8  baidu.com" "9  msn.com" "10 yahoo.co.jp")

Fantastic. Thanks for the easy (and appreciated) example. Makes my life less complicated.


Quote from: "Lutz"While 'parse' defines the border between items, the regex in 'find-all' defines the item content. I used curly braces instead of quotes in the above example, so I don't have to double escape the line-feed character.



'find-all' also takes extra options, which let you process each item found, and by default always uses regular expressions, which 'parse' does not.



Regarding a raw-packet option, from another post:


Quote from: "Kanen"Now, if I could just convince you to create a (raw-packet) option in newLISP, my problems would be solved. ;P


I tried to create a 'net-packet' a few years back and used example code from here:



http://mixter.void.ru/rawip.html



but could not get it to work (that was with OSX on PPC G4 and FreeBSD on Intel x368), not even the example code. Perhaps it was some simple thing I missed. What I need is a working C example.


Yeah, the stuff from mixter is total crap. He misses about 1/2 the headers you need, includes headers you don't need and there are many mistakes in the code. I have several working examples. I will try to put something together that makes sense. It won't be pretty, but it will at least be functional (i.e. it will compile and run properly, without a hundred hours of tweaking and research).