char bug?

Started by maq, April 18, 2008, 03:56:09 PM

maq

On version 9.3, I get the following:



newLISP v.9.3.0 on OSX UTF-8, execute 'newlisp -h' for more info.

> (char "")

string index out of bounds in function char
>


Prior to 9.3, this used to return 0. Shouldn't this still be the case, given that an empty string by definition (at least in C, which I thought newLISP also followed) is one that contains only the null character?



Thanks,



--maq

ghfischer

#1
In nl-string.c change line 219 from


offset = adjustNegativeIndex(offset, len);

to


if ((offset != 0) || (len > 0)) offset = adjustNegativeIndex(offset, len);

This will produce the following behavior:

newLISP v.9.3.8 on OSX IPv4 UTF-8, execute 'newlisp -h' for more info.

> (char "")
0
> (char "" 0)
0
> (char "" 1)

string index out of bounds in function char
> (char "" -1)

string index out of bounds in function char
> (char "a")
97
> (char "a" 1)

string index out of bounds in function char
> (char "a" 0)
97
> (exit)


Lutz

#2
Note that 0 and -1 always should return the same result, because they refer to the same character: the first or the last.



The new behavior in 9.3.0 occurred after introducing error reporting for out of range indices on strings.



Thinking a bit more about this I believe that


(char "") -> 0 ; previous to 9.3.0

is actually wrong. The 'char' function does not work on binary characters, so it should never return 0, because zero is not a valid character in either ASCII or UTF-8. The "" string is empty and does not contain any characters. The fact that C strings are terminated with 0 is a C issue.



So the 9.3.0 behavior will stay ;-), but perhaps it should return 'nil' on an empty string?



ps: I deleted my previous post.

ghfischer

#3
I'm of the opinion that char should map strings to integers and not introduce nil. I think the empty string is a valid special case where 0 should be returned.



That being said, I'm OK with char returning errors if the offset is out of bounds. I'd argue that any offset argument for an empty string should be an out-of-bounds error, but looking at the code, it seems easier to just make offset=0 a special case rather than test offset for nil.



0 is a valid character in the ASCII set; it's the NUL character.

So if (char 0) --> "\000", then (char "\000") --> 0.

"" is simply shorthand for "\000".

xytroxon

#4
What do other languages do?



Python has the ord function...

http://docs.python.org/lib/built-in-funcs.html



>>> ord("A")
65
>>> ord("AB")
TypeError: ord() expected a character, but string of length 2 found
>>> ord("")
TypeError: ord() expected a character, but string of length 0 found



------------



Python's chr function behaves as follows:



>>> chr(-1)
ValueError: chr() arg not in range(256)
>>> chr(0)
'\x00'
>>> chr(123)
'{'
>>> chr(257)
ValueError: chr() arg not in range(256)



---------



Does anyone use Ruby or other languages?

"Many computers can print only capital letters, so we shall not use lowercase letters."

-- Let's Talk Lisp (c) 1976

maq

#5
Quote from: "Lutz"


The new behavior in 9.3.0 occurred after introducing error reporting for out of range indices on strings.




I looked at the 9.3 release notes and did not see anything related to this, or to the similar new behavior on lists. Could you please point me to an explanation of the new behavior? This change is breaking many things in a pre-9.3 codebase I am working from, and I need a systematic way to identify all the affected functions and correct them.


Quote from: "Lutz"
So 9.3.0 behavior will stay ;-), but perhaps return 'nil' on an empty string?


I would argue that returning nil is better than erroring out and thus requiring additional code to test whether the input is within bounds (e.g. not empty) or to catch the error. It just seems that this one could go either way, so why not default to the implementation that results in less code?

newdep

#6
Just a mindset...



(get-char) returns "\x00"



(get-char "") returns 0



(char) returns "\000"



so I would assume (char "") returns 0.

But logically speaking, (char "") is nil.

(To be honest, I don't like "string index out of bounds in function char".)

I prefer consistent global returns of nil when something is not true or out of bounds, because using (nil? ...) just fits it perfectly... or (zero? ...)
-- (define? (Cornflakes))

Lutz

#7
In development version 9.3.10, out tomorrow:

(char "") => nil

The empty string and the 0 character are not defined as displayable in ASCII and UTF-8; also, "" and "\000" are not the same: (= "" "\000") => nil. Now you can do: (if (char str) ...)

(char), (get-char)

will give a missing parameter error message (as will get-int, get-float, get-string).

(get-char "") => 0

This stays as is: get-char gets the byte at the address of the empty string, which is a zero byte.

newdep

#8
I really must have missed 9.3.9...

I thought I had it, but there is only a 9.3.8 to be found...



9.3.9 must have been a ghost release when we move to 9.3.10 ;-)
-- (define? (Cornflakes))

cormullion

#9
I think the list index out-of-bounds behavior was discussed at length here and is also mentioned in the 9.3 release notes. Not sure about strings...



I had to fix a bit of my code too. I sympathize!