Bug? Floating-point number lexical analysis

Started by psilwen, October 01, 2014, 03:02:24 AM

psilwen

I want to convert a large floating-point number to an integer.

> (int 999999999999999999999999999999999999999999.99)
9223372036854775807


I tried

> (bigint 999999999999999999999999999999999999999999.99)
1000000000000000045259160000000000000000000L


because of
Quote
When converting from floating point, rounding errors occur going back and forth between decimal and binary arithmetic.
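
The rounding described in the quote can be reproduced outside newLISP. The following is a minimal Python sketch, used only for illustration (newLISP's internal conversion and its printed digits may differ). It compares conversion through a binary double with conversion through exact decimal arithmetic:

```python
from decimal import Decimal

LITERAL = "999999999999999999999999999999999999999999.99"

# Route 1: through an IEEE 754 double. The literal is rounded to the
# nearest representable double before any integer conversion happens.
via_float = int(float(LITERAL))

# Route 2: through exact decimal arithmetic, truncating the fraction.
via_decimal = int(Decimal(LITERAL))

print(via_float == via_decimal)  # False: the double route has lost digits
print(via_decimal)               # the integer part of the literal, unchanged
```

The second route corresponds to what passing a string to bigint achieves: the digits never pass through a double.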


And



> (bigint "1234567890123456789012345678901234567890.123456789")
1234567890123456789012345678901234567890L


This is what I expected.

It means the number needs to be converted to a string first.



But,

> (bigint (string 1234567890123456789012345678901234567890.123456789))
1L

> (string 1234567890123456789012345678901234567890.123456789)
"1.23456789012346e+039123456789"


not "1234567890123456789012345678901234567890.123456789",

and the result is also incorrect!

Evidently, the literal 1234567890123456789012345678901234567890.123456789 is parsed as two parts: "1234567890123456789012345678901234567890." and "123456789".



> 1234567890123456789012345678901234567890.123456789
1.23456789012346e+039
123456789
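
The outputs in this thread are consistent with a reader that bounds the length of a decimal token. The Python sketch below is a model of that behavior, not newLISP's actual source: the name `split_literal` and the exact rule (integer digits are consumed without a length check, fraction digits only until 32 characters are used) are assumptions inferred from the observed results.

```python
LIMIT = 32  # assumed token budget in newLISP up to 10.6.1, per the thread


def split_literal(text, limit=LIMIT):
    """Return (token, rest) the way the modeled reader would split `text`."""
    i = 0
    # Integer digits are consumed without checking the budget.
    while i < len(text) and text[i].isdigit():
        i += 1
    if i < len(text) and text[i] == ".":
        if i >= limit:
            # Budget already exhausted: the token ends before the point and
            # the fraction digits are read as a separate number.
            return text[:i], text[i + 1:]
        i += 1
        # Fraction digits are consumed only while the budget lasts.
        while i < len(text) and i < limit and text[i].isdigit():
            i += 1
    return text[:i], text[i:]


long_int_part = "1234567890" * 4 + ".123456789"
print(split_literal(long_int_part))
# ('1234567890123456789012345678901234567890', '123456789')

long_fraction = "1" + "0" * 17 + ".9876543210123456789"
print(split_literal(long_fraction))
# ('100000000000000000.9876543210123', '456789')
```

Under this model the second value returned is what the REPL then evaluates as a separate expression, which would explain the extra line of output.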


More tests

> (setq d 1234567890123456789012345678901234567890.123456789)

ERR: missing argument in function setf




> 100000000000000000.9876543210123456789
1e+017
456789

> (length "100000000000000000.9876543210123")
32

> 1000000000000000000.9876543210123456789
1e+018
3456789
> 10000000000000000000.9876543210123456789
1e+019
23456789
> 100000000000000000000.9876543210123456789
1e+020
123456789
> 1000000000000000000000.9876543210123456789
1e+021
342391             <----- where does this come from?
89
> 10000000000000000000000.9876543210123456789
1e+022
10123456789
> 100000000000000000000000.9876543210123456789
1e+023
210123456789
> 1000000000000000000000000.9876543210123456789
1e+024
3210123456789
> 10000000000000000000000000.9876543210123456789
1e+025
43210123456789
> 100000000000000000000000000.9876543210123456789
1e+026
543210123456789
> 1000000000000000000000000000.9876543210123456789
1e+027
6543210123456789
> 10000000000000000000000000000.9876543210123456789
1e+028
76543210123456789
> 100000000000000000000000000000.9876543210123456789
1e+029
876543210123456789
> 1000000000000000000000000000000.9876543210123456789
1e+030
9876543210123456789L

> 10.1234567890123456789012345678901234567890
10.1234567890123
342391                  <----- where does this come from?
890
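
A possible explanation for the 342391, offered as an assumption rather than something confirmed in this thread: the digits left over after the first 32 characters of the literal begin with 0, and a token with a leading 0 may be read as an octal number. The leftover 0123456789 would then yield octal 01234567, which is 342391 in decimal, followed by a separate 89 (or 890 in the second case). The helper name below is hypothetical, and the Python sketch only checks the arithmetic:

```python
def read_octal_prefix(text):
    """Hypothetical helper: read a leading-zero token as octal digits.

    Returns the decimal value of the longest run of octal digits (0-7)
    and the remaining, unread characters.
    """
    if not text.startswith("0"):
        raise ValueError("expected a token with a leading zero")
    i = 1
    while i < len(text) and text[i] in "01234567":
        i += 1
    return int(text[:i], 8), text[i:]


print(read_octal_prefix("0123456789"))   # (342391, '89')
print(read_octal_prefix("01234567890"))  # (342391, '890')
```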


Has support for a bigdecimal feature been considered?
(reverse "newlisp")

Lutz

#1
(bigint "1234567890123456789012345678901234567890.123456789")
and:
(bigint (string 1234567890123456789012345678901234567890.123456789))

are not the same. When the large decimal-point number is parsed, it is converted to IEEE 754 floating point with at most 16 digits of precision:


; in version up to 10.6.1
> 1234567890123456789012345678901234567890.123456789
1.23456789012346e+39
123456789
>

> (setq d 1234567890123456789012345678901234567890.123456789)

ERR: missing argument in function setf


Up to and including v.10.6.1, only 32 characters are parsed for a decimal number, including a potential sign and the decimal point. The rest of the source is parsed as a separate number, which also causes the syntax error in the setq statement. In version 10.6.2, up to 255 characters will be parsed in decimal numbers:


; in version 10.6.2 and after (in progress)
> 1234567890123456789012345678901234567890.123456789
1.23456789012346e+39

> (setq d 1234567890123456789012345678901234567890.123456789)
1.23456789012346e+39
>
>


This float now gets converted to a string, and that string is parsed by bigint. Note that Python does the same conversion when parsing code:



>>> str(1234567890123456789012345678901234567890.123456789)
'1.23456789012e+39'
>>>

When bigint parses a string it expects integer numbers and will stop parsing at any other character like the decimal point.

> (bigint "1.23456789012346e+39")
1L
>
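
The prefix-parsing behavior described above can be imitated in a few lines. This is a Python sketch of the idea only; `parse_int_prefix` is a hypothetical name and the regular expression is an assumption about which characters count as part of the integer, not newLISP's implementation:

```python
import re


def parse_int_prefix(text):
    """Parse the leading integer of `text`, stopping at the first other character."""
    match = re.match(r"[+-]?\d+", text)
    return int(match.group()) if match else None


print(parse_int_prefix("1.23456789012346e+39"))
# 1
print(parse_int_prefix("1234567890123456789012345678901234567890.123456789"))
# 1234567890123456789012345678901234567890
```

Both results match the bigint outputs shown in this thread: the exponent notation produced by string leaves only the digit 1 in front of the decimal point.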

rickyboy

#2
Thank you, psilwen and Lutz!
(λx. x x) (λx. x x)

psilwen

#3
Quote
only 32 characters are parsed for decimal number including a potential sign and the decimal point.



> (length "1234567890123456789012345678901234567890.0")
42
> 1234567890123456789012345678901234567890.0
1.23456789012346e+039
0

It actually stops parsing at the first non-digit character.


Quote
The rest of the source will be parsed as a different number


This strategy has potential problems; it is easily confusing.

Reporting it as an error might be better.

At least we would know there is an error, rather than being puzzled by wrong results.
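
Reporting an error instead of splitting could look roughly like this. It is a hypothetical Python sketch, not newLISP code: `read_number`, `LiteralTooLong`, and the `limit` parameter are names chosen for illustration.

```python
import re

NUMBER = re.compile(r"[+-]?\d+(?:\.\d*)?")


class LiteralTooLong(ValueError):
    """Raised instead of silently splitting an over-long numeric literal."""


def read_number(text, limit=32):
    """Read one decimal literal from the start of `text` or raise an error."""
    match = NUMBER.match(text)
    if match is None:
        raise ValueError("not a decimal number: %r" % text)
    token = match.group()
    if len(token) > limit:
        raise LiteralTooLong(
            "numeric literal has %d characters, limit is %d" % (len(token), limit)
        )
    return float(token) if "." in token else int(token)


print(read_number("123456.5"))  # 123456.5
try:
    read_number("1234567890" * 4 + ".0")
except LiteralTooLong as err:
    print("error:", err)
```

The whole literal is matched first and only then checked against the limit, so an over-long number is rejected as one unit rather than read as two.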





> (/   1234567890123456789012345678901234567890     123456)
10000063910409026608770296128995225L
> (/   1234567890123456789012345678901234567890.0   123456)

ERR: division by zero in function /

> (div 1234567890123456789012345678901234567890     123456)
1.0000063910409e+034
> (div 1234567890123456789012345678901234567890.0   123456)
inf
> (div 2   0)
inf



> (%   10000000000000000000000000000000000000000000000.0 10)

ERR: division by zero in function %
> (mod 10000000000000000000000000000000000000000000000.0 10)
nan
> (%   10000000000000000000000000000000000000000000000 10)
0L
> (mod 10000000000000000000000000000000000000000000000 10)
8
(reverse "newlisp")

rickyboy

#4
Hello psilwen,



Most of these issues you raise are not issues at all in version 10.6.2.   I recommend you download and build that version on your machine and repeat these examples.



I've done just that and I only recall this one example still being an issue.


$ ./newlisp
newLISP v.10.6.2 64-bit on BSD IPv4/6 UTF-8, options: newlisp -h

> (mod 10000000000000000000000000000000000000000000000 10)
8

I think it should evaluate to 0.



This example yields the same result in 10.6.2 as in the version you are using.


> (div 2   0)
inf

However, I believe it to be the correct behavior.  I actually have something like the following in some of my old code.


(define inf (div 1 0))
It's convenient to have a symbol that evaluates to "high values."
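
The same idiom exists in other languages that use IEEE 754 doubles. Below is a small Python illustration of infinity as a "high value" starting point. Note that Python raises ZeroDivisionError for 1.0 / 0 instead of returning inf, so the constant is taken from the math module; the function name `smallest` is hypothetical:

```python
import math

INF = math.inf  # plays the role of (define inf (div 1 0))


def smallest(values):
    """Return the smallest value, or INF when `values` is empty."""
    best = INF
    for v in values:
        if v < best:
            best = v
    return best


print(smallest([3.5, 1.25, 8.0]))  # 1.25
print(smallest([]))                # inf
```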



I hope this helps, and thank you very much for taking the time to check all of this (and making newLISP better).
(λx. x x) (λx. x x)

psilwen

#5
Quote
In version 10.6.2 up to 255 characters will be parsed in decimal numbers


This cannot completely solve the problem.

The core of these issues is that the parsing approach is not right.

When a number exceeds 255 characters, the same issue will appear again.
(reverse "newlisp")

rickyboy

#6
psilwen,



Are you suggesting that newLISP allow the user to enter a number of arbitrary length (as input) and then parse and store the internal representation of the number (as an exact representation of the input, i.e. arbitrarily large)?



AFAIK, no programming language allows this.  There are always limits.  Life is about dealing with "scarce" resources, and one of the major issues of software design is how to deal with that scarcity, while still meeting the goals you have in mind.



Given that, what do you propose should be the design for entering and storing numbers in newLISP?  Curious.
(λx. x x) (λx. x x)

Lutz

#7
I wrote the following while rickyboy was posting at the same time.

Many of these problems will disappear when longer numbers are allowed during parsing. In the following example, the number with a decimal point is no longer broken up in 10.6.2:



; correct in all versions
> (div 1234567890123456789012345678901234567890     123456)
1.0000063910409e+34

; in 10.6.0/1 the number splits at the decimal point, so the second argument becomes 0
> (div 1234567890123456789012345678901234567890.0     123456)
inf

; in 10.6.2 same result when the decimal point is present
> (div 1234567890123456789012345678901234567890.0     123456)
1.0000063910409e+34
>


When using the integer division operator '/', the following happens:



; correct in all versions
>  (/   1234567890123456789012345678901234567890     123456)
10000063910409026608770296128995225L

; in 10.6.1 number splits
> (/   1234567890123456789012345678901234567890.0   123456)
ERR: division by zero in function /

; in 10.6.2 expected result after integer overflow
>  (/   1234567890123456789012345678901234567890.0     123456)
74709791641190


In both cases the integer operator converts the first operand into the biggest 64-bit signed integer, 9223372036854775807, and then does the division.

In 10.6.0/1 the number splits and the second operand is 0, causing the 'division by zero' error. In 10.6.2 the correct result, 74709791641190, is displayed.
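
The arithmetic can be checked with a short Python sketch. The clamping function is an assumption modeled on the behavior described in this post (a float too large for 64 bits becomes the largest signed 64-bit integer); `to_int64_saturating` is a hypothetical name, not a newLISP function:

```python
INT64_MAX = 2**63 - 1   # 9223372036854775807
INT64_MIN = -2**63


def to_int64_saturating(x):
    """Convert a float to an integer, clamping to the signed 64-bit range."""
    if x >= INT64_MAX:
        return INT64_MAX
    if x <= INT64_MIN:
        return INT64_MIN
    return int(x)


first = to_int64_saturating(1234567890123456789012345678901234567890.0)
print(first)            # 9223372036854775807
print(first // 123456)  # 74709791641190
```

The quotient agrees with the 10.6.2 output above, and the clamped value is the same one that (int ...) returned in the first post.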



The following example:



; in all versions mod forces conversion to floats
> (mod 10000000000000000000000000000000000000000000000 10)
8
; 10.6.0/1 big number gets split
> (mod 10000000000000000000000000000000000000000000000.0 10)
nan

; 10.6.2 rounding error because of limited precision float conversion
> (mod 10000000000000000000000000000000000000000000000.0 10)
8
>


The floating-point operator will transform the first operand into a float with at most 16 digits of precision, and the 8 is a rounding error from the conversion to IEEE 754 double floats. The same happens in other languages, e.g. Python:



; in Python same rounding error because of limited precision float conversion
>>> 10000000000000000000000000000000000000000000000.0 % 10
8.0


instead of the correct result, 0.
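
To see where a nonzero remainder comes from, the stored double can be converted back to an exact integer. The sketch below uses Python and 1e46 as an example power of ten of similar size to the literal above; the variable names are illustrative only:

```python
x = 1e46          # a large power of ten, stored as an IEEE 754 double
exact = 10**46    # the mathematically intended value, as an exact integer

stored = int(x)   # the exact integer value the double actually holds

print(stored == exact)   # False: 10**46 is not representable as a double
print(stored - exact)    # the size of the representation error
print(stored % 10)       # last decimal digit of the stored value
print(x % 10)            # the float remainder agrees with that digit
```

The remainder operation itself is exact here; the discrepancy is introduced earlier, when the decimal literal is rounded to the nearest double.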



Some final comments:

====================



All of these examples are constructed test cases using number literals/constants that have practically never occurred in real code, except in a post a few days ago where a Pi constant with about 50 digits after the decimal point was used. Except for that code, I have never seen this kind of problem in the real world.



newLISP chooses to keep integer and float arithmetic apart. Integer and float operators implicitly convert their arguments, causing integer overflows or floating-point conversion rounding errors. newLISP also separates bigint from normal 64-bit int arithmetic. Both of these separations are done on purpose and are convenient when using newLISP in embedded systems, when interacting with hardware registers, or when doing integer arithmetic in other domains that are inherently of integer type.



Many of the problems will disappear when higher-precision decimal and float numbers are allowed while parsing, but the fact that these numbers will be converted to IEEE 754 double floats will remain and cause some of the effects shown. These effects can be shown in any programming language using floats.



Ps: about the 32 decimal digit limit: The 32-length limit is not tested until the first non-digit character. I misspoke in my previous post.



Ps: about (div 2 0) => inf, this is part of IEEE 754 compliance. See also the file newlisp-10.x.x/qa-specific-tests/qa-float, which tests many of the IEEE compliance features.

psilwen

#8
Quote from: "rickyboy"
Are you suggesting that newLISP allow the user to enter a number of arbitrary length (as input) and then parse and store the internal representation of the number (as an exact representation of the input, i.e. arbitrarily large)?



AFAIK, no programming language allows this.  There are always limits.


I certainly know that.



What I mean is



> newlisp
newLISP v.10.6.2 32-bit on Win32 IPv4/6 UTF-8, options: newlisp -h

> 12345678901234567890123456789012345678909999999999999999999999999999999999999
9999999999999999999999999999999999999999999999999999999999999999999999999999999
9999999999999999999999999999999999999999999999999999999999999999999999999999999
999999999999999999.123456789
1.23456789012346e+255
123456789

> (string 123456789012345678901234567890123456789099999999999999999999999999999
9999999999999999999999999999999999999999999999999999999999999999999999999999999
9999999999999999999999999999999999999999999999999999999999999999999999999999999
99999999999999999999999999.123456789)
"1.23456789012346e+255123456789"


This result is very weird and counterintuitive.

Users never expect a completely legal single decimal number (consisting only of digits and a single dot) to be parsed into multiple numbers.

Extending the limit from 32 to 255 just covers up the issue; it does not really solve it.



In Python



>>> a = 111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999
        111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999
        111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999
        111111111122222222223333333333444444444.987654321
>>> a
1.1111111112222222e+308

>>> str(a)
'1.11111111122e+308'


When the number exceeds the limit



>>> b = 111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999
        111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999
        111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999
        1111111111222222222233333333334444444445.987654321
>>> b
inf


it evaluates to inf, not to the two values 1.1111111112222222e+308 and 987654321.



AND



>>> str(b)
'inf'


it evaluates to 'inf', not to the string '1.1111111112222222e+308987654321'.



I think Python's behavior is reasonable and newLISP's is not.

This is my opinion.

Thank you for your attention.
(reverse "newlisp")

Lutz

#9
Overengineering (http://en.wikipedia.org/wiki/Overengineering) and costly for code size and speed. newLISP would be slower and much bigger if designed with this philosophy. I doubt the splitting of large floats would ever occur with a 255(*) size limit.



(*) now 1000: http://www.newlisp.org/downloads/development/inprogress/CHANGES-10.6.2.txt