qa-float crash

Started by newdep, September 22, 2009, 03:17:55 AM


newdep

Hi Lutz,



On OS/2 with newLISP 10.1.5, I'm getting a crash on return from this code inside qa-float ->


(set 'result '())
(set 'u 1.0)
;(while (> u 0.0) (set 'u (mul u 0.5)) (push u result))
 (while (> u 0.0) (set 'u (mul u 0.5)) (println u))


I'm not sure whether it should crash or not; I don't have any other OS at the moment to compare it with..


..
..
..
1.822780505e-304
9.113902524e-305
4.556951262e-305
2.278475631e-305
1.139237816e-305
5.696189078e-306
2.848094539e-306
1.424047269e-306
7.120236347e-307
3.560118174e-307
1.780059087e-307
8.900295434e-308
4.450147717e-308
2.225073859e-308

Killed by SIGFPE
pid=0x0097 ppid=0x0040 tid=0x0001 slot=0x006e pri=0x0200 mc=0x0001
E:PROGNLNEWLISP-10.1.5NEWLISP.EXE
NEWLISP 0:00011eb9
cs:eip=005b:00021eb9      ss:esp=0053:0017faf0      ebp=0017fb28
 ds=0053      es=0053      fs=150b      gs=0000     efl=00002297
eax=005503a0 ebx=005503a0 ecx=00550a40 edx=3fe00000 edi=0017fb08 esi=00000003
Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.
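
For comparison, the same halving loop can be sketched in plain C. This is only a sketch, assuming IEEE 754 doubles and a platform that leaves floating-point exceptions masked (the usual default); `count_halvings` and `struct halving_counts` are names made up for this illustration. Under those assumptions the loop passes DBL_MIN (2.2250738585072014e-308, the last value printed above), continues through the subnormal range, and only then reaches 0.0.

```c
#include <math.h>

/* Hypothetical helper type for this sketch: how many values of each
   kind the halving loop produced before u reached 0.0. */
struct halving_counts {
    int normal;      /* 2^-1 down to DBL_MIN (2^-1022) */
    int subnormal;   /* 2^-1023 down to 2^-1074 */
};

/* Same loop as in qa-float: start at 1.0 and halve until the value
   underflows to 0.0. Assumes IEEE 754 doubles with underflow masked. */
struct halving_counts count_halvings(void)
{
    struct halving_counts counts = { 0, 0 };
    /* volatile forces every result to be stored as a 64-bit double,
       so x87 extended precision cannot hide the underflow. */
    volatile double u = 1.0;

    while (u > 0.0) {
        u = u * 0.5;
        switch (fpclassify(u)) {
        case FP_NORMAL:    counts.normal++;    break;
        case FP_SUBNORMAL: counts.subnormal++; break;
        default:           break;  /* FP_ZERO ends the loop */
        }
    }
    return counts;
}
```

With gradual underflow this gives 1022 normal and 52 subnormal values. The SIGFPE above arrives at the step that would produce the first subnormal.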

-- (define? (Cornflakes))

Lutz

#1
This is a rare underflow condition that OS/2 handles by raising an exception. You could set up a signal handler for SIGFPE in the function setupAllSignals(), around line 310 in newlisp.c, or you could set it up in newLISP itself in the startup code. All other OSs handle subnormals without trapping: they return the subnormal result and eventually 0.0.



This has always been the case in OS/2, but with qa-float we are forcing it to show up for the first time.
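
A signal handler that lets the program carry on after a SIGFPE could look roughly like the sketch below. It is not the newlisp.c code: it assumes a POSIX environment with sigsetjmp()/siglongjmp() (whether the OS/2 GCC port provides them is not confirmed here), and `run_guarded`, `on_sigfpe`, `fault_demo` and `clean_demo` are hypothetical names. raise(SIGFPE) stands in for a real arithmetic trap.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf fpe_env;   /* where to resume after a trapped SIGFPE */

/* Handler: jump back to the sigsetjmp() call instead of returning
   into the faulting instruction. */
static void on_sigfpe(int sig)
{
    (void)sig;
    siglongjmp(fpe_env, 1);
}

/* Runs fn() with SIGFPE trapped. Returns 0 if fn() completed and 1 if
   it was interrupted by SIGFPE. Not reentrant: one global jump buffer. */
int run_guarded(void (*fn)(void))
{
    void (*previous)(int) = signal(SIGFPE, on_sigfpe);
    int trapped;

    /* Passing 1 saves the signal mask, so SIGFPE is unblocked again
       after the jump out of the handler. */
    if (sigsetjmp(fpe_env, 1) == 0) {
        fn();
        trapped = 0;
    } else {
        trapped = 1;
    }
    signal(SIGFPE, previous);   /* restore the earlier disposition */
    return trapped;
}

/* Stand-ins for real arithmetic, used only to exercise the sketch. */
void fault_demo(void) { raise(SIGFPE); }
void clean_demo(void) { }
```

Calling run_guarded(fault_demo) returns 1 and the program keeps running. A real hardware trap may also leave exception state in the FPU that has to be cleared before continuing; that part is platform-specific and not shown.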

newdep

#2
Aha!





This is what I get now, after the SIGFPE fix..

A far better error report ;-)


[E:prognlnewlisp-10.1.5].newlisp qa-float
SYS1808:
The process has stopped.  The software diagnostic
code (exception code) is  0097.




Here is the simple add-on for the newlisp.c code at line 310 ->




#ifndef WIN_32

#if defined(SOLARIS) || defined(TRU64) || defined(AIX)
setupSignalHandler(SIGALRM, sigalrm_handler);
setupSignalHandler(SIGVTALRM, sigalrm_handler);
setupSignalHandler(SIGPROF, sigalrm_handler);
setupSignalHandler(SIGPIPE, sigpipe_handler);
setupSignalHandler(SIGCHLD, sigchld_handler);
#else
setupSignalHandler(SIGALRM, signal_handler);
setupSignalHandler(SIGVTALRM, signal_handler);
setupSignalHandler(SIGPROF, signal_handler);
setupSignalHandler(SIGPIPE, signal_handler);
setupSignalHandler(SIGCHLD, signal_handler);
#ifdef OS2
setupSignalHandler(SIGFPE, signal_handler);
#endif
#endif

#endif
}




Btw.. I'll put that in for the (import ...) too, that one crashes far too often here ;-)
-- (define? (Cornflakes))

newdep

#3
Hmm, actually something is fishy with the NaNs in the OS/2 code..

Even a simple (sqrt -1) takes ages and then crashes..

I'll have a closer look at the NaN handling in the code; I did not actually check that..

I'm not sure if it's the Pentium I'm working on, or the compiler, or the code ;-)
-- (define? (Cornflakes))

Lutz

#4
I assume it is the FP library in OS/2, but check Windows or Linux on this machine, if you can. Both should be fine on qa-float, unless it's one of those (very old) Pentiums with problems in the FP processing units.

newdep

#5
Now this is getting interesting.. It's good to read that a bunch of hardware coders and compiler writers don't even care about precision ;-) But that's a different story..

Searching the internet for a faulty P4, I did indeed run into stories from back in 1995. This P4 is from around 1998, so I expect it to have a bug anyway, because it is one from a low-cost Compaq mainstream line where the "Intel Inside" sticker is bigger than the machine itself..

Looking at the GCC compiler optimizations, I added -march=pentium4 together with -O2. This does indeed help a bit, but it does not yet fully cover the DBL_MIN value (DBL_MIN 2.2250738585072014e-308), which is what is bothering me..



So this is what I did on the command line, a max and a min.. The only place where it crashes is at -308, not at the max..




> (mul 2e308 2)
inf
> (mul 2e+308 2)
inf
> (div 2e+308 2)
inf
> (div 2e308 2)
inf
> (div 2e-308 2)
SYS1808:
The process has stopped.  The software diagnostic
code (exception code) is  009A.
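
One way to see why only the small value trips: both test inputs already lie outside the normal double range before any arithmetic happens. The sketch below assumes that the literals are converted the way C's strtod() does it (not verified against the newLISP reader), and `classify_literal` and `classify_quotient` are hypothetical helper names.

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal literal with strtod() and
   report its IEEE 754 class (FP_NORMAL, FP_SUBNORMAL, FP_INFINITE, ...). */
int classify_literal(const char *text)
{
    return fpclassify(strtod(text, NULL));
}

/* Class of the quotient a / b; volatile keeps the division at run time
   and forces the result into a 64-bit double. */
int classify_quotient(double a, double b)
{
    volatile double q = a / b;
    return fpclassify(q);
}
```

Here classify_literal("2e308") is FP_INFINITE, because 2e308 exceeds DBL_MAX (about 1.797e308), so the mul and div on the "max" side only ever see inf. classify_literal("2e-308") is FP_SUBNORMAL, because it is below DBL_MIN, and classify_quotient(2e-308, 2.0) is FP_SUBNORMAL as well, which matches where the trap shows up.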




I'm digging deeper...

PS: I also compiled a small C program with GCC, and that crashed too.. So it's now time to dig into this GCC port..

PPS: And why is there a difference of 6 all the time between the result and the original??


> (div (mul 4195835e128 3145727e128 ) 3145727e128 )
4.195835e+134

Is 4195835e128 equal to 4.195835e+134? Six more zeros? Is this a rounding error?

> (div (mul 4195835e134 3145727e134 ) 3145727e134 )
4.195835e+140

..

> (div (mul 4195835e140 3145727e140) 3145727e140 )
4.195835e+146

> (div (mul 4195835e146 3145727e146) 3145727e146 )
4.195835e+152

> (div (mul 4195835e152 3145727e152) 3145727e152 )


SYS1808:
The process has stopped.  The software diagnostic
code (exception code) is  0098.
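
The arithmetic behind this series can be checked in C. 4195835e128 is 4.195835 x 10^6 x 10^128, so it prints as 4.195835e+134; no rounding is involved. In the last step the intermediate product is about 1.32e317, which is larger than DBL_MAX (about 1.797e308). The sketch below assumes IEEE 754 doubles with exceptions masked; `mul_then_div` and `product_overflows` are hypothetical names.

```c
#include <math.h>

/* Hypothetical helper mirroring (div (mul a b) b). The volatile
   intermediate forces the product into a 64-bit double; without it an
   x87 build may keep extended precision and hide the overflow. */
double mul_then_div(double a, double b)
{
    volatile double product = a * b;
    return product / b;
}

/* True when two finite operands give a product beyond DBL_MAX. */
int product_overflows(double a, double b)
{
    volatile double product = a * b;
    return isfinite(a) && isfinite(b) && isinf(product);
}
```

With the e146 operands the product is about 1.32e305 and still fits. With the e152 operands it overflows, so a platform that traps on overflow stops at exactly that line, while a platform with masked exceptions returns inf.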




PPPS: I did some tests for the FDIV P4 bug, but that's not my P4.. So let's assume Compaq did sell a working "Intel Inside" for a second...
-- (define? (Cornflakes))

newdep

#6
Ah yes, OK, the 6 zeros are obvious..

(not when you've been staring at it for a few hours already ;-)

I now have two possibilities.. Either it's GCC that has the issue, or the Klibc I'm working against...

...still digging...
-- (define? (Cornflakes))

newdep

#7
I dug into GCC (the OS/2 port) and can't make any decent bread from it.. It's a spaghetti of #defines..



Anyway.. in a simple test in plain C, both (sqrt -1) and (div 0 0) trigger the SIGFPE. I'm unable to get those to return NaN..



(div 1 0) returns inf.

(div 0 0) crashes (or, with the adjustment, traps and then exits).
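
For reference, this is roughly what the same divisions give in plain C on a platform that keeps floating-point exceptions masked. It is a sketch under that assumption; `runtime_div` and `describe` are hypothetical names.

```c
#include <math.h>

/* Divide at run time; volatile operands stop the compiler from folding
   the expression at compile time. Assumes IEEE 754 arithmetic with
   floating-point exceptions masked, so nothing traps. */
double runtime_div(double numerator, double denominator)
{
    volatile double n = numerator;
    volatile double d = denominator;
    return n / d;
}

/* Hypothetical helper: a short label for the kind of result. */
const char *describe(double value)
{
    if (isnan(value)) return "nan";
    if (isinf(value)) return value > 0 ? "inf" : "-inf";
    return "finite";
}
```

Under those assumptions describe(runtime_div(1.0, 0.0)) is "inf" and describe(runtime_div(0.0, 0.0)) is "nan". A trap on 0/0 would point to the invalid-operation exception being unmasked in this environment.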



Where exactly in the newLISP code do I need to make the adjustment to make it return "nan"? I tried several places.. no luck without a trap..

It would be fine for me to get a return of "nan" via the trap, but I would like to keep newLISP running from that point on... not sure if that's possible..



Norman
-- (define? (Cornflakes))

Lutz

#8
Division by zero is caught by newLISP in file nl-math.c. For integers it is the arithmetikOp() function and for floats the floatOp() function.

newdep

#9
OK, it's not that simple, and it seems that NaN is simply not defined correctly in the GCC of OS/2.

For now I use the generic error return MATH_ERR, which is an official error from newLISP and does not alter any code, and MODULO still uses it too ->





> (div 0 0)



ERR: division by zero in function div





Now I still need to fix



(div 0)



(sqrt -1)



and a few others..
-- (define? (Cornflakes))

Lutz

#10
I wonder what the following C program produces in OS2:


#include <stdio.h>

#ifdef __BIG_ENDIAN__
#define __nan_bytes     { 0x7f, 0xf8, 0, 0, 0, 0, 0, 0 }
#endif

#ifdef __LITTLE_ENDIAN__
#define __nan_bytes     { 0, 0, 0, 0, 0, 0, 0xf8, 0x7f }
#endif

int main(int argc, char * argv[])
{
double dFloat;
char bytes[8] = __nan_bytes;

dFloat = *(double *)bytes;

printf("NaN = %lf\n", dFloat);
}


produces:


NaN = nan

on Mac OS X



The bit pattern used is derived from this:


> (unpack "bbbbbbbb" (pack "<lf" (sqrt -1)))
(0 0 0 0 0 0 248 255)
> (unpack "bbbbbbbb" (pack ">lf" (sqrt -1)))
(255 248 0 0 0 0 0 0)
>
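
An endian-independent way to build the same value is to go through a 64-bit integer instead of a byte array. This is only a sketch, assuming IEEE 754 doubles and that doubles and 64-bit integers share one byte order on the target; `QUIET_NAN_BITS`, `double_from_bits` and `bits_from_double` are hypothetical names.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Bit pattern of a quiet NaN: exponent all ones, top fraction bit set.
   These are the bytes 7f f8 00 00 00 00 00 00 from the test program,
   read most significant byte first. */
#define QUIET_NAN_BITS UINT64_C(0x7ff8000000000000)

/* Reinterpret 64 bits as a double. memcpy avoids the alignment and
   aliasing problems of casting a char pointer to a double pointer. */
double double_from_bits(uint64_t bits)
{
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* The reverse direction: the raw 64 bits of a double. */
uint64_t bits_from_double(double value)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

isnan(double_from_bits(QUIET_NAN_BITS)) holds whatever the byte order is. The newLISP output above shows 255 rather than 127 in the top byte because the sign bit is set there; that is still a NaN.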

newdep

#11
;-) That's what I thought about too last night..

Good that you bring that up, actually.. I'll try that tonight...

Actually, the raw (sqrt -1) in C crashes too, so I can't test anything with a (sqrt -1) in it...

But I'll test that Indian code..
-- (define? (Cornflakes))

newdep

#12
I did adjust __xxx_ENDIAN__ to __xxx_ENDIAN

and this is what it returns..


#ifdef __BIG_ENDIAN
#define __nan_bytes     { 0x7f, 0xf8, 0, 0, 0, 0, 0, 0 }
#endif

#ifdef __LITTLE_ENDIAN
#define __nan_bytes     { 0, 0, 0, 0, 0, 0, 0xf8, 0x7f }
#endif

int main(int argc, char * argv[])
{
double dFloat;
char bytes[8] = __nan_bytes;

dFloat = *(double *)bytes;

printf("NaN = %lf\n", dFloat);
}
[E:PROGNLnewlisp-10.1.5]nan
NaN = nan




Using BIG_ENDIAN it returns -> nan

Using LITTLE_ENDIAN it returns -> NaN = 0.000000



Now because you previously mentioned the IEEE 754 it got me thinking...
-- (define? (Cornflakes))

newdep

#13
Résumé: "totally utterly flabbergasted".. I can't find it..
-- (define? (Cornflakes))

newdep

#14
Let's see if I can adjust nl-math.c using

http://www.gnu.org/s/libc/manual/html_node/Infinity-and-NaN.html

and then explicitly test with isfinite() before returning..
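
A guard of that kind might look like the sketch below. It is not code from nl-math.c; `checked_div` and `enum div_status` are hypothetical names. It checks the divisor before dividing, since on this platform the division itself may trap, and then tests the quotient with isfinite().

```c
#include <math.h>

enum div_status { DIV_OK = 0, DIV_BY_ZERO, DIV_NOT_FINITE };

/* Hypothetical guard, not code from nl-math.c. Rejects a zero divisor
   before dividing and checks the quotient with isfinite() before
   storing it in *result. */
enum div_status checked_div(double a, double b, double *result)
{
    double quotient;

    if (b == 0.0)
        return DIV_BY_ZERO;      /* covers 0/0 and x/0 without dividing */
    if (!isfinite(a) || !isfinite(b))
        return DIV_NOT_FINITE;   /* inf or nan operand */

    quotient = a / b;
    if (!isfinite(quotient))
        return DIV_NOT_FINITE;   /* overflowed to inf */

    *result = quotient;
    return DIV_OK;
}
```

The caller can map DIV_BY_ZERO to an error such as MATH_ERR. Note that this does not cover the underflow trap from the start of the thread: a quotient that lands in the subnormal range is still finite, so it would have to be handled separately.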
-- (define? (Cornflakes))