qa-float crash

Started by newdep, September 22, 2009, 03:17:55 AM


newdep

#15
Ah, fixed it!

Lutz, I sent you a PM on this... I'll post the solution in here once you have checked it.

Look Ma... no hands!

> (/ 0 0)
ERR: division by zero in function /
> (div 0)
ERR: division by zero in function div
> (div 0 0)
nan
> (log 0)
-inf
> (sqrt -1)
nan
> (div 0)
inf
>
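For comparison, this is what plain C gives for the same divisions when floating-point traps are masked. Masked traps are the usual default, but that is assumed here rather than checked for OS2. div_result is a hypothetical helper for illustration, not newLISP code:

```c
#include <math.h>   /* isnan, isinf, signbit: macros, no libm call needed */

/* Hypothetical helper: divide a by b and describe the result the way the
   newLISP console prints it. With FP traps masked, IEEE 754 defines 0/0 as
   NaN and a nonzero x divided by +0.0 as an infinity with the sign of x,
   so no SIGFPE is raised. */
const char *div_result(double a, double b)
{
    double q = a / b;

    if (isnan(q))
        return "nan";
    if (isinf(q))
        return signbit(q) ? "-inf" : "inf";
    return "finite";
}
```

Called as div_result(0.0, 0.0) it gives "nan" (like (div 0 0)), div_result(1.0, 0.0) gives "inf" (like (div 0), which is 1/0), and div_result(-1.0, 0.0) gives "-inf".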
-- (define? (Cornflakes))

Lutz

#16
Yes, that seems to be solved. This will avoid the compiler warning:


#ifdef OS2
    case SIGFPE:
        errorProc(ERR_MATH);
        break;
#endif


So it runs qa-float (the one in 10.1.6 that checks signed inf) correctly?

I will either make a development release for 10.1.6, or perhaps wait until the next release update and just post the affected files in the development directory.

newdep

#17
It seems that the very first time a NaN or Inf occurs, newLISP returns the "ERR: division by zero" message. When you run the same function again, it returns the NaN or Inf.

So somewhere the ERR: is still getting in the way...

qa-float now stops at the ERR:


* fresh startup *

> (sqrt -1)
ERR: division by zero in function sqrt
> (sqrt -1)
nan

* fresh startup *

> (div 0)
ERR: division by zero in function div
> (div 0)
inf
>


Lutz

#18
Do you have this at line 343, in function setupAllSignals()?


#ifdef OS2
setupSignalHandler(SIGFPE, signal_handler);
#endif

Lutz

#19
... or perhaps just take "errorProc(...)" out and let the handler catch the signal and do nothing:


#ifdef OS2
    case SIGFPE:
        break;
#endif


at line 412.

newdep

#20
No, that doesn't work. I read somewhere that to actually catch SIGFPE you need a longjmp or a separate function... I think that is what currently happens through the errorProc call.

Calling PrintErrorMessage(...) directly on its own doesn't work either, and leaving the case empty with just a break causes the real SIGFPE again. So I need to re-route the signal and clear the ERR before it displays the NaN...





* added *



What does the return(nilCell); do in errorProcAll? I think I need that for SIGFPE too... or a signal reset?

newdep

#21
I just wanted to see what actually happens with the signals.

The first time newLISP starts and sees a division by zero, it reports the ERR:, and I get trap number 8 (which is SIGFPE). The second time there is NO signal, just the "inf" directly. Is this just dumb luck, or is there really something in between? The second time the result does not come through SIGFPE, otherwise I would have seen the signal message again... Mmmm





newLisp v 10.1.6 ........

> (div 0)
Signal = 8

ERR: division by zero in function div

> (div 0)
inf
>

newdep

#22
I'll try a different signal handler tomorrow.

The GNU C Library manual says this about SIGFPE:




Quote:

Macro: int SIGFPE



    The SIGFPE signal reports a fatal arithmetic error. Although the name is derived from "floating-point exception", this signal actually covers all arithmetic errors, including division by zero and overflow. If a program stores integer data in a location which is then used in a floating-point operation, this often causes an "invalid operation" exception, because the processor cannot recognize the data as a floating-point number. Actual floating-point exceptions are a complicated subject because there are many types of exceptions with subtly different meanings, and the SIGFPE signal doesn't distinguish between them. The IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985 and ANSI/IEEE Std 854-1987) defines various floating-point exceptions and requires conforming computer systems to report their occurrences. However, this standard does not specify how the exceptions are reported, or what kinds of handling and control the operating system can offer to the programmer.



BSD systems provide the SIGFPE handler with an extra argument that distinguishes various causes of the exception. In order to access this argument, you must define the handler to accept two arguments, which means you must cast it to a one-argument function type in order to establish the handler. The GNU library does provide this extra argument, but the value is meaningful only on operating systems that provide the information (BSD systems and GNU systems).



FPE_INTOVF_TRAP

    Integer overflow (impossible in a C program unless you enable overflow trapping in a hardware-specific fashion).

FPE_INTDIV_TRAP

    Integer division by zero.

FPE_SUBRNG_TRAP

    Subscript-range (something that C programs never check for).

FPE_FLTOVF_TRAP

    Floating overflow trap.

FPE_FLTDIV_TRAP

    Floating/decimal division by zero.

FPE_FLTUND_TRAP

    Floating underflow trap. (Trapping on floating underflow is not normally enabled.)

FPE_DECOVF_TRAP

    Decimal overflow trap. (Only a few machines have decimal arithmetic and C never uses it.)

newdep

#23
Hi Lutz,

This works out of the box on my OS2 machine, no strange things at all.

In newLISP I still get the ERR:... the very first time the SIGFPE occurs, and only from the second time on the nan or inf...

Perhaps you know where the "ERR:" mix-up could be in newLISP? Because I can't find it... ;-)






#include <stdio.h>
#include <float.h>
#include <signal.h>
#include <math.h>
#include <setjmp.h>

/* testing NaN and Inf return */

/* saved context to jump back to from the signal handler */
jmp_buf errorJump;
int errorReg = 0;

void signal_handler(int sig)
{
    switch(sig)
    {
    case SIGFPE:
        /* signal(SIGFPE,SIG_DFL); */
        /* printf is not async-signal-safe; tolerable only in a test like this */
        printf("%s", "SIGFPE!\n");
        /* errorReg is 0 here, and longjmp(env, 0) makes setjmp() return 1 */
        longjmp(errorJump,errorReg);
        break;
    default: return;
    }
}

int main(void)
{
    double nfloat;

    /* save context: after the longjmp, execution resumes here and
       everything below runs a second time */
    setjmp(errorJump);

    /* nan-inf go through sigfpe */
    signal(SIGFPE, signal_handler);

    nfloat = (sqrt (-1));
    printf("sqrt=%f\n", nfloat );

    nfloat = (log (0));
    printf("log=%f\n", nfloat );

    nfloat /= 0;
    printf("div=%f\n",  nfloat );

    return 0;
}




[E:PROGNLnewlisp-10.1.6]f
SIGFPE!
sqrt=nan
log=-inf
div=-inf






Here I removed the errorProc function and only used the longjmp.

That returns nothing the first time and then the nan. That is also not like my example; it seems there is still some code interfering with the results inside newLISP? But I can't find it...
> (sqrt -1)
> (sqrt -1)
nan
>

newdep

#24
I found the double entry that causes the problem...

It's inside the errorReg check...

Hmm, actually it's a setjmp/longjmp issue, where the int variable is 0 or 1 because of the number of jumps now in use.




if((errorReg = setjmp(errorJump)) != 0)
    {
    printf("ErrorReg2=%d\n", errorReg);

    if(errorReg && (errorEvent != nilSymbol) )
        executeSymbol(errorEvent, NULL, NULL);
    else  exit(-1);

    goto AFTER_ERROR_ENTRY;
    }


I first thought there might be a difference in gcc or OS2 regarding setjmp behaviour, so I tested with this -> http://www-personal.umich.edu/~williams/archive/computation/setjmp-fpmode.html

But the result is identical on both my Linux and OS2 machines.



A closer look shows this flow inside newLISP:

First setjmp = 0 (= errorReg) in the function above, then the (div 0) occurs.

The SIGFPE case in signal_handler sees errorReg = 0 (the initial value).

Because there is a longjmp, the next setjmp gets a 1 (from the longjmp).

So the errorReg check does "goto AFTER_ERROR_ENTRY" with a new setjmp, but that one of course returns 1 (due to the last longjmp)...

At this point signal_handler and the jmp_buf are both at 1, and the stack is the same.





OK... I'm looking inside the code now for a fix, because these "saved stacks" need to be in sync ;-)

newdep

#25
I could cheat by putting a signal trigger like sqrt(-1); inside the C code.

But that's not what I would like to see, and I'm also not sure whether the stacks would be in sync...

newdep

#26
I don't see any workable way at the moment without cheating on the SIGFPE... perhaps you have an extra clue?



This is how the SIGFPE adjustment to 10.1.6 looks now:





This is inside setupAllSignals():
#ifdef OS2
setupSignalHandler(SIGFPE, signal_handler);
/* force a SIGFPE trigger when newlisp starts */
/* this is to activate the NaN Inf returns!   */
(sqrt (-1));
/**********************************************/
#endif


This is inside signal_handler():
#ifdef OS2
    /* SIGFPE must be forced for a NaN Inf */
    /* the longjmp returns 1 to setjmp when set */
    case SIGFPE:
        longjmp(errorJump,errorReg);
        break;
#endif




The output of qa-float is this:
operation on NaN result in NaN                
-----------------------------------------------
                     (NaN? (mul 1 aNan)) => true
                     (NaN? (div 1 aNan)) => true
                     (NaN? (add 1 aNan)) => true
                     (NaN? (sub 1 aNan)) => true
                       (NaN? (sin aNan)) => true
                       (NaN? (cos aNan)) => true
                       (NaN? (tan aNan)) => true
                      (NaN? (atan aNan)) => true

comparison with NaN is always nil              
-----------------------------------------------
                        (not (< 1 aNan)) => true
                        (not (> 1 aNan)) => true
                       (not (>= 1 aNan)) => true
                       (not (<= 1 aNan)) => true
                     (not (= aNan aNan)) => true

NaN is not equal to itself                    
-----------------------------------------------
                     (not (= aNan aNan)) => true

integer operations assume NaN as 0            
-----------------------------------------------
                        (= (- 1 aNan) 1) => true
                        (= (+ 1 aNan) 1) => true
                        (= (* 1 aNan) 0) => true
         (not (catch (/ 1 aNan) 'error)) => true
                         (= (>> aNan) 0) => true
                         (= (<< aNan) 0) => true

integer operations assume inf as max-int      
-----------------------------------------------
      (= (* 1 aInf) 9223372036854775807) => true
      (= (- aInf 1) 9223372036854775806) => true
     (= (+ aInf 1) -9223372036854775808) => true

FP division by inf results in 0                
-----------------------------------------------
                        (= (/ 1 aInf) 0) => true
                      (= (div 1 aInf) 0) => true

inf specials                                  
-----------------------------------------------
                           (= aInf aInf) => true
                  (NaN? (sub aInf aInf)) => true

retain sign of -0.0                            
-----------------------------------------------
        (= (set 'tiny (div -1 aInf)) -0) => true
                      (= (sqrt tiny) -0) => true

inf is signed too                              
-----------------------------------------------
                  (= aNegInf (div -1 0)) => true
                  (!= aNegInf (div 1 0)) => true

mod with 0 divisor is NaN                      
-----------------------------------------------
                       (NaN? (mod 10 0)) => true

% with 0 divisor throws error                  
-----------------------------------------------
           (not (catch (% 10 0) 'error)) => true

support of subnormals: (0 4.940656458e-324) => (0 4.940656458e-324)
machine epsilon: 1.110223025e-16 => 1.110223025e-16


newdep

#27
I forgot the extra setjmp, so I added that too: the extra errorReg = setjmp(errorJump); call.






if((errorReg = setjmp(errorJump)) != 0)
    {
    if(errorReg && (errorEvent != nilSymbol) )
        executeSymbol(errorEvent, NULL, NULL);
    else exit(-1);
    goto AFTER_ERROR_ENTRY;
    }

errorReg = setjmp(errorJump);
setupAllSignals();


Lutz

#28
Quote:
#ifdef OS2
     /* SIGFPE must be forced for a NaN Inf */
   /* the longjmp returns 1 to setjmp when set */
   case SIGFPE:
      longjmp(errorJump,errorReg);
      break;
#endif


setjmp() will only return 1 if errorReg was 1, but on program start and after a reset errorReg is set to 0, and I think 0 is what setjmp() returns when doing the longjmp(). If it made setjmp() return 1, we would see "Not enough memory" reported as the error, which is defined as 1.



Can you try this?


#ifdef OS2
   case SIGFPE:
      longjmp(errorJump,0);
      break;
#endif


I believe it will also work.
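For reference, the C standard specifies that a direct call to setjmp() returns 0, and that longjmp(env, val) makes setjmp() return val, except that a val of 0 is turned into 1. So longjmp(errorJump, 0) and longjmp(errorJump, 1) look the same to the code after setjmp(). A minimal sketch; setjmp_result_for and jump_with are hypothetical names:

```c
#include <setjmp.h>

static jmp_buf demo_env;

/* Jump back to the setjmp() in setjmp_result_for() with the given value. */
static void jump_with(int val)
{
    longjmp(demo_env, val);
}

/* Hypothetical helper: returns what setjmp() reports after
   longjmp(demo_env, val), for val in 0..2. */
int setjmp_result_for(int val)
{
    switch (setjmp(demo_env)) {  /* a context the C standard permits for setjmp */
    case 0:                      /* direct call: always 0 */
        jump_with(val);
        return -1;               /* not reached */
    case 1:                      /* reached by longjmp(demo_env, 0) and (demo_env, 1) */
        return 1;
    case 2:
        return 2;
    default:
        return 3;
    }
}
```

setjmp_result_for(0) and setjmp_result_for(1) both return 1, while setjmp_result_for(2) returns 2. Strictly speaking, the standard only allows setjmp() as a whole controlling expression, compared against an integer constant, negated with !, or as a standalone statement; assigning its result, as in errorReg = setjmp(errorJump), is outside those contexts, although common compilers accept it.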

newdep

#29
No, that results in the same "double" effect.

Changing it to 1 is of no use either; there is always a mismatch in the jmp_buf content.



How about sigsetjmp(), siglongjmp() and sigjmp_buf?



This is how I see the flow in newLISP now, with SIGFPE involved. Correct me if I'm wrong ;-) it only helps in finding the itch...


main()
   |
errorReg = 0
setjmp(errorJump)
   |
setupAllSignals init (NOT SIGFPE, because it's only triggered on an exception)
   |
(sqrt -1)  (on the newLISP console)
   |
SIGFPE trigger with errorReg = 0 (from the first fresh init)
longjmp(errorJump,errorReg)  (initial stack with errorReg = 0)
   |
errorReg = 1 (is always 1 when returning from longjmp!)
setjmp(errorJump) != 0
(no result on the console for (sqrt -1), because errorReg is now 1, which is a NEW stack)
   |
errorReg = setjmp(errorJump)  (is now 1 because of longjmp)
setupAllSignals (no trigger for SIGFPE)
   |
(sqrt -1)
   |
SIGFPE trigger with errorReg = 1 (new errorReg value from previous setjmp)
   |
(return "nan") (because the jmp_buf stack is now in sync)
