newLISP Fan Club

Forum => newLISP in the real world => Topic started by: newdep on July 17, 2004, 03:53:36 AM

Title: Segmentation Fault running NewLisp "Daemon mode"
Post by: newdep on July 17, 2004, 03:53:36 AM
Hi Lutz,



Running Newlisp 8009.



A segmentation fault occurs when connecting to newLISP while it is running in daemon mode.



Example #1:



bash-2.05b$ newlisp -L -l -d 50000 &

[1] 705



--- Now I telnet to "localhost 50000"

--- Newlisp prompt

--- >(exit)



[1]+  Segmentation fault      newlisp -L -l -d 50000

bash-2.05b$







Example #2:



bash-2.05b$ newlisp -L -l -d 50001 &

[1] 713



--- Now I telnet to "localhost 50001"

--- Newlisp prompt

--- >(exit)



bash-2.05b$ fg

newlisp -L -l -d 50001

Segmentation fault

bash-2.05b$







Hope you can catch the bugger...



Norman...
Title: Still a Segmentation fault in 8010
Post by: newdep on July 20, 2004, 04:12:53 AM
Hello Lutz,





In version 8.0.10 the segmentation fault still occurs on the daemon side.



When newLISP is running in -d mode (not in -p mode) and the remote client connects and types ONLY (exit) at the first prompt, newLISP dies with a segmentation fault.

When the client first presses ENTER and THEN types (exit), it works fine.

(A minor issue, I think.)
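For anyone who wants to reproduce this without typing into telnet, here is a minimal sketch of a client that sends (exit) as its very first input. The function names connect_loopback and send_exit_first are made up for this example, and the CR LF line ending is an assumption about what telnet sends.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connect to 127.0.0.1:port.
   Returns a socket descriptor, or -1 on failure. */
int connect_loopback(int port)
{
    struct sockaddr_in addr;
    int fd;

    assert(port > 0 && port < 65536);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: send "(exit)" plus CR LF without pressing
   ENTER first. Returns 0 on success, -1 on a write error. */
int send_exit_first(int fd)
{
    static const char cmd[] = "(exit)\r\n";
    size_t len = sizeof cmd - 1;
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = write(fd, cmd + sent, len - sent);
        if (n <= 0)
            return -1;
        sent += (size_t)n;
    }
    return 0;
}
```

With a daemon started as newlisp -d 50000, calling send_exit_first(connect_loopback(50000)) a few times in a row should exercise the same path as the telnet session above.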





Norman.
Title:
Post by: Lutz on July 20, 2004, 05:59:54 AM
This bug has been there for a long time and does not occur on Win32 and BSD. It is not limited to doing an (exit) right away, but can also occur in other circumstances.



If you have any idea how to fix this, help would be appreciated ;)



Lutz



ps: use only one of the -L or -l options; if you use both, the last one wins
Title:
Post by: eddier on July 20, 2004, 06:08:22 AM
Works fine using the Linux 2.6.x kernel. Cannot remember the last digit. What kernel are you using?



Eddie
Title:
Post by: Lutz on July 20, 2004, 06:11:37 AM
I am not sure, have to check, whatever Mandrake 9.2 uses. Sometimes you have to exit and reconnect from the client several times to provoke the error, as if it is some timing problem.



Lutz
Title:
Post by: eddier on July 20, 2004, 06:31:17 AM
OK, I see. After two tries I got the segmentation fault.



Mandrake's latest kernel is probably 2.4.x. I think all stable distributions except maybe turbo Linux use this kernel. On install, you can choose the 2.2.x or the 2.4.x kernel, it defaults to 2.2.x.



I'm using Debian testing. For a client this is ok. For the server side I would talk to someone who deals with security. However, I've noticed everything is much faster with the 2.6 kernel. EVERYTHING!



I've run 2.2.x (Mandrake), 2.4.x (Debian), FreeBSD, NetBSD and 2.6.x (Debian) on this same machine (AMD 2400+ with 512M memory). I noticed that the 2.6.x kernel has a much snappier response than 2.4.x and even FreeBSD. I wonder if that holds as well on Intel machines?



I like FreeBSD for a server and 2.6.x as a client.



Eddie
Title:
Post by: newdep on July 20, 2004, 07:14:51 AM
I'm running Slackware here, kernels 2.4.20 / 2.4.26...



Let's see if I can find anything related to this issue in the code....



Regards, Norman.
Title:
Post by: newdep on July 20, 2004, 07:48:09 AM
OK, I did some tracing on my Linux machine. Here is the output of the first run, which is OK, and of the second, which dumps.

It stops after the fcntl64 (file control) call the second time. This is all within one daemon session, by the way.

It looks like file control is being done on a variable which isn't there.

Perhaps you will spot it quicker ;-)







### FIRST ###



accept(3, {sin_family=AF_INET, sin_port=htons(34398), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1

getpeername(1, {sin_family=AF_INET, sin_port=htons(34398), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0

time(NULL)                              = 1090333867

open("/etc/localtime", O_RDONLY)        = 4

fstat64(4, {st_mode=S_IFREG|0644, st_size=1074, ...}) = 0

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000

read(4, "TZifrr"..., 4096) = 1074

close(4)                                = 0

munmap(0x40015000, 4096)                = 0

fcntl64(1, F_GETFL)                     = 0x2 (flags O_RDWR)

fstat64(1, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000

_llseek(1, 0, 0xbffff570, SEEK_CUR)     = -1 ESPIPE (Illegal seek)

munmap(0x40015000, 4096)                = 0

write(1, "newLISP v8.0.10 Copyright (c) 20"..., 70) = 70

ioctl(1, SNDCTL_TMR_TIMEBASE, 0xbffff700) = -1 EINVAL (Invalid argument)

write(1, "n> ", 3)                     = 3

read(1, "(", 1)                         = 1

read(1, "e", 1)                         = 1

read(1, "x", 1)                         = 1

read(1, "i", 1)                         = 1

read(1, "t", 1)                         = 1

read(1, ")", 1)                         = 1

read(1, "r", 1)                        = 1

read(1, "n", 1)                        = 1

close(1)                                = 0

old_mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000

write(-1, "n", 1)                      = -1 EBADF (Bad file descriptor)

close(1)                                = -1 EBADF (Bad file descriptor)





### SECOND ###

accept(3, {sin_family=AF_INET, sin_port=htons(34399), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1

getpeername(1, {sin_family=AF_INET, sin_port=htons(34399), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0

time(NULL)                              = 1090333877

fcntl64(1, F_GETFL)                     = 0x2 (flags O_RDWR)

fstat64(1, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000

_llseek(1, 0, 0xbffff570, SEEK_CUR)     = -1 ESPIPE (Illegal seek)

munmap(0x40017000, 4096)                = 0

write(1, "n> ", 3)                     = 3

read(1, "(", 1)                         = 1

read(1, "e", 1)                         = 1

read(1, "x", 1)                         = 1

read(1, "i", 1)                         = 1

read(1, "t", 1)                         = 1

read(1, ")", 1)                         = 1

read(1, "r", 1)                        = 1

read(1, "n", 1)                        = 1

close(1)                                = 0

old_mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000

write(-1, "n> ", 3)                    = -1 EBADF (Bad file descriptor)

close(1)                                = -1 EBADF (Bad file descriptor)

accept(3, {sin_family=AF_INET, sin_port=htons(34400), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1

getpeername(1, {sin_family=AF_INET, sin_port=htons(34400), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0

time(NULL)                              = 1090333887

fcntl64(1, F_GETFL)                     = 0x2 (flags O_RDWR)

--- SIGSEGV (Segmentation fault) ---

+++ killed by SIGSEGV +++
Title:
Post by: Lutz on July 20, 2004, 07:57:18 AM
Thanks for the trace, Norman. I think I found the problem.



Lutz
Title:
Post by: Lutz on July 20, 2004, 09:24:40 AM
Version 8.0.11 in http://newlisp.org/downloads/development/ solves this problem.



Lutz
Title:
Post by: newdep on July 20, 2004, 10:07:06 AM
Lutz, thanks for the quick fix... but now in release 8.0.11 the -d option does not run as a daemon anymore; it drops out after one connection exits...

It now looks like when the client does (exit), the daemon re-binds to the port too quickly, because it has closed it, and then fails and exits...

But you're right, the segmentation fault is gone ;-)

PS: perhaps one hint for an enhancement: when one client is connected you can close the "listener" so no more clients can connect and you keep your current session. That way you don't put pressure on newLISP's sockets, and you simply re-open the listener when the client has exited...
Title:
Post by: Lutz on July 20, 2004, 12:28:27 PM
This is crazy, on my system (Mandrake Linux 9.2 with kernel 2.4.22) it is working OK.



I then added



deleteInetSession(sock);

close(sock);



after:



connection = accept(sock, (struct sockaddr *) &dest_sin, &dest_sin_len);



in the function FILE * serverFD(int port, int reconnect) in the file nl-sock.c, so that the listen socket is closed after accepting a connection. Now it also exits right away on my side, and it also breaks on BSD, none of which makes much sense :(



Lutz
Title:
Post by: Lutz on July 20, 2004, 02:41:41 PM
Norman, I wonder if this http://newlisp.org/downloads/development/Norman/



makes any difference in -d mode on your Linux system?



Lutz
Title:
Post by: newdep on July 20, 2004, 02:47:00 PM
Hi Lutz,



Did a quick test but no change... in -d mode, after the first client connects and disconnects, the daemon still quits (nicely) and no longer runs as a daemon...



Norman.
Title:
Post by: eddier on July 20, 2004, 02:54:36 PM
Doesn't break on Debian 2.6.x



Eddie
Title:
Post by: Lutz on July 20, 2004, 02:55:18 PM
This is hard to fix for me as I cannot reproduce it on my Linux installation, can you find out where it exits with a trace?



The only exit I see in the program logic is when the accept() call falls through and leaves the 'connection' variable with a NULL, but then you would get a message: "newLISP server setup on port xxx failed" and you don't seem to get this. So I think it is bombing out somewhere?
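To make a failing accept() visible instead of silent, a wrapper along the following lines could help while tracing. This is only a sketch: accept_or_report is a made-up name, not a function from nl-sock.c. It relies on the documented behaviour that accept() returns -1 and sets errno on failure.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical wrapper: accept one connection on listen_fd.
   On success returns the new descriptor and stores 0 in *err.
   On failure prints the reason, stores errno in *err, returns -1. */
int accept_or_report(int listen_fd, int *err)
{
    struct sockaddr_in peer;
    socklen_t peer_len = sizeof peer;  /* must hold the buffer size on entry */
    int fd;

    assert(err != NULL);
    fd = accept(listen_fd, (struct sockaddr *)&peer, &peer_len);
    if (fd < 0) {
        *err = errno;  /* save before fprintf can overwrite it */
        fprintf(stderr, "accept(%d) failed: %s\n",
                listen_fd, strerror(*err));
        return -1;
    }
    *err = 0;
    return fd;
}
```

Called with an invalid descriptor such as -1, it reports "Bad file descriptor" rather than letting the server fall through to a later exit.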



Lutz
Title:
Post by: Lutz on July 20, 2004, 02:56:56 PM
Thanks for testing, Eddie. On my side I am running kernel 2.4.22 on Mandrake 9.2. Norman, what Linux are you running?



Lutz



ps: you mentioned it already, Norman: Slackware 2.4
Title:
Post by: newdep on July 20, 2004, 11:21:14 PM
Hello Lutz,



Running linux 2.4.20



Here is my trace. This time it drops out at "semget" or at the _exit(1) = ? line, see below...





accept(3, {sin_family=AF_INET, sin_port=htons(35464), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1

getpeername(1, {sin_family=AF_INET, sin_port=htons(35464), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0

time(NULL)                              = 1090390017

open("/etc/localtime", O_RDONLY)        = 4

fstat64(4, {st_mode=S_IFREG|0644, st_size=1074, ...}) = 0

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000

read(4, "TZifrr"..., 4096) = 1074

close(4)                                = 0

munmap(0x40015000, 4096)                = 0

fcntl64(1, F_GETFL)                     = 0x2 (flags O_RDWR)

fstat64(1, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000

_llseek(1, 0, 0xbffff4c0, SEEK_CUR)     = -1 ESPIPE (Illegal seek)

munmap(0x40015000, 4096)                = 0

write(1, "newLISP v8.0.12 Copyright (c) 20"..., 70) = 70

ioctl(1, SNDCTL_TMR_TIMEBASE, 0xbffff650) = -1 EINVAL (Invalid argument)

write(1, "n> ", 3)                     = 3

read(1, "r", 1)                        = 1

read(1, "n", 1)                        = 1

write(1, "n> ", 3)                     = 3

read(1, "(", 1)                         = 1

read(1, "e", 1)                         = 1

read(1, "x", 1)                         = 1

read(1, "i", 1)                         = 1

read(1, "t", 1)                         = 1

read(1, ")", 1)                         = 1

read(1, "r", 1)                        = 1

read(1, "n", 1)                        = 1

close(1)                                = 0

close(1)                                = -1 EBADF (Bad file descriptor)

accept(3,

bash-2.05b$

bash-2.05b$ fg

strace newlisp -d 5001

0x8068bf8, [4294967295])      = -1 EINVAL (Invalid argument)

getpeername(-1, 0xbffff630, [16])       = -1 EBADF (Bad file descriptor)

time(NULL)                              = 1090390074

semget(1, 0, 0x5|04)                    = -1 ENOSYS (Function not implemented)

_exit(1)                                = ?
Title:
Post by: Lutz on July 21, 2004, 09:26:06 AM
Thanks Norman, when restarting the server the accept() call seems to fail on slackware, perhaps because the listen socket is invalid.



In http://newlisp.org/downloads/development/Norman/



you will find a version which tries to reopen the port for listening.



Lutz



ps: perhaps Steve knows how to use CreateProcess in Win32 ?
Title:
Post by: newdep on July 21, 2004, 03:14:01 PM
Hi Lutz,



Just tested the 8.0.12Norman release but unfortunately no change. Here is the trace, hope it gives a little insight...

The second connect is refused and the daemon exits... strange...





accept(3, {sin_family=AF_INET, sin_port=htons(32819), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1

getpeername(1, {sin_family=AF_INET, sin_port=htons(32819), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0

time(NULL)                              = 1090447188

open("/etc/localtime", O_RDONLY)        = 4

fstat64(4, {st_mode=S_IFREG|0644, st_size=1074, ...}) = 0

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000

read(4, "TZifrr"..., 4096) = 1074

close(4)                                = 0

munmap(0x40015000, 4096)                = 0

fcntl64(1, F_GETFL)                     = 0x2 (flags O_RDWR)

fstat64(1, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000

_llseek(1, 0, 0xbffff4a0, SEEK_CUR)     = -1 ESPIPE (Illegal seek)

munmap(0x40015000, 4096)                = 0

write(1, "newLISP v8.0.12 Copyright (c) 20"..., 70) = 70

ioctl(1, SNDCTL_TMR_TIMEBASE, 0xbffff630) = -1 EINVAL (Invalid argument)

write(1, "n> ", 3)                     = 3

read(1, "(", 1)                         = 1

read(1, "e", 1)                         = 1

read(1, "x", 1)                         = 1

read(1, "i", 1)                         = 1

read(1, "t", 1)                         = 1

read(1, ")", 1)                         = 1

read(1, "r", 1)                        = 1

read(1, "n", 1)                        = 1

close(1)                                = 0

accept(3, 0x8068d38, [4294967295])      = -1 EINVAL (Invalid argument)

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 1

bind(1, {sin_family=AF_INET, sin_port=htons(5001), sin_addr=inet_addr("0.0.0.0")}}, 16) = -1 EADDRINUSE (Address already in use)

close(1)                                = 0

semget(1, 0, 0x5|04)                    = -1 ENOSYS (Function not implemented)

_exit(1)                                = ?

[1]+  Exit 1                  newlisp -d 5000

bash-2.05b$
Title:
Post by: Lutz on July 21, 2004, 04:29:20 PM
Strange, it will not let me do accept() on the old listen socket, but also not let me bind the new one -> "Address already in use".
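One possible reading of Norman's trace, offered as a guess: descriptor 3, the old listen socket, is never closed before the new socket calls bind(), and that alone would give "Address already in use". The sketch below shows that behaviour on loopback. The helpers open_listener, bound_port and reopen_listener are made-up names, not newLISP code.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listen on 127.0.0.1:port (0 = any free port).
   Returns the descriptor, or -1 with errno describing the failure. */
int open_listener(int port)
{
    struct sockaddr_in addr;
    int on = 1;
    int saved;
    int fd;

    assert(port >= 0 && port < 65536);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allows rebinding while old connections sit in TIME_WAIT.
       On Linux it does not allow a second live listener. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 5) < 0) {
        saved = errno;  /* keep the real error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Port a bound socket ended up on, or -1 on error. */
int bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}

/* Close the old listener first, then bind a fresh one. */
int reopen_listener(int old_fd, int port)
{
    close(old_fd);
    return open_listener(port);
}
```

Calling open_listener() a second time on a port that is still being listened on fails with EADDRINUSE, while reopen_listener() succeeds because the old descriptor is released first.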



I will keep on trying ...



Lutz
Title:
Post by: Lutz on July 23, 2004, 06:23:58 AM
The version 8.0.14 in http://newlisp.org/downloads/development/ fixes this. Tested on MinGW, Debian, Red Hat, Mandrake, FreeBSD, OpenBSD, Mac OS X, AMD64 and Solaris. I am confident that with Norman testing it, we can add Slackware to the list.



The bug wasn't very sophisticated, just uninitialized data structures (shame on me ;-) ). On the Sourceforge compiler farm I could find an OS which also showed the problem and then was able to fix it.
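The thread does not say which structures were uninitialized. One candidate that fits the traces is the address-length argument of accept(): the failing calls show a length of 4294967295 and return EINVAL. The sketch below, with the made-up names accept_client and demo_accept_twice, resets the address and its length before every accept() so a second client is accepted just like the first.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: accept one client, initializing the peer
   address and its length afresh on every call. */
int accept_client(int listen_fd)
{
    struct sockaddr_in peer;
    socklen_t peer_len;

    assert(listen_fd >= 0);
    memset(&peer, 0, sizeof peer);
    peer_len = sizeof peer;  /* value-result: buffer size on entry */
    return accept(listen_fd, (struct sockaddr *)&peer, &peer_len);
}

/* Loopback self-check: two clients in a row on one listener.
   Returns 0 if both are accepted, -1 otherwise. */
int demo_accept_twice(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int lfd, i;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);  /* let the kernel pick a free port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 5) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }

    for (i = 0; i < 2; i++) {
        int afd;
        int cfd = socket(AF_INET, SOCK_STREAM, 0);

        if (cfd < 0) {
            close(lfd);
            return -1;
        }
        if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0) {
            close(cfd);
            close(lfd);
            return -1;
        }
        afd = accept_client(lfd);
        close(cfd);
        if (afd < 0) {
            close(lfd);
            return -1;
        }
        close(afd);
    }
    close(lfd);
    return 0;
}
```

The listener stays open across both clients here, which is how a -d style server would keep serving after the first client exits.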



Lutz
Title:
Post by: newdep on July 23, 2004, 07:46:09 AM
Hello Lutz,



Thanks for the fix and enhancements,

it's running now under Slackware ;-) Great...





Regards, Norman.