Hi Lutz,
Running Newlisp 8009.
A segmentation fault occurs when connecting to newLISP while it is running
in daemon mode.
Example #1:
bash-2.05b$ newlisp -L -l -d 50000 &
[1] 705
--- Now I telnet to "localhost 50000"
--- Newlisp prompt
--- >(exit)
[1]+ Segmentation fault newlisp -L -l -d 50000
bash-2.05b$
Example #2:
bash-2.05b$ newlisp -L -l -d 50001 &
[1] 713
--- Now I telnet to "localhost 50001"
--- Newlisp prompt
--- >(exit)
bash-2.05b$ fg
newlisp -L -l -d 50001
Segmentation fault
bash-2.05b$
Hope you can catch the bugger...
Norman...
Hello Lutz,
The segmentation fault also occurs on the daemon side in version 8.0.10.
When newLISP is running in -d mode (not in -p mode)
and the remote client connects and types ONLY (exit) at the first prompt,
newLISP crashes with a segmentation fault.
When the client first presses ENTER and THEN types (exit), it's OK.
(A little issue, I think.)
Norman.
This bug has been there for a long time and does not occur on Win32 and BSD. It is not limited to doing an (exit) right away, but can also occur in other circumstances.
If you have any idea how to fix this, help would be appreciated ;)
Lutz
ps: use only one of the -L or -l options; if you use both, the last one wins
Works fine using the Linux 2.6.x kernel. Cannot remember the last digit. What kernel are you using?
Eddie
I am not sure, have to check, whatever Mandrake 9.2 uses. Sometimes you have to exit and reconnect from the client several times to provoke the error, as if it is some timing problem.
Lutz
Ok. I see. After two tries I got the segmentation fault.
Mandrake's latest kernel is probably 2.4.x. I think all stable distributions except maybe Turbolinux use this kernel. On install, you can choose the 2.2.x or the 2.4.x kernel; it defaults to 2.2.x.
I'm using Debian testing. For a client this is ok. For the server side I would talk to someone who deals with security. However, I've noticed everything is much faster with the 2.6 kernel. EVERYTHING!
I've run 2.2.x (Mandrake), 2.4.x (Debian), FreeBSD, NetBSD and 2.6.x (Debian) on this same machine (AMD 2400+ with 512M memory). I noticed that the 2.6.x kernel has a much snappier response than 2.4.x and even FreeBSD. I wonder if that holds as well on Intel machines?
I like FreeBSD for a server and 2.6.x as a client.
Eddie
I'm running Slackware here, kernel 2.4.20 / 2.4.26...
Let's see if I can find anything related to this issue in the code....
Regards, Norman.
OK, I did some tracing on my Linux machine. Here is the output: the first run is OK and the second one dumps.
It stops after the fcntl64 call the second time. This is one daemon session, by the way...
It looks like fcntl is being called on a variable which isn't there.
Perhaps you will see it quicker ;-)
### FIRST ###
accept(3, {sin_family=AF_INET, sin_port=htons(34398), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1
getpeername(1, {sin_family=AF_INET, sin_port=htons(34398), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0
time(NULL) = 1090333867
open("/etc/localtime", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=1074, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
read(4, "TZif r r "..., 4096) = 1074
close(4) = 0
munmap(0x40015000, 4096) = 0
fcntl64(1, F_GETFL) = 0x2 (flags O_RDWR)
fstat64(1, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
_llseek(1, 0, 0xbffff570, SEEK_CUR) = -1 ESPIPE (Illegal seek)
munmap(0x40015000, 4096) = 0
write(1, "newLISP v8.0.10 Copyright (c) 20"..., 70) = 70
ioctl(1, SNDCTL_TMR_TIMEBASE, 0xbffff700) = -1 EINVAL (Invalid argument)
write(1, "\n> ", 3) = 3
read(1, "(", 1) = 1
read(1, "e", 1) = 1
read(1, "x", 1) = 1
read(1, "i", 1) = 1
read(1, "t", 1) = 1
read(1, ")", 1) = 1
read(1, "\r", 1) = 1
read(1, "\n", 1) = 1
close(1) = 0
old_mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
write(-1, "\n", 1) = -1 EBADF (Bad file descriptor)
close(1) = -1 EBADF (Bad file descriptor)
### SECOND ###
accept(3, {sin_family=AF_INET, sin_port=htons(34399), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1
getpeername(1, {sin_family=AF_INET, sin_port=htons(34399), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0
time(NULL) = 1090333877
fcntl64(1, F_GETFL) = 0x2 (flags O_RDWR)
fstat64(1, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000
_llseek(1, 0, 0xbffff570, SEEK_CUR) = -1 ESPIPE (Illegal seek)
munmap(0x40017000, 4096) = 0
write(1, "\n> ", 3) = 3
read(1, "(", 1) = 1
read(1, "e", 1) = 1
read(1, "x", 1) = 1
read(1, "i", 1) = 1
read(1, "t", 1) = 1
read(1, ")", 1) = 1
read(1, "\r", 1) = 1
read(1, "\n", 1) = 1
close(1) = 0
old_mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000
write(-1, "\n> ", 3) = -1 EBADF (Bad file descriptor)
close(1) = -1 EBADF (Bad file descriptor)
accept(3, {sin_family=AF_INET, sin_port=htons(34400), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1
getpeername(1, {sin_family=AF_INET, sin_port=htons(34400), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0
time(NULL) = 1090333887
fcntl64(1, F_GETFL) = 0x2 (flags O_RDWR)
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++
Thanks for the trace, Norman. I think I found the problem.
Lutz
Version 8.0.11 in http://newlisp.org/downloads/development/ solves this problem.
Lutz
Lutz, thanks for the quick fix... but now in release 8.0.11 the -d option no longer keeps running as a daemon; it drops out after one connection exits...
It looks like when the client does (exit), the daemon re-binds
to the port too quickly, because it has closed it, and then fails and exits...
But you're right, the segmentation fault is gone ;-)
PS: perhaps one hint for an enhancement: when one client is connected you can close the "listener" so no more clients can connect and you keep your current session... That way you don't "pressure" newLISP on the sockets, and you simply re-open the listener when the client has exited...
This is crazy: on my system, Mandrake Linux 9.2 with kernel 2.4.22, it is working OK.
I then added
deleteInetSession(sock);
close(sock);
after:
connection = accept(sock, (struct sockaddr *) &dest_sin, &dest_sin_len);
in the function FILE * serverFD(int port, int reconnect) in file nl-sock.c, which closes the listen socket after accepting a connection. Now it also exits right away on my side and breaks on BSD too, none of which makes much sense :(
Lutz
Norman, I wonder if this http://newlisp.org/downloads/development/Norman/
makes any difference in the -d mode on your Linux system?
Lutz
Hi Lutz,
I did a quick test, but no change... In -d mode, after the first client connects
and disconnects, the daemon still quits (nicely this time), so it no longer runs as a daemon...
Norman.
Doesn't break on Debian 2.6.x
Eddie
This is hard to fix for me as I cannot reproduce it on my Linux installation, can you find out where it exits with a trace?
The only exit I see in the program logic is when the accept() call falls through and leaves the 'connection' variable with a NULL, but then you would get a message: "newLISP server setup on port xxx failed" and you don't seem to get this. So I think it is bombing out somewhere?
Lutz
thanks for testing Eddie, on my side I am running Kernel 2.4.22 on Mandrake 9.2. Norman, what Linux are you running?
Lutz
ps: you mentioned it already Norman: slackware 2.4
Hello Lutz,
Running Linux 2.4.20.
Here is my trace. This time it drops out at "semget" or at the _exit(1) = ? line, see below...
accept(3, {sin_family=AF_INET, sin_port=htons(35464), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1
getpeername(1, {sin_family=AF_INET, sin_port=htons(35464), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0
time(NULL) = 1090390017
open("/etc/localtime", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=1074, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
read(4, "TZif r r "..., 4096) = 1074
close(4) = 0
munmap(0x40015000, 4096) = 0
fcntl64(1, F_GETFL) = 0x2 (flags O_RDWR)
fstat64(1, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
_llseek(1, 0, 0xbffff4c0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
munmap(0x40015000, 4096) = 0
write(1, "newLISP v8.0.12 Copyright (c) 20"..., 70) = 70
ioctl(1, SNDCTL_TMR_TIMEBASE, 0xbffff650) = -1 EINVAL (Invalid argument)
write(1, "\n> ", 3) = 3
read(1, "\r", 1) = 1
read(1, "\n", 1) = 1
write(1, "\n> ", 3) = 3
read(1, "(", 1) = 1
read(1, "e", 1) = 1
read(1, "x", 1) = 1
read(1, "i", 1) = 1
read(1, "t", 1) = 1
read(1, ")", 1) = 1
read(1, "\r", 1) = 1
read(1, "\n", 1) = 1
close(1) = 0
close(1) = -1 EBADF (Bad file descriptor)
accept(3,
bash-2.05b$
bash-2.05b$ fg
strace newlisp -d 5001
0x8068bf8, [4294967295]) = -1 EINVAL (Invalid argument)
getpeername(-1, 0xbffff630, [16]) = -1 EBADF (Bad file descriptor)
time(NULL) = 1090390074
semget(1, 0, 0x5|04) = -1 ENOSYS (Function not implemented)
_exit(1) = ?
Thanks Norman, when restarting the server the accept() call seems to fail on slackware, perhaps because the listen socket is invalid.
In http://newlisp.org/downloads/development/Norman/
you find a version which tries to reopen the port for listening.
Lutz
ps: perhaps Steve knows how to use CreateProcess in Win32 ?
Hi Lutz,
I just tested the 8.0.12Norman release, but unfortunately no change. Here
is the trace... hope it gives a little insight...
The second connect is refused and the daemon exits... strange it is...
accept(3, {sin_family=AF_INET, sin_port=htons(32819), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1
getpeername(1, {sin_family=AF_INET, sin_port=htons(32819), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0
time(NULL) = 1090447188
open("/etc/localtime", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=1074, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
read(4, "TZif r r "..., 4096) = 1074
close(4) = 0
munmap(0x40015000, 4096) = 0
fcntl64(1, F_GETFL) = 0x2 (flags O_RDWR)
fstat64(1, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
_llseek(1, 0, 0xbffff4a0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
munmap(0x40015000, 4096) = 0
write(1, "newLISP v8.0.12 Copyright (c) 20"..., 70) = 70
ioctl(1, SNDCTL_TMR_TIMEBASE, 0xbffff630) = -1 EINVAL (Invalid argument)
write(1, "\n> ", 3) = 3
read(1, "(", 1) = 1
read(1, "e", 1) = 1
read(1, "x", 1) = 1
read(1, "i", 1) = 1
read(1, "t", 1) = 1
read(1, ")", 1) = 1
read(1, "\r", 1) = 1
read(1, "\n", 1) = 1
close(1) = 0
accept(3, 0x8068d38, [4294967295]) = -1 EINVAL (Invalid argument)
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 1
bind(1, {sin_family=AF_INET, sin_port=htons(5001), sin_addr=inet_addr("0.0.0.0")}}, 16) = -1 EADDRINUSE (Address already in use)
close(1) = 0
semget(1, 0, 0x5|04) = -1 ENOSYS (Function not implemented)
_exit(1) = ?
[1]+ Exit 1 newlisp -d 5000
bash-2.05b$
Strange, it will not let me do accept() on the old listen socket, but also not let me bind the new one -> "Address already in use".
I will keep on trying ...
Lutz
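For anyone following along, here is a minimal sketch of one common way to reopen a listening port: close the old listening descriptor first, and set SO_REUSEADDR before bind(). This is not the code from nl-sock.c; the helper names open_listener, listener_port and reopen_listener are made up for illustration. In the trace above, descriptor 3 is never closed before the new bind(), so the old socket may still be holding the port, but that is only an inference from the trace.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not taken from nl-sock.c: open a TCP listener on the
   loopback interface. Pass port 0 to let the kernel pick a free port.
   Returns the listening descriptor, or -1 with errno set. */
int open_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Allows bind() while old connections on this port sit in TIME_WAIT.
       It does not help while another socket is still listening on the port. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *) &addr, sizeof addr) < 0 ||
        listen(fd, 1) < 0) {
        int saved = errno;      /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Port a listener is bound to (useful after asking for port 0); 0 on error. */
unsigned short listener_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    assert(fd >= 0);
    if (getsockname(fd, (struct sockaddr *) &addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}

/* Close the old listener first, then bind a fresh one on the same port. */
int reopen_listener(int old_fd, unsigned short port)
{
    if (old_fd >= 0)
        close(old_fd);
    return open_listener(port);
}
```

Calling open_listener() a second time on the same port while the first descriptor is still open should fail with EADDRINUSE, much like the bind() in the trace; reopen_listener() avoids that by closing the old descriptor before binding again.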
The version 8.0.14 in http://newlisp.org/downloads/development/ fixes this. Tested on MinGW, Debian, Red Hat, Mandrake, FreeBSD, OpenBSD, Mac OSX, AMD64 and Solaris. I am confident that with Norman testing it, we can add Slackware to the list.
The bug wasn't very sophisticated, just uninitialized data structures (shame on me ;-) ). On the Sourceforge compiler farm I could find an OS which also showed the problem and then was able to fix it.
Lutz
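As an illustration of how uninitialized data can bite in this area: the length argument of accept() is a value-result parameter and has to be set to the size of the address buffer before every call. The failing accept() calls in the traces above show a length of 4294967295, which would fit this pattern, but the thread does not say which structures were actually uninitialized, so treat this as an assumption. A minimal sketch follows; accept_client and greet_and_close are hypothetical names, not functions from nl-sock.c.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: accept one client on listen_fd.
   The peer address and its length are (re)initialized on every call, so a
   stale or garbage length can never reach accept().
   Returns the connection descriptor, or -1 on failure. */
int accept_client(int listen_fd, struct sockaddr_in *peer)
{
    socklen_t peer_len;

    assert(peer != NULL);
    memset(peer, 0, sizeof *peer);
    peer_len = sizeof *peer;    /* value-result argument: set before each call */
    return accept(listen_fd, (struct sockaddr *) peer, &peer_len);
}

/* Accept a client, send a two-byte prompt, and hang up.
   Returns 0 on success, -1 on failure. */
int greet_and_close(int listen_fd)
{
    struct sockaddr_in peer;
    ssize_t sent;
    int conn = accept_client(listen_fd, &peer);

    if (conn < 0)
        return -1;
    sent = write(conn, "> ", 2);
    close(conn);
    return sent == 2 ? 0 : -1;
}
```

Because the length is reset inside accept_client(), calling greet_and_close() in a loop on the same listening descriptor should keep working for one client after another.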
Hello Lutz,
Thanks for the fix and enhancements,
it's running now under Slackware ;-) Great...
Regards, Norman.