read-line optimisation

Started by Astrobe, March 04, 2014, 12:42:09 AM


Astrobe

Hello,



While programming a little tool that analyses a log file, I noticed that read-line is a bit slower than one would expect. Looking at its code, it appeared to me that reading the stream char-by-char could be the cause. I modified it to use fgets instead:



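char * readStreamLine(STREAM * stream, FILE * inStream)
{
    char buf[MAX_STRING];
    size_t l;

    openStrStream(stream, MAX_STRING, 1);

    /* fgets reads at most MAX_STRING - 1 characters and stops after a newline */
    if(fgets(buf, MAX_STRING, inStream) != NULL)
    {
        l = strlen(buf);
        /* strip a trailing LF and a CR before it; the l > 0 checks avoid reading buf[-1] */
        if(l > 0 && buf[l-1] == 0x0A)
        {
            buf[--l] = 0;
            if(l > 0 && buf[l-1] == 0x0D) buf[--l] = 0;
        }
        writeStreamStr(stream, buf, l);
        return(stream->buffer);
    }
    else
    {
        /* reset the end-of-file indicator before reporting that nothing was read */
        if(feof(inStream)) clearerr(inStream);
        return NULL;
    }
}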


(this hasn't been heavily tested)



However, it doesn't strictly respect the original semantics of read-line with regard to newline characters. To be honest, I don't understand why they are that way, in particular why a newline at the end of the file has to be erased.



Also, the part about TRU64 is missing. I don't know whether fgets handles EINTR correctly by itself on this platform. I've worked with systems plagued by a similar illness before, and unfortunately the FILE library (which was not standard IIRC) didn't handle it very well.
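For illustration only, here is a minimal sketch of what an EINTR-aware wrapper around fgets might look like. The name fgetsRetry is hypothetical and not part of newLISP, and the sketch assumes the platform reports an interrupted read by setting the stream's error indicator with errno set to EINTR. Since the C standard leaves the buffer contents indeterminate after a read error, characters consumed before the interruption could be lost, so this is not a complete answer to the TRU64 question.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: calls fgets() again when a signal interrupted the read.
   Returns buf on success, NULL at end of file or on any other error. */
char * fgetsRetry(char * buf, int size, FILE * in)
{
    char * result;

    for(;;)
    {
        errno = 0;
        result = fgets(buf, size, in);
        if(result != NULL) return result;

        /* assumption: an interrupted read shows up as ferror() with errno == EINTR */
        if(ferror(in) && errno == EINTR)
        {
            clearerr(in);
            continue;
        }
        return NULL;
    }
}
```

On a stream that is never interrupted this behaves like a plain fgets call.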



On the performance side, timings drop from 250ms to 50ms.

Lutz

#1
Thanks Astrobe. At the moment I don't recall why read-line was coded using fgetc() and not fgets(), but in the past a lot of problems occurred with read-line: using it on different OSs, for CGI in conjunction with different client web browsers, with sockets as file handles on Unix, and with pipes. So this change will need a lot of testing, but the speed improvement is certainly worth it.

Lutz

#2
I just realized that your version limits the line length to MAX_STRING. readStreamLine() should be able to read lines of any length.
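One way the length limit could be lifted while keeping fgets is to read in fixed-size chunks and append until a newline shows up. The sketch below is standalone C and not the code that went into newLISP: readLongLine and CHUNK are hypothetical names, and it grows a malloc'd buffer instead of using the STREAM functions from the post above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 64  /* hypothetical chunk size, kept small so the loop is exercised */

/* Reads one line of any length. Returns a malloc'd string without the
   trailing LF or CR-LF, or NULL at end of file or if allocation fails.
   The caller frees the result. */
char * readLongLine(FILE * in)
{
    char chunk[CHUNK];
    char * line = NULL;
    size_t len = 0;

    while(fgets(chunk, CHUNK, in) != NULL)
    {
        size_t n = strlen(chunk);
        char * grown = realloc(line, len + n + 1);

        if(grown == NULL) { free(line); return NULL; }
        line = grown;
        memcpy(line + len, chunk, n + 1);  /* copy the chunk with its terminator */
        len += n;

        /* a newline marks the end of the line: strip LF, then CR if present */
        if(len > 0 && line[len-1] == 0x0A)
        {
            line[--len] = 0;
            if(len > 0 && line[len-1] == 0x0D) line[--len] = 0;
            break;
        }
    }
    return line;
}
```

A last line without a terminating newline is returned as-is, and the following call returns NULL.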

Lutz

#3
Seems to pass all tests:



http://www.newlisp.org/downloads/development/inprogress/



Linux, Windows, OSX and FreeBSD seem to be fine. Gains are biggest on Linux and on longer lines than usually found in text files. For TRU64 the old method has been left, as I cannot test it.