integer from string

Started by Sammo, February 11, 2004, 10:12:31 AM


Sammo

Examples 2) and 4) seem counter-intuitive:



1)  (integer "9") --> 9

2)  (integer "09") --> 0

3)  (integer "-9") --> -9

4)  (integer "-09") --> 0

nigelbrown

#1
Leading zeros should be OK, since that keeps the symmetry of reading back what you write with %05d, where the 0 says to pad with leading zeros. This padding is used in the txt2pdf code, for example.

viz

> (format "%05d" -1)

"-0001"

> (integer (format "%05d" -1))

-1

>



However, the 9 seems to be broken, viz

> (integer "01")

1

> (integer "09")

0

>

and further

> (dotimes (i 20) (print "<" (integer (format "%05d" (integer i))) ">"))

<0><1><2><3><4><5><6><7><0><0><8><9><10><11><12><13><14><15><1><1>">"

>

while

> (dotimes (i 20) (print "<" (integer i) ">"))

<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18><19>">"

>

Lutz?



Nigel

Sammo

#2
Aha! It looks like the bite of the octal formatting described under "Evaluating newLISP Expressions", where octal values are prefixed by 0 (zero). Since 8 and 9 aren't valid octal digits, (integer "09") stops converting at the 9 and returns 0, much as (integer "15b") --> 15 stops at the b in decimal. It then also makes sense that (integer "010") --> 8.
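
The octal reading can be reproduced directly in C. This is only a minimal sketch, assuming the conversion goes through C's strtol() with base 0 (auto-detect); the helper name parse_auto_base is hypothetical and not part of newLISP.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: convert text the way strtol() does with base 0,
   where a leading 0 selects octal and a leading 0x selects hex.
   Conversion stops at the first character that is invalid in that base. */
long parse_auto_base(const char *text)
{
    assert(text != NULL);
    return strtol(text, NULL, 0);
}
```

With this helper, "09" and "-09" give 0 (the 9 is not an octal digit), "010" gives 8, and "15b" gives 15, matching the newLISP results reported in this thread.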



Thanks, Nigel, for the insight!

Lutz

#3
The C function strtol() is used for the conversion; it also accepts hex, e.g.:



(integer "0xff") => 255



Values bigger than the maximum integer go to the maximum integer:



(integer 10e20) => 2147483647



or minimum integer:



(integer -10e20) => -2147483648
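
The saturation shown above can be sketched in C as a clamp to the 32-bit signed range. This is an illustration only: how newLISP actually performs the float-to-integer conversion is not stated here, the helper name clamp_to_int32 is hypothetical, and mapping NaN to 0 is an assumption.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a double into the 32-bit signed range,
   mirroring the saturation shown above.
   Assumption: NaN maps to 0. */
int32_t clamp_to_int32(double value)
{
    if (value != value)                      /* true only for NaN */
        return 0;
    if (value >= (double)INT32_MAX)
        return INT32_MAX;                    /* 2147483647 */
    if (value <= (double)INT32_MIN)
        return INT32_MIN;                    /* -2147483648 */
    return (int32_t)value;                   /* truncates toward zero */
}
```

Checking the range before the cast matters in C, because converting an out-of-range double to an integer type is undefined behaviour.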



Lutz

ps: all this is also mentioned in the manual

nigelbrown

#4
Yes, it is all clear in the manual, apologies to Lutz for the 'broken' comment.



Perhaps an (integer ...) form could be (integer string defaultexpr base), to be as flexible as the real strtol(): there the base parameter allows forcing "0009" to be 9 by specifying base 10, viz the C code



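#include <stdlib.h>
#include <stdio.h>

int main() {
   /* the string "0009", terminated by a NUL byte */
   char number[] = { '0', '0', '0', '9', '\0' };

   /* base 0 auto-detects octal from the leading 0; base 10 forces decimal */
   printf( "base is 0  -> %ld \n" , strtol(number, NULL, 0) );
   printf( "base is 10 -> %ld \n" , strtol(number, NULL, 10) );
   return(0);
}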



prints

base is 0  -> 0

base is 10 -> 9
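
The same base 0 versus base 10 comparison can be applied to the zero-padded loop from earlier in the thread. A minimal sketch, assuming the padded string is handed to strtol(); the helper name roundtrip_padded is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format n with "%05d", then parse the result back
   with strtol() using the given base (0 = auto-detect, 10 = decimal). */
long roundtrip_padded(int n, int base)
{
    char buf[16];                            /* enough for any 32-bit int */
    snprintf(buf, sizeof buf, "%05d", n);
    return strtol(buf, NULL, base);
}
```

With base 0, the values 8 and 9 come back as 0, 10 comes back as 8, and 18 comes back as 1, which is the pattern in the dotimes output above; with base 10 every value survives the round trip.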



Nigel



PS I had to look up the details of strtol:

  strtol() and strtoll()

     The strtol() function converts the initial portion of the
     string pointed to by str to a type long int representation.



     The strtoll() function converts the initial portion of the
     string pointed to by str to a type long long representation.



     Both functions first decompose the input string into three
     parts: an initial, possibly empty, sequence of white-space
     characters (as specified by isspace(3C)); a subject sequence
     interpreted as an integer represented in some radix determined
     by the value of base; and a final string of one or more
     unrecognized characters, including the terminating null byte
     of the input string. They then attempt to convert the subject
     sequence to an integer and return the result.



     If the value of base is 0, the expected form of the subject
     sequence is that of a decimal constant, octal constant or
     hexadecimal constant, any of which may be preceded by a + or
     - sign. A decimal constant begins with a non-zero digit, and
     consists of a sequence of decimal digits. An octal constant
     consists of the prefix 0 optionally followed by a sequence
     of the digits 0 to 7 only. A hexadecimal constant consists
     of the prefix 0x or 0X followed by a sequence of the decimal
     digits and letters a (or A) to f (or F) with values 10 to 15
     respectively.



     If the value of base is between 2 and 36, the expected form
     of the subject sequence is a sequence of letters and digits
     representing an integer with the radix specified by base,
     optionally preceded by a + or - sign. The letters from a (or
     A) to z (or Z) inclusive are ascribed the values 10 to 35;
     only letters whose ascribed values are less than that of
     base are permitted. If the value of base is 16, the characters
     0x or 0X may optionally precede the sequence of letters
     and digits, following the sign if present.



     The subject sequence is defined as the longest initial
     subsequence of the input string, starting with the first
     non-white-space character, that is of the expected form. The
     subject sequence contains no characters if the input string
     is empty or consists entirely of white-space characters, or
     if the first non-white-space character is other than a sign
     or a permissible letter or digit.



     If the subject sequence has the expected form and the value
     of base is 0, the sequence of characters starting with the
     first digit is interpreted as an integer constant. If the
     subject sequence has the expected form and the value of base
     is between 2 and 36, it is used as the base for conversion,
     ascribing to each letter its value as given above. If the
     subject sequence begins with a minus sign, the value resulting
     from the conversion is negated. A pointer to the final
     string is stored in the object pointed to by endptr, provided
     that endptr is not a null pointer.



     In other than the POSIX locale, additional implementation-dependent
     subject sequence forms may be accepted.



     If the subject sequence is empty or does not have the
     expected form, no conversion is performed; the value of str
     is stored in the object pointed to by endptr, provided that
     endptr is not a null pointer.

Lutz

#5
With just a line or two of code I could add the additional parameter for the number base in 'integer':



(integer "1111" 0 2) => 15



It turns out you still have to put at least one numerical digit or 0x in front of the number:



(integer "ff" "won't work" 16) => "won't work"



(integer "0xff" "won't work" 16) => 255

(integer "0ff" "won't work" 16) => 255



But I think it's OK, and it also makes sense, because otherwise text would be taken as a number far too often. So the rule "must start with +/- or a digit" always applies.



And it's fine for octals without the leading 0:



(integer "77" 0 8) => 63
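
The three-argument form can be sketched in C on top of strtol(). This is not the actual newLISP source: the function name integer_or_default is hypothetical, the default is simplified to a long (in newLISP it can be any expression), and reading the rule as "optional sign, then a decimal digit" is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical sketch of (integer string default base): return fallback
   unless the text starts with an optional sign followed by a decimal
   digit; otherwise convert with strtol() in the requested base. */
long integer_or_default(const char *text, long fallback, int base)
{
    const char *p = text;
    assert(text != NULL);

    while (isspace((unsigned char)*p))       /* skip leading white space */
        p++;
    if (*p == '+' || *p == '-')              /* optional sign */
        p++;
    if (!isdigit((unsigned char)*p))         /* a digit must come next */
        return fallback;

    return strtol(text, NULL, base);
}
```

Under these assumptions "ff" in base 16 yields the fallback, while "0xff" and "0ff" both yield 255, "77" in base 8 yields 63, and "1111" in base 2 yields 15.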





Also, regarding (seek 0): after compiling on Linux, it turns out that it also returns '-1'. So far, only on BSD does 'ftell(stdout)' report the number of characters printed. I haven't tried Solaris or Mac OS X (BSD-based) yet.



Reading the section in the GNU libc manual about seek/ftell, I wonder why it wouldn't work on Linux.



Lutz

nigelbrown

#6
As part of an infix evaluator I'm writing, I revisited the newLISP conversion functions and compared them to the underlying C functions (such as strtol).

strtol can optionally return the remaining unconverted portion of the string.

Currently, when you call (integer ...) or (float ...), you don't get the unconverted string fragment back (in, say, $0).



You can do this with parse viz:



> (define (myfloat s) (begin (setq $0 (join (rest (parse s)))) (float s)))

(lambda (s)

 (begin

  (setq $0 (join (rest (parse s))))

  (float s)))

> (myfloat "1.3D+3")

1.3

> $0

"D+3"

> (myfloat "1.3D+3/1.2*7")

1.3

> $0

"D+3/1.2*7"

>



I don't know what people think about making this a feature of (float ...) and (integer ...). I'm happy with (myfloat ...), but I thought I'd float the idea (no pun intended).
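
For comparison, here is roughly how the C side exposes the unconverted fragment through the endptr argument of strtod() (strtol() has the same argument). A minimal sketch; the helper name float_with_rest is hypothetical and this is not how newLISP's float is implemented.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: convert the leading number in text and report,
   through *rest, where the conversion stopped. */
double float_with_rest(const char *text, const char **rest)
{
    char *end = NULL;
    double value;
    assert(text != NULL && rest != NULL);

    value = strtod(text, &end);              /* end points past the number */
    *rest = end;
    return value;
}
```

For the input "1.3D+3/1.2*7" this returns 1.3 and leaves rest pointing at "D+3/1.2*7", because strtod() only recognises e or E as a decimal exponent marker.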

Lutz

#7
I just tried it (it was only one additional line of code), but it more than doubles the time the function needs, from 468 to 1000 microseconds, for a feature that is used relatively infrequently, so I didn't do it.



Lutz