integer from string

Sammo · February 11, 2004, 10:12:31 AM

Examples 2) and 4) seem counter-intuitive:

1) (integer "9") --> 9

2) (integer "09") --> 0

3) (integer "-9") --> -9

4) (integer "-09") --> 0

nigelbrown · February 11, 2004, 12:39:55 PM

Leading zeros should be OK, since that keeps reading symmetric with writing: with %05d, the 0 says to pad with leading zeros. This padding is used in the txt2pdf code, for example.

viz

> (format "%05d" -1)

"-0001"

> (integer (format "%05d" -1))

-1

>

However, the 9 seems to be broken, viz.

> (integer "01")

1

> (integer "09")

0

>

and further

> (dotimes (i 20) (print "<" (integer (format "%05d" (integer i))) ">"))

<0><1><2><3><4><5><6><7><0><0><8><9><10><11><12><13><14><15><1><1>">"

>

while

> (dotimes (i 20) (print "<" (integer i) ">"))

<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18><19>">"

>

Lutz?

Nigel

Sammo · February 11, 2004, 01:54:41 PM

Aha! It looks like we're being bitten by the octal notation described under "Evaluating newLISP Expressions", where octal values are prefixed by 0 (zero). Since 8 and 9 aren't valid octal digits, (integer "09") stops converting at the 9, much as (integer "15b") --> 15 stops at the b in decimal. It then also makes sense that (integer "010") --> 8.
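To see the effect outside newLISP, here is a minimal C sketch of the same conversion, assuming (as stated later in the thread) that integer hands the string to strtol() with base 0. The helper names parse_base0 and roundtrip_padded are made up for illustration and are not part of newLISP.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse with base 0, so strtol() picks the radix
   from the prefix ("0x" is hex, a leading "0" is octal, else decimal). */
long parse_base0(const char *s)
{
    assert(s != NULL);
    return strtol(s, NULL, 0);
}

/* Hypothetical helper: write i with "%05d", then read the text back
   with base 0, mirroring (integer (format "%05d" i)). */
long roundtrip_padded(int i)
{
    char buf[32];
    snprintf(buf, sizeof buf, "%05d", i);
    return parse_base0(buf);
}
```

With these, roundtrip_padded(8) and roundtrip_padded(9) give 0 because conversion stops at the first non-octal digit, roundtrip_padded(10) reads "00010" as octal and gives 8, and roundtrip_padded(18) stops after "0001" and gives 1, which matches the dotimes output above.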

Thanks, Nigel, for the insight!

Lutz · February 11, 2004, 03:49:11 PM

The C function strtol() is used for the conversion; it also takes hex, e.g.:

(integer "0xff") => 255

Values bigger than the maximum integer go to the maximum integer:

(integer 10e20) => 2147483647

or minimum integer:

(integer -10e20) => -2147483648
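The clamping of out-of-range floats can be pictured with a small C sketch. This is not newLISP's source; the function name saturate_to_int32, truncation toward zero for in-range values, and the handling of NaN are all assumptions made for the example. Only the two limits come from the post above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: convert a double to a 32-bit integer,
   pinning out-of-range values to the nearest limit. */
int32_t saturate_to_int32(double v)
{
    if (v != v)
        return 0;                 /* NaN: returning 0 is an arbitrary choice */
    if (v >= 2147483647.0)
        return INT32_MAX;         /* too large: pin to maximum */
    if (v <= -2147483648.0)
        return INT32_MIN;         /* too small: pin to minimum */

    /* The cast below is only well defined for values inside the range. */
    assert(v > -2147483648.0 && v < 2147483647.0);
    return (int32_t)v;            /* truncates toward zero */
}
```

Under these assumptions saturate_to_int32(10e20) yields 2147483647 and saturate_to_int32(-10e20) yields -2147483648, the same limits shown above.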

Lutz

ps: all this is also mentioned in the manual

nigelbrown · February 11, 2004, 05:14:39 PM

Yes, it is all clear in the manual, apologies to Lutz for the 'broken' comment.

Perhaps a form of integer could be (integer string defaultexpr base), to be as flexible as the real strtol(). In strtol the base parameter allows forcing "0009" to be 9 by specifying base 10, viz. the C code:

#include <stdlib.h>

#include <stdio.h>

int main() {

   char number[] = { '0', '0', '0', '9', '\0' };

   printf( "base is 0 -> %ld\n" , strtol(number, NULL, 0) );

   printf( "base is 10 -> %ld\n" , strtol(number, NULL, 10) );

   return(0);

}

prints

base is 0 -> 0

base is 10 -> 9

Nigel

PS I had to look up the details of strtol:

strtol() and strtoll()

The strtol() function converts the initial portion of the string pointed to by str to a type long int representation.

The strtoll() function converts the initial portion of the string pointed to by str to a type long long representation.

Both functions first decompose the input string into three parts: an initial, possibly empty, sequence of white-space characters (as specified by isspace(3C)); a subject sequence interpreted as an integer represented in some radix determined by the value of base; and a final string of one or more unrecognized characters, including the terminating null byte of the input string. They then attempt to convert the subject sequence to an integer and return the result.

If the value of base is 0, the expected form of the subject sequence is that of a decimal constant, octal constant or hexadecimal constant, any of which may be preceded by a + or - sign. A decimal constant begins with a non-zero digit, and consists of a sequence of decimal digits. An octal constant consists of the prefix 0 optionally followed by a sequence of the digits 0 to 7 only. A hexadecimal constant consists of the prefix 0x or 0X followed by a sequence of the decimal digits and letters a (or A) to f (or F) with values 10 to 15 respectively.

If the value of base is between 2 and 36, the expected form of the subject sequence is a sequence of letters and digits representing an integer with the radix specified by base, optionally preceded by a + or - sign. The letters from a (or A) to z (or Z) inclusive are ascribed the values 10 to 35; only letters whose ascribed values are less than that of base are permitted. If the value of base is 16, the characters 0x or 0X may optionally precede the sequence of letters and digits, following the sign if present.

The subject sequence is defined as the longest initial subsequence of the input string, starting with the first non-white-space character, that is of the expected form. The subject sequence contains no characters if the input string is empty or consists entirely of white-space characters, or if the first non-white-space character is other than a sign or a permissible letter or digit.

If the subject sequence has the expected form and the value of base is 0, the sequence of characters starting with the first digit is interpreted as an integer constant. If the subject sequence has the expected form and the value of base is between 2 and 36, it is used as the base for conversion, ascribing to each letter its value as given above. If the subject sequence begins with a minus sign, the value resulting from the conversion is negated. A pointer to the final string is stored in the object pointed to by endptr, provided that endptr is not a null pointer.

In other than the POSIX locale, additional implementation-dependent subject sequence forms may be accepted.

If the subject sequence is empty or does not have the expected form, no conversion is performed; the value of str is stored in the object pointed to by endptr, provided that endptr is not a null pointer.

Lutz · February 12, 2004, 07:59:02 AM

With just a line or two of code I could add the additional parameter for the number base in 'integer':

(integer "1111" 0 2) => 15

It turns out you still have to put at least a numeric digit or 0x before the number:

(integer "ff" "won't work" 16) => "won't work"

(integer "0xff" "won't work" 16) => 255

(integer "0ff" "won't work" 16) => 255

But I think it's OK. It also makes sense, because otherwise text would far too often be taken as a number. So the rule "must start with +/- or a digit" always applies.

And it's fine for octals without the leading 0:

(integer "77" 0 8) => 63
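The behaviour in these examples can be approximated in C as follows. This is only a sketch, not the actual newLISP code: the name integer_or_default is invented, the default is a plain long rather than an arbitrary expression, and skipping leading white space and requiring a digit right after an optional sign are my reading of the "must start with +/- or a digit" rule.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical sketch of (integer string default base): return dflt
   unless the text starts with an optional sign followed by a digit. */
long integer_or_default(const char *s, long dflt, int base)
{
    assert(s != NULL);
    assert(base == 0 || (base >= 2 && base <= 36));

    const char *p = s;
    while (isspace((unsigned char)*p))   /* assumption: follow strtol() */
        p++;
    if (*p == '+' || *p == '-')
        p++;
    if (!isdigit((unsigned char)*p))     /* "ff" is rejected here */
        return dflt;

    char *end;
    long v = strtol(s, &end, base);
    return (end == s) ? dflt : v;        /* nothing converted: use default */
}
```

Called as integer_or_default("0ff", -1, 16) it gives 255, while integer_or_default("ff", -1, 16) falls back to -1, and integer_or_default("0009", -1, 10) gives 9 where base 0 would give 0.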

Also, regarding (seek 0): after compiling on Linux, it turns out that it also returns -1 there. So far only on BSD does ftell(stdout) report the number of characters printed. I haven't tried Solaris or Mac OS X (BSD based) yet.

Reading the section in the GNU libc manual about seek/ftell, I wonder why it wouldn't work on Linux.

Lutz

nigelbrown · March 14, 2004, 03:52:21 PM

As part of an infix evaluator I'm writing, I revisited the newLISP conversion functions, comparing them to the underlying C functions (such as strtol).

strtol can optionally return the remaining unconverted portion of the string.

Currently, when you call integer or float, you don't get the unconverted string fragment back (in, say, $0).

You can do this with parse, viz.:

> (define (myfloat s) (begin (setq $0 (join (rest (parse s)))) (float s)))

(lambda (s)

(begin

(setq $0 (join (rest (parse s))))

(float s)))

> (myfloat "1.3D+3")

1.3

> $0

"D+3"

> (myfloat "1.3D+3/1.2*7")

1.3

> $0

"D+3/1.2*7"

>

I don't know what people think about making this a feature of float and integer. I'm happy with myfloat, but I thought I'd float the idea (no pun intended).
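For comparison, this is roughly what the underlying C side looks like when the end pointer is used. It is a sketch only: float_with_rest is an invented name, and it assumes a standard-conforming strtod() in the default "C" locale, where only e or E introduces an exponent, so the D is left unconverted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: convert the leading number in s and report
   where conversion stopped via *rest. */
double float_with_rest(const char *s, const char **rest)
{
    assert(s != NULL && rest != NULL);

    char *end;
    double v = strtod(s, &end);   /* end points at first unconverted char */
    *rest = end;
    return v;
}
```

Given "1.3D+3/1.2*7" this returns a value of about 1.3 and leaves rest pointing at "D+3/1.2*7", the same fragment that myfloat stores in $0. If nothing can be converted, strtod() returns 0 and rest points at the start of the input.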

Lutz · March 14, 2004, 05:21:53 PM

I just tried it (it was only one additional line of code), but it more than doubles the time the function needs, from 468 to 1000 microseconds, for a feature used relatively infrequently, so I didn't do it.

Lutz

newLISP Fan Club
