Messages - Thorstein

#1
Aha!:

"Octals start with an optional + (plus) or - (minus) sign and a 0 (zero), followed by any combination of the octal digits: 01234567. Any other character ends the octal number. Only up to 21 octal digits are valid and any more digits are ignored."



Thanks.
#2
This behavior was a little unexpected.



newLISP v.10.7.5 64-bit on Linux IPv4/6 UTF-8 libffi, options: newlisp -h

> (read-expr "08")
0
> (read-expr "07")
7
> (read-expr "06")
6
> (read-expr "09")
0
> (eval-string "08")
8
> (eval-string "09")
9
> (read-expr "08")
0
> $count
1
> (read-expr "07")
7
> $count
2
>
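The octal rule quoted in post #1 accounts for this transcript. In "08" and "09" the leading 0 starts an octal number and the 8 or 9 immediately ends it, so read-expr returns 0 after consuming one character ($count is 1). "07" is consumed in full ($count is 2). Below is a minimal Python sketch of that rule, written for illustration only; it is not newLISP's actual reader. The function name `read_octal_prefix` is hypothetical, the 21-digit limit is not modeled, and the different eval-string results are not addressed.

```python
OCTAL_DIGITS = "01234567"


def read_octal_prefix(text: str) -> tuple[int, int]:
    """Return (value, chars_consumed) for a literal such as "07" or "-017".

    Hypothetical sketch of the quoted rule: an optional sign, a leading 0,
    then octal digits. The first non-octal character ends the number.
    The 21-digit limit from the manual is not modeled here.
    """
    pos = 0
    sign = 1
    if text[:1] in ("+", "-"):
        sign = -1 if text[0] == "-" else 1
        pos = 1
    if text[pos:pos + 1] != "0":
        raise ValueError("octal literals must start with 0")
    pos += 1  # consume the leading zero
    value = 0
    while pos < len(text) and text[pos] in OCTAL_DIGITS:
        value = value * 8 + int(text[pos])
        pos += 1
    return sign * value, pos


for literal in ("08", "07", "09", "0123"):
    print(literal, read_octal_prefix(literal))
```

The sketch returns (0, 1) for "08" and (7, 2) for "07", which matches the values and the $count results in the transcript.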
#3
Thanks, fdb, but I don't want UTF-8 characters; I need a byte stream.



But I think I've found a major source of my confusion, so I'll mark this thread "SOLVED":



(char "x") => a Unicode code point. This should perhaps have been obvious to me, but "code-point" is not mentioned in the manual. Consequently, my helper function


(define (byte s (i 0))
  (char s i true))
;; returns a UTF-8 char
(byte 218)
"Ú"
;; and even though the code point for "Ú" is 218
(char "Ú")
218
;; its UTF-8 encoding is 2 bytes long!
(length (byte 218))
2
;; so, confusingly,
(char (byte 218))
218                      ;; char decodes both bytes back to the single code point
;; but
(byte (byte 218))
195


where 195 (0xC3) is the first byte of the 2-byte UTF-8 encoding of that code point.



It appears my helper function should have been this:


(define (byte x)
  (if (number? x)
      (pack "b" x)        ; number -> one-byte string
      (char x 0 true)))   ; string -> value of its first byte
;; and now
(byte (byte 218))
218


Said differently, (char) round-trips code points in the lower ASCII range (0-127) as one-byte strings, but it does not do so for code points in the range 0x80-0xFF and beyond, because their UTF-8 encodings take two or more bytes.
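The same distinction can be checked outside newLISP. This Python sketch (Python for comparison, not newLISP) shows that code point 218 ("Ú", U+00DA) is a single number but encodes to two UTF-8 bytes, the first of which is 195. An ASCII code point encodes to one byte. The helper name `utf8_bytes` is hypothetical.

```python
def utf8_bytes(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8."""
    return chr(code_point).encode("utf-8")


u_acute = utf8_bytes(218)              # "Ú" is U+00DA
print(list(u_acute))                   # [195, 154]: two bytes, 0xC3 0x9A
print(ord(u_acute.decode("utf-8")))    # 218: decoding gives back the code point

print(list(utf8_bytes(ord("A"))))      # [65]: ASCII stays a single byte

# Code points 0x00-0x7F take one byte; 0x80-0xFF already take two.
print(all(len(utf8_bytes(cp)) == 1 for cp in range(0x80)))          # True
print(all(len(utf8_bytes(cp)) == 2 for cp in range(0x80, 0x100)))   # True
```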



My code now uses (slice stringx n 1) to fetch a byte and the revised (byte) to convert between byte values and one-byte strings. It never calls (char) directly, and this appears to be working. Yay!
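For comparison, here is a byte-oriented counterpart of the revised (byte) helper, sketched in Python. A Python `bytes` object indexes to integers and slices to one-byte strings, so the round trip never goes through UTF-8 decoding. The function name `byte` mirrors the post's helper and is otherwise hypothetical. The slice `data[n:n + 1]` plays the role of (slice stringx n 1).

```python
def byte(x):
    """Mirror of the revised newLISP helper.

    An int in 0..255 becomes a one-byte bytes object (ValueError otherwise);
    a bytes object yields the integer value of its first byte.
    """
    if isinstance(x, int):
        return bytes([x])
    return x[0]


data = "Ú!".encode("utf-8")   # b'\xc3\x9a!': three bytes, two characters
n = 1
one_byte = data[n:n + 1]      # like (slice stringx n 1): b'\x9a'
print(byte(one_byte))         # 154
print(byte(byte(218)))        # 218: the round trip is exact
```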
#4
[See solution in thread below.]



I'm trying to implement several versions of the Lempel-Ziv-x and Snappy compression algorithms. Ordinarily, I like to get my logic straight in Lisp and then, if I need the speed, port the tight loops to a C library. In this case, however, newLISP has been atypically difficult to debug. I wonder if there are some simple code patterns I'm overlooking.



It would, of course, be simpler to use a non-UTF-8 build of newLISP, but I want to compress UTF-8 strings that I'm processing within newLISP.



So given a UTF-8 string us, I understand that (slice us i 1) will give me an 8-bit "char". I also found that defining


(define (byte s (i 0))
  (char s i true))

helped in some situations. But then I ran into problems trying to split a value like 32765 into two bytes. I thought the following would give me the low byte, 253:

> (mod 32765 256)
253

;; but
> (byte (mod 32765 256)) 
ý

;; and
>(byte (byte (mod 32765 256)))
195


And although, as mentioned above, the following use of (char) looks OK,

>(char (char (mod 32765 256)))
253

>(char (mod 32765 256))
"ý"

>(length "ý")
2

the two-byte UTF-8 length of "ý" breaks the byte-oriented bookkeeping of the compression algorithms.



Finally, I found that (pack) works:

>(pack "b" (& 32765 0xff))
"�"

;; and
> (byte (pack "b" (& 32765 0xff)))
253

;; (and for the high byte):
>(byte (pack "b" (/ 32765 256)))
127
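For comparison, here is the same split written with integer-only operations in Python. Masking with 0xFF gives the low byte, shifting right by 8 gives the high byte, and `struct.pack` builds the two-byte string directly. On the (mod) gotcha that follows: newLISP's mod is a floating-point operation and % is its integer modulo. So a plausible but unconfirmed explanation is that pack did not receive the integer 253. The thread does not settle this.

```python
import struct

value = 32765                  # 0x7FFD

low = value & 0xFF             # 0xFD = 253
high = value >> 8              # 0x7F = 127
pair = bytes([high, low])      # two raw bytes, no UTF-8 involved

print(low, high, len(pair))    # 253 127 2

# struct.pack does the same split; ">H" means big-endian unsigned 16-bit.
packed = struct.pack(">H", value)
print(packed == pair)          # True

# Reassemble the original value from the two bytes.
print((packed[0] << 8) | packed[1])   # 32765
```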


But, a little confusingly, there were still some gotchas.  For example, (pack) doesn't work with (mod):

> (byte (pack "b"  (mod 32765 256)))
16

So, long story short, I've got these manipulations more or less working, but is there a more direct way to manipulate such bytes and 8-bit chars?
#5
Thanks, Lutz! I found the latest UTF-8 build. That is doing the trick. (That and RTFM! :-/ )



And many thanks for this great Lisp!  (And for the great documentation.)
#6
Running on Windows,



(exec)'ing Google Translate returns the following str inside a JSON container:



(a) "Nous allons habiller pour la randonnée, selon la météo."



However, somewhere during a (string str "") or (replace x str y) call, str starts printing with (println) like this:



(b) "Nous allons habiller pour la randonn195169e, selon la m195169t195169o. nn</td></tr>"



How can I convert (b) to (a) so I can (println) (a) to a static HTML file?
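The digits in (b) look like the decimal values of UTF-8 bytes. "é" is U+00E9, which UTF-8 encodes as the bytes 195 and 169, so "randonn195169e" reads as "randonnée" with that byte pair written out in decimal. This sketch assumes the raw string actually contains backslash escapes such as \195\169 (the backslashes may have been lost when the post was rendered). Under that assumption, it turns each run of escapes back into bytes and decodes the bytes as UTF-8. It is written in Python for illustration, and the function name `decode_decimal_escapes` is hypothetical.

```python
import re

# Assumption: each escape is a backslash plus exactly three decimal digits,
# e.g. "\195\169", and each one stands for one byte of a UTF-8 sequence.
ESCAPE = re.compile(r"\\(\d{3})")


def decode_decimal_escapes(text: str) -> str:
    """Replace runs of \\ddd byte escapes with the UTF-8 text they encode."""
    def replace_run(match):
        raw = bytes(int(d) for d in ESCAPE.findall(match.group(0)))
        return raw.decode("utf-8")  # raises if the bytes are not valid UTF-8

    # Consecutive escapes are grouped so multi-byte sequences decode together.
    return re.sub(r"(?:\\\d{3})+", replace_run, text)


garbled = r"Nous allons habiller pour la randonn\195\169e, selon la m\195\169t\195\169o."
print(decode_decimal_escapes(garbled))
```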



Do I have to make a Unicode build?