crc32

Started by Sammo, January 18, 2004, 02:47:24 PM

Previous topic - Next topic

Sammo

I translated the apparently straightforward crc32 algorithm available http://www.thoughtstuff.com/rme/crc.lisp">here.  A couple of questions:



From the original ...
(defun update-crc (crc buf)
  (declare (type (unsigned-byte 32) crc)
  (type (simple-array (unsigned-byte 8)) buf)
  (optimize speed))
  (setf crc (logxor crc #xffffffff))
  (dotimes (n (length buf))
    (let ((i (logand #xff (logxor crc (aref buf n)))))
      (setf crc (logxor (aref *crc-table* i) (ash crc -8)))))
  (logxor crc #xffffffff))
I wrote ...
(define (update-crc crc buf)
    (set 'crc (^ crc 0xFFFFFFFF))
    (dotimes (n (length buf))
        (set 'i (& 0xFF (^ crc (char (nth n buf)))))
        (set 'crc (^ (nth i CRC-TABLE) (<< crc 8))) )
    (^ crc 0xFFFFFFFF))
in which I interpreted '(ash crc -8)' to mean '(<< crc 8)'.  Did I get that right?  The original concludes with ...
(defun crc (buf)
    (update-crc 0 buf))

;;; output should be #xCBF43926
(defun test-crc ()
  (let ((a (make-array 9 :element-type '(unsigned-byte 8)
      :initial-contents '(#x31 #x32 #x33 #x34 #x35 #x36 #x37 #x38 #x39))))
    (crc a)))
from which I wrote ...
(define (crc buf)
    (update-crc 0 buf))

(define (test-crc)
    (crc "123456789"))
and I invoke this in the 'tk' environment with
(format "%08X" (crc "123456789"))
You'll notice that I'm computing the crc32 of a character string instead of an array of bytes. Assuming I translated the original's #x31 #x32 ... #x39 correctly into "123456789", I don't get the expected 0xCBF43926 result.

Lutz

#1
everythings seems to be right in your translation, this CRC-TABLE is huge! perhaps a typo in there?



the best way to debug this would be to write a 'C' version of this thing and then compare each individual xor/shift result in the loop. Or perhaps there some crc utilities available somewher to compare results?



Lutz

Lutz

#2
The left shift << in newLISP is a rotating shift: bits shifted out at the left come in in on the right side, to avoid this do a:





(<< (& 0x7FFFFFFF crc) 8) instaed of (<< crc 8)





this masks the most left bit to zero. I will mention this in the documentation.



Lutz

Lutz

#3
correction:



(<< (& 0x00FFFFFF crc) 8)



you might have to mask all 8 bits on the left, not only 1 as you are shifing 8 bits.



Lutz

Sammo

#4
Thank you, Lutz, for the hint.



After finding two crc32 programs with outputs that agreed with each other and one of them having Pascal or Delphi code, I learned that (ash crc -8) means (>> crc 8).  In the Pascal code this was written (CRC32 shr 8) but apparently doesn't drag the most significant bit along during the shift.  To accomplish this in newLISP, I wrote the necessary 8-bit right shift as a one-bit shift followed by a 7-bit shift after masking off the high-order bit. See the two '(set 'j ...) lines in the following:
(define (update-crc crc buf)
  (set 'crc (^ crc 0xFFFFFFFF))
  (dotimes (n (length buf))
   (set 'i (& 0xFF (^ crc (char (nth n buf)))))
   (set 'j (>> crc 1))
   (set 'j (>> (& 0x7FFFFFFF j) 7))
   (set 'crc (^ (nth i CRC-TABLE) j)))
  (^ crc 0xFFFFFFFF))
With these changes, this code matches the original's result for the "123456789" string and with every result generated by the two CRC32 programs I found on the web. Along the way I verified the 256 entries in the CRC-TABLE.



The code also works when written as:
(define (update-crc crc buf)
(set 'crc (^ crc 0xFFFFFFFF))
(dotimes (n (length buf))
(set 'i (& 0xFF (^ crc (char (nth n buf)))))
(set 'crc (^ (nth i CRC-TABLE) (>> (& 0x7FFFFFFF (>> crc 1)) 7))) )
(^ crc 0xFFFFFFFF) )

Lutz

#5
congratulations, another function for our collection, and I was wrong earlier: the left shift in newLISP is not a rotating shift but a normal left shift.



But as you found out, the right shift in newLISP copies the most left bitposition so:



1000 >> 1100 >> 1110 etc.



but



0100 >> 0010 >> 0001 etc.



Both >> and << rely internally on the << and >> 'C' operators



Also, instead of:  (char (nth n buff)) you can just do (char buff n) because the 'char' function can take an additional offset parameter into a buffer, this way 'char' can be used a 'character array' function.



Lutz

frontera000

#6
maby I am doing something wrong... but



> (format "%08x" (>> 0x8000000 1))

"04000000"



the manual seems to say that most significant bit should be duplicated.



Actually, this behavior is what I want, but I just wanted to make sure that what manual says is correct , but it does not do what it says.

Sammo

#7
Is the problem that newlisp integers are now 64 bits so that 0x80000000 does not represent a 1 in the most significant bit position? Wouldn't 0x8000000000000000 be a 1 followed by 63 zeros? Also, the format statement requires a different format string for 64-bit integers. In Windows, try:



(format "%I64x" (>> 0x8000000000000000 1))

--> "c000000000000000"



which shows that '>>' copies the most significant bit.

frontera000

#8
In Windows and Linux the printf() for 64 integers use different format strings, I64 in Windows ii in Linux, I think. Depending on format() to do hex conversion of integer values always bugged me, especially after all integers have become 64 bit in newLISP.



So I wrote a little thing:

(set 'hextab "0123456789abcdef")

(define (int->hexstr num)
(set 'hexstr "")
(for (i 0 63 4)
(push (hextab (& (>> num i) 0xf)) hexstr))
(append "0x" hexstr))


> (int->hexstr 0x8000000000000000)

"0x8000000000000000"

> (int (int->hexstr 0x8000000000000000))

9223372036854775807

> (int->hexstr (int (int->hexstr 0x8000000000000000)))

"0x7fffffffffffffff"

> (format "%I64x" (int "0x8000000000000000"))

"7fffffffffffffff"



I don't know why (int "0x8000000000000000") is printed this way.



furthermore,



> (format "%I64x" (int "0xc000000000000000"))

"7fffffffffffffff"

> (format "%I64x" (int "0xa000000000000000"))

"7fffffffffffffff"



i guess the largest 64 bit signed integer ix 0x7fffffffffffffff



Is there a way to get unsigned number?



> (format "%I64x" 0xa000000000000000)

"a000000000000000"





anyway, i wrote:


Quote
(define (hexstr->int str)

  (set 'num 0)

  (set 'str (pop str 2))

  (for (i 0 15 1)

    (| (<< (find (str i) hextab) (* 4 i)) num)))


Now,



> (int->hexstr (hexstr->int (int->hexstr 0x8000000000000000)))

"0x8000000000000000"





To verify that right shift drags the sign bit to the right:



> (int->hexstr (>> 0x8000000000000000 1))

"0xc000000000000000"

> (hexstr->int (int->hexstr (>> 0x8000000000000000 1)))

-4611686018427387904

> (int->hexstr (hexstr->int (int->hexstr (>> 0x8000000000000000 1))))

"0xc000000000000000"

>

frontera000

#9
Also, is there a plan to add SHA-1 or MD-5 into newLISP?



I guess I can write one in newLISP but it seems that will be slower than C.



Some languages have these hash digests built in. I think they can be added with minimal overhead.

Lutz

#10
It's a slippery slope, by the time you have all of security math in newLISP it has doubled or tripled in size :(. I looked into md-5, but even the smallest implementation, I could find was 5k in object code, and that is just one of the numerous algorithms, which are in use in the security field, and new ones are constantly invented.



For md5 you could just do:
((exec (string "md5 -q " buffer)) 0)

The best thing would probably be, to come up with a C library which has all of security math in it and then writing a module file to import it. I am no expert in this field, but could imagine that some of that stuff is already out there as C libraries?



Lutz

frontera000

md5
#11
I wrote md5. It's probably not the fastest implementation.



;;; md5 in newLISP

;;
;; based on RFC 1321
;;

(define (F x y z)
  (| (& x y) (& (& 0xffffffff (~ x)) z)))

(define (G x y z)
  (| (& x z) (& y (& 0xffffffff (~ z)))))

(define (H x y z)
  (^ x y z))

(define (I x y z)
  (^ y (| x (& 0xffffffff (~ z)))))

(define (rotate-left x n)
  (| (& 0xffffffff (<< x n)) (& 0xffffffff (>> x (- 32 n)))))

(define (FF a b c d x s ac)
  (set 'a (& 0xffffffff (+ a (F b c d) x ac)))
  (set 'a (rotate-left a s))
  (set 'a (& 0xffffffff (+ a b))))

(define (GG a b c d x s ac)
  (set 'a (& 0xffffffff (+ a (G b c d) x ac)))
  (set 'a (rotate-left a s))
  (set 'a (& 0xffffffff (+ a b))))

(define (HH a b c d x s ac)
  (set 'a (& 0xffffffff (+ a (H b c d) x ac)))
  (set 'a (rotate-left a s))
  (set 'a (& 0xffffffff (+ a b))))

(define (II a b c d x s ac)
  (set 'a (& 0xffffffff (+ a (I b c d) x ac)))
  (set 'a (rotate-left a s))
  (set 'a (& 0xffffffff (+ a b))))

(define (md5-init )
  (set 'md5-i '(0 0))
  (set 'md5-in (dup 0 64))
  (set 'md5-digest (dup 0 16))
  (set 'md5-buf '(0x67452301 0xefcdab89 0x98badcfe 0x10325476)))

(define (md5-update inbuf inlen)
  ;; compute number of bytes mod 64
  (set 'mdi (& (>> (md5-i 0) 3) 0x3f))

  ;; update number of bits
  (if (< (+ (md5-i 0)  (<< inlen 3)) (md5-i 0))
      (nth-set (md5-i 1) (+ 1 (md5-i 1))))

  (nth-set (md5-i 0) (+ (md5-i 0) (<< inlen 3)))
  (nth-set (md5-i 1) (+ (md5-i 1) (>> inlen 29)))

  (set 'inbuf-index 0)
  (while (> inlen 0)
;; add new character to buffer, increment mdi
(nth-set (md5-in mdi) (inbuf inbuf-index))
(set 'mdi (+ mdi 1))
(set 'inbuf-index (+ inbuf-index 1))

;; transform if necessary
(if (= mdi 0x40)
    (begin
      (set 'ii 0)
      (set 'in (dup 0 16))
      (for (i 0 15 1)
   (nth-set (in i) (| (<< (char->int (md5-in (+ ii 3))) 24)
      (<< (char->int (md5-in (+ ii 2))) 16)
      (<< (char->int (md5-in (+ ii 1))) 8)
      (char->int (md5-in ii))))
   (set 'ii (+ ii 4)))
      (transform  in)
      (set 'mdi 0)))
(set 'inlen (- inlen 1))))

(define (char->int x)
  (if (integer? x) x (char x)))

(define (md5-final)
  (set 'in (dup 0 16))

  ;; save number of bits
  (nth-set (in 14) (md5-i 0))
  (nth-set (in 15) (md5-i 1))

  ;; compute number of bytse mod 64
  (set 'mdi (& (>> (md5-i 0) 3) 0x3f))

  ;; pad out to 56 mod 64
  (if (< mdi 56)
      (set 'padlen (- 56 mdi))
      (set 'padlen (- 120 mdi)))

  (set 'padding (dup 0 64))
  (nth-set (padding 0) 0x80)
  (md5-update padding padlen)

  ;; append lenth in bits and transform
  (set 'ii 0)
  (for (i 0 13 1)
       (nth-set (in i) (| (<< (char->int (md5-in (+ ii 3))) 24)
 (<< (char->int (md5-in (+ ii 2))) 16)
 (<< (char->int (md5-in (+ ii 1))) 8) (char->int (md5-in ii))))
       (set 'ii (+ ii 4)))
  (transform in)

  ;; store buffer in digest
  (set 'ii 0)
  (for (i 0 3 1)
       (nth-set (md5-digest ii) (& (md5-buf i) 0xff))
       (nth-set (md5-digest (+ ii 1)) (& (>> (md5-buf i) 8) 0xff))
       (nth-set (md5-digest (+ ii 2)) (& (>> (md5-buf i) 16)  0xff))
       (nth-set (md5-digest (+ ii 3)) (& (>> (md5-buf i) 24)  0xff))
       (set 'ii (+ ii 4))))

(define (transform  in)
  (set 'a (md5-buf 0))
  (set 'b (md5-buf 1))
  (set 'c (md5-buf 2))
  (set 'd (md5-buf 3))

  ;; Round 1
  (set 'S11 7)
  (set 'S12 12)
  (set 'S13 17)
  (set 'S14 22)
  (set 'a (FF a b c d (in 0) S11 3614090360))
  (set 'd (FF d a b c (in 1) S12 3905402710))
  (set 'c (FF c d a b (in 2) S13  606105819))
  (set 'b (FF b c d a (in 3) S14 3250441966))
  (set 'a (FF a b c d (in 4) S11 4118548399))
  (set 'd (FF d a b c (in 5) S12 1200080426))
  (set 'c (FF c d a b (in 6) S13 2821735955))
  (set 'b (FF b c d a (in 7) S14 4249261313))
  (set 'a (FF a b c d (in 8) S11 1770035416))
  (set 'd (FF d a b c (in 9) S12 2336552879))
  (set 'c (FF c d a b (in 10) S13 4294925233))
  (set 'b (FF b c d a (in 11) S14 2304563134))
  (set 'a (FF a b c d (in 12) S11 1804603682))
  (set 'd (FF d a b c (in 13) S12 4254626195))
  (set 'c (FF c d a b (in 14) S13 2792965006))
  (set 'b (FF b c d a (in 15) S14 1236535329))

  ;; Round 2
  (set 'S21 5)
  (set 'S22 9)
  (set 'S23 14)
  (set 'S24 20)
  (set 'a (GG a b c d (in 1) S21 4129170786))
  (set 'd (GG d a b c (in 6) S22 3225465664))
  (set 'c (GG c d a b (in 11) S23  643717713))
  (set 'b (GG b c d a (in 0) S24 3921069994))
  (set 'a (GG a b c d (in 5) S21 3593408605))
  (set 'd (GG d a b c (in 10) S22   38016083))
  (set 'c (GG c d a b (in 15) S23 3634488961))
  (set 'b (GG b c d a (in 4) S24 3889429448))
  (set 'a (GG a b c d (in 9) S21  568446438))
  (set 'd (GG d a b c (in 14) S22 3275163606))
  (set 'c (GG c d a b (in 3) S23 4107603335))
  (set 'b (GG b c d a (in 8) S24 1163531501))
  (set 'a (GG a b c d (in 13) S21 2850285829))
  (set 'd (GG d a b c (in 2) S22 4243563512))
  (set 'c (GG c d a b (in 7) S23 1735328473))
  (set 'b (GG b c d a (in 12) S24 2368359562))

  ;; Round 3
  (set 'S31 4)
  (set 'S32 11)
  (set 'S33 16)
  (set 'S34 23)
  (set 'a (HH a b c d (in 5) S31 4294588738))
  (set 'd (HH d a b c (in 8) S32 2272392833))
  (set 'c (HH c d a b (in 11) S33 1839030562))
  (set 'b (HH b c d a (in 14) S34 4259657740))
  (set 'a (HH a b c d (in 1) S31 2763975236))
  (set 'd (HH d a b c (in 4) S32 1272893353))
  (set 'c (HH c d a b (in 7) S33 4139469664))
  (set 'b (HH b c d a (in 10) S34 3200236656))
  (set 'a (HH a b c d (in 13) S31  681279174))
  (set 'd (HH d a b c (in 0) S32 3936430074))
  (set 'c (HH c d a b (in 3) S33 3572445317))
  (set 'b (HH b c d a (in 6) S34   76029189))
  (set 'a (HH a b c d (in 9) S31 3654602809))
  (set 'd (HH d a b c (in 12) S32 3873151461))
  (set 'c (HH c d a b (in 15) S33  530742520))
  (set 'b (HH b c d a (in 2) S34 3299628645))

  ;; Round 4
  (set 'S41 6)
  (set 'S42 10)
  (set 'S43 15)
  (set 'S44 21)
  (set 'a (II a b c d (in 0) S41 4096336452))
  (set 'd (II d a b c (in 7) S42 1126891415))
  (set 'c (II c d a b (in 14) S43 2878612391))
  (set 'b (II b c d a (in 5) S44 4237533241))
  (set 'a (II a b c d (in 12) S41 1700485571))
  (set 'd (II d a b c (in 3) S42 2399980690))
  (set 'c (II c d a b (in 10) S43 4293915773))
  (set 'b (II b c d a (in 1) S44 2240044497))
  (set 'a (II a b c d (in 8) S41 1873313359))
  (set 'd (II d a b c (in 15) S42 4264355552))
  (set 'c (II c d a b (in 6) S43 2734768916))
  (set 'b (II b c d a (in 13) S44 1309151649))
  (set 'a (II a b c d (in 4) S41 4149444226))
  (set 'd (II d a b c (in 11) S42 3174756917))
  (set 'c (II c d a b (in 2) S43  718787259))
  (set 'b (II b c d a (in 9) S44 3951481745))

  (nth-set (md5-buf 0) (+ a (md5-buf 0)))
  (nth-set (md5-buf 1) (+ b (md5-buf 1)))
  (nth-set (md5-buf 2) (+ c (md5-buf 2)))
  (nth-set (md5-buf 3) (+ d (md5-buf 3))))

(define (md5-string str)
  (md5-init)
  (md5-update str (length str))
  (md5-final)
  (set 'result "")
  (for (i 0 15 1)
       (set 'result (append result (format "%02x" (md5-digest i)))))
  result)