Please critique: web interaction script

Started by gregben, March 22, 2005, 01:47:40 AM


gregben

I'm trying to write a program in Newlisp that runs under Sun Solaris and:

1) Makes a list of md5 signatures of all files in /usr/bin.
2) Sends sections ("chunks") of the signature list to a special web page operated by Sun Microsystems for checking against a database of known-good signatures.
3) Searches the results returned by Sun for suspect file names, that is, those with signatures that weren't found in the database.

In the past I'd have written such a program in sh/bash or Perl. Here's what I've done so far using Newlisp. I'm looking to clean up the code to improve its efficiency and legibility.

#!/usr/bin/newlisp
#
# md5check
#
# Look for files in /usr/bin on the local machine that
# have md5 signatures that don't match those stored in
# the Sun Fingerprint database. This could indicate the
# file is a maliciously installed replacement for the
# genuine file supplied by Sun.

# The maximum number of md5 signatures passed to the
# Sun Fingerprint web page at one time. This is due to
# a restriction imposed by Sun on the number of lookups
# that can be done at once.

(constant 'md5s_at_once 100)

# The Sun fingerprint database URL.

(constant 'fingerprint_page "http://sunsolve.sun.com/pub-cgi/fileFingerprints.pl")

# Calculate the md5 signatures.

(set 'md5list (exec "cd /usr/bin ; md5sum *"))

# Prepare to loop through the signatures.

(set 'linecount 0)
(set 'md5_chunk "")

# Loop through md5 signatures, breaking into chunks.

(dolist (md5item md5list)
  (set 'md5_chunk (append md5_chunk md5item "\n"))
  (inc 'linecount)
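  # Send a chunk every md5s_at_once lines, and also after the last line
  # so a final, shorter chunk is not left unsent.
  (if (or (= (% linecount md5s_at_once) 0) (= linecount (length md5list)))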
    (begin
      # Prepend POST form variable name to chunk.
      (set 'md5_chunk (append "md5list=" md5_chunk))
      # Send chunk of signatures to Sun for lookup.
      (set 'result (post-url fingerprint_page md5_chunk ))
      # Search returned web page for non-matching files.
      (set 'no_match_list (exec {grep "\- 0 match"} result))
#      (println result)
      (println no_match_list)
#      (println (regex "- 0 match" result))
#      (set 'splitup (parse result "\n"))
#      (println splitup)
#      (dolist (spl splitup)
#        (println spl)
#      )
      (set 'md5_chunk "")
    )
  )
)

(exit)
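
One possible way to restructure the batching, sketched with hypothetical helpers that are not from the original post: split the signature list into sublists of at most md5s_at_once items up front, so the last, shorter chunk needs no special case, then join each sublist into a POST body. The names split-into-chunks and make-post-body are illustrative only.

```newlisp
;; Hypothetical helper: split a list into sublists of at most n items,
;; keeping a final shorter sublist when the length is not a multiple of n.
(define (split-into-chunks lst n)
  (let (chunks '() current '())
    (dolist (item lst)
      (push item current -1)              ; append item to the current chunk
      (if (= (length current) n)
        (begin
          (push current chunks -1)        ; chunk is full: store it
          (set 'current '()))))
    (if current (push current chunks -1)) ; leftover items, if any
    chunks))

;; Hypothetical helper: build the form body for one chunk of md5sum lines,
;; using the same "md5list=" variable name as the script above.
(define (make-post-body chunk)
  (append "md5list=" (join chunk "\n") "\n"))

(println (split-into-chunks '(1 2 3 4 5) 2))
;; → ((1 2) (3 4) (5))
```

In the script this would become a loop such as (dolist (chunk (split-into-chunks md5list md5s_at_once)) ...) that calls (post-url fingerprint_page (make-post-body chunk)) once per chunk, with no line counter to maintain.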


I'd like to come up with something better than using grep, but there doesn't appear to be an equivalent builtin.

Also, grep returns true in addition to the string results. The manual shows examples of using exec to feed data into an external command and to read data out of one, but not both at once, as I'm doing (sending a big string in and getting a big string out).
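
As the newLISP manual describes it, exec with a second string argument writes that string to the process's standard input and returns true if the process could be started (nil otherwise); it does not capture the process's output, which goes to standard output as usual. That would explain grep's matching lines appearing on the terminal, followed by the true printed by (println no_match_list). Below is a minimal pure-newLISP stand-in for a fixed-string grep, assuming the returned page can be split on newlines; grep-lines is a hypothetical helper name.

```newlisp
;; Hypothetical helper: return the lines of 'text' that contain the
;; fixed string 'pattern', roughly like `grep -F pattern`.
;; 'find' with two strings does a plain substring search and returns
;; an index (true even when 0) or nil.
(define (grep-lines text pattern)
  (filter (fn (line) (find pattern line))
          (parse text "\n")))

;; Example on made-up page text:
(println (grep-lines "foo - 1 match\nbar - 0 match\nbaz - 0 match" "- 0 match"))
;; → ("bar - 0 match" "baz - 0 match")
```

In the script, (set 'no_match_list (grep-lines result "- 0 match")) could take the place of the exec call and would leave a list of matching lines in no_match_list instead of true.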

Lutz

#1
Nice program. I like how 'post-url' is used and how you collect data into newLISP with the 'exec' function.



If I understand you correctly, you are looking for other ways to parse the filenames out of the web page in 'result' without using the external grep. This is a problem that comes up often. Look at the following line from a utility that fetches all of Norman's utilities: http://www.newlisp.org/code/get-normans.txt



(replace {href="(http://.*lsp)"} page (push $1 links) 0)



'replace' is used here to parse out the filenames. The work is done in the replacement expression (push $1 links), which pushes every filename found onto 'links', so 'links' ends up holding the list of filenames. For your script you could do something like



(replace {(.*)- 0 match} result (push $1 links) 0)



supposing that the filename is contained in (.*) and the web page in 'result'. I'm not sure at the moment whether you need a backslash before the minus sign; I don't think you do.
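
A small sketch of this suggestion, run against a hypothetical response, since the layout of Sun's actual results page isn't shown in this thread. It assumes each suspect filename is a single whitespace-free token immediately before " - 0 match", hence (\S+) instead of (.*), which would capture everything earlier on the line. Here the replacement expression returns $0 (the whole match) so the searched text is left unchanged, and push with -1 keeps the filenames in page order. The second function does the same search with find-all, which later newLISP releases provide; it may not exist in older versions. Both function names are hypothetical.

```newlisp
;; Hypothetical response text; the real results page layout may differ.
(set 'sample-result
  "/usr/bin/ls - 1 match\n/usr/bin/su - 0 match\n/usr/bin/login - 0 match\n")

;; Collect filenames with 'replace': the replacement expression runs once
;; per match with $1 bound to the captured filename; returning $0 puts the
;; matched text back unchanged.
(define (suspects-via-replace text)
  (let (links '())
    (replace {(\S+) - 0 match} text (begin (push $1 links -1) $0) 0)
    links))

;; The same search with 'find-all', which evaluates $1 for each match
;; and returns the results as a list.
(define (suspects-via-find-all text)
  (find-all {(\S+) - 0 match} text $1 0))

(println (suspects-via-replace sample-result))
;; → ("/usr/bin/su" "/usr/bin/login")
(println (suspects-via-find-all sample-result))
;; → ("/usr/bin/su" "/usr/bin/login")
```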



Lutz