pop3 email handling with newLISP

Started by didi, October 22, 2008, 10:14:36 AM


didi

The pop3 function works fine; I can download the files from my email account. The files are in MIME 1.0 format, and I can read them with Outlook Express after renaming the extension to .eml.



But what if I want to work with the email text in newLISP?

The From:, To:, Cc:, Subject: and Date: fields are easy to get.

But what about the email content? In one example, charset="iso-8859-1" is given.

I can imagine different ways:

- start an external MIME decoder from newLISP; is there one?

- use a DLL; any idea where I can get one, and can someone give a practical example?

- write my own decoder in newLISP, but I'm not sure, for example, how many charsets I should support.

Lutz

#1
ISO-8859-1 is just a standard Windows one-byte-wide character set, where the bytes between 128 and 255 are used to encode European special characters like German umlauts, French and Spanish accented letters, etc. There is nothing to MIME-translate.



Text will be displayed correctly in Windows applications (e.g. notepad.exe or newLISP-GS), but the command shell uses a different character set (the old OEM PC character set). It will only display characters below 128 correctly; for bytes of 128 and above it displays graphics characters when using 'print(ln)', and decimal codes in evaluation results.
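
To illustrate the point about one-byte characters: if the text later has to go somewhere that expects UTF-8 (an HTML page, for example), every ISO-8859-1 byte of 128 or above becomes a two-byte UTF-8 sequence. The following is only a minimal sketch and not part of the original posts; the helper name latin1-to-utf8 is made up for illustration, and it works on raw bytes on the assumption that the input really is ISO-8859-1.

```newlisp
; Sketch: convert an ISO-8859-1 byte string to UTF-8.
; latin1-to-utf8 is a hypothetical helper name.
(define (latin1-to-utf8 str)
  (if (empty? str)
      ""
      (let (bytes (unpack (dup "b" (length str)) str))  ; list of byte values
        (join
          (map (fn (b)
                 (if (< b 128)
                     (pack "b" b)                        ; ASCII stays one byte
                     (pack "bb" (| 0xC0 (>> b 6))        ; lead byte
                                (| 0x80 (& b 0x3F)))))   ; continuation byte
               bytes)))))

; "\252" and "\223" are the decimal escapes for u-umlaut and sharp s
(set 'utf8-text (latin1-to-utf8 "Gr\252\223e"))
(println (length utf8-text))
```

The result is a plain byte string, so it can be written out with write-file regardless of which console is in use.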

didi

#2
That sounds good. I'll try it with newLISP alone. The emails I've checked have a plain-text part, and often there is a second HTML part.

Kirill

#3
You should be aware of the transfer encoding. If the encoding is 8bit or binary, you're fine. If it's quoted-printable or base64, then you'll need to decode the data to get the message text in that charset.
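
A minimal sketch of that check, assuming the headers and body of a part are already available as two separate strings. The helper names part-encoding and decode-part are hypothetical; base64-dec is newLISP's built-in decoder. Quoted-printable parts are returned undecoded by this sketch.

```newlisp
; Sketch: look at Content-Transfer-Encoding and decode base64 bodies.
; part-encoding and decode-part are hypothetical helper names.
(define (part-encoding headers)
  ; returns the lower-cased encoding token, or "7bit" if none is declared
  (if (regex {content-transfer-encoding:\s*([^\s;]+)} headers 1)
      (lower-case $1)
      "7bit"))

(define (decode-part headers body)
  (if (= (part-encoding headers) "base64")
      ; strip line breaks and other white space before decoding
      (base64-dec (replace {\s+} body "" 0))
      ; 7bit, 8bit and binary need no decoding
      body))

(set 'hdr "Content-Type: text/plain; charset=\"iso-8859-1\"\r\nContent-Transfer-Encoding: base64\r\n")
(println (decode-part hdr "SGVs\r\nbG8=\r\n"))
```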



Also, note that the message headers will be encoded (or rather should be, but sometimes aren't) if they contain 8-bit data.
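
Encoded headers normally use the RFC 2047 "encoded-word" form, for example =?iso-8859-1?Q?Gr=FC=DFe?=. The following is a hedged sketch of how such words could be decoded in newLISP; decode-q and decode-header are hypothetical names, and the result is a byte string in whatever charset the encoded-word names (no charset conversion is attempted).

```newlisp
; Sketch: decode RFC 2047 encoded-words in one header line.
; decode-q and decode-header are hypothetical helper names.
(define (decode-q text)
  (replace "_" text " ")                ; in Q encoding "_" stands for a space
  (replace {=([0-9A-Fa-f][0-9A-Fa-f])} text (pack "b" (int $1 0 16)) 0))

(define (decode-header line)
  ; collect all encoded-words first, then substitute them one by one
  (let (words (or (find-all {=\?[^?]+\?[bBqQ]\?[^?]*\?=} line $0 0) '()))
    (dolist (w words)
      ; fields: "=" charset encoding text "="
      (letn (fields (parse w "?")
             decoded (if (= (upper-case (fields 2)) "B")
                         (base64-dec (fields 3))
                         (decode-q (fields 3))))
        (replace w line decoded)))
    line))

(println (decode-header "Subject: =?iso-8859-1?Q?Gr=FC=DFe_aus_K=F6ln?="))
```

Headers that contain raw 8-bit data without this encoding pass through unchanged.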



-- Kirill

didi

#4
Thanks. Most of the samples here are "quoted-printable"; in the plain text there are things like "=3D". I think I have to interpret this as a hex code and replace it with a char, e.g. (char 0x3D).
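
A minimal sketch of that idea, with two details that are easy to miss: a "=" at the very end of a line is a soft line break and has to be removed, and the hex codes should be replaced in a single pass so that a decoded "=" is not decoded again. The name qp-decode is hypothetical. pack is used instead of char on the assumption that one output byte per code is wanted even in UTF-8 builds of newLISP.

```newlisp
; Sketch of a quoted-printable decoder; qp-decode is a hypothetical name.
(define (qp-decode text)
  ; "=" followed by a line break is a soft line break: drop both
  (replace "=\r?\n" text "" 0)
  ; every remaining =XX is one byte written as two hex digits
  (replace {=([0-9A-Fa-f][0-9A-Fa-f])} text (pack "b" (int $1 0 16)) 0))

; "1=3D41" must become "1=41", not "1A"
(println (qp-decode "a=3Db, 1=3D41, soft=\r\nbreak"))
```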

didi

#5
This is my code, which works with most of my emails:
; email-to-text.lsp  dmemos 9-11-2008 12:00
; get clear text from multipart/alternative emails
; in pop3 or .eml format

( change-dir "D:\\arb" )
   ; test directory
( silent
  ; no console echo, for test

  ( set 'source_txt (read-file "test.pop3" ))  
  ; test-email to var 'source_txt
 
  ( regex "to:.*\r\n" source_txt 1 )
  ; get first  to: field
 
 ( set 'to_field  $0 )

  ( regex "from:.*\r\n" source_txt 1 )
  ; get first from: field
  ( set 'from_field  $0 )

  ( regex "subject:.*\r\n" source_txt 1 )
  ; get first subject: field
  ( set 'subject_field  $0 )

  ( if (regex "boundary.*\r\n" source_txt 1 )
  ; get boundary pattern
   ( begin
      ( set 'boundary_line $0 )
      ( set 'xlist (parse boundary_line "\"" ))
      ( set 'bound_pattern ( xlist  1 ))
   ))

 (set 'source_parts ( parse source_txt bound_pattern ))
 ; divide email in parts

  ( set 'mbreak nil  ) ; look for text/plain part
  ( dolist ( x source_parts mbreak )
    (if ( find "text/plain" x )
      ( begin
      ( set 'raw_text x )      
      ( set 'mbreak true ))))

  ( set 'idxx ( find "quoted-printable" raw_text ))
  ; delete header in text-part
  ( set 'raw_text ( (+ idxx 16) raw_text ))

  ; generate new header for output
  ( set 'head ( append to_field  from_field  subject_field  ))
  ( push head raw_text )

  ; replace special-chars in a single pass,
  ; so that a decoded "=" is never decoded a second time
  ( replace "=\r\n" raw_text "" )
  ; remove soft line breaks first
  ( replace "=([A-F0-9][A-F0-9])" raw_text ( char ( int $1 0 16 )) 1 )

  ; save filtered text to test-output-file
  ( write-file "out_txt.txt" raw_text )

 ) ; end-of-silent
 ( println (length source_txt))
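
The script above finds the start of the text by searching for the word "quoted-printable", which only works when that header happens to be the last one in the part. A possibly more robust alternative, sketched here under the assumption that lines end in CR LF as delivered by POP3, is to split each part at its first empty line. The name split-part is hypothetical.

```newlisp
; Sketch: split a MIME part into (headers body) at the first empty line.
; split-part is a hypothetical helper name.
(define (split-part part)
  (let (pos (find "\r\n\r\n" part))
    (if pos
        (list (0 pos part)            ; everything before the empty line
              ((+ pos 4) part))       ; everything after it
        (list part ""))))             ; no empty line: headers only

(set 'part "\r\nContent-Type: text/plain\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\nHello=3DWorld\r\n")
(set 'pieces (split-part part))
(println (pieces 1))
```

The first element can then be searched for the charset and transfer encoding, and the second element is the still-encoded text.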


No great newLISP art, but it works fine. I have written a second short program, which uses this text and generates an HTML page for my new small project, http://www.thanks-with-love.com .



The next steps will be a GUI version in newLISP and support for other email formats.

Maybe I could use it later for an auto-archive program: I send an email to the archive program, which stores it as clear text and builds an index.