How can I download IMAP mail attachments over SSL and save them locally using Perl?
Asked Answered
F

4

7

I need suggestions on how can I download attachments from my IMAP mails which have attachments and current date in subject line i.e. YYYYMMDD format and save the attachments to a local path.

I went through the Perl module Mail::IMAPClient and am able to connect to the IMAP mail server, but need help on other tasks. One more thing to note is that my IMAP sever requires SSL auth.

Also the attachments could be gz, tar or tar.gz files.

Fucoid answered 16/3, 2010 at 10:29 Comment(0)
B
8

A simple program that does what you want is below.

#! /usr/bin/perl

use warnings;
use strict;

The minimum version for Email::MIME is for when walk_parts was introduced.

use Email::MIME 1.901;
use IO::Socket::SSL;
use Mail::IMAPClient;
use POSIX qw/ strftime /;
use Term::ReadKey;

You don't want to hardcode your password in your program, do you?

sub read_password {
  local $| = 1;
  print "Enter password: ";

  ReadMode "noecho";
  my $password = <STDIN>;
  ReadMode "restore";

  die "$0: unexpected end of input"
    unless defined $password;

  print "\n";
  chomp $password; 
  $password;
}

Connect using SSL. We ought to be able to be able to do this with a simple Ssl parameter to the constructor, but some vendors have chosen to break it in their packages.

my $pw = read_password;
my $imap = Mail::IMAPClient->new(
 #Debug    => 1,
  User     => "you\@domain.com",
  Password => $pw,
  Uid      => 1,
  Peek     => 1,  # don't set \Seen flag
  Socket   => IO::Socket::SSL->new(
                Proto    => 'tcp',
                PeerAddr => 'imap.domain.com',
                PeerPort => 993,
              ),
);

die "$0: connect: $@" if defined $@;

If you want a folder other than the inbox, change it.

$imap->select("INBOX")
  or die "$0: select INBOX: ", $imap->LastError, "\n";

Using IMAP search, we look for all messages whose subjects contain today's date in YYYYMMDD format. The date can be anywhere in the subject, so, for example, a subject of "foo bar baz 20100316" would match today.

my $today = strftime "%Y%m%d", localtime $^T;
my @messages = $imap->search(SUBJECT => $today);
die "$0: search: $@" if defined $@;

For each such message, write its attachments to files in the current directory. We write the outermost layer of attachments and do not dig for nested attachments. A part with a name parameter in its content type (as in image/jpeg; name="foo.jpg") is assumed to be an attachment, and we ignore all other parts. A saved attachment's name is the following components separated by -: today's date, its IMAP message ID, a one-based index of its position in the message, and its name.

foreach my $id (@messages) {
  die "$0: funky ID ($id)" unless $id =~ /\A\d+\z/;

  my $str = $imap->message_string($id)
    or die "$0: message_string: $@";

  my $n = 1;
  Email::MIME->new($str)->walk_parts(sub {
    my($part) = @_;
    return unless ($part->content_type =~ /\bname=([^"]+)/ 
                or $part->content_type =~ /\bname="([^"]+)"/); # " grr...

    my $name = "./$today-$id-" . $n++ . "-$1";
    print "$0: writing $name...\n";
    open my $fh, ">", $name
      or die "$0: open $name: $!";
    print $fh $part->content_type =~ m!^text/!
                ? $part->body_str
                : $part->body
      or die "$0: print $name: $!";
    close $fh
      or warn "$0: close $name: $!";
  });
}
Belly answered 16/3, 2010 at 20:48 Comment(5)
Thanks Gbacon for brief details with code. I need one more help. With your code I can only download the text attachments. Can you please also advise the changes if i also want to download .tar.gz or gz files.Fucoid
@Octopus Do the content-types of the compressed attachments lack name attributes?Belly
yes, the compressed attachments not showing name of the attachments.Fucoid
@Octopus It's hard to give a good suggestion without knowing how the messages are formatted. What's the Content-Type of a gzipped attachment? (Common ones are application/octet-stream or application/x-gzip.) Are these messages machine generated and consistently formatted? Do the messages have other multipart content? Can you edit your question to add a sample message with the base64 data elided?Belly
Just to add more for this answer. I have found that some of your suppliers have bogus mailservers, and the name atribute comes without quotation, like Content-Type: application/pdf; name=065011-5.PDF so, the $part->content_type =~ should be altered to /\bname=([^"]+)/;.Atalanta
O
3

If you want to stick with Mail::IMAPClient, you can tell it to use SSL.

Alternatively, Net::IMAP::Simple::SSL could also help you with that. The interface is the same as the one provided by Net::IMAP::Simple.

Once you have the message, Parsing emails with attachments shows how to extract attachments. I haven't tried it, but my hunch is that using Email::MIME::walk_parts can be used to significantly simplify the script shown in that PerlMonks article.

Olvan answered 16/3, 2010 at 11:40 Comment(0)
A
1

I have changed a little my approach to download attachments from @Greg, since it was shown unreliable to download SAP XML attachments. They do not follow the Content-Type: application/pdf; name=XXXXX standard so, it gave me a lot of problems. Example:

Content-ID: <[email protected]>
Content-Disposition: attachment;
    filename="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.xml"
Content-Type: application/xml
Content-Descripton: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.xml

The rest of the program remains almost the same. The difference is that i´m now using MIME::Parser to retrieve all the message, and i throw away all that is body and image related. I also removed the Peek => 1 since i wanted to mark the messages as read after they got downloaded(and only navigate on unread messages). Log::Logger helped to create a centralized log:

--- Snippet 1 --- Libs

#! /usr/bin/perl
use warnings;
use strict;
use Mail::IMAPClient; #IMAP connection
use Log::Logger; #Logging facility
use MIME::Parser; #Mime "slicer"
use DateTime; #Date
use File::Copy; #File manipulation
use File::Path qw( mkpath );

--- Snippet 2 --- Log initialization

$log_script = new Log::Logger;
$log_script->open_append("/var/log/downloader.log");
my $dt = DateTime->now;
$dt->set_time_zone('America/Sao_Paulo');
$hour = (join ' ', $dt->ymd, $dt->hms);

--- Snippet 3 --- Mail downloader

$imap->select($remote_dir) or ($log_script->log("$hour: Account $account, Dir $remote_dir. Check if this folder exists") and next);
# Select unseen messages only
my @mails = ($imap->unseen);
foreach my $id (@mails) {
  my $subject = $imap->subject($id);
  my $str = $imap->message_string($id) or ($log_script->log("$hour: Account $account, Email \<$subject\> with problems. Crawling through next email") and next);
  my $parser = MIME::Parser->new();
  $parser->output_dir( $temp_dir );
  $parser->parse_data( $str );
  opendir(DIR, $temp_dir);
  foreach $file (readdir(DIR)) {
    next unless (-f "$temp_dir/$file");
    if ("$file" =~ /^msg/i){ # ignores body
      $body .= "$file ";
      unlink "$temp_dir/$file";
    } elsif (("$file" =~ /jpg$/i) # ignores signature images
          or ("$file" =~ /gif$/i)
          or ("$file" =~ /png$/i)) {
      $body .= "$file ";
      unlink "$temp_dir/$file";
    } else { # move attachments to destination dir
      $log_script->log("$hour: Account: $account, File $file, Email \<$subject\>, saved $local_dir");
      move "$temp_dir/$file", "$local_dir";
    };
 };
  $log_script->log("$hour: Files from email \<$subject\> ignored as they are body related stuff: $body") if $body;
Atalanta answered 26/1, 2016 at 10:25 Comment(0)
I
1

I prefer the Mail::IMAPClient approach outlined by Greg, but it is essential to binmode() the output filehandle, namely to prevent Windows from assuming 0x0A bytes to be linefeeds and replacing them by CRLFs and so invalidating binary files. I'm sorry to disguise this as an answer, comments would be appropriate, but I don't own any reputation by now.

Isotron answered 7/6, 2018 at 21:36 Comment(3)
German? (just kidding, but you don't have to make excuses). Welcome to StackOverflow!Ironmaster
How could you know? I tried my best to let it sound danish :)Isotron
When I first put my foot on US territory and asked a taxi driver if he would drive me from one terminal of JFK to the next, the driver's first word was "German?". Since then I know that Germans are known for making excuses - or asking a taxi driver whether it is OK for him to make his job. Back to SO: it is OK to post an answer. Period. You don't have to make excuses - even if you think a comment would be more appropriate.Ironmaster

© 2022 - 2024 — McMap. All rights reserved.