How to download only new emails from IMAP?

4

8

I have an application that archives emails over IMAP. It handles many IMAP accounts, all of which need to be archived.

At the moment the application periodically connects to each IMAP account and downloads only the new emails. My issue is that every time it connects to an account, it checks all emails in all folders and downloads only those that haven't been downloaded yet (I store the Message-ID of every email and download only emails whose Message-ID is not stored). I want to know whether there is an alternative, because checking all emails takes some time (2-5 minutes for 10-20K emails).

I use the JavaMail API to connect to the IMAP accounts.

Advocaat answered 18/1, 2011 at 14:14 Comment(0)
6

The javadoc helps:

IMAPFolder provides the methods:

getMessagesByUID(long start, long end) and

getUID(Message message)

With getUID() you can get the UID of the last message you have already downloaded. Pass that UID to getMessagesByUID() as the start of the range, and call getUIDNext() to find the UID the server will assign to the next message, which marks the end of the range.
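A minimal sketch of the bookkeeping this implies, assuming you store one bookmark per folder. The class and method names (UidBookmark, firstUidToFetch, hasNewMessages) are made up for illustration and are not part of JavaMail. It only decides where to start; the actual fetch would go through something like getMessagesByUID(first, UIDFolder.LASTUID). The UIDVALIDITY check reflects the IMAP rule that UIDs stay valid only while the folder's UIDVALIDITY value is unchanged, which you can read with getUIDValidity().

```java
/** Hypothetical per-folder bookmark: UIDVALIDITY plus the highest UID already archived. */
final class UidBookmark {
    final long uidValidity;
    final long lastSeenUid;

    UidBookmark(long uidValidity, long lastSeenUid) {
        this.uidValidity = uidValidity;
        this.lastSeenUid = lastSeenUid;
    }

    /**
     * Returns the first UID worth fetching. If there is no bookmark yet, or the
     * server reports a different UIDVALIDITY, the old UIDs mean nothing any more
     * and the whole folder has to be scanned again, starting at UID 1.
     */
    static long firstUidToFetch(UidBookmark saved, long serverUidValidity) {
        if (saved == null || saved.uidValidity != serverUidValidity) {
            return 1L;
        }
        return saved.lastSeenUid + 1;
    }

    /**
     * Returns true if the folder may contain messages that are not archived yet.
     * Assumption of this sketch: a non-positive UIDNEXT means "unknown", in which
     * case the folder is scanned to be safe.
     */
    static boolean hasNewMessages(UidBookmark saved, long serverUidValidity, long serverUidNext) {
        if (serverUidNext <= 0) {
            return true;
        }
        return firstUidToFetch(saved, serverUidValidity) < serverUidNext;
    }
}
```

With a bookmark of (UIDVALIDITY 42, last UID 100), a server reporting UIDVALIDITY 42 and UIDNEXT 101 has nothing new, so the folder can be skipped without fetching anything.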

Assemble answered 18/1, 2011 at 14:26 Comment(11)
That is not a solution for me because UIDs change over time; this is why I use the Message-ID to track downloaded emails. - Advocaat
As I understand the getUID(Message message) method, you can get the current UID of a given message from the server. So it would be possible to get the current UID of the last message you have and then use getUIDNext() and getMessagesByUID() to find the last message and download that range. - Assemble
A message UID will change when the message is moved from one folder (or "mailbox", in IMAP terminology) to another. "Unique identifiers are assigned in a strictly ascending fashion in the mailbox; as each message is added to the mailbox it is assigned a higher UID than the message(s) which were added previously." telebog seems to want to download each message exactly once, regardless of which folder it's in. - Judicator
Hmm... I can store the UID of the last downloaded email from each folder and, instead of getting all messages from a folder, get only the new messages using getMessagesByUID. But before downloading a new email I must check via its Message-ID that it hasn't been downloaded already. Thanks. - Advocaat
No. If it is a new message in this folder (newer UID), you have to download it, since it is new in this folder. I think it will consume a lot of CPU time to check through your local database and search for the message in other folders. And I think it won't happen very often that one message is copied to several folders. - Assemble
It is a new message in this folder, but it could have been moved from another folder, and if it was downloaded from that folder I don't want to download it again. My application only stores emails, it doesn't sync them :) - Advocaat
Yep, I think telebog has it. Fetch the Message-ID headers of messages with UIDs equal to or higher than the folder's previous UIDNEXT value. - Judicator
Then you can load only the headers of the "new" email and use them to check whether you have already downloaded it (perhaps MD5 them). If not, you can download the complete message. That will save a lot of traffic if the messages are large. - Assemble
Yes, that's it, thank you both very much. I still have one issue: not all emails have a Message-ID. Gmail, for example, generates one if it can't find it in the source, but other servers don't. At the moment I don't store emails that have no Message-ID, but this is not a good solution. - Advocaat
I would never trust the Message-ID alone (the sender can set it to any value), since it is not guaranteed to be unique. Use the Message-ID (if available), the sender address and the time of the email to identify the message. You can write them into a string and calculate an MD5 hash. That gives you a practically unique identifier for each email. - Assemble
Yes, and if one of them is null I don't consider it. Could you tell me how to calculate an MD5 hash? - Advocaat
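For the MD5 question, here is a minimal sketch using only the JDK's java.security.MessageDigest. The class name MessageFingerprint, the method md5Hex and the way the three fields are joined are assumptions made for illustration; a null Message-ID or sender is simply treated as an empty string, in line with ignoring missing fields.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper that turns Message-ID, sender and sent time into one key. */
final class MessageFingerprint {

    /** Returns the MD5 of the three fields as a 32-character lowercase hex string. */
    static String md5Hex(String messageId, String sender, long sentMillis) {
        // Join the fields with a separator so that ("ab", "c") and ("a", "bc") differ.
        String key = (messageId == null ? "" : messageId) + '\n'
                + (sender == null ? "" : sender) + '\n'
                + sentMillis;
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm name that every Java platform must provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

The sent time would come from something like message.getSentDate().getTime(). MD5 is used here only as a compact lookup key, not for security; swapping the algorithm name for "SHA-256" works the same way.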
3

Check only the headers, and bail out when you reach a known message (the last one you processed).

For instance (I feel extra nice today), here is an excerpt from real production code. Some parts were cut, so it might not compile. state.processed is some set, preferably backed by a LinkedHashMap [keySet()] with a maximum size enforced through boolean removeEldestEntry().

    // mailSession, state.processed and logger come from the surrounding class.
    Store store = null;
    try {
      store = mailSession.getStore("imap");
      store.connect();
      Folder folder = store.getFolder("INBOX");
      folder.open(Folder.READ_ONLY);

      // Walk the folder from the newest message backwards in chunks of 50
      // and stop once 10 already-processed messages have been seen.
      int count = folder.getMessageCount();
      for (int localProc = 0, chunk = 49; localProc < 10 && count > 0; count -= chunk + 1) {
        Message[] messages = folder.getMessages(Math.max(count - chunk, 1), count);

        // Prefetch only the envelope and the Message-ID header, not the bodies.
        FetchProfile fp = new FetchProfile();
        fp.add(FetchProfile.Item.ENVELOPE);
        fp.add("Message-ID");
        // add more headers, if need be
        folder.fetch(messages, fp);

        for (int i = messages.length; --i >= 0;) {
          // can check for an abort request here
          Message message = messages[i];

          String[] ids = message.getHeader("Message-ID");
          String msgId = (ids != null && ids.length > 0) ? ids[0] : null;
          // add() returns false when the ID is already in the set
          if (msgId != null && !state.processed.add(msgId)) {
            if (++localProc >= 10) {
              break;
            }
            continue;
          }
          // process here, catch exceptions, etc.
        }
      }

      folder.close(false);
    } catch (NoSuchProviderException e) {
      logger.log(Level.SEVERE, "No mail provider", e);
    } catch (MessagingException e) {
      logger.log(Level.SEVERE, "Mail messaging exception", e);
    } finally {
      if (store != null) {
        try {
          store.close();
        } catch (MessagingException e) {
          // nothing useful to do if closing fails
        }
      }
    }
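The bounded set mentioned above can be sketched like this. The original state.processed was cut from the excerpt, so this is a guess at its shape, and the names BoundedIdSet and create are made up. It wraps a LinkedHashMap whose removeEldestEntry() drops the oldest ID once a maximum size is exceeded.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical factory for a set that remembers only the most recently added IDs. */
final class BoundedIdSet {

    /** Creates a set that evicts its oldest entry once it holds more than maxEntries. */
    static Set<String> create(final int maxEntries) {
        return Collections.newSetFromMap(new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                // Called after every insertion; returning true drops the eldest entry.
                return size() > maxEntries;
            }
        });
    }
}
```

As with any Set, add() returns false when the ID is already present, which is what the msgId check in the loop relies on. The set is not thread-safe; wrap it with Collections.synchronizedSet if several threads share it.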
Peristome answered 18/1, 2011 at 14:30 Comment(2)
This is a little dicey, as a big move of known messages on top of a bunch of new messages will cause you to miss the new ones. - Judicator
True, that's a good point. The code is for reading failed mails, so I guess it has never been an issue there. The code should keep at least the previous count and attempt to retrieve at least that many messages back. - Peristome
1

Filter on the SEEN flag. This flag is intended for finding new messages. The one caveat is that if your user is using multiple readers, a message may already have been seen in another reader.

Smile answered 18/1, 2011 at 17:18 Comment(3)
You mean the RECENT flag. And yeah, any other client's connection to that folder will unset RECENT on all messages in the folder. - Judicator
I've seen it documented as SEEN, but it may be RECENT in your implementation. IMAP keeps track of messages which have not yet been read by a client. For a reader interface, I would expect anything I read with any client to be marked read in my interface. My background in this case is more in administering a server than programming a client. - Smile
The Seen flag is a standard flag. Java will automatically flag a message as seen when it is retrieved. This is the behavior I expect from all clients. It is possible to clear the flag. - Smile
0

The Message-ID that comes as part of the header is always unique, even if you set it manually. I have tested it with Gmail and Rackspace.

Senary answered 10/6, 2013 at 7:18 Comment(1)
But it is not required and may easily be absent altogether, on Gmail too. - Flit
