IMAP "search header" command failing when search-text contains exclamation mark (!), ampersand (&), etc

I'm accessing GMail's IMAP interface through Python. I run a command like this:

UID SEARCH HEADER Message-ID "[email protected]"

That succeeds (returns 1 UID of the matching message, or 0 if it doesn't exist). However, if the search-text contains certain chars (like & or !), the search-text is truncated at that point. This means:

UID SEARCH HEADER Message-ID "[email protected]"

Is treated the same as

UID SEARCH HEADER Message-ID ""

Also:

UID SEARCH HEADER Message-ID "[email protected]"

Is treated as:

UID SEARCH HEADER Message-ID "abc"

I've gone through the IMAP language spec, and from the ABNF language spec it seems like those chars should be valid. Why is gmail truncating these search phrases at the "!" and "&" chars? Is there a way to escape them? (I've tried !, fails as a badly-encoded string). Is there an RFC or doc that shows what really should be accepted? Is this a bug in gmail's imap implementation?

I've also tried literal format, same results:

UID SEARCH HEADER Message-ID {15}
[email protected]

Still treated as:

UID SEARCH HEADER Message-ID {3}
abc

Thanks!

IMAP RFC 3501 Search Command: https://www.rfc-editor.org/rfc/rfc3501#section-6.4.4
Formal syntax: https://www.rfc-editor.org/rfc/rfc3501#section-9
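
On the escaping question: in the RFC 3501 formal syntax, the only characters that need a backslash inside a quoted string are the double quote and the backslash itself, so "!" and "&" are sent as-is. The following is a minimal Python sketch of how a conforming client could format the search text either as a quoted string or as a literal. The function names are made up for illustration, and none of this changes how Gmail handles the text on its side.

```python
def imap_quoted(value):
    """Return value as an RFC 3501 quoted string.

    Only backslash and double quote are escaped; '!' and '&' pass through.
    """
    if any(ch in "\r\n\0" or ord(ch) > 127 for ch in value):
        raise ValueError("use a literal for CR, LF, NUL or non-ASCII text")
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def imap_literal(value, encoding="utf-8"):
    """Return ({n} prefix, payload bytes); n counts octets, not characters."""
    payload = value.encode(encoding)
    return "{%d}" % len(payload), payload
```

Note that Python 3's imaplib does not add quotes around arguments for you, so a helper like imap_quoted would be applied before passing the value to IMAP4.uid().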

Vinavinaceous answered 6/3, 2012 at 18:20 Comment(11)
I can confirm that there is nothing special about using an exclamation mark in the search query. It is most likely that you found a bug in gmail. I suggest using several different IMAP servers during development, in particular since gmail's IMAP implementation is not well known for its conformity to the IMAP specification. - Grugru
Thanks nosid. Unfortunately, the IMAP server I need to use with this code is gmail, so testing on others won't help with this bug. But it is good to know that I'm not reading the spec wrong. I'll try to find a way to report this bug to google. - Vinavinaceous
Yes, I currently experience this problem when doing an IMAP search on Gmail through the alpine mail client, trying to select all messages with subjects containing !. - Odey
I'd also ask: How to overcome this bug in GMail and do such searches? - Odey
See also a discussion of this problem at groups.google.com/forum/#!topic/google-mail-xoauth-tools/… . - Odey
Google's IMAP search breaks things up into "words", which is probably why special characters get treated strangely. I echo the recommendation in the groups above: try using X-GM-RAW and sending google search keywords. - Substituent
@Substituent Thanks for echoing this useful recommendation! The X-GM-RAW extension is documented at developers.google.com/gmail/… , as was pointed out in https://mcmap.net/q/1470589/-search-utf-8-string-with-gmail-x-gm-raw-imap-command/94687 . - Odey
How do I search by headers' substring with X-GM-RAW? Is subject:(!) correct for searching for subjects containing an exclamation mark? Is there a family of rfc822* keys in GMail? - Odey
As for searching for ! in the subject, it does not seem to work with any query in the GMail web interface either (so X-GM-RAW wouldn't work either). See webapps.stackexchange.com/q/31322/15124 , webapps.stackexchange.com/q/52828/15124 . Very inconvenient! So an external IMAP client is not a solution for this kind of search (unless the client does the filtering itself, without relying on server responses to SEARCH). - Odey
@imz--IvanZakharyaschev check out my answer below. The same approach could be used no matter what search criteria you want to use. - Tout
@imz--IvanZakharyaschev please accept an answer. - Tout
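
For reference, the X-GM-RAW suggestion from the comments might look roughly like this from Python. This is only a sketch: it assumes `conn` is an authenticated imaplib connection to Gmail with a mailbox selected, it uses Gmail's rfc822msgid: search operator, and the helper names are invented here. The comments above suggest special characters may be mishandled by Gmail's own search as well, so this thread does not confirm that the approach works for IDs containing "!" or "&".

```python
def gm_raw_args(query):
    """Return SEARCH arguments for Gmail's X-GM-RAW key with a quoted query."""
    quoted = '"' + query.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return ["X-GM-RAW", quoted]


def search_by_message_id(conn, message_id):
    """Run UID SEARCH X-GM-RAW "rfc822msgid:<id>" and return UIDs as strings.

    conn is assumed to behave like imaplib.IMAP4 (uid() returns (typ, data)).
    """
    typ, data = conn.uid("SEARCH", *gm_raw_args("rfc822msgid:" + message_id))
    if typ != "OK" or not data or not data[0]:
        return []
    return [uid.decode("ascii") for uid in data[0].split()]
```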

I'm largely basing my answer on the discovery (by Max) in the comments to the original question that GMail's SEARCH implementation uses a backing database that has already split textual content into word tokens rather than storing the full text and doing a substring search.

So here's a possible workaround that you could use with GMail in C# using my MailKit library (which is a fairly low-level IMAP library so this should easily translate into basic pseudocode):

// requires: using System; using MailKit; using MailKit.Search;
// given: text = "[email protected]"

// split the search text on '!'
var words = text.Split (new char[] { '!' }, StringSplitOptions.RemoveEmptyEntries);

// build a search query...
var query = SearchQuery.HeaderContains ("Message-ID", words[0]);
for (int i = 1; i < words.Length; i++)
    query = query.And (SearchQuery.HeaderContains ("Message-ID", words[i]));

// this will result in a query like this:
// HEADER "Message-ID" "abc" HEADER "Message-ID" "[email protected]"

// Do the UID SEARCH with the constructed query:
// A001 UID SEARCH HEADER "Message-Id" "abc" HEADER "Message-Id" "[email protected]"
var uids = mailbox.Search (query);

// Now UID FETCH the ENVELOPE (and UID) for each of the potential matches:
// A002 UID FETCH <uids> (UID ENVELOPE)
var messages = mailbox.Fetch (uids, MessageSummaryItems.UniqueId |
    MessageSummaryItems.Envelope);

// Now perform a manual comparison of the Message-IDs to get only exact matches...
var matches = new UniqueIdSet (SortOrder.Ascending);
foreach (var message in messages) {
    if (message.Envelope.MessageId.Contains (text))
        matches.Add (message.UniqueId);
}

// 'matches' now contains only the set of UIDs that exactly match your search query
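
Since the question uses Python, the same idea can be sketched with the standard imaplib module. This is an illustrative translation rather than a tested Gmail client: the separator set "!&" is an assumption taken from the characters reported in this thread, the function names are invented, and `conn` is assumed to be an authenticated imaplib connection with a mailbox selected.

```python
import re
from email import message_from_bytes

# Assumed separator characters (from this thread, not a documented Gmail list).
SEPARATORS = "!&"


def split_tokens(text, separators=SEPARATORS):
    """Split text on the separator characters, dropping empty pieces."""
    pattern = "[" + re.escape(separators) + "]"
    return [piece for piece in re.split(pattern, text) if piece]


def quote(value):
    """Format value as an IMAP quoted string (escape backslash and quote)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_criteria(header, tokens):
    """Return SEARCH arguments; adjacent IMAP search keys are ANDed."""
    args = []
    for token in tokens:
        args += ["HEADER", header, quote(token)]
    return args


def search_exact(conn, text, header="Message-ID"):
    """Return UIDs (as str) of messages whose header value contains text."""
    tokens = split_tokens(text)
    if not tokens:
        return []
    typ, data = conn.uid("SEARCH", *build_criteria(header, tokens))
    if typ != "OK" or not data or not data[0]:
        return []
    matches = []
    for raw_uid in data[0].split():
        uid = raw_uid.decode("ascii")
        # One FETCH per candidate keeps response parsing simple (but slower).
        item = "(BODY.PEEK[HEADER.FIELDS (%s)])" % header.upper()
        typ, parts = conn.uid("FETCH", uid, item)
        for part in parts or []:
            if isinstance(part, tuple):
                value = message_from_bytes(part[1]).get(header, "")
                if text in value:
                    matches.append(uid)
    return matches
```

Fetching one candidate at a time is a deliberate simplification; a single UID FETCH over the whole set would be faster but needs more careful parsing of the response.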
Tout answered 15/9, 2016 at 10:34 Comment(0)

I've been hitting this issue myself for months now.

SEARCH HEADER Message-ID <-!&!...>

I ended up skipping some Message-ID searches where the ID starts with '<-'. I also see the problems with '&' and '!'. I'm not sure how to work around this well.

Have you ever heard back from Google about this bug?

Thanks much

Soble answered 4/7, 2012 at 1:44 Comment(0)