Whole-word match only in Django query
Asked Answered
C

3

14

I am trying to write a Django query that will only match whole words. Based on the answer here, I've tried something like:

result = Model.objects.filter(text__iregex='\bsomeWord\b')

But this isn't returning the expected result. I also tried

result = Model.objects.filter(text__iregex=r'\bsomeWord\b')

to no avail. My end goal is to be able to pass in a string variable as well, something like:

result = Model.objects.filter(text__iregex=r'\b'+stringVariable+r'\b')

or

result = Model.objects.filter(text__iregex=r'\b %s \b'%stringVariable)

But right now I can't even get it to work with a raw string. I'm using PostgreSQL.

Contingence answered 21/2, 2013 at 8:22 Comment(0)
D
28

Use “\y” instead of “\b” when you're using PostgreSQL, this is because Django passes your regular expression straight down to PostgreSQL – so your RegEx's need to be compatible with it. You should be able to execute them from psql without any problems.

result = Model.objects.filter(text__iregex=r"\y{0}\y".format(stringVariable))

See https://www.postgresql.org/docs/9.1/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE

Denunciation answered 21/2, 2013 at 8:53 Comment(3)
Django doesn't do any translating of PCREs to PostgreSQL regular expressions (see: bit.ly/ZtpbgH). So you just have to use PostgreSQL's native regular expressions, see here: bit.ly/ZtpojUDenunciation
This works. Thanks for adding the explanation about \y as well. As for the '{0}'.format(stringVariable) part, is that just another way of writing '%s'%stringVariable? Both seem to work for me.Contingence
Yeah, effectively they both do the same. I just use the .format() way because Python 3 requires it (python.org/dev/peps/pep-3101).Denunciation
A
0

You might be able to get something by dropping the regex and using a few django lookups

result = Model.objects.filter(Q(text__contains=' someword ') |
                              Q(text__contains=' someword.') |
                              Q(text__istartswith = 'someword.' |
                              Q(text__istartswith = 'someword.' |
                              Q(text__iendswith = 'someword')

see here for docs.

I realize that's not so elegant (but makes for easy maintenance if you're not a fan of regex).

Afoul answered 21/2, 2013 at 8:26 Comment(4)
Because this doesn't perform a whole-word match; for instance, if I do a text__contains='tart' I pick up results like start.Contingence
my bad - I'm obviously not familiar with sql enough to know that 'whole word match' meant something specific. Anyway, I've updated my answer to improve it a bit.Afoul
Easy maintenance? There are lots of cases you're not covering there (between (), commas, etc).Shockheaded
Easy maintenance if you're not a fan of regex. You don't need to know django or python to read this - but I'd have to reference any regex that solves this problem. And yes, this doesn't catch everything - It would only work if you're confident that text contains only a subset of the language.Afoul
A
0

I had the same problem trying to match word boundaries using the Perl-compatible escape sequence \b. My backend database is MySQL.

I solved the problem by the character class expression [[:space:]], e.g.

        q_sum = Q()
        search_list = self.form.cleaned_data['search_all'].split(' ');
        for search_item in search_list:
            search_regex = r"[[:space:]]%s[[:space:]]" % search_item
            q_sum |= Q(message__iregex=search_regex)
        queryset = BlogMessages.objects.filter(q_sum).distinct()
Auriga answered 27/7, 2013 at 19:43 Comment(1)
This won't account for punctuation just so you know.Sheedy

© 2022 - 2024 — McMap. All rights reserved.