I'm currently writing an application that allows one to store images, and then tag these images. I'm using Python and the Peewee ORM (http://charlesleifer.com/docs/peewee/), which is very similar to Django's ORM.
My data model looks like this (simplified):
class Image(BaseModel):
key = CharField()
class Tag(BaseModel):
tag = CharField()
class TagRelationship(BaseModel):
relImage = ForeignKeyField(Image)
relTag = ForeignKeyField(Tag)
Now, I understand conceptually how to query for all Images that have a given set of tags:
SELECT Image.key
FROM Image
INNER JOIN TagRelationship
ON Image.ID = TagRelationship.ImageID
INNER JOIN Tag
ON TagRelationship.TagID = Tag.ID
WHERE Tag.tag
IN ( 'A' , 'B' ) -- list of multiple tags
GROUP BY Image.key
HAVING COUNT(*) = 2 -- where 2 == the number of tags specified, above
However, I also want to be able to do more complex searches. Specifically, I'd like to be able to specify a list of "all tags" - i.e. an image must have all of the specified tags to be returned, along with a list of "any" and a list of "none".
EDIT: I'd like to clarify this a bit. Specifically, the above query is an "all tags"-style query. It returns Images that have all the given tags. I want to be able to specify something like: "Give me all images that have the tags (green, mountain), any one of the tags (background, landscape) but not the tags (digital, drawing)".
Now, ideally, I'd like this to be one SQL query, because pagination then becomes very easy with LIMIT and OFFSET. I've actually got an implementation working whereby I just load everything into Python sets and then use the various intersection operators. What I'm wondering is if there's a method of doing this all at once?
Also, for those interested, I've emailed the author of Peewee about how to represent the above query using Peewee, and he responded with the following solution:
Image.select(['key']).group_by('key').join(TagRelationship).join(Tag).where(tag__in=['tag1', 'tag2']).having('count(*) = 2')
Or, alternatively, a shorter version:
Image.filter(tagrelationship_set__relTag__tag__in=['tag1', 'tag2']).group_by(Image).having('count(*) = 2')
Thanks in advance for your time.