Recommended SQL database design for tags or tagging [closed]

Asked 21/8, 2008 at 19:18 Answered 28/11, 2015 at 9:51

Solved sql database-design tags data-modeling tagging

343

I've heard of a few ways to implement tagging; using a mapping table between TagID and ItemID (makes sense to me, but does it scale?), adding a fixed number of possible TagID columns to ItemID (seems like a bad idea), Keeping tags in a text column that's comma separated (sounds crazy but could work). I've even heard someone recommend a sparse matrix, but then how do the tag names grow gracefully?

Am I missing a best practice for tags?

Belisle answered 21/8, 2008 at 19:18 Comment(7)

Okay this is question #20856, the (almost) same question is #48475 asked at least two weeks after this question was asked. – Belisle 7/10, 2008 at 16:2

Another interesting question is "How SO implements tags?" – Waverley 28/11, 2011 at 19:19

Another interesting question is "Would you internationalize them, and if so, how?" – Caprification 3/12, 2013 at 11:14

Interesting comparison (Postgres specific): databasesoup.com/2015/01/tag-all-things.html – Circle 11/5, 2015 at 10:10

see also https://mcmap.net/q/94352/-database-design-for-tagging-closed – Kellene 22/12, 2015 at 11:50

if you use RDBMS you have to use a minor-major tag technique else it is not scalable when users will actually use tags. – Accountant 18/6, 2021 at 9:46

What do you mean by "minor-major tag technique"? – Spire 12/9, 2021 at 17:8

476

Three tables (one for storing all items, one for all tags, and one for the relation between the two), properly indexed, with foreign keys set running on a proper database, should work well and scale properly.

Table: Item
Columns: ItemID, Title, Content

Table: Tag
Columns: TagID, Title

Table: ItemTag
Columns: ItemID, TagID

Vigorous answered 21/8, 2008 at 19:22 Comment(8)

This is known as the “Toxi” solution, you can find additional information about it here : howto.philippkeller.com/2005/04/24/Tags-Database-schemas – Acrobat 28/6, 2009 at 12:41

One thing not shown here is hierarchal "tags" or categories in the Tag table. This is commonly needed on sites that have categories and subcategories but need the flexibility of tagging. For example, recipe sites, auto parts sites, business directories, etc. These types of data don't usually fit into only one single category so tagging is the answer but you need to use something like the Nested Set Model or the Adjacency List Model in your Tag table. – Sporophore 21/1, 2011 at 20:50

I agrree with HK1 is it possible with above structure + Table : TagGroup Columns : TagGropuId, Title Table: Tag Columns: TagID, Title , TagGroupId – Autoroute 11/2, 2011 at 8:35

I thought that the table ItemTag needs a Column ItemTagID. Am I wrong? do I miss anything? – Insolation 21/11, 2016 at 16:16

I think the table Tag should only have the title as PK, having a separate ID as primary key especially an auto incrementing one would result in the table Tag getting filled with duplicate values upon insertion/update (in a ManyToMany relationship). – Musetta 3/1, 2017 at 20:53

@ftvs: link again broken, the new link is howto.philippkeller.com/2005/04/24/Tags-Database-schemas – Aborn 12/11, 2017 at 7:2

this is standard and the best RDBMS approach, if you use 2 ItemTag tables splitting tags to minor and major you are also top scalable – Accountant 18/6, 2021 at 9:47

Could you elaborate on "2 ItemTag tables splitting tags to minor and major"? – Spire 12/9, 2021 at 18:33

110

Normally I would agree with Yaakov Ellis but in this special case there is another viable solution:

Use two tables:

Table: Item
Columns: ItemID, Title, Content
Indexes: ItemID

Table: Tag
Columns: ItemID, Title
Indexes: ItemId, Title

This has some major advantages:

First it makes development much simpler: in the three-table solution for insert and update of item you have to lookup the Tag table to see if there are already entries. Then you have to join them with new ones. This is no trivial task.

Then it makes queries simpler (and perhaps faster). There are three major database queries which you will do: Output all Tags for one Item, draw a Tag-Cloud and select all items for one Tag Title.

All Tags for one Item:

3-Table:

SELECT Tag.Title 
  FROM Tag 
  JOIN ItemTag ON Tag.TagID = ItemTag.TagID
 WHERE ItemTag.ItemID = :id

2-Table:

SELECT Tag.Title
FROM Tag
WHERE Tag.ItemID = :id

Tag-Cloud:

3-Table:

SELECT Tag.Title, count(*)
  FROM Tag
  JOIN ItemTag ON Tag.TagID = ItemTag.TagID
 GROUP BY Tag.Title

2-Table:

SELECT Tag.Title, count(*)
  FROM Tag
 GROUP BY Tag.Title

Items for one Tag:

3-Table:

SELECT Item.*
  FROM Item
  JOIN ItemTag ON Item.ItemID = ItemTag.ItemID
  JOIN Tag ON ItemTag.TagID = Tag.TagID
 WHERE Tag.Title = :title

2-Table:

SELECT Item.*
  FROM Item
  JOIN Tag ON Item.ItemID = Tag.ItemID
 WHERE Tag.Title = :title

But there are some drawbacks, too: It could take more space in the database (which could lead to more disk operations which is slower) and it's not normalized which could lead to inconsistencies.

The size argument is not that strong because the very nature of tags is that they are normally pretty small so the size increase is not a large one. One could argue that the query for the tag title is much faster in a small table which contains each tag only once and this certainly is true. But taking in regard the savings for not having to join and the fact that you can build a good index on them could easily compensate for this. This of course depends heavily on the size of the database you are using.

The inconsistency argument is a little moot too. Tags are free text fields and there is no expected operation like 'rename all tags "foo" to "bar"'.

So tldr: I would go for the two-table solution. (In fact I'm going to. I found this article to see if there are valid arguments against it.)

Olden answered 20/9, 2013 at 18:37 Comment(14)

Does "Index: ItemId, Title" mean an index for each or one index containing both? – Caprification 3/12, 2013 at 11:23

Normally two indexes. Could depend on the database you're using, though. – Olden 4/12, 2013 at 16:50

In the tag table is ItemId and Tag a composite key? or do you have a PK as well? – Paregoric 21/3, 2014 at 18:41

I think this depends on the kind of framework you are using / how you are used to write code. Sometime its easier to have a primary key and sometimes it is not. In my case I do have a PK because I like it more to use something like DELETE?id=5 than DELETE?idemid=3&tag=Foo in the web client. From a data consistency standpoint on the other hand I wouldn't use an additional PK. And for teaching purposes I wouldn't recommend it :) – Olden 22/3, 2014 at 8:54

Regarding Scheintod's two table solution, it should work, but there are a couple limitations. 1. If your tags have any data associated with it (For example, if your tag has a description or some other data) multiply tagged objects could not have their metadata edited consistently without doing a lot of work to keep them in sync. 2. There is some extra storage involved when there are long titles and tags are used many times. In a 2 table solution, the title would be re-entered many times. Perhaps some DBs would optimize the data as part of their storage/indexing, but I am not sure. In my case, – Solfa 10/11, 2014 at 20:11

Hi, you said - "taking in regard the savings for not having to join and the fact that you can build a good index" - the 'small' table will also be indexed, the index size will be considerably less. join on FK which is int must be much faster than querying a huge index of nvarchar(n). With that said, I like this idea and will probably use a derivation of it to join many different item types (tables) with common tags. – Selves 24/11, 2014 at 19:57

HI h.alex. The point is, that you have to do less joins which can turn out to be faster than having them joined with an int FK. But considering that normally Tag tables tend to be small the speed difference (in which direction whatsoever) might not be noticeable at all. Here is another article which compares roughly the both techniques: dba.stackexchange.com/questions/15897/… – Olden 27/11, 2014 at 11:36

correct me if i'm wrong, but when i delete an item wont there be a null value in the row, and if i delete all the items wont there be a whole lot of repeating set of rows all with null values for itemid? – Entertain 24/12, 2015 at 5:57

@deez: I'm not sure, what you are talking about (but it's early in the morning). If you delete an item you have to delete the tags with it. Depending on the actual database implementation (foreign key constraint) the database won't let you delete it until you have deleted the tags. – Olden 25/12, 2015 at 7:16

in my application i have some tags that come pre loaded i cant dare delete those tags, but i can do cascade nullify which leaves a lot of duplicate useless rows, i do like the idea of eliminating the mapping table though – Entertain 25/12, 2015 at 15:26

where is the original tag list stored ? web page ? – Lallygag 19/1, 2016 at 2:43

this way you cannot create "unused" tags so an "add tag" feature has to be performed on an Item. On the other method, the "add tag" feature can be performed independently – Theologize 22/4, 2017 at 11:42

Hi I wonder if you still believe 2-table solution is better 3-table solution after all these years ? – Inexpiable 28/9, 2018 at 11:0

@Quilang. I still believe it depends on what you are up to :) I implemented it in both ways in different projects. In my last one I ended up with a 3 table solution because I needed a "tag-type" (or some other meta information on the tag) and could reuse some code from a close cousin of tags: parameters. But in the very same project I used exactly this method for an even closer cousin: flags (eg. 'sold', 'new', 'hot') – Olden 24/10, 2018 at 6:56

If you are using a database that supports map-reduce, like couchdb, storing tags in a plain text field or list field is indeed the best way. Example:

tagcloud: {
  map: function(doc){ 
    for(tag in doc.tags){ 
      emit(doc.tags[tag],1) 
    }
  }
  reduce: function(keys,values){
    return values.length
  }
}

Running this with group=true will group the results by tag name, and even return a count of the number of times that tag was encountered. It's very similar to counting the occurrences of a word in text.

Pancake answered 7/9, 2008 at 19:41 Comment(3)

+1 Nice to see some NoSQL implementations also. – Fagoting 18/3, 2011 at 16:24

@NickRetallack The link is not working. If you could, please update this answer. – Univalve 18/2, 2012 at 10:21

Ok I replaced the link with one to archive.org – Pancake 20/2, 2012 at 4:47

Use a single formatted text column[1] for storing the tags and use a capable full text search engine to index this. Else you will run into scaling problems when trying to implement boolean queries.

If you need details about the tags you have, you can either keep track of it in a incrementally maintained table or run a batch job to extract the information.

[1] Some RDBMS even provide a native array type which might be even better suited for storage by not needing a parsing step, but might cause problems with the full text search.

Polyneuritis answered 7/9, 2008 at 11:47 Comment(5)

Are you aware of any full-text search engine that doesn't find variations on a word? For example, searching for book returns books? Also, what do you do about tags like "c++"? SQL Server, for example, would strip the plus signs in the index. Thanks. – Slapbang 18/1, 2011 at 1:41

Try Sphinx - sphinxsearch.com – Vandyke 9/2, 2011 at 13:45

This 3-parts tutorial maybe useful for those who are going this route (full text search). It's using PostgreSQL native facilities: shisaa.jp/postset/postgresql-full-text-search-part-1.html – Timework 14/5, 2014 at 21:42

is this better than the selected answer in terms of performance ? – Thadeus 22/10, 2017 at 7:49

how about storing in using varchar 255, comma separeted tags and adding kfull text index on it ? – Thadeus 22/10, 2017 at 7:50

I've always kept the tags in a separate table and then had a mapping table. Of course I've never done anything on a really large scale either.

Having a "tags" table and a map table makes it pretty trivial to generate tag clouds & such since you can easily put together SQL to get a list of tags with counts of how often each tag is used.

Lewan answered 21/8, 2008 at 19:23 Comment(1)

This is even more easy if you don't use an mapping table :) – Olden 27/9, 2013 at 19:11

-2

I would suggest following design : Item Table: Itemid, taglist1, taglist2
this will be fast and make easy saving and retrieving the data at item level.

In parallel build another table: Tags tag do not make tag unique identifier and if you run out of space in 2nd column which contains lets say 100 items create another row.

Now while searching for items for a tag it will be super fast.

Truck answered 28/11, 2015 at 9:51 Comment(1)

en.wikipedia.org/wiki/First_normal_form although there are exceptions to the this, you can denormalize, but not here – Entertain 24/12, 2015 at 6:27

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags