What is the best way to design a tag-based data table with Sqlite?
Asked Answered
R

1

8

Json received from the server has this form.

[
 {
  "id": 1103333,
  "name": "James",
  "tagA": [
    "apple",
    "orange",
    "grape"
  ],
  "tagB": [
    "red",
    "green",
    "blue"
  ],
  "tagC": null
  },

  {
  "id": 1103336,
  "name": "John",
  "tagA": [
    "apple",
    "pinapple",
    "melon"
  ],
  "tagB": [
    "black",
    "white",
    "blue"
  ],
  "tagC": [
    "London",
    "New York"
    ]
  }
]

An object can have multiple tags, and a tag can be associated with multiple objects.

In this list, I want to find an object whose tagA is apple or grape and tagB is black.

This is the first table I used to write.

create table response(id integer primary key, name text not null, tagA text, 
tagB text, tagC text)

select * from response where (tagA like '%apple%' or tagA like '%grape%') and (tagB like '%black%')

This type of table design has a problem that the search speed is very slow because it does not support the surface function of the fts function when using ORM library such as Room.

The next thing I thought about was to create a table for each tag.

create table response(id integer primary key, name text not null)

create table tagA(objectID integer, value text, primary key(objectID, value))

create table tagB(objectID integer, value text, primary key(objectID, value))

create table tagC(objectID integer, value text, primary key(objectID, value))

select * from response where id in ((select objectId from tagA where value in ('apple','grape')) 
intersect
(select objectId from tagB where value in 'black'))

This greatly increases the insertion time and the capacity of the APK (roughly twice as much per additional table), but the search speed is far behind that of the FTS virtual table.

I want to avoid this as much as I use FTS tables because there are more things I need to manage myself.

There are a lot of things I missed (index etc.) but I can not figure out what it is.

How can I optimize the database without using the FTS method?

Rios answered 2/7, 2018 at 3:33 Comment(0)
R
8

You could use a reference table (aka mapping table along with a multitude of other names) to allow a many-many relationship between tags (single table for all) and objects (again single table).

So you have the objects table each object having an id and you have the tags table again with an id for each object. So something like :-

DROP TABLE IF EXISTS object_table;
CREATE TABLE IF NOT EXISTS object_table (id INTEGER PRIMARY KEY, object_name);
DROP TABLE IF EXISTS tag_table;
CREATE TABLE IF NOT EXISTS tag_table (id INTEGER PRIMARY KEY, tag_name);

You'd populate both e.g.

INSERT INTO object_table (object_name) VALUES
    ('Object1'),('Object2'),('Object3'),('Object4');
INSERT INTO tag_table (tag_name) VALUES
    ('Apple'),('Orange'),('Grape'),('Pineapple'),('Melon'),
    ('London'),('New York'),('Paris'),
    ('Red'),('Green'),('Blue'); -- and so on

The you'd have the mapping table something like :-

DROP TABLE IF EXISTS object_tag_mapping;
CREATE TABLE IF NOT EXISTS object_tag_mapping (object_reference INTEGER, tag_reference INTEGER);

Overtime as tags are assigned to objects or vice-versa you add the mappings e.g. :-

INSERT INTO object_tag_mapping VALUES
    (1,4), -- obj1 has tag Pineapple
    (1,1),  -- obj1 has Apple
    (1,8), -- obj1 has Paris
    (1,10), -- obj1 has green
    (4,1),(4,3),(4,11), -- some tags for object 4
    (2,8),(2,7),(2,4), -- some tags for object 2
    (3,1),(3,2),(3,3),(3,4),(3,5),(3,6),(3,7),(3,8),(3,9),(3,10),(3,11); -- all tags for object 3

You could then have queries such as :-

SELECT object_name, 
    group_concat(tag_name,' ~ ') AS tags_for_this_object 
FROM object_tag_mapping 
JOIN object_table ON object_reference = object_table.id
JOIN tag_table ON tag_reference = tag_table.id
GROUP BY object_name
;
  • group_concat is an aggregate function (applied per GROUP) that concatenates all values found for the specified column with (optional) separator.

The result of the query being :-

enter image description here

The following could be a search based upon tags (not that you'd likely use both tag_name and a tag_reference) :-

SELECT object_name, tag_name 
FROM object_tag_mapping 
JOIN object_table ON object_reference = object_table.id
JOIN tag_table ON tag_reference = tag_table.id
WHERE tag_name = 'Pineapple' OR tag_reference = 9
;

This would result in :-

enter image description here


  • Note this is a simple overview e.g. you may want to consider having the mapping table as a WITHOUT ROWID table, perhaps have a composite UNIQUE constraint.

Additional re comment :-

How do I implement a query that contains two or more tags at the same time?

This is a little more complex if you want specific tags but still doable. Here's an example using a CTE (Common Table Expression) along with a HAVING clause (a where clause applied after the output has been generated, so can be applied to aggregates) :-

WITH cte1(otm_oref,otm_tref,tt_id,tt_name, ot_id, ot_name) AS 
    (
        SELECT * FROM object_tag_mapping 
        JOIN tag_table ON tag_reference = tag_table.id 
        JOIN object_table ON object_reference = object_table.id
        WHERE tag_name = 'Pineapple' OR tag_name = 'Apple'
    )
SELECT ot_name, group_concat(tt_name), count() AS cnt FROM CTE1 
GROUP BY otm_oref
HAVING cnt = 2
;

This results in :-

enter image description here

Raddle answered 2/7, 2018 at 4:44 Comment(7)
Thank you for your reply. Your approach has shown a fast search speed while taking up a relatively small APK capacity. There is just one question. How do I implement a query that contains two or more tags at the same time? I tried it with a simple, where tag_reference = A and tag_reference = B, but no answer. I think this is not the form of where tag_reference in ((select ~) intersect (select ...)).Rios
The last example searches for two tags. Note that tag_reference is an integer (the id of the tag) so tag_reference = 'A' won't work, tag_name = 'A' would find a tag with A. The last example shows both being used. For a host of tags then using SELECT ........ IN(list of tags) can be used.Raddle
Oh, I think I presented the example wrong. I want to return Object1 and Object3 when I select an object that has both 'Pineapple (4)' and 'Apple (1)' in the list. However, In query is a state in which all objects are returned (Object1 through Object4 have either Apple or Pineapple).Rios
One way could be to use a Common Table Expression e.g. WITH cte1(otm_oref,otm_tref,tt_id,tt_name, ot_id, ot_name) AS ( SELECT * FROM object_tag_mapping JOIN tag_table ON tag_reference = tag_table.id JOIN object_table ON object_reference = object_table.id WHERE tag_name = 'Pineapple' OR tag_name = 'Apple' ) SELECT ot_name, group_concat(tt_name), count() AS cnt FROM CTE1 GROUP BY otm_oref HAVING cnt = 2 ;. However, this does go way beyond answering the actual question that was for an effective design for using many-many relationships.Raddle
I seem to have a clue to solve my problem. Thank youRios
@Raddle Awesome! Any idea which type would the column tags_for_this_object be retrieved in if we are implementing it in an Android app using Room library? Would that be a String[]?Staggard
@ShahoodulHassan I believe String or String[]. However i seldom use Room.Raddle

© 2022 - 2024 — McMap. All rights reserved.