Junction tables vs foreign key arrays? [duplicate]
Asked Answered
G

1

25

I'm modeling many-to-many relationship where the relationship is accessed most of the time from one side only. It's more like a hierarchy, that is accessed top-down and not the other way around.

Survey has and belongs to many Questions has and belongs to many Answers.

Both relationships must be many-to-many because a same question can be re-used across different surveys and same answer in many questions. This is a requirement.

The standard M2M implementation would use two junction tables, surveys_questions and questions_answers. Instead, I'm thinking about using PostgreSQL's integer arrays to store question_ids in Survey and answer_ids in Question.

We can utilize the ANY operator to query all rows matching the foreign key array.

How would we query for all the surveys with their questions and questions's answers using SQL?

How can we match the order of the rows returned with the foreign key array? ie. using question_ids = [1,2,3] is guaranteed to return question rows with the order 1, 2, 3.

How does this perform performance wise compared to junction tables (assuming proper indexes, whatever they might be)?

Would you suggest this? Are there some resources about modeling M2M like this?

Update

There was a proposal to add referential integrity for array foreign keys to PostgreSQL 9.3, but it didn't get included: http://blog.2ndquadrant.com/postgresql-9-3-development-array-element-foreign-keys/

SO question about maintaining order using foreign key array PostgreSQL JOIN with array type with array elements order, how to implement?

Gorton answered 12/12, 2012 at 12:10 Comment(2)
you say many to many , but this sounds like one to many; many to many would mean that each survey relates to several questions and each question relates to several surveys, but that sounds a little odd, certainly, the way you phrased it 'has-many' is normally synonymous with one to many (many-to-many is usually called 'has-and-belongs-to-many')Jacobah
@TokenMacGuy: Sorry for the confusion. Questions are re-usable across surveys and answers across questions making the relationships many-to-many. I'll replace the has many -relationship with HABTM.Gorton
R
8

Use the junction table approach. The array method is non-standard enough that you have to ask questions about how much it would work, whereas the other is completely standard.

Rockfish answered 12/12, 2012 at 13:9 Comment(6)
junction tables can be significantly slower than arrays though. see gist.github.com/joevandyk/031cf5812bd656887623Estrellaestrellita
Yes, sometimes there is a performance gain to be had from the array method. There's a big question over whether it would hold up for all situations of course, and one of the primary problems is that adding/removing a new link requires a modification on a potentially long row (coupon, in the example) rather than insertion/deletion of a single row, and a lock on the coupon table.Rockfish
Agreed! If you didn't want to modify the coupons table on every insert, you could have a coupons_products_array table (coupon_id, product_ids[]) instead. But that might be getting silly.Estrellaestrellita
@JoeVanDyk Joining is slow, better to query a query. In my test, there is no performance difference between array and querying a query, both 0.075 ms. Array's planning time is still faster though, 0.211 ms vs 0.769 ms. gist.github.com/joevandyk/…Cathar
Maintaining an array sounds like a locking nightmare for high-volume tables.Titoism
@Amalgovinus yes it is certainly a point of guaranteed serialisation.Rockfish

© 2022 - 2024 — McMap. All rights reserved.