How to deal with mutually dependent inserts
Asked Answered
O

3

6

I have a model that defines mutually recursive tables:

Answer
  questionId QuestionId
  text

Question
  text
  correct AnswerId

What do I need to do to actually insert a question? I need to know what the correct answer is first. But to insert an answer, I need to know what question it answers.

I'm running Postgres, if it matters.

The DDL is:

CREATE TABLE answer (
  id integer NOT NULL,                 -- answer id
  text character varying NOT NULL,     -- answer text
  question_id bigint NOT NULL          -- question id
);

CREATE TABLE question (
  id integer NOT NULL,                 -- question id
  question character varying NOT NULL, -- question text
  correct bigint NOT NULL,             -- correct answer
  solution character varying NOT NULL  -- solution text
);

ALTER TABLE ONLY answer ALTER COLUMN id SET DEFAULT nextval('answer_id_seq'::regclass);

ALTER TABLE ONLY answer
  ADD CONSTRAINT answer_question_id_fkey FOREIGN KEY (question_id) REFERENCES question(id);

ALTER TABLE ONLY question ALTER COLUMN id SET DEFAULT nextval('question_id_seq'::regclass);
ALTER TABLE ONLY question
  ADD CONSTRAINT question_correct_fkey FOREIGN KEY (correct) REFERENCES answer(id);
```sql
Oscine answered 17/7, 2014 at 20:32 Comment(16)
I don't feel the recursion here. This looks like a straight forward foreign key relationship, unless when dealing with Question.QuestionId 1, has an AnswerId of 2, which in turn has a QuestionId of anything other than 1?Armenian
@Jaaz: It's a model for multiple choice tests. Each question will have, say, 4 answers. Only one of them is correct. The question knows which one is correct, and the answers know which questions they point at. I can't insert a question unless I know what it's answer id is. I can't insert an answer unless I know what its question id is.Oscine
lol that seems so silly. You can't insert a question without knowing the correct answer and you can't insert an answer without knowing the question? Can you redesign this sillyness at all, or are you stuck with it? The correct answer should not be stored on the question record...Pluviometer
Ah, ok. This might be easier to answer, then, with full DDL statements from your tables. For example, is Question.QuestionId a SERIAL column, or just an INT Id that can be assigned by the application?Armenian
@Twelfth: that seems more like a limitation of the database engine than the model. The answer should be stored with the question, because it is a property of the question.Oscine
@Oscine - not at all...a database should never enforce conflicting requirements like a question requires an answer (not null) prior to any answer being entered (it's not a database limitation, it simple logic). Correct answer should either be on the answer record (correct/incorrect being a property of the answer), or a third table that maps questions to correct answers, or the correct answer must allow nulls so a question can be entered then answers, then the update of the question record.Pluviometer
@Twelfth: They're not conflicting requirements. The problem is that the database doesn't have a way to insert both requirements "at the same time" (in the context of a single insert). At least that I know of. Hence my question.Oscine
@Oscine - I think thats true of every database engine...no? The second you split this into 2 tables you made the simoultaneous insert option impossible. I guess a 4th solution would be to combine the tables and flatten out the entire relation so it is just one insert. and a 5th solution...drop the relation, insert, rebuild the relation. WIsh I had a better answerPluviometer
I don't have time for a full answer, but I think "deferred constraints" and manually working with the IDs would work (reserving question ID in advance, and deferring the FK constraint that refers to it so you can insert the answers first).Lithotomy
Hmm....looking further into the documentation, there's a way to do it, but it's not pretty, per se. Can you, in your application, guarantee that the correct answer will be inserted first out of a set of answers? Can you also guarantee only 1 connection doing inserts at any time?Armenian
@Twelth I agree with the OP: the data is not conflicting once it is in place. The intermediate state, which incorrectly allows a question with no correct answer, or an answer without a question, is not part of the model. It may need to exist as a technical limitation, but constraining against it is a valid aim.Lithotomy
@Lithotomy Yes, I've gotten the clarify on there...once he has the data in there, the constraints work...it's just getting there that is his issue. Got tons of work arounds, but none of them are really a solution. Curious what Jaaz Cole has with the correct answer inserted first.Pluviometer
Finally remembered the right term to search for: "circular references" is the common name for this kind of situation. Which leads to this question: #956171 Not sure at a glance if any of the answers will run directly on Postgres, but the principles and techniques should apply.Lithotomy
Heh, answers there are pretty much the same...Oracle has a special defer constraints command, microsoft you simply allow nulls...Pluviometer
@Twelth I don't think the different answers reflect capabilities of different software as much as different expertise of those answering. Deferred constraints are definitely not unique to Oracle.Lithotomy
@IMSoP: The suggested duplicate doesn't have satisfying answers. I think I provided something better.Ximenes
X
6

If you enter question and answer in a single statement with a data-modifying CTE, you do not even need a DEFERRABLE FK constraints. Not to speak of actually making (or SETting) them DEFERRED - which would be a lot more expensive.

Data model

First I cleaned up your data model:

CREATE TABLE question (
   question_id       serial PRIMARY KEY
 , correct_answer_id int  NOT NULL
 , question          text NOT NULL
 , solution          text NOT NULL
);

CREATE TABLE answer (
   answer_id   serial PRIMARY KEY
 , question_id int  NOT NULL REFERENCES question
 , answer      text NOT NULL
);

ALTER TABLE question ADD CONSTRAINT question_correct_answer_id_fkey
FOREIGN KEY (correct_answer_id) REFERENCES answer(answer_id);
  • Don't use the non-descriptive "id" or "text" (also a basic type name) as column names.
  • Put integer columns first for space efficiency. See:
  • bigint was uncalled for, integer should suffice.
  • Simplify your schema definition with serial columns.
  • Define primary keys. PK columns are NOT NULL automatically.

Solution

After delegating primary key generation to sequences (serial columns), we can get auto-generated IDs with the RETURNING clause of the INSERT statement. But in this special case we need both IDs for each INSERT, so I fetch one of them with nextval() to get it going.

WITH q AS (
   INSERT INTO question
          (correct_answer_id              , question, solution)
   VALUES (nextval('answer_answer_id_seq'), 'How?'  , 'DEFERRABLE FK & CTE')
   RETURNING correct_answer_id, question_id
   )
INSERT INTO answer
      (answer_id        , question_id, answer)
SELECT correct_answer_id, question_id, 'Use DEFERRABLE FK & CTE'
FROM   q;

I know the name of the sequence ('answer_answer_id_seq') because I looked it up. It's the default name. If you don't know it use the safe form @IMSoP provided in a comment:

nextval(pg_get_serial_sequence('answer', 'answer_id'))

DEFERRABLE or DEFERRED constraints?

The manual on SET CONSTRAINTS:

IMMEDIATE constraints are checked at the end of each statement.

My solution is a single statement. That's why it works where two separate statements would fail - wrapped in a single transaction or not. And you'd need SET CONSTRAINTS ... DEFERRED; like IMSoP first commented and @Jaaz implemented in his answer.
However, note the disclaimer some paragraphs down:

Uniqueness and exclusion constraints that have not been declared DEFERRABLE are also checked immediately.

So UNIQUE and EXCLUDE need to be DEFERRABLE to make CTEs work for them. This includes PRIMARY KEY constraints. The documentation on CREATE TABLE has more details:

Non-deferred Uniqueness Constraints

When a UNIQUE or PRIMARY KEY constraint is not deferrable, PostgreSQL checks for uniqueness immediately whenever a row is inserted or modified. The SQL standard says that uniqueness should be enforced only at the end of the statement; this makes a difference when, for example, a single command updates multiple key values. To obtain standard-compliant behavior, declare the constraint as DEFERRABLE but not deferred (i.e., INITIALLY IMMEDIATE). Be aware that this can be significantly slower than immediate uniqueness checking.

We discussed this in great detail under this related question:

Ximenes answered 18/7, 2014 at 1:56 Comment(17)
I don't see the reason for the additional CTE only to call nextval()? A simple insert into ... values (nextval()... would work just as well.Rencontre
On a point of pedantry, just because your answer is so careful with other best practices: since you're not manually creating the sequence, you should not be hard-coding its name, so need nextval(pg_get_serial_sequence('answer', 'answer_id')) instead of nextval('answer_answer_id_seq'). (My rule of thumb being "if you didn't name it, don't assume you know its name".)Lithotomy
@a_horse_with_no_name: True, the first CTE was overkill. I simplified accordingly.Ximenes
@IMSoP: Good point, too. My example assumes that I know the name of the sequence. If you don't, your alternative is the sure way.Ximenes
@ErwinBrandstetter But you fixed the schema so that you don't know it. If I run CREATE SEQUENCE answer_answer_id_seq; followed by your recommended SQL, it will fail, because the serial in the table definition will be forced to pick a different name. Given you went so far as to move the int columns to save disk space, not using a proper lookup for the sequence seems like a flaw in an otherwise very careful answer.Lithotomy
@IMSoP: I just have to look it up once to know it. It's not going to change. Alternatively, we can create the constraint with an explicit name to be sure. And you are right, that it was a bit sloppy. So I added a pointer to your comment.Ximenes
@ErwinBrandstetter Not if you plan to save this schema definition for deployment in multiple environments. What is the advantage of not using the lookup function?Lithotomy
Point is shorter / simpler code and not to further complicate an already complex answer. There are always so many considerations one could add. I feel like I am doing too much of that already in my answers.Ximenes
@ErwinBrandstetter Well, if you use the robust form, you don't even need to explain it: as far as the reader's concerned, they will (rightly) assume that's just how you get the underlying sequence for a serial column. As it is, you've added complexity by explaining the assumption you've made.Lithotomy
@IMSoP: Also true. I think this is solved properly by pointing to your comment now.Ximenes
@ErwinBrandstetter I still see no reason not to edit the query to use the best-practice approach (just as you edited the OP's schema not to use columns called id and text, etc), because I see no reason to ever write code that doesn't do so. However, I'll "agree to disagree" if you're set on your current wording.Lithotomy
@IMSoP: Updated the answer accordingly.Ximenes
Hi thank you for your help with this answer. I must admit that, because I'm tied to a specific framework, I didn't implement some of your advice. But the deferred constraint worked great. I knew there had to be a way to get the DB to "always" (atomically) respect my domain constraint.Oscine
This is a really nice approach and would even work if I inserted multiple rows with multiple generated ids at once! However, I'm currently facing a problem with using this on identity columns where apparently one needs extra permissions on the sequence to use next_val(), is that expected?Bakelite
@Bergi: Yes, the current role needs the USAGE or UPDATE privilege on the sequence to call nextval() on it. (Or be the owner or a superuser.) UPDATE or INSERT privilege on the owning table is not enough, though the implicit use of the attached sequence is granted. The sequence of an IDENTITY column is more of an implementation detail and not supposed to be used separately in normal operation. So you need GRANT USAGE ON tbl_col_seq to my_role; first. The sequence of legacy serial columns is more detached: there you need separate privs on the sequence even for INSERT.Ximenes
@Bergi: If you can't get the owner to grant you the necessary privs on the sequence, but you have INSERT and SELECT on the table, an awkward workaround would be possible: BEGIN; INSERT INTO tbl DEFAULT VALUES RETURNING id; -- take note of generated number! ROLLBACK; You may have to provide actual values for NOT NULL columns without default, etc. Not recommended. Stuff for a question really, not a comment. ;)Ximenes
@ErwinBrandstetter Ok thanks for the confirmation, I can do the GRANTs no problem but I was really surprised there's an "implicit privilege" when "implicitly using" the sequence through inserts.Bakelite
A
1

I would insert into question, with a null correct AnswerId. Then I would insert into Answer, and finally I would update Question and set the correct answerId.

Aristophanes answered 17/7, 2014 at 20:46 Comment(3)
Except that correct_answer_id is defined NOT NULL for some reason. And it can be cheaper.Ximenes
@ErwinBrandstetter Ah. That's frustrating. You could always disable the constraint temporarilly but I think Erwin's answer is sufficient.Aristophanes
Disabling the constraint temporarily requires privileges and takes out an exclusive lock on the table. No good for multi-user environment.Ximenes
A
1

I went looking around after seeing the DDL. Consider a function for your call to insert a question with correct answer, and one to add (false) answers to a given question. The structure of the first function allows the application to pick up the anonymous returned record for the questionID, and use it for subsequent calls to the second function, to add false answers.

CREATE FUNCTION newQuestion (questionText varchar, questionSolutionText varchar, answerText varchar, OUT questionID integer) AS $$
  BEGIN
    START TRANSACTION;
    SET CONSTRAINTS question_correct_fkey DEFERRED;
    questionID := nextval('question_id_seq');
    answerID := nextval('answer_id_seq');
    INSERT INTO question (id, question, correct, solution) values (questionID, questionText, answerID, questionSolutionText);
    INSERT INTO answer (id, text, question_id) values (answerID, answerText, questionID);
    SET CONSTRAINTS question_correct_fkey IMMEDIATE;
    COMMIT TRANSACTION;
  END;
$$
CREATE FUNCTION addFalseAnswer (questionID integer, answerText varchar) AS $$
  BEGIN
    INSERT INTO answer (text, question_id) VALUES (answerText, questionID);
  END;
$$

I've not written SQL for PostGreSQL in a long while, so I hope all is in order here. please let me know if there are any issues.

Armenian answered 17/7, 2014 at 22:44 Comment(2)
There is no problem with concurrency or rollbacks here, because nextval() will never give the same answer twice, even in simultaneous transactions. However, your insert into question still violates the foreign key constraint: you know that you're about to insert an answer with that ID, but the DBMS does not. At the time that insert runs, there is no such row, so the constraint is violated.Lithotomy
Agreed, it was a piece I missed as I was looking for the syntax again. Thanks for the heads up.Armenian

© 2022 - 2024 — McMap. All rights reserved.