What is the recommended equivalent of cascaded delete in MongoDB for N:M relationships?
Asked Answered
W

2

32

Assuming the following "schema/relationship" design what is the recommended practice for handling deletion with cascade delete like operation?

Relational Schema:

  +---------+                                    +--------+
  | Student |-*--------1-[Enrollment]-1--------*-| Course |
  +---------+                                    +--------+

MongoDB:

  +---------+                    +--------+
  | Student |-*----------------*-| Course |
  +---------+                    +--------+

Given this classic design of enrollment of students to courses, having a collection of courses in students and vice versa seems to be an appropriate data model when using MongoDB (that is nothing for the relationship/enrollment table). But coming from a relational world how should I handle the semantics of deleting a course? That is, when a course is deleted, all the "enrollment" records should be deleted too. That is, I should delete the course from the collection of each student record. It looks like I have to fire 2 queries: one for deleting the course and then to delete it from each student's collection. Is there a way to have a single query to perform this "cascade delete" like semantic without the additional query? Does the data model need to change?

NOTE: For all other use cases the above data model works just fine:

  • Deleting a student => just delete that student and associated collection of courses deleted along with it.
  • Student willing to drop a course => just delete it from the student collection of courses
  • Add student/course => just add it to corresponding 'table' in essence.

The only tricky thing is handling the deletion of a course. How should I handle this scenario in MongoDB, since I hail from a relational background and am unable to figure this one out.

Windrow answered 4/12, 2013 at 8:50 Comment(3)
There's no cascade delete. It needs to be performed as two steps.Canaletto
Due to the nature of MongoDB I hardly imagine you're designing your database correctly if you have to do something like "cascade delete" in this case. There're two approaches to the same schema, one is embedding courses into the student or using arrays with pointers to the courses. In any way there's not cascade delete available, Mongo operations are isolated document updates-deletes-inserts, even if you update multiple documents with the right clause, it's still processed one document at a time.Dietetics
Nesting has its own problems. It will backfire during reporting and update operations (covered indexes/queries can't even be made on nested objects). Our metric on this is essentially: If you require indexed reporting on any data in a nested object, or plan to update the nested object independent of the parent, make it a separate collection.Relique
S
20

What you are doing is the best and most optimal way of doing it in Mongo. I am in a similar situation and after going all possible implementations of the N:M design pattern, have also arrived to this same solution.

Apparently, This is not a mongodb thing, but more of a concept of NoSQL, wherein, the less changing data (Courses) can be kept separately. And since deleting a Course is not going to be a very frequent operation, its feasible enough to go through all the records to remove it.

On the other hand, you could let it be as it is. In your application logic, just ignore the values of Courses in the Student document that don't have a reference_id in the Course document at all. But in that case, you must make sure that old deleted Course_id's are not being reused.

OR just use the deleted flags on the Course document and handle everything else in your application logic.

Stenophagous answered 6/2, 2015 at 9:17 Comment(3)
Great answer! My only concern is how to "make sure that old deleted Course_id's are not being reused".Deckle
I agree. In my current job I've faced for the first time with a very complex project based on MongoDB and I've soon understood that a document-oriented information model is not well suited when you need to model data in "relational terms". It's necessary to keep design simple and low degree of connection between entitiesOrientate
Based on this: docs.mongodb.com/manual/reference/bson-types the objectId contains the timestamp reflecting the creation time of the document. As a result, I belive that objectId values will not be repeated. You don't need to enforce that manually.Siward
D
5

I'm going to answer based on Mongo team recommendations. I also came from the relational database and I had some issues at the beginning understanding the concepts. Mongo team recommends to design with the idea of "Application-Driven" schema, so you have to figure out first what pieces of data go together. Remember there's not such a transaction concept in any possible way in Mongo, even if we invent a driver that handles transactions we should implement our own solution for this. It means if I have two business objects that requires to be updated at the same time always and I cannot tolerate a failure in this operation, I have to join them into a single document (atomic).

In your case you have two documents, Student and Courses, and a relation between then (A student enrolls to N courses). I assume courses are not required to be altered all the time, so they can be stored in a different collection. But the point is the relation between them, in this case you need to atomically delete a Student and all the courses he enrolled in. So the best suitable solution for this is to embed the relation into Student, and keep a separated Course collection. When you delete the student, the relation is dropped at the same time:

Student Json:

{ _id: ObjectId('...'), name:"John", lastname:"Smith", 
courses: [ 1, 100, 50, 67 ], ...
}

Courses can be a separated collection between them. This is the way to handle it in Mongo. Atomic operations must be embedded into a single document. I assumed Courses is a list of courses that don't change so much, in case they're designed by Student we could change a bit the solution.

Dietetics answered 4/12, 2013 at 13:20 Comment(5)
I think you are misunderstanding something. I'm looking at deleting a course and then having it deleted from all students. What you've stated is what I've also stated as easy/doable.Windrow
Ok, I understand, however there's not such a thing like cascade delete in MongoDb. You have to deal with the fact that two collections are isolated and you could have the issue of having inconsistencies. Which I would do is to "mark" first the record I want to delete and then go through the "foreign key" to remove the rest. It could be done by a separated process, in case there's any inconsistency you can start over bc the main record will be still flagged. Flagged records should be excluded from processes of course.Dietetics
At the end you can remove flagged records in the main collection. I think it's a good approach due to the nature of MongoDbDietetics
@MaximilianoRios question - what shell be done when course been deleted? how do you keep the students synced?Conch
As I mentioned, there is no any mechanism to do this out of the box. If you delete a student you should go through the entire collection of Students and delete the relationship. Another potential solution would be to fail at the time of access, it means when you access this John Smith and read courses, you can remove the ones not existing. Again, Mongo is a document database, everything is related to a single document and relationships are not constraints.Dietetics

© 2022 - 2024 — McMap. All rights reserved.