Self "HABTM" or "HasMany Through" concept confusion
Asked Answered
B

4

7

Bounty:

+500 rep bounty to a GOOD solution. I've seriously banged my head against this wall for 2 weeks now, and am ready for help.

Tables/Models (simplified to show associations)

  • nodes
    • id
    • name
    • node_type_id
  • node_associations
    • id
    • node_id
    • other_node_id
  • node_types
    • id
    • name

General Idea:

A user can create node types (example "TV Stations", "TV Shows", and "Actors"...anything). If I knew ahead of time what the node types were and the associations between each, I'd just make models for them - but I want this to be very open-ended so the user can create any node-types they want. Then, each node (of a specific node-type) can relate to any other node of any other node-type.

Description and what I've tried:

Every node should be able to be related to any/every other node.

My assumption is that to do that, I must have an association table - so I made one called "node_associations" which has node_id and other_node_id.

Then I set up my association (using hasMany through I believe): (below is my best recollection of my set-up... it might be slightly off)

//Node model
public $hasMany = array(
    'Node' => array(
        'className' => 'NodeAssociation',
        'foreignKey' => 'node_id'

    ),
    'OtherNode' => array(
        'className' => 'NodeAssociation',
        'foreignKey' => 'other_node_id'
    )
);

//NodeAssociation model
public $belongsTo = array(
    'Node' => array(
        'className' => 'Node',
        'foreignKey' => 'node_id'

    ),
    'OtherNode' => array(
        'className' => 'Node',
        'foreignKey' => 'other_node_id'
    )
);

At first, I thought I had it - that this made sense. But then I started trying to retrieve the data, and have been banging my head against the wall for the past two weeks.

Example Problem(s):

Lets say I have a the following nodes:

  • NBC
  • ER
  • George Clooney
  • Anthony Edwards
  • Tonight Show: Leno
  • Jay Leno
  • Fox
  • Family Guy

How can I set up my data structure to be able to pull the all TV Stations, and contain their TV Shows, which contain their Actors (as example)? This would be SIMPLE with normal model setup:

$this->TvStation->find('all', array(
    'contain' => array(
        'TvShow' => array(
            'Actor'
        )
    )
));

And then, maybe I want to retrieve all male Actors and contain the TV Show which contain the TV Station. Or TV Shows that start at 9pm, and contain it's actor(s) and it's station...etc etc.

But - with HABTM or HasMany Through self (and more importantly, and unknown data set), I wouldn't know which field (node_id or other_node_id) the model is, and overall just can't wrap my head around how I'd get the content.

Brooklyn answered 1/9, 2012 at 3:11 Comment(0)
C
2

The Idea

Let's try to solve this with convention, node_id will be the model who's alias comes alphabetically first and other_node_id will be the one that comes second.

For each contained model, we create a HABTM association on-the-fly to Node class, creating an alias for each association (see bindNodes and bindNode method).

Each table we query we add an extra condition on node_type_id to only return results for that type of node. The id of NodeType is selected via getNodeTypeId() and should be cached.

For filtering results using condition in deeply related associations, you would need to manually add extra join, creating a join for each jointable with a unique alias and then joining each node type itself with an alias to be able to apply the conditions (ex. selecting all TvChannels that have Actor x). Create a helper method for this in Node class.

Notes

I used foreignKey for node_id and associationForeignKey for other_node_id for my demo.

Node (incomplete)

<?php
/**
 * @property Model NodeType
 */
class Node extends AppModel {

    public $useTable = 'nodes';

    public $belongsTo = [
        'NodeType',
    ];

    public function findNodes($type = 'first', $query = []) {
        $node = ClassRegistry::init(['class' => 'Node', 'alias' => $query['node']]);
        return $node->find($type, $query);
    }

    // TODO: cache this
    public function nodeTypeId($name = null) {
        if ($name === null) {
            $name = $this->alias;
        }
        return $this->NodeType->field('id', ['name' => $name]);
    }

    public function find($type = 'first', $query = []) {
        $query = array_merge_recursive($query, ['conditions' => ["{$this->alias}.node_type_id" => $this->nodeTypeId()]]);
        if (!empty($query['contain'])) {
            $query['contain'] = $this->bindNodes($query['contain']);
        }
        return parent::find($type, $query);
    }

    // could be done better    
    public function bindNodes($contain) {
        $parsed = [];
        foreach($contain as $assoc => $deeperAssoc) {
            if (is_numeric($assoc)) {
                $assoc = $deeperAssoc;
                $deeperAssoc = [];
            }
            if (in_array($assoc, ['conditions', 'order', 'offset', 'limit', 'fields'])) {
                continue;
            }
            $parsed[$assoc] = array_merge_recursive($deeperAssoc, [
                'conditions' => [
                    "{$assoc}.node_type_id" => $this->nodeTypeId($assoc),
                ],
            ]);
            $this->bindNode($assoc);
            if (!empty($deeperAssoc)) {
                $parsed[$assoc] = array_merge($parsed[$assoc], $this->{$assoc}->bindNodes($deeperAssoc));
                foreach($parsed[$assoc] as $k => $v) {
                    if (is_numeric($k)) {
                        unset($parsed[$assoc][$k]);
                    }
                }
            }
        }
        return $parsed;
    }

    public function bindNode($alias) {
        $models = [$this->alias, $alias];
        sort($models);
        $this->bindModel(array(
            'hasAndBelongsToMany' => array(
                $alias => array(
                    'className' => 'Node',
                    'foreignKey' => ($models[0] === $this->alias) ? 'foreignKey' : 'associationForeignKey',
                    'associationForeignKey' => ($models[0] === $alias) ? 'foreignKey' : 'associationForeignKey',
                    'joinTable' => 'node_associations',
                )
            )
        ), false);
    }

}

Example

$results = $this->Node->findNodes('all', [
    'node' => 'TvStation', // the top-level node to fetch
    'contain' => [         // all child associated nodes to fetch
        'TvShow' => [
            'Actor',
        ]
    ],
]);
Cachexia answered 1/9, 2012 at 6:19 Comment(7)
Still trying to wrap my head around your suggestion, but the first red flag is that it looks like you're requiring me to make the actual model file for "TV Station", which defeats the purpose of this question imo. If I made the model files, I wouldn't need complex associations/finds..etc. Y/n?Brooklyn
I thought you wanted that from the example you showed. You only want to use the Node model to query all types or what?Cachexia
You created a "TVStation" model - but "TV Station" is something the user would have generated by adding it as a "Node type".Brooklyn
Like I said, I was just going off the example in your question which I guess I misunderstood. How do you plan on having the user say "I want to retrieve all male Actors and contain the TV Show which contain the TV Station. Or TV Shows that start at 9pm, and contain it's actor(s) and it's station...etc etc." Want to hop on irc?Cachexia
Tried creating a bounty, but it's not letting me - will try again later today and/or tomorrow. Thanks very much again!Brooklyn
WTH i still can't create a bountyBrooklyn
K - added with this hack: meta.stackexchange.com/questions/141671/… will give it in 24 hours when it lets meBrooklyn
J
1

I think you have incorrect relations between your models. I guess it will be enough with:

// Node Model
public $hasAdBelongsToMany = array(
    'AssociatedNode' => array(
        'className' => 'Node',
        'foreignKey' => 'node_id'
        'associationForeignKey' => 'associated_node_id',
        'joinTable' => 'nodes_nodes'
    )
);

// Tables

nodes

  • id
  • name
  • node_type_id

nodes_nodes

  • id
  • node_id
  • associated_node_id

node_types

  • id
  • name

Then you can try using ContainableBehavior to fetch your data. For Example, to find all TVShows belonging to a TVStation:

$options = array(
    'contain' => array(
        'AssociatedNode' => array(
            'conditions' => array(
                'AssociatedNode.node_type_id' => $id_of_tvshows_type
            )
        )
    ),
    conditions => array(
        'node_type_id' => $id_of_tvstations_type
    )
);
$nodes = $this->Node->find('all', $options);

EDIT :

You can even have second level conditions (see last example on this section, look at the 'Tag' model conditions). Try this:

$options = array(
    'contain' => array(
        'AssociatedNode' => array(
            'conditions' => array(
                'AssociatedNode.node_type_id' => $id_of_tvshows_type
            ),
            'AssociatedNode' => array(
                'conditions' => array( 'AssociatedNode.type_id' => $id_of_actors_type)
            )
        )
    ),
    conditions => array(
        'node_type_id' => $id_of_tvstations_type
    )
);
$nodes = $this->Node->find('all', $options);
Jat answered 1/9, 2012 at 18:57 Comment(4)
I don't believe that works - 1) how do I know that it's the associated_node_id I'm looking for as opposed to the node_id, and 2) Per my example question, I want to be able to contain another level down. I've been able to get it to work with one level of contain - but never further... which seems very restricting.Brooklyn
1) I think you don't completely understand the HABTM relation, that is why your models are so messy. You should read more hasAndBelongsToMany. 2) It was just an example, see the edit.Jat
1) I was using HasMany Through, not HABTM. "read more hasAndBelongsToMany" - where? Any/all examples I find are for two different tables, not HABTM with self. And when trying on itself, it doesn't return correct/all data. 2) I understand Contain, and that you can contain more than one level - but when I try this way with . Also - how would I know that I'm looking for 'AssociatedNode.type_id' - my current table has a 'node_type_id' field, but which node is of that type?Brooklyn
1) Relations to same model are equals to any other relation, you just need to give a new name for the relation ('AssociatedNode' in this case) and declare the model you are associating with 'className'. You can find an example (not HABTM) here (see third example). 2) Every "AsociatedNode" is just a "Node". So, each one has many other related nodes, and one node_type_id.Jat
V
1

I think unfortunately part of the problem is that you want your solution to contain user data in the code. Since all your nodes types are user data, you want to avoid trying to use those as the classes methods in your application, as there could be infinite of them. Instead I would try and create methods that model the data operations you want to have.

One omission I see in the provided data model is a way to record the relationships between types. In your example you mention a relationship between TvStation -> TvShows -> Actor etc. But where are these data relationships defined/stored? With all of your node types being user defined data, I think you'll want to/need to record store those relationships somewhere. It seems like node_types needs some additional meta data about what the valid or desired child types for a given type are. Having this recorded somewhere might make your situation a bit simpler when creating queries. It might help to think of all the questions or queries you're going to ask the database. If you cannot answer all those questions with data that is in the database, then you are probably missing some tables. Model associations are just a proxy for data relations that already exist in your tables. If there are gaps there are probably gaps in your data model.

I don't think this is the answer you're looking for but hopefully it helps you find the right one.

Valerle answered 1/9, 2012 at 23:17 Comment(0)
M
-1

Why don't you create a method in the node model?

Something like:

    <?php 
        // first argument is a nested array filled with  integers 
(corresponding to node_type_id)
        //second one id of a node
    //third one corresponds to the data you want(empty at beginning in most case)
    public function custom_find($conditions,$id,&$array){

        //there may several type of nodes wanted: for instances actors and director of a serie, so we loop
        foreach($conditions as $key_condition=>$condition){

            //test to know if we have reached the 'bottom' of the nested array: if yes it will be an integer '2', if no it will be an array like '2'=>array(...)
            if(is_array($condition))){
                   //this is the case where there is deeper levels remaining

                        //a find request: we ask for the node defined by its id,
 //and the child nodes constrained by their type: ex: all actors of "Breaking Bad"
                        $this->id=$id;
                $result=$this->find('all',array(
                        'contain' => array(
                                'OtherNode' => array(
                                        'conditions'=>array('node_type_id'=>$key_condition)
                                )
                        )
                )
             );

                //we add to $array the nodes found. Ex: we add all the actors of the serie, with type_id as key
                        $array[$key_condition]=$result['OtherNode'];

                         //Then  on each node we just defined we call the function recursively. Note it's $condition not $conditions
                foreach($array[$key_condition] as &$value){
                    $this->custom_find($condition,$value['Node']['id'],$value);
                }

            }else{
                //if we simply add data
                        $this->id=$id;
                $result=$this->find('all',array(
                        'contain' => array(
                                'OtherNode' => array(
                                        'conditions'=>array('node_type_id'=>$value)
                                )
                        )
                )
             );

             $array[$condition]=$result['OtherNode'];
            }

        }



    }

That code is almost certainly wrong, it's just to give you an idea of what I mean.

Edit:

What it does:

it's a recursive function that takes a nested array of conditions and the id of a node and gives back nested array of nodes.

For instance: $conditions=array('2','4'=>array('5','6'=>array('4')))

How it works:

For a single node it gives back all the child nodes corresponding to the condition in the array: then it does the same for the children with the conditions one level deeper, until there is no more levels left.

Manzoni answered 1/9, 2012 at 18:2 Comment(8)
I don't believe that works - 1) how do I know that it's the OtherNode I'm looking for as opposed to the Node, and 2) Per my example question, I want to be able to contain another level down. I've been able to get it to work with one level of contain - but never further... which seems very restricting.Brooklyn
1) fixed the code (which is probably still wrong) 2) the function is recursive so no problemManzoni
I tried reading over this 5 times now, but really don't follow your code - can you clarify with a bit more commenting? I really want this to work - just hard to imagine it will (mostly likely because it confuses me as to what you're trying to accomplish)Brooklyn
Edited. I do not know claim it's a good solution, I'm not a Cakephp expert, it's just the idea I had reading your question.Manzoni
I don't have child nodes - I have 'node_id' and 'associated_node_id' .. how do I determine which one to query against?Brooklyn
Imagine '2' is the ref for actors and '3' for series. Then array('2'=>array('3')) will give the actors with the linked series in each of them. But array('3'=>array('2')) will give the series with the linked actors in each of them. The $conditions array has child nodes, as it is nested, that's how we know which is which.Manzoni
The conditions have that, but the table has simply 2 fields. How would you know which to query against, as it could be actor/show or show/actor, and either way, it means they're related.Brooklyn
let us continue this discussion in chatManzoni

© 2022 - 2024 — McMap. All rights reserved.