Mongo Map Reduce first time

First-time Map/Reduce user here, using MongoDB. I have a lot of page visit data that I'd like to make sense of with Map/Reduce. Below is roughly what I want to do, but as a total beginner at Map/Reduce, I think this is beyond my knowledge!

  1. Go through all the pages with visits in the last 30 days, and where external = true.
  2. Then for each page, find all visits
  3. Group all visits by referral location
  4. For each referral location, count how many visitors then went on to visit a page that has a certain "type" and a certain word in its "tags".

The database and collection are organised as

$mongo->dbname->visits

A sample document is:

{"url": "www.example.com", "type": "a", "refer": {"external": true, "domain": "twitter.com", "url": "http://www.twitter.com/page"}, "page": "1235", "user": "1232", "time": 1234567890}

And then I want to find documents of type B with a certain tag.

{"url": "www.example.com", "type": "b", "page": "745", "user": "1232", "time": 1234567890, "tags": ["a", "b", "c"]}

I'm using the normal Mongo PHP extension if that has an impact.

Genitalia answered 9/6, 2010 at 2:57 Comment(7)
What database structure do you have? How are your collections and documents organized? - Piscatorial
Added to the post above. Does that help? - Genitalia
OK, your sample document does not include a "referral", an "external", or a "tags" field. What you're suggesting is indeed complicated, so you'll probably need to show us more than one document, with all of the details. - Tweeze
I've been working on something that is exactly the same as this (visit tracking using Mongo). Post a few more details and I can perhaps help. - Abbe
Updated. Does this provide any more info for you guys? Thanks - Genitalia
When you say "Group all visits by referral location", what exactly do you mean? The same goes for #4 (calculate how many then went to visit a page with a certain type and a certain word in tags). Could you provide a small data set and the expected output from that data set (4 or 5 rows should suffice)? - Precis
You should mark an answer if it's sufficient. - Dobson

OK, I've come up with something that I think may do what you want. Note that it may not work exactly as-is, since I'm not 100% sure of your schema: your examples show refer on type a documents but not on type b, and I'm not sure whether that's an omission, given that you want to group by referrer. Anyway, here's what I've come up with:

The map function:

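function() {
    // Value emitted for this visit: a count of 1 for its type and for each tag.
    var obj = {
        "types": {},
        "tags": {}
    };
    obj.types[this.type] = 1;
    if (this.tags) {
        for (var tag in this.tags) {
            obj.tags[this.tags[tag]] = 1;
        }
    }
    // Group by the referring URL.
    emit(this.refer.url, obj);
}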

The Reduce function:

function(key, values) {
    // Same shape as the value emitted by map, so reduce output can be re-reduced.
    var obj = {
        "types": {},
        "tags": {}
    };
    for (var i = 0; i < values.length; i++) {
        for (var type in values[i].types) {
            // Parentheses matter: "!type in obj.types" evaluates "(!type) in obj.types".
            if (!(type in obj.types)) {
                obj.types[type] = 0;
            }
            obj.types[type] += values[i].types[type];
        }
        for (var tag in values[i].tags) {
            if (!(tag in obj.tags)) {
                obj.tags[tag] = 0;
            }
            obj.tags[tag] += values[i].tags[tag];
        }
    }
    return obj;
}

So basically, here is how it works. The map function uses refer.url as the key (my guess, based on your description), so the result is a set of documents, one per referring URL, each with _id equal to that refer.url. The value is an object containing two sub-objects, types and tags. It is built as an object so that map emits, and reduce returns, the same format. Other than that, I THINK it should be relatively self-explanatory (if you don't understand, I can try to explain more)...
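
To see why map and reduce have to share one value format, here is a minimal sketch in plain JavaScript that runs in Node without MongoDB. The emit stand-in and the three sample visits are assumptions made for illustration; in a real job MongoDB supplies emit and the documents. MongoDB may call reduce more than once for the same key, feeding earlier reduce output back in, so reducing everything at once and re-reducing a partial result should agree:

```javascript
// Local stand-in for the emit() that MongoDB provides inside a map function.
const emitted = [];
function emit(key, value) {
    emitted.push({ key, value });
}

// Same logic as the map function above: count 1 for the type and each tag.
function map() {
    const obj = { types: {}, tags: {} };
    obj.types[this.type] = 1;
    if (this.tags) {
        for (const tag of this.tags) {
            obj.tags[tag] = 1;
        }
    }
    emit(this.refer.url, obj);
}

// Same logic as the reduce function above: sum the counts per type and per tag.
function reduce(key, values) {
    const obj = { types: {}, tags: {} };
    for (const value of values) {
        for (const type in value.types) {
            obj.types[type] = (obj.types[type] || 0) + value.types[type];
        }
        for (const tag in value.tags) {
            obj.tags[tag] = (obj.tags[tag] || 0) + value.tags[tag];
        }
    }
    return obj;
}

// Hypothetical sample visits, all referred from the same URL.
const referrer = "http://www.twitter.com/page";
const visits = [
    { type: "a", refer: { url: referrer } },
    { type: "b", tags: ["a", "b"], refer: { url: referrer } },
    { type: "b", tags: ["b"], refer: { url: referrer } },
];
visits.forEach((doc) => map.call(doc));

const values = emitted.map((e) => e.value);

// Reduce all values in one call.
const allAtOnce = reduce(referrer, values);

// Reduce the first two, then feed that partial result back in with the third.
const partial = reduce(referrer, values.slice(0, 2));
const reReduced = reduce(referrer, [partial, values[2]]);

console.log(JSON.stringify(allAtOnce));
console.log(JSON.stringify(reReduced));
```

Both lines print the same totals (type a once, type b twice, tag a once, tag b twice), which is the property the shared format is there to guarantee.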

So let's implement this in PHP (assuming, for brevity, that $map and $reduce are strings containing the functions above):

$mapFunc = new MongoCode($map);
$reduceFunc = new MongoCode($reduce);
$query = array(
    'time' => array('$gte' => time() - (60*60*24*30)), // last 30 days, in seconds
    'refer.external' => true
);
$collection = 'visits';
$command = array(
    'mapreduce' => $collection,
    'map' => $mapFunc,
    'reduce' => $reduceFunc,
    'query' => $query,
);

$statsInfo = $db->command($command);

$statsCollection = $db->selectCollection($statsInfo['result']);

$stats = $statsCollection->find();

foreach ($stats as $stat) {
    // $stat is one result document; $stats is the cursor being iterated.
    echo $stat['_id'] .' Visited ';
    foreach ($stat['value']['types'] as $type => $times) {
        echo "Type $type $times Times, ";
    }
    foreach ($stat['value']['tags'] as $tag => $times) {
        echo "Tag $tag $times Times, ";
    }
    echo "\n";
}

Note, I haven't tested this. This is just what I've come up with based on my understanding of your schema, and from my understanding of Mongo and its Map-Reduce implementation...
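
One thing that can be checked without a database is which documents the query lets through. The sketch below is plain JavaScript under stated assumptions: matches is a hypothetical local helper that mimics only the two conditions used here, and the three documents and the "now" value are made up. It suggests that a type b document with no refer field, like the sample in the question, would be filtered out before map ever sees it, so its tags would not be counted unless those documents also carry refer:

```javascript
// 30 days in seconds, to compare against the Unix-timestamp "time" field.
const THIRTY_DAYS = 60 * 60 * 24 * 30; // 2592000

// Builds a filter equivalent to the PHP $query array.
function buildQuery(nowSeconds) {
    return {
        time: { $gte: nowSeconds - THIRTY_DAYS },
        "refer.external": true,
    };
}

// Hypothetical helper: mimics only the $gte and refer.external conditions.
function matches(doc, query) {
    return doc.time >= query.time.$gte
        && Boolean(doc.refer)
        && doc.refer.external === query["refer.external"];
}

const now = 1234567890 + 1000; // assumed "current" time for the example
const query = buildQuery(now);

const docs = [
    // Recent, externally referred visit: selected.
    { type: "a", time: 1234567890, refer: { external: true, url: "http://www.twitter.com/page" } },
    // Recent type b visit without a refer field: not selected.
    { type: "b", time: 1234567890, tags: ["a", "b", "c"] },
    // Externally referred but just outside the 30-day window: not selected.
    { type: "a", time: now - THIRTY_DAYS - 1, refer: { external: true, url: "http://www.twitter.com/page" } },
];

const selected = docs.filter((doc) => matches(doc, query));
console.log(selected.length); // 1
```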

Precis answered 16/6, 2010 at 13:1 Comment(1)
$statsCollection = $db->selectCollection($sales['result']); $sales? - Sarcocarp

Map/Reduce support is already built into the Doctrine MongoDB ODM:

http://www.doctrine-project.org/docs/mongodb_odm/1.0/en/reference/map-reduce.html

Marcelina answered 15/3, 2011 at 17:32 Comment(0)
