What is the most efficient way to implement GroupBy in Javascript?
Asked Answered
S

3

8

I am trying to implement a GroupBy method with these parameters

function GroupBy(keySelector, elementSelector, comparer)
{
    // keySelector = function(e) { return e.ID }
    // elementSelector = function(e) { return e.Name }
    // comparer = { Equals: function(a,b) { return a==b }, GetHashCode:... }
}

However I don't know an efficient way to implement it.

I created a jsPerf test with linq.js and a method I created which doesn't use a comparer and only work on flat types. (Output test here)

Other libraries such as underscore and Lo-Dash doesn't take a comparer parameter. So their implementations are irrelevant.

My key could be a class, so I need something to determine if TKey is the same in different instances.

So basically what I am trying to do is copy C# Linq GroupBy behavior documented here.

Sample input:

var arrComplex =
[
    { N: { Value: 10 }, Name: "Foo" },
    { N: { Value: 10 }, Name: "Bar" },
    { N: { Value: 20 }, Name: "Foo" },
    { N: { Value: 20 }, Name: "Bar" }
];

Sample output (or something like this):

[
    {
       "Key": {"Value":10},
       "Elements":["Foo","Bar"]
    },
    {
        "Key": {"Value":20},
        "Elements":["Foo","Bar"]
    }
] 

Any ideas on how to implement it?

Bounty

For the bounty I would like you take in consideration that:

  • The key could be an object
  • Two objects can be equal if some property is equal
  • It should be as fast as or faster than existing solutions
  • The result can be an array or object, doesn't matter as long as I can get elements grouped by a key

Well, I expect a complete answer.

Supererogation answered 17/12, 2013 at 11:59 Comment(11)
Do you really need your own implementation? Can't you use Underscore.js or other library?Bathsheeb
Lo-Dash.js is another alternative and it bills itself as faster and more efficient than Underscore.jsAttitudinarian
Both doesn't seem to have a comparer parameter.Supererogation
@Supererogation Why do you need comparer? What happens if comparer returns 'equality' for two elements with different keys? What key will be used for this group?Bathsheeb
I need a comparer to group by a complex type. My Key could be a class.Supererogation
You want the "Most efficient" way - how large are the data sets you're working on? I'd .reduce here most likely but that's not the most efficient, the most efficient is a Map and a for loop.Genista
Could you add an example where you would need the comparer? And the expected output?Alboin
Thanks, the question is much more understandable now :) Since the keys are created using literals, i suppose you want a to do a test for deep equality on the keys, not a reference check, is that correct?Alboin
The keys could be anything. I would determine the equality by using a method Equals (ex: function(a,b){return a.N.Value == b.N.Value;}) and maybe a GetHashCode (ex: function(a) { return a.N.Value; }).Supererogation
love the question. gave me something to do today :)Savino
@Savino I am really glad to hear that there is still people that code for fun, like me. :)Supererogation
S
0

I managed to implement this way:

I need to get the hashcode from objects.

Object.prototype.GetHashCode = function () {
    var s = this instanceof Object ? stringify(this) : this.toString();

    var hash = 0;
    if (s.length === 0) return hash;
    for (var i = 0; i < s.length; ++i) {
        hash = ((hash << 5) - hash) + s.charCodeAt(i);
    }
    return hash;
};
Number.prototype.GetHashCode = function () { return this.valueOf(); };

As JSON.stringify will fail at circular references, I created another method to stringify it in a way I can get the most of an object as a string and compute the hashcode over it, as follows:

function isPlainObject(obj)
{
    if ((typeof (obj) !== "object" || obj.nodeType || (obj instanceof Window))
        || (obj.constructor && !({}).hasOwnProperty.call(obj.constructor.prototype, "isPrototypeOf"))
    )
    {
        return false;
    }

    return true;
}
function stringify(obj, s)
{
    s = s || "";

    for (var i in obj)
    {
        var o = obj[i];

        if (o && (o instanceof Array || isPlainObject(o)))
        {
            s += i + ":" + JSON.stringify(o);
        }
        else if (o && typeof o === "object")
        {
            s += i + ":" + "$ref#" + o;
        }
        else
        {
            s += i + ":" + o;
        }
    }

    return s;
}

There isn't much impact on performance. For large object it is the same and for small objects it loses, but is still pretty fast and safe. Performance test here.

Name                         op/s
---------------------------------
JSON.stringify large           62
stringify      large           62
JSON.stringify small    1,690,183
stringify      small    1,062,452

My GroupBy method

function GroupBy(a, keySelector, elementSelector, comparer)
{
    // set default values for opitinal parameters
    elementSelector = elementSelector || function(e) { return e; };
    comparer = comparer ||
        {
            Equals: function(a,b) { return a==b },
            GetHashCode: function(e) { return e.GetHashCode(); }
        };

    var key, hashKey, reHashKey;

    // keep groups separated by hash
    var hashs = {};
    for (var i = 0, n = a.length; i < n; ++i)
    {
        // in case of same hash, but Equals returns false
        reHashKey = undefined;

        // grabs the key
        key = keySelector(a[i]);
        // grabs the hashcode
        hashKey = comparer.GetHashCode(key);

        // if a hash exists in the list
        // compare values with Equals
        // in case it return false, generate a unique hash
        if (typeof hashs[hashKey] !== "undefined")
            reHashKey = comparer.Equals(key, hashs[hashKey].Key) ? hashKey : hashKey + " " + i;

        // if a new hash has been generated, update
        if (typeof reHashKey !== "undefined" && reHashKey !== hashKey)
            hashKey = reHashKey;

        // get/create a new group and add the current element to the list
        hashs[hashKey] = hashs[hashKey] || { Key: key, Elements: [] };
        hashs[hashKey].Elements.push(a[i]);
    }

    return hashs;
}

To test

var arrComplex =
[
    { N: { Value: 10 }, Name: "Foo" },
    { N: { Value: 10 }, Name: "Bar" },
    { N: { Value: 20 }, Name: "Foo" },
    { N: { Value: 20 }, Name: "Bar" }
];
//

var x = GroupBy(arrComplex
        , function(e) { return e.N; }
        , function(e) { return e.Name; }
        , {
              Equals: function(a,b) { return a.Value == b.Value },
              GetHashCode: function(e) { return e.GetHashCode(); }
          }
);
//

console.log(x);

Example on jsFiddle, now with Jedi.

But, accordingly to my tests, my implementation of GroupBy is slower than linq.js's GroupBy. It is only faster when I convert ToArray(). Maybe linq.js only really executes when I convert to array, that's why the difference, I am not sure on this part.

Test results

Name                         op/s
---------------------------------
GroupBy                   163,261
GroupByToArray            152,382
linq.js groupBy           243,547
linq.js groupBy toArray    26,309
Supererogation answered 17/12, 2013 at 14:46 Comment(0)
S
5

I used your jsperf as a reference, for some of the finer points of script. I really, really liked your 'hash' code, so I totally stole it. Mine uses a different method to generate the string used to make the hash, which seems to be a little faster, which increases the performance, according to the 'browserscope' charts. I include in my test a 'too much recursion' proof of concept to show that it has recursion protection, like JSON.stringify and .toSource().

My jsfiddle shows that the code returns the format you need. My jsperf seems to indicate that it outperforms the posted solution. I also included the linq.js solution, but it performs pretty bad in FireFox for me. It works comparably in Safari, Chrome, an IE, but not faster than mine, except in IE. I even tried it on my phone, and still I have the same performance difference. I have personally tested it in the latest versions of all browsers in a side-by-side to the posted solution, and mine out performs by around 40% across each of them. What are everyone's thoughts?

Here is my code:

var arr = [
  { N: 10, Name: "Foo" },
  { N: 10, Name: "Bar" },
  { N: 20, Name: "Foo" },
  { N: 20, Name: "Bar" }
];

var poc = { name:'blah', obj:{} };
poc.obj = poc;
var arrComplex = [
  { N: { Value: 10, TooMuchRecursionProofPOC:poc }, Name: "Foo" },
  { N: { Value: 10, TooMuchRecursionProofPOC:poc }, Name: "Bar" },
  { N: { Value: 20, TooMuchRecursionProofPOC:poc }, Name: "Foo" },
  { N: { Value: 20, TooMuchRecursionProofPOC:poc }, Name: "Bar" }
];

var eArr = Enumerable.From(arr);
var eArrComplex = Enumerable.From(arrComplex);

function setup_hashers() {
  // recursion protection idea
  var rp = '_rp'+(Math.random()*10000000);

  function tstr() {
    var out = '', i = '';
    if (this[rp]) { this[rp] = undefined; return out; }
    for (i in this)
      if (i != rp && this.hasOwnProperty(i))
        out += this[i] instanceof Object
          ? ((this[rp] = true) && this[i] != this && !this[i][rp] ? tstr.call(this[i]) : '')
          : (this[i].toString || tstr).call(this[i]);
    return out;
  };

  Number.prototype.GetHashCode = function() {
    return this.valueOf();
  };

  Object.prototype.GetHashCode = function() {
    var s = (this instanceof Object ? tstr : this.toString || tstr).call(this),
      h = 0;
    if (s.length)
      for (var i = 0; i < s.length; i++)
        h = ((h << 5) - h) + s.charCodeAt(i);

    return h;
  };
}

function group_by(a, keyFunc, valFunc, comp, as_array) {
  if (!a.length) return as_array ? [] : {};

  var keyFunc = keyFunc || function (e) { return e; },
      valFunc = valFunc || function (e) { return e; };
  var comp = comp || {
      Equals: function (a, b) { return a == b; },
      Hash: function (e) { return e.GetHashCode(); }
  };


  var hashs = {}, key = '', hash = '';
  for (var i = 0; i < a.length; i++) {
    key = keyFunc(a[i]);
    hash = comp.Hash(key);
    if (typeof hashs[hash] != 'undefined')
      hash = comp.Equals(key, hashs[hash].Key)
        ? hash
        : hash + '-' + i;
    hashs[hash] = hashs[hash] || { Key: key, Elements: [] };
    hashs[hash].Elements.push(valFunc(a[i]));
  }

  if (as_array) {
    var out = [], j = '', keys = Object.keys(hashs);
    for (var j = 0; j < keys.length; j++)
      out.push(hashs[keys[j]]);
    return out;
  }

  return hashs;
};

function group_by_control(a, keyFunc, valFunc) {
  if (!a.length) return as_array ? [] : {};

  var keyFunc = keyFunc || function (e) { return e; },
      valFunc = valFunc || function (e) { return e; };

  var hashs = {}, key = '', hash = '';
  for (var i = 0; i < a.length; i++) {
    key = keyFunc(a[i]);
    hashs[key] = hashs[key] || { Key: key, Elements: [] };
    hashs[key].Elements.push(valFunc(a[i]));
  }

  var out = [], j = '', keys = Object.keys(hashs);
  for (var j = 0; j < keys.length; j++)
  out.push(hashs[keys[j]]);
  return out;
};

setup_hashers();

console.log(group_by_control(
  arr,
  function(e) { return e.N },
  function(e) { return e.Name }
));

console.log(group_by(
  arrComplex, function(e) { return e.N; },
  function(e) { return e.Name; },
  {
    Equals: function(a, b) { return a.Value == b.Value },
    Hash: function(e) { return e.GetHashCode(); }
  }
));

console.log(group_by(
  arrComplex, function(e) { return e.N; },
  function(e) { return e.Name; },
  {
    Equals: function(a, b) { return a.Value == b.Value },
    Hash: function(e) { return e.GetHashCode(); }
  },
  true
));
Savino answered 23/12, 2013 at 5:50 Comment(0)
S
0

I managed to implement this way:

I need to get the hashcode from objects.

Object.prototype.GetHashCode = function () {
    var s = this instanceof Object ? stringify(this) : this.toString();

    var hash = 0;
    if (s.length === 0) return hash;
    for (var i = 0; i < s.length; ++i) {
        hash = ((hash << 5) - hash) + s.charCodeAt(i);
    }
    return hash;
};
Number.prototype.GetHashCode = function () { return this.valueOf(); };

As JSON.stringify will fail at circular references, I created another method to stringify it in a way I can get the most of an object as a string and compute the hashcode over it, as follows:

function isPlainObject(obj)
{
    if ((typeof (obj) !== "object" || obj.nodeType || (obj instanceof Window))
        || (obj.constructor && !({}).hasOwnProperty.call(obj.constructor.prototype, "isPrototypeOf"))
    )
    {
        return false;
    }

    return true;
}
function stringify(obj, s)
{
    s = s || "";

    for (var i in obj)
    {
        var o = obj[i];

        if (o && (o instanceof Array || isPlainObject(o)))
        {
            s += i + ":" + JSON.stringify(o);
        }
        else if (o && typeof o === "object")
        {
            s += i + ":" + "$ref#" + o;
        }
        else
        {
            s += i + ":" + o;
        }
    }

    return s;
}

There isn't much impact on performance. For large object it is the same and for small objects it loses, but is still pretty fast and safe. Performance test here.

Name                         op/s
---------------------------------
JSON.stringify large           62
stringify      large           62
JSON.stringify small    1,690,183
stringify      small    1,062,452

My GroupBy method

function GroupBy(a, keySelector, elementSelector, comparer)
{
    // set default values for opitinal parameters
    elementSelector = elementSelector || function(e) { return e; };
    comparer = comparer ||
        {
            Equals: function(a,b) { return a==b },
            GetHashCode: function(e) { return e.GetHashCode(); }
        };

    var key, hashKey, reHashKey;

    // keep groups separated by hash
    var hashs = {};
    for (var i = 0, n = a.length; i < n; ++i)
    {
        // in case of same hash, but Equals returns false
        reHashKey = undefined;

        // grabs the key
        key = keySelector(a[i]);
        // grabs the hashcode
        hashKey = comparer.GetHashCode(key);

        // if a hash exists in the list
        // compare values with Equals
        // in case it return false, generate a unique hash
        if (typeof hashs[hashKey] !== "undefined")
            reHashKey = comparer.Equals(key, hashs[hashKey].Key) ? hashKey : hashKey + " " + i;

        // if a new hash has been generated, update
        if (typeof reHashKey !== "undefined" && reHashKey !== hashKey)
            hashKey = reHashKey;

        // get/create a new group and add the current element to the list
        hashs[hashKey] = hashs[hashKey] || { Key: key, Elements: [] };
        hashs[hashKey].Elements.push(a[i]);
    }

    return hashs;
}

To test

var arrComplex =
[
    { N: { Value: 10 }, Name: "Foo" },
    { N: { Value: 10 }, Name: "Bar" },
    { N: { Value: 20 }, Name: "Foo" },
    { N: { Value: 20 }, Name: "Bar" }
];
//

var x = GroupBy(arrComplex
        , function(e) { return e.N; }
        , function(e) { return e.Name; }
        , {
              Equals: function(a,b) { return a.Value == b.Value },
              GetHashCode: function(e) { return e.GetHashCode(); }
          }
);
//

console.log(x);

Example on jsFiddle, now with Jedi.

But, accordingly to my tests, my implementation of GroupBy is slower than linq.js's GroupBy. It is only faster when I convert ToArray(). Maybe linq.js only really executes when I convert to array, that's why the difference, I am not sure on this part.

Test results

Name                         op/s
---------------------------------
GroupBy                   163,261
GroupByToArray            152,382
linq.js groupBy           243,547
linq.js groupBy toArray    26,309
Supererogation answered 17/12, 2013 at 14:46 Comment(0)
H
0

My simplest implementation using Map:

function groupByObjectKey(array, callback) {
    const groups = new Map();

    array.forEach(item => {
        const key = callback(item);

        if (!groups.has(key)) {
            groups.set(key, []);
        }

        groups.get(key).push(item);
    });

    return Object.fromEntries(groups.entries());
}
Hargeisa answered 5/2 at 21:44 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.