Implementing MongoDB-like query expression object evaluation

I've been looking for a function or class that evaluates MongoDB-like query expression objects ( http://docs.mongodb.org/manual/applications/read/#find, docs.mongodb.org/manual/reference/operators/ ). It need not cover all the advanced features, but it should have an extensible architecture.

MongoDB-like query expression objects are easy to understand and use, and they allow clean, self-explanatory code, because both the query and the objects being searched are associative arrays.

Basically, it is a convenience function for extracting information from PHP arrays. Given the array structure (the arrayPath), it makes it possible to operate on multidimensional array data without multiple nested loops.

If you are not familiar with MongoDB, take a look at the following expression object and the array to search in.

I wrote them as JSON strings for simplicity. The object contents make no sense; they just show the MongoDB query syntax.

MongoDb-like query expression object

{
    "name": "Mongo",
    "type": "db",
    "arch": {
        "$in": [
            "x86",
            "x64"
        ]
    },
    "version": {
        "$gte": 22
    },
    "released": {
        "$or": {
            "$lt": 2013,
            "$gt": 2012
        }
    }
}

The array to search in

[
    {
        "name": "Mongo",
        "type": "db",
        "release": {
            "arch": "x86",
            "version": 22,
            "year": 2012
        }
    },
    {
        "name": "Mongo",
        "type": "db",
        "release": {
            "arch": "x64",
            "version": 21,
            "year": 2012
        }
    },
    {
        "name": "Mongo",
        "type": "db",
        "release": {
            "arch": "x86",
            "version": 23,
            "year": 2013
        }
    }
]

Find using Mongo-like query expressions

So, with the help of the function, we should be able to issue the following query to the target array.

$found=findLikeMongo($array, $queryExpr); //resulting in a $array[0] value;
//@return found array

Get array path using Mongo-like query expressions

$arrayPath=getPathFromMongo($array, $queryExpr);// resulting in array("0")
//@return array path, represented as an array where entries are consecutive keys.
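
A minimal sketch of what such helpers might look like without SPL. The names findLikeMongo and getPathFromMongo come from the question; valueAtPath and matchesQuery are hypothetical helpers, and the supported operator set (plain equality, $in, $gte) and the dot-notation keys such as "release.arch" are assumptions made only for illustration.

```php
<?php
// Hypothetical helper: resolve a dot path such as "release.year" in a nested array.
// Returns null when any segment is missing.
function valueAtPath(array $item, $path) {
    $current = $item;
    foreach (explode('.', $path) as $segment) {
        if (!is_array($current) || !array_key_exists($segment, $current)) {
            return null;
        }
        $current = $current[$segment];
    }
    return $current;
}

// Hypothetical helper: true when $item satisfies every condition in $query.
// Only plain equality, '$in' and '$gte' are handled in this sketch.
function matchesQuery(array $item, array $query) {
    foreach ($query as $path => $condition) {
        $value = valueAtPath($item, $path);
        if (!is_array($condition)) {
            if ($value != $condition) {
                return false;
            }
            continue;
        }
        foreach ($condition as $operator => $operand) {
            switch ($operator) {
                case '$in':
                    if (!in_array($value, $operand)) {
                        return false;
                    }
                    break;
                case '$gte':
                    if (!($value >= $operand)) {
                        return false;
                    }
                    break;
                default:
                    throw new InvalidArgumentException('Unsupported operator ' . $operator);
            }
        }
    }
    return true;
}

// Returns the first matching element, or null when nothing matches.
function findLikeMongo(array $array, array $query) {
    foreach ($array as $item) {
        if (matchesQuery($item, $query)) {
            return $item;
        }
    }
    return null;
}

// Returns the path to the first match as a list of consecutive keys, e.g. array("0").
function getPathFromMongo(array $array, array $query) {
    foreach ($array as $key => $item) {
        if (matchesQuery($item, $query)) {
            return array((string) $key);
        }
    }
    return array();
}
```

With a query such as array("name" => "Mongo", "release.arch" => array('$in' => array("x86", "x64")), "release.version" => array('$gte' => 22)), the sketch would select the first element of the sample array. Adding an operator means adding one case to the switch, which is one simple way to keep the design extensible.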

Homework

  • I found that goessner.net/articles/JsonPath/ could possibly cover my needs (not an exact match, because it uses XPath-like expressions). The caveat is that it relies heavily on regular expressions and string parsing, which will likely slow it down compared to an array-only (JSON-like) implementation.

  • I also found a similar question here on Stack Overflow, Evaluating MongoDB-like JSON Queries in PHP. The resulting answer was to use some SPL functions, which I usually avoid.
    I wonder whether the author ever came up with the function he was trying to develop.

  • A possible arrayPath implementation was found at thereisamoduleforthat.com/content/dealing-deep-arrays-php; the drawback of that implementation is that it relies on pointers.

I know it's not a trivial question with a one-liner answer; that's why I'm asking before starting to develop my own class.

I appreciate architecture tips and related or similar code that may serve as a good-practice example for building PHP "if..else" expressions on the fly.

How to write a non-SPL version?

@Baba provided an excellent class, which is written using SPL. I wonder how to rewrite this code without SPL.

There are two reasons for this:

  • calling the class many times incurs function-call overhead, which can be avoided by rewriting it in raw PHP.
  • it would be easily portable to raw JavaScript, where SPL is not available, making the code easier to maintain on both platforms.

Results

The resulting ArrayQuery class is published on GitHub; check out the repository for updates.

SPL, raw PHP version and Chequer2 FORP profiler output

In brief:

  1. The raw PHP version performs 10x faster than the SPL one and consumes 20% less memory.
  2. The Chequer2 class performs 40% slower than the PHP SPL class, and almost 20x slower than the raw PHP version.
  3. MongoDB is the fastest (10x faster than the raw PHP implementation, with 5x less memory). Do not use these classes unless you are sure you want to avoid interaction with MongoDB.

MongoDb version

[Image: MongoDb reference profiling results]

SPL version

[Image: PHP with SPL class profiling results]

Raw PHP (latest ArrayQuery class) version

[Image: raw PHP ArrayQuery class profiling results]

Chequer2 version

[Image: Chequer2 PHP class profiling results]

MongoDb reference test profiling code

$m = new MongoClient(); // connect
$db = $m->testmongo; // select a database
$collection = $db->data;
$loops=100;
for ($i=0; $i<$loops; $i++) {
    $d = $collection->find(array("release.year" => 2013));
}
print_r( iterator_to_array($d) );

PHP with SPL class profiling code

include('data.php');
include('phpmongo-spl.php');
$s = new ArrayCollection($array, array("release.year" => 2013),false);
$loops=100;
for ($i=0; $i<$loops; $i++) {
    $d = $s->parse();
}
print_r( $d );

The SPL class's parse() function was slightly modified to return the value after execution. It could also be modified to accept an expression, but that is not essential for profiling purposes, because the expression is re-evaluated every time.
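
Where the forp extension is not available, a rough timing loop can be sketched with core PHP functions only. The helper name profileCallable and the stand-in workload are hypothetical; microtime() and memory_get_peak_usage() are standard PHP, but the numbers they give are much coarser than a real profiler's output.

```php
<?php
// Hypothetical helper: run $callable $loops times and report wall time and peak memory.
function profileCallable($callable, $loops = 100) {
    $start = microtime(true);
    $result = null;
    for ($i = 0; $i < $loops; $i++) {
        $result = call_user_func($callable);
    }
    return array(
        'seconds'    => microtime(true) - $start,
        'peak_bytes' => memory_get_peak_usage(),
        'result'     => $result,
    );
}

// Stand-in workload (an assumption): filter a small array by release year with a plain loop.
$array = array(
    array("release" => array("year" => 2012)),
    array("release" => array("year" => 2013)),
);
$stats = profileCallable(function () use ($array) {
    $found = array();
    foreach ($array as $key => $item) {
        if ($item["release"]["year"] == 2013) {
            $found[$key] = $item;
        }
    }
    return $found;
});
printf("%.6f s, %d bytes peak\n", $stats['seconds'], $stats['peak_bytes']);
```

Any of the profiled calls, such as $s->find(...), could be passed in place of the stand-in closure.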

raw PHP(latest ArrayQuery class) profiling code

include('data.php');
include('phpmongo-raw.php');
$s = new ArrayStandard($array);
$loops=100;
for ($i=0; $i<$loops; $i++) {
    $d = $s->find(array("release.year" => 2013));
}
print_r( $d );

chequer2 PHP profiling code

<?php
include('data.php');
include('../chequer2/Chequer.php');
$query=array("release.year" => 2013);

$loops=100;
for ($i=0; $i<$loops; $i++) {
    $result=Chequer::shorthand('(.release.year > 2012) ? (.) : NULL')
        ->walk($array);

}
print_r($result);
?>

data used(same as @baba provided in his answer)

$json = '[{
    "name":"Mongo",
    "type":"db",
    "release":{
        "arch":"x86",
        "version":22,
        "year":2012
    }
},
{
    "name":"Mongo",
    "type":"db",
    "release":{
        "arch":"x64",
        "version":21,
        "year":2012
    }
},
{
    "name":"Mongo",
    "type":"db",
    "release":{
        "arch":"x86",
        "version":23,
        "year":2013
    }
},      
{
    "key":"Diffrent",
    "value":"cool",
    "children":{
        "tech":"json",
        "lang":"php",
        "year":2013
    }
}
]';

$array = json_decode($json, true);

the forp-ui slightly modified sample ui loader(to be called with ?profile=FILE_TO_PROFILE)

<!doctype html>
<html>
    <head>
        <style>
            body {margin : 0px}
        </style>
    </head>
    <body>
        <div class="forp"></div>
<?php
register_shutdown_function(
    function() {
        // next code can be append to PHP scripts in dev mode
        ?>
        <script src="../forp-ui/js/forp.min.js"></script>
        <script>
        (function(f) {
            f.find(".forp")
             .each(
                function(el) {
                    el.css('margin:50px;height:300px;border:1px solid #333');
                }
             )
             .forp({
                stack : <?php echo json_encode(forp_dump()); ?>,
                //mode : "fixed"
             })
        })(forp);
        </script>
        <?php
    }
);

// start forp
forp_start();

// our PHP script to profile
include($_GET['profile']);

// stop forp
forp_end();
?>
</body>
</html>
Roseannaroseanne answered 20/2, 2013 at 3:55 Comment(0)

Latest Update

@baba has given a great raw PHP version of a class implementing MongoDB-like query expression object evaluation, but the output structure differs a bit: the nested array output uses dot notation ( [release.arch] => x86 ) instead of regular arrays ( [release] => Array([arch] => x86) ). I would appreciate a tip on how to make the class fully compatible with MongoDB in this respect, as it seems to be strictly tied to the raw PHP class implementation.

=======================================================================

Answer:

What you want is very easy. All you need is two corrections to the current code, one in the input loop and one in the output loop, and you get your new format.

What do I mean?

A. Changed

    foreach ( $array as $part ) {
        $this->flatten[] = $this->convert($part);
    }

To

    foreach ( $array as $k => $part ) {
        $this->flatten[$k] = $this->convert($part);
    }

B. Changed

    foreach ( $this->flatten as $data ) {
        $this->check($find, $data, $type) and $f[] = $data;
    }

To:

    foreach ( $this->flatten as $k => $data ) {
        $this->check($find, $data, $type) and $f[] = $this->array[$k];
    }
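
The idea behind change B can be sketched in isolation: match against the flattened copy, but return the original element stored under the same key. The two literal arrays and the equality-only matching below are illustrative assumptions, not the class's actual code.

```php
<?php
// Original nested elements and their flattened (dot-notation) copies share the same keys.
$original = array(
    0 => array("name" => "Mongo", "release" => array("arch" => "x86", "year" => 2012)),
    1 => array("name" => "Mongo", "release" => array("arch" => "x64", "year" => 2012)),
);
$flatten = array(
    0 => array("name" => "Mongo", "release.arch" => "x86", "release.year" => 2012),
    1 => array("name" => "Mongo", "release.arch" => "x64", "release.year" => 2012),
);

$find = array("release.arch" => "x64");
$found = array();
foreach ($flatten as $k => $data) {
    // Equality-only check: every queried key must exist and match.
    $matches = true;
    foreach ($find as $key => $expected) {
        if (!isset($data[$key]) || $data[$key] != $expected) {
            $matches = false;
            break;
        }
    }
    if ($matches) {
        $found[] = $original[$k]; // return the nested original, not the flattened copy
    }
}
print_r($found);
```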

New array for testing

$json = '[
  {
    "name": "Mongo",
    "release": {
      "arch": "x86",
      "version": 22,
      "year": 2012
    },
    "type": "db"
  },
  {
    "name": "Mongo",
    "release": {
      "arch": "x64",
      "version": 21,
      "year": 2012
    },
    "type": "db"
  },
  {
    "name": "Mongo",
    "release": {
      "arch": "x86",
      "version": 23,
      "year": 2013
    },
    "type": "db"
  },
  {
    "name": "MongoBuster",
    "release": {
      "arch": [
        "x86",
        "x64"
      ],
      "version": 23,
      "year": 2013
    },
    "type": "db"
  },
  {
    "children": {
      "dance": [
        "one",
        "two",
        {
          "three": {
            "a": "apple",
            "b": 700000,
            "c": 8.8
          }
        }
      ],
      "lang": "php",
      "tech": "json",
      "year": 2013
    },
    "key": "Diffrent",
    "value": "cool"
  }
]';

$array = json_decode($json, true);

Simple Test

$s = new ArrayStandard($array);
print_r($s->find(array("release.arch"=>"x86")));

Output

Array
(
    [0] => Array
        (
            [name] => Mongo
            [type] => db
            [release] => Array
                (
                    [arch] => x86
                    [version] => 22
                    [year] => 2012
                )

        )

    [1] => Array
        (
            [name] => Mongo
            [type] => db
            [release] => Array
                (
                    [arch] => x86
                    [version] => 23
                    [year] => 2013
                )

        )

)

If you also want to retain original array key position you can have

    foreach ( $this->flatten as $k => $data ) {
        $this->check($find, $data, $type) and $f[$k] = $this->array[$k];
    }

Just for Fun Part

A. Support for regex

Just for fun, I added support for $regex, with the aliases $preg and $match, which means you can have

print_r($s->find(array("release.arch" => array('$regex' => "/4$/"))));

Or

print_r($s->find(array("release.arch" => array('$preg' => "/4$/"))));

Output

Array
(
    [1] => Array
        (
            [name] => Mongo
            [type] => db
            [release] => Array
                (
                    [arch] => x64
                    [version] => 21
                    [year] => 2012
                )

        )

)

B. Use Simple array like queries

$queryArray = array(
        "release" => array(
                "arch" => "x86"
        )
);
$d = $s->find($s->convert($queryArray));

$s->convert($queryArray) has converted

Array
(
    [release] => Array
        (
            [arch] => x86
        )

)

To

Array
(
    [release.arch] => x86
)
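
The conversion from a nested query to dot notation can be sketched as a standalone recursive function. The name flattenKeys is hypothetical, and unlike the class's tokenizer this sketch does not add JSON-encoded parent entries.

```php
<?php
// Hypothetical standalone version of the flattening step: nested keys are joined
// with "." so that array("release" => array("arch" => "x86")) becomes
// array("release.arch" => "x86").
function flattenKeys(array $array, $prefix = '') {
    $flat = array();
    foreach ($array as $key => $value) {
        $path = $prefix === '' ? (string) $key : $prefix . '.' . $key;
        if (is_array($value)) {
            // Recurse and merge the child paths into the result.
            foreach (flattenKeys($value, $path) as $childPath => $childValue) {
                $flat[$childPath] = $childValue;
            }
        } else {
            $flat[$path] = $value;
        }
    }
    return $flat;
}

print_r(flattenKeys(array("release" => array("arch" => "x86"))));
```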

C. Modulus $mod

print_r($s->find(array(
        "release.version" => array(
                '$mod' => array(
                        23 => 0
                )
        )
)));

 //Checks release.version % 23 == 0 ;
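
For comparison, MongoDB's own $mod takes a [divisor, remainder] pair, while the query above passes array(divisor => remainder). The check itself can be sketched as follows; matchesMod is a hypothetical helper, and treating non-numeric values and a zero divisor as "no match" is an assumption of this sketch.

```php
<?php
// Hypothetical helper: true when $value % $divisor equals $remainder.
function matchesMod($value, $divisor, $remainder = 0) {
    if (!is_numeric($value) || (int) $divisor === 0) {
        return false; // non-numeric values and a zero divisor never match
    }
    return ((int) $value % (int) $divisor) === (int) $remainder;
}

var_dump(matchesMod(23, 23, 0)); // bool(true)
var_dump(matchesMod(22, 23, 0)); // bool(false)
```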

D. Count elements with $size

print_r($s->find(array(
        "release.arch" => array(
                '$size' => 2
        )
)));

// returns count(release.arch) == 2;

E. Check that it matches all elements in an array with $all

print_r($s->find(array(
        "release.arch" => array(
                '$all' => array(
                        "x86",
                        "x64"
                )
        )
)));

Output

Array
(
    [3] => Array
        (
            [name] => MongoBuster
            [release] => Array
                (
                    [arch] => Array
                        (
                            [0] => x86
                            [1] => x64
                        )

                    [version] => 23
                    [year] => 2013
                )

            [type] => db
        )

)
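
A value-based version of the $all check can be sketched with a plain loop. The helper name containsAll is hypothetical; it uses loose comparison through in_array(), which is an assumption about the desired matching rules.

```php
<?php
// Hypothetical helper: true when every value in $required occurs somewhere in $values.
function containsAll(array $values, array $required) {
    foreach ($required as $needle) {
        if (!in_array($needle, $values)) {
            return false;
        }
    }
    return true;
}

var_dump(containsAll(array("x86", "x64"), array("x64", "x86"))); // bool(true)
var_dump(containsAll(array("x86"), array("x86", "x64")));        // bool(false)
```

The order of the values does not matter here, only their presence.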

F. If you are not sure of the element's key name, you can use $has; it is like the opposite of $in

print_r($s->find(array(
        "release" => array(
                '$has' => "x86"
        )
)));
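
The $has idea, finding a value without knowing its key, can be sketched as a small recursive search. The helper name hasValue is hypothetical, and searching nested levels as well is an assumption of this sketch rather than documented behaviour of the class.

```php
<?php
// Hypothetical helper: true when $needle appears as a value anywhere in $subtree,
// whatever key it is stored under.
function hasValue(array $subtree, $needle) {
    foreach ($subtree as $value) {
        if (is_array($value)) {
            if (hasValue($value, $needle)) {
                return true;
            }
        } elseif ($value == $needle) {
            return true;
        }
    }
    return false;
}

$release = array("arch" => "x86", "version" => 22, "year" => 2012);
var_dump(hasValue($release, "x86")); // bool(true)
```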

=======================================================================

Old Update

@Baba provided an excellent class, which is written using SPL. I wonder how to rewrite this code without SPL. The reason is that calling this class many times incurs function-call overhead, which can be avoided by rewriting it in raw PHP, and maybe by using a goto statement in the final version to avoid recursive function calls.

=======================================================================

Since you don't want SPL and functions, it took a while, but I was able to come up with an alternative class that is also flexible and easy to use.

To avoid loading the array multiple times, you declare it once:

$array = json_decode($json, true);
$s = new ArrayStandard($array);

A. Find where release.year is 2013

$d = $s->find(array(
        "release.year" => "2013"
));
print_r($d);

Output

Array
(
    [0] => Array
        (
            [name] => Mongo
            [type] => db
            [release.arch] => x86
            [release.version] => 23
            [release.year] => 2013
        )

)

B. You can now run a complex $and or $or statement, for example find where release.arch = x86 and release.year = 2012

$d = $s->find(array(
        "release.arch" => "x86",
        "release.year" => "2012"
), ArrayStandard::COMPLEX_AND);

print_r($d);

Output

Array
(
    [0] => Array
        (
            [name] => Mongo
            [type] => db
            [release.arch] => x86
            [release.version] => 22
            [release.year] => 2012
        )

)
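
How the two modes combine per-field results can be sketched separately from the class: count how many queried fields were checked and how many of them passed. The helper name combineResults is hypothetical; the constant values 1 and 2 mirror COMPLEX_OR and COMPLEX_AND in the class.

```php
<?php
define('COMPLEX_OR', 1);
define('COMPLEX_AND', 2);

// Hypothetical helper: $results holds one boolean per queried field that was
// present in the element. OR needs at least one true; AND needs all of them true.
function combineResults(array $results, $mode = COMPLEX_OR) {
    $required = count($results);
    $passed = count(array_filter($results));
    if ($passed === 0) {
        return false;
    }
    if ($mode == COMPLEX_AND && $passed !== $required) {
        return false;
    }
    return true;
}

var_dump(combineResults(array(true, false), COMPLEX_OR));  // bool(true)
var_dump(combineResults(array(true, false), COMPLEX_AND)); // bool(false)
```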

C. Imagine a much more complex query

$d = $s->find(array(
        "release.year" => array(
                '$in' => array(
                        "2012",
                        "2013"
                )
        ),
        "release.version" => array(
                '$gt' => 22
        ),
        "release.arch" => array(
                '$func' => function ($a) {
                    return $a == "x86";
                }
        )
), ArrayStandard::COMPLEX_AND);

print_r($d);

Output

Array
(
    [0] => Array
        (
            [name] => Mongo
            [type] => db
            [release.arch] => x86
            [release.version] => 23
            [release.year] => 2013
        )

)

The new Modified class

class ArrayStandard {
    const COMPLEX_OR = 1;
    const COMPLEX_AND = 2;
    private $array;
    private $tokens;
    private $found;

    function __construct(array $array) {
        $this->array = $array;
        foreach ( $array as $k => $item ) {
            $this->tokens[$k] = $this->tokenize($item);
        }   
    }

    public function getTokens() {
        return $this->tokens;
    }

    public function convert($part) {
        return $this->tokenize($part, null, false);
    }

    public function find(array $find, $type = 1) {
        $f = array();
        foreach ( $this->tokens as $k => $data ) {
            $this->check($find, $data, $type) and $f[$k] = $this->array[$k];
        }
        return $f;
    }

    private function check($find, $data, $type) {
        $o = $r = 0; // Obligation & Requirement
        foreach ( $data as $key => $value ) {
            if (isset($find[$key])) {
                $r ++;
                $options = $find[$key];
                if (is_array($options)) {
                    reset($options);
                    $eK = key($options);
                    $eValue = current($options);
                    if (strpos($eK, '$') === 0) {
                        $this->evaluate($eK, $value, $eValue) and $o ++;
                    } else {
                        throw new InvalidArgumentException('Missing "$" in expression key');
                    }
                } else {
                    $this->evaluate('$eq', $value, $options) and $o ++;
                }
            }
        }

        if ($o === 0)
            return false;

        if ($type == self::COMPLEX_AND and $o !== $r)
            return false;

        return true;
    }

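    private function getValue(array $path, $data = null) {
        // Walk $path (a list of consecutive keys) down from the root array.
        $data = func_num_args() > 1 ? $data : $this->array;
        return count($path) > 1 ? $this->getValue(array_slice($path, 1), $data[$path[0]]) : $data[$path[0]];
    }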

    private function tokenize($array, $prefix = '', $addParent = true) {
        $paths = array();
        $px = empty($prefix) ? null : $prefix . ".";
        foreach ( $array as $key => $items ) {
            if (is_array($items)) {
                $addParent && $paths[$px . $key] = json_encode($items);
                foreach ( $this->tokenize($items, $px . $key) as $k => $path ) {
                    $paths[$k] = $path;
                }
            } else {
                $paths[$px . $key] = $items;
            }
        }
        return $paths;
    }

    private function evaluate($func, $a, $b) {
        $r = false;

        switch ($func) {
            case '$eq' :
                $r = $a == $b;
                break;
            case '$not' :
                $r = $a != $b;
                break;
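            case '$gte' :
                if ($this->checkType($a, $b)) {
                    $r = $a >= $b;
                }
                break;
            case '$gt' :
                if ($this->checkType($a, $b)) {
                    $r = $a > $b;
                }
                break;

            case '$lte' :
                if ($this->checkType($a, $b)) {
                    $r = $a <= $b;
                }
                break;
            case '$lt' :
                if ($this->checkType($a, $b)) {
                    $r = $a < $b;
                }
                break;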
            case '$in' :
                if (! is_array($b))
                    throw new InvalidArgumentException('Invalid argument for $in option must be array');
                $r = in_array($a, $b);
                break;

            case '$has' :
                if (is_array($b))
                    throw new InvalidArgumentException('Invalid argument for $has array not supported');
                $a = @json_decode($a, true) ?  : array();
                $r = in_array($b, $a);
                break;

            case '$all' :
                $a = @json_decode($a, true) ?  : array();
                if (! is_array($b))
                    throw new InvalidArgumentException('Invalid argument for $all option must be array');
                $r = count(array_intersect($b, $a)) == count($b); // compare values, not keys
                break;

            case '$regex' :
            case '$preg' :
            case '$match' :

                $r = (boolean) preg_match($b, $a, $match);
                break;

            case '$size' :
                $a = @json_decode($a, true) ?  : array();
                $r = (int) $b == count($a);
                break;

            case '$mod' :
                if (! is_array($b))
                    throw new InvalidArgumentException('Invalid argument for $mod option must be array');
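                // each() was removed in PHP 8; read the first divisor => remainder pair directly
                reset($b);
                $x = key($b); // divisor
                $y = current($b); // expected remainder
                $r = $a % $x == $y;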
                break;

            case '$func' :
            case '$fn' :
            case '$f' :
                if (! is_callable($b))
                    throw new InvalidArgumentException('Function should be callable');
                $r = $b($a);
                break;

            default :
                throw new ErrorException("Condition not valid ... Use \$fn for custom operations");
                break;
        }

        return $r;
    }

    private function checkType($a, $b) {
        if (is_numeric($a) && is_numeric($b)) {
            $a = filter_var($a, FILTER_SANITIZE_NUMBER_FLOAT);
            $b = filter_var($b, FILTER_SANITIZE_NUMBER_FLOAT);
        }

        if (gettype($a) != gettype($b)) {
            return false;
        }
        return true;
    }
}
Parsnip answered 19/3, 2013 at 20:28 Comment(23)
Great class, I'll update the question with the class comparison using the forp PHP profiler right away. - Roseannaroseanne
Thank you. The output structure differs a bit. I took a glance at the implementation of the output, hoping for a quick fix before profiling, but it looks like this is just how the class logic behaves. I mean the dot notation in the raw PHP version's nested array output ( [release.arch] => x86 ) instead of regular arrays ( [release] => Array([arch] => x86) ). I would appreciate a tip on how to make the classes fully compatible in this respect. - Roseannaroseanne
This has nothing to do with SPL speed. I only took time to optimize this class better than the previous answer, using an array-flatten method. I'm sure that if I rewrote it using SPL it would definitely be faster and better. - Parsnip
I'm talking about the class output, not the input query! I.e. after running the $d = $s->find(array("release.year" => "2013")); print_r($d); query, it should return an array like the original, Array ( [0] => Array ( [name] => Mongo [type] => db [release] => Array ( [arch] => x86 [version] => 22 [year] => 2012 ) ) ), not Array ( [0] => Array ( [name] => Mongo [type] => db [release.arch] => x86 [release.version] => 23 [release.year] => 2013 ) ) - Roseannaroseanne
Oh OK, your question was not clear. I will look into it and update my answer. - Parsnip
Answered the question. You can still hold on for now while I try it on more complex queries. - Parsnip
Thank you for the prompt update, I'll test it and publish the profiling results. By the way, I'll keep the bounty running so more people can see your great work :) - Roseannaroseanne
@Roseannaroseanne It's getting too long and messy and becoming more like a god object, which I strongly oppose. I'm moving the improvement of the class to GitHub, github.com/olekukonko/ArrayQuery; you can monitor the full progress there. - Parsnip
Finally I've added MongoDB reference profiling results; it's obvious it's faster. Do you have any thoughts on how to avoid tokenization in the ArrayQuery class? The is_array and json_encode calls cost too much (I'm aware that json_encode is the fastest tokenizer). - Roseannaroseanne
Well, it took you over 3 weeks to respond to a comment ... I have lost interest ... almost deleting the git repository. - Parsnip
lol, and I have almost prepared the documentation (compiled from the question) for the class README. Will you accept the pull request? - Roseannaroseanne
I will make a few improvements and push them to GitHub. - Roseannaroseanne
I may be inspired again when I see your improvements. - Parsnip
Pushed the new README and usage instructions compiled from this discussion, plus benchmarks, to your repository. Good work should always be finished, and you are right that this class should not become a god object. - Roseannaroseanne
Well, the tokens act like a cache; removing them would affect performance. That is the reason the first answer was slow: it was parsing the find in real time. - Parsnip
Maybe it's possible to implement this cache on pure arrays, without tokenizing to strings? - Roseannaroseanne
You would end up looping all the time, or recursing; see getPath in the other answer. - Parsnip
I've just noticed that json_encode is used only for several advanced queries ($has, $all, $size). I'll look deeper into the class to find out why it's still slower even when json_encode is commented out. At first glance it's just due to multiple function calls. - Roseannaroseanne
From my research it's very fast. It's only slow when testing 10000 finds (took 5 sec), because find had to loop over every element in the array and run a check. From what I see, it's fast enough; as it becomes more functional, let's see what can be done. - Parsnip
Now I'm working on class readability, making the code self-documenting (some parts with short, one-letter variables are not very clear to me). Then I will implement some ideas for an optimized, more linear production version with fewer function calls. - Roseannaroseanne
I'm also working on some improvements and functionality. I'm using this JSON, mega.co.nz/#!1A8BTRpC, running it over 10,000 times to see the performance difference ... - Parsnip
Use this URL: new.json - Parsnip
So the ArrayQuery README.md says that "It may cover not all the advanced features"--which features might those be? How well-tested is it not just in terms of performance but functionality? - Thurible

Introduction

I think Evaluating MongoDB-like JSON Queries in PHP has given all the information you need. All you need is to be creative with the solution, and you will achieve what you want.

The Array

Let's assume we have the following JSON converted to an array

$json = '[{
    "name":"Mongo",
    "type":"db",
    "release":{
        "arch":"x86",
        "version":22,
        "year":2012
    }
},
{
    "name":"Mongo",
    "type":"db",
    "release":{
        "arch":"x64",
        "version":21,
        "year":2012
    }
},
{
    "name":"Mongo",
    "type":"db",
    "release":{
        "arch":"x86",
        "version":23,
        "year":2013
    }
},      
{
    "key":"Diffrent",
    "value":"cool",
    "children":{
        "tech":"json",
        "lang":"php",
        "year":2013
    }
}
]';

$array = json_decode($json, true);

Example 1

Checking whether key equals "Diffrent" is as simple as

echo new ArrayCollection($array, array("key" => "Diffrent"));

Output

{"3":{"key":"Diffrent","value":"cool","children":{"tech":"json","lang":"php","year":2013}}}

Example 2 Check if release year is 2013

echo new ArrayCollection($array, array("release.year" => 2013));

Output

{"2":{"name":"Mongo","type":"db","release":{"arch":"x86","version":23,"year":2013}}}

Example 3

Count where Year is 2012

$c = new ArrayCollection($array, array("release.year" => 2012));
echo count($c); // output 2 

Example 4

Let's take a case from your example, where you want to check that version is greater than 22

$c = new ArrayCollection($array, array("release.version" => array('$gt'=>22)));
echo $c;

Output

{"2":{"name":"Mongo","type":"db","release":{"arch":"x86","version":23,"year":2013}}}

Example 5

Check if release.arch value is IN a set such as [x86,x100] (Example)

$c = new ArrayCollection($array, array("release.arch" => array('$in'=>array("x86","x100"))));
foreach($c as $var)
{
    print_r($var);
}

Output

Array
(
    [name] => Mongo
    [type] => db
    [release] => Array
        (
            [arch] => x86
            [version] => 22
            [year] => 2012
        )

)
Array
(
    [name] => Mongo
    [type] => db
    [release] => Array
        (
            [arch] => x86
            [version] => 23
            [year] => 2013
        )

)

Example 6

Using Callable

$year = 2013;
$expression = array("release.year" => array('$func' => function ($value) use($year) {
    return $value === $year; // compare against the captured $year
}));

$c = new ArrayCollection($array, $expression);

foreach ( $c as $var ) {
    print_r($var);
}

Output

Array
(
    [name] => Mongo
    [type] => db
    [release] => Array
        (
            [arch] => x86
            [version] => 23
            [year] => 2013
        )

)

Example 7

Register your own expression name

$c = new ArrayCollection($array, array("release.year" => array('$baba' => 3)), false);
$c->register('$baba', function ($a, $b) {
    return substr($a, - 1) == $b;
});
$c->parse();
echo $c;

Output

{"2":{"name":"Mongo","type":"db","release":{"arch":"x86","version":23,"year":2013}}}
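
The registration mechanism can also be sketched without the class, as a plain map from operator name to callable. The variable $operators and the helper applyOperator are hypothetical names; the '$baba' operator mirrors the example above.

```php
<?php
// Operator registry: name => callable($fieldValue, $operand) returning bool.
$operators = array(
    '$eq' => function ($a, $b) { return $a == $b; },
    '$gt' => function ($a, $b) { return $a > $b; },
);

// Register a custom operator: the last character of the value must equal $b.
$operators['$baba'] = function ($a, $b) {
    return substr((string) $a, -1) == $b;
};

// Hypothetical helper: look up and apply an operator, failing loudly if it is unknown.
function applyOperator(array $operators, $name, $value, $operand) {
    if (!isset($operators[$name])) {
        throw new InvalidArgumentException('Unknown operator: ' . $name);
    }
    return (bool) call_user_func($operators[$name], $value, $operand);
}

var_dump(applyOperator($operators, '$baba', 2013, 3)); // bool(true)
```

Keeping operators in an array rather than in a switch statement means new ones can be added at runtime without touching the evaluation code.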

Class Used

class ArrayCollection implements IteratorAggregate, Countable, JsonSerializable {
    private $array;
    private $found = array();
    private $log;
    private $expression;
    private $register;

    function __construct(array $array, array $expression, $parse = true) {
        $this->array = $array;
        $this->expression = $expression;
        $this->registerDefault();
        $parse === true and $this->parse();
    }

    public function __toString() {
        return $this->jsonSerialize();
    }

    public function jsonSerialize() {
        return json_encode($this->found);
    }

    public function getIterator() {
        return new ArrayIterator($this->found);
    }

    public function count() {
        return count($this->found);
    }

    public function getLog() {
        return $this->log;
    }

    public function register($offset, $value) {
        if (strpos($offset, '$') !== 0)
            throw new InvalidArgumentException('Expression name must always start with "$" sign');

        if (isset($this->register[$offset]))
            throw new InvalidArgumentException(sprintf('Expression %s already registered .. Please unregister it first', $offset));

        if (! is_callable($value)) {
            throw new InvalidArgumentException('Only callable value can be registered');
        }

        $this->register[$offset] = $value;
    }

    public function unRegister($offset) {
        unset($this->register[$offset]);
    }

    public function parse() {
        $it = new RecursiveIteratorIterator(new RecursiveArrayIterator($this->array));
        foreach ( $it as $k => $items ) {
            if ($this->evaluate($this->getPath($it), $items)) {
                $this->found[$it->getSubIterator(0)->key()] = $this->array[$it->getSubIterator(0)->key()];
            }
        }
    }

    private function registerDefault() {
        $this->register['$eq'] = array($this,"evaluateEqal");
        $this->register['$not'] = array($this,"evaluateNotEqual");

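        // '$gte' and '$lte' include equality; '$gt' and '$lt' stay strict.
        $this->register['$gte'] = function ($a, $b) {
            return $this->checkType($a, $b) && $a >= $b;
        };
        $this->register['$gt'] = array($this,"evaluateGreater");

        $this->register['$lte'] = function ($a, $b) {
            return $this->checkType($a, $b) && $a <= $b;
        };
        $this->register['$lt'] = array($this,"evaluateLess");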

        $this->register['$in'] = array($this,"evalueateInset");

        $this->register['$func'] = array($this,"evalueateFunction");
        $this->register['$fn'] = array($this,"evalueateFunction");
        $this->register['$f'] = array($this,"evalueateFunction");
    }

    private function log($log) {
        $this->log[] = $log;
    }

    private function getPath(RecursiveIteratorIterator $it) {
        $keyPath = array();
        foreach ( range(1, $it->getDepth()) as $depth ) {
            $keyPath[] = $it->getSubIterator($depth)->key();
        }
        return implode(".", $keyPath);
    }

    private function checkType($a, $b) {
        if (gettype($a) != gettype($b)) {
            $this->log(sprintf("%s - %s  is not same type of %s - %s", json_encode($a), gettype($a), json_encode($b), gettype($b)));
            return false;
        }
        return true;
    }

    private function evaluate($key, $value) {
        $o = $r = 0; // Obligation & Requirement
        foreach ( $this->expression as $k => $options ) {
            if ($k !== $key)
                continue;

            if (is_array($options)) {
                foreach ( $options as $eK => $eValue ) {
                    if (strpos($eK, '$') === 0) {
                        $r ++;
                        $callable = $this->register[$eK];
                        $callable($value, $eValue) and $o ++;
                    } else {
                        throw new InvalidArgumentException('Missing "$" in expression key');
                    }
                }
            } else {

                $r ++;
                $this->evaluateEqal($value, $options) and $o ++;
            }
        }
        return $r > 0 && $o === $r;
    }

    private function evaluateEqal($a, $b) {
        return $a == $b;
    }

    private function evaluateNotEqual($a, $b) {
        return $a != $b;
    }

    private function evaluateLess($a, $b) {
        return $this->checkType($a, $b) and $a < $b;
    }

    private function evaluateGreater($a, $b) {
        return $this->checkType($a, $b) and $a > $b;
    }

    private function evalueateInset($a, array $b) {
        return in_array($a, $b);
    }

    private function evalueateFunction($a, callable $b) {
        return $b($a);
    }
}

Summary

It may cover not all the advanced features, and should have extensible architecture

The above class shows a typical example of what you want. You can easily decouple it and extend it to support compound expressions like $and and $or.
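
As a rough idea of how compound expressions could be layered on top, here is a minimal sketch. It assumes documents that are already flattened to dot-notation keys, and the function names matchesField and matchesQuery are hypothetical, not part of the class above. Only $gt, $lt and $in are handled, to keep it short.

```php
<?php
// Hypothetical helpers for illustration only; not part of the class above.

/** Evaluate one field condition: a plain value or array('$op' => operand). */
function matchesField($value, $condition)
{
    if (!is_array($condition)) {
        return $value == $condition;          // implicit equality
    }
    foreach ($condition as $op => $operand) { // every operator must hold
        switch ($op) {
            case '$gt': $ok = $value > $operand; break;
            case '$lt': $ok = $value < $operand; break;
            case '$in': $ok = in_array($value, $operand); break;
            default:
                throw new InvalidArgumentException("Unsupported operator $op");
        }
        if (!$ok) {
            return false;
        }
    }
    return true;
}

/** Evaluate a query against a flat document, recursing into $and / $or lists. */
function matchesQuery(array $doc, array $query)
{
    foreach ($query as $key => $condition) {
        if ($key === '$and' || $key === '$or') {
            $hits = 0;
            foreach ($condition as $sub) {
                $hits += matchesQuery($doc, $sub) ? 1 : 0;
            }
            $ok = ($key === '$and') ? $hits === count($condition) : $hits > 0;
        } else {
            $ok = array_key_exists($key, $doc) && matchesField($doc[$key], $condition);
        }
        if (!$ok) {
            return false;
        }
    }
    return true;
}

$doc = array('name' => 'Mongo', 'release.arch' => 'x86', 'release.year' => 2012);
$query = array(
    'name' => 'Mongo',
    '$or'  => array(
        array('release.year' => array('$gt' => 2012)),
        array('release.arch' => array('$in' => array('x86', 'x64'))),
    ),
);
var_dump(matchesQuery($doc, $query)); // bool(true)
```

Top-level keys are combined with an implicit AND, and each $and / $or takes a list of sub-queries, so the two can be nested freely.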

MongoDB-like query expression objects are easy for understanding and usage, providing ability to write clean, self-explaining code, because both query and objects to search in, are associative arrays.

Why not just write the array to a MongoDB database rather than working with arrays? It is more efficient and it would save you a lot of trouble.

I must also mention: use the best tool for the job. What you want is basically a function of a database.

Basically talking its a convenient function to extract information from php arrays. Knowing the array structure(the arrayPath), it will allow to perform operations on multidimensional arrays data, without the need for multiple nested loops.

The example shows how to use a path to search for a value, but you still depend on loading the array into memory and on the class performing multiple recursions and loops, which is not as efficient as a database.

I appreciate architecture tips, related or similar code, which may be a good practice example for building php "if..else" expressions on the fly.

Do you really mean you want all of that covered in here?

Parsnip answered 28/2, 2013 at 20:34 Comment(5)
Your code is definitely easy to use and extend. "Why not write to MongoDB?" Because it adds database interaction overhead; no benchmark is needed to say that the class you provided will run much faster on small, already preloaded arrays. "Why not just write the array to a MongoDB database rather than working with arrays?" Yes, I wanted to port some very convenient functionality from MongoDB to PHP to simplify development of PHP applications. This code will do some operations on preloaded data at the application level, still relying on MongoDB for the heavy database operations. - Roseannaroseanne
"Do you really mean you want all those just in here?" Lol, the only tip I'm still interested in after such a complete answer is how to rewrite this code without SPL functionality. Want to know why? Because calling this class multiple times adds function overhead that can be avoided by rewriting it in plain PHP, and maybe by using a goto statement to avoid recursive function calls. - Roseannaroseanne
@fitheflow: avoid premature optimization. Try it. Never make anything faster than it needs to be. You always lose something important in the process (like maintainability). If you're worried about function overhead without even trying it, you're doing it wrong (tm). - Magically
@Magically this is not a premature optimization; the class functionality is already well defined. The thing I'm trying to avoid is SPL. Are you interested in the class profiling results? - Roseannaroseanne
@Roseannaroseanne Isn't SPL actually supposed to be quite efficient, especially given the way it links with the PHP core? I must admit that the code above uses quite a lot of objects and calls, but I think that is just what it takes to decode the query document of a MongoDB query at the end of the day. - Tapp

Latest Update

@baba has given a great raw PHP version of a class implementing MongoDB-like query expression object evaluation, but the output structure differs a bit, I mean the dot notation in the nested array output( [release.arch] => x86 ), instead of regular arrays( [release] => Array([arch] => x86) ). I would appreciate your tip how to make the class fully compatible with mongoDB in this order, as it seems its strictly tied to the raw PHP class implementation.

=======================================================================

Answer:

What you want is very easy. All you need is two corrections to the input and output loops in the current code, and you will get your new format.

What do I mean?

A. Changed

    foreach ( $array as $part ) {
        $this->flatten[] = $this->convert($part);
    }

To

    foreach ( $array as $k => $part ) {
        $this->flatten[$k] = $this->convert($part);
    }

B. Changed

    foreach ( $this->flatten as $data ) {
        $this->check($find, $data, $type) and $f[] = $data;
    }

To:

    foreach ( $this->flatten as $k => $data ) {
        $this->check($find, $data, $type) and $f[] = $this->array[$k];
    }
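
The point of the two changes is that the flattened copy and the original array share the same keys, so a match found in the flattened copy can be mapped back to the untouched nested item. A minimal stand-alone sketch of that idea follows; the variables $original, $flat and $found are made up for illustration, and the hard-coded condition stands in for the class's check() method.

```php
<?php
// Hypothetical stand-in for the class internals: $flat is keyed exactly like $original.
$original = array(
    10 => array('name' => 'Mongo', 'release' => array('arch' => 'x86')),
    20 => array('name' => 'Mongo', 'release' => array('arch' => 'x64')),
);
$flat = array(
    10 => array('name' => 'Mongo', 'release.arch' => 'x86'),
    20 => array('name' => 'Mongo', 'release.arch' => 'x64'),
);

$found = array();
foreach ($flat as $k => $row) {
    // Match against the flattened row ...
    if (isset($row['release.arch']) && $row['release.arch'] === 'x64') {
        // ... but return the nested original stored under the same key.
        $found[] = $original[$k];
    }
}
print_r($found);
```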

New array for testing

$json = '[
  {
    "name": "Mongo",
    "release": {
      "arch": "x86",
      "version": 22,
      "year": 2012
    },
    "type": "db"
  },
  {
    "name": "Mongo",
    "release": {
      "arch": "x64",
      "version": 21,
      "year": 2012
    },
    "type": "db"
  },
  {
    "name": "Mongo",
    "release": {
      "arch": "x86",
      "version": 23,
      "year": 2013
    },
    "type": "db"
  },
  {
    "name": "MongoBuster",
    "release": {
      "arch": [
        "x86",
        "x64"
      ],
      "version": 23,
      "year": 2013
    },
    "type": "db"
  },
  {
    "children": {
      "dance": [
        "one",
        "two",
        {
          "three": {
            "a": "apple",
            "b": 700000,
            "c": 8.8
          }
        }
      ],
      "lang": "php",
      "tech": "json",
      "year": 2013
    },
    "key": "Diffrent",
    "value": "cool"
  }
]';

$array = json_decode($json, true);

Simple Test

$s = new ArrayStandard($array);
print_r($s->find(array("release.arch"=>"x86")));

Output

Array
(
    [0] => Array
        (
            [name] => Mongo
            [type] => db
            [release] => Array
                (
                    [arch] => x86
                    [version] => 22
                    [year] => 2012
                )

        )

    [1] => Array
        (
            [name] => Mongo
            [type] => db
            [release] => Array
                (
                    [arch] => x86
                    [version] => 23
                    [year] => 2013
                )

        )

)

If you also want to retain original array key position you can have

    foreach ( $this->flatten as $k => $data ) {
        $this->check($find, $data, $type) and $f[$k] = $this->array[$k];
    }

Just for Fun Part

A. Support for regex

Just for fun, I added support for $regex, with the aliases $preg and $match, which means you can write

print_r($s->find(array("release.arch" => array('$regex' => "/4$/"))));

Or

print_r($s->find(array("release.arch" => array('$preg' => "/4$/"))));

Output

Array
(
    [1] => Array
        (
            [name] => Mongo
            [type] => db
            [release] => Array
                (
                    [arch] => x64
                    [version] => 21
                    [year] => 2012
                )

        )

)
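
For reference, the regex check boils down to a preg_match call. The sketch below is an assumption about how one might guard it, not the class's actual code; the name matchesRegex is hypothetical. It turns an invalid pattern into an exception instead of a silent non-match.

```php
<?php
/**
 * Hypothetical helper: true when $value matches the PCRE $pattern.
 * preg_match() returns 1 on a match, 0 on no match and false on error.
 */
function matchesRegex($value, $pattern)
{
    if (!is_scalar($value)) {
        return false; // arrays and null cannot be matched as strings
    }
    $result = @preg_match($pattern, (string) $value);
    if ($result === false) {
        throw new InvalidArgumentException('Invalid regular expression: ' . $pattern);
    }
    return $result === 1;
}

var_dump(matchesRegex('x64', '/4$/')); // bool(true)
var_dump(matchesRegex('x86', '/4$/')); // bool(false)
```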

B. Use Simple array like queries

$queryArray = array(
        "release" => array(
                "arch" => "x86"
        )
);
$d = $s->find($s->convert($queryArray));

$s->convert($queryArray) has converted

Array
(
    [release] => Array
        (
            [arch] => x86
        )

)

To

Array
(
    [release.arch] => x86
)
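
The conversion itself is a small recursive walk. Here is a minimal sketch of the idea, assuming only leaf values are kept; the name flattenDots is hypothetical, and the class's own tokenize() additionally stores a JSON copy of parent arrays.

```php
<?php
/** Hypothetical helper: flatten nested arrays into "a.b.c" keys (leaf values only). */
function flattenDots(array $data, $prefix = '')
{
    $out = array();
    foreach ($data as $key => $value) {
        $path = $prefix === '' ? (string) $key : $prefix . '.' . $key;
        if (is_array($value)) {
            $out += flattenDots($value, $path); // paths are unique, so union is safe
        } else {
            $out[$path] = $value;
        }
    }
    return $out;
}

print_r(flattenDots(array(
    'release' => array('arch' => 'x86', 'year' => 2012),
    'type'    => 'db',
)));
```

Note that an empty nested array simply disappears in this version, since it has no leaves.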

C. Modulus $mod

print_r($s->find(array(
        "release.version" => array(
                '$mod' => array(
                        23 => 0
                )
        )
)));

 //Checks release.version % 23 == 0 ;
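
Spelled out, a divisor => remainder check could look like the sketch below. The helper name matchesMod is hypothetical, and the guard against a zero divisor is an addition for safety rather than something the class does.

```php
<?php
/** Hypothetical helper: $spec is array(divisor => remainder), as in the query above. */
function matchesMod($value, array $spec)
{
    reset($spec);
    $divisor = key($spec);
    $remainder = current($spec);
    if (!is_numeric($value) || (int) $divisor === 0) {
        return false; // non-numeric input or modulo by zero never matches
    }
    return (int) $value % (int) $divisor === (int) $remainder;
}

var_dump(matchesMod(23, array(23 => 0))); // bool(true)
var_dump(matchesMod(22, array(23 => 0))); // bool(false)
```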

D. Count elements with $size

print_r($s->find(array(
        "release.arch" => array(
                '$size' => 2
        )
)));

// returns count(release.arch) == 2;

E. Check that it matches all elements in an array with $all

print_r($s->find(array(
        "release.arch" => array(
                '$all' => array(
                        "x86",
                        "x64"
                )
        )
)));

Output

Array
(
    [3] => Array
        (
            [name] => MongoBuster
            [release] => Array
                (
                    [arch] => Array
                        (
                            [0] => x86
                            [1] => x64
                        )

                    [version] => 23
                    [year] => 2013
                )

            [type] => db
        )

)
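
One thing to watch with an $all check is that it has to compare values, not keys. The sketch below uses a hypothetical containsAll helper to show the value-based test, and the last line shows how a key-based comparison with array_intersect_key can report a match that is not there.

```php
<?php
/** Hypothetical helper: true when every value in $required occurs in $haystack. */
function containsAll(array $haystack, array $required)
{
    foreach ($required as $needle) {
        if (!in_array($needle, $haystack, true)) {
            return false;
        }
    }
    return true;
}

$arch = array('x86', 'x64');
var_dump(containsAll($arch, array('x86', 'x64'))); // bool(true)
var_dump(containsAll($arch, array('x86', 'arm'))); // bool(false)

// Key-based comparison only looks at the indexes 0 and 1, so it matches anyway:
var_dump(count(array_intersect_key($arch, array('x86', 'arm'))) == 2); // bool(true)
```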

F. If you are not sure of the element key name, you can use $has. It is like the opposite of $in

print_r($s->find(array(
        "release" => array(
                '$has' => "x86"
        )
)));
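
The class implements $has by decoding the stored JSON copy of the parent array and calling in_array on it, so it looks at one level only. If you want to search at any depth, a recursive variant is one option. This is a sketch of that variant, with the hypothetical name hasValue, not the class's behaviour.

```php
<?php
/**
 * Hypothetical helper: true when $needle occurs as a value anywhere in $data,
 * at any depth. Strict comparison avoids loose matches such as 0 == "x86",
 * which is true on PHP 7 and earlier.
 */
function hasValue($data, $needle)
{
    if (!is_array($data)) {
        return $data === $needle;
    }
    foreach ($data as $value) {
        if (hasValue($value, $needle)) {
            return true;
        }
    }
    return false;
}

$release = array('arch' => array('x86', 'x64'), 'version' => 23, 'year' => 2013);
var_dump(hasValue($release, 'x86')); // bool(true)
var_dump(hasValue($release, 'arm')); // bool(false)
```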

=======================================================================

Old Update

@Baba provided an excellent class, which is written with the use of SPL. I wonder how to rewrite this code without SPL. The reason is that calling this class multiple times will give function overhead, that can be avoided rewriting it in raw PHP, and maybe using goto statement in final version, to avoid recursive function calls.

=======================================================================

Since you don't want SPL or extra function calls, it took a while, but I was able to come up with an alternative class that is also flexible and easy to use.

To avoid loading the array multiple times, you declare it once:

$array = json_decode($json, true);
$s = new ArrayStandard($array);

A. Find where release.year is 2013

$d = $s->find(array(
        "release.year" => "2013"
));
print_r($d);

Output

Array
(
    [0] => Array
        (
            [name] => Mongo
            [type] => db
            [release.arch] => x86
            [release.version] => 23
            [release.year] => 2013
        )

)

B. You can now run a complex $and or $or statement, for example: find where release.arch = x86 and release.year = 2012

$d = $s->find(array(
        "release.arch" => "x86",
        "release.year" => "2012"
), ArrayStandard::COMPLEX_AND);

print_r($d);

Output

Array
(
    [0] => Array
        (
            [name] => Mongo
            [type] => db
            [release.arch] => x86
            [release.version] => 22
            [release.year] => 2012
        )

)

C. Imagine a much more complex query

$d = $s->find(array(
        "release.year" => array(
                '$in' => array(
                        "2012",
                        "2013"
                )
        ),
        "release.version" => array(
                '$gt' => 22
        ),
        "release.arch" => array(
                '$func' => function ($a) {
                    return $a == "x86";
                }
        )
), ArrayStandard::COMPLEX_AND);

print_r($d);

Output

Array
(
    [0] => Array
        (
            [name] => Mongo
            [type] => db
            [release.arch] => x86
            [release.version] => 23
            [release.year] => 2013
        )

)

The new Modified class

class ArrayStandard {
    const COMPLEX_OR = 1;
    const COMPLEX_AND = 2;
    private $array;
    private $tokens = array(); // stays iterable even when the input array is empty
    private $found;

    function __construct(array $array) {
        $this->array = $array;
        foreach ( $array as $k => $item ) {
            $this->tokens[$k] = $this->tokenize($item);
        }   
    }

    public function getTokens() {
        return $this->tokens;
    }

    public function convert($part) {
        return $this->tokenize($part, null, false);
    }

    public function find(array $find, $type = 1) {
        $f = array();
        foreach ( $this->tokens as $k => $data ) {
            $this->check($find, $data, $type) and $f[$k] = $this->array[$k];
        }
        return $f;
    }

    private function check($find, $data, $type) {
        $o = $r = 0; // Obligation & Requirement
        foreach ( $data as $key => $value ) {
            if (isset($find[$key])) {
                $r ++;
                $options = $find[$key];
                if (is_array($options)) {
                    reset($options);
                    $eK = key($options);
                    $eValue = current($options);
                    if (strpos($eK, '$') === 0) {
                        $this->evaluate($eK, $value, $eValue) and $o ++;
                    } else {
                        throw new InvalidArgumentException('Missing "$" in expression key');
                    }
                } else {
                    $this->evaluate('$eq', $value, $options) and $o ++;
                }
            }
        }

        if ($o === 0)
            return false;

        // For AND, every condition in $find must match, including keys missing from $data
        if ($type == self::COMPLEX_AND and $o !== count($find))
            return false;

        return true;
    }

    private function getValue(array $path) {
        return count($path) > 1 ? $this->getValue(array_slice($path, 1), $this->array[$path[0]]) : $this->array[$path[0]];
    }

    private function tokenize($array, $prefix = '', $addParent = true) {
        $paths = array();
        // Compare explicitly: empty() would also treat a "0" prefix as missing
        $px = ($prefix === null || $prefix === '') ? '' : $prefix . ".";
        foreach ( $array as $key => $items ) {
            if (is_array($items)) {
                $addParent && $paths[$px . $key] = json_encode($items);
                foreach ( $this->tokenize($items, $px . $key) as $k => $path ) {
                    $paths[$k] = $path;
                }
            } else {
                $paths[$px . $key] = $items;
            }
        }
        return $paths;
    }

    private function evaluate($func, $a, $b) {
        $r = false;

        switch ($func) {
            case '$eq' :
                $r = $a == $b;
                break;
            case '$not' :
                $r = $a != $b;
                break;
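            case '$gte' :
                if ($this->checkType($a, $b)) {
                    $r = $a >= $b;
                }
                break;
            case '$gt' :
                if ($this->checkType($a, $b)) {
                    $r = $a > $b;
                }
                break;

            case '$lte' :
                if ($this->checkType($a, $b)) {
                    $r = $a <= $b;
                }
                break;
            case '$lt' :
                if ($this->checkType($a, $b)) {
                    $r = $a < $b;
                }
                break;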
            case '$in' :
                if (! is_array($b))
                    throw new InvalidArgumentException('Invalid argument for $in option must be array');
                $r = in_array($a, $b);
                break;

            case '$has' :
                if (is_array($b))
                    throw new InvalidArgumentException('Invalid argument for $has array not supported');
                $a = @json_decode($a, true) ?  : array();
                $r = in_array($b, $a);
                break;

            case '$all' :
                $a = @json_decode($a, true) ?  : array();
                if (! is_array($b))
                    throw new InvalidArgumentException('Invalid argument for $all option must be array');
                // Compare values, not keys: every element of $b must occur in $a
                $r = is_array($a) && count(array_intersect($b, array_filter($a, 'is_scalar'))) == count($b);
                break;

            case '$regex' :
            case '$preg' :
            case '$match' :

                $r = (boolean) preg_match($b, $a, $match);
                break;

            case '$size' :
                $a = @json_decode($a, true) ?  : array();
                $r = (int) $b == count($a);
                break;

            case '$mod' :
                if (! is_array($b))
                    throw new InvalidArgumentException('Invalid argument for $mod option must be array');
                // each() was removed in PHP 8; read the divisor => remainder pair directly
                reset($b);
                $x = key($b);
                $y = current($b);
                $r = $a % $x == $y;
                break;

            case '$func' :
            case '$fn' :
            case '$f' :
                if (! is_callable($b))
                    throw new InvalidArgumentException('Function should be callable');
                $r = $b($a);
                break;

            default :
                throw new ErrorException("Condition not valid ... Use \$fn for custom operations");
                break;
        }

        return $r;
    }

    private function checkType($a, $b) {
        if (is_numeric($a) && is_numeric($b)) {
            $a = filter_var($a, FILTER_SANITIZE_NUMBER_FLOAT);
            $b = filter_var($b, FILTER_SANITIZE_NUMBER_FLOAT);
        }

        if (gettype($a) != gettype($b)) {
            return false;
        }
        return true;
    }
}
Parsnip answered 19/3, 2013 at 20:28 Comment(23)
Great class, I'll update the question with the class comparison using the forp PHP profiler right away. - Roseannaroseanne
Thank you. The output structure differs a bit. I took a glance at the implementation of the output, hoping to fix it quickly before profiling, but it looks like it's just the class logic to behave like this. I mean the dot notation in the raw PHP version's nested array output ( [release.arch] => x86 ), instead of regular arrays ( [release] => Array([arch] => x86) ). I would appreciate your tip on how to make the classes fully compatible in this respect. - Roseannaroseanne
This has nothing to do with SPL speed. I only took time to optimize this class better than the previous answer, using an array flatten method. I am sure that if I rewrote it using SPL it would definitely be faster and better. - Parsnip
I'm talking about the class output, not the input query! I.e. after running the $d = $s->find(array("release.year" => "2013")); print_r($d); query it should return an array like the original, Array ( [0] => Array ( [name] => Mongo [type] => db [release] => Array ( [arch] => x86 [version] => 22 [year] => 2012 ) ) ), not Array ( [0] => Array ( [name] => Mongo [type] => db [release.arch] => x86 [release.version] => 23 [release.year] => 2013 ) ) - Roseannaroseanne
Oh, OK. Your question was not clear. I will look into it and update my answer. - Parsnip
I have answered the question. You can hold on for now while I try it on more complex queries. - Parsnip
Thank you for a prompt update, I'll test it and publish the profiling results. Btw. I'll keep the bounty running, so more people can see your great work :) - Roseannaroseanne
@Roseannaroseanne it's getting too long and messy and becoming more like a god object, which I am strongly against. I am moving the improvement of the class to GitHub: github.com/olekukonko/ArrayQuery. You can monitor the full progress there. - Parsnip
Finally, I've added MongoDB reference profiling results; it's obviously faster. Do you have any thoughts on how to avoid tokenization in the ArrayQuery class? The is_array and json_encode calls cost too much (I'm aware that json_encode is the fastest tokenizer). - Roseannaroseanne
Well, it took you over 3 weeks to respond to a comment. I have lost interest and almost deleted the git repository. - Parsnip
lol, and I have almost prepared the documentation (compiled from the question) for the class README. Will you accept the pull request? - Roseannaroseanne
I will make a few improvements and push them to GitHub. - Roseannaroseanne
I may be inspired again when I see your improvements. - Parsnip
Pushed the new README and usage instructions compiled from this discussion, along with benchmarks, to your repository. Good work should always be finished, and you are right that this class should not become a god object. - Roseannaroseanne
Well, the tokens act like a cache; removing them would affect performance. That is the reason the first answer was slow: it was parsing the find in real time. - Parsnip
Maybe it's possible to implement this cache on pure arrays, without tokenizing to strings? - Roseannaroseanne
You would end up looping all the time, or recursing; see getPath in the other answer. - Parsnip
I've just noticed that json_encode is used only for several advanced queries ($has, $all, $size). I'll look deeper into the class to find the reason why it's still slower even when json_encode is commented out. At first glance it's just due to multiple function calls. - Roseannaroseanne
From my research it's very fast. It's only slow when testing 10000 finds (took 5 sec), because find had to loop over every element in the array and run a check. From what I see, it's fast enough; as it becomes more functional, let's see what can be done. - Parsnip
Now I'm working on class readability, making the code self-documenting (some parts with short, one-letter variables are not very clear to me). Then I will implement some ideas for an optimized, more linear production version with fewer function calls. - Roseannaroseanne
I am also working on some improvements and functionality. I am using this JSON ... mega.co.nz/#!1A8BTRpC running it over 10,000 times to see the performance difference ... - Parsnip
Use this URL: new.json - Parsnip
So the ArrayQuery README.md says that "It may cover not all the advanced features"; which features might those be? How well-tested is it, not just in terms of performance but functionality? - Thurible
