Scalable Trie implementation in PHP
Asked Answered
M

2

6

Following this tutorial I met the Trie data structure. Since recently I've been programming in PHP I tried to solve the lecture's problem with that. I was able to achieve correct answers, but only for smaller inputs (Input #10 is a 2,82 MB file). Obviously, my algorithm is not scaling well. It also exceeds the default 128 MB memory limit of PHP.

My algorithm

There is a root node stored in Trie. Every node has a "children" member. I use the standard PHP array to store the children. A child key represents a character (currently I am creating a new node for every character, a-z lowercase, mapping to 0-25), a child value is a reference to another node.

The "weight" member that every nodes has is there because of the problem. I would like to optimize my code, (or even rewrite it from stratch using a different approach) so that it can pass the tests for big inputs.

I'm interested in a solution to make this data structure work in PHP with big inputs, if possible.

My code

TrieNode class stores the tree hierarchy.

class TrieNode {
    // weight is needed for the given problem
    public $weight;
    /* TrieNode children, 
    * e.g. [0 => (TrieNode object1), 2 => (TrieNode object2)]
    * where 0 stands for 'a', 1 for 'c'
    * and TrieNode objects are references to other TrieNodes.
    */
    private $children;

    function __construct($weight, $children) {
        $this->weight = $weight;
        $this->children = $children;
    }

    /** map lower case english letters to 0-25 */
    static function getAsciiValue($char) {
        return intval(ord($char)) - intval(ord('a'));
    }

    function addChild($char, $node) {
        if (!isset($this->children)) {
            $this->children = [];
        }
        $this->children[self::getAsciiValue($char)] = $node;
    }

    function isChild($char) {
        return isset($this->children[self::getAsciiValue($char)]);
    }

    function getChild($char) {
        return $this->children[self::getAsciiValue($char)];
    }

    function isLeaf() {
        return empty($this->children);
    }
}

Trie class stores the root TrieNode. It can insert and query nodes.

class Trie {
    /* root TrieNode stores the first characters */
    private $root;

    function __construct() {
        $this->root = new TrieNode(-1, []);
    }

    function insert($string, $weight) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = $string[$i];
            if(!$currentNode->isChild($char)) {
                $n = new TrieNode($weight, null);
                $currentNode->addChild($char, $n);
            }
            $currentNode->weight = max($weight, $currentNode->weight);
            $currentNode = $currentNode->getChild($char);
        }
    }

    function getNode($string) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = $string[$i];
            if ($currentNode->isLeaf() || !$currentNode->isChild($char)) {
                return null;
            }
            $currentNode = $currentNode->getChild($char);
        }
        return $currentNode;
    }

    function getWeight($string) {
        $node = $this->getNode($string);
        return is_null($node) ? -1 : $node->weight;
    }
}

Test code. Parses input and calls the Trie object.

//MAIN / TEST

/*
In case the problem page is down:

e.g.
INPUT
2 1
hackerearth 10
hackerrank 9
hacker

OUTPUT
10

where 2 is the number of inserts, 1 is the number of queries
"string number" is the string to insert and its "weight"
"hacker" is the string to query
10 is maximum the weight of the queried string (hacker -> 10)
*/

$trie = new Trie();
$handle = fopen('test.txt', 'r');
//$handle = STDIN; // <- this is for the online judge
list($n, $q) = fscanf($handle, "%d %d");
for ($i = 0; $i < $n; $i++) { // insert data
    list($s, $weight) = fscanf($handle, "%s %d");
    $trie->insert($s, $weight);
}
for ($i = 0; $i < $q; $i++) { // query data
    $query = trim(strval(fgets($handle)));
    echo $trie->getWeight($query) . PHP_EOL;
}
fclose($handle);

Fail

The algorithm fails 6 tests out of 16

Merrygoround answered 5/3, 2017 at 1:17 Comment(5)
@csirmazbendeguz, you are getting time limit exceeded error, so that means, you need to optimize your code.Gorski
I will take a look at the time complexity of your code, later todayGorski
Can you paste the link to this problem on hackerearth, so that I can try submitting a trie solution on it ?Gorski
@Gorski Thanks for your time. The links are in the description.Merrygoround
The good news is making some tweaks to the code, now my only one test case is timing out, your logic and flow everything is correct, just covering some corner cases so that it doesn't fail thereGorski
L
2

Below is the code with following optimizations -

Removed all unnecessary condition checks like

  1. No need to check is node is a leaf because if node does not have a child a specified char then it does not matter if it is a leaf or not.
  2. No need to check if {children} is initialized every time you add a child node. Removed this check initialized {children} to empty array in constructor itself.

Removed function to {getAsciiValue} instead using a simple associative array as. Also changing a {char} to ascii value has been moved from TrieNode to Trie class so that we don't need to convert it multiple times

After these optimization i came up following minimal solution, but this also can not pass the test #10. After reading about array in PHP I got to know that PHP does not implement array like other compiled languages, instead any array in PHP is just an ordered hash map and because of this array does not support constant time operations. https://mcmap.net/q/212632/-php-array-performance

Also using SplFixedArray but did not help because it is an object and has cost of instantiation. It could have helped if tried using a large array to store whole Trie. You can try implementing a solution to using SplFixedArray to store whole Trie and check if you can get it to pass test #10.

<?php

/*
 * Read input from stdin and provide input before running code

fscanf(STDIN, "%s\n", $name);
echo "Hi, ".$name;

*/

class TrieNode {
    // weight is needed for the given problem
    public $weight;
    /* TrieNode children, 
    * e.g. [0 => (TrieNode object1), 2 => (TrieNode object2)]
    * where 0 stands for 'a', 2 for 'c'
    * and TrieNode objects are references to other TrieNodes.
    */
    private $children;

    function __construct($weight) {
        $this->weight = $weight;
        $this->children = [];
    }

    function addChild($char, $node) {
        $this->children[$char] = $node;
    }

    function isChild($char) {
        return isset($this->children[$char]);
    }

    function getChild($char) {
        return $this->children[$char];
    }
}

class Trie {
    /* root TrieNode stores the first characters */
    private $root;

    function __construct() {
        $this->root = new TrieNode(-1);
    }

    static $asciiValues = array(
        "a" => 0,
        "b" => 1,
        "c" => 2,
        "d" => 3,
        "e" => 4,
        "f" => 5,
        "g" => 6,
        "h" => 7,
        "i" => 8,
        "j" => 9,
        "k" => 10,
        "l" => 11,
        "m" => 12,
        "n" => 13,
        "o" => 14,
        "p" => 15,
        "q" => 16,
        "r" => 17,
        "s" => 18,
        "t" => 19,
        "u" => 20,
        "v" => 21,
        "w" => 22,
        "x" => 23,
        "y" => 24,
        "z" => 25
    );

    function insert($string, $weight) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = self::$asciiValues[$string[$i]];
            $currentNode->weight = max($weight, $currentNode->weight);
            if($currentNode->isChild($char)) {
                $childNode = $currentNode->getChild($char);
            } else {
                $childNode = new TrieNode($weight);
                $currentNode->addChild($char, $childNode);
            }
            $currentNode = $childNode;
        }
    }

    function getNodeWeight($string) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = self::$asciiValues[$string[$i]];
            if (!$currentNode->isChild($char)) {
                return -1;
            }
            $currentNode = $currentNode->getChild($char);
        }
        return $currentNode->weight;
    }
}

$trie = new Trie();
//$handle = fopen('test.txt', 'r');
$handle = STDIN; // <- this is for the online judge
list($n, $q) = fscanf($handle, "%d %d");
for ($i = 0; $i < $n; $i++) { // insert data
    list($s, $weight) = fscanf($handle, "%s %d");
    $trie->insert($s, $weight);
}
for ($i = 0; $i < $q; $i++) { // query data
    //$query = trim(strval(fgets($handle)));
    $query = trim(strval(fgets($handle)));
    echo $trie->getNodeWeight($query) . PHP_EOL;
}
fclose($handle);


?>
Lapsus answered 23/6, 2017 at 10:20 Comment(0)
G
3

After making some tweaks and modifications, I have been able to get this thing to work for all the test cases except one test case is timing out,

Here is the whole code that will run successfully for all the test cases except the test case 10.

class TrieNode {
        // weight is needed for the given problem
        public $weight;
        /* TrieNode children, 
        * e.g. [0 => (TrieNode object1), 2 => (TrieNode object2)]
        * where 0 stands for 'a', 1 for 'c'
        * and TrieNode objects are references to other TrieNodes.
        */
        private $children;

        function __construct($weight, $children) {
            $this->weight = $weight;
            $this->children = $children;
        }

        /** map lower case english letters to 0-25 */
        static function getAsciiValue($char) {
            return intval(ord($char)) - intval(ord('a'));
        }

        function addChild($char, $node) {
            if (!isset($this->children)) {
                $this->children = [];
            }
            $this->children[self::getAsciiValue($char)] = $node;
        }

        function isChild($char) {
            return isset($this->children[self::getAsciiValue($char)]);
        }

        function getChild($char) {
            return $this->children[self::getAsciiValue($char)];
        }

        function isLeaf() {
            return empty($this->children);
        }
    }

    class Trie {
        /* root TrieNode stores the first characters */
        private $root;

        function __construct() {
            $this->root = new TrieNode(-1, []);
        }

        function insert($string, $weight) {
            $currentNode = $this->root;
            $l = strlen($string);
            for ($i = 0; $i < $l; $i++) {
                $char = $string[$i];
                if(!$currentNode->isChild($char)) {
                    $n = new TrieNode($weight, null);
                    $currentNode->addChild($char, $n);
                }
                $currentNode->weight = max($weight, $currentNode->weight);
                $currentNode = $currentNode->getChild($char);
            }
        }

        function getNode($string) {
            $currentNode = $this->root;
            if (empty($currentNode) || !isset($currentNode)) {
              return null;
            }
            $l = strlen($string);
            for ($i = 0; $i < $l; $i++) {
                $char = $string[$i];
                if (empty($currentNode) || $currentNode->isLeaf() || !$currentNode->isChild($char)) {
                    return null;
                }
                $currentNode = $currentNode->getChild($char);
                if (empty($currentNode)) {
                  return null;
                }
            }
            return $currentNode;
        }

        function getWeight($string) {
            $node = $this->getNode($string);
            return is_null($node) ? -1 : $node->weight;
        }
    }

    $trie = new Trie();
    //$handle = fopen('test.txt', 'r');
    $handle = STDIN; // <- this is for the online judge
    list($n, $q) = fscanf($handle, "%d %d");
    for ($i = 0; $i < $n; $i++) { // insert data
        list($s, $weight) = fscanf($handle, "%s %d");
        $trie->insert($s, $weight);
    }
    for ($i = 0; $i < $q; $i++) { // query data
        $query = trim(strval(fgets($handle)));
        echo $trie->getWeight($query) . PHP_EOL;
    }
    fclose($handle);

I will try to add some more checks so that I could reduce the computation cycles taken by this program.

Gorski answered 22/6, 2017 at 18:24 Comment(0)
L
2

Below is the code with following optimizations -

Removed all unnecessary condition checks like

  1. No need to check is node is a leaf because if node does not have a child a specified char then it does not matter if it is a leaf or not.
  2. No need to check if {children} is initialized every time you add a child node. Removed this check initialized {children} to empty array in constructor itself.

Removed function to {getAsciiValue} instead using a simple associative array as. Also changing a {char} to ascii value has been moved from TrieNode to Trie class so that we don't need to convert it multiple times

After these optimization i came up following minimal solution, but this also can not pass the test #10. After reading about array in PHP I got to know that PHP does not implement array like other compiled languages, instead any array in PHP is just an ordered hash map and because of this array does not support constant time operations. https://mcmap.net/q/212632/-php-array-performance

Also using SplFixedArray but did not help because it is an object and has cost of instantiation. It could have helped if tried using a large array to store whole Trie. You can try implementing a solution to using SplFixedArray to store whole Trie and check if you can get it to pass test #10.

<?php

/*
 * Read input from stdin and provide input before running code

fscanf(STDIN, "%s\n", $name);
echo "Hi, ".$name;

*/

class TrieNode {
    // weight is needed for the given problem
    public $weight;
    /* TrieNode children, 
    * e.g. [0 => (TrieNode object1), 2 => (TrieNode object2)]
    * where 0 stands for 'a', 2 for 'c'
    * and TrieNode objects are references to other TrieNodes.
    */
    private $children;

    function __construct($weight) {
        $this->weight = $weight;
        $this->children = [];
    }

    function addChild($char, $node) {
        $this->children[$char] = $node;
    }

    function isChild($char) {
        return isset($this->children[$char]);
    }

    function getChild($char) {
        return $this->children[$char];
    }
}

class Trie {
    /* root TrieNode stores the first characters */
    private $root;

    function __construct() {
        $this->root = new TrieNode(-1);
    }

    static $asciiValues = array(
        "a" => 0,
        "b" => 1,
        "c" => 2,
        "d" => 3,
        "e" => 4,
        "f" => 5,
        "g" => 6,
        "h" => 7,
        "i" => 8,
        "j" => 9,
        "k" => 10,
        "l" => 11,
        "m" => 12,
        "n" => 13,
        "o" => 14,
        "p" => 15,
        "q" => 16,
        "r" => 17,
        "s" => 18,
        "t" => 19,
        "u" => 20,
        "v" => 21,
        "w" => 22,
        "x" => 23,
        "y" => 24,
        "z" => 25
    );

    function insert($string, $weight) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = self::$asciiValues[$string[$i]];
            $currentNode->weight = max($weight, $currentNode->weight);
            if($currentNode->isChild($char)) {
                $childNode = $currentNode->getChild($char);
            } else {
                $childNode = new TrieNode($weight);
                $currentNode->addChild($char, $childNode);
            }
            $currentNode = $childNode;
        }
    }

    function getNodeWeight($string) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = self::$asciiValues[$string[$i]];
            if (!$currentNode->isChild($char)) {
                return -1;
            }
            $currentNode = $currentNode->getChild($char);
        }
        return $currentNode->weight;
    }
}

$trie = new Trie();
//$handle = fopen('test.txt', 'r');
$handle = STDIN; // <- this is for the online judge
list($n, $q) = fscanf($handle, "%d %d");
for ($i = 0; $i < $n; $i++) { // insert data
    list($s, $weight) = fscanf($handle, "%s %d");
    $trie->insert($s, $weight);
}
for ($i = 0; $i < $q; $i++) { // query data
    //$query = trim(strval(fgets($handle)));
    $query = trim(strval(fgets($handle)));
    echo $trie->getNodeWeight($query) . PHP_EOL;
}
fclose($handle);


?>
Lapsus answered 23/6, 2017 at 10:20 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.