Following this tutorial I met the Trie data structure. Since recently I've been programming in PHP I tried to solve the lecture's problem with that. I was able to achieve correct answers, but only for smaller inputs (Input #10 is a 2,82 MB file). Obviously, my algorithm is not scaling well. It also exceeds the default 128 MB memory limit of PHP.
My algorithm
There is a root node stored in Trie. Every node has a "children" member. I use the standard PHP array to store the children. A child key represents a character (currently I am creating a new node for every character, a-z lowercase, mapping to 0-25), a child value is a reference to another node.
The "weight" member that every nodes has is there because of the problem. I would like to optimize my code, (or even rewrite it from stratch using a different approach) so that it can pass the tests for big inputs.
I'm interested in a solution to make this data structure work in PHP with big inputs, if possible.
My code
TrieNode class stores the tree hierarchy.
class TrieNode {
// weight is needed for the given problem
public $weight;
/* TrieNode children,
* e.g. [0 => (TrieNode object1), 2 => (TrieNode object2)]
* where 0 stands for 'a', 1 for 'c'
* and TrieNode objects are references to other TrieNodes.
*/
private $children;
function __construct($weight, $children) {
$this->weight = $weight;
$this->children = $children;
}
/** map lower case english letters to 0-25 */
static function getAsciiValue($char) {
return intval(ord($char)) - intval(ord('a'));
}
function addChild($char, $node) {
if (!isset($this->children)) {
$this->children = [];
}
$this->children[self::getAsciiValue($char)] = $node;
}
function isChild($char) {
return isset($this->children[self::getAsciiValue($char)]);
}
function getChild($char) {
return $this->children[self::getAsciiValue($char)];
}
function isLeaf() {
return empty($this->children);
}
}
Trie class stores the root TrieNode. It can insert and query nodes.
class Trie {
/* root TrieNode stores the first characters */
private $root;
function __construct() {
$this->root = new TrieNode(-1, []);
}
function insert($string, $weight) {
$currentNode = $this->root;
$l = strlen($string);
for ($i = 0; $i < $l; $i++) {
$char = $string[$i];
if(!$currentNode->isChild($char)) {
$n = new TrieNode($weight, null);
$currentNode->addChild($char, $n);
}
$currentNode->weight = max($weight, $currentNode->weight);
$currentNode = $currentNode->getChild($char);
}
}
function getNode($string) {
$currentNode = $this->root;
$l = strlen($string);
for ($i = 0; $i < $l; $i++) {
$char = $string[$i];
if ($currentNode->isLeaf() || !$currentNode->isChild($char)) {
return null;
}
$currentNode = $currentNode->getChild($char);
}
return $currentNode;
}
function getWeight($string) {
$node = $this->getNode($string);
return is_null($node) ? -1 : $node->weight;
}
}
Test code. Parses input and calls the Trie object.
//MAIN / TEST
/*
In case the problem page is down:
e.g.
INPUT
2 1
hackerearth 10
hackerrank 9
hacker
OUTPUT
10
where 2 is the number of inserts, 1 is the number of queries
"string number" is the string to insert and its "weight"
"hacker" is the string to query
10 is maximum the weight of the queried string (hacker -> 10)
*/
$trie = new Trie();
$handle = fopen('test.txt', 'r');
//$handle = STDIN; // <- this is for the online judge
list($n, $q) = fscanf($handle, "%d %d");
for ($i = 0; $i < $n; $i++) { // insert data
list($s, $weight) = fscanf($handle, "%s %d");
$trie->insert($s, $weight);
}
for ($i = 0; $i < $q; $i++) { // query data
$query = trim(strval(fgets($handle)));
echo $trie->getWeight($query) . PHP_EOL;
}
fclose($handle);