Php prefix tree implementation versus assoc array [closed]
Asked Answered
A

1

6

UPD: I moved original question to https://codereview.stackexchange.com/questions/127055/building-tree-graph-from-dictionary-performance-issues

Here is a short version, without codes.

I'm trying to build a prefix tree from dictionary. So, using the following dictionary 'and','anna','ape','apple', graph should look like this: graph I've tried 2 approaches: using associative array and using self-written tree/node classes.

Note: original dictionary is something about 8 MB and contains >600000 words.

Question: is there any good(fast/efficient) way to do it?

I've tried so far:

  • php associative arrays (they are not very flexible for future work with this graph).

  • self-written Tree/Node classes (performance issues - execution time rises by up to 7x, memory usage rises by 2x even without implementing anything except just inserting function).

Sample codes are available on codereview (the very first link in question)

Alphard answered 29/4, 2016 at 12:42 Comment(11)
They both have the same code/execution complexity, not the same memory footprint and execution speed. Depending on the PHP version you run this under classes use more or less memory too. If you are looking for better performance and not just learning stuff I'd suggest looking into nested sets. You'll find ready to use PHP implementations too: #272510Gravity
This question is better suited for code reviewTremayne
@Sergiu Paraschiv - I'll look into itAlphard
@Tremayne I'm not actually asking to review my code, I'm basically asking "why is tree implementation on classes is MUCH MORE slower, than implementation on arrays?". The code is given to illustrate the problem. I actualy was expecting something near 2x speed drop. 8x+ just shocked me.Alphard
Sure you are - A performance review is still a review. See the tour of code review - "Ask about... The quality of your working code with regards to: Performance".Tremayne
@Tremayne oh... Thanks for the information. Is there an easy way to move the question from here to there?Alphard
I'm voting to close this question as off-topic because it has been cross-posted on a better suited Stack Exchange site. In future, please only post your question on a single Stack Exchange site. For more information, see hereMelanimelania
@Melanimelania I get your point, still I've modified my initial question right after moving it to codereview, so it stopped being a duplicate. In short terms: current question is "what are known ways to solve a problem?" and the one on codereview is "why are my solutions so slow?" (you can check codereview question and see the difference).Alphard
@haldagan: ah, my apologies. But in its current form, this question is not suitable for Stack Overflow "is there any good(fast/efficient) way to do this" is primarily opinion based (good!?) and too broad (lots of ways of doing it).Melanimelania
@Melanimelania Technically you are right, yet i've discovered only 2 widely-used techniques - saving trie as a list with pointers and using hashmaps. So basically there are no other sensible ways of building tries (and even if there were - it would be at least educating for me to learn those). As for "opinion based" - my question clearly states "good(fast/efficient)", so basically I was asking about known approaches that could beat naive array or simple OOP implementation in sense of time of execution and memory usage on a large dictionary.Alphard
So good(fast/efficient) can not be based on one's opinion - only on hardware/code used.Alphard
A
0

As long as I've switched to C++ and got a good answer on codereview, I'll just answer my own question here.

There is one more way to do it way more time-efficient by increasing memory usage(it's not really big increase, compared to "array of arrays of arrays..." approach). The approach is called "double array trie" and you can read info on this topic here and read the aforementioned answer on codereview to see an example of implementation.

It's more time-efficient, yet it allows less flexibility/convenience for future trie use (compared to OOP approach).

So the final answer on this question for me is: "php is not the best tool to work with really big tries with".

Alphard answered 6/5, 2016 at 10:50 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.