Explanation of Ruby code for building Trie data structures
Asked Answered
R

2

6

So I have this ruby code I grabbed from wikipedia and I modified a bit:

@trie = Hash.new()
def build(str)
    node = @trie
    str.each_char { |ch|
      cur = ch
      prev_node = node
      node = node[cur]
      if node == nil
        prev_node[cur] = Hash.new()
        node = prev_node[cur]
      end
    }
  end

build('dogs')

puts @trie.inspect

I first ran this on console irb, and each time I output node, it just keeps giving me an empty hash each time {}, but when I actually invoke that function build with parameter 'dogs' string, it actually does work, and outputs {"d"=>{"o"=>{"g"=>{"s"=>{}}}}}, which is totally correct.

This is probably more of a Ruby question than the actual question about how the algorithm works. I don't really have adequate Ruby knowledge to decipher what is going on there I guess.

Ranice answered 28/1, 2012 at 2:22 Comment(4)
What do you mean by "each time I output node"? Not sure what kind of answer you're looking for--have you traced the algorithm on paper for a small string? It's pretty short.Farwell
i typed in 'node' on the console after typing in node = prev_node['a'], then it returns something like => {} which means hash to empty object, and then after that i typed in node = node['b'], trying to simulate the for loop by passing in a diffeent char as a key. when i typed in 'node' again after that, it returns nil of course. I just wonder how it can produce {"d"=>{"o"=>{"g"=>{"s"=>{}}}}} when each time i trace it, it looks like its always emptyRanice
I still recommend tracing the algorithm through with a piece of paper, following each step and writing down the values of all the variables. Look particularly at prev_node[cur] = Hash.new, and consider what prev_node[cur] is--but remember that prev_node itself is a hash with a character key, that happens to point to an empty hash (for one iteration).Farwell
I don't understand what your question is.Trisomic
I
25

You're probably getting lost inside that mess of code which takes an approach that seems a better fit for C++ than for Ruby. Here's the same thing in a more concise format that uses a special case Hash for storage:

class Trie < Hash
  def initialize
    # Ensure that this is not a special Hash by disallowing
    # initialization options.
    super
  end

  def build(string)
    string.chars.inject(self) do |h, char|
      h[char] ||= { }
    end
  end
end

It works exactly the same but doesn't have nearly the same mess with pointers and such:

trie = Trie.new
trie.build('dogs')
puts trie.inspect

Ruby's Enumerable module is full of amazingly useful methods like inject which is precisely what you want for a situation like this.

Illsorted answered 28/1, 2012 at 2:54 Comment(6)
How would you add another word to the Trie?Archaeornis
@EricDuminil You can call build as many times as you want with different words.Illsorted
Thanks, I didn't notice build is an instance method. add might be more appropriate, though. Depending on the data, it might be interesting to add some information about final nodes : adding foo after foobar doesn't change anything with the current structure.Archaeornis
@EricDuminil It's just the basis for a more robust version, so if you need that sort of thing, by all means, extend away! build is the method given in the original question, so I was just using that.Illsorted
Thanks. This question is pretty high on "ruby trie" queries, that's why I took the liberty to comment about it. Nice method, BTW!Archaeornis
I love this very simple trie representation. One tiny tweak you can make is to make the sub-tries actual tries instead of generic hashes, so h[char] ||= Trie.new instead of h[char] ||= { }. For example, this allow for a very simple recursive implementation of longest common prefix: gist.github.com/hoffm/e190a0686ef3a958f7a2f8be4c18e60cMidland
M
0

I think you are just using irb incorrectly. You should type the whole function in, then run it, and see if you get correct results. If it doesn't work, how about you post your entire IRB session here.

Also here is a simplified version of your code:

def build(str)
  node = @trie
  str.each_char do |ch|
    node = (node[ch] ||= {})
  end
  # not sure what the return value should be
end
Murphree answered 28/1, 2012 at 2:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.