TensorFlow: Hierarchical Softmax Implementation

My text inputs are represented as vectors, and I want to classify them into categories. Because the categories are multi-level, I intend to use hierarchical softmax.

Example:

 - Computer Science
     - Machine Learning
     - NLP
 - Economics
 - Maths
     - Algebra
     - Geometry

I don't know how to implement it in TensorFlow. All the examples I've found use other frameworks.

Thanks

Silkstocking answered 15/11, 2017 at 17:22 Comment(3)
Could you write down the exact formula you want to implement? - Photovoltaic
I need to build the hierarchical tree first. Assume the output tree path of one input is [A1 -> A10 -> A101]; then loss_of_that_input = softmax_cross_entropy(A1|Ax) + softmax_cross_entropy(A10|A1x) + softmax_cross_entropy(A101|A10x) - Silkstocking
@Photovoltaic you can see an example implementation here (but it's not using TensorFlow): talbaumel.github.io/softmax - Silkstocking
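
The per-level loss from the comment above (one softmax cross-entropy per step of the tree path, summed) can be sketched in plain Python on the category tree from the question. This is only an illustration under assumptions, not a TensorFlow implementation: the `TREE` dictionary, the `node_scores` layout, and the function names are all hypothetical, and the scores are made-up numbers standing in for whatever a model would output at each internal node. In TensorFlow, each per-level term would presumably map onto an ordinary softmax cross-entropy op such as `tf.nn.sparse_softmax_cross_entropy_with_logits`.

```python
import math

# Category tree from the question: internal node -> ordered list of children.
# "root" is a hypothetical name for the top of the tree.
TREE = {
    "root": ["Computer Science", "Economics", "Maths"],
    "Computer Science": ["Machine Learning", "NLP"],
    "Maths": ["Algebra", "Geometry"],
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def path_loss(path, node_scores):
    """Sum of per-level cross-entropies along a root-to-leaf path.

    path: category names from the top level down, e.g. ["Maths", "Algebra"].
    node_scores: assumed layout, mapping each internal node to the raw
        scores (logits) for that node's children, in TREE order.
    """
    loss, parent = 0.0, "root"
    for child in path:
        probs = softmax(node_scores[parent])
        # Cross-entropy at this level: -log p(correct child | parent).
        loss += -math.log(probs[TREE[parent].index(child)])
        parent = child
    return loss


def leaf_probability(path, node_scores):
    """Probability of a leaf: product of the conditional probabilities."""
    return math.exp(-path_loss(path, node_scores))


# Made-up scores for one input, purely for illustration.
scores = {
    "root": [0.2, -1.0, 1.5],
    "Computer Science": [0.3, 0.1],
    "Maths": [2.0, 0.0],
}
print(path_loss(["Maths", "Algebra"], scores))
```

Because each level is a proper softmax over siblings, the leaf probabilities obtained this way sum to 1 over the five leaves of the example tree, so the summed loss is the negative log-likelihood of the full path.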

In the end, I switched to PyTorch. It's easier and more straightforward than TensorFlow.

Silkstocking answered 16/12, 2017 at 10:37 Comment(2)
Did you get better accuracy after using the hierarchical softmax network? - Flanker
@Flanker I got better results with hierarchical softmax (roughly +10%; I don't remember exactly). Think of it as a jack-of-all-trades: a person who can do many jobs, from A, B, C… to Z, is generally not a master of any of them, while someone who only does A or B their entire life would be a master. Similarly, if a neuron unit is forced to train for every category (for example, the digits 0-9), it won't be as efficient as when I group similar digits: (1,4,7), (5,6), (3,8),... - Silkstocking

In practice, if your total number of categories is in the hundreds to thousands (less than 50K), you don't need to consider hierarchical softmax. It is designed to speed up training when classifying into millions of categories (for example, the number of words in a vocabulary).
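
To put rough numbers on that speed argument, here is a small sketch comparing how many output scores a flat softmax evaluates per training example with how many binary decisions lie on one root-to-leaf path. It assumes a balanced binary tree, which is a common textbook setup rather than anything stated in this thread, and the function names are hypothetical. It counts operations only and says nothing about accuracy.

```python
import math


def flat_softmax_scores(num_classes):
    """Output scores a flat softmax computes per training example."""
    return num_classes


def balanced_binary_tree_decisions(num_classes):
    """Binary decisions on one root-to-leaf path, assuming a balanced binary tree."""
    return max(1, math.ceil(math.log2(num_classes)))


for n in (1_000, 50_000, 1_000_000):
    print(n, flat_softmax_scores(n), balanced_binary_tree_decisions(n))
```

With 1,000 classes the tree path has about 10 decisions, and with 1,000,000 classes about 20, while the flat softmax cost grows linearly with the class count. That gap only starts to matter once the output layer dominates training time.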

In my experience (with Naive Bayes and neural networks), using the hierarchical structure at training time does not necessarily improve classification quality.

However, if you are interested in implementing hierarchical softmax anyway, that's another story.

Middleclass answered 28/11, 2017 at 0:1 Comment(1)
Do you have an authoritative source or measurement comparing the performance of ordinary softmax and hierarchical softmax with 10,000-100,000 categories? Some other (informal) tutorials say you should start thinking about adaptive or hierarchical softmax around the 10,000 mark... - Masterson