The free energy approximation equation in Restricted Boltzmann Machines

According to a deep learning tutorial:

The free energy in Python is:

def free_energy(self, v_sample):
    ''' Function to compute the free energy '''
    wx_b = T.dot(v_sample, self.W) + self.hbias          # c + v.W, one entry per hidden unit
    vbias_term = T.dot(v_sample, self.vbias)             # b'v
    hidden_term = T.sum(T.log(1 + T.exp(wx_b)), axis=1)  # sum_i log(1 + e^(wx_b_i))
    return -hidden_term - vbias_term

I am not very good at Python; basically, it gets the product-of-experts term of each unit as the vector wx_b, calculates the exp and adds 1, then calculates the log and sums over it for the hidden term.
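
If I read it right, a plain-Java sketch of the same computation for a single visible sample would be something like this (my own variable names; note the Theano version additionally processes a whole batch of samples at once):

// Free energy of one visible sample v; W is numVisible x numHidden,
// hbias has numHidden entries, vbias has numVisible entries.
static double freeEnergy(double[] v, double[][] W, double[] hbias, double[] vbias) {
    double hiddenTerm = 0.0;
    for (int j = 0; j < hbias.length; j++) {
        double wxb = hbias[j];                       // wx_b_j = (v.W)_j + hbias_j
        for (int k = 0; k < v.length; k++) {
            wxb += v[k] * W[k][j];
        }
        hiddenTerm += Math.log(1.0 + Math.exp(wxb)); // log(1 + exp(wx_b_j)), summed over hidden units
    }
    double vbiasTerm = 0.0;                          // vbias_term = v . vbias
    for (int k = 0; k < v.length; k++) {
        vbiasTerm += v[k] * vbias[k];
    }
    return -hiddenTerm - vbiasTerm;
}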

This I believe is a little different from the free energy equation in Learning Deep Architectures:

FreeEnergy(x) = −b′x − ∑_i log ∑_{h_i} e^{h_i(c_i + W_i x)}.

Where:

  • h_i is unit i of the hidden layer,
  • c_i is the i-th entry of the hidden bias vector c.

It calculates the exp and sums over it, then takes the log of that sum value; after that, it sums all the product-of-experts terms based on the number of visible units.

The above equation is Eq. 5.21 from Learning Deep Architectures for AI (Yoshua Bengio).

Below is my draft of a Java implementation; vis_v is the visible layer sample, hid_v is the hidden layer unit sample.

private double freeEnergy(RealVector vis_v, RealVector hid_v){
    // wx_hb = c + v.W, one entry per hidden unit
    RealVector wx_hb = W.preMultiply(vis_v).add(hBias);
    // vbias_term = b'v
    double vbias_term = vis_v.dotProduct(vBias);
    double sum_hidden_term = 0;
    for (int i = 0; i < wx_hb.getDimension(); i++) {
        // e^(h * wx_hb_i) for every value h in hid_v
        RealVector vis_expert = hid_v.mapMultiply(wx_hb.getEntry(i));
        // inner sum over the hidden unit's values, then the log
        double hidden_term = StatUtils.sum(vis_expert.map(new Exp()).toArray());
        sum_hidden_term += Math.log(hidden_term);
    }
    return -sum_hidden_term - vbias_term;
}
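
Here hid_v is meant to enumerate the possible binary states of a hidden unit. A hypothetical call (vis_sample is a placeholder name for the visible vector being scored) looks like:

// hid_v enumerates the states {0, 1} of one binary hidden unit, so the
// inner sum evaluates to e^0 + e^(wx_hb_i) = 1 + e^(wx_hb_i).
RealVector hid_v = new ArrayRealVector(new double[]{0.0, 1.0});
double fe = freeEnergy(vis_sample, hid_v);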

Is this some kind of approximation? I am trying to implement the same thing in Java, but I am getting confused over it. Thanks in advance for any help!

Eunaeunice asked 30/3/2012 at 14:07
Yowser! That's a change from the usual kind of question we get on Stack Overflow :) Let me take a deeper look to see what's what. – Dianetics
As an aside, there is a quick way to check your code: run it and see if it differs from the reference version... – Dianetics
Thanks, brice. I have run my code against some real examples, and it works. I believe the Python example works too. I need to find out which one is more optimized; it might help reduce the deep network's error in the future, and even 0.1~0.2% is worth it. – Eunaeunice
Since I'm at work, I'll take a look at this when I come home. It sounds like you're working on interesting stuff there, ryo. – Dianetics

I gather your confusion is over the definition of the free energy function in the reference Python code. If this isn't what you're asking, I apologize.

First off, this is not an approximation. It looks like they're assuming the hidden units are binary-valued. Remember, the free energy is just the (log of the) energy with the hidden variables marginalized out. So the inner sum in the free energy equation you listed above is just a sum over the values the i-th hidden unit can take on, which in this case are {0, 1}. Since exp(0) = 1, that inner sum just becomes 1 + exp(...). See the "RBMs With Binary Units" section in the link you provided.
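
Spelled out, the marginalization for one binary unit is:

    ∑_{h_i ∈ {0,1}} e^{h_i(c_i + W_i x)} = e^0 + e^{c_i + W_i x} = 1 + e^{c_i + W_i x}

so the free energy reduces to:

    FreeEnergy(x) = −b′x − ∑_i log(1 + e^{c_i + W_i x})

which is exactly the quantity the Python function computes term by term.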

I'm not familiar with the Apache Commons Math library in Java, so I can't be a huge amount of help there, but the implementation should be a straightforward translation of that Python function.
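
For illustration, here is a minimal sketch of that translation, reusing the Apache Commons Math calls and the W, hBias, vBias fields from your draft (treat it as an assumption on my part, since I haven't used the library):

private double freeEnergy(RealVector vis_v) {
    // wx_b = c + v.W, one entry per hidden unit
    RealVector wx_b = W.preMultiply(vis_v).add(hBias);
    // vbias_term = b'v
    double vbias_term = vis_v.dotProduct(vBias);
    double hidden_term = 0.0;
    for (int i = 0; i < wx_b.getDimension(); i++) {
        // binary hidden unit marginalized in closed form: log(1 + e^(wx_b_i))
        hidden_term += Math.log(1.0 + Math.exp(wx_b.getEntry(i)));
    }
    return -hidden_term - vbias_term;
}

Note there is no explicit loop over the hidden unit's states: the inner sum is just 1 + e^(wx_b_i).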

Diaphoresis answered 30/3/2012 at 15:25
Right, Jeshua. The hidden units in this case are binary units: if a unit is 0 then e^0 = 1, else e^(c_i + W_i x). My question is that it should actually first sum exp(wx_b) and then take the log, but instead it does 1 + exp(wx_b) -> log() -> sum(). I don't understand why it adds the constant 1, and why the sum procedure is a little different from the equation. – Eunaeunice
I'm not sure which part you think is different from the equation. FE(v) = −b′v − ∑_i log ∑_{h_i} e^{h_i(c_i + W_i v)} = −b′v − ∑_i log(1 + e^(W_i v)), where the second equality follows because the second sum is only over the two values of h_i, and the c_i disappears because (presumably) the bias term is subsumed by a constant term in v (usual practice). This is exactly what T.sum(T.log(1 + T.exp(wx_b)), axis=1) computes. – Diaphoresis
My understanding of the second term in the FE(v) equation is: 1. sum over all the hidden units in e^(h_i(c_i + W_i v)), giving px = ∑ e^(h_i(c_i + W_i v)); 2. take log(px); 3. sum log(px) over all the visible units, giving ∑ log(px). The hidden states h are from {0, 1}, so the two possible values of e^(h_i(c_i + W_i v)) are e^0 and e^(c_i + W_i v). To my best knowledge, the sum over a hidden unit's states should be e^0 + e^(c_i + W_i v) = 1 + e^(c_i + W_i v), not a sum over all the visible units. That is exactly what T.sum(T.log(1 + T.exp(wx_b)), axis=1) computes, where 1 + T.exp(wx_b) = 1 + e^(c_i + W_i v). Perhaps my understanding of this Python syntax is wrong. – Eunaeunice
I believe the outer T.sum(..., axis=1) will not sum over the visible units. It looks to me like the sum function gets passed a <num visible samples> by <num hidden units> array, and specifying axis=1 means that it will only sum over the hidden units and return an array with <num visible samples> elements. – Surah
