Is it right to normalize data and/or weight vectors in a SOM?

So I am stumped by something that (should) be simple:

I have written a SOM for a simple 'play' two-dimensional data set. Here is the data:

[Figure: scatter plot of the two-dimensional data set, showing three clusters]

You can make out 3 clusters by yourself.

Now, there are two things that confuse me. The first is that the tutorial I am following normalizes the data before the SOM gets to work on it; that is, it normalizes each data vector to have length 1 (Euclidean norm). If I do that, then the data looks like this:

[Figure: the same data after normalizing each vector to unit length; all points lie on the unit circle]

(This is because all the data has been projected onto the unit circle).

So, my question(s) are as follows:

1) Is this correct? Projecting the data onto the unit circle seems like a bad idea, because you can no longer make out the 3 clusters... Is this a fact of life for SOMs? (i.e., do they only work on the unit circle?)

2) The second, related question is that not only are the data normalized to have length 1, but so are the weight vectors of each output unit after every iteration. I understand that this is done so that the weight vectors don't 'blow up', but it seems wrong to me, since the whole point of the weight vectors is to retain distance information. If you normalize them, you lose the ability to 'cluster' properly. For example, how can the SOM possibly distinguish the cluster in the lower left from the cluster in the upper right, since they project onto the unit circle in the same way?
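
To make this concrete, here is a rough Python/NumPy sketch (the two points are made up, one from each of those clusters) showing the collapse that worries me:

import numpy as np

# Two made-up points lying on roughly the same ray from the origin:
# one from the lower-left cluster, one from the upper-right cluster.
lower_left = np.array([0.2, 0.2])
upper_right = np.array([0.8, 0.8])

# Normalize each vector to unit (Euclidean) length.
print(lower_left / np.linalg.norm(lower_left))    # [0.7071... 0.7071...]
print(upper_right / np.linalg.norm(upper_right))  # [0.7071... 0.7071...] -- identical after normalization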

I am very confused by this. Should data be normalized to unit length in SOMs? Should the weight vectors be normalized as well?

Thanks!

EDIT

Here is the data, saved as a .mat file for MATLAB. It is a simple two-dimensional data set.

Montevideo answered 3/12, 2012 at 16:12 Comment(0)

Whether you should normalize the input data depends on what these data represent. Let's say that you are doing clustering on two-dimensional (or three-dimensional) input data in which each data vector represents a spatial point: the first dimension is the x coordinate and the second is the y coordinate. In this case you don't normalize the input data, because the input features (the dimensions) are directly comparable to each other.

If you are again clustering in a two-dimensional space, but each input vector represents the age and the annual income of a person (the first feature is the age and the second is the annual income), then you must normalize the input features, because they represent different things (different measurement units) on completely different scales. Let's examine these input vectors: D1(25, 30000), D2(50, 30000) and D3(25, 60000). Both D2 and D3 double one of the features compared to D1. Keep in mind that the SOM uses Euclidean distance measures: Distance(D1, D2) = 25 and Distance(D1, D3) = 30000. This is kind of "unfair" to the first input feature (age), because although you double it you get a much smaller distance than in the second case (D1, D3).
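
To make this concrete, here is a rough Python/NumPy sketch of the D1/D2/D3 example; the per-feature min-max scaling shown is just one of the normalization methods discussed further down:

import numpy as np

# Age (years) and annual income for three people.
D1 = np.array([25, 30000])
D2 = np.array([50, 30000])   # double the age of D1
D3 = np.array([25, 60000])   # double the income of D1

print(np.linalg.norm(D1 - D2))  # 25.0     -- doubling the age barely moves the point
print(np.linalg.norm(D1 - D3))  # 30000.0  -- doubling the income dominates the distance

# After per-feature min-max scaling to [0, 1], the two "doublings" become comparable.
data = np.vstack([D1, D2, D3]).astype(float)
scaled = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
print(np.linalg.norm(scaled[0] - scaled[1]))  # 1.0
print(np.linalg.norm(scaled[0] - scaled[2]))  # 1.0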

Check this, which also has a similar example.

If you are going to normalize your input data, you normalize each feature/dimension separately (each column of your input data table). Quoting from the som_normalize manual:

"Normalizations are always one-variable operations"

Also check this for a brief explanation of normalization, and if you want to read more, try this (chapter 7 is what you want).

EDIT:

The most common normalization methods are scaling each dimension's data to [0,1], or transforming it to have zero mean and standard deviation 1. The first is done by subtracting from each input the minimum value of its dimension (column) and then dividing by the maximum value minus the minimum value (of that dimension).

Xi,norm = (Xi - Xmin)/(Xmax-Xmin)

Yi,norm = (Yi - Ymin)/(Ymax-Ymin)

In the second method you subtract the mean value of each dimension and then divide by the standard deviation.

Xi,norm = (Xi - Xmean)/(Xsd)

Each method has pros and cons. For example, the first method is very sensitive to outliers in the data. You should choose after you have examined the statistical characteristics of your dataset.
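
For reference, here is a short Python/NumPy sketch of both methods, applied column by column (one variable at a time, as the som_normalize quote above requires); the example values are only illustrative:

import numpy as np

def minmax_scale(data):
    # Scale each column (feature) to the range [0, 1].
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    return (data - col_min) / (col_max - col_min)

def zscore(data):
    # Transform each column to zero mean and standard deviation 1.
    return (data - data.mean(axis=0)) / data.std(axis=0)

# Rows are samples, columns are features (e.g. age and annual income).
data = np.array([[25.0, 30000.0],
                 [50.0, 30000.0],
                 [25.0, 60000.0]])
print(minmax_scale(data))
print(zscore(data))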

Projecting onto the unit circle is not really a normalization method but more of a dimensionality reduction method, since after the projection you could replace each data point with a single number (e.g., its angle). You don't have to do this.
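
As a small illustration of that last point (illustrative values, not the questioner's data): after the projection, each 2-D point carries no more information than its angle.

import numpy as np

points = np.array([[0.2, 0.2],
                   [0.8, 0.1],
                   [0.1, 0.9]])

# Project every point onto the unit circle.
unit = points / np.linalg.norm(points, axis=1, keepdims=True)

# Each projected point is fully described by a single number: its angle.
print(np.arctan2(unit[:, 1], unit[:, 0]))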

Tieratierce answered 3/12, 2012 at 23:7 Comment(8)
Thanks @pater, so: 1) Is it then necessary to have all data lie within the [-1, 1] range before projecting onto the unit circle? 2) In other neural net paradigms, I am told that I also have to de-mean. Why don't I need to de-mean in SOMs? (That is, remove the mean of each dimension from the data.) 3) The last two links are not working for me... can you please check that the links are good? Thanks!Montevideo
@pater, Unfortunately, you're wrong. The network does not "know" anything about the data's underlying nature. So even if all vector elements represent a "spatial point", they will not be processed well if they are not normalized. The rule is: input data must always be normalized, if it is not normalized already. The network is just an algorithm, and its calculations are most effective when the inputs lie in an appropriate region matching the domain of the network equations (this, for example, will prevent neurons from going into saturation, etc.).Hostility
@Learnaholic: I have updated my answer. If you read my update or the posted links, you will see that the most common normalization methods include a de-meaning step. The links are working for me; if you still cannot open them, please tell me and I will email the contents to you.Tieratierce
@Stan, Sorry Stan, but I am not wrong. Normalization is a procedure necessary in most cases, but not in all cases. If the data dimensions express features with the same measurement unit and the same scale, normalization is not necessary. You cannot compare oranges with apples, but you can compare apples with apples. The scale of the input features may be a concern for some algorithms (e.g. backpropagation), but not for the SOM. The SOM just calculates Euclidean distances, so it doesn't matter whether your data are scaled to [0,1] or [0,1000], as long as all dimensions are on the same scale (as said before).Tieratierce
@Hostility And for the spatial example it is perfectly OK NOT to normalize the data, especially for a SOM. Could you explain what the problem would be if you don't normalize the data in this case?Tieratierce
@pater, Heh, if the data has the same scale it's already normalized and should not be normalized again, as I said. But otherwise, it must be normalized, regardless of its physical meaning, which means nothing from the point of view of the mathematics. [0, 1] and [0, 1000] are clearly not on one scale. The network will learn badly, or at least not as well as with normalized data.Hostility
@Stan, That's my point: in some cases the data are "normalized" by their nature, and further normalization will obscure the result of the analysis. As for the scale example, you got it wrong. I meant that it's no problem for the scale to be [0,1] or [0,1000] as long as all input dimensions lie on the same scale (as said, this doesn't apply to all algorithms, but it does for the SOM).Tieratierce
Pater, Ah yes, I remember that projection onto the unit circle actually does do dimensionality reduction, since there is only an angle argument now. In general, we halve the number of 'features'. Yes, I also remember that de-meaning each dimension and dividing by the std of each dimension was the most common technique, which is why I am confused by the one mentioned in @Stan's article. PS, the links still don't work for me; if you can email them I would greatly appreciate it! (The last two links, I mean.) Thanks.Montevideo

In the SOM training algorithm, a number of different measures are used to calculate the distance between vectors (patterns and weights). To name a couple of them (perhaps the most widely used): Euclidean distance and the dot product. If you normalize the vectors and weights to unit length, the two are equivalent, which allows the network to learn in the most effective way. If, for instance, you do not normalize your current data, the network will process points from different parts of the input space with a different bias (larger values will have a larger effect). This is why normalization to unit length is important and considered an appropriate step for most cases (specifically, if the dot product is used as the measure).
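
A rough Python/NumPy check of that equivalence (illustrative random vectors, not part of any SOM library): for unit-length vectors, ||x - w||^2 = 2 - 2*(x.w), so the weight closest in Euclidean distance is also the one with the largest dot product.

import numpy as np

rng = np.random.default_rng(0)

# One unit-length input pattern and a few unit-length weight vectors.
x = rng.normal(size=2)
x /= np.linalg.norm(x)
W = rng.normal(size=(5, 2))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Winner by smallest Euclidean distance vs. winner by largest dot product.
by_distance = np.argmin(np.linalg.norm(W - x, axis=1))
by_dot_product = np.argmax(W @ x)
print(by_distance == by_dot_product)  # True -- the two measures agree for unit vectors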

Your source data should be prepared before it can be normalized onto the unit circle. You should map the data into the [-1, 1] region along both axes. There are several ways to do this; one of them uses the simple formulae:

mult_factor = 2 / (max - min);
offset_factor = 1 - 2 * max / (max - min),

where min and max are the minimal and maximal values in your data set, or the domain boundaries if they are known beforehand. Every dimension is processed separately; in your case, these are the X and Y coordinates.

Xnew,i = Xold,i * Xmult_factor + Xoffset_factor,   i = 1..N
Ynew,i = Yold,i * Ymult_factor + Yoffset_factor,   i = 1..N

No matter what the actual values of min and max are before the mapping (the range can be [0, 1] as in your case, or [-3.6, 10]), after the mapping the data will fall into the range [-1, 1]. Actually, the formulae above are specific to converting data into the range [-1, 1]; they are just a special case of the general conversion from one range into another:

data[i] = (data[i] - old_min) * (new_max - new_min) / (old_max - old_min) + new_min;

After the mapping you can proceed with the normalization onto the unit circle, and this way you'll finally get a circle centered at [0, 0].
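
Putting the steps together, here is a rough Python/NumPy sketch (variable names and data are illustrative) of the per-dimension mapping into [-1, 1] followed by the projection onto the unit circle:

import numpy as np

def rescale(column, new_min=-1.0, new_max=1.0):
    # General range conversion: map a column from [old_min, old_max] to [new_min, new_max].
    old_min, old_max = column.min(), column.max()
    return (column - old_min) * (new_max - new_min) / (old_max - old_min) + new_min

# Toy 2-D data set (rows are points); each dimension is rescaled separately.
data = np.array([[0.1, 0.2],
                 [0.9, 0.8],
                 [0.5, 0.4]])
mapped = np.column_stack([rescale(data[:, 0]), rescale(data[:, 1])])

# After the mapping, project each point onto the unit circle (normalize to unit length).
unit = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
print(mapped)  # every coordinate now lies in [-1, 1]
print(unit)    # every point now lies on the unit circle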

You can find more information on this page. Though the site is not about neural networks in general, this specific page provides good explanations of SOM, including descriptive graphs on data normalization.

Hostility answered 3/12, 2012 at 20:9 Comment(9)
Thanks Stan. Some comments/questions: 1) First off, I added the data set here, as a *.mat file. 2) When you say "Your source data should be prepared before it can be normalized to unity", what exactly do you mean by that? Do you mean I have to remove the mean of every dimension? If not, I don't know how you mean to 'map it to [-1, 1]'. Thanks a lot.Montevideo
Your data currently falls into [0, 1]. You need to scale it by multiplying by 2 and shift it by subtracting 1, in both x and y. This is the general mapping: mult_factor = 2 / (max - min), plus_offset = 1 - 2 * max / (max - min), where min and max are the minimal and maximal values in your data set, or the domain boundaries if they are known (as in your case). Only after this mapping can you normalize the data to unit length.Hostility
OK, so I have read the article you linked (good article, btw) and thought about it some more. However, I do not understand why in this case we are not simply removing the mean of the data (in both dimensions, of course) instead of doing what the article says. I realize that it is trying to 'fit' the data from [0, 1] into [-1, 1], but I do not understand why we do not just de-mean...Montevideo
Also, in this example I posted, I just so happened to have data that lies in [0, 1]. What if the data lay in, say, [-3.2, 6]? How would that be corrected for?Montevideo
Normalization onto the unit circle (UC) makes your data equally weighted from the SOM's point of view. Without it the SOM will still learn, but less efficiently, and the clustering can be distorted. For input vectors of large dimension, normalization to the UC is less important than in your case with 2 dimensions. The data can have any range before normalization ([-3.2, 6] is OK as well). As I wrote, there are standard ways of normalizing; one of them is the formula with min and max (above), just feed your actual range into it.Hostility
Ah! I was doing it wrong. Thanks for the edits and additional info, it works now. (BTW, there was a typo of - where it should have been a +, so I corrected that.) Since the only type of normalization I am aware of is de-meaning followed by dividing by the std, what is the 'name' of this type of normalization? Is there a wiki or article about it? Why would one choose to use this type vs. de-meaning followed by dividing by the std? Thanks again.Montevideo
The method you mentioned is good as well. The linear transformation I suggested is good for (nearly) uniformly distributed values. If the data have a noticeable "center of mass" or outliers (which create false boundaries), then de-meaning (standardization) is preferred. Other methods exist too.Hostility
Thanks again. Is there a name for this type of normalization that you are using here? I would like to look it up.Montevideo
As I wrote, this is a simple linear transformation. BTW, some people rejected your edit before I could catch it to accept it, so I made the edit myself.Hostility
