Periodic Data with Machine Learning (Like Degree Angles -> 179 is 2 different from -179)
M

4

8

I'm using Python for kernel density estimations and gaussian mixture models to rank likelihood of samples of multidimensional data. Every piece of data is an angle, and I'm not sure how to handle the periodicity of angular data for machine learning.

First I removed all negative angles by adding 360 to them, so every negative angle became positive, e.g. -179 became 181. I believe this elegantly handles the case where -179 and nearby values are not significantly different from 179 and nearby values, but it does not handle cases like 359 being close to 1.

One approach I've thought of is keeping both the negative and the negative+360 values and using the minimum of the two, but this would require modifying the machine learning algorithms.

Is there a good preprocessing-only solution to this problem? Anything built into scipy or scikit?

Thanks!

Maisey answered 4/12, 2013 at 17:56 Comment(5)
When you say "Every piece of data is an angle" you mean both the input features and the target variable (for regression)?Encyclopedia
Not an expert on these scipy or scikit, but you can try replacing the angle by cos(angle), sin(angle)Encumbrancer
@ogrisel, yes, I mean all input features and target variables are angles.Maisey
@TalDarom, I don't see how that solves the periodicity of the data. Could you elaborate?Maisey
it solves the problem because cos and sin are periodic functions of the angle. e.g. you could use Euclidean distance (or any other standard metric) between these values.Encumbrancer
B
13

As Tal Darom wrote in the comments, you can replace every periodic feature x with two features, cos(x) and sin(x), after converting to radians. That solves the 359 ≈ 1 problem:

>>> import numpy as np
>>> def fromdeg(d):
...     r = d * np.pi / 180.
...     return np.array([np.cos(r), np.sin(r)])
... 
>>> np.linalg.norm(fromdeg(1) - fromdeg(359))
0.03490481287456796
>>> np.linalg.norm(fromdeg(1) - fromdeg(180))
1.9999238461283426
>>> np.linalg.norm(fromdeg(90) - fromdeg(270))
2.0

norm(a - b) is the good old Euclidean distance between vectors a and b. As you can verify with a simple plot, or by realizing that these (cos, sin) pairs are really coordinates on the unit circle, this distance is maximal (and the dot product minimal) when the original angles differ by 180°.
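As a rough sketch of how this could be applied to a whole dataset as a preprocessing step: the helper name encode_angles and the sample array X_deg below are made up for illustration, and the input is assumed to be a 2-D array of angles in degrees with one row per sample.

```python
import numpy as np

def encode_angles(angles_deg):
    """Map an (n_samples, n_features) array of angles in degrees to an
    (n_samples, 2 * n_features) array: all cos columns, then all sin columns."""
    radians = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return np.hstack([np.cos(radians), np.sin(radians)])

# Hypothetical data: 3 samples with 2 angular features each.
X_deg = np.array([[179.0, 1.0],
                  [-179.0, 359.0],
                  [0.0, 180.0]])

X_enc = encode_angles(X_deg)

# Rows 0 and 1 are 2 degrees apart in each feature, so they end up close;
# rows 0 and 2 are roughly opposite in each feature, so they end up far apart.
print(X_enc.shape)
print(np.linalg.norm(X_enc[0] - X_enc[1]))
print(np.linalg.norm(X_enc[0] - X_enc[2]))
```

The encoded array, rather than the raw angles, would then be what you pass to an estimator such as scikit-learn's KernelDensity or GaussianMixture.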

Blalock answered 6/12, 2013 at 0:2 Comment(2)
sorry but I'm not sure I understand this solution. how is it applied to every sample in the dataset as a preprocessing step?Maisey
@Kylamus: yes, it's part of feature extraction.Blalock
H
2

An alternative to the methods already posted would be to model the angular variables using the von Mises distribution.

This distribution is available in scipy (scipy.stats.vonmises), so it shouldn't be too difficult to fit into a mixture model.
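A minimal sketch of what scoring angles with scipy.stats.vonmises might look like. The sample values, the circular_mean helper, and the concentration kappa = 50 are assumptions chosen for illustration; in practice kappa would be estimated from the data.

```python
import numpy as np
from scipy.stats import vonmises

def circular_mean(angles_rad):
    """Mean direction of a set of angles (radians), computed on the unit circle."""
    return np.arctan2(np.mean(np.sin(angles_rad)), np.mean(np.cos(angles_rad)))

# Hypothetical samples clustered around 180 degrees, straddling the +/-180 wrap.
samples_rad = np.deg2rad(np.array([175.0, 179.0, -179.0, -176.0, 178.0]))

mu = circular_mean(samples_rad)
kappa = 50.0  # assumed concentration, not fitted

# Log-likelihood of a point near the cluster vs. one on the opposite side.
log_near = vonmises.logpdf(np.deg2rad(-178.0), kappa, loc=mu)
log_far = vonmises.logpdf(np.deg2rad(0.0), kappa, loc=mu)
print(np.rad2deg(mu), log_near, log_far)
```

Because the density depends on the angle only through cos(x - loc), 179 and -181 degrees receive the same likelihood, which is the periodic behaviour the question asks for.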

Henri answered 6/12, 2013 at 19:58 Comment(0)
G
0

Another, simpler way could be to express the angles as times rather than degrees (not DMS, though). Since many analytics packages support time as a data type, you can rely on its periodicity to do the job.

But remember, you need to scale 360 degrees to 24 hours.

Glum answered 21/4, 2018 at 16:1 Comment(0)
I
-3

You need to use the mod function. In plain Python this would be (ang2-ang1)%360, and with arrays it looks like you can use numpy.mod(); see the documentation.
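For reference, a small sketch of the mod-based difference. Note that (ang2-ang1)%360 on its own gives a directed difference in [0, 360), so 1 to 359 yields 358. The helper name angular_difference and the extra minimum step are additions for illustration, not part of the original suggestion.

```python
import numpy as np

def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles in degrees, in [0, 180]."""
    a = np.asarray(a_deg, dtype=float)
    b = np.asarray(b_deg, dtype=float)
    d = np.mod(b - a, 360.0)          # directed difference in [0, 360)
    return np.minimum(d, 360.0 - d)   # fold onto the shorter arc

print(angular_difference(179, -179))
print(angular_difference(1, 359))
print(angular_difference(90, 270))
```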

Intervalometer answered 4/12, 2013 at 18:5 Comment(2)
this is not even close to an answer for the problem. OP is not asking "how to calculate difference between two angles", the question regards completely different aspect, much deeper and harder. It is not the question about the function, or even about any implementation issue. It is a conceptual question regarding usage of custom metrics in a class of clustering models.Batey
@Batey - About two thirds of the question seemed to be about how to calculate the difference between the angles - even half of the title. I assumed that was where the problem was and that he could do the other stuff. But clearly I misunderstood.Intervalometer
