Why does the Gaussian radial basis function map the examples into an infinite-dimensional space?
I've just run through the Wikipedia page about SVMs, and this line caught my eye: "If the kernel used is a Gaussian radial basis function, the corresponding feature space is a Hilbert space of infinite dimensions." http://en.wikipedia.org/wiki/Support_vector_machine#Nonlinear_classification

In my understanding, if I apply a Gaussian kernel in an SVM, the resulting feature space will be m-dimensional (where m is the number of training samples), because you choose your landmarks to be your training examples, and you measure the "similarity" between a specific example and all the training examples with the Gaussian kernel. As a consequence, for a single example you'll have as many similarity values as there are training examples. These become the new feature vectors, which are m-dimensional vectors, not infinite-dimensional ones.
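To make my understanding concrete, here is a rough sketch of what I mean (made-up data, gamma chosen just for illustration):

```python
import numpy as np

# m training examples in 2D (made-up data), used as the landmarks.
X_train = np.array([[0.0, 1.0],
                    [1.0, 2.0],
                    [2.0, 0.5]])   # m = 3 landmarks
gamma = 0.5

def rbf_features(x, landmarks, gamma):
    """Similarity of x to every landmark -> one feature per landmark."""
    return np.exp(-gamma * np.sum((landmarks - x) ** 2, axis=1))

x_new = np.array([1.0, 1.0])
print(rbf_features(x_new, X_train, gamma))  # length-m vector, m = 3 here
```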

Could somebody explain to me what I am missing?

Thanks, Daniel

Frilling answered 10/5, 2014 at 13:16 Comment(1)
In practice, m is only the upper bound -- the whole point of the SVM is to pick a sparse set of support vectors from the training samples. – Importunacy

The other answers are correct but don't really tell the right story here. Importantly, you are correct. If you have m distinct training points, then the Gaussian radial basis kernel makes the SVM operate in an m-dimensional space. We say that the radial basis kernel maps into a space of infinite dimension because you can make m as large as you want, and the space it operates in keeps growing without bound.

However, other kernels, like the polynomial kernel, do not have this property of the dimensionality scaling with the number of training samples. For example, if you have 1000 2D training samples and you use the polynomial kernel <x,y>^2, then the SVM will operate in a 3-dimensional space, not a 1000-dimensional one.
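To make that concrete, a small sketch with made-up numbers: for 2D inputs, the kernel <x,y>^2 is an ordinary dot product after the explicit map (x1^2, sqrt(2)*x1*x2, x2^2), which is 3-dimensional regardless of how many training samples you have.

```python
import numpy as np

def phi(v):
    # Explicit feature map for the homogeneous degree-2 polynomial kernel on 2D input.
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([0.7, -2.0])
y = np.array([1.5, 0.3])

kernel_value = (x @ y) ** 2          # kernel computed directly
explicit_dot = phi(x) @ phi(y)       # dot product in the 3D feature space

print(kernel_value, explicit_dot)    # the two numbers agree
```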

Generable answered 11/5, 2014 at 3:23 Comment(0)

The dual formulation of the linear SVM depends only on scalar products of all training vectors. The scalar product essentially measures the similarity of two vectors. We can then generalize it by replacing it with any other "well-behaved" similarity measure (it should be positive-definite, which preserves convexity and enables Mercer's theorem). And RBF is just one of them.
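For reference, a sketch of the standard soft-margin dual problem; the training vectors enter only through the scalar products <x_i, x_j>, which is exactly the slot a kernel fills:

```latex
\max_{\alpha}\; \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m}
      \alpha_i \alpha_j \, y_i y_j \, \langle x_i, x_j \rangle
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0
```

Replacing <x_i, x_j> by K(x_i, x_j) gives the kernelized SVM.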

If you take a look at the formula for the RBF kernel, you'll see that it is basically a scalar product in a certain infinite-dimensional space.
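For reference, a sketch of the standard expansion (writing gamma = 1/(2*sigma^2)):

```latex
K(x, y) = e^{-\gamma \lVert x - y \rVert^2}
        = e^{-\gamma \lVert x \rVert^2} \, e^{-\gamma \lVert y \rVert^2} \, e^{2\gamma \langle x, y \rangle}
        = e^{-\gamma \lVert x \rVert^2} \, e^{-\gamma \lVert y \rVert^2}
          \sum_{k=0}^{\infty} \frac{(2\gamma)^k}{k!} \langle x, y \rangle^k
```

Each term <x, y>^k is the polynomial kernel of degree k, so the full sum is a scalar product of feature vectors with infinitely many coordinates.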

Proof

Thus the RBF kernel is, in effect, a weighted combination of polynomial kernels of all possible degrees.

Amaleta answered 10/5, 2014 at 13:31 Comment(1)
In short, thanks to the squared L2 norm (over the "x" or "y" vectors), we don't mind the dimensionality of those vectors ("x" could belong to R2 or R1000, for instance), because those vectors are squashed into a scalar no matter how long they are. Is that right? – Campfire

The short answer is that this business about infinite-dimensional spaces is only part of the theoretical justification, and of no practical importance. You never actually touch an infinite-dimensional space in any sense. It's part of the proof that the radial basis function works.

Basically, SVMs are proved to work the way they do by relying on properties of dot products over vector spaces. You can't just swap in the radial basis function and expect it to necessarily work. To prove that it does, however, you show that the radial basis function is actually like a dot product over a different vector space, so it's as if we're doing regular SVMs in a transformed space, which works. And it happens that infinite-dimensionality is OK, and that the radial basis function does correspond to a dot product in such a space. So you can say SVMs still work when you use this particular kernel.
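As a quick numerical sketch (illustrative values only): the RBF value can be approached by partial sums of its series expansion, each of which is a dot product in a finite-dimensional feature space.

```python
import numpy as np
from math import factorial

# Two arbitrary 2D points and an RBF width parameter (illustrative values).
x = np.array([0.3, -1.2])
y = np.array([1.1, 0.4])
gamma = 0.5

# Exact Gaussian RBF kernel value: exp(-gamma * ||x - y||^2).
exact = np.exp(-gamma * np.sum((x - y) ** 2))

# Truncated expansion: each term (2*gamma)^k * <x,y>^k / k! is a polynomial
# kernel of degree k, i.e. a dot product in a finite-dimensional feature
# space.  Adding more terms approaches the exact RBF value.
prefactor = np.exp(-gamma * x @ x) * np.exp(-gamma * y @ y)
partial = 0.0
for k in range(15):
    partial += (2 * gamma) ** k * (x @ y) ** k / factorial(k)
    print(f"degree <= {k}: approx = {prefactor * partial:.6f}, exact = {exact:.6f}")
```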

Javier answered 10/5, 2014 at 13:24 Comment(0)
