Low and high dimensional data

Asked 2/6, 2017 at 17:22 Answered 12/3, 2020 at 2:3

Solved machine-learning artificial-intelligence svm dimensions

I'm new to machine learning, and while learning about SVMs I came across the term "low and high dimensional data". Can anyone explain what these terms mean and what the difference between them is?

Cavitation answered 2/6, 2017 at 17:22 Comment(0)

It generally refers to the number of features you have for each sample in the problem you are trying to classify. For example, the famous Iris flower dataset includes only 4 features (sepal length, sepal width, petal length, petal width) and would be considered a low dimensional dataset.

Other datasets, dealing with more complex data, could include hundreds or thousands of features for each sample. Those are the ones considered high dimensional datasets.
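
To make the "features per sample" idea concrete, here is a minimal sketch in plain Python. The helper name `shape` and the three rows are illustrative assumptions (values laid out like Iris measurements in centimetres), not part of any library. The dimensionality in this sense is simply the number of columns.

```python
# Each row is one sample (observation); each column is one feature.
# The rows below are illustrative values laid out like Iris measurements (cm).
feature_names = ["sepal length", "sepal width", "petal length", "petal width"]
samples = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]


def shape(rows):
    """Return (N, p): the number of samples and the number of features."""
    n_samples = len(rows)
    n_features = len(rows[0]) if rows else 0
    # Every sample must have the same number of features.
    if any(len(row) != n_features for row in rows):
        raise ValueError("all samples must have the same number of features")
    return n_samples, n_features


n, p = shape(samples)
print(f"N = {n} samples, p = {p} features")
```

Adding more rows changes N but not the dimensionality; only adding columns (new measurements per flower) would raise p.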

Triserial answered 3/6, 2017 at 11:40 Comment(2)

Thanks, got it finally! – Cavitation 4/6, 2017 at 14:19

Explained in simple terms. – Peeved 20/3, 2018 at 13:8

As defined in The Elements of Statistical Learning (chapter 18, page 649, or page 668 of the 2nd edition's PDF), high-dimensional problems are problems where

the number of features p is much larger than the number of observations N, often written p>>N

So high dimensional data isn't actually about a large number of features alone (as the accepted answer suggests); it is defined by the ratio of features to samples. Note that this definition is the one used in the machine learning community; other fields may use the term differently.

As this Quora answer suggests, developing models with high-dimensional data often means introducing strong assumptions in order to produce deterministic answers.
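
As a rough sketch of the p >> N criterion: the function name `is_high_dimensional` and the default cutoff `ratio=10.0` are assumptions made for illustration, since "much larger" has no fixed threshold in the definition above.

```python
def is_high_dimensional(n_samples, n_features, ratio=10.0):
    """Return True when p is at least `ratio` times N.

    `ratio` is an assumed cutoff for "much larger"; the p >> N definition
    itself does not prescribe a specific number.
    """
    if n_samples <= 0 or n_features <= 0:
        raise ValueError("n_samples and n_features must be positive")
    return n_features >= ratio * n_samples


# Iris: 150 observations (rows) and 4 features (columns).
print(is_high_dimensional(n_samples=150, n_features=4))       # False

# Hypothetical study: 100 observations, 20,000 features each.
print(is_high_dimensional(n_samples=100, n_features=20_000))  # True
```

Here N counts rows of the dataset, not classes: Iris has 150 observations spread over 3 classes, so N = 150 and p = 4.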

Sander answered 5/8, 2019 at 9:3 Comment(1)

Interesting opinion... how do you get the number of observations? For example, the Iris flower dataset has three classes. Is each class an observation? – Bennington 11/8 at 0:0

High or low dimensionality is associated with the ratio between observations and features in a data set. When the number of observations is significantly lower than the number of features, the data set is considered high dimensional.

Footrace answered 12/3, 2020 at 2:3 Comment(0)
