I'm new to machine learning, and while learning about SVMs I came across the term "low and high dimensional data". Can anyone explain what these are and what the difference is?
It generally refers to the number of features you have for each sample in the problem you are trying to classify. For example, the famous Iris flower dataset includes only 4 features (sepal length, sepal width, petal length, petal width) and is considered a low-dimensional dataset.
Other datasets, dealing with more complex data, can include hundreds or thousands of features for each sample. Those are the ones considered high-dimensional datasets.
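To make "number of features per sample" concrete, here is a minimal sketch in plain Python. It lays out three Iris-style rows as a table with one row per sample and one column per feature. The helper name `shape` is only illustrative and is not part of any library.

```python
# A few rows in the layout of the Iris dataset: one row per flower (sample),
# one column per measured feature, in centimetres.
feature_names = ["sepal length", "sepal width", "petal length", "petal width"]
samples = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
]


def shape(rows):
    """Return (number of samples, number of features) for a rectangular table."""
    n_samples = len(rows)
    n_features = len(rows[0]) if rows else 0
    return n_samples, n_features


n_samples, n_features = shape(samples)
print(f"{n_samples} samples, {n_features} features")
```

The dimensionality in this sense is the second number: it stays at 4 no matter how many flowers are measured.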
As defined in The Elements of Statistical Learning (chapter 18, page 649, or page 668 of the 2nd edition's PDF), high-dimensional problems are problems where
the number of features p is much larger than the number of observations N, often written p>>N
So high dimensional data isn't actually about a large number of features (as the accepted answer suggests), it is defined by the features/samples ratio. Note that this definition holds for the machine learning community, but may not relate to the same idea in other fields.
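A rough way to see the p >> N idea in code is to compare the two counts directly. This is only a sketch: "much larger" has no agreed numeric cutoff, so the `factor=10` threshold and the function name `is_high_dimensional` are assumptions made for illustration, and the two non-Iris datasets are hypothetical.

```python
def is_high_dimensional(n_features, n_samples, factor=10.0):
    """Return True when p exceeds N by at least `factor` times.

    `factor` is an arbitrary illustrative cutoff for "much larger";
    the p >> N definition itself does not specify a number.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return n_features > factor * n_samples


# (p, N) pairs: Iris, plus two hypothetical datasets for contrast.
examples = {
    "Iris": (4, 150),
    "hypothetical study with few samples": (20_000, 100),
    "hypothetical table with many samples": (1_000, 1_000_000),
}

for name, (p, n) in examples.items():
    print(f"{name}: p={p}, N={n}, high-dimensional={is_high_dimensional(p, n)}")
```

The last example has a thousand features yet does not count as high-dimensional under this definition, because it has far more observations than features.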
As this Quora answer suggests, developing models with high-dimensional data often means introducing strong assumptions in order to produce deterministic answers.
High or low dimensionality refers to the ratio between observations and features in a data set. When the number of observations is significantly lower than the number of features, the data set is considered high-dimensional.