When to use supervised or unsupervised learning?

Asked 4/7, 2017 at 13:49 Answered 27/3, 2022 at 16:33

machine-learning criteria supervised-learning unsupervised-learning

What are the fundamental criteria for choosing supervised or unsupervised learning?
When is one better than the other?
Are there specific cases where you can only use one of them?

Thanks

Banns asked 4/7, 2017 at 13:49

1. If you have a labeled dataset, you can use both. If you have no labels, you can only use unsupervised learning.
2. It's not a question of "better"; it's a question of what you want to achieve. For example, clustering data is usually unsupervised: you want the algorithm to tell you how your data is structured. Categorizing is supervised, since you need to teach your algorithm what is what in order to make predictions on unseen data.
3. See 1.
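To make the labeled vs. unlabeled distinction concrete, here is a minimal plain-Python sketch on a small, hypothetical dataset of animal weights (the data and all function names are invented for illustration). With labels, a nearest-centroid classifier learns what "cat" and "dog" look like and can name an unseen point. Ignoring the labels, splitting the sorted values at their largest gap (which is single-linkage clustering into two groups) still finds the same two groups, but it can only number them, not name them.

```python
# Hypothetical toy data: animal weights in kg. Labels are known here, so both
# approaches are possible; the unsupervised one simply ignores them.
weights = [3.1, 4.0, 3.6, 4.4, 25.0, 30.5, 28.2, 22.9]
labels = ["cat", "cat", "cat", "cat", "dog", "dog", "dog", "dog"]


def fit_nearest_centroid(xs, ys):
    """Supervised: learn one mean weight per label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def split_at_largest_gap(xs):
    """Unsupervised: divide 1-D data into two groups at the widest gap
    between neighboring sorted values (single-linkage with 2 clusters)."""
    s = sorted(xs)
    gaps = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    cut = gaps.index(max(gaps))
    threshold = (s[cut] + s[cut + 1]) / 2
    # Group ids are arbitrary integers: the algorithm finds the structure,
    # but it cannot know that the groups are called "cat" and "dog".
    return [0 if x <= threshold else 1 for x in xs]


centroids = fit_nearest_centroid(weights, labels)
print(predict(centroids, 5.0))        # cat
print(split_at_largest_gap(weights))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Only the supervised half can answer "what is this?"; the unsupervised half answers "how is this data grouped?".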

On a side note: these are very broad questions. I suggest you familiarize yourself with some ML fundamentals.

A good podcast, for example: http://ocdevel.com/podcasts/machine-learning

A very good book (with notebooks) by Jake VanderPlas, the Python Data Science Handbook: http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/Index.ipynb

Iridosmine answered 4/7, 2017 at 14:59

It depends on your needs. If you have existing data that includes the target values you want to predict (labels), then you probably need supervised learning (e.g. is something true or false, or does this data represent a fish, a cat, or a dog?). Simply put, you already have examples of right answers and you are telling the algorithm what to predict. You also need to decide whether you need classification or regression. Classification is when you need to assign the predicted values to given classes, i.e. discrete values (e.g. is this person likely to develop diabetes, yes or no?). Regression is when you need to predict continuous values (1, 2, 4.56, 12.99, 23, etc.). There are many supervised learning algorithms to choose from (k-nearest neighbors, naive Bayes, SVM, ridge regression, ...).
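A minimal sketch of the classification vs. regression distinction using k-nearest neighbors, one of the algorithms listed above, written from scratch in plain Python. The data (hours studied, pass/fail, exam score) and the helper names are hypothetical. The same neighbor search serves both tasks: classification takes a majority vote over discrete labels, regression averages continuous targets. In practice you would more likely reach for a library such as scikit-learn (`KNeighborsClassifier`, `KNeighborsRegressor`).

```python
from collections import Counter

# Hypothetical training data for one feature: hours studied.
hours = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
passed = ["no", "no", "no", "yes", "yes", "yes"]  # discrete target -> classification
scores = [35.0, 42.0, 50.0, 71.0, 78.0, 86.0]     # continuous target -> regression


def k_nearest(xs, x, k):
    """Indices of the k training points closest to x."""
    return sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]


def knn_classify(xs, ys, x, k=3):
    """Classification: majority vote among the labels of the k nearest points."""
    votes = Counter(ys[i] for i in k_nearest(xs, x, k))
    return votes.most_common(1)[0][0]


def knn_regress(xs, ys, x, k=3):
    """Regression: mean of the target values of the k nearest points."""
    idx = k_nearest(xs, x, k)
    return sum(ys[i] for i in idx) / k


print(knn_classify(hours, passed, 6.5))  # yes
print(knn_regress(hours, scores, 6.5))   # about 78.33
```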

By contrast, use unsupervised learning if you don't have labels (target values). You are simply trying to identify the clusters that naturally occur in the data (e.g. k-means, DBSCAN, spectral clustering, ...).
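For the unlabeled case, a minimal k-means sketch (Lloyd's algorithm) in plain Python: only the points are passed in, with no target values. The deterministic "first k points" initialization and the sample points are simplifying assumptions for illustration; libraries such as scikit-learn (`sklearn.cluster.KMeans`, `DBSCAN`, `SpectralClustering`) use more robust initialization and also cover the other algorithms mentioned.

```python
import math


def kmeans(points, k, iterations=100):
    """Minimal k-means on 2-D points. No labels are involved anywhere."""
    # Simplifying assumption: start from the first k points. Real libraries
    # use smarter schemes such as k-means++.
    centers = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    return assignment, centers


# Hypothetical unlabeled points forming two obvious blobs.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
groups, centers = kmeans(points, k=2)
print(groups)  # [0, 1, 0, 1, 0, 1]
```

As with any clustering, the group numbers 0 and 1 carry no meaning by themselves; interpreting them is up to you.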

So it depends, and there's no exact answer, but generally speaking you need to:

1. Collect and look at your data. You need to know your data; only then can you decide which approach to take and which algorithm will best suit your needs.
2. Train your algorithm. Make sure your data is clean and of good quality. Bear in mind that in unsupervised learning there are no target values to train against, so you fit the algorithm to the data directly and go straight to examining the result.
3. Test your algorithm. Run it and see how well it behaves. In supervised learning you can hold out part of the labeled data (a test set that is not used for training) to evaluate how well your algorithm is doing.
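The collect/train/test loop for the supervised case can be sketched as follows, assuming a hypothetical one-feature labeled dataset and a 1-nearest-neighbor model. The point to notice is that accuracy is measured on a held-out split the model never saw during training. The helper names are illustrative, not a library API; scikit-learn's `train_test_split` plays the same role as the hand-written one here.

```python
import random


def train_test_split(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly and hold out a fraction for testing."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in test], [ys[i] for i in test])


def fit_1nn(xs, ys):
    """'Training' a 1-nearest-neighbor model just stores the labeled examples."""
    return list(zip(xs, ys))


def predict_1nn(model, x):
    """Label of the stored example closest to x."""
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(model, xs, ys):
    hits = sum(predict_1nn(model, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


# Step 1: collect and inspect (hypothetical, well-separated labeled data).
xs = [0.5, 1.0, 1.2, 1.8, 2.0, 2.4, 7.5, 8.0, 8.3, 8.9, 9.1, 9.6]
ys = ["low"] * 6 + ["high"] * 6

x_train, y_train, x_test, y_test = train_test_split(xs, ys)
model = fit_1nn(x_train, y_train)       # Step 2: train on the training split only
print(accuracy(model, x_test, y_test))  # Step 3: evaluate on unseen data -> 1.0
```

Without labels there is no `y_test` to score against, which is why the test step looks different for unsupervised learning.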

There are many books online about machine learning and many online lectures on the topic as well.

Bethsaida answered 4/7, 2017 at 15:20

It depends on the dataset you have. If you have the target feature (labels), you should go for supervised learning. If you don't, it is an unsupervised problem. Supervised learning is like teaching the model with examples. Unsupervised learning is mainly used to group similar data, and it also plays a major role in feature engineering. Thank you.
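As one illustration of unsupervised learning in feature engineering, principal component analysis (a technique swapped in here, not named in the answer) finds, without any labels, the direction along which the data varies most, and lets you replace two correlated features with a single new one. This is a minimal 2-D sketch in plain Python using the closed-form leading eigenvector of a 2x2 covariance matrix; the data is hypothetical, and for real use `sklearn.decomposition.PCA` would be the usual choice.

```python
import math


def first_principal_component(points):
    """Mean and unit direction of greatest variance for 2-D points (no labels)."""
    n = len(points)
    mean = (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
    centered = [(x - mean[0], y - mean[1]) for x, y in points]
    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # For a symmetric 2x2 matrix, the eigenvector with the largest eigenvalue
    # lies at angle theta with tan(2 * theta) = 2 * sxy / (sxx - syy);
    # atan2 picks the maximizing branch.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return mean, (math.cos(theta), math.sin(theta))


def project(points, mean, direction):
    """Replace each (x, y) point with one feature: its coordinate along direction."""
    return [
        (x - mean[0]) * direction[0] + (y - mean[1]) * direction[1]
        for x, y in points
    ]


# Hypothetical data with two strongly correlated features (y is roughly 2 * x).
points = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.2), (3.0, 5.8)]
mean, direction = first_principal_component(points)
new_feature = project(points, mean, direction)
print(direction)    # roughly (0.46, 0.89), close to (1, 2) / sqrt(5)
print(new_feature)  # one number per point instead of two
```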

Lickerish answered 27/3, 2022 at 16:33
