What is Supervised Learning?
Supervised learning is a machine learning technique in which a model is trained on labeled data, i.e. each example carries the specific value you intend the model to predict. A model built on supervised learning looks for patterns in a historical dataset so that its predictions closely match the values, or labels, of the training data. Common applications include animal recognition systems and traffic sign classification systems. The required categorization may be as simple as yes/no; these are called binary classification problems. When you have a specific target value in mind, your model is best built on supervised learning. Some examples of supervised learning models are linear regression, logistic regression, decision trees, and random forests.
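To make the idea concrete, here is a minimal sketch of binary-style classification using a 1-nearest-neighbour rule; the feature values (weights in kilograms) and animal labels are made-up toy data, not from the original text.

```python
def nearest_neighbor_predict(train, point):
    """Return the label of the training example whose feature is closest to `point`."""
    closest = min(train, key=lambda ex: abs(ex[0] - point))
    return closest[1]

# Labeled training data: (feature, label) pairs, e.g. body weight -> animal.
train = [(4.0, "cat"), (5.0, "cat"), (25.0, "dog"), (30.0, "dog")]

print(nearest_neighbor_predict(train, 4.5))   # -> "cat"
print(nearest_neighbor_predict(train, 28.0))  # -> "dog"
```

The "supervision" is entirely in the labels: the rule itself is trivial, but without the human-provided `"cat"`/`"dog"` labels it would have nothing to predict.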
Supervised learning aims to use large labeled datasets to train a machine learning model to identify and assign the correct label to new inputs based on the examples it has been trained with. The “supervised” in this context refers to the human user(s) who provide the correct labels to the training data.
The principle of supervised learning can also be applied to scenarios in which the desired output is a number. For instance, a model could predict the worth of an artifact based on factors like period, place of origin, historical relevance, and fragility, or predict the number of patients a large hospital may admit in a year based on previous hospital records and current disease trends. These predictive applications belong under “regression”, a statistical approach that seeks to estimate the relationship between a dependent variable and one or more independent variables.
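The hospital-admissions example can be sketched as a simple ordinary-least-squares line fit. The year indices and admission counts below are invented for illustration; a real model would use actual hospital records and likely more than one independent variable.

```python
def fit_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

xs = [1, 2, 3, 4]          # year index (toy data)
ys = [110, 120, 130, 140]  # patients admitted that year (toy data)
a, b = fit_line(xs, ys)
print(a * 5 + b)           # predicted admissions for year 5 -> 150.0
```

Here the labels are the known admission counts; the "rule" the model learns is just the slope and intercept that best relate year to admissions.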
There is a need to tread carefully with machine learning models like this due to the risk of overconfidence in the model’s accuracy. The overconfidence effect occurs when a human user's subjective confidence in the predictions of a machine learning model is greater than the objective accuracy of those predictions. To avoid this pitfall, it helps to create both a training dataset and a testing dataset. The former serves to establish the rules by which the model will operate and make its predictions, while the latter tests the model’s capacity to handle other, more general kinds of data. The predictions the system makes on the training data and the test data can then be compared. This comparison can, however, expose another pitfall called “overfitting”: the algorithm performs well on the training data but generalizes poorly to other data.
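The train/test split described above can be sketched in a few lines. The dataset and the threshold rule below are hypothetical, chosen only to show the mechanics of holding data back and comparing performance on the two sets.

```python
# Toy labeled dataset: (feature, label) pairs.
data = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1), (2.5, 0), (4.5, 1)]

# Hold the last examples out as a test set; train on the rest.
train, test = data[:6], data[6:]

def predict(x, threshold=3.5):
    """A rule derived from the training data: label 1 above the threshold."""
    return 1 if x > threshold else 0

def accuracy(examples):
    """Fraction of examples the rule labels correctly."""
    return sum(predict(x) == y for x, y in examples) / len(examples)

print(accuracy(train))  # performance on the data the rule was built from
print(accuracy(test))   # performance on held-out data the rule has never seen
```

If the two accuracies are close, the rule appears to generalize; a large gap between training and test performance is the warning sign of overfitting.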
Generalization here is the machine learning model's ability to apply the patterns learned from its training data to new, previously unencountered data. Quite simply, the overfitting phenomenon occurs when one attempts to capture all possible variables from historical data in a bid to create a perfectly accurate machine learning model. The data scientist tries out a wide range of rules until a perfect fit for the training data is found; the model then starts to learn from the surrounding noise in the data and fails to generalize to new input data.
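An extreme version of this failure can be sketched by comparing a model that simply memorizes its training data (a "perfect fit") with one that extracts a general rule. The data and both rules are invented for illustration.

```python
# Toy training data: feature -> label.
train = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}

def overfit_predict(x):
    """A 'perfect fit': look the answer up; fails on anything unseen."""
    return train.get(x, 0)  # defaults to 0 for inputs it never memorized

def simple_predict(x):
    """A general rule extracted from the pattern in the training data."""
    return 1 if x > 3 else 0

# New, previously unencountered inputs with their true labels.
test = {4.5: 1, 5.5: 1, 2.5: 0}
overfit_acc = sum(overfit_predict(x) == y for x, y in test.items()) / len(test)
simple_acc = sum(simple_predict(x) == y for x, y in test.items()) / len(test)
print(overfit_acc, simple_acc)  # the memorizing model generalizes worse
```

Both models score perfectly on the training data, but only the simple rule carries over to the new inputs, which is exactly the training-versus-test gap that signals overfitting.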