Classification Vs Regression in Machine Learning

Oct 23, 2023

Before building a model, it is important to understand what type of task you are solving and which method fits it. Some tasks are naturally framed as classification, while others call for regression.

Classification and regression are both supervised learning methods in machine learning. That means we have labeled data that we use to train a model, which then predicts the output (a class or a numeric value) for new inputs.

There are two main steps:

Train the model with a set of labeled data
Predict the outcome for new observations
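To make the two steps concrete, here is a minimal sketch in plain Python. It uses a toy nearest-centroid classifier (`NearestCentroid` is a hypothetical class written for illustration, not a library API). Training averages the feature vectors of each class. Prediction assigns each new observation to the class whose average is closest. The `fit`/`predict` method names follow the convention popularized by scikit-learn, and the data points are made up.

```python
from statistics import mean


class NearestCentroid:
    """Hypothetical toy classifier: remembers the mean feature vector of each class."""

    def fit(self, X, y):
        # Step 1: train on labeled data by computing one centroid per class label
        self.centroids_ = {}
        for label in set(y):
            rows = [x for x, target in zip(X, y) if target == label]
            self.centroids_[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        # Step 2: assign each new observation to the class with the closest centroid
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [
            min(self.centroids_, key=lambda c: sq_dist(x, self.centroids_[c]))
            for x in X
        ]


# Made-up labeled training data: two features per observation
X_train = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]]
y_train = ["A", "A", "B", "B"]

model = NearestCentroid().fit(X_train, y_train)
print(model.predict([[1.2, 1.1], [8.5, 9.0]]))  # ['A', 'B']
```

Real libraries do much more work behind these two calls. The shape of the workflow stays the same: learn from labeled examples first, then apply what was learned to unseen inputs.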

What is Classification and when to use it?

Classification predicts which category an observation belongs to. The primary task of a classifier is to assign each new observation a class label chosen from a fixed set of classes.

For example, consider a binary classification task, where each observation belongs to one of two classes: deciding whether a given email is spam or not spam. First, we train the model on a dataset in which each email is labeled as spam or not spam. Then we use the trained model to predict whether a new email is spam. Other examples include image recognition and sentiment analysis.
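A spam filter can be sketched with Naive Bayes, one of the classification algorithms listed later in this article. The snippet below is a minimal version built on several assumptions:

- The four training emails are invented.
- Words are split on whitespace.
- The helpers `train_naive_bayes` and `predict_naive_bayes` are hypothetical names.

For each class, it adds the log prior to the log likelihood of each word, then picks the class with the higher score. The likelihoods use add-one (Laplace) smoothing so that unseen words do not zero out the score.

```python
import math
from collections import Counter


def train_naive_bayes(emails, labels):
    """Hypothetical helper: count words per class and documents per class."""
    word_counts = {label: Counter() for label in set(labels)}
    class_counts = Counter(labels)
    for text, label in zip(emails, labels):
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    return word_counts, class_counts, vocab


def predict_naive_bayes(model, text):
    """Hypothetical helper: return the label with the highest log posterior score."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # log P(class)
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # log P(word | class) with add-one (Laplace) smoothing
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented labeled training data
emails = [
    "win free money now",
    "free prize claim now",
    "meeting at noon tomorrow",
    "project update attached",
]
labels = ["spam", "spam", "not spam", "not spam"]

model = train_naive_bayes(emails, labels)
print(predict_naive_bayes(model, "claim your free money"))     # spam
print(predict_naive_bayes(model, "project meeting tomorrow"))  # not spam
```

With only four emails this is a toy. A practical filter would need far more data and better tokenization.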

What is Regression and when to use it?

Regression predicts numerical values. Its primary objective is to estimate a continuous numeric output.

For example, we might predict the price of a house from features such as the number of bedrooms and its location. Other examples include forecasting stock prices and predicting temperature.
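For the house price example, the simplest possible regressor is a straight line fitted by ordinary least squares on a single feature. The sketch below assumes the number of bedrooms is the only input. The prices are made up, and `fit_simple_linear_regression` is a hypothetical helper, not a library function. Its prediction is a number rather than a label, and it need not equal any price seen in training.

```python
def fit_simple_linear_regression(xs, ys):
    """Hypothetical helper: ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = covariance(x, y) / variance(x); both share the 1/n factor, so it cancels
    cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var_x = sum((x - x_mean) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = y_mean - slope * x_mean
    return intercept, slope


bedrooms = [1, 2, 3, 4, 5]
prices = [150_000, 200_000, 260_000, 300_000, 350_000]  # made-up illustrative prices

intercept, slope = fit_simple_linear_regression(bedrooms, prices)
print(intercept, slope)        # 102000.0 50000.0
print(intercept + slope * 6)   # 402000.0, predicted price for a 6-bedroom house
```

A real price model would use many features (location, size, age), which leads to multiple linear regression or one of the other regressors listed below.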

Key Differences Between Classification and Regression

Classification is the task of predicting a discrete (categorical) value. It assigns each observation to one of several predefined categories or classes.

Regression is the task of predicting a continuous (numerical) quantity. Its output is a specific number rather than one of a fixed set of predefined categories.

Evaluation metrics:

For classification we use accuracy, precision, recall, F1-score, and the ROC curve with the area under it (AUC).
For regression we use mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and R-squared (R²).
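Most of these metrics are short formulas over true and predicted values. The sketch below computes them in plain Python with invented example values:

- For a binary classification task: accuracy, precision, recall, and F1.
- For regression: MSE, RMSE, MAE, and R².

ROC and AUC are omitted because they need predicted scores or probabilities rather than hard labels. The function names are hypothetical. In practice, scikit-learn's `sklearn.metrics` module provides tested implementations of all of these.

```python
import math


def classification_metrics(y_true, y_pred, positive):
    """Hypothetical helper: accuracy, precision, recall, F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred):
    """Hypothetical helper: MSE, RMSE, MAE and R-squared."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    y_mean = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)              # residual sum of squares
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Invented example values
cls_true = ["spam", "spam", "not spam", "not spam", "spam"]
cls_pred = ["spam", "not spam", "not spam", "spam", "spam"]
print(classification_metrics(cls_true, cls_pred, positive="spam"))

reg_true = [3.0, 5.0, 7.0]
reg_pred = [2.5, 5.0, 8.0]
print(regression_metrics(reg_true, reg_pred))
```

Which classification metric matters most depends on the cost of errors. For a spam filter, low precision means legitimate emails end up in the spam folder.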

Algorithms and Techniques

Some algorithms are used only for classification, some only for regression, and some can be adapted for both.
Classification only:
Naive Bayes, Logistic Regression (despite its name, logistic regression is a classification method)
Regression only:
Linear Regression, Polynomial Regression, time-series forecasting models (e.g., ARIMA)
Both classification and regression:
Decision Trees, Random Forests, Support Vector Machines, K-Nearest Neighbors, Neural Networks, Gradient Boosting
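K-Nearest Neighbors shows how one algorithm can serve both tasks. The model finds the k training points closest to a new observation. For classification, it takes a majority vote of their labels; for regression, it averages their numeric targets.

Below is a minimal sketch built on these assumptions:

- a single numeric feature,
- squared Euclidean distance,
- k = 3,
- made-up data.

The function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def k_nearest(X_train, x, k):
    """Indices of the k training points closest to x (squared Euclidean distance)."""
    dists = [
        (sum((a - b) ** 2 for a, b in zip(row, x)), i)
        for i, row in enumerate(X_train)
    ]
    return [i for _, i in sorted(dists)[:k]]


def knn_classify(X_train, labels, x, k=3):
    # Classification: majority vote among the k neighbors' labels
    idx = k_nearest(X_train, x, k)
    return Counter(labels[i] for i in idx).most_common(1)[0][0]


def knn_regress(X_train, targets, x, k=3):
    # Regression: average of the k neighbors' numeric targets
    idx = k_nearest(X_train, x, k)
    return mean(targets[i] for i in idx)


# Made-up data: one feature, with both a class label and a numeric target per point
X = [[1], [2], [3], [10], [11], [12]]
labels = ["small", "small", "small", "large", "large", "large"]
prices = [100.0, 110.0, 120.0, 400.0, 410.0, 420.0]

print(knn_classify(X, labels, [2.5]))   # small
print(knn_regress(X, prices, [11.2]))   # 410.0
```

The neighbor search is identical in both cases. Only the final step that aggregates the neighbors' outputs changes, which is why K-NN appears in the "both" group.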

In summary, classification and regression are both supervised learning techniques in machine learning, but they serve distinct purposes. Classification is used when the task involves assigning data to predefined categories, as in sentiment analysis or image recognition. Regression is used when the objective is to predict continuous numerical values, as in forecasting stock prices or estimating house prices. The two approaches differ in output type, evaluation metrics, and objective, so it is important to select the one that suits the specific problem at hand and the nature of the data.
