Support Vector Regression

A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems.
Used as a regression method, it keeps all the main features that characterize the algorithm (the maximal margin). Support Vector Regression (SVR) uses the same principles as SVM for classification, with only a few minor differences.
In SVM for classification, we try to push the classes as far as possible from the separating line (hyperplane). Unlike logistic regression, we create a safety margin on both sides of the hyperplane (the difference between logistic regression and SVM classification lies in their loss functions). The goal is to keep the data points of the different classes separated and as far as possible from the hyperplane.

In SVM for regression, we want to fit a model that predicts a quantity for the future, so we want the data points (observations) to be as close as possible to the hyperplane, the opposite of SVM for classification. SVR differs from simple regression (such as ordinary least squares) in that we define an epsilon range on both sides of the hyperplane that makes the regression function insensitive to errors within it, whereas in SVM for classification the boundary is a safety zone for making future decisions (predictions). So SVR has a boundary just like SVM for classification, but in regression the boundary makes the regression function insensitive to error, while in classification it only serves to keep the classes far from the hyperplane (decision boundary), which is why we call it a safety margin.
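To make the epsilon-insensitive idea concrete, here is a minimal Python sketch (my own toy example, not from any particular library's internals): residuals that fall inside the epsilon tube around the prediction contribute zero error, and only points outside the tube are penalized. The numbers and names are purely illustrative.

# A minimal sketch of the epsilon-insensitive loss used by SVR.
# Residuals inside the epsilon tube cost nothing; only points
# outside the tube are penalized.
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    residual = np.abs(y_true - y_pred)
    return np.maximum(0.0, residual - epsilon)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.9])
print(epsilon_insensitive_loss(y_true, y_pred))
# [0.  0.4 0. ]  -- only the second point falls outside the tube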

Intuitively, like all regressors, SVR tries to fit a line to the data by minimizing a cost function. The interesting part about SVR is that we can deploy a non-linear kernel. In that case, we end up doing non-linear regression, i.e. fitting a curve rather than a line.

This process is based on the kernel trick and on representing the solution/model in the dual rather than in the primal. That is, the model is represented as combinations of the training points rather than as a function of the features and some weights. At the same time, the basic algorithm remains the same: the only real change in going non-linear is the kernel function, which changes from a simple inner product to some non-linear function.
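As a small illustration of that kernel swap, here is a scikit-learn sketch (my own toy example, with arbitrary synthetic data and parameter values): the only difference between the linear and the non-linear model is the kernel argument.

import numpy as np
from sklearn.svm import SVR

# Toy data: a noisy sine curve that a straight line cannot follow.
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# Same algorithm, different kernel: simple inner product vs. RBF.
linear_svr = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y)
rbf_svr = SVR(kernel="rbf", C=1.0, gamma="scale", epsilon=0.1).fit(X, y)

# The RBF model can follow the curve; the linear one cannot.
print("linear R^2:", linear_svr.score(X, y))
print("rbf R^2:", rbf_svr.score(X, y))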


The implementation is similar to that of regression, with one key difference: parameter tuning.
When I played around with the parameters, I observed that the accuracy of the predictions varies a lot.
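One way to explore this sensitivity yourself is a small grid search over C, epsilon, and gamma. The sketch below uses scikit-learn's GridSearchCV on the same kind of toy data; the grid values are just examples, not recommendations.

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# Try a few hyperparameter combinations with 5-fold cross-validation.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "epsilon": [0.01, 0.1, 0.5],
    "gamma": ["scale", 0.1, 1.0],
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)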
