Linear Regression - Part 2
This course uses scikit-learn for machine learning algorithms. With scikit-learn, implementing a linear regression model is straightforward: all I really need to do is import the LinearRegression class, instantiate it, and call the fit() method with the training data. This is about as simple as it gets when using a machine learning library to train on your data.
The linear regression model finds the values of the intercept and slope that give the line of best fit for the data. We can see the intercept and slope calculated by the linear regression algorithm for our dataset using the following code:
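The three steps above (import, instantiate, fit) can be sketched roughly as follows. The data here is synthetic and purely an assumption for illustration, since the course dataset is not shown in this post; the variable name `regressor` matches the snippets used later.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data (not the course dataset): y is roughly 3x + 2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))          # one feature, 50 samples
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=50)

# Hold out 20% of the rows so the model can be evaluated later.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Instantiate the model and fit it to the training data.
regressor = LinearRegression()
regressor.fit(X_train, y_train)
```

Note that fit() expects the features as a 2-D array (samples by features), which is why X has shape (50, 1) even though there is only one feature.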
print(regressor.intercept_)
print(regressor.coef_)
These help us answer the statistical question of how much the prediction changes for every unit change in an independent variable.
Once the model is trained, the next step is to make predictions. For this, scikit-learn offers a simple predict() method. It was interesting to note how pages and pages of detailed calculations, from computing the coefficients to determining the regression equation and then training the model on the training dataset, have been encapsulated into very few lines of code.
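To make the "change per unit" reading concrete, here is a small sketch on made-up, noise-free data (an assumption for illustration, not the course dataset). It checks that moving the input by one unit moves the prediction by the slope.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data lying exactly on the line y = 2x + 1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2.0 * X[:, 0] + 1.0

regressor = LinearRegression()
regressor.fit(X, y)

print(regressor.intercept_)   # close to 1.0 for this data
print(regressor.coef_)        # one slope per feature, close to [2.0] here

# Increasing the input by one unit changes the prediction by the slope.
step = regressor.predict([[6.0]])[0] - regressor.predict([[5.0]])[0]
print(step)
```

With more than one feature, coef_ holds one value per feature, and each is read as the change in the prediction for a one-unit change in that feature while the others are held fixed.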
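As a rough illustration of what predict() encapsulates, the sketch below compares its output with the regression equation written out by hand. The hours and scores are invented values, assumed only for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: hours studied vs. exam score.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([22.0, 31.0, 38.0, 52.0, 60.0])

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Unseen inputs to predict for.
X_test = np.array([[6.0], [7.5]])
y_pred = regressor.predict(X_test)

# The same result from the regression equation: intercept + slope * x.
manual = regressor.intercept_ + regressor.coef_[0] * X_test[:, 0]

print(y_pred)
print(manual)
```

Both printed arrays should match, which shows that predict() is simply applying the fitted intercept and slope to the new inputs.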
The next key task, once we have the predicted values, is to check how accurate these values are.
This step is particularly important for comparing how well different algorithms perform on a particular dataset. For regression algorithms, three evaluation metrics are commonly used:
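In the formulas below, n is the number of observations, y_i is the actual value, and \hat{y}_i is the predicted value.
Mean Absolute Error (MAE) is the mean of the absolute value of the errors. It is calculated as:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
Mean Squared Error (MSE) is the mean of the squared errors and is calculated as:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$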
The scikit-learn library comes with pre-built functions in sklearn.metrics, such as mean_absolute_error() and mean_squared_error(), that calculate these values for us.
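A minimal sketch of computing the three metrics with sklearn.metrics is shown below. The actual and predicted values are made up for illustration; RMSE is taken here as the square root of MSE using NumPy, which is one simple way to get it.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual and predicted values (not from the course dataset).
y_test = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

mae = mean_absolute_error(y_test, y_pred)   # mean of |error|
mse = mean_squared_error(y_test, y_pred)    # mean of error squared
rmse = np.sqrt(mse)                         # back in the units of y

print("MAE:", mae)    # 1.0  (absolute errors 1, 0, 1, 2)
print("MSE:", mse)    # 1.5  (squared errors 1, 0, 1, 4)
print("RMSE:", rmse)
```

Because MSE squares each error, the single error of 2 contributes more to MSE and RMSE than to MAE, which is why the RMSE here comes out larger than the MAE.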
What counts as an acceptable error is up to the data scientist and depends on how much accuracy they need from the model.
I remember learning these formulas and how they were derived. During that process, I would often lose sight of why we were calculating these values, but this exercise helped me clear up a lot of these concepts.