Learning the slopes of linear regression and gradient descent

This week I’ve been dipping my toes into machine learning by undertaking the Stanford University Machine Learning course on Coursera.

I’m fascinated by the prospect of finding hidden truths in data, and I’m interested in finding applications for machine learning in both cryptography and systems security.

So far I’ve learnt about linear regression, the gradient descent algorithm, and the normal equation. I’ve also been getting to grips with Octave, an open-source programming language for scientific computing that is largely compatible with MATLAB. I’ve been really impressed by how easily Octave lets me visualise data and by the power of vectorised operations. (Sufficiently excited that I wrote a tweet!)
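As a taste of how compact vectorised Octave can be, here’s a minimal sketch of the normal equation solving a toy least-squares problem in a single line. The data here is an assumed example I made up, not the course data:

```octave
% Toy data: a noisy line y = 2 + 3x (assumed example, not the course data).
x = (0:0.1:10)';
y = 2 + 3*x + 0.5*randn(size(x));

% Design matrix with a leading column of ones for the intercept term.
X = [ones(length(x), 1), x];

% Normal equation: solves for theta in closed form, no iteration needed.
theta = pinv(X' * X) * X' * y;   % theta(1) should come out near 2, theta(2) near 3
```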

For this week’s exercise I took some data relating city population to profit, and used gradient descent to find the best linear hypothesis for predicting profit from population size.
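The core of the exercise is a vectorised gradient descent loop. The sketch below shows its general shape, using a handful of made-up (population, profit) pairs in place of the actual course data; the learning rate and iteration count are assumptions too:

```octave
% Hypothetical stand-in for the exercise data (made-up values):
% population in 10,000s, profit in $10,000s.
population = [6.1; 5.5; 8.5; 7.0; 5.9; 8.3; 7.5; 6.4];
profit     = [17.6; 9.1; 13.7; 11.9; 6.8; 18.9; 15.2; 12.1];

m = length(profit);                 % number of training examples
X = [ones(m, 1), population];       % design matrix with an intercept column
theta = zeros(2, 1);                % start from theta = [0; 0]
alpha = 0.01;                       % assumed learning rate
iterations = 1500;                  % assumed iteration count

for iter = 1:iterations
  % Vectorised update: theta := theta - (alpha/m) * X' * (X*theta - y)
  theta = theta - (alpha / m) * (X' * (X * theta - profit));
end

% Plot the training data and the fitted line.
plot(population, profit, 'rx'); hold on;
plot(population, X * theta, 'b-');
xlabel('Population'); ylabel('Profit');
```

The vectorised update computes the gradient over all training examples at once, which is both faster and tidier than looping over them one by one.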

Normalised linear regression plots produced in Octave
From left to right: the training data and best-fit line as a scatter plot, and a 3D plot of the mean squared error cost function.
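The 3D plot comes from evaluating the mean squared error cost J(θ) over a grid of parameter values. Here’s a sketch of how such a surface can be produced, reusing the hypothetical X, profit, and m from the snippet above, with assumed grid ranges:

```octave
% Evaluate J(theta) = (1/(2m)) * sum((X*theta - y).^2) over a grid of
% (theta0, theta1) values; reuses X, profit, and m from the sketch above.
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);
J_vals = zeros(length(theta0_vals), length(theta1_vals));

for i = 1:length(theta0_vals)
  for j = 1:length(theta1_vals)
    t = [theta0_vals(i); theta1_vals(j)];
    J_vals(i, j) = (1 / (2 * m)) * sum((X * t - profit) .^ 2);
  end
end

% surf expects rows to index the y-axis, so transpose before plotting.
surf(theta0_vals, theta1_vals, J_vals');
xlabel('\theta_0'); ylabel('\theta_1'); zlabel('Cost J');
```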

I’m really pleased with how easy these were to produce, and I’m looking forward to using Octave more in the future!