Wednesday, November 16, 2016

A study on Linear Regression applied to the Ames dataset

Since around the start of the summer, I've been spending a lot of my time on Kaggle, reading about or taking part in Machine Learning competitions. While I've yet to achieve a big score, I've learned a ton about preprocessing, models, parameter tuning, model ensembling, and more. I've also used this as a way to learn to code in Python, since all of my previous Data Science projects were implemented in R.

About two months ago, I published a script detailing how I used standard linear regression and regularization methods to achieve a top 10% (at the time at least) public leaderboard score in the House Prices competition. I took another look at it this week and realized it had received more than 10 upvotes (Kaggle's equivalent of Facebook Likes). There are quite a few things I would do differently now, but I still take great pride in having helped some fellow data scientists with this script, and thought I should post the link on my blog before I forget about it:

A study on Linear Regression applied to the Ames dataset