This will be a short post about my Kaggle-ing adventures.
For those that don’t know Kaggle, it is a platform for data science competitions and has been running since 2010. The Kaggle community is very diverse and is the largest data science community in the world. Earlier this year Kaggle was acquired by Google, after which some competitions run in partnership with Google Brain were announced. I’m looking forward to the other competitions they will organise.
For more details on how Kaggle competitions work or how much impact they have had you can read more here.
I joined Kaggle years ago to have a look at the types of data sets they had and the competitions. I have a background in Machine Learning, but having never applied it to real-world data sets outside of university, naturally (for me) I didn’t think I could participate. Later that year I attended a short workshop at PyConUK which was an introduction to data science using the Titanic data set. I played around with the data set and submitted my results, but after not knowing how to improve my score I gave up and set Kaggle aside until earlier this year.
Over a year ago I renewed my interest in applying ML to data at Atheon and found it very rewarding. I applied information retrieval and NLP techniques to create a solution for a problem that had plagued us for years. The results were interesting and useful enough to put into production. Since then I have been actively learning and adding new skills to apply Machine Learning to different data sets.
A few months ago I joined a MeetUp in London called Women in Kaggle, where data science practitioners and enthusiasts collaborate, play with data sets, and share their thinking and solutions. They meet once a month on a Wednesday evening, if you are in London and interested in data!
The competition they were looking at that time was the House Prices: Advanced Regression Techniques. Regression is a way of finding the relationships between different variables, and there are many techniques for modelling these relationships. In this competition we had 79 different variables or features of a house and needed to predict the Sale Price. Easy? Maybe. Fun? Definitely!
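To give a flavour of what a regression model looks like in practice, here is a minimal sketch using scikit-learn. The data here is synthetic stand-in data, not the actual competition set (which has 79 features, many of them categorical), and Ridge is just one of the many techniques you could try:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the competition data: a few numeric
# house features (the real data set has 79, many categorical).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))              # e.g. area, year built, ...
coefs = np.array([40.0, 15.0, -5.0, 8.0, 3.0])
y = 150.0 + X @ coefs + rng.normal(scale=10.0, size=500)  # "sale price"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = Ridge(alpha=1.0)                   # regularised linear regression
model.fit(X_train, y_train)
print(f"R^2 on held-out data: {model.score(X_test, y_test):.3f}")
```

The held-out split is important on Kaggle: it is how you estimate whether a change actually improves your model before spending one of your daily submissions.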
I got hooked and made over 100 submissions! And today I finally published my kernel on Kaggle and on GitHub. I got a public score of 0.11819, which ranked me at 230 out of 1603 submissions. I don’t think that’s too bad, as I learned a lot of new techniques, as well as refining the ones I already knew.
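To put that score in context: this competition evaluates submissions on the root-mean-squared error between the logarithms of the predicted and actual sale prices, so errors on cheap and expensive houses count equally. A small sketch of that metric (the `rmsle` helper and the sample prices are my own illustration, not from my kernel):

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root-mean-squared error between log prices (hypothetical helper
    mirroring the competition metric)."""
    return np.sqrt(np.mean((np.log(y_true) - np.log(y_pred)) ** 2))

# Over-predicting every house by 12.5% scores roughly 0.118,
# about where my submission landed:
actual = np.array([100_000.0, 200_000.0, 350_000.0])
predicted = actual * 1.125
print(rmsle(actual, predicted))  # ~0.1178
```

So a score of 0.11819 corresponds, very loosely, to typical predictions being within about 12% of the true price.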
What I loved most about this experience was the community at Kaggle and all the shared kernels. They helped me immensely, especially with feature engineering, which I had not spent much time on before but now have a newfound appreciation for. Thank you, Kagglers!