In this Kaggle.com project, I wanted to predict the probability that a request posted to Reddit’s Random Acts of Pizza group would result in a free pizza, based on the group’s history of requests. This project was one of my first introductions to analyzing text for machine learning. If I were to do this again, I would probably try integrating NLTK’s collection of stopwords and root words (stemming) to improve the code, and I would try other machine learning techniques besides decision trees.
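To give a rough idea of what stopword removal and reduction to root words would look like, here is a minimal sketch using only the standard library. The tiny `STOPWORDS` set and the `crude_stem` helper are hypothetical stand-ins that I made up for illustration; in practice NLTK's stopword corpus and a real stemmer such as its `PorterStemmer` would take their place. This is not the code from the original project.

```python
import re

# Hypothetical, tiny stand-in for NLTK's English stopword list.
STOPWORDS = {"a", "an", "and", "the", "i", "my", "for", "to", "of", "is", "am"}

# Naive suffix list; a stand-in for a real stemmer, not a faithful one.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, then reduce words to crude roots."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


print(preprocess("I am asking for a pizza for my kids"))
```

The resulting tokens could then be counted per request and fed to a decision tree or any other classifier in place of the raw words.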
An important item to note is that Kaggle, for whatever reason, chose to leave a column in the test dataset that indicates whether a person obtained pizza. Some people chose to use this column for training and testing, which, while 100% accurate, isn’t exactly in the spirit of the competition: the goal is to predict whether someone will get pizza, not to report an outcome we already know. I did not use this column in my studies.
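One simple way to guard against using that column by accident is to strip it from the records before any features are built. The sketch below assumes the data is loaded as a list of dictionaries and that the outcome field is named `requester_received_pizza`; the field names and the sample records are assumptions for illustration, not a copy of the real dataset.

```python
def drop_leaky_fields(records, leaky_fields=("requester_received_pizza",)):
    """Return copies of the records without the outcome fields."""
    return [
        {key: value for key, value in record.items() if key not in leaky_fields}
        for record in records
    ]


# Made-up records; the field names are assumed, not taken from the real files.
test_records = [
    {"request_id": "t3_a", "request_text": "Broke until payday",
     "requester_received_pizza": True},
    {"request_id": "t3_b", "request_text": "Pizza for my study group",
     "requester_received_pizza": False},
]

features_only = drop_leaky_fields(test_records)
print(sorted(features_only[0].keys()))
```

Because the function returns new dictionaries, the original records are left untouched, so the outcome is still available later for scoring predictions.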