Machine Learning - Executive Summary

Making Sense of Reddit Posts: Classifying Reddit Posts into Anime and Movies

Our objective was to categorize Reddit posts into two distinct groups: Anime and Movies. For this we chose a Random Forest classifier, which combines the votes of many decision trees. This makes it adept at discerning patterns and distinguishing between varied types of data.

We started with a basic version of the classifier to see how well it could predict. The initial results were encouraging, suggesting we were on the right track, yet they weren’t exceptional. The classifier could differentiate between Anime and Movies, but it wasn’t always accurate. Seeing room for improvement, we fine-tuned the classifier’s settings, much like tuning a musical instrument for the best sound. We adjusted how many decision-making paths it should take, how deep it should look into details, and how it groups information. These adjustments made a big difference.
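The summary does not name the library or the exact settings that were tuned, so the following is only a minimal sketch of what such a tuning step might look like. It assumes scikit-learn, uses synthetic data in place of the real post features, and maps "how many decision-making paths", "how deep" and "how it groups information" to n_estimators, max_depth and min_samples_leaf. That mapping, and every value in the grid, is an assumption made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for vectorized Reddit posts (assumption: not the project's data).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Baseline: a Random Forest with default settings.
baseline = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Hypothetical search grid; the original project's values are not known.
param_grid = {
    "n_estimators": [50, 100],     # number of trees
    "max_depth": [5, 10, None],    # how deep each tree may grow
    "min_samples_leaf": [1, 5],    # minimum samples grouped into a leaf
}

# Try every combination with 3-fold cross-validation and keep the best one.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

baseline_accuracy = baseline.score(X_test, y_test)
tuned_accuracy = search.best_estimator_.score(X_test, y_test)

print("best settings:", search.best_params_)
print("baseline accuracy:", round(baseline_accuracy, 3))
print("tuned accuracy:", round(tuned_accuracy, 3))
```

Comparing both models on the same held-out test set is what makes a "before and after tuning" claim meaningful. On toy data like this the gain may be small or absent.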

Post-tuning, the classifier became much more adept at correctly labeling posts as either Anime or Movies. The improvement was evident in its increased accuracy and fewer misclassifications.

Understanding Reddit Post Scores: Predicting Reddit Post Scores

In the third part of our project, we tried to predict how people engage with Reddit posts, using each post’s score, which is calculated by subtracting downvotes from upvotes. Our aim was to figure out what makes a post engaging to Reddit users. This task was challenging because a score can be ambiguous: for example, a post with a score of one could have a single upvote, or thousands of upvotes and almost as many downvotes.
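The ambiguity described above can be made concrete with a few lines of arithmetic. This sketch uses the score definition given in the text (upvotes minus downvotes). The two example posts and the function names are hypothetical.

```python
def reddit_score(upvotes: int, downvotes: int) -> int:
    """Score as defined in the text: upvotes minus downvotes."""
    return upvotes - downvotes


def total_votes(upvotes: int, downvotes: int) -> int:
    """Total engagement, which the score alone does not reveal."""
    return upvotes + downvotes


# Two hypothetical posts with very different levels of engagement.
quiet_post = {"upvotes": 1, "downvotes": 0}
divisive_post = {"upvotes": 5000, "downvotes": 4999}

for name, post in [("quiet", quiet_post), ("divisive", divisive_post)]:
    print(name, "score:", reddit_score(**post), "votes:", total_votes(**post))
```

Both posts end up with the same score, although one drew a single vote and the other nearly ten thousand. A model that only sees the score cannot tell them apart.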

We started with a simple method called Linear Regression to get a basic idea of how scores behave. But we soon realized we needed a more flexible approach to really understand Reddit’s scoring. So, we switched to a Decision Tree Regressor, which can capture non-linear relationships that a straight-line model misses. This model helped us see more clearly what affects a post’s score.
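To illustrate why a tree-based regressor can do better than a linear model on some data, here is a small sketch assuming scikit-learn and NumPy. The data is synthetic and deliberately built with a sudden jump in the target, so it favors the tree. The two features are hypothetical, and none of this reproduces the project's actual features or results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Two hypothetical numeric features per post (assumption: illustrative only).
X = rng.uniform(0, 10, size=(500, 2))

# Synthetic "score" with a step in the first feature, a mild trend in the
# second, and some noise. A straight line cannot follow the step.
y = np.where(X[:, 0] > 5, 20.0, 2.0) + 0.5 * X[:, 1] + rng.normal(0, 1, 500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

linear = LinearRegression().fit(X_train, y_train)
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_train, y_train)


def rmse(model) -> float:
    """Root mean squared error on the held-out test set."""
    return float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))


linear_rmse = rmse(linear)
tree_rmse = rmse(tree)

print("linear regression RMSE:", round(linear_rmse, 2))
print("decision tree RMSE:", round(tree_rmse, 2))
```

On this constructed example the tree has the lower error because it can split on the threshold. On real Reddit scores the comparison could come out differently, which is consistent with the limits noted for both models.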

However, we found that both models had their limits. The way Reddit calculates scores made it hard for our models to be really accurate. Sometimes the scores didn’t give a clear picture of how users felt about a post.

To wrap up, this part of the project showed us how tough it can be to guess the engagement of Reddit posts. It also taught us that choosing the right method and making adjustments to it is key, especially for data as complex as this. Our work here opens up more opportunities to explore social media data, helping us get better at predicting how people interact with content online.