DATA ANALYSIS

MOVIELENS DATA ANALYSIS

This data analytics project involved looking at data from MovieLens and comparing it with Wikipedia to identify which films were the most feminine and which were the most masculine. I used Python for the analysis, which turned up some interesting results. The full report can be found here.

FRUIT GENERATION USING GANs

Machine learning has countless uses, and I wanted to learn more about a type of model called a Generative Adversarial Network (GAN), which can generate entirely new data that resembles an existing dataset. A GAN trains two networks against each other: a generator that produces candidate samples and a discriminator that tries to tell them apart from real ones. This project involved generating new images of fruit based on existing fruit images. Sadly, the code itself cannot be shared, but you can watch the presentation explaining the project here.

MLB POPULARITY ANALYSIS

Baseball is a huge deal in the U.S., and I wanted to see which team is the most popular. For this analysis, we used the MLB dataset found on Kaggle and looked at which team won the most games, as well as which stadiums drew the largest crowds. Our complete analysis can be found here.
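The two popularity proxies described above (most wins, largest crowds) boil down to a count and a grouped average. The sketch below shows that idea in plain Python; it is not the project's code, and the toy records, team and stadium names, and field names (`winner`, `stadium`, `attendance`) are hypothetical stand-ins rather than the Kaggle dataset's actual schema.

```python
from collections import Counter, defaultdict

# Hypothetical toy records for illustration only; the field names
# ("winner", "stadium", "attendance") are assumptions, not the Kaggle schema.
GAMES = [
    {"home": "Team A", "away": "Team B", "winner": "Team A", "stadium": "Park 1", "attendance": 41000},
    {"home": "Team B", "away": "Team C", "winner": "Team C", "stadium": "Park 2", "attendance": 28000},
    {"home": "Team C", "away": "Team A", "winner": "Team A", "stadium": "Park 3", "attendance": 35000},
    {"home": "Team A", "away": "Team C", "winner": "Team A", "stadium": "Park 1", "attendance": 43000},
]


def wins_by_team(records):
    """Count how many games each team won."""
    return Counter(r["winner"] for r in records)


def average_attendance(records):
    """Average attendance per stadium, sorted from largest to smallest crowd."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for r in records:
        totals[r["stadium"]] += r["attendance"]
        counts[r["stadium"]] += 1
    averages = {stadium: totals[stadium] / counts[stadium] for stadium in totals}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


print(wins_by_team(GAMES).most_common(1))  # [('Team A', 3)]
print(average_attendance(GAMES))  # [('Park 1', 42000.0), ('Park 3', 35000.0), ('Park 2', 28000.0)]
```

Averaging attendance per stadium, rather than summing it, keeps a venue that simply hosted more games in the sample from looking more popular than it is.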

TODAY’S TOP HITS

This project analyzes how music profiles have changed over the decades and how the popularity of different profiles has shifted over time. It also includes a prediction section: you input a music profile, and the program outputs which era your music taste comes from and how popular that profile would be today.
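The page does not say which model powers the era prediction, so the following is only one simple way such a step could work, not the project's implementation: a nearest-centroid classifier that averages the feature vectors for each era and assigns a new profile to the closest average. The three features (danceability, energy, acousticness), their values, and the era labels are all hypothetical.

```python
import math

# Hypothetical training rows: (era label, feature vector).
# The features assumed here are danceability, energy, acousticness, each in [0, 1].
TRAINING = [
    ("1970s", [0.50, 0.45, 0.60]),
    ("1970s", [0.55, 0.50, 0.55]),
    ("1990s", [0.65, 0.70, 0.25]),
    ("1990s", [0.60, 0.75, 0.20]),
    ("2010s", [0.75, 0.80, 0.10]),
    ("2010s", [0.80, 0.70, 0.15]),
]


def centroids(rows):
    """Compute the average feature vector (centroid) for each era label."""
    sums, counts = {}, {}
    for label, features in rows:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict_era(profile, centers):
    """Return the era whose centroid is closest to the profile (Euclidean distance)."""
    return min(centers, key=lambda label: math.dist(profile, centers[label]))


centers = centroids(TRAINING)
print(predict_era([0.78, 0.76, 0.12], centers))  # 2010s
```

A nearest-centroid rule is easy to explain to users ("your taste is closest to the average 2010s song"), though a real model would likely need more features and scaled inputs.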

IPL PREDICTION MODEL

I trained a machine learning model that uses Naive Bayes to predict the result of an IPL match from features such as the two teams, the venue, the toss winner, and the toss decision. I found the training dataset on Kaggle and built the model with scikit-learn. The average accuracy I was able to attain was 57%. That is admittedly low, but accurate sports prediction is very difficult because so many factors affect the result, and relatively little IPL data exists since the league only began in 2008. Given that limited data, and compared with the roughly 50% expected from guessing between the two teams, 57% is a reasonable result.
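The sketch below is not the project's model; it is a from-scratch categorical Naive Bayes in plain Python, meant to show how the classifier combines match features. Each outcome is scored as log P(outcome) plus the sum of log P(feature value | outcome), with add-one (Laplace) smoothing so a value never seen with a class does not zero it out. The toy rows, team and venue names, and the "team1"/"team2" label encoding are all hypothetical. In scikit-learn, the closest built-in estimator for purely categorical inputs is `CategoricalNB`, which expects each feature integer-encoded (for example with `OrdinalEncoder`) and applies the same style of smoothing through its `alpha` parameter.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy matches: (team1, team2, venue, toss_winner, toss_decision).
# Labels say which side won; these are not rows from the Kaggle IPL dataset.
X = [
    ("Team A", "Team B", "Venue 1", "Team A", "bat"),
    ("Team A", "Team C", "Venue 1", "Team A", "field"),
    ("Team B", "Team C", "Venue 2", "Team C", "field"),
    ("Team C", "Team A", "Venue 2", "Team A", "bat"),
    ("Team B", "Team A", "Venue 1", "Team B", "field"),
    ("Team A", "Team B", "Venue 2", "Team B", "bat"),
]
Y = ["team1", "team1", "team2", "team2", "team2", "team1"]


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n_samples = len(labels)
        n_features = len(rows[0])
        # counts[j][c][v]: how often feature j took value v in rows of class c
        self.counts = [defaultdict(Counter) for _ in range(n_features)]
        # distinct values seen per feature, used in the smoothing denominator
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.counts[j][label][value] += 1
                self.values[j].add(value)
        return self

    def log_score(self, row, label):
        """log P(label) + sum over features of log P(value | label)."""
        score = math.log(self.class_counts[label] / self.n_samples)
        for j, value in enumerate(row):
            numerator = self.counts[j][label][value] + self.alpha
            denominator = self.class_counts[label] + self.alpha * len(self.values[j])
            score += math.log(numerator / denominator)
        return score

    def predict(self, row):
        """Pick the class with the highest log score."""
        return max(self.class_counts, key=lambda label: self.log_score(row, label))


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class matches the true label."""
    correct = sum(model.predict(row) == label for row, label in zip(rows, labels))
    return correct / len(labels)


model = CategoricalNaiveBayes().fit(X, Y)
print(model.predict(("Team A", "Team C", "Venue 1", "Team A", "bat")))  # team1
```

An "average" accuracy like the one quoted above would normally come from evaluating on held-out matches across several train/test splits (cross-validation), not from scoring the training rows.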
