Data Science
Predictive Maintenance Exploration
Predictive Maintenance is the strategy of using Data Analytics to predict when equipment will need maintenance before it fails, reducing repair costs and downtime. I explored the field using a synthetic dataset, the AI4I 2020 Predictive Maintenance Dataset (real datasets for this problem are rare due to corporate secrecy), and built ML Models to predict failure.
The project covers the standard Data Analysis workflow, Exploratory Data Analysis, Decision Trees, AdaBoost Classifiers, Working with Unbalanced Data, ML Model Tuning, presenting findings to a large audience, and more.
I achieved a 0.76 F1 Score for Machine Failure and a 0.77 AUC-PR with a Tuned Random Forest Model, a strong result compared with other work on the Dataset. If I were to revisit this exploration, I'd try more computationally expensive models and look for real-world datasets, especially those with time series data.
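For readers unfamiliar with the two metrics, the following is a minimal sketch in plain Python of how F1 and AUC-PR (estimated here as average precision) can be computed. The labels, scores, and the 0.5 threshold are made up for illustration and are not taken from the AI4I dataset or the project notebook; the function names are hypothetical.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (failure) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """Estimate the area under the precision-recall curve.

    Ranks samples by score (highest first) and averages the precision
    measured at each true positive. Tied scores are not treated specially.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


# Toy, made-up data: 2 failures among 10 machines (an unbalanced problem).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.1, 0.3, 0.05, 0.6, 0.2, 0.15, 0.9, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
f1 = f1_score(y_true, y_pred)
ap = average_precision(y_true, scores)
print(f"accuracy={accuracy:.2f}  F1={f1:.2f}  AUC-PR={ap:.2f}")
```

In this toy case the accuracy matches what a model that never predicts failure would score, which is why F1 and AUC-PR are the more informative numbers on unbalanced data.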
My Jupyter Notebook and a Slide Deck of my Results are viewable below.
Twitch Chat Classifier
xQc's Twitch Chat
Northernlion's Twitch Chat
Twitch.tv is a live-streaming platform perhaps most notable for its Twitch Chat. Shaped by third-party plugins and years of culture found nowhere else, streamers' Twitch Chats are unique centers for community and interaction.
This project aimed to explore the potential for using Classification Models to predict which streamer's Twitch Chat a given message, no longer than 255 characters, belongs to. The project gradually increases the complexity of the problem and tackles the individual challenges along the way, illustrating the journey. Naive Bayes Classifiers, Network Visualizations, Data Mining and Scraping, Text Vectorization, and many more techniques are used.
With a Simple Naive Bayes Binary Classifier as a Proof of Concept, 95% Accuracy was obtained when classifying between chats of Low Cosine Similarity, and 70% Accuracy between chats of High Similarity. On the larger problem of Multi-class Classification across 50 Streamers, Naive Bayes gave 30% Accuracy when considering a single message, and 90% when considering 13 messages or more. Using Similarity Metrics, a Network Diagram was made to Visualize the Similarities and Communities that exist between chats on the platform.
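To illustrate why considering several messages at once helps, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. It is not the notebook's implementation: the class name, the whitespace tokenizer, the smoothing value, the streamer labels, and the training messages are all assumptions made up for the example. Messages assumed to come from the same chat are pooled by summing their log-likelihoods.

```python
import math
from collections import Counter, defaultdict


def tokenize(message):
    # Assumed tokenizer: lowercase and split on whitespace.
    return message.lower().split()


class NaiveBayesChatClassifier:
    """Multinomial Naive Bayes over bag-of-words counts (hypothetical sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing strength
        self.word_counts = defaultdict(Counter)  # streamer -> token counts
        self.message_counts = Counter()          # streamer -> number of messages
        self.vocabulary = set()

    def fit(self, messages, streamers):
        for message, streamer in zip(messages, streamers):
            tokens = tokenize(message)
            self.word_counts[streamer].update(tokens)
            self.message_counts[streamer] += 1
            self.vocabulary.update(tokens)
        return self

    def log_likelihood(self, message, streamer):
        counts = self.word_counts[streamer]
        total = sum(counts.values()) + self.alpha * len(self.vocabulary)
        return sum(
            math.log((counts[token] + self.alpha) / total)
            for token in tokenize(message)
        )

    def predict(self, messages):
        """Predict one streamer for a batch of messages from the same chat."""
        n_messages = sum(self.message_counts.values())
        best_streamer, best_score = None, -math.inf
        for streamer, count in self.message_counts.items():
            # Log prior plus the summed log-likelihood of every pooled message.
            score = math.log(count / n_messages) + sum(
                self.log_likelihood(m, streamer) for m in messages
            )
            if score > best_score:
                best_streamer, best_score = streamer, score
        return best_streamer


# Made-up training data for two hypothetical chats.
train_messages = [
    "LUL that was so bad", "LUL LUL clip it", "chat is this real",
    "+2 good joke", "-2 terrible pun", "+2 egg run",
]
train_streamers = ["streamer_a"] * 3 + ["streamer_b"] * 3
model = NaiveBayesChatClassifier().fit(train_messages, train_streamers)

single = model.predict(["so good"])                     # one ambiguous message
pooled = model.predict(["so good", "LUL", "clip it"])   # three from the same chat
print(single, pooled)
```

In this toy setup the single ambiguous message is attributed to `streamer_b`, while pooling it with two more messages from the same chat shifts the prediction to `streamer_a`, the same effect that drives the accuracy gain from multi-message classification.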
There is significant room for improvement following this proof of concept. Random sampling across multiple VODs would decrease bias, a larger similarity graph visualization would be very interesting, and more complex, computationally expensive ML models would likely yield a more accurate classifier.
The project's Jupyter Notebook is found below.
Data Visualization Examples
I have working experience with Bokeh, Matplotlib, and Tableau, and familiarity with most other data visualization packages. The following Data Visualization Examples come from Personal Projects and my work in Northeastern's IE6600 Computation and Visualization. Function has been prioritized over form in these examples, though more uniquely attractive visualization techniques are something I'm interested in exploring further. More examples are available upon request.
Classwork: Tableau - Patent Activity Map
Classwork: Tableau - Erie County Composite Score Map
Relevant Classes
I am projected to graduate from Northeastern University with a Master's Degree in Data Analytics Engineering in May 2025. My Graduate GPA, not including my final semester, is 3.61. In my graduate program, I've taken or am taking the following classes:
IE6400 Foundations of Data Analytics
EMGT5220 Engineering Project Management
IE6600 Computation and Visualization
IE6700 Data Management for Analytics
IE7275 Data Mining in Engineering
CS5800 Algorithms
DS5220 Supervised Machine Learning
DS5230 Unsupervised Machine Learning
My Undergraduate Degree from Northeastern University was in Mechanical Engineering, with a minor in Computer Science. Relevant classes from that degree include:
CS2500/2510 Fundamentals of Computer Science 1 and 2
CS3200 Database Design
CS3520 Programming in C++
My classwork has given me both a strong theoretical background and thorough, real-world project experience. Example Classwork is available upon request.