Welcome to my portfolio! I enjoy many aspects of data science, and here you’ll find a recap of my academic and personal projects.
This project builds and tunes a multiclass classification model for data with class imbalance and correlated features. Random Forest and Logistic Regression models were trained on encoded, PCA-transformed data and used to predict the outcomes. Bayesian analysis was applied to evaluate the least important feature, providing insight into feature significance and model uncertainty.
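As a rough illustration of the approach described above, here is a minimal sketch of a scale, PCA, and Random Forest pipeline using scikit-learn. The synthetic dataset, the 95% variance cutoff, the `class_weight="balanced"` setting, and the balanced-accuracy metric are all assumptions made for the example; they are not taken from the repository.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the project data (assumption): three imbalanced
# classes, with redundant features to mimic feature correlation.
X, y = make_classification(
    n_samples=1500, n_features=20, n_informative=6, n_redundant=8,
    n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = make_pipeline(
    StandardScaler(),                             # PCA is sensitive to feature scale
    PCA(n_components=0.95, svd_solver="full"),    # keep 95% of the variance
    RandomForestClassifier(
        n_estimators=200,
        class_weight="balanced",                  # reweight the rare classes
        random_state=0,
    ),
)
model.fit(X_train, y_train)

# Balanced accuracy averages per-class recall, so the majority class
# cannot dominate the score.
score = balanced_accuracy_score(y_test, model.predict(X_test))
print(f"balanced accuracy: {score:.3f}")
```

Swapping `RandomForestClassifier` for `LogisticRegression` in the same pipeline would give a like-for-like comparison of the two model families.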
Repository URL: https://github.com/joshdscan/Technical-Interview-Multi-Class-Classification
This project addresses the high costs and low success rates associated with clinical trials. It develops a machine learning solution that automatically labels clinical trials by their termination reasons. Using a transformer model for multi-label classification, the project aims to reduce the manual curation burden and make it more efficient to identify viable clinical trial candidates.
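To make the multi-label framing concrete, the sketch below shows how a trial with several termination reasons can be encoded as a multi-hot vector, how per-label probabilities can be turned back into labels, and how predictions can be scored with micro-averaged F1. The label names, the 0.5 threshold, and the example probabilities are hypothetical; the transformer itself is not shown.

```python
# Hypothetical label set; the real project's categories may differ.
LABELS = ["insufficient_enrollment", "safety", "funding", "efficacy"]


def to_multi_hot(reasons, labels=LABELS):
    """Encode a set of termination reasons as a 0/1 vector, one slot per label."""
    return [1 if label in reasons else 0 for label in labels]


def decode(probabilities, labels=LABELS, threshold=0.5):
    """Keep every label whose independent probability clears the threshold."""
    return [label for label, p in zip(labels, probabilities) if p >= threshold]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Two example trials: the first was stopped for two reasons at once.
y_true = [to_multi_hot({"safety", "efficacy"}), to_multi_hot({"funding"})]

# Made-up per-label probabilities, standing in for a model's sigmoid outputs.
probabilities = [[0.1, 0.9, 0.2, 0.4], [0.7, 0.1, 0.8, 0.2]]
predicted_labels = [decode(row) for row in probabilities]
y_pred = [to_multi_hot(set(labels)) for labels in predicted_labels]

score = micro_f1(y_true, y_pred)
print(predicted_labels, round(score, 3))
```

Unlike multiclass classification, each label is decided independently, so a trial can receive zero, one, or several labels.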
Repository URL: https://github.com/joshdscan/CTWhyStopped/tree/main
This project uses machine learning to automate article selection, targeting articles likely to exceed 1,400 shares for increased profitability. Using a dataset of 40,000 articles with 61 features, including keyword counts and NLP metrics, it trains a Random Forest model that reaches 66% accuracy, compared with 58% for a K-Nearest Neighbors model.
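The comparison described above can be sketched as follows: articles are labeled by whether their share count exceeds the threshold, and both models are scored on the same held-out split. The data here is randomly generated, and the hyperparameters are assumptions, so the printed accuracies will not match the figures reported for the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SHARE_THRESHOLD = 1400  # articles above this count are the positive class

# Synthetic stand-in for the article features and share counts (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
shares = 1400 * np.exp(0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=2000))
y = (shares > SHARE_THRESHOLD).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # KNN is distance-based, so features are standardized first.
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
}

# Fit each model on the same training split and score it on the same test split.
accuracies = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, accuracy in accuracies.items():
    print(f"{name}: {accuracy:.3f}")
```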
Repository URL: https://github.com/joshdscan/Automating-Article-Selection
This project was designed for a potential Airbnb host in New York City. It uses a Tableau dashboard to guide strategic pricing decisions that maximize rental property value. To account for the city's market dynamics, the dashboard compares the property with other listings, suggests a balanced market price, and evaluates the impact of renovations.
This project addresses the rise in fraud complaints about AJOS Bank’s credit card transactions. Four machine learning models were tested on the bank’s data to identify the most effective method for detecting and preventing fraudulent transactions. The Random Forest model emerged as the most accurate, which helps reduce potential financial losses and improve customer trust.
Repository URL: https://github.com/joshdscan/fraud-detection
This project optimizes the logistics of spare parts distribution by minimizing total cost, which includes fixed costs for depots and transportation costs between suppliers, depots, and customers. Using Gurobi for mathematical modeling and Python for data handling, the model incorporates real-world constraints such as supplier weight limits and depot capacity to keep logistics operations efficient while meeting service level standards.
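The cost structure described above (fixed depot costs plus transportation costs, subject to depot capacity) can be illustrated on a toy instance. The sketch below does not use Gurobi: it brute-forces every customer-to-depot assignment in plain Python, which only works for tiny problems. All depot names, costs, capacities, and demands are made up, and the supplier stage and service level constraints are left out for brevity.

```python
from itertools import product

# Toy data (all values are illustrative assumptions).
fixed_cost = {"D1": 100, "D2": 80, "D3": 120}   # cost of opening each depot
capacity = {"D1": 60, "D2": 40, "D3": 100}      # units each depot can handle
demand = {"C1": 20, "C2": 30, "C3": 25, "C4": 15}
ship_cost = {                                    # cost per unit, depot -> customer
    ("D1", "C1"): 2, ("D1", "C2"): 4, ("D1", "C3"): 5, ("D1", "C4"): 3,
    ("D2", "C1"): 3, ("D2", "C2"): 1, ("D2", "C3"): 4, ("D2", "C4"): 2,
    ("D3", "C1"): 4, ("D3", "C2"): 3, ("D3", "C3"): 1, ("D3", "C4"): 5,
}


def plan_cost(plan):
    """Total cost of a plan: fixed cost of every depot in use plus shipping."""
    opened = set(plan.values())
    fixed = sum(fixed_cost[d] for d in opened)
    shipping = sum(ship_cost[d, c] * demand[c] for c, d in plan.items())
    return fixed + shipping


def solve():
    """Try every assignment of customers to depots and keep the cheapest feasible one."""
    depots, customers = list(fixed_cost), list(demand)
    best_cost, best_plan = float("inf"), None
    for assignment in product(depots, repeat=len(customers)):
        plan = dict(zip(customers, assignment))
        # Enforce depot capacity before costing the plan.
        load = {d: 0 for d in depots}
        for c, d in plan.items():
            load[d] += demand[c]
        if any(load[d] > capacity[d] for d in depots):
            continue
        cost = plan_cost(plan)
        if cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_cost, best_plan


best_cost, best_plan = solve()
print(best_cost, best_plan)
```

A solver such as Gurobi expresses the same trade-off with binary open/assign variables and linear constraints, which scales far beyond what enumeration can handle.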
Repository URL: https://github.com/joshdscan/Airplane-Spare-Parts-Logistics
Feel free to reach out to me on LinkedIn or via email at jd.scantlebury@hotmail.com. Looking forward to connecting with you!