Dungeons and Dragons 5th Edition Character Sheet R Shiny Dashboard: Character sheet analysis in an R Shiny Dashboard environment
A dashboard for exploring 5th Edition character sheet attributes and statistics. Race and class backgrounds are applied from user input, and interactive graphs and tables are built with several packages, including Shiny, ShinyDashboard, Highcharter, Plotly, and Hmisc.
View dashboard here
Modeling Parameter Differences in Professional Tennis Rally Distributions: Determining the optimal likelihood for professional tennis rally distributions and modeling player differences across serve and surface
The length of a professional tennis rally depends on many factors, such as the quality of the serve or the court surface. We construct parametric likelihoods to fit the rally length distributions that arise under these factors. To model these distributions, we explore modified Geometric distributions that include point masses at various rally lengths and compare them using the Watanabe-Akaike information criterion. These models use up to 7 parameters: point masses at short rally lengths plus the modified Geometric parameter for longer rallies. By comparing these likelihoods, we assess the point at which adding further point masses no longer improves the model's overall fit and rally length begins to resemble a Geometric distribution. From this we show that rally length distributions are similar across surfaces and that first-serve rallies require two more balls in play than second-serve rallies before they resemble a Geometric distribution.
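As a rough illustration of the kind of likelihood described above, the sketch below puts explicit point masses on the shortest rally lengths and a Geometric tail on everything longer. The parameterization, the function names, and the example values are assumptions made for illustration and are not taken from the project itself.

```python
import math


def modified_geometric_pmf(x, point_masses, theta):
    """P(X = x) for a rally of length x >= 1.

    point_masses[i] is the probability of a rally of length i + 1 (assumed
    parameterization). Lengths beyond len(point_masses) share the remaining
    probability through a Geometric tail with success probability theta.
    """
    k = len(point_masses)
    if x < 1:
        return 0.0
    if x <= k:
        return point_masses[x - 1]
    tail_mass = 1.0 - sum(point_masses)
    return tail_mass * theta * (1.0 - theta) ** (x - k - 1)


def log_likelihood(rally_lengths, point_masses, theta):
    """Sum of log-probabilities over the observed rally lengths."""
    return sum(
        math.log(modified_geometric_pmf(x, point_masses, theta))
        for x in rally_lengths
    )


# Hypothetical rally lengths, two point masses, and one tail parameter.
example_rallies = [1, 1, 2, 3, 5, 8, 4, 1, 2, 6]
print(log_likelihood(example_rallies, point_masses=[0.3, 0.2], theta=0.4))
```

With an empty list of point masses the model reduces to an ordinary Geometric distribution, which is the baseline that the richer models are compared against.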
View most recent presentation here
WW1 Project: Machine Learning Methods to Link WW1 Death Data to US Census Records
Matches World War 1 death data to US census records and scraped genealogical data. The project builds features from the available data; trains, tunes, and cross-validates multiple machine learning models, such as XGBoost and Random Forest; and then uses a stacked model to predict matches with 96% accuracy. The census data comes from public US census records, and the scraped data comes from FamilySearch.org.
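To give a sense of what feature building for record matching can look like, here is a minimal sketch that turns one candidate pair of records into numeric features. The field names (`first_name`, `last_name`, `birth_year`, `state`) and the choice of similarity measure are hypothetical and are not drawn from the project's code.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity in [0, 1] between two names, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def pair_features(death_record, census_record):
    """Build features for one (death record, census record) candidate pair.

    Both records are plain dicts with hypothetical keys.
    """
    return {
        "first_name_sim": name_similarity(
            death_record["first_name"], census_record["first_name"]
        ),
        "last_name_sim": name_similarity(
            death_record["last_name"], census_record["last_name"]
        ),
        "birth_year_diff": abs(
            death_record["birth_year"] - census_record["birth_year"]
        ),
        "same_state": int(death_record["state"] == census_record["state"]),
    }


# Made-up records for illustration only.
death = {"first_name": "John", "last_name": "Smith", "birth_year": 1895, "state": "UT"}
census = {"first_name": "Jon", "last_name": "Smith", "birth_year": 1896, "state": "UT"}
print(pair_features(death, census))
```

Feature rows like these, one per candidate pair, are the kind of input a classifier such as a Random Forest could be trained on to label pairs as match or non-match.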
View code here
Collegiate Basketball Strength of Schedule Analysis: Understanding the Effects of Distance Traveled and Time Rested on College Basketball Performance
An analysis of how the distance traveled by an away team and the time rested between consecutive games affect performance metrics for visiting college basketball teams. Metrics measured include Point Margin, Field Goal Percentage, Turnover Ratio, Personal Fouls, and Win/Loss. Data was scraped from Sports Reference, and a Google Maps API was used to obtain distances between college basketball stadiums. Models used in this analysis include Random Forests, XGBoost, Ridge/Lasso, and Logistic Regression.
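The project obtained stadium distances from a Google Maps API. As a simple stand-in that needs no API key, the sketch below approximates the distance between two stadiums with the great-circle (haversine) formula. This is a swapped-in technique, not the project's method, and the coordinates shown are approximate values chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Approximate, illustrative coordinates for two arenas.
home_arena = (40.2540, -111.6490)
away_arena = (35.9970, -78.9420)
print(round(haversine_km(*home_arena, *away_arena), 1))
```

A straight-line distance like this understates actual travel distance, so it is only a rough proxy for a routed distance from a mapping service.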
View presentation here