I'm an Operations Research/Computer Science undergraduate at Cornell University. Operations Research (OR) is a discipline that pertains to obtaining optimal or near-optimal solutions for complex decision-making problems. It draws upon fields such as mathematical modeling and optimization, statistical analysis, computer science, and psychology.
As a Go player, I was inadvertently dragged into the world of machine learning and artificial intelligence by the shocking success of AlphaGo. I enjoy learning about the field and have done a couple of projects in my free time, including a few Kaggle-style competitions. I also write about ML on Quora.
I also love keeping up and tinkering with the latest technologies. I've recently spent some time playing around with cloud services, mainly architecting and building Big Data/ML solutions on AWS and GCP.
When I'm not studying or working, you can find me playing tennis, running, cooking, playing poker or Go, writing, reading, and occasionally, competing in a Spartan Race.
B.S. in Operations Research and Information Engineering • May 2019
Minor in Computer Science
Relevant Courses: Algorithms, Machine Learning, Data Mining, Stochastic Processes, Probability & Statistics, Optimization, Natural Language Processing, Linear Algebra, Differential Equations, Multivariable Calculus
Digital Innovators Program • Sep 2019 - Present
Data Scientist • June 2019 - Sep 2017
- Consulted for a Fortune 500 company to improve existing fraud detection models
- Leveraged PySpark and Apache Hive to analyze millions of credit card transactions (totaling over $14B) and perform feature engineering
- Implemented distributed version of RuleFit in PySpark to increase model interpretability
- Worked on graph-based algorithm for automatic detection of Points-of-Compromises (POCs)
Investment Banking Summer Analyst • June 2017 - Aug 2017
- Conducted preliminary M&A research for a $1.2B healthcare client
- Analyzed hundreds of equity reports, SEC filings, and press releases to identify underlying drivers of major stock movements for 20+ biotechnology companies
- Worked with senior management to prepare a valuation of Amazon using public company comparables and a sum-of-the-parts valuation; analysis showed the company was undervalued by 70% (massive upside from the AWS segment)
- Worked with the head of research to build linear regression models to predict companies’ market cap given the type and number of FDA/EMA designations received; the models showed that the FDA Breakthrough Therapy Designation (BTD) and EMA Priority Medicine (PRIME) were significant features
Languages: Python (scikit-learn, numpy, pandas, keras/tensorflow), Java, R, SQL
Machine Learning: Logistic/Linear Regression, Ensemble Models, Deep Learning
Cloud Computing: AWS Platform, Google Cloud Platform
Big Data: Hadoop, Spark
Other Tools: Web Scraping (BeautifulSoup, Selenium, HTML, CSS), Git, LaTeX
AWS Certified Developer (Associate)
Academic: Dean's List (2016, 2019)
Quora: 2018 Top Writer
Tennis: Cornell Men's Varsity Tennis (2015-2017), 2017 Ivy League Champions, Five-Star Recruit, NJ State Champion (2013)
Go: 5 Dan Player, Cornell Go Club, US National Go Championship Runner-Up (2008)
Spartan Races: 1 Spartan Sprint
Author: Cracking the Data Science Interview
I am a huge fan of Ed Thorp, so naturally I enjoy games that require a blend of skill and luck: blackjack, poker, trading, etc. After spending some time over the summer studying blackjack and card counting, I wondered whether a machine could learn to play blackjack optimally. Although I did not discover the Holy Grail of blackjack, I did learn a lot about Markov Decision Processes (MDPs), Q-learning, and approximating Q-values with neural networks.
Reinforcement Learning, AI, Markov Chains
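As a rough illustration of the Q-learning idea mentioned in the blackjack project (a minimal sketch, not the project's actual code), here is a tabular version on a deliberately simplified game. The assumptions are mine: an infinite deck, aces always counted as 1, no doubling or splitting, a dealer who hits below 17, and hypothetical names such as `q_update`, `play_episode`, and `train`. A neural network would replace the lookup table `q` when the state space becomes too large to enumerate.

```python
import random
from collections import defaultdict

ACTIONS = ("hit", "stand")
# Assumption: infinite deck, aces always count as 1, ten-valued cards are 4x as likely.
CARDS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=1.0):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # A terminal transition (next_state is None) has no future value.
    best_next = 0.0 if next_state is None else max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def play_episode(q, rng, epsilon=0.1):
    """Play one hand with an epsilon-greedy policy, updating q in place."""
    player = rng.choice(CARDS) + rng.choice(CARDS)
    dealer_up = rng.choice(CARDS)
    while True:
        state = (player, dealer_up)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        if action == "hit":
            player += rng.choice(CARDS)
            if player > 21:  # bust: the hand ends with a loss
                q_update(q, state, action, -1.0, None)
                return -1.0
            q_update(q, state, action, 0.0, (player, dealer_up))
        else:
            dealer = dealer_up
            while dealer < 17:  # simplified dealer rule
                dealer += rng.choice(CARDS)
            if dealer > 21 or player > dealer:
                reward = 1.0
            elif player == dealer:
                reward = 0.0
            else:
                reward = -1.0
            q_update(q, state, action, reward, None)
            return reward


def train(episodes=50_000, seed=0):
    """Learn a Q-table over states of the form (player_sum, dealer_upcard)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        play_episode(q, rng)
    return q


if __name__ == "__main__":
    q = train()
    for total in (12, 16, 20):
        state = (total, 10)
        print(state, {a: round(q[(state, a)], 2) for a in ACTIONS})
```

Running the script prints the learned action values for a few player totals against a dealer ten, which gives a quick sanity check on whether the table prefers standing on strong hands.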
This project was my entry in a competition hosted on CrowdAnalytix. It was my first data science competition, so I spent a lot of time learning and exploring different algorithms to get a feel for how each one worked. The goal of the competition was to predict how each point ended: winner, unforced error, or forced error. The model I settled on was XGBoost, a go-to algorithm among Kagglers.
Data Science, Machine Learning, Classification
As a trader, it is important to have "signals", or indicators, to determine whether a trade should be made. I was exploring the sentiment around various cryptocurrencies using metrics aggregated on Reddit Metrics, which tracks subreddit data such as subscriber growth, subscriber count, and milestones.
So I created this project to scrape data from Reddit Metrics. One script also uses CryptoCompare's API to obtain general social statistics from sites like Facebook, GitHub, and Twitter.
Cryptocurrencies, Web Scraping
I enjoy writing in my spare time. Writing has helped me improve at conveying and condensing ideas and has also helped me refine ideas that were previously unclear. I write about a wide range of topics, such as philosophy, machine learning, and financial markets.
I have over 2000 followers, over 2 million views, and was named a 2018 Top Writer. Check me out below!
Writing
This is an idea pioneered by Richard Feynman, the famous American theoretical physicist. As an undergrad, Feynman kept a notebook which he called the "NOTEBOOK OF THINGS I DON’T KNOW ABOUT". He was able to learn complex topics by continuously revising his notes, boiling difficult ideas down to simple ones, and trying to fill in the gaps in his knowledge.
So here is my own notebook of things that I will be constantly reorganizing. Topics so far include Statistics, Machine Learning, Algorithms, and Data Structures.
Notes, Writing
This cheatsheet is currently a 9-page data science reference that covers basic concepts in probability, statistics, statistical learning, machine learning, big data frameworks, and SQL.
The cheatsheet is loosely based on The Data Science Design Manual by Steven S. Skiena and An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani.
Inspired by William Chen's The Only Probability Cheatsheet You'll Ever Need, located here.
Data Science, Machine Learning
Since the number of terrorist attacks has increased dramatically over the past few decades, we were interested in figuring out which attributes of an attack would help predict casualties. Specifically, we wanted to predict the total number of injuries and deaths for various attacks. Not surprisingly, explosives, bombs, and dynamite were found to be among the most lethal types of weapons used. In addition, the type of institution targeted (military and religious) gave an indication of how devastating an attack would be. What surprised us was that East Asia emerged as the region where the most catastrophic events are most likely to occur.
Reflecting on the project, although we were only able to make use of 10% of our initial dataset due to the scarcity of terrorism data and incomplete information, we were able to create a respectable model to predict total casualties. More in-depth research in this area could help governments identify some of the most important sources of damage during attacks. Armed with such information, we hope that governments around the world take the necessary precautions to mitigate future damage and limit the loss of life.
Data Science, Machine Learning
Cracking the Data Science Interview is the first book that attempts to capture the essence of data science in a concise, compact, and clean manner. In a Cracking the Coding Interview style, Cracking the Data Science Interview first introduces the relevant concepts, then presents a series of interview questions to help you solidify your understanding and prepare you for your next interview.
Topics Include: Necessary Prerequisites (statistics, probability, linear algebra, and computer science), 18 Big Ideas in Data Science (such as Occam’s Razor, Overfitting, Bias/Variance Tradeoff, Cloud Computing, and Curse of Dimensionality), Data Wrangling (exploratory data analysis, feature engineering, data cleaning and visualization), Machine Learning Models (such as k-NN, random forests, boosting, neural networks, k-means clustering, PCA, and more), Reinforcement Learning (Q-Learning and Deep Q-Learning), Non-Machine Learning Tools (graph theory, ARIMA, linear programming), Case Studies (a look at what data science means at companies like Amazon and Uber)
Data Science, Machine Learning
A data-driven approach to organizing educational materials, helping you find the best-rated tutorials and videos for understanding a given subject.
Data Science, Machine Learning
If you have a project that you think I can help with, or just want to say hello, then feel free to get in touch! Cheers.