I am passionate about data science. I love reading about it, blogging about it, and I have a history of using data-driven techniques to solve problems that seemed intractable. Given what I’ve accomplished so far with a relatively limited toolset, I can’t wait to see what kinds of problems I will be able to solve with the skills I learn in the Master of Information and Data Science program at Berkeley.
I earned my B.A. in Mathematics at Pomona College, but at first I didn't realize how much I enjoyed problem solving that combines mathematics with computer programming. It wasn’t until my final year that the pieces came together and the seeds for my lifelong love of data science were sown. I found my calling in classes such as Investigational Statistics, Mathematical Modeling, Fundamental Concepts in Math (surprisingly similar to programming), and Probability and Its Applications.
Even after becoming a professional software developer, I hadn’t realized the magic of combining programming with math until my former professor Art Benjamin challenged me with a proof he was working on for his upcoming book “Proofs that Really Count: The Art of Combinatorial Proof.” I didn’t think I could possibly solve such a difficult problem, but he was confident that I could bring a unique approach to it. He turned out to be right: I wrote a custom computer program that used a kind of Monte Carlo approach while allowing the user to nudge the program in the right direction. Before long, I had surprised myself by conquering four problems and was even mentioned in his book.
After this, I was part of a two-person team that developed the winning entry in the international 2007 AAAI Computer Poker Competition. In an article in the San Bernardino County Sun, Michael Bowling, of the University of Alberta’s Computing Science Department, stated “they are going up against top-notch universities that are doing cutting-edge research in this area, so it was very impressive that they were not only competitive, but they won.”
Following 11 years as a software developer, I better aligned my career with my true interests by joining the Analytics team at Oversee.net. Early on, I overheard a couple of co-workers discussing the statistics they used when running randomized A/B experiments and asked them to explain their method to me. Something didn’t sound right, and after digging into it, I found that their use of statistics was leading them to false and premature conclusions. Oversee adopted the testing approach I recommended, and I became the manager of the testing pipeline.
The problem I discovered with their approach (which other companies still use to this day) was that it treated metrics derived from web visitor statistics, such as clicks per load, as if they came from coin flips. Unlike the ratio of heads to tails, clicks per load can fluctuate wildly from day to day, thanks to bots that occasionally blast webpages with activity. The statistical methods used to determine significance must reflect that volatility in the error bars, or they may lead to unjustified confidence in the results. With the standard coin-flip approach, the error bars never grow, no matter how wildly the metric varies: the uncertainty of the ratio is determined entirely by how close it is to 50% (the ratio with the maximum standard deviation) and by how many data points there are. My recommended fix was to bucket clicks per load by day and use the variance of those daily values to determine significance.
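To make the difference concrete, here is a minimal sketch in Python. It is not the code used at Oversee: the function names and the daily numbers are invented for illustration, with one day standing in for a bot burst. It compares the coin-flip (binomial) standard error of the pooled ratio with a standard error computed from the day-to-day spread of the daily ratios.

```python
import math
import statistics


def binomial_se(clicks, loads):
    """Coin-flip standard error: depends only on the pooled ratio and the count."""
    p = clicks / loads
    return math.sqrt(p * (1 - p) / loads)


def daily_bucket_se(daily_clicks, daily_loads):
    """Standard error of the mean daily ratio, from the observed day-to-day spread."""
    ratios = [c / l for c, l in zip(daily_clicks, daily_loads)]
    return statistics.stdev(ratios) / math.sqrt(len(ratios))


# Hypothetical week of traffic (made-up numbers); day 5 is a burst of bot
# loads that produces very few additional clicks.
daily_loads = [10_000, 10_000, 10_000, 10_000, 50_000, 10_000, 10_000]
daily_clicks = [500, 520, 480, 510, 600, 490, 505]

coin_flip = binomial_se(sum(daily_clicks), sum(daily_loads))
bucketed = daily_bucket_se(daily_clicks, daily_loads)

print(f"coin-flip standard error:    {coin_flip:.5f}")
print(f"daily-bucket standard error: {bucketed:.5f}")
```

With these made-up numbers, the daily-bucket error bar comes out several times wider than the coin-flip one, because only the former sees the day-to-day volatility. The coin-flip figure would be identical for any week with the same total clicks and loads, however they were spread across days.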
The field of data science is exploding right now and will undoubtedly lead to many incredible discoveries. I'm just excited to be going along for the ride and will enjoy learning as much as I can.