Mar 21, 2022

Ph.D. Alum Doris Lee Wants to Democratize Data Science Tools

Her start-up, Ponder, is leading the way with a $7M round of seed financing.

As an astronomy and physics undergraduate at UC Berkeley, Doris Lee, Ph.D. ’21, often found herself with large amounts of data and no easy way to discover the insights within.

“Coming from a non-CS background,” Lee said, “I realized that there was a lack of tools to make it easier for people to visualize and understand their data, especially for domain experts like astronomers or materials scientists, who want to quickly get to their insights.” The problem tugged at her. In the fall of her senior year, she returned from an astrophysics research internship and refocused her research, having realized that she was more interested in building the tools scientists need to address their research questions than in the questions themselves.

“I sort of stumbled upon the research I wound up concentrating on in grad school,” Lee said. 

Data scientists around the world might rejoice in this stumble: Lee is a newly minted UC Berkeley School of Information Ph.D., and co-founder and CEO of Ponder, a company spawned from her idea that there should be an easier way for scientists to interpret their data and perform analysis. 

Pondering pandas

Ponder, the start-up Lee launched with her dissertation advisor, School of Information Associate Professor Aditya Parameswaran (he’s company president), and recent Berkeley EECS Ph.D. alum Devin Petersohn (he’s CTO), aims to address the usability and scalability issues of working with data. Their efforts have not gone unnoticed: the company has received $7 million in seed financing from Lightspeed Venture Partners, along with Intel Capital, 8VC, and the House Fund.

Doris, Devin, and Aditya (Courtesy of Doris Lee)

For working with big data sets, Python is one of the most popular programming languages; it’s known for its simplicity (it uses an English-like syntax) and for having a large number of open-source libraries.

One of the most-used libraries for data manipulation and analysis is pandas, which offers an intuitive, flexible data structure and a rich set of operations for manipulating data in tabular form. For these reasons, pandas has been referred to as the most important tool in data science. But even though pandas is the tool of choice for many, it falls short in the areas of data visualization and scale. Lee and her teammates saw an opportunity: what if there were a way to improve the usability and scalability of dataframes? That’s where Ponder comes in.

On the usability side, there’s Lux. Lux is a library that automatically recommends visualizations showcasing interesting patterns and trends whenever users print their dataframes. “Think of it as similar to what Netflix does for its users,” Lee said. “If you want to watch a movie but aren’t sure which one, Netflix will make a recommendation based on your interests.”

Lux helps users sift through their data in a similar fashion. Users can then easily export and share the generated visualizations as HTML or as code. The library has over 3,600 GitHub stars and over 100,000 downloads; its users span a variety of industries and sectors.

The scalability side of Ponder is Modin. Modin is a Python library that speeds up the handling of large datasets through parallelization. “We wanted to help data scientists so they wouldn’t have to write a lot of code to be able to work with large amounts of data,” Lee said. “Modin is centered around the idea that you can change a single line of code and Modin will automatically handle the scaling for you.” Unlike many other parallel-processing tools, Modin does this by acting as a drop-in replacement for pandas, so users can enjoy a significant performance boost without rewriting any other part of their code. Like Lux, Modin has lots of users, with over 2.7 million downloads thus far.
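To make the "single line of code" idea concrete, here is a minimal sketch of what that change might look like. It assumes Modin is installed and falls back to plain pandas if it is not; the small table and its column names (`galaxy`, `flux`) are invented purely for illustration and do not come from Lee's work.

```python
# The one-line change: import Modin's pandas-compatible module under the
# usual `pd` alias. If Modin is not installed (an assumption of this sketch),
# fall back to ordinary pandas so the example still runs.
try:
    import modin.pandas as pd  # replaces `import pandas as pd`
except ImportError:
    import pandas as pd

# Hypothetical toy data; a real workload would be far larger.
df = pd.DataFrame(
    {
        "galaxy": ["A", "B", "A", "C"],
        "flux": [1.0, 2.5, 3.0, 0.5],
    }
)

# Everything below is ordinary pandas-style code and is left untouched.
mean_flux = df.groupby("galaxy")["flux"].mean()
print(mean_flux)
```

On a table this small there is nothing to gain from parallelization; the point of the sketch is only that the analysis code after the import stays the same.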

The philosophy behind both tools was to be able to meet users where they are. 

“Doris has done some incredibly important work in trying to make data science more fluid, accessible, and interactive,” Parameswaran said, “and has employed several human-centered design principles along the way.”

“As part of Ponder, we’re applying the same principles to improve the productivity of data scientists,” Parameswaran continued. “We don’t want to force users to change their behavior. Instead, we want them to continue using the tools they know and love, but just make the experience a lot more pleasant at scale.”

So far users have been very happy (see all those GitHub stars), and the initial traction from the open-source community started Lee and her collaborators thinking that maybe they had more than just a research question on their hands: maybe they had the perfect project for a start-up.  

The Berkeley Effect

Lee says that the I School has helped shape her thinking around working with end-users and building something to serve the community. 

“Our initial conversations with I School students and alumni have given us a good insight into the real-world problems faced by data scientists in industry. This reflects the holistic approach that I Schoolers take in solving problems — that the whole is greater than the sum of its parts.” 

“The I School has been so incredibly supportive the past few years,” Lee said and paused for a moment. “It speaks volumes about how students work together and how faculty work together.” 

And it’s a lot to process. Lee filed her dissertation, Designing Automated Assistants for Visual Data Exploration, in August of 2021. Ph.D. to CEO is a 180-degree turn, Lee said. She also finds herself one of the few female CEOs in the current enterprise data tooling space. “I think you can count the number of women on one hand,” she laughed. She’s keenly aware of how being young, female, and the CEO might raise eyebrows in some circles, but she remains resolute and ready for the challenge. For the moment, her main concerns are building the initial company team and maintaining and growing the open-source community.

Ponder officially launched in early March; she and her Berkeley partners have some busy days ahead, working to supercharge the productivity of customers spanning the AI, financial, and healthcare sectors.

One day you’re filing your dissertation, the next you’re running a company. It’s a lot to ponder. 

Last updated: April 14, 2022