For the I School Community

Ph.D. Research Reception

Thursday, March 12, 2026
3:45 pm - 6:00 pm PDT

Join us as Ph.D. students from the School of Information share their innovative research.

The Ph.D. program at the School of Information draws doctoral students from a wide array of disciplines whose interests and approaches are as varied as their backgrounds. Though they all take technology as their object of study, our Ph.D. students approach the topic from many different angles — economic, political, social, legal, ethical — in an effort to understand the present impact and future development of information technology.




Schedule

Time	Session
3:45 pm	Opening Remarks (Coye Cheshire)
4:00 pm	What’s happening on Earth? Scaling Environmental Monitoring with Machine Learning (Ando Shah)
4:20 pm	Counting corsets: a computational study of fashion and gender in 19th century English fiction (Naitian Zhou)
4:40 pm	Is the Price Right? Credit Scoring and Digital Loans in Colombia (Simón Ramírez Amaya)
5:00 pm	Break
5:20 pm	Validating and Refining Generative AI Evaluations via Stakeholder Engagement (Tonya Nguyen)
5:40 pm	Epistemic Appeals in Scientific Claims on Reddit (Dan Hickey)
6:00 pm	Reception

Presentations

What’s happening on Earth? Scaling Environmental Monitoring with Machine Learning

Ando Shah

Keeping up with changes on the surface of the Earth is like keeping up with the news these days. But we have increasingly sophisticated tools at our disposal to deepen our understanding of these changes and trends, so we can respond in the best ways possible. Earth observation (EO) often faces a conundrum: too much data on one hand (satellite imagery, weather data, etc.) and too little on the other (what counts as ‘ground truth’?). I’ll delve into some of my Ph.D. projects that live at this intersection, from developing “any-sensor” foundation models for EO, to scaling sand mining detection and analysis, to using high-resolution simulation data to pinpoint major pollution sources.

Counting corsets: a computational study of fashion and gender in 19th century English fiction

Naitian Zhou

Fashion is a social language — a system of social meanings through which we can communicate identity. In this large-scale computational study of 19th century English novels, we try to understand how authors tell us about their characters through their clothes. We bring together sociolinguistics, literary theory, and natural language processing methods to study how fashion, fashion description, and the relationship between fashion and gender within fiction evolve over time.

Is the Price Right? Credit Scoring and Digital Loans in Colombia

Simón Ramírez Amaya

Over the past two decades, lenders in the developing world have increasingly come to rely on algorithmic credit scoring to assess and price credit risk. In principle, scoring technologies can improve market outcomes. However, there is limited evidence on whether modern, commercial-grade credit scoring systems actually deliver these gains in practice, or on how benefits and costs are distributed across consumers. This paper studies digital loans in Colombia, with a focus on the interaction between risk estimation and pricing and its implications for consumer welfare. I approach this question using a combination of theory and empirical analysis. On the theoretical side, I extend models of credit markets with asymmetric information to settings in which lenders have a limited ability to price heterogeneity across borrowers. On the empirical side, I leverage a large-scale, continuous field experiment conducted in collaboration with a local lender, which introduces randomized variation in the interest rates offered to consumers. I combine the experimental data with rich administrative records to understand how consumers respond to price changes and to evaluate the counterfactual implications of different scoring regimes. The experiment is ongoing (3+ months) and has generated more than three million observations to date. Preliminary results indicate substantial heterogeneity in both demand and costs along the score dimension.

Validating and Refining Generative AI Evaluations via Stakeholder Engagement

Tonya Nguyen

Generative AI systems are notoriously difficult to evaluate, in part because definitions of their capabilities, behaviors, and impacts can be contested across use cases, cultures, and languages. To address this, machine learning researchers have begun to draw on measurement theory from the social sciences to develop systematic frameworks for the measurement tasks involved in generative AI evaluation. In this tradition, the first step of tackling a measurement task is to precisely define or systematize the concept to be measured. Systematization creates an opportunity to include stakeholders — including those who will use or be impacted by a system — in conceptual debates about the proposed definitions and boundaries of a concept, ultimately leading to measurements that are more reflective of those stakeholders’ needs and values. In this paper, we explore how to validate and refine systematized concepts via stakeholder engagement. We situate our study in the context of measuring erasure, engaging stakeholders in validating and refining the systematized concept of erasure developed by Corvi et al. We conducted six workshops with 23 participants. We find that participants’ understandings of erasure are largely aligned with Corvi et al.’s systematized concept, but also surface examples and conceptual boundaries that it does not capture. We provide suggestions for how the systematized concept could be refined to better reflect stakeholder perspectives. We reflect on learnings from our study to derive recommendations for how to better engage stakeholders in the AI measurement process.

Epistemic Appeals in Scientific Claims on Reddit

Dan Hickey

Social media is increasingly used to learn about, disseminate, and debate scientific issues. However, research shows that many scientific issues are not simply accepted as settled truths; they are hotly contested on social media, often in ways that misrepresent the scientific consensus. While prior work indicates that anti-consensus communities often defer to expert authority, communities on either side of a scientific debate may still differ in which experts, institutions, and sources they deem trustworthy. In this work, we build a model to detect scientific claims on Reddit and introduce a taxonomy for categorizing epistemic appeals — the ways claims are anchored in external knowledge, authority, or expertise — in those claims. To understand the shifting landscape of science communication on social media, we apply our scientific claim detection model to a sample of 7 billion Reddit comments from 2005–2022, finding that the rate of scientific claims on the platform has declined, as has the rate of URLs accompanying scientific claims. At the source level, we find that links to Wikipedia have decreased over time in scientific claims, while links to scientific journals have become more frequent. Annotation of a sample of scientific claims reveals that while most claims are presented without evidence (66%), claims also frequently appeal to the authority of scientific studies and methods.

Last updated: March 19, 2026