Faculty Director: Joshua Blumenstock

Joshua Blumenstock is an Assistant Professor at the U.C. Berkeley School of Information, the Director of the Data-Intensive Development Lab, and the co-Director of the Center for Effective Global Action. His research lies at the intersection of machine learning and development economics, and focuses on using novel data and methods to better understand the causes and consequences of global poverty. At Berkeley, Joshua teaches courses in machine learning and data-intensive development. He has a Ph.D. in Information Science and an M.A. in Economics from U.C. Berkeley, and Bachelor’s degrees in Computer Science and Physics from Wesleyan University. He is a recipient of the Intel Faculty Early Career Honor, a Gates Millennium Grand Challenge award, a Google Faculty Research Award, and the Chancellor’s Award for Public Service. His work has appeared in a variety of publications including Science, Nature, the American Economic Review, and the proceedings of KDD and AAAI.

The Team (at least, most of us)

Post-Doctoral Scholars

Jacqueline Mauro

Jacqueline’s research focuses on nonparametric causal methods motivated by real-world policy issues. These methods lean on developments in Machine Learning to create flexible yet robust estimates of causal effects. Jacqueline defended her dissertation for a PhD in Statistics, joint with Public Policy, at Carnegie Mellon University in July 2018. She studied under Edward Kennedy, developing nonparametric causal inference tools to learn about policies to reduce recidivism in Pennsylvania prisons.

Shekhar Mittal

Shekhar is a development economist with interests in issues of public economics and political economy in India. He aims to use large-scale government data sets, which have only recently begun to be collected, to better understand government capacity, and to combine such data sets with field interventions to address questions of first-order causal interest.

Xiao Hui Tai

Xiao Hui has a Ph.D. in Statistics and Data Science from Carnegie Mellon University. Her past research developed methods for comparing unstructured data, applied to forensics and cybercrime. At the lab, she is focused on developing statistical models to estimate the social and economic consequences of violent conflict.

Doctoral Students

Emily Aiken

Emily has an undergraduate degree in Computer Science from Harvard, with a secondary field in Global Health. She has previously done research on tracking disease outbreaks and identifying missing people in social media streams. At Berkeley, she is excited to work at the intersection of data science and areas of societal and environmental impact. She’s particularly interested in problems in which machine learning has the potential to transform our current understanding and policies.

Guanghua Chi

Guanghua’s research uses geospatial big data to understand the interaction between the activities of individuals and their geographic context. His areas of focus include social network analysis, human mobility, computational social science, and social media.

Personal website: www.guanghuachi.com

Niall Keleher

Niall’s research focuses on the intersection of development economics, ICTD, social network analysis, and new methods of data collection in these domains. Niall has 10 years of experience conducting randomized evaluations and primary data collection in developing countries. He holds a BA from the Johns Hopkins University in Economics and International Studies, and an MPA in International Development from the Harvard Kennedy School of Government.

Nitin Kohli

Nitin researches topics that span privacy, security, and fairness. Using techniques from game theory, cryptography, statistics, and machine learning, Nitin develops theory and tools that safeguard information, constructing statistical and algorithmic mechanisms with provable guarantees over the outcomes of their use.

Suraj Nair

Suraj has undergraduate and Master’s degrees in development studies from IIT Madras. He has spent the last several years working as a research manager at IFMR, overseeing the implementation and analysis of several large-scale randomized controlled trials of digital financial services. At Berkeley, he hopes to examine the impact of digital technology and information access on the structures of society and the economy, to ensure that digital technologies and services are designed and implemented effectively, in a manner that does not exacerbate existing inequality and exclusion.

Robert On

Robert has a background in EECS, Statistics and Development Economics and is currently a PhD candidate at the School of Information, Berkeley. He is interested in the application of information technology, causal inference, and machine learning towards poverty reduction with the motto: Try a lot, fail a lot, but measure everything.

Isabella Smythe

Isabella is a second-year PhD student in Columbia’s Sustainable Development program. She holds a B.S. in Computer Science and an M.S. in Earth Systems from Stanford and has a background in machine learning, geospatial analysis, and economics. She is interested in how novel data sources can be used to support sustainable global development, particularly in the areas of agriculture, food security, and land use.

Data Scientists

Shikhar Mehra

Shikhar is a Data Scientist with Master’s degrees in Computer Science (University at Buffalo) and Development Economics (University of San Francisco). He has previously worked for TechSoup, Amazon, Innovations for Poverty Action, and BRAC.

Raesetje Sefala

Raesetje is a Machine Learning Researcher who is currently a Computer Science Master’s student at Wits University, Johannesburg. Her research focuses on creating ground truth datasets and using machine learning to study spatial segregation in post-Apartheid South Africa. She is interested in building communities that aim to increase the capacity and quality of work of underrepresented groups in AI. Her main interests are using AI to solve problems experienced in developing countries, creating datasets for machine learning research, and the discussion, creation, and amendment of data privacy, ethics, and accountability policies.

Rachel Warren

Rachel is a data scientist and master’s student at the School of Information. She is interested in identifying areas where data-driven decisions exacerbate social inequality, as well as opportunities to use machine learning to measure and reduce disparities. In addition to her work with DIDL, she is building tools to help public defenders as a 2020 fellow with the Center for Society Technology and Policy. Prior to entering the I School, Rachel worked as a data scientist in the private sector, most recently at Salesforce. She holds a BA in Computer Science from Wesleyan University.

Fengyang Lin

Fengyang is currently a Master’s student in Statistics at Columbia University. She has a background in machine learning and statistical analysis, and is interested in computational social science.

Affiliates and Collaborators

Dan Bjorkegren
Michael Callen
Dave Donaldson
Tarek Ghani
Sham Kakade
Tyler McCormick
Jacob Shapiro
Xu Tan


Raza Khan
Robert On
Ofir Reich
Jiaxun Song
Niall Keleher