Machine Learning for the Developing World using Mobile Communication Metadata
A report on Ph.D. dissertation research
Researchers working on problems associated with the developed world generally have access to rich and diverse datasets such as social media activity and sensor data. The same is not true of the developing world, where access to comprehensive datasets is one of the most significant obstacles to research. Social networks and digital sensors are not yet common in the developing world, with one big exception: mobile phones.
More than 95% of the world’s population today has mobile phone coverage, and even in some of the most under-developed places on earth, the penetration of mobile phones is much higher than other measures of human development such as literacy or access to financial infrastructure. As a result, researchers have increasingly been using the metadata collected by mobile phone companies in these developing countries as an alternative to more conventional data sources. However, mobile phone data in its raw form is often poorly suited to machine learning algorithms. In other words, there is a need for algorithms that convert raw mobile communication metadata into features that machine learning algorithms can use.
In this talk, I will describe my work on extracting features from mobile communication logs using techniques like Deterministic Finite Automata (DFA). I will also show how this approach outperforms other methods for problems like product adoption. I will further show that by combining DFA-based features with spectral analysis of the multi-view nature of mobile communication networks, advanced neural network algorithms can be developed that beat the current state-of-the-art methods for problems such as poverty prediction and gender prediction. In the last part of the talk, I will describe the value of communication network data for research questions in social network analysis, such as: what are the salient differences between the behavioral patterns of men and women in the developing world, as exhibited in communication network data?
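To give a rough sense of what DFA-based feature extraction from communication logs can look like, here is a minimal Python sketch. It is an illustration only and not the method from the dissertation, whose details are not given in this abstract. The record layout `(caller, callee, timestamp)`, the two-symbol alphabet `"out"`/`"in"`, the state names, and the "returned call" pattern are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical call-detail records: (caller, callee, timestamp).
records = [
    ("A", "B", 1),
    ("B", "A", 2),
    ("A", "C", 3),
    ("A", "B", 4),
    ("A", "B", 5),
    ("B", "A", 6),
]

# Hypothetical DFA over the alphabet {"out", "in"}, seen from one subscriber
# (the "ego"). It enters the accepting state "returned" each time an outgoing
# call to a contact is followed by an incoming call from that same contact.
TRANSITIONS = {
    ("start", "out"): "called",
    ("start", "in"): "start",
    ("called", "out"): "called",
    ("called", "in"): "returned",
    ("returned", "out"): "called",
    ("returned", "in"): "start",
}
ACCEPTING = {"returned"}


def count_accepts(symbols):
    """Run the DFA over a symbol sequence and count visits to accepting states."""
    state = "start"
    hits = 0
    for symbol in symbols:
        state = TRANSITIONS[(state, symbol)]
        if state in ACCEPTING:
            hits += 1
    return hits


def returned_call_features(records, ego):
    """For each contact of `ego`, count how often an outgoing call was returned."""
    per_contact = defaultdict(list)
    # Process events in time order so each contact's sequence is chronological.
    for caller, callee, _ts in sorted(records, key=lambda r: r[2]):
        if caller == ego:
            per_contact[callee].append("out")
        elif callee == ego:
            per_contact[caller].append("in")
    return {contact: count_accepts(seq) for contact, seq in per_contact.items()}


features = returned_call_features(records, "A")
print(features)
```

Each such automaton yields one numeric feature per subscriber-contact pair; in this sketch, different transition tables would encode different behavioral patterns, and the resulting counts could be aggregated per subscriber before being passed to a learning algorithm.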
Muhammed Raza Khan recently completed his Ph.D. at the School of Information where, as a member of the Data Intensive Development Lab, he worked on problems related to machine learning for social good. The insights resulting from his work on feature generation using mobile communication metadata have been used by the International Finance Corporation (a subsidiary of the World Bank) to improve financial inclusion in countries like Ghana and Zambia. In Ghana, this approach improved the targeting of customers for mobile money products by a margin of 30% compared to existing methods. Raza was also one of the grantees of the UN Big Data for Gender Challenge (Data2X). Raza's work has been published in venues like the ACM SIG Knowledge Discovery and Data Mining and the Association for the Advancement of Artificial Intelligence. Prior to UC Berkeley, Raza completed his master’s in computer science at Georgia Tech as a Fulbright Scholar.