MIDS Capstone Project Summer 2023

Detecting Emerging Crises from Tweets

Team members

Problem and Motivation

Humanitarian organizations need actionable information to launch effective relief efforts in disaster areas. Social media platforms such as Twitter have become an important information source for crisis management. Humanitarian organizations can use tweets to identify emerging disasters, but they struggle to sift through the large volume of data. Our application aims to summarize critical information promptly, helping these organizations mitigate crisis situations.

Data Source and Data Science Approach

For this project, we use multi-class and multi-label classification models to:

Identify disaster tweets by disaster event;
Identify disaster tweets by disaster type (e.g. hurricane, flood, etc.);
Categorize the type of disaster information (e.g. news, emerging threat); and
Categorize tweets by priority level.
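A minimal multi-class sketch in scikit-learn, where each tweet receives exactly one disaster-type label, might look like the following. The TF-IDF features, the pipeline, and the toy tweets are illustrative assumptions, not the project's actual preprocessing or data.

```python
# Multi-class sketch: each tweet gets exactly ONE disaster-type label.
# Assumptions (not from the project): TF-IDF features and made-up toy tweets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy training data.
tweets = [
    "hurricane winds tearing roofs off houses",
    "hurricane landfall expected tonight, evacuate the coast",
    "river flood water rising in downtown streets",
    "flash flood warning, roads under water",
    "wildfire smoke and flames near the highway",
    "wildfire spreading fast, fire crews on scene",
]
disaster_type = ["hurricane", "hurricane", "flood", "flood", "wildfire", "wildfire"]

# TF-IDF turns text into sparse term weights; LinearSVC learns one linear
# decision function per class (one-vs-rest) over those weights.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(tweets, disaster_type)

pred = clf.predict(["flood water rising downtown", "wildfire flames spreading"])
print(list(pred))
```

The same pipeline shape applies to the other single-label tasks in the list (disaster event, priority level) by swapping in a different label column.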

We use a dataset from TREC-IS (the TREC Incident Streams track), which combines multiple Twitter datasets collected during past wildfires, earthquakes, floods, hurricanes, bombings, and shootings. Each tweet is assigned a 'priority' label that indicates how critical the information in that tweet is. In addition, TREC-IS includes 25 information type labels manually assigned by human annotators, such as whether a tweet 'contains location' or is a 'search and rescue request'. In total, the TREC-IS collection we use contains manually annotated tweet streams for 42 emergency events, comprising 40,459 labelled tweets.
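Because a single tweet can carry several information types at once (for example, a search and rescue request that also contains a location), information-type classification is a multi-label problem. One common way to set this up in scikit-learn is sketched here: `MultiLabelBinarizer` turns each tweet's label list into a 0/1 indicator row, and `OneVsRestClassifier` fits one binary LinearSVC per label. The toy tweets, the three label names, and the TF-IDF features are illustrative assumptions; the real TREC-IS data has 25 information types.

```python
# Multi-label sketch: each tweet may carry SEVERAL information-type labels.
# Assumptions (not from the project): TF-IDF features, toy tweets, 3 of the 25 labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

tweets = [
    "people trapped under rubble on main street, send rescue",
    "family trapped on roof at 5th avenue, need rescue now",
    "news report: earthquake of magnitude 6 hits the region",
    "breaking news: officials confirm flood damage",
    "bridge closed on main street due to flooding",
    "rescue teams needed, people trapped in collapsed building",
]
info_types = [
    ["search and rescue request", "contains location"],
    ["search and rescue request", "contains location"],
    ["news"],
    ["news"],
    ["contains location"],
    ["search and rescue request"],
]

# Encode label lists as a binary indicator matrix: one column per label
# (columns follow mlb.classes_, which is sorted alphabetically).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(info_types)  # shape (n_tweets, n_labels)

vec = TfidfVectorizer()
X = vec.fit_transform(tweets)

# One independent binary LinearSVC per label column.
clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)

new = vec.transform(["people trapped on main street, need rescue"])
print(mlb.classes_)
print(mlb.inverse_transform(clf.predict(new)))  # tuple of predicted labels
```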

Evaluation

We evaluate our classification models using accuracy scores, classification reports (per-class precision, recall, and F1-score), and confusion matrices. Some of our model results are summarized below:
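All three of these evaluation tools are available in `sklearn.metrics`. The snippet below is a minimal sketch using made-up priority predictions; the priority level names (low, medium, high, critical) and the values are hypothetical and are not the project's results.

```python
# Evaluation sketch with hypothetical priority labels (not real project output).
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

labels = ["low", "medium", "high", "critical"]
y_true = ["low", "low", "medium", "high", "critical", "medium", "low", "high"]
y_pred = ["low", "medium", "medium", "high", "high", "medium", "low", "high"]

# Fraction of tweets whose predicted priority matches the true priority.
acc = accuracy_score(y_true, y_pred)

# Per-class precision, recall, F1-score, and support. zero_division=0 reports
# 0 instead of warning when a class (here "critical") is never predicted.
report = classification_report(y_true, y_pred, labels=labels, zero_division=0)

# Rows are true classes and columns are predicted classes, in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)

print(f"accuracy = {acc:.2f}")  # 6 of 8 correct
print(report)
print(cm)
```

Reading the confusion matrix row by row shows, for example, that the single "critical" tweet was predicted as "high", an error that the overall accuracy alone would hide.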

Summary of Model Results

Key Learning & Impact

We set out to find the best model for our classification tasks, drawing on approaches that academic papers reported as effective. We compared LinearSVC, Multinomial Logistic Regression, MultinomialNB, and Random Forest. Although it is a relatively simple model, LinearSVC consistently outperformed the others. This may be because Support Vector Machines (SVMs) perform well when classes are clearly separable, which appears to be the case for our dataset. We also learned the importance of balancing the dataset so that predictions are not skewed toward the majority classes. We applied RandomOverSampler (from the imbalanced-learn library) to our imbalanced dataset; it supplements the training data with randomly duplicated samples from the minority classes.
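A minimal sketch of this comparison follows. It uses a small hand-written `random_oversample` helper (hypothetical, written only for illustration) that mirrors what imbalanced-learn's `RandomOverSampler(...).fit_resample(X, y)` does: it duplicates randomly chosen minority-class rows until every class matches the largest one. The toy tweets, priority labels, and TF-IDF features are assumptions, and scores on such tiny data say nothing about the real results. Oversampling is applied only to the training split so that duplicated rows cannot leak into the test set.

```python
# Sketch: random oversampling + comparison of the four model families.
# Assumptions (not from the project): TF-IDF features, toy tweets, priority labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate random minority-class rows (sampling with
    replacement) until every class has as many rows as the largest class."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    keep = []
    for cls, n in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        extra = rng.choice(cls_idx, size=counts.max() - n, replace=True)
        keep.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(keep)
    return X[idx], y[idx]


# Hypothetical imbalanced training data: many "low", few "medium"/"high".
train_tweets = [
    "thoughts and prayers for everyone affected",
    "watching the storm coverage on tv",
    "stay safe everyone",
    "so sad to hear about the earthquake",
    "sending love to the victims",
    "cannot believe this weather",
    "road closed near the river, avoid the area",
    "power lines down on oak street",
    "people trapped in collapsed building, need rescue",
    "fire spreading toward homes, evacuate now",
]
train_labels = ["low"] * 6 + ["medium"] * 2 + ["high"] * 2
test_tweets = ["trapped under rubble, need rescue", "stay safe and sending love",
               "bridge closed near the river"]
test_labels = ["high", "low", "medium"]

vec = TfidfVectorizer()
X_train = vec.fit_transform(train_tweets)  # vocabulary fit on training text only
X_test = vec.transform(test_tweets)

# Oversample AFTER the split so copies never appear in the test set.
X_bal, y_bal = random_oversample(X_train, train_labels)

models = {
    "LinearSVC": LinearSVC(),
    # With the default lbfgs solver, multi-class targets are fit as multinomial.
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "MultinomialNB": MultinomialNB(),  # needs non-negative features; TF-IDF is
    "RandomForest": RandomForestClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    scores[name] = accuracy_score(test_labels, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

Because random oversampling only repeats existing rows, it can encourage overfitting to the duplicated minority examples, which is one more reason to score every model on an untouched test split.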

Acknowledgements

We are grateful for the guidance and encouragement of our Capstone advisors, Dr. Frederick Nugen and Dr. Puya H. Vahabi, of the Master of Information and Data Science (MIDS) program at the School of Information, University of California, Berkeley.

References

Kevin Stowe, Michael J. Paul, Martha Palmer, Leysia Palen, and Kenneth Anderson. 2016. Identifying and Categorizing Disaster-Related Tweets. In Proceedings of the Fourth International Workshop on Natural Language Processing for Social Media, pages 1–6, Austin, TX, USA. Association for Computational Linguistics.

Muhammad Imran, Prasenjit Mitra, and Carlos Castillo. 2016. Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pages 1638–1643. European Language Resources Association (ELRA).

William J. Corvey, Sarah Vieweg, Travis Rood, and Martha Palmer. 2010. Twitter in Mass Emergency: What NLP Can Contribute. In Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media, pages 23–24, Los Angeles, California, USA. Association for Computational Linguistics.

Philipp Seeberger and Korbinian Riedhammer. 2022. Enhancing Crisis-Related Tweet Classification with Entity-Masked Language Modeling and Multi-Task Learning. In Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI), pages 70–78, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.

Efsun Sarioglu Kayi, Linyong Nan, Bohan Qu, Mona Diab, and Kathleen McKeown. 2020. Detecting Urgency Status of Crisis Tweets: A Transfer Learning Approach for Low Resource Languages. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4693–4703, Barcelona, Spain (Online). International Committee on Computational Linguistics.

Graham Neubig, Yuichiroh Matsubayashi, Masato Hagiwara, and Koji Murakami. 2011. Safety Information Mining — What can NLP do in a disaster—. In Proceedings of 5th International Joint Conference on Natural Language Processing, pages 965–973, Chiang Mai, Thailand. Asian Federation of Natural Language Processing.

Muhammad Imran, Carlos Castillo, Fernando Diaz, and Sarah Vieweg. 2015. Processing Social Media Messages in Mass Emergency: A Survey. ACM Computing Surveys. https://www.researchgate.net/profile/Muhammad-Imran-43/publication/2643…

Course

Data Science 210. Capstone, Summer 2023


More Information

Website

GitHub Repository


Last updated: August 8, 2023