MIDS Capstone Project Spring 2024

GenderMT: Gender Debias in Machine Translation

Machine translation is widely used across industries, but it often introduces gender bias and inaccuracies. Gender bias in machine translation can carry serious social consequences by reinforcing gender stereotypes and inequalities. As social norms place growing emphasis on gender equality, translation providers also risk reputational damage and negative publicity from mistranslations.

Here we introduce GenderMT, which leverages large language models (LLMs) to tackle gender bias in state-of-the-art (SOTA) translation models. Our novel approach uses prompt engineering and coreference tagging to generate a large training dataset, which is then used to train an encoder-decoder model, producing a lightweight translation model. Compared with Google Translate, the leading machine translation service provider, GenderMT achieved a 300% improvement in recall for unbiased translations. Compared with the Microsoft GATE research paper, our model shows a substantial improvement in performance, with a 53% increase in recall and a 15% improvement in F0.5 score.

GenderMT is currently deployed on the AWS cloud, offering APIs for service providers and users. We have also integrated a web application interface so that users can quickly check for gender bias. We believe GenderMT is a responsible AI tool that will help redefine the future of machine translation. Through a formal publication of our model, we hope that the community can leverage our findings and dataset to build a truly unbiased and accurate translation service.

We extend our gratitude to our team members, data providers, past-research contributors, open-source language model developers, and industry partners for their collaboration and insights. Additionally, we thank the broader community for their feedback and support in advancing responsible AI in machine translation.
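For readers less familiar with the metrics quoted in the abstract, the sketch below shows how an F-beta score and a relative improvement are conventionally computed. It is a minimal illustration, not the project's evaluation code: the function names `f_beta` and `relative_improvement` are hypothetical, and the input values are placeholders chosen for easy arithmetic, not results from GenderMT, Google Translate, or GATE.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    With beta = 0.5 (the F0.5 score), precision is weighted more
    heavily than recall.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta**2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta**2) * precision * recall / denominator


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage change of `new` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Placeholder values only (not measurements from the project).
print(f_beta(precision=0.5, recall=0.5))               # F0.5 for equal precision and recall
print(relative_improvement(new=0.5, baseline=0.125))   # percentage gain over the baseline
```

Under this standard definition, a 300% relative improvement in recall means the new recall is four times the baseline value.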

Last updated: April 15, 2024