Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/141886

| Title: | The association of gender bias with BERT : measuring, mitigating and cross-lingual portability |
| Authors: | Bartl, Marion |
| Keywords: | Discrimination; Natural language processing (Computer science); Sex; English language; German language |
| Issue Date: | 2021 |
| Citation: | Bartl, M. (2021). The association of gender bias with BERT: measuring, mitigating and cross-lingual portability (Master's dissertation). |
| Abstract: | The development of BERT (Devlin et al., 2018) and other contextualized word embeddings (Radford et al., 2019; Peters et al., 2018) brought about a significant performance increase for many NLP applications. For this reason, contextualized embeddings are replacing standard embeddings as the semantic knowledge base in NLP systems. Since a variety of biases were previously found in standard word embeddings (Caliskan et al., 2017), it is crucial to take a step back and assess biases encoded in their replacements as well. This work focuses on gender bias in BERT, aiming to measure bias, compare this bias with real-world statistics and subsequently mitigate it. Gender bias is measured through associations between gender-denoting target words and professional terms (Kurita et al., 2019). For mitigating gender bias, we first apply Counterfactual Data Substitution (CDS) (Maudslay et al., 2019) to the GAP corpus (Webster et al., 2018) and then fine-tune BERT on these data. Since these methods for measuring and mitigating bias were originally developed for English, we also adopt a cross-lingual perspective and test whether the approach is portable to German. Unfortunately, we find that grammatical gender in German strongly influences the associations between target and attribute words, which makes it impossible to measure gender bias using the same methodology applied for English. Therefore, further experiments to mitigate gender bias in the German BERT model are discarded. On one hand, we find that gender bias in the English BERT model is reflective of both real-world data and gender stereotypes. We mitigate this gender bias through fine-tuning on data to which CDS was applied. We hope that our positive results for English can contribute to the development of standardized methods to deal with gender bias in contextualized word embeddings.
On the other hand, the fact that these methods do not work for German supports previous research calling for more language-specific work in NLP (Gonen et al., 2019; Sun et al., 2019). In light of BERT’s rising popularity, finding appropriate methods to measure and mitigate bias continues to be an essential task. |
| Description: | M.Sc. (HLST)(Melit.) |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/141886 |
| Appears in Collections: | Dissertations - FacICT - 2021; Dissertations - FacICTAI - 2021 |
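The association measure described in the abstract (Kurita et al., 2019) scores how strongly a masked language model links a gendered target word to a profession, normalizing the target's probability in context by its prior. A minimal sketch of that computation follows; the probabilities below are hypothetical stand-ins for masked-LM outputs, not values taken from the dissertation.

```python
import math

def log_bias_score(p_target: float, p_prior: float) -> float:
    """Association score in the style of Kurita et al. (2019):
    log of the target word's probability in the masked sentence,
    normalized by its prior probability when the attribute is also masked."""
    return math.log(p_target / p_prior)

# Hypothetical masked-LM probabilities for the sentence "[MASK] is a nurse":
#   p_target: P("she" | "[MASK] is a nurse")
#   p_prior:  P("she" | "[MASK] is a [MASK]")  -- profession masked too
score_she = log_bias_score(0.60, 0.30)
score_he = log_bias_score(0.25, 0.35)

# A positive gap means "nurse" is associated more strongly with "she".
bias = score_she - score_he
```

With a real model, the two probabilities would come from BERT's masked-token predictions for the gendered word in each sentence variant; comparing the scores across gendered targets yields the per-profession bias the abstract compares against real-world statistics.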
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 2118ICTCSA531005066741_1.PDF (Restricted Access) | | 2.58 MB | Adobe PDF | |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
