Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/141886

| Title: | The association of gender bias with BERT : measuring, mitigating and cross-lingual portability |
| Authors: | Bartl, Marion |
| Keywords: | Discrimination; Natural language processing (Computer science); Sex; English language; German language |
| Issue Date: | 2021 |
| Citation: | Bartl, M. (2021). The association of gender bias with BERT: measuring, mitigating and cross-lingual portability (Master's dissertation). |
| Abstract: | The development of BERT (Devlin et al., 2018) and other contextualized word embeddings (Radford et al., 2019; Peters et al., 2018) brought about a significant performance increase for many NLP applications. For this reason, contextualized embeddings are replacing standard embeddings as the semantic knowledge base in NLP systems. Since a variety of biases were previously found in standard word embeddings (Caliskan et al., 2017), it is crucial to take a step back and assess biases encoded in their replacements as well. This work focuses on gender bias in BERT, aiming to measure bias, compare this bias with real-world statistics and subsequently mitigate it. Gender bias is measured through associations between gender-denoting target words and professional terms (Kurita et al., 2019). For mitigating gender bias, we first apply Counterfactual Data Substitution (CDS) (Maudslay et al., 2019) to the GAP corpus (Webster et al., 2018) and then fine-tune BERT on these data. Since these methods for measuring and mitigating bias were originally developed for English, we also adopt a cross-lingual perspective and test whether the approach is portable to German. Unfortunately, we find that grammatical gender in German strongly influences the associations between target and attribute words, which makes it impossible to measure gender bias using the same methodology applied for English. Therefore, further experiments to mitigate gender bias in the German BERT model are discarded. On one hand, we find that gender bias in the English BERT model is reflective of both real-world data and gender stereotypes. We mitigate this gender bias through fine-tuning on data to which CDS was applied. We hope that our positive results for English can contribute to the development of standardized methods to deal with gender bias in contextualized word embeddings.
On the other hand, the fact that these methods do not work for German supports previous research calling for more language-specific work in NLP (Gonen et al., 2019; Sun et al., 2019). In light of BERT’s rising popularity, finding appropriate methods to measure and mitigate bias continues to be an essential task. |
| Description: | M.Sc. (HLST)(Melit.) |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/141886 |
| Appears in Collections: | Dissertations - FacICT - 2021; Dissertations - FacICTAI - 2021 |
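The association measure described in the abstract (Kurita et al., 2019) scores how strongly a masked language model links a gendered target word to a profession, normalizing the target's probability in context by its prior. A minimal sketch of that computation follows; the probabilities below are hypothetical stand-ins for masked-LM outputs, not values taken from the dissertation.

```python
import math

def log_bias_score(p_target: float, p_prior: float) -> float:
    """Association score in the style of Kurita et al. (2019):
    log of the target word's probability in the masked sentence,
    normalized by its prior probability when the attribute is also masked."""
    return math.log(p_target / p_prior)

# Hypothetical masked-LM probabilities for the sentence "[MASK] is a nurse":
#   p_target: P("she" | "[MASK] is a nurse")
#   p_prior:  P("she" | "[MASK] is a [MASK]")  -- profession masked too
score_she = log_bias_score(0.60, 0.30)
score_he = log_bias_score(0.25, 0.35)

# A positive gap means "nurse" is associated more strongly with "she".
bias = score_she - score_he
```

With a real model, the two probabilities would come from BERT's masked-token predictions for the gendered word in each sentence variant; comparing the scores across gendered targets yields the per-profession bias the abstract compares against real-world statistics.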
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 2118ICTCSA531005066741_1.PDF (Restricted Access) | | 2.58 MB | Adobe PDF | |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
