Grandma Karl is 27 years old: Automatic pseudonymization of research data is a research environment project funded by the Swedish Research Council (Contract No. VR 2022-02311) for six years, starting in 2023.
The three collaborating universities are:
- University of Gothenburg, Sweden
- University of Helsinki, Finland
- Umeå University, Sweden
For more about the team, see here.
The Grandma Karl project (or Mormor Karl project in Swedish), as we call it for short, targets several aspects of pseudonymization, aiming to advance Sweden’s work on open access to research data:
- algorithms to automatically detect, label, and pseudonymize personal identifiers in freely written texts (essays/blogs), focusing on linguistic challenges such as spelling errors, ambiguous entities, and semantic constraints
- analysis of the type and number of personal identifiers versus acceptable protection, followed by reidentification tests to ensure that pseudonymization is effective
- analysis of the effects of pseudonymization on research data, e.g. on the readability of the resulting texts, their utility for answering the intended research questions, and their applicability to practical scenarios (e.g. language assessment)
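To make the detect, label, and replace steps in the first point concrete, here is a minimal sketch in Python. It is a toy illustration and not the project's actual method: the name list, the replacement names, the `<EMAIL>` placeholder, and the function `pseudonymize` are all hypothetical, and the rule-based matching ignores the very challenges listed above (spelling errors, ambiguous entities, semantic constraints).

```python
import re

# Hypothetical toy resources, for illustration only. A real system would
# rely on trained models and far larger resources.
FIRST_NAMES = {"Karl", "Anna", "Maria"}
REPLACEMENT_NAMES = ["Alex", "Kim", "Robin"]

# Simple patterns: an e-mail address and a capitalized word (Swedish letters included).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CAPITALIZED_RE = re.compile(r"\b[A-ZÅÄÖ][a-zåäö]+\b")


def pseudonymize(text):
    """Replace known first names and e-mail addresses in `text`.

    Returns the pseudonymized text and the name mapping that was used.
    """
    mapping = {}  # original name -> pseudonym, so repeated mentions stay consistent

    def replace_name(match):
        word = match.group(0)
        if word not in FIRST_NAMES:
            return word  # not a known identifier: leave unchanged
        if word not in mapping:
            # Pick the next replacement name, cycling if the list runs out.
            mapping[word] = REPLACEMENT_NAMES[len(mapping) % len(REPLACEMENT_NAMES)]
        return mapping[word]

    # E-mail addresses first, so names inside them are not processed separately.
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = CAPITALIZED_RE.sub(replace_name, text)
    return text, mapping


if __name__ == "__main__":
    sample = "Karl wrote to Anna. Karl lives in Umeå. Mail: karl@example.com"
    result, used = pseudonymize(sample)
    print(result)
    print(used)
```

Keeping the mapping means that every mention of the same name receives the same pseudonym, which helps the text stay readable. Note that the place name in the sample is left untouched, because this sketch only knows about first names and e-mail addresses.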
The primary data for the experiments consists of Swedish learner-written essays, collected and manually annotated by us. The results are further tested on the social media domain (through available corpora) and on other types of data containing personal information. Natural language processing, machine learning, neural networks, and word embeddings are some of the methods we work with.
For more about the project, see here.