Open House at Mormor Karl’s

Call for participation


We cordially invite everyone interested in any aspect of pseudonymization and privacy protection of research data to attend the Open House event organized by the team behind the project Mormor Karl (Grandma Karl in English). Mormor Karl is a project on pseudonymization of research data, funded by the Swedish Research Council during 2023–2028.


* Dates and venue
* Description
* Program
* Invited speakers
  - Boel Nelson (differential privacy)
  - Karoline Marko (forensic linguistics, authorship analysis)
  - Rada Mihalcea (NLP, deception, privacy)
* Important dates
* Organizers

Dates and venue


Date: November 29, 2023
Time: 9.00 – 17.00
Venue: University of Gothenburg, Humanisten, J222
Registration: Please fill in the following form by the end of November 16, 2023.
Registration is free and open to researchers, students and companies, but to plan the event, we would like to know whether you intend to participate so that we can provide coffee for you.
Zoom link: You can join us online at this Zoom link. If you have any issues connecting, you can send us an email.

Description


Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared because of the personal and sensitive information it contains, e.g., names, political opinions and medical data. The General Data Protection Regulation, GDPR (EU Commission, 2016), suggests pseudonymization as a solution to secure open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for manipulation of research data (Volodina et al., 2023). The main challenge is how to pseudonymize data effectively, so that individuals cannot be identified while the data remains usable for the natural language processing tasks for which it was collected and for other types of research.

During the workshop day, a number of invited speakers from Sweden, Scandinavia and the USA will present their research on data pseudonymization and privacy preservation; Mormor Karl researchers will showcase the research problems they will be working on over the next five years; and a panel discussion will involve panelists and audience in a timely discussion on pseudonymization and open access to research data.

Program


9.00	Elena Volodina (Opening)
	Who is Mormor Karl and why is he 27 years old?
9.10-10.00	Boel Nelson (Invited talk), chair: Sonny Vu
	Differential privacy: principled foundations, and trade-offs in applications
10.00-10.30	COFFEE BREAK
10.30-12.00	Mormor Karl talks
	* Elena Volodina: Mormor Karl’s Research Agenda
	* Maria Irena Szawerna: Sense and Sensitivity: what do we need to turn private information into pseudonyms?
	* Therese Lindström Tiedemann: When Sverige is better than Sweden and Helsingfors is better than Helsinki
	* Ricardo Muñoz Sánchez: Name Biases in Automated Essay Assessment
	* Simon Dobnik: Language models, computational semantics and meaning representation
	* Xuan-Son Vu: Privacy preserving ML / AI: problems, regulations, research, and solutions
12.00-13.00	Lunch in the lobby
13.00-13.50	Karoline Marko (Invited talk), chair: Therese Lindström Tiedemann
	Authorship analysis, personality traits, and the manipulation of linguistic features
13.50-14.30	COFFEE BREAK
14.30-15.20	Rada Mihalcea (online) (Invited talk), chair: Simon Dobnik
	Lessons from a Decade of Research on Automatic Deception Detection
15.20-15.30	BREAK
15.30-17.00	Boel Nelson, Hercules Dalianis, Karoline Marko, Peter Ljunglöf, Ylva Byrman (Panel discussion)
	Data Privacy and its Risks
	Moderators: Simon Dobnik and Elena Volodina

PANELISTS
* Boel Nelson (Aarhus University, Denmark): Differential privacy
* Hercules Dalianis (Stockholm University, Sweden): NLP and pseudonymization
* Karoline Marko (University of Graz, Austria): Authorship attribution
* Peter Ljunglöf (Chalmers University of Technology, Sweden): NLP, privacy preservation, data and model biases
* Ylva Byrman (University of Gothenburg, Sweden): Linguistics, forensic linguistics

Invited speakers

Boel Nelson (differential privacy), Aarhus University, Denmark

Differential privacy: principled foundations, and trade-offs in applications
Differential privacy is a formal definition of privacy that is both recognized for its theoretical strength (2017 Gödel Prize, 2021 ACM Paris Kanellakis Theory and Practice Award) and deployed at scale by companies such as Google and Apple. Conceptually, differential privacy stands out by being defined as a property of an algorithm, rather than a property of output data. This shift in defining privacy allows differential privacy to achieve several desirable properties, including privacy guarantees that compose under multiple data releases. Still, differential privacy is no silver bullet: any application of differential privacy requires striking an adequate trade-off between accuracy and privacy. As such, differential privacy still presents many open challenges, not least when it comes to providing meaningful privacy guarantees for real use cases.

BIO
Boel Nelson is a Marie Skłodowska-Curie postdoctoral fellow in the Logic and Semantics group at Aarhus University. She currently leads the MSCA-funded project Provable Privacy for Metadata (ProPriM). Boel’s research interests include data privacy, detection and mitigation of side-channels, and privacy-enhancing technologies.
Prior to joining Aarhus University, Boel worked as a postdoc focusing on differential privacy in the Algorithms and Complexity section at the University of Copenhagen, where she was also a member of Basic Algorithms Research Copenhagen (BARC). Before that, she worked as a postdoc in the Logic and Semantics group at Aarhus University, where she conducted research on anonymous communication. Boel earned her PhD on the topic of differential privacy from Chalmers University of Technology.

Karoline Marko (forensic linguistics, authorship analysis), University of Graz, Austria

Authorship analysis, personality traits, and the manipulation of linguistic features
Authorship analysis is based on the idea that the social contexts in which language is acquired leave traces in an individual’s language use and that, in turn, these traces can provide hints at the sociodemographic, physical, and contextual variables that caused them (Fobbe, 2021). For example, language can contain hints at an individual’s age, gender, social background, including educational and professional backgrounds (e.g., Ehrhardt, 2018), mental and psychological disorders (e.g., Gawda, 2022; Hunter & Grant, 2022), and to some degree also personality traits (e.g., Marengo et al., 2017; Moreno et al., 2021). However, the identification of features, as well as their relation to personality traits and sociodemographic features, is not straightforward. One complication is caused by the potential for manipulation of linguistic features: the use of some features can be manipulated more consciously than others; some individuals are adept at adapting their writing style to that of their interlocutor/addressee or have a generally non-salient style, while others develop a more salient writing style and are thus more easily identifiable (Wright, 2013, 2017).
This presentation will provide some insights into linguistic features that can be connected to specific sociodemographic features and personality traits, shed light on linguistic features that lend themselves more easily to manipulation than others, and address the issue of individuals’ awareness of their own writing style.

References
* Ehrhardt, S. (2018). Authorship attribution analysis. In M. Rathert & J. Visconti (Eds.), Handbook of communication in the legal sphere (pp. 169-200). Berlin: de Gruyter.
* Fobbe, E. (2021). Forensische Linguistik. Eine kriminaltechnische Disziplin in Deutschland. SIAK – Zeitschrift für Polizeiwissenschaft und polizeiliche Praxis, 4, 18-27.
* Gawda, B. (2022). The differentiation of narrative styles in individuals with high psychopathic deviate. Journal of Psycholinguistic Research, 51(1), 75-92.
* Hunter, M. & Grant, T. (2022). Killer stance: an investigation of the relationship between attitudinal resources and psychological traits in the writings of four serial murderers. Language and Law / Linguagem e Direito, 9(1), 48-72.
* Marengo, D., Giannotta, F. & Settanni, M. (2017). Assessing personality using emoji: an exploratory study. Personality and Individual Differences, 112, 74-78.
* Moreno, J. et al. (2021). Can personality traits be measured analyzing written language? A meta-analytic study on computational methods. Personality and Individual Differences, 177, 1-12.
* Wright, D. (2013). Stylistic variation within genre conventions in the Enron email corpus: developing a text-sensitive methodology for authorship research.
* Wright, D. (2017). Using word n-grams to identify authors and idiolects: a corpus approach to a forensic linguistic problem. International Journal of Corpus Linguistics, 22(2), 212-214.

BIO
Karoline Marko obtained her PhD in 2017 and is currently a postdoctoral researcher at the University of Graz. She specializes in the field of Forensic Linguistics, and her research interests include authorship analysis and forensic discourse analysis. In 2020, she became the coordinator of the Forensic Linguistics Certificate at the University of Graz, and since 2022 she has been teaching a course on Forensic Linguistics to students of law.

Rada Mihalcea (NLP, deception, privacy), University of Michigan, United States

Lessons from a Decade of Research on Automatic Deception Detection
Whether we like it or not, deception occurs every day and everywhere: thousands of trials take place daily around the world; little white lies: “I’m busy that day!” even if your calendar is blank; news “with a twist” (a.k.a. fake news) meant to attract the readers’ attention or influence people in their future undertakings; misinformation in health social media posts; portrayed identities, on dating sites and elsewhere. Can a computer automatically detect deception in written accounts or in video recordings?
In this talk, I will give an overview of more than a decade of research in building linguistic and multimodal resources and algorithms for deception detection, targeting deceptive statements, trial videos, fake news, identity deception, and health misinformation. I will also show how these algorithms can provide insights into what makes a good lie, and thus teach us how we can spot a liar. As it turns out, computers can be trained to identify lies in many different contexts, and they can often do it better than humans do.

BIO
Rada Mihalcea is the Janice M. Jenkins Professor of Computer Science and Engineering at the University of Michigan and the Director of the Michigan Artificial Intelligence Lab. Her research interests are in natural language processing, including multilingual natural language processing, computational social sciences, and multimodal processing. She is an ACM Fellow, an AAAI Fellow, and served as ACL President (2018-2022 Vice/Past). She is the recipient of a Sarah Goddard Power award (2019) for her contributions to diversity in science, an honorary citizen of her hometown of Cluj-Napoca, Romania (2013), and the recipient of a Presidential Early Career Award for Scientists and Engineers awarded by President Obama (2009).

Important dates


* October 17: first call for registrations
* November 2: second call for registrations
* November 16: deadline for registrations
* November 29: the Open House event

Organizers


* Elena Volodina, University of Gothenburg, Sweden
* Simon Dobnik, University of Gothenburg, Sweden
* Therese Lindström Tiedemann, University of Helsinki, Finland
* Ricardo Muñoz Sánchez, University of Gothenburg, Sweden
* Maria Irena Szawerna, University of Gothenburg, Sweden
* Xuan-Son Vu, Umeå University, Sweden

Contact: mormor.karl at svenska.gu.se
