Call for participation
|We cordially invite everyone interested in any aspect of pseudonymization and privacy protection of research data to attend the Open House event organized by the team behind the project Mormor Karl (Grandma Karl in English). Mormor Karl is a project on pseudonymization of research data funded by the Swedish Research Council during 2023–2028.|
|Invited speakers
- Boel Nelson (differential privacy)
- Karoline Marko (forensic linguistics, authorship analysis)
- Rada Mihalcea (NLP, deception, privacy)
Dates and venue
|Date: November 29, 2023
Time: 9.00 – 17.00
Venue: University of Gothenburg, Humanisten, J222
Registration: Please fill in the following form by the end of November 16, 2023.
Registration is free and open to researchers, students and companies, but to help us plan the event, please let us know whether you intend to participate so that we can provide coffee for you.
Zoom link: You can join us online at this Zoom link. If you have any issues connecting, you can send us an email.
|Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared because of the personal and sensitive information it contains, e.g. names, political opinions and medical data. The General Data Protection Regulation (GDPR; EU Commission, 2016) suggests pseudonymization as a way to secure open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for the manipulation of research data (Volodina et al., 2023). The main challenge is how to pseudonymize data effectively, so that individuals cannot be identified while the data remains usable for the natural language processing tasks for which it was collected and for other types of research.
During the workshop day, a number of invited speakers from Sweden, Scandinavia and the USA will present their research on data pseudonymization and privacy preservation; Mormor Karl researchers will showcase the research problems they will be working on over the next five years; and a panel discussion will engage panelists and audience in a timely debate on pseudonymization and open access to research data.
|9.00||Elena Volodina (Opening)
Who is Mormor Karl and why is he 27 years old?
|9.10-10.00||Boel Nelson (Invited talk) - chair: Sonny Vu
Differential privacy: principled foundations, and trade-offs in applications
|10.30-12.00||Mormor Karl talks
Elena Volodina: Mormor Karl’s Research Agenda
Maria Irena Szawerna: Sense and Sensitivity: what do we need to turn private information into pseudonyms?
Therese Lindström Tiedemann: When Sverige is better than Sweden and Helsingfors is better than Helsinki.
Ricardo Muñoz Sánchez: Name Biases in Automated Essay Assessment
Simon Dobnik: Language models, computational semantics and meaning representation
Xuan-Son Vu: Privacy preserving ML / AI: problems, regulations, research, and solutions
|12.00-13.00||Lunch in the lobby.|
|13.00-13.50||Karoline Marko (Invited talk) - chair: Therese Lindström Tiedemann
Authorship analysis, personality traits, and the manipulation of linguistic features
|14.30-15.20||Rada Mihalcea (online) (Invited talk) - chair: Simon Dobnik
Lessons from a Decade of Research on Automatic Deception Detection
|15.20-17.00||Boel Nelson, Hercules Dalianis, Karoline Marko, Peter Ljunglöf, Ylva Byrman (Panel discussion)
Data Privacy and its Risks
Moderators: Simon Dobnik and Elena Volodina
|Boel Nelson (Aarhus University, Denmark) Differential privacy
Hercules Dalianis (Stockholm University, Sweden) NLP and pseudonymization
Karoline Marko (University of Graz, Austria) Authorship attribution
Peter Ljunglöf (Chalmers University of Technology, Sweden) NLP, privacy preservation, data and model biases
Ylva Byrman (University of Gothenburg, Sweden) Linguistics, forensic linguistics
|Boel Nelson (differential privacy), Aarhus University, Denmark|
|Differential privacy: principled foundations, and trade-offs in applications
Differential privacy is a formal definition of privacy that is both recognized for its theoretical strength (2017 Gödel Prize, 2021 ACM Paris Kanellakis Theory and Practice Award) and deployed at scale by companies such as Google and Apple. Conceptually, differential privacy stands out by being defined as a property of an algorithm rather than a property of output data. This shift in defining privacy allows differential privacy to achieve several desirable properties, including privacy guarantees that compose under multiple data releases. Still, differential privacy is no silver bullet: any application of differential privacy requires striking an adequate trade-off between accuracy and privacy. As such, differential privacy still presents many open challenges, not least when it comes to providing meaningful privacy guarantees for real use cases.
Boel Nelson is a Marie Skłodowska-Curie postdoctoral fellow in the Logic and Semantics group at Aarhus University. She currently leads the MSCA-funded project Provable Privacy for Metadata (ProPriM). Boel’s research interests include data privacy, detection and mitigation of side-channels, and privacy-enhancing technologies.
Prior to joining Aarhus University, Boel worked as a postdoc focusing on differential privacy in the Algorithms and Complexity section at the University of Copenhagen, where she was also a member of Basic Algorithms Research Copenhagen (BARC). Before that, she worked as a postdoc in the Logic and Semantics group at Aarhus University, where she conducted research on anonymous communication. Boel earned her PhD on the topic of differential privacy from Chalmers University of Technology.
|Karoline Marko (forensic linguistics, authorship analysis), University of Graz, Austria|
|Authorship analysis, personality traits, and the manipulation of linguistic features
Authorship analysis is based on the idea that the social contexts in which language is acquired leave traces in an individual’s language use and that, in turn, these traces can provide hints at the sociodemographic, physical, and contextual variables that caused them (Fobbe, 2021). For example, language can contain hints at an individual’s age, gender, social background, including educational and professional backgrounds (e.g., Ehrhardt, 2018), mental and psychological disorders (e.g., Gawda, 2022; Hunter & Grant, 2022), and to some degree also personality traits (e.g., Marengo et al., 2017; Moreno et al., 2021). However, identifying such features and their relation to personality traits and sociodemographic features is not straightforward. One complication is that linguistic features can be manipulated: some features can be manipulated more consciously than others; some individuals are adept at adapting their writing style to that of their interlocutor/addressee or have a generally non-salient style, while others develop a more salient writing style and are thus more easily identifiable (Wright, 2013, 2017). This presentation will provide some insights into linguistic features that can be connected to specific sociodemographic features and personality traits, shed light on linguistic features that lend themselves more easily to manipulation than others, and address the issue of individuals’ awareness of their own writing style.
* Ehrhardt, S. (2018). Authorship attribution analysis. In M. Rathert & J. Visconti (Eds.), Handbook of communication in the legal sphere (pp. 169-200). Berlin: de Gruyter.
* Fobbe, E. (2021). Forensische Linguistik. Eine kriminaltechnische Disziplin in Deutschland. SIAK – Zeitschrift für Polizeiwissenschaft und polizeiliche Praxis, 4, 18-27.
* Gawda, B. (2022). The differentiation of narrative styles in individuals with high psychopathic deviate. Journal of Psycholinguistic Research, 51(1), 75-92.
* Hunter, M. & Grant, T. (2022). Killer stance: an investigation of the relationship between attitudinal resources and psychological traits in the writings of four serial murderers. Language and Law=Linguagem e Direito, 9(1), 48-72.
* Marengo, D., Giannotta, F. & Settanni, M. (2017). Assessing personality using emoji: an exploratory study. Personality and Individual Differences, 112, 74-78.
* Moreno, J. et al. (2021). Can personality traits be measured analyzing written language? A meta-analytic study on computational methods. Personality and Individual Differences, 177, 1-12.
* Wright, D. (2013). Stylistic variation within genre conventions in the Enron email corpus: developing a text-sensitive methodology for authorship research.
* Wright, D. (2017). Using word n-grams to identify authors and idiolects: a corpus approach to a forensic linguistic problem. International Journal of Corpus Linguistics, 22(2), 212-214.
Karoline Marko obtained her PhD in 2017 and is currently a postdoctoral researcher at the University of Graz. She specializes in the field of Forensic Linguistics, and her research interests include authorship analysis and forensic discourse analysis. In 2020, she became the coordinator of the Forensic Linguistics Certificate at the University of Graz, and since 2022 she has been teaching a course on Forensic Linguistics to students of law.
|Rada Mihalcea (NLP, deception, privacy), University of Michigan, United States|
|Lessons from a Decade of Research on Automatic Deception Detection
Whether we like it or not, deception occurs every day and everywhere: thousands of trials take place daily around the world; little white lies: “I’m busy that day!” even if your calendar is blank; news “with a twist” (a.k.a. fake news) meant to attract the readers’ attention or influence people in their future undertakings; misinformation in health social media posts; portrayed identities, on dating sites and elsewhere. Can a computer automatically detect deception in written accounts or in video recordings?
In this talk, I will give an overview of more than a decade of research in building linguistic and multimodal resources and algorithms for deception detection, targeting deceptive statements, trial videos, fake news, identity deception, and health misinformation. I will also show how these algorithms can provide insights into what makes a good lie, and thus teach us how we can spot a liar. As it turns out, computers can be trained to identify lies in many different contexts, and they can often do it better than humans do.
Rada Mihalcea is the Janice M. Jenkins Professor of Computer Science and Engineering at the University of Michigan and the Director of the Michigan Artificial Intelligence Lab. Her research interests are in natural language processing, including multilingual natural language processing, computational social sciences, and multimodal processing. She is an ACM Fellow, an AAAI Fellow, and served as ACL President (2018-2022 Vice/Past). She is the recipient of a Sarah Goddard Power award (2019) for her contributions to diversity in science, an honorary citizen of her hometown of Cluj-Napoca, Romania (2013), and the recipient of a Presidential Early Career Award for Scientists and Engineers awarded by President Obama (2009).
|* October 17: first call for registrations
* November 2: second call for registrations
* November 16: deadline for registrations
* November 29: the Open House event
|* Elena Volodina, University of Gothenburg, Sweden
* Simon Dobnik, University of Gothenburg, Sweden
* Therese Lindström Tiedemann, University of Helsinki, Finland
* Ricardo Muñoz Sánchez, University of Gothenburg, Sweden
* Maria Irena Szawerna, University of Gothenburg, Sweden
* Xuan-Son Vu, Umeå University, Sweden
|Contact: mormor.karl at svenska.gu.se|