Call for papers: Privacy, identity and linguistics - deidentification, pseudonymisation, anonymisation in linguistics
We would like to invite contributions to an anthology about privacy, identity and linguistics. Contributions should be about how linguists are (and have been) anonymising or pseudonymising their data, how participants (and people mentioned) are affected by different means to protect personal and sensitive information, and finally, how linguistic research is affected now, and in the future, by the use of different deidentification techniques.
We invite contributions in English which discuss these issues and others related to the topic of privacy, identity and linguistics. We are particularly interested in discussions which concern languages, including dialects and minority languages, in northern Europe, but also invite chapters on other languages and on general linguistics. We suggest that contributions could concern the following or similar issues:
- Which information you are currently replacing, but also how and why this is done
- How personal information in data and metadata has been treated in your area of linguistics historically
- How replacements of personal and/or sensitive information affect the usability of data in your area of linguistics and the validity of any conclusions
- How replacements of personal and/or sensitive information might affect results in your area of linguistics
- How replacements affect the semantics or pragmatics of the linguistic data and possibly the perception and readability of the data
- How replacements of personal and/or sensitive information might affect the participants in research in your area of linguistics, including pros and cons of letting participants choose whether and how their personal information should be obscured in the data
- How researchers can navigate between their different responsibilities: protecting individuals from harm, while also acknowledging their contributions to the research, and doing reliable linguistic research that can be important to society.
For a chance to contribute to the volume, please submit a title by 30th January 2026, and a 500-word abstract (excluding references) plus an outline of your proposed chapter by 31st March 2026. First chapter drafts are due by 30th September 2026. Abstracts and chapters will be peer-reviewed. We are currently primarily considering one of De Gruyter Brill’s series as a publication venue.
Contact information
Editors:
Therese Lindström Tiedemann, senior university lecturer, University of Helsinki, Finland
Lisa Södergård, PhD student, University of Helsinki, Finland
Full call: Privacy, identity and linguistics - deidentification, pseudonymisation, anonymisation in linguistics
We would like to invite contributions to an anthology about privacy, identity and linguistics. Contributions should be about how linguists are (and have been) anonymising or pseudonymising their data, how participants (and people mentioned) are affected by different means to protect personal and sensitive information, and finally, how linguistic research is affected now, and in the future, by the use of different deidentification techniques.
Researchers have an ethical responsibility to protect and respect research participants, their personal information and any sensitive information that might be disclosed in the original data (cf. ALLEA 2023, The Swedish Research Council 2024, Finnish National Board on Research Integrity TENK 2019). In recent years, several developments have drawn increased attention to the personal and sensitive information that might be contained in our research data and to how this should be handled to protect individuals. The General Data Protection Regulation, GDPR (EU 2016/679), emphasises the need to reflect on how we treat personal information. At the same time, there is an increased interest in open data in relation to open science, reproducibility and sustainability (UNESCO 2021), as well as a need for (authentic) data to train large language models (LLMs), and hence artificial intelligence (AI).
Little attention has so far been given to the effects anonymisation and pseudonymisation have on research possibilities and research findings. There is hardly any research about what deidentification measures mean for linguistic research, for the people who participate in linguistic research by, for example, agreeing to be interviewed or submitting texts they have written, or for the people who are mentioned in the data (cf. Volodina et al. 2025; Szawerna et al. 2024; Wang et al. 2024).
Some researchers prefer non-anonymity for various reasons. This can be part of a move towards “empowerment of the participants” (see e.g. Deakin-Smith et al. 2025; Terkourafi 2025; D’Arcy & Bender 2023) or towards giving “agency” to research participants (Pretorius & Patel 2025). Some researchers argue that participants should have the right to decide whether they are named, or should be given a chance to choose (parts of) their own pseudonym (Vainio 2013; Lahman et al. 2023; Deakin-Smith et al. 2025). But the reason can also be that researchers consider their participants, or the studied organisation, to be so unique and well-known that anonymity is, in their opinion, impossible to maintain (cf. Vainio 2013: 686).
It is also important to consider whether and how you should indicate what has been replaced, and/or say how you have masked personal and sensitive information (Volodina et al. 2025). There is no agreement about this issue, and we need to consider not only the risks it might involve for the participants and people mentioned in the data, but also what it means for transparency, reproducibility/replicability and validation in research (cf. Berez-Kroeker et al. 2017; 2021). Will pseudonymised data still be useful for linguistic research? Can the data still be considered authentic language data? Can it be used to train LLMs? Will it become too different from the original authentic data, and skew both training and research?
We invite contributions in English which discuss these issues and others related to the topic of privacy, identity and linguistics. We are particularly interested in discussions which concern languages, including dialects and minority languages, in northern Europe, but also invite chapters on other languages and on general linguistics. We suggest that contributions could concern the following or similar issues:
- Which information you are currently replacing, but also how and why this is done
- How personal information in data and metadata has been treated in your area of linguistics historically
- How replacements of personal and/or sensitive information affect the usability of data in your area of linguistics and the validity of any conclusions
- How replacements of personal and/or sensitive information might affect results in your area of linguistics
- How replacements (might) affect the semantics or pragmatics of the linguistic data as well as the perception and readability of the data
- How replacements of personal and/or sensitive information might affect the participants in research in your area of linguistics, including pros and cons of letting participants choose whether and how their personal information should be obscured in the data
- How researchers can navigate between their different responsibilities: protecting individuals from harm, while also acknowledging their contributions to the research, and doing reliable linguistic research that can be important to society.
Important dates
Preliminary:
- 30th January, 2026: deadline for expression of interest
- 31st March, 2026: 500-word abstract plus chapter outline
- 30th September, 2026: chapter submission deadline (preliminary maximum length 9 000 words, including references and all other material)
- February, 2027: notification from the first round of reviews
- June, 2027: final versions due
Abstracts and chapters will be peer-reviewed. We are currently primarily considering one of De Gruyter Brill’s series as a publication venue.
References
ALLEA. 2023. The European Code of Conduct for Research Integrity – revised edition 2023. Berlin. (DOI 10.26356/ECOC) https://allea.org/code-of-conduct/#toggle-id-15 (Latest access: 14 Nov. 2025)
Berez-Kroeker, A. L., Gawne, L., Kelly, B. F. & Heston, T. 2017. A survey of current reproducibility practices in linguistics journals, 2003–2012. https://sites.google.com/a/hawaii.edu/data-citation/survey
Berez-Kroeker, A. L., McDonnell, B., Collister, L. B. & Koller, E. 2021. Data, data management, and reproducible research in linguistics: on the need for The open handbook of linguistic data management. In: Berez-Kroeker, A. L., McDonnell, B., Koller, E., Collister, L. B. (eds.) The Open Handbook of Linguistic Data Management. Cambridge, Massachusetts & London: MIT Press. Pp. 3–8.
D’Arcy, A. & Bender, E. M. 2023. Ethics in linguistics. Annual Review of Linguistics, 9(1), 49–69.
Deakin-Smith, H., Pilcher, J., Flaherty, J., Coffey, A. & Makis, E. 2025. The research politics of (re)naming participants: A sociology of names perspective. Qualitative Research, 25(3), 629–647.
EU Commission. 2016. General data protection regulation. Official Journal of the European Union, 59, 1–88.
Finnish National Board on Research Integrity, TENK 2019. The ethical principles of research with human participants and ethical review in the human sciences in Finland. Publications of the Finnish National Board on Research Integrity TENK 3/2019.
Lahman, M. K., Thomas, R. & Teman, E. D. 2023. A good name: Pseudonyms in research. Qualitative Inquiry, 29(6), 678–685.
Pretorius, L. & Patel, S. V. 2025. What’s in a name? Participants’ pseudonym choices as a practice of empowerment and epistemic justice. International Journal of Research & Method in Education, 48(4), 371–388.
The Swedish Research Council. 2024. Good Research Practice. Stockholm: The Swedish Research Council. https://www.vr.se/english/analysis/reports/our-reports/2025-07-03-good-research-practice-2024.html
Szawerna, M., Dobnik, S., Lindström Tiedemann, T., Muñoz Sánchez, R., Vu, X.-S. & Volodina, E. 2024. Pseudonymization Categories across Domain Boundaries. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). European Language Resources Association. pp. 13303–13314.
Terkourafi, M. 2025. An ethics for linguistics? What, why, and how? Linguistics, 63(2), 317–347.
UNESCO. 2021. Recommendation on Open Science. https://doi.org/10.54677/MNMH8546
Vainio, A. 2013. Beyond research ethics: Anonymity as ‘ontology’, ‘analysis’ and ‘independence’. Qualitative Research, 13(6), 685–698.
Volodina, E., Dobnik, S., Lindström Tiedemann, T., Muñoz Sánchez, R., Szawerna, M. I., Södergård, L. & Vu, X.-S. 2025. Towards shared standards for pseudonymization of research data. Huminfra Conference proceedings, 12th–13th Nov. 2025, Stockholm. https://www.huminfra.se/resources/humevents/hic-2025_proceedings.pdf
Wang, S., Ramdani, J. M., Sun, S., Bose, P. & Gao, X. 2024. Naming Research Participants in Qualitative Language Learning Research: Numbers, Pseudonyms, or Real Names? Journal of Language, Identity & Education. pp. 1–14.
Acknowledgments
The call for papers is organized within the research environment project Grandma Karl is 27 years old and is supported by a research grant on pseudonymization from the Swedish Research Council.