PhD Position F/M Fine-grained, multimodal speech anonymization

July 02, 2023
Offered Salary: Negotiation
Working address: N/A
Contract Type: Other
Working Time: Negotiation
Working type: N/A
Ref info: N/A

2023-06410 - PhD Position F/M Fine-grained, multimodal speech anonymization

Contract type : Fixed-term contract

Level of qualifications required : Graduate degree or equivalent

Function : PhD Position


This PhD is part of the "Personal data protection" project of PEPR Cybersécurité, which aims to advance privacy preservation technology for various application sectors. It will be co-supervised by Emmanuel Vincent and Marc Tommasi. The PhD student will have the opportunity to spend time in both the Multispeech and Magnet teams, to collaborate with 9 other research teams in France and with the French data protection authority CNIL, and to contribute to the project's overall goals including the organization of an anonymization challenge.


Large-scale collection, storage, and processing of speech data pose severe privacy threats 1. Indeed, speech encapsulates a wealth of personal data (e.g., age and gender, ethnic origin, personality traits, health and socio-economic status) which can be linked to the speaker's identity via metadata or via automatic speaker recognition. Speech data may also be used for voice spoofing using voice cloning software. With firm backing from privacy legislation such as the European General Data Protection Regulation (GDPR), several initiatives are emerging to develop and evaluate privacy preservation solutions for speech technology. These include voice anonymization methods 2, which aim to conceal the speaker's voice identity without degrading the utility for downstream tasks, and speaker re-identification attacks 3, which aim to assess the resulting privacy guarantees, e.g., in the scope of the VoicePrivacy challenge series 4.

1 A. Nautsch, A. Jimenez, A. Treiber, J. Kolberg, C. Jasserand, E. Kindt, H. Delgado, M. Todisco, M. A. Hmani, M. A. Mtibaa, A. Abdelraheem, A. Abad, F. Teixeira, M. Gomez-Barrero, D. Petrovska, N. Chollet, G. Evans, T. Schneider, J.-F. Bonastre, B. Raj, I. Trancoso, and C. Busch, “Preserving privacy in speaker and speech characterisation,” Computer Speech and Language, vol. 58, pp. 441–480, 2019.

2 B. M. L. Srivastava, M. Maouche, M. Sahidullah, E. Vincent, A. Bellet, M. Tommasi, N. Tomashenko, X. Wang, and J. Yamagishi, “Privacy and utility of x-vector based speaker anonymization,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, to appear.

3 B. M. L. Srivastava, N. Vauquier, M. Sahidullah, A. Bellet, M. Tommasi, and E. Vincent, “Evaluating voice conversion-based privacy protection against informed attackers,” in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2802–2806, 2020.

4 N. Tomashenko, X. Wang, E. Vincent, J. Patino, B. M. L. Srivastava, P.-G. Noé, A. Nautsch, N. Evans, J. Yamagishi, B. O'Brien, A. Chanclu, J.-F. Bonastre, M. Todisco, and M. Maouche, “The VoicePrivacy 2020 Challenge: Results and findings,” Computer Speech and Language, vol. 74, pp. 101362, 2022.

Main activities

The first objective of this PhD is to improve the privacy-utility tradeoff by better disentangling speaker identity from other attributes, and better decorrelating the underlying dimensions. Solutions may rely on suitable generative or self-supervised models 5, 6 or on adversarial learning 7. The resulting privacy guarantees will be evaluated via stronger attackers, e.g., taking metadata into account.

The second objective is to extend the proposed audio-only approach to multimodal speech (audio, facial video, and gestures). Solutions will exploit existing facial anonymization technology 8. A key difficulty will be to preserve the correlations between modalities, which are essential for training multimodal voice processing systems.

Depending on the PhD student's skills, additional directions may also be explored, e.g., evaluating the proposed anonymization solutions in the context of federated learning.

5 L. Girin, S. Leglaive, X. Bie, J. Diard, T. Hueber, and X. Alameda-Pineda, “Dynamical variational autoencoders: A comprehensive review,” Now Foundations and Trends, 2021.

6 A. Baevski, H. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A framework for self-supervised learning of speech representations,” in Advances in Neural Information Processing Systems, pp. 12449–12460, 2020.

7 B. M. L. Srivastava, A. Bellet, M. Tommasi, and E. Vincent, “Privacy-preserving adversarial representation learning in ASR: Reality or illusion?” in Interspeech, pp. 3700–3704, 2019.

8 T. Ma, D. Li, W. Wang, and J. Dong, “CFA-Net: Controllable face anonymization network with identity representation manipulation,” arXiv preprint arXiv:2105.11137, 2021.


MSc in computer science, machine learning, or signal processing. Strong programming skills in Python/PyTorch. Prior experience in speech and video processing will be an asset.

Benefits package
  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

1982€ gross/month for the 1st and 2nd years.

2085€ gross/month for the 3rd year.

Monthly salary after taxes : around 1596.05€ for the 1st and 2nd years, 1678.99€ for the 3rd year (medical insurance included).

General Information
  • Theme/Domain : Language, Speech and Audio
  • Town/city : Villers lès Nancy
  • Inria Center : CRI Nancy - Grand Est
  • Starting date : 2023-10-01
  • Duration of contract : 3 years
  • Deadline to apply : 2023-07-02

Contacts
  • Inria Team : MULTISPEECH
  • PhD Supervisor : Emmanuel Vincent / [email protected]

About Inria

Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 200 agile project teams, generally run jointly with academic partners, include more than 3,500 scientists and engineers working to meet the challenges of digital technology, often at the interface with other disciplines. The Institute also employs numerous talents in over forty different professions. 900 research support staff contribute to the preparation and development of scientific and entrepreneurial projects that have a worldwide impact.

Instructions to apply

Defence Security : This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter such an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.

Recruitment Policy : As part of its diversity policy, all Inria positions are accessible to people with disabilities.

Warning : You must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
