Research Engineer - Domain Adaptation in NMT

June 15, 2023
Contract type: Fixed-term contract

Level of qualifications required: Graduate degree or equivalent

Function: Temporary scientific engineer


The chosen applicant will carry out research as part of the DadaNMT Emergence project, funded by Sorbonne Université. The aim of the project is to explore approaches to training MT models (i) for low-resource domains by using out-of-domain data (domain adaptation) and (ii) that are capable of handling texts from different domains (multi-domain models). We will apply our approaches across three domains: film subtitles (using different film genres), news (from different countries) and biomedical data. We will also test across different language pairs, focusing on at least the following three: English-French, English-German and English-Romanian. These pairs were chosen because training and test data are available for them across the domains and because they represent three different scenarios in terms of the amount of data available. Parallel data is freely available for all three domains: film subtitles from OpenSubtitles (Lison et al. 2018), news data from the WMT news translation shared tasks (Barrault et al. 2019) and biomedical data from the WMT biomedical MT shared tasks (Neves et al. 2019).


The aim of the work will be to explore directions for adapting machine translation (MT) to low-resource domains and/or developing MT models capable of handling several domains, while maintaining the model's performance on the domains that are well represented (i.e. avoiding catastrophic forgetting). Several directions could be explored, depending on the interests and experience of the candidate, including:

  • Curriculum learning: how best to order training examples so that the model learns more effectively (e.g. from simpler to more complex examples, transitioning from one domain to another, etc.) (Platanios et al., 2019; Zhan et al., 2021).

  • Meta-learning: how best to initialise a model so that it can adapt robustly to new domains, particularly those with few training examples (Sharaf et al., 2020; Zhan et al., 2021).

  • Using pretrained LLMs to see how well they can be adapted to translation for specific domains. This could involve either conversational LLMs or traditional LLMs. In the first case, potential approaches could include decomposing the translation of new examples into component tasks via chain-of-thought prompting (Wei et al., 2022; Wang et al., 2022), training on component tasks (Bursztyn et al., 2022), each of lower complexity (e.g. word translation, formulation, reformulation), and refining translations using adapted prompts. In the second case, the approach could rely on prompt selection and example selection.

    Main activities

    The main activities include reading the literature on the proposed topic, experimenting with baselines (previous work), and proposing and implementing new solutions. The work carried out will be presented both within the team and internationally (should it be accepted at a peer-reviewed conference or workshop).


    We are looking for an applicant with good experience in machine learning and natural language processing, and a strong interest in linguistics. The applicant must have a good level of written and spoken English.

    Benefits package
  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage
    General Information
  • Theme/Domain: Language, Speech and Audio
  • Town/city: Paris
  • Inria Center: Centre Inria de Paris
  • Starting date: 2023-08-01
  • Duration of contract: 4 months
  • Deadline to apply: 2023-06-15

    Contacts
  • Inria Team: ALMANACH
  • Recruiter: Bawden Rachel / [email protected]
