2023-05787 - PhD Position F/M Privacy-preserving decentralized learning through Model fragmentation and Private Aggregation
Contract type: Fixed-term contract
Level of qualifications required: Graduate degree or equivalent
Function: PhD Position
About the research centre or Inria department
The Inria Rennes - Bretagne Atlantique Centre is one of Inria's eight centres and has more than thirty research teams. The Inria Centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher-education players, laboratories of excellence, technological research institutes, etc.
Context
The WIDE team is involved in a number of projects that tackle related problems. In the context of the SOTERIA H2020 project, Davide Frey is currently working on decentralized and privacy-preserving machine learning algorithms using trusted execution environments. This thesis provides a complementary approach, and there is thus a concrete possibility of directly applying the results of this PhD thesis to the Personal Data Vault being developed by the SOTERIA project. Davide Frey is also active, with François Taiani, in the FedMalin Inria Challenge, which also investigates decentralized machine learning platforms. In particular, in the context of FedMalin, WIDE is currently developing a library for decentralized machine learning that this thesis can exploit, and we envision close collaboration with other teams involved in the FedMalin project. Beyond the partners of the SOTERIA and FedMalin projects, we are planning to collaborate with Anne-Marie Kermarrec's group at EPFL.
Assignment
Machine learning consists in producing (learning) a computer-based function (usually referred to as a model) from examples (training data). The accuracy and quality of the resulting model are usually directly related to the size of the training data, but training from very large datasets raises at least two problems. First, very large training sets require substantial computing power to train the model in a reasonable time. Second, as machine learning is increasingly applied to sensitive and personal data (e.g. health records, personal messages, user preferences, browsing histories), exposing this data to the learning algorithm raises far-reaching privacy-protection concerns and carries important risks of privacy violation.
These two problems have prompted the emergence of a range of distributed learning techniques, which seek to distribute the learning effort on many machines to scale the learning process and limit privacy leaks by keeping sensitive data on the learning devices. Two related strategies have, in particular, emerged to address these challenges: Federated Learning, initially promoted by Google, and Decentralized Learning, which forgoes entirely any centralized entity in the learning process.
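As a toy illustration of the decentralized strategy described above (hypothetical, not the project's actual protocol): nodes alternate a local update on their private data with model averaging over a randomly sampled neighbour, so that only model parameters, never raw data, leave a device. All function names and the quadratic toy loss are assumptions for the sketch.

```python
import random

def local_update(model, data, lr=0.1):
    # One gradient step on a toy quadratic loss (w - target)^2,
    # standing in for local training on private data.
    grad = [2 * (w, t)[0] - 2 * t for w, t in zip(model, data)]
    return [w - lr * g for w, g in zip(model, grad)]

def gossip_round(models, neighbours):
    """Each node averages its model with one randomly sampled
    neighbour, as provided by a random peer-sampling service."""
    new = [m[:] for m in models]
    for i, peers in neighbours.items():
        j = random.choice(peers)
        new[i] = [(a + b) / 2 for a, b in zip(models[i], models[j])]
    return new

# Three nodes on a fully connected toy topology; each holds a
# private "target" vector that plays the role of its local data.
random.seed(0)
targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
models = [[0.0, 0.0] for _ in targets]
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

for _ in range(50):
    models = [local_update(m, t) for m, t in zip(models, targets)]
    models = gossip_round(models, neighbours)

# Models drift toward the average of all targets without any node
# revealing its raw data -- only model parameters are exchanged.
```

No coordinator is involved: convergence emerges purely from repeated pairwise averaging, which is what removes the central entity that federated learning still requires.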
Unfortunately, recent works have shown that, in spite of their promises, both of these approaches can be subject to privacy attacks, such as membership inference, data reconstruction, or attribute inference, that make it possible for malicious participants to access private and/or sensitive information through the learning process. This PhD aims to improve the privacy protection granted by decentralized learning by exploring how model fragmentation, a technique developed by the WIDE team within the ANR Pamela project (2016-2020), can be combined with private aggregation and random peer sampling, two of the strategies successfully applied to P2P networks.
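To make one of these threats concrete, membership inference can be sketched as a simple loss-threshold test: models tend to fit the records they were trained on better than unseen ones, so an attacker who can evaluate the model's loss on a candidate record can guess whether it was in the training set. The code below is a deliberately minimal toy, not any published attack; the squared-error "loss" and the model values are assumptions for illustration.

```python
def loss(model, record):
    # Toy squared-error loss between model parameters and a record,
    # standing in for the model's loss on that record.
    return sum((w - x) ** 2 for w, x in zip(model, record))

def membership_guess(model, record, threshold=0.5):
    """Loss-threshold membership inference: records seen during
    training tend to incur a lower loss than unseen records."""
    return loss(model, record) < threshold

# A toy model that has overfitted toward its single training record.
model = [0.9, 0.1]
member, non_member = [1.0, 0.0], [0.0, 1.0]

print(membership_guess(model, member))      # low loss -> guessed member
print(membership_guess(model, non_member))  # high loss -> guessed outsider
```

In federated and decentralized settings, every participant sees intermediate models, which is precisely why such inference attacks are a first-class concern for this thesis.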
Main activities
The PhD student will investigate:
(1) how to organize model fragmentation to obtain privacy protection gains (e.g. which parts of a model are more sensitive than others),
(2) how to combine it with other obfuscation mechanisms (private epidemic aggregation, masking, randomization), and
(3) how to characterize the protection it brings in terms of privacy, and the costs (networks, time, loss of model quality) it causes.
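The combination of fragmentation and private aggregation targeted by points (1)-(3) can be illustrated with a toy sketch: a model is split into disjoint fragments so no single peer sees the whole model, and pairwise additive masks (the core trick behind secure-aggregation-style protocols) cancel in the sum, so the aggregate is exact while individual contributions stay hidden. All function names and the fixed-size models are hypothetical; this is not the WIDE team's actual scheme.

```python
import random

def fragment(model, n_parts):
    """Split a parameter vector into disjoint fragments, so that
    each peer can be sent only a part of the model."""
    size = len(model) // n_parts
    return [model[i * size:(i + 1) * size] for i in range(n_parts)]

def pairwise_masks(n_nodes, dim, rng):
    """Additive masks with masks[i][j] == -masks[j][i], so that all
    masks cancel exactly when every node's contribution is summed."""
    masks = [[[0.0] * dim for _ in range(n_nodes)] for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            m = [rng.uniform(-1, 1) for _ in range(dim)]
            masks[i][j] = m
            masks[j][i] = [-x for x in m]
    return masks

rng = random.Random(42)
models = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
dim = 4
masks = pairwise_masks(3, dim, rng)

# Fragmentation: node 0's model split into two halves.
frags = fragment(models[0], 2)  # [[1.0, 2.0], [3.0, 4.0]]

# Private aggregation: each node publishes only its masked model.
masked = []
for i, m in enumerate(models):
    total_mask = [sum(masks[i][j][k] for j in range(3)) for k in range(dim)]
    masked.append([w + t for w, t in zip(m, total_mask)])

# Individual masked models look random, but the mean is recovered
# exactly because the pairwise masks cancel in the sum.
aggregate = [sum(node[k] for node in masked) / 3 for k in range(dim)]
```

A hybrid protocol in the spirit of point (2) could, for instance, run masked aggregation only on the fragments identified as most sensitive in point (1) and exchange the rest in clear text, trading protection against cost, which is exactly the characterization asked for in point (3).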
We envision the following high-level work plan.
M0-M3: The PhD student will perform a thorough state-of-the-art of the existing attacks and countermeasures in the context of federated and decentralized learning.
M4-M12: The student will then leverage the state of the art to design and implement a benchmark suite that incorporates the major attack techniques and will apply it to existing decentralized learning solutions. This will allow them to identify the strengths and pitfalls of existing solutions.
M13-M24: They will then leverage the developed benchmark to test the privacy guarantees offered by the current model-fragmentation approach developed by the WIDE team. In particular, they will evaluate several fragmentation strategies in combination with different topology-management approaches.
M25-M30: The student will then focus on how to combine fragmentation with privacy-preserving averaging. The goal here is to design a hybrid protocol that can combine privacy-preserving steps with clear-text steps on fragmented models.
M31-M36: The final months will be devoted to writing the manuscript and to finalizing the publications of the thesis results.
Good programming skills and a willingness to learn about new techniques (decentralized machine learning and privacy protection) are also crucial, as well as good writing skills and the ability to propose, present, and discuss new ideas in a collaborative setting.
Benefits package
Monthly gross salary of 2051 euros for the first and second years and 2158 euros for the third year.
General Information
Theme/Domain: Distributed Systems and Middleware; Statistics (Big data) (BAP E)
Town/city : Rennes
The candidate recruited for this PhD should have a Master's degree in Computer Science or equivalent, with a solid algorithmic and systems background, particularly regarding at least one of the following: distributed computer systems, machine learning, and/or mobile computing.
About Inria
Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 200 agile project teams, generally run jointly with academic partners, include more than 3,500 scientists and engineers working to meet the challenges of digital technology, often at the interface with other disciplines. The Institute also employs numerous talents in over forty different professions. 900 research support staff contribute to the preparation and development of scientific and entrepreneurial projects that have a worldwide impact.
Instructions to apply
Please submit online: your resume, cover letter and, if applicable, letters of recommendation.
For more information, please contact [email protected]
Defence Security: This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter such an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.
Recruitment Policy : As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Warning: You must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.