Toward more frugal deep learning architectures with mixtures of experts
Recent research in deep learning has led to impressive results on many computer vision tasks (object recognition and detection, image generation from text, etc.). A large part of the quality of these results comes from the use of huge parametric models (e.g., GPT-3 has 175 billion parameters) trained on huge databases (for example, LAION-5B contains more than 5 billion image/text pairs). The size of these models and datasets raises important questions about the computational resources needed to train and deploy them, since only major economic players can now afford to do so. In addition, the computing power and memory consumed by such technology has a strong environmental impact. It is therefore important to find new approaches that are more frugal and simpler to train, while maintaining at least today's performance, in order to better respond to the challenges of tomorrow's AI.
Recent works (Rajbhandari et al. 2022; Lewis et al. 2021; Fedus et al. 2022) have shown the relevance of mixture-of-experts (MoE) models (Jacobs et al. 1991; Masoudnia and Ebrahimpour 2014) for building powerful yet resource-efficient models. These approaches were proposed in the context of language models. The thesis aims at generalizing these works to other modalities, such as images. We plan to build on MoE models introduced in the image domain, such as (Chen et al. 2019; Wang et al. 2019), and study the developments that the above-mentioned recent literature can bring.
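To illustrate the routing principle behind such MoE layers (this sketch is not part of the advertised project; the class name, dimensions, and random weights are hypothetical placeholders), a top-1 gated mixture of experts in the spirit of Switch Transformers can be written in a few lines: a gate scores the experts, and only the single best-scoring expert actually computes, so the cost per input stays constant as experts are added.

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class SparseMoE:
    """Top-1 gated mixture of experts: each input is routed to a single
    expert, so compute per input does not grow with the number of experts."""
    def __init__(self, dim, n_experts):
        # Random linear gate and expert weights (untrained placeholders).
        self.gate = [[random.gauss(0, 1) for _ in range(dim)]
                     for _ in range(n_experts)]
        self.experts = [[[random.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(n_experts)]

    def forward(self, x):
        # One gating logit per expert, then a softmax over experts.
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in self.gate]
        probs = softmax(logits)
        k = max(range(len(probs)), key=lambda i: probs[i])  # top-1 routing
        # Only the selected expert runs; its output is scaled by its gate prob.
        y = [probs[k] * sum(w * xi for w, xi in zip(row, x))
             for row in self.experts[k]]
        return y, k

moe = SparseMoE(dim=4, n_experts=3)
y, chosen = moe.forward([1.0, -0.5, 0.2, 0.7])
```

Real MoE layers add a load-balancing loss so the gate does not collapse onto a single expert, but the sparse-routing idea is the same.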
During this thesis, we will be interested in building models whose architecture (e.g., their number of layers) can vary dynamically depending on the difficulty of the examples to be processed or on computation-time constraints. Similar to the cascade techniques used with AdaBoost classifiers (Freund and Schapire 1997), this type of architecture could quickly classify most images and deploy more resources only for "difficult" examples, i.e., those close to the decision boundary. The study will focus on defining new network architectures that evolve with the examples processed and can adapt their energy cost to the difficulty of the task at hand.
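The cascade idea above can be sketched as a simple early-exit loop (a toy illustration, not the thesis's actual method; the stages, threshold, and heuristic classifiers below are all hypothetical): stages are tried in order of increasing cost, and inference stops as soon as one stage is confident enough, so easy inputs consume less compute.

```python
def early_exit_predict(x, stages, threshold=0.9):
    """Run classifier stages in order of increasing cost; stop as soon as
    one stage's confidence exceeds the threshold."""
    for depth, stage in enumerate(stages, start=1):
        label, confidence = stage(x)
        if confidence >= threshold:
            return label, depth
    # No stage was confident: fall back to the last stage's answer.
    return label, depth

# Toy stages: a cheap heuristic first, then a "full" model.
cheap = lambda x: ("positive" if x > 0 else "negative", min(1.0, abs(x)))
full  = lambda x: ("positive" if x > 0 else "negative", 1.0)

easy = early_exit_predict(2.0, [cheap, full])   # exits at depth 1
hard = early_exit_predict(0.1, [cheap, full])   # reaches depth 2
```

In a neural network, the "stages" would be intermediate classification heads attached at different depths, and the open research question is how to train them jointly so the confidence estimates are reliable.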
References
- Yoav Freund and Robert E. Schapire (1997), "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119-139.
- Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan and Geoffrey E. Hinton (1991), "Adaptive Mixtures of Local Experts", Neural Computation, vol. 3, no. 1, pp. 79-87, doi: 10.1162/neco.1991.3.1.79.
- Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley and Yuxiong He (2022), "DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale", Proceedings of the 39th International Conference on Machine Learning, PMLR 162:18332-18346.
- Mike Lewis, Shruti Bhosale, Tim Dettmers, Naman Goyal and Luke Zettlemoyer (2021), "BASE Layers: Simplifying Training of Large, Sparse Models", ICML 2021.
- William Fedus, Barret Zoph and Noam Shazeer (2022), "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity", Journal of Machine Learning Research.
- Saeed Masoudnia and Reza Ebrahimpour (2014), "Mixture of experts: a literature survey", Artificial Intelligence Review, vol. 42, no. 2, pp. 275-293.
- Zhourong Chen, Yang Li, Samy Bengio and Si Si (2019), "You Look Twice: GaterNet for Dynamic Filter Selection in CNNs", CVPR 2019.
- Xin Wang, Fisher Yu, Lisa Dunlap, Yi-An Ma, Ruth Wang, Azalia Mirhoseini, Trevor Darrell and Joseph E. Gonzalez (2019), "Deep Mixture of Experts via Shallow Embedding", UAI 2019.
Candidates must have an M.Sc. or engineering degree in a field related to computer science, electrical engineering, or applied mathematics, with strong programming skills (in particular with deep learning frameworks). Experience with image processing will be a plus. Candidates are expected to be able to write scientific reports and communicate research results at conferences in English.
Information and application
The position starts as soon as possible, with a gross salary of 32k€, and is located in Caen, France. Applications should include the following documents in electronic format: i) a short motivation letter stating why you are interested in this project; ii) a detailed CV describing your past research background related to the position; iii) transcripts of master's degrees; iv) contact information for three references (do not include the reference letters with your application, as we will only request them for short-listed candidates).
Please send your application package to Alexis Lechervy and Frederic Jurie.
Ideally located in the heart of Normandy, two hours from Paris and just 10 minutes away from the beaches, Caen, William the Conqueror's hometown, is a lively and dynamic city.
Funding category: Contrat doctoral
PhD title: Informatique
PhD Country: France