# PhD position in Machine Learning for Particle Accelerators

- Employer
- IJCLab, Orsay
- Location
- Paris, France
- Salary
- Unspecified
- Posting live until
- 3 Oct 2024


Particle accelerators play a pivotal role in modern physics, enabling the exploration of fundamental particles and phenomena. Ensuring the efficient and stable operation of these complex systems is crucial for the success of various scientific experiments. This research proposal aims to investigate the application of cutting-edge machine learning techniques, specifically Neural Ordinary Differential Equations (NODEs) and Reinforcement Learning (RL), and analog models of computation for anomaly detection in particle accelerators.

Particle accelerators consist of intricate components, and their optimal functioning is essential for obtaining accurate experimental results. Anomalies in accelerator performance can lead to data corruption, experimental failure, or even equipment damage. Traditional monitoring methods often fall short in identifying subtle deviations or predicting potential issues, necessitating the integration of advanced machine learning approaches.

From an application point of view, this research aims to develop novel anomaly detection methods for particle accelerators using machine learning techniques.

Specifically, the goals include:

- Exploring the application of Ordinary Differential Equations based models, and in particular Neural Ordinary Differential Equations (NODEs), to model the dynamic behavior of particle accelerators.
- Investigating Reinforcement Learning (RL) to enhance anomaly detection and optimize control strategies in real-time.
- Investigating methods based on polynomial ODEs (pODEs) in this context.
- Developing a comprehensive framework that integrates NODEs and RL to provide a robust and adaptive anomaly detection system for particle accelerators.

**General context: Computer Science/Models of deep learning**

Without contest, models and approaches from deep learning have revolutionized machine learning. It is well known that as the number of layers increases (so-called very deep models, sometimes with more than 100 or 1000 layers), the models become very hard to train. Among a plethora of options that have been considered, Residual Neural Networks (ResNets) [9] have clearly emerged as an important subclass of models. They mitigate the gradient issues [1] that arise when training deep neural networks. The idea in these models is to add skip connections between successive layers, an idea partially bio-inspired. Since a residual neural network won the ImageNet 2015 competition, this architecture has become the most cited neural network of the 21st century according to some studies (see references in Wikipedia). To date, the winners of this competition have been variations of such models.

Some authors, such as [17], proved that there is a mathematical explanation for their performance in practice: the discrete-time process used in these models can be shown to be the Euler discretization of some continuous-time Ordinary Differential Equation (ODE). The observed robustness and training properties then follow from the well-known robustness of ODEs with respect to perturbations of their dynamics and of their initial conditions.
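This correspondence can be made concrete: a residual update x_{k+1} = x_k + h·f(x_k) is exactly one explicit Euler step for dx/dt = f(x). A minimal sketch (the single tanh layer and the random weights are illustrative placeholders, not any specific ResNet):

```python
import numpy as np

def f(x, W):
    # Toy residual branch: a single tanh layer (illustrative placeholder).
    return np.tanh(W @ x)

def resnet_forward(x, Ws, h=1.0):
    # Residual network: x_{k+1} = x_k + h * f(x_k); plain ResNet uses h = 1.
    for W in Ws:
        x = x + h * f(x, W)
    return x

def euler_solve(x, W, t_end, steps):
    # Explicit Euler for dx/dt = f(x) with step h = t_end / steps.
    h = t_end / steps
    for _ in range(steps):
        x = x + h * f(x, W)
    return x

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 4))
x0 = rng.standard_normal(4)

# A 10-layer weight-tied residual stack with step h performs
# exactly the same computation as 10 Euler steps of size h.
depth, h = 10, 0.1
out_resnet = resnet_forward(x0, [W] * depth, h=h)
out_euler = euler_solve(x0, W, t_end=depth * h, steps=depth)
assert np.allclose(out_resnet, out_euler)
```

Letting the depth grow while shrinking the step recovers the continuous-time solution, which is the viewpoint behind Neural ODEs.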

It was later realized and proved mathematically that various efficient models are nothing but reformulations of discretization schemes for ODEs. For example, following [13], the architecture of PolyNet [18] can be viewed as an approximation of the backward Euler scheme solving the ODE u_t = f(u). FractalNet [12] can be read as a well-known Runge-Kutta scheme from numerical analysis. RevNet [7] can be interpreted as a simple forward Euler approximation of a simple continuous dynamical system. All these models are very deep, but the observation remains true for simpler models. For example, following [11], it transpires that the key features of the well-known GRU [6] or LSTM [10], over generic recurrent networks, are update rules that look suspiciously like discretized differential equations.

This led to the consideration of some models such as Neural ODE (NODE) [5], which can be seen as continuous versions of ResNet. While Neural ODEs do not necessarily improve upon the sheer predictive performance of ResNets, they offer the vast knowledge of ODE theory to be applied to deep learning research. For instance, the authors in [8] discovered that Neural ODEs are more robust for specific perturbations than convolutional neural networks. Moreover, inspired by the theoretical properties of the solution curves, they proposed a regulariser that improved the robustness of Neural ODE models even further.
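A Neural ODE forward pass can be sketched in a few lines: integrate a learned vector field f_θ from t0 to t1 with a standard solver. Below is a minimal sketch using a fixed-step RK4 integrator; the two-layer tanh network and its random weights are placeholders for illustration, not a trained model:

```python
import numpy as np

def vector_field(t, x, W1, b1, W2, b2):
    # Learned dynamics f_theta(t, x): a tiny two-layer MLP (placeholder weights).
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def node_forward(x0, params, t0=0.0, t1=1.0, steps=20):
    # Neural ODE forward pass: solve dx/dt = f_theta(t, x) from t0 to t1
    # with a fixed-step RK4 scheme (libraries typically use adaptive solvers).
    dt = (t1 - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        k1 = vector_field(t, x, *params)
        k2 = vector_field(t + dt / 2, x + dt / 2 * k1, *params)
        k3 = vector_field(t + dt / 2, x + dt / 2 * k2, *params)
        k4 = vector_field(t + dt, x + dt * k3, *params)
        x = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return x

rng = np.random.default_rng(42)
params = (0.5 * rng.standard_normal((8, 2)), np.zeros(8),
          0.5 * rng.standard_normal((2, 8)), np.zeros(2))
out = node_forward(np.array([1.0, 0.0]), params)
```

In practice the parameters are trained by backpropagating through the solver (or via the adjoint method of [5]); the sketch only shows the forward integration that replaces a stack of discrete layers.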

In relation to the physics context above, we have already started to explore NODEs and demonstrated their high performance. The setting was a PhD thesis about controlling the particle accelerator ThomX, a complex system that requires both static and dynamic control to produce X-rays. The solutions developed are based on Neural ODEs and turn out to perform well in this context. We want to explore the limits of this approach and whether it extends to other contexts, as described above.

In an orthogonal direction, Neural ODEs turn out to be an analog model of computation. However, the connection between these two worlds has not been explored yet, and we believe a real benefit could be obtained from very recent results in this field.

**Analog models of computation**

Today’s theoretical computer science, and in particular classical computability and complexity theory, mostly considers computations over a discrete space in discrete time. This aims at modeling today’s computers, which are digital machines working over bits. It covers today’s machines and classical models such as Turing machines, which work over words on a finite alphabet in discrete time.

However, machines where time is continuous can be considered and built. Such machines are analog and work over continuous quantities such as voltages. Notice that the first programmable computers ever built were analog machines. This includes, for example, the differential analyzers, which were first mechanical machines working over quantities encoded by shaft angles, and later electronic machines working over continuous voltages. Such machines were typically used to solve ordinary differential equations. This class also includes Neural ODE models.

It turns out that the corresponding computability and complexity theory has not received much attention. Models of computation where space is continuous while time remains discrete have been considered (see e.g. [2] or [16]), but these models are still discrete-time.

The purpose of this work is to focus on these models and their relations to deep learning models in the applicative context.

In the context of analog models of computation, for many reasons related to recent results, the equivalent of a Turing machine can be considered to be a polynomial ordinary differential equation (pODE). Indeed, (projections of) solutions of such pODEs, which can be considered the analog of computable functions, enjoy many stability properties similar to those of computable functions. All common analytic functions are in this class, an observation similar to the fact that all common functions in mathematics are computable. Such functions are stable under most operations (addition, multiplication, subtraction, division, inverse, composition, ODE solving, etc.). Some analytic computable functions are known not to be in this class. However, if a modern definition of computability is considered for pODEs, then the functions computable by Turing machines and by pODEs coincide.
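As a small illustration of programming with pODEs (a sketch using plain Euler integration, not any specific formalism from the literature): the sine function is the first component of the solution of the polynomial system y1' = y2, y2' = -y1 with y1(0) = 0, y2(0) = 1.

```python
import math

def podes_sin(t_end, steps=100_000):
    # Compute sin(t_end) as the first component of the polynomial ODE
    # y1' = y2, y2' = -y1 with y1(0) = 0, y2(0) = 1 (explicit Euler).
    dt = t_end / steps
    y1, y2 = 0.0, 1.0
    for _ in range(steps):
        y1, y2 = y1 + dt * y2, y2 - dt * y1
    return y1

print(abs(podes_sin(math.pi / 2) - 1.0) < 1e-3)  # close to sin(pi/2) = 1
```

The right-hand sides are polynomials in (y1, y2), which is exactly what makes the system a pODE; no transcendental function appears in the dynamics, only in the solution.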

The project involves Olivier Bournez, an expert on computability and complexity issues for continuous-time models of computation, and in particular models based on ordinary differential equations. He knows how to program with ordinary differential equations and how to measure complexity for such models: see e.g. [4, 3, 14] for surveys. He has used this knowledge to solve open problems in bioinformatics, applied mathematics, and other fields.

**Description of the work**

We propose to combine the two approaches to explore how far the results from analog models of computation can be used in the context of deep learning. This will be guided by the applicative context described above.

In the applicative context:

- NODEs offer a unique approach to modeling continuous-time dynamics. By applying NODEs to accelerator data, we aim to capture the underlying system dynamics more accurately than traditional discrete-time models.
- Reinforcement Learning (RL): RL techniques will enable the anomaly detection system to adapt and optimize its actions based on real-time feedback. This includes exploring algorithms such as Proximal Policy Optimization (PPO) and Deep Deterministic Policy Gradients (DDPG).
- Polynomial ODEs provide a subclass of ODEs whose parameterization is easy and that has not yet been explored in the context of learning.
- Integration: The developed NODE and RL models will be integrated into a unified framework. This integration aims to leverage the strengths of both approaches, providing a more comprehensive and effective anomaly detection system.
- Beyond practical outcomes, we expect foundational results. For example, we aim to contribute to the following questions:
- Can we learn polynomial ODEs instead of using today’s usual techniques for learning ODEs? This could provide a nice alternative with clear theoretical foundations. We propose to explore the learnability of dynamical systems such as the Lorenz attractor. We proved it cannot be learned using the classical approach, but we believe it could be learned by working with polynomial ODEs rather than the generic ODEs of the classical approach.
- As another example: can the universal approximation theorem be related to the complexity of the function involved? We believe it is possible to prove that a polynomial-time computable function corresponds exactly to a depth-2 neural network whose coefficients can be computed in polynomial time. Such networks have arbitrary width, as in the classical proof of the approximation theorem. We believe this also holds for fixed width but arbitrary depth, and then possibly for ResNet networks.
- All these statements can lead to a hierarchy of hardness (a kind of Kolmogorov complexity) for the learnability of functions according to their complexity. These remain theoretical statements. However, we believe this can also lead to practical advances, such as the possibility of learning functions that currently cannot be learned, since current approaches do not use what has long been known in the context of analog models of computation.

Please contact us if you are interested or have any questions.

**Involved teams:**

1. Johanne Cohen: CNRS, LISN, johanne.cohen@universite-paris-saclay.fr

• Computer scientist, LISN. Already working with Hayg Guler (2).

• Expertise: algorithms, deep learning methods for Neural ODEs (work done with Hayg Guler)

2. Hayg Guler: CNRS, IJCLab, IN2P3, hayg.guler@ijclab.in2p3.fr

• Physicist

• Expertise: particle accelerators, optimization, learning methods

3. Olivier Bournez: Professor, Ecole Polytechnique, bournez@lix.polytechnique.fr

• Computer scientist. His research concerns analog models of computation.

• Expertise: analog models of computation, computation with ODEs, Neural ODEs
