Optimal Transport for Anomaly Detection and Change Point Detection

ABG-124776 Thesis topic
2024-06-27 Public funding alone (i.e. government, region, European, international organization research grant)
LITIS, Université de Rouen
- Normandie - France
  • Computer science
  • Mathematics
  • Digital
Optimal Transport, Anomaly Detection, Change Point Detection

Topic description

PhD position in

Optimal Transport for Anomaly Detection and Change Point Detection

  • Location: LITIS, University of Rouen, Saint Etienne du Rouvray
  • Duration: 36 months
  • Expected starting: October/November 2024

Optimal transport (OT) [1] is a powerful framework for defining and computing distances between distributions (also known as the Wasserstein or Earth mover's distance). Computation is tractable thanks to the Sinkhorn algorithm, of which an online version has recently been proposed [2]. Furthermore, OT provides a transport map between the distributions. In this thesis, we intend to use OT theory to design statistical tests and algorithms for Out-Of-Distribution (OOD) detection and Change Point Detection (CPD) in a non-parametric setting, operating over sliding windows. Specifically, we will aim to localize outlying samples in an online manner. Even when abnormal samples occur at low rates, detecting and localizing them efficiently can be paramount.
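As an illustration of the Sinkhorn iterations mentioned above, here is a minimal sketch in plain Python. It is not the online variant of [2] and not the project's method. It assumes 1D samples, uniform weights, a squared-distance cost, and a fixed regularization `eps`; the function names `sinkhorn_plan` and `transport_cost` are hypothetical. It returns an entropy-regularized approximation of the OT cost, together with the coupling that later steps of the thesis would inspect.

```python
import math

def sinkhorn_plan(x, y, eps=1.0, n_iter=200):
    """Entropic OT coupling between two 1D samples with uniform weights.

    Assumptions of this sketch: squared-distance cost, fixed eps,
    plain (non log-domain) updates, so very small eps may underflow.
    """
    n, m = len(x), len(y)
    a = [1.0 / n] * n  # uniform weights on the first sample
    b = [1.0 / m] * m  # uniform weights on the second sample
    cost = [[(xi - yj) ** 2 for yj in y] for xi in x]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    return plan, cost

def transport_cost(x, y, eps=1.0, n_iter=200):
    """Cost of the entropic coupling (an approximation of the OT cost)."""
    plan, cost = sinkhorn_plan(x, y, eps, n_iter)
    return sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))

if __name__ == "__main__":
    reference = [0.0, 0.5, 1.0, 1.5]
    print(transport_cost(reference, [0.1, 0.6, 1.1, 1.6]))  # nearby window
    print(transport_cost(reference, [5.0, 5.5, 6.0, 6.5]))  # shifted window
```

The shifted window yields a larger cost than the nearby one. A practical implementation would likely use log-domain updates or an existing OT library rather than these explicit loops.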

 

First, the goal of this PhD is to devise methods for spotting abnormal samples in the distributions. The OT discrepancy between two distributions only assesses how close they are, whereas detailed assignment information resides in the transport (coupling) map. The partial or unbalanced formulation of OT transports only a given fraction α of the total probability mass [3]. By studying how the resulting assignment can be used in out-of-distribution and outlier scenarios, the PhD student will design statistical tests for estimating the proportion α of out-of-distribution samples, for example by investigating randomization over varying values of α via a bootstrap procedure on the samples of the compared sliding windows.

Another promising research direction is to investigate the statistics of the distance between the “reference” marginals (computed from normal operating conditions) and the reconstructed marginals w.r.t. λ to implement statistical tests for localizing abnormal samples.
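To make the idea of reading outliers off the coupling concrete, the sketch below uses an entropic unbalanced OT formulation with KL-relaxed marginals, solved by generalized Sinkhorn scaling. This is one possible instance of the "unbalanced" setting, not the partial OT solver of [3] and not the statistical test the thesis aims to design. The parameters `eps` and `rho`, the squared-distance cost, and the names `unbalanced_plan` and `received_mass` are assumptions made for illustration. The mass each window sample receives is its reconstructed marginal; samples that receive far less than their nominal weight are outlier candidates.

```python
import math

def unbalanced_plan(ref, window, eps=1.0, rho=1.0, n_iter=200):
    """Entropic unbalanced OT coupling with KL-relaxed marginals.

    Assumptions of this sketch: 1D samples, uniform weights,
    squared-distance cost, hypothetical parameters eps and rho.
    """
    n, m = len(ref), len(window)
    a = [1.0 / n] * n
    b = [1.0 / m] * m
    # Gibbs kernel taken relative to the product of the weights.
    kernel = [[a[i] * b[j] * math.exp(-(ref[i] - window[j]) ** 2 / eps)
               for j in range(m)] for i in range(n)]
    power = rho / (rho + eps)  # exponent of the KL-relaxed scaling update
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        u = [(a[i] / sum(kernel[i][j] * v[j] for j in range(m))) ** power
             for i in range(n)]
        v = [(b[j] / sum(kernel[i][j] * u[i] for i in range(n))) ** power
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def received_mass(plan):
    """Reconstructed marginal on the window: mass received by each sample."""
    return [sum(row[j] for row in plan) for j in range(len(plan[0]))]

if __name__ == "__main__":
    reference = [-0.4, -0.2, 0.0, 0.2, 0.4]   # normal operating conditions
    window = [-0.3, 0.0, 0.3, 10.0]           # last sample is far away
    for value, mass in zip(window, received_mass(unbalanced_plan(reference, window))):
        print(value, mass)
```

In this toy run the distant sample receives almost no mass, while the three nearby samples keep a share close to their nominal weight. Turning such a gap into a calibrated test, for instance through a bootstrap over windows, is the open question described above.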

 

Second, the PhD student will investigate Change Point Detection and concept drift in incomparable spaces. A large body of work has been dedicated to two-sample tests, including those based on Maximum Mean Discrepancy or OT [4]. Some preliminary studies have demonstrated the value of OT for sequential anomaly detection, such as [5, 6]. Recently, sequential CPD has been tackled using multivariate ranks defined via OT maps [7]. All these methods are restricted to distributions supported in the same space, and few works consider CPD in incomparable spaces. One can use the Gromov-Wasserstein distance to derive statistical tests, but at a high computational cost. Alternatively, in [8], local distributions of distances computed in each space are used to define a pseudo-metric between distributions in unregistered spaces, leading to a statistical two-sample test. In ODD, we aim to tackle the challenging problem of sequential CPD in incomparable spaces by investigating two different approaches.
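For the simpler case where both windows live in the same space, an OT-based two-sample test over sliding windows can be sketched as follows. This is an illustration only, not one of the methods of [5, 6, 7]. It assumes a 1D stream and equal-size windows, so that the Wasserstein-1 distance reduces to matching sorted samples, and it calibrates the statistic with a permutation test. The names `wasserstein_1d`, `permutation_pvalue`, and `scan`, as well as the default of 199 permutations, are hypothetical choices.

```python
import random

def wasserstein_1d(x, y):
    """W1 distance between two equal-size 1D samples (sorted matching)."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal-size windows")
    return sum(abs(p - q) for p, q in zip(sorted(x), sorted(y))) / len(x)

def permutation_pvalue(x, y, n_perm=199, rng=None):
    """Permutation p-value for the null hypothesis 'same distribution'."""
    rng = rng or random.Random(0)
    observed = wasserstein_1d(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign samples to the two windows at random
        if wasserstein_1d(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)

def scan(stream, window, n_perm=199, seed=0):
    """Compare the window before and after each time index t."""
    rng = random.Random(seed)
    results = []
    for t in range(window, len(stream) - window + 1):
        past = stream[t - window:t]
        future = stream[t:t + window]
        results.append((t, permutation_pvalue(past, future, n_perm, rng)))
    return results

if __name__ == "__main__":
    data_rng = random.Random(1)
    stream = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
    stream += [data_rng.gauss(3.0, 1.0) for _ in range(40)]  # mean shift at t = 40
    for t, p in scan(stream, window=20):
        print(t, p)
```

Small p-values around the shift suggest a change point. Because the test is repeated at every index, a real sequential procedure would also need to control for multiple testing, and none of this carries over directly to incomparable spaces.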

 

The objectives are as follows:

  • Design of statistical tests, with application to anomaly detection on toy data
  • Integration into a deep-learning framework for real data from well-known benchmarks
  • Change Point Detection in incomparable spaces

This project is part of the ODD: Online Deep anomaly Detection project funded by ANR.

 

[1] G. Peyré, M. Cuturi, et al., “Computational optimal transport: With application to data science,” Foundations and Trends® in Machine Learning.

[2] A. Mensch and G. Peyré, “Online Sinkhorn: Optimal transport distances from sample streams,” in NeurIPS, 2020.

[3] L. Chapel, M. Z. Alaya, and G. Gasso, “Partial Optimal Transport with Applications on Positive-Unlabeled Learning,” in NeurIPS, 2020.

[4] A. Ramdas, N. Garcia Trillos, and M. Cuturi, “On Wasserstein two-sample testing and related families of nonparametric tests,” Entropy, vol. 19, no. 2, p. 47, 2017.

[5] A. Alaoui-Belghiti, S. Chevallier, and E. Monacelli, “Unsupervised anomaly detection using optimal transport for predictive maintenance,” in ICANN, pp. 686–697, 2019.

[6] K. C. Cheng, S. Aeron, M. C. Hughes, E. Hussey, and E. L. Miller, “Optimal transport based change point detection and time series segment clustering,” in IEEE ICASSP, pp. 6034–6038, 2020.

[7] M. Werenski, S. B. Masud, J. M. Murphy, and S. Aeron, “On Rank Energy Statistics via Optimal Transport: Continuity, Convergence, and Change Point Detection,” arXiv:2302.07964, 2023.

[8] C. Brécheteau, “A statistical test of isomorphism between metric-measure spaces using the distance-to-a-measure signature,” Electronic Journal of Statistics, vol. 13, no. 1, pp. 795–849, 2019.

Funding category

Public funding alone (i.e. government, region, European, international organization research grant)

Funding further details

ANR

Presentation of host institution and host laboratory

LITIS, Université de Rouen


LITIS (Laboratoire d’Informatique, du Traitement de l’Information et des Systèmes) is a research unit (UR 4108) in information science and technology attached to Université de Rouen Normandie (URN), Université Le Havre Normandie (ULHN), and Institut National des Sciences Appliquées de Rouen Normandie (INSARN). LITIS was founded in January 2006 out of the desire of the members of the ICT (STIC) laboratories of Haute-Normandie, and of the three supervising institutions, to join forces in order to build on existing synergies and strengthen their visibility.

Understanding the deep nature of information and its representation is at the heart of the LITIS scientific project, which covers a broad spectrum of ICT, from fundamental research to applied fields. The LITIS approach is resolutely multidisciplinary, bringing together practitioners and theoreticians at the junction of computer science, artificial intelligence, signal and image processing, and mathematics, with applications in intelligent mobility systems, health information processing, and heritage promotion.

PhD title

PhD in Computer Science (Doctorat en Informatique)

Country where you obtained your PhD

France

Institution awarding doctoral degree

Université de Rouen

Graduate school

Mathématiques, Information, Ingénierie des Systèmes (MIIS)

Candidate's profile

  • Master's degree or engineering degree in mathematical engineering or data science
  • Solid programming skills in Python
  • Good knowledge of statistics

 

2024-08-31