ENACT - Language- and speaker-independent generic articulatory model of the vocal tract

ABG-129290
ADUM-62379
Thesis topic
08/03/2025 Doctoral contract
Université de Lorraine
Vandoeuvre-lès-Nancy cedex - France
Keywords: artificial intelligence, automatic speech processing, real-time MRI, speech synthesis

Subject description

The proposed PhD project aims to improve multilingual speech synthesis by taking into account the temporal dynamics of the vocal tract. Existing methods rely on static phoneme representations that fail to capture the anticipation and coarticulation phenomena essential to natural speech. The goal of the project is to build a dynamic model of the vocal tract that can adapt to any language and any speaker. The work relies on real-time MRI data, which make it possible to visualize the evolution of the vocal tract at a frequency of 50 Hz.
The project is divided into three stages: 1) anatomical registration of the MRI data, to align them in a single anatomical reference frame; 2) construction of a generic articulatory model integrating the dynamics of the different languages and speakers; 3) adaptation of this model to a language not present in the initial database.


This PhD offer is provided by the ENACT AI Cluster and its partners. Find all ENACT PhD offers and actions on https://cluster-ia-enact.ai/.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------

CONTEXT
Current methods for multilingual acoustic speech synthesis [1] rely on static phoneme representations built from phonological databases. Although these representations allow the phonemes of all languages to be "immersed" in a single space, so that acoustic databases can be merged to synthesize speech for under-resourced languages, they do not capture the temporal dynamics of the vocal tract corresponding to the anticipation and coarticulation phenomena of natural speech. Anticipation and coarticulation [2] are essential for the realization of phonetic contrasts. Moreover, articulatory gestures depend on individual anatomy (the shape of the hard palate, for instance) and require millimetric precision to guarantee the expected acoustic properties.

OBJECTIVE
This project aims to synthesize the temporal evolution of the vocal tract for any language and any speaker. It falls within the field of articulatory synthesis, seeking to model and simulate the physical process of human speech production via advanced approaches.

The work will make use of real-time MRI databases [3] (see references in the PDF or the French version), which provide images of the evolving geometric shape of the vocal tract in the midsagittal plane at a frequency of 50 Hz. This frame rate is sufficient to capture articulatory gestures during speech production.
The task will be to build a dynamic model of the vocal tract that can be adapted to a specific language and speaker from these data.

WORK
The work will involve three stages:
(i) anatomical registration of the real-time MRI data, with the aim of representing all gestures in a single anatomical reference frame;
(ii) construction of a generic articulatory model merging the dynamics of the languages and speakers in the database used;
(iii) adaptation of the generic model to a language not included in the original database.

The first step, anatomical registration, relies on the search for anatomical points that are visible and robustly identifiable in the MRI images. Among the many registration techniques available, we prefer those that explicitly identify anatomical points, so that an anatomical transformation can be linked to the articulators concerned.
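As a minimal illustration of landmark-based registration, the sketch below rigidly aligns a set of 2D anatomical landmarks onto a reference configuration with an orthogonal Procrustes fit. This is only one of the landmark-explicit techniques the project might consider; the function name and the similarity-transform (rotation, translation, uniform scale) assumption are ours, not the project's.

```python
import numpy as np

def procrustes_align(landmarks, reference):
    """Align 2D anatomical landmarks onto a reference configuration with a
    similarity transform (rotation + translation + uniform scale) estimated
    by orthogonal Procrustes analysis.
    landmarks, reference: (N, 2) arrays of corresponding points."""
    mu_l, mu_r = landmarks.mean(axis=0), reference.mean(axis=0)
    L, R = landmarks - mu_l, reference - mu_r          # center both point sets
    U, s, Vt = np.linalg.svd(L.T @ R)                  # SVD of the cross-covariance
    d = np.sign(np.linalg.det(U @ Vt))                 # guard against reflections
    D = np.diag([1.0, d])
    rot = U @ D @ Vt                                   # optimal rotation
    scale = (s * np.diag(D)).sum() / (L ** 2).sum()    # optimal uniform scale
    return scale * L @ rot + mu_r                      # landmarks in the reference frame
```

Applied frame by frame, such a fit would bring every speaker's gestures into a single anatomical reference frame while keeping an explicit link between the transform and the landmark points used.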

The second step is to develop a generic dynamic model capable of taking all places of articulation into account. The model we built previously [3] used discrete phonetic labels, which limits it to languages whose places of articulation correspond exactly to the phonemes of the database language. To obtain a generic model, we need to move to a continuous coding covering the entire vocal tract.
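To make the contrast between discrete and continuous coding concrete, here is a toy sketch: a one-hot phoneme label is tied to one language's inventory, whereas a continuous constriction profile over normalized positions along the tract puts every language in the same space. The 64-point grid, the Gaussian bump shape, and all numeric places are our illustrative assumptions, not the project's choices.

```python
import numpy as np

N_POINTS = 64  # assumed sampling of the vocal-tract midline (glottis -> lips)

def one_hot_phoneme(index, n_phonemes):
    """Discrete coding: a phoneme is a class index in one language's inventory."""
    v = np.zeros(n_phonemes)
    v[index] = 1.0
    return v

def continuous_constriction(place, degree, width=0.05):
    """Continuous coding: a Gaussian 'bump' over normalized positions along the
    tract (0 = glottis, 1 = lips), with amplitude equal to the constriction
    degree. Any place of articulation, in any language, maps into this space."""
    x = np.linspace(0.0, 1.0, N_POINTS)
    return degree * np.exp(-0.5 * ((x - place) / width) ** 2)

# An alveolar and a uvular constriction live in one shared space,
# even if no single language contrasts both.
alveolar = continuous_constriction(place=0.85, degree=1.0)
uvular = continuous_constriction(place=0.45, degree=0.8)
```

In the continuous coding, two constrictions can also be summed or interpolated, which a one-hot label does not allow.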

The third step will be to adapt the generic model to a specific language, described by its places of articulation, and to a specific speaker, described by anatomical points. This model can be used in conjunction with multilingual acoustic synthesis, or as input for acoustic simulations.
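One very rough way to picture the adaptation step: keep the generic model frozen and fit only a small speaker-specific correction from a few frames. The additive-bias form, the learning rate, and all names below are hypothetical; they stand in for whatever adaptation scheme the thesis actually develops.

```python
import numpy as np

def adapt_to_speaker(generic_predict, frames_in, frames_out, dim, lr=0.1, steps=200):
    """Few-shot adaptation sketch: the generic model stays frozen; only a
    per-speaker additive bias on its output is learned, by gradient descent
    on the mean squared error over the adaptation frames."""
    bias = np.zeros(dim)
    for _ in range(steps):
        residual = generic_predict(frames_in) + bias - frames_out
        bias -= lr * 2.0 * residual.mean(axis=0)   # MSE gradient w.r.t. the bias
    return bias
```

The appeal of such a scheme is that only a handful of parameters are estimated from the new speaker's data, while the dynamics learned from the full multilingual database are preserved.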

ENVIRONMENT
Our two teams have been working closely together for several years on deep learning for modeling articulatory gestures, making extensive use of dynamic MRI data. We are one of the leading teams in the use of real-time MRI for automatic speech processing. The PhD student will have access to the large databases already acquired. It will also be possible to acquire complementary data using the MRI system available in the IADI laboratory.
The PhD student will also have the opportunity to attend one or two summer schools and conferences on MRI and automatic speech processing.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Thesis start date: 01/10/2025

Type of funding

Doctoral contract

Funding details

Competitive selection for a doctoral contract

Host institution and laboratory

Université de Lorraine

Institution awarding the doctorate

Université de Lorraine

Doctoral school

77 IAEM - INFORMATIQUE - AUTOMATIQUE - ELECTRONIQUE - ELECTROTECHNIQUE - MATHEMATIQUES

Candidate profile

Master's degree in computer science or applied mathematics. The applicant should have a solid background in deep learning, applied mathematics, and computer science. Knowledge of speech and MRI processing will also be appreciated.
28/04/2025