ENACT - Language and speaker independent generic articulatory model of the vocal tract
ABG-129290
ADUM-62379
Thesis topic
2025-03-08 | Public funding alone (i.e. government, region, European, international organization research grant)
Université de Lorraine
Vandoeuvre lès Nancy cedex - France
Artificial intelligence, automatic speech processing, real-time MRI, Speech synthesis
Topic description
The proposed PhD project aims to improve multilingual speech synthesis by taking the temporal dynamics of the vocal tract into account. Existing methods use static phoneme representations that do not capture the anticipation and coarticulation phenomena essential to natural speech. The objective of the project is to build a dynamic model of the vocal tract that can adapt to any language and any speaker. The work relies on real-time MRI data, which make it possible to visualize the evolution of the vocal tract at a rate of 50 Hz.
The project is divided into three stages: 1) anatomical registration of the MRI data so as to align them in a single anatomical frame of reference; 2) construction of a generic articulatory model that integrates the dynamics of different languages and speakers; 3) adaptation of this model to a language not present in the initial database.
This PhD offer is provided by the ENACT AI Cluster and its partners. Find all ENACT PhD offers and actions on https://cluster-ia-enact.ai/.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CONTEXT
Current methods for multilingual acoustic speech synthesis [1] rely on static phoneme representations drawn from phonological databases. Although these representations allow the phonemes of all languages to be embedded in a single space, so that acoustic databases can be merged to synthesize speech for low-resourced languages, they do not capture the temporal dynamics of the vocal tract underlying the anticipation and coarticulation phenomena of natural speech. Anticipation and coarticulation [2] are essential for realizing phonetic contrasts. Moreover, articulatory gestures depend on individual anatomy (the shape of the hard palate, for instance) and require millimetric precision to guarantee the expected acoustic properties.
OBJECTIVE
This project aims to synthesize the temporal evolution of the vocal tract for any language and any speaker. It falls within the field of articulatory synthesis, which seeks to model and simulate the physical process of human speech production.
The work will make use of real-time MRI databases [3] (see the references in the PDF or the French version), which provide images of the evolving geometric shape of the vocal tract in the midsagittal plane at a frame rate of 50 Hz. This rate is sufficient to capture articulatory gestures during speech production.
The task will be to build, from these data, a dynamic model of the vocal tract that can be adapted to a specific language and speaker.
WORK
The work will involve three stages:
(i) anatomical registration of real-time MRI data, with the aim of representing all gestures in a single anatomical frame of reference.
(ii) construction of a generic articulatory model merging the dynamics of the languages and speakers in the database used.
(iii) adaptation of the generic model to a language not included in the original database.
The first step, anatomical registration, relies on finding anatomical points that are visible and robustly identifiable in the MRI images. Among the many registration techniques available, we prefer those that explicitly identify anatomical landmarks, so that an anatomical transformation can be linked to the articulators concerned.
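As a minimal sketch of this landmark-based step, assuming corresponding anatomical points (for example on the hard palate or the pharyngeal wall; the landmark choice, the function name, and the use of a global similarity transform are all assumptions of this sketch, not the project's specification) have already been located in each image, a scale-rotation-translation alignment can be estimated in closed form with a Procrustes fit:

```python
import numpy as np

def procrustes_similarity(src, dst):
    """Estimate scale s, rotation R and translation t minimizing
    sum_i || s * R @ src[i] + t - dst[i] ||^2 (Umeyama/Procrustes).
    src, dst: (N, 2) arrays of corresponding anatomical landmarks."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d                # centered landmark sets
    U, S, Vt = np.linalg.svd(A.T @ B)            # 2x2 cross-covariance SVD
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # enforce det(R) = +1 (no reflection)
    D = np.diag([1.0, d])
    R = Vt.T @ D @ U.T
    s = (S * np.diag(D)).sum() / (A ** 2).sum()  # optimal isotropic scale
    t = mu_d - s * R @ mu_s
    return s, R, t

# Map one speaker's landmarks into the common anatomical frame:
# s, R, t = procrustes_similarity(speaker_pts, reference_pts)
# aligned = s * speaker_pts @ R.T + t
```

Because the transform is estimated from named landmarks rather than raw image intensities, the fitted transformation can be traced back to the articulators that moved, which is the property motivating the preference stated above.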
The second step is to develop a generic dynamic model capable of taking all places of articulation into account. In the model we built previously [3], we used discrete phonetic labels, which restricts the model to languages whose places of articulation correspond exactly to the phonemes of the database language. To obtain a generic model, we need to move to a continuous coding that covers the entire vocal tract.
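To make the idea of continuous coding concrete, here is an illustrative sketch (the two-parameter code, the Gaussian blending, and all numeric values are assumptions of this sketch, not the project's scheme): each target is described by a constriction location along the normalized vocal-tract midline and a constriction degree, and overlapping activation windows produce a coarticulation-like blending of neighboring targets:

```python
import numpy as np

# Hypothetical continuous code: constriction location (0 = glottis, 1 = lips)
# and constriction degree (0 = fully open, 1 = complete closure), replacing
# a discrete one-hot phoneme label. Times and values are made up.
targets = [
    # (target time in s, constriction location, constriction degree)
    (0.00, 1.00, 1.0),   # bilabial closure, e.g. /b/
    (0.12, 0.55, 0.1),   # open, vowel-like mid-tract configuration
    (0.30, 0.85, 0.9),   # alveolar constriction, e.g. /d/
]

def coded_frame(t, targets, width=0.06):
    """Blend targets with overlapping Gaussian activation windows, a crude
    stand-in for coarticulation: each target influences the vocal-tract
    shape before and after its nominal time."""
    w = np.array([np.exp(-0.5 * ((t - t0) / width) ** 2) for t0, _, _ in targets])
    w = w / w.sum()
    loc = float(sum(wi * l for wi, (_, l, _) in zip(w, targets)))
    deg = float(sum(wi * d for wi, (_, _, d) in zip(w, targets)))
    return loc, deg

# One coded frame every 20 ms, matching the 50 Hz MRI frame rate.
trajectory = [coded_frame(t, targets) for t in np.arange(0.0, 0.4, 0.02)]
```

Unlike a one-hot label, such a code is defined for any constriction a language may use, including places of articulation absent from the training database.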
The third step will be to adapt the generic model to a specific language, described by its places of articulation, and to a specific speaker, described by anatomical landmarks. The resulting model can be used in conjunction with multilingual acoustic synthesis, or as input for acoustic simulations.
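One plausible way to realize this adaptation, sketched below under stated assumptions (the architecture, the embedding sizes, the contour output, and the two-number input code are all hypothetical, not the project's design), is to freeze a trained generic model and fit only small language and speaker embeddings on data from the new language and speaker:

```python
import torch
from torch import nn

class GenericArticulatoryModel(nn.Module):
    """Toy generic model: maps a continuous articulatory code plus language
    and speaker embeddings to a midsagittal vocal-tract contour."""
    def __init__(self, code_dim=2, emb_dim=16, n_points=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim + 2 * emb_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * n_points),  # (x, y) per contour point
        )

    def forward(self, code, lang_emb, spk_emb):
        return self.net(torch.cat([code, lang_emb, spk_emb], dim=-1))

model = GenericArticulatoryModel()
for p in model.parameters():
    p.requires_grad_(False)                      # generic model stays frozen

lang_emb = torch.zeros(16, requires_grad=True)   # new-language embedding
spk_emb = torch.zeros(16, requires_grad=True)    # new-speaker embedding
opt = torch.optim.Adam([lang_emb, spk_emb], lr=1e-2)

def adapt_step(code, contour):
    """One gradient step on a batch: code (B, 2), contour (B, 128)."""
    B = code.shape[0]
    pred = model(code, lang_emb.expand(B, -1), spk_emb.expand(B, -1))
    loss = torch.nn.functional.mse_loss(pred, contour)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Because only the two embedding vectors are trained, a few minutes of real-time MRI from the new language or speaker may suffice, and the generic dynamics learned from the merged database are preserved.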
ENVIRONMENT
Our two teams have been working closely together for several years on deep learning to model articulatory gestures, making extensive use of dynamic MRI data. We are among the leading teams in the use of real-time MRI for automatic speech processing. The PhD student will have access to the large databases already acquired, and complementary data can be acquired with the MRI system available at the IADI laboratory.
The PhD student will also have the opportunity to attend one or two summer schools and conferences on MRI and automatic speech processing.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thesis start date: 01/10/2025
Funding category
Public funding alone (i.e. government, region, European, international organization research grant)
Funding further details
Competitive selection for a doctoral contract
Presentation of host institution and host laboratory
Université de Lorraine
Institution awarding doctoral degree
Université de Lorraine
Graduate school
77 IAEM - INFORMATIQUE - AUTOMATIQUE - ELECTRONIQUE - ELECTROTECHNIQUE - MATHEMATIQUES
Candidate's profile
Master in computer science or applied mathematics.
The applicant should have a solid background in deep learning, applied mathematics, and computer science. Knowledge of speech and MRI processing will also be appreciated.
2025-04-28