Models for the speech recognition of French variants in Ivory Coast.
Keywords: Machine learning, Signal classification, Functional data analysis, French variants, Language science.
Project description
Ivorian French has existed for several decades and has attracted the interest of many researchers in sociolinguistics and linguistics. Phonological, lexical and syntactic studies have shown that this Ivory Coast French, although made up of several varieties, contains specific elements that recur across all of them and attest to the existence of a local norm. Today, this French is strongly influenced by Ivorian languages (through young speakers and other mixed practices), while at the same time showing its own autonomy and evolution. There is therefore a linguistic community in Côte d'Ivoire, characterized by common behaviors and judgements.
There is still no software application adapted to Ivorian French, no speech synthesis built on this variety of French, and not even an Ivorian voice used for the automated answering systems of telephone operators in Côte d'Ivoire.
For the past fifteen years, data on recorded speech in Côte d'Ivoire have been collected for large international corpora, structured according to types of social situations (work meetings, family meetings, interviews, etc.), types of speakers (age, gender, level of education, professions, etc.) and types of tasks (reading words, text, spontaneous speech, elicited speech, ecological collections). These archived sound data are transcribed and annotated with Praat software and made available for analysis.
This thesis has a twofold objective: 1) to propose models for understanding the structures (possibly spatial) of the French variants in Ivory Coast; 2) to provide software that can be integrated into computer solutions for French speech recognition in this context, for example on terminals (smartphones, PCs, answering machines, etc.).
The mission will consist of:
- Exploring the database of signals.
- Building methods (algorithms) to analyse and understand the structure of this data. This step will require the knowledge of linguists ("feature engineering").
- Characterising (by algorithms) the different classes of signals in collaboration with the linguists.
- Highlighting a possible spatialization of the signal classes (a map of Ivory Coast for these signals).
- Building models (statistical, probabilistic, mathematical) specific to steps 2 to 4.
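As a purely illustrative sketch of the feature-engineering and class-characterisation steps listed above, the following Python code computes two hand-crafted features per signal and classifies signals by nearest class centroid. Everything in it is an assumption made for illustration: the synthetic sine-wave signals stand in for recorded speech frames, the two features (mean energy and zero-crossing rate) and the labels "A" and "B" are hypothetical, and the nearest-centroid rule is only one simple choice among many. It is not the method the thesis will develop.

```python
import math
import random


def make_signal(freq_hz, n=400, rate=8000, noise=0.02, rng=None):
    """Synthetic stand-in for a speech frame: a noisy sine wave (assumption)."""
    rng = rng or random.Random(0)
    return [math.sin(2 * math.pi * freq_hz * t / rate) + rng.gauss(0, noise)
            for t in range(n)]


def features(signal):
    """Two illustrative features: mean energy and zero-crossing rate."""
    energy = sum(x * x for x in signal) / len(signal)
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / (len(signal) - 1))


def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def fit_centroids(labelled_signals):
    """Compute one feature centroid per class from (label, signal) pairs."""
    by_class = {}
    for label, signal in labelled_signals:
        by_class.setdefault(label, []).append(features(signal))
    return {label: centroid(pts) for label, pts in by_class.items()}


def predict(centroids, signal):
    """Assign a signal to the class whose centroid is closest in feature space."""
    f = features(signal)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical classes "A" and "B", separated here only by frequency.
    training = [("A", make_signal(200, rng=rng)) for _ in range(5)]
    training += [("B", make_signal(800, rng=rng)) for _ in range(5)]
    centroids = fit_centroids(training)
    print(predict(centroids, make_signal(220, rng=rng)))
    print(predict(centroids, make_signal(780, rng=rng)))
```

In the actual project, the features would be defined with the linguists and the signals would come from the annotated corpora; the sketch only shows the general shape of a feature-then-classify pipeline.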
Deliverables
- Papers on the methods to be developed.
- Context-specific speech recognition software.
- Software for producing simulations of each class of speech signals.
Examples of spin-offs from the thesis results
- Speech dictation and applications.
- Implementation of an A.I. (Artificial Intelligence) that automates responses to certain requests addressed to a service: customer service, ISD, etc.
Profile required
- Master's degree in statistics, data science or equivalent.
- Good knowledge of R and Python software.
- Ability to work in a multidisciplinary context.
Supervisors
- MOUSSA K. Richard, PR at the École Nationale Supérieure de Statistique et d'Économie Appliquée (Côte d'Ivoire)
- Beatrice Akissi Boutin, Researcher at the Department of European, American and Intercultural Studies of La Sapienza University, Rome, Italy (Position "French on the African continent") and at the Institute of Applied Linguistics of the University Félix Houphouët Boigny, Abidjan. HDR in Language Sciences (Université Paris Ouest Nanterre La Défense, France).
- Anne-Françoise Yao, PR at the University of Clermont Auvergne (France) and Lecturer at the École Polytechnique Paris (France).
Application file
- A letter of motivation addressed to the Director of ENSEA;
- A recent detailed curriculum vitae (CV);
- A legalized copy of the High School Diploma;
- A legalized copy of the degrees obtained after the High School Diploma, with transcripts and any other evidence that may support the application;
- A thesis research proposal of five (5) pages maximum clearly indicating the title, the problem, the objectives, the analysis approach, a literature review, the hypotheses and the expected results as well as the corresponding bibliographical references. Particular attention will be paid to the quality of the proposed research topic and its relevance to the research interests of ACE researchers or to economic development issues;
- Two (2) letters of recommendation attesting to the applicant's research capacity (preferably from professors qualified to supervise a thesis).
Applications, including all certified documents, must be submitted exclusively by email to ecoledoctorale@ensea.edu.ci
Please mention in the subject line: "Thesis project: Statistics, Machine Learning and Linguistics". The submission deadline is 12 August 2022.
For further information, please visit ENSEA at Office 802 or call (+225) 27 22 44 08 42.
Some references
- Boula de Mareüil, P. & Boutin, B.A. 2011. Perceptual evaluation and identification of West African accents in French. Journal of French Language Studies, n° 21, 3, p. 361-379.
- Boutin, B.A. 2014. Liaisons in French and African terrains. In J. Durand, G. Kristoffersen, B. Laks & J. Peuvergne (eds). The phonology of French: norms, peripheries, modeling. Mélanges pour Chantal Lyche, pp. 153-172. Presses Universitaires de Paris Ouest.
- Boutin, B.A. 2018. Plurilingualism and francophonie in Côte d'Ivoire. In O. Floquet (ed), Linguistic and sociolinguistic aspects of African French, Roma: Sapienza Università Editrice, pp. 101-119.
- Boutin, B.A. 2019. État des lieux de la recherche sur le français en Afrique, Langue Française, n° 202, p. 11-26.
- Boutin, B.A. & Turcsan, G. 2009. The pronunciation of French in Africa: the Ivory Coast. In J. Durand, B. Laks and C. Lyche: Phonologie, variation et accents du français, p. 131-152, Paris: Hermès.
- Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. Springer Series in Statistics, 2nd edition. Springer, New York.
- Ramsay, J. O. and Silverman, B. W. (2002). Applied Functional Data Analysis: Methods and Case Studies. Springer, New York.
- Dabo-Niang, S. & Yao, A.-F. (2013). Kernel spatial density estimation in infinite dimension space, Metrika, Vol. 76, pages 19-52.
- Dabo-Niang, S., Ternynck, C. & Yao, A.-F. (2016). Nonparametric prediction of spatial multivariate data, Journal of Nonparametric Statistics, Vol. 28, No. 2, pages 428-458.
- Dabo-Niang, S., Ternynck, C., Thiam, B. and Yao, A.-F. (2021). Non-parametric statistical analysis of spatially distributed functional data. In J. Mateu & R. Giraldo (eds), Geostatistical Functional Data Analysis: Theory and Methods. Wiley. To be published.