Project name: Creation of visual data corpora for the analysis of gestural unit creation processes (LSF and natural gesture) (CREAGEST).


Project leaders:

Principal investigator: Christian Cuxac, UMR 7023 "Structures formelles du langage", Université Paris 8

Scientific leads of the other partners:

Antonio Balvet, STL, UMR 8163, Université Lille 3
Cyril Courtin, UMR 6194, Groupe d'Imagerie Neurofonctionnelle

Project duration: 2007-2010.


Keywords: semiogenesis, gesture, sign language, acquisition, iconicisation, neologism, annotation tools, linguistics, psycholinguistics, natural language processing






1. Scientific background and objectives

This project aims to build a video database of corpora in French Sign Language (LSF): an overall gesture corpus comprising both child and adult discourse in LSF, together with coverbal gesture.

The project aims to meet the needs of the national and international academic community for extensive and representative sign language corpora. It also aims to build bridges with existing sign language corpus databases.

The types of corpora which will be developed throughout this project will address two interwoven issues.

1) A linguistic modelling issue: understanding more thoroughly the process of sign creation, semiotisation and stabilization of human signifying gesture. The model proposed by Cuxac (1996, 2000) for LSF is based on the hypothesis of a semiogenesis rooted in perceptivo-practical experience, which would be at work both in the ontogenetic dimension of language acquisition by deaf children and in the phylogenetic dimension of language constitution and evolution. Our project aims to build corpora in order to explore and evaluate this hypothesis further.


2) A documentation issue: our goal is to give the different academic communities constructive and lasting access to the gathered data. We plan to:


-set up a platform for training deaf investigators in data collection, and for archiving collected data on a server, thus ensuring proper access for the community and fostering collaboration between research groups


-tailor and, where necessary, develop corpus annotation tools. This part of the project includes hands-on training sessions for general annotation tools (ANVIL), in order to allow researchers to devise their own XML-based annotation specifications


-devise interoperable processes and data, in order to foster international scientific collaboration and exchange with other academic researchers, e.g. members of the ECHO project (IMDI norm for metadata, together with the ELAN annotation editor) and members of the CHILDES network.


2. Description of the project, methodology

The project is divided into 5 Sub-Projects, of which 3 are centered on the constitution of corpora, while the remaining 2 focus on perennial access to corpus annotation.

We will strive to achieve the following goals:


-constitution of a corpus of deaf children's discourse (very few existing corpora, topic seldom addressed although the institutional demand is strong)


-constitution of a corpus of coverbal gestures by hearing adults, alone or in interaction with deaf adult signers (no widely-distributed video database on this topic in France)


-constitution of a neologism corpus in LSF (no systematic collection as yet)


These corpora will share two features: a) a homogeneous distribution over the whole geographical area of France (existing corpora center on the Paris area); b) coverage of the diversity of discourse genres: dialogs (existing corpora consist mostly of monologs), descriptive (vs. narrative) genre, metalinguistic register.


The two more technical Sub-Projects will focus on developing a collaborative platform hosted on a web server and on adapting software tools for the annotation, transcription and exploration of LSF corpora (plugin development for much-needed functionalities such as in-depth indexing of transcribed structures, etc.).


The project will span 48 months. The different tasks will be distributed among the three partners in a complementary way. SFL will devote itself to data collection and analysis, drawing on its long experience in the matter and on its unique theoretical background for the linguistic description of LSF. GIN will devote itself to matters regarding language acquisition by deaf children and their cognitive development. Finally, STL will bring its know-how in Natural Language Processing and corpus linguistics, which is necessary for data annotation and management and for their computer-aided processing. We would like to emphasize that all partners have had the opportunity to collaborate in the past on training and research projects; they all have an in-depth knowledge of LSF and of the Deaf community.


3. Expected results

The expected results of this project concern both fundamental and applied research. This project is expected to provide the basis of a standard for:


a) establishing lexicographic norms of description for the neologisms gathered and recorded on the web platform, with a view to LSF-dedicated dictionaries whose entries will be based on LSF’s morphemic components (Cuxac, 2004)


b) a reference framework for acquisition studies on LSF, and for pedagogy of LSF. In the area of pedagogy, prospective outcomes are:

- the foundations of a grammar of LSF in its early stages of acquisition

- the constitution, on this basis, of a learning skills reference for the acquisition of LSF.


Finally, the project should allow a fairer collaboration with Deaf researchers, thanks to the different training sessions, both theoretical and methodological, which are an integral part of the project, and should thus contribute to the preservation of LSF, an endangered language.