ANR PROGRAMME THEMATIQUE EN SCIENCES HUMAINES ET SOCIALES
CORPUS ET OUTILS DE LA RECHERCHE EN SCIENCES HUMAINES ET SOCIALES
Project name: Creation of visual data corpora for analysing the processes by which gestural units are created (LSF and natural gesture) (CREAGEST).
Project duration: 2007-2010.
Keywords: semiogenesis, gesture, sign language, acquisition, iconicisation, neologism, annotation tools, linguistics, psycholinguistics, natural language processing
1. Scientific background and objectives
This project aims to build a video database of several corpora in French Sign Language (LSF), namely an overall gesture corpus comprising both child and adult discourse in LSF, together with coverbal gesture.
This project aims to meet the needs of the academic community, both national and international, for extensive and representative sign language corpora. It also aims to build bridges with existing sign language corpus databases.
The corpora developed throughout this project will address two interwoven issues.
1) A linguistic modelling issue: understanding more thoroughly the process of sign creation, semiotisation and stabilization of human signifying gesture. The model proposed by Cuxac (1996, 2000) for LSF is based on the hypothesis of a semiogenesis rooted in perceptivo-practical experience, which would be at work both in the ontogenetic dimension of language acquisition by deaf children and in the phylogenetic dimension of language constitution and evolution. Our project aims to build corpora in order to explore and evaluate this hypothesis further.
2) A documentation issue: our goal is to give the different academic communities constructive and lasting access to the gathered data. We plan to:
-set up a platform for training deaf investigators in data collection, and for archiving collected data on a server, thus ensuring proper access for the community and fostering collaborations between
-tailor and, where necessary, develop corpus annotation tools. This part of the project includes hands-on training sessions on general annotation tools (ANVIL), so that researchers can devise their own XML-based annotation specifications
-devise interoperable processes and data, in order to foster international scientific collaboration and exchange with other academic researchers, e.g. members of the ECHO project (IMDI standard for metadata, along with the ELAN annotation editor) and members of the CHILDES network.
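As an illustration of what XML-based, interoperable annotation data can look like in practice, the following minimal sketch reads time-aligned annotations from a small document written in the style of ELAN's EAF files. It is an assumption-laden example, not the project's own tooling: the sample document is hand-written and uses only a simplified subset of EAF element names, and the tier name "gloss", the gloss values and the function name read_annotations are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the style of an ELAN .eaf file
# (simplified subset of the format; tier name and glosses are invented).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2500"/>
  </TIME_ORDER>
  <TIER TIER_ID="gloss">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>HOUSE</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
                            TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>TREE</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_annotations(xml_text):
    """Return (tier, start_ms, end_ms, value) tuples in document order."""
    root = ET.fromstring(xml_text)
    # Map each time-slot identifier to its time value in milliseconds.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
    }
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append((
                tier.get("TIER_ID"),
                slots[ann.get("TIME_SLOT_REF1")],
                slots[ann.get("TIME_SLOT_REF2")],
                ann.findtext("ANNOTATION_VALUE", default=""),
            ))
    return rows


if __name__ == "__main__":
    for row in read_annotations(SAMPLE_EAF):
        print(row)
```

Because the annotations are stored as plain XML with explicit time references, the same data can in principle be read by any tool that follows the shared specification, which is the point of adopting common formats across projects.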
2. Description of the project, methodology
The project is divided into 5 Sub-Projects, of which 3 are centered on the constitution of corpora, while the remaining 2 focus on long-term access to corpus annotation.
We will strive to achieve the following goals:
-constitution of a corpus of deaf children's discourse (very few corpora exist, and the topic is seldom addressed although institutional demand is strong)
-constitution of a coverbal gestures corpus of hearing adults, alone or in interaction with deaf adult signers (no widely-distributed video database on this topic in
-constitution of a neologism corpus in LSF (no systematic collection as yet)
These corpora will have in common: a) a homogeneous distribution over the whole geographical area of
The two more technical Sub-Projects will focus on building a collaborative platform hosted on a web server and on adapting software tools for the annotation, transcription and exploration of LSF corpora (plugin development for much-needed functionalities such as in-depth indexing of transcribed structures, etc.).
The project will span 48 months. The tasks will be distributed among the three partners in a complementary way: SFL will devote itself to data collection and analysis, drawing on long experience in the matter and on its unique theoretical background for the linguistic description of LSF. GIN will devote itself to matters regarding language acquisition by deaf children and their cognitive development. Finally, STL will bring its know-how in Natural Language Processing and corpus linguistics, which is necessary for data annotation and management as well as for computer-aided processing. We would like to emphasize that all partners have had the opportunity to collaborate in the past on training and research projects; they all have in-depth knowledge of LSF and of the Deaf community.
3. Expected results
The expected results of this project concern both fundamental and applied research. The project is expected to provide the basis of a standard for:
a) establishing lexicographic norms of description for the neologisms gathered and recorded on the web platform, with a view to LSF-dedicated dictionaries whose entries will be based on LSF’s morphemic components (Cuxac, 2004)
b) a reference framework for acquisition studies on LSF, and for pedagogy of LSF. In the area of pedagogy, prospective outcomes are:
- the foundations of a grammar of LSF in its early stages of acquisition
- the constitution, on this basis, of a learning skills reference for the acquisition of LSF.
Finally, the project should allow a fairer collaboration with Deaf researchers, thanks to the theoretical and methodological training sessions that are an integral part of the project, thereby contributing to the preservation of LSF, an endangered language.