Even though technological advancements have given most people better access to education, roughly 70 million people with hearing impairments still cannot use the majority of these tools. Our project aims to advance SDG 4 by minimizing this gap: ensuring inclusive and equitable quality education and promoting lifelong learning opportunities for all. Through our platform Bornil, we will enable the creation of video-based sign language datasets for all 300+ low-resource sign language dialects. These datasets will be used for Automated Sign Language Recognition (ASLR) for these dialects so that people who use them can communicate efficiently. Currently, there are no standardized protocols for crowdsourcing sign language dataset construction, and data acquisition is highly expensive. To democratize the data collection process and expedite the development of ASLR, we are offering this open-source platform, which addresses multiple factors of the data collection process. We will account for explicit age- and gender-based differentials, and our platform will enable crowdsourcing of sign language recordings using readily available devices such as phones and laptops, without the need to set up recording studios. Using an intuitive platform interface, we will enable gathering data from a variety of sources, including the deaf community, hard-of-hearing persons, children of deaf parents, and siblings of deaf adults. To make the data representative of the real world, volunteer contributors will vary camera resolution, background noise, and other recording conditions, reporting them as metadata. GDPR compliance will be ensured throughout the process. Users will contribute to the datasets in three ways: by recording videos, validating recordings and metadata, and annotating videos. We will curate this data so that it can be used to train ASLR systems with artificial intelligence algorithms. As a case study, we will use the platform to investigate the statistical variety of the participants and regional variations in Bangladeshi Sign Language.
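As an illustration, the per-recording metadata and the three contribution modes described above could be represented as in the following sketch. Every name, field, and value set here is an illustrative assumption, not the platform's actual schema.

```typescript
// Minimal sketch of data structures for the contribution workflow described
// above; all names and fields are illustrative assumptions.

// The three ways users contribute to the datasets.
type ContributionKind = "recording" | "validation" | "annotation";

// Per-recording metadata submitted by volunteer contributors, covering the
// explicit age/gender differentials and real-world recording conditions.
interface RecordingMetadata {
  dialect: string;                       // e.g. "Bangladeshi Sign Language"
  signerGroup:                           // contributor's relation to the deaf community
    | "deaf"
    | "hard-of-hearing"
    | "child-of-deaf-parents"
    | "sibling-of-deaf-adult";
  ageRange: string;                      // e.g. "18-25"
  gender: string;
  device: "phone" | "laptop" | "other";  // readily available devices
  cameraResolution: string;              // e.g. "1280x720", contributor-reported
  backgroundNoise: "low" | "medium" | "high"; // contributor-reported condition
  consentGiven: boolean;                 // GDPR: explicit consent stored with the data
}

// A single contribution event, whichever of the three kinds it is.
interface Contribution {
  kind: ContributionKind;
  userId: string;
  videoId: string;
  submittedAt: Date;
}
```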
The recording section provides users with a built-in video recorder. Users are given a text (one or more sentences, or a topic) in their selected language, for which they record a video in sign language. Users also submit metadata related to the recording. After recording, the platform automatically switches to a video player where the user can preview the recording.
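For concreteness, here is a minimal sketch of such a record-then-preview flow using the standard browser MediaDevices/MediaRecorder APIs. The function names, the WebM container choice, and the single video element design are assumptions for illustration, not Bornil's actual implementation.

```typescript
// Minimal sketch: record from the webcam, then switch the same <video>
// element from live capture to playback so the user can preview the take.

async function startRecording(preview: HTMLVideoElement) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  preview.srcObject = stream;            // live view while the user signs
  await preview.play();

  const recorder = new MediaRecorder(stream, { mimeType: "video/webm" });
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.start();

  // Returned so a "Stop" button can end the take and trigger the preview.
  return function stopAndPreview(): Promise<Blob> {
    return new Promise((resolve) => {
      recorder.onstop = () => {
        stream.getTracks().forEach((t) => t.stop()); // release the camera
        const clip = new Blob(chunks, { type: "video/webm" });
        preview.srcObject = null;        // switch from capture to playback
        preview.src = URL.createObjectURL(clip);
        preview.controls = true;         // user can now review the recording
        resolve(clip);                   // blob is ready for upload with metadata
      };
      recorder.stop();
    });
  };
}
```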
In this section, users are given an unannotated video and its corresponding text/topic, along with an interface of text boxes and timestamps that they use for annotation. The platform provides a timeline with drag-and-drop support, allowing users to freely change the length, start time, and end time of the text boxes and to seek the video to a specific time.
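A minimal sketch of how such timeline segments could be represented and kept in sync with the player is shown below; the Segment shape and the helper names are assumptions, not the platform's actual code.

```typescript
// A timestamped text box on the annotation timeline (illustrative shape).
interface Segment {
  text: string;        // the annotated sentence or phrase
  startSec: number;    // set by dragging the box's left edge
  endSec: number;      // set by dragging the box's right edge
}

// Clicking a segment seeks the video to the segment's start time.
function seekToSegment(video: HTMLVideoElement, seg: Segment): void {
  video.currentTime = seg.startSec;
}

// Dragging or resizing updates the segment, clamped to the clip's duration
// and kept non-degenerate (end strictly after start).
function resizeSegment(
  seg: Segment,
  startSec: number,
  endSec: number,
  duration: number
): Segment {
  const s = Math.max(0, Math.min(startSec, duration));
  const e = Math.max(s + 0.1, Math.min(endSec, duration));
  return { ...seg, startSec: s, endSec: e };
}
```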
The validation section is used to validate recorded videos, their metadata, and their annotations. Metadata validation resembles the recording flow: a user watches the video and checks whether the recording and the provided information are correct. For annotation validation, users check whether the annotation matches the video. In both cases, users are able to fix any errors.
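As a sketch, the validation outcomes described above could be recorded and aggregated as follows; the record shape and the agreement threshold are illustrative assumptions.

```typescript
// The two things a validator checks for a given video.
type ValidationTarget = "metadata" | "annotation";

// One validator's verdict on one video (illustrative shape).
interface ValidationResult {
  videoId: string;
  target: ValidationTarget;
  isCorrect: boolean;          // does the recording/annotation match?
  correction?: string;         // optional fix supplied by the validator
  validatorId: string;
}

// A video's metadata or annotation could be accepted once enough independent
// validators agree; the default threshold of 3 is an assumed example.
function isAccepted(results: ValidationResult[], threshold = 3): boolean {
  return results.filter((r) => r.isCorrect).length >= threshold;
}
```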
Mohammad Akhlaqur Rahman
Department of Software Engineering, IICT, SUST
Shahriar Elahi Dhruvo
Department of Software Engineering, IICT, SUST
Farig Sadeque
Co-Principal Investigator, Bengali.AI; Assistant Professor, BRAC University
Sabbir Ahmed Chowdhury
Assistant Professor, IER, Dhaka University & PhD Researcher, UWS, UK
Asif Sushmit
Co-Principal Investigator & Dataset Coordinator, Bengali.AI; PhD Student, RPI
Rezwana Sultana
Coordinator, Bengali.AI (Attached Officer, A2I, ICT Division, Bangladesh)