Abstract: In recent years, visual-based sign language recognition (SLR) has become an active research area with the advancement of deep learning. However, it is difficult to collect sign language data ...
Vision-Language-Action (VLA) models have shown remarkable potential in visuomotor control and instruction comprehension through end-to-end learning processes. However, current VLA models face ...
We find a commonality of various dirty samples is visual-linguistic inconsistency between images and associated labels. To capture the semantic inconsistency between modalities, we propose versatile ...
Developing AI systems that operate effectively across languages while remaining culturally grounded is a long-standing challenge, particularly in low-resource settings. Synthetic data provides a ...
Voice-based technologies are essential to cater to the hundreds of millions of new smartphone users. However, most of the languages spoken by these new users have little to no labelled speech data.