

Yorùbá Flickr Audio Caption Corpus

Download YFACC

The Yorùbá Flickr Audio Caption Corpus (YFACC) extends the Flickr8k image-text dataset to Yorùbá with three additions:

  1. Yorùbá translations of 6k of the captions.
  2. Corresponding spoken recordings of these translations, obtained from a single speaker.
  3. Temporal alignments of 67 Yorùbá keywords for a subset of 500 of the captions.

The dataset is described in the accompanying paper. Please cite the paper if you use the data.


YFACC (6.8 GB): yfacc_v6.tar.gz
MD5 checksum: 7e086f4424246e3dfc742abba488c429


© 2022 Stellenbosch University
This data is released under a Creative Commons Attribution-ShareAlike license (CC BY-SA 4.0).