Channel: Blog – Center for Data Innovation

Building a Dataset of Translated Sentences

Facebook has released CCMatrix, a dataset of 4.5 billion parallel sentences: sentences in one language paired with their translations in other languages. The dataset covers more than 500 language pairs. CCMatrix can help advance the development of translation systems, particularly for languages with relatively little digitized material.

Get the data.

Image: PxHere

