Channel: Blog – Center for Data Innovation

Building a Dataset of Islamicate Texts

Researchers from Knowledge, Information Technology, and the Arabic Book (KITAB), a project to create digital tools for analyzing Arabic writing, have released a dataset of more than 4,000 Arabic texts to help construct the first machine-readable corpus of premodern Islamicate texts. The texts include works by nearly 2,000 authors and together contain more than a billion words. Researchers can use this dataset to develop algorithms that identify relationships between ideas within Arabic texts.

Get the data.

Image: Wellcome Images