Multilingual Semantic Search

Covid-19 MLIA Data

Corpora

Corpus           Size                        Time span                    Language   Documents  Download
Round 1
EU Press Corner  7.2 Mbyte (compressed)      June 2020                    English          335  europresscorner-202006-xml.zip
                                                                          French           276
                                                                          German           266
                                                                          Greek            120
                                                                          Italian          123
                                                                          Spanish          115
                                                                          Swedish          122
                                                                          Ukrainian          0
EUR-Lex          23.3 Mbyte (compressed)     June 2020                    English          352  eurlex-202006-xml.zip
                                                                          French           345
                                                                          German           345
                                                                          Greek            344
                                                                          Italian          342
                                                                          Spanish          342
                                                                          Swedish          343
                                                                          Ukrainian          0
Global Voices    13.6 Mbyte (compressed)     June 2020                    English          571  global-voices-20200611-xml.zip
                                                                          French           446
                                                                          German            51
                                                                          Greek            328
                                                                          Italian          539
                                                                          Spanish          595
                                                                          Swedish            5
                                                                          Ukrainian         66
MEDISYS          2,036.0 Mbyte (compressed)  December 2019 to April 2020  English    1,450,251  medisys-201912-xml_ir.zip
                                                                          French       325,178  medisys-202001-xml_ir.zip
                                                                          German       272,645  medisys-202002-xml_ir.zip
                                                                          Greek        146,763  medisys-202003-p1-7-xml_ir.zip
                                                                          Italian      661,514  medisys-202004-xml_ir.zip
                                                                          Spanish      832,639
                                                                          Swedish       37,615
                                                                          Ukrainian     15,395
Wikipedia        13.7 Mbyte (compressed)     June 2020                    English          731  wikipedia-20200611-xml.zip
                                                                          French           357
                                                                          German           364
                                                                          Greek            103
                                                                          Italian          271
                                                                          Spanish          342
                                                                          Swedish          111
                                                                          Ukrainian        121
Total documents by language                                               English    1,452,240
                                                                          French       326,599
                                                                          German       273,761
                                                                          Greek        147,658
                                                                          Italian      662,789
                                                                          Spanish      833,763
                                                                          Swedish       38,196
                                                                          Ukrainian     15,582
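
After downloading one of the archives listed above, it can be useful to check roughly how many documents it contains before indexing. The sketch below is only an illustration: the internal layout of the archives is not described here, so it assumes that each document is a separate member ending in .xml and that the first path component (if any) groups documents, for example by language. The function name count_xml_members is hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_xml_members(archive):
    """Count .xml members of a zip archive, grouped by top-level folder.

    `archive` may be a file path or a binary file-like object.
    Assumption: one XML file per document; the real layout may differ.
    """
    counts = Counter()
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Skip directory entries and anything that is not an XML file.
            if name.endswith("/") or not name.lower().endswith(".xml"):
                continue
            parts = PurePosixPath(name).parts
            # Members stored at the archive root are grouped under "".
            top = parts[0] if len(parts) > 1 else ""
            counts[top] += 1
    return counts
```

For example, count_xml_members("wikipedia-20200611-xml.zip") would return a Counter keyed by top-level folder name. If the archives turn out to be organised differently, only the grouping rule needs to change.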

Relevance Judgements

Round 1
English (not available yet)
French (not available yet)
German (not available yet)
Greek (not available yet)
Italian (not available yet)
Japanese (not available yet)
Spanish (not available yet)
Swedish (not available yet)
Ukrainian (not available yet)

Runs and Rolling Reports

Runs and rolling reports for all rounds are available in the following git repository.