Multilingual Semantic Search

Covid-19 MLIA Data

Corpora Round 1

Corpus           Size (compressed)  Time span                    Download
EU Press Corner  7.2 Mbyte          June 2020                    europresscorner-202006-xml.zip
EUR-Lex          23.3 Mbyte         June 2020                    eurlex-202006-xml.zip
Global Voices    13.6 Mbyte         June 2020                    global-voices-20200611-xml.zip
MEDISYS          2,036.0 Mbyte      December 2019 to April 2020  medisys-201912-xml_ir.zip
                                                                 medisys-202001-xml_ir.zip
                                                                 medisys-202002-xml_ir.zip
                                                                 medisys-202003-p1-7-xml_ir.zip
                                                                 medisys-202004-xml_ir.zip
Wikipedia        13.7 Mbyte         June 2020                    wikipedia-20200611-xml.zip

Documents per corpus and language, with totals by language:

Language   EU Press Corner  EUR-Lex  Global Voices  MEDISYS    Wikipedia  Total
English    335              352      571            1,450,251  731        1,452,240
French     276              345      446            325,178    357        326,599
German     266              345      51             272,645    364        273,761
Greek      120              344      328            146,763    103        147,658
Italian    123              342      539            661,514    271        662,789
Spanish    115              342      595            832,639    342        833,763
Swedish    122              343      5              37,615     111        38,196
Ukrainian  0                0        66             15,395     121        15,582
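The corpora are distributed as zip archives of XML. The page does not describe the layout inside the archives, so the sketch below only assumes that documents are stored as members ending in `.xml`. It streams each member straight out of the archive with the standard-library `zipfile` and `xml.etree.ElementTree` modules, which avoids extracting the 2 Gbyte MEDISYS archives to disk. The helper names `iter_xml_members` and `count_root_tags` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from collections import Counter
from typing import Iterator, Tuple, Union


def iter_xml_members(zip_path: Union[str, io.BytesIO]) -> Iterator[Tuple[str, ET.Element]]:
    """Yield (member name, parsed root element) for every .xml member of a zip archive.

    Assumption: documents are stored as members whose names end in ".xml".
    Members are parsed one at a time directly from the archive, so nothing
    is extracted to disk.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".xml"):
                continue  # skip directories and non-XML files
            with archive.open(info) as member:
                yield info.filename, ET.parse(member).getroot()


def count_root_tags(zip_path: Union[str, io.BytesIO]) -> Counter:
    """Count root element tags across all XML members (hypothetical helper)."""
    return Counter(root.tag for _, root in iter_xml_members(zip_path))
```

For example, `count_root_tags("eurlex-202006-xml.zip")` gives a quick inventory of what a downloaded archive contains before you commit to a parser for its specific schema.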

Corpora Round 2

Corpus   Size (compressed)  Time span
MEDISYS  46.9 Gbyte         April 2020 to September 2020

Language   Documents  Download
English    3,542,790  en_medisys_2020_round2.zip
French     667,339    fr_medisys_2020_round2.zip
German     334,824    de_medisys_2020_round2.zip
Greek      285,658    el_medisys_2020_round2.zip
Italian    1,067,155  it_medisys_2020_round2.zip
Spanish    2,272,703  es_medisys_2020_round2.zip
Swedish    66,412     sv_medisys_2020_round2.zip
Ukrainian  50,775     uk_medisys_2020_round2.zip
Arabic     638,581    ar_2020_round2.zip
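Because the Round 2 collection is large (46.9 Gbyte compressed), it may be worth fetching only the languages you need. The sketch below copies the archive names and document counts from the Round 2 table into a dictionary and returns the archives and combined document count for a chosen set of languages. Note that the Arabic archive name does not follow the `<code>_medisys_2020_round2.zip` pattern of the others, so the names are listed explicitly rather than generated. `ROUND2_MEDISYS` and `plan_download` are hypothetical names; no download URL is assumed.

```python
from typing import Dict, Iterable, List, Tuple

# Archive names and document counts copied from the Round 2 MEDISYS table.
# The Arabic archive does not use the "<code>_medisys_2020_round2.zip" pattern.
ROUND2_MEDISYS: Dict[str, Tuple[str, int]] = {
    "English": ("en_medisys_2020_round2.zip", 3_542_790),
    "French": ("fr_medisys_2020_round2.zip", 667_339),
    "German": ("de_medisys_2020_round2.zip", 334_824),
    "Greek": ("el_medisys_2020_round2.zip", 285_658),
    "Italian": ("it_medisys_2020_round2.zip", 1_067_155),
    "Spanish": ("es_medisys_2020_round2.zip", 2_272_703),
    "Swedish": ("sv_medisys_2020_round2.zip", 66_412),
    "Ukrainian": ("uk_medisys_2020_round2.zip", 50_775),
    "Arabic": ("ar_2020_round2.zip", 638_581),
}


def plan_download(languages: Iterable[str]) -> Tuple[List[str], int]:
    """Return the archive names and total document count for the given languages."""
    archives: List[str] = []
    total = 0
    for lang in languages:
        if lang not in ROUND2_MEDISYS:
            raise KeyError(f"no Round 2 MEDISYS archive listed for {lang!r}")
        name, count = ROUND2_MEDISYS[lang]
        archives.append(name)
        total += count
    return archives, total
```

For example, `plan_download(["Greek", "Swedish"])` returns the two archives `el_medisys_2020_round2.zip` and `sv_medisys_2020_round2.zip` together with their combined count of 352,070 documents.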

Relevance Judgements

Round 1: English, French, German, Greek, Italian, Spanish, Swedish, Ukrainian (not available yet)
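The page does not state the file format of the relevance judgements. Assuming they follow the common TREC qrels layout of one whitespace-separated line per judgement (`topic iteration document-id relevance`), a minimal reader could look like the sketch below. The function names `read_qrels` and `relevant_docs` are hypothetical, and treating any relevance grade above 0 as relevant is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def read_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse TREC-style qrels lines into {topic: {doc_id: relevance}}.

    Assumption: each non-empty line has exactly four fields:
    topic, iteration (ignored), document id, integer relevance grade.
    """
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        topic, _iteration, doc_id, rel = fields
        qrels[topic][doc_id] = int(rel)
    return dict(qrels)


def relevant_docs(qrels: Dict[str, Dict[str, int]], topic: str) -> Set[str]:
    """Documents judged relevant (grade > 0) for one topic."""
    return {doc for doc, rel in qrels.get(topic, {}).items() if rel > 0}
```

Reading a judgement file is then `with open(path, encoding="utf-8") as f: qrels = read_qrels(f)`, where `path` points at one of the per-language files.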

Runs and Rolling Reports

Runs and rolling reports for all rounds are available in the following Git repository.
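To compare a run against the relevance judgements, a simple measure such as mean precision at k can be computed directly. The sketch below assumes runs use the TREC run layout (`topic Q0 document-id rank score tag`) and that judgements have already been loaded into a `{topic: {doc_id: relevance}}` dictionary. Neither format is confirmed on this page, and `read_run` and `precision_at_k` are hypothetical helpers, not the official evaluation tooling.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def read_run(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse TREC-style run lines into {topic: [doc_id, ...]} ranked by score.

    Assumption: fields are "topic Q0 doc_id rank score tag". Documents are
    ordered by descending score; the rank column is ignored, and ties keep
    their order of appearance in the file.
    """
    scored: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        topic, _q0, doc_id, _rank, score = fields[:5]
        scored[topic].append((float(score), doc_id))
    return {
        topic: [doc for _, doc in sorted(pairs, key=lambda p: -p[0])]
        for topic, pairs in scored.items()
    }


def precision_at_k(run: Dict[str, List[str]], qrels: Dict[str, Dict[str, int]], k: int = 10) -> float:
    """Mean P@k over judged topics; a document with relevance > 0 counts as relevant."""
    values = []
    for topic, judged in qrels.items():
        top_k = run.get(topic, [])[:k]  # unretrieved topics contribute 0
        hits = sum(1 for doc in top_k if judged.get(doc, 0) > 0)
        values.append(hits / k)
    return sum(values) / len(values) if values else 0.0
```

Dividing by k even when fewer than k documents were retrieved follows the usual convention for P@k, so short runs are penalised rather than rewarded.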