Overall, GTCOM’s machine translation automatic evaluation test set contains 50,000 records, divided into two language pairs: Chinese-English and English-Chinese. Each language pair consists of five domains: military, medical, financial, patent, and common. Each domain includes 1,000 original texts and 4,000 corresponding translated texts. In terms of text length, the proportions are as follows: for Chinese original texts, 1-15 words account for 20%, 15-30 words for 30%, 30-75 words for 40%, and 75-120 words for 10%; for English original texts, 1-10 words account for 20%, 10-20 words for 30%, 20-50 words for 40%, and 50-80 words for 10%.
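The composition described above can be checked with a little arithmetic. The following Python sketch is illustrative only: the constant and function names are hypothetical and not part of the test set, and it assumes that the length proportions apply to the 1,000 original texts within each domain, which the description does not state explicitly.

```python
# Illustrative sketch; all names here are hypothetical, not part of the test set.
LANGUAGE_PAIRS = ["Chinese-English", "English-Chinese"]
DOMAINS = ["military", "medical", "financial", "patent", "common"]
ORIGINALS_PER_DOMAIN = 1000
TRANSLATIONS_PER_DOMAIN = 4000

# Length ranges (in words) and their share of the original texts, in percent.
LENGTH_SHARES = {
    "Chinese": {"1-15": 20, "15-30": 30, "30-75": 40, "75-120": 10},
    "English": {"1-10": 20, "10-20": 30, "20-50": 40, "50-80": 10},
}


def total_records():
    """Total records across all language pairs and domains."""
    per_domain = ORIGINALS_PER_DOMAIN + TRANSLATIONS_PER_DOMAIN
    return len(LANGUAGE_PAIRS) * len(DOMAINS) * per_domain


def originals_per_bucket(source_language):
    """Original texts per length range in one domain.

    Assumption: the stated percentages apply within each domain.
    """
    shares = LENGTH_SHARES[source_language]
    return {
        bucket: ORIGINALS_PER_DOMAIN * percent // 100
        for bucket, percent in shares.items()
    }


if __name__ == "__main__":
    print(total_records())
    print(originals_per_bucket("Chinese"))
    print(originals_per_bucket("English"))
```

Under these assumptions, the totals agree with the stated 50,000 records: 2 language pairs × 5 domains × (1,000 + 4,000) texts. The stated ranges share their boundary values (for example, 15 appears in both 1-15 and 15-30), so the sketch only counts texts per range and does not assign individual texts to ranges.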
The test set sample is for reference only. To apply for the complete test set, please contact Wang Yiming (firstname.lastname@example.org) of the 2020 AI Lab, Global Tone Communication Technology Co., Ltd., by email. The request must include a contact name, phone number, and company name.