Report on the experience with the automatic verification programme in SHARE wave 9
This document is deliverable D4.11 of the Horizon 2020 project “Social Sciences & Humanities Open Cloud”. It reports on the experience with the automatic verification checks implemented during the development phase of the questionnaire for SHARE wave 9. It describes the outcomes of the exercise and points out the critical issues to be addressed in further development.
The report starts with the motivation for the innovation and describes the context in which it takes place. The complexity of translating source text that contains programming syntax makes the translators' task more difficult and affects the quality of the translations.
The section “Verification Tools” describes the tools developed, their technical implementation and the design of the innovative procedure. Two types of verification tool have been developed: sanity checks for the translations and the Automated Verification Tool (AVT). The sanity checks consist of the empty text field check, the missing identification number check, the missing technical text check and the fill usage check. The AVT compares the source text with the target text and assigns a translation score to the target text using the predictions made by a trained model.
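As an illustration only, three of the sanity checks named above could be sketched as follows. This is not the SHARE implementation: the `Item` record, the function `sanity_check` and the caret-based fill syntax (`^FLName`) are assumptions made for this example. The missing technical text check is left out because this summary does not describe what counts as technical text.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed fill syntax (hypothetical): a caret followed by an identifier, e.g. ^FLName.
FILL_PATTERN = re.compile(r"\^[A-Za-z_][A-Za-z0-9_]*")


@dataclass
class Item:
    """Hypothetical translation record: one questionnaire text and its translation."""
    item_id: Optional[str]
    source: str
    target: str


def sanity_check(item: Item) -> List[str]:
    """Return the list of problems found for one translated item."""
    problems = []
    # Missing identification number check.
    if not item.item_id or not item.item_id.strip():
        problems.append("missing identification number")
    # Empty text field check.
    if not item.target.strip():
        problems.append("empty text field")
    # Fill usage check: the translation must use the same set of fills as the source.
    source_fills = set(FILL_PATTERN.findall(item.source))
    target_fills = set(FILL_PATTERN.findall(item.target))
    if source_fills != target_fills:
        problems.append("fill usage mismatch")
    return problems
```

An item that passes all three checks yields an empty list, so a national version could be flagged for revision whenever any of its items returns a non-empty list.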
The section “SHARE Exercise” lists the critical issues identified during the implementation phase. Both the sanity checks and the AVT worked well technically. The translators understood the procedure easily, and it was implemented smoothly. The performance of the AVT depends mainly on the trained model.
The section “Results” summarizes the outcomes and the lessons learnt from the verification exercise. The SHARE team learnt that the innovation makes the process more efficient in terms of effort and time. The sanity checks reduce the number of iterations needed to generate the national versions of the questionnaire tool. The preliminary results from the AVT provide mixed evidence about its contribution to improving the efficiency of the verification.
The final section concludes and outlines directions for future work. One way to improve the tool is to retrain the model with better training corpora. Training a bilingual phrase embedding model could also improve the performance of the tool.