Which metric is commonly used to evaluate machine-translation models?
A. F1 Score
B. BLEU score
C. ROUGE score
D. Perplexity
The Answer Is: B
Explanation:
The BLEU (Bilingual Evaluation Understudy) score is the most commonly used metric for evaluating machine-translation models. It measures the modified n-gram precision of a generated translation against one or more reference translations, combined with a brevity penalty, giving a quantitative measure of translation quality. NVIDIA's NeMo documentation on NLP tasks, particularly machine translation, highlights BLEU as the standard metric for assessing translation performance. Option A (F1 Score) is used for classification tasks, not translation. Option C (ROUGE) is recall-oriented and used primarily for summarization. Option D (Perplexity) measures how well a language model predicts text but is not specific to translation evaluation.
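To make the n-gram precision idea concrete, the sketch below computes a simple sentence-level BLEU score from scratch: clipped n-gram precision up to 4-grams, their geometric mean, and a brevity penalty. It is illustrative only and omits the smoothing and corpus-level aggregation used by production toolkits such as sacreBLEU or NeMo's evaluation scripts; the function and example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Illustrative sentence-level BLEU (no smoothing).

    candidate:  list of tokens from the model's translation
    references: list of token lists (one or more human references)
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        if not cand_counts:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram])
                      for gram, count in cand_counts.items())
        precisions.append(clipped / sum(cand_counts.values()))

    # Without smoothing, any zero precision drives the geometric mean to zero.
    if min(precisions) == 0:
        return 0.0

    # Brevity penalty: penalize candidates shorter than the closest reference.
    ref_len = min((len(r) for r in references),
                  key=lambda rl: (abs(rl - len(candidate)), rl))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))

    # Geometric mean of the modified n-gram precisions.
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Example: one candidate translation scored against two references.
cand = "the cat is on a mat".split()
refs = ["the cat is on the mat".split(),
        "there is a cat on the mat".split()]
print(f"BLEU: {bleu(cand, refs):.3f}")
```

A real evaluation pipeline would score an entire test set at the corpus level and apply smoothing to avoid zero precisions on short sentences, but the core computation is the same clipped-precision-plus-brevity-penalty formula shown here.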
References:
NVIDIA NeMo Documentation: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/intro.html
Papineni, K., et al. (2002). "BLEU: A Method for Automatic Evaluation of Machine Translation."