Efficient Parameter Mining and Freezing for Continual Object Detection

Authors: Angelo G. Menezes, Augusto J. Peterlevitz, Mateus A. Chinelatto, André C. P. L. F. de Carvalho

Abstract: Continual Object Detection is essential for enabling intelligent agents to interact proactively with humans in real-world settings. While parameter-isolation strategies have been extensively explored in the context of continual learning for classification, they have yet to be fully harnessed for incremental object detection scenarios. Drawing inspiration from prior research that focused on mining individual neuron responses and integrating insights from recent developments in neural pruning, we proposed efficient ways to identify which layers are the most important for a network to maintain the performance of a detector across sequential updates. The presented findings highlight the substantial advantages of layer-level parameter isolation in facilitating incremental learning within object detection models, offering promising avenues for future research and application in real-world scenarios.

Submitted 19 February, 2024; originally announced February 2024.

Comments: In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP, ISBN 978-989-758-679-8, ISSN 2184-4321, pages 466-474

arXiv:2310.15987 [pdf, other]

Dissecting In-Context Learning of Translations in GPTs

Authors: Vikas Raunak, Hany Hassan Awadalla, Arul Menezes

Abstract: Most of the recent work in leveraging Large Language Models (LLMs) such as GPT-3 for Machine Translation (MT) has focused on selecting the few-shot samples for prompting. In this work, we try to better understand the role of demonstration attributes for the in-context learning of translations through perturbations of high-quality, in-domain demonstrations. We find that asymmetric perturbation of the source-target mappings yields vastly different results. We show that the perturbation of the source side has surprisingly little impact, while target perturbation can drastically reduce translation quality, suggesting that it is the output text distribution that provides the most important learning signal during in-context learning of translations. We propose a method named Zero-Shot-Context to add this signal automatically in Zero-Shot prompting. We demonstrate that it improves upon the zero-shot translation performance of GPT-3, even making it competitive with few-shot prompted translations.

Submitted 24 October, 2023; originally announced October 2023.

Comments: EMNLP Findings (+ Minor Updates over Camera-Ready)

arXiv:2305.19835 [pdf, ps, other]

Deliberate then Generate: Enhanced Prompting Framework for Text Generation

Authors: Bei Li, Rui Wang, Junliang Guo, Kaitao Song, Xu Tan, Hany Hassan, Arul Menezes, Tong Xiao, Jiang Bian, JingBo Zhu

Abstract: Large language models (LLMs) have shown remarkable success across a wide range of natural language generation tasks, where proper prompt designs make great impacts. While existing prompting methods are normally restricted to providing correct information, in this paper, we encourage the model to deliberate by proposing a novel Deliberate then Generate (DTG) prompting framework, which consists of error detection instructions and candidates that may contain errors. DTG is a simple yet effective technique that can be applied to various text generation tasks with minimal modifications. We conduct extensive experiments on 20+ datasets across 7 text generation tasks, including summarization, translation, dialogue, and more. We show that DTG consistently outperforms existing prompting methods and achieves state-of-the-art performance on multiple text generation tasks. We also provide in-depth analyses to reveal the underlying mechanisms of DTG, which may inspire future research on prompting for LLMs.

Submitted 31 May, 2023; originally announced May 2023.

arXiv:2305.16806 [pdf, other]

Do GPTs Produce Less Literal Translations?

Authors: Vikas Raunak, Arul Menezes, Matt Post, Hany Hassan Awadalla

Abstract: Large Language Models (LLMs) such as GPT-3 have emerged as general-purpose language models capable of addressing many natural language generation or understanding tasks. On the task of Machine Translation (MT), multiple works have investigated few-shot prompting mechanisms to elicit better translations from LLMs. However, there has been relatively little investigation on how such translations differ qualitatively from the translations generated by standard Neural Machine Translation (NMT) models. In this work, we investigate these differences in terms of the literalness of translations produced by the two systems. Using literalness measures involving word alignment and monotonicity, we find that translations out of English (E-X) from GPTs tend to be less literal, while exhibiting similar or better scores on MT quality metrics. We demonstrate that this finding is borne out in human evaluations as well. We then show that these differences are especially pronounced when translating sentences that contain idiomatic expressions.

Submitted 5 June, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: ACL 2023

arXiv:2305.14878 [pdf, other]

Leveraging GPT-4 for Automatic Translation Post-Editing

Authors: Vikas Raunak, Amr Sharaf, Yiren Wang, Hany Hassan Awadallah, Arul Menezes

Abstract: While Neural Machine Translation (NMT) represents the leading approach to Machine Translation (MT), the outputs of NMT models still require translation post-editing to rectify errors and enhance quality under critical settings. In this work, we formalize the task of direct translation post-editing with Large Language Models (LLMs) and explore the use of GPT-4 to automatically post-edit NMT outputs across several language pairs. Our results demonstrate that GPT-4 is adept at translation post-editing, producing meaningful and trustworthy edits to translations that help improve its general quality as well as remove different classes of major errors in translations. In particular, human evaluations on assessing edit trustworthiness show that GPT-4 exhibits a large improvement over the prior state-of-the-art LLM. Notably, we improve upon state-of-the-art performance on WMT-22 English-Chinese, English-German, Chinese-English and German-English language pairs using GPT-4 based post-editing, as evaluated by state-of-the-art MT quality metrics. However, we also show that GPT-4 could produce hallucinated edits, thereby urging caution in its use as an expert translation post-editor.

Submitted 23 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: EMNLP Findings 2023

arXiv:2305.03361 [pdf, other]

CHAMELEON: OutSystems Live Bidirectional Transformations

Authors: Hugo Lourenço, João Costa Seco, Carla Ferreira, Tiago Simões, Vasco Silva, Filipe Assunção, André Menezes

Abstract: In model-driven engineering, the bidirectional transformation of models plays a crucial role in facilitating the use of editors that operate at different levels of abstraction. This is particularly important in the context of industrial-grade low-code platforms like OutSystems, which feature a comprehensive ecosystem of tools that complement the standard integrated development environment with domain-specific builders and abstract model viewers. We introduce CHAMELEON, a tool that enables the dynamic definition of a live bidirectional model transformation in a declarative manner by leveraging simple and intuitive component patterns. Through this approach, we can gradually define the view and synthesis paths to an abstract model built on top of a low-code metamodel. We devise a standard parser-generating technique for tree-like models that builds upon extended grammar definitions with constraints and name binders. We allow for a greater overlap of model patterns that can still be disambiguated for a clear lens-like behaviour of the transformation. CHAMELEON is evaluated in the fragment of the OutSystems language targeting the definition of user interfaces. To assess performance we used a large set of real OutSystems applications, with approximately 200K UI widgets, and a database of curated widget patterns. We found a worst-case processing time of 92ms for complete models in our benchmark, which is still suitable for the operation of an interactive model editor.

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2304.14802 [pdf, other]

ResiDual: Transformer with Dual Residual Connections

Authors: Shufang Xie, Huishuai Zhang, Junliang Guo, Xu Tan, Jiang Bian, Hany Hassan Awadalla, Arul Menezes, Tao Qin, Rui Yan

Abstract: Transformer networks have become the preferred architecture for many tasks due to their state-of-the-art performance. However, the optimal way to implement residual connections in Transformer, which are essential for effective training, is still debated. Two widely used variants are the Post-Layer-Normalization (Post-LN) and Pre-Layer-Normalization (Pre-LN) Transformers, which apply layer normalization after each residual block's output or before each residual block's input, respectively. While both variants enjoy their advantages, they also suffer from severe limitations: Post-LN causes a gradient vanishing issue that hinders training deep Transformers, and Pre-LN causes a representation collapse issue that limits model capacity. In this paper, we propose ResiDual, a novel Transformer architecture with Pre-Post-LN (PPLN), which fuses the connections in Post-LN and Pre-LN together and inherits their advantages while avoiding their limitations. We conduct both theoretical analyses and empirical experiments to verify the effectiveness of ResiDual. Theoretically, we prove that ResiDual has a lower bound on the gradient to avoid the vanishing issue due to the residual connection from Pre-LN. Moreover, ResiDual also has diverse model representations to avoid the collapse issue due to the residual connection from Post-LN. Empirically, ResiDual outperforms both Post-LN and Pre-LN on several machine translation benchmarks across different network depths and data sizes. Thanks to the good theoretical and empirical performance, ResiDual Transformer can serve as a foundation architecture for different AI models (e.g., large language models). Our code is available at https://github.com/microsoft/ResiDual.

Submitted 28 April, 2023; originally announced April 2023.

arXiv:2303.16870 [pdf, ps, other]

Questions of science: chatting with ChatGPT about complex systems

Authors: Nuno Crokidakis, Marcio Argollo de Menezes, Daniel O. Cajueiro

Abstract: We present an overview of the complex systems field using ChatGPT as a representation of the community's understanding. ChatGPT has learned language patterns and styles from a large dataset of internet texts, allowing it to provide answers that reflect common opinions, ideas, and language patterns found in the community. Our exploration covers both teaching and learning, and research topics. We recognize the value of ChatGPT as a source for the community's ideas.

Submitted 29 March, 2023; originally announced March 2023.

Comments: This is a work in progress

arXiv:2212.00006 [pdf, other]

Operationalizing Specifications, In Addition to Test Sets for Evaluating Constrained Generative Models

Authors: Vikas Raunak, Matt Post, Arul Menezes

Abstract: In this work, we present some recommendations on the evaluation of state-of-the-art generative models for constrained generation tasks. The progress on generative models has been rapid in recent years. These large-scale models have had three impacts: firstly, the fluency of generation in both language and vision modalities has rendered common average-case evaluation metrics much less useful in diagnosing system errors. Secondly, the same substrate models now form the basis of a number of applications, driven both by the utility of their representations as well as phenomena such as in-context learning, which raise the abstraction level of interacting with such models. Thirdly, the user expectations around these models and their feted public releases have made the technical challenge of out of domain generalization much less excusable in practice. Subsequently, our evaluation methodologies haven't adapted to these changes. More concretely, while the associated utility and methods of interacting with generative models have expanded, a similar expansion has not been observed in their evaluation practices. In this paper, we argue that the scale of generative models could be exploited to raise the abstraction level at which evaluation itself is conducted and provide recommendations for the same. Our recommendations are based on leveraging specifications as a powerful instrument to evaluate generation quality and are readily applicable to a variety of tasks.

Submitted 19 November, 2022; originally announced December 2022.

Comments: NeurIPS 2022 Workshop on Human Evaluation of Generative Models

arXiv:2211.16934 [pdf, other]

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

Authors: Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian

Abstract: Video dubbing aims to translate the original speech in a film or television program into the speech in a target language, which can be achieved with a cascaded system consisting of speech recognition, machine translation and speech synthesis. To ensure that the translated speech is well aligned with the corresponding video, the length/duration of the translated speech should be as close as possible to that of the original speech, which requires strict length control. Previous works usually control the number of words or characters generated by the machine translation model to be similar to the source sentence, without considering the isochronicity of speech as the speech duration of words/characters in different languages varies. In this paper, we propose a machine translation system tailored for the task of video dubbing, which directly considers the speech duration of each token in translation, to match the length of source and target speech. Specifically, we control the speech length of the generated sentence by guiding the prediction of each word with the duration information, including the speech duration of itself as well as how much duration is left for the remaining words. We design experiments on four language directions (German -> English, Spanish -> English, Chinese <-> English), and the results show that the proposed method achieves better length control ability on the generated speech than baseline methods. To make up for the lack of real-world datasets, we also construct a real-world test set collected from films to provide comprehensive evaluations on the video dubbing task.

Submitted 4 December, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

Comments: AAAI 2023 camera version

arXiv:2211.13317 [pdf, other]

Rank-One Editing of Encoder-Decoder Models

Authors: Vikas Raunak, Arul Menezes

Abstract: Large sequence to sequence models for tasks such as Neural Machine Translation (NMT) are usually trained over hundreds of millions of samples. However, training is just the origin of a model's life-cycle. Real-world deployments of models require further behavioral adaptations as new requirements emerge or shortcomings become known. Typically, in the space of model behaviors, behavior deletion requests are addressed through model retrainings whereas model finetuning is done to address behavior addition requests, both procedures being instances of data-based model intervention. In this work, we present a preliminary study investigating rank-one editing as a direct intervention method for behavior deletion requests in encoder-decoder transformer models. We propose four editing tasks for NMT and show that the proposed editing algorithm achieves high efficacy, while requiring only a single instance of positive example to fix an erroneous (negative) model behavior.

Submitted 23 November, 2022; originally announced November 2022.

Comments: The Second Workshop On Interactive Learning For Natural Language Processing (InterNLP 2022), NeurIPS 2022

arXiv:2210.12929 [pdf, other]

Finding Memo: Extractive Memorization in Constrained Sequence Generation Tasks

Authors: Vikas Raunak, Arul Menezes

Abstract: Memorization presents a challenge for several constrained Natural Language Generation (NLG) tasks such as Neural Machine Translation (NMT), wherein the proclivity of neural models to memorize noisy and atypical samples reacts adversely with the noisy (web crawled) datasets. However, previous studies of memorization in constrained NLG tasks have only focused on counterfactual memorization, linking it to the problem of hallucinations. In this work, we propose a new, inexpensive algorithm for extractive memorization (exact training data generation under insufficient context) in constrained sequence generation tasks and use it to study extractive memorization and its effects in NMT. We demonstrate that extractive memorization poses a serious threat to NMT reliability by qualitatively and quantitatively characterizing the memorized samples as well as the model behavior in their vicinity. Based on empirical observations, we develop a simple algorithm which elicits non-memorized translations of memorized samples from the same model, for a large fraction of such samples. Finally, we show that the proposed algorithm could also be leveraged to mitigate memorization in the model through finetuning. We have released the code to reproduce our results at https://github.com/vyraun/Finding-Memo.

Submitted 23 October, 2022; originally announced October 2022.

Comments: EMNLP Findings 2022

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2205.15445 [pdf, other]

Continual Object Detection: A review of definitions, strategies, and challenges

Authors: Angelo G. Menezes, Gustavo de Moura, Cézanne Alves, André C. P. L. F. de Carvalho

Abstract: The field of Continual Learning investigates the ability to learn consecutive tasks without losing performance on those previously learned. Its focus has been mainly on incremental classification tasks. We believe that research in continual object detection deserves even more attention due to its vast range of applications in robotics and autonomous vehicles. This scenario is more complex than conventional classification given the occurrence of instances of classes that are unknown at the time, but can appear in subsequent tasks as a new class to be learned, resulting in missing annotations and conflicts with the background label. In this review, we analyze the current strategies proposed to tackle the problem of class-incremental object detection. Our main contributions are: (1) a short and systematic review of the methods that propose solutions to traditional incremental object detection scenarios; (2) a comprehensive evaluation of the existing approaches using a new metric to quantify the stability and plasticity of each technique in a standard way; (3) an overview of the current trends within continual object detection and a discussion of possible future research directions.

Submitted 30 May, 2022; originally announced May 2022.

arXiv:2205.09988 [pdf, other]

SALTED: A Framework for SAlient Long-Tail Translation Error Detection

Authors: Vikas Raunak, Matt Post, Arul Menezes

Abstract: Traditional machine translation (MT) metrics provide an average measure of translation quality that is insensitive to the long tail of behavioral problems in MT. Examples include translation of numbers, physical units, dropped content and hallucinations. These errors, which occur rarely and unpredictably in Neural Machine Translation (NMT), greatly undermine the reliability of state-of-the-art MT systems. Consequently, it is important to have visibility into these problems during model development. Towards this direction, we introduce SALTED, a specifications-based framework for behavioral testing of MT models that provides fine-grained views of salient long-tail errors, permitting trustworthy visibility into previously invisible problems. At the core of our approach is the development of high-precision detectors that flag errors (or alternatively, verify output correctness) between a source sentence and a system output. We demonstrate that such detectors could be used not just to identify salient long-tail errors in MT systems, but also for higher-recall filtering of the training data, fixing targeted errors with model fine-tuning in NMT and generating novel data for metamorphic testing to elicit further bugs in models.

Submitted 20 May, 2022; originally announced May 2022.

arXiv:2107.10821 [pdf, other]

To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation

Authors: Tom Kocmi, Christian Federmann, Roman Grundkiewicz, Marcin Junczys-Dowmunt, Hitokazu Matsushita, Arul Menezes

Abstract: Automatic metrics are commonly used as the exclusive tool for declaring the superiority of one machine translation system's quality over another. The community choice of automatic metric guides research directions and industrial developments by deciding which models are deemed better. Evaluating metrics correlations with sets of human judgements has been limited by the size of these sets. In this paper, we corroborate how reliable metrics are in contrast to human judgements on -- to the best of our knowledge -- the largest collection of judgements reported in the literature. Arguably, pairwise rankings of two systems are the most common evaluation tasks in research or deployment scenarios. Taking human judgement as a gold standard, we investigate which metrics have the highest accuracy in predicting translation quality rankings for such system pairs. Furthermore, we evaluate the performance of various metrics across different language pairs and domains. Lastly, we show that the sole use of BLEU impeded the development of improved models leading to bad deployment decisions. We release the collection of 2.3M sentence-level human judgements for 4380 systems for further analysis and replication of our work.

Submitted 13 September, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

Comments: Accepted to WMT 2021 research papers

arXiv:2107.01017 [pdf, other]

MegazordNet: combining statistical and machine learning standpoints for time series forecasting

Authors: Angelo Garangau Menezes, Saulo Martiello Mastelini

Abstract: Forecasting financial time series is considered to be a difficult task due to the chaotic feature of the series. Statistical approaches have shown solid results in some specific problems such as predicting market direction and single-price of stocks; however, with the recent advances in deep learning and big data techniques, new promising options have arisen to tackle financial time series forecasting. Moreover, recent literature has shown that employing a combination of statistics and machine learning may improve accuracy in the forecasts in comparison to single solutions. Taking into consideration the mentioned aspects, in this work, we proposed the MegazordNet, a framework that explores statistical features within a financial series combined with a structured deep learning model for time series forecasting. We evaluated our approach predicting the closing price of stocks in the S&P 500 using different metrics, and we were able to beat single statistical and machine learning methods.

Submitted 23 June, 2021; originally announced July 2021.

arXiv:2104.06683 [pdf, other]

The Curious Case of Hallucinations in Neural Machine Translation

Authors: Vikas Raunak, Arul Menezes, Marcin Junczys-Dowmunt

Abstract: In this work, we study hallucinations in Neural Machine Translation (NMT), which lie at an extreme end on the spectrum of NMT pathologies. Firstly, we connect the phenomenon of hallucinations under source perturbation to the Long-Tail theory of Feldman (2020), and present an empirically validated hypothesis that explains hallucinations under source perturbation. Secondly, we consider hallucinations under corpus-level noise (without any source perturbation) and demonstrate that two prominent types of natural hallucinations (detached and oscillatory outputs) could be generated and explained through specific corpus-level noise patterns. Finally, we elucidate the phenomenon of hallucination amplification in popular data-generation processes such as Backtranslation and sequence-level Knowledge Distillation.

Submitted 14 April, 2021; originally announced April 2021.

Comments: Accepted to NAACL 2021

arXiv:2101.10845 [pdf, other]

Analysis and evaluation of Deep Learning based Super-Resolution algorithms to improve performance in Low-Resolution Face Recognition

Authors: Angelo G. Menezes

Abstract: Surveillance scenarios are prone to several problems since they usually involve low-resolution footage, and there is no control of how far the subjects may be from the camera in the first place. This situation is suitable for the application of upsampling (super-resolution) algorithms since they may be able to recover the discriminant properties of the subjects involved. While general super-resolution approaches were proposed to enhance image quality for human-level perception, biometrics super-resolution methods seek the best "computer perception" version of the image since their focus is on improving automatic recognition performance. Convolutional neural networks and deep learning algorithms, in general, have been applied to computer vision tasks and are now state-of-the-art for several sub-domains, including image classification, restoration, and super-resolution. However, no work has evaluated the effects that the latest proposed super-resolution methods may have upon the accuracy and face verification performance in low-resolution "in-the-wild" data. This project aimed at evaluating and adapting different deep neural network architectures for the task of face super-resolution driven by face recognition performance in real-world low-resolution images. The experimental results on real-world surveillance and attendance datasets showed that general super-resolution architectures might enhance face verification performance of deep neural networks trained on high-resolution faces. Also, since neural networks are function approximators and can be trained based on specific objective functions, the use of a customized loss function optimized for feature extraction showed promising results for recovering discriminant features in low-resolution face images.

Submitted 18 January, 2021; originally announced January 2021.

Comments: MSc Thesis under supervision of Carlos A. E. Montesco presented at the Federal University of Sergipe, Brazil (2019)

ACM Class: I.4.0; I.4.9

arXiv:2012.15547 [pdf, other]

XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

Authors: Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei

Abstract: Multilingual machine translation enables a single model to translate between different languages. Most existing multilingual machine translation systems adopt a randomly initialized Transformer backbone. In this work, inspired by the recent success of language model pre-training, we present XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and fine-tunes it with multilingual parallel data. This simple method achieves significant improvements on a WMT dataset with 10 language pairs and the OPUS-100 corpus with 94 pairs. Surprisingly, the method is also effective even upon the strong baseline with back-translation. Moreover, extensive analysis of XLM-T on unsupervised syntactic parsing, word alignment, and multilingual classification explains its effectiveness for machine translation. The code will be at https://aka.ms/xlm-t.

Submitted 31 December, 2020; originally announced December 2020.

arXiv:2005.06989 [pdf, other]

doi 10.1088/1748-0221/16/05/T05006

A continuous integration and web framework in support of the ATLAS Publication Process

Authors: Juan Pedro Araque Espinosa, Gabriel Baldi Levcovitz, Riccardo-Maria Bianchi, Ian Brock, Tancredi Carli, Nuno Filipe Castro, Alessandra Ciocio, Maurizio Colautti, Ana Carolina Da Silva Menezes, Gabriel De Oliveira da Fonseca, Leandro Domingues Macedo Alves, Andreas Hoecker, Bruno Lange Ramos, Gabriela Lemos Lúcidi Pinhão, Carmen Maidantchik, Fairouz Malek, Robert McPherson, Gianluca Picco, Marcelo Teixeira Dos Santos

Abstract: The ATLAS collaboration defines methods, establishes procedures, and organises advisory groups to manage the publication processes of scientific papers, conference papers, and public notes. All stages are managed through web systems, computing programs, and tools that are designed and developed by the collaboration. A framework called FENCE is integrated into the CERN GitLab software repository, to automatically configure workspaces where each analysis can be documented by the analysis team and managed by the relevant coordinators. Continuous integration is used to guide the writers in applying consistent and correct formatting when preparing papers to be submitted to scientific journals. Additional software assures the correctness of other aspects of each paper, such as the lists of collaboration authors, funding agencies, and foundations. The framework and the workflow therein provide automatic and easy support to the researchers and facilitate each phase of the publication process, allowing authors to focus on the article contents. The framework and its integration with the most up-to-date and efficient tools have consequently provided a more professional and efficient automatized work environment to the whole collaboration.

Submitted 28 January, 2021; v1 submitted 14 May, 2020; originally announced May 2020.

Comments: 22 pages in total, 11 figures, submitted to JINST. All figures including auxiliary figures are available at https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/GENR-2018-01/

Report number: CERN-OPEN-2020-007

arXiv:1803.05567 [pdf, other]

Achieving Human Parity on Automatic Chinese to English News Translation

Authors: Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dongdong Zhang, Zhirui Zhang, Ming Zhou

Abstract: Machine translation has made rapid advances in recent years. Millions of people are using it today in online translation systems and mobile applications in order to communicate across language barriers. The question naturally arises whether such systems can approach or achieve parity with human translations. In this paper, we first address the problem of how to define and accurately measure human parity in translation. We then describe Microsoft's machine translation system and measure the quality of its translations on the widely used WMT 2017 news translation task from Chinese to English. We find that our latest neural machine translation system has reached a new state-of-the-art, and that the translation quality is at human parity when compared to professional human translations. We also find that it significantly exceeds the quality of crowd-sourced non-professional translations.

Submitted 29 June, 2018; v1 submitted 14 March, 2018; originally announced March 2018.

arXiv:1203.6673 [pdf, ps, other]

doi 10.1088/1742-5468/2012/05/P05012

Critical behavior of the SIS epidemic model with time-dependent infection rate

Authors: Nuno Crokidakis, Marcio Argollo de Menezes

Abstract: In this work we study a modified Susceptible-Infected-Susceptible (SIS) model in which the infection rate $λ$ decays exponentially with the number of reinfections $n$, saturating after $n=l$. We find a critical decaying rate $ε_{c}(l)$ above which a finite fraction of the population becomes permanently infected. From the mean-field solution and computer simulations on hypercubic lattices we find evidence that the upper critical dimension is 6, as in the SIR model, which can be mapped onto ordinary percolation.

Submitted 29 March, 2012; originally announced March 2012.

Comments: 13 pages, 11 figures, Submitted for publication

Journal ref: J. Stat. Mech. P05012 (2012)
