Showing 1–20 of 20 results for author: Zheltonozhskii, E

  1. arXiv:2402.19173  [pdf, other]

    cs.SE cs.AI

    StarCoder 2 and The Stack v2: The Next Generation

    Authors: Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, et al. (41 additional authors not shown)

    Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data…

    Submitted 29 February, 2024; originally announced February 2024.

  2. arXiv:2308.13900  [pdf, other]

    cs.CV cs.LG

    Semi-Supervised Semantic Segmentation via Marginal Contextual Information

    Authors: Moshe Kimhi, Shai Kimhi, Evgenii Zheltonozhskii, Or Litany, Chaim Baskin

    Abstract: We present a novel confidence refinement scheme that enhances pseudo labels in semi-supervised semantic segmentation. Unlike existing methods, which filter pixels with low-confidence predictions in isolation, our approach leverages the spatial correlation of labels in segmentation maps by grouping neighboring pixels and considering their pseudo labels collectively. With this contextual information…

    Submitted 3 July, 2024; v1 submitted 26 August, 2023; originally announced August 2023.

    Comments: Published at TMLR

  3. arXiv:2305.06161  [pdf, other]

    cs.CL cs.AI cs.PL cs.SE

    StarCoder: may the source be with you!

    Authors: Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, et al. (42 additional authors not shown)

    Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large colle…

    Submitted 13 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

  4. arXiv:2206.05967  [pdf, other]

    cs.CV cs.LG

    GoToNet: Fast Monocular Scene Exposure and Exploration

    Authors: Tom Avrech, Evgenii Zheltonozhskii, Chaim Baskin, Ehud Rivlin

    Abstract: Autonomous scene exposure and exploration, especially in localization or communication-denied areas, useful for finding targets in unknown scenes, remains a challenging problem in computer navigation. In this work, we present a novel method for real-time environment exploration, whose only requirements are a visually similar dataset for pre-training, enough lighting in the scene, and an on-board f…

    Submitted 13 June, 2022; originally announced June 2022.

  5. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  6. arXiv:2201.12843  [pdf, other]

    cs.LG

    Graph Representation Learning via Aggregation Enhancement

    Authors: Maxim Fishman, Chaim Baskin, Evgenii Zheltonozhskii, Almog David, Ron Banner, Avi Mendelson

    Abstract: Graph neural networks (GNNs) have become a powerful tool for processing graph-structured data but still face challenges in effectively aggregating and propagating information between layers, which limits their performance. We tackle this problem with the kernel regression (KR) approach, using KR loss as the primary loss in self-supervised settings or as a regularization term in supervised settings…

    Submitted 8 February, 2023; v1 submitted 30 January, 2022; originally announced January 2022.

  7. arXiv:2111.14821  [pdf, other]

    cs.CV cs.CL cs.LG

    End-to-End Referring Video Object Segmentation with Multimodal Transformers

    Authors: Adam Botach, Evgenii Zheltonozhskii, Chaim Baskin

    Abstract: The referring video object segmentation task (RVOS) involves segmentation of a text-referred object instance in the frames of a given video. Due to the complex nature of this multimodal task, which combines text reasoning, video understanding, instance segmentation and tracking, existing approaches typically rely on sophisticated pipelines in order to tackle it. In this paper, we propose a simple…

    Submitted 3 April, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: Accepted to CVPR 2022

  8. Contrast to Divide: Self-Supervised Pre-Training for Learning with Noisy Labels

    Authors: Evgenii Zheltonozhskii, Chaim Baskin, Avi Mendelson, Alex M. Bronstein, Or Litany

    Abstract: The success of learning with noisy labels (LNL) methods relies heavily on the success of a warm-up stage where standard supervised training is performed using the full (noisy) training set. In this paper, we identify a "warm-up obstacle": the inability of standard warm-up stages to train high quality feature extractors and avert memorization of noisy labels. We propose "Contrast to Divide" (C2D),…

    Submitted 20 October, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

  9. Single-Node Attacks for Fooling Graph Neural Networks

    Authors: Ben Finkelshtein, Chaim Baskin, Evgenii Zheltonozhskii, Uri Alon

    Abstract: Graph neural networks (GNNs) have shown broad applicability in a variety of domains. These domains, e.g., social networks and product recommendations, are fertile ground for malicious users and behavior. In this paper, we show that GNNs are vulnerable to the extremely limited (and thus quite realistic) scenarios of a single-node adversarial attack, where the perturbed node cannot be chosen by the…

    Submitted 29 September, 2022; v1 submitted 6 November, 2020; originally announced November 2020.

    Comments: Appeared in Neurocomputing

  10. arXiv:2008.10312  [pdf, other]

    cs.CV cs.LG

    Self-Supervised Learning for Large-Scale Unsupervised Image Clustering

    Authors: Evgenii Zheltonozhskii, Chaim Baskin, Alex M. Bronstein, Avi Mendelson

    Abstract: Unsupervised learning has always been appealing to machine learning researchers and practitioners, allowing them to avoid an expensive and complicated process of labeling the data. However, unsupervised learning of complex data is challenging, and even the best approaches show much weaker performance than their supervised counterparts. Self-supervised deep learning has become a strong instrument f…

    Submitted 9 November, 2020; v1 submitted 24 August, 2020; originally announced August 2020.

    Comments: Accepted to NeurIPS 2020 Workshop: Self-Supervised Learning - Theory and Practice

  11. HCM: Hardware-Aware Complexity Metric for Neural Network Architectures

    Authors: Alex Karbachevsky, Chaim Baskin, Evgenii Zheltonozhskii, Yevgeny Yermolin, Freddy Gabbay, Alex M. Bronstein, Avi Mendelson

    Abstract: Convolutional Neural Networks (CNNs) have become common in many fields including computer vision, speech recognition, and natural language processing. Although CNN hardware accelerators are already included as part of many SoC architectures, the task of achieving high accuracy on resource-restricted devices is still considered challenging, mainly due to the vast number of design parameters that ne…

    Submitted 26 April, 2020; v1 submitted 19 April, 2020; originally announced April 2020.

  12. arXiv:2003.02188  [pdf, other]

    cs.LG cs.CV stat.ML

    Colored Noise Injection for Training Adversarially Robust Neural Networks

    Authors: Evgenii Zheltonozhskii, Chaim Baskin, Yaniv Nemcovsky, Brian Chmiel, Avi Mendelson, Alex M. Bronstein

    Abstract: Even though deep learning has shown unmatched performance on various tasks, neural networks have been shown to be vulnerable to small adversarial perturbations of the input that lead to significant performance degradation. In this work we extend the idea of adding white Gaussian noise to the network weights and activations during adversarial training (PNI) to the injection of colored noise for def…

    Submitted 20 March, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

  13. arXiv:1911.07198  [pdf, other]

    cs.LG cs.CV stat.ML

    Smoothed Inference for Adversarially-Trained Models

    Authors: Yaniv Nemcovsky, Evgenii Zheltonozhskii, Chaim Baskin, Brian Chmiel, Maxim Fishman, Alex M. Bronstein, Avi Mendelson

    Abstract: Deep neural networks are known to be vulnerable to adversarial attacks. Current methods of defense from such attacks are based on either implicit or explicit regularization, e.g., adversarial training. Randomized smoothing, the averaging of the classifier outputs over a random distribution centered in the sample, has been shown to guarantee the performance of a classifier subject to bounded pertur…

    Submitted 16 March, 2020; v1 submitted 17 November, 2019; originally announced November 2019.

  14. arXiv:1911.07190  [pdf, other]

    cs.LG cs.CV

    Loss Aware Post-training Quantization

    Authors: Yury Nahshan, Brian Chmiel, Chaim Baskin, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson

    Abstract: Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In this work, we study the effect of quantization on the structure of the loss landscape. Additionally, we show that the structure is flat and separable…

    Submitted 16 March, 2020; v1 submitted 17 November, 2019; originally announced November 2019.

  15. arXiv:1909.11481  [pdf, other]

    cs.CV cs.LG

    CAT: Compression-Aware Training for bandwidth reduction

    Authors: Chaim Baskin, Brian Chmiel, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson

    Abstract: Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory bandwidth requirements, which can be a main energy consumer and throughput bottleneck in hardware accelerators. Accordingly, an efficient feature map compression m…

    Submitted 25 September, 2019; originally announced September 2019.

  16. Feature Map Transform Coding for Energy-Efficient CNN Inference

    Authors: Brian Chmiel, Chaim Baskin, Ron Banner, Evgenii Zheltonozhskii, Yevgeny Yermolin, Alex Karbachevsky, Alex M. Bronstein, Avi Mendelson

    Abstract: Convolutional neural networks (CNNs) achieve state-of-the-art accuracy in a variety of tasks in computer vision and beyond. One of the major obstacles hindering the ubiquitous use of CNNs for inference on low-power edge devices is their high computational complexity and memory bandwidth requirements. The latter often dominates the energy footprint on modern hardware. In this paper, we introduce a…

    Submitted 26 September, 2019; v1 submitted 26 May, 2019; originally announced May 2019.

  17. arXiv:1904.09872  [pdf, other]

    cs.CV cs.LG cs.NE

    Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks

    Authors: Yochai Zur, Chaim Baskin, Evgenii Zheltonozhskii, Brian Chmiel, Itay Evron, Alex M. Bronstein, Avi Mendelson

    Abstract: Recently, deep learning has become a de facto standard in machine learning with convolutional neural networks (CNNs) demonstrating spectacular success on a wide variety of tasks. However, CNNs are typically very demanding computationally at inference time. One of the ways to alleviate this burden on certain hardware platforms is quantization relying on the use of low-precision arithmetic represent…

    Submitted 26 September, 2019; v1 submitted 22 April, 2019; originally announced April 2019.

    Comments: Accepted to ICML Workshop on AutoML 2019

  18. NICE: Noise Injection and Clamping Estimation for Neural Network Quantization

    Authors: Chaim Baskin, Natan Liss, Yoav Chai, Evgenii Zheltonozhskii, Eli Schwartz, Raja Giryes, Avi Mendelson, Alexander M. Bronstein

    Abstract: Convolutional Neural Networks (CNN) are very popular in many fields including computer vision, speech recognition, natural language processing, to name a few. Though deep learning leads to groundbreaking performance in these domains, the networks used are very demanding computationally and are far from real-time even on a GPU, which is not power efficient and therefore does not suit low power syst…

    Submitted 2 October, 2018; v1 submitted 29 September, 2018; originally announced October 2018.

  19. arXiv:1804.10969  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    UNIQ: Uniform Noise Injection for Non-Uniform Quantization of Neural Networks

    Authors: Chaim Baskin, Eli Schwartz, Evgenii Zheltonozhskii, Natan Liss, Raja Giryes, Alex M. Bronstein, Avi Mendelson

    Abstract: We present a novel method for neural network quantization that emulates a non-uniform $k$-quantile quantizer, which adapts to the distribution of the quantized parameters. Our approach provides a novel alternative to the existing uniform quantization techniques for neural networks. We suggest to compare the results as a function of the bit-operations (BOPS) performed, assuming a look-up table avai…

    Submitted 2 October, 2018; v1 submitted 29 April, 2018; originally announced April 2018.

  20. Streaming Architecture for Large-Scale Quantized Neural Networks on an FPGA-Based Dataflow Platform

    Authors: Chaim Baskin, Natan Liss, Evgenii Zheltonozhskii, Alex M. Bronshtein, Avi Mendelson

    Abstract: Deep neural networks (DNNs) are used by different applications that are executed on a range of computer architectures, from IoT devices to supercomputers. The footprint of these networks is huge as well as their computational and communication needs. In order to ease the pressure on resources, research indicates that in many cases a low precision representation (1-2 bit per parameter) of weights a…

    Submitted 13 March, 2018; v1 submitted 31 July, 2017; originally announced August 2017.

    Comments: Will appear in RAW 2018