Showing 1–29 of 29 results for author: Zou, A

  1. arXiv:2407.10701  [pdf, other]

    cs.CL

    DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems

    Authors: Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, Dong Yu

    Abstract: Recently, there has been a growing interest among large language model (LLM) developers in LLM-based document reading systems, which enable users to upload their own documents and pose questions related to the document contents, going beyond simple reading comprehension tasks. Consequently, these systems have been carefully designed to tackle challenges such as file parsing, metadata extraction, m…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Work in progress

  2. arXiv:2406.04313  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.CY

    Improving Alignment and Robustness with Circuit Breakers

    Authors: Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, Dan Hendrycks

    Abstract: AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that interrupts the models as they respond with harmful outputs with "circuit breakers." Existing techniques aimed at improving alignment, such as refusal training, are often bypassed. Techniques such as adversarial training try to…

    Submitted 12 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Code and models are available at https://github.com/GraySwanAI/circuit-breakers

  3. arXiv:2405.14782  [pdf, other]

    cs.CL

    Lessons from the Trenches on Reproducible Evaluation of Language Models

    Authors: Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan, et al. (5 additional authors not shown)

    Abstract: Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. In this paper we draw on three years of experience in evaluating large language models to provide guidance and lessons…

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  4. arXiv:2405.10389  [pdf, other]

    eess.SY cs.LG

    Physics-Informed Heterogeneous Graph Neural Networks for DC Blocker Placement

    Authors: Hongwei Jin, Prasanna Balaprakash, Allen Zou, Pieter Ghysels, Aditi S. Krishnapriyan, Adam Mate, Arthur Barnes, Russell Bent

    Abstract: The threat of geomagnetic disturbances (GMDs) to the reliable operation of the bulk energy system has spurred the development of effective strategies for mitigating their impacts. One such approach involves placing transformer neutral blocking devices, which interrupt the path of geomagnetically induced currents (GICs) to limit their impact. The high cost of these devices and the sparsity of trans…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Paper is accepted by PSCC 2024

  5. arXiv:2403.17447  [pdf, other]

    cs.LG cs.CV cs.NE

    Chain of Compression: A Systematic Approach to Combinationally Compress Convolutional Neural Networks

    Authors: Yingtao Shen, Minqing Sun, Jie Zhao, An Zou

    Abstract: Convolutional neural networks (CNNs) have achieved significant popularity, but their computational and memory intensity poses challenges for resource-constrained computing systems, particularly with the prerequisite of real-time performance. To release this burden, model compression has become an important research focus. Many approaches like quantization, pruning, early exit, and knowledge distil…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 10 pages, 15 figures

  6. arXiv:2403.03218  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

    Authors: Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, et al. (32 additional authors not shown)

    Abstract: The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing furthe…

    Submitted 15 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: See the project page at https://wmdp.ai

  7. arXiv:2402.04249  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

    Authors: Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks

    Abstract: Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods. To address this issue, we introduce HarmBench, a standardized evaluation framework for automated red teaming. We identify several desirable properties prev…

    Submitted 26 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Website: https://www.harmbench.org

  8. arXiv:2402.00395  [pdf, other]

    cs.AR eess.SP

    ONE-SA: Enabling Nonlinear Operations in Systolic Arrays for Efficient and Flexible Neural Network Inference

    Authors: Ruiqi Sun, Yinchen Ni, Xin He, Jie Zhao, An Zou

    Abstract: The computation and memory-intensive nature of DNNs limits their use in many mobile and embedded contexts. Application-specific integrated circuit (ASIC) hardware accelerators employ matrix multiplication units (such as the systolic arrays) and dedicated nonlinear function units to speed up DNN computations. A close examination of these ASIC accelerators reveals that the designs are often speciali…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: Accepted to DATE 2024

  9. arXiv:2311.10537  [pdf, other]

    cs.CL cs.AI

    MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning

    Authors: Xiangru Tang, Anni Zou, Zhuosheng Zhang, Ziming Li, Yilun Zhao, Xingyao Zhang, Arman Cohan, Mark Gerstein

    Abstract: Large language models (LLMs), despite their remarkable progress across various general domains, encounter significant barriers in medicine and healthcare. This field faces unique challenges such as domain-specific terminologies and reasoning over specialized knowledge. To address these issues, we propose MedAgents, a novel multi-disciplinary collaboration framework for the medical domain. MedAgent…

    Submitted 4 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  10. arXiv:2310.06692  [pdf, other]

    cs.CL cs.AI

    Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models

    Authors: Anni Zou, Zhuosheng Zhang, Hai Zhao, Xiangru Tang

    Abstract: Large language models (LLMs) have unveiled remarkable reasoning capabilities by exploiting chain-of-thought (CoT) prompting, which generates intermediate reasoning chains to serve as the rationale for deriving the answer. However, current CoT methods either simply employ general prompts such as Let's think step by step, or heavily rely on pre-defined task-specific demonstrations to attain preferab…

    Submitted 20 February, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: 17 pages, 12 figures

  11. arXiv:2310.01405  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.CY

    Representation Engineering: A Top-Down Approach to AI Transparency

    Authors: Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks

    Abstract: In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience. RepE places population-level representations, rather than neurons or circuits, at the center of analysis, equipping us with novel methods for monitoring and manipulating high-level cognitive p…

    Submitted 10 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Code is available at https://github.com/andyzoujm/representation-engineering

  12. arXiv:2307.15043  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Universal and Transferable Adversarial Attacks on Aligned Language Models

    Authors: Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson

    Abstract: Because "out-of-the-box" large language models are capable of generating a great deal of objectionable content, recent work has focused on aligning these models in an attempt to prevent undesirable generation. While there has been some success at circumventing these measures -- so-called "jailbreaks" against LLMs -- these attacks have required significant human ingenuity and are brittle in practic…

    Submitted 20 December, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: Website: http://llm-attacks.org/

  13. arXiv:2305.14405  [pdf, other]

    cs.LG cs.AI cs.AR

    NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference

    Authors: Ruiqi Sun, Siwei Ye, Jie Zhao, Xin He, Yiran Li, An Zou

    Abstract: The inherent diversity of computation types within individual Deep Neural Network (DNN) models imposes a corresponding need for a varied set of computation units within hardware processors. This diversity poses a significant constraint on computation efficiency during the execution of different neural networks. In this study, we present NeuralMatrix, a framework that transforms the computation of…

    Submitted 8 February, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 11 pages, 6 figures, Submitted to 41st International Conference on Machine Learning

  14. arXiv:2305.05921  [pdf, other]

    cs.CL

    Decker: Double Check with Heterogeneous Knowledge for Commonsense Fact Verification

    Authors: Anni Zou, Zhuosheng Zhang, Hai Zhao

    Abstract: Commonsense fact verification, as a challenging branch of commonsense question-answering (QA), aims to verify through facts whether a given commonsense claim is correct or not. Answering commonsense questions necessitates a combination of knowledge from various levels. However, existing studies primarily rest on grasping either unstructured evidence or potential reasoning paths from structured kno…

    Submitted 27 May, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 Findings

  15. arXiv:2304.07689  [pdf, other]

    cs.CV cs.AI cs.IT cs.LG stat.ML

    Learning Empirical Bregman Divergence for Uncertain Distance Representation

    Authors: Zhiyuan Li, Ziru Liu, Anna Zou, Anca L. Ralescu

    Abstract: Deep metric learning techniques have been used for visual representation in various supervised and unsupervised learning tasks through learning embeddings of samples with deep networks. However, classic approaches, which employ a fixed distance metric as a similarity function between two embeddings, may lead to suboptimal performance for capturing the complex data distribution. The Bregman diverge…

    Submitted 15 May, 2023; v1 submitted 16 April, 2023; originally announced April 2023.

    Comments: Accepted by IEEE FUSION 2023

  16. arXiv:2304.03279  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

    Authors: Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks

    Abstract: Artificial agents have traditionally been trained to maximize reward, which may incentivize power-seeking and deception, analogous to how next-token prediction in language models (LMs) may incentivize toxicity. So do agents naturally learn to be Machiavellian? And how do we measure these behaviors in general-purpose models such as GPT-4? Towards answering these questions, we introduce MACHIAVELLI,…

    Submitted 12 June, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: ICML 2023 Oral (camera-ready); 31 pages, 5 figures

  17. arXiv:2303.06189  [pdf, other]

    cs.LG cs.DC

    Papaya: Federated Learning, but Fully Decentralized

    Authors: Ram M Kripa, Andy Zou, Ryan Jia, Kenny Huang

    Abstract: Federated Learning systems use a centralized server to aggregate model updates. This is a bandwidth and resource-heavy constraint and exposes the system to privacy concerns. We instead implement a peer to peer learning system in which nodes train on their own data and periodically perform a weighted average of their parameters with that of their peers according to a learned trust matrix. So far, w…

    Submitted 10 March, 2023; originally announced March 2023.

  18. arXiv:2301.12549  [pdf, other]

    cs.LG cs.CV

    Unlocking Deterministic Robustness Certification on ImageNet

    Authors: Kai Hu, Andy Zou, Zifan Wang, Klas Leino, Matt Fredrikson

    Abstract: Despite the promise of Lipschitz-based methods for provably-robust deep learning with deterministic guarantees, current state-of-the-art results are limited to feed-forward Convolutional Networks (ConvNets) on low-dimensional data, such as CIFAR-10. This paper investigates strategies for expanding certifiably robust training to larger, deeper models. A key challenge in certifying deep networks is…

    Submitted 29 October, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

  19. arXiv:2210.10039  [pdf, other]

    cs.CV cs.CY cs.LG

    How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios

    Authors: Mantas Mazeika, Eric Tang, Andy Zou, Steven Basart, Jun Shern Chan, Dawn Song, David Forsyth, Jacob Steinhardt, Dan Hendrycks

    Abstract: In recent years, deep neural networks have demonstrated increasingly strong abilities to recognize objects and activities in videos. However, as video understanding becomes widely used in real-world applications, a key consideration is developing human-centric systems that understand not only the content of the video but also how it would affect the wellbeing and emotional state of viewers. To fac…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022; datasets available at https://github.com/hendrycks/emodiversity/

  20. arXiv:2206.15474  [pdf, other]

    cs.LG cs.CL

    Forecasting Future World Events with Neural Networks

    Authors: Andy Zou, Tristan Xiao, Ryan Jia, Joe Kwon, Mantas Mazeika, Richard Li, Dawn Song, Jacob Steinhardt, Owain Evans, Dan Hendrycks

    Abstract: Forecasting future world events is a challenging but valuable task. Forecasts of climate, geopolitical conflict, pandemics and economic indicators help shape policy and decision making. In these domains, the judgment of expert humans contributes to the best forecasts. Given advances in language modeling, can these forecasts be automated? To this end, we introduce Autocast, a dataset containing tho…

    Submitted 9 October, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022; our dataset is available at https://github.com/andyzoujm/autocast

  21. arXiv:2206.04685  [pdf, other]

    cs.LG cs.AR cs.NE

    Predictive Exit: Prediction of Fine-Grained Early Exits for Computation- and Energy-Efficient Inference

    Authors: Xiangjie Li, Chenfei Lou, Zhengping Zhu, Yuchi Chen, Yingtao Shen, Yehan Ma, An Zou

    Abstract: By adding exiting layers to the deep learning networks, early exit can terminate the inference earlier with accurate results. The passive decision-making of whether to exit or continue the next layer has to go through every pre-placed exiting layer until it exits. In addition, it is also hard to adjust the configurations of the computing platforms alongside the inference proceeds. By incorporating…

    Submitted 28 December, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

  22. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  23. arXiv:2203.13262  [pdf, ps, other]

    cs.NE cs.AI q-bio.NC

    Interpretability of Neural Network With Physiological Mechanisms

    Authors: Anna Zou, Zhiyuan Li

    Abstract: Deep learning continues to play as a powerful state-of-art technique that has achieved extraordinary accuracy levels in various domains of regression and classification tasks, including images, video, signal, and natural language data. The original goal of proposing the neural network model is to improve the understanding of complex human brains using a mathematical expression approach. However, r…

    Submitted 2 June, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Updated a new version

  24. arXiv:2112.05135  [pdf, other]

    cs.LG cs.CV

    PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures

    Authors: Dan Hendrycks, Andy Zou, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, Jacob Steinhardt

    Abstract: In real-world applications of machine learning, reliable and safe systems must consider measures of performance beyond standard test set accuracy. These other goals include out-of-distribution (OOD) robustness, prediction consistency, resilience to adversaries, calibrated uncertainty estimates, and the ability to detect anomalous inputs. However, improving performance towards these goals is often…

    Submitted 29 March, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

    Comments: CVPR 2022. Code and models are available at https://github.com/andyzoujm/pixmix

  25. arXiv:2110.13136  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    What Would Jiminy Cricket Do? Towards Agents That Behave Morally

    Authors: Dan Hendrycks, Mantas Mazeika, Andy Zou, Sahil Patel, Christine Zhu, Jesus Navarro, Dawn Song, Bo Li, Jacob Steinhardt

    Abstract: When making everyday decisions, people are guided by their conscience, an internal sense of right and wrong. By contrast, artificial agents are currently not endowed with a moral sense. As a consequence, they may learn to behave immorally when trained on environments that ignore moral concerns, such as violent video games. With the advent of generally capable agents that pretrain on many environme…

    Submitted 7 February, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021. Environments available here https://github.com/hendrycks/jiminy-cricket

  26. arXiv:2101.10463  [pdf, other]

    cs.DC cs.AI cs.AR cs.GR

    RTGPU: Real-Time GPU Scheduling of Hard Deadline Parallel Tasks with Fine-Grain Utilization

    Authors: An Zou, Jing Li, Christopher D. Gill, Xuan Zhang

    Abstract: Many emerging cyber-physical systems, such as autonomous vehicles and robots, rely heavily on artificial intelligence and machine learning algorithms to perform important system operations. Since these highly parallel applications are computationally intensive, they need to be accelerated by graphics processing units (GPUs) to meet stringent timing constraints. However, despite the wide adoption o…

    Submitted 6 February, 2023; v1 submitted 25 January, 2021; originally announced January 2021.

  27. arXiv:2009.03300  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Measuring Massive Multitask Language Understanding

    Authors: Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt

    Abstract: We propose a new test to measure a text model's multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more. To attain high accuracy on this test, models must possess extensive world knowledge and problem solving ability. We find that while most recent models have near random-chance accuracy, the very largest GPT-3 model improves over…

    Submitted 12 January, 2021; v1 submitted 7 September, 2020; originally announced September 2020.

    Comments: ICLR 2021; the test and code is available at https://github.com/hendrycks/test

  28. arXiv:1911.11132  [pdf, other]

    cs.CV cs.LG

    Scaling Out-of-Distribution Detection for Real-World Settings

    Authors: Dan Hendrycks, Steven Basart, Mantas Mazeika, Andy Zou, Joe Kwon, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song

    Abstract: Detecting out-of-distribution examples is important for safety-critical machine learning applications such as detecting novel biological phenomena and self-driving cars. However, existing research mainly focuses on simple small-scale settings. To set the stage for more realistic out-of-distribution detection, we depart from small-scale settings and explore large-scale multiclass and multi-label se…

    Submitted 15 May, 2022; v1 submitted 25 November, 2019; originally announced November 2019.

    Comments: ICML 2022; The Species dataset and code are available at https://github.com/hendrycks/anomaly-seg

  29. arXiv:1910.14627  [pdf, other]

    cs.NE

    An Automatic Design Framework of Swarm Pattern Formation based on Multi-objective Genetic Programming

    Authors: Zhun Fan, Zhaojun Wang, Xiaomin Zhu, Bingliang Hu, Anmin Zou, Dongwei Bao

    Abstract: Most existing swarm pattern formation methods depend on a predefined gene regulatory network (GRN) structure that requires designers' priori knowledge, which is difficult to adapt to complex and changeable environments. To dynamically adapt to the complex and changeable environments, we propose an automatic design framework of swarm pattern formation based on multi-objective genetic programming. T…

    Submitted 1 November, 2019; v1 submitted 31 October, 2019; originally announced October 2019.