subscribe to arXiv mailings

3D Geometric Shape Assembly via Efficient Point Cloud Matching

Authors: Nahyuk Lee, Juhong Min, Junha Lee, Seungwook Kim, Kanghee Lee, Jaesik Park, Minsu Cho

Abstract: Learning to assemble geometric shapes into a larger target structure is a pivotal task in various practical applications. In this work, we tackle this problem by establishing local correspondences between point clouds of part shapes in both coarse- and fine-levels. To this end, we introduce Proxy Match Transform (PMT), an approximate high-order feature transform layer that enables reliable matchin… ▽ More Learning to assemble geometric shapes into a larger target structure is a pivotal task in various practical applications. In this work, we tackle this problem by establishing local correspondences between point clouds of part shapes in both coarse- and fine-levels. To this end, we introduce Proxy Match Transform (PMT), an approximate high-order feature transform layer that enables reliable matching between mating surfaces of parts while incurring low costs in memory and computation. Building upon PMT, we introduce a new framework, dubbed Proxy Match TransformeR (PMTR), for the geometric assembly task. We evaluate the proposed PMTR on the large-scale 3D geometric shape assembly benchmark dataset of Breaking Bad and demonstrate its superior performance and efficiency compared to state-of-the-art methods. Project page: https://nahyuklee.github.io/pmtr. △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: Accepted to ICML 2024

arXiv:2407.10461 [pdf, ps, other]

Multibeam Satellite Communications with Massive MIMO: Asymptotic Performance Analysis and Design Insights

Authors: Seyong Kim, Jinseok Choi, Wonjae Shin, Namyoon Lee, Jeonghun Park

Abstract: To achieve high performance without substantial overheads associated with channel state information (CSI) of ground users, we consider a fixed-beam precoding approach, where a satellite forms multiple fixed-beams without relying on CSI, then select a suitable user set for each beam. Upon this precoding method, we put forth a satellite equipped with massive multiple-input multiple-output (MIMO), by… ▽ More To achieve high performance without substantial overheads associated with channel state information (CSI) of ground users, we consider a fixed-beam precoding approach, where a satellite forms multiple fixed-beams without relying on CSI, then select a suitable user set for each beam. Upon this precoding method, we put forth a satellite equipped with massive multiple-input multiple-output (MIMO), by which inter-beam interference is efficiently mitigated by narrowing corresponding beam width. By modeling the ground users' locations via a Poisson point process, we rigorously analyze the achievable performance of the presented multibeam satellite system. In particular, we investigate the asymptotic scaling laws that reveal the interplay between the user density, the number of beams, and the number of antennas. Our analysis offers critical design insights for the multibeam satellite with massive MIMO: i) If the user density scales in power with the number of antennas, the considered precoding can achieve a linear fraction of the optimal rate in the asymptotic regime. ii) A certain additional scaling factor for the user density is needed as the number of beams increases to maintain the asymptotic optimality. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.09043 [pdf, other]

Molecule Language Model with Augmented Pairs and Expertise Transfer

Authors: Namkyeong Lee, Siddhartha Laghuvarapu, Chanyoung Park, Jimeng Sun

Abstract: Understanding the molecules and their textual descriptions via molecule language models (MoLM) recently got a surge of interest among researchers. However, unique challenges exist in the field of MoLM due to 1) a limited amount of molecule-text paired data and 2) missing expertise that occurred due to the specialized areas of focus among the experts. To this end, we propose AMOLE, which 1) augment… ▽ More Understanding the molecules and their textual descriptions via molecule language models (MoLM) recently got a surge of interest among researchers. However, unique challenges exist in the field of MoLM due to 1) a limited amount of molecule-text paired data and 2) missing expertise that occurred due to the specialized areas of focus among the experts. To this end, we propose AMOLE, which 1) augments molecule-text pairs with structural similarity preserving loss, and 2) transfers the expertise between the molecules. Extensive experiments on various downstream tasks demonstrate the superiority of AMOLE in comprehending molecules and their descriptions, highlighting its potential for application in real-world drug discovery. △ Less

Submitted 16 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

Comments: CIKM 2024 / ACL 2024 Workshop on Languages and Molecule

arXiv:2406.15524 [pdf, other]

Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization

Authors: Sungbin Shin, Wonpyo Park, Jaeho Lee, Namhoon Lee

Abstract: This work suggests fundamentally rethinking the current practice of pruning large language models (LLMs). The way it is done is by divide and conquer: split the model into submodels, sequentially prune them, and reconstruct predictions of the dense counterparts on small calibration data one at a time; the final model is obtained simply by putting the resulting sparse submodels together. While this… ▽ More This work suggests fundamentally rethinking the current practice of pruning large language models (LLMs). The way it is done is by divide and conquer: split the model into submodels, sequentially prune them, and reconstruct predictions of the dense counterparts on small calibration data one at a time; the final model is obtained simply by putting the resulting sparse submodels together. While this approach enables pruning under memory constraints, it generates high reconstruction errors. In this work, we first present an array of reconstruction techniques that can significantly reduce this error by more than $90\%$. Unwittingly, however, we discover that minimizing reconstruction error is not always ideal and can overfit the given calibration data, resulting in rather increased language perplexity and poor performance at downstream tasks. We find out that a strategy of self-generating calibration data can mitigate this trade-off between reconstruction and generalization, suggesting new directions in the presence of both benefits and pitfalls of reconstruction for pruning LLMs. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.09948 [pdf, other]

BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages

Authors: Junho Myung, Nayeon Lee, Yi Zhou, Jiho Jin, Rifki Afina Putri, Dimosthenis Antypas, Hsuvas Borkakoty, Eunsu Kim, Carla Perez-Almendros, Abinew Ali Ayele, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García, Hwaran Lee, Shamsuddeen Hassan Muhammad, Kiwoong Park, Anar Sabuhi Rzayev, Nina White, Seid Muhie Yimam, Mohammad Taher Pilehvar, Nedjma Ousidhoum, Jose Camacho-Collados, Alice Oh

Abstract: Large language models (LLMs) often lack culture-specific knowledge of daily life, especially across diverse regions and non-English languages. Existing benchmarks for evaluating LLMs' cultural sensitivities are limited to a single language or collected from online sources such as Wikipedia, which do not reflect the mundane everyday lifestyles of diverse regions. That is, information about the food… ▽ More Large language models (LLMs) often lack culture-specific knowledge of daily life, especially across diverse regions and non-English languages. Existing benchmarks for evaluating LLMs' cultural sensitivities are limited to a single language or collected from online sources such as Wikipedia, which do not reflect the mundane everyday lifestyles of diverse regions. That is, information about the food people eat for their birthday celebrations, spices they typically use, musical instruments youngsters play, or the sports they practice in school is common cultural knowledge but uncommon in easily collected online sources, especially for underrepresented cultures. To address this issue, we introduce BLEnD, a hand-crafted benchmark designed to evaluate LLMs' everyday knowledge across diverse cultures and languages. BLEnD comprises 52.6k question-answer pairs from 16 countries/regions, in 13 different languages, including low-resource ones such as Amharic, Assamese, Azerbaijani, Hausa, and Sundanese. We construct the benchmark to include two formats of questions: short-answer and multiple-choice. We show that LLMs perform better for cultures that are highly represented online, with a maximum 57.34% difference in GPT-4, the best-performing model, in the short-answer format. For cultures represented by mid-to-high-resource languages, LLMs perform better in their local languages, but for cultures represented by low-resource languages, LLMs perform better in English than the local languages. We make our dataset publicly available at: https://github.com/nlee0212/BLEnD. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.06424 [pdf, other]

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

Authors: Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong

Abstract: Modern alignment techniques based on human preferences, such as RLHF and DPO, typically employ divergence regularization relative to the reference model to ensure training stability. However, this often limits the flexibility of models during alignment, especially when there is a clear distributional discrepancy between the preference data and the reference model. In this paper, we focus on the al… ▽ More Modern alignment techniques based on human preferences, such as RLHF and DPO, typically employ divergence regularization relative to the reference model to ensure training stability. However, this often limits the flexibility of models during alignment, especially when there is a clear distributional discrepancy between the preference data and the reference model. In this paper, we focus on the alignment of recent text-to-image diffusion models, such as Stable Diffusion XL (SDXL), and find that this "reference mismatch" is indeed a significant problem in aligning these models due to the unstructured nature of visual modalities: e.g., a preference for a particular stylistic aspect can easily induce such a discrepancy. Motivated by this observation, we propose a novel and memory-friendly preference alignment method for diffusion models that does not depend on any reference model, coined margin-aware preference optimization (MaPO). MaPO jointly maximizes the likelihood margin between the preferred and dispreferred image sets and the likelihood of the preferred sets, simultaneously learning general stylistic features and preferences. For evaluation, we introduce two new pairwise preference datasets, which comprise self-generated image pairs from SDXL, Pick-Style and Pick-Safety, simulating diverse scenarios of reference mismatch. Our experiments validate that MaPO can significantly improve alignment on Pick-Style and Pick-Safety and general preference alignment when used with Pick-a-Pic v2, surpassing the base SDXL and other existing methods. Our code, models, and datasets are publicly available via https://mapo-t2i.github.io △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: Preprint

arXiv:2406.05761 [pdf, other]

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang , et al. (7 additional authors not shown)

Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec… ▽ More As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on specific capabilities such as instruction following, leading to coverage bias. To overcome these limitations, we introduce the BiGGen Bench, a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks. A key feature of the BiGGen Bench is its use of instance-specific evaluation criteria, closely mirroring the nuanced discernment of human evaluation. We apply this benchmark to assess 103 frontier LMs using five evaluator LMs. Our code, data, and evaluation results are all publicly available at https://github.com/prometheus-eval/prometheus-eval/tree/main/BiGGen-Bench. △ Less

Submitted 9 June, 2024; originally announced June 2024.

Comments: Work in Progress

arXiv:2404.14276 [pdf, other]

A Bayesian Approach for Prioritising Driving Behaviour Investigations in Telematic Auto Insurance Policies

Authors: Mark McLeod, Bernardo Perez-Orozco, Nika Lee, Davide Zilli

Abstract: Automotive insurers increasingly have access to telematic information via black-box recorders installed in the insured vehicle, and wish to identify undesirable behaviour which may signify increased risk or uninsured activities. However, identification of such behaviour with machine learning is non-trivial, and results are far from perfect, requiring human investigation to verify suspected cases.… ▽ More Automotive insurers increasingly have access to telematic information via black-box recorders installed in the insured vehicle, and wish to identify undesirable behaviour which may signify increased risk or uninsured activities. However, identification of such behaviour with machine learning is non-trivial, and results are far from perfect, requiring human investigation to verify suspected cases. An appropriately formed priority score, generated by automated analysis of GPS data, allows underwriters to make more efficient use of their time, improving detection of the behaviour under investigation. An example of such behaviour is the use of a privately insured vehicle for commercial purposes, such as delivering meals and parcels. We first make use of trip GPS and accelerometer data, augmented by geospatial information, to train an imperfect classifier for delivery driving on a per-trip basis. We make use of a mixture of Beta-Binomial distributions to model the propensity of a policyholder to undertake trips which result in a positive classification as being drawn from either a rare high-scoring or common low-scoring group, and learn the parameters of this model using MCMC. This model provides us with a posterior probability that any policyholder will be a regular generator of automated alerts given any number of trips and alerts. This posterior probability is converted to a priority score, which was used to select the most valuable candidates for manual investigation. Testing over a 1-year period ranked policyholders by likelihood of commercial driving activity on a weekly basis. The top 0.9% have been reviewed at least once by the underwriters at the time of writing, and of those 99.4% have been confirmed as correctly identified, showing the approach has achieved a significant improvement in efficiency of human resource allocation compared to manual searching. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Comments: International Congress of Actuaries (2023)

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs. △ Less

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2403.18932 [pdf, other]

Measuring Political Bias in Large Language Models: What Is Said and How It Is Said

Authors: Yejin Bang, Delong Chen, Nayeon Lee, Pascale Fung

Abstract: We propose to measure political bias in LLMs by analyzing both the content and style of their generated content regarding political issues. Existing benchmarks and measures focus on gender and racial biases. However, political bias exists in LLMs and can lead to polarization and other harms in downstream applications. In order to provide transparency to users, we advocate that there should be fine… ▽ More We propose to measure political bias in LLMs by analyzing both the content and style of their generated content regarding political issues. Existing benchmarks and measures focus on gender and racial biases. However, political bias exists in LLMs and can lead to polarization and other harms in downstream applications. In order to provide transparency to users, we advocate that there should be fine-grained and explainable measures of political biases generated by LLMs. Our proposed measure looks at different political issues such as reproductive rights and climate change, at both the content (the substance of the generation) and the style (the lexical polarity) of such bias. We measured the political bias in eleven open-sourced LLMs and showed that our proposed framework is easily scalable to other topics and is explainable. △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: 16 pages

arXiv:2403.16372 [pdf, other]

SignSGD with Federated Voting

Authors: Chanho Park, H. Vincent Poor, Namyoon Lee

Abstract: Distributed learning is commonly used for accelerating model training by harnessing the computational capabilities of multiple-edge devices. However, in practical applications, the communication delay emerges as a bottleneck due to the substantial information exchange required between workers and a central parameter server. SignSGD with majority voting (signSGD-MV) is an effective distributed lear… ▽ More Distributed learning is commonly used for accelerating model training by harnessing the computational capabilities of multiple-edge devices. However, in practical applications, the communication delay emerges as a bottleneck due to the substantial information exchange required between workers and a central parameter server. SignSGD with majority voting (signSGD-MV) is an effective distributed learning algorithm that can significantly reduce communication costs by one-bit quantization. However, due to heterogeneous computational capabilities, it fails to converge when the mini-batch sizes differ among workers. To overcome this, we propose a novel signSGD optimizer with \textit{federated voting} (signSGD-FV). The idea of federated voting is to exploit learnable weights to perform weighted majority voting. The server learns the weights assigned to the edge devices in an online fashion based on their computational capabilities. Subsequently, these weights are employed to decode the signs of the aggregated local gradients in such a way to minimize the sign decoding error probability. We provide a unified convergence rate analysis framework applicable to scenarios where the estimated weights are known to the parameter server either perfectly or imperfectly. We demonstrate that the proposed signSGD-FV algorithm has a theoretical convergence guarantee even when edge devices use heterogeneous mini-batch sizes. Experimental results show that signSGD-FV outperforms signSGD-MV, exhibiting a faster convergence rate, especially in heterogeneous mini-batch sizes. △ Less

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.15692 [pdf, other]

Block Orthogonal Sparse Superposition Codes for $ \sf{L}^3 $ Communications: Low Error Rate, Low Latency, and Low Power Consumption

Authors: Donghwa Han, Bowhyung Lee, Min Jang, Donghun Lee, Seho Myung, Namyoon Lee

Abstract: Block orthogonal sparse superposition (BOSS) code is a class of joint coded modulation methods, which can closely achieve the finite-blocklength capacity with a low-complexity decoder at a few coding rates under Gaussian channels. However, for fading channels, the code performance degrades considerably because coded symbols experience different channel fading effects. In this paper, we put forth n… ▽ More Block orthogonal sparse superposition (BOSS) code is a class of joint coded modulation methods, which can closely achieve the finite-blocklength capacity with a low-complexity decoder at a few coding rates under Gaussian channels. However, for fading channels, the code performance degrades considerably because coded symbols experience different channel fading effects. In this paper, we put forth novel joint demodulation and decoding methods for BOSS codes under fading channels. For a fast fading channel, we present a minimum mean square error approximate maximum a posteriori (MMSE-A-MAP) algorithm for the joint demodulation and decoding when channel state information is available at the receiver (CSIR). We also propose a joint demodulation and decoding method without using CSIR for a block fading channel scenario. We refer to this as the non-coherent sphere decoding (NSD) algorithm. Simulation results demonstrate that BOSS codes with MMSE-A-MAP decoding outperform CRC-aided polar codes, while NSD decoding achieves comparable performance to quasi-maximum likelihood decoding with significantly reduced complexity. Both decoding algorithms are suitable for parallelization, satisfying low-latency constraints. Additionally, real-time simulations on a software-defined radio testbed validate the feasibility of using BOSS codes for low-power transmission. △ Less

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.15042 [pdf, other]

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

Authors: Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, Gopala Anumanchipalli, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

Abstract: Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks. While many real-world applications still require fine-tuning to reach satisfactory levels of performance, many of them are in the low-data regime, making fine-tuning challenging. To address this, we propose LLM2LLM, a targeted and iterative data augmentation st… ▽ More Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks. While many real-world applications still require fine-tuning to reach satisfactory levels of performance, many of them are in the low-data regime, making fine-tuning challenging. To address this, we propose LLM2LLM, a targeted and iterative data augmentation strategy that uses a teacher LLM to enhance a small seed dataset by augmenting additional data that can be used for fine-tuning on a specific task. LLM2LLM (1) fine-tunes a baseline student LLM on the initial seed data, (2) evaluates and extracts data points that the model gets wrong, and (3) uses a teacher LLM to generate synthetic data based on these incorrect data points, which are then added back into the training data. This approach amplifies the signal from incorrectly predicted data points by the LLM during training and reintegrates them into the dataset to focus on more challenging examples for the LLM. Our results show that LLM2LLM significantly enhances the performance of LLMs in the low-data regime, outperforming both traditional fine-tuning and other data augmentation baselines. LLM2LLM reduces the dependence on labor-intensive data curation and paves the way for more scalable and performant LLM solutions, allowing us to tackle data-constrained domains and tasks. We achieve improvements up to 24.2% on the GSM8K dataset, 32.6% on CaseHOLD, 32.0% on SNIPS, 52.6% on TREC and 39.8% on SST-2 over regular fine-tuning in the low-data regime using a Llama-2-7B student model. Our code is available at https://github.com/SqueezeAILab/LLM2LLM . △ Less

Submitted 13 July, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: ACL 2024

arXiv:2403.11762 [pdf, other]

Full-Duplex MU-MIMO Systems with Coarse Quantization: How Many Bits Do We Need?

Authors: Seunghyeong Yoo, Seokjun Park, Mintaek Oh, Namyoon Lee, Jinseok Choi

Abstract: This paper investigates full-duplex (FD) multi-user multiple-input multiple-output (MU-MIMO) system design with coarse quantization. We first analyze the impact of self-interference (SI) on quantization in FD single-input single-output systems. The analysis elucidates that the minimum required number of analog-to-digital converter (ADC) bits is logarithmically proportional to the ratio of total re… ▽ More This paper investigates full-duplex (FD) multi-user multiple-input multiple-output (MU-MIMO) system design with coarse quantization. We first analyze the impact of self-interference (SI) on quantization in FD single-input single-output systems. The analysis elucidates that the minimum required number of analog-to-digital converter (ADC) bits is logarithmically proportional to the ratio of total received power to the received power of desired signals. Motivated by this, we design a FD MIMO beamforming method that effectively manages the SI. Dividing a spectral efficiency maximization beamforming problem into two sub-problems for alternating optimization, we address the first by optimizing the precoder: obtaining a generalized eigenvalue problem from the first-order optimality condition, where the principal eigenvector is the optimal stationary solution, and adopting a power iteration method to identify this eigenvector. Subsequently, a quantization-aware minimum mean square error combiner is computed for the derived precoder. Through numerical studies, we observe that the proposed beamformer reduces the minimum required number of ADC bits for achieving higher spectral efficiency than that of half-duplex (HD) systems, compared to FD benchmarks. The overall analysis shows that, unlike with quantized HD systems, more than 6 bits are required for the ADC to fully realize the potential of the quantized FD system. △ Less

Submitted 18 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.07821 [pdf, other]

Augmenting Interpolation-Based Model Checking with Auxiliary Invariants (Extended Version)

Authors: Dirk Beyer, Po-Chun Chien, Nian-Ze Lee

Abstract: Software model checking is a challenging problem, and generating relevant invariants is a key factor in proving the safety properties of a program. Program invariants can be obtained by various approaches, including lightweight procedures based on data-flow analysis and intensive techniques using Craig interpolation. Although data-flow analysis runs efficiently, it often produces invariants that a… ▽ More Software model checking is a challenging problem, and generating relevant invariants is a key factor in proving the safety properties of a program. Program invariants can be obtained by various approaches, including lightweight procedures based on data-flow analysis and intensive techniques using Craig interpolation. Although data-flow analysis runs efficiently, it often produces invariants that are too weak to prove the properties. By contrast, interpolation-based approaches build strong invariants from interpolants, but they might not scale well due to expensive interpolation procedures. Invariants can also be injected into model-checking algorithms to assist the analysis. Invariant injection has been studied for many well-known approaches, including k-induction, predicate abstraction, and symbolic execution. We propose an augmented interpolation-based verification algorithm that injects external invariants into interpolation-based model checking (McMillan, 2003), a hardware model-checking algorithm recently adopted for software verification. The auxiliary invariants help prune unreachable states in Craig interpolants and confine the analysis to the reachable parts of a program. We implemented the proposed technique in the verification framework CPAchecker and evaluated it against mature SMT-based methods in CPAchecker as well as other state-of-the-art software verifiers. We found that injecting invariants reduces the number of interpolation queries needed to prove safety properties and improves the run-time efficiency. Consequently, the proposed invariant-injection approach verified difficult tasks that none of its plain version (i.e., without invariants), the invariant generator, or any compared tools could solve. △ Less

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07691 [pdf, other]

ORPO: Monolithic Preference Optimization without Reference Model

Authors: Jiwoo Hong, Noah Lee, James Thorne

Abstract: While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence. In this paper, we study the crucial role of SFT within the context of preference alignment, emphasizing that a minor penalty for the disfavored generation style is sufficient for preference-aligned SFT. Building… ▽ More While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence. In this paper, we study the crucial role of SFT within the context of preference alignment, emphasizing that a minor penalty for the disfavored generation style is sufficient for preference-aligned SFT. Building on this foundation, we introduce a straightforward and innovative reference model-free monolithic odds ratio preference optimization algorithm, ORPO, eliminating the necessity for an additional preference alignment phase. We demonstrate, both empirically and theoretically, that the odds ratio is a sensible choice for contrasting favored and disfavored styles during SFT across the diverse sizes from 125M to 7B. Specifically, fine-tuning Phi-2 (2.7B), Llama-2 (7B), and Mistral (7B) with ORPO on the UltraFeedback alone surpasses the performance of state-of-the-art language models with more than 7B and 13B parameters: achieving up to 12.20% on $\text{AlpacaEval}_{2.0}$ (Figure 1), 66.19% on IFEval (instruction-level loose, Table 6), and 7.32 in MT-Bench (Figure 12). We release code and model checkpoints for Mistral-ORPO-$α$ (7B) and Mistral-ORPO-$β$ (7B). △ Less

Submitted 14 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: Preprint

arXiv:2402.09155 [pdf, ps, other]

Joint and Robust Beamforming Framework for Integrated Sensing and Communication Systems

Authors: Jinseok Choi, Jeonghun Park, Namyoon Lee, Ahmed Alkhateeb

Abstract: Integrated sensing and communication (ISAC) is widely recognized as a fundamental enabler for future wireless communications. In this paper, we present a joint communication and radar beamforming framework for maximizing a sum spectral efficiency (SE) while guaranteeing desired radar performance with imperfect channel state information (CSI) in multi-user and multi-target ISAC systems. To this end… ▽ More Integrated sensing and communication (ISAC) is widely recognized as a fundamental enabler for future wireless communications. In this paper, we present a joint communication and radar beamforming framework for maximizing a sum spectral efficiency (SE) while guaranteeing desired radar performance with imperfect channel state information (CSI) in multi-user and multi-target ISAC systems. To this end, we adopt either a radar transmit beam mean square error (MSE) or receive signal-to-clutter-plus-noise ratio (SCNR) as a radar performance constraint of a sum SE maximization problem. To resolve inherent challenges such as non-convexity and imperfect CSI, we reformulate the problems and identify first-order optimality conditions for the joint radar and communication beamformer. Turning the condition to a nonlinear eigenvalue problem with eigenvector dependency (NEPv), we develop an alternating method which finds the joint beamformer through power iteration and a Lagrangian multiplier through binary search. The proposed framework encompasses both the radar metrics and is robust to channel estimation error with low complexity. Simulations validate the proposed methods. In particular, we observe that the MSE and SCNR constraints exhibit complementary performance depending on the operating environment, which manifests the importance of the proposed comprehensive and robust optimization framework. △ Less

Submitted 14 February, 2024; originally announced February 2024.

Comments: submitted for possible IEEE publication

arXiv:2402.07381 [pdf, other]

RIS-Empowered LEO Satellite Networks for 6G: Promising Usage Scenarios and Future Directions

Authors: Mesut Toka, Byungju Lee, Jaehyup Seong, Aryan Kaushik, Juhwan Lee, Jungwoo Lee, Namyoon Lee, Wonjae Shin, H. Vincent Poor

Abstract: Low-Earth orbit (LEO) satellite systems have been deemed a promising key enabler for current 5G and the forthcoming 6G wireless networks. Such LEO satellite constellations can provide worldwide three-dimensional coverage, high data rate, and scalability, thus enabling truly ubiquitous connectivity. On the other hand, another promising technology, reconfigurable intelligent surfaces (RISs), has eme… ▽ More Low-Earth orbit (LEO) satellite systems have been deemed a promising key enabler for current 5G and the forthcoming 6G wireless networks. Such LEO satellite constellations can provide worldwide three-dimensional coverage, high data rate, and scalability, thus enabling truly ubiquitous connectivity. On the other hand, another promising technology, reconfigurable intelligent surfaces (RISs), has emerged with favorable features, such as flexible deployment, cost & power efficiency, less transmission delay, noise-free nature, and in-band full-duplex structure. LEO satellite networks have many practical imperfections and limitations; however, exploiting RISs has been shown to be a potential solution to overcome these challenges. Particularly, RISs can enhance link quality, reduce the Doppler shift effect, and mitigate inter-/intra beam interference. In this article, we delve into exploiting RISs in LEO satellite networks. First, we present a holistic overview of LEO satellite communication and RIS technology, highlighting potential benefits and challenges. Second, we describe promising usage scenarios and applications in detail. Finally, we discuss potential future directions and challenges on RIS-empowered LEO networks, offering futuristic visions of the upcoming 6G era. △ Less

Submitted 11 February, 2024; originally announced February 2024.

Comments: 18 pages, 5 figures, Paper accepted by IEEE Communications Magazine

arXiv:2402.04248 [pdf, other]

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

Authors: Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos

Abstract: State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention. Although SSMs exhibit competitive performance, their in-context learning (ICL) capabilities, a remarkable emergent property of mo… ▽ More State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention. Although SSMs exhibit competitive performance, their in-context learning (ICL) capabilities, a remarkable emergent property of modern language models that enables task execution without parameter optimization, remain underexplored compared to Transformers. In this study, we evaluate the ICL performance of SSMs, focusing on Mamba, against Transformer models across various tasks. Our results show that SSMs perform comparably to Transformers in standard regression ICL tasks, while outperforming them in tasks like sparse parity learning. However, SSMs fall short in tasks involving non-standard retrieval functionality. To address these limitations, we introduce a hybrid model, MambaFormer, that combines Mamba with attention blocks, surpassing individual models in tasks where they struggle independently. Our findings suggest that hybrid architectures offer promising avenues for enhancing ICL in language models. △ Less

Submitted 25 April, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: Changes in v2: experiments on formal language ICL and explorations of width vs. depth on ICL; code repo available (24 pages, 10 figures)

arXiv:2402.01340 [pdf, ps, other]

SignSGD with Federated Defense: Harnessing Adversarial Attacks through Gradient Sign Decoding

Authors: Chanho Park, Namyoon Lee

Abstract: Distributed learning is an effective approach to accelerate model training using multiple workers. However, substantial communication delays emerge between workers and a parameter server due to massive costs associated with communicating gradients. SignSGD with majority voting (signSGD-MV) is a simple yet effective optimizer that reduces communication costs through one-bit quantization, yet the co… ▽ More Distributed learning is an effective approach to accelerate model training using multiple workers. However, substantial communication delays emerge between workers and a parameter server due to massive costs associated with communicating gradients. SignSGD with majority voting (signSGD-MV) is a simple yet effective optimizer that reduces communication costs through one-bit quantization, yet the convergence rates considerably decrease as adversarial workers increase. In this paper, we show that the convergence rate is invariant as the number of adversarial workers increases, provided that the number of adversarial workers is smaller than that of benign workers. The key idea showing this counter-intuitive result is our novel signSGD with federated defense (signSGD-FD). Unlike the traditional approaches, signSGD-FD exploits the gradient information sent by adversarial workers with the proper weights, which are obtained through gradient sign decoding. Experimental results demonstrate signSGD-FD achieves superior convergence rates over traditional algorithms in various adversarial attack scenarios. △ Less

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.05193 [pdf, ps, other]

Experiment Planning with Function Approximation

Authors: Aldo Pacchiano, Jonathan N. Lee, Emma Brunskill

Abstract: We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where there is a significant overhead to deploying adaptive algorithms -- for example, when the execution of the data collection policies is required to be distributed, or a human in the loop is needed to implement these policies -- producing in advance a set of policies for data coll… ▽ More We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where there is a significant overhead to deploying adaptive algorithms -- for example, when the execution of the data collection policies is required to be distributed, or a human in the loop is needed to implement these policies -- producing in advance a set of policies for data collection is paramount. We study the setting where a large dataset of contexts but not rewards is available and may be used by the learner to design an effective data collection strategy. Although when rewards are linear this problem has been well studied, results are still missing for more complex reward models. In this work we propose two experiment planning strategies compatible with function approximation. The first is an eluder planning and sampling procedure that can recover optimality guarantees depending on the eluder dimension of the reward function class. For the second, we show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small. We finalize our results introducing a statistical gap fleshing out the fundamental differences between planning and adaptive learning and provide results for planning with model selection. △ Less

Submitted 10 January, 2024; originally announced January 2024.

Comments: 10 pages main

arXiv:2312.13289 [pdf, other]

Stoichiometry Representation Learning with Polymorphic Crystal Structures

Authors: Namkyeong Lee, Heewoong Noh, Gyoung S. Na, Tianfan Fu, Jimeng Sun, Chanyoung Park

Abstract: Despite the recent success of machine learning (ML) in materials science, its success heavily relies on the structural description of crystal, which is itself computationally demanding and occasionally unattainable. Stoichiometry descriptors can be an alternative approach, which reveals the ratio between elements involved to form a certain compound without any structural information. However, it i… ▽ More Despite the recent success of machine learning (ML) in materials science, its success heavily relies on the structural description of crystal, which is itself computationally demanding and occasionally unattainable. Stoichiometry descriptors can be an alternative approach, which reveals the ratio between elements involved to form a certain compound without any structural information. However, it is not trivial to learn the representations of stoichiometry due to the nature of materials science called polymorphism, i.e., a single stoichiometry can exist in multiple structural forms due to the flexibility of atomic arrangements, inducing uncertainties in representation. To this end, we propose PolySRL, which learns the probabilistic representation of stoichiometry by utilizing the readily available structural information, whose uncertainty reveals the polymorphic structures of stoichiometry. Extensive experiments on sixteen datasets demonstrate the superiority of PolySRL, and analysis of uncertainties shed light on the applicability of PolySRL in real-world material discovery. The source code for PolySRL is available at https://github.com/Namkyeong/PolySRL_AI4Science. △ Less

Submitted 17 November, 2023; originally announced December 2023.

Comments: NeurIPS 2023 AI4Science Workshop

arXiv:2312.04511 [pdf, other]

An LLM Compiler for Parallel Function Calling

Authors: Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

Abstract: The reasoning capabilities of the recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has allowed LLMs to select and coordinate multiple functions based on the context to tackle more complex problems. However, current methods for function calling oft… ▽ More The reasoning capabilities of the recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has allowed LLMs to select and coordinate multiple functions based on the context to tackle more complex problems. However, current methods for function calling often require sequential reasoning and acting for each function which can result in high latency, cost, and sometimes inaccurate behavior. To address this, we introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multiple function calls. Drawing inspiration from the principles of classical compilers, LLMCompiler enables parallel function calling with three components: (i) a Function Calling Planner, formulating execution plans for function calling; (ii) a Task Fetching Unit, dispatching function calling tasks; and (iii) an Executor, executing these tasks in parallel. LLMCompiler automatically generates an optimized orchestration for the function calls and can be used with both open-source and closed-source models. We have benchmarked LLMCompiler on a range of tasks with different patterns of function calling. We observe consistent latency speedup of up to 3.7x, cost savings of up to 6.7x, and accuracy improvement of up to ~9% compared to ReAct. Our code is available at https://github.com/SqueezeAILab/LLMCompiler. △ Less

Submitted 4 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: ICML 2024

arXiv:2312.03901 [pdf, other]

Redrawing the 2012 map of the Maryland congressional districts

Authors: Noah Lee, Hyunwoo Park, Sangho Shim

Abstract: Gerrymandering is the practice of drawing biased electoral maps that manipulate the voter population to gain an advantage. The most recent time gerrymandering became an issue was 2019 when the U.S. Federal Supreme Court decided that the court does not have the authority to dictate how to draw the district map and state legislators are the ones who should come up with an electoral district plan. We… ▽ More Gerrymandering is the practice of drawing biased electoral maps that manipulate the voter population to gain an advantage. The most recent time gerrymandering became an issue was 2019 when the U.S. Federal Supreme Court decided that the court does not have the authority to dictate how to draw the district map and state legislators are the ones who should come up with an electoral district plan. We solve the political districting problem and redraw the 2012 map of Maryland congressional districts which raised the issue in 2019. △ Less

Submitted 6 December, 2023; originally announced December 2023.

Comments: 8 pages, to be submitted to IISE 2024 Annual Conference Proceedings

MSC Class: 90

arXiv:2311.18172 [pdf, other]

Multi-Rate Variable-Length CSI Compression for FDD Massive MIMO

Authors: Bumsu Park, Heedong Do, Namyoon Lee

Abstract: For frequency-division-duplexing (FDD) systems, channel state information (CSI) should be fed back from the user terminal to the base station. This feedback overhead becomes problematic as the number of antennas grows. To alleviate this issue, we propose a flexible CSI compression method using variational autoencoder (VAE) with an entropy bottleneck structure, which can support multi-rate and vari… ▽ More For frequency-division-duplexing (FDD) systems, channel state information (CSI) should be fed back from the user terminal to the base station. This feedback overhead becomes problematic as the number of antennas grows. To alleviate this issue, we propose a flexible CSI compression method using variational autoencoder (VAE) with an entropy bottleneck structure, which can support multi-rate and variable-length operation. Numerical study confirms that the proposed method outperforms the existing CSI compression techniques in terms of normalized mean squared error. △ Less

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.17539 [pdf, other]

Critical Influence of Overparameterization on Sharpness-aware Minimization

Authors: Sungbin Shin, Dongyeop Lee, Maksym Andriushchenko, Namhoon Lee

Abstract: Training an overparameterized neural network can yield minimizers of different generalization capabilities despite the same level of training loss. Meanwhile, with evidence that suggests a strong correlation between the sharpness of minima and their generalization errors, increasing efforts have been made to develop optimization methods to explicitly find flat minima as more generalizable solution… ▽ More Training an overparameterized neural network can yield minimizers of different generalization capabilities despite the same level of training loss. Meanwhile, with evidence that suggests a strong correlation between the sharpness of minima and their generalization errors, increasing efforts have been made to develop optimization methods to explicitly find flat minima as more generalizable solutions. Despite its contemporary relevance to overparameterization, however, this sharpness-aware minimization (SAM) strategy has not been studied much yet as to exactly how it is affected by overparameterization. Hence, in this work, we analyze SAM under overparameterization of varying degrees and present both empirical and theoretical results that indicate a critical influence of overparameterization on SAM. At first, we conduct extensive numerical experiments across vision, language, graph, and reinforcement learning domains and show that SAM consistently improves with overparameterization. Next, we attribute this phenomenon to the interplay between the enlarged solution space and increased implicit bias from overparameterization. Further, we prove multiple theoretical benefits of overparameterization for SAM to attain (i) minima with more uniform Hessian moments compared to SGD, (ii) much faster convergence at a linear rate, and (iii) lower test error for two-layer networks. Last but not least, we discover that the effect of overparameterization is more significantly pronounced in practical settings of label noise and sparsity, and yet, sufficient regularization is necessary. △ Less

Submitted 19 June, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.12856 [pdf, other]

Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer

Authors: Namkyeong Lee, Heewoong Noh, Sungwon Kim, Dongmin Hyun, Gyoung S. Na, Chanyoung Park

Abstract: The density of states (DOS) is a spectral property of crystalline materials, which provides fundamental insights into various characteristics of the materials. While previous works mainly focus on obtaining high-quality representations of crystalline materials for DOS prediction, we focus on predicting the DOS from the obtained representations by reflecting the nature of DOS: DOS determines the ge… ▽ More The density of states (DOS) is a spectral property of crystalline materials, which provides fundamental insights into various characteristics of the materials. While previous works mainly focus on obtaining high-quality representations of crystalline materials for DOS prediction, we focus on predicting the DOS from the obtained representations by reflecting the nature of DOS: DOS determines the general distribution of states as a function of energy. That is, DOS is not solely determined by the crystalline material but also by the energy levels, which has been neglected in previous works. In this paper, we propose to integrate heterogeneous information obtained from the crystalline materials and the energies via a multi-modal transformer, thereby modeling the complex relationships between the atoms in the crystalline materials and various energy levels for DOS prediction. Moreover, we propose to utilize prompts to guide the model to learn the crystal structural system-specific interactions between crystalline materials and energies. Extensive experiments on two types of DOS, i.e., Phonon DOS and Electron DOS, with various real-world scenarios demonstrate the superiority of DOSTransformer. The source code for DOSTransformer is available at https://github.com/HeewoongNoh/DOSTransformer. △ Less

Submitted 22 November, 2023; v1 submitted 24 October, 2023; originally announced November 2023.

Comments: NeurIPS 2023. arXiv admin note: text overlap with arXiv:2303.07000

arXiv:2311.04912 [pdf]

doi 10.1038/s41597-024-02959-0

ezBIDS: Guided standardization of neuroimaging data interoperable with major data archives and platforms

Authors: Daniel Levitas, Soichi Hayashi, Sophia Vinci-Booher, Anibal Heinsfeld, Dheeraj Bhatia, Nicholas Lee, Anthony Galassi, Guiomar Niso, Franco Pestilli

Abstract: Data standardization has become one of the leading methods neuroimaging researchers rely on for data sharing and reproducibility. Data standardization promotes a common framework through which researchers can utilize others' data. Yet, as of today, formatting datasets that adhere to community best practices requires technical expertise involving coding and considerable knowledge of file formats an… ▽ More Data standardization has become one of the leading methods neuroimaging researchers rely on for data sharing and reproducibility. Data standardization promotes a common framework through which researchers can utilize others' data. Yet, as of today, formatting datasets that adhere to community best practices requires technical expertise involving coding and considerable knowledge of file formats and standards. We describe ezBIDS, a tool for converting neuroimaging data and associated metadata to the Brain Imaging Data Structure (BIDS) standard. ezBIDS provides four unique features: (1) No installation or programming requirements. (2) Handling of both imaging and task events data and metadata. (3) Automated inference and guidance for adherence to BIDS. (4) Multiple data management options: download BIDS data to local system, or transfer to OpenNeuro.org or brainlife.io. In sum, ezBIDS requires neither coding proficiency nor knowledge of BIDS and is the first BIDS tool to offer guided standardization, support for task events conversion, and interoperability with OpenNeuro and brainlife.io. △ Less

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2311.03285 [pdf, other]

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Authors: Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica

Abstract: The "pretrain-then-finetune" paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks, resulting in a substantial collection of LoRA adapters derived from one base model. We observe that this paradigm presents significant opportunities for batched in… ▽ More The "pretrain-then-finetune" paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks, resulting in a substantial collection of LoRA adapters derived from one base model. We observe that this paradigm presents significant opportunities for batched inference during serving. To capitalize on these opportunities, we present S-LoRA, a system designed for the scalable serving of many LoRA adapters. S-LoRA stores all adapters in the main memory and fetches the adapters used by the currently running queries to the GPU memory. To efficiently use the GPU memory and reduce fragmentation, S-LoRA proposes Unified Paging. Unified Paging uses a unified memory pool to manage dynamic adapter weights with different ranks and KV cache tensors with varying sequence lengths. Additionally, S-LoRA employs a novel tensor parallelism strategy and highly optimized custom CUDA kernels for heterogeneous batching of LoRA computation. Collectively, these features enable S-LoRA to serve thousands of LoRA adapters on a single GPU or across multiple GPUs with a small overhead. Compared to state-of-the-art libraries such as HuggingFace PEFT and vLLM (with naive support of LoRA serving), S-LoRA can improve the throughput by up to 4 times and increase the number of served adapters by several orders of magnitude. As a result, S-LoRA enables scalable serving of many task-specific fine-tuned models and offers the potential for large-scale customized fine-tuning services. The code is available at https://github.com/S-LoRA/S-LoRA △ Less

Submitted 5 June, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.02236 [pdf, other]

Robust Fine-Tuning of Vision-Language Models for Domain Generalization

Authors: Kevin Vogt-Lowell, Noah Lee, Theodoros Tsiligkaridis, Marc Vaillant

Abstract: Transfer learning enables the sharing of common knowledge among models for a variety of downstream tasks, but traditional methods suffer in limited training data settings and produce narrow models incapable of effectively generalizing under distribution shifts. Foundation models have recently demonstrated impressive zero-shot inference capabilities and robustness under distribution shifts. However… ▽ More Transfer learning enables the sharing of common knowledge among models for a variety of downstream tasks, but traditional methods suffer in limited training data settings and produce narrow models incapable of effectively generalizing under distribution shifts. Foundation models have recently demonstrated impressive zero-shot inference capabilities and robustness under distribution shifts. However, zero-shot evaluation for these models has been predominantly confined to benchmarks with simple distribution shifts, limiting our understanding of their effectiveness under the more realistic shifts found in practice. Moreover, common fine-tuning methods for these models have yet to be evaluated against vision models in few-shot scenarios where training data is limited. To address these gaps, we present a new recipe for few-shot fine-tuning of the popular vision-language foundation model CLIP and evaluate its performance on challenging benchmark datasets with realistic distribution shifts from the WILDS collection. Our experimentation demonstrates that, while zero-shot CLIP fails to match performance of trained vision models on more complex benchmarks, few-shot CLIP fine-tuning outperforms its vision-only counterparts in terms of in-distribution and out-of-distribution accuracy at all levels of training data availability. This provides a strong incentive for adoption of foundation models within few-shot learning applications operating with real-world data. Code is available at https://github.com/mit-ll/robust-vision-language-finetuning △ Less

Submitted 3 November, 2023; originally announced November 2023.

Comments: In proceedings of the 27th IEEE High Performance Extreme Computing Conference

arXiv:2311.01817 [pdf, other]

Mitigating Framing Bias with Polarity Minimization Loss

Authors: Yejin Bang, Nayeon Lee, Pascale Fung

Abstract: Framing bias plays a significant role in exacerbating political polarization by distorting the perception of actual events. Media outlets with divergent political stances often use polarized language in their reporting of the same event. We propose a new loss function that encourages the model to minimize the polarity difference between the polarized input articles to reduce framing bias. Specific… ▽ More Framing bias plays a significant role in exacerbating political polarization by distorting the perception of actual events. Media outlets with divergent political stances often use polarized language in their reporting of the same event. We propose a new loss function that encourages the model to minimize the polarity difference between the polarized input articles to reduce framing bias. Specifically, our loss is designed to jointly optimize the model to map polarity ends bidirectionally. Our experimental results demonstrate that incorporating the proposed polarity minimization loss leads to a substantial reduction in framing bias when compared to a BART-based multi-document summarization model. Notably, we find that the effectiveness of this approach is most pronounced when the model is trained to minimize the polarity loss associated with informational framing bias (i.e., skewed selection of information to report). △ Less

Submitted 3 November, 2023; originally announced November 2023.

Comments: 11 pages, EMNLP2023

arXiv:2310.07101 [pdf, other]

Hybrid Arrays: How Many RF Chains Are Required to Prevent Beam Squint?

Authors: Heedong Do, Namyoon Lee, Robert W. Heath Jr, Angel Lozano

Abstract: With increasing frequencies, bandwidths, and array apertures, the phenomenon of beam squint arises as a serious impairment to beamforming. Fully digital arrays with true time delay per antenna element are a potential solution, but they require downconversion at each element. This paper shows that hybrid arrays can perform essentially as well as digital arrays once the number of radio-frequency cha… ▽ More With increasing frequencies, bandwidths, and array apertures, the phenomenon of beam squint arises as a serious impairment to beamforming. Fully digital arrays with true time delay per antenna element are a potential solution, but they require downconversion at each element. This paper shows that hybrid arrays can perform essentially as well as digital arrays once the number of radio-frequency chains exceeds a certain threshold that is far below the number of elements. The result is robust, holding also for suboptimum but highly appealing beamspace architectures. △ Less

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.06271 [pdf, other]

Towards Mitigating Hallucination in Large Language Models via Self-Reflection

Authors: Ziwei Ji, Tiezheng Yu, Yan Xu, Nayeon Lee, Etsuko Ishii, Pascale Fung

Abstract: Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks. However, the practical deployment still faces challenges, notably the issue of "hallucination", where models generate plausible-sounding but unfaithful or nonsensical information. This issue becomes particularly critical in the medical domain due to the uncommon pro… ▽ More Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks. However, the practical deployment still faces challenges, notably the issue of "hallucination", where models generate plausible-sounding but unfaithful or nonsensical information. This issue becomes particularly critical in the medical domain due to the uncommon professional concepts and potential social risks involved. This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets. Our investigation centers on the identification and comprehension of common problematic answers, with a specific emphasis on hallucination. To tackle this challenge, we present an interactive self-reflection methodology that incorporates knowledge acquisition and answer generation. Through this feedback process, our approach steadily enhances the factuality, consistency, and entailment of the generated answers. Consequently, we harness the interactivity and multitasking ability of LLMs and produce progressively more precise and accurate answers. Experimental results on both automatic and human evaluation demonstrate the superiority of our approach in hallucination reduction compared to baselines. △ Less

Submitted 9 October, 2023; originally announced October 2023.

Comments: Accepted by the findings of EMNLP 2023

arXiv:2310.00954 [pdf, other]

doi 10.1109/TALE56641.2023.10398393

How Helpful do Novice Programmers Find the Feedback of an Automated Repair Tool?

Authors: Oka Kurniawan, Christopher M. Poskitt, Ismam Al Hoque, Norman Tiong Seng Lee, Cyrille Jégourel, Nachamma Sockalingam

Abstract: Immediate feedback has been shown to improve student learning. In programming courses, immediate, automated feedback is typically provided in the form of pre-defined test cases run by a submission platform. While these are excellent for highlighting the presence of logical errors, they do not provide novice programmers enough scaffolding to help them identify where an error is or how to fix it. To… ▽ More Immediate feedback has been shown to improve student learning. In programming courses, immediate, automated feedback is typically provided in the form of pre-defined test cases run by a submission platform. While these are excellent for highlighting the presence of logical errors, they do not provide novice programmers enough scaffolding to help them identify where an error is or how to fix it. To address this, several tools have been developed that provide richer feedback in the form of program repairs. Studies of such tools, however, tend to focus more on whether correct repairs can be generated, rather than how novices are using them. In this paper, we describe our experience of using CLARA, an automated repair tool, to provide feedback to novices. First, we extended CLARA to support a larger subset of the Python language, before integrating it with the Jupyter Notebooks used for our programming exercises. Second, we devised a preliminary study in which students tackled programming problems with and without support of the tool using the 'think aloud' protocol. We found that novices often struggled to understand the proposed repairs, echoing the well-known challenge to understand compiler/interpreter messages. Furthermore, we found that students valued being told where a fix was needed - without necessarily the fix itself - suggesting that 'less may be more' from a pedagogical perspective. △ Less

Submitted 7 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Experience report accepted by the International Conference on Teaching, Assessment, and Learning for Engineering (TALE'23)

Journal ref: Proc. TALE'23. IEEE, 2023

arXiv:2309.14381 [pdf, other]

Survey of Social Bias in Vision-Language Models

Authors: Nayeon Lee, Yejin Bang, Holy Lovenia, Samuel Cahyawijaya, Wenliang Dai, Pascale Fung

Abstract: In recent years, the rapid advancement of machine learning (ML) models, particularly transformer-based pre-trained models, has revolutionized Natural Language Processing (NLP) and Computer Vision (CV) fields. However, researchers have discovered that these models can inadvertently capture and reinforce social biases present in their training datasets, leading to potential social harms, such as une… ▽ More In recent years, the rapid advancement of machine learning (ML) models, particularly transformer-based pre-trained models, has revolutionized Natural Language Processing (NLP) and Computer Vision (CV) fields. However, researchers have discovered that these models can inadvertently capture and reinforce social biases present in their training datasets, leading to potential social harms, such as uneven resource allocation and unfair representation of specific social groups. Addressing these biases and ensuring fairness in artificial intelligence (AI) systems has become a critical concern in the ML community. The recent introduction of pre-trained vision-and-language (VL) models in the emerging multimodal field demands attention to the potential social biases present in these models as well. Although VL models are susceptible to social bias, there is a limited understanding compared to the extensive discussions on bias in NLP and CV. This survey aims to provide researchers with a high-level insight into the similarities and differences of social bias studies in pre-trained models across NLP, CV, and VL. By examining these perspectives, the survey aims to offer valuable guidelines on how to approach and mitigate social bias in both unimodal and multimodal settings. The findings and recommendations presented here can benefit the ML community, fostering the development of fairer and non-biased AI models in various applications and research endeavors. △ Less

Submitted 24 September, 2023; originally announced September 2023.

arXiv:2309.01150 [pdf, other]

FedFwd: Federated Learning without Backpropagation

Authors: Seonghwan Park, Dahun Shin, Jinseok Chung, Namhoon Lee

Abstract: In federated learning (FL), clients with limited resources can disrupt the training efficiency. A potential solution to this problem is to leverage a new learning procedure that does not rely on backpropagation (BP). We present a novel approach to FL called FedFwd that employs a recent BP-free method by Hinton (2022), namely the Forward Forward algorithm, in the local training process. FedFwd can… ▽ More In federated learning (FL), clients with limited resources can disrupt the training efficiency. A potential solution to this problem is to leverage a new learning procedure that does not rely on backpropagation (BP). We present a novel approach to FL called FedFwd that employs a recent BP-free method by Hinton (2022), namely the Forward Forward algorithm, in the local training process. FedFwd can reduce a significant amount of computations for updating parameters by performing layer-wise local updates, and therefore, there is no need to store all intermediate activation values during training. We conduct various experiments to evaluate FedFwd on standard datasets including MNIST and CIFAR-10, and show that it works competitively to other BP-dependent FL methods. △ Less

Submitted 3 September, 2023; originally announced September 2023.

Comments: ICML 2023 Workshop (Federated Learning and Analytics in Practice: Algorithms, Systems, Applications, and Opportunities)

arXiv:2308.16705 [pdf, other]

Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis

Authors: Nayeon Lee, Chani Jung, Junho Myung, Jiho Jin, Jose Camacho-Collados, Juho Kim, Alice Oh

Abstract: Warning: this paper contains content that may be offensive or upsetting. Most hate speech datasets neglect the cultural diversity within a single language, resulting in a critical shortcoming in hate speech detection. To address this, we introduce CREHate, a CRoss-cultural English Hate speech dataset. To construct CREHate, we follow a two-step procedure: 1) cultural post collection and 2) cross-… ▽ More Warning: this paper contains content that may be offensive or upsetting. Most hate speech datasets neglect the cultural diversity within a single language, resulting in a critical shortcoming in hate speech detection. To address this, we introduce CREHate, a CRoss-cultural English Hate speech dataset. To construct CREHate, we follow a two-step procedure: 1) cultural post collection and 2) cross-cultural annotation. We sample posts from the SBIC dataset, which predominantly represents North America, and collect posts from four geographically diverse English-speaking countries (Australia, United Kingdom, Singapore, and South Africa) using culturally hateful keywords we retrieve from our survey. Annotations are collected from the four countries plus the United States to establish representative labels for each country. Our analysis highlights statistically significant disparities across countries in hate speech annotations. Only 56.2% of the posts in CREHate achieve consensus among all countries, with the highest pairwise label difference rate of 26%. Qualitative analysis shows that label disagreement occurs mostly due to different interpretations of sarcasm and the personal bias of annotators on divisive topics. Lastly, we evaluate large language models (LLMs) under a zero-shot setting and show that current LLMs tend to show higher accuracies on Anglosphere country labels in CREHate. Our dataset and codes are available at: https://github.com/nlee0212/CREHate △ Less

Submitted 3 April, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

Comments: Accepted to NAACL 2024 Main Conference

arXiv:2308.11189 [pdf, other]

Diversity Measures: Domain-Independent Proxies for Failure in Language Model Queries

Authors: Noel Ngu, Nathaniel Lee, Paulo Shakarian

Abstract: Error prediction in large language models often relies on domain-specific information. In this paper, we present measures for quantification of error in the response of a large language model based on the diversity of responses to a given prompt - hence independent of the underlying application. We describe how three such measures - based on entropy, Gini impurity, and centroid distance - can be e… ▽ More Error prediction in large language models often relies on domain-specific information. In this paper, we present measures for quantification of error in the response of a large language model based on the diversity of responses to a given prompt - hence independent of the underlying application. We describe how three such measures - based on entropy, Gini impurity, and centroid distance - can be employed. We perform a suite of experiments on multiple datasets and temperature settings to demonstrate that these measures strongly correlate with the probability of failure. Additionally, we present empirical results demonstrating how these measures can be applied to few-shot prompting, chain-of-thought reasoning, and error detection. △ Less

Submitted 22 August, 2023; originally announced August 2023.

Report number: Accepted to IEEE ICSC '24

arXiv:2308.07317 [pdf, other]

Platypus: Quick, Cheap, and Powerful Refinement of LLMs

Authors: Ariel N. Lee, Cole J. Hunter, Nataniel Ruiz

Abstract: We present $\textbf{Platypus}$, a family of fine-tuned and merged Large Language Models (LLMs) that achieves the strongest performance and currently stands at first place in HuggingFace's Open LLM Leaderboard as of the release date of this work. In this work we describe (1) our curated dataset $\textbf{Open-Platypus}$, that is a subset of other open datasets and which… ▽ More We present $\textbf{Platypus}$, a family of fine-tuned and merged Large Language Models (LLMs) that achieves the strongest performance and currently stands at first place in HuggingFace's Open LLM Leaderboard as of the release date of this work. In this work we describe (1) our curated dataset $\textbf{Open-Platypus}$, that is a subset of other open datasets and which $\textit{we release to the public}$ (2) our process of fine-tuning and merging LoRA modules in order to conserve the strong prior of pretrained LLMs, while bringing specific domain knowledge to the surface (3) our efforts in checking for test data leaks and contamination in the training data, which can inform future research. Specifically, the Platypus family achieves strong performance in quantitative LLM metrics across model sizes, topping the global Open LLM leaderboard while using just a fraction of the fine-tuning data and overall compute that are required for other state-of-the-art fine-tuned LLMs. In particular, a 13B Platypus model can be trained on $\textit{a single}$ A100 GPU using 25k questions in 5 hours. This is a testament of the quality of our Open-Platypus dataset, and opens opportunities for more improvements in the field. Project page: https://platypus-llm.github.io △ Less

Submitted 14 March, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: Workshop on Instruction Tuning and Instruction Following at NeurIPS 2023

arXiv:2308.04058 [pdf, ps, other]

Finding Globally Optimal Configuration of Active RIS in Linear Time

Authors: Heedong Do, Namyoon Lee

Abstract: This paper presents an algorithm for finding the optimal configuration of active reconfigurable intelligent surface (RIS) when both transmitter and receiver are equipped with a single antenna each. The resultant configuration is globally optimal and it takes linear time for the computation. Moreover, there is a closed-form expression for the optimal configuration when the direct link vanishes, whi… ▽ More This paper presents an algorithm for finding the optimal configuration of active reconfigurable intelligent surface (RIS) when both transmitter and receiver are equipped with a single antenna each. The resultant configuration is globally optimal and it takes linear time for the computation. Moreover, there is a closed-form expression for the optimal configuration when the direct link vanishes, which enables further analysis. △ Less

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2308.03004 [pdf, other]

Deep Polar Codes

Authors: Geon Choi, Namyoon Lee

Abstract: In this paper, we introduce a novel class of pre-transformed polar codes, termed as deep polar codes. We first present a deep polar encoder that harnesses a series of multi-layered polar transformations with varying sizes. Our approach to encoding enables a low-complexity implementation while significantly enhancing the weight distribution of the code. Moreover, our encoding method offers flexibil… ▽ More In this paper, we introduce a novel class of pre-transformed polar codes, termed as deep polar codes. We first present a deep polar encoder that harnesses a series of multi-layered polar transformations with varying sizes. Our approach to encoding enables a low-complexity implementation while significantly enhancing the weight distribution of the code. Moreover, our encoding method offers flexibility in rate-profiling, embracing a wide range of code rates and blocklengths. Next, we put forth a low-complexity decoding algorithm called successive cancellation list with backpropagation parity checks (SCL-BPC). This decoding algorithm leverages the parity check equations in the reverse process of the multi-layered pre-transformed encoding for SCL decoding. Additionally, we present a low-latency decoding algorithm that employs parallel-SCL decoding by treating partially pre-transformed bit patterns as additional frozen bits. Through simulations, we demonstrate that deep polar codes outperform existing pre-transformed polar codes in terms of block error rates across various code rates under short block lengths, while maintaining low encoding and decoding complexity. Furthermore, we show that concatenating deep polar codes with cyclic-redundancy-check codes can achieve the meta-converse bound of the finite block length capacity within 0.4 dB in some instances. △ Less

Submitted 5 August, 2023; originally announced August 2023.

arXiv:2307.16778 [pdf, other]

KoBBQ: Korean Bias Benchmark for Question Answering

Authors: Jiho Jin, Jiseon Kim, Nayeon Lee, Haneul Yoo, Alice Oh, Hwaran Lee

Abstract: The Bias Benchmark for Question Answering (BBQ) is designed to evaluate social biases of language models (LMs), but it is not simple to adapt this benchmark to cultural contexts other than the US because social biases depend heavily on the cultural context. In this paper, we present KoBBQ, a Korean bias benchmark dataset, and we propose a general framework that addresses considerations for cultura… ▽ More The Bias Benchmark for Question Answering (BBQ) is designed to evaluate social biases of language models (LMs), but it is not simple to adapt this benchmark to cultural contexts other than the US because social biases depend heavily on the cultural context. In this paper, we present KoBBQ, a Korean bias benchmark dataset, and we propose a general framework that addresses considerations for cultural adaptation of a dataset. Our framework includes partitioning the BBQ dataset into three classes--Simply-Transferred (can be used directly after cultural translation), Target-Modified (requires localization in target groups), and Sample-Removed (does not fit Korean culture)-- and adding four new categories of bias specific to Korean culture. We conduct a large-scale survey to collect and validate the social biases and the targets of the biases that reflect the stereotypes in Korean culture. The resulting KoBBQ dataset comprises 268 templates and 76,048 samples across 12 categories of social bias. We use KoBBQ to measure the accuracy and bias scores of several state-of-the-art multilingual LMs. The results clearly show differences in the bias of LMs as measured by KoBBQ and a machine-translated version of BBQ, demonstrating the need for and utility of a well-constructed, culturally-aware social bias benchmark. △ Less

Submitted 25 January, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

Comments: TACL 2024 (pre-MIT Press publication version)

arXiv:2307.03381 [pdf, other]

Teaching Arithmetic to Small Transformers

Authors: Nayoung Lee, Kartik Sreenivasan, Jason D. Lee, Kangwook Lee, Dimitris Papailiopoulos

Abstract: Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token prediction objective. This study investigates how small transformers, trained from random initialization, can efficiently learn arithmetic operations such as add… ▽ More Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token prediction objective. This study investigates how small transformers, trained from random initialization, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective. We first demonstrate that conventional training data is not the most effective for arithmetic learning, and simple formatting changes can significantly improve accuracy. This leads to sharp phase transitions as a function of training data scale, which, in some cases, can be explained through connections to low-rank matrix completion. Building on prior work, we then train on chain-of-thought style data that includes intermediate step results. Even in the complete absence of pretraining, this approach significantly and simultaneously improves accuracy, sample complexity, and convergence speed. We also study the interplay between arithmetic and text data during training and examine the effects of few-shot prompting, pretraining, and model scale. Additionally, we discuss length generalization challenges. Our work highlights the importance of high-quality, instructive data that considers the particular characteristics of the next-word prediction objective for rapidly eliciting arithmetic capabilities. △ Less

Submitted 7 July, 2023; originally announced July 2023.

arXiv:2307.03343 [pdf, other]

Unified Modeling and Rate Coverage Analysis for Satellite-Terrestrial Integrated Networks: Coverage Extension or Data Offloading?

Authors: Jeonghun Park, Jinseok Choi, Namyoon Lee, François Baccelli

Abstract: With the growing interest in satellite networks, satellite-terrestrial integrated networks (STINs) have gained significant attention because of their potential benefits. However, due to the lack of a tractable network model for the STIN architecture, analytical studies allowing one to investigate the performance of such networks are not yet available. In this work, we propose a unified network mod… ▽ More With the growing interest in satellite networks, satellite-terrestrial integrated networks (STINs) have gained significant attention because of their potential benefits. However, due to the lack of a tractable network model for the STIN architecture, analytical studies allowing one to investigate the performance of such networks are not yet available. In this work, we propose a unified network model that jointly captures satellite and terrestrial networks into one analytical framework. Our key idea is based on Poisson point processes distributed on concentric spheres, assigning a random height to each point as a mark. This allows one to consider each point as a source of desired signal or a source of interference while ensuring visibility to the typical user. Thanks to this model, we derive the probability of coverage of STINs as a function of major system parameters, chiefly path-loss exponent, satellites and terrestrial base stations' height distributions and density, transmit power and biasing factors. Leveraging the analysis, we concretely explore two benefits that STINs provide: i) coverage extension in remote rural areas and ii) data offloading in dense urban areas. △ Less

Submitted 3 February, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: submitted to IEEE journal

arXiv:2306.17848 [pdf, other]

Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

Authors: Ariel N. Lee, Sarah Adel Bargal, Janavi Kasera, Stan Sclaroff, Kate Saenko, Nataniel Ruiz

Abstract: Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and generalization performance. For example, ViTs have interesting propert… ▽ More Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and generalization performance. For example, ViTs have interesting properties with respect to early layer non-local feature dependence, as well as self-attention mechanisms which enhance learning flexibility, enabling them to ignore out-of-context image information more effectively. We hypothesize that this power to ignore out-of-context information (which we name $\textit{patch selectivity}$), while integrating in-context information in a non-local manner in early layers, allows ViTs to more easily handle occlusion. In this study, our aim is to see whether we can have CNNs $\textit{simulate}$ this ability of patch selectivity by effectively hardwiring this inductive bias using Patch Mixing data augmentation, which consists of inserting patches from another image onto a training image and interpolating labels between the two image classes. Specifically, we use Patch Mixing to train state-of-the-art ViTs and CNNs, assessing its impact on their ability to ignore out-of-context patches and handle natural occlusions. We find that ViTs do not improve nor degrade when trained using Patch Mixing, but CNNs acquire new capabilities to ignore out-of-context information and improve on occlusion benchmarks, leaving us to conclude that this training method is a way of simulating in CNNs the abilities that ViTs already possess. We will release our Patch Mixing implementation and proposed datasets for public use. Project page: https://arielnlee.github.io/PatchMixing/ △ Less

Submitted 30 June, 2023; originally announced June 2023.

arXiv:2306.14892 [pdf, other]

Supervised Pretraining Can Learn In-Context Reinforcement Learning

Authors: Jonathan N. Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill

Abstract: Large transformer models trained on diverse datasets have shown a remarkable ability to learn in-context, achieving high few-shot performance on tasks they were not explicitly trained to solve. In this paper, we study the in-context learning capabilities of transformers in decision-making problems, i.e., reinforcement learning (RL) for bandits and Markov decision processes. To do so, we introduce… ▽ More Large transformer models trained on diverse datasets have shown a remarkable ability to learn in-context, achieving high few-shot performance on tasks they were not explicitly trained to solve. In this paper, we study the in-context learning capabilities of transformers in decision-making problems, i.e., reinforcement learning (RL) for bandits and Markov decision processes. To do so, we introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action given a query state and an in-context dataset of interactions, across a diverse set of tasks. This procedure, while simple, produces a model with several surprising capabilities. We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline, despite not being explicitly trained to do so. The model also generalizes beyond the pretraining distribution to new tasks and automatically adapts its decision-making strategies to unknown structure. Theoretically, we show DPT can be viewed as an efficient implementation of Bayesian posterior sampling, a provably sample-efficient RL algorithm. We further leverage this connection to provide guarantees on the regret of the in-context algorithm yielded by DPT, and prove that it can learn faster than algorithms used to generate the pretraining data. These results suggest a promising yet simple path towards instilling strong in-context decision-making abilities in transformers. △ Less

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.12978 [pdf, other]

Rate-Splitting Multiple Access for 6G Networks: Ten Promising Scenarios and Applications

Authors: Jeonghun Park, Byungju Lee, Jinseok Choi, Hoon Lee, Namyoon Lee, Seok-Hwan Park, Kyoung-Jae Lee, Junil Choi, Sung Ho Chae, Sang-Woon Jeon, Kyung Sup Kwak, Bruno Clerckx, Wonjae Shin

Abstract: In the upcoming 6G era, multiple access (MA) will play an essential role in achieving high throughput performances required in a wide range of wireless applications. Since MA and interference management are closely related issues, the conventional MA techniques are limited in that they cannot provide near-optimal performance in universal interference regimes. Recently, rate-splitting multiple acce… ▽ More In the upcoming 6G era, multiple access (MA) will play an essential role in achieving high throughput performances required in a wide range of wireless applications. Since MA and interference management are closely related issues, the conventional MA techniques are limited in that they cannot provide near-optimal performance in universal interference regimes. Recently, rate-splitting multiple access (RSMA) has been gaining much attention. RSMA splits an individual message into two parts: a common part, decodable by every user, and a private part, decodable only by the intended user. Each user first decodes the common message and then decodes its private message by applying successive interference cancellation (SIC). By doing so, RSMA not only embraces the existing MA techniques as special cases but also provides significant performance gains by efficiently mitigating inter-user interference in a broad range of interference regimes. In this article, we first present the theoretical foundation of RSMA. Subsequently, we put forth four key benefits of RSMA: spectral efficiency, robustness, scalability, and flexibility. Upon this, we describe how RSMA can enable ten promising scenarios and applications along with future research directions to pave the way for 6G. △ Less

Submitted 22 June, 2023; originally announced June 2023.

Comments: 17 pages, 6 figures, submitted to IEEE Network Magazine

arXiv:2306.08997

Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models

Authors: Sarah J. Zhang, Samuel Florin, Ariel N. Lee, Eamon Niknafs, Andrei Marginean, Annie Wang, Keith Tyser, Zad Chin, Yann Hicke, Nikhil Singh, Madeleine Udell, Yoon Kim, Tonio Buonassisi, Armando Solar-Lezama, Iddo Drori

Abstract: We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of large language models to fulfill the graduation requirements for any MIT major in Mathematics and EECS. Our results demonstrate that… ▽ More We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of large language models to fulfill the graduation requirements for any MIT major in Mathematics and EECS. Our results demonstrate that GPT-3.5 successfully solves a third of the entire MIT curriculum, while GPT-4, with prompt engineering, achieves a perfect solve rate on a test set excluding questions based on images. We fine-tune an open-source large language model on this dataset. We employ GPT-4 to automatically grade model responses, providing a detailed performance breakdown by course, question, and answer type. By embedding questions in a low-dimensional space, we explore the relationships between questions, topics, and classes and discover which questions and classes are required for solving other questions and classes through few-shot learning. Our analysis offers valuable insights into course prerequisites and curriculum design, highlighting language models' potential for learning and improving Mathematics and EECS education. △ Less

Submitted 24 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Did not receive permission to release the data or model fine-tuned on the data

arXiv:2306.01792 [pdf, other]

doi 10.1145/3580305.3599516

Task Relation-aware Continual User Representation Learning

Authors: Sein Kim, Namkyeong Lee, Donghyun Kim, Minchul Yang, Chanyoung Park

Abstract: User modeling, which learns to represent users into a low-dimensional representation space based on their past behaviors, got a surge of interest from the industry for providing personalized services to users. Previous efforts in user modeling mainly focus on learning a task-specific user representation that is designed for a single task. However, since learning task-specific user representations… ▽ More User modeling, which learns to represent users into a low-dimensional representation space based on their past behaviors, got a surge of interest from the industry for providing personalized services to users. Previous efforts in user modeling mainly focus on learning a task-specific user representation that is designed for a single task. However, since learning task-specific user representations for every task is infeasible, recent studies introduce the concept of universal user representation, which is a more generalized representation of a user that is relevant to a variety of tasks. Despite their effectiveness, existing approaches for learning universal user representations are impractical in real-world applications due to the data requirement, catastrophic forgetting and the limited learning capability for continually added tasks. In this paper, we propose a novel continual user representation learning method, called TERACON, whose learning capability is not limited as the number of learned tasks increases while capturing the relationship between the tasks. The main idea is to introduce an embedding for each task, i.e., task embedding, which is utilized to generate task-specific soft masks that not only allow the entire model parameters to be updated until the end of training sequence, but also facilitate the relationship between the tasks to be captured. Moreover, we introduce a novel knowledge retention module with pseudo-labeling strategy that successfully alleviates the long-standing problem of continual learning, i.e., catastrophic forgetting. Extensive experiments on public and proprietary real-world datasets demonstrate the superiority and practicality of TERACON. Our code is available at https://github.com/Sein-Kim/TERACON. △ Less

Submitted 23 August, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: KDD 2023

arXiv:2305.18758 [pdf, other]

doi 10.1145/3580305.3599515

Task-Equivariant Graph Few-shot Learning

Authors: Sungwon Kim, Junseok Lee, Namkyeong Lee, Wonjoong Kim, Seungyoon Choi, Chanyoung Park

Abstract: Although Graph Neural Networks (GNNs) have been successful in node classification tasks, their performance heavily relies on the availability of a sufficient number of labeled nodes per class. In real-world situations, not all classes have many labeled nodes and there may be instances where the model needs to classify new classes, making manual labeling difficult. To solve this problem, it is impo… ▽ More Although Graph Neural Networks (GNNs) have been successful in node classification tasks, their performance heavily relies on the availability of a sufficient number of labeled nodes per class. In real-world situations, not all classes have many labeled nodes and there may be instances where the model needs to classify new classes, making manual labeling difficult. To solve this problem, it is important for GNNs to be able to classify nodes with a limited number of labeled nodes, known as few-shot node classification. Previous episodic meta-learning based methods have demonstrated success in few-shot node classification, but our findings suggest that optimal performance can only be achieved with a substantial amount of diverse training meta-tasks. To address this challenge of meta-learning based few-shot learning (FSL), we propose a new approach, the Task-Equivariant Graph few-shot learning (TEG) framework. Our TEG framework enables the model to learn transferable task-adaptation strategies using a limited number of training meta-tasks, allowing it to acquire meta-knowledge for a wide range of meta-tasks. By incorporating equivariant neural networks, TEG can utilize their strong generalization abilities to learn highly adaptable task-specific strategies. As a result, TEG achieves state-of-the-art performance with limited training meta-tasks. Our experiments on various benchmark datasets demonstrate TEG's superiority in terms of accuracy and generalization ability, even when using minimal meta-training data, highlighting the effectiveness of our proposed approach in addressing the challenges of meta-learning based few-shot node classification. Our code is available at the following link: https://github.com/sung-won-kim/TEG △ Less

Submitted 24 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

Comments: KDD 2023

Showing 1–50 of 183 results for author: Lee, N