subscribe to arXiv mailings

Whitening Not Recommended for Classification Tasks in LLMs

Authors: Ali Forooghi, Shaghayegh Sadeghi, Jianguo Lu

Abstract: Sentence embedding is a cornerstone in NLP. Whitening has been claimed to be an effective operation to improve embedding quality obtained from Large Language Models (LLMs). However, we find that the efficacy of whitening is model-dependent and task-dependent. In particular, whitening degenerates embeddings for classification tasks. The conclusion is supported by extensive experiments. We also expl… ▽ More Sentence embedding is a cornerstone in NLP. Whitening has been claimed to be an effective operation to improve embedding quality obtained from Large Language Models (LLMs). However, we find that the efficacy of whitening is model-dependent and task-dependent. In particular, whitening degenerates embeddings for classification tasks. The conclusion is supported by extensive experiments. We also explored a variety of whitening operations, including PCA, ZCA, PCA-Cor, ZCA-Cor and Cholesky whitenings. A by-product of our research is embedding evaluation platform for LLMs called SentEval+. △ Less

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2406.18568 [pdf]

A Diagnostic Model for Acute Lymphoblastic Leukemia Using Metaheuristics and Deep Learning Methods

Authors: M. Hosseinzadeh, P. Khoshaght, S. Sadeghi, P. Asghari, Z. Arabi, J. Lansky, P. Budinsky, A. Masoud Rahmani, S. W. Lee

Abstract: Acute lymphoblastic leukemia (ALL) severity is determined by the presence and ratios of blast cells (abnormal white blood cells) in both bone marrow and peripheral blood. Manual diagnosis of this disease is a tedious and time-consuming operation, making it difficult for professionals to accurately examine blast cell characteristics. To address this difficulty, researchers use deep learning and mac… ▽ More Acute lymphoblastic leukemia (ALL) severity is determined by the presence and ratios of blast cells (abnormal white blood cells) in both bone marrow and peripheral blood. Manual diagnosis of this disease is a tedious and time-consuming operation, making it difficult for professionals to accurately examine blast cell characteristics. To address this difficulty, researchers use deep learning and machine learning. In this paper, a ResNet-based feature extractor is utilized to detect ALL, along with a variety of feature selectors and classifiers. To get the best results, a variety of transfer learning models, including the Resnet, VGG, EfficientNet, and DensNet families, are used as deep feature extractors. Following extraction, different feature selectors are used, including Genetic algorithm, PCA, ANOVA, Random Forest, Univariate, Mutual information, Lasso, XGB, Variance, and Binary ant colony. After feature qualification, a variety of classifiers are used, with MLP outperforming the others. The recommended technique is used to categorize ALL and HEM in the selected dataset which is C-NMC 2019. This technique got an impressive 90.71% accuracy and 95.76% sensitivity for the relevant classifications, and its metrics on this dataset outperformed others. △ Less

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00118 [pdf, other]

ADEP: A Novel Approach Based on Discriminator-Enhanced Encoder-Decoder Architecture for Accurate Prediction of Adverse Effects in Polypharmacy

Authors: Katayoun Kobraei, Mehrdad Baradaran, Seyed Mohsen Sadeghi, Raziyeh Masumshah, Changiz Eslahchi

Abstract: Motivation: Unanticipated drug-drug interactions (DDIs) pose significant risks in polypharmacy, emphasizing the need for predictive methods. Recent advancements in computational techniques aim to address this challenge. Methods: We introduce ADEP, a novel approach integrating a discriminator and an encoder-decoder model to address data sparsity and enhance feature extraction. ADEP employs a thre… ▽ More Motivation: Unanticipated drug-drug interactions (DDIs) pose significant risks in polypharmacy, emphasizing the need for predictive methods. Recent advancements in computational techniques aim to address this challenge. Methods: We introduce ADEP, a novel approach integrating a discriminator and an encoder-decoder model to address data sparsity and enhance feature extraction. ADEP employs a three-part model, including multiple classification methods, to predict adverse effects in polypharmacy. Results: Evaluation on benchmark datasets shows ADEP outperforms well-known methods such as GGI-DDI, SSF-DDI, LSFC, DPSP, GNN-DDI, MSTE, MDF-SA-DDI, NNPS, DDIMDL, Random Forest, K-Nearest-Neighbor, Logistic Regression, and Decision Tree. Key metrics include Accuracy, AUROC, AUPRC, F-score, Recall, Precision, False Negatives, and False Positives. ADEP achieves more accurate predictions of adverse effects in polypharmacy. A case study with real-world data illustrates ADEP's practical application in identifying potential DDIs and preventing adverse effects. Conclusions: ADEP significantly advances the prediction of polypharmacy adverse effects, offering improved accuracy and reliability. Its innovative architecture enhances feature extraction from sparse medical data, improving medication safety and patient outcomes. Availability: Source code and datasets are available at https://github.com/m0hssn/ADEP. △ Less

Submitted 31 May, 2024; originally announced June 2024.

Comments: 13 pages, 1 figure

arXiv:2405.18772 [pdf, ps, other]

Evolving Reliable Differentiating Constraints for the Chance-constrained Maximum Coverage Problem

Authors: Saba Sadeghi Ahouei, Jacob de Nobel, Aneta Neumann, Thomas Bäck, Frank Neumann

Abstract: Chance-constrained problems involve stochastic components in the constraints which can be violated with a small probability. We investigate the impact of different types of chance constraints on the performance of iterative search algorithms and study the classical maximum coverage problem in graphs with chance constraints. Our goal is to evolve reliable chance constraint settings for a given grap… ▽ More Chance-constrained problems involve stochastic components in the constraints which can be violated with a small probability. We investigate the impact of different types of chance constraints on the performance of iterative search algorithms and study the classical maximum coverage problem in graphs with chance constraints. Our goal is to evolve reliable chance constraint settings for a given graph where the performance of algorithms differs significantly not just in expectation but with high confidence. This allows to better learn and understand how different types of algorithms can deal with different types of constraint settings and supports automatic algorithm selection. We develop an evolutionary algorithm that provides sets of chance constraints that differentiate the performance of two stochastic search algorithms with high confidence. We initially use traditional approximation ratio as the fitness function of (1+1)~EA to evolve instances, which shows inadequacy to generate reliable instances. To address this issue, we introduce a new measure to calculate the performance difference for two algorithms, which considers variances of performance ratios. Our experiments show that our approach is highly successful in solving the instability issue of the performance ratios and leads to evolving reliable sets of chance constraints with significantly different performance for various types of algorithms. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2402.00024 [pdf, other]

Can Large Language Models Understand Molecules?

Authors: Shaghayegh Sadeghi, Alan Bui, Ali Forooghi, Jianguo Lu, Alioune Ngom

Abstract: Purpose: Large Language Models (LLMs) like GPT (Generative Pre-trained Transformer) from OpenAI and LLaMA (Large Language Model Meta AI) from Meta AI are increasingly recognized for their potential in the field of cheminformatics, particularly in understanding Simplified Molecular Input Line Entry System (SMILES), a standard method for representing chemical structures. These LLMs also have the abi… ▽ More Purpose: Large Language Models (LLMs) like GPT (Generative Pre-trained Transformer) from OpenAI and LLaMA (Large Language Model Meta AI) from Meta AI are increasingly recognized for their potential in the field of cheminformatics, particularly in understanding Simplified Molecular Input Line Entry System (SMILES), a standard method for representing chemical structures. These LLMs also have the ability to decode SMILES strings into vector representations. Method: We investigate the performance of GPT and LLaMA compared to pre-trained models on SMILES in embedding SMILES strings on downstream tasks, focusing on two key applications: molecular property prediction and drug-drug interaction prediction. Results: We find that SMILES embeddings generated using LLaMA outperform those from GPT in both molecular property and DDI prediction tasks. Notably, LLaMA-based SMILES embeddings show results comparable to pre-trained models on SMILES in molecular prediction tasks and outperform the pre-trained models for the DDI prediction tasks. Conclusion: The performance of LLMs in generating SMILES embeddings shows great potential for further investigation of these models for molecular embedding. We hope our study bridges the gap between LLMs and molecular embedding, motivating additional research into the potential of LLMs in the molecular representation field. GitHub: https://github.com/sshaghayeghs/LLaMA-VS-GPT △ Less

Submitted 20 May, 2024; v1 submitted 5 January, 2024; originally announced February 2024.

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting. △ Less

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2102.12893 [pdf, other]

Novelty and Primacy: A Long-Term Estimator for Online Experiments

Authors: Soheil Sadeghi, Somit Gupta, Stefan Gramatovici, Jiannan Lu, Hao Ai, Ruhan Zhang

Abstract: Online experiments are the gold standard for evaluating impact on user experience and accelerating innovation in software. However, since experiments are typically limited in duration, observed treatment effects are not always permanently stable, sometimes revealing increasing or decreasing patterns over time. There are multiple causes for a treatment effect to change over time. In this paper, we… ▽ More Online experiments are the gold standard for evaluating impact on user experience and accelerating innovation in software. However, since experiments are typically limited in duration, observed treatment effects are not always permanently stable, sometimes revealing increasing or decreasing patterns over time. There are multiple causes for a treatment effect to change over time. In this paper, we focus on a particular cause, user-learning, which is primarily associated with novelty or primacy. Novelty describes the desire to use new technology that tends to diminish over time. Primacy describes the growing engagement with technology as a result of adoption of the innovation. User-learning estimation is critical because it holds experimentation responsible for trustworthiness, empowers organizations to make better decisions by providing a long-term view of expected impact, and prevents user dissatisfaction. In this paper, we propose an observational approach, based on difference-in-differences technique to estimate user-learning at scale. We use this approach to test and estimate user-learning in many experiments at Microsoft. We compare our approach with the existing experimental method to show its benefits in terms of ease of use and higher statistical power, and to discuss its limitation in presence of other forms of treatment interaction with time. △ Less

Submitted 17 February, 2021; originally announced February 2021.

Comments: Submitted to the Technometrics Journal

arXiv:2012.06154 [pdf, other]

ParsiNLU: A Suite of Language Understanding Challenges for Persian

Authors: Daniel Khashabi, Arman Cohan, Siamak Shakeri, Pedram Hosseini, Pouya Pezeshkpour, Malihe Alikhani, Moin Aminnaseri, Marzieh Bitaab, Faeze Brahman, Sarik Ghazarian, Mozhdeh Gheini, Arman Kabiri, Rabeeh Karimi Mahabadi, Omid Memarrast, Ahmadreza Mosallanezhad, Erfan Noury, Shahab Raji, Mohammad Sadegh Rasooli, Sepideh Sadeghi, Erfan Sadeqi Azer, Niloofar Safi Samghabadi, Mahsa Shafaei, Saber Sheybani, Ali Tazarv, Yadollah Yaghoobzadeh

Abstract: Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains to be concentrated on resource-rich languages like English. This work focuses on Persian language, one of the widely spoken languages in the world, and yet there are few NLU datasets available for this rich language. The availability of high-quality evaluat… ▽ More Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains to be concentrated on resource-rich languages like English. This work focuses on Persian language, one of the widely spoken languages in the world, and yet there are few NLU datasets available for this rich language. The availability of high-quality evaluation datasets is a necessity for reliable assessment of the progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in Persian language that includes a range of high-level tasks -- Reading Comprehension, Textual Entailment, etc. These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5$k$ new instances across 6 distinct NLU tasks. Besides, we present the first results on state-of-the-art monolingual and multi-lingual pre-trained language-models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding. △ Less

Submitted 13 July, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: To appear on Transactions of the Association for Computational Linguistics (TACL), 2021

arXiv:2010.13942 [pdf, other]

Exploiting the Nonlinear Stiffness of TMP Origami Folding to Enhance Robotic Jumping Performance

Authors: Sahand Sadeghi, Samuel Allison, Blake Betsill, Suyi Li

Abstract: Via numerical simulation and experimental assessment, this study examines the use of origami folding to develop robotic jumping mechanisms with tailored nonlinear stiffness to improve dynamic performance. Specifically, we use Tachi-Miura Polyhedron (TMP) bellow origami -- which exhibits a nonlinear "strain-softening" force-displacement curve -- as a jumping robotic skeleton with embedded energy st… ▽ More Via numerical simulation and experimental assessment, this study examines the use of origami folding to develop robotic jumping mechanisms with tailored nonlinear stiffness to improve dynamic performance. Specifically, we use Tachi-Miura Polyhedron (TMP) bellow origami -- which exhibits a nonlinear "strain-softening" force-displacement curve -- as a jumping robotic skeleton with embedded energy storage. TMP's nonlinear stiffness allows it to store more energy than a linear spring and offers improved jumping height and airtime. Moreover, the nonlinearity can be tailored by directly changing the underlying TMP crease geometry. A critical challenge is to minimize the TMP's hysteresis and energy loss during its compression stage right before jumping. So we used the plastically annealed lamina emergent origami (PALEO) concept to modify the TMP creases. PALEO increases the folding limit before plastic deformation occurs, thus improving the overall strain energy retention. Jumping experiments confirmed that a nonlinear TMP mechanism achieved roughly 9% improvement in air time and a 13% improvement in jumping height compared to a "control" TMP sample with a relatively linear stiffness. This study's results validate the advantages of using origami in robotic jumping mechanisms and demonstrate the benefits of utilizing nonlinear spring elements for improving jumping performance. Therefore, they could foster a new family of energetically efficient jumping mechanisms with optimized performance in the future. △ Less

Submitted 26 October, 2020; originally announced October 2020.

arXiv:2009.05015 [pdf]

Bias Variance Tradeoff in Analysis of Online Controlled Experiments

Authors: Ali Mahmoudzadeh, Sophia Liu, Sol Sadeghi, Paul Luo Li, Somit Gupta

Abstract: Many organizations utilize large-scale online controlled experiments (OCEs) to accelerate innovation. Having high statistical power to detect small differences between control and treatment accurately is critical, as even small changes in key metrics can be worth millions of dollars or indicate user dissatisfaction for a very large number of users. For large-scale OCE, the duration is typically sh… ▽ More Many organizations utilize large-scale online controlled experiments (OCEs) to accelerate innovation. Having high statistical power to detect small differences between control and treatment accurately is critical, as even small changes in key metrics can be worth millions of dollars or indicate user dissatisfaction for a very large number of users. For large-scale OCE, the duration is typically short (e.g. two weeks) to expedite changes and improvements to the product. In this paper, we examine two common approaches for analyzing usage data collected from users within the time window of an experiment, which can differ in accuracy and power. The open approach includes all relevant usage data from all active users for the entire duration of the experiment. The bounded approach includes data from a fixed period of observation for each user (e.g. seven days after exposure) after the first time a user became active in the experiment window. △ Less

Submitted 10 September, 2020; originally announced September 2020.

MSC Class: 62K99

arXiv:1612.01006 [pdf]

A Non-Local Means Approach for Gaussian Noise Removal from Images using a Modified Weighting Kernel

Authors: Mojtaba Kazemi, Ehsan Mohammadi. P, Parichehr shahidi sadeghi, Mohamad B. Menhaj

Abstract: Gaussian noise removal is an interesting area in digital image processing not only to improve the visual quality, but for its impact on other post-processing algorithms like image registration or segmentation. Many presented state-of-the-art denoising methods are based on the self-similarity or patch-based image processing. Specifically, Non-Local Means (NLM) as a patch-based filter has gained inc… ▽ More Gaussian noise removal is an interesting area in digital image processing not only to improve the visual quality, but for its impact on other post-processing algorithms like image registration or segmentation. Many presented state-of-the-art denoising methods are based on the self-similarity or patch-based image processing. Specifically, Non-Local Means (NLM) as a patch-based filter has gained increasing attention in recent years. Essentially, this filter tends to obtain the noise-less signal value by computing the Gaussian-weighted Euclidean distance between the patch under-processing and other patches inside the image. However, the NLM filter is sensitive to the outliers (pixels that their intensity values are far away from other pixels) inside the patch, meaning that the pixels with the symmetric locations in the patch are assigned the same weight. This can lead to sub-optimal denoising performance when the destructive nature of noise generates some outliers inside patches. In this paper, we propose a new weighting approach to modify the Gaussian kernel of the NLM filter. Our approach employs the geometric distance between image intensities to come up with new weights for each pixel of a patch, lowering the impact of outliers on the denoising performance. Experiments on a set of standard images and different noise levels show that our proposed method outperforms the other compared denoising filters. △ Less

Submitted 3 December, 2016; originally announced December 2016.

Comments: 2017 25th Iranian Conference on Electrical Engineering (ICEE)

arXiv:1308.3312 [pdf]

Designing secure clustering protocol with the approach of reducing energy consumption in wireless sensor networks

Authors: Sanaz Sadeghi, Behrouz Sadeghi

Abstract: In recent years, many researchers have focused on wireless sensor networks and their applications. To obtain scalability potential in these networks most of the nodes are categorized as distinct groups named cluster and the node which is selected as cluster head or Aggregation Node offers the operation of data collection from other cluster nodes and aggregation and sending it to the rest of the ne… ▽ More In recent years, many researchers have focused on wireless sensor networks and their applications. To obtain scalability potential in these networks most of the nodes are categorized as distinct groups named cluster and the node which is selected as cluster head or Aggregation Node offers the operation of data collection from other cluster nodes and aggregation and sending it to the rest of the network. Clustering and data aggregation increase network scalability and cause that limited resources of the network are used well. However, these mechanisms also make several breaches in the network, for example in clustered networks cluster head nodes are considered Desirable and attractive targets for attackers since reaching their information whether by physical attack and node capturing or by other attacks, the attacker can obtain the whole information of corresponding cluster. In this study secure clustering of the nodes are considered with the approach of reducing energy consumption of nodes and a protocol is presented that in addition to satisfying Advanced security needs of wireless sensor networks reduces the amount of energy consumption by the nodes. Due to network clustering there is scalability potential in such a network and According to frequent change of cluster head nodes load distribution is performed in the cluster and eventually increase the network lifetime. △ Less

Submitted 15 August, 2013; originally announced August 2013.

Journal ref: The International Journal of Computer Networks & Communications (IJCNC), July 2013, Volume 5. Number 4

Showing 1–12 of 12 results for author: Sadeghi, S