Showing 1–50 of 91 results for author: Garg, R

  1. arXiv:2407.12354  [pdf, other]

    cs.CV

    Invertible Neural Warp for NeRF

    Authors: Shin-Fang Chng, Ravi Garg, Hemanth Saratchandran, Simon Lucey

    Abstract: This paper tackles the simultaneous optimization of pose and Neural Radiance Fields (NeRF). Departing from the conventional practice of using explicit global representations for camera pose, we propose a novel overparameterized representation that models camera poses as learnable rigid warp functions. We establish that modeling the rigid warps must be tightly coupled with constraints and regulariz…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. Project page: https://sfchng.github.io/ineurowarping-github.io/

  2. arXiv:2406.18954  [pdf, other]

    cs.LG cs.AI

    Alignment For Performance Improvement in Conversation Bots

    Authors: Raghav Garg, Kapil Sharma, Shrey Singla

    Abstract: This paper shows that alignment methods can achieve superior adherence to guardrails compared to instruction fine-tuning alone in conversational agents, also known as bots, within predefined guidelines or 'guardrails'. It examines traditional training approaches such as instruction fine-tuning and the recent advancements in direct alignment methods like Identity Preference Optimization (IPO), and…

    Submitted 27 June, 2024; originally announced June 2024.

  3. A PCA based Keypoint Tracking Approach to Automated Facial Expressions Encoding

    Authors: Shivansh Chandra Tripathi, Rahul Garg

    Abstract: The Facial Action Coding System (FACS) for studying facial expressions is manual and requires significant effort and expertise. This paper explores the use of automated techniques to generate Action Units (AUs) for studying facial expressions. We propose an unsupervised approach based on Principal Component Analysis (PCA) and facial keypoint tracking to generate data-driven AUs called PCA AUs usin…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution is published in [LNCS,volume 14301], and is available online at https://doi.org/10.1007/978-3-031-45170-6_85

  4. arXiv:2406.05434  [pdf, other]

    cs.CV cs.HC

    Unsupervised learning of Data-driven Facial Expression Coding System (DFECS) using keypoint tracking

    Authors: Shivansh Chandra Tripathi, Rahul Garg

    Abstract: The development of existing facial coding systems, such as the Facial Action Coding System (FACS), relied on manual examination of facial expression videos for defining Action Units (AUs). To overcome the labor-intensive nature of this process, we propose the unsupervised learning of an automated facial coding system by leveraging computer-vision-based facial keypoint tracking. In this novel facia…

    Submitted 8 June, 2024; originally announced June 2024.

  5. arXiv:2405.16759  [pdf, other]

    cs.CV cs.LG

    Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

    Authors: Cristina N. Vasconcelos, Abdullah Rashwan, Austin Waters, Trevor Walker, Keyang Xu, Jimmy Yan, Rui Qian, Shixin Luo, Zarana Parekh, Andrew Bunner, Hongliang Fei, Roopal Garg, Mandy Guo, Ivana Kajic, Yeqing Li, Henna Nandwani, Jordi Pont-Tuset, Yasumasa Onoe, Sarah Rosston, Su Wang, Wenlei Zhou, Kevin Swersky, David J. Fleet, Jason M. Baldridge, Oliver Wang

    Abstract: We address the long-standing problem of how to learn effective pixel-based image diffusion models at scale, introducing a remarkably simple greedy growing method for stable training of large-scale, high-resolution models, without the need for cascaded super-resolution components. The key insight stems from careful pre-training of core components, namely, those responsible for text-to-image alignm…

    Submitted 26 May, 2024; originally announced May 2024.

  6. arXiv:2405.02793  [pdf, other]

    cs.CV cs.CL

    ImageInWords: Unlocking Hyper-Detailed Image Descriptions

    Authors: Roopal Garg, Andrea Burns, Burcu Karagol Ayan, Yonatan Bitton, Ceslee Montgomery, Yasumasa Onoe, Andrew Bunner, Ranjay Krishna, Jason Baldridge, Radu Soricut

    Abstract: Despite the longstanding adage "an image is worth a thousand words," creating accurate and hyper-detailed image descriptions for training Vision-Language models remains challenging. Current datasets typically have web-scraped descriptions that are short, low-granularity, and often contain details unrelated to the visual content. As a result, models trained on such data generate descriptions replet…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Webpage (https://google.github.io/imageinwords), GitHub (https://github.com/google/imageinwords), HuggingFace (https://huggingface.co/datasets/google/imageinwords)

  7. arXiv:2405.01736  [pdf, other]

    cs.AR

    PipeOrgan: Efficient Inter-operation Pipelining with Flexible Spatial Organization and Interconnects

    Authors: Raveesh Garg, Hyoukjun Kwon, Eric Qin, Yu-Hsin Chen, Tushar Krishna, Liangzhen Lai

    Abstract: Because of the recent trends in Deep Neural Networks (DNN) models being memory-bound, inter-operator pipelining for DNN accelerators is emerging as a promising optimization. Inter-operator pipelining reduces costly on-chip global memory and off-chip memory accesses by forwarding the output of a layer as the input of the next layer within the compute array, which is proven to be an effective optimi…

    Submitted 2 May, 2024; originally announced May 2024.

  8. arXiv:2404.19753  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    DOCCI: Descriptions of Connected and Contrasting Images

    Authors: Yasumasa Onoe, Sunayana Rane, Zachary Berger, Yonatan Bitton, Jaemin Cho, Roopal Garg, Alexander Ku, Zarana Parekh, Jordi Pont-Tuset, Garrett Tanzer, Su Wang, Jason Baldridge

    Abstract: Vision-language datasets are vital for both text-to-image (T2I) and image-to-text (I2T) research. However, current datasets lack descriptions with fine-grained detail that would allow for richer associations to be learned by models. To fill the gap, we introduce Descriptions of Connected and Contrasting Images (DOCCI), a dataset with long, human-annotated English descriptions for 15k images that w…

    Submitted 30 April, 2024; originally announced April 2024.

  9. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  10. arXiv:2312.03766  [pdf, other]

    cs.CL cs.CV

    Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

    Authors: Brian Gordon, Yonatan Bitton, Yonatan Shafir, Roopal Garg, Xi Chen, Dani Lischinski, Daniel Cohen-Or, Idan Szpektor

    Abstract: While existing image-text alignment models reach high quality binary assessments, they fall short of pinpointing the exact source of misalignment. In this paper, we present a method to provide detailed textual and visual explanation of detected misalignments between text-image pairs. We leverage large language models and visual grounding models to automatically construct a training set that holds…

    Submitted 17 July, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Journal ref: ECCV 2024

  11. arXiv:2310.18235  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

    Authors: Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang

    Abstract: Evaluating text-to-image models is notoriously difficult. A strong recent approach for assessing text-image faithfulness is based on QG/A (question generation and answering), which uses pre-trained foundational models to automatically generate a set of questions and answers from the prompt, and output images are scored based on whether these answers extracted with a visual question answering model…

    Submitted 13 March, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: ICLR 2024; Project website: https://google.github.io/dsg

  12. arXiv:2309.08685  [pdf, other]

    cs.GT cs.DC

    Fairly Allocating Goods in Parallel

    Authors: Rohan Garg, Alexandros Psomas

    Abstract: We initiate the study of parallel algorithms for fairly allocating indivisible goods among agents with additive preferences. We give fast parallel algorithms for various fundamental problems, such as finding a Pareto Optimal and EF1 allocation under restricted additive valuations, finding an EF1 allocation for up to three agents, and finding an envy-free allocation with subsidies. On the flip side…

    Submitted 15 September, 2023; originally announced September 2023.

  13. arXiv:2306.10392  [pdf, other]

    cs.CR cs.LG

    GlyphNet: Homoglyph domains dataset and detection using attention-based Convolutional Neural Networks

    Authors: Akshat Gupta, Laxman Singh Tomar, Ridhima Garg

    Abstract: Cyber attacks deceive machines into believing something that does not exist in the first place. However, there are some to which even humans fall prey. One such famous attack that attackers have used over the years to exploit the vulnerability of vision is known to be a Homoglyph attack. It employs a primary yet effective mechanism to create illegitimate domains that are hard to differentiate from…

    Submitted 17 June, 2023; originally announced June 2023.

    Journal ref: AAAI AICS Conference 2023

  14. CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation

    Authors: Rahul Madhavan, Rishabh Garg, Kahini Wadhawan, Sameep Mehta

    Abstract: We propose a method to control the attributes of Language Models (LMs) for the text generation task using Causal Average Treatment Effect (ATE) scores and counterfactual augmentation. We explore this method, in the context of LM detoxification, and propose the Causally Fair Language (CFL) architecture for detoxifying pre-trained LMs in a plug-and-play manner. Our architecture is based on a Structu…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 19 pages, 10 figures. Findings of ACL 2023

    Journal ref: Findings of the Association for Computational Linguistics: ACL 2023

  15. arXiv:2303.18135  [pdf]

    cs.CR

    Towards A Sustainable and Ethical Supply Chain Management: The Potential of IoT Solutions

    Authors: Hardik Sharma, Rajat Garg, Harshini Sewani, Rasha Kashef

    Abstract: Globalization has introduced many new challenges making Supply chain management (SCM) complex and huge, for which improvement is needed in many industries. The Internet of Things (IoT) has solved many problems by providing security and traceability with a promising solution for supply chain management. SCM is segregated into different processes, each requiring different types of solutions. IoT dev…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: 9 pages

  16. arXiv:2303.13504  [pdf, other]

    cs.CV

    ReBotNet: Fast Real-time Video Enhancement

    Authors: Jeya Maria Jose Valanarasu, Rahul Garg, Andeep Toor, Xin Tong, Weijuan Xi, Andreas Lugmayr, Vishal M. Patel, Anne Menini

    Abstract: Most video restoration networks are slow, have high computational load, and can't be used for real-time video enhancement. In this work, we design an efficient and fast framework to perform real-time video enhancement for practical use-cases like live video calls and video streams. Our proposed method, called Recurrent Bottleneck Mixer Network (ReBotNet), employs a dual-branch framework. The first…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Project Website: https://jeya-maria-jose.github.io/rebotnet-web/

  17. arXiv:2303.11499  [pdf, other]

    cs.DC cs.AR

    Exploiting Inter-Operation Data Reuse in Scientific Applications using GOGETA

    Authors: Raveesh Garg, Michael Pellauer, Sivasankaran Rajamanickam, Tushar Krishna

    Abstract: HPC applications are critical in various scientific domains ranging from molecular dynamics to chemistry to fluid dynamics. Conjugate Gradient (CG) is a popular application kernel used in iterative linear HPC solvers and has applications in numerous scientific domains. However, the HPCG benchmark shows that the performance achieved by Top500 HPC systems on CG is a small fraction of the performance…

    Submitted 20 March, 2023; originally announced March 2023.

  18. mlpack 4: a fast, header-only C++ machine learning library

    Authors: Ryan R. Curtin, Marcus Edel, Omar Shrit, Shubham Agrawal, Suryoday Basak, James J. Balamuta, Ryan Birmingham, Kartik Dutt, Dirk Eddelbuettel, Rishabh Garg, Shikhar Jaiswal, Aakash Kaushik, Sangyeon Kim, Anjishnu Mukherjee, Nanubala Gnana Sai, Nippun Sharma, Yashwant Singh Parihar, Roshan Swain, Conrad Sanderson

    Abstract: For over 15 years, the mlpack machine learning library has served as a "swiss army knife" for C++-based machine learning. Its efficient implementations of common and cutting-edge machine learning algorithms have been used in a wide variety of scientific and industrial applications. This paper overviews mlpack 4, a significant upgrade over its predecessor. The library has been significantly refacto…

    Submitted 1 February, 2023; originally announced February 2023.

    Journal ref: Journal of Open Source Software, Vol. 8, No. 82, 2023

  19. arXiv:2301.10852  [pdf, other]

    cs.AR

    Flexagon: A Multi-Dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing

    Authors: Francisco Muñoz-Martínez, Raveesh Garg, José L. Abellán, Michael Pellauer, Manuel E. Acacio, Tushar Krishna

    Abstract: Sparsity is a growing trend in modern DNN models. Existing Sparse-Sparse Matrix Multiplication (SpMSpM) accelerators are tailored to a particular SpMSpM dataflow (i.e., Inner Product, Outer Product or Gustavson's), which determines their overall efficiency. We demonstrate that this static decision inherently results in a suboptimal dynamic solution. This is because different SpMSpM kernels show vary…

    Submitted 25 January, 2023; originally announced January 2023.

    Comments: To appear on ASPLOS 2023

  20. arXiv:2211.14387  [pdf]

    cs.LG cs.AI econ.EM

    Machine Learning Algorithms for Time Series Analysis and Forecasting

    Authors: Rameshwar Garg, Shriya Barpanda, Girish Rao Salanke N S, Ramya S

    Abstract: Time series data is being used everywhere, from sales records to patients' health evolution metrics. The ability to deal with this data has become a necessity, and time series analysis and forecasting are used for the same. Every Machine Learning enthusiast would consider these as very important tools, as they deepen the understanding of the characteristics of data. Forecasting is used to predict…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: 9 Pages, 4 Figures, 9 Formulae, 1 Table, 6th International Conference on Microelectronics, Computing & Communication Systems (MCCS-2021), Paper ID: MCCS21084, Presented at MCCS-2021, Accepted, In Press

  21. arXiv:2206.13577  [pdf, other]

    cs.CV cs.AI cs.LG

    A View Independent Classification Framework for Yoga Postures

    Authors: Mustafa Chasmai, Nirjhar Das, Aman Bhardwaj, Rahul Garg

    Abstract: Yoga is a globally acclaimed and widely recommended practice for healthy living. Maintaining correct posture while performing a Yogasana is of utmost importance. In this work, we employ transfer learning from Human Pose Estimation models for extracting 136 key-points spread all over the body to train a Random Forest classifier which is used for estimation of the Yogasanas. The results are evalua…

    Submitted 14 August, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

  22. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  23. arXiv:2202.11233  [pdf, other]

    cs.CV

    Retrieval Augmented Classification for Long-Tail Visual Recognition

    Authors: Alexander Long, Wei Yin, Thalaiyasingam Ajanthan, Vu Nguyen, Pulak Purkait, Ravi Garg, Alan Blair, Chunhua Shen, Anton van den Hengel

    Abstract: We introduce Retrieval Augmented Classification (RAC), a generic approach to augmenting standard image classification pipelines with an explicit retrieval module. RAC consists of a standard base image encoder fused with a parallel retrieval branch that queries a non-parametric external memory of pre-encoded images and associated text snippets. We apply RAC to the problem of long-tail classificatio…

    Submitted 22 February, 2022; originally announced February 2022.

  24. arXiv:2201.08916  [pdf, other]

    cs.AR

    Enabling Flexibility for Sparse Tensor Acceleration via Heterogeneity

    Authors: Eric Qin, Raveesh Garg, Abhimanyu Bambhaniya, Michael Pellauer, Angshuman Parashar, Sivasankaran Rajamanickam, Cong Hao, Tushar Krishna

    Abstract: Recently, numerous sparse hardware accelerators for Deep Neural Networks (DNNs), Graph Neural Networks (GNNs), and scientific computing applications have been proposed. A common characteristic among all of these accelerators is that they target tensor algebra (typically matrix multiplications); yet dozens of new accelerators are proposed for every new application. The motivation is that the size a…

    Submitted 21 January, 2022; originally announced January 2022.

  25. arXiv:2112.14406  [pdf, other]

    cs.CV cs.LG

    Overcoming Mode Collapse with Adaptive Multi Adversarial Training

    Authors: Karttikeya Mangalam, Rohin Garg

    Abstract: Generative Adversarial Networks (GANs) are a class of generative models used for various applications, but they have been known to suffer from the mode collapse problem, in which some modes of the target distribution are ignored by the generator. Investigative study using a new data generation procedure indicates that the mode collapse of the generator is driven by the discriminator's inability to…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: BMVC 2021 Poster

  26. arXiv:2112.05858  [pdf, other]

    cs.DC

    MANA-2.0: A Future-Proof Design for Transparent Checkpointing of MPI at Scale

    Authors: Yao Xu, Zhengji Zhao, Rohan Garg, Harsh Khetawat, Rebecca Hartman-Baker, Gene Cooperman

    Abstract: MANA-2.0 is a scalable, future-proof design for transparent checkpointing of MPI-based computations. Its network transparency ("network-agnostic") feature ensures that MANA-2.0 will provide a viable, efficient mechanism for transparently checkpointing MPI applications on current and future supercomputers. MANA-2.0 is an enhancement of previous work, the original MANA, which interposes MPI calls, a…

    Submitted 10 December, 2021; originally announced December 2021.

  27. arXiv:2111.10882  [pdf, other]

    cs.CV cs.SD eess.AS

    Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

    Authors: Rishabh Garg, Ruohan Gao, Kristen Grauman

    Abstract: Binaural audio provides human listeners with an immersive spatial sound experience, but most existing videos lack binaural audio recordings. We propose an audio spatialization method that draws on visual information in videos to convert their monaural (single-channel) audio to binaural audio. Whereas existing approaches leverage visual features extracted directly from video frames, our approach ex…

    Submitted 21 November, 2021; originally announced November 2021.

    Comments: Published in BMVC 2021, project page: http://vision.cs.utexas.edu/projects/geometry-aware-binaural/

  28. arXiv:2110.12012  [pdf]

    cs.DC cs.DB cs.LG

    RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework (Extended Version)

    Authors: Pankaj Singh, Sudhakar Singh, P K Mishra, Rakhi Garg

    Abstract: Frequent itemset mining (FIM) is a highly computational and data intensive algorithm. Therefore, parallel and distributed FIM algorithms have been designed to process large volumes of data in a reduced time. Recently, a number of FIM algorithms have been designed on Hadoop MapReduce, a distributed big data processing framework. But, due to heavy disk I/O, MapReduce is found to be inefficient for th…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: This version is not published or communicated anywhere. arXiv admin note: substantial text overlap with arXiv:1912.06415

  29. arXiv:2110.05655  [pdf, other]

    cs.CV

    Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image

    Authors: Shumian Xin, Neal Wadhwa, Tianfan Xue, Jonathan T. Barron, Pratul P. Srinivasan, Jiawen Chen, Ioannis Gkioulekas, Rahul Garg

    Abstract: We present a method that takes as input a single dual-pixel image, and simultaneously estimates the image's defocus map -- the amount of defocus blur at each pixel -- and recovers an all-in-focus image. Our method is inspired by recent works that leverage the dual-pixel sensors available in many consumer cameras to assist with autofocus, and use them for recovery of defocus maps or all-in-focus…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: ICCV 2021 (Oral)

  30. i-Pulse: A NLP based novel approach for employee engagement in logistics organization

    Authors: Rachit Garg, Arvind W Kiwelekar, Laxman D Netak, Akshay Ghodake

    Abstract: Although most logistics and freight forwarding organizations, in one way or another, claim to have core values. The engagement of employees is a vast structure that affects almost every part of the company's core environmental values. There is little theoretical knowledge about the relationship between firms and the engagement of employees. Based on research literature, this paper aims to provide…

    Submitted 24 May, 2021; originally announced June 2021.

    Comments: 11 Pages 7 Figures. International Journal of Information Management Data Insights (Elsevier) 2021

  31. arXiv:2105.04419  [pdf, other]

    cs.RO

    VDB-EDT: An Efficient Euclidean Distance Transform Algorithm Based on VDB Data Structure

    Authors: Delong Zhu, Chaoqun Wang, Wenshan Wang, Rohit Garg, Sebastian Scherer, Max Q.-H. Meng

    Abstract: This paper presents a fundamental algorithm, called VDB-EDT, for Euclidean distance transform (EDT) based on the VDB data structure. The algorithm executes on grid maps and generates the corresponding distance field for recording distance information against obstacles, which forms the basis of numerous motion planning algorithms. The contributions of this work mainly lie in three folds. Firstly, w…

    Submitted 10 May, 2021; originally announced May 2021.

  32. arXiv:2104.01272  [pdf, other]

    cs.RO eess.SY

    Visual Servoing Approach for Autonomous UAV Landing on a Moving Vehicle

    Authors: Azarakhsh Keipour, Guilherme A. S. Pereira, Rogerio Bonatti, Rohit Garg, Puru Rastogi, Geetesh Dubey, Sebastian Scherer

    Abstract: Many aerial robotic applications require the ability to land on moving platforms, such as delivery trucks and marine research boats. We present a method to autonomously land an Unmanned Aerial Vehicle on a moving vehicle. A visual servoing controller approaches the ground vehicle using velocity commands calculated directly in image space. The control laws generate velocity commands in all three di…

    Submitted 26 December, 2022; v1 submitted 2 April, 2021; originally announced April 2021.

    Comments: 18 pages. Published in Sensors Journal

    Journal ref: Sensors 2022, 22(17), 6549

  33. arXiv:2103.08546  [pdf, other]

    cs.DC

    Improving scalability and reliability of MPI-agnostic transparent checkpointing for production workloads at NERSC

    Authors: Prashant Singh Chouhan, Harsh Khetawat, Neil Resnik, Twinkle Jain, Rohan Garg, Gene Cooperman, Rebecca Hartman-Baker, Zhengji Zhao

    Abstract: Checkpoint/restart (C/R) provides fault-tolerant computing capability, enables long running applications, and provides scheduling flexibility for computing centers to support diverse workloads with different priority. It is therefore vital to get transparent C/R capability working at NERSC. MANA, by Garg et al., is a transparent checkpointing tool that has been selected due to its MPI-agnostic an…

    Submitted 16 March, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

  34. arXiv:2103.07977  [pdf, other]

    cs.DC cs.AR

    Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators

    Authors: Raveesh Garg, Eric Qin, Francisco Muñoz-Martínez, Robert Guirado, Akshay Jain, Sergi Abadal, José L. Abellán, Manuel E. Acacio, Eduard Alarcón, Sivasankaran Rajamanickam, Tushar Krishna

    Abstract: Graph Neural Networks (GNNs) have garnered a lot of recent interest because of their success in learning representations from graph-structured data across several critical applications in cloud and HPC. Owing to their unique compute and memory characteristics that come from an interplay between dense and sparse phases of computations, the emergence of reconfigurable dataflow (aka spatial) accelera…

    Submitted 6 March, 2022; v1 submitted 14 March, 2021; originally announced March 2021.

    Comments: Accepted for publication at the 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2022)

  35. arXiv:2103.00933  [pdf, other]

    cs.CV

    DF-VO: What Should Be Learnt for Visual Odometry?

    Authors: Huangying Zhan, Chamara Saroj Weerasekera, Jia-Wang Bian, Ravi Garg, Ian Reid

    Abstract: Multi-view geometry-based methods dominate the last few decades in monocular Visual Odometry for their superior performance, while they have been vulnerable to dynamic and low-texture scenes. More importantly, monocular methods suffer from scale-drift issue, i.e., errors accumulate over time. Recent studies show that deep neural networks can learn scene depths and relative camera in a self-supervi…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: extended version of ICRA-2020 paper (Visual Odometry Revisited: What Should Be Learnt?)

  36. Context- and Sequence-Aware Convolutional Recurrent Encoder for Neural Machine Translation

    Authors: Ritam Mallick, Seba Susan, Vaibhaw Agrawal, Rizul Garg, Prateek Rawal

    Abstract: Neural Machine Translation model is a sequence-to-sequence converter based on neural networks. Existing models use recurrent neural networks to construct both the encoder and decoder modules. In alternative research, the recurrent networks were substituted by convolutional neural networks for capturing the syntactic structure in the input sentence and decreasing the processing time. We incorporate…

    Submitted 21 March, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: Accepted in 36th ACM/SIGAPP Symposium On Applied Computing 2021

  37. arXiv:2101.00216  [pdf]

    cs.CV

    Brain Tumor Detection and Classification based on Hybrid Ensemble Classifier

    Authors: Ginni Garg, Ritu Garg

    Abstract: To improve patient survival and treatment outcomes, early diagnosis of brain tumors is an essential task. It is a difficult task to evaluate the magnetic resonance imaging (MRI) images manually. Thus, there is a need for digital methods for tumor diagnosis with better accuracy. However, it is still a very challenging task in assessing their shape, volume, boundaries, tumor detection, size, segment…

    Submitted 1 January, 2021; originally announced January 2021.

    Comments: 18 Pages, 12 figures, 4 Tables

  38. arXiv:2101.00214  [pdf]

    cs.CV cs.LG

    A Hybrid MLP-SVM Model for Classification using Spatial-Spectral Features on Hyper-Spectral Images

    Authors: Ginni Garg, Dheeraj Kumar, ArvinderPal, Yash Sonker, Ritu Garg

    Abstract: There are many challenges in the classification of hyper spectral images such as large dimensionality, scarcity of labeled data and spatial variability of spectral signatures. In this proposed method, we make a hybrid classifier (MLP-SVM) using multilayer perceptron (MLP) and support vector machine (SVM) which aimed to improve the various classification parameters such as accuracy, precision, reca…

    Submitted 1 January, 2021; originally announced January 2021.

    Comments: 9 pages, 5 figures, 4 Tables

  39. arXiv:2012.09401  [pdf, other]

    cs.CV

    Zoom-to-Inpaint: Image Inpainting with High-Frequency Details

    Authors: Soo Ye Kim, Kfir Aberman, Nori Kanazawa, Rahul Garg, Neal Wadhwa, Huiwen Chang, Nikhil Karnad, Munchurl Kim, Orly Liba

    Abstract: Although deep learning has enabled a huge leap forward in image inpainting, current methods are often unable to synthesize realistic high-frequency details. In this paper, we propose applying super-resolution to coarsely reconstructed outputs, refining them at high resolution, and then downscaling the output to the original resolution. By introducing high-resolution images to the refinement networ…

    Submitted 29 June, 2022; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: Accepted to CVPRW 2022

  40. arXiv:2011.12485  [pdf, other]

    eess.IV cs.CV

    How to Train Neural Networks for Flare Removal

    Authors: Yicheng Wu, Qiurui He, Tianfan Xue, Rahul Garg, Jiawen Chen, Ashok Veeraraghavan, Jonathan T. Barron

    Abstract: When a camera is pointed at a strong light source, the resulting photograph may contain lens flare artifacts. Flares appear in a wide variety of patterns (halos, streaks, color bleeding, haze, etc.) and this diversity in appearance makes flare removal challenging. Existing analytical solutions make strong assumptions about the artifact's geometry or brightness, and therefore only work well on a sm…

    Submitted 7 October, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

    Comments: A new version paper is uploaded

  41. arXiv:2011.06237  [pdf, other]

    cs.HC cs.IR cs.LG

    Goal-driven Command Recommendations for Analysts

    Authors: Samarth Aggarwal, Rohin Garg, Abhilasha Sancheti, Bhanu Prakash Reddy Guda, Iftikhar Ahamath Burhanuddin

    Abstract: Recent times have seen data analytics software applications become an integral part of the decision-making process of analysts. The users of these software applications generate a vast amount of unstructured log data. These logs contain clues to the user's goals, which traditional recommender systems may find difficult to model implicitly from the log data. With this assumption, we would like to a…

    Submitted 12 November, 2020; originally announced November 2020.

    Comments: 14th ACM Conference on Recommender Systems (RecSys 2020)

  42. arXiv:2010.00702  [pdf, other]

    cs.CV

    Learned Dual-View Reflection Removal

    Authors: Simon Niklaus, Xuaner Cecilia Zhang, Jonathan T. Barron, Neal Wadhwa, Rahul Garg, Feng Liu, Tianfan Xue

    Abstract: Traditional reflection removal algorithms either use a single image as input, which suffers from intrinsic ambiguities, or use multiple images from a moving camera, which is inconvenient for users. We instead propose a learning-based dereflection algorithm that uses stereo images as input. This is an effective trade-off between the two extremes: the parallax between two views provides cues to remo…

    Submitted 1 October, 2020; originally announced October 2020.

    Comments: http://sniklaus.com/dualref

  43. arXiv:2008.12516  [pdf, other]

    cs.DC cs.DS

    Fast and Work-Optimal Parallel Algorithms for Predicate Detection

    Authors: Rohan Garg

    Abstract: Recently, the predicate detection problem was shown to be in the parallel complexity class NC. In this paper, we give the first work-optimal parallel algorithm to solve the predicate detection problem on a distributed computation with $n$ processes and at most $m$ states per process. The previous best known parallel predicate detection algorithm, ParallelCut, has time complexity $O(\log mn)$ and w…

    Submitted 2 December, 2020; v1 submitted 28 August, 2020; originally announced August 2020.

    Comments: Fixed minor bug in JLSDetect from Version 3 with new subroutine FLIS

  44. arXiv:2007.10866  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    IITK-RSA at SemEval-2020 Task 5: Detecting Counterfactuals

    Authors: Anirudh Anil Ojha, Rohin Garg, Shashank Gupta, Ashutosh Modi

    Abstract: This paper describes our efforts in tackling Task 5 of SemEval-2020. The task involved detecting a class of textual expressions known as counterfactuals and separating them into their constituent elements. Counterfactual statements describe events that have not or could not have occurred and the possible implications of such events. While counterfactual reasoning is natural for humans, understandi…

    Submitted 21 July, 2020; originally announced July 2020.

    Comments: 10 pages, 1 figure, 4 tables. For associated code, see https://github.com/gargrohin/Counterfactuals-NLP. Accepted at Proceedings of 14th International Workshop on Semantic Evaluation (SemEval-2020)

  45. arXiv:2004.12260  [pdf, other]

    cs.CV

    Learning to Autofocus

    Authors: Charles Herrmann, Richard Strong Bowen, Neal Wadhwa, Rahul Garg, Qiurui He, Jonathan T. Barron, Ramin Zabih

    Abstract: Autofocus is an important task for digital cameras, yet current approaches often exhibit poor performance. We propose a learning-based approach to this problem, and provide a realistic dataset of sufficient size for effective learning. Our dataset is labeled with per-pixel depths obtained from multi-view stereo, following "Learning single camera depth estimation using dual-pixels". Using this data…

    Submitted 2 May, 2020; v1 submitted 25 April, 2020; originally announced April 2020.

    Comments: CVPR 2020

  46. arXiv:2003.14299  [pdf, other]

    cs.CV

    Du$^2$Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels

    Authors: Yinda Zhang, Neal Wadhwa, Sergio Orts-Escolano, Christian Häne, Sean Fanello, Rahul Garg

    Abstract: Computational stereo has reached a high level of accuracy, but degrades in the presence of occlusions, repeated textures, and correspondence errors along edges. We present a novel approach based on neural networks for depth estimation that combines stereo from dual cameras with stereo from a dual-pixel sensor, which is increasingly common on consumer cameras. Our network uses a novel architecture…

    Submitted 31 March, 2020; originally announced March 2020.

  47. arXiv:1912.06415  [pdf]

    cs.DC cs.DB cs.DS cs.LG

    RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework

    Authors: Pankaj Singh, Sudhakar Singh, P. K. Mishra, Rakhi Garg

    Abstract: Initially, a number of frequent itemset mining (FIM) algorithms have been designed on the Hadoop MapReduce, a distributed big data processing framework. But, due to heavy disk I/O, MapReduce is found to be inefficient for such highly iterative algorithms. Therefore, Spark, a more efficient distributed data processing framework, has been developed with in-memory computation and resilient distribute…

    Submitted 13 December, 2019; originally announced December 2019.

    Comments: 16 pages, 6 figures, ICCNCT 2019

    Report number: ICCNCT-171

    Journal ref: ICCNCT 2019, LNDECT 44

  48. arXiv:1912.04453  [pdf]

    cs.LG eess.IV

    Enhancing Learnability of classification algorithms using simple data preprocessing in fMRI scans of Alzheimer's disease

    Authors: Rishu Garg, Rekh Ram Janghel, Yogesh Rathore

    Abstract: Alzheimer's Disease (AD) is the most common type of dementia. In all leading countries, it is one of the primary reasons of death in senior citizens. Currently, it is diagnosed by calculating the MSME score and by the manual study of MRI Scan. Also, different machine learning methods are utilized for automatic diagnosis but existing has some limitations in terms of accuracy. In this paper, we have…

    Submitted 9 December, 2019; originally announced December 2019.

    Comments: 8 Pages, 6 Figures, 3 Tables

  49. arXiv:1912.03798  [pdf]

    cs.LG stat.ML

    Decision Support System for Detection and Classification of Skin Cancer using CNN

    Authors: Rishu Garg, Saumil Maheshwari, Anupam Shukla

    Abstract: Skin Cancer is one of the most deathful of all the cancers. It is bound to spread to different parts of the body on the off chance that it is not analyzed and treated at the beginning time. It is mostly because of the abnormal growth of skin cells, often develops when the body is exposed to sunlight. The Detection Furthermore, the characterization of skin malignant growth in the beginning time is…

    Submitted 8 December, 2019; originally announced December 2019.

    Comments: 9 pages, 3 figures, 5 tables

  50. arXiv:1912.03789

    cs.LG stat.ML

    Feature Engineering Combined with 1 D Convolutional Neural Network for Improved Mortality Prediction

    Authors: Saumil Maheshwari, Rohit Verma, Anupam Shukla, Ritu Tiwari, Rishu Garg

    Abstract: The intensive care units (ICUs) are responsible for generating a wealth of useful data in the form of Electronic Health Record (EHR). This data allows for the development of a prediction tool with perfect knowledge backing. We aimed to build a mortality prediction model on 2012 Physionet Challenge mortality prediction database of 4000 patients admitted in ICU. The challenges in the dataset, such a…

    Submitted 27 July, 2020; v1 submitted 8 December, 2019; originally announced December 2019.

    Comments: Being a short term project, this paper is not exhaustive