Showing 1–50 of 65 results for author: Erdem, E

  1. arXiv:2407.12498  [pdf, other]

    cs.CL cs.CV

    Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning

    Authors: Mustafa Dogan, Ilker Kesen, Iacer Calixto, Aykut Erdem, Erkut Erdem

    Abstract: The linguistic capabilities of Multimodal Large Language Models (MLLMs) are critical for their effective application across diverse tasks. This study aims to evaluate the performance of MLLMs on the VALSE benchmark, focusing on the efficacy of few-shot In-Context Learning (ICL), and Chain-of-Thought (CoT) prompting. We conducted a comprehensive assessment of state-of-the-art MLLMs, varying in mode…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Preprint. 33 pages, 17 Figures, 3 Tables

  2. arXiv:2406.09368  [pdf, other]

    cs.CV

    CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models

    Authors: Yigit Ekin, Ahmet Burak Yildirim, Erdem Eren Caglar, Aykut Erdem, Erkut Erdem, Aysegul Dundar

    Abstract: Advanced image editing techniques, particularly inpainting, are essential for seamlessly removing unwanted elements while preserving visual integrity. Traditional GAN-based methods have achieved notable success, but recent advancements in diffusion models have produced superior results due to their training on large-scale datasets, enabling the generation of remarkably realistic inpainted images.…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://yigitekin.github.io/CLIPAway/

  3. arXiv:2405.00878  [pdf, other]

    cs.CV

    SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models

    Authors: Burak Can Biner, Farrin Marouf Sofian, Umur Berkay Karakaş, Duygu Ceylan, Erkut Erdem, Aykut Erdem

    Abstract: We are witnessing a revolution in conditional image synthesis with the recent success of large scale text-to-image generation methods. This success also opens up new opportunities in controlling the generation and editing process using multi-modal input. While spatial control using cues such as depth, sketch, and other images has attracted a lot of research, we argue that another equally effective…

    Submitted 1 May, 2024; originally announced May 2024.

  4. arXiv:2404.16621  [pdf, other]

    cs.LG cs.AI cs.CL

    Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare

    Authors: Emre Can Acikgoz, Osman Batur İnce, Rayene Bench, Arda Anıl Boz, İlker Kesen, Aykut Erdem, Erkut Erdem

    Abstract: The integration of Large Language Models (LLMs) into healthcare promises to transform medical diagnostics, research, and patient care. Yet, the progression of medical LLMs faces obstacles such as complex training requirements, rigorous evaluation demands, and the dominance of proprietary models that restrict academic exploration. Transparent, comprehensive access to LLM resources is essential for…

    Submitted 25 April, 2024; originally announced April 2024.

  5. arXiv:2404.12013  [pdf, other]

    cs.CL

    Sequential Compositional Generalization in Multimodal Models

    Authors: Semih Yagcioglu, Osman Batur İnce, Aykut Erdem, Erkut Erdem, Desmond Elliott, Deniz Yuret

    Abstract: The rise of large-scale multimodal models has paved the pathway for groundbreaking advances in generative modeling and reasoning, unlocking transformative applications in a variety of complex tasks. However, a pressing question that remains is their genuine capability for stronger forms of generalization, which has been largely underexplored in the multimodal setting. Our study aims to address thi…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted to the main conference of NAACL (2024) as a long paper

  6. arXiv:2311.07022  [pdf, other]

    cs.CL cs.AI cs.CV

    ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models

    Authors: Ilker Kesen, Andrea Pedrotti, Mustafa Dogan, Michele Cafagna, Emre Can Acikgoz, Letitia Parcalabescu, Iacer Calixto, Anette Frank, Albert Gatt, Aykut Erdem, Erkut Erdem

    Abstract: With the ever-increasing popularity of pretrained Video-Language Models (VidLMs), there is a pressing need to develop robust evaluation methodologies that delve deeper into their visio-linguistic capabilities. To address this challenge, we present ViLMA (Video Language Model Assessment), a task-agnostic benchmark that places the assessment of fine-grained capabilities of these models on a firm foo…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: Preprint. 48 pages, 22 figures, 10 tables

  7. arXiv:2310.12118  [pdf, other]

    cs.CL

    Harnessing Dataset Cartography for Improved Compositional Generalization in Transformers

    Authors: Osman Batur İnce, Tanin Zeraati, Semih Yagcioglu, Yadollah Yaghoobzadeh, Erkut Erdem, Aykut Erdem

    Abstract: Neural networks have revolutionized language modeling and excelled in various downstream tasks. However, the extent to which these models achieve compositional generalization comparable to human cognitive abilities remains a topic of debate. While existing approaches in the field have mainly focused on novel architectures and alternative learning paradigms, we introduce a pioneering method harness…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted to Findings of EMNLP 2023

  8. Hyperspectral Image Denoising via Self-Modulating Convolutional Neural Networks

    Authors: Orhan Torun, Seniha Esen Yuksel, Erkut Erdem, Nevrez Imamoglu, Aykut Erdem

    Abstract: Compared to natural images, hyperspectral images (HSIs) consist of a large number of bands, with each band capturing different spectral information from a certain wavelength, even some beyond the visible spectrum. These characteristics of HSIs make them highly effective for remote sensing applications. That said, the existing hyperspectral imaging devices introduce severe degradation in HSIs. Henc…

    Submitted 15 September, 2023; originally announced September 2023.

    Journal ref: Signal Processing, Volume 214, January 2024, 109248

  9. arXiv:2308.13004  [pdf, other]

    cs.CV cs.AI cs.MM

    Spherical Vision Transformer for 360-degree Video Saliency Prediction

    Authors: Mert Cokelek, Nevrez Imamoglu, Cagri Ozcinar, Erkut Erdem, Aykut Erdem

    Abstract: The growing interest in omnidirectional videos (ODVs) that capture the full field-of-view (FOV) has gained 360-degree saliency prediction importance in computer vision. However, predicting where humans look in 360-degree scenes presents unique challenges, including spherical distortion, high resolution, and limited labelled data. We propose a novel vision-transformer-based model for omnidirectiona…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: 12 pages, 4 figures, accepted to BMVC 2023

  10. arXiv:2307.08397  [pdf, other]

    cs.CV

    CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing

    Authors: Ahmet Canberk Baykal, Abdul Basit Anees, Duygu Ceylan, Erkut Erdem, Aykut Erdem, Deniz Yuret

    Abstract: Researchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. H…

    Submitted 18 July, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted for publication in ACM Transactions on Graphics

  11. HyperE2VID: Improving Event-Based Video Reconstruction via Hypernetworks

    Authors: Burak Ercan, Onur Eker, Canberk Saglam, Aykut Erdem, Erkut Erdem

    Abstract: Event-based cameras are becoming increasingly popular for their ability to capture high-speed motion with low latency and high dynamic range. However, generating videos from events remains challenging due to the highly sparse and varying nature of event data. To address this, in this study, we propose HyperE2VID, a dynamic neural network architecture for event-based video reconstruction. Our appro…

    Submitted 20 February, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: 20 pages, 11 figures. Accepted by IEEE Transactions on Image Processing. The project page can be found at https://ercanburak.github.io/HyperE2VID.html

    Journal ref: IEEE Trans. Image Process., 33 (2024), 1826-1837

  12. EVREAL: Towards a Comprehensive Benchmark and Analysis Suite for Event-based Video Reconstruction

    Authors: Burak Ercan, Onur Eker, Aykut Erdem, Erkut Erdem

    Abstract: Event cameras are a new type of vision sensor that incorporates asynchronous and independent pixels, offering advantages over traditional frame-based cameras such as high dynamic range and minimal motion blur. However, their output is not easily understandable by humans, making the reconstruction of intensity images from event streams a fundamental task in event-based vision. While recent deep lea…

    Submitted 5 April, 2024; v1 submitted 30 April, 2023; originally announced May 2023.

    Comments: 19 pages, 9 figures. Has been accepted for publication at the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, 2023. The project page can be found at https://ercanburak.github.io/evreal.html

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 3942-3951. 2023

  13. arXiv:2304.06020  [pdf, other]

    cs.CV

    VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs

    Authors: Moayed Haji Ali, Andrew Bond, Tolga Birdal, Duygu Ceylan, Levent Karacan, Erkut Erdem, Aykut Erdem

    Abstract: We propose $\textbf{VidStyleODE}$, a spatiotemporally continuous disentangled $\textbf{Vid}$eo representation based upon $\textbf{Style}$GAN and Neural-$\textbf{ODE}$s. Effective traversal of the latent space learned by Generative Adversarial Networks (GANs) has been the basis for recent breakthroughs in image editing. However, the applicability of such advancements to the video domain has been hi…

    Submitted 12 April, 2023; originally announced April 2023.

    Journal ref: ICCV 2023

  14. arXiv:2304.03246  [pdf, other]

    cs.CV

    Inst-Inpaint: Instructing to Remove Objects with Diffusion Models

    Authors: Ahmet Burak Yildirim, Vedat Baday, Erkut Erdem, Aykut Erdem, Aysegul Dundar

    Abstract: Image inpainting task refers to erasing unwanted pixels from images and filling them in a semantically consistent and realistic way. Traditionally, the pixels that are wished to be erased are defined with binary masks. From the application point of view, a user needs to generate the masks for the objects they would like to remove which can be time-consuming and prone to errors. In this work, we ar…

    Submitted 9 August, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

  15. arXiv:2303.06907  [pdf, other]

    cs.CV eess.IV

    ST360IQ: No-Reference Omnidirectional Image Quality Assessment with Spherical Vision Transformers

    Authors: Nafiseh Jabbari Tofighi, Mohamed Hedi Elfkir, Nevrez Imamoglu, Cagri Ozcinar, Erkut Erdem, Aykut Erdem

    Abstract: Omnidirectional images, aka 360 images, can deliver immersive and interactive visual experiences. As their popularity has increased dramatically in recent years, evaluating the quality of 360 images has become a problem of interest since it provides insights for capturing, transmitting, and consuming this new media. However, directly adapting quality assessment methods proposed for standard natura…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  16. arXiv:2211.04576  [pdf, other]

    cs.CL cs.AI

    Detecting Euphemisms with Literal Descriptions and Visual Imagery

    Authors: İlker Kesen, Aykut Erdem, Erkut Erdem, Iacer Calixto

    Abstract: This paper describes our two-stage system for the Euphemism Detection shared task hosted by the 3rd Workshop on Figurative Language Processing in conjunction with EMNLP 2022. Euphemisms tone down expressions about sensitive or unpleasant issues like addiction and death. The ambiguous nature of euphemistic words or expressions makes it challenging to detect their actual meaning within a context. In…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 7 pages, 1 table, 1 figure. Accepted to the 3rd Workshop on Figurative Language Processing at EMNLP 2022. https://github.com/ilkerkesen/euphemism

  17. arXiv:2211.02980  [pdf, other]

    cs.CV

    Disentangling Content and Motion for Text-Based Neural Video Manipulation

    Authors: Levent Karacan, Tolga Kerimoğlu, İsmail İnan, Tolga Birdal, Erkut Erdem, Aykut Erdem

    Abstract: Giving machines the ability to imagine possible new objects or scenes from linguistic descriptions and produce their realistic renderings is arguably one of the most challenging problems in computer vision. Recent advances in deep generative models have led to new approaches that give promising results towards this goal. In this paper, we introduce a new method called DiCoMoGAN for manipulating vi…

    Submitted 5 November, 2022; originally announced November 2022.

  18. arXiv:2209.08564  [pdf, other]

    cs.CV cs.LG eess.IV eess.SP

    Perception-Distortion Trade-off in the SR Space Spanned by Flow Models

    Authors: Cansu Korkmaz, A. Murat Tekalp, Zafer Dogan, Erkut Erdem, Aykut Erdem

    Abstract: Flow-based generative super-resolution (SR) models learn to produce a diverse set of feasible SR solutions, called the SR space. Diversity of SR solutions increases with the temperature ($τ$) of latent variables, which introduces random variations of texture among sample solutions, resulting in visual artifacts and low fidelity. In this paper, we present a simple but effective image ensembling/fus…

    Submitted 18 September, 2022; originally announced September 2022.

    Comments: 5 pages, 4 figures, accepted for publication in IEEE ICIP 2022 Conference

  19. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  20. arXiv:2108.05165  [pdf, other]

    cs.AI

    Stable Marriage Problems with Ties and Incomplete Preferences: An Empirical Comparison of ASP, SAT, ILP, CP, and Local Search Methods

    Authors: Selin Eyupoglu, Muge Fidan, Yavuz Gulesen, Ilayda Begum Izci, Berkan Teber, Baturay Yilmaz, Ahmet Alkan, Esra Erdem

    Abstract: We study a variation of the Stable Marriage problem, where every man and every woman express their preferences as preference lists which may be incomplete and contain ties. This problem is called the Stable Marriage problem with Ties and Incomplete preferences (SMTI). We consider three optimization variants of SMTI, Max Cardinality, Sex-Equal and Egalitarian, and empirically compare the following…

    Submitted 17 August, 2021; v1 submitted 11 August, 2021; originally announced August 2021.

    Comments: This paper is under consideration for acceptance in Theory and Practice of Logic Programming (TPLP)

  21. arXiv:2108.04940  [pdf, other]

    cs.AI cs.GT cs.LO

    Knowledge-Based Stable Roommates Problem: A Real-World Application

    Authors: Muge Fidan, Esra Erdem

    Abstract: The Stable Roommates problem with Ties and Incomplete lists (SRTI) is a matching problem characterized by the preferences of agents over other agents as roommates, where the preferences may have ties or be incomplete. SRTI asks for a matching that is stable and, sometimes, optimizes a domain-independent fairness criterion (e.g., Egalitarian). However, in real-world applications (e.g., assigning st…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: This paper is under consideration for acceptance in Theory and Practice of Logic Programming (TPLP)

  22. arXiv:2108.02760  [pdf, other]

    cs.CV

    SLAMP: Stochastic Latent Appearance and Motion Prediction

    Authors: Adil Kaan Akan, Erkut Erdem, Aykut Erdem, Fatma Güney

    Abstract: Motion is an important cue for video prediction and often utilized by separating video content into static and dynamic components. Most of the previous work utilizing motion is deterministic but there are stochastic methods that can model the inherent uncertainty of the future. Existing stochastic models either do not reason about motion explicitly or make limiting assumptions about the static par…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  23. arXiv:2102.07682  [pdf, other]

    cs.CV

    A Gated Fusion Network for Dynamic Saliency Prediction

    Authors: Aysun Kocak, Erkut Erdem, Aykut Erdem

    Abstract: Predicting saliency in videos is a challenging problem due to complex modeling of interactions between spatial and temporal information, especially when ever-changing, dynamic nature of videos is considered. Recently, researchers have proposed large-scale datasets and models that take advantage of deep learning as a way to understand what's important for video saliency. These approaches, however,…

    Submitted 15 February, 2021; originally announced February 2021.

    Comments: Project page: https://hucvl.github.io/GFSalNet/

  24. Object and Relation Centric Representations for Push Effect Prediction

    Authors: Ahmet E. Tekden, Aykut Erdem, Erkut Erdem, Tamim Asfour, Emre Ugur

    Abstract: Pushing is an essential non-prehensile manipulation skill used for tasks ranging from pre-grasp manipulation to scene rearrangement, reasoning about object relations in the scene, and thus pushing actions have been widely studied in robotics. The effective use of pushing actions often requires an understanding of the dynamics of the manipulated objects and adaptation to the discrepancies between p…

    Submitted 22 February, 2023; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: Project Page: https://fzaero.github.io/push_learning/

  25. arXiv:2101.10044  [pdf, other]

    cs.CL cs.CV

    Cross-lingual Visual Pre-training for Multimodal Machine Translation

    Authors: Ozan Caglayan, Menekse Kuyu, Mustafa Sercan Amac, Pranava Madhyastha, Erkut Erdem, Aykut Erdem, Lucia Specia

    Abstract: Pre-trained language models have been shown to improve performance in many natural language tasks substantially. Although the early focus of such models was single language pre-training, recent advances have resulted in cross-lingual and visual pre-training methods. In this paper, we combine these two approaches to learn visually-grounded cross-lingual representations. Specifically, we extend the…

    Submitted 20 April, 2021; v1 submitted 25 January, 2021; originally announced January 2021.

    Comments: Accepted to EACL 2021 (Camera-ready version)

  26. arXiv:2012.10988  [pdf, other]

    cs.LG cs.AI stat.ML

    Post-hoc Uncertainty Calibration for Domain Drift Scenarios

    Authors: Christian Tomani, Sebastian Gruber, Muhammed Ebrar Erdem, Daniel Cremers, Florian Buettner

    Abstract: We address the problem of uncertainty calibration. While standard deep neural networks typically yield uncalibrated predictions, calibrated confidence scores that are representative of the true likelihood of a prediction can be achieved using post-hoc calibration methods. However, to date the focus of these approaches has been on in-domain calibration. Our contribution is two-fold. First, we show…

    Submitted 23 June, 2021; v1 submitted 20 December, 2020; originally announced December 2020.

    Comments: In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. Code available at https://github.com/tochris/calibration-domain-drift

  27. arXiv:2012.07098  [pdf, other]

    cs.CV

    MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish

    Authors: Begum Citamak, Ozan Caglayan, Menekse Kuyu, Erkut Erdem, Aykut Erdem, Pranava Madhyastha, Lucia Specia

    Abstract: Automatic generation of video descriptions in natural language, also called video captioning, aims to understand the visual content of the video and produce a natural language sentence depicting the objects and actions in the scene. This challenging integrated vision and language problem, however, has been predominantly addressed for English. The lack of data and the linguistic properties of other…

    Submitted 13 December, 2020; originally announced December 2020.

  28. arXiv:2012.04293  [pdf, other]

    cs.AI cs.CL cs.CV

    CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions

    Authors: Tayfun Ates, M. Samil Atesoglu, Cagatay Yigit, Ilker Kesen, Mert Kobas, Erkut Erdem, Aykut Erdem, Tilbe Goksun, Deniz Yuret

    Abstract: Humans are able to perceive, understand and reason about causal events. Developing models with similar physical and causal understanding capabilities is a long-standing goal of artificial intelligence. As a step towards this direction, we introduce CRAFT, a new video question answering dataset that requires causal reasoning about physical forces and object interactions. It contains 58K video and q…

    Submitted 1 March, 2022; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: Accepted to Findings of ACL 2022

  29. arXiv:2009.10249  [pdf, other]

    cs.AI cs.LO cs.MA cs.RO

    Dynamic Multi-Agent Path Finding based on Conflict Resolution using Answer Set Programming

    Authors: Basem Atiq, Volkan Patoglu, Esra Erdem

    Abstract: We study a dynamic version of multi-agent path finding problem (called D-MAPF) where existing agents may leave and new agents may join the team at different times. We introduce a new method to solve D-MAPF based on conflict-resolution. The idea is, when a set of new agents joins the team and there are conflicts, instead of replanning for the whole team, to replan only for a minimal subset of agent…

    Submitted 21 September, 2020; originally announced September 2020.

    Comments: In Proceedings ICLP 2020, arXiv:2009.09158

    Journal ref: EPTCS 325, 2020, pp. 223-229

  30. Solving Gossip Problems using Answer Set Programming: An Epistemic Planning Approach

    Authors: Esra Erdem, Andreas Herzig

    Abstract: We investigate the use of Answer Set Programming to solve variations of gossip problems, by modeling them as epistemic planning problems.

    Submitted 21 September, 2020; originally announced September 2020.

    Comments: In Proceedings ICLP 2020, arXiv:2009.09158

    Journal ref: EPTCS 325, 2020, pp. 52-58

  31. arXiv:2008.04126  [pdf, other]

    cs.AI cs.LO

    Reasoning about Cardinal Directions between 3-Dimensional Extended Objects using Answer Set Programming

    Authors: Yusuf Izmirlioglu, Esra Erdem

    Abstract: We propose a novel formal framework (called 3D-nCDC-ASP) to represent and reason about cardinal directions between extended objects in 3-dimensional (3D) space, using Answer Set Programming (ASP). 3D-nCDC-ASP extends Cardinal Directional Calculus (CDC) with a new type of default constraints, and nCDC-ASP to 3D. 3D-nCDC-ASP provides a flexible platform offering different types of reasoning: Nonmono…

    Submitted 10 August, 2020; originally announced August 2020.

    Comments: Paper presented at the 36th International Conference on Logic Programming (ICLP 2020), University Of Calabria, Rende (CS), Italy, September 2020, 29 pages, 6 figures

  32. arXiv:2008.03573  [pdf, other]

    cs.AI cs.LO cs.MA cs.RO

    Explanation Generation for Multi-Modal Multi-Agent Path Finding with Optimal Resource Utilization using Answer Set Programming

    Authors: Aysu Bogatarkan, Esra Erdem

    Abstract: The multi-agent path finding (MAPF) problem is a combinatorial search problem that aims at finding paths for multiple agents (e.g., robots) in an environment (e.g., an autonomous warehouse) such that no two agents collide with each other, and subject to some constraints on the lengths of paths. We consider a general version of MAPF, called mMAPF, that involves multi-modal transportation modes (e.g…

    Submitted 8 August, 2020; originally announced August 2020.

    Comments: Paper presented at the 36th International Conference on Logic Programming (ICLP 2020), University Of Calabria, Rende (CS), Italy, September 2020, 16 pages, 6 figures

  33. arXiv:2008.03496  [pdf, other]

    cs.AI cs.LO cs.RO

    Human Robot Collaborative Assembly Planning: An Answer Set Programming Approach

    Authors: Momina Rizwan, Volkan Patoglu, Esra Erdem

    Abstract: For planning an assembly of a product from a given set of parts, robots necessitate certain cognitive skills: high-level planning is needed to decide the order of actuation actions, while geometric reasoning is needed to check the feasibility of these actions. For collaborative assembly tasks with humans, robots require further cognitive capabilities, such as commonsense reasoning, sensing, and co…

    Submitted 8 August, 2020; originally announced August 2020.

    Comments: 36th International Conference on Logic Programming (ICLP 2020), University Of Calabria, Rende (CS), Italy, September 2020, 15 pages

  34. arXiv:2008.03050  [pdf, other]

    cs.AI cs.GT cs.LO

    A General Framework for Stable Roommates Problems using Answer Set Programming

    Authors: Esra Erdem, Muge Fidan, David Manlove, Patrick Prosser

    Abstract: The Stable Roommates problem (SR) is characterized by the preferences of agents over other agents as roommates: each agent ranks all others in strict order of preference. A solution to SR is then a partition of the agents into pairs so that each pair shares a room, and there is no pair of agents that would block this matching (i.e., who prefers the other to their roommate in the matching). There a…

    Submitted 7 August, 2020; originally announced August 2020.

    Comments: Paper presented at the 36th International Conference on Logic Programming (ICLP 2020), University Of Calabria, Rende (CS), Italy, September 2020, 16 pages, 1 figure

  35. Burst Photography for Learning to Enhance Extremely Dark Images

    Authors: Ahmet Serdar Karadeniz, Erkut Erdem, Aykut Erdem

    Abstract: Capturing images under extremely low-light conditions poses significant challenges for the standard camera pipeline. Images become too dark and too noisy, which makes traditional enhancement techniques almost impossible to apply. Recently, learning-based approaches have shown very promising results for this task since they have substantially more expressive capabilities to allow for improved quali…

    Submitted 19 November, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: Published in IEEE Transactions on Image Processing

  36. arXiv:2003.12739  [pdf, other]

    cs.CV cs.CL cs.LG

    Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters

    Authors: İlker Kesen, Ozan Arkan Can, Erkut Erdem, Aykut Erdem, Deniz Yuret

    Abstract: How to best integrate linguistic and perceptual processing in multi-modal tasks that involve language and vision is an important open problem. In this work, we argue that the common practice of using language in a top-down manner, to direct visual attention over high-level visual features, may not be optimal. We hypothesize that the use of language to also condition the bottom-up processing from p…

    Submitted 23 June, 2022; v1 submitted 28 March, 2020; originally announced March 2020.

    Comments: 13 pages, 6 figures, 6 tables. Appeared in MULA Workshop at CVPR 2022

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 4610-4620

  37. arXiv:2003.07823   

    cs.CV

    Burst Denoising of Dark Images

    Authors: Ahmet Serdar Karadeniz, Erkut Erdem, Aykut Erdem

    Abstract: Capturing images under extremely low-light conditions poses significant challenges for the standard camera pipeline. Images become too dark and too noisy, which makes traditional image enhancement techniques almost impossible to apply. Very recently, researchers have shown promising results using learning based approaches. Motivated by these ideas, in this paper, we propose a deep learning framewo…

    Submitted 18 June, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

    Comments: This paper has been withdrawn by the authors to be replaced by a new version available at arXiv:2006.09845

  38. arXiv:1909.11504  [pdf, other]

    eess.IV cs.CV

    mustGAN: Multi-Stream Generative Adversarial Networks for MR Image Synthesis

    Authors: Mahmut Yurt, Salman Ul Hassan Dar, Aykut Erdem, Erkut Erdem, Tolga Çukur

    Abstract: Multi-contrast MRI protocols increase the level of morphological information available for diagnosis. Yet, the number and quality of contrasts is limited in practice by various factors including scan time and patient motion. Synthesis of missing or corrupted contrasts can alleviate this limitation to improve clinical utility. Common approaches for multi-contrast MRI involve either one-to-one and m…

    Submitted 25 September, 2019; originally announced September 2019.

  39. arXiv:1909.08859  [pdf, other]

    cs.CL cs.CV

    Procedural Reasoning Networks for Understanding Multimodal Procedures

    Authors: Mustafa Sercan Amac, Semih Yagcioglu, Aykut Erdem, Erkut Erdem

    Abstract: This paper addresses the problem of comprehending procedural commonsense knowledge. This is a challenging task as it requires identifying key entities, keeping track of their state changes, and understanding temporal and causal relations. Contrary to most of the previous work, in this study, we do not rely on strong inductive bias and explore the question of how multimodality can be exploited to p…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: Accepted to CoNLL 2019. The project website with code and demo is available at https://hucvl.github.io/prn/

  40. arXiv:1909.07646   

    cs.LO cs.AI cs.PL

    Proceedings 35th International Conference on Logic Programming (Technical Communications)

    Authors: Bart Bogaerts, Esra Erdem, Paul Fodor, Andrea Formisano, Giovambattista Ianni, Daniela Inclezan, German Vidal, Alicia Villanueva, Marina De Vos, Fangkai Yang

    Abstract: Since the first conference held in Marseille in 1982, ICLP has been the premier international event for presenting research in logic programming. Contributions are sought in all areas of logic programming, including but not restricted to: Foundations: Semantics, Formalisms, Nonmonotonic reasoning, Knowledge representation. Languages: Concurrency, Objects, Coordination, Mobility, Higher Order,…

    Submitted 17 September, 2019; originally announced September 2019.

    Journal ref: EPTCS 306, 2019

  41. arXiv:1909.03785  [pdf, other]

    cs.RO

    Belief Regulated Dual Propagation Nets for Learning Action Effects on Groups of Articulated Objects

    Authors: Ahmet E. Tekden, Aykut Erdem, Erkut Erdem, Mert Imre, M. Yunus Seker, Emre Ugur

    Abstract: In recent years, graph neural networks have been successfully applied for learning the dynamics of complex and partially observable physical systems. However, their use in the robotics domain is, to date, still limited. In this paper, we introduce Belief Regulated Dual Propagation Networks (BRDPN), a general-purpose learnable physics engine, which enables a robot to predict the effects of its acti…

    Submitted 16 March, 2020; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: Accepted to ICRA 2020. Project page: https://fzaero.github.io/BRDPN/ , Video: https://youtu.be/uWPr7IFT_9k

  42. arXiv:1908.03719  [pdf, ps, other]

    cs.LO cs.AI cs.PL

    Introduction to the 35th International Conference on Logic Programming Special Issue

    Authors: Esra Erdem, Andrea Formisano, German Vidal, Fangkai Yang

    Abstract: We are proud to introduce this special issue of Theory and Practice of Logic Programming (TPLP), dedicated to the regular papers accepted for the 35th International Conference on Logic Programming (ICLP). The ICLP meetings started in Marseille in 1982 and since then constitute the main venue for presenting and discussing work in the area of logic programming. Under consideration for acceptance in…

    Submitted 10 August, 2019; originally announced August 2019.

    Comments: The 35th International Conference on Logic Programming (ICLP 2019), Las Cruces, New Mexico, USA, September 20–25, 2019. 7 pages

  43. arXiv:1906.08494  [pdf, other]

    cs.RO cs.AI

    Object Placement on Cluttered Surfaces: A Nested Local Search Approach

    Authors: Abdul Rahman Dabbour, Esra Erdem, Volkan Patoglu

    Abstract: For planning rearrangements of objects in clutter, it is required to know the goal configuration of the objects. However, in real-life scenarios, this information is not available most of the time. We introduce a novel method that computes a collision-free placement of objects on a cluttered surface, while minimizing the total number and amount of displacements of the existing moveable objects.…

    Submitted 20 June, 2019; originally announced June 2019.

  44. arXiv:1903.00745  [pdf, other]

    cs.AI cs.LO cs.RO

    A Formal Framework for Robot Construction Problems: A Hybrid Planning Approach

    Authors: Faseeh Ahmad, Esra Erdem, Volkan Patoglu

    Abstract: We study robot construction problems where multiple autonomous robots rearrange stacks of prefabricated blocks to build stable structures. These problems are challenging due to ramifications of actions, true concurrency, and requirements of supportedness of blocks by other blocks and stability of the structure at all times. We propose a formal hybrid planning framework to solve a wide range of rob…

    Submitted 17 March, 2019; v1 submitted 2 March, 2019; originally announced March 2019.

    Comments: 8 pages (double-column), 7 figures

  45. arXiv:1809.00812  [pdf, other]

    cs.CL cs.CV

    RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes

    Authors: Semih Yagcioglu, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis

    Abstract: Understanding and reasoning about cooking recipes is a fruitful research direction towards enabling machines to interpret procedural text. In this work, we introduce RecipeQA, a dataset for multimodal comprehension of cooking recipes. It comprises approximately 20K instructional recipes with multiple modalities such as titles, descriptions and aligned sets of images. With over 36K automatically…

    Submitted 4 September, 2018; originally announced September 2018.

    Comments: EMNLP 2018

  46. arXiv:1808.07413  [pdf, other]

    cs.CV

    Manipulating Attributes of Natural Scenes via Hallucination

    Authors: Levent Karacan, Zeynep Akata, Aykut Erdem, Erkut Erdem

    Abstract: In this study, we explore building a two-stage framework for enabling users to directly manipulate high-level attributes of a natural scene. The key to our approach is a deep generative network which can hallucinate images of a scene as if they were taken in a different season (e.g. during winter), weather condition (e.g. on a cloudy day) or time of the day (e.g. at sunset). Once the scene is hall…

    Submitted 9 October, 2019; v1 submitted 22 August, 2018; originally announced August 2018.

    Comments: Accepted for publication in ACM Transactions on Graphics

  47. arXiv:1808.04000  [pdf, other]

    cs.CV

    Language Guided Fashion Image Manipulation with Feature-wise Transformations

    Authors: Mehmet Günel, Erkut Erdem, Aykut Erdem

    Abstract: Developing techniques for editing an outfit image through natural sentences and accordingly generating new outfits has promising applications for art, fashion and design. However, it is considered a challenging task since image manipulation should be carried out only on the relevant parts of the image while keeping the remaining sections untouched. Moreover, this manipulation process…

    Submitted 12 August, 2018; originally announced August 2018.

    Comments: Accepted to ECCV 2018, First Workshop on Computer Vision For Fashion, Art and Design (extended version)

  48. arXiv:1802.01221  [pdf]

    cs.CV

    Image Synthesis in Multi-Contrast MRI with Conditional Generative Adversarial Networks

    Authors: Salman Ul Hassan Dar, Mahmut Yurt, Levent Karacan, Aykut Erdem, Erkut Erdem, Tolga Çukur

    Abstract: Acquiring images of the same anatomy with multiple different contrasts increases the diversity of diagnostic information available in an MR exam. Yet, scan time limitations may prohibit acquisition of certain contrasts, and images for some contrast may be corrupted by noise and artifacts. In such cases, the ability to synthesize unacquired or corrupted contrasts from remaining contrasts can improv…

    Submitted 4 February, 2018; originally announced February 2018.

  49. arXiv:1707.05904  [pdf, other]

    cs.AI cs.LO cs.RO

    Hybrid Conditional Planning using Answer Set Programming

    Authors: Ibrahim Faruk Yalciner, Ahmed Nouman, Volkan Patoglu, Esra Erdem

    Abstract: We introduce a parallel offline algorithm for computing hybrid conditional plans, called HCP-ASP, oriented towards robotics applications. HCP-ASP relies on modeling actuation actions and sensing actions in an expressive nonmonotonic language of answer set programming (ASP), and computation of the branches of a conditional plan in parallel using an ASP solver. In particular, thanks to external atom…

    Submitted 18 July, 2017; originally announced July 2017.

    Comments: Paper presented at the 33rd International Conference on Logic Programming (ICLP 2017), Melbourne, Australia, August 28 to September 1, 2017; 28 pages, 3 figures (arXiv:YYMM.NNNNN)

  50. arXiv:1612.07600  [pdf, other]

    cs.CL cs.CV

    Re-evaluating Automatic Metrics for Image Captioning

    Authors: Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, Erkut Erdem

    Abstract: The task of generating natural language descriptions from images has received a lot of attention in recent years. Consequently, it is becoming increasingly important to evaluate such image captioning approaches in an automatic manner. In this paper, we provide an in-depth evaluation of the existing image captioning metrics through a series of carefully designed experiments. Moreover, we explore th…

    Submitted 22 December, 2016; originally announced December 2016.