-
Context Matters: Leveraging Spatiotemporal Metadata for Semi-Supervised Learning on Remote Sensing Images
Authors:
Maximilian Bernhard,
Tanveer Hannan,
Niklas Strauß,
Matthias Schubert
Abstract:
Remote sensing projects typically generate large amounts of imagery that can be used to train powerful deep neural networks. However, the amount of labeled images is often small, as remote sensing applications generally require expert labelers. Thus, semi-supervised learning (SSL), i.e., learning with a small pool of labeled and a larger pool of unlabeled data, is particularly useful in this domai…
▽ More
Remote sensing projects typically generate large amounts of imagery that can be used to train powerful deep neural networks. However, the amount of labeled images is often small, as remote sensing applications generally require expert labelers. Thus, semi-supervised learning (SSL), i.e., learning with a small pool of labeled and a larger pool of unlabeled data, is particularly useful in this domain. Current SSL approaches generate pseudo-labels from model predictions for unlabeled samples. As the quality of these pseudo-labels is crucial for performance, utilizing additional information to improve pseudo-label quality yields a promising direction. For remote sensing images, geolocation and recording time are generally available and provide a valuable source of information as semantic concepts, such as land cover, are highly dependent on spatiotemporal context, e.g., due to seasonal effects and vegetation zones. In this paper, we propose to exploit spatiotemporal metainformation in SSL to improve the quality of pseudo-labels and, therefore, the final model performance. We show that directly adding the available metadata to the input of the predictor at test time degenerates the prediction quality for metadata outside the spatiotemporal distribution of the training set. Thus, we propose a teacher-student SSL framework where only the teacher network uses metainformation to improve the quality of pseudo-labels on the training set. Correspondingly, our student network benefits from the improved pseudo-labels but does not receive metadata as input, making it invariant to spatiotemporal shifts at test time. Furthermore, we propose methods for encoding and injecting spatiotemporal information into the model and introduce a novel distillation mechanism to enhance the knowledge transfer between teacher and student. Our framework dubbed Spatiotemporal SSL can be easily combined with several stat...
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
A Time-Inhomogeneous Markov Model for Resource Availability under Sparse Observations
Authors:
Lukas Rottkamp,
Matthias Schubert
Abstract:
Accurate spatio-temporal information about the current situation is crucial for smart city applications such as modern routing algorithms. Often, this information describes the state of stationary resources, e.g. the availability of parking bays, charging stations or the amount of people waiting for a vehicle to pick them up near a given location. To exploit this kind of information, predicting fu…
▽ More
Accurate spatio-temporal information about the current situation is crucial for smart city applications such as modern routing algorithms. Often, this information describes the state of stationary resources, e.g. the availability of parking bays, charging stations or the amount of people waiting for a vehicle to pick them up near a given location. To exploit this kind of information, predicting future states of the monitored resources is often mandatory because a resource might change its state within the time until it is needed. To train an accurate predictive model, it is often not possible to obtain a continuous time series on the state of the resource. For example, the information might be collected from traveling agents visiting the resource with an irregular frequency. Thus, it is necessary to develop methods which work on sparse observations for training and prediction. In this paper, we propose time-inhomogeneous discrete Markov models to allow accurate prediction even when the frequency of observation is very rare. Our new model is able to blend recent observations with historic data and also provide useful probabilistic estimates for future states. Since resources availability in a city is typically time-dependent, our Markov model is time-inhomogeneous and cyclic within a predefined time interval. To train our model, we propose a modified Baum-Welch algorithm. Evaluations on real-world datasets of parking bay availability show that our new method indeed yields good results compared to methods being trained on complete data and non-cyclic variants.
△ Less
Submitted 18 April, 2024;
originally announced April 2024.
-
Simplex Decomposition for Portfolio Allocation Constraints in Reinforcement Learning
Authors:
David Winkel,
Niklas Strauß,
Matthias Schubert,
Thomas Seidl
Abstract:
Portfolio optimization tasks describe sequential decision problems in which the investor's wealth is distributed across a set of assets. Allocation constraints are used to enforce minimal or maximal investments into particular subsets of assets to control for objectives such as limiting the portfolio's exposure to a certain sector due to environmental concerns. Although methods for constrained Rei…
▽ More
Portfolio optimization tasks describe sequential decision problems in which the investor's wealth is distributed across a set of assets. Allocation constraints are used to enforce minimal or maximal investments into particular subsets of assets to control for objectives such as limiting the portfolio's exposure to a certain sector due to environmental concerns. Although methods for constrained Reinforcement Learning (CRL) can optimize policies while considering allocation constraints, it can be observed that these general methods yield suboptimal results. In this paper, we propose a novel approach to handle allocation constraints based on a decomposition of the constraint action space into a set of unconstrained allocation problems. In particular, we examine this approach for the case of two constraints. For example, an investor may wish to invest at least a certain percentage of the portfolio into green technologies while limiting the investment in the fossil energy sector. We show that the action space of the task is equivalent to the decomposed action space, and introduce a new reinforcement learning (RL) approach CAOSD, which is built on top of the decomposition. The experimental evaluation on real-world Nasdaq-100 data demonstrates that our approach consistently outperforms state-of-the-art CRL benchmarks for portfolio optimization.
△ Less
Submitted 16 April, 2024;
originally announced April 2024.
-
Efficient Parking Search using Shared Fleet Data
Authors:
Niklas Strauß,
Lukas Rottkamp,
Sebatian Schmoll,
Matthias Schubert
Abstract:
Finding an available on-street parking spot is a relevant problem of day-to-day life. In recent years, cities such as Melbourne and San Francisco deployed sensors that provide real-time information about the occupation of parking spots. Finding a free parking spot in such a smart environment can be modeled and solved as a Markov decision process (MDP). The problem has to consider uncertainty as av…
▽ More
Finding an available on-street parking spot is a relevant problem of day-to-day life. In recent years, cities such as Melbourne and San Francisco deployed sensors that provide real-time information about the occupation of parking spots. Finding a free parking spot in such a smart environment can be modeled and solved as a Markov decision process (MDP). The problem has to consider uncertainty as available parking spots might not remain available until arrival due to other vehicles also claiming spots in the meantime. Knowing the parking intention of every vehicle in the environment would eliminate this uncertainty. Unfortunately, it does currently not seem realistic to have such data from all vehicles. In contrast, acquiring data from a subset of vehicles or a vehicle fleet appears feasible and has the potential to reduce uncertainty.
In this paper, we examine the question of how useful sharing data within a vehicle fleet might be for the search times of particular drivers. We use fleet data to better estimate the availability of parking spots at arrival. Since optimal solutions for large scenarios are infeasible, we base our method on approximate solutions, which have been shown to perform well in single-agent settings. Our experiments are conducted on a simulation using real-world and synthetic data from the city of Melbourne. The results indicate that fleet data can significantly reduce search times for an available parking spot.
△ Less
Submitted 16 April, 2024;
originally announced April 2024.
-
User-Centric Cell-Free Wireless Networks for 6G: Communication Theoretic Models and Research Challenges
Authors:
Fabian Göttsch,
Giuseppe Caire,
Wen Xu,
Martin Schubert
Abstract:
This paper presents a comprehensive communication theoretic model for the physical layer of a cell-free user-centric network, formed by user equipments (UEs), radio units (RUs), and decentralized units (DUs), uniformly spatially distributed over a given coverage area. We consider RUs equipped with multiple antennas, and focus on the regime where the UE, RU, and DU densities are constant and theref…
▽ More
This paper presents a comprehensive communication theoretic model for the physical layer of a cell-free user-centric network, formed by user equipments (UEs), radio units (RUs), and decentralized units (DUs), uniformly spatially distributed over a given coverage area. We consider RUs equipped with multiple antennas, and focus on the regime where the UE, RU, and DU densities are constant and therefore the number of such nodes grows with the coverage area. A system is said scalable if the computing load and information rate at any node in the network converges to a constant as the network size (coverage area) grows to infinity. This imposes that each UE must be processed by a (user-centric) finite-size cluster of RUs, and that such cluster processors are dynamically allocated to the DUs (e.g., as software defined virtual network functions) in order to achieve a balanced computation load. We also assume that the RUs are connected to the DUs through a packet switching network, in order to achieve adaptive routing and load balance. For this model, we define in details the dynamic cluster formation and uplink pilot allocation. As a consequence of the pilot allocation and the scalability constraint, each cluster processor has a partial view of the network channel state information. We define the condition of ``ideal partial CSI'' when the channel vectors that can be estimated are perfectly known (while the ones that cannot be estimated are not know at all). We develop two attractive cluster-based linear receiver schemes for the uplink, and an uplink-downlink duality that allows to reuse such vectors as precoders for the downlink.
△ Less
Submitted 12 January, 2024;
originally announced January 2024.
-
Spatial-Aware Deep Reinforcement Learning for the Traveling Officer Problem
Authors:
Niklas Strauß,
Matthias Schubert
Abstract:
The traveling officer problem (TOP) is a challenging stochastic optimization task. In this problem, a parking officer is guided through a city equipped with parking sensors to fine as many parking offenders as possible. A major challenge in TOP is the dynamic nature of parking offenses, which randomly appear and disappear after some time, regardless of whether they have been fined. Thus, solutions…
▽ More
The traveling officer problem (TOP) is a challenging stochastic optimization task. In this problem, a parking officer is guided through a city equipped with parking sensors to fine as many parking offenders as possible. A major challenge in TOP is the dynamic nature of parking offenses, which randomly appear and disappear after some time, regardless of whether they have been fined. Thus, solutions need to dynamically adjust to currently fineable parking offenses while also planning ahead to increase the likelihood that the officer arrives during the offense taking place. Though various solutions exist, these methods often struggle to take the implications of actions on the ability to fine future parking violations into account. This paper proposes SATOP, a novel spatial-aware deep reinforcement learning approach for TOP. Our novel state encoder creates a representation of each action, leveraging the spatial relationships between parking spots, the agent, and the action. Furthermore, we propose a novel message-passing module for learning future inter-action correlations in the given environment. Thus, the agent can estimate the potential to fine further parking violations after executing an action. We evaluate our method using an environment based on real-world data from Melbourne. Our results show that SATOP consistently outperforms state-of-the-art TOP agents and is able to fine up to 22% more parking offenses.
△ Less
Submitted 11 January, 2024;
originally announced January 2024.
-
Deep Active Learning with Noisy Oracle in Object Detection
Authors:
Marius Schubert,
Tobias Riedlinger,
Karsten Kahl,
Matthias Rottmann
Abstract:
Obtaining annotations for complex computer vision tasks such as object detection is an expensive and time-intense endeavor involving a large number of human workers or expert opinions. Reducing the amount of annotations required while maintaining algorithm performance is, therefore, desirable for machine learning practitioners and has been successfully achieved by active learning algorithms. Howev…
▽ More
Obtaining annotations for complex computer vision tasks such as object detection is an expensive and time-intense endeavor involving a large number of human workers or expert opinions. Reducing the amount of annotations required while maintaining algorithm performance is, therefore, desirable for machine learning practitioners and has been successfully achieved by active learning algorithms. However, it is not merely the amount of annotations which influences model performance but also the annotation quality. In practice, the oracles that are queried for new annotations frequently contain significant amounts of noise. Therefore, cleansing procedures are oftentimes necessary to review and correct given labels. This process is subject to the same budget as the initial annotation itself since it requires human workers or even domain experts. Here, we propose a composite active learning framework including a label review module for deep object detection. We show that utilizing part of the annotation budget to correct the noisy annotations partially in the active dataset leads to early improvements in model performance, especially when coupled with uncertainty-based query strategies. The precision of the label error proposals has a significant influence on the measured effect of the label review. In our experiments we achieve improvements of up to 4.5 mAP points of object detection performance by incorporating label reviews at equal annotation budget.
△ Less
Submitted 30 September, 2023;
originally announced October 2023.
-
LMD: Light-weight Prediction Quality Estimation for Object Detection in Lidar Point Clouds
Authors:
Tobias Riedlinger,
Marius Schubert,
Sarina Penquitt,
Jan-Marcel Kezmann,
Pascal Colling,
Karsten Kahl,
Lutz Roese-Koerner,
Michael Arnold,
Urs Zimmermann,
Matthias Rottmann
Abstract:
Object detection on Lidar point cloud data is a promising technology for autonomous driving and robotics which has seen a significant rise in performance and accuracy during recent years. Particularly uncertainty estimation is a crucial component for down-stream tasks and deep neural networks remain error-prone even for predictions with high confidence. Previously proposed methods for quantifying…
▽ More
Object detection on Lidar point cloud data is a promising technology for autonomous driving and robotics which has seen a significant rise in performance and accuracy during recent years. Particularly uncertainty estimation is a crucial component for down-stream tasks and deep neural networks remain error-prone even for predictions with high confidence. Previously proposed methods for quantifying prediction uncertainty tend to alter the training scheme of the detector or rely on prediction sampling which results in vastly increased inference time. In order to address these two issues, we propose LidarMetaDetect (LMD), a light-weight post-processing scheme for prediction quality estimation. Our method can easily be added to any pre-trained Lidar object detector without altering anything about the base model and is purely based on post-processing, therefore, only leading to a negligible computational overhead. Our experiments show a significant increase of statistical reliability in separating true from false predictions. We propose and evaluate an additional application of our method leading to the detection of annotation errors. Explicit samples and a conservative count of annotation error proposals indicates the viability of our method for large-scale datasets like KITTI and nuScenes. On the widely-used nuScenes test dataset, 43 out of the top 100 proposals of our method indicate, in fact, erroneous annotations.
△ Less
Submitted 15 June, 2023; v1 submitted 13 June, 2023;
originally announced June 2023.
-
GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation
Authors:
Tanveer Hannan,
Rajat Koner,
Maximilian Bernhard,
Suprosanna Shit,
Bjoern Menze,
Volker Tresp,
Matthias Schubert,
Thomas Seidl
Abstract:
Recent trends in Video Instance Segmentation (VIS) have seen a growing reliance on online methods to model complex and lengthy video sequences. However, the degradation of representation and noise accumulation of the online methods, especially during occlusion and abrupt changes, pose substantial challenges. Transformer-based query propagation provides promising directions at the cost of quadratic…
▽ More
Recent trends in Video Instance Segmentation (VIS) have seen a growing reliance on online methods to model complex and lengthy video sequences. However, the degradation of representation and noise accumulation of the online methods, especially during occlusion and abrupt changes, pose substantial challenges. Transformer-based query propagation provides promising directions at the cost of quadratic memory attention. However, they are susceptible to the degradation of instance features due to the above-mentioned challenges and suffer from cascading effects. The detection and rectification of such errors remain largely underexplored. To this end, we introduce \textbf{GRAtt-VIS}, \textbf{G}ated \textbf{R}esidual \textbf{Att}ention for \textbf{V}ideo \textbf{I}nstance \textbf{S}egmentation. Firstly, we leverage a Gumbel-Softmax-based gate to detect possible errors in the current frame. Next, based on the gate activation, we rectify degraded features from its past representation. Such a residual configuration alleviates the need for dedicated memory and provides a continuous stream of relevant instance features. Secondly, we propose a novel inter-instance interaction using gate activation as a mask for self-attention. This masking strategy dynamically restricts the unrepresentative instance queries in the self-attention and preserves vital information for long-term tracking. We refer to this novel combination of Gated Residual Connection and Masked Self-Attention as \textbf{GRAtt} block, which can easily be integrated into the existing propagation-based framework. Further, GRAtt blocks significantly reduce the attention overhead and simplify dynamic temporal modeling. GRAtt-VIS achieves state-of-the-art performance on YouTube-VIS and the highly challenging OVIS dataset, significantly improving over previous methods. Code is available at \url{https://github.com/Tanveer81/GRAttVIS}.
△ Less
Submitted 26 May, 2023;
originally announced May 2023.
-
MapFormer: Boosting Change Detection by Using Pre-change Information
Authors:
Maximilian Bernhard,
Niklas Strauß,
Matthias Schubert
Abstract:
Change detection in remote sensing imagery is essential for a variety of applications such as urban planning, disaster management, and climate research. However, existing methods for identifying semantically changed areas overlook the availability of semantic information in the form of existing maps describing features of the earth's surface. In this paper, we leverage this information for change…
▽ More
Change detection in remote sensing imagery is essential for a variety of applications such as urban planning, disaster management, and climate research. However, existing methods for identifying semantically changed areas overlook the availability of semantic information in the form of existing maps describing features of the earth's surface. In this paper, we leverage this information for change detection in bi-temporal images. We show that the simple integration of the additional information via concatenation of latent representations suffices to significantly outperform state-of-the-art change detection methods. Motivated by this observation, we propose the new task of *Conditional Change Detection*, where pre-change semantic information is used as input next to bi-temporal images. To fully exploit the extra information, we propose *MapFormer*, a novel architecture based on a multi-modal feature fusion module that allows for feature processing conditioned on the available semantic information. We further employ a supervised, cross-modal contrastive loss to guide the learning of visual representations. Our approach outperforms existing change detection methods by an absolute 11.7\% and 18.4\% in terms of binary change IoU on DynamicEarthNet and HRSCD, respectively. Furthermore, we demonstrate the robustness of our approach to the quality of the pre-change semantic information and the absence pre-change imagery. The code is available at https://github.com/mxbh/mapformer.
△ Less
Submitted 7 December, 2023; v1 submitted 31 March, 2023;
originally announced March 2023.
-
Identifying Label Errors in Object Detection Datasets by Loss Inspection
Authors:
Marius Schubert,
Tobias Riedlinger,
Karsten Kahl,
Daniel Kröll,
Sebastian Schoenen,
Siniša Šegvić,
Matthias Rottmann
Abstract:
Labeling datasets for supervised object detection is a dull and time-consuming task. Errors can be easily introduced during annotation and overlooked during review, yielding inaccurate benchmarks and performance degradation of deep neural networks trained on noisy labels. In this work, we for the first time introduce a benchmark for label error detection methods on object detection datasets as wel…
▽ More
Labeling datasets for supervised object detection is a dull and time-consuming task. Errors can be easily introduced during annotation and overlooked during review, yielding inaccurate benchmarks and performance degradation of deep neural networks trained on noisy labels. In this work, we for the first time introduce a benchmark for label error detection methods on object detection datasets as well as a label error detection method and a number of baselines. We simulate four different types of randomly introduced label errors on train and test sets of well-labeled object detection datasets. For our label error detection method we assume a two-stage object detector to be given and consider the sum of both stages' classification and regression losses. The losses are computed with respect to the predictions and the noisy labels including simulated label errors, aiming at detecting the latter. We compare our method to three baselines: a naive one without deep learning, the object detector's score and the entropy of the classification softmax distribution. We outperform all baselines and demonstrate that among the considered methods, ours is the only one that detects label errors of all four types efficiently. Furthermore, we detect real label errors a) on commonly used test datasets in object detection and b) on a proprietary dataset. In both cases we achieve low false positives rates, i.e., we detect label errors with a precision for a) of up to 71.5% and for b) with 97%.
△ Less
Submitted 19 December, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Towards Causal Credit Assignment
Authors:
Mátyás Schubert
Abstract:
Adequately assigning credit to actions for future outcomes based on their contributions is a long-standing open challenge in Reinforcement Learning. The assumptions of the most commonly used credit assignment method are disadvantageous in tasks where the effects of decisions are not immediately evident. Furthermore, this method can only evaluate actions that have been selected by the agent, making…
▽ More
Adequately assigning credit to actions for future outcomes based on their contributions is a long-standing open challenge in Reinforcement Learning. The assumptions of the most commonly used credit assignment method are disadvantageous in tasks where the effects of decisions are not immediately evident. Furthermore, this method can only evaluate actions that have been selected by the agent, making it highly inefficient. Still, no alternative methods have been widely adopted in the field. Hindsight Credit Assignment is a promising, but still unexplored candidate, which aims to solve the problems of both long-term and counterfactual credit assignment. In this thesis, we empirically investigate Hindsight Credit Assignment to identify its main benefits, and key points to improve. Then, we apply it to factored state representations, and in particular to state representations based on the causal structure of the environment. In this setting, we propose a variant of Hindsight Credit Assignment that effectively exploits a given causal structure. We show that our modification greatly decreases the workload of Hindsight Credit Assignment, making it more efficient and enabling it to outperform the baseline credit assignment method on various tasks. This opens the way to other methods based on given or learned causal structures.
△ Less
Submitted 17 May, 2023; v1 submitted 22 December, 2022;
originally announced December 2022.
-
Towards Rapid Prototyping and Comparability in Active Learning for Deep Object Detection
Authors:
Tobias Riedlinger,
Marius Schubert,
Karsten Kahl,
Hanno Gottschalk,
Matthias Rottmann
Abstract:
Active learning as a paradigm in deep learning is especially important in applications involving intricate perception tasks such as object detection where labels are difficult and expensive to acquire. Development of active learning methods in such fields is highly computationally expensive and time consuming which obstructs the progression of research and leads to a lack of comparability between…
▽ More
Active learning as a paradigm in deep learning is especially important in applications involving intricate perception tasks such as object detection where labels are difficult and expensive to acquire. Development of active learning methods in such fields is highly computationally expensive and time consuming which obstructs the progression of research and leads to a lack of comparability between methods. In this work, we propose and investigate a sandbox setup for rapid development and transparent evaluation of active learning in deep object detection. Our experiments with commonly used configurations of datasets and detection architectures found in the literature show that results obtained in our sandbox environment are representative of results on standard configurations. The total compute time to obtain results and assess the learning behavior can thereby be reduced by factors of up to 14 when comparing with Pascal VOC and up to 32 when comparing with BDD100k. This allows for testing and evaluating data acquisition and labeling strategies in under half a day and contributes to the transparency and development speed in the field of active learning for object detection.
△ Less
Submitted 21 December, 2022;
originally announced December 2022.
-
Robust Object Detection in Remote Sensing Imagery with Noisy and Sparse Geo-Annotations (Full Version)
Authors:
Maximilian Bernhard,
Matthias Schubert
Abstract:
Recently, the availability of remote sensing imagery from aerial vehicles and satellites constantly improved. For an automated interpretation of such data, deep-learning-based object detectors achieve state-of-the-art performance. However, established object detectors require complete, precise, and correct bounding box annotations for training. In order to create the necessary training annotations…
▽ More
Recently, the availability of remote sensing imagery from aerial vehicles and satellites constantly improved. For an automated interpretation of such data, deep-learning-based object detectors achieve state-of-the-art performance. However, established object detectors require complete, precise, and correct bounding box annotations for training. In order to create the necessary training annotations for object detectors, imagery can be georeferenced and combined with data from other sources, such as points of interest localized by GPS sensors. Unfortunately, this combination often leads to poor object localization and missing annotations. Therefore, training object detectors with such data often results in insufficient detection performance. In this paper, we present a novel approach for training object detectors with extremely noisy and incomplete annotations. Our method is based on a teacher-student learning framework and a correction module accounting for imprecise and missing annotations. Thus, our method is easy to use and can be combined with arbitrary object detectors. We demonstrate that our approach improves standard detectors by 37.1\% $AP_{50}$ on a noisy real-world remote-sensing dataset. Furthermore, our method achieves great performance gains on two datasets with synthetic noise. Code is available at \url{https://github.com/mxbh/robust_object_detection}.
△ Less
Submitted 24 October, 2022;
originally announced October 2022.
-
SurCo: Learning Linear Surrogates For Combinatorial Nonlinear Optimization Problems
Authors:
Aaron Ferber,
Taoan Huang,
Daochen Zha,
Martin Schubert,
Benoit Steiner,
Bistra Dilkina,
Yuandong Tian
Abstract:
Optimization problems with nonlinear cost functions and combinatorial constraints appear in many real-world applications but remain challenging to solve efficiently compared to their linear counterparts. To bridge this gap, we propose $\textbf{SurCo}$ that learns linear $\underline{\text{Sur}}$rogate costs which can be used in existing $\underline{\text{Co}}$mbinatorial solvers to output good solu…
▽ More
Optimization problems with nonlinear cost functions and combinatorial constraints appear in many real-world applications but remain challenging to solve efficiently compared to their linear counterparts. To bridge this gap, we propose $\textbf{SurCo}$ that learns linear $\underline{\text{Sur}}$rogate costs which can be used in existing $\underline{\text{Co}}$mbinatorial solvers to output good solutions to the original nonlinear combinatorial optimization problem. The surrogate costs are learned end-to-end with nonlinear loss by differentiating through the linear surrogate solver, combining the flexibility of gradient-based methods with the structure of linear combinatorial optimization. We propose three $\texttt{SurCo}$ variants: $\texttt{SurCo}-\texttt{zero}$ for individual nonlinear problems, $\texttt{SurCo}-\texttt{prior}$ for problem distributions, and $\texttt{SurCo}-\texttt{hybrid}$ to combine both distribution and problem-specific information. We give theoretical intuition motivating $\texttt{SurCo}$, and evaluate it empirically. Experiments show that $\texttt{SurCo}$ finds better solutions faster than state-of-the-art and domain expert approaches in real-world optimization problems such as embedding table sharding, inverse photonic design, and nonlinear route planning.
△ Less
Submitted 19 July, 2023; v1 submitted 22 October, 2022;
originally announced October 2022.
-
Federated Continual Learning for Text Classification via Selective Inter-client Transfer
Authors:
Yatin Chaudhary,
Pranav Rai,
Matthias Schubert,
Hinrich Schütze,
Pankaj Gupta
Abstract:
In this work, we combine the two paradigms: Federated Learning (FL) and Continual Learning (CL) for text classification task in cloud-edge continuum. The objective of Federated Continual Learning (FCL) is to improve deep learning models over life time at each client by (relevant and efficient) knowledge transfer without sharing data. Here, we address challenges in minimizing inter-client interfere…
▽ More
In this work, we combine the two paradigms: Federated Learning (FL) and Continual Learning (CL) for text classification task in cloud-edge continuum. The objective of Federated Continual Learning (FCL) is to improve deep learning models over life time at each client by (relevant and efficient) knowledge transfer without sharing data. Here, we address challenges in minimizing inter-client interference while knowledge sharing due to heterogeneous tasks across clients in FCL setup. In doing so, we propose a novel framework, Federated Selective Inter-client Transfer (FedSeIT) which selectively combines model parameters of foreign clients. To further maximize knowledge transfer, we assess domain overlap and select informative tasks from the sequence of historical tasks at each foreign client while preserving privacy. Evaluating against the baselines, we show improved performance, a gain of (average) 12.4\% in text classification over a sequence of tasks using five datasets from diverse domains. To the best of our knowledge, this is the first work that applies FCL to NLP.
△ Less
Submitted 12 February, 2023; v1 submitted 12 October, 2022;
originally announced October 2022.
-
InstanceFormer: An Online Video Instance Segmentation Framework
Authors:
Rajat Koner,
Tanveer Hannan,
Suprosanna Shit,
Sahand Sharifzadeh,
Matthias Schubert,
Thomas Seidl,
Volker Tresp
Abstract:
Recent transformer-based offline video instance segmentation (VIS) approaches achieve encouraging results and significantly outperform online approaches. However, their reliance on the whole video and the immense computational complexity caused by full Spatio-temporal attention limit them in real-life applications such as processing lengthy videos. In this paper, we propose a single-stage transfor…
▽ More
Recent transformer-based offline video instance segmentation (VIS) approaches achieve encouraging results and significantly outperform online approaches. However, their reliance on the whole video and the immense computational complexity caused by full Spatio-temporal attention limit them in real-life applications such as processing lengthy videos. In this paper, we propose a single-stage transformer-based efficient online VIS framework named InstanceFormer, which is especially suitable for long and challenging videos. We propose three novel components to model short-term and long-term dependency and temporal coherence. First, we propagate the representation, location, and semantic information of prior instances to model short-term changes. Second, we propose a novel memory cross-attention in the decoder, which allows the network to look into earlier instances within a certain temporal window. Finally, we employ a temporal contrastive loss to impose coherence in the representation of an instance across all frames. Memory attention and temporal coherence are particularly beneficial to long-range dependency modeling, including challenging scenarios like occlusion. The proposed InstanceFormer outperforms previous online benchmark methods by a large margin across multiple datasets. Most importantly, InstanceFormer surpasses offline approaches for challenging and long datasets such as YouTube-VIS-2021 and OVIS. Code is available at https://github.com/rajatkoner08/InstanceFormer.
△ Less
Submitted 22 August, 2022;
originally announced August 2022.
-
V-Coder: Adaptive AutoEncoder for Semantic Disclosure in Knowledge Graphs
Authors:
Christian M. M. Frey,
Matthias Schubert
Abstract:
Semantic Web or Knowledge Graphs (KG) emerged to one of the most important information source for intelligent systems requiring access to structured knowledge. One of the major challenges is the extraction and processing of unambiguous information from textual data. Following the human perception, overlapping semantic linkages between two named entities become clear due to our common-sense about t…
▽ More
Semantic Web or Knowledge Graphs (KG) emerged to one of the most important information source for intelligent systems requiring access to structured knowledge. One of the major challenges is the extraction and processing of unambiguous information from textual data. Following the human perception, overlapping semantic linkages between two named entities become clear due to our common-sense about the context a relationship lives in which is not the case when we look at it from an automatically driven process of a machine. In this work, we are interested in the problem of Relational Resolution within the scope of KGs, i.e, we are investigating the inherent semantic of relationships between entities within a network. We propose a new adaptive AutoEncoder, called V-Coder, to identify relations inherently connecting entities from different domains. Those relations can be considered as being ambiguous and are candidates for disentanglement. Likewise to the Adaptive Learning Theory (ART), our model learns new patterns from the KG by increasing units in a competitive layer without discarding the previous observed patterns whilst learning the quality of each relation separately. The evaluation on real-world datasets of Freebase, Yago and NELL shows that the V-Coder is not only able to recover links from corrupted input data, but also shows that the semantic disclosure of relations in a KG show the tendency to improve link prediction. A semantic evaluation wraps the evaluation up.
△ Less
Submitted 22 July, 2022;
originally announced August 2022.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…
▽ More
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
△ Less
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Simulating the 1976 Teton Dam Failure using Geoclaw and HEC-RAS and comparing with Historical Observations
Authors:
Hannah Spero,
Donna Calhoun,
Michael Schubert
Abstract:
Dam failures occur worldwide, often from factors including aging structures, extreme hydrologic loading, and design oversights related to the changing climate. Understanding and mitigating risk to downstream inhabited areas require developing and improving low-cost high-fidelity tools, such as numerical models, which allow emergency managers to predict the consequences of dam failures better. Two-…
▽ More
Dam failures occur worldwide, often from factors including aging structures, extreme hydrologic loading, and design oversights related to the changing climate. Understanding and mitigating risk to downstream inhabited areas require developing and improving low-cost high-fidelity tools, such as numerical models, which allow emergency managers to predict the consequences of dam failures better. Two-dimensional (2D) depth-averaged hydraulic models can provide valuable insights into the importance of breach parameters or downstream flow characteristics, but historical studies considering historic failures using real topographies are less common in literature. This study compares Geoclaw, a 2D hydraulic model with adaptive mesh refinement capabilities, to an industry-standard software HEC-RAS (Hydrologic Engineering Center - River Analysis System) using the 1976 Teton Dam failure as a case study. The suitability of Geoclaw for dam failure modeling is determined based on its capability to resolve inundation extent and flood wave arrival times. This study performs sensitivity analyses of the HEC-RAS model to compare an instantaneous dam breach assumption with a time-dependent breach formation for quantifying the model uncertainty. We find the 2D Geoclaw dam-break model results compare reasonably with historical gauge and field observational data and HEC-RAS results. The model demonstrates stability and relatively low computational costs. Our findings highlight opportunities for future work, with the Geoclaw software performance supporting continued studies to evaluate performance. Outcomes of this study will assist dam owners, floodplain managers, and emergency managers by providing an additional tool for estimating the impacts of dam failures to protect lives and infrastructure downstream.
△ Less
Submitted 17 July, 2022; v1 submitted 15 May, 2022;
originally announced June 2022.
-
Uncertainty Quantification and Resource-Demanding Computer Vision Applications of Deep Learning
Authors:
Julian Burghoff,
Robin Chan,
Hanno Gottschalk,
Annika Muetze,
Tobias Riedlinger,
Matthias Rottmann,
Marius Schubert
Abstract:
Bringing deep neural networks (DNNs) into safety critical applications such as automated driving, medical imaging and finance, requires a thorough treatment of the model's uncertainties. Training deep neural networks is already resource demanding and so is also their uncertainty quantification. In this overview article, we survey methods that we developed to teach DNNs to be uncertain when they en…
▽ More
Bringing deep neural networks (DNNs) into safety critical applications such as automated driving, medical imaging and finance, requires a thorough treatment of the model's uncertainties. Training deep neural networks is already resource demanding and so is also their uncertainty quantification. In this overview article, we survey methods that we developed to teach DNNs to be uncertain when they encounter new object classes. Additionally, we present training methods to learn from only a few labels with help of uncertainty quantification. Note that this is typically paid with a massive overhead in computation of an order of magnitude and more compared to ordinary network training. Finally, we survey our work on neural architecture search which is also an order of magnitude more resource demanding then ordinary network training.
△ Less
Submitted 30 May, 2022;
originally announced May 2022.
-
Closed-form max-min power control for some cellular and cell-free massive MIMO networks
Authors:
Lorenzo Miretti,
Renato L. G. Cavalcante,
Slawomir Stanczak,
Martin Schubert,
Ronald Boehnke,
Wen Xu
Abstract:
Many common instances of power control problems for cellular and cell-free massive MIMO networks can be interpreted as max-min utility optimization problems involving affine interference mappings and polyhedral constraints. We show that these problems admit a closed-form solution which depends on the spectral radius of known matrices. In contrast, previous solutions in the literature have been ind…
▽ More
Many common instances of power control problems for cellular and cell-free massive MIMO networks can be interpreted as max-min utility optimization problems involving affine interference mappings and polyhedral constraints. We show that these problems admit a closed-form solution which depends on the spectral radius of known matrices. In contrast, previous solutions in the literature have been indirectly obtained using iterative algorithms based on the bisection method, or on fixed-point iterations. Furthermore, we also show an asymptotically tight bound for the optimal utility, which in turn provides a simple rule of thumb for evaluating whether the network is operating in the noise or interference limited regime. We finally illustrate our results by focusing on classical max-min fair power control for cell-free massive MIMO networks.
△ Less
Submitted 3 May, 2022; v1 submitted 1 May, 2022;
originally announced May 2022.
-
Box Supervised Video Segmentation Proposal Network
Authors:
Tanveer Hannan,
Rajat Koner,
Jonathan Kobold,
Matthias Schubert
Abstract:
Video Object Segmentation (VOS) has been targeted by various fully-supervised and self-supervised approaches. While fully-supervised methods demonstrate excellent results, self-supervised ones, which do not use pixel-level ground truth, attract much attention. However, self-supervised approaches pose a significant performance gap. Box-level annotations provide a balanced compromise between labelin…
▽ More
Video Object Segmentation (VOS) has been targeted by various fully-supervised and self-supervised approaches. While fully-supervised methods demonstrate excellent results, self-supervised ones, which do not use pixel-level ground truth, attract much attention. However, self-supervised approaches pose a significant performance gap. Box-level annotations provide a balanced compromise between labeling effort and result quality for image segmentation but have not been exploited for the video domain. In this work, we propose a box-supervised video object segmentation proposal network, which takes advantage of intrinsic video properties. Our method incorporates object motion in the following way: first, motion is computed using a bidirectional temporal difference and a novel bounding box-guided motion compensation. Second, we introduce a novel motion-aware affinity loss that encourages the network to predict positive pixel pairs if they share similar motion and color. The proposed method outperforms the state-of-the-art self-supervised benchmark by 16.4% and 6.9% $\mathcal{J}$ &$\mathcal{F}$ score and the majority of fully supervised methods on the DAVIS and Youtube-VOS dataset without imposing network architectural specifications. We provide extensive tests and ablations on the datasets, demonstrating the robustness of our method.
△ Less
Submitted 16 February, 2022; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Massively parallel pixel-by-pixel nanophotonic optimization using a Green's function formalism
Authors:
Jiahui Wang,
Alfred K. C. Cheung,
Aleksandra Spyra,
Ian A. D. Williamson,
Jian Guan,
Martin F. Schubert
Abstract:
We introduce an efficient parallelization scheme to implement pixel-by-pixel nanophotonic optimization using a Green's function based formalism. The crucial insight in our proposal is the reframing of the optimization algorithm as a large-scale data processing pipeline, which allows for the efficient distribution of computational tasks across thousands of workers. We demonstrate the utility of our…
▽ More
We introduce an efficient parallelization scheme to implement pixel-by-pixel nanophotonic optimization using a Green's function based formalism. The crucial insight in our proposal is the reframing of the optimization algorithm as a large-scale data processing pipeline, which allows for the efficient distribution of computational tasks across thousands of workers. We demonstrate the utility of our implementation by exercising it to optimize a high numerical aperture focusing metalens at problem sizes that would otherwise be far out of reach for the Green's function based method. Finally, we highlight the connection to powerful ideas from reinforcement learning as a natural corollary of reinterpreting the nanophotonic inverse design problem as a graph traversal enabled by the pixel-by-pixel optimization paradigm.
△ Less
Submitted 10 February, 2022;
originally announced February 2022.
-
Inverse design of photonic devices with strict foundry fabrication constraints
Authors:
Martin F. Schubert,
Alfred K. C. Cheung,
Ian A. D. Williamson,
Aleksandra Spyra,
David H. Alexander
Abstract:
We introduce a new method for inverse design of nanophotonic devices which guarantees that resulting designs satisfy strict length scale constraints - including minimum width and spacing constraints required by commercial semiconductor foundries. The method adopts several concepts from machine learning to transform the problem of topology optimization with strict length scale constraints to an unc…
▽ More
We introduce a new method for inverse design of nanophotonic devices which guarantees that resulting designs satisfy strict length scale constraints - including minimum width and spacing constraints required by commercial semiconductor foundries. The method adopts several concepts from machine learning to transform the problem of topology optimization with strict length scale constraints to an unconstrained stochastic gradient optimization problem. Specifically, we introduce a conditional generator for feasible designs and adopt a straight-through estimator for backpropagation of gradients to a latent design. We demonstrate the performance and reliability of our method by designing several common integrated photonic components.
△ Less
Submitted 13 June, 2022; v1 submitted 30 January, 2022;
originally announced January 2022.
-
Conifer Seedling Detection in UAV-Imagery with RGB-Depth Information
Authors:
Jason Jooste,
Michael Fromm,
Matthias Schubert
Abstract:
Monitoring of reforestation is currently being considerably streamlined through the use of drones and image recognition algorithms, which have already proven to be effective on colour imagery. In addition to colour imagery, elevation data is often also available. The primary aim of this work was to improve the performance of the faster-RCNN object detection algorithm by integrating this height inf…
▽ More
Monitoring of reforestation is currently being considerably streamlined through the use of drones and image recognition algorithms, which have already proven to be effective on colour imagery. In addition to colour imagery, elevation data is often also available. The primary aim of this work was to improve the performance of the faster-RCNN object detection algorithm by integrating this height information, which showed itself to notably improve performance. Interestingly, the structure of the network played a key role, with direct addition of the height information as a fourth image channel showing no improvement, while integration after the backbone network and before the region proposal network led to marked improvements. This effect persisted with very long training regimes. Increasing the resolution of this height information also showed little effect.
△ Less
Submitted 22 November, 2021;
originally announced November 2021.
-
APPTeK: Agent-Based Predicate Prediction in Temporal Knowledge Graphs
Authors:
Christian M. M. Frey,
Yunpu Ma,
Matthias Schubert
Abstract:
In temporal Knowledge Graphs (tKGs), the temporal dimension is attached to facts in a knowledge base resulting in quadruples between entities such as (Nintendo, released, Super Mario, Sep-13-1985), where the predicate holds within a time interval or at a timestamp. We propose a reinforcement learning agent gathering temporal relevant information about the query entities' neighborhoods, simultaneou…
▽ More
In temporal Knowledge Graphs (tKGs), the temporal dimension is attached to facts in a knowledge base resulting in quadruples between entities such as (Nintendo, released, Super Mario, Sep-13-1985), where the predicate holds within a time interval or at a timestamp. We propose a reinforcement learning agent gathering temporal relevant information about the query entities' neighborhoods, simultaneously. We refer to the encodings of the explored graph structures as fingerprints which are used as input to a Q-network. Our agent decides sequentially which relation type needs to be explored next to expand the local subgraphs of the query entities. Our evaluation shows that the proposed method yields competitive results compared to state-of-the-art embedding algorithms for tKGs, and we additionally gain information about the relevant structures between subjects and objects.
△ Less
Submitted 21 July, 2022; v1 submitted 27 October, 2021;
originally announced October 2021.
-
SEA: Graph Shell Attention in Graph Neural Networks
Authors:
Christian M. M. Frey,
Yunpu Ma,
Matthias Schubert
Abstract:
A common issue in Graph Neural Networks (GNNs) is known as over-smoothing. By increasing the number of iterations within the message-passing of GNNs, the nodes' representations of the input graph align with each other and become indiscernible. Recently, it has been shown that increasing a model's complexity by integrating an attention mechanism yields more expressive architectures. This is majorly…
▽ More
A common issue in Graph Neural Networks (GNNs) is known as over-smoothing. By increasing the number of iterations within the message-passing of GNNs, the nodes' representations of the input graph align with each other and become indiscernible. Recently, it has been shown that increasing a model's complexity by integrating an attention mechanism yields more expressive architectures. This is majorly contributed to steering the nodes' representations only towards nodes that are more informative than others. Transformer models in combination with GNNs result in architectures including Graph Transformer Layers (GTL), where layers are entirely based on the attention operation. However, the calculation of a node's representation is still restricted to the computational working flow of a GNN. In our work, we relax the GNN architecture by means of implementing a routing heuristic. Specifically, the nodes' representations are routed to dedicated experts. Each expert calculates the representations according to their respective GNN workflow. The definitions of distinguishable GNNs result from k-localized views starting from the central node. We call this procedure Graph Shell Attention (SEA), where experts process different subgraphs in a transformer-motivated fashion. Intuitively, by increasing the number of experts, the models gain in expressiveness such that a node's representation is solely based on nodes that are located within the receptive field of an expert. We evaluate our architecture on various benchmark datasets showing competitive results compared to state-of-the-art models.
△ Less
Submitted 20 October, 2021;
originally announced October 2021.
-
Gradient-Based Quantification of Epistemic Uncertainty for Deep Object Detectors
Authors:
Tobias Riedlinger,
Matthias Rottmann,
Marius Schubert,
Hanno Gottschalk
Abstract:
The vast majority of uncertainty quantification methods for deep object detectors such as variational inference are based on the network output. Here, we study gradient-based epistemic uncertainty metrics for deep object detectors to obtain reliable confidence estimates. We show that they contain predictive information and that they capture information orthogonal to that of common, output-based un…
▽ More
The vast majority of uncertainty quantification methods for deep object detectors such as variational inference are based on the network output. Here, we study gradient-based epistemic uncertainty metrics for deep object detectors to obtain reliable confidence estimates. We show that they contain predictive information and that they capture information orthogonal to that of common, output-based uncertainty estimation methods like Monte-Carlo dropout and deep ensembles. To this end, we use meta classification and meta regression to produce confidence estimates using gradient metrics and other baselines for uncertainty quantification which are in principle applicable to any object detection architecture. Specifically, we employ false positive detection and prediction of localization quality to investigate uncertainty content of our metrics and compute the calibration errors of meta classifiers. Moreover, we use them as a post-processing filter mechanism to the object detection pipeline and compare object detection performance. Our results show that gradient-based uncertainty is itself on par with output-based methods across different detectors and datasets. More significantly, combined meta classifiers based on gradient and output-based metrics outperform the standalone models. Based on this result, we conclude that gradient uncertainty adds orthogonal information to output-based methods. This suggests that variational inference may be supplemented by gradient-based uncertainty to obtain improved confidence measures, contributing to down-stream applications of deep object detectors and improving their probabilistic reliability.
△ Less
Submitted 17 March, 2022; v1 submitted 9 July, 2021;
originally announced July 2021.
-
An Efficient Diagnosis Algorithm for Inconsistent Constraint Sets
Authors:
Alexander Felfernig,
Monika Schubert,
Christoph Zehentner
Abstract:
Constraint sets can become inconsistent in different contexts. For example, during a configuration session the set of customer requirements can become inconsistent with the configuration knowledge base. Another example is the engineering phase of a configuration knowledge base where the underlying constraints can become inconsistent with a set of test cases. In such situations we are in the need o…
▽ More
Constraint sets can become inconsistent in different contexts. For example, during a configuration session the set of customer requirements can become inconsistent with the configuration knowledge base. Another example is the engineering phase of a configuration knowledge base where the underlying constraints can become inconsistent with a set of test cases. In such situations we are in the need of techniques that support the identification of minimal sets of faulty constraints that have to be deleted in order to restore consistency. In this paper we introduce a divide-and-conquer based diagnosis algorithm (FastDiag) which identifies minimal sets of faulty constraints in an over-constrained problem. This algorithm is specifically applicable in scenarios where the efficient identification of leading (preferred) diagnoses is crucial. We compare the performance of FastDiag with the conflict-directed calculation of hitting sets and present an in-depth performance analysis that shows the advantages of our approach.
△ Less
Submitted 17 February, 2021;
originally announced February 2021.
-
MetaDetect: Uncertainty Quantification and Prediction Quality Estimates for Object Detection
Authors:
Marius Schubert,
Karsten Kahl,
Matthias Rottmann
Abstract:
In object detection with deep neural networks, the box-wise objectness score tends to be overconfident, sometimes even indicating high confidence in presence of inaccurate predictions. Hence, the reliability of the prediction and therefore reliable uncertainties are of highest interest. In this work, we present a post processing method that for any given neural network provides predictive uncertai…
▽ More
In object detection with deep neural networks, the box-wise objectness score tends to be overconfident, sometimes even indicating high confidence in presence of inaccurate predictions. Hence, the reliability of the prediction and therefore reliable uncertainties are of highest interest. In this work, we present a post processing method that for any given neural network provides predictive uncertainty estimates and quality estimates. These estimates are learned by a post processing model that receives as input a hand-crafted set of transparent metrics in form of a structured dataset. Therefrom, we learn two tasks for predicted bounding boxes. We discriminate between true positives ($\mathit{IoU}\geq0.5$) and false positives ($\mathit{IoU} < 0.5$) which we term meta classification, and we predict $\mathit{IoU}$ values directly which we term meta regression. The probabilities of the meta classification model aim at learning the probabilities of success and failure and therefore provide a modelled predictive uncertainty estimate. On the other hand, meta regression gives rise to a quality estimate. In numerical experiments, we use the publicly available YOLOv3 network and the Faster-RCNN network and evaluate meta classification and regression performance on the Kitti, Pascal VOC and COCO datasets. We demonstrate that our metrics are indeed well correlated with the $\mathit{IoU}$. For meta classification we obtain classification accuracies of up to 98.92% and AUROCs of up to 99.93%. For meta regression we obtain an $R^2$ value of up to 91.78%. These results yield significant improvements compared to other network's objectness score and other baseline approaches. Therefore, we obtain more reliable uncertainty and quality estimates which is particularly interesting in the absence of ground truth.
△ Less
Submitted 6 October, 2020; v1 submitted 4 October, 2020;
originally announced October 2020.
-
Learning Self-Expression Metrics for Scalable and Inductive Subspace Clustering
Authors:
Julian Busch,
Evgeniy Faerman,
Matthias Schubert,
Thomas Seidl
Abstract:
Subspace clustering has established itself as a state-of-the-art approach to clustering high-dimensional data. In particular, methods relying on the self-expressiveness property have recently proved especially successful. However, they suffer from two major shortcomings: First, a quadratic-size coefficient matrix is learned directly, preventing these methods from scaling beyond small datasets. Sec…
▽ More
Subspace clustering has established itself as a state-of-the-art approach to clustering high-dimensional data. In particular, methods relying on the self-expressiveness property have recently proved especially successful. However, they suffer from two major shortcomings: First, a quadratic-size coefficient matrix is learned directly, preventing these methods from scaling beyond small datasets. Secondly, the trained models are transductive and thus cannot be used to cluster out-of-sample data unseen during training. Instead of learning self-expression coefficients directly, we propose a novel metric learning approach to learn instead a subspace affinity function using a siamese neural network architecture. Consequently, our model benefits from a constant number of parameters and a constant-size memory footprint, allowing it to scale to considerably larger datasets. In addition, we can formally show that out model is still able to exactly recover subspace clusters given an independence assumption. The siamese architecture in combination with a novel geometric classifier further makes our model inductive, allowing it to cluster out-of-sample data. Additionally, non-linear clusters can be detected by simply adding an auto-encoder module to the architecture. The whole model can then be trained end-to-end in a self-supervised manner. This work in progress reports promising preliminary results on the MNIST dataset. In the spirit of reproducible research, me make all code publicly available. In future work we plan to investigate several extensions of our model and to expand experimental evaluation.
△ Less
Submitted 17 December, 2020; v1 submitted 27 September, 2020;
originally announced September 2020.
-
Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network
Authors:
Jakaria Rabbi,
Nilanjan Ray,
Matthias Schubert,
Subir Chowdhury,
Dennis Chao
Abstract:
The detection performance of small objects in remote sensing images is not satisfactory compared to large objects, especially in low-resolution and noisy images. A generative adversarial network (GAN)-based model called enhanced super-resolution GAN (ESRGAN) shows remarkable image enhancement performance, but reconstructed images miss high-frequency edge information. Therefore, object detection pe…
▽ More
The detection performance of small objects in remote sensing images is not satisfactory compared to large objects, especially in low-resolution and noisy images. A generative adversarial network (GAN)-based model called enhanced super-resolution GAN (ESRGAN) shows remarkable image enhancement performance, but reconstructed images miss high-frequency edge information. Therefore, object detection performance degrades for small objects on recovered noisy and low-resolution remote sensing images. Inspired by the success of edge enhanced GAN (EEGAN) and ESRGAN, we apply a new edge-enhanced super-resolution GAN (EESRGAN) to improve the image quality of remote sensing images and use different detector networks in an end-to-end manner where detector loss is backpropagated into the EESRGAN to improve the detection performance. We propose an architecture with three components: ESRGAN, Edge Enhancement Network (EEN), and Detection network. We use residual-in-residual dense blocks (RRDB) for both the ESRGAN and EEN, and for the detector network, we use the faster region-based convolutional network (FRCNN) (two-stage detector) and single-shot multi-box detector (SSD) (one stage detector). Extensive experiments on a public (car overhead with context) and a self-assembled (oil and gas storage tank) satellite dataset show superior performance of our method compared to the standalone state-of-the-art object detectors.
△ Less
Submitted 28 April, 2020; v1 submitted 19 March, 2020;
originally announced March 2020.
-
Unsupervised Anomaly Detection for X-Ray Images
Authors:
Diana Davletshina,
Valentyn Melnychuk,
Viet Tran,
Hitansh Singla,
Max Berrendorf,
Evgeniy Faerman,
Michael Fromm,
Matthias Schubert
Abstract:
Obtaining labels for medical (image) data requires scarce and expensive experts. Moreover, due to ambiguous symptoms, single images rarely suffice to correctly diagnose a medical condition. Instead, it often requires to take additional background information such as the patient's medical history or test results into account. Hence, instead of focusing on uninterpretable black-box systems deliverin…
▽ More
Obtaining labels for medical (image) data requires scarce and expensive experts. Moreover, due to ambiguous symptoms, single images rarely suffice to correctly diagnose a medical condition. Instead, it often requires to take additional background information such as the patient's medical history or test results into account. Hence, instead of focusing on uninterpretable black-box systems delivering an uncertain final diagnosis in an end-to-end-fashion, we investigate how unsupervised methods trained on images without anomalies can be used to assist doctors in evaluating X-ray images of hands. Our method increases the efficiency of making a diagnosis and reduces the risk of missing important regions. Therefore, we adopt state-of-the-art approaches for unsupervised learning to detect anomalies and show how the outputs of these methods can be explained. To reduce the effect of noise, which often can be mistaken for an anomaly, we introduce a powerful preprocessing pipeline. We provide an extensive evaluation of different approaches and demonstrate empirically that even without labels it is possible to achieve satisfying results on a real-world dataset of X-ray images of hands. We also evaluate the importance of preprocessing and one of our main findings is that without it, most of our approaches perform not better than random. To foster reproducibility and accelerate research we make our code publicly available at https://github.com/Valentyn1997/xray
△ Less
Submitted 4 November, 2020; v1 submitted 29 January, 2020;
originally announced January 2020.
-
MMGAN: Generative Adversarial Networks for Multi-Modal Distributions
Authors:
Teodora Pandeva,
Matthias Schubert
Abstract:
Over the past years, Generative Adversarial Networks (GANs) have shown a remarkable generation performance especially in image synthesis. Unfortunately, they are also known for having an unstable training process and might loose parts of the data distribution for heterogeneous input data. In this paper, we propose a novel GAN extension for multi-modal distribution learning (MMGAN). In our approach…
▽ More
Over the past years, Generative Adversarial Networks (GANs) have shown a remarkable generation performance especially in image synthesis. Unfortunately, they are also known for having an unstable training process and might loose parts of the data distribution for heterogeneous input data. In this paper, we propose a novel GAN extension for multi-modal distribution learning (MMGAN). In our approach, we model the latent space as a Gaussian mixture model with a number of clusters referring to the number of disconnected data manifolds in the observation space, and include a clustering network, which relates each data manifold to one Gaussian cluster. Thus, the training gets more stable. Moreover, MMGAN allows for clustering real data according to the learned data manifold in the latent space. By a series of benchmark experiments, we illustrate that MMGAN outperforms competitive state-of-the-art models in terms of clustering performance.
△ Less
Submitted 15 November, 2019;
originally announced November 2019.
-
Uplink Grant-Free Random Access Solutions for URLLC services in 5G New Radio
Authors:
Nurul Huda Mahmood,
Renato Abreu,
Ronald Böhnke,
Martin Schubert,
Gilberto Berardinelli,
Thomas H. Jacobsen
Abstract:
The newly introduced ultra-reliable low latency communication service class in 5G New Radio depends on innovative low latency radio resource management solutions that can guarantee high reliability. Grant-free random access, where channel resources are accessed without undergoing assignment through a handshake process, is proposed in 5G New Radio as an important latency reducing solution. However,…
▽ More
The newly introduced ultra-reliable low latency communication service class in 5G New Radio depends on innovative low latency radio resource management solutions that can guarantee high reliability. Grant-free random access, where channel resources are accessed without undergoing assignment through a handshake process, is proposed in 5G New Radio as an important latency reducing solution. However, this comes at an increased likelihood of collisions resulting from uncontrolled channel access, when the same resources are preallocated to a group of users. Novel reliability enhancement techniques are therefore needed. This article provides an overview of grant-free random access in 5G New Radio focusing on the ultra-reliable low latency communication service class, and presents two reliability-enhancing solutions. The first proposes retransmissions over shared resources, whereas the second proposal incorporates grant-free transmission with non-orthogonal multiple access with overlapping transmissions being resolved through the use of advanced receivers. Both proposed solutions result in significant performance gains, in terms of reliability as well as resource efficiency. For example, the proposed non-orthogonal multiple access scheme can support a normalized load of more than 1.5 users/slot at packet loss rates of ~10^{-5} - a significant improvement over the maximum supported load with conventional grant-free schemes like slotted-ALOHA.
△ Less
Submitted 11 April, 2019;
originally announced April 2019.
-
Uncertainty Measures and Prediction Quality Rating for the Semantic Segmentation of Nested Multi Resolution Street Scene Images
Authors:
Matthias Rottmann,
Marius Schubert
Abstract:
In the semantic segmentation of street scenes the reliability of the prediction and therefore uncertainty measures are of highest interest. We present a method that generates for each input image a hierarchy of nested crops around the image center and presents these, all re-scaled to the same size, to a neural network for semantic segmentation. The resulting softmax outputs are then post processed…
▽ More
In the semantic segmentation of street scenes the reliability of the prediction and therefore uncertainty measures are of highest interest. We present a method that generates for each input image a hierarchy of nested crops around the image center and presents these, all re-scaled to the same size, to a neural network for semantic segmentation. The resulting softmax outputs are then post processed such that we can investigate mean and variance over all image crops as well as mean and variance of uncertainty heat maps obtained from pixel-wise uncertainty measures, like the entropy, applied to each crop's softmax output. In our tests, we use the publicly available DeepLabv3+ MobilenetV2 network (trained on the Cityscapes dataset) and demonstrate that the incorporation of crops improves the quality of the prediction and that we obtain more reliable uncertainty measures. These are then aggregated over predicted segments for either classifying between IoU=0 and IoU>0 (meta classification) or predicting the IoU via linear regression (meta regression). The latter yields reliable performance estimates for segmentation networks, in particular useful in the absence of ground truth. For the task of meta classification we obtain a classification accuracy of $81.93\%$ and an AUROC of $89.89\%$. For meta regression we obtain an $R^2$ value of $84.77\%$. These results yield significant improvements compared to other approaches.
△ Less
Submitted 9 April, 2019;
originally announced April 2019.
-
Semi-Supervised Learning on Graphs Based on Local Label Distributions
Authors:
Evgeniy Faerman,
Felix Borutta,
Julian Busch,
Matthias Schubert
Abstract:
Most approaches that tackle the problem of node classification consider nodes to be similar, if they have shared neighbors or are close to each other in the graph. Recent methods for attributed graphs additionally take attributes of neighboring nodes into account. We argue that the class labels of the neighbors bear important information and considering them helps to improve classification quality…
▽ More
Most approaches that tackle the problem of node classification consider nodes to be similar, if they have shared neighbors or are close to each other in the graph. Recent methods for attributed graphs additionally take attributes of neighboring nodes into account. We argue that the class labels of the neighbors bear important information and considering them helps to improve classification quality. Two nodes which are similar based on class labels in their neighborhood do not need to be close-by in the graph and may even belong to different connected components. In this work, we propose a novel approach for the semi-supervised node classification. Precisely, we propose a new node embedding which is based on the class labels in the local neighborhood of a node. We show that this is a different setting from attribute-based embeddings and thus, we propose a new method to learn label-based node embeddings which can mirror a variety of relations between the class labels of neighboring nodes. Our experimental evaluation demonstrates that our new methods can significantly improve the prediction quality on real world data sets.
△ Less
Submitted 22 May, 2018; v1 submitted 15 February, 2018;
originally announced February 2018.
-
Scenic Routes Now: Efficiently Solving the Time-Dependent Arc Orienteering Problem
Authors:
Gregor Jossé,
Ying Lu,
Tobias Emrich,
Matthias Renz,
Cyrus Shahabi,
Ugur Demiryurek,
Matthias Schubert
Abstract:
This paper extends the Arc Orienteering Problem (AOP) to large road networks with time-dependent travel times and time-dependent value gain, termed Twofold Time-Dependent AOP or 2TD-AOP for short. In its original definition, the NP-hard Orienteering Problem (OP) asks to find a path from a source to a destination maximizing the accumulated value while not exceeding a cost budget. Variations of the…
▽ More
This paper extends the Arc Orienteering Problem (AOP) to large road networks with time-dependent travel times and time-dependent value gain, termed Twofold Time-Dependent AOP or 2TD-AOP for short. In its original definition, the NP-hard Orienteering Problem (OP) asks to find a path from a source to a destination maximizing the accumulated value while not exceeding a cost budget. Variations of the OP and AOP have many practical applications such as mobile crowdsourcing tasks (e.g., repairing and maintenance or dispatching field workers), diverse logistics problems (e.g., crowd control or controlling wildfires) as well as several tourist guidance problems (e.g., generating trip recommendations or navigating through theme parks). In the proposed 2TD-AOP, travel times and value functions are assumed to be time-dependent. The dynamic values model, for instance, varying rewards in crowdsourcing tasks or varying urgency levels in damage control tasks. We discuss this novel problem, prove the benefit of time-dependence empirically and present an efficient approximative solution, optimized for fast response systems. Our approach is the first time-dependent variant of the AOP to be evaluated on a large scale, fine-grained, real-world road network. We show that optimal solutions are infeasible and solutions to the static problem are often invalid. We propose an approximate dynamic programming solution which produces valid paths and is orders of magnitude faster than any optimal solution.
△ Less
Submitted 27 September, 2016;
originally announced September 2016.
-
Approximation and Heuristic Algorithms for Computing Backbones in Asymmetric Ad-Hoc Networks
Authors:
Faisal N. Abu-Khzam,
Christine Markarian,
Friedhelm Meyer auf der Heide,
Michael Schubert
Abstract:
We consider the problem of dominating set-based virtual backbone used for routing in asymmetric wireless ad-hoc networks. These networks have non-uniform transmission ranges and are modeled using the well-established disk graphs. The corresponding graph theoretic problem seeks a strongly connected dominating-absorbent set of minimum cardinality in a digraph. A subset of nodes in a digraph is a str…
▽ More
We consider the problem of dominating set-based virtual backbone used for routing in asymmetric wireless ad-hoc networks. These networks have non-uniform transmission ranges and are modeled using the well-established disk graphs. The corresponding graph theoretic problem seeks a strongly connected dominating-absorbent set of minimum cardinality in a digraph. A subset of nodes in a digraph is a strongly connected dominating-absorbent set if the subgraph induced by these nodes is strongly connected and each node in the graph is either in the set or has both an in-neighbor and an out-neighbor in it. Distributed algorithms for this problem are of practical significance due to the dynamic nature of ad-hoc networks. We present a first distributed approximation algorithm, with a constant approximation factor and O(Diam) running time, where Diam is the diameter of the graph. Moreover we present a simple heuristic algorithm and conduct an extensive simulation study showing that our heuristic outperforms previously known approaches for the problem.
△ Less
Submitted 7 October, 2015;
originally announced October 2015.
-
ParetoPrep: Fast computation of Path Skylines Queries
Authors:
Michael Shekelyan,
Gregor Jossé,
Matthias Schubert
Abstract:
Computing cost optimal paths in network data is a very important task in many application areas like transportation networks, computer networks or social graphs. In many cases, the cost of an edge can be described by various cost criteria. For example, in a road network possible cost criteria are distance, time, ascent, energy consumption or toll fees. In such a multicriteria network, a route or p…
▽ More
Computing cost optimal paths in network data is a very important task in many application areas like transportation networks, computer networks or social graphs. In many cases, the cost of an edge can be described by various cost criteria. For example, in a road network possible cost criteria are distance, time, ascent, energy consumption or toll fees. In such a multicriteria network, a route or path skyline query computes the set of all paths having pareto optimal costs, i.e. each result path is optimal for different user preferences. In this paper, we propose a new method for computing route skylines which significantly decreases processing time and memory consumption. Furthermore, our method does not rely on any precomputation or indexing method and thus, it is suitable for dynamically changing edge costs. Our experiments demonstrate that our method outperforms state of the art approaches and allows highly efficient path skyline computation without any preprocessing.
△ Less
Submitted 1 October, 2014;
originally announced October 2014.
-
Toward Energy-Efficient 5G Wireless Communications Technologies
Authors:
R. L. G. Cavalcante,
S. Stańczak,
M. Schubert,
A. Eisenblätter,
U. Türke
Abstract:
The densification and expansion of wireless networks pose new challenges on energy efficiency. With a drastic increase of infrastructure nodes (e.g. ultra-dense deployment of small cells), the total energy consumption may easily exceed an acceptable level. While most studies focus on the energy radiated by the antennas, the bigger part of the total energy budget is actually consumed by the hardwar…
▽ More
The densification and expansion of wireless networks pose new challenges on energy efficiency. With a drastic increase of infrastructure nodes (e.g. ultra-dense deployment of small cells), the total energy consumption may easily exceed an acceptable level. While most studies focus on the energy radiated by the antennas, the bigger part of the total energy budget is actually consumed by the hardware (e.g., coolers and circuit energy consumption). The ability to shutdown infrastructure nodes (or parts of it) or to adapt the transmission strategy according to the traffic will therefore become an important design aspect of future wireless architectures. Network infrastructure should be regarded as a resource that can be occupied or released on demand. However, the modeling and optimization of such systems are complicated by the potential interference coupling between active nodes. In this article, we give an overview on different aspects of this problem. We show how prior knowledge of traffic patterns can be exploited for optimization. Then, we discuss the framework of interference functions, which has proved a useful tool for various types of coupled systems in the literature. Finally, we introduce different classes of algorithms that have the objective of improving the energy efficiency by adapting the network configuration to the traffic load.
△ Less
Submitted 27 August, 2014; v1 submitted 1 July, 2014;
originally announced July 2014.
-
Nowhere-zero flows on signed regular graphs
Authors:
Michael Schubert,
Eckhard Steffen
Abstract:
We study the flow spectrum ${\cal S}(G)$ and the integer flow spectrum $\overline{\cal S}(G)$ of signed $(2t+1)$-regular graphs. We show that if $r \in {\cal S}(G)$, then $r = 2+\frac{1}{t}$ or $r \geq 2 + \frac{2}{2t-1}$. Furthermore, $2 + \frac{1}{t} \in {\cal S}(G)$ if and only if $G$ has a $t$-factor. If $G$ has a 1-factor, then $3 \in \overline{\cal S}(G)$, and for every $t \geq 2$, there is…
▽ More
We study the flow spectrum ${\cal S}(G)$ and the integer flow spectrum $\overline{\cal S}(G)$ of signed $(2t+1)$-regular graphs. We show that if $r \in {\cal S}(G)$, then $r = 2+\frac{1}{t}$ or $r \geq 2 + \frac{2}{2t-1}$. Furthermore, $2 + \frac{1}{t} \in {\cal S}(G)$ if and only if $G$ has a $t$-factor. If $G$ has a 1-factor, then $3 \in \overline{\cal S}(G)$, and for every $t \geq 2$, there is a signed $(2t+1)$-regular graph $(H,σ)$ with $ 3 \in \overline{\cal S}(H)$ and $H$ does not have a 1-factor.
If $G$ $(\not = K_2^3)$ is a cubic graph which has a 1-factor, then $\{3,4\} \subseteq {\cal S}(G) \cap \overline{\cal S}(G)$. Furthermore, the following four statements are equivalent: (1) $G$ has a 1-factor. (2) $3 \in {\cal S}(G)$. (3) $3 \in \overline{\cal S}(G)$. (4) $4 \in \overline{\cal S}(G)$. There are cubic graphs whose integer flow spectrum does not contain 5 or 6, and we construct an infinite family of bridgeless cubic graphs with integer flow spectrum $\{3,4,6\}$.
We show that there are signed graphs where the difference between the integer flow number and the flow number is greater than or equal to 1, disproving a conjecture of Raspaud and Zhu.
The paper concludes with a proof of Bouchet's 6-flow conjecture for Kotzig-graphs.
△ Less
Submitted 29 January, 2015; v1 submitted 5 July, 2013;
originally announced July 2013.
-
Maximum Gain Round Trips with Cost Constraints
Authors:
Franz Graf,
Hans-Peter Kriegel,
Matthias Schubert
Abstract:
Searching for optimal ways in a network is an important task in multiple application areas such as social networks, co-citation graphs or road networks. In the majority of applications, each edge in a network is associated with a certain cost and an optimal way minimizes the cost while fulfilling a certain property, e.g connecting a start and a destination node. In this paper, we want to extend pu…
▽ More
Searching for optimal ways in a network is an important task in multiple application areas such as social networks, co-citation graphs or road networks. In the majority of applications, each edge in a network is associated with a certain cost and an optimal way minimizes the cost while fulfilling a certain property, e.g connecting a start and a destination node. In this paper, we want to extend pure cost networks to so-called cost-gain networks. In this type of network, each edge is additionally associated with a certain gain. Thus, a way having a certain cost additionally provides a certain gain. In the following, we will discuss the problem of finding ways providing maximal gain while costing less than a certain budget. An application for this type of problem is the round trip problem of a traveler: Given a certain amount of time, which is the best round trip traversing the most scenic landscape or visiting the most important sights? In the following, we distinguish two cases of the problem. The first does not control any redundant edges and the second allows a more sophisticated handling of edges occurring more than once. To answer the maximum round trip queries on a given graph data set, we propose unidirectional and bidirectional search algorithms. Both types of algorithms are tested for the use case named above on real world spatial networks.
△ Less
Submitted 5 May, 2011; v1 submitted 4 May, 2011;
originally announced May 2011.