subscribe to arXiv mailings

RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation

Authors: Li Li, Hubert P. H. Shum, Toby P. Breckon

Abstract: 3D point clouds play a pivotal role in outdoor scene perception, especially in the context of autonomous driving. Recent advancements in 3D LiDAR segmentation often focus intensely on the spatial positioning and distribution of points for accurate segmentation. However, these methods, while robust in variable conditions, encounter challenges due to sole reliance on coordinates and point intensity,… ▽ More 3D point clouds play a pivotal role in outdoor scene perception, especially in the context of autonomous driving. Recent advancements in 3D LiDAR segmentation often focus intensely on the spatial positioning and distribution of points for accurate segmentation. However, these methods, while robust in variable conditions, encounter challenges due to sole reliance on coordinates and point intensity, leading to poor isometric invariance and suboptimal segmentation. To tackle this challenge, our work introduces Range-Aware Pointwise Distance Distribution (RAPiD) features and the associated RAPiD-Seg architecture. Our RAPiD features exhibit rigid transformation invariance and effectively adapt to variations in point density, with a design focus on capturing the localized geometry of neighboring structures. They utilize inherent LiDAR isotropic radiation and semantic categorization for enhanced local representation and computational efficiency, while incorporating a 4D distance metric that integrates geometric and surface material reflectivity for improved semantic segmentation. To effectively embed high-dimensional RAPiD features, we propose a double-nested autoencoder structure with a novel class-aware embedding objective to encode high-dimensional features into manageable voxel-wise embeddings. Additionally, we propose RAPiD-Seg which incorporates a channel-wise attention fusion and two effective RAPiD-Seg variants, further optimizing the embedding for enhanced performance and generalization. Our method outperforms contemporary LiDAR segmentation work in terms of mIoU on SemanticKITTI (76.1) and nuScenes (83.6) datasets. △ Less

Submitted 14 July, 2024; originally announced July 2024.

Comments: ECCV 2024; 18 pages, 6 figures, 7 tables; Code at https://github.com/l1997i/rapid_seg

Journal ref: Eur. Conf. Comput. Vis. (ECCV 2024)

arXiv:2406.10068 [pdf, other]

doi 10.1109/3DV53792.2021.00130

DurLAR: A High-fidelity 128-channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi-modal Autonomous Driving Applications

Authors: Li Li, Khalid N. Ismail, Hubert P. H. Shum, Toby P. Breckon

Abstract: We present DurLAR, a high-fidelity 128-channel 3D LiDAR dataset with panoramic ambient (near infrared) and reflectivity imagery, as well as a sample benchmark task using depth estimation for autonomous driving applications. Our driving platform is equipped with a high resolution 128 channel LiDAR, a 2MPix stereo camera, a lux meter and a GNSS/INS system. Ambient and reflectivity images are made av… ▽ More We present DurLAR, a high-fidelity 128-channel 3D LiDAR dataset with panoramic ambient (near infrared) and reflectivity imagery, as well as a sample benchmark task using depth estimation for autonomous driving applications. Our driving platform is equipped with a high resolution 128 channel LiDAR, a 2MPix stereo camera, a lux meter and a GNSS/INS system. Ambient and reflectivity images are made available along with the LiDAR point clouds to facilitate multi-modal use of concurrent ambient and reflectivity scene information. Leveraging DurLAR, with a resolution exceeding that of prior benchmarks, we consider the task of monocular depth estimation and use this increased availability of higher resolution, yet sparse ground truth scene depth information to propose a novel joint supervised/self-supervised loss formulation. We compare performance over both our new DurLAR dataset, the established KITTI benchmark and the Cityscapes dataset. Our evaluation shows our joint use supervised and self-supervised loss terms, enabled via the superior ground truth resolution and availability within DurLAR improves the quantitative and qualitative performance of leading contemporary monocular depth estimation approaches (RMSE=3.639, Sq Rel=0.936). △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted by 3DV 2021; 13 pages, 14 figures; Dataset at https://github.com/l1997i/durlar

Journal ref: Proc. Int. Conf. on 3D Vision (3DV 2021)

arXiv:2404.12285 [pdf, other]

Performance Evaluation of Segment Anything Model with Variational Prompting for Application to Non-Visible Spectrum Imagery

Authors: Yona Falinie A. Gaus, Neelanjan Bhowmik, Brian K. S. Isaac-Medina, Toby P. Breckon

Abstract: The Segment Anything Model (SAM) is a deep neural network foundational model designed to perform instance segmentation which has gained significant popularity given its zero-shot segmentation ability. SAM operates by generating masks based on various input prompts such as text, bounding boxes, points, or masks, introducing a novel methodology to overcome the constraints posed by dataset-specific s… ▽ More The Segment Anything Model (SAM) is a deep neural network foundational model designed to perform instance segmentation which has gained significant popularity given its zero-shot segmentation ability. SAM operates by generating masks based on various input prompts such as text, bounding boxes, points, or masks, introducing a novel methodology to overcome the constraints posed by dataset-specific scarcity. While SAM is trained on an extensive dataset, comprising ~11M images, it mostly consists of natural photographic images with only very limited images from other modalities. Whilst the rapid progress in visual infrared surveillance and X-ray security screening imaging technologies, driven forward by advances in deep learning, has significantly enhanced the ability to detect, classify and segment objects with high accuracy, it is not evident if the SAM zero-shot capabilities can be transferred to such modalities. This work assesses SAM capabilities in segmenting objects of interest in the X-ray/infrared modalities. Our approach reuses the pre-trained SAM with three different prompts: bounding box, centroid and random points. We present quantitative/qualitative results to showcase the performance on selected datasets. Our results show that SAM can segment objects in the X-ray modality when given a box prompt, but its performance varies for point prompts. Specifically, SAM performs poorly in segmenting slender objects and organic materials, such as plastic bottles. We find that infrared objects are also challenging to segment with point prompts given the low-contrast nature of this modality. This study shows that while SAM demonstrates outstanding zero-shot capabilities with box prompts, its performance ranges from moderate to poor for point prompts, indicating that special consideration on the cross-modal generalisation of SAM is needed when considering use on X-ray/infrared imagery. △ Less

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2403.19897 [pdf, other]

Disentangling Racial Phenotypes: Fine-Grained Control of Race-related Facial Phenotype Characteristics

Authors: Seyma Yucer, Amir Atapour Abarghouei, Noura Al Moubayed, Toby P. Breckon

Abstract: Achieving an effective fine-grained appearance variation over 2D facial images, whilst preserving facial identity, is a challenging task due to the high complexity and entanglement of common 2D facial feature encoding spaces. Despite these challenges, such fine-grained control, by way of disentanglement is a crucial enabler for data-driven racial bias mitigation strategies across multiple automate… ▽ More Achieving an effective fine-grained appearance variation over 2D facial images, whilst preserving facial identity, is a challenging task due to the high complexity and entanglement of common 2D facial feature encoding spaces. Despite these challenges, such fine-grained control, by way of disentanglement is a crucial enabler for data-driven racial bias mitigation strategies across multiple automated facial analysis tasks, as it allows to analyse, characterise and synthesise human facial diversity. In this paper, we propose a novel GAN framework to enable fine-grained control over individual race-related phenotype attributes of the facial images. Our framework factors the latent (feature) space into elements that correspond to race-related facial phenotype representations, thereby separating phenotype aspects (e.g. skin, hair colour, nose, eye, mouth shapes), which are notoriously difficult to annotate robustly in real-world facial data. Concurrently, we also introduce a high quality augmented, diverse 2D face image dataset drawn from CelebA-HQ for GAN training. Unlike prior work, our framework only relies upon 2D imagery and related parameters to achieve state-of-the-art individual control over race-related phenotype attributes with improved photo-realistic output. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2311.06018 [pdf, other]

U3DS$^3$: Unsupervised 3D Semantic Scene Segmentation

Authors: Jiaxu Liu, Zhengdi Yu, Toby P. Breckon, Hubert P. H. Shum

Abstract: Contemporary point cloud segmentation approaches largely rely on richly annotated 3D training data. However, it is both time-consuming and challenging to obtain consistently accurate annotations for such 3D scene data. Moreover, there is still a lack of investigation into fully unsupervised scene segmentation for point clouds, especially for holistic 3D scenes. This paper presents U3DS$^3$, as a s… ▽ More Contemporary point cloud segmentation approaches largely rely on richly annotated 3D training data. However, it is both time-consuming and challenging to obtain consistently accurate annotations for such 3D scene data. Moreover, there is still a lack of investigation into fully unsupervised scene segmentation for point clouds, especially for holistic 3D scenes. This paper presents U3DS$^3$, as a step towards completely unsupervised point cloud segmentation for any holistic 3D scenes. To achieve this, U3DS$^3$ leverages a generalized unsupervised segmentation method for both object and background across both indoor and outdoor static 3D point clouds with no requirement for model pre-training, by leveraging only the inherent information of the point cloud to achieve full 3D scene segmentation. The initial step of our proposed approach involves generating superpoints based on the geometric characteristics of each scene. Subsequently, it undergoes a learning process through a spatial clustering-based methodology, followed by iterative training using pseudo-labels generated in accordance with the cluster centroids. Moreover, by leveraging the invariance and equivariance of the volumetric representations, we apply the geometric transformation on voxelized features to provide two sets of descriptors for robust representation learning. Finally, our evaluation provides state-of-the-art results on the ScanNet and SemanticKITTI, and competitive results on the S3DIS, benchmark datasets. △ Less

Submitted 10 November, 2023; originally announced November 2023.

Comments: 10 Pages, 4 figures, accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

arXiv:2310.16435 [pdf, other]

On Pixel-level Performance Assessment in Anomaly Detection

Authors: Mehdi Rafiei, Toby P. Breckon, Alexandros Iosifidis

Abstract: Anomaly detection methods have demonstrated remarkable success across various applications. However, assessing their performance, particularly at the pixel-level, presents a complex challenge due to the severe imbalance that is most commonly present between normal and abnormal samples. Commonly adopted evaluation metrics designed for pixel-level detection may not effectively capture the nuanced pe… ▽ More Anomaly detection methods have demonstrated remarkable success across various applications. However, assessing their performance, particularly at the pixel-level, presents a complex challenge due to the severe imbalance that is most commonly present between normal and abnormal samples. Commonly adopted evaluation metrics designed for pixel-level detection may not effectively capture the nuanced performance variations arising from this class imbalance. In this paper, we dissect the intricacies of this challenge, underscored by visual evidence and statistical analysis, leading to delve into the need for evaluation metrics that account for the imbalance. We offer insights into more accurate metrics, using eleven leading contemporary anomaly detection methods on twenty-one anomaly detection problems. Overall, from this extensive experimental evaluation, we can conclude that Precision-Recall-based metrics can better capture relative method performance, making them more suitable for the task. △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: 5 pages, 5 figures, 1 table

arXiv:2308.14152 [pdf, other]

Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers

Authors: Abril Corona-Figueroa, Sam Bond-Taylor, Neelanjan Bhowmik, Yona Falinie A. Gaus, Toby P. Breckon, Hubert P. H. Shum, Chris G. Willcocks

Abstract: Generating 3D images of complex objects conditionally from a few 2D views is a difficult synthesis problem, compounded by issues such as domain gap and geometric misalignment. For instance, a unified framework such as Generative Adversarial Networks cannot achieve this unless they explicitly define both a domain-invariant and geometric-invariant joint latent distribution, whereas Neural Radiance F… ▽ More Generating 3D images of complex objects conditionally from a few 2D views is a difficult synthesis problem, compounded by issues such as domain gap and geometric misalignment. For instance, a unified framework such as Generative Adversarial Networks cannot achieve this unless they explicitly define both a domain-invariant and geometric-invariant joint latent distribution, whereas Neural Radiance Fields are generally unable to handle both issues as they optimize at the pixel level. By contrast, we propose a simple and novel 2D to 3D synthesis approach based on conditional diffusion with vector-quantized codes. Operating in an information-rich code space enables high-resolution 3D synthesis via full-coverage attention across the views. Specifically, we generate the 3D codes (e.g. for CT images) conditional on previously generated 3D codes and the entire codebook of two 2D views (e.g. 2D X-rays). Qualitative and quantitative results demonstrate state-of-the-art performance over specialized methods across varied evaluation criteria, including fidelity metrics such as density, coverage, and distortion metrics for two complex volumetric imagery datasets from in real-world scenarios. △ Less

Submitted 27 August, 2023; originally announced August 2023.

Comments: Camera-ready version for ICCV 2023

arXiv:2305.00817 [pdf, other]

Racial Bias within Face Recognition: A Survey

Authors: Seyma Yucer, Furkan Tektas, Noura Al Moubayed, Toby P. Breckon

Abstract: Facial recognition is one of the most academically studied and industrially developed areas within computer vision where we readily find associated applications deployed globally. This widespread adoption has uncovered significant performance variation across subjects of different racial profiles leading to focused research attention on racial bias within face recognition spanning both current cau… ▽ More Facial recognition is one of the most academically studied and industrially developed areas within computer vision where we readily find associated applications deployed globally. This widespread adoption has uncovered significant performance variation across subjects of different racial profiles leading to focused research attention on racial bias within face recognition spanning both current causation and future potential solutions. In support, this study provides an extensive taxonomic review of research on racial bias within face recognition exploring every aspect and stage of the face recognition processing pipeline. Firstly, we discuss the problem definition of racial bias, starting with race definition, grouping strategies, and the societal implications of using race or race-related groupings. Secondly, we divide the common face recognition processing pipeline into four stages: image acquisition, face localisation, face representation, face verification and identification, and review the relevant corresponding literature associated with each stage. The overall aim is to provide comprehensive coverage of the racial bias problem with respect to each and every stage of the face recognition processing pipeline whilst also highlighting the potential pitfalls and limitations of contemporary mitigation strategies that need to be considered within future research endeavours or commercial applications alike. △ Less

Submitted 1 May, 2023; originally announced May 2023.

arXiv:2303.11203 [pdf, other]

Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

Authors: Li Li, Hubert P. H. Shum, Toby P. Breckon

Abstract: Whilst the availability of 3D LiDAR point cloud data has significantly grown in recent years, annotation remains expensive and time-consuming, leading to a demand for semi-supervised semantic segmentation methods with application domains such as autonomous driving. Existing work very often employs relatively large segmentation backbone networks to improve segmentation accuracy, at the expense of c… ▽ More Whilst the availability of 3D LiDAR point cloud data has significantly grown in recent years, annotation remains expensive and time-consuming, leading to a demand for semi-supervised semantic segmentation methods with application domains such as autonomous driving. Existing work very often employs relatively large segmentation backbone networks to improve segmentation accuracy, at the expense of computational costs. In addition, many use uniform sampling to reduce ground truth data requirements for learning needed, often resulting in sub-optimal performance. To address these issues, we propose a new pipeline that employs a smaller architecture, requiring fewer ground-truth annotations to achieve superior segmentation accuracy compared to contemporary approaches. This is facilitated via a novel Sparse Depthwise Separable Convolution module that significantly reduces the network parameter count while retaining overall task performance. To effectively sub-sample our training data, we propose a new Spatio-Temporal Redundant Frame Downsampling (ST-RFD) method that leverages knowledge of sensor motion within the environment to extract a more diverse subset of training data frame samples. To leverage the use of limited annotated data samples, we further propose a soft pseudo-label method informed by LiDAR reflectivity. Our method outperforms contemporary semi-supervised work in terms of mIoU, using less labeled data, on the SemanticKITTI (59.5@5%) and ScribbleKITTI (58.1@5%) benchmark datasets, based on a 2.3x reduction in model parameters and 641x fewer multiply-add operations whilst also demonstrating significant performance improvement on limited training data (i.e., Less is More). △ Less

Submitted 28 March, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR 2023; 11 pages, 8 figures; Code at https://github.com/l1997i/lim3d

arXiv:2303.05938 [pdf, other]

ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction

Authors: Zhengdi Yu, Shaoli Huang, Chen Fang, Toby P. Breckon, Jue Wang

Abstract: Reconstructing two hands from monocular RGB images is challenging due to frequent occlusion and mutual confusion. Existing methods mainly learn an entangled representation to encode two interacting hands, which are incredibly fragile to impaired interaction, such as truncated hands, separate hands, or external occlusion. This paper presents ACR (Attention Collaboration-based Regressor), which make… ▽ More Reconstructing two hands from monocular RGB images is challenging due to frequent occlusion and mutual confusion. Existing methods mainly learn an entangled representation to encode two interacting hands, which are incredibly fragile to impaired interaction, such as truncated hands, separate hands, or external occlusion. This paper presents ACR (Attention Collaboration-based Regressor), which makes the first attempt to reconstruct hands in arbitrary scenarios. To achieve this, ACR explicitly mitigates interdependencies between hands and between parts by leveraging center and part-based attention for feature extraction. However, reducing interdependence helps release the input constraint while weakening the mutual reasoning about reconstructing the interacting hands. Thus, based on center attention, ACR also learns cross-hand prior that handle the interacting hands better. We evaluate our method on various types of hand reconstruction datasets. Our method significantly outperforms the best interacting-hand approaches on the InterHand2.6M dataset while yielding comparable performance with the state-of-the-art single-hand methods on the FreiHand dataset. More qualitative results on in-the-wild and hand-object interaction datasets and web images/videos further demonstrate the effectiveness of our approach for arbitrary hand reconstruction. Our code is available at https://github.com/ZhengdiYu/Arbitrary-Hands-3D-Reconstruction. △ Less

Submitted 10 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR 2023; Code at https://github.com/ZhengdiYu/Arbitrary-Hands-3D-Reconstruction

arXiv:2303.03925 [pdf, other]

Robust Semi-Supervised Anomaly Detection via Adversarially Learned Continuous Noise Corruption

Authors: Jack W Barker, Neelanjan Bhowmik, Yona Falinie A Gaus, Toby P Breckon

Abstract: Anomaly detection is the task of recognising novel samples which deviate significantly from pre-establishednormality. Abnormal classes are not present during training meaning that models must learn effective rep-resentations solely across normal class data samples. Deep Autoencoders (AE) have been widely used foranomaly detection tasks, but suffer from overfitting to a null identity function. To a… ▽ More Anomaly detection is the task of recognising novel samples which deviate significantly from pre-establishednormality. Abnormal classes are not present during training meaning that models must learn effective rep-resentations solely across normal class data samples. Deep Autoencoders (AE) have been widely used foranomaly detection tasks, but suffer from overfitting to a null identity function. To address this problem, weimplement a training scheme applied to a Denoising Autoencoder (DAE) which introduces an efficient methodof producing Adversarially Learned Continuous Noise (ALCN) to maximally globally corrupt the input priorto denoising. Prior methods have applied similar approaches of adversarial training to increase the robustnessof DAE, however they exhibit limitations such as slow inference speed reducing their real-world applicabilityor producing generalised obfuscation which is more trivial to denoise. We show through rigorous evaluationthat our ALCN method of regularisation during training improves AUC performance during inference whileremaining efficient over both classical, leave-one-out novelty detection tasks with the variations-: 9 (normal)vs. 1 (abnormal) & 1 (normal) vs. 9 (abnormal); MNIST - AUCavg: 0.890 & 0.989, CIFAR-10 - AUCavg: 0.670& 0.742, in addition to challenging real-world anomaly detection tasks: industrial inspection (MVTEC-AD -AUCavg: 0.780) and plant disease detection (Plant Village - AUC: 0.770) when compared to prior approaches. △ Less

Submitted 2 March, 2023; originally announced March 2023.

Comments: 18th International Conference on Computer Vision Theory and Applications

Journal ref: Volume 4: VISAPP, ISBN 978-989-758-555-5, ISSN 2184-4321, pages 868-876. 2023

arXiv:2211.13508 [pdf, other]

1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results

Authors: Benjamin Kiefer, Matej Kristan, Janez Perš, Lojze Žust, Fabio Poiesi, Fabio Augusto de Alcantara Andrade, Alexandre Bernardino, Matthew Dawkins, Jenni Raitoharju, Yitong Quan, Adem Atmaca, Timon Höfer, Qiming Zhang, Yufei Xu, Jing Zhang, Dacheng Tao, Lars Sommer, Raphael Spraul, Hangyue Zhao, Hongpu Zhang, Yanyun Zhao, Jan Lukas Augustin, Eui-ik Jeon, Impyeong Lee, Luca Zedda , et al. (48 additional authors not shown)

Abstract: The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detec… ▽ More The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi. △ Less

Submitted 28 November, 2022; v1 submitted 24 November, 2022; originally announced November 2022.

Comments: MaCVi 2023 was part of WACV 2023. This report (38 pages) discusses the competition as part of MaCVi

arXiv:2211.12285 [pdf, other]

Exact-NeRF: An Exploration of a Precise Volumetric Parameterization for Neural Radiance Fields

Authors: Brian K. S. Isaac-Medina, Chris G. Willcocks, Toby P. Breckon

Abstract: Neural Radiance Fields (NeRF) have attracted significant attention due to their ability to synthesize novel scene views with great accuracy. However, inherent to their underlying formulation, the sampling of points along a ray with zero width may result in ambiguous representations that lead to further rendering artifacts such as aliasing in the final scene. To address this issue, the recent varia… ▽ More Neural Radiance Fields (NeRF) have attracted significant attention due to their ability to synthesize novel scene views with great accuracy. However, inherent to their underlying formulation, the sampling of points along a ray with zero width may result in ambiguous representations that lead to further rendering artifacts such as aliasing in the final scene. To address this issue, the recent variant mip-NeRF proposes an Integrated Positional Encoding (IPE) based on a conical view frustum. Although this is expressed with an integral formulation, mip-NeRF instead approximates this integral as the expected value of a multivariate Gaussian distribution. This approximation is reliable for short frustums but degrades with highly elongated regions, which arises when dealing with distant scene objects under a larger depth of field. In this paper, we explore the use of an exact approach for calculating the IPE by using a pyramid-based integral formulation instead of an approximated conical-based one. We denote this formulation as Exact-NeRF and contribute the first approach to offer a precise analytical solution to the IPE within the NeRF domain. Our exploratory work illustrates that such an exact formulation Exact-NeRF matches the accuracy of mip-NeRF and furthermore provides a natural extension to more challenging scenarios without further modification, such as in the case of unbounded scenes. Our contribution aims to both address the hitherto unexplored issues of frustum approximation in earlier NeRF work and additionally provide insight into the potential future consideration of analytical solutions in future NeRF extensions. △ Less

Submitted 25 March, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: 15 pages,10 figures

arXiv:2210.16453 [pdf, other]

Joint Sub-component Level Segmentation and Classification for Anomaly Detection within Dual-Energy X-Ray Security Imagery

Authors: Neelanjan Bhowmik, Toby P. Breckon

Abstract: X-ray baggage security screening is in widespread use and crucial to maintaining transport security for threat/anomaly detection tasks. The automatic detection of anomaly, which is concealed within cluttered and complex electronics/electrical items, using 2D X-ray imagery is of primary interest in recent years. We address this task by introducing joint object sub-component level segmentation and c… ▽ More X-ray baggage security screening is in widespread use and crucial to maintaining transport security for threat/anomaly detection tasks. The automatic detection of anomaly, which is concealed within cluttered and complex electronics/electrical items, using 2D X-ray imagery is of primary interest in recent years. We address this task by introducing joint object sub-component level segmentation and classification strategy using deep Convolution Neural Network architecture. The performance is evaluated over a dataset of cluttered X-ray baggage security imagery, consisting of consumer electrical and electronics items using variants of dual-energy X-ray imagery (pseudo-colour, high, low, and effective-Z). The proposed joint sub-component level segmentation and classification approach achieve ~99% true positive and ~5% false positive for anomaly detection task. △ Less

Submitted 28 October, 2022; originally announced October 2022.

arXiv:2210.14083 [pdf, other]

On Fine-Tuned Deep Features for Unsupervised Domain Adaptation

Authors: Qian Wang, Toby P. Breckon

Abstract: Prior feature transformation based approaches to Unsupervised Domain Adaptation (UDA) employ the deep features extracted by pre-trained deep models without fine-tuning them on the specific source or target domain data for a particular domain adaptation task. In contrast, end-to-end learning based approaches optimise the pre-trained backbones and the customised adaptation modules simultaneously to… ▽ More Prior feature transformation based approaches to Unsupervised Domain Adaptation (UDA) employ the deep features extracted by pre-trained deep models without fine-tuning them on the specific source or target domain data for a particular domain adaptation task. In contrast, end-to-end learning based approaches optimise the pre-trained backbones and the customised adaptation modules simultaneously to learn domain-invariant features for UDA. In this work, we explore the potential of combining fine-tuned features and feature transformation based UDA methods for improved domain adaptation performance. Specifically, we integrate the prevalent progressive pseudo-labelling techniques into the fine-tuning framework to extract fine-tuned features which are subsequently used in a state-of-the-art feature transformation based domain adaptation method SPL (Selective Pseudo-Labeling). Thorough experiments with multiple deep models including ResNet-50/101 and DeiT-small/base are conducted to demonstrate the combination of fine-tuned features and SPL can achieve state-of-the-art performance on several benchmark datasets. △ Less

Submitted 25 October, 2022; originally announced October 2022.

arXiv:2208.07613 [pdf, other]

Does lossy image compression affect racial bias within face recognition?

Authors: Seyma Yucer, Matt Poyser, Noura Al Moubayed, Toby P. Breckon

Abstract: Yes - This study investigates the impact of commonplace lossy image compression on face recognition algorithms with regard to the racial characteristics of the subject. We adopt a recently proposed racial phenotype-based bias analysis methodology to measure the effect of varying levels of lossy compression across racial phenotype categories. Additionally, we determine the relationship between chro… ▽ More Yes - This study investigates the impact of commonplace lossy image compression on face recognition algorithms with regard to the racial characteristics of the subject. We adopt a recently proposed racial phenotype-based bias analysis methodology to measure the effect of varying levels of lossy compression across racial phenotype categories. Additionally, we determine the relationship between chroma-subsampling and race-related phenotypes for recognition performance. Prior work investigates the impact of lossy JPEG compression algorithm on contemporary face recognition performance. However, there is a gap in how this impact varies with different race-related inter-sectional groups and the cause of this impact. Via an extensive experimental setup, we demonstrate that common lossy image compression approaches have a more pronounced negative impact on facial recognition performance for specific racial phenotype categories such as darker skin tones (by up to 34.55\%). Furthermore, removing chroma-subsampling during compression improves the false matching rate (up to 15.95\%) across all phenotype categories affected by the compression, including darker skin tones, wide noses, big lips, and monolid eye categories. In addition, we outline the characteristics that may be attributable as the underlying cause of such phenomenon for lossy compression algorithms such as JPEG. △ Less

Submitted 16 August, 2022; originally announced August 2022.

arXiv:2206.00432 [pdf, ps, other]

Evaluating Gaussian Grasp Maps for Generative Grasping Models

Authors: William Prew, Toby P. Breckon, Magnus Bordewich, Ulrik Beierholm

Abstract: Generalising robotic grasping to previously unseen objects is a key task in general robotic manipulation. The current method for training many antipodal generative grasping models rely on a binary ground truth grasp map generated from the centre thirds of correctly labelled grasp rectangles. However, these binary maps do not accurately reflect the positions in which a robotic arm can correctly gra… ▽ More Generalising robotic grasping to previously unseen objects is a key task in general robotic manipulation. The current method for training many antipodal generative grasping models rely on a binary ground truth grasp map generated from the centre thirds of correctly labelled grasp rectangles. However, these binary maps do not accurately reflect the positions in which a robotic arm can correctly grasp a given object. We propose a continuous Gaussian representation of annotated grasps to generate ground truth training data which achieves a higher success rate on a simulated robotic grasping benchmark. Three modern generative grasping networks are trained with either binary or Gaussian grasp maps, along with recent advancements from the robotic grasping literature, such as discretisation of grasp angles into bins and an attentional loss function. Despite negligible difference according to the standard rectangle metric, Gaussian maps better reproduce the training data and therefore improve success rates when tested on the same simulated robot arm by avoiding collisions with the object: achieving 87.94\% accuracy. Furthermore, the best performing model is shown to operate with a high success rate when transferred to a real robotic arm, at high inference speeds, without the need for transfer learning. The system is then shown to be capable of performing grasps on an antagonistic physical object dataset benchmark. △ Less

Submitted 1 June, 2022; originally announced June 2022.

Comments: 9 pages, 6 figures, to be published in IJCNN 2022

arXiv:2205.08002 [pdf, other]

Lost in Compression: the Impact of Lossy Image Compression on Variable Size Object Detection within Infrared Imagery

Authors: Neelanjan Bhowmik, Jack W. Barker, Yona Falinie A. Gaus, Toby P. Breckon

Abstract: Lossy image compression strategies allow for more efficient storage and transmission of data by encoding data to a reduced form. This is essential enable training with larger datasets on less storage-equipped environments. However, such compression can cause severe decline in performance of deep Convolution Neural Network (CNN) architectures even when mild compression is applied and the resulting… ▽ More Lossy image compression strategies allow for more efficient storage and transmission of data by encoding data to a reduced form. This is essential enable training with larger datasets on less storage-equipped environments. However, such compression can cause severe decline in performance of deep Convolution Neural Network (CNN) architectures even when mild compression is applied and the resulting compressed imagery is visually identical. In this work, we apply the lossy JPEG compression method with six discrete levels of increasing compression {95, 75, 50, 15, 10, 5} to infrared band (thermal) imagery. Our study quantitatively evaluates the affect that increasing levels of lossy compression has upon the performance of characteristically diverse object detection architectures (Cascade-RCNN, FSAF and Deformable DETR) with respect to varying sizes of objects present in the dataset. When training and evaluating on uncompressed data as a baseline, we achieve maximal mean Average Precision (mAP) of 0.823 with Cascade R-CNN across the FLIR dataset, outperforming prior work. The impact of the lossy compression is more extreme at higher compression levels (15, 10, 5) across all three CNN architectures. However, re-training models on lossy compressed imagery notably ameliorated performances for all three CNN models with an average increment of ~76% (at higher compression level 5). Additionally, we demonstrate the relative sensitivity of differing object areas {tiny, small, medium, large} with respect to the compression level. We show that tiny and small objects are more sensitive to compression than medium and large objects. Overall, Cascade R-CNN attains the maximal mAP across most of the object area categories. △ Less

Submitted 16 May, 2022; originally announced May 2022.

arXiv:2112.00556 [pdf, other]

Semi-Supervised Surface Anomaly Detection of Composite Wind Turbine Blades From Drone Imagery

Authors: Jack. W. Barker, Neelanjan Bhowmik, Toby. P. Breckon

Abstract: Within commercial wind energy generation, the monitoring and predictive maintenance of wind turbine blades in-situ is a crucial task, for which remote monitoring via aerial survey from an Unmanned Aerial Vehicle (UAV) is commonplace. Turbine blades are susceptible to both operational and weather-based damage over time, reducing the energy efficiency output of turbines. In this study, we address au… ▽ More Within commercial wind energy generation, the monitoring and predictive maintenance of wind turbine blades in-situ is a crucial task, for which remote monitoring via aerial survey from an Unmanned Aerial Vehicle (UAV) is commonplace. Turbine blades are susceptible to both operational and weather-based damage over time, reducing the energy efficiency output of turbines. In this study, we address automating the otherwise time-consuming task of both blade detection and extraction, together with fault detection within UAV-captured turbine blade inspection imagery. We propose BladeNet, an application-based, robust dual architecture to perform both unsupervised turbine blade detection and extraction, followed by super-pixel generation using the Simple Linear Iterative Clustering (SLIC) method to produce regional clusters. These clusters are then processed by a suite of semi-supervised detection methods. Our dual architecture detects surface faults of glass fibre composite material blades with high aptitude while requiring minimal prior manual image annotation. BladeNet produces an Average Precision (AP) of 0.995 across our Ørsted blade inspection dataset for offshore wind turbines and 0.223 across the Danish Technical University (DTU) NordTank turbine blade inspection dataset. BladeNet also obtains an AUC of 0.639 for surface anomaly detection across the Ørsted blade inspection dataset. △ Less

Submitted 1 December, 2021; originally announced December 2021.

Comments: In-proceedings at 2022 17th International Conference on Computer Vision Theory and Applications (VISAPP)

arXiv:2111.12701 [pdf, other]

Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes

Authors: Sam Bond-Taylor, Peter Hessey, Hiroshi Sasaki, Toby P. Breckon, Chris G. Willcocks

Abstract: Whilst diffusion probabilistic models can generate high quality image content, key limitations remain in terms of both generating high-resolution imagery and their associated high computational requirements. Recent Vector-Quantized image models have overcome this limitation of image resolution but are prohibitively slow and unidirectional as they generate tokens via element-wise autoregressive sam… ▽ More Whilst diffusion probabilistic models can generate high quality image content, key limitations remain in terms of both generating high-resolution imagery and their associated high computational requirements. Recent Vector-Quantized image models have overcome this limitation of image resolution but are prohibitively slow and unidirectional as they generate tokens via element-wise autoregressive sampling from the prior. By contrast, in this paper we propose a novel discrete diffusion probabilistic model prior which enables parallel prediction of Vector-Quantized tokens by using an unconstrained Transformer architecture as the backbone. During training, tokens are randomly masked in an order-agnostic manner and the Transformer learns to predict the original tokens. This parallelism of Vector-Quantized token prediction in turn facilitates unconditional generation of globally consistent high-resolution and diverse imagery at a fraction of the computational expense. In this manner, we can generate image resolutions exceeding that of the original training set samples whilst additionally provisioning per-image likelihood estimates (in a departure from generative adversarial approaches). Our approach achieves state-of-the-art results in terms of Density (LSUN Bedroom: 1.51; LSUN Churches: 1.12; FFHQ: 1.20) and Coverage (LSUN Bedroom: 0.83; LSUN Churches: 0.73; FFHQ: 0.80), and performs competitively on FID (LSUN Bedroom: 3.64; LSUN Churches: 4.07; FFHQ: 6.11) whilst offering advantages in terms of both computation and reduced training set requirements. △ Less

Submitted 24 November, 2021; originally announced November 2021.

Comments: 19 pages, 14 figures

MSC Class: 68T01 (Primary); 68T07 (Secondary) ACM Class: I.5.0; I.4.0; G.3

arXiv:2110.12635 [pdf, other]

Progressively Select and Reject Pseudo-labelled Samples for Open-Set Domain Adaptation

Authors: Qian Wang, Fanlin Meng, Toby P. Breckon

Abstract: Domain adaptation solves image classification problems in the target domain by taking advantage of the labelled source data and unlabelled target data. Usually, the source and target domains share the same set of classes. As a special case, Open-Set Domain Adaptation (OSDA) assumes there exist additional classes in the target domain but not present in the source domain. To solve such a domain adap… ▽ More Domain adaptation solves image classification problems in the target domain by taking advantage of the labelled source data and unlabelled target data. Usually, the source and target domains share the same set of classes. As a special case, Open-Set Domain Adaptation (OSDA) assumes there exist additional classes in the target domain but not present in the source domain. To solve such a domain adaptation problem, our proposed method learns discriminative common subspaces for the source and target domains using a novel Open-Set Locality Preserving Projection (OSLPP) algorithm. The source and target domain data are aligned in the learned common spaces class-wisely. To handle the open-set classification problem, our method progressively selects target samples to be pseudo-labelled as known classes and rejects the outliers if they are detected as from unknown classes. The common subspace learning algorithm OSLPP simultaneously aligns the labelled source data and pseudo-labelled target data from known classes and pushes the rejected target data away from the known classes. The common subspace learning and the pseudo-labelled sample selection/rejection facilitate each other in an iterative learning framework and achieves state-of-the-art performance on benchmark datasets Office-31 and Office-Home with the average HOS of 87.4% and 67.0% respectively. △ Less

Submitted 25 October, 2021; originally announced October 2021.

Comments: 8 pages

arXiv:2110.09839 [pdf, other]

doi 10.1109/WACV51458.2022.00326

Measuring Hidden Bias within Face Recognition via Racial Phenotypes

Authors: Seyma Yucer, Furkan Tektas, Noura Al Moubayed, Toby P. Breckon

Abstract: Recent work reports disparate performance for intersectional racial groups across face recognition tasks: face verification and identification. However, the definition of those racial groups has a significant impact on the underlying findings of such racial bias analysis. Previous studies define these groups based on either demographic information (e.g. African, Asian etc.) or skin tone (e.g. ligh… ▽ More Recent work reports disparate performance for intersectional racial groups across face recognition tasks: face verification and identification. However, the definition of those racial groups has a significant impact on the underlying findings of such racial bias analysis. Previous studies define these groups based on either demographic information (e.g. African, Asian etc.) or skin tone (e.g. lighter or darker skins). The use of such sensitive or broad group definitions has disadvantages for bias investigation and subsequent counter-bias solutions design. By contrast, this study introduces an alternative racial bias analysis methodology via facial phenotype attributes for face recognition. We use the set of observable characteristics of an individual face where a race-related facial phenotype is hence specific to the human face and correlated to the racial profile of the subject. We propose categorical test cases to investigate the individual influence of those attributes on bias within face recognition tasks. We compare our phenotype-based grouping methodology with previous grouping strategies and show that phenotype-based groupings uncover hidden bias without reliance upon any potentially protected attributes or ill-defined grouping strategies. Furthermore, we contribute corresponding phenotype attribute category labels for two face recognition tasks: RFW for face verification and VGGFace2 (test set) for face identification. △ Less

Submitted 19 October, 2021; originally announced October 2021.

Comments: published in IEEE Winter Conference on Applications of Computer Vision, WACV, 2022

arXiv:2110.04906 [pdf, other]

Operationalizing Convolutional Neural Network Architectures for Prohibited Object Detection in X-Ray Imagery

Authors: Thomas W. Webb, Neelanjan Bhowmik, Yona Falinie A. Gaus, Toby P. Breckon

Abstract: The recent advancement in deep Convolutional Neural Network (CNN) has brought insight into the automation of X-ray security screening for aviation security and beyond. Here, we explore the viability of two recent end-to-end object detection CNN architectures, Cascade R-CNN and FreeAnchor, for prohibited item detection by balancing processing time and the impact of image data compression from an op… ▽ More The recent advancement in deep Convolutional Neural Network (CNN) has brought insight into the automation of X-ray security screening for aviation security and beyond. Here, we explore the viability of two recent end-to-end object detection CNN architectures, Cascade R-CNN and FreeAnchor, for prohibited item detection by balancing processing time and the impact of image data compression from an operational viewpoint. Overall, we achieve maximal detection performance using a FreeAnchor architecture with a ResNet50 backbone, obtaining mean Average Precision (mAP) of 87.7 and 85.8 for using the OPIXray and SIXray benchmark datasets, showing superior performance over prior work on both. With fewer parameters and less training time, FreeAnchor achieves the highest detection inference speed of ~13 fps (3.9 ms per image). Furthermore, we evaluate the impact of lossy image compression upon detector performance. The CNN models display substantial resilience to the lossy compression, resulting in only a 1.1% decrease in mAP at the JPEG compression level of 50. Additionally, a thorough evaluation of data augmentation techniques is provided, including adaptions of MixUp and CutMix strategy as well as other standard transformations, further improving the detection accuracy. △ Less

Submitted 10 October, 2021; originally announced October 2021.

arXiv:2108.12505 [pdf, other]

On the impact of using X-ray energy response imagery for object detection via Convolutional Neural Networks

Authors: Neelanjan Bhowmik, Yona Falinie A. Gaus, Toby P. Breckon

Abstract: Automatic detection of prohibited items within complex and cluttered X-ray security imagery is essential to maintaining transport security, where prior work on automatic prohibited item detection focus primarily on pseudo-colour (rgb}) X-ray imagery. In this work we study the impact of variant X-ray imagery, i.e., X-ray energy response (high, low}) and effective-z compared to rgb, via the use of d… ▽ More Automatic detection of prohibited items within complex and cluttered X-ray security imagery is essential to maintaining transport security, where prior work on automatic prohibited item detection focus primarily on pseudo-colour (rgb}) X-ray imagery. In this work we study the impact of variant X-ray imagery, i.e., X-ray energy response (high, low}) and effective-z compared to rgb, via the use of deep Convolutional Neural Networks (CNN) for the joint object detection and segmentation task posed within X-ray baggage security screening. We evaluate state-of-the-art CNN architectures (Mask R-CNN, YOLACT, CARAFE and Cascade Mask R-CNN) to explore the transferability of models trained with such 'raw' variant imagery between the varying X-ray security scanners that exhibits differing imaging geometries, image resolutions and material colour profiles. Overall, we observe maximal detection performance using CARAFE, attributable to training using combination of rgb, high, low, and effective-z X-ray imagery, obtaining 0.7 mean Average Precision (mAP) for a six class object detection problem. Our results also exhibit a remarkable degree of generalisation capability in terms of cross-scanner transferability (AP: 0.835/0.611) for a one class object detection problem by combining rgb, high, low, and effective-z imagery. △ Less

Submitted 27 August, 2021; originally announced August 2021.

arXiv:2104.13702 [pdf, ps, other]

PANDA : Perceptually Aware Neural Detection of Anomalies

Authors: Jack W. Barker, Toby P. Breckon

Abstract: Semi-supervised methods of anomaly detection have seen substantial advancement in recent years. Of particular interest are applications of such methods to diverse, real-world anomaly detection problems where anomalous variations can vary from the visually obvious to the very subtle. In this work, we propose a novel fine-grained VAE-GAN architecture trained in a semi-supervised manner in order to d… ▽ More Semi-supervised methods of anomaly detection have seen substantial advancement in recent years. Of particular interest are applications of such methods to diverse, real-world anomaly detection problems where anomalous variations can vary from the visually obvious to the very subtle. In this work, we propose a novel fine-grained VAE-GAN architecture trained in a semi-supervised manner in order to detect both visually distinct and subtle anomalies. With the use of a residually connected dual-feature extractor, a fine-grained discriminator and a perceptual loss function, we are able to detect subtle, low inter-class (anomaly vs. normal) variant anomalies with greater detection capability and smaller margins of deviation in AUC value during inference compared to prior work whilst also remaining time-efficient during inference. We achieve state of-the-art anomaly detection results when compared extensively with prior semi-supervised approaches across a multitude of anomaly detection benchmark tasks including trivial leave-one out tasks (CIFAR-10 - AUPRCavg: 0.91; MNIST - AUPRCavg: 0.90) in addition to challenging real-world anomaly detection tasks (plant leaf disease - AUC: 0.776; threat item X-ray - AUC: 0.51), video frame-level anomaly detection (UCSDPed1 - AUC: 0.95) and high frequency texture with object anomalous defect detection (MVTEC - AUCavg: 0.83). △ Less

Submitted 28 April, 2021; originally announced April 2021.

Comments: In-proceedings at 2021 International Joint Conference on Neural Networks (IJCNN)

arXiv:2104.06219 [pdf, other]

UAV-ReID: A Benchmark on Unmanned Aerial Vehicle Re-identification in Video Imagery

Authors: Daniel Organisciak, Matthew Poyser, Aishah Alsehaim, Shanfeng Hu, Brian K. S. Isaac-Medina, Toby P. Breckon, Hubert P. H. Shum

Abstract: As unmanned aerial vehicles (UAVs) become more accessible with a growing range of applications, the potential risk of UAV disruption increases. Recent development in deep learning allows vision-based counter-UAV systems to detect and track UAVs with a single camera. However, the coverage of a single camera is limited, necessitating the need for multicamera configurations to match UAVs across camer… ▽ More As unmanned aerial vehicles (UAVs) become more accessible with a growing range of applications, the potential risk of UAV disruption increases. Recent development in deep learning allows vision-based counter-UAV systems to detect and track UAVs with a single camera. However, the coverage of a single camera is limited, necessitating the need for multicamera configurations to match UAVs across cameras - a problem known as re-identification (reID). While there has been extensive research on person and vehicle reID to match objects across time and viewpoints, to the best of our knowledge, there has been no research in UAV reID. UAVs are challenging to re-identify: they are much smaller than pedestrians and vehicles and they are often detected in the air so appear at a greater range of angles. Because no UAV data sets currently use multiple cameras, we propose the first new UAV re-identification data set, UAV-reID, that facilitates the development of machine learning solutions in this emerging area. UAV-reID has two settings: Temporally-Near to evaluate performance across views to assist tracking frameworks, and Big-to-Small to evaluate reID performance across scale and to allow early reID when UAVs are detected from a long distance. We conduct a benchmark study by extensively evaluating different reID backbones and loss functions. We demonstrate that with the right setup, deep networks are powerful enough to learn good representations for UAVs, achieving 81.9% mAP on the Temporally-Near setting and 46.5% on the challenging Big-to-Small setting. Furthermore, we find that vision transformers are the most robust to extreme variance of scale. △ Less

Submitted 2 December, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

arXiv:2104.05358 [pdf, other]

UNIT-DDPM: UNpaired Image Translation with Denoising Diffusion Probabilistic Models

Authors: Hiroshi Sasaki, Chris G. Willcocks, Toby P. Breckon

Abstract: We propose a novel unpaired image-to-image translation method that uses denoising diffusion probabilistic models without requiring adversarial training. Our method, UNpaired Image Translation with Denoising Diffusion Probabilistic Models (UNIT-DDPM), trains a generative model to infer the joint distribution of images over both domains as a Markov chain by minimising a denoising score matching obje… ▽ More We propose a novel unpaired image-to-image translation method that uses denoising diffusion probabilistic models without requiring adversarial training. Our method, UNpaired Image Translation with Denoising Diffusion Probabilistic Models (UNIT-DDPM), trains a generative model to infer the joint distribution of images over both domains as a Markov chain by minimising a denoising score matching objective conditioned on the other domain. In particular, we update both domain translation models simultaneously, and we generate target domain images by a denoising Markov Chain Monte Carlo approach that is conditioned on the input source domain images, based on Langevin dynamics. Our approach provides stable model training for image-to-image translation and generates high-quality image outputs. This enables state-of-the-art Fréchet Inception Distance (FID) performance on several public datasets, including both colour and multispectral imagery, significantly outperforming the contemporary adversarial image-to-image translation methods. △ Less

Submitted 12 April, 2021; originally announced April 2021.

Comments: 10 pages, 8 figures

arXiv:2103.13933 [pdf, other]

doi 10.1109/ICCVW54120.2021.00142

Unmanned Aerial Vehicle Visual Detection and Tracking using Deep Neural Networks: A Performance Benchmark

Authors: Brian K. S. Isaac-Medina, Matt Poyser, Daniel Organisciak, Chris G. Willcocks, Toby P. Breckon, Hubert P. H. Shum

Abstract: Unmanned Aerial Vehicles (UAV) can pose a major risk for aviation safety, due to both negligent and malicious use. For this reason, the automated detection and tracking of UAV is a fundamental task in aerial security systems. Common technologies for UAV detection include visible-band and thermal infrared imaging, radio frequency and radar. Recent advances in deep neural networks (DNNs) for image-b… ▽ More Unmanned Aerial Vehicles (UAV) can pose a major risk for aviation safety, due to both negligent and malicious use. For this reason, the automated detection and tracking of UAV is a fundamental task in aerial security systems. Common technologies for UAV detection include visible-band and thermal infrared imaging, radio frequency and radar. Recent advances in deep neural networks (DNNs) for image-based object detection open the possibility to use visual information for this detection and tracking task. Furthermore, these detection architectures can be implemented as backbones for visual tracking systems, thereby enabling persistent tracking of UAV incursions. To date, no comprehensive performance benchmark exists that applies DNNs to visible-band imagery for UAV detection and tracking. To this end, three datasets with varied environmental conditions for UAV detection and tracking, comprising a total of 241 videos (331,486 images), are assessed using four detection architectures and three tracking frameworks. The best performing detector architecture obtains an mAP of 98.6% and the best performing tracking framework obtains a MOTA of 96.3%. Cross-modality evaluation is carried out between visible and infrared spectrums, achieving a maximal 82.8% mAP on visible images when training in the infrared modality. These results provide the first public multi-approach benchmark for state-of-the-art deep learning-based methods and give insight into which detection and tracking architectures are effective in the UAV domain. △ Less

Submitted 18 August, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

arXiv:2012.11753 [pdf, other]

Contraband Materials Detection Within Volumetric 3D Computed Tomography Baggage Security Screening Imagery

Authors: Qian Wang, Toby P. Breckon

Abstract: Automatic prohibited object detection within 2D/3D X-ray Computed Tomography (CT) has been studied in literature to enhance the aviation security screening at checkpoints. Deep Convolutional Neural Networks (CNN) have demonstrated superior performance in 2D X-ray imagery. However, there exists very limited proof of how deep neural networks perform in materials detection within volumetric 3D CT bag… ▽ More Automatic prohibited object detection within 2D/3D X-ray Computed Tomography (CT) has been studied in literature to enhance the aviation security screening at checkpoints. Deep Convolutional Neural Networks (CNN) have demonstrated superior performance in 2D X-ray imagery. However, there exists very limited proof of how deep neural networks perform in materials detection within volumetric 3D CT baggage screening imagery. We attempt to close this gap by applying Deep Neural Networks in 3D contraband substance detection based on their material signatures. Specifically, we formulate it as a 3D semantic segmentation problem to identify material types for all voxels based on which contraband materials can be detected. To this end, we firstly investigate 3D CNN based semantic segmentation algorithms such as 3D U-Net and its variants. In contrast to the original dense representation form of volumetric 3D CT data, we propose to convert the CT volumes into sparse point clouds which allows the use of point cloud processing approaches such as PointNet++ towards more efficient processing. Experimental results on a publicly available dataset (NEU ATR) demonstrate the effectiveness of both 3D U-Net and PointNet++ in materials detection in 3D CT imagery for baggage security screening. △ Less

Submitted 21 December, 2020; originally announced December 2020.

Comments: 8 pages

arXiv:2012.05320 [pdf, other]

Multi-Model Learning for Real-Time Automotive Semantic Foggy Scene Understanding via Domain Adaptation

Authors: Naif Alshammari, Samet Akcay, Toby P. Breckon

Abstract: Robust semantic scene segmentation for automotive applications is a challenging problem in two key aspects: (1) labelling every individual scene pixel and (2) performing this task under unstable weather and illumination changes (e.g., foggy weather), which results in poor outdoor scene visibility. Such visibility limitations lead to non-optimal performance of generalised deep convolutional neural… ▽ More Robust semantic scene segmentation for automotive applications is a challenging problem in two key aspects: (1) labelling every individual scene pixel and (2) performing this task under unstable weather and illumination changes (e.g., foggy weather), which results in poor outdoor scene visibility. Such visibility limitations lead to non-optimal performance of generalised deep convolutional neural network-based semantic scene segmentation. In this paper, we propose an efficient end-to-end automotive semantic scene understanding approach that is robust to foggy weather conditions. As an end-to-end pipeline, our proposed approach provides: (1) the transformation of imagery from foggy to clear weather conditions using a domain transfer approach (correcting for poor visibility) and (2) semantically segmenting the scene using a competitive encoder-decoder architecture with low computational complexity (enabling real-time performance). Our approach incorporates RGB colour, depth and luminance images via distinct encoders with dense connectivity and features fusion to effectively exploit information from different inputs, which contributes to an optimal feature representation within the overall model. Using this architectural formulation with dense skip connections, our model achieves comparable performance to contemporary approaches at a fraction of the overall model complexity. △ Less

Submitted 9 December, 2020; originally announced December 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1909.07697

arXiv:2012.05304 [pdf, other]

Competitive Simplicity for Multi-Task Learning for Real-Time Foggy Scene Understanding via Domain Adaptation

Authors: Naif Alshammari, Samet Akcay, Toby P. Breckon

Abstract: Automotive scene understanding under adverse weather conditions raises a realistic and challenging problem attributable to poor outdoor scene visibility (e.g. foggy weather). However, because most contemporary scene understanding approaches are applied under ideal-weather conditions, such approaches may not provide genuinely optimal performance when compared to established a priori insights on ext… ▽ More Automotive scene understanding under adverse weather conditions raises a realistic and challenging problem attributable to poor outdoor scene visibility (e.g. foggy weather). However, because most contemporary scene understanding approaches are applied under ideal-weather conditions, such approaches may not provide genuinely optimal performance when compared to established a priori insights on extreme-weather understanding. In this paper, we propose a complex but competitive multi-task learning approach capable of performing in real-time semantic scene understanding and monocular depth estimation under foggy weather conditions by leveraging both recent advances in adversarial training and domain adaptation. As an end-to-end pipeline, our model provides a novel solution to surpass degraded visibility in foggy weather conditions by transferring scenes from foggy to normal using a GAN-based model. For optimal performance in semantic segmentation, our model generates depth to be used as complementary source information with RGB in the segmentation network. We provide a robust method for foggy scene understanding by training two models (normal and foggy) simultaneously with shared weights (each model is trained on each weather condition independently). Our model incorporates RGB colour, depth, and luminance images via distinct encoders with dense connectivity and features fusing, and leverages skip connections to produce consistent depth and segmentation predictions. Using this architectural formulation with light computational complexity at inference time, we are able to achieve comparable performance to contemporary approaches at a fraction of the overall model complexity. △ Less

Submitted 9 December, 2020; originally announced December 2020.

arXiv:2012.00848 [pdf, other]

Data Augmentation with norm-VAE for Unsupervised Domain Adaptation

Authors: Qian Wang, Fanlin Meng, Toby P. Breckon

Abstract: We address the Unsupervised Domain Adaptation (UDA) problem in image classification from a new perspective. In contrast to most existing works which either align the data distributions or learn domain-invariant features, we directly learn a unified classifier for both domains within a high-dimensional homogeneous feature space without explicit domain adaptation. To this end, we employ the effectiv… ▽ More We address the Unsupervised Domain Adaptation (UDA) problem in image classification from a new perspective. In contrast to most existing works which either align the data distributions or learn domain-invariant features, we directly learn a unified classifier for both domains within a high-dimensional homogeneous feature space without explicit domain adaptation. To this end, we employ the effective Selective Pseudo-Labelling (SPL) techniques to take advantage of the unlabelled samples in the target domain. Surprisingly, data distribution discrepancy across the source and target domains can be well handled by a computationally simple classifier (e.g., a shallow Multi-Layer Perceptron) trained in the original feature space. Besides, we propose a novel generative model norm-VAE to generate synthetic features for the target domain as a data augmentation strategy to enhance classifier training. Experimental results on several benchmark datasets demonstrate the pseudo-labelling strategy itself can lead to comparable performance to many state-of-the-art methods whilst the use of norm-VAE for feature augmentation can further improve the performance in most cases. As a result, our proposed methods (i.e. naive-SPL and norm-VAE-SPL) can achieve new state-of-the-art performance with the average accuracy of 93.4% and 90.4% on Office-Caltech and ImageCLEF-DA datasets, and comparable performance on Digits, Office31 and Office-Home datasets with the average accuracy of 97.2%, 87.6% and 67.9% respectively. △ Less

Submitted 1 December, 2020; originally announced December 2020.

Comments: 12 pages

arXiv:2010.08833 [pdf, other]

Efficient and Compact Convolutional Neural Network Architectures for Non-temporal Real-time Fire Detection

Authors: William Thomson, Neelanjan Bhowmik, Toby P. Breckon

Abstract: Automatic visual fire detection is used to complement traditional fire detection sensor systems (smoke/heat). In this work, we investigate different Convolutional Neural Network (CNN) architectures and their variants for the non-temporal real-time bounds detection of fire pixel regions in video (or still) imagery. Two reduced complexity compact CNN architectures (NasNet-A-OnFire and ShuffleNetV2-O… ▽ More Automatic visual fire detection is used to complement traditional fire detection sensor systems (smoke/heat). In this work, we investigate different Convolutional Neural Network (CNN) architectures and their variants for the non-temporal real-time bounds detection of fire pixel regions in video (or still) imagery. Two reduced complexity compact CNN architectures (NasNet-A-OnFire and ShuffleNetV2-OnFire) are proposed through experimental analysis to optimise the computational efficiency for this task. The results improve upon the current state-of-the-art solution for fire detection, achieving an accuracy of 95% for full-frame binary classification and 97% for superpixel localisation. We notably achieve a classification speed up by a factor of 2.3x for binary classification and 1.3x for superpixel localisation, with runtime of 40 fps and 18 fps respectively, outperforming prior work in the field presenting an efficient, robust and real-time solution for fire region detection. Subsequent implementation on low-powered devices (Nvidia Xavier-NX, achieving 49 fps for full-frame classification via ShuffleNetV2-OnFire) demonstrates our architectures are suitable for various real-world deployment applications. △ Less

Submitted 17 October, 2020; originally announced October 2020.

arXiv:2008.06318 [pdf, other]

Not 3D Re-ID: a Simple Single Stream 2D Convolution for Robust Video Re-identification

Authors: Toby P. Breckon, Aishah Alsehaim

Abstract: Video-based person re-identification has received increasing attention recently, as it plays an important role within surveillance video analysis. Video-based Re-ID is an expansion of earlier image-based re-identification methods by learning features from a video via multiple image frames for each person. Most contemporary video Re-ID methods utilise complex CNNbased network architectures using 3D… ▽ More Video-based person re-identification has received increasing attention recently, as it plays an important role within surveillance video analysis. Video-based Re-ID is an expansion of earlier image-based re-identification methods by learning features from a video via multiple image frames for each person. Most contemporary video Re-ID methods utilise complex CNNbased network architectures using 3D convolution or multibranch networks to extract spatial-temporal video features. By contrast, in this paper, we illustrate superior performance from a simple single stream 2D convolution network leveraging the ResNet50-IBN architecture to extract frame-level features followed by temporal attention for clip level features. These clip level features can be generalised to extract video level features by averaging without any significant additional cost. Our approach uses best video Re-ID practice and transfer learning between datasets to outperform existing state-of-the-art approaches on the MARS, PRID2011 and iLIDS-VID datasets with 89:62%, 97:75%, 97:33% rank-1 accuracy respectively and with 84:61% mAP for MARS, without reliance on complex and memory intensive 3D convolutions or multi-stream networks architectures as found in other contemporary work. Conversely, our work shows that global features extracted by the 2D convolution network are a sufficient representation for robust state of the art video Re-ID. △ Less

Submitted 17 August, 2020; v1 submitted 14 August, 2020; originally announced August 2020.

Comments: have been submitted to ICPR 2020 and has been ACCEPTED for presentation and inclusion in the proceedings

arXiv:2008.01218 [pdf, other]

Multi-Class 3D Object Detection Within Volumetric 3D Computed Tomography Baggage Security Screening Imagery

Authors: Qian Wang, Neelanjan Bhowmik, Toby P. Breckon

Abstract: Automatic detection of prohibited objects within passenger baggage is important for aviation security. X-ray Computed Tomography (CT) based 3D imaging is widely used in airports for aviation security screening whilst prior work on automatic prohibited item detection focus primarily on 2D X-ray imagery. These works have proven the possibility of extending deep convolutional neural networks (CNN) ba… ▽ More Automatic detection of prohibited objects within passenger baggage is important for aviation security. X-ray Computed Tomography (CT) based 3D imaging is widely used in airports for aviation security screening whilst prior work on automatic prohibited item detection focus primarily on 2D X-ray imagery. These works have proven the possibility of extending deep convolutional neural networks (CNN) based automatic prohibited item detection from 2D X-ray imagery to volumetric 3D CT baggage security screening imagery. However, previous work on 3D object detection in baggage security screening imagery focused on the detection of one specific type of objects (e.g., either {\it bottles} or {\it handguns}). As a result, multiple models are needed if more than one type of prohibited item is required to be detected in practice. In this paper, we consider the detection of multiple object categories of interest using one unified framework. To this end, we formulate a more challenging multi-class 3D object detection problem within 3D CT imagery and propose a viable solution (3D RetinaNet) to tackle this problem. To enhance the performance of detection we investigate a variety of strategies including data augmentation and varying backbone networks. Experimentation carried out to provide both quantitative and qualitative evaluations of the proposed approach to multi-class 3D object detection within 3D CT baggage security screening imagery. Experimental results demonstrate the combination of the 3D RetinaNet and a series of favorable strategies can achieve a mean Average Precision (mAP) of 65.3\% over five object classes (i.e. {\it bottles, handguns, binoculars, glock frames, iPods}). The overall performance is affected by the poor performance on {\it glock frames} and {\it iPods} due to the lack of data and their resemblance with the baggage clutter. △ Less

Submitted 3 August, 2020; originally announced August 2020.

Comments: Durham University

arXiv:2008.01214 [pdf, other]

Generalized Zero-Shot Domain Adaptation via Coupled Conditional Variational Autoencoders

Authors: Qian Wang, Toby P. Breckon

Abstract: Domain adaptation approaches aim to exploit useful information from the source domain where supervised learning examples are easier to obtain to address a learning problem in the target domain where there is no or limited availability of such examples. In classification problems, domain adaptation has been studied under varying supervised, unsupervised and semi-supervised conditions. However, a co… ▽ More Domain adaptation approaches aim to exploit useful information from the source domain where supervised learning examples are easier to obtain to address a learning problem in the target domain where there is no or limited availability of such examples. In classification problems, domain adaptation has been studied under varying supervised, unsupervised and semi-supervised conditions. However, a common situation when the labelled samples are available for a subset of target domain classes has been overlooked. In this paper, we formulate this particular domain adaptation problem within a generalized zero-shot learning framework by treating the labelled source domain samples as semantic representations for zero-shot learning. For this particular problem, neither conventional domain adaptation approaches nor zero-shot learning algorithms directly apply. To address this generalized zero-shot domain adaptation problem, we present a novel Coupled Conditional Variational Autoencoder (CCVAE) which can generate synthetic target domain features for unseen classes from their source domain counterparts. Extensive experiments have been conducted on three domain adaptation datasets including a bespoke X-ray security checkpoint dataset to simulate a real-world application in aviation security. The results demonstrate the effectiveness of our proposed approach both against established benchmarks and in terms of real-world applicability. △ Less

Submitted 3 August, 2020; originally announced August 2020.

Comments: Durham University

arXiv:2007.14314 [pdf, other]

On the Impact of Lossy Image and Video Compression on the Performance of Deep Convolutional Neural Network Architectures

Authors: Matt Poyser, Amir Atapour-Abarghouei, Toby P. Breckon

Abstract: Recent advances in generalized image understanding have seen a surge in the use of deep convolutional neural networks (CNN) across a broad range of image-based detection, classification and prediction tasks. Whilst the reported performance of these approaches is impressive, this study investigates the hitherto unapproached question of the impact of commonplace image and video compression technique… ▽ More Recent advances in generalized image understanding have seen a surge in the use of deep convolutional neural networks (CNN) across a broad range of image-based detection, classification and prediction tasks. Whilst the reported performance of these approaches is impressive, this study investigates the hitherto unapproached question of the impact of commonplace image and video compression techniques on the performance of such deep learning architectures. Focusing on the JPEG and H.264 (MPEG-4 AVC) as a representative proxy for contemporary lossy image/video compression techniques that are in common use within network-connected image/video devices and infrastructure, we examine the impact on performance across five discrete tasks: human pose estimation, semantic segmentation, object detection, action recognition, and monocular depth estimation. As such, within this study we include a variety of network architectures and domains spanning end-to-end convolution, encoder-decoder, region-based CNN (R-CNN), dual-stream, and generative adversarial networks (GAN). Our results show a non-linear and non-uniform relationship between network performance and the level of lossy compression applied. Notably, performance decreases significantly below a JPEG quality (quantization) level of 15% and a H.264 Constant Rate Factor (CRF) of 40. However, retraining said architectures on pre-compressed imagery conversely recovers network performance by up to 78.4% in some cases. Furthermore, there is a correlation between architectures employing an encoder-decoder pipeline and those that demonstrate resilience to lossy image compression. The characteristics of the relationship between input compression to output task performance can be used to inform design decisions within future image/video devices and infrastructure. △ Less

Submitted 28 July, 2020; originally announced July 2020.

Comments: 8 pages, 21 figures, to be published in ICPR 2020 conference

arXiv:2005.02436 [pdf, other]

Data Augmentation via Mixed Class Interpolation using Cycle-Consistent Generative Adversarial Networks Applied to Cross-Domain Imagery

Authors: Hiroshi Sasaki, Chris G. Willcocks, Toby P. Breckon

Abstract: Machine learning driven object detection and classification within non-visible imagery has an important role in many fields such as night vision, all-weather surveillance and aviation security. However, such applications often suffer due to the limited quantity and variety of non-visible spectral domain imagery, in contrast to the high data availability of visible-band imagery that readily enables… ▽ More Machine learning driven object detection and classification within non-visible imagery has an important role in many fields such as night vision, all-weather surveillance and aviation security. However, such applications often suffer due to the limited quantity and variety of non-visible spectral domain imagery, in contrast to the high data availability of visible-band imagery that readily enables contemporary deep learning driven detection and classification approaches. To address this problem, this paper proposes and evaluates a novel data augmentation approach that leverages the more readily available visible-band imagery via a generative domain transfer model. The model can synthesise large volumes of non-visible domain imagery by image-to-image (I2I) translation from the visible image domain. Furthermore, we show that the generation of interpolated mixed class (non-visible domain) image examples via our novel Conditional CycleGAN Mixup Augmentation (C2GMA) methodology can lead to a significant improvement in the quality of non-visible domain classification tasks that otherwise suffer due to limited data availability. Focusing on classification within the Synthetic Aperture Radar (SAR) domain, our approach is evaluated on a variation of the Statoil/C-CORE Iceberg Classifier Challenge dataset and achieves 75.4% accuracy, demonstrating a significant improvement when compared against traditional data augmentation strategies (Rotation, Mixup, and MixCycleGAN). △ Less

Submitted 1 January, 2021; v1 submitted 5 May, 2020; originally announced May 2020.

Comments: 9 pages, 9 figures, accepted at the 25th International Conference on Pattern Recognition (ICPR 2020)

arXiv:2004.12427 [pdf, other]

Cross-Domain Structure Preserving Projection for Heterogeneous Domain Adaptation

Authors: Qian Wang, Toby P. Breckon

Abstract: Heterogeneous Domain Adaptation (HDA) addresses the transfer learning problems where data from the source and target domains are of different modalities (e.g., texts and images) or feature dimensions (e.g., features extracted with different methods). It is useful for multi-modal data analysis. Traditional domain adaptation algorithms assume that the representations of source and target samples res… ▽ More Heterogeneous Domain Adaptation (HDA) addresses the transfer learning problems where data from the source and target domains are of different modalities (e.g., texts and images) or feature dimensions (e.g., features extracted with different methods). It is useful for multi-modal data analysis. Traditional domain adaptation algorithms assume that the representations of source and target samples reside in the same feature space, hence are likely to fail in solving the heterogeneous domain adaptation problem. Contemporary state-of-the-art HDA approaches are usually composed of complex optimization objectives for favourable performance and are therefore computationally expensive and less generalizable. To address these issues, we propose a novel Cross-Domain Structure Preserving Projection (CDSPP) algorithm for HDA. As an extension of the classic LPP to heterogeneous domains, CDSPP aims to learn domain-specific projections to map sample features from source and target domains into a common subspace such that the class consistency is preserved and data distributions are sufficiently aligned. CDSPP is simple and has deterministic solutions by solving a generalized eigenvalue problem. It is naturally suitable for supervised HDA but has also been extended for semi-supervised HDA where the unlabelled target domain samples are available. Extensive experiments have been conducted on commonly used benchmark datasets (i.e. Office-Caltech, Multilingual Reuters Collection, NUS-WIDE-ImageNet) for HDA as well as the Office-Home dataset firstly introduced for HDA by ourselves due to its significantly larger number of classes than the existing ones (65 vs 10, 6 and 8). The experimental results of both supervised and semi-supervised HDA demonstrate the superior performance of our proposed method against contemporary state-of-the-art methods. △ Less

Submitted 8 October, 2021; v1 submitted 26 April, 2020; originally announced April 2020.

Comments: Technical Report

arXiv:2004.08945 [pdf, other]

Exploring Racial Bias within Face Recognition via per-subject Adversarially-Enabled Data Augmentation

Authors: Seyma Yucer, Samet Akçay, Noura Al-Moubayed, Toby P. Breckon

Abstract: Whilst face recognition applications are becoming increasingly prevalent within our daily lives, leading approaches in the field still suffer from performance bias to the detriment of some racial profiles within society. In this study, we propose a novel adversarial derived data augmentation methodology that aims to enable dataset balance at a per-subject level via the use of image-to-image transf… ▽ More Whilst face recognition applications are becoming increasingly prevalent within our daily lives, leading approaches in the field still suffer from performance bias to the detriment of some racial profiles within society. In this study, we propose a novel adversarial derived data augmentation methodology that aims to enable dataset balance at a per-subject level via the use of image-to-image transformation for the transfer of sensitive racial characteristic facial features. Our aim is to automatically construct a synthesised dataset by transforming facial images across varying racial domains, while still preserving identity-related features, such that racially dependant features subsequently become irrelevant within the determination of subject identity. We construct our experiments on three significant face recognition variants: Softmax, CosFace and ArcFace loss over a common convolutional neural network backbone. In a side-by-side comparison, we show the positive impact our proposed technique can have on the recognition performance for (racial) minority groups within an originally imbalanced training dataset by reducing the pre-race variance in performance. △ Less

Submitted 19 April, 2020; originally announced April 2020.

Comments: CVPR 2020 - Fair, Data Efficient and Trusted Computer Vision Workshop

arXiv:2003.12625 [pdf, other]

On the Evaluation of Prohibited Item Classification and Detection in Volumetric 3D Computed Tomography Baggage Security Screening Imagery

Authors: Qian Wang, Neelanjan Bhowmik, Toby P. Breckon

Abstract: X-ray Computed Tomography (CT) based 3D imaging is widely used in airports for aviation security screening whilst prior work on prohibited item detection focuses primarily on 2D X-ray imagery. In this paper, we aim to evaluate the possibility of extending the automatic prohibited item detection from 2D X-ray imagery to volumetric 3D CT baggage security screening imagery. To these ends, we take adv… ▽ More X-ray Computed Tomography (CT) based 3D imaging is widely used in airports for aviation security screening whilst prior work on prohibited item detection focuses primarily on 2D X-ray imagery. In this paper, we aim to evaluate the possibility of extending the automatic prohibited item detection from 2D X-ray imagery to volumetric 3D CT baggage security screening imagery. To these ends, we take advantage of 3D Convolutional Neural Neworks (CNN) and popular object detection frameworks such as RetinaNet and Faster R-CNN in our work. As the first attempt to use 3D CNN for volumetric 3D CT baggage security screening, we first evaluate different CNN architectures on the classification of isolated prohibited item volumes and compare against traditional methods which use hand-crafted features. Subsequently, we evaluate object detection performance of different architectures on volumetric 3D CT baggage images. The results of our experiments on Bottle and Handgun datasets demonstrate that 3D CNN models can achieve comparable performance (98% true positive rate and 1.5% false positive rate) to traditional methods but require significantly less time for inference (0.014s per volume). Furthermore, the extended 3D object detection models achieve promising performance in detecting prohibited items within volumetric 3D CT baggage imagery with 76% mAP for bottles and 88% mAP for handguns, which shows both the challenge and promise of such threat detection within 3D CT X-ray security imagery. △ Less

Submitted 27 March, 2020; originally announced March 2020.

Comments: Accepted to IJCNN 2020

arXiv:2001.05459 [pdf, other]

A Reference Architecture for Plausible Threat Image Projection (TIP) Within 3D X-ray Computed Tomography Volumes

Authors: Qian Wang, Najla Megherbi, Toby P. Breckon

Abstract: Threat Image Projection (TIP) is a technique used in X-ray security baggage screening systems that superimposes a threat object signature onto a benign X-ray baggage image in a plausible and realistic manner. It has been shown to be highly effective in evaluating the ongoing performance of human operators, improving their vigilance and performance on threat detection. However, with the increasing… ▽ More Threat Image Projection (TIP) is a technique used in X-ray security baggage screening systems that superimposes a threat object signature onto a benign X-ray baggage image in a plausible and realistic manner. It has been shown to be highly effective in evaluating the ongoing performance of human operators, improving their vigilance and performance on threat detection. However, with the increasing use of 3D Computed Tomography (CT) in aviation security for both hold and cabin baggage screening a significant challenge arises in extending TIP to 3D CT volumes due to the difficulty in 3D CT volume segmentation and the proper insertion location determination. In this paper, we present an approach for 3D TIP in CT volumes targeting realistic and plausible threat object insertion within 3D CT baggage images. The proposed approach consists of dual threat (source) and baggage (target) volume segmentation, particle swarm optimisation based insertion determination and metal artefact generation. In addition, we propose a TIP quality score metric to evaluate the quality of generated TIP volumes. Qualitative evaluations on real 3D CT baggage imagery show that our approach is able to generate realistic and plausible TIP which are indiscernible from real CT volumes and the TIP quality scores are consistent with human evaluations. △ Less

Submitted 15 January, 2020; originally announced January 2020.

Comments: Technical Report, Durham University

arXiv:1912.05684 [pdf, other]

Online Deep Reinforcement Learning for Autonomous UAV Navigation and Exploration of Outdoor Environments

Authors: Bruna G. Maciel-Pearson, Letizia Marchegiani, Samet Akcay, Amir Atapour-Abarghouei, James Garforth, Toby P. Breckon

Abstract: With the rapidly growing expansion in the use of UAVs, the ability to autonomously navigate in varying environments and weather conditions remains a highly desirable but as-of-yet unsolved challenge. In this work, we use Deep Reinforcement Learning to continuously improve the learning and understanding of a UAV agent while exploring a partially observable environment, which simulates the challenge… ▽ More With the rapidly growing expansion in the use of UAVs, the ability to autonomously navigate in varying environments and weather conditions remains a highly desirable but as-of-yet unsolved challenge. In this work, we use Deep Reinforcement Learning to continuously improve the learning and understanding of a UAV agent while exploring a partially observable environment, which simulates the challenges faced in a real-life scenario. Our innovative approach uses a double state-input strategy that combines the acquired knowledge from the raw image and a map containing positional information. This positional data aids the network understanding of where the UAV has been and how far it is from the target position, while the feature map from the current scene highlights cluttered areas that are to be avoided. Our approach is extensively tested using variants of Deep Q-Network adapted to cope with double state input data. Further, we demonstrate that by altering the reward and the Q-value function, the agent is capable of consistently outperforming the adapted Deep Q-Network, Double Deep Q- Network and Deep Recurrent Q-Network. Our results demonstrate that our proposed Extended Double Deep Q-Network (EDDQN) approach is capable of navigating through multiple unseen environments and under severe weather conditions. △ Less

Submitted 11 December, 2019; originally announced December 2019.

Comments: Journal Submission

arXiv:1911.09010 [pdf, other]

Experimental Exploration of Compact Convolutional Neural Network Architectures for Non-temporal Real-time Fire Detection

Authors: Ganesh Samarth C. A., Neelanjan Bhowmik, Toby P. Breckon

Abstract: In this work we explore different Convolutional Neural Network (CNN) architectures and their variants for non-temporal binary fire detection and localization in video or still imagery. We consider the performance of experimentally defined, reduced complexity deep CNN architectures for this task and evaluate the effects of different optimization and normalization techniques applied to different CNN… ▽ More In this work we explore different Convolutional Neural Network (CNN) architectures and their variants for non-temporal binary fire detection and localization in video or still imagery. We consider the performance of experimentally defined, reduced complexity deep CNN architectures for this task and evaluate the effects of different optimization and normalization techniques applied to different CNN architectures (spanning the Inception, ResNet and EfficientNet architectural concepts). Contrary to contemporary trends in the field, our work illustrates a maximum overall accuracy of 0.96 for full frame binary fire detection and 0.94 for superpixel localization using an experimentally defined reduced CNN architecture based on the concept of InceptionV4. We notably achieve a lower false positive rate of 0.06 compared to prior work in the field presenting an efficient, robust and real-time solution for fire region detection. △ Less

Submitted 20 November, 2019; originally announced November 2019.

arXiv:1911.08966 [pdf, ps, other]

Evaluating the Transferability and Adversarial Discrimination of Convolutional Neural Networks for Threat Object Detection and Classification within X-Ray Security Imagery

Authors: Yona Falinie A. Gaus, Neelanjan Bhowmik, Samet Akcay, Toby P. Breckon

Abstract: X-ray imagery security screening is essential to maintaining transport security against a varying profile of threat or prohibited items. Particular interest lies in the automatic detection and classification of weapons such as firearms and knives within complex and cluttered X-ray security imagery. Here, we address this problem by exploring various end-to-end object detection Convolutional Neural… ▽ More X-ray imagery security screening is essential to maintaining transport security against a varying profile of threat or prohibited items. Particular interest lies in the automatic detection and classification of weapons such as firearms and knives within complex and cluttered X-ray security imagery. Here, we address this problem by exploring various end-to-end object detection Convolutional Neural Network (CNN) architectures. We evaluate several leading variants spanning the Faster R-CNN, Mask R-CNN, and RetinaNet architectures to explore the transferability of such models between varying X-ray scanners with differing imaging geometries, image resolutions and material colour profiles. Whilst the limited availability of X-ray threat imagery can pose a challenge, we employ a transfer learning approach to evaluate whether such inter-scanner generalisation may exist over a multiple class detection problem. Overall, we achieve maximal detection performance using a Faster R-CNN architecture with a ResNet$_{101}$ classification network, obtaining 0.88 and 0.86 of mean Average Precision (mAP) for a three-class and two class item from varying X-ray imaging sources. Our results exhibit a remarkable degree of generalisability in terms of cross-scanner performance (mAP: 0.87, firearm detection: 0.94 AP). In addition, we examine the inherent adversarial discriminative capability of such networks using a specifically generated adversarial dataset for firearms detection - with a variable low false positive, as low as 5%, this shows both the challenge and promise of such threat detection within X-ray security imagery. △ Less

Submitted 20 November, 2019; originally announced November 2019.

arXiv:1911.08216 [pdf, other]

On the Impact of Object and Sub-component Level Segmentation Strategies for Supervised Anomaly Detection within X-ray Security Imagery

Authors: Neelanjan Bhowmik, Yona Falinie A. Gaus, Samet Akcay, Jack W. Barker, Toby P. Breckon

Abstract: X-ray security screening is in widespread use to maintain transportation security against a wide range of potential threat profiles. Of particular interest is the recent focus on the use of automated screening approaches, including the potential anomaly detection as a methodology for concealment detection within complex electronic items. Here we address this problem considering varying segmentatio… ▽ More X-ray security screening is in widespread use to maintain transportation security against a wide range of potential threat profiles. Of particular interest is the recent focus on the use of automated screening approaches, including the potential anomaly detection as a methodology for concealment detection within complex electronic items. Here we address this problem considering varying segmentation strategies to enable the use of both object level and sub-component level anomaly detection via the use of secondary convolutional neural network (CNN) architectures. Relative performance is evaluated over an extensive dataset of exemplar cluttered X-ray imagery, with a focus on consumer electronics items. We find that sub-component level segmentation produces marginally superior performance in the secondary anomaly detection via classification stage, with true positive of ~98% of anomalies, with a ~3% false positive. △ Less

Submitted 19 November, 2019; originally announced November 2019.

arXiv:1911.07990 [pdf, other]

Crowd Counting via Segmentation Guided Attention Networks and Curriculum Loss

Authors: Qian Wang, Toby P. Breckon

Abstract: Automatic crowd behaviour analysis is an important task for intelligent transportation systems to enable effective flow control and dynamic route planning for varying road participants. Crowd counting is one of the keys to automatic crowd behaviour analysis. Crowd counting using deep convolutional neural networks (CNN) has achieved encouraging progress in recent years. Researchers have devoted muc… ▽ More Automatic crowd behaviour analysis is an important task for intelligent transportation systems to enable effective flow control and dynamic route planning for varying road participants. Crowd counting is one of the keys to automatic crowd behaviour analysis. Crowd counting using deep convolutional neural networks (CNN) has achieved encouraging progress in recent years. Researchers have devoted much effort to the design of variant CNN architectures and most of them are based on the pre-trained VGG16 model. Due to the insufficient expressive capacity, the backbone network of VGG16 is usually followed by another cumbersome network specially designed for good counting performance. Although VGG models have been outperformed by Inception models in image classification tasks, the existing crowd counting networks built with Inception modules still only have a small number of layers with basic types of Inception modules. To fill in this gap, in this paper, we firstly benchmark the baseline Inception-v3 model on commonly used crowd counting datasets and achieve surprisingly good performance comparable with or better than most existing crowd counting models. Subsequently, we push the boundary of this disruptive work further by proposing a Segmentation Guided Attention Network (SGANet) with Inception-v3 as the backbone and a novel curriculum loss for crowd counting. We conduct thorough experiments to compare the performance of our SGANet with prior arts and the proposed model can achieve state-of-the-art performance with MAE of 57.6, 6.3 and 87.6 on ShanghaiTechA, ShanghaiTechB and UCF\_QNRF, respectively. △ Less

Submitted 3 August, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

Comments: Technical Report, Durham University

arXiv:1911.07982 [pdf, other]

Unsupervised Domain Adaptation via Structured Prediction Based Selective Pseudo-Labeling

Authors: Qian Wang, Toby P. Breckon

Abstract: Unsupervised domain adaptation aims to address the problem of classifying unlabeled samples from the target domain whilst labeled samples are only available from the source domain and the data distributions are different in these two domains. As a result, classifiers trained from labeled samples in the source domain suffer from significant performance drop when directly applied to the samples from… ▽ More Unsupervised domain adaptation aims to address the problem of classifying unlabeled samples from the target domain whilst labeled samples are only available from the source domain and the data distributions are different in these two domains. As a result, classifiers trained from labeled samples in the source domain suffer from significant performance drop when directly applied to the samples from the target domain. To address this issue, different approaches have been proposed to learn domain-invariant features or domain-specific classifiers. In either case, the lack of labeled samples in the target domain can be an issue which is usually overcome by pseudo-labeling. Inaccurate pseudo-labeling, however, could result in catastrophic error accumulation during learning. In this paper, we propose a novel selective pseudo-labeling strategy based on structured prediction. The idea of structured prediction is inspired by the fact that samples in the target domain are well clustered within the deep feature space so that unsupervised clustering analysis can be used to facilitate accurate pseudo-labeling. Experimental results on four datasets (i.e. Office-Caltech, Office31, ImageCLEF-DA and Office-Home) validate our approach outperforms contemporary state-of-the-art methods. △ Less

Submitted 18 November, 2019; originally announced November 2019.

Comments: Accepted to AAAI 2020

arXiv:1909.11508 [pdf, other]

The Good, the Bad and the Ugly: Evaluating Convolutional Neural Networks for Prohibited Item Detection Using Real and Synthetically Composited X-ray Imagery

Authors: Neelanjan Bhowmik, Qian Wang, Yona Falinie A. Gaus, Marcin Szarek, Toby P. Breckon

Abstract: Detecting prohibited items in X-ray security imagery is pivotal in maintaining border and transport security against a wide range of threat profiles. Convolutional Neural Networks (CNN) with the support of a significant volume of data have brought advancement in such automated prohibited object detection and classification. However, collating such large volumes of X-ray security imagery remains a… ▽ More Detecting prohibited items in X-ray security imagery is pivotal in maintaining border and transport security against a wide range of threat profiles. Convolutional Neural Networks (CNN) with the support of a significant volume of data have brought advancement in such automated prohibited object detection and classification. However, collating such large volumes of X-ray security imagery remains a significant challenge. This work opens up the possibility of using synthetically composed imagery, avoiding the need to collate such large volumes of hand-annotated real-world imagery. Here we investigate the difference in detection performance achieved using real and synthetic X-ray training imagery for CNN architecture detecting three exemplar prohibited items, {Firearm, Firearm Parts, Knives}, within cluttered and complex X-ray security baggage imagery. We achieve 0.88 of mean average precision (mAP) with a Faster R-CNN and ResNet-101 CNN architecture for this 3-class object detection using real X-ray imagery. While the performance is comparable with synthetically composited X-ray imagery (0.78 mAP), our extended evaluation demonstrates both challenge and promise of using synthetically composed images to diversify the X-ray security training imagery for automated detection algorithm training. △ Less

Submitted 25 September, 2019; originally announced September 2019.

Journal ref: In Proc. British Machine Vision Conference Workshops, BMVA, 2019

arXiv:1909.07697 [pdf, other]

Multi-Task Learning for Automotive Foggy Scene Understanding via Domain Adaptation to an Illumination-Invariant Representation

Authors: Naif Alshammari, Samet Akçay, Toby P. Breckon

Abstract: Joint scene understanding and segmentation for automotive applications is a challenging problem in two key aspects:- (1) classifying every pixel in the entire scene and (2) performing this task under unstable weather and illumination changes (e.g. foggy weather), which results in poor outdoor scene visibility. This poor outdoor scene visibility leads to a non-optimal performance of deep convolutio… ▽ More Joint scene understanding and segmentation for automotive applications is a challenging problem in two key aspects:- (1) classifying every pixel in the entire scene and (2) performing this task under unstable weather and illumination changes (e.g. foggy weather), which results in poor outdoor scene visibility. This poor outdoor scene visibility leads to a non-optimal performance of deep convolutional neural network-based scene understanding and segmentation. In this paper, we propose an efficient end-to-end contemporary automotive semantic scene understanding approach under foggy weather conditions, employing domain adaptation and illumination-invariant image per-transformation. As a multi-task pipeline, our proposed model provides:- (1) transferring images from extreme to clear-weather condition using domain transfer approach and (2) semantically segmenting a scene using a competitive encoder-decoder convolutional neural network (CNN) with dense connectivity, skip connections and fusion-based techniques. We evaluate our approach on challenging foggy datasets, including synthetic dataset (Foggy Cityscapes) as well as real-world datasets (Foggy Zurich and Foggy Driving). By incorporating RGB, depth, and illumination-invariant information, our approach outperforms the state-of-the-art within automotive scene understanding, under foggy weather condition. △ Less

Submitted 17 September, 2019; originally announced September 2019.

Comments: Conference submission

Showing 1–50 of 64 results for author: Breckon, T P