-
Ev-GS: Event-based Gaussian splatting for Efficient and Accurate Radiance Field Rendering
Authors:
Jingqian Wu,
Shuo Zhu,
Chutian Wang,
Edmund Y. Lam
Abstract:
Computational neuromorphic imaging (CNI) with event cameras offers advantages such as minimal motion blur and enhanced dynamic range, compared to conventional frame-based methods. Existing event-based radiance field rendering methods are built on neural radiance field, which is computationally heavy and slow in reconstruction speed. Motivated by the two aspects, we introduce Ev-GS, the first CNI-informed scheme to infer 3D Gaussian splatting from a monocular event camera, enabling efficient novel view synthesis. Leveraging 3D Gaussians with pure event-based supervision, Ev-GS overcomes challenges such as the detection of fast-moving objects and insufficient lighting. Experimental results show that Ev-GS outperforms the method that takes frame-based signals as input by rendering realistic views with reduced blurring and improved visual quality. Moreover, it demonstrates competitive reconstruction quality and reduced computing occupancy compared to existing methods, which paves the way to a highly efficient CNI approach for signal processing.
Submitted 15 July, 2024;
originally announced July 2024.
-
Harnessing Data and Physics for Deep Learning Phase Recovery
Authors:
Kaiqiang Wang,
Edmund Y. Lam
Abstract:
Phase recovery, calculating the phase of a light wave from its intensity measurements, is essential for various applications, such as coherent diffraction imaging, adaptive optics, and biomedical imaging. It enables the reconstruction of an object's refractive index distribution or topography as well as the correction of imaging system aberrations. In recent years, deep learning has been proven to be highly effective in addressing phase recovery problems. Two main deep learning phase recovery strategies are data-driven (DD) with supervised learning mode and physics-driven (PD) with self-supervised learning mode. DD and PD achieve the same goal in different ways and lack the necessary study to reveal similarities and differences. Therefore, in this paper, we comprehensively compare these two deep learning phase recovery strategies in terms of time consumption, accuracy, generalization ability, ill-posedness adaptability, and prior capacity. What's more, we propose a co-driven (CD) strategy of combining datasets and physics for the balance of high- and low-frequency information. The codes for DD, PD, and CD are publicly available at https://github.com/kqwang/DLPR.
Submitted 1 April, 2024;
originally announced April 2024.
-
CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
Authors:
Paritosh Parmar,
Eric Peh,
Ruirui Chen,
Ting En Lam,
Yuhan Chen,
Elston Tan,
Basura Fernando
Abstract:
Causal video question answering (QA) has garnered increasing interest, yet existing datasets often lack depth in causal reasoning. To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic "Tom and Jerry" cartoon series. Cartoons use the principles of animation that allow animators to create expressive, unambiguous causal relationships between events to form a coherent storyline. Utilizing these properties, along with thought-provoking questions and multi-level answers (answer and detailed causal explanation), our questions involve causal chains that interconnect multiple dynamic interactions between characters and visual scenes. These factors demand models to solve more challenging, yet well-defined causal relationships. We also introduce hard incorrect answer mining, including a causally confusing version that is even more challenging. While models perform well, there is much room for improvement, especially, on open-ended answers. We identify more advanced/explicit causal relationship modeling & joint modeling of vision and language as the immediate areas for future efforts to focus upon. Along with the other complementary datasets, our new challenging dataset will pave the way for these developments in the field.
Submitted 14 June, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Branch and Price for the Length-Constrained Cycle Partition Problem
Authors:
Mohammed Ghannam,
Gioni Mexi,
Edward Lam,
Ambros Gleixner
Abstract:
The length-constrained cycle partition problem (LCCP) is a graph optimization problem in which a set of nodes must be partitioned into a minimum number of cycles. Every node is associated with a critical time and the length of every cycle must not exceed the critical time of any node in the cycle. We formulate LCCP as a set partitioning model and solve it using an exact branch-and-price approach. We use a dynamic programming-based pricing algorithm to generate improving cycles, exploiting the particular structure of the pricing problem for efficient bidirectional search and symmetry breaking. Computational results show that the LP relaxation of the set partitioning model produces strong dual bounds and our branch-and-price method improves significantly over the state of the art. It is able to solve closed instances in a fraction of the previously needed time and closes 13 previously unsolved instances, one of which has 76 nodes, a notable improvement over the previous limit of 52 nodes.
Submitted 2 February, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Correct and Compositional Hardware Generators
Authors:
Rachit Nigam,
Ethan Gabizon,
Edmund Lam,
Adrian Sampson
Abstract:
Hardware generators help designers explore families of concrete designs and their efficiency trade-offs. Both parameterized hardware description languages (HDLs) and higher-level programming models, however, can obstruct composability. Different concrete designs in a family can have dramatically different timing behavior, and high-level hardware generators rarely expose a consistent HDL-level interface. Composition, therefore, is typically only feasible at the level of individual instances: the user generates concrete designs and then composes them, sacrificing the ability to parameterize the combined design.
We design Parafil, a system for correctly composing hardware generators. Parafil builds on Filament, an HDL with strong compile-time guarantees, and lifts those guarantees to generators to prove that all possible instantiations are free of timing bugs. Parafil can integrate with external hardware generators via a novel system of output parameters and a framework for invoking generator tools. We conduct experiments with two other generators, FloPoCo and Google's XLS, and we implement a parameterized FFT generator to show that Parafil ensures correct design space exploration.
Submitted 4 January, 2024;
originally announced January 2024.
-
A Unifying Tensor View for Lightweight CNNs
Authors:
Jason Chun Lok Li,
Rui Lin,
Jiajun Zhou,
Edmund Yin Mun Lam,
Ngai Wong
Abstract:
Despite the decomposition of convolutional kernels for lightweight CNNs being well studied, existing works that rely on tensor network diagrams or hyperdimensional abstraction lack geometric intuition. This work devises a new perspective by linking a 3D-reshaped kernel tensor to its various slice-wise and rank-1 decompositions, permitting a straightforward connection between various tensor approximations and efficient CNN modules. Specifically, it is discovered that a pointwise-depthwise-pointwise (PDP) configuration constitutes a viable construct for lightweight CNNs. Moreover, a novel link to the latest ShiftNet is established, inspiring a first-ever shift layer pruning that achieves nearly 50% compression with < 1% drop in accuracy for ShiftResNet.
Submitted 15 December, 2023;
originally announced December 2023.
-
Segment Anything Model is a Good Teacher for Local Feature Learning
Authors:
Jingqian Wu,
Rongtao Xu,
Zach Wood-Doughty,
Changwei Wang,
Shibiao Xu,
Edmund Y. Lam
Abstract:
Local feature detection and description play an important role in many computer vision tasks, which are designed to detect and describe keypoints in "any scene" and "any downstream task". Data-driven local feature learning methods need to rely on pixel-level correspondence for training, which is challenging to acquire at scale, thus hindering further improvements in performance. In this paper, we propose SAMFeat to introduce SAM (segment anything model), a fundamental model trained on 11 million images, as a teacher to guide local feature learning and thus inspire higher performance on limited datasets. To do so, first, we construct an auxiliary task of Attention-weighted Semantic Relation Distillation (ASRD), which distills feature relations with category-agnostic semantic information learned by the SAM encoder into a local feature learning network, to improve local feature description using semantic discrimination. Second, we develop a technique called Weakly Supervised Contrastive Learning Based on Semantic Grouping (WSC), which utilizes semantic groupings derived from SAM as weakly supervised signals, to optimize the metric space of local descriptors. Third, we design an Edge Attention Guidance (EAG) to further improve the accuracy of local feature detection and description by prompting the network to pay more attention to the edge region guided by SAM. SAMFeat's performance on various tasks such as image matching on HPatches, and long-term visual localization on Aachen Day-Night showcases its superiority over previous local features. The release code is available at https://github.com/vignywang/SAMFeat.
Submitted 17 June, 2024; v1 submitted 29 September, 2023;
originally announced September 2023.
-
Neuromorphic Imaging and Classification with Graph Learning
Authors:
Pei Zhang,
Chutian Wang,
Edmund Y. Lam
Abstract:
Bio-inspired neuromorphic cameras asynchronously record pixel brightness changes and generate sparse event streams. They can capture dynamic scenes with little motion blur and more details in extreme illumination conditions. Due to the multidimensional address-event structure, most existing vision algorithms cannot properly handle asynchronous event streams. While several event representations and processing methods have been developed to address such an issue, they are typically driven by a large number of events, leading to substantial overheads in runtime and memory. In this paper, we propose a new graph representation of the event data and couple it with a Graph Transformer to perform accurate neuromorphic classification. Extensive experiments show that our approach leads to better results and excels at the challenging realistic situations where only a small number of events and limited computational resources are available, paving the way for neuromorphic applications embedded into mobile facilities.
Submitted 21 March, 2024; v1 submitted 27 September, 2023;
originally announced September 2023.
-
On the use of deep learning for phase recovery
Authors:
Kaiqiang Wang,
Li Song,
Chutian Wang,
Zhenbo Ren,
Guangyuan Zhao,
Jiazhen Dou,
Jianglei Di,
George Barbastathis,
Renjie Zhou,
Jianlin Zhao,
Edmund Y. Lam
Abstract:
Phase recovery (PR) refers to calculating the phase of the light field from its intensity measurements. As exemplified from quantitative phase imaging and coherent diffraction imaging to adaptive optics, PR is essential for reconstructing the refractive index distribution or topography of an object and correcting the aberration of an imaging system. In recent years, deep learning (DL), often implemented through deep neural networks, has provided unprecedented support for computational imaging, leading to more efficient solutions for various PR problems. In this review, we first briefly introduce conventional methods for PR. Then, we review how DL provides support for PR from the following three stages, namely, pre-processing, in-processing, and post-processing. We also review how DL is used in phase image processing. Finally, we summarize the work in DL for PR and outlook on how to better use DL to improve the reliability and efficiency in PR. Furthermore, we present a live-updating resource (https://github.com/kqwang/phase-recovery) for readers to learn more about PR.
Submitted 2 August, 2023;
originally announced August 2023.
-
Improving Video Colorization by Test-Time Tuning
Authors:
Yaping Zhao,
Haitian Zheng,
Jiebo Luo,
Edmund Y. Lam
Abstract:
With the advancements in deep learning, video colorization by propagating color information from a colorized reference frame to a monochrome video sequence has been well explored. However, the existing approaches often suffer from overfitting the training dataset and consequently lead to suboptimal performance on colorizing testing samples. To address this issue, we propose an effective method, which aims to enhance video colorization through test-time tuning. By exploiting the reference to construct additional training samples during testing, our approach achieves a performance boost of 1 to 3 dB in PSNR on average compared to the baseline. Code is available at: https://github.com/IndigoPurple/T3
Submitted 25 June, 2023;
originally announced July 2023.
-
PGformer: Proxy-Bridged Game Transformer for Multi-Person Highly Interactive Extreme Motion Prediction
Authors:
Yanwen Fang,
Jintai Chen,
Peng-Tao Jiang,
Chao Li,
Yifeng Geng,
Eddy K. F. Lam,
Guodong Li
Abstract:
Multi-person motion prediction is a challenging task, especially for real-world scenarios of highly interacted persons. Most previous works have been devoted to studying the case of weak interactions (e.g., walking together), in which typically forecasting each human pose in isolation can still achieve good performances. This paper focuses on collaborative motion prediction for multiple persons with extreme motions and attempts to explore the relationships between the highly interactive persons' pose trajectories. Specifically, a novel cross-query attention (XQA) module is proposed to bilaterally learn the cross-dependencies between the two pose sequences tailored for this situation. A proxy unit is additionally introduced to bridge the involved persons, which cooperates with our proposed XQA module and subtly controls the bidirectional spatial information flows. These designs are then integrated into a Transformer-based architecture and the resulting model is called Proxy-bridged Game Transformer (PGformer) for multi-person interactive motion prediction. Its effectiveness has been evaluated on the challenging ExPI dataset, which involves highly interactive actions. Our PGformer consistently outperforms the state-of-the-art methods in both short- and long-term predictions by a large margin. Besides, our approach can also be compatible with the weakly interacted CMU-Mocap and MuPoTS-3D datasets and extended to the case of more than 2 individuals with encouraging results.
Submitted 7 January, 2024; v1 submitted 5 June, 2023;
originally announced June 2023.
-
Context-Aware Transformer for 3D Point Cloud Automatic Annotation
Authors:
Xiaoyan Qian,
Chang Liu,
Xiaojuan Qi,
Siew-Chong Tan,
Edmund Lam,
Ngai Wong
Abstract:
3D automatic annotation has received increased attention since manually annotating 3D point clouds is laborious. However, existing methods are usually complicated, e.g., pipelined training for 3D foreground/background segmentation, cylindrical object proposals, and point completion. Furthermore, they often overlook the inter-object feature relation that is particularly informative to hard samples for 3D annotation. To this end, we propose a simple yet effective end-to-end Context-Aware Transformer (CAT) as an automated 3D-box labeler to generate precise 3D box annotations from 2D boxes, trained with a small number of human annotations. We adopt the general encoder-decoder architecture, where the CAT encoder consists of an intra-object encoder (local) and an inter-object encoder (global), performing self-attention along the sequence and batch dimensions, respectively. The former models intra-object interactions among points, and the latter extracts feature relations among different objects, thus boosting scene-level understanding. Via local and global encoders, CAT can generate high-quality 3D box annotations with a streamlined workflow, allowing it to outperform existing state-of-the-art by up to 1.79% 3D AP on the hard task of the KITTI test set.
Submitted 26 March, 2023;
originally announced March 2023.
-
Unsupervised Light Field Depth Estimation via Multi-view Feature Matching with Occlusion Prediction
Authors:
Shansi Zhang,
Nan Meng,
Edmund Y. Lam
Abstract:
Depth estimation from light field (LF) images is a fundamental step for numerous applications. Recently, learning-based methods have achieved higher accuracy and efficiency than the traditional methods. However, it is costly to obtain sufficient depth labels for supervised training. In this paper, we propose an unsupervised framework to estimate depth from LF images. First, we design a disparity estimation network (DispNet) with a coarse-to-fine structure to predict disparity maps from different view combinations. It explicitly performs multi-view feature matching to learn the correspondences effectively. As occlusions may cause the violation of photo-consistency, we introduce an occlusion prediction network (OccNet) to predict the occlusion maps, which are used as the element-wise weights of photometric loss to solve the occlusion issue and assist the disparity learning. With the disparity maps estimated by multiple input combinations, we then propose a disparity fusion strategy based on the estimated errors with effective occlusion handling to obtain the final disparity map with higher accuracy. Experimental results demonstrate that our method achieves superior performance on both the dense and sparse LF images, and also shows better robustness and generalization on the real-world LF images compared to the other methods.
Submitted 18 August, 2023; v1 submitted 20 January, 2023;
originally announced January 2023.
-
PATE: Property, Amenities, Traffic and Emotions Coming Together for Real Estate Price Prediction
Authors:
Yaping Zhao,
Ramgopal Ravi,
Shuhui Shi,
Zhongrui Wang,
Edmund Y. Lam,
Jichang Zhao
Abstract:
Real estate prices have a significant impact on individuals, families, businesses, and governments. The general objective of real estate price prediction is to identify and exploit socioeconomic patterns arising from real estate transactions over multiple aspects, ranging from the property itself to other contributing factors. However, price prediction is a challenging multidimensional problem that involves estimating many characteristics beyond the property itself. In this paper, we use multiple sources of data to evaluate the economic contribution of different socioeconomic characteristics such as surrounding amenities, traffic conditions and social emotions. Our experiments were conducted on 28,550 houses in Beijing, China and we rank each characteristic by its importance. Since the use of multi-source information improves the accuracy of predictions, the aforementioned characteristics can be an invaluable resource to assess the economic and social value of real estate. Code and data are available at: https://github.com/IndigoPurple/PATE
Submitted 11 October, 2022; v1 submitted 29 August, 2022;
originally announced September 2022.
-
LRT: An Efficient Low-Light Restoration Transformer for Dark Light Field Images
Authors:
Shansi Zhang,
Nan Meng,
Edmund Y. Lam
Abstract:
Light field (LF) images containing information for multiple views have numerous applications, which can be severely affected by low-light imaging. Recent learning-based methods for low-light enhancement have some disadvantages, such as a lack of noise suppression, complex training process and poor performance in extremely low-light conditions. To tackle these deficiencies while fully utilizing the multi-view information, we propose an efficient Low-light Restoration Transformer (LRT) for LF images, with multiple heads to perform intermediate tasks within a single network, including denoising, luminance adjustment, refinement and detail enhancement, achieving progressive restoration from small scale to full scale. Moreover, we design an angular transformer block with an efficient view-token scheme to model the global angular dependencies, and a multi-scale spatial transformer block to encode the multi-scale local and global information within each view. To address the issue of insufficient training data, we formulate a synthesis pipeline by simulating the major noise sources with the estimated noise parameters of LF camera. Experimental results demonstrate that our method achieves the state-of-the-art performance on low-light LF restoration with high efficiency.
Submitted 15 March, 2023; v1 submitted 5 September, 2022;
originally announced September 2022.
-
H4M: Heterogeneous, Multi-source, Multi-modal, Multi-view and Multi-distributional Dataset for Socioeconomic Analytics in the Case of Beijing
Authors:
Yaping Zhao,
Shuhui Shi,
Ramgopal Ravi,
Zhongrui Wang,
Edmund Y. Lam,
Jichang Zhao
Abstract:
The study of socioeconomic status has been reformed by the availability of digital records containing data on real estate, points of interest, traffic and social media trends such as micro-blogging. In this paper, we describe a heterogeneous, multi-source, multi-modal, multi-view and multi-distributional dataset named "H4M". The mixed dataset contains data on real estate transactions, points of interest, traffic patterns and micro-blogging trends from Beijing, China. The unique composition of H4M makes it an ideal test bed for methodologies and approaches aimed at studying and solving problems related to real estate, traffic, urban mobility planning, social sentiment analysis etc. The dataset is available at: https://indigopurple.github.io/H4M/index.html
Submitted 11 August, 2022;
originally announced August 2022.
-
Multimodal Transformer for Automatic 3D Annotation and Object Detection
Authors:
Chang Liu,
Xiaoyan Qian,
Binxiao Huang,
Xiaojuan Qi,
Edmund Lam,
Siew-Chong Tan,
Ngai Wong
Abstract:
Despite a growing number of datasets being collected for training 3D object detection models, significant human effort is still required to annotate 3D boxes on LiDAR scans. To automate the annotation and facilitate the production of various customized datasets, we propose an end-to-end multimodal transformer (MTrans) autolabeler, which leverages both LiDAR scans and images to generate precise 3D box annotations from weak 2D bounding boxes. To alleviate the pervasive sparsity problem that hinders existing autolabelers, MTrans densifies the sparse point clouds by generating new 3D points based on 2D image information. With a multi-task design, MTrans segments the foreground/background, densifies LiDAR point clouds, and regresses 3D boxes simultaneously. Experimental results verify the effectiveness of the MTrans for improving the quality of the generated labels. By enriching the sparse point clouds, our method achieves 4.48% and 4.03% better 3D AP on KITTI moderate and hard samples, respectively, versus the state-of-the-art autolabeler. MTrans can also be extended to improve the accuracy for 3D object detection, resulting in a remarkable 89.45% AP on KITTI hard samples. Codes are at https://github.com/Cliu2/MTrans.
Submitted 20 July, 2022;
originally announced July 2022.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza,
et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
MAP-Gen: An Automated 3D-Box Annotation Flow with Multimodal Attention Point Generator
Authors:
Chang Liu,
Xiaoyan Qian,
Xiaojuan Qi,
Edmund Y. Lam,
Siew-Chong Tan,
Ngai Wong
Abstract:
Manually annotating 3D point clouds is laborious and costly, limiting the training data preparation for deep learning in real-world object detection. While a few previous studies tried to automatically generate 3D bounding boxes from weak labels such as 2D boxes, the quality is sub-optimal compared to human annotators. This work proposes a novel autolabeler, called multimodal attention point generator (MAP-Gen), that generates high-quality 3D labels from weak 2D boxes. It leverages dense image information to tackle the sparsity issue of 3D point clouds, thus improving label quality. For each 2D pixel, MAP-Gen predicts its corresponding 3D coordinates by referencing context points based on their 2D semantic or geometric relationships. The generated 3D points densify the original sparse point clouds, followed by an encoder to regress 3D bounding boxes. Using MAP-Gen, object detection networks that are weakly supervised by 2D boxes can achieve 94% to 99% of the performance of those fully supervised by 3D annotations. We hope that the newly proposed MAP-Gen autolabeling flow can shed new light on utilizing multimodal information for enriching sparse point clouds.
Submitted 29 March, 2022;
originally announced March 2022.
-
Point Cloud Denoising via Momentum Ascent in Gradient Fields
Authors:
Yaping Zhao,
Haitian Zheng,
Zhongrui Wang,
Jiebo Luo,
Edmund Y. Lam
Abstract:
To achieve point cloud denoising, traditional methods heavily rely on geometric priors, and most learning-based approaches suffer from outliers and loss of details. Recently, the gradient-based method was proposed to estimate the gradient fields from the noisy point clouds using neural networks, and refine the position of each point according to the estimated gradient. However, the predicted gradient could fluctuate, leading to perturbed and unstable solutions, as well as a long inference time. To address these issues, we develop the momentum gradient ascent method that leverages the information of previous iterations in determining the trajectories of the points, thus improving the stability of the solution and reducing the inference time. Experiments demonstrate that the proposed method outperforms state-of-the-art approaches with a variety of point clouds, noise types, and noise levels. Code is available at: https://github.com/IndigoPurple/MAG
Submitted 25 June, 2023; v1 submitted 21 February, 2022;
originally announced February 2022.
-
MANet: Improving Video Denoising with a Multi-Alignment Network
Authors:
Yaping Zhao,
Haitian Zheng,
Zhongrui Wang,
Jiebo Luo,
Edmund Y. Lam
Abstract:
In video denoising, the adjacent frames often provide very useful information, but accurate alignment is needed before such information can be harnessed. In this work, we present a multi-alignment network, which generates multiple flow proposals followed by attention-based averaging. It serves to mimic the non-local mechanism, suppressing noise by averaging multiple observations. Our approach can be applied to various state-of-the-art models that are based on flow estimation. Experiments on a large-scale video dataset demonstrate that our method improves the denoising baseline model by 0.2 dB, and further reduces the parameters by 47% with model distillation. Code is available at https://github.com/IndigoPurple/MANet.
Submitted 11 July, 2022; v1 submitted 19 February, 2022;
originally announced February 2022.
-
Namesakes: Ambiguously Named Entities from Wikipedia and News
Authors:
Oleg Vasilyev,
Aysu Altun,
Nidhi Vyas,
Vedant Dharnidharka,
Erika Lam,
John Bohannon
Abstract:
We present Namesakes, a dataset of ambiguously named entities obtained from English-language Wikipedia and news articles. It consists of 58862 mentions of 4148 unique entities and their namesakes: 1000 mentions from news, 28843 from Wikipedia articles about the entity, and 29019 Wikipedia backlink mentions. Namesakes should be helpful in establishing challenging benchmarks for the task of named entity linking (NEL).
Submitted 22 November, 2021;
originally announced November 2021.
-
An Effective Image Restorer: Denoising and Luminance Adjustment for Low-photon-count Imaging
Authors:
Shansi Zhang,
Edmund Y. Lam
Abstract:
Imaging under photon-scarce situations introduces challenges to many applications, as the captured images have a low signal-to-noise ratio and poor luminance. In this paper, we investigate raw image restoration under low-photon-count conditions by simulating the imaging of a quanta image sensor (QIS). We develop a lightweight framework, which consists of a multi-level pyramid denoising network (MPDNet) and a luminance adjustment (LA) module to achieve separate denoising and luminance enhancement. The main component of our framework is the multi-skip attention residual block (MARB), which integrates multi-scale feature fusion and an attention mechanism for better feature representation. Our MPDNet adopts the idea of the Laplacian pyramid to learn the small-scale noise map and larger-scale high-frequency details at different levels, and feature extraction is conducted on the multi-scale input images to encode richer contextual information. Our LA module enhances the luminance of the denoised image by estimating its illumination, which can better avoid color distortion. Extensive experimental results have demonstrated that our image restorer can achieve superior performance on degraded images with various photon levels by suppressing noise and recovering luminance and color effectively.
Submitted 1 November, 2021; v1 submitted 29 October, 2021;
originally announced October 2021.
-
Transfer Learning U-Net Deep Learning for Lung Ultrasound Segmentation
Authors:
Dorothy Cheng,
Edmund Y. Lam
Abstract:
Transfer learning (TL) for medical image segmentation helps deep learning models achieve more accurate performance when medical images are scarce. This study focuses on segmenting the ribs from lung ultrasound images and finding the best TL technique with U-Net, a convolutional neural network for precise and fast image segmentation. Two TL approaches were used: using a pre-trained VGG16 model to build the U-Net (V-Unet), and pre-training the U-Net network with a grayscale natural salient object dataset (X-Unet). Visual results and dice coefficients (DICE) of the models were compared. X-Unet showed more accurate and artifact-free visual performance on the actual mask prediction, despite its lower DICE than V-Unet. A partial-frozen network fine-tuning (FT) technique was also applied to X-Unet to compare results between different FT strategies; fine-tuning all layers slightly outperformed freezing part of the network. The effect of dataset size was also evaluated, showing the importance of combining TL and data augmentation.
Submitted 5 October, 2021;
originally announced October 2021.
-
Cross-Camera Human Motion Transfer by Time Series Analysis
Authors:
Yaping Zhao,
Guanghan Li,
Edmund Y. Lam
Abstract:
With advances in optical sensor technology, heterogeneous camera systems are increasingly used for high-resolution (HR) video acquisition and analysis. However, motion transfer across multiple cameras poses challenges. To address this, we propose an algorithm based on time series analysis that identifies motion seasonality and constructs an additive model to extract transferable patterns. Validated on real-world data, our algorithm demonstrates effectiveness and interpretability. Notably, it improves pose estimation in low-resolution videos by leveraging patterns derived from HR counterparts, enhancing practical utility. Code is available at: https://github.com/IndigoPurple/TSAMT
Submitted 30 December, 2023; v1 submitted 28 September, 2021;
originally announced September 2021.
-
AET-EFN: A Versatile Design for Static and Dynamic Event-Based Vision
Authors:
Chang Liu,
Xiaojuan Qi,
Edmund Lam,
Ngai Wong
Abstract:
The neuromorphic event cameras, which capture the optical changes of a scene, have drawn increasing attention due to their high speed and low power consumption. However, the event data are noisy, sparse, and nonuniform in the spatial-temporal domain with an extremely high temporal resolution, making it challenging to design backend algorithms for event-based vision. Existing methods encode events into point-cloud-based or voxel-based representations, but suffer from noise and/or information loss. Additionally, there is little research that systematically studies how to handle static and dynamic scenes with one universal design for event-based vision. This work proposes the Aligned Event Tensor (AET) as a novel event data representation, and a neat framework called Event Frame Net (EFN), which enables our model for event-based vision under static and dynamic scenes. The proposed AET and EFN are evaluated on various datasets, and proved to surpass existing state-of-the-art methods by large margins. Our method is also efficient and achieves the fastest inference speed among others.
Submitted 22 March, 2021;
originally announced March 2021.
-
A Scalable Two Stage Approach to Computing Optimal Decision Sets
Authors:
Alexey Ignatiev,
Edward Lam,
Peter J. Stuckey,
Joao Marques-Silva
Abstract:
Machine learning (ML) is ubiquitous in modern life. Since it is being deployed in technologies that affect our privacy and safety, it is often crucial to understand the reasoning behind its decisions, warranting the need for explainable AI. Rule-based models, such as decision trees, decision lists, and decision sets, are conventionally deemed to be the most interpretable. Recent work uses propositional satisfiability (SAT) solving (and its optimization variants) to generate minimum-size decision sets. Motivated by limited practical scalability of these earlier methods, this paper proposes a novel approach to learn minimum-size decision sets by enumerating individual rules of the target decision set independently of each other, and then solving a set cover problem to select a subset of rules. The approach makes use of modern maximum satisfiability and integer linear programming technologies. Experiments on a wide range of publicly available datasets demonstrate the advantage of the new approach over the state of the art in SAT-based decision set learning.
Submitted 3 February, 2021;
originally announced February 2021.
-
Light Field View Synthesis via Aperture Disparity and Warping Confidence Map
Authors:
Nan Meng,
Kai Li,
Jianzhuang Liu,
Edmund Y. Lam
Abstract:
This paper presents a learning-based approach to synthesize the view from an arbitrary camera position given a sparse set of images. A key challenge for this novel view synthesis arises from the reconstruction process, when the views from different input images may not be consistent due to obstruction in the light path. We overcome this by jointly modeling the epipolar property and occlusion in designing a convolutional neural network. We start by defining and computing the aperture disparity map, which approximates the parallax and measures the pixel-wise shift between two views. While this relates to free-space rendering and can fail near the object boundaries, we further develop a warping confidence map to address pixel occlusion in these challenging regions. The proposed method is evaluated on diverse real-world and synthetic light field scenes, and it shows better performance over several state-of-the-art techniques.
Submitted 3 April, 2021; v1 submitted 7 September, 2020;
originally announced September 2020.
-
High-Order Residual Network for Light Field Super-Resolution
Authors:
Nan Meng,
Xiaofei Wu,
Jianzhuang Liu,
Edmund Y. Lam
Abstract:
Plenoptic cameras usually sacrifice the spatial resolution of their SAIs to acquire geometry information from different viewpoints. Several methods have been proposed to mitigate such spatio-angular trade-off, but seldom make use of the structural properties of the light field (LF) data efficiently. In this paper, we propose a novel high-order residual network to learn the geometric features hierarchically from the LF for reconstruction. An important component in the proposed network is the high-order residual block (HRB), which learns the local geometric features by considering the information from all input views. After fully obtaining the local features learned from each HRB, our model extracts the representative geometric features for spatio-angular upsampling through the global residual learning. Additionally, a refinement network is followed to further enhance the spatial details by minimizing a perceptual loss. Compared with previous work, our model is tailored to the rich structure inherent in the LF, and therefore can reduce the artifacts near non-Lambertian and occlusion regions. Experimental results show that our approach enables high-quality reconstruction even in challenging regions and outperforms state-of-the-art single image or LF reconstruction methods with both quantitative measurements and visual evaluation.
Submitted 29 March, 2020;
originally announced March 2020.
-
High-dimensional Dense Residual Convolutional Neural Network for Light Field Reconstruction
Authors:
Nan Meng,
Hayden K. -H. So,
Xing Sun,
Edmund Y. Lam
Abstract:
We consider the problem of high-dimensional light field reconstruction and develop a learning-based framework for spatial and angular super-resolution. Many current approaches either require disparity clues or restore the spatial and angular details separately. Such methods have difficulties with non-Lambertian surfaces or occlusions. In contrast, we formulate light field super-resolution (LFSR) as tensor restoration and develop a learning framework based on a two-stage restoration with 4-dimensional (4D) convolution. This allows our model to learn the features capturing the geometry information encoded in multiple adjacent views. Such geometric features vary near the occlusion regions and indicate the foreground object border. To train a feasible network, we propose a novel normalization operation based on a group of views in the feature maps, design a stage-wise loss function, and develop the multi-range training strategy to further improve the performance. Evaluations are conducted on a number of light field datasets including real-world scenes, synthetic data, and microscope light fields. The proposed method achieves superior performance and shorter execution time compared with other state-of-the-art schemes.
Submitted 17 September, 2020; v1 submitted 3 October, 2019;
originally announced October 2019.
-
Image Reconstruction Using Deep Learning
Authors:
Po-Yu Liu,
Edmund Y. Lam
Abstract:
This paper proposes a deep learning architecture that attains statistically significant improvements over traditional algorithms in Poisson image denoising, especially when the noise is strong. Poisson noise commonly occurs in low-light and photon-limited settings, where the noise can be most accurately modeled by the Poisson distribution. Poisson noise traditionally prevails only in specific fields such as astronomical imaging. However, with the booming market of surveillance cameras, which commonly operate in low-light environments, or mobile phones, which produce noisy night scene pictures due to lower-grade sensors, the necessity for an advanced Poisson image denoising algorithm has increased. Deep learning has achieved amazing breakthroughs in other imaging problems, such as image segmentation and recognition, and this paper proposes a deep learning denoising network that outperforms traditional algorithms in Poisson denoising, especially when the noise is strong. The architecture incorporates a hybrid of convolutional and deconvolutional layers along with symmetric connections. The denoising network achieved statistically significant 0.38 dB, 0.68 dB, and 1.04 dB average PSNR gains over benchmark traditional algorithms in experiments with image peak values 4, 2, and 1. The denoising network can also operate with shorter computational time while still outperforming the benchmark algorithm by tuning the reconstruction stride sizes.
Submitted 27 September, 2018;
originally announced September 2018.
-
Fast and robust misalignment correction of Fourier ptychographic microscopy
Authors:
Ao Zhou,
Wei Wang,
Ni Chen,
Edmund Y. Lam,
Byoungho Lee,
Guohai Situ
Abstract:
Fourier ptychographic microscopy (FPM) is a newly developed computational imaging technique that can provide gigapixel images with both high resolution (HR) and wide field of view (FOV). However, the positional misalignment of the LED array induces a degradation of the reconstruction, especially in the regions away from the optical axis. In this paper, we propose a robust and fast method to correct the LED misalignment of FPM, termed as misalignment correction for FPM (mcFPM). Although different regions in the FOV have different sensitivity to the LED misalignment, the experimental results show that mcFPM is robust to eliminate the degradation in each region. Compared with the state-of-the-art methods, mcFPM is much faster.
Submitted 19 February, 2018;
originally announced March 2018.
-
ChimpCheck: Property-Based Randomized Test Generation for Interactive Apps
Authors:
Edmund S. L. Lam,
Peilun Zhang,
Bor-Yuh Evan Chang
Abstract:
We consider the problem of generating relevant execution traces to test rich interactive applications. Rich interactive applications, such as apps on mobile platforms, are complex stateful and often distributed systems where sufficiently exercising the app with user-interaction (UI) event sequences to expose defects is both hard and time-consuming. In particular, there is a fundamental tension between brute-force random UI exercising tools, which are fully-automated but offer low relevance, and UI test scripts, which are manual but offer high relevance. In this paper, we consider a middle way---enabling a seamless fusion of scripted and randomized UI testing. This fusion is prototyped in a testing tool called ChimpCheck for programming, generating, and executing property-based randomized test cases for Android apps. Our approach realizes this fusion by offering a high-level, embedded domain-specific language for defining custom generators of simulated user-interaction event sequences. What follows is a combinator library built on industrial strength frameworks for property-based testing (ScalaCheck) and Android testing (Android JUnit and Espresso) to implement property-based randomized testing for Android development. Driven by real, reported issues in open source Android apps, we show, through case studies, how ChimpCheck enables expressing effective testing patterns in a compact manner.
Submitted 29 August, 2017;
originally announced August 2017.
-
Rank Persistence: Assessing the Temporal Performance of Real-World Person Re-Identification
Authors:
Srikrishna Karanam,
Eric Lam,
Richard J. Radke
Abstract:
Designing useful person re-identification systems for real-world applications requires attention to operational aspects not typically considered in academic research. Here, we focus on the temporal aspect of re-identification; that is, instead of finding a match to a probe person of interest in a fixed candidate gallery, we consider the more realistic scenario in which the gallery is continuously populated by new candidates over a long time period. A key question of interest for an operator of such a system is: how long is a correct match to a probe likely to remain in a rank-k shortlist of possible candidates? We propose to distill this information into a Rank Persistence Curve (RPC), which allows different algorithms' temporal performance characteristics to be directly compared. We present examples to illustrate the RPC using a new long-term dataset with multiple candidate reappearances, and discuss considerations for future re-identification research that explicitly involves temporal aspects.
Submitted 4 June, 2017; v1 submitted 2 June, 2017;
originally announced June 2017.
-
Analysis of the noise in back-projection light field acquisition and its optimization
Authors:
Ni Chen,
Zhenbo Ren,
Dayan Li,
Edmund Y. Lam,
Guohai Situ
Abstract:
Light field reconstruction from images captured by focal plane sweeping can achieve high lateral resolution comparable to the modern camera sensor. This is impossible for the conventional micro-lenslet based light field capture systems. However, the severe defocus noise and the low depth resolution limit its applications. In this paper, we analyze the defocus noise and the depth resolution in the focal plane sweeping based light field reconstruction technique, and propose a method to reduce the defocus noise and improve the depth resolution. Both numerical and experimental results verify the proposed method.
Submitted 29 December, 2016;
originally announced January 2017.
-
Consistency Analysis for the Doubly Stochastic Dirichlet Process
Authors:
Xing Sun,
Nelson H. C. Yung,
Edmund Y. Lam,
Hayden K. -H. So
Abstract:
This technical report proves components consistency for the Doubly Stochastic Dirichlet Process with exponential convergence of posterior probability. We also present the fundamental properties for DSDP as well as inference algorithms. Simulation toy experiment and real-world experiment results for single and multi-cluster also support the consistency proof. This report is also a support document for the paper "Computationally Efficient Hyperspectral Data Learning Based on the Doubly Stochastic Dirichlet Process".
Submitted 24 May, 2016;
originally announced May 2016.
-
Spectrally and Energy Efficient OFDM (SEE-OFDM) for Intensity Modulated Optical Wireless Systems
Authors:
Emily Lam,
Sarah Kate Wilson,
Hany Elgala,
Thomas D. C. Little
Abstract:
Spectrally and energy efficient orthogonal frequency division multiplexing (SEE-OFDM) is an optical OFDM technique based on combining multiple asymmetrically clipped optical OFDM (ACO-OFDM) signals into one OFDM signal. By summing different components together, SEE-OFDM can achieve the same spectral efficiency as DC-biased optical OFDM (DCO-OFDM) without an energy-inefficient DC-bias. This paper introduces multiple methods for decoding a SEE-OFDM symbol and shows that an iterative decoder with hard decisions gives the best performance. Being a multi-component format, different energy allocation amongst the different components of SEE-OFDM is possible. However, equal energy allocation performs 1.5 dB better than unequal energy allocation. A hard-decision, iterative subtraction receiver can further increase performance by another 1.5 dB over soft-decision subtraction and reconstruction receivers. SEE-OFDM consistently performs 3 dB or better and with higher spectral efficiency than ACO-OFDM at the same bit-error-rate (BER). Comparing other combination methods at the same BER, SEE-OFDM performs up to 3 dB better than hybrid asymmetrically clipped optical OFDM (HACO-OFDM) and up to 1.5 dB better than asymmetrically and symmetrically clipped optical OFDM (ASCO-OFDM) and enhanced unipolar OFDM (eU-OFDM) when using hard decisions at the receiver. Additionally, SEE-OFDM has the best peak-to-average power ratio (PAPR) as compared to the other combination OFDM formats and ACO-OFDM, which makes it excellent for any range limited optical source, such as laser diodes and light-emitting diodes (LEDs). In summary, SEE-OFDM is shown to have excellent properties to glean additional capacity from an intensity modulation and direct detection (IM/DD) optical wireless communications system.
Submitted 27 October, 2015;
originally announced October 2015.
-
Constraint Handling Rules with Multiset Comprehension Patterns
Authors:
Edmund S. L. Lam,
Iliano Cervesato
Abstract:
CHR is a declarative, concurrent and committed choice rule-based constraint programming language. We extend CHR with multiset comprehension patterns, providing the programmer with the ability to write multiset rewriting rules that can match a variable number of constraints in the store. This enables writing more readable, concise and declarative code for algorithms that coordinate large amounts of data or require aggregate operations. We call this extension $\mathit{CHR}^\mathit{cp}$. We give a high-level abstract semantics of $\mathit{CHR}^\mathit{cp}$, followed by a lower-level operational semantics. We then show the soundness of this operational semantics with respect to the abstract semantics.
Submitted 9 June, 2014;
originally announced June 2014.
-
Concurrent Goal-Based Execution of Constraint Handling Rules
Authors:
Edmund S. L. Lam,
Martin Sulzmann
Abstract:
(To appear in Theory and Practice of Logic Programming (TPLP)) We introduce a systematic, concurrent execution scheme for Constraint Handling Rules (CHR) based on a previously proposed sequential goal-based CHR semantics. We establish strong correspondence results to the abstract CHR semantics, thus guaranteeing that any answer in the concurrent, goal-based CHR semantics is reproducible in the abstract CHR semantics. Our work provides the foundation to obtain efficient, parallel CHR execution schemes.
Submitted 20 June, 2010; v1 submitted 15 June, 2010;
originally announced June 2010.