subscribe to arXiv mailings

arXiv:2405.19760 [pdf, ps, other]

Identifiability of a statistical model with two latent vectors: Importance of the dimensionality relation and application to graph embedding

Authors: Hiroaki Sasaki

Abstract: Identifiability of statistical models is a key notion in unsupervised representation learning. Recent work of nonlinear independent component analysis (ICA) employs auxiliary data and has established identifiable conditions. This paper proposes a statistical model of two latent vectors with single auxiliary data generalizing nonlinear ICA, and establishes various identifiability conditions. Unlike… ▽ More Identifiability of statistical models is a key notion in unsupervised representation learning. Recent work of nonlinear independent component analysis (ICA) employs auxiliary data and has established identifiable conditions. This paper proposes a statistical model of two latent vectors with single auxiliary data generalizing nonlinear ICA, and establishes various identifiability conditions. Unlike previous work, the two latent vectors in the proposed model can have arbitrary dimensions, and this property enables us to reveal an insightful dimensionality relation among two latent vectors and auxiliary data in identifiability conditions. Furthermore, surprisingly, we prove that the indeterminacies of the proposed model has the same as \emph{linear} ICA under certain conditions: The elements in the latent vector can be recovered up to their permutation and scales. Next, we apply the identifiability theory to a statistical model for graph data. As a result, one of the identifiability conditions includes an appealing implication: Identifiability of the statistical model could depend on the maximum value of link weights in graph data. Then, we propose a practical method for identifiable graph embedding. Finally, we numerically demonstrate that the proposed method well-recovers the latent vectors and model identifiability clearly depends on the maximum value of link weights, which supports the implication of our theoretical results △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2402.10781 [pdf, other]

Towards 6G Evolution: Three Enhancements, Three Innovations, and Three Major Challenges

Authors: Rohit Singh, Aryan Kaushik, Wonjae Shin, Marco Di Renzo, Vincenzo Sciancalepore, Doohwan Lee, Hirofumi Sasaki, Arman Shojaeifard, Octavia A. Dobre

Abstract: Over the past few decades, wireless communication has witnessed remarkable growth, experiencing several transformative changes. This article aims to provide a comprehensive overview of wireless communication technologies, from the foundations to the recent wireless advances. Specifically, we take a neutral look at the state-of-the-art technologies for 5G and the ongoing evolutions towards 6G, revi… ▽ More Over the past few decades, wireless communication has witnessed remarkable growth, experiencing several transformative changes. This article aims to provide a comprehensive overview of wireless communication technologies, from the foundations to the recent wireless advances. Specifically, we take a neutral look at the state-of-the-art technologies for 5G and the ongoing evolutions towards 6G, reviewing the recommendations of the International Mobile Communication vision for 2030 (IMT-2030). We first highlight specific features of IMT 2030, including three IMT-2020 extensions (URLLC+, eMBB+, and mMTC+) and three new innovations (Ubiquitous connectivity and integrating the new capabilities of sensing & AI with communication functionality). Then, we delve into three major challenges in implementing 6G, along with global standardization efforts. Besides, a proof of concept is provided by demonstrating terahertz (THz) signal transmission using Orbital Angular Momentum (OAM) multiplexing, which is one of the potential candidates for 6G and beyond. To inspire further potential research, we conclude by identifying research opportunities and future visions on IMT-2030 recommendations. △ Less

Submitted 16 February, 2024; originally announced February 2024.

Comments: 8 pages, 4 figures, 1 table

arXiv:2303.12375 [pdf, other]

Disturbance Injection under Partial Automation: Robust Imitation Learning for Long-horizon Tasks

Authors: Hirotaka Tahara, Hikaru Sasaki, Hanbit Oh, Edgar Anarossi, Takamitsu Matsubara

Abstract: Partial Automation (PA) with intelligent support systems has been introduced in industrial machinery and advanced automobiles to reduce the burden of long hours of human operation. Under PA, operators perform manual operations (providing actions) and operations that switch to automatic/manual mode (mode-switching). Since PA reduces the total duration of manual operation, these two action and mode-… ▽ More Partial Automation (PA) with intelligent support systems has been introduced in industrial machinery and advanced automobiles to reduce the burden of long hours of human operation. Under PA, operators perform manual operations (providing actions) and operations that switch to automatic/manual mode (mode-switching). Since PA reduces the total duration of manual operation, these two action and mode-switching operations can be replicated by imitation learning with high sample efficiency. To this end, this paper proposes Disturbance Injection under Partial Automation (DIPA) as a novel imitation learning framework. In DIPA, mode and actions (in the manual mode) are assumed to be observables in each state and are used to learn both action and mode-switching policies. The above learning is robustified by injecting disturbances into the operator's actions to optimize the disturbance's level for minimizing the covariate shift under PA. We experimentally validated the effectiveness of our method for long-horizon tasks in two simulations and a real robot environment and confirmed that our method outperformed the previous methods and reduced the demonstration burden. △ Less

Submitted 22 March, 2023; originally announced March 2023.

Comments: 8 pages, Accepted by Robotics and Automation Letters (RA-L) 2023

arXiv:2211.03393 [pdf, other]

Bayesian Disturbance Injection: Robust Imitation Learning of Flexible Policies for Robot Manipulation

Authors: Hanbit Oh, Hikaru Sasaki, Brendan Michael, Takamitsu Matsubara

Abstract: Humans demonstrate a variety of interesting behavioral characteristics when performing tasks, such as selecting between seemingly equivalent optimal actions, performing recovery actions when deviating from the optimal trajectory, or moderating actions in response to sensed risks. However, imitation learning, which attempts to teach robots to perform these same tasks from observations of human demo… ▽ More Humans demonstrate a variety of interesting behavioral characteristics when performing tasks, such as selecting between seemingly equivalent optimal actions, performing recovery actions when deviating from the optimal trajectory, or moderating actions in response to sensed risks. However, imitation learning, which attempts to teach robots to perform these same tasks from observations of human demonstrations, often fails to capture such behavior. Specifically, commonly used learning algorithms embody inherent contradictions between the learning assumptions (e.g., single optimal action) and actual human behavior (e.g., multiple optimal actions), thereby limiting robot generalizability, applicability, and demonstration feasibility. To address this, this paper proposes designing imitation learning algorithms with a focus on utilizing human behavioral characteristics, thereby embodying principles for capturing and exploiting actual demonstrator behavioral characteristics. This paper presents the first imitation learning framework, Bayesian Disturbance Injection (BDI), that typifies human behavioral characteristics by incorporating model flexibility, robustification, and risk sensitivity. Bayesian inference is used to learn flexible non-parametric multi-action policies, while simultaneously robustifying policies by injecting risk-sensitive disturbances to induce human recovery action and ensuring demonstration feasibility. Our method is evaluated through risk-sensitive simulations and real-robot experiments (e.g., table-sweep task, shaft-reach task and shaft-insertion task) using the UR5e 6-DOF robotic arm, to demonstrate the improved characterisation of behavior. Results show significant improvement in task performance, through improved flexibility, robustness as well as demonstration feasibility. △ Less

Submitted 7 November, 2022; originally announced November 2022.

Comments: 69 pages, 9 figures, accepted by Elsevier Neural Networks - Journal

arXiv:2205.04195 [pdf, other]

Disturbance-Injected Robust Imitation Learning with Task Achievement

Authors: Hirotaka Tahara, Hikaru Sasaki, Hanbit Oh, Brendan Michael, Takamitsu Matsubara

Abstract: Robust imitation learning using disturbance injections overcomes issues of limited variation in demonstrations. However, these methods assume demonstrations are optimal, and that policy stabilization can be learned via simple augmentations. In real-world scenarios, demonstrations are often of diverse-quality, and disturbance injection instead learns sub-optimal policies that fail to replicate desi… ▽ More Robust imitation learning using disturbance injections overcomes issues of limited variation in demonstrations. However, these methods assume demonstrations are optimal, and that policy stabilization can be learned via simple augmentations. In real-world scenarios, demonstrations are often of diverse-quality, and disturbance injection instead learns sub-optimal policies that fail to replicate desired behavior. To address this issue, this paper proposes a novel imitation learning framework that combines both policy robustification and optimal demonstration learning. Specifically, this combinatorial approach forces policy learning and disturbance injection optimization to focus on mainly learning from high task achievement demonstrations, while utilizing low achievement ones to decrease the number of samples needed. The effectiveness of the proposed method is verified through experiments using an excavation task in both simulations and a real robot, resulting in high-achieving policies that are more stable and robust to diverse-quality demonstrations. In addition, this method utilizes all of the weighted sub-optimal demonstrations without eliminating them, resulting in practical data efficiency benefits. △ Less

Submitted 9 May, 2022; originally announced May 2022.

Comments: 7 pages, Accepted by the 2022 International Conference on Robotics and Automation (ICRA 2022)

arXiv:2205.03552 [pdf, other]

Gaussian Process Self-triggered Policy Search in Weakly Observable Environments

Authors: Hikaru Sasaki, Terushi Hirabayashi, Kaoru Kawabata, Takamitsu Matsubara

Abstract: The environments of such large industrial machines as waste cranes in waste incineration plants are often weakly observable, where little information about the environmental state is contained in the observations due to technical difficulty or maintenance cost (e.g., no sensors for observing the state of the garbage to be handled). Based on the findings that skilled operators in such environments… ▽ More The environments of such large industrial machines as waste cranes in waste incineration plants are often weakly observable, where little information about the environmental state is contained in the observations due to technical difficulty or maintenance cost (e.g., no sensors for observing the state of the garbage to be handled). Based on the findings that skilled operators in such environments choose predetermined control strategies (e.g., grasping and scattering) and their durations based on sensor values, %thereby improving the robustness of their actions, we propose a novel non-parametric policy search algorithm: Gaussian process self-triggered policy search (GPSTPS). GPSTPS has two types of control policies: action and duration. A gating mechanism either maintains the action selected by the action policy for the duration specified by the duration policy or updates the action and duration by passing new observations to the policy; therefore, it is categorized as self-triggered. GPSTPS simultaneously learns both policies by trial and error based on sparse GP priors and variational learning to maximize the return. To verify the performance of our proposed method, we conducted experiments on garbage-grasping-scattering task for a waste crane with weak observations using a simulation and a robotic waste crane system. As experimental results, the proposed method acquired suitable policies to determine the action and duration based on the garbage's characteristics. △ Less

Submitted 7 May, 2022; originally announced May 2022.

Comments: Accepted for IEEE ICRA2022

arXiv:2111.12701 [pdf, other]

Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes

Authors: Sam Bond-Taylor, Peter Hessey, Hiroshi Sasaki, Toby P. Breckon, Chris G. Willcocks

Abstract: Whilst diffusion probabilistic models can generate high quality image content, key limitations remain in terms of both generating high-resolution imagery and their associated high computational requirements. Recent Vector-Quantized image models have overcome this limitation of image resolution but are prohibitively slow and unidirectional as they generate tokens via element-wise autoregressive sam… ▽ More Whilst diffusion probabilistic models can generate high quality image content, key limitations remain in terms of both generating high-resolution imagery and their associated high computational requirements. Recent Vector-Quantized image models have overcome this limitation of image resolution but are prohibitively slow and unidirectional as they generate tokens via element-wise autoregressive sampling from the prior. By contrast, in this paper we propose a novel discrete diffusion probabilistic model prior which enables parallel prediction of Vector-Quantized tokens by using an unconstrained Transformer architecture as the backbone. During training, tokens are randomly masked in an order-agnostic manner and the Transformer learns to predict the original tokens. This parallelism of Vector-Quantized token prediction in turn facilitates unconditional generation of globally consistent high-resolution and diverse imagery at a fraction of the computational expense. In this manner, we can generate image resolutions exceeding that of the original training set samples whilst additionally provisioning per-image likelihood estimates (in a departure from generative adversarial approaches). Our approach achieves state-of-the-art results in terms of Density (LSUN Bedroom: 1.51; LSUN Churches: 1.12; FFHQ: 1.20) and Coverage (LSUN Bedroom: 0.83; LSUN Churches: 0.73; FFHQ: 0.80), and performs competitively on FID (LSUN Bedroom: 3.64; LSUN Churches: 4.07; FFHQ: 6.11) whilst offering advantages in terms of both computation and reduced training set requirements. △ Less

Submitted 24 November, 2021; originally announced November 2021.

Comments: 19 pages, 14 figures

MSC Class: 68T01 (Primary); 68T07 (Secondary) ACM Class: I.5.0; I.4.0; G.3

arXiv:2106.07125 [pdf, other]

Variational Policy Search using Sparse Gaussian Process Priors for Learning Multimodal Optimal Actions

Authors: Hikaru Sasaki, Takamitsu Matsubara

Abstract: Policy search reinforcement learning has been drawing much attention as a method of learning a robot control policy. In particular, policy search using such non-parametric policies as Gaussian process regression can learn optimal actions with high-dimensional and redundant sensors as input. However, previous methods implicitly assume that the optimal action becomes unique for each state. This assu… ▽ More Policy search reinforcement learning has been drawing much attention as a method of learning a robot control policy. In particular, policy search using such non-parametric policies as Gaussian process regression can learn optimal actions with high-dimensional and redundant sensors as input. However, previous methods implicitly assume that the optimal action becomes unique for each state. This assumption can severely limit such practical applications as robot manipulations since designing a reward function that appears in only one optimal action for complex tasks is difficult. The previous methods might have caused critical performance deterioration because the typical non-parametric policies cannot capture the optimal actions due to their unimodality. We propose novel approaches in non-parametric policy searches with multiple optimal actions and offer two different algorithms commonly based on a sparse Gaussian process prior and variational Bayesian inference. The following are the key ideas: 1) multimodality for capturing multiple optimal actions and 2) mode-seeking for capturing one optimal action by ignoring the others. First, we propose a multimodal sparse Gaussian process policy search that uses multiple overlapped GPs as a prior. Second, we propose a mode-seeking sparse Gaussian process policy search that uses the student-t distribution for a likelihood function. The effectiveness of those algorithms is demonstrated through applications to object manipulation tasks with multiple optimal actions in simulations. △ Less

Submitted 13 June, 2021; originally announced June 2021.

Comments: Accepted by Neural Networks

arXiv:2104.05358 [pdf, other]

UNIT-DDPM: UNpaired Image Translation with Denoising Diffusion Probabilistic Models

Authors: Hiroshi Sasaki, Chris G. Willcocks, Toby P. Breckon

Abstract: We propose a novel unpaired image-to-image translation method that uses denoising diffusion probabilistic models without requiring adversarial training. Our method, UNpaired Image Translation with Denoising Diffusion Probabilistic Models (UNIT-DDPM), trains a generative model to infer the joint distribution of images over both domains as a Markov chain by minimising a denoising score matching obje… ▽ More We propose a novel unpaired image-to-image translation method that uses denoising diffusion probabilistic models without requiring adversarial training. Our method, UNpaired Image Translation with Denoising Diffusion Probabilistic Models (UNIT-DDPM), trains a generative model to infer the joint distribution of images over both domains as a Markov chain by minimising a denoising score matching objective conditioned on the other domain. In particular, we update both domain translation models simultaneously, and we generate target domain images by a denoising Markov Chain Monte Carlo approach that is conditioned on the input source domain images, based on Langevin dynamics. Our approach provides stable model training for image-to-image translation and generates high-quality image outputs. This enables state-of-the-art Fréchet Inception Distance (FID) performance on several public datasets, including both colour and multispectral imagery, significantly outperforming the contemporary adversarial image-to-image translation methods. △ Less

Submitted 12 April, 2021; originally announced April 2021.

Comments: 10 pages, 8 figures

arXiv:2103.13623 [pdf, other]

doi 10.1109/ICRA48506.2021.9561573.

Bayesian Disturbance Injection: Robust Imitation Learning of Flexible Policies

Authors: Hanbit Oh, Hikaru Sasaki, Brendan Michael, Takamitsu Matsubara

Abstract: Scenarios requiring humans to choose from multiple seemingly optimal actions are commonplace, however standard imitation learning often fails to capture this behavior. Instead, an over-reliance on replicating expert actions induces inflexible and unstable policies, leading to poor generalizability in an application. To address the problem, this paper presents the first imitation learning framework… ▽ More Scenarios requiring humans to choose from multiple seemingly optimal actions are commonplace, however standard imitation learning often fails to capture this behavior. Instead, an over-reliance on replicating expert actions induces inflexible and unstable policies, leading to poor generalizability in an application. To address the problem, this paper presents the first imitation learning framework that incorporates Bayesian variational inference for learning flexible non-parametric multi-action policies, while simultaneously robustifying the policies against sources of error, by introducing and optimizing disturbances to create a richer demonstration dataset. This combinatorial approach forces the policy to adapt to challenging situations, enabling stable multi-action policies to be learned efficiently. The effectiveness of our proposed method is evaluated through simulations and real-robot experiments for a table-sweep task using the UR3 6-DOF robotic arm. Results show that, through improved flexibility and robustness, the learning performance and control safety are better than comparison methods. △ Less

Submitted 7 November, 2022; v1 submitted 25 March, 2021; originally announced March 2021.

Comments: 7 pages, Accepted by the 2021 International Conference on Robotics and Automation (ICRA 2021)

arXiv:2101.02083 [pdf, other]

Representation learning for maximization of MI, nonlinear ICA and nonlinear subspaces with robust density ratio estimation

Authors: Hiroaki Sasaki, Takashi Takenouchi

Abstract: Contrastive learning is a recent promising approach in unsupervised representation learning where a feature representation of data is learned by solving a pseudo classification problem from unlabelled data. However, it is not straightforward to understand what representation contrastive learning yields. In addition, contrastive learning is often based on the maximum likelihood estimation, which te… ▽ More Contrastive learning is a recent promising approach in unsupervised representation learning where a feature representation of data is learned by solving a pseudo classification problem from unlabelled data. However, it is not straightforward to understand what representation contrastive learning yields. In addition, contrastive learning is often based on the maximum likelihood estimation, which tends to be vulnerable to the contamination by outliers. To promote the understanding to contrastive learning, this paper first theoretically shows a connection to maximization of mutual information (MI). Our result indicates that density ratio estimation is necessary and sufficient for maximization of MI under some conditions. Thus, contrastive learning related to density ratio estimation as done in popular objective functions can be interpreted as maximizing MI. Next, with the density ratio, we establish new recovery conditions for the latent source components in nonlinear independent component analysis (ICA). In contrast with existing work, the established conditions include a novel insight for the dimensionality of data, which is clearly supported by numerical experiments. Furthermore, inspired by nonlinear ICA, we propose a novel framework to estimate a nonlinear subspace for lower-dimensional latent source components, and some theoretical conditions for the subspace estimation are established with the density ratio. Then, we propose a practical method through outlier-robust density ratio estimation, which can be seen as performing maximization of MI, nonlinear ICA or nonlinear subspace estimation. Moreover, a sample-efficient nonlinear ICA method is also proposed. We theoretically investigate outlier-robustness of the proposed methods. Finally, the usefulness of the proposed methods is numerically demonstrated in nonlinear ICA and through application to linear classification. △ Less

Submitted 8 August, 2022; v1 submitted 6 January, 2021; originally announced January 2021.

arXiv:2009.08005 [pdf]

doi 10.1016/j.compbiomed.2020.104009

Model-based approach for analyzing prevalence of nuclear cataracts in elderly residents

Authors: Sachiko Kodera, Akimasa Hirata, Fumiaki Miura, Essam A. Rashed, Natsuko Hatsusaka, Naoki Yamamoto, Eri Kubo, Hiroshi Sasaki

Abstract: Recent epidemiological studies have hypothesized that the prevalence of cortical cataracts is closely related to ultraviolet radiation. However, the prevalence of nuclear cataracts is higher in elderly people in tropical areas than in temperate areas. The dominant factors inducing nuclear cataracts have been widely debated. In this study, the temperature increase in the lens due to exposure to amb… ▽ More Recent epidemiological studies have hypothesized that the prevalence of cortical cataracts is closely related to ultraviolet radiation. However, the prevalence of nuclear cataracts is higher in elderly people in tropical areas than in temperate areas. The dominant factors inducing nuclear cataracts have been widely debated. In this study, the temperature increase in the lens due to exposure to ambient conditions was computationally quantified in subjects of 50-60 years of age in tropical and temperate areas, accounting for differences in thermoregulation. A thermoregulatory response model was extended to consider elderly people in tropical areas. The time course of lens temperature for different weather conditions in five cities in Asia was computed. The temperature was higher around the mid and posterior part of the lens, which coincides with the position of the nuclear cataract. The duration of higher temperatures in the lens varied, although the daily maximum temperatures were comparable. A strong correlation (adjusted R2 > 0.85) was observed between the prevalence of nuclear cataract and the computed cumulative thermal dose in the lens. We propose the use of a cumulative thermal dose to assess the prevalence of nuclear cataracts. Cumulative wet-bulb globe temperature, a new metric computed from weather data, would be useful for practical assessment in different cities. △ Less

Submitted 16 September, 2020; originally announced September 2020.

Comments: Submitted to Computers in Biology and Medicine

Journal ref: Computers in Biology and Medicine, 2020

arXiv:2005.02436 [pdf, other]

Data Augmentation via Mixed Class Interpolation using Cycle-Consistent Generative Adversarial Networks Applied to Cross-Domain Imagery

Authors: Hiroshi Sasaki, Chris G. Willcocks, Toby P. Breckon

Abstract: Machine learning driven object detection and classification within non-visible imagery has an important role in many fields such as night vision, all-weather surveillance and aviation security. However, such applications often suffer due to the limited quantity and variety of non-visible spectral domain imagery, in contrast to the high data availability of visible-band imagery that readily enables… ▽ More Machine learning driven object detection and classification within non-visible imagery has an important role in many fields such as night vision, all-weather surveillance and aviation security. However, such applications often suffer due to the limited quantity and variety of non-visible spectral domain imagery, in contrast to the high data availability of visible-band imagery that readily enables contemporary deep learning driven detection and classification approaches. To address this problem, this paper proposes and evaluates a novel data augmentation approach that leverages the more readily available visible-band imagery via a generative domain transfer model. The model can synthesise large volumes of non-visible domain imagery by image-to-image (I2I) translation from the visible image domain. Furthermore, we show that the generation of interpolated mixed class (non-visible domain) image examples via our novel Conditional CycleGAN Mixup Augmentation (C2GMA) methodology can lead to a significant improvement in the quality of non-visible domain classification tasks that otherwise suffer due to limited data availability. Focusing on classification within the Synthetic Aperture Radar (SAR) domain, our approach is evaluated on a variation of the Statoil/C-CORE Iceberg Classifier Challenge dataset and achieves 75.4% accuracy, demonstrating a significant improvement when compared against traditional data augmentation strategies (Rotation, Mixup, and MixCycleGAN). △ Less

Submitted 1 January, 2021; v1 submitted 5 May, 2020; originally announced May 2020.

Comments: 9 pages, 9 figures, accepted at the 25th International Conference on Pattern Recognition (ICPR 2020)

arXiv:1911.00265 [pdf, other]

Robust contrastive learning and nonlinear ICA in the presence of outliers

Authors: Hiroaki Sasaki, Takashi Takenouchi, Ricardo Monti, Aapo Hyvärinen

Abstract: Nonlinear independent component analysis (ICA) is a general framework for unsupervised representation learning, and aimed at recovering the latent variables in data. Recent practical methods perform nonlinear ICA by solving a series of classification problems based on logistic regression. However, it is well-known that logistic regression is vulnerable to outliers, and thus the performance can be… ▽ More Nonlinear independent component analysis (ICA) is a general framework for unsupervised representation learning, and aimed at recovering the latent variables in data. Recent practical methods perform nonlinear ICA by solving a series of classification problems based on logistic regression. However, it is well-known that logistic regression is vulnerable to outliers, and thus the performance can be strongly weakened by outliers. In this paper, we first theoretically analyze nonlinear ICA models in the presence of outliers. Our analysis implies that estimation in nonlinear ICA can be seriously hampered when outliers exist on the tails of the (noncontaminated) target density, which happens in a typical case of contamination by outliers. We develop two robust nonlinear ICA methods based on the γ-divergence, which is a robust alternative to the KL-divergence in logistic regression. The proposed methods are shown to have desired robustness properties in the context of nonlinear ICA. We also experimentally demonstrate that the proposed methods are very robust and outperform existing methods in the presence of outliers. Finally, the proposed method is applied to ICA-based causal discovery and shown to find a plausible causal relationship on fMRI data. △ Less

Submitted 1 November, 2019; originally announced November 2019.

arXiv:1910.08280 [pdf, other]

Robust modal regression with direct log-density derivative estimation

Authors: Hiroaki Sasaki, Tomoya Sakai, Takafumi Kanamori

Abstract: Modal regression is aimed at estimating the global mode (i.e., global maximum) of the conditional density function of the output variable given input variables, and has led to regression methods robust against heavy-tailed or skewed noises. The conditional mode is often estimated through maximization of the modal regression risk (MRR). In order to apply a gradient method for the maximization, the… ▽ More Modal regression is aimed at estimating the global mode (i.e., global maximum) of the conditional density function of the output variable given input variables, and has led to regression methods robust against heavy-tailed or skewed noises. The conditional mode is often estimated through maximization of the modal regression risk (MRR). In order to apply a gradient method for the maximization, the fundamental challenge is accurate approximation of the gradient of MRR, not MRR itself. To overcome this challenge, in this paper, we take a novel approach of directly approximating the gradient of MRR. To approximate the gradient, we develop kernelized and neural-network-based versions of the least-squares log-density derivative estimator, which directly approximates the derivative of the log-density without density estimation. With direct approximation of the MRR gradient, we first propose a modal regression method with kernels, and derive a new parameter update rule based on a fixed-point method. Then, the derived update rule is theoretically proved to have a monotonic hill-climbing property towards the conditional mode. Furthermore, we indicate that our approach of directly approximating the gradient is compatible with recent sophisticated stochastic gradient methods (e.g., Adam), and then propose another modal regression method based on neural networks. Finally, the superior performance of the proposed methods is demonstrated on various artificial and benchmark datasets. △ Less

Submitted 18 October, 2019; originally announced October 2019.

arXiv:1906.01838 [pdf, other]

Practical Byte-Granular Memory Blacklisting using Califorms

Authors: Hiroshi Sasaki, Miguel A. Arroyo, M. Tarek Ibn Ziad, Koustubha Bhat, Kanad Sinha, Simha Sethumadhavan

Abstract: Recent rapid strides in memory safety tools and hardware have improved software quality and security. While coarse-grained memory safety has improved, achieving memory safety at the granularity of individual objects remains a challenge due to high performance overheads which can be between ~1.7x-2.2x. In this paper, we present a novel idea called Califorms, and associated program observations, to… ▽ More Recent rapid strides in memory safety tools and hardware have improved software quality and security. While coarse-grained memory safety has improved, achieving memory safety at the granularity of individual objects remains a challenge due to high performance overheads which can be between ~1.7x-2.2x. In this paper, we present a novel idea called Califorms, and associated program observations, to obtain a low overhead security solution for practical, byte-granular memory safety. The idea we build on is called memory blacklisting, which prohibits a program from accessing certain memory regions based on program semantics. State of the art hardware-supported memory blacklisting while much faster than software blacklisting creates memory fragmentation (of the order of few bytes) for each use of the blacklisted location. In this paper, we observe that metadata used for blacklisting can be stored in dead spaces in a program's data memory and that this metadata can be integrated into microarchitecture by changing the cache line format. Using these observations, Califorms based system proposed in this paper reduces the performance overheads of memory safety to ~1.02x-1.16x while providing byte-granular protection and maintaining very low hardware overheads. The low overhead offered by Califorms enables always on, memory safety for small and large objects alike, and the fundamental idea of storing metadata in empty spaces, and microarchitecture can be used for other security and performance applications. △ Less

Submitted 10 June, 2019; v1 submitted 5 June, 2019; originally announced June 2019.

arXiv:1806.01754 [pdf, ps, other]

Neural-Kernelized Conditional Density Estimation

Authors: Hiroaki Sasaki, Aapo Hyvärinen

Abstract: Conditional density estimation is a general framework for solving various problems in machine learning. Among existing methods, non-parametric and/or kernel-based methods are often difficult to use on large datasets, while methods based on neural networks usually make restrictive parametric assumptions on the probability densities. Here, we propose a novel method for estimating the conditional den… ▽ More Conditional density estimation is a general framework for solving various problems in machine learning. Among existing methods, non-parametric and/or kernel-based methods are often difficult to use on large datasets, while methods based on neural networks usually make restrictive parametric assumptions on the probability densities. Here, we propose a novel method for estimating the conditional density based on score matching. In contrast to existing methods, we employ scalable neural networks, but do not make explicit parametric assumptions on densities. The key challenge in applying score matching to neural networks is computation of the first- and second-order derivatives of a model for the log-density. We tackle this challenge by developing a new neural-kernelized approach, which can be applied on large datasets with stochastic gradient descent, while the reproducing kernels allow for easy computation of the derivatives needed in score matching. We show that the neural-kernelized function approximator has universal approximation capability and that our method is consistent in conditional density estimation. We numerically demonstrate that our method is useful in high-dimensional conditional density estimation, and compares favourably with existing methods. Finally, we prove that the proposed method has interesting connections to two probabilistically principled frameworks of representation learning: Nonlinear sufficient dimension reduction and nonlinear independent component analysis. △ Less

Submitted 5 June, 2018; originally announced June 2018.

arXiv:1805.08651 [pdf, other]

Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning

Authors: Aapo Hyvarinen, Hiroaki Sasaki, Richard E. Turner

Abstract: Nonlinear ICA is a fundamental problem for unsupervised representation learning, emphasizing the capacity to recover the underlying latent variables generating the data (i.e., identifiability). Recently, the very first identifiability proofs for nonlinear ICA have been proposed, leveraging the temporal structure of the independent components. Here, we propose a general framework for nonlinear ICA,… ▽ More Nonlinear ICA is a fundamental problem for unsupervised representation learning, emphasizing the capacity to recover the underlying latent variables generating the data (i.e., identifiability). Recently, the very first identifiability proofs for nonlinear ICA have been proposed, leveraging the temporal structure of the independent components. Here, we propose a general framework for nonlinear ICA, which, as a special case, can make use of temporal structure. It is based on augmenting the data by an auxiliary variable, such as the time index, the history of the time series, or any other available information. We propose to learn nonlinear ICA by discriminating between true augmented data, or data in which the auxiliary variable has been randomized. This enables the framework to be implemented algorithmically through logistic regression, possibly in a neural network. We provide a comprehensive proof of the identifiability of the model as well as the consistency of our estimation method. The approach not only provides a general theoretical framework combining and generalizing previously proposed nonlinear ICA models and algorithms, but also brings practical advantages. △ Less

Submitted 4 February, 2019; v1 submitted 22 May, 2018; originally announced May 2018.

Comments: Camera-ready version of article accepted for AISTATS2019

arXiv:1504.05628 [pdf, ps, other]

doi 10.1109/ISIT.2015.7282544

Key Rate of the B92 Quantum Key Distribution Protocol with Finite Qubits

Authors: Hiroaki Sasaki, Ryutaroh Matsumoto, Tomohiko Uyematsu

Abstract: The key rate of the B92 quantum key distribution protocol had not been reported before this research when the number of qubits is finite. We compute it by using the security analysis framework proposed by Scarani and Renner in 2008. The key rate of the B92 quantum key distribution protocol had not been reported before this research when the number of qubits is finite. We compute it by using the security analysis framework proposed by Scarani and Renner in 2008. △ Less

Submitted 21 April, 2015; originally announced April 2015.

Comments: 4 pages, 2 figures, IEEEtran.cls. Some overlap with arXiv:1301.5083. Accepted for presentation in 2015 IEEE International Symposium on Information Theory, Hong Kong, June 14-19, 2015

Showing 1–19 of 19 results for author: Sasaki, H