-
Recovering short generators via negative moments of Dirichlet $L$-functions
Authors:
Iu-Iong Ng,
Yuichiro Toma
Abstract:
In 2016, Cramer, Ducas, Peikert and, Regev proposed an efficient algorithm for recovering short generators of principal ideals in $q$-th cyclotomic fields with $q$ being a prime power. In this paper, we improve their analysis of the dual basis of the log-cyclotomic-unit lattice under the Generalised Riemann Hypothesis and in the case that $q$ is a prime number by the negative square moment of Diri…
▽ More
In 2016, Cramer, Ducas, Peikert and, Regev proposed an efficient algorithm for recovering short generators of principal ideals in $q$-th cyclotomic fields with $q$ being a prime power. In this paper, we improve their analysis of the dual basis of the log-cyclotomic-unit lattice under the Generalised Riemann Hypothesis and in the case that $q$ is a prime number by the negative square moment of Dirichlet $L$-functions at $s=1$. As an implication, we obtain a better lower bound on the success probability for the algorithm in this special case. In order to prove our main result, we also give an analysis of the behaviour of negative $2k$-th moments of Dirichlet $L$-functions at $s=1$.
△ Less
Submitted 22 May, 2024;
originally announced May 2024.
-
Gene Regulatory Network Inference in the Presence of Dropouts: a Causal View
Authors:
Haoyue Dai,
Ignavier Ng,
Gongxu Luo,
Peter Spirtes,
Petar Stojanov,
Kun Zhang
Abstract:
Gene regulatory network inference (GRNI) is a challenging problem, particularly owing to the presence of zeros in single-cell RNA sequencing data: some are biological zeros representing no gene expression, while some others are technical zeros arising from the sequencing procedure (aka dropouts), which may bias GRNI by distorting the joint distribution of the measured gene expressions. Existing ap…
▽ More
Gene regulatory network inference (GRNI) is a challenging problem, particularly owing to the presence of zeros in single-cell RNA sequencing data: some are biological zeros representing no gene expression, while some others are technical zeros arising from the sequencing procedure (aka dropouts), which may bias GRNI by distorting the joint distribution of the measured gene expressions. Existing approaches typically handle dropout error via imputation, which may introduce spurious relations as the true joint distribution is generally unidentifiable. To tackle this issue, we introduce a causal graphical model to characterize the dropout mechanism, namely, Causal Dropout Model. We provide a simple yet effective theoretical result: interestingly, the conditional independence (CI) relations in the data with dropouts, after deleting the samples with zero values (regardless if technical or not) for the conditioned variables, are asymptotically identical to the CI relations in the original data without dropouts. This particular test-wise deletion procedure, in which we perform CI tests on the samples without zeros for the conditioned variables, can be seamlessly integrated with existing structure learning approaches including constraint-based and greedy score-based methods, thus giving rise to a principled framework for GRNI in the presence of dropouts. We further show that the causal dropout model can be validated from data, and many existing statistical models to handle dropouts fit into our model as specific parametric instances. Empirical evaluation on synthetic, curated, and real-world experimental transcriptomic data comprehensively demonstrate the efficacy of our method.
△ Less
Submitted 21 March, 2024;
originally announced March 2024.
-
Local Causal Discovery with Linear non-Gaussian Cyclic Models
Authors:
Haoyue Dai,
Ignavier Ng,
Yujia Zheng,
Zhengqing Gao,
Kun Zhang
Abstract:
Local causal discovery is of great practical significance, as there are often situations where the discovery of the global causal structure is unnecessary, and the interest lies solely on a single target variable. Most existing local methods utilize conditional independence relations, providing only a partially directed graph, and assume acyclicity for the ground-truth structure, even though real-…
▽ More
Local causal discovery is of great practical significance, as there are often situations where the discovery of the global causal structure is unnecessary, and the interest lies solely on a single target variable. Most existing local methods utilize conditional independence relations, providing only a partially directed graph, and assume acyclicity for the ground-truth structure, even though real-world scenarios often involve cycles like feedback mechanisms. In this work, we present a general, unified local causal discovery method with linear non-Gaussian models, whether they are cyclic or acyclic. We extend the application of independent component analysis from the global context to independent subspace analysis, enabling the exact identification of the equivalent local directed structures and causal strengths from the Markov blanket of the target variable. We also propose an alternative regression-based method in the particular acyclic scenarios. Our identifiability results are empirically validated using both synthetic and real-world datasets.
△ Less
Submitted 21 March, 2024;
originally announced March 2024.
-
Federated Causal Discovery from Heterogeneous Data
Authors:
Loka Li,
Ignavier Ng,
Gongxu Luo,
Biwei Huang,
Guangyi Chen,
Tongliang Liu,
Bin Gu,
Kun Zhang
Abstract:
Conventional causal discovery methods rely on centralized data, which is inconsistent with the decentralized nature of data in many real-world situations. This discrepancy has motivated the development of federated causal discovery (FCD) approaches. However, existing FCD methods may be limited by their potentially restrictive assumptions of identifiable functional causal models or homogeneous data…
▽ More
Conventional causal discovery methods rely on centralized data, which is inconsistent with the decentralized nature of data in many real-world situations. This discrepancy has motivated the development of federated causal discovery (FCD) approaches. However, existing FCD methods may be limited by their potentially restrictive assumptions of identifiable functional causal models or homogeneous data distributions, narrowing their applicability in diverse scenarios. In this paper, we propose a novel FCD method attempting to accommodate arbitrary causal models and heterogeneous data. We first utilize a surrogate variable corresponding to the client index to account for the data heterogeneity across different clients. We then develop a federated conditional independence test (FCIT) for causal skeleton discovery and establish a federated independent change principle (FICP) to determine causal directions. These approaches involve constructing summary statistics as a proxy of the raw data to protect data privacy. Owing to the nonparametric properties, FCIT and FICP make no assumption about particular functional forms, thereby facilitating the handling of arbitrary causal models. We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method. The code is available at https://github.com/lokali/FedCDH.git.
△ Less
Submitted 26 February, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Causal Representation Learning from Multiple Distributions: A General Setting
Authors:
Kun Zhang,
Shaoan Xie,
Ignavier Ng,
Yujia Zheng
Abstract:
In many problems, the measured variables (e.g., image pixels) are just mathematical functions of the hidden causal variables (e.g., the underlying concepts or objects). For the purpose of making predictions in changing environments or making proper changes to the system, it is helpful to recover the hidden causal variables $Z_i$ and their causal relations represented by graph $\mathcal{G}_Z$. This…
▽ More
In many problems, the measured variables (e.g., image pixels) are just mathematical functions of the hidden causal variables (e.g., the underlying concepts or objects). For the purpose of making predictions in changing environments or making proper changes to the system, it is helpful to recover the hidden causal variables $Z_i$ and their causal relations represented by graph $\mathcal{G}_Z$. This problem has recently been known as causal representation learning. This paper is concerned with a general, completely nonparametric setting of causal representation learning from multiple distributions (arising from heterogeneous data or nonstationary time series), without assuming hard interventions behind distribution changes. We aim to develop general solutions in this fundamental case; as a by product, this helps see the unique benefit offered by other assumptions such as parametric causal models or hard interventions. We show that under the sparsity constraint on the recovered graph over the latent variables and suitable sufficient change conditions on the causal influences, interestingly, one can recover the moralized graph of the underlying directed acyclic graph, and the recovered latent variables and their relations are related to the underlying causal model in a specific, nontrivial way. In some cases, each latent variable can even be recovered up to component-wise transformations. Experimental results verify our theoretical claims.
△ Less
Submitted 9 April, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
A Versatile Causal Discovery Framework to Allow Causally-Related Hidden Variables
Authors:
Xinshuai Dong,
Biwei Huang,
Ignavier Ng,
Xiangchen Song,
Yujia Zheng,
Songyao Jin,
Roberto Legaspi,
Peter Spirtes,
Kun Zhang
Abstract:
Most existing causal discovery methods rely on the assumption of no latent confounders, limiting their applicability in solving real-life problems. In this paper, we introduce a novel, versatile framework for causal discovery that accommodates the presence of causally-related hidden variables almost everywhere in the causal network (for instance, they can be effects of observed variables), based o…
▽ More
Most existing causal discovery methods rely on the assumption of no latent confounders, limiting their applicability in solving real-life problems. In this paper, we introduce a novel, versatile framework for causal discovery that accommodates the presence of causally-related hidden variables almost everywhere in the causal network (for instance, they can be effects of observed variables), based on rank information of covariance matrix over observed variables. We start by investigating the efficacy of rank in comparison to conditional independence and, theoretically, establish necessary and sufficient conditions for the identifiability of certain latent structural patterns. Furthermore, we develop a Rank-based Latent Causal Discovery algorithm, RLCD, that can efficiently locate hidden variables, determine their cardinalities, and discover the entire causal structure over both measured and hidden ones. We also show that, under certain graphical conditions, RLCD correctly identifies the Markov Equivalence Class of the whole latent causal graph asymptotically. Experimental results on both synthetic and real-world personality data sets demonstrate the efficacy of the proposed approach in finite-sample cases.
△ Less
Submitted 18 December, 2023;
originally announced December 2023.
-
Generalized Precision Matrix for Scalable Estimation of Nonparametric Markov Networks
Authors:
Yujia Zheng,
Ignavier Ng,
Yewen Fan,
Kun Zhang
Abstract:
A Markov network characterizes the conditional independence structure, or Markov property, among a set of random variables. Existing work focuses on specific families of distributions (e.g., exponential families) and/or certain structures of graphs, and most of them can only handle variables of a single data type (continuous or discrete). In this work, we characterize the conditional independence…
▽ More
A Markov network characterizes the conditional independence structure, or Markov property, among a set of random variables. Existing work focuses on specific families of distributions (e.g., exponential families) and/or certain structures of graphs, and most of them can only handle variables of a single data type (continuous or discrete). In this work, we characterize the conditional independence structure in general distributions for all data types (i.e., continuous, discrete, and mixed-type) with a Generalized Precision Matrix (GPM). Besides, we also allow general functional relations among variables, thus giving rise to a Markov network structure learning algorithm in one of the most general settings. To deal with the computational challenge of the problem, especially for large graphs, we unify all cases under the same umbrella of a regularized score matching framework. We validate the theoretical results and demonstrate the scalability empirically in various settings.
△ Less
Submitted 18 May, 2023;
originally announced May 2023.
-
Structure Learning with Continuous Optimization: A Sober Look and Beyond
Authors:
Ignavier Ng,
Biwei Huang,
Kun Zhang
Abstract:
This paper investigates in which cases continuous optimization for directed acyclic graph (DAG) structure learning can and cannot perform well and why this happens, and suggests possible directions to make the search procedure more reliable. Reisach et al. (2021) suggested that the remarkable performance of several continuous structure learning approaches is primarily driven by a high agreement be…
▽ More
This paper investigates in which cases continuous optimization for directed acyclic graph (DAG) structure learning can and cannot perform well and why this happens, and suggests possible directions to make the search procedure more reliable. Reisach et al. (2021) suggested that the remarkable performance of several continuous structure learning approaches is primarily driven by a high agreement between the order of increasing marginal variances and the topological order, and demonstrated that these approaches do not perform well after data standardization. We analyze this phenomenon for continuous approaches assuming equal and non-equal noise variances, and show that the statement may not hold in either case by providing counterexamples, justifications, and possible alternative explanations. We further demonstrate that nonconvexity may be a main concern especially for the non-equal noise variances formulation, while recent advances in continuous structure learning fail to achieve improvement in this case. Our findings suggest that future works should take into account the non-equal noise variances formulation to handle more general settings and for a more comprehensive empirical evaluation. Lastly, we provide insights into other aspects of the search procedure, including thresholding and sparsity, and show that they play an important role in the final solutions.
△ Less
Submitted 4 April, 2023;
originally announced April 2023.
-
Truncated Matrix Power Iteration for Differentiable DAG Learning
Authors:
Zhen Zhang,
Ignavier Ng,
Dong Gong,
Yuhang Liu,
Ehsan M Abbasnejad,
Mingming Gong,
Kun Zhang,
Javen Qinfeng Shi
Abstract:
Recovering underlying Directed Acyclic Graph (DAG) structures from observational data is highly challenging due to the combinatorial nature of the DAG-constrained optimization problem. Recently, DAG learning has been cast as a continuous optimization problem by characterizing the DAG constraint as a smooth equality one, generally based on polynomials over adjacency matrices. Existing methods place…
▽ More
Recovering underlying Directed Acyclic Graph (DAG) structures from observational data is highly challenging due to the combinatorial nature of the DAG-constrained optimization problem. Recently, DAG learning has been cast as a continuous optimization problem by characterizing the DAG constraint as a smooth equality one, generally based on polynomials over adjacency matrices. Existing methods place very small coefficients on high-order polynomial terms for stabilization, since they argue that large coefficients on the higher-order terms are harmful due to numeric exploding. On the contrary, we discover that large coefficients on higher-order terms are beneficial for DAG learning, when the spectral radiuses of the adjacency matrices are small, and that larger coefficients for higher-order terms can approximate the DAG constraints much better than the small counterparts. Based on this, we propose a novel DAG learning method with efficient truncated matrix power iteration to approximate geometric series based DAG constraints. Empirically, our DAG learning method outperforms the previous state-of-the-arts in various settings, often by a factor of $3$ or more in terms of structural Hamming distance.
△ Less
Submitted 20 December, 2022; v1 submitted 30 August, 2022;
originally announced August 2022.
-
On the Identifiability of Nonlinear ICA: Sparsity and Beyond
Authors:
Yujia Zheng,
Ignavier Ng,
Kun Zhang
Abstract:
Nonlinear independent component analysis (ICA) aims to recover the underlying independent latent sources from their observable nonlinear mixtures. How to make the nonlinear ICA model identifiable up to certain trivial indeterminacies is a long-standing problem in unsupervised learning. Recent breakthroughs reformulate the standard independence assumption of sources as conditional independence give…
▽ More
Nonlinear independent component analysis (ICA) aims to recover the underlying independent latent sources from their observable nonlinear mixtures. How to make the nonlinear ICA model identifiable up to certain trivial indeterminacies is a long-standing problem in unsupervised learning. Recent breakthroughs reformulate the standard independence assumption of sources as conditional independence given some auxiliary variables (e.g., class labels and/or domain/time indexes) as weak supervision or inductive bias. However, nonlinear ICA with unconditional priors cannot benefit from such developments. We explore an alternative path and consider only assumptions on the mixing process, such as Structural Sparsity. We show that under specific instantiations of such constraints, the independent latent sources can be identified from their nonlinear mixtures up to a permutation and a component-wise transformation, thus achieving nontrivial identifiability of nonlinear ICA without auxiliary variables. We provide estimation methods and validate the theoretical results experimentally. The results on image data suggest that our conditions may hold in a number of practical data generating processes.
△ Less
Submitted 25 February, 2024; v1 submitted 15 June, 2022;
originally announced June 2022.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…
▽ More
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
△ Less
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models
Authors:
Erdun Gao,
Ignavier Ng,
Mingming Gong,
Li Shen,
Wei Huang,
Tongliang Liu,
Kun Zhang,
Howard Bondell
Abstract:
State-of-the-art causal discovery methods usually assume that the observational data is complete. However, the missing data problem is pervasive in many practical scenarios such as clinical trials, economics, and biology. One straightforward way to address the missing data problem is first to impute the data using off-the-shelf imputation methods and then apply existing causal discovery methods. H…
▽ More
State-of-the-art causal discovery methods usually assume that the observational data is complete. However, the missing data problem is pervasive in many practical scenarios such as clinical trials, economics, and biology. One straightforward way to address the missing data problem is first to impute the data using off-the-shelf imputation methods and then apply existing causal discovery methods. However, such a two-step method may suffer from suboptimality, as the imputation algorithm may introduce bias for modeling the underlying data distribution. In this paper, we develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations. Focusing mainly on the assumptions of ignorable missingness and the identifiable additive noise models (ANMs), MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization (EM) framework. In the E-step, in cases where computing the posterior distributions of parameters in closed-form is not feasible, Monte Carlo EM is leveraged to approximate the likelihood. In the M-step, MissDAG leverages the density transformation to model the noise distributions with simpler and specific formulations by virtue of the ANMs and uses a likelihood-based causal discovery algorithm with directed acyclic graph constraint. We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
△ Less
Submitted 16 January, 2023; v1 submitted 27 May, 2022;
originally announced May 2022.
-
Quantum Approximate Counting for Markov Chains and Application to Collision Counting
Authors:
François Le Gall,
Iu-Iong Ng
Abstract:
In this paper we show how to generalize the quantum approximate counting technique developed by Brassard, Høyer and Tapp [ICALP 1998] to a more general setting: estimating the number of marked states of a Markov chain (a Markov chain can be seen as a random walk over a graph with weighted edges). This makes it possible to construct quantum approximate counting algorithms from quantum search algori…
▽ More
In this paper we show how to generalize the quantum approximate counting technique developed by Brassard, Høyer and Tapp [ICALP 1998] to a more general setting: estimating the number of marked states of a Markov chain (a Markov chain can be seen as a random walk over a graph with weighted edges). This makes it possible to construct quantum approximate counting algorithms from quantum search algorithms based on the powerful "quantum walk based search" framework established by Magniez, Nayak, Roland and Santha [SIAM Journal on Computing 2011]. As an application, we apply this approach to the quantum element distinctness algorithm by Ambainis [SIAM Journal on Computing 2007]: for two injective functions over a set of $N$ elements, we obtain a quantum algorithm that estimates the number $m$ of collisions of the two functions within relative error $ε$ by making $\tilde{O}\left(\frac{1}{ε^{25/24}}\big(\frac{N}{\sqrt{m}}\big)^{2/3}\right)$ queries, which gives an improvement over the $Θ\big(\frac{1}ε\frac{N}{\sqrt{m}}\big)$-query classical algorithm based on random sampling when $m\ll N$.
△ Less
Submitted 7 April, 2022; v1 submitted 5 April, 2022;
originally announced April 2022.
-
STICC: A multivariate spatial clustering method for repeated geographic pattern discovery with consideration of spatial contiguity
Authors:
Yuhao Kang,
Kunlin Wu,
Song Gao,
Ignavier Ng,
Jinmeng Rao,
Shan Ye,
Fan Zhang,
Teng Fei
Abstract:
Spatial clustering has been widely used for spatial data mining and knowledge discovery. An ideal multivariate spatial clustering should consider both spatial contiguity and aspatial attributes. Existing spatial clustering approaches may face challenges for discovering repeated geographic patterns with spatial contiguity maintained. In this paper, we propose a Spatial Toeplitz Inverse Covariance-B…
▽ More
Spatial clustering has been widely used for spatial data mining and knowledge discovery. An ideal multivariate spatial clustering should consider both spatial contiguity and aspatial attributes. Existing spatial clustering approaches may face challenges for discovering repeated geographic patterns with spatial contiguity maintained. In this paper, we propose a Spatial Toeplitz Inverse Covariance-Based Clustering (STICC) method that considers both attributes and spatial relationships of geographic objects for multivariate spatial clustering. A subregion is created for each geographic object serving as the basic unit when performing clustering. A Markov random field is then constructed to characterize the attribute dependencies of subregions. Using a spatial consistency strategy, nearby objects are encouraged to belong to the same cluster. To test the performance of the proposed STICC algorithm, we apply it in two use cases. The comparison results with several baseline methods show that the STICC outperforms others significantly in terms of adjusted rand index and macro-F1 score. Join count statistics is also calculated and shows that the spatial contiguity is well preserved by STICC. Such a spatial clustering method may benefit various applications in the fields of geography, remote sensing, transportation, and urban planning, etc.
△ Less
Submitted 30 March, 2022; v1 submitted 17 March, 2022;
originally announced March 2022.
-
Reliable Causal Discovery with Improved Exact Search and Weaker Assumptions
Authors:
Ignavier Ng,
Yujia Zheng,
Jiji Zhang,
Kun Zhang
Abstract:
Many of the causal discovery methods rely on the faithfulness assumption to guarantee asymptotic correctness. However, the assumption can be approximately violated in many ways, leading to sub-optimal solutions. Although there is a line of research in Bayesian network structure learning that focuses on weakening the assumption, such as exact search methods with well-defined score functions, they d…
▽ More
Many of the causal discovery methods rely on the faithfulness assumption to guarantee asymptotic correctness. However, the assumption can be approximately violated in many ways, leading to sub-optimal solutions. Although there is a line of research in Bayesian network structure learning that focuses on weakening the assumption, such as exact search methods with well-defined score functions, they do not scale well to large graphs. In this work, we introduce several strategies to improve the scalability of exact score-based methods in the linear Gaussian setting. In particular, we develop a super-structure estimation method based on the support of inverse covariance matrix which requires assumptions that are strictly weaker than faithfulness, and apply it to restrict the search space of exact search. We also propose a local search strategy that performs exact search on the local clusters formed by each variable and its neighbors within two hops in the super-structure. Numerical experiments validate the efficacy of the proposed procedure, and demonstrate that it scales up to hundreds of nodes with a high accuracy.
△ Less
Submitted 14 January, 2022;
originally announced January 2022.
-
gCastle: A Python Toolbox for Causal Discovery
Authors:
Keli Zhang,
Shengyu Zhu,
Marcus Kalander,
Ignavier Ng,
Junjian Ye,
Zhitang Chen,
Lujia Pan
Abstract:
$\texttt{gCastle}…
▽ More
$\texttt{gCastle}$ is an end-to-end Python toolbox for causal structure learning. It provides functionalities of generating data from either simulator or real-world dataset, learning causal structure from the data, and evaluating the learned graph, together with useful practices such as prior knowledge insertion, preliminary neighborhood selection, and post-processing to remove false discoveries. Compared with related packages, $\texttt{gCastle}$ includes many recently developed gradient-based causal discovery methods with optional GPU acceleration. $\texttt{gCastle}$ brings convenience to researchers who may directly experiment with the code as well as practitioners with graphical user interference. Three real-world datasets in telecommunications are also provided in the current version. $\texttt{gCastle}$ is available under Apache License 2.0 at \url{https://github.com/huawei-noah/trustworthyAI/tree/master/gcastle}.
△ Less
Submitted 30 November, 2021;
originally announced November 2021.
-
Towards Federated Bayesian Network Structure Learning with Continuous Optimization
Authors:
Ignavier Ng,
Kun Zhang
Abstract:
Traditionally, Bayesian network structure learning is often carried out at a central site, in which all data is gathered. However, in practice, data may be distributed across different parties (e.g., companies, devices) who intend to collectively learn a Bayesian network, but are not willing to disclose information related to their data owing to privacy or security concerns. In this work, we prese…
▽ More
Traditionally, Bayesian network structure learning is often carried out at a central site, in which all data is gathered. However, in practice, data may be distributed across different parties (e.g., companies, devices) who intend to collectively learn a Bayesian network, but are not willing to disclose information related to their data owing to privacy or security concerns. In this work, we present a federated learning approach to estimate the structure of Bayesian network from data that is horizontally partitioned across different parties. We develop a distributed structure learning method based on continuous optimization, using the alternating direction method of multipliers (ADMM), such that only the model parameters have to be exchanged during the optimization process. We demonstrate the flexibility of our approach by adopting it for both linear and nonlinear cases. Experimental results on synthetic and real datasets show that it achieves an improved performance over the other methods, especially when there is a relatively large number of clients and each has a limited sample size.
△ Less
Submitted 4 April, 2022; v1 submitted 18 October, 2021;
originally announced October 2021.
-
On the Convergence of Continuous Constrained Optimization for Structure Learning
Authors:
Ignavier Ng,
Sébastien Lachapelle,
Nan Rosemary Ke,
Simon Lacoste-Julien,
Kun Zhang
Abstract:
Recently, structure learning of directed acyclic graphs (DAGs) has been formulated as a continuous optimization problem by leveraging an algebraic characterization of acyclicity. The constrained problem is solved using the augmented Lagrangian method (ALM) which is often preferred to the quadratic penalty method (QPM) by virtue of its standard convergence result that does not require the penalty c…
▽ More
Recently, structure learning of directed acyclic graphs (DAGs) has been formulated as a continuous optimization problem by leveraging an algebraic characterization of acyclicity. The constrained problem is solved using the augmented Lagrangian method (ALM) which is often preferred to the quadratic penalty method (QPM) by virtue of its standard convergence result that does not require the penalty coefficient to go to infinity, hence avoiding ill-conditioning. However, the convergence properties of these methods for structure learning, including whether they are guaranteed to return a DAG solution, remain unclear, which might limit their practical applications. In this work, we examine the convergence of ALM and QPM for structure learning in the linear, nonlinear, and confounded cases. We show that the standard convergence result of ALM does not hold in these settings, and demonstrate empirically that its behavior is akin to that of the QPM which is prone to ill-conditioning. We further establish the convergence guarantee of QPM to a DAG solution, under mild conditions. Lastly, we connect our theoretical results with existing approaches to help resolve the convergence issue, and verify our findings in light of an empirical comparison of them.
△ Less
Submitted 10 April, 2022; v1 submitted 22 November, 2020;
originally announced November 2020.
-
On the Role of Sparsity and DAG Constraints for Learning Linear DAGs
Authors:
Ignavier Ng,
AmirEmad Ghassami,
Kun Zhang
Abstract:
Learning graphical structures based on Directed Acyclic Graphs (DAGs) is a challenging problem, partly owing to the large search space of possible graphs. A recent line of work formulates the structure learning problem as a continuous constrained optimization task using the least squares objective and an algebraic characterization of DAGs. However, the formulation requires a hard DAG constraint an…
▽ More
Learning graphical structures based on Directed Acyclic Graphs (DAGs) is a challenging problem, partly owing to the large search space of possible graphs. A recent line of work formulates the structure learning problem as a continuous constrained optimization task using the least squares objective and an algebraic characterization of DAGs. However, the formulation requires a hard DAG constraint and may lead to optimization difficulties. In this paper, we study the asymptotic role of the sparsity and DAG constraints for learning DAG models in the linear Gaussian and non-Gaussian cases, and investigate their usefulness in the finite sample regime. Based on the theoretical results, we formulate a likelihood-based score function, and show that one only has to apply soft sparsity and DAG constraints to learn a DAG equivalent to the ground truth DAG. This leads to an unconstrained optimization problem that is much easier to solve. Using gradient-based optimization and GPU acceleration, our procedure can easily handle thousands of nodes while retaining a high accuracy. Extensive experiments validate the effectiveness of our proposed method and show that the DAG-penalized likelihood objective is indeed favorable over the least squares one with the hard DAG constraint.
△ Less
Submitted 8 January, 2021; v1 submitted 17 June, 2020;
originally announced June 2020.
-
A Graph Autoencoder Approach to Causal Structure Learning
Authors:
Ignavier Ng,
Shengyu Zhu,
Zhitang Chen,
Zhuangyan Fang
Abstract:
Causal structure learning has been a challenging task in the past decades and several mainstream approaches such as constraint- and score-based methods have been studied with theoretical guarantees. Recently, a new approach has transformed the combinatorial structure learning problem into a continuous one and then solved it using gradient-based optimization methods. Following the recent state-of-t…
▽ More
Causal structure learning has been a challenging task in the past decades and several mainstream approaches such as constraint- and score-based methods have been studied with theoretical guarantees. Recently, a new approach has transformed the combinatorial structure learning problem into a continuous one and then solved it using gradient-based optimization methods. Following the recent state-of-the-arts, we propose a new gradient-based method to learn causal structures from observational data. The proposed method generalizes the recent gradient-based methods to a graph autoencoder framework that allows nonlinear structural equation models and is easily applicable to vector-valued variables. We demonstrate that on synthetic datasets, our proposed method outperforms other gradient-based methods significantly, especially on large causal graphs. We further investigate the scalability and efficiency of our method, and observe a near linear training time when scaling up the graph size.
△ Less
Submitted 17 November, 2019;
originally announced November 2019.
-
Masked Gradient-Based Causal Structure Learning
Authors:
Ignavier Ng,
Shengyu Zhu,
Zhuangyan Fang,
Haoyang Li,
Zhitang Chen,
Jun Wang
Abstract:
This paper studies the problem of learning causal structures from observational data. We reformulate the Structural Equation Model (SEM) with additive noises in a form parameterized by binary graph adjacency matrix and show that, if the original SEM is identifiable, then the binary adjacency matrix can be identified up to super-graphs of the true causal graph under mild conditions. We then utilize…
▽ More
This paper studies the problem of learning causal structures from observational data. We reformulate the Structural Equation Model (SEM) with additive noises in a form parameterized by binary graph adjacency matrix and show that, if the original SEM is identifiable, then the binary adjacency matrix can be identified up to super-graphs of the true causal graph under mild conditions. We then utilize the reformulated SEM to develop a causal structure learning method that can be efficiently trained using gradient-based optimization, by leveraging a smooth characterization on acyclicity and the Gumbel-Softmax approach to approximate the binary adjacency matrix. It is found that the obtained entries are typically near zero or one and can be easily thresholded to identify the edges. We conduct experiments on synthetic and real datasets to validate the effectiveness of the proposed method, and show that it readily includes different smooth model functions and achieves a much improved performance on most datasets considered.
△ Less
Submitted 10 January, 2022; v1 submitted 18 October, 2019;
originally announced October 2019.
-
Causal Discovery with Reinforcement Learning
Authors:
Shengyu Zhu,
Ignavier Ng,
Zhitang Chen
Abstract:
Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. Traditional score-based casual discovery methods rely on various local heuristics to search for a Directed Acyclic Graph (DAG) according to a predefined score function. While these methods, e.g., greedy equivalence search, may have attractive results with infinite samples and certain model as…
▽ More
Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. Traditional score-based casual discovery methods rely on various local heuristics to search for a Directed Acyclic Graph (DAG) according to a predefined score function. While these methods, e.g., greedy equivalence search, may have attractive results with infinite samples and certain model assumptions, they are usually less satisfactory in practice due to finite data and possible violation of assumptions. Motivated by recent advances in neural combinatorial optimization, we propose to use Reinforcement Learning (RL) to search for the DAG with the best scoring. Our encoder-decoder model takes observable data as input and generates graph adjacency matrices that are used to compute rewards. The reward incorporates both the predefined score function and two penalty terms for enforcing acyclicity. In contrast with typical RL applications where the goal is to learn a policy, we use RL as a search strategy and our final output would be the graph, among all graphs generated during training, that achieves the best reward. We conduct experiments on both synthetic and real datasets, and show that the proposed approach not only has an improved search ability but also allows a flexible score function under the acyclicity constraint.
△ Less
Submitted 8 June, 2020; v1 submitted 11 June, 2019;
originally announced June 2019.
-
Implementation of Smart Contracts Using Hybrid Architectures with On- and Off-Blockchain Components
Authors:
Carlos Molina-Jimenez,
Ioannis Sfyrakis,
Ellis Solaiman,
Irene Ng,
Meng Weng Wong,
Alexis Chun,
Jon Crowcroft
Abstract:
Recently, decentralised (on-blockchain) platforms have emerged to complement centralised (off-blockchain) platforms for the implementation of automated, digital (smart) contracts. However, neither alternative can individually satisfy the requirements of a large class of applications. On-blockchain platforms suffer from scalability, performance, transaction costs and other limitations. Off-blockcha…
▽ More
Recently, decentralised (on-blockchain) platforms have emerged to complement centralised (off-blockchain) platforms for the implementation of automated, digital (smart) contracts. However, neither alternative can individually satisfy the requirements of a large class of applications. On-blockchain platforms suffer from scalability, performance, transaction costs and other limitations. Off-blockchain platforms are afflicted by drawbacks due to their dependence on single trusted third parties. We argue that in several application areas, hybrid platforms composed from the integration of on- and off-blockchain platforms are more able to support smart contracts that deliver the desired quality of service (QoS). Hybrid architectures are largely unexplored. To help cover the gap, in this paper we discuss the implementation of smart contracts on hybrid architectures. As a proof of concept, we show how a smart contract can be split and executed partially on an off-blockchain contract compliance checker and partially on the Rinkeby Ethereum network. To test the solution, we expose it to sequences of contractual operations generated mechanically by a contract validator tool.
△ Less
Submitted 31 July, 2018;
originally announced August 2018.
-
On and Off-Blockchain Enforcement Of Smart Contracts
Authors:
Carlos Molina-Jimenez,
Ellis Solaiman,
Ioannis Sfyrakis,
Irene Ng,
Jon Crowcroft
Abstract:
In this paper we discuss how conventional business contracts can be converted into smart contracts---their electronic equivalents that can be used to systematically monitor and enforce contractual rights, obligations and prohibitions at run time. We explain that emerging blockchain technology is certainly a promising platform for implementing smart contracts but argue that there is a large class o…
▽ More
In this paper we discuss how conventional business contracts can be converted into smart contracts---their electronic equivalents that can be used to systematically monitor and enforce contractual rights, obligations and prohibitions at run time. We explain that emerging blockchain technology is certainly a promising platform for implementing smart contracts but argue that there is a large class of applications, where blockchain is inadequate due to performance, scalability, and consistency requirements, and also due to language expressiveness and cost issues that are hard to solve. We explain that in some situations a centralised approach that does not rely on blockchain is a better alternative due to its simplicity, scalability, and performance. We suggest that in applications where decentralisation and transparency are essential, developers can advantageously combine the two approaches into hybrid solutions where some operations are enforced by enforcers deployed on--blockchains and the rest by enforcers deployed on trusted third parties.
△ Less
Submitted 2 May, 2018;
originally announced May 2018.
-
Valorising the IoT Databox: Creating Value for Everyone
Authors:
Charith Perera,
Susan Wakenshaw,
Tim Baarslag,
Hamed Haddadi,
Arosha Bandara,
Richard Mortier,
Andy Crabtree,
Irene Ng,
Derek McAuley,
Jon Crowcroft
Abstract:
The Internet of Things (IoT) is expected to generate large amounts of heterogeneous data from diverse sources including physical sensors, user devices, and social media platforms. Over the last few years, significant attention has been focused on personal data, particularly data generated by smart wearable and smart home devices. Making personal data available for access and trade is expected to b…
▽ More
The Internet of Things (IoT) is expected to generate large amounts of heterogeneous data from diverse sources including physical sensors, user devices, and social media platforms. Over the last few years, significant attention has been focused on personal data, particularly data generated by smart wearable and smart home devices. Making personal data available for access and trade is expected to become a part of the data driven digital economy. In this position paper, we review the research challenges in building personal Databoxes that hold personal data and enable data access by other parties, and potentially thus sharing of data with other parties. These Databoxes are expected to become a core part of future data marketplaces.
△ Less
Submitted 12 September, 2016;
originally announced September 2016.
-
Value, Variety and Viability: New Business Models for Co-Creation in Outcome-based Contracts
Authors:
Irene Ng,
Gerard Briscoe
Abstract:
We propose that designing a manufacturer's equipment-based service value proposition in outcome-based contracts is the design of a new business model capable of managing threats to the firm's viability that can arise from the contextual variety of use that customers may subject the firm's value propositions. Furthermore, manufacturers need to understand these emerging business models as the capabi…
▽ More
We propose that designing a manufacturer's equipment-based service value proposition in outcome-based contracts is the design of a new business model capable of managing threats to the firm's viability that can arise from the contextual variety of use that customers may subject the firm's value propositions. Furthermore, manufacturers need to understand these emerging business models as the capability of managing both asset and service provision to achieve use outcomes with customers, including emotional outcomes such as customer experience. Service-Dominant Logic proposes that all "goods are a distribution mechanism for service provision", upon which we propose a value-centric approach to understanding the interactions between the asset and service provision, and suggest a viable systems approach towards reorganising the firm to achieve such a business model. Three case studies of B2B equipment-based service systems were analysed to understand customers' co-creation activities in achieving outcomes, in which we found that the co-creation of complex multi-dimensional value could be delivered through the different value propositions of the firm catering to different aspects (dimensions) of the value to be co-created. The study provides a way for managers to understand the effectiveness (rather than efficiency) of firms in adopting emerging business models that design for value co-creation in what are ultimately complex socio- technical systems.
△ Less
Submitted 22 November, 2012;
originally announced November 2012.