Skip to main content

Showing 1–9 of 9 results for author: Wong, H M

  1. arXiv:2406.02963  [pdf, other

    cs.SD eess.AS

    Dataset-Distillation Generative Model for Speech Emotion Recognition

    Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Jeremy H. M Wong, Dianwen Ng, Hung-yi Lee, Nancy F. Chen, Eng Siong Chng

    Abstract: Deep learning models for speech rely on large datasets, presenting computational challenges. Yet, performance hinges on training data size. Dataset Distillation (DD) aims to learn a smaller dataset without much performance degradation when training with it. DD has been investigated in computer vision but not yet in speech. This paper presents the first approach for DD to speech targeting Speech Em… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2312.12153  [pdf, other

    cs.SD eess.AS

    Noise robust distillation of self-supervised speech models via correlation metrics

    Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Dianwen Ng, Jeremy H. M. Wong, Hung-yi Lee, Eng Siong Chng, Nancy F. Chen

    Abstract: Compared to large speech foundation models, small distilled models exhibit degraded noise robustness. The student's robustness can be improved by introducing noise at the inputs during pre-training. Despite this, using the standard distillation loss still yields a student with degraded performance. Thus, this paper proposes improving student robustness via distillation with correlation metrics. Te… ▽ More

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 6 pages

  3. arXiv:2306.02719  [pdf, ps, other

    cs.CL cs.LG cs.SD eess.AS

    Multiple output samples per input in a single-output Gaussian process

    Authors: Jeremy H. M. Wong, Huayun Zhang, Nancy F. Chen

    Abstract: The standard Gaussian Process (GP) only considers a single output sample per input in the training set. Datasets for subjective tasks, such as spoken language assessment, may be annotated with output labels from multiple human raters per input. This paper proposes to generalise the GP to allow for these multiple output samples in the training set, and thus make use of available output uncertainty… ▽ More

    Submitted 25 January, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: This paper is presented in the "Symposium for Celebrating 40 Years of Bayesian Learning in Speech and Language Processing and Beyond", which is a satellite event of the ASRU workshop, on 20 December 2023. https://bayesian40.github.io/

  4. arXiv:2206.04615  [pdf, other

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  5. arXiv:2109.11140  [pdf, other

    cs.SD cs.AI cs.CL cs.LG

    Joint speaker diarisation and tracking in switching state-space model

    Authors: Jeremy H. M. Wong, Yifan Gong

    Abstract: Speakers may move around while diarisation is being performed. When a microphone array is used, the instantaneous locations of where the sounds originated from can be estimated, and previous investigations have shown that such information can be complementary to speaker embeddings in the diarisation task. However, these approaches often assume that speakers are fairly stationary throughout a meeti… ▽ More

    Submitted 23 September, 2021; originally announced September 2021.

  6. arXiv:2109.10598  [pdf, other

    cs.LG cs.CL cs.SD eess.AS

    Diarisation using location tracking with agglomerative clustering

    Authors: Jeremy H. M. Wong, Igor Abramovski, Xiong Xiao, Yifan Gong

    Abstract: Previous works have shown that spatial location information can be complementary to speaker embeddings for a speaker diarisation task. However, the models used often assume that speakers are fairly stationary throughout a meeting. This paper proposes to relax this assumption, by explicitly modelling the movements of speakers within an Agglomerative Hierarchical Clustering (AHC) diarisation framewo… ▽ More

    Submitted 23 September, 2021; v1 submitted 22 September, 2021; originally announced September 2021.

  7. arXiv:2104.12424  [pdf, other

    cs.CL

    Attention vs non-attention for a Shapley-based explanation method

    Authors: Tom Kersten, Hugh Mee Wong, Jaap Jumelet, Dieuwke Hupkes

    Abstract: The field of explainable AI has recently seen an explosion in the number of explanation methods for highly non-linear deep neural networks. The extent to which such methods -- that are often proposed and tested in the domain of computer vision -- are appropriate to address the explainability challenges in NLP is yet relatively unexplored. In this work, we consider Contextual Decomposition (CD) --… ▽ More

    Submitted 26 April, 2021; originally announced April 2021.

    Comments: Accepted for publication at DeeLIO 2021

  8. arXiv:2003.07482  [pdf, other

    eess.AS cs.CL cs.SD

    High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

    Authors: Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong

    Abstract: While the community keeps promoting end-to-end models over conventional hybrid models, which usually are long short-term memory (LSTM) models trained with a cross entropy criterion followed by a sequence discriminative training criterion, we argue that such conventional hybrid models can still be significantly improved. In this paper, we detail our recent efforts to improve conventional hybrid LST… ▽ More

    Submitted 16 March, 2020; originally announced March 2020.

    Comments: Accepted by ICASSP 2020

  9. arXiv:1904.01731  [pdf, other

    quant-ph cs.ET

    The Search For Leakage-free Entangling Fibonacci Braiding Gates

    Authors: Shawn X. Cui, Kevin T. Tian, Jennifer F. Vasquez, Zhenghan Wang, Helen M. Wong

    Abstract: It is an open question if there are leakage-free entangling Fibonacci braiding gates. We provide evidence to the conjecture for the negative in this paper. We also found a much simpler protocol to generate approximately leakage-free entangling Fibonacci braiding gates than existing algorithms in the literature.

    Submitted 2 April, 2019; originally announced April 2019.

    MSC Class: 81P68

    Journal ref: Journal of Physics A: Mathematical and Theoretical, vol. 52, no. 45 (2019)