Yanqi Zhou

Mountain View, California, United States

About

I am currently a research scientist at Google DeepMind (previously known as Google…


Experience & Education

  • Google


Volunteer Experience

  • Volunteer

    Shanghai Science and Technology Museum

    2 months

    Social Services

Publications

  • CASH: Supporting IaaS Customers with a Sub-core Configurable Architecture

    ACM/IEEE ISCA

    CASH is a sub-core configurable architecture co-designed with a control-theory-based runtime. It lets IaaS customers configure their virtual cores to meet QoS targets while minimizing cost (a toy control-loop sketch follows this publications list).

  • MITTS: Memory Inter-arrival Time Traffic Shaping

    ACM/IEEE ISCA

    MITTS is a distributed hardware mechanism that shapes memory-transaction inter-arrival times into a pre-determined distribution on a per-core/per-thread basis. MITTS provides system throughput and fairness on par with conventional memory scheduling algorithms. Moreover, it enables fine-grain memory bandwidth provisioning in an IaaS cloud, which improves economic efficiency (a toy software model of the shaping idea follows this list).

  • OpenPiton: An Open Source Manycore Research Framework

    ACM ASPLOS

    OpenPiton is the first open-source manycore processor from academia!

  • The Sharing Architecture: Sub-core Configurability for IaaS Clouds

    ACM ASPLOS

    We design a configurable architecture on a general-purpose fabric. The Sharing Architecture allows a virtual core to be configured with a different number of ALUs and a different cache allocation. Unlike conventional composable architectures, it does not rely on compiler support or a new ISA. A full chip is composed of hundreds of Slices and L2 cache banks.

    Other authors
    • David Wentzlaff
  • Transferable Graph Optimizers for ML Compilers

    NeurIPS 2020

    Most compilers for machine learning (ML) frameworks need to solve many correlated optimization problems to generate efficient machine code. Current ML compilers rely on heuristics-based algorithms to solve these optimization problems one at a time. However, this approach is not only hard to maintain but often leads to sub-optimal solutions, especially for newer model architectures. Existing learning-based approaches in the literature are sample-inefficient, tackle a single optimization problem, and do not generalize to unseen graphs, making them infeasible to deploy in practice. To address these limitations, we propose an end-to-end, transferable deep reinforcement learning method for computational graph optimization (GO), based on a scalable sequential attention mechanism over an inductive graph neural network. GO generates decisions on the entire graph rather than on each individual node autoregressively, drastically speeding up the search compared to prior methods. Moreover, we propose recurrent attention layers to jointly optimize dependent graph optimization tasks and demonstrate a 33%-60% speedup on three graph optimization tasks compared to TensorFlow default optimization. On a diverse set of representative graphs consisting of up to 80,000 nodes, including Inception-v3, Transformer-XL, and WaveNet, GO achieves on average a 21% improvement over human experts and an 18% improvement over the prior state of the art, with 15x faster convergence, on a device placement task evaluated in real systems. (A toy sketch of the all-nodes-at-once decision idea follows this list.)

    Other authors
    • Yanqi Zhou
    • Sudip Roy
    • Amirali Abdolrashidi
    • Daniel Wong
    • Peter Ma
    • Qiumin Xu
    • Hanxiao Liu
    • Phitchaya Mangpo Phothilimthana
    • Shen Wang
    • Anna Goldie
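
The sketches below are illustrative only. First, a minimal Python sketch of the control-theory-based runtime idea behind CASH: a proportional controller that grows or shrinks a virtual core's resource allocation (abstract "slices") to track a QoS latency target at minimum cost. The function name, gain, and slice bounds are hypothetical, not details from the paper.

```python
# Illustrative sketch (not the paper's implementation): a proportional
# controller in the spirit of CASH's control-theory-based runtime.

def cash_like_controller(measured_latency_ms: float,
                         target_latency_ms: float,
                         current_slices: int,
                         min_slices: int = 1,
                         max_slices: int = 8,
                         gain: float = 2.0) -> int:
    """Return the slice count for the next control interval."""
    # Positive error => QoS is being missed and more resources are needed.
    error = (measured_latency_ms - target_latency_ms) / target_latency_ms
    adjustment = round(gain * error)
    return max(min_slices, min(max_slices, current_slices + adjustment))

# Example control loop over synthetic p99 latency samples (ms),
# tracking a hypothetical 8 ms QoS target.
slices = 2
for latency in [12.0, 11.0, 9.5, 8.0, 7.9]:
    slices = cash_like_controller(latency, target_latency_ms=8.0,
                                  current_slices=slices)
    print(f"latency={latency:.1f}ms -> slices={slices}")
```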
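Next, a toy software model of the MITTS shaping idea: per-core bins hold credits for ranges of memory-request inter-arrival times, and a request proceeds only if the bin covering its gap still has credits, pushing traffic toward the distribution the bins encode. The bin edges, credit counts, and refill policy here are assumptions for illustration; the real MITTS is a hardware mechanism.

```python
# Toy model of distribution-based traffic shaping (illustrative only).
class MittsLikeShaper:
    def __init__(self, bin_edges, credits):
        # bin_edges[i] is the upper bound (in cycles) of bin i's
        # inter-arrival range; credits[i] is its refillable credit count.
        self.bin_edges = bin_edges
        self.credits = list(credits)
        self.initial = list(credits)
        self.last_request_cycle = 0

    def try_request(self, now_cycle: int) -> bool:
        """Admit a memory request if its inter-arrival bin has credit."""
        gap = now_cycle - self.last_request_cycle
        for i, edge in enumerate(self.bin_edges):
            if gap <= edge:
                if self.credits[i] > 0:
                    self.credits[i] -= 1
                    self.last_request_cycle = now_cycle
                    return True
                return False  # bin exhausted: request must stall
        # Gaps beyond the last edge (long idle periods) pass unshaped.
        self.last_request_cycle = now_cycle
        return True

    def refill(self):
        # Credits are periodically restored (modeled as a full reset).
        self.credits = list(self.initial)

# Allow many short-gap requests but only one medium-gap burst.
shaper = MittsLikeShaper(bin_edges=[10, 100, 1000], credits=[8, 2, 1])
for cycle in [5, 12, 20, 150, 260]:
    print(cycle, shaper.try_request(cycle))
```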
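Finally, a toy numpy sketch of GO's high-level structure: one inductive message-passing step builds node embeddings for a computation graph, and a single attention readout then emits a decision (e.g., a device id) for every node at once rather than autoregressively. The layer shapes and random weights are stand-ins; this is not the paper's model.

```python
# Toy numpy sketch of GO's high-level idea (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

def message_pass(features, adjacency, weight):
    # One GraphSAGE-style step: sum neighbor features, project, ReLU.
    agg = adjacency @ features
    return np.maximum(0.0, (features + agg) @ weight)

def decide_all_nodes(embeddings, query, device_proj):
    # Scaled dot-product scores against a learned query, then
    # per-node logits over candidate devices for the whole graph.
    scores = embeddings @ query / np.sqrt(embeddings.shape[1])
    attended = embeddings * scores[:, None]      # reweight node states
    logits = attended @ device_proj              # [num_nodes, num_devices]
    return logits.argmax(axis=1)                 # one decision per node

num_nodes, feat_dim, num_devices = 5, 8, 4
features = rng.normal(size=(num_nodes, feat_dim))
adjacency = (rng.random((num_nodes, num_nodes)) < 0.3).astype(float)
emb = message_pass(features, adjacency, rng.normal(size=(feat_dim, feat_dim)))
placements = decide_all_nodes(emb, rng.normal(size=feat_dim),
                              rng.normal(size=(feat_dim, num_devices)))
print("device placement per node:", placements)
```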

Courses

  • Analog Circuits
  • Big Data Analytics
  • Compiler
  • Computer Architecture
  • Computer Network
  • Computer Organizations
  • Data Structures and Algorithms
  • Digital Circuits
  • Finance and Investment
  • German
  • Introduction to Programming
  • Linear Algebra
  • Mathematics of Finance
  • Microeconomics
  • Numerical Analysis
  • Operating System
  • Parallel Computing
  • Semiconductor and Devices
  • Theory of Algorithm
  • VLSI

Honors & Awards

  • Princeton Wu Fellowship

    Princeton University

  • Microsoft PhD Fellow

    Microsoft Research

    I was selected as a Microsoft Research PhD Fellow in 2014.

Languages

  • English

    Full professional proficiency

  • Chinese

    Native or bilingual proficiency

  • German

    Elementary proficiency
