Genmo

Deep Learning Performance Engineer

Genmo San Francisco, CA
No longer accepting applications

Our mission

Genmo makes it easy for anyone to create movies, as if it were magic. Using our web application, any user can create cinematic video using a simple text prompt.

We imagine a world where high-quality cinematic video content is as plentiful as water. Our mission is to empower the next billion video creators to tell their stories.

The role is based in the Bay Area (San Francisco). Candidates are expected to be located near the Bay Area or open to relocation.

As a Deep Learning Performance Engineer at Genmo, you will play a critical role in optimizing the performance of our large generative AI models. Your expertise will ensure that our models run efficiently on clusters, leveraging advanced techniques and tools to enhance their performance. This role is perfect for someone with a deep understanding of deep learning performance bottlenecks, kernel optimization, and distributed training strategies.

Responsibilities:

  • Model-Level Performance Optimization: Profile and analyze the performance of deep learning models on the cluster. Identify performance bottlenecks related to arithmetic intensity, memory access patterns, and communication overhead.
  • Kernel Optimization and Tuning: Develop and optimize custom CUDA and Triton kernels for specific operations in diffusion models. Use profiling tools to guide kernel optimization and maximize GPU utilization. Use graph compilation to fuse kernels horizontally and vertically, and rewrite kernels for optimized operators like FlashAttention.
  • Distributed Training Optimization: Tune distributed training strategies (e.g., sharding, parallelism) for optimal performance on the cluster. Experiment with and implement advanced techniques like model parallelism, pipeline parallelism, and tensor parallelism. Reduce the memory footprint of training with methods like rematerialization.

Qualifications:

  • Prior experience working with GPUs and CUDA.
  • Experience with profiling tools such as the PyTorch profiler.
  • Extensive experience in optimizing deep learning models and kernels.
  • Knowledge of distributed training strategies and techniques.
  • Familiarity with advanced model optimization techniques.
  • Strong problem-solving skills and ability to work in a fast-paced environment.
  • Passion for artificial intelligence and a drive to push the boundaries of what is possible.

Bonus points:

  • Experience with diffusion models and their specific optimization needs.
  • Proven track record of optimizing performance in large-scale deep learning projects.
  • Knowledge of advanced memory optimization techniques.
  • Contributions to open-source deep learning projects or research publications in relevant areas.

Genmo is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law. Genmo, Inc. is an E-Verify company and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.
  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
  • Industries: Artists and Writers
