DiffSynth Studio

Introduction

DiffSynth Studio is a diffusion engine. We have restructured the model architectures, including the text encoder, UNet, and VAE, maintaining compatibility with models from the open-source community while improving computational performance. We provide many interesting features. Enjoy the magic of diffusion models!

DiffSynth Studio currently supports the following models:

News

  • June 21, 2024. 🔥🔥🔥 We propose ExVideo, a post-tuning technique aimed at enhancing the capability of video generation models. We have extended Stable Video Diffusion to generate long videos of up to 128 frames.

  • June 13, 2024. DiffSynth Studio has been transferred to ModelScope. The developers have transitioned from "I" to "we". Of course, I will still participate in development and maintenance.

  • Jan 29, 2024. We propose Diffutoon, a fantastic solution for toon shading.

    • Project Page
    • The source code is released in this project.
    • The technical report (IJCAI 2024) is released on arXiv.
  • Dec 8, 2023. We decided to develop a new project aimed at unlocking the potential of diffusion models, especially in video synthesis. Development of this project has started.

  • Nov 15, 2023. We propose FastBlend, a powerful video deflickering algorithm.

  • Oct 1, 2023. We release an early version of this project, named FastSDXL. It is a first attempt at building a diffusion engine.

    • The source code is released on GitHub.
    • FastSDXL includes a trainable OLSS scheduler for efficiency improvement.
      • The original repo of OLSS is here.
      • The technical report (CIKM 2023) is released on arXiv.
      • A demo video is shown on Bilibili.
      • Since OLSS requires additional training, we have not implemented it in this project.
  • Aug 29, 2023. We propose DiffSynth, a video synthesis framework.

Installation

git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
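
After the editable install, a quick check that the package can be found may be useful. The following is a minimal sketch, assuming the import name is `diffsynth` (an assumption based on the repository layout, not confirmed here); `is_installed` is a hypothetical helper, not part of the project.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located, without importing it."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "diffsynth" is an assumed import name; adjust it if the package differs.
    name = "diffsynth"
    print(f"{name} installed: {is_installed(name)}")
```

If this reports that the package is missing, re-running `pip install -e .` from the repository root is the first thing to try.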

Usage (in Python code)

The Python examples are in examples. We provide an overview here.
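
To see which example scripts a local checkout contains, something like the following sketch could help. It assumes the examples live in an `examples` directory at the repository root and are ordinary `.py` files; `list_example_scripts` is a hypothetical helper written for illustration only.

```python
from pathlib import Path


def list_example_scripts(examples_dir: str) -> dict[str, list[str]]:
    """Group the .py files under examples_dir by their top-level subfolder."""
    root = Path(examples_dir)
    grouped: dict[str, list[str]] = {}
    for script in sorted(root.rglob("*.py")):
        rel = script.relative_to(root)
        # Files directly inside examples_dir are grouped under ".".
        group = rel.parts[0] if len(rel.parts) > 1 else "."
        grouped.setdefault(group, []).append(rel.as_posix())
    return grouped


if __name__ == "__main__":
    # Assumes the current directory is the repository root.
    for group, scripts in list_example_scripts("examples").items():
        print(group)
        for path in scripts:
            print("  ", path)
```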

Long Video Synthesis

We trained an extended video synthesis model that can generate 128 frames. See examples/ExVideo.

github_title.mp4

Image Synthesis

Generate high-resolution images by breaking the limitations of diffusion models! See examples/image_synthesis.

LoRA fine-tuning is supported in examples/train.

Model | Example
Stable Diffusion | 1024
Stable Diffusion XL | 1024
Stable Diffusion 3 | image_1024
Kolors | image_1024
Hunyuan-DiT | image_1024

Toon Shading

Render realistic videos in a flat style and enable video editing features. See examples/Diffutoon.

Diffutoon.mp4
Diffutoon_edit.mp4

Video Stylization

Video stylization without video models. See examples/diffsynth.

winter_stone.mp4

Usage (in WebUI)

python -m streamlit run DiffSynth_Studio.py
sdxl_turbo_ui.mp4
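
The WebUI command above can also be assembled from Python, for example to launch it with the same interpreter that has the package installed. This is a minimal sketch: `build_webui_command` is a hypothetical helper, and the optional port uses Streamlit's `--server.port` option.

```python
import shlex
import sys
from typing import Optional


def build_webui_command(script: str = "DiffSynth_Studio.py",
                        port: Optional[int] = None) -> list[str]:
    """Build the argument list for launching the Streamlit WebUI."""
    cmd = [sys.executable, "-m", "streamlit", "run", script]
    if port is not None:
        # Optional: ask Streamlit to serve on a specific port.
        cmd += ["--server.port", str(port)]
    return cmd


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(...) to actually launch it.
    print(shlex.join(build_webui_command()))
```

Using `sys.executable` rather than a bare `python` avoids accidentally starting the WebUI in a different environment from the one where DiffSynth Studio was installed.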