Add pipeline tag and improve model card metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +40 -15
README.md CHANGED
@@ -1,39 +1,64 @@
  ---
  language: en
  license: mit
  tags:
  - diffusion
  - autoencoder
  - feature-space
  - svg
- references:
- - https://arxiv.org/abs/2510.15301
  ---

  # SVG: Latent Diffusion Model without Variational Autoencoder

- ## Model Description

- SVG is a latent diffusion model framework that replaces the traditional VAE latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while maintaining efficiency comparable to standard VAE-based latent diffusion models.

- Key features:

- - Replaces low-dimensional VAE latent space with high-dimensional semantic feature space.
- - Includes a lightweight residual encoder for refining fine-grained details.
- - Enables strong generation and perception performance.


- ## How to Use

- For code, and instructions, see the GitHub repository:

- [https://github.com/shiml20/SVG](https://github.com/shiml20/SVG)


- Official project page:

- [https://howlin-wang.github.io/svg/](https://howlin-wang.github.io/svg/)

- Arxiv paper:

- [https://arxiv.org/abs/2510.15301](https://arxiv.org/abs/2510.15301)

  ---
  language: en
  license: mit
+ pipeline_tag: image-to-image
  tags:
  - diffusion
  - autoencoder
  - feature-space
  - svg
  ---

  # SVG: Latent Diffusion Model without Variational Autoencoder

+ SVG is a novel latent diffusion model framework that replaces the traditional Variational Autoencoder (VAE) latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while maintaining efficiency comparable to standard VAE-based models.

+ ## Resources

+ - **Paper:** [Latent Diffusion Model without Variational Autoencoder](https://huggingface.co/papers/2510.15301)
+ - **Project Page:** [https://howlin-wang.github.io/svg/](https://howlin-wang.github.io/svg/)
+ - **GitHub Repository:** [https://github.com/shiml20/SVG](https://github.com/shiml20/SVG)

+ ## Model Description

+ SVG constructs a feature space with clear semantic discriminability by leveraging frozen DINO features, while a lightweight residual branch captures fine-grained details for high-fidelity reconstruction. Diffusion models are trained directly on this semantically structured latent space to facilitate more efficient learning.

+ **Key features:**
+ - Replaces low-dimensional VAE latent space with high-dimensional semantic feature space.
+ - Includes a lightweight residual encoder for refining fine-grained details.
+ - Enables accelerated diffusion training and supports few-step sampling.
+ - Improves generative quality while preserving semantic and discriminative capabilities.

+ ## Usage

+ For full instructions on training and evaluation, please refer to the official [GitHub repository](https://github.com/shiml20/SVG).

+ ### Installation
+ ```bash
+ conda create -n svg python=3.10 -y
+ conda activate svg
+ pip install -r requirements.txt
+ ```

+ ### Generation
+ To generate images using a trained model:
+ ```bash
+ # Update ckpt_path in sample_svg.py with your checkpoint
+ python sample_svg.py
+ ```

+ ## Citation

+ If you find this work useful for your research, please cite:

+ ```bibtex
+ @misc{shi2025latentdiffusionmodelvariational,
+ title={Latent Diffusion Model without Variational Autoencoder},
+ author={Minglei Shi and Haolin Wang and Wenzhao Zheng and Ziyang Yuan and Xiaoshi Wu and Xintao Wang and Pengfei Wan and Jie Zhou and Jiwen Lu},
+ year={2025},
+ eprint={2510.15301},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2510.15301},
+ }
+ ```
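
The metadata change in this diff (a new `pipeline_tag` key, and the `references` key removed) can be sanity-checked locally before merging. The following is a minimal sketch and not part of the PR. It assumes a hand-rolled parser that handles only the flat `key: value` lines and `- item` lists seen in this front matter, not general YAML. The names `parse_front_matter` and `README_NEW` are hypothetical, and `README_NEW` only reproduces the front matter from the new side of the diff.

```python
def parse_front_matter(text):
    """Parse a leading '---' delimited metadata block.

    Illustrative helper, NOT a YAML parser: it understands only
    `key: value` lines and `- item` entries belonging to the most
    recently seen key, which is all this model card uses.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file

    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return meta  # closing delimiter reached
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List entry: attach it to the most recent key.
            if current_key is None:
                raise ValueError("list item appears before any key")
            if meta[current_key] == "":
                meta[current_key] = []  # key with empty value becomes a list
            if not isinstance(meta[current_key], list):
                raise ValueError(f"key {current_key!r} mixes a scalar and a list")
            meta[current_key].append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"unrecognised line: {line!r}")
        current_key = key.strip()
        meta[current_key] = value.strip()
    raise ValueError("front matter is not closed by '---'")


# Front matter copied from the new side of the diff.
README_NEW = """---
language: en
license: mit
pipeline_tag: image-to-image
tags:
- diffusion
- autoencoder
- feature-space
- svg
---

# SVG: Latent Diffusion Model without Variational Autoencoder
"""

meta = parse_front_matter(README_NEW)
print(meta["pipeline_tag"])
print(meta["tags"])
print("references" in meta)
```

For anything beyond this flat layout (nested mappings, quoted strings, multi-line values), a real YAML library would be the safer choice.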