Markus-Pobitzer committed on commit 7e058fd (verified), parent: d82ee10

Readme: add inference code

Files changed (1): README.md (+212, −3)
---
license: cc-by-nc-4.0
base_model:
- Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
pipeline_tag: image-to-video
tags:
- Painting
---

# Loomis Painter: Reconstructing the painting process

<p align="center">
  <a href='https://github.com/Markus-Pobitzer/wlp'>
  <img src='https://img.shields.io/badge/github-repo-blue?logo=github'></a>
  <a href='https://arxiv.org/abs/2511.17344'>
  <img src='https://img.shields.io/badge/Arxiv-Pdf-A42C25?style=flat&logo=arXiv&logoColor=white'></a>
  <a href='https://markus-pobitzer.github.io/lplp'>
  <img src='https://img.shields.io/badge/Project-Page-green?style=flat&logo=Google%20chrome&logoColor=white'></a>
</p>

<table>
  <tr>
    <td align="center">
      <img src="assets/base.gif" width="380" alt="Generated Video" />
      <br />
      <sub>Generated Video</sub>
    </td>
    <td align="center">
      <img src="assets/reference_image.png" width="380" alt="Input" title="Haystacks by Claude Monet. Source: Wikiart." />
      <br />
      <sub>Input</sub>
    </td>
  </tr>
</table>

## Base Model Inference

Before running the code, make sure you have installed `torch`, `diffusers`, `transformers`, `huggingface_hub`, and `pillow`. You can also install the dependencies from the official Loomis Painter repo: [link](https://github.com/Markus-Pobitzer/wlp).

```python
import torch
from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
from transformers import CLIPVisionModel
from huggingface_hub import hf_hub_download
from typing import Tuple, Union
from PIL import Image, ImageOps


def pil_resize(
    image: Image.Image,
    target_size: Tuple[int, int],
    pad_input: bool = False,
    padding_color: Union[str, int, Tuple[int, ...]] = "white",
) -> Image.Image:
    """Resize the image to the target size.

    Args:
        image: Input image to be processed.
        target_size: Target size (width, height).
        pad_input: If set, resizes the image while keeping the aspect ratio and pads the unfilled part.
        padding_color: The color for the padded pixels.

    Returns:
        The resized image.
    """
    if pad_input:
        # Resize image, keep aspect ratio
        image = ImageOps.contain(image, size=target_size)
        # Pad while keeping image in center
        image = ImageOps.pad(image, size=target_size, color=padding_color)
    else:
        image = image.resize(target_size)
    return image


def undo_pil_resize(
    image: Image.Image,
    target_size: Tuple[int, int],
) -> Image.Image:
    """Undo the resizing and padding, returning a new image of size target_size.

    Args:
        image: Input image to be processed.
        target_size: Target size (width, height).

    Returns:
        The resized image.
    """
    tmp_img = Image.new(mode="RGB", size=target_size)
    # Get the resized (aspect-ratio-preserving) image size
    tmp_img = ImageOps.contain(tmp_img, size=image.size)

    # Undo padding by center cropping
    width, height = image.size
    tmp_width, tmp_height = tmp_img.size

    left = int(round((width - tmp_width) / 2.0))
    top = int(round((height - tmp_height) / 2.0))
    right = left + tmp_width
    bottom = top + tmp_height
    cropped = image.crop((left, top, right, bottom))

    # Undo resizing
    ret = cropped.resize(target_size)
    return ret


# Set to True if you have a GPU with less than 80 GB VRAM --> very slow inference!
enable_sequential_cpu_offload = True

# Download the LoRA file
lora_path = hf_hub_download(repo_id="Markus-Pobitzer/wlp-lora", filename="base.safetensors")
print(f"LoRA path: {lora_path}")

# Load the pipeline
model_id = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
image_encoder = CLIPVisionModel.from_pretrained(
    model_id, subfolder="image_encoder", torch_dtype=torch.float32
)
# Takes more than 100 GB of disk space
pipe = WanImageToVideoPipeline.from_pretrained(
    model_id, vae=vae, image_encoder=image_encoder, torch_dtype=torch.bfloat16
)

# Load the LoRA weights and fuse them into the base model
pipe.load_lora_weights(lora_path)
pipe.fuse_lora()

# Either offload to CPU or move the pipeline directly to the GPU
if enable_sequential_cpu_offload:
    pipe.enable_sequential_cpu_offload()
else:
    pipe.to("cuda")


### INFERENCE ###
image = load_image(
    "https://uploads3.wikiart.org/images/claude-monet/haystacks-at-giverny.jpg"
)
og_size = image.size
height = 480
width = 832
# Resize and pad the reference image to the model resolution
ref_image = pil_resize(image, target_size=(width, height), pad_input=True)
prompt = "Painting process step by step."

output = pipe(
    image=ref_image,
    prompt=prompt,
    height=height,
    width=width,
    num_frames=81,
    output_type="pil",
    guidance_scale=1.0,
).frames[0]
# Crop and resize the frames back to the original image size, then reverse the frame order
output = [undo_pil_resize(img, og_size) for img in output][::-1]
# Save video
export_to_video(output, "output.mp4", fps=3)
```
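The pad/crop geometry that `pil_resize` and `undo_pil_resize` rely on can be checked without loading the model. A minimal sketch, assuming `ImageOps.contain`-style aspect-ratio-preserving scaling; `contain_size` and `center_offsets` are hypothetical helpers for illustration, not part of the repo:

```python
def contain_size(size, target):
    # Largest size that fits inside `target` while keeping the aspect
    # ratio of `size` (mirrors the geometry of PIL's ImageOps.contain)
    w, h = size
    tw, th = target
    scale = min(tw / w, th / h)
    return (round(w * scale), round(h * scale))


def center_offsets(inner, outer):
    # Top-left corner of `inner` when centered inside `outer`
    # (the same offsets undo_pil_resize uses for its center crop)
    iw, ih = inner
    ow, oh = outer
    return (int(round((ow - iw) / 2.0)), int(round((oh - ih) / 2.0)))


# A 1000x500 reference image padded into the 832x480 model resolution:
fitted = contain_size((1000, 500), (832, 480))
print(fitted)                              # (832, 416)
print(center_offsets(fitted, (832, 480)))  # (0, 32)
```

So for this input, `undo_pil_resize` crops a 32-pixel padding band off the top and bottom before resizing back to the original resolution.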

### Art Media Transfer

To transfer from one art media to another, use the following LoRA:

```python
lora_path = hf_hub_download(repo_id="Markus-Pobitzer/wlp-lora", filename="art_media_transfer.safetensors")
```

Make sure that you also change the prompt accordingly. The supported art media are:
- acrylic
- colored pencils
- loomis
- pencil
- oil

The prompt has the following format:

```python
art_media = "..."
painting_desc = "..."
prompt = f"<{art_media}> Painting process step by step. {painting_desc}"
```

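The format above can be wrapped in a small helper that also validates the media name against the supported list; `build_prompt` is a hypothetical convenience function for illustration, not part of the repo:

```python
SUPPORTED_ART_MEDIA = {"acrylic", "colored pencils", "loomis", "pencil", "oil"}


def build_prompt(art_media: str, painting_desc: str = "") -> str:
    # Reject media the transfer LoRA was not trained on
    if art_media not in SUPPORTED_ART_MEDIA:
        raise ValueError(f"Unsupported art media: {art_media!r}")
    prompt = f"<{art_media}> Painting process step by step."
    if painting_desc:
        prompt += f" {painting_desc}"
    return prompt


print(build_prompt("oil", "A quiet harbor at dusk."))
# <oil> Painting process step by step. A quiet harbor at dusk.
```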
For acrylic, colored pencils, and oil, the prompt can contain color descriptions, e.g.
```
prompt = "<acrylic> Painting process step by step. The image depicts a serene landscape with a small brown and green island in the center of a body of water, surrounded by green trees and a few boats. The sky is blue with scattered clouds, and there are birds flying in the background."
```

For the loomis and pencil art media we left the color information out during fine-tuning, e.g.
```
prompt = "<pencil> Painting process step by step. The image depicts a serene landscape with a small island in the center of a body of water, surrounded by trees and a few boats. There are scattered clouds, and birds flying in the background."
```

Note that the loomis method only works on portrait photos/paintings; on other inputs it tends to fall back to another art media.

## Citation

If you use this work, please cite:

```bibtex
@misc{pobitzer2025loomispainter,
      title={Loomis Painter: Reconstructing the Painting Process},
      author={Markus Pobitzer and Chang Liu and Chenyi Zhuang and Teng Long and Bin Ren and Nicu Sebe},
      year={2025},
      eprint={2511.17344},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.17344},
}
```