May 16, 2025 · HunyuanVideo · Text to Video
AI video generation has been advancing at a rapid pace, and today's models produce far more realistic videos than their predecessors across a wide range of use cases. One such model is HunyuanVideo, known for its impressive outputs. In this guide we'll look at the model and tackle the practical challenges of running it efficiently on high-memory GPUs like the A100 or H100.
HunyuanVideo is Tencent's state-of-the-art text-to-video generation model that rivals or surpasses leading closed-source alternatives. As the largest open-source video generation model with over 13 billion parameters, HunyuanVideo represents a significant breakthrough in AI-powered video creation.
We also have a notebook to follow along once you deploy the Vast Instance.
In this guide, we will:
- Set up a custom Vast template using Tencent's Docker image
- Rent a GPU instance with enough VRAM to run the model
- Download the pretrained model weights from Hugging Face
- Generate videos from text prompts and view them in a notebook
Let's get started with HunyuanVideo!
Tencent maintains a custom Docker image for HunyuanVideo: hunyuanvideo/hunyuanvideo:cuda_12. To run the model on Vast, we'll create a custom template using this image.
Follow these steps:
1. Set the template's Docker image to hunyuanvideo/hunyuanvideo:cuda_12.
2. Add the following command to the template's on-start script:
git clone https://github.com/tencent/HunyuanVideo
This ensures the HunyuanVideo repository is downloaded on instance startup.
HunyuanVideo requires a GPU with at least 80GB of VRAM to run smoothly. Available GPUs meeting this requirement include NVIDIA's A100 and H100 cards.
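Once the instance you rent in the next step is running, you can quickly confirm the GPU meets this requirement. The snippet below is a minimal sketch (not part of the HunyuanVideo repo), assuming PyTorch is available inside the container:
# Minimal sanity check: confirm the GPU has roughly 80GB of VRAM.
import torch
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"{props.name}: {vram_gb:.1f} GB VRAM")
assert vram_gb >= 79, "HunyuanVideo needs an ~80GB GPU (A100/H100 class)"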
To select an instance, filter the Vast marketplace for offers with at least 80GB of VRAM (A100 or H100), choose the template you just created, and rent the machine. Once the instance starts, the cloned repository will be located at /workspace/HunyuanVideo/ on the server.
Before generating videos, we need to download the pretrained model weights from Hugging Face. These include the video model weights and the text encoders.
Run the following commands inside your instance, from the /workspace/HunyuanVideo/ directory:
# Main HunyuanVideo model weights
huggingface-cli download tencent/HunyuanVideo --local-dir ./ckpts
# LLaVA text encoder, then preprocess it into the layout HunyuanVideo expects
huggingface-cli download xtuner/llava-llama-3-8b-v1_1-transformers --local-dir ./ckpts/llava-llama-3-8b-v1_1-transformers
python hyvideo/utils/preprocess_text_encoder_tokenizer_utils.py --input_dir ckpts/llava-llama-3-8b-v1_1-transformers --output_dir ckpts/text_encoder
# CLIP text encoder
huggingface-cli download openai/clip-vit-large-patch14 --local-dir ./ckpts/text_encoder_2
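If you'd rather drive the downloads from the notebook instead of a terminal, the same checkpoints can be fetched with the huggingface_hub Python API. This is an optional sketch equivalent to the CLI commands above; the text-encoder preprocessing script still needs to run afterwards.
# Optional Python equivalent of the huggingface-cli commands above.
from huggingface_hub import snapshot_download
# Main HunyuanVideo weights
snapshot_download(repo_id="tencent/HunyuanVideo", local_dir="./ckpts")
# LLaVA text encoder (still run preprocess_text_encoder_tokenizer_utils.py afterwards)
snapshot_download(repo_id="xtuner/llava-llama-3-8b-v1_1-transformers",
                  local_dir="./ckpts/llava-llama-3-8b-v1_1-transformers")
# CLIP text encoder
snapshot_download(repo_id="openai/clip-vit-large-patch14",
                  local_dir="./ckpts/text_encoder_2")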
For a more in-depth discussion of the checkpoints, refer to Tencent’s checkpoint README.
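To confirm the downloads completed, you can take a quick look at what ended up under ./ckpts; this throwaway listing (just a suggestion, not part of the repo) works from the notebook:
# Shallow listing of the ./ckpts directory to verify the downloads.
import os
for root, dirs, files in os.walk("./ckpts"):
    if root.count(os.sep) <= 2:  # limit output to the top couple of levels
        print(root, f"({len(files)} files)")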
To conveniently view generated videos within the Jupyter notebook, let's define a helper function using IPython.display.Video:
from IPython.display import Video
def show_video(video_path, width=640, height=360, embed=True):
"""
Display a video in a Jupyter notebook.
Parameters:
-----------
video_path : str
Path to the video file (local file or URL)
width : int, optional
Width of the video player in pixels
height : int, optional
Height of the video player in pixels
embed : bool, optional
Whether to embed the video in the notebook (True)
or just link to it (False)
Returns:
--------
IPython.display.Video
Video display object
"""
return Video(video_path, width=width, height=height, embed=embed)
Let’s generate a realistic video of a cat walking on the grass using the example prompt provided by Tencent.
Run the following command in your terminal:
python sample_video.py \
--video-size 720 1280 \
--video-length 129 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--flow-reverse \
--use-cpu-offload \
--save-path ./results
This command generates a 129-frame video (roughly 4 to 5 seconds at standard frame rates) at 720x1280 resolution, using 50 inference steps.
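The frame-count-to-duration arithmetic is simple if you want to target a specific clip length; the snippet below is just that arithmetic, since the actual playback length depends on the frame rate the saved file uses:
# Rough duration math for a 129-frame clip at common frame rates.
for fps in (24, 25, 30):
    print(f"129 frames at {fps} fps -> {129 / fps:.1f} seconds")
# 24 fps -> 5.4 s, 25 fps -> 5.2 s, 30 fps -> 4.3 s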
Once the generation finishes, locate the video file inside the ./results directory.
To display it in your notebook, run:
show_video("./results/[your_video_filename].mp4")
Replace [your_video_filename].mp4 with the actual file name.
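If you don't want to copy the filename by hand, a small convenience helper (hypothetical, not part of the HunyuanVideo repo) can pick the most recently written .mp4 in the results directory:
# Convenience helper: display the newest .mp4 from the --save-path directory.
import glob
import os

def show_latest_video(results_dir="./results"):
    mp4s = glob.glob(os.path.join(results_dir, "*.mp4"))
    if not mp4s:
        raise FileNotFoundError(f"No .mp4 files found in {results_dir}")
    latest = max(mp4s, key=os.path.getmtime)  # newest by modification time
    return show_video(latest)

show_latest_video()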
You should see a high-quality, realistically animated cat walking on grass, showcasing HunyuanVideo’s remarkable detail in animal motion and textures.
Next, try generating a completely different scene:
python sample_video.py \
--video-size 720 1280 \
--video-length 129 \
--infer-steps 50 \
--prompt "An astronaut walks across the moon, realistic style." \
--flow-reverse \
--use-cpu-offload \
--save-path ./results
After the video is generated, display it in your notebook the same way:
show_video("./results/[your_astronaut_video].mp4")
The resulting video will demonstrate the model’s versatility at rendering vastly different environments and characters, from furry animals to astronauts in space suits.
You've now successfully generated your first videos using HunyuanVideo on a Vast-powered cloud instance. This powerful model unlocks creative possibilities in AI-driven video generation. To explore them, experiment with:
- Different --video-size values (e.g., 1280×720, 960×960) to optimize for your needs
- --infer-steps (default 50) to trade off video generation quality against speed
- --embedded-cfg-scale (default 6.0) to balance prompt fidelity and creative variance
- --seed to reproduce favorite generated videos reliably (the sketch below combines these flags in a single run)
Deploying large models like HunyuanVideo can be resource-intensive, but cloud platforms like Vast make it accessible, giving you access to top-tier GPUs without upfront hardware costs.
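As a concrete illustration of combining these flags, here is a hedged sketch of a small seed sweep driven from Python; it assumes the same sample_video.py CLI used above and simply repeats the run with different --seed values:
# Sketch: sweep a few seeds with the same sample_video.py CLI shown earlier.
import subprocess

for seed in (0, 1, 2):
    subprocess.run(
        [
            "python", "sample_video.py",
            "--video-size", "960", "960",
            "--video-length", "129",
            "--infer-steps", "30",            # fewer steps: faster, lower quality
            "--embedded-cfg-scale", "6.0",    # default prompt-fidelity scale
            "--seed", str(seed),              # fixed seed for reproducibility
            "--prompt", "A cat walks on the grass, realistic style.",
            "--flow-reverse",
            "--use-cpu-offload",
            "--save-path", "./results",
        ],
        check=True,
    )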
With these foundations, you’re ready to explore more sophisticated prompt engineering, fine-tune generation parameters, or even integrate HunyuanVideo into multimedia projects!
Resources: