LTX-2

LTX-2 is a DiT-based audio-video foundation model designed to generate synchronized video and audio in a single unified model.

Details

  • Modalities: video
  • Recommended Hardware: 1x RTX Pro 6000 WS
  • Provider: Lightricks
  • Family: LTXV
  • License: LTX-2 Community License Agreement

LTX-2 is a DiT-based (Diffusion Transformer) audio-video foundation model developed by Lightricks that generates synchronized video and audio within a single unified model. With 19 billion parameters, it represents a significant advancement in multimodal generation, enabling practical video creation with accompanying audio from various input modalities.

Key Features

LTX-2 supports multiple generation modes within a single architecture:

  • Text-to-Video: Generate video content directly from text descriptions
  • Image-to-Video: Animate static images into dynamic video sequences
  • Audio-Visual Generation: Create synchronized audio and video output together
  • Cross-Modal Generation: Support for audio-to-video, text-to-audio, and video-to-audio workflows

The unified architecture allows all these capabilities to work together seamlessly, making it possible to generate complete audiovisual content from simple prompts.

Architecture

LTX-2 is built on a Diffusion Transformer (DiT) architecture, combining the strengths of diffusion models with transformer-based processing. This design enables the model to handle both video and audio generation within a single framework, maintaining temporal coherence across both modalities.

The model expects width and height divisible by 32, and a frame count that is a multiple of 8 plus 1 (9, 17, 25, ...), allowing for flexible output configurations while maintaining generation quality.
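
As a concrete illustration of these constraints, here is a small helper that snaps a requested shape to the nearest valid one; the function is our own sketch, not part of any LTX-2 API:

    def snap_resolution(width: int, height: int, num_frames: int) -> tuple[int, int, int]:
        """Round a requested output shape to one LTX-2 accepts:
        width and height divisible by 32, frame count of the form 8k + 1."""
        snapped_w = max(32, round(width / 32) * 32)
        snapped_h = max(32, round(height / 32) * 32)
        # Frame counts must be one more than a multiple of 8 (9, 17, 25, ...).
        snapped_f = max(9, round((num_frames - 1) / 8) * 8 + 1)
        return snapped_w, snapped_h, snapped_f

    # Example: a 720p request at 120 frames becomes a valid configuration.
    print(snap_resolution(1280, 720, 120))  # (1280, 704, 121)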

Training and Customization

The base model is fully trainable, supporting various customization approaches:

  • LoRA Training: Create Low-Rank Adaptations for specific styles or subjects
  • IC-LoRA: Image-Conditioned LoRAs for more precise control
  • Motion Adaptation: Train custom motion patterns efficiently
  • Style Transfer: Adapt the model to specific visual styles
  • Likeness Training: Capture both appearance and sound characteristics

These customization options enable users to adapt LTX-2 for specific creative applications while building on its foundation capabilities.
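
To make the LoRA option concrete, the sketch below shows what an adapter configuration could look like with the Hugging Face peft library; the rank and target module names are illustrative assumptions, not values taken from the LTX-2 codebase:

    from peft import LoraConfig

    lora_config = LoraConfig(
        r=64,           # rank of the low-rank update matrices
        lora_alpha=64,  # scaling applied to the learned update
        # Assumed attention-projection names; the real LTX-2 transformer may differ.
        target_modules=["to_q", "to_k", "to_v", "to_out.0"],
    )

Lower ranks yield smaller adapters, while higher ranks give the adapter more capacity for styles or subjects that differ strongly from the base model.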

Use Cases

LTX-2 is designed for creative video generation applications including:

  • Short-form video content creation
  • Animation and motion design
  • Visual storytelling with synchronized audio
  • Creative experimentation with multimodal generation
  • Prototyping video concepts from text descriptions

Prompting

Effective prompting significantly impacts generation quality. The model responds well to detailed, descriptive prompts that clearly articulate the desired visual and audio elements. For best results, users should provide specific details about motion, scene composition, and audio characteristics when generating audiovisual content.
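
An illustrative prompt in this spirit (our own example, not from Lightricks' documentation) might read:

    A close-up of rain hitting a tin roof at dusk; the camera slowly pulls
    back to reveal a quiet street, while the soundtrack carries steady
    rainfall and a distant rumble of thunder.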

Integration

LTX-2 integrates with ComfyUI through built-in LTXVideo nodes, enabling visual workflow-based generation. The model is also supported in the Hugging Face Diffusers library for programmatic access.
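
As a sketch of programmatic access, the snippet below uses the LTXPipeline class that Diffusers ships for the LTX family; the checkpoint ID shown is the earlier LTX-Video repository, so verify the correct pipeline class and repository name for LTX-2 on the model page before relying on it:

    import torch
    from diffusers import LTXPipeline
    from diffusers.utils import export_to_video

    # Checkpoint ID is an assumption; check the model page for the LTX-2 repo name.
    pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    video = pipe(
        prompt="A slow pan across a foggy harbor at sunrise, gulls circling overhead",
        width=1280,
        height=704,      # divisible by 32
        num_frames=121,  # a multiple of 8, plus 1
    ).frames[0]

    export_to_video(video, "harbor.mp4", fps=24)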

For more details about the model architecture and training approach, see the model page on Hugging Face.

Quick Start Guide

  1. Choose a model and click 'Deploy' above to find available GPUs recommended for this model.
  2. Rent your dedicated instance, preconfigured with the model you've selected.
  3. Start sending requests to your model instance and get responses right away.
