Beatviz Tutorial: Create Audio-Driven Videos with Confidence

Getting Started

Create a New Task

Every Beatviz project starts with a task. This is where you define how the AI should interpret your audio.

Select a Template

Templates define the AI's rendering behavior.

Storytelling

• Designed for narration and spoken content
• Balanced performance and cost

Singing

• Optimized for music and vocals
• Superior lip-sync accuracy
• Higher credit usage due to advanced rendering

Choose Singing when visual mouth movement quality matters.

Standard vs Pro Mode

Before generation, choose a quality tier:

Standard Mode

• Faster rendering
• Lower credit cost

Pro Mode

• Higher visual fidelity
• Increased credit usage

Select based on quality requirements and budget.
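
The template and quality-tier guidance above can be summarized as a small decision helper. This is only a sketch: the tutorial describes choices made in the Beatviz interface, not a public API, so the `TaskSettings` class, the `choose_settings` function, and its parameters are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSettings:
    template: str  # "Storytelling" or "Singing"
    tier: str      # "Standard" or "Pro"


def choose_settings(lip_sync_matters: bool, needs_high_fidelity: bool) -> TaskSettings:
    """Hypothetical helper that mirrors the guidance in this tutorial."""
    # Singing gives better lip sync but uses more credits.
    template = "Singing" if lip_sync_matters else "Storytelling"
    # Pro gives higher visual fidelity but uses more credits.
    tier = "Pro" if needs_high_fidelity else "Standard"
    return TaskSettings(template, tier)


print(choose_settings(lip_sync_matters=True, needs_high_fidelity=False))
```

Both options that improve quality also increase credit usage, so the cheapest combination under these assumptions is Storytelling in Standard Mode.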

Configure Presets: Character

Upload a character image in the Preset section.

This image is used to:

• Guide first frame generation
• Maintain character identity across video segments

Without a character preset, the AI may introduce inconsistent or random characters.

Configure Presets: Style

Styles define the emotional and cinematic direction of the video.

Available examples include:

• Epic
• Cinematic
• Funny
• Happy
• Sad

Styles influence lighting, mood, and visual rhythm.

Think of styles as high-level creative constraints for the AI.

Review First Frame & Video Prompts

After configuration, Beatviz automatically:

• Analyzes your audio
• Generates a first frame image
• Produces a corresponding video prompt

Always review these before generating video.

First Frame Image: Why It Matters

The first frame image sets the visual foundation of the video.

It directly affects:

• Character appearance
• Scene composition
• Overall aesthetic consistency

Important: If the first frame does not contain your intended character, regenerate it before proceeding. This step prevents downstream inconsistencies.

Regeneration & Quality Controls

Beatviz supports iterative refinement without restarting your task.

Regenerate a Video Segment

If a generated clip does not meet expectations, you can:

• Regenerate the current segment
• Adjust the first frame image
• Edit the video prompt

This allows focused improvements while preserving previous work.

Reference Image vs First Frame Image

In Standard Mode, regeneration supports two image types.

Reference Image

• Guides character appearance
• Background is optional
• AI relies more on prompts

First Frame Image

• Becomes the opening frame
• Character and background both matter
• Strongly constrains visual output

Use reference images for flexibility, and first frame images for precision.

Draft Recovery

If your browser closes or a technical issue occurs, your progress is preserved.

Unfinished tasks can be recovered at: https://beatviz.ai/creations

This enables seamless continuation without reconfiguration.

Simple Mode vs Custom Mode

Beatviz provides two main video generation modes:

Simple Mode: https://beatviz.ai/create

Custom Mode: https://beatviz.ai/create-custom

Key Differences

Simple Mode

In Simple Mode, you only need to upload an audio file. Beatviz's AI agent will automatically:

• Analyze the audio
• Generate suitable video prompts based on rhythm and structure
• Create first-frame images
• Produce a complete video with one click

This mode is designed for speed and ease of use, while still allowing you to edit and adjust the AI-generated content later if needed.

Custom Mode

In Custom Mode, the AI does not analyze your audio automatically. Instead:

• You fully control every video segment
• You manually write prompts for each clip
• You decide whether to use first-frame images
• You design and build the video structure from scratch

Although Simple Mode also allows manual editing, its initial setup is assisted by AI agents.

Custom Mode provides zero agent assistance and is intended for creators who want complete creative freedom and precision.

Creating a Task in Custom Mode

To create a Custom Mode project:

• Visit https://beatviz.ai/create-custom
• Upload your audio file
• Enter a project name
• Click Create Task

Custom Mode Interface Overview

The Custom Mode interface is divided into two main sections:

• Left Panel: Image and video generation workspace
• Right Panel: Audio timeline (track) workspace

The goal is to generate visual content on the left and align it precisely with your audio on the right.

Left Panel: Generation Workspace

The left panel contains three core functional areas:

• First-Frame Image Generation
• Video Generation
• Audio Analysis (reference only, no auto-generation)

• The bottom section is the generator, where you input prompts and settings
• The top section displays generated images and videos

This panel is where all visual assets are created before being placed on the timeline.

Right Panel: Audio Timeline

The right panel is centered around the audio timeline at the bottom.

Here you can:

• Drag generated videos from the left panel onto the timeline
• Align video clips with specific segments of your audio
• Rearrange video order freely after placement

First-Frame Image Generation in Custom Mode

To generate a first-frame image:

• In the generator area, select Image
• Choose your preferred image model
• Enter your prompt
• Optionally upload a reference image
• Click Generate

First-frame images can later be reused to guide video generation.

Video Generation in Custom Mode

Video generation follows a similar workflow:

• Enter your video prompt
• Select a video model
• Optionally choose a first-frame image
• Generate the video

About First-Frame Images

A first-frame image defines the visual starting point of a video clip.

It strongly influences:

• Composition
• Character appearance
• Overall visual direction

Using a well-designed first frame can significantly improve video consistency and quality.

[Figure: First-frame image selection for video generation]

Applying Videos to the Audio Timeline

Once your video clips are generated:

• Drag them from the left panel into the timeline
• Align each clip with the desired audio segment

You can also:

• Change clip order at any time
• Replace or remove clips freely
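
One way to reason about clip placement is to model the timeline as a list of clips with start times and durations, then check which parts of the audio are still uncovered. The sketch below is illustrative only: Beatviz handles this in its drag-and-drop interface, and the `Clip` class and `find_gaps` function are hypothetical names, not part of any Beatviz API.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    start: float     # seconds from the start of the audio
    duration: float  # clip length in seconds

    @property
    def end(self) -> float:
        return self.start + self.duration


def find_gaps(clips: list[Clip], audio_length: float) -> list[tuple[float, float]]:
    """Return (start, end) ranges of the audio not covered by any clip."""
    gaps = []
    cursor = 0.0  # end of the audio covered so far
    for clip in sorted(clips, key=lambda c: c.start):
        if clip.start > cursor:
            gaps.append((cursor, clip.start))
        cursor = max(cursor, clip.end)
    if cursor < audio_length:
        gaps.append((cursor, audio_length))
    return gaps


timeline = [Clip("intro", 0.0, 5.0), Clip("chorus", 8.0, 6.0)]
print(find_gaps(timeline, audio_length=20.0))  # [(5.0, 8.0), (14.0, 20.0)]
```

Because the clips are sorted by start time before the check, reordering them in the list does not change the result, which matches the idea that clips can be rearranged freely after placement.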

Lip Sync Feature in Custom Mode

To use the lip sync feature, two steps are required:

Step 1: Select Lip Sync Audio

In the right-side audio timeline:

• Select the specific audio segment that requires lip synchronization
• Beatviz will use this selected audio to guide the AI's lip movement generation

Step 2: Define Visual Direction

You must also provide:

• A prompt describing the character and scene
• An optional reference image

These inputs define the overall visual style while the selected audio controls mouth movement.
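
The two steps can be thought of as a single set of inputs: a selected audio segment and a prompt are required, while the reference image is optional. The following is a minimal sketch of that checklist. The `LipSyncRequest` class, its field names, and the example file name are hypothetical and do not represent an actual Beatviz data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LipSyncRequest:
    audio_start: float                     # start of the selected segment, in seconds
    audio_end: float                       # end of the selected segment, in seconds
    prompt: str                            # describes the character and scene
    reference_image: Optional[str] = None  # optional image path

    def validate(self) -> list[str]:
        """Return a list of missing inputs; an empty list means ready."""
        problems = []
        if self.audio_end <= self.audio_start:
            problems.append("select a non-empty audio segment")
        if not self.prompt.strip():
            problems.append("provide a prompt describing the character and scene")
        return problems


request = LipSyncRequest(audio_start=12.0, audio_end=18.5,
                         prompt="A singer on a neon-lit stage")
print(request.validate())  # []
```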

Ensuring Character Consistency

Character consistency is primarily controlled by the first frame image.

How Consistency Is Determined

• If the first frame includes your character, the AI will maintain it throughout the video.
• If the first frame lacks a character but prompts reference one, the AI will generate a random character.

[Figure: Consistent vs inconsistent character example]
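
The two rules above can be written as a small lookup. This is a sketch for reasoning about the outcome, not Beatviz's actual logic: the function name and return strings are hypothetical, and the last case (no character in either the first frame or the prompt) is not covered by this tutorial, so it is returned as "unspecified".

```python
def predict_character_outcome(first_frame_has_character: bool,
                              prompt_mentions_character: bool) -> str:
    """Hypothetical summary of the consistency rules in this tutorial."""
    if first_frame_has_character:
        # The first frame anchors the character for the whole clip.
        return "consistent character"
    if prompt_mentions_character:
        # No visual anchor, so the AI invents a character.
        return "random character"
    # Assumption: the tutorial does not describe this case.
    return "unspecified"


print(predict_character_outcome(False, True))  # random character
```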

Best Practice

Always confirm:

• The character is clearly visible in the first frame
• The image matches the prompt description

This is the most reliable method for visual continuity.

You can also:

• Use the Import from Library button to import images generated in previous tasks
• Use AI to generate new images when needed

Improving Lip Sync Quality

For optimal lip-sync results:

• Use the Singing template, especially for music and vocal-heavy content

The Singing template consumes more credits due to:

• Longer render time
• Advanced facial animation models

The quality improvement is typically substantial.

Summary

This tutorial is structured for modular reading and visual learning. Each section is designed to stand alone and pair naturally with short GIF demonstrations, making it ideal for onboarding, documentation, and product education.
