StoryMem

Abstract

Visual storytelling requires generating multi-shot videos with cinematic quality and long-range consistency. Inspired by human memory, we propose StoryMem, a paradigm that reformulates long-form video storytelling as iterative shot synthesis conditioned on explicit visual memory, transforming pre-trained single-shot video diffusion models into multi-shot storytellers. This is achieved by a novel Memory-to-Video (M2V) design, which maintains a compact and dynamically updated memory bank of keyframes from previously generated shots. The stored memory is then injected into single-shot video diffusion models via latent concatenation and negative RoPE shifts, with only LoRA fine-tuning. A semantic keyframe selection strategy, together with aesthetic preference filtering, further ensures informative and stable memory throughout generation. Moreover, the proposed framework naturally accommodates smooth shot transitions and customized story generation applications. To facilitate evaluation, we introduce ST-Bench, a diverse benchmark for multi-shot video storytelling. Extensive experiments demonstrate that StoryMem achieves superior cross-shot consistency over previous methods while preserving high aesthetic quality and prompt adherence, marking a significant step toward coherent minute-long video storytelling.

Highlights

🎬
Multi-scene Multi-shot Story
Generates coherent, minute-long stories across multiple scenes with shot-level control and single-shot costs.
🧠
Persistent Memory
Maintains consistent character appearance and scene elements throughout multi-shot video storytelling.
🎨
Cinematic Quality
Inherits high aesthetics, prompt adherence, and camera control from state-of-the-art single-shot video generation models.
🧩
Flexible Design
Natively supports smooth shot connections and reference-guided customized story generation applications.

Method Overview

StoryMem generates each shot conditioned on a memory bank that stores keyframes from previously generated shots (Memory-to-Video, M2V). During generation, the selected memory frames are encoded by a 3D VAE, fused with noisy video latents and binary masks, and fed into a LoRA-finetuned memory-conditioned Video DiT to synthesize the current shot. After generating each shot, semantic keyframe selection and aesthetic preference filtering are applied to obtain informative and reliable memory frames, enabling long-range cross-shot consistency and natural narrative progression. By iteratively generating shots with memory updates, StoryMem produces coherent minute-long, multi-shot story videos.
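
The iterative loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `MemoryBank`, `generate_story`, `generate_shot`, and `aesthetic_score` are hypothetical names, frames are stood in for by plain embedding vectors, a cosine-similarity threshold stands in for semantic keyframe selection, and the keep-most-recent truncation policy is assumed. The VAE encoding, mask fusion, and the Video DiT itself are hidden behind the `generate_shot` callable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]  # stand-in for a keyframe embedding (assumption)


def cosine(a: Frame, b: Frame) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class MemoryBank:
    """Hypothetical compact memory of keyframes from earlier shots."""
    capacity: int = 4
    frames: List[Frame] = field(default_factory=list)

    def update(
        self,
        candidates: Sequence[Frame],
        aesthetic_score: Callable[[Frame], float],
        min_score: float = 0.5,
        max_similarity: float = 0.9,
    ) -> None:
        for frame in candidates:
            # Aesthetic preference filtering: drop low-quality frames.
            if aesthetic_score(frame) < min_score:
                continue
            # Semantic selection (assumed form): skip near-duplicates of memory.
            if any(cosine(frame, kept) > max_similarity for kept in self.frames):
                continue
            self.frames.append(frame)
        # Keep the bank compact; retaining the newest frames is an assumption.
        self.frames = self.frames[-self.capacity:]


def generate_story(
    prompts: Sequence[str],
    generate_shot: Callable[[str, List[Frame]], List[Frame]],
    aesthetic_score: Callable[[Frame], float],
    initial_memory: Sequence[Frame] = (),
    capacity: int = 4,
) -> Tuple[List[List[Frame]], MemoryBank]:
    """Generate shots one by one, each conditioned on the current memory."""
    bank = MemoryBank(capacity=capacity, frames=list(initial_memory))
    shots: List[List[Frame]] = []
    for prompt in prompts:
        shot = generate_shot(prompt, list(bank.frames))  # memory-conditioned shot
        shots.append(shot)
        bank.update(shot, aesthetic_score)  # refresh memory after each shot
    return shots, bank
```

In this sketch, passing reference frames as `initial_memory` corresponds to starting generation from a non-empty memory bank, while an empty `initial_memory` leaves the first shot conditioned on the prompt alone.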

Video Results

Multi-shot Long Video Storytelling

Story of Lin Daiyu's First Spring in the Garden

A gentle, lyrical portrait of Lin Daiyu arriving at the Jia household, discovering the Grand View Garden, meeting Jia Baoyu, and finding solace in poetry and falling blossoms. Across simple, quiet moments, her grace, sensitivity, and bond with nature unfold in a soft, painterly atmosphere.

Scene 1, Shot 1

Wide shot

Scene 1, Shot 2

Medium shot

Scene 1, Shot 3

Close-up

Scene 2, Shot 4

Wide shot

Scene 2, Shot 5

Medium close-up

Scene 2, Shot 6

Close-up

Scene 3, Shot 7

Two-shot

Scene 3, Shot 8

Medium shot

Scene 3, Shot 9

Medium close-up

Scene 4, Shot 10

Wide shot

Scene 4, Shot 11

Medium close-up

Scene 4, Shot 12

Medium shot

Story of the Real Santa Claus

A heartfelt story revealing the quiet, genuine side of Santa Claus as he prepares for Christmas Eve. From his cozy workshop in the North Pole to delivering gifts under the starlit sky, we see Santa's care, warmth, and love for children around the world. The story captures the gentle magic of Christmas through intimate moments of preparation, wonder, and kindness.

Scene 1, Shot 1

Medium wide shot

Scene 1, Shot 2

Close-up

Scene 1, Shot 3

Medium shot

Scene 2, Shot 4

Wide shot

Scene 2, Shot 5

Medium close-up

Scene 2, Shot 6

Wide side shot

Scene 3, Shot 7

Wide cinematic shot

Scene 3, Shot 8

Medium shot

Scene 3, Shot 9

Medium close-up

Scene 4, Shot 10

Wide shot

Scene 4, Shot 11

Medium shot

Scene 4, Shot 12

Slow fade-out

Story of the Beautiful Princess and the Brave Prince

A gentle love story between a prince and a princess living happily in a peaceful kingdom. One day, the princess is captured by a fierce wolf in the forest. The prince courageously embarks on a journey to rescue her. Through courage and love, he defeats the wolf and saves the princess, and the two are reunited under the golden sunset, symbolizing love's triumph over fear.

Shot 1

Royal Garden Stroll

Shot 2

Flower Exchange

Shot 3

Fountain Scene

Shot 4

Wolf Attack

Shot 5

Princess Captured

Shot 6

Prince's Pursuit

Shot 7

Battle

Shot 8

Rescue

Shot 9

Reunion

Comparison

Story of the Little Robot

In a quiet futuristic city, a small curious robot awakens alone in a workshop. It explores the world outside for the first time, learning about nature and emotion. Along its journey, it meets a little girl who shows it kindness. Together, they discover the meaning of friendship in a world of steel and sunlight. The story is gentle, emotional, and full of wonder, emphasizing curiosity, connection, and warmth.

Wan2.2

Wan2.2 + StoryDiffusion

Wan2.2 + IC-LoRA

HoloCine

Sora2

Ours

Story of the Street Musician

A quiet woman finds comfort and clarity in a small café during an ordinary day. From sitting alone by the window with a warm cup of coffee, to observing subtle moments of life around her, and finally lingering as the café grows calm, the story captures solitude, reflection, and the gentle warmth of everyday stillness.

Wan2.2

Wan2.2 + StoryDiffusion

Wan2.2 + IC-LoRA

HoloCine

Sora2

Ours

Story of Neo's Awakening in the Matrix

This short video tells the story of Neo’s iconic awakening, from his ordinary life in the simulated Matrix to the moment he begins to perceive the truth. The story follows Neo’s confusion, his encounter with Morpheus, and his shocking realization that the world around him is not real. The atmosphere gradually shifts from calm and mysterious to tense and surreal, echoing Neo’s transformation from an ordinary man to the chosen one.

Wan2.2

Wan2.2 + StoryDiffusion

Wan2.2 + IC-LoRA

HoloCine

Sora2

Ours

MR2V: Customized Storytelling

Story of a Clumsy Man

A quiet, ordinary man tries to prepare a simple meal in his small kitchen. Awkward and uncoordinated, he approaches each step with sincere determination, only to trigger a series of escalating mishaps. From unruly toast to a stubborn jar and a wildly overflowing blender, his well-meaning efforts turn the kitchen into a mess of gentle chaos. In the end, he abandons the plan and calmly enjoys a single biscuit, surrounded by the evidence of his failed ambition. The story is a lighthearted, dialogue-free visual comedy driven by physical humor and everyday clumsiness.

Reference as initial memory

Story of Young Lovers

During a long ocean voyage aboard a large passenger ship, a quiet young man and a reserved young woman gradually form a connection through shared moments and unspoken understanding. Their bond grows through stillness, wind, and fleeting joy, before the journey takes a sudden and dangerous turn. The story focuses on restrained emotion, human closeness, and a farewell shaped by circumstance rather than words.

Reference as initial memory

Story of a Woman in a Cafe

A quiet woman finds comfort and clarity in a small café during an ordinary day. The woman settles into a corner of the café, reading and working. Time seems to slow, capturing patience, craftsmanship, and the small sense of calm found in everyday rituals.

Reference as initial memory
