StoryMem

Multi-shot Long Video Storytelling with Memory

¹S-Lab, Nanyang Technological University    ²Intelligent Creation, ByteDance
*Work done during internship at ByteDance    Project Lead    §Corresponding Author
✦ ✦ ✦

Abstract

Visual storytelling requires generating multi-shot videos with cinematic quality and long-range consistency. Inspired by human memory, we propose StoryMem, a paradigm that reformulates long-form video storytelling as iterative shot synthesis conditioned on explicit visual memory, transforming pre-trained single-shot video diffusion models into multi-shot storytellers. This is achieved by a novel Memory-to-Video (M2V) design, which maintains a compact and dynamically updated memory bank of keyframes from previously generated shots. The stored memory is then injected into single-shot video diffusion models via latent concatenation and negative RoPE shifts, requiring only LoRA fine-tuning. A semantic keyframe selection strategy, together with aesthetic preference filtering, further ensures informative and stable memory throughout generation. Moreover, the proposed framework naturally accommodates smooth shot transitions and customized story generation. To facilitate evaluation, we introduce ST-Bench, a diverse benchmark for multi-shot video storytelling. Extensive experiments demonstrate that StoryMem achieves superior cross-shot consistency over previous methods while preserving high aesthetic quality and prompt adherence, marking a significant step toward coherent minute-long video storytelling.

✦ ✦ ✦

Highlights

🎬
Multi-scene Multi-shot Story
Generates coherent, minute-long stories across multiple scenes with shot-level control and single-shot costs.
🧠
Persistent Memory
Maintains consistent character appearance and scene elements throughout multi-shot video storytelling.
🎨
Cinematic Quality
Inherits high aesthetics, prompt adherence, and camera control from state-of-the-art single-shot video generation models.
🧩
Flexible Design
Natively supports smooth shot connections and reference-guided customized story generation applications.
✦ ✦ ✦

Method Overview

StoryMem Pipeline

StoryMem generates each shot conditioned on a memory bank that stores keyframes from previously generated shots (Memory-to-Video, M2V). During generation, the selected memory frames are encoded by a 3D VAE, fused with noisy video latents and binary masks, and fed into a LoRA-finetuned memory-conditioned Video DiT to synthesize the current shot. After generating each shot, semantic keyframe selection and aesthetic preference filtering are applied to obtain informative and reliable memory frames, enabling long-range cross-shot consistency and natural narrative progression. By iteratively generating shots with memory updates, StoryMem produces coherent minute-long, multi-shot story videos.
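The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the released implementation: frames are represented by toy feature vectors, `generate_shot` and `aesthetic_score` are hypothetical callables standing in for the memory-conditioned Video DiT and the aesthetic preference model, and the greedy similarity rule, the thresholds, and the fixed-capacity "keep the most recent keyframes" policy are illustrative choices that do not come from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

Frame = Sequence[float]  # toy stand-in: a frame is represented by its feature vector


def cosine(a: Frame, b: Frame) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_keyframes(
    frames: Sequence[Frame],
    aesthetic_score: Callable[[Frame], float],
    sim_threshold: float = 0.9,  # assumed value, for illustration only
    min_score: float = 0.5,      # assumed value, for illustration only
) -> List[Frame]:
    """Greedy selection: drop low-scoring frames and near-duplicates of kept frames."""
    kept: List[Frame] = []
    for frame in frames:
        if aesthetic_score(frame) < min_score:
            continue  # aesthetic preference filtering
        if all(cosine(frame, k) < sim_threshold for k in kept):
            kept.append(frame)  # semantically new content becomes a keyframe
    return kept


@dataclass
class MemoryBank:
    """Compact memory of keyframes; the recency-based eviction is an assumption."""
    capacity: int = 4
    frames: List[Frame] = field(default_factory=list)

    def update(self, keyframes: Sequence[Frame]) -> None:
        # Append new keyframes, then keep only the most recent `capacity` entries.
        self.frames = (self.frames + list(keyframes))[-self.capacity:]


def generate_story(
    prompts: Sequence[str],
    generate_shot: Callable[[str, List[Frame]], List[Frame]],
    aesthetic_score: Callable[[Frame], float],
    memory: Optional[MemoryBank] = None,
) -> Tuple[List[List[Frame]], MemoryBank]:
    """Generate shots one by one, updating the memory bank after each shot."""
    memory = memory if memory is not None else MemoryBank()
    shots: List[List[Frame]] = []
    for prompt in prompts:
        shot = generate_shot(prompt, list(memory.frames))  # memory-conditioned generation
        memory.update(select_keyframes(shot, aesthetic_score))
        shots.append(shot)
    return shots, memory
```

Passing a pre-filled `MemoryBank` to `generate_story` mirrors the idea of using reference images as initial memory for customized stories; in this sketch the reference frames would simply occupy the bank before the first shot is generated.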

✦ ✦ ✦

Video Results

Multi-shot Long Video Storytelling

Story of Lin Daiyu's First Spring in the Garden
A gentle, lyrical portrait of Lin Daiyu arriving at the Jia household, discovering the Grand View Garden, meeting Jia Baoyu, and finding solace in poetry and falling blossoms. Across simple, quiet moments, her grace, sensitivity, and bond with nature unfold in a soft, painterly atmosphere.
Scene 1, Shot 1: Wide shot
Scene 1, Shot 2: Medium shot
Scene 1, Shot 3: Close-up
Scene 2, Shot 4: Wide shot
Scene 2, Shot 5: Medium close-up
Scene 2, Shot 6: Close-up
Scene 3, Shot 7: Two-shot
Scene 3, Shot 8: Medium shot
Scene 3, Shot 9: Medium close-up
Scene 4, Shot 10: Wide shot
Scene 4, Shot 11: Medium close-up
Scene 4, Shot 12: Medium shot

Story of the Real Santa Claus
A heartfelt story revealing the quiet, genuine side of Santa Claus as he prepares for Christmas Eve. From his cozy workshop in the North Pole to delivering gifts under the starlit sky, we see Santa's care, warmth, and love for children around the world. The story captures the gentle magic of Christmas through intimate moments of preparation, wonder, and kindness.
Scene 1, Shot 1: Medium wide shot
Scene 1, Shot 2: Close-up
Scene 1, Shot 3: Medium shot
Scene 2, Shot 4: Wide shot
Scene 2, Shot 5: Medium close-up
Scene 2, Shot 6: Wide side shot
Scene 3, Shot 7: Wide cinematic shot
Scene 3, Shot 8: Medium shot
Scene 3, Shot 9: Medium close-up
Scene 4, Shot 10: Wide shot
Scene 4, Shot 11: Medium shot
Scene 4, Shot 12: Slow fade-out

Story of the Beautiful Princess and the Brave Prince
A gentle love story between a prince and a princess living happily in a peaceful kingdom. One day, the princess is captured by a fierce wolf in the forest. The prince courageously embarks on a journey to rescue her. Through courage and love, he defeats the wolf and saves the princess, and the two are reunited under the golden sunset, symbolizing love's triumph over fear.
Shot 1: Royal Garden Stroll
Shot 2: Flower Exchange
Shot 3: Fountain Scene
Shot 4: Wolf Attack
Shot 5: Princess Captured
Shot 6: Prince's Pursuit
Shot 7: Battle
Shot 8: Rescue
Shot 9: Reunion

Comparison

Story of the Little Robot
In a quiet futuristic city, a small curious robot awakens alone in a workshop. It explores the world outside for the first time, learning about nature and emotion. Along its journey, it meets a little girl who shows it kindness. Together, they discover the meaning of friendship in a world of steel and sunlight. The story is gentle, emotional, and full of wonder, emphasizing curiosity, connection, and warmth.
Wan2.2
Wan2.2 + StoryDiffusion
Wan2.2 + IC-LoRA
HoloCine
Sora2
Ours
Story of the Street Musician
A quiet woman finds comfort and clarity in a small café during an ordinary day. From sitting alone by the window with a warm cup of coffee, to observing subtle moments of life around her, and finally lingering as the café grows calm, the story captures solitude, reflection, and the gentle warmth of everyday stillness.
Wan2.2
Wan2.2 + StoryDiffusion
Wan2.2 + IC-LoRA
HoloCine
Sora2
Ours
Story of Neo's Awakening in the Matrix
This short video tells the iconic awakening of Neo — from his ordinary life in the simulated Matrix to the moment he begins to perceive the truth. The story follows Neo’s confusion, his encounter with Morpheus, and his shocking realization that the world around him is not real. The atmosphere gradually shifts from calm and mysterious to tense and surreal, echoing Neo’s transformation from an ordinary man to the chosen one.
Wan2.2
Wan2.2 + StoryDiffusion
Wan2.2 + IC-LoRA
HoloCine
Sora2
Ours

MR2V: Customized Storytelling

Story of a Clumsy Man
A quiet, ordinary man tries to prepare a simple meal in his small kitchen. Awkward and uncoordinated, he approaches each step with sincere determination, only to trigger a series of escalating mishaps. From unruly toast to a stubborn jar and a wildly overflowing blender, his well-meaning efforts turn the kitchen into a mess of gentle chaos. In the end, he abandons the plan and calmly enjoys a single biscuit, surrounded by the evidence of his failed ambition. The story is a lighthearted, dialogue-free visual comedy driven by physical humor and everyday clumsiness.
Reference Image 1
Reference as initial memory
Story of Young Lovers
During a long ocean voyage aboard a large passenger ship, a quiet young man and a reserved young woman gradually form a connection through shared moments and unspoken understanding. Their bond grows through stillness, wind, and fleeting joy, before the journey takes a sudden and dangerous turn. The story focuses on restrained emotion, human closeness, and a farewell shaped by circumstance rather than words.
Reference Image 1
Reference Image 2
Reference as initial memory
Story of a Woman in a Cafe
A quiet woman finds comfort and clarity in a small café during an ordinary day. The woman settles into a corner of the café, reading and working. Time seems to slow, capturing patience, craftsmanship, and the small sense of calm found in everyday rituals.
Reference Image 1
Reference Image 2
Reference as initial memory

More Results

Story of a Cute Cat
Story of the Little Mermaid
Story of a Man and His Dog
Story of Jane Eyre
Story of Robinson Crusoe
Story of a Magic Boy
Story of an Artist
Story of an AI PhD Student
Story of the Monkey King
✦ ✦ ✦

Citation

BibTeX
@article{zhang2025storymem,
  title={{StoryMem}: Multi-shot Long Video Storytelling with Memory},
  author={Zhang, Kaiwen and Jiang, Liming and Wang, Angtian and Fang, Jacob Zhiyuan and Zhi, Tiancheng and Yan, Qing and Kang, Hao and Lu, Xin and Pan, Xingang},
  journal={arXiv preprint},
  volume={arXiv:2512.19539},
  year={2025}
}