Motivation
In recent years, the rapid growth of online video-sharing platforms, along with applications such as surveillance and media archiving, has made the efficient analysis of large-scale video data increasingly important. Because watching full-length videos is often time-consuming and resource-intensive, video summarization has become an essential technology.
In this study, we focus on the unsupervised video summarization problem and propose a summarization framework that preserves both the temporal structure and semantic content of video data.
Background
Recently, unsupervised approaches built on Generative Adversarial Networks (GANs) have demonstrated strong potential by casting video summarization as a reconstruction problem. Modern techniques further enhance this process by replacing Long Short-Term Memory (LSTM) architectures with self-attention mechanisms in the frame selection stage, enabling more efficient capture of long-range temporal relationships among video frames.
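To make the frame-scoring stage concrete, the sketch below shows how self-attention can turn per-frame features into importance scores: each frame attends to every other frame, and the average attention a frame *receives* serves as its importance. This is an illustrative, numpy-only sketch, not the paper's implementation; the projection matrices `Wq` and `Wk` would be learned in practice and are random placeholders here.

```python
import numpy as np

def attention_importance(features, d_k=None, seed=0):
    """Score each frame by the average attention it receives from all
    other frames (a common proxy for frame importance).

    features: (T, D) array of per-frame feature vectors.
    Returns a (T,) importance vector that sums to 1.
    """
    T, D = features.shape
    d_k = d_k or D
    rng = np.random.default_rng(seed)
    # Placeholder projections; in a trained model these are learned.
    Wq = rng.standard_normal((D, d_k)) / np.sqrt(D)
    Wk = rng.standard_normal((D, d_k)) / np.sqrt(D)
    Q, K = features @ Wq, features @ Wk
    logits = Q @ K.T / np.sqrt(d_k)            # (T, T) pairwise scores
    logits -= logits.max(axis=-1, keepdims=True)
    A = np.exp(logits)
    A /= A.sum(axis=-1, keepdims=True)         # each row is a softmax
    importance = A.mean(axis=0)                # attention received per frame
    return importance / importance.sum()
```

Because attention compares all frame pairs directly, distant frames influence each other's scores in a single step, which is the long-range advantage over recurrent LSTM scoring mentioned above.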
However, importance scores produced by self-attention may exhibit high variance and temporal inconsistency: adjacent frames with nearly identical content can receive very different scores, which degrades the quality of frame selection.
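One simple way to quantify the temporal inconsistency described above is the mean absolute difference between consecutive scores; a jittery curve has a high value, a temporally coherent one a low value. The snippet below is a minimal diagnostic sketch (the moving-average smoother is only an illustrative baseline, not the refinement method proposed here):

```python
import numpy as np

def temporal_variation(scores):
    """Mean absolute difference between consecutive frame scores.
    High values indicate jittery, temporally inconsistent scores."""
    scores = np.asarray(scores, dtype=float)
    return np.abs(np.diff(scores)).mean()

def moving_average(scores, k=5):
    """Simple centred moving average, used only as a baseline smoother."""
    kernel = np.ones(k) / k
    return np.convolve(np.asarray(scores, dtype=float), kernel, mode="same")
```

Smoothing any noisy score curve lowers this measure, which is the effect the diffusion-based refinement below aims to achieve in a learned, content-aware way.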
Proposed Method
In this project, we propose a novel model that integrates a diffusion-based score refinement mechanism into the adversarial training loop. To this end, a Denoising Diffusion Probabilistic Model (DDPM) is employed to act as a regularizer on the outputs of the self-attention module.
During training, the diffusion process refines noisy attention scores to produce more stable and representative importance estimates. These refined scores guide the Variational Autoencoder (VAE) and GAN components to reconstruct the video using only the most informative segments.
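The refinement network itself is part of the paper under review, but the standard DDPM algebra it builds on can be sketched: the forward process corrupts a clean score vector in closed form, and a noise prediction lets the model recover an estimate of the clean scores. Here `eps_pred` stands in for the learned noise predictor, which is an assumption of this sketch.

```python
import numpy as np

def forward_noise(x0, alpha_bar_t, eps):
    """Closed-form DDPM forward process at step t:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def estimate_x0(xt, alpha_bar_t, eps_pred):
    """Invert the forward process given a noise prediction:
    x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps_pred) / sqrt(alpha_bar_t)."""
    return (xt - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
```

With a perfect noise prediction the inversion is exact; during training, a denoiser that predicts the noise well therefore yields stable, denoised importance estimates that can guide the VAE/GAN reconstruction.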
Experimental results demonstrate that the proposed method outperforms conventional GAN-based video summarization approaches in terms of capturing important video segments and preserving summary coherence.
Model Architecture
Publications and Source Codes
The paper describing this project is currently under review; the source code will be released once the review process concludes.