I am looking for a video scene detection algorithm implementation. Any programming language used for the implementation is acceptable. I found this implementation but it is very sensitive to small changes and inaccurate.
Video Scene Detection Implementation
Asked Answered
Computer vision is a complex area that's still the subject of a lot of research. There aren't any "perfect" solutions, but if you give us more information on your use case, we may be able to offer solutions that will perform better for you than general-purpose ones will. –
Proverbs
I am working in the research field too :) but I need a simple and effective shot detection algorithm that would segment a provided video into scene intervals. This is not a part of my research, but an added value. Anything can help as I can't find implementations that easy! –
Purpose
I'm not aware of any implementations that are 100% accurate in the general case. Keyframing in particular seems to trip up a lot of algorithms. If you want this sort of thing, you're probably going to have to roll your own and tweak it to your needs - a quick search on Google Scholar for "Scene Change Detection" brings up a number of relevant articles you might want to look at. –
Proverbs
@FearUs: I wrote the scene detection implementation you linked to above. It's nearly 100% accurate in the software domain I wrote it for and is used in several commercial products. I'd be interested in hearing about the types of videos where you ran into problems (perhaps seeing some sample data). –
Emaciated
Ashley Tate, can you please provide the implementation again as the link above no longer works. Thanks. –
Fenn
@Fenn it's available on archive.org here –
Apothecium
Finally I did an simple implementation, by comparing 2 consecutive frames:
- Resize current frame: For computation time + Memory
- Generate the Color Histogram for the resized frame
- Calculate the Euclidean Distance of the current frame with the previous frame's.
- If the Euclidean Distance is greater than a given threshold, then we have a scene change.
The threshold varies from a video to another ... but I got acceptable results.
TIL a color histogram is just the number of pixels for each color. thanks for the idea. –
Apothecium
Oh I am affraid I don't have it anymore. That was in 2011 :/ –
Purpose
another option is a managed video understanding embedding model. if split your video into N segments, then embed each segment capturing motion, context etc you can do semantic queries like "catching a ball" or "breaking the ice".
mixpeek has a model, vuse-generic-v1: https://learn.mixpeek.com/vuse-v1-release/
here's an example indexing pipeline:
obj = {
"embedding": mixpeek.embed.video(chunk, "mixpeek/vuse-generic-v1"),
"tags": mixpeek.extract.video(chunk, ""),
"description": mixpeek.extract.video(chunk, ""),
"file_url": file_url,
"parent_id": FileTools.uuid(),
"metadata": {
"time_start": chunk.start_time,
"time_end": chunk.end_time,
"duration": chunk.duration,
"fps": chunk.fps,
"video_codec": chunk.video_codec,
"audio_codec": chunk.audio_codec
}
}
and to query once you have the video embeddings indexed:
embedding = mixpeek.embed("breaking the ice", "vuse-generic-v1")
© 2022 - 2025 — McMap. All rights reserved.