Is there an elegant way to split a file by chapter using ffmpeg?

Asked 18/5, 2015 at 14:30 Answered 21/6, 2024 at 5:28

On this page, Albert Armea shares code to split videos by chapter using ffmpeg. The code is straightforward, but not very pretty.

ffmpeg -i "$SOURCE.$EXT" 2>&1 |
grep Chapter |
sed -E "s/ *Chapter #([0-9]+\.[0-9]+): start ([0-9]+\.[0-9]+), end ([0-9]+\.[0-9]+)/-i \"$SOURCE.$EXT\" -vcodec copy -acodec copy -ss \2 -to \3 \"$SOURCE-\1.$EXT\"/" |
xargs -n 11 ffmpeg

Is there an elegant way to do this job?

Clubbable answered 18/5, 2015 at 14:30 Comment(3)

I had to make a slight modification to get that working because my chapters had the word "Chapter" in the title: | grep '^\s*Chapter' | – Spense 18/11, 2017 at 18:10

I'd like to know how to do the opposite: concat files with chapter markers added for each file. – Helbonia 6/4, 2020 at 17:2

Looks like we have to script it. We need a shortcut to rip videos with chapters (such as .mkv files from YouTube) into multiple sound files. – Chapple 14/9, 2021 at 17:58

(Edit: This tip came from https://github.com/phiresky via this issue: https://github.com/harryjackson/ffmpeg_split/issues/2)

You can get chapters using:

ffprobe -i fname -print_format json -show_chapters -loglevel error

If I were writing this again, I'd use ffprobe's JSON options.
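As a rough illustration of that JSON route, the sketch below turns the text printed by the ffprobe command above into one ffmpeg argument list per chapter. The function names (`parse_chapters`, `build_split_command`) and the `-ch-<id>` output naming are assumptions made for this example, not part of the linked project; the ffmpeg options are the ones used elsewhere in this thread.

```python
import json
import os


def parse_chapters(ffprobe_json):
    """Return (id, start_time, end_time) tuples from `ffprobe -show_chapters` JSON text."""
    data = json.loads(ffprobe_json)
    return [
        (chapter["id"], chapter["start_time"], chapter["end_time"])
        for chapter in data.get("chapters", [])
    ]


def build_split_command(infile, chapter_id, start, end):
    """Build an ffmpeg argument list that copies one chapter without re-encoding."""
    base, ext = os.path.splitext(infile)
    # Hypothetical naming scheme: <input name>-ch-<chapter id><extension>
    outfile = "{}-ch-{}{}".format(base, chapter_id, ext)
    return [
        "ffmpeg", "-nostdin",
        "-ss", start, "-to", end,            # placed before -i, so ffmpeg seeks in the input
        "-i", infile,
        "-map", "0", "-map_chapters", "-1",  # keep all streams, drop chapter markers
        "-c", "copy",
        outfile,
    ]
```

To use it, capture the ffprobe output with `subprocess.check_output` and pass each argument list returned by `build_split_command` to `subprocess.check_call`.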

(Original answer follows)

This is a working Python script. I tested it on several videos and it worked well. Python isn't my first language, but I noticed you use it, so I figured writing it in Python might make more sense. I've added it to GitHub. If you want to improve it, please submit pull requests.

#!/usr/bin/env python
import os
import re
import subprocess as sp
from subprocess import *
from optparse import OptionParser

def parseChapters(filename):
  chapters = []
  command = [ "ffmpeg", '-i', filename]
  output = ""
  try:
    # ffmpeg requires an output file and so it errors 
    # when it does not get one so we need to capture stderr, 
    # not stdout.
    output = sp.check_output(command, stderr=sp.STDOUT, universal_newlines=True)
  except CalledProcessError as e:
    output = e.output

  for line in output.splitlines():
    m = re.match(r".*Chapter #(\d+:\d+): start (\d+\.\d+), end (\d+\.\d+).*", line)
    if m is not None:
      chapters.append({ "name": m.group(1), "start": m.group(2), "end": m.group(3)})
  return chapters

def getChapters():
  parser = OptionParser(usage="usage: %prog [options] filename", version="%prog 1.0")
  parser.add_option("-f", "--file",dest="infile", help="Input File", metavar="FILE")
  (options, args) = parser.parse_args()
  if not options.infile:
    parser.error('Filename required')
  chapters = parseChapters(options.infile)
  fbase, fext = os.path.splitext(options.infile)
  for chap in chapters:
    print("start:" + chap['start'])
    chap['outfile'] = fbase + "-ch-"+ chap['name'] + fext
    chap['origfile'] = options.infile
    print(chap['outfile'])
  return chapters

def convertChapters(chapters):
  for chap in chapters:
    print("start:" + chap['start'])
    print(chap)
    command = [
        "ffmpeg", '-i', chap['origfile'],
        '-vcodec', 'copy',
        '-acodec', 'copy',
        '-ss', chap['start'],
        '-to', chap['end'],
        chap['outfile']]
    try:
      # check_output raises CalledProcessError when ffmpeg exits with a
      # non-zero status; re-raise it with the captured output attached.
      sp.check_output(command, stderr=sp.STDOUT, universal_newlines=True)
    except CalledProcessError as e:
      raise RuntimeError("command '{}' returned with error (code {}): {}".format(e.cmd, e.returncode, e.output))

if __name__ == '__main__':
  chapters = getChapters()
  convertChapters(chapters)

Academic answered 20/4, 2016 at 6:11 Comment(7)

Here's another similar python script meant to parse m4b audio books by chapters. github.com/valekhz/m4b-converter – Carleycarli 20/4, 2016 at 8:30

I posted a modified version below that uses the chapter name as the filename. It's not elegant but it works :) – Covenanter 24/12, 2016 at 3:5

and a second one, written this one just now for AAX to MP3 chapterized conversion github.com/OndrejSkalicka/aax-to-mp3-python – Margemargeaux 19/2, 2018 at 21:32

Confirmed: It does work, and thank you for making it available! – Elkeelkhound 1/3, 2019 at 21:29

Great basis for what I need. I want to edit out some stuff by chapter name and then recombine them afterwards but I can see how to do that easy enough. – Bindery 4/8, 2019 at 19:19

@Crissov I rejected your edit by mistake. Can you please add it back so you get the credit for it, i.e. stackoverflow.com/review/suggested-edits/24746850 – Academic 5/12, 2019 at 7:55

The json from the ffprobe output has been invaluable recently. Admittedly, I have not taken advantage of the python script. Very helpful, thank you. – Nogood 9/1, 2022 at 18:51

A version of the original shell code with:

- improved efficiency by
  - using ffprobe instead of ffmpeg
  - splitting the input rather than the output
- improved reliability by avoiding xargs and sed
- improved readability by using multiple lines
- multiple audio or subtitle streams carried over
- chapters removed from the output files (as their timecodes would be invalid)
- simplified command-line arguments

#!/bin/sh -efu

input="$1"
ffprobe \
    -print_format csv \
    -show_chapters \
    "$input" |
cut -d ',' -f '5,7,8' |
while IFS=, read start end chapter
do
    ffmpeg \
        -nostdin \
        -ss "$start" -to "$end" \
        -i "$input" \
        -c copy \
        -map 0 \
        -map_chapters -1 \
        "${input%.*}-$chapter.${input##*.}"
done

To prevent it from interfering with the loop, ffmpeg is instructed not to read from stdin.

Euchology answered 30/11, 2018 at 8:35 Comment(4)

You can use -nostdin instead of </dev/null, -c copy instead of -vcodec copy -acodec copy -scodec copy, and -map 0 instead of -map 0:a -map 0:v -map 0:s. – Issy 8/11, 2019 at 19:0

I'd also move the line -ss ... before the line -i ..., otherwise ffmpeg builds the output file in order to seek rather than seeking directly in the input. This speeds up things immensely when you're also transcoding. Depending on what you're splitting you may not want to do this (I'm splitting and transcoding audio so seeking the input is fine). – Gelignite 17/7, 2020 at 16:46

@Issy @Gelignite great suggestions, thank you! If you have jq at hand, I'd actually recommend @SebMa's answer which appears to be based on mine, but much more future proof thanks to using ffprobe's JSON output. But I'll incorporate your tips anyway. – Euchology 18/7, 2020 at 17:46

This one puts the information for all remaining chapters into every file, i.e. chapters 1..23 in the first, 2..23 in the second, and so on. – Thanasi 27/2, 2021 at 20:18

ffmpeg -i "$SOURCE.$EXT" 2>&1 | # get metadata about the file
grep Chapter | # keep only the lines that mention Chapter
sed -E "s/ *Chapter #([0-9]+.[0-9]+): start ([0-9]+.[0-9]+), end ([0-9]+.[0-9]+)/-i \"$SOURCE.$EXT\" -vcodec copy -acodec copy -ss \2 -to \3 \"$SOURCE-\1.$EXT\"/" | # turn each chapter line into an ffmpeg argument list with explicit start and end timecodes
xargs -n 11 ffmpeg # run ffmpeg once per chapter, passing at most 11 arguments each time

Your command parses the file's metadata and reads out the timecode markers for each chapter. You could do this manually for each chapter:

ffmpeg -i ORIGINALFILE.mp4 -acodec copy -vcodec copy -ss 0 -t 00:15:00 OUTFILE-1.mp4

Or you can write out the chapter markers and run through them with this bash script, which is a little easier to read:

#!/bin/bash
# Author: http://crunchbang.org/forums/viewtopic.php?id=38748#p414992
# m4bronto

#     Chapter #0:0: start 0.000000, end 1290.013333
#       first   _     _     start    _     end

while [ $# -gt 0 ]; do

ffmpeg -i "$1" 2> tmp.txt

while read -r first _ _ start _ end; do
  if [[ $first = Chapter ]]; then
    read  # discard line with Metadata:
    read _ _ chapter

    ffmpeg -vsync 2 -i "$1" -ss "${start%?}" -to "$end" -vn -ar 44100 -ac 2 -ab 128k -f mp3 "$chapter.mp3" </dev/null

  fi
done <tmp.txt

rm tmp.txt

shift
done

Or you can use HandBrakeCLI, as originally mentioned in this post. This example extracts chapter 3 to 3.mkv:

HandBrakeCLI -c 3 -i originalfile.mkv -o 3.mkv

Another tool, mkvmerge, is mentioned in this post:

mkvmerge -o output.mkv --split chapters:all input.mkv

Carleycarli answered 19/4, 2016 at 3:29 Comment(2)

Upvote for mkvmerge. One liner to get all chapters that even works with windows 👍 – Liquorish 31/5, 2019 at 14:49

mkvmerge can also take input (maybe even output) in other formats. Used with m4v and m4b just fine. – Wenz 17/3, 2024 at 15:46

A slightly simpler approach than extracting the data with sed is to use ffprobe's JSON output with jq:

#!/usr/bin/env bash 
# For systems where "bash" is not in "/bin/"

set -efu

videoFile="$1"
ffprobe -hide_banner \
        "$videoFile" \
        -print_format json \
        -show_chapters \
        -loglevel error |
    jq -r '.chapters[] | [ .id, .start_time, .end_time | tostring ] | join(" ")' |
    while read chapter start end; do
        ffmpeg -nostdin \
               -ss "$start" -to "$end" \
               -i "$videoFile" \
               -map 0 \
               -map_chapters -1 \
               -c copy \
               -metadata title="$chapter" \
               "${videoFile%.*}-$chapter.${videoFile##*.}";
    done

I use the jq tostring function because chapters[].id is an integer.

Gurolinick answered 30/4, 2020 at 14:11 Comment(0)

This is the PowerShell version

$filePath = 'C:\InputVideo.mp4'

$file = Get-Item $filePath

$json = ConvertFrom-Json (ffprobe -i $filePath -print_format json -show_chapters -loglevel error | Out-String)

foreach($chapter in $json.chapters)
{
    # $file.Extension already includes the leading dot
    ffmpeg -loglevel error -i $filePath -c copy -ss $chapter.start_time -to $chapter.end_time "$($file.DirectoryName)\$($chapter.id)$($file.Extension)"
}

Understood answered 11/6, 2021 at 1:53 Comment(0)

I modified Harry's script to use the chapter name for the filename. It outputs into a new directory with the name of the input file (minus extension). It also prefixes each chapter name with "1 - ", "2 - ", etc in case there are chapters with the same name.

#!/usr/bin/env python
import os
import re
import pprint
import sys
import subprocess as sp
from os.path import basename
from subprocess import *
from optparse import OptionParser

def parseChapters(filename):
  chapters = []
  command = [ "ffmpeg", '-i', filename]
  output = ""
  m = None
  title = None
  chapter_match = None
  try:
    # ffmpeg requires an output file and so it errors
    # when it does not get one so we need to capture stderr,
    # not stdout.
    output = sp.check_output(command, stderr=sp.STDOUT, universal_newlines=True)
  except CalledProcessError as e:
    output = e.output

  num = 1
  m1 = None  # result of the chapter regex; defined before its first use

  for line in output.splitlines():
    x = re.match(r".*title.*: (.*)", line)
    print("x:")
    pprint.pprint(x)

    print("title:")
    pprint.pprint(title)

    if x is None:
      m1 = re.match(r".*Chapter #(\d+:\d+): start (\d+\.\d+), end (\d+\.\d+).*", line)
      title = None
    else:
      title = x.group(1)

    if m1 is not None:
      chapter_match = m1

    print("chapter_match:")
    pprint.pprint(chapter_match)

    if title is not None and chapter_match is not None:
      m = chapter_match
      pprint.pprint(title)
    else:
      m = None

    if m is not None:
      chapters.append({ "name": str(num) + " - " + title, "start": m.group(2), "end": m.group(3)})
      num += 1

  return chapters

def getChapters():
  parser = OptionParser(usage="usage: %prog [options] filename", version="%prog 1.0")
  parser.add_option("-f", "--file",dest="infile", help="Input File", metavar="FILE")
  (options, args) = parser.parse_args()
  if not options.infile:
    parser.error('Filename required')
  chapters = parseChapters(options.infile)
  fbase, fext = os.path.splitext(options.infile)
  path, file = os.path.split(options.infile)
  newdir, fext = os.path.splitext( basename(options.infile) )
  # os.path.join also copes with relative inputs, where path is empty
  outdir = os.path.join(path, newdir)

  os.mkdir(outdir)

  for chap in chapters:
    chap['name'] = chap['name'].replace('/',':')
    print("start:" + chap['start'])
    chap['outfile'] = os.path.join(outdir, re.sub("[^-a-zA-Z0-9_.():' ]+", '', chap['name']) + fext)
    chap['origfile'] = options.infile
    print(chap['outfile'])
  return chapters

def convertChapters(chapters):
  for chap in chapters:
    print("start:" + chap['start'])
    print(chap)
    command = [
        "ffmpeg", '-i', chap['origfile'],
        '-vcodec', 'copy',
        '-acodec', 'copy',
        '-ss', chap['start'],
        '-to', chap['end'],
        chap['outfile']]
    try:
      # check_output raises CalledProcessError when ffmpeg exits with a
      # non-zero status; re-raise it with the captured output attached.
      sp.check_output(command, stderr=sp.STDOUT, universal_newlines=True)
    except CalledProcessError as e:
      raise RuntimeError("command '{}' returned with error (code {}): {}".format(e.cmd, e.returncode, e.output))

if __name__ == '__main__':
  chapters = getChapters()
  convertChapters(chapters)

This took a good while to figure out since I'm definitely NOT a Python guy. It's also inelegant, as there were many hoops to jump through because it processes the metadata line by line. (I.e., the title and chapter data sit on separate lines of the metadata output, so they are matched in different loop iterations.)

But it works and it should save you a lot of time. It did for me!
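The naming scheme described in this answer (a running number followed by the cleaned-up chapter title) can also be sketched in isolation. This is a hypothetical helper, not code from the script: the name `chapter_filename`, the zero-padding, and the fallback title are assumptions, while the character whitelist and the `/` to `:` replacement are borrowed from the script above.

```python
import re


def chapter_filename(index, title, ext, total):
    """Return a name such as '03 - Intro.mp4' for chapter `index` (1-based) of `total`."""
    width = len(str(total))  # enough digits for the largest chapter number
    # Same clean-up as the script: '/' becomes ':', other unusual characters are dropped.
    safe = re.sub(r"[^-a-zA-Z0-9_.():' ]+", "", title.replace("/", ":")).strip()
    if not safe:
        safe = "chapter"  # assumed fallback when nothing usable is left
    return "{} - {}{}".format(str(index).zfill(width), safe, ext)
```

Zero-padding the counter keeps the files in chapter order under a plain alphabetical sort. Note that `:` is not a valid filename character on Windows, so the replacement would need adjusting there.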

Covenanter answered 24/12, 2016 at 3:5 Comment(6)

@JP. Glad to hear it! – Covenanter 23/2, 2017 at 16:27

This worked well once I ran ffmpeg -i independently, to determine the format of my file's metadata. I had to tinker with the regex since my chapters weren't of the format Chapter #dd:dd. It would be good to try and make your regex more robust :-) – Decrepit 6/4, 2017 at 1:25

Your way of determining the path only works when the input file is given as an absolute path. Otherwise the variable path is empty, so the output directory ends up inside the filesystem root, for example /test for the input file test.mp4. – Chigetai 12/2, 2018 at 12:26

thanks @clifgriffin, I liked your version and modified it to work in Python 3. I also cleaned up the imports and added leading zeroes to chapter number gist.github.com/showerbeer/97c1f31770572d05738cd2b74167f8a4 – Layman 12/10, 2018 at 11:0

I saved this as splitfilebychapter.sh. When I run from command line I issue splitfilebychapter.sh alargeaudiobook.mp3. It returns: splitfilebychapter.sh: error: Filename required. Is it looking for the name of an input file or output file? – Striation 26/3, 2019 at 13:45

Instead of posting the adjusted script, you may as well send a pull request since there is a link to the Github project. – Siphonophore 22/10, 2020 at 4:22

I was trying to split an .m4b audiobook myself the other day, and stumbled over this thread and others, but I couldn't find any examples using batch-script. I don't know python or bash, and I am no expert in batch at all, but I tried to read up on how one might do it, and came up with the following which seems to work.

This exports MP3 files, numbered by chapter, to the same path as the source file:

@echo off
setlocal enabledelayedexpansion
for /f "tokens=2,5,7,8 delims=," %%G in ('c:\ffmpeg\bin\ffprobe -i %1 -print_format csv -show_chapters -loglevel error  2^> nul') do (
   set padded=00%%G
   "c:\ffmpeg\bin\ffmpeg" -ss %%H -to %%I -i %1 -vn -c:a libmp3lame -b:a 32k -ac 1 -metadata title="%%J" -id3v2_version 3 -write_id3v1 1 -y "%~dpnx1-!padded:~-3!.mp3"
)

For your video file, I have changed it to the following to handle both video and audio data by straight copying. I don't have a video file with chapters, so I can't test it, but I hope it works.

@echo off
setlocal enabledelayedexpansion
for /f "tokens=2,5,7,8 delims=," %%G in ('c:\ffmpeg\bin\ffprobe -i %1 -print_format csv -show_chapters -loglevel error  2^> nul') do (
   set padded=00%%G
   "c:\ffmpeg\bin\ffmpeg" -ss %%H -to %%I -i %1 -c:v copy -c:a copy -metadata title="%%J" -y "%~dpnx1-!padded:~-3!.mkv"
)

Otero answered 15/3, 2021 at 20:1 Comment(3)

This is broken. -ss and -to should be AFTER -i, and %%J shouldn't be enclosed in quotes because it is already in quotes. Also, %%J contains a CR character (0x0D), which causes problems and needs to be stripped away. – Backbreaker 17/10, 2021 at 23:0

Also, because you are using -print_format csv, this breaks if the title contains new lines (and/or commas, possibly). – Backbreaker 18/10, 2021 at 0:2

@Backbreaker Not broken, just a bit dangerous. The order doesn't matter if there's a single -i. %%J does not automatically get quotes - what you refer to is probably that -print_format csv needs to emit quotes when the chapter title contains a comma. If so, cmd will cut off %%J at the embedded comma anyway, so there is no hope using cmd. If the chapter title just contains letters and spaces, "%%J" is the right way. However, you're right that it's easy to break with mysterious error messages and safer to omit token 8 and %%J altogether (just imagine a chapter title "& format c: &rem"). – Ambuscade 21/4, 2023 at 11:26
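Outside cmd, the quoting concerns raised in these comments can be sidestepped by letting a real CSV parser split the fields. A minimal Python sketch, assuming ffprobe's chapter rows have the layout `chapter,id,time_base,start,start_time,end,end_time,title` (consistent with the field numbers used in this thread) and that titles containing commas are wrapped in double quotes; the function name `parse_chapter_csv` is made up for the example.

```python
import csv
import io


def parse_chapter_csv(text):
    """Return (id, start_time, end_time, title) tuples from ffprobe CSV chapter output."""
    chapters = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 7 or row[0] != "chapter":
            continue  # skip blank or unrelated lines
        title = row[7] if len(row) > 7 else ""
        # strip() also removes a stray carriage return left over from CRLF output
        chapters.append((row[1], row[4], row[6], title.strip()))
    return chapters
```

Because the parser understands quoted fields, a title such as "Hello, world" stays in one piece instead of being cut at the comma.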

I wanted a few extra things like:

extracting the cover
using the chapter name as filename
prefixing a counter to the filename with leading zeros, so alphabetical ordering will work correctly in every software
making a playlist
modifying the metadata to include the chapter name
outputting all the files to a new directory based on metadata (year author - title)

Here's my script (I used the hint with ffprobe json output from Harry)

#!/bin/bash
input="input.aax"
EXT2="m4a"

json=$(ffprobe -activation_bytes secret -i "$input" -loglevel error -print_format json -show_format -show_chapters)
title=$(echo $json | jq -r ".format.tags.title")
count=$(echo $json | jq ".chapters | length")
target=$(echo $json | jq -r ".format.tags | .date + \" \" + .artist + \" - \" + .title")
mkdir "$target"

ffmpeg -activation_bytes secret -i "$input" -vframes 1 -f image2 "$target/cover.jpg"

echo "[playlist]
NumberOfEntries=$count" > "$target/0_Playlist.pls"

for i in $(seq -w 1 $count);
do
  j=$((10#$i))
  n=$(($j-1))
  start=$(echo $json | jq -r ".chapters[$n].start_time")
  end=$(echo $json | jq -r ".chapters[$n].end_time")
  name=$(echo $json | jq -r ".chapters[$n].tags.title")
  ffmpeg -activation_bytes secret -i "$input" -vn -acodec copy -map_chapters -1 -ss "$start" -to "$end" -metadata title="$title $name" "$target/$i $name.$EXT2"
  echo "File$j=$i $name.$EXT2" >> "$target/0_Playlist.pls"
done
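The playlist part of the script is independent of ffmpeg and can be sketched separately. The following Python snippet mirrors what the loop writes to `0_Playlist.pls` (a `[playlist]` header, `NumberOfEntries`, and one `FileN=` line per chapter); the function names and the sample titles in the usage note are assumptions for illustration only.

```python
def playlist_entries(titles, ext):
    """Return zero-padded file names such as '01 Intro.m4a', one per chapter title."""
    width = len(str(len(titles)))  # same padding idea as `seq -w`
    return [
        "{} {}.{}".format(str(number).zfill(width), title, ext)
        for number, title in enumerate(titles, start=1)
    ]


def build_pls(filenames):
    """Return the text of a minimal .pls playlist listing `filenames` in order."""
    lines = ["[playlist]", "NumberOfEntries={}".format(len(filenames))]
    for number, name in enumerate(filenames, start=1):
        lines.append("File{}={}".format(number, name))
    return "\n".join(lines) + "\n"
```

For example, `build_pls(playlist_entries(["Intro", "Outro"], "m4a"))` yields a playlist with two entries, `File1=1 Intro.m4a` and `File2=2 Outro.m4a`.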

Proctoscope answered 20/12, 2017 at 8:48 Comment(2)

You don't need the j variable. You can loop from 0 to $((count-1)) and have n=$i because jq understands indexes prefixed with zeroes (example: jq -r ".chapters[05]") – Gurolinick 30/4, 2020 at 14:0

It seems to remove the video, hardcodes the AAX secret, and is a little broken here and there. But I liked the playlist and filename/metadata stuff, so I posted a fixed-up version: gist.github.com/akostadinov/… – Renunciation 17/2, 2022 at 9:58

In Python:

#!/usr/bin/env python3

import sys
import os
import subprocess
import shlex

def split_video(pathToInputVideo):
  command="ffprobe -v quiet -print_format csv -show_chapters "
  args=shlex.split(command)
  args.append(pathToInputVideo)
  output = subprocess.check_output(args, stderr=subprocess.STDOUT, universal_newlines=True)

  cpt=0
  for line in iter(output.splitlines()):
    dec=line.split(",")
    st_time=dec[4]
    end_time=dec[6]
    name=dec[7]

    command="ffmpeg -i _VIDEO_ -ss _START_ -to _STOP_ -vcodec copy -acodec copy"
    args=shlex.split(command)
    args[args.index("_VIDEO_")]=pathToInputVideo
    args[args.index("_START_")]=st_time
    args[args.index("_STOP_")]=end_time

    filename=os.path.basename(pathToInputVideo)
    words=filename.split(".");
    l=len(words)
    ext=words[l-1]

    cpt+=1
    filename=" ".join(words[0:l-1])+" - "+str(cpt)+" - "+name+"."+ext

    args.append(filename)
    subprocess.call(args)

for video in sys.argv[1:]:
  split_video(video)

Energetics answered 9/4, 2019 at 19:15 Comment(1)

Thanks for this solution! I like it because it is cross-platform. It works for me on Windows, however, for non-ASCII characters support I had to add the character encoding explicitly: output = subprocess.check_output(args, stderr=subprocess.STDOUT, universal_newlines=True, encoding="UTF8") – Dunedin 23/6, 2024 at 8:35

Naive solution in NodeJS / JavaScript

const cp = require('child_process');

const probe = function (fpath, debug) {
      var self = this;
      return new Promise((resolve, reject) => {
        var loglevel = debug ? 'debug' : 'error';
        const args = [
          '-v', 'quiet',
          '-loglevel', loglevel,
          '-print_format', 'json',
          '-show_chapters',
          '-show_format',
          '-show_streams',
          '-i', fpath
        ];
        const opts = {
          cwd: self._options.tempDir
        };
        const cb = (error, stdout) => {
          if (error)
            return reject(error);
          try {
            const outputObj = JSON.parse(stdout);
            return resolve(outputObj);
          } catch (ex) {
            self.logger.error("probe failed %s", ex);
            return reject(ex);
          }
        };
        console.log(args)
        cp.execFile('ffprobe', args, opts, cb)
          .on('error', reject);
      });
    }//probe

The raw JSON output object contains a chapters array with the following structure:

{
    "chapters": [{
        "id": 0,
        "time_base": "1/1000",
        "start": 0,
        "start_time": "0.000000",
        "end": 145000,
        "end_time": "135.000000",
        "tags": {
            "title": "This is Chapter 1"
        }
    }]
}
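In this structure, `start` and `end` are integer timestamps counted in units of `time_base`, while `start_time` and `end_time` give the same instants in seconds. A small Python sketch of that relationship, with a made-up function name, assuming the chapter dictionary has already been loaded from the JSON:

```python
from fractions import Fraction


def chapter_seconds(chapter):
    """Convert a chapter's integer start/end ticks to seconds using its time_base."""
    time_base = Fraction(chapter["time_base"])  # e.g. "1/1000" -> Fraction(1, 1000)
    start = float(chapter["start"] * time_base)
    end = float(chapter["end"] * time_base)
    return start, end
```

This is mainly useful as a sanity check, or when only the integer fields are at hand; for building ffmpeg commands, the `start_time` and `end_time` strings can be passed to `-ss` and `-to` directly.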

Corydalis answered 12/5, 2021 at 11:39 Comment(0)

Tweaked this answer to make output video names as '[count]-[chapter].xyz'

input="$1"
count=0
ffprobe \
    -print_format csv \
    -show_chapters \
    "$input" |
cut -d ',' -f '5,7,8' |
while IFS=, read start end chapter
do
    ffmpeg \
        -nostdin \
        -ss "$start" -to "$end" \
        -i "$input" \
        -c copy \
        -map 0 \
        -map_chapters -1 \
        "${count}-$chapter.${input##*.}"
    count=$((count + 1))
done

By answered 5/1, 2023 at 23:39 Comment(0)

python3 json variant

#!/usr/bin/python3
import sys,os,subprocess,json

def get_chapters(inp):
    result=subprocess.run(["ffprobe", "-v", "16", "-show_chapters", "-of", "json", inp],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT)
    # print(result.stdout.decode("utf-8"))
    chp=json.loads(result.stdout.decode("utf-8"))
    # print(json.dumps(chp,indent=2))
    if "chapters" in chp:
        if len(chp["chapters"])>0:
            print(inp)
            fno=os.path.splitext(os.path.basename(inp))[0]
            ext=os.path.splitext(inp)[1]
            for ch in chp["chapters"]:
                out=f'/tmp/{fno} - {ch["tags"]["title"]}{ext}'
                print(out)
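                # Pass an argument list so quotes in file names or titles cannot break the command
                subprocess.run(["ffmpeg", "-v", "16", "-ss", ch["start_time"], "-to", ch["end_time"], "-i", inp, "-map", "0", "-c", "copy", out])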

if len(sys.argv)==1:
    path="."
    # path="./videos"
    for f in os.listdir(path):
        if os.path.splitext(f)[1].lower() in [".webm", ".mkv"]:
            get_chapters(os.path.join(path,f))
else:
    for i in range(1,len(sys.argv)):
        get_chapters(sys.argv[i])

Saluki answered 21/6, 2024 at 5:28 Comment(0)
