How does one encode a series of images into H264 using the x264 C API?

Asked 30/5, 2010 at 23:4 Answered 4/4, 2016 at 14:39

How does one use the x264 C API to encode RGB images into H264 frames? I have already created a sequence of RGB images; how can I now transform that sequence into a sequence of H264 frames? In particular, how do I encode this sequence of RGB images into a sequence of H264 frames consisting of a single initial H264 keyframe followed by dependent H264 frames?

Verisimilitude answered 30/5, 2010 at 23:4 Comment(0)

First of all, check the x264.h file: it contains more or less the reference for each function and structure. The x264.c file in the download contains a sample implementation. Most people say to base yourself on that one, but I find it rather complex for beginners; it is, however, a good example to fall back on.

First you set up some parameters, of type x264_param_t. A good site describing the parameters is http://mewiki.project357.com/wiki/X264_Settings . Also take a look at the x264_param_default_preset function, which lets you target some functionality without needing to understand all of the (sometimes quite complex) parameters. Call x264_param_apply_profile afterwards (you'll probably want the "baseline" profile).

This is some example setup from my code:

x264_param_t param;
x264_param_default_preset(&param, "veryfast", "zerolatency");
param.i_threads = 1;
param.i_width = width;
param.i_height = height;
param.i_fps_num = fps;
param.i_fps_den = 1;
// Intra refresh:
param.i_keyint_max = fps;
param.b_intra_refresh = 1;
//Rate control:
param.rc.i_rc_method = X264_RC_CRF;
param.rc.f_rf_constant = 25;
param.rc.f_rf_constant_max = 35;
//For streaming:
param.b_repeat_headers = 1;
param.b_annexb = 1;
x264_param_apply_profile(&param, "baseline");

After this you can initialize the encoder as follows:

x264_t* encoder = x264_encoder_open(&param);
x264_picture_t pic_in, pic_out;
x264_picture_alloc(&pic_in, X264_CSP_I420, w, h);

x264 expects YUV420P data (I guess some other formats too, but that's the common one). You can use libswscale (from ffmpeg) to convert images to the right format. Initialize it like this (I assume RGB data at 24 bpp):

struct SwsContext* convertCtx = sws_getContext(in_w, in_h, PIX_FMT_RGB24, out_w, out_h, PIX_FMT_YUV420P, SWS_FAST_BILINEAR, NULL, NULL, NULL);
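
If libswscale is not available, the planes of pic_in can also be filled by hand. The following is a minimal sketch, not the method used in this answer: rgb24_to_i420 is a hypothetical helper that assumes packed RGB24 input with even width and height, and it uses the common integer approximation of the BT.601 limited-range conversion. With x264, the destination pointers would presumably be pic_in.img.plane[0..2] and the strides pic_in.img.i_stride[0..2].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of x264 or libswscale).
   Converts packed RGB24 to planar I420 (YUV 4:2:0), BT.601 limited range.
   Assumes width and height are even and rgb has no row padding. */
static void rgb24_to_i420(const uint8_t *rgb, int width, int height,
                          uint8_t *y_plane, int y_stride,
                          uint8_t *u_plane, int u_stride,
                          uint8_t *v_plane, int v_stride)
{
    assert(width % 2 == 0 && height % 2 == 0);

    /* Luma: one sample per pixel. */
    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            const uint8_t *p = rgb + 3 * (row * width + col);
            int y = ((66 * p[0] + 129 * p[1] + 25 * p[2] + 128) >> 8) + 16;
            y_plane[row * y_stride + col] = (uint8_t)y;
        }
    }

    /* Chroma: one sample per 2x2 block, computed from the block average. */
    for (int row = 0; row < height; row += 2) {
        for (int col = 0; col < width; col += 2) {
            int r = 0, g = 0, b = 0;
            for (int dy = 0; dy < 2; dy++) {
                for (int dx = 0; dx < 2; dx++) {
                    const uint8_t *p =
                        rgb + 3 * ((row + dy) * width + (col + dx));
                    r += p[0];
                    g += p[1];
                    b += p[2];
                }
            }
            r = (r + 2) / 4;
            g = (g + 2) / 4;
            b = (b + 2) / 4;
            /* 32768 = 128 << 8: adds the 128 offset before the shift so
               the shifted value is never negative. */
            int u = (-38 * r - 74 * g + 112 * b + 128 + 32768) >> 8;
            int v = (112 * r - 94 * g - 18 * b + 128 + 32768) >> 8;
            u_plane[(row / 2) * u_stride + col / 2] = (uint8_t)u;
            v_plane[(row / 2) * v_stride + col / 2] = (uint8_t)v;
        }
    }
}
```

For a pure white input this yields Y = 235 and U = V = 128, and for pure red Y = 82, U = 90, V = 240, which are the usual limited-range values.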

Encoding is then as simple as this; for each frame do:

// data is a pointer to your RGB buffer
int srcstride = w*3; // RGB stride is just 3*width
// sws_scale writes the converted YUV planes into pic_in
sws_scale(convertCtx, &data, &srcstride, 0, h, pic_in.img.plane, pic_in.img.i_stride);
x264_nal_t* nals;
int i_nals;
int frame_size = x264_encoder_encode(encoder, &nals, &i_nals, &pic_in, &pic_out);
if (frame_size >= 0)
{
    // OK
}
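
To check that only the first frame comes out as a keyframe, one option is to look at the NAL unit types in the encoder output. The following is a minimal sketch under stated assumptions: the output is in Annex B format (b_annexb = 1 as in the setup above), and the bytes of one encoded frame are the frame_size bytes starting at nals[0].p_payload. count_nal_units and the H264_NAL_* names are hypothetical and not part of the x264 API; the type values come from the H.264 NAL unit header, whose low 5 bits hold nal_unit_type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for a few H.264 nal_unit_type values. */
enum {
    H264_NAL_SLICE = 1, /* non-IDR slice (P or B frame data) */
    H264_NAL_IDR   = 5, /* IDR slice (keyframe) */
    H264_NAL_SPS   = 7, /* sequence parameter set */
    H264_NAL_PPS   = 8  /* picture parameter set */
};

/* Hypothetical helper: counts the NAL units of type nal_type in an
   Annex B byte stream, where each NAL unit is preceded by a start code
   00 00 01 (a 4-byte 00 00 00 01 start code ends with the same bytes). */
static int count_nal_units(const uint8_t *buf, size_t len, int nal_type)
{
    int count = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 3 < len; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            /* The byte after the start code is the NAL header. */
            if ((buf[i + 3] & 0x1F) == nal_type)
                count++;
            i += 2; /* continue scanning at the NAL header byte */
        }
    }
    return count;
}
```

Inside the encode loop this could be called as count_nal_units(nals[0].p_payload, (size_t)frame_size, H264_NAL_IDR) when frame_size > 0; a non-zero result would mean the frame contains an IDR slice.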

I hope this will get you going ;), I spent a long time on it myself to get started. X264 is an insanely strong but sometimes complex piece of software.

Edit: with other parameters there will be delayed frames; this is not the case with my parameters (mostly due to the zerolatency tune). When frames are delayed, frame_size will sometimes be zero, and after the last input frame you'll have to keep calling x264_encoder_encode (passing NULL as the input picture) as long as x264_encoder_delayed_frames does not return 0. For this functionality you should take a deeper look at x264.c and x264.h.

Diphtheria answered 30/5, 2010 at 23:18 Comment(6)

This is very helpful (+1). The Python community really needs a wrapper that abstracts away some of this C-style code. – Santinasantini 28/1, 2012 at 17:11

Is there an easy way to stream this to a regular media client, say an XBMC or to wrap it as an AVI stream? – Bernadette 4/4, 2012 at 13:28

You could write a DirectShow source filter. AVI isn't the best container choice for H.264, see en.wikipedia.org/wiki/Comparison_of_container_formats – Avesta 24/8, 2012 at 9:28

x264_encoder_encode does nothing with data, which is the RGB structure. What is it actually encoding? – Koph 30/8, 2017 at 12:51

pic_in.img.stride seems to be renamed to pic_in.img.i_stride. In addition, the resulting frame-data can be found in nals->p_payload – Groh 2/1, 2018 at 17:22

So, assuming that we don't need to do sws_getContext or sws_scale because our input pixels/bytes/data is already in an example color format, what would we do instead to get data into pic_in? – Paletot 28/12, 2021 at 16:38

I've uploaded an example which generates raw yuv frames and then encodes them using x264. Full code can be found here: https://gist.github.com/roxlu/6453908

Tancred answered 5/9, 2013 at 18:13 Comment(1)

You could add a summary of your solution here so it lives on past your link's lifetime – Pueblo 5/9, 2013 at 18:31

FFmpeg 2.8.6 C runnable example

Using FFmpeg as a wrapper for x264 is a good idea, as it exposes a uniform API for multiple encoders. So if you ever need to change formats, you can change just one parameter instead of learning a new API.

The example synthesizes and encodes some colorful frames generated by generate_rgb.

Control of frame type (I, P, B) to have as few key-frames as possible (ideally just the first) is discussed here: https://mcmap.net/q/281838/-how-to-write-a-video-encoder-with-ffmpeg-that-explicitly-controls-the-position-of-keyframes As mentioned there, I do not recommend it for most applications.

The key lines that control the frame type here are:

/* Minimal distance of I-frames. This is the maximum value allowed,
or else we get a warning at runtime. */
c->keyint_min = 600;

and:

if (frame->pts == 1) {
    frame->key_frame = 1;
    frame->pict_type = AV_PICTURE_TYPE_I;
} else {
    frame->key_frame = 0;
    frame->pict_type = AV_PICTURE_TYPE_P;
}

We can then verify the frame type with:

ffprobe -select_streams v \
    -show_frames \
    -show_entries frame=pict_type \
    -of csv \
    tmp.h264

as mentioned at: https://superuser.com/questions/885452/extracting-the-index-of-key-frames-from-a-video-using-ffmpeg

Preview of generated output.

main.c

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include <libavcodec/avcodec.h>
#include <libavutil/imgutils.h>
#include <libavutil/opt.h>
#include <libswscale/swscale.h>

static AVCodecContext *c = NULL;
static AVFrame *frame;
static AVPacket pkt;
static FILE *file;
struct SwsContext *sws_context = NULL;

static void ffmpeg_encoder_set_frame_yuv_from_rgb(uint8_t *rgb) {
    const int in_linesize[1] = { 3 * c->width };
    sws_context = sws_getCachedContext(sws_context,
            c->width, c->height, AV_PIX_FMT_RGB24,
            c->width, c->height, AV_PIX_FMT_YUV420P,
            0, 0, 0, 0);
    sws_scale(sws_context, (const uint8_t * const *)&rgb, in_linesize, 0,
            c->height, frame->data, frame->linesize);
}

uint8_t* generate_rgb(int width, int height, int pts, uint8_t *rgb) {
    int x, y, cur;
    rgb = realloc(rgb, 3 * sizeof(uint8_t) * height * width);
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            cur = 3 * (y * width + x);
            rgb[cur + 0] = 0;
            rgb[cur + 1] = 0;
            rgb[cur + 2] = 0;
            if ((pts / 25) % 2 == 0) {
                if (y < height / 2) {
                    if (x < width / 2) {
                        /* Black. */
                    } else {
                        rgb[cur + 0] = 255;
                    }
                } else {
                    if (x < width / 2) {
                        rgb[cur + 1] = 255;
                    } else {
                        rgb[cur + 2] = 255;
                    }
                }
            } else {
                if (y < height / 2) {
                    rgb[cur + 0] = 255;
                    if (x < width / 2) {
                        rgb[cur + 1] = 255;
                    } else {
                        rgb[cur + 2] = 255;
                    }
                } else {
                    if (x < width / 2) {
                        rgb[cur + 1] = 255;
                        rgb[cur + 2] = 255;
                    } else {
                        rgb[cur + 0] = 255;
                        rgb[cur + 1] = 255;
                        rgb[cur + 2] = 255;
                    }
                }
            }
        }
    }
    return rgb;
}

/* Allocate resources and write header data to the output file. */
void ffmpeg_encoder_start(const char *filename, int codec_id, int fps, int width, int height) {
    AVCodec *codec;
    int ret;

    codec = avcodec_find_encoder(codec_id);
    if (!codec) {
        fprintf(stderr, "Codec not found\n");
        exit(1);
    }
    c = avcodec_alloc_context3(codec);
    if (!c) {
        fprintf(stderr, "Could not allocate video codec context\n");
        exit(1);
    }
    c->bit_rate = 400000;
    c->width = width;
    c->height = height;
    c->time_base.num = 1;
    c->time_base.den = fps;
    c->keyint_min = 600;
    c->pix_fmt = AV_PIX_FMT_YUV420P;
    if (codec_id == AV_CODEC_ID_H264)
        av_opt_set(c->priv_data, "preset", "slow", 0);
    if (avcodec_open2(c, codec, NULL) < 0) {
        fprintf(stderr, "Could not open codec\n");
        exit(1);
    }
    file = fopen(filename, "wb");
    if (!file) {
        fprintf(stderr, "Could not open %s\n", filename);
        exit(1);
    }
    frame = av_frame_alloc();
    if (!frame) {
        fprintf(stderr, "Could not allocate video frame\n");
        exit(1);
    }
    frame->format = c->pix_fmt;
    frame->width  = c->width;
    frame->height = c->height;
    ret = av_image_alloc(frame->data, frame->linesize, c->width, c->height, c->pix_fmt, 32);
    if (ret < 0) {
        fprintf(stderr, "Could not allocate raw picture buffer\n");
        exit(1);
    }
}

/*
Write trailing data to the output file
and free resources allocated by ffmpeg_encoder_start.
*/
void ffmpeg_encoder_finish(void) {
    uint8_t endcode[] = { 0, 0, 1, 0xb7 };
    int got_output, ret;
    do {
        fflush(stdout);
        ret = avcodec_encode_video2(c, &pkt, NULL, &got_output);
        if (ret < 0) {
            fprintf(stderr, "Error encoding frame\n");
            exit(1);
        }
        if (got_output) {
            fwrite(pkt.data, 1, pkt.size, file);
            av_packet_unref(&pkt);
        }
    } while (got_output);
    fwrite(endcode, 1, sizeof(endcode), file);
    fclose(file);
    avcodec_close(c);
    av_free(c);
    av_freep(&frame->data[0]);
    av_frame_free(&frame);
}

/*
Encode one frame from an RGB24 input and save it to the output file.
Must be called after ffmpeg_encoder_start, and ffmpeg_encoder_finish
must be called after the last call to this function.
*/
void ffmpeg_encoder_encode_frame(uint8_t *rgb) {
    int ret, got_output;
    ffmpeg_encoder_set_frame_yuv_from_rgb(rgb);
    av_init_packet(&pkt);
    pkt.data = NULL;
    pkt.size = 0;
    if (frame->pts == 1) {
        frame->key_frame = 1;
        frame->pict_type = AV_PICTURE_TYPE_I;
    } else {
        frame->key_frame = 0;
        frame->pict_type = AV_PICTURE_TYPE_P;
    }
    ret = avcodec_encode_video2(c, &pkt, frame, &got_output);
    if (ret < 0) {
        fprintf(stderr, "Error encoding frame\n");
        exit(1);
    }
    if (got_output) {
        fwrite(pkt.data, 1, pkt.size, file);
        av_packet_unref(&pkt);
    }
}

/* Represents the main loop of an application which generates one frame per loop. */
static void encode_example(const char *filename, int codec_id) {
    int pts;
    int width = 320;
    int height = 240;
    uint8_t *rgb = NULL;
    ffmpeg_encoder_start(filename, codec_id, 25, width, height);
    for (pts = 0; pts < 100; pts++) {
        frame->pts = pts;
        rgb = generate_rgb(width, height, pts, rgb);
        ffmpeg_encoder_encode_frame(rgb);
    }
    ffmpeg_encoder_finish();
}

int main(void) {
    avcodec_register_all();
    encode_example("tmp.h264", AV_CODEC_ID_H264);
    encode_example("tmp.mpg", AV_CODEC_ID_MPEG1VIDEO);
    return 0;
}

Compile and run with:

gcc -o main.out -std=c99 -Wextra main.c -lavcodec -lswscale -lavutil
./main.out
ffplay tmp.mpg
ffplay tmp.h264

Tested on Ubuntu 16.04. GitHub upstream.

Vicinal answered 4/4, 2016 at 14:39 Comment(9)

Downvoters please explain so I can learn and improve content :-) – Vicinal 29/3, 2017 at 6:52

This requires nvcuda.dll and its dependencies. Could not get it to run. – Reins 11/2, 2018 at 2:13

@Reins thanks for the report. Let me know if you find out how to install it. I could only test in Ubuntu. – Vicinal 12/2, 2018 at 13:21

@CiroSantilli新疆再教育营六四事件法轮功郝海东 the output file is not recognized by Quick Time. Do you know what could be happening? – Clasping 22/6, 2021 at 8:57

@NunoSantos sorry, I don't :-) Which ffmpeg/Quick Time versions BTW? Any logs? – Vicinal 22/6, 2021 at 9:22

I'm using the latest version of ffmpeg installed with brew. Your code is deprecated as I had to use avcodec_send_frame / avcodec_receive_packet instead of avcodec_encode_video2. Maybe that has some implications I'm not aware of. This is what I have now -> pastebin.com/BbVJpPcS – Clasping 23/6, 2021 at 10:17

@CiroSantilli新疆再教育营六四事件法轮功郝海东 I believe the solution lies in this -> github.com/FFmpeg/FFmpeg/blob/master/doc/examples/muxing.c – Clasping 23/6, 2021 at 13:53

@NunoSantos thanks for the follow up. Let me know if you manage to fix my example, we can update this answer, or you can add a new one if you want. – Vicinal 23/6, 2021 at 14:2

@CiroSantilli新疆再教育营六四事件法轮功郝海东 the example is correct. There is a sequence of frames in x264 format being encoded. However, if we want to turn this into an mp4 file readable by movie players, we need to perform an additional step, which is muxing. It is a bit more complex as it involves using more libs. The reference is definitely here -> github.com/FFmpeg/FFmpeg/blob/master/doc/examples/muxing.c – Clasping 24/6, 2021 at 15:12
