Custom byteArray data to WebRTC videoTrack

Asked 17/7, 2017 at 23:37 Answered 27/7, 2017 at 10:0

Solved android android-camera webrtc android-vision apprtcdemo

I need to use WebRTC for Android to send a specific cropped (face) video to the video channel. I was able to manipulate the Camera1Session class of WebRTC to get the face cropped. Right now I am setting it to an ImageView. This is listenForBytebufferFrames() of Camera1Session.java:

private void listenForBytebufferFrames() {
    this.camera.setPreviewCallbackWithBuffer(new PreviewCallback() {
        public void onPreviewFrame(byte[] data, Camera callbackCamera) {
            Camera1Session.this.checkIsOnCameraThread();
            if(callbackCamera != Camera1Session.this.camera) {
                Logging.e("Camera1Session", "Callback from a different camera. This should never happen.");
            } else if(Camera1Session.this.state != Camera1Session.SessionState.RUNNING) {
                Logging.d("Camera1Session", "Bytebuffer frame captured but camera is no longer running.");
            } else {
                mFrameProcessor.setNextFrame(data, callbackCamera);
                long captureTimeNs = TimeUnit.MILLISECONDS.toNanos(SystemClock.elapsedRealtime());
                if(!Camera1Session.this.firstFrameReported) {
                    int startTimeMs = (int)TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - Camera1Session.this.constructionTimeNs);
                    Camera1Session.camera1StartTimeMsHistogram.addSample(startTimeMs);
                    Camera1Session.this.firstFrameReported = true;
                }

                ByteBuffer byteBuffer1 = ByteBuffer.wrap(data);
                Frame outputFrame = new Frame.Builder()
                        .setImageData(byteBuffer1,
                                Camera1Session.this.captureFormat.width,
                                Camera1Session.this.captureFormat.height,
                                ImageFormat.NV21)
                        .setTimestampMillis(mFrameProcessor.mPendingTimeMillis)
                        .setId(mFrameProcessor.mPendingFrameId)
                        .setRotation(3)
                        .build();
                int w = outputFrame.getMetadata().getWidth();
                int h = outputFrame.getMetadata().getHeight();
                SparseArray<Face> detectedFaces = mDetector.detect(outputFrame);
                if (detectedFaces.size() > 0) {

                    Face face = detectedFaces.valueAt(0);
                    ByteBuffer byteBufferRaw = outputFrame.getGrayscaleImageData();
                    byte[] byteBuffer = byteBufferRaw.array();
                    YuvImage yuvimage  = new YuvImage(byteBuffer, ImageFormat.NV21, w, h, null);
                    ByteArrayOutputStream baos = new ByteArrayOutputStream();

                    //My crop logic to get face co-ordinates

                    yuvimage.compressToJpeg(new Rect(left, top, right, bottom), 80, baos);
                    final byte[] jpegArray = baos.toByteArray();
                    Bitmap bitmap = BitmapFactory.decodeByteArray(jpegArray, 0, jpegArray.length);

                    Activity currentActivity = getActivity();
                    if (currentActivity instanceof CallActivity) {
                        ((CallActivity) currentActivity).setBitmapToImageView(bitmap); //face on ImageView is set just fine
                    }
                    Camera1Session.this.events.onByteBufferFrameCaptured(Camera1Session.this, data, Camera1Session.this.captureFormat.width, Camera1Session.this.captureFormat.height, Camera1Session.this.getFrameOrientation(), captureTimeNs);
                    Camera1Session.this.camera.addCallbackBuffer(data);
                } else {
                    Camera1Session.this.events.onByteBufferFrameCaptured(Camera1Session.this, data, Camera1Session.this.captureFormat.width, Camera1Session.this.captureFormat.height, Camera1Session.this.getFrameOrientation(), captureTimeNs);
                    Camera1Session.this.camera.addCallbackBuffer(data);
                }

            }
        }
    });
}

jpegArray is the final byteArray that I need to stream via WebRTC, which I tried with something like this:

Camera1Session.this.events.onByteBufferFrameCaptured(Camera1Session.this, jpegArray, (int) face.getWidth(), (int) face.getHeight(), Camera1Session.this.getFrameOrientation(), captureTimeNs);
Camera1Session.this.camera.addCallbackBuffer(jpegArray);

Setting them up like this gives me the following error:

../../webrtc/sdk/android/src/jni/androidvideotracksource.cc line 82
Check failed: length >= width * height + 2 * uv_width * ((height + 1) / 2) (2630 vs. 460800)

I assume this is because androidvideotracksource does not get the byte array length it expects, since the frame is now cropped. Could someone point me in the direction of how to achieve this? Is this the correct way/place to manipulate the data and feed it into the videoTrack?
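
For reference, the size check quoted in the error can be reproduced in a few lines. This is only a sketch: `Nv21FrameSize` and `requiredLength` are hypothetical names, not part of WebRTC, and the chroma width is assumed to be `(width + 1) / 2`. A 640x480 capture format is also an assumption; it is simply the size that matches the 460800 in the log.

```java
// Hypothetical helper, not part of WebRTC: mirrors the length check from the error message.
final class Nv21FrameSize {

    /** Minimum number of bytes a frame of the given size must contain to pass the check. */
    static int requiredLength(int width, int height) {
        int uvWidth = (width + 1) / 2;   // assumed definition of uv_width
        int uvHeight = (height + 1) / 2; // taken from the check: (height + 1) / 2
        return width * height + 2 * uvWidth * uvHeight;
    }

    public static void main(String[] args) {
        // 640x480 is an assumed capture format that matches the log above.
        System.out.println(requiredLength(640, 480)); // prints 460800
    }
}
```

A JPEG of the cropped face (2630 bytes here) can never satisfy this, because the native side expects an uncompressed frame of the declared width and height.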

Edit: a bitmap created from byte[] data does not give me a camera preview on the ImageView, unlike one created from byte[] jpegArray. Maybe because they are packed differently?

Hanahanae answered 17/7, 2017 at 23:37 Comment(9)

Re: bitmap of byteArray data does not give me a camera preview on ImageView - how do you create a bitmap from NV21 data? – Incitement 25/7, 2017 at 8:33

yuvimage.compressToJpeg(new Rect(left, top, right, bottom), 80, baos); does that to byteArray. I get a bitmap from decodeByteArray – Hanahanae 25/7, 2017 at 15:37

So, ((CallActivity) currentActivity).setBitmapToImageView(bitmap) does not work as expected, but ((CallActivity) currentActivity).setBitmapToImageView(jpegArray) works? – Incitement 25/7, 2017 at 15:59

Creating a bitmap from byte[] data and setting it to the ImageView did not work, but creating it from byte[] jpegArray did work. Anyway, I have posted my answer with the fix. In addition, I scaled to the expected dimensions as you pointed out. I could not make I420Frame work, however. – Hanahanae 27/7, 2017 at 10:23

Here is the way to convert NV21 to bitmap: https://mcmap.net/q/1012210/-yuv-nv21-image-converting-to-bitmap-duplicate. – Incitement 27/7, 2017 at 11:6

Right now my conversion takes 12-15 ms. I think it'll not make a significant difference? – Hanahanae 27/7, 2017 at 11:15

Going through Jpeg it takes 12 ms? Actually, you should not do all this image processing on the UI thread, to begin with. – Incitement 27/7, 2017 at 11:23

Just checked this. Going through jpeg takes 5-10ms, and scale() + getNV21() takes me 50-70ms. None of these happen on UI thread. I go back to UI thread only inside setBitmapToImageView(bitmap); – Hanahanae 27/7, 2017 at 11:42

50-70ms could be improved with renderscript – Incitement 27/7, 2017 at 16:2

Okay, this was definitely a problem of how the original byte[] data was packed versus how byte[] jpegArray was packed. Changing the packing and scaling the image as AlexCohn suggested worked for me. I found help in another post on StackOverflow on how to pack it. This is the code for it:

private byte[] getNV21(int left, int top, int inputWidth, int inputHeight, Bitmap scaled) {
    int[] argb = new int[inputWidth * inputHeight];
    scaled.getPixels(argb, 0, inputWidth, left, top, inputWidth, inputHeight);
    byte [] yuv = new byte[inputWidth*inputHeight*3/2];
    encodeYUV420SP(yuv, argb, inputWidth, inputHeight);
    scaled.recycle();
    return yuv;
}

private void encodeYUV420SP(byte[] yuv420sp, int[] argb, int width, int height) {
    final int frameSize = width * height;

    int yIndex = 0;
    int uvIndex = frameSize;

    int a, R, G, B, Y, U, V;
    int index = 0;
    for (int j = 0; j < height; j++) {
        for (int i = 0; i < width; i++) {

            a = (argb[index] & 0xff000000) >> 24; // a is not used obviously
            R = (argb[index] & 0xff0000) >> 16;
            G = (argb[index] & 0xff00) >> 8;
            B = (argb[index] & 0xff) >> 0;

            // well known RGB to YUV algorithm
            Y = ( (  66 * R + 129 * G +  25 * B + 128) >> 8) +  16;
            U = ( ( -38 * R -  74 * G + 112 * B + 128) >> 8) + 128;
            V = ( ( 112 * R -  94 * G -  18 * B + 128) >> 8) + 128;

            // NV21 has a plane of Y and interleaved planes of VU each sampled by a factor of 2
            //    meaning for every 4 Y pixels there are 1 V and 1 U.  Note the sampling is every other
            //    pixel AND every other scanline.
            yuv420sp[yIndex++] = (byte) ((Y < 0) ? 0 : ((Y > 255) ? 255 : Y));
            if (j % 2 == 0 && index % 2 == 0) {
                yuv420sp[uvIndex++] = (byte)((V<0) ? 0 : ((V > 255) ? 255 : V));
                yuv420sp[uvIndex++] = (byte)((U<0) ? 0 : ((U > 255) ? 255 : U));
            }

            index ++;
        }
    }
}

I pass this byte[] data to onByteBufferFrameCaptured and addCallbackBuffer:

Camera1Session.this.events.onByteBufferFrameCaptured(
                            Camera1Session.this,
                            data,
                            w,
                            h,
                            Camera1Session.this.getFrameOrientation(),
                            captureTimeNs);
Camera1Session.this.camera.addCallbackBuffer(data);

Prior to this, I had to scale the bitmap, which is pretty straightforward:

int width = bitmapToScale.getWidth();
int height = bitmapToScale.getHeight();
Matrix matrix = new Matrix();
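// Cast to float: if newWidth/newHeight are ints, plain division would truncate the scale factor.
matrix.postScale((float) newWidth / width, (float) newHeight / height);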
Bitmap scaledBitmap = Bitmap.createBitmap(bitmapToScale, 0, 0, bitmapToScale.getWidth(), bitmapToScale.getHeight(), matrix, true);

Hanahanae answered 27/7, 2017 at 10:0 Comment(2)

It is not clear what you do with the result of getNV21(). – Incitement 27/7, 2017 at 11:1

Edited my answer to reflect that. Thanks. – Hanahanae 27/7, 2017 at 11:12

Can we use WebRTC's DataChannel to exchange custom data, i.e. the cropped face "image" in your case, and do the respective calculation at the receiving end using a third-party library such as OpenGL? The reason I am suggesting this is that the WebRTC video feed received from the channel is a real-time stream, not a byte array. WebRTC video, by its inherent architecture, isn't meant to crop video on the other end. If we want to crop or augment video, we have to use an AR library to do the job.

We can always leverage WebRTC's data channel to exchange customized data. Using the video channel for this is not recommended because it is a real-time stream, not a byte array. Please reply if you have any concerns.

Equivocation answered 25/7, 2017 at 12:45 Comment(2)

Is DataChannel enough to support a continuous stream of large byte arrays? – Hanahanae 25/7, 2017 at 15:41

No. For an overlay or any sort of augmentation, object recognition is a must. For that purpose, one party can send the image to the other party over the WebRTC data channel, along with other relevant coordinates, i.e. the cropping details w.r.t. the image. On the receiving side, the calculation can be done to display an overlay of the cropped face, and the live feed can be shown using OpenGL. – Equivocation 25/7, 2017 at 22:38

WebRTC in particular, and video streaming in general, presumes that the video has fixed dimensions. If you want to crop the detected face, your options are either to pad the cropped image with e.g. black pixels (WebRTC does not use transparency) and crop the video on the receiver side, or, if you don't have control over the receiver, to resize the cropped region to fill the expected width * height frame (you should also keep the expected aspect ratio).
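
As a rough illustration of the "resize while keeping the aspect ratio" option, the target rectangle inside the fixed-size frame could be computed as below. `FitInFrame` and `fit` are hypothetical names used only for this sketch, and rounding to even values is an assumption made because 4:2:0 chroma is subsampled by two in each direction.

```java
import java.util.Arrays;

// Hypothetical helper: where to draw a cropped face inside a fixed-size video frame.
final class FitInFrame {

    /** Returns {left, top, width, height} of the scaled crop, centred in the frame. */
    static int[] fit(int cropW, int cropH, int frameW, int frameH) {
        if (cropW <= 0 || cropH <= 0 || frameW <= 0 || frameH <= 0) {
            throw new IllegalArgumentException("all dimensions must be positive");
        }
        int w;
        int h;
        // Compare aspect ratios with integer cross-multiplication (no floating point).
        if ((long) cropW * frameH >= (long) cropH * frameW) {
            // Crop is relatively wider than the frame: fill the width, shrink the height.
            w = frameW;
            h = (int) ((long) cropH * frameW / cropW);
        } else {
            // Crop is relatively taller than the frame: fill the height, shrink the width.
            h = frameH;
            w = (int) ((long) cropW * frameH / cropH);
        }
        // Round down to even numbers so the chroma samples stay aligned.
        w &= ~1;
        h &= ~1;
        int left = ((frameW - w) / 2) & ~1;
        int top = ((frameH - h) / 2) & ~1;
        return new int[] {left, top, w, h};
    }

    public static void main(String[] args) {
        // A 100x200 face crop placed into an assumed 640x480 frame.
        System.out.println(Arrays.toString(fit(100, 200, 640, 480))); // [200, 0, 240, 480]
    }
}
```

The area outside the returned rectangle is the padding; the receiver can ignore it or crop it away.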

Note that JPEG compress/decompress that you use to crop the original is far from efficient. Some other options can be found in Image crop and resize in Android.
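
One way to avoid the JPEG round trip is to copy the rectangle straight out of the NV21 buffer. The following is a minimal sketch, not code from WebRTC or the Android SDK: `Nv21Crop` and `crop` are hypothetical names, the source width is assumed to be even, and the rectangle is assumed to have even coordinates and sizes.

```java
import java.util.Arrays;

// Hypothetical helper: crops an NV21 frame without going through JPEG or Bitmap.
final class Nv21Crop {

    /** Copies an even-aligned rectangle out of an NV21 buffer (srcW assumed even). */
    static byte[] crop(byte[] src, int srcW, int srcH,
                       int left, int top, int cropW, int cropH) {
        if (((srcW | left | top | cropW | cropH) & 1) != 0) {
            throw new IllegalArgumentException("width, offsets and sizes must be even");
        }
        if (left < 0 || top < 0 || left + cropW > srcW || top + cropH > srcH) {
            throw new IllegalArgumentException("rectangle is outside the source frame");
        }
        byte[] dst = new byte[cropW * cropH * 3 / 2];
        // Y plane: one byte per pixel, copied row by row.
        for (int row = 0; row < cropH; row++) {
            System.arraycopy(src, (top + row) * srcW + left, dst, row * cropW, cropW);
        }
        // Interleaved VU plane: starts after the Y plane and has half as many rows,
        // each srcW bytes long (one V,U pair per two pixels).
        int srcVu = srcW * srcH;
        int dstVu = cropW * cropH;
        for (int row = 0; row < cropH / 2; row++) {
            System.arraycopy(src, srcVu + (top / 2 + row) * srcW + left,
                    dst, dstVu + row * cropW, cropW);
        }
        return dst;
    }

    public static void main(String[] args) {
        // Synthetic 4x4 frame: Y bytes are 0..15, VU bytes are 16..23.
        byte[] frame = new byte[4 * 4 * 3 / 2];
        for (int i = 0; i < frame.length; i++) {
            frame[i] = (byte) i;
        }
        System.out.println(Arrays.toString(crop(frame, 4, 4, 2, 2, 2, 2)));
        // prints [10, 11, 14, 15, 22, 23]
    }
}
```

The cropped buffer is still smaller than the capture format, so it would need to be scaled or padded to the expected width * height before being handed to WebRTC.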

Incitement answered 25/7, 2017 at 8:56 Comment(2)

Bandwidth consumption in case of scaled or padded data would be the same as that of a video call, no? Also, face crop on receive would be very tedious in case of a low-light video if padded (it would also double the computation in my case). Scaling would disrupt the aspect ratio of the face itself. Any way I could append the face aspect ratio within I420Frame? – Hanahanae 25/7, 2017 at 15:45

Bandwidth overhead of constant padding (black is not necessary) is minimal, thanks to video compression. Scaling (zoom in) of a bitmap does not add much to bandwidth for the same reason. I could not understand what you mean by 'disrupt aspect ratio'. Yes, you can work within I420Frame. – Incitement 25/7, 2017 at 15:53
