IP camera RTSP stream capture has big latency in OpenCV
Asked Answered

I am trying to do some processing on an IP camera, and it works, but I see a lag of about 7~10 seconds between the real world and the captured video.

I am using the rtsp://@ip:port/live URL.

This camera has a web interface (IE / ActiveX) that shows the image with very low lag (about 200~300 ms).

I tested this code with a video file as input and it works well, without latency; but when I use my IP camera or a drone camera over the RTSP protocol, the software runs with a latency of 7~10 s.

NB: I set the resolution to (1080, 720) and I use an NVIDIA Quadro 1000 GPU, and it works well, which is why I think the problem is not the processing or the hardware but the code.

Edit: It may have something to do with the VideoCapture buffer. Is there a way to make it always use the latest image?

Edit 2: I get good latency results in VLC, only about 300 ms.

Thank you!

You can see the code I am using below:

import cv2
import time

import argparse
import numpy as np
from PIL import Image
from utils.anchor_generator import generate_anchors
from utils.anchor_decode import decode_bbox
from utils.nms import single_class_non_max_suppression
from load_model.pytorch_loader import load_pytorch_model, pytorch_inference

# model = load_pytorch_model('models/face_mask_detection.pth');
model = load_pytorch_model('models/model360.pth');
# anchor configuration
#feature_map_sizes = [[33, 33], [17, 17], [9, 9], [5, 5], [3, 3]]
feature_map_sizes = [[45, 45], [23, 23], [12, 12], [6, 6], [4, 4]]
anchor_sizes = [[0.04, 0.056], [0.08, 0.11], [0.16, 0.22], [0.32, 0.45], [0.64, 0.72]]
anchor_ratios = [[1, 0.62, 0.42]] * 5

# generate anchors
anchors = generate_anchors(feature_map_sizes, anchor_sizes, anchor_ratios)

# for inference , the batch size is 1, the model output shape is [1, N, 4],
# so we expand dim for anchors to [1, anchor_num, 4]
anchors_exp = np.expand_dims(anchors, axis=0)

id2class = {0: 'Mask', 1: 'NoMask'}


def inference(image,
              conf_thresh=0.5,
              iou_thresh=0.4,
              target_shape=(160, 160),
              draw_result=True,
              show_result=True
              ):
    '''
    Main function of detection inference
    :param image: 3D numpy array of image
    :param conf_thresh: the min threshold of classification probability.
    :param iou_thresh: the IOU threshold of NMS
    :param target_shape: the model input size.
    :param draw_result: whether to draw bounding boxes on the image.
    :param show_result: whether to display the image.
    :return:
    '''
    # image = np.copy(image)
    output_info = []
    height, width, _ = image.shape
    image_resized = cv2.resize(image, target_shape)
    image_np = image_resized / 255.0  # normalize to the 0~1 range
    image_exp = np.expand_dims(image_np, axis=0)

    image_transposed = image_exp.transpose((0, 3, 1, 2))

    y_bboxes_output, y_cls_output = pytorch_inference(model, image_transposed)
    # remove the batch dimension, since the batch size is always 1 for inference.
    y_bboxes = decode_bbox(anchors_exp, y_bboxes_output)[0]
    y_cls = y_cls_output[0]
    # To speed up, do single class NMS, not multiple classes NMS.
    bbox_max_scores = np.max(y_cls, axis=1)
    bbox_max_score_classes = np.argmax(y_cls, axis=1)

    # keep_idx is the alive bounding box after nms.
    keep_idxs = single_class_non_max_suppression(y_bboxes,
                                                 bbox_max_scores,
                                                 conf_thresh=conf_thresh,
                                                 iou_thresh=iou_thresh,
                                                 )

    for idx in keep_idxs:
        conf = float(bbox_max_scores[idx])
        class_id = bbox_max_score_classes[idx]
        bbox = y_bboxes[idx]
        # clip the coordinates so the values don't exceed the image boundary.
        xmin = max(0, int(bbox[0] * width))
        ymin = max(0, int(bbox[1] * height))
        xmax = min(int(bbox[2] * width), width)
        ymax = min(int(bbox[3] * height), height)

        if draw_result:
            if class_id == 0:
                color = (0, 255, 0)
            else:
                color = (255, 0, 0)
            cv2.rectangle(image, (xmin, ymin), (xmax, ymax), color, 2)
            cv2.putText(image, "%s: %.2f" % (id2class[class_id], conf), (xmin + 2, ymin - 2),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.8, color)
        output_info.append([class_id, conf, xmin, ymin, xmax, ymax])

    if show_result:
        Image.fromarray(image).show()
    return output_info


def run_on_video(video_path, output_video_name, conf_thresh):
    cap = cv2.VideoCapture(video_path)
    height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
    width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
    fps = cap.get(cv2.CAP_PROP_FPS)
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    # writer = cv2.VideoWriter(output_video_name, fourcc, int(fps), (int(width), int(height)))
    total_frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    if not cap.isOpened():
        raise ValueError("Video open failed.")
    status = True
    idx = 0
    while status:
        start_stamp = time.time()
        status, img_raw = cap.read()
        read_frame_stamp = time.time()
        if status:
            # convert BGR (OpenCV default) to RGB only when a frame was actually read
            img_raw = cv2.cvtColor(img_raw, cv2.COLOR_BGR2RGB)
            inference(img_raw,
                      conf_thresh,
                      iou_thresh=0.5,
                      target_shape=(360, 360),
                      draw_result=True,
                      show_result=False)
            cv2.imshow('image', img_raw[:, :, ::-1])
            cv2.waitKey(1)
            inference_stamp = time.time()
            # writer.write(img_raw)
            write_frame_stamp = time.time()
            idx += 1
            print("%d of %d" % (idx, total_frames))
            print("read_frame:%f, infer time:%f, write time:%f" % (read_frame_stamp - start_stamp,
                                                                   inference_stamp - read_frame_stamp,
                                                                   write_frame_stamp - inference_stamp))
    # writer.release()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Face Mask Detection")
    parser.add_argument('--img-mode', type=int, default=1, help='set 1 to run on image, 0 to run on video.')
    parser.add_argument('--img-path', type=str, help='path to your image.')
    parser.add_argument('--video-path', type=str, default='0', help='path to your video, `0` means to use camera.')
    # parser.add_argument('--hdf5', type=str, help='keras hdf5 file')
    args = parser.parse_args()
    if args.img_mode:
        imgPath = args.img_path
        img = cv2.imread(imgPath)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        inference(img, show_result=True, target_shape=(360, 360))
    else:
        video_path = args.video_path
        if args.video_path == '0':
            video_path = 0
        run_on_video(video_path, '', conf_thresh=0.5)

I have no idea why it's so slow with OpenCV. I would like some tips to make the capture faster.

Briney answered 19/8, 2020 at 9:59 Comment(6)
So you mean this code is using the GPU?Catechumen
For some reason VLC may be using the GPU to decode the stream, but I'm not sure. Your code uses the CPU to decode the H.264 stream, which alone can add ~2 s of delay, and the additional processing can add to the lag as well.Catechumen
I think the code supports the GPU, because when I use just the CPU the software runs slowly, but with the GPU it runs faster on an input video; with the IP camera I still see a lag of about 7 s. I don't know how I can be sure of this: "your code uses the CPU to decode the h264 codec so it can give ~2 sec delay". ThanksBriney
I am sure you don't use the GPU in your code. It's not possible.Catechumen
import sys
import torch
sys.path.append('models/')

def load_pytorch_model(model_path):
    model = torch.load(model_path)
    return model

def pytorch_inference(model, img_arr):
    if torch.cuda.is_available():
        dev = 'cuda:0'
    else:
        dev = 'cpu'
    device = torch.device(dev)
    model.to(device)
    input_tensor = torch.tensor(img_arr).float().to(device)
    y_bboxes, y_scores, = model.forward(input_tensor)
    return y_bboxes.detach().cpu().numpy(), y_scores.detach().cpu().numpy()
Briney
You can see in this line that ` torch.cuda.is_available(): dev = 'cuda:0' ` is used.Briney

The problem is in OpenCV's RTSP stream implementation.

To get a Mat out of the stream, you need to initialize the codec and feed it several compressed frame packets. The codec has an internal frame buffer that works as a FIFO (first in, first out): you call avcodec_send_packet() and then avcodec_receive_frame(). The returned frame is wrapped into a Mat object and handed back to you. The first several packets only initialize the buffers and don't produce any picture.

(More info here: https://ffmpeg.org/doxygen/3.3/group__lavc__encdec.html)

Don't expect low latency with RTSP in OpenCV from Python. The only way I could find to decrease the latency in my case was to take the FFmpeg example code and rewrite it in C++.
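
If you have to stay with OpenCV in Python, one partial workaround is to read frames in a background thread and keep only the newest one, so decoded frames don't pile up behind slow processing. It won't remove the decoder's start-up delay, but it stops the lag from growing. A minimal sketch (the class name LatestFrameReader and the URL are only placeholders):

import threading
import cv2

class LatestFrameReader:
    """Continuously reads from the capture and keeps only the most recent frame."""
    def __init__(self, source):
        self.cap = cv2.VideoCapture(source)
        self.lock = threading.Lock()
        self.frame = None
        self.running = True
        self.thread = threading.Thread(target=self._update, daemon=True)
        self.thread.start()

    def _update(self):
        while self.running:
            ok, frame = self.cap.read()
            if ok:
                with self.lock:
                    self.frame = frame  # older frames are simply overwritten

    def read(self):
        with self.lock:
            return None if self.frame is None else self.frame.copy()

    def release(self):
        self.running = False
        self.thread.join()
        self.cap.release()

# usage:
# reader = LatestFrameReader('rtsp://@ip:port/live')
# frame = reader.read()  # always the newest decoded frame (or None before the first one)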

Increasing the number of I-frames might help (spoiler: not much).

P.S. Some examples of my work with RTSP streams: https://www.youtube.com/channel/UCOK7D73tj7Dl4ZyXE-J0UNA

Propolis answered 19/8, 2020 at 15:33 Comment(1)
Thank you so much for your reply: could you please write the code that must be added in C++, just so I understand what I should do? After that I will try to write it in Python.Briney

OpenCV has a mistake in its decoder implementation. You need to edit opencv/modules/videoio/src/cap_ffmpeg_impl.hpp.

After the line enc->thread_count = get_number_of_cpus();, add this:

AVDictionaryEntry* threads_entry = av_dict_get(dict, "threads", NULL, 0);
if (threads_entry)
{
    int i = 1;
    if (sscanf(threads_entry->value, "%d", &i) == 1)
        enc->thread_count = i;
}

Build opencv_videoio_ffmpeg64.dll from this git repository: https://github.com/opencv/opencv_3rdparty/tree/ffmpeg/4.x

Replace the old .dll, then set the OPENCV_FFMPEG_CAPTURE_OPTIONS environment variable to "threads;1|protocol_whitelist;file,rtp,udp,tcp".
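
If you set the option from Python rather than in the shell, it has to be set before cv2 is imported so the FFmpeg backend picks it up. A minimal sketch (the RTSP URL is just a placeholder):

import os

# must be set before importing cv2, otherwise the FFmpeg backend ignores it
os.environ["OPENCV_FFMPEG_CAPTURE_OPTIONS"] = "threads;1|protocol_whitelist;file,rtp,udp,tcp"

import cv2

# placeholder URL; request the FFmpeg backend explicitly so the options above apply
cap = cv2.VideoCapture("rtsp://@ip:port/live", cv2.CAP_FFMPEG)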

Overleap answered 5/2, 2023 at 11:6 Comment(0)
