So I'm trying to record video using AVAssetWriter: I process each camera frame (e.g. adding a watermark or text overlay) by creating a CIImage from the camera buffer (CVImageBuffer), apply some filters to the CIImage (which is very fast), and then I need to render a new CVPixelBuffer from the CIImage. This last step becomes a problem at high resolutions like 4K on a base iPhone 11: cIContext.render(compositedImage, to: pixelBuffer) takes about 30 ms of CPU time, so the app can't record 4K at 60 FPS.
Are there any solutions to improve this?
Or is the only way to improve performance to use OpenGL/Metal? I'm not sure how exactly that would be better, since we still need to somehow pass a pixel buffer to AVAssetWriter. Is there a simple example of using Metal with AVAssetWriter, similar to the following code?
private let cIContext: CIContext = {
    if let mtlDevice = MTLCreateSystemDefaultDevice() {
        return CIContext(mtlDevice: mtlDevice) // makes no difference in performance for CIContext.render()
    } else {
        return CIContext()
    }
}()
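One thing I still want to try is creating the context with color management disabled. These are documented CIContextOption keys, but I haven't measured whether they actually help here, so treat this as an untested variant:
private let cIContext: CIContext = {
    guard let mtlDevice = MTLCreateSystemDefaultDevice() else { return CIContext() }
    // Untested assumption: skipping color matching and intermediate caching
    // might reduce the CPU time spent in render() for video frames.
    return CIContext(mtlDevice: mtlDevice, options: [
        .workingColorSpace: NSNull(),  // disable color management
        .cacheIntermediates: false     // video frames never repeat, so caching only costs memory
    ])
}()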
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let assetWriter = assetWriter, let videoWriterInput = videoWriterInput else { return }
    if isRecording == false || assetWriter.status != .writing { return }
    if CMSampleBufferDataIsReady(sampleBuffer) == false {
        return
    }
    if output == videoOutput, videoWriterInput.isReadyForMoreMediaData {
        let presentationTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
        if hasWritingSessionStarted == false {
            assetWriter.startSession(atSourceTime: presentationTime)
            hasWritingSessionStarted = true
            // the destination pixel buffer is created once here and reused for every frame
            guard let pixelBufferPool = pixelBufferAdaptor?.pixelBufferPool else { return }
            let status = CVPixelBufferPoolCreatePixelBuffer(kCFAllocatorDefault, pixelBufferPool, &pixelBuffer)
            guard status == kCVReturnSuccess else {
                print("Failed to create pixel buffer")
                return
            }
        }
        guard let pixelBuffer = pixelBuffer else {
            print("Pixel buffer is nil")
            return
        }
        guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            print("Failed to get image buffer.")
            return
        }
        // fast, up to ~1 ms
        let ciImage = CIImage(cvPixelBuffer: imageBuffer)
        // fast, up to ~1 ms
        let compositedImage = watermarkImage.composited(over: ciImage)
        var tookTime = CFAbsoluteTimeGetCurrent()
        // very slow, ~30 ms for 4K resolution on iPhone 11 (base)
        cIContext.render(compositedImage, to: pixelBuffer)
        tookTime = CFAbsoluteTimeGetCurrent() - tookTime
        // fast, up to ~1 ms
        pixelBufferAdaptor?.append(pixelBuffer, withPresentationTime: presentationTime)
        print("cIContext.render took \(tookTime * 1000) ms")
    }
}
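For context, assetWriter, videoWriterInput and pixelBufferAdaptor are created roughly like this (a simplified sketch; the output URL and compression settings here are placeholders, not my exact values):
let assetWriter = try AVAssetWriter(outputURL: outputURL, fileType: .mov)
let videoWriterInput = AVAssetWriterInput(mediaType: .video, outputSettings: [
    AVVideoCodecKey: AVVideoCodecType.hevc,
    AVVideoWidthKey: 3840,
    AVVideoHeightKey: 2160
])
videoWriterInput.expectsMediaDataInRealTime = true
let pixelBufferAdaptor = AVAssetWriterInputPixelBufferAdaptor(
    assetWriterInput: videoWriterInput,
    sourcePixelBufferAttributes: [
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA,
        kCVPixelBufferWidthKey as String: 3840,
        kCVPixelBufferHeightKey as String: 2160,
        kCVPixelBufferMetalCompatibilityKey as String: true // so the buffers can back Metal textures
    ])
assetWriter.add(videoWriterInput)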
Update:
I have been able to convert the CIImage to an MTLTexture, and then copy the MTLTexture into a CVPixelBuffer. It works a bit faster, but for some reason the image comes out vertically flipped:
// before starting the recording:
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .bgra8Unorm, // choose a suitable pixel format
    width: Int(frameSize.width),
    height: Int(frameSize.height),
    mipmapped: false
)
textureDescriptor.usage = [.shaderWrite] // important: CIContext.render() writes into the texture

// create the Metal texture
texture = mtlDevice.makeTexture(descriptor: textureDescriptor)

// in captureOutput function:
var ciImage = CIImage(cvPixelBuffer: imageBuffer)
// additionally process CIImage (add some filters) and get a new CIImage
guard let commandBuffer = commandQueue.makeCommandBuffer() else { return }
ciContext.render(
    ciImage,
    to: texture,
    commandBuffer: commandBuffer,
    bounds: CGRect(x: 0, y: 0, width: frameSize.width, height: frameSize.height),
    colorSpace: CGColorSpaceCreateDeviceRGB()
)
commandBuffer.commit()
commandBuffer.waitUntilCompleted() // blocks the CPU until the GPU finishes

let width = texture.width
let height = texture.height
// note: we write into the buffer here, so the lock must not be read-only
CVPixelBufferLockBaseAddress(pixelBuffer, [])
if let pixelBufferBaseAddress = CVPixelBufferGetBaseAddress(pixelBuffer) {
    texture.getBytes(
        pixelBufferBaseAddress,
        bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
        from: MTLRegionMake2D(0, 0, width, height),
        mipmapLevel: 0
    )
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, [])
pixelBufferAdaptor?.append(pixelBuffer, withPresentationTime: presentationTime)
What could be wrong here?
Update 2:
It seems this is just the coordinate-system mismatch: Core Image uses a bottom-left origin, while Metal textures use a top-left origin. Anyway, I have just used videoWriterInput.transform = CGAffineTransform(scaleX: 1, y: -1) to "fix" it. I also tried applying such a transform to the CIImage before rendering it into the MTLTexture, but it didn't work for some reason.
This method takes about 23 ms instead of ~30 ms, so it's still not fast enough.
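One more idea I want to try: instead of texture.getBytes (a GPU-to-CPU copy), wrap the destination CVPixelBuffer itself in a Metal texture via CVMetalTextureCache, so that render() writes straight into the buffer's backing IOSurface. An untested sketch; it assumes the adaptor's buffers were created with kCVPixelBufferMetalCompatibilityKey set to true:
// created once, next to the CIContext:
var textureCache: CVMetalTextureCache?
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, mtlDevice, nil, &textureCache)

// per frame, replacing render-to-texture + getBytes:
guard let cache = textureCache else { return }
var cvTexture: CVMetalTexture?
CVMetalTextureCacheCreateTextureFromImage(
    kCFAllocatorDefault, cache, pixelBuffer, nil, .bgra8Unorm,
    CVPixelBufferGetWidth(pixelBuffer), CVPixelBufferGetHeight(pixelBuffer),
    0, &cvTexture)
if let cvTexture = cvTexture,
   let targetTexture = CVMetalTextureGetTexture(cvTexture),
   let commandBuffer = commandQueue.makeCommandBuffer() {
    // render directly into the pixel buffer's memory, no CPU copy
    ciContext.render(
        ciImage,
        to: targetTexture,
        commandBuffer: commandBuffer,
        bounds: ciImage.extent,
        colorSpace: CGColorSpaceCreateDeviceRGB()
    )
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
    pixelBufferAdaptor?.append(pixelBuffer, withPresentationTime: presentationTime)
}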
p.s.:
I'm also going to try to get an MTLTexture directly from the CMSampleBuffer and somehow apply filters to it (I guess that's going to be harder compared to CIImage filters). I have tried the code from Unexpected output converting CVPixelBuffer to MTLTexture, but I ran into the same issue the author mentioned.
Update 3:
About getting an MTLTexture directly from the camera buffer: the camera typically delivers frames in a YUV format (kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange), so we would need to get two textures (luma and chroma), combine them into one, and convert YUV to RGB using a Metal shader.
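Extracting the two plane textures would look roughly like this (untested sketch; `cache` is the same CVMetalTextureCache idea as in the sketch above):
// For kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange:
//   plane 0 = luma (Y), one byte per pixel           -> .r8Unorm
//   plane 1 = chroma (interleaved CbCr, half-size)   -> .rg8Unorm
func makePlaneTexture(from pixelBuffer: CVPixelBuffer,
                      cache: CVMetalTextureCache,
                      pixelFormat: MTLPixelFormat,
                      planeIndex: Int) -> MTLTexture? {
    var cvTexture: CVMetalTexture?
    let status = CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache, pixelBuffer, nil, pixelFormat,
        CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex),
        CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex),
        planeIndex, &cvTexture)
    guard status == kCVReturnSuccess, let cvTexture = cvTexture else { return nil }
    return CVMetalTextureGetTexture(cvTexture)
}

let lumaTexture = makePlaneTexture(from: imageBuffer, cache: cache, pixelFormat: .r8Unorm, planeIndex: 0)
let chromaTexture = makePlaneTexture(from: imageBuffer, cache: cache, pixelFormat: .rg8Unorm, planeIndex: 1)
// a compute/fragment shader would then sample both and apply the YUV -> RGB matrix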