Transforming ARFrame#capturedImage to view size

When using the ARSessionDelegate to process the raw camera image in ARKit...

func session(_ session: ARSession, didUpdate frame: ARFrame) {

    guard let currentFrame = session.currentFrame else { return }
    let capturedImage = currentFrame.capturedImage

    debugPrint("Display size", UIScreen.main.bounds.size)
    debugPrint("Camera frame resolution", CVPixelBufferGetWidth(capturedImage), CVPixelBufferGetHeight(capturedImage))

    // ...

}

... as documented, the camera image data doesn't match the screen size; for example, on an iPhone X I get:

  • Display size: 375x812pt
  • Camera resolution: 1920x1440px

Now there is the displayTransform(for:viewportSize:) API to transform camera coordinates to view coordinates. When using the API like this:

let ciimage = CIImage(cvImageBuffer: capturedImage)
let transform = currentFrame.displayTransform(for: .portrait, viewportSize: UIScreen.main.bounds.size)
let transformedImage = ciimage.transformed(by: transform)
debugPrint("Transformed size", transformedImage.extent.size)

I get a size of 2340x1920, which seems incorrect: the result should have an aspect ratio of 375:812 (~0.46). What am I missing here? What's the correct way to use this API to transform the camera image to an image "as displayed by ARSCNView"?

(Example project: ARKitCameraImage)

Erg answered 11/11, 2019 at 21:42 Comment(0)

This turned out to be quite complicated: displayTransform(for:viewportSize:) expects normalized image coordinates, the coordinates apparently have to be flipped only in portrait mode, and the image needs to be not only transformed but also cropped. The following code does the trick for me. Suggestions on how to improve this would be appreciated.

guard let frame = session.currentFrame else { return }
let imageBuffer = frame.capturedImage

let imageSize = CGSize(width: CVPixelBufferGetWidth(imageBuffer), height: CVPixelBufferGetHeight(imageBuffer))
let viewPort = sceneView.bounds
let viewPortSize = sceneView.bounds.size

let interfaceOrientation: UIInterfaceOrientation
if #available(iOS 13.0, *) {
    interfaceOrientation = self.sceneView.window!.windowScene!.interfaceOrientation
} else {
    interfaceOrientation = UIApplication.shared.statusBarOrientation
}

let image = CIImage(cvImageBuffer: imageBuffer)

// The camera image doesn't match the view rotation and aspect ratio
// Transform the image:

// 1) Convert to "normalized image coordinates"
let normalizeTransform = CGAffineTransform(scaleX: 1.0/imageSize.width, y: 1.0/imageSize.height)

// 2) Flip the Y axis (for some mysterious reason this is only necessary in portrait mode)
let flipTransform = (interfaceOrientation.isPortrait) ? CGAffineTransform(scaleX: -1, y: -1).translatedBy(x: -1, y: -1) : .identity

// 3) Apply the transformation provided by ARFrame
// This transformation converts:
// - From Normalized image coordinates (Normalized image coordinates range from (0,0) in the upper left corner of the image to (1,1) in the lower right corner)
// - To view coordinates ("a coordinate space appropriate for rendering the camera image onscreen")
// See also: https://developer.apple.com/documentation/arkit/arframe/2923543-displaytransform

let displayTransform = frame.displayTransform(for: interfaceOrientation, viewportSize: viewPortSize)

// 4) Convert to view size
let toViewPortTransform = CGAffineTransform(scaleX: viewPortSize.width, y: viewPortSize.height)

// Transform the image and crop it to the viewport
let transformedImage = image
    .transformed(by: normalizeTransform
        .concatenating(flipTransform)
        .concatenating(displayTransform)
        .concatenating(toViewPortTransform))
    .cropped(to: viewPort)
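
To actually display or save the result, the transformed CIImage still has to be rendered. A minimal sketch, assuming a reusable CIContext (the ciContext property is my naming, not part of the code above; creating a new context every frame would be expensive):

// Assumption: a CIContext created once, e.g. as a property of the view controller
let ciContext = CIContext()

// Render the transformed CIImage into a CGImage, then wrap it for display
if let cgImage = ciContext.createCGImage(transformedImage, from: transformedImage.extent) {
    let uiImage = UIImage(cgImage: cgImage)
    // use uiImage, e.g. imageView.image = uiImage
}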
Erg answered 12/11, 2019 at 11:10 Comment(4)
Here is a standalone example project making use of this code for a Metal shader that uses the camera image as displayed: github.com/ralfebert/ARSCNViewShaderExample .Erg
I tried several other ways; this was the only one that made the ARFrame image the same size and orientation as the image returned by snapshot(), so they can be overlaid or compared.Feldt
For me, step 2) did not work. Instead I skipped it and applied image = image.oriented(.upMirrored) after the .transformed operation. I also had to multiply the widths and heights by 3 (presumably the @3x screen scale) to get the device resolution: viewPort.size.width *= 3; viewPort.size.height *= 3; viewPortSize.width *= 3; viewPortSize.height *= 3. I am using an iPhone 12 Pro.Throat
This answer is awesome; it explains all the steps needed to obtain the correct result. You can convert the final transformedImage to a UIImage with UIImage(ciImage: transformedImage) and see your results in real time.Cerargyrite

Thank you so much for your answer! I was working on this for a week.

Here's an alternative way to do it without messing with the orientation: instead of using the capturedImage property, you can use a snapshot of the screen.

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    guard let image = CIImage(image: sceneView.snapshot()) else { return }

    let imageSize = image.extent.size

    // Convert to "normalized image coordinates"
    let resize = CGAffineTransform(scaleX: 1.0 / imageSize.width, y: 1.0 / imageSize.height)

    // Convert to view size
    let viewSize = CGAffineTransform(scaleX: sceneView.bounds.size.width, y: sceneView.bounds.size.height)

    // Transform image
    let editedImage = image.transformed(by: resize.concatenating(viewSize)).cropped(to: sceneView.bounds)

    // `context` is a reusable CIContext; see the sketch below
    sceneView.scene.background.contents = context.createCGImage(editedImage, from: editedImage.extent)
}
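
The snippet assumes a reusable CIContext named context. A minimal sketch of the surrounding setup (the class and property names are my assumptions):

class ViewController: UIViewController, ARSessionDelegate {
    @IBOutlet var sceneView: ARSCNView!
    // Create the CIContext once; allocating a new one every frame would be slow
    let context = CIContext()
}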
Eros answered 2/4, 2020 at 19:23 Comment(1)
The question is not about choosing a method to copy the image pixels, but about using the capturedImage bits directly: that approach does not copy bytes and achieves 60 fps when used to overlay things.Hollands

I've converted Ralf Ebert's awesome answer to a Swift extension:

But first, we need a way to get the interface orientation:

extension UIApplication {
    var keyWindow: UIWindow? {
        return self.connectedScenes
            .filter { $0.activationState == .foregroundActive }
            .compactMap { $0 as? UIWindowScene }
            .first?.windows
            .first(where: \.isKeyWindow)
    }
}

extension ARSession {
    func resizeTo(_ size: CGSize) -> CIImage? {
        guard let frame = self.currentFrame else { return nil }
        let imageBuffer = frame.capturedImage
        let imageSize = CGSize(width: CVPixelBufferGetWidth(imageBuffer), height: CVPixelBufferGetHeight(imageBuffer))
        let interfaceOrientation = UIApplication.shared.keyWindow?.windowScene?.interfaceOrientation ?? .portrait
        let image = CIImage(cvImageBuffer: imageBuffer)

        // 1) Convert to normalized image coordinates
        let normalizeTransform = CGAffineTransform(scaleX: 1.0 / imageSize.width, y: 1.0 / imageSize.height)

        // 2) Flip the Y axis (only necessary in portrait mode)
        let flipTransform = interfaceOrientation.isPortrait ? CGAffineTransform(scaleX: -1, y: -1).translatedBy(x: -1, y: -1) : .identity

        // 3) Apply the transform provided by ARFrame, then 4) scale up to the target size
        let viewPort = CGRect(origin: .zero, size: size)
        let displayTransform = frame.displayTransform(for: interfaceOrientation, viewportSize: size)
        let toViewPortTransform = CGAffineTransform(scaleX: size.width, y: size.height)

        return image
            .transformed(by: normalizeTransform
                .concatenating(flipTransform)
                .concatenating(displayTransform)
                .concatenating(toViewPortTransform))
            .cropped(to: viewPort)
    }
}

Usage:

let resizedCIImage = myARSession.resizeTo(CGSize(width: 300, height: 300))
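
For example, to grab the transformed camera image once per frame from the session delegate (a sketch; sceneView and what you do with the result are my assumptions):

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // Use the view's current size as the target viewport
    guard let ciImage = session.resizeTo(sceneView.bounds.size) else { return }
    // Render/display ciImage, e.g. via a reusable CIContext or UIImage(ciImage:)
}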
Cerargyrite answered 6/9, 2023 at 10:10 Comment(0)
