Translate Firebase MLKit Bounding box coordinates to screen view coordinates
C

5

13

I am using the FirebaseVision object detection to detect things from the CameraX camera preview. It is detecting things fine, but I am trying to draw the bounding box of the detected items over the camera preview. In doing so, the bounding box that Firebase gives back is for the image itself, not the preview view, so the boxes appear in the wrong place.

The image size that I get back from Firebase is 1200x1600 and the preview size is 2425x1440.

How do I translate the bounding boxes returned from firebase to the correct screen coordinates?

Commonage answered 27/2, 2020 at 3:46 Comment(1)
You can take a look at a good sample here: github.com/javaherisaber/MLBarcodeScanner (Mystery)
C
10

What I ended up doing was taking the image size that the camera produced and dividing the overlay view's width/height by the image's width/height to get the scale factors:

// In portrait the analysis image is rotated 90 degrees relative to the view,
// so the image width maps to the view height and vice versa
if (isPortraitMode()) {
    _scaleY = overlayView.height.toFloat() / imageWidth.toFloat()
    _scaleX = overlayView.width.toFloat() / imageHeight.toFloat()
} else {
    _scaleY = overlayView.height.toFloat() / imageHeight.toFloat()
    _scaleX = overlayView.width.toFloat() / imageWidth.toFloat()
}

Now that I have the scales, I can take the bounding box returned by the Firebase detector and translate its x and y coordinates by those scale factors:

private fun translateX(x: Float): Float = x * _scaleX
private fun translateY(y: Float): Float = y * _scaleY

private fun translateRect(rect: Rect) = RectF(
    translateX(rect.left.toFloat()),
    translateY(rect.top.toFloat()),
    translateX(rect.right.toFloat()),
    translateY(rect.bottom.toFloat())
)

This gives you the scaled rect coordinates, which you can then draw on the screen.
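
For completeness, a minimal sketch of an overlay view that draws those translated rects might look like this (BoxOverlayView and setBoxes are just names for this example, not part of the original setup):

import android.content.Context
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.RectF
import android.util.AttributeSet
import android.view.View

// Overlay placed on top of the camera preview; expects rects that have
// already been passed through translateRect() above
class BoxOverlayView(context: Context, attrs: AttributeSet? = null) : View(context, attrs) {

    private val boxPaint = Paint().apply {
        style = Paint.Style.STROKE
        strokeWidth = 4f
        color = Color.GREEN
    }

    private val boxes = mutableListOf<RectF>()

    fun setBoxes(newBoxes: List<RectF>) {
        boxes.clear()
        boxes.addAll(newBoxes)
        invalidate() // redraw with the new boxes
    }

    override fun onDraw(canvas: Canvas) {
        super.onDraw(canvas)
        for (box in boxes) {
            canvas.drawRect(box, boxPaint)
        }
    }
}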

Commonage answered 7/3, 2020 at 0:36 Comment(3)
Based on this I created a GitHub repo for barcode scanning and placing the bounding box in the right position: github.com/mtsahakis/barcode (Kaduna)
This doesn't seem to consider the aspect ratio, does it? (Lodged)
@Lodged this assumes the preview view is shown full screen; if your camera can crop or something then I doubt this will work. (Commonage)
V
2

Thanks @tyczj,

Your answer helped me find my solution. Let me add that if, like me, you are using the front camera for face detection, you need to invert the x axis. Example:

// The front camera preview is mirrored, so the detector's right edge
// becomes the view's left edge and vice versa
val previewWidth = overlayView.width.toFloat()
val newLeft = if (isFrontCamera) previewWidth - (rect.right * scaleX) else rect.left * scaleX
val newRight = if (isFrontCamera) previewWidth - (rect.left * scaleX) else rect.right * scaleX
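
Combining that with the accepted answer, a front-camera-aware version of translateRect might look like this sketch (it assumes the same _scaleX/_scaleY and overlayView as there, plus an isFrontCamera flag):

// Same translation as in the accepted answer, but with the x axis
// mirrored when the front camera is in use
private fun translateRect(rect: Rect): RectF {
    val previewWidth = overlayView.width.toFloat()
    val left = rect.left * _scaleX
    val right = rect.right * _scaleX
    return RectF(
        if (isFrontCamera) previewWidth - right else left,
        rect.top * _scaleY,
        if (isFrontCamera) previewWidth - left else right,
        rect.bottom * _scaleY
    )
}
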
Varese answered 6/1, 2022 at 15:13 Comment(0)
A
1

Please see my answer in "CameraX qrcode scanner detect wrong". Basically, you can use CoordinateTransform to transform coordinates from one CameraX UseCase to another.
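
A sketch of what that can look like with the transform utilities in the androidx.camera:camera-view artifact (the API is marked @TransformExperimental and toPreviewRect is just a name for this example; see the linked answer for details):

import android.graphics.Rect
import android.graphics.RectF
import androidx.annotation.OptIn
import androidx.camera.core.ImageProxy
import androidx.camera.view.PreviewView
import androidx.camera.view.transform.CoordinateTransform
import androidx.camera.view.transform.ImageProxyTransformFactory
import androidx.camera.view.transform.TransformExperimental

// Maps a bounding box from ImageAnalysis (ImageProxy) coordinates
// into PreviewView coordinates
@OptIn(TransformExperimental::class)
fun toPreviewRect(imageProxy: ImageProxy, previewView: PreviewView, boundingBox: Rect): RectF? {
    // Depending on how the image is fed to ML Kit you may also need
    // ImageProxyTransformFactory().apply { isUsingRotationDegrees = true }
    val source = ImageProxyTransformFactory().getOutputTransform(imageProxy)
    val target = previewView.outputTransform ?: return null // PreviewView not laid out yet
    val rect = RectF(boundingBox)
    CoordinateTransform(source, target).mapRect(rect) // transforms rect in place
    return rect
}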

Annulose answered 1/10, 2021 at 14:32 Comment(0)
M
0

The solution now is to use LifecycleCameraController and PreviewView (both from CameraX) together with MlKitAnalyzer. If you connect them to each other and set CameraController.COORDINATE_SYSTEM_VIEW_REFERENCED as the targetCoordinateSystem in MlKitAnalyzer, the coordinates of the barcode bounding box will be automatically translated to PreviewView coordinates. They will still be in pixels though, so you need to translate them further to dp using LocalDensity.current and the toDp() function (in Compose).
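
A rough sketch of that wiring, using barcode scanning as the example (bindScanner is just a name for this sketch; the dp/Compose conversion is omitted, see the linked answer for a complete version):

import android.content.Context
import androidx.camera.mlkit.vision.MlKitAnalyzer
import androidx.camera.view.CameraController
import androidx.camera.view.LifecycleCameraController
import androidx.camera.view.PreviewView
import androidx.core.content.ContextCompat
import androidx.lifecycle.LifecycleOwner
import com.google.mlkit.vision.barcode.BarcodeScanning

// Requires the androidx.camera:camera-mlkit-vision artifact
fun bindScanner(context: Context, lifecycleOwner: LifecycleOwner, previewView: PreviewView) {
    val cameraController = LifecycleCameraController(context)
    val barcodeScanner = BarcodeScanning.getClient()
    val mainExecutor = ContextCompat.getMainExecutor(context)

    cameraController.setImageAnalysisAnalyzer(
        mainExecutor,
        MlKitAnalyzer(
            listOf(barcodeScanner),
            CameraController.COORDINATE_SYSTEM_VIEW_REFERENCED,
            mainExecutor
        ) { result ->
            // Bounding boxes here are already in PreviewView coordinates (pixels)
            val barcodes = result?.getValue(barcodeScanner)
            // barcodes?.forEach { /* draw it.boundingBox, convert to dp if needed */ }
        }
    )

    cameraController.bindToLifecycle(lifecycleOwner)
    previewView.controller = cameraController
}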

For detailed example solution in Jetpack Compose see my answer here. For a good explanation of the separation of concerns between CameraController and PreviewView see this article.

Mohandas answered 21/12, 2023 at 14:49 Comment(0)
G
0

First, understand that the view coordinate system has different dimensions than the captured image, even though both cover the same scene, and the face bounding box coordinates are in the image coordinate system.

So you need to transform one coordinate system into the other by scaling up or down.

Find the display dimensions:

          val displayMetrics = context.resources.displayMetrics
          val screenWidth = displayMetrics.widthPixels
          val screenHeight = displayMetrics.heightPixels
          val density = displayMetrics.density

Get the image size from the ImageProxy provided by the camera API:

          val imageWidth = imageProxy.width
          val imageHeight = imageProxy.height

Find the scale factors by comparing the two:

          val scaleX = screenWidth.toFloat() / imageWidth.toFloat()
          val scaleY = screenHeight.toFloat() / imageHeight.toFloat()

Get the scaled width; it will be used to compute the face bounding box's actual left/right positions in the next steps:

          scaledScreenWidth = screenWidth / scaleX

Get your view (frame overlay) positions:

          // getCircleDiameter() returns the circular frame's diameter in dp,
          // hence the scaling by density to get pixels
          val radius: Float = getCircleDiameter() / 2.toFloat() * density
          val centerX = screenWidth / 2
          val centerY = screenHeight / 2

Find the frame coordinates (converted to image coordinates by dividing by the scale):

          val frameLeft = (centerX - radius) / scaleX
          val frameRight = (centerX + radius) / scaleX
          val frameTop = (centerY - radius) / scaleY
          val frameBottom = (centerY + radius) / scaleY

          // These values are Floats, so use RectF rather than Rect
          frameRect = RectF(frameLeft, frameTop, frameRight, frameBottom)

Find the face coordinates.

Note: for the front camera, since the preview is a mirror image, the actual left and right are obtained by mirroring the detector's right and left edges:

          val faceLeft = scaledScreenWidth - face.boundingBox.right
          val faceRight = scaledScreenWidth - face.boundingBox.left
          val faceTop = face.boundingBox.top.toFloat()
          val faceBottom = face.boundingBox.bottom.toFloat()

Now you have the face and frame coordinates in the same coordinate system, so you can compare them easily.

Note: we converted the view coordinates to image coordinates (rather than the image coordinates to view coordinates) so that this is a one-time step; since the view coordinates don't change, you can compare every newly detected face without recalculating the frame coordinates.
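
For example, the final check can be a simple containment test using the frameRect and face values computed above (a minimal sketch):

          // Both rects are in image coordinates now, so a plain containment
          // check tells you whether the face is inside the frame
          val faceRect = RectF(faceLeft, faceTop, faceRight, faceBottom)
          val faceInsideFrame = frameRect.contains(faceRect)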

Glynda answered 29/4 at 16:53 Comment(0)
