I'm working on a machine learning task and I'm trying to use Blender to generate synthetic images as a training dataset for a neural network. To do this, I need to find the 2D bounding box of each object in the rendered image.
My code, so far, is heavily based on the one suggested in this thread, but that code doesn't take into account whether a vertex is visible or occluded by another object. The desired result is exactly the one described here. I've tried the suggestion given there, but it doesn't work, and I can't tell whether that's because I'm passing the wrong inputs to the ray_cast function (the bpy API really doesn't make this easy) or simply because of the poor performance of the function, as I've read elsewhere. My code, right now, is:
import bpy
import numpy as np


def boundingbox(scene, camera, obj, limit=0.3):
    # Get the inverse of the camera's world transformation matrix.
    matrix = camera.matrix_world.normalized().inverted()

    # Get an evaluated copy of the mesh and transform it into camera space.
    dg = bpy.context.evaluated_depsgraph_get()
    # eval_obj = bpy.context.object.evaluated_get(dg)
    eval_obj = obj.evaluated_get(dg)
    mesh = eval_obj.to_mesh()
    mesh.transform(obj.matrix_world)
    mesh.transform(matrix)

    # Get the camera frame bounding box coordinates.
    frame = [-v for v in camera.data.view_frame(scene=scene)[:3]]
    origin = camera.location

    lx = []
    ly = []
    for v in mesh.vertices:
        co_local = v.co
        z = -co_local.z

        direction = co_local - origin
        # ray_cast returns (result, location, normal, index, object, matrix).
        result = scene.ray_cast(view_layer=bpy.context.window.view_layer,
                                origin=origin, direction=direction)
        intersection = result[0]
        met_obj = result[4]
        if intersection:
            if met_obj.type == 'CAMERA':
                intersection = False

        if z <= 0.0 or (intersection and (result[1] - co_local).length > limit):
            # Vertex is behind the camera or occluded by another object; ignore it.
            continue
        else:
            # Perspective division
            frame = [(v / (v.z / z)) for v in frame]

        min_x, max_x = frame[1].x, frame[2].x
        min_y, max_y = frame[0].y, frame[1].y

        x = (co_local.x - min_x) / (max_x - min_x)
        y = (co_local.y - min_y) / (max_y - min_y)
        lx.append(x)
        ly.append(y)

    eval_obj.to_mesh_clear()

    # The object is not in view if all the mesh verts were ignored.
    if not lx or not ly:
        return None

    min_x = np.clip(min(lx), 0.0, 1.0)
    min_y = np.clip(min(ly), 0.0, 1.0)
    max_x = np.clip(max(lx), 0.0, 1.0)
    max_y = np.clip(max(ly), 0.0, 1.0)

    # The object is not in view if both bounding points fall on the same side.
    if min_x == max_x or min_y == max_y:
        return None

    # Figure out the rendered image size.
    render = scene.render
    fac = render.resolution_percentage * 0.01
    dim_x = render.resolution_x * fac
    dim_y = render.resolution_y * fac

    # Return the box in the form (top-left x, top-left y), (width, height).
    return (
        (round(min_x * dim_x),            # X
         round(dim_y - max_y * dim_y)),   # Y
        (round((max_x - min_x) * dim_x),  # Width
         round((max_y - min_y) * dim_y))  # Height
    )
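For context, this is how I call the function; the 'Camera' and 'Cube' names are just placeholders for my actual scene:

scene = bpy.context.scene
cam = scene.objects['Camera']    # the render camera (placeholder name)
target = scene.objects['Cube']   # the object I want the 2D box for (placeholder name)
box = boundingbox(scene, cam, target)
# box is ((top-left x, top-left y), (width, height)) in pixels,
# or None if the object is not visible in the frame.
print(box)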
I've also tried casting the ray from the vertex to the camera position (instead of the other way around) and using the small-cubes workaround explained here, but to no avail. Could someone please help me figure out how to do this properly, or suggest another strategy? For reference, the vertex-to-camera attempt looked roughly like the sketch below.
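This is only a sketch, not my exact code: the helper name vertex_visible, the small 1e-4 offset, and the limit tolerance are mine, and world_co is assumed to be the vertex position in world space (obj.matrix_world @ v.co), taken before the camera-space transform:

def vertex_visible(scene, camera, world_co, limit=0.3):
    # Cast a ray from the vertex towards the camera and treat the vertex as
    # occluded if the ray hits something that isn't right next to the vertex.
    cam_pos = camera.matrix_world.translation
    direction = (cam_pos - world_co).normalized()
    # Start slightly off the surface so the ray doesn't immediately hit the
    # vertex's own face.
    ray_origin = world_co + direction * 1e-4
    hit, location, normal, index, hit_obj, matrix = scene.ray_cast(
        bpy.context.window.view_layer,  # on Blender 2.91+ the first argument is a depsgraph instead
        ray_origin, direction)
    if hit and (location - world_co).length > limit:
        return False
    return True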