If we know the height of the camera, this becomes a simple matter of trigonometry. Taking angle as the pitch of the pixel's line of sight above the horizontal:
tan(angle) = height-above-camera / depth
so
height-above-camera = depth * tan(angle)
If the camera is known to be 8 ft above the ground, then the actual height of the point seen at a pixel is
height = 8ft + depth * tan(angle)
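As a rough sketch of the formula above in Python: the function name `pixel_height` and its parameters are made up for illustration, and it assumes `depth` is the horizontal distance to the point (if your depth map stores the range along the line of sight instead, you would use sin rather than tan) and that the angle is measured in degrees above the horizontal, negative for pixels below it.

```python
import math

def pixel_height(depth, angle_deg, camera_height=8.0):
    """Height above ground of the point seen at a pixel.

    depth         -- horizontal distance to the point (assumed, see above)
    angle_deg     -- pitch of the line of sight, degrees above horizontal
    camera_height -- camera height above ground, same units as depth
    """
    # height-above-camera = depth * tan(angle)
    height_above_camera = depth * math.tan(math.radians(angle_deg))
    return camera_height + height_above_camera

# A point 20 ft away seen 10 degrees above the horizontal,
# and another seen 10 degrees below it.
print(pixel_height(20.0, 10.0))
print(pixel_height(20.0, -10.0))
```

A pixel exactly on the horizon (angle 0) comes out at the camera height, which is a quick sanity check for the sign convention.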
The whole calculation depends on the quality of the depth map.
Another thing you could exploit is the assumption that the blocks in the depth map end at ground level. You can then run edge detection on the image to find where each block meets the ground.
Here I've used an online Canny edge detector, which picks out the base of the block on the right-hand side. A bit of trig should get the height.
If you can allow interactive use, a better option than image processing is to get the user to click on the base of the object and then on the top of the object. Because the base is at ground level, knowing the pitch of the point at the base and the camera height allows the distance to be calculated; the pitch of the point at the top then gives the height.
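The two-click idea can be sketched like this. It is only an illustration under stated assumptions: the name `object_height` is hypothetical, the pitch of each clicked point is assumed to be already known in degrees above the horizontal (so the base, being on the ground below the camera, has a negative pitch), and the object is assumed to be vertical so base and top are at the same distance.

```python
import math

def object_height(base_pitch_deg, top_pitch_deg, camera_height=8.0):
    """Return (distance, height) of an object standing on the ground.

    base_pitch_deg -- pitch of the clicked base point (negative: below horizon)
    top_pitch_deg  -- pitch of the clicked top point
    camera_height  -- camera height above ground
    """
    base = math.radians(base_pitch_deg)
    top = math.radians(top_pitch_deg)
    if base >= 0:
        raise ValueError("the base must be below the horizon to fix the distance")
    # The base is camera_height below the camera, so
    # tan(-base) = camera_height / distance.
    distance = camera_height / math.tan(-base)
    # Same trig as before for the top of the object.
    height = camera_height + distance * math.tan(top)
    return distance, height

distance, height = object_height(-20.0, 15.0)
print(distance, height)
```

Note that this does not need the depth map at all, so it avoids depending on its quality, but it does rely on the ground being flat between the camera and the object.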
angle or tan(angle)? – Natalya