So the IR pattern that the Kinect projects isn't a regular grid. For an example, check out this blog post. The Kinect itself takes care of turning that pattern into an ordinary depth map, so trying to reason about focal lengths and the like for this system will just dig you into a hole. Your concerns about precision are probably misplaced: the Kinect isn't accurate enough for that level of detail to matter. Having used the Kinect for motion detection, I can tell you there's a lot of noise in the data. If you have a specific situation in mind, you might want to post about it.
Edit: Here's a post showing that the depth values aren't linear, and that more of the precision is concentrated on closer objects. So the farther away you are, the less precise the data is and the more severe the noise becomes. A change of 1 in the returned depth value is next to nothing up close, but farther away it corresponds to a much larger change in distance.
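To make the non-linearity concrete, here is a rough Python sketch. It assumes the raw 11-bit depth values and the approximate conversion formula that circulates in the OpenKinect community; the constants are an empirical fit for one device, not an official calibration, and the function names are made up for illustration. Treat the numbers as ballpark figures only.

```python
# Approximate raw-to-metres conversion (community-derived fit, assumed here;
# your own Kinect will need its own calibration for anything serious).
A = -0.0030711016
B = 3.3309495161


def raw_to_meters(raw):
    """Convert a raw Kinect depth value to an approximate distance in metres."""
    denom = raw * A + B
    if denom <= 0:
        # Beyond this point the fit breaks down (roughly raw > 1084).
        raise ValueError("raw value outside the usable range of this fit")
    return 1.0 / denom


def step_size(raw):
    """Distance covered by a change of 1 in the raw value, in metres."""
    return raw_to_meters(raw + 1) - raw_to_meters(raw)


if __name__ == "__main__":
    for raw in (400, 600, 800, 1000):
        print(f"raw={raw:4d}  distance={raw_to_meters(raw):6.3f} m  "
              f"one-step change={step_size(raw) * 1000:7.2f} mm")
```

Running it shows the one-step change growing quickly with distance, which is why a jitter of a single raw unit is harmless close to the sensor but shows up as visible noise at the far end of the range.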