Matching Kinect Audio with Video

I have a project dealing with video conferencing using the Kinect (or, more likely, four of them). Right now, my company uses these stupidly expensive cameras for our VTC rooms. The hope is that, using a couple of Kinects linked together, we can reduce the costs. The plan is to have four or five of them covering a 180-degree arc so the Kinects can see the entire room/table (still a lot cheaper than our current cameras!). The application would choose a video stream coming from a Kinect based on who at the table is talking. The plan is fine in theory, but I've run into a snag.

As far as I can tell, there is no way to tell which microphone array corresponds to which Kinect Runtime object. I can get an object representing each Kinect using:

using Microsoft.Research.Kinect.Nui;

Device device = new Device();
Runtime[] kinects = new Runtime[device.Count];
for (int i = 0; i < kinects.Length; i++)
    kinects[i] = new Runtime(i);

And every microphone array using:

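using System.Collections.Generic;
using Microsoft.Research.Kinect.Audio;

var source = new KinectAudioSource();
IEnumerable<AudioDeviceInfo> devices = source.FindCaptureDevices();
var audioSources = new List<KinectAudioSource>();
foreach (AudioDeviceInfo device in devices)
{
    KinectAudioSource devSpecificSource = new KinectAudioSource();
    devSpecificSource.MicrophoneIndex = (short)device.DeviceIndex;
    audioSources.Add(devSpecificSource); // keep one source per microphone array
}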

but I cannot find any way to know that Runtime A corresponds to KinectAudioSource B. This isn't a huge problem for the two Kinects I'm using (I'll just guess which is which, and switch them if they're wrong), but when we get up to four or five Kinects, I don't want to need to do any kind of calibration every time the application runs. I've considered assuming that the Runtime and KinectAudioSource objects will be in the same order (Runtime index 0 corresponds to the first AudioDeviceInfo in devices), but that seems risky.

So, the question: is there any way to match a Runtime object with its KinectAudioSource? If not, is it guaranteed that they will be in the correct order so I can match Runtime 0 with the first KinectAudioSource microphone index in devices?

UPDATE: I finally slammed my face against WPF's single-threaded apartment (STA) requirement and the Kinect audio's multithreaded apartment (MTA) requirement enough to get the two to behave together. The problem is that, as far as I can tell, the order of the Kinect Runtime objects and the order of the KinectAudioSources do not line up. I'm in a rather loud lab (I'm one of maybe 40 interns in the room), so it's hard to test, but I'm fairly certain that the order is switched for the two Kinects I have plugged in. I have two Runtime objects and two KinectAudioSource objects. When the first KinectAudioSource reports that a sound is coming from directly in front of it, I'm actually standing in front of the Kinect associated with the second Runtime object. So there's no guarantee that the two orders will line up. So now, to repeat the question: how do I match up a KinectAudioSource object with its Nui.Runtime object? Right now I only have two Kinects hooked up, but since the goal is four or five, I need a concrete way to do this.

UPDATE 2: I brought the two Kinects I have at work back home to play with. Three Kinects, one computer. Fun stuff (it was actually a pain to get them all installed at once, and one of the video feeds doesn't seem to be working, so I'm back to two for now). musefan's answer got me hoping that I had missed something in the AudioDeviceInfo objects that would shed some light on this problem, but no luck. I found an interesting-looking property on Runtime objects called NuiCamera.UniqueDeviceName, but I can't find any link between that and anything in AudioDeviceInfo.

Here is the output from those fields, in the hope that Sherlock Holmes sees the thread and notices a connection:

Console.WriteLine("Nui{0}: {1}", i, nuis[i].NuiCamera.UniqueDeviceName);
//Nui0: USB\VID_0409&PID_005A\6&1F9D61BF&0&4
//Nui1: USB\VID_0409&PID_005A\6&356AC357&0&3

Console.WriteLine("AudioDeviceInfo{0}: {1}, {2}, {3}", audios.IndexOf(device), device.DeviceID, device.DeviceIndex, device.DeviceName);
//AudioDeviceInfo0: {0.0.1.00000000}.{1945437e-2d55-45e5-82ba-fc3021441b17}, 0, Microphone Array (Kinect USB Audio)
//AudioDeviceInfo1: {0.0.1.00000000}.{6002e98f-2429-459a-8e82-9810330a8e25}, 1, Microphone Array (2- Kinect USB Audio)

UPDATE 3: I'm not looking for calibration techniques. I'm looking for a way to match each Kinect camera with its microphone array within the application at runtime, with no prior setup required. Please stop posting possible calibration techniques. The entire point of posting the question was to avoid needing the user to do any setup.

UPDATE 4: WMI definitely seems like the way to go. Unfortunately, I haven't had a lot of time to work on it, as I've been struggling just to get three Kinects to play nicely with each other. Something about USB hubs not being able to handle the bandwidth? I've informed my boss that there doesn't seem to be any easy way to connect three or more Kinects to a regular computer without a blue screen. I might still try to work on this in my free time, but as far as work goes, it's pretty much a dead end.

Thanks for the answers guys, sorry I couldn't post a working solution.

Plainlaid answered 5/7, 2011 at 13:52
I think I heard somewhere that the SDK currently only lets you get audio from one device at a time... I could be wrong on that, but you may want to verify before going too far down this path. - Sacttler
There's a limitation on the skeleton tracking and depth map (you can only get them from the primary Kinect), but as far as I know there aren't any such limitations on the audio. I'll make sure of that soon. - Plainlaid
To anyone reading my comment: I was wrong, you can get depth information from any Kinect. Skeletal information is still limited to the primary Kinect though, and therefore so is the player index information. - Plainlaid
This looks like a very interesting project. Good luck with it. I haven't had a chance to mess with a Kinect yet myself! - Preamble

The API provided by Microsoft Research doesn't actually provide this capability. Kinect is essentially multiple cameras and a microphone array, each with its own driver stack, so there is no linkage to the physical hardware device. The best way to achieve this would be to use the Windows API instead, by way of WMI: take the device IDs you get for the NUI camera and the microphones, and use WMI to find which USB bus they are attached to (as each Kinect sensor has to be on its own bus); then you'll know which device matches what. This will be an expensive operation, so I would recommend doing it on start-up or when the devices are detected, and keeping the information until you know the hardware configuration has changed or the application is reset. Using WMI through .NET is pretty well documented, but here is one article that specifically talks about USB devices through WMI/.NET: http://www.developerfusion.com/article/84338/making-usb-c-friendly/.
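
A minimal sketch of the grouping step described above, assuming (as the answer says) that each Kinect sits on its own USB controller. It relies on WMI's Win32_USBControllerDevice class, where each instance links a controller (Antecedent) to a device attached through it (Dependent); that this also covers devices behind hubs, such as the Kinect's internal one, is an assumption. The class and method names are hypothetical, and the WMI query is only shown in a comment because it needs Windows and System.Management. One gap remains: AudioDeviceInfo.DeviceID is an audio endpoint ID such as {0.0.1.00000000}.{...}, not a USB instance path, so it still has to be translated to the microphone array's PnP instance path before it can be compared; that lookup is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: groups USB devices by the controller they are attached to.
// Input: (Antecedent, Dependent) object paths from WMI's Win32_USBControllerDevice.
// On Windows, with a reference to System.Management, they could be collected with:
//   using (var searcher = new ManagementObjectSearcher("SELECT * FROM Win32_USBControllerDevice"))
//       foreach (ManagementBaseObject link in searcher.Get())
//           links.Add(new KeyValuePair<string, string>(
//               (string)link["Antecedent"], (string)link["Dependent"]));
static class UsbControllerGrouping
{
    // Extracts the key from an object path such as
    //   \\PC\root\cimv2:Win32_PnPEntity.DeviceID="USB\\VID_0409&PID_005A\\6&1F9D61BF&0&4"
    // and turns the escaped double backslashes back into single ones.
    public static string DeviceIdFromObjectPath(string objectPath)
    {
        const string marker = "DeviceID=\"";
        int start = objectPath.IndexOf(marker, StringComparison.Ordinal);
        int end = objectPath.LastIndexOf('"');
        if (start < 0 || end < start + marker.Length)
            throw new FormatException("No DeviceID key in: " + objectPath);
        start += marker.Length;
        return objectPath.Substring(start, end - start).Replace(@"\\", @"\");
    }

    // Controller DeviceID -> DeviceIDs of everything attached through it.
    public static Dictionary<string, List<string>> GroupByController(
        IEnumerable<KeyValuePair<string, string>> links)
    {
        var groups = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        foreach (KeyValuePair<string, string> link in links)
        {
            string controller = DeviceIdFromObjectPath(link.Key);
            List<string> devices;
            if (!groups.TryGetValue(controller, out devices))
            {
                devices = new List<string>();
                groups[controller] = devices;
            }
            devices.Add(DeviceIdFromObjectPath(link.Value));
        }
        return groups;
    }

    // True when both instance paths were found under the same controller.
    public static bool SameController(
        Dictionary<string, List<string>> groups, string deviceA, string deviceB)
    {
        return groups.Values.Any(devices =>
            devices.Contains(deviceA, StringComparer.OrdinalIgnoreCase) &&
            devices.Contains(deviceB, StringComparer.OrdinalIgnoreCase));
    }
}
```

As the answer recommends, the grouping would be built once at start-up (or when devices are added or removed) and cached, since the answer describes this lookup as expensive.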

Susurrate answered 13/7, 2011 at 4:22

Mannimarco,

the only link I see is that a camera's UniqueDeviceName property equals its 'device instance path'.

Doing a little research in Device Manager on my computer, I can tell that the last two numbers at the end of the camera's UniqueDeviceName (0&3, 0&4) are incrementing values (based on controller + port?).

My suggestion is that you sort your list of cameras based on those last digits, and sort your audio devices on their DeviceID property. This way, I suppose, when you iterate over your camera list you can use the corresponding index in the audio device list to match the two together.
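
The sort-and-match heuristic above can be sketched as follows. It is only this heuristic, not a confirmed mapping: that the trailing field of the instance path is a decimal number tied to the port, and that sorting the audio DeviceID strings lines up with it, are both assumptions. The class and method names are hypothetical; the functions take plain strings (the values of NuiCamera.UniqueDeviceName and AudioDeviceInfo.DeviceID) so they can be tried without a Kinect attached.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper for the "sort both lists and match by position" heuristic.
static class PortOrderHeuristic
{
    // Last '&'-separated field of the instance-specific part of a device
    // instance path: "USB\VID_0409&PID_005A\6&1F9D61BF&0&4" -> 4.
    // Assumption: this field is a small decimal number that follows the port.
    public static int TrailingNumber(string instancePath)
    {
        string instanceId = instancePath.Substring(instancePath.LastIndexOf('\\') + 1);
        string lastField = instanceId.Substring(instanceId.LastIndexOf('&') + 1);
        return int.Parse(lastField, NumberStyles.Integer, CultureInfo.InvariantCulture);
    }

    // Sorts cameras by TrailingNumber and audio endpoints by DeviceID (ordinal),
    // then pairs them by position. Returns (UniqueDeviceName, DeviceID) pairs.
    public static List<Tuple<string, string>> Pair(
        IEnumerable<string> cameraNames, IEnumerable<string> audioDeviceIds)
    {
        List<string> cams = cameraNames.OrderBy(name => TrailingNumber(name)).ToList();
        List<string> mics = audioDeviceIds.OrderBy(id => id, StringComparer.Ordinal).ToList();
        if (cams.Count != mics.Count)
            throw new ArgumentException("Camera and microphone array counts differ.");
        return cams.Zip(mics, (cam, mic) => Tuple.Create(cam, mic)).ToList();
    }
}
```

With the two values from the question, this pairs the camera ending in &0&3 (Nui1) with the {1945437e-...} endpoint and the camera ending in &0&4 (Nui0) with the {6002e98f-...} endpoint. Whether that is the physically correct pairing still has to be checked against the hardware.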

Btw, this is my first post so please be gentle if I'm wrong...

Jeanene answered 13/7, 2011 at 14:58

I have had a look at the SDK documentation and, in all honesty, it is not great. Furthermore, I do not have any Kinect devices to test this on.

The first thing I would do, though, is create an output list of all useful property values for each device, then look for matches across the two that could be used as links. For each one I find, I would test whether it does the job.

So I would have a simple console application to output the following property values:

For Each AudioDeviceInfo

  • DeviceID = X
  • DeviceIndex = X
  • DeviceName = X

For Each KinectAudioSource

  • MicrophoneIndex = X

For Each Runtime

  • InstanceIndex = X

Then look for any matches in values. Nothing else in the SDK seems really useful, but there must be some internal logic in the SDK when it returns arrays of AudioDeviceInfo and Runtime.

Anyway, I hope you get it right somehow.

Pantechnicon answered 12/7, 2011 at 15:59
Unfortunately, I set the KinectAudioSource.MicrophoneIndex and Runtime.InstanceIndex fields myself, so those are useless. The rest look like they should be useful, but there's nothing to compare them to. See the update above for the contents of those fields. - Plainlaid

I would get the audio stream from all of them and then compare volume levels. Once you have that, you can determine the "object" or person in the Kinect's 3D space that is actually speaking.

From there you need to determine which cameras this object / person is visible in ...
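
The volume comparison could be sketched as below, assuming each Kinect's audio arrives as 16-bit little-endian mono PCM and that a short buffer is collected from every microphone array over the same time window. The class and method names are hypothetical. It picks the array whose buffer has the highest RMS level.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: finds the loudest of several simultaneously captured buffers.
static class LoudestSource
{
    // RMS level of 16-bit mono PCM samples. BitConverter uses the host byte order,
    // which matches the assumed little-endian data on x86/x64. A trailing odd byte is ignored.
    public static double Rms(byte[] pcm16)
    {
        int samples = pcm16.Length / 2;
        if (samples == 0) return 0.0;
        double sumSquares = 0.0;
        for (int i = 0; i < samples; i++)
        {
            short s = BitConverter.ToInt16(pcm16, i * 2);
            sumSquares += (double)s * s;
        }
        return Math.Sqrt(sumSquares / samples);
    }

    // Index of the buffer with the highest RMS level; the first one wins on ties,
    // and -1 is returned for an empty list.
    public static int IndexOfLoudest(IList<byte[]> buffers)
    {
        int best = -1;
        double bestLevel = double.NegativeInfinity;
        for (int i = 0; i < buffers.Count; i++)
        {
            double level = Rms(buffers[i]);
            if (level > bestLevel)
            {
                best = i;
                bestLevel = level;
            }
        }
        return best;
    }
}
```

On its own this only tells you which microphone array hears the speaker best; relating that array to a camera is still the matching problem from the question.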

Yeah, this is one complex project... Kinect is pretty awesome though. I don't know much about the API, but doesn't it give you distances and such for people?

good luck with it :)

Urethrectomy answered 12/7, 2011 at 16:4
This would require calibrating every time the application was started, or at least on each new computer. This is something I'm trying very hard to avoid. - Plainlaid

I would just calibrate the Kinects one by one, writing the unique device identifier pairs (camera ID, microphone ID) to a file. In your application you can then use that file at startup to match microphone instances with camera instances (i.e. create a table that relates each camera instance to one microphone instance). As the camera and microphone inside the Kinect probably each have their own USB interface IC (connected via an internal USB hub), there is technically no way to relate the two without prior calibration, since the two device identifiers are probably completely unrelated. You might also want to put labels on the Kinect units and reference these labels in your initialization file.
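
A minimal sketch of such a calibration file, assuming a simple tab-separated text format with one camera ID and one microphone ID per line; the format and the names are hypothetical, not something the SDK provides. Tabs are used as the separator because the IDs themselves contain characters such as & and \.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical calibration store: one "cameraId<TAB>microphoneId" line per Kinect.
static class CalibrationFile
{
    // Writes the camera -> microphone table.
    public static void Save(TextWriter writer, IDictionary<string, string> cameraToMic)
    {
        foreach (KeyValuePair<string, string> pair in cameraToMic)
            writer.WriteLine(pair.Key + "\t" + pair.Value);
    }

    // Reads the table back; blank lines are skipped, malformed lines are rejected.
    public static Dictionary<string, string> Load(TextReader reader)
    {
        var table = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length == 0) continue;
            string[] parts = line.Split('\t');
            if (parts.Length != 2)
                throw new FormatException("Bad calibration line: " + line);
            table[parts[0]] = parts[1];
        }
        return table;
    }
}
```

In practice the writer and reader would wrap a file (for example via File.CreateText and File.OpenText). Per the question author's comment below, NuiCamera.UniqueDeviceName changes when a Kinect is moved to another USB port, so a file like this stays valid only while the cabling does not change.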

Hellbox answered 12/7, 2011 at 23:46
After plugging a single Kinect into a number of different USB ports, I can say that NuiCamera.UniqueDeviceName depends on the USB port the Kinect is plugged into. I've also seen AudioDeviceInfo.DeviceID change when switching USB ports, although it did switch back to the original value at times... odd. This solution would be much more work for calibrating than Wardy's answer... it would just be an epic pain. I'm looking for something that doesn't require calibration. - Plainlaid
Accessing the WMI device driver interface, as LewisBenge wrote, might be an idea; you definitely need to get some information from the USB chip of the camera/mic. With the hardware setup you have right now, in my opinion, it is impossible to go without some kind of unique-identifier calibration or similar. Another possibility might be to open the Kinect, remove the internal USB hub and build a simple USB hub yourself that attaches a data field to the two USB streams, identifying the camera and the microphone as one unit. With the right MCU this might not be too difficult to achieve. - Hellbox

Sounds interesting, maybe you need some "automatic calibration".

Maybe with some "remote power switches for each USB connection" (an I/O card connected to the USB power lines), you could power on one Kinect after another automatically, and then you would know which microphone belongs to which camera.
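
The bookkeeping side of this idea could be sketched as below: snapshot the camera and microphone IDs, power on one more Kinect, snapshot again, and pair whatever appeared. The power switching itself depends on the hardware and is not shown; the names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: pairs the devices that appear after one Kinect is powered on.
static class PowerOnDiff
{
    // IDs present in "after" but not in "before" (case-insensitive).
    public static List<string> Appeared(IEnumerable<string> before, IEnumerable<string> after)
    {
        var seen = new HashSet<string>(before, StringComparer.OrdinalIgnoreCase);
        return after.Where(id => !seen.Contains(id)).ToList();
    }

    // One power-on step: exactly one new camera and one new microphone array
    // are expected; anything else means the step cannot be trusted.
    public static Tuple<string, string> PairStep(
        IEnumerable<string> camerasBefore, IEnumerable<string> camerasAfter,
        IEnumerable<string> micsBefore, IEnumerable<string> micsAfter)
    {
        List<string> newCams = Appeared(camerasBefore, camerasAfter);
        List<string> newMics = Appeared(micsBefore, micsAfter);
        if (newCams.Count != 1 || newMics.Count != 1)
            throw new InvalidOperationException(
                "Expected exactly one new camera and one new microphone array.");
        return Tuple.Create(newCams[0], newMics[0]);
    }
}
```

Repeating the step once per Kinect builds the full camera-to-microphone table without the user having to do anything by hand.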

Or something like that...

Regards! Stefan

Lilywhite answered 13/7, 2011 at 14:11