NTT is developing technology to accurately reproduce the relative positions of participants in remote video calls. The system is intended to create a sense of distance and position, as if both people were in the same space. It combines 3D video with wavefront-synthesised audio to give as realistic a positional impression as possible.
An array of 64 microphones picks up the audio from the talker, and these channels are then reproduced at the listener end by an array of loudspeakers located behind the video display.
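The article does not describe NTT's signal processing, so the following is only a generic illustration of how a loudspeaker array can make a sound appear to come from a position behind the display. It is a minimal delay-and-attenuate sketch in Python, not NTT's method. The loudspeaker count (assumed to match the 64 microphones), the 5 cm spacing, the source position and the function names are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def speaker_positions(count=64, spacing=0.05):
    """Return (x, y) positions of a hypothetical linear array centred on x = 0, lying on y = 0."""
    offset = (count - 1) * spacing / 2.0
    return [(i * spacing - offset, 0.0) for i in range(count)]


def delays_and_gains(source, speakers):
    """Per-speaker delay (seconds) and relative gain for a virtual point source.

    Each loudspeaker replays the talker's signal delayed by the extra travel
    time from the virtual source, and attenuated in proportion to distance,
    so the combined wavefront approximates one radiating from that point.
    """
    distances = [math.hypot(sx - source[0], sy - source[1]) for sx, sy in speakers]
    nearest = min(distances)
    delays = [(d - nearest) / SPEED_OF_SOUND for d in distances]
    gains = [nearest / d for d in distances]
    return delays, gains


# Assumed example: talker appears 1 m behind the display, 0.5 m right of centre.
speakers = speaker_positions()
delays, gains = delays_and_gains((0.5, -1.0), speakers)
```

In this simplified model the loudspeaker closest to the virtual talker plays first and loudest, and the others follow with progressively longer delays and lower gains. A real wavefront synthesis system would apply further filtering and correction that is not shown here.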
The audio element was demonstrated successfully in 2012. The latest innovation adds a Kinect camera at the talker end to create a 3D image for the listener, giving an accurate positional view to accompany the synthesised audio.
At the moment the demonstration uses a green screen and a digitally created backdrop for the talker, but in the future the intention is to use full real video to give a complete picture.
One limitation of the video component of the system is that it only works for a two-way call. A realistic multipoint call would need additional audio processing and more viewpoints added to the 3D display.