Projection techniques are being used to give robots human faces in an application that could dramatically change telepresence. Researchers in Munich have joined forces with Japanese scientists to develop a system that projects a 3D image of a face onto the back of a plastic mask. A computer controls the voice and facial expressions, creating Mask-bot, a human-like plastic head.
The project is part of research being carried out at CoTeSys, Munich’s robotics Cluster of Excellence and developed by a team at the Institute for Cognitive Systems (ICS) at TU München in collaboration with AIST, the National Institute of Advanced Industrial Science and Technology in Japan.
Mask-bot can already reproduce simple dialog. For example, when researcher Dr. Takaaki Kuratate says “rainbow”, Mask-bot flutters its eyelids and responds: “When the sunlight strikes raindrops in the air, they act like a prism and form a rainbow”. Mask-bot also moves its head slightly and raises its eyebrows.
“Mask-bot will influence the way in which we humans communicate with robots in the future,” predicts Prof. Gordon Cheng, head of the ICS team.
Faces can be changed on demand, and the projected features remain realistic when viewed from various angles, including the side.
There is a 12 cm gap between the high-compression, 0.25x fish-eye lens with a macro adapter and the face mask. The CoTeSys team therefore had to ensure that an entire face could actually be beamed onto the mask at this short distance.
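A back-of-the-envelope calculation makes the constraint concrete. Assuming the projected face is roughly 18 cm tall (an assumed figure; the article only gives the 12 cm gap), the beam must spread to a full vertical angle of about 74 degrees, which is firmly fish-eye territory:

```python
# Rough geometry check: the beam spread needed to cover a face-sized
# mask from 12 cm away. The 18 cm face height is an assumption, not a
# figure from the researchers.
import math

gap_cm = 12.0          # lens-to-mask distance stated in the article
face_height_cm = 18.0  # assumed height of the projected face

half_angle = math.degrees(math.atan((face_height_cm / 2) / gap_cm))
print(f"required full vertical beam angle: {2 * half_angle:.0f} degrees")  # ~74
```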
Mask-bot is also bright enough to function in daylight, thanks to a particularly powerful yet compact projector and a coat of luminous paint sprayed on the inside of the plastic mask. “You don’t have to keep Mask-bot behind closed curtains,” said Kuratate. This part of the system could soon be deployed in video conferences. “Usually, participants are shown on screen. With Mask-bot, however, you can create a realistic replica of a person that actually sits and speaks with you at the conference table. You can use a generic male or female mask, or you can provide a custom-made mask for each person,” he explained.
A new program enables the system to convert a normal two-dimensional photograph into a correctly proportioned projection for a three-dimensional mask. Further algorithms provide the facial expressions and voice.
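The article does not describe the program’s internals, but one plausible way to realize such a mapping is a piecewise warp between facial landmarks in the photograph and calibrated projector coordinates on the mask. The sketch below, using scikit-image, is illustrative only; all landmark values and file names are hypothetical.

```python
# Illustrative sketch: warp a 2D portrait into projector coordinates so
# that, once beamed through the lens, features land correctly on the
# curved mask. All coordinates and file names here are hypothetical.
import numpy as np
from skimage import io, transform

# Control points: landmark positions in the source photo vs. the
# projector-frame positions found while calibrating against the mask.
photo_pts = np.array([[120, 80], [200, 80], [160, 140], [160, 210]])  # eyes, nose, mouth
proj_pts = np.array([[100, 95], [220, 95], [160, 165], [160, 240]])

# warp() expects the inverse map: output (projector) coords -> input (photo) coords.
tform = transform.PiecewiseAffineTransform()
tform.estimate(proj_pts, photo_pts)

photo = io.imread("face_photo.jpg")
frame = transform.warp(photo, tform, output_shape=(480, 640))
io.imsave("projector_frame.png", (frame * 255).astype(np.uint8))
```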
To replicate facial expressions, Kuratate developed a talking-head animation engine: a system in which a computer filters an extensive database of human facial motion data collected with a motion-capture system. It selects the facial expression that best matches each specific sound, or phoneme, as it is spoken. The computer extracts a set of facial coordinates from each of these expressions, which it can then assign to any new face, bringing it to life. Emotion-synthesis software delivers the visible emotional nuances that indicate, for example, when someone is happy, sad or angry.
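The engine itself is not public, but the selection-and-retargeting idea can be sketched in a few lines. Everything below (the phoneme labels, the three-landmark “faces”, the retargeting rule) is a toy stand-in for the extensive motion-capture database the researchers describe.

```python
# Toy sketch of phoneme-driven animation: look up the expression that
# matches each phoneme, then transfer its coordinates to a new face.
# The landmark data here is invented for illustration.
import numpy as np

# Stand-in motion-capture database: phoneme -> 2D landmark coordinates
# (left mouth corner, right mouth corner, chin), in arbitrary units.
MOCAP_DB = {
    "AA": np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.6]]),  # wide-open jaw
    "M":  np.array([[-0.8, 0.0], [0.8, 0.0], [0.0, -0.9]]),  # closed lips
    "IY": np.array([[-1.2, 0.1], [1.2, 0.1], [0.0, -1.0]]),  # spread lips
}
SRC_NEUTRAL = np.array([[-0.9, 0.0], [0.9, 0.0], [0.0, -1.0]])

def retarget(expression, src_neutral, dst_neutral):
    """Apply the expression's displacement from the source speaker's
    neutral pose onto a new face's neutral pose -- one simple way to
    assign the extracted coordinates to any new face."""
    return dst_neutral + (expression - src_neutral)

# Animate a new face saying the phoneme sequence for "me": /M/ /IY/
new_face_neutral = np.array([[-1.1, 0.0], [1.1, 0.0], [0.0, -1.2]])
for ph in ["M", "IY"]:
    frame = retarget(MOCAP_DB[ph], SRC_NEUTRAL, new_face_neutral)
    print(ph, frame.round(2).tolist())
```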
Mask-bot can realistically speak content typed on a keyboard in English or Japanese, with German coming soon. A text-to-speech system converts the text to audio signals, producing a female or male voice that can then be set to quiet or loud, happy or sad.
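As a rough analogue of that pipeline (using the off-the-shelf pyttsx3 library, not the researchers’ own system), the typed-text-to-audio step with an adjustable volume and voice looks like this:

```python
# Rough analogue of the typed-text-to-speech step using pyttsx3; this is
# not the project's own system, just a readily available stand-in.
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("volume", 0.4)  # "quiet" delivery; 1.0 would be loud

# Choose a female-sounding voice if the platform provides one.
for voice in engine.getProperty("voices"):
    if "female" in (voice.name or "").lower():
        engine.setProperty("voice", voice.id)
        break

engine.say("When the sunlight strikes raindrops in the air, "
           "they act like a prism and form a rainbow.")
engine.runAndWait()
```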