Remote Presence

Why Remote Presence?

Bringing the continents together through technological innovation and research is a founding mission of the Namibia plugin campus and the Future Tech Lab. Enriching communication media to create a greater sense of being together requires engaging all of the senses, at a level of detail that lets the subtleties of non-verbal communication be appreciated. Beyond this, such systems offer a wide range of affordances, some of them unexpected, which we are actively researching.

Using the system in an elementary school for a remote lesson

Current quality: a demonstration of our current level of image quality, compressed to 17 Mbps with a latency of 80 ms.

The “Sense of Presence”, or remote sense of presence, has a number of dimensions that have been studied in psychology and education (Community of Inquiry):

Sense of Social Presence
Sense of Spatial Presence
Sense of Cognitive Presence
Sense of Teacher Presence

To achieve remote presence, we are beginning with the technology for visual immersion in a remote location, targeting the spatial and social dimensions. Specifically, this requires capturing a remote location live in full 3D, transmitting that capture over a network, and reconstructing the space to present to the user(s). This is variously known as 3D teleimmersive video, free-viewpoint video, 3D reconstruction or “Holoportation”. The same information can later be used to provide the ability to touch and interact with objects, or it can be analysed using AI.

One key objective is for the system to be passive and unobtrusive within the space, allowing a diversity of activities with any age group or technological background. In the near future, we hope that Augmented Reality headsets and displays will become increasingly capable and less bulky, allowing a fuller appreciation of what we are working towards.

What Does It Involve?

Capture Session

We use a set of cameras, organised in pairs and surrounding a room, to capture the space from all directions. By comparing the two views in each pair and measuring how far each object appears to shift between them, we can estimate the location of every object in the scene. This technique is called Stereo Matching or Stereo Correspondence.

Depth estimates
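To illustrate stereo matching, here is a minimal Python sketch; it is not taken from our implementation. It assumes the two images of a pair are already rectified, so a point in the left image appears on the same row of the right image, shifted by a disparity d. Matching uses a simple sum-of-absolute-differences (SAD) window, a stand-in for whatever matching cost a real system uses. Depth then follows from the standard pinhole relation Z = f * B / d, where f is the focal length in pixels and B the baseline between the two cameras. All function names are hypothetical.

```python
# Minimal stereo block-matching sketch on one rectified scanline.
# Assumptions (illustrative only): rectified images, SAD matching cost,
# pinhole depth relation Z = f * B / d.

def sad(left, right, x, d, half):
    """Sum of absolute differences between the window centred at x in the
    left row and the window centred at x - d in the right row."""
    return sum(abs(left[x + k] - right[x - d + k]) for k in range(-half, half + 1))

def disparity_row(left, right, max_disp, half=1):
    """Best disparity (in pixels) for every pixel of a scanline.
    Pixels too close to the border for a full window are left at 0."""
    width = len(left)
    disp = [0] * width
    for x in range(half, width - half):
        # Only test shifts that keep the right-hand window inside the row.
        candidates = range(0, min(max_disp, x - half) + 1)
        disp[x] = min(candidates, key=lambda d: sad(left, right, x, d, half))
    return disp

def depth_from_disparity(d, focal_px, baseline_m):
    """Pinhole stereo depth Z = f * B / d; zero disparity means 'unknown / far'."""
    return focal_px * baseline_m / d if d > 0 else float("inf")
```

On a textured pattern shifted two pixels between the views, the matcher recovers a disparity of 2; flat, textureless regions give ambiguous matches, which is one reason raw depth estimates contain errors that must be removed.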

Using all of these video cameras and our depth estimates, we combine the views into a single model, remove errors from the original depth estimates, and colour the model using the original colour video. One possible use at this point is to cut people out of their original room using “Z-Keying” and mix in some slides, as shown in the images below. Z-Keying avoids the need for a green curtain or similar arrangement, so the technology can be added to an otherwise standard room.

Original view

Coloured model
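Z-Keying can be sketched as a per-pixel depth comparison: each output pixel takes whichever layer is nearer to the camera. The minimal Python sketch below is illustrative only. It assumes the captured view comes with a per-pixel depth map in metres (0 marking pixels with no valid depth) and that the slides are placed as a flat virtual layer at a chosen depth; the function name and the list-of-rows pixel representation are hypothetical.

```python
# Minimal Z-keying sketch: composite a captured person over a slide using
# depth instead of a green screen. Assumptions (hypothetical): images are
# lists of rows, depth is in metres, 0 means "no valid depth", and the slide
# is a flat virtual layer at slide_depth_m.

INVALID = 0.0

def z_key(person_rgb, person_depth, slide_rgb, slide_depth_m):
    """Per pixel, keep whichever layer is nearer to the camera.
    Room background farther than the slide plane is replaced by the slide,
    while anything in front of it (e.g. the presenter) stays visible."""
    out = []
    for rgb_row, depth_row, slide_row in zip(person_rgb, person_depth, slide_rgb):
        row = []
        for rgb, z, slide_px in zip(rgb_row, depth_row, slide_row):
            nearer = z != INVALID and z < slide_depth_m
            row.append(rgb if nearer else slide_px)
        out.append(row)
    return out
```

With the slide placed at 3 m, a pixel of the presenter at 1.5 m is kept, while a wall pixel at 4 m and a pixel with no valid depth are both replaced by the slide.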

Once we have a model, we can encode it and send it over a network, then generate a new 2D video at the other end of the link. This video can be rendered from any viewpoint within the original space, rather than being restricted to looking through one of the original cameras. Currently we are able to compress a 3D video like the one above to around 30 Mbps.
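Generating a view from a new position can be illustrated with the simplest possible renderer: back-project each depth pixel to a 3D point with the pinhole camera model, then project the coloured points into a virtual camera, keeping the nearest point per pixel with a z-buffer. This point-splatting sketch is an assumption made for illustration and does not describe our encoder or renderer. The virtual camera is given as a rotation R and translation t mapping world coordinates to camera coordinates (for a camera centred at c, t = -R c); all names are hypothetical.

```python
import math

def backproject(u, v, z, fx, fy, cx, cy):
    """Pixel (u, v) with depth z -> 3D point in that camera's frame (pinhole model)."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def render_view(points, colours, R, t, fx, fy, cx, cy, width, height):
    """Project coloured world-frame points into a virtual camera where
    p_cam = R @ p_world + t. A z-buffer keeps the nearest point per pixel;
    pixels that no point lands on stay None (holes)."""
    image = [[None] * width for _ in range(height)]
    zbuf = [[math.inf] * width for _ in range(height)]
    for p, colour in zip(points, colours):
        # Rigid transform into the virtual camera's frame.
        x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        if z <= 0:
            continue  # point is behind the virtual camera
        # Perspective projection, rounding to the nearest pixel centre.
        u = math.floor(fx * x / z + cx + 0.5)
        v = math.floor(fy * y / z + cy + 0.5)
        if 0 <= u < width and 0 <= v < height and z < zbuf[v][u]:
            zbuf[v][u] = z
            image[v][u] = colour
    return image
```

Here the frame of the capturing camera is taken as the world frame. Pixels that no point lands on remain empty, so a practical renderer also needs hole filling or surface reconstruction.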

How Can It Be Used?

Since we can generate views from any position, we can also generate two views at the locations of a viewer's eyes. This allows the viewer to wear a Virtual Reality or Augmented Reality headset, see with both eyes and get an immersive 3D experience of the remote location. They can move their head and walk around the space as if they were there.

Virtual Reality demo

Stereo rendering
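The two eye viewpoints can be derived from the tracked head pose and the viewer's interpupillary distance (IPD). The sketch below is a simplification assumed for illustration: it considers only head yaw about the vertical y axis and uses an assumed default IPD of 63 mm, whereas a real headset runtime supplies full per-eye poses. The function name is hypothetical.

```python
import math

def eye_positions(head_pos, yaw_rad, ipd_m=0.063):
    """Return (left_eye, right_eye) world positions for a head centred at
    head_pos and rotated by yaw_rad about the vertical (y) axis.
    Assumes +x is the viewer's right when yaw is 0 and that the eyes sit
    ipd/2 either side of the head centre."""
    # Rotating the unit x axis about y gives the head's "right" direction.
    right = (math.cos(yaw_rad), 0.0, -math.sin(yaw_rad))
    half = ipd_m / 2.0
    left_eye = tuple(h - half * r for h, r in zip(head_pos, right))
    right_eye = tuple(h + half * r for h, r in zip(head_pos, right))
    return left_eye, right_eye
```

Each eye position then becomes the centre of one virtual camera, and the two rendered images are shown to the corresponding eyes.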

We will also allow multiple people to be within the same physical space, or combine people from different remote locations into a common virtual space. An older demonstration of group capture is illustrated below, and a demonstration video from December 2019 shows the concept. This would allow the shared learning community (teachers, learners and others in different places) to explore a phenomenon together.

Group capture
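Combining people from different locations can be sketched as a rigid placement of each site's reconstruction in a shared coordinate frame, followed by merging the results. The minimal sketch below is illustrative only: it assumes each site's points are already expressed in that room's own frame with y pointing up, and that a layout, here a hypothetical yaw and offset per site, has been chosen for the shared space.

```python
import math

def place_site(points, yaw_rad, offset):
    """Rigidly place one site's point cloud in the shared space:
    rotate about the vertical (y) axis by yaw_rad, then translate by offset."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    placed = []
    for x, y, z in points:
        # Standard rotation about y: x' = c*x + s*z, z' = -s*x + c*z.
        xr = c * x + s * z
        zr = -s * x + c * z
        placed.append((xr + offset[0], y + offset[1], zr + offset[2]))
    return placed

def merge_sites(sites):
    """sites: list of (points, yaw_rad, offset) tuples, one per location.
    Returns a single point list in the shared virtual space."""
    shared = []
    for points, yaw, offset in sites:
        shared.extend(place_site(points, yaw, offset))
    return shared
```

For example, turning one site by 180 degrees and offsetting it lets two groups face each other across a shared virtual room.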

The quality of the capture is improving rapidly, and it will not be long before these techniques are available for real use. Once this is in place, we will move on to audio and touch, and to improving the headsets to reduce their bulk and intrusiveness. We will also co-design and explore alternative display and interaction options that do not involve headsets at all.

We intend this remote presence system to form one part of the supporting technology for our education programmes and research activities.

Contact

Nicolas Pope, Lead Developer, Postdoctoral Researcher, nicolas.pope@utu.fi

Team

Nicolas Pope

Sebastian Hahta

Marko Lahti