Chapter 3: Supporting Remote Collaboration While Keeping Workflow

Today, talking to people in remote places by mobile phone is ubiquitous, and researchers are only beginning to explore how natural interaction with artefacts can be made just as ubiquitous between distributed sites. People often use objects when working alone or within a group. It is therefore natural that researchers and developers try to achieve such interactivity with distributed technology. Yet a number of technological issues have to be overcome in order to create a system that allows easy, straightforward interaction with shared objects across distributed sites, as simple as using a mobile phone.

Chapter 2 introduced the notion that (Co-)Presence and SHC are very important for good collaboration between people. Subconsciously as well as consciously, people recognise nuances in gestures, postures and verbal cues. Furthermore, the feeling of presence can enhance the performance of a team and help its members to recognise SHC cues. Chapter 2 also briefly introduced the notion of a fragmented workflow, using the example of desktop CVEs. Hindmarsh et al. [2000] observed that a desktop user, when directed to an object by gesture and verbal comment, tends first to visually locate the other user and then to follow that user’s gesture to locate the object under discussion. Establishing this focus on common ground can take considerable time (more than 20 seconds) and thereby fragments the workflow. CVEs are only one example of technology that allows people to communicate and collaborate remotely. This chapter discusses how other technologies allow distributed collaboration and why most of them fail to remove, or at least reduce, the fragmentation problem. It concludes with a summary of the different participation frames that each technology supports. These frames distinguish whether distributed object collaboration supports looking into, reaching into or stepping into someone’s environment. First, however, the parameters for the following discussion are defined.

3.1 Interaction Metaphors

Using a phone or text message to communicate can complicate collaboration, because misunderstandings may arise from cues that cannot be communicated through the medium. Modern video-conferencing systems give more flexibility and support for non-verbal communication, such as facial expressions. Using video-conferencing, however, one only “looks into each other’s world”, which limits the range within which one can move around shared objects and still be seen. In addition, shared object manipulation in current video-conferencing systems is restricted to a window-based 3D environment projected alongside the video window in which the remote participant is seen, for example AG-Juggler [Gonzalez, 2005]. The generic look-into metaphor is shown in Figure 3-1a. It is often extended, as is the case with AG-Juggler, by placing avatars in the shared space to represent the users. However, the movement of the avatars is controlled indirectly through mouse and keyboard, and thus natural non-verbal communication is lost. In the case of Access Grid (e.g. [Childers et al., 2000]), the users are left to associate the video representations of remote participants in one window with the avatars in, say, an AG-Juggler window. In particular, it is hard to see how someone is interacting with an object when the operator, observer and object are each in separate windows, as in Access Grid.

Figure 3-1: Interaction metaphors / participation frames: (a) look-into, (b) reach-into, (c) step-into

Attempts have been made to overcome the limitations of a “look-into” environment and to reproduce a co-located setting more closely. One such alternative participation frame is “reaching-into each other’s world” (Figure 3-1b). An example is the “Office of the future”, where a user sits at a desk with a video screen attached to it [Raskar et al., 1998]. The user can thereby interact with other co-located users as well as with the 3D video reconstruction of a remote collaborator. As users move, their locations are tracked so that the images are rendered from the correct perspective. The goal is for the remote room to be seen as an extension of the local room. In combination with augmented virtual objects (on the screen), it is possible for all participants to interact with these objects [Baker et al., 2003; Barré et al., 2005; Raskar et al., 1998; Yang et al., 2004]. The technology can therefore be considered as supporting a reach-into frame of participation. A constraint of this type of tele-immersion is that movements are restricted to the desk, and it is still hard, compared to a real face-to-face meeting, to point to an object in the other’s workspace unless the object is between the collaborators.

One way to avoid reference problems and other restrictions is to merge the real and virtual worlds by using augmented reality technology. The users are no longer restricted by position and share the same environment with co-located users. A challenge, however, is the tracking and registration of real objects, people and the environment, which is needed for virtual models to be overlaid precisely on the real world.

Reaching-into someone’s environment can be very beneficial for a number of tasks (e.g. [Ulhaas et al., 2001]), and augmented reality technology is a good example of how to realise this. But some tasks require more than just the representation of a few collaborators and a few objects of interest. These tasks require the representation of a whole environment, and the best way to interact with such an environment is to step-into it. For example, to train for or simulate a rescue operation in a hazardous environment, the look and feel of the space is important, and CVEs are a good technology for creating such a space. More importantly, they even allow the creation of environments that are impossible to (re‑)create in the real world, such as micro-spaces. This means that a system that allows users to share a common virtual space and to “step-into each other’s world” (Figure 3-1c), such as an immersive CVE, provides the closest resemblance to co-location. In a CVE, remote people and shared objects can be situated in a shared synthetic environment, in which one can navigate around and interact with computer-generated representations of objects and other participants. Thus, whereas tele-conferencing systems allow people to look into each other’s space, CVEs allow people and data to be situated in a shared spatial and social context.

3.2 Summary

In natural face-to-face collaboration, people use speech, gesture, gaze, and non-verbal cues to communicate. In many cases, the surrounding physical world and objects also play an important role, particularly in design and spatial collaboration tasks (Chapter 2). Real objects support collaboration through their appearance, physical affordances, such as size and weight, use as semantic representations, and ability to create reference frames for communication. In contrast, most interfaces for remote collaboration create an artificial separation between the real world and the shared digital task space. People looking at a projection screen or crowding around a desktop monitor are often less able to refer to objects or use natural communication behaviours. Observations of the use of large shared displays have found that simultaneous interaction rarely occurs due to the lack of software support and input devices for co-present collaboration [Pedersen et al., 1993].

Audio-only interfaces remove the visual cues vital for conversational turn taking, leading to increased interruptions and overlap, difficulty disambiguating between speakers and determining another’s willingness to interact. With tele-conferencing, subtle user movements or gestures cannot be captured, there are few spatial cues among participants, the number of participants is limited by monitor resolution, and participants cannot readily make eye contact. Speakers also cannot know when people are paying attention to them or when it might be permissible to hold side conversations.

Immersive and augmented reality technologies, for their part, face a number of challenges that must be overcome before they are widely used for collaboration. Although shared interaction with objects is well supported, the capture of nuances in body postures or gestures depends on the technological effort invested. Further, gaze provides an important non-verbal cue in normal face-to-face and remote collaboration, yet current-generation displays or glasses cover the user’s eyes.

Table 3-1: Support for remote collaboration and interaction
	verbal	non-verbal	shared objects	environment	naturalness	being-there	workflow
	natural speech	gestures, postures, facial expressions	artefacts of interest, person & non-person related	set the scene for natural collaboration and communication	intuitive performance of task	feeling of togetherness	continuation without technical interruptions
face-to-face	natural	natural	shared / natural	shared by all	intuitive	high	synchronous and fluent
audio-conference	natural	NA	not shared	hear-into others	reduced	limited	interrupted through descriptions
groupware	NA	limited	asynchronous	shared	reduced	limited	asynchronous
tele-conference	natural	natural	not shared / natural	look-into others	intuitive	medium	continuous (if in frame)
tele-operation	natural	natural	semi-shared / naturalistic	look-into others	intuitive	medium	continuous (if in frame)
tele-immersion	natural	natural	shared / naturalistic	reach-into shared	intuitive	high	continuous (if in frame)
augmented reality	natural	naturalistic	shared / naturalistic	reach-into shared	naturalistic (with right tracking)	high	interrupted through visibility
typical CVEs	natural	unnatural	shared / unnatural	look-into shared	reduced	medium	interrupted through orientation
immersive CVEs	natural	naturalistic	shared / naturalistic	physically situated in shared	naturalistic (with right tracking)	high	continuous

Table 3-1 summarises the support for remote collaboration and interaction discussed in this chapter. It is possible to reduce the limitations and restrictions of computer mediation by enabling more flexible and natural interaction. Although the naturalness and intuitiveness of face-to-face communication are hard to achieve, immersive virtual environments provide additional and novel ways to strengthen the weak areas of remote collaborative interaction.