Design Considerations of Effective Haptic Device for Human-Robot Interaction under Virtual Reality and Embodiment

Review Article | DOI: https://doi.org/10.31579/2693-4779/198

Dongchan Lee *

 IAE, 175-28, Goan-ro 51 beon-gil, Baegam-myeon, Cheoin-gu, Yongin-si, Gyeonggi-do, Korea 

*Corresponding Author: Dongchan Lee, IAE, 175-28, Goan-ro 51 beon-gil, Baegam-myeon, Cheoin-gu, Yongin-si, Gyeonggi-do, Korea

Citation: Dongchan Lee, (2024), Design Considerations of Effective Haptic Device for Human-Robot Interaction under Virtual Reality and Embodiment, Clinical Research and Clinical Trials, 9(4); DOI:10.31579/2693-4779/198

Copyright: © 2024, Dongchan Lee. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: 10 March 2024 | Accepted: 15 March 2024 | Published: 25 March 2024

Keywords: smart haptic device; HRI (human-robot interaction); virtual reality; virtual embodiment

Abstract

Human-robot interaction (HRI) technology enables robots to assess interactive situations and user intentions so that they can plan appropriate responses and actions, facilitating smooth communication and collaboration. HRI combines perception, judgment, and expression technologies, whereby robots emulate human perceptual, cognitive, and expressive capabilities to bring vitality to the field of robotics. This core technology extends beyond personal service robots to professional service robots and various service sectors, exerting a profound impact on the entire field of integrated robotics industries.
This paper focuses on judgment and expression technologies for haptic suit-based remote control. HRI technology is pivotal for breathing life into robots by mimicking human perception, cognition, and expressive functions, and its implications stretch across diverse applications, from personal service robots to professional service robots, with significant ramifications for the entire robot integration industry.

1. Introduction

HRI (human-robot interaction) technology is a foundational convergence technology aimed at achieving natural communication and seamless collaboration between humans and robots. To accomplish this goal, HRI research develops methods and technologies that enable robots to accurately assess interactive situations and user intentions, taking context into account to express appropriate behaviors and responses. HRI technology shares a similarity with HCI (human-computer interaction) in its approach, in that it considers human perceptual, cognitive, and behavioral characteristics when designing interfaces and machines. However, HRI differs in that robots are inherently tangible and autonomous intelligent systems, fundamentally distinct from computers. Consequently, the bidirectional nature of interaction is more pronounced in robotics, and the diverse control levels required for robot operation set HRI apart from HCI in research topics and technical approaches [1-5].

Major existing research encompasses situation-awareness technology that infers user behavioral intentions, emotional states, personality, and so on; situation-prediction technology that anticipates the future states of interaction with users; and action-planning technology that decides appropriate expressive content and methods in context. Key functionalities include the generation and execution of action recipes, integrated 3D sensing for localization and mapping, learning for robot control, dynamic object tracking, and more [1,2]. This requires a knowledge framework that generates robot action plans and executes them by understanding changes in the robot's state and the environment in real time [3,4]. It offers a mechanism to integrate and infer real-time changes in the robot's state, environmental variations, and diverse external knowledge, using probabilistic inference methods such as Markov logic networks (MLN) and Bayesian networks (BN) to model uncertain situations. Long-term interactions between users and robots, leading to the formation of social bonds, are studied to understand how interaction in daily life differs between children and adults. In addition, research is underway on sentiment, personalization, and growth through the storage of long-term interaction episodes and personalized learning from them [5]. The conceptual organization of HRI technologies is shown in Figure 1, and the details are as follows.
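
As a toy illustration of the probabilistic inference mentioned above, the sketch below fuses two uncertain observations (a detected gesture and detected speech) into a posterior estimate of user intention with Bayes' rule over a minimal two-layer network. All variables and probabilities are invented for illustration; they are not taken from the cited systems.

```python
# Minimal Bayesian-network-style inference of user intention.
# All priors and conditional probabilities are illustrative assumptions.

P_INTENT = 0.3                         # prior P(user intends to interact)
P_GESTURE = {True: 0.8, False: 0.1}    # P(gesture detected | intent)
P_SPEECH = {True: 0.7, False: 0.05}    # P(speech detected | intent)

def posterior_intent(gesture: bool, speech: bool) -> float:
    """P(intent | observations), assuming the observations are
    conditionally independent given the intention variable."""
    def joint(intent: bool) -> float:
        pg = P_GESTURE[intent] if gesture else 1.0 - P_GESTURE[intent]
        ps = P_SPEECH[intent] if speech else 1.0 - P_SPEECH[intent]
        prior = P_INTENT if intent else 1.0 - P_INTENT
        return pg * ps * prior

    evidence_true = joint(True)
    return evidence_true / (evidence_true + joint(False))

print(f"P(intent | gesture, no speech) = {posterior_intent(True, False):.3f}")
```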


Figure 1: Conceptual organization of HRI technologies

(a) Perception Technology: In the context of HRI technology, 'perception' refers to the function of gathering perceptual information about the interactive counterpart and the surrounding environment through sensory organs. Perception technology utilizes various sensor devices to collect visual, auditory, and tactile signals, and analyzes the data patterns relevant to interaction. Representative perception technologies include face recognition, expression recognition, gesture recognition, posture recognition, object recognition, object tracking, speech recognition, sound-source recognition, timbre recognition, and touch-gesture recognition. To implement perception technology, a diverse range of sensing devices is employed, such as cameras (RGB, thermal imaging, infrared, RGB-D, etc.), microphones, inertial sensors, touch sensors, actuators based on the concept of artificial muscles, and touch panels [6,7].

(b) Judgment Technology: 'Judgment' is the function of interpreting the meaning of the perceptual information collected in the perception stage, understanding the interactive situation and the intentions of the counterpart, and planning expressions and actions appropriate to the context. HRI judgment technology can be considered a field that has not yet been systematically established: it is grounded in the processes underlying the formation, operation, and development of the human mind, and there is currently no systematic theory that fully explains these processes. Judgment technology encompasses a wide range of cognitive functions, including the structure and operation of memory, the representation of and inference over knowledge, changes in emotion, problem-solving, learning, and development, making it a fundamental and technically demanding area [6,7].

(c) Expression Technology: 'Expression' is the function of effectively and clearly manifesting planned actions and expressions in the interaction context through various means of representation, facilitated by the judgment function. For clear and effective expression, mechanical structures and control architecture technologies are required to manifest expressive actions. Additionally, behavior control technology is needed to generate expressive actions that can be naturally perceived by humans. Key research in expression technology includes the development of robots with high degrees of freedom and rich emotional expression in the face or head, expressive gestures using arms and the body, and conveying intentions and emotional expressions through Text-To-Speech (TTS) and sound. [6,7]

Moreover, there is extensive recent research on large-scale multimodal representation technology that integrates and synchronizes various devices and expressive content to enhance the clarity and richness of expression. This involves altering emotional models in response to internal and external stimuli. External stimuli such as visual, tactile, auditory, temperature, and olfactory inputs, along with internal stimuli such as hunger, self-preservation, and exploratory desires, act on the three axes of the emotional space (pleasantness, activation, certainty). This alteration influences emotions, allowing the expression of seven emotions in total: Ekman's basic six ('Happiness', 'Anger', 'Disgust', 'Fear', 'Sadness', 'Surprise') plus 'Neutral'. Research in this domain is grounded in human cognitive models of human and social interaction. It explores the manifestation of internal motivations and demonstrates the ability to respond to the environment effectively and adaptively through natural emotional responses derived from environmental information. The research aims to develop cognitive systems for knowledge representation and knowledge extension, focusing on predicting situations through interaction with humans and addressing the novelty and uncertainty that arise in real environments. Self-understanding refers to representing the differences in beliefs and the uncertainties that the robot possesses, while self-extension means expanding the robot's knowledge using experience gained through action planning and execution [6,7].
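
A concrete, hedged sketch of the emotional-space idea: a point on the (pleasantness, activation, certainty) axes is mapped to the nearest of the seven expressible emotions by Euclidean distance. The prototype coordinates below are assumptions chosen purely for illustration, not values from the cited work.

```python
import math

# Illustrative prototype positions in the (pleasantness, activation,
# certainty) emotional space; the exact placements are assumptions.
EMOTION_PROTOTYPES = {
    "Happiness": ( 0.8,  0.5,  0.6),
    "Anger":     (-0.6,  0.8,  0.5),
    "Disgust":   (-0.7,  0.2,  0.4),
    "Fear":      (-0.6,  0.7, -0.6),
    "Sadness":   (-0.7, -0.5,  0.2),
    "Surprise":  ( 0.2,  0.9, -0.8),
    "Neutral":   ( 0.0,  0.0,  0.0),
}

def classify_emotion(state):
    """Return the prototype emotion nearest to the current state vector."""
    return min(EMOTION_PROTOTYPES, key=lambda e: math.dist(state, EMOTION_PROTOTYPES[e]))

# An unexpected stimulus pushes activation up and certainty down.
print(classify_emotion((0.1, 0.85, -0.7)))  # -> "Surprise"
```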

2. Haptic Suit for Human-Robot Interactions

Haptic suits are a type of wearable device designed to provide users with tactile sensations, including pressure, texture, and temperature, enhancing the sense of touch. These suits establish direct contact with the user's skin to deliver tactile feedback, enriching interactions in virtual environments and creating a more immersive experience with virtual objects and surroundings [11-13]. The sense of touch engaged by haptic suits is considered a crucial non-verbal communication channel in human interaction. Haptic suits are therefore regarded not merely as an information and communication technology but as a technology with social functions in human-computer or human-machine interaction; in particular, their social function can be understood through the non-verbal social interactions of human society [14]. In virtual reality environments, a haptic suit provides direct feedback to the user's body, enhancing the sense of touch, and various haptic suit products have been introduced to the market. Full-body suits envelop the entire body and use electrical stimulation to convey various tactile sensations: by applying a low current close to the skin, they stimulate muscles to create sensations of vibration and pressure. Some suits also incorporate temperature-sensation capabilities, allowing users to feel changes in temperature within the virtual environment. These features contribute to a more realistic and immersive virtual experience. Alternatively, wireless haptic vests can be worn on the upper body without covering the entire torso; using haptic motors, they transmit vibrations, enabling users to feel physical interactions such as impacts, explosions, and contacts occurring in the virtual environment [6,7].

The concept of "presence" refers to a psychological phenomenon that often occurs when experiencing mediated virtual environments, defined as the "sense of being with another" or the "sense of being there." While presence itself is often simplistically defined as the "sense of being there," establishing the concept theoretically and measuring it with survey tools has posed numerous challenges of conceptual and operational definition arising from this phrase. More precisely, presence can be defined as the "perceptual illusion of non-mediation." According to this definition, two fundamental conditions must be met for presence to arise: 1) the use of human sensory organs, and 2) the existence of technology mediating between the human and the experience. In other words, first, conscious effort using senses such as sight, smell, taste, hearing, and touch is necessary, and during this effort there must be an illusion that the virtual experience is perceived as real. Second, although technology is always interposed between the human and the experience, in the process of feeling presence the technology itself should go unnoticed, much as a person accustomed to wearing glasses may not consciously perceive the glasses while wearing them. This account provides a crucial psychological mechanism for explaining why users of virtual reality content, especially those acting through avatars, come to experience the virtual body as their own rather than as a mere representation. The key insight lies in the phenomenon whereby visual information perceived by the eyes takes precedence over proprioceptive information, the body's response to external stimuli applied to muscles or joints. In virtual reality, if the avatar and the user's perspective align, and the manipulation of virtual controllers closely corresponds to real-world actions, a high level of ownership and a strong subjective sense of virtual embodiment can be achieved.

3. Haptics for Prediction and Movement Assessment in Human-Robot Interaction

The fundamental approach for assessing movement involves the following steps: To evaluate motor skills during the execution of activities, the initial step is to identify the specific exercise the user is currently engaged in. Subsequently, the sensor data is segmented, and the initiation and conclusion of each repetition are determined. Segmentation is a crucial step for training probabilistic movement models that can recognize errors in exercise execution and subsequently deliver haptic feedback. The classification of exercises is based on the starting pose of each exercise. However, this approach is constrained by the variability in starting poses across the exercise pool and necessitates short breaks between repetitions. Alternatively, a training plan could be defined with predetermined exercises for users to follow, but this may be overly restrictive. Consequently, we opt for an approach where users can freely choose their exercises. For exercise classification, we employ support vector machines (SVM) to train models capable of distinguishing between exercise starting poses, repetitions, and movements unrelated to any exercise. [28]
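
A minimal sketch of this classification step using scikit-learn's SVC; the synthetic arrays stand in for pose feature vectors (e.g., flattened joint angles), and the feature dimensionality and class layout are assumptions made for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for pose features: classes 0 and 1 are the starting
# poses of two exercises; class 2 is movement unrelated to any exercise.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 12)) for c in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 50)

# RBF-kernel SVM with standardized features, standing in for the trained models.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)

pose = rng.normal(loc=1.0, scale=0.3, size=(1, 12))
print("predicted class:", clf.predict(pose)[0])
```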

Figure 2: General movement exercise prediction flowchart [28]

Once the exercise has been predicted and the repetitions segmented, we train probabilistic movement models for the detection of exercise execution errors. Probabilistic movement models, which essentially represent a distribution over trajectories, have demonstrated promising results in robotics. They are well suited to human exercise assessment, as they can learn specific trajectories or movements from expert demonstrations. Leveraging the advantages of imitation learning, we construct probabilistic movement models for each body part, using the joint positions obtained from the reconstructed avatar. The haptic suit plugin already estimates the human pose and applies the movements to a standardized avatar, eliminating the need to account for the user's height and limb lengths. This allows motor skills to be compared between subjects irrespective of their height. The flowchart for movement assessment is depicted in Figure 3 [28].

Figure 3: Movement assessment flowchart [28]


Figure 4: Probabilistic model of trajectories, from a feature space (left column) to trajectories (right column). (A) Generative model: equally spaced (radial) basis functions are amplitude-scaled by a feature vector to approximate a one-dimensional trajectory. (B) Learning the correlations between multi-dimensional input trajectories.
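
Panel (A) of Figure 4 can be sketched in a few lines: equally spaced Gaussian (radial) basis functions are amplitude-scaled by a weight vector, fitted here by least squares, to approximate a one-dimensional trajectory. The demonstration trajectory and basis settings are illustrative assumptions.

```python
import numpy as np

# Approximate a 1-D trajectory with equally spaced radial basis functions,
# in the spirit of the generative model in Figure 4(A). Illustrative data.
T = np.linspace(0.0, 1.0, 100)
trajectory = np.sin(2 * np.pi * T) * np.exp(-T)        # demonstration signal

n_basis, bandwidth = 10, 0.05                          # assumed settings
centers = np.linspace(0.0, 1.0, n_basis)
Phi = np.exp(-(T[:, None] - centers[None, :]) ** 2 / (2 * bandwidth))
Phi /= Phi.sum(axis=1, keepdims=True)                  # normalized basis

# The feature (weight) vector that amplitude-scales the basis functions.
w, *_ = np.linalg.lstsq(Phi, trajectory, rcond=None)
reconstruction = Phi @ w
print("max reconstruction error:", np.abs(reconstruction - trajectory).max())
```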

The concept of imitation learning facilitates the real-time detection of exercise execution errors without the need to predefine specific error classes: we identify execution errors by comparing the user's observed motor skills with a probabilistic model derived from expert demonstrations. Unlike classification methods, which offer limited information on specific execution errors and require a priori knowledge of the possible error classes, we train probabilistic movement models for each inertial measurement unit (IMU) or joint. Probabilistic models also allow us to flag subpar exercise performance as soon as a joint position exceeds the standard deviation of the reference execution. In particular, this approach lets us assess the severity of an execution error, enabling stronger haptic feedback for larger errors.
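
A compact sketch of that thresholding idea: the mean and standard deviation of expert repetitions define a reference corridor, and the amount by which the user's trajectory exceeds it is mapped to a haptic feedback intensity. The synthetic trajectories and the intensity scaling are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T = np.linspace(0.0, 1.0, 50)

# Expert demonstrations: 20 time-normalized repetitions of one joint
# trajectory (synthetic here, standing in for recorded avatar joints).
experts = np.sin(np.pi * T) + rng.normal(0.0, 0.02, size=(20, 50))
mean_traj, std_traj = experts.mean(axis=0), experts.std(axis=0)

def feedback_intensity(user_traj, mean, std, k=1.0):
    """Flag samples deviating more than k standard deviations; normalize
    the excess to a 0..1 haptic intensity per time sample."""
    excess = np.clip(np.abs(user_traj - mean) - k * std, 0.0, None)
    return excess / (excess.max() + 1e-9)

user = np.sin(np.pi * T) + 0.3 * (T > 0.6)   # execution error late in the rep
intensity = feedback_intensity(user, mean_traj, std_traj)
print("peak feedback at time index:", int(intensity.argmax()))
```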

Additionally, functional electrical stimulation (FES) involves applying electrical current pulses to excitable tissue to induce artificial contractions, enhancing or replacing motor functions in individuals with neurological impairments.

Working principle: The stimulation process involves applying electrical current through a pair of electrodes positioned on the skin above sensory-motor structures. The electric field created between the two electrodes (anode and cathode) induces an ion flux in the tissue (Figure 5). Specifically, the anode, functioning as the positive electrode, imparts a positive charge to the cell membranes of neighboring neurons, resulting in an accumulation of negative ions and subsequent membrane hyperpolarization. In contrast, the cathode, serving as the negative electrode, attracts positive ions, leading to depolarization in the underlying membrane region. If the depolarization reaches a critical threshold, an action potential is generated, indistinguishable from a physiological one. This action potential propagates to the neuromuscular junction, causing muscle fibers to contract. It is essential to note that the charge threshold required for generating action potentials in muscle fibers is significantly higher than that for neurons. As a result, electrical stimulation predominantly activates nerves rather than muscles, underscoring the importance of intact lower motor neurons for the effectiveness of FES [18,28].

Figure 5: Neurophysiological principles of FES

Muscle recruitment: During a contraction induced by functional electrical stimulation, muscle fibers are recruited in a manner distinct from physiological contractions. Primarily, motor units are recruited based on geometrical activation, starting from superficial layers at low current levels and progressing to deeper layers with increasing current amplitude [19]. Additionally, while voluntary movements involve sequential activation of motor units (asynchronous recruitment), FES recruits all motor units simultaneously (synchronous recruitment) [20,28]. In asynchronous recruitment, illustrated in Figure 6, motor units collaborate to maintain constant tension during muscle contraction (tetanic contraction), with adjacent units activated at a frequency of 6-8 Hz. This asynchronous recruitment helps mitigate the onset of muscle fatigue. Conversely, synchronous recruitment in FES requires much higher stimulation rates, up to 20-40 Hz, leading to the elevated rate of muscle fatigue associated with FES approaches [19]. Furthermore, FES follows a non-physiological recruitment pattern by activating fast-twitch fibers before slow-twitch ones. This occurs because the larger-diameter axons innervating fast-twitch fibers are more influenced by the stimulation-induced electric field than the smaller-diameter axons of slow-twitch fibers: the wider spacing between nodes of Ranvier in fast-twitch fibers results in larger induced transmembrane voltage changes at the same charge level [18]. However, fast-twitch fibers fatigue more quickly, contributing to the increased fatigue rate characteristic of FES-induced muscle contractions [19,28].

Figure 6: Summation of tension in motor units (MU) during asynchronous recruitment.
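
The frequency dependence of twitch summation is easy to illustrate numerically: modeling a single twitch as an alpha function and superposing twitches at different stimulation rates shows the force ripple shrinking and the mean force rising as the rate approaches tetanic values. The twitch shape and time constant are illustrative, not physiological fits.

```python
import numpy as np

dt, t_end = 0.001, 1.0
t = np.arange(0.0, t_end, dt)
tau = 0.05                                   # assumed twitch time constant, s
twitch = (t / tau) * np.exp(1.0 - t / tau)   # alpha function, peak ~1.0

def summed_force(freq_hz):
    """Linearly superpose twitches triggered at the stimulation rate."""
    force = np.zeros_like(t)
    for onset in np.arange(0.0, t_end, 1.0 / freq_hz):
        i = int(onset / dt)
        force[i:] += twitch[: len(t) - i]
    return force

for f in (5, 10, 30):                        # below and above fusion
    force = summed_force(f)
    tail = force[len(t) // 2 :]
    print(f"{f:>2} Hz: peak {force.max():.2f}, ripple {tail.max() - tail.min():.2f}")
```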

Pulse Shapes: The generation of the ionic flux is induced by the variation in the electric field. A relatively rapid rising edge is crucial for the current to induce excitation, necessitating a properly shaped wave or pulse [21]. Figure 7 illustrates some of the most common pulse waves used in functional electrical stimulation. These waveforms are categorized as monophasic and biphasic. Monophasic waveforms involve repeated unidirectional pulses, typically cathodic, while biphasic waveforms consist of repeated pulses with a cathodic phase followed by an anodic one [22,28]. In biphasic configurations, the positive pulse counterbalances the negative one, resulting in a net injected charge of zero and preventing potential damage at the electrode-tissue interface [22,28]. Within biphasic configurations, various pulse shapes exist, including symmetric, asymmetric, and balanced asymmetric. Symmetric pulses consist of two identical phases in terms of duration and amplitude but with opposite polarities. Conversely, asymmetric pulses involve phases with different durations and/or amplitudes [23]. Balanced asymmetric shapes are characterized by selecting the parameters such that the total charge delivered to the body during the leading phase equals the total charge removed during the trailing phase, despite differing amplitude and duration [23,28].

Figure 7: Examples of pulse shapes commonly used in functional electrical stimulation.
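
Since the charge per phase is simply amplitude times duration, charge balance can be checked numerically. The sketch below compares a symmetric, a balanced asymmetric, and an unbalanced biphasic pulse; all numbers are illustrative.

```python
# Net injected charge for biphasic pulse shapes; cathodic amplitudes are
# negative. 1 mA x 1 us = 1 nC. All values are illustrative assumptions.

def net_charge(phases):
    """phases: list of (amplitude_mA, duration_us) pairs."""
    return sum(a * d for a, d in phases)   # nC

pulses = {
    "symmetric":           [(-20.0, 200.0), (20.0, 200.0)],
    "balanced asymmetric": [(-20.0, 200.0), (5.0, 800.0)],
    "unbalanced":          [(-20.0, 200.0), (5.0, 400.0)],
}

for name, phases in pulses.items():
    print(f"{name:>20}: net charge = {net_charge(phases):+.0f} nC")
```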


Stimulation Parameters: FES pulses are characterized by three parameters, as illustrated in Figure 8: pulse amplitude, pulse duration (or pulse width, PW), and pulse frequency [23]. Pulse amplitude represents the magnitude of the stimulation and directly influences the specific type of nerve fibers responding to it. As mentioned earlier, larger nerve fibers in close proximity to the stimulation electrode are recruited first [23]. Pulse duration, or pulse width, refers to the time duration of a single phase of the pulse. The strength of a pulse is determined by its charge level, defined as the product of pulse amplitude and duration. Therefore, the required duration for an effective pulse varies inversely with amplitude; in other words, to generate the same induced response, an increase in pulse duration requires a lower amplitude, and vice versa [26]. Each pulse with a proper charge level, inducing an action potential, produces a muscle twitch characterized by a sharp rise in force followed by a slower return to the relaxed state. Stimulation frequency is the rate at which stimulation pulses are delivered. Increasing the frequency leads to temporal summation of twitches, resulting in a higher mean generated force compared to that of single twitches [23]. Once the pulse frequency surpasses a certain value (typically 20 Hz), a sustained contraction (tetanic contraction) is achieved, in which individual twitches are no longer distinguishable. Tetanic contraction, achieved with frequency values ranging from 20 to 50 Hz, is desirable in FES applications to provide high-quality movement. However, pulse frequency should not be increased excessively, as this accelerates the onset of muscle fatigue; there is thus a trade-off in choosing this parameter [23]. Typical working values include a frequency between 20 and 50 Hz, a pulse width between 100 and 500 μs, and an amplitude between 10 and 125 mA [22,28].

Figure 8: Functional electrical stimulation parameters.
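
The inverse relationship between amplitude and duration is classically captured by Weiss's strength-duration law, in which the threshold amplitude is the rheobase scaled by (1 + chronaxie / PW). The sketch below evaluates it against the typical working ranges quoted above; the rheobase and chronaxie values are illustrative assumptions, not measurements.

```python
RHEOBASE_MA = 5.0     # assumed threshold amplitude for a very long pulse
CHRONAXIE_US = 250.0  # assumed PW at which the threshold doubles

def threshold_amplitude(pw_us: float) -> float:
    """Weiss strength-duration law: threshold falls as pulse width grows."""
    return RHEOBASE_MA * (1.0 + CHRONAXIE_US / pw_us)

def in_typical_range(amp_ma, pw_us, freq_hz):
    """Check against the typical working values quoted in the text."""
    return 10 <= amp_ma <= 125 and 100 <= pw_us <= 500 and 20 <= freq_hz <= 50

for pw in (100, 250, 500):
    amp = threshold_amplitude(pw)
    print(f"PW {pw:>3} us -> threshold {amp:5.1f} mA,"
          f" in typical range at 30 Hz: {in_typical_range(amp, pw, 30)}")
```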

Stimulator Circuit: Electrical stimulators can deliver pulses controlled by either voltage or current. Voltage-controlled stimulators maintain a constant desired voltage between the electrodes, regardless of variations in tissue resistance. In instances of inadequate skin-electrode contact, resulting in increased resistance, these stimulators may experience a decrease in current, leading to a reduced muscle response; however, they do not pose a potential harm to the skin. On the other hand, current-controlled stimulators provide constant current pulses. In cases where the effective electrode surface area is reduced, the current density increases with a rise in the voltage level, and this elevated current density could potentially lead to skin burns [24].

Motion capture module: The motion capture module utilizes inertial measurement unit (IMU) sensors, which are integrated into the suit and fixed in place (as illustrated in Figure 9). These sensors track, record, and monitor the movements and positioning of the user, generating a digital representation of the user in the form of an avatar. This technology finds applications in various fields such as animation creation (e.g., games, movies), performance monitoring and capture (e.g., sports), as well as ergonomics and human-factors testing for research and data analysis [22,23,28].
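
One common way to fuse IMU signals into an orientation estimate is a complementary filter, which blends integrated gyroscope rates (smooth but drifting) with accelerometer-derived angles (noisy but drift-free). This single-axis sketch shows the generic technique, not the suit's documented algorithm; the blend factor is a tuning assumption.

```python
def complementary_filter(angle_deg, gyro_dps, accel_angle_deg, dt, alpha=0.98):
    """Blend gyro integration with the accelerometer angle for one axis."""
    return alpha * (angle_deg + gyro_dps * dt) + (1.0 - alpha) * accel_angle_deg

# Toy stream: a stationary sensor whose true angle is 30 degrees.
angle, dt = 0.0, 0.01
for _ in range(500):
    gyro_rate = 0.0          # deg/s from the gyroscope
    accel_angle = 30.0       # deg, from the gravity vector
    angle = complementary_filter(angle, gyro_rate, accel_angle, dt)

print(f"converged angle: {angle:.1f} deg")   # approaches 30.0
```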

Electrical stimulation module: The electrical stimulation module incorporates dry textile voltage-controlled electrodes, embedded in the suit at strategic anatomical locations, as depicted in Figure 9. The electrodes are paired into channels, with each channel corresponding to a specific muscle: each channel comprises an anode and a cathode, and some channels share the same anode. The electrodes can deliver neuromuscular electrical stimulation (NMES) pulses, inducing artificial muscle contractions, and transcutaneous electrical nerve stimulation (TENS) to replicate haptic sensations. Through this system, the haptic suit can provide physical feedback aligned with the visual simulation experienced in a virtual reality environment [22,23,28].

Figure 9: Smart Haptic Feedback Suit [28]
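
In software, the channel layout described above might be represented as a simple map from channel to muscle, electrode pair, and stimulation mode, with shared anodes expressed as repeated anode IDs. Every identifier below is hypothetical.

```python
# Hypothetical channel map: each channel pairs a cathode with an anode over
# a target muscle; channels 1 and 2 share anode "A1", as described above.
CHANNELS = {
    1: {"muscle": "biceps_r",  "cathode": "E1", "anode": "A1", "mode": "NMES"},
    2: {"muscle": "triceps_r", "cathode": "E2", "anode": "A1", "mode": "NMES"},
    3: {"muscle": "forearm_r", "cathode": "E3", "anode": "A2", "mode": "TENS"},
}

def channels_for_mode(mode):
    """Select channels by mode (NMES contraction vs TENS haptic sensation)."""
    return [ch for ch, cfg in CHANNELS.items() if cfg["mode"] == mode]

print("haptic (TENS) channels:", channels_for_mode("TENS"))
```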

Biometry module: The biometry module incorporates photoplethysmography (PPG) technology, offering data on the user's heart rate in beats per minute (BPM) and pulse rate variability (PRV). This functionality facilitates the development of interactive virtual reality training content that dynamically adjusts to the participant, providing personalized and tailored experiences. [22,23,28]
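
Given PPG peak timestamps, heart rate follows from the mean inter-beat interval, and a simple pulse-rate-variability measure such as RMSSD follows from successive interval differences. The timestamps below are synthetic placeholders, not recorded data.

```python
import numpy as np

# Synthetic PPG peak times (seconds); real input would come from the sensor.
peak_times_s = np.array([0.00, 0.82, 1.66, 2.46, 3.30, 4.10, 4.94])

ibi = np.diff(peak_times_s)                                 # inter-beat intervals, s
bpm = 60.0 / ibi.mean()                                     # heart rate
rmssd_ms = np.sqrt(np.mean(np.diff(ibi * 1000.0) ** 2))     # PRV (RMSSD)

print(f"heart rate: {bpm:.1f} BPM, PRV (RMSSD): {rmssd_ms:.1f} ms")
```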

Benefits & Limitations: FES-based treatment presents several advantages in the rehabilitation process. It enables active muscle contractions, contributes to muscle-strength improvement, prevents disuse-related muscle atrophy, reduces spasticity and spasms, enhances the energy-efficient use of proximal limb muscles, and reduces the energy expenditure associated with post-stroke activities [25]. Despite its peripheral application, studies have shown that FES can induce neurological changes and potentially aid motor relearning when combined with residual voluntary inputs from the patient. This phenomenon, known as the "carry-over effect," is explained by Rushton's hypothesis and highlights FES's unique feature of activating nerve fibers both orthodromically and antidromically, unlike physiological activation, which operates exclusively orthodromically [22,26,28].

However, FES has some limitations, including the non-linear relationship between injected current and induced muscle contraction, and the early onset of fatigue in stimulated muscles. This is attributed to the synchronous and inverted recruitment of motor units compared to physiological activation, restricting the long-term applicability of FES [27,28].

4. Discussion and Conclusion

The judgment and expression technology for human-robot interaction (HRI) is a crucial technology that endows robots with vitality by enabling them to assess interaction situations and user intentions, discern appropriate responses and actions, and thereby communicate and collaborate with humans. This technology is essential for the commercialization of service robots and outlines the development direction of HRI. Initially, research in HRI technology focused on individual functional units based on foundational technologies, aiming to improve performance in specific functionalities such as face recognition, speaker recognition, gesture recognition, sound-source tracking, and human tracking. However, these technologies were limited to well-structured environments, with low usability due to insufficient consideration of consumer demands in an immature market. As demand for HRI technology in real-world applications has increased across various fields, the research direction has shifted toward development focused on real-world applications.

In the future, recognition technology is anticipated to transition from unit-function-based continuous monitoring to recognition that integrates environmental sensors and distributed resources. Following recognition, perception, judgment, and expression will be organically integrated around service scenarios to provide services. Moreover, HRI technology is expected to evolve into an open and market-oriented form, in which market participants and technology providers share information and knowledge, enhancing responsiveness to actual services through the efficient use of computing resources. Robot services will take on a form in which knowledge and resources are shared, reused, and virtualized, much like the web. In this manner, robots and avatars can interact in virtual spaces, providing a core technology applicable across fields such as education, healthcare, entertainment, and social safety, in which humans and robots share spaces.

Figure 10: Development direction of HRI technology

Acknowledgement

This research was supported by the Technology Innovation Program (20019115) funded by the Korea Planning & Evaluation Institute of Industrial Technology (KEIT) and the Ministry of Trade, Industry & Energy (MOTIE, Korea).

References
