ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane

Shivendra Agrawal, Suresh Nayak, Ashutosh Naik, Bradley Hayes·May 30, 2024

Summary

ShelfHelp is a socially assistive robotic cane designed to empower visually impaired individuals in grocery shopping by assisting with vision-independent manipulation tasks. The system combines a visual product locator, a planner for verbal guidance, and a computer vision pipeline that includes product detection using YOLOv5 and a fine-grained manipulation guidance system. It addresses challenges in store navigation and product retrieval through Markov Decision Process-based guidance, offering both continuous and discrete modes. A human subjects study with 15 participants demonstrated the system's effectiveness, with discrete guidance requiring fewer commands and yielding faster retrieval times than continuous guidance and human assistance. The system, which works offline and without barcode scanning, received positive feedback for its usability and its potential to increase independence, privacy, and safety for visually impaired individuals. Future research will focus on incorporating semantic information and improving the interactive aspect of the guidance.


Paper digest

What problem does the paper attempt to solve? Is this a new problem?

The paper aims to address the issue of empowering individuals with vision impairments to perform vision-independent manipulation tasks using a socially assistive robotic cane. It focuses on enhancing the independence of visually impaired individuals during shopping by reducing their reliance on sighted assistance and mitigating the privacy concerns associated with traditional support mechanisms. While the concept of using technology to assist visually impaired individuals is not new, the specific problem of providing vision-independent manipulation guidance in shopping scenarios, including navigation, product retrieval, and product examination, is a novel and important area of research.


What scientific hypothesis does this paper seek to validate?

This paper seeks to validate several scientific hypotheses related to the manipulation tasks performed with a socially assistive robotic cane:

  • The study confirms that participants retrieved products with significantly fewer commands in the discrete mode than in the continuous mode.
  • It also confirms that participants could retrieve products significantly faster in the discrete mode than in the continuous mode.
  • The research found no statistical difference in the net hand movement caused by either planner, so the hypothesis related to net hand movement could not be confirmed.
  • Additionally, the study revealed that the human caller caused significantly higher net movement than either of the proposed guidance planners.

What new ideas, methods, or models does the paper propose? What are the characteristics and advantages compared to previous methods?

The paper proposes a novel verbal guidance solution that learns a mapping from language commands to human hand movements and casts guidance as a Markov Decision Process (MDP) that can be solved with reinforcement learning techniques. This approach provides fine-grained manipulation guidance in scenarios like kitchen countertops or grocery stores, where exhaustive tactile search is inefficient because similar items are densely packed. The system addresses the limitations of existing solutions such as Be My Eyes, which depend on volunteer availability, scale poorly, raise privacy concerns, and require an active data connection.
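To make the MDP formulation concrete, here is a minimal sketch of how verbal guidance could be planned once command effects are known: states are discretized 1-D hand offsets from the target, actions are verbal commands whose effect is the mean hand displacement learned from demonstrations, and value iteration finds the command sequence minimizing the number of utterances. All state bounds, command names, and displacement values are illustrative assumptions, not the paper's exact design.

```python
# States: hand offset from the target in cm; actions: verbal commands whose
# effect on the hand is a mean displacement learned from demonstrations.
# Numbers are illustrative placeholders, not from the paper.

OFFSETS = list(range(-20, 21))          # state space: offset in cm
GOAL = 0                                # aligned with the target
COMMANDS = {                            # action -> mean hand displacement (cm)
    "move right 10": +10,
    "move right 2": +2,
    "move left 10": -10,
    "move left 2": -2,
}

def clamp(s):
    return max(min(s, 20), -20)

def value_iteration(gamma=0.95, iters=200):
    """Compute a policy mapping each offset to the best verbal command."""
    V = {s: 0.0 for s in OFFSETS}
    for _ in range(iters):
        for s in OFFSETS:
            if s == GOAL:
                V[s] = 0.0
                continue
            # reward of -1 per issued command favors short command sequences
            V[s] = max(-1.0 + gamma * V[clamp(s + d)] for d in COMMANDS.values())
    policy = {}
    for s in OFFSETS:
        if s != GOAL:
            policy[s] = max(
                COMMANDS,
                key=lambda c: -1.0 + gamma * V[clamp(s + COMMANDS[c])],
            )
    return policy

policy = value_iteration()
print(policy[-12])   # a rightward command for a hand 12 cm left of the target
```

A hand 12 cm left of the target gets a rightward command; the negative per-step reward is what drives the planner toward the fewest commands, mirroring the paper's goal of minimizing verbal feedback.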

Additionally, the paper situates its contribution within manipulation guidance, which the robotics community has explored for over a decade. Previous studies used saliency maps to find regions of interest and direct users' hands, but they lacked a global frame of reference, limiting the efficacy of verbal commands. The proposed system aims to overcome these limitations with a more comprehensive and autonomous manipulation guidance solution.

Furthermore, the paper highlights the importance of product identification for visually impaired individuals navigating grocery stores. Classifiers with a fixed number of output classes are impractical given the vast number of products available and their frequent variations, and barcode scanning, which requires internet connectivity, is not an efficient alternative. This underscores the need for product identification approaches that can handle the diverse range of products in a grocery store without extensive labeled data or training of object classifiers.

One key advantage of the system is the discrete guidance mode, motivated by findings that visually impaired individuals judge length units better than sighted people and prefer minimal verbal feedback. The discrete mode optimizes guide time and the total number of commands without compromising legibility, performing on par with a human caller on both metrics. By naming a direction and a distance in a single command, it minimizes the exhaustive search the user would otherwise perform in their haptic space.
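A discrete command of this kind can be sketched as quantizing the hand's current 3-D offset into one direction-plus-distance instruction. The axis conventions, the largest-error-first ordering, and the 1 cm tolerance below are assumptions for illustration, not the paper's exact design.

```python
# Illustrative sketch of discrete guidance: turn the hand's 3-D offset from
# the target product into a single command naming a direction and a distance,
# so the user can move in one deliberate motion.
# Axis signs and the tolerance are assumptions, not the paper's design.

def discrete_command(dx, dy, dz, tol=1.0):
    """dx: +right/-left, dy: +up/-down, dz: +forward/-back, in cm."""
    axes = [
        (abs(dx), "right" if dx > 0 else "left"),
        (abs(dy), "up" if dy > 0 else "down"),
        (abs(dz), "forward" if dz > 0 else "back"),
    ]
    # Correct the largest remaining error first; stop when within tolerance.
    mag, direction = max(axes)
    if mag <= tol:
        return "stop"
    return f"move {direction} {round(mag)} centimeters"

print(discrete_command(-12.0, 3.0, 0.5))   # → "move left 12 centimeters"
```

One command per motion is exactly what lets this mode minimize the number of utterances relative to a continuous stream of cues.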

Moreover, the continuous guidance mode provides continuous cues along each axis of movement until alignment, grounding the user during execution with affirmations such as the "Stop" command. While the continuous mode elicited positive responses, the discrete mode outperformed it on the number of commands and guide time, aligning closely with the human caller's performance. The continuous planner computes the hand's position relative to the target and streams corrective cues accordingly.
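The continuous mode's cue loop can be sketched as follows: keep issuing a directional cue for the axis in error, then ground the user with "Stop" once the hand is within tolerance. The simulated hand dynamics (a fixed step per cue) are an assumption for illustration only.

```python
# Minimal sketch of the continuous mode: stream directional cues while the
# hand is misaligned, then affirm with "Stop". The fixed 2 cm movement per
# cue simulates the user's response and is an assumption.

def continuous_guidance(offset, step=2.0, tol=1.0):
    """Yield cues while simulating the hand closing a 1-D offset (cm)."""
    while abs(offset) > tol:
        cue = "move right" if offset < 0 else "move left"
        yield cue
        # simulate the user moving `step` cm in the cued direction
        offset += step if offset < 0 else -step
    yield "Stop"

cues = list(continuous_guidance(-5.0))
print(cues)   # → ['move right', 'move right', 'Stop']
```

Note how the cue count grows with the distance to cover, which is consistent with the study's finding that the continuous mode needed more commands than the discrete mode.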

Additionally, the system incorporates a two-stage product search that first proposes regions likely to contain a product and then matches features to find the best candidate, enabling real-time product detection without retraining. This makes locating and retrieving products in real-world environments such as grocery stores efficient, contributing to the overall effectiveness of the manipulation guidance system.
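The two-stage idea can be sketched as propose-then-match: stage 1 supplies candidate regions, stage 2 scores each candidate's descriptor against the query product's descriptor and keeps the best. A real pipeline would use a detector for stage 1 and learned or keypoint descriptors for stage 2; the toy vectors and cosine score below are stand-ins.

```python
# Sketch of two-stage product search: stage 1 proposes candidate shelf
# regions; stage 2 scores each against the query product's feature vector.
# Region names and descriptors are fabricated placeholders.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_match(query_desc, proposals):
    """proposals: list of (region_id, descriptor); returns (region_id, score)."""
    return max(
        ((rid, cosine(query_desc, desc)) for rid, desc in proposals),
        key=lambda t: t[1],
    )

shelf = [
    ("region_a", [0.9, 0.1, 0.0]),   # cereal-like descriptor
    ("region_b", [0.1, 0.8, 0.3]),   # pasta-like descriptor
]
rid, score = best_match([0.85, 0.15, 0.05], shelf)
print(rid)   # → region_a
```

Because matching works against a descriptor rather than a fixed class list, a new product only requires a new reference descriptor, not classifier retraining, which is the property the paper emphasizes.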

Overall, the system's characteristics include a maintenance-free product locator, offline functionality, and a fine-grained manipulation guidance planner that optimizes guide time and command efficiency. By offering both discrete and continuous guidance modes, leveraging real-time product detection, and addressing the preferences and needs of visually impaired users, the system presents a comprehensive and effective solution for vision-independent manipulation tasks.


Does any related research exist? Who are the noteworthy researchers on this topic in this field? What is the key to the solution mentioned in the paper?

Several related research studies exist in the field of vision-independent manipulation assistance with robotic canes. Noteworthy researchers in this field include Vladimir Kulyukin, Chaitanya Gharpure, and John Nicholson, among others. The key to the solution mentioned in the paper is a novel verbal guidance approach that maps language commands to human hand movements, and those movements to actions within a Markov Decision Process (MDP) solved using reinforcement learning techniques, in order to assist users in manipulation tasks.


How were the experiments in the paper designed?

The experiments compared the two proposed guidance planners, continuous and discrete, against a baseline in which a human caller guided the participant over a video call. Each participant performed retrieval tasks under all three conditions (continuous, discrete, human) five times, with a set of 5 different products and the same spatial configuration used for each condition. Before guidance began, users were oriented on how to hold and use the system and on how to interpret the alignment guidance instructions. The experiments evaluated the system's efficacy in manipulation guidance through success rate, number of commands, guidance time, and subjective evaluation metrics.


What is the dataset used for quantitative evaluation? Is the code open source?

The dataset used for quantitative evaluation was collected from human demonstrations, in which verbal movement commands were mapped to participants' hand movements in response to those commands. The provided context does not state that the code, including the discrete guidance method, is open source.
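Building such a mapping from demonstrations can be sketched as averaging the hand displacement observed each time a given verbal command was issued. The demonstration tuples below are fabricated placeholders, not the study's data.

```python
# Hedged sketch of learning a command-to-movement mapping from demonstration
# data: average the displacement observed per verbal command.
# The demo tuples are fabricated for illustration.

from collections import defaultdict

def learn_command_effects(demos):
    """demos: iterable of (command, displacement_cm); returns mean per command."""
    sums, counts = defaultdict(float), defaultdict(int)
    for command, disp in demos:
        sums[command] += disp
        counts[command] += 1
    return {c: sums[c] / counts[c] for c in sums}

demos = [("move left 10", -9.5), ("move left 10", -10.5), ("move right 2", 2.2)]
effects = learn_command_effects(demos)
print(effects["move left 10"])   # → -10.0
```

The resulting per-command means are what an MDP planner would use as the expected effect of each verbal action.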


Do the experiments and results in the paper provide good support for the scientific hypotheses that need to be verified? Please analyze.

The experiments and results presented in the paper provide substantial, though not complete, support for the scientific hypotheses. The study conducted post-hoc comparisons using Tukey's HSD test to analyze participants' performance in retrieving products under the different guidance modes. The results confirmed that participants retrieved products with significantly fewer commands, and in significantly less time, in the discrete mode than in the continuous mode. However, no statistical difference was found in the net hand movement induced by the two planners, so that hypothesis could not be confirmed; the human caller induced significantly higher net movement than either planner.
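A post-hoc analysis of this kind can be reproduced with SciPy's Tukey HSD implementation, which compares every pair of conditions after an omnibus test. The per-condition command counts below are fabricated for illustration; they are not the study's data.

```python
# Illustration of a Tukey HSD post-hoc comparison across three conditions
# (continuous, discrete, human). The command counts are made-up placeholders,
# not the study's measurements.

from scipy.stats import tukey_hsd

continuous = [14, 16, 15, 17, 13]   # commands per retrieval (fabricated)
discrete = [6, 7, 5, 8, 6]
human = [7, 9, 6, 8, 7]

res = tukey_hsd(continuous, discrete, human)
# res.pvalue[i, j] holds the p-value for comparing group i with group j
print(res.pvalue[0, 1] < 0.05)   # do continuous and discrete differ?
```

With clearly separated group means like these, the continuous-vs-discrete comparison comes out significant while discrete-vs-human does not, mirroring the qualitative pattern the paper reports.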

Moreover, the subjective evaluation, a survey administered after each condition, indicated that participants rated both proposed guidance planners highly on metrics such as human-likeness, interactivity, competence, and intelligence. The two planners and the human were not statistically different on the competence and intelligence metrics, lending further support to the hypotheses. However, the discrete planner was rated slightly lower on interactivity, suggesting an area for potential improvement in future investigations.

Overall, the experiments and results offer solid evidence that the discrete guidance mode minimizes commands and guide time, and that the system provides competent, intelligent guidance for vision-independent manipulation tasks with a socially assistive robotic cane.


What are the contributions of this paper?

The contributions of the paper "ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane" include:

  • Developing a robotic cane equipped with RealSense D455 and T265 cameras for navigation and manipulation tasks, powered by a laptop carried in a backpack.
  • Addressing the lack of independence faced by people with visual impairments (PVI) during shopping by reducing dependence on guide availability and mitigating privacy loss.
  • Providing assistance in locating items in stores and at home for visually impaired individuals.
  • Focusing on improving navigation, product retrieval, and product examination during grocery shopping for individuals with visual impairments.

What work can be continued in depth?

Further research in the field can focus on the following areas to deepen the understanding and enhance the system:

  • Exploration of Manipulation Guidance: Future work can refine manipulation guidance systems to provide more precise and efficient assistance in various scenarios, such as kitchen environments or locations with densely packed items like grocery stores.
  • Enhancement of Product Identification: There is room for improvement in product identification techniques, especially solutions that can handle the vast array of products found in grocery stores without extensive data labeling and training of object classifiers.
  • Scalability and Practicality: Research can aim to make assistive systems more scalable, practical, and user-friendly, addressing reliance on human availability, active data connections, and privacy concerns, particularly in developing countries.

Tables: 1

Outline

Introduction
Background
Advancements in assistive technology for visually impaired
Challenges faced by visually impaired in grocery shopping
Objective
Empower visually impaired individuals with a novel robotic cane solution
Improve store navigation and manipulation tasks
Methodology
System Components
Visual Product Locator
Real-time product detection using YOLOV5
Planning and Guidance
Markov Decision Process (MDP) based navigation
Continuous and discrete guidance modes
Computer Vision Pipeline
Fine-grained manipulation guidance system
Data Collection
Human subjects study design
Data Analysis
Study participants (15 visually impaired individuals)
Performance metrics: commands, retrieval times
Implementation
Offline operation, no barcode scanning required
Evaluation
Study results: discrete guidance vs. continuous and human assistance
Usability feedback
Privacy and safety benefits
Results and Findings
Effectiveness demonstrated in the human subjects study
Positive feedback on usability, independence, privacy, and safety
Limitations and areas for improvement
Future Research
Semantic Information Integration
Enhancing guidance with context and product semantics
Interactive Aspects
Improving user engagement and adaptability in real-time
Potential for integration with other assistive technologies
Conclusion
ShelfHelp's impact on visually impaired grocery shopping experience
Potential for broader adoption and societal benefits