Shopping Assistance for Visually Impaired People Using Edge AI
Hello! This is "Mugicha_han".
In this article, I introduce the development process and details of a shopping assistance device for the visually impaired, which combines the speech recognition library VOSK and the object detection model YOLO. This system is built on NVIDIA's compact computer Jetson and is based on the concept of edge AI. By leveraging the technologies I've explored in previous articles, this project demonstrates how such solutions can be applied to real shopping scenarios. Please read on until the end!
Previous Articles
In creating this project, I have been studying VOSK and YOLO for some time and have written several articles on their fundamentals. The following articles may help you understand the project better, so please take a look:
- Creating a Dog and Cat Detection Model with YOLOv5n and labelImg
- How to Fix the TypeError in labelImg
- Trying Out the Speech Recognition Library VOSK
Background of the Project
Shopping Challenges for the Visually Impaired
To improve the shopping environment for people with visual impairments, I investigated their actual situation. According to one article, many visually impaired individuals order groceries online. However, they face issues such as long waiting times for delivery and high shipping costs.
In physical stores, totally blind users often need assistance from store staff, while people with low vision sometimes take photos of products with their smartphones and zoom in to check details. However, these methods cannot be used in stores where photography is prohibited, so various challenges remain.
Reference Articles
Project Objectives
The goal of this project is to utilize the power of edge AI to enable visually impaired individuals to shop independently and safely. Specifically, the project aims to:
- Simplify Operations through Speech Recognition: Allow users to specify products by voice command, reducing the burden of operating the device manually.
- Situational Awareness through Object Detection: Accurately detect the positions of the user's hand and the product in the camera footage, enabling real-time monitoring of the situation.
- Assistive Guidance via Voice Prompts: Provide voice prompts that indicate the direction in which the user should extend their hand, based on the relative positions of the hand and the product.
Technologies Used
Below is an illustration of the equipment.
Hardware
- Jetson Orin Nano: A compact computer capable of high-speed edge AI processing.
- Headset with Microphone: Used for both voice input and output.
- Web Camera: Captures the real-world scene for object detection.
Software
- Ubuntu: The OS running on the Jetson Orin Nano.
- VOSK: A speech recognition library that converts the user's spoken commands into text for processing.
- YOLO: An object detection model that identifies the positions of products and the user's hand.
- VOICEVOX: A speech synthesis engine; audio prompts prepared with it in advance provide the directional instructions.
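For reference, here is a minimal sketch of how the directional prompts could be pre-generated with the VOICEVOX engine's HTTP API, assuming the engine is running locally on its default port (50021). The prompt texts, output file names, and speaker ID are illustrative, not the exact ones used in this project.

```python
import requests

# Hypothetical prompt set: the eight direction words used for guidance.
PROMPTS = {
    "right": "右です",
    "left": "左です",
    "up": "上です",
    "down": "下です",
    "up_right": "右上です",
    "up_left": "左上です",
    "down_right": "右下です",
    "down_left": "左下です",
}
SPEAKER = 2  # Shikoku Metan; the style ID may differ in your install
BASE = "http://127.0.0.1:50021"  # default VOICEVOX engine address

for name, text in PROMPTS.items():
    # Step 1: build a synthesis query from the text.
    query = requests.post(f"{BASE}/audio_query",
                          params={"text": text, "speaker": SPEAKER}).json()
    # Step 2: synthesize WAV audio from the query and save it.
    wav = requests.post(f"{BASE}/synthesis",
                        params={"speaker": SPEAKER}, json=query).content
    with open(f"{name}.wav", "wb") as f:
        f.write(wav)
```

Generating the audio once up front keeps the on-device latency low, since playback at run time is just reading a local WAV file.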
System Overview
This system integrates three main functions to ensure that visually impaired individuals can shop safely:
1. Speech Recognition with VOSK
The user speaks the name of the desired product or gives a command through the headset. VOSK converts this spoken input into text, which is then passed on for further processing.
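As a rough illustration, a minimal VOSK loop for turning headset audio into text could look like the sketch below. It assumes a Japanese VOSK model downloaded to a local `model` directory (e.g. vosk-model-small-ja-0.22) and the `sounddevice` package for microphone input; the actual project code will differ in its details.

```python
import json
import queue

import sounddevice as sd
from vosk import Model, KaldiRecognizer

model = Model("model")              # path to the downloaded VOSK model
rec = KaldiRecognizer(model, 16000)  # recognizer at 16 kHz
q = queue.Queue()

def callback(indata, frames, time, status):
    # Push raw audio chunks from the microphone into the queue.
    q.put(bytes(indata))

# 16 kHz, mono, 16-bit input from the headset microphone.
with sd.RawInputStream(samplerate=16000, blocksize=8000,
                       dtype="int16", channels=1, callback=callback):
    while True:
        data = q.get()
        if rec.AcceptWaveform(data):
            text = json.loads(rec.Result())["text"]
            if text:
                print("Recognized:", text)  # e.g. the requested product name
```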
2. Object Detection with YOLO
The web camera captures video of the user, and YOLO detects both the product and the user's hand. Notably, the detection of the user's hand position is crucial for determining the correct direction for reaching out.
The system can detect five types of products: onigiri, bread, cup ramen, PET bottles, and snacks. An example is shown below.
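To give an idea of this step, here is a minimal detection-loop sketch using a custom-trained YOLOv5 model loaded via `torch.hub`. The weight file name `best.pt` and the class set (the five products plus a `hand` class) are assumptions for illustration.

```python
import cv2
import torch

# Custom YOLOv5 weights assumed to cover six classes:
# the five products plus "hand".
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

cap = cv2.VideoCapture(0)  # the web camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame)
    # Each detection row: x1, y1, x2, y2, confidence, class index.
    for *box, conf, cls in results.xyxy[0].tolist():
        label = model.names[int(cls)]
        x1, y1, x2, y2 = map(int, box)
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2  # box center
        print(label, (cx, cy), round(conf, 2))
```

The box centers of the hand and the target product are what the guidance step needs, so they are the key output here.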
3. Voice Guidance with VOICEVOX
Based on the relative positions of the user's hand and the product, the system plays a pre-prepared voice guide. This guide provides instructions such as "right", "up", or "down-left" in eight possible directions to help the user reach the product smoothly.
Note: The voice used in the system is provided by VOICEVOX’s "Shikoku Metan".
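As a sketch of the direction logic, the offset between the hand's and the product's box centers can be mapped to one of the eight directions and used to select a pre-generated prompt. The `deadzone` tolerance and the WAV file naming here are hypothetical, chosen to match the prompt files generated earlier.

```python
import subprocess

def direction(hand, product, deadzone=30):
    """Map the hand-to-product offset to one of eight directions.

    hand, product: (x, y) box centers in pixels. deadzone is a
    hypothetical tolerance within which an axis counts as aligned.
    Image y grows downward, so a negative dy means "up".
    """
    dx = product[0] - hand[0]
    dy = product[1] - hand[1]
    horiz = "right" if dx > deadzone else "left" if dx < -deadzone else ""
    vert = "up" if dy < -deadzone else "down" if dy > deadzone else ""
    if not horiz and not vert:
        return None  # hand is already on the product
    return "_".join(filter(None, [vert, horiz]))  # e.g. "up_right"

name = direction(hand=(320, 400), product=(500, 200))
if name:
    # Play the matching pre-generated prompt (e.g. up_right.wav).
    subprocess.run(["aplay", f"{name}.wav"])
```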
Demo Video
You can watch a demo video showcasing the system in action:
Future Improvements
In future development, we plan to focus on the following improvements:
- Increase the Number of Supported Products: The system currently supports only five products, so we plan to expand the product lineup.
- Improve the Camera's Field of View: Optimize the camera's placement and field of view to capture a wider area. We are also considering adopting a wide-angle camera.
- Resolve Overlap Between the Hand and the Product: Improve the YOLO model so that it can accurately detect the hand's position even when it overlaps with the product.
- Recognize Product Prices: In the future, we aim to include price information in the voice guidance as well.
- Make the Device Portable: Since the device will be used away from home, the entire Jetson setup needs to be easy to carry.
Conclusion
In this article, I introduced the edge AI shopping assistance device for the visually impaired, covering its technical background, system architecture, and a demo of its operation. We will continue to incorporate further improvements to create a more user-friendly system.
The source code is also publicly available on GitHub, so please take a look if you're interested.
Thank you very much for reading until the end!