Shopping Assistance for Visually Impaired People Using Edge AI

Published on 2025/02/24

Hello! This is "Mugicha_han".
In this article, I introduce the development process and details of a shopping assistance device for the visually impaired, which combines the speech recognition library VOSK and the object detection model YOLO. This system is built on NVIDIA's compact computer Jetson and is based on the concept of edge AI. By leveraging the technologies I've explored in previous articles, this project demonstrates how such solutions can be applied to real shopping scenarios. Please read on until the end!

Previous Articles

In creating this project, I have been studying VOSK and YOLO for some time and have written several articles on their fundamentals. The following articles may help you understand this project better, so please take a look.

Background of the Project

Shopping Challenges for the Visually Impaired

To improve the shopping environment for people with visual impairments, I investigated their actual situation. According to one article, many visually impaired individuals order groceries online. However, they face issues such as long waiting times for delivery and high shipping costs.
In physical stores, totally blind users often require assistance from store staff, while people with low vision sometimes rely on smartphones to take photos of products and zoom in on details. However, in stores where photography is prohibited, this workaround is not available, so various challenges remain.

Reference Articles

https://www.mirairo.co.jp/blog/post-20200428/1800
https://www.atgp.jp/knowhow/oyakudachi/c6478/

Project Objectives

The goal of this project is to utilize the power of edge AI to enable visually impaired individuals to shop independently and safely. Specifically, the project aims to:

  • Simplify Operations through Speech Recognition
    Allow users to specify products using voice commands, thereby reducing the burden of interacting with devices manually.

  • Situation Awareness through Object Detection
    Accurately detect the positions of the user's hand and the product from camera footage, enabling real-time monitoring of the situation.

  • Assistive Guidance via Voice Prompts
    Provide voice prompts that indicate the direction in which the user should extend their hand, based on the relative positions of the hand and the product.

Technologies Used

Below is an illustration of the equipment.

Hardware

  • Jetson Orin Nano
    A compact computer capable of high-speed edge AI processing.
  • Headset with Microphone
    A device used for both voice input and output.
  • Web Camera
    Captures real-life scenes for object detection.

Software

  • Ubuntu
    OS for the Jetson Orin Nano.
  • VOSK
    A speech recognition library that converts the user's spoken commands into text for processing.
  • YOLO
    An object detection model that accurately identifies the positions of products and the user's hand.
  • VOICEVOX
    A text-to-speech engine used to generate the pre-prepared audio clips that provide directional instructions.

System Overview

This system integrates three main functions to ensure that visually impaired individuals can shop safely:

1. Speech Recognition with VOSK

The user speaks the name of the desired product or gives a command through the headset. VOSK converts this spoken input into text, which is then passed on for further processing.
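To make this concrete, below is a minimal sketch of what such a listening loop might look like, assuming a Japanese VOSK model directory and the sounddevice library for microphone input. The model path and the product list are illustrative, not the actual project code.

```python
# Minimal sketch: listen on the microphone until a known product name is heard.
import json
import queue

import sounddevice as sd
from vosk import Model, KaldiRecognizer

SAMPLE_RATE = 16000
PRODUCTS = ["onigiri", "bread", "cup ramen", "pet bottle", "snack"]  # illustrative

audio_q = queue.Queue()

def callback(indata, frames, time, status):
    # Push raw microphone bytes into a queue for the recognizer.
    audio_q.put(bytes(indata))

model = Model("vosk-model-small-ja-0.22")  # assumed model directory
recognizer = KaldiRecognizer(model, SAMPLE_RATE)

with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=8000,
                       dtype="int16", channels=1, callback=callback):
    print("Say the name of the product you are looking for...")
    while True:
        data = audio_q.get()
        if recognizer.AcceptWaveform(data):
            text = json.loads(recognizer.Result()).get("text", "")
            target = next((p for p in PRODUCTS if p in text), None)
            if target:
                print(f"Target product: {target}")
                break
```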

2. Object Detection with YOLO

The web camera captures video of the user, and YOLO detects both the product and the user's hand. Notably, the detection of the user's hand position is crucial for determining the correct direction for reaching out.

The system can detect five types of products: onigiri, bread, cup ramen, PET bottles, and snacks. An example is shown below.
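As a rough sketch, the detection step might look like the following, assuming a custom-trained Ultralytics YOLO model whose classes include the user's hand and the five products. The weights file name and class labels are illustrative.

```python
# Minimal sketch: detect the hand and the target product in each camera frame.
import cv2
from ultralytics import YOLO

model = YOLO("best.pt")   # assumed custom-trained weights
TARGET = "onigiri"        # product name obtained from the VOSK step (illustrative)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]
    centers = {}
    for box in result.boxes:
        label = result.names[int(box.cls)]
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        centers[label] = ((x1 + x2) / 2, (y1 + y2) / 2)  # bounding-box center
    if "hand" in centers and TARGET in centers:
        # These two centers drive the direction guidance in the next step.
        print("hand:", centers["hand"], "product:", centers[TARGET])
cap.release()
```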

3. Voice Guidance with VOICEVOX

Based on the relative positions of the user's hand and the product, the system plays a pre-prepared voice guide. The guide gives instructions such as "right", "up", or "down-left", covering eight possible directions, to help the user reach the product smoothly.

Note: The voice used in the system is provided by VOICEVOX’s "Shikoku Metan".
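A minimal sketch of the eight-direction guidance logic might look like this, assuming the VOICEVOX clips were synthesized in advance, saved as WAV files, and played back with aplay. The file names, the dead-zone threshold, and the coordinates are illustrative.

```python
# Minimal sketch: map the hand-to-product vector to one of eight directions
# and play the matching pre-generated VOICEVOX clip.
import subprocess

def direction_label(hand, product, dead_zone=30):
    """Return one of eight direction labels (or "here").
    Image coordinates: x grows to the right, y grows downward."""
    dx = product[0] - hand[0]
    dy = product[1] - hand[1]
    horiz = "right" if dx > dead_zone else "left" if dx < -dead_zone else ""
    vert = "down" if dy > dead_zone else "up" if dy < -dead_zone else ""
    if vert and horiz:
        return f"{vert}_{horiz}"      # e.g. "down_left"
    return vert or horiz or "here"    # "here": hand is already on the product

hand_center = (320, 240)      # example values from the YOLO step
product_center = (480, 120)

label = direction_label(hand_center, product_center)
# Each clip (e.g. voices/up_right.wav) is VOICEVOX audio generated in advance.
subprocess.run(["aplay", f"voices/{label}.wav"])
```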

Demo Video

You can watch a demo video showcasing the system in action:
https://youtu.be/SqMolXfrWyM

Future Improvements

In future development, we plan to focus on the following improvements:

  • Increase the Number of Products
    Currently, the system supports a limited number of products, so we plan to expand the product database.

  • Improve Camera Field of View
    Optimize the camera’s placement and field of view to capture a wider area. We are also considering the adoption of a wide-angle camera.

  • Resolve Overlap Issues Between Hand and Product
    Improve the YOLO model so that it can accurately detect the hand position even when the hand overlaps with the product.

  • Product Price Recognition
    In the future, we aim to provide voice guidance that includes product price information as well.

  • Device Portability
    Since the device is meant to be used out in stores rather than at a desk, the entire Jetson setup needs to be easy to carry.

Conclusion

In this article, I explained the Edge AI Shopping Assistance Device for the Visually Impaired—covering its technical background, system architecture, and a demo of its operation. We will continue to incorporate further improvements to create a more user-friendly system.

The source code is also available on GitHub, if you would like to take a look:

https://github.com/Mugi323/Jetshop-Eye.git

Thank you very much for reading until the end!
