Shopping Assistance for Visually Impaired People Using Edge AI

Published on 2025/02/24

Hello! This is "Mugicha_han".
In this article, I introduce the development process and details of a shopping assistance device for the visually impaired, which combines the speech recognition library VOSK and the object detection model YOLO. This system is built on NVIDIA's compact computer Jetson and is based on the concept of edge AI. By leveraging the technologies I've explored in previous articles, this project demonstrates how such solutions can be applied to real shopping scenarios. Please read on until the end!

Previous Articles

In creating this project, I have been studying VOSK and YOLO for some time and have written several articles on their fundamentals. The following articles may help you understand this project better, so please take a look.

Background of the Project

Shopping Challenges for the Visually Impaired

To improve the shopping environment for people with visual impairments, I investigated their actual situation. According to one article, many visually impaired individuals order groceries online. However, they face issues such as long waiting times for delivery and high shipping costs.
In physical stores, totally blind users often require assistance from store staff, while people with low vision sometimes rely on smartphones to take photos of products and zoom in on details. However, in stores where photography is prohibited, this workaround is not available, so various challenges remain.

Reference Articles

https://www.mirairo.co.jp/blog/post-20200428/1800
https://www.atgp.jp/knowhow/oyakudachi/c6478/

Project Objectives

The goal of this project is to utilize the power of edge AI to enable visually impaired individuals to shop independently and safely. Specifically, the project aims to:

  • Simplify Operations through Speech Recognition
    Allow users to specify products using voice commands, thereby reducing the burden of interacting with devices manually.

  • Situation Awareness through Object Detection
    Accurately detect the positions of the user's hand and the product from camera footage, enabling real-time monitoring of the situation.

  • Assistive Guidance via Voice Prompts
    Provide voice prompts that indicate the direction in which the user should extend their hand, based on the relative positions of the hand and the product.

Technologies Used

Below is an illustration of the equipment.

Hardware

  • Jetson Orin Nano
    A compact computer capable of high-speed edge AI processing.
  • Headset with Microphone
    A device used for both voice input and output.
  • Web Camera
    Captures real-life scenes for object detection.

Software

  • Ubuntu
    OS for the Jetson Orin Nano.
  • VOSK
    A speech recognition library that converts the user's spoken commands into text for processing.
  • YOLO
    An object detection model that accurately identifies the positions of products and the user's hand.
  • VOICEVOX
    A text-to-speech engine used to generate the pre-prepared audio clips that provide directional instructions.

System Overview

This system integrates three main functions to ensure that visually impaired individuals can shop safely:

1. Speech Recognition with VOSK

The user speaks the name of the desired product or gives a command through the headset. VOSK converts this spoken input into text, which is then passed on for further processing.
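To make this concrete, below is a minimal sketch of what such a listening loop might look like, assuming a Japanese VOSK model directory and the sounddevice library for microphone input. The model path and the product list are illustrative, not the actual project code.

```python
# Minimal sketch: listen on the microphone until a known product name is heard.
import json
import queue

import sounddevice as sd
from vosk import Model, KaldiRecognizer

SAMPLE_RATE = 16000
PRODUCTS = ["onigiri", "bread", "cup ramen", "pet bottle", "snack"]  # illustrative

audio_q = queue.Queue()

def callback(indata, frames, time, status):
    # Push raw microphone bytes into a queue for the recognizer.
    audio_q.put(bytes(indata))

model = Model("vosk-model-small-ja-0.22")  # assumed model directory
recognizer = KaldiRecognizer(model, SAMPLE_RATE)

with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=8000,
                       dtype="int16", channels=1, callback=callback):
    print("Say the name of the product you are looking for...")
    while True:
        data = audio_q.get()
        if recognizer.AcceptWaveform(data):
            text = json.loads(recognizer.Result()).get("text", "")
            target = next((p for p in PRODUCTS if p in text), None)
            if target:
                print(f"Target product: {target}")
                break
```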

2. Object Detection with YOLO

The web camera captures video of the user, and YOLO detects both the product and the user's hand. Notably, the detection of the user's hand position is crucial for determining the correct direction for reaching out.

The system can detect five types of products: onigiri, bread, cup ramen, PET bottles, and snacks. An example is shown below.
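As a rough sketch, the detection step might look like the following, assuming a custom-trained Ultralytics YOLO model whose classes include the user's hand and the five products. The weights file name and class labels are illustrative.

```python
# Minimal sketch: detect the hand and the target product in each camera frame.
import cv2
from ultralytics import YOLO

model = YOLO("best.pt")   # assumed custom-trained weights
TARGET = "onigiri"        # product name obtained from the VOSK step (illustrative)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]
    centers = {}
    for box in result.boxes:
        label = result.names[int(box.cls)]
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        centers[label] = ((x1 + x2) / 2, (y1 + y2) / 2)  # bounding-box center
    if "hand" in centers and TARGET in centers:
        # These two centers drive the direction guidance in the next step.
        print("hand:", centers["hand"], "product:", centers[TARGET])
cap.release()
```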

3. Voice Guidance with VOICEVOX

Based on the relative positions of the user's hand and the product, the system plays a pre-prepared voice guide. The guide gives instructions such as "right", "up", or "down-left", covering eight possible directions, to help the user reach the product smoothly.

Note: The voice used in the system is provided by VOICEVOX’s "Shikoku Metan".
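A minimal sketch of the eight-direction guidance logic might look like this, assuming the VOICEVOX clips were synthesized in advance, saved as WAV files, and played back with aplay. The file names, the dead-zone threshold, and the coordinates are illustrative.

```python
# Minimal sketch: map the hand-to-product vector to one of eight directions
# and play the matching pre-generated VOICEVOX clip.
import subprocess

def direction_label(hand, product, dead_zone=30):
    """Return one of eight direction labels (or "here").
    Image coordinates: x grows to the right, y grows downward."""
    dx = product[0] - hand[0]
    dy = product[1] - hand[1]
    horiz = "right" if dx > dead_zone else "left" if dx < -dead_zone else ""
    vert = "down" if dy > dead_zone else "up" if dy < -dead_zone else ""
    if vert and horiz:
        return f"{vert}_{horiz}"      # e.g. "down_left"
    return vert or horiz or "here"    # "here": hand is already on the product

hand_center = (320, 240)      # example values from the YOLO step
product_center = (480, 120)

label = direction_label(hand_center, product_center)
# Each clip (e.g. voices/up_right.wav) is VOICEVOX audio generated in advance.
subprocess.run(["aplay", f"voices/{label}.wav"])
```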

Demo Video

You can watch a demo video showcasing the system in action:
https://youtu.be/SqMolXfrWyM

Future Improvements

In future development, we plan to focus on the following improvements:

  • Increase the Number of Products
    Currently, the system supports a limited number of products, so we plan to expand the product database.

  • Improve Camera Field of View
    Optimize the camera’s placement and field of view to capture a wider area. We are also considering the adoption of a wide-angle camera.

  • Resolve Overlap Issues Between Hand and Product
    Improve the YOLO model so that it can accurately detect the hand position even when the hand overlaps with the product.

  • Product Price Recognition
    In the future, we aim to provide voice guidance that includes product price information as well.

  • Device Portability
    Since the device is meant to be used out in stores rather than at a desk, the entire Jetson setup needs to be easy to carry.

Conclusion

In this article, I explained the Edge AI Shopping Assistance Device for the Visually Impaired—covering its technical background, system architecture, and a demo of its operation. We will continue to incorporate further improvements to create a more user-friendly system.

The source code is also available on GitHub, if you would like to take a look:

https://github.com/Mugi323/Jetshop-Eye.git

Thank you very much for reading until the end!
