
The content below is an AI-generated translation. This is an experimental feature, and may contain errors.

[Free] Amical: The AI Voice Input Revolution


AI Voice Input Software Amical

AquaVoice is a popular choice for voice input software, but I would like to introduce an open-source alternative called Amical. It is free to use, and I found its accuracy remarkably high and its speed impressive.

Introduction

Telemetry Settings

In Advanced > Anonymous Telemetry, you can choose whether anonymous usage data is sent to the developers. If you have privacy concerns, turn this off.

Amical Official Documentation

https://amical.ai/docs

I recommend pasting the URL https://amical.ai/docs into ChatGPT or Gemini to let the AI read it; then you can ask questions directly. You might not even need to read this article.

Basic Usage

Ctrl + Win: Voice input (hold to speak)

Ctrl + Win + Space: Voice input (hands-free)

Press Ctrl + Win + Space again to confirm the input.

Installation Procedures

How to Download the Installer

Go to the official GitHub releases page: https://github.com/amicalhq/amical/releases

For Windows, download Amical-0.1.16-beta.5.Setup.exe.

For Mac:

  • Apple Silicon Macs (M1 / M2 / M3 / M4, etc.) → Amical-0.1.16-beta.5-arm64.dmg
  • Intel Macs (mostly models released before late 2020) → Amical-0.1.16-beta.5-x64.dmg
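If you are not sure which chip your Mac has, running `uname -m` in Terminal prints `arm64` on Apple Silicon and `x86_64` on Intel. The minimal Python sketch below does the same check with `platform.machine()` and maps the result to the file names listed above. The version string is copied from this article, so adjust it for newer releases. `mac_installer_name` is a hypothetical helper and is not part of Amical.

```python
import platform
from typing import Optional

# Version copied from the release listed in this article; newer releases differ.
VERSION = "0.1.16-beta.5"


def mac_installer_name(machine: Optional[str] = None) -> str:
    """Return the .dmg file name that matches this Mac's CPU architecture."""
    # platform.machine() reports "arm64" on Apple Silicon and "x86_64" on Intel.
    # Caveat: an Intel build of Python running under Rosetta 2 also reports
    # "x86_64", so pass the value from `uname -m` explicitly if in doubt.
    arch = (machine or platform.machine()).lower()
    if arch in ("arm64", "aarch64"):
        return f"Amical-{VERSION}-arm64.dmg"
    if arch in ("x86_64", "amd64"):
        return f"Amical-{VERSION}-x64.dmg"
    raise ValueError(f"Unrecognized architecture: {arch!r}")
```

For example, `mac_installer_name("arm64")` returns `Amical-0.1.16-beta.5-arm64.dmg`.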

If you have Homebrew installed on Mac, you can use the following command:

brew install --cask amical

When you launch the installer, a series of setup screens appears.

→ I forgot to take screenshots, so I will only provide translations of the on-screen text here.
Amical's interface is currently English-only.
You should be able to get through the installer simply by clicking through the prompts.

Welcome to Amical

To personalize your experience, please select the features you are interested in.

Contextual Dictation: Quick voice input for seamless transcription in any application.

Note-taking: Record thoughts and ideas by voice with smart formatting.

Meeting Transcription (Coming soon): Record and transcribe meetings and conversations with high accuracy.

Voice Commands (Coming soon): Control apps hands-free — perform tasks with natural voice commands.

Please select at least one feature.

Note: The selection here is for personalizing the setup. All features are available at any time.

For now, I'll select "Contextual Dictation" and click Continue.

Permission Settings

Amical requires several permissions to function properly.

All permissions granted. You're all set! You can proceed to the next step.

Microphone access: Required for audio recording and transcription.

Permission granted.

→ You'll be asked for microphone permission, so click OK and then Continue.

Survey (How did you discover Amical?)

This is a simple survey that helps the developers understand their user base.

How did you discover Amical?: This helps us understand where users are coming from.

Options:

  • Search engine (Google, Bing, etc.)
  • Social media (Twitter, LinkedIn, etc.)
  • Friend or colleague recommendation
  • Blog post or article
  • GitHub
  • AI assistant (ChatGPT, Claude, etc.)
  • Other

→ I found out about it through YouTube, so I'll choose Social media.

Choosing Your AI Model

Choose whether to perform voice processing in the "Cloud" or on "Your own PC (Local)".

Recommendation: Based on your system specs, Amical Cloud is recommended. Local models may result in lower performance.

Amical Cloud (Recommended): Fast, high precision, free. No setup required.

Pros: Free, fast, high accuracy, no configuration needed.

Cons: Requires internet connection and login.

Local Models: Private, offline, free. Runs entirely on the device.

Pros: Complete privacy, offline operation.

Cons: Uses device resources.

Hint: Settings can be changed later. If you're unsure, it's smoothest to choose the recommended "Amical Cloud" and click "Complete setup to continue" at the bottom right.

→ I'll go with Amical Cloud.

I was prompted to sign in, so I registered an account.

I used Google authentication.

Setup Complete!


This is the final setup screen.

Quick Configuration

Microphone: Select the microphone to use. Currently, it's set to "System Default".

Push to talk: Ctrl + Win key.

Voice input (transcription) is performed only while this key is held. You can also change it using the pencil icon on the right.

Join our Community

Join Discord to browse help, share feedback, and interact with other users.

You're All Set!

Start transcribing using the push-to-talk shortcut.

Click the floating widget for quick access.

You can customize further from Settings.

It seems you can perform voice input with Ctrl + Win.

Even though I spoke in Japanese, the output was translated into English for some reason.

To fix this, I turned off Auto Detect Language under Dictation and set Languages to Japanese.

Screen and Feature Introduction

Notes Feature

Amical includes a built-in note-taking feature. The notes appear to be plain text rather than Markdown.

Preferences

Launch at login

Automatically starts the application when you log in to your PC.

Show widget while inactive

Keeps the widget displayed on the screen even when you are not recording.

Theme

Select your preferred color scheme.

Dictation


Here you can configure language settings and microphone input.

Shortcuts

Push to talk

Performs voice input (transcription) only while the key is held.

Current key: Ctrl + Win

Hands-free mode

Starts with one press and stops with another. There is no need to keep holding the key.

Current key: Ctrl + Win + Space

→ Just like AquaVoice, you can use the Space bar for hands-free mode.
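The difference between the two shortcuts is essentially a small state machine. The toy Python model below illustrates it. The `Recorder` class is hypothetical and is not Amical's code. In push to talk, recording follows the key state. In hands-free mode, each press flips recording on or off.

```python
from dataclasses import dataclass


@dataclass
class Recorder:
    """Toy model of the two shortcut behaviours (illustrative, not Amical's code)."""
    recording: bool = False
    hands_free: bool = False

    def push_to_talk(self, pressed: bool) -> None:
        # Push to talk: record exactly while the shortcut is held down.
        # Key releases are ignored while hands-free mode is active.
        if not self.hands_free:
            self.recording = pressed

    def toggle_hands_free(self) -> None:
        # Hands-free: the first press starts recording, the next press stops it.
        self.hands_free = not self.hands_free
        self.recording = self.hands_free
```

Because hands-free mode does not depend on the key being held, releasing the keys does not stop the recording. Only the next press of the shortcut does.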

Vocabulary

Here you manage special words you want recognized correctly, as well as word replacement rules.

Vocabulary: Manages custom vocabulary and word replacements for dictation.

"No vocabulary words found. Add your first word to get started.": The message shown while no words have been registered yet.

  • Add Word: This is the button to add a word.

Usage: Registering technical terms, company jargon, people's names, and other words the AI is likely to misrecognize improves transcription accuracy.
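A word replacement rule can be pictured as a find-and-replace pass over the transcript. The sketch below assumes the rules are stored as a simple `{misheard: correct}` mapping. That format and the `apply_replacements` name are illustrative assumptions. Amical also appears to pass the vocabulary to its AI models, so this sketch covers only part of the picture.

```python
import re


def apply_replacements(text: str, rules: dict[str, str]) -> str:
    """Replace commonly misheard phrases with their registered spellings."""
    # Apply longer phrases first so a short rule cannot break up a longer match.
    for wrong in sorted(rules, key=len, reverse=True):
        right = rules[wrong]
        # \b word boundaries suit space-separated English; drop them for Japanese,
        # where words are not separated by spaces.
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", re.IGNORECASE)
        # A function replacement keeps backslashes in `right` from being
        # interpreted as regex escape sequences.
        text = pattern.sub(lambda _match: right, text)
    return text
```

For example, `apply_replacements("I set up ami call with open router", {"ami call": "Amical", "open router": "OpenRouter"})` returns `"I set up Amical with OpenRouter"`.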

History


  • Copy the text
  • Play back the recorded audio (interesting)
  • Download the audio data
  • Delete entries

There is also a search function that lets you quickly find things you said in the past, which is also quite interesting. The more you speak into Amical, the more it seems to become an accumulated record you can search and refer back to.

Advanced Settings

Preload Whisper Model: Turning this switch on loads the AI model in advance when the app starts, so voice input starts much more smoothly.

Debug Mode: A feature for developers. Turning it on records the background operations in detail, but usually, it is fine to keep it off.

Auto Updates: It is recommended to keep this on to always receive the latest features and security fixes.

Anonymous Telemetry: A setting to send non-personally identifiable information, such as which features were used, to the development team to help improve the app.

Data Location: Displays where the locally stored data is located.

Danger Zone / Reset App: Pressing "Reset App" in the red frame will erase all previous history and settings. As it is literally a "Danger Zone," do not touch it unless you want to start over.

About


This section summarizes app version information and contact details for when you need help.

Current Version

The current version is v0.1.13, which shows that the app is still at an early stage of development.

Resources

Change Log: You can see records of what new features were added and what bugs were fixed.

GitHub Repository: The place where the app's blueprints (source code) are published.

Discord Community: A place where you can ask questions and exchange opinions with other users and developers via chat.

Contact

Contains an email address (contact@amical.ai) for reporting bugs or asking direct questions.

AI Models

AI Models Settings Screen

This section explains the three categories of AI models: Speech, Language, and Embedding.

1. Speech (Speech Recognition Model)

This is the main engine of the app that converts audio data into text.

Main role: Analyzes audio input from the microphone and transcribes it into words.

Selectable models:

  • Amical Cloud (Recommended): Fast, high precision, and ready to use without any setup.

  • Whisper Model (Local): A model developed by OpenAI that allows processing to be performed on your own PC. You can select and download models based on your hardware specs, such as "Medium" for precision or "Tiny" for speed.

2. Language (Language Model)

This model is used to refine the transcribed text into "more natural sentences."

Main role: Performs context-aware insertion of punctuation, correction of typos, summarization, and formatting on the raw data obtained through speech recognition.

Features: Amical's strength lies in its "context awareness." This model is utilized to optimize the writing style according to the application you are using (such as Slack or email).

→ It seems that you can improve the precision of context understanding by registering your own OpenRouter API key and paying for tokens. That is interesting.
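To make context-aware refinement more concrete, here is a minimal sketch of the kind of instruction a language model could receive. Amical's real prompts are not described in this article. The wording, the app-to-style table, and the `build_cleanup_prompt` name are all assumptions. The point is only that the raw transcript, the active app, and your vocabulary are sent to the model together.

```python
def build_cleanup_prompt(raw: str, app_name: str, vocabulary: list[str]) -> str:
    """Assemble a hypothetical instruction for a model that tidies dictation."""
    # Hypothetical mapping from the active application to a writing style.
    styles = {
        "slack": "casual chat message, short sentences",
        "outlook": "polite business email",
    }
    style = styles.get(app_name.lower(), "neutral written text")
    terms = ", ".join(vocabulary) if vocabulary else "(none)"
    return (
        "Clean up this speech transcript: add punctuation and fix obvious typos.\n"
        f"Target style: {style} (active app: {app_name}).\n"
        f"Spell these terms exactly as written: {terms}.\n"
        f"Transcript: {raw}"
    )
```

Unknown apps fall back to a neutral style. A stronger model, for example one reached through your own OpenRouter key, would receive the same kind of input but could follow it more accurately.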

3. Embedding (Embedding Model)

A specialized model mainly used for "searching data" and "associating knowledge."

Main role: Converts sentences into vectors (sequences of numbers) that a computer can understand.

Usage in Amical: Used to quickly search for necessary information from past transcription history or to improve recognition accuracy by cross-referencing custom vocabulary with the current context.
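Vector search itself is easy to illustrate. Real embedding models produce dense vectors with hundreds of dimensions. To keep the sketch runnable without any model, it substitutes a simple bag-of-words count and then ranks past transcripts by cosine similarity to a query. The function names and sample history are made up for illustration and do not reflect Amical's internals.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity = dot product / (|a| * |b|); 1.0 means same direction.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(history: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k past transcripts whose vectors are closest to the query."""
    q = embed(query)
    return sorted(history, key=lambda h: cosine(embed(h), q), reverse=True)[:k]
```

With the history `["Project A deadline moved to Friday", "Lunch order for the team", "Project A budget review"]`, `search(history, "project a status")` returns the two Project A entries and skips the lunch entry.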

Setup advice: For standard transcription (dictation), simply selecting the recommended "Amical Cloud" in the Speech tab provides high precision even in Japanese. If you want specific industry terms to be recognized correctly, registering those words in the Vocabulary screen will allow the Embedding and Language models to take them into account during processing.

Q. Why does Amical use an Embedding model?

A. Amical "remembers" all of your past statements and uses that information when converting your current speech into text.

To put it more intuitively, Amical is doing the following behind the scenes:

1. Listening as a "long-time partner" rather than a "stranger"

Ordinary transcription tools are in a "first-time meeting" state every time. Because of this, they cannot effectively convert your technical terms or unique phrasing. In contrast, Amical "prepares" by looking at all past history before listening to the current audio. Therefore, it converts text while already understanding your habits and the nature of your work.

2. Transcribing by "reading the room"

When you say "that matter," Amical quickly looks back at past records and determines, "Ah, this person is talking about Project A right now." By reflecting that "vibe (context)" in the current conversion, it provides correct sentences aligned with your intention, rather than just a simple transcription.

3. Keeping that "memory" only in your own hands

Normally, to achieve this level of intelligence, you would need to send data to a large server on the internet (under another company's management). However, Amical insists on a system where "past memories never leave your computer." That's why you can feel safe "talking about anything."

Summary: Amical acts like "a tight-lipped private secretary who transcribes your current talk in real-time while looking at a 'cheat sheet' consisting of your entire history."

The component behind this "quick re-read of the cheat sheet" is the "Embedding" model you saw on the settings screen. As a user, it is perfectly fine to think of it simply as: "The more I use it, the more it becomes a smart conversion tool tailored to me."

Amical's Folder Structure

Because Amical is local-first, it appears to store most of its data on your PC. I asked an AI where this data is kept. According to its answer, Amical currently stores its local data (history, settings, audio) in each OS's standard application data folder.

1. Data Storage Location (File Path)

On Windows, it is generally saved in the following path:

Windows: C:\Users\(Username)\AppData\Roaming\amical-desktop or C:\Users\(Username)\AppData\Local\amical-desktop
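To check which of the two locations exists on your PC, the sketch below builds both paths from the standard Windows environment variables `%APPDATA%` (the Roaming folder) and `%LOCALAPPDATA%` (the Local folder). The `amical-desktop` folder name comes from the paths above. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Folder name as given in the paths above.
FOLDER_NAME = "amical-desktop"


def candidate_data_dirs(env: Optional[Mapping[str, str]] = None) -> list[Path]:
    """Build the Roaming and Local candidate paths from environment variables."""
    env = os.environ if env is None else env
    dirs = []
    # APPDATA -> C:\Users\<name>\AppData\Roaming, LOCALAPPDATA -> ...\AppData\Local
    for var in ("APPDATA", "LOCALAPPDATA"):
        base = env.get(var)
        if base:  # skip unset variables (e.g. when run on macOS or Linux)
            dirs.append(Path(base) / FOLDER_NAME)
    return dirs


def existing_data_dirs(env: Optional[Mapping[str, str]] = None) -> list[Path]:
    """Keep only the candidate folders that actually exist on this machine."""
    return [d for d in candidate_data_dirs(env) if d.is_dir()]


if __name__ == "__main__":
    for folder in existing_data_dirs():
        print(folder)
```

Running it on Windows prints whichever of the two folders is present.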

Saved Contents

Inside that folder, you will mainly find the following types of data:

  • database folder (or .db file): This stores "History" text data and the previously mentioned "embeddings (vectorized data)."
  • recordings folder (or audio): If you have configured it to keep recordings locally, audio files (.wav or .mp3) will be saved here.
  • models folder: If you download local models like Whisper, large model files will be saved here.

Points to Check

If new files appear in this local folder after you press the "Download" button on the History screen, the audio has been downloaded from the cloud to your PC (the path above).

Conversely, if you recorded with the Speech model set to "Local" from the start, the audio is saved in this folder from the beginning and is not sent to the cloud at all.

2. Which file contains what?

The roles of the primary files and folders in this directory are as follows:

  • amical.db: This is the most important file; all of your "History" text is stored here. In my case it was already 68KB, so some data had already been recorded.
  • models folder: If you download local models such as Whisper (for speech recognition) or Ollama (for language/embedding) in the future, they will be saved here.
  • Local Storage / Session Storage: These store the app's settings and state.
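If you want to look inside `amical.db`, a minimal Python sketch follows. It assumes the file is an SQLite database. The `.db` extension suggests this, but the article does not confirm it, and if it is not SQLite, `sqlite3` raises `DatabaseError`. The script only lists table names, and it works on a temporary copy so the running app's file is never locked or modified. The copy may miss the very latest entries if the app is still writing.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """List table names in a temporary copy of the database (assumes SQLite)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Work on a copy so the original file is never touched.
        copy = Path(tmp) / "amical-copy.db"
        shutil.copy2(db_path, copy)
        conn = sqlite3.connect(copy)
        try:
            rows = conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            ).fetchall()
        finally:
            # Close before the temporary folder is deleted (required on Windows).
            conn.close()
    return [name for (name,) in rows]
```

Pass the full path to `amical.db` inside the folder described above. The table names it prints are whatever the app actually created. This article does not document them.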

3. About Downloading Audio Data

In my folder listing, there was no dedicated audio folder such as "recordings" at this point.

From this, we can infer the following:

Because Amical Cloud is being used, audio is sent to the cloud for analysis, and "only text (amical.db) remains on the local PC."

The appearance of a "Download" button on the history screen seems to be a function to bring audio files from the cloud into this local folder.

Supplementary Information

"Thank you for watching" sometimes appears

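Apparently, this happens because speech recognition models such as Whisper were trained on data that included YouTube videos, so the model can output this phrase when the audio contains little or no speech.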

Q&A

Q. AquaVoice seems to send screen captures along with voice input to assist in voice conversion; does Amical have such a feature?

A. It does not send screen captures during voice input.

Q. AquaVoice has a feature where the input voice is transcribed live as a preview; does Amical have this?

A. No. Only an animated waveform is shown while you speak. A live preview might be added in the future.

Q. Amical uses an embedding model; is the feature to consider the context of past voice conversions for the latest conversion active even if no model is specified?

A. In a state where you haven't specified (configured) the model yourself, "contextual correction" based on past history is not functioning. To draw out Amical's "true intelligence," the following steps are necessary:

  • Specify an Embedding model: Enables the AI to read past history.
  • Specify a Language model: Sets up the "brain" that performs "reading the room" conversions by cross-referencing history with the current audio.

→ So that's how it works. Conversion is already highly accurate without configuring these models, so configuring them should enhance its capabilities even further. I'd like to give this a try.
