Translated by AI: the content below is an AI-generated translation and may contain errors.

I Created a Transcription App for System Audio Using Whisper (Now Available on Microsoft Store!)


Overview

I created a piece of transcription software called AuText. No matter how you look at it, it's yet another one of those tools.
I've also released it on the Microsoft Store.

Here is the store page (Microsoft Store):
https://www.microsoft.com/store/productId/9NP0PJHCSRH3

Here is the homepage:
https://autext.orizika.com/

Contact

https://discord.gg/Zy65k8AxH2

What did I make!?

I made a piece of transcription software (it's free).
It can transcribe audio from three sources: software (individual processes), microphones, and audio files.
I kept the design simple and aimed for something that is easy to install and uninstall.
By the way, I'm the kind of person who would simply perish if things didn't run locally (that is, processed entirely on the PC), so I haven't used any external cloud services. (I'd dieeee~~)

Why did I make it?

I saw someone post, "I made transcription software!" I happened to have some source code that I had abandoned earlier because I couldn't meet a certain requirement, so I thought, "If there's demand for this, I might as well make one too~."

What was that "certain requirement"?

Originally, I wanted to build software that could transcribe and translate in real time while I played games. However, real-time transcription consumes a lot of PC resources, which makes it extremely difficult to run locally (the same goes for translation). So an external cloud service would be the best option, but...

"Honestly, that's a massive hassssle!!!!!" I thought. Here's why: cloud services usually charge on a pay-as-you-go basis, so they cost money, and pay-as-you-go means you have to track usage. But since the user's PC can't be trusted, the developer has to set up a server as a go-between that forwards requests to the cloud service.
And so I concluded: "Building a server is such a hassssle!!!!! Running a server is such a hassssle too!!!"

That's it.

What am I using?

  • Language: C#
  • GUI: WinUI3
  • Transcription model: Whisper
  • Transcription engine: whisper.cpp
  • C# wrapper for whisper.cpp: Whisper.net
  • Audio: NAudio

NAudio has no feature for capturing audio from individual processes, so I modified NAudio to make it work. Yay!
I wrote an article about capturing audio from processes a while back, so please take a look:
https://zenn.dev/test_myname/articles/a5338b7ef10e37

I completely forgot to mention this: I'm using a feature that is only available on Windows 11 and later (per-process audio capture), so the app might not work on Windows 10.
I lied, I'm sorry.
I have confirmed that it works on "Windows 10 21H2 19044.2846" (which is not Windows 11).
If you keep Windows updated regularly, it should work.
For those who haven't updated in over two years... I'm not really sure...

Where is the software published?

I released it on the Microsoft Store! Yes!☆
https://www.microsoft.com/store/productId/9NP0PJHCSRH3

Were there any difficulties?

None. If I had to name something:

  • I nearly died of frustration trying to pass data correctly between the raw audio (my custom capture code), NAudio (audio conversion), WinUI3 (audio display), and Whisper (transcription).
  • I started to dislike C# a bit because I couldn't release memory the way I wanted. When you want to be meticulous about memory, C++ is better after all; C# is safe, but the GC...

That's about it.
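Much of the first difficulty comes down to audio formats: a capture device typically delivers interleaved multi-channel samples at 44.1 or 48 kHz, while whisper.cpp expects 16 kHz mono 32-bit float samples. The snippet below is a minimal, hypothetical sketch of that conversion, not AuText's actual code; the names `WhisperInput` and `ToMono16k` are made up for illustration, and it assumes the input is already interleaved float PCM. It uses naive linear interpolation with no anti-aliasing filter, so for real use a proper resampler (for example NAudio's `WdlResamplingSampleProvider`) would be the safer choice.

```csharp
using System;

// Hypothetical helper (not from AuText): converts interleaved float PCM
// into 16 kHz mono float samples, the input format whisper.cpp expects.
public static class WhisperInput
{
    public const int TargetSampleRate = 16000;

    public static float[] ToMono16k(float[] interleaved, int channels, int sampleRate)
    {
        if (interleaved == null) throw new ArgumentNullException(nameof(interleaved));
        if (channels <= 0) throw new ArgumentOutOfRangeException(nameof(channels));
        if (sampleRate <= 0) throw new ArgumentOutOfRangeException(nameof(sampleRate));

        // 1) Downmix: average all channels of each frame.
        //    A trailing partial frame (if any) is ignored.
        int frames = interleaved.Length / channels;
        var mono = new float[frames];
        for (int f = 0; f < frames; f++)
        {
            float sum = 0f;
            for (int c = 0; c < channels; c++)
                sum += interleaved[f * channels + c];
            mono[f] = sum / channels;
        }

        if (sampleRate == TargetSampleRate || frames == 0)
            return mono;

        // 2) Resample with linear interpolation between neighboring samples.
        //    No low-pass filter is applied, so downsampling can alias.
        int outLength = (int)((long)frames * TargetSampleRate / sampleRate);
        var output = new float[outLength];
        double step = (double)sampleRate / TargetSampleRate;
        for (int i = 0; i < outLength; i++)
        {
            double pos = i * step;
            int i0 = (int)pos;
            int i1 = Math.Min(i0 + 1, frames - 1);
            float t = (float)(pos - i0);
            output[i] = mono[i0] + (mono[i1] - mono[i0]) * t;
        }
        return output;
    }
}
```

For example, two seconds of 48 kHz stereo capture (192,000 interleaved floats) comes out as 32,000 mono samples.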

Future Development Plans?

It's something like this.
I think a timeline feature is essential, so I'll get to that soon.
As for real-time transcription... honestly, my PC is too underpowered to process audio in real time, so that may be pushed back quite a while.

I'm also considering features like embedding subtitles into a loaded video and exporting the result.

The End

Thank you all for your support.

By the way... I feel like someone's going to say, "Seriously, stop making mini-apps and work on the 'CommGameNews' you mentioned in December lol, are you an idiot lol"... Sorry.
