Get Started for Free: The Complete Guide to Creating Parody Songs with OSS and AI


Introduction

"I want to arrange that song in a certain style," "I want my favorite character to sing lyrics I wrote," "I want to create a funny parody song and share it on social media" — have you ever had these desires?

Conventionally, producing parody songs required musical expertise, expensive software, and even singing ability. However, with the rapid development of AI and speech synthesis technology, we are now in an era where anyone can easily create high-quality parody songs.

In this article, we explain in detail how to create high-quality parody songs using free open-source software (OSS) and AI services. From an engineer's perspective, we cover everything from the technical background of music and audio processing to the actual production steps.

What You Will Learn in This Article

  • Overview of the latest AI tools and OSS that can be utilized for parody song production
  • A series of workflows from lyrics generation to speech synthesis
  • Technical mechanisms and how to choose each tool
  • How to use them with consideration for copyright

Engineers with programming knowledge can perform even more advanced customizations. Let's dive into the world of creation using the latest technology!

Basic Knowledge of Parody Song Creation

What is a Parody Song?

"Parody song" (kae-uta) refers to a song where the lyrics are changed while keeping the melody of an existing song. It is a long-standing creative activity used in various contexts such as parody, satire, and educational purposes.

From a technical perspective, parody song production consists of the following elements:

  1. Analysis of the original song: Understanding the tempo, scale, rhythmic structure, etc.
  2. Creation of new lyrics: Creating lyrics that match the syllable count and accents of the original lyrics.
  3. Singing voice synthesis: Generating the voice that sings the new lyrics.
  4. Audio editing: Adjusting with accompaniment, adding effects, etc.

Conventional Parody Song Creation Methods and Their Challenges

Conventionally, the following approaches were common for creating parody songs:

  • Recording yourself singing
  • Hiring professional singers or voice actors
  • Using commercial software like VOCALOID

However, these methods faced the following challenges:

  • Issues with singing ability and recording environment
  • High cost of hiring experts
  • High price and difficulty of using commercial software
  • Limitations in vocal naturalness and expressiveness

The Revolution AI Brings to Parody Song Creation

With the development of AI and OSS, these challenges are being significantly resolved. Currently, the following are becoming possible:

  • Automatic lyrics generation using large language models
  • Creation of accompaniment and melodies using music generation AI
  • Natural and expressive singing voices through cutting-edge speech synthesis
  • Free and customizable environments through open source

From a technical standpoint, the following advances are especially significant:

  • Performance improvements in language models due to the development of Transformer-based architectures
  • Emergence of deep-learning speech synthesis models (VITS, FastSpeech2, etc.)
  • Advancement of music generation models (MusicGen, MusicLM, etc.)
  • Improved accuracy of source separation technology

By combining these technologies, anyone can now create high-quality parody songs at a low cost.

Parody Song Creation Workflow

The following is a general workflow for creating parody songs using OSS and AI.

[Figure: Parody song production process combining OSS and AI services]

Following this workflow, we will introduce specific tools and services and explain how to use them.

AI Services Available for Free

Music Generation AI

In recent years, AI that can generate high-quality music from text prompts has developed rapidly. Below, we introduce free services that are particularly useful for creating parody songs.

Suno AI

Suno AI is a service that can generate high-quality music from text. By simply specifying lyrics and style, it can automatically generate a song with vocals.

Main Features:

  • Generate up to 10 songs per day on the free plan
  • Support for multiple languages, including Japanese
  • Generation of songs up to 4 minutes long
  • Ability to specify lyrics and style (genre)
  • Features to further edit or extend the created songs

Technical Background:
Suno AI combines speech synthesis and music generation technologies. Suno has not published full details of its models; they are understood to be Transformer-based and trained on large amounts of musical data.

How to Use:

  1. Register an account (possible with a Google account, etc.)
  2. Create a new project in the "Create" tab
  3. Enter the lyrics (Japanese is supported)
  4. Specify the style (e.g., "J-pop ballad," etc.)
  5. Download the generated song

[Figure: Example of the Suno AI interface]

Sample Prompt:

Title: Engineers' Daily Life
Style: Japanese pop rock
Lyrics:
Writing and deleting code, another day's battle with bugs
Seeking help on Stack Overflow
Debugging until morning, sleepless nights continue
Still, we move forward with engineers' pride in our hearts

YuE (乐)

YuE is a relatively new open-source music generation AI released in January 2025. It is particularly strong in generating singing voices in Asian languages, including Japanese.

Main Features:

  • Completely free and open source
  • Capable of generating songs up to 5 minutes long
  • Supports Japanese, Chinese (Mandarin/Cantonese), English, and Korean
  • Supports various genres via style specification
  • Can be run in a local environment

Technical Background:
YuE was developed by the Multimodal Art Projection team at the Hong Kong University of Science and Technology. Based on Transformer-like architectures, it tokenizes musical components (lyrics, instruments, rhythm, pitch, timbre, etc.) and generates them in an integrated manner.

Usage Requirements:

  • 16GB or more of memory recommended
  • NVIDIA GPU recommended
  • Python environment required

Sample Settings:

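# Configuration example
# Illustrative settings only: these keys show the kind of input involved and are
# not YuE's actual interface. See the YuE repository for its inference script and options.
config = {
    "lyrics": "When the spring breeze blows, cherry blossom petals flutter down",
    "language": "japanese",
    "genre": "jpop",
    "tempo": 90,
    "duration": 120,  # in seconds
}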

Udio

Udio is an AI service that can generate high-quality music from text-based prompts. It is characterized by its high degree of customizability.

Main Features:

  • Free plan allows 10 credits per day, up to 100 credits per month
  • Supports Japanese lyrics
  • Manual mode allows for detailed settings
  • Song editing and adjustment features
  • Mixing and mastering features for generated music

Technical Background:
Udio combines music generation models with neural vocal synthesis to generate both high-quality vocals and accompaniments. It is particularly excellent in the expressiveness and realism of the singing voice.

How to Use:

  1. Register an account on the website
  2. Select simple mode or manual mode
  3. Specify lyrics, genre, style, etc.
  4. Adjust customization parameters as needed
  5. Edit and download the generated song

These AI services can all be used with a free plan and allow you to easily generate high-quality music without programming knowledge. However, for commercial use, it is important to check the terms of use for each service.

AI Cover / Singing Voice Conversion Services

Services that create covers of existing songs as if they were sung by another singer using AI have also emerged. These are very useful tools for parody song production.

4Covers.ai

4Covers.ai is a service that allows you to cover existing songs with AI voices.

Main Features:

  • Basic features available for free
  • Supports YouTube video URLs and audio file uploads
  • Choose from numerous AI characters (singing voices)
  • Downloadable generated AI singing voices

How to Use:

  1. Access the official website
  2. Click "Try it Now"
  3. Select the AI character to use
  4. Upload an audio file or enter a YouTube video link
  5. Click "Generate AI Cover" to generate
  6. Download the generated AI singing voice

Musicfy

Musicfy is a service that covers uploaded songs with AI singing voices.

Main Features:

  • Easy to use with simple operations
  • Choose from a variety of voice types
  • Extract vocals from the original song and replace them with another voice
  • Downloadable converted audio

How to Use:

  1. Access the official website
  2. Click "Upload Song" and upload the audio file of the song you want to cover
  3. Select the desired voice and enter your email address
  4. Click "Let's go!"
  5. Download the generated AI cover song

By using these services, you can have a different voice sing an existing song without changing the lyrics, or have it sing lyrics you've created yourself. However, sufficient attention must be paid to copyright.

Utilization of Open Source (OSS) Tools

In addition to large language models and web services, there are numerous high-performance open-source tools that can be used in a local environment. Here, we introduce important OSS tools particularly related to singing voice synthesis and audio processing.

Singing Voice Synthesis Tools

UTAU / OpenUTAU

UTAU is a singing voice synthesis software that has existed for a long time, and OpenUTAU has been developed as its modern version.

Main Features:

  • Completely free to use
  • Abundant free voice libraries (voicebanks)
  • Detailed adjustment of singing voices is possible
  • Supports a wide range of platforms (OpenUTAU)
  • Active community and resources

Technical Background:
UTAU uses concatenative synthesis technology, which generates singing voices by connecting recorded audio samples. OpenUTAU extends this with a modern interface and incorporates more advanced algorithms.
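
To make the idea of concatenative synthesis concrete, here is a minimal sketch. It is not UTAU's or OpenUTAU's actual algorithm: it only joins two recorded units (plain lists of samples standing in for audio) with a short linear crossfade so that the seam does not click. The function name `crossfade_concat` and the overlap length are assumptions made for illustration.

```python
from typing import List


def crossfade_concat(a: List[float], b: List[float], overlap: int) -> List[float]:
    """Join two sample sequences, blending the last `overlap` samples of `a`
    with the first `overlap` samples of `b` using a linear crossfade."""
    if overlap < 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 0 and the shorter unit length")
    if overlap == 0:
        return a + b
    head = a[: len(a) - overlap]
    tail = b[overlap:]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of unit b rises across the overlap
        blended.append((1.0 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + blended + tail


unit_ka = [1.0, 1.0, 1.0, 1.0]  # stand-in for a recorded "ka" sample
unit_sa = [0.0, 0.0, 0.0, 0.0]  # stand-in for a recorded "sa" sample
joined = crossfade_concat(unit_ka, unit_sa, overlap=2)
```

Real engines also stretch and pitch-shift each unit to fit the note before joining, which this sketch leaves out.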

How to Use:

  1. Install the software
  2. Install a voice library (such as "Kasane Teto")
  3. Prepare MIDI data or manually enter notes
  4. Set the lyrics
  5. Adjust parameters and output

[Figure: Example of the OpenUTAU editing screen]

# Installation example for OpenUTAU (for Linux)
git clone https://github.com/stakira/OpenUtau.git
cd OpenUtau
dotnet build

VOICEVOX

VOICEVOX is a free speech synthesis software developed in Japan by Hiroshiba. While primarily designed for spoken text, it can also be used for singing.

Main Features:

  • Free to use; commercial use is permitted subject to the terms of each voice character
  • High-quality Japanese speech synthesis
  • Easy operability
  • Cross-platform support
  • Ability to control emotional expression

Technical Background:
VOICEVOX uses a deep-learning-based speech synthesis engine, which allows natural speech to be generated with few parameter adjustments.

How to Use:

  1. Download and install the software
  2. Enter lyrics into the text input area
  3. Select a voice character
  4. Adjust parameters (speed, emotion, etc.)
  5. Generate and export the audio
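
For scripted use, the same steps can be driven through the HTTP interface of a locally running VOICEVOX engine instead of the GUI. The sketch below assumes the engine is listening on its default local address and that speaker ID 1 exists in your installation; both are assumptions to check against your setup. `build_url` and `synthesize` are helper names introduced here for illustration.

```python
import json
import urllib.parse
import urllib.request

ENGINE_URL = "http://127.0.0.1:50021"  # assumed default address of a local engine


def build_url(path: str, **params) -> str:
    """Build an engine URL with URL-encoded query parameters."""
    return f"{ENGINE_URL}{path}?{urllib.parse.urlencode(params)}"


def synthesize(text: str, speaker: int = 1) -> bytes:
    """Return WAV bytes for `text`. Requires the VOICEVOX engine to be running."""
    # Step 1: ask the engine for a synthesis query (phonemes, pitch, speed, ...)
    req = urllib.request.Request(
        build_url("/audio_query", text=text, speaker=speaker), method="POST"
    )
    with urllib.request.urlopen(req) as res:
        query = json.load(res)
    # Step 2: render the (optionally edited) query to audio
    req = urllib.request.Request(
        build_url("/synthesis", speaker=speaker),
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as res:
        return res.read()


# Usage (only works while the engine is running):
#   with open("line01.wav", "wb") as f:
#       f.write(synthesize("こんにちは"))
```

Because the query returned in step 1 is plain JSON, values such as speed can be adjusted in code before step 2.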

Style-Bert-VITS2

Style-Bert-VITS2 is an open-source AI model that combines state-of-the-art speech synthesis technologies.

Main Features:

  • Highly natural speech synthesis is possible
  • Create new voices by merging multiple models
  • High-precision analysis of Japanese text
  • Control over styles such as emotions and speaking manners
  • Adjustment of accents and intonations

Technical Background:
Style-Bert-VITS2 adopts an architecture that combines the BERT language model with VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech). This enables advanced speech synthesis based on a semantic understanding of the text.

Usage Example:

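# Sketch of speech synthesis with Style-Bert-VITS2 (pip install style-bert-vits2).
# Names follow the project's library usage notebook; verify against your installed version.
from pathlib import Path
from style_bert_vits2.constants import Languages
from style_bert_vits2.nlp import bert_models
from style_bert_vits2.tts_model import TTSModel

bert_models.load_model(Languages.JP, "ku-nlp/deberta-v2-large-japanese-char-wwm")
bert_models.load_tokenizer(Languages.JP, "ku-nlp/deberta-v2-large-japanese-char-wwm")
model = TTSModel(model_path=Path("path/to/model.safetensors"), config_path=Path("path/to/config.json"),
                 style_vec_path=Path("path/to/style_vectors.npy"), device="cpu")
sample_rate, wav = model.infer(text="This is a sample of lyrics")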

NNSVS (Neural Network Singing Voice Synthesis)

NNSVS is a neural network-based singing voice synthesis toolkit developed by researcher Ryuichi Yamamoto.

Main Features:

  • Advanced singing voice synthesis based on academic research
  • Flexible model architecture
  • Support for MusicXML and UST formats
  • Advanced pitch and timing control
  • Abundant customization options

Technical Background:
NNSVS was inspired by the open-source singing voice synthesis system Sinsy and applies the latest deep learning approaches. It implements multiple architectures such as RNN-LSTM, Autoregressive models, and Transformer-based models for predicting acoustic features.

Basic Usage Steps:

  1. Install dependency libraries
  2. Train a model or download an existing model
  3. Prepare score data (such as MusicXML)
  4. Adjust configuration files
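  5. Execute synthesis and post-processing

# Installation example for NNSVS
pip install nnsvs
# Illustrative synthesis command: the actual entry points and options depend on the
# NNSVS version and recipe, so check the project documentation before running
nnsvs synthesis question.musicxml --acoustic-model=/path/to/model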

Source Separation and Audio Processing Tools

UVR5 (Ultimate Vocal Remover v5)

UVR5 is an AI-powered source separation tool that can separate vocals and accompaniment from songs with high precision.

Main Features:

  • Completely free to use
  • High-precision source separation
  • Features such as vocal extraction, accompaniment extraction, and harmony removal
  • Batch processing support
  • High-speed processing using GPU

Technical Background:
UVR5 implements source separation algorithms based on deep learning. Specifically, it uses convolutional neural networks applying the U-Net architecture to perform source separation in the frequency domain.
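
The general idea behind frequency-domain separation can be sketched as masking: a network estimates, for each time-frequency bin, how much of the energy belongs to the vocal, and the mixture is split accordingly. The code below shows only that masking step on a tiny hand-written spectrogram. It is an assumption about the typical approach, not a description of UVR5's code, and the mask values are hypothetical stand-ins for a model's output.

```python
from typing import List, Tuple

Spectrogram = List[List[float]]  # magnitude per [time frame][frequency bin]


def apply_mask(mixture: Spectrogram, vocal_mask: Spectrogram) -> Tuple[Spectrogram, Spectrogram]:
    """Split a mixture spectrogram into vocal and accompaniment estimates.
    Each mask value in [0, 1] is the estimated share of energy belonging to the vocal."""
    vocals, accompaniment = [], []
    for mix_frame, mask_frame in zip(mixture, vocal_mask):
        vocals.append([m * k for m, k in zip(mix_frame, mask_frame)])
        accompaniment.append([m * (1.0 - k) for m, k in zip(mix_frame, mask_frame)])
    return vocals, accompaniment


mixture = [[4.0, 2.0], [1.0, 8.0]]
mask = [[0.75, 0.0], [1.0, 0.25]]  # hypothetical output of a separation network
vocals, accompaniment = apply_mask(mixture, mask)
```

With this kind of mask, the two estimates always add back up to the mixture, which is why the quality of the separation depends entirely on how well the mask is predicted.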

Example of use on Google Colaboratory:
A GPU is recommended for running UVR5 at a practical speed. Below is an example using Google Colaboratory, which offers free GPU access.

# Setup example on Google Colab
!git clone https://github.com/Anjok07/ultimatevocalremovergui.git
%cd ultimatevocalremovergui
!pip install -r requirements.txt

# Illustrative CLI call: the repository is primarily a GUI application, so the
# script name and flags below are placeholders to adapt to the wrapper you use
!python uvr.py --input "input.mp3" --model_name "your_model" --output_dir "output"

Audacity

Audacity is a multifunctional open-source audio editor. It can be widely used for audio editing and processing.

Main Features:

  • Completely free and open source
  • Cross-platform support
  • Variety of editing and effect functions
  • Plugin support
  • Multi-track editing

Usage Examples:

  • Mixing separated accompaniment and synthesized singing voice
  • Trimming or fading audio
  • Applying effects such as noise removal or adding echo
  • Adjustment of pitch and tempo
  • Optimization of volume balance

In parody song production, it is very convenient for integrating synthesized singing voices and accompaniment tracks, as well as final audio adjustments.

SoX (Sound eXchange)

SoX is a powerful command-line-based audio processing tool. It is suitable for batch processing and automation.

Main Features:

  • Command-line interface
  • Supports various audio file formats
  • Advanced audio processing functions
  • Ideal for use in scripts and pipelines
  • Resource-efficient

Basic Command Examples:

# Audio file format conversion (MP3 output requires a SoX build with MP3 support)
sox input.wav output.mp3

# Changing the sampling rate
sox input.wav -r 44100 output.wav

# Volume adjustment
sox input.wav output.wav vol 2.0

# Adding an echo effect
sox input.wav output.wav echo 0.8 0.9 1000 0.3
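
Since SoX is well suited to automation, the commands above can be generated from a script. The following is a minimal sketch that builds one resampling command per WAV file in a folder. The helper names and the folder names in the usage comment are hypothetical, and actually running the commands requires SoX to be installed and on your PATH.

```python
from pathlib import Path
from typing import List


def sox_resample_command(src: Path, out_dir: Path, rate: int = 44100) -> List[str]:
    """Build the argument list for `sox <src> -r <rate> <out_dir>/<name>`."""
    return ["sox", str(src), "-r", str(rate), str(out_dir / src.name)]


def batch_commands(in_dir: Path, out_dir: Path, rate: int = 44100) -> List[List[str]]:
    """One resample command per .wav file in `in_dir`, in sorted order."""
    return [sox_resample_command(p, out_dir, rate) for p in sorted(in_dir.glob("*.wav"))]


# To execute (requires SoX on PATH):
#   import subprocess
#   for cmd in batch_commands(Path("takes"), Path("resampled")):
#       subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes the commands easy to print and inspect before anything touches your files.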

Practice: Step-by-Step Guide to Creating Parody Songs

Here, we will explain the series of steps to actually create a parody song. While you can choose tools according to the situation, this guide introduces a combination of relatively accessible tools.

Step 1: Selection and Analysis of the Original Song

Song Selection

First, choose the song that will serve as the base for your parody. It is a good idea to consider the following points:

  • High recognition and easy for many people to identify
  • Clear melody line
  • Clear lyrical structure
  • Ideally, the audio source is easy to obtain

Examples include "Natsumatsuri" (Whiteberry) or "A Cruel Angel's Thesis" (Yoko Takahashi).

Audio Source Separation

Separate the vocals and accompaniment from the original song. UVR5 works well for this.

# Example execution on Google Colab (script name, flags, and model name are placeholders)
!python uvr.py --input "summer_festival.mp3" --model_name "HP5-UVR" --output_dir "./separated"

This will generate separated files for accompaniment and vocals in the ./separated directory.

Analysis of Original Lyrics

Analyze the lyrical structure of the original song. Pay particular attention to the following points:

  • Number of syllables (mora count) in each phrase
  • Accent patterns (pitch accent)
  • Rhyming scheme
  • Structure such as Chorus (Sabi) or Verse (A-mero)

For example, analyzing a sample phrase:

「期待してた夏が来る 胸を高鳴らせ」
Ki-ta-i-shi-te-ta-na-tsu-ga-ku-ru  mu-ne-wo-ta-ka-na-ra-se
(11 morae + 8 morae)

By understanding the syllable count and rhythm patterns in this way, it becomes easier to create new lyrics.
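
Counting by hand is error-prone, so a small helper can do it. The sketch below counts morae in a kana string under simplified rules: small kana such as ゃ/ゅ/ょ merge with the preceding character, while っ, ん, and ー each count as one mora. It assumes the input is already written in hiragana or katakana with no punctuation; converting kanji to kana is a separate step not shown here. The name `count_morae` is introduced for illustration.

```python
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def count_morae(kana: str) -> int:
    """Count morae in a hiragana/katakana string.
    Small kana merge with the preceding kana; っ, ん and ー each count as one mora."""
    return sum(1 for ch in kana if not ch.isspace() and ch not in SMALL_KANA)


print(count_morae("きたいしてたなつがくる"))  # 11
print(count_morae("むねをたかならせ"))        # 8
```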

Step 2: Creating New Lyrics

Deciding on a Theme

Decide on the theme or direction of your parody song. For example:

  • Technical topics (programming languages, development environments, etc.)
  • Current events or trends
  • Content related to specific hobbies or activities
  • Parody or satire

Writing the Lyrics

Create new lyrics that fit the structure of the original song. Utilizing Large Language Models (LLMs) can be efficient.

Example Prompt for LLM:

Please create parody song lyrics with a programming theme, maintaining the structure (syllable count, prosody) of the following original lyrics.

Original Lyrics:
「期待してた夏が来る 胸を高鳴らせ」
(Ki-ta-i-shi-te-ta-na-tsu-ga-ku-ru  mu-ne-wo-ta-ka-na-ra-se)
(11 morae + 8 morae)

In particular, please ensure that the syllable count for each line matches the original lyrics exactly and that rhymes are placed in the same locations.
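
When a song has many lines, a prompt like the one above can be assembled programmatically so that each line carries its own count. This is a minimal sketch: the function name `build_parody_prompt` and the exact wording of the prompt are assumptions, and sending the prompt to an LLM is left to whichever service you use.

```python
from typing import List


def build_parody_prompt(theme: str, lines: List[str], mora_counts: List[int]) -> str:
    """Assemble an LLM prompt that states the required mora count for every line."""
    if len(lines) != len(mora_counts):
        raise ValueError("need exactly one mora count per lyric line")
    spec = "\n".join(
        f"{i}. {line} ({count} morae)"
        for i, (line, count) in enumerate(zip(lines, mora_counts), start=1)
    )
    return (
        f"Please write parody lyrics on the theme of {theme}.\n"
        "Keep the number of lines and match the mora count of each line exactly.\n\n"
        f"Original lyrics:\n{spec}\n"
    )


prompt = build_parody_prompt(
    "programming",
    ["ki-ta-i-shi-te-ta-na-tsu-ga-ku-ru", "mu-ne-wo-ta-ka-na-ra-se"],
    [11, 8],
)
```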

Refining the Lyrics

Read the generated lyrics aloud to check the rhythm and naturalness. Make fine adjustments as necessary. Pay attention to the following points in particular:

  • Consistency in syllable count
  • Naturalness of accents
  • Ease of understanding the meaning
  • Integrity of rhymes

Step 3: Singing Voice Synthesis

There are several approaches depending on your purpose and skill level.

Approach 1: Using Music Generation AI (For Beginners)

The easiest way is to enter new lyrics into a music generation AI like Suno AI and have it generate a song from scratch.

Title: Programmer's Daily Life
Style: Similar to "Summer Festival" by Whiteberry
Lyrics:
Fixing code with errors, throughout the night
Finding bugs and correcting them, fighting sleepiness
...

With this approach, the melody won't be exactly the same as the original song, but a new song with a similar atmosphere will be generated.

Approach 2: Using Singing Voice Synthesis Software like UTAU (For Intermediate Users)

If you want to create a parody song that is more faithful to the original, use singing voice synthesis software such as OpenUTAU.

  1. Install OpenUTAU and set up the voice library.
  2. Obtain MIDI data of the original song or enter notes manually.
  3. Assign the new lyrics to the notes.
  4. Adjust parameters (vibrato, volume, etc.).
  5. Render the singing voice.

Approach 3: Using AI Cover Services (For Advanced Users)

Another effective method is to render the new lyrics as a singing track and then have an AI cover service convert it to your preferred voice, over the accompaniment of the original song.

  1. Prepare the accompaniment track separated by UVR5.
  2. Create singing data for the new lyrics (using UTAU, etc.).
  3. Use a service like 4Covers.ai to have it sung with your preferred voice quality.

Step 4: Mixing and Editing

Combining Singing Voice and Accompaniment

Combine the singing voice and accompaniment into a single song. Audacity is convenient for this.

  1. Create a new project in Audacity.
  2. Import the accompaniment track.
  3. Import the synthesized singing voice.
  4. Adjust timing and synchronize.
  5. Adjust the volume balance of both tracks.
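
What Audacity does in steps 4 and 5 can be sketched in a few lines: shift the vocal in time, scale each track, sum them, and keep the result within range. The code below works on plain lists of float samples in [-1, 1] rather than real audio files, and the default gain values are arbitrary assumptions.

```python
from typing import List


def mix_tracks(vocal: List[float], backing: List[float],
               vocal_gain: float = 1.0, backing_gain: float = 0.7,
               vocal_offset: int = 0) -> List[float]:
    """Sum two mono tracks of float samples in [-1, 1].
    `vocal_offset` delays the vocal by that many samples; the sum is clipped to [-1, 1]."""
    if vocal_offset < 0:
        raise ValueError("vocal_offset must be non-negative")
    length = max(len(backing), vocal_offset + len(vocal))
    mixed = [0.0] * length
    for i, s in enumerate(backing):
        mixed[i] += backing_gain * s
    for i, s in enumerate(vocal):
        mixed[vocal_offset + i] += vocal_gain * s
    return [max(-1.0, min(1.0, s)) for s in mixed]
```

Hard clipping is the simplest safeguard; in practice it is better to lower the gains until the mix no longer reaches the limit.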

Adding Effects

Add effects as needed:

  • Reverb: Adds spatial depth.
  • Compression: Adjusts the dynamic range of volume.
  • EQ: Enhances or suppresses specific frequency bands.
  • Delay: Adds an echo effect.
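
Of these effects, delay is the simplest to express in code: each output sample is the input plus an attenuated copy of the input from a fixed number of samples earlier. The sketch below implements a single echo on a list of float samples; the function name and default decay are assumptions, and editors such as Audacity offer richer versions with repeated echoes.

```python
from typing import List


def add_delay(samples: List[float], delay_samples: int, decay: float = 0.4) -> List[float]:
    """Add a single echo: y[n] = x[n] + decay * x[n - delay_samples].
    The output is longer than the input by `delay_samples` so the echo tail is kept."""
    if delay_samples <= 0:
        raise ValueError("delay_samples must be positive")
    out = list(samples) + [0.0] * delay_samples
    for n, s in enumerate(samples):
        out[n + delay_samples] += decay * s
    return out
```

At a 44100 Hz sampling rate, a delay of 22050 samples corresponds to an echo half a second later.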

Final Adjustments

To finalize the track, perform the following adjustments:

# Example of final audio quality adjustment using SoX
sox mixed.wav -b 16 final.wav rate 44100 dither -s

Add fade-ins and fade-outs as necessary.
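
A fade is just a gain ramp applied to the start or end of the track. The following sketch applies linear fades to a list of float samples; the function name is an assumption, and fade lengths are given in samples rather than seconds.

```python
from typing import List


def apply_fades(samples: List[float], fade_in: int, fade_out: int) -> List[float]:
    """Apply a linear fade-in over the first `fade_in` samples and a linear
    fade-out over the last `fade_out` samples."""
    n = len(samples)
    if fade_in < 0 or fade_out < 0 or fade_in + fade_out > n:
        raise ValueError("fade lengths must be non-negative and fit inside the track")
    out = list(samples)
    for i in range(fade_in):
        out[i] *= i / fade_in           # gain rises from 0 toward 1
    for i in range(fade_out):
        out[n - 1 - i] *= i / fade_out  # gain falls to 0 at the last sample
    return out
```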

Application Examples and Usage Scenarios

Parody song technology can be utilized in various scenarios. Here are some practical examples.

Utilization in Educational Content

Parody songs are highly effective as a tool for conveying technical or academic content in an easy-to-remember way.

Example: A parody song for learning programming language concepts

// To the melody of "Natsumatsuri"
"Variables are like boxes to put values in
 Once they go out of scope, they disappear"

Such parody songs present complex concepts in a form that is easy to memorize.

Utilization in Corporate and Team Events

By using them in internal events or presentations, you can create impressive and memorable content.

Example: A parody song for a project retrospective

// To the melody of "A Cruel Angel's Thesis"
"Cruel deadlines are rushing us
 The chaos before the due date is unavoidable
 I want to run away, but the team works as one
 Never giving up until the end, to complete it"

Personal Creative Activities

Creating parodies of your favorite songs as a hobby or personal expression is also a fun approach.

Example: A parody song expressing the daily life of an engineer

// To the melody of "Kimi no Shiranai Monogatari"
"Bugs that won't go away, only errors
 Until the dawn of a night where the stack overflows
 I was debugging codes that you don't know"

Tech Demonstrations

Generating and showing parody songs in real-time is also effective as a demonstration of AI technology.

Implementation Example: Live Demo System

# Overview of a real-time parody song generation system.
# Each stage is passed in as a function, so you can plug in your own wrappers
# around the tools above (for example UVR5, an LLM, OpenUTAU, and SoX).
def live_parody_demo(original_song, new_theme, *, separate_audio,
                     transcribe_lyrics, generate_parody_lyrics,
                     synthesize_voice, mix_tracks):
    # 1. Analyze the original song
    vocal, instrumental = separate_audio(original_song)
    lyrics = transcribe_lyrics(vocal)

    # 2. Generate lyrics based on the new theme
    new_lyrics = generate_parody_lyrics(lyrics, theme=new_theme)

    # 3. Synthesize singing voice with the new lyrics
    new_vocal = synthesize_voice(new_lyrics, melody_from=vocal)

    # 4. Mix with accompaniment
    final_song = mix_tracks(new_vocal, instrumental)

    return final_song

By building such a system and showcasing it as a live demonstration, you can convey the potential of AI technology.

Copyright Considerations

When creating and publishing parody songs, an understanding of copyright is necessary.

Differences Between Private Use and Public Distribution

  • Private Use: Creating parody songs for personal enjoyment is usually not a problem.
  • Public Distribution/Sharing: Publishing on social media or websites may infringe copyright.

Regarding Parody Use

In Japanese copyright law, there are no clear provisions regarding "parody," and permission is often required even when modifying and using original works.

Appropriate Actions

It is important to respond appropriately in the following ways:

  1. Obtain permission from the rightsholder: If possible, obtain permission from the copyright holder of the original song.
  2. Utilize royalty-free materials: Use copyright-free music or public domain songs.
  3. Clearly state the original song credits: If used, clearly specify the source.
  4. Limit to non-commercial purposes: Be especially careful with commercial use.

Regarding the copyright of music generated by AI, it is important to check the terms of use for each service.

  • Suno AI: The free plan only permits personal use; a paid plan is required for commercial use.
  • YuE: An open-source project that allows commercial use.
  • UTAU: Usage conditions vary depending on the voice library.

Summary

In this article, we introduced methods for creating high-quality parody songs using OSS and AI services. Due to technological evolution, music production—which once required specialized knowledge and expensive software—has become accessible to everyone.

Recapping the main points:

  1. Analysis of the original song: Separate audio sources with tools like UVR5 and understand the lyrical structure.
  2. Lyrics generation: Create new lyrics while maintaining the original structure.
  3. Singing voice synthesis: Utilize UTAU, VOICEVOX, or Suno AI.
  4. Editing and mixing: Perform final adjustments with Audacity and other tools.

By combining these technologies, you can create parody songs that can be used in various fields such as education, entertainment, and communication.

The combination of AI and OSS is a crucial factor leading to the democratization of creativity. Let's continue to explore the possibilities of music and audio processing from an engineer's perspective while keeping an eye on technological developments.

Finally, don't forget to be mindful of copyright and enjoy your creative activities!

Reference Resources

https://zenn.dev/acntechjp/articles/20250405_music_ai
