Translated by AI
Deleting All Data from Amazon Glacier
Introduction
Glacier is a good service, but deleting data from it is extremely tedious. Archives cannot be deleted from the AWS Management Console; you have to send deletion requests through the CLI or the API. On top of that, a vault (Glacier's counterpart to an S3 bucket) cannot be deleted while it still contains any archives. Deleting data from Glacier is truly a hassle.
Running the CLI commands by hand would have been possible. However, the target vault this time held 2.5 TB of NAS backup data spread over more than 300,000 archives, so I wrote a Python script and used that instead. This article explains how to use the script; I will cover its internals in a separate article.
Data Deletion Flow
The steps for deleting data are shown below.
In short, you ask AWS for a list of the archives in the vault and then delete each archive on that list. Note that after you request the list, it takes several hours before it is actually available for download.
Data Deletion Work
This chapter covers the prerequisites for deleting the data (archives), the actual procedure, and the Python script used for the job.
Prerequisites
The following are assumed to be installed and working:
- AWS CLI (with credentials configured, i.e. you are already logged in)
- Some way to run python3
Retrieving the Archive List
We use the AWS CLI to request the archive list. The request needs the following information:
- Vault name
- AWS account ID
Many people may not know their AWS account ID offhand, since it is a value we rarely pay attention to (I didn't really know mine either). If that applies to you, refer to the following documentation:
After gathering the necessary information, you can request the archive list by executing the following command:
aws glacier initiate-job --vault-name {vault-name} --account-id {aws-account-id} --job-parameters="{\"Type\":\"inventory-retrieval\"}"
The command prints a JSON response containing the job ID (jobId). Make a note of it, because the following steps need it.
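You can also make the same request from Python with boto3, which the deletion script already depends on. The sketch below is my own and is not part of glacier-deleter; the helper names are hypothetical. It relies on two documented behaviors. First, passing `-` as the account ID makes Glacier use the account that owns the signing credentials. The CLI accepts `--account-id -` as well, so you can skip looking up the ID. Second, `jobParameters` accepts an optional `SNSTopic` ARN, which gives you a notification when the job completes. If you do need the numeric ID, STS `get_caller_identity` returns it.

```python
from typing import Optional


def current_account_id(sts) -> str:
    """Return the 12-digit account ID for the credentials behind a boto3 STS client."""
    return sts.get_caller_identity()["Account"]


def build_inventory_params(sns_topic_arn: Optional[str] = None) -> dict:
    """jobParameters for an inventory-retrieval job (same as the CLI's --job-parameters)."""
    params = {"Type": "inventory-retrieval"}
    if sns_topic_arn is not None:
        # Optional: Glacier publishes a message to this SNS topic when the job completes.
        params["SNSTopic"] = sns_topic_arn
    return params


def start_inventory_job(glacier, vault_name: str, account_id: str = "-",
                        sns_topic_arn: Optional[str] = None) -> str:
    """Request the archive list for a vault and return the job ID.

    `glacier` is a boto3 Glacier client. An account_id of "-" tells Glacier to use
    the account that owns the credentials signing the request.
    """
    response = glacier.initiate_job(
        accountId=account_id,
        vaultName=vault_name,
        jobParameters=build_inventory_params(sns_topic_arn),
    )
    return response["jobId"]


# Example (needs boto3 and working credentials; "my-vault" is a placeholder):
#   import boto3
#   glacier = boto3.client("glacier", region_name="ap-northeast-1")
#   print(start_inventory_job(glacier, "my-vault"))
```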
As mentioned earlier, preparing the archive list takes hours, so it is convenient to set up a notification for when the job completes. For details, please see the following:
Checking Whether the Archive List Retrieval Has Finished
The archive list request command returns immediately, while the job keeps running on the AWS side. To check its progress, run the following command. It only reports the status at the moment you run it, so repeat it as needed:
aws glacier describe-job --vault-name {vault-name} --account-id {aws-account-id} --job-id {JobId}
The output should look roughly like this (the format may have changed since I ran it):
{
"InventoryRetrievalParameters": {
"Format": "JSON"
},
"VaultARN": "*** vault arn ***",
"Completed": false,
"JobId": "*** jobid ***",
"Action": "InventoryRetrieval",
"CreationDate": "*** job creation date ***",
"StatusCode": "InProgress"
}
Check the StatusCode field. While the job is running it is InProgress; once the job finishes, Completed becomes true and StatusCode changes to Succeeded (or Failed if something went wrong).
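If you would rather not rerun describe-job by hand, the check can be automated. Below is a minimal polling sketch that assumes a boto3 Glacier client. The name `wait_for_job` and the 15-minute default interval are my own choices and not part of the original procedure. The function reads the same `Completed` and `StatusCode` fields shown in the output above.

```python
import time
from typing import Callable


def wait_for_job(glacier, vault_name: str, job_id: str, account_id: str = "-",
                 interval_seconds: float = 900,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll describe_job until the job finishes and return its final description.

    Raises RuntimeError if the job completes with a status other than Succeeded.
    """
    while True:
        job = glacier.describe_job(vaultName=vault_name, jobId=job_id, accountId=account_id)
        if job["Completed"]:
            if job["StatusCode"] != "Succeeded":
                raise RuntimeError(
                    f"job {job_id} ended with {job['StatusCode']}: {job.get('StatusMessage')}"
                )
            return job
        # Inventory jobs take hours, so polling more often gains nothing.
        sleep(interval_seconds)
```

The `sleep` parameter exists only so the loop can be exercised without waiting. In normal use, leave it at its default.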
Downloading the Archive List
The Python script used here sends deletion requests based on the archive list JSON, so you first need to download that list.
Use the following command to download the archive list:
aws glacier get-job-output --vault-name {vault-name} --account-id {aws-account-id} --job-id {JobId} output.json
output.json should now be in your current directory. If the vault holds a large number of archives (300,000 or more, as in my case), do not try to open the file in vi or a similar editor. It will freeze. (I learned this the hard way.)
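To get a feel for the file without opening it in an editor, a few lines of Python are enough. This sketch assumes the inventory layout documented for Glacier's JSON output: a top-level `ArchiveList` array whose entries include `ArchiveId` and `Size` (in bytes). `summarize_inventory` is a hypothetical helper. It loads the whole file into memory at once, which is simple but not the most frugal approach for very large inventories.

```python
import json
from typing import Tuple


def summarize_inventory(path: str) -> Tuple[int, int]:
    """Return (number_of_archives, total_size_in_bytes) for an inventory JSON file."""
    with open(path, encoding="utf-8") as f:
        inventory = json.load(f)  # reads the entire file into memory
    archives = inventory["ArchiveList"]
    return len(archives), sum(archive["Size"] for archive in archives)


# Example:
#   count, size = summarize_inventory("output.json")
#   print(f"{count} archives, {size / 1024**4:.2f} TiB")
```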
Archive Deletion
This section explains how to delete archives from Glacier using the Python script I used for this task.
Cloning the Repository
Clone the following repository to your local machine:
git clone https://github.com/ksatoshi/glacier-deleter.git
Running the Script
You can run the script either with a Python 3 that is already installed or with mise and uv.
When using an already installed version of Python 3
# Initial setup
cd glacier-deleter
python3 -m pip install boto3
python3 ./main.py
# After running the script, prompts will appear asking for the vault name, target region, and the path where the archive list is saved.
vault name>> {vault-name}
AWS region (default: ap-northeast-1)>> {target-region}
file path>> {path-to-archive-list}
When using mise or uv
# Initial setup
cd glacier-deleter
mise trust
mise install
uv sync
uv run main.py
# After running the script, prompts will appear asking for the vault name, target region, and the path where the archive list is saved.
vault name>> {vault-name}
AWS region (default: ap-northeast-1)>> {target-region}
file path>> {path-to-archive-list}
While the script runs, it prints its progress like this (example output):
# AWS response headers (omitted as there is no record of the actual output)
1/3000
# AWS response headers (omitted as there is no record of the actual output)
2/3000
# ----omitted----
# AWS response headers (omitted as there is no record of the actual output)
3000/3000
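I won't walk through glacier-deleter's code here. Its core idea is to send one delete request per ArchiveId in the inventory and print a running counter. The following is an illustrative sketch of that idea, assuming a boto3 Glacier client. It is not the actual contents of main.py, and `delete_all_archives` is a hypothetical name.

```python
import json


def delete_all_archives(glacier, vault_name: str, inventory_path: str,
                        account_id: str = "-") -> int:
    """Send one delete_archive request per archive listed in the inventory JSON.

    Prints progress as "done/total" and returns the number of requests sent.
    """
    with open(inventory_path, encoding="utf-8") as f:
        archive_ids = [archive["ArchiveId"] for archive in json.load(f)["ArchiveList"]]

    total = len(archive_ids)
    for done, archive_id in enumerate(archive_ids, start=1):
        glacier.delete_archive(vaultName=vault_name, archiveId=archive_id, accountId=account_id)
        print(f"{done}/{total}")
    return total
```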
After the script finishes, wait until the next day and then check the AWS Management Console, because Glacier refreshes the vault inventory shown there only about once a day. The number of archives should then be 0 or - (hyphen). At that point you can finally delete the vault and stop paying to store data you no longer need.
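The final step can also be scripted. The sketch below assumes a boto3 Glacier client and uses a hypothetical helper name. It reads `NumberOfArchives` via `describe_vault` and calls `delete_vault` only when that count is zero. The CLI equivalent is `aws glacier delete-vault --vault-name {vault-name} --account-id -`. According to the Glacier documentation, deleting a vault succeeds only if the vault had no archives as of the last inventory and nothing has been written to it since.

```python
def delete_vault_if_empty(glacier, vault_name: str, account_id: str = "-") -> bool:
    """Delete the vault only if its last inventory reports zero archives.

    `glacier` is a boto3 Glacier client. Returns True if delete_vault was called.
    """
    vault = glacier.describe_vault(vaultName=vault_name, accountId=account_id)
    # NumberOfArchives comes from the most recent inventory, not a live count.
    if vault["NumberOfArchives"] != 0:
        print(f"{vault_name}: {vault['NumberOfArchives']} archives in last inventory, not deleting")
        return False
    glacier.delete_vault(vaultName=vault_name, accountId=account_id)
    return True
```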
Conclusion
In this article, I showed how I deleted every archive in a Glacier vault using a custom script. To be honest, I don't think this is the best possible method, but I'm satisfied because it got the job done.
By the way, deleting the more than 300,000 archives took over 24 hours. I ran the entire deletion on an EC2 instance, and I take my hat off to EC2 for keeping the SSH connection alive for more than a full day.
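Keeping a single SSH session alive for more than a day worked here. However, a sequential loop that keeps no record of its progress would have to start over from the top if the connection dropped. Running the script inside a terminal multiplexer such as tmux or screen is one common safeguard. Another is to record each deleted ArchiveId and skip those on restart. The following is a hypothetical sketch of that second approach, which is my own suggestion and not something glacier-deleter is stated to do. It assumes a boto3 Glacier client and uses a plain text file as the checkpoint.

```python
import json
from pathlib import Path


def delete_with_checkpoint(glacier, vault_name: str, inventory_path: str,
                           checkpoint_path: str, account_id: str = "-") -> int:
    """Delete archives, appending each finished ArchiveId to a checkpoint file.

    On a rerun, IDs already in the checkpoint are skipped.
    Returns the number of delete requests sent in this run.
    """
    checkpoint = Path(checkpoint_path)
    done = set(checkpoint.read_text(encoding="utf-8").split()) if checkpoint.exists() else set()

    with open(inventory_path, encoding="utf-8") as f:
        archive_ids = [archive["ArchiveId"] for archive in json.load(f)["ArchiveList"]]

    sent = 0
    with checkpoint.open("a", encoding="utf-8") as log:
        for archive_id in archive_ids:
            if archive_id in done:
                continue  # already deleted in an earlier run
            glacier.delete_archive(vaultName=vault_name, archiveId=archive_id, accountId=account_id)
            log.write(archive_id + "\n")
            log.flush()  # write progress out immediately in case the session drops
            sent += 1
    return sent
```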
Finally, I found a screenshot of the archive count before deletion, so I'll conclude with that.
