Translated by AI
Deleting Large Numbers of Files from Amazon S3
There are several ways to delete files from S3.
The simplest method is deleting from the S3 console. While this is easy, you can only delete a few hundred thousand files at most, and you cannot navigate away from the active browser tab while the deletion is running.
You can also delete via the CLI. A simple way is a one-liner like aws s3 rm s3://your-bucket-name --exclude "*" --include "*" --recursive. This will delete all files under the specified bucket. However, it isn't very fast (though it's much better than the GUI).
So, what is the best way? Performing a bulk delete with the s3api delete-objects command (see the AWS CLI Command Reference). With bulk delete, a single request can delete up to 1,000 objects. The one-liner is as follows:
aws s3api list-objects-v2 --bucket your-bucket-name --prefix "images/$PREFIX" --output text --query 'Contents[].[Key]' | \
grep -v -e "'" | \
tr '\n' '\0' | \
xargs -0 -P2 -n500 bash -c \
'aws s3api delete-objects --bucket your-bucket-name --delete "Objects=[$(printf "{Key=%q}," "$@")],Quiet=true"' _
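To see how the batching in that pipeline works without touching AWS, here is a local sketch that substitutes echo for the aws s3api delete-objects call. It shows how xargs -0 -n groups the NUL-separated keys into batches and how printf "{Key=%q}," renders each batch into the shorthand --delete syntax (the sample keys and the batch size of 2 are illustrative):

```shell
#!/bin/bash
# Emit three sample keys, one per line, as list-objects-v2 would.
# tr converts newlines to NULs; xargs -0 -n2 hands them to bash in
# batches of two; printf formats each key into the shorthand syntax.
printf 'images/a.jpg\nimages/b.jpg\nimages/c.jpg\n' | \
tr '\n' '\0' | \
xargs -0 -n2 bash -c \
  'echo "Objects=[$(printf "{Key=%q}," "$@")],Quiet=true"' _
```

Each printed line is exactly the string the real one-liner passes to --delete, which is why the Quiet=true flag and the trailing comma appear inside it.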
Now, while this is certainly fast and great, I found that it often fails with a mysterious error called InternalError (most likely due to session timeouts). In my case, I was deleting over several hundred million files, so I expected some failures, but it's quite a problem if the script stops. So, I wrote a script that waits a bit and retries if an error occurs.
#!/bin/bash
if [ $# -ne 1 ]; then
  echo "Usage: $0 <PREFIX>"
  exit 1
fi

PREFIX=$1

execute() {
  # List every key under the prefix, drop keys containing single quotes,
  # and delete them in batches of 500 across two parallel workers.
  aws s3api list-objects-v2 --bucket your-bucket-name --prefix "$PREFIX" --output text --query 'Contents[].[Key]' | \
  grep -v -e "'" | \
  tr '\n' '\0' | \
  xargs -0 -P2 -n500 bash -c \
    'aws s3api delete-objects --bucket your-bucket-name --delete "Objects=[$(printf "{Key=%q}," "$@")],Quiet=true"' _
}

# Keep retrying until the whole pipeline succeeds.
while true; do
  if execute; then
    echo "Command executed successfully."
    break
  else
    echo "An error occurred. Retrying..."
    sleep 10
  fi
done
Save this with a name like delete_s3_files.sh, make it executable (chmod +x delete_s3_files.sh), and run it with the prefix you want to delete under the bucket, such as ./delete_s3_files.sh images/hoge. It will then handle the bulk deletion for you.
This is a "brute-force" implementation that runs indefinitely until the deletion is complete, so please add a retry limit or other logic as needed.
In my case, I opened about 9 tabs in the terminal and ran them in 9 parallel processes using prefixes like images/1, images/2, ..., images/9. This allowed me to delete hundreds of millions of files in less than a day.
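The nine-tab approach can also be scripted instead of run by hand. A minimal sketch, assuming the delete_s3_files.sh script above sits in the current directory, that launches one retry loop per prefix in the background and waits for all of them:

```shell
#!/bin/bash
# Start one deletion loop per prefix in the background.
for i in $(seq 1 9); do
  ./delete_s3_files.sh "images/$i" &
done

# Block until every background job has finished.
wait
echo "All prefixes processed."
```

This mirrors the manual setup exactly: each background job keeps retrying its own prefix independently, so one prefix hitting InternalError does not stall the others.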
By the way, I found out after I was done that setting a lifecycle policy to expire objects after one day seems to be the easiest way. However, since objects are only removed after that day has passed (and expiration runs asynchronously), bulk deletion via the API may be better if you need everything gone within 24 hours.
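For reference, such a lifecycle rule can be set with a single CLI call. This is a sketch only; the rule ID and the empty prefix filter (which matches every object in the bucket) are illustrative, so adapt them before applying:

```shell
# Expire every object in the bucket one day after creation.
# S3 then deletes expired objects asynchronously in the background.
aws s3api put-bucket-lifecycle-configuration \
  --bucket your-bucket-name \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-everything",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 1}
    }]
  }'
```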
References
Cheapest way to delete 2 billion objects from S3 IA (Stack Overflow)