Translated by AI
How to Perform SSH Streaming Backups Without Consuming EC2 Disk Space
Introduction
The disk storage on my EC2 instance has been slowly filling up.
Running df -h showed only a few GB remaining. I could expand the volume, but I'd rather not keep raising costs indefinitely, and deleting files is risky. Caught in this dilemma, I remembered a long-standing technique: streaming a tar.gz backup over SSH.
The key point is that this uses no disk space on the server: no backup file is ever written there, because the archive data flows straight to your local machine.
Although this is a classic method, I've organized it here as a memo for future reference. I hope this helps anyone facing similar issues.
How it Works
A typical backup workflow looks like this:
# Create tar.gz on the server (writes to disk)
tar czf backup.tar.gz /path/to/dir
# Copy to local and then delete the file on the server
scp user@host:backup.tar.gz ./
ssh user@host "rm backup.tar.gz"
This approach temporarily puts pressure on the server's disk. If the disk is nearly full, the tar command might even fail mid-process.
In contrast, the streaming approach works like this:
# Server-side: Output to stdout with "-" (no file creation)
# Local-side: Receive directly through the SSH pipe
ssh user@host "tar czf - /path/to/dir" > backup.tar.gz
By letting SSH act as the pipe, the backup completes without creating any files on the server.
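To see why tar czf - needs no scratch file, here is a minimal Python sketch using the standard tarfile module's stream mode ("w|gz"). Like tar czf -, it writes a gzip-compressed tar strictly sequentially to any writable stream, which stands in here for the standard output that SSH carries back. stream_tar_gz is a hypothetical name used only for illustration; the workflow in this article uses plain tar.

```python
import tarfile
from pathlib import Path


def stream_tar_gz(src_dir, out):
    """Write src_dir as a gzip-compressed tar to the binary stream `out`.

    Mode "w|gz" writes strictly sequentially and never seeks, so `out` may be a
    pipe or stdout: the archive is never stored as a file on the sending side.
    """
    src = Path(src_dir)
    with tarfile.open(fileobj=out, mode="w|gz") as tar:
        # arcname keeps member paths relative, similar to `tar -C parent dirname`
        tar.add(str(src), arcname=src.name)
```

Called as stream_tar_gz("/path/to/dir", sys.stdout.buffer) from a script running on the server, its output could be captured with ssh ... > backup.tar.gz in the same way as the tar command above.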
The Complete Command
ssh user@host "tar czf - /path/to/dir" > backup.tar.gz
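It's that simple. The - in tar czf - tells tar to write the archive to standard output instead of a file. SSH carries that byte stream back over the connection, and the local shell's > redirection writes it into backup.tar.gz.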
To Restore
cat backup.tar.gz | ssh user@host "tar xzf - -C /restore/path"
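Restoration works by reversing the pipe: the local archive is fed into SSH's standard input, and tar xzf - on the server reads it from stdin and extracts it into /restore/path, which must already exist.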
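The restore direction can also be driven from Python. This is a minimal sketch, assuming the same key-based SSH access; build_restore_cmd and restore are hypothetical helpers, and the mkdir -p step is an addition not present in the original command, included so that tar -C does not fail when the target directory is missing.

```python
import shlex
import subprocess


def build_restore_cmd(user_host, dest, key_path=None):
    """Build the ssh argv that unpacks a tar.gz from stdin into `dest` on the server."""
    q = shlex.quote(dest)  # the remote shell parses this string, so quote the path
    # Assumption: create the target directory first (not part of the original one-liner)
    remote = f"mkdir -p {q} && tar xzf - -C {q}"
    cmd = ["ssh", "-q"]
    if key_path:
        cmd += ["-i", key_path]
    return cmd + [user_host, remote]


def restore(archive, user_host, dest, key_path=None):
    """Stream a local archive to the server, like: ssh host "tar xzf - -C dest" < archive."""
    with open(archive, "rb") as f:
        # The open file becomes ssh's stdin; check=True raises if ssh or tar exits non-zero
        subprocess.run(build_restore_cmd(user_host, dest, key_path), stdin=f, check=True)
```

Passing the file object as stdin avoids the extra cat process, and the argument list means no local shell is involved.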
Practical Example on Amazon Linux 2023
Here is the actual command to back up /home/ec2-user/myapp from an EC2 instance (Amazon Linux 2023).
ssh -i ~/.ssh/my-key.pem \
-o StrictHostKeyChecking=no \
ec2-user@<EC2_IP> \
"tar czf - -C /home/ec2-user myapp" \
> ./backups/$(date +%Y%m%d-%H%M%S)-myapp.tar.gz
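The -C option makes tar change into /home/ec2-user before archiving, so entries are stored as myapp/... rather than home/ec2-user/myapp/..., which keeps extraction simple. The ./backups directory must exist beforehand. Note that StrictHostKeyChecking=no skips host key verification: convenient when the instance's IP changes, but it gives up protection against connecting to an impostor host.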
To Monitor Progress
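For large directories, you can insert pv into the pipeline to watch the progress (pv must be installed on your local machine). Because pv cannot know the archive's total size in advance, it shows bytes received and throughput rather than a percentage.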
ssh -i ~/.ssh/my-key.pem ec2-user@<EC2_IP> \
"tar czf - -C /home/ec2-user myapp" \
| pv > ./backups/myapp.tar.gz
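If pv isn't available locally, a few lines of Python can play a similar role in the pipeline. This is a rough stand-in, not a replacement for pv: it copies stdin to stdout in chunks and reports the running total on stderr, so the progress text never mixes into the archive. copy_with_progress is a hypothetical name.

```python
import sys
import time


def copy_with_progress(src, dst, err=sys.stderr, chunk_size=65536, every=1_048_576):
    """Copy binary stream src to dst, reporting progress on err; return the byte count."""
    total = 0
    last_report = 0
    start = time.monotonic()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # EOF: the upstream command has finished
            break
        dst.write(chunk)
        total += len(chunk)
        if total - last_report >= every:
            elapsed = time.monotonic() - start
            # "\r" rewrites the same terminal line, as pv does
            err.write(f"\r{total / 1_048_576:.1f} MB ({elapsed:.1f}s)")
            err.flush()
            last_report = total
    dst.flush()
    err.write(f"\rdone: {total / 1_048_576:.1f} MB\n")
    return total
```

Saved in a hypothetical progress.py that calls copy_with_progress(sys.stdin.buffer, sys.stdout.buffer), it would sit where pv is: ssh ... "tar czf - -C /home/ec2-user myapp" | python3 progress.py > ./backups/myapp.tar.gz.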
Implemented as a Claude Code Skill
Since I perform this task regularly, I implemented it as a Claude Code skill.
Directory Structure
~/.claude/skills/ssh-backup/
├── SKILL.md # Operation instructions for Claude Code
└── backup.py # Backup script
Implementation of backup.py
The script uses Python's subprocess module to run the SSH command with its standard output connected to a pipe, then reads that stream in chunks and writes it to the local file, printing progress along the way.
import os
import shlex
import subprocess
import sys
import tempfile
import time
from pathlib import Path

SSH_KEY = os.path.expanduser("~/.ssh/my-key.pem")
SSH_USER = "ec2-user"
SSH_HOST = "your-ec2-ip"
SSH_OPTS = ["-i", SSH_KEY, "-o", "StrictHostKeyChecking=no", "-o", "ConnectTimeout=10"]
CHUNK_SIZE = 65536  # Read in 64 KB chunks
PROGRESS_INTERVAL = 1_048_576  # Show progress every 1 MB


def stream_backup(remote_path, output_path):
    output_path = Path(output_path)  # Accept str or Path (unlink() below needs a Path)
    parent_dir = str(Path(remote_path).parent)
    dirname = Path(remote_path).name
    # Server side: tar writes the archive to stdout ("-"); nothing is written to the server's disk.
    # The remote shell parses this string, so quote both path components.
    tar_cmd = f"tar czf - -C {shlex.quote(parent_dir)} {shlex.quote(dirname)}"
    cmd = ["ssh"] + SSH_OPTS + [f"{SSH_USER}@{SSH_HOST}", tar_cmd]

    total_bytes = 0
    last_progress = 0
    start_time = time.time()
    # stderr goes to a temp file: an undrained stderr PIPE can fill up and stall ssh.
    with tempfile.TemporaryFile() as err_file:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=err_file)
        with open(output_path, "wb") as f:
            while True:
                chunk = proc.stdout.read(CHUNK_SIZE)
                if not chunk:  # EOF: the remote tar has finished
                    break
                f.write(chunk)
                total_bytes += len(chunk)
                if total_bytes - last_progress >= PROGRESS_INTERVAL:
                    elapsed = time.time() - start_time
                    mb = total_bytes / 1_048_576
                    print(f"[INFO] {mb:.1f} MB received ({elapsed:.1f}s)", flush=True)
                    last_progress = total_bytes
        proc.stdout.close()
        proc.wait()

        if proc.returncode != 0:
            err_file.seek(0)
            stderr_out = err_file.read().decode("utf-8", errors="replace").strip()
            print(f"[ERROR] {stderr_out}", file=sys.stderr)
            output_path.unlink(missing_ok=True)  # Don't keep a partial archive
            return False

    elapsed = time.time() - start_time
    mb = total_bytes / 1_048_576
    print(f"[OK] {output_path} ({mb:.1f} MB, {elapsed:.1f}s)")
    return True
How to Use
source ~/.claude/lib/load_env.sh
run_python ~/.claude/skills/ssh-backup/backup.py /home/ec2-user/myapp
The output file name is generated automatically with a timestamp, e.g., ./outputs/20260314-143022-home-ec2-user-myapp.tar.gz.
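The naming code isn't shown above. Judging from the example file name, the rule appears to be a YYYYmmdd-HHMMSS timestamp followed by the remote path with its slashes turned into hyphens. A minimal sketch under that assumption; make_output_path is a hypothetical helper, not the skill's actual code.

```python
from datetime import datetime
from pathlib import Path


def make_output_path(remote_path, now=None, out_dir="outputs"):
    """Build e.g. outputs/20260314-143022-home-ec2-user-myapp.tar.gz (assumed naming rule)."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")
    # "/home/ec2-user/myapp/" -> "home-ec2-user-myapp" (empty parts from slashes are dropped)
    slug = "-".join(part for part in remote_path.split("/") if part)
    return Path(out_dir) / f"{stamp}-{slug}.tar.gz"
```

Passing now explicitly makes the function easy to test; in normal use it defaults to the current local time.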
SKILL.md (Instructions for Claude Code)
# ssh-backup
Perform a tar.gz backup of a remote server directory via SSH streaming.
## Triggers
"Back up the server", "SSH backup", "Remote backup", etc.
## Usage
run_python ~/.claude/skills/ssh-backup/backup.py /remote/path
## Safety Measures
- Server-side: uses only tar czf -. No file creation or deletion.
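- Shell injection prevention: subprocess arguments passed as a list (no local shell). The remote tar command is still parsed by the server's shell, so remote paths must be quoted (e.g., shlex.quote).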
- SSH key file existence check.
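The SKILL.md safety measures could be backed by a pre-flight check that runs before SSH starts. The sketch below is an assumption about how such checks might look, not the skill's actual code: it verifies that the key file exists, that it isn't accessible by group or others (OpenSSH refuses private keys with permissions that open), and that the remote path is absolute. preflight is a hypothetical name.

```python
import os
import stat
from pathlib import Path


def preflight(key_path, remote_path):
    """Return a list of problems; an empty list means it looks safe to start ssh."""
    problems = []
    key = Path(os.path.expanduser(key_path))
    if not key.is_file():
        problems.append(f"SSH key not found: {key}")
    elif stat.S_IMODE(key.stat().st_mode) & 0o077:
        # ssh rejects keys readable by group/others ("UNPROTECTED PRIVATE KEY FILE")
        problems.append(f"SSH key {key} is accessible by group/others (chmod 600 it)")
    if not remote_path.startswith("/"):
        problems.append(f"remote path must be absolute: {remote_path!r}")
    return problems
```

Collecting all problems instead of stopping at the first lets the skill report everything wrong in one pass.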
Pitfalls
Mind the tar Path
# NG: Using an absolute path with tar includes the full path inside the archive
tar czf - /home/ec2-user/myapp
# OK: Specify the directory with -C to archive with a relative path
tar czf - -C /home/ec2-user myapp
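The latter is preferable. GNU tar strips the leading / from member names (and prints a notice to stderr), so with the first form every entry still carries the home/ec2-user/ prefix and extracts to home/ec2-user/myapp under the target directory. With -C, entries start at myapp/, so the archive extracts cleanly to any location.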
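To see the difference concretely, the sketch below uses Python's tarfile module, whose arcname argument plays a role similar to -C (an analogy, not the same mechanism). It archives a throwaway directory both ways and prints the member names. Like GNU tar, tarfile drops the leading / from names, so the first listing is relative but still carries the full directory prefix. member_names is a hypothetical helper.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def member_names(src_dir, arcname):
    """Archive src_dir in memory under `arcname` and return the sorted member names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(str(src_dir), arcname=arcname)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:gz") as tar:
        return sorted(tar.getnames())


with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp) / "myapp"
    app.mkdir()
    (app / "main.py").write_text("print('hi')\n")
    # Like `tar czf - /tmp/.../myapp`: the full path (minus the leading "/") is kept
    print(member_names(app, arcname=str(app)))
    # Like `tar czf - -C /tmp/... myapp`: members start at "myapp"
    print(member_names(app, arcname="myapp"))  # ['myapp', 'myapp/main.py']
```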
Watch Out for SSH Banner Messages
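Any text the remote side writes to standard output ends up in the tar.gz stream and corrupts the archive. The usual culprit is a shell startup file on the server (such as ~/.bashrc) that echoes something even for non-interactive commands; make such output conditional on interactive shells. Login banners and the ssh client's own warnings are printed to stderr, so they don't enter the file, but the -q (quiet) option suppresses them and keeps the output clean.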
ssh -q -i ~/.ssh/my-key.pem ec2-user@host "tar czf - /path/to/dir" > backup.tar.gz
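A cheap way to catch this kind of contamination right after the transfer is to look at the start of the file: every gzip stream begins with the magic bytes 1f 8b, so stray text in front of them is easy to spot. A minimal sketch (leading_garbage is a hypothetical name); it only inspects the header, so it complements rather than replaces a full check of the archive.

```python
GZIP_MAGIC = b"\x1f\x8b"


def leading_garbage(path, limit=4096):
    """Return the bytes before the gzip magic number (b"" for a clean archive).

    Raises ValueError if no gzip header appears within the first `limit` bytes.
    """
    with open(path, "rb") as f:
        head = f.read(limit)
    pos = head.find(GZIP_MAGIC)
    if pos < 0:
        raise ValueError(f"{path}: no gzip header in the first {limit} bytes")
    return head[:pos]
```

Returning the offending bytes, rather than just True or False, shows what leaked into the stream and therefore which startup file to fix.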
Verifying the Archive
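After the transfer, check that the archive is intact. Piping the listing into head is a quick peek, but head stops reading early (and the pipeline's exit status is head's), so damage further into the archive can go unnoticed. Listing the whole archive reads it to the end:
tar tzf backup.tar.gz | head -20   # Peek at the first entries
tar tzf backup.tar.gz > /dev/null && echo OK   # Reads the entire archive; fails if truncated or corrupted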
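The same check can be scripted, for example to run automatically after backup.py finishes. A minimal Python sketch, assuming reading the archive twice is acceptable: it first decompresses the entire gzip stream, which makes gzip validate its CRC-32 and length trailer so a truncated transfer is caught, then walks every tar header as tar tzf does. verify_archive is a hypothetical helper name.

```python
import gzip
import tarfile
import zlib


def verify_archive(path, chunk_size=1 << 20):
    """Return the number of members in a tar.gz; raise ValueError if it is damaged."""
    try:
        # Pass 1: decompress everything. gzip checks the CRC-32 and length trailer,
        # so a truncated or corrupted transfer raises here.
        with gzip.open(path, "rb") as gz:
            while gz.read(chunk_size):
                pass
        # Pass 2: walk every tar header, like `tar tzf` without `head`.
        with tarfile.open(path, "r:gz") as tar:
            return sum(1 for _ in tar)
    except (EOFError, OSError, tarfile.TarError, zlib.error) as exc:
        raise ValueError(f"{path}: {exc}") from exc
```

Converting the various low-level errors into a single ValueError keeps the caller simple: either it gets a member count or it knows the backup must be redone.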
Summary
| Method | Server Disk Usage | Complexity |
|---|---|---|
| Standard (tar → scp → delete) | Temporarily needs room for the whole backup file | 3 steps |
| Streaming (SSH pipe) | None | 1 command |
This method is effective when disk space is tight or simply when you do not want to create temporary files on the server. The mechanism is simple and can be completed in a single command line.
While it is a classic technique, it remains highly useful even in today's era where EC2 and cloud instances are the norm. I hope this proves helpful to someone.
If you found this article helpful, please consider supporting me! I will continue to share practical infrastructure tips and Claude Code utilization techniques.