How to Perform SSH Streaming Backups Without Consuming EC2 Disk Space

Introduction

The disk storage on my EC2 instance has been slowly filling up.

Running df -h showed only a few GB remaining. Expanding the volume is an option, but I'm reluctant to keep increasing costs. Deleting files, on the other hand, is risky. Caught in this dilemma, I remembered a long-standing technique: streaming tar.gz backups over SSH.

The key point is that it uses no disk space on the server: the archive is never written there, and the data streams straight to your local machine.

Although this is a classic method, I've organized it here as a memo for future reference. I hope this helps anyone facing similar issues.


How it Works

A typical backup workflow looks like this:

# Create tar.gz on the server (writes to disk)
tar czf backup.tar.gz /path/to/dir

# Copy to local and then delete the file on the server
scp user@host:backup.tar.gz ./
ssh user@host "rm backup.tar.gz"

This approach temporarily puts pressure on the server's disk. If the disk is nearly full, the tar command might even fail mid-process.

In contrast, the streaming approach works like this:

# Server-side: Output to stdout with "-" (no file creation)
# Local-side: Receive directly through the SSH pipe
ssh user@host "tar czf - /path/to/dir" > backup.tar.gz

By letting SSH act as the pipe, the backup completes without creating any files on the server.
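To see why no intermediate file is needed, here is a minimal local sketch in Python (not the author's code). The helper names `stream_tar_gz` and `demo` are hypothetical. It uses the standard `tarfile` module in stream mode (`w|gz`), which, like `tar czf -`, writes the compressed archive strictly sequentially to whatever byte sink it is given. Over SSH, that sink is the connection itself; here an in-memory buffer stands in for it.

```python
import io
import tarfile
import tempfile
from pathlib import Path
from typing import BinaryIO


def stream_tar_gz(src_dir: Path, sink: BinaryIO) -> None:
    """Write a gzip-compressed tar of src_dir to sink without seeking."""
    # "w|gz" is tarfile's streaming mode: output is written strictly
    # sequentially, so the sink can be a pipe or socket, not only a file.
    with tarfile.open(fileobj=sink, mode="w|gz") as tar:
        tar.add(src_dir, arcname=src_dir.name)


def demo() -> list[str]:
    """Archive a throwaway directory into memory and list its members."""
    with tempfile.TemporaryDirectory() as tmp:
        app = Path(tmp) / "myapp"
        app.mkdir()
        (app / "config.txt").write_text("hello\n")

        sink = io.BytesIO()  # stands in for the SSH pipe
        stream_tar_gz(app, sink)

    # Read the archive back from memory to confirm it is complete.
    sink.seek(0)
    with tarfile.open(fileobj=sink, mode="r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    print(demo())
```

The archive never touches the source machine's disk; it exists only in the stream and wherever the receiving side chooses to store it.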

The Complete Command

ssh user@host "tar czf - /path/to/dir" > backup.tar.gz

It's that simple. The - in tar czf - denotes stdout, and SSH pipes that byte stream directly to the local redirection.

To Restore

cat backup.tar.gz | ssh user@host "tar xzf - -C /restore/path"

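Restoring works by reversing the pipe: the local archive is fed through SSH to tar on the server. The target directory (/restore/path here) must already exist.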
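The same reversed pipe can be sketched in Python. The helpers `build_restore_command` and `restore` are hypothetical names, and the sketch assumes `ssh` is on the local PATH and that key and host options are already configured (for example in ~/.ssh/config).

```python
import shlex
import subprocess
from pathlib import Path


def build_restore_command(target: str, restore_path: str) -> list[str]:
    """Return the ssh argv that extracts a tar.gz read from stdin."""
    # The remote shell parses this string, so quote the path.
    remote_cmd = f"tar xzf - -C {shlex.quote(restore_path)}"
    return ["ssh", target, remote_cmd]


def restore(archive: Path, target: str, restore_path: str) -> int:
    """Stream a local archive into tar on the server; return ssh's exit code."""
    cmd = build_restore_command(target, restore_path)
    with archive.open("rb") as f:
        # The open file becomes ssh's stdin, replacing `cat backup.tar.gz |`.
        return subprocess.run(cmd, stdin=f).returncode


if __name__ == "__main__":
    # Placeholder target and path taken from the example above.
    print(build_restore_command("user@host", "/restore/path"))
```

Passing the file object as `stdin` lets the operating system feed the archive to ssh directly, so Python never holds the whole file in memory.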


Practical Example on Amazon Linux 2023

Here is the actual command to back up /home/ec2-user/myapp from an EC2 instance (Amazon Linux 2023).

ssh -i ~/.ssh/my-key.pem \
    -o StrictHostKeyChecking=no \
    ec2-user@<EC2_IP> \
    "tar czf - -C /home/ec2-user myapp" \
  > ./backups/$(date +%Y%m%d-%H%M%S)-myapp.tar.gz

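With -C, tar changes to /home/ec2-user before archiving, so the archive stores paths relative to that directory (myapp/...), which keeps the structure simple during extraction. The local ./backups directory must exist beforehand, or the redirection fails. Also note that StrictHostKeyChecking=no skips host key verification, so use it only where that trade-off is acceptable.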

To Monitor Progress

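If the data volume is large, you can insert the pv command to see the progress (requires pv installed locally). Because the final archive size is not known in advance, pv shows the bytes transferred and the throughput rather than a percentage.

ssh -i ~/.ssh/my-key.pem ec2-user@<EC2_IP> \
    "tar czf - -C /home/ec2-user myapp" \
  | pv > ./backups/myapp.tar.gz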

Implemented as a Claude Code Skill

Since I perform this task regularly, I implemented it as a Claude Code skill.

Directory Structure

~/.claude/skills/ssh-backup/
├── SKILL.md      # Operation instructions for Claude Code
└── backup.py     # Backup script

Implementation of backup.py

The script uses Python's subprocess module to run the SSH command, reads its standard output through a pipe, and writes the data to the local destination in chunks.

backup.py
import os
import shlex
import subprocess
import sys
import tempfile
import time
from pathlib import Path, PurePosixPath

SSH_KEY = os.path.expanduser("~/.ssh/my-key.pem")
SSH_USER = "ec2-user"
SSH_HOST = "your-ec2-ip"
SSH_OPTS = ["-i", SSH_KEY, "-o", "StrictHostKeyChecking=no", "-o", "ConnectTimeout=10"]

CHUNK_SIZE = 65536            # Read in 64 KiB chunks
PROGRESS_INTERVAL = 1_048_576 # Show progress every 1 MiB


def stream_backup(remote_path, output_path):
    output_path = Path(output_path)  # Accept str or Path; unlink() needs a Path
    # The remote side is Linux, so split the path with POSIX rules.
    remote = PurePosixPath(remote_path)
    parent_dir = str(remote.parent)
    dirname = remote.name

    # Server-side: output to stdout via tar czf -
    # The remote shell parses this string, so quote both path components.
    tar_cmd = f"tar czf - -C {shlex.quote(parent_dir)} {shlex.quote(dirname)}"
    cmd = ["ssh"] + SSH_OPTS + [f"{SSH_USER}@{SSH_HOST}", tar_cmd]

    total_bytes = 0
    last_progress = 0
    start_time = time.time()

    # Send stderr to a temp file instead of a pipe: an unread stderr pipe
    # can fill up and stall the transfer while only stdout is being read.
    with tempfile.TemporaryFile() as err, open(output_path, "wb") as f:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=err)

        while True:
            chunk = proc.stdout.read(CHUNK_SIZE)
            if not chunk:
                break
            f.write(chunk)
            total_bytes += len(chunk)

            if total_bytes - last_progress >= PROGRESS_INTERVAL:
                elapsed = time.time() - start_time
                mb = total_bytes / 1_048_576
                print(f"[INFO] {mb:.1f} MB received ({elapsed:.1f}s)", flush=True)
                last_progress = total_bytes

        proc.wait()
        err.seek(0)
        stderr_out = err.read().decode("utf-8", errors="replace").strip()

    if proc.returncode != 0:
        # Remove the partial archive so it is not mistaken for a valid backup.
        print(f"[ERROR] {stderr_out}")
        output_path.unlink(missing_ok=True)
        return False

    elapsed = time.time() - start_time
    mb = total_bytes / 1_048_576
    print(f"[OK] {output_path} ({mb:.1f} MB, {elapsed:.1f}s)")
    return True

How to Use

source ~/.claude/lib/load_env.sh
run_python ~/.claude/skills/ssh-backup/backup.py /home/ec2-user/myapp

The output file is automatically generated with a timestamp, e.g., ./outputs/20260314-143022-home-ec2-user-myapp.tar.gz.
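The code that builds this file name is not shown in the article. As an illustration only, a naming helper that reproduces the example above might look like the following; `default_output_path` and its `out_dir` parameter are hypothetical names, not the author's implementation.

```python
from datetime import datetime
from pathlib import Path


def default_output_path(
    remote_path: str,
    now: datetime,
    out_dir: Path = Path("outputs"),
) -> Path:
    """Build '<timestamp>-<path-with-dashes>.tar.gz' under out_dir."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    # "/home/ec2-user/myapp" -> "home-ec2-user-myapp"
    slug = remote_path.strip("/").replace("/", "-")
    return out_dir / f"{stamp}-{slug}.tar.gz"


if __name__ == "__main__":
    print(default_output_path("/home/ec2-user/myapp", datetime.now()))
```

Putting the timestamp first keeps the backups sorted chronologically in a plain directory listing.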

SKILL.md (Instructions for Claude Code)
# ssh-backup

Perform a tar.gz backup of a remote server directory via SSH streaming.

## Triggers
"Back up the server", "SSH backup", "Remote backup", etc.

## Usage
run_python ~/.claude/skills/ssh-backup/backup.py /remote/path

## Safety Measures
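- Server-side: runs only tar czf -. No files are created or deleted.
- Shell injection prevention: local subprocess arguments are passed as a list. The remote path must also be shell-quoted, because the remote shell parses the command string.
- SSH key file existence check.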
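On the shell injection item: passing a list to subprocess protects only the local side, because ssh hands the final argument to the remote shell as a single string. The following sketch shows the difference quoting makes. `remote_tar_command` is a hypothetical helper, and `shlex.split` is used only as an approximation of how a POSIX shell splits words (a real shell would additionally treat `;` as a command separator).

```python
import shlex


def remote_tar_command(parent_dir: str, dirname: str) -> str:
    """Build the remote tar command with both path parts shell-quoted."""
    return f"tar czf - -C {shlex.quote(parent_dir)} {shlex.quote(dirname)}"


if __name__ == "__main__":
    hostile = "myapp; rm -rf ~"

    # Unquoted: the hostile text breaks out into extra words.
    unquoted = f"tar czf - -C /home/ec2-user {hostile}"
    print(shlex.split(unquoted))

    # Quoted: the hostile text stays a single argument to tar.
    quoted = remote_tar_command("/home/ec2-user", hostile)
    print(shlex.split(quoted))
```

With quoting, tar simply fails to find a directory with that odd name instead of the server running a second command.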

Pitfalls

Mind the tar Path

# Avoid: an absolute path stores the full directory path inside the archive
tar czf - /home/ec2-user/myapp

# OK: Specify the directory with -C to archive with a relative path
tar czf - -C /home/ec2-user myapp

The latter is preferable as it avoids including the absolute path /home/ec2-user/ in the archive, making it easier to extract to any location.
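The effect on the stored member names can be reproduced locally with Python's `tarfile` module. This is a sketch, not part of the author's script: `member_names` is a hypothetical helper, and the `arcname` argument stands in for what tar would store in each case (GNU tar drops the leading "/" from absolute paths by default).

```python
import io
import tarfile
import tempfile
from pathlib import Path


def member_names(src: Path, arcname: str) -> list[str]:
    """Archive src in memory under arcname and return the stored names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(src, arcname=arcname)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        app = Path(tmp) / "myapp"
        app.mkdir()
        (app / "main.py").write_text("print('hi')\n")

        # Like `tar czf - /home/ec2-user/myapp`
        print(member_names(app, "home/ec2-user/myapp"))
        # Like `tar czf - -C /home/ec2-user myapp`
        print(member_names(app, "myapp"))
```

In the first case, extraction recreates home/ec2-user/ under the target directory; in the second, only myapp/ appears.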

Watch Out for SSH Banner Messages

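Anything the server prints to standard output before tar starts is mixed into the tar.gz stream and corrupts the archive. When ssh runs a remote command like this, the usual culprit is an echo or similar command in a shell startup file such as ~/.bashrc rather than the login banner itself, so keep such output out of non-interactive sessions. Adding -q additionally silences ssh's own warnings and diagnostic messages: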

ssh -q -i ~/.ssh/my-key.pem ec2-user@host "tar czf - /path/to/dir" > backup.tar.gz
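A cheap first check for stray text at the start of the stream is to confirm that the downloaded file begins with the gzip magic bytes 0x1f 0x8b. The helper name `starts_with_gzip_magic` is hypothetical, and the check says nothing about the rest of the file.

```python
import gzip
import tempfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def starts_with_gzip_magic(path: Path) -> bool:
    """Return True if the file begins with the gzip magic bytes."""
    with path.open("rb") as f:
        return f.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        clean = Path(tmp) / "clean.tar.gz"
        clean.write_bytes(gzip.compress(b"payload"))

        # Simulate a greeting printed to stdout before tar started.
        polluted = Path(tmp) / "polluted.tar.gz"
        polluted.write_bytes(b"Welcome!\n" + gzip.compress(b"payload"))

        print(starts_with_gzip_magic(clean))
        print(starts_with_gzip_magic(polluted))
```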

Verifying the Archive

After transfer, check if the archive is intact.

tar tzf backup.tar.gz > /dev/null && echo OK   # Reads the whole archive (errors if corrupted)
tar tzf backup.tar.gz | head -20               # Preview the first 20 entries only
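If the check should run from a script, for example right after the transfer, the same idea can be sketched with Python's `tarfile` module. `verify_archive` is a hypothetical helper; it reads every member to the end so that truncation in the middle of the file is noticed.

```python
import tarfile
import tempfile
from pathlib import Path


def verify_archive(path: Path) -> int:
    """Read every member of a tar.gz end to end; return the member count.

    Raises tarfile.TarError, EOFError or OSError if the archive is damaged.
    """
    count = 0
    with tarfile.open(path, mode="r:gz") as tar:
        for member in tar:
            if member.isfile():
                # Drain the file data so decompression errors surface.
                with tar.extractfile(member) as f:
                    while f.read(1 << 20):
                        pass
            count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "data.bin"
        src.write_bytes(bytes(i % 251 for i in range(200_000)))
        archive = Path(tmp) / "backup.tar.gz"
        with tarfile.open(archive, mode="w:gz") as tar:
            tar.add(src, arcname="data.bin")
        print(verify_archive(archive))
```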

Summary

| Method | Server Disk Usage | Complexity |
|---|---|---|
| Standard (tar → scp → delete) | Consumes storage for the backup file | 3 steps |
| Streaming (SSH pipe) | 0 | 1 command |

This method is effective when disk space is tight or simply when you do not want to create temporary files on the server. The mechanism is simple and can be completed in a single command line.

While it is a classic technique, it remains highly useful even in today's era where EC2 and cloud instances are the norm. I hope this proves helpful to someone.

