stefan's blag and stuff

Blog – 2017-09-11 – How to restart a failed scp file upload

From time to time I transfer big files (up to 150 GiB) over relatively slow network links or to slow servers. These are either tar.gz backups, disk images or huge photo collections. Since the fancy web UIs are non-scriptable, bloated and suffer from timeout issues, I prefer to use the scp command over an SSH connection:

$ scp some-big-file.zip your-server.example.com:tmp/

The command assumes that you have configured the host your-server.example.com in your local ~/.ssh/config, or that your local username is the same as the remote one.

Needless to say, a long-running scp command can fail after some hours or days when your network connection is unstable, e.g. when your upstream provider rotates your IP address every 24 hours.

In that case, just re-executing the scp command is suboptimal, because it truncates the remote file, restarts the transfer from scratch and will most likely fail again.

It would be really nice to have a command-line argument to reuse the already transferred bytes. Sadly, scp does not have this feature, but rsync comes to the rescue:

Using rsync with --append-verify

To restart the upload without reuploading the already transmitted bytes, you can use the rsync command as follows:

$ rsync --progress --append-verify -v --rsh=ssh \
     some-big-file.zip your-server.example.com:tmp/some-big-file.zip

The trick is the --append-verify option. It's an advanced version of --append.

If you use the argument --append, rsync reuses the existing file on the remote side and only appends the missing bytes, without checking the bytes that are already there. With --append-verify, the existing bytes on the remote side are additionally included in the checksum that rsync verifies for the whole file at the end of the transfer; if that check fails, rsync sends the file again. In the end you can be reasonably sure that the huge file arrived on the remote server without data corruption.
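To see the append behaviour without involving a server, the following sketch runs rsync between two local files. It assumes rsync 3.0.0 or newer is installed; the function name demo_append_verify and the file names source.bin and partial.bin are made up for this illustration.

```shell
#!/bin/sh
# Local demonstration of --append-verify; no server is involved.
# All names below are hypothetical and only serve this sketch.

demo_append_verify() {
    dir=$(mktemp -d) || return 1

    # 1 MiB of random data plays the role of the big local file.
    dd if=/dev/urandom of="$dir/source.bin" bs=1024 count=1024 2>/dev/null

    # The first 400 KiB play the role of an interrupted upload.
    dd if="$dir/source.bin" of="$dir/partial.bin" bs=1024 count=400 2>/dev/null

    # rsync keeps the existing 400 KiB and appends the missing bytes.
    if ! rsync --append-verify "$dir/source.bin" "$dir/partial.bin"; then
        rm -rf "$dir"
        return 1
    fi

    # cmp exits with 0 only if both files are byte-for-byte identical.
    status=0
    cmp "$dir/source.bin" "$dir/partial.bin" || status=$?

    rm -rf "$dir"
    return "$status"
}
```

Calling demo_append_verify returns 0 if the completed file matches the source. It only shows the mechanism on a small file; it says nothing about how a real transfer over a slow link will behave.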

Nevertheless, I always do an extra round with md5sum to ensure that the local and remote files are identical. Just in case. (Until now I have never witnessed data corruption caused by rsync.)
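Such a checksum round could look roughly like the sketch below. It assumes that md5sum is installed on both machines and that the remote path contains no spaces; the function names md5_of_local, md5_of_remote and same_checksum are hypothetical helpers, not part of any tool.

```shell
#!/bin/sh
# Sketch: compare the MD5 checksum of a local file with its remote copy.
# The helper names are made up for this example.

# Print only the hash column of md5sum's output for a local file.
md5_of_local() {
    md5sum "$1" | cut -d ' ' -f 1
}

# Run md5sum on the remote host ($1) for the remote path ($2) via ssh.
# The path is expanded by the remote shell, so it must not contain spaces.
md5_of_remote() {
    ssh "$1" md5sum "$2" | cut -d ' ' -f 1
}

# Succeed only if both hashes are non-empty and equal.
same_checksum() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example usage with the host and file from this post:
#
#   local_sum=$(md5_of_local some-big-file.zip)
#   remote_sum=$(md5_of_remote your-server.example.com tmp/some-big-file.zip)
#   if same_checksum "$local_sum" "$remote_sum"; then
#       echo "checksums match"
#   else
#       echo "checksums differ" >&2
#   fi
```

The empty-string check in same_checksum is there so that a failed ssh call, which yields no hash at all, is not mistaken for a match.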

Notes

It's totally safe to interrupt an rsync upload with CTRL+C, e.g. to pause an already running upload. Just re-execute the rsync command and it will continue.
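Since re-executing the command is all that is needed, the restart can also be scripted. The following is a minimal sketch of a retry helper; retry_until_success is a hypothetical function name, and the number of attempts and the delay are arbitrary values you would tune to your connection.

```shell
#!/bin/sh
# Sketch: re-run a command until it succeeds or the attempts run out.
# Usage: retry_until_success MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# The function name is made up for this example.

retry_until_success() {
    max_tries=$1
    delay=$2
    shift 2

    try=1
    while [ "$try" -le "$max_tries" ]; do
        # Run the command exactly as given; stop at the first success.
        if "$@"; then
            return 0
        fi
        echo "attempt $try of $max_tries failed" >&2
        try=$((try + 1))
        # Wait before the next attempt, but not after the last one.
        if [ "$try" -le "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example usage with the rsync command from this post
# (up to 10 attempts, 60 seconds apart):
#
#   retry_until_success 10 60 \
#       rsync --progress --append-verify -v --rsh=ssh \
#       some-big-file.zip your-server.example.com:tmp/some-big-file.zip
```

For unattended use this only makes sense with key-based SSH authentication, because a password prompt would block every attempt.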

rsync also has the argument --partial. It's a different mechanism: it keeps the partially transferred file instead of deleting it, so that a later run can reuse it. For a single file the effect is similar to --append. In the past I have also used --partial to restart interrupted uploads successfully.