S3-parallel-put Notes
Third-party S3 tool S3-parallel-put
Official Repository - https://github.com/twpayne/s3-parallel-put
Usage
This script is not very refined and has some quirks you have to work around to get it running.
Definitely do your first runs with --limit=xxx and --dry-run.
It currently gets its AWS creds from env vars, so you have to set those first. I may change that so it reads a cfg file, but haven't yet.
Set env vars (bash):
export AWS_ACCESS_KEY_ID=key
export AWS_SECRET_ACCESS_KEY=secret_key
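Since s3pp only looks at the environment, it can help to confirm both variables are set before kicking off a long run. A minimal sketch; the function name check_aws_env is made up for these notes and is not part of s3pp:

```shell
# Hypothetical helper (not part of s3-parallel-put).
# Returns 0 if both credential variables are set and non-empty, 1 otherwise.
check_aws_env() {
    missing=0
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        # Indirect lookup via eval keeps this working in plain sh as well as bash.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Call check_aws_env before the upload command and only continue if it succeeds.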
cd to the directory you want to upload from. s3pp works off the cwd ".".
$ time /home/kinscoe/s3_software/s3-parallel-put/s3-parallel-put --put=update \
    --host=s3.amazonaws.com --bucket=my-review.hrw.com --walk=filesystem --content-type=guess \
    --processes=10 --limit=100 . --dry-run
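To avoid retyping the trial-run flags, the command can be wrapped in a small shell function. This is only a sketch: the S3PP variable and the s3pp_trial name are made up for these notes, and the flags are the ones from the command shown here, with --limit and --dry-run always included.

```shell
# Path to the script; S3PP is a hypothetical variable, override it if
# s3-parallel-put is not on your PATH.
S3PP="${S3PP:-s3-parallel-put}"

# Trial run from the cwd: same flags as the upload, always with --limit and --dry-run.
# Usage: s3pp_trial BUCKET LIMIT [extra s3-parallel-put options...]
s3pp_trial() {
    bucket=$1
    limit=$2
    shift 2
    "$S3PP" --put=update --host=s3.amazonaws.com --bucket="$bucket" \
        --walk=filesystem --content-type=guess --processes=10 \
        --limit="$limit" --dry-run "$@" .
}
```

For example, after cd-ing into the upload directory: s3pp_trial mybucket 100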
If you want to put the objects into a "path" other than root on the bucket, you have to define the leading prefix:
--prefix=target_dir_in_bucket/however/far/deep
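To get a feel for where files would land, you can list the keys a given prefix would produce. This sketch ASSUMES the key is just the prefix, a "/", and the file's path relative to the cwd; I have not checked that against the s3pp source, so treat the --dry-run output as the real answer. The preview_keys name is made up for these notes.

```shell
# Hypothetical helper (not part of s3-parallel-put).
# Prints PREFIX/relative-path for every file under the cwd.
# ASSUMPTION: s3pp builds keys as prefix + "/" + path relative to the cwd.
preview_keys() {
    prefix=${1%/}    # drop one trailing slash, if any
    find . -type f | sort | while IFS= read -r path; do
        # Strip the leading "./" that find adds to each path.
        printf '%s/%s\n' "$prefix" "${path#./}"
    done
}
```

Run it from the upload directory, e.g. preview_keys target_dir_in_bucket/however/far/deep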
You can use --put=stupid on the first run: it won't bother checking for existence and will just put/overwrite anything in the bucket with the same prefix/path.
For --processes, start low, especially if the files are huge and you have little RAM. I had to run with 10 on 8GB of RAM, and even that was pushing it sometimes when s3pp hit big zip files, etc.
Here are actual examples I have used for s3cmd and s3pp, for S3 URL composition reference:
s3cmd:
$ date; time /usr/bin/s3cmd -c /home/ec2-user/.s3cfg sync --no-check-md5 --delete-removed --no-preserve --verbose --progress /mnt/error_pages/ s3://mybucket/error_pages/
"--no-check-md5" skips md5 checks on the initial upload for a speed bump, but you may not really want that.
s3pp:
This example uploads "admin" as a subdir off the bucket root (notice the cwd kick-off point and the "--prefix" option, which defines the dir/path/prefix to start from on S3).
$ cd /mnt/admin
$ time /home/ec2-user/s3_software/s3-parallel-put/s3-parallel-put --put=update --host=s3.amazonaws.com --bucket=mybucket --walk=filesystem --content-type=guess --processes=25 . --prefix=/hlroadmin --limit=50 --dry-run