S3-parallel-put Notes

From Public wiki of Kevin P. Inscoe
Revision as of 15:57, 19 August 2020

Third-party S3 tool S3-parallel-put

Official Repository - https://github.com/twpayne/s3-parallel-put

Usage

This script is fairly rough and has some quirks you need to work around to get it operating.

Definitely run this first with --limit=xxx and --dry-run

It currently gets its AWS creds via env vars, so you have to set those first. I may change it to read a config file, but haven't yet.

Set env vars (bash):

export AWS_ACCESS_KEY_ID=key
export AWS_SECRET_ACCESS_KEY=secret_key
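Until the script itself reads a config file, one workaround is to keep the two variables in a simple key=value file and source it into the environment. This is only a sketch; the file name and format here are my own assumptions, not something s3pp parses itself:

```shell
# sketch: load the two env vars from a key=value file instead of typing them
# (file path and format are assumptions; s3pp only sees the resulting env vars)
cat > /tmp/s3pp.cfg <<'EOF'
AWS_ACCESS_KEY_ID=key
AWS_SECRET_ACCESS_KEY=secret_key
EOF
set -a          # export every variable assigned while allexport is on
. /tmp/s3pp.cfg
set +a
echo "$AWS_ACCESS_KEY_ID"
```

Keep the file mode restrictive (chmod 600) since it holds the secret key.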

cd to the directory you want to upload from; s3pp works off the cwd (".").

$ time /home/kinscoe/s3_software/s3-parallel-put/s3-parallel-put --put=update \
--host=s3.amazonaws.com --bucket=my-review.hrw.com --walk=filesystem --content-type=guess \
--processes=10 --limit=100 . --dry-run

If you want to put the objects under a "path" other than the bucket root, you have to define the leading prefix:

--prefix=target_dir_in_bucket/however/far/deep
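My understanding of how the final key is composed: prefix plus the path of each file relative to the cwd you started s3pp from. A tiny illustration (bucket and file names here are made up):

```shell
# hypothetical illustration of how --prefix composes the final S3 key:
#   key = prefix + "/" + path relative to the cwd s3pp was started from
PREFIX="target_dir_in_bucket/however/far/deep"
REL="css/site.css"          # example file under the cwd
URL="s3://my-bucket/${PREFIX}/${REL}"
echo "$URL"
# → s3://my-bucket/target_dir_in_bucket/however/far/deep/css/site.css
```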

You can use --put=stupid on the first run; it won't bother checking for existence and will just put/overwrite anything in the bucket with the same prefix/path.

Start with a low process count, especially if the files are huge and you have little RAM. I had to run with 10 processes on 8 GB of RAM, and even that was pushing it sometimes when s3pp encountered big zip files, etc.

Here are actual examples I have used with s3cmd and s3pp, for S3 URL composition reference:

s3cmd:

$ date; time /usr/bin/s3cmd -c /home/ec2-user/.s3cfg sync --no-check-md5 --delete-removed --no-preserve --verbose --progress /mnt/error_pages/ s3://mybucket/error_pages/

"--no-check-md5" skips MD5 checks on the initial upload for a speed bump, but you may not really want that.

s3pp:

This example uploads "admin" as a subdirectory off the bucket root (notice the cwd kick-off point and the "--prefix" option, which defines the dir/path/prefix to start at on S3).

$ cd /mnt/admin
$ time /home/ec2-user/s3_software/s3-parallel-put/s3-parallel-put --put=update --host=s3.amazonaws.com --bucket=mybucket --walk=filesystem --content-type=guess --processes=25 . --prefix=/hlroadmin --limit=50 --dry-run
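To avoid fat-fingering that long one-liner, I find it helps to compose the command in a variable, review it, and only then run it. A sketch (the bucket, prefix, and paths are just the examples above):

```shell
# sketch only: build the full s3pp command line so it can be reviewed
# before running it from the source directory (values are examples)
S3PP=/home/ec2-user/s3_software/s3-parallel-put/s3-parallel-put
CMD="$S3PP --put=update --host=s3.amazonaws.com --bucket=mybucket \
--walk=filesystem --content-type=guess --processes=25 \
--prefix=/hlroadmin --limit=50 --dry-run ."
echo "$CMD"    # inspect, then: cd /mnt/admin && eval "$CMD"
```

Once the dry-run output looks right, drop --limit and --dry-run for the real upload.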