S3-parallel-put Notes (wiki page, created 2020-08-19)<br />
<div>Third-party S3 tool S3-parallel-put<br />
<br />
Official Repository - https://github.com/twpayne/s3-parallel-put<br />
<br />
==Usage==<br />
<br />
This script is not very refined and has some quirks you need to work around to get it to operate. <br />
<br />
Definitely run this first with --limit=xxx and --dry-run.<br />
<br />
It currently gets its AWS credentials via environment variables, so you have to set those first. I may change that so it reads a config file, but I haven't yet.<br />
<br />
Set env vars (bash):<br />
<br />
<pre><br />
export AWS_ACCESS_KEY_ID=key<br />
export AWS_SECRET_ACCESS_KEY=secret_key<br />
</pre><br />
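Until the script reads a config file itself, one workaround is to keep the two exports in a small env file and source it before each run. This is just a hedged sketch: the filename is made up and nothing in s3-parallel-put knows about it.<br />

```shell
# Hypothetical helper: keep the two exports s3pp needs in a small env file
# (the filename here is an assumption, not something the tool knows about).
creds_file="./s3pp-creds.env"

printf 'export AWS_ACCESS_KEY_ID=key\nexport AWS_SECRET_ACCESS_KEY=secret_key\n' > "$creds_file"
chmod 600 "$creds_file"   # the file holds a secret, so lock down the permissions

# Before each upload run, source it and fail fast if either variable is missing:
. "$creds_file"
: "${AWS_ACCESS_KEY_ID:?AWS_ACCESS_KEY_ID is not set}"
: "${AWS_SECRET_ACCESS_KEY:?AWS_SECRET_ACCESS_KEY is not set}"
```

The `${var:?message}` expansions abort the shell with the message if a variable is unset or empty, which beats a silent failed upload.<br />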
<br />
cd to the directory you want to upload from; s3pp works off the current working directory ("."). <br />
<br />
<pre><br />
$ time /home/kinscoe/s3_software/s3-parallel-put/s3-parallel-put --put=update \<br />
--host=s3.amazonaws.com --bucket=my-review.hrw.com --walk=filesystem --content-type=guess \<br />
--processes=10 --limit=100 . --dry-run<br />
</pre><br />
<br />
If you want to put the objects into a "path" other than the bucket root, you have to define the leading prefix:<br />
<br />
<pre><br />
--prefix=target_dir_in_bucket/however/far/deep<br />
</pre><br />
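To see how the prefix combines with the walked paths, here is a rough sketch of the resulting object key. The prefix and file name are illustrations only, and s3pp's exact join logic may differ slightly:<br />

```shell
# Illustrative values only -- the prefix and local path are made up.
prefix="target_dir_in_bucket/however/far/deep"
local_path="./css/site.css"   # a file found while walking from "."

# s3pp walks from the cwd and prepends the prefix, so the S3 key is roughly:
key="${prefix}/${local_path#./}"   # strip the leading "./" before joining
echo "$key"   # target_dir_in_bucket/however/far/deep/css/site.css
```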
<br />
You can use --put=stupid on the first run: it won't bother checking for existence and will just put/overwrite anything in the bucket with the same prefix/path.<br />
<br />
Start low with --processes, especially if the files are huge and you have little RAM. I had to run with 10 processes on 8 GB of RAM, and that was pushing it sometimes when s3pp encountered big zip files, etc.<br />
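Before picking a --processes value, it can help to eyeball the upload set. A rough pre-flight check like the following (the 100 MB threshold is arbitrary) shows how many files will be walked and whether any are big enough to worry about:<br />

```shell
# Count the files that would be walked from the cwd:
file_count=$(find . -type f | wc -l)
echo "files to upload: $file_count"

# List anything over an (arbitrary) 100 MB threshold -- large archives like
# zip files are what pushed memory usage up with many parallel workers:
find . -type f -size +100M -print

# Total size of the tree, as a sanity check before the real run:
du -sh .
```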
<br />
Here are actual examples I have used with s3cmd and s3pp, for reference on S3 URL composition:<br />
<br />
s3cmd:<br />
<br />
<pre><br />
$ date; time /usr/bin/s3cmd -c /home/ec2-user/.s3cfg sync --no-check-md5 --delete-removed --no-preserve --verbose --progress /mnt/error_pages/ s3://mybucket/error_pages/<br />
</pre><br />
<br />
"--no-check-md5" skips MD5 checks on the initial upload for a speed bump, but you may not really want that.<br />
<br />
s3pp:<br />
<br />
This example uploads "admin" as a subdirectory off the bucket root (notice the "cwd" kick-off point and the "--prefix" option defining the dir/path/prefix to start at on S3).<br />
<br />
<pre><br />
$ cd /mnt/admin<br />
$ time /home/ec2-user/s3_software/s3-parallel-put/s3-parallel-put --put=update --host=s3.amazonaws.com --bucket=mybucket --walk=filesystem --content-type=guess --processes=25 . --prefix=/hlroadmin --limit=50 --dry-run<br />
</pre></div>