Large file transfers across wide area networks

From Public wiki of Kevin P. Inscoe
Revision as of 17:53, 4 February 2018 by Kinscoe

Keys to fastest transfers

The keys to the fastest file transfer speeds across global wide area networks are:

  • UDP rather than TCP protocol
  • Concurrency (multiple simultaneous transfers) with file management (see the sketch after this list)
  • On Linux/macOS/Unix use lftp to increase link saturation
  • Less than 80% circuit utilization to avoid triggering congestion control
  • Fewer software layers and more straight file copying: avoid Subversion or rsync and instead archive into single files for transfer
  • Avoid encryption over the wire; instead encrypt and compress before transfer
  • Shortest path
  • Mirror files or edge delivery (Content Delivery Network) at locations closer to receiver
  • Use WAN acceleration
  • Delta changes - block-level changes rather than whole-file synchronization
  • Use cloud or distributed file systems where possible
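
A minimal sketch of the concurrency point, assuming the files to send have already been staged as archives and the destination is reachable as a mounted path; the directories, file pattern and worker count below are hypothetical:

 # Copy several pre-built archives in parallel rather than one at a time.
 # SOURCE_DIR and DEST_DIR are hypothetical; adjust for a real transfer.
 from concurrent.futures import ThreadPoolExecutor
 from pathlib import Path
 import shutil

 SOURCE_DIR = Path("/data/outbound")     # staging area holding the archives
 DEST_DIR = Path("/mnt/remote/inbound")  # remote share mounted locally

 def copy_one(archive: Path) -> str:
     # Each worker moves one whole archive; no per-file chatter on the wire.
     shutil.copy(archive, DEST_DIR / archive.name)
     return archive.name

 def parallel_copy(workers: int = 4) -> None:
     archives = sorted(SOURCE_DIR.glob("*.tar.gz"))
     # Several simultaneous streams tend to fill a long fat pipe better than
     # a single stream held back by TCP congestion control.
     with ThreadPoolExecutor(max_workers=workers) as pool:
         for name in pool.map(copy_one, archives):
             print(f"transferred {name}")

 if __name__ == "__main__":
     parallel_copy()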

Useful links

http://udt.sourceforge.net/

After reading this government web site (run by folks who apparently shift terabytes of data around every day):

http://fasterdata.es.net/

http://fasterdata.es.net/tuning.html

http://fasterdata.es.net/tools.html

I have come to the conclusion that our current tool sets are inadequate. Given that the tuning they suggest (mainly TCP congestion control and increased buffers) has now been applied in several scenarios with little improvement, I am soliciting your input on improved tool sets to test and evaluate for possible future adoption.
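
For what it is worth, one quick way to see whether larger buffers from that tuning are actually being granted is to ask the kernel directly. This is only a sketch, not one of the fasterdata tools, and the 16 MB request below is an arbitrary number:

 # Ask for large socket buffers and print what the kernel actually grants.
 # The requested size is arbitrary; Linux clamps it to net.core.rmem_max /
 # net.core.wmem_max and reports roughly double the granted value.
 import socket

 REQUESTED = 16 * 1024 * 1024  # 16 MB send and receive buffers

 with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
     s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)
     s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, REQUESTED)
     print("receive buffer granted:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
     print("send buffer granted:   ", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))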

As a global company with large amounts of content shifting around the world on an almost daily basis, we have to find a way to improve our current situation.

This encompasses all of our operating systems in use at xxxx today (Windows, Apple OS X, Solaris and Linux).

Here are some of my personal notes on the subject, in case they are of help.

Troubleshooting large file transfers across wide area networks

One thing I have learned is that much of the inefficiency is in the software itself and usually not in the underlying network or operating systems. That is not to say there are no process improvements to make, of course (such as moving data closer to clients and using delta transfers rather than whole collections), but it clearly shows our classic tool sets are woefully inadequate for the job at hand.

There are times when back-grounding a data transfer is the best approach for link congestion concerns (a la BitTorrent - Mass Transit?), but there are other times when a near-immediate file transfer has to take place. Mirroring technology seems best suited for this task, but sometimes the amount of data burst to the mirrored locations is almost too much for the mirror copies to stay current.

One thing I have observed is that many short, small transfers (as in collections or an rsync) are highly inefficient. Favored for speed are larger, single, non-encrypted (on the wire) transfers run in parallel, which get better utilization of the link capacity. This, I believe, will become the new paradigm in our bulk file transfers. I still favor remote mirroring (it is essential), but the way in which we transfer the mirrored data from one location to another in a timely fashion is our new challenge.
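
To make the "larger single transfers" point concrete, here is a small sketch of collapsing a tree of many small files into one compressed archive before it is sent; the paths are hypothetical, and any encryption would be applied to the resulting archive at rest rather than per connection:

 # Bundle a directory of many small files into a single compressed archive
 # so the wire sees one large stream instead of thousands of tiny ones.
 # SOURCE_TREE and ARCHIVE are hypothetical paths.
 import tarfile
 from pathlib import Path

 SOURCE_TREE = Path("/data/project_assets")
 ARCHIVE = Path("/data/outbound/project_assets.tar.gz")

 def build_archive(source: Path, archive: Path) -> None:
     archive.parent.mkdir(parents=True, exist_ok=True)
     # gzip shrinks the payload before transfer; encrypt the resulting
     # file here if needed, instead of encrypting on the wire.
     with tarfile.open(archive, "w:gz") as tar:
         tar.add(str(source), arcname=source.name)

 if __name__ == "__main__":
     build_archive(SOURCE_TREE, ARCHIVE)
     print(f"built {ARCHIVE} ({ARCHIVE.stat().st_size} bytes)")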

Buffers and tuning

Bufferbloat, from Jim Gettys