Why I don't generally recommend using a swapfile in cloud based Linux environments

Latest revision as of 18:32, 29 March 2019

This is a FAQ (Frequently Asked Question). Note that the arguments presented apply only to servers running server-type loads, such as application and database servers; this model does not work on desktop or VDI-type hosts.

The Question

Why do you not include a swap partition or swap volume in the AMI supplied by xyz?

Rationale

The primary reason is that the AMI we deliver is meant to be one size fits all. It really is a "base" AMI, or what we refer to as pre-deployable. You would not normally put this AMI into a launch configuration and run with it; more work needs to be done to it using configuration tools like Salt or Chef. We have a wide range of loads and a wide selection of customers, and it is impractical to provide swap on many instance types such as *.micro and *.small. We have also generally not recommended swapping in cloud hosts, and it has long been our practice not to use swapping in virtual hosts. The reason is simple: if you need to swap, you have undersized your virtual host, and you need to know that. Swapping might conceal that fact while impacting performance (slow vs. failure), but not necessarily in a reportable way (other than by running long-term reporting tools such as sar on the host). CloudWatch might notice a slow system, but almost any monitoring tool will notice a failure. If you are scaling correctly and an overcommitment in memory occurs, the process in Linux will die, and it will die in a forensic way. Meanwhile, auto-scaling will continue while DevOps teams determine why swapping occurred and whether a correction or re-sizing needs to be made, which usually becomes a simple launch configuration update. Remember that Amazon recommends "Design for failure and nothing will fail". Think of Netflix's famous Chaos Monkey announcement, which states in part:

Why Run Chaos Monkey?
Failures happen and they inevitably happen when least desired or expected. If your application can't tolerate an instance failure would you rather find out by being paged at 3am or when you're in the office and have had your morning coffee? Even if you are confident that your architecture can tolerate an instance failure, are you sure it will still be able to next week? How about next month? Software is complex and dynamic and that "simple fix" you put in place last week could have undesired consequences. Do your traffic load balancers correctly detect and route requests around instances that go offline? Can you reliably rebuild your instances? Perhaps an engineer "quick patched" an instance last week and forgot to commit the changes to your source repository?
There are many failure scenarios that Chaos Monkey helps us detect. Over the last year Chaos Monkey has terminated over 65,000 instances running in our production and testing environments. Most of the time nobody notices, but we continue to find surprises caused by Chaos Monkey which allows us to isolate and resolve them so they don't happen again.

Other reasons (but not the primary reason) to not include swap volumes include:

The increased cost of swapping against magnetic volumes, although using Instance Storage is supported.
Possibly the performance lost to swapping, although admittedly this is rare these days: SSD is common now, and it wasn't when we first came to this conclusion.

In general, however, we admit that swapping is good overall. For us it's more about failing visibly than about having a safety net.
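The rationale above relies on an out-of-memory kill leaving evidence behind. As a minimal sketch of how that evidence might be pulled out of a log, the hypothetical helper below prints the lines the kernel OOM killer typically writes. The function name, the default log path, and the exact message patterns are assumptions; log locations and wording vary by distribution and kernel version.

```shell
#!/bin/sh
# Hypothetical helper: print kernel OOM-killer lines found in a log file.
# The default path is an assumption; Debian/Ubuntu hosts usually log to
# /var/log/syslog or /var/log/kern.log instead of /var/log/messages.
oom_lines() {
    log="${1:-/var/log/messages}"
    # "invoked oom-killer" opens an OOM report;
    # "Out of memory: Kill..." names the process that was chosen.
    grep -E 'invoked oom-killer|Out of memory: Kill' "$log"
}
```

Calling `oom_lines /var/log/messages` (usually as root) on a host that has hit the OOM killer should show which process was killed and when; no output suggests no OOM kill was recorded in that file.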

How Linux uses memory

How Linux uses RAM (very simplified)

Each application can use some of your memory. Linux uses all otherwise unoccupied memory (except for the last few MiB) as "cache". This includes the page cache, inode caches, etc. This is a good thing - it helps speed things up heaps. Both writing to disk and reading from disk can be sped up immensely by cache.

Important note here: It is not meaningful to look at memory at a single point in time, for example with the free or top commands, and simply state that this Linux system is not swapping or does not need to swap. The converse is also true. You must use historical reporting tools such as sar and iostat to determine whether you have been swapping over time.
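As a rough illustration of looking at swapping over time rather than at one instant, the sketch below reads the kernel's cumulative swap-in and swap-out page counters. The function name `swap_counters` is hypothetical; the `pswpin` and `pswpout` fields are taken from `/proc/vmstat`, and the optional file argument exists only so the parser can be tried on a saved copy.

```shell
#!/bin/sh
# Hypothetical helper: print pages swapped in/out since boot.
# Reads /proc/vmstat by default; pass another file to parse a saved copy.
swap_counters() {
    src="${1:-/proc/vmstat}"
    # Each line of the file is "<name> <value>"; keep only the swap counters.
    awk '$1 == "pswpin" || $1 == "pswpout" { print $1 "=" $2 }' "$src"
}
```

Running `swap_counters` twice, some minutes apart, and comparing the numbers gives a crude answer to "has this host been swapping?". Where sysstat is collecting data, `sar -W` reports the same activity as per-second rates.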

Ideally, you have enough memory for all your applications, and you still have several hundred MiB left for cache. In this situation, as long as your applications do not increase their memory use and the system isn't struggling to get enough space for cache, there is no need for any swap.

Once applications claim more RAM, it simply goes into some of the space that was used by cache, shrinking the cache. De-allocating cache is cheap and easy enough that it is simply done in real time - everything that sits in the cache is either just a second copy of something that's already on disk, so can just be deallocated instantly, or it's something that we would have had to flush to disk within the next few seconds anyway.

This is not a situation that is specific to Linux - all modern operating systems work this way. The different operating systems might just report free RAM differently: some include the cache as part of what they consider "free" and some may not.

When you talk about free RAM, it's a lot more meaningful to include cache, because it practically is free - it's available should any application request it. On Linux, older versions of the free command report it both ways - the first line includes cache in the used RAM column, and the second line includes cache (and buffers) in the free column. Newer versions of free drop the second line and show an "available" column instead.
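To make the two ways of counting "free" concrete, here is a small sketch that computes both from `/proc/meminfo`. The function name `mem_summary` and the output labels are made up for illustration; the sum of MemFree, Buffers, and Cached is only an approximation of what is really reclaimable, and kernels that expose a `MemAvailable` field give a better estimate.

```shell
#!/bin/sh
# Hypothetical helper: report free memory with and without cache, in KiB.
# Reads /proc/meminfo by default; pass another file to parse a saved copy.
mem_summary() {
    src="${1:-/proc/meminfo}"
    awk '
        $1 == "MemFree:" { free = $2 }
        $1 == "Buffers:" { buffers = $2 }
        $1 == "Cached:"  { cached = $2 }   # exact match, so SwapCached is ignored
        END {
            printf "free_kib=%d\n", free
            printf "free_plus_cache_kib=%d\n", free + buffers + cached
        }
    ' "$src"
}
```

On a healthy server the second figure is normally much larger than the first, which is the point made above: most "used" memory is cache that can be handed back on demand.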

How Linux uses swap (even more simplified)

Once you have used up enough memory that there is not enough left for a smooth-running cache, Linux may decide to re-allocate some unused application memory from RAM to swap.

It doesn't do this according to a definite cut-off. It's not like you reach a certain percentage of allocation then Linux starts swapping. It has a rather "fuzzy" algorithm. It takes a lot of things into account, which can best be described by "how much pressure is there for memory allocation". If there is a lot of "pressure" to allocate new memory, then it will increase the chances some will be swapped to make more room. If there is less "pressure" then it will decrease these chances.

Your system has a "swappiness" setting which helps you tweak how this "pressure" is calculated. It's normally not recommended to alter this at all, and I would not recommend you alter it. Swapping is overall a very good thing - although there are a few edge cases where it harms performance, if you look at overall system performance it's a net benefit for a wide range of tasks. If you reduce the swappiness, you let the amount of cache memory shrink a little bit more than it would otherwise, even when it may really be useful. Whether this is a good enough trade-off for whatever problem you're having with swapping is up to you. You should just know what you're doing, that's all.

There is a well-known situation in which swap really harms perceived performance on a desktop system: how quickly applications can respond to user input again after being left idle for a long time while I/O-heavy background processes (such as an overnight backup) run. This is a very visible sluggishness, but not enough to justify turning off swap altogether, and it is very hard to prevent in any operating system. Turn off swap and this initial sluggishness after the backup/virus scan may not happen, but the system may run a little bit slower all day long. This is not a situation that's limited to Linux, either.

When choosing what is to be swapped to disk, the system tries to pick memory that is not actually being used - read from or written to. It has a pretty simple algorithm for calculating this that chooses well most of the time.

If you have a system where you have a huge amount of RAM (at time of writing, 8GB is a huge amount for a typical Linux distro), then you will very rarely ever hit a situation where swap is needed at all. You may even try turning swap off. I never recommend doing that, but only because you never know when more RAM may save you from some application crashing. But if you know you're not going to need it, you can do it.

But how can swap speed up my system? Doesn't swapping slow things down?

The act of transferring data from RAM to swap is a slow operation, but it's only taken when the kernel is pretty sure the overall benefit will outweigh this. For example, if your application memory has risen to the point that you have almost no cache left and your I/O is very inefficient because of this, you can actually get a lot more speed out of your system by freeing up some memory, even after the initial expense of swapping data in order to free it up.

It's also a last resort should your applications actually request more memory than you actually have. In this case, swapping is necessary to prevent an out-of-memory situation which will often result in an application crashing or having to be forcibly killed.

Swapping is only associated with times where your system is performing poorly because it happens at times when you are running out of usable RAM, which would slow your system down (or make it unstable) even if you didn't have swap. So to simplify things, swapping happens because your system is becoming bogged down, rather than the other way around.

Once data is in swap, when does it come out again?

Transferring data out of swap is (for traditional hard disks, at least) just as time-consuming as putting it in there. So understandably, your kernel will be just as reluctant to remove data from swap, especially if it's not actually being used (i.e. read from or written to). If you have data in swap and it's not being used, then it's actually a good thing that it remains in swap, since it leaves more memory for other things that are being used, potentially speeding up your system.
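To see how much of a given process is currently sitting in swap, one option is the `VmSwap` field in that process's `/proc/<pid>/status` file. The sketch below is a hypothetical helper (the name `proc_swap_kib` is an assumption) that prints that value in KiB; it says nothing about whether the swapped pages are still needed.

```shell
#!/bin/sh
# Hypothetical helper: print the VmSwap value (KiB) from a /proc status file.
# Defaults to the calling shell's own status; pass /proc/<pid>/status for
# another process, or a saved copy of such a file.
proc_swap_kib() {
    status_file="${1:-/proc/self/status}"
    awk '$1 == "VmSwap:" { print $2 }' "$status_file"
}
```

For example, `proc_swap_kib /proc/1234/status` (with 1234 standing in for a real PID) reports that process's swapped-out size. A value that stays constant while the host runs normally is consistent with idle data being left in swap.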

What is swappiness and how do I change it?

From Ubuntu's Swap FAQ:

"What is swappiness and how do I change it?

The swappiness parameter controls the tendency of the kernel to move processes out of physical memory and onto the swap disk. Because disks are much slower than RAM, this can lead to slower response times for system and applications if processes are too aggressively moved out of memory.

swappiness can have a value of between 0 and 100

swappiness=0 tells the kernel to avoid swapping processes out of physical memory for as long as possible

swappiness=100 tells the kernel to aggressively swap processes out of physical memory and move them to swap cache

The default setting in Ubuntu is swappiness=60. Reducing the default value of swappiness will probably improve overall performance for a typical Ubuntu desktop installation. A value of swappiness=10 is recommended, but feel free to experiment. Note: Ubuntu server installations have different performance requirements to desktop systems, and the default value of 60 is likely more suitable.

To check the swappiness value

cat /proc/sys/vm/swappiness

To change the swappiness value

A temporary change (lost on reboot) with a swappiness value of 10 can be made with

sudo sysctl vm.swappiness=10

To make a change permanent, edit the configuration file with your favorite editor:

sudo vim /etc/sysctl.conf

Search for vm.swappiness and change its value as desired. If vm.swappiness does not exist, add it to the end of the file like so:

vm.swappiness=10

Save the file and reboot."
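For hosts configured by scripts rather than by hand, the "search for vm.swappiness, change it, or append it" step quoted above can be sketched as a small function. The name `set_swappiness` is hypothetical, and the 0 to 100 check simply mirrors the range given in the quoted FAQ; some kernels may accept a wider range.

```shell
#!/bin/sh
# Hypothetical helper: set vm.swappiness in a sysctl config file, replacing
# an existing entry or appending one. Usage: set_swappiness VALUE [FILE]
set_swappiness() {
    value="$1"
    conf="${2:-/etc/sysctl.conf}"
    # Accept only non-negative integers.
    case "$value" in
        ''|*[!0-9]*) echo "swappiness must be an integer from 0 to 100" >&2; return 1 ;;
    esac
    if [ "$value" -gt 100 ]; then
        echo "swappiness must be an integer from 0 to 100" >&2
        return 1
    fi
    if grep -q '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$conf"; then
        # Rewrite through a temp file so this works with any POSIX sed.
        tmp="$(mktemp)" || return 1
        sed "s/^[[:space:]]*vm\.swappiness[[:space:]]*=.*/vm.swappiness=$value/" "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"
        status=$?
        rm -f "$tmp"
        return "$status"
    fi
    # No existing entry: append one at the end of the file.
    printf 'vm.swappiness=%s\n' "$value" >> "$conf"
}
```

Run as root, `set_swappiness 10` edits /etc/sysctl.conf in place and can be repeated safely without adding duplicate lines. Afterwards, `sudo sysctl -p` should load the new value without a reboot.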

Alternatives

If memory becomes an issue perhaps look at other ways to relieve that issue such as memcache. Or you may need to horizontally scale using an Elastic Load Balancer.

Exceptions to the rule

"But I really need a swap mounted?"

In some cases you do, particularly with large applications like Oracle databases. In cases like this we generally build out Oracle with its own data and swap volumes. But if you really need swap on an ad hoc basis, we recommend using Chef or cloud provisioning to add it to the instance at deployment. This is the model we are moving to. TGIE will no longer "hand roll" AMIs. The new direction is to use Chef both to make basic pre-deployable AMIs and to provision what is needed in the instance after launch. Other than for a single instance with no ELB, this assumes using auto-scaling groups, even if it is a fixed number of instances.

The basic steps for adding swap after instance launch are:

Provision volume and attach to instance (using AWS tools)
Verify volume presented to instance (command lsblk)
sudo dd if=/dev/zero of=/swapfile1 bs=1M count=nnnn (count is in blocks of size bs, so bs * count = total swap space)
sudo chmod 600 /swapfile1 (mkswap warns about swap files that other users can read)
sudo mkswap /swapfile1
sudo swapon /swapfile1

To survive reboots we need to add this mount to the fstab:

echo "/swapfile1 swap swap defaults 0 0" | sudo tee -a /etc/fstab
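When these steps are run from a provisioning tool, two details tend to matter: working out the dd count for a target size, and not appending the same fstab line on every run. The sketch below covers both. The function names are hypothetical, the size helper assumes bs=1M as in the steps above, and the default entry is the one shown in this section.

```shell
#!/bin/sh
# Hypothetical helper: dd "count" for a swap size given in GiB, assuming bs=1M.
swap_dd_count() {
    gib="$1"
    echo $(( gib * 1024 ))
}

# Hypothetical helper: append the swap entry to fstab only if it is missing.
# Usage: ensure_fstab_entry [FSTAB_FILE] [ENTRY]
ensure_fstab_entry() {
    fstab="${1:-/etc/fstab}"
    entry="${2:-/swapfile1 swap swap defaults 0 0}"
    # -x matches the whole line and -F treats the entry as a fixed string.
    grep -qxF -- "$entry" "$fstab" || printf '%s\n' "$entry" >> "$fstab"
}
```

For a 4 GiB swap file, `swap_dd_count 4` prints 4096, which is the value to use for count. Running `ensure_fstab_entry` as root more than once leaves a single entry in /etc/fstab.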

Additional Notes

Amazon Web Services - Architecting for the Cloud: Best Practices

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html

http://unix.stackexchange.com/questions/128642/debug-out-of-memory-with-var-log-messages

http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
