Linux kernel performance and reliability monitoring and tuning
Summary
I created this section as a compendium of my collected knowledge of Linux performance monitoring and tuning. I am not an expert, but I decided to collect these notes together with original references or citations, when possible, for non-original work. I started this in my private wikis because I grew tired of searching through articles every time a new subject came up; there are hundreds of thousands of articles out there on topics related to Linux performance. I then decided to make my notes public to benefit others, so this collection is an outgrowth of that and will take time to migrate over from my private archives. In some cases the original attribution is now lost. However, if you see something referenced here that you would like to take credit for, please contact me by email.
See also Web or server connection troubleshooting and Network troubleshooting.
Commands to monitor resources overall
- top
- dstat - http://dag.wiee.rs/home-made/dstat/
- nmon - http://nmon.sourceforge.net/
- htop - http://hisham.hm/htop/
- Collectl - http://collectl.sourceforge.net/
- Glances - https://nicolargo.github.io/glances/
- saidar - See also http://www.binarytides.com/saidar-linux-system-monitor/
- atop - http://www.atoptool.nl/
- iftop - http://www.ex-parrot.com/pdw/iftop/
CPU monitoring tools
- mpstat (try command: mpstat -P ALL)
- sar -u
- ps command
- iostat
- vmstat
- directory /sys/devices/system/cpu or file /proc/cpuinfo to identify cpus present - see https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-devices-system-cpu
$ sar -u 12 5
Report CPU utilization. The following values are displayed:
- %user: Percentage of CPU utilization that occurred while executing at the user level (application).
- %nice: Percentage of CPU utilization that occurred while executing at the user level with nice priority.
- %system: Percentage of CPU utilization that occurred while executing at the system level (kernel).
- %iowait: Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
- %idle: Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
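As a rough sketch of where percentages like these come from, the following Python takes two samples of the aggregate "cpu" line from /proc/stat and turns the differences into percentages. The function name and the two sample lines are made up for illustration, and sar itself groups some of the columns differently, so treat the result as an approximation of its output rather than a reimplementation.

```python
def cpu_percentages(before, after):
    """Return percentages per CPU state from two aggregate 'cpu' lines of /proc/stat."""
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    old = [int(v) for v in before.split()[1:9]]
    new = [int(v) for v in after.split()[1:9]]
    # Counters only ever grow, so the difference is the time spent in each
    # state during the sampling interval (in USER_HZ ticks).
    deltas = [n - o for n, o in zip(new, old)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {name: 100.0 * d / total for name, d in zip(names, deltas)}


# Two made-up samples (hypothetical numbers, not taken from a real system).
SAMPLE_BEFORE = "cpu 1000 0 500 8000 100 0 0 0 0"
SAMPLE_AFTER = "cpu 1250 0 600 8600 150 0 0 0 0"

pct = cpu_percentages(SAMPLE_BEFORE, SAMPLE_AFTER)
print({name: round(value, 1) for name, value in pct.items()})
```

On a live system you would read the first line of /proc/stat twice, a few seconds apart, and pass both lines in.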
Finally, you need to determine which process is monopolizing the CPUs. The following command displays the top 10 CPU users on the Linux system. Note the -n flag, so that sort compares %CPU numerically rather than as text.
$ sudo ps -eo pcpu,pid,user,args | sort -k 1 -rn | head -10
OR
$ sudo ps -eo pcpu,pid,user,args | sort -rn -k1 | less
%CPU   PID USER     COMMAND
  96  2148 vivek    /usr/lib/vmware/bin/vmware-vmx -C /var/lib/vmware/Virtual Machines/Ubuntu 64-bit/Ubuntu 64-bit.vmx -@ ""
 0.7  3358 mysql    /usr/libexec/mysqld --defaults-file=/etc/my.cnf --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-locking --socket=/var/lib/mysql/mysql.sock
 0.4 29129 lighttpd /usr/bin/php
 0.4 29128 lighttpd /usr/bin/php
 0.4 29127 lighttpd /usr/bin/php
 0.4 29126 lighttpd /usr/bin/php
 0.2  2177 vivek    [vmware-rtc]
 0.0     9 root     [kacpid]
 0.0     8 root     [khelper]
To see who’s using CPU:
$ sudo ps -e -o pcpu,cpu,nice,state,cputime,args --sort=-pcpu | head -n 10
%CPU CPU  NI S     TIME COMMAND
 100   -   0 R 00:01:31 dd if=/dev/zero of=/dev/null
 0.0   -   0 S 00:00:01 /sbin/init
 0.0   -   0 S 00:00:00 [kthreadd]
 0.0   -   - S 00:00:00 [migration/0]
 0.0   -   0 S 00:00:00 [ksoftirqd/0]
 0.0   -   - S 00:00:00 [stopper/0]
 0.0   -   - S 00:00:00 [watchdog/0]
 0.0   -   - S 00:00:00 [migration/1]
 0.0   -   - S 00:00:00 [stopper/1]
To look at cpu overall:
$ vmstat 20 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 7173628  98296 436428    0    0     2     1   26    8  0  1 99  0  0
 1  0      0 7173612  98296 436428    0    0     0     0 1010    9 13 37 50  0  0
 1  0      0 7173116  98312 436432    0    0     0     3 1016   13 12 38 50  0  0
Field Description For Vm Mode
(a) procs: the process-related fields are:
- r: The number of processes waiting for run time.
- b: The number of processes in uninterruptible sleep.
(b) memory: the memory-related fields are:
- swpd: the amount of virtual memory used.
- free: the amount of idle memory.
- buff: the amount of memory used as buffers.
- cache: the amount of memory used as cache.
(c) swap: the swap-related fields are:
- si: Amount of memory swapped in from disk (/s).
- so: Amount of memory swapped to disk (/s).
(d) io: the I/O-related fields are:
- bi: Blocks received from a block device (blocks/s).
- bo: Blocks sent to a block device (blocks/s).
(e) system: the system-related fields are:
- in: The number of interrupts per second, including the clock.
- cs: The number of context switches per second.
(f) cpu: the CPU-related fields are percentages of total CPU time:
- us: Time spent running non-kernel code. (user time, including nice time)
- sy: Time spent running kernel code. (system time)
- id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
- wa: Time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.
- st: Time stolen from a virtual machine.
To introduce cpu load
Perhaps you want to introduce a cpu load for tuning or analysis.
You can do this with this command:
$ dd if=/dev/zero of=/dev/null
This will run forever until you enter Control-C.
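If you would rather have a load that stops by itself, a small busy loop does the job. This is only a sketch: the function name burn_cpu is made up, and a single Python process like this keeps roughly one CPU busy, much like the dd command above.

```python
import time


def burn_cpu(seconds):
    """Spin in a tight loop for roughly `seconds`; return the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    # Nothing in the loop sleeps or waits on I/O, so the process stays
    # runnable and consumes CPU until the deadline passes.
    while time.monotonic() < deadline:
        iterations += 1
    return iterations
```

Calling burn_cpu(60) from a script loads one CPU for about a minute. To load several CPUs, start one such process per CPU.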
A note on threading
Non-threaded applications tend to consume mostly one cpu where a well balanced threaded application should divide its time across all available cpus.
For stats on how well you are threading across multiple cpus look at file /proc/stat.
Here is a very non-balanced application:
$ cat /proc/stat
cpu  101506 126 285419 21111496 1760 19 11 1297 0
cpu0 5408 51 3912 10740445 410 18 6 596 0
cpu1 96097 75 281507 10371050 1349 0 5 700 0
...
The very first "cpu" line aggregates the numbers in all of the other "cpuN" lines.
These numbers identify the amount of time the CPU has spent performing different kinds of work. Time units are in USER_HZ or Jiffies (typically hundredths of a second).
The meanings of the columns are as follows, from left to right:
- user: normal processes executing in user mode
- nice: niced processes executing in user mode
- system: processes executing in kernel mode
- idle: twiddling thumbs
- iowait: waiting for I/O to complete
- irq: servicing interrupts
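- softirq: servicing softirqs
- steal: time spent in other operating systems when running in a virtualized environment
- guest: time spent running a virtual CPU for guest operating systems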
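To put a number on how unbalanced the CPUs are, the columns can be summed per line. The sketch below is a minimal example, assuming the column order listed above; the function name busy_share is made up, and the sample text is the /proc/stat output shown earlier in this section. The figures are cumulative since boot, so they describe the long-run balance rather than the current moment.

```python
def busy_share(stat_text):
    """Map each cpu line of /proc/stat to its busy fraction since boot."""
    shares = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue
        # Columns: user, nice, system, idle, iowait, irq, softirq, steal.
        # guest is left out because the kernel already counts it in user.
        ticks = [int(v) for v in fields[1:9]]
        total = sum(ticks)
        # Treat both idle and iowait as "not busy".
        not_busy = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        shares[fields[0]] = (total - not_busy) / total if total else 0.0
    return shares


SAMPLE = """\
cpu 101506 126 285419 21111496 1760 19 11 1297 0
cpu0 5408 51 3912 10740445 410 18 6 596 0
cpu1 96097 75 281507 10371050 1349 0 5 700 0
"""

shares = busy_share(SAMPLE)
for name, share in shares.items():
    print(f"{name}: {share:.2%} busy")
```

With the sample above, cpu1 has done far more work than cpu0, which is the imbalance the text describes.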
mpstat shows one processor favored on this instance:
$ mpstat -P ALL 20 3
Linux 2.6.32-642.3.1.el6.x86_64 (typhon) 07/27/2016 _x86_64_ (2 CPU)
...
Average: CPU  %usr %nice  %sys %iowait %irq %soft %steal %guest  %idle
Average: all 12.47  0.00 37.54    0.00 0.00  0.00   0.00   0.00  49.99
Average:   0  0.00  0.00  0.02    0.00 0.00  0.00   0.00   0.00  99.98
Average:   1 24.95  0.00 75.05    0.00 0.00  0.00   0.00   0.00   0.00
You can show threads by using lsof command or ps. Here you can see the process that is consuming cpu all on one cpu:
$ ps -eLf
UID       PID  PPID   LWP  C NLWP STIME TTY      TIME     CMD
root        1     0     1  0    1 Jul26 ?        00:00:01 /sbin/init
root        2     0     2  0    1 Jul26 ?        00:00:00 [kthreadd]
root        3     2     3  0    1 Jul26 ?        00:00:00 [migration/0]
root        4     2     4  0    1 Jul26 ?        00:00:00 [ksoftirqd/0]
root        5     2     5  0    1 Jul26 ?        00:00:00 [stopper/0]
...
root    29210  1457 29210  0    1 14:30 ?        00:00:00 sshd: sysoper [priv]
sysoper 29212 29210 29212  0    1 14:30 ?        00:00:00 sshd: sysoper@pts/1
sysoper 29213 29212 29213  0    1 14:30 pts/1    00:00:00 -bash
sysoper 29236 29213 29236 99    1 14:31 pts/1    01:21:17 dd if=/dev/zero of=/dev/null
postfix 29569  1547 29569  0    1 15:38 ?        00:00:00 pickup -l -t fifo -u
NLWP is the "Number of Light Weight Processes", that is, the number of threads (LWPs) in the process. LWP (also shown as SPID or TID) is the ID of the light weight process, or thread, being reported; the different names for the same ID are largely a legacy of how POSIX threads were implemented. Since Linux 2.4.19 (or so) threads can share the PID of the parent process and have a separate thread ID (TID). Most processes have just the one thread, so their TID is the same as their PID. C represents processor utilization; currently, this is the integer value of the percent usage over the lifetime of the process. In the above case we can see that the dd process has a single thread (NLWP is 1) with C at 99.
For more info see http://yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html for a tutorial on POSIX threads usage under Linux. See also https://www.akkadia.org/drepper/nptl-design.pdf
ps command options:
- H Show threads as if they were processes (BSD-style option with no dash; -H with a dash shows the process hierarchy instead)
- -L Show threads, possibly with LWP and NLWP columns
- -T Show threads, possibly with SPID column
- -m Show threads after processes
You can list threads for a given process several ways:
- ps --pid <pid> -Lf for one process, ps -eLf for all processes, or ps -T -p <pid> to show the threads with an SPID column
- top -H -p <pid>
- Each thread in a process creates a directory under /proc/<pid>/task. Count the number of directories, and you have the number of threads.
- File /proc/<pid>/status
Examples:
$ sudo ps --pid 2702 -Lf
UID     PID  PPID  LWP  C NLWP STIME TTY  TIME     CMD
tomcat 2702     1 2702  0  209 04:39 ?    00:00:00 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat 2702     1 2704  0  209 04:39 ?    00:00:03 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat 2702     1 2705  0  209 04:39 ?    00:00:21 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat 2702     1 2706  0  209 04:39 ?    00:00:22 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat 2702     1 2707  0  209 04:39 ?    00:00:39 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat 2702     1 2708  0  209 04:39 ?    00:00:00 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat 2702     1 2709  0  209 04:39 ?    00:00:01 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat 2702     1 2710  0  209 04:39 ?    00:00:00 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat 2702     1 2711  0  209 04:39 ?    00:03:03 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat 2702     1 2712  0  209 04:39 ?    00:03:03 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
...
$ sudo ps -T -p 2702
 PID SPID TTY TIME     CMD
2702 2702 ?   00:00:00 java
2702 2704 ?   00:00:03 java
2702 2705 ?   00:00:21 java
2702 2706 ?   00:00:22 java
2702 2707 ?   00:00:39 java
2702 2708 ?   00:00:00 java
2702 2709 ?   00:00:01 java
2702 2710 ?   00:00:00 java
$ top -H -p 2702
top - 18:30:32 up 105 days, 8:33, 1 user, load average: 0.02, 0.05, 0.05
Tasks: 209 total, 0 running, 209 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.0%us, 0.1%sy, 0.0%ni, 98.8%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4049992k total, 3621592k used, 428400k free, 186140k buffers
Swap: 8388604k total, 14848k used, 8373756k free, 968856k cached

 PID USER   PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
2916 tomcat 20  0 5286m 2.3g 27m S  2.0 58.7 1:46.19 java
2920 tomcat 20  0 5286m 2.3g 27m S  2.0 58.7 0:00.96 java
2702 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:00.01 java
2704 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:03.04 java
2705 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:21.74 java
2706 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:22.02 java
2707 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:39.16 java
2708 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:00.51 java
2709 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:01.10 java
2710 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:00.00 java
2711 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 3:03.78 java
2712 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 3:04.78 java
2713 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:00.00 java
2714 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:00.00 java
2715 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:27.43 java
2716 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:00.00 java
2717 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:00.00 java
2718 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:00.00 java
2721 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:00.02 java
2727 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:00.11 java
2728 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:00.00 java
2731 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:00.04 java
2732 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:10.43 java
2734 tomcat 20  0 5286m 2.3g 27m S  0.0 58.7 0:00.42 java
$ sudo ls -l /proc/2702/task
total 0
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 11238
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 11998
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 12400
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 12962
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 12964
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 12965
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 13097
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13385
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13388
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13389
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13390
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13391
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13392
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 14928
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 14963
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 15629
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 15720
...
$ cat /proc/2702/status | grep Thread
Threads: 207
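The last two approaches are easy to script. The following is a small sketch, with made-up function names, that reads the thread count either from the text of /proc/<pid>/status or by counting the entries under /proc/<pid>/task. The proc_root parameter is my own addition so the function can be pointed at a copy of the directory tree.

```python
import os


def threads_from_status(status_text):
    """Return the Threads: value from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    return None


def threads_from_task_dir(pid, proc_root="/proc"):
    """Count the per-thread directories under <proc_root>/<pid>/task."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "task")))
```

For a live process you would call threads_from_status(open("/proc/2702/status").read()). The two counts can differ slightly on a busy process because threads come and go between the two reads.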
Run queues
Use sar -q. The runq-sz column is the run queue length (the number of tasks waiting for run time), and the plist-sz column is the number of tasks in the task list.
$ sar -q 3 5
Linux 4.1.13-18.26.amzn1.x86_64 (ip-10-32-8-250) 07/27/2016 _x86_64_ (2 CPU)

06:32:39 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
06:32:42 PM       0      292    0.15    0.07     0.06
06:32:45 PM       0      293    0.14    0.06     0.06
06:32:48 PM       0      293    0.14    0.06     0.06
06:32:51 PM       0      293    0.13    0.06     0.05
06:32:54 PM       0      293    0.13    0.06     0.05
Average:          0      293    0.14    0.06     0.06
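The same kind of figures are available without sar from the file /proc/loadavg. Here is a minimal sketch of parsing it; the function name and the sample line are made up, and the dictionary keys simply borrow the sar column names. Note that the "runnable" count in this file includes the task that is currently running, so it will not always match runq-sz exactly.

```python
def parse_loadavg(text):
    """Parse /proc/loadavg text into load averages and task counts."""
    one, five, fifteen, tasks, _last_pid = text.split()
    # The fourth field looks like "1/292": runnable tasks / total tasks.
    runnable, total = tasks.split("/")
    return {
        "ldavg-1": float(one),
        "ldavg-5": float(five),
        "ldavg-15": float(fifteen),
        "runnable": int(runnable),
        "plist-sz": int(total),
    }


# Hypothetical contents of /proc/loadavg.
SAMPLE = "0.15 0.07 0.06 1/292 12345"

info = parse_loadavg(SAMPLE)
print(info)
```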
Memory and swap
Common memory and swap commands
- free
- cat /proc/meminfo
- vmstat
- top
- htop
- pmap
- dmidecode -t 17 - Shows installed ram
- File /proc/<pid>/status virtual memory statistics for a process
Linux memory management
Linux memory management dated but still largely relevant. http://www.linuxhowtos.org/System/Linux%20Memory%20Management.htm.
It is important to note that Linux will always try to cache the most recently used (MRU) files if memory is available. Linux in order to help performance will always try to fill memory up to a certain limit with caching.
For a look at the kernel's slab caches (dentries, inodes, and other kernel objects, some of which hold cached filesystem metadata) look at file /proc/slabinfo. For more info on slabinfo use command "$ man slabinfo".
Just looking at the output of the free or top commands alone or looking at how much memory used is not enough to paint a clear picture of memory allocation.
For example showing active and inactive memory:
$ vmstat -a
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free  inact active   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 7173612 181684 521708    0    0     2     1   21    8  0  1 99  0  0
Active memory is memory that has been used recently and is usually not reclaimed unless absolutely necessary. Inactive memory has not been used recently and is the first candidate to be reclaimed for other purposes. For an even more detailed distribution of memory look at special file /proc/meminfo:
$ cat /proc/meminfo
MemTotal:        8057792 kB
MemFree:         7173628 kB
Buffers:           98160 kB
Cached:           436424 kB
SwapCached:            0 kB
Active:           521712 kB
Inactive:         181732 kB
Active(anon):     169004 kB
Inactive(anon):       12 kB
Active(file):     352708 kB
Inactive(file):   181720 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        168856 kB
Mapped:            14644 kB
Shmem:               160 kB
Slab:             102796 kB
SReclaimable:      80104 kB
SUnreclaim:        22692 kB
KernelStack:        2000 kB
PageTables:         4440 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4028896 kB
Committed_AS:     190744 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       27420 kB
VmallocChunk:   34359705844 kB
HardwareCorrupted:     0 kB
AnonHugePages:    135168 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        6144 kB
DirectMap2M:     8382464 kB
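Because Linux fills otherwise idle memory with cache, MemFree alone understates how much memory applications could still get. The sketch below parses /proc/meminfo and adds buffers and page cache back to the free figure. The function names are made up, the sample is the first four lines of the output above, and the sum is only a rough estimate since not all cache can actually be dropped.

```python
def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields; sizes are in kB as reported."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info


def reclaimable_estimate_kb(info):
    """Rough estimate: free memory plus buffers and page cache."""
    return info["MemFree"] + info["Buffers"] + info["Cached"]


SAMPLE = """\
MemTotal:        8057792 kB
MemFree:         7173628 kB
Buffers:           98160 kB
Cached:           436424 kB
"""

info = parse_meminfo(SAMPLE)
print(reclaimable_estimate_kb(info), "kB free or cheaply reclaimable (rough estimate)")
```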
For a better picture of per-process memory you need the ps command or other such tools (pmap) that look at process allocations. For Java you also need to look at the maximum heap size (the -Xmx option on the JVM). Also, if you are swapping heavily you very likely need more memory.
The ps command can output various pieces of information about a process, such as its process id, current running state, and resource utilization. Two of the possible outputs are VSZ and RSS, which stand for "virtual memory size" and "resident set size".
$ sudo ps ux -q 1457
USER  PID %CPU %MEM   VSZ  RSS TTY STAT START TIME COMMAND
root 1457  0.0  0.0 66240 1200 ?   Ss   09:41 0:00 /usr/sbin/sshd
In the above example process PID 1457 has a virtual size of about 66 MB (ps reports in kilobytes) and a resident size of about 1.2 MB. However, ps is not reporting the real memory cost of the process. What it is really showing is how much real memory this process would take up if it were the only process running. A Linux machine has several dozen processes running at any given time, which means that the VSZ and RSS numbers reported by ps almost certainly overstate what each process costs.
To understand why, it is necessary to learn how Linux handles shared libraries. Shared libraries, especially commonly referenced ones like libc, are used by many of the programs running on a Linux system. Because of this sharing, Linux is able to load a single copy of a shared library into memory and use that one copy for every program that references it. Most tools don't care very much about sharing; they simply report how much memory a process uses, regardless of whether that memory is shared with other processes as well. Two programs could therefore use a large shared library and yet have its size count towards both of their memory usage totals; the library is being double-counted, which can be very misleading if you don't know what is going on.
Unfortunately, a perfect representation of process memory usage isn't easy to obtain, but looking at a process's memory map helps. Let's see what the situation is with that "huge" SSH process. To see what PID 1457's memory looks like, we'll use the pmap program (with the -d flag):
$ sudo pmap -d 1457
1457: /usr/sbin/sshd
Address          Kbytes Mode  Offset           Device    Mapping
00007f1cb346e000     52 r-x-- 0000000000000000 0ca:00001 libnss_files-2.12.so
00007f1cb347b000   2044 ----- 000000000000d000 0ca:00001 libnss_files-2.12.so
00007f1cb367a000      4 r---- 000000000000c000 0ca:00001 libnss_files-2.12.so
00007f1cb367b000      4 rw--- 000000000000d000 0ca:00001 libnss_files-2.12.so
00007f1cb367c000     28 r-x-- 0000000000000000 0ca:00001 librt-2.12.so
00007f1cb3683000   2044 ----- 0000000000007000 0ca:00001 librt-2.12.so
00007f1cb3882000      4 r---- 0000000000006000 0ca:00001 librt-2.12.so
00007f1cb3883000      4 rw--- 0000000000007000 0ca:00001 librt-2.12.so
00007f1cb3884000    228 r-x-- 0000000000000000 0ca:00001 libnspr4.so
00007f1cb38bd000   2048 ----- 0000000000039000 0ca:00001 libnspr4.so
00007f1cb3abd000      4 r---- 0000000000039000 0ca:00001 libnspr4.so
00007f1cb3abe000      8 rw--- 000000000003a000 0ca:00001 libnspr4.so
00007f1cb3ac0000      8 rw--- 0000000000000000 000:00000 [ anon ]
00007f1cb3ac2000     12 r-x-- 0000000000000000 0ca:00001 libplds4.so
00007f1cb3ac5000   2044 ----- 0000000000003000 0ca:00001 libplds4.so
00007f1cb3cc4000      4 r---- 0000000000002000 0ca:00001 libplds4.so
00007f1cb3cc5000      4 rw--- 0000000000003000 0ca:00001 libplds4.so
00007f1cb3cc6000     16 r-x-- 0000000000000000 0ca:00001 libplc4.so
00007f1cb3cca000   2044 ----- 0000000000004000 0ca:00001 libplc4.so
00007f1cb3ec9000      4 r---- 0000000000003000 0ca:00001 libplc4.so
00007f1cb3eca000      4 rw--- 0000000000004000 0ca:00001 libplc4.so
00007f1cb3ecb000    152 r-x-- 0000000000000000 0ca:00001 libnssutil3.so
00007f1cb3ef1000   2044 ----- 0000000000026000 0ca:00001 libnssutil3.so
00007f1cb40f0000     24 r---- 0000000000025000 0ca:00001 libnssutil3.so
00007f1cb40f6000      4 rw--- 000000000002b000 0ca:00001 libnssutil3.so
...
00007f1cb763d000      8 r---- 000000000001f000 0ca:00001 ld-2.12.so
00007f1cb763f000      4 rw--- 0000000000021000 0ca:00001 ld-2.12.so
00007f1cb7640000      4 rw--- 0000000000000000 000:00000 [ anon ]
00007f1cb7641000    544 r-x-- 0000000000000000 0ca:00001 sshd
00007f1cb78c8000     12 r---- 0000000000087000 0ca:00001 sshd
00007f1cb78cb000      4 rw--- 000000000008a000 0ca:00001 sshd
00007f1cb78cc000     36 rw--- 0000000000000000 000:00000 [ anon ]
00007f1cb8f60000    132 rw--- 0000000000000000 000:00000 [ anon ]
00007ffc200b3000     84 rw--- 0000000000000000 000:00000 [ stack ]
00007ffc2010f000      4 r-x-- 0000000000000000 000:00000 [ anon ]
ffffffffff600000      4 r-x-- 0000000000000000 000:00000 [ anon ]
mapped: 66240K    writeable/private: 816K    shared: 0K
I have trimmed a lot of the output; the rest is similar to what is shown. Even without the complete output, we can see some very interesting things. One important thing to note is that each shared library is listed several times: once for its code segment, once for its writable data segment, plus a read-only data mapping and an inaccessible gap between them. The code (also known as "text") segment can be shared between processes. The writable data segment cannot; it is private to each process, so every running instance of the program ends up with its own copy. The code segments have a mode of "r-x--", while the writable data is set to "rw---". The Kbytes, Mode, and Mapping columns are the only ones we will care about, as the rest are unimportant to our analysis.
If you go through the output, you will find that the lines with the largest Kbytes number are usually the code segments of the included shared libraries (the ones that start with "lib" are the shared libraries). What is great about that is that they are the ones that can be shared between processes. If you factor out all of the parts that are shared between processes, you end up with the "writeable/private" total, which is shown at the bottom of the output. This is what can be considered the incremental cost of this process, factoring out the shared libraries. Therefore, the cost to run this instance of SSH (assuming that all of the shared libraries were already loaded) is around 800 kilobytes. That is quite a different story from the 66 or 1.2 megabytes that ps reported.
Keep that in mind when sizing applications. The moral of this story is that process memory usage on Linux is a complex matter; you can't just run ps and know what is going on. This is especially true when you deal with programs that create a lot of identical child processes, like Apache. ps might report that each Apache process uses 10 megabytes of memory, when the reality might be that the marginal cost of each Apache process is 1 megabyte of memory. This information becomes critical when tuning Apache's MaxClients setting, which determines how many simultaneous requests your server can handle.
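The sizing arithmetic can be sketched in a few lines. The example below pulls the totals out of the summary line that pmap -d prints and divides a memory budget by the private cost per process. The function names and the 512 MB budget are assumptions made up for illustration; a real MaxClients figure should also leave room for the shared pages, the kernel, and everything else on the machine.

```python
import re


def pmap_summary_kb(pmap_output):
    """Extract the totals from the summary line of `pmap -d` output (in KB)."""
    match = re.search(
        r"mapped:\s*(\d+)K\s+writeable/private:\s*(\d+)K\s+shared:\s*(\d+)K",
        pmap_output,
    )
    if match is None:
        raise ValueError("no pmap -d summary line found")
    mapped, private, shared = (int(group) for group in match.groups())
    return {"mapped": mapped, "private": private, "shared": shared}


def max_workers(budget_kb, private_kb_per_worker):
    """How many workers fit if each one costs only its private memory."""
    return budget_kb // private_kb_per_worker


summary = pmap_summary_kb("mapped: 66240K    writeable/private: 816K    shared: 0K")
# Hypothetical budget: 512 MB set aside for worker processes.
print(max_workers(512 * 1024, summary["private"]))
```

Sizing from the 66 MB virtual size instead of the private figure would give a far smaller number of workers, which is the point the text makes.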
Swap
Linux divides its physical RAM (random access memory) into chunks of memory called pages. Swapping is the process whereby a page of memory is copied to the preconfigured space on the hard disk, called swap space, to free up that page of memory. The combined sizes of the physical memory and the swap space are the amount of virtual memory available.
Swapping is necessary for two important reasons. First, when the system requires more memory than is physically available, the kernel swaps out less used pages and gives memory to the current application (process) that needs the memory immediately. Second, a significant number of the pages used by an application during its startup phase may only be used for initialization and then never used again. The system can swap out those pages and free the memory for other applications or even for the disk cache.
However, swapping does have a downside. Compared to memory, disks are very slow. Memory speeds can be measured in nanoseconds, while disks are measured in milliseconds, so accessing the disk can be tens of thousands of times slower than accessing physical memory. The more swapping that occurs, the slower your system will be. Sometimes excessive swapping, or thrashing, occurs where a page is swapped out, then very soon swapped in, then swapped out again, and so on. In such situations the system is struggling to find free memory and keep applications running at the same time. In this case only adding more RAM will help.
Linux has two forms of swap space: the swap partition and the swap file. The swap partition is an independent section of the hard disk used solely for swapping; no other files can reside there. The swap file is a special file in the filesystem that resides amongst your system and data files.
To see what swap space you have, use the command swapon -s. The output will look something like this:
Filename   Type       Size    Used  Priority
/dev/sda5  partition  859436  0     -1
Tuning swap
It is possible to run a Linux system without swap space, and the system will run well if you have a large amount of memory. However, if you run out of physical memory, the kernel has nothing else it can do but invoke the OOM killer and start killing processes, which can look very much like a crash. It is therefore advisable to have swap space, especially since disk space is relatively cheap.
See also Why isn't swap in TGIE supplied AMIs?.
The Linux 2.6 kernel added a new kernel parameter called swappiness to let administrators tweak the way Linux swaps. It is a number from 0 to 100. In essence, higher values lead to more pages being swapped, and lower values lead to more applications being kept in memory, even if they are idle.
The default value for swappiness is 60. You can alter it temporarily (until you next reboot) by typing as root:
echo 50 > /proc/sys/vm/swappiness
If you want to alter it permanently then you need to change the vm.swappiness parameter in the /etc/sysctl.conf file.
Measuring swap activity
See also http://serverfault.com/questions/270283/what-does-the-fields-in-sar-b-output-mean
Paging is not the same as swapping. You might have paging activity when executables read portions of their binary code off disk or when working with memory-mapped files. Essentially, paging is loading code and data into memory on demand as part of the RSS. Swapping, in the traditional sense, is moving your process out of memory entirely and into swap space. Swapping is a hugely expensive operation compared to demand loading (paging).
A fault is loading a portion of memory from disk (as part of the VSZ) into RSS and making it resident in memory for use.
For the above reasons I do not rely on the sar -W command. To get a better idea of swap activity, take a look at the si / so counters of vmstat.
$ sudo vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
 0  0      0 7350088  50412 365992    0    0     7     2   23   17  0  0 100  0  0
 0  0      0 7350088  50412 365992    0    0     0     0   17   10  0  0 100  0  0
 0  0      0 7350088  50420 365992    0    0     0     3   18   12  0  0 100  0  0
 0  0      0 7350088  50420 365992    0    0     0     0   13    9  0  0 100  0  0
 0  0      0 7350088  50420 365992    0    0     0     0   21   11  0  0 100  0  0
 0  0      0 7350088  50428 365992    0    0     0     2   15   11  0  0 100  0  0
 0  0      0 7350088  50428 365992    0    0     0     2   21   11  0  0 100  0  0
 0  0      0 7350088  50428 365992    0    0     0     0   15   10  0  0 100  0  0
 0  0      0 7350088  50428 365992    0    0     0     0   15    9  0  0 100  0  0
 0  0      0 7350088  50428 365992    0    0     0     0   18   12  0  0 100  0  0
The above system is not swapping as si and so are zero.
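The kernel also exposes cumulative swap counters as pswpin and pswpout (in pages) in the file /proc/vmstat. As a sketch, two samples of that file can be turned into swap-in and swap-out rates as follows. The function names are made up, and note that the result is in pages per second, which is not the same unit that vmstat prints for si and so.

```python
def swap_counters(vmstat_text):
    """Return (pswpin, pswpout) page counters from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_rates(before_text, after_text, interval_s):
    """Pages swapped in and out per second between two samples."""
    in0, out0 = swap_counters(before_text)
    in1, out1 = swap_counters(after_text)
    return (in1 - in0) / interval_s, (out1 - out0) / interval_s
```

Both rates staying at zero across samples tells the same story as the zero si and so columns above.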
Java processes and faulting
$ sudo sar -B 3 5
Linux 2.6.32-642.3.1.el6.x86_64 (typhon) 07/26/2016 _x86_64_ (2 CPU)

05:40:16 PM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
05:40:19 PM     0.00      6.67   12.33     0.00    22.67      0.00      0.00      0.00   0.00
05:40:22 PM     0.00      0.00   14.00     0.00    25.00      0.00      0.00      0.00   0.00
05:40:25 PM     0.00      0.00   16.33     0.00    28.67      0.00      0.00      0.00   0.00
05:40:28 PM     0.00      0.00   14.33     0.00    26.33      0.00      0.00      0.00   0.00
05:40:31 PM     0.00      0.00   11.00     0.00    22.67      0.00      0.00      0.00   0.00
Average:        0.00      1.33   13.60     0.00    25.07      0.00      0.00      0.00   0.00
What is the difference between a "fault", sometimes known as a "soft" or "minor" fault, and a "major fault" (aka "hard fault")? A soft fault happens when the process needs a page that is already in memory but is not currently mapped into the process, for example because it was unmapped by the page replacement process. A major or "hard" fault happens when the page needs to be brought into memory from disk. Major faults are, of course, much more expensive and take much longer to complete than soft ones. A large number of major page faults can slow the system down to a crawl. On a system that is short of memory, servicing major page faults can account for much of the time spent in kernel mode.
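Both kinds of fault are counted per process in the file /proc/<pid>/stat, where minflt is field 10 and majflt is field 12. The following is a small sketch of reading them; the function name is made up and the sample line is a shortened, hypothetical stat line.

```python
def fault_counts(stat_text):
    """Return (minflt, majflt) from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the last ')' before splitting on spaces.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 10 is rest[7] and field 12 is rest[9].
    return int(rest[7]), int(rest[9])


# Hypothetical, shortened stat line for a Java process.
SAMPLE = "3275 (java) S 3239 3275 3275 0 -1 4202496 443437 0 0 0 100 50"

minor, major = fault_counts(SAMPLE)
print(f"minor faults: {minor}, major faults: {major}")
```

Sampling the file twice and subtracting gives the fault rate for that one process.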
Also look at major faults on the Java process with ps:
$ sudo ps -o pid,ppid,flags,rss,resident,size,min_flt,maj_flt,share,vsize 3275
 PID PPID F     RSS RES      SZ  MINFL MAJFL -     VSZ
3275 3239 0 1555480   - 2871068 443437     0 - 2972676
"MAJFL" should ideally stay at or near zero once the process has warmed up. VSZ (virtual size) is the virtual size of the whole process, both in core and out; for a JVM it should be roughly the maximum heap size plus some overhead. RSS (resident set size) is how much of that virtual memory is resident in core at this moment; this number will fluctuate as garbage collection (GC) runs. SZ (size) is only a rough figure: ps documents it as the approximate amount of swap space that would be required if the process dirtied all of its writable pages and was then swapped out. VSZ counts every page mapped by the process, whether it is resident in physical memory or not (including pages swapped to disk).
OOM killer
OOM or Out of memory errors and adjustment.
https://linux-mm.org/OOM_Killer - "It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. Any particular process leader may be immunized against the oom killer if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as -17)."
See also https://www.kernel.org/doc/gorman/html/understand/understand016.html
An example of an OOM in the systems log (/var/log/messages usually):
Mar 13 15:40:33 web1 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Mar 13 15:40:33 web1 kernel: mysqld cpuset=/ mems_allowed=0
Mar 13 15:40:33 web1 kernel: Pid: 18355, comm: mysqld Not tainted 3.2.1-linode40 #1
Mar 13 15:40:33 web1 kernel: Call Trace:
Mar 13 15:40:33 web1 kernel: [<c01958fd>] ? T.662+0x7d/0x1b0
Mar 13 15:40:33 web1 kernel: [<c01067fb>] ? xen_restore_fl_direct_reloc+0x4/0x4
Mar 13 15:40:33 web1 kernel: [<c06e1431>] ? _raw_spin_unlock_irqrestore+0x11/0x20
Mar 13 15:40:33 web1 kernel: [<c04707a7>] ? ___ratelimit+0x97/0x110
Mar 13 15:40:33 web1 kernel: [<c0158fb1>] ? get_task_cred+0x11/0x50
Mar 13 15:40:33 web1 kernel: [<c0195a8e>] ? T.661+0x5e/0x150
The kernel documentation for "oom_adj" (filesystems/proc.txt) says:
2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score
------------------------------------------------------
This file can be used to adjust the score used to select which processes should be killed in an out-of-memory situation. Giving it a high score will increase the likelihood of this process being killed by the oom-killer. Valid values are in the range -16 to +15, plus the special value -17, which disables oom-killing altogether for this process.
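To find OOM events in a log without reading it by eye, a short scan for the "invoked oom-killer" line is usually enough. The sketch below assumes the log line format shown in the example above; the function name and regular expression are my own, and other kernel versions may format the line differently. Note that the process named on this line is the one whose allocation triggered the OOM killer, which is not necessarily the one that gets killed.

```python
import re

# Matches lines such as:
#   ... kernel: mysqld invoked oom-killer: gfp_mask=..., oom_score_adj=0
OOM_LINE = re.compile(
    r"kernel: (?P<comm>\S+) invoked oom-killer:.*?oom_score_adj=(?P<adj>-?\d+)"
)


def oom_events(log_text):
    """Yield (process name, oom_score_adj) for each oom-killer invocation."""
    for line in log_text.splitlines():
        match = OOM_LINE.search(line)
        if match:
            yield match.group("comm"), int(match.group("adj"))


SAMPLE = (
    "Mar 13 15:40:33 web1 kernel: mysqld invoked oom-killer: "
    "gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0\n"
    "Mar 13 15:40:33 web1 kernel: mysqld cpuset=/ mems_allowed=0\n"
)

events = list(oom_events(SAMPLE))
print(events)
```

On a real system you would pass in the contents of /var/log/messages (or wherever your distribution writes kernel messages).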
Network
nload - http://www.roland-riegel.de/nload/ and http://www.cyberciti.biz/networking/nload-linux-command-to-monitor-network-traffic-bandwidth-usage/
nmon - http://nmon.sourceforge.net/pmwiki.php
mtr - http://www.bitwizard.nl/mtr/
Socket Stats - ss command. http://www.cyberciti.biz/tips/linux-investigate-sockets-network-connections.html
Good articles
http://www.brendangregg.com/blog/2015-02-27/linux-profiling-at-netflix.html