Linux CPU Monitoring and Tuning
CPU monitoring tools
- mpstat (try command: mpstat -P ALL)
- sar -u
- ps command
- iostat
- vmstat
- the directory /sys/devices/system/cpu or the file /proc/cpuinfo to identify the CPUs present - see https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-devices-system-cpu
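For a quick count of the logical CPUs present, any of the following should work (a minimal sketch; nproc is part of coreutils and lscpu is part of util-linux, so they may not be installed everywhere):
$ grep -c ^processor /proc/cpuinfo
$ ls -d /sys/devices/system/cpu/cpu[0-9]*
$ nproc
$ lscpu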
$ sar -u 12 5
Report CPU utilization (here sampled every 12 seconds, 5 times). The following values are displayed:
- %user: Percentage of CPU utilization that occurred while executing at the user level (application).
- %nice: Percentage of CPU utilization that occurred while executing at the user level with nice priority.
- %system: Percentage of CPU utilization that occurred while executing at the system level (kernel).
- %iowait: Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
- %idle: Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
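If you want to keep the samples for later analysis, sar can also write them to a binary file and read them back; a minimal sketch (the path /tmp/cpu.sa is an arbitrary choice):
$ sar -u -o /tmp/cpu.sa 12 5            # sample and save to a binary data file
$ sar -u -f /tmp/cpu.sa                 # replay the saved samples
$ sar -u -f /tmp/cpu.sa | grep Average: # just the averages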
Finally, you need to determine which process is monopolizing or eating the CPUs. The following command displays the top 10 CPU users on the Linux system.
$ sudo ps -eo pcpu,pid,user,args | sort -k 1 -nr | head -10
OR
$ sudo ps -eo pcpu,pid,user,args | sort -nr -k1 | less
%CPU   PID USER     COMMAND
96    2148 vivek    /usr/lib/vmware/bin/vmware-vmx -C /var/lib/vmware/Virtual Machines/Ubuntu 64-bit/Ubuntu 64-bit.vmx -@ ""
0.7   3358 mysql    /usr/libexec/mysqld --defaults-file=/etc/my.cnf --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-locking --socket=/var/lib/mysql/mysql.sock
0.4  29129 lighttpd /usr/bin/php
0.4  29128 lighttpd /usr/bin/php
0.4  29127 lighttpd /usr/bin/php
0.4  29126 lighttpd /usr/bin/php
0.2   2177 vivek    [vmware-rtc]
0.0      9 root     [kacpid]
0.0      8 root     [khelper]
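top can produce a similar snapshot non-interactively; a sketch, assuming a procps-ng top new enough to support -o for the sort field:
$ top -b -n 1 -o %CPU | head -20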
To see who’s using CPU:
$ sudo ps -e -o pcpu,cpu,nice,state,cputime,args --sort=-pcpu | head -n 10
%CPU CPU  NI S     TIME COMMAND
 100   -   0 R 00:01:31 dd if=/dev/zero of=/dev/null
 0.0   -   0 S 00:00:01 /sbin/init
 0.0   -   0 S 00:00:00 [kthreadd]
 0.0   -   - S 00:00:00 [migration/0]
 0.0   -   0 S 00:00:00 [ksoftirqd/0]
 0.0   -   - S 00:00:00 [stopper/0]
 0.0   -   - S 00:00:00 [watchdog/0]
 0.0   -   - S 00:00:00 [migration/1]
 0.0   -   - S 00:00:00 [stopper/1]
To look at cpu overall:
$ vmstat 20 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 7173628  98296 436428    0    0     2     1   26    8  0  1 99  0  0
 1  0      0 7173612  98296 436428    0    0     0     0 1010    9 13 37 50  0  0
 1  0      0 7173116  98312 436432    0    0     0     3 1016   13 12 38 50  0  0
Field descriptions for VM mode
(a) procs: the process-related fields are:
- r: The number of processes waiting for run time.
- b: The number of processes in uninterruptible sleep.
(b) memory: the memory-related fields are:
- swpd: the amount of virtual memory used.
- free: the amount of idle memory.
- buff: the amount of memory used as buffers.
- cache: the amount of memory used as cache.
(c) swap: the swap-related fields are:
- si: Amount of memory swapped in from disk (/s).
- so: Amount of memory swapped to disk (/s).
(d) io: the I/O-related fields are:
- bi: Blocks received from a block device (blocks/s).
- bo: Blocks sent to a block device (blocks/s).
(e) system: the system-related fields are:
- in: The number of interrupts per second, including the clock.
- cs: The number of context switches per second.
(f) cpu: the CPU-related fields are:
These are percentages of total CPU time.
- us: Time spent running non-kernel code. (user time, including nice time)
- sy: Time spent running kernel code. (system time)
- id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
- wa: Time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.
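To watch just the CPU columns continuously, you can pull them out of vmstat with awk; a sketch that assumes the column layout shown above (us/sy/id/wa/st are fields 13-17 and the first two lines are headers):
$ vmstat 5 | awk 'NR>2 {printf "us=%s sy=%s id=%s wa=%s st=%s\n", $13, $14, $15, $16, $17}'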
Introducing CPU load
Perhaps you want to introduce a cpu load for tuning or analysis.
You can do this with this command:
$ dd if=/dev/zero of=/dev/null
This will run forever until you enter Control-C.
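To load every CPU rather than just one, you can start one dd per processor and kill them when finished; a minimal sketch using nproc and shell job control:
$ for i in $(seq $(nproc)); do dd if=/dev/zero of=/dev/null & done
$ jobs              # list the background load generators
$ kill $(jobs -p)   # stop them all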
A note on threading
Non-threaded applications tend to consume mostly one cpu, whereas a well-balanced threaded application should divide its time across all available cpus.
For stats on how well you are spreading work across multiple cpus, look at the file /proc/stat.
Here is a very unbalanced application:
$ cat /proc/stat
cpu  101506 126 285419 21111496 1760 19 11 1297 0
cpu0 5408 51 3912 10740445 410 18 6 596 0
cpu1 96097 75 281507 10371050 1349 0 5 700 0
...
The very first "cpu" line aggregates the numbers in all of the other "cpuN" lines.
These numbers identify the amount of time the CPU has spent performing different kinds of work. Time units are in USER_HZ or Jiffies (typically hundredths of a second).
The meanings of the columns are as follows, from left to right:
- user: normal processes executing in user mode
- nice: niced processes executing in user mode
- system: processes executing in kernel mode
- idle: twiddling thumbs
- iowait: waiting for I/O to complete
- irq: servicing interrupts
- softirq: servicing softirqs
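From two samples of /proc/stat you can compute a per-CPU busy percentage yourself (USER_HZ can be checked with getconf CLK_TCK, although it cancels out of the ratio). A rough bash sketch, assuming the column order listed above and counting everything except idle and iowait as busy:
#!/bin/bash
# Sample the per-CPU lines of /proc/stat twice, 5 seconds apart,
# and report how busy each CPU was over that interval.
# Requires bash 4+ for associative arrays.
declare -A tot1 idl1
while read -r cpu user nice system idle iowait irq softirq rest; do
    tot1[$cpu]=$((user + nice + system + idle + iowait + irq + softirq))
    idl1[$cpu]=$((idle + iowait))
done < <(grep '^cpu[0-9]' /proc/stat)

sleep 5

while read -r cpu user nice system idle iowait irq softirq rest; do
    tot2=$((user + nice + system + idle + iowait + irq + softirq))
    idl2=$((idle + iowait))
    dtot=$((tot2 - ${tot1[$cpu]}))
    didl=$((idl2 - ${idl1[$cpu]}))
    echo "$cpu busy: $(( 100 * (dtot - didl) / dtot ))%"
done < <(grep '^cpu[0-9]' /proc/stat)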
mpstat shows one processor favored on this instance:
$ mpstat -P ALL 20 3
Linux 2.6.32-642.3.1.el6.x86_64 (typhon)   07/27/2016   _x86_64_   (2 CPU)
...
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
Average:     all   12.47    0.00   37.54    0.00    0.00    0.00    0.00    0.00   49.99
Average:       0    0.00    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.98
Average:       1   24.95    0.00   75.05    0.00    0.00    0.00    0.00    0.00    0.00
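If you want to reproduce this kind of imbalance deliberately, you can pin the load generator to a single CPU with taskset (a sketch; CPU 0 and <pid> are placeholder choices):
$ taskset -c 0 dd if=/dev/zero of=/dev/null   # run the load on CPU 0 only
$ taskset -cp <pid>                           # show the CPU affinity of an existing process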
You can show threads by using the lsof or ps command. Here you can see the process that is consuming CPU, all on one cpu:
$ ps -eLf
UID       PID   PPID    LWP  C NLWP STIME TTY    TIME     CMD
root        1      0      1  0    1 Jul26 ?      00:00:01 /sbin/init
root        2      0      2  0    1 Jul26 ?      00:00:00 [kthreadd]
root        3      2      3  0    1 Jul26 ?      00:00:00 [migration/0]
root        4      2      4  0    1 Jul26 ?      00:00:00 [ksoftirqd/0]
root        5      2      5  0    1 Jul26 ?      00:00:00 [stopper/0]
...
root    29210   1457  29210  0    1 14:30 ?      00:00:00 sshd: sysoper [priv]
sysoper 29212  29210  29212  0    1 14:30 ?      00:00:00 sshd: sysoper@pts/1
sysoper 29213  29212  29213  0    1 14:30 pts/1  00:00:00 -bash
sysoper 29236  29213  29236 99    1 14:31 pts/1  01:21:17 dd if=/dev/zero of=/dev/null
postfix 29569   1547  29569  0    1 15:38 ?      00:00:00 pickup -l -t fifo -u
NLWP is "Number of Light Weight Processes", i.e. the number of lwps (threads) in the process, and LWP (aka SPID or TID) is the ID of the lwp (light weight process, or thread) being reported. On Linux a light weight process is what POSIX calls a thread. Since Linux 2.4.19 (or so) threads can share the pid of the parent process and have a separate thread id, the TID. Most processes have just the one thread, and so their TID is the same as their PID. C represents processor utilization; currently this is the integer value of the percent usage over the lifetime of the process. In the above case we can see that dd (NLWP 1, C 99) is a single-threaded process consuming an entire CPU.
For more info see http://yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html for a tutorial on POSIX threads usage under Linux. See also https://www.akkadia.org/drepper/nptl-design.pdf
ps command options:
- -H Show threads as if they were processes
- -L Show threads, possibly with LWP and NLWP columns
- -T Show threads, possibly with SPID column
- -m Show threads after processes
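To see which individual threads are using the most CPU across the whole system, you can combine -L with a custom output format; a sketch (with -L each output line is a thread, so pcpu and state are reported per thread):
$ ps -eLo pid,lwp,nlwp,pcpu,state,comm --sort=-pcpu | head -10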
You can list threads for a given process several ways:
- ps --pid <pid> -Lf, or ps -eLf for all processes, or ps -T -p <pid> for a compact per-thread listing
- top -H -p <pid>
- Each thread in a process creates a directory under /proc/<pid>/task. Count the number of directories, and you have the number of threads.
- The file /proc/<pid>/status, which has a Threads: line
Examples:
$ sudo ps --pid 2702 -Lf
UID      PID  PPID   LWP  C NLWP STIME TTY   TIME     CMD
tomcat  2702     1  2702  0  209 04:39 ?     00:00:00 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat  2702     1  2704  0  209 04:39 ?     00:00:03 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat  2702     1  2705  0  209 04:39 ?     00:00:21 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat  2702     1  2706  0  209 04:39 ?     00:00:22 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat  2702     1  2707  0  209 04:39 ?     00:00:39 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat  2702     1  2708  0  209 04:39 ?     00:00:00 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat  2702     1  2709  0  209 04:39 ?     00:00:01 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat  2702     1  2710  0  209 04:39 ?     00:00:00 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat  2702     1  2711  0  209 04:39 ?     00:03:03 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat  2702     1  2712  0  209 04:39 ?     00:03:03 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
...
$ sudo ps -T -p 2702
  PID  SPID TTY      TIME     CMD
 2702  2702 ?        00:00:00 java
 2702  2704 ?        00:00:03 java
 2702  2705 ?        00:00:21 java
 2702  2706 ?        00:00:22 java
 2702  2707 ?        00:00:39 java
 2702  2708 ?        00:00:00 java
 2702  2709 ?        00:00:01 java
 2702  2710 ?        00:00:00 java
$ top -H -p 2702
top - 18:30:32 up 105 days, 8:33, 1 user, load average: 0.02, 0.05, 0.05
Tasks: 209 total, 0 running, 209 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.0%us, 0.1%sy, 0.0%ni, 98.8%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4049992k total, 3621592k used, 428400k free, 186140k buffers
Swap: 8388604k total, 14848k used, 8373756k free, 968856k cached

 PID  USER   PR NI VIRT  RES  SHR S %CPU %MEM  TIME+   COMMAND
2916  tomcat 20  0 5286m 2.3g 27m S  2.0 58.7  1:46.19 java
2920  tomcat 20  0 5286m 2.3g 27m S  2.0 58.7  0:00.96 java
2702  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:00.01 java
2704  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:03.04 java
2705  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:21.74 java
2706  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:22.02 java
2707  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:39.16 java
2708  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:00.51 java
2709  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:01.10 java
2710  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:00.00 java
2711  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  3:03.78 java
2712  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  3:04.78 java
2713  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:00.00 java
2714  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:00.00 java
2715  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:27.43 java
2716  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:00.00 java
2717  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:00.00 java
2718  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:00.00 java
2721  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:00.02 java
2727  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:00.11 java
2728  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:00.00 java
2731  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:00.04 java
2732  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:10.43 java
2734  tomcat 20  0 5286m 2.3g 27m S  0.0 58.7  0:00.42 java
$ sudo ls -l /proc/2702/task
total 0
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 11238
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 11998
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 12400
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 12962
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 12964
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 12965
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 13097
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13385
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13388
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13389
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13390
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13391
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13392
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 14928
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 14963
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 15629
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 15720
...
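Counting those task directories gives the thread count directly; a one-liner sketch:
$ sudo ls /proc/2702/task | wc -l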
$ cat /proc/2702/status | grep Thread
Threads: 207
Run queues
Use sar -q; it gives the run-queue length under the runq-sz column and the number of tasks in the task list under plist-sz.
$ sar -q 3 5
Linux 4.1.13-18.26.amzn1.x86_64 (ip-10-32-8-250)   07/27/2016   _x86_64_   (2 CPU)

06:32:39 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
06:32:42 PM         0       292      0.15      0.07      0.06
06:32:45 PM         0       293      0.14      0.06      0.06
06:32:48 PM         0       293      0.14      0.06      0.06
06:32:51 PM         0       293      0.13      0.06      0.05
06:32:54 PM         0       293      0.13      0.06      0.05
Average:            0       293      0.14      0.06      0.06
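As a rough rule of thumb, a run queue or load average that stays above the number of CPUs means tasks are waiting for a processor. A quick sketch that compares the 1-minute load average (first field of /proc/loadavg) against the CPU count from nproc:
$ awk -v n="$(nproc)" '{ if ($1+0 > n+0) print "1-min load", $1, "exceeds", n, "CPUs"; else print "1-min load", $1, "is within", n, "CPUs" }' /proc/loadavg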