Linux CPU Monitoring and Tuning

From Public wiki of Kevin P. Inscoe

CPU monitoring tools

$ sar -u 12 5

Report CPU utilization every 12 seconds, five times. The following values are displayed:

  • %user: Percentage of CPU utilization that occurred while executing at the user level (application).
  • %nice: Percentage of CPU utilization that occurred while executing at the user level with nice priority.
  • %system: Percentage of CPU utilization that occurred while executing at the system level (kernel).
  • %iowait: Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
  • %idle: Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
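
sar can also break the same counters out per processor with -P ALL, which makes a single saturated CPU easy to spot (same 12-second interval, 5 samples as above):

$ sar -u -P ALL 12 5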

Next, you need to determine which process is monopolizing or eating the CPUs. The following command displays the top 10 CPU users on a Linux system.

$ sudo ps -eo pcpu,pid,user,args | sort -k 1 -r | head -10

OR

$ sudo ps -eo pcpu,pid,user,args | sort -r -k1 | less
%CPU   PID USER     COMMAND
  96  2148 vivek    /usr/lib/vmware/bin/vmware-vmx -C /var/lib/vmware/Virtual Machines/Ubuntu 64-bit/Ubuntu 64-bit.vmx -@ ""
 0.7  3358 mysql    /usr/libexec/mysqld --defaults-file=/etc/my.cnf --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-locking --socket=/var/lib/mysql/mysql.sock
 0.4 29129 lighttpd /usr/bin/php
 0.4 29128 lighttpd /usr/bin/php
 0.4 29127 lighttpd /usr/bin/php
 0.4 29126 lighttpd /usr/bin/php
 0.2  2177 vivek    [vmware-rtc]
 0.0     9 root     [kacpid]
 0.0     8 root     [khelper]
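
Note that sort compares the pcpu column as plain text, which can misorder values with different digit counts (for example 9.0 would rank above 10.0); adding -n forces a numeric comparison:

$ sudo ps -eo pcpu,pid,user,args | sort -k1 -rn | head -10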

To see who’s using CPU:

$ sudo ps -e -o pcpu,cpu,nice,state,cputime,args --sort=-pcpu | head -n 10
%CPU CPU  NI S     TIME COMMAND
 100   -   0 R 00:01:31 dd if=/dev/zero of=/dev/null
 0.0   -   0 S 00:00:01 /sbin/init
 0.0   -   0 S 00:00:00 [kthreadd]
 0.0   -   - S 00:00:00 [migration/0]
 0.0   -   0 S 00:00:00 [ksoftirqd/0]
 0.0   -   - S 00:00:00 [stopper/0]
 0.0   -   - S 00:00:00 [watchdog/0]
 0.0   -   - S 00:00:00 [migration/1]
 0.0   -   - S 00:00:00 [stopper/1]
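
If the sysstat package is installed, pidstat gives a similar per-process view, but sampled over an interval rather than averaged over the lifetime of each process as ps reports (here 5-second samples, 3 reports):

$ pidstat -u 5 3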

To look at CPU load overall:

$ vmstat 20 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 7173628  98296 436428    0    0     2     1   26    8  0  1 99  0  0
 1  0      0 7173612  98296 436428    0    0     0     0 1010    9 13 37 50  0  0
 1  0      0 7173116  98312 436432    0    0     0     3 1016   13 12 38 50  0  0

Field descriptions for VM mode

(a) procs: the process-related fields are:

  • r: The number of processes waiting for run time.
  • b: The number of processes in uninterruptible sleep.

(b) memory: the memory-related fields are:

  • swpd: the amount of virtual memory used.
  • free: the amount of idle memory.
  • buff: the amount of memory used as buffers.
  • cache: the amount of memory used as cache.

(c) swap: the swap-related fields are:

  • si: Amount of memory swapped in from disk (/s).
  • so: Amount of memory swapped to disk (/s).

(d) io: the I/O-related fields are:

  • bi: Blocks received from a block device (blocks/s).
  • bo: Blocks sent to a block device (blocks/s).

(e) system: the system-related fields are:

  • in: The number of interrupts per second, including the clock.
  • cs: The number of context switches per second.

(f) cpu: the CPU-related fields are:

These are percentages of total CPU time.

  • us: Time spent running non-kernel code. (user time, including nice time)
  • sy: Time spent running kernel code. (system time)
  • id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
  • wa: Time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.
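
As a quick saturation check you can post-process vmstat with awk and print a timestamped line whenever idle time drops below a threshold. A minimal sketch, assuming GNU awk (for strftime) and the column layout shown above, where id is field 15:

$ vmstat 5 | awk 'NR > 2 && $15 < 10 { print strftime("%T"), "idle:", $15 "%" }'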

To introduce CPU load

Perhaps you want to introduce a CPU load for tuning or analysis.

You can do this with this command:

$ dd if=/dev/zero of=/dev/null

This will run until you press Control-C.
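
As written, dd loads a single CPU. To load every processor, start one copy per CPU and kill them all when finished; a sketch assuming a bash shell and GNU coreutils (for nproc):

$ for i in $(seq "$(nproc)"); do dd if=/dev/zero of=/dev/null & done
$ kill $(jobs -p)    # stop them all when finished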

A note on threading

Non-threaded applications tend to consume mostly one CPU, whereas a well-balanced threaded application should divide its time across all available CPUs.

For stats on how well you are threading across multiple CPUs, look at the file /proc/stat.

Here is a very unbalanced example:

$ cat /proc/stat
cpu  101506 126 285419 21111496 1760 19 11 1297 0
cpu0 5408 51 3912 10740445 410 18 6 596 0
cpu1 96097 75 281507 10371050 1349 0 5 700 0
...

The very first "cpu" line aggregates the numbers in all of the other "cpuN" lines.

These numbers identify the amount of time the CPU has spent performing different kinds of work. Time units are in USER_HZ or Jiffies (typically hundredths of a second).

The meanings of the columns are as follows, from left to right:

  • user: normal processes executing in user mode
  • nice: niced processes executing in user mode
  • system: processes executing in kernel mode
  • idle: twiddling thumbs
  • iowait: waiting for I/O to complete
  • irq: servicing interrupts
  • softirq: servicing softirqs
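
Because every cpuN line uses the same columns, a one-liner can compare how busy each CPU has been since boot by summing the user, nice, and system fields. A sketch only: these counters are cumulative jiffies, so for current load take the difference between two readings:

$ awk '/^cpu[0-9]/ { print $1, "busy jiffies:", $2 + $3 + $4 }' /proc/stat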

mpstat shows one processor favored on this instance:

$ mpstat -P ALL 20 3
Linux 2.6.32-642.3.1.el6.x86_64 (typhon)        07/27/2016      _x86_64_        (2 CPU)

...

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
Average:     all   12.47    0.00   37.54    0.00    0.00    0.00    0.00    0.00   49.99
Average:       0    0.00    0.00    0.02    0.00    0.00    0.00    0.00    0.00   99.98
Average:       1   24.95    0.00   75.05    0.00    0.00    0.00    0.00    0.00    0.00

You can show threads using the ps command (or lsof). Here you can see the process that is consuming CPU, all on one CPU:

$ ps -eLf
UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
root         1     0     1  0    1 Jul26 ?        00:00:01 /sbin/init
root         2     0     2  0    1 Jul26 ?        00:00:00 [kthreadd]
root         3     2     3  0    1 Jul26 ?        00:00:00 [migration/0]
root         4     2     4  0    1 Jul26 ?        00:00:00 [ksoftirqd/0]
root         5     2     5  0    1 Jul26 ?        00:00:00 [stopper/0]
...
root     29210  1457 29210  0    1 14:30 ?        00:00:00 sshd: sysoper [priv]
sysoper  29212 29210 29212  0    1 14:30 ?        00:00:00 sshd: sysoper@pts/1
sysoper  29213 29212 29213  0    1 14:30 pts/1    00:00:00 -bash
sysoper  29236 29213 29236 99    1 14:31 pts/1    01:21:17 dd if=/dev/zero of=/dev/null
postfix  29569  1547 29569  0    1 15:38 ?        00:00:00 pickup -l -t fifo -u

NLWP is the "number of light weight processes", i.e. the number of threads in the process, and LWP (also shown as SPID or TID) is the ID of the light weight process, or thread, being reported; the terminology comes from the POSIX threading standards. Since Linux 2.4.19 (or so) threads share the PID of the parent process and have a separate thread ID, TID. Most processes have just the one thread, so their TID is the same as their PID. C represents processor utilization: the integer value of the percent CPU usage over the lifetime of the process. In the output above you can see the single-threaded dd process at C=99, pegging one CPU.

For a tutorial on POSIX threads usage under Linux see http://yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html. See also Ulrich Drepper's NPTL design paper: https://www.akkadia.org/drepper/nptl-design.pdf

ps command options:

  • -H Show threads as if they were processes
  • -L Show threads, possibly with LWP and NLWP columns
  • -T Show threads, possibly with SPID column
  • -m Show threads after processes

You can list threads for a given process several ways:

  • ps --pid <pid> -Lf (or ps -eLf for all processes), or ps -T -p <pid> for a briefer per-thread listing
  • top -H -p <pid>
  • Each thread in a process creates a directory under /proc/<pid>/task. Count the number of directories, and you have the number of threads.
  • The Threads: line in the file /proc/<pid>/status

Examples:

$ sudo ps --pid 2702  -Lf
UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
tomcat    2702     1  2702  0  209 04:39 ?        00:00:00 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat    2702     1  2704  0  209 04:39 ?        00:00:03 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat    2702     1  2705  0  209 04:39 ?        00:00:21 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat    2702     1  2706  0  209 04:39 ?        00:00:22 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat    2702     1  2707  0  209 04:39 ?        00:00:39 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat    2702     1  2708  0  209 04:39 ?        00:00:00 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat    2702     1  2709  0  209 04:39 ?        00:00:01 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat    2702     1  2710  0  209 04:39 ?        00:00:00 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat    2702     1  2711  0  209 04:39 ?        00:03:03 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
tomcat    2702     1  2712  0  209 04:39 ?        00:03:03 /usr/lib/jvm/java-1.7.0/bin/java -Xmx3072M -Xm
...
$ sudo ps -T -p 2702 
  PID  SPID TTY          TIME CMD
 2702  2702 ?        00:00:00 java
 2702  2704 ?        00:00:03 java
 2702  2705 ?        00:00:21 java
 2702  2706 ?        00:00:22 java
 2702  2707 ?        00:00:39 java
 2702  2708 ?        00:00:00 java
 2702  2709 ?        00:00:01 java
 2702  2710 ?        00:00:00 java
$ top -H -p 2702
top - 18:30:32 up 105 days,  8:33,  1 user,  load average: 0.02, 0.05, 0.05
Tasks: 209 total,   0 running, 209 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  0.1%sy,  0.0%ni, 98.8%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4049992k total,  3621592k used,   428400k free,   186140k buffers
Swap:  8388604k total,    14848k used,  8373756k free,   968856k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                    
 2916 tomcat    20   0 5286m 2.3g  27m S  2.0 58.7   1:46.19 java                                        
 2920 tomcat    20   0 5286m 2.3g  27m S  2.0 58.7   0:00.96 java                                        
 2702 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:00.01 java                                        
 2704 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:03.04 java                                        
 2705 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:21.74 java                                        
 2706 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:22.02 java                                        
 2707 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:39.16 java                                        
 2708 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:00.51 java                                        
 2709 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:01.10 java                                        
 2710 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:00.00 java                                        
 2711 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   3:03.78 java                                        
 2712 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   3:04.78 java                                        
 2713 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:00.00 java                                        
 2714 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:00.00 java                                        
 2715 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:27.43 java                                        
 2716 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:00.00 java                                        
 2717 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:00.00 java                                        
 2718 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:00.00 java                                        
 2721 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:00.02 java                                        
 2727 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:00.11 java                                        
 2728 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:00.00 java                                        
 2731 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:00.04 java                                        
 2732 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:10.43 java                                        
 2734 tomcat    20   0 5286m 2.3g  27m S  0.0 58.7   0:00.42 java     
$ sudo ls -l /proc/2702/task
total 0
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 11238
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 11998
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 12400
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 12962
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 12964
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 12965
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 13097
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13385
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13388
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13389
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13390
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13391
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:46 13392
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 14928
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 14963
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 15629
dr-xr-xr-x 7 tomcat tomcat 0 Jul 27 18:17 15720
...
$ cat /proc/2702/status | grep Thread
Threads:        207
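
To just count the task directories instead of listing them (the result matches the Threads: line above):

$ sudo ls /proc/2702/task | wc -l
207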

Run queues

Use sar -q; it gives the number of tasks in the task list under the plist-sz column, and the run queue length (the number of tasks waiting for run time) under runq-sz. A runq-sz persistently larger than the number of CPUs means tasks are queuing for processor time.

$ sar -q 3 5
Linux 4.1.13-18.26.amzn1.x86_64 (ip-10-32-8-250)        07/27/2016      _x86_64_        (2 CPU)

06:32:39 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
06:32:42 PM         0       292      0.15      0.07      0.06
06:32:45 PM         0       293      0.14      0.06      0.06
06:32:48 PM         0       293      0.14      0.06      0.06
06:32:51 PM         0       293      0.13      0.06      0.05
06:32:54 PM         0       293      0.13      0.06      0.05
Average:            0       293      0.14      0.06      0.06