编程不止是一份工作,还是一种乐趣!!!
执行top -c,显示进程运行信息列表,并键入P(大写),进程按照CPU使用率排序
top - 23:35:34 up 44 min, 2 users, load average: 0.38, 0.67, 1.39
Tasks: 103 total, 2 running, 101 sleeping, 0 stopped, 0 zombie
%Cpu(s): 22.9 us, 28.0 sy, 0.0 ni, 48.2 id, 0.0 wa, 0.0 hi, 0.8 si, 0.0 st
KiB Mem : 1865988 total, 1551680 free, 124868 used, 189440 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 1556884 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1612 john 20 0 2812968 37232 11880 S 88.7 2.0 0:17.87 java Test
1225 john 20 0 147844 2424 1084 S 13.0 0.1 4:11.92 sshd: john@pts/1
44 root 20 0 0 0 0 S 0.7 0.0 0:08.67 [kworker/0:1]
13 root 20 0 0 0 0 S 0.3 0.0 0:00.65 [ksoftirqd/1]
1376 root 20 0 0 0 0 R 0.3 0.0 0:03.59 [kworker/1:0]
1627 john 20 0 157704 2172 1520 R 0.3 0.1 0:00.01 top -c
1 root 20 0 128168 6844 4080 S 0.0 0.4 0:02.15 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kthreadd]
3 root 20 0 0 0 0 S 0.0 0.0 0:00.02 [ksoftirqd/0]
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 [kworker/0:0H]
6 root 20 0 0 0 0 S 0.0 0.0 0:00.01 [kworker/u256:0]
7 root rt 0 0 0 0 S 0.0 0.0 0:00.48 [migration/0]
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [rcu_bh]
9 root 20 0 0 0 0 S 0.0 0.0 0:00.32 [rcu_sched]
10 root rt 0 0 0 0 S 0.0 0.0 0:00.01 [watchdog/0]
11 root rt 0 0 0 0 S 0.0 0.0 0:00.01 [watchdog/1]
12 root rt 0 0 0 0 S 0.0 0.0 0:00.39 [migration/1]
15 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 [kworker/1:0H]
17 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kdevtmpfs]
18 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 [netns]
19 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [khungtaskd]
20 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 [writeback]
21 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 [kintegrityd]
22 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 [bioset]
23 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 [kblockd]
24 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 [md]
30 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kswapd0]
31 root 25 5 0 0 0 S 0.0 0.0 0:00.00 [ksmd]
由top命令可得到进程1612最耗CPU。
执行top -Hp 1612,显示一个进程的线程运行信息列表,键入P(大写),线程按照CPU使用率排序
top - 23:36:28 up 45 min, 2 users, load average: 1.56, 0.96, 1.45
Threads: 14 total, 1 running, 13 sleeping, 0 stopped, 0 zombie
%Cpu(s): 23.3 us, 27.5 sy, 0.0 ni, 48.0 id, 0.0 wa, 0.0 hi, 1.2 si, 0.0 st
KiB Mem : 1865988 total, 1543940 free, 132608 used, 189440 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 1549144 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1624 john 20 0 2812968 45080 11880 R 90.3 2.4 1:04.99 java
1625 john 20 0 2812968 45080 11880 S 0.3 2.4 0:00.01 java
1612 john 20 0 2812968 45080 11880 S 0.0 2.4 0:00.00 java
1613 john 20 0 2812968 45080 11880 S 0.0 2.4 0:00.06 java
1614 john 20 0 2812968 45080 11880 S 0.0 2.4 0:00.04 java
1615 john 20 0 2812968 45080 11880 S 0.0 2.4 0:00.04 java
1616 john 20 0 2812968 45080 11880 S 0.0 2.4 0:00.15 java
1617 john 20 0 2812968 45080 11880 S 0.0 2.4 0:00.00 java
1618 john 20 0 2812968 45080 11880 S 0.0 2.4 0:00.00 java
1619 john 20 0 2812968 45080 11880 S 0.0 2.4 0:00.00 java
1620 john 20 0 2812968 45080 11880 S 0.0 2.4 0:00.14 java
1621 john 20 0 2812968 45080 11880 S 0.0 2.4 0:00.02 java
1622 john 20 0 2812968 45080 11880 S 0.0 2.4 0:00.00 java
1623 john 20 0 2812968 45080 11880 S 0.0 2.4 0:00.03 java
由top命令可得到进程1612中最耗CPU的线程是1624。
也可以通过ps与sort命令得到最耗CPU的线程,如下:
[john@server-1 ~]$ ps -mp 1612 -o THREAD,tid | sort -k 2 -rn
john 89.4 - - - - - -
john 88.9 19 - - - - 1624
john 0.2 19 - futex_ - - 1616
john 0.0 19 - skb_wa - - 1642
john 0.0 19 - futex_ - - 1625
john 0.0 19 - futex_ - - 1623
john 0.0 19 - futex_ - - 1622
john 0.0 19 - futex_ - - 1621
john 0.0 19 - futex_ - - 1620
john 0.0 19 - futex_ - - 1619
john 0.0 19 - futex_ - - 1618
john 0.0 19 - futex_ - - 1617
john 0.0 19 - futex_ - - 1615
john 0.0 19 - futex_ - - 1614
john 0.0 19 - futex_ - - 1613
john 0.0 19 - futex_ - - 1612
USER %CPU PRI SCNT WCHAN USER SYSTEM TID
输出的第一行是进程1612的信息,结果根据第二列进行排序,所以第二行的最后一列就是我们要询找的,最耗CPU的线程ID。
$ printf "0x%x\n" 1624
0x658
[john@server-1 ~]$ jstack 1612 | grep 0x658 -A 20
"Thread-0" #8 prio=5 os_prio=0 tid=0x00007f164c0e8000 nid=0x658 runnable [0x00007f1636af1000]
java.lang.Thread.State: RUNNABLE
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000000e3814d78> (a java.io.BufferedOutputStream)
at java.io.PrintStream.write(PrintStream.java:482)
- locked <0x00000000e3804d30> (a java.io.PrintStream)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
- locked <0x00000000e3804cf0> (a java.io.OutputStreamWriter)
at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185)
at java.io.PrintStream.newLine(PrintStream.java:546)
- eliminated <0x00000000e3804d30> (a java.io.PrintStream)
at java.io.PrintStream.println(PrintStream.java:807)
- locked <0x00000000e3804d30> (a java.io.PrintStream)
at Test$Thread1.run(Test.java:15)
at java.lang.Thread.run(Thread.java:748)
根据jstack的输出,结合源码很容易就可以发现问题:
import java.util.concurrent.TimeUnit;
public class Test {
public static void main(String[] args) throws InterruptedException {
new Thread(new Thread1()).start();
new Thread(new Thread2()).start();
}
static class Thread1 implements Runnable {
@Override
public void run() {
for (;;) {
System.out.println("hello");
}
}
}
static class Thread2 implements Runnable {
@Override
public void run() {
for (;;) {
System.out.println("hello");
try {
TimeUnit.SECONDS.sleep(2);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
}
线上定位问题都是争分夺秒,按照上面的步骤来会比较慢,可以把这些步骤写进一个脚本。
#!/bin/bash
pid=$1
tid=$(ps -mp $pid -o THREAD,tid | sort -k 2 -rn | awk 'NR==2 {printf "0x%x",$8}')
jstack $pid | grep $tid -A 20
现在只要将进程id作为参数转给脚本就可以了:
[john@server-1 ~]$ ./show-thread.sh 1612
"Thread-0" #8 prio=5 os_prio=0 tid=0x00007f164c0e8000 nid=0x658 runnable [0x00007f1636af1000]
java.lang.Thread.State: RUNNABLE
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000000e3814d78> (a java.io.BufferedOutputStream)
at java.io.PrintStream.write(PrintStream.java:482)
- locked <0x00000000e3804d30> (a java.io.PrintStream)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
- locked <0x00000000e3804cf0> (a java.io.OutputStreamWriter)
at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185)
at java.io.PrintStream.newLine(PrintStream.java:546)
- eliminated <0x00000000e3804d30> (a java.io.PrintStream)
at java.io.PrintStream.println(PrintStream.java:807)
- locked <0x00000000e3804d30> (a java.io.PrintStream)
at Test$Thread1.run(Test.java:15)
at java.lang.Thread.run(Thread.java:748)
实际上通过ps与sort命令,也可以计算出最耗CPU的进程,所以可以优化一下脚本:
#!/bin/bash
pid=$(ps -aux | sort -k 3 -rn | awk 'NR==1 {print $2}')
tid=$(ps -mp $pid -o THREAD,tid | sort -k 2 -rn | awk 'NR==2 {printf "0x%x",$8}')
jstack $pid | grep $tid -A 20
现在直接执行脚本即可看到线程栈信息了:
[john@server-1 ~]$ ./show-thread.sh
"Thread-0" #8 prio=5 os_prio=0 tid=0x00007f164c0e8000 nid=0x658 runnable [0x00007f1636af1000]
java.lang.Thread.State: RUNNABLE
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000000e38169b0> (a java.io.BufferedOutputStream)
at java.io.PrintStream.write(PrintStream.java:482)
- locked <0x00000000e3804d30> (a java.io.PrintStream)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
- locked <0x00000000e3804cf0> (a java.io.OutputStreamWriter)
at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185)
at java.io.PrintStream.newLine(PrintStream.java:546)
- eliminated <0x00000000e3804d30> (a java.io.PrintStream)
at java.io.PrintStream.println(PrintStream.java:807)
- locked <0x00000000e3804d30> (a java.io.PrintStream)
at Test$Thread1.run(Test.java:15)
at java.lang.Thread.run(Thread.java:748)