Analyzing Kernel Failures with a Kernel Dump

debug admin

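Problem description:
A server was reported as faulty. Testing showed that both ssh and ping failed, and connecting through iLO failed as well (the web page would not open).

While preparing to reboot the server with the ipmitool command, the server came back and ssh login worked again. It turned out that the server had rebooted on its own a few minutes earlier.

To restore service as quickly as possible, the application services were started first. The system logs and hardware information were then checked, and nothing abnormal was found.

Based on experience, the operating system version was checked next. It was RHEL6.1 x86_64, so a kernel bug was suspected.

Kernel dump analysis finally showed that the system crashed in the context of swapper. swapper is the idle task of a Linux system (pid=0), which runs whenever a CPU has nothing else to schedule.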

Solution:
The problem was resolved by upgrading the kernel to the RHEL6.5 kernel version.
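
After an upgrade like this it can be useful to confirm that the running kernel is no longer the affected release. The following is a minimal sketch, not part of the original procedure: the function name kernel_older_than is hypothetical, it relies on GNU sort -V, and the release 2.6.32-431.el6.x86_64 used in the usage comment is assumed here to be the RHEL6.5 kernel (the article does not state it).

```shell
#!/bin/sh
# Return success (0) if kernel release $1 is strictly older than release $2.
# Assumes GNU sort with -V (version sort) is available.
kernel_older_than() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage (the target release below is an assumption, adjust as needed):
#   if kernel_older_than "$(uname -r)" 2.6.32-431.el6.x86_64; then
#       echo "still running a kernel older than the target"
#   fi
```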

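How to analyze a kernel failure with a kernel dump:
1. Install the required packages
Install the following packages. The kernel-debuginfo packages must match the kernel version that crashed: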

crash-trace-command-1.0-3.el6.x86_64
crash-6.1.0-5.el6.x86_64
kernel-debuginfo-2.6.32-131.0.15.el6.x86_64
kernel-debuginfo-common-x86_64-2.6.32-131.0.15.el6.x86_64

These packages can be downloaded from http://debuginfo.centos.org/6/x86_64/.
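
Since the debuginfo packages have to match the crashed kernel exactly, the package names can be derived from the kernel release string instead of being typed by hand. This is a small sketch under that assumption. The function name debuginfo_pkgs is hypothetical, and the naming pattern is taken from the two kernel-debuginfo packages listed above.

```shell
#!/bin/sh
# Print the kernel-debuginfo package names for a given kernel release.
# Usage: debuginfo_pkgs [kernel-release]   (defaults to `uname -r`)
debuginfo_pkgs() {
    krel="${1:-$(uname -r)}"    # e.g. 2.6.32-131.0.15.el6.x86_64
    arch="${krel##*.}"          # text after the last dot, e.g. x86_64
    echo "kernel-debuginfo-${krel}"
    echo "kernel-debuginfo-common-${arch}-${krel}"
}
```

If the machine has already been upgraded, `uname -r` no longer reports the kernel that crashed, so pass the old release explicitly, for example `debuginfo_pkgs 2.6.32-131.0.15.el6.x86_64`.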

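2. Analysis
# Change to the kernel crash directory
# Run: crash /usr/lib/debug/lib/modules/2.6.32-xxx (the matching kernel version)/vmlinux ./vmcore
# In the output, the PANIC field usually indicates whether a kernel bug is involved, and the COMMAND field shows which process was running when the crash was triggered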

[root@17173.com 127.0.0.1-2014-09-16-10:20:28]# cd /var/crash/127.0.0.1-2014-09-16-10:20:28
[root@17173.com 127.0.0.1-2014-09-16-10:20:28]# crash /usr/lib/debug/lib/modules/2.6.32-131.0.15.el6.x86_64/vmlinux ./vmcore

crash 6.1.0-5.el6
Copyright (C) 2002-2012 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel version inconsistency between vmlinux and dumpfile

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-131.0.15.el6.x86_64/vmlinux
    DUMPFILE: ./vmcore  [PARTIAL DUMP]
        CPUS: 16
        DATE: Tue Sep 16 10:19:24 2014
      UPTIME: 273 days, 22:14:50
LOAD AVERAGE: 0.11, 0.30, 0.43
       TASKS: 2335
    NODENAME: myhost.17173ops.com
     RELEASE: 2.6.32-131.0.15.el6.x86_64
     VERSION: #1 SMP Tue May 10 15:42:40 EDT 2011
     MACHINE: x86_64  (2400 Mhz)
      MEMORY: 24 GB
       PANIC: ""
         PID: 0
     COMMAND: "swapper"
        TASK: ffff88037222ca80  (1 of 16)  [THREAD_INFO: ffff88067273e000]
         CPU: 2
       STATE: TASK_RUNNING (PANIC)

crash> quit
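
When several dumps need to be compared, the PANIC and COMMAND fields can be pulled out of saved session text rather than read by eye. The sketch below assumes the crash output shown above has been saved to a text file (the file name crash_output.txt is hypothetical), and the helper name crash_field is made up for illustration. It only parses text in the "FIELD: value" layout of the summary above and does not invoke crash itself.

```shell
#!/bin/sh
# Print the value of one summary field (e.g. PANIC, COMMAND) from saved
# crash output read on standard input.
# Usage: crash_field FIELD < crash_output.txt
crash_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)              # drop the right-alignment padding
            if (index(line, key ": ") == 1) {     # field name must start the line
                print substr(line, length(key) + 3)
                exit
            }
        }'
}
```

For the session above, `crash_field COMMAND < crash_output.txt` would print the quoted task name, and `crash_field PANIC < crash_output.txt` would print the empty pair of quotes, which is a hint that the panic message was not captured in this dump.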

When reposting, please credit: 爱开源 » Analyzing Kernel Failures with a Kernel Dump
