Troubleshooting GPFS issues

What to do when you have a GPFS issue

Got a problem? Don’t panic!
Check for possible basic problems:
  • Is the network OK?
  • Check the status of the cluster: “mmgetstate -a” (see the example output below)
  • Check the status of the NSDs: “mmlsdisk fsname”
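For example, on a healthy cluster “mmgetstate -a” reports every node as “active” (node names below are illustrative; the column layout varies slightly between releases):

mmgetstate -a

 Node number  Node name   GPFS state
------------------------------------------
       1      um-gpfs1    active
       2      um-gpfs2    active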
Take a 5-minute break
  • In most cases GPFS will recover by itself, without any intervention from the administrator
If it has not recovered
  • Ensure that you are the only person working on the problem!
  • Check the GPFS logs (first on the cluster manager, then on the FS manager, then on the NSD servers); see the sketch after this list
  • Check syslog (/var/log/messages) for possible errors
  • Check disk availability (“mmlsdisk fsname”)
  • Consult the “Problem Determination Guide”
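A quick sketch of the log check (the path is the default GPFS log location; adjust it if your installation logs elsewhere):

tail -n 50 /var/adm/ras/mmfs.log.latest
grep -i mmfs /var/log/messages | tail -n 20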

Some useful commands:

  • “mmfsadm dump waiters” helps to find long-lasting operations (waiters)
  • “mmdiag --network | grep pending” helps to identify a non-responsive node
  • “mmdiag --iohist” lists the last 512 I/O operations performed by GPFS on the current node (helps to find a malfunctioning disk)
  • “gpfs.snap” gathers all logs and configurations from all nodes in the cluster; its archive is the first thing to send to IBM support when opening a service request (see the example below)
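For example, running “gpfs.snap” with no arguments on a single node collects data from every node in the cluster and prints the location of the resulting archive (typically under /tmp) when it finishes:

/usr/lpp/mmfs/bin/gpfs.snap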

See also: GPFS V3.4 Problem Determination Guide

NFS stale file handle:

When a GPFS mount point is in the “NFS stale file handle” status, for example:

[root@um-gpfs1 root]# df
Filesystem      1K-blocks       Used  Available Use% Mounted on
/dev/gpfs_um1  8125032448 8023801088  101231360  99% /storage/gpfs_um
df: `/storage/gpfs_um': Stale NFS file handle

Then check whether any NSD has availability “down”:

[root@um-gpfs1 root]# mmlsdisk gpfs_um
disk         driver   sector failure holds    holds
name         type     size   group   metadata data  status        availability
------------ -------- ------ ------- -------- ----- ------------- ------------
disk21       nsd      512    4015    yes      yes   ready         up
disk22       nsd      512    4015    yes      yes   ready         down
disk23       nsd      512    4015    yes      yes   ready         down
disk24       nsd      512    4013    yes      yes   ready         up
Restart the NSDs (important: do it for all NSDs with availability “down” in a single command):
[root@um-gpfs1 root]# mmchdisk gpfs_um start -d “disk22;disk23”
Re-mount the file systems (see the sketch below).
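A minimal sketch, assuming the file system device from the example above (“-a” mounts it on all nodes):

mmmount gpfs_um -a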

Recovery of GPFS configuration:

If a node of the cluster has lost its configuration (e.g. it has been re-installed) but is still present as a member of the cluster
(“mmgetstate” lists it in the “unknown” state), use this command on the node to recover it:
/usr/lpp/mmfs/bin/mmsdrrestore -p diskserv-san-5 -R /usr/bin/scp
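Here “-p diskserv-san-5” names a node that still holds a valid copy of the cluster configuration (typically the primary configuration server) and “-R /usr/bin/scp” selects the remote copy command. Afterwards, start GPFS on the recovered node and verify its state:

/usr/lpp/mmfs/bin/mmstartup
mmgetstate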

Checking existing NSD:

  • If you get the warning “Disk descriptor xxx system refers to an existing NSD” while creating a new NSD,
use this command to verify whether the device is actually used in one of the file systems:
# mmfsadm test readdescraw /dev/emcpowerax
Viewing the GPFS disks:
# mmlsdisk /dev/slvdata01208
Use the mmlsfs command to view the attributes and values of a GPFS file system:
# mmlsfs /dev/slvdata01208
View the nodes in your GPFS nodesets:
# mmlsnode -a
View the GPFS cluster configuration, including which file systems are defined (see the example below):
# mmlsconfig
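For reference, “mmlsconfig” ends with the list of defined file systems; a trimmed, illustrative example:

# mmlsconfig
Configuration data for cluster um-gpfs.cluster:
...
File systems in cluster um-gpfs.cluster:
-----------------------------------------
/dev/gpfs_um
/dev/slvdata01208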

To test if Oracle TDP (RMAN) is working properly:
# tdpoconf showenv
