What to do when you get a GPFS issue?
Check for possible basic problems:
- Is the network OK?
- Check the status of the cluster: “mmgetstate -a”
- Check the status of the NSDs: “mmlsdisk fsname”
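The basic checks above boil down to a few commands; this is an illustrative sketch that requires a live GPFS cluster, where “gpfs0” and “node2” are placeholder names to substitute with your own:

```shell
# Quick GPFS health check -- "gpfs0" and "node2" are placeholders.
ping -c 3 node2      # is the network between the nodes OK?
mmgetstate -a        # daemon state on all nodes ("active" = healthy)
mmlsdisk gpfs0       # NSD status/availability ("ready"/"up" = healthy)
```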
Take a 5 min break
- In most cases GPFS will recover by itself, without any intervention from the administrator
If not recovered
- Ensure that you are the only person working on the problem!
- Check the GPFS logs (first on the cluster manager, then on the FS manager, then on the NSD servers)
- Check the syslog (/var/log/messages) for any errors
- Check disk availability (“mmlsdisk fsname”)
- Consult the “Problem Determination Guide”
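On a typical installation the log checks above look roughly like this (illustrative only, requires a live cluster; the log paths are the GPFS defaults and “gpfs0” is a placeholder file system name):

```shell
tail -n 100 /var/adm/ras/mmfs.log.latest   # GPFS log on this node
grep -i mmfs /var/log/messages | tail      # related syslog entries
mmlsmgr                                    # current cluster and file system managers
mmlsdisk gpfs0 -e                          # list only disks in error (down / not ready)
```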
Some useful commands:
- “mmfsadm dump waiters” helps to find long-lasting operations (waiters)
- “mmdiag --network | grep pending” helps to identify a non-responsive node
- “mmdiag --iohist” lists the last 512 I/O operations performed by GPFS on the current node (helps to find a malfunctioning disk)
- “gpfs.snap” gathers all logs and configuration data from all nodes in the cluster
- this is the first thing to send to IBM support when opening a service request
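When reading waiters, the long-stuck ones are usually the interesting ones. The snippet below filters a hypothetical sample of “mmfsadm dump waiters” output (the real format varies between GPFS versions) for waiters older than 60 seconds:

```shell
# Hypothetical sample output, used only to illustrate the filtering.
cat > /tmp/waiters.txt <<'EOF'
0x7F10 waiting 0.013 seconds, NSDThread: for I/O completion
0x7F22 waiting 312.450 seconds, NSDThread: for I/O completion
0x7F31 waiting 1.200 seconds, SGExceptionLogBufferFullThread
EOF
# Field 3 is the wait time in seconds; show only waiters stuck > 60 s.
awk '$3 > 60 {print}' /tmp/waiters.txt
```

In real use, pipe the live output instead: `mmfsadm dump waiters | awk '$3 > 60'`.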
GPFS V3.4 Problem Determination Guide:
NFS stale file handle:
- Check whether there is any NSD with status “down” (“mmlsdisk fsname”)
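If a “down” NSD is found and the underlying disk or path is healthy again, it can usually be restarted. A sketch, requiring a live cluster, with “gpfs0” as a placeholder file system name:

```shell
mmlsdisk gpfs0 | grep -i down   # find the down disks
mmchdisk gpfs0 start -a         # try to bring all down disks back up
mmlsdisk gpfs0                  # verify the status afterwards
```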
Recovery of GPFS configuration:
Checking existing NSD:
- If you get the warning “Disk descriptor xxx system refers to an existing NSD” while creating a new NSD, inspect the disk first:
# mmlsdisk /dev/slvdata01208
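If the descriptor really does point at a stale NSD left over from a deleted cluster, the verification can be overridden. This destroys whatever is on the disk, so check first; “disk.desc” below is a placeholder descriptor file:

```shell
mmlsnsd -X                   # show existing NSDs and the devices behind them
mmcrnsd -F disk.desc -v no   # re-use the disk, skipping the existing-NSD check
```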
Use the mmlsfs command to view the attributes and values of a GPFS file system.
# mmlsfs /dev/slvdata01208
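mmlsfs can also query a single attribute instead of the full list, for example the block size:

```shell
mmlsfs /dev/slvdata01208 -B   # show only the block size attribute
```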
View the nodes in your GPFS nodesets:
# mmlsnode -a
View which GPFS file systems exist
Test whether Oracle TDP (RMAN) is working properly:
# tdpoconf showenv
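“tdpoconf showenv” only verifies the TDP environment; a fuller end-to-end check is a small RMAN backup through the 'sbt_tape' channel. A hedged sketch, where the TDPO_OPTFILE path is only an example to adjust to your installation:

```shell
rman target / <<'EOF'
run {
  allocate channel t1 type 'sbt_tape'
    parms 'ENV=(TDPO_OPTFILE=/opt/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
  backup current controlfile;
  release channel t1;
}
EOF
```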