Replacing a failing rootvg disk on AIX

Works on: AIX

Let’s suppose errpt -a is reporting permanent hardware errors on hdisk0 on an IBM AIX server whose rootvg is mirrored across two disks, hdisk0 and hdisk1.

To check that both disks really are assigned to the rootvg volume group, start with:
lsvg -p rootvg
You should see both hdisk0 and hdisk1 in the PV_NAME column.

The second thing to check is that there really are mirror copies:
lsvg -l rootvg
For each logical volume, check that there is a 1:2 relationship between LPs and PPs, and that PVs is equal to 2. If a logical volume is not mirrored, check that it doesn’t reside on the failing disk with:
lslv -l LV_NAME
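
With many logical volumes, comparing the LPs and PPs columns by eye gets tedious. Here is a minimal sketch that filters the lsvg -l output for you. The function name unmirrored_lvs is my own, and it assumes the usual column order (LV NAME, TYPE, LPs, PPs, PVs, ...) after two header lines, so check it against the output on your own system first:

```shell
# Print the name of every LV whose PP count is not exactly twice its LP count.
# Reads `lsvg -l <vg>` output on stdin.
# Assumption: two header lines, then LV NAME, TYPE, LPs, PPs, PVs, ... columns.
unmirrored_lvs() {
    awk 'NR > 2 && $3 ~ /^[0-9]+$/ && $4 != 2 * $3 { print $1 }'
}

# Intended use on the AIX host:
#   lsvg -l rootvg | unmirrored_lvs
```

Any name it prints is a candidate for the lslv -l check above. A dump device, for instance, is often deliberately left unmirrored.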

Once you’ve done these preliminary checks, you can start removing the mirror copies held on hdisk0:
unmirrorvg rootvg hdisk0

After running the command, I’ve sometimes had these messages, which are mostly informational:
0516-1246 rmlvcopy: If hd5 is the boot logical volume, please run ‘chpv -c <diskname>’
as root user to clear the boot record and avoid a potential boot
off an old boot image that may reside on the disk from which this
logical volume is moved/removed.
0301-108 mkboot: Unable to read file blocks. Return code: -1
0516-1132 unmirrorvg: Quorum requirement turned on, reboot system for this
to take effect for rootvg.
0516-1144 unmirrorvg: rootvg successfully unmirrored, user should perform
bosboot of system to reinitialize boot records.  Then, user must modify
bootlist to just include:  hdisk0.

Then we remove hdisk0 from the volume group:
reducevg rootvg hdisk0

And remove the device from configuration:
rmdev -dl hdisk0

Then, we will have to power down the machine, as we’re dealing with a rootvg disk. However, before doing so, it’s preferable to check that we will boot from the right drive:
bootinfo -b will tell you which drive the system last booted from.
If it’s the failed drive (hdisk0 in our case), we should switch to the drive that is still usable (hdisk1 in our case) by creating the boot image on hdisk1 and recreating the /dev/ipldevice link, which was deleted by the previous rmdev command:
bosboot -ad /dev/hdisk1

ln /dev/rhdisk1 /dev/ipldevice

Then, we can check bootlist:
bootlist -m normal -o

… And now, we can finally power down our server, replace the failed drive, and power it back on…

Once the server has booted up, we should run:
cfgmgr
so that the OS will recognize the new disk.

To check that AIX really has done its job, run:
lsdev -Cc disk
which should list both disks, hdisk0 and hdisk1, as Available.
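
If you want to script that check, here is a small sketch. The function name disk_available is hypothetical, and it assumes lsdev -Cc disk prints the disk name in the first column and its state in the second:

```shell
# Succeed (exit status 0) if the named disk is listed as Available.
# Reads `lsdev -Cc disk` output on stdin.
# Assumption: column 1 is the disk name, column 2 is its state.
disk_available() {
    awk -v disk="$1" '$1 == disk && $2 == "Available" { found = 1 } END { exit !found }'
}

# Intended use on the AIX host:
#   lsdev -Cc disk | disk_available hdisk0 && echo "hdisk0 is ready"
```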

Now, we can assign the new disk to the rootvg volume group:
extendvg rootvg hdisk0

Then we mirror the group:
mirrorvg rootvg

Wait for the copy from hdisk1 to hdisk0 to complete (it can take some time, as you can imagine). You can watch the disk activity with iostat.
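
Besides watching iostat, you can look at the STALE PPs figure reported by lsvg rootvg, which should fall to 0 once the copies are in sync. The helper below is only a sketch: the name stale_pps is mine, and it assumes the summary contains a field labelled "STALE PPs:" followed by a number, as it does on the systems I have seen:

```shell
# Print the number that follows the "STALE PPs:" label.
# Reads `lsvg <vg>` output on stdin.
# Assumption: the label appears as the two words "STALE" and "PPs:".
stale_pps() {
    awk '{ for (i = 2; i < NF; i++) if ($(i-1) == "STALE" && $i == "PPs:") print $(i+1) }'
}

# Intended use on the AIX host:
#   lsvg rootvg | stale_pps
```

Run it from a second terminal every minute or so while the mirror is being built.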

You should check that both disks are really assigned to rootvg by typing:
lsvg -p rootvg

An lsvg -l rootvg will show you whether mirroring has worked OK. You should once again have a 1:2 relationship between LPs and PPs.

Then, create the boot image on the new disk:
bosboot -ad /dev/hdisk0

Finally, modify the bootlist to take into account both disks:
bootlist -m normal hdisk0 hdisk1
Check with:
bootlist -m normal -o
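
To make that last check less error-prone, here is a rough sketch that reports any expected disk missing from the boot list. The function name bootlist_missing is hypothetical. It only looks at the first column of the bootlist output, so extra fields such as blv=hd5 on newer AIX levels are ignored:

```shell
# Print each expected disk that does not appear in the boot list.
# Reads `bootlist -m normal -o` output on stdin; expected disks are arguments.
# Assumption: the disk name is the first field on each output line.
bootlist_missing() {
    listed=$(awk '{ print $1 }')
    for disk in "$@"; do
        printf '%s\n' "$listed" | grep -qx "$disk" || printf '%s\n' "$disk"
    done
}

# Intended use on the AIX host:
#   bootlist -m normal -o | bootlist_missing hdisk0 hdisk1
```

No output means both disks are in the list.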

And you’re finally done!

Happy computing.

Drop me a comment if this post has been useful to you, or if you think anything should be added or changed.