RAID 1 - Basic Health check commands

Hi,

What are the basic commands one can use to check the health of a RAID 1 array? What happens if one disk fails? Will the system trigger an error message, or does the user need to continually monitor log files?
thank you

Check /proc/mdstat for a quick health check.
You can always create a cron job that mails you the status of the drives periodically.
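
For example, something along these lines could work; the schedule, array name and mail address are only placeholders, and it assumes a working local mail setup. mdadm can also watch the arrays itself:

# crontab entry: mail a daily status report for one array
0 8 * * * /sbin/mdadm --detail /dev/md0 | mail -s "RAID status" root

# or let mdadm monitor all arrays and mail on failure events
# (needs a MAILADDR line in /etc/mdadm.conf, or its --mail option)
mdadm --monitor --scan --daemonise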

The reason I asked is that I frequently get a message at boot that stops the system from booting, and I have to press ^D to continue booting, or enter the root password and reboot.

dmesg | grep md3
[   12.545589] md/raid1:md3: active with 2 out of 2 mirrors
[   12.558315] created bitmap (15 pages) for device md3
[   12.568714] md3: bitmap initialized from disk: read 1/1 pages, set 0 of 29479 bits
[   12.617839] md3: detected capacity change from 0 to 1978285285376
[   12.688416]  md3: unknown partition table
[   14.869197] udevd[519]: '/sbin/blkid -o udev -p /dev/md3' [961] terminated by signal 15 (Terminated)

==> here it stops booting... ^D to continue to boot or enter root password

[   31.253200] systemd-fsck[1022]: /dev/md3: clean, 1979/120750080 files, 10850286/482979806 blocks
[   32.151784] EXT4-fs (md3): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[   43.288203] EXT4-fs (md3): re-mounted. Opts: acl,user_xattr,commit=0

I already knew about the

cat /proc/mdstat

command, and it doesn't reveal anything out of the ordinary either:

cat /proc/mdstat 
Personalities : [raid1] [raid0] [raid10] [raid6] [raid5] [raid4] 
md3 : active raid1 sdb5[1] sda5[0]
      1931919224 blocks super 1.0 [2/2] [UU]
      bitmap: 3/15 pages [12KB], 65536KB chunk

md0 : active raid1 sdb1[1] sda1[0]
      96344 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sda2[0] sdb2[1]
      1951884 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md2 : active raid1 sda3[0] sdb3[1]
      19534968 blocks super 1.0 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk
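
A more detailed per-array view is also available through mdadm itself (md3 here being the array from the boot messages):

mdadm --detail /dev/md3     # reports State, Active/Failed Devices and the member disks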

My system is openSUSE:

cat /etc/issue
Welcome to openSUSE 12.1 "Asparagus" RC 1  - Kernel \r (\l).

For completeness, here are the partition tables for sda and sdb that make up the md devices:

# fdisk -l /dev/sda /dev/sdb

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00043930

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63      192779       96358+  fd  Linux raid autodetect
/dev/sda2          192780     4096574     1951897+  fd  Linux raid autodetect
/dev/sda3         4096575    43166654    19535040   fd  Linux raid autodetect
/dev/sda4        43167744  3907028991  1931930624    f  W95 Ext'd (LBA)
/dev/sda5        43169792  3907008511  1931919360   fd  Linux raid autodetect

Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0007b252

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *          63      192779       96358+  fd  Linux raid autodetect
/dev/sdb2          192780     4096574     1951897+  fd  Linux raid autodetect
/dev/sdb3         4096575    43166654    19535040   fd  Linux raid autodetect
/dev/sdb4        43167744  3907028991  1931930624    f  W95 Ext'd (LBA)
/dev/sdb5        43169792  3907008511  1931919360   fd  Linux raid autodetect

Can anyone give me some hints on what is going wrong?

thank you

You can try to recreate your config file (on openSUSE it lives at /etc/mdadm.conf):

cd /etc
mv mdadm.conf mdadm.conf.old            # keep the old file as a backup
mdadm --examine --scan >> mdadm.conf    # regenerate the ARRAY lines from the superblocks

I also suspect there might be a bug in blkid, so first update your openSUSE install, since I noticed it's still at RC1.
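
Something along these lines should bring RC1 up to date, and afterwards you can re-run the probe that got killed during boot by hand to see whether blkid itself misbehaves:

zypper refresh                      # re-read the repositories
zypper update                       # apply pending package updates
/sbin/blkid -o udev -p /dev/md3     # the same probe udev ran at boot; it should print key=value pairs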

For comparison

OLD:

 # cat mdadm.conf 
DEVICE containers partitions
ARRAY /dev/md/0 UUID=3be9cb66:3913cafa:a402a78b:84d5ca9a
ARRAY /dev/md/1 UUID=4ab789d5:54d23a90:b482cf0e:f587b941
ARRAY /dev/md/2 UUID=948af06b:caf993d4:a5887ee1:c7043c39
ARRAY /dev/md/3 UUID=206c6e19:7b14bb75:a351c4e4:c8e77d87

NEW:

 # mdadm --examine --scan
ARRAY /dev/md/0 metadata=1.0 UUID=3be9cb66:3913cafa:a402a78b:84d5ca9a name=linux.site:0
ARRAY /dev/md/1 metadata=1.0 UUID=4ab789d5:54d23a90:b482cf0e:f587b941 name=linux.site:1
ARRAY /dev/md/2 metadata=1.0 UUID=948af06b:caf993d4:a5887ee1:c7043c39 name=linux.site:2
ARRAY /dev/md/3 metadata=1.0 UUID=206c6e19:7b14bb75:a351c4e4:c8e77d87 name=linux.site:3

I’ll give it a few reboots and some time to see if this makes any difference…

Still no luck. I had my doubts that the above would help, but it was worth a try.

There is a little bit more going on:

# dmesg | grep -w md
[    0.000000] Kernel command line: root=/dev/disk/by-id/md-uuid-948af06b:caf993d4:a5887ee1:c7043c39 resume=/dev/disk/by-id/md-uuid-4ab789d5:54d23a90:b482cf0e:f587b941 splash=silent quiet vga=0x31a
[    1.422491] PM: Checking hibernation image partition /dev/disk/by-id/md-uuid-4ab789d5:54d23a90:b482cf0e:f587b941
[    3.772296] md: bind<sdb2>
[    3.775183] md: bind<sda3>
[    3.777703] md: bind<sdb3>
[    3.779887] md: raid1 personality registered for level 1
[    3.780100] md/raid1:md2: active with 2 out of 2 mirrors
[    3.791642] md: bind<sda2>
[    3.793490] md/raid1:md1: active with 2 out of 2 mirrors
[    4.250403] md: raid0 personality registered for level 0
[    4.253306] md: raid10 personality registered for level 10
[    4.952822] md: raid6 personality registered for level 6
[    4.952824] md: raid5 personality registered for level 5
[    4.952826] md: raid4 personality registered for level 4
[   10.053305] md: md0 stopped.
[   10.057398] md: bind<sdb1>
[   10.057564] md: bind<sda1>
[   10.073694] md/raid1:md0: active with 2 out of 2 mirrors
[   10.320274] boot.md[522]: Starting MD RAID mdadm: /dev/md/0 has been started with 2 drives.
[   10.609477] md: bind<sda5>
[   10.936257] systemd[1]: Job fsck@dev-disk-by\x2did-md\x2duuid\x2d206c6e19:7b14bb75:a351c4e4:c8e77d87.service/start failed with result 'dependency'.
[   11.028990] md: could not open unknown-block(8,21).
[   11.029060] md: md_import_device returned -16
[   11.255020] md: bind<sdb5>
[   11.723384] md/raid1:md3: active with 2 out of 2 mirrors
[   12.592269] systemd[1]: md.service: control process exited, code=exited status=3
[   12.688038] systemd[1]: Unit md.service entered failed state.
[   12.753286] boot.md[994]: Not shutting down MD RAID - reboot/halt scripts do this...missing
[   23.557887] boot.md[1048]: Starting MD RAID ..done
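
Since md.service shows up there as failed, systemd can also be asked directly what it objected to (the single quotes keep the \x2d escapes in the fsck unit name intact):

systemctl status md.service
systemctl status 'fsck@dev-disk-by\x2did-md\x2duuid\x2d206c6e19:7b14bb75:a351c4e4:c8e77d87.service'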

My last boot failed because of:

[   11.028990] md: could not open unknown-block(8,21).
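
If I decode the major:minor pair correctly, (8,21) is /dev/sdb5 (major 8 = sd devices, minor 16 + 5 = sdb5), and the "md_import_device returned -16" a few lines earlier is -EBUSY, so something still had a hold on that partition at that moment. The mapping is easy to double-check:

ls -l /dev/sdb5     # the "8, 21" printed before the date are the major and minor numbers of this node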

However, the RAID seems to be functioning properly after a successful boot, or once I hit ^D. Any other ideas on what to try?

First, did you update your system? Second, does the system boot successfully or not? Third, please paste your entire dmesg.

Yes, the system is up to date. I can boot successfully, but sometimes I need to intervene and press ^D. The RAID looks healthy after boot. The next time I get the error message I will post the full dmesg.

LiLo, when you use the cat /proc/mdstat command, what you are looking for is that ‘[UU]’ is there; if you see anything else (for example ‘[U_]’), there is a problem. On my server I see ‘[UUUU]’ for RAID 10. I tested this by removing a drive and then having the array rebuild itself. It was cool to see the machine boot up successfully without a drive, with all my data still there.
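
If you want to rehearse that without physically pulling a drive, the failure can also be simulated in software, roughly like this; md0 and sdb1 are only examples, and note that --fail genuinely degrades the array until the disk is re-added:

mdadm /dev/md0 --fail /dev/sdb1      # mark the member faulty; /proc/mdstat then shows [2/1] [U_]
mdadm /dev/md0 --remove /dev/sdb1    # take the faulty member out of the array
mdadm /dev/md0 --add /dev/sdb1       # add it back and let the mirror resync
cat /proc/mdstat                     # watch the recovery progress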


Thanks for this. I guess there is no problem there:


cat /proc/mdstat 
Personalities : [raid1] [raid0] [raid10] [raid6] [raid5] [raid4] 
md0 : active raid1 sdb1[1] sda1[0]
      96344 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md3 : active raid1 sdb5[1] sda5[0]
      1931919224 blocks super 1.0 [2/2] [UU]
      bitmap: 3/15 pages [12KB], 65536KB chunk

md1 : active raid1 sda2[0] sdb2[1]
      1951884 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md2 : active raid1 sda3[0] sdb3[1]
      19534968 blocks super 1.0 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>