Hi,
What are the basic commands one can use to check the health of a RAID1 array? What happens if one disk fails? Will the system trigger an error message, or does the user need to monitor log files continually?
thank you
Check /proc/mdstat for a quick health check.
You can always create a cron job to mail you periodically about the status of the drives.
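As a concrete sketch of such a cron job: a healthy two-disk RAID1 shows [UU] in /proc/mdstat, and a failed member shows up as an underscore (e.g. [U_]). A minimal check can just grep for that. The script path and mail recipient below are assumptions to adapt to your setup:

```shell
#!/bin/sh
# Minimal RAID health check suitable for cron (a sketch, not a polished
# monitor; for real deployments mdadm also has a --monitor mode).
# In /proc/mdstat a healthy two-disk RAID1 shows [UU]; a missing member
# appears as an underscore, e.g. [U_] or [_U].
check_md() {
    file="${1:-/proc/mdstat}"   # accepts an alternate file, handy for testing
    if grep -q '\[U*_[U_]*\]' "$file"; then
        echo "DEGRADED"
    else
        echo "OK"
    fi
}

# Example crontab entry (assumption: local mail is configured):
#   0 * * * * /usr/local/bin/raidcheck.sh | grep -q DEGRADED \
#       && echo "RAID degraded" | mail -s "RAID alert" root
```

The pattern only fires when an underscore appears inside a [...] status field, so bitmap lines like "[0KB]" don't trigger false alarms.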
The reason I asked is that I'm frequently getting this message at boot, which stops the system from booting; I need to press ^D to continue booting, or enter the root password and reboot.
dmesg | grep md3
[ 12.545589] md/raid1:md3: active with 2 out of 2 mirrors
[ 12.558315] created bitmap (15 pages) for device md3
[ 12.568714] md3: bitmap initialized from disk: read 1/1 pages, set 0 of 29479 bits
[ 12.617839] md3: detected capacity change from 0 to 1978285285376
[ 12.688416] md3: unknown partition table
[ 14.869197] udevd[519]: '/sbin/blkid -o udev -p /dev/md3' [961] terminated by signal 15 (Terminated)
==> here it stops booting... ^D to continue to boot or enter root password
[ 31.253200] systemd-fsck[1022]: /dev/md3: clean, 1979/120750080 files, 10850286/482979806 blocks
[ 32.151784] EXT4-fs (md3): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[ 43.288203] EXT4-fs (md3): re-mounted. Opts: acl,user_xattr,commit=0
I already knew about the
cat /proc/mdstat
command, and it doesn't reveal anything out of the ordinary either:
cat /proc/mdstat
Personalities : [raid1] [raid0] [raid10] [raid6] [raid5] [raid4]
md3 : active raid1 sdb5[1] sda5[0]
1931919224 blocks super 1.0 [2/2] [UU]
bitmap: 3/15 pages [12KB], 65536KB chunk
md0 : active raid1 sdb1[1] sda1[0]
96344 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md1 : active raid1 sda2[0] sdb2[1]
1951884 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md2 : active raid1 sda3[0] sdb3[1]
19534968 blocks super 1.0 [2/2] [UU]
bitmap: 1/1 pages [4KB], 65536KB chunk
My system is openSUSE:
cat /etc/issue
Welcome to openSUSE 12.1 "Asparagus" RC 1 - Kernel \r (\l).
For completeness, here are my partition tables for sda and sdb -> md:
# fdisk -l /dev/sda /dev/sdb
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00043930
Device Boot Start End Blocks Id System
/dev/sda1 * 63 192779 96358+ fd Linux raid autodetect
/dev/sda2 192780 4096574 1951897+ fd Linux raid autodetect
/dev/sda3 4096575 43166654 19535040 fd Linux raid autodetect
/dev/sda4 43167744 3907028991 1931930624 f W95 Ext'd (LBA)
/dev/sda5 43169792 3907008511 1931919360 fd Linux raid autodetect
Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0007b252
Device Boot Start End Blocks Id System
/dev/sdb1 * 63 192779 96358+ fd Linux raid autodetect
/dev/sdb2 192780 4096574 1951897+ fd Linux raid autodetect
/dev/sdb3 4096575 43166654 19535040 fd Linux raid autodetect
/dev/sdb4 43167744 3907028991 1931930624 f W95 Ext'd (LBA)
/dev/sdb5 43169792 3907008511 1931919360 fd Linux raid autodetect
Can anyone give me some hints on what is going wrong?
thank you
You can try recreating your config file:
mv /etc/mdadm.conf /etc/mdadm.conf.old
mdadm --examine --scan >> /etc/mdadm.conf
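If you want to double-check that the regenerated file still describes the same arrays as the old one, you can compare the UUIDs in both files. A small sketch (file names are the ones from the mv above; adjust paths to taste):

```shell
#!/bin/sh
# Sketch: extract and sort the ARRAY UUIDs from an mdadm.conf so that the
# old and regenerated files can be compared; the rewrite should not change
# which arrays get assembled, only add metadata/name fields.
conf_uuids() {
    grep -o 'UUID=[0-9a-f:]*' "$1" | sort
}

# Usage:
#   [ "$(conf_uuids mdadm.conf.old)" = "$(conf_uuids mdadm.conf)" ] \
#       && echo "UUIDs match" || echo "UUIDs differ"
```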
I also suspect there might be a bug in blkid, so first update your openSUSE; I noticed it's at RC1.
For comparison
OLD:
# cat mdadm.conf
DEVICE containers partitions
ARRAY /dev/md/0 UUID=3be9cb66:3913cafa:a402a78b:84d5ca9a
ARRAY /dev/md/1 UUID=4ab789d5:54d23a90:b482cf0e:f587b941
ARRAY /dev/md/2 UUID=948af06b:caf993d4:a5887ee1:c7043c39
ARRAY /dev/md/3 UUID=206c6e19:7b14bb75:a351c4e4:c8e77d87
NEW:
# mdadm --examine --scan
ARRAY /dev/md/0 metadata=1.0 UUID=3be9cb66:3913cafa:a402a78b:84d5ca9a name=linux.site:0
ARRAY /dev/md/1 metadata=1.0 UUID=4ab789d5:54d23a90:b482cf0e:f587b941 name=linux.site:1
ARRAY /dev/md/2 metadata=1.0 UUID=948af06b:caf993d4:a5887ee1:c7043c39 name=linux.site:2
ARRAY /dev/md/3 metadata=1.0 UUID=206c6e19:7b14bb75:a351c4e4:c8e77d87 name=linux.site:3
I’ll give it a few reboots and some time to see if this makes any difference…
Still no help. I had my doubts that the above would fix it, but it was worth a try.
There is a little bit more going on:
# dmesg | grep -w md
[ 0.000000] Kernel command line: root=/dev/disk/by-id/md-uuid-948af06b:caf993d4:a5887ee1:c7043c39 resume=/dev/disk/by-id/md-uuid-4ab789d5:54d23a90:b482cf0e:f587b941 splash=silent quiet vga=0x31a
[ 1.422491] PM: Checking hibernation image partition /dev/disk/by-id/md-uuid-4ab789d5:54d23a90:b482cf0e:f587b941
[ 3.772296] md: bind<sdb2>
[ 3.775183] md: bind<sda3>
[ 3.777703] md: bind<sdb3>
[ 3.779887] md: raid1 personality registered for level 1
[ 3.780100] md/raid1:md2: active with 2 out of 2 mirrors
[ 3.791642] md: bind<sda2>
[ 3.793490] md/raid1:md1: active with 2 out of 2 mirrors
[ 4.250403] md: raid0 personality registered for level 0
[ 4.253306] md: raid10 personality registered for level 10
[ 4.952822] md: raid6 personality registered for level 6
[ 4.952824] md: raid5 personality registered for level 5
[ 4.952826] md: raid4 personality registered for level 4
[ 10.053305] md: md0 stopped.
[ 10.057398] md: bind<sdb1>
[ 10.057564] md: bind<sda1>
[ 10.073694] md/raid1:md0: active with 2 out of 2 mirrors
[ 10.320274] boot.md[522]: Starting MD RAID mdadm: /dev/md/0 has been started with 2 drives.
[ 10.609477] md: bind<sda5>
[ 10.936257] systemd[1]: Job fsck@dev-disk-by\x2did-md\x2duuid\x2d206c6e19:7b14bb75:a351c4e4:c8e77d87.service/start failed with result 'dependency'.
[ 11.028990] md: could not open unknown-block(8,21).
[ 11.029060] md: md_import_device returned -16
[ 11.255020] md: bind<sdb5>
[ 11.723384] md/raid1:md3: active with 2 out of 2 mirrors
[ 12.592269] systemd[1]: md.service: control process exited, code=exited status=3
[ 12.688038] systemd[1]: Unit md.service entered failed state.
[ 12.753286] boot.md[994]: Not shutting down MD RAID - reboot/halt scripts do this...missing
[ 23.557887] boot.md[1048]: Starting MD RAID ..done
My last boot failed because of:
[ 11.028990] md: could not open unknown-block(8,21).
However, the RAID seems to be functioning properly after a successful boot, or once I hit ^D. Any other ideas on what to try?
First, did you update your system? Second, does the system boot successfully or not? Third, please paste your entire dmesg.
Yes, the system is up to date. I can boot successfully, but sometimes I need to intervene and press ^D. The RAID looks healthy after boot. Next time I get the error message I will post the full dmesg.
LiLo, when you use the cat /proc/mdstat command, what you are looking for is the ‘[UU]’ status: one ‘U’ per member that is up. If you see an underscore in place of a ‘U’, there is a problem. On my server I see ‘[UUUU]’ for RAID 10. I tested this by removing a drive and then letting the array rebuild itself. It was cool to see the machine boot up successfully without a drive, with all my data still there.
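To go one step beyond eyeballing the output, the "[n/m] [UU...]" fields can be pulled out per array with a little awk. A sketch, assuming the usual /proc/mdstat layout (and note that `mdadm --detail /dev/mdX` gives a much fuller per-array report):

```shell
#!/bin/sh
# Sketch: print a one-line status summary per md array from /proc/mdstat.
# The "blocks" line for each array ends with "[configured/active] [UU...]",
# e.g. "[2/2] [UU]" for a healthy two-disk RAID1.
md_summary() {
    awk '/^md[0-9]/  { dev = $1 }
         / blocks /  { print dev ": " $(NF-1) " " $NF }' "${1:-/proc/mdstat}"
}
```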
Thanks for this. I guess there's no problem there:
cat /proc/mdstat
Personalities : [raid1] [raid0] [raid10] [raid6] [raid5] [raid4]
md0 : active raid1 sdb1[1] sda1[0]
96344 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md3 : active raid1 sdb5[1] sda5[0]
1931919224 blocks super 1.0 [2/2] [UU]
bitmap: 3/15 pages [12KB], 65536KB chunk
md1 : active raid1 sda2[0] sdb2[1]
1951884 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md2 : active raid1 sda3[0] sdb3[1]
19534968 blocks super 1.0 [2/2] [UU]
bitmap: 1/1 pages [4KB], 65536KB chunk
unused devices: <none>