Here at work, the time is fast approaching when I'll be required to put the server I've been working on for the past year into production.
And we all know that when the rubber meets the road, things get a little bit more serious.
I came to the conclusion that since I'm the only Linux person in this company, should the Linux servers ever decide to go *kaput*, there would be only one person to blame, and quite frankly, I never want to deal with that situation.
The servers we have are IBM cheapies, so I didn't have any hardware RAID controllers at my disposal.
I have heard really good things about Linux software RAID (md), so I decided to use the two available hard drive bays as a RAID-1 mirror on Gentoo.
Setting up the RAID was easy. I had that done about 10 months ago.
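I won't rehash the whole setup here, but for the record, each mirror was roughly a one-liner to create, something like this (the device names match my layout, and --create will happily eat a disk, so double-check yours before running it):

mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3

Repeat for each pair of partitions (md1, md2, md3 in my case).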
Well... I hadn't realized until now that no testing had been done on the RAID at all, so in the event of a hard drive failure, I wouldn't even know it had happened.
So, yesterday, I got the mdadm monitoring system in place by putting the following command in a cron job that runs every 15 minutes (the -1 flag is short for --oneshot, which makes mdadm check the arrays once and exit rather than hang around as a daemon): mdadm --monitor --scan -1
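For the curious, the crontab entry looks something like this (the /sbin path is where mdadm lives on my box; yours may differ):

*/15 * * * * /sbin/mdadm --monitor --scan -1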
I also made sure a valid email address was set in the /etc/mdadm.conf file.
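The relevant bit of /etc/mdadm.conf is just the MAILADDR line; mine looks something like this (address changed, of course):

# mdadm --monitor mails its alerts to this address
MAILADDR admin@example.com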
Once I saw that the monitoring system was running (by using the --test option, which mails a test alert for each array it finds), we decided to go to the server and pull a hard drive out while the system was running.
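The test itself is just the same monitor command with --test tacked on, which generates a TestMessage alert for every array so you can confirm the mail actually arrives:

mdadm --monitor --scan -1 --test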
Before I pulled it out, I catted /proc/mdstat, and it gave me the "we're all good" sign: [UU] means both halves of each mirror are up. Here's the output:
JDEV php5 # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]
md2 : active raid1 sdb2[1] sda2[0]
3911744 blocks [2/2] [UU]
md3 : active raid1 sdb3[1] sda3[0]
240179712 blocks [2/2] [UU]
unused devices: <none>
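If you ever want more detail than /proc/mdstat gives, mdadm --detail on an individual array shows the same health information plus the state of each member device:

mdadm --detail /dev/md3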
We pulled the drive, and when I came back, I had two emails waiting for me which told me a hard drive had failed.
The server was still 100% functional and showed no sign of stuttering or stalling whatsoever.
This is what the readout looked like when I catted /proc/mdstat:
JDEV php5 # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]
md2 : active raid1 sdb2[2](F) sda2[0]
3911744 blocks [2/1] [U_]
md3 : active raid1 sdb3[2](F) sda3[0]
240179712 blocks [2/1] [U_]
unused devices: <none>
That told me the sdb device had failed.
There was something interesting, however, which I hadn't anticipated. If you'll look at the first block, the md1 array seemed to be doing just fine.
After a bit of pondering, I figured that since that array wasn't even mounted, nothing had tried to touch it, so it couldn't have known about the removed drive.
As soon as I mounted /dev/md1, the software recognized the missing drive and marked it as failed.
So, at that point all three arrays told me the /dev/sdb disk was having problems.
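As an aside, I suspect I could have skipped the mount trick and flagged the partition by hand with something like the command below, though I didn't try it, and mdadm may grumble if the /dev/sdb1 node has already vanished along with the drive:

mdadm /dev/md1 --fail /dev/sdb1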
In order to get rid of the failed status and get the system ready to accept a new hard drive, I issued the following commands:
JDEV dev # mdadm /dev/md3 -r detached
mdadm: hot removed 8:19
JDEV dev # mdadm /dev/md1 --remove detached
mdadm: hot removed 8:17
JDEV dev # mdadm /dev/md2 -r detached
mdadm: hot removed 8:18
(-r and --remove are the same thing; the detached keyword tells mdadm to remove any device that is no longer physically connected to the system.)
After issuing those commands, /proc/mdstat looked like this:
JDEV dev # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[0]
104320 blocks [2/1] [U_]
md2 : active raid1 sda2[0]
3911744 blocks [2/1] [U_]
md3 : active raid1 sda3[0]
240179712 blocks [2/1] [U_]
unused devices: <none>
So now there are no mirrored partitions at all; each array is running on a single disk.
After that, I put the removed hard drive back in (the drive wasn't really bad, so I just reused the same one).
As soon as I put it back in, its device nodes reappeared in /dev, which told me the system had recognized the hot-plugged drive.
At this point, if it were a new, unformatted disk, I would have to partition it to match the existing drive's partition layout.
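My likely approach would be to clone the partition table from the surviving drive with sfdisk, something like this (sda is the healthy drive, sdb the replacement; get those backwards and you'll wipe the good disk, so triple-check):

sfdisk -d /dev/sda | sfdisk /dev/sdb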
Then, I issued the following commands to re-add the drive's partitions to their respective mirrors:
JDEV dev # mdadm /dev/md3 --add /dev/sdb3
mdadm: re-added /dev/sdb3
JDEV dev # mdadm /dev/md2 --add /dev/sdb2
mdadm: re-added /dev/sdb2
JDEV dev # mdadm /dev/md1 --add /dev/sdb1
mdadm: re-added /dev/sdb1
And all looked good! After syncing for about 30 minutes, this is what /proc/mdstat looked like:
JDEV dev # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[2] sda1[0]
104320 blocks [2/1] [U_]
resync=DELAYED
md2 : active raid1 sdb2[2] sda2[0]
3911744 blocks [2/1] [U_]
resync=DELAYED
md3 : active raid1 sdb3[2] sda3[0]
240179712 blocks [2/1] [U_]
[=======>.............] recovery = 35.5% (85384448/240179712) finish=44.4min speed=58045K/sec
unused devices: <none>
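If you want to watch the rebuild tick along in real time, something like this does the trick:

watch -n 5 cat /proc/mdstat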
And that, my friend, is pretty darned slick.
Linux software RAID-1 can handle a failed disk, removing the disk, adding a new one, and syncing the two, all without rebooting the machine or having any downtime at all.