RAID (redundant array of
independent disks; originally redundant array of inexpensive disks) is
a way of storing the same data in different places (thus, redundantly) on
multiple hard disks. By placing data
on multiple disks, I/O (input/output) operations can overlap in a
balanced way, improving performance. Because an array of several disks has a lower mean
time between failures (MTBF) than a single disk, storing data redundantly is what gives the array its fault tolerance.
A RAID appears to the operating system to be a single logical hard
disk. RAID employs the technique of disk striping, which involves partitioning each
drive's storage space into units ranging from a sector (512 bytes) up to
several megabytes. The stripes of all the disks are interleaved and addressed
in order.
In a single-user system where large records, such as
medical or other scientific images, are stored, the stripes are typically set
up to be small (perhaps 512 bytes) so that a single record spans all disks
and can be accessed quickly by reading all disks at the same time.
In a multi-user system, better performance requires
establishing a stripe wide enough to hold the typical or maximum size record.
This allows overlapped disk I/O across drives.
There are at least nine types of RAID, plus a
non-redundant array (RAID-0).
- Disk striping
In computers that use
multiple hard disk systems, disk striping is the process of dividing a body
of data into blocks and spreading the data blocks across several partitions
on several hard disks. Each stripe is the size of the
smallest partition. For example, if three partitions are selected, one of
150 MB, another of 100 MB, and the third of 50 MB,
each stripe will be 50 MB in size. It is wise to create partitions of equal
size to avoid wasting disk space. Each stripe created is part of the
stripe set. Disk striping is used with a redundant array of independent disks
(RAID), a storage system that uses multiple disks to store and
distribute data. Up to 32 hard disks can be used with disk striping.
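The sizing rule in the example above can be sketched in a few lines. This is only an illustration of the arithmetic; the function name `stripe_set_usage` is hypothetical and does not belong to any RAID tool.

```python
def stripe_set_usage(partition_sizes_mb):
    """Return (stripe_mb, usable_mb, wasted_mb) for a stripe set.

    Assumes, as described in the text, that every member contributes only
    as much space as the smallest partition in the set.
    """
    if not partition_sizes_mb:
        raise ValueError("a stripe set needs at least one partition")
    stripe_mb = min(partition_sizes_mb)
    usable_mb = stripe_mb * len(partition_sizes_mb)
    wasted_mb = sum(partition_sizes_mb) - usable_mb
    return stripe_mb, usable_mb, wasted_mb


# The example from the text: partitions of 150 MB, 100 MB and 50 MB.
print(stripe_set_usage([150, 100, 50]))  # (50, 150, 150)
```

With equal partitions the wasted figure drops to zero, which is why equal sizes are recommended.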
There are two types of disk striping: single-user and multi-user. Single-user disk
striping allows multiple hard disks to simultaneously service multiple I/O requests from a single workstation.
Multi-user disk striping allows multiple I/O requests from several
workstations to be sent to multiple hard disks. This means that while one
hard disk is servicing a request from a workstation, another hard disk is
handling a separate request from a different workstation.
Disk striping can be used with or without parity. When disk striping is used with parity,
an additional stripe that contains the parity information is stored on its
own partition and hard disk. If a hard disk fails, a fault tolerance driver
makes the lost partition invisible, allowing reading and writing operations to
continue, which provides time to create a new stripe set. Once a hard disk
fails, the stripe set is no longer fault tolerant: if another hard disk
fails after the first one, the stripe set is lost. Disk
striping without parity provides no fault tolerance. The disk striping
process is used in conjunction with software that lets the user know when a disk
has failed. This software also allows the user to define the size of the
stripes, the color assigned to the stripe set for recognition and diagnosis,
and whether parity is used.
- What does RAID stand for?
In 1987, Patterson, Gibson and
Katz at the University
of California, Berkeley,
published a paper entitled "A Case for Redundant Arrays of Inexpensive
Disks (RAID)". This paper described various types of disk arrays,
referred to by the acronym RAID. The basic idea of RAID was to combine multiple
small, inexpensive disk drives into an array of disk drives which yields
performance exceeding that of a Single Large Expensive Drive (SLED).
Additionally, this array of drives appears to the computer as a single logical
storage unit or drive.
The Mean Time Between Failures
(MTBF) of the array will be equal to the MTBF of an individual drive, divided
by the number of drives in the array. Because of this, the MTBF of an array of
drives would be too low for many application requirements. However, disk arrays
can be made fault-tolerant by redundantly storing information in various ways.
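The MTBF relationship above is simple enough to check numerically. The sketch below only restates that division; the function name and the 500,000-hour drive figure are assumptions chosen for illustration, not values from the Berkeley paper.

```python
def array_mtbf_hours(drive_mtbf_hours, n_drives):
    """MTBF of a non-redundant array: single-drive MTBF divided by drive count."""
    if n_drives < 1:
        raise ValueError("an array needs at least one drive")
    return drive_mtbf_hours / n_drives


# Assumed example: drives rated at 500,000 hours each.
for n in (1, 2, 5, 10):
    print(n, "drives:", array_mtbf_hours(500_000, n), "hours")
```

Five such drives give an array MTBF of 100,000 hours, one fifth of a single drive, which is why redundancy is needed.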
Five types of array architectures,
RAID-1 through RAID-5, were defined by the Berkeley paper, each providing disk
fault-tolerance and each offering different trade-offs in features and
performance. In addition to these five redundant array architectures, it has
become popular to refer to a non-redundant array of disk drives as a RAID-0
array.
- Data Striping
Fundamental to RAID is
"striping", a method of concatenating multiple drives into one
logical storage unit. Striping involves partitioning each drive's storage space
into stripes which may be as small as one sector (512 bytes) or as large as
several megabytes. These stripes are then interleaved round-robin, so that the
combined space is composed alternately of stripes from each drive. In effect,
the storage space of the drives is shuffled like a deck of cards. The type of
application environment, I/O or data intensive, determines whether large or
small stripes should be used.
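The round-robin interleaving described above can be written as a small address calculation. This is a minimal sketch under simple assumptions (all drives the same size, stripes counted in whole blocks); `locate_block` and its parameter names are hypothetical.

```python
def locate_block(logical_block, n_drives, blocks_per_stripe):
    """Map a logical block number to (drive index, block number on that drive).

    Stripes are dealt out to the drives in turn, like cards from a deck:
    stripe 0 goes to drive 0, stripe 1 to drive 1, and so on, wrapping around.
    """
    stripe_index, offset_in_stripe = divmod(logical_block, blocks_per_stripe)
    row, drive = divmod(stripe_index, n_drives)
    return drive, row * blocks_per_stripe + offset_in_stripe


# Three drives, four blocks per stripe.
for block in (0, 4, 8, 12, 13):
    print(block, "->", locate_block(block, n_drives=3, blocks_per_stripe=4))
```

Logical block 12 lands back on drive 0, in that drive's second stripe.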
Most multi-user operating
systems today, like NT, Unix and Netware, support overlapped disk I/O operations
across multiple drives. However, in order to maximize throughput for the disk
subsystem, the I/O load must be balanced across all the drives so that each
drive can be kept busy as much as possible. In a multiple drive system without
striping, the disk I/O load is never perfectly balanced. Some drives will
contain data files which are frequently accessed and some drives will only
rarely be accessed. In I/O intensive environments, performance is optimized by
striping the drives in the array with stripes large enough so that each record
potentially falls entirely within one stripe. This ensures that the data and
I/O will be evenly distributed across the array, allowing each drive to work on
a different I/O operation, and thus maximize the number of simultaneous I/O
operations which can be performed by the array.
In data intensive environments
and single-user systems which access large records, small stripes (typically
one 512-byte sector in length) can be used so that each record will span across
all the drives in the array, each drive storing part of the data from the
record. This causes long record accesses to be performed faster, since the data
transfer occurs in parallel on multiple drives. Unfortunately, small stripes
rule out multiple overlapped I/O operations, since each I/O will typically
involve all drives. However, operating systems like DOS, which do not allow
overlapped disk I/O, will not be negatively impacted. Applications such as
on-demand video/audio, medical imaging and data acquisition, which utilize long
record accesses, will achieve optimum performance with small stripe arrays.
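To see why stripe size decides whether I/O operations can overlap, it helps to count which drives a single request touches. The sketch below assumes byte offsets and round-robin striping; `drives_touched` is a hypothetical helper, not part of any real driver.

```python
def drives_touched(offset, length, stripe_size, n_drives):
    """Return the sorted drive indices that a request of `length` bytes
    starting at byte `offset` has to access."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    if last - first + 1 >= n_drives:
        return list(range(n_drives))  # the request wraps around every drive
    return sorted({stripe % n_drives for stripe in range(first, last + 1)})


# An 8 KB record on a four-drive array.
print(drives_touched(0, 8192, stripe_size=512, n_drives=4))    # [0, 1, 2, 3]
print(drives_touched(0, 8192, stripe_size=65536, n_drives=4))  # [0]
```

With 512-byte stripes the record occupies every drive, so no second request can run alongside it. With 64 KB stripes it stays on one drive and leaves the other three free.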
A potential drawback to using
small stripes is that synchronized spindle drives are required in order to keep
performance from being degraded when short records are accessed. Without
synchronized spindles, each drive in the array will be at different random
rotational positions. Since an I/O cannot be completed until every drive has
accessed its part of the record, the drive which takes the longest will
determine when the I/O completes. The more drives in the array, the more the
average access time for the array approaches the worst case single-drive access
time. Synchronized spindles assure that every drive in the array reaches its
data at the same time. The access time of the array will thus be equal to the
average access time of a single drive rather than approaching the worst case
access time.
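The effect of unsynchronized spindles can be estimated with a rough model. The sketch assumes that each drive's rotational delay is independent and uniformly distributed over one rotation, and it ignores seek time. Under that assumption the expected wait for the slowest of N drives is N/(N+1) of a rotation. The function name and the 7200 rpm figure are illustrative assumptions.

```python
def expected_rotational_wait(n_drives, rotation_ms, synchronized):
    """Expected rotational delay (ms) before all drives have reached their data.

    Synchronized spindles behave like one drive: half a rotation on average.
    Unsynchronized drives wait for the slowest one: n / (n + 1) of a rotation.
    """
    if n_drives < 1:
        raise ValueError("need at least one drive")
    if synchronized:
        return rotation_ms / 2
    return rotation_ms * n_drives / (n_drives + 1)


rotation_ms = 60_000 / 7200  # one rotation at an assumed 7200 rpm
for n in (1, 2, 4, 8):
    print(n, "drives:",
          round(expected_rotational_wait(n, rotation_ms, synchronized=False), 2), "ms unsynchronized,",
          round(expected_rotational_wait(n, rotation_ms, synchronized=True), 2), "ms synchronized")
```

As drives are added, the unsynchronized figure climbs toward a full rotation (the worst case), while the synchronized figure stays at half a rotation.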
- The different RAID levels
RAID-0
RAID Level 0 is not redundant, hence
does not truly fit the "RAID" acronym. In level 0, data is split
across drives, resulting in higher data throughput. Since no redundant
information is stored, performance is very good, but the failure of any disk in
the array results in data loss. This level is commonly referred to as striping.
RAID-1
RAID Level 1 provides redundancy by
writing all data to two or more drives. The performance of a level 1 array
tends to be faster on reads and slower on writes compared to a single drive,
but if either drive fails, no data is lost. This is a good entry-level redundant
system, since only two drives are required; however, since one drive is used to
store a duplicate of the data, the cost per megabyte is high. This level is
commonly referred to as mirroring.
RAID-2
RAID Level 2, which uses Hamming
error correction codes, is intended for use with drives which do not have
built-in error detection. All SCSI drives support built-in error detection, so
this level is of little use when using SCSI drives.
RAID-3
RAID Level 3 stripes data at a byte
level across several drives, with parity stored on one drive. It is otherwise
similar to level 4. Byte-level striping requires hardware support for efficient
use.
RAID-4
RAID Level 4 stripes data at a block
level across several drives, with parity stored on one drive. The parity
information allows recovery from the failure of any single drive. The
performance of a level 4 array is very good for reads (the same as level 0).
Writes, however, require that parity data be updated each time. This slows
small random writes, in particular, though large writes or sequential writes
are fairly fast. Because only one drive in the array stores redundant data, the
cost per megabyte of a level 4 array can be fairly low.
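How a single parity drive allows recovery, and why small writes are slow, can be shown with a short sketch. It assumes the usual bytewise XOR parity, which the text above does not spell out; the helper names `xor_blocks` and `updated_parity` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a sequence of equal-length byte blocks together."""
    blocks = list(blocks)
    if not blocks:
        raise ValueError("need at least one block")
    if any(len(b) != len(blocks[0]) for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def updated_parity(old_parity, old_data, new_data):
    """Parity after a small write: needs the old data and the old parity,
    which is the read-modify-write cost of small random writes."""
    return xor_blocks([old_parity, old_data, new_data])


# One stripe row across three data drives, plus its parity block.
data = [b"RAID", b"disk", b"test"]
parity = xor_blocks(data)

# Drive 1 fails: XOR of the surviving blocks and the parity rebuilds its contents.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'disk'
```

The same XOR that builds the parity block also rebuilds a lost block, which is why exactly one drive failure can be tolerated.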
RAID-5
RAID Level 5 is similar to level 4,
but distributes parity among the drives. This can speed small writes in
multiprocessing systems, since the parity disk does not become a bottleneck.
Because parity data must be skipped on each drive during reads, however, the
performance for reads tends to be considerably lower than a level 4 array. The
cost per megabyte is the same as for level 4.
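Distributing parity means the parity block moves to a different drive on each stripe row. Real implementations use several rotation schemes. The sketch below assumes one simple scheme, where parity starts on the last drive and moves one drive to the left per row; the function names are hypothetical.

```python
def parity_drive(stripe_row, n_drives):
    """Index of the drive holding parity for a stripe row (assumed rotation)."""
    return (n_drives - 1) - (stripe_row % n_drives)


def raid5_row_layout(stripe_row, n_drives):
    """Return a list with 'P' for the parity drive and 'D' for data drives."""
    p = parity_drive(stripe_row, n_drives)
    return ["P" if drive == p else "D" for drive in range(n_drives)]


# Four drives, first four stripe rows.
for row in range(4):
    print(row, " ".join(raid5_row_layout(row, 4)))
# 0 D D D P
# 1 D D P D
# 2 D P D D
# 3 P D D D
```

Over any four consecutive rows each drive holds parity exactly once, so parity updates are spread over all drives instead of queuing on one.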
Summary:
- RAID-0 is the fastest and most
efficient array type but offers no fault-tolerance.
- RAID-1 is the array of choice
for performance-critical, fault-tolerant environments. In addition,
RAID-1 is the only choice for fault-tolerance if no more than two drives
are desired.
- RAID-2 is seldom used today
since ECC is embedded in almost all modern disk drives.
- RAID-3 can be used in data
intensive or single-user environments which access long sequential
records to speed up data transfer. However, RAID-3 does not allow
multiple I/O operations to be overlapped and requires
synchronized-spindle drives in order to avoid performance degradation
with short records.
- RAID-4 offers no advantages over
RAID-5 and does not support multiple simultaneous write operations.
- RAID-5 is the best choice in
multi-user environments which are not write performance sensitive.
However, at least three, and more typically five drives are required for
RAID-5 arrays.
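The cost-per-megabyte remarks in the level descriptions come down to how much of the raw capacity holds user data. The sketch below assumes equal-size drives and a minimum of three drives for the parity levels. RAID-2 is left out because its overhead depends on the Hamming code used. `usable_fraction` is a hypothetical name.

```python
def usable_fraction(level, n_drives):
    """Fraction of raw capacity available for data, for equal-size drives."""
    minimum = {0: 2, 1: 2, 3: 3, 4: 3, 5: 3}  # assumed minimum drive counts
    if level not in minimum:
        raise ValueError("level not covered by this sketch")
    if n_drives < minimum[level]:
        raise ValueError("too few drives for this level")
    if level == 0:
        return 1.0                      # no redundancy at all
    if level == 1:
        return 1 / n_drives             # every drive holds a full copy
    return (n_drives - 1) / n_drives    # one drive's worth of parity


for level, n in ((0, 4), (1, 2), (4, 5), (5, 5)):
    print(f"RAID-{level} with {n} drives: {usable_fraction(level, n):.0%} usable")
```

A two-drive mirror keeps 50% of its raw capacity and a five-drive RAID-5 array keeps 80%.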
- Possible approaches to RAID
- Hardware RAID
The hardware-based system manages the RAID subsystem independently from the host and presents to the host only a single disk per RAID array. This way the host doesn't have to be aware of the RAID subsystem(s).
- The controller-based hardware solution
DPT's SCSI controllers are a good example of a controller-based RAID solution.
The intelligent controller manages the RAID subsystem independently from the host. The advantage over an external SCSI---SCSI RAID subsystem is that the controller is able to span the RAID subsystem over multiple SCSI channels and thereby remove the limiting factor external RAID solutions have: the transfer rate over the SCSI bus.
- The external hardware solution (SCSI---SCSI RAID)
An external RAID box moves all RAID handling "intelligence" into a controller that sits in the external disk subsystem. The whole subsystem is connected to the host via a normal SCSI controller and appears to the host as a single disk or multiple disks.
This solution has a drawback compared to the controller-based solution: the single SCSI channel used in this solution creates a bottleneck.
Newer technologies like Fibre Channel can ease this problem, especially if they allow multiple channels to be trunked into a Storage Area Network.
Four SCSI drives can already completely flood a parallel SCSI bus, since the average transfer size is around 4 KB and the command transfer overhead, which even in Ultra SCSI is still done asynchronously, takes most of the bus time.
- Software RAID
- The MD driver in the Linux kernel is an example of a RAID solution that is completely hardware independent.
The Linux MD driver currently supports RAID levels 0/1/4/5 + linear mode.
- Under Solaris you have the Solstice DiskSuite and Veritas Volume Manager, which offer RAID-0/1 and 5.
- Adaptec's AAA-RAID controllers are another example. They have no RAID functionality whatsoever on the controller; they depend on external drivers to provide all RAID functionality.
They are basically only multiple single AHA2940 controllers which have been integrated on one card. Linux detects them as AHA2940 and treats them accordingly.
Every OS needs its own special driver for this type of RAID solution, which is error prone and not very compatible.
- Hardware vs. Software RAID
Just like any other application, software-based arrays occupy host system memory, consume CPU cycles and are operating system dependent. By contending with other applications that are running concurrently for host CPU cycles and memory, software-based arrays degrade overall server performance. Also, unlike hardware-based arrays, the performance of a software-based array is directly dependent on server CPU performance and load.
Except for the array
functionality, hardware-based RAID schemes have very little in common with
software-based implementations. Since the host CPU can execute user
applications while the array adapter's processor simultaneously executes the
array functions, the result is true hardware multi-tasking. Hardware arrays
also do not occupy any host system memory, nor are they operating system
dependent.
Hardware arrays are also highly
fault tolerant. Since the array logic is based in hardware, software is NOT
required to boot. Some software arrays, however, will fail to boot if the boot
drive in the array fails. For example, an array implemented in software can
only be functional when the array software has been read from the disks and is
memory-resident. What happens if the server can't load the array software
because the disk that contains the fault tolerant software has failed?
Software-based implementations commonly require a separate boot drive, which is
NOT included in the array.
- What are the advantages of a multichannel controller?
- Hardware vs. Software caching?