RAID is a set of storage virtualization techniques that distribute data across multiple hard drives to improve performance, fault tolerance, or both. Since its inception, the main feature of RAID has been its ability to combine multiple low-cost storage devices built on common technology into a single array, so that the group offers greater reliability and/or performance at a much lower cost than a single state-of-the-art storage device of equivalent capacity.
Types of RAID
In software RAID, RAID control is provided entirely by a software layer of the operating system. This layer is inserted between the hardware abstraction layer (driver) and the file system layer.
- This is the least expensive method since it does not require any additional equipment.
- Administration is very flexible, since it is done entirely in software.
- Arrays are portable between all machines running the same RAID software (i.e. the same operating system).
- The major drawback is that this method relies on the hardware abstraction layer of the devices that make up the RAID volume. For various reasons, this layer may be imperfect and lack important features such as hardware fault detection and diagnosis and/or support for hot-swapping storage units.
- Managing RAID in software consumes system resources (a little processor time and, above all, system bus bandwidth) that could be used for other purposes. The performance penalty is particularly noticeable in configurations where the system must transfer the same data several times, such as RAID 1, and quite small in non-redundant configurations, typically RAID 0.
- Using software RAID on the system disk is not always possible.
- Most consumer operating systems already support software RAID, including Microsoft Windows, various Linux distributions, and Mac OS X:
  - Microsoft Windows XP (and later) supports software RAID 0 and 1, and can handle RAID 5 with a small modification.
  - Microsoft Windows Server 2003 supports software RAID 0, 1, and 5.
  - Mac OS X supports software RAID 0, RAID 1, and concatenation.
  - The Linux kernel (2.6 and later) supports RAID 0, 1, 4, 5, 6, and 10, and combinations of these modes.
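To make the bus-load point concrete, here is a rough model of how many bytes the host must push over the system bus for one full-stripe write when RAID is managed in software. The function name and its simplifications (no partial-stripe read-modify-write, no caching, no command overhead) are assumptions for illustration, not measurements.

```python
def bus_bytes_per_write(level: str, data_bytes: int, disks: int) -> int:
    """Hypothetical helper: bytes sent over the bus for one full-stripe
    write of `data_bytes` under software RAID (simplified model)."""
    if level == "raid0":
        # Data is split across the disks; each byte is sent exactly once.
        return data_bytes
    if level == "raid1":
        # Every disk in the mirror receives a full copy of the data.
        return data_bytes * disks
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        data_disks = disks - 1
        if data_bytes % data_disks:
            raise ValueError("data_bytes must fill whole stripes")
        # N data blocks plus one parity block of the same size per stripe.
        return data_bytes + data_bytes // data_disks
    raise ValueError(f"unsupported level: {level}")
```

For example, writing 3 MiB to a three-disk RAID 1 sends 9 MiB over the bus, while RAID 0 sends only the original 3 MiB.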
In hardware RAID, a dedicated card or component manages RAID operations. The RAID controller can be internal to the computer (an expansion card) or remote, in a storage array.
A RAID controller usually has its own processor, dedicated memory, and possibly a backup battery, and manages all aspects of the RAID storage system through embedded firmware. From the operating system's perspective, a hardware RAID controller fully virtualizes the storage system: the operating system sees each RAID volume as a single storage volume and is unaware of its physical components.
- Hardware RAID controllers provide fault detection, hot-swapping of failed units, and seamless rebuilding of failed units. (Advanced operating systems can also do this if the hardware supports it.)
- The load on the host system (mainly bus usage) is reduced; this is especially noticeable in configurations with many disks and high redundancy.
- Consistency checks and maintenance diagnostics are performed in the background by the controller without using host system resources.
- Each hardware RAID controller uses its own scheme to manage storage devices. As a result, unlike with software RAID, disks moved from one system to another may not be recoverable unless the RAID controller is exactly the same (including firmware). It is therefore advisable to keep a spare card in case the first one fails. There are exceptions: with RAID 1, for example, each mirrored disk may be usable when mounted on its own in a separate machine, which allows it to be inspected at the hardware level (this must be checked for each hardware configuration).
- Entry-level cards have processors that are much less powerful than those of today's computers, so they can deliver much worse performance than software RAID at a higher price.
- Cost: entry-level cards sell for around $300, while more powerful cards often exceed $1,000.
- The RAID controller is itself a hardware component that can fail, and its firmware may contain bugs, which makes it a potential single point of failure. (That said, the ten-year warranties often offered on this type of hardware encourage manufacturers to write firmware carefully, which reduces this risk in practice.)
- RAID controller manufacturers each provide their own management tools, which differ widely from one another (and are sometimes of uneven quality). In contrast, the software RAID management tools that come with an operating system are generally well integrated into that system.
- The period during which a manufacturer supports a RAID controller (fixing firmware bugs, for example), sometimes cut short by new products that make older ones obsolete, may be shorter or less predictable than an operating system vendor's support for software RAID. The manufacturer may even go out of business, which is rarer among operating system vendors.
- Hardware RAID is less flexible than software RAID, whose abstraction layer can build RAID on top of any type of block device the operating system supports, local or remote (ATA, SCSI, ATA over Ethernet, iSCSI…, and any combination of these). A RAID controller, by contrast, is specialized for a single type of block device.
Levels of RAID
RAID 0: Striping
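RAID 0 is not a true RAID system in terms of redundancy, since it stores no redundant data. It simply combines drives in parallel and splits (stripes) data across them, which improves throughput, but the failure of any single drive loses the whole array.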
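A minimal sketch of how striping places data, assuming a simple layout in which fixed-size chunks of `stripe_blocks` blocks rotate across the disks in order. The function `raid0_locate` and its parameters are hypothetical names chosen for illustration.

```python
def raid0_locate(logical_block: int, disks: int, stripe_blocks: int) -> tuple:
    """Map a logical block number to (disk index, block offset on that disk)."""
    chunk, within = divmod(logical_block, stripe_blocks)  # which chunk, offset inside it
    disk = chunk % disks                                   # chunks rotate over the disks
    row = chunk // disks                                   # full rows of chunks before this one
    return disk, row * stripe_blocks + within
```

With three disks and two-block chunks, logical blocks 0 to 7 land on disks 0, 0, 1, 1, 2, 2, 0, 0. Every disk holds part of every large file, which is why losing one disk loses the array.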
RAID 1: Mirroring
RAID 1 is commonly used. It consists of two or more redundant disks, each of which contains exactly the same data at all times. This solution offers an excellent level of data protection: the array keeps working as long as at least one disk is still functional. Because every disk holds the same data, reads can be spread across the disks; with two disks on independent connections (for example, separate SATA ports), read speed can approach double that of a single disk.
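The mirroring behavior described above can be sketched as a toy model in which each disk is an in-memory dictionary. The `Mirror` class and its methods are hypothetical; a real implementation works on block devices, but the logic is the same: every write goes to all surviving disks, and reads rotate over healthy disks.

```python
from typing import Dict, List, Optional


class Mirror:
    """Toy RAID 1 set: each disk is a dict mapping block number -> bytes."""

    def __init__(self, disks: int) -> None:
        if disks < 2:
            raise ValueError("RAID 1 needs at least two disks")
        self.disks: List[Optional[Dict[int, bytes]]] = [{} for _ in range(disks)]
        self._next = 0  # round-robin pointer used to spread reads

    def fail(self, index: int) -> None:
        self.disks[index] = None  # simulate a dead disk

    def write(self, block: int, data: bytes) -> None:
        alive = [d for d in self.disks if d is not None]
        if not alive:
            raise OSError("all disks have failed")
        for d in alive:  # every surviving disk receives a full copy
            d[block] = data

    def read(self, block: int) -> bytes:
        n = len(self.disks)
        for step in range(n):  # try disks in round-robin order
            i = (self._next + step) % n
            d = self.disks[i]
            if d is not None:
                self._next = (i + 1) % n
                return d[block]
        raise OSError("all disks have failed")
```

After `fail(0)`, reads are still served from the remaining copy; with all disks healthy, the round-robin pointer lets them share the read load.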
RAID 5: Striping with distributed parity information
RAID 5 combines striping with distributed parity. It is an N+1 redundancy scheme: parity is computed on every write and distributed in rotation across the disks, so each stripe consists of N data blocks and one parity block. If one disk in the array fails, each stripe is missing either a data block or its parity block. A missing parity block does not matter, since no data is lost. A missing data block can be recomputed from the N-1 remaining data blocks and the parity block. The integrity of every stripe is therefore preserved: not only does the array keep working, but once the failed disk is replaced, its contents can be rebuilt from the data and parity information on the other disks.
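RAID 5 parity is the bytewise XOR of the data blocks in a stripe, which is why any single missing block can be recomputed from the others. The sketch below shows parity computation, a rotating parity placement, and the rebuild of one lost block. The function names and the particular rotation in `parity_disk` are assumptions for illustration; real controllers and drivers use several different layouts.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (the RAID 5 parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe: int, disks: int) -> int:
    """Disk holding parity for a stripe; parity starts on the last disk and
    moves one disk to the left per stripe (one possible rotation)."""
    return (disks - 1 - stripe) % disks


def rebuild(stripe_blocks, missing: int) -> bytes:
    """Recover the block at index `missing` (data or parity) by XOR-ing
    all other blocks of the same stripe."""
    survivors = [b for i, b in enumerate(stripe_blocks) if i != missing]
    return xor_blocks(survivors)
```

The same `rebuild` call recovers either a data block or the parity block itself, matching the two cases described above.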
RAID 6: Striping with doubly distributed parity information
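RAID 6 is an evolution of RAID 5 that increases fault tolerance by storing two independent redundancy blocks per stripe instead of one, so it can withstand the simultaneous failure of any two disks. (The same approach generalizes to n redundancy blocks tolerating n failures.) The mathematics behind the second redundancy block is considerably more complex than the simple parity used by RAID 5.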
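One common construction is the "P+Q" scheme: P is the ordinary XOR parity, and Q weights each data block by a power of a generator g in the finite field GF(2^8). The sketch below uses polynomial 0x11d and g = 2, the parameters used by Linux's RAID 6 code; other implementations may differ. Recovering two lost data blocks x and y relies on P' = Dx ⊕ Dy and Q' = g^x·Dx ⊕ g^y·Dy, which gives Dx = (Q' ⊕ g^y·P') / (g^x ⊕ g^y). All function names here are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D  # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # The nonzero elements form a group of order 255, so a^254 = a^-1.
    return gf_pow(a, 254)


def pq_parity(data):
    """Return (P, Q): P = XOR of D_i, Q = XOR of g^i * D_i with g = 2."""
    length = len(data[0])
    p, q = bytearray(length), bytearray(length)
    for i, block in enumerate(data):
        coeff = gf_pow(2, i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(data, x: int, y: int, p: bytes, q: bytes):
    """Rebuild data blocks x and y (x != y) from the survivors plus P and Q."""
    # Treat the lost blocks as zeros: they then contribute nothing to P or Q.
    survivors = [bytes(len(p)) if i in (x, y) else b for i, b in enumerate(data)]
    p_s, q_s = pq_parity(survivors)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for k in range(len(p)):
        pxy = p[k] ^ p_s[k]  # = Dx xor Dy
        qxy = q[k] ^ q_s[k]  # = g^x*Dx xor g^y*Dy
        dx[k] = gf_mul(qxy ^ gf_mul(gy, pxy), denom_inv)
        dy[k] = pxy ^ dx[k]
    return bytes(dx), bytes(dy)
```

Losing a single data block needs only P, exactly as in RAID 5; Q is what makes a second simultaneous failure recoverable.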
RAID 10: RAID 0 across multiple RAID 1
RAID 10 produces a striped volume with a good level of reliability, since it is built from mirrored groups. Each mirrored group contains at least two disks, and at least two groups are needed, so at least four storage units are required to create a RAID 1+0 volume.
Its reliability is quite high, since every disk in one mirrored group must fail to bring down the whole array. Rebuilding is also efficient, since it involves only the disks of the affected group rather than the whole array.
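A minimal sketch of nested RAID 1+0, assuming disks are numbered group by group (group g owns disks g·copies through g·copies + copies − 1). Both function names and this numbering are hypothetical; some drivers, such as Linux md's own raid10 personality, use different layouts.

```python
def raid10_locate(logical_block: int, groups: int, copies: int, stripe_blocks: int):
    """Map a logical block to (group, offset within group, disks holding it)."""
    chunk, within = divmod(logical_block, stripe_blocks)
    group = chunk % groups                             # RAID 0 across the groups
    offset = (chunk // groups) * stripe_blocks + within
    disks = [group * copies + c for c in range(copies)]  # RAID 1 inside the group
    return group, offset, disks


def raid10_survives(failed, groups: int, copies: int) -> bool:
    """The array is lost only if every disk of some mirrored group has failed."""
    failed = set(failed)
    return all(
        any(group * copies + c not in failed for c in range(copies))
        for group in range(groups)
    )
```

With four disks in two mirrored pairs, losing disks 0 and 2 is survivable, while losing both disks 0 and 1 (one whole group) is not.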
To enable RAID 1 (software or hardware), your dedicated server needs at least two drives, for example two SSDs or two spinning HDDs, ideally of the same capacity. A redundant RAID array is an essential tool for websites that care about uptime. These days, a dedicated server of this kind can be had for under $100/month, with minor compromises on other specifications.