RAID: Improve your data redundancy and performance

RAID (redundant array of independent disks) is a method of combining multiple physical disk drives into a single logical unit for data storage. The idea is to have a number of disks working together as one large disk to provide data redundancy, better performance, or both.

In 1987, David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley introduced the concept of a Redundant Array of Inexpensive Disks. The design provided large storage capacity by combining many small, cheap disks instead of relying on very expensive, highly reliable ones.

Today the initial concept has shifted a bit: hard disks are far cheaper than in the 1980s, and buying a single 1 TB disk is no more of a problem than buying many smaller ones. That is why RAID is now described as a “Redundant Array of Independent Disks”.

What do we achieve with RAID?

01.

Increased Speed

With many drives working together, read and write speeds increase, which lets you work quickly with large volumes of data.
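One way drives work together is striping, the technique behind RAID 0 in the table further down. The sketch below is only an illustration of the idea, not how any particular RAID driver is written: it maps logical blocks to drives round-robin, so consecutive blocks land on different drives and can be transferred in parallel. The function names and the one-block stripe size are assumptions made for this example.

```python
from typing import List, Tuple


def stripe_location(block_index: int, num_drives: int) -> Tuple[int, int]:
    """Map a logical block to (drive number, block offset on that drive).

    Assumes a stripe unit of exactly one block and round-robin placement.
    """
    if num_drives < 1:
        raise ValueError("num_drives must be at least 1")
    if block_index < 0:
        raise ValueError("block_index must not be negative")
    return block_index % num_drives, block_index // num_drives


def stripe_layout(num_blocks: int, num_drives: int) -> List[List[int]]:
    """Return, for each drive, the list of logical blocks stored on it."""
    drives: List[List[int]] = [[] for _ in range(num_drives)]
    for block in range(num_blocks):
        drive, _offset = stripe_location(block, num_drives)
        drives[drive].append(block)
    return drives


# Eight logical blocks spread over four drives.
for drive_number, blocks in enumerate(stripe_layout(8, 4)):
    print(f"drive {drive_number}: blocks {blocks}")
```

With four drives, blocks 0 to 3 sit on four different drives, so a read of those four blocks can keep every drive busy at once.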

02.

Fault Tolerance and Better Availability

Redundancy achieved with RAID makes the storage system much more reliable. The data is stored on multiple disks, so in most cases when one of them fails, the others preserve your data. In addition, fault tolerance combined with particular RAID features improves availability by allowing recovery from hardware faults without disruption.
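To make the idea of surviving a failed disk concrete, here is a minimal sketch of XOR parity, the kind of redundancy used by the parity-based levels described later (RAID 5, for example). It is a simplified illustration under stated assumptions: three tiny in-memory "drives" of equal size and a single dedicated parity block, whereas real arrays distribute parity across the drives. The helper name `xor_blocks` is hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per (pretend) drive, plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second drive: XOR the survivors with the parity
# block to rebuild the missing data.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

The rebuild works because XOR-ing a value with itself cancels out, so whatever remains after combining the survivors with the parity is exactly the missing block.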

03.

Data Security

Thanks to redundancy, most RAID levels protect the data stored in the array against the loss of a drive.

04.

Increased Capacity

Most RAID levels let you combine a number of smaller drives into one larger array and pool their capacity. How much of the raw capacity remains usable depends on the RAID level, because redundancy takes up part of the space.
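As a rough guide to how capacity combines, the sketch below estimates usable capacity for the standard levels and RAID 10 listed in the table further down. It assumes identical disks, two-way mirrors for RAID 10, and a RAID 1 array that keeps a full copy on every disk; it ignores filesystem and metadata overhead. The function name and its parameters are made up for this example.

```python
def usable_capacity_tb(level: str, num_disks: int, disk_tb: float) -> float:
    """Estimate usable capacity in TB for an array of identical disks."""
    if level == "RAID 0":
        minimum, data_disks = 2, num_disks        # no redundancy at all
    elif level == "RAID 1":
        minimum, data_disks = 2, 1                # every disk holds a full copy
    elif level == "RAID 5":
        minimum, data_disks = 3, num_disks - 1    # one disk's worth of parity
    elif level == "RAID 6":
        minimum, data_disks = 4, num_disks - 2    # two disks' worth of parity
    elif level == "RAID 10":
        if num_disks % 2 != 0:
            raise ValueError("RAID 10 needs an even number of disks")
        minimum, data_disks = 4, num_disks // 2   # half the disks are mirrors
    else:
        raise ValueError(f"unsupported level: {level}")
    if num_disks < minimum:
        raise ValueError(f"{level} needs at least {minimum} disks")
    return data_disks * disk_tb


for level in ("RAID 0", "RAID 1", "RAID 5", "RAID 6", "RAID 10"):
    print(f"{level}: {usable_capacity_tb(level, 4, 1.0)} TB usable from 4 x 1 TB")
```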

05.

Advanced Performance

All of the RAID benefits above, combined with the specific features of each RAID level, improve the performance of the software that relies on the storage.

RAID can be implemented as:

A layer that abstracts multiple devices providing a single virtual device.
A generic logical volume manager, provided with the majority of server-class operating systems.
A component of the file system (ZFS, Btrfs, etc.).
A layer above any file system providing equal protection to users’ data.

Software and Hardware RAID

These are two approaches in RAID implementation.

A software RAID uses ordinary disk drives and controllers, while the OS presents them to users and applications as a single device.

A hardware RAID uses dedicated hardware to present multiple devices to the operating system as a single device.

A hardware RAID is more expensive (due to the extra hardware that you need to purchase), typically faster, and usually more robust. Some hardware RAID controllers support replacing failed drives without powering down. Conversely, the cheaper software RAID puts extra load on the host computer, which can hurt performance: the host has to process the data before it is written to disk in order to determine where each piece of data should go.

A software RAID is more likely to experience data corruption than a hardware RAID, usually because of a fault in the RAID software or driver being used. A software RAID can also suffer when the host computer is heavily loaded, which can delay some pieces of data by a small amount of time. These delays can add up and negate the benefits of the RAID array to some degree.

Which one is better? There is no single answer. Compare their pros and cons using the following points to decide which one is best suited to your project:

The hardware RAID requires specialized hardware to handle the drives, whereas the software RAID works “virtually”.
The software RAID is much cheaper than the hardware RAID.
The software RAID takes up a portion of the host processor.
The hardware RAID offers better reliability than the software RAID.

RAID Level	Description
Standard RAID levels
RAID 0	This RAID level is based on striping and doesn’t provide fault tolerance, but it increases the system’s performance (high read and write speed). A minimum of 2 disks is required, and all the data in a RAID 0 array is lost if one drive fails. You can add more drives to RAID 0 to increase its performance further, but the risk of failure increases as well.
RAID 1	RAID 1 uses mirroring and does not use striping. Read performance is improved since both disks can be read simultaneously, while write performance is the same as for a single disk. With two disks, this level tolerates the loss of no more than one disk.
RAID 5	RAID 5 uses striping and parity. The parity information is distributed across all drives, allowing the array to keep functioning even if one drive fails. The array’s architecture allows read and write operations to span multiple drives, which gives better performance than a single drive, but not as high as that of a RAID 0 array. RAID 5 requires at least three disks, although five or more are often recommended for good performance. Write performance of RAID 5 is relatively poor because of the extra time required to write parity data.
RAID 6	Similar to RAID 5, but uses a second parity function. The additional parity allows the array to continue functioning even if two disks fail simultaneously. The read speed is the same as in RAID 5. However, this extra protection means a higher cost per gigabyte and often slower write performance than RAID 5 arrays.
Nested RAID levels
RAID 10	It offers maximum performance without compromising redundancy. Based on the combination of striping and mirroring techniques, this RAID level combines RAID 0 performance and RAID 1 fault tolerance. It requires a minimum of 4 disks and only half of the disk space is usable due to mirroring.
RAID 50	A minimum of 6 disks is required for RAID 50. RAID 50 couples RAID 5 distributed parity with RAID 0 striping. It improves upon the performance of RAID 5, particularly for writes, and provides better fault tolerance than a single RAID 5 array. You may lose up to 33% of total raw capacity, depending on how you create your volumes.
RAID 60	RAID 60 requires a minimum of 8 disks and provides very high levels of availability since you can lose two disks in each RAID 6 array and remain functional. It is rather expensive. RAID 60 can result in capacity overhead, and it also carries a hefty write penalty.
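The RAID 0 row notes that adding drives raises the risk of failure. A back-of-the-envelope sketch of that trade-off follows. It rests on strong assumptions that are not from the table: every drive fails independently with the same probability over some period, and the 3% figure is purely hypothetical. It also ignores rebuild time and correlated failures, so treat the numbers as an illustration of the trend rather than a reliability model.

```python
def raid0_loss_probability(num_drives: int, p_drive: float) -> float:
    """Chance of losing a RAID 0 array: any single drive failure is fatal."""
    if num_drives < 2:
        raise ValueError("RAID 0 needs at least 2 drives")
    if not 0.0 <= p_drive <= 1.0:
        raise ValueError("p_drive must be between 0 and 1")
    return 1.0 - (1.0 - p_drive) ** num_drives


def raid1_loss_probability(num_drives: int, p_drive: float) -> float:
    """Chance of losing a RAID 1 mirror: every copy has to fail."""
    if num_drives < 2:
        raise ValueError("RAID 1 needs at least 2 drives")
    if not 0.0 <= p_drive <= 1.0:
        raise ValueError("p_drive must be between 0 and 1")
    return p_drive ** num_drives


P_DRIVE = 0.03  # hypothetical per-drive failure probability, for illustration

for n in (2, 4, 8):
    print(f"RAID 0 with {n} drives: {raid0_loss_probability(n, P_DRIVE):.4f}")
print(f"RAID 1 with 2 drives: {raid1_loss_probability(2, P_DRIVE):.4f}")
```

Under these assumptions, a two-drive RAID 0 array is lost with probability of about 0.059, roughly double the single-drive figure, while a two-drive mirror is lost only when both drives fail, at about 0.0009.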

Different RAID levels have their advantages and disadvantages, but their value for fast, secure, and redundant data storage is hard to dispute. If you want to lose data less often, gain more storage space and flexibility, and get to your data more quickly, do not hesitate to use RAID technology.
