1

I have a workstation with four 2 TB SAS disks. I do data analysis with Python on Linux.

Is it better to use RAID 0, or install Linux on one disk and simply mount the other hard disks? I have several terabytes of data that I use frequently. I know that RAID could sometimes fail, and in that situation, I will not be able to access the data using RAID 0. From this perspective, it seems that mounting other disks without RAID is better. At least part of the data could be accessed.

What are the pros and cons of each choice?

Are there better choices?

3
  • 4
    "RAID could sometimes fail": no, drives fail. If a drive fails in your scenario, you lose either all access to your data (RAID 0) or access to the data on the failed drive (no RAID). So there really is not much difference between RAID 0 and no RAID regarding loss of data (or the inability to even boot if the failed drive is the OS drive). If you need fault tolerance, consider one of the other RAID modes.
    Commented Jul 7 at 23:17
  • 1
    "Is it better" .... for you is not a question we can answer.
    – symcbean
    Commented Jul 7 at 23:25
  • @symcbean - exactly. I have data on RAID0 for convenience (and simplified reuse of disks). If anything crashes then bad luck, but I do not care as I can recreate the data.
    – WoJ
    Commented Jul 8 at 9:32

5 Answers

2

Pros:

  • You have a single volume, so you don't need to decide "I'll put this data on this drive, and that data on that drive" and make sure each group fits within a drive
  • Reads and writes can be faster as they are spread among several disks
  • Compared to other RAID levels, you get the full capacity of the drives

Cons:

  • As soon as one drive goes bad you lose everything

RAID-0 can be useful in some pretty specific scenarios:

  • It's only for temporary storage while you process files (you still have the original inputs somewhere else, and you can regenerate the data at will)
  • You have redundancy at some other layer (e.g. you have several mirrored nodes, so if you lose any of them it doesn't matter)
  • It's read-only data and you have a backup
5

All storage media fail, whether in arrays or not. One of the jobs of an IT person is to know what to do when that happens, guided by business continuity objectives. Depending on the situation, that means one of three things: if the disk is part of an array that keeps going, just replace the disk; otherwise restore from backup, or rebuild and recover from the loss.

The zero in RAID 0 is the number of redundant drives that can fail before data loss. (Note this is a half-serious joke based on what number happened to be associated with a weird not-redundant variant. RAID level numbers don't really encode any meaning; read the documentation of your actual array.) In other words, all drives must work for you to get at your data, which makes the array less reliable than a single drive. Do not use RAID 0. You could simply buy one faster or bigger drive and not bother with an array. Or build an array that can survive a loss.
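To put a rough number on "less reliable than just one", here is a minimal sketch. It assumes every disk fails independently with the same probability over the period you care about, which real disks from the same batch may not do. The 2% figure and the function name are made up for illustration, not measured values.

```python
def stripe_loss_probability(disk_failure_prob: float, disks: int) -> float:
    """Chance that a RAID 0 stripe loses data during a given period.

    Assumes each disk fails independently with the same probability.
    The stripe is lost as soon as any one disk fails.
    """
    if not 0.0 <= disk_failure_prob <= 1.0:
        raise ValueError("disk_failure_prob must be between 0 and 1")
    if disks < 1:
        raise ValueError("disks must be at least 1")
    # The stripe survives only if every disk survives.
    return 1.0 - (1.0 - disk_failure_prob) ** disks


# Hypothetical 2% chance that a single disk fails during the period.
ASSUMED_DISK_FAILURE_PROB = 0.02

for n in (1, 2, 4):
    risk = stripe_loss_probability(ASSUMED_DISK_FAILURE_PROB, n)
    print(f"{n} disk(s) striped: {risk:.2%} chance of losing the volume")
```

Under those assumptions a four-disk stripe is close to four times as likely to be lost as a single disk, and the loss takes everything with it.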

Not bothering with an array might be reasonable if taking hours or longer to recover is acceptable, for example on a workstation where you process data in scratch space. Start simple, with one file system per drive. Losing a drive might mean losing a significant number of files, which costs processing time and maybe restores from backup: annoying, but maybe recoverable. Yet this loss of productivity is still expensive.

When hours of downtime due to disk failure is just too long, redundant arrays keep on going. For example, with 4 or more disks in a RAID 6, any two disks can fail and you still have the data, an amazing trick. This comes at the cost of a little complexity, performance, and capacity. But be sure to replace the failed disk!
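As a back-of-the-envelope illustration of what tolerating two failures buys, the sketch below counts how likely it is that more disks fail than the array can absorb. It is a simplified binomial model with loud assumptions: independent failures, the same hypothetical 2% failure probability per disk, and no replacement or rebuild during the period. The function name is illustrative only.

```python
from math import comb


def array_loss_probability(disk_failure_prob: float, disks: int,
                           tolerated_failures: int) -> float:
    """Chance that more than `tolerated_failures` of `disks` fail.

    Simplified model: independent failures, no disk is replaced
    during the period, rebuild time is ignored.
    """
    if not 0.0 <= disk_failure_prob <= 1.0:
        raise ValueError("disk_failure_prob must be between 0 and 1")
    if disks < 1 or tolerated_failures < 0:
        raise ValueError("need at least 1 disk and tolerated_failures >= 0")
    p = disk_failure_prob
    # Sum the binomial tail: exactly k failures, for every k that is too many.
    return sum(
        comb(disks, k) * p**k * (1.0 - p) ** (disks - k)
        for k in range(tolerated_failures + 1, disks + 1)
    )


# Four disks, hypothetical 2% failure probability per disk.
print(f"RAID 0 (tolerates 0): {array_loss_probability(0.02, 4, 0):.4%}")
print(f"RAID 6 (tolerates 2): {array_loss_probability(0.02, 4, 2):.4%}")
```

Replacing a failed disk promptly matters because this model gets worse the longer the array runs degraded.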

Operating system versus data volumes is somewhat a matter of preference. Separating the OS complicates storage management but could be useful: you can upgrade the OS or move data around without touching the other. This could be done with the OS on a small internal disk separate from the data disks, or with the OS on a small LVM logical volume, leaving space for data LVs.

2
  • 1
    "The zero in RAID 0 is the number of redundant drives that can fail before data loss" while this is true here, let's make it clear that this is not a general rule: a RAID-5 array still only allows a single disk to fail.
    – jcaron
    Commented Jul 8 at 9:03
  • 1
    Edited to explain the joke about RAID 0
    Commented Jul 8 at 16:15
4

The technology you are going to use depends on the workload.

If that's just for scratch storage, such as cache (where the loss of a device is not critical since it causes only some extra data fetches and not a failure to service anything) or temp (where the data can be quickly recalculated), and there is a requirement to have a single huge fast scratch storage device, RAID0 will probably do it. I would say this is the only use of RAID0 that I can think of. But that's still a strange requirement. Investigate whether multiple individual storage devices can be used for scratch, and use SSDs for that; they are a far more sensible choice for scratch storage.

Media dies; it's not an "if" but a "when" question. The system that was interfacing with the failed media will show errors and won't be able to serve its purpose until someone stops and fixes it, which incurs downtime for the service. RAID nowadays is the solution to precisely this single problem: to eliminate (or reduce) downtime caused by the malfunction of individual storage media. Notice it's not about data loss: RAID is not a backup. Consequently, RAID0 is nonsense, as it can't save you from such downtime. So, if you don't have any peculiar requirements, you would never need RAID0.

3
  • There is also the case of reusing several smaller disks in a simple way, when the data is recoverable or can be recreated.
    – WoJ
    Commented Jul 8 at 9:33
  • @WoJ we're here for reasonable IT management practices, and your suggestion doesn't fit into this category
    Commented Jul 8 at 10:15
  • "we're here for reasonable IT management practices": reuse of disks is a very reasonable IT management practice. There are plenty of cases where the cost of data recovery is negligible and you care about costs.
    – WoJ
    Commented Jul 8 at 13:52
3

If you have on-site backups, you can enjoy striping, aka RAID0, provided you have some sort of hardware RAID controller and your OS can boot from a RAID0 LUN.

1

For your workstation with four 2TB SAS disks, RAID 0 offers improved performance by striping data across all disks, but if one disk fails, all data is lost. Installing Linux on one disk and mounting the others separately provides better fault tolerance, allowing you to access data on the remaining disks if one fails, though you lose the performance benefits of RAID 0. Consider RAID 1 (mirroring) or RAID 5 (striping with parity) for a balance between performance, capacity, and data protection. RAID 1 halves your usable space but provides redundancy, while RAID 5 offers better performance and redundancy but requires at least three disks.
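The capacity side of that trade-off is simple arithmetic, sketched below for identical disks. A few assumptions: with four disks, mirroring is modelled as mirrored pairs (RAID 10), file system overhead is ignored, and the layout names are labels chosen for this example rather than anything a particular tool expects.

```python
def usable_capacity_tb(layout: str, disks: int, disk_tb: float) -> float:
    """Usable capacity in TB for a few common layouts of identical disks."""
    if disks < 1:
        raise ValueError("need at least one disk")
    if layout in ("separate", "raid0"):
        # No redundancy: every byte of every disk is usable.
        return disks * disk_tb
    if layout == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("raid10 needs an even number of disks, at least 4")
        # Every block is stored twice.
        return disks / 2 * disk_tb
    if layout == "raid5":
        if disks < 3:
            raise ValueError("raid5 needs at least 3 disks")
        # One disk's worth of space goes to parity.
        return (disks - 1) * disk_tb
    if layout == "raid6":
        if disks < 4:
            raise ValueError("raid6 needs at least 4 disks")
        # Two disks' worth of space goes to parity.
        return (disks - 2) * disk_tb
    raise ValueError(f"unknown layout: {layout}")


# The workstation from the question: four 2 TB disks.
for layout in ("separate", "raid0", "raid10", "raid5", "raid6"):
    print(f"{layout:>8}: {usable_capacity_tb(layout, 4, 2.0):.0f} TB usable")
```

For four 2 TB disks this gives 8 TB for separate disks or RAID 0, 6 TB for RAID 5, and 4 TB for either mirrored pairs or RAID 6.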

2
  • "RAID 5 offers better performance". I don't think so.
    – Greg Askew
    Commented Jul 8 at 12:36
  • Don't even think about RAID5 with 2TB rotating disks.
    Commented Jul 8 at 17:00
