[PLUG] Raving mad RAID

John Jason Jordan johnxj at gmx.com
Tue Feb 2 08:09:05 UTC 2021


On Mon, 1 Feb 2021 23:48:03 -0800
Ben Koenig <techkoenig at gmail.com> dijo:

>On 2/1/21 11:35 PM, John Jason Jordan wrote:
>> On Mon, 1 Feb 2021 22:15:12 -0800
>> Ben Koenig <techkoenig at gmail.com> dijo:
>>
>>>> Perhaps now would be the time to dig out those old emails and
>>>> consider some of the native alternatives rejected in favor of
>>>> RAID0.
>>> Unfortunately it looks like RAID might not be the culprit if his
>>> NVMe /dev nodes are moving around. RAID0 isn't the cause but it
>>> will make things more complicated when something fails further down
>>> in the stack.
>>>
>>> If his system is dynamically naming devices in /dev/nvme* then that
>>> needs to be dealt with before even thinking about RAID. Not really
>>> sure where to start looking at that off the top of my head since I
>>> was under the assumption that this wasn't supposed to happen with
>>> NVMe.
>> There was recently a bit of discussion about LVM, and Rich sent me
>> some links. I tried to read and understand it, but it seemed even
>> more complicated than RAID. Plus, several years ago, when I was
>> using Fedora, one of their obligatory updates changed my setup to
>> LVM (without telling me that it was going to do so), and I couldn't
>> get rid of it. That left a bad taste in my mouth and I have always
>> avoided LVM ever since. But I must admit that my dislike of LVM is
>> pure bias without much science.
>>
>> I am more concerned about devices renaming themselves and changing
>> how they are mounted, all without any input from me. About January
>> 20 I lost the first array that I had been running without a problem
>> for about a month. And now my re-creation of that array is playing
>> up after only a week. As I mentioned before, after rebooting the
>> drives appear fine, read-write, but when I launched Ktorrent it
>> complained that about half of the files it was seeding were missing.
>> The files are all there and I can do anything I want to with them,
>> but something is screwy with access. And why just half of the files?
>> Either they should all work or they should all fail.
>
>That's what seems so odd. A defective drive wouldn't actually change
>the way things are enumerated. You have 4 drives, one of those would
>disappear and the others would stay the same (for the most part).

My understanding is that, since it is RAID0, if one drive fails the
whole array fails. (But that's why this array is backed up to a NAS.)
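(Assuming the array was built with mdadm, its state after a reboot can be checked with the commands below; the array name /dev/md0 is an assumption, not something confirmed in this thread.)

```shell
# Kernel's view of all md arrays and which member devices they see
cat /proc/mdstat

# Detailed state of one array: "State : clean" vs "inactive",
# plus which /dev/nvme* nodes are currently members
sudo mdadm --detail /dev/md0

# Cross-check the members under their persistent names, which do
# not change when /dev/nvmeX enumeration order changes
ls -l /dev/disk/by-id/nvme-*
```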

>A simple test to help everyone here understand what your machine is
>doing would be to run through a few reboots and grab the list of
>devices, like so

I will do these things in the morning. It's too late and my brain is
going into shutdown mode.

But I should add one more thought: I have a RAID0 array on the Synology
NAS, and another on a Mediasonic enclosure with two WS drives. Both
have worked flawlessly for about four years. I've never had to mess
with the arrays.

And one more thought: I'd consider LVM instead of RAID0. But whatever
system I set up, I need the four 7.68TB NVMe drives to appear as one
big-ass 31TB drive.
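(For what it's worth, LVM can present the four drives as one volume: a striped logical volume behaves like RAID0 for throughput, and LVM tracks its members by internal UUID, so /dev/nvmeX nodes moving around does not break the volume group. A minimal sketch, assuming the drives show up as /dev/nvme0n1 through /dev/nvme3n1 -- those names are hypothetical, and these commands destroy existing data on the drives.)

```shell
# Mark each NVMe drive as an LVM physical volume
sudo pvcreate /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

# Pool all four into one volume group
sudo vgcreate bigvg /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

# One logical volume striped across all four drives (RAID0-like),
# using all available space
sudo lvcreate -i 4 -l 100%FREE -n bigvol bigvg

# Put a filesystem on it; from here it mounts like one big drive
sudo mkfs.ext4 /dev/bigvg/bigvol
```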

Now it's bedtime. :)

>1) unplug your TB-3 drives and reboot.
>
>2) record the output of 'ls -l /dev/nvme*' here
>
>3) turn the computer off
>
>4) plug in the TB-3 drives
>
>5) turn the computer on and run 'ls -l /dev/nvme*' again.
>
>
>This will clearly isolate the device nodes for your enclosure
>independently of everything else on your computer. Once we have the
>drives isolated, it's trivial to watch them for irregular behavior.
>Until we have more confidence in the existence of your /dev/nvme nodes
>we can ignore the other symptoms.
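(Whichever way the array is built, mounting by filesystem UUID rather than by a /dev/nvme* or /dev/md* node name sidesteps the renaming problem entirely, since the UUID lives in the filesystem itself. A sketch -- the device name and the UUID below are placeholders:)

```shell
# Print the filesystem UUID of the assembled array or volume
sudo blkid /dev/md0

# /etc/fstab entry keyed on that UUID instead of a device node
# (substitute the value blkid actually prints for the x's)
# UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/array  ext4  defaults,nofail  0  2
```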



