Picked up Timothy Zahn's new Thrawn Alliances book so am re-reading its predecessor then the new one.
He is a fantastic writer. Great book(s).
@travisdh1 said in Safe to have a 48TB Windows volume?:
@PhlipElder said in Safe to have a 48TB Windows volume?:
@scottalanmiller said in Safe to have a 48TB Windows volume?:
@PhlipElder said in Safe to have a 48TB Windows volume?:
The RAID 10 failure was icing on the cake. Not an emotional reaction, just one that falls in line with what we've experienced failure-wise across the board.
What math did you use to make a single, very unusual RAID 10 failure lead you to something riskier?
How can it be non-emotional unless your discovery was that data loss simply didn't affect you and increasing risk was okay to save money on needing fewer disks?
The statistics for double disk failures. A rebuild throws extra stress on a RAID 10 drive's mirror partner, and that puts extra risk on the table.
Yet we know that RAID 5/6 rebuild adds stress to every drive in the array instead of just a single drive.
Concur, but the stress is a lot more distributed.
@FATeknollogee said in Safe to have a 48TB Windows volume?:
@PhlipElder said in Safe to have a 48TB Windows volume?:
(BUE failed us, so we moved to StorageCraft ShadowProtect, which has been flawless to date).
What is BUE?
Sorry, I should have broken the acronym out. It's Backup Exec: at one time by Colorado, when it was an awesome product, then by Symantec, when things went downhill from there.
@scottalanmiller said in Safe to have a 48TB Windows volume?:
@PhlipElder said in Safe to have a 48TB Windows volume?:
The RAID 10 failure was icing on the cake. Not an emotional reaction, just one that falls in line with what we've experienced failure-wise across the board.
What math did you use to make a single, very unusual RAID 10 failure lead you to something riskier?
How can it be non-emotional unless your discovery was that data loss simply didn't affect you and increasing risk was okay to save money on needing fewer disks?
The statistics for double disk failures. A rebuild throws extra stress on a RAID 10 drive's mirror partner, and that puts extra risk on the table. The number of times the same thing happened in a RAID 1 setting was also a factor.
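For a rough sense of scale on that rebuild-window risk, here is a back-of-envelope sketch; the annual failure rate and rebuild window below are assumed illustrative values, not measurements from anyone's fleet:

# Back-of-envelope only: the AFR and rebuild window are assumed values.
$afr          = 0.02          # assumed annual failure rate per drive
$rebuildHours = 24            # assumed rebuild window
$hoursPerYear = 365 * 24

# Chance the surviving mirror partner dies during the rebuild window (RAID 10 case).
$partnerRisk = $afr * ($rebuildHours / $hoursPerYear)

# For RAID 5, any of the remaining drives failing during the rebuild kills the array;
# e.g. 7 survivors in an 8-disk set.
$survivors = 7
$raid5Risk = 1 - [math]::Pow(1 - $partnerRisk, $survivors)

"RAID 10 mirror partner loss during rebuild: {0:P4}" -f $partnerRisk
"RAID 5 second drive loss during rebuild   : {0:P4}" -f $raid5Risk

# Neither number models the extra stress a rebuild puts on the surviving drive(s),
# which is exactly the effect being debated above.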
@scottalanmiller said in Safe to have a 48TB Windows volume?:
@PhlipElder said in Safe to have a 48TB Windows volume?:
Parallel rebuild into free pool space rebuilds that dead disk across all members in the pool. So, 2TB gets done in minutes instead of hours or days. Plus, once that dead drive's contents are rebuilt into free pool space, so long as there is more free pool space to be had (size of disk + ~150GB), another disk failure can happen and the pool still maintains two-disk resilience (Dual Parity or 3-Way Mirror).
RAID can't do that for us.
Absolutely, this is a huge reason why RAIN has been replacing RAID for a long time. We've had that for many years. Large capacity is making RAID simply ineffective, no surprises there. "Shuffling" data around as needed is a powerful tool.
Technically, RAID can do this, but does it very poorly. It's a feature of hybrid RAID.
We're seeing the same thing in solid-state now too. As SSD vendors deliver larger and larger capacity devices, the write speeds all of a sudden become a limiting factor. Go figure. :S
@scottalanmiller said in Safe to have a 48TB Windows volume?:
With lots of double disk failures, the real thing you need to be looking at is the disks that you have or the environment that they are in. RAID 5 carries huge risk, but it shouldn't primarily be from double disk failures. The fact that that is what led you away from RAID 5 should have been a red flag that something else was wrong. Double disk failure can happen to anyone, of course, but lots of them indicate a trend that isn't RAID related.
One was environment. The site had the HVAC above the ceiling tiles all messed up, with primary paths not capped. So the air return did not work: the A/C output in the summer stayed above the ceiling tiles, and the heat in the winter did as well. The server closet during the winter could easily hit 40C. There were no more circuits available anywhere in the leased space, so we couldn't even get a portable A/C in there.
We experienced four, count them, four catastrophic failures at that site. The owners knew why, but we were helpless against it. So we built out a highly available system using two servers, third-party products, and a really good backup set (BUE failed us, so we moved to StorageCraft ShadowProtect, which has been flawless to date).
There's statistics. Then there's d*mned statistics.
@scottalanmiller said in Safe to have a 48TB Windows volume?:
@PhlipElder said in Safe to have a 48TB Windows volume?:
@scottalanmiller said in Safe to have a 48TB Windows volume?:
@PhlipElder said in Safe to have a 48TB Windows volume?:
@scottalanmiller said in Safe to have a 48TB Windows volume?:
Some examples of things we have math to tell us are good or bad...
RAID 10 .... we've done massive empirical studies. We know that the RAID systems themselves are insanely reliable.
Cheap SAN like the P2000 .... we know that by collecting anecdotes, and knowing total sales figures, that the failure rates of those observed alone is too high for the entire existing set of products made, and we can safely assume that the number we have not observed is vastly higher. But observation alone tells us that the reliability is not high enough for any production use.
We lost an entire virtualization platform and had to recover from scratch because the second member of a RAID 10 pair failed after the first was replaced and a rebuild had initiated. We'll stick with RAID 6, thanks.
EDIT: The on-site IT and I were well into our coffee chat when the spontaneous beep/beep happened and we were both, WTF?
See, that's an irrational, emotional reaction that we are trying to avoid. You have one anecdote that tells you nothing, but you make a decision based on it that goes against math and empirical studies. Why?
The fact that you and possibly your org have actually studied things is important to the discussion.
I've published about it and speak about it all the time. The study was massive. And took forever. As you can imagine.
One of the reasons we adopted Storage Spaces as a platform was the auto-retire and rebuild into free pool space via parallel rebuild. With 2TB and larger drives becoming all the more common at that time, rebuild times on the RAID controller were getting very long.
Parallel rebuild into free pool space rebuilds that dead disk across all members in the pool. So, 2TB gets done in minutes instead of hours or days. Plus, once that dead drive's contents are rebuilt into free pool space, so long as there is more free pool space to be had (size of disk + ~150GB), another disk failure can happen and the pool still maintains two-disk resilience (Dual Parity or 3-Way Mirror).
RAID can't do that for us.
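For anyone who wants to see the moving parts, a minimal sketch of the retire-and-repair flow in Storage Spaces; the pool, virtual disk, and physical disk names are placeholders, and in practice auto-retire kicks this off on its own:

# Placeholders: the pool, virtual disk, and physical disk names are examples only.
# Mark the failing disk as retired so the pool stops allocating to it.
Set-PhysicalDisk -FriendlyName "PhysicalDisk7" -Usage Retired

# Kick off the repair; the missing slabs get rebuilt into free pool space
# across all remaining pool members in parallel.
Repair-VirtualDisk -FriendlyName "VD01"

# Watch the parallel rebuild progress.
Get-StorageJob

# Once healthy, pull the dead disk out of the pool.
Remove-PhysicalDisk -StoragePoolFriendlyName "Pool01" -PhysicalDisks (Get-PhysicalDisk -FriendlyName "PhysicalDisk7")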
@scottalanmiller said in Safe to have a 48TB Windows volume?:
@PhlipElder said in Safe to have a 48TB Windows volume?:
@scottalanmiller said in Safe to have a 48TB Windows volume?:
Some examples of things we have math to tell us are good or bad...
RAID 10 .... we've done massive empirical studies. We know that the RAID systems themselves are insanely reliable.
Cheap SAN like the P2000 .... we know that by collecting anecdotes, and knowing total sales figures, that the failure rates of those observed alone is too high for the entire existing set of products made, and we can safely assume that the number we have not observed is vastly higher. But observation alone tells us that the reliability is not high enough for any production use.
We lost an entire virtualization platform and had to recover from scratch because the second member of a RAID 10 pair failed after the first was replaced and a rebuild had initiated. We'll stick with RAID 6, thanks.
EDIT: The on-site IT and I were well into our coffee chat when the spontaneous beep/beep happened and we were both, WTF?
See, that's an irrational, emotional reaction that we are trying to avoid. You have one anecdote that tells you nothing, but you make a decision based on it that goes against math and empirical studies. Why?
The fact that you and possibly your org have actually studied things is important to the discussion.
We've had enough double disk failures over time to have influenced the decision to drop RAID 5. The RAID 10 failure was icing on the cake. Not an emotional reaction, just one that falls in line with what we've experienced failure-wise across the board.
@scottalanmiller said in Safe to have a 48TB Windows volume?:
Some examples of things we have math to tell us are good or bad...
RAID 10 .... we've done massive empirical studies. We know that the RAID systems themselves are insanely reliable.
Cheap SAN like the P2000 .... we know that by collecting anecdotes, and knowing total sales figures, that the failure rates of those observed alone is too high for the entire existing set of products made, and we can safely assume that the number we have not observed is vastly higher. But observation alone tells us that the reliability is not high enough for any production use.
We lost an entire virtualization platform and had to recover from scratch because the second member of a RAID 10 pair failed after the first was replaced and a rebuild had initiated. We'll stick with RAID 6, thanks.
EDIT: The on-site IT and I were well into our coffee chat when the spontaneous beep/beep happened and we were both, WTF?
@scottalanmiller said in Safe to have a 48TB Windows volume?:
@PhlipElder said in Safe to have a 48TB Windows volume?:
@scottalanmiller said in Safe to have a 48TB Windows volume?:
@Obsolesce said in Safe to have a 48TB Windows volume?:
@scottalanmiller said in Safe to have a 48TB Windows volume?:
@Obsolesce said in Safe to have a 48TB Windows volume?:
@jim9500 said in Safe to have a 48TB Windows volume?:
Have any of you used 48TB Windows volumes? Any resources on risk analysis vs ZFS?
I have two that are close to 60 TB. But they are ReFS and hold a lot of large virtual disks.
ReFS on 2019 is what I would wait for, for bare file storage.
Are you on 2019 now or looking to move off of a Windows file server?
ReFS has a bad track record. It's got a future, but it has been pretty lacking and presents a bit of risk. Microsoft has had a disastrous track record with storage recently; even if ReFS is supposed to get brought to production levels with 2019, 2019 is questionably production ready. Remember... data loss is why it was pulled out of production in the first place.
It's been great in my experience. Though, I am using it in such a way that the risk is worth the benefits... replication and backup repositories. It's been 100% solid. And like I said, it's all huge files stored on it, and probably not the use case that you've seen result in data loss. I haven't seen that anywhere, so I'm only taking your word for it unless you have links for me to do some reading. Not dumb stuff from Tom's or whatever, but reputable scenarios in correct use cases.
The problem with storage is that we expect durability of something like seven nines as a "minimum" for being production ready. That means that no matter how many people have "good experiences" with it, that tells us nothing. It's the people having issues with it that matter. And ReFS lacks the stability, safety, and recoverability necessary for it to be considered production ready as a baseline by normal people.
But even systems that lose data 90% of the time work perfectly for 10% of people.
The problem I have with this perspective is that some of us have more direct contact with folks who have had their SAN storage blow up on them, but nothing gets seen in public. One that does come to mind is the Australian Government's very public SAN blow-out a few years ago.
There is no solution out there that's perfect. None. Nada. Zippo. Zilch.
All solutions blow up, have failures, lose data, and outright stop working.
Thus, in my mind, citing uptime, reliability, or any other such statistic is a moot point. It's essentially useless.
Not at all. Reliability stats are SUPER important. There's a ton of value. When we are dealing with systems expecting durability like this, those stats tell us a wealth of information. You can't dismiss the only data we have on reliability. It's far from useless.
Backblaze is probably the only vendor I can think of that has told the drive vendors to take a flying leap and published what I consider to be real reliability statistics.
There are vendors, VMware for vSAN and Nutanix come to mind, that have specific NDAs in place that block any mention of their product's reliability and performance.
Drive vendors also have a similar clause in place, but note the Backblaze exception.
Other than Backblaze's, the only reliability statistics that I consider reliable are the ones we have based on all of the solution sets we've built, deployed, or worked with over the years. Those numbers tell a pretty good story. But so, too, do the statistics that come about as a result of the aforementioned panicked phone call.
Anything else in the public sphere has about the same weight as CRN, PCMag, Consumer Reports, or any other marketing fluff outlet.
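For context on why individual "it works for me" reports don't move the needle either way at the durability levels being claimed, a quick illustrative calculation; the fleet size is an assumption picked purely for the arithmetic:

# Illustrative only: the fleet size is an assumed number, not a vendor figure.
$durability = 0.9999999       # "seven nines" of durability per year
$fleet      = 100000          # assumed number of deployed systems

# Expected data-loss events per year across the whole fleet at that target.
$expectedLosses = $fleet * (1 - $durability)
"Expected loss events per year across the fleet: {0:N4}" -f $expectedLosses

# Roughly 0.01 events per year: thousands of happy users look exactly the same
# whether the product meets the target or misses it by orders of magnitude,
# so only the failure reports carry real information.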
@scottalanmiller said in Safe to have a 48TB Windows volume?:
@Obsolesce said in Safe to have a 48TB Windows volume?:
@scottalanmiller said in Safe to have a 48TB Windows volume?:
@Obsolesce said in Safe to have a 48TB Windows volume?:
@jim9500 said in Safe to have a 48TB Windows volume?:
Have any of you used 48TB Windows volumes? Any resources on risk analysis vs ZFS?
I have two that are close to 60 TB. But they are ReFS and hold a lot of large virtual disks.
ReFS on 2019 is what I would wait for, for bare file storage.
Are you on 2019 now or looking to move off of a Windows file server?
ReFS has a bad track record. It's got a future, but it has been pretty lacking and presents a bit of risk. Microsoft has had a disastrous track record with storage recently; even if ReFS is supposed to get brought to production levels with 2019, 2019 is questionably production ready. Remember... data loss is why it was pulled out of production in the first place.
It's been great in my experience. Though, I am using it in such a way that the risk is worth the benefits... replication and backup repositories. It's been 100% solid. And like I said, it's all huge files stored on it, and probably not the use case that you've seen result in data loss. I haven't seen that anywhere, so I'm only taking your word for it unless you have links for me to do some reading. Not dumb stuff from Tom's or whatever, but reputable scenarios in correct use cases.
The problem with storage is that we expect durability of something like seven nines as a "minimum" for being production ready. That means that no matter how many people have "good experiences" with it, that tells us nothing. It's the people having issues with it that matter. And ReFS lacks the stability, safety, and recoverability necessary for it to be considered production ready as a baseline by normal people.
But even systems that lose data 90% of the time work perfectly for 10% of people.
The problem I have with this perspective is that some of us have more direct contact with folks who have had their SAN storage blow up on them, but nothing gets seen in public. One that does come to mind is the Australian Government's very public SAN blow-out a few years ago.
There is no solution out there that's perfect. None. Nada. Zippo. Zilch.
All solutions blow up, have failures, lose data, and outright stop working.
Thus, in my mind, citing uptime, reliability, or any other such statistic is a moot point. It's essentially useless.
The reality for me is this, and maybe my perspective is coloured by the fact that I've been on so many calls over the years with the other end at their wit's end over a solution that has blown up on them: no amount of marketing fluff promoting a product as five nines or whatever has an ounce/milligram of credibility to stand on. None.
The only answer that has any value to me at this point is this: Are the backups being test-restored to bare metal or a bare hypervisor? Has your hyper-scale whatever been tested to fail over without data loss?
The answer to the first question is a percentage I'm interested in and could probably guess. We all know the answer to the second question, as there have been many public cloud data-loss situations over the years.
[/PONTIFICATION]
@Obsolesce said in Safe to have a 48TB Windows volume?:
@jim9500 said in Safe to have a 48TB Windows volume?:
Have any of you used 48TB Windows volumes? Any resources on risk analysis vs ZFS?
I have two that are close to 60 TB. But they are ReFS and hold a lot of large virtual disks.
ReFS on 2019 is what I would wait for, for bare file storage.
Are you on 2019 now or looking to move off of a Windows file server?
ReFS is supported for production workloads on Storage Spaces Direct and Storage Spaces. With the Server 2019 ReFS generation, Microsoft has relented to some degree and stated that ReFS can be done on SAN, but for archival purposes only. No workloads on SAN. Period.
There are a lot of features within ReFS that need to reach a lot deeper into the storage stack, thus the Storage Spaces/Storage Spaces Direct requirement.
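As a minimal sketch of what that supported path looks like, assuming a Storage Spaces pool is already in place; the pool name, volume label, resiliency, and size are placeholder examples:

# Placeholders: the pool name, volume label, resiliency, and size are examples only.
# Create a mirrored ReFS volume directly on a Storage Spaces pool, which is where
# ReFS gets the deeper integrity and repair hooks mentioned above.
New-Volume -StoragePoolFriendlyName "Pool01" `
           -FriendlyName "Data01" `
           -FileSystem ReFS `
           -ResiliencySettingName Mirror `
           -Size 10TB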
@scottalanmiller said in Safe to have a 48TB Windows volume?:
@DustinB3403 said in Safe to have a 48TB Windows volume?:
Doesn't NTFS have a limit of 16TB per volume?
NTFS volume limit is 256TB in older systems.
NTFS has an 8PB volume limit in modern ones.
The one caveat to NTFS volumes as far as size goes is the 64TB limit for Volume Shadow Copy snapshots. A lot of products, backup software especially, rely on VSS.
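For reference, the 256TB and 8PB figures above come from NTFS's roughly 2^32 cluster ceiling multiplied by a 64KB or 2MB cluster size. And here is a quick way to spot volumes that have already crossed the 64TB VSS line (just a reporting query, nothing vendor specific):

# Report NTFS volumes already past the 64TB Volume Shadow Copy ceiling.
Get-Volume |
    Where-Object { $_.FileSystem -eq 'NTFS' -and $_.Size -gt 64TB } |
    Select-Object DriveLetter, FileSystemLabel,
                  @{ Name = 'SizeTB'; Expression = { [math]::Round($_.Size / 1TB, 1) } }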
@Donahue said in Hyper-V teaming worth it for LACP?:
Yeah, I think I need to learn PowerShell. I probably rely too much on GUIs.
Same fees, a tenth of the time.
@scottalanmiller said in Safe to have a 48TB Windows volume?:
@PhlipElder said in Safe to have a 48TB Windows volume?:
@jim9500 said in Safe to have a 48TB Windows volume?:
It seems like I remember Scott Miller talking about combining enterprise hardware + SAS/SATA Controller + Linux for storage requirements vs proprietary hardware raid controller.
@Donahue - Yes. I have a similar setup offsite backup several miles away for disaster recovery / hardware failure etc. I know raid != backups.
What's the air gap to protect against an encryption event, if any?
LOL. I like that term. "Encryption Event"
It implies, quite correctly, that many of those problems are not exactly malware. Many are just bad system design.
Indeed. We've "heard" of cloud vendors that have lost both their own and their tenants' environments due to an encryption event, which implies improper setup and procedures.
As far as the backup server pulling the data onto itself goes, one needs to make sure no credentials are saved anywhere. All it takes is one lazy tech doing so and the baddies are in. Rotating that password regularly would help to stem that.
Gostev (Veeam) has a regular newsletter and mentioned that keeping the backup server offline, having it fire up to do its pulls and then shut itself back down again once done, would be one way of maintaining an air gap.
EDIT: Setting that "Cannot Save Credentials" setting for RDS in Local GPMC would work too.
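A minimal sketch of that offline-except-during-the-pull idea; the adapter name and the pull script path are hypothetical placeholders, and this is one way to approximate the approach, not Gostev's exact method:

# Placeholders: the adapter name and the pull script path are hypothetical.
$adapter = "Backup-NIC"

# Surface the backup server on the network only for the pull window.
Enable-NetAdapter -Name $adapter -Confirm:$false

try {
    # Run the pull job here; credentials come from a prompt or a vault,
    # never saved on the box.
    & "C:\Scripts\Run-BackupPull.ps1"
}
finally {
    # Drop back off the wire whether the job succeeded or not.
    Disable-NetAdapter -Name $adapter -Confirm:$false
}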
@jim9500 said in Safe to have a 48TB Windows volume?:
It seems like I remember Scott Miller talking about combining enterprise hardware + SAS/SATA Controller + Linux for storage requirements vs proprietary hardware raid controller.
@Donahue - Yes. I have a similar setup offsite backup several miles away for disaster recovery / hardware failure etc. I know raid != backups.
What's the air gap to protect against an encryption event, if any?
@JaredBusch said in Hyper-V teaming worth it for LACP?:
@Donahue said in Hyper-V teaming worth it for LACP?:
I attempted to make a team on the old host with (2) ports, but I did not use the method that most of the posts on ML use. I created a new vswitch with both ports as members using WAC, but I am not sure if this is the same as what I see in other posts. In WAC, I am able to
WAC did not exist when I wrote my posts.
At this time, I have never even set up WAC for a client.
So I have no idea what the settings are.
We do everything in PowerShell and, to some degree, the native tools. WAC has been low on the to-do list for us.
From what I understand, though, there should be a button or link that one can click to expose the underlying PowerShell that WAC would run when the NEXT button gets clicked?
The default algorithm in Server 2019 is Hyper-V Port for LBFO Teams. There's no reason to tweak anything beyond the "Share with host OS" setting. Choose the ports to be included in the team, bind a virtual switch to the team, and flip the VMs over to the newly created vSwitch.
Takes about 10 seconds to do the above in PowerShell. The link is to a guide for setting up a standalone host using PowerShell.
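For reference, a minimal sketch of that flow in PowerShell; the adapter, team, switch, and VM names are placeholder examples:

# Placeholders: adapter, team, switch, and VM names are examples only.
# Build the LBFO team with the Server 2019 defaults described above.
New-NetLbfoTeam -Name "HostTeam" -TeamMembers "NIC1","NIC2" `
                -TeamingMode SwitchIndependent `
                -LoadBalancingAlgorithm HyperVPort `
                -Confirm:$false

# Bind an external virtual switch to the team and share it with the host OS.
New-VMSwitch -Name "vSwitch-Team" -NetAdapterName "HostTeam" -AllowManagementOS $true

# Flip an existing VM over to the newly created vSwitch.
Connect-VMNetworkAdapter -VMName "VM01" -SwitchName "vSwitch-Team"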
The engine may end up surviving as an "Enterprise Mode" like IE has, I think?
Edge as a browser works well, but with a few showstoppers that killed any further usage for us:
1: Downloads mysteriously won't start or just plain stop for no reason.
2: Edge ate my favourites way too many times.
The containerized Edge, Application Guard I think(?), is a great idea. If Edge were as good as they had hoped, it would provide a fantastic sandbox experience to protect users from drive-by attacks and bad GET commands from e-mail clients.
At least we are not getting stuck with the legacy ActiveX that keeps rearing its head every once in a while because of IE.
The announcement page: Starwood Guest Reservation Database Security Incident Marriott International
My thoughts on the matter, though rather curtailed from what I really want to say due to polite company: Some Thoughts on the Starwood/Marriott Reservations Database Breach