Posts made by StorageNinja

StorageNinja

@tirendir said in Is Most IT Really Corrupt?:

@dashrender In my mind, a bench tech actually does tech work. Like troubleshooting issues that can't be done remotely, like a workstation that won't boot. You're not troubleshooting that from a few hundred miles away very well lol, someone has to put hands on that to do anything with it. It's difficult to maintain cost effectiveness except in particularly good SMBs with multiple spares sitting on site, and it takes more time to do a workstation swap managed by a clueless user versus a legitimate technician of just about any level. If they have any issues at all, it's burning vastly more expensive MSP time at over $100/hour minimum for direct support that isn't covered in a contract.

We did VDI a lot for service delivery. Thin clients REALLY don't die much. Eliminated the need for bench entirely, and spares were cheap to have sitting in each branch office.

StorageNinja

@tirendir said in Is Most IT Really Corrupt?:

@dashrender The fix would have run us some 10s of thousands of dollars, because the Host was set up on RAID5 with four drives prior to my arrival. We had a triple-drive failure on that Host, and the backups they were supposedly managing never actually happened off-site, and the on-site backups, it turned out, were scrambled so badly by the box they were being sent to that they were completely unreadable, meaning that the only fix they could offer was essentially a complete software-side environment rebuild from the ground up, with an on-site Exchange orphaned from its domain in that case.

What's great about this story is that it has nothing to do with an MSP, or an MSP model. In-house IT can just as easily neglect backups, or set everything up in RAID 0 (working at an MSP, we had to clean this up a lot when onboarding clients).

StorageNinja

@tirendir said in Is Most IT Really Corrupt?:

@storageninja I'm saying that if there were enough good MSPs, I would agree with you. But there simply aren't enough of them, which means many of the realistic options for MSPs for most organizations aren't much if any better than the lousy SMB IT Scott's been talking about for a while. Reality doesn't realize a world where good MSPs are available to everyone in practical terms. The good MSPs aren't large enough, plentiful enough, or spread out well enough to service the vast majority of organizations that need good IT services is how it seems to work out ultimately. That's the reason there is so much meh or even poor IT going on, imo.

This is kind of a weird fallacy because MSPs can cover a LOT of ground without a lot of staff (not that there aren't some massive ones like All Covered, as well as tons of medium ones like RoundTower, etc.). I worked for a small one in Houston and we had no problem supporting customers in EMEA (it was easier). We had a local resource deal with end-user support (what little there was left), and all the back-end maintenance stuff was easy to do because our day was their after hours. Having a contract model that literally stops them from adopting stupid things (no PSTs allowed was my favorite one) ended up delivering far more value and transparency for the IT spend than purely in-house.

An MSP doesn't have to be in the same zip code to cover a client. If it does, it is honestly a shitty MSP.

StorageNinja

@scottalanmiller said in Is Most IT Really Corrupt?:

This was an interesting read. I had never really given any thought to corruption in IT beyond the faking it idea. I sure as heck wouldn't feel comfortable working somewhere and not knowing what I'm doing and not even making an effort to learn.

And yet, it's so common that people no longer even realize it is something to question!

As long as you know how to learn quickly, this one is often over-rated. It's the guys who NEVER learn what their job is (20 years in, having the SE come in once a week and make changes to the network for them in exchange for buying Enterasys switches) where it's out of hand.

StorageNinja

@tirendir said in Is Most IT Really Corrupt?:

As far as underhanded or shady purchasing deals with kickbacks, I would agree that IT are pretty uniquely situated to participate in such practices far more so than the vast majority of fields. I'll also agree that SMBs get the short end of things in terms of quality personnel of course, because they don't have the scope of reach for talent recruitment, nor the vast resources that Enterprises typically do, so oftentimes the Enterprises will scoop up much of the best talent before SMBs ever get a chance. Such scenarios obviously would leave the SMBs with far less comparable or adequately capable talent to choose from, forcing them to have to make do with what they have left to select from. Ironically, the biggest issue with SMBs may well be Enterprises gobbling up much of the best talent, perhaps as much as the fact that SMBs may not be great businesses.

A huge issue with SMBs is that they have far less robust hiring practices. SMBs tend to cheap out on background checks for criminal actions (and in some cases even hilariously waste money on drug tests for IT staff, while using the "budget" background check).

I saw an IT director wiretap a board meeting at an SMB once. The guy was a little off, but honestly, I blame management. They wanted to have an IT director slash custom software developer who worked 70 hour weeks with 1 week of vacation a year, and they paid a farcical 100K with no variable comp. If you have such unrealistic compensation requirements, your only option is going to be getting someone who's an idiot or, worse, has some "fun" quirks like ethics or mental health issues.

StorageNinja

@scottalanmiller said in Is Most IT Really Corrupt?:

@tirendir said in Is Most IT Really Corrupt?:

@scottalanmiller Totally agree it's often really obvious in SMB, but to them it's not a question of whether there is theft or not even when they notice it, but whether the individual is worth losing over it. Enterprise minded people have this mistaken idea oftentimes that everyone is replaceable or interchangeable. SMBs don't have the luxury of such a silly notion, so when they detect theft they must weigh relative value where Enterprises often simply don't bother because they seem to think they don't have to.

SMBs definitely have that option. It's an illusion that they do not. SMBs need fewer resources and at a lower level so actually have more ability to replace. What's often approaching impossible for an enterprise might be trivial for an SMB. SMB needs are so often generic and interchangeable compared to enterprise. They have a big advantage here.

Wait, is he saying people in SMBs are not easy to replace? The MSP and SaaS industries continue to make that a less defensible position.

StorageNinja

http://vspeakingpodcast.com Episode 55 - talking about automation and the upcoming hackathon.

I hear @NetworkNerd is leading a team!?!

StorageNinja

@emad-r said in Is Most IT Really Corrupt?:

Taking kick backs from salespeople in order to ensure a sale
It does not happen like that, you just get better treatment from the sales personnel maybe discount for personal purchases, but that's it.

So I JUST finished my yearly ethics training and that one would flag unless...

The discount was fully public, open to everyone at the company, and part of a program that is available to anyone at any company. Example, my company gets 10% off AT&T cell contracts for your personal use. If our Telecom department manager was the ONLY person getting the discount that would be UN-ethical.

I fly Southwest and stay at Marriott, and I earn status while spending company money within the limitations of our corporate travel policy. Now if I was staying at the Ritz Carlton for $1000 a night to get more Marriott points (not allowed by policy when there is a $200 Hilton next to the venue), then I would be getting in trouble.

StorageNinja

@scottalanmiller said in Is Tintri Heading for Pure and Nutanix Territory Financially?:

Depends. Ones like Nutanix with zero plan, that makes no sense and that's why we are concerned here. Others, like Amazon, that are obviously and publicly trading profits today for market growth, but could be profitable at any moment if they wanted to be, make total sense.

But...... they are a cloud software company who makes an HCI appliance that can BACKUP DATA TO THE PUBLIC CLOUD! OMGZ IT'S SO FORWARD CLOUD THINKING! CLOUD CLOUD CLOUD CLOUD! OHHH IT CAN CREATE A CONTAINER ON GOOGLE! OMGZ CLOUDS!

On a serious note, the cloud washing in this industry is awful when someone who makes a hypervisor that doesn't run in/on ANY public clouds can be considered a cloud company.

StorageNinja

@pmoncho said in Is Tintri Heading for Pure and Nutanix Territory Financially?:

I get that they gamble. I am an investor but the mentality to throw good money after bad doesn't make sense to me. Initial stages (GOOGL, AMZN, FB, NFLX, etc) and a few years I understand. Companies going on 10+ years, burn immense amounts of cash, high debt and everyone is paid in stock options are the ones I cannot wrap my head around. Good money after bad IMHO.

Blame regulations making being public such a pain in the rear. Everyone delays it.
Mark Cuban has written quite a bit on why companies are staying private too long.

http://blogmaverick.com/2016/02/04/the-pre-cognitive-anti-trust-violationhow-the-decimation-of-the-ipo-market-has-hurt-the-economy-and-worse/

StorageNinja

@travisdh1 said in Is this server strategy reckless and/or insane?:

Your job description includes talking to people on web forums now doesn't it? Also, when do you stop drinking?

Fly, Drink, Talk. There you go.

No, hanging out on web forums is not my job.
I actually didn't drink that much this weekend (was too hot, working on the beach house).

My day job involves...

Flying to conferences and speaking. I have 11 conference presentations in the next 4 weeks. Crowd size is 200-800.
Flying to fun places and meeting with people. I'll be in India soon meeting with Customers, Partners, and SE's training them and taking questions, and collecting feedback for engineering.
Breaking things. I technically am classified as an R&D employee and have full access to our nightly builds, our BAT private cloud, and a dozen "fully loaded" servers for a lab. I test the new stuff, send feedback through my customer [0] Team, and meet with engineers to capture the subtleties of what's coming out. I don't write the technical publications (core documentation), but I do draft thousands upon thousands of words for design, sizing, and usage guides, plus blogs.
I host a podcast for the lols.

StorageNinja

@kooler HTML5 GUI's you say?

[Screenshot: VMware vCenter vSphere Client (HTML5)]

[Screenshot: ESXi Host Client Fling]

StorageNinja

@travisdh1 My job is to fly, drink, and talk, primarily.

StorageNinja

@matteo-nunziati said in Is this server strategy reckless and/or insane?:

This. I asked the reseller about this feature. They answer: disable ssd cache anyway and use controller cache.
The former is a safer choice while the latter is too new/untested feature...

To be blunt, the reseller doesn't know what they are talking about. Every enterprise SSD in the modern era (using some sort of FTL) uses this design and has for years. They are configured this way even in big enterprise storage arrays with the unique exception of Pure Storage who re-writes their firmware to basically use drives as dumb NAND devices (and then has MASSIVE NVRAM buffers fronting the drives that do the same damn thing at a global level).

Some SDS systems want you to explicitly disable the front cache as it will coalesce data and prevent data proximity optimizations in the actual raw data placement. It also exists as yet another place that data can be lost or corrupted and for systems that want to "own" IO integrity end to end they want to know where stuff is.

Then again, what do I know...

StorageNinja

@fateknollogee said in Is Tintri Heading for Pure and Nutanix Territory Financially?:

@scottalanmiller said in Is Tintri Heading for Pure and Nutanix Territory Financially?:

This stuff worries me significantly in storage vendors...But Nutanix? That's a vendor that could close its doors any day. It's just bleeding cash, has no hope of investment...

Where'd you hear that from??

They've lost over 500 million dollars, and are now public so they can't raise VC cash.
Well over 100 million lost last quarter.

StorageNinja

Another trend in benchmarking is using stuff like HCIBench or VMFleet to test LOTS of workloads. A single worker in a single VM doesn't show what contention looks like at scale.
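To put a rough number on why one worker hides contention, here is a minimal sketch of a closed-loop model, not anything taken from HCIBench or VMFleet. The 0.2 ms service time and the device parallelism of 8 are made-up assumptions, and each worker is assumed to keep exactly one IO outstanding.

```python
def latency_ms(workers, service_ms=0.2, parallelism=8):
    """Mean per-IO latency when `workers` each keep one IO outstanding.

    Toy model (assumed numbers): the device can service `parallelism`
    IOs at once, each taking `service_ms`. Extra IOs wait in a queue.
    """
    in_service = min(workers, parallelism)
    throughput_per_ms = in_service / service_ms
    # Little's law: outstanding IOs = throughput * latency
    return workers / throughput_per_ms


if __name__ == "__main__":
    for n in (1, 8, 64, 256):
        print(f"{n:>3} workers -> {latency_ms(n):.2f} ms per IO")
```

In this model latency stays flat until the workers outnumber what the device can service in parallel, then grows linearly with the worker count, which is exactly the region a single-worker test never reaches.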

StorageNinja

@matteo-nunziati said in Is this server strategy reckless and/or insane?:

@storageninja ok smartphone here. Will be ultrashort.

0- really enlightening thank you!

1- I was thinking about a simple layout for bench: os, RAID controller, disks no hypervisor, no apps on system like network fs servers and so on.
I tested the machine w/ Centos with iozone. So my fault: with controller I meant raid controller

2- yes, cache is on disk controller board.

3- so when my raid controller card asks me to disable disk onboard cache, and performance actually drops a lot on ssd, what actually happens? Dram is still alive?

Depends on the vendor and the drive, but I would suspect the DRAM cache is still being used (again, to protect endurance); it's just delaying the ACK until the write gets to the lower level. Now, some enterprise drives that have capacitors (so they can protect that DRAM completely on power loss) will sometimes still ACK a write in DRAM anyway (as nothing really changes, and it's why those drives can post giant performance numbers). On a drive that has full power loss protection built in, benching with the cache disabled is dumb: we don't care what the raw NAND can do, we care what the drive can do under a given load. You're better off in this case, if you want to stress the drives, running two tests:

75% write small block (some drives fall over on mixed workload).
100% sequential write large block (256KB).

Even then, if your workload doesn't look like this (most don't), it's kinda pointless finding the break points of drives. The point of benchmarking is to make sure a system will handle your workload, not to find its break point.
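For anyone who wants to try the two tests above, here is a hedged sketch that generates a job file for fio. fio is my substitution (the thread used iozone), and the 4k block size, random access pattern for the mixed test, queue depth, runtime, and target path are all assumptions to adjust for your own setup. Pointing it at a raw device will destroy whatever is on it.

```python
def fio_job_text(target="/path/to/test/target", runtime_s=300, iodepth=32):
    """Build an fio job file with the two stress tests described above.

    Assumptions (not from the thread): fio as the tool, 4k as "small
    block", random access for the mixed test, and the given depth/runtime.
    """
    lines = [
        "[global]",
        "ioengine=libaio",      # Linux async IO engine
        "direct=1",             # bypass the OS page cache
        "time_based=1",
        f"runtime={runtime_s}",
        f"iodepth={iodepth}",
        f"filename={target}",   # WARNING: writes here are destructive
        "group_reporting=1",
        "",
        "[mixed-75pct-write-small-block]",
        "rw=randrw",
        "rwmixwrite=75",
        "bs=4k",
        "",
        "[sequential-write-large-block]",
        "stonewall",            # wait for the previous job to finish first
        "rw=write",
        "bs=256k",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(fio_job_text())
```

Save the output to a file and pass that file to fio; the `stonewall` line keeps the two tests from running at the same time and polluting each other's numbers.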

Most people screw up and accidentally test a cupcake (a DRAM cache for reads somewhere), or try to break it with an unrealistic workload. Outside of engineering labs for SSD drives and storage products, there isn't a lot of use to this.

Another thing to note is that you can capture and replay an existing workload using vSCSI trace capture and one of the VMware storage flings. You can even "accelerate" it or duplicate it several times over. This helps you know what your REAL workload will look like on a platform.

StorageNinja

@matteo-nunziati said in Is this server strategy reckless and/or insane?:

@storageninja said in Is this server strategy reckless and/or insane?

There are a lot of caches. There are controller caches, and there are DRAM caches inside of drives (you can't disable this on SSDs, and can only sometimes turn them off on SATA magnetic drives and others). Some SDS systems use one tier of NAND as a write cache also; some do read/write caches.

I was aware of only 3 levels of cache: os, controller, disk.
Disk cache can be disabled according to my controller. Is the cache you are talking about another level of the disk?

It's technically inside the disk controller (you can't put DRAM on a platter!).

Enough magnetic SATA drives ignore the command, and their performance is lousy enough, that VMware stopped certifying them on the vSAN HCL.

As far as SSDs go, they use DRAM to de-amplify writes. If you didn't do this (absorb writes, compress them, compact them), the endurance on TLC would be complete and utter garbage, and the drive vendor would get REALLY annoyed replacing drives that either performed like sludge or were burning out the NAND cells. Some TLC drives will also use an SLC buffer for incoming writes beyond the DRAM buffer (and slide the NAND in and out of SLC mode, retiring it for write-cold data once it can't take the load anymore). SSDs are basically a mini storage array inside of the drive (which is why you see FPGAs and ASICs and 4-core ARM processors on the damn things).
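A toy illustration of the "absorb and compact" part, assuming 4 logical blocks fit in one NAND page and a buffer that flushes when it fills. Real FTLs are far more involved (compression, garbage collection, and SLC staging are all ignored here), so treat the names and numbers as hypothetical.

```python
import math


def nand_page_programs(lbas, buffer_blocks=0, blocks_per_page=4):
    """Count NAND page programs for a stream of logical block writes.

    Toy model: with no buffer, every host write programs a page. With a
    DRAM buffer, overwrites of a pending block are absorbed and distinct
    blocks are packed `blocks_per_page` to a page when the buffer flushes.
    """
    if buffer_blocks == 0:
        return len(lbas)
    programs = 0
    pending = set()
    for lba in lbas:
        pending.add(lba)  # rewriting a pending block costs nothing extra
        if len(pending) == buffer_blocks:
            programs += math.ceil(len(pending) / blocks_per_page)
            pending.clear()
    if pending:  # final flush of whatever is left in DRAM
        programs += math.ceil(len(pending) / blocks_per_page)
    return programs


if __name__ == "__main__":
    hot_blocks = [0, 1, 2, 3] * 100  # same four blocks rewritten 100 times
    print("unbuffered:", nand_page_programs(hot_blocks))
    print("buffered:  ", nand_page_programs(hot_blocks, buffer_blocks=64))
```

With a hot set that fits in the buffer, hundreds of host writes collapse into a handful of page programs, which is the endurance win being described.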

There are also hypervisor caches (Hyper-V has some kind of NTFS DRAM cache; ESXi has CBRC, a deduped RAM cache commonly used for VDI; and VMware vSAN has Client Cache, a data-local DRAM cache). There are application caches too (simple ones like SQL, more complicated ones like PVS's write overflow cache, which risks data loss to give you faster writes and so must be used strategically).

On top of this, there are plenty of places besides the disk where bottlenecks can occur, such as the various IO queues.

The vHBA, the clustered file system locking, weird SMB redirects used by CSV, LUN queues, target queues, server HBA queues (total, and max LUN queue, which are wildly different). Kernel-injected latency can cause IO to back up (with no CPU cycles to process IO, you get storage latency) as well as the inverse (with high disk latency, CPU cycles get stuck waiting on IO!), which can lead to FUN race conditions. Sub-LUN queues (vVols) and even NFS have per-mount queues! IO filters (VAIO!) and guest OS filter driver layers can also add cache or impact performance. Throw in quirks of storage arrays (many don't use more than 1 thread for the array, or for a given feature per LUN or RAID group, like how FLARE ran for ages) and you could have a system that's at 10% CPU load, but, it being a 10-core system, the 1 core that does cache logic is pegged and causing horrible latency.
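The 10% example is just arithmetic, but it is worth seeing how an average hides a pegged core. A minimal sketch with made-up per-core numbers; the 95% "pegged" threshold is my own assumption.

```python
def cpu_summary(core_busy_pct, pegged_threshold=95.0):
    """Return (average load, busiest core load, indexes of pegged cores)."""
    average = sum(core_busy_pct) / len(core_busy_pct)
    pegged = [i for i, pct in enumerate(core_busy_pct) if pct >= pegged_threshold]
    return average, max(core_busy_pct), pegged


if __name__ == "__main__":
    # Hypothetical 10-core controller: one core runs the cache logic flat out.
    cores = [100.0] + [0.0] * 9
    average, busiest, pegged = cpu_summary(cores)
    print(f"average {average:.0f}%, busiest core {busiest:.0f}%, pegged cores {pegged}")
```

The system-wide average looks idle while the one core every IO has to pass through is saturated, so per-core numbers are the ones to watch.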

You can even have systems that try to dynamically balance these queues to prevent noisy neighbor issues (SIOCv1!)

http://www.yellow-bricks.com/2011/03/04/no-one-likes-queues/

StorageNinja

It also completely ignores what DWPD is often a proxy for: write latency consistency. A drive with 0.3 DWPD MIGHT be good enough for your endurance requirements, but it might also completely implode on performance if all your writes are done within a short period of time and the application has end users accessing it.
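The arithmetic behind that, as a quick sketch. DWPD times capacity gives the rated daily write volume; the 1.92 TB capacity, 5-year warranty, and 1-hour burst window below are assumptions picked only for illustration.

```python
def rated_tbw(dwpd, capacity_tb, warranty_years=5):
    """Total terabytes written implied by a DWPD rating over the warranty."""
    return dwpd * capacity_tb * 365 * warranty_years


def write_rate_mb_s(dwpd, capacity_tb, window_hours=24.0):
    """Sustained MB/s needed if a full day's rated writes land in `window_hours`."""
    daily_bytes = dwpd * capacity_tb * 1e12
    return daily_bytes / (window_hours * 3600) / 1e6


if __name__ == "__main__":
    dwpd, capacity_tb = 0.3, 1.92  # assumed example drive
    print(f"rated endurance: {rated_tbw(dwpd, capacity_tb):.1f} TBW")
    print(f"spread over 24h: {write_rate_mb_s(dwpd, capacity_tb):.1f} MB/s")
    print(f"packed into 1h:  {write_rate_mb_s(dwpd, capacity_tb, 1):.1f} MB/s")
```

The same daily volume that is trivial when spread across a day becomes a sustained write load when it all arrives in one batch window, and that is when a low-endurance drive's latency tends to fall apart.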

StorageNinja

@matteo-nunziati said in Is this server strategy reckless and/or insane?:

About bench. I've made some tests with my new server before deployment. Disabling controller and disk cache helped a lot understanding real perf of disks.
I've seen sata ssd x4 raid5 outperform 15k sas x4 raid 10.
Enabling cache at controller level blends things, even with big files making benches a bit more blurry.

Running benchmarks is a dark art. Especially with Cache.

Some workloads are cache friendly (So a hybrid system of DRAM or NAND cache and magnetic drives will work the same as an "all flash").
There are a lot of caches. There are controller caches, and there are DRAM caches inside of drives (you can't disable this on SSDs, and can only sometimes turn them off on SATA magnetic drives and others). Some SDS systems use one tier of NAND as a write cache also; some do read/write caches.
Trying to maximize drives when testing them for Throughput or IOPS is different than trying to profile steady state latency under low queue depth.
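To see why a cache-friendly test can make a hybrid box look like all flash, here is a rough sketch. It assumes uniformly random reads over the working set, a warm cache, and invented latencies of 0.1 ms for a cache hit and 8 ms for a magnetic disk read.

```python
def hit_ratio(working_set_gb, cache_gb):
    """Fraction of reads served from cache (uniform random access, warm cache)."""
    return min(1.0, cache_gb / working_set_gb)


def mean_read_latency_ms(working_set_gb, cache_gb, cache_ms=0.1, disk_ms=8.0):
    """Blend cache and disk latency by the hit ratio (assumed latencies)."""
    hits = hit_ratio(working_set_gb, cache_gb)
    return hits * cache_ms + (1.0 - hits) * disk_ms


if __name__ == "__main__":
    cache_gb = 100
    for working_set_gb in (10, 100, 1000):
        latency = mean_read_latency_ms(working_set_gb, cache_gb)
        print(f"{working_set_gb:>5} GB working set -> {latency:.2f} ms")
```

A benchmark whose test file fits inside the cache reports the cache latency and nothing else, so size the test data to match the working set you actually expect.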

99% of people I talk to who are testing something are doing something fairly terrible that doesn't test what they want. They are using Crystal Disk or some desktop-class tool to test a single disk, on a single vHBA, on a single VM that's touching only a handful of the disks or a single cache device.

For benchmarking on VMware vSAN with HCIBench, there is now a cloud analytics platform that will diagnose whether you are properly creating a workload and configuration that is truly trying to maximize something (throughput, latency, IOPS). If it's not optimized, it will suggest improvements (maybe stripe objects more, tune disk groups, generate more queued IO with your workers). This is actually pretty cool in that it helps make sure you are doing real benchmarking and not testing the speed of your DRAM.

https://www.youtube.com/edit?o=U&video_id=wAz4h48pZZI