@matteo-nunziati said in Is this server strategy reckless and/or insane?:
@storageninja said in Is this server strategy reckless and/or insane?:
- There are a lot of caches. There are controller caches, and there are DRAM caches inside of drives (you can't disable this on SSDs, and can only sometimes turn it off on SATA magnetic drives and others). Some SDS systems also use one tier of NAND as a write cache, and some do read/write caches.
I was aware of only 3 levels of cache: OS, controller, disk.
Disk cache can be disabled according to my controller. Is the cache you are talking about another level of the disk?
It's technically inside the disk controller (you can't put DRAM on a platter!).
Enough magnetic SATA drives ignore the command, and the performance is lousy enough, that VMware stopped certifying them on the vSAN HCL.
As far as SSDs go, they use DRAM to de-amplify writes. If you didn't do this (absorb writes, compress them, compact them) the endurance on TLC would be complete and utter garbage, and the drive vendor would get REALLY annoyed replacing drives that either performed like sludge or were burning out the NAND cells. Some TLC drives will also use an SLC buffer for incoming writes beyond the DRAM buffer (and slide the NAND in and out of SLC mode, retiring it to hold write-cold data once it can't take the load anymore). SSDs are basically a mini storage array inside of the drive (which is why you see FPGAs, ASICs and 4-core ARM processors on the damn things).
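To make the "absorb writes" point concrete, here is a toy sketch of a DRAM buffer soaking up repeated overwrites before anything hits NAND. It is not how any real firmware works: the function name `nand_writes`, the FIFO eviction and the slot count are all assumptions made up for illustration, and compression and compaction are left out entirely.

```python
def nand_writes(lbas, buffer_slots):
    """Count page writes that reach NAND for a stream of logical block writes.

    Toy model (assumption, not real firmware): a FIFO DRAM buffer holds up to
    `buffer_slots` dirty blocks; rewriting a block already in the buffer costs
    nothing, and a block is written to NAND only when evicted or at final flush.
    """
    if buffer_slots == 0:
        return len(lbas)  # no buffer: every write goes straight to NAND

    buffer = {}   # dict keeps insertion order, so the first key is the oldest
    flushed = 0
    for lba in lbas:
        if lba in buffer:
            continue  # overwrite absorbed in DRAM
        if len(buffer) >= buffer_slots:
            oldest = next(iter(buffer))
            del buffer[oldest]
            flushed += 1  # evicted block is written to NAND
        buffer[lba] = True
    return flushed + len(buffer)  # remaining dirty blocks flushed at the end


workload = [1, 2, 1, 1, 3, 2, 1, 4]  # hypothetical hot-block write pattern
print(nand_writes(workload, 0))  # 8 NAND writes with no buffer
print(nand_writes(workload, 2))  # 5 with a tiny buffer
print(nand_writes(workload, 4))  # 4 once every hot block fits
```

Even this crude model shows why turning the drive's DRAM cache off hurts endurance: the same host workload turns into more physical NAND writes.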
There are also hypervisor caches (Hyper-V has some kind of NTFS DRAM cache; ESXi has CBRC, a deduped RAM cache commonly used for VDI, and Client Cache, a data-local DRAM cache for VMware vSAN), and there are application caches (simple ones like SQL, and more complicated ones like PVS's write overflow cache, which risks data loss to give you faster writes and so must be used strategically).
On top of this, there are plenty of places besides the disk where bottlenecks can occur, inside of the various IO queues or elsewhere.
The vHBA, the clustered file system locking, weird SMB redirects used by CSV, LUN queues, target queues, server HBA queues (total, and max LUN queue, which are wildly different). Kernel-injected latency can cause IO to back up (with no CPU cycles to process IO, you get storage latency), as can the inverse (with high disk latency, CPU cycles get stuck waiting on IO!), which can lead to FUN race conditions. Sub-LUN queues (vVols) and even NFS have per-mount queues! IO filters (VAIO!) and guest OS filter driver layers can also add cache or impact performance. Throw in quirks of storage arrays (many don't use more than 1 thread for the array, or for a given feature per LUN or RAID group, like how FLARE ran for ages) and you could have a system that's at 10% CPU load, but because it's a 10-core system, the 1 core that does the cache logic is pegged and causing horrible latency.
You can even have systems that try to dynamically balance these queues to prevent noisy neighbor issues (SIOCv1!).
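The 10% CPU example is just arithmetic, but it's worth seeing why an averaged CPU graph hides it. A minimal sketch, assuming you can get per-core utilization numbers from somewhere (the `summarize` helper and the sample values are made up for illustration):

```python
def summarize(per_core_pct):
    """Return (average utilization, hottest core utilization) in percent."""
    avg = sum(per_core_pct) / len(per_core_pct)
    hottest = max(per_core_pct)
    return avg, hottest


# Hypothetical 10-core array controller: one core runs the single-threaded
# cache logic flat out, the other nine sit idle.
cores = [100.0] + [0.0] * 9
avg, hottest = summarize(cores)
print(f"average {avg:.0f}%, hottest core {hottest:.0f}%")
# prints: average 10%, hottest core 100%
```

The dashboard number looks healthy while the one core that matters is saturated, so look at per-core (or per-thread) figures before ruling out CPU as the source of storage latency.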
http://www.yellow-bricks.com/2011/03/04/no-one-likes-queues/