XenServer hyperconverged

olivier

@scottalanmiller I have multiple angles of attack; I'm currently benchmarking and establishing pros/cons for each approach.

FATeknollogee

@olivier
I think you should move this "hyperconverged" feature up on the release schedule

olivier

I have file level restore on top right now

FATeknollogee

@olivier said in XenServer hyperconverged:

I have file level restore on top right now

I realize that.
File restore won't be unhappy at occupying the #2 spot, will it? jk

olivier

@FATeknollogee It doesn't work like that.

Playing with/exploring a technology is one thing; releasing a minimum viable product is another. Maybe my exploration will end with an "it will be better to wait for SmapiV3 in XenServer" verdict.

I set some goals and I'll try to reach them, but I can't guarantee anything. As for file level restore, our lead dev works on it, not me. So I try to spend my "tech time" on this (which is a bit hard considering I'm doing a lot of non-technical work).

FATeknollogee

Thanks for the detailed explanation.

Just curious, but what is "SmapiV3 in XenServer"?

olivier

At least a modular storage "API" for XenServer: http://xapi-project.github.io/xapi/futures/smapiv3/smapiv3.html

It will allow plugging any filesystem/share into XenServer via "simple" plugins.

For me, that's the neatest solution on the horizon, but it's not ready yet.

FATeknollogee

Thx for the explanation & link.

Keep up the great work, you have a fantastic product (I know I'm not the 1st one to tell you that)

olivier

Hey there,

If you have a Windows-based VM, could anyone run a quick benchmark using CrystalDiskMark (latest, 5.2 I think) with the default parameters (5 runs, 1GiB)?

I've run tests on Windows Server 2016 (TP5, yeah I know I'm late) and I would like to see how much performance I lose in a hyperconverged scenario.

Also, telling me the SR type and the physical device underneath would be great. Thanks!

edit: no worries, I'm not here to compare apples to apples, just want a quick order of magnitude.

Danp

@olivier Here are my results (Windows Server 2008, LVM, Raid 10, 8x 15K spinning rust) --

Sequential Read (Q= 32,T= 1) : 416.270 MB/s
Sequential Write (Q= 32,T= 1) : 412.617 MB/s
Random Read 4KiB (Q= 32,T= 1) : 14.298 MB/s [ 3490.7 IOPS]
Random Write 4KiB (Q= 32,T= 1) : 17.564 MB/s [ 4288.1 IOPS]
Sequential Read (T= 1) : 321.305 MB/s
Sequential Write (T= 1) : 273.068 MB/s
Random Read 4KiB (Q= 1,T= 1) : 1.218 MB/s [ 297.4 IOPS]
Random Write 4KiB (Q= 1,T= 1) : 12.264 MB/s [ 2994.1 IOPS]

Test : 1024 MiB [C: 75.9% (56.9/75.0 GiB)] (x5) [Interval=5 sec]
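
For anyone comparing results in this thread: the IOPS figures in brackets can be recomputed from the MB/s column. This is a minimal sketch, assuming CrystalDiskMark reports decimal megabytes (10^6 bytes) and that the "4KiB" tests use 4096-byte blocks; the function name is just illustrative.

```python
# Assumption: MB/s means 10**6 bytes per second, and a "4KiB" block is 4096 bytes.
BLOCK_BYTES = 4096


def iops_from_throughput(mb_per_s: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Convert a throughput in MB/s into I/O operations per second."""
    return mb_per_s * 1_000_000 / block_bytes


# The four 4KiB rows from the results above: (label, MB/s).
rows = [
    ("Random Read 4KiB (Q=32)", 14.298),
    ("Random Write 4KiB (Q=32)", 17.564),
    ("Random Read 4KiB (Q=1)", 1.218),
    ("Random Write 4KiB (Q=1)", 12.264),
]

for label, mb_per_s in rows:
    print(f"{label}: {iops_from_throughput(mb_per_s):.1f} IOPS")
```

Under those assumptions the computed values line up with the bracketed IOPS in the results, which makes it easy to compare runs where only one of the two columns was posted.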

olivier

@Danp Your storage is local to your host, right?

Danp

@olivier Yes, local storage.

olivier

@Danp Thanks!

olivier

Okay, so after just a few days of technical experiments, here is the deal.

Context

2x XS7 hosts, installed directly on 1x Samsung EVO 750 (128 GiB) each
dedicated 1Gb link between those 2 machines (one Intel card, the other is Realtek garbage)

Usually, in a 2-host configuration, it's not trivial to avoid split-brain scenarios.
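
To illustrate why two hosts are the awkward case, here is a minimal sketch of a generic strict-majority quorum rule. This is an assumption for illustration only, not necessarily the mechanism used in the setup described here; `has_quorum` is a hypothetical helper.

```python
def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """Strict majority: more than half of all nodes must be reachable.

    reachable_nodes counts the node itself plus every peer it can still see.
    """
    return 2 * reachable_nodes > total_nodes


# Two hosts, link between them cut: each side only sees itself (1 of 2).
print("2 hosts, isolated side:", has_quorum(1, 2))

# Three hosts split into a group of 2 and a group of 1.
print("3 hosts, side with 2:", has_quorum(2, 3))
print("3 hosts, side with 1:", has_quorum(1, 3))
```

With two hosts, neither side of a partition holds a majority, so a plain majority rule either stops both sides or, if ignored, lets both keep writing (the split-brain case). With three, exactly one side can keep going.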

In a very small setup like this (only 2 hosts with little disk space), you'd expect the overhead to be at its worst as a proportion of resources. But we'll see it's still reasonable.

Current working solution

A shared file storage (thin provisioned):

What's working

data replicated on both nodes
fast live migrate VMs (just the RAM) between hosts without a NAS/SAN
very decent perfs
"reasonable" overhead (~2GiB RAM on each Node + 10GiB of storage lost)
scalable up to the max pool size (16 hosts)
killing one node: the VMs on the other host keep working
using XenServer HA on this "shared" storage to automatically bring back to life VMs that were on the killed node
no split brain scenario (at least during my tests)
no overcomplicated configuration on hosts

Overhead

RAM overhead: <5GiB RAM on 32GiB installed
Storage overhead: lost around 9GB of disk space per host

Obviously, in case of using large local HDDs, storage overhead will become negligible.
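
A quick sketch of that proportion argument, using the overhead figures quoted above. The larger disk sizes are hypothetical examples, and GB/GiB are treated as the same unit since this is only an order-of-magnitude check.

```python
def overhead_percent(lost: float, total: float) -> float:
    """Share of a resource consumed by the storage layer, in percent."""
    return 100.0 * lost / total


# Figures from the post: under 5 GiB of RAM out of 32 GiB,
# and around 9 GB lost on a 128 GiB SSD.
print(f"RAM overhead (upper bound): {overhead_percent(5, 32):.1f}%")
print(f"Storage overhead on 128 GiB: {overhead_percent(9, 128):.1f}%")

# Hypothetical larger local disks, same ~9 GB fixed cost per host.
for disk_size in (1000, 2000, 4000):
    print(f"Storage overhead on {disk_size} GB: {overhead_percent(9, disk_size):.2f}%")
```

The fixed cost stays the same while the denominator grows, so on multi-terabyte disks the storage overhead drops well below 1%.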

Scalability

In theory, going for more than 3 nodes opens up interesting performance scaling. So far, it's just replicating data, but you can also spread it across nodes when you have 3+ of them.
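
As a rough sketch of the "replicate vs spread" difference, assume a layout where every piece of data is stored on exactly `replicas` nodes and the rest of the capacity is pooled. The layout, the function, and the 100 GiB per-node figure are all assumptions for illustration, not details from the tests above.

```python
def usable_capacity(nodes: int, disk_per_node: float, replicas: int) -> float:
    """Usable space when each piece of data is kept on `replicas` nodes."""
    return nodes * disk_per_node / replicas


DISK_PER_NODE_GIB = 100.0  # hypothetical per-host data disk size

# 2 nodes, 2 copies: pure replication, usable space equals one node's disk.
print(usable_capacity(2, DISK_PER_NODE_GIB, replicas=2))

# More nodes, still 2 copies: data is spread, so usable space grows with the pool.
for nodes in (4, 6, 8):
    print(nodes, usable_capacity(nodes, DISK_PER_NODE_GIB, replicas=2))
```

In this model, adding nodes past the replica count adds capacity (and potentially parallelism) instead of just adding more copies.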

Perfs

I'm comparing to a dedicated NAS with ZFS RAID10 (6x500GiB HDDs) and 16GiB of RAM (very efficient cache for random read/write) on semi-decent hardware (dedicated IBM controller card), accessed over an NFS share.

Test	ZFS NAS	XOSAN	diff
Sequential reads	120 MB/s	170 MB/s	+40%
4K reads	9.5 MB/s	9.4 MB/s	draw
Sequential writes	115 MB/s	110 MB/s	-5%
4K writes	8.4 MB/s	17 MB/s	+200%

As you can see, that's not bad.
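
For reference, the relative differences can be recomputed from the two throughput columns. This is a small sketch using only the numbers in the table, with the NAS taken as the baseline; `percent_diff` is an illustrative helper.

```python
def percent_diff(baseline: float, value: float) -> float:
    """Relative change of `value` against `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


# (ZFS NAS MB/s, XOSAN MB/s) per test, copied from the table above.
results = {
    "Sequential reads": (120.0, 170.0),
    "4K reads": (9.5, 9.4),
    "Sequential writes": (115.0, 110.0),
    "4K writes": (8.4, 17.0),
}

for test, (nas, xosan) in results.items():
    ratio = xosan / nas
    print(f"{test}: {percent_diff(nas, xosan):+.1f}% (x{ratio:.2f})")
```

Worth noting when reading the diff column: the 4K write row works out to a ratio of about 2x, which is roughly +100% over the baseline when expressed as a relative change.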

Drawbacks

right now, it's a fully manual solution to install and deploy, but it could be (partly) automated
it's kind of "cheating" with XAPI to create a "shared" local file SR (but it works ^^)
the XS host can't mount the share automatically on boot for some reason, so I'm currently looking for a way to do that correctly (maybe by creating a XAPI plugin?)
you'll have to deploy 2 or 3 RPMs on Dom0, but the footprint is pretty light
it will probably (very likely in fact) work only on XS7 and not before
the only clean way to achieve this is to have SMAPIv3 finished. Until then, we (at XO) will have to glue things together as best we can to provide a decent user experience.

Conclusion

It's technically doable, but there is a mountain of work to turn this into a "one click" deploy. I'll probably run a closed beta for some XOA users and deploy things semi-manually to validate the concept a bit before spending too much time scaling something that nobody would use in production for one reason or another (interest, complexity, etc.).

FATeknollogee

@olivier
Very nice work.
I'm claiming my place on the beta line.

black3dynamite

@olivier
Any drawbacks to using the default storage type, LVM?

olivier

@black3dynamite I'm not sure I understand the question.

So far, the "stack" is:

Local Storage in LVM (created during XS install)
on top of that, a big data disk that fills it, used by a VM
the VM will expose this data disk
XenServer will mount this data disk and create a file level SR on it
VMs will use this SR

It sounds like a ton of extra layers, but that's the easiest approach I found after a lot of tests (you can see it as a compromise between modifying the host too deeply to reduce the layers vs. not modifying anything in the host but having more complexity to handle at the VM level). You can consider it a "hybrid" approach.
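
To make the layering easier to picture, here is a small sketch that just models the stack described above as data and prints the path a guest write would take, top to bottom. The `Layer` type and the "where" labels are illustrative only; nothing here talks to XenServer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    where: str  # which component handles this layer (illustrative label)


# Bottom to top, following the list in the post.
STACK = [
    Layer("local LVM storage (created during XS install)", "host"),
    Layer("big data disk filling the local storage", "storage VM"),
    Layer("data disk exposed by the VM", "storage VM"),
    Layer("file level SR mounted on the exposed disk", "host"),
    Layer("virtual disks of regular VMs on that SR", "guest VMs"),
]


def io_path(stack: list[Layer]) -> str:
    """Describe the layers a guest write crosses, from the guest down to the disk."""
    return " -> ".join(f"{layer.name} [{layer.where}]" for layer in reversed(stack))


print(f"{len(STACK)} layers")
print(io_path(STACK))
```

Seen this way, the "hybrid" part is visible in the labels: the path alternates between the host and a VM rather than staying entirely in one or the other.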

Ideally, XenServer could be modified directly to allow this (like VMware does with vSAN), and expose the configuration via XAPI.

I think if we (the XO project) show the way, it could (maybe) trigger some interest on Citrix's side (which is only into XenDesktop/XenApp, but hyperconvergence makes sense even there).

black3dynamite

@olivier
The question is based on you using EXT (thin) shared storage instead of LVM (thick) for XenServer.

olivier

@black3dynamite I can't use LVM because it's block-based. I can only work with a file level backend.

I did try playing with blocks; performance was also decent (one more layer, so a small extra overhead), but I hit a big issue in certain cases. Also, it was less scalable in a "more than 3 hosts" scenario.

Danp

@olivier Sounds promising. Can you elaborate on how adding the extra overhead of XOSAN would yield a performance increase of 40% / 200%?