What's the first thing you do when you get a new laptop or system?

scottalanmiller

In other words if the important stuff is in its own lane, wouldn't that exempt it from any kind of work to prioritize or rearrange traffic like it'd hit if it were sharing a connection?

Nope, doesn't change anything. The computer doesn't know that the traffic is prioritized. Basically you are attempting to do QoS without really doing it. The NIC is going to treat the traffic with the same overhead whether DOTA is the only thing talking on that channel or if other things are. The NIC is able to handle wire speed without additional latency. Lowering the work it does does not realistically free things up for other traffic, but it does make everything else slower by sending it over the WiFi.

creayt

@scottalanmiller said:

Nope, doesn't change anything.

Maybe you can correct the issues w/ how I'm thinking about this:

Information ( in the form of requests and responses in both directions ) needs to transfer between the laptop and the router.
Information is not transferred in its raw or natural form, it requires preparation, translation, and fragmenting into smaller chunks and reassembly from smaller chunks for incoming information.
The laptop sends its various requests or responses to the router through an adapter, which intelligently handles the above tasks.
Therefore, handling 4 separate applications' worth of information coordination is an amount of work, which consists of each application's information coordination workload. At any given moment, there may be requests currently being coordinated on adapter 1 w/ the router, and new requests coming into it.
By removing say, one of the applications from this channel ( adapter 1 ) and moving it to its own channel ( adapter 2 ), it guarantees that at any given moment the new network adapter isn't spending its clock cycles handling other requests, receiving new ones from other applications, or having to mix in requests from the removed application into the packet batches.
This has an effect similar to splitting a specific task off onto a dedicated second core in CPU processing, although the primary core has more than enough horsepower and capacity to handle all of the work of the computer, by isolating a task onto a separate core, the task switching penalties and "contention" for focus are removed, and the 2nd core will respond, even if only marginally or trivially, more quickly to any incoming tasks dedicated to it because its resources are guaranteed to not be doing anything else, even if we say that none of the other work at play will or should matter because the hardware can more than easily enough handle all of it.

I may totally misunderstand how packets and routers work and be missing how things you've already mentioned invalidate the summary above. If so I apologize and am definitely thinking about this from a programmer's perspective. But, I'm genuinely interested in understanding these networking concepts so your help is very much appreciated! I find this stuff ( networking, servers, general IT ) fascinating despite my limited exposure to and knowledge of it.

creayt

@scottalanmiller said:

The NIC is able to handle wire speed without additional latency, lowering the work it does does not free things up for other traffic realistically.

Maybe this part is the key. Are you saying that whatever handles the processing in the adapter is so fast that it's basically 0 ms? So that the embedded processor in the NIC, even when handling 5 separate applications' worth of information workload, say a large file transfer, a bunch of web pages, IMs, websocket data, etc, will always be 0 ms and that there's no subprocessing at all?

In my mind latency is a byproduct of a few different components, the network beyond the laptop, but internally on the laptop, the time it takes for the CPU to hand a request off to the NIC, and then the NIC to translate that request into appropriate formats and packets, and then the time for the NIC to intermingle that w/ the streaming it's already doing to the router. So by reducing the work in any one of those steps you're getting closer, even if only theoretically, to 0 ms.

Dashrender

I'll toss a small part into this mostly to see if Scott agrees more than anything...

Assuming the WAP is connected to the same switch that your PC is connected to, and that the router is also connected to that switch, you'll gain little to nothing because the switch will be mixing those packets together before sending them to the router assuming they are all on the same VLAN/router interface.

Also the use of the DOTA software is 'just one more thing' in the lineup that has to be processed before either NIC can do its job. In a situation where the gains from splitting traffic are trivial at best and nonexistent at worst, this additional layer does nothing but hurt performance; it could never help it.

Scott also mentioned that the WiFi connection will suffer the additional inefficiencies inherent in WiFi - latency and a contention-based network. Now, if the LAN port is saturated, and you'd see actual gain from splitting of traffic over two network connections, then you can overcome these inefficiencies, but that doesn't seem to be the case.

Dashrender

@creayt said:

In my mind latency is a byproduct of a few different components, the network beyond the laptop, but internally on the laptop, the time it takes for the CPU to hand a request off to the NIC, and then the NIC to translate that request into appropriate formats and packets, and then the time for the NIC to intermingle that w/ the streaming it's already doing to the router. So by reducing the work in any one of those steps you're getting closer, even if only theoretically, to 0 ms.

I ask because I don't know - is the processing of things into packets done at the NIC layer, or is it done in software and the CPU before being sent to the NIC to put it on the line?

I know that some advanced NICs, like those used for SANs, do offload some of the processing from the system CPU, but I'm not sure if that's the case in a normal PC/laptop.

scottalanmiller

@creayt said:

Information is not transferred in its raw or natural form, it requires preparation, translation, and fragmenting into smaller chunks and reassembly from smaller chunks for incoming information.

The laptop sends its various requests or responses to the router through an adapter, which intelligently handles the above tasks.

Nearly all of that task is done by the OS, not the NIC. Even with TCP Offload enabled (it rarely is) there is a lot done by the OS.
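To make the "fragmenting into smaller chunks and reassembly" step concrete, here is a minimal Python sketch of the kind of work the OS network stack does in software. It is only an illustration: `segment` and `reassemble` are hypothetical helper names, byte offsets stand in for real TCP sequence numbers, and the 1460-byte chunk size is the usual TCP payload on a 1500-byte Ethernet MTU.

```python
def segment(payload: bytes, mss: int = 1460):
    """Split payload into (offset, chunk) pairs no larger than mss bytes."""
    return [(off, payload[off:off + mss]) for off in range(0, len(payload), mss)]


def reassemble(segments):
    """Rebuild the payload from segments that may arrive out of order."""
    # Sorting by offset restores the original order before joining.
    return b"".join(chunk for _, chunk in sorted(segments))


data = bytes(range(256)) * 20   # 5120 bytes of example data
segs = segment(data)
segs.reverse()                  # simulate out-of-order arrival
assert reassemble(segs) == data
print(len(segs), "segments")
```

None of this depends on which adapter the chunks eventually leave through, which fits the point above that the work sits mostly with the OS rather than the NIC.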

scottalanmiller

@creayt said:

By removing say, one of the applications from this channel ( adapter 1 ) and moving it to its own channel ( adapter 2 ), it guarantees that at any given moment the new network adapter isn't spending its clock cycles handling other requests, receiving new ones from other applications, or having to mix in requests from the removed application into the packet batches.

This has an effect similar to splitting a specific task off onto a dedicated second core in CPU processing, although the primary core has more than enough horsepower and capacity to handle all of the work of the computer, by isolating a task onto a separate core, the task switching penalties and "contention" for focus are removed, and the 2nd core will respond, even if only marginally or trivially, more quickly to any incoming tasks dedicated to it because its resources are guaranteed to not be doing anything else, even if we say that none of the other work at play will or should matter because the hardware can more than easily enough handle all of it.

This is where the analogy breaks down. This is not like a CPU. A CPU has cache; NICs are not caching the workload for repeatable code. The idea of horsepower between the two is quite different as well. The NIC doesn't saturate its processor, so the idea of capacity on the processing is pointless. The bottleneck is always the network. This is completely different from a CPU, where having "spare cycles" buys other processes more speed. The NIC doesn't have this effect. So you aren't getting anything here.

The only bottleneck you are removing is a tiny bit of line wait, and if that is affecting you, like I said before, this is a horrible bandaid adding real latency all the time to resolve imagined latency some of the time. If you have line wait, get another NIC and bond them. Or move to 10GigE.

You are going after performance problems that don't exist and introducing real network latency in doing so.
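For a sense of scale on "line wait", here is a rough Python sketch of how long one full-size Ethernet frame occupies the wire. It assumes a standard 1500-byte payload plus the usual framing overhead (1538 bytes on the wire in total) and ignores queue depth, so treat it as back-of-the-envelope arithmetic rather than a measurement. `serialization_delay_us` is a hypothetical helper name.

```python
# Full-size frame on the wire: 1500 payload + 14 header + 4 FCS
# + 8 preamble/SFD + 12 inter-frame gap = 1538 bytes.
FULL_FRAME_WIRE_BYTES = 1500 + 14 + 4 + 8 + 12


def serialization_delay_us(wire_bytes: int, link_bps: float) -> float:
    """Microseconds needed to clock wire_bytes onto a link of link_bps."""
    return wire_bytes * 8 / link_bps * 1e6


for name, bps in [("1GigE", 1e9), ("10GigE", 10e9)]:
    delay = serialization_delay_us(FULL_FRAME_WIRE_BYTES, bps)
    print(f"{name}: {delay:.2f} us per full-size frame")
```

Waiting behind one other full-size frame on a gigabit link costs on the order of 12 microseconds, and about a tenth of that on 10GigE.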

scottalanmiller

@Dashrender said:

I ask because I don't know - is the processing of things into packets done at the NIC layer, or is it done in software and the CPU before being sent to the NIC to put it on the line?

I know that some advanced NICs, like those used for SANs, do offload some of the processing from the system CPU, but I'm not sure if that's the case in a normal PC/laptop.

Packetizing is normally done on the NIC, but it uses no real overhead.

creayt

@Dashrender said:

I ask because I don't know - is the processing of things into packets done at the NIC layer, or is it done in software and the CPU before being sent to the NIC to put it on the line?

I know that some advanced NICs, like those used for SANs, do offload some of the processing from the system CPU, but I'm not sure if that's the case in a normal PC/laptop.

I think this is a key part of what's missing in how I see things too. It's my impression that NICs do a ton of processing internally which is why one of the main things that differentiates their performance/speed/price is the speed of their internal processor. For example the "Killer" brand NIC in the laptop I just returned touts its 400 MHz processor, which as far as my current understanding and w/ the steps I outlined would mean that, say, compared to a NIC w/ a 200 MHz processor it'd chip away at latency during all moments that it's translating requests into packets and deserializing requests from packets by doing so ~ twice as fast. Now that's clearly just one part of the latency contribution, but it's still a part where a faster processor, or offloading a task to a second processor in the case of using both a wired and wireless adapter ( two processors instead of one ), could theoretically reduce the end latency value.

creayt

@scottalanmiller said:

Nearly all of that task is done by the OS, not the NIC. Even with TCP Offload enabled (it rarely is) there is a lot done by the OS.

Ah, ok, well that certainly changes things. What purpose does a faster internal NIC processor serve? Here's a quote from a Tom's Hardware review for reference:

@review said:

You may recall that the Killer NIC derived its strength from a few key enhancements over regular integrated network controllers. First and foremost, the adapter used an on-board 400 MHz processor to handle all network packet processing. This offloaded traffic from the host CPU and side-stepped the Windows networking stack. Killer actually had a Linux distribution on the card, turning it into a sort of PCI Express-based co-computer.

scottalanmiller

@Dashrender said:

Scott also mentioned that the WiFi connection will suffer the additional inefficiencies inherent in WiFi - latency and a contention-based network. Now, if the LAN port is saturated, and you'd see actual gain from splitting of traffic over two network connections, then you can overcome these inefficiencies, but that doesn't seem to be the case.

Right, which I mentioned earlier - the only case in which this would be beneficial is when you are bandaiding a saturated Ethernet connection in which case you need to do something a lot better than this. In all other cases, this is a negative.

The WiFi connection adds huge latency and network risk compared to the Ethernet connection. If adding a second NIC was a big deal, I could see doing this maybe in a really rare case. But we are talking about a trivial hardware update to go to 2 - 4 Ethernet ports.

scottalanmiller

@creayt said:

I think this is a key part of what's missing in how I see things too. It's my impression that NICs do a ton of processing internally which is why one of the main things that differentiates their performance/speed/price is the speed of their internal processor. For example the "Killer" brand NIC in the laptop I just returned touts its 400 MHz processor, which as far as my current understanding and w/ the steps I outlined would mean that, say, compared to a NIC w/ a 200 MHz processor it'd chip away at latency during all moments that it's translating requests into packets and deserializing requests from packets by doing so ~ twice as fast.

That's why I mentioned the fact that the NICs are generally at wire speed. Once at wire speed, there is no "faster" no matter what you do. The biggest benefit of extra processing power is not in making the NIC faster (often this makes it slower, losing wire speed) but in taking load off of the CPU itself.
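The "once at wire speed, there is no faster" point can be sketched as a simple cap: throughput is whichever is smaller, what the wire can carry or what the processor can push. The Python below is only an illustration. The per-frame cycle cost is a made-up number (a loud assumption, not a figure for any real NIC), and `max_frames_per_sec` is a hypothetical helper.

```python
# Bits on the wire for a full-size Ethernet frame, including preamble
# and inter-frame gap (1538 bytes total).
WIRE_BITS_PER_FRAME = (1500 + 14 + 4 + 8 + 12) * 8

# ASSUMPTION: purely illustrative per-frame processing cost.
HYPOTHETICAL_CYCLES_PER_FRAME = 1000


def max_frames_per_sec(link_bps, clock_hz, cycles_per_frame):
    """Throughput is capped by the slower of the wire and the processor."""
    wire_limit = link_bps / WIRE_BITS_PER_FRAME
    cpu_limit = clock_hz / cycles_per_frame
    return min(wire_limit, cpu_limit)


for mhz in (200, 400):
    fps = max_frames_per_sec(1e9, mhz * 1e6, HYPOTHETICAL_CYCLES_PER_FRAME)
    print(f"{mhz} MHz NIC processor: {fps:,.0f} frames/s")
```

Under that assumed cost both clock speeds land on the same number, because the gigabit wire is the limit in both cases. Doubling the clock only matters if the processor was the slower side to begin with.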

scottalanmiller

Remember that RAID cards are slower when you go to hardware RAID compared to software RAID. But we use them because it offloads processing from the main CPU and because of convenience. But we never do it for speed.

Dashrender

@scottalanmiller said:

Remember that RAID cards are slower when you go to hardware RAID compared to software RAID. But we use them because it offloads processing from the main CPU and because of convenience. But we never do it for speed.

Why are hardware RAIDs slower than software? One uses the (I hope) specially designed processor for this task, the other uses the CPU.

scottalanmiller

@Dashrender said:

Why are hardware RAIDs slower than software? One uses the (I hope) specially designed processor for this task, the other uses the CPU.

Because the central CPU is just SO much faster. Even a specially designed $50 processor can't keep up with that $800 Xeon that is powering the main system.

Software RAID became almost universally faster around 2001 when the Pentium III became the standard entry point server processor.
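For anyone wondering what work is actually being moved between the RAID card and the main CPU, here is a minimal Python sketch of single-parity XOR, the RAID 5 style calculation. It is a toy under stated assumptions: `xor_blocks` is a hypothetical helper, the "disks" are just short byte strings, and real implementations work on large stripes with optimized instructions. RAID 6 adds a second, more expensive parity calculation on top of this.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "disks" and the parity block computed from them.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Lose the second disk, then rebuild it from the survivors plus parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data_blocks[1]
print("rebuilt block:", rebuilt)
```

Whether this loop runs on a controller's processor or on the host CPU, the math is the same. The difference is only which chip spends the cycles.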

Dashrender

So why haven't we moved to that solution on Intel based systems? Would we see so little gain? Or would this require a fundamental change for the system board makers to make hot swappable plugs? OR are the big vendors holding us back because of the prices they get to charge us for RAID cards?

If Software really is faster - why not go that way unless there are other things holding us back that either make it more expensive or impossible to do?

scottalanmiller

@Dashrender said:

So why haven't we moved to that solution on Intel based systems? Would we see so little gain?

Because, like nearly everything in SMB IT, performance is not a key issue. If we were concerned about performance as the primary factor we would not be using AMD64 processors at all, we'd run nothing but UNIX, on software RAID, etc.

We run Windows, AMD64 chips and hardware RAID because they are easy, convenient and protect us. There is almost no major decision made in SMB IT (or even enterprise IT) where performance is the driving factor. A secondary or tertiary one maybe, but not a driving one.

Software RAID is the only option on big iron servers and always has been. Hardware RAID only exists because of deficiencies in how the SMB world handles software RAID (Windows SR is terrible, VMware doesn't have it, etc.)

scottalanmiller

@Dashrender said:

Would we see so little gain?

Extremely little. The only place you'd really see it is on RAID 6 and RAID 7 systems, and RAID 7 is already software RAID only, so that point is moot.

scottalanmiller

@Dashrender said:

Or would this require a fundamental change for the system board makers to make hot swappable plugs? OR are the big vendors holding us back because of the prices they get to charge us for RAID cards?

They are all hot swappable already and have been for as long as I've been aware. You can go to MDADM, Windows SR or ZFS today and you have had hot swap since the 1990s at least.

scottalanmiller

@Dashrender said:

If Software really is faster - why not go that way unless there are other things holding us back that either make it more expensive or impossible to do?

Because outside of the most extreme cases, speed just isn't that important. And when it is, the truly high speed systems like FusionIO can't use hardware RAID anyway.