Help with Application Infrastructure / Architecture

jn19

I’m looking to my fellow users for help here, and I’ve always been impressed with the breadth and depth of the knowledge that you’ve shared in this forum. My background’s in general educational, corporate, and small-business IT support, desktop deployment with MDT, and basic Windows server setups with Hyper-V. I’ve also helped manage a vSphere 5.5 installation with around 30 hypervisors before, so I’m comfortable with that as well, but have more experience with Hyper-V. My new job was initially focused on the same types of tasks, but now management has chosen to discontinue our IT support services and concentrate all of our resources on our core SaaS applications.

The problem is that while our core web applications have tremendous functionality, they have terrible performance and even worse security. The servers were basically set up by the developers years ago as a test environment when initially developing the core business, but then they quickly became the production servers once everything was working and able to generate a profit.

Again, I’m no application architect, but I know that SQL Server shouldn’t be installed on the system drive, that the application server probably needs more RAM than the database servers, that field client applications shouldn’t bypass the application server and communicate directly with database servers, and that everything should be behind a firewall and only accessible via VPN connection or federated identity, not RDP. There’s no Active Directory infrastructure for these machines, so there are individual local accounts on each machine with random insecure credentials, and the list just goes on.

All of this was set up before I worked here, and I’ve been lobbying to get things fixed ever since. We’d hired an employee who was experienced in infrastructure architecture, but he had to move back overseas for family reasons, and the task of setting up an entirely new, performant application infrastructure has been passed to me. I like doing the research, but we’re a small company, and I’m only afforded so much time to come up with a solution. At previous jobs I had network and system engineers that dealt with the big infrastructure issues. It’d be one thing if I was just trying to get things running better at our current load, but within the next 2-3 months we’ve got a large client coming on that will make usage of our system increase by a factor of at least 10.

As it stands, I need help with everything from hardware sizing to SQL licensing. I don’t know the best way to license SQL 2014 for our purposes, and I know that our lead dev would prefer the Enterprise version for some of the extra features, but I think the costs will be too prohibitive.

Right now we’re running IIS 8 on Server 2012, with a SQL 2012 backend. The application’s in .NET 4.6.1, and we’ll be moving it to IIS 8.5 soon. Unfortunately, our entire setup is dependent upon IIS, .NET and SQL, and I don’t think that will be changing anytime soon, if ever.

Our servers interact with several types of clients:

Mobile Devices – Mobile view of web application, plus native Android & iOS apps that tie into a subset of the web application’s functionality. There are generally 30-40 connections of this type at any given time.

Site Servers – These machines ingest information from SCADA systems & issue automation commands. Each machine runs a local .NET app that has a SQL 2008R2 backend. Almost all of the site servers utilize cellular connections, and traffic volume has been an issue. There are around 80 sites like this that send and receive data to/from our servers every 30 seconds to a minute.

Web Users – The web application is used by another 30-40 users at any given time.

Response times on the web application are often painfully slow. Some queries can take over 100 seconds. I’ve run a lot of SQL health scripts from Brent Ozar’s site and those have helped a bit, but I don’t think the speed issue lies with SQL. IIS seems to be the culprit.
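For a rough sense of scale, the site-server numbers above can be turned into average request rates. This is a back-of-the-envelope sketch in Python, not a measurement: the 2,000-byte payload per exchange is a hypothetical placeholder, and applying the tenfold growth factor to site traffic assumes the new client scales the site count in proportion to overall usage.

```python
def requests_per_second(sites: int, interval_seconds: float) -> float:
    """Average request rate when every site reports once per interval."""
    return sites / interval_seconds


def monthly_megabytes_per_site(interval_seconds: float, payload_bytes: int,
                               days: int = 30) -> float:
    """Approximate monthly traffic for one site, in megabytes (10^6 bytes)."""
    exchanges = days * 86_400 / interval_seconds
    return exchanges * payload_bytes / 1_000_000


if __name__ == "__main__":
    SITES = 80             # from the post
    GROWTH_FACTOR = 10     # "a factor of at least 10", assumed to apply to sites
    PAYLOAD_BYTES = 2_000  # hypothetical; measure real payloads before relying on it

    for interval in (30, 60):  # seconds between exchanges, from the post
        now = requests_per_second(SITES, interval)
        later = now * GROWTH_FACTOR
        print(f"every {interval}s: {now:.2f} req/s now, {later:.2f} req/s at 10x")
        per_site = monthly_megabytes_per_site(interval, PAYLOAD_BYTES)
        print(f"  ~{per_site:.1f} MB per site per month at {PAYLOAD_BYTES} bytes")
```

Even at ten times the current site count and a 30-second interval, the average stays under 30 requests per second, so bursts and the cost of each individual request probably matter more than the raw rate.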

CURRENT SERVER SPECIFICATIONS:
We utilize 3 physical servers at this time, along with a Redis Instance in Azure.

Application Server:
Dual Xeon E5645 (6-core) @ 2.4GHz
64GB RAM
System Drive – Intel DC S3500 – 600GB
SSD Storage – Intel DC S3500 SSD – 800GB
SATA Storage – Seagate 7200 rpm – 3TB
Network – 1Gbps up/down (usually at 1-3% utilization)
Redis on Windows – Caching App server requests

Database Server – Master:
Dual Xeon E5-2630 (6-core) @ 2.3 GHz
128GB RAM
System Drive – Intel DC S3500 – 600GB
SATA Storage – Seagate 7200 rpm – 3TB
Network – 1Gbps up/down (usually at 1% utilization)
Redis on Windows – Pub/Sub relationship with Azure Redis – High-volume ASP requests
Transactional Replication Publisher

Database Server – Slave:
Dual Xeon E5-2630 (6-core) @ 2.3 GHz
128GB RAM
System Drive – Micron M510DC SSD – 960GB
Network – 1Gbps up/down (usually at 1% utilization)
Transactional Replication Subscriber

Azure Redis – Standard Tier – 13GB (I have no idea why we have a Redis instance in Azure, but I imagine that it’s a speed bottleneck as well. I don’t know how to measure the response time between our servers and the Azure Redis instance, though.)
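One way to put a number on that round trip is to time a trivial command many times from the application server and look at the spread rather than a single reading. The Python sketch below is only a generic timing harness: `measure_latency` is a name made up for this example, and the stand-in operation has to be replaced with a real call against the cache.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(operation: Callable[[], object],
                    samples: int = 100) -> Dict[str, float]:
    """Call `operation` repeatedly and summarise the timings in milliseconds."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    timings_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        operation()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    timings_ms.sort()
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    p95_index = (95 * len(timings_ms) + 99) // 100 - 1
    return {
        "min_ms": timings_ms[0],
        "median_ms": statistics.median(timings_ms),
        "p95_ms": timings_ms[p95_index],
        "max_ms": timings_ms[-1],
    }


if __name__ == "__main__":
    # Stand-in for a network round trip; swap in a real cache call.
    def fake_round_trip() -> None:
        time.sleep(0.002)

    for name, value in measure_latency(fake_round_trip, samples=50).items():
        print(f"{name}: {value:.2f}")
```

With the redis-py client installed, the stand-in could be swapped for `client.ping` on a connection created with `redis.Redis(host=..., port=6380, password=..., ssl=True)`. The host and access key are placeholders, and 6380 is the TLS port that Azure's Redis service normally exposes. Running the same harness against a local Redis instance would give a baseline to compare with.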

My proposed hardware is along these lines:

3 Hypervisors with the following specs:
2 x 2.4GHz Octa-Core E5-2630 v3 Haswell Xeon
256GB RAM
Boot Drive - 160 GB SSDs in RAID1
VM drives – Intel DC S3500 or NVMe drives of around 900GB in RAID1
Storage drives – 4 x 6TB SATA in RAID 10 w/ BBU
Server 2012R2 / SQL 2014 Std or Ent

1U Quad-core servers for pfSense & HAProxy.

Here are some basic diagrams of our current setup along with my proposed setup, which is all based heavily on Stack Exchange's setup:

These are all little more than guesses, as I really don’t know the best way to set up a fast and secure IIS / .NET / SQL infrastructure. Is virtualization a bad idea for this type of setup? My thoughts were that the advantage to having 3 or so high-performance hypervisors would be that we could more easily migrate things to better hardware as the need arises, and that it should run nearly as fast as a bare-metal server as long as we’re not putting both databases/app servers/redis instances on just one box, causing resource contention.

Any help you can give would be greatly appreciated.

scottalanmiller

Welcome to the MangoLassi community!

scottalanmiller

Why is Redis hosted on Azure? Redis on Azure itself is fine; I'm wondering why your pub/sub is being handled remotely from the physical servers. Seems like having Redis local to the rest of the system would mean faster responses with less overhead. Redis is pretty trivial to manage and runs great on Linux, so it can all be done for free.

jn19

@scottalanmiller

I wondered the exact same thing. It was set up that way by our lead dev, (who also co-founded our company), thinking that we were going to migrate EVERYTHING to Azure, but then realized upon testing that it's much slower there. I guess he's got a lot of API calls pointing there and either doesn't want / hasn't had time to move things local. IT wasn't consulted about this beforehand, so we became stuck with it after the fact. This is the only part of our setup that wasn't in place prior to my arrival, and I didn't know about it until a few weeks ago. I just thought we had part of one of our websites in Azure.

scottalanmiller

@jn19 said:

@scottalanmiller

I wondered the exact same thing. It was set up that way by our lead dev, (who also co-founded our company), thinking that we were going to migrate EVERYTHING to Azure, but then realized upon testing that it's much slower there.

And more expensive... and unreliable. LOL

If cost was a concern, why is SQL Server in there?

scottalanmiller

@jn19 said:

As it stands, I need help with everything from hardware sizing to SQL licensing. I don’t know the best way to license SQL 2014 for our purposes, and I know that our lead dev would prefer the Enterprise version for some of the extra features, but I think the costs will be too prohibitive.

Right now we’re running IIS 8 on Server 2012, with a SQL 2012 backend. The application’s in .NET 4.6.1, and we’ll be moving it to IIS 8.5 soon. Unfortunately, our entire setup is dependent upon IIS, .NET and SQL, and I don’t think that will be changing anytime soon, if ever.

That's really bad. All of those are good tech, but as a SaaS vendor you are really, really stuck. You need all kinds of crazy licensing for these in a public setting and will need to monitor those licenses forever. This is both a licensing and a human cost that will be enormous.

.NET can be run without Windows Server, but Microsoft makes it free knowing that people will use it as an excuse to get mired in costly MS licenses, and it works.

Getting away from SQL Server is your top concern in this list. Getting to PostgreSQL would save you a developer or two's salary in licensing improvements.

DustinB3403

The hardware design seems very odd to say the least.

3 servers, each with dual-socket boards and 6-core CPUs.

Are you looking to redesign everything from the ground up, replace equipment etc?

scottalanmiller

@jn19 said:

Web Users – The web application is used by another 30-40 users at any given time.
Response times on the web application are often painfully slow. Some queries can take over 100 seconds. I’ve run a lot of SQL health scripts from Brent Ozar’s site and those have helped a bit, but I don’t think the speed issue lies with SQL. IIS seems to be the culprit.

That is not many users at all. If you are getting performance issues from the IIS / .NET layer with that few users and you don't feel that the database is a bottleneck, then there is a really, really good chance that you have a code problem in .NET that needs to be addressed. It might be that some really critical components are blocking and waiting on things that they should not be waiting on. How many threads do you have working? MVC depends on heavy external concurrency for performance so this is very important.

marcinozga

I think you're trying to work these issues backwards. You have performance problems, you suspect where the problems are, so you're trying to solve them by throwing more hardware at it. You need to step back and really figure out what's going on there. Monitor the entire setup for a few days and see if there are obvious bottlenecks, like CPU, RAM or disk IO.

And like Scott said above, it really sounds like you have some bad code there.
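As a sketch of what "obvious bottlenecks" could look like once a few days of samples exist, the Python below reports how often each resource sat at or above a limit. Everything here is an assumption for illustration: the metric names, the thresholds, and the sample format are made up, and the samples themselves would have to come from something like Windows Performance Monitor counters exported on a fixed interval.

```python
from typing import Dict, List

# Hypothetical limits; pick values that make sense for the workload.
THRESHOLDS: Dict[str, float] = {
    "cpu_percent": 85.0,
    "ram_percent": 90.0,
    "disk_queue_length": 2.0,
}


def saturation_report(samples: List[Dict[str, float]],
                      thresholds: Dict[str, float] = THRESHOLDS) -> Dict[str, float]:
    """Return, per metric, the fraction of samples at or above its threshold."""
    report = {}
    for metric, limit in thresholds.items():
        values = [s[metric] for s in samples if metric in s]
        over = sum(1 for v in values if v >= limit)
        # A metric with no samples is reported as 0.0 rather than raising.
        report[metric] = over / len(values) if values else 0.0
    return report


if __name__ == "__main__":
    # Four made-up samples standing in for days of collected data.
    collected = [
        {"cpu_percent": 10.0, "ram_percent": 50.0, "disk_queue_length": 0.5},
        {"cpu_percent": 90.0, "ram_percent": 50.0, "disk_queue_length": 3.0},
        {"cpu_percent": 95.0, "ram_percent": 50.0, "disk_queue_length": 1.0},
        {"cpu_percent": 20.0, "ram_percent": 50.0, "disk_queue_length": 1.0},
    ]
    for metric, fraction in saturation_report(collected).items():
        print(f"{metric}: at or above limit in {fraction:.0%} of samples")
```

If none of the three resources spends meaningful time near its limit while requests are still slow, that points away from hardware and toward the application itself.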

scottalanmiller

I agree with @marcinozga, there is a really good chance that hardware won't solve the issue here. It might mitigate it some, it might hold it off, but as you scale it will most likely just get worse and worse, if the hardware even does anything.

jn19

@scottalanmiller

You've got me! They've been leasing servers this whole time on a monthly basis, so over the last 3-4 years the company has probably paid $30k each for machines that might have been $7k-8k new. SQL Standard's been bundled into that monthly price at around ~$900/mo/SQL server (dual-proc hexacore machines), so full licenses could have easily been bought for that by now. Plus, the SQL servers are generally at maybe 10-15% CPU usage for the "master" server, and maybe 5% at most for the "slave" server, the latter of which is where the app server and clients pull most of their data. It's just been a lack of good long-term planning, really. I'm trying to help now that I'm here, but it would have been nice to have been here before everything was coded and put into production.
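The arithmetic behind that claim is easy to check. The Python sketch below uses the ~$900 per month per SQL server figure from this post; the purchase price is a hypothetical placeholder, not a quote, and should be replaced with a real licensing figure.

```python
import math


def cumulative_fee(monthly_fee: float, months: int) -> float:
    """Total paid after renting for the given number of months."""
    return monthly_fee * months


def break_even_months(purchase_price: float, monthly_fee: float) -> int:
    """First month in which cumulative rent meets or exceeds the purchase price."""
    return math.ceil(purchase_price / monthly_fee)


if __name__ == "__main__":
    MONTHLY_SQL_FEE = 900.0      # per SQL server, from the post
    PLACEHOLDER_PRICE = 20_000.0 # hypothetical licence cost per server

    for years in (3, 4):
        total = cumulative_fee(MONTHLY_SQL_FEE, years * 12)
        print(f"{years} years of bundled SQL: ${total:,.0f} per server")
    months = break_even_months(PLACEHOLDER_PRICE, MONTHLY_SQL_FEE)
    print(f"break-even at the placeholder price: month {months}")
```

Three to four years at $900 a month comes to $32,400 to $43,200 per SQL server, before counting the hardware lease itself.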

I do wonder what, from a technical standpoint, keeps us from using something like Postgresql, as we do industrial automation, and all of the data acquisition devices we utilize have Linux drivers available. Not that we'd ever have time to rewrite things to switch to it, but I wonder nonetheless.

jn19

@scottalanmiller

Oh, I agree that it should run quite well on the current hardware, given the right setup. Here's some info from one of the hang reports in LeanSentry, which has been a pretty handy tool for IIS analytics:

[img]
[img]

The blocked request location in this instance was a "Session in AcquireRequestState."
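A request stuck in AcquireRequestState is often waiting on session state. By default, ASP.NET takes an exclusive per-session lock for requests that can write to session state, so concurrent requests carrying the same session ID run one at a time. Whether that is what is happening here is an assumption, not something the hang report proves. The Python sketch below does not touch IIS at all; it only simulates the effect with threads, and `run_requests` is a name made up for the example.

```python
import threading
import time
from typing import Optional


def run_requests(request_count: int, work_seconds: float,
                 session_lock: Optional[threading.Lock] = None) -> float:
    """Run simulated requests on separate threads; return wall-clock seconds."""

    def handle_request() -> None:
        if session_lock is not None:
            # Exclusive session access: only one request proceeds at a time.
            with session_lock:
                time.sleep(work_seconds)
        else:
            # No shared session lock: requests overlap freely.
            time.sleep(work_seconds)

    threads = [threading.Thread(target=handle_request)
               for _ in range(request_count)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    serialized = run_requests(5, 0.05, session_lock=threading.Lock())
    overlapped = run_requests(5, 0.05)
    print(f"same session, exclusive lock: {serialized:.2f}s")
    print(f"no session lock:              {overlapped:.2f}s")
```

With the lock, total time grows with the number of requests in the session; without it, the requests overlap. If this turns out to be the cause, the fix lives in the application (for example, pages or controllers that only read session state declaring it read-only), which is a developer change rather than a server setting.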

jn19

@DustinB3403

I'm basically looking for the best ways to improve performance that I can control, i.e. any IIS/SQL/Server 2012 configuration or architecture changes that can be made that will require little to no work on the part of the developer(s). I've got full access to these machines but I have no software development experience, so I just want to do what I can to get things running more smoothly.

Dashrender

From a platform perspective this seems strange to me that your devs are not the ones working to fix these issues.
Unless you are responsible for application performance as well as hardware performance?

jn19

@Dashrender said:

From a platform perspective this seems strange to me that your devs are not the ones working to fix these issues.
Unless you are responsible for application performance as well as hardware performance?

I concur, but since the only dev on the main application (co-founder/co-owner/boss's boss) is convinced that it's hardware or some simple configuration setting somewhere that's causing the issue, I figure I should go ahead and investigate every avenue of improvement that I can touch!

scottalanmiller

@jn19 said:

@scottalanmiller

You've got me! They've been leasing servers this whole time on a monthly basis, so over the last 3-4 years the company has probably paid $30k each for machines that might have been $7k-8k new. SQL Standard's been bundled into that monthly price at around ~$900/mo/SQL server (dual-proc hexacore machines), so full licenses could have easily been bought for that by now. Plus, the SQL servers are generally at maybe 10-15% CPU usage for the "master" server, and maybe 5% at most for the "slave" server, the latter of which is where the app server and clients pull most of their data. It's just been a lack of good long-term planning, really. I'm trying to help now that I'm here, but it would have been nice to have been here before everything was coded and put into production.

I do wonder what, from a technical standpoint, keeps us from using something like Postgresql, as we do industrial automation, and all of the data acquisition devices we utilize have Linux drivers available. Not that we'd ever have time to rewrite things to switch to it, but I wonder nonetheless.

Doesn't matter if they have Linux drivers... PostgreSQL is the database only; the application layer can happily run on Windows. Not that that would be my first choice, just saying that PostgreSQL is accessed over the network, so the platform PostgreSQL itself runs on isn't a factor for other things.

scottalanmiller

@jn19 said:

@DustinB3403

I'm basically looking for the best ways to improve performance that I can control, i.e. any IIS/SQL/Server 2012 configuration or architecture changes that can be made that will require little to no work on the part of the developer(s). I've got full access to these machines but I have no software development experience, so I just want to do what I can to get things running more smoothly.

The problem is... those aren't the places to fix things and you could drop a million dollars and do effectively nothing. It looks like you have a code problem, throwing money and hardware at it might do nothing.

scottalanmiller

@jn19 said:

@Dashrender said:

From a platform perspective this seems strange to me that your devs are not the ones working to fix these issues.
Unless you are responsible for application performance as well as hardware performance?

I concur, but since the only dev on the main application (co-founder/co-owner/boss's boss) is convinced that it's hardware or some simple configuration setting somewhere that's causing the issue, I figure I should go ahead and investigate every avenue of improvement that I can touch!

Oh man... the same guy that caused all of the performance and cost problems already? That doesn't sound like a healthy situation.

scottalanmiller

@Dashrender said:

From a platform perspective this seems strange to me that your devs are not the ones working to fix these issues.

Well, one of the problems with devs causing issues is that often they caused them because they don't really know what they are doing and so can't fix them because they aren't sure why or how it all works.

This is suggested not just by several of the scenarios the OP described about how they got to where they are; using SQL Server and IIS for SaaS apps, putting Redis on Azure, misunderstanding the goals of cloud and such all share the same "not necessarily, but realistically... bad developers" problem. It's all "tech I heard about from my first-year college professor" who, in turn, was a failed developer who has never worked in the real world, and when put together it is just a chain of disaster.

Someone in a position of decision making who understands the tech, even a little, looking at the cost associated with all of the Windows Server and SQL Server licenses and that scaling cost as they take the product public would nearly always put a stop to using those technologies before the first line of code was written. Sure, there are exceptions, but few and far between. Those technologies cost a fortune and create licensing problems that are staggering.

It's hard to tell but it sounds like just one bad decision layered on another and people not willing to take ownership of their mistakes leading to an attempt to throw money (VC money, perhaps) at a problem to hide the fact that the person responsible doesn't want to take ownership of the issue.

scottalanmiller

@jn19 said:

I've got full access to these machines but I have no software development experience, so I just want to do what I can to get things running more smoothly.

I'm not saying quit, but this is when you prep your resume and start looking. I'm not being funny in any way. It's impossible to read the situation from here, but everything that we are hearing is that you have completely incompetent developers and management and they are driving the product into the ground and throwing money away like crazy and are trying to blame IT for their failings. These aren't the kinds of things that are likely to fix themselves down the road. This is the making of a bad situation - most likely just a company failing. But if this is supposed to be software to sell to customers, how will these problems play out at that point? How will paying customers react to being told to "buy faster desktops" or other insane things when the application isn't fast enough for them?