Web and Database Planning for a site that may need to scale rapidly
-
I have a new website that may need to scale fairly quickly. The website will have a database for information people are submitting through a secure back end, but other users will be searching the posted information. It is tough to tell how quickly the site will need to expand, so that brought me to my question. Initially I figure we can start out with one web server with both the database and web front end, however, I was not sure how to calculate capacity planning and scaling in such a way that it would be easy to scale as the need arises. Knowing the site will scale, it may make more sense to break the database into another server right away. The next question then becomes tracking web and database server load, and deciding the best way to handle the increased load. Is it multi-front end, and having the database sync from server to server. Would it make sense to increase resources on each server first (CPU, RAM, high output disk). Looking at the cost increase for certain features it almost seems to make sense to break out into multiple boxes much more quickly. This would be my first multi-server environment. The servers will be Linux (Ubuntu), with most likely a MySQL database. Also, any thoughts on scaling MySQL, or is there a better alternative that will scale better?
-
MySQL and its replacement, MariaDB, are not known for scaling well. If you need to scale past a single server they would not be my "go to" choice. But scaling past a single server means an insane amount of volume. You can get some pretty serious scale by going with SSDs and throwing a lot of memory and CPU in a single box and not having the sync issues that cripple databases.
For scaling up most people either choose PostgreSQL for a relational database but most people go to NoSQL products today as they scale so much better.
-
I would rather use the faster drives as well. Is there a rule of thumb for determining which way to go?
-
Yes, if you CAN fit all into a single server you want to do that at pretty much any cost. A single server is cheaper for the same speed until you physically can't fit into a single server anymore. Scaling horizontally for data is extremely expensive in terms of resources (you don't scale linearly.) Adding a second server comes nowhere close to doubling your capacity.
Scaling horizontally, though, gives you options like replicating across regions. So, for example, MangoLassi is designed with horizontal scaling in mind and if our Pacific Rim readership gets really large we can very easily place an additional database (MongoDB) and web servers in their region and serve content locally rather than making them connect back to the US servers that we are using right now.
-
It is really amazing how much database you can get out of a DL380 or R730 with 128GB or more of RAM and RAID 10 SSDs. I mean truly astounding. And if you outgrow that there are options like the DL580 which takes you to quad processors!
And that's just the Intel world. Nearly any database that you would want to run in production, except for MS SQL Server, runs just as well (maybe better) on RISC and EPIC systems like Power, Sparc and Itanium. Not only are they more powerful processor for processor than Intel and AMD systems, but they scale way, way beyond the single box size of Intel and AMD. Instead of four processors being a reasonable limit, that's more like a starting point. Getting 32 or 64 processors in a single system is completely reasonable. And talk about memory capacity! They just get enormous.
-
Lateral might be an option in the future. Why is PostGre better than MySQL for scaling?
-
PostgreSQL scales better. It is more performant for large, relational lookups and has better scaling options. MySQL is fast for light, non-relational lookups but that means that it is often eclipsed by even faster, even lighter NoSQL options. PostgreSQL has replaced MySQL for nearly all new web architecture development that doesn't work well with NoSQL.