When to use SQL or NoSQL

scottalanmiller

Blogs would be the same way. Many, many things work well without tight relationships. Accounts and financial data are very common and major exceptions.

tonyshowoff

@scottalanmiller said:

Other things that are ideal for NoSQL include...... discussions in online communities! Think about it - we don't need tight atomic commits for posts, but they include a lot of data. So this type of data is very expensive for a relational database to handle and very efficient for a document database (a la MongoDB) to handle. We don't need the whole thread locked while someone is posting, if two people post at once and there isn't a really tight "who posted first" it doesn't really matter.

Sure, we use NoSQL for video/photo/etc comments and also messaging as well.

tonyshowoff

@scottalanmiller said:

Blogs would be the same way. Many, many things work well without tight relationships. Accounts and financial data are very common and major exceptions.

When it comes to building financial reports it's definitely superior to use relational databases, but I think most things can probably just be in NoSQL databases, but the difference is that databases like MySQL, Microsoft SQL Server, etc come with long, tested histories and a reliability/availability you can't usually get out of a NoSQL database without a ton of effort, but we've noticed Cassandra sort of breaks that mould.

Having said the above though, we do keep transaction histories, payment histories, etc in Cassandra as well, and we simply cache harder to generate things that need to use that information (very rare that things do) and also we track, for example, the amount changes so we don't need to do anything like SUM(10,000 rows) like I've seen people do for some reason.

scottalanmiller

Sure, you can find ways to do things in NoSQL that you do with an RDBMS, but it takes more care and possibly moving some of the data protection into the application layer. RDBMSs are really ideal when those things are needed. Tons of testing, as you mentioned, and tons and tons of research into how to make those transactions efficient.

scottalanmiller

One thing that is interesting is that PostgreSQL now talks in native JSON like MongoDB does. This is a case of the NoSQL world pushing the relational world to change how they think and do new things.

tonyshowoff

@scottalanmiller said:

One thing that is interesting is that PostgreSQL now talks in native JSON like MongoDB does. This is a case of the NoSQL world pushing the relational world to change how they think and do new things.

Definitely, I think that's a pretty kick ass feature.

scottalanmiller

One of the most important things about NoSQL is that they are adding variety and forcing everyone to look at databases in a fresh light. We had a lot of this stuff back in the 1980s, heck I was doing NoSQL for Eastman Kodak in 1989, but everyone on the IT side seemed to have literally forgotten about non-relational databases since the mid-1990s.

tonyshowoff

@scottalanmiller said:

One of the most important things about NoSQL is that they are adding variety and forcing everyone to look at databases in a fresh light. We had a lot of this stuff back in the 1980s, heck I was doing NoSQL for Eastman Kodak in 1989, but everyone on the IT side seemed to have literally forgotten about non-relational databases since the mid-1990s.

Yeah you know when this stuff started taking off, it seemed a little weird to me, because I had dealt with similar databases years ago, and an unnamed large "ISP" I worked for used essentially a NoSQL key-value store for buddies, statuses, etc and this was in the late 90s. People were acting as if nobody had thought of this stuff before.

scottalanmiller

Exactly. I am pretty sure that an entire generation of IT actually doesn't know what databases are, what relational databases are, relational theory, the difference between just a database and a DBMS. Discussions, especially on some other communities, really highlight that databases are a complete mystery. Even IT people are often confused as to what is a database and what is an application, which is mind blowing to me. And people wanting to say that they are "database programmers" - what does that even mean? They certainly don't mean that they are writing databases and I've never heard of anyone, ever, who did nothing but "program" in a database language (a la SQL.) And so many people think that half of IT is DBAs even though none of them have ever met one.

Something so basic and core to our field yet so completely unknown is odd.

tonyshowoff

@scottalanmiller said:

Exactly. I am pretty sure that an entire generation of IT actually doesn't know what databases are, what relational databases are, relational theory, the difference between just a database and a DBMS.

You said it all right there. I see the questions similar to the OP's question fairly often, typically it's "which is better?" and it's sort of odd to me that they didn't first ask themselves "well, does it need to be relational?" instead it's a question of speed, which isn't even applicable for 99% of people. I think this is because databases aren't seen as anything other than places to store things, and their actual power is taken for granted. I blame the MTV!

scottalanmiller

The speed thing, I think, is an SMB problem. Only since being on SW so much have I come to realize the obsession with unmeasureable and often purely theoretical speed improvements that don't matter at all. Tiny, tiny tweaks that cause no end of issues and expense and no one can tell if they even work. At the scope of most SMBs, everything is about the same. You've got to really have something be bad, whether it is disk array, database type, application, protocol (SMB vs. NFS, Block vs. File storage, iSCSI vs. FC), networking you name it, for it to be a problem. SMBs putting 10GigE to the desktop and bonding four devices when they can't push GigE of data is crazy.

tonyshowoff

@scottalanmiller You just don't get it, I need GigE for my ancient refurbished Dell machines with Windows XP.

scottalanmiller

GiE is a must have for VoIP today! And I need QoS on my massive GigE switch or I won't be able to make my 100Kb/s phone calls!

QuixoticAndy

@scottalanmiller said:

And people wanting to say that they are "database programmers" - what does that even mean?... And so many people think that half of IT is DBAs even though none of them have ever met one.

During the twenty-or-so contracts I've worked as a software engineer, I've met probably half a dozen DBAs, and only one of them actually wrote SQL for our application. The rest of it was done by developers. I'm sure somewhere there is a person doing nothing but write stored procedures and database functions, but I have yet to meet them.

A Former User

@scottalanmiller said:

GiE is a must have for VoIP today! And I need QoS on my massive GigE switch or I won't be able to make my 100Kb/s phone calls!

What? I've always went cheap with the phones that only have 100mb anyway.

lance

@tonyshowoff said:

We deal with both MySQL and a few different NoSQL databases as well. It really depends on the scenario, but if you don't know, then you can just use a regular relational database.

In general though you don't need a lot of data to justify MongoDB or Cassandra (we use both, though we're slowly transitioning to only Cassandra), in the same way you don't need to be doing relational or financial things to justify MySQL or anything like that.

NoSQL works best for key-value store, as in you already know the information you want, it likely isn't directly related to anything else and so probably doesn't need to be joined (joins tend to be very expensive in NoSQL if supported at all), and you don't need to worry too much about the same level of consistency. So some examples:

Sessions

Logs or user activity

Expensive but rarely rebuilt things from your relational database for permanent caching

There's also the issue of heavy writing, as NoSQL databases tend to be faster at writes than reads, which is why we use it for really write heavy things that either we do not need back right away or we allow the user to keep (when they make a comment, show it based on what they put, instead of getting it back, for example).

Relational databases can be used for all the same things as NoSQL databases and many actually do a fairly good job at it, and you don't really need to worry about the differences until you get hundreds of thousands or millions of requests (like us) per day.

Aside from that MySQL (and others) have been around for years, are highly tested in large environments, etc and anything which is relational, would need to be associated with things in other tables (foreign key scenario for example), dealing with financial information, etc.

When you don't know or don't have the environment or resources to test, then always just use your relational database, it's probably the right choice anyway.

If you are curious, in our environment for our adult content sites, we've got about 630 servers so and we use MySQL, MongoDB, Cassandra, and memcached (and Apache, nginx, and PHP), and our other products and services we use similar setups but usually either MongoDB or Cassandra, and like I said we're trying to transition to it as we've gotten more out of it and better performance for us. I've noticed a lot of issues with MongoDB and returning things in reverse order (even with a reverse index) and also paging this way, it's painfully slow compared to Cassandra (1 - 2 seconds compared to 400ms or so; doesn't seem like a lot, but keep in mind that's on top of other loading, so loading new comments or whatever on some of our older sites is up to 3 seconds, and most people start to feel things are "too slow" after about half a second.), so I suggest maybe not even bothering with MongoDB unless it fits some other scenario of yours.

Thanks for the info!

mlnews

If you are interested in learning more about NoSQL, I highly recommend this book (and highly recommend buying it directly from the awesome publisher....)

Seven Databases in Seven Weeks

scottalanmiller

Great book!

lance

@mlnews said:

If you are interested in learning more about NoSQL, I highly recommend this book (and highly recommend buying it directly from the awesome publisher....)

Seven Databases in Seven Weeks

Thanks! It looks like this is my next book to buy!