I am really only a hobbyist, who has aspirations far too grand, that said, I am trying to figure out the right way to create my database so that database changes don't require client refactoring, but also fast. Please respond in as if I wouldn't understand typical development or DBA terminology well.
The situation: I was trying to determine how many books each user has rated. I consider a book rated if it has any two of the following:
-Overall rating (ratings table)
-sub rating (ratings table)
-tag (book_tags table)
-list (book_lists table)
*Related tables: users, tags, lists
The problem: I have 10 sub ratings and the two overall ratings all in the ratings table, each in a column (guessing this is bad, but not sure). Should I instead have a ratings table (12 rows) and a book_ratings table where each row of the ratings table is a type of rating for a user?
-e.g. book_ratings: id | user_id | book_id | rating_id
If yes, what happens if there are 500k books, 12 rating types per book, 10,000 users and a total of 5 billion rows on that book_ratings table? is that going to run super slow? Another consideration is that I may want to add more sub rating types in the future, which is partially why I think it might be valuable to change it, but it's a lot of work, so I wanted to check first.
Thanks!
You should model your system to make it usable and extendable. Having 12 rating columns is going to cause you a lot of pain when you want to aggregate results etc. There are plenty of examples of those kinds of pains on this site.
As it grows you optimize by adding indexes, clustering, data partitioning, etc.
But if you know you are going to have massive amounts of data right away you might want to consider some "Big Data" solutions and perhaps go the NoSQL way..
Yes, I would change the structure as you describe - it is more flexible and more 'correct' (normalized).
You'll have 5 billion rows (that would indeed be bad) only if EACH user gives ALL the ratings to ALL the books, that seems unlikely. A great majority of users would not rate anything, and a great majority of books will not attract any rating.