I first began working with SQL (the SQL-92 dialect) in 1995. At the time I’d only ever used raw disk IO for storage; and SQL was a complete shock. Every bit of data I needed could be stored using a single API, and the database server took care of all the hard work. In 1997 I switched to Microsoft SQL Server 6.5, and it quickly became my preferred database. Not everyone in the gaming world uses SQL, but if you do, here are some performance tuning lessons I’ve learned along the way.
Part 1: Effective use of SQL Server
SQL Server is almost completely devoid of “magic performance tricks.” This article isn’t going to be about tuning configuration settings; instead, it’s going to cover how to give yourself enough flexibility and freedom to spot performance problems early, and make changes to improve performance when you identify a problem.
Design Flexibility
Before we start, let’s cover a few helpful design ideas that can make performance tuning easier in the future. Each one of these could be discussed as its own topic, but let’s do a quick top level view.
- Use SQL only for data that must be varnish-cached), and so on.
- Design your SQL environment to be split apart. If you have eight or nine types of data in your system, make them each their own databases. That way, when your game suddenly explodes in size, each one can be scaled upward at its own pace. Within a single database, I find that it’s useful to give each subsystem its own three-character prefix so that I can rapidly group related objects together. The only drawback is that you won’t be able to use foreign keys across distributed databases; but in most cases that’s necessary for high performance anyway.
- Use SQL licensing effectively. Since SQL Standard with a size limitation, I like to use it in proof of concept environments. When the database grows larger, I can migrate it to SQL Standard with just a new connection string.
- If you have a physical SQL server, put your TempDB on its own, physically isolated disk drive. Ideally, use a sub-$100 SSD for this. There are lots of nifty tricks you can do with a high performance TempDB drive.
- Get an MSDN subscription and use your development SQL license to have extra non-production servers sitting around. This means you can always experiment with a new database configuration.
- Don’t write SQL directly into your application. Using stored procedures mostly removes the SQL language parser overhead. Additionally, when you discover a performance problem, a stored procedure can be surgically altered while still in production.
- Make use of your database maintenance plans. Note that these only work on SQL Standard; but it can be awfully useful to have your database run a checkpoint and incremental backup every hour. Indexes should be rebuilt nightly.
- Learn “Third Normal Form”. The full details of 3NF are quite complex, but in practice, what it means is that each datum should exist in the database once and only once. A classic example is storing the department name for an employee database; here’s the right and wrong way to do it:
I should also warn you that databases are easy to take for granted. SQL Server is a very friendly, deceptively simple program. Often the dev database you create will seem so fast that you’ll be tempted to ignore performance tuning, until the production server grinds to a halt on deploy day. So let’s review a few tricks to enable you to see these problems before they get pushed live.
Spotting Performance Problems
Everyone’s best friend should be “Show Execution Plan.” For each query you write, click the little button that shows either the estimated or actual execution plan. The results will look like this:
In this screen, you see a “Table Scan” that consumes a large amount of time that could be optimized out by creating an index. It’s the equivalent to having an unstructured array and scanning through every element blindly. Adding an index allows your database to use the equivalent of adding a hash table to look up data faster. The great value of SQL Server is that you can dynamically tune the indexes your application uses without modifying your code!
You should look through the execution plan for the following:
- Missing indexes. SQL Server will helpfully tell you, highlighted in green, the exact index it wishes it had. Right click on it and select “Missing Index Details;” you can then give it a name and create it for an instant performance boost …… but remember that this isn’t always the best idea. Each index you create slows down inserts, updates, and deletes. Indexes also consume memory, which can result in page faults when you’re under memory pressure. If you can, make all your indexes use integer IDs, since they are very memory-dense and fast to search. Organize indexes with multiple values so that the most restrictive values are first – that will allow the query optimizer to exclude large volumes of rows rapidly.
- Table Scans. As described above, if SQL server is spending a significant amount of time scanning your tables, try to figure out ways to modify your indexes to convert them to “Index Seek” instead.
- Too many joins. If your execution plan is really tall and wide, this usually means that SQL server is joining lots of tables together in a single step. Lots of joins are an inescapable side effect of a normalized database; I’ll write a future article on the subject in detail. But for the moment, look for any tables that can be optimized away. Try declaring a variable to contain some of the values you want and selecting them early. For example, look at these two equivalent SQL snippets – one of them only joins two tables, and the other joins three:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | -- SLOW WAY: Get weapons sales by querying three tables SELECT i.item_name, i.item_id, h.price, h.unit_count FROM act_auctionhistory h INNER JOIN itm_items i ON i.item_id = a.item_id INNER JOIN itm_itemclasses ic ON ic.itemclass_id = i.itemclass_id WHERE ic.itemclass_name = 'weapon' GROUP BY i.item_id, i.item_name -- FAST WAY: Get the same data but use only two tables at a time DECLARE @cid INT SELECT @cid = itemclass_id FROM itm_itemclasses WHERE itemclass_name = 'weapon' SELECT i.item_name, i.item_id, h.price, h.unit_count FROM act_auctionhistory h INNER JOIN itm_items i ON i.item_id = a.item_id WHERE i.itemclass_id = @cid GROUP BY i.item_id, i.item_name |
Here’s another tip. Log on to your live server (make sure you use read-only privileges of course!). Right click on your SQL Server in the left hand pane and select “Activity Monitor”, then “Recent Expensive Queries”. I find that I get a lot of mileage out of just checking this page every few hours:
So when you get see this screen, what do you look for and how do you make use of the information? Start by right clicking on each query and show the execution plan for each one.
Log Your Database Activity
Another worthwhile tip is to modify your code to write a log of database queries to your debug output. When you’re using advanced ORM tools like NHibernate, it’s very easy to write code that has a side effect of generating unexpected database hits. For example, consider this simple logic:
1 2 3 4 5 | for (int i = 0; i < itemarray.Length; i++) { if (itemarray[i].IsQuestObject()) { return true; } } |
In most cases this logic is fine and fast. But imagine that the programmer down the hall fixes a bug by making IsQuestObject() ping the database – all of a sudden you’ve got a nearly-invisible massive performance penalty. The database is often so fast that you don’t notice these kinds of performance sinks. So I modify my ORM to emit debug output every time it executes a SQL statement.
As a result, when running in debug, I see the following:
1 2 3 4 5 6 7 8 | SQL: list_OrderedUserAccounts (Time: 00:00:00.0030000) SQL: get_PermissionsByUser (Time: 00:00:00.0020000) SQL: RetrievePermissionGroupObject (Time: 00:00:00.0020000) SQL: get_JoinsByLabel (Time: 00:00:00.0010000) SQL: get_JoinsByLabel (Time: 00:00:00.0010000) SQL: get_JoinsByLabel (Time: 00:00:00.0010000) SQL: ListProductObject (Time: 00:00:01.3680000) SQL: list_OrderedPlatforms (Time: 00:00:00.0020000) |
When I look at this output, I immediately check to see if I’ve accidentally written a loop that calls get_JoinsByLabel too many times. Then next I investigate the query “ListProductObject” to see why it took 1.5 seconds.
Combine multiple queries
Nothing improves performance quite like reducing the number of queries you execute. Rather than submitting ten queries rapidly, why not make use of a single stored procedure that returns ten result sets? Let’s say you’re working on the front page of your in-game auction house, but when the user mouses over an item name or a player name, you want to show some secondary details.
You could do this by writing one query for the auction house page and a second query for each time the user hovers their mouse. But that would produce dozens of queries every time the mouse moved.
Instead, let’s create a single compound query that returns both the auction page and the popup text for everything. Even though at first glance you may think you’re doing unnecessary work, SQL Server is already looking up all the objects using hash tables, and you’re reducing query overhead significantly by reducing volume. Here’s roughly how to do it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | CREATE PROCEDURE get_AuctionHouseFirstPage AS -- Preparation: Select basic into a temporary table SELECT auction_id, character_id, auction_date, … FROM ACT_Auctions, … INTO #temp_auction_page -- First result set: Return the auction page SELECT ... FROM #temp_auction_page -- Second result set: Return pop-up information about characters SELECT user_name, guild_name, … FROM CHR_Characters c INNER JOIN #temp_auction_page t ON c.char_id = t.char_id -- Third result set: Return pop-up information about items SELECT item_name, item_enchantment, ... FROM ITM_Items i INNER JOIN #temp_auction_page t ON i.item_id = t.item_id |
For Next Time
Many of you may be asking – what about SQL vs NoSQL databases? I encourage you to tread lightly and consider them carefully on their own merits. Unlike SQL, which provides specific guarantees to data behavior at the cost of performance, NoSQL databases have their own unique advantages and drawbacks. There are many situations in gaming that are extremely well suited for NoSQL implementations; but this is a complex issue that really deserves its own article.
For the next SQL Server performance article, I’ll look at the speed of database inserts and updates, and the side effects of some basic choices in software style.