I share my thoughts on data collection in video games. Examples of data collection might include performance data at the end of a Guitar Hero or Rock Band song, match data at the end of a Black Ops or Halo: Reach match, or the score and birds left at the end of an Angry Birds level. This is not an exhaustive list. YMMV.


But is it fast enough so we can fly away? There are times when your data collection is going to have to be as fast as possible: launch day and the week after launch; holidays like Thanksgiving, Christmas and Hanukkah; school breaks and vacations; evenings and weekends, etc. Make some worst-case estimates of data traffic based on prior experience, a Ouija board or whatever works for you, but make sure that your data collection can scale at these peak times. If you need to spin up servers temporarily for these times, do it. This is also an opportunity to investigate micro frameworks that reduce application complexity and may let you write game data into a database as directly as possible.
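To put a number on “worst case”, here’s a back-of-the-envelope sketch in Python. Every input is a hypothetical assumption, not data from a real launch: one result record per player per match, 200,000 concurrent players, 10-minute matches, a 3x launch-day multiplier, and a collector server that can absorb 250 writes per second. Plug in your own numbers.

```python
import math

def peak_records_per_second(concurrent_players: int,
                            avg_match_seconds: float,
                            peak_multiplier: float) -> float:
    # Assumes each player submits exactly one result record per match.
    return concurrent_players * peak_multiplier / avg_match_seconds

def servers_needed(records_per_second: float, capacity_per_server: float) -> int:
    # Round up: needing a fraction of a server still means one more server.
    return math.ceil(records_per_second / capacity_per_server)

# Hypothetical launch-day inputs.
peak = peak_records_per_second(200_000, avg_match_seconds=600, peak_multiplier=3)
print(peak, servers_needed(peak, capacity_per_server=250))
```

With those made-up inputs that works out to 1,000 records per second and 4 servers. The figures don’t matter; writing the arithmetic down before launch day does.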


I haven’t looked myself, but I’m sure there’s some seedy, alternate meaning to the phrase “Don’t put all your eggs in one basket” (Urban Dictionary: NOT SAFE). The same applies to data and databases: don’t put all your data in one database. If you are going to be writing data from your game into a database, a good rule of thumb is one database per title+platform. If you use a single database for all your game data across platforms, then when that one database “shits the bed” (Urban Dictionary: SAFE), and databases eventually will “shit the bed”, data collection stops for every title+platform. A single database also makes archiving a pain when you want to move old entries into long-term storage. And a larger, single database means slower backups.
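A minimal sketch of routing each title+platform to its own database, using Python’s built-in sqlite3 module as a stand-in for whatever database you actually run. The `game_data` table, the `<title>_<platform>.db` file naming and the `CollectorRouter` class are all hypothetical.

```python
import sqlite3
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS game_data (
    id        INTEGER PRIMARY KEY,
    player_id TEXT NOT NULL,
    payload   TEXT NOT NULL
)
"""

class CollectorRouter:
    """Keeps one database per (title, platform) so one failure can't stop them all."""

    def __init__(self, base_dir: Path):
        self.base_dir = base_dir
        self._conns = {}  # (title, platform) -> sqlite3.Connection

    def connection_for(self, title: str, platform: str) -> sqlite3.Connection:
        key = (title, platform)
        if key not in self._conns:
            # Assumes title and platform are short, trusted identifiers safe for file names.
            conn = sqlite3.connect(self.base_dir / f"{title}_{platform}.db")
            conn.execute(SCHEMA)
            self._conns[key] = conn
        return self._conns[key]

    def record(self, title: str, platform: str, player_id: str, payload: str) -> None:
        conn = self.connection_for(title, platform)
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO game_data (player_id, payload) VALUES (?, ?)",
                (player_id, payload),
            )

    def close(self) -> None:
        for conn in self._conns.values():
            conn.close()
        self._conns.clear()
```

If one of those database files gets corrupted, only that title+platform stops collecting, and archiving or backing up one title doesn’t touch the others.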


If you’re going to be looking up game data in your database, make sure you are using indexed fields. When a lookup results in a “full-table scan”, the time it takes grows along with the size of your data. This happened to me around Thanksgiving 2008, when collected game data used for online tournaments was taking longer than usual to post and affect the tournaments. It led to a complete rewrite of a data collector, which has since (knock on wood … hehehehe wood) performed flawlessly and scales automatically during peak periods.
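A quick way to check whether a lookup will scan the whole table is to ask the database for its query plan. This sketch uses SQLite’s `EXPLAIN QUERY PLAN` through Python’s sqlite3 module; the table, column and index names are made up, and other databases have their own `EXPLAIN` variants.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game_data ("
    "id INTEGER PRIMARY KEY, tournament_id INTEGER, player_id TEXT, score INTEGER)"
)
lookup = "SELECT player_id, score FROM game_data WHERE tournament_id = ?"

before = query_plan(conn, lookup, (42,))  # no index on tournament_id yet
conn.execute("CREATE INDEX idx_game_data_tournament ON game_data (tournament_id)")
after = query_plan(conn, lookup, (42,))   # same query, now with an index

print("before:", before)
print("after: ", after)
```

Before the index exists, SQLite should report a SCAN of `game_data`; afterwards, a SEARCH using `idx_game_data_tournament`. The exact wording differs between SQLite versions, but SCAN on a big table is the thing to hunt down.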


Whether you’re working with a middleware provider to retrieve data or doing it internally, presumably you’ll have some process that collects that data and does something with it. I like to affectionately call these processes “reapers”. Without knowing what data you have to work with, I can say this: make sure your reaper saves its state while running. Whether from solar flares or maxed-out memory limits, reapers will fail and will need to be restarted. Without some bootstrap knowledge of where reaping left off, a restarted reaper may needlessly reprocess data it has already handled.
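Here’s one way a reaper might save its state, sketched in Python under a few assumptions: records carry a monotonically increasing integer id, they arrive in id order, and a small JSON file is an acceptable place to keep the checkpoint. `Checkpoint`, `reap` and the file layout are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Tuple

class Checkpoint:
    """Remembers the id of the last record a reaper finished processing."""

    def __init__(self, path: Path):
        self.path = path

    def load(self) -> int:
        try:
            return json.loads(self.path.read_text())["last_id"]
        except FileNotFoundError:
            return 0  # first run: nothing processed yet

    def save(self, last_id: int) -> None:
        # Write a temp file next to the real one, then atomically swap it in,
        # so a crash mid-write never leaves a half-written checkpoint behind.
        fd, tmp = tempfile.mkstemp(dir=self.path.parent)
        with os.fdopen(fd, "w") as f:
            json.dump({"last_id": last_id}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

def reap(records: Iterable[Tuple[int, str]], checkpoint: Checkpoint,
         process: Callable[[str], None]) -> int:
    """Process (id, payload) records in id order, skipping anything already done."""
    last_id = checkpoint.load()
    for record_id, payload in records:
        if record_id <= last_id:
            continue  # handled before the last restart
        process(payload)
        checkpoint.save(record_id)
        last_id = record_id
    return last_id
```

Saving after each record means a crash between `process` and `save` reprocesses at most one record, so whatever `process` does should tolerate seeing a record twice. Checkpointing once per batch instead trades a little more reprocessing for fewer writes.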


Use reasonable limits on the amount of game data you pull from your database at any one time. Let’s say your reaper “shits the bed” on a Friday, you don’t notice until Monday, and you need to restart it. Furthermore, 1,000,000 (1 MILLION) new game data entries have been collected since your reaper stopped. Processing these entries in batches of 1,000 (1 THOUSAND) keeps you from trying to retrieve too much data in any one cycle.
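A sketch of pulling unprocessed entries 1,000 at a time, again with sqlite3 standing in for your database and an assumed `game_data(id, payload)` table. It pages by id (`WHERE id > ? ... LIMIT ?`) rather than with `OFFSET`, so each batch is an indexed range lookup instead of skipping over every row already read.

```python
import sqlite3
from typing import Iterator, List, Tuple

BATCH_SIZE = 1000

def fetch_batches(conn: sqlite3.Connection, after_id: int = 0,
                  batch_size: int = BATCH_SIZE) -> Iterator[List[Tuple[int, str]]]:
    """Yield lists of (id, payload) rows, never more than batch_size per query."""
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM game_data WHERE id > ? ORDER BY id LIMIT ?",
            (after_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        after_id = rows[-1][0]  # resume after the last id in this batch
```

The `after_id` argument pairs naturally with a saved checkpoint: start from the last id your reaper recorded.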


It’s 2011. JSON is the shit and XML is dead, right? Whatever. Pick a reasonable, structured data format in which to package your game data.
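Whatever format you pick, give it a fixed structure and a version number so your reapers know what they’re reading. A minimal JSON sketch using Python’s dataclasses; the field names and the `schema_version` field are illustrative assumptions, not a standard.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

@dataclass
class PlayerStats:
    name: str
    kills: int
    deaths: int
    score: int

@dataclass
class MatchRecord:
    schema_version: int  # bump whenever the structure changes
    title: str
    platform: str
    match_id: str
    players: List[PlayerStats] = field(default_factory=list)

    def to_json(self) -> str:
        # Compact separators and sorted keys give small, stable output.
        return json.dumps(asdict(self), separators=(",", ":"), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "MatchRecord":
        data = json.loads(text)
        data["players"] = [PlayerStats(**p) for p in data["players"]]
        return cls(**data)

record = MatchRecord(1, "reach", "xbox360", "m-0001",
                     [PlayerStats("p1", 12, 3, 2400), PlayerStats("p2", 4, 9, 900)])
text = record.to_json()
print(text)
```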


I said it. You’d better make sure all your data is going to fit in your database fields. Don’t assume that an 8K database field for storing match data, for example, is big enough. Or maybe it is. But did you validate that a game data record for a 16-player match, with every statistic maxed out, will fit into an 8K database field? If not, dummy up the data, try to stick that record in the database, and see if it fits. If it doesn’t, increase the size of the database field. It’s that easy. No pills or creams required.
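A sketch of the “dummy up the data” step in Python: build the biggest record you can imagine (16 players, every stat at its maximum) and measure its serialized size in bytes against the field limit. The stat names, the 15-character player name and the 32-bit stat ceiling are assumptions, and so is treating “8K” as 8,192 bytes; swap in your game’s real limits.

```python
import json
from typing import Tuple

FIELD_LIMIT_BYTES = 8 * 1024  # assumed: "8K" means 8,192 bytes
MAX_PLAYERS = 16
MAX_NAME_LEN = 15             # assumed longest player name
MAX_STAT = 2**31 - 1          # assumed: stats are capped at a signed 32-bit int
STAT_NAMES = ["kills", "deaths", "assists", "headshots", "score", "medals", "time_played"]

def worst_case_record() -> bytes:
    """Dummy up the largest match record we expect: every slot full, every stat maxed."""
    players = [
        {"name": "W" * MAX_NAME_LEN, **{stat: MAX_STAT for stat in STAT_NAMES}}
        for _ in range(MAX_PLAYERS)
    ]
    record = {"match_id": "f" * 32, "players": players}
    # Measure encoded bytes, which is what the database field actually stores.
    return json.dumps(record).encode("utf-8")

def check_fit(limit: int = FIELD_LIMIT_BYTES) -> Tuple[int, bool]:
    size = len(worst_case_record())
    return size, size <= limit

size, ok = check_fit()
print(f"worst case {size} bytes, fits in {FIELD_LIMIT_BYTES}: {ok}")
```

Measuring encoded bytes rather than characters matters once names can contain non-ASCII text. This only checks the serialized size; actually inserting the record into your real database is still the definitive test.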


Data collection from video games requires some finesse to get right. Be quick to process, distribute your load, index your lookups, save state in processing, batch process, structure the data and size your storage appropriately.

Also, happy April Fool’s Day.

You can find more hilarity over on my Twitter account, CzarneckiD.