After an even longer delay, here is part 3 of Gathering and Analyzing Data with Django. Apologies to those who were waiting for the rest of the series, as the entirety has been complete for a while. The article was picked up for publication in a gems-style book, so I ceased further distribution. The book’s status is now in flux, so I’m going ahead with posting the rest.
The goal of the series is to showcase Django as a way to build web-enabled tools. The example presented is an analytics site for gameplay data. In the first post we set up the site and created the database schema. In the second post we collected the logs and populated the database. In this post we’ll begin generating the statistics and provide a Web API to access them.
Generating the Statistics
With our gameplay logs now loading successfully, we can turn our attention to mining the data. Some metrics we’d like to examine are the number of players, the status of each quest among players and, where applicable, completion times. All of this data is already contained in the database, but as our player base grows, so does the time required to generate the statistics. We need a more scalable solution.
Mining data can put a significant strain on the database. To mitigate this we can run the queries periodically and cache the results back in the database. Django has a straightforward way of accomplishing this through management commands.
Management commands are scripts that run against the project. In fact we’ve already been using the built-in ones substantially, as they are invoked through manage.py. Creating our own involves subclassing BaseCommand and supplying an implementation. By convention, custom commands go in a management/commands directory within the application’s folder structure. To keep with the component nature of Django we’ll create an additional application, called analytics, that will hold the statistics as well as the commands used to generate them.
Within the analytics application we create management/commands/queststatistics.py, which holds the implementation.
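The resulting layout, assuming the project structure from the earlier posts, looks something like this (note the __init__.py files, which Django requires in order to discover the command):

```
analytics/
    __init__.py
    models.py
    management/
        __init__.py
        commands/
            __init__.py
            queststatistics.py
```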
```python
from django.core.management.base import BaseCommand
from django.db.models import Avg, Max, Min

# Adjust these import paths to match the applications created in the earlier posts.
from analytics.models import QuestStatistics
from game.models import Player, Quest, QuestState


class Command(BaseCommand):
    help = 'Generates per-quest statistics and caches them in the database.'

    def handle(self, *args, **options):
        player_count = Player.objects.count()
        for quest in Quest.objects.all():
            self.stdout.write('Processing quest: %s\n' % quest)
            stats = QuestStatistics(quest=quest)

            # Get statistics for the various states
            completion_query = QuestState.objects.filter(
                quest=quest, status=QuestState.COMPLETED_STATUS)
            stats.completed = completion_query.count()
            stats.accepted = QuestState.objects.filter(
                quest=quest, status=QuestState.ACCEPTED_STATUS).count()
            stats.rejected = QuestState.objects.filter(
                quest=quest, status=QuestState.DECLINED_STATUS).count()
            stats.not_encountered = (player_count - stats.accepted
                                     - stats.rejected - stats.completed)

            # Get statistics for the completion times
            if stats.completed > 0:
                result = completion_query.aggregate(Avg('completion_time'))
                stats.average_completion_time = result['completion_time__avg']
                result = completion_query.aggregate(Min('completion_time'))
                stats.shortest_completion_time = result['completion_time__min']
                result = completion_query.aggregate(Max('completion_time'))
                stats.longest_completion_time = result['completion_time__max']

            stats.save()
```
The command filters the status records by state and populates a count for each. It also uses aggregates, which are wrappers around SQL functions such as MIN, MAX, and AVG. We can then run the command by invoking:
```
python manage.py queststatistics
```
In our case we can schedule the command, and any others we create, through a cron job. The task is run nightly so all the data is ready to use the next day.
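A crontab entry along these lines would do the trick (the interpreter and project paths are illustrative):

```
# Run queststatistics every night at 2 a.m.
0 2 * * * /usr/bin/python /path/to/project/manage.py queststatistics
```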
Designing the Web API
Now that we have a means to generate statistics on our data, we need a mechanism to query this information. The most flexible way to go about this is to create a Web API, which is a defined set of HTTP requests that return structured responses in XML or JSON. The benefit gained is the ability for other applications to process the data. Perhaps later on someone will want to pull the data into a spreadsheet, or create a fancy visualization of it. The possibilities are endless.
To build the API we define a set of views that respond with JSON. The view takes in the identifier for a quest and passes back the associated statistics.
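The serialization step can be sketched in plain Python. A minimal sketch, assuming the QuestStatistics field names from the management command above; the helper name is hypothetical, and in the actual view the resulting string would be returned in the HTTP response with a JSON content type:

```python
import json

def quest_statistics_payload(stats):
    """Build the JSON body for a quest's statistics.

    `stats` can be any object exposing the QuestStatistics fields,
    so the helper is easy to exercise without a database.
    """
    return json.dumps({
        'completed': stats.completed,
        'accepted': stats.accepted,
        'rejected': stats.rejected,
        'not_encountered': stats.not_encountered,
        'average_completion_time': stats.average_completion_time,
    })
```

In the view, this body would be returned with content_type='application/json' (or, in later Django versions, built directly with JsonResponse).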
We then link a URL to the view. The URL contains the parameters for the view; Django matches them using regular expressions and passes the captured values on to the view.
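Old-style Django URL patterns are plain regular expressions with named groups. A sketch of how the quest identifier gets captured; the URL path and group name are assumptions chosen to match the example URL:

```python
import re

# In urls.py this pattern would appear in a url(...) entry pointing at the view;
# the path and group name are illustrative.
QUEST_STATS_PATTERN = r'^quest_statistics/(?P<quest_id>\d+)/?$'

match = re.match(QUEST_STATS_PATTERN, 'quest_statistics/0')
quest_id = match.group('quest_id')  # Django passes this to the view as a keyword argument
```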
If we point our browser to http://127.0.0.1/quest_statistics/0 we’ll see the JSON that is returned.
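The exact payload depends on the fields the view serializes; with illustrative values it might look something like:

```
{
    "completed": 12,
    "accepted": 20,
    "rejected": 3,
    "not_encountered": 65,
    "average_completion_time": 341.5
}
```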
Also posted to my personal blog