Comments on: Data Schemas

It's a fallacy that for approach 1 you have to parse the whole of C++. We just went with a parser that could handle #include and struct declarations. The parser also understood directives embedded in comments, such as //-ignore and //-noignore, so we could embed arbitrary C++ between those comments.

Worked like a charm.
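
As a rough illustration of how such comment directives can work (a hypothetical sketch, not the actual tool described above; strip_ignored_regions is an invented name):

#include <sstream>
#include <string>

// Drop everything between //-ignore and //-noignore so that the simple
// #include/struct parser never sees the embedded C++.
std::string strip_ignored_regions(const std::string& source)
{
    std::istringstream in(source);
    std::ostringstream out;
    std::string line;
    bool ignoring = false;
    while (std::getline(in, line))
    {
        if (line.find("//-noignore") != std::string::npos)
            ignoring = false;                 // resume after this line
        else if (line.find("//-ignore") != std::string::npos)
            ignoring = true;                  // start skipping
        else if (!ignoring)
            out << line << '\n';
    }
    return out.str();
}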

By: Julien Koenen (/2011/04/15/data-schemas/#comment-2845), Sat, 16 Apr 2011 07:07:43 +0000

We use a system that is similar in idea, but different in implementation. Instead of insisting on having the source data definitions in C/C++, we use a Scheme-based language. Our data compiler (dc) then generates header files as well as binary files from the data definition files.

Usage from C++ is still fine, since you can always inherit or embed the generated structures. E.g.:

class Car : protected DC::Car
{
// ... methods, but no extra data.
};

One advantage is that you can make use of Scheme's powerful macro system to massage the input data. Something as simple as converting degrees to radians, for instance, can be done at data-compilation time. We use this for everything, including scripts, render settings, particle data (both for rendering and behaviour), you name it.
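
For illustration, the generated header plus the runtime wrapper could look roughly like this (a hypothetical sketch; the field names are invented, not actual dc output), with the degrees-to-radians conversion already applied by the data compiler:

namespace DC
{
    // Hypothetical data-compiler output.
    struct Car
    {
        float max_speed;       // stored in m/s
        float steering_angle;  // converted from degrees to radians by dc
    };
}

// Runtime wrapper, exactly as in the comment above:
class Car : protected DC::Car
{
public:
    float turning_circle() const;  // ... methods, but no extra data
};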

By: Dominique Boutin (/2011/04/15/data-schemas/#comment-2829), Fri, 15 Apr 2011 19:37:46 +0000

Hi Joseph. That is actually quite a good point that I forgot to mention explicitly: the option to generate code for different languages from the data schema. Thanks for the comment!

By: Julien Koenen (/2011/04/15/data-schemas/#comment-2827), Fri, 15 Apr 2011 19:32:01 +0000

For the Bullet SDK we also parse the structures in C++/C header files and create reflection data from there. An importer can automatically deal with versioning, pointer resolving, endianness, 32/64-bit, forward/backward compatibility, etc. So there is very little manual work involved, which is nice.

This parser is very limited, and the main "markup" is to let it skip C++ constructs that confuse the parser. I've been considering using clang as the parser for the reflection, so I'm interested in your clang project.

How did your experiments go?
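
As a rough, generic illustration of such reflection data (not Bullet's actual serialization code; the types and names here are invented), per-field records like these let an importer remap offsets, endianness and pointer sizes at load time without hand-written per-struct code:

#include <cstddef>

struct FieldInfo
{
    const char* name;
    const char* type;
    std::size_t offset;  // byte offset within the struct
    std::size_t size;    // field size on this platform
};

struct RigidBodyData  // invented example type
{
    float m_mass;
    float m_friction;
};

static const FieldInfo g_rigidBodyFields[] =
{
    { "m_mass",     "float", offsetof(RigidBodyData, m_mass),     sizeof(float) },
    { "m_friction", "float", offsetof(RigidBodyData, m_friction), sizeof(float) },
};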

By: Joseph Simons (@sigmel) (/2011/04/15/data-schemas/#comment-2821), Fri, 15 Apr 2011 18:02:32 +0000

We used sqlite on a few games. I didn't deal with it too directly, but we did have issues with memory fragmentation and speed in some cases. It was likely overkill for our purposes and we were planning on replacing it for that reason. This is probably not much of an issue in a PC environment, but on consoles, when you are using all the available resources, those were some of our concerns.

By: Niklas Frykholm (/2011/04/15/data-schemas/#comment-2818), Fri, 15 Apr 2011 16:46:07 +0000

I think your data and code can still go out of sync in the approach you mentioned, simply because the data you load may have been created with an old version of your exe.

So it would still be beneficial to be able to check the identity of the structural information used.

An asset creation tool and a game executable compiled from the same source are always in sync, but that is equally true if you define the structure in a language other than C, as long as that file is the one and only source of that information.

By: Mike D (/2011/04/15/data-schemas/#comment-2814), Fri, 15 Apr 2011 15:17:23 +0000

Nice article. A very sensible approach, and similar to what we do except for the generation of structural information. We actually use a fourth approach:

4. The "asset compiler" and the "runtime" use the same code base. (In fact it is even the same exe; asset compilation is triggered with a -compile flag, and the asset compilation code is stripped out of the release exe.)

This means that there is only a single definition of the data structure (the struct in the code base) which is used by both the asset compiler and the runtime. They can never get out of sync and we don’t need a code generation step.
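
A minimal sketch of that single-exe pattern (illustrative names and macro, not Mike D's actual code):

#include <cstdio>
#include <cstring>

static int run_asset_compiler(int, char**) { std::puts("compiling assets..."); return 0; }
static int run_game()                      { std::puts("running game...");     return 0; }

int main(int argc, char** argv)
{
#ifdef ENABLE_ASSET_COMPILER  // defined for development builds, absent in release
    if (argc > 1 && std::strcmp(argv[1], "-compile") == 0)
        return run_asset_compiler(argc - 2, argv + 2);
#endif
    (void)argc; (void)argv;
    return run_game();
}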

Can your tool deal with data that cannot be described by C structs? For example:

int32 number_of_enemies;
Enemy enemies[number_of_enemies];
int32 number_of_guns;
Gun guns[number_of_guns];

I have often thought that it would be nice to have a standard (C-like) way of describing such general data layouts.
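
For illustration, a hedged sketch of how a runtime might walk such a count-prefixed layout in a loaded blob (assuming the writer laid the arrays out contiguously; the payload types are invented):

#include <cstdint>
#include <cstring>

struct Enemy { float x, y, z; };
struct Gun   { std::int32_t ammo; };

void parse_level(const std::uint8_t* p)
{
    std::int32_t number_of_enemies;
    std::memcpy(&number_of_enemies, p, sizeof number_of_enemies);  // counts may be unaligned
    p += sizeof number_of_enemies;

    const Enemy* enemies = reinterpret_cast<const Enemy*>(p);
    p += number_of_enemies * sizeof(Enemy);

    std::int32_t number_of_guns;
    std::memcpy(&number_of_guns, p, sizeof number_of_guns);
    p += sizeof number_of_guns;

    const Gun* guns = reinterpret_cast<const Gun*>(p);

    // ... hand enemies/guns to the game. Real code would also validate
    // bounds, alignment and endianness.
    (void)enemies; (void)guns;
}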

By: Julien Koenen (/2011/04/15/data-schemas/#comment-2808), Fri, 15 Apr 2011 10:20:00 +0000

It certainly is an interesting and often overlooked problem. I've seen a fair few different approaches, including trying to pull info from the pdb files (ouch). You might also like to take a look at "Using Templates for Reflection in C++" by Dominic Filion in GPG5.

Typically we dump the data properties/format into an XML file for the tools.

Personally I think the parsing-code approach with some extra markup offers the best solution, if you put in the necessary work. Here's a clang-based parser I was experimenting with a while back: https://github.com/gwaredd/reflector

The only difference is how the structures are organized. Does each struct get its own hpp/cpp file, or are they clumped together (as certain popular middleware engines happen to do)? If there is a mega-hpp file with dozens of structs, then changing one, even if it's infrequently used, means you get to rebuild the whole thing.

By: Julien Koenen (/2011/04/15/data-schemas/#comment-2803), Fri, 15 Apr 2011 08:14:28 +0000

)

By: questor (/2011/04/15/data-schemas/#comment-2799), Fri, 15 Apr 2011 07:22:56 +0000

Thanks for the comment Noel. About the versioning: we compute a 'type-signature' for each type in a data schema. This signature is basically a normalized (no comments, minimal whitespace, no annotations) version of the data structure definition (and all dependent types, recursively).

We then store a crc32 of that in an enum inside the generated type and in all data files that use the data schema. That enables us to check whether the actual data schema is the same. As all the binary files are only intermediates, we don't need backwards compatibility, because we can always recreate the intermediates from the original source files.
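
A minimal sketch of how the generated enum and the load-time check might look (the crc32 value and type names are invented placeholders, not actual generated output):

struct EnemyData  // invented example of a generated type
{
    enum { TYPE_SIGNATURE = 0x4f3c2a17 };  // crc32 of the normalized schema text
    float health;
    float speed;
};

inline bool schema_matches(unsigned int signature_from_file)
{
    // On mismatch, the intermediate binary is simply rebuilt from source data.
    return signature_from_file == static_cast<unsigned int>(EnemyData::TYPE_SIGNATURE);
}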

By: Julien Koenen (/2011/04/15/data-schemas/#comment-2795), Fri, 15 Apr 2011 06:12:36 +0000

Hi Alex ;) No worries! We don't use the C (or C++) runtime in our game code at all. As mentioned in the code comment, it was just for the sake of the example and to keep the code short and focussed on the point.

By: Julien Koenen (/2011/04/15/data-schemas/#comment-2793), Fri, 15 Apr 2011 06:09:20 +0000

I guess that really depends on your needs. I don't see immediate advantages of a database over a text file, and I could imagine that integrating database editing into your existing source-code editing environment could be a lot harder.

By: Julien Koenen (/2011/04/15/data-schemas/#comment-2791), Fri, 15 Apr 2011 06:04:21 +0000

Yeah, great post Julien! The difficulty of managing this stuff in C/C++ is quite frustrating, and so often we end up with hand-rolled reflection systems that obfuscate the code a lot. I love the idea of being able to load the data description in code; I can think of some pretty exciting things that would be possible with that!

Another thing to be careful about is having some kind of versioning in place on the data, to prevent problems when loading old data with new code or vice versa. An incremental version number is usually sufficient to decide whether to attempt to load the data or not, or you could do something cleverer and try to fit the data to the different format.
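
A minimal sketch of the incremental-version idea (the header layout here is invented for illustration):

#include <cstdint>

struct DataHeader
{
    std::uint32_t magic;    // identifies the file type
    std::uint32_t version;  // bumped whenever the layout changes
};

static const std::uint32_t CURRENT_DATA_VERSION = 7;

inline bool can_load(const DataHeader& header)
{
    // Either reject mismatches outright, or branch into per-version
    // fix-up code that migrates old layouts to the current one.
    return header.version == CURRENT_DATA_VERSION;
}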

By: Noel Austin (/2011/04/15/data-schemas/#comment-2786), Fri, 15 Apr 2011 03:51:14 +0000

Cool post! I wish I were at liberty to talk about some of the stuff we're doing here in this area :)

By: Alex Rosenberg (/2011/04/15/data-schemas/#comment-2783), Fri, 15 Apr 2011 01:30:45 +0000

How does your setup handle build times when changing a data structure? For example, if you add a new data structure or modify an old one, how much of a recompile do you get? There are a few ways to structure your auto-generated header files and I'm curious what you settled on.

By: Gavan Woolery (/2011/04/15/data-schemas/#comment-2779), Fri, 15 Apr 2011 00:57:20 +0000

You might want to consider looking at protocol buffers (