Saving and loading data is, frankly, a bit of a pain in the arse. Whilst for a lot of cases I personally wouldn’t really recommend XML (runtime overhead if used in-game, lack of good merging with most source control systems, general unreadability), for saving data in tools (and sometimes caching state too), it is hard to beat as a simple and effective mechanism – particularly if you are working in C#. The standard C# XmlSerializer library provides the functionality to read/write arbitrary objects into XML format with very little code… and, it has to be said, all the power and control of a well-greased sledgehammer.
Basically, what XmlSerializer does is to turn this:
```csharp
public class Skills
{
    public float Strength = 4.0f;
    public float Dexterity = 8.0f;
    public float Speed = 6.0f;
}

public class Character
{
    public String Name = "Mina";
    public int HP = 4;
    public Skills Skills = new Skills();
}
```
…into this:
```xml
<Character>
  <Name>Mina</Name>
  <HP>4</HP>
  <Skills>
    <Strength>4</Strength>
    <Dexterity>8</Dexterity>
    <Speed>6</Speed>
  </Skills>
</Character>
```
…and back again. More specifically, it takes any tree-like graph of C# objects and walks over them, converting public members into text format and writing them to an XML file. Deserialisation then does the reverse, constructing the objects and restoring the member values.
XmlSerializer (found in System.Xml.Serialization) is really very straightforward to use – the results above can be achieved with nothing more than the following:
```csharp
Character character = new Character();

XmlSerializer serialiser = new XmlSerializer(typeof(Character));
StreamWriter writer = new StreamWriter("Test.xml");
serialiser.Serialize(writer, character);
writer.Close();
```
And deserialisation is an equally simple affair:
```csharp
// Read back the same file that was written above
FileStream filestream = new FileStream("Test.xml", FileMode.Open);
Character character = (Character) serialiser.Deserialize(filestream);
filestream.Close();
```
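Putting the two halves together, here is a minimal round-trip sketch. It is an illustration rather than anything from our tools: it uses an in-memory StringWriter/StringReader instead of a file (my assumption, purely so that it runs without touching the disk), and the RoundTripExample class with its ToXml/FromXml helpers are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Skills
{
    public float Strength = 4.0f;
    public float Dexterity = 8.0f;
    public float Speed = 6.0f;
}

public class Character
{
    public String Name = "Mina";
    public int HP = 4;
    public Skills Skills = new Skills();
}

// Hypothetical helper class, named for illustration only.
public static class RoundTripExample
{
    // Serialises a Character into an XML string held in memory.
    public static string ToXml(Character character)
    {
        XmlSerializer serialiser = new XmlSerializer(typeof(Character));
        using (StringWriter writer = new StringWriter())
        {
            serialiser.Serialize(writer, character);
            return writer.ToString();
        }
    }

    // Rebuilds a Character from XML text produced by ToXml.
    public static Character FromXml(string xml)
    {
        XmlSerializer serialiser = new XmlSerializer(typeof(Character));
        using (StringReader reader = new StringReader(xml))
        {
            return (Character) serialiser.Deserialize(reader);
        }
    }
}
```

The using blocks make sure the reader and writer are disposed even if serialisation throws, which is worth copying into the file-based version too.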
When you consider that this will work on (virtually) any class, and requires no maintenance to support newly-added member variables or sub-classes, it really is an incredibly powerful technique to have available. Aside from the ease of use, XmlSerializer is also quite fast for what it does – whilst a pure-binary serialisation mechanism would almost certainly outperform it, I doubt that it is possible to get generalised plaintext output to be significantly better. One of our tools uses it to serialise the entire state of a large build process, and the resulting 216MB XML file loads in about 17 seconds on my machine.
And so, without further ado, here is a grab-bag of notes on how to use XmlSerializer effectively and (relatively) safely:
Customising names and types
The XmlElement attribute can be used to alter the name of a member in the XML document. For example:
```csharp
[XmlElement(ElementName = "FullName")]
public String Name;
```
…will have the effect of naming the field “FullName” in the XML. You can also use XmlElement to specify additional type information (more on that later).
Preventing fields being serialised
Normally, any public member will be serialised – there are two ways to avoid this (well, other than the obvious one of “making the member non-public”). Firstly, the XmlIgnore attribute can be used:
```csharp
[XmlIgnore]
public int SuperSecretStuff;
```
XmlIgnore is great if you always intend to omit that field, but if you want to decide whether to include it at runtime, then you need a slightly more arcane bit of trickery. The key to this is that XmlSerializer looks for a boolean member whose name is the member’s own name followed by “Specified” (so “NameSpecified” for a member called “Name”), and uses that to decide whether to serialise the member. You can make the “Specified” member a simple boolean if you want to control serialisation on a per-instance basis, but I’ve generally found that I want to do it as a global option across the whole file, which can be achieved using a custom property:
```csharp
public bool NameSpecified
{
    get
    {
        return(IncludeNames);
    }
}

public String Name;

[ThreadStatic]
public static bool IncludeNames = true;
```
One interesting thing to note here is that NameSpecified is a read-only property, and hence won’t itself get serialised – if this weren’t the case, then an XmlIgnore attribute would be necessary to prevent this. Also, using the ThreadStatic attribute on IncludeNames is necessary if you might ever serialise from multiple threads simultaneously – without this, all threads will be accessing the same static member and hence may overwrite each other’s IncludeNames setting. Be aware, though, that the “= true” initialiser on a ThreadStatic field only runs once, on whichever thread first touches the class; on every other thread IncludeNames starts out as false, so set it explicitly before serialising.
Handling dictionaries and other custom lists
Whilst XmlSerializer copes happily with many list classes, it sadly fails when presented with others (and, unfortunately, a lot of the time “fails” should be read as “silently omits the member it dislikes”). The reason for this is that it only understands how to serialise IEnumerable and ICollection objects which fit simpler patterns (see the MSDN documentation for a full list of constraints). Therefore, key/value pair dictionaries and suchlike confuse it and either cause errors or get ignored outright.
Fortunately, it is not particularly difficult to construct a custom property which returns the data in a format XmlSerializer can deal with – for example:
```csharp
[XmlIgnore]
public Dictionary<String, Character> CharactersByName = new Dictionary<String, Character>();

// Helper to serialise the dictionary (ToArray() needs "using System.Linq;")
public Character[] CharactersByNameAsList
{
    get
    {
        return (CharactersByName.Values.ToArray());
    }
    set
    {
        foreach (Character character in value)
        {
            CharactersByName.Add(character.Name, character);
        }
    }
}
```
In this example, we know that the dictionary keys are stored in the Character class itself, so we can simply flatten the dictionary down to an array of Characters for serialisation, and restore it again afterwards. If this wasn’t the case, then an intermediate class storing the key and value pair could be used instead to achieve the same effect.
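For the case where the key is not stored inside the value, the intermediate-class approach might look something like the sketch below. ScoreEntry and ScoreTable are hypothetical names invented for this example; the only real API involved is XmlIgnore and ordinary LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical intermediate class: one key/value pair in a shape
// that XmlSerializer can handle.
public class ScoreEntry
{
    public String Name;
    public int Score;
}

// Hypothetical owner of the dictionary.
public class ScoreTable
{
    [XmlIgnore]
    public Dictionary<String, int> ScoresByName = new Dictionary<String, int>();

    // Helper to serialise the dictionary as an array of key/value entries
    public ScoreEntry[] ScoresByNameAsList
    {
        get
        {
            return ScoresByName
                .Select(pair => new ScoreEntry { Name = pair.Key, Score = pair.Value })
                .ToArray();
        }
        set
        {
            // Rebuild the dictionary from the flattened entries
            ScoresByName.Clear();
            foreach (ScoreEntry entry in value)
            {
                ScoresByName[entry.Name] = entry.Score;
            }
        }
    }
}
```

Using the indexer rather than Add in the setter is a design choice on my part: it means a hand-edited file with a duplicated key overwrites the earlier entry instead of throwing.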
Specifying types
You may have noticed in our very first example code that we passed in the type of the object we were going to serialise as a parameter when creating the XmlSerializer object. One of the more annoying quirks of the XmlSerializer system is that it requires a full list of all classes which might get serialised ahead-of-time. In simple cases (such as the Skills class included in Character), it can find these itself, but if the member variable is of a base class type then XmlSerializer will not know about the actual derived object type, and will throw an (exceptionally cryptic) exception when it encounters it.
There are two ways to avoid this – one is to pass an array of types to the XmlSerializer constructor, and the other is to use the XmlInclude attribute. The latter is attached to a class and specifies additional classes which should be included in the type list when this one is encountered (for an individual member, the Type parameter of the XmlElement attribute does a similar job) – for example:
```csharp
public class SpecialSkills : Skills
{
}

[XmlInclude(typeof(SpecialSkills))]
public class Skills
{
}
```
One useful trick here is that if you don’t know the list of possible types ahead of time (if, for example, you are loading assemblies dynamically), or it’s simply too much of a pain to enumerate them all by hand, you can use reflection to build the type list. For example, the following code walks all of the assemblies in the current application domain and adds any subclasses of Skills it finds to the serialiser’s list:
```csharp
List<Type> skilltypes = new List<Type>();

foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
{
    foreach (Type type in assembly.GetTypes())
    {
        if (type.IsSubclassOf(typeof(Skills)))
        {
            skilltypes.Add(type);
        }
    }
}

XmlSerializer serialiser = new XmlSerializer(typeof(Character), skilltypes.ToArray());
```
Reusing serialiser instances
One of the oddities of XmlSerializer is that under the hood it actually generates code on-the-fly to perform the (de-)serialisation process. This code generation is the reason that occasionally when things go wrong you will get errors reported in DLLs which have random ASCII character strings as their names, and it also means that the process of creating an XmlSerializer object is really quite slow and memory-hungry, comparatively speaking. Fortunately, once you have created one, it can be reused many times, so it is highly recommended to keep them stored somewhere for this purpose. XmlSerializer does attempt to perform its own caching, but only for serialisers created through the simplest constructors (those taking just a type, or a type and a default namespace); the other overloads, including the one which takes an array of extra types, generate a fresh assembly every time and never unload it. Minimising the number of new serialisers you create at an application level is therefore highly recommended.
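As a rough illustration of what “keeping them stored somewhere” could look like, here is a small cache keyed on the root type. SerialiserCache and its Get method are hypothetical names, and the sketch assumes that a given root type is always requested with the same set of extra types (if that isn’t true for you, the key needs to include them).

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

// Hypothetical application-level cache: at most one XmlSerializer per root type.
public static class SerialiserCache
{
    private static readonly Dictionary<Type, XmlSerializer> serialisers =
        new Dictionary<Type, XmlSerializer>();
    private static readonly object cacheLock = new object();

    // Returns the cached serialiser for rootType, creating it on first use.
    // Assumption: extraTypes is the same on every call for a given rootType.
    public static XmlSerializer Get(Type rootType, Type[] extraTypes)
    {
        lock (cacheLock)
        {
            XmlSerializer serialiser;
            if (!serialisers.TryGetValue(rootType, out serialiser))
            {
                serialiser = new XmlSerializer(rootType, extraTypes ?? Type.EmptyTypes);
                serialisers.Add(rootType, serialiser);
            }
            return serialiser;
        }
    }
}
```

Usage is then simply `SerialiserCache.Get(typeof(Character), skilltypes.ToArray())` wherever a serialiser is needed; the lock only guards the dictionary, so the expensive construction happens once per type.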
Watch out for private members
As mentioned above, private members won’t get serialised – in fact, everything that XmlSerializer needs to touch must be public (including the classes themselves). However, this can be awkward sometimes because the default accessibility for class members in C# is private, and so newly-added members can sometimes drop through the cracks. There really isn’t a perfect answer to this, but one useful trick is to use reflection on classes which you know are supposed to be fully-serialised to check that they are fully public:
```csharp
FieldInfo[] private_fields = type.GetFields(BindingFlags.NonPublic | BindingFlags.Instance);
if (private_fields.Length > 0)
{
    // …Do something about it…
}
```
(for bonus points here, you could also check if private members have the XmlIgnore attribute, meaning that intentionally private marked-as-ignored fields would not cause an error)
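A sketch of the bonus-points version follows. SerialisationChecks, FindUnserialisedFields and the Inventory test class are all hypothetical names made up for the example; the reflection calls themselves (GetFields, IsDefined) are standard.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Xml.Serialization;

// Hypothetical helper, named for illustration only.
public static class SerialisationChecks
{
    // Returns the names of non-public instance fields on 'type' which are
    // NOT marked [XmlIgnore], i.e. fields which will silently go unsaved.
    public static List<string> FindUnserialisedFields(Type type)
    {
        List<string> offenders = new List<string>();
        FieldInfo[] privateFields = type.GetFields(BindingFlags.NonPublic | BindingFlags.Instance);
        foreach (FieldInfo field in privateFields)
        {
            if (!field.IsDefined(typeof(XmlIgnoreAttribute), false))
            {
                offenders.Add(field.Name);
            }
        }
        return offenders;
    }
}

// Hypothetical class used to exercise the check.
public class Inventory
{
    public int Gold = 10;

    [XmlIgnore]
    private int cachedWeight = 3;   // intentionally private and ignored: not reported

    private int itemCount = 2;      // accidentally private: reported

    public int Summary()
    {
        return Gold + cachedWeight + itemCount;
    }
}
```

Running this sort of check from a unit test or at tool start-up is probably the least intrusive place for it, since it only needs the Type and never an instance.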
Constructors
In order for XmlSerializer to actually construct the objects during deserialisation, they must have a public constructor which takes no arguments. For 99% of stuff this isn’t a big deal, but occasionally it turns out that you don’t want to allow that for some reason… unfortunately the best solution I’ve found to that problem thus far is to have a (thread static) boolean flag which is set during deserialisation, and then throw an exception in the constructor if this isn’t set.
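In code, that flag-based guard could be sketched roughly as follows. The Session class, its Load method and the “deserialising” flag are hypothetical names for the purposes of the example.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class which should normally be built via Session(String).
public class Session
{
    // True only while a deserialisation is in progress on this thread.
    [ThreadStatic]
    private static bool deserialising;

    public String User;

    // XmlSerializer requires this constructor; everyone else is turned away.
    public Session()
    {
        if (!deserialising)
        {
            throw new InvalidOperationException("Session() is reserved for deserialisation.");
        }
    }

    public Session(String user)
    {
        User = user;
    }

    // Wraps Deserialize so that the flag is always cleared again afterwards.
    public static Session Load(TextReader reader)
    {
        XmlSerializer serialiser = new XmlSerializer(typeof(Session));
        deserialising = true;
        try
        {
            return (Session) serialiser.Deserialize(reader);
        }
        finally
        {
            deserialising = false;
        }
    }
}
```

The try/finally matters here: without it, a malformed file would leave the flag set and quietly re-open the constructor to everybody on that thread.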
Be wary of object references
Unlike BinaryFormatter, XmlSerializer can only cope with structures which can be represented as a tree (as that is the form of the resulting XML). As a consequence, if you have circular references, back-links or similar then you will find that serialisation fails. The easiest way to deal with this is to mark the pointers in question with the XmlIgnore attribute, and then have a post-deserialisation function which walks through and reinstates the missing pointers. For example:
```csharp
public class Node
{
    // Call on the root (with a null parent) once deserialisation has finished
    public void PostDeserialise(Node parent)
    {
        Parent = parent;

        foreach (Node child in Children)
        {
            child.PostDeserialise(this);
        }
    }

    [XmlIgnore]
    public Node Parent;

    // Must be public, or XmlSerializer will skip it
    public List<Node> Children = new List<Node>();
}
```
A similar problem exists where the same object is referenced twice from different places – the serialisation process will effectively “flatten” the tree down and you will end up with several copies of the object when it is deserialised. Handling this is largely application-dependent, but generally comes down to either making one of the copies the “master” and using XmlIgnore on the others (restoring the pointers later), or storing a flat list/array of the duplicated objects and then using a property with custom get/set methods to turn the references into simple indices when they are serialised.
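The index-based variant might be sketched like this. Item, Party, EquippedIndex and PostDeserialise are hypothetical names; the one assumption worth flagging is that the index is resolved in a separate post-deserialisation step, since I wouldn’t want to rely on the list having been filled in by the time the index setter is called.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

// Hypothetical shared object.
public class Item
{
    public String Name;
}

// Hypothetical owner holding the flat "master" list plus a reference into it.
public class Party
{
    // The one place where each Item is written out in full.
    public List<Item> Items = new List<Item>();

    // A second reference to something in Items; never serialised directly.
    [XmlIgnore]
    public Item Equipped;

    // Serialised stand-in for Equipped: an index into Items, or -1 for none.
    public int EquippedIndex
    {
        get { return Equipped == null ? -1 : Items.IndexOf(Equipped); }
        set { equippedIndex = value; }
    }

    // Holds the loaded index until PostDeserialise resolves it.
    private int equippedIndex = -1;

    // Call once after deserialisation to turn the index back into a reference.
    public void PostDeserialise()
    {
        bool valid = equippedIndex >= 0 && equippedIndex < Items.Count;
        Equipped = valid ? Items[equippedIndex] : null;
    }
}
```

After loading and calling PostDeserialise, Equipped points at the very same object as the corresponding entry in Items, rather than at a copy of it.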
Default values are painful
I didn’t myself know about this particular pitfall of XmlSerializer until I read Paul Evans’ enlightening article on the topic “The XmlSerializer class in C# is asymmetric”. I won’t reiterate his advice here but instead strongly suggest you read it before going anywhere near the default value attributes.
For more control
If the format of the XML itself matters to you (as opposed to simply “can I save and reload this correctly?”), then there are a few more attributes which can be used to customise things further – a complete list can be found on MSDN under “Attributes That Control XML Serialization”.
If you really need to customise further, you can override the serialiser methods themselves to achieve almost any layout. Full details would be an entire article in themselves, but a good starting point is “How to: Control Serialization of Derived Classes”.
Hopefully this article has given a reasonable overview of how XmlSerializer can be used, and what it is good (and less good) for! Although an awkward beast at times, it definitely deserves a place in any C# programmer’s toolkit.
Additional reading
Whilst bouncing the content of this article off the infinite wellspring of wisdom that is the AltDev community, some other interesting tools for XML serialisation in C# came to light.
One of these, as pointed out by Glenn Watson, is the DataContract system that is available from .NET 3.0 onwards. This serialiser is more advanced than XmlSerializer, and solves several of the problems mentioned here (most notably, the need for publicly accessible fields and constructors, and the inability to serialise complex list types by default). It also gives significantly more control over the format of the generated XML, and adds functionality such as validation. However, it appears to be reliant on having attributes added to all fields in the classes being serialised, making it a little less appealing for applications which want to serialise large collections of otherwise unrelated objects – although the overall improvements in robustness may well be worth the extra effort in marking up classes.
Further documentation can be found on MSDN under Using Data Contracts.
XNA also provides another serialisation mechanism, in the form of IntermediateSerializer, which is part of the default XNA content pipeline and solves a lot of the problems XmlSerializer has in a game development context. The MSDN documentation is a little bare, but fortunately the author of the system, Shawn Hargreaves, has a wealth of information on his blog, which Paul Evans was kind enough to create a handy list of links to:
- XML and the Content Pipeline
- IntermediateSerializer vs. XmlSerializer
- How serializers work
- Teaching a man to fish
- Everything you ever wanted to know about IntermediateSerializer
- Customizing IntermediateSerializer, part 1
- Customizing IntermediateSerializer, part 2
- Why IntermediateSerializer control attributes are not part of the Content Pipeline
- Serializing collections of shared resources
Thanks
Many thanks to Kalin for his input into this article, and Glenn Watson, Nathan Runge and Paul Evans for a wealth of suggestions and information on alternative solutions.