Three options for data correctness

Instapaper Text

Three options for data correctness

I just read the posts and the linked blogs, I had a question about some specific implementations. How do you deal with classes that represent another non-[in this case]-Python entity that may be updated outside of Python?

I’m not sure if this sort of case is outside of the scope of what’s being talked about in the articles, but if there’s a better way to do getting on things like p4 paths or elements in a Maya file (that may have been changed by the user since instantiating/loading the object) I’d really like some ideas about that.

You basically have three options and fortunately they line up easily on a scale:

Technique	Correct	Difficulty
Transactions	Always	High
Fetch-on-demand	Usually	Medium
Store in memory	Maybe	Low

Let’s get on the same page first. Let’s consider all three types of interactions- database through a DAL, perforce (or any source control) interaction, and interaction with some host application (Maya, or your engine, or whatever). So what are the three approaches and how do they differ?

Store in Memory

You create a code object with a given state, and you interact with that code object. Every set either pushes changes, or you can push all changes at once. So for example, if you have a tool that works with some Maya nodes, you create the python objects, one for each node, when you start the tool. When you change one of the python objects, it pushes its changes to the tool.

This is the simplest to reason about and implement. However, the difficultly quickly becomes managing its correctness. You need to lock people out of making changes (like deleting the maya node a python object refers to), which is pretty much impossible. Or you need to keep the two in sync, which is incredibly difficult (especially since you have any number of systems running concurrently trying to keep things in sync). Or you just ignore the incorrectness that will appear.

It isn’t that this is always bad, more that it is a maintenance nightmare because of all sorts of race conditions and back doors. Not good for critical tools that are editing any sort of useful persistent data. And in my opinion, the difficulties with correctness are not worth the risk. While the system can be easy to reason about, it is only easy to reason about because it is very incomplete and thus deceivingly simple. So what is better?

Fetch on Demand

Here, instead of storing objects in two places (your code’s memory, and where they exist authoritatively, like the Maya scene, or a Perforce database), you store them only where they exist authoritatively and create the objects when that data is queried. So instead of working with a list of python objects as with Store in Memory, you’d always query for the list of Maya nodes (and create the python object you need from it).

This can be simple to reason about as well but can also be quite slow, depending on your dependency. If you’re hitting a DB each time, it will be slow. If you need to build complex python objects from hundreds of Maya or Max calls, it will be slow. If you need to query Perforce each time, it will be slow.

I should note that this is really just a correctness improvement upon Store in Memory and the workings are really similar. The querying of data is only superior because it is done more frequently (so it is more likely to be correct). The changing of data is only more likely to be correct because it will have had less time to change since querying.

That said, in many cases the changing of data will be correct enough. In a Maya scene, for example, this will always be correct on the main thread because the underlying Maya nodes will not be modified by another thread. In the case of Perforce, it may not matter if the file has changed (let’s say, if someone has checked in a new revision when your change is to sync a file).

Transactions

Transactions should be familiar to anyone who knows about database programming or has read about Software Transactional Memory. I’m going to simplify at the risk of oversimplifying. When you use a transactions, you start a transaction, do some stuff (to a ‘copy’ of the ‘real’ data), and commit the transaction. If the ‘real’ data you are reading or updating has changed, the whole transaction fails, and you can abort the transaction, or keep trying until it succeeds.

Mass simplification but should be enough for our purposes. This is, under the hood, the guaranteed behavior of SCM systems and all databases I know of. The correctness is guaranteed (as long as the implementation is correct, of course). However, it is difficult to implement. It is even difficult to conceptualize in a lot of cases. There are lots of user-feedback implications: an ‘increment’ button should obviously retry a transaction, but what if it’s a spinner? Are you setting an explicit value, or just incrementing? Regardless, where you need correctness in a concurrent environment, you need transactions. The question is, do you need absolute correctness, or is ‘good enough’ good enough?

Recommendations

Avoid Store in Memory. If you design things this way, break the habit. It is a beginner’s mistake that I still make from time to time. Use Fetch on Demand instead. It should be your most common pattern for designing your tools.

Be careful if you think you need Transactions. Ensure they are where they need to be (database, SCM), but don’t just go around designing everything as if it needs to be transactional. If you have two programs that can edit the same file- is one or the other just winning OK? How likely is that to happen? How will you indicate the failed transaction to the user? I’d suggest designing your tools so transactions are not necessary, and just verify things are correct when they cross an important threshold (checkin, export, etc.). Do your cost-benefit analysis. A highly concurrent system will need transactions, tools that only work with local data will likely not.

It should be clear, but still worth pointing out, you can mix-and-match these patterns inside of your designs.

Hope that clarifies things, Tyler.

#AltDevBlogADay

Rob-Galanakis