Comments on: Preparing for Parallelism

Just to let you know, I will post something else to do with this, which includes some source code snippets. I've just had quite a busy last few days!

By: gdev (#comment-670), Thu, 20 Jan 2011 10:18:15 +0000

The next sentence states that if you then design your system around this fact (separating out into phases where you can ensure a correct state), the update phase can run fine in parallel. It will not crash (unless you introduce a bug, of course!). As I said in the article, a crash would be a read of invalid data. The x1/x2 point, in my view, is a non-example; it's a very simple problem to extract out and handle in a way that would not raise issues. Sorry if the terms I have used are confusing. I may post something in the next few days to explain with an example :)

By: Dmitry Vyukov (#comment-668), Thu, 20 Jan 2011 09:47:24 +0000

This is about realising that the operations that happen to cause data races are those which happen quite infrequently; if you design the system around that fact you can cope quite well in a game-frame sense, and allow tasks to run in parallel. As I said in the article, structural data changes are quite small in the grand scheme of things, and they are the type of operation that, if you re-think them, can open up scope for completely lockless operation. The sorts of actions which cause the main issues are usually surprisingly high level, and certainly for the stuff I do they rarely cause low-level worry (even though I am interacting with low-level data, in a sense). The point about there being no incorrect data is a very good one. It means you have rewritten your code in such a way that you do not need the mutexes, so get rid of them: they are the fast food of parallel thinking! :) I will post more on the semantics in the future, as it seems I have more explaining to do. I'll link to it from here in the comments, when the time comes.

By: Dmitry Vyukov (#comment-666), Thu, 20 Jan 2011 08:05:41 +0000

Dmitry, do you mean: if the "incorrect" data to be read consists of multiple variables/elements that need to be updated together to not break, and the data-reading function is scheduled in a thread alongside a writing thread, then syncing is needed so as not to garble the data in memory? Using no syncing in such a situation needs a lot of know-how about how data garbling would affect the reading thread/core. To not fall into data-lifetime traps, you'd also need in-depth know-how about the platform and how it manages its caches.

By: Colin Riley (#comment-664), Wed, 19 Jan 2011 17:51:48 +0000

> This might be because the memory hasn't been synced between cores

Ok, and here we return to the x1/x2 example. If data is written by a thread but not yet completely synced between processors, then it may happen that x1 is synced and x2 is not. So one needs atomics and memory fences *inside* of a phase, and that's not something you want to get a defenceless programmer into with general advice.

> or because there is an old data buffer used to decouple phases

If there is explicit double buffering, then there is no incorrect/inconsistent data; there is just what the programmer explicitly asked for.
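The hazard described here (a reader observing x1 already synced but x2 not) is what a sequence-counter guard, the SeqLock scheme mentioned elsewhere in this thread, is designed to catch. A minimal single-writer sketch, using the hypothetical `g_x1`/`g_x2` names from the discussion (note: a production SeqLock would also make the payload fields relaxed atomics to satisfy the C++ memory model):

```cpp
#include <atomic>
#include <cstdint>
#include <utility>

// Two related globals that must always be observed together
// (names follow the x1/x2 example in the discussion above).
static std::atomic<uint32_t> g_seq{0};
static int g_x1 = 0;
static int g_x2 = 0;

// Writer: bump the sequence to odd, mutate, bump back to even.
// The fences are what actually publish the pair between cores;
// without them a reader could see x1 updated and x2 stale.
void write_pair(int x1, int x2) {
    uint32_t s = g_seq.load(std::memory_order_relaxed);
    g_seq.store(s + 1, std::memory_order_relaxed);   // odd: write in progress
    std::atomic_thread_fence(std::memory_order_release);
    g_x1 = x1;
    g_x2 = x2;
    std::atomic_thread_fence(std::memory_order_release);
    g_seq.store(s + 2, std::memory_order_relaxed);   // even again: stable
}

// Reader: retry until the sequence was even and unchanged across the read,
// which guarantees the two values came from the same write.
std::pair<int, int> read_pair() {
    for (;;) {
        uint32_t before = g_seq.load(std::memory_order_acquire);
        if (before & 1) continue;                    // writer mid-update
        int x1 = g_x1;
        int x2 = g_x2;
        std::atomic_thread_fence(std::memory_order_acquire);
        uint32_t after = g_seq.load(std::memory_order_relaxed);
        if (before == after) return {x1, x2};        // snapshot is consistent
    }
}
```

The reader never blocks the writer; it simply retries a torn snapshot, which suits the "infrequent structural change" pattern discussed above.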

By: Bjoern Knafla (#comment-662), Wed, 19 Jan 2011 17:17:49 +0000

Then what is your tip 3 about? Isn't it about situations where you don't want to solve concurrency problems simply by suppressing concurrency? If we reason your way, then there is no need for Incorrectness, because you can "simply" ensure mutual exclusion between reads and writes, and then there is nothing to solve.

By: Colin Riley (#comment-660), Wed, 19 Jan 2011 17:00:07 +0000

> thats when you would fall back to other methods

Indeed. And it's when you *need* to consider the problem with x1 and x2 and how to solve it, and when you can't just say "I killed all concurrency, and that solves all my concurrency problems".

By: Colin Riley (#comment-658), Wed, 19 Jan 2011 16:26:09 +0000

> In that example, I'd have the Read Phase precache

Yeah, of course it's possible to solve any concurrency problem by making things not concurrent, but then you lose parallelism. Actually, it's possible to ensure consistency by means of a SeqLock.

Thanks for the comments so far. Ben, it seems to be a common occurrence; suddenly everything appears so much simpler :) Dmitry, you are correct. In that example, I'd have the Read Phase precache g_x1 and g_x2, and then they would only be written in the Write Phase. In that situation you could ensure data reads always followed previous writes, as Phases are defined not to run in parallel together. Finding those issues is indeed tricky and presents the usual parallel perils. Not every use case will fit into these Phases perfectly, but many things will.
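A minimal sketch of the phase idea described here, under assumed names (`g_x1`/`g_x2` and `TaskState` are hypothetical): a single-threaded Read Phase snapshots the globals, an Update Phase runs tasks fully in parallel against those private snapshots, and a single-threaded Write Phase commits the results. The join between phases is the barrier that guarantees reads always follow previous writes:

```cpp
#include <thread>
#include <vector>

// Hypothetical shared state, following the x1/x2 example.
static int g_x1 = 1;
static int g_x2 = 2;

struct TaskState {
    int x1, x2;   // private snapshot taken in the Read Phase
    int out;      // result computed in the Update Phase
};

// Read Phase: single-threaded, snapshot shared state into each task.
void read_phase(std::vector<TaskState>& tasks) {
    for (auto& t : tasks) { t.x1 = g_x1; t.x2 = g_x2; }
}

// Update Phase: tasks run in parallel; each touches only its own
// snapshot, so no locks are needed. Joining is the phase barrier.
void update_phase(std::vector<TaskState>& tasks) {
    std::vector<std::thread> pool;
    for (auto& t : tasks)
        pool.emplace_back([&t] { t.out = t.x1 + t.x2; });
    for (auto& th : pool) th.join();
}

// Write Phase: single-threaded again, commit results to shared state.
void write_phase(const std::vector<TaskState>& tasks) {
    for (const auto& t : tasks) { g_x1 = t.out; g_x2 = t.out; }
}
```

Because the phases never overlap, the Update Phase can be completely lockless even though the globals are shared.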

By: Dmitry Vyukov (#comment-655), Wed, 19 Jan 2011 14:06:47 +0000

Great article! And a very good point indeed about the benefits of considering whether incorrect data is "good enough" for your scenario. It's way too easy to set out down an "everything must be 100% perfect" path and lose out on a lot of speed, especially in the case of things like particle systems where (generally speaking) as long as it looks OK you don't need to lose sleep over how accurate it is.

We had a very good case of this many years back on a PS2 game I worked on called Battle Engine Aquila. Basically, the game maps had absolutely ridiculous numbers of trees on them, each of which could be individually knocked over and destroyed if the player so desired. The rendering of all these was handled by some custom VU1 code, but we had the problem that because VU1 and the CPU were running in parallel, the update code had to double-buffer all the output data lest something unpleasant slip through to the rendering.

We struggled to get the performance of that approach up to acceptable levels before realising exactly your point: whilst rewriting the data for a tree completely could cause nasty rendering errors, if we used a sparse array for the data (so trees never changed index), and ensured that deleted trees weren't recycled until a couple of frames later, the worst that would happen in the case of a GPU/CPU race condition was that a tree would be a frame ahead/behind of itself when animating, or exist for a frame longer than it should after dying.

With that change in place, the CPU-side workload dropped massively (no expensive buffer copies, or indeed any need to touch trees that weren't currently "active" in some way), and we stopped having to care about the sequencing with the GPU... and I never heard a single person complain about trees not dying when they were supposed to.
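The sparse-array-with-grace-period trick in this story can be sketched roughly as follows (all names and the two-frame delay are illustrative assumptions, not the actual Battle Engine Aquila code): slots keep a fixed index for the lifetime of a tree, and a dead slot is only handed out again once enough frames have passed that a renderer racing a frame behind can no longer be reading it:

```cpp
#include <cstddef>
#include <vector>

// Grace period before a dead slot may be reused; assumed value.
constexpr int kRecycleDelayFrames = 2;

struct TreeSlot {
    bool alive = false;
    int  died_on_frame = -1;   // -1: never occupied or long recycled
};

class TreeArray {
public:
    explicit TreeArray(std::size_t n) : slots_(n) {}

    // Spawn into a free slot whose grace period has expired. Trees never
    // change index, so a stale read stays structurally valid.
    int spawn(int frame) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            TreeSlot& s = slots_[i];
            bool recyclable = !s.alive &&
                (s.died_on_frame < 0 ||
                 frame - s.died_on_frame >= kRecycleDelayFrames);
            if (recyclable) {
                s.alive = true;
                s.died_on_frame = -1;
                return static_cast<int>(i);
            }
        }
        return -1;   // array full (or everything still in its grace period)
    }

    // Kill marks the slot dead but leaves its data untouched; the worst a
    // racing reader sees is a tree lingering for an extra frame.
    void kill(int idx, int frame) {
        slots_[idx].alive = false;
        slots_[idx].died_on_frame = frame;
    }

private:
    std::vector<TreeSlot> slots_;
};
```

The payoff is exactly as described above: no double buffering and no cross-device synchronisation, at the cost of occasionally rendering a tree one frame out of date.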

By: Dmitry Vyukov (#comment-653), Wed, 19 Jan 2011 10:47:04 +0000

Just a thought: locking and syncing are control-flow ingredients. They are called from code; as data they are just passive. Structuring control flow into read-update-write phases/stages is the most "natural" consequence. The other reason to structure the code/control flow around data-access phases is to coordinate use of the weakest and most important link for performance in the parallel computing world: the memory hierarchy. You can't get performance without working with it.

A good and fundamental read for understanding scalable and less error-prone parallelism. Thanks Colin!
