Comments on: Redneck Cloud Computing

It might work better nowadays, but I tried using Condor about a year ago and found the latency between job submission and dispatch pretty bad. It also seemed somewhat erratic in its behaviour with regard to work distribution. It seems designed mostly for quite time-intensive work units.

By: Kyle Hayward | Mon, 28 Feb 2011 17:53:00 +0000 | /2011/02/26/redneck-cloud-computing/#comment-1102

Condor project, which seems to be more expressive than Incredibuild and can do more advanced steps like gathering results at an intermediate step.

By: Rachel Blum | Mon, 28 Feb 2011 04:04:05 +0000 | /2011/02/26/redneck-cloud-computing/#comment-1088

Caching based on your inputs is an excellent idea, but you need to carefully think about what constitutes your inputs. (There's a post by DJB on this somewhere, but I can't find the link right now.)

* Obviously, all input files
* Dependencies (e.g. .h files)
* Options to the process
* The actual executable involved
* Dependencies of the executable (e.g. DLLs/assemblies used)

Some of these you might skip (the CRT DLL version is usually not relevant, for example). Some of these you might want to apply in a modified form (instead of the actual executable, a major revision number for the executable is often a better idea, so that not every single tiny change rebuilds everything).

Also, if your data files are large and you have intermediate stages, giving jobs machine affinity so they can re-use local versions might save a lot of time.

The whole thing is an awesomely interesting bottomless time sink :)
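A minimal sketch of that cache-key idea, assuming SHA-1 content hashing; the names here (build_cache_key, file_digest, tool_version) are illustrative, not from the thread, and tool_version stands in for the "major revision number instead of the actual executable" suggestion:

```python
# Illustrative sketch, not from the post: build a cache key from everything
# that can affect the output, per the list above.
import hashlib

def file_digest(path):
    """Hash a file's contents in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_cache_key(input_files, dependency_files, options, tool_version):
    """Combine input files, dependencies, options, and the tool identity.

    tool_version is a stand-in for "the actual executable": hashing a major
    revision string instead of the binary avoids invalidating the cache on
    every tiny change to the tool itself.
    """
    h = hashlib.sha1()
    for path in sorted(input_files) + sorted(dependency_files):
        h.update(path.encode())                  # the file name matters too
        h.update(file_digest(path).encode())     # ... and its contents
    h.update(" ".join(options).encode())         # options passed to the process
    h.update(tool_version.encode())              # e.g. "texture-compiler-r42"
    return h.hexdigest()
```

Anything in the list above that this skips (DLL dependencies of the tool, for instance) would just be more update() calls on the same hash.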

By: Nick Darnell | Sun, 27 Feb 2011 21:40:14 +0000 | /2011/02/26/redneck-cloud-computing/#comment-1086

), it appears they've done the painful work of creating the device/driver but expose it to user land with a nice C++ layer and even have .NET wrappers. It looks pretty awesome.

By: Stefan Boberg | Sun, 27 Feb 2011 21:24:56 +0000 | /2011/02/26/redneck-cloud-computing/#comment-1085

I've been out-rednecked *tips hat*

By: Nick Darnell | Sun, 27 Feb 2011 00:25:40 +0000 | /2011/02/26/redneck-cloud-computing/#comment-1076

Yeah, I've definitely been thinking about caching. I haven't settled on exactly what I'll do; I'll probably try a few things and pick one. The easiest is just to keep a small, fixed-size cache on every machine and keep the most recent stuff on disk there.

I have been thinking about virtualizing the I/O access. If I create a driver and a virtual device, I can intercept all the I/O operations and use them as the trigger for bringing files over from the host machine, which should achieve similar results. I was looking at some samples from the DDK not long ago. There's not a ton of information online other than MSDN on writing one, though, so I thought I'd keep it simple for version 1.0.
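A rough sketch of the first idea, the small fixed-size per-machine cache, under assumed names (CACHE_DIR, CACHE_LIMIT_BYTES, and the caller-supplied copy_from_host are all hypothetical): a local directory capped at a byte budget, with the least-recently-used files evicted first.

```python
# Hypothetical per-worker disk cache: keep the most recent stuff locally,
# pull from the host machine only on a miss, evict oldest when over budget.
import os

CACHE_DIR = r"C:\buildcache"          # assumed location
CACHE_LIMIT_BYTES = 10 * 1024 ** 3    # assumed budget: 10 GB per worker

def _entry(name):
    return os.path.join(CACHE_DIR, name)

def fetch(key, copy_from_host):
    """Return a local path for `key`; copy_from_host(key, dest) is supplied by the caller."""
    path = _entry(key)
    if os.path.exists(path):
        os.utime(path)                # touch: mark as recently used
        return path
    os.makedirs(CACHE_DIR, exist_ok=True)
    copy_from_host(key, path)         # transfer from the host machine on a miss
    _evict_if_needed()
    return path

def _evict_if_needed():
    """Delete least-recently-used entries until the cache fits the budget."""
    entries = sorted((os.path.getmtime(_entry(n)), _entry(n))
                     for n in os.listdir(CACHE_DIR))
    total = sum(os.path.getsize(p) for _, p in entries)
    for _, p in entries:              # oldest first
        if total <= CACHE_LIMIT_BYTES:
            break
        total -= os.path.getsize(p)
        os.remove(p)
```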

By: Sam | Sat, 26 Feb 2011 23:52:20 +0000 | /2011/02/26/redneck-cloud-computing/#comment-1072

Darn, them red-necks are all growed up ;)

I.e. that’s already pretty sophisticated. Back-of-the-napkin redneck: Shared drive is “the pipe”. Instead of a work distribution server, work machines engage in work stealing. Environment requirements can be encoded in filenames to make the stealing a bit simpler, or each server just keeps trying jobs till it finds a suitable one. And a shared drive with admin-only write solves the problem of approved executables.

Mind, I think your approach is going to be better as you expand, but if you want pure redneck… :)
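A rough sketch of that shared-drive scheme, where a worker "steals" a job by atomically renaming its file and skips jobs whose filename encodes an environment it can't satisfy; the share path, the dot-separated tag convention, and run_job are illustrative assumptions, not anything from the post.

```python
# Hypothetical shared-drive work stealing: the job queue is just files on a
# share, and the rename is the arbitration -- only one worker wins the claim.
import os
import socket
import time

SHARED_DIR = r"\\fileserver\jobs"     # assumed share; approved executables live in an admin-only-write location
MY_TAGS = {"win64", "dx11"}           # what this worker's environment can run

def job_tags(filename):
    # e.g. "bake_level3.win64.dx11.job" -> {"win64", "dx11"}
    return set(filename.split(".")[1:-1])

def try_steal_one():
    """Claim one suitable job by renaming it; return the claimed path or None."""
    for name in os.listdir(SHARED_DIR):
        if not name.endswith(".job") or not job_tags(name) <= MY_TAGS:
            continue                  # wrong environment, leave it for another worker
        src = os.path.join(SHARED_DIR, name)
        dst = src + ".claimed-" + socket.gethostname()
        try:
            os.rename(src, dst)       # only one worker's rename succeeds
        except OSError:
            continue                  # somebody else stole it first
        return dst
    return None

def run_job(job_path):
    # Placeholder: a real worker would parse the job file and launch the tool.
    print("running", job_path)
    os.remove(job_path)

def worker_loop():
    while True:
        job = try_steal_one()
        if job:
            run_job(job)
        else:
            time.sleep(5)             # nothing suitable right now; poll again
```

The filename tags cover the "environment requirements encoded in filenames" variant; the "each server just keeps trying jobs" variant would drop the tag check and instead attempt the job, putting it back on failure.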

By: imm | Sat, 26 Feb 2011 21:58:39 +0000 | /2011/02/26/redneck-cloud-computing/#comment-1070