Comments on: Database programming is fun!

This looks real familiar.

Been there, done that, come up with different solutions (I mmap() the file and abort if I can’t mmap() in the right place).

By: To all department staff! /2011/03/07/database-programming-is-fun/#comment-1926 Thu, 24 Mar 2011 13:57:03 +0000

Alex, you’re great. I’m glad you’re in the business with the same spirit (and immense genius) you were in the scene. This industry surely needs to kick out many stupid habits that make no sense, like considering code sacred. I hope MM will prosper!

By: James Podesta /2011/03/07/database-programming-is-fun/#comment-1343 Wed, 09 Mar 2011 03:54:41 +0000

And more related links… Just ran across a nice post dissecting Redis:

Ah, that makes more sense. I was about to laugh at the statement “just for the cost of a page-table copy in the OS”. Those things are absurdly expensive, especially on multi-core machines. But once every 12 hours might be OK.

By: alex /2011/03/07/database-programming-is-fun/#comment-1325 Mon, 07 Mar 2011 23:12:18 +0000

Great post Alex!

I’m not sure if you saw my post here on Leaderboards: A How-To Guide, http://altdevblogaday.org/2011/01/31/leaderboards-a-how-to-guide/, but oddly enough, I used LittleBigPlanet in one of the example scenarios. Glad to see other folks doing interesting Redis-y things, even if alexdb isn’t Redis. Looking forward to more posts like this.

Thanks for sharing your experiences in such detail! However, I’m a little confused – it seems like there’s a huge disconnect between how you talk about the need to minimize syscalls and dynamic allocation, but then go on to use a fork() per request.
fork() is implicitly going to create a brand new process with its own page tables, thread(s), etc – which is basically dynamic allocation – and multiple syscalls are involved.

Did you just decide that the cost there was justified by all the benefits you got out of it (like copy-on-write for data pages), or do you feel that the dynamic allocation and syscalls involved in a fork() operation aren’t an issue because they’re so well optimized in a modern unix OS?

Between fork() and a custom user-mode coroutine scheduler (or threadpool), if I really cared about avoiding dynamic allocation and keeping delays to an absolute minimum, I would expect the latter to be the appropriate choice. You also mention L2 cache, and I would expect that forking off dozens of workers is going to be pretty harmful to your cache since each worker is, at a minimum, going to end up with its own data pages for its stack and memory storage, and those pages will probably end up in a new place each time as the OS allocates physical storage for them when you fork() off a new process – while if you were to use a usermode scheduler based on a threadpool or coroutines, you could reuse the same stack pages and have a better chance of things staying in cache.

Anyway, just my two cents – great post :)
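For what it's worth, the copy-on-write benefit being weighed in this exchange can be demonstrated in a few lines (the names and the exit-status channel here are purely illustrative, not the article's actual design):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical in-memory "database" state. */
static int counter = 1000;

/* Serve one read-only request from a forked child: the child sees a
 * copy-on-write snapshot of the parent's memory as of fork() time, so
 * the parent can keep mutating without any synchronization. */
static int snapshot_read(int *out) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        /* Child: report the snapshot value via the exit status. */
        _exit((unsigned char)(counter % 256));
    }
    counter += 1;              /* Parent mutates after the fork...     */
    int status;
    waitpid(pid, &status, 0);  /* ...yet the child saw the old value. */
    *out = WEXITSTATUS(status);
    return 0;
}
```

The child's pages are shared read-only until either side writes, which is exactly the trade-off discussed above: cheap logical snapshots, paid for with the page-table copy and the later COW faults.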

By: alex /2011/03/07/database-programming-is-fun/#comment-1321 Mon, 07 Mar 2011 21:54:46 +0000

By: Rachel 'Groby' Blum /2011/03/07/database-programming-is-fun/#comment-1320 Mon, 07 Mar 2011 21:47:32 +0000

(FlashStore: High Throughput Persistent Key-Value Store)

Or, even worse, the NIH that invents things that are already there. I don’t know how many tools I’ve seen that want to (crappily) re-implement VM, for example.

So, really, your post is an excellent example of when to write your own, and when to build on the work of others. In a way, you’ve joined the NIH camp. Welcome! ;)

By: Sergey P /2011/03/07/database-programming-is-fun/#comment-1319 Mon, 07 Mar 2011 21:26:25 +0000

Wow, I’m happy that I’m not the only one to have travelled down this road! I can really relate to your story.

Back in the 90s I used to yawn at the mere thought of databases, while devouring CPU instruction manuals, maths- and graphics books. So exciting!

Now it’s the reverse. My challenge now is to wrestle huge datasets into submission so artists and designers can work efficiently. To this end I’ve also written a very specialized storage engine, and since I wasn’t a database programmer by trade (and didn’t go to any of the database classes @ uni since I thought it was so boring) I had to learn tons of stuff from scratch to be able to implement it. This suits me since I’m a bit OCD and my mind easily wanders if I’m not heading into new challenging territory :)

I think there’s plenty of interesting stuff in database programming even for runtime peeps. The most scalable databases are DOD and massively parallel by design, and the ‘hot’ DOD approaches of today can pretty much be described using database terms of old (PAX, column stores, random-access column compression etc). Not to mention the fact that while you might worry about missing your CPU cache the db peeps worry about hitting the disk!
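As a toy illustration of the row-store vs. column-store distinction mentioned here (the field names are invented for the example):

```c
#include <stddef.h>

/* Row store: one struct per record, fields interleaved in memory. */
struct row { int id; int score; double balance; };

static long sum_scores_rows(const struct row *rows, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += rows[i].score;  /* each read strides over id/balance too */
    return total;
}

/* Column store: each field in its own contiguous array, so a scan of
 * one column touches only the bytes it needs - friendlier to both the
 * CPU cache and the disk, which is the shared concern noted above. */
struct columns { int *id; int *score; double *balance; };

static long sum_scores_columns(const struct columns *c, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += c->score[i];    /* sequential, densely packed reads */
    return total;
}
```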

It’s been a very interesting journey and we’re definitely not done yet – kill one bottleneck and three others pop up!

And don’t utter the NIH words – what we needed truly wasn’t invented :P

By: woods /2011/03/07/database-programming-is-fun/#comment-1313 Mon, 07 Mar 2011 19:53:27 +0000

Disgraceful bit of NIH, but a wonderful post on the lessons learned so I’ll allow it ;-)
