Comments on: For the Flocks Sake

Thanks, Noel and Bjoern.

I ran the test on a Core i5, and OpenMP reports using 4 threads.

Noel, we also have a custom thread management library, but to test threading quickly and see how well DoD accommodates threading I chose OpenMP. To be honest, this is the first time I have ever written a program with OpenMP :)

Cheers
Stéphane
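
For readers who want to reproduce the thread count mentioned above, here is a minimal sketch of how a program can ask OpenMP how many threads it would use. It is not taken from the test program; the function names availableThreads and reportThreads are made up for illustration. Only omp_get_max_threads and the _OPENMP macro come from OpenMP itself.

```cpp
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the maximum number of threads OpenMP would use for a parallel
// region, or 1 when the program was compiled without OpenMP support.
int availableThreads()
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Hypothetical helper: prints the thread count the way a test harness might.
void reportThreads()
{
    std::printf("OpenMP threads: %d\n", availableThreads());
}
```

OpenMP has to be enabled at compile time (for example -fopenmp with GCC or Clang, /openmp with MSVC); without it the sketch simply reports one thread.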

By: Bjoern Knafla (/2011/04/07/for-the-flocks-sake/#comment-2760), Thu, 14 Apr 2011 17:36:16 +0000

Yeah, I forgot to mention that I turned off STL iterator debugging. I also ran with FastMath, but since this is a comparison rather than a set of outright timings, it's not a big deal. Interesting results you got; I'd like to know which CPU you were running on.

I've not had much experience with OpenMP or other high-level multithreading libraries. The engines I've worked with have task-based multithreaded systems, which allow finer control over how resources and tasks are distributed. If you have a single-threaded app that you want to speed up, I can see how OpenMP would make it super easy to thread the number-crunching parts!

Thanks for taking the time to investigate. I'm impressed with the MP results, and it's a good point that having the data in a cache-friendly layout makes threading easier!

By: Bjoern Knafla (/2011/04/07/for-the-flocks-sake/#comment-2758), Thu, 14 Apr 2011 17:32:02 +0000

Thanks for the code and article.

I was about to make something similar to get my hands on DoD, so thanks for making it first :)

I grabbed the code and made some minor changes.
First, I removed the STL debugging facilities so that the compiler inlines more code and drops the ugly iterator checking.
Then I added basic OpenMP directives to the streamed dude; thanks to DoD for making it already OpenMP-compliant by nature :)

Here are the results.
Original code:
===============================================
numDudes: 1000 numIterations: 100

Dude_Original: 69.87 1.98 59.80 1.76
Dude_Original: 71.52 2.02 60.16 1.78
Dude_Original: 70.35 2.01 59.96 1.76
Dude_Stream: 56.77 10.17 1.47 0.68
Dude_Stream: 56.86 10.17 1.46 0.69
Dude_Stream: 56.81 10.18 1.47 0.68

===============================================
Average times for searching neighbours per dude:
Original: 70.58
Contig : 59.98 (x 1.18)
Stream : 56.81 (x 1.24)
SIMD : 10.17 (x 6.94)

Average times for updating dude:
Original: 2.00
Contig : 1.77 (x 1.13)
Stream : 1.46 (x 1.37)

With OpenMP:
===============================================
numDudes: 1000 numIterations: 100

Dude_Original: 70.34 2.23 59.93 1.73
Dude_Original: 69.89 2.07 59.44 1.69
Dude_Original: 68.91 2.00 59.35 1.70
Dude_Stream: 15.43 2.68 0.43 0.19
Dude_Stream: 15.55 2.76 0.44 0.19
Dude_Stream: 15.67 2.76 0.44 0.19

===============================================
Average times for searching neighbours per dude:
Original: 69.71
Contig : 59.57 (x 1.17)
Stream : 15.55 (x 4.48)
SIMD : 2.74 (x 25.48)

Average times for updating dude:
Original: 2.10
Contig : 1.71 (x 1.23)
Stream : 0.44 (x 4.80)

As you can see, with only three #pragma omp parallel for directives added, the increase in performance is impressive!

Cheers
Stéphane
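
To illustrate the kind of change described in the comment above, here is a minimal sketch of a streamed (structure-of-arrays) update loop with a single OpenMP directive. It is not the code from the article or from the modified test: the DudeStream struct, its fields, and updatePositions are hypothetical names chosen for the example. The point it tries to show is that when each iteration reads and writes only its own index, the loop can be split across threads without locks.

```cpp
#include <cassert>
#include <vector>

// Hypothetical structure-of-arrays layout: one contiguous array per field.
struct DudeStream
{
    std::vector<float> posX, posY;
    std::vector<float> velX, velY;
};

// Advances every dude by dt. Each iteration touches only index i, so the
// iterations are independent and can safely run on different threads.
void updatePositions(DudeStream& dudes, float dt)
{
    assert(dudes.posX.size() == dudes.posY.size());
    assert(dudes.posX.size() == dudes.velX.size());
    assert(dudes.posY.size() == dudes.velY.size());

    // Signed index: older OpenMP versions require a signed loop variable.
    const long count = static_cast<long>(dudes.posX.size());

    // Without OpenMP enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long i = 0; i < count; ++i)
    {
        dudes.posX[i] += dudes.velX[i] * dt;
        dudes.posY[i] += dudes.velY[i] * dt;
    }
}
```

The same function gives identical results with or without OpenMP, which makes it easy to compare serial and threaded timings from one source file.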

By: Noel Austin (/2011/04/07/for-the-flocks-sake/#comment-2593), Mon, 11 Apr 2011 18:36:55 +0000

Nice experiment - thanks for describing it!

How do you organize the data for the nearest-neighbor search? Do you use a brute-force approach where each agent iterates over an array of last-agent positions, or something more spatially organized?

Though for "merely" ;-) 1,000 agents the brute-force method might always win out, especially when taking the implementation complexity and time into account.

Cheers,
Bjoern
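
As a point of reference for the question above, the brute-force approach can be sketched as follows. This is only an illustration and not necessarily what the article's code does; the function name countNeighbours, the 2D positions, and the radius parameter are assumptions made for the example. Every agent scans one contiguous array of positions, so for 1,000 agents each pass performs roughly a million distance checks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts, for each agent, how many other agents lie within `radius`.
// Brute force: every agent is compared against every other agent.
std::vector<int> countNeighbours(const std::vector<float>& posX,
                                 const std::vector<float>& posY,
                                 float radius)
{
    assert(posX.size() == posY.size());
    const std::size_t count = posX.size();
    const float radiusSq = radius * radius;  // compare squared distances
    std::vector<int> neighbours(count, 0);

    for (std::size_t i = 0; i < count; ++i)
    {
        for (std::size_t j = 0; j < count; ++j)
        {
            if (i == j)
                continue;  // an agent is not its own neighbour
            const float dx = posX[j] - posX[i];
            const float dy = posY[j] - posY[i];
            if (dx * dx + dy * dy <= radiusSq)
                ++neighbours[i];
        }
    }
    return neighbours;
}
```

A spatial structure such as a uniform grid would reduce the number of comparisons, at the cost of extra code and bookkeeping, which is the trade-off the comment alludes to.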
