About Monkey 2 › Forums › Monkey 2 Projects › Structs, Classes and Cache Efficiency
This topic contains 2 replies, has 1 voice, and was last updated by
arpie
2 years, 4 months ago.
-
AuthorPosts
-
November 26, 2016 at 11:10 pm #5449
I recently bought this book : http://gameprogrammingpatterns.com/ and was fascinated by this chapter in particular: http://gameprogrammingpatterns.com/data-locality.html
TLDR: improving data locality can improve cache efficiency and can have enormous effects on speed of execution.
I wondered if I could find a significant difference between using arrays of Structs vs using arrays of Class objects in Monkey2 as Class object arrays are (afaik) arrays of pointers, wheras arrays of Structs ar stored in contiguous memory. Once I got it right, the results were surprising.
It seems to be that Structs are a lot faster (up to 20x faster!) than Class objects for large numbers (thousands) of entities, and coding to optimise data locality is potentially worth considering… but this is a dark art and we all know not to waste time prematurely optimising code.
I would be delighted if we ever get a reasonably safe way to use pointers into arrays of Structs but as far as I understand this is difficult to achieve in Monkey2 (using the undocumented (?) varptr and ptr).
Your comments, results on other hardware, critiques of my code, etc. will be very welcome.
This is the code I ended up with (NOTE: You must run it in release mode to get accurate results). Hopefully the comments in the code will make it clear what is happening :
Monkey123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191#Import "<std>"#Import "<mojo>"Using std..Class TestClassField thing:IntEnd ClassStruct TestStructField thing:IntEnd StructFunction Main()'Run tests for different sized arrays, increasing each time by a factor'of ten. Stop the tests if/when your machine starts to struggle!For Local i := 2 To 6 'That's 100, 1000, 10000, 100000 and 1millionRunTest(Pow(10, i))NextEndFunction RunTest(n:Int)'Variable for timing measurementsLocal t0:IntPrint("")Print("")Print("Initialising test using " + n + " array members. Please wait...")'Seed the RNG'============'Try to seed the RNG as randomly as possible (probably this is overkill!).While (Millisecs() < 10)Local seed:Int = Millisecs() + Microsecs()SeedRnd(seed)Print(Rnd())Sleep(Rnd())End While'Generate some random numbers for random array access testing'============================================================'We generate two arrays in advance :'One contains ordered numbers (ie array[x] = x) and'The other contains random numbers (ie. array[x] = Rnd())'This way, both sets of tests will produce timings that can be directly compared with'each other without skewing results with extra array lookups or calls to Rnd().Local ordered := New Int[n]Local random := New Int[n]For Local i := 0 To n-1ordered[i] = irandom[i] = Rnd(n-1)Next'Generate an array of TestClass objects, all in quick succession'==============================================================='This method of allocating memory for the array should maximise'the likelyhood of objects being allocated contiguously in memory.'This should allow the most efficient array operations.Local classarray1 := New TestClass[n]For Local i := 0 To n-1classarray1[i] = New TestClass()Next'Generate an array of TestClass objects, with junk allocations between array members'==================================================================================='This method of allocating memory for the array should spread the objects out and'reduce the likelyhood of neighbouring array members being cached during array'operations, causing a lot of cache misses and slowness. I think this will most'closely simulate common coding practices where a pool is not used for allocating'new entities as and when needed.Local classarray2 := New TestClass[n]For Local i := 0 To n-1classarray2[i] = New TestClass()'Now allocate a bunch of new classes without assigning them to anything to fill'some space between real array membersFor Local i := 0 To 100New TestClass()NextNext'Generate an array of TestClass objects, in a random order'========================================================='This will keep the array within a localised area of memory but will still'degrade performance by causing a lot of cache misses.'This should perform slower than classarray1 but faster than classarray2'and is mostly here just to confirm that my theory of what is happening is correct.Local classarray3 := New TestClass[n]For Local i := 0 To n-1'Random order Class instantiationclassarray3[Rnd(n-1)] = New TestClass()NextFor Local i := 0 To n-1' Now fill in any gaps in that random set of objectsIf classarray3[i] = Null Then classarray3[i] = New TestClass()Next'Generate an array of TestStruct objects'======================================='These objects will be generated in a nice, neat contiguous chunk, this being'one of the advantages of Structs over Classes. It should be lightning fast'for array operations, being the most cache efficient and having the fewest'indirections (probably!) in the generated C code.Local structarray := New TestStruct[n]'Run the tests'============='All tests perform a very simple array operation (adding 1 to the object 'thing''field). The intention is to measure as closely as possible the cost of array'lookups without any avoidable overhead caused by actually doing anything to the'array objects.'Each test performs the same operation on consecutive or random array members,'looping continuously over the array until ten million operations have been'performed, thus allowing all timings to be directly compared (in theory!).Print ""Print "Testing rate of linear access of all array members."t0 = Millisecs()For Local i := 0 To 10000000/nFor Local j := 0 To n-1classarray1[ordered[j]].thing += 1NextNextPrint("Linear ClassArray1 access took " + (Millisecs() - t0) + "ms")t0 = Millisecs()For Local i := 0 To 10000000/nFor Local j := 0 To n-1classarray2[ordered[j]].thing += 1NextNextPrint("Linear ClassArray2 access took " + (Millisecs() - t0) + "ms")t0 = Millisecs()For Local i := 0 To 10000000/nFor Local j := 0 To n-1classarray3[ordered[j]].thing += 1NextNextPrint("Linear ClassArray3 access took " + (Millisecs() - t0) + "ms")t0 = Millisecs()For Local i := 0 To 10000000/nFor Local j := 0 To n-1structarray[ordered[j]].thing += 1NextNextPrint("Linear StructArray access took " + (Millisecs() - t0) + "ms")Print ""Print "Testing rate of random access of array members."t0 = Millisecs()For Local i := 0 To 10000000/nFor Local j := 0 To n-1classarray1[random[j]].thing += 1NextNextPrint("Random ClassArray1 access took " + (Millisecs() - t0) + "ms")t0 = Millisecs()For Local i := 0 To 10000000/nFor Local j := 0 To n-1classarray2[random[j]].thing += 1NextNextPrint("Random ClassArray2 access took " + (Millisecs() - t0) + "ms")t0 = Millisecs()For Local i := 0 To 10000000/nFor Local j := 0 To n-1classarray3[random[j]].thing += 1NextNextPrint("Random ClassArray3 access took " + (Millisecs() - t0) + "ms")t0 = Millisecs()For Local i := 0 To 10000000/nFor Local j := 0 To n-1structarray[random[j]].thing += 1NextNextPrint("Random StructArray access took " + (Millisecs() - t0) + "ms")EndAnd the results on my i5-powered laptop are :
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576Initialising test using 100 array members. Please wait...0.41406254938420939Testing rate of linear access of all array members.Linear ClassArray1 access took 8msLinear ClassArray2 access took 11msLinear ClassArray3 access took 8msLinear StructArray access took 7msTesting rate of random access of array members.Random ClassArray1 access took 8msRandom ClassArray2 access took 9msRandom ClassArray3 access took 9msRandom StructArray access took 6msInitialising test using 1000 array members. Please wait...Testing rate of linear access of all array members.Linear ClassArray1 access took 19msLinear ClassArray2 access took 66msLinear ClassArray3 access took 19msLinear StructArray access took 6msTesting rate of random access of array members.Random ClassArray1 access took 11msRandom ClassArray2 access took 16msRandom ClassArray3 access took 12msRandom StructArray access took 6msInitialising test using 10000 array members. Please wait...Testing rate of linear access of all array members.Linear ClassArray1 access took 29msLinear ClassArray2 access took 96msLinear ClassArray3 access took 29msLinear StructArray access took 7msTesting rate of random access of array members.Random ClassArray1 access took 34msRandom ClassArray2 access took 66msRandom ClassArray3 access took 36msRandom StructArray access took 9msInitialising test using 100000 array members. Please wait...Testing rate of linear access of all array members.Linear ClassArray1 access took 90msLinear ClassArray2 access took 264msLinear ClassArray3 access took 164msLinear StructArray access took 7msTesting rate of random access of array members.Random ClassArray1 access took 161msRandom ClassArray2 access took 272msRandom ClassArray3 access took 252msRandom StructArray access took 25msInitialising test using 1000000 array members. Please wait...Testing rate of linear access of all array members.Linear ClassArray1 access took 127msLinear ClassArray2 access took 291msLinear ClassArray3 access took 261msLinear StructArray access took 16msTesting rate of random access of array members.Random ClassArray1 access took 1340msRandom ClassArray2 access took 1675msRandom ClassArray3 access took 980msRandom StructArray access took 86msFinished running app.November 26, 2016 at 11:18 pm #5450And on my Sony Xperia D5803 running Android 6.0.1 the results aren’t too shabby either :
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859Initialising test using 100 array members. Please wait...Testing rate of linear access of all array members.Linear ClassArray1 access took 74msLinear ClassArray2 access took 67msLinear ClassArray3 access took 50msLinear StructArray access took 38msTesting rate of random access of array members.Random ClassArray1 access took 69msRandom ClassArray2 access took 74msRandom ClassArray3 access took 47msRandom StructArray access took 27msInitialising test using 1000 array members. Please wait...Testing rate of linear access of all array members.Linear ClassArray1 access took 106msLinear ClassArray2 access took 320msLinear ClassArray3 access took 105msLinear StructArray access took 38msTesting rate of random access of array members.Random ClassArray1 access took 99msRandom ClassArray2 access took 351msRandom ClassArray3 access took 102msRandom StructArray access took 28msInitialising test using 10000 array members. Please wait...Testing rate of linear access of all array members.Linear ClassArray1 access took 85msLinear ClassArray2 access took 418msLinear ClassArray3 access took 121msLinear StructArray access took 50msTesting rate of random access of array members.Random ClassArray1 access took 260msRandom ClassArray2 access took 673msRandom ClassArray3 access took 232msRandom StructArray access took 108msInitialising test using 100000 array members. Please wait...Testing rate of linear access of all array members.Linear ClassArray1 access took 166msLinear ClassArray2 access took 746msLinear ClassArray3 access took 603msLinear StructArray access took 51msTesting rate of random access of array members.Random ClassArray1 access took 983msRandom ClassArray2 access took 1218msRandom ClassArray3 access took 1257msRandom StructArray access took 154msInitialising test using 1000000 array members. Please wait...Testing rate of linear access of all array members.Linear ClassArray1 access took 295msLinear ClassArray2 access took 832msLinear ClassArray3 access took 819msLinear StructArray access took 57msTesting rate of random access of array members.Random ClassArray1 access took 1886msRandom ClassArray2 access took 1923msRandom ClassArray3 access took 1964msRandom StructArray access took 688msNovember 27, 2016 at 9:38 am #5453Finally, here are some results on a 2-year-old lenovo yoga B8000 android tablet. Not sure why the last line is missing.
Monkey1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859Initialising test using 100 array members. Please wait...Testing rate of linear access of all array members.Linear ClassArray1 access took 124msLinear ClassArray2 access took 129msLinear ClassArray3 access took 110msLinear StructArray access took 91msTesting rate of random access of array members.Random ClassArray1 access took 118msRandom ClassArray2 access took 140msRandom ClassArray3 access took 111msRandom StructArray access took 84msInitialising test using 1000 array members. Please wait...Testing rate of linear access of all array members.Linear ClassArray1 access took 122msLinear ClassArray2 access took 322msLinear ClassArray3 access took 180msLinear StructArray access took 86msTesting rate of random access of array members.Random ClassArray1 access took 129msRandom ClassArray2 access took 262msRandom ClassArray3 access took 181msRandom StructArray access took 86msInitialising test using 9999 array members. Please wait...Testing rate of linear access of all array members.Linear ClassArray1 access took 240msLinear ClassArray2 access took 1452msLinear ClassArray3 access took 388msLinear StructArray access took 94msTesting rate of random access of array members.Random ClassArray1 access took 359msRandom ClassArray2 access took 537msRandom ClassArray3 access took 365msRandom StructArray access took 121msInitialising test using 100000 array members. Please wait...Testing rate of linear access of all array members.Linear ClassArray1 access took 495msLinear ClassArray2 access took 1753msLinear ClassArray3 access took 1522msLinear StructArray access took 144msTesting rate of random access of array members.Random ClassArray1 access took 2166msRandom ClassArray2 access took 2673msRandom ClassArray3 access took 2533msRandom StructArray access took 285msInitialising test using 999999 array members. Please wait...Testing rate of linear access of all array members.Linear ClassArray1 access took 736msLinear ClassArray2 access took 1974msLinear ClassArray3 access took 1981msLinear StructArray access took 284msTesting rate of random access of array members.Random ClassArray1 access took 3813msRandom ClassArray2 access took 3969msRandom ClassArray3 access took 4201ms -
AuthorPosts
You must be logged in to reply to this topic.