Collision module and performance
This topic contains 43 replies, has 8 voices, and was last updated by peterigz 2 years, 8 months ago.
June 12, 2016 at 9:19 pm #1082
I’ve done an initial conversion of my collision code from Blitzmax/Monkey and generally it was all very easy.
All code available here: https://github.com/peterigz/timelinefx.monkey2 (clone to timelinefx in modules folder)
There are performance issues though, so I’d like to tackle that next. It’s pretty much the same implementation as in Blitzmax/Monkey; I haven’t compared with Monkey yet, only Blitzmax. I have a simple test with 10000 objects in a quadtree, with a small square that you move around with the mouse to query the quadtree and draw only the objects around the mouse pointer, or you can press space to draw all 10000 objects. So in release mode with Monkey 2:
Drawing mouse objects only: 4-8ms
Drawing all: 80-85ms
Blitzmax:
Drawing mouse objects only: 0-1ms
Drawing all: 7-10ms
So at the moment there’s quite a big difference; I’m not sure if that’s because I’ve messed something up in the conversion or there’s still some optimising to be done with the compiler.
I do need to check a few things as I had some trouble with the “Concurrent list modification” error; I changed how things work a bit (and I changed to using stacks while I was at it instead of lists), so I may well have messed things up a bit but nothing obvious so far.
Feel free to play with the code, there’s a samples folder with the examples in there.
June 15, 2016 at 4:28 am #1101
<edit-removed the link, since you got it working on the first post>
Great stuff! Thanks for sharing! I’ll try to work it into an entity system, if you don’t mind me using it (with credit, of course).
June 15, 2016 at 8:58 am #1102
Go ahead, it’s there to be used.
I will be making changes to it, but nothing that will affect how it’s used, with the exception that I’ll probably switch to using lambdas for handling the collision events. I think they will be better than using interfaces, which was the best solution I could think of in Monkey 1. In Blitzmax I used function pointers, which worked well too.
I’ll try and edit the first post, see if it works now…
June 15, 2016 at 7:24 pm #1104
I was getting about 2ms/20ms on my MacBook Pro when I tried this last night.
M2 allows you to time things in microseconds instead of milliseconds, if you want to try that. Mark offered some good info on how to do accurate timing in the Window class in this thread: http://monkey2.monkey-x.com/forums/topic/strange-slowness-using-canvas-texture/
June 15, 2016 at 7:45 pm #1105
I had a quick look at this and there does seem to be some kind of weird overhead (on my Windows machine anyway) with the line drawing (and nothing to do with GL either…) – will investigate. Ethernaut’s Mac performance suggests this may even be a compiler thing.
I also had a go at comparing the mx2/bmx quadtree code, but they seem to use slightly different algorithms – the bmx one looks like it uses a ‘leafy’ tree, whereas the mx2 one has objects stored in nodes too. It may or may not make much difference, but it makes it harder to compare the two.
Without the rendering code, the 2 have similar performance although the bmx one is a bit faster – but it’s using a different algorithm and the mx2 one does a bit more, so it’s hard to tell what’s up.
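For readers unfamiliar with the distinction, here is a minimal C++ sketch of the second variant, where an object that does not fit entirely inside one child quadrant stays in the interior node. It is not the timelinefx code: `Rect`, `QuadTree`, `kMaxDepth` and the absence of a per-node capacity limit are all assumptions made to keep the example short. A 'leafy' tree would instead push such an object down into every leaf it overlaps.

```cpp
#include <array>
#include <memory>
#include <vector>

// Axis-aligned box: top-left corner plus width and height.
struct Rect {
    float x, y, w, h;
    bool contains(const Rect& o) const {
        return o.x >= x && o.y >= y && o.x + o.w <= x + w && o.y + o.h <= y + h;
    }
    bool overlaps(const Rect& o) const {
        return o.x < x + w && o.x + o.w > x && o.y < y + h && o.y + o.h > y;
    }
};

class QuadTree {
public:
    explicit QuadTree(Rect bounds, int depth = 0) : bounds_(bounds), depth_(depth) {}

    // Push the box down while a single child quadrant fully contains it;
    // otherwise keep it in this node (objects live in interior nodes too).
    void insert(int id, const Rect& box) {
        if (depth_ < kMaxDepth) {
            const float hw = bounds_.w / 2, hh = bounds_.h / 2;
            for (int i = 0; i < 4; ++i) {
                Rect q{bounds_.x + (i % 2) * hw, bounds_.y + (i / 2) * hh, hw, hh};
                if (q.contains(box)) {
                    if (!children_[i]) children_[i] = std::make_unique<QuadTree>(q, depth_ + 1);
                    children_[i]->insert(id, box);
                    return;
                }
            }
        }
        items_.push_back({id, box});
    }

    // Collect the ids of all boxes overlapping the query area, skipping
    // every subtree whose bounds do not touch it.
    void query(const Rect& area, std::vector<int>& out) const {
        if (!bounds_.overlaps(area)) return;
        for (const auto& item : items_)
            if (item.box.overlaps(area)) out.push_back(item.id);
        for (const auto& child : children_)
            if (child) child->query(area, out);
    }

private:
    struct Item { int id; Rect box; };
    static constexpr int kMaxDepth = 6;  // arbitrary limit chosen for the sketch
    Rect bounds_;
    int depth_;
    std::vector<Item> items_;
    std::array<std::unique_ptr<QuadTree>, 4> children_;
};
```

Querying with a small rectangle around the mouse pointer only visits the nodes that rectangle overlaps, which is why the "mouse objects only" case is so much cheaper than drawing everything.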
June 15, 2016 at 9:45 pm #1106
Both algorithms should be pretty much the same, with the exception of how objects move within the quadtree. In Blitzmax I’m able to remove objects from a quadtree node’s list and add them back again on the fly. In Monkey 2 I have to remove them and then batch them up to be added back at the end of the update, because of the “Concurrent list modification” error. I should think the time is about the same though; it’s just adding them back at the end rather than on the fly.
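The batching approach described here can be sketched in C++ as follows. This is an illustration under assumptions, not the module's code: `Object`, `Grid` and the "move one cell to the right" rule are hypothetical and stand in for objects changing quadtree nodes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; not the timelinefx types.
struct Object {
    int id;
    int cell;  // index of the list the object currently belongs to
};

struct Grid {
    std::vector<std::vector<Object>> cells;  // one object list per node

    explicit Grid(std::size_t n) : cells(n) {}

    // Every object moves one cell to the right (wrapping around).
    // Moved objects are collected in `pending` and only added back after
    // all lists have been walked, so no list grows while it is being
    // iterated and no object is visited twice in one update.
    void update() {
        std::vector<Object> pending;
        for (auto& cell : cells) {
            for (const Object& o : cell) {
                Object moved = o;
                moved.cell = (o.cell + 1) % static_cast<int>(cells.size());
                pending.push_back(moved);
            }
            cell.clear();  // remove the objects once this list has been walked
        }
        for (const Object& o : pending) {
            cells[static_cast<std::size_t>(o.cell)].push_back(o);
        }
    }
};
```

Adding each object straight to its new list inside the first loop would let a later list pick it up again in the same pass, which is the kind of problem the runtime error guards against.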
But I do have another, much simpler test where the code is pretty much the same and doesn’t use the quadtree at all. This just checks for a collision between 2 polys as many times as it can in 1 second, so hopefully it should be a lot easier to diagnose:
Monkey:

    #Import "<std>"
    #Import "<mojo>"
    #Import "<timelinefx>"

    Using std..
    Using timelinefx..
    Using mojo..

    Function Main()
        New AppInstance

        Local verts:= New Float[](0.0, 0.0, -150.0, 100.0, 50.0, 150.0, 185.0, 100.0, 300.0, 0.0)
        Local poly1:= CreatePolygon(150, 150, verts)
        Local poly2:= CreatePolygon(450, 250, verts)

        Local time:Int = App.Millisecs
        Local collisions:Int
        Print "Starting test..."
        While App.Millisecs - time <= 1000
            CheckCollision(poly1, poly2)
            collisions+=1
        Wend
        Print "Collisions done in 1 second: " + collisions
    End

Max:

    SuperStrict

    Import rigz.collision

    Local verts:Float[] = [0.0, 0.0, -150.0, 100.0, 50.0, 150.0, 185.0, 100.0, 300.0, 0.0]
    Local poly1:tlPolygon = CreatePolygon(150, 150, verts)
    Local poly2:tlPolygon = CreatePolygon(450, 250, verts)

    Local time:Int = MilliSecs()
    Local collisions:Int

    Print "Starting test..."
    While MilliSecs() - time <= 1000
        CheckCollision(poly1, poly2)
        collisions = collisions + 1
    Wend

    Print "Collisions done in 1 second: " + collisions

Results here (on Windows PC i7) in release mode:
Monkey:
Collisions done in 1 second: 192950
BlitzMax:
Collisions done in 1 second: 1214547
Blitzmax doesn’t mess about! In fact in Blitzmax it does 217000+/- in debug mode. I did double check that both polys in both versions are positioned exactly the same, they just slightly overlap each other.
I’ve just noticed one difference between the two versions: the Monkey version creates 2 arrays on each check (with a length of 2 each), whereas the Max version creates 4 variables instead. That wouldn’t make a difference though, would it? Will change it anyway just to check…
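The "count how many calls fit in a time budget" pattern used by both tests can be written generically. The following C++ sketch is only an analogue of the loops above: `countCallsFor` is a hypothetical helper name, and it uses a microsecond budget rather than the millisecond timers in the original tests.

```cpp
#include <chrono>
#include <cstdint>

// Calls `work` repeatedly until `budget` has elapsed and returns how many
// calls completed. steady_clock is monotonic, so the measurement is not
// disturbed by system clock adjustments.
template <typename Work>
std::uint64_t countCallsFor(std::chrono::microseconds budget, Work work) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::uint64_t calls = 0;
    while (clock::now() - start <= budget) {
        work();
        ++calls;
    }
    return calls;
}
```

Reading the clock on every iteration adds a fixed cost per call, so a loop like this is better suited to comparing two implementations under the same harness than to measuring absolute throughput.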
June 15, 2016 at 9:57 pm #1107
OK, so I changed the arrays to variables like they are in Blitzmax. I can see why I switched to arrays in Monkey – because there were no variable pointers. I was using pointers in Max and changing the values in the function without needing to pass them back.
So now, the new result in Monkey is:
Collisions done in 1 second: 567293
Decent improvement! Will keep looking for more improvements.
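The difference between the two styles can be illustrated with a small C++ sketch. What the two-element arrays actually held in the collision check is not stated in the thread, so the min/max helper below is a hypothetical example, and both function names are made up for illustration.

```cpp
#include <vector>

// Both helpers compute the minimum and maximum of a non-empty set of values.

// Variant 1: returns a freshly allocated two-element array on every call,
// similar in spirit to creating small arrays per collision check.
std::vector<float> minMaxAsArray(const std::vector<float>& values) {
    float lo = values.front(), hi = values.front();
    for (float v : values) {
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
    return {lo, hi};  // one heap allocation per call
}

// Variant 2: writes the results through reference parameters, similar to
// the pointer-based BlitzMax version; nothing is allocated.
void minMaxInto(const std::vector<float>& values, float& lo, float& hi) {
    lo = hi = values.front();
    for (float v : values) {
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
}
```

In a tight loop that runs hundreds of thousands of times per second, the per-call allocation in the first variant is the kind of cost that removing the arrays avoids.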
June 16, 2016 at 5:23 am #1108
Took me a while to get BMax working again, hadn’t used it in a long time… I got these results on OS X, using both the old version of rigz.collision and the newly modified one:
BMax (release): Collisions done in 1 second: 2592103
M2 (release, today’s version): Collisions done in 1 second: 1711379
M2 (release, old version): Collisions done in 1 second: 740431
June 16, 2016 at 9:33 am #1109
Thanks for testing it.
Maybe arrays in M2 are the issue. I should try reversing it and adding the arrays into the Max version to see what performance hit it takes. The vertices of the polygons are stored in an array, so maybe accessing those values is slow?
June 16, 2016 at 9:48 am #1110
Found it I think!
I haven’t tried the bmx version, but the mx2 version has gone from 570,000/sec on my machine to 3,700,000/sec!
The problem was my implementation of DebugAssert. It was evaluating the string param even if the condition param was true – even in release mode – and one operation that uses DebugAssert is the array indexing operator in bbArray (to check the array index isn’t out of range).
This meant that *every* use of array[blah] in an app has been doing a cstring->bbstring conversion of “Array index out of range” behind the scenes, which involves strlen, malloc, strcpy etc. I’m actually pretty impressed by how well stuff was running with this going on.
This may also explain the slow DrawLine, as each vertex added involves 7 array index ops, so 40000 lines (2 vertices each) = 560000 completely unnecessary cstring->bbstring ops.
Looking forward to fixing this properly tomorrow!
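The effect described in this post can be reproduced in a small C++ sketch. This is not the Monkey 2 runtime code: `RtString`, `assertEager`, `assertLazy` and `checkedGet` are hypothetical names, and the counter exists only to make the hidden conversions visible.

```cpp
#include <stdexcept>
#include <string>

// Counts how often a message string is built, to make the cost visible.
int g_messagesBuilt = 0;

// Hypothetical stand-in for a runtime string built from a C string; a real
// runtime would do a strlen, an allocation and a copy here.
struct RtString {
    std::string data;
    RtString(const char* s) : data(s) { ++g_messagesBuilt; }
};

// Eager version: the C string is converted to RtString at every call,
// before the function body runs, even when cond is true.
void assertEager(bool cond, RtString msg) {
    if (!cond) throw std::out_of_range(msg.data);
}

// Lazy version: the message stays a plain pointer and is only converted
// when the check actually fails.
void assertLazy(bool cond, const char* msg) {
    if (!cond) throw std::out_of_range(RtString(msg).data);
}

// Bounds-checked read in the style of an array indexing operator.
template <typename AssertFn>
int checkedGet(const int* data, int length, int index, AssertFn check) {
    check(index >= 0 && index < length, "Array index out of range");
    return data[index];
}
```

With `assertEager`, every successful `checkedGet` still builds one message string; with `assertLazy`, none are built unless an index is actually out of range.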
June 16, 2016 at 10:17 am #1111
Wow, that sounds a bit more like it!
Looking forward to trying the new version out. Had a feeling there must be a spanner in the works somewhere.
If MX2 can be shown to outperform Max then that could be a real clincher for a lot of people to make the jump over.
June 16, 2016 at 11:30 pm #1112
V010 now pushed!
Some things have been moved around (eg: build scripts are now in /scripts) as I’m in the process of getting it all ready for a release, hopefully by the end of the month.
June 17, 2016 at 7:18 am #1116
Whohoo! Got about 4 million collisions with V010, 60% faster than my BlitzMax result:
Collisions done in 1 second: 4157842
June 17, 2016 at 9:22 am #1117
That’s better! It’s insanely fast now, getting:
MX2: Collisions done in 1 second: 3694513
BMX: Collisions done in 1 second: 1241045
So almost 3 times faster than bmx on my PC. The quadtree example is also where it should be, a solid 7-8ms when drawing a whole screen with 10000 objects; Max goes from 7-10. The rendering will be a bit more of a normaliser there though, I guess. BMX sort of flicks up to 10 every other second, I wonder if that’s the garbage collector cleaning things up?
Well done for finding the spanner!
Will keep tidying up the module and see if I can squeeze out any more speed, maybe I can get some particles going before the release
June 17, 2016 at 10:00 am #1118
Alrighty!
“BMX sort of flicks up to 10 every other second, I wonder if that’s the garbage collector cleaning things up?”
Quite likely. Mx2 uses an incremental collector so it collects garbage as it goes, whereas bmx collects everything in one hit when it reaches some threshold, and you get a ‘bump’ à la Java/C#. The incremental system actually ends up taking a bit more time in total, but I think it’s worth it for the smoother performance.
The next thing I would suggest looking into is perhaps using a struct for vector2d instead of a class. This would reduce GC overhead even more, as structs are created ‘on the stack’. Structs take a bit of getting used to – the semantics are fairly different from classes – but in some situations they are very efficient.
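As a rough analogy only, the value-versus-reference distinction can be shown in C++. C++ has no garbage collector and its `struct` and `class` keywords do not differ this way, so the heap-allocated variant below merely stands in for a class instance in a collected language; `Vec2`, `add` and `addBoxed` are hypothetical names.

```cpp
#include <memory>

// Value type: instances live on the stack or inline in their owner, and
// assignment copies the fields.
struct Vec2 {
    float x, y;
};

// Returns the sum by value; no heap allocation is involved.
inline Vec2 add(Vec2 a, Vec2 b) {
    return {a.x + b.x, a.y + b.y};
}

// Reference-style variant: every result is a separate heap object, which is
// the kind of short-lived allocation a collector would later have to clean up.
inline std::unique_ptr<Vec2> addBoxed(const Vec2& a, const Vec2& b) {
    return std::make_unique<Vec2>(Vec2{a.x + b.x, a.y + b.y});
}
```

The semantic difference mentioned in the post shows up on assignment: after `Vec2 b = a; b.x = 5;` the original `a` is unchanged, whereas two references to one class instance would both see the change.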