Structs and the GC

Tagged: Struct

This topic contains 17 replies, has 7 voices, and was last updated by sicilica 2 years, 8 months ago.


July 28, 2016 at 5:28 pm #2498

sicilica

Participant

So, with structs (and I guess also with the ability to get references / manually pass around pointers), theoretically you could completely avoid invoking the GC if you didn’t use classes, because:

Any structs you create inside of a local scope would be on the stack rather than the heap (I assume)
Any structs that are passed by value will be on the stack
When you need pointers, or for any global data, you could create global arrays of structs (not struct pointers) in memory at init, then assign to those structs by value when they need to be replaced, or pass pointers to them into functions when they need to be used. That essentially gives you one large malloc() that you control, instead of leaving it to the GC (you would, of course, need to write an allocator and would probably want to optimize how you track which slots are in use / free, maybe with bitmasks etc)
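Sketched in C++ (the language mx2cc translates Monkey2 into), the pre-allocated "array of struct values plus an allocator" idea from the last point might look roughly like this. `Pool`, `Particle` and the single 64-bit occupancy mask (which caps the pool at 64 slots) are hypothetical illustrations, not Monkey2 APIs; a real allocator would need more mask words and a faster free-slot search.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-capacity pool of values with a bitmask tracking which slots are in use.
// The storage lives wherever the Pool object lives (e.g. a global), not in a GC heap.
template <typename T, std::size_t N>
class Pool {
    static_assert(N <= 64, "one 64-bit mask word tracks at most 64 slots");

public:
    // Returns the index of a free slot (reset to a default value), or -1 if full.
    int acquire() {
        for (std::size_t i = 0; i < N; ++i) {
            const std::uint64_t bit = std::uint64_t{1} << i;
            if ((used_ & bit) == 0) {
                used_ |= bit;
                slots_[i] = T{};  // reset the slot by value
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    void release(int i) {
        assert(in_use(i));
        used_ &= ~(std::uint64_t{1} << i);
    }

    bool in_use(int i) const { return ((used_ >> i) & 1u) != 0; }

    // Hands out a pointer into the pool so a subroutine can write in place.
    T* get(int i) {
        assert(in_use(i));
        return &slots_[i];
    }

private:
    T slots_[N]{};            // contiguous values, not pointers
    std::uint64_t used_ = 0;  // bit i set => slot i is in use
};

struct Particle {
    float x, y, vx, vy;
};
```

Because the slots are values in one array, iterating over live particles walks contiguous memory, which is the locality benefit raised below.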

What other heap allocations through the GC would be unavoidable? I would assume that any construction of strings (if you concat stuff or w/e) would go on the heap? Are there other “gotchas” you’d have to think about?

If you do it right, you should also be able to improve your cache hit rate by getting some control over locality this way. Is the overhead of trying to manually control memory in m2 worth it, or would you be better off either not worrying about the GC and just hoping it’s “fast enough”, or switching to something like C if you actually need that level of performance?

July 28, 2016 at 6:30 pm #2501

gcmartijn

Participant

It would be cool if someone could explain the need for structs.
When do you need them, and are they a lot faster than a class?

With my background in C#, Python, Monkey1, JavaScript and Node.js, I never needed them.

http://monkey2.monkey-x.com/language-reference/

To declare a struct:

Struct Identifier
	...Struct members...
End

A struct can contain consts, globals, fields, methods, functions and other user defined types.

Structs are similar to classes, but differ in several important ways:

A struct is a ‘value type’, whereas a class is a ‘reference type’. This means that when you assign a struct to a variable, pass a struct to a function or return a struct from a function, the entire struct is copied in the process. (What does this mean?)
Structs are statically typed, whereas classes are dynamically typed.
Struct methods cannot be virtual.
A struct cannot extend anything.

July 28, 2016 at 7:41 pm #2502

dawlane

Participant

@gcmartijn:

A struct is a ‘value type’, whereas a class is a ‘reference type’. This means that when you assign a struct to a variable, pass a struct to a function or return a struct from a function, the entire struct is copied in the process. (What does this mean?)

I assume Mark is saying that structs in MX2 are passed by value. That is expensive in time and memory when you pass a large data object to and from a function or method, because the whole object is copied onto the stack.

Whereas a data object that’s passed by reference only needs the address of the object to be copied onto the stack, which is a lot faster and more memory efficient.
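Here is the same distinction in C++ (used as a stand-in for what mx2cc generates, not Monkey2 syntax). `Big`, `bump_copy` and `bump_ref` are hypothetical names chosen only to make the copy visible: by value, the callee gets its own 256-odd bytes; by reference, only an address is passed and writes reach the caller's object.

```cpp
#include <cassert>

// A deliberately "large" value type: 64 floats.
struct Big {
    float values[64];
};

// By value: the whole struct is copied into the callee's parameter.
float bump_copy(Big b) {
    b.values[0] += 1.0f;  // modifies the copy only
    return b.values[0];
}

// By reference: only the address is passed; the caller's struct is modified.
void bump_ref(Big& b) {
    b.values[0] += 1.0f;
}
```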

July 28, 2016 at 7:58 pm #2503

sicilica

Participant

That’s exactly the difference between passing a value and passing a reference. @gcmartijn, structs don’t really exist in most of the languages you mentioned (C# is the exception), since they are all pretty high level. A struct is like duct-taping a bunch of variables together: if I pass a struct into a function, for example, it’s the same as if the function took a parameter for each variable in the struct and I passed all of them at the same time. Apart from C#’s structs, the languages you mention follow the “every object is a pointer” mindset, so if you pass an object into a function, you’re not passing its data; instead, you’re passing an integer that represents where in memory the object is.

Anyway, whether you pass values or pointers into a function isn’t really the question I’m getting at here – of course you would never want to copy large structs around, and if you don’t know exactly what you’re doing you might not want to use structs at all. My question is about memory allocation – if I can’t be confident that I have a lot of control over the allocator, then it doesn’t make any sense to mess with what I’m talking about at all.

July 28, 2016 at 9:35 pm #2504

Mark Sibly

Keymaster

Are there other “gotchas” you’d have to think about?

Don’t think so…

The garbage collector gets involved whenever you call ‘new class’ or ‘new blah[]’. Also, if you ‘Slice’ an array or somehow create a new array, eg: with String.Split().

Strings aren’t GC’d but still need to be malloced/freed.

It would be cool if someone could explain the need for structs. When do you need them, and are they a lot faster than a class?

They’re really just another tool and it’s up to you when to use them!

I like to use them for small-ish data structures that are frequently duplicated, eg: things like ‘Vec3f’ that might be used in complex-ish mathematical expressions such as v=v*scale+offset.

If Vec3f were a class, such expressions would involve considerable memory management overhead. Not so with structs where new values just go on the stack and are pretty much free to create. The downside is the struct contents need to be copied, but this is likely to be true no matter how you write Vec3f as each ‘operation’ usually needs to modify all members.
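A C++ sketch of the Vec3f case (this is not Monkey2's own Vec3f; the type and operators are illustrative). Each operator returns a brand-new value, so evaluating `v*scale+offset` creates temporaries on the stack or in registers and makes no heap allocation at all.

```cpp
#include <cassert>

// A small value type: three floats, no heap storage.
struct Vec3f {
    float x, y, z;
};

// Each operation builds and returns a new Vec3f by value.
inline Vec3f operator*(Vec3f v, float s) { return {v.x * s, v.y * s, v.z * s}; }
inline Vec3f operator+(Vec3f a, Vec3f b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
```

Usage mirrors the expression in the post: `v = v * 2.0f + offset;` copies every member at each step, which is the cost Mark mentions, but nothing is left behind for a collector.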

re: the docs – is this any better?

A struct is a ‘value type’, whereas a class is a ‘reference type’. This means that when you assign a struct to a variable, pass a struct to a function or return a struct from a function, an entirely new copy of the struct is created. This can be expensive if the struct is large as it involves copying every field from the source struct to the copy. The upside is that allocating a copy does not involve garbage collection as all copies are created on the stack.

July 28, 2016 at 10:13 pm #2506

taumel

Spectator

I think it peels away another layer of the onion, but you’re still not down to the core.

What’s missing is a to-the-point introduction to the garbage collector, the stack, call by reference vs. call by value and memory management, and to the influence these things have on performance in m2, so that you understand when to use which option, plus examples.

Btw, I like using them for small structs I do calculations with, like mem6f.

July 29, 2016 at 2:45 pm #2522

gcmartijn

Participant

I don’t get it yet, haha. I understand the difference between a reference and a non-reference.

So a struct doesn’t cost memory because it is reset after a loop, and it only uses memory while I’m using it inside a loop?

Whereas a class field exists from the beginning until the end of the program.

Look at this (non-working) example.

Normally I want to keep an X and Y position and update it every loop.
Is it better/faster to keep it in a struct?

Class MojoTest Extends Window

	Field posX:Float ' it cost memory now

	Method OnRender( canvas:Canvas ) Override

		App.RequestRender()

		' posX is still using memory
		' lot of code here

		Local blaX:Float ' decl. is using memory

		Local inst:=New TheStruct ' (decl.) only at this point a struct is using memory
		inst.posX=MOUSEX()
		blaX=MOUSEX() ' only at this point a Local is using memory (same as struct)

		posX=MOUSEX() ' reference to the memory and update the value

	End

	Struct TheStruct
		Field posX:Float
	End

End


Can I say that a struct acts just like a Local, except that you can store fields inside it?
It only uses memory once you declare it, and it is reset after a loop.

If you want to use a variable in multiple classes, then maybe you can make a global struct.
And maybe the struct will use memory if a function is using the global struct, but mostly not.

July 29, 2016 at 3:36 pm #2524

peterigz

Participant

I think the stack is a key thing here. If you didn’t have structs and wanted to store vectors or matrices in a usable way, you’d have to use classes, which are objects managed by the garbage collector. So if you have a game entity with vectors for its coordinates, velocity, direction and so on, that ends up being a lot more work for the GC. With structs, those vectors are stored by value: inside the entity object itself when they are fields, or on the stack when they are locals. So there are far fewer objects for the GC to track, and less clean-up work when the entity is removed.

Yes, you could store coordinates as individual x,y float fields in your class, do all your vector math “by hand” as it were, and not see any real benefit from using structs, but you make your life a lot easier when you can just write Position += Direction * Speed.

Put it this way: I converted my collision code from Monkey1, which stored vectors using a class. When I changed them to structs, I saw a significant performance boost.
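A C++ sketch of the `Position += Direction * Speed` style with value-type fields. `Vec2`, `Entity` and `stored_inline` are hypothetical names; the check shows that the vectors occupy bytes inside the entity itself, so an entity is one object rather than one object plus a separately allocated object per vector.

```cpp
#include <cassert>
#include <cstddef>

struct Vec2 {
    float x, y;
    Vec2& operator+=(Vec2 o) {
        x += o.x;
        y += o.y;
        return *this;
    }
};

inline Vec2 operator*(Vec2 v, float s) { return {v.x * s, v.y * s}; }

// Value-type fields are embedded directly in the Entity's own storage.
struct Entity {
    Vec2 position{};
    Vec2 direction{};
    float speed = 0.0f;
    void update() { position += direction * speed; }
};

// True if `member` lies entirely within the bytes of `owner`.
template <typename Owner, typename Member>
bool stored_inline(const Owner& owner, const Member& member) {
    const char* begin = reinterpret_cast<const char*>(&owner);
    const char* m = reinterpret_cast<const char*>(&member);
    return m >= begin && m + sizeof(Member) <= begin + sizeof(Owner);
}
```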

July 29, 2016 at 4:01 pm #2525

sicilica

Participant

It has less to do with how much memory you need and how often you allocate it than with how the heap is managed. Any time you “new up” a class in a language with managed pointers, all you really get is a pointer. Even if I store that object in a Local, which lives on the stack, the stack only holds the pointer; the actual data is somewhere on the heap. Even if you know how many “things” you’ll need and make, say, an array of 20 instances of some class, your array, wherever it’s stored, is an array of 20 pointers. The 20 instances you actually create will all be on the heap, and since they are allocated individually, they may well end up scattered across memory.

You want to control your allocations because contiguous memory results in far fewer cache misses, and because scattering objects like that leads to fragmentation as they get created and destroyed essentially ad hoc. For software that doesn’t need to be highly performant you don’t care, but in a game you have to manage huge amounts of memory, you have many calculations to make many times per second, and you need all of this to happen as smoothly as possible so your framerate doesn’t have significant dips and spikes.
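A C++ sketch of the 20-instances example: the class-style version makes 20 separate heap allocations (counted here through a class-specific `operator new`), while an array of 20 values is one contiguous block with no per-element allocation. `Thing` and the helper functions are hypothetical, and this illustrates the general pattern rather than how Monkey2's GC allocator places objects.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Thing {
    float data[4];

    // Counts individual heap allocations of Thing.
    static int& heap_allocs() {
        static int count = 0;
        return count;
    }
    // Class-specific allocation functions: used by `new Thing`, not by
    // arrays of Thing values declared directly.
    static void* operator new(std::size_t size) {
        ++heap_allocs();
        return ::operator new(size);
    }
    static void operator delete(void* p) { ::operator delete(p); }
};

// Class-style: an array of 20 pointers, each to its own allocation.
void make_pointer_array(Thing* (&out)[20]) {
    for (Thing*& p : out) p = new Thing{};
}

void free_pointer_array(Thing* (&arr)[20]) {
    for (Thing*& p : arr) {
        delete p;
        p = nullptr;
    }
}
```

Declaring `Thing values[20]{};` instead leaves the counter untouched and places element 19 exactly `19 * sizeof(Thing)` bytes after element 0.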

@mark – let me try to pin down a couple of specific questions; I’m being super vague.

Is there a difference between declaring an array in these ways?

Global myArray:MyStruct[20]

Global myArray:MyStruct[] = New MyStruct[20] ' or with := ...

Global myArray:MyStruct[]
' do stuff, get in a method...
myArray = New MyStruct[20]


When you say strings need to be malloc’d / free’d – how or when does that happen? Is the GC still the one that tracks their usage?

And on the GC, I presume that it works by spidering memory to see what malloc’d objects are still accessible, similar to the way I think Java does? The only other algorithm I know of is to count references, which I don’t think we’re doing? Does it rely on reflection to know how to follow pointers at runtime?

In addition to structs, M2 gives us the ability to take a pointer to a value, like getting a reference in C++ (I think we use varptr or some weird keyword, I don’t know off the top of my head…). As in the example above, let’s assume I declare an array of struct values in some global or local scope. I then pass a pointer to one of those structs into another function. (In this case I didn’t make it an array of values because I wanted pass-by-value semantics; what I wanted was control over allocation, and I still want the subroutine to be able to write to the struct.) What happens with the GC here? All of a sudden there’s a pointer on the stack in my subroutine. Will the GC do anything to try to manage it? What if the pointer I passed came from a Local on the stack, rather than from something I had allocated on the heap? In C++, that pointer would be an address on the stack, and nothing would stop me from shooting myself in the foot if the subroutine stored it somewhere and tried to use it after the caller’s stack frame had gone out of scope. And again, what is the GC going to think or do about this pointer, especially if I do something nasty like store it?
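The C++ side of this scenario can be sketched as follows; whether Monkey2's GC scans, ignores or tracks such raw pointers is exactly the open question in the post, so nothing here describes mx2 behaviour. `Particle`, `g_particles`, `nudge` and `particle_at` are hypothetical names. A pointer into long-lived storage is safe to pass down; a pointer to a caller's local is only safe while that frame is alive.

```cpp
#include <cassert>

struct Particle {
    float x, y;
};

// Storage that lives for the whole program: a global array of struct values.
Particle g_particles[20];

// The subroutine receives a pointer and writes through it: no copy, no allocation.
void nudge(Particle* p, float dx) { p->x += dx; }

// Safe: the pointee outlives every use of the returned pointer.
Particle* particle_at(int i) { return &g_particles[i]; }

// NOT safe (left as a comment on purpose): the local dies when the function
// returns, so the caller would receive a dangling pointer.
// Particle* bad() { Particle local{}; return &local; }
```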

July 29, 2016 at 4:09 pm #2526

sicilica

Participant

Sorry I’m being kinda nit-picky about this. I guess the thing for me is, this is the first time I’ve ever seen any form of unmanaged memory present in a language that also does managed pointers and automatic garbage collection, and it’s probably the first for a lot of people. With a language like C, I know exactly how all of that’s working; with a language like Java, I know that I have no ability to influence things so it’s okay if the inner workings are a magical black box; but in a situation like this, I’m excited to try to use these tools that could allow huge optimization, but it’s hard/scary to do so without understanding exactly how all of the memory management is performed under the hood.

July 29, 2016 at 8:43 pm #2532

gcmartijn

Participant

I’m trying to visualize it and learn from mx2 code,
and found this text about arrays of structs:

struct

An array of values v encoded by a struct (value type) looks like this in memory:

vvvv

class

An array of values v encoded by a class (reference type) looks like this:

pppp

..v..v…v.v..

where p are the pointers, or references, which point to the actual values v on the heap. The dots indicate other objects that may be interspersed on the heap. With reference types you reach each v through the corresponding p; with value types you can get the value directly via its offset in the array.


July 29, 2016 at 8:47 pm #2533

Simon Armstrong

Participant

I don’t know how they work under the hood but I think of structs as primitive types.

When you pass ints and floats around their value is copied rather than a pointer to the value being shared.

The same is true for structs, so I have been happy to adopt them, knowing my managed object count is reduced.

August 2, 2016 at 2:52 am #2673

sicilica

Participant

How structs work under the hood is precisely what I’m asking about here. I know very well what a struct is.

So this code, which works in Monkey-X, isn’t valid in M2:

Class Whatever
    Field arr:Float[4]
End


I’m guessing that that was just an alias for “:= New Float[4]” anyway. In any case, for structs to make any sense, they need to be able to actually ‘own’ or ‘contain’ their data – for example, a struct for a vector or matrix needs to contain some array of floats. If the struct just contains a pointer, and the array ITSELF is going through the GC anyway, then the struct serves absolutely no purpose and provides no optimization – and Arrays are always Objects, from what I can tell.
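For comparison, here is what "a struct that owns its data" looks like in C++, where a fixed-size array can be a member stored inline. This is only an analogue: whether Monkey2 can express the inline form is the question being asked. `Vector3` and `Vector3Ref` are illustrative names; the second type holds just a pointer, so copies share one array, which is the case the post argues makes the struct pointless.

```cpp
#include <cassert>

// The three floats are part of the struct's own bytes; copying copies the data.
struct Vector3 {
    float data[3] = {0.0f, 0.0f, 0.0f};
    float x() const { return data[0]; }
    void set_x(float v) { data[0] = v; }
};

// Holds only a pointer to an array stored elsewhere; copies alias the same array.
struct Vector3Ref {
    float* data;
};
```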

Edit:

Actually, I probably found a bug that I should report. Here was the actual struct I was playing with; it throws a runtime error when you init an instance of it, on “data[0] = val” because data.Length is 0 at the time. Maybe it’s trying to initialize the properties and fields, but “data” hasn’t been assigned the pointer to the new array yet?

Struct Vector3

	Field data:= New Float[3]

	Property x:Float()
		Return data[0]
	Setter(val:Float)
		data[0] = val
	End
	Property y:Float()
		Return data[1]
	Setter(val:Float)
		data[1] = val
	End
	Property z:Float()
		Return data[2]
	Setter(val:Float)
		data[2] = val
	End

End


August 2, 2016 at 4:33 am #2675

Mark Sibly

Keymaster

I don’t know how they work under the hood but I think of structs as primitive types.

This is probably the best way to think of them, and they do in fact work very much like primitive types under the hood, eg:

Struct MyInt
   Field value:Int
End

Local t1:Int
Local t2:MyInt


Here, t1 and t2 will both take 4-ish bytes of stack memory (the size of the MyInt struct might end up being padded for alignment) and both will ‘disappear’ when the statement block ends (so using Varptr with either is dangerous!).

Both are also ‘copied’ when assigned to variables or passed to or returned from functions.
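The same behaviour in C++, the language mx2cc ultimately emits: a one-int struct has the footprint of an int on mainstream compilers (the standard does not strictly guarantee it), and assignment, argument passing and returning all copy it. `incremented` is a hypothetical helper.

```cpp
#include <cassert>

// C++ analogue of MyInt: a struct wrapping a single int.
struct MyInt {
    int value;
};

// m is a copy; the caller's value is untouched. The result is returned by value.
MyInt incremented(MyInt m) {
    m.value += 1;
    return m;
}
```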

Where things get perhaps a little confusing is with…

Local t3:=New MyInt


Since MyInt is a struct, not a class, the ‘new MyInt’ bit doesn’t actually allocate ‘heap’ memory – it creates a new temporary MyInt (on the stack) that is then copied to t3. Which is actually quite similar to what this does…

Local t4:=5+10


This also creates a temporary int to store the result of 5+10 in (ie: 5+10 is a ‘new Int’), which is then copied to t4.

But apart from this wrinkle, there’s nothing that magical about structs – it’s pointers that are the tricky ones!

(Note also that when I say ‘on the stack’, this really just means logically on the stack. Compilers generally try to store as much as possible in cpu registers (which are really another form of stack storage) and this applies to structs as much as ints etc. So code like “Local t3:=New MyInt” may well reduce down to a single move instruction to a cpu register).

Structs and primitives also act the same way when it comes to arrays:

Local a1:=New Int[100]
Local a2:=New MyInt[100]


Both of these will allocate about 400 bytes of heap memory (all arrays are stored on the heap regardless of whether the elements are primitives, structs or classes).

And in both cases, Varptr a[i+1]-Varptr a[i] will be 4 (where a is a1 or a2), ie: the values are stored consecutively in memory.
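The stride claim can be checked in C++ with arrays of `int` and of a one-int struct. Note that plain C++ pointer subtraction yields an element count (1), so the sketch casts to `char*` to get the byte distance (4 on typical platforms); how Monkey2's own arithmetic on Varptr results is defined is not shown here. `byte_stride` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

struct MyInt {
    int value;
};

// Byte distance between element i+1 and element i of an array.
template <typename T>
std::ptrdiff_t byte_stride(const T* a, std::size_t i) {
    return reinterpret_cast<const char*>(&a[i + 1]) - reinterpret_cast<const char*>(&a[i]);
}
```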

So structs and primitive types are actually very very similar concepts.

As for…

Struct Vector3
	Field data:= New Float[3]
End


This should in fact be causing a compile time error – something along the lines of ‘struct field initializers must be constant’ (another related story…) – if not, there’s a bug or you’re not using the latest version of mx2cc.

August 2, 2016 at 9:37 pm #2686

sicilica

Participant

So there won’t be any way to allocate even fixed-size arrays “inside” a struct? That’s too bad, but it makes sense that arrays need to always be pointers. Guess I’ll have to continue indexing into databuffers and large float arrays.

It sounds like my intuition for everything else with struct pointers was right, then, but can you clarify what you said about how strings are handled? You said strings weren’t GC’d – but certainly they are, since they would always be on the heap, no?
