A few years ago I started working on a project for which I created, among others, one C (not C++) file with hundreds of static and global variables, plus the functions that use them. A few dozen of those variables are arrays of exactly 32 or 64 kB, all declared with __declspec(align(64)); the rest are mainly base types, sometimes small (size 2) arrays of base types.
I'm now trying to convert this file to a class, mainly because I need to be able to have multiple instances of it.
What I did:
1. Converted all functions to class methods.
2. Converted all the static and global variables to class members (removing 'static').
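To make the two steps concrete, here is a minimal sketch of the kind of conversion described. The names `counter`, `step` and `Layer` are hypothetical and stand in for the real variables and functions, which are not shown in this post.

```cpp
#include <cassert>

// Before: file-scope state, one copy for the whole program.
// (Hypothetical names; the real file has hundreds of such variables.)
namespace before {
    static int counter = 0;
    static int step() { return ++counter; }
}

// After: the same state as a class member, one copy per instance.
class Layer {
public:
    int step() { return ++counter; }  // now reads counter through 'this'
private:
    int counter = 0;
};

// Small demonstration that two instances no longer share state.
inline void demo() {
    Layer x, y;
    assert(x.step() == 1);
    assert(x.step() == 2);
    assert(y.step() == 1);  // y has its own counter
}
```

Note that in the class version every access goes through the `this` pointer rather than a fixed address, which is one of the things that differs between the two variants.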
I have overloaded the constructor of the class to make sure it's always aligned at 64 bytes, and I've checked that the 32/64 kB arrays are also still aligned at 64 bytes.
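For illustration, this is roughly what such an alignment guarantee and check can look like. It is a sketch under assumptions, not the actual code: `Controller`, `bufA`, `bufB`, `gain`, `isAligned64` and `checkAlignment` are made-up names, standard `alignas(64)` is used in place of `__declspec(align(64))`, and a class-specific `operator new` is assumed as the mechanism for heap instances.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the converted class.
struct alignas(64) Controller {
    alignas(64) float bufA[16 * 1024];  // 64 kB
    alignas(64) short bufB[16 * 1024];  // 32 kB
    int gain[2];                        // small base-type array

    // Over-allocate, round up to a 64-byte boundary, and store the raw
    // pointer just in front of the aligned block so delete can free it.
    static void* operator new(std::size_t size) {
        const std::size_t align = 64;
        void* raw = std::malloc(size + align + sizeof(void*));
        if (!raw) throw std::bad_alloc();
        std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        std::uintptr_t aligned =
            (start + align - 1) & ~static_cast<std::uintptr_t>(align - 1);
        reinterpret_cast<void**>(aligned)[-1] = raw;
        return reinterpret_cast<void*>(aligned);
    }

    static void operator delete(void* p) noexcept {
        if (p) std::free(static_cast<void**>(p)[-1]);
    }
};

// True if the address is a multiple of 64.
inline bool isAligned64(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 64 == 0;
}

// Debug-time check to run once after construction.
inline void checkAlignment(const Controller& c) {
    assert(isAligned64(&c));
    assert(isAligned64(c.bufA));
    assert(isAligned64(c.bufB));
}
```

Calling `checkAlignment(*ptr)` on a freshly allocated instance is the same kind of check as inspecting the pointer values by hand.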
At first I got a really big performance drop (more than 13%). After checking the pointer values I discovered that in my old implementation the statics ended up at more-or-less random locations, whereas after the conversion many of the arrays were exactly 64 kB apart, which of course causes caching issues. So I added some 'fillers' (0x1100 bytes each) between them to get rid of that. This almost completely restored the performance.
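The effect of the fillers on the layout can be sketched as follows. The struct and member names are hypothetical; only the array size and the 0x1100 filler size come from the description above. The 4096-byte stride used in the checks is an assumption (a typical L1 set stride, for example a 32 kB 8-way cache with 64-byte lines), not a measured property of the target machine.

```cpp
#include <cstddef>

// Arrays packed back to back: each starts exactly 64 kB after the previous.
struct alignas(64) Packed {
    alignas(64) char a[64 * 1024];
    alignas(64) char b[64 * 1024];
    alignas(64) char c[64 * 1024];
};

// Same arrays with 0x1100-byte fillers in between. 0x1100 is a multiple
// of 64, so the arrays keep their 64-byte alignment.
struct alignas(64) Staggered {
    alignas(64) char a[64 * 1024];
    char fill0[0x1100];
    alignas(64) char b[64 * 1024];
    char fill1[0x1100];
    alignas(64) char c[64 * 1024];
};

// Distance between two member offsets, reduced modulo a power-of-two stride.
// A result of 0 means both arrays start in the same cache set for that stride.
constexpr std::size_t strideResidue(std::size_t offA, std::size_t offB,
                                    std::size_t stride) {
    return (offB - offA) % stride;
}

static_assert(strideResidue(offsetof(Packed, a), offsetof(Packed, b), 4096) == 0,
              "packed arrays start on the same 4 kB residue");
static_assert(strideResidue(offsetof(Staggered, a), offsetof(Staggered, b), 4096) != 0,
              "fillers shift the second array off that residue");
static_assert(offsetof(Staggered, b) % 64 == 0, "b is still 64-byte aligned");
```

Because these are compile-time checks, they cost nothing at run time and fail the build if a later change to the member list lines the arrays up again.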
But now that I have added all the variables to the class, I'm seeing a 4% drop in performance. This new class is only a control layer with some simple calculations; most of the work is done elsewhere in other classes (and partially by IPP).
I'm using the compiler option /Qipo, due to which almost everything gets inlined into one big function. That makes it difficult to analyse what is causing the changes (I would have to wade through a few MB of assembly output).
4% may not seem like much, but this is a real-time application that already consumes quite a lot of processing power, so I really want to get rid of the extra overhead.
Are there more things (like the different memory locations) that I should be aware of when performing this conversion?