Throughout the memory hierarchy, data moves at cache line granularity - 64 bytes per line. Although this is much larger than many common data types, such as integer, float, or double, unaligned values of these or other types may span two cache lines. Recent Intel architectures have significantly improved the performance of such 'split stores' by introducing split registers to handle these cases. But split stores can still be problematic, especially if they consume split registers which could be servicing other split loads.
A significant portion of cycles is spent handling split stores.
Consider aligning your data to the 64-byte cache line granularity.
Note that this metric value may be highlighted due to Port 4 issue.