Notes on Memory Alignment
How a computer reads data
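A computer reads memory in fixed-size blocks of bytes. For example, on a 64-bit machine whose data bus is 64 bits (8 bytes) wide, the block size is 8 bytes, so data is read 8 bytes at a time. If `int64_t x;` is stored at address `0x8` (a multiple of 8, i.e. 8-byte aligned, also called 64-bit aligned), the machine reads the 8 bytes in the range `0x8~0xF` in a single access and obtains `x`. If the address of `x` is not a multiple of 8 (not 8-byte aligned), the machine may need two reads to obtain its value. Suppose `x` is stored at `0x6`: its data then occupies `0x6~0xD`, which spans the two blocks `0x0~0x7` and `0x8~0xF`. The machine cannot read the 8 bytes at `0x6~0xD` directly. It first reads the 8 bytes in `0x0~0x7`, then the 8 bytes in `0x8~0xF`, and finally stitches the relevant bytes together to form `x`.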
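Memory alignment: a variable's address is an integer multiple of its size ($addr = n \times \mathrm{sizeof}(variable)$). For example, a `double` variable is stored at an address that is a multiple of 8, and an `int` variable at an address that is a multiple of 4 (assuming the usual 8-byte `double` and 4-byte `int`).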
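Keeping every variable aligned lets the machine fetch data in as few reads as possible. If a variable is misaligned and spans several blocks, the machine may have to read each of those blocks.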
Effect of memory alignment on performance:
```cpp
#include <array>
#include <chrono>
#include <iostream>
#include <vector>

// Packed to 1-byte alignment: no padding, so `d` sits at offset 1 (misaligned).
#pragma pack(push, 1)
struct UnalignedData {
    char c;
    double d;
    int i;
};
#pragma pack(pop)

// Default layout: the compiler keeps every member aligned.
struct AlignedData {
    double d;
    int i;
    char c;
};

void performance_test() {
    const int iterations = static_cast<int>(1e4);
    std::vector<UnalignedData> unaligned(iterations);
    std::vector<AlignedData> aligned(iterations);
    std::array<double, iterations> collect;

    for (int i = 0; i < iterations; ++i) {
        unaligned[i] = {static_cast<char>(i), i * 3.14, i};
        aligned[i] = {i * 3.14, i, static_cast<char>(i)};
    }

    // Time reading the misaligned member.
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < iterations; ++i) {
        collect[i] = unaligned[i].d;
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> unaligned_time(end - start);

    // Time reading the aligned member.
    start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < iterations; ++i) {
        collect[i] = aligned[i].d;
    }
    end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> aligned_time(end - start);

    std::cout << "Exec Time of Unaligned: " << unaligned_time.count() << "ms" << std::endl;
    std::cout << "Exec Time of Aligned: " << aligned_time.count() << "ms" << std::endl;
    std::cout << "Performance Improvement: "
              << (unaligned_time.count() / aligned_time.count() - 1) * 100 << "%" << std::endl;
}

int main() {
    performance_test();
    return 0;
}
```
Exec Time of Unaligned: 0.000185ms
Exec Time of Aligned: 7.2e-05ms
Performance Improvement: 156.944%
Struct alignment
When a struct contains members of different types, how is every member kept aligned? The compiler inserts padding to achieve alignment:
```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

struct foo {
    char a;
    uint16_t b;
    int32_t c;
    char d;
};

int main() {
    foo F;
    uintptr_t addr_a = reinterpret_cast<uintptr_t>(&F.a);
    uintptr_t addr_b = reinterpret_cast<uintptr_t>(&F.b);
    uintptr_t addr_c = reinterpret_cast<uintptr_t>(&F.c);
    uintptr_t addr_d = reinterpret_cast<uintptr_t>(&F.d);

    std::cout << "Address of F.a is 0x" << std::hex << addr_a << std::endl;

    std::cout << "Address of F.b is 0x" << std::hex << addr_b << std::endl;
    assert((addr_b % 2) == 0);

    std::cout << "Address of F.c is 0x" << std::hex << addr_c << std::endl;
    assert((addr_c % 4) == 0);

    std::cout << "Address of F.d is 0x" << std::hex << addr_d << std::endl;

    return 0;
}
```
Address of F.a is 0x61fdf4
Address of F.b is 0x61fdf6
Address of F.c is 0x61fdf8
Address of F.d is 0x61fdfc
Reordering the member declarations of a `struct` changes how it is padded.
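A layout with less padding saves storage, but the compiler will not reorder member declarations on its own to optimize it. The programmer therefore has to arrange the declaration order by hand to get the best padding. As a rule of thumb, declaring members from the largest type to the smallest gives the least padding: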
```cpp
#include <cstdint>
#include <iostream>

// Members in arbitrary order: padding after `a` and after `d`.
struct foo {
    char a;
    uint16_t b;
    int32_t c;
    char d;
};

// Same members, declared from largest to smallest: no padding needed.
struct foo2 {
    int32_t a;
    uint16_t b;
    char c;
    char d;
};

int main() {
    foo F;
    foo2 F2;
    std::cout << "struct foo's size = " << sizeof(F) << std::endl;
    std::cout << "struct foo2's size = " << sizeof(F2) << std::endl;
    return 0;
}
```
struct foo's size = 12
struct foo2's size = 8
Disabling compiler padding
`#pragma pack(push, 1)` … `#pragma pack(pop)`
```c
#include <stdio.h>

/* Default layout: the compiler pads members to keep them aligned. */
struct AlignedStruct {
    char a;
    int b;
    double c;
};

/* Packed layout: members are placed back to back with no padding. */
#pragma pack(push, 1)
struct PackedStruct {
    char a;
    int b;
    double c;
};
#pragma pack(pop)

int main(void) {
    struct AlignedStruct s1;
    struct PackedStruct s2;

    printf("Addresses:\n");
    /* %p expects a void *, so each pointer is cast explicitly. */
    printf("s1: %p\ns1.a: %p\ns1.b: %p\ns1.c: %p\n",
           (void *)&s1, (void *)&s1.a, (void *)&s1.b, (void *)&s1.c);
    printf("s2: %p\ns2.a: %p\ns2.b: %p\ns2.c: %p\n",
           (void *)&s2, (void *)&s2.a, (void *)&s2.b, (void *)&s2.c);
    return 0;
}
```
Reference
[1] 10 Things You Should Know About Memory Alignment | ncmiller.dev