When creating high-speed data streaming applications, it’s important to avoid unnecessary data transfer to keep things fast and efficient. Operating systems (OS) automatically buffer file input/output (I/O) in the computer’s memory. However, many data streaming applications already have their own buffering steps, making the OS’s additional buffering unnecessary. Disabling this OS buffering allows direct control of data transfer, but it requires the application to access data in sizes that are multiples of the system page size (or disk sector size).
This blog post will show you how to build a C++ data structure called PageContainer
that lets you access data without the OS buffering. You can find the ready-to-use source code for PageContainer
in my Github repository.(URL: https://github.com/JianZhongDev/PageContainer )
Container Class Requirements#
For applications that need to read and write large amounts of data without OS buffering, the data container should meet the following requirements:
- The buffer size should be a multiple of the page size.
- It should be able to store data whose size does not perfectly match multiples of the page size
- It should be capable of saving multiple containers within a large file.
The PageContainer Class#
To meet the requirements for accessing data without OS buffering, we can create a PageContainer
class. The following sections will explain each part of this class in detail.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
| namespace Container{
typedef int errflag_t;
static const errflag_t ERR_NULL = 0;
static const errflag_t SUCCESS = 1;
static const errflag_t ERR_FAILED = -1;
static const errflag_t ERR_MEMERR = -2;
// return errflag for fast error handling
template<typename dtype>
class PageContainer {
private:
void* buffer_p = nullptr;
size_t* buffer_size_p = nullptr;
dtype* data_p = nullptr;
size_t* data_size_p = nullptr;
bool allocated_buffer = false;
// assign data pointers insider buffer
errflag_t assign_pointers() {
if (this->buffer_p != nullptr) {
this->buffer_size_p = (size_t*)this->buffer_p;
this->data_size_p = ((size_t*)this->buffer_p) + 1;
this->data_p = (dtype*)(((size_t*)this->buffer_p) + 2);
return SUCCESS;
}
return ERR_FAILED;
}
// initialize buffer and data pointers
errflag_t init_buffer(size_t buffer_size) {
errflag_t err_flag = ERR_NULL;
if (this->buffer_p == nullptr) {
// allocate buffer
this->buffer_p = malloc(buffer_size);
if (this->buffer_p == nullptr) return ERR_MEMERR;
memset(this->buffer_p, 0, buffer_size);
// assign pointers
err_flag = this->assign_pointers();
// update type
*(this->buffer_size_p) = buffer_size;
*(this->data_size_p) = 0;
}
this->allocated_buffer = true;
return err_flag;
}
// free allocated buffer
errflag_t free_buffer() {
if (this->allocated_buffer) { // free buffer only when object allocated the buffer.
if (this->buffer_p != nullptr) {
free(this->buffer_p);
}
}
return SUCCESS;
}
public:
// constructor from existing buffer
PageContainer(void* input_buffer_p, bool new_buffer = true) {
errflag_t errflag = ERR_NULL;
size_t buffer_size = *((size_t*)input_buffer_p);
if (new_buffer) {
errflag = this->init_buffer(buffer_size);
memcpy(this->buffer_p, input_buffer_p, buffer_size);
}
else {
this->buffer_p = input_buffer_p;
errflag = this->assign_pointers();
}
// throw error when failed
if (errflag != SUCCESS) {
throw std::runtime_error("PageContainer failed to init. Error flag = " + std::to_string(errflag));
}
}
// constructor by giving max data size
PageContainer(size_t capacity, size_t page_size) {
errflag_t errflag = ERR_NULL;
size_t nof_pages = (capacity + 2 * sizeof(size_t)) / page_size;
if ((capacity + 2 * sizeof(size_t)) % page_size > 0) nof_pages += 1;
size_t buffer_size = nof_pages * page_size;
errflag = this->init_buffer(buffer_size);
// throw error when failed
if (errflag != SUCCESS) {
throw std::runtime_error("PageContainer failed to init. Error flag = " + std::to_string(errflag));
}
}
// constructor by giving buffer size
PageContainer(size_t buffer_size) {
errflag_t errflag = ERR_NULL;
errflag = this->init_buffer(buffer_size);
// throw error when failed
if (errflag != SUCCESS) {
throw std::runtime_error("PageContainer failed to init. Error flag = " + std::to_string(errflag));
}
}
// destuctor frees memory
~PageContainer() {
this->free_buffer();
}
// get the buffer pointer and size
errflag_t get_buffer(void** buf_pp, size_t* buf_size_p) {
*buf_pp = this->buffer_p;
*buf_size_p = *(this->buffer_size_p);
return SUCCESS;
}
// get data pointer
errflag_t get_data_p(dtype** data_pp) {
*data_pp = this->data_p;
return SUCCESS;
}
// get capacity
errflag_t get_capacity(size_t* capacity_p) {
size_t capacity = *(this->buffer_size_p) - 2 * sizeof(size_t);
*capacity_p = capacity;
return SUCCESS;
}
// get data size
errflag_t get_data_size(size_t* data_size_p) {
*data_size_p = *(this->data_size_p);
return SUCCESS;
}
// set data size
errflag_t set_data_size(size_t data_size) {
errflag_t err_flag = ERR_NULL;
size_t capacity = 0;
err_flag = this->get_capacity(&capacity);
if (err_flag != SUCCESS) return err_flag;
if (data_size > capacity) {
return ERR_FAILED;
}
*(this->data_size_p) = data_size;
return SUCCESS;
}
};
}
|
Data structure#
As shown in the cover image of this post, the main storage space of the PageContainer
is a buffer sized as a multiple of the system page size. The first sizeof(size_t)
bytes (8 bytes on 64-bit systems, 4 bytes on 32-bit systems) store the total size of the buffer. The next sizeof(size_t)
bytes store the size of the valid data. The remaining space is used for data storage. Pointers are assigned for the entire buffer, the buffer size, the data size, and the data storage area.
This data structure setup allows data access without OS buffering and lets you store data that isn’t an exact multiple of the page size. By storing the buffer size and data size for each PageContainer
object, you can store and access multiple PageContainer
buffers within a single file without needing extra metadata.
Creating and deleting a PageContainer#
When creating a PageContainer object, it allocates a buffer and organizes it based on the structure described earlier. The methods assign_pointers()
and init_buffer()
handle the allocation of the buffer and set up the necessary pointers.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
| // assign data pointers insider buffer
errflag_t assign_pointers() {
if (this->buffer_p != nullptr) {
this->buffer_size_p = (size_t*)this->buffer_p;
this->data_size_p = ((size_t*)this->buffer_p) + 1;
this->data_p = (dtype*)(((size_t*)this->buffer_p) + 2);
return SUCCESS;
}
return ERR_FAILED;
}
// initialize buffer and data pointers
errflag_t init_buffer(size_t buffer_size) {
errflag_t err_flag = ERR_NULL;
if (this->buffer_p == nullptr) {
// allocate buffer
this->buffer_p = malloc(buffer_size);
if (this->buffer_p == nullptr) return ERR_MEMERR;
memset(this->buffer_p, 0, buffer_size);
// assign pointers
err_flag = this->assign_pointers();
// update type
*(this->buffer_size_p) = buffer_size;
*(this->data_size_p) = 0;
}
this->allocated_buffer = true;
return err_flag;
}
|
With the methods for buffer allocation and arrangement in place, the constructors of the PageContainer can be defined simply as follows:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
| // constructor by giving buffer size
PageContainer(size_t buffer_size) {
errflag_t errflag = ERR_NULL;
errflag = this->init_buffer(buffer_size);
// throw error when failed
if (errflag != SUCCESS) {
throw std::runtime_error("PageContainer failed to init. Error flag = " + std::to_string(errflag));
}
}
// constructor by giving max data size
PageContainer(size_t capacity, size_t page_size) {
errflag_t errflag = ERR_NULL;
size_t nof_pages = (capacity + 2 * sizeof(size_t)) / page_size;
if ((capacity + 2 * sizeof(size_t)) % page_size > 0) nof_pages += 1;
size_t buffer_size = nof_pages * page_size;
errflag = this->init_buffer(buffer_size);
// throw error when failed
if (errflag != SUCCESS) {
throw std::runtime_error("PageContainer failed to init. Error flag = " + std::to_string(errflag));
}
}
|
Here, the first constructor accepts a directly calculated buffer size (buffer_size
) from external sources. The second constructor takes expected maximum data size (capacity
, in bytes) and system page size (page_size
, in bytes) as inputs, calculating the minimum required buffer size accordingly.
In some applications, such as when loading a PageContainer
buffer from a file or using intermediate buffering, there may already be memory allocated to store the buffer. In these cases, where pre-existing buffers are available, PageContainer
also provides constructors that use these buffers directly, without allocating additional space.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
| // constructor from existing buffer
PageContainer(void* input_buffer_p, bool new_buffer = true) {
errflag_t errflag = ERR_NULL;
size_t buffer_size = *((size_t*)input_buffer_p);
if (new_buffer) {
errflag = this->init_buffer(buffer_size);
memcpy(this->buffer_p, input_buffer_p, buffer_size);
}
else {
this->buffer_p = input_buffer_p;
errflag = this->assign_pointers();
}
// throw error when failed
if (errflag != SUCCESS) {
throw std::runtime_error("PageContainer failed to init. Error flag = " + std::to_string(errflag));
}
}
|
To properly manage the dynamically allocated buffer, we have also defined the free_buffer()
method as follows.
1
2
3
4
5
6
7
8
9
| // free allocated buffer
errflag_t free_buffer() {
if (this->allocated_buffer) { // free buffer only when object allocated the buffer.
if (this->buffer_p != nullptr) {
free(this->buffer_p);
}
}
return SUCCESS;
}
|
In the PageContainer
destructor, the buffer is freed if it was allocated during construction.
1
2
3
4
| // destuctor frees memory
~PageContainer() {
this->free_buffer();
}
|
Accessing data and buffer#
In data streaming applications, it’s common to have APIs that accept pointers to storage spaces for reading and writing data. Here’s a typical pesudo code of how such an API might be structured:
1
2
| errorflag_t write_data(handle_t file_handle, void* src_buffer, size_t bytes_to_write);
errorflag_t read_data(handle_t file_handle, void* dst_buffer, size_t bytes_to_read, size_t* bytes_retrieved);
|
Examples of such APIs include functions like WriteFile
and ReadFile
provided by the Windows API.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
| BOOL WriteFile(
[in] HANDLE hFile,
[in] LPCVOID lpBuffer,
[in] DWORD nNumberOfBytesToWrite,
[out, optional] LPDWORD lpNumberOfBytesWritten,
[in, out, optional] LPOVERLAPPED lpOverlapped
);
BOOL ReadFile(
[in] HANDLE hFile,
[out] LPVOID lpBuffer,
[in] DWORD nNumberOfBytesToRead,
[out, optional] LPDWORD lpNumberOfBytesRead,
[in, out, optional] LPOVERLAPPED lpOverlapped
);
|
PageContainer
offers the following methods to access data stored within it following such pointer data IO scheme.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
| // get data pointer
errflag_t get_data_p(dtype** data_pp) {
*data_pp = this->data_p;
return SUCCESS;
}
// get capacity
errflag_t get_capacity(size_t* capacity_p) {
size_t capacity = *(this->buffer_size_p) - 2 * sizeof(size_t);
*capacity_p = capacity;
return SUCCESS;
}
// get data size
errflag_t get_data_size(size_t* data_size_p) {
*data_size_p = *(this->data_size_p);
return SUCCESS;
}
// set data size
errflag_t set_data_size(size_t data_size) {
errflag_t err_flag = ERR_NULL;
size_t capacity = 0;
err_flag = this->get_capacity(&capacity);
if (err_flag != SUCCESS) return err_flag;
if (data_size > capacity) {
return ERR_FAILED;
}
*(this->data_size_p) = data_size;
return SUCCESS;
}
|
The get_data_p()
method provides a pointer to the storage space where data is stored. The get_capacity()
method provides the maximum amount of data, in bytes, that the PageContainer
can hold. The get_data_size()
method provides the current size of the stored data in bytes. The set_data_size()
method provides the size of the stored data when new data is written into the PageContainer
.
For reading and writing operations that require data sizes to be multiples of the page size, PageContainer
offers the following method to access the entire allocated buffer.
1
2
3
4
5
6
| // get the buffer pointer and size
errflag_t get_buffer(void** buf_pp, size_t* buf_size_p) {
*buf_pp = this->buffer_p;
*buf_size_p = *(this->buffer_size_p);
return SUCCESS;
}
|
Using PageContainer#
Here’s a quick example demonstrating how to use the PageContainer
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
| void main() {
std::wstring file_path = L"test_file.bin";
HANDLE file_handle = INVALID_HANDLE_VALUE;
size_t page_size = 4 * 1024; // initial 4KB page size
Container::errflag_t errflag = Container::ERR_NULL;
DWORD error_flag = NULL;
DWORD d_error = NULL;
// get system page size
SYSTEM_INFO sys_info;
GetSystemInfo(&sys_info);
page_size = sys_info.dwPageSize;
std::cout << "system page size: " << page_size << std::endl;
// initialize test array
const size_t array_len = 10;
float* test_array = new float[array_len];
size_t data_size = array_len * sizeof(float);
for (size_t idx = 0; idx < array_len; ++idx) {
test_array[idx] = rand_float(0.0f, 1.0f, 1000);
}
std::cout << "write array: " << carray_to_string(test_array, array_len) << std::endl;
// create container with page size
Container::PageContainer<float> write_container(array_len * sizeof(float), page_size);
// copy data into container
size_t data_size_to_write = data_size;
float* write_data_p = nullptr;
size_t write_capacity = 0;
errflag = write_container.get_data_p(&write_data_p);
if (errflag != Container::SUCCESS) {
std::cout << "ERR:\t failed to access data pointer." << std::endl;
}
write_container.get_capacity(&write_capacity);
memcpy(write_data_p, test_array, data_size_to_write);
write_container.set_data_size(data_size_to_write);
// write data to file (bypassing system buffer)
// open file
file_handle = CreateFileW(
file_path.c_str(),
GENERIC_WRITE | GENERIC_READ,
0,
NULL,
CREATE_ALWAYS,
FILE_ATTRIBUTE_NORMAL | FILE_FLAG_NO_BUFFERING,
NULL);
if (file_handle == INVALID_HANDLE_VALUE) {
std::cout << "ERR:\t failed to create file." << std::endl;
return;
}
void* write_buffer_p = nullptr;
size_t write_buffer_size = 0;
write_container.get_buffer(&write_buffer_p, &write_buffer_size);
// write data
error_flag = WriteFile(file_handle, write_buffer_p, write_buffer_size, NULL, NULL);
d_error = GetLastError();
if (error_flag == FALSE && d_error != ERROR_IO_PENDING) {
std::cout << "ERR:\t error in write file, error code = " << d_error << std::endl;
return;
}
// close file
CloseHandle(file_handle);
file_handle = INVALID_HANDLE_VALUE;
std::cout << "data write to: ";
std::wcout << file_path;
std::cout << std::endl;
// read data from file
void* read_buffer_p = nullptr;
size_t read_buffer_size = 0;
DWORD bytes_read = 0;
// open file
file_handle = CreateFileW(
file_path.c_str(),
GENERIC_READ,
0,
NULL,
OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL,
NULL);
if (file_handle == INVALID_HANDLE_VALUE) {
std::cout << "ERR:\t failed to create file." << std::endl;
return;
}
// get buffer size
error_flag = ReadFile(file_handle, &read_buffer_size, sizeof(size_t), &bytes_read, NULL);
d_error = GetLastError();
if (error_flag == FALSE && d_error != ERROR_IO_PENDING) {
std::cout << "ERR:\t error in read file, error code = " << d_error << std::endl;
return;
}
std::cout << "read_buffer_size = " << read_buffer_size << std::endl;
Container::PageContainer<float> read_container(read_buffer_size);
read_container.get_buffer(&read_buffer_p, &read_buffer_size);
// set file pointer to start of the buffer
SetFilePointer(file_handle, 0, NULL, FILE_BEGIN);
d_error = GetLastError();
if (d_error != 0) {
std::cout << "ERR:\t error set file pointer, error code = " << d_error << std::endl;
return;
}
// load buffer from the file
error_flag = ReadFile(file_handle, read_buffer_p, read_buffer_size, &bytes_read, NULL);
d_error = GetLastError();
if (error_flag == FALSE && d_error != ERROR_IO_PENDING) {
std::cout << "ERR:\t error in read file, error code = " << d_error << std::endl;
return;
}
// close file
CloseHandle(file_handle);
file_handle = INVALID_HANDLE_VALUE;
std::cout << "data read from: ";
std::wcout << file_path;
std::cout << std::endl;
// display read data
float* read_arr = nullptr;
size_t read_data_size = 0;
size_t read_arr_len = 0;
read_container.get_data_p(&read_arr);
read_container.get_data_size(&read_data_size);
read_arr_len = read_data_size / sizeof(float);
std::cout << "read array: " << carray_to_string(read_arr, read_arr_len) << std::endl;
}
|
In this example, we start by creating a PageContainer
object (write_container
) to store a float test array. The size of this array doesn’t match the system’s page size. We use the memcpy()
function explicitly in the example to copy data into the PageContainer
object.
Next, we create a file with the FILE_FLAG_NO_BUFFERING
flag, which disables Windows’ OS file buffering. We then write the entire buffer of the PageContainer
object to this file and close it.
Lastly, we reopen the saved file to read the buffer size. Using this size, we create a new PageContainer
object (read_container
) and load the buffer from the file. Finally, we retrieve the array from the read_container
.
The output displayed in the terminal for the demo code above is as follows:
system page size: 4096
write array: {0.041, 0.467, 0.334, 0.5, 0.169, 0.724, 0.478, 0.358, 0.962, 0.464}
data write to: test_file.bin
read_buffer_size = 4096
data read from: test_file.bin
read array: {0.041, 0.467, 0.334, 0.5, 0.169, 0.724, 0.478, 0.358, 0.962, 0.464}
Conclusion#
By allocating a memory space sized in multiples of page sizes and organizing it into sections for storing buffer size, data size, and actual data, we created the PageContainer
class. This setup enables direct reading and writing of data without relying on OS buffering.
Citation#
If you found this article helpful, please cite it as:
Zhong, Jian (June 2024). PageContainer: Fast, Direct Data I/O Without OS Buffering. Vision Tech Insights. https://jianzhongdev.github.io/VisionTechInsights/posts/page_container_direct_data_io/.
Or
@article{zhong2024pagecontainer,
title = "PageContainer: Fast, Direct Data I/O Without OS Buffering",
author = "Zhong, Jian",
journal = "jianzhongdev.github.io",
year = "2024",
month = "June",
url = "https://jianzhongdev.github.io/VisionTechInsights/posts/building_a_configuration_file_parser_with_cpp/."
}