Isidoros Tsaousis-Seiras
Aristotle's University of Thessaloniki
Commits on GitHub
HPX provides highly optimized facilities to ensure maximal
resource utilization.
However there is still performance to be gained in object
movement tasks. Proposal
P1144
introduces the idea
of relocation. A facility used to hide the
boilerplate of object relocation, and optimize it when
possible. Essentially, when relocating a buffer of objects,
the uninitialized_relocate
algorithm will determine if it is valid to reduce multiple
move-constructor and destructor calls to a
single memcpy()
. This improvement impacts key
primitives like vector.reserve()
, leading to
speedup in higher-level programs.
There are two main steps to implement relocation semantics. First, we need to determine if a type is trivially relocatable. Second, we need to implement the relocation mechanism. There is also a third step, which is specific to HPX, and that is to implement the parallelized version of the procedure.
Most, but not all types happen to be agnostic to their location in memory. That is, they can be moved around without any special care. For a complete list of such types, see here. To understand this better, let's delve into some examples.
For example, an std::vector
object is trivially
relocatable, because it is just a wrapper around pointer to a buffer.
This is what trivially relocating it looks like:
It is perfectly safe to memcpy
an std::vector
to a new location and deallocate the memory holding the old object. That is because:
In contrast, an std::string
is not
trivially relocatable (in libstdc++), because
it contains a pointer to an internal buffer (used
for small string optimization). This is what attempting to trivially relocate it would look like:
The copied object will point to the same buffer as the original, so when the memory holding the original object is freed, the copied one will be left with a dangling pointer.
Another example of a non-trivially relocatable type
is std::list
(in libc++ and libstdc++). It is a circularly
linked list, so the last element points to the first. This is what attempting to trivially
relocate it would look like:
If we were to memcpy
the list (the first node)
to a new location, the last element would
still point to the old location, so after freeing,
that pointer would be dangling.
While there's no definitive sign that a type is trivially relocatable, certain common cases can be identified:
We can test for the first case using std::is_trivially_copyable
.
We cannot cover the
second case with today's C++. And in any case it would be a
very limited solution. For example
neither cases cover std::vector
, which is
trivially relocatable, but not trivially copyable
and it does have special member functions.
To cover all cases, P1144 proposes that the user can declare that a type is trivially relocatable.
To achieve this, HPX offers the following interface, implemented here:
HPX_DECLARE_TRIVIALLY_RELOCATABLE(custom_type);
struct hpx::is_trivially_relocatable<custom_type> : std::true_type {};
Here, the user is specializing hpx::is_trivially_relocatable<T>
for their type.
By default, it's true for trivially copyable types, and false otherwise.
That way we automatically optimize for all the simple cases,
but we also let the user benefit further with their own types.
It is important to note that trivial relocatability is implementation-specific, so HPX can't
automatically optimize the relocation of standard types like std::vector
. As the standard
lacks a way to denote
trivial relocatability, the only way to confirm the safety of such optimization is by examining each STL
vendor's
implementation and labeling the types as trivially relocatable. However, this is impractical
due to the high maintenance it would entail. Hence, users must designate types as trivially relocatable.
Let t_buffer
be a buffer of n
objects of type T
, and
t_buffer_new
be another buffer large enough to
fit m
≥ n
objects of type T
.
The desired state after the relocation is:
t_buffer_new
contains the objects
originally in t_buffer
t_buffer
can be repurposed or freed
without further cleanup
The traditional approach is to move
each object
from t_buffer
to
t_buffer_new
and destroy
the moved-from objects in t_buffer
.
This amounts to n calls to T
's move-constructor and n and
to its destructor. Depending on the type, these operations can be
expensive. For example, clang's std::vector
will set the data
pointer of the moved-from object to nullptr
and the size
and capacity
to 0
.
But since we know that std::vector
is a
trivially relocatable type, we can do better.
We can simply bitwise-copy the entire buffer
from t_buffer
to t_buffer_new
.
This amounts to a single call to memcpy
for
n*sizeof(T)
bytes.
The interface the user sees is:
uninitialized_relocate_n(t_buffer, n, t_buffer_new);
So the implementation details are hidden from them. Internally, the optimized path will be chosen if all of the following conditions are met:
T
is trivially relocatableT
is not volatile (memcpy
cannot be used on volatile objects)As seen here.
Testing this for trivially relocatable STL types, we see that the optimized path is substantially faster.
It's easy to see that both the traditional and the optimized
approach are embarrassingly
parallel. Using HPX's parallelization facilities, we can
easily make this into a ready-to-use optimized algorithm. Specifically, the parallelization
was achieved using the partitioner_with_cleanup
internal facility. The implementation can
be
seen here.
The parallelization yields high observable speedup, as seen in the graph below.
hpx::is_relocatable
tag, which is a condition different from trivial
relocatability,
but it is needed to ensure that types marked as trivially relocatable are actually legal to move
in the first place.
hpx::is_trivially_relocatable
tag, along with the user interface described above.
uninitialized_relocate
algorithm in a bare-bones fashion, along
with the
relocate_at
algorithm, which is the single element version of the former.
It also implemented the method selection logic, using the hpx::is_trivially_relocatable
tag.
uninitialized_relocate_n
which differs only in terms of the API.
Code additions in the above PRs are accompanied with the matching tests.
The P1144 proposal is still active. It's essential to monitor its progress and integrate relevant updates into HPX. While the proposal awaits formal acceptance, prompting it's adoption from the community and even applying its advantages within HPX itself is a great way to help improve our software.
Some of the features that are still missing from the HPX implementation are:
relocate
function,
which requires language features that are not yet available in C++. It provides the interface
T t = std::relocate(&t_old)
, which is equivalent to either move
-ing
t_old
into t
, or memcpy
-ing it and skipping any
constructor or destructor calls if it is trivially relocatable.
My deepest thanks go to
Giannis Gonidelis,
Hartmut Kaiser, and
Panagiotis Syskakis
for their mentorship and invaluable insights throughout this project.
Additionally, I'd like to express my gratitude to P1144's author, Arthur O'Dwyer, not only for his pivotal code reviews and insights but also for pioneering the proposal itself.