Parallel computations¶
Installation¶
Muscat relies on the Kokkos ecosystem [1] to support thread-scalable, node-level parallelism (i.e. CPU/GPU shared memory). To enable Kokkos, add the -DMuscat_ENABLE_Kokkos=ON
flag when compiling Muscat.
Usage¶
Note
Muscat only supports Kokkos for algorithms that were specifically designed for it.
They are all listed in the Muscat::KK
namespace.
Host & Device¶
In the context of parallel computing (particularly in environments like GPU programming) the terms “host” and “device” refer to different types of processing units:
Host
: This typically refers to the CPU (Central Processing Unit) and the main system memory (RAM) where the main application runs. The host is responsible for managing the overall application flow, including setting up computations and transferring data to and from the device.
Device
: This usually refers to the GPU (Graphics Processing Unit) or other accelerators used for high-performance parallel computations. The device is optimized for executing large numbers of parallel operations.
Warning
Using a device like a GPU isn’t always the best choice because data transfer overhead can negate speedup for smaller tasks, and some algorithms may not parallelize well, making the CPU more efficient. To maximize efficiency, it’s crucial to fully utilize both the host and device. This involves wisely distributing workloads, and ensuring that each platform is leveraged for its strengths.
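The trade-off described in this warning can be illustrated with a toy cost model. The sketch below is not part of Muscat, and every number in it (per-item costs, transfer latency, bandwidth) is an arbitrary assumption chosen only to show that a fixed transfer overhead dominates for small inputs and becomes negligible for large ones.

```python
# Toy cost model; all default values are illustrative assumptions, not measurements.

def host_time(n, per_item=1e-8):
    """Estimated seconds to process n items on the host."""
    return n * per_item


def device_time(n, per_item=1e-10, latency=2e-5, bytes_per_item=8, bandwidth=1e10):
    """Estimated seconds to process n items on the device, including data transfer."""
    transfer = latency + n * bytes_per_item / bandwidth  # fixed latency plus copy time
    return transfer + n * per_item


for n in (1_000, 1_000_000):
    faster = "host" if host_time(n) < device_time(n) else "device"
    print(f"n={n}: host={host_time(n):.2e}s device={device_time(n):.2e}s -> {faster}")
```

With these assumed values the host is faster for the small input and the device is faster for the large one; real break-even points depend on the hardware and the algorithm, and should be measured.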
Python usage¶
from Muscat.Helpers.Kokkos.KokkosHelper import *
# [...]
with UseDevice():  # equivalent to UseDevice(False)
    # This part of the code runs on the Device (if available)
    pass
with UseDevice(True):
    # This part of the code runs on the Device, or crashes if none is available
    pass
with UseHost():
    # This part of the code runs on the Host
    pass
You can ensure that a Device is available by using:
from Muscat.Helpers.Kokkos.KokkosHelper import *
ensureDeviceAvailable(True) # True to raise an exception if not available
Warning
If no device is present and ensureDeviceAvailable
is not used to raise an exception, the program will run on the available CPUs.
Footnotes
GPU offloading¶
To make the most of your CPU and GPU, it is useful to run both at the same time.
That is what Future
objects are for! Some Python functions have an adapted version that returns a Future
object.
These objects run in the background, leaving you free to process other data in the meantime.
For example:
import numpy as np
from Muscat.Helpers.Kokkos.KokkosHelper import *
from Muscat.LinAlg.Kokkos.Utils import *
# [...]
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # some data
ensureDeviceAvailable()  # if no device is available, this code is not optimal
with UseDevice():
    future = uniqueRowsFuture(matrix)  # non-blocking call
with UseHost():
    # This part of the code runs on the Host during the computation
    # [...]
    pass
print(future.available())  # prints whether the data is available (non-blocking)
result = future.get()  # retrieves the data afterward (blocking)
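For readers unfamiliar with futures, the same submit / poll / retrieve pattern can be tried with Python's standard `concurrent.futures` module. This is an analogy only: the standard-library `Future` exposes `done()` and `result()` rather than the Muscat methods shown above, and `unique_rows` is a hypothetical pure-Python stand-in for a long-running device computation.

```python
from concurrent.futures import ThreadPoolExecutor


def unique_rows(matrix):
    """Hypothetical stand-in for a long-running computation: sorted unique rows."""
    return sorted(set(map(tuple, matrix)))


matrix = [[1, 2, 3], [4, 5, 6], [1, 2, 3]]  # some data with a duplicated row

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(unique_rows, matrix)  # returns immediately
    host_total = sum(sum(row) for row in matrix)   # other work on the calling thread
    ready = future.done()                          # non-blocking status check
    result = future.result()                       # blocks until the work has finished

print(ready, host_total, result)
```

The key design point is that the blocking call comes last: the calling thread performs its own work first and only waits when it actually needs the result.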