Parallel computations

Installation

Muscat relies on the Kokkos ecosystem [1] to support thread-scalable, node-level parallelism (i.e. CPU/GPU shared memory). To enable Kokkos, add the -DMuscat_ENABLE_Kokkos=ON flag when compiling Muscat.

Usage

Note

Muscat supports Kokkos only in the algorithms specifically designed for it. They are all listed in the Muscat::KK namespace.

Host & Device

In the context of parallel computing (particularly in environments like GPU programming) the terms “host” and “device” refer to different types of processing units:

  • Host: This typically refers to the CPU (Central Processing Unit) and the main system memory (RAM) where the main application runs. The host is responsible for managing the overall application flow, including setting up computations and transferring data to and from the device.

  • Device: This usually refers to the GPU (Graphics Processing Unit) or other accelerators used for high-performance parallel computations. The device is optimized for executing large numbers of parallel operations.

Warning

Using a device like a GPU isn’t always the best choice because data transfer overhead can negate speedup for smaller tasks, and some algorithms may not parallelize well, making the CPU more efficient. To maximize efficiency, it’s crucial to fully utilize both the host and device. This involves wisely distributing workloads, and ensuring that each platform is leveraged for its strengths.
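The trade-off described in this warning can be made concrete with a rough cost model. The sketch below is illustrative only: the function names and every figure (processing rates, bus bandwidth, fixed offload latency) are assumptions chosen to keep the arithmetic simple, not measurements of Muscat or of any real hardware.

```python
def host_time(n_items, host_rate):
    """Estimated seconds to process n_items on the host."""
    return n_items / host_rate


def device_time(n_items, device_rate, bytes_per_item, bandwidth, latency):
    """Estimated seconds on the device, including the round trip over the bus."""
    transfer = 2 * n_items * bytes_per_item / bandwidth  # copy in and copy out
    return latency + transfer + n_items / device_rate


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
HOST_RATE = 1e8      # items per second on the CPU
DEVICE_RATE = 2e9    # items per second on the GPU
BYTES_PER_ITEM = 8   # one float64 per item
BANDWIDTH = 1e10     # bytes per second between host and device
LATENCY = 1e-4       # fixed cost in seconds for each offload

for n in (1_000, 10_000_000):
    t_host = host_time(n, HOST_RATE)
    t_device = device_time(n, DEVICE_RATE, BYTES_PER_ITEM, BANDWIDTH, LATENCY)
    winner = "host" if t_host <= t_device else "device"
    print(f"n={n}: host={t_host:.2e}s device={t_device:.2e}s -> {winner}")
```

With these assumed numbers, the fixed latency dominates the small problem, so the host wins; for the large problem the faster device rate outweighs the transfer cost.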

Python usage

from Muscat.Helpers.Kokkos.KokkosHelper import *

# [...]

with UseDevice():  # equivalent to UseDevice(False)
    # This part of the code will run on the Device (if available)
    pass

with UseDevice(True):
    # This part of the code will run on the Device or crash if not available
    pass

with UseHost():
    # This part of the code will run on the Host
    pass

You can ensure that a Device is available by using:

from Muscat.Helpers.Kokkos.KokkosHelper import *
ensureDeviceAvailable(True) # True to raise an exception if not available

Warning

If no device is present and ensureDeviceAvailable is not used to raise an exception, the program runs on the available CPUs.
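The difference between a silent fallback and a strict request can be illustrated with a toy context manager. This is a sketch written for illustration only, not Muscat's implementation: use_device, its required argument, DEVICE_PRESENT and the RuntimeError are all hypothetical stand-ins, and the exact exception Muscat raises is not specified here.

```python
from contextlib import contextmanager

DEVICE_PRESENT = False  # hypothetical flag: pretend this machine has no GPU


@contextmanager
def use_device(required=False):
    """Toy stand-in for a device-selection context manager (not Muscat's code)."""
    if DEVICE_PRESENT:
        yield "device"
    elif required:
        raise RuntimeError("no device available")
    else:
        yield "host"  # silent fallback to the CPUs


# Lenient request: the block still runs, but on the host.
with use_device() as backend:
    fallback_backend = backend

# Strict request: the missing device is reported instead of being hidden.
try:
    with use_device(required=True) as backend:
        strict_backend = backend
except RuntimeError as err:
    strict_backend = f"error: {err}"

print(fallback_backend)
print(strict_backend)
```

The lenient form keeps a program portable across machines, while the strict form is useful when running on CPUs would hide a configuration problem or make a benchmark meaningless.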

Footnotes

GPU offloading

To make the most of your CPU and GPU, it is useful to keep both busy at the same time. That is what Futures are for. Some Python functions have a variant that returns a Future object. The computation behind such an object runs in the background, which lets you process other data in the meantime. For example:

import numpy as np

from Muscat.Helpers.Kokkos.KokkosHelper import *
from Muscat.LinAlg.Kokkos.Utils import *

# [...]

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # some data

ensureDeviceAvailable()  # If there is no device available, this code is not optimal
with UseDevice():
    future = uniqueRowsFuture(matrix)  # non-blocking instruction

with UseHost():
    # This part of the code will run on the Host during the computation
    # [...]
    pass

print(future.available())  # prints whether the data is available, non-blocking
result = future.get()  # retrieves the data afterward, blocking
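For readers new to the pattern, the same submit, check, retrieve sequence can be tried without a GPU using Python's standard concurrent.futures module. This is only an analogy, not how Muscat implements its Future objects: a worker thread plays the role of the device, the unique_rows helper is a hypothetical stand-in for the offloaded computation, and the standard library names the two calls done() and result() rather than available() and get().

```python
from concurrent.futures import ThreadPoolExecutor


def unique_rows(rows):
    """Stand-in for the offloaded computation; runs on a worker thread here."""
    return sorted(set(map(tuple, rows)))


matrix = [[1, 2, 3], [4, 5, 6], [1, 2, 3]]  # some data with a duplicate row

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(unique_rows, matrix)      # returns immediately
    host_work = sum(sum(row) for row in matrix)    # other work done meanwhile
    finished = future.done()                       # non-blocking status check
    result = future.result()                       # blocks until the value is ready

print(host_work)
print(result)
```

The value of finished depends on timing, which is the reason the status check exists: it lets the host decide whether to keep working or to block on the result.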