HTGS  v2.0
The Hybrid Task Graph Scheduler
htgs::ICudaTask< T, U > Class Template Reference [abstract]

An ICudaTask is used to attach a task to an NVIDIA Cuda GPU. More...

#include <htgs/api/ICudaTask.hpp>

Inheritance diagram for htgs::ICudaTask< T, U >:
Collaboration diagram for htgs::ICudaTask< T, U >:

Public Member Functions

 ICudaTask (int *cudaIds, size_t numGpus, bool autoEnablePeerAccess=true)
 Creates an ICudaTask. More...
 
virtual void initializeCudaGPU ()
 Virtual function that is called when the ICudaTask has been initialized and is bound to a CUDA GPU.
 
virtual void executeTask (std::shared_ptr< T > data)=0
 Executes the ICudaTask on some data. More...
 
virtual void shutdownCuda ()
 Virtual function that is called when the ICudaTask is shutting down.
 
virtual std::string getName () override
 Virtual function that gets the name of this ICudaTask. More...
 
std::string getDotFillColor () override
 Gets the color for filling the shape for graphviz dot. More...
 
virtual ITask< T, U > * copy ()=0
 Pure virtual function that copies this ICudaTask. More...
 
virtual void debug () override
 Virtual function that can be used to provide debug information.
 
int getCudaId ()
 Gets the Cuda Id for this cudaTask. More...
 
bool requiresCopy (size_t pipelineId)
 Checks if the requested pipelineId requires GPU-to-GPU copy. More...
 
template<class V >
bool requiresCopy (std::shared_ptr< MemoryData< V >> data)
 Checks if the requested MemoryData requires GPU-to-GPU copy. More...
 
bool hasPeerToPeerCopy (size_t pipelineId)
 Checks if the requested pipelineId allows peer to peer GPU copy. More...
 
template<class V >
bool autoCopy (V *destination, std::shared_ptr< MemoryData< V >> data, long numElems)
 Will automatically copy from one GPU to another (if it is required). More...
 
void initialize () override final
 Initializes the CudaTask to be bound to a particular GPU. More...
 
void shutdown () override final
 Shuts down the ICudaTask. More...
 
const cudaStream_t & getStream () const
 Gets the CUDA stream for this CUDA task. More...
 
int * getCudaIds ()
 Gets the cudaIds specified during ICudaTask construction. More...
 
size_t getNumGPUs ()
 Gets the number of GPUs specified during ICudaTask construction. More...
 
void syncStream ()
 Synchronizes the Cuda stream associated with this task. More...
 
- Public Member Functions inherited from htgs::ITask< T, U >
 ITask ()
 Creates an ITask with number of threads equal to 1.
 
 ITask (size_t numThreads)
 Constructs an ITask with a specified number of threads. More...
 
 ITask (size_t numThreads, bool isStartTask, bool poll, size_t microTimeoutTime)
 Constructs an ITask with a specified number of threads as well as additional scheduling options. More...
 
virtual bool canTerminate (std::shared_ptr< AnyConnector > inputConnector) override
 Virtual function that is called when an ITask is checking if it can be terminated. More...
 
virtual void executeTaskFinal () override
 Virtual function that is called just before the task shuts down. More...
 
virtual std::string getDotLabelName () override
 Virtual function to get the label name used for dot graph viz. More...
 
virtual std::string getDotShapeColor () override
 Gets the color of the shape for graphviz dot. More...
 
virtual std::string getDotShape () override
 Gets the shape for graphviz dot. More...
 
virtual std::string getDotCustomProfile () override
 Adds the string text to the profiling of this task in the graphviz dot visualization. More...
 
virtual void printProfile () override
 Prints the profile data to std::out. More...
 
virtual size_t getNumGraphsSpawned ()
 Gets the number of graphs spawned by this ITask. More...
 
virtual std::string genDotProducerEdgeToTask (std::map< std::shared_ptr< AnyConnector >, AnyITask *> &inputConnectorDotMap, int dotFlags) override
 
virtual std::string genDotConsumerEdgeFromConnector (std::shared_ptr< AnyConnector > connector, int flags) override
 
virtual std::string genDotProducerEdgeFromConnector (std::shared_ptr< AnyConnector > connector, int flags)
 
ITask< T, U > * copyITask (bool deep) override
 Copies the ITask (including a copy of all memory edges) More...
 
void addResult (std::shared_ptr< U > result)
 Adds results to the output list to be sent to the next connected ITask in a TaskGraph. More...
 
void addResult (U *result)
 Adds results to the output list to be sent to the next connected ITask in a TaskGraph. More...
 
void initialize (size_t pipelineId, size_t numPipeline, TaskManager< T, U > *ownerTask)
 Function that is called when an ITask is being initialized by its owner thread. More...
 
template<class V >
m_data_t< V > getMemory (std::string name, IMemoryReleaseRule *releaseRule)
 Retrieves memory from a memory edge. More...
 
template<class V >
m_data_t< V > getDynamicMemory (std::string name, IMemoryReleaseRule *releaseRule, size_t numElems)
 Retrieves memory from a memory edge. More...
 
template<class V >
void releaseMemory (m_data_t< V > memory)
 Releases memory onto a memory edge, which is transferred by the graph communicator. More...
 
void resetProfile ()
 Resets profile data.
 
size_t getThreadID ()
 Gets the thread ID associated with this task. More...
 
unsigned long long int getTaskComputeTime () const
 Gets the task's compute time. More...
 
std::string inTypeName () override final
 Gets the demangled input type name of the connector. More...
 
std::string outTypeName () override final
 Gets the demangled output type name of the connector. More...
 
std::string getAddress () override final
 Gets the address from the owner task, which is the address of the task graph. More...
 
void setTaskManager (TaskManager< T, U > *ownerTask)
 Sets the owner task manager for this ITask. More...
 
TaskManager< T, U > * getOwnerTaskManager ()
 Gets the owner task manager for this ITask. More...
 
virtual void gatherProfileData (std::map< AnyTaskManager *, TaskManagerProfile *> *taskManagerProfiles)
 Gathers profile data. More...
 
- Public Member Functions inherited from htgs::AnyITask
 AnyITask ()
 Creates an ITask with number of threads equal to 1.
 
 AnyITask (size_t numThreads)
 Constructs an ITask with a specified number of threads. More...
 
 AnyITask (size_t numThreads, bool isStartTask, bool poll, size_t microTimeoutTime)
 Constructs an ITask with a specified number of threads as well as additional scheduling options. More...
 
virtual ~AnyITask ()
 Destructor.
 
virtual std::string genDot (int flags, std::string dotId, std::shared_ptr< htgs::AnyConnector > input, std::shared_ptr< htgs::AnyConnector > output)
 Virtual function that generates the input/output and per-task dot notation. More...
 
virtual std::string getConsumerDotIds ()
 
virtual std::string getProducerDotIds ()
 
virtual std::string genDot (int flags, std::string dotId)
 Virtual function that adds additional dot attributes to this node. More...
 
virtual std::string genCustomDot (ProfileUtils *profileUtils, int colorFlag)
 Virtual function to generate customized dot file. More...
 
virtual std::string debugDotNode ()
 Provides debug output for a node in the dot graph. More...
 
virtual void profile ()
 Virtual function that is called to provide profile output for the ITask. More...
 
virtual std::string profileStr ()
 Virtual function that is called after executeTask is called. More...
 
void initialize (size_t pipelineId, size_t numPipeline)
 Virtual function that is called when an ITask is being initialized by its owner thread. More...
 
void setPipelineId (size_t pipelineId)
 Sets the pipeline Id for this ITask. More...
 
size_t getPipelineId ()
 Gets the pipeline ID. More...
 
void setNumPipelines (size_t numPipelines)
 Sets the number of pipelines that this ITask belongs to. More...
 
size_t getNumPipelines () const
 Gets the number of pipelines. More...
 
size_t getNumThreads () const
 Gets the number of threads associated with this ITask. More...
 
bool isStartTask () const
 Gets whether this ITask is a starting task. More...
 
bool isPoll () const
 Gets whether this ITask is polling for data or not. More...
 
size_t getMicroTimeoutTime () const
 Gets the timeout time for polling. More...
 
void copyMemoryEdges (AnyITask *iTaskCopy)
 Copies the memory edges from this AnyITask to another AnyITask. More...
 
std::string genDot (int flags, std::shared_ptr< AnyConnector > input, std::shared_ptr< AnyConnector > output)
 Creates a dot notation representation for this task. More...
 
void profileITask ()
 Provides profile output for the ITask. More...
 
std::string getDotId ()
 Gets the id used for dot nodes. More...
 
std::string getNameWithPipelineId ()
 Gets the name of the ITask with its pipeline ID. More...
 
const std::shared_ptr< ConnectorMap > & getMemoryEdges () const
 Gets the memory edges for the task. More...
 
const std::shared_ptr< ConnectorMap > & getReleaseMemoryEdges () const
 Gets the memory edges for releasing memory for the memory manager, used to shut down the memory manager. More...
 
bool hasMemoryEdge (std::string name)
 Checks whether this ITask contains a memory edge for a specified name. More...
 
void attachMemoryEdge (std::string name, std::shared_ptr< AnyConnector > getMemoryConnector, std::shared_ptr< AnyConnector > releaseMemoryConnector, MMType type)
 Attaches a memory edge to this ITask to get memory. More...
 
unsigned long long int getMemoryWaitTime () const
 Gets the amount of time the task was waiting for memory. More...
 
void incMemoryWaitTime (unsigned long long int val)
 Increments memory wait time. More...
 

Private Attributes

cudaStream_t stream
 The CUDA stream for the ICudaTask (set after initialize)
 
int * cudaIds
 The array of cuda Ids (one per GPU)
 
size_t numGpus
 The number of GPUs.
 
int cudaId
 The CudaID for the ICudaTask (set after initialize)
 
std::vector< int > nonPeerDevIds
 The list of CudaIds that do not have peer-to-peer access.
 
bool autoEnablePeerAccess
 Flag to automatically enable peer access between multiple GPUs.
 

Detailed Description

template<class T, class U>
class htgs::ICudaTask< T, U >

An ICudaTask is used to attach a task to an NVIDIA Cuda GPU.

The task that inherits from this class will automatically be attached to the GPU when launched by the TaskGraphRunTime from within a TaskGraphConf.

An ICudaTask may be bound to one or more GPUs if the task is added into an ExecutionPipeline. The number of cuda Ids must match the number of pipelines specified for the ExecutionPipeline.

Mechanisms to handle automatic data motion for GPU-to-GPU memories are provided to simplify peer-to-peer device memory copies. To use peer-to-peer copy, both GPUs must reside on the same I/O Hub (IOH) and be the same GPU model.

It may be necessary to copy data that resides on a different GPU, for example when there are ghost regions between data domains. This can be achieved with the autoCopy(V *destination, std::shared_ptr<MemoryData<V>> data, long numElems) function. If peer-to-peer copying is allowed between the GPUs, then the autoCopy function is not needed. See below for an example of using autoCopy.

At this time it is necessary for the ICudaTask to copy data from CPU memories to GPU memories.

Functions are available for getting the CUDA stream, CUDA Id, pipeline ID, and number of pipelines.

Note
For CPU->GPU or GPU->CPU copies, it is ideal to configure a separate copy ICudaTask so that data is transferred asynchronously with respect to a computation ICudaTask.

Example implementation:

#define SIZE 100
class SimpleCudaTask : public htgs::ICudaTask<MatrixData, VoidData> {
 public:
  SimpleCudaTask(int *cudaIds, size_t numGpus) : ICudaTask(cudaIds, numGpus) {}
  ~SimpleCudaTask() {}

  virtual void initializeCudaGPU() {
    // Allocating in initializeCudaGPU ensures the memory is allocated on the GPU this task is bound to
    cudaMalloc(&localMemory, sizeof(double) * SIZE);
  }

  virtual void executeTask(std::shared_ptr<MatrixData> data) {
    ...
    double *memory;
    // Checks if the data received needs to be copied to this task's GPU
    // getCudaMemoryData is defined by the MatrixData class
    if (this->autoCopy(localMemory, data->getCudaMemoryData(), SIZE)) {
      // Copy was required
      memory = localMemory;
    } else {
      // Copy was not required because of peer-to-peer access or same GPU
      memory = data->getCudaMemoryData()->get();
    }
    ...
  }

  virtual void shutdownCuda() { cudaFree(localMemory); }
  virtual void debug() { ... }
  virtual std::string getName() { return "SimpleCudaTask"; }
  virtual htgs::ITask<MatrixData, VoidData> *copy() { return new SimpleCudaTask(...); }

 private:
  double *localMemory;
};

Example usage:

SimpleCudaTask *cudaTask = new SimpleCudaTask(...);
// Adds cudaTask to process input from taskGraph, input type of cudaTask matches input type of taskGraph
taskGraph->setGraphConsumerTask(cudaTask);
Template Parameters
T - the input data type for the ICudaTask ITask; T must derive from IData.
U - the output data type for the ICudaTask ITask; U must derive from IData.

Constructor & Destructor Documentation

◆ ICudaTask()

template<class T , class U >
htgs::ICudaTask< T, U >::ICudaTask ( int * cudaIds, size_t numGpus, bool autoEnablePeerAccess = true )
inline

Creates an ICudaTask.

If this task is added into an ExecutionPipeline, then the number of cudaIds should match the number of pipelines

Parameters
cudaIds - the array of cudaIds
numGpus - the number of GPUs
autoEnablePeerAccess - flag to automatically enable peer access between multiple GPUs (default true)
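Example construction (a minimal sketch, assuming the SimpleCudaTask subclass from the class example above; the device ids 0 and 1 are placeholders, one per execution pipeline):

int cudaIds[] = {0, 1};   // one CUDA device id per execution pipeline (placeholder ids)
size_t numGpus = 2;
// autoEnablePeerAccess is left at its default of true
SimpleCudaTask *cudaTask = new SimpleCudaTask(cudaIds, numGpus);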

Member Function Documentation

◆ autoCopy()

template<class T , class U >
template<class V >
bool htgs::ICudaTask< T, U >::autoCopy ( V * destination, std::shared_ptr< MemoryData< V >> data, long numElems )
inline

Will automatically copy from one GPU to another (if it is required).

Checks whether the data needs to be copied, and if so performs the copy with cudaMemcpyPeerAsync.

Parameters
destination - cuda memory that can be copied into; must be a pointer
data - the source MemoryData that is allocated using a CudaMemoryManager (created using taskGraph->addCudaMemoryEdge)
numElems - the number of elements to be copied
Returns
Whether the copy occurred or not
Return values
TRUE - if the copy was needed
FALSE - if the copy was not needed
Template Parameters
V - a type of MemoryData that is allocated using a CudaMemoryManager (created using taskGraph->addCudaMemoryEdge) and must be a pointer
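Usage sketch inside executeTask, mirroring the class example above (localMemory, SIZE, and getCudaMemoryData are taken from that example):

double *memory;
if (this->autoCopy(localMemory, data->getCudaMemoryData(), SIZE)) {
  memory = localMemory;                       // data was staged onto this task's GPU
} else {
  memory = data->getCudaMemoryData()->get();  // same GPU or peer-to-peer; use in place
}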

◆ copy()

template<class T , class U >
virtual ITask<T, U>* htgs::ICudaTask< T, U >::copy ( )
pure virtual

Pure virtual function that copies this ICudaTask.

Returns
the copy of the ICudaTask

Implements htgs::ITask< T, U >.

◆ executeTask()

template<class T , class U >
virtual void htgs::ICudaTask< T, U >::executeTask ( std::shared_ptr< T >  data)
pure virtual

Executes the ICudaTask on some data.

Use this->getStream() to acquire CUDA stream if needed.

Parameters
data - the data executed on

Implements htgs::ITask< T, U >.
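
A sketch of an executeTask implementation that launches work on the task's stream; the kernel, the MatrixData type, and the assumption that the output type is also MatrixData are hypothetical (SIZE is taken from the class example):

__global__ void scaleKernel(double *buf, size_t n, double factor) {
  size_t i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) buf[i] *= factor;
}

// Inside a hypothetical htgs::ICudaTask<MatrixData, MatrixData> subclass:
virtual void executeTask(std::shared_ptr<MatrixData> data) {
  double *devPtr = data->getCudaMemoryData()->get();  // device pointer from a CUDA memory edge

  // Launch asynchronously on this task's CUDA stream
  scaleKernel<<<32, 256, 0, this->getStream()>>>(devPtr, SIZE, 2.0);

  this->syncStream();    // block until the kernel has completed
  this->addResult(data); // forward the updated data downstream
}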

◆ getCudaId()

template<class T , class U >
int htgs::ICudaTask< T, U >::getCudaId ( )
inline

Gets the Cuda Id for this cudaTask.

Set only after this task has been bound to a thread during initialization.

Returns
the cudaId associated with this cudaTask

◆ getCudaIds()

template<class T , class U >
int* htgs::ICudaTask< T, U >::getCudaIds ( )
inline

Gets the cudaIds specified during ICudaTask construction.

Returns
the cudaIds

◆ getDotFillColor()

template<class T , class U >
std::string htgs::ICudaTask< T, U >::getDotFillColor ( )
inlineoverridevirtual

Gets the color for filling the shape for graphviz dot.

Returns
the fill color

Reimplemented from htgs::ITask< T, U >.

◆ getName()

template<class T , class U >
virtual std::string htgs::ICudaTask< T, U >::getName ( )
inlineoverridevirtual

Virtual function that gets the name of this ICudaTask.

Returns
the name of the ICudaTask

Reimplemented from htgs::ITask< T, U >.

◆ getNumGPUs()

template<class T , class U >
size_t htgs::ICudaTask< T, U >::getNumGPUs ( )
inline

Gets the number of GPUs specified during ICudaTask construction.

Returns
the number of GPUs

◆ getStream()

template<class T , class U >
const cudaStream_t& htgs::ICudaTask< T, U >::getStream ( ) const
inline

Gets the CUDA stream for this CUDA task.

Returns
the CUDA stream
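
A sketch of using the stream for an asynchronous host-to-device copy (hostBuffer, devBuffer, and numBytes are hypothetical):

// The copy is ordered with other work previously enqueued on this task's stream
cudaMemcpyAsync(devBuffer, hostBuffer, numBytes, cudaMemcpyHostToDevice, this->getStream());
this->syncStream();  // wait for the copy to complete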

◆ hasPeerToPeerCopy()

template<class T , class U >
bool htgs::ICudaTask< T, U >::hasPeerToPeerCopy ( size_t  pipelineId)
inline

Checks if the requested pipelineId allows peer to peer GPU copy.

Parameters
pipelineId - the pipelineId to check
Returns
Whether the pipeline id has peer to peer GPU copy
Return values
TRUE - if the pipeline id has peer to peer GPU copy
FALSE - if the pipeline id does not have peer to peer GPU copy

◆ initialize()

template<class T , class U >
void htgs::ICudaTask< T, U >::initialize ( )
inlinefinaloverridevirtual

Initializes the CudaTask to be bound to a particular GPU.

Note
This function should only be called by the HTGS API

Reimplemented from htgs::ITask< T, U >.

◆ requiresCopy() [1/2]

template<class T , class U >
bool htgs::ICudaTask< T, U >::requiresCopy ( size_t  pipelineId)
inline

Checks if the requested pipelineId requires GPU-to-GPU copy.

Parameters
pipelineId - the ExecutionPipeline id
Returns
whether the requested pipelineId would require a GPU-to-GPU copy
Return values
TRUE - if copy is required
FALSE - if copy is not required

◆ requiresCopy() [2/2]

template<class T , class U >
template<class V >
bool htgs::ICudaTask< T, U >::requiresCopy ( std::shared_ptr< MemoryData< V >>  data)
inline

Checks if the requested MemoryData requires GPU-to-GPU copy.

Parameters
data - the memory data to check
Returns
whether the requested MemoryData would require GPU-to-GPU copy
Return values
TRUE - if copy is required
FALSE - if copy is not required
Template Parameters
V - a type of MemoryData that is allocated using a CudaMemoryManager (created using taskGraph->addCudaMemoryEdge)
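
Usage sketch, following the class example above (getCudaMemoryData is assumed from that example):

// True when the MemoryData resides on a GPU that is neither this task's GPU
// nor peer-to-peer accessible from it
if (this->requiresCopy(data->getCudaMemoryData())) {
  // a GPU-to-GPU copy is needed, e.g. via autoCopy() with a destination
  // buffer allocated on this task's GPU
}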

◆ shutdown()

template<class T , class U >
void htgs::ICudaTask< T, U >::shutdown ( )
inlinefinaloverridevirtual

Shuts down the ICudaTask.

Note
This function should only be called by the HTGS API

Reimplemented from htgs::ITask< T, U >.

◆ syncStream()

template<class T , class U >
void htgs::ICudaTask< T, U >::syncStream ( )
inline

Synchronizes the Cuda stream associated with this task.

Note
Should only be called after initialization
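
A minimal sketch (result is a hypothetical output object):

// Ensure all work queued on this task's stream has finished before
// sending the result downstream
this->syncStream();
this->addResult(result);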

The documentation for this class was generated from the following file:
htgs/api/ICudaTask.hpp