I have an include file which contains my #defined constants, such as block sizes.
This file is included by the kernel functions that need the block or tile sizes.
The problem is that when I update only this file, the change is not picked up.
The kernel is rebuilt only when the main kernel .cu file itself has been updated.
Is this the usual behavior, or is it JCuda-specific? Here is the include file:
/*
* Shared Memory Tile Size, must be a multiple of 16
*/
#define TILE_SIZE 64
/*
* Number of Threads in a Block
*
* Maximum number of resident blocks per multiprocessor: 8
*
* ///////////////////
* Compute capability:
* ///////////////////
*
* Cuda [1.0 - 1.1] ~
* Maximum number of resident threads per multiprocessor 768
* Optimal Usage: 768 / 8 = 96
* Cuda [1.2 - 1.3] ~
* Maximum number of resident threads per multiprocessor 1024
* Optimal Usage: 1024 / 8 = 128
* Cuda [2.x] ~
* Maximum number of resident threads per multiprocessor 1536
* Optimal Usage: 1536 / 8 = 192
*/
#define BLOCK_SIZE_DEF 96
/*
* Number of Threads in a Block, mainly for reduction kernels;
* must be a power of two
*
* ///////////////////
* Compute capability:
* ///////////////////
*
* Cuda [1.0 - 1.1] ~
* BLOCK_SIZE_DEF = 96 --> 128 --> 6 of 8 possible resident blocks per multiprocessor
* Cuda [1.2 - 1.3] ~
* BLOCK_SIZE_DEF = 128 --> 128
* Cuda [2.x] ~
* BLOCK_SIZE_DEF = 192 --> 256
*/
#define BLOCK_SIZE_POW2 128
/*
* Twice the block size for reduction kernels, since each thread
* loads two elements at the first (global) reduction level
*/
#define BLOCK_SIZE_POW2_DOUBLE (BLOCK_SIZE_POW2 << 1)
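The comments above state several constraints: TILE_SIZE must be a multiple of 16, BLOCK_SIZE_POW2 must be a power of two, and (reading the table of examples) BLOCK_SIZE_POW2 is BLOCK_SIZE_DEF rounded up to the next power of two. They also contain the "6 of 8 resident blocks" arithmetic. The following is a minimal sketch, not part of the original file, that lets the preprocessor enforce those constraints and reproduces the resident-block numbers. The `resident_blocks` helper is hypothetical and only considers the per-multiprocessor thread limit and the 8-block limit quoted in the comments; register and shared-memory limits are ignored.

```c
/* Values copied from the include file above. */
#define TILE_SIZE 64
#define BLOCK_SIZE_DEF 96
#define BLOCK_SIZE_POW2 128
#define BLOCK_SIZE_POW2_DOUBLE (BLOCK_SIZE_POW2 << 1)

/* Hypothetical compile-time checks for the documented constraints. */
#if (TILE_SIZE % 16) != 0
#error "TILE_SIZE must be a multiple of 16"
#endif

#if BLOCK_SIZE_POW2 <= 0 || (BLOCK_SIZE_POW2 & (BLOCK_SIZE_POW2 - 1)) != 0
#error "BLOCK_SIZE_POW2 must be a power of two"
#endif

/* Assumed reading of the comment table: smallest power of two >= BLOCK_SIZE_DEF. */
#if BLOCK_SIZE_POW2 < BLOCK_SIZE_DEF || (BLOCK_SIZE_POW2 >> 1) >= BLOCK_SIZE_DEF
#error "BLOCK_SIZE_POW2 should be BLOCK_SIZE_DEF rounded up to a power of two"
#endif

#define MAX_RESIDENT_BLOCKS_PER_SM 8

/* Hypothetical helper: resident blocks per multiprocessor, limited only by
 * the thread count and the 8-block cap (registers/shared memory ignored). */
static int resident_blocks(int max_threads_per_sm, int block_size)
{
    int by_threads = max_threads_per_sm / block_size;
    return by_threads < MAX_RESIDENT_BLOCKS_PER_SM ? by_threads
                                                   : MAX_RESIDENT_BLOCKS_PER_SM;
}
```

For example, `resident_blocks(768, 128)` gives the "6 of 8" figure from the compute capability 1.0 - 1.1 comment, while `resident_blocks(768, 96)` reaches the full 8 blocks.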
Does this UtilCuda class use the KernelLauncher internally? Note that the KernelLauncher only checks whether the output file already exists and is newer than the input file. If you only modify an included file, the KernelLauncher will not notice this. You may pass the ‘forceRebuild’ flag to the ‘compile’ call of the KernelLauncher to force the kernel to be rebuilt each time.
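To illustrate why a change to the header goes unnoticed, here is a minimal C sketch of a timestamp-based "is the output up to date?" check. It is not the KernelLauncher's actual (Java) implementation. The function names `is_stale` and `needs_rebuild` are hypothetical, and the check relies on POSIX `stat()`. The output is considered stale only if one of the inputs it is *given* is newer. So a check that is passed just the .cu file cannot see a newer header.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical: the output is stale if any listed input is newer than it. */
static int is_stale(time_t output_mtime, const time_t *input_mtimes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (input_mtimes[i] > output_mtime) {
            return 1;
        }
    }
    return 0;
}

/* Hypothetical: returns 1 if `output` is missing, if an input cannot be
 * stat'ed, or if any input is newer than `output`; otherwise 0. */
static int needs_rebuild(const char *output, const char *const *inputs, size_t n)
{
    struct stat st;
    if (stat(output, &st) != 0) {
        return 1; /* no output file yet: must build */
    }
    time_t out_mtime = st.st_mtime;
    for (size_t i = 0; i < n; i++) {
        if (stat(inputs[i], &st) != 0) {
            return 1; /* missing input: rebuild to surface the error */
        }
        time_t in_mtime = st.st_mtime;
        if (is_stale(out_mtime, &in_mtime, 1)) {
            return 1;
        }
    }
    return 0;
}
```

With timestamps kernel.cu = 100, output = 200, and header = 300, `is_stale(200, {100}, 1)` reports "up to date", while `is_stale(200, {100, 300}, 2)` correctly reports "stale". This is the same reason a plain makefile rule has to list included headers as prerequisites. Any timestamp check only sees the files it is told about, which is why forcing a rebuild is the simple workaround here.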