RGB to YUV using JCuda

I’m calculating skintone of an image in java.

  1. Convert the pixel of Image in YUV.
  2. Check if image pixel is in specific range, then its a skin color.
  3. Calculate percentage by dividing it by total pixel.

Its working fine in CPU code, but when i convert it to GPU code, The pixel percentage is not coming right.

The confusing part for me was send the pixel data to GPU and get its r, g, b value in GPU.

So i follow JCuda Pixel Invert Example example to send pixel data. The difference is the example send pixel data in int[] array and I’m sending it in byte[] array.

My Results:

Skintone Using CPU: 75417
Total Pixel Of CPU: 262144
SKinTone Percentage Using CPU: 29%

Skintone Using GPU: 29612
Total Pixel Of GPU: 262144
SKinTone Percentage Using GPU: 11%

skintone.zip (951.4 KB)

The attached file is the complete example, you need must have Nvidia Card, Cuda tookkit version 9 and graphics drivers in your system to run the example.

Thank you
–Aqeel Haider

First of all: Great that you provided the project as a package that can simply be opened and run. It’s always difficult when people just dump a random code snippet and as “Why this not worx?”. Even if one can allocate the time to set it up so that it can be run and tested, there’s always the chance that people will afterwards tell details that they didn’t mention before, and the effort was useless. So I really appreciate a test case where the problem can easily be reproduced!


Now, investigating the problem took a bit longer than I expected, though: I started it a few times, and the output seemed to be random. That’s why I took a wrong path of debugging: I suspected a race condition, and looked at the atomicAdd and the output.

But in the end, the problem is in the input:

When the image is loaded, the resulting BufferedImage will have the type TYPE_3BYTE_BGR. This means that each pixel consists of three bytes instead of four (and not even in the expected order). See, for example,

System.out.println("pixels "+pixels.length);
System.out.println("3?     "+(raster.getWidth()*raster.getHeight()*3));
System.out.println("4?     "+(raster.getWidth()*raster.getHeight()*4));

which prints

pixels 786432
3?     786432
4?     1048576

The whole downstream pipeline assumed that the pixels consisted of four bytes: The memory allocations, the copies, and eventually the uchar4 in the kernel. (So the “random” results had been caused by some random, uninitialized memory…)

The problem does not appear in the CPU version, because

raster.getPixel(x, y, rgb);

will do the magic conversion to RGB internally.


Now, how to solve this?

That basically depends on which image type you’d like to operate on. When loading an image with ImageIO, you never know the exact image type. (I think that ImageJ will do a sensible conversion, and expose the pixels as an RGB int[] array)

One solution that worked in the given example was:

Convert the input image to a 4 byte format, for example, with this method:

public static BufferedImage convertToTarget(BufferedImage image)
{
    BufferedImage newImage = new BufferedImage(
        image.getWidth(), image.getHeight(),
        BufferedImage.TYPE_4BYTE_ABGR);
    Graphics2D g = newImage.createGraphics();
    g.drawImage(image, 0, 0, null);
    g.dispose();
    return newImage;
}	

And in the kernel, as you converted to TYPE_4BYTE_ABGR, you could do

// pixel.x is the alpha channel - ignored here
b = pixel.y;
g = pixel.z;
r = pixel.w;

A side note:

Usually, I use this method

public static BufferedImage convertToARGB(BufferedImage image)
{
    BufferedImage newImage = new BufferedImage(
        image.getWidth(), image.getHeight(),
        BufferedImage.TYPE_INT_ARGB);
    Graphics2D g = newImage.createGraphics();
    g.drawImage(image, 0, 0, null);
    g.dispose();
    return newImage;
}

to convert all images that I load with ImageIO into INT_ARGB images. This type is usually the fastest for drawing- and pixel manipulation operations. From this image type, you can fetch a DataBufferInt and its int[] array (you’d have to change the execute method accordingly).

Based on the int pixels, you could also do

int r = (pixels[i] >> 16) & 0xFF;
int g = (pixels[i] >>  8) & 0xFF;
int b = (pixels[i]      ) & 0xFF;

to extract the RGB components on Java side, which should be significantly faster than raster.getPixel.


EDIT: With the changes described above, the output for me was

Skintone Using CPU: 75417
Total Pixel Of CPU: 262144
SKinTone Percentage Using CPU: 29%
Skintone Using GPU: 75418
Total Pixel Of GPU: 262144
SKinTone Percentage Using GPU: 29%

Where this off-by-one error comes from? Well, maybe I’ll try to figure that out later…

In any case: You should also clear the output memory that you allocated (otherwise, the output[0] might already contain a non-zero value, and you’re only adding values to this in the kernel). For example:

// Allocate device output memory
CUdeviceptr deviceOutput = new CUdeviceptr();
cuMemAlloc(deviceOutput, numElements * Sizeof.INT);

// Clear the output memory with zeros:
int hostOutput[] = new int[numElements];
cuMemcpyHtoD(deviceOutput, Pointer.to(hostOutput), numElements * Sizeof.INT);

...

// Allocate host output memory and copy the device output
// to the host.
cuMemcpyDtoH(Pointer.to(hostOutput), deviceOutput, numElements * Sizeof.INT);

Thank you Marco for sparing the time and solve the problem.

I use another method but didn’t understand the reason. Its is now clear from your Explination

int[] pixels = new int[width * height];
PixelGrabber pg = new PixelGrabber(img, 0, 0, width, height, pixels, 0, width);
try {
        pg.grabPixels();
	} catch (InterruptedException e){};

Thank you again.

Regards
–Aqeel Haider