Video processing advice

Good afternoon, i’m in a project that runs in sequential mode (CPU only) and in grid (still, GPU only :P)

My task is to use CUDA to parallelize one part of it, which contains several subparts:

  • decoding MJPEG, which is currently done with the FFmpeg Linux libraries (libavcodec, libavformat, etc.). This part is done packet by packet and consists of retrieving each frame from the packets of the main video stream
  • after that, converting the acquired frame from YUV (I believe) to RGB
  • building the frame histogram, normalizing it, and then quantizing it
  • after that, segmenting the resulting frame into 4 categories (1, 2, 3 or 4), depending on color properties

This is repeated for each and every frame of the video.
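To make the per-frame steps concrete, here is a minimal CPU reference sketch of the color conversion and histogram steps in C++. It is only an illustration under stated assumptions, not the project's actual code: it assumes full-range JFIF YCbCr (the usual case for JPEG data), a 256-bin histogram over one 8-bit channel, and it reads "quantize" as merging bins into fewer equal-width levels. All function names are hypothetical. Each pixel conversion is independent of the others, which is what makes that step a natural candidate for one GPU thread per pixel.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Clamp to 0..255, then round to the nearest integer.
static std::uint8_t clamp_u8(float v) {
    return static_cast<std::uint8_t>(
        std::lround(std::min(255.0f, std::max(0.0f, v))));
}

// Full-range JFIF YCbCr -> RGB for one pixel (assumption: the decoded
// MJPEG frames use this color space; verify the pixel format first).
Rgb ycbcr_to_rgb(std::uint8_t y, std::uint8_t cb, std::uint8_t cr) {
    const float yf = y, cbf = cb - 128.0f, crf = cr - 128.0f;
    return { clamp_u8(yf + 1.402f * crf),
             clamp_u8(yf - 0.344136f * cbf - 0.714136f * crf),
             clamp_u8(yf + 1.772f * cbf) };
}

// 256-bin histogram of one 8-bit channel.
std::array<std::uint32_t, 256> histogram(const std::vector<std::uint8_t>& ch) {
    std::array<std::uint32_t, 256> h{};
    for (std::uint8_t v : ch) ++h[v];
    return h;
}

// Normalize counts so that the bins sum to 1 (n = number of pixels).
std::array<float, 256> normalize(const std::array<std::uint32_t, 256>& h,
                                 std::size_t n) {
    assert(n > 0);
    std::array<float, 256> out{};
    for (std::size_t i = 0; i < 256; ++i)
        out[i] = static_cast<float>(h[i]) / static_cast<float>(n);
    return out;
}

// One possible reading of "quantize": merge the 256 bins into `levels`
// equal-width bins. `levels` must divide 256.
std::vector<float> quantize(const std::array<float, 256>& h,
                            std::size_t levels) {
    assert(levels > 0 && 256 % levels == 0);
    std::vector<float> q(levels, 0.0f);
    const std::size_t width = 256 / levels;
    for (std::size_t i = 0; i < 256; ++i) q[i / width] += h[i];
    return q;
}
```

The segmentation step is left out on purpose, because the rules that map color properties to the 4 categories are not given here.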

I’m posting this to ask for some advice: how can I decode MJPEG in CUDA (each thread decodes a frame, and then post-processes it)? I found out that nvcuvid only decodes MPEG-1/2 and H.264, so it seems there isn’t any easy way to do this?

I would appreciate any lead, because to make this worth the work, I need to do it all on the GPU.

Hello,

In general, each of the tasks that you mentioned is certainly challenging on its own. For example, there are papers and source code about histogram computation, and doing it efficiently is far from trivial. As far as I can see, the histogram is still one of the “easier” tasks of the pipeline that you described.
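To illustrate why the histogram is not trivial in parallel: many threads want to increment the same bins at the same time. One common way around that is to let each group of threads fill its own private partial histogram and merge the partial results afterwards. The following is just a sequential C++ sketch of that structure, with a hypothetical function name and chunk count. On the GPU, each chunk would roughly correspond to one thread block with its partial histogram in shared memory, and the increments inside a block would still need atomic operations.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Hist = std::array<std::uint32_t, 256>;

// Two-phase histogram: private partial histograms, then a merge.
Hist histogram_chunked(const std::vector<std::uint8_t>& pixels,
                       std::size_t chunks) {
    assert(chunks > 0);
    std::vector<Hist> partial(chunks, Hist{});
    const std::size_t n = pixels.size();
    const std::size_t per_chunk = (n + chunks - 1) / chunks;  // ceiling

    // Phase 1: every chunk writes only to its own histogram, so the
    // chunks are independent of each other and could run in parallel.
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = std::min(n, c * per_chunk);
        const std::size_t end = std::min(n, begin + per_chunk);
        for (std::size_t i = begin; i < end; ++i) ++partial[c][pixels[i]];
    }

    // Phase 2: merge. Each output bin depends only on the same bin of
    // every partial histogram, so this step is parallel over bins.
    Hist total{};
    for (const Hist& p : partial)
        for (std::size_t b = 0; b < 256; ++b) total[b] += p[b];
    return total;
}
```

The result is the same for any chunk count, which makes it easy to check a parallel version against a plain sequential loop.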

I’m not so familiar with video processing in general, and specifically not with what the MJPEG stream looks like and whether there are parts which are really data parallel (and thus efficiently portable to CUDA). When you say
“each thread decodes a frame, and then post-processes it”
I assume that it would be easier and more appropriate to decode a single frame with many threads in a data-parallel way, but I cannot say whether or even how this is applicable to MJPEG decompression.

The first starting point would probably be the Video encode/decode samples, to get a basic idea of a possible structure and approach for such an application. Beyond that I cannot give any more specific hints, since this would require very deep knowledge of the encoding/decoding algorithms…

bye
Marco