I am new to CUDA programming and would like help converting my current multithreaded CPU program so that it can run on a GPU.
My task is to parallelize the co-occurrence matrix creation that I currently do in a traditional Java program. More specifically, I have a huge file where each line represents a document. Each line contains about 100 words (for ease, the words are converted into word indices). The number of lines/documents is about 300K. For simplicity, let's say the co-occurrence window is one document, i.e., the count is how many times word X appeared together with word Y in the same document/line. In other words, each line/document needs to go through a nested loop over its words.
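To make the nested loop concrete, here is a minimal in-memory sketch of the per-document counting. It rests on assumptions that are not fixed above: pairs are treated as unordered, word indices are non-negative ints, and each pair is packed into a single `long` key. The class and method names (`CooccurrenceSketch`, `pairKey`, `countPairs`) are hypothetical, and a `HashMap` like this only works while the counts fit in memory, which is not the case for the full data set.

```java
import java.util.HashMap;
import java.util.Map;

class CooccurrenceSketch {
    // Pack an unordered pair of non-negative word indices into one long:
    // smaller index in the high 32 bits, larger index in the low 32 bits.
    static long pairKey(int a, int b) {
        int lo = Math.min(a, b);
        int hi = Math.max(a, b);
        return ((long) lo << 32) | (hi & 0xFFFFFFFFL);
    }

    // Count, over all documents, how often each pair of positions (i < j)
    // in the same document holds a given pair of words.
    static Map<Long, Integer> countPairs(int[][] docs) {
        Map<Long, Integer> counts = new HashMap<>();
        for (int[] doc : docs) {
            for (int i = 0; i < doc.length; i++) {
                for (int j = i + 1; j < doc.length; j++) {
                    counts.merge(pairKey(doc[i], doc[j]), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

With about 100 words per line, this inner double loop emits roughly 100 * 99 / 2 = 4,950 pairs per document, which is why the intermediate output gets so large.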
Because of the size of the resulting co-occurrence matrix, my current program follows a model similar to the external sort-merge algorithm: during the merge phase, I keep summing up the number of times each pair of words has been repeated, which finally yields the complete co-occurrence matrix.
I was wondering at which of the following steps parallelizing on a GPU would help, and whether there are any JCuda code samples that the community can point me to so I can get started:
Step (a): Creating all pairs (Cartesian product) of the words in each document (output is multiple files, each line containing a pair of words)
Step (b): Sorting each file from step (a)
Step (c): Merging the output of step (b) while summing up the repetitions
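For reference, the three steps can be sketched in plain Java as below. This is a simplified, in-memory stand-in for my real program rather than the program itself: it uses arrays instead of files, assumes unordered pairs of non-negative word indices packed into a `long`, and all names (`SortMergeSketch`, `Run`, `emitPairs`, `sortedRun`, `merge`) are hypothetical.

```java
import java.util.Arrays;

class SortMergeSketch {
    // A sorted run of distinct pair keys with their counts (parallel arrays).
    static final class Run {
        final long[] keys;
        final int[] counts;
        Run(long[] keys, int[] counts) { this.keys = keys; this.counts = counts; }
    }

    // Step (a): emit every pair (i < j) of one document as a packed long key.
    static long[] emitPairs(int[] doc) {
        int n = doc.length;
        long[] keys = new long[n * (n - 1) / 2];
        int k = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                int lo = Math.min(doc[i], doc[j]);
                int hi = Math.max(doc[i], doc[j]);
                keys[k++] = ((long) lo << 32) | (hi & 0xFFFFFFFFL);
            }
        }
        return keys;
    }

    // Step (b): sort one chunk of keys, then collapse equal neighbours into counts.
    static Run sortedRun(long[] keys) {
        long[] sorted = keys.clone();
        Arrays.parallelSort(sorted);
        long[] outKeys = new long[sorted.length];
        int[] outCounts = new int[sorted.length];
        int n = 0;
        for (long key : sorted) {
            if (n > 0 && outKeys[n - 1] == key) {
                outCounts[n - 1]++;
            } else {
                outKeys[n] = key;
                outCounts[n] = 1;
                n++;
            }
        }
        return new Run(Arrays.copyOf(outKeys, n), Arrays.copyOf(outCounts, n));
    }

    // Step (c): merge two sorted runs, summing the counts of equal keys.
    static Run merge(Run a, Run b) {
        long[] keys = new long[a.keys.length + b.keys.length];
        int[] counts = new int[keys.length];
        int i = 0, j = 0, n = 0;
        while (i < a.keys.length || j < b.keys.length) {
            if (j == b.keys.length || (i < a.keys.length && a.keys[i] < b.keys[j])) {
                keys[n] = a.keys[i]; counts[n] = a.counts[i]; i++;
            } else if (i == a.keys.length || b.keys[j] < a.keys[i]) {
                keys[n] = b.keys[j]; counts[n] = b.counts[j]; j++;
            } else { // same pair in both runs: add the counts
                keys[n] = a.keys[i]; counts[n] = a.counts[i] + b.counts[j]; i++; j++;
            }
            n++;
        }
        return new Run(Arrays.copyOf(keys, n), Arrays.copyOf(counts, n));
    }
}
```

A call such as `merge(sortedRun(emitPairs(docA)), sortedRun(emitPairs(docB)))` combines two documents. In the real program each run would be a file on disk holding the pairs of many documents.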