This is my first post to this forum. For the last six months I have been researching Hadoop technologies, and I am now also a member of a project entitled “JCuda on Hadoop”. I am looking for useful links on running the JCuda sample programs on a Hadoop cluster of only 2 nodes.
If anyone has some knowledge about this, please share.
There once was this question about JCuda on Hadoop, but it came without any specific information about how to get it up and running. The only hint about that which I found on the web (and which you most likely already found as well) was this thread in a mail archive.
I’d probably have to get more familiar with Hadoop (although it seems to be targeted for Linux, and I’m usually using Windows). Are there any specific requirements for the setup? Have you already set up a “Single Node” or Cluster, regardless of JCuda?
I think as long as this refers to the basic setup of a JCuda program, we can continue in the other thread, and resume this one when specific aspects of Hadoop are discussed.
Sorry, I’m not aware of an existing example for JCuda on Hadoop, and I don’t know the project structure and configuration required for MapReduce using Hadoop in general.
Can you give an example of a class or method that should be executed by JCuda? Maybe I can sketch some of the “boilerplate code” that may be necessary, regardless of how the methods will be called exactly or what the kernel will look like…
Hadoop uses a MapReduce framework for processing data in HDFS. I will explain in simple steps:
Data is stored on multiple nodes in HDFS. I attached a simple word count program written in Java that uses the concept of map and reduce functions and runs in parallel on multiple nodes.
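To illustrate the two phases without Hadoop itself, here is a minimal plain-Java sketch of the word count idea. It is not the attached program and uses no Hadoop classes; the class name `WordCountSketch` and the two sample strings are made up for illustration, and each string merely stands in for an input split that would live on a different node.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java sketch of the word count map and reduce phases.
// Hypothetical illustration only: no Hadoop classes are involved.
class WordCountSketch {

    // "Map" phase: emit a (word, 1) pair for every token in one line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // "Reduce" phase: sum the counts of all pairs that share the same word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Each string stands in for an input split stored on a different node.
        String[] splits = { "hello hadoop", "hello jcuda on hadoop" };
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String split : splits) {
            allPairs.addAll(map(split)); // in a cluster, these calls run in parallel
        }
        System.out.println(reduce(allPairs));
    }
}
```

In a real cluster the framework, not a loop in `main`, distributes the map calls and groups the pairs by key before the reduce step.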
Marco, how can I find out whether CUDA is installed properly?
Most importantly, I want to know whether the driver is installed properly.
The simple program below runs properly under the hadoop user:
import jcuda.CUDA;
import jcuda.driver.CUdevprop;
import jcuda.driver.types.CUdevice;


public class MahEnum
{
    public static void main(String args[])
    {
        //Init CUDA Driver
        CUDA cuda = new CUDA(true);

        int count = cuda.getDeviceCount();

        System.out.println("Total number of devices: " + count);

        for (int i = 0; i < count; i++) {
            CUdevice dev = cuda.getDevice(i);

            String name = cuda.getDeviceName(dev);
            System.out.println("Name: " + name);

            int version[] = cuda.getDeviceComputeCapability(dev);
            System.out.println("Version: " + String.format("%d.%d", version[0], version[1]));

            CUdevprop prop = cuda.getDeviceProperties(dev);
            System.out.println("Clock rate: " + prop.clockRate + " MHz");
            System.out.println("Threads per block: " + prop.maxThreadsPerBlock);
        }
    }
}
This program produces the correct output, but when I run another program, it fails with:
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)
… 3 more
Caused by: java.lang.UnsatisfiedLinkError: no jcuda in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1709)
at java.lang.Runtime.loadLibrary0(Runtime.java:823)
at java.lang.System.loadLibrary(System.java:1028)
at jcuda.driver.CUDADriver.(CUDADriver.java:909)
at jcuda.CUDA.init(CUDA.java:62)
at jcuda.CUDA.<init>(CUDA.java:42)
at org.myorg.WordCount$TokenizerMapper.<init>(WordCount.java:28)
… 8 more
EDIT: Concerning the omnipresent “UnsatisfiedLinkError”: Might this thread help here?
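As a quick diagnostic for this error, the following sketch prints `java.library.path` and checks whether a file with the platform-specific name of the library exists in one of its directories. The class name `LibraryPathCheck` and the helper `findLibrary` are made up for illustration; the library name "jcuda" is taken from the error message above. This only checks for the file, it does not load it, so it is a rough hint rather than a definitive test.

```java
import java.io.File;
import java.util.regex.Pattern;

// Diagnostic sketch (hypothetical helper, not part of JCuda or Hadoop):
// looks for a native library in the directories of a library path.
class LibraryPathCheck {

    // Returns the first file in the given path list that matches the
    // platform-specific file name of the library, or null if there is none.
    static File findLibrary(String libraryName, String pathList) {
        // E.g. "jcuda" becomes "libjcuda.so" on Linux or "jcuda.dll" on Windows
        String fileName = System.mapLibraryName(libraryName);
        for (String dir : pathList.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The library name defaults to the one from the error message
        String name = args.length > 0 ? args[0] : "jcuda";
        String path = System.getProperty("java.library.path", "");
        System.out.println("java.library.path = " + path);

        File found = findLibrary(name, path);
        if (found != null) {
            System.out.println("Found " + found);
        } else {
            System.out.println("No " + System.mapLibraryName(name) + " in java.library.path");
        }
    }
}
```

Note that the stack trace shows the mapper being created from `org.apache.hadoop.mapred.Child.main`, so the task seems to run in its own JVM. Its `java.library.path` may differ from the one of the shell where the standalone program worked, so the check is most meaningful when it is run from inside the task, or when the path is set explicitly for that JVM with `-Djava.library.path=...`.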
Concerning the example: This is basically the example from the website… Without a deeper knowledge about Hadoop, and without a precise idea about where and how to apply the data-parallel computation that is offered with CUDA, it is hard to tell what could be the best approach for this. Saying “we’ll use CUDA, then it will be faster” is plainly not true if the underlying algorithms and structures cannot be expressed in a data-parallel way - and this limitation may also be imposed by the framework.
Moreover, things which are trivial with Java, like using a StringTokenizer or putting something into a HashMap, can be more than challenging in CUDA…
However, I assume that the “Context” which is given to Mapper#map can be an arbitrary implementation of a Mapper.Context? I don’t have a clue where the instances of these Contexts come from, what all the constructor parameters of this class are intended for, and which role they play during the mapping, but it might be possible to do “something” in parallel there. Reduction in general is a very common task on a parallel architecture, and CUDPP even offers dedicated methods for it. But to what extent it may be applied here depends on the data which is to be processed…
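To make the idea of a parallel reduction a bit more concrete, here is a minimal CPU-side sketch of the pairwise (“tree”) summation pattern that such reductions are usually built on. It is plain Java, not JCuda or CUDPP code, and the class name `TreeReductionSketch` is made up for illustration. The point is only that all additions within one pass are independent of each other, which is what would allow them to be distributed over many threads.

```java
// CPU-side sketch of a pairwise ("tree") reduction.
// Hypothetical illustration only: this is not the CUDPP or JCuda API,
// and it is meant for arrays of modest size.
class TreeReductionSketch {

    static int reduceSum(int[] input) {
        if (input.length == 0) {
            return 0;
        }
        int[] data = input.clone(); // leave the caller's array untouched

        // In each pass, element i absorbs element i + stride. The additions
        // of one pass do not depend on each other, so on a GPU each of them
        // could be handled by a separate thread.
        for (int stride = 1; stride < data.length; stride *= 2) {
            for (int i = 0; i + stride < data.length; i += 2 * stride) {
                data[i] += data[i + stride];
            }
        }
        // After about log2(n) passes, the total has accumulated in data[0]
        return data[0];
    }

    public static void main(String[] args) {
        int[] values = { 3, 1, 4, 1, 5, 9, 2, 6 };
        System.out.println("Sum: " + reduceSum(values));
    }
}
```

For word counting, such a scheme would only apply once the data has been brought into a numeric form, for example one array of counts per word, which is exactly the part that depends on the data to be processed.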