Counting entries in Excel table

Marco13 · 24 March 2012 at 05:38

OK… I may be a little bit closer to understanding what has to be done there. I’ll try to continue in the next few days. New questions may come up, but I’ll see what I can do.

system · 26 March 2012 at 08:07

Ok, thanks a lot Marco.

Marco13 · 29 March 2012 at 10:00

Hello

I have tried to call a Java implementation of the Fisher test that was extracted from the code you originally posted. Can you confirm that the computation itself is correct? (Are the other entries of the 2x2 matrix always set to 0?)

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;


public class JCudaFisherTestMain
{
    private static Map<String, Integer> valueToIndex;
    private static Map<Integer, String> indexToValue;
    
    private static void initMapping()
    {
        valueToIndex = new LinkedHashMap<String, Integer>();
        valueToIndex.put("A/A",  0);
        valueToIndex.put("A/C",  1);
        valueToIndex.put("A/G",  2);
        valueToIndex.put("A/T",  3);
        valueToIndex.put("C/A",  4);
        valueToIndex.put("C/C",  5);
        valueToIndex.put("C/G",  6);
        valueToIndex.put("C/T",  7);
        valueToIndex.put("G/A",  8);
        valueToIndex.put("G/C",  9);
        valueToIndex.put("G/G", 10);
        valueToIndex.put("G/T", 11);
        valueToIndex.put("T/A", 12);
        valueToIndex.put("T/C", 13);
        valueToIndex.put("T/G", 14);
        valueToIndex.put("T/T", 15);
        valueToIndex.put("NoCall", 16);
        valueToIndex.put("-/-", 17);
        valueToIndex.put("-/A", 18);
        valueToIndex.put("-/C", 19);
        valueToIndex.put("-/G", 20);
        valueToIndex.put("-/T", 21);
        valueToIndex.put("A/-", 22);
        valueToIndex.put("C/-", 23);
        valueToIndex.put("G/-", 24);
        valueToIndex.put("T/-", 25);
        indexToValue = reverse(valueToIndex);
    }
    

    
    public static void main(String[] args) throws Exception
    {
        initMapping();
        
        SimpleTableData table = SimpleTableData.read("FirstTable.csv");
        List<Integer> sickColumns = Arrays.asList(
            2, 3, 13, 14, 15, 16, 17, 18,
            19, 20, 21, 22, 25, 27
        );
        Set<Integer> set = new LinkedHashSet<Integer>();
        for (int i=1; i<table.getNumCols(); i++)
        {
            set.add(i);
        }
        set.removeAll(sickColumns);
        List<Integer> healthyColumns = new ArrayList<Integer>(set);

        SimpleTableData sickTable = SimpleTableData.create(table, sickColumns);        
        SimpleTableData healthyTable = SimpleTableData.create(table, healthyColumns);        

        int printedRows = 5;
        
        System.out.println("Start of sick table");
        System.out.println(sickTable.toString(printedRows));

        System.out.println("Start of healthy table");
        System.out.println(healthyTable.toString(printedRows));
        
        int sickCount[] = createCountMatrix(sickTable);
        int healthyCount[] = createCountMatrix(healthyTable);
        
        int numRows = table.getNumRows();
        int numCols = valueToIndex.size();
        
        System.out.println("Start of sick count table");
        System.out.println(toString2D(sickCount, numCols, printedRows));

        System.out.println("Start of healthy count table");
        System.out.println(toString2D(healthyCount, numCols, printedRows));
     
        FisherExactTestCore f = new FisherExactTestCore(1000000);
        for (int r=0; r<numRows; r++)
        {
            for (int c=0; c<numCols; c++)
            {
                int index = c+r*numCols;
                int h = healthyCount[index];
                int s = sickCount[index];
                if (h != 0 && s != 0)
                {
                    float result = f.getTwoTailedP(h, 0, 0, s);

                    System.out.println(
                        "For "+table.get(r, 0)+
                        " col "+indexToValue.get(c)+
                        " found healthy: "+h+" sick: "+s+
                        " result: "+result);
                }
            }
        }
    }
    
    
    private static <K, V> Map<V, K> reverse(Map<K, V> input)
    {
        Map<V, K> result = new LinkedHashMap<V, K>();
        for (Entry<K, V> entry : input.entrySet())
        {
            result.put(entry.getValue(), entry.getKey());
        }
        return result;
    }


    private static int[] createCountMatrix(TableData tableData)
    {
        int result[] = new int[valueToIndex.size() * tableData.getNumRows()];
        for (int r=0; r<tableData.getNumRows(); r++)
        {
            for (int c=0; c<tableData.getNumCols(); c++)
            {
                String value = tableData.get(r, c);
                Integer index = valueToIndex.get(value);
                if (index != null)
                {
                    int arrayIndex = r * valueToIndex.size() + index;
                    result[arrayIndex]++;
                }
            }
        }
        return result;
    }
    
    public static String toString2D(int a[], int columns, int maxRows)
    {
        StringBuilder sb = new StringBuilder();
        int row = 0;
        for (int i=0; i<a.length; i++)
        {
            if (i>0 && i % columns == 0)
            {
                sb.append("\n");
                row++;
                if (maxRows > 0 && row >= maxRows)
                {
                    sb.append("...\n");
                    break;
                }
            }
            sb.append(String.format("%4s ", String.valueOf(a[i])));
        }
        return sb.toString();
    }
    
}


class FisherExactTestCore
{
    private float[] f;
    private int maxSize;

    /**
     * Constructor for FisherExactTestCore
     * 
     * @param maxSize is the maximum sum (a+b+c+d) that will be encountered
     * in any table
     */
    public FisherExactTestCore(int maxSize)
    {
        this.maxSize = maxSize;
        f = new float[maxSize + 1];
        f[0] = 0.0f;
        for (int i = 1; i <= this.maxSize; i++)
        {
            f[i] = f[i - 1] + (float)Math.log(i);
        }
    }

    /**
     * calculates the P-value for this specific state
     * 
     * @param a
     *            a, b, c, d are the four cells in a 2x2 matrix
     * @param b
     * @param c
     * @param d
     * @return the P-value
     */
    private final float getP(int a, int b, int c, int d)
    {
        int n = a + b + c + d;
        if (n > maxSize)
        {
            return Float.NaN;
        }
        float p;
        p = (f[a + b] + f[c + d] + f[a + c] + f[b + d]) - (f[a] + f[b] + f[c] + f[d] + f[n]);
        return (float)Math.exp(p);
    }

    /**
     * Calculates the two-tailed P-value for the Fisher Exact test.
     * 
     * In order for a table under consideration to have its p-value included
     * in the final result, it must have a p-value less than the original
     * table's P-value, i.e.
     * Fisher's exact test computes the probability, given the observed marginal
     * frequencies, of obtaining exactly the frequencies observed and any
     * configuration more extreme.
     * By "more extreme," we mean any configuration (given observed marginals)
     * with a smaller probability of
     * occurrence in the same direction (one-tailed) or in both directions
     * (two-tailed).
     * 
     * @param a
     *            a, b, c, d are the four cells in a 2x2 matrix
     * @param b
     * @param c
     * @param d
     * @return two-tailed P-value
     */
    public final float getTwoTailedP(int a, int b, int c, int d)
    {
        int min, i;
        int n = a + b + c + d;
        if (n > maxSize)
        {
            return Float.NaN;
        }
        float p = 0;

        float baseP = getP(a, b, c, d);
        int initialA = a, initialB = b, initialC = c, initialD = d;
        p += baseP;
        min = (c < b) ? c : b;
        for (i = 0; i < min; i++)
        {
            float tempP = getP(++a, --b, --c, ++d);
            if (tempP <= baseP)
            {
                p += tempP;
            }
        }

        // reset the values to their original so we can repeat this process for
        // the other side
        a = initialA;
        b = initialB;
        c = initialC;
        d = initialD;

        min = (a < d) ? a : d;
        for (i = 0; i < min; i++)
        {
            float tempP = getP(--a, ++b, ++c, --d);
            if (tempP <= baseP)
            {
                p += tempP;
            }
        }
        return p;
    }
}

interface TableData
{
    String get(int row, int col);
    int getNumRows();
    int getNumCols();
}


class SimpleTableData implements TableData
{
    private List<String> headers = null;
    private List<List<String>> data = null;
    
    public static SimpleTableData create(SimpleTableData other, List<Integer> columns)
    {
        SimpleTableData result = new SimpleTableData();
        result.headers = new ArrayList<String>();
        for (Integer column : columns)
        {
            result.headers.add(other.getHeader(column));
        }
        result.data = new ArrayList<List<String>>();
        for (int r=0; r<other.getNumRows(); r++)
        {
            List<String> row = new ArrayList<String>();
            for (Integer column : columns)
            {
                row.add(other.get(r, column));
            }
            result.data.add(row);
        }
        return result;
    }
    
    public static SimpleTableData read(String csvFileName) throws IOException
    {
        SimpleTableData result = new SimpleTableData();
        result.data = new ArrayList<List<String>>();
        BufferedReader r = new BufferedReader(
            new InputStreamReader(new FileInputStream(csvFileName)));
        try
        {
            String line = null;
            while ((line = r.readLine()) != null)
            {
                String tokens[] = line.split(",");
                if (result.headers == null)
                {
                    // The first line holds the column headers
                    result.headers = Arrays.asList(tokens);
                }
                else
                {
                    result.data.add(Arrays.asList(tokens));
                }
            }
        }
        finally
        {
            // Close the reader even if reading fails
            r.close();
        }
        return result;
    }
    
    @Override
    public int getNumRows()
    {
        return data.size();
    }
    @Override
    public int getNumCols()
    {
        return headers.size();
    }
    @Override
    public String get(int row, int col)
    {
        return data.get(row).get(col);
    }

    public List<String> getHeaders()
    {
        return headers;
    }
    public String getHeader(int col)
    {
        return headers.get(col);
    }
    public List<String> getRow(int row)
    {
        return data.get(row);
    }
    
    
    public List<String> getColumn(String name)
    {
        for (int i=0; i<getNumCols(); i++)
        {
            if (getHeader(i).equals(name))
            {
                return getColumn(i);
            }
        }
        return null;
    }
    
    public List<String> getColumn(final int col)
    {
        return new AbstractList<String>()
        {
            @Override
            public String get(int index)
            {
                // get expects (row, col): the list index walks down the rows
                return SimpleTableData.this.get(index, col);
            }

            @Override
            public int size()
            {
                return getNumRows();
            }};
    }
    
    public String toString(int numRows)
    {
        StringBuilder sb = new StringBuilder();
        sb.append(headers);
        sb.append("\n");
        // Never print more rows than the table actually has
        for (int i=0; i<Math.min(numRows, getNumRows()); i++)
        {
            sb.append(getRow(i));
            sb.append("\n");
        }
        sb.append("...\n");
        return sb.toString();
    }
}

system · 31 March 2012 at 03:56

Hi Marco. The computation is almost correct: the problem is that we have to consider all possible combinations of alleles for each probe. For example:

Occurrence table for the healthy group:

ProbeSetID A/A A/G C/C C/T G/G T/T …
AM_10001 0 0 3 0 0 0
AM_10002 0 0 0 0 3 0
AM_10003 0 0 1 8 0 7
AM_10004 …
AM_10005
AM_10006
AM_10008
AM_10010
AM_10011
AM_10012
AM_10013
AM_10014
AM_10016
AM_10017
AM_10019
AM_10020

Occurrence table for the sick group:

ProbeSetID A/A A/G C/C C/T G/G T/T …
AM_10001 0 0 4 0 0 0
AM_10002 1 0 0 2 0 0
AM_10003 0 0 3 0 5 1
AM_10004 …
AM_10005
AM_10006
AM_10008
AM_10010
AM_10011
AM_10012
AM_10013
AM_10014
AM_10016
AM_10017
AM_10019
AM_10020

In this case:


for the probe AM_10001:

               Healthy Sick
       C/C       3       0
       C/C       0       4

for the probe AM_10002:
     
               Healthy Sick                       Healthy  Sick
       G/G        3      0                  G/G       3        0 
       A/A        0      1                  C/T       0        2

for the probe AM_10003:

               Healthy Sick                       Healthy Sick                   Healthy Sick
       C/C        1      3                  C/C       1       3              C/C      1       3
       C/C        1      3                  G/G       0       5              T/T      7       1

               Healthy Sick                       Healthy Sick                   Healthy Sick
       C/T        8      0                  C/T        8      0              C/T      8       0
       C/C        1      3                  G/G        0      5              T/T      7       1
             
               Healthy Sick                       Healthy Sick                   Healthy Sick
       T/T        7      1                  T/T       7       1              T/T      7       1
       C/C        1      3                  G/G       0       5              T/T      7       1

And so on. Here is a better view of this example, without the forum’s whitespace reduction.

Marco13 · 31 March 2012 at 04:30

OK, in this case, the computation is indeed rather complex. I thought that it could be possible to use Java to find the (a,b,c,d) elements for which a test has to be computed, and then only compute the exact “getTwoTailedP” value with CUDA.
I’ll try to adjust the Java implementation so that it computes what you described (or you could do this) and think about a possible implementation in CUDA then.

system · 31 March 2012 at 06:15

I’ll try, Marco, but it is very hard for me. I forgot to say that when I have two 2x2 matrices like these:


                 Healthy Sick
C/C                1       3
T/T                7       1

and


                Healthy Sick
T/T                7       1
C/C                1       3

the result of the Fisher test should be calculated only once. Thanks for your valuable help, Marco.

system · 2 April 2012 at 01:01

I must point out that this holds only if both matrices are calculated for the same Probe Set ID.
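
One possible way to avoid the duplicate tables described above is to walk over unordered pairs of genotype columns within a single probe, so that a swapped pair never appears twice and a genotype is never paired with itself. The following is only a minimal sketch, not code from this thread: the class name `GenotypePairSketch`, the method `collectTables` and the inclusion rule (a healthy count in one row and a sick count in the other) are assumptions chosen to reproduce the examples posted earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GenotypePairSketch
{
    /**
     * Collects one table per unordered pair of genotype columns of a
     * single probe. h and s hold the healthy and sick counts of that
     * probe, one entry per genotype. Each result is {h0, h1, s0, s1}.
     */
    static List<int[]> collectTables(int h[], int s[])
    {
        List<int[]> tables = new ArrayList<int[]>();
        for (int c0 = 0; c0 < h.length; c0++)
        {
            // Starting c1 after c0 skips identical rows (c0 == c1)
            // and the swapped duplicate of every pair
            for (int c1 = c0 + 1; c1 < h.length; c1++)
            {
                // Assumed rule: keep the pair if one row has healthy
                // samples and the other row has sick samples
                boolean forward = h[c0] != 0 && s[c1] != 0;
                boolean backward = h[c1] != 0 && s[c0] != 0;
                if (forward || backward)
                {
                    tables.add(new int[] { h[c0], h[c1], s[c0], s[c1] });
                }
            }
        }
        return tables;
    }

    public static void main(String[] args)
    {
        // Counts of probe AM_10003 from the example tables
        int h[] = { 0, 0, 1, 8, 0, 7 };
        int s[] = { 0, 0, 3, 0, 5, 1 };
        for (int[] t : collectTables(h, s))
        {
            System.out.println(Arrays.toString(t));
        }
    }
}
```

With the example counts, this rule yields six tables for AM_10003 (instead of the nine listed earlier), two for AM_10002 and none for AM_10001, where only a single genotype occurs.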

Marco13 · 2 April 2012 at 01:18

I’ll try to compute the combinations as you described, but don’t know when I will find the time (possibly this week, but possibly next week). In any case, once I have the Java implementation, I’ll post it here for review.

system · 2 April 2012 at 02:47

Ok Marco. Sorry for bothering you so much. I am not able to do something like this myself, but I have to get it done. Anyway, I’ll try to do something. Thanks for your help (and your patience).

Marco13 · 11 April 2012 at 13:56

Hello

Did you make any progress with the computation of the combinations? If not, I would try to continue with this task tomorrow.

bye
Marco

system · 11 April 2012 at 23:55

Hi Marco. I tried to take a closer look at the code, but unfortunately I did not make any progress.

Marco13 · 12 April 2012 at 11:24

Hello

I tried to continue, but it is still not entirely clear which combinations have to be computed. According to the example that you posted here, why is the first computation


               Healthy Sick
       C/C       3       0
       C/C       0       4

and not


               Healthy Sick
       C/C       3       4
       C/C       3       4

?

The current attempt: (It fills the “count” tables with some “dummy” data that is similar to the example that you posted, just for testing)

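import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

// FisherExactTestCore, TableData and SimpleTableData are unchanged from the first listing
public class JCudaFisherTestMain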
{
    private static Map<String, Integer> valueToIndex;
    private static Map<Integer, String> indexToValue;
    
    private static void initMapping()
    {
        valueToIndex = new LinkedHashMap<String, Integer>();
        valueToIndex.put("A/A",  0);
        valueToIndex.put("A/C",  1);
        valueToIndex.put("A/G",  2);
        valueToIndex.put("A/T",  3);
        valueToIndex.put("C/A",  4);
        valueToIndex.put("C/C",  5);
        valueToIndex.put("C/G",  6);
        valueToIndex.put("C/T",  7);
        valueToIndex.put("G/A",  8);
        valueToIndex.put("G/C",  9);
        valueToIndex.put("G/G", 10);
        valueToIndex.put("G/T", 11);
        valueToIndex.put("T/A", 12);
        valueToIndex.put("T/C", 13);
        valueToIndex.put("T/G", 14);
        valueToIndex.put("T/T", 15);
        valueToIndex.put("NoCall", 16);
        valueToIndex.put("-/-", 17);
        valueToIndex.put("-/A", 18);
        valueToIndex.put("-/C", 19);
        valueToIndex.put("-/G", 20);
        valueToIndex.put("-/T", 21);
        valueToIndex.put("A/-", 22);
        valueToIndex.put("C/-", 23);
        valueToIndex.put("G/-", 24);
        valueToIndex.put("T/-", 25);
        indexToValue = reverse(valueToIndex);
    }
    

    
    public static void main(String[] args) throws Exception
    {
        initMapping();
        
        SimpleTableData table = SimpleTableData.read("FirstTable.csv");
        List<Integer> sickColumns = Arrays.asList(
            2, 3, 13, 14, 15, 16, 17, 18,
            19, 20, 21, 22, 25, 27
        );
        Set<Integer> set = new LinkedHashSet<Integer>();
        for (int i=1; i<table.getNumCols(); i++)
        {
            set.add(i);
        }
        set.removeAll(sickColumns);
        List<Integer> healthyColumns = new ArrayList<Integer>(set);

        SimpleTableData sickTable = SimpleTableData.create(table, sickColumns);        
        SimpleTableData healthyTable = SimpleTableData.create(table, healthyColumns);        

        int printedRows = 5;
        
        System.out.println("Start of sick table");
        System.out.println(sickTable.toString(printedRows));

        System.out.println("Start of healthy table");
        System.out.println(healthyTable.toString(printedRows));
        
        int sickCount[] = createCountMatrix(sickTable);
        int healthyCount[] = createCountMatrix(healthyTable);
        
        int numRows = table.getNumRows();
        int numCols = valueToIndex.size();
        
        int handledRows = numRows;
        int handledCols = numCols;
        
        boolean dummy = true;
        if (dummy)
        {
            initDummy(healthyCount, sickCount, numCols);
            handledRows = 3;
            handledCols = 6;
        }
        
        System.out.println("Start of sick count table");
        System.out.println(toString2D(sickCount, numCols, printedRows));

        System.out.println("Start of healthy count table");
        System.out.println(toString2D(healthyCount, numCols, printedRows));
     
        FisherExactTestCore f = new FisherExactTestCore(1000000);
        for (int r=0; r<handledRows; r++)
        {
            System.out.println("Probe "+table.get(r, 0)+":");
            for (int c0=0; c0<handledCols; c0++)
            {
              int index0 = c0+r*numCols;
              int h0 = healthyCount[index0];
              int s0 = sickCount[index0];
              
              System.out.println("At col0="+c0+" with name "+indexToValue.get(c0)+" found h0="+h0+" and s0="+s0);
              
              if (h0 != 0 && s0 != 0)
              {
                  float result = f.getTwoTailedP(h0, 0, 0, s0);
                  System.out.printf("%8s %3s %3s\n", "", "H", "S");
                  System.out.printf("%8s %3s %3s\n", indexToValue.get(c0), String.valueOf(h0), String.valueOf(0));
                  System.out.printf("%8s %3s %3s  ", indexToValue.get(c0), String.valueOf(0), String.valueOf(s0));
                  System.out.printf("result: %f (special case)\n\n", result);
              }
              if (h0 != 0)
              {
                  for (int c1=0; c1<handledCols; c1++)
                  {
                      int index1 = c1+r*numCols;
                      int h1 = healthyCount[index1];
                      int s1 = sickCount[index1];
                      
                      System.out.println("At col1="+c1+" with name "+indexToValue.get(c1)+" found h1="+h1+" and s1="+s1);
                      
                      if (s1 != 0)
                      {
                          float result = f.getTwoTailedP(h0, h1, s0, s1);
                          System.out.printf("%8s %3s %3s\n", "", "H", "S");
                          System.out.printf("%8s %3s %3s\n", indexToValue.get(c0), String.valueOf(h0), String.valueOf(s0));
                          System.out.printf("%8s %3s %3s  ", indexToValue.get(c1), String.valueOf(h1), String.valueOf(s1));
                          System.out.printf("result: %f\n\n", result);
                      }
                  }
              }
            }
        }
    }
    
    private static void initDummy(int h[], int s[], int numCols)
    {
//        AM_10001 0 0 3 0 0 0
//        AM_10002 0 0 0 0 3 0
//        AM_10003 0 0 1 8 0 7        

        h[0+0*numCols] = 0;
        h[1+0*numCols] = 0;
        h[2+0*numCols] = 3;
        h[3+0*numCols] = 0;
        h[4+0*numCols] = 0;
        h[5+0*numCols] = 0;

        h[0+1*numCols] = 0;
        h[1+1*numCols] = 0;
        h[2+1*numCols] = 0;
        h[3+1*numCols] = 0;
        h[4+1*numCols] = 3;
        h[5+1*numCols] = 0;

        h[0+2*numCols] = 0;
        h[1+2*numCols] = 0;
        h[2+2*numCols] = 1;
        h[3+2*numCols] = 8;
        h[4+2*numCols] = 0;
        h[5+2*numCols] = 7;
        
//        AM_10001 0 0 4 0 0 0
//        AM_10002 1 0 0 2 0 0
//        AM_10003 0 0 3 0 5 1
        
        s[0+0*numCols] = 0;
        s[1+0*numCols] = 0;
        s[2+0*numCols] = 4;
        s[3+0*numCols] = 0;
        s[4+0*numCols] = 0;
        s[5+0*numCols] = 0;

        s[0+1*numCols] = 1;
        s[1+1*numCols] = 0;
        s[2+1*numCols] = 0;
        s[3+1*numCols] = 2;
        s[4+1*numCols] = 0;
        s[5+1*numCols] = 0;

        s[0+2*numCols] = 0;
        s[1+2*numCols] = 0;
        s[2+2*numCols] = 3;
        s[3+2*numCols] = 0;
        s[4+2*numCols] = 5;
        s[5+2*numCols] = 1;
    }
    
    private static <K, V> Map<V, K> reverse(Map<K, V> input)
    {
        Map<V, K> result = new LinkedHashMap<V, K>();
        for (Entry<K, V> entry : input.entrySet())
        {
            result.put(entry.getValue(), entry.getKey());
        }
        return result;
    }


    private static int[] createCountMatrix(TableData tableData)
    {
        int result[] = new int[valueToIndex.size() * tableData.getNumRows()];
        for (int r=0; r<tableData.getNumRows(); r++)
        {
            for (int c=0; c<tableData.getNumCols(); c++)
            {
                String value = tableData.get(r, c);
                Integer index = valueToIndex.get(value);
                if (index != null)
                {
                    int arrayIndex = r * valueToIndex.size() + index;
                    result[arrayIndex]++;
                }
            }
        }
        return result;
    }
    
    public static String toString2D(int a[], int columns, int maxRows)
    {
        StringBuilder sb = new StringBuilder();
        int row = 0;
        for (int i=0; i<a.length; i++)
        {
            if (i>0 && i % columns == 0)
            {
                sb.append("\n");
                row++;
                if (maxRows > 0 && row >= maxRows)
                {
                    sb.append("...\n");
                    break;
                }
            }
            sb.append(String.format("%4s ", String.valueOf(a[i])));
        }
        return sb.toString();
    }
    
}

Marco13 · 12 April 2012 at 11:41

Assuming that the “special case” should actually use the values from the ‘count’ tables, can you tell me whether the cases handled here are correct:

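import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

// FisherExactTestCore, TableData and SimpleTableData are unchanged from the first listing
public class JCudaFisherTestMain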
{
    private static Map<String, Integer> valueToIndex;
    private static Map<Integer, String> indexToValue;
    
    private static void initMapping()
    {
        valueToIndex = new LinkedHashMap<String, Integer>();
        valueToIndex.put("A/A",  0);
        valueToIndex.put("A/C",  1);
        valueToIndex.put("A/G",  2);
        valueToIndex.put("A/T",  3);
        valueToIndex.put("C/A",  4);
        valueToIndex.put("C/C",  5);
        valueToIndex.put("C/G",  6);
        valueToIndex.put("C/T",  7);
        valueToIndex.put("G/A",  8);
        valueToIndex.put("G/C",  9);
        valueToIndex.put("G/G", 10);
        valueToIndex.put("G/T", 11);
        valueToIndex.put("T/A", 12);
        valueToIndex.put("T/C", 13);
        valueToIndex.put("T/G", 14);
        valueToIndex.put("T/T", 15);
        valueToIndex.put("NoCall", 16);
        valueToIndex.put("-/-", 17);
        valueToIndex.put("-/A", 18);
        valueToIndex.put("-/C", 19);
        valueToIndex.put("-/G", 20);
        valueToIndex.put("-/T", 21);
        valueToIndex.put("A/-", 22);
        valueToIndex.put("C/-", 23);
        valueToIndex.put("G/-", 24);
        valueToIndex.put("T/-", 25);
        indexToValue = reverse(valueToIndex);
    }
    

    
    public static void main(String[] args) throws Exception
    {
        initMapping();
        
        SimpleTableData table = SimpleTableData.read("FirstTable.csv");
        List<Integer> sickColumns = Arrays.asList(
            2, 3, 13, 14, 15, 16, 17, 18,
            19, 20, 21, 22, 25, 27
        );
        Set<Integer> set = new LinkedHashSet<Integer>();
        for (int i=1; i<table.getNumCols(); i++)
        {
            set.add(i);
        }
        set.removeAll(sickColumns);
        List<Integer> healthyColumns = new ArrayList<Integer>(set);

        SimpleTableData sickTable = SimpleTableData.create(table, sickColumns);        
        SimpleTableData healthyTable = SimpleTableData.create(table, healthyColumns);        

        int printedRows = 5;
        
        System.out.println("Start of sick table");
        System.out.println(sickTable.toString(printedRows));

        System.out.println("Start of healthy table");
        System.out.println(healthyTable.toString(printedRows));
        
        int sickCount[] = createCountMatrix(sickTable);
        int healthyCount[] = createCountMatrix(healthyTable);
        
        int numRows = table.getNumRows();
        int numCols = valueToIndex.size();
        
        int handledRows = numRows;
        int handledCols = numCols;
        
        System.out.println("Start of sick count table");
        System.out.println(toString2D(sickCount, numCols, printedRows));

        System.out.println("Start of healthy count table");
        System.out.println(toString2D(healthyCount, numCols, printedRows));
     
        int tasks[] = new int[4];
        int t = 0;
        for (int r=0; r<handledRows; r++)
        {
            for (int c0=0; c0<handledCols; c0++)
            {
              int index0 = c0+r*numCols;
              int h0 = healthyCount[index0];
              if (h0 != 0)
              {
                  for (int c1=0; c1<handledCols; c1++)
                  {
                      int index1 = c1+r*numCols;
                      int s1 = sickCount[index1];
                      if (s1 != 0)
                      {
                          int h1 = healthyCount[index1];
                          int s0 = sickCount[index0];
                          
                          tasks[t*4+0] = h0;
                          tasks[t*4+1] = h1;
                          tasks[t*4+2] = s0;
                          tasks[t*4+3] = s1;
                          t++;
                          
                          if (t*4+0 >= tasks.length)
                          {
                              tasks = Arrays.copyOf(tasks, tasks.length*2);
                          }
                      }
                  }
              }
            }
        }

        FisherExactTestCore f = new FisherExactTestCore(1000000);
        for (int i=0; i<t; i++)
        {
            int h0 = tasks[i*4+0];
            int h1 = tasks[i*4+1];
            int s0 = tasks[i*4+2];
            int s1 = tasks[i*4+3];
            float result = f.getTwoTailedP(h0, h1, s0, s1);
            System.out.printf("Handle %3d %3d %3d %3d : %f\n", h0, h1, s0, s1, result);
        }
    }
    
    private static <K, V> Map<V, K> reverse(Map<K, V> input)
    {
        Map<V, K> result = new LinkedHashMap<V, K>();
        for (Entry<K, V> entry : input.entrySet())
        {
            result.put(entry.getValue(), entry.getKey());
        }
        return result;
    }


    private static int[] createCountMatrix(TableData tableData)
    {
        int result[] = new int[valueToIndex.size() * tableData.getNumRows()];
        for (int r=0; r<tableData.getNumRows(); r++)
        {
            for (int c=0; c<tableData.getNumCols(); c++)
            {
                String value = tableData.get(r, c);
                Integer index = valueToIndex.get(value);
                if (index != null)
                {
                    int arrayIndex = r * valueToIndex.size() + index;
                    result[arrayIndex]++;
                }
            }
        }
        return result;
    }
    
    public static String toString2D(int a[], int columns, int maxRows)
    {
        StringBuilder sb = new StringBuilder();
        int row = 0;
        for (int i=0; i<a.length; i++)
        {
            if (i>0 && i % columns == 0)
            {
                sb.append("\n");
                row++;
                if (maxRows > 0 && row >= maxRows)
                {
                    sb.append("...\n");
                    break;
                }
            }
            sb.append(String.format("%4s ", String.valueOf(a[i])));
        }
        return sb.toString();
    }
    
}

system · 13 April 2012 at 01:31

Hi Marco. You are right, and I’m sorry for the mistake. The correct matrix is the second one:


            Healthy Sick
C/C           3       4
C/C           3       4

However, in cases like this I can add checks (if possible) to skip the calculation, because the Fisher test is not significant there. I have tested both versions and they seem to be correct (the dummy version perhaps reads better).

Marco13 · 13 April 2012 at 02:19

OK, the last version collects all matrices (a,b,c,d) that have to be computed, but some of them are not required (e.g. those where the rows are just swapped, or the values are equal).

Assuming that the “twoTailedP” value of ALL of these had been computed, what would you do with these values? Is it important to sort out the values that are not needed? And by the way: how large are the tables in the real application? At the moment they are rather small, so I’m not sure whether the performance can be increased much. The most time-consuming steps may (!) be reading the files, creating the “count” tables and finding the (a,b,c,d) matrices that have to be computed, not the computation of the P-values itself. But we’ll see…

system · 14 April 2012 at 06:19

Hi Marco. When I calculate the Fisher test, I get a p-value, the probability used to decide whether or not to reject the initial hypothesis. The result counts as significant when the p-value is less than alpha (the significance level). Alpha is chosen in advance and is not derived from the p-value.
The size is never known in advance but depends on the microarray that is used. One type contains 1931 rows and a variable number of columns: 30 (like the table in my example), 100, 1,000 or even more, depending on the experiment. Another type has 1,000,000 rows and, again, a variable number of columns. I hope this answers your question, and thanks again, Marco.
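
The decision described above (compare the two-tailed p-value against a significance level) can be checked by hand on the C/C versus T/T table from earlier in the thread. The sketch below is independent of `FisherExactTestCore`: it uses `double` instead of `float` and enumerates every table with the same margins. The names `FisherDecisionSketch` and `twoTailedP`, the small comparison tolerance and the value `ALPHA = 0.05` are assumptions for illustration only.

```java
class FisherDecisionSketch
{
    // Assumed significance level; the thread does not state one
    static final double ALPHA = 0.05;

    // log(n!) as a sum of logarithms; adequate for small n
    static double logFactorial(int n)
    {
        double sum = 0.0;
        for (int i = 2; i <= n; i++)
        {
            sum += Math.log(i);
        }
        return sum;
    }

    // Probability of exactly this table, given its row and column sums
    static double pointP(int a, int b, int c, int d)
    {
        double logP = logFactorial(a + b) + logFactorial(c + d)
            + logFactorial(a + c) + logFactorial(b + d)
            - logFactorial(a) - logFactorial(b) - logFactorial(c)
            - logFactorial(d) - logFactorial(a + b + c + d);
        return Math.exp(logP);
    }

    // Sums the probabilities of all tables with the same margins that
    // are at most as likely as the observed one
    static double twoTailedP(int a, int b, int c, int d)
    {
        double baseP = pointP(a, b, c, d);
        int row0 = a + b;
        int col0 = a + c;
        int n = a + b + c + d;
        int minA = Math.max(0, row0 + col0 - n);
        int maxA = Math.min(row0, col0);
        double p = 0.0;
        for (int x = minA; x <= maxA; x++)
        {
            double px = pointP(x, row0 - x, col0 - x, n - row0 - col0 + x);
            // Small relative tolerance so that ties are not lost to rounding
            if (px <= baseP * (1.0 + 1e-7))
            {
                p += px;
            }
        }
        return p;
    }

    public static void main(String[] args)
    {
        // C/C: healthy 1, sick 3; T/T: healthy 7, sick 1
        double p = twoTailedP(1, 7, 3, 1);
        System.out.println("p = " + p + ", significant: " + (p < ALPHA));
    }
}
```

For the cells (1, 7, 3, 1) the five possible tables have probabilities 1/495, 32/495, 168/495, 224/495 and 70/495, so the two-tailed value is 33/495 (about 0.067), which is above the assumed level of 0.05. A table with two identical rows, such as (3, 3, 4, 4), gives a value of 1, which matches the remark that such cases are never significant.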

Marco13 · 22 April 2012 at 07:17

Just a short note: I’m rather busy at the moment and will hardly have the chance to continue with this next week, but I will try to do so in the first week of May.

system · 22 April 2012 at 08:13

Ok Marco. No hurry (for now). Thanks.

Marco13 · 30 April 2012 at 11:05

Hello

Attached you will find a first test of a straightforward JCuda implementation. Let me know whether this goes in the right direction.

bye

system · 3 May 2012 at 04:03

Hi Marco. I finally tried this first test (after many problems) and it seems OK, but Java and CUDA seem to take the same time to execute:


For  13  13  13  13 : Java : 1,000000    CUDA : 1,000000
For  13   0  13   1 : Java : 0,999997    CUDA : 0,999997
For   1  13   0  13 : Java : 0,999997    CUDA : 0,999997
For   1   0   0   1 : Java : 1,000000    CUDA : 1,000000
For  14  14  14  14 : Java : 1,000002    CUDA : 1,000002
For   2   2   2   2 : Java : 1,000000    CUDA : 1,000000

Can you please explain (in simple words) how this example works? What are the FisherTestKernel.cu and FisherTestKernel.ptx files, and why is the ptx file created again on each execution?
Would it be possible to add the Probe Set ID after each “For”, like this:

For AM_10001 13 13 13 13 : Java : 1,000000 CUDA : 1,000000
For AM_10001 13 0 13 1 : Java : 0,999997 CUDA : 0,999997
…

Thanks.
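
One way to get the Probe Set ID into each output line is to remember, for every collected (a,b,c,d) task, the table row it came from. This is a minimal sketch under assumptions: the class `ProbeTaskSketch` and its methods are hypothetical and are not part of the attached JCuda test. The row indices live in a separate array, so the four-ints-per-task layout used in the listing earlier in this thread stays unchanged.

```java
import java.util.Arrays;
import java.util.Locale;

class ProbeTaskSketch
{
    private int cells[] = new int[4]; // a, b, c, d of each task
    private int rows[] = new int[1];  // table row of each task
    private int count = 0;

    // Stores one task together with the row (probe) it belongs to
    void add(int row, int a, int b, int c, int d)
    {
        if (count == rows.length)
        {
            // Grow both arrays together so they stay in step
            rows = Arrays.copyOf(rows, rows.length * 2);
            cells = Arrays.copyOf(cells, cells.length * 2);
        }
        rows[count] = row;
        cells[count * 4 + 0] = a;
        cells[count * 4 + 1] = b;
        cells[count * 4 + 2] = c;
        cells[count * 4 + 3] = d;
        count++;
    }

    int size()
    {
        return count;
    }

    // Builds the start of an output line, including the probe ID
    String describe(int i, String probeIds[])
    {
        return String.format(Locale.ROOT, "For %s %3d %3d %3d %3d",
            probeIds[rows[i]], cells[i * 4 + 0], cells[i * 4 + 1],
            cells[i * 4 + 2], cells[i * 4 + 3]);
    }

    public static void main(String[] args)
    {
        // Sample values only; in the real program the IDs would come
        // from the first column of the table
        String probeIds[] = { "AM_10001", "AM_10002" };
        ProbeTaskSketch tasks = new ProbeTaskSketch();
        tasks.add(0, 13, 13, 13, 13);
        tasks.add(0, 13, 0, 13, 1);
        tasks.add(1, 1, 13, 0, 13);
        for (int i = 0; i < tasks.size(); i++)
        {
            System.out.println(tasks.describe(i, probeIds));
        }
    }
}
```

The Java and CUDA results could then be appended to the string returned by `describe`, in the same way as in the output shown above.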