Batch normalization problem

There’s a lot of discussion, and admittedly, I still have to figure out the exact relationship between cuDNN and DIGITS. On the one hand, it’s good to know that the issue is not caused by the JCu* layer - on the other hand, I see that this can be annoying. The initial problem was reported in March. However, NVIDIA announced DIGITS 5 recently. Maybe it is accompanied by a new cuDNN version …?

I didn’t even realize DIGITS was a separate project from Nvidia. It appeared to be a visualization interface for Caffe, since it loads Caffe prototxt and generates visualizations from it. I don’t think it uses a different version of cuDNN. Even the proposed solution to the running variance problem was initially implemented in Caffe’s CPU implementation.

I think the issue is not batch norm itself, since it works for some data but is not stable for others. I implemented variance clipping (rather easy with JCuda actually) and it appears to work after some trial and error on the parameters. A rough sketch of the idea is below.
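For illustration only, here is a minimal host-side Java sketch of what I mean by variance clipping: clamping the running-variance estimates into a sane range before they feed the normalization step. The bounds, names, and the host-side approach are assumptions for the example, not the exact JCuda code I used.

```java
/**
 * Illustrative sketch: clamp each entry of a running-variance array
 * into [minVar, maxVar] before the batch-norm forward pass uses it.
 * The concrete bounds here were picked by trial and error and are
 * only placeholders.
 */
public final class VarianceClipping {

    static void clipVariance(float[] runningVar, float minVar, float maxVar) {
        for (int i = 0; i < runningVar.length; i++) {
            // Negative or extremely small estimates destabilize 1/sqrt(var + eps).
            runningVar[i] = Math.max(minVar, Math.min(maxVar, runningVar[i]));
        }
    }

    public static void main(String[] args) {
        float[] runningVar = { 2.5e-9f, -1.0e-3f, 0.42f, 7.3f };
        clipVariance(runningVar, 1e-5f, 1e4f); // assumed bounds
        System.out.println(java.util.Arrays.toString(runningVar));
    }
}
```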

Torch appears to use an exponential moving average rather than the cumulative moving average suggested by the batch norm paper.
Using an exponential moving average for the running mean/variance makes less theoretical sense, but I just tested it and it works as well. The two update rules are sketched below.
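To make the difference concrete, here is a small Java sketch contrasting the two update rules for a running mean (the same updates apply to the running variance). The momentum value and names are illustrative assumptions, not Torch’s actual defaults.

```java
/**
 * Sketch of the two running-statistics updates discussed above:
 * a cumulative moving average (equal weight to every batch seen so far)
 * versus an exponential moving average (recent batches weigh more).
 */
public final class RunningStats {

    // Cumulative moving average, as suggested by the batch norm paper.
    static float cumulativeUpdate(float runningMean, float batchMean, int batchesSeen) {
        return runningMean + (batchMean - runningMean) / (batchesSeen + 1);
    }

    // Exponential moving average, Torch-style.
    static float exponentialUpdate(float runningMean, float batchMean, float momentum) {
        return (1.0f - momentum) * runningMean + momentum * batchMean;
    }

    public static void main(String[] args) {
        float cma = 0f, ema = 0f;
        float[] batchMeans = { 0.9f, 1.1f, 1.0f, 1.2f };
        for (int t = 0; t < batchMeans.length; t++) {
            cma = cumulativeUpdate(cma, batchMeans[t], t);
            ema = exponentialUpdate(ema, batchMeans[t], 0.1f); // momentum = 0.1 (assumed)
            System.out.printf("batch %d: cma=%.4f ema=%.4f%n", t + 1, cma, ema);
        }
    }
}
```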

Anyway, I won’t dwell on this any longer and will consider it a solved problem.

Fine then. And I have some keywords to look into when I actually find the time to have a closer look at DNN.

@typecheck I gave deepdsl a try. Apart from minor points (mentioned in the pull request), it seems to work smoothly: A test run with the Lenet/MNIST dataset worked well, and seems to have achieved a precision of 0.77. (This was only a very basic test run. I’d still have to read a lot more to even remotely understand what it actually does there…)