To do deep learning, you need a model for the machines to learn on. A model is a mathematical representation of whatever you are studying, and you need a good one: a model that can be trained by the data.
The model goes through a training process in which it makes decisions based on the data it has; when a decision is wrong, it must adjust its approach until it reliably arrives at the correct one.
Standard statistical models discover patterns, but deep learning models learn to perform actual computation: modest steps of processing that can then be spread wide and flat across a cluster of machines.
The New Yorker provides an example: a deep learning program is tasked with playing the old Atari game Breakout, in which the player must make an opening in a wall by repeatedly hitting it with a ball. The program's goal is to win the game, though it is given no instruction on how to do so. It can, however, analyze its scores, and in this way it can distinguish the moves that bring points from those that do not. Given enough time, it can amass enough knowledge to win the game.
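That score-driven loop is the whole trick. Here is a minimal sketch of the idea in Python. To be clear, this is a toy of my own, not DeepMind's actual Breakout agent: the moves and their payoffs are invented, and the learner simply tries moves, keeps score, and leans toward whatever has been earning points.

```python
import random

# Hypothetical stand-in for the game: each "move" has some unknown average
# payoff in points. The learner never sees these numbers directly.
TRUE_PAYOFF = {"left": 0.2, "stay": 0.5, "right": 0.8}

def play(move):
    """Return a noisy score for a move, like points earned in one volley."""
    return TRUE_PAYOFF[move] + random.gauss(0, 0.1)

# Running estimate of how many points each move tends to earn.
estimates = {move: 0.0 for move in TRUE_PAYOFF}
counts = {move: 0 for move in TRUE_PAYOFF}

for step in range(5000):
    # Mostly exploit the best-known move, sometimes explore a random one.
    if random.random() < 0.1:
        move = random.choice(list(estimates))
    else:
        move = max(estimates, key=estimates.get)

    score = play(move)

    # Update the running average score for the chosen move.
    counts[move] += 1
    estimates[move] += (score - estimates[move]) / counts[move]

# After enough play, the estimates single out the highest-scoring move.
print(max(estimates, key=estimates.get))  # expected: "right"
```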
You need a neural network, 10 to 20 layers deep. The more computation a task demands, the deeper the network you'll need.
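As a rough sketch of what "deep" means here (the layer count, sizes, and activation below are arbitrary choices of mine, not anything prescribed by the talk), a deep network is just many learned layers composed one after another:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# A toy 12-layer network: each layer is a learned linear map followed by a
# nonlinearity. "Depth" is simply the number of such stages stacked up.
layer_sizes = [64] + [128] * 11 + [10]   # input -> 11 hidden layers -> output
weights = [rng.normal(0, 0.1, (m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Push an input vector through every layer in sequence."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)
    return x @ weights[-1] + biases[-1]   # last layer left linear (e.g. scores)

print(forward(rng.normal(size=64)).shape)  # (10,)
```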
Machine learning models can solve any problem at hand given infinite data and time for training. With infinite resources, you could simply map into memory every possible answer, and then every possible path to every answer.
The challenge in making deep learning practical is to find a way to get the desired result with only a finite amount of resources, namely what you have on hand.
There is no guarantee that the model on hand will find the answer this way, which is why you must do the work to make the model as kick-ass as possible.
Use stochastic gradient descent (SGD), which helps immensely in training current deep learning models. SGD is the main pillar of deep learning, a surprisingly efficient algorithm for training a neural network.
SGD starts by identifying the easy correlations, putting markers on them, and then going on to search for the larger, more obscure correlations.
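Here is a bare-bones sketch of SGD in NumPy, fitting a plain linear model. The data, learning rate, and batch size are made up for illustration; real deep learning frameworks run the same loop with automatic differentiation and fancier update rules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y is a noisy linear function of x with hidden weights w_true.
w_true = rng.normal(size=5)
X = rng.normal(size=(1000, 5))
y = X @ w_true + rng.normal(scale=0.1, size=1000)

w = np.zeros(5)          # parameters to learn
lr = 0.1                 # learning rate
batch_size = 32

for epoch in range(20):
    order = rng.permutation(len(X))            # shuffle: the "stochastic" part
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        pred = xb @ w
        grad = 2 * xb.T @ (pred - yb) / len(idx)  # gradient of mean squared error
        w -= lr * grad                            # step downhill

# The learned weights approach the hidden true weights.
print(np.round(w, 2))
print(np.round(w_true, 2))
```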
This is how machines get smarter, moving ever so gradually from deductive to inductive reasoning.
Perhaps one day we can enjoy unsupervised learning, but that day is not today. There is also semi-supervised learning, in which the model gets a data dump and must figure out on its own how to categorize it all, or at least the portions that haven't been indexed, taxonomically divvied up, or otherwise organized with a set of instructions as to what means what.
For models trained with supervised learning, though, the data set must be clean and well annotated. Normalize the data so the mean of each dimension is zero and the values are sensibly bounded. Then train multiple neural networks and take the average of their results.
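Two of those points are easy to make concrete. The sketch below uses made-up data and a simple averaging ensemble of linear models (my own illustration, not anything from the source talk): it standardizes each dimension to zero mean and unit variance, then averages the predictions of several independently fit models.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up raw data: features on wildly different scales.
X_raw = np.column_stack([rng.normal(50, 10, 500),          # a feature in the tens
                         rng.normal(0.001, 0.0002, 500)])  # a tiny-scale feature
y = (3 * (X_raw[:, 0] - 50) / 10
     - 2 * (X_raw[:, 1] - 0.001) / 0.0002
     + rng.normal(0, 0.1, 500))

# 1) Preprocess: zero mean, unit variance per dimension.
mean, std = X_raw.mean(axis=0), X_raw.std(axis=0)
X = (X_raw - mean) / std

# 2) Ensemble: fit several models on bootstrap resamples, average their outputs.
def fit_least_squares(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

models = []
for _ in range(5):
    idx = rng.integers(0, len(X), len(X))        # bootstrap sample
    models.append(fit_least_squares(X[idx], y[idx]))

x_new = (np.array([[55.0, 0.0012]]) - mean) / std  # preprocess new input the same way
prediction = np.mean([x_new @ w for w in models])  # average of the ensemble
print(prediction)
```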
For some tasks, such neural networks can be more efficient than standard Boolean circuits. Sorting 10 six-bit numbers with a neural network, for example, requires fewer circuit layers.
Notes taken from: "A Brief Overview of Deep Learning" by Google's Ilya Sutskever, with some additional bits thrown in from "A Tour of Machine Learning Algorithms."