Great question. A response from one of our Data Scientists:
A rudimentary convolutional network performs a series of weighted averages over input windows, each followed by thresholding. These operations look for patterns in the input: a neuron "fires" when the weighted average of the values in its input window is above a threshold. By cascading many of these operations, a neural network builds up increasingly elaborate representations of the image content, and eventually produces the tag outputs that you see. The techniques we use at Clarifai are more complex systems built on these same principles.
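To make the idea concrete, here is a minimal sketch in plain Python of the "weighted average plus threshold" operation and a two-stage cascade. It is only an illustration of the principle, not Clarifai's actual system: the function name `fire_map`, the toy image, the hand-picked weights, and the thresholds are all assumptions made for the example. In a real network the weights are learned from data rather than chosen by hand.

```python
def fire_map(image, weights, threshold):
    """Slide a weight window over a 2D image (list of rows).

    A unit "fires" (outputs 1) when the weighted average of the values
    in its window is above the threshold; otherwise it outputs 0.
    """
    kh, kw = len(weights), len(weights[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    n = kh * kw  # number of values in each window
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = sum(
                weights[i][j] * image[r + i][c + j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(1 if total / n > threshold else 0)
        output.append(row)
    return output


# Toy 4x4 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical hand-picked weights that respond to a dark-to-bright edge.
edge_weights = [
    [-1, 1],
    [-1, 1],
]

# Stage 1: find small pieces of vertical edge.
edges = fire_map(image, edge_weights, threshold=0.25)

# Stage 2 (the cascade): fire where three edge pieces stack vertically,
# i.e. a more elaborate pattern built from the simpler stage-1 outputs.
line_weights = [[1], [1], [1]]
lines = fire_map(edges, line_weights, threshold=0.5)

print(edges)
print(lines)
```

The first stage only responds to a local dark-to-bright transition; the second stage looks at the first stage's outputs and responds to a taller vertical line. Stacking many such stages is what lets a network go from raw pixels to higher-level concepts such as tags.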