Convolutional Neural Networks

UMaine COS 470/570 – Introduction to AI

Spring 2019

Created: 2019-04-29 Mon 21:45

Convolutional neural networks

  • One of the major kinds of ANNs in use
  • One of the reasons deep learning is so successful:
    • addresses computational tractability
    • addresses vanishing/exploding gradient problem
  • Started in the 1980s
  • First truly successful modern version: LeNet (LeCun, 1989)
  • LeNet-5 (LeCun, 1998): 7-layer CNN for reading numbers on checks

The problem

  • Goal: High-accuracy image recognition
  • Standard supervised learning with deep (fully-connected) networks:
    • Images require a connection from each pixel to each first-layer neuron
      • E.g., a 1028 × 768 image ⇒ 789,504 weights per neuron
      • Slow to train
      • Vanishing/exploding gradient problem
    • Also no spatial locality exploited
  • Can we take inspiration from biological vision systems?

Human visual system

lisa-analysis.png

Image credit: user Clock, CC BY-SA 3.0, via Wikimedia Commons

Convolutional layers

  • Instead of fully-connected layer, think of using a layer whose neurons each have a receptive field:

    conv-layer-conceptual.png

  • Overlapping receptive fields
  • Neurons then learn local features and have only a few weights each
  • Each convolutional layer typically has multiple feature-detecting filters, one per feature map
  • Problem:
    • Features should be location-independent
    • ⇒ Weights for nodes should be shared, learned together

Shared weights

  • So—how to compute the layer?
  • For an \(n \times n\) receptive field:
    • \(n\times n\) weights
    • If input layer is \(m \times m\), hidden layer is \((m-n+1) \times (m-n+1)\)
    • For hidden layer neuron at \(x,\ y\), activation is: \[\sigma(b + \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} w_{i,j} a_{x+i, y+j})\]
  • Slide the kernel across and down the image by some stride
  • Shared weights \(w_{i,j}\) and bias \(b\) = the kernel or filter
  • Hidden layer = feature map
  • Update the shared weights based on the loss computed over the entire hidden layer (feature map)
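As a concrete (if inefficient) sketch, the activation formula above can be implemented directly in NumPy. The function names here are my own, and a real framework would use a vectorized convolution rather than explicit loops:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_map(a, w, b):
    """Compute one feature map: slide an n x n kernel w (stride 1) over
    an m x m input a, adding bias b and applying the sigmoid, exactly as
    in sigma(b + sum_ij w_ij * a_{x+i, y+j})."""
    m, n = a.shape[0], w.shape[0]
    out = np.empty((m - n + 1, m - n + 1))
    for x in range(m - n + 1):
        for y in range(m - n + 1):
            out[x, y] = sigmoid(b + np.sum(w * a[x:x+n, y:y+n]))
    return out

a = np.random.rand(28, 28)   # e.g., an MNIST-sized input
w = np.random.rand(5, 5)     # 5 x 5 kernel: only 25 shared weights
h = feature_map(a, w, 0.0)
print(h.shape)               # (24, 24), i.e., (28-5+1) x (28-5+1)
```

Note that the kernel has only 25 shared weights (plus one bias) regardless of the input size, versus one weight per pixel for each fully-connected neuron.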

Why convolutional layer?

  • Learns local spatial features of input
  • Location-independent (location-invariant)
  • Example (from Nielsen, M: Neural Networks and Deep Learning):

    net_full_layer_0.png

  • Typically \(> 1\) feature map/layer ⇒ learn different kinds of features

Pooling layers

  • Convolutional layers are coupled with pooling layers
  • Each node of pooling layer connected to some \(i\times j\) region of feature map

    pooling-layer.png

Pooling layers

  • Pool based on some function—max, average, etc.

    Max_pooling.png

    (Aphex34 [CC BY-SA 4.0], via Wikimedia Commons)

  • Purpose(s):
    • Reduce # weights needed
    • Blur/average/smooth feature map
    • Determine whether a feature is present in a particular region (rather than exactly where)
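A minimal NumPy sketch of non-overlapping max pooling (the helper name is my own, not a library call):

```python
import numpy as np

def max_pool(fm, p=2):
    """p x p non-overlapping max pooling of a square feature map."""
    m = fm.shape[0] // p * p                        # drop any ragged edge
    blocks = fm[:m, :m].reshape(m // p, p, m // p, p)
    return blocks.max(axis=(1, 3))                  # max within each block

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fm))   # max of each 2 x 2 block: [[5, 7], [13, 15]]
```

Average pooling is the same computation with `.mean(axis=(1, 3))` in place of the max.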

Using the features

  • Pooled layers’ output ⇒ fully-connected layer – e.g., for MNIST:

    1conv-layer.png

    (From Nielsen)

  • Learn configuration of features
  • Could have multiple fully-connected layers, too

Learning in CNNs

  • Backpropagation learning, gradient descent
  • Equations for fully-connected nets have to be modified, though
  • Theano, TensorFlow, PyTorch – all have support for training CNNs

Multiple convolutional layers

  • LeNet-5:
    • 7 layers
    • Recognize numbers on checks

      LeNet-5.png

  • Recall that the DQN we discussed used CNNs
  • Many additional variants of CNNs now
  • ResNet: 152 layers, general image recognition, lots of additions to LeNet’s basic architecture
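ResNet's central addition is the residual (skip) connection, which lets very deep networks train without vanishing gradients. A tiny NumPy sketch of the idea (not ResNet's actual convolutional blocks), assuming the input and output shapes match:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """Sketch of ResNet's key idea: the block learns a residual f(x)
    and adds the input back through a skip connection."""
    f = relu(x @ W1) @ W2        # two-layer transformation f(x)
    return relu(x + f)           # output = relu(x + f(x))

rng = np.random.default_rng(0)
x = rng.random(8)
W1, W2 = rng.random((8, 8)), rng.random((8, 8))
y = residual_block(x, W1, W2)
print(y.shape)   # (8,)
```

With all weights zero the block reduces to the identity on a non-negative input, which is why stacking many such blocks stays trainable: each block only needs to learn a correction to the identity.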

Feature detection in CNNs

  • From ConvNet:

    feature-extraction.png

Progress in image recognition competition

progression-of-cnn-layers.png

(Aphex34 [CC BY-SA 4.0], via Wikimedia Commons)

Your turn

  1. Build a CNN
    • Get into groups, one of whom has a laptop with Keras on it
    • Create a simple CNN for MNIST
  2. Explain a CNN
    • Get into groups with at least 2 laptops
    • Part of group: Look up an “inception” layer in (e.g.) GoogLeNet
    • Other part: Look up ResNet
    • Explain them to each other after a few minutes