Getting above 98% on MNIST with deep learning


This time we will look at some useful tips for working with neural nets.


MNIST Softmax!

In [1]:
# Lab 7 Learning rate and Evaluation
import tensorflow as tf
import random
# import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
tf.set_random_seed(777)  # reproducibility

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# Check out https://www.tensorflow.org/get_started/mnist/beginners for
# more information about the mnist dataset

# parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100

# input place holders
X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None, 10])

# weights & bias for nn layers
W = tf.Variable(tf.random_normal([784, 10]))
b = tf.Variable(tf.random_normal([10]))

hypothesis = tf.matmul(X, W) + b

# define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

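# initialize (variables must be initialized before the first sess.run)
sess = tf.Session()
sess.run(tf.global_variables_initializer())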

# train my model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {X: batch_xs, Y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

print('Learning Finished!')

# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Accuracy:', sess.run(accuracy, feed_dict={
      X: mnist.test.images, Y: mnist.test.labels}))

# Get one and predict
r = random.randint(0, mnist.test.num_examples - 1)
print("Label: ", sess.run(tf.argmax(mnist.test.labels[r:r + 1], 1)))
print("Prediction: ", sess.run(
    tf.argmax(hypothesis, 1), feed_dict={X: mnist.test.images[r:r + 1]}))

# plt.imshow(mnist.test.images[r:r + 1].
#           reshape(28, 28), cmap='Greys', interpolation='nearest')
# plt.show()
Epoch: 0001 cost = 5.745170995
Epoch: 0002 cost = 1.780056727
Epoch: 0003 cost = 1.122778645
Epoch: 0004 cost = 0.872012251
Epoch: 0005 cost = 0.738203191
Epoch: 0006 cost = 0.654728889
Epoch: 0007 cost = 0.596023612
Epoch: 0008 cost = 0.552216822
Epoch: 0009 cost = 0.518254963
Epoch: 0010 cost = 0.491113201
Epoch: 0011 cost = 0.468347534
Epoch: 0012 cost = 0.449374355
Epoch: 0013 cost = 0.432675662
Epoch: 0014 cost = 0.418828156
Epoch: 0015 cost = 0.406128935
Learning Finished!
Accuracy: 0.9023
Label:  [1]
Prediction:  [1]

This is what we did last time in the MNIST data post (https://jfun.tistory.com/169).

hypothesis = tf.matmul(X, W) + b

# define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

This is the model part of the simple network above. It is only 3 lines, yet the accuracy already came out at 90%.
Let's dig a little deeper into this model part.
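To make those three model lines less of a black box, here is a rough pure-Python sketch of what softmax cross-entropy computes for a single example. The function names `softmax` and `cross_entropy` and the 3-class logits are made up for illustration; TensorFlow's own `tf.nn.softmax_cross_entropy_with_logits` works on whole batches and is implemented differently.

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, one_hot_label):
    """Cross-entropy between softmax(logits) and a one-hot label."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs))

# Hypothetical 3-class example (MNIST has 10 classes).
logits = [2.0, 1.0, 0.1]
label = [1.0, 0.0, 0.0]

probs = softmax(logits)
loss = cross_entropy(logits, label)
predicted_class = probs.index(max(probs))  # same idea as tf.argmax(hypothesis, 1)

print("probabilities:", probs)
print("loss:", loss)
print("predicted class:", predicted_class)
```

The cost in the post is the mean of this per-example loss over a batch, and the accuracy is the fraction of examples where the predicted class equals the label's class.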




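In a NN you have to be careful with the layer sizes: the output size of each layer has to match the input size of the next one.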
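A small sanity check along these lines can catch a size mistake before building the graph. This is only a sketch; `check_layer_shapes` is a hypothetical helper, and the "bad" shapes are an invented mistake. The "good" shapes are the 784 -> 256 -> 256 -> 10 sizes used in this post.

```python
def check_layer_shapes(weight_shapes, input_dim, output_dim):
    """Return True if every layer's input size equals the previous layer's output size."""
    expected_in = input_dim
    for rows, cols in weight_shapes:
        if rows != expected_in:
            return False
        expected_in = cols
    return expected_in == output_dim

good = [(784, 256), (256, 256), (256, 10)]
bad = [(784, 256), (128, 256), (256, 10)]  # hypothetical mistake: 128 should be 256

print(check_layer_shapes(good, 784, 10))
print(check_layer_shapes(bad, 784, 10))
```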

In [2]:
# input place holders
X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None, 10])

# weights & bias for nn layers
W1 = tf.Variable(tf.random_normal([784, 256]))
b1 = tf.Variable(tf.random_normal([256]))
L1 = tf.nn.relu(tf.matmul(X, W1) + b1)

W2 = tf.Variable(tf.random_normal([256, 256]))
b2 = tf.Variable(tf.random_normal([256]))
L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)

W3 = tf.Variable(tf.random_normal([256, 10]))
b3 = tf.Variable(tf.random_normal([10]))
hypothesis = tf.matmul(L2, W3) + b3

# define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
In [3]:
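# initialize (variables must be initialized before the first sess.run)
sess = tf.Session()
sess.run(tf.global_variables_initializer())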

# train my model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {X: batch_xs, Y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

print('Learning Finished!')

# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Accuracy:', sess.run(accuracy, feed_dict={
      X: mnist.test.images, Y: mnist.test.labels}))

# Get one and predict
r = random.randint(0, mnist.test.num_examples - 1)
print("Label: ", sess.run(tf.argmax(mnist.test.labels[r:r + 1], 1)))
print("Prediction: ", sess.run(
    tf.argmax(hypothesis, 1), feed_dict={X: mnist.test.images[r:r + 1]}))

# plt.imshow(mnist.test.images[r:r + 1].
#           reshape(28, 28), cmap='Greys', interpolation='nearest')
# plt.show()
Epoch: 0001 cost = 166.656716985
Epoch: 0002 cost = 41.038044298
Epoch: 0003 cost = 25.719991985
Epoch: 0004 cost = 17.777964834
Epoch: 0005 cost = 12.983673341
Epoch: 0006 cost = 9.572005866
Epoch: 0007 cost = 7.205640663
Epoch: 0008 cost = 5.499154909
Epoch: 0009 cost = 4.002632276
Epoch: 0010 cost = 3.117470723
Epoch: 0011 cost = 2.328740600
Epoch: 0012 cost = 1.740799948
Epoch: 0013 cost = 1.238861716
Epoch: 0014 cost = 1.043431234
Epoch: 0015 cost = 0.779316331
Learning Finished!
Accuracy: 0.9433
Label:  [3]
Prediction:  [3]

A whopping 94%!


Last time I mentioned that initialization has to be done well, and that there is a method called Xavier.
If there is something you don't know, ask Google: 'xavier initialization tensorflow'


Xavier for MNIST

In [4]:
# input place holders
X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None, 10])

# weights & bias for nn layers
# http://stackoverflow.com/questions/33640581/how-to-do-xavier-initialization-on-tensorflow
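W1 = tf.get_variable("W1", shape=[784, 256],
                     initializer=tf.contrib.layers.xavier_initializer())
b1 = tf.Variable(tf.random_normal([256]))
L1 = tf.nn.relu(tf.matmul(X, W1) + b1)

W2 = tf.get_variable("W2", shape=[256, 256],
                     initializer=tf.contrib.layers.xavier_initializer())
b2 = tf.Variable(tf.random_normal([256]))
L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)

W3 = tf.get_variable("W3", shape=[256, 10],
                     initializer=tf.contrib.layers.xavier_initializer())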
b3 = tf.Variable(tf.random_normal([10]))
hypothesis = tf.matmul(L2, W3) + b3

# define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
In [5]:
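# initialize (variables must be initialized before the first sess.run)
sess = tf.Session()
sess.run(tf.global_variables_initializer())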

# train my model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {X: batch_xs, Y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

print('Learning Finished!')

# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Accuracy:', sess.run(accuracy, feed_dict={
      X: mnist.test.images, Y: mnist.test.labels}))

# Get one and predict
r = random.randint(0, mnist.test.num_examples - 1)
print("Label: ", sess.run(tf.argmax(mnist.test.labels[r:r + 1], 1)))
print("Prediction: ", sess.run(
    tf.argmax(hypothesis, 1), feed_dict={X: mnist.test.images[r:r + 1]}))

# plt.imshow(mnist.test.images[r:r + 1].
#           reshape(28, 28), cmap='Greys', interpolation='nearest')
# plt.show()
Epoch: 0001 cost = 0.301935923
Epoch: 0002 cost = 0.116421225
Epoch: 0003 cost = 0.076252542
Epoch: 0004 cost = 0.057026181
Epoch: 0005 cost = 0.039157043
Epoch: 0006 cost = 0.031723986
Epoch: 0007 cost = 0.023718875
Epoch: 0008 cost = 0.020155743
Epoch: 0009 cost = 0.013723808
Epoch: 0010 cost = 0.017393448
Epoch: 0011 cost = 0.015805782
Epoch: 0012 cost = 0.009462772
Epoch: 0013 cost = 0.010965110
Epoch: 0014 cost = 0.007961460
Epoch: 0015 cost = 0.008922998
Learning Finished!
Accuracy: 0.978
Label:  [4]
Prediction:  [4]

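Wow, the accuracy came out at almost 98%.
The interesting part is that the cost is already very low at epoch 1.
This means the initial values were initialized well. (For the same model, just picking good initial values has a big effect on accuracy.)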
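To get a feel for why the first-epoch cost differs so much, here is a rough pure-Python sketch comparing the spread of one layer's pre-activations under standard normal weights and under Xavier weights. It assumes the uniform Xavier (Glorot) variant with range sqrt(6 / (fan_in + fan_out)); the helper names and the random stand-in "image" are hypothetical, and this is not TensorFlow's implementation.

```python
import math
import random

def xavier_limit(fan_in, fan_out):
    """Bound of the uniform Xavier (Glorot) range: sqrt(6 / (fan_in + fan_out))."""
    return math.sqrt(6.0 / (fan_in + fan_out))

def preactivation_std(x, fan_out, draw_weight):
    """Std of the weighted sum x . w over fan_out units, each with freshly drawn weights."""
    outs = [sum(xi * draw_weight() for xi in x) for _ in range(fan_out)]
    mean = sum(outs) / len(outs)
    return math.sqrt(sum((o - mean) ** 2 for o in outs) / len(outs))

rng = random.Random(777)
fan_in, fan_out = 784, 256                 # first hidden layer in this post
x = [rng.random() for _ in range(fan_in)]  # stand-in for one flattened image

limit = xavier_limit(fan_in, fan_out)
std_normal = preactivation_std(x, fan_out, lambda: rng.gauss(0.0, 1.0))
std_xavier = preactivation_std(x, fan_out, lambda: rng.uniform(-limit, limit))

print("Xavier limit:", limit)
print("std with N(0, 1) weights:", std_normal)
print("std with Xavier weights: ", std_xavier)
```

With standard normal weights the sums are far more spread out, so the first logits tend to be large and the starting cost high. Xavier keeps them in a much smaller range, which fits the low cost seen at epoch 1.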


Deep NN for MNIST


Let's make the layers wider, from 256 to 512, and make the network deeper, with 5 layers.

In [6]:
# input place holders
X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None, 10])

# weights & bias for nn layers
# http://stackoverflow.com/questions/33640581/how-to-do-xavier-initialization-on-tensorflow
W1 = tf.get_variable("W1_", shape=[784, 512],
                     initializer=tf.contrib.layers.xavier_initializer())
b1 = tf.Variable(tf.random_normal([512]))
L1 = tf.nn.relu(tf.matmul(X, W1) + b1)

W2 = tf.get_variable("W2_", shape=[512, 512],
                     initializer=tf.contrib.layers.xavier_initializer())
b2 = tf.Variable(tf.random_normal([512]))
L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)

W3 = tf.get_variable("W3_", shape=[512, 512],
                     initializer=tf.contrib.layers.xavier_initializer())
b3 = tf.Variable(tf.random_normal([512]))
L3 = tf.nn.relu(tf.matmul(L2, W3) + b3)

W4 = tf.get_variable("W4_", shape=[512, 512],
                     initializer=tf.contrib.layers.xavier_initializer())
b4 = tf.Variable(tf.random_normal([512]))
L4 = tf.nn.relu(tf.matmul(L3, W4) + b4)

W5 = tf.get_variable("W5_", shape=[512, 10],
                     initializer=tf.contrib.layers.xavier_initializer())
b5 = tf.Variable(tf.random_normal([10]))
hypothesis = tf.matmul(L4, W5) + b5
In [7]:
# define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

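# initialize (variables must be initialized before the first sess.run)
sess = tf.Session()
sess.run(tf.global_variables_initializer())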

# train my model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {X: batch_xs, Y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

print('Learning Finished!')

# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Accuracy:', sess.run(accuracy, feed_dict={
      X: mnist.test.images, Y: mnist.test.labels}))

# Get one and predict
r = random.randint(0, mnist.test.num_examples - 1)
print("Label: ", sess.run(tf.argmax(mnist.test.labels[r:r + 1], 1)))
print("Prediction: ", sess.run(
    tf.argmax(hypothesis, 1), feed_dict={X: mnist.test.images[r:r + 1]}))

# plt.imshow(mnist.test.images[r:r + 1].
#           reshape(28, 28), cmap='Greys', interpolation='nearest')
# plt.show()
Epoch: 0001 cost = 0.300793265
Epoch: 0002 cost = 0.103306956
Epoch: 0003 cost = 0.070477889
Epoch: 0004 cost = 0.052671541
Epoch: 0005 cost = 0.039592809
Epoch: 0006 cost = 0.035387814
Epoch: 0007 cost = 0.030010276
Epoch: 0008 cost = 0.025740681
Epoch: 0009 cost = 0.022673877
Epoch: 0010 cost = 0.019972242
Epoch: 0011 cost = 0.018704831
Epoch: 0012 cost = 0.017537554
Epoch: 0013 cost = 0.015988760
Epoch: 0014 cost = 0.015692382
Epoch: 0015 cost = 0.016034859
Learning Finished!
Accuracy: 0.9798
Label:  [6]
Prediction:  [6]

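We stacked the network wider and deeper, and the accuracy dropped by 0.004.
That is the scenario this was supposed to follow, but it actually came out higher (0.9798 versus 0.978).
If it had dropped, why would that happen?
It depends on the data, but here the cause would be overfitting.
One way to prevent overfitting is dropout.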
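One rough way to see why the wider, deeper network has more room to overfit is to count its trainable parameters. The sketch below does the arithmetic for the two architectures in this post; `count_parameters` is a hypothetical helper, and parameter count is only a crude proxy for overfitting risk.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected net with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

shallow = count_parameters([784, 256, 256, 10])         # 3-layer network
deep = count_parameters([784, 512, 512, 512, 512, 10])  # 5-layer network

print(shallow)  # 269322
print(deep)     # 1195018
```

The 5-layer network has a bit more than four times as many parameters to fit on the same training set.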


Dropout for MNIST


In TensorFlow you just add one more layer called dropout.
Add a dropout layer after L1, another after L2, and so on.
keep_prob sets how much to keep. (Set it to 1 when testing!)
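As a rough picture of what a dropout layer does, here is a minimal pure-Python sketch of inverted dropout: each unit is kept with probability keep_prob, and the survivors are scaled by 1 / keep_prob. The `dropout` function and the sample activations are illustrative only and are not TensorFlow's implementation.

```python
import random

def dropout(activations, keep_prob, rng):
    """Keep each unit with probability keep_prob and scale the survivors
    by 1 / keep_prob so the expected value stays the same."""
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(777)
layer_output = [0.5, 1.0, 1.5, 2.0]  # hypothetical activations of one layer

train_out = dropout(layer_output, 0.7, rng)  # units are zeroed at random
test_out = dropout(layer_output, 1.0, rng)   # keep_prob = 1: nothing is dropped

print("train:", train_out)
print("test: ", test_out)
```

With keep_prob set to 1 the layer passes its input through unchanged, which is why testing uses 1: the whole network should take part in the prediction.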

In [10]:
# dropout (keep_prob) rate  0.7 on training, but should be 1 for testing
keep_prob = tf.placeholder(tf.float32)

# weights & bias for nn layers
# http://stackoverflow.com/questions/33640581/how-to-do-xavier-initialization-on-tensorflow
W1 = tf.get_variable("W1_1", shape=[784, 512],
                     initializer=tf.contrib.layers.xavier_initializer())
b1 = tf.Variable(tf.random_normal([512]))
L1 = tf.nn.relu(tf.matmul(X, W1) + b1)
L1 = tf.nn.dropout(L1, keep_prob=keep_prob)

W2 = tf.get_variable("W2_1", shape=[512, 512],
                     initializer=tf.contrib.layers.xavier_initializer())
b2 = tf.Variable(tf.random_normal([512]))
L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)
L2 = tf.nn.dropout(L2, keep_prob=keep_prob)

W3 = tf.get_variable("W3_1", shape=[512, 512],
                     initializer=tf.contrib.layers.xavier_initializer())
b3 = tf.Variable(tf.random_normal([512]))
L3 = tf.nn.relu(tf.matmul(L2, W3) + b3)
L3 = tf.nn.dropout(L3, keep_prob=keep_prob)

W4 = tf.get_variable("W4_1", shape=[512, 512],
                     initializer=tf.contrib.layers.xavier_initializer())
b4 = tf.Variable(tf.random_normal([512]))
L4 = tf.nn.relu(tf.matmul(L3, W4) + b4)
L4 = tf.nn.dropout(L4, keep_prob=keep_prob)

W5 = tf.get_variable("W5_1", shape=[512, 10],
                     initializer=tf.contrib.layers.xavier_initializer())
b5 = tf.Variable(tf.random_normal([10]))
hypothesis = tf.matmul(L4, W5) + b5
WARNING:tensorflow:From <ipython-input-10-4b39660c2b49>:10: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
In [11]:
# define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

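# initialize (variables must be initialized before the first sess.run)
sess = tf.Session()
sess.run(tf.global_variables_initializer())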

# train my model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {X: batch_xs, Y: batch_ys, keep_prob: 0.7}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

print('Learning Finished!')

# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Accuracy:', sess.run(accuracy, feed_dict={
      X: mnist.test.images, Y: mnist.test.labels, keep_prob: 1}))

# Get one and predict
r = random.randint(0, mnist.test.num_examples - 1)
print("Label: ", sess.run(tf.argmax(mnist.test.labels[r:r + 1], 1)))
print("Prediction: ", sess.run(
    tf.argmax(hypothesis, 1), feed_dict={X: mnist.test.images[r:r + 1], keep_prob: 1}))

# plt.imshow(mnist.test.images[r:r + 1].
#           reshape(28, 28), cmap='Greys', interpolation='nearest')
# plt.show()
Epoch: 0001 cost = 0.479064576
Epoch: 0002 cost = 0.169453053
Epoch: 0003 cost = 0.129123473
Epoch: 0004 cost = 0.105926294
Epoch: 0005 cost = 0.092658146
Epoch: 0006 cost = 0.080533782
Epoch: 0007 cost = 0.074052478
Epoch: 0008 cost = 0.066805487
Epoch: 0009 cost = 0.062837852
Epoch: 0010 cost = 0.057148129
Epoch: 0011 cost = 0.054961414
Epoch: 0012 cost = 0.054889232
Epoch: 0013 cost = 0.048306230
Epoch: 0014 cost = 0.047772515
Epoch: 0015 cost = 0.043271988
Learning Finished!
Accuracy: 0.9809
Label:  [3]
Prediction:  [3]

We got past 98%. A great result!




train = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

  • tf.train.AdadeltaOptimizer
  • tf.train.AdagradOptimizer
  • tf.train.AdagradDAOptimizer
  • tf.train.MomentumOptimizer
  • tf.train.AdamOptimizer
  • tf.train.FtrlOptimizer
  • tf.train.ProximalGradientDescentOptimizer
  • tf.train.ProximalAdagradOptimizer
  • tf.train.RMSPropOptimizer

There are many kinds of optimizers, so test which one trains best.


You can test the optimizers one kind at a time.

[Figure: comparison of how quickly the cost decreases for each optimizer]

The faster the cost decreases, the better the optimizer suits the data.
As you can see, the optimizer called Adam brings the cost down quickly. It is widely used, so it is worth knowing.


Use Adam Optimizer


To use it, you only have to swap the optimizer name in place of gradient descent.

# define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
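Only the name changes in the TensorFlow code, but the update rule behind it is different. Below is a minimal pure-Python sketch of plain gradient descent and of the Adam update on a toy cost f(x) = x**2. The toy cost, the starting point, and the step count are assumptions for illustration; the beta1, beta2, and eps defaults are the commonly used Adam values, and this is not TensorFlow's code.

```python
import math

def grad(x):
    """Gradient of the toy cost f(x) = x**2."""
    return 2.0 * x

def gradient_descent(x, lr, steps):
    """Plain gradient descent: step against the gradient, scaled by lr."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def adam(x, lr, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: scale each step using running averages of the gradient and its square."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

x_gd = gradient_descent(5.0, lr=0.1, steps=500)
x_adam = adam(5.0, lr=0.1, steps=500)

print("gradient descent:", x_gd)
print("adam:            ", x_adam)
```

Both end up near the minimum at 0 on this toy problem, so it says nothing about which is faster on MNIST. It only shows what is being swapped when the optimizer name changes.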

Exercise: Batch Normalization


A method for normalizing the inputs to each layer well, and one that many people use. Take a look at it and, as an exercise, see whether you can push the performance above 98%.
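As a starting point for the exercise, here is a minimal pure-Python sketch of the training-time batch normalization step for a single feature: subtract the batch mean, divide by the batch standard deviation, then scale by gamma and shift by beta. The `batch_norm` function and the sample batch are hypothetical, and a real layer also tracks running statistics for test time, which this sketch leaves out.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift it."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

batch = [1.0, 2.0, 3.0, 4.0]  # hypothetical values of one feature in a mini-batch
normalized = batch_norm(batch)

out_mean = sum(normalized) / len(normalized)
out_var = sum((x - out_mean) ** 2 for x in normalized) / len(normalized)

print("normalized:", normalized)
print("mean:", out_mean, "variance:", out_var)
```

After normalization the feature has a mean of about 0 and a variance of about 1 within the batch. In a real layer gamma and beta are learned, so the network can undo the normalization where that helps.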

