ReLU is better than Sigmoid

Review

from PIL import Image
Image.open('xor.png')

Activation function

A function that "fires" (activates) once the incoming value exceeds some threshold, and stays inactive when the stimulus is small.
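A minimal plain-Python sketch of that idea, using only the standard library. The hard `step` function and its threshold of 0 are illustrative choices, not something defined in this post. `sigmoid` is the smooth version that the TensorFlow code below uses.

```python
import math

def step(x: float, threshold: float = 0.0) -> float:
    """Hard activation: fires (1.0) only when the input exceeds the threshold."""
    return 1.0 if x > threshold else 0.0

def sigmoid(x: float) -> float:
    """Smooth activation: squashes any input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-5.0, -0.5, 0.5, 5.0):
    print(f"x={x:+.1f}  step={step(x):.0f}  sigmoid={sigmoid(x):.3f}")
```

The step function jumps from 0 to 1. Sigmoid makes the same transition smoothly, which is what makes it differentiable for backpropagation.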

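import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.random_uniform)

X = tf.placeholder(tf.float32, [None, 2])  # XOR inputs: two features per example

# Layer 1: 2 inputs -> 2 hidden units; layer 2: 2 hidden units -> 1 output
W1 = tf.Variable(tf.random_uniform([2,2], -1.0, 1.0))
W2 = tf.Variable(tf.random_uniform([2,1], -1.0, 1.0))

b1 = tf.Variable(tf.zeros([2]), name='Bias1')
b2 = tf.Variable(tf.zeros([1]), name='Bias2')

# Our hypothesis
L2 = tf.sigmoid(tf.matmul(X, W1) + b1)
hypothesis = tf.sigmoid(tf.matmul(L2, W2) + b2)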
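To make the 2-2-1 sigmoid structure concrete without TensorFlow, here is a plain-Python forward pass of the same shape. The weights are hand-picked for illustration, not learned. Hidden unit 0 approximates OR, hidden unit 1 approximates NAND, and the output unit approximates AND, which together give XOR. Finding such weights automatically is what the training in the TensorFlow version is for.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hand-picked (hypothetical) weights, not the result of training.
W1 = [[20.0, -20.0],   # row i = input i, column j = hidden unit j
      [20.0, -20.0]]
b1 = [-10.0, 30.0]     # unit 0 ~ OR, unit 1 ~ NAND
W2 = [20.0, 20.0]      # both hidden units -> single output
b2 = -30.0             # output ~ AND of the two hidden units

def forward(x1: float, x2: float) -> float:
    """Mirrors L2 = sigmoid(X W1 + b1); hypothesis = sigmoid(L2 W2 + b2)."""
    x = [x1, x2]
    hidden = [sigmoid(sum(x[i] * W1[i][j] for i in range(2)) + b1[j]) for j in range(2)]
    return sigmoid(sum(hidden[j] * W2[j] for j in range(2)) + b2)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(forward(a, b), 4))
```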

Let's go deep & wide!

Image.open('3dan.png')

3 layers

W1 = tf.Variable(tf.random_uniform([2,5], -1.0, 1.0))
W2 = tf.Variable(tf.random_uniform([5,4], -1.0, 1.0))
W3 = tf.Variable(tf.random_uniform([4,1], -1.0, 1.0))

b1 = tf.Variable(tf.zeros([5]), name='Bias1')
b2 = tf.Variable(tf.zeros([4]), name='Bias2')
b3 = tf.Variable(tf.zeros([1]), name='Bias3')

# Our hypothesis
L2 = tf.sigmoid(tf.matmul(X, W1) + b1)
L3 = tf.sigmoid(tf.matmul(L2, W2) + b2)
hypothesis = tf.sigmoid(tf.matmul(L3, W3) + b3)  # use L3 (4 units) to match W3's [4, 1] shape
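Each weight matrix's first dimension has to equal the width of the layer feeding into it. That is why the output layer must consume L3 (4 units) and not L2 (5 units). The small plain-Python helper below traces activation shapes through a chain of matmuls and reports the first mismatch. `trace_shapes` is a hypothetical name written for illustration, and it ignores biases.

```python
def trace_shapes(batch: int, in_dim: int, weight_shapes):
    """Return the activation shape after each matmul, or raise on a mismatch."""
    shape = (batch, in_dim)
    shapes = [shape]
    for k, (rows, cols) in enumerate(weight_shapes, start=1):
        if shape[1] != rows:
            raise ValueError(f"W{k} expects {rows} inputs but the previous layer has {shape[1]}")
        shape = (batch, cols)   # (batch, rows) @ (rows, cols) -> (batch, cols)
        shapes.append(shape)
    return shapes

# The 3-layer network above: 2 -> 5 -> 4 -> 1, with a batch of 4 XOR examples
print(trace_shapes(4, 2, [(2, 5), (5, 4), (4, 1)]))
```

Feeding the 5-wide L2 straight into a [4, 1] matrix corresponds to `trace_shapes(4, 2, [(2, 5), (4, 1)])`, which raises `ValueError`.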

9 hidden layers!

W1 = tf.Variable(tf.random_uniform([2,5], -1.0, 1.0), name = "Weight1")

W2 = tf.Variable(tf.random_uniform([5,5], -1.0, 1.0), name = "Weight2")
W3 = tf.Variable(tf.random_uniform([5,5], -1.0, 1.0), name = "Weight3")
W4 = tf.Variable(tf.random_uniform([5,5], -1.0, 1.0), name = "Weight4")
W5 = tf.Variable(tf.random_uniform([5,5], -1.0, 1.0), name = "Weight5")
W6 = tf.Variable(tf.random_uniform([5,5], -1.0, 1.0), name = "Weight6")
W7 = tf.Variable(tf.random_uniform([5,5], -1.0, 1.0), name = "Weight7")
W8 = tf.Variable(tf.random_uniform([5,5], -1.0, 1.0), name = "Weight8")
W9 = tf.Variable(tf.random_uniform([5,5], -1.0, 1.0), name = "Weight9")
W10 = tf.Variable(tf.random_uniform([5,5], -1.0, 1.0), name = "Weight10")

W11 = tf.Variable(tf.random_uniform([5,1], -1.0, 1.0), name = "Weight11")


b1 = tf.Variable(tf.zeros([5]), name='Bias1')
b2 = tf.Variable(tf.zeros([5]), name='Bias2')
b3 = tf.Variable(tf.zeros([5]), name='Bias3')
b4 = tf.Variable(tf.zeros([5]), name='Bias4')
b5 = tf.Variable(tf.zeros([5]), name='Bias5')
b6 = tf.Variable(tf.zeros([5]), name='Bias6')
b7 = tf.Variable(tf.zeros([5]), name='Bias7')
b8 = tf.Variable(tf.zeros([5]), name='Bias8')
b9 = tf.Variable(tf.zeros([5]), name='Bias9')
b10 = tf.Variable(tf.zeros([5]), name='Bias10')

b11 = tf.Variable(tf.zeros([1]), name='Bias11')  # one bias per output unit (W11 is [5, 1])


# Our hypothesis
L1 = tf.sigmoid(tf.matmul(X, W1) + b1)
L2 = tf.sigmoid(tf.matmul(L1, W2) + b2)
L3 = tf.sigmoid(tf.matmul(L2, W3) + b3)
L4 = tf.sigmoid(tf.matmul(L3, W4) + b4)
L5 = tf.sigmoid(tf.matmul(L4, W5) + b5)
L6 = tf.sigmoid(tf.matmul(L5, W6) + b6)
L7 = tf.sigmoid(tf.matmul(L6, W7) + b7)
L8 = tf.sigmoid(tf.matmul(L7, W8) + b8)
L9 = tf.sigmoid(tf.matmul(L8, W9) + b9)
L10 = tf.sigmoid(tf.matmul(L9, W10) + b10)

hypothesis = tf.sigmoid(tf.matmul(L10, W11) + b11)
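Writing eleven weight matrices and biases by hand is error-prone, and a bias of the wrong length slips in easily. Below is a plain-Python sketch of the same 2 → (ten hidden layers of 5) → 1 sigmoid stack, built in a loop from a list of layer widths. It is not TensorFlow code. The U(-1, 1) initialization mirrors `tf.random_uniform(..., -1.0, 1.0)` above, and the seed is an arbitrary choice.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def build(widths, seed=0):
    """One (W, b) pair per consecutive pair of widths; W ~ U(-1, 1), b = 0."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(widths, widths[1:]):
        W = [[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
        b = [0.0] * n_out          # bias length must equal n_out
        params.append((W, b))
    return params

def forward(params, x):
    """Apply sigmoid(x W + b) layer by layer and return the final activation."""
    for W, b in params:
        x = [sigmoid(sum(x[i] * W[i][j] for i in range(len(x))) + b[j])
             for j in range(len(b))]
    return x

widths = [2] + [5] * 10 + [1]     # input, ten 5-unit hidden layers, output
params = build(widths)
print(len(params), forward(params, [0.0, 1.0]))
```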

To visualize the graph in TensorBoard, wrap each layer in tf.name_scope so that each layer shows up as its own named group of nodes:

# Our hypothesis
with tf.name_scope("layer1") as scope:
    L1 = tf.sigmoid(tf.matmul(X, W1) + b1)
with tf.name_scope("layer2") as scope:
    L2 = tf.sigmoid(tf.matmul(L1, W2) + b2)
with tf.name_scope("layer3") as scope:
    L3 = tf.sigmoid(tf.matmul(L2, W3) + b3)
with tf.name_scope("layer4") as scope:
    L4 = tf.sigmoid(tf.matmul(L3, W4) + b4)
with tf.name_scope("layer5") as scope:
    L5 = tf.sigmoid(tf.matmul(L4, W5) + b5)
with tf.name_scope("layer6") as scope:
    L6 = tf.sigmoid(tf.matmul(L5, W6) + b6)
with tf.name_scope("layer7") as scope:
    L7 = tf.sigmoid(tf.matmul(L6, W7) + b7)
with tf.name_scope("layer8") as scope:
    L8 = tf.sigmoid(tf.matmul(L7, W8) + b8)
with tf.name_scope("layer9") as scope:
    L9 = tf.sigmoid(tf.matmul(L8, W9) + b9)
with tf.name_scope("layer10") as scope:
    L10 = tf.sigmoid(tf.matmul(L9, W10) + b10)

with tf.name_scope("layer11") as scope:
    hypothesis = tf.sigmoid(tf.matmul(L10, W11) + b11)

Image.open('tensorboard.png')

Let's run it. Poor results?

Image.open('result.png')

Why is the accuracy only 0.5 with this many layers, when even the 2-layer network reached 1.0?

Image.open('tensorboard1.png')

The accuracy bounced up and down and finally settled at 0.5. Why did this happen?

Image.open('back.png')

Going back to the 1986 backpropagation algorithm: networks with 2 or 3 layers trained well, but once the depth passed 9 or 10 layers, training stopped working.

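During backpropagation, differentiating the whole network at once is hard, so the derivative is computed one layer at a time and the pieces are multiplied together (the chain rule). Because the chain rule multiplies many very small values, the product becomes tiny. With 2 or 3 layers this is fine, but the deeper the network, the smaller the product gets. Eventually the early layers barely influence the output and receive almost no learning signal.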
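Here is a quick numerical sketch of the effect. The derivative of sigmoid is σ(x)(1 − σ(x)), which is at most 0.25 (at x = 0). Even in that best case, the chain rule contributes one such factor per layer, so the activation part of the gradient shrinks like 0.25^n. This sketch ignores the weight terms, which also enter the real product. Treat it as an illustration rather than an exact gradient.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); largest value 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def chained_grad(n_layers: int, x: float = 0.0) -> float:
    """Product of n sigmoid derivatives, as the chain rule would multiply them."""
    g = 1.0
    for _ in range(n_layers):
        g *= sigmoid_grad(x)
    return g

for n in (2, 3, 10):
    print(n, chained_grad(n))
```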

This is called the vanishing gradient problem (NN winter 2: 1986-2006).

Image.open('vanishing.png')

The problem was eventually solved once researchers concluded that using the sigmoid function had been the mistake.

Image.open('odap.png')

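The problem with sigmoid is that its values are always smaller than 1. Its output lies in (0, 1), and its derivative σ(x)(1 − σ(x)) is at most 0.25, so backpropagation keeps multiplying by numbers smaller than 1.
The fix was a function whose gradient does not shrink this way: ReLU (Rectified Linear Unit), f(x) = max(0, x), whose derivative is 1 for every positive input.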
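A plain-Python sketch comparing the chained derivative through ten layers for ReLU and sigmoid. It makes an idealized assumption: every layer sees the same positive pre-activation (x = 2.0, chosen arbitrarily). The value of ReLU's derivative at exactly 0 is a convention, and 0 is used here.

```python
import math

def relu(x: float) -> float:
    return max(0.0, x)

def relu_grad(x: float) -> float:
    # Derivative is 1 for x > 0 and 0 for x < 0; using 0.0 at x == 0 is a convention.
    return 1.0 if x > 0 else 0.0

def sigmoid_grad(x: float) -> float:
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

x = 2.0        # arbitrary positive pre-activation, reused at every layer
depth = 10
relu_chain = relu_grad(x) ** depth
sigmoid_chain = sigmoid_grad(x) ** depth
print(f"ReLU: {relu_chain}, sigmoid: {sigmoid_chain:.3e}")
```

For positive inputs the ReLU factor stays at 1 no matter how deep the network is. The trade-off is that a unit whose input stays negative passes no gradient at all.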

Image.open('ReLU.png')

Using it is simple: put ReLU wherever sigmoid used to be.
For the hidden layers of a neural network, sigmoid is no longer a good choice. Use ReLU instead.

Image.open('ReLU1.png')

# Our hypothesis (hidden layers use ReLU; TensorFlow's ReLU op is tf.nn.relu)
with tf.name_scope("layer1") as scope:
    L1 = tf.nn.relu(tf.matmul(X, W1) + b1)
with tf.name_scope("layer2") as scope:
    L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)
with tf.name_scope("layer3") as scope:
    L3 = tf.nn.relu(tf.matmul(L2, W3) + b3)
with tf.name_scope("layer4") as scope:
    L4 = tf.nn.relu(tf.matmul(L3, W4) + b4)
with tf.name_scope("layer5") as scope:
    L5 = tf.nn.relu(tf.matmul(L4, W5) + b5)
with tf.name_scope("layer6") as scope:
    L6 = tf.nn.relu(tf.matmul(L5, W6) + b6)
with tf.name_scope("layer7") as scope:
    L7 = tf.nn.relu(tf.matmul(L6, W7) + b7)
with tf.name_scope("layer8") as scope:
    L8 = tf.nn.relu(tf.matmul(L7, W8) + b8)
with tf.name_scope("layer9") as scope:
    L9 = tf.nn.relu(tf.matmul(L8, W9) + b9)
with tf.name_scope("layer10") as scope:
    L10 = tf.nn.relu(tf.matmul(L9, W10) + b10)

# Output layer keeps sigmoid so the hypothesis stays in (0, 1)
with tf.name_scope("layer11") as scope:
    hypothesis = tf.sigmoid(tf.matmul(L10, W11) + b11)

The last layer still uses sigmoid, because the final output must be a value between 0 and 1.
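The reason is easy to see numerically. ReLU is unbounded above, so its output cannot be read as a probability, while sigmoid always lands in (0, 1). The sketch assumes a binary cross-entropy cost, the usual choice for this XOR setup. That cost takes log(h) and log(1 − h), so it is undefined once h ≥ 1. The logit value of 3.0 is a hypothetical example.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def relu(x: float) -> float:
    return max(0.0, x)

def binary_cross_entropy(h: float, y: float) -> float:
    """-[y log h + (1 - y) log(1 - h)]; requires 0 < h < 1."""
    return -(y * math.log(h) + (1.0 - y) * math.log(1.0 - h))

logit = 3.0                      # hypothetical pre-activation of the output unit
print(relu(logit))               # 3.0: not a valid probability
print(sigmoid(logit))            # inside (0, 1)
print(binary_cross_entropy(sigmoid(logit), 1.0))
```

Calling `binary_cross_entropy(relu(logit), 1.0)` instead raises `ValueError` (math domain error), because log(1 − 3.0) is undefined.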

Image.open('result2.png')

Image.open('haha.png')

ReLU vs. sigmoid: the difference in cost

Image.open('vs.png')

ReLU works remarkably well. That raised the question of what happens if it is modified slightly, and a variety of derived activation functions followed.
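Leaky ReLU and ELU are two commonly cited examples of such variants. The definitions and default constants below (slope 0.01, α = 1.0) are common textbook choices picked for illustration, not values taken from this post.

```python
import math

def relu(x: float) -> float:
    return max(0.0, x)

def leaky_relu(x: float, slope: float = 0.01) -> float:
    """Like ReLU, but negative inputs keep a small slope instead of becoming 0."""
    return x if x > 0 else slope * x

def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential Linear Unit: curves smoothly toward -alpha for negative inputs."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

for x in (-2.0, -0.5, 0.0, 1.5):
    print(f"{x:+.1f}  relu={relu(x):+.3f}  leaky={leaky_relu(x):+.3f}  elu={elu(x):+.3f}")
```

All three behave identically for positive inputs. They differ only in how much signal (and gradient) they let through for negative inputs.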

Image.open('vv.png')

Image.open('10.png')

Source: https://www.inflearn.com/course/%EA%B8%B0%EB%B3%B8%EC%A0%81%EC%9D%B8-%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D-%EB%94%A5%EB%9F%AC%EB%8B%9D-%EA%B0%95%EC%A2%8C/lecture/3411

Study blog of 조환희
