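Last time we used an RNN to learn a long sentence automatically, but the results were not good.
The reason is that the RNN cell was too small to handle that much complex data.
A core idea of deep learning is that networks should go wide and deep.
We only had a single layer of RNN cells. Is there a way to stack more?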

Stacked RNN¶

X = tf.placeholder(tf.int32, [None, seq_length])
Y = tf.placeholder(tf.int32, [None, seq_length])

# One-hot encoding
X_one_hot = tf.one_hot(X, num_classes)
print(X_one_hot) # check out the shape

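# Make an LSTM cell with hidden_size (each unit's output vector size).
# This is the only part that needs to change.
# Previously we built just one layer:
#   cell = rnn.BasicLSTMCell(hidden_size, state_is_tuple=True)
# To stack two layers, pick hidden_size, build one basic cell per layer,
# and wrap them in MultiRNNCell. The lecture's shortcut `[cell] * 2` reuses a
# single cell object for both layers, which newer TensorFlow 1.x releases
# either reject or treat as shared weights, so each layer gets its own cell.
def lstm_cell():
    return rnn.BasicLSTMCell(hidden_size, state_is_tuple=True)

cell = rnn.MultiRNNCell([lstm_cell() for _ in range(2)], state_is_tuple=True)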

# outputs: unfolding size x hidden size, state = hidden size
outputs, _states = tf.nn.dynamic_rnn(cell, X_one_hot, dtype=tf.float32)
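
What stacking means can be sketched without TensorFlow. The NumPy snippet below is a minimal illustration only: it uses plain tanh RNN layers instead of LSTM cells, and the helper `rnn_layer` and all sizes are made up for the example. The point it shows is that the second layer consumes the full output sequence of the first, and that each layer has its own parameters.

```python
import numpy as np


def rnn_layer(inputs, w_x, w_h, b):
    """Run a plain tanh RNN over inputs of shape [batch, time, in_dim]."""
    batch, steps, _ = inputs.shape
    h = np.zeros((batch, w_h.shape[0]))
    outputs = []
    for t in range(steps):
        # New hidden state from the current input and the previous state.
        h = np.tanh(inputs[:, t, :] @ w_x + h @ w_h + b)
        outputs.append(h)
    return np.stack(outputs, axis=1)  # [batch, time, hidden]


rng = np.random.default_rng(0)
batch, steps, in_dim, hidden = 3, 10, 25, 25  # illustrative sizes only

x = rng.normal(size=(batch, steps, in_dim))

# One independent parameter set per layer (two layers here).
layer_params = []
for layer_in in (in_dim, hidden):
    layer_params.append((
        rng.normal(scale=0.1, size=(layer_in, hidden)),
        rng.normal(scale=0.1, size=(hidden, hidden)),
        np.zeros(hidden),
    ))

# Stacking: the output sequence of one layer is the input of the next.
stacked = x
for w_x, w_h, b in layer_params:
    stacked = rnn_layer(stacked, w_x, w_h, b)

print(stacked.shape)
```

The final shape is still [batch, time, hidden], which is why the code after the RNN does not need to change when layers are added.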

Softmax¶

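Earlier, with CNNs, we ran all the convolutional layers and then added a fully connected layer at the end.

Likewise, rather than using the RNN's outputs directly, let's attach a softmax layer. The RNN is drawn unrolled, but it is really a single cell. A sequence length of 100 does not mean we need 100 softmax layers; one is enough. We flatten the RNN outputs, pass them through that single softmax layer, and then unfold the result again.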

Flattening into one batch¶

X_for_softmax = tf.reshape(outputs, [-1, hidden_size])  # outputs from the RNN

Unfolding again¶

outputs = tf.reshape(outputs, [batch_size, seq_length, num_classes])  # outputs from the softmax layer
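
The flatten-then-unfold trick can be checked with a small NumPy sketch. The sizes and random values here are arbitrary assumptions for illustration; the snippet compares one shared projection applied to the flattened tensor against the same projection applied separately at every time step.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, seq_length, hidden_size, num_classes = 4, 10, 8, 5  # illustrative

rnn_outputs = rng.normal(size=(batch_size, seq_length, hidden_size))
w = rng.normal(size=(hidden_size, num_classes))
b = rng.normal(size=num_classes)

# Flatten: every (example, time step) pair becomes one row.
flat = rnn_outputs.reshape(-1, hidden_size)  # [batch * seq, hidden]
# One shared layer, then unfold back to [batch, seq, classes].
logits = (flat @ w + b).reshape(batch_size, seq_length, num_classes)

# Reference: apply the same layer separately at each time step.
per_step = np.stack(
    [rnn_outputs[:, t, :] @ w + b for t in range(seq_length)], axis=1)

print(np.allclose(logits, per_step))
```

Both routes give the same values, so a single set of weights is enough regardless of the sequence length.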

# (optional) softmax layer
X_for_softmax = tf.reshape(outputs, [-1, hidden_size])

softmax_w = tf.get_variable('softmax_w', [hidden_size, num_classes])
# [size coming out of the RNN: input size, size of the one-hot being predicted: output size]

softmax_b = tf.get_variable('softmax_b', [num_classes])
# The bias has the same size as the output.

outputs = tf.matmul(X_for_softmax, softmax_w) + softmax_b
# This output has not passed through an activation function!

outputs = tf.reshape(outputs, [batch_size, seq_length, num_classes])
# So this is the output that should be passed as the logits.
# The output that comes straight from the RNN has already gone through an activation function, so using it as logits gives wrong values.
# That is why we add a softmax layer: its output has not gone through an activation function, so we use it as the logits (the softmax itself is applied inside the loss).

The resulting output is then fed into the sequence loss.
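
A small worked example shows why raw LSTM outputs make poor logits. An LSTM output is a sigmoid gate times a tanh, so every value lies strictly between -1 and 1. The sketch below assumes 25 classes purely for illustration and computes the highest probability a softmax could assign to the correct class if its inputs were limited to that range.

```python
import math

num_classes = 25  # assumed value for illustration

# Best case with bounded inputs: the correct class sits at +1 and every
# other class at -1. Real LSTM outputs never quite reach these limits.
best_prob = math.exp(1.0) / (
    math.exp(1.0) + (num_classes - 1) * math.exp(-1.0))

# Cross-entropy loss at that best case; bounded logits cannot go below it.
min_loss = -math.log(best_prob)

print(best_prob, min_loss)
```

Under these assumptions the correct class can never receive even a quarter of the probability mass, so the loss is stuck well above zero. An unbounded linear layer on top of the RNN removes that ceiling.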

Loss¶

# reshape out for sequence_loss
outputs = tf.reshape(outputs, [batch_size, seq_length, num_classes])
# All weights are 1 (equal weights)
weights = tf.ones([batch_size, seq_length])

sequence_loss = tf.contrib.seq2seq.sequence_loss(
    logits=outputs, targets=Y, weights=weights)
mean_loss = tf.reduce_mean(sequence_loss)

train_op = tf.train.AdamOptimizer(learning_rate=0.1).minimize(mean_loss)
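
As a rough NumPy sketch of what the sequence loss computes when it is called as above with its default averaging, the function below takes a softmax cross-entropy at every time step and averages it using the weights. The name `sequence_loss_np` and the toy inputs are assumptions made for this illustration, not part of TensorFlow.

```python
import numpy as np


def sequence_loss_np(logits, targets, weights):
    """Weighted mean cross-entropy.

    logits: [batch, time, classes]; targets and weights: [batch, time].
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the target class at every position.
    b_idx, t_idx = np.indices(targets.shape)
    nll = -log_probs[b_idx, t_idx, targets]
    return (nll * weights).sum() / weights.sum()


targets = np.array([[0, 1, 2], [3, 0, 1]])
weights = np.ones((2, 3))  # all positions count equally

# All-zero logits mean a uniform prediction over 4 classes: loss = log(4).
uniform_loss = sequence_loss_np(np.zeros((2, 3, 4)), targets, weights)

# Large logits on the correct classes drive the loss toward zero.
confident_loss = sequence_loss_np(20.0 * np.eye(4)[targets], targets, weights)

print(uniform_loss, confident_loss)
```

With all weights set to 1, every character position contributes equally, which matches the `tf.ones([batch_size, seq_length])` weights used here.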

Training and print results¶

[In]

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for i in range(500):
    _, l, results = sess.run(
                [train_op, mean_loss, outputs],
                feed_dict={X: dataX, Y: dataY})

    for j, result in enumerate(results):
        index = np.argmax(result, axis=1)
        print(i, j, ''.join([char_set[t] for t in index]), l)

[Out]

0 167 tttttttttt 3.23111
0 168 tttttttttt 3.23111
0 169 tttttttttt 3.23111
...
499 167 oof the se 0.229306
499 168 tf the sea 0.229306
499 169 n the sea. 0.229306

We got the result we wanted. Now let's gather the scattered batches back into one sentence.

[In]

# Let's print the last char of each result to check it works
results = sess.run(outputs, feed_dict={X: dataX})
for j, result in enumerate(results):
    index = np.argmax(result, axis=1)
    if j == 0: # print all for the first result to make a sentence
        print(''.join([char_set[t] for t in index]), end='')
    else:
        print(char_set[index[-1]], end='')

[Out]

g you want to build a ship, don't drum up people together to collect wood and don't assign them tasks and work, but rather teach them to long for the endless immensity of the sea.
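
The stitching rule used above (the whole first window, then only the last character of each later window) can be checked on the target windows themselves. This sketch uses a shortened sentence as an assumption to keep it small; the windows are built the same way as `dataY`.

```python
sentence = "if you want to build a ship"  # shortened for illustration
sequence_length = 10

# Target windows: each one is the input window shifted right by one char.
targets = [sentence[i + 1: i + sequence_length + 1]
           for i in range(len(sentence) - sequence_length)]

# Keep the whole first window, then only the newest (last) character of
# every following window, because the rest overlaps what we already have.
pieces = [targets[0]] + [window[-1] for window in targets[1:]]
stitched = "".join(pieces)

print(stitched)
```

With perfect predictions the stitched text is the sentence without its first character, since nothing ever predicts that character. That is why the printed result starts one position in, and why its very first letter, predicted from a single character of context, can be wrong.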

Implementation¶

from __future__ import print_function

import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn

tf.set_random_seed(777)  # reproducibility

sentence = ("if you want to build a ship, don't drum up people together to "
            "collect wood and don't assign them tasks and work, but rather "
            "teach them to long for the endless immensity of the sea.")

char_set = list(set(sentence))
char_dic = {w: i for i, w in enumerate(char_set)}

data_dim = len(char_set)
hidden_size = len(char_set)
num_classes = len(char_set)
sequence_length = 10  # Any arbitrary number
learning_rate = 0.1

dataX = []
dataY = []
for i in range(0, len(sentence) - sequence_length):
    x_str = sentence[i:i + sequence_length]
    y_str = sentence[i + 1: i + sequence_length + 1]
    print(i, x_str, '->', y_str)

    x = [char_dic[c] for c in x_str]  # x str to index
    y = [char_dic[c] for c in y_str]  # y str to index

    dataX.append(x)
    dataY.append(y)

batch_size = len(dataX)

X = tf.placeholder(tf.int32, [None, sequence_length])
Y = tf.placeholder(tf.int32, [None, sequence_length])

# One-hot encoding
X_one_hot = tf.one_hot(X, num_classes)
print(X_one_hot)  # check out the shape


# Make a lstm cell with hidden_size (each unit output vector size)
def lstm_cell():
    cell = rnn.BasicLSTMCell(hidden_size, state_is_tuple=True)
    return cell

multi_cells = rnn.MultiRNNCell([lstm_cell() for _ in range(2)], state_is_tuple=True)

# outputs: unfolding size x hidden size, state = hidden size
outputs, _states = tf.nn.dynamic_rnn(multi_cells, X_one_hot, dtype=tf.float32)

# FC layer
X_for_fc = tf.reshape(outputs, [-1, hidden_size])
outputs = tf.contrib.layers.fully_connected(X_for_fc, num_classes, activation_fn=None)

# reshape out for sequence_loss
outputs = tf.reshape(outputs, [batch_size, sequence_length, num_classes])

# All weights are 1 (equal weights)
weights = tf.ones([batch_size, sequence_length])

sequence_loss = tf.contrib.seq2seq.sequence_loss(
    logits=outputs, targets=Y, weights=weights)
mean_loss = tf.reduce_mean(sequence_loss)
train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(mean_loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for i in range(500):
    _, l, results = sess.run(
        [train_op, mean_loss, outputs], feed_dict={X: dataX, Y: dataY})
    for j, result in enumerate(results):
        index = np.argmax(result, axis=1)
        print(i, j, ''.join([char_set[t] for t in index]), l)

# Let's print the last char of each result to check it works
results = sess.run(outputs, feed_dict={X: dataX})
for j, result in enumerate(results):
    index = np.argmax(result, axis=1)
    if j == 0:  # print all for the first result to make a sentence
        print(''.join([char_set[t] for t in index]), end='')
    else:
        print(char_set[index[-1]], end='')

0 if you wan -> f you want
1 f you want ->  you want 
2  you want  -> you want t
3 you want t -> ou want to
4 ou want to -> u want to 
5 u want to  ->  want to b
...
499 165 fy of the  0.22926518
499 166 h of the s 0.22926518
499 167 oof the se 0.22926518
499 168 tf the sea 0.22926518
499 169 n the sea. 0.22926518
l you want to build a ship, don't drum up people together to collect wood and don't assign them tasks and work, but rather teach them to long for the endless immensity of the sea.

Models like this can write novels and can even produce source code on their own.

Various RNN code examples are available at the following pages.¶

https://github.com/sherjilozair/char-rnn-tensorflow
http://github.com/hunkim/word-rnn-tensorflow

Source: https://www.inflearn.com/course/%EA%B8%B0%EB%B3%B8%EC%A0%81%EC%9D%B8-%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D-%EB%94%A5%EB%9F%AC%EB%8B%9D-%EA%B0%95%EC%A2%8C/lecture/3428


Cho Hwan-hee's Study Blog (조환희의 학습 블로그)

Stacked RNN + Softmax Layer