Last time, we trained an RNN on the string "hihello". Back then, every preprocessing step, such as converting the characters to numbers and one-hot encoding, was done by hand. But what should we do when we have to handle not a short word but a large amount of text, such as a book?

 

Better data creation

 
sample = "if you want you"
idx2char = list(set(sample))   # index -> char; set() gives the unique characters, which we turn into a list
char2idx = {c: i for i, c in enumerate(idx2char)}   # char -> idx 

sample_idx = [char2idx[c] for c in sample]   # char to index
x_data = [sample_idx[:-1]]   # X data sample (0 ~ n-1) hello: hell
y_data = [sample_idx[1:]]    # Y label sample (1 ~ n) hello: ello

X = tf.placeholder(tf.int32, [None, sequence_length])   # X data
Y = tf.placeholder(tf.int32, [None, sequence_length])   # Y label

X_one_hot = tf.one_hot(X, num_classes)    # one hot: 1 -> 0 1 0 0 0 0 0 0 0 0; num_classes is the number of unique characters in the sample
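To see what this preprocessing actually produces, here is a minimal sketch that only prints the intermediate values for this sample (note that the character order coming out of set() is not deterministic, so the exact indices will differ from run to run):

import numpy as np

sample = "if you want you"
idx2char = list(set(sample))                       # e.g. ['f', 'o', 'u', ' ', 'n', 'w', 'y', 'a', 't', 'i']
char2idx = {c: i for i, c in enumerate(idx2char)}

sample_idx = [char2idx[c] for c in sample]
x_data = [sample_idx[:-1]]   # indices of "if you want yo"
y_data = [sample_idx[1:]]    # indices of "f you want you"

print(len(idx2char))                                   # 10 unique characters
print(np.array(x_data).shape, np.array(y_data).shape)  # (1, 14) (1, 14)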
 

Hyper parameters

 

The hyperparameters can also be derived automatically from the data.

 
sample = "if you want you"
idx2char = list(set(sample))   # index -> char
char2idx = {c: i for i, c in enumerate(idx2char)}   # char -> idx 

# hyper parameters
dic_size = len(char2idx)    # RNN input size (one hot size)
rnn_hidden_size = len(char2idx)    # RNN output size
num_classes = len(char2idx)    # final output size (RNN or softmax, etc.)
batch_size = 1   # one sample data, one batch
sequence_length = len(sample) - 1    # number of LSTM unfoldings (unit #)
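For this particular sample ("if you want you": 15 characters, 10 of them distinct), these expressions should work out to the values below; a quick sanity check, not part of the original code:

print(dic_size)          # 10 (number of unique characters)
print(rnn_hidden_size)   # 10
print(num_classes)       # 10
print(sequence_length)   # 14 (= 15 - 1)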
 

LSTM and Loss

 
X = tf.placeholder(tf.int32, [None, sequence_length])    # X data
Y = tf.placeholder(tf.int32, [None, sequence_length])    # Y label

X_one_hot = tf.one_hot(X, num_classes)    # one hot: 1 -> 0 1 0 0 0 0 0 0 0 0

cell = tf.contrib.rnn.BasicLSTMCell(num_units=rnn_hidden_size, state_is_tuple=True)
initial_state = cell.zero_state(batch_size, tf.float32)
outputs, _states = tf.nn.dynamic_rnn(
    cell, X_one_hot, initial_state=initial_state, dtype=tf.float32)

weights = tf.ones([batch_size, sequence_length])
sequence_loss = tf.contrib.seq2seq.sequence_loss(logits=outputs, targets=Y, weights=weights)
loss = tf.reduce_mean(sequence_loss)
train = tf.train.AdamOptimizer(learning_rate=0.1).minimize(loss)

prediction = tf.argmax(outputs, axis=2)
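A note on the weights tensor: tf.contrib.seq2seq.sequence_loss computes a cross-entropy value for every time step and multiplies it by the corresponding weight, so all-ones weights simply mean that every character position counts equally. A rough hand-rolled equivalent (a sketch, assuming the default averaging options of sequence_loss) looks like this:

ce = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=Y, logits=outputs)                  # shape: [batch_size, sequence_length]
manual_loss = tf.reduce_sum(ce * weights) / tf.reduce_sum(weights)
# with all-ones weights this is just the mean cross-entropy over all positions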
 

Training and Results

 
[In]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(3000):
        l, _ = sess.run([loss, train], feed_dict={X: x_data, Y: y_data})
        result = sess.run(prediction, feed_dict={X: x_data})
        # print char using dic
        result_str = [idx2char[c] for c in np.squeeze(result)]
        print(i, "loss:", l, "Prediction:", ''.join(result_str))
 
[Out]

0 loss: 2.29895 Prediction: nnuffuunnuuuyuy
1 loss: 2.29675 Prediction: nnuffuunnuuuyuy
...

1418 loss: 1.37351 Prediction: if you want you
1419 loss: 1.37331 Prediction: if you want you
 

Implementation

In [ ]:
import tensorflow as tf
import numpy as np
tf.set_random_seed(777)  # reproducibility

sample = " if you want you"
idx2char = list(set(sample))  # index -> char
char2idx = {c: i for i, c in enumerate(idx2char)}  # char -> index

# hyper parameters
dic_size = len(char2idx)  # RNN input size (one hot size)
rnn_hidden_size = len(char2idx)  # RNN output size
num_classes = len(char2idx)  # final output size (RNN or softmax, etc.)
batch_size = 1  # one sample data, one batch
sequence_length = len(sample) - 1  # number of LSTM unfoldings (unit #)
learning_rate = 0.1

sample_idx = [char2idx[c] for c in sample]  # char to index
x_data = [sample_idx[:-1]]  # X data sample (0 ~ n-1) hello: hell
y_data = [sample_idx[1:]]   # Y label sample (1 ~ n) hello: ello

X = tf.placeholder(tf.int32, [None, sequence_length])  # X data
Y = tf.placeholder(tf.int32, [None, sequence_length])  # Y label

# flatten the data (ignore batches for now). No effect if the batch size is 1
X_one_hot = tf.one_hot(X, num_classes)  # one hot: 1 -> 0 1 0 0 0 0 0 0 0 0
X_for_softmax = tf.reshape(X_one_hot, [-1, rnn_hidden_size])

# softmax layer (rnn_hidden_size -> num_classes)
softmax_w = tf.get_variable("softmax_w", [rnn_hidden_size, num_classes])
softmax_b = tf.get_variable("softmax_b", [num_classes])
outputs = tf.matmul(X_for_softmax, softmax_w) + softmax_b

# expand the data (restore the batch dimension)
outputs = tf.reshape(outputs, [batch_size, sequence_length, num_classes])
weights = tf.ones([batch_size, sequence_length])

# Compute sequence cost/loss
sequence_loss = tf.contrib.seq2seq.sequence_loss(
    logits=outputs, targets=Y, weights=weights)
loss = tf.reduce_mean(sequence_loss)  # mean all sequence loss
train = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

prediction = tf.argmax(outputs, axis=2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(3000):
        l, _ = sess.run([loss, train], feed_dict={X: x_data, Y: y_data})
        result = sess.run(prediction, feed_dict={X: x_data})

        # print char using dic
        result_str = [idx2char[c] for c in np.squeeze(result)]
        print(i, "loss:", l, "Prediction:", ''.join(result_str))
In [ ]:
'''
0 loss: 2.29513 Prediction: yu yny y y oyny
1 loss: 2.10156 Prediction: yu ynu y y oynu
2 loss: 1.92344 Prediction: yu you y u  you
..
2997 loss: 0.277323 Prediction: yf you yant you
2998 loss: 0.277323 Prediction: yf you yant you
2999 loss: 0.277323 Prediction: yf you yant you
'''
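Notice that this implementation contains no recurrent cell at all: each one-hot character goes straight through a single softmax layer, so the model can only learn a fixed character-to-character mapping. In this sample ' ' is followed by 'y' twice but by 'i' and 'w' only once each, so every ' ' gets mapped to 'y', the prediction gets stuck at "yf you yant you", and the loss plateaus around 0.277. A minimal sketch of putting the LSTM from the earlier section back in front of the softmax layer (same variable names as above) would be:

cell = tf.contrib.rnn.BasicLSTMCell(num_units=rnn_hidden_size, state_is_tuple=True)
initial_state = cell.zero_state(batch_size, tf.float32)
rnn_outputs, _states = tf.nn.dynamic_rnn(
    cell, X_one_hot, initial_state=initial_state, dtype=tf.float32)

# flatten the time dimension, apply the softmax layer, then restore the batch shape
X_for_softmax = tf.reshape(rnn_outputs, [-1, rnn_hidden_size])
outputs = tf.matmul(X_for_softmax, softmax_w) + softmax_b
outputs = tf.reshape(outputs, [batch_size, sequence_length, num_classes])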
 

Really long sentence?

 
sentence = ("if you want to build a ship, don't drum up people together to "
            "collect wood and don't assign then tasks and work, but rather "
            "teach them to long for the endless immensity of the sea.")
 
# training dataset
0 if you wan -> f you want
1 f you want ->  you want 
2  you want  -> you want t
3 you want t -> ou want to
...
168  of the se -> of the sea
169 of the sea -> f the sea.
 
char_set = list(set(sentence))
char_dic = {w: i for i, w in enumerate(char_set)}

dataX = []
dataY = []
for i in range(0, len(sentence) - seq_length):
    x_str = sentence[i:i + seq_length]
    y_str = sentence[i + 1: i + seq_length + 1]
    print(i, x_str, '->', y_str)

    x = [char_dic[c] for c in x_str] # x str to index
    y = [char_dic[c] for c in y_str] # y str to index

    dataX.append(x)
    dataY.append(y)
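Each window of seq_length characters becomes one training example, so the loop produces len(sentence) - seq_length windows; for this sentence that is 170 windows (indices 0 through 169, matching the printout above). A quick sanity check:

print(len(sentence))            # 180 characters
print(len(dataX), len(dataY))   # 170 170
print(len(dataX[0]))            # 10 (= seq_length)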
 

RNN parameters

 
char_set = list(set(sentence))
char_dic = {w: i for i, w in enumerate(char_set)}

data_dim = len(char_set)
hidden_size = len(char_set)
num_classes = len(char_set)
seq_length = 10 # Any arbitrary number

batch_size = len(dataX) # 170 (= len(sentence) - seq_length)
 

LSTM and Loss

 
X = tf.placeholder(tf.int32, [None, seq_length]) # X data
Y = tf.placeholder(tf.int32, [None, seq_length]) # Y label

X_one_hot = tf.one_hot(X, num_classes) # one hot: 1 -> 0 1 0 0 0 0 0 0 0 0 

cell = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_size, state_is_tuple=True)
initial_state = cell.zero_state(batch_size, tf.float32)
outputs, _states = tf.nn.dynamic_rnn(
    cell, X_one_hot, initial_state=initial_state, dtype=tf.float32)

weights = tf.ones([batch_size, seq_length])
sequence_loss = tf.contrib.seq2seq.sequence_loss(logits=outputs, targets=Y, weights=weights)
loss = tf.reduce_mean(sequence_loss)
train = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

prediction = tf.argmax(outputs, axis=2)
 

But this does not work well!!

Let's think about the following.

 
  • Run long sequence RNN
  • Why doesn't it work?
 

Hint: the logits, and the fact that the RNN is not deep.
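To make the hint concrete, here is a rough sketch of the direction the fix takes: stack more than one LSTM cell with tf.contrib.rnn.MultiRNNCell so the RNN is deeper, and do not feed the raw RNN outputs into sequence_loss as logits but pass them through a softmax (fully connected) layer first, exactly like the softmax-layer pattern in the implementation above. Treat this as an outline under those assumptions, not the final code:

# stack two LSTM cells so the RNN is deeper (the depth 2 here is an arbitrary choice)
cells = [tf.contrib.rnn.BasicLSTMCell(num_units=hidden_size, state_is_tuple=True)
         for _ in range(2)]
multi_cell = tf.contrib.rnn.MultiRNNCell(cells, state_is_tuple=True)
outputs, _states = tf.nn.dynamic_rnn(multi_cell, X_one_hot, dtype=tf.float32)

# do not use the raw RNN outputs as logits; add a softmax layer on top
X_for_softmax = tf.reshape(outputs, [-1, hidden_size])
softmax_w = tf.get_variable("softmax_w", [hidden_size, num_classes])
softmax_b = tf.get_variable("softmax_b", [num_classes])
logits = tf.matmul(X_for_softmax, softmax_w) + softmax_b
logits = tf.reshape(logits, [batch_size, seq_length, num_classes])

sequence_loss = tf.contrib.seq2seq.sequence_loss(
    logits=logits, targets=Y, weights=weights)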
