This post walks through how to use an RNN to predict a time series such as the stock market.¶

What is time series data?
It is a value that changes over time. Plotted, the x-axis is time and the y-axis is the value.

A typical example is stock market data.
Each row holds the opening price, the high, the low, the trading volume, and the closing price. Because these are recorded every day, they form time series data.

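How do we analyze it with an RNN?
Suppose we have 7 days of data. We line these days up in order and predict the price on day 8. This kind of problem is called many-to-one. The basic hypothesis of time series is that earlier data influences what comes next, so linking the previous days together should predict better than using the data from day 7 alone.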
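
The many-to-one setup can be sketched on a toy series. The snippet below is only an illustration in plain Python: `make_windows` and the sample prices are hypothetical and do not come from the lecture code.

```python
def make_windows(series, seq_length):
    """Pair each run of `seq_length` values with the value that follows it."""
    pairs = []
    for i in range(len(series) - seq_length):
        window = series[i:i + seq_length]   # inputs: days i .. i+seq_length-1
        target = series[i + seq_length]     # label: the following day
        pairs.append((window, target))
    return pairs


# Ten made-up daily closing prices (illustrative values only).
closes = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
pairs = make_windows(closes, seq_length=7)

for window, target in pairs:
    print(window, "->", target)
```

Ten days with a window of 7 give three (window, target) pairs, since each target needs 7 full days before it.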

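So how should we feed in the data we have?
Let ? be the value we want to know.
Feeding in the data up to that point should let us predict ?.
Three questions are worth asking here: what is the input dimension, what is the sequence length, and what is the output (hidden) size?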
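In the data used here,
there are 5 columns, from open to close, so the dimension of the input data is 5.
We use 7 days of data, so the sequence length is 7.
The output dimension is 1, because the 7 days of data yield a single value for day 8. If a fully connected layer is added after the RNN, the RNN's own output (hidden) size can be chosen freely, since the fully connected layer maps it to the final dimension.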
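
To make these three sizes concrete, here is a rough sketch of a many-to-one forward pass in plain Python. It assumes a vanilla tanh RNN cell without biases rather than the LSTM used later, with random untrained weights; `rand_matrix`, `matvec`, and `predict` are hypothetical helper names.

```python
import math
import random

seq_length, data_dim, hidden_dim, output_dim = 7, 5, 10, 1
rng = random.Random(0)


def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


W_xh = rand_matrix(hidden_dim, data_dim)     # input  -> hidden
W_hh = rand_matrix(hidden_dim, hidden_dim)   # hidden -> hidden
W_hy = rand_matrix(output_dim, hidden_dim)   # hidden -> output (fully connected)


def predict(window):
    """Run the cell over every day, then map the last hidden state to the output."""
    h = [0.0] * hidden_dim
    for x_t in window:                        # one step per day
        pre = [a + b for a, b in zip(matvec(W_xh, x_t), matvec(W_hh, h))]
        h = [math.tanh(p) for p in pre]
    return matvec(W_hy, h)                    # only the final hidden state is used


# One sample: 7 days x 5 features (placeholder values in [0, 1)).
window = [[rng.random() for _ in range(data_dim)] for _ in range(seq_length)]
y_pred = predict(window)
print(len(window), len(window[0]), len(y_pred))
```

The hidden size (10 here) never appears in the input or the final output, which is why it can be picked freely once a fully connected layer sits on top.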

Reading data¶

[In]

import numpy as np

timesteps = seq_length = 7
data_dim = 5
output_dim = 1
# Open, High, Low, Volume, Close
xy = np.loadtxt('data-02-stock_daily.csv', delimiter=',')
xy = xy[::-1]  # reverse order (chronologically ordered)
xy = MinMaxScaler(xy)  # helper defined in the full script further down
x = xy
y = xy[:, [-1]]  # Close as label

dataX = []
dataY = []
for i in range(0, len(y) - seq_length):
    _x = x[i:i + seq_length]
    _y = y[i + seq_length] # Next close price
    print(_x, '->', _y)
    dataX.append(_x)
    dataY.append(_y)

[Out]

[0.18667876 0.20948057 0.20878184 0. 0.21744815]

...

[0.18933069 0.20057799 0.19187983 0.29783096 0.2086465] # up to here: the x input values, one row per day

-> [0.14106001]   # the y value to predict
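
All printed values lie between 0 and 1 because of the min-max scaling step, and the `0.` in the volume column is simply that column's minimum. The following is a dependency-free sketch of the same column-wise formula; `min_max_scale` and the sample rows are hypothetical.

```python
def min_max_scale(rows):
    """Scale each column to [0, 1) using that column's own min and max."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(value - lo) / (hi - lo + 1e-7)      # 1e-7 avoids division by zero
         for value, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Three made-up rows of (price, volume); illustrative values only.
rows = [[100.0, 5000.0], [110.0, 3000.0], [120.0, 4000.0]]
scaled = min_max_scale(rows)

for row in scaled:
    print(row)
```

Each column is scaled independently, so a large-valued column such as volume ends up on the same scale as the prices.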

Training and test datasets¶

# split into train and test sets
train_size = int(len(dataY) * 0.7)
test_size = len(dataY) - train_size
trainX, testX = (np.array(dataX[0:train_size]),
                 np.array(dataX[train_size:len(dataX)]))
trainY, testY = (np.array(dataY[0:train_size]),
                 np.array(dataY[train_size:len(dataY)]))

# input placeholders
X = tf.placeholder(tf.float32, [None, seq_length, data_dim])
Y = tf.placeholder(tf.float32, [None, 1])
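
The split is done by slicing rather than shuffling, so every test window comes after every training window in time. A small sketch of that idea, with a hypothetical `chronological_split` helper and integers standing in for the time-ordered windows:

```python
def chronological_split(samples, train_ratio=0.7):
    """Keep the earliest `train_ratio` of the samples for training, the rest for testing."""
    train_size = int(len(samples) * train_ratio)
    return samples[:train_size], samples[train_size:]


samples = list(range(100))         # stand-in for 100 time-ordered windows
train, test = chronological_split(samples)

print(len(train), len(test))
print(train[-1], test[0])          # last training index, first test index
```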

LSTM and Loss¶

# Pass the RNN output through one more fully connected layer before producing the final output.
# The size of the RNN output itself is set with hidden_dim.

cell = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_dim, state_is_tuple=True)
outputs, _states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
# We use only the last cell's output: not the outputs of days 1 through 7,
# but the single output that has accumulated all 7 days of data.
Y_pred = tf.contrib.layers.fully_connected(
    outputs[:, -1], output_dim, activation_fn=None)

# cost/loss
loss = tf.reduce_sum(tf.square(Y_pred - Y)) # sum of the squares
# optimizer
optimizer = tf.train.AdamOptimizer(0.01)
train = optimizer.minimize(loss)
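
The loss above is a sum of squared errors, so its size grows with the number of samples. The full script further down reports RMSE as well, which does not. The sketch below compares the two in plain Python; the function names and sample values are made up for illustration.

```python
import math


def sum_squared_error(predictions, targets):
    """Sum of squared differences, like tf.reduce_sum(tf.square(Y_pred - Y))."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))


def rmse(predictions, targets):
    """Root mean squared error: averaged, so it ignores the sample count."""
    return math.sqrt(sum_squared_error(predictions, targets) / len(targets))


# Made-up scaled predictions and labels (illustrative values only).
targets = [0.0, 0.5, 1.0, 0.5]
predictions = [0.5, 0.5, 0.5, 0.5]

sse_small = sum_squared_error(predictions, targets)
sse_large = sum_squared_error(predictions * 2, targets * 2)   # same data, repeated twice
print(sse_small, sse_large)
print(rmse(predictions, targets), rmse(predictions * 2, targets * 2))
```

Repeating the data doubles the summed loss while the RMSE stays the same, which is worth keeping in mind when comparing a training loss against a test RMSE.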

Training and Results¶

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for i in range(1000):
    _, l = sess.run([train, loss],
           feed_dict={X: trainX, Y: trainY})
    print(i, l)

testPredict = sess.run(Y_pred, feed_dict={X: testX})

import matplotlib.pyplot as plt
plt.plot(testY)
plt.plot(testPredict)
plt.show()

Implementation¶

'''
This script shows how to predict stock prices using a basic RNN
'''
import tensorflow as tf
import numpy as np
import matplotlib
import os

tf.set_random_seed(777)  # reproducibility

if "DISPLAY" not in os.environ:
    # remove Travis CI Error
    matplotlib.use('Agg')

import matplotlib.pyplot as plt


def MinMaxScaler(data):
    ''' Min Max Normalization
    Parameters
    ----------
    data : numpy.ndarray
        input data to be normalized
        shape: [Batch size, dimension]
    Returns
    ----------
    data : numpy.ndarray
        normalized data
        shape: [Batch size, dimension]
    References
    ----------
    .. [1] http://sebastianraschka.com/Articles/2014_about_feature_scaling.html
    '''
    numerator = data - np.min(data, 0)
    denominator = np.max(data, 0) - np.min(data, 0)
    # noise term prevents the zero division
    return numerator / (denominator + 1e-7)


# train Parameters
seq_length = 7
data_dim = 5
hidden_dim = 10
output_dim = 1
learning_rate = 0.01
iterations = 500

# Open, High, Low, Volume, Close
xy = np.loadtxt('data-02-stock_daily.csv', delimiter=',')
xy = xy[::-1]  # reverse order (chronologically ordered)

# train/test split
train_size = int(len(xy) * 0.7)
train_set = xy[0:train_size]
test_set = xy[train_size - seq_length:]  # Index from [train_size - seq_length] to utilize past sequence

# Scale each
train_set = MinMaxScaler(train_set)
test_set = MinMaxScaler(test_set)

# build datasets
def build_dataset(time_series, seq_length):
    dataX = []
    dataY = []
    for i in range(0, len(time_series) - seq_length):
        _x = time_series[i:i + seq_length, :]
        _y = time_series[i + seq_length, [-1]]  # Next close price
        print(_x, "->", _y)
        dataX.append(_x)
        dataY.append(_y)
    return np.array(dataX), np.array(dataY)

trainX, trainY = build_dataset(train_set, seq_length)
testX, testY = build_dataset(test_set, seq_length)

# input place holders
X = tf.placeholder(tf.float32, [None, seq_length, data_dim])
Y = tf.placeholder(tf.float32, [None, 1])

# build a LSTM network
cell = tf.contrib.rnn.BasicLSTMCell(
    num_units=hidden_dim, state_is_tuple=True, activation=tf.tanh)
outputs, _states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
Y_pred = tf.contrib.layers.fully_connected(
    outputs[:, -1], output_dim, activation_fn=None)  # We use the last cell's output

# cost/loss
loss = tf.reduce_sum(tf.square(Y_pred - Y))  # sum of the squares
# optimizer
optimizer = tf.train.AdamOptimizer(learning_rate)
train = optimizer.minimize(loss)

# RMSE
targets = tf.placeholder(tf.float32, [None, 1])
predictions = tf.placeholder(tf.float32, [None, 1])
rmse = tf.sqrt(tf.reduce_mean(tf.square(targets - predictions)))

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)

    # Training step
    for i in range(iterations):
        _, step_loss = sess.run([train, loss], feed_dict={
                                X: trainX, Y: trainY})
        print("[step: {}] loss: {}".format(i, step_loss))

    # Test step
    test_predict = sess.run(Y_pred, feed_dict={X: testX})
    rmse_val = sess.run(rmse, feed_dict={
                    targets: testY, predictions: test_predict})
    print("RMSE: {}".format(rmse_val))

    # Plot predictions
    plt.plot(testY)
    plt.plot(test_predict)
    plt.xlabel("Time Period")
    plt.ylabel("Stock Price")
    plt.show()

[[2.53065030e-01 2.45070970e-01 2.33983036e-01 4.66075110e-04
  2.32039560e-01]
 [2.29604366e-01 2.39728936e-01 2.54567513e-01 2.98467330e-03
  2.37426028e-01]
 [2.49235510e-01 2.41668371e-01 2.48338489e-01 2.59926504e-04
  2.26793794e-01]
 [2.21013495e-01 2.46602231e-01 2.54710584e-01 0.00000000e+00
  2.62668239e-01]
 [3.63433786e-01 3.70389871e-01 2.67168847e-01 1.24764722e-02
  2.62105010e-01]
 [2.59447633e-01 3.10673724e-01 2.74113889e-01 4.56323384e-01
  2.71751265e-01]
 [2.76008150e-01 2.78314566e-01 1.98470380e-01 5.70171193e-01
  1.78104644e-01]] -> [0.16053716]
[[2.29604366e-01 2.39728936e-01 2.54567513e-01 2.98467330e-03
  2.37426028e-01]
 [2.49235510e-01 2.41668371e-01 2.48338489e-01 2.59926504e-04
  2.26793794e-01]
 [2.21013495e-01 2.46602231e-01 2.54710584e-01 0.00000000e+00
  2.62668239e-01]
 [3.63433786e-01 3.70389871e-01 2.67168847e-01 1.24764722e-02
  2.62105010e-01]
 [2.59447633e-01 3.10673724e-01 2.74113889e-01 4.56323384e-01
  2.71751265e-01]
 [2.76008150e-01 2.78314566e-01 1.98470380e-01 5.70171193e-01
  1.78104644e-01]
 [1.59015228e-01 1.78651664e-01 1.41728657e-01 3.93806579e-01
  1.60537160e-01]] -> [0.21950626]
[[2.49235510e-01 2.41668371e-01 2.48338489e-01 2.59926504e-04
  2.26793794e-01]
 [2.21013495e-01 2.46602231e-01 2.54710584e-01 0.00000000e+00
  2.62668239e-01]
 [3.63433786e-01 3.70389871e-01 2.67168847e-01 1.24764722e-02
  2.62105010e-01]
 [2.59447633e-01 3.10673724e-01 2.74113889e-01 4.56323384e-01
  2.71751265e-01]
 [2.76008150e-01 2.78314566e-01 1.98470380e-01 5.70171193e-01
  1.78104644e-01]
 [1.59015228e-01 1.78651664e-01 1.41728657e-01 3.93806579e-01
  1.60537160e-01]
 [1.65432462e-01 2.00836760e-01 1.93494176e-01 2.81733441e-01
  2.19506258e-01]] -> [0.25203622]
  
[[0.22101349 0.24660223 0.25471058 0.         0.26266824]
 [0.36343379 0.37038987 0.26716885 0.01247647 0.26210501]
 [0.25944763 0.31067372 0.27411389 0.45632338 0.27175127]
 [0.27600815 0.27831457 0.19847038 0.57017119 0.17810464]
 [0.15901523 0.17865166 0.14172866 0.39380658 0.16053716]
 [0.16543246 0.20083676 0.19349418 0.28173344 0.21950626]
 [0.22415317 0.23612204 0.2340904  0.29783096 0.25203622]] -> [0.17039458]
[[0.36343379 0.37038987 0.26716885 0.01247647 0.26210501]
 [0.25944763 0.31067372 0.27411389 0.45632338 0.27175127]
 [0.27600815 0.27831457 0.19847038 0.57017119 0.17810464]
 [0.15901523 0.17865166 0.14172866 0.39380658 0.16053716]
 [0.16543246 0.20083676 0.19349418 0.28173344 0.21950626]
 [0.22415317 0.23612204 0.2340904  0.29783096 0.25203622]
 [0.24271481 0.23486317 0.18737251 0.36110962 0.17039458]] -> [0.1339569]
[[0.25944763 0.31067372 0.27411389 0.45632338 0.27175127]
 [0.27600815 0.27831457 0.19847038 0.57017119 0.17810464]
 [0.15901523 0.17865166 0.14172866 0.39380658 0.16053716]
 [0.16543246 0.20083676 0.19349418 0.28173344 0.21950626]
 [0.22415317 0.23612204 0.2340904  0.29783096 0.25203622]
 [0.24271481 0.23486317 0.18737251 0.36110962 0.17039458]
 [0.13075879 0.14979736 0.13950917 0.35107108 0.1339569 ]] -> [0.14071632]

...

 [0.88939504 0.88829938 0.94014512 0.13380794 0.90030461]] -> [0.93124657]
[[0.81529885 0.82251706 0.84757626 0.1060959  0.83698714]
 [0.83034597 0.81556125 0.85575466 0.07523435 0.84403567]
 [0.84347469 0.8426171  0.88749985 0.10121322 0.86858608]
 [0.86925245 0.87626864 0.92209184 0.1140722  0.90185774]
 [0.88723699 0.88829938 0.92518158 0.08714288 0.90908564]
 [0.88939504 0.88829938 0.94014512 0.13380794 0.90030461]
 [0.89281215 0.89655181 0.94323484 0.12965206 0.93124657]] -> [0.95460261]
[[0.83034597 0.81556125 0.85575466 0.07523435 0.84403567]
 [0.84347469 0.8426171  0.88749985 0.10121322 0.86858608]
 [0.86925245 0.87626864 0.92209184 0.1140722  0.90185774]
 [0.88723699 0.88829938 0.92518158 0.08714288 0.90908564]
 [0.88939504 0.88829938 0.94014512 0.13380794 0.90030461]
 [0.89281215 0.89655181 0.94323484 0.12965206 0.93124657]
 [0.91133638 0.91818448 0.95944078 0.1885611  0.95460261]] -> [0.97604677]

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From <ipython-input-2-217d72c284f7>:82: BasicLSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:From <ipython-input-2-217d72c284f7>:83: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API
WARNING:tensorflow:From C:\Users\whanh\AppData\Local\Continuum\anaconda3\lib\site-packages\tensorflow\python\ops\tensor_array_ops.py:162: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
    
[step: 0] loss: 166.2140655517578
[step: 1] loss: 105.05767822265625
[step: 2] loss: 59.56279754638672

...

[step: 497] loss: 0.7032116055488586
[step: 498] loss: 0.7025963664054871
[step: 499] loss: 0.7019816637039185

RMSE: 0.051355425268411636

C:\Users\whanh\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\figure.py:445: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
  % get_backend())

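The figure does not appear. The warning above shows that Matplotlib is using the non-GUI agg backend: the script selects it whenever the DISPLAY environment variable is missing, which is normally the case on Windows. Removing that check, or saving the figure with plt.savefig instead of calling plt.show(), should make the plot available.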

Exercise¶

Implement stock prediction using linear regression only
Improve results using more features, such as keywords and/or sentiments in top news
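
As a possible starting point for the first exercise, here is a one-feature least-squares sketch that predicts the next close from the current close. It is a simplification under stated assumptions, not a full solution: `fit_line` and the prices are hypothetical, and a fuller answer would use all five columns over the 7-day window.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Made-up closing prices (illustrative values only).
closes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
xs, ys = closes[:-1], closes[1:]    # today's close -> tomorrow's close

slope, intercept = fit_line(xs, ys)
next_close = slope * closes[-1] + intercept
print(slope, intercept, next_close)
```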

Other RNN applications¶

Language Modeling
Speech Recognition
Machine Translation
Conversation Modeling/Question Answering
Image/Video Captioning
Image/Music/Dance Generation

Source: https://www.inflearn.com/course/%EA%B8%B0%EB%B3%B8%EC%A0%81%EC%9D%B8-%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D-%EB%94%A5%EB%9F%AC%EB%8B%9D-%EA%B0%95%EC%A2%8C/lecture/3430

조환희의 학습 블로그


RNN with Time Series Data