Simple prediction and the cost function

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

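# 10 random integer scores in the range 0..10 (no seed is set, so the values change on every run)
data = np.random.randint(0, 11, size=10)
print('scores :', data)
print('distribution :', np.bincount(data))

scores : [3 5 9 0 0 6 7 5 0 2]
distribution : [3 0 1 1 0 2 1 1 0 1]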

data.mean()

3.7

plt.plot(data, 'bo-')
plt.ylabel('score')
plt.xlabel('student #')

Text(0.5,0,'student #')

plt.hist(data, bins=range(12))  # 12 bin edges give one bin per score 0..10
plt.xlabel('score')
plt.ylabel('count')

Text(0,0.5,'count')

model = data.mean()
model

3.7

plt.plot(data, 'bo-')
plt.hlines([model], 0, 10, linestyles='dotted', colors='green')
plt.xlabel('student #')
plt.ylabel('score')

Text(0,0.5,'score')

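# Mean of the absolute differences (MAE)
# Note: data is redrawn on every run; the values below come from a run whose mean was 6.8, not 3.7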
cost_abs = np.abs(data - model).sum() / len(data)
cost_abs

2.04

# Mean of the squared differences (MSE)
cost_mse = np.power(data-model, 2).sum() / len(data)
cost_mse

5.959999999999999

# Square root of the mean of the squared differences (RMSE)
cost_rmse = np.sqrt(cost_mse)
cost_rmse

2.4413111231467406
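
The three costs above can be collected into one helper. This is only a minimal sketch: the function name `constant_model_costs` and the variable `sample` are illustrative and not part of the original notebook. The sample is the score array displayed in the final cell of this notebook, and for it the helper reproduces the 2.04, 5.96 and 2.4413 values shown above.

```python
import numpy as np

def constant_model_costs(scores, prediction):
    """Return (MAE, MSE, RMSE) when `prediction` is used for every score."""
    errors = np.asarray(scores, dtype=float) - prediction
    mae = np.abs(errors).mean()      # mean absolute difference
    mse = (errors ** 2).mean()       # mean squared difference
    rmse = np.sqrt(mse)              # back in the same units as the scores
    return mae, mse, rmse

# Hypothetical fixed sample (copied from the last cell) so the result is reproducible.
sample = np.array([8, 4, 5, 10, 7, 10, 2, 6, 8, 8])
mae, mse, rmse = constant_model_costs(sample, sample.mean())
print(mae, mse, rmse)
```

RMSE is often easier to read than MSE because it is expressed in score units rather than squared score units.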

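First, build a model that predicts a single constant value b, and use cost_mse as the cost function. $$ cost = \frac{1}{N} \sum_{i=0}^{N-1} (x_i - b)^2 $$ $$ = b^2 - \left(\frac{2}{N} \sum x_i\right) b + \frac{1}{N} \sum x_i^2 $$
This is a quadratic in b, so setting its derivative with respect to b to zero gives the minimizer: $$ \hat{b} = \frac{1}{N} \sum x_i $$
In other words, under the MSE cost the best constant prediction is the mean.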
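
The result can also be checked numerically. The sketch below is not from the original notebook: it assumes the first score sample printed earlier, and the names `mse_cost`, `candidates` and `best_b` are illustrative. It evaluates the MSE for many candidate constants and confirms that the smallest cost falls at the mean.

```python
import numpy as np

# First sample printed earlier in the notebook (fixed here for reproducibility).
scores = np.array([3, 5, 9, 0, 0, 6, 7, 5, 0, 2])

def mse_cost(b):
    """MSE of predicting the constant b for every score."""
    return np.mean((scores - b) ** 2)

# Try every constant from 0 to 10 in steps of 0.01.
candidates = np.linspace(0, 10, 1001)
costs = np.array([mse_cost(b) for b in candidates])
best_b = candidates[np.argmin(costs)]

print('best constant :', best_b)
print('mean          :', scores.mean())
```

A grid search like this is only a sanity check; the closed-form answer from the derivation is exact and needs no search.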

X = np.arange(len(data)).reshape(-1, 1)  # features must be a 2-D array: one row per sample
y = data
pred_y = y.mean()

display(X, y, pred_y)

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])

array([ 8,  4,  5, 10,  7, 10,  2,  6,  8,  8])

6.8
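
As a rough illustration of why the feature array is kept two-dimensional, the sketch below fits the constant model by least squares with `np.linalg.lstsq`. This is an assumed way of phrasing the fit, not the notebook's own method; the design matrix `ones` and the name `b_hat` are hypothetical. The scores are copied from the output shown above.

```python
import numpy as np

# Scores copied from the displayed output.
y = np.array([8, 4, 5, 10, 7, 10, 2, 6, 8, 8])

# One row per sample, one column per feature: shape (10, 1).
X = np.arange(len(y)).reshape(-1, 1)

# A constant model has a single "feature" that is always 1,
# so its design matrix is a (10, 1) column of ones.
ones = np.ones((len(y), 1))
coef, *_ = np.linalg.lstsq(ones, y, rcond=None)
b_hat = coef[0]

print(X.shape, y.shape)
print(b_hat, y.mean())
```

The least-squares solution minimizes the sum of squared errors, so it lands on the same value as `y.mean()`.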

조환희's learning blog