Logistic Regression

Let's classify the Iris dataset using logistic regression.

from sklearn import datasets
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
iris = datasets.load_iris()
list(iris.keys())

['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename']

X = iris['data'][:, 3:]  # petal width (cm)
y = (iris['target'] == 2).astype(int)  # 1 if Iris-Virginica, else 0 (np.int was removed from NumPy)

We build a classifier that detects Iris-Virginica (1 if virginica, 0 otherwise) based on petal width (X).

from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression(solver='liblinear', random_state=42)
log_reg.fit(X,y)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=42, solver='liblinear',
          tol=0.0001, verbose=0, warm_start=False)

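sklearn's LogisticRegression provides methods that return the predicted class label (predict) and the probability of belonging to each class (predict_proba).
Here the model was trained on the petal width X and the labels y.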
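To make the link between the two methods concrete, here is a minimal sketch of what a one-feature logistic regression computes: a linear score passed through the sigmoid gives the class-1 probability, and the label is 1 when that probability exceeds 0.5. The helper names and the parameters `w` and `b` are made up for illustration; they are not the values fitted above.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba_1d(x, w, b):
    """Return (P(class 0), P(class 1)) for a single feature value x."""
    p1 = sigmoid(w * x + b)
    return (1.0 - p1, p1)

def predict_1d(x, w, b):
    """Return 1 if P(class 1) is above 0.5, else 0."""
    return int(predict_proba_1d(x, w, b)[1] > 0.5)

# Hypothetical parameters, chosen only for illustration (not the fitted model).
w, b = 4.0, -6.5

for x in (1.5, 1.7):
    p0, p1 = predict_proba_1d(x, w, b)
    print(f"x={x}: P(0)={p0:.3f}, P(1)={p1:.3f}, label={predict_1d(x, w, b)}")
```

With these assumed parameters, a petal width of 1.5 falls on the class-0 side and 1.7 on the class-1 side.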

X_n = np.linspace(0,3,100).reshape(-1,1)
y_p = log_reg.predict_proba(X_n)

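Applying the trained model to the newly created X_n yields the predicted class probabilities.
We take the first point in X_n where the probability of class 1 reaches 0.5 as the decision boundary.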

decision_boundary = X_n[y_p[:,1]>=0.5][0]
plt.figure(figsize=(8,3))
plt.plot(X[y==0],y[y==0],'bs')
plt.plot(X[y==1],y[y==1],'g^')
plt.plot([decision_boundary, decision_boundary],[-1,2],'k:',linewidth=2)
plt.plot(X_n, y_p[:, 1], 'g-', linewidth=2, label='Iris-Virginica')
plt.plot(X_n, y_p[:, 0], 'b--', linewidth=2, label='Not Iris-Virginica')
plt.text(decision_boundary+0.02, 0.15, 'Decision Boundary', fontsize=14, color='k', ha='center')
plt.arrow(decision_boundary, 0.08, -0.3, 0, head_width=0.05, head_length=0.1, fc='b', ec='b')
plt.arrow(decision_boundary, 0.92, 0.3, 0, head_width=0.05, head_length=0.1, fc='g', ec='g')
plt.xlabel('petal width(cm)', fontsize=14)
plt.ylabel('probability        ', fontsize=14, rotation=0)
plt.legend(loc='center left', fontsize=14)
plt.axis([0,3,-0.02,1.02])
plt.show()

decision_boundary

array([1.63636364])

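The decision boundary came out at approximately 1.636. In the prediction below, a petal width above the boundary is classified as 1 and a width below it as 0.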

log_reg.predict([[1.7],[1.5]])

array([1, 0])

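The graph shows that the decision boundary forms near 1.6 cm, where both class probabilities equal 50%.
The model therefore predicts Iris-Virginica for petal widths above about 1.6 cm and not Iris-Virginica for widths below it.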
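The 50% crossing can also be found in closed form: the sigmoid equals 0.5 exactly when the linear score w*x + b is zero, so the boundary is x = -b / w. The sketch below compares that value with the grid-based approach (first grid point whose probability reaches 0.5). The parameters `w` and `b` are hypothetical, not the fitted coefficients; with a real model they would come from `log_reg.coef_` and `log_reg.intercept_`.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters, chosen only for illustration (not the fitted model).
w, b = 4.0, -6.5

# Closed form: sigmoid(w*x + b) = 0.5 exactly when w*x + b = 0.
exact_boundary = -b / w

# Grid approximation: 100 evenly spaced points on [0, 3],
# the same spacing as np.linspace(0, 3, 100).
grid = [3.0 * k / 99 for k in range(100)]
grid_boundary = next(x for x in grid if sigmoid(w * x + b) >= 0.5)

print("exact boundary:", exact_boundary)
print("grid boundary: ", grid_boundary)
```

The grid estimate is the first grid point at or above the true crossing, so it can overshoot by up to one grid step (3/99, about 0.03).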

from sklearn.linear_model import LogisticRegression

X = iris['data'][: , (2, 3)]
y = (iris['target'] == 2).astype(int)  # np.int was removed from NumPy; use the built-in int

log_reg = LogisticRegression(solver = 'liblinear', C=10**10, random_state=42)
log_reg.fit(X,y)

x0, x1 = np.meshgrid(
                np.linspace(2.9, 7, 500).reshape(-1,1),
                np.linspace(0.8, 2.7, 200).reshape(-1,1),
                )
X_n = np.c_[x0.ravel(), x1.ravel()]
y_p = log_reg.predict_proba(X_n)

plt.figure(figsize=(10,4))
plt.plot(X[y==0, 0], X[y==0, 1], 'bs')
plt.plot(X[y==1, 0], X[y==1, 1], 'g^')

zz = y_p[:, 1].reshape(x0.shape)
contour = plt.contour(x0, x1, zz, cmap=plt.cm.brg)

left_right = np.array([2.9, 7])
# Solve coef0 * x0 + coef1 * x1 + intercept = 0 for x1; the division applies to the whole sum
boundary = -(log_reg.coef_[0][0] * left_right + log_reg.intercept_[0]) / log_reg.coef_[0][1]

plt.clabel(contour, inline=1, fontsize=12)
plt.plot(left_right, boundary, 'k--', linewidth=3)
plt.text(3.5, 1.5, 'Not Iris_Virginica', fontsize=14, color='b', ha='center')
plt.text(6.5, 2.3, 'Iris_Virginica', fontsize=14, color='g', ha='center')
plt.xlabel('Petal Length', fontsize=14)
plt.ylabel('Petal Width                       ', fontsize=14, rotation=0)
plt.axis([2.9, 7, 0.8, 2.7])
plt.show()
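With two features, the decision boundary is the set of points where the linear score w0*x0 + w1*x1 + b equals zero, which rearranges to x1 = -(w0*x0 + b) / w1. The sketch below checks that points on this line get a probability of 0.5. The helper `boundary_x1` and the weights are hypothetical and used only for illustration; they are not the coefficients of the model trained above.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def boundary_x1(x0, w0, w1, b):
    """Petal width x1 on the decision boundary for a given petal length x0."""
    # The division by w1 applies to the whole sum (w0 * x0 + b).
    return -(w0 * x0 + b) / w1

# Hypothetical parameters, chosen only for illustration (not the fitted model).
w0, w1, b = 5.0, 10.0, -45.0

for x0 in (2.9, 7.0):
    x1 = boundary_x1(x0, w0, w1, b)
    p = sigmoid(w0 * x0 + w1 * x1 + b)
    print(f"x0={x0}, x1={x1:.3f}, P(class 1)={p:.3f}")
```

Because the boundary is a straight line, evaluating it at the two ends of the x-axis range is enough to draw it.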

Source: https://analysis-flood.tistory.com/89 [Data Analysis]


조환희's Study Blog

Logistic Regression Analysis Practice