Gradient Boosting

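Like a random forest, gradient boosting builds many trees. The difference is that it does not build them all at once: it builds one tree, then builds the next tree so that it reduces the error of the previous one, and repeats this process stage by stage.
Gradient boosting has won many machine learning competitions. In a sense, it is a method that squeezes out every last possibility to raise the score.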
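To make the stage-by-stage idea concrete, here is a minimal hand-rolled sketch for a regression problem with squared error. It is not how scikit-learn implements gradient boosting internally. The synthetic sine data, the depth-2 trees, and the values of `learning_rate` and `n_trees` are all assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (an assumption, for illustration only)
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1   # how much of each tree's correction is applied
n_trees = 50          # number of boosting stages

# Stage 0: start from a constant prediction (the mean of the targets)
prediction = np.full_like(y, y.mean())
mse_history = [np.mean((y - prediction) ** 2)]
trees = []

for _ in range(n_trees):
    residual = y - prediction                      # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residual)                          # the next tree is fit to those errors
    prediction += learning_rate * tree.predict(X)  # apply a damped correction
    trees.append(tree)
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"training MSE before boosting: {mse_history[0]:.4f}")
print(f"training MSE after {n_trees} trees: {mse_history[-1]:.4f}")
```

Each tree only has to model what the trees before it got wrong, which is why many small trees can add up to an accurate model.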

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

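# Load the breast cancer dataset bundled with scikit-learn
cancer = load_breast_cancer()

# Default split (75% train / 25% test); no random_state is set, so scores vary from run to run
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target)

# Gradient boosting with default settings
model = GradientBoostingClassifier()
model.fit(X_train, y_train)

# Accuracy on the training and test sets (display() is available in Jupyter/IPython)
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
display(train_score, test_score)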

1.0

0.986013986013986

model

GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.1, loss='deviance', max_depth=3,
              max_features=None, max_leaf_nodes=None,
              min_impurity_decrease=0.0, min_impurity_split=None,
              min_samples_leaf=1, min_samples_split=2,
              min_weight_fraction_leaf=0.0, n_estimators=100,
              presort='auto', random_state=None, subsample=1.0, verbose=0,
              warm_start=False)

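The parameters to pay attention to are n_estimators, max_depth, and learning_rate.
n_estimators is the number of trees.
Gradient boosting often follows the strategy of keeping the depth small (max_depth) and increasing the number of trees.
learning_rate controls how strongly each new tree corrects the errors of the trees built before it. The larger the value, the stronger each correction, and the more complex the resulting model tends to be.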
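One way to see the effect of learning_rate is to train the same model with a few different values and compare the scores. The sketch below is only an illustration: the three values tried and the fixed `random_state=0` (used so that every model sees the same split) are assumptions, and the numbers it prints depend on the scikit-learn version.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()

# Fix the split so that every learning rate is evaluated on the same data
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0
)

scores = {}
for lr in (0.01, 0.1, 1.0):
    # Everything except learning_rate stays at its default value
    model = GradientBoostingClassifier(learning_rate=lr, random_state=0)
    model.fit(X_train, y_train)
    scores[lr] = (model.score(X_train, y_train), model.score(X_test, y_test))

for lr, (train_acc, test_acc) in scores.items():
    print(f"learning_rate={lr}: train={train_acc:.3f}, test={test_acc:.3f}")
```

If the training score is perfect while the test score lags behind, lowering learning_rate or max_depth is a common first thing to try.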

# A new random split, so these scores are not directly comparable with the first run
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target)

# Many shallow trees: 1000 trees of depth 1
model = GradientBoostingClassifier(n_estimators=1000, max_depth=1)
model.fit(X_train, y_train)

train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
display(train_score, test_score)

1.0

0.965034965034965
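With this many depth-1 trees, it can help to check how the test accuracy changes as trees are added. The sketch below uses `staged_predict`, which yields the ensemble's predictions after each boosting stage. The smaller `n_estimators=300` (to keep the run short), the fixed `random_state=0`, and the name `stump_model` are assumptions made for this illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0
)

n_estimators = 300
stump_model = GradientBoostingClassifier(
    n_estimators=n_estimators, max_depth=1, random_state=0
)
stump_model.fit(X_train, y_train)

# staged_predict yields predictions after 1 tree, 2 trees, ..., n_estimators trees
staged_test_acc = [
    accuracy_score(y_test, y_pred)
    for y_pred in stump_model.staged_predict(X_test)
]

for n in (1, 10, 100, n_estimators):
    print(f"{n:4d} trees: test accuracy = {staged_test_acc[n - 1]:.3f}")
```

If the accuracy stops improving well before the last stage, a smaller n_estimators would give a similar result with less training time.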

조환희's Study Blog

Supervised Learning - Gradient Boosting (starred)
