Market Basket Algorithm

johh 2019. 3. 5. 00:15

In [3]:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

In [4]:

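# Items: 양말 = socks, 팬티 = underwear, 신발 = shoes, 바지 = pants,
#        셔츠 = shirt, 모자 = hat, 장갑 = gloves
dataset = [['양말', '팬티', '신발'],
           ['신발', '바지', '팬티', '셔츠'],
           ['모자', '양말', '신발'],
           ['신발', '바지', '팬티', '장갑']]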

In [6]:

# One-hot encode the transactions: one boolean column per item
t = TransactionEncoder()
t_a = t.fit(dataset).transform(dataset)
df = pd.DataFrame(t_a, columns=t.columns_)
df

Out[6]:

In [7]:

# Keep only itemsets that appear in at least 50% of the transactions
frequent = apriori(df, min_support=0.5, use_colnames=True)
frequent

Out[7]:

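The support for pants (바지), that is, the share of transactions that contain them, is 0.5. The support for buying underwear (팬티), pants (바지), and shoes (신발) together is also 0.5.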
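The support values above can be checked by hand. The following is a minimal sketch in plain Python, not how mlxtend computes them internally; the `support` helper is a hypothetical name introduced only for this illustration.

```python
# Same four transactions as in the post.
dataset = [['양말', '팬티', '신발'],
           ['신발', '바지', '팬티', '셔츠'],
           ['모자', '양말', '신발'],
           ['신발', '바지', '팬티', '장갑']]


def support(itemset, transactions):
    """Hypothetical helper: fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


support_pants = support(['바지'], dataset)               # pants appear in 2 of 4 baskets
support_trio = support(['팬티', '바지', '신발'], dataset)  # all three appear together in 2 of 4
print(support_pants, support_trio)  # prints: 0.5 0.5
```

Both itemsets occur in the second and fourth transactions only, which is why each clears the `min_support=0.5` threshold exactly.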

In [15]:

from mlxtend.frequent_patterns import association_rules
# Derive association rules whose confidence is at least 0.2
association_rules(frequent, metric='confidence', min_threshold=0.2)

Out[15]:

	antecedents	consequents	antecedent support	consequent support	support	confidence	lift	leverage	conviction
0	(바지)	(신발)	0.50	1.00	0.50	1.000000	1.000000	0.000	inf
1	(신발)	(바지)	1.00	0.50	0.50	0.500000	1.000000	0.000	1.000000
2	(팬티)	(바지)	0.75	0.50	0.50	0.666667	1.333333	0.125	1.500000
3	(바지)	(팬티)	0.50	0.75	0.50	1.000000	1.333333	0.125	inf
4	(양말)	(신발)	0.50	1.00	0.50	1.000000	1.000000	0.000	inf
5	(신발)	(양말)	1.00	0.50	0.50	0.500000	1.000000	0.000	1.000000
6	(팬티)	(신발)	0.75	1.00	0.75	1.000000	1.000000	0.000	inf
7	(신발)	(팬티)	1.00	0.75	0.75	0.750000	1.000000	0.000	1.000000
8	(팬티, 바지)	(신발)	0.50	1.00	0.50	1.000000	1.000000	0.000	inf
9	(팬티, 신발)	(바지)	0.75	0.50	0.50	0.666667	1.333333	0.125	1.500000
10	(바지, 신발)	(팬티)	0.50	0.75	0.50	1.000000	1.333333	0.125	inf
11	(팬티)	(바지, 신발)	0.75	0.50	0.50	0.666667	1.333333	0.125	1.500000
12	(바지)	(팬티, 신발)	0.50	0.75	0.50	1.000000	1.333333	0.125	inf
13	(신발)	(팬티, 바지)	1.00	0.50	0.50	0.500000	1.000000	0.000	1.000000

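Row 3 (바지 → 팬티) has a lift of about 1.33, which is greater than 1, so customers who bought pants (바지) are more likely than the average customer to also buy underwear (팬티).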
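To see where the numbers in row 3 come from, the metrics can be recomputed from their standard definitions: confidence(A → C) = support(A ∪ C) / support(A), lift = confidence / support(C), and leverage = support(A ∪ C) − support(A) · support(C). This is a rough sketch in plain Python; the `support` helper and the variable names are assumptions made for the illustration, not part of mlxtend.

```python
# Same four transactions as in the post.
dataset = [['양말', '팬티', '신발'],
           ['신발', '바지', '팬티', '셔츠'],
           ['모자', '양말', '신발'],
           ['신발', '바지', '팬티', '장갑']]


def support(itemset, transactions):
    """Hypothetical helper: fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset.issubset(t)) / len(transactions)


antecedent = {'바지'}   # pants
consequent = {'팬티'}   # underwear

support_both = support(antecedent | consequent, dataset)   # 2/4 = 0.5
confidence = support_both / support(antecedent, dataset)   # 0.5 / 0.5
lift = confidence / support(consequent, dataset)           # 1.0 / 0.75
leverage = support_both - support(antecedent, dataset) * support(consequent, dataset)

print(confidence, round(lift, 6), leverage)  # prints: 1.0 1.333333 0.125
```

A lift of exactly 1 would mean that buying pants tells us nothing about underwear. That is the case for every rule whose consequent is shoes (신발) alone: shoes appear in all four transactions, so knowing what else is in the basket cannot raise their probability.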

Source: http://blog.naver.com/PostView.nhn?blogId=eqfq1&logNo=221444712369&parentCategoryNo=&categoryNo=45&viewDate=&isShowPopularPosts=true&from=search
