
Chapter 0: Distance

The concept of distance

In [1]:
%pylab inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Populating the interactive namespace from numpy and matplotlib
In [2]:
s = open('iris.csv').readline()
#header = [i.strip('"') for i in s.strip().split(',')][:-1]
header = s.strip().split(',')[:-1]
header
Out[2]:
['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
In [3]:
labels = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
iris = np.loadtxt('iris.csv', delimiter=',', skiprows=1, converters={4: lambda s: labels.index(s.decode())})
In [4]:
display(iris.shape, iris[:5])
(150, 5)
array([[5.1, 3.5, 1.4, 0.2, 0. ],
       [4.9, 3. , 1.4, 0.2, 0. ],
       [4.7, 3.2, 1.3, 0.2, 0. ],
       [4.6, 3.1, 1.5, 0.2, 0. ],
       [5. , 3.6, 1.4, 0.2, 0. ]])
In [5]:
X = iris[:,:4]
y = iris[:,4]
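pandas is imported above but not used; as an aside, the same file could be loaded with pd.read_csv (a minimal sketch, assuming iris.csv keeps the species name in its last column, which the converter above already relies on; df, X_df and y_df are illustrative names):

df = pd.read_csv('iris.csv')
X_df = df.iloc[:, :4].to_numpy()                    # the four numeric features
y_df = df.iloc[:, 4].map(labels.index).to_numpy()   # encode the species as 0, 1, 2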

Formula for computing the distance between two points

$$ \mathrm{distance}(x_1, x_2) = \sqrt{\sum_{i=0}^{N-1} (x_{1i} - x_{2i})^2}, \quad N = \text{number of features} $$
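The formula translates directly into a small helper function (a minimal sketch; the name euclidean_distance is only for illustration):

def euclidean_distance(x1, x2):
    # Euclidean (L2) distance between two equal-length feature vectors
    x1, x2 = np.asarray(x1), np.asarray(x2)
    return np.sqrt(((x1 - x2) ** 2).sum())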

In [8]:
x1 = X[0]
x2 = X[49]
display(x1, x2)

distance = np.sqrt(((x1-x2)**2).sum())
distance
array([5.1, 3.5, 1.4, 0.2])
array([5. , 3.3, 1.4, 0.2])
Out[8]:
0.22360679774997896
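np.linalg.norm returns the same value, since its default is the L2 norm of the difference vector:

np.linalg.norm(x1 - x2)   # 0.22360679774997896, same as the manual computation above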
In [8]:
x1 = X[0][:2]   # compare X[0] and X[1] this time, using only the first two features
x2 = X[1][:2]

display(x1, x2)
plt.plot([x1[0],x2[0]], [x1[1],x2[1]], 'bo--')
plt.axis('equal')
array([5.1, 3.5])
array([4.9, 3. ])
Out[8]:
(4.890000000000001, 5.109999999999999, 2.975, 3.525)
In [9]:
distance = np.sqrt(((x1-x2)**2).sum())
distance
Out[9]:
0.5385164807134502
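In two dimensions the distance is just the length of the hypotenuse, so np.hypot gives the identical result:

np.hypot(x1[0] - x2[0], x1[1] - x2[1])   # 0.5385164807134502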

Computing the distances between one point and all the other points

In [9]:
distances = []
for i in range(150):
    n = np.sqrt(((X[0]-X[i])**2).sum()) # distance between X[0] and X[i]
    distances.append(n)
distances
Out[9]:
[0.0,
 0.5385164807134502,
 0.509901951359278,
 0.648074069840786,
 0.1414213562373093,
 ...
 4.650806381693394,
 4.1400483088968905]
In [10]:
plt.plot(distances)
Out[10]:
[<matplotlib.lines.Line2D at 0x1a3bd00c0f0>]
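The same list can also be built with a one-line list comprehension, equivalent to the loop above:

distances = [np.sqrt(((X[0] - x) ** 2).sum()) for x in X]   # one distance per sample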
In [11]:
distance = np.sqrt(((X - X[0])**2).sum(axis=1))
distance[:5]
Out[11]:
array([0.        , 0.53851648, 0.50990195, 0.64807407, 0.14142136])
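If SciPy is available, scipy.spatial.distance.cdist produces the same distances between X[0] and every row of X (shown here only as an aside, not used below):

from scipy.spatial.distance import cdist
cdist(X[:1], X)[0][:5]   # array([0.        , 0.53851648, 0.50990195, 0.64807407, 0.14142136])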
In [25]:
plt.figure(figsize=[10,4])
plt.plot(distance, 'bo-')
plt.xlabel('samples', fontsize=15)
plt.ylabel('distance', fontsize=15)
plt.xticks(range(0,151,25), [0, 'Setosa', 50, 'Versicolor', 100, 'Virginica', 150])
plt.yticks(range(0,8))
plt.grid()
# plt.vlines([50,100], 0, 10, linestyles='dotted')
plt.title('distances from X[0]', fontsize=20)
Out[25]:
Text(0.5,1,'distances from X[0]')
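These distances are the raw material for nearest-neighbor reasoning; for instance, sorting them shows which samples lie closest to X[0] (a small illustrative sketch):

nearest = np.argsort(distance)[:6]   # indices of the 6 samples closest to X[0] (X[0] itself comes first)
display(nearest, distance[nearest])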
