Which products are most closely associated with baby items?
In [41]:
# https://www.kaggle.com/c/instacart-market-basket-analysis/data
In [2]:
from glob import glob
import pandas as pd
In [3]:
glob("*")
Out[3]:
In [4]:
pd.read_csv('products.csv').head()
Out[4]:
In [5]:
pd.read_csv('order_products__train.csv').head()
Out[5]:
In [6]:
pd.read_csv('order_products__prior.csv').head()
Out[6]:
In [7]:
pd.read_csv('orders.csv').head()
Out[7]:
In [80]:
pd.read_csv('departments.csv').head(100)
Out[80]:
In [9]:
pd.read_csv('aisles.csv').head()
Out[9]:
In [89]:
# Map every product to its aisle name (join key: aisle_id)
aisle = pd.read_csv('aisles.csv')
pdt = pd.read_csv('products.csv')[['product_id', 'aisle_id']]
pd_ai = pd.merge(aisle, pdt, on='aisle_id')
pd_ai.head()
Out[89]:
In [90]:
# Stack the train and prior order lines into one table of purchased items
d_1 = pd.read_csv('order_products__train.csv')[['order_id', 'product_id', 'reordered']]
d_2 = pd.read_csv('order_products__prior.csv')[['order_id', 'product_id', 'reordered']]
order_product = pd.concat([d_1, d_2], ignore_index=True)
order_product.head()
Out[90]:
In [91]:
# Attach the aisle name to every purchased item (join key: product_id)
combined = pd.merge(order_product, pd_ai, on='product_id')[['order_id', 'aisle', 'reordered']]
combined.head()
Out[91]:
In [92]:
user_order = pd.read_csv('orders.csv')[['user_id', 'order_id']]
user_order.head()
Out[92]:
In [94]:
# Attach the user to every purchased item (join key: order_id)
final = pd.merge(user_order, combined, on='order_id')[['user_id', 'aisle', 'reordered']]
print(final.shape)
final.head()
Out[94]:
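The merge chain above can be checked on a small scale. The following is a minimal sketch with made-up toy tables (the values and the names `product_aisle`, `per_order`, and `per_user` are assumptions for illustration, not Instacart data). It shows that the default inner join silently drops order lines whose product has no match.

```python
import pandas as pd

# Toy stand-ins for the Instacart tables (all values are made up).
aisles = pd.DataFrame({"aisle_id": [1, 2],
                       "aisle": ["baby accessories", "fresh fruits"]})
products = pd.DataFrame({"product_id": [10, 11, 12], "aisle_id": [1, 2, 2]})
order_products = pd.DataFrame({
    "order_id": [100, 100, 101, 102],
    "product_id": [10, 11, 12, 99],  # product 99 has no row in `products`
    "reordered": [1, 0, 1, 1],
})
orders = pd.DataFrame({"user_id": [7, 7, 8], "order_id": [100, 101, 102]})

# Same three-step join as in the notebook, with explicit keys.
product_aisle = pd.merge(aisles, products, on="aisle_id")
per_order = pd.merge(order_products, product_aisle, on="product_id")
per_user = pd.merge(orders, per_order, on="order_id")[["user_id", "aisle", "reordered"]]

print(per_user)
```

Because `pd.merge` defaults to `how="inner"`, the order line for product 99 disappears, and with it user 8. Comparing row counts before and after each merge is a cheap way to notice such losses.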
In [95]:
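%matplotlib inline
# Per-user statistics on the reorder flag: number of purchased items, mean reorder rate, std
user_count = final.groupby('user_id').agg({'reordered': ['size', 'mean', 'std']})
# Distribution of the number of purchased items per user
user_count.hist(('reordered', 'size'), bins=500)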
print(user_count.shape)
In [102]:
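# Flag "heavy" users: more than 50 purchased items in total
user_count.loc[user_count[('reordered', 'size')] > 50, 'heavy'] = 1
# Keep only the flagged users and look at their distribution again
grouped_idx = user_count[user_count['heavy'] == 1]
grouped_idx.hist(('reordered', 'size'), bins=500)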
print(grouped_idx.shape)
In [103]:
final_filtered = final.loc[final['user_id'].isin(grouped_idx.index)]
final_filtered.head()
Out[103]:
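The heavy-user filter can also be written without a helper column. This is a minimal sketch on toy data; the threshold `min_rows = 2` and the names `stats`, `heavy_ids`, and `toy_filtered` are hypothetical, chosen only so the example stays small (the notebook uses 50).

```python
import pandas as pd

# Toy purchase log: one row per purchased item (values are made up).
toy = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 3, 3],
    "reordered": [1, 0, 1, 0, 1, 1],
})

min_rows = 2  # hypothetical threshold; the notebook uses 50

# Number of purchased items and mean reorder rate per user.
stats = toy.groupby("user_id")["reordered"].agg(["size", "mean"])

# Users with more than `min_rows` purchased items.
heavy_ids = stats.index[stats["size"] > min_rows]

# Keep only the rows that belong to those users.
toy_filtered = toy[toy["user_id"].isin(heavy_ids)]

print(stats)
print(toy_filtered)
```

Filtering on the index of the aggregated table keeps the original row-level data intact, which is what the later pivot needs.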
In [104]:
print(final.shape)
print(final_filtered.shape)
In [105]:
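# User x aisle matrix; pivot_table averages by default, so each cell is the
# user's mean reorder rate in that aisle (NaN if the user never bought there)
pvt = final_filtered.pivot_table(index='user_id', columns='aisle', values='reordered')
pvt.head()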
Out[105]:
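To make the content of the pivot concrete, here is a minimal sketch on made-up toy rows (the frame `toy` and its values are assumptions for illustration). Each cell is the mean of `reordered` for one user in one aisle, because `pivot_table` aggregates with the mean unless told otherwise.

```python
import pandas as pd

# Toy version of `final_filtered`: one row per purchased item (values are made up).
toy = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "aisle": ["baby accessories", "baby accessories", "fresh fruits",
              "fresh fruits", "fresh fruits"],
    "reordered": [1, 0, 1, 0, 0],
})

# Default aggfunc is the mean, so cells hold per-user reorder rates per aisle.
toy_pvt = toy.pivot_table(index="user_id", columns="aisle", values="reordered")

print(toy_pvt)
```

User 1 reordered one of two baby-accessory items, so that cell is 0.5. User 2 never bought from that aisle, so the cell is NaN rather than 0. The matrix therefore measures how often users re-buy within an aisle, not how much they buy from it.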
In [106]:
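# Pairwise Pearson correlation between aisles, computed across users
# (for each pair, only users with a value in both aisles are used)
corr = pvt.corr()
corr.head()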
Out[106]:
In [107]:
# Aisles ranked by their correlation with 'baby accessories' (the top entry is the aisle itself, at 1.0)
corr['baby accessories'].sort_values(ascending=False)
Out[107]:
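The ranking step can be sketched on a toy user x aisle matrix. The numbers below are made up, and `min_periods=3` is an assumed setting rather than something the notebook uses. The sketch drops the aisle's correlation with itself and requires a minimum number of shared users per pair, since correlations based on very few users are unreliable.

```python
import pandas as pd

# Toy user x aisle matrix of reorder rates (values are made up).
toy_pvt = pd.DataFrame(
    {
        "baby accessories": [0.1, 0.4, 0.5, 0.9],
        "diapers wipes": [0.2, 0.5, 0.6, 1.0],  # moves together with the first column
        "spirits": [0.9, 0.6, 0.5, 0.1],        # moves in the opposite direction
    },
    index=pd.Index([1, 2, 3, 4], name="user_id"),
)

# Pearson correlation; pairs with fewer than 3 shared users would become NaN.
toy_corr = toy_pvt.corr(min_periods=3)

# Rank the other aisles against 'baby accessories', excluding the aisle itself.
ranking = (
    toy_corr["baby accessories"]
    .drop("baby accessories")
    .sort_values(ascending=False)
)

print(ranking)
```

In this toy matrix the first two columns rise and fall together, so their correlation is close to 1, while the third is close to -1. On the real data the values will be far less extreme, and a high correlation only says that users' reorder rates in the two aisles move together.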