
Deep Learning, Pattern Recognition, Big Data

Big Data Analysis Engineer Exam, 3rd Session: Past Problem Walkthrough (2)

 
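# Using the traveler insurance dataset below, predict whether each traveler purchases the travel insurance product, based on the traveler's information.
## (Save the ID and the predicted value to a csv file and submit it.)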
 
 
import pandas as pd

# Load the provided train and test files
test = pd.read_csv("3rd_TravelInsurancePrediction_test.csv")
train = pd.read_csv("3rd_TravelInsurancePrediction_train.csv")
# Feature columns used for prediction, and the target column
X = train[['Age', 'Employment Type', 'GraduateOrNot', 'AnnualIncome', 'FamilyMembers', 'ChronicDiseases', 'FrequentFlyer', 'EverTravelledAbroad']]
y = train[['TravelInsurance']]

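# Split the training features into numeric and categorical groups, then one-hot encode the categorical ones
X_num = X[['Age', 'AnnualIncome', 'FamilyMembers', 'ChronicDiseases']]
X_cat = X[['Employment Type', 'GraduateOrNot', 'FrequentFlyer', 'EverTravelledAbroad']]
X_cat = pd.get_dummies(X_cat)

# Apply the same split and encoding to the test data
test_num = test[['Age', 'AnnualIncome', 'FamilyMembers', 'ChronicDiseases']]
test_cat = test[['Employment Type', 'GraduateOrNot', 'FrequentFlyer', 'EverTravelledAbroad']]
test_cat = pd.get_dummies(test_cat)

# Keep only the dummy columns present in both train and test so the two feature sets match
X_cat, test_cat = X_cat.align(test_cat, join='inner', axis=1)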

from sklearn.preprocessing import MinMaxScaler
# Fit the scaler on the training data only, then apply the same scaling to train and test
minmax = MinMaxScaler()
minmax.fit(X_num)
X_scaled = minmax.transform(X_num)
test_scaled = minmax.transform(test_num)

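# Restore the original column names and index on the scaled arrays before concatenating;
# otherwise the columns are named 0..3, and recent scikit-learn versions reject mixed int/str feature names
X_final = pd.concat([pd.DataFrame(X_scaled, columns=X_num.columns, index=X_num.index), X_cat], axis=1)
test_final = pd.concat([pd.DataFrame(test_scaled, columns=test_num.columns, index=test_num.index), test_cat], axis=1)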

# Flatten the single-column DataFrame into a Series (1-D target for fit)
y = y['TravelInsurance']

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_final, y)

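# Column 1 of predict_proba is the probability of class 1 (TravelInsurance = 1)
pred_test = model.predict_proba(test_final)
pred_test_prob = pd.DataFrame(pred_test[:, 1], columns=['predict_prob'])
# Pair each test ID with its predicted probability and write the submission file
final_predict = pd.concat([test['ID'], pred_test_prob], axis=1)
# print(final_predict)
final_predict.to_csv("20211204.csv", index=False)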
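
The `align(..., join='inner', axis=1)` step above is easy to skim past, so here is a small sketch of what it does. The toy values ("Government", "Private") are made up for illustration and are not taken from the exam data. When a category appears in train but not in test, `get_dummies` produces different columns for the two frames, and the inner align keeps only the shared ones.

```python
import pandas as pd

# Toy data (hypothetical values): the test split has no "Private" rows
toy_train = pd.DataFrame({"Employment Type": ["Government", "Private", "Government"]})
toy_test = pd.DataFrame({"Employment Type": ["Government", "Government"]})

train_dummies = pd.get_dummies(toy_train)
test_dummies = pd.get_dummies(toy_test)
print(list(train_dummies.columns))  # train has one dummy column per category it saw
print(list(test_dummies.columns))   # test is missing the "Private" dummy

# Inner join on columns: keep only the dummy columns present in both frames
train_aligned, test_aligned = train_dummies.align(test_dummies, join="inner", axis=1)
print(list(train_aligned.columns))

# Alternative (not what the post does): keep every train column and
# fill the ones missing from test with 0
test_reindexed = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)
print(list(test_reindexed.columns))
```

The inner join is the simpler option, but it throws away a training column whenever test lacks that category. The `reindex` variant keeps all training columns, which may be preferable if the dropped category is informative.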

Checking the result (note that this is accuracy on the training data, not on unseen data)

model.score(X_final, y)

0.7621585609593604
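
Since the score above is measured on the same rows the model was trained on, it can be optimistic. A rough sketch of a hold-out check follows. It uses synthetic data from `make_classification` as a stand-in because the exam CSV is not reproduced here, and looking at ROC-AUC is my assumption (it seems a natural fit because the submission contains probabilities), not something stated in the problem.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared features and target (assumption, not the exam data)
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out 20% of the labeled rows for validation, keeping the class ratio
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_tr, y_tr)

train_acc = clf.score(X_tr, y_tr)   # accuracy on rows the model has seen
val_acc = clf.score(X_val, y_val)   # accuracy on held-out rows
val_auc = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])  # ranking quality of the probabilities

print(f"train accuracy: {train_acc:.3f}")
print(f"validation accuracy: {val_acc:.3f}")
print(f"validation ROC-AUC: {val_auc:.3f}")
```

With the real data, the same pattern would apply to `X_final` and `y` in place of `X_demo` and `y_demo`; after checking the validation numbers, the model can be refit on all training rows before predicting the test file.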