8 Scorecard Modeling: End-to-End Workflow

Learning objectives

  1. Understand the scorecard modeling workflow
  2. Build a scorecard with the Toad library

1 Loading the data

import pandas as pd  
from sklearn.metrics import roc_auc_score,roc_curve,auc  
from sklearn.model_selection import train_test_split  
from sklearn.linear_model import LogisticRegression   
import numpy as np  
import math  
import xgboost as xgb  
import toad  
# Load the data
data_all = pd.read_csv("scorecard.txt")

# Columns excluded from training: user ID, sample-type flag, and target label
ex_lis = ['uid', 'samp_type', 'bad_ind']
# Feature columns used for training
ft_lis = [col for col in data_all.columns if col not in ex_lis]

# Development, validation, and out-of-time samples
dev = data_all[data_all['samp_type'] == 'dev']
val = data_all[data_all['samp_type'] == 'val']
off = data_all[data_all['samp_type'] == 'off']

Exploratory data analysis: toad.detector.detect summarizes numeric and string columns in a single call.

toad.detector.detect(data_all)

Output:

| column | type | size | missing | unique | mean_or_top1 | std_or_top2 | min_or_top3 | 1%_or_top4 | 10%_or_top5 | 50%_or_bottom5 | 75%_or_bottom4 | 90%_or_bottom3 | 99%_or_bottom2 | max_or_bottom1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bad_ind | float64 | 95806 | 0.00% | 2 | 0.0187671 | 0.135702 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| uid | object | 95806 | 0.00% | 95806 | Ab99_96002866062686144:0.00% | A7511004:0.00% | A10729014:0.00% | A8502810:0.00% | A594541:0.00% | A8899777:0.00% | A10150838:0.00% | A3044048:0.00% | A1888452:0.00% | A7659794:0.00% |
| td_score | float64 | 95806 | 0.00% | 95806 | 0.499739 | 0.288349 | 5.46966e-06 | 0.00961341 | 0.0997056 | 0.500719 | 0.747984 | 0.900024 | 0.990041 | 0.999999 |
| jxl_score | float64 | 95806 | 0.00% | 95806 | 0.499338 | 0.28885 | 1.28155e-05 | 0.00994678 | 0.0991025 | 0.499795 | 0.748646 | 0.899703 | 0.989348 | 0.999985 |
| mj_score | float64 | 95806 | 0.00% | 95806 | 0.50164 | 0.288679 | 6.92442e-06 | 0.0105076 | 0.100882 | 0.503048 | 0.752032 | 0.899308 | 0.990047 | 0.999993 |
| rh_score | float64 | 95806 | 0.00% | 95806 | 0.498407 | 0.287797 | 5.00212e-06 | 0.00991632 | 0.0999483 | 0.497466 | 0.747188 | 0.899286 | 0.989473 | 0.999986 |
| zzc_score | float64 | 95806 | 0.00% | 95806 | 0.500627 | 0.289067 | 1.15778e-05 | 0.0101856 | 0.0990114 | 0.501688 | 0.750986 | 0.899924 | 0.990043 | 0.999998 |
| zcx_score | float64 | 95806 | 0.00% | 95806 | 0.499672 | 0.289137 | 9.97767e-06 | 0.0103249 | 0.0997429 | 0.49913 | 0.750683 | 0.901942 | 0.989712 | 0.999987 |
| person_info | float64 | 95806 | 0.00% | 7 | -0.078229 | 0.156859 | -0.322581 | -0.322581 | -0.322581 | -0.0537176 | 0.078853 | 0.078853 | 0.078853 | 0.078853 |
| finance_info | float64 | 95806 | 0.00% | 35 | 0.0367625 | 0.0396866 | 0.0238095 | 0.0238095 | 0.0238095 | 0.0238095 | 0.0238095 | 0.0714286 | 0.214286 | 1.02381 |
| credit_info | float64 | 95806 | 0.00% | 100 | 0.0636262 | 0.143098 | 0 | 0 | 0 | 0 | 0.06 | 0.18 | 0.8 | 1 |
| act_info | float64 | 95806 | 0.00% | 74 | 0.236197 | 0.157132 | 0.0769231 | 0.0769231 | 0.0769231 | 0.205128 | 0.346154 | 0.487179 | 0.615385 | 1.08974 |
| samp_type | object | 95806 | 0.00% | 3 | dev:68.16% | off:16.67% | val:15.16% | None | None | None | None | dev:68.16% | off:16.67% | val:15.16% |

2 Feature selection (missing rate, IV, correlation)

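Features are filtered by missing rate, IV (information value), and correlation coefficient. Because the later modeling steps bin the variables, which lowers each variable's IV and raises the correlation between variables, the IV and correlation thresholds at this stage can be relaxed somewhat, or left out altogether.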

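# empty=0.7: drop features whose missing rate exceeds 0.7
# iv=0.03:   drop features whose IV is below 0.03
# corr=0.7:  among features correlated above 0.7, drop the one with the lower IV
dev_slct1, drop_lst = toad.selection.select(dev, dev['bad_ind'],
                                            empty=0.7, iv=0.03,
                                            corr=0.7,
                                            return_drop=True,
                                            exclude=ex_lis)
print("keep:", dev_slct1.shape[1],
      "drop empty:", len(drop_lst['empty']),
      "drop iv:", len(drop_lst['iv']),
      "drop corr:", len(drop_lst['corr']))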

Output (the 12 kept columns include the three excluded columns uid, samp_type, and bad_ind):

keep: 12 drop empty: 0 drop iv: 1 drop corr: 0
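
The remark in this section that binning lowers a variable's IV can be checked on a small example. The sketch below computes IV from per-bin good/bad counts using the usual WOE-based definition; the function name `information_value` and the toy counts are hypothetical and are not taken from scorecard.txt or from Toad's internals.

```python
import math


def information_value(good_counts, bad_counts):
    """Return the IV of one binned feature from per-bin good/bad counts."""
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    iv = 0.0
    for good, bad in zip(good_counts, bad_counts):
        # Share of all goods and of all bads that fall into this bin.
        good_share = good / total_good
        bad_share = bad / total_bad
        if good_share == 0 or bad_share == 0:
            # WOE is undefined when a class is absent; this sketch skips the bin.
            continue
        woe = math.log(bad_share / good_share)
        iv += (bad_share - good_share) * woe
    return iv


# Toy counts (hypothetical): four fine bins.
fine_iv = information_value([400, 300, 200, 100], [5, 10, 15, 20])
# The same data after merging bins 1+2 and bins 3+4 into two coarse bins.
coarse_iv = information_value([700, 300], [15, 35])
print(round(fine_iv, 4), round(coarse_iv, 4))
```

On this toy data the coarse binning yields a smaller IV than the fine one, which is why a strict IV cutoff before binning may be too aggressive or too lenient compared with the value seen after binning.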

3 Chi-square binning

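# Fit the split points with chi-square binning;
# min_samples=0.05 requires each bin to hold at least 5% of the samples
combiner = toad.transform.Combiner()
combiner.fit(dev_slct1, dev_slct1['bad_ind'], method='chi',
             min_samples=0.05, exclude=ex_lis)
# Export the split points of each feature
bins = combiner.export()
print(bins)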

Output:

{'td_score': [0.7989831262724624], 'jxl_score': [0.4197048501965005], 'mj_score': [0.3615303943747963], 'zzc_score': [0.4469861520889339], 'zcx_score': [0.7007847486465795], 'person_info': [-0.2610139784946237, -0.1286774193548387, -0.05371756272401434, 0.013863440860215051, 0.06266021505376344, 0.07885304659498207], 'finance_info': [0.047619047619047616], 'credit_info': [0.02, 0.04, 0.11], 'act_info': [0.1153846153846154, 0.14102564102564102, 0.16666666666666666, 0.20512820512820512, 0.2692307692307692, 0.35897435897435903, 0.3974358974358974, 0.5256410256410257]}
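
To give a feel for what chi-square binning does, the sketch below shows one merge step in the general ChiMerge style: compute the Pearson chi-square statistic for each pair of adjacent bins and merge the pair with the smallest value, since those two bins have the most similar bad rates. This is a simplified illustration with hypothetical helper names and toy counts; it does not reproduce Toad's actual implementation or its stopping rules such as `min_samples`.

```python
def chi_square(bin_a, bin_b):
    """Pearson chi-square for two adjacent bins, each given as (good, bad)."""
    rows = [bin_a, bin_b]
    total = sum(sum(row) for row in rows)
    col_totals = [bin_a[0] + bin_b[0], bin_a[1] + bin_b[1]]
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            # Expected count if both bins shared the same bad rate.
            expected = row_total * col_total / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def merge_most_similar(bin_counts):
    """Merge the adjacent pair with the smallest chi-square; return a new list."""
    scores = [chi_square(bin_counts[i], bin_counts[i + 1])
              for i in range(len(bin_counts) - 1)]
    i = scores.index(min(scores))
    merged = (bin_counts[i][0] + bin_counts[i + 1][0],
              bin_counts[i][1] + bin_counts[i + 1][1])
    return bin_counts[:i] + [merged] + bin_counts[i + 2:]


# Toy (good, bad) counts per bin, ordered by feature value (hypothetical).
toy_bins = [(480, 20), (470, 30), (400, 100)]
merged_bins = merge_most_similar(toy_bins)
print(merged_bins)
```

Here the first two bins have bad rates of 4% and 6%, so they are merged, while the third bin (20% bad rate) stays separate. Repeating this step until a stopping condition is met yields a small number of split points per feature, as in the exported dictionary.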

4 Bivar plots and bin adjustment

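Plot the Bivar chart of each variable on the development sample and on the out-of-time sample. To keep this readable, only the single variable act_info is used as a demonstration.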

# Apply the fitted split points to each sample
dev_slct2 = combiner.transform(dev_slct1)
val2 = combiner.transform(val[dev_slct1.columns])
off2 = combiner.transform(off[dev_slct1.columns])
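
What a Bivar chart displays is the bad rate of each bin, so the numbers behind it can be sketched without any plotting. The example below assigns values to bins from a list of split points, computes the bad rate per bin for a development and an out-of-time sample, and checks whether the trend is monotonic. The helper names, the split points, and the toy data are hypothetical; the convention that a value equal to a split point falls into the upper bin is an assumption of this sketch.

```python
from bisect import bisect_right


def bad_rate_by_bin(values, labels, splits):
    """Return the bad rate of each bin; None marks an empty bin."""
    n_bins = len(splits) + 1
    totals = [0] * n_bins
    bads = [0] * n_bins
    for value, label in zip(values, labels):
        # Assumed convention: a value equal to a split goes to the upper bin.
        idx = bisect_right(splits, value)
        totals[idx] += 1
        bads[idx] += label
    return [b / t if t else None for b, t in zip(bads, totals)]


def is_monotonic(rates):
    """True if the non-empty bins' bad rates only rise or only fall."""
    rates = [r for r in rates if r is not None]
    pairs = list(zip(rates, rates[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# Toy data (hypothetical): three bins separated at 0.2 and 0.4.
toy_splits = [0.2, 0.4]
toy_values = [0.1] * 4 + [0.3] * 4 + [0.5] * 4
dev_rates = bad_rate_by_bin(toy_values, [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0], toy_splits)
off_rates = bad_rate_by_bin(toy_values, [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0], toy_splits)
print(dev_rates, is_monotonic(dev_rates))
print(off_rates, is_monotonic(off_rates))
```

In this toy case the trend is monotonic on the development sample but not on the out-of-time sample, which is the kind of pattern that would suggest merging neighboring bins before continuing.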

Original article: https://blog.csdn.net/weixin_34280060/article/details/138904658