Machine Learning in Practice, Notes 30-31: Logistic Regression and the Accompanying Hyperparameter Tuning Experiment Code

🕗 Published 2024-11-17 13:50 | Machine learning, notes, logistic regression, artificial intelligence, algorithms

Logistic regression model parameters in detail

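dual: whether to solve the dual formulation of the problem. It can be set to True when the dataset has few samples but many features. In scikit-learn the dual formulation is only implemented for the L2 penalty with the liblinear solver.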

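max_iter and tol: in general, max_iter can be set larger and tol smaller. A tighter tolerance takes more iterations to reach, so the two are usually adjusted together.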
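To make the interplay between tol and max_iter concrete, here is a minimal sketch in plain Python. It is not scikit-learn's solver: it is ordinary gradient descent on a tiny 1-D dataset, and the function name `fit_logistic_gd`, the learning rate, and the toy data are all assumptions made for illustration. It only shows the general pattern: a smaller tol needs at least as many iterations, and a small max_iter can stop the optimizer before it converges.

```python
import math

def fit_logistic_gd(xs, ys, tol=1e-4, max_iter=10_000, lr=0.5):
    """Fit a 1-D logistic regression by plain gradient descent.

    Stops when the largest absolute gradient component falls below `tol`,
    or after `max_iter` iterations, whichever comes first.
    Returns (weight, bias, iterations_used, converged).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for it in range(1, max_iter + 1):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted P(y=1)
            gw += (p - y) * x / n
            gb += (p - y) / n
        if max(abs(gw), abs(gb)) < tol:
            return w, b, it, True
        w -= lr * gw
        b -= lr * gb
    return w, b, max_iter, False

# Toy data (hypothetical): not linearly separable, so the optimum is finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

loose = fit_logistic_gd(xs, ys, tol=1e-2)
tight = fit_logistic_gd(xs, ys, tol=1e-6)
capped = fit_logistic_gd(xs, ys, tol=1e-6, max_iter=5)

print("iterations with tol=1e-2:", loose[2])
print("iterations with tol=1e-6:", tight[2])
print("converged with max_iter=5:", capped[3])
```

In scikit-learn, hitting max_iter before the tolerance is met typically shows up as a ConvergenceWarning, which is the usual signal to raise max_iter.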

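class_weight: passing {0:1, 1:3} means that the loss term of every class-1 sample is multiplied by 3. Passing 'balanced' sets the weights inversely proportional to the class frequencies in the training data so that the classes are balanced, although this is not commonly used in practice.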
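The two class_weight options can be sketched in plain Python. The 'balanced' formula used here, n_samples / (n_classes * count), is the one given in the scikit-learn documentation; the helper names, the toy labels, and the predicted probabilities are hypothetical, and the loss is a simple weighted sum rather than the library's exact objective.

```python
from collections import Counter
import math

def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

def weighted_log_loss(labels, probs, class_weight):
    """Sum of per-sample log losses, each multiplied by the weight of the sample's class."""
    total = 0.0
    for y, p in zip(labels, probs):
        term = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += class_weight[y] * term
    return total

labels = [0, 0, 0, 1]           # 3:1 class imbalance
probs = [0.2, 0.2, 0.2, 0.6]    # hypothetical predicted P(y=1)

manual = {0: 1, 1: 3}           # the dictionary form from the notes
balanced = balanced_class_weights(labels)
unweighted = {0: 1, 1: 1}

print(balanced)                 # class 1 ends up with 3x the weight of class 0
print(weighted_log_loss(labels, probs, unweighted))
print(weighted_log_loss(labels, probs, manual))
```

On this 3:1 dataset, 'balanced' produces the same 1:3 ratio as the manual dictionary, only scaled differently.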

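multi_class: the default is 'auto', in which case the model chooses between OvR (one-vs-rest) and the multinomial strategy (called MvM in these notes). According to the scikit-learn documentation, 'auto' selects 'ovr' when the target is binary or the solver is liblinear, and 'multinomial' otherwise.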
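The difference between the two strategies can be illustrated by how class scores are turned into probabilities. This is a simplified sketch in plain Python: the function names and the scores are hypothetical, and in a real OvR model each score would come from a separately trained binary classifier rather than from one shared vector.

```python
import math

def ovr_probabilities(scores):
    """One-vs-rest: squash each class score with a sigmoid, then renormalize to sum to 1."""
    sig = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(sig)
    return [v / total for v in sig]

def multinomial_probabilities(scores):
    """Multinomial: apply a softmax across all class scores jointly."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]

scores = [2.0, 0.5, -1.0]  # hypothetical decision scores for three classes
p_ovr = ovr_probabilities(scores)
p_mn = multinomial_probabilities(scores)

print([round(p, 3) for p in p_ovr])
print([round(p, 3) for p in p_mn])
```

Both variants rank the classes the same way here, but the softmax concentrates more probability on the top-scoring class.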

solver:

Logistic regression hyperparameter tuning experiment

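First, a look at PolynomialFeatures. Its parameters are degree (the highest polynomial degree), interaction_only (whether to generate only interaction terms, dropping powers of a single feature such as x1^2), and include_bias (whether to include the degree-0 column of ones, which acts as the bias term).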
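The effect of these three parameters can be seen by enumerating the generated terms. The sketch below is plain Python and does not call scikit-learn; `polynomial_terms` and `term_name` are hypothetical helpers that mimic the documented behavior by listing which monomials would be produced.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

def polynomial_terms(n_features, degree, interaction_only=False, include_bias=True):
    """List monomials as tuples of feature indices, e.g. (0, 1) stands for x0*x1."""
    pick = combinations if interaction_only else combinations_with_replacement
    start = 0 if include_bias else 1   # degree 0 is the bias column of ones
    terms = []
    for d in range(start, degree + 1):
        terms.extend(pick(range(n_features), d))
    return terms

def term_name(term):
    """Readable name for a monomial; the empty tuple is the bias term."""
    return "1" if not term else "*".join(f"x{i}" for i in term)

full = polynomial_terms(2, 2, include_bias=False)
inter = polynomial_terms(2, 2, interaction_only=True, include_bias=False)
deg10 = polynomial_terms(2, 10, include_bias=False)

print([term_name(t) for t in full])    # x0, x1, x0*x0, x0*x1, x1*x1
print([term_name(t) for t in inter])   # x0, x1, x0*x1
print(len(deg10), comb(12, 2) - 1)     # both are 65
```

With two input features, degree=2 and include_bias=False give five columns, which is why the coef_ array printed in the experiment has five entries. At degree=10 the model already has 65 features, which helps explain the overfitting tendency explored later.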

The experiment code is as follows:

#!/usr/bin/env python

# coding: utf-8

# In[1]:

import numpy as np

import pandas as pd

# In[2]:

import matplotlib as mpl

import matplotlib.pyplot as plt

# In[3]:

from sklearn.preprocessing import StandardScaler

from sklearn.preprocessing import PolynomialFeatures

from sklearn.linear_model import LogisticRegression

from sklearn.pipeline import make_pipeline

# In[4]:

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score

# In[5]:

np.random.seed(24)

x=np.random.normal(0,1,size=(1000,2))

# In[6]:

y=np.array(x[:,0]+x[:,1]**2<1.5,int)

# In[7]:

plt.scatter(x[:,0],x[:,1],c=y)  # c sets the point colors (here, by class label)

# In[8]:

np.random.seed(24)

# Inject label noise: 200 random positions are set to 1 and 200 to 0
for i in range(200):
    y[np.random.randint(1000)]=1
    y[np.random.randint(1000)]=0

# In[9]:

plt.scatter(x[:,0],x[:,1],c=y)

# In[10]:

x_train,x_test,y_train,y_test=train_test_split(x,y,train_size=0.7,random_state=42)

# In[26]:

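def plr(degree=1,penalty=None,C=1.0):
    # Pipeline: polynomial features -> standardization -> logistic regression.
    # penalty=None disables regularization (scikit-learn >= 1.2; older releases used the string 'none').
    pipe=make_pipeline(PolynomialFeatures(degree=degree,include_bias=False),
                       StandardScaler(),
                       LogisticRegression(penalty=penalty,tol=1e-4,C=C,max_iter=int(1e4)))
    return pipe

# max_iter caps the number of solver iterations (1e4 here), tol is the optimizer's convergence tolerance, and C is the inverse of the regularization strength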

# In[27]:

pl1=plr()

# In[28]:

pl1.fit(x_train,y_train)

# In[29]:

pl1.score(x_train,y_train),pl1.score(x_test,y_test)

# In[37]:

def plot_decision_boundary(x,y,model):
    # Build a dense grid covering the data range, padded by 1 on each side
    x1,x2=np.meshgrid(
        np.linspace(x[:,0].min()-1,x[:,0].max()+1,1000).reshape(-1,1),
        np.linspace(x[:,1].min()-1,x[:,1].max()+1,1000).reshape(-1,1))
    x_temp=np.concatenate([x1.reshape(-1,1),x2.reshape(-1,1)],1)
    # Predict a class for every grid point and reshape back to the grid
    yhat_temp=model.predict(x_temp)
    yhat=yhat_temp.reshape(x1.shape)
    from matplotlib.colors import ListedColormap
    custom_cmap=ListedColormap(['#EF9A9A','#90CAF9'])
    plt.contourf(x1,x2,yhat,cmap=custom_cmap)
    # Overlay the samples, one color per class (the original drew both classes in red)
    plt.scatter(x[(y==0).flatten(),0],x[(y==0).flatten(),1],color='red')
    plt.scatter(x[(y==1).flatten(),0],x[(y==1).flatten(),1],color='blue')

# In[38]:

plot_decision_boundary(x,y,pl1)

# In[39]:

# Next, fit a model on degree-2 features:

pr2=plr(degree=2)

# In[40]:

pr2.fit(x_train,y_train)

# In[41]:

pr2.score(x_train,y_train),pr2.score(x_test,y_test)

# (0.7914285714285715, 0.7866666666666666), roughly a 10% improvement in score

# In[43]:

plot_decision_boundary(x,y,pr2)

# In[45]:

# How to inspect the fitted coefficients

pr2.named_steps['logisticregression'].coef_

#array([[-0.81012988, 0.04384694, -0.48583038, 0.02977868, -1.12352417]])

# In[46]:

# Overfitting tendency experiment

pr3=plr(degree=10)

pr3.fit(x_train,y_train)

pr3.score(x_train,y_train),pr3.score(x_test,y_test)

#(0.8314285714285714, 0.78)

# In[47]:

plot_decision_boundary(x,y,pr3)

# In[48]:

# Record train and test accuracy for each degree from 1 to 20
score_l=[]
for degree in range(1,21):
    pr_temp=plr(degree=degree)
    pr_temp.fit(x_train,y_train)
    score_temp=[pr_temp.score(x_train,y_train),pr_temp.score(x_test,y_test)]
    score_l.append(score_temp)

# In[49]:

np.array(score_l)

# In[54]:

# Plot how the accuracy changes with degree

plt.plot(list(range(1,21)),np.array(score_l)[:,0],label='train_acc')

plt.plot(list(range(1,21)),np.array(score_l)[:,1],label='test_acc')

plt.legend(loc=4)  # loc=4 places the legend at the lower right

# In[55]:

# Manual tuning: try L1 regularization

pl1=plr(degree=10,penalty='l1',C=1.0)

# In[57]:

pl1.set_params(logisticregression__solver='saga')

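pl1.fit(x_train,y_train)  # fitting without the set_params call above raises an error: the default solver (lbfgs) does not support L1, so the solver is switched to saga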

# In[58]:

pl1.score(x_train,y_train),pl1.score(x_test,y_test)

# In[62]:

# Enumerate the hyperparameters to search: degree, C, and the penalty type
score_l1=[]
for degree in range(1,21):
    pr_temp=plr(degree=degree,penalty='l1')
    pr_temp.set_params(logisticregression__solver='saga')
    pr_temp.fit(x_train,y_train)
    score_temp=[pr_temp.score(x_train,y_train),pr_temp.score(x_test,y_test)]
    score_l1.append(score_temp)

plt.plot(list(range(1,21)),np.array(score_l1)[:,0],label='train_acc')
plt.plot(list(range(1,21)),np.array(score_l1)[:,1],label='test_acc')
plt.legend(loc=4)  # legend at the lower right

# In[63]:

score_l1  # printing shows that degree=3 scores best, so degree=3 is used for the search that follows

# In[64]:

# Same sweep over degree, this time with the L2 penalty
score_l2=[]
for degree in range(1,21):
    pr_temp=plr(degree=degree,penalty='l2')
    pr_temp.set_params(logisticregression__solver='saga')
    pr_temp.fit(x_train,y_train)
    score_temp=[pr_temp.score(x_train,y_train),pr_temp.score(x_test,y_test)]
    score_l2.append(score_temp)

plt.plot(list(range(1,21)),np.array(score_l2)[:,0],label='train_acc')
plt.plot(list(range(1,21)),np.array(score_l2)[:,1],label='test_acc')
plt.legend(loc=4)  # legend at the lower right

# In[66]:

score_l2  # printing shows that degree=15 scores best with the L2 penalty

# In[72]:

# Try different values of C with degree=3 and the L1 penalty
score_l1_3=[]
for c in np.arange(0.5,2,0.1):
    pr_temp=plr(degree=3,penalty='l1',C=c)
    pr_temp.set_params(logisticregression__solver='saga')
    pr_temp.fit(x_train,y_train)
    score_temp=[pr_temp.score(x_train,y_train),pr_temp.score(x_test,y_test)]
    score_l1_3.append(score_temp)

plt.plot(list(np.arange(0.5,2,0.1)),np.array(score_l1_3)[:,0],label='train_acc')
plt.plot(list(np.arange(0.5,2,0.1)),np.array(score_l1_3)[:,1],label='test_acc')
plt.legend(loc=4)  # legend at the lower right

# In[73]:

score_l1_3  # so the best accuracy is around 0.8
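The manual loops above pick the best degree, penalty, and C by looking at the test-set score, which lets the test data influence the choice. One common alternative, sketched here as an assumption and not as part of the original experiment, is to search all three hyperparameters jointly with cross-validation on the training split using scikit-learn's GridSearchCV. The grid values are illustrative and much smaller than the ranges swept above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Rebuild the synthetic data following the recipe used in the notes.
rng = np.random.RandomState(24)
x = rng.normal(0, 1, size=(1000, 2))
y = (x[:, 0] + x[:, 1] ** 2 < 1.5).astype(int)

# Label noise, re-seeded as in the notes.
rng = np.random.RandomState(24)
for _ in range(200):
    y[rng.randint(1000)] = 1
    y[rng.randint(1000)] = 0

x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.7, random_state=42)

pipe = make_pipeline(
    PolynomialFeatures(include_bias=False),
    StandardScaler(),
    LogisticRegression(solver="saga", tol=1e-4, max_iter=int(1e4)),
)

# Illustrative grid (an assumption); step names follow make_pipeline's lowercase naming.
param_grid = {
    "polynomialfeatures__degree": [1, 2, 3],
    "logisticregression__penalty": ["l1", "l2"],
    "logisticregression__C": [0.5, 1.0, 2.0],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(x_train, y_train)  # cross-validation uses the training split only

print(search.best_params_)
print(search.best_score_, search.score(x_test, y_test))
```

The test split is touched only once, at the end, so the reported test accuracy is a cleaner estimate of generalization than a score that was also used for selection.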

Original article: https://blog.csdn.net/m0_60792028/article/details/143819542
