
[21 Days of Learning AI Fundamentals] Day 12 (Kaggle beginner tutorial): Model Validation

URL: https://www.kaggle.com/code/dansbecker/model-validation

Code

1.What is Model Validation

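Model validation means measuring how well a model predicts, usually by comparing its predictions with the actual values and looking at the gap between them. Mean Absolute Error (also called MAE) builds on this idea: for each row, take the absolute difference between the actual value and the predicted value, then average those differences over all rows. In plain terms, it answers "on average, how far off are our predictions?" The following code computes the MAE: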

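from sklearn.metrics import mean_absolute_error

# melbourne_model, X and y are assumed to be defined already
# (the fitted model, the features and the target from the earlier steps)
predicted_home_prices = melbourne_model.predict(X)
mean_absolute_error(y, predicted_home_prices)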

MAE output:
434.71594577146544
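
To make the definition concrete, here is a minimal sketch that computes MAE by hand on three made-up house prices. The helper name `mae` and the numbers are assumptions for illustration only; they are not from the Kaggle notebook, and in practice `sklearn.metrics.mean_absolute_error` does the same job.

```python
def mae(actual, predicted):
    """Mean absolute error: the average of |actual - predicted| over all rows."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical prices, not taken from the Melbourne dataset
actual_prices = [500_000, 750_000, 1_200_000]
predicted_prices = [520_000, 700_000, 1_220_000]

# Absolute errors are 20,000, 50,000 and 20,000, so the mean is 30,000
result = mae(actual_prices, predicted_prices)
print(result)  # 30000.0
```

Because each error is taken as an absolute value, an over-prediction and an under-prediction do not cancel each other out.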

2.The Problem with “In-Sample” Scores

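The takeaway from this section: split the data that was previously used only for training into two parts, one for training the model and the other for checking its predictive performance.
The reason: if the model is evaluated on the same data it was trained on, the score will obviously look very good, but it will not reflect how the model performs on data it has not seen.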
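
The effect can be reproduced with a rough sketch that uses only the standard library. The `MemorizingModel` class below is a hypothetical toy (a 1-nearest-neighbour lookup) standing in for a model flexible enough to memorize its training rows, and the data is synthetic; neither comes from the Kaggle notebook.

```python
import random

def mae(actual, predicted):
    """Mean absolute error: the average of |actual - predicted| over all rows."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

class MemorizingModel:
    """Toy model: predicts the target of the closest training row."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            # Index of the training row whose feature value is nearest to x
            nearest = min(range(len(self.X)), key=lambda i: abs(self.X[i] - x))
            predictions.append(self.y[nearest])
        return predictions

rng = random.Random(0)
# Synthetic data (an assumption): price grows with floor area, plus random noise
areas = [rng.uniform(50, 250) for _ in range(200)]
prices = [3000 * area + rng.gauss(0, 50_000) for area in areas]

# Hold out the last 25% of the rows as validation data
train_X, val_X = areas[:150], areas[150:]
train_y, val_y = prices[:150], prices[150:]

model = MemorizingModel().fit(train_X, train_y)
in_sample_mae = mae(train_y, model.predict(train_X))
validation_mae = mae(val_y, model.predict(val_X))

print("in-sample MAE: ", in_sample_mae)
print("validation MAE:", validation_mae)
```

On its own training rows the toy model simply looks up the answers, so the in-sample MAE is zero, while the validation MAE is not. The in-sample score says nothing about how far off the model is on new houses.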

3.Coding It

1. Use train_test_split to split the data into a training set (train_X, train_y) and a validation set (val_X, val_y).
2. Compute the MAE on the validation set.

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# split data into training and validation data, for both features and target
# The split is based on a random number generator. Supplying a numeric value to
# the random_state argument guarantees we get the same split every time we
# run this script.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
# Define model
melbourne_model = DecisionTreeRegressor()
# Fit model
melbourne_model.fit(train_X, train_y)

# get predicted prices on validation data
val_predictions = melbourne_model.predict(val_X)
print(mean_absolute_error(val_y, val_predictions))

MAE output:
265806.91478373145

4.Wow!

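I did not expect the prediction error to be this large: the in-sample MAE before splitting the data was about 435 dollars, while the MAE on the validation data after splitting is about 266,000 dollars. The next step is to learn how to improve the model.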


Original article: https://blog.csdn.net/keira674/article/details/145100295
