Math 122a

Final project: Option B

Due: Tuesday, 12/19

Part 1

Regression with clustering

Here we use clustering to improve the performance of a regression fit. We'll use the Boston Housing data available in Python. Recall that the task here is to predict the median house price in various neighborhoods, based on their characteristics. You can import the dataset directly using keras (although you won't need keras for any other part of this problem):

from keras.datasets import boston_housing

(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

The idea of this task is to improve the fit by first clustering the data, and training

separate linear models on each cluster (instead of using a single linear model on the

entire dataset).
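To make the idea concrete, one way to write the clustered model is sketched below. The notation is an assumption introduced here for illustration and does not appear in the assignment: \mu_k denotes the k-th centroid found by k-means on the training inputs, c(x) the index of the cluster assigned to x, T_k the linear model fitted on cluster k, and \lambda the ridge penalty strength.

```latex
c(x) = \arg\min_{k \in \{1,2,3\}} \lVert x - \mu_k \rVert_2 ,
\qquad
\hat{y}(x) = T_{c(x)}(x),
\qquad
T_k(x) = w_k^\top x + b_k ,
\quad
(w_k, b_k) = \arg\min_{w,\,b} \sum_{i \,:\, c(x_i) = k} \bigl( y_i - w^\top x_i - b \bigr)^2 + \lambda \lVert w \rVert_2^2 .
```

A single ridge model corresponds to the special case with one cluster, so the clustered model can only differ from it through how the data are partitioned.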

(i) Use ridge regression as a baseline model: train a ridge regression model on the

training data, and evaluate the mean squared error on the test data.

(ii) Now use k-means clustering to cluster the training data, using only the inputs x_train for the clustering and not the labels y_train. The reason we cluster using only the inputs is that we want a model that can make predictions based only on test inputs x_test, without first seeing the labels y_test. Use k = 3 clusters. Visualize the clusters by projecting the data onto a plane and using a scatter plot (e.g. by plotting the first two variables of each data point).

(iii) Train three separate ridge regression models T1, T2, T3, one for each cluster, with each model trained using only the data from the corresponding cluster. What is the total mean squared error on the test data, where each test point is assigned to a cluster using the same centroids found for the training data?
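The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference solution: it uses scikit-learn's KMeans and Ridge (the assignment does not name a library), a ridge strength of alpha=1.0 chosen arbitrarily, and small synthetic stand-in arrays produced by a hypothetical helper make_stand_in_data so the block runs without downloading anything. It also reads "total mean squared error" as the MSE pooled over all test points, which is one possible interpretation. To use the real data, replace the stand-in arrays with the ones returned by boston_housing.load_data().

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error


def make_stand_in_data(n, rng):
    """Hypothetical synthetic data: three groups, each with its own linear law."""
    centers = np.array([[-4.0, 0.0], [0.0, 4.0], [4.0, 0.0]])
    slopes = np.array([[3.0, -1.0], [-2.0, 2.0], [0.5, 4.0]])
    group = rng.integers(0, 3, size=n)
    x = centers[group] + rng.normal(size=(n, 2))
    y = np.sum(x * slopes[group], axis=1) + 0.1 * rng.normal(size=n)
    return x, y


def fit_clustered_ridge(x_train, y_train, k=3, alpha=1.0, seed=0):
    """Cluster the inputs only, then fit one ridge model per cluster."""
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(x_train)
    models = {}
    for c in range(k):
        mask = kmeans.labels_ == c
        models[c] = Ridge(alpha=alpha).fit(x_train[mask], y_train[mask])
    return kmeans, models


def predict_clustered(x, kmeans, models):
    """Assign each point to its nearest training centroid and use that model."""
    labels = kmeans.predict(x)
    y_pred = np.empty(len(x))
    for c, model in models.items():
        mask = labels == c
        if mask.any():
            y_pred[mask] = model.predict(x[mask])
    return y_pred, labels


rng = np.random.default_rng(0)
x_train, y_train = make_stand_in_data(400, rng)
x_test, y_test = make_stand_in_data(100, rng)

# (i) Baseline: a single ridge model on the whole training set.
baseline = Ridge(alpha=1.0).fit(x_train, y_train)
baseline_mse = mean_squared_error(y_test, baseline.predict(x_test))

# (ii) and (iii): k-means on the inputs, then one ridge model per cluster.
kmeans, models = fit_clustered_ridge(x_train, y_train, k=3)
y_pred, test_labels = predict_clustered(x_test, kmeans, models)
clustered_mse = mean_squared_error(y_test, y_pred)

print("baseline test MSE: ", baseline_mse)
print("clustered test MSE:", clustered_mse)
```

For the scatter plot in (ii), one option is matplotlib's plt.scatter(x_train[:, 0], x_train[:, 1], c=kmeans.labels_), which colors the first two variables by cluster label. On the stand-in data the clustered model does much better by construction; whether the same holds for the housing data is exactly what the exercise asks you to find out.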

Part 2

Open-ended exploration

Go beyond your findings in Part 1 to explore a question of interest to your group.

For example, you could apply the method to a different dataset or propose a modified

approach and compare your results. Prepare a short (4-8 minute) video sharing your

findings. No particular format is required—be creative and try to have fun!


