Hello, please can you help me with the task below?
Thanks in advance for your help :) It is appreciated.
Instructions
In this task, we will use the MNIST database. As stated by the creators of the
dataset, “The MNIST database of handwritten digits, available from this page, has a
training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a
larger set available from NIST. The digits have been size-normalised and centred in
a fixed-size image.”
First read and run the MNIST.ipynb example file to explore the MNIST data set
provided by sklearn. Then, follow the instructions below to create a random forest
model using the sample of the MNIST data provided by sklearn.
Please answer all of the question points below.
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
digits = load_digits()
# Print to show there are 1797 images (8 by 8 images for a dimensionality of 64)
print("Image Data Shape" , digits.data.shape)
# Print to show there are 1797 labels (integers from 0-9)
print("Label Data Shape", digits.target.shape)
plt.figure(figsize=(20,4))
for index, (image, label) in enumerate(zip(digits.data[0:5], digits.target[0:5])):
    plt.subplot(1, 5, index + 1)
    plt.imshow(np.reshape(image, (8,8)), cmap=plt.cm.gray)
    plt.title('Training: %i\n' % label, fontsize = 20)
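As a small addition to the exploration above, here is a sketch that checks two things without plotting: that `digits.images` holds the same pixels as `digits.data` (just not flattened), and how many examples there are of each digit. The variable names `flattened` and `class_counts` are only illustrative and are not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# digits.images holds the same pixels as digits.data, already shaped (n_samples, 8, 8)
print("Images shape:", digits.images.shape)

# Flattening each 8x8 image gives back the 64-feature rows stored in digits.data
flattened = digits.images.reshape(len(digits.images), -1)
print("Matches digits.data:", np.array_equal(flattened, digits.data))

# Count how many examples there are of each digit 0-9
class_counts = np.bincount(digits.target)
for digit, count in enumerate(class_counts):
    print(f"digit {digit}: {count} images")
```

Knowing the per-class counts is useful later on, since the hint asks for macro-averaged metrics, which weight every class equally.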
- Create a copy of the MNIST.ipynb file called mnist_task.ipynb.
- Load the MNIST dataset. Use a library such as SKLearn to access the dataset (from sklearn.datasets import load_digits).
- Split the training data into a training and test set.
- Add a comment explaining the purpose of the train and test sets.
- Use the RandomForestClassifier built into sklearn to create a classification model.
- Pick one parameter to tune, and explain why you chose this parameter.
- Choose which value for the parameter to set for testing on the test data and explain why.
- Print the confusion matrix for your Random Forest model on the test set.
- Report which classes the model struggles with the most.
- Report the accuracy, precision, recall, and F1-score.
HINT: use average="macro" in precision_score, recall_score and f1_score from sklearn.
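The steps listed above could be sketched roughly as follows. This is one possible approach under stated assumptions, not the expected solution: the 80/20 split, `random_state=42`, the choice of `n_estimators` as the parameter to tune, the candidate values, and the use of 5-fold cross-validation on the training set to pick the value are all my own choices, and names such as `candidates`, `best_n` and `per_class_recall` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import cross_val_score, train_test_split

digits = load_digits()

# The training set is used to fit the model and to choose parameter values.
# The test set is held back and only used once at the end, to estimate how
# well the model generalises to images it has never seen.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target,
    test_size=0.2, stratify=digits.target, random_state=42)  # assumed split

# Parameter chosen for tuning (an assumption): n_estimators, the number of trees.
# More trees usually give more stable predictions at the cost of training time.
candidates = [10, 50, 100]  # assumed candidate values
cv_scores = {}
for n in candidates:
    model = RandomForestClassifier(n_estimators=n, random_state=42)
    # Cross-validate on the training data only, so the test set stays untouched
    cv_scores[n] = cross_val_score(model, X_train, y_train, cv=5).mean()
    print(f"n_estimators={n}: mean CV accuracy = {cv_scores[n]:.4f}")

# Use the value with the best mean cross-validation accuracy
best_n = max(cv_scores, key=cv_scores.get)
print("Chosen n_estimators:", best_n)

final_model = RandomForestClassifier(n_estimators=best_n, random_state=42)
final_model.fit(X_train, y_train)
y_pred = final_model.predict(X_test)

# Rows are true digits, columns are predicted digits
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix:\n", cm)

# Per-class recall: fraction of each true digit that was classified correctly
per_class_recall = cm.diagonal() / cm.sum(axis=1)
hardest = np.argsort(per_class_recall)[:3]
print("Digits with the lowest recall:", hardest.tolist())

# Largest off-diagonal entry: the most frequent single confusion
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
true_c, pred_c = np.unravel_index(off_diag.argmax(), off_diag.shape)
print(f"Most frequent confusion: true {true_c} predicted as {pred_c} "
      f"({off_diag[true_c, pred_c]} times)")

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
f1 = f1_score(y_test, y_pred, average="macro")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision (macro): {precision:.4f}")
print(f"Recall (macro): {recall:.4f}")
print(f"F1-score (macro): {f1:.4f}")
```

Choosing the parameter value by cross-validation on the training data, rather than by comparing scores on the test set, keeps the final test metrics an honest estimate. Which digits come out as the hardest depends on the split and the random seed, so that part should be read off the printed confusion matrix rather than assumed in advance.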