Part A: Clustering
1. Find a dataset on Kaggle or from any other source. Make sure that the dataset is at least 500 MB.
2. Write a detailed description of the dataset.
3. Preprocess the dataset.
4. Use the K-means algorithm to cluster the dataset.
5. Use the Elbow method and the Silhouette method to find the optimal K.
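Steps 4 and 5 of Part A can be sketched as follows. This is a minimal, standard-library-only illustration, not a prescribed solution: the three synthetic blobs stand in for the real dataset, the helper names (kmeans, silhouette, best_k) are hypothetical, and the k-means++ style seeding and the five restarts per K are assumptions made to keep the toy run stable. For a 500 MB dataset a library implementation such as scikit-learn's KMeans would be the more realistic choice.

```python
import random
from math import dist

def kmeans(points, k, rng, iters=100):
    """Lloyd's algorithm with k-means++ style seeding; returns (labels, inertia)."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Favor points that are far from the centroids chosen so far.
        weights = [min(dist(p, c) ** 2 for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights)[0])

    def assign():
        # Index of the nearest centroid for every point.
        return [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]

    for _ in range(iters):
        labels = assign()
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign()
    inertia = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in set(labels):
            ds = [dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                  if lab == c and j != i]
            if ds:
                mean_dist[c] = sum(ds) / len(ds)
        own = labels[i]
        if own not in mean_dist:      # singleton cluster: score is 0 by convention
            scores.append(0.0)
            continue
        a = mean_dist[own]                                        # cohesion
        b = min(v for c, v in mean_dist.items() if c != own)      # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Synthetic stand-in data: three well-separated 2-D blobs of 40 points each.
rng = random.Random(42)
blob_centers = [(0, 0), (10, 10), (0, 10)]
points = [(rng.gauss(cx, 1), rng.gauss(cy, 1))
          for cx, cy in blob_centers for _ in range(40)]

inertias, silhouettes = {}, {}
for k in range(2, 7):
    # Several restarts per K; keep the run with the lowest inertia.
    labels, inertia = min((kmeans(points, k, rng) for _ in range(5)),
                          key=lambda run: run[1])
    inertias[k] = inertia
    silhouettes[k] = silhouette(points, labels)

best_k = max(silhouettes, key=silhouettes.get)
for k in inertias:
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")
print("K with the highest silhouette score:", best_k)
```

For the Elbow method, plot the printed inertia values against K and look for the point where the curve stops dropping sharply. For the Silhouette method, pick the K with the highest mean score.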
Part B: Regression
1. Find one or two datasets on Kaggle or from any other source. Make sure that each dataset is at least 500 MB.
2. Write a detailed description of each dataset.
3. Preprocess each dataset.
4. Divide each dataset into training and testing.
5. Build two regression models.
6. Test the models and compute their accuracy (for regression, this is usually reported with an error metric such as RMSE or R²).
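Steps 4 to 6 of Part B could look roughly like the sketch below. It is an illustration under stated assumptions rather than a required approach: the data is synthetic (y = 3x + 2 plus noise), the 80/20 split ratio is an assumption, and the two models (a least-squares line and a k-nearest-neighbors regressor) and all helper names are hypothetical choices. RMSE and R² are used here as the measures of model quality.

```python
import random
from math import sqrt

rng = random.Random(0)

# Synthetic stand-in data: y = 3x + 2 plus Gaussian noise.
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 3.0 * x + 2.0 + rng.gauss(0, 1)))

# Step 4: shuffle, then split 80% training / 20% testing.
rng.shuffle(data)
cut = int(0.8 * len(data))
train, test = data[:cut], data[cut:]

# Step 5, model 1: ordinary least squares for a single feature.
def fit_linear(rows):
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

# Step 5, model 2: k-nearest-neighbors regression (average of the k closest targets).
def fit_knn(rows, k=5):
    def predict(x):
        nearest = sorted(rows, key=lambda row: abs(row[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict

# Step 6: evaluation metrics computed on the held-out test set.
def rmse(model, rows):
    return sqrt(sum((model(x) - y) ** 2 for x, y in rows) / len(rows))

def r2(model, rows):
    mean_y = sum(y for _, y in rows) / len(rows)
    ss_res = sum((y - model(x)) ** 2 for x, y in rows)
    ss_tot = sum((y - mean_y) ** 2 for _, y in rows)
    return 1.0 - ss_res / ss_tot

models = {"linear": fit_linear(train), "knn": fit_knn(train)}
scores = {name: (rmse(model, test), r2(model, test)) for name, model in models.items()}
for name, (error, fit) in scores.items():
    print(f"{name}: RMSE={error:.3f}  R2={fit:.3f}")
```

Both models are fitted on the training rows only and scored on the test rows, so the reported numbers reflect performance on data the models have not seen.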
Part C: Classification
1. Find one or two datasets on Kaggle or from any other source. Make sure that each dataset is at least 500 MB.
2. Write a detailed description of each dataset.
3. Preprocess each dataset.
4. Divide each dataset into training and testing.
5. Build two classification models.
6. Test the models and compute their accuracy.
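Steps 4 to 6 of Part C can be illustrated in the same spirit. The sketch below assumes synthetic two-class data, an 80/20 split, and two hypothetical model choices (a nearest-centroid classifier and a k-nearest-neighbors classifier). None of these are mandated by the assignment, and all function names are made up for the example.

```python
import random
from collections import Counter
from math import dist

rng = random.Random(1)

# Synthetic stand-in data: two 2-D Gaussian blobs, one per class label (0 and 1).
data = [((rng.gauss(cx, 1), rng.gauss(cy, 1)), label)
        for label, (cx, cy) in enumerate([(0, 0), (4, 4)])
        for _ in range(150)]

# Step 4: shuffle, then split 80% training / 20% testing.
rng.shuffle(data)
cut = int(0.8 * len(data))
train, test = data[:cut], data[cut:]

# Step 5, model 1: nearest-centroid classifier.
def fit_nearest_centroid(rows):
    centroids = {}
    for label in {lab for _, lab in rows}:
        members = [p for p, lab in rows if lab == label]
        centroids[label] = tuple(sum(axis) / len(members) for axis in zip(*members))
    # Predict the label whose class centroid is closest to the query point.
    return lambda p: min(centroids, key=lambda lab: dist(p, centroids[lab]))

# Step 5, model 2: k-nearest-neighbors classifier (majority vote).
def fit_knn(rows, k=5):
    def predict(p):
        nearest = sorted(rows, key=lambda row: dist(p, row[0]))[:k]
        return Counter(lab for _, lab in nearest).most_common(1)[0][0]
    return predict

# Step 6: accuracy = fraction of test points classified correctly.
def accuracy(model, rows):
    return sum(model(p) == lab for p, lab in rows) / len(rows)

models = {"nearest_centroid": fit_nearest_centroid(train), "knn": fit_knn(train)}
accuracies = {name: accuracy(model, test) for name, model in models.items()}
for name, acc in accuracies.items():
    print(f"{name}: accuracy={acc:.3f}")
```

On imbalanced real datasets, accuracy alone can be misleading, so reporting a confusion matrix alongside it may be worthwhile.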
Deliverables:
One PDF file that contains:
A cover page with:
▪ Project title
▪ Date of submission
The solution to each of the above questions:
▪ Code
▪ Output
The file should be in docx format and submitted along with the code file.