This assignment is designed to assess your understanding of basic data concepts and
predictive analytics. By the end of this assignment, you will demonstrate proficiency in
dealing with different types of data, applying regression, classification, and clustering
techniques to solve business problems, and implementing models that reflect a real-world scenario.
Task 1: Understanding Data Types:
You are provided with a dataset (choose one related to business domains such as
retail, banking, or healthcare). Review the data and categorize the data types:
• Ordinal Data: Identify ordinal variables and explain why they are categorized as
such.
• Categorical Data: Identify categorical variables (both nominal and ordinal).
• Encoding: Demonstrate how to encode categorical variables (e.g., using one-hot
encoding or label encoding) to prepare the data for machine learning models.
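The encoding step above can be sketched as follows. This is a minimal illustration rather than a required approach: the column names and values are hypothetical, and the ordering of the satisfaction levels is an assumption made for the example. It uses pandas to give an ordinal variable integer codes that preserve its ranking and to one-hot encode a nominal variable.

```python
import pandas as pd

# Hypothetical retail records; column names and values are invented for illustration.
df = pd.DataFrame({
    "payment_method": ["card", "cash", "card", "voucher"],  # nominal: no natural order
    "satisfaction": ["low", "high", "medium", "low"],       # ordinal: low < medium < high
    "basket_value": [23.5, 8.0, 41.2, 15.9],                # numeric, left unchanged
})

# Ordinal encoding: map each level to an integer that preserves the assumed ranking.
satisfaction_order = ["low", "medium", "high"]
df["satisfaction_code"] = pd.Categorical(
    df["satisfaction"], categories=satisfaction_order, ordered=True
).codes

# One-hot encoding: one indicator column per category of the nominal variable.
encoded = pd.get_dummies(df, columns=["payment_method"], dtype=int)
print(encoded)
```

Integer codes suit ordinal variables because the order carries meaning. Applying the same codes to a nominal variable would imply a ranking that does not exist, which is why one-hot encoding is used for the payment method here.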
Task 2: Predictive Analytics with Models:
You will explore regression, classification, and clustering techniques on your dataset
to address different business problems.
1. Regression Analysis:
• Choose a continuous variable from your dataset to predict.
• Implement a linear regression model. Justify the choice of predictors used.
• Evaluate the model using metrics like R², RMSE, and MAE.
2. Classification Analysis:
• Identify a classification problem within the dataset (binary or multiclass).
• Implement at least two classification models (e.g., Decision Trees, Logistic
Regression, Random Forest, or SVM).
• Evaluate model performance using accuracy, precision, recall, and the confusion
matrix.
3. Clustering Analysis:
• Apply K-Means or another clustering technique to segment the data into groups.
• Explain how the clusters could help address a business problem (e.g., customer
segmentation).
• Visualize the clusters using a scatter plot or another visualization method.
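The three analyses in Task 2 could be sketched with scikit-learn as shown below. This is one possible outline, not a model answer: the data is synthetic, the feature names (monthly visits, basket value) and the targets are hypothetical, and the choice of Logistic Regression, Random Forest, and three clusters is an assumption made only to keep the example short.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, mean_absolute_error,
                             mean_squared_error, precision_score, r2_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic customers (hypothetical features): monthly visits and average basket value.
n = 400
visits = rng.uniform(1, 20, n)
basket = rng.uniform(5, 100, n)
X = np.column_stack([visits, basket])
spend = 12 * visits + 3 * basket + rng.normal(0, 20, n)  # continuous target (assumed)
high_value = (spend > np.median(spend)).astype(int)      # binary target (assumed)

# 1. Regression: predict spend, then report R², RMSE, and MAE on held-out data.
X_tr, X_te, y_tr, y_te = train_test_split(X, spend, test_size=0.25, random_state=0)
pred = LinearRegression().fit(X_tr, y_tr).predict(X_te)
reg_metrics = {
    "r2": r2_score(y_te, pred),
    "rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
    "mae": mean_absolute_error(y_te, pred),
}

# 2. Classification: compare two models on the same stratified train/test split.
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(
    X, high_value, test_size=0.25, random_state=0, stratify=high_value
)
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
clf_metrics = {}
for name, model in models.items():
    y_hat = model.fit(Xc_tr, yc_tr).predict(Xc_te)
    clf_metrics[name] = {
        "accuracy": accuracy_score(yc_te, y_hat),
        "precision": precision_score(yc_te, y_hat),
        "recall": recall_score(yc_te, y_hat),
        "confusion_matrix": confusion_matrix(yc_te, y_hat),
    }

# 3. Clustering: scale the features first so neither dominates the distance measure.
X_scaled = StandardScaler().fit_transform(X)
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

print(reg_metrics)
print({name: round(m["accuracy"], 3) for name, m in clf_metrics.items()})
print(np.bincount(segments))  # number of customers in each segment
```

To visualize the clusters, the two features could be drawn as a scatter plot coloured by `segments`, for example with matplotlib. On a real dataset, the number of clusters would need its own justification, such as an elbow plot or silhouette scores.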
Task 3: Project Life Cycle – End-to-End
You will work through the full life cycle of an analytics project, from data preparation
to model evaluation. Choose a business problem (e.g., customer churn prediction or
sales forecasting) and use the techniques learned in LO1 and LO2 to solve it.
Steps:
1. Problem Definition: Briefly define the business problem and how analytics can
solve it.
2. Data Preparation: Collect, clean, and preprocess the data.
3. Modelling: Apply a model that solves the problem (regression or classification).
Justify why this model fits the problem.
4. Evaluation and Insights: Evaluate the model and explain how the findings
translate into a business solution.
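The four steps above could be tied together in a single scikit-learn pipeline, sketched below for a churn-prediction problem. Everything specific here is an assumption made for illustration: the data is synthetic, the column names are hypothetical, and Logistic Regression is only one reasonable model choice for a binary outcome.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(7)
n = 500

# Step 1. Problem definition: predict whether a customer churns (1) or stays (0).

# Step 2. Data preparation: a synthetic stand-in for a real customer table.
data = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, n).astype(float),
    "monthly_charge": rng.uniform(20, 120, n),
    "contract": rng.choice(["monthly", "annual"], n),
})
# Assumed relationship: short tenure and monthly contracts raise churn risk.
logit = 1.5 - 0.08 * data["tenure_months"] + 1.2 * (data["contract"] == "monthly")
data["churn"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
# Simulate missing values so the cleaning step has something to do.
data.loc[rng.choice(n, 25, replace=False), "monthly_charge"] = np.nan

numeric = ["tenure_months", "monthly_charge"]
categorical = ["contract"]
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Step 3. Modelling: preprocessing and classifier are fitted together on training data only.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])
X = data.drop(columns="churn")
y = data["churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
model.fit(X_train, y_train)

# Step 4. Evaluation: held-out accuracy plus per-class precision and recall.
test_accuracy = model.score(X_test, y_test)
print(classification_report(y_test, model.predict(X_test)))
```

Wrapping the preprocessing inside the pipeline means imputation and scaling statistics are learned from the training split alone, which avoids leaking information from the test set into the evaluation.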
Additional Sources:
• UCI Machine Learning Repository
• Kaggle Datasets
Submission Instructions:
• Create a practical analysis within the designated word limit (2000 – 2500 words).
• Integrate Python code snippets, visualizations, and pertinent documentation
seamlessly into the analysis.
• Maintain clarity, organization, and logical flow when presenting your results.
• Utilize the BSBI assignment template provided on Canvas for document
preparation.
• Employ the Harvard referencing style for your bibliography.
• Refer to the Essay Guide on Canvas for additional instructions.
• Submit your assignment electronically by the stipulated deadline.