Simple Linear Regression
Given a dataset containing house prices (Price) and square footage (Area):
Price      Area (sqft)
300,000    1200
450,000    1800
600,000    2400
Tasks:
• Build a simple linear regression model to predict house price based on square
footage.
• Evaluate the model’s performance using Mean Squared Error (MSE) and R-squared.
• Interpret the model’s coefficients.
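As a starting point for the tasks above, the following is a minimal, dependency-free sketch of ordinary least squares with one predictor, fitted on the three rows of the table. The variable names (`slope`, `intercept`, `mse`, `r_squared`) are illustrative choices, and the metrics are computed on the training rows themselves because the table is too small to split. A library such as scikit-learn would normally be used in the submitted analysis; this version only makes the arithmetic explicit.

```python
# Rows from the table above: (area in sqft, price).
data = [(1200, 300_000), (1800, 450_000), (2400, 600_000)]
areas = [a for a, _ in data]
prices = [p for _, p in data]
n = len(data)

mean_area = sum(areas) / n
mean_price = sum(prices) / n

# Ordinary least squares with one predictor: slope = S_xy / S_xx.
s_xy = sum((a - mean_area) * (p - mean_price) for a, p in data)
s_xx = sum((a - mean_area) ** 2 for a in areas)
slope = s_xy / s_xx                          # price change per extra sqft
intercept = mean_price - slope * mean_area   # fitted price at 0 sqft

# Evaluate on the training rows (no held-out data in a 3-row table).
fitted = [intercept + slope * a for a in areas]
ss_res = sum((p - f) ** 2 for p, f in zip(prices, fitted))
ss_tot = sum((p - mean_price) ** 2 for p in prices)
mse = ss_res / n
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope}, intercept={intercept}, MSE={mse}, R^2={r_squared}")
```

For these three rows the prices lie exactly on a straight line, so the fit is perfect (MSE of 0 and R-squared of 1). When interpreting the coefficients, the slope reads as the fitted price increase per additional square foot, and the intercept as the fitted price at zero area.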
Task 2: Multiple Linear Regression
Using the same dataset, now with two additional features, Number of Bedrooms
and Location (a categorical variable):
Price      Area (sqft)    Bedrooms    Location
300,000    1200           3           Suburb
450,000    1800           4           City
600,000    2400           5           City
Tasks:
• Build a multiple linear regression model.
• Perform feature selection and determine which features are most important.
• Use your model to predict the price for a house with 2000 sqft, 4 bedrooms, in
a suburban area.
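One possible way to approach these tasks is sketched below in plain Python, under several stated assumptions. Location is one-hot encoded with Suburb as the baseline (the `is_city` column is a hypothetical name). Feature screening is done with a simple Pearson correlation. The model is fitted by solving the normal equations with a small hand-written Gauss-Jordan routine (`solve`, also illustrative). Because Bedrooms turns out to be perfectly correlated with Area in these three rows, the sketch drops Bedrooms before fitting. That is a judgment call for this tiny table, not a general rule.

```python
import math

# Rows from the table above: (area in sqft, bedrooms, location, price).
rows = [
    (1200, 3, "Suburb", 300_000),
    (1800, 4, "City", 450_000),
    (2400, 5, "City", 600_000),
]
areas = [float(r[0]) for r in rows]
bedrooms = [float(r[1]) for r in rows]
is_city = [1.0 if r[2] == "City" else 0.0 for r in rows]  # Suburb is the baseline
prices = [float(r[3]) for r in rows]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_yy = sum((y - my) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Feature screening: a correlation of 1 means Bedrooms adds nothing beyond Area here.
area_bedroom_corr = pearson(areas, bedrooms)

# Design matrix without Bedrooms: intercept, area, is_city.
X = [[1.0, a, c] for a, c in zip(areas, is_city)]
k = len(X[0])

# Normal equations (X^T X) beta = X^T y for the least-squares coefficients.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * p for row, p in zip(X, prices)) for i in range(k)]
intercept, per_sqft, city_effect = solve(xtx, xty)

# Prediction for 2000 sqft in a suburban area (is_city = 0); the 4 bedrooms
# are not used because Bedrooms was dropped above.
predicted_price = intercept + per_sqft * 2000 + city_effect * 0.0

print(f"corr(Area, Bedrooms)={area_bedroom_corr:.3f}")
print(f"per_sqft={per_sqft:.2f}, city_effect={city_effect:.2f}, prediction={predicted_price:.0f}")
```

With three rows and three fitted parameters the model reproduces the training prices exactly, so coefficient sizes here say little about real feature importance. In a fuller analysis the same steps would be repeated on a larger dataset, with Bedrooms retained if it is not collinear with Area.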
Task 3: Classification Models
Logistic Regression for Classification
Given the following dataset on customer churn (where 1 = churn, 0 = no churn):
Age    Salary    Churn (1/0)
30     40,000    1
45     60,000    0
25     30,000    1
Tasks:
• Build a logistic regression model to predict customer churn.
• Evaluate the model’s performance using accuracy, precision, recall, and F1
score.
• Plot the ROC curve and calculate the AUC score.
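The tasks above can be sketched without external libraries as follows. This is an illustrative implementation only: logistic regression is fitted by batch gradient descent on standardized features. The learning rate (0.5), iteration count (1000), and 0.5 decision threshold are arbitrary assumptions, not tuned values. All metrics are computed on the training rows because the table has only three customers. The ROC curve is returned as a list of (false positive rate, true positive rate) points in `roc`, which could then be passed to a plotting library such as matplotlib.

```python
import math

# Rows from the table above: (age, salary, churn).
rows = [(30, 40_000, 1), (45, 60_000, 0), (25, 30_000, 1)]
labels = [r[2] for r in rows]

def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]

features = list(zip(standardize([r[0] for r in rows]),
                    standardize([r[1] for r in rows])))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, xs):
    """Predicted churn probability for one standardized feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, xs)) + bias)

# Batch gradient descent on the mean log-loss.
weights, bias, rate = [0.0, 0.0], 0.0, 0.5
for _ in range(1000):
    errors = [predict_proba(weights, bias, xs) - y for xs, y in zip(features, labels)]
    grads = [sum(e * xs[j] for e, xs in zip(errors, features)) / len(rows)
             for j in range(len(weights))]
    weights = [w - rate * g for w, g in zip(weights, grads)]
    bias -= rate * sum(errors) / len(rows)

probs = [predict_proba(weights, bias, xs) for xs in features]
preds = [1 if p >= 0.5 else 0 for p in probs]

# Confusion-matrix counts and the four requested metrics.
tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# ROC points: sweep the threshold over the predicted probabilities, high to low.
n_pos = sum(labels)
n_neg = len(labels) - n_pos
roc = [(0.0, 0.0)]
for thr in sorted(set(probs), reverse=True):
    tpr = sum(p >= thr and y == 1 for p, y in zip(probs, labels)) / n_pos
    fpr = sum(p >= thr and y == 0 for p, y in zip(probs, labels)) / n_neg
    roc.append((fpr, tpr))

# AUC by the trapezoidal rule over consecutive ROC points.
auc = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(roc, roc[1:]))

print(f"accuracy={accuracy}, precision={precision}, recall={recall}, f1={f1}, auc={auc}")
```

The three customers are linearly separable, so perfect training scores are expected and do not indicate how the model would perform on unseen data. A written analysis should state this limitation alongside the metrics.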
Submission Instructions:
• Create a practical analysis within the designated word limit (2000 – 2500 words).
• Integrate Python code snippets, visualizations, and pertinent documentation
seamlessly into the analysis.
• Maintain clarity, organization, and logical flow when presenting your result analysis.
• Utilize the BSBI assignment template provided on Canvas for document
preparation.
• Employ the Harvard referencing style for your bibliography.
• Refer to the Essay Guide on Canvas for additional instructions.
• Submit your assignment electronically by the stipulated deadline.