Task Scenario
You have been provided with access to three datasets; all are available in the “Assessment &
Submission” folder on Blackboard, along with the accompanying documentation and specific
requirements for each. You have been given the choice of any one of these scenarios as your
project. Using at least two different techniques, your task is to produce models that possess
predictive capacity with regard to the response variable within the dataset, and to evaluate the
performance of these models. Where possible, you will also provide insight into feature
importance with regard to the predictive capacity of your models.
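One possible way to look at feature importance is permutation importance: shuffle one feature at a time and measure how much the prediction error grows. The sketch below is illustrative only. It assumes the built-in mtcars data as a stand-in for an assessment dataset, a linear model as the fitted model, and RMSE as the error measure; the names rmse, baseline and importance are hypothetical helpers, and none of these choices are prescribed by this brief.

```r
# Illustrative sketch: permutation importance in base R.
# ASSUMPTION: mtcars stands in for the assessment dataset; mpg is the response.
data(mtcars)
set.seed(1)  # make the shuffling reproducible

# Fit a simple model (here a linear regression, chosen only for illustration)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Root mean squared error between observed and predicted values
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Error of the model on the unshuffled data
baseline <- rmse(mtcars$mpg, predict(fit, newdata = mtcars))

features <- c("wt", "hp", "qsec")

# For each feature: shuffle its column 50 times and average the rise in RMSE
importance <- sapply(features, function(f) {
  increases <- replicate(50, {
    shuffled <- mtcars
    shuffled[[f]] <- sample(shuffled[[f]])  # break the link between f and mpg
    rmse(mtcars$mpg, predict(fit, newdata = shuffled)) - baseline
  })
  mean(increases)
})

# Larger values suggest the model relies more heavily on that feature
print(sort(importance, decreasing = TRUE))
```

For brevity the sketch scores the model on the data it was fitted to; in a project it would usually be more informative to compute the same quantity on held-out data.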
All three datasets have been cleaned and are ready for use; however, you may still wish to conduct
some data preparation and/or transformation so that the data is in an appropriate condition and
format for the analysis methods that you wish to use. You must justify any pre-processing
procedure that you choose to apply. You may also use any methods you wish to tackle the chosen
problem; however, you must justify your approach.
The key components of this task that you must complete are:
• Explore the data so that you understand the structure, characteristics and limitations of the
dataset.
• Identify the forms of analysis that will be able to produce a successful outcome for the
scenario. Ensure that the chosen method(s) are suitable for use on the dataset that you have
chosen to use and justify the use of your chosen approach. (You may use methods that have
been taught during the module as well as others that have not been used within the taught
materials, as long as the choice of these methods is appropriately justified).
• Process the data into a condition suitable for the model building to be performed, including
the selection of features to be used within the model.
• Build a model that allows for the response variable in the dataset to be predicted.
• Evaluate the capabilities of the model that has been developed, using suitable metrics.
• Present and describe your findings and recommendations in a manner suitable for the target
audience.
• Critically evaluate the process and discuss the outcome of the project. You may wish to
discuss areas such as the overall performance of your models, your approach to the problem
and what has been learned throughout the process, and the potential real-world application
of your findings.
• When evaluating and comparing your models, ensure that all models are trained on the same
training set and tested on the same test set, and that the test set has not been seen by any
model during training.
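The final point above, a single train/test split shared by every model, can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not a required approach: the built-in mtcars data stands in for an assessment dataset, the 70/30 split ratio and RMSE metric are arbitrary choices, and knn_predict and rmse are hypothetical helper functions written for this example rather than functions from any package.

```r
# Illustrative sketch: two techniques compared on one shared train/test split.
# ASSUMPTION: mtcars stands in for the assessment dataset; mpg is the response.
data(mtcars)
set.seed(42)  # fix the split so that it is reproducible

# Create the split once; every model below uses exactly these rows
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

predictors <- c("wt", "hp")

# Technique 1: linear regression, fitted on the training set only
fit_lm  <- lm(mpg ~ wt + hp, data = train)
pred_lm <- predict(fit_lm, newdata = test)

# Technique 2: k-nearest-neighbours regression (hand-rolled helper).
# Scaling parameters are estimated from the training set only, so no
# information from the test set leaks into the model.
knn_predict <- function(train_x, train_y, test_x, k = 3) {
  centre <- colMeans(train_x)
  spread <- apply(train_x, 2, sd)
  tr <- scale(train_x, center = centre, scale = spread)
  te <- scale(test_x,  center = centre, scale = spread)
  apply(te, 1, function(row) {
    d <- sqrt(rowSums(sweep(tr, 2, row)^2))  # Euclidean distance to each training row
    mean(train_y[order(d)[seq_len(k)]])      # average response of the k nearest rows
  })
}
pred_knn <- knn_predict(train[, predictors], train$mpg, test[, predictors], k = 3)

# Evaluate both models on the same unseen test set with the same metric
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
results <- c(linear_model = rmse(test$mpg, pred_lm),
             knn          = rmse(test$mpg, pred_knn))
print(results)
```

Because the row indices are drawn once and reused, any difference in the reported error reflects the models rather than the data each happened to see.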
All of the above stages should be documented within the report, while all of the decisions that have
been made throughout the process should be discussed and justified. Please note that it is expected
that you will use R to complete this assignment. While there is no requirement for all of the code
you have used to be submitted, it is expected that you will evidence key elements with excerpts of the
code. There is no specific requirement for the software used to prepare your submission: options
include but are not limited to Microsoft Word, LaTeX and R Markdown. All submissions should make
use of some form of data visualisation to support the text-based elements of the work.
The selection, application and evaluation of data science methods, tools and techniques.