They connect borrowers with investors through an online marketplace. They have provided a publicly available dataset covering 2007-2011, which I have posted under Project files. Please review the data together with its data dictionary (a separate Excel sheet that explains what each variable means).
You just got an interview as an analyst for the Lending Club. The client wants you to analyze this large amount of information.
- Start by making initial observations of the data. What types of variables are present? Is there anything that catches your eye? A good analyst checks the data carefully. See the Quartz Guide to Bad Data on our slides.
- Summarize the qualitative data present in the dataset with frequency distributions and the various graphs/charts we have used in the class for Chapter 2. Analyze at least two different variables. Interpret your variables.
- Do the same thing with the quantitative data present. Analyze at least two other variables. Interpret your results.
- Pick two of the graphs you created above and describe the shape of those distributions (when applicable).
- Why did you choose the particular graphs you used? Do they have any benefits over the alternatives?
- Now I want you to take two variables you think might be related. Create a scatterplot. Find the covariance and correlation, and interpret the results.
- Use Chebyshev’s theorem or the Empirical Rule (decide which one is more appropriate) to describe the data in a way that would be meaningful to the client.
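As a rough illustration of the last two steps, here is a minimal Python sketch of sample covariance, correlation, and a check of the share of observations within k standard deviations of the mean. The two lists are made-up values, not taken from the Lending Club file, and the names `loan_amount` and `annual_income` are hypothetical stand-ins for whichever variables you choose. Chebyshev's theorem guarantees at least 1 - 1/k^2 of the data within k standard deviations for any distribution, while the Empirical Rule (about 68%, 95%, and 99.7% for k = 1, 2, 3) applies only when the distribution is roughly bell-shaped.

```python
import math

def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_std(x):
    """Sample standard deviation (square root of the sample variance)."""
    return math.sqrt(sample_covariance(x, x))

def correlation(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))

def share_within(values, k):
    """Observed fraction of values within k standard deviations of the mean."""
    mean = sum(values) / len(values)
    s = sample_std(values)
    return sum(1 for v in values if abs(v - mean) <= k * s) / len(values)

def chebyshev_bound(k):
    """Minimum fraction within k standard deviations (valid for k > 1)."""
    return 1 - 1 / k**2

# Made-up illustrative values, NOT from the actual dataset.
loan_amount = [5000, 8000, 12000, 15000, 20000, 25000]
annual_income = [30000, 42000, 55000, 61000, 80000, 98000]

print("covariance: ", sample_covariance(loan_amount, annual_income))
print("correlation:", correlation(loan_amount, annual_income))
print("observed share within 2 sd:", share_within(loan_amount, 2))
print("Chebyshev minimum for k=2: ", chebyshev_bound(2))
```

Comparing the observed share with the Chebyshev minimum and with the Empirical Rule percentages is one way to decide which rule describes a given variable better. A histogram of the variable should inform that choice as well.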
Finally, give me a summary of what you have discovered as a whole from this dataset. You want the Lending Club to know that you are very interested in working with them. Give them something to think about.
Extra Credit:
Load this Excel sheet into Python or R and repeat steps 2, 3, and 6 above. Feel free to go beyond the project and try other statistical methods.
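For the qualitative step in Python, a frequency distribution can be sketched with the standard library alone. The example below assumes the column has already been read into a plain list (one option is `pandas.read_excel`, which needs pandas and an Excel engine installed). The name `loan_grades` and its values are hypothetical and are not taken from the actual file.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, relative frequency) rows, most common first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(category, count, count / total)
            for category, count in counts.most_common()]

# Hypothetical categorical column, NOT from the actual dataset.
loan_grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]

print(f"{'grade':<8}{'count':>6}{'share':>8}")
for category, count, share in frequency_table(loan_grades):
    print(f"{category:<8}{count:>6}{share:>8.1%}")
```

The same rows can feed a bar chart or pie chart, so the table and the graph stay consistent with each other.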