Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Data Set Information:
Data were extracted from images taken of genuine and forged banknote-like specimens. For digitization, an industrial camera normally used for print inspection was used. The final images have 400 x 400 pixels. Due to the object lens and the distance to the investigated object, gray-scale pictures with a resolution of about 660 dpi were obtained. A Wavelet Transform tool was used to extract features from the images.
Attribute Information:
1. variance of Wavelet Transformed image (continuous)
2. skewness of Wavelet Transformed image (continuous)
3. kurtosis of Wavelet Transformed image (continuous)
4. entropy of image (continuous)
5. class (integer): 1 = genuine, 2 = forged
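The attribute list above can be read as a simple record layout. The following is a minimal sketch in Python, offered only as an illustration (the assignment itself calls for SAS Studio). The header names `variance`, `skewness`, `kurtosis`, `entropy`, and `class` are assumptions about how the CSV is labeled, the `Banknote` type and `parse_banknotes` function are hypothetical helpers, and the two sample rows are made up, not taken from the real dataset.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Banknote:
    variance: float   # variance of Wavelet Transformed image
    skewness: float   # skewness of Wavelet Transformed image
    kurtosis: float   # kurtosis of Wavelet Transformed image
    entropy: float    # entropy of image
    label: int        # class code: 1 = genuine, 2 = forged


def parse_banknotes(text):
    """Parse CSV text with a header row into a list of Banknote records."""
    reader = csv.DictReader(io.StringIO(text))
    notes = []
    for row in reader:
        label = int(row["class"])
        if label not in (1, 2):
            # Reject codes outside the two documented classes.
            raise ValueError(f"unexpected class code: {label}")
        notes.append(
            Banknote(
                variance=float(row["variance"]),
                skewness=float(row["skewness"]),
                kurtosis=float(row["kurtosis"]),
                entropy=float(row["entropy"]),
                label=label,
            )
        )
    return notes


# Made-up rows for illustration only; assumed header names.
SAMPLE = """variance,skewness,kurtosis,entropy,class
1.5,-2.0,0.25,0.5,1
-3.0,4.0,-1.5,-0.75,2
"""

notes = parse_banknotes(SAMPLE)
print(len(notes), notes[0].label, notes[1].label)
```

Validating the class code while parsing means a mislabeled row is caught before any summary or model is built on it.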
Counterfeiting of banknotes has been a problem since the introduction of color photocopiers and computer image scanners. Counterfeits hurt the banking industry because they contribute to inflation and reduce the value of real money. Assume that you are a data mining expert who works in the banking industry.
The dataset banknotes.csv contains 5 variables (or columns), and description-bank.docx contains a description of the dataset. The end goal is to build an appropriate model (or tool) to successfully predict forgery. Using SAS Studio, perform the following tasks:
- Explore the dataset by providing summary statistics and graphical summaries of all the variables.
- Explain some of the key aspects of the data found in the previous task.
- Examine if the dataset has any anomalies. Describe the method(s) you used as well as the results.
- Examine whether there are any associations among the variables. Describe the approaches as well as the results.
- Using one of the clustering techniques, analyze all the variables. Explain the results.
- Using one of the classification techniques from the course, build the model that predicts forgery. Explain why you think the model you’ve chosen is most appropriate for this dataset.
- Evaluate the model. How well does the model fit? Can you improve the model? Explain.
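Several of the tasks above rest on a few standard calculations: summary statistics, an outlier rule, a measure of association, and a way to score predictions. The sketch below shows one possible version of each in plain Python, purely as an illustration of the arithmetic; the assignment itself is to be done in SAS Studio. The function names are hypothetical, the 1.5 x IQR fence and Pearson correlation are only one choice among several methods, treating class 2 (forged) as the positive class is an assumption, and all sample values are made up.

```python
import math
import statistics


def summarize(values):
    """Return basic summary statistics for one numeric variable."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def confusion_counts(actual, predicted, positive=2):
    """Count true/false positives and negatives; class 2 (forged) is assumed positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Share of predictions that match the actual class."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


# Made-up values for illustration only; not from banknotes.csv.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 50.0]
print(summarize(sample))
print(iqr_outliers(sample))
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))

counts = confusion_counts(actual=[2, 2, 1, 1], predicted=[2, 1, 1, 1])
print(counts, accuracy(counts))
```

For a forgery detector, the false-negative count (forged notes predicted as genuine) is usually worth reporting alongside accuracy, since the two kinds of error rarely cost the same.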