Data Mining: Using C++ to Measure Correlation between Real-Valued and Nominal-Valued Datasets
Mr. Shahid Ali Khan, Prof. (Dr.) Praveen Dhyani
Large databases often contain fields of mixed data types, such as real, nominal and ordinal. In data mining applications, understanding the relationships among datasets is an important issue. Correlation is commonly used to measure the relationship between two datasets: the correlation coefficient measures both the strength and the direction of that relationship, and it is usually applied to real-valued datasets. In this paper, the correlation coefficient between a real-valued dataset and a nominal-valued dataset is measured using a program written in C++.
Correlation, Data mining, Nominal values, Real values.
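The abstract does not state which formula the C++ program uses, so the following is only a hedged sketch, not the authors' method. It contrasts Pearson's r, the usual coefficient for two real-valued datasets, with the correlation ratio η, one standard (swapped-in) measure of association between a nominal variable and a real-valued one: η² is the share of the total sum of squares explained by the category means. The function names `pearson` and `correlationRatio` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: Pearson's r for two real-valued datasets of equal length.
// Result lies in [-1, 1]; the sign gives the direction, the magnitude the strength.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty());
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mx, dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);  // NaN if either dataset is constant
}

// Hypothetical helper (assumed technique: correlation ratio eta) for a nominal
// dataset of category labels paired with a real-valued dataset.
// eta^2 = between-category sum of squares / total sum of squares.
// Result lies in [0, 1]; nominal labels have no order, so there is no sign.
double correlationRatio(const std::vector<std::string>& category,
                        const std::vector<double>& value) {
    assert(category.size() == value.size() && !value.empty());
    double grandMean = 0.0;
    for (double v : value) grandMean += v;
    grandMean /= static_cast<double>(value.size());

    // Running sum and count of the real values for each category label.
    std::map<std::string, std::pair<double, std::size_t>> groups;
    for (std::size_t i = 0; i < value.size(); ++i) {
        auto& g = groups[category[i]];
        g.first += value[i];
        g.second += 1;
    }

    // Between-category variation: count-weighted squared deviation of group means.
    double ssBetween = 0.0;
    for (const auto& entry : groups) {
        const auto& g = entry.second;
        const double groupMean = g.first / static_cast<double>(g.second);
        const double d = groupMean - grandMean;
        ssBetween += static_cast<double>(g.second) * d * d;
    }

    // Total variation of the real values around the grand mean.
    double ssTotal = 0.0;
    for (double v : value) ssTotal += (v - grandMean) * (v - grandMean);
    return std::sqrt(ssBetween / ssTotal);  // NaN if all values are equal
}
```

For example, `correlationRatio({"a", "a", "b", "b"}, {1.0, 3.0, 2.0, 4.0})` gives √0.2 ≈ 0.447, because the category means (2 and 3) account for 1 of the 5 units of total variation around the grand mean 2.5. Grouping with `std::map` lets the nominal labels stay as strings instead of being encoded as numbers first, which would impose an order the data does not have.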
Shahid Ali Khan and Praveen Dhyani (2015). Data Mining: Using C++ to Measure Correlation between Real Valued and Nominal Valued Datasets. International Journal of Innovative Research in Computer Science & Technology (IJIRCST), Vol. 3, Issue 2, pp. 55-59. ISSN 2347-5552. www.ijircst.org
Mr. Shahid Ali Khan
Assistant Professor, Department of Computer Sciences & Engineering, Waljat College of Applied Sciences, P.O. Box 197, P.C. 124, Oman. Tel: +968-24446660 (334)