Fast and Highly Scalable Multiresolution Linear Word based Clustering in Multidimensional Data
P. Rubi, M. Govindaraj
Clustering problems are well known in the database literature for their use in numerous applications, and multidimensional data is always a challenge for clustering algorithms. Halite is a fast and scalable clustering method that looks for clusters in subspaces of multidimensional data. It organizes the data in a Counting-tree: the tree root corresponds to a hypercube embodying the full data set, the next level divides the space into a set of 2^d hypercubes (d being the number of dimensions), and the resulting hypercubes are divided again, generating the tree structure. The Bump Hunting task applies, at each level of the Counting-tree, one d-dimensional Laplacian mask over the respective grid to spot bumps at the respective resolution. Specifically, the main contributions of Halite are: Scalability: it is linear in time and space with respect to the data size and the dimensionality of the clusters' subspaces. Usability: it is deterministic, robust to noise, does not take the number of clusters as an input parameter, and detects clusters in subspaces generated by the original axes or by their linear combinations, including space rotation. Effectiveness: it is accurate, providing results with equal or better quality; this is achieved through a word-based approach. Generality: it includes a soft clustering approach.
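The multiresolution counting and Bump Hunting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes points normalized to the unit hypercube, stores each level as a sparse map from grid cell to point count instead of an explicit tree, and uses a simplified Laplacian mask with weight 2d on the center cell and -1 on each of the 2d face neighbors. The function names build_counting_levels, laplacian_response and spot_bump are hypothetical.

```python
from collections import Counter


def build_counting_levels(points, num_levels):
    """Count points per grid cell at several resolutions.

    Level h (1-based) halves the cell side of level h - 1, so it has
    2**h cells per axis. Points are assumed to lie in [0, 1]^d.
    """
    levels = []
    for h in range(1, num_levels + 1):
        cells_per_axis = 2 ** h
        counts = Counter()
        for p in points:
            # Clamp so a coordinate of exactly 1.0 falls in the last cell.
            cell = tuple(min(int(x * cells_per_axis), cells_per_axis - 1) for x in p)
            counts[cell] += 1
        levels.append(counts)
    return levels


def laplacian_response(counts, cell):
    """Apply the assumed mask: 2d * center count minus the face-neighbor counts."""
    d = len(cell)
    total = 2 * d * counts.get(cell, 0)
    for axis in range(d):
        for step in (-1, 1):
            neighbor = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
            total -= counts.get(neighbor, 0)  # empty or out-of-grid cells count as 0
    return total


def spot_bump(counts):
    """Return the occupied cell with the largest mask response at one level."""
    return max(counts, key=lambda c: laplacian_response(counts, c))


# Four points packed near the origin plus two scattered points.
points = [(0.1, 0.1), (0.12, 0.11), (0.13, 0.14), (0.11, 0.12), (0.9, 0.9), (0.5, 0.2)]
levels = build_counting_levels(points, 3)
bumps = [spot_bump(level) for level in levels]
```

In this toy data set the dense group near the origin produces the strongest response at every level, which is the kind of bump the method looks for. The sketch scans the data once per level for clarity and makes no claim about matching the linear-time construction described above.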
Bump Hunting, Correlation Connected Objects, Harp, Spotting Clusters.
P. Rubi, M. Govindaraj (2014). Fast and Highly Scalable Multiresolution Linear Word based Clustering in Multidimensional Data. IJIRCST, Vol. 2, Issue 3, pp. 85-92 (ISSN 2347-5552). www.ijircst.org
Computer Science and Engineering Department, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India, 9790534573 (e-mail: firstname.lastname@example.org)