Improving the tool for analyzing Malaysia’s demographic change: Data standardization analysis to form geo-demographics classification profiles using k-means algorithms

Kamarul Ismail, Nasir Nayan, Siti Naielah Ibrahim

Abstract


Clustering is an important method of exploratory data analysis and is widely applied in data mining. Data must be clustered to produce a geo-demographic classification, and here the k-means algorithm is used as the clustering algorithm. K-means is one of the most commonly used clustering algorithms and is considered among the most significant. However, before cluster analysis is run on any data, some preliminary analysis is needed to ensure that the variables used are appropriate and do not carry redundant information. One such analysis is data standardization. This study examined which standardization method was more effective for processing Malaysia’s population and housing census data for the state of Perak. The rationale was that standardized data would simplify the execution of the k-means algorithm. The standardization methods chosen to test data accuracy were the z-score and range standardization methods. The analysis found that range standardization was more suitable for the data examined.

Keywords: algorithm, data mining, geo-demographics, k-means, standardization, z-score
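
The two standardization methods compared in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact formulas, so this assumes the common definitions: the z-score subtracts the mean and divides by the standard deviation (the population form is used here), and range standardization rescales each variable linearly to the interval [0, 1]. The function names and the sample values are hypothetical and are not taken from the census data.

```python
from statistics import mean, pstdev


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant variable carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def range_standardize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable has zero range; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values of one variable across five areas (not real census data).
sample = [10.0, 20.0, 30.0, 40.0, 50.0]

z = z_score(sample)
r = range_standardize(sample)

print("z-score:", [round(v, 3) for v in z])
print("range:  ", r)
```

Either transformation puts variables measured in different units on a comparable scale before distances are computed, which matters because k-means assigns observations to clusters by distance to the cluster centres.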

