Entropy-Based Intuitionistic Fuzzy C-Means Clustering
Information technology & Applied mathematics

ENTROPY-BASED INTUITIONISTIC FUZZY C-MEANS CLUSTERING

Truong Quoc Hung*, Nguyen Anh Cuong, Nguyen Dinh Dzung

Abstract: With the rapid development of uncertain, hesitant and fuzzy datasets, an entropy-based intuitionistic fuzzy c-means clustering (EIFCM) method is proposed on the basis of intuitionistic fuzzy sets (IFS) for clustering problems. The advantages of intuitionistic fuzzy sets and ordinary fuzzy sets are combined in the proposed method to overcome some drawbacks of the conventional FCM in handling uncertain or hesitant data and to resolve fuzziness. Experimental results show that the proposed algorithm is better than the traditional fuzzy clustering algorithms.

Keywords: Fuzzy sets; Intuitionistic fuzzy sets; Fuzzy c-means clustering; Entropy-based intuitionistic clustering.

1. INTRODUCTION

Clustering techniques are applied in many fields such as data mining, pattern recognition and image processing. They are used to detect structures or patterns in a data set such that objects within the same cluster show certain similarities. Clustering algorithms range from simple methods such as k-means and its various improvements [1], [2], [3], [4] to the family of fuzzy c-means (FCM) algorithms [7]. Within the framework of fuzzy theory, fuzzy techniques are well suited to the development of new clustering algorithms because they are able to handle vagueness and imprecision in the data [8]. Recently, the intuitionistic fuzzy set (IFS) was introduced [9] and used for representing the hesitancy of an expert in determining the membership and non-membership functions. This capability has created a distinct research direction for handling uncertainty based on IFS [15]. IFSs have also recently been used for the clustering problem [16].
In [10], incomplete nutrient-deficient crop images with missing pixels are segmented by an intuitionistic fuzzy clustering algorithm with good results. Another application of intuitionistic clustering is the evaluation of the sport tourism event problem with corporate social responsibility (CSR), developing a novel Intuitionistic Fuzzy Importance-performance Analysis (IFIPA) [11]. Besides, another type of intuitionistic fuzzy clustering is introduced in [12]: the possibilistic intuitionistic fuzzy c-means clustering algorithm, a combination of the fuzzy c-means (FCM) and possibilistic c-means (PCM) algorithms with intuitionistic fuzzy sets. This algorithm is applied to MRI brain image segmentation with impressive results. In [14], an intuitionistic possibilistic fuzzy c-means clustering algorithm is proposed to handle the information regarding membership values of objects to each cluster by generalizing membership and non-membership with a hesitancy degree. However, the previously introduced intuitionistic fuzzy clustering methods are based only on various distance and similarity measures among intuitionistic fuzzy sets (IFS) [13] and have difficulties in deciding the most suitable measure of distance or similarity. In addition, they do not take the entropy of the IFS into account.

Research Journal of Military Science and Technology, Special Issue, No. 54A, 05-2018

Through the overview of intuitionistic fuzzy clustering presented above, we found that fuzzy sets and intuitionistic fuzzy sets can be combined in one objective function for handling hesitancy and uncertainty.

The remainder of the paper is organized as follows: Section II briefly introduces some background on intuitionistic fuzzy sets and fuzzy clustering; Section III proposes the intuitionistic fuzzy c-means clustering algorithm; Section IV offers some experimental results and Section V concludes the paper.

2. BACKGROUND
2.1.
Intuitionistic fuzzy sets

Intuitionistic fuzzy sets (IFS) were introduced by Atanassov in 1986 as an extension of fuzzy set theory [9]: Let X be an ordinary finite non-empty set. An IFS Ã in X is an expression given by

$\tilde{A} = \{\langle x, \mu_{\tilde{A}}(x), v_{\tilde{A}}(x)\rangle : x \in X\}$

where $\mu_{\tilde{A}}: X \to [0,1]$ and $v_{\tilde{A}}: X \to [0,1]$ satisfy the condition $\mu_{\tilde{A}}(x) + v_{\tilde{A}}(x) \le 1$ for all $x \in X$. The numbers $\mu_{\tilde{A}}(x)$ and $v_{\tilde{A}}(x)$ denote respectively the degree of membership and the degree of non-membership of the element x in the set Ã. Let IFSs(X) denote the set of all intuitionistic fuzzy sets in X. For each IFS Ã in X, the value $\pi_{\tilde{A}}(x) = 1 - \mu_{\tilde{A}}(x) - v_{\tilde{A}}(x)$ is called the degree of uncertainty, or the degree of hesitancy, of x to Ã. Note that for an IFS Ã, if $\mu_{\tilde{A}}(x) = 0$ then $v_{\tilde{A}}(x) + \pi_{\tilde{A}}(x) = 1$, and if $\mu_{\tilde{A}}(x) = 1$ then $v_{\tilde{A}}(x) = 0$ and $\pi_{\tilde{A}}(x) = 0$.

2.2. Entropy on intuitionistic fuzzy sets

Most fuzzy algorithms select the best threshold t using the concept of fuzzy entropy. In this paper, we focus on the definition and characterization of intuitionistic fuzzy entropy. The entropy on IFSs is defined as a magnitude that measures how intuitionistic a set is with respect to its fuzziness, and it satisfies the following conditions:
1. The entropy is null when the set is an ordinary fuzzy set.
2. The entropy is maximum if the set is completely intuitionistic, that is, $\mu(x) = v(x) = 0$ for all $x \in X$.
3. As with fuzzy sets, the entropy of an IFS equals that of its complement.
4. If the degree of membership and the degree of non-membership of each element increase, their sum increases as well; the set then becomes less hesitant, and therefore the entropy should decrease.

One of the simplest expressions satisfying these conditions is given in [17]:

$IE(\tilde{A}) = \frac{1}{n}\sum_{k=1}^{n} \pi_{\tilde{A}}(x_k)$    (1)

Equation (1) is used as a basis for the segmentation algorithm.

2.3. Fuzzy c-means clustering

Fuzzy c-means (FCM) was first introduced by Bezdek in [7].
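Before moving on, the entropy measure of Eq. (1) can be illustrated with a minimal Python sketch (an illustration written for this discussion, not code from the paper):

```python
import numpy as np

def ifs_entropy(mu, nu):
    """Entropy of an IFS per Eq. (1): the mean hesitancy pi(x) = 1 - mu(x) - nu(x)."""
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    pi = 1.0 - mu - nu          # degree of hesitancy of each element
    return float(pi.mean())

# An ordinary fuzzy set (mu + nu = 1 everywhere) has zero entropy,
# while a completely intuitionistic set (mu = nu = 0) has entropy 1.
```

This directly reflects conditions 1 and 2 above: the entropy vanishes when no hesitancy remains and is maximal when membership and non-membership are both zero.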
It is a clustering method which allows a data point to belong to more than one cluster with different membership grades. It assumes that the number of clusters c is known a priori and minimizes the objective function $J_m$:

$J_m(U, V) = \sum_{k=1}^{n}\sum_{i=1}^{c} (u_{ik})^m \, d^2(x_k, v_i)$    (2)

where $d_{ik} = d(x_k, v_i) = \|x_k - v_i\| = \left(\sum_{j=1}^{d}(x_{kj} - v_{ij})^2\right)^{1/2}$ and m is a constant, known as the fuzzifier, which controls the fuzziness of the resulting partition; it can be any real number greater than 1 but is generally taken as 2. The predefined parameters of the problem are the number of clusters c (1 < c < n), the fuzzifier m (1 < m < +∞) and the error tolerance ε. The algorithm can be briefly described as follows:

Algorithm 1: Fuzzy c-means algorithm
Step 1: Initialize the centroid matrix $V^{(0)} = [v_{ij}] \in R^{c \times M}$ by choosing c points randomly from the dataset $X = \{x_i \in R^M, i = 1..n\}$, and compute the membership matrix $U^{(0)}$ by:

$u_{ik} = 1 \Big/ \sum_{j=1}^{c} \left(\frac{d_{ik}^2}{d_{jk}^2}\right)^{1/(m-1)}, \quad 1 \le i \le c, \; 1 \le k \le n$    (3)

where $d_{ik} = \|x_k - v_i\|$ is the Euclidean distance from object $x_k$ to the cluster centre $v_i$.
Step 2: Repeat:
  Update the centroid matrix $V^{(j)} = (v_1^{(j)}, v_2^{(j)}, \dots, v_c^{(j)})$ by $v_i = \sum_{k=1}^{n} u_{ik}^m x_k \Big/ \sum_{k=1}^{n} u_{ik}^m$, $1 \le i \le c$.
  Update the membership matrix $U^{(j)}$ by using (3).
  Assign data point $x_j$ to cluster $c_i$ if $u_i(x_j) > u_k(x_j)$ for all $k \ne i$.
until $\max |U^{(j+1)} - U^{(j)}| \le \varepsilon$.
Step 3: Return U and V.

3. ENTROPY-BASED INTUITIONISTIC FUZZY C-MEANS CLUSTERING

As an enhancement of classical FCM, the entropy-based intuitionistic fuzzy c-means clustering (EIFCM) algorithm uses intuitionistic fuzzy sets with the aim of better handling hesitant data. The general idea of this algorithm is both to minimize the distances from data points to cluster centroids, as in FCM, and to minimize the hesitancy of the IFS, i.e., the entropy on the IFS (Eq. 1).
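For concreteness, the steps of Algorithm 1 can be sketched in NumPy as follows (an illustrative implementation, not the authors' code; the random initialization from data points and the stopping details are assumptions):

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, init=None, seed=0):
    """Minimal sketch of Algorithm 1 (FCM)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: initialize centroids (randomly from the data unless given)
    V = X[rng.choice(n, size=c, replace=False)].astype(float) if init is None else np.array(init, float)
    U = np.full((c, n), 1.0 / c)
    for _ in range(max_iter):
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)   # d_ik = ||x_k - v_i||
        D = np.fmax(D, 1e-12)                                       # guard against d_ik = 0
        # Eq. (3): u_ik = 1 / sum_j (d_ik^2 / d_jk^2)^(1/(m-1))
        U_new = 1.0 / ((D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
        # Step 2: centroid update v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        W = U_new ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < eps:                           # until max|U(j+1) - U(j)| < eps
            U = U_new
            break
        U = U_new
    return U, V, U.argmax(axis=0)    # hard assignment: largest membership wins
```

Each column of U sums to 1 by construction of Eq. (3), and the final hard labels implement the assignment rule of Step 2.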
Next, the hesitancy of the IFS is integrated with the fuzzy membership of FCM to handle the fuzziness.

The first step is to build an objective function that simultaneously minimizes the hesitancy of the IFS (the entropy on the IFS, Eq. 1) and the distances between all of the data points and the clusters. Thus, an objective function is formed as follows:

$J_1 = \sum_{i=1}^{c}\sum_{j=1}^{n} (1-\pi_{ij})^p \, d_{ij}^2 + \sum_{i=1}^{c} \beta_i \sum_{j=1}^{n} \pi_{ij}^p$    (4)

where $\pi_{ij}$ is the hesitancy of the j-th data point to the i-th cluster, $d_{ij} = \|x_j - v_i\|$ is the Euclidean distance between the data point $x_j$ and the centroid $v_i$, c is the number of clusters, n is the number of data points, p is a weighting exponent for the hesitant membership (p > 1), and $\beta_i$ is the scale parameter of each cluster. To minimize the objective function $J_1$ (Eq. 4), the first component makes the distances between the data points and the cluster centroids as small as possible, while the second component requires the hesitancy of the IFS, i.e., its entropy, to be as small as possible. The scale parameter $\beta_i$ of the i-th cluster depends on the size of that cluster and can be determined as follows:

$\beta_i = \sum_{j=1}^{n} \pi_{ij}^p \, d_{ij}^2 \Big/ \sum_{j=1}^{n} \pi_{ij}^p$    (5)

The main idea of the entropy-based intuitionistic fuzzy c-means clustering (EIFCM) algorithm is the combination of the fuzzy membership and the hesitancy. Thus, the objective function for the hesitancy (Eq. 4) is considered as a distance measure and integrated with the fuzzy membership of FCM, giving the objective function of EIFCM:

$J_{EIFCM} = \sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^m \left[(1-\pi_{ij})^p \, d_{ij}^2 + \beta_i \, \pi_{ij}^p\right]$    (6)

$\qquad = \sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^m (1-\pi_{ij})^p \, d_{ij}^2 + \sum_{i=1}^{c} \beta_i \sum_{j=1}^{n} u_{ij}^m \pi_{ij}^p$    (7)

in which $u_{ij}$ is the fuzzy membership of the data point $x_j$ to the i-th cluster and m is a constant as in FCM.
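To make the objective concrete, the following sketch evaluates $J_{EIFCM}$ of Eq. (7) for given memberships, hesitancies, centroids and scale parameters (`eifcm_objective` is a hypothetical helper written for illustration, not the authors' code):

```python
import numpy as np

def eifcm_objective(U, Pi, X, V, beta, m=2.0, p=2.0):
    """Eq. (7): J = sum_i sum_j u_ij^m (1-pi_ij)^p d_ij^2
                  + sum_i beta_i sum_j u_ij^m pi_ij^p."""
    D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)    # squared distances d_ij^2
    distance_term = (U ** m * (1.0 - Pi) ** p * D2).sum()      # pulls points to centroids
    hesitancy_term = (beta[:, None] * U ** m * Pi ** p).sum()  # penalizes hesitancy
    return float(distance_term + hesitancy_term)
```

Minimizing the first term favours small distances where hesitancy is low, while minimizing the second term drives the hesitancy, and hence the IFS entropy of Eq. (1), down.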
Theorem 3.1: The $J_{EIFCM}$ in Eq. (7) attains its local minima when $U = [u_{ij}]_{c \times n}$ and $Q = [\pi_{ij}]_{c \times n}$ are assigned the following values:

$u_{ij} = 1 \Big/ \sum_{k=1}^{c} \left(\frac{d^*_{ij}}{d^*_{kj}}\right)^{1/(m-1)}$    (8)

$\pi_{ij} = 1 \Big/ \left(1 + \left(\beta_i / d_{ij}^2\right)^{1/(p-1)}\right)$    (9)

in which i = 1, 2, ..., c; j = 1, 2, ..., n; c is the number of clusters, n is the number of patterns, and

$d^*_{ij} = (1-\pi_{ij})^p \, d_{ij}^2 + \beta_i \, \pi_{ij}^p$    (10)

with $d_{ij}$ the Euclidean distance between the data point $x_j$ and the centroid $v_i$; the centroid is defined as:

$v_i = \sum_{j=1}^{n} u_{ij}^m (1-\pi_{ij})^p \, x_j \Big/ \sum_{j=1}^{n} u_{ij}^m (1-\pi_{ij})^p$    (11)

The scale parameter of the i-th cluster depends on the size of that cluster and is determined as follows:

$\beta_i = \sum_{j=1}^{n} u_{ij}^m \pi_{ij}^p \, d_{ij}^2 \Big/ \sum_{j=1}^{n} u_{ij}^m \pi_{ij}^p$    (12)

The proof of Theorem 3.1 can be carried out with Lagrange multipliers under the constraint $\sum_{i=1}^{c} u_{ij} = 1$.

Because the fuzzy membership and the hesitancy are applied in the EIFCM algorithm under the constraint $u_{ij} + \pi_{ij} \le 1$, if $u_{ij} + \pi_{ij} > 1$ then $u_{ij}$ and $\pi_{ij}$ are normalized as $t_{ij} = u_{ij} + \pi_{ij}$; $u_{ij} = u_{ij}/t_{ij}$ and $\pi_{ij} = \pi_{ij}/t_{ij}$.

In the defuzzification step, the combined membership $u^*_{ij}$ is usually taken as:

$u^*_{ij} = u_{ij} + \pi_{ij}$    (13)

where $u_{ij}$ is the membership and $\pi_{ij}$ is the hesitancy of the j-th data point to the i-th cluster. However, Eq. (13) seems ineffective; consider a simple example. Given a data point x and a set A, we assess the membership of x in A based on intuitionistic fuzzy theory. Let u be the membership function, v the non-membership function and π the hesitation degree (u + v + π = 1).
Case 1: Assuming u = 0.6, v = 0.1 and π = 0.3, we have u* = 0.9 according to Eq. (13).
Case 2: Assuming u = 0.8, v = 0.1 and π = 0.1, we also have u* = 0.9 according to Eq. (13).
One can easily see that case 2 is better than case 1. However, with Eq. (13) the aggregate membership u* is 0.9 in both cases and we cannot determine the better one.
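The two cases can be verified directly (a small illustrative snippet):

```python
def u_star_eq13(u, pi):
    """Aggregate membership of Eq. (13): u* = u + pi."""
    return u + pi

case1 = u_star_eq13(0.6, 0.3)   # u = 0.6, v = 0.1, pi = 0.3
case2 = u_star_eq13(0.8, 0.1)   # u = 0.8, v = 0.1, pi = 0.1
# Both aggregate to 0.9, so Eq. (13) cannot rank the two cases.
```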
Remark: In fact, the hesitation degree π by itself does not determine the result; the two factors that affect the decision are the membership function u and the non-membership function v. Therefore, we propose a new way to define the aggregate membership function u* satisfying two properties: maximum membership function and minimum non-membership function:

$u^*_{ij} = u_{ij} - v_{ij}$    (14)

where $u_{ij}$ is the membership function and $v_{ij}$ is the non-membership function of the j-th data point in the i-th cluster. Substituting $v_{ij} = 1 - u_{ij} - \pi_{ij}$ into Eq. (14), we have:

$u^*_{ij} = u_{ij} - v_{ij}$    (15)
$\qquad = u_{ij} - (1 - u_{ij} - \pi_{ij})$    (16)
$\qquad = 2u_{ij} + \pi_{ij} - 1$    (17)

The experiments are conducted on several machine learning datasets.

Table 1. Characteristics of heart disease datasets.
Dataset         No. of instances   Classes
Cleveland       303                5
Hungarian       294                5
Switzerland     123                5
Long Beach VA   200                5

Table 2. Characteristics of breast cancer diagnostic datasets.
Dataset                   No. of instances   Classes
Breast-cancer-Wisconsin   699                2
WDBC                      569                2
WPBC                      198                2

The parameters are m = 2 for FCM and IFCM; fuzzifier m = 2 and p = 2 for EIFCM; the number of clusters c is the number of classes; and the error parameter ε = 0.00001. The resulting classification performance is evaluated by the accuracy rate CT:

$CT = TR / TT$    (18)

where TR is the number of correctly classified data points and TT is the total number of data points. The clustering results, or the quality of classification, are reported in terms of the index CT in Tables 3 and 4.

Table 3. Clustering results of heart disease datasets.
Dataset         FCM    IFCM   EIFCM
Cleveland       85.6   87.6   92.1
Hungarian       78.9   85.5   91.4
Switzerland     76.5   90.1   90.1
Long Beach VA   87.8   88.9   94.3

Table 4. Clustering results of breast cancer diagnostic datasets.
Dataset                   FCM    IFCM   EIFCM
Breast-cancer-Wisconsin   90.3   87.4   92.8
WDBC                      89.8   89.3   93.4
WPBC                      77.8   81.7   90.9

The experimental results in Tables 3 and 4 show that the EIFCM algorithm is more effective than FCM and IFCM, with an accuracy rate CT always above 90%.

Experiment: Further experiments are performed on well-known images with the predefined numbers of clusters given in Table 5 and the images in Fig. 1. The results are measured with several validity indexes to assess the performance of the algorithms on the experimental images.

Table 5. The number of clusters.
Image      Number of clusters
Rose       3
Wolf       3
Mountain   4

Figure 1. Test images: a) Rose image; b) Wolf image; c) Mountain image.

Table 6. The various validity indexes on the experimental images.

These test images were clustered by the FCM, IFCM [5] and EIFCM algorithms with predefined parameters: the exponential parameter m = 2 for FCM and IFCM, m = 2 and p = 2 for EIFCM, the number of clusters c as the number of classes, and the error parameter ε = 0.00001. Several validity indexes are used to assess the clustering results: Bezdek's partition coefficient (PC-I), Dunn's separation index (Dunn-I), the Davies-Bouldin index (DB-I), the Separation index (S-I), Xie and Beni's index (XB-I) and the Classification Entropy index (CE-I) [6]. The values of these indexes are shown in Table 6. Note that the validity indexes are proposed to evaluate the quality of clustering: a better algorithm has smaller DB-I, XB-I, S-I and CE-I values and a larger PC-I value. The results in Table 6 show that the EIFCM (the proposed algorithm) achieves better performance, i.e., higher-quality clustering, than the FCM and IFCM algorithms.

4.
CONCLUSION

This paper presented a fuzzy c-means clustering algorithm based on intuitionistic fuzzy sets, which improves the clustering results and overcomes the drawbacks of conventional clustering algorithms in handling hesitancy. The proposed approach combines IFSs and fuzzy clustering to improve the quality of the clustering results. The experiments on several well-known datasets show that the proposed algorithm generates better results than traditional methods such as FCM. The next goal is to investigate general type-2 intuitionistic fuzzy sets to further improve the quality.

REFERENCES
[1]. T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, A. Y. Wu, "An Efficient k-Means Clustering Algorithm: Analysis and Implementation", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24(7), pp. 881-893, (2002).
[2]. M.C. Hung, J. Wu, J.H. Chang, D.L. Yang, "An Efficient k-Means Clustering Algorithm Using Simple Partitioning", J. of Info. Science and Engineering, Vol. 21, pp. 1157-1177, (2005).
[3]. K.R. Zalik, "An efficient k-means clustering algorithm", Pattern Recognition Letters, Vol. 29, pp. 1385-1391, (2008).
[4]. K. A. Abdul Nazeer, M. P. Sebastian, "Improving the Accuracy and Efficiency of the k-means Clustering Algorithm", Proceedings of the World Congress on Engineering, (2009).
[5]. Z. Xu, J. Wu, "Intuitionistic fuzzy C-means clustering algorithms", Journal of Systems Engineering and Electronics, Vol. 21(4), pp. 580-590, (2010).
[6]. W. Wang, Y. Zhang, "On fuzzy cluster validity indices", Fuzzy Sets and Systems, Vol. 158, pp. 2095-2117, (2007).
[7]. J. C. Bezdek, "Pattern Recognition with Fuzzy Objective Function Algorithms", New York: Academic, (1981).
[8]. J. Yu, M.-S. Yang, "Optimality test for generalized FCM and its application to parameter selection", IEEE Trans. Fuzzy Syst., Vol. 13(1), pp. 164-176, (2005).
[9]. K. Atanassov, "Intuitionistic fuzzy sets", Fuzzy Sets and Systems, Vol. 20, pp. 87-96, (1986).
[10]. P. Balasubramaniam, V.P. Ananthi, "Segmentation of nutrient deficiency in incomplete crop images using intuitionistic fuzzy C-means clustering algorithm", Nonlinear Dynamics, Vol. 83(1-2), pp. 849-866, (2016).
[11]. F.H. Huang, Y.J. Ye, C.H. Kao, "Developing a novel Intuitionistic Fuzzy Importance-performance Analysis for evaluating corporate social responsibility in sport tourism event", Expert Systems With Applications, Vol. 42(19), pp. 6530-6538, (2015).
[12]. H. Verma, R.K. Agrawal, "Possibility Intuitionistic Fuzzy c-Means Clustering Algorithm for MRI Brain Image Segmentation", International Journal on Artificial Intelligence Tools, Vol. 24(5), (2015).
[13]. W.S. Sheng, B.Q. Hu, "Aggregation distance measure and its induced similarity measure between intuitionistic fuzzy sets", Pattern Recognition Letters, Vol. 60-61, pp. 65-71, (2015).
[14]. A. Chaudhuri, "Intuitionistic Fuzzy Possibilistic C Means Clustering Algorithms", Advances in Fuzzy Systems, (2015).
[15]. P. Burillo, H. Bustince, "Construction theorems for intuitionistic fuzzy sets", Fuzzy Sets and Systems, Vol. 84, pp. 271-281, (1996).
[16]. P. Couto, "Image segmentation using Atanassov intuitionistic fuzzy sets", Ph.D. thesis, Trás-os-Montes e Alto Douro University, Vila Real, Portugal, (2006).
[17]. H. Bustince, E. Barrenechea, M. Pagola, R. Orduna, "Image Thresholding Computation Using Atanassov's Intuitionistic Fuzzy Sets", Journal of Advanced Comput. Int. and Int. Informatics, Vol. 11(2), pp. 187-194, (2007).

TÓM TẮT
ENTROPY-BASED INTUITIONISTIC FUZZY C-MEANS CLUSTERING
With the rapid development of uncertain datasets, an entropy-based intuitionistic fuzzy C-means clustering (EIFCM) method is proposed on the basis of intuitionistic fuzzy sets (IFS) for clustering problems.
The advantages of intuitionistic fuzzy sets and fuzzy sets are combined in the proposed method to overcome some limitations of FCM in handling uncertain or hesitant data and to resolve the fuzziness. Experimental results show that the proposed algorithm outperforms traditional fuzzy clustering algorithms.
Keywords: Fuzzy sets; Intuitionistic fuzzy sets; Fuzzy C-means clustering; Entropy-based intuitionistic clustering.
Received 27th February 2018
Revised 5th April 2018
Accepted 20th April 2018
Author affiliation: Le Quy Don Technical University, Hanoi.
*Corresponding author: truongqhung.@gmail.com