Entropy-Based intuitionistic fuzzy c-means clustering

TÓM TẮT

PHÂN CỤM C-MEANS MỜ TRỰC CẢM DỰA TRÊN ENTROPY

Với sự phát triển nhanh chóng của các bộ dữ liệu không chắc chắn, một

phương pháp phân cụm C-means mờ trực cảm entropy (EIFCM) được đề xuất

dựa trên cơ sở các tập mờ trực cảm (IFS) cho các bài toán phân cụm. Các ưu

điểm của của các tập mờ trực cảm và các tập mờ được kết hợp với nhau trong

phương pháp đề xuất để khắc phục một số hạn chế của phương pháp FCM

trong việc xử lý dữ liệu không chắc chắn hoặc do dự cũng như giải quyết tính

mờ. Các kết quả thực nghiệm cho thấy thuật toán được đề xuất tốt hơn so với

các thuật toán phân cụm mờ truyền thống

pdf 9 trang phuongnguyen 440
Bạn đang xem tài liệu "Entropy-Based intuitionistic fuzzy c-means clustering", để tải tài liệu gốc về máy hãy click vào nút Download ở trên

Tóm tắt nội dung tài liệu: Entropy-Based intuitionistic fuzzy c-means clustering

Entropy-Based intuitionistic fuzzy c-means clustering
Information technology & Applied mathematics 
T. Q. Hung, N. A. Cuong, N. D. Dzung, “Entropy-based intuitionistic fuzzy  clustering.” 46 
ENTROPY-BASED INTUITIONISTIC FUZZY 
 C-MEANS CLUSTERING 
Truong Quoc Hung*, Nguyen Anh Cuong, Nguyen Dinh Dzung 
Abstract: With the rapid development of the uncertain or hesitant and fuzziness 
datasets, an entropy-based intuitionistic fuzzy c-means clustering (EIFCM) method 
is proposed based on the intuitionistic fuzzy sets (IFS)for the clustering problems. 
Utilizing the advantages of the intuitionistic fuzzy sets and fuzzy sets, which are 
combined in the proposed method, to overcome some drawbacks of the conventional 
FCM in handling uncertainties or hesitant and also resolve the fuzziness. 
Experimental results show that the proposed algorithm is better than the traditional 
fuzzy clustering algorithms. 
Keywords: Fuzzy sets, Intuitionistic fuzzy sets, Intuitionistic, Fuzzy c-means clustering, Entropy-Based 
intuitionistic. 
1. INTRODUCTION 
Clustering technique is applied in many fields such as data mining, pattern 
recognition, image processing etc. It is used to detect any structures or patterns in 
the data set, in which objects within the cluster level data show certain similarities. 
Clustering algorithms have different shapes from simple clustering as k-means and 
various improvements [1], [2], [3], [4], development of a family of fuzzy c-mean 
clustering (FCM) [7]. With the framework of fuzzy theory, fuzzy techniques are 
suitable for the development of new clustering algorithms because they are able to 
remove vagueness/imprecision in the data [8]. 
Recently, the intuitionistic fuzzy set (IFS) was introduced [9] and used for 
representing the hesitance of an expert on determining the membership functions 
and the non-membership functions. This capability has created a different research 
direction to handle the uncertainty based on IFS [15]. IFSs also have been recently 
used for the clustering problem [16]. In [10], the incomplete nutrient-deficient crop 
images with missing pixels is segmented by an intuitionistic fuzzy clustering 
algorithm with good results. An other application of the intuitionistic clustering is 
to evaluate the sport tourism event problem with corporate social responsibility 
(CSR) and developing a novel Intuitionistic Fuzzy Importance performance 
Analysis (IFIPA) [11]. 
Besides, an other type of intuitionistic fuzzy clustering is introduced in [12], the 
possibilistic intuitionistic fuzzy C-Means clustering algorithm which is the 
combination of the fuzzy c-means (FCM), possibilistic c- means (PCM) algorithms 
and intuitionistic fuzzy sets. This algorithm is applied for MRI brain image 
segmentation with impressive results. In [14], the intuitionistic possibilistic fuzzy 
c-means clustering algorithm which is proposed to hand the information regarding 
membership values of objects to each cluster by generalizing membership and non-
membership with hesitancy degree. 
However, the intuitionistic fuzzy clustering methods which were previously 
introduced, only based on the various distance and similarity measures among 
intuitionistic fuzzy sets (IFS) [13] and have difficulties in deciding the most 
Research 
Journal of Military Science and Technology, Special Issue, No.54A, 05 - 2018 47
suitable for measuring the degree of distance or similarity. In addition, they did not 
care about the entropy of the IFS. 
Through the overview of intuitionistic fuzzy clustering presented above, we 
found an outstanding method of fuzzy sets and intuitionistic fuzzy sets which can 
be combined in one objective function for handling the hesitant and uncertainties. 
Remain of the paper is organized as follows: Section II briefly introduces about 
some backgrounds about intuitionistic fuzzy sets and fuzzy clustering; Section III 
proposes the intuitionistic fuzzy C-means clustering algorithm; Section IV offers 
some experimental results and section V concludes the paper. 
2. BACKGROUND 
2.1. Intuitionistic Fuzzy sets 
Intuitionistic fuzzy sets (IFS) were introduced by Atanassov as an extension of 
the fuzzy set theory in 1986 as follows:([9]): 
Let X be an ordinary finite non-empty set. An IFS in X is an expression à given 
by: , ( ), ( )) :A AA x x v x x X   where µÃ: X → [0; 1] và : X → [0;1] satisfy the 
condition µÃ(x) + và (x) ≤ 1 for all x X. The numbers µÃ(x) and vÃ(x) denote 
respectively the degree of membership and the degree of non-membership of the 
element x in set Ã. Considering IFSs(x) as the set of all the intuitionistic fuzzy sets 
in X. For each IFS à in X, The values πÃ(x) = 1 − µÃ (x) − và (x) is called the 
degree of uncertainty of x to Ã, or the degree of hesitancy of x to Ã. 
Note that for an IFS Ã, if µÃ(x) = 0, then vÃ(x) + πÃ(x) = 1, and if µÃ(x) = 1 then 
vÃ(x)=0 and πÃ(x) = 0. 
2.2. Entropy on intuitionistic fuzzy sets 
Most of the fuzzy algorithms select the best threshold t using the concept of 
fuzzy entropy. In this paper, we will focus on the definition and characterization of 
the Intuitionistic fuzzy entropy. The entropy on IFSs is defined as a magnitude that 
measures the degree of IFS that a set is with respect to the fuzziness of this set 
which satisfies the following conditions: 
1. The entropy will be null when the set is a FSs(x), 
2. The entropy will be maximum if the set is an AIFS; that is, µ(x) = v(x) = 0 for all 
x X, 
3. As in fuzzy sets, the entropy of an IFS will be equal to its respective 
complement 
4. If the degree of membership and the degree of non-membership of each 
element increase, the sum will as well, and therefore, this set becomes fuzzier, and 
therefore the entropy should decrease. One of the simplest expressions that satisfy 
the conditions previously mentioned in [17] 
1
1
( )
n
kA
k
IE A x
n
  
(1) 
Equation 1 is a base for segmentation algorithm. 
2.3. Entropy on intuitionistic fuzzy sets 
Fuzzy c-means (FCM) was first introduced by Bezdek in [7]. It is a method of 
clustering which allows a data point can belong to more than one cluster with 
Information technology & Applied mathematics 
T. Q. Hung, N. A. Cuong, N. D. Dzung, “Entropy-based intuitionistic fuzzy  clustering.” 48 
different membership grades. It assumes that a number of clusters c is known in 
prior and minimizes the objective function (Jm) as: 
2
1 1
( , ) ( ) ( )
n c
m
m ik ik
k i
J U v u d
 
(2) 
Where: 1/2
2
1
( ) ( )
d
ik k i k i kj ij
j
d d x v x v x v

and m is a constant, known as the fuzzifier, which controls the fuzziness of the 
resulting partition and can be any real number greater than 1 but generally, it can be 
taken as 2. 
Predefined parameters to the problem: the number of clusters c (1 < c < n), 
fuzzifier m(1 < m < +∞) and error ε. This algorithm can be briefly described as 
follows: 
Algorithm 1: Fuzzy C-means algorithm 
1 Step 1: Initialize centroid matrix (0); ; 0M cijV v V R j
 , by choosing 
randomly from dataset , , 1..Mi iX x x R i n and the membership matrix U
0 by 
using the equation: 
2 2
1
1
1
,1 ,1ik
mc
ik
j jk
u i c k n
d
d

(3) 
3 Where ( )ik k i k id d x v x v is the Euclidian distance from object xk to the 
cluster center vi 
4 Step 2: 
5 Repeat: 
6 Update the centroid matrix ( ) ( ) ( ) ( )
1 2, ,...,
j j j j
cV v v v 
 by:
1
1
( )
,1
( )
n
m
k
k
i n
m
ik
k
uik x
v i c
u


7 Update the membership matrix U(j) by using (3) 
8 Assign data xj to cluster ci if data (ui (xj) > uk (xj)), k = 1,..., c and j ≠ k. 
9 until : ( 1) ( )j jMax U U  
10 Step 3: Return U and V. 
3. ENTROPY-BASED INTUITIONISIC FUZZY 
 C-MEANS CLUSTERING 
As an enhancement of classical FCM, the EntropyBased Intuitionisic Fuzzy C-
means Clustering(EIFCM) use the intuitionistic fuzzy sets with the aim to better 
handle the hesitant data. The general idea of this algorithm is both to minimize the 
distance from data points to cluster centroids such in FCM and minimize the 
hesitant to the IFS or entropy on IFS (Eq.1). Next, the hesitant of the IFS is 
integrated with fuzzy membership in FCM to handle the fuzziness. 
Research 
Journal of Military Science and Technology, Special Issue, No.54A, 05 - 2018 49
The first step is to build the objective function to minimize the hesitant to an 
intuition of the IFS or entropy on IFS (Eq.1) and the distance between all of the 
data points to the clusters, simultaneously. Thus, an objective function in IFCM is 
formed as follows: 
2
1
1 1 1 1
(1 ) ( )
c n c nP P
ij i
i j i j
J dij ij 
   
(4) 
where πij is the hesitant of the j
th data point to the ith cluster, dij =||xj − vi|| is 
Euclidean distance between the data point xj and the centroid vi, c is number of 
clusters, n is number of data points, p is a weighting exponent for the hesitant 
membership (p > 1), the scale parameter βi of each cluster. 
To minimize the objective function J1 (Eq.4), the first component with the 
distance between all of the data points to the cluster centroids is as small as 
possible, while the second component requires the hesitant of IFS or entropy of 
IFS is also as small as possible. The scale parameter βi for each the ith cluster is 
determined depending on the size of the ith cluster and can be determined as 
follows: 
2
1
1
=
n
P
ij ij
j
i n
P
ij
j
d



(5) 
The main idea of the Entropy-Based Intuitionisic Fuzzy C-means Clustering 
(EIFCM) algorithm is the combination of the fuzzy membership and the hesitant. 
Thus, the objective function for the hesitant (Eq.4) is considered as a distance 
measurement and integrated with the fuzzy membership of the FCM, we have the 
objective function of the EIFCM as follows: 
 2
1 1
1
Pc n Pm
EIFC M ij ij iji
i j
ijJ u d  
  
(6) 
2
1 1 1 1
( )1
Pc n c n
m m P
i ijij ij ij
i j i j
iju d u 
    
(7) 
in which: uij is the fuzzy membership of the data point xj to the i
th cluster and m is a 
constant as in FCM. 
Theorem 3.1: The JEIF CM in Eq.7 attains its local minima when U ≡ [uij]c×n 
and Q = [πij]c×n are assigned the following values: 
1/( 1)*
*
1
1
=ij m
c
ij
k kj
u
d
d

(8) 
1/( 1)
2
1
=
1
ij p
ij
i
d

(9) 
Information technology & Applied mathematics 
T. Q. Hung, N. A. Cuong, N. D. Dzung, “Entropy-based intuitionistic fuzzy  clustering.” 50 
In which i = 1, 2, ..., c; j = 1, 2, ..., n; c is the number of clusters and n is the 
number of patterns and 
* 2(1 ) p pij ij ij i ijd d  
(10) 
with dij is the Euclid distance between data point xj and the centroid vi which is 
defined as: 
1
1
(1 )
=
(1 )
n
m p
ij ij j
j
i n
m p
ij ij
j
u x
v
u


(11) 
the scale parameter for each ith cluster is determined depending on the size of the ith 
cluster can be determined as follows: 
2
1
1
=
n
m p
ij i j i j
j
i n
m p
ij i j
j
u d
u



(12) 
The proof of Theorem 3.1 can be easily done with lagrange multiplier with the 
constraint 
1
1
c
i j
i
u
 
Because of applying the fuzzy membership and the hesitant to the EIFCM 
algorithm with the constraint uij + πij ≤ 1, if uij + πij > 1 then the uij and πij will be 
normalized as 
; = =ij ijij ij ij
ij ij ij
u
t u u and
u t
In defuzzification step, the combined membership 
*
iju is usually as follows: 
*
ij ij iju u 
 (13) 
Where: uij is the membership and πij is the hesitant of the j
th data point to the ith 
cluster. 
However, Eq.13 seems ineffective, we take a simple example as follows: 
Predefined a data x, a set A. To assess a membership function of x in set A based 
on intuitionistic fuzzy theory. Let u is the membership function, v is the non-
membership function and the hesitation degree π (u+v+π=1) 
Case 1: Assuming that: u = 0.6, v = 0.1 and π = 0.3 we have u∗ = 0:9 according 
to Eq.13 
Case 2: Assuming that: u = 0.8, v=0.1 and π = 0.1. we also have u∗ = 0:9 
according to Eq.13 
One can easily realize that: case 2 is better than case 1. However, with the above 
Eq.13, the aggregate membership functions u∗ are 0.9 in both two cases and we cannot 
determine the better one. 
Remark: The fact that hesitation degree π usually is not affect performance 
results. We are only interested in two factors that affect the decision are the 
membership function u and the non-membership function v. 
Therefore, we propose a new way to define the aggregate membership functions 
u∗ satisfying properties: the maximum membership function and the minimum 
Research 
Journal of Military Science and Technology, Special Issue, No.54A, 05 - 2018 51
non-membership function 
*
ij ij iju u v 
 (14) 
Where: uij is the membership function Eq.3 and vij is the the non-membership 
function of the jth data in ith cluster. 
Substituting vij = 1 - uij - πij in Eq.14, we have: 
* =ij ij iju u v 
(15) 
= (1 )ij ij iju u 
(16) 
= 2 ij iju 
 (17) 
The experiments are completed for several Machine Learning data 
( 
Table 1. Characteristic of heart disease datasets. 
Dataset No of Instance Class 
Cleveland 303 5 
Hungarian 294 5 
Switzerland 123 5 
Long Beach VA 200 5 
Table 2. Characteristic of breast cancer diagnostic datasets. 
Dataset No of Instance Class 
Breast-cancer-
Wisconsin 
699 2 
WDBC 569 2 
WPBC 198 2 
Parameters m = 2 for FCM, IFCM, fuzifier m = 2; p = 2 for EIFCM, the number 
of clusters c is the number of classes and error parameter ε = 0:00001. 
The resulting classification performance of the classification is evaluated by the 
accuracy rate CT as follows: 
=
TR
CT
TT
(18) 
Where: TR is the number of correctly classified data and TT is the total number 
of data points. 
The clustering results of the clustering or the quality of classification are 
reported in terms of index CT, which are shown in Tab.3 and Tab.4 
Table 3. Clustering results of heart disease datasets. 
Dataset FCM IFCM EIFCM 
Cleveland 85.6 87.6 92.1 
Hungarian 78.9 85.5 91.4 
Switzerland 76.5 90.1 90.1 
Long Beach VA 87.8 88.9 94.3 
Information technology & Applied mathematics 
T. Q. Hung, N. A. Cuong, N. D. Dzung, “Entropy-based intuitionistic fuzzy  clustering.” 52 
Table 4. Clustering results of breast cancer diagnostic datasets. 
Dataset FCM IFCM EIFCM 
Breast-cancer-
Wisconsin 
90.3 87.4 92.8 
WDBC 89.8 89.3 93.4 
WPBC 77.8 81.7 90.9 
These experimental results in Tab.3 and Tab.4 show that the effectiveness of the 
algorithm EIFCM is better than FCM and IFCM with the accuracy rate CT is 
always over 90%. 
Experiment: The experiments are done based on well-known images with the 
predefined the number of clusters images in TableV and Fig.1. The results were 
measured on the basis of several validity indexes to assess the performance of the 
algorithms on the experimental images. 
Table 5. The number of clusters. 
Image Number of cluster 
Rose 3 
Wolf 3 
Mountain 4 
Figure 1. Test Images: a) Rose image; b) Wolf image; c) Mountain image. 
Table 6. The various validity indexes on the experimental images. 
These test images were clustered by the FCM, IFCM [5] and EIFCM 
algorithms with predefined parameters: the exponential parameters m = 2 for FCM 
Research 
Journal of Military Science and Technology, Special Issue, No.54A, 05 - 2018 53
and IFCM, m = 2; p = 2 for EIFCM, the number of clusters c is the number of 
classes and error parameter ε = 0:00001. The different validity indexes is used to 
assess the clustering results such as the Bezdeks partition coefficient (PC-I), the 
Dunns separation index (Dunn-I), the Davies-Bouldins index (DB-I), and the 
Separation index (S-I), Xie and Beni’s index (XB-I), Classification Entropy index 
(CE-I) [6]. The various validity indexes are shown in the Tab.6. 
Note that: The validity indexes are proposed to evaluate the quality of 
clustering. The better algorithm has smaller T-I, DB-I, XB-I, S-I, CE-I and larger 
PCI. The results in Table 6 show that the EIFCM (the proposed algorithm) have 
more good performance or higher quality clustering times than the FCM, IFCM 
algorithms. 
4. CONCLUSION 
This paper presented a fuzzy c-means clustering algorithm based on 
intuitionistic fuzzy sets, which improved the clustering results and overcome the 
drawbacks of the conventional clustering algorithms in handling the hesitant. The 
proposed approach has solved the problem of combining between IFSs and fuzzy 
clustering to improve the quality of clustering results. The experiments are done 
based on some well-known datasets with the statistics show that the proposed 
algorithm generates better results than the traditional method like FCM. The next 
goal is some researches related to use the general type-2 intuitionistic fuzzy sets to 
better improvement of quality. 
REFERENCES 
[1]. T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, A. Y. 
Wu, “An Efficient k-Means Clustering Algorithm: Analysis and 
Implementation”, IEEE Trans. On Pattern Analysis and Machine Intelligence, 
Vol 24(7), 881-893, (2002). 
[2]. M.C. Hung, J. Wu, J.H. Chang, D.L. Yang, “An Efficient k-Means Clustering 
Algorithm Using Simple Partitioning”, J. of Info. Science and Engineering, 
Vol.21, 1157-1177, (2005). 
[3]. K.R. Zalik, “An efficient k-means clustering algorithm”, Pattern Recognition 
Letters Vol. 29, 1385 - 1391, (2008). 
[4]. K. A. Abdul Nazeer, M. P. Sebastian, “Improving the Accuracy and Efficiency 
of the k-means Clustering Algorithm”, Proceedings of the World Congress on 
Engineering, (2009). 
[5]. Z. Xu, J. Wu, “Intuitionistic fuzzy C-means clustering algorithms”, Journal of 
Systems Engineering and Electronics, vol.21 (4), pp.580-590, (2010). 
[6]. W. Wang, Y. Zhang, “On fuzzy cluster validity indices”, Fuzzy Sets and 
Systems 158, 2095-2117, (2007). 
[7]. J. C. Bezdek, “Pattern Recognition with Fuzzy Objective Function 
Algorithms”, New York: Academic, (1981). 
[8]. J. Yu and M.-S. Yang, “Optimality test for generalized FCM and its 
application to parameter selection”, IEEE Trans. Fuzzy Syst., vol. 13, no. 1, 
pp. 164176, (2005). 
Information technology & Applied mathematics 
T. Q. Hung, N. A. Cuong, N. D. Dzung, “Entropy-based intuitionistic fuzzy  clustering.” 54 
[9]. K. Atanassov, “Intuitionistic fuzzy sets”, Fuzzy Sets and Systems, vol.20, pp. 
87-96, (1986). 
[10]. P. Balasubramaniam, V.P. Ananthi, “Segmentation of nutrient deficiency in 
incomplete crop images using intuitionistic fuzzy C-means clustering 
algorithm”, Nonlinear Dynamics, vol.83(1-2), pp. 849-866, (2016). 
[11]. F.H. Huang, Y.J. Ye, C.H. Kao, “Developing a novel Intuitionistic Fuzzy 
Importance-performance Analysis for evaluating corporate social 
responsibility in sport tourism event”, Expert Systems With Applications, 
vol.42(19), pp.6530-6538, (2015). 
[12]. H. Verma, R.K. Agrawal, “Possibility Intuitionistic Fuzzy c-Means 
Clustering Algorithm for MRI Brain Image Segmentation”, International 
Journal on Artificial Intelligence Tools, vol.24(5), (2015). 
[13]. W.S. Sheng. B.Q. Hu, “Aggregation distance measure and its induced 
similarity measure between intuitionistic fuzzy sets”, Pattern Recognition 
Letters, vol.60-61, pp.65-71, (2015). 
[14]. A. Chaudhuri, “Intuitionistic Fuzzy Possibilistic C Means Clustering 
Algorithms”, Advances in Fuzzy Systems, (2015). 
[15]. P.Burillo and H.Bustince, “Construction theorems for intuitionistic fuzzy 
sets”, Fuzzy Sets and Systems, vol.84, pp.271281, (1996). 
[16]. P.Couto, “Image segmentation using atanassov intuitionistic fuzzy sets”, 
Ph.D. thesis, Trs-os-Montes e Alto Douro University, Vila Real, Portugal, 
(2006). 
[17]. H.Bustince, E.Barrenechea, M. Pagola and R. Orduna, “Image Thresholding 
Computation Using Atanassovs Intuitionistic Fuzzy Sets”, Journal of 
Advanced Comput. Int. and Int. Informatics, vol.11 (2), pp. 187-194, (2007). 
TÓM TẮT 
PHÂN CỤM C-MEANS MỜ TRỰC CẢM DỰA TRÊN ENTROPY 
Với sự phát triển nhanh chóng của các bộ dữ liệu không chắc chắn, một 
phương pháp phân cụm C-means mờ trực cảm entropy (EIFCM) được đề xuất 
dựa trên cơ sở các tập mờ trực cảm (IFS) cho các bài toán phân cụm. Các ưu 
điểm của của các tập mờ trực cảm và các tập mờ được kết hợp với nhau trong 
phương pháp đề xuất để khắc phục một số hạn chế của phương pháp FCM 
trong việc xử lý dữ liệu không chắc chắn hoặc do dự cũng như giải quyết tính 
mờ. Các kết quả thực nghiệm cho thấy thuật toán được đề xuất tốt hơn so với 
các thuật toán phân cụm mờ truyền thống. 
Từ khóa: Tập mờ, Tập mờ trực cảm, Trực cảm, Phân cụm C-means mờ, Trực cảm dựa trên Entropy. 
Received 27 th February 2018 
Revised 5th April 2018 
Accepted 20 th April 2018 
Author affiliations: 
 Le Quy Don Technical University, Hanoi. 
*Corresponding author: truongqhung.@gmail.com 

File đính kèm:

  • pdfentropy_based_intuitionistic_fuzzy_c_means_clustering.pdf