Yanto, Iwan Tri Riyadi (2023) Hard clustering technique based on multi soft set and multinomial distribution function for categorical data. Doctoral thesis, Universiti Tun Hussein Onn Malaysia.
Text: 24p IWAN TRI RIYADI YANTO.pdf (541kB)
Text (Copyright Declaration): IWAN TRI RIYADI YANTO COPYRIGHT DECLARATION.pdf (399kB), restricted to repository staff only
Text (Full Text): IWAN TRI RIYADI YANTO WATERMARK.pdf (2MB), restricted to registered users only
Abstract
Clustering categorical data remains a challenge because of the complexity of measuring the similarity of data. Unlike numerical data, categorical data contain attributes that have no natural order, so distance-based techniques such as k-means cannot be applied directly to categorical attributes. Fuzzy k-modes and related improvements, such as Hard k-modes, Ng’s k-modes, He’s k-modes, Initialization k-modes, and Hard and Fuzzy Centroid, were proposed to overcome the limitations of k-means in handling categorical data. The Grade of Membership (GoM) and Fuzzy k-Partition (FkP) were proposed as parametric approaches to improve Purity and accuracy. However, these clustering techniques still produce clusters with weak intra-cluster similarity and low Purity. Moreover, converting categorical attributes into binary values increases the complexity. On the other hand, categorical data have multi-valued attributes that can be represented as a multi soft set and can be assumed to be a random sample from a multivariate multinomial distribution. This study proposes a clustering technique for categorical data based on soft set theory via the multinomial distribution function. The data are represented as a multi soft set in which every object in each soft set has a probability. The probability of each object is calculated by the cluster joint distribution function, which follows the multivariate multinomial distribution function. The experimental results show that the proposed technique achieves better cluster stability in terms of the Dunn Index. It improves the mean error of the estimated parameters by up to 24.29% and 2.24%, reduces the complexity to 73.75% and the processing time by up to 92.96%, and attains a Rank Index of up to 0.8850 and a Purity of 0.9197.
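The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes that each attribute yields one soft set (a mapping from attribute value to the set of objects holding that value), that attributes are independent within a cluster, and that cluster parameters are re-estimated with Laplace smoothing after each hard assignment. The function names (`multi_soft_set`, `estimate`, `hard_cluster`, `purity`), the smoothing constant `alpha`, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def multi_soft_set(data):
    """Build one soft set per attribute: value -> set of object indices."""
    soft_sets = [defaultdict(set) for _ in data[0]]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            soft_sets[j][value].add(i)
    return [dict(s) for s in soft_sets]


def estimate(soft_sets, labels, k, alpha=1.0):
    """Estimate cluster weights and per-attribute value probabilities.

    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    n = len(labels)
    members = [{i for i, c in enumerate(labels) if c == cid} for cid in range(k)]
    weights = [(len(m) + alpha) / (n + alpha * k) for m in members]
    probs = []
    for m in members:
        per_attr = []
        for s in soft_sets:
            denom = len(m) + alpha * len(s)
            # Count of a value inside a cluster = size of the set intersection.
            per_attr.append({v: (len(objs & m) + alpha) / denom
                             for v, objs in s.items()})
        probs.append(per_attr)
    return weights, probs


def log_joint(row, weight, per_attr):
    """Log of cluster weight times the product of attribute-value probabilities."""
    return math.log(weight) + sum(math.log(per_attr[j][v])
                                  for j, v in enumerate(row))


def hard_cluster(data, k, init_labels, max_iter=50):
    """Alternate parameter estimation and hard (argmax) assignment."""
    soft_sets = multi_soft_set(data)
    labels = list(init_labels)
    for _ in range(max_iter):
        weights, probs = estimate(soft_sets, labels, k)
        new_labels = [
            max(range(k), key=lambda c: log_joint(row, weights[c], probs[c]))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def purity(labels, truth):
    """Fraction of objects that belong to the majority class of their cluster."""
    total = 0
    for cid in set(labels):
        counts = Counter(t for l, t in zip(labels, truth) if l == cid)
        total += max(counts.values())
    return total / len(labels)


# Toy data (hypothetical): two categorical attributes, six objects.
data = [("red", "small"), ("red", "small"), ("red", "medium"),
        ("blue", "large"), ("blue", "large"), ("blue", "medium")]
truth = ["a", "a", "a", "b", "b", "b"]

labels = hard_cluster(data, k=2, init_labels=[0, 0, 1, 1, 1, 0])
print(labels)
print(purity(labels, truth))
```

Working from the soft sets means a value count inside a cluster is just a set intersection, so the categorical attributes never need to be expanded into binary columns. The result depends on the initial labels, which this sketch takes as given.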
Item Type: | Thesis (Doctoral) |
---|---|
Subjects: | Q Science > QA Mathematics > QA76 Computer software |
Depositing User: | Pn Sabarina binti Che Mat |
Date Deposited: | 02 May 2024 01:46 |
Last Modified: | 02 May 2024 01:46 |
URI: | http://eprintsthesis.uthm.edu.my/id/eprint/146 |