
Hard clustering technique based on multi soft set and multinomial distribution function for categorical data

Yanto, Iwan Tri Riyadi (2023) Hard clustering technique based on multi soft set and multinomial distribution function for categorical data. Doctoral thesis, Universiti Tun Hussein Onn Malaysia.

Text
24p IWAN TRI RIYADI YANTO.pdf

Download (541kB) | Preview
Text (Copyright Declaration)
IWAN TRI RIYADI YANTO COPYRIGHT DECLARATION.pdf
Restricted to Repository staff only

Download (399kB) | Request a copy
Text (Full Text)
IWAN TRI RIYADI YANTO WATERMARK.pdf
Restricted to Registered users only

Download (2MB) | Request a copy

Abstract

Clustering categorical data remains challenging because of the complexity of measuring similarity between data objects. Unlike numerical data, categorical data contain attributes that have no natural order, so distance-based techniques such as k-means cannot be applied directly to categorical attributes. Fuzzy k-modes and related improvements such as Hard k-modes, Ng’s k-modes, He’s k-modes, Initialization k-modes, and Hard and Fuzzy Centroid were proposed to overcome the limitations of k-means in handling categorical data. The Grade of Membership (GoM) and Fuzzy k-Partition (FkP) were proposed as parametric approaches to improve Purity and accuracy. However, these clustering techniques still produce clusters with weak intra-cluster similarity and low Purity. Moreover, converting categorical attributes into binary values increases complexity. On the other hand, categorical data have multivalued attributes that can be represented as a multi soft set and can be assumed to be a random sample from a multivariate multinomial distribution. This study proposes a clustering technique for categorical data based on soft set theory and the multinomial distribution function. The data are represented as a multi soft set in which every object in each soft set has a probability. The probability of each object is calculated by the cluster joint distribution function, which follows the multivariate multinomial distribution function. The experimental results show that the proposed technique achieves better cluster stability in terms of the Dunn Index. It improves the mean error of the estimated parameters by up to 24.29% and 2.24%, reduces the complexity to 73.75% and processing time by up to 92.96%, and achieves a Rank Index of up to 0.8850 and a Purity of 0.9197.
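
The ideas named in the abstract (a multi soft set representation, cluster probabilities from a multinomial model, hard assignment, and Purity) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm, which the abstract does not specify in detail. It assumes a simple classification-style loop with independent attributes, Laplace smoothing (`alpha`), and a deterministic round-robin start; the function names `multi_soft_set`, `hard_multinomial_clustering`, and `purity` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def multi_soft_set(rows):
    """One soft set per attribute: value -> set of object indices holding it."""
    soft_sets = [defaultdict(set) for _ in range(len(rows[0]))]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            soft_sets[j][value].add(i)
    return [dict(s) for s in soft_sets]


def hard_multinomial_clustering(rows, k, max_iter=50, alpha=1.0):
    """Assign each object to its most probable cluster (illustrative only)."""
    soft_sets = multi_soft_set(rows)
    n = len(rows)
    labels = [i % k for i in range(n)]  # assumption: round-robin initial labels
    for _ in range(max_iter):
        sizes = Counter(labels)
        log_prior = [math.log((sizes[c] + alpha) / (n + alpha * k))
                     for c in range(k)]
        # log_p[c][j][v]: smoothed log-probability of value v of attribute j
        # in cluster c, counted through the soft sets.
        log_p = []
        for c in range(k):
            members = {i for i, lab in enumerate(labels) if lab == c}
            per_attr = []
            for s in soft_sets:
                denom = len(members) + alpha * len(s)
                per_attr.append({v: math.log((len(objs & members) + alpha) / denom)
                                 for v, objs in s.items()})
            log_p.append(per_attr)
        # Hard assignment: pick the cluster with the highest joint log-probability.
        new_labels = [
            max(range(k),
                key=lambda c: log_prior[c]
                + sum(log_p[c][j][v] for j, v in enumerate(row)))
            for row in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def purity(labels, truth):
    """Fraction of objects that belong to the majority true class of their cluster."""
    by_cluster = defaultdict(Counter)
    for lab, t in zip(labels, truth):
        by_cluster[lab][t] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)


rows = [("red", "small")] * 3 + [("blue", "large")] * 3
labels = hard_multinomial_clustering(rows, k=2)
score = purity(labels, [0, 0, 0, 1, 1, 1])
```

Counting through the soft sets (`objs & members`) avoids converting each categorical attribute into binary columns, which is the cost the abstract attributes to earlier techniques.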

Item Type: Thesis (Doctoral)
Subjects: Q Science > QA Mathematics > QA76 Computer software
Depositing User: Pn Sabarina binti Che Mat
Date Deposited: 02 May 2024 01:46
Last Modified: 02 May 2024 01:46
URI: http://eprintsthesis.uthm.edu.my/id/eprint/146
