Publication: İlgi Metriği İle Bulanık Kümeleme Tabanlı Video Öneri Sistemi (Video Recommendation System Based on Fuzzy Clustering with an Interest Metric)
Abstract
This thesis proposes a fuzzy, interest-based clustering method that estimates the number of clusters. The proposed Fuzzy C Means Jensen-Shannon (FCMJS) method was run on an artificial data set of 1000 elements in 6 clusters. Its results were compared with those of Fuzzy C Means (FCM), a fuzzy clustering algorithm; Jensen-Shannon (JS), an interest-based method; and Probabilistic C Means (PCM) and Probabilistic Fuzzy C Means (PFCM), which are fuzzy-based probabilistic clustering methods. Seven cluster validity indices and an accuracy metric were used to evaluate the comparison. On the accuracy metric, FCM and FCMJS reached 81.7059% and 81.6864%, respectively, outperforming the other three methods (PCM, PFCM, and JS). On the cluster validity indices, FCM and FCMJS again gave better results than PCM and PFCM. Because choosing the threshold value of the FCMJS algorithm is difficult, an adaptive FCMJS method was developed in which the algorithm determines the threshold itself. Adaptive FCMJS estimated the number of clusters correctly for different maximum cluster numbers, and it was found to be successful when its clustering ability was tested with cluster validity indices and the accuracy metric. The two proposed methods were also tested on a data set created for recommending movies to users in a movie recommendation system. For this purpose, the movie data was weighted with the Dirichlet function for the action, adventure, comedy, drama, and horror genres, producing a data set that contains the characteristics of these 5 genres. This movie data set was clustered with FCM, FCMJS, adaptive FCMJS, PCM, PFCM, and JS, and the results were compared in terms of accuracy.
In this comparison, FCMJS (89.4628%) and adaptive FCMJS (89.0593%) achieved higher accuracy than the other methods. The two proposed methods were also compared with FCM, PCM, and PFCM in terms of cluster validity indices. According to the results, FCMJS and adaptive FCMJS estimated the appropriate number of clusters and thereby succeeded in grouping similar movies that a user may watch by genre.
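The abstract names three building blocks: Dirichlet-weighted genre profiles, the Jensen-Shannon measure, and fuzzy memberships as used in FCM. The sketch below illustrates those blocks only. It is not the thesis's FCMJS algorithm, whose threshold rule and cluster-number estimation are not described in the abstract. The function names, the choice of the square root of the JS divergence as the distance, the fuzzifier m = 2, and the sample sizes are all assumptions made for illustration.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2, so the value lies in [0, 1])."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability terms of a.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    # Clamp at 0 to guard against tiny negative rounding errors.
    return max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0)


def fuzzy_memberships(dist, m=2.0, eps=1e-12):
    """Standard FCM membership update from an (n_samples, n_clusters) distance matrix."""
    d = np.maximum(np.asarray(dist, dtype=float), eps)  # avoid division by zero
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1


def demo_memberships(n_movies=10, n_centers=2, seed=0):
    """Hypothetical demo: Dirichlet genre profiles scored against fixed centers."""
    rng = np.random.default_rng(seed)
    # One row per movie: weights over 5 genres
    # (action, adventure, comedy, drama, horror), each row summing to 1.
    profiles = rng.dirichlet(alpha=np.ones(5), size=n_movies)
    centers = profiles[:n_centers]  # assumption: first profiles act as centers
    dist = np.array(
        [[np.sqrt(js_divergence(x, c)) for c in centers] for x in profiles]
    )
    return fuzzy_memberships(dist)
```

Calling `demo_memberships()` returns one membership row per movie. A full clustering loop would also re-estimate the centers on each iteration, which is omitted here.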
End Page
76
