Hi all,
Has anyone used Python to group keywords? What algorithm did you use? Would you mind sharing?
For example, I have about 10,000 keywords and want to pre-group them with a tool. What algorithm would work best for that?
I just had ChatGPT write one, and it does work. I'm wondering if there is anything better.
K-means clustering code:
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
# Define the list of words
word_list = ["apple", "banana", "orange", "pineapple", "grape", "strawberry", "kiwi", "mango", "pear"]
# Convert the word list to a matrix of TF-IDF features
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(word_list)
# Cluster the words using K-means
# (n_clusters must not exceed the number of words; 10 clusters for 9 words raises a ValueError)
kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(X)
# Print the clusters
for i in range(kmeans.n_clusters):
    print(f"Cluster {i+1}:")
    cluster_words = [word_list[j] for j in range(len(word_list)) if kmeans.labels_[j] == i]
    print(cluster_words)
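One caveat with the snippet above: word-level TF-IDF turns every single-word keyword into its own one-hot vector, so all the vectors are mutually orthogonal and the resulting clusters are essentially arbitrary. Below is a rough sketch of one workaround (the keyword list and cluster count are just placeholders): it vectorizes with character n-grams so keywords with similar spelling end up together. For grouping by meaning rather than spelling, you would swap the vectorizer for word or sentence embeddings (e.g. the sentence-transformers library) and feed those to K-means instead.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder keyword list standing in for the real 10,000 terms
keywords = ["apple", "apple pie", "banana", "banana bread",
            "orange juice", "orange", "grape", "grapefruit"]

# Character n-grams (2-4 chars, word-boundary aware) give overlapping features
# even for single-word keywords, unlike whole-word TF-IDF
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vectorizer.fit_transform(keywords)

# Cluster count is a guess here; in practice try several values or use a method
# that picks it for you
kmeans = KMeans(n_clusters=4, random_state=0, n_init=10).fit(X)

for i in range(kmeans.n_clusters):
    members = [kw for kw, label in zip(keywords, kmeans.labels_) if label == i]
    print(f"Cluster {i+1}: {members}")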