next up previous contents index
Next: Evaluation of clustering Up: Problem statement Previous: A note on terminology.   Contents   Index

Cardinality - the number of clusters

A difficult issue in clustering is determining the number of clusters or cardinality of a clustering, which we denote by $K$. Often $K$ is nothing more than a good guess based on experience or domain knowledge. But for $K$-means, we will also introduce a heuristic method for choosing $K$ and an attempt to incorporate the selection of $K$ into the objective function. Sometimes the application puts constraints on the range of $K$. For example, the Scatter-Gather interface in Figure 16.3 could not display more than about $K=10$ clusters per layer because of the size and resolution of computer monitors in the early 1990s.

Since our goal is to optimize an objective function, clustering is essentially a search problem. The brute force solution would be to enumerate all possible clusterings and pick the best. However, there are exponentially many partitions, so this approach is not feasible.[*] For this reason, most flat clustering algorithms refine an initial partitioning iteratively. If the search starts at an unfavorable initial point, we may miss the global optimum. Finding a good starting point is therefore another important problem we have to solve in flat clustering.


next up previous contents index
Next: Evaluation of clustering Up: Problem statement Previous: A note on terminology.   Contents   Index
© 2008 Cambridge University Press
This is an automatically generated page. In case of formatting errors you may want to look at the PDF edition of the book.
2009-04-07