Let (X, d) be a metric space, where X is a set and d is a metric on X.
A set V ⊆ X of n points is provided, together with a parameter k. The goal is to find a subset C ⊆ V with |C| = k such that the maximum distance of a point in V to the closest point in C is minimized. The problem can be formally defined as follows: minimize max_{v ∈ V} min_{c ∈ C} d(v, c).
That is, every point in a cluster is at distance at most r from its respective center, where r is the minimized objective value. The k-Center Clustering problem can also be defined on a complete undirected graph G = (V, E) as follows:
Given a complete undirected graph G = (V, E) with distances d(vi, vj) ∈ N satisfying the triangle inequality, find a subset C ⊆ V with |C| = k while minimizing: max_{v ∈ V} min_{c ∈ C} d(v, c).
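As a concrete illustration, the objective above can be sketched in a few lines of Python. This is a minimal sketch assuming Euclidean points as the metric; the function name kcenter_cost is illustrative and not from the source.

```python
import math

def kcenter_cost(points, centers):
    """Cost of a center set C: the maximum, over points v in V, of the
    distance from v to its nearest center (Euclidean metric here as an
    example; the definition allows any metric d)."""
    return max(min(math.dist(v, c) for c in centers) for v in points)

# Example: four points on a line, centers at 0 and 10.
points = [(0.0,), (1.0,), (9.0,), (10.0,)]
print(kcenter_cost(points, [(0.0,), (10.0,)]))  # -> 1.0
```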
Computational complexity
In a complete undirected graph G = (V, E), sort the edges in nondecreasing order of their distances, d(e1) ≤ d(e2) ≤ … ≤ d(em), and let Gi = (V, Ei), where Ei = {e1, e2, …, ei}. The k-center problem is equivalent to finding the smallest index i such that Gi has a dominating set of size at most k. Since Dominating Set is NP-complete, the k-center problem is NP-hard: determining whether a given feasible solution for the k-center problem is optimal via the Dominating Set reduction would require knowing in the first place the size of the optimal solution, which is precisely the difficult core of NP-hard problems.
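The equivalence above can be checked directly by brute force on tiny instances. The sketch below (the function name is mine, not from the source) scans candidate radii in nondecreasing order and exhaustively tests every size-k center set; it runs in exponential time and is meant only to illustrate the dominating-set formulation.

```python
from itertools import combinations

def kcenter_via_thresholds(dist, k):
    """Brute-force illustration of the equivalence: for each candidate radius
    r (an edge length, taken in nondecreasing order), check -- exhaustively --
    whether the threshold graph G_i has a dominating set C of size k, i.e.
    whether every vertex lies within distance r of some vertex of C."""
    n = len(dist)
    radii = sorted({dist[u][v] for u in range(n) for v in range(n)})  # includes 0
    for r in radii:
        for C in combinations(range(n), k):
            if all(any(dist[v][c] <= r for c in C) for v in range(n)):
                return r
    return None  # reached only if k > n

# Points 0, 1, 9, 10 on a line; with k = 2 the optimal radius is 1.
d = [[abs(a - b) for b in (0, 1, 9, 10)] for a in (0, 1, 9, 10)]
print(kcenter_via_thresholds(d, 2))  # -> 1
```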
A simple greedy approximation algorithm that achieves an approximation factor of 2 builds C using a farthest-first traversal in k iterations. In each iteration, this algorithm simply chooses as the new center the point farthest away from the current set of centers. It can be described as follows:
Pick an arbitrary point c̄1 into C1.
For every point v ∈ V compute its distance d1(v) from C1.
Pick the point c̄2 with the highest distance from C1.
Add it to the set of centers and denote this expanded set of centers as C2. Continue this until k centers are found.
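The steps above can be sketched in Python as follows. This is a minimal sketch assuming Euclidean points as the metric; the function name is illustrative and not from the source.

```python
import math

def greedy_k_center(points, k):
    """Farthest-first traversal: a 2-approximation for metric k-center.
    Maintains each point's distance to its nearest chosen center, so each
    of the k iterations needs only O(n) distance updates."""
    centers = [points[0]]                        # arbitrary first center
    d = [math.dist(p, centers[0]) for p in points]
    for _ in range(k - 1):
        i = max(range(len(points)), key=d.__getitem__)  # farthest point
        centers.append(points[i])
        d = [min(d[j], math.dist(p, points[i])) for j, p in enumerate(points)]
    return centers

pts = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
print(greedy_k_center(pts, 2))  # -> [(0.0, 0.0), (10.0, 0.0)]
```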
Running time
The ith iteration of choosing the ith center takes O(n) time: by storing, for every point, its distance to the nearest center chosen so far, each new center requires only one additional distance computation per point. The overall running time is therefore O(nk).
The solution obtained using the simple greedy algorithm is a 2-approximation to the optimal solution. This section focuses on proving this approximation factor. Given a set V of n points belonging to a metric space, the greedy k-center algorithm computes a set K of k centers, such that K is a 2-approximation to the optimal k-center clustering of V, i.e. r_K(V) ≤ 2 r_OPT(V). This theorem can be proven using two cases as follows.
Case 1: Every cluster of the optimal clustering C_OPT contains exactly one point of K. For any point v, let c be its optimal center and let k̄ be the point of K lying in the same optimal cluster; the triangle inequality gives d(v, k̄) ≤ d(v, c) + d(c, k̄) ≤ 2 r_OPT(V).
Case 2: There are two centers c̄i and c̄j of K that are both in the same cluster of C_OPT, for some i < j (by the pigeonhole principle, this is the only remaining possibility).
Assume, without loss of generality, that c̄j was added later to the center set by the greedy algorithm, say in the jth iteration.
But since the greedy algorithm always chooses the point farthest away from the current set of centers, and c̄i already belongs to Cj−1, we have for every point v that d(v, K) ≤ d(v, Cj−1) ≤ d(c̄j, Cj−1) ≤ d(c̄j, c̄i) ≤ d(c̄j, c) + d(c, c̄i) ≤ 2 r_OPT(V), where c is the center of the optimal cluster containing both c̄i and c̄j. Hence r_K(V) ≤ 2 r_OPT(V) in this case as well.
Another 2-factor approximation algorithm
Another algorithm with the same approximation factor takes advantage of the fact that the k-center problem is equivalent to finding the smallest index i such that Gi has a dominating set of size at most k. It computes a maximal independent set of the square graph Gi² (in which two vertices are adjacent if they are joined by a path of at most two edges in Gi), looking for the smallest index i for which a maximal independent set of Gi² has size at most k; such a set dominates Gi² and hence yields a center set of radius at most twice the optimum. It is not possible to find an approximation algorithm with an approximation factor of 2 − ε for any ε > 0, unless P = NP. Furthermore, the distances of all edges in G must satisfy the triangle inequality if the k-center problem is to be approximated within any constant factor, unless P = NP.
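A sketch of this independent-set-based bottleneck approach, under the assumption that the input is a distance matrix; it builds a maximal independent set of Gi² by a simple greedy scan, treating two vertices as adjacent in Gi² when their distance is at most 2r (the function name is mine, not from the source).

```python
def two_approx_via_mis(dist, k):
    """Bottleneck-style 2-approximation sketch: for each candidate edge
    length r in increasing order, greedily build a maximal independent set
    of the square graph G_r^2 (vertices adjacent when within distance 2r).
    If it has at most k vertices, it is a center set of radius <= 2r."""
    n = len(dist)
    for r in sorted({dist[u][v] for u in range(n) for v in range(u + 1, n)}):
        centers, covered = [], [False] * n
        for v in range(n):
            if not covered[v]:        # v is independent of all chosen centers
                centers.append(v)
                for u in range(n):
                    if dist[v][u] <= 2 * r:
                        covered[u] = True
        if len(centers) <= k:
            return centers
    return list(range(min(k, n)))

d = [[abs(a - b) for b in (0, 1, 9, 10)] for a in (0, 1, 9, 10)]
print(two_approx_via_mis(d, 2))  # -> [0, 2]
```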