Multiway number partitioning
In computer science, multiway number partitioning is the problem of partitioning a multiset of numbers into a fixed number of subsets, such that the sums of the subsets are as similar as possible. It was first presented by Ronald Graham in 1969 in the context of the identical-machines scheduling problem. The problem is parametrized by a positive integer k, and is called k-way number partitioning. The input to the problem is a multiset S of numbers, whose sum is k*T.
The associated decision problem is to decide whether S can be partitioned into k subsets such that the sum of each subset is exactly T. There is also an optimization problem: find a partition of S into k subsets, such that the k sums are "as near as possible". The exact optimization objective can be defined in several ways:
- Minimize the difference between the largest sum and the smallest sum. This objective is common in papers about multiway number partitioning, as well as papers originating from physics applications.
- Minimize the largest sum. This objective is equivalent to one objective for Identical-machines scheduling. There are k identical processors, and each number in S represents the time required to complete a single-processor job. The goal is to partition the jobs among the processors such that the makespan is minimized.
- Maximize the smallest sum. This objective corresponds to the application of fair item allocation, particularly the maximin share. It also appears in voting manipulation problems, and in sequencing of maintenance actions for modular gas turbine aircraft engines. Suppose there are k engines, which must be kept working for as long as possible. An engine needs a certain critical part in order to operate. There is a set S of parts, each of which has a different lifetime. The goal is to assign the parts to the engines, such that the shortest engine lifetime is as large as possible.
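To make the three objectives concrete, the following brute-force sketch enumerates every assignment of the numbers to k subsets and reports the optimum of each objective. The helper names `subset_sums` and `optimal_values` are hypothetical, and the search is exponential (k**n assignments), so it only illustrates the definitions on tiny inputs.

```python
from itertools import product


def subset_sums(numbers, k):
    """Yield the k subset sums of every assignment of numbers to k subsets.

    Enumerates all k**n assignments (empty subsets allowed), so this is
    only usable for tiny inputs.
    """
    for assignment in product(range(k), repeat=len(numbers)):
        sums = [0] * k
        for x, subset in zip(numbers, assignment):
            sums[subset] += x
        yield sums


def optimal_values(numbers, k):
    """Return the optimum of each of the three objectives."""
    candidates = list(subset_sums(numbers, k))
    return {
        "min_difference": min(max(s) - min(s) for s in candidates),
        "min_largest": min(max(s) for s in candidates),
        "max_smallest": max(min(s) for s in candidates),
    }


print(optimal_values([1, 2, 3, 10], 2))
print(optimal_values([1, 2, 3, 10], 3))
```

For {1, 2, 3, 10} the optima are 4, 10 and 6 when k=2, and 7, 10 and 3 when k=3. When k=2 the three objectives always prefer the same partitions, because the two sums add up to a fixed total.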
All these problems are NP-hard, but there are various algorithms that solve them efficiently in many cases.
Some closely-related problems are:
- The partition problem - a special case of multiway number partitioning in which the number of subsets is 2.
- The 3-partition problem - a different and harder problem, in which the number of subsets is not considered a fixed parameter, but is determined by the input.
- The bin packing problem - a dual problem in which the total sum in each subset is bounded, but k is flexible; the goal is to find a partition with the smallest possible k. The optimization objectives are closely related: the optimal number of d-sized bins is at most k, iff the optimal size of a largest subset in a k-partition is at most d.
- The uniform-machines scheduling problem - a more general problem in which different processors may have different speeds.
Approximation algorithms
There are various algorithms that obtain a guaranteed approximation of the optimal solution in polynomial time. There are different approximation algorithms for different objectives.
Minimizing the largest sum
The approximation ratio in this context is the largest sum in the solution returned by the algorithm, divided by the largest sum in the optimal solution. Most algorithms below were developed for identical-machines scheduling.
- Greedy number partitioning loops over the numbers, and puts each number in the set whose current sum is smallest. If the numbers are not sorted, then the runtime is $O(n)$ and the approximation ratio is at most $2-1/k$. Sorting the numbers in descending order increases the runtime to $O(n\log n)$ and improves the approximation ratio to 7/6 when k=2, and $4/3-1/(3k)$ in general. If the numbers are distributed uniformly in $[0,1]$, then the approximation ratio is at most $1+O(\log\log n/n)$ almost surely, and $1+O(1/n)$ in expectation.
- Largest Differencing Method sorts the numbers in descending order and repeatedly replaces the two largest numbers by their difference (for k=2; for larger k it combines k-tuples of partial sums). The runtime complexity is $O(n\log n)$. In the worst case, its approximation ratio is similar: at most 7/6 for k=2, and at most $4/3-1/(3k)$ in general. However, in the average case it performs much better than the greedy algorithm: for k=2, when numbers are distributed uniformly in $[0,1]$, its approximation ratio is at most $1+1/n^{\Theta(\log n)}$ in expectation. It also performs better in simulation experiments.
- The Multifit algorithm uses binary search combined with an algorithm for bin packing. In the worst case, its approximation ratio is at most 8/7 for k=2, and at most 13/11 in general.
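- Graham presented the following algorithm. For any integer r>0, choose the r largest numbers in S and partition them optimally. Then allocate the remaining numbers in arbitrary order, each to the subset with the smallest current sum. This algorithm has approximation ratio $1+\frac{1-1/k}{1+\lfloor r/k\rfloor}$; its running time is $O(k^r)$ for the brute-force search over partitions of the r largest numbers, plus polynomial time for the remaining numbers.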
- Sahni presented a PTAS that attains $(1+\varepsilon)\mathrm{OPT}$ in time $O(n\cdot(n^2/\varepsilon)^{k-1})$. It is an FPTAS if k is fixed. For k=2, the run-time improves to $O(n^2/\varepsilon)$. The algorithm uses a technique called interval partitioning.
- Hochbaum and Shmoys presented the following algorithms, which work even when k is part of the input.
- *For any r>0, an algorithm with approximation ratio at most $6/5+2^{-r}$ in time $O(n(r+\log n))$.
- *For any r>0, an algorithm with approximation ratio at most $7/6+2^{-r}$ in time $O(n(rk^4+\log n))$.
- *For any ε>0, an algorithm with approximation ratio at most $1+\varepsilon$ in time $O((n/\varepsilon)^{1/\varepsilon^2})$. This is a PTAS.
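The first two heuristics in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `greedy_partition` and `largest_differencing` are hypothetical, and for k>2 the differencing step assumes the usual k-way generalization in which two k-tuples of partial subsets are merged by joining the largest sum of one with the smallest sum of the other.

```python
import heapq


def greedy_partition(numbers, k, sort=True):
    """Greedy number partitioning: put each number into the subset whose
    current sum is smallest. With sort=True the numbers are processed in
    descending order (the sorted variant)."""
    items = sorted(numbers, reverse=True) if sort else list(numbers)
    heap = [(0, i) for i in range(k)]  # (current sum, subset index)
    subsets = [[] for _ in range(k)]
    for x in items:
        total, i = heapq.heappop(heap)  # subset with the smallest sum
        subsets[i].append(x)
        heapq.heappush(heap, (total + x, i))
    return subsets


def largest_differencing(numbers, k):
    """Largest Differencing Method. Every number starts as a k-tuple of
    subsets (x, 0, ..., 0); the two tuples with the largest spread
    (largest sum minus smallest sum) are repeatedly merged by joining the
    largest subset of one with the smallest subset of the other."""
    heap = []
    for idx, x in enumerate(numbers):
        # Partial partition as (sum, items) pairs, sorted by sum descending.
        parts = [(x, [x])] + [(0, []) for _ in range(k - 1)]
        heapq.heappush(heap, (-x, idx, parts))  # idx breaks ties
    counter = len(numbers)
    while len(heap) > 1:
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        # a and b are sorted descending: pair a's largest with b's smallest.
        merged = [(sa + sb, ia + ib) for (sa, ia), (sb, ib) in zip(a, reversed(b))]
        merged.sort(key=lambda part: part[0], reverse=True)
        spread = merged[0][0] - merged[-1][0]
        heapq.heappush(heap, (-spread, counter, merged))
        counter += 1
    return [items for _, items in heap[0][2]]


nums = [8, 7, 6, 5, 4]
print([sum(s) for s in greedy_partition(nums, 2)])      # [17, 13]
print([sum(s) for s in largest_differencing(nums, 2)])  # [16, 14]
```

On {8, 7, 6, 5, 4} with k=2 neither heuristic is optimal (the optimum is {8, 7} and {6, 5, 4}, with sums 15 and 15), but the differencing method comes closer than the sorted greedy.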
Maximizing the smallest sum
The approximation ratio in this context is the smallest sum in the solution returned by the algorithm, divided by the smallest sum in the optimal solution.
- For greedy number partitioning, if the numbers are not sorted then the worst-case approximation ratio is 1/k. Sorting the numbers in descending order increases the approximation ratio to 5/6 when k=2, and $(3k-1)/(4k-2)$ in general, and these bounds are tight.
- Woeginger presented a PTAS that attains an approximation factor of $1-\varepsilon$ in time $O(c_\varepsilon n\log k)$, where $c_\varepsilon$ is a huge constant that is exponential in the required approximation factor ε. The algorithm uses Lenstra's algorithm for integer linear programming.
- The FPTAS of Sahni works for this objective too.
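A small instance shows the sorted greedy reaching its worst case for k=2. The sketch below uses hypothetical helper names and a brute-force optimum, so it is only meant for tiny inputs. On {3, 3, 2, 2, 2} the sorted greedy produces sums 7 and 5, while the optimal partition {3, 3}, {2, 2, 2} has both sums equal to 6, giving a smallest-sum ratio of 5/6 (and, for the largest-sum objective, 7/6).

```python
import heapq
from itertools import product


def sorted_greedy_sums(numbers, k):
    """Sorted greedy: take numbers from largest to smallest and add each
    one to the subset whose current sum is smallest."""
    sums = [0] * k  # a list of zeros is already a valid min-heap
    for x in sorted(numbers, reverse=True):
        heapq.heapreplace(sums, sums[0] + x)  # replace the smallest sum
    return sorted(sums)


def best_smallest_sum(numbers, k):
    """Exact maximin value by brute force over all k**n assignments."""
    best = 0
    for assignment in product(range(k), repeat=len(numbers)):
        sums = [0] * k
        for x, i in zip(numbers, assignment):
            sums[i] += x
        best = max(best, min(sums))
    return best


nums = [3, 3, 2, 2, 2]
greedy = sorted_greedy_sums(nums, 2)
print(greedy, min(greedy) / best_smallest_sum(nums, 2))
```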
Maximizing the sum of products
Jin studies a problem in which the goal is to maximize the sum, over every set i in 1,...,k, of the product of numbers in set i. In a more general variant, each set i may have a weight wi, and the goal is to maximize the weighted sum of products. This problem has an exact solution that runs in time O.
A PTAS for general objective functions
Let Ci be the sum of subset i in a given partition. Instead of minimizing the objective function $\max_i C_i$, one can minimize the objective function $\max_i f(C_i)$, where f is any fixed function. Similarly, one can minimize the objective function $\sum_i f(C_i)$, or maximize $\min_i f(C_i)$, or maximize $\sum_i f(C_i)$. Alon, Azar, Woeginger and Yadid presented general PTAS-s for these four problems. Their algorithm works for any f which satisfies the following two conditions:
- A strong continuity condition called Condition F*: for every ε>0 there exists δ>0 such that, if $|y-x|<\delta x$, then $|f(y)-f(x)|<\varepsilon f(x)$.
- Convexity (for the minimization problems) or concavity (for the maximization problems).
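The PTAS for minimizing $\sum_i f(C_i)$ is based on the following combinatorial observations:
- Let L := the average sum in a single subset ($1/k$ of the sum of all inputs). If some input x is at least L, then there is an optimal partition in which one part contains only x. This follows from the convexity of f. Therefore, the input can be pre-processed by assigning each such input to a unique subset. After this preprocessing, one can assume that all inputs are smaller than L.
- There is an optimal partition in which all subset sums are strictly between L/2 and 2L. In particular, the partition minimizing the sum of squares $\sum_i C_i^2$, among all optimal partitions, satisfies these inequalities.
The PTAS uses an input rounding technique. Given the input sequence $S=(v_1,\ldots,v_n)$ and a positive integer d, the rounded sequence $S^\#$ is defined as follows:
- For any $v_j > L/d$, the sequence $S^\#$ contains an input $v_j^\#$, which is $v_j$ rounded up to the next integer multiple of $L/d^2$. Note that $v_j \le v_j^\# < v_j + L/d^2$, and $L/d^2 < v_j/d$, so $v_j^\# < (d+1)v_j/d$.
- In addition, the sequence $S^\#$ contains some inputs equal to $L/d$. The number of these inputs is determined such that the sum of all these new inputs equals the sum of all inputs in S that are at most $L/d$, rounded up to the next integer multiple of $L/d$.
- Let $L^\#$ be the average sum in a single subset of $S^\#$. All inputs in $S^\#$ are integer multiples of $L/d^2$ between $L/d$ and L, so $S^\#$ can be represented as an input vector $n=(n_d,\ldots,n_{d^2})$, where $n_h$ is the number of inputs equal to $h\cdot L/d^2$. Similarly, each subset is represented by a configuration vector $t=(t_d,\ldots,t_{d^2})$ with sum $C(t)=\sum_h t_h\cdot hL/d^2$, and T denotes the set of configurations with $L^\#/2 < C(t) < 2L^\#$.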
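The rounding step can be sketched with exact rational arithmetic. This sketch assumes integer inputs and that the preprocessing has already removed every input of size at least L; `round_instance` is a hypothetical helper name, and it returns $S^\#$ as a plain list rather than in vector form.

```python
import math
from fractions import Fraction


def round_instance(S, k, d):
    """Build the rounded sequence S# (a sketch). Assumes integer inputs,
    each already smaller than L = sum(S)/k (preprocessing done)."""
    L = Fraction(sum(S), k)  # average subset sum
    unit = L / d**2          # granularity L/d^2
    threshold = L / d
    # Large inputs: round up to an integer multiple of L/d^2.
    rounded = [math.ceil(Fraction(v) / unit) * unit for v in S if v > threshold]
    # Small inputs: replace them by copies of L/d whose total equals the
    # small inputs' sum rounded up to an integer multiple of L/d.
    small_total = sum(Fraction(v) for v in S if v <= threshold)
    copies = math.ceil(small_total / threshold)
    return rounded + [threshold] * copies


print(round_instance([5, 4, 3, 3, 2, 1, 1, 1], k=2, d=3))
```

In this example L = 10, so the large inputs 5 and 4 are rounded up to 50/9 and 40/9, and the small inputs (total 11) are replaced by four copies of 10/3.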
There are two different ways to find an optimal solution to S#. One way uses dynamic programming: its run-time is a polynomial whose exponent depends on d. The other way uses Lenstra's algorithm for integer linear programming.
Dynamic programming solution
Define $VAL(k,n)$ as the optimal value of the objective function $\sum_i f(C_i)$, when the input vector is $n$ and it has to be partitioned into k subsets, among all partitions in which all subset sums are strictly between $L^\#/2$ and $2L^\#$ (for a vector u of input counts, $C(u)$ denotes the total size of the corresponding inputs). It can be computed by the following recurrence relation:
- $VAL(0,0)=0$ - since their objective sum is empty.
- $VAL(1,n)=f(C(n))$ if $L^\#/2 < C(n) < 2L^\#$ - since all inputs must be assigned to a single subset, so its sum is $C(n)$.
- $VAL(1,n)=\infty$ otherwise - since we do not consider optimal solutions outside this range.
- $VAL(k,n)=\min_{t\in T,\ t\le n}\left[f(C(t))+VAL(k-1,n-t)\right]$ for all $k\ge 2$: we check all options for the k-th subset, and combine it with an optimal partition of the remainder into k-1 subsets.
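The recurrence can be evaluated directly with memoization. In this sketch the names are hypothetical: `sizes[h]` is the size of the h-th input type (in the rounded instance it would be a multiple of $L/d^2$), and `VAL(0, n)` is additionally set to infinity when inputs remain. Configurations are enumerated by brute force over all t ≤ n, so the sketch illustrates the recurrence, not its polynomial running time.

```python
import math
from functools import lru_cache
from itertools import product


def min_sum_objective(n, sizes, k, f, L_sharp):
    """Evaluate VAL(k, n) from the recurrence (a sketch).

    n[h] is the number of inputs of size sizes[h]; every subset sum must lie
    strictly between L_sharp/2 and 2*L_sharp. Returns math.inf if no such
    partition exists.
    """
    def C(u):  # total size of the inputs counted by vector u
        return sum(c * s for c, s in zip(u, sizes))

    def in_range(u):
        return L_sharp / 2 < C(u) < 2 * L_sharp

    @lru_cache(maxsize=None)
    def VAL(k, n):
        if k == 0:
            return 0 if not any(n) else math.inf  # leftover inputs are invalid
        if k == 1:
            return f(C(n)) if in_range(n) else math.inf
        best = math.inf
        # Try every configuration t <= n (componentwise) for the k-th subset.
        for t in product(*(range(c + 1) for c in n)):
            if in_range(t):
                rest = tuple(a - b for a, b in zip(n, t))
                best = min(best, f(C(t)) + VAL(k - 1, rest))
        return best

    return VAL(k, tuple(n))


# Inputs 1, 1, 2, 3 with f(x) = x**2 and L# = 3.5 (half the total of 7).
print(min_sum_objective((2, 1, 1), (1, 2, 3), 2, lambda x: x * x, 3.5))
```

Here the best admissible partition has sums 3 and 4, so the value is 9 + 16 = 25.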
Integer linear programming solution
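For each vector t in T, introduce a variable $x_t$ denoting the number of subsets with this configuration, and let $C(t)$ be the sum of a subset with configuration t. Minimizing $\sum_i f(C_i)$ can be attained by solving the following ILP:
- Minimize $\sum_{t\in T} x_t\cdot f(C(t))$
- subject to $\sum_{t\in T} x_t = k$ (the total number of subsets)
- and $\sum_{t\in T} x_t\cdot t = n$ (the total vector of inputs)
- and $x_t \ge 0$ is an integer for every $t\in T$.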
Converting the solution from the rounded to the original instance
The following lemmas relate the partitions of the rounded instance S# and the original instance S.
- For every partition of S with sums Ci, there is a partition of S# with sums Ci#, where.
- For every partition of S# with sums Ci#, there is a partition of S with sums Ci, where, and it can be found in time O.
Non-existence of PTAS for some objective functions
In contrast to the above result, if we take $f(x)=2^x$, or $f(x)=(x-1)^2$, then no PTAS for minimizing $\sum_i f(C_i)$ exists unless P=NP. Note that these f are convex, but they do not satisfy Condition F* above. The proof is by reduction from the partition problem.
Exact algorithms
There are exact algorithms that always find the optimal partition. Since the problem is NP-hard, such algorithms might take exponential time in general, but may be practical in certain cases.
- The pseudopolynomial time number partitioning takes $O(n(k-1)m^{k-1})$ memory, where m is the largest number in the input. It is practical only when k=2, or when k=3 and the inputs are small integers.
- The Complete Greedy Algorithm considers all partitions by constructing a k-ary tree. Each level in the tree corresponds to an input number, where the root corresponds to the largest number, the level below to the next-largest number, etc. Each of the k branches corresponds to a different set in which the current number can be put. Traversing the tree in depth-first order requires only $O(n)$ space, but might take $O(k^n)$ time. The runtime can be improved by using a greedy heuristic: in each level, develop first the branch in which the current number is put in the set with the smallest sum. This algorithm first finds the solution found by greedy number partitioning, but then proceeds to look for better solutions.
- The Complete Karmarkar-Karp algorithm considers all partitions by constructing a tree of degree $k!$. Each level corresponds to a pair of k-tuples, and each of the $k!$ branches corresponds to a different way of combining these k-tuples. This algorithm first finds the solution found by the largest differencing method, but then proceeds to find better solutions. For k=2 and k=3, CKK runs substantially faster than CGA on random instances. The advantage of CKK over CGA is much larger in the latter case, and can be of several orders of magnitude. In practice, with k=2, problems of arbitrary size can be solved by CKK if the numbers have at most 12 significant digits; with k=3, at most 6 significant digits. CKK can also run as an anytime algorithm: it finds the KK solution first, and then finds progressively better solutions as time allows. For k ≥ 4, CKK becomes much slower, and CGA performs better.
- Korf, Schreiber and Moffitt presented hybrid algorithms, combining CKK, CGA and other methods from the subset sum problem and the bin packing problem to achieve an even better performance. Their 2018 journal paper summarizes works from several previous conference papers:
- *Recursive Number Partitioning uses CKK for k=2, but for k>2 it recursively splits S into subsets and splits k into halves.
- *Hybrid recursive number partitioning.
- *Improved bin completion.
- *Improved search strategies.
- *Few machines algorithm.
- *Cached iterative weakening.
- *Sequential partitioning.
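A compact sketch of the Complete Greedy Algorithm for the largest-sum objective follows. It is an illustration under stated assumptions, not the implementation from the cited papers: the function name `complete_greedy` is hypothetical, inputs are assumed to be integers (so the average can be rounded up when used as a lower bound for stopping early), and the two pruning rules (abandon branches that cannot beat the best solution so far, and skip subsets whose current sums are equal) are generic branch-and-bound additions.

```python
def complete_greedy(numbers, k):
    """Complete Greedy Algorithm (CGA) for minimizing the largest sum.

    Depth-first search over the k-ary tree: level i places the i-th largest
    number, trying subsets in order of increasing current sum, so the first
    leaf reached is the greedy solution. Assumes integer inputs.
    """
    items = sorted(numbers, reverse=True)
    if not items:
        return 0
    # No partition beats ceil(sum/k) or the largest number.
    lower_bound = max(-(-sum(items) // k), items[0])
    best = float("inf")
    sums = [0] * k

    def dfs(i):
        nonlocal best
        if i == len(items):
            best = min(best, max(sums))
            return
        tried = set()
        for j in sorted(range(k), key=lambda j: sums[j]):
            if sums[j] + items[i] >= best:
                break  # later subsets have even larger sums
            if sums[j] in tried:
                continue  # equal sums give symmetric subtrees
            tried.add(sums[j])
            sums[j] += items[i]
            dfs(i + 1)
            sums[j] -= items[i]
            if best == lower_bound:
                return  # provably optimal, stop searching

    dfs(0)
    return best


print(complete_greedy([8, 7, 6, 5, 4], 2))  # 15
```

On {8, 7, 6, 5, 4} the search first reaches the greedy value 17, then improves it to 16 and finally to 15, where it stops because 15 equals the lower bound.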
Reduction to bin packing
The bin packing problem has many fast solvers. A BP solver can be used to find an optimal number partitioning. The idea is to use binary search to find the optimal makespan. To initialize the binary search, we need a lower bound and an upper bound:
- Some lower bounds on the makespan are (where $s_1\ge s_2\ge\cdots$ are the numbers of S in descending order): $\left(\sum S\right)/k$ - the average value per subset, $s_1$ - the largest number in S, and $s_k+s_{k+1}$ - the size of a bin in the optimal partition of only the largest k+1 numbers.
- Some upper bounds can be attained by running heuristic algorithms, such as the greedy algorithm or KK.
Given a lower and an upper bound, run the BP solver with bin size middle := (lower+upper)/2.
- If the result contains more than k bins, then the optimal makespan must be larger: set lower to middle and repeat.
- If the result contains at most k bins, then the optimal makespan may be smaller: set upper to middle and repeat.
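The binary-search reduction can be sketched as follows, assuming integer inputs so that "set lower to middle" becomes `lower = middle + 1`. The exact depth-first `min_bins` search is a hypothetical stand-in for a real bin-packing solver and is only usable on small inputs; the upper bound comes from the sorted greedy heuristic.

```python
def min_bins(items, capacity):
    """Exact bin packing by depth-first search (exponential; tiny inputs).
    Assumes capacity is at least the largest item."""
    items = sorted(items, reverse=True)
    best = len(items)  # one bin per item is always feasible
    loads = []

    def dfs(i):
        nonlocal best
        if len(loads) >= best:
            return  # cannot beat the best packing found so far
        if i == len(items):
            best = len(loads)
            return
        seen = set()
        for j in range(len(loads)):
            # Bins with equal loads are interchangeable; try only one.
            if loads[j] + items[i] <= capacity and loads[j] not in seen:
                seen.add(loads[j])
                loads[j] += items[i]
                dfs(i + 1)
                loads[j] -= items[i]
        loads.append(items[i])  # open a new bin
        dfs(i + 1)
        loads.pop()

    dfs(0)
    return best


def optimal_makespan(numbers, k):
    """Binary search on the bin size, using min_bins as the BP solver."""
    s = sorted(numbers, reverse=True)
    lower = max(-(-sum(s) // k), s[0])  # ceil(average) and largest number
    if len(s) > k:
        lower = max(lower, s[k - 1] + s[k])  # two of the k+1 largest share a bin
    loads = [0] * k
    for x in s:  # sorted greedy gives a feasible upper bound
        loads[loads.index(min(loads))] += x
    upper = max(loads)
    while lower < upper:
        middle = (lower + upper) // 2
        if min_bins(s, middle) > k:
            lower = middle + 1  # optimal makespan is larger than middle
        else:
            upper = middle  # a partition with makespan <= middle exists
    return lower


print(optimal_makespan([8, 7, 6, 5, 4], 2))  # 15
```

Because the upper bound is always a feasible makespan and the lower bound never exceeds the optimum, the loop keeps the optimum inside [lower, upper] and stops when the two meet.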
Variants
In the balanced number partitioning problem, there are constraints on the number of items that can be allocated to each subset. Another variant is multidimensional number partitioning.