Computational Complexity

Big picture #

Many problems in math, programming, and life have the form Given an input of variable size, return an output satisfying some condition. For example:

For each of these problems, CCT asks how the computational resources needed to solve the problem scale asymptotically with $n$. Does finding a solution take exponentially many CPU cycles, or merely polynomially many? How about finding a solution in finite time using order $\log n$ memory locations? What about a quantum computer solving the problem in polynomially many cycles with a probability of error below some cutoff?

A computational constraint defines a complexity class, the set of all problems that you can solve under that constraint. Hence the class $\textsf P$ is the set of all problems that you can solve if you’re willing to run your computer for only polynomially many steps. Another famous complexity class $\textsf{NP}$ is the set of all problems for which you can check a solution in polynomially many time steps. That is, if I give you an ostensible solution to any problem in $\textsf {NP}$, you can dismiss it as phoney or confirm that it’s valid in $O(n^k)$ steps for some finite $k$.

Why is computational complexity theory such a mess? #

By its own experts’ admission, computational complexity theory is a zoo. Hundreds of distinct complexity classes have been treated in the academic literature, and the containments among them are still quite poorly understood. Just go to this interactive explorer, click on a typical complexity class, and observe how many of the other classes turn gray, meaning that we don’t know whether they’re harder or easier than problems in the selected class. And to make matters worse, it’s unlikely that the zoo is complete. We almost certainly haven’t identified all of the complexity classes of practical or theoretical interest, let alone mapped out how they relate to all presently known complexity classes.

What would it take for CCT to turn from a zoo into a well organized library? Ambitiously, we would like to solve the meta-complexity problem: Is there an efficient algorithm that takes a problem $P$ and a complexity class $C$ and returns true iff $P$ is in $C$? We hope that there does exist such an algorithm, because to find it would be to solve CCT. But we might be unlucky enough to live in a world where the algorithm doesn’t exist, and there is no general efficient procedure for answering questions about computational complexity. In that case, the most CCT can hope to be is a giant encyclopedia of problems whose complexities happen to be easily characterizable. We would remain unable to say anything nontrivial about the complexities of most conceivable problems. This would be a disappointing outcome.

Do we have any good reason to expect a happier outcome? It depends partly on how natural or artificial we find questions about computational complexity. Maybe the computational complexity classes that we’ve defined don’t carve the space of all problems anywhere near its joints. Maybe Turing machines (and by extension, computers) are not natural kinds. Then we would hardly expect there to be elegant, fully general answers to questions about computational complexity.

Is everything in $\textsf{P}$ really tractable? #

It’s often assumed that everything solvable in polynomial time is tractable—ie, if a problem is in $\textsf{P}$, we can actually solve it in practice. There’s no ex ante reason why this assumption should hold. If the best possible algorithm to solve some problem runs in $O(n^{9000}),$ the problem technically belongs in $\textsf{P}$, but we don’t have a prayer of solving it for full-size inputs. Yet empirically, it turns out that most non-contrived problems we can do in polynomial time can be done in linear or quadratic time. (1)The first person to tell me this was David Quarel.

For example, take boolean satisfiability. 1-SAT and 2-SAT are both solvable in linear time, and after that, 3-SAT and above are all $\textsf{NP}$-hard. You can color the nodes of a 2-colorable graph in linear time, but if it’s only guraranteed to be k-colorable for some higher k, that’s $\textsf{NP}$-hard. If you have a graph, and you’d like to find a subset of its nodes that don’t touch each other, you can do that in linear time unless the maximum degree is three or above, in which case it’s $\textsf{NP}$-hard. If you have two strings, and you’d like to find the minimum number of one-character insertions and deletions needed to make them match, that can be done in quadratic time. But if you want to know the number of block insertions and deletions needed to make them match, that’s $\textsf{NP}$-hard again. The stylized fact coming out of all these examples is that if you can do it in $O(n^k)$ for any $k$, you can usually do it for reasonable $k$.

How weird or surprising is this stylized fact? Maybe not very if we consider what programming is like in real life. How often does it happen that you can only solve a problem with twenty-one nested loops, but you never have to iterate over subsets of your input? Pretty much never—at least in my experience. Either you can write a simpler program with less nesting, or else the complexity is exponential, and you should just give up.

We might suspect, though, that this is more a selection effect than it is a deep fact about logic. Maybe there are problems out there that can’t be done any faster than $O(n^{21})$ but we mistake them for $\textsf{NP}$-hard, or maybe they’re so convoluted to pose that no one has thought to study them yet.

Questions #

Reading #

Last updated 12 January 2025