# The k-set problem

The $k$-set problem is a classic open problem in Combinatorial Geometry, which is considered similar in spirit to incidence problems and to distance problems. While in recent years algebraic techniques led to significant progress in incidence and distance problems, so far none of the new methods were applied to the $k$-set problem. This post briefly introduces the problem, and describes a way of reducing it to an incidence problem. I hope that it will get someone to think of the $k$-set problem in an algebraic context and maybe come up with some new observations.

Let ${\cal P}$ be a set of $n$ points in ${\mathbb R}^2$, and let $1\le k\le n/2$ be an integer. A $k$-set of ${\cal P}$ is a subset ${\cal P}'\subset {\cal P}$ of $k$ points such that there exists a line that separates ${\cal P}'$ from ${\cal P}\setminus{\cal P}'$. That is, ${\cal P}'$ is on one open half-plane defined by the line, while ${\cal P}\setminus{\cal P}'$ is on the other half-plane. The following figure depicts two 5-sets of a set of 12 points.

The $k$-set problem asks for the maximum number of $k$-sets that a set of $n$-points in ${\mathbb R}^2$ can have. As a first silly example, note that the maximum number of 1-sets is exactly $n$. It is not difficult to show that we may assume that no three points are collinear (specifically, perturbing the point set cannot significantly decrease the number of $k$-sets).

The number of subsets of $k$ points of ${\cal P}$ is $\Theta(n^k)$, so this is a straightforward upper bound for the maximum number of $k$ sets. However, we can easily obtain a significantly stronger upper bound. Consider a $k$-set ${\cal P}'\subset{\cal P}$ that is separated by a line $\ell$. We translate $\ell$ until it is incident to a point $p\in {\cal P}'$ and then rotate $\ell$ around $p$ until it is incident to a second point of ${\cal P}$ (not necessarily in ${\cal P}'$). For example, see the figure below. Thus, to find all of the $k$-sets of ${\cal P}$ it suffices to consider every line that is incident to two points of ${\cal P}$. Since there are $O(n^2)$ such lines, this is also an upper bound for the number of $k$-sets.

The current best upper bound for the maximum number of $k$-sets is $O(nk^{1/3})$. This bound was derived by Dey about 20 years ago. Similarly to the pre-algebraic era in incidence and distance problems, this bound was obtained by using the crossing lemma. In the other direction, Tóth constructed a set of $n$ points and $ne^{\Omega(\sqrt{\log k})}$ $k$-sets. This result was published in 2001, and the problem seems to have been stuck since then (although there are many later work on variants of the problem).

A dual problem. The $k$-set problem has an interesting dual formulation. Consider only the $k$-sets ${\cal P}' \subset {\cal P}$ that are separated by a line $\ell$ such that ${\cal P}'$ is above the line (the case of $k$-sets below the separating line can be handled symmetrically). Given a point $a=(a_x,a_y)\in {\cal P}$, we define the dual line $a^*$ by the equation $y=a_xx-a_y$. Given a line $\ell\subset {\mathbb R}^2$ defined by $y=bx-c$, we define the dual point as $\ell^* = (b,c)$. Note that $a$ is incident to $\ell$ if and only if $\ell^*$ is incident to $a^*$. Indeed, both cases occur if and only if $c=ba_x-a_y$. By replacing the equality with $\ge$, we get that $a$ is above $\ell$ if and only if $\ell^*$ is above $a^*$.

Consider the set of lines

${\cal L} = \left\{ p^* \ :\ p\in {\cal P} \right\}.$

Let $\ell$ be a line that has a $k$-set ${\cal P}'\subset {\cal P}$ above it and ${\cal P}\setminus{\cal P}'$ below it. Let ${\cal P}^*$ be the set of lines dual to the points of ${\cal P}'$. Then the point $\ell^*$ is above the lines of ${\cal P}^*$ and below the lines of ${\cal L}\setminus{\cal P}^*$. That is, a point of ${\mathbb R}^2$ is dual to a line that separates a $k$-set if and only if it has exactly $k$ lines of ${\cal L}$ below it.

As stated above, it suffices to consider $k$-sets where the separating line is incident to at least two points of ${\cal P}$. The dual statement says that it suffices to bound the points that are incident to at least two lines of ${\cal L}$. That is, the $k$-set problem was reduced to the following.

Problem 1. Let ${\cal L}$ be a set of $n$ lines in ${\mathbb R}^2$. Find an upper bound for the number of points of ${\mathbb R}^2$ that are incident to at least two lines of ${\cal L}$ and have exactly $k-1$ or $k-2$ lines of ${\cal L}$ above them.

Problem 1 asks for the complexity of the $k$‘th level in an arrangement of $n$ lines.

Algebraic techniques. How can the $k$-set problem be studied using algebraic techniques? I will now present one such connection that I recently noticed. I suspect that stronger and more “natural” connections might exist, so I hope that you will ignore the following and try to come up with a better approach.

In a recent paper by Fox, Pach, Suk, Zahl, and myself, we studied incidences with semi-algebraic objects. The following is a simple corollary of a theorem from that paper.

Corollary 2. Let ${\cal P}$ be a set of $m$ points in the plane and let $\Pi$ be a set of $n$ open half-planes, both in ${\mathbb R}^2$. Let $k$ be a positive integer (which may depend on $m$ and $n$) such that the incidence graph of ${\cal P}\times \Pi$ contains no copy of $K_{2,k}$. Then

$I({\cal P},\Pi) = O(m^{2/3}n^{2/3}k^{1/3}+n+mk).$

We can reduce the $k$-set problem to a similar incidence problem. Recall that we are given a set ${\cal P}$ of $n$ points in ${\mathbb R}^2$ and can think of every $k$-set as an open half-plane that contains exactly $k$ points. The intersection of two such half-planes contains at most $k-1$ common points of ${\cal P}$. That is, the number of $k$-sets is upper bounded by the number of $k$-rich half-planes when assuming that the incidence graph contains no copy of $K_{k,2}$. By a duality argument as described above, we get to the problem of bounding the number of $k$-rich points when there is no $K_{2,k}$ in the incidence graph.

While Corollary 2 provides a nice bound for incidences with half-planes when no $K_{2,k}$ exists, I could not obtain any non-trivial bound for the number of $k$-rich points. Joshua Zahl pointed out one reason for why this is difficult to do: The number of $(k-1)$-rich points can be infinite, and our current incidence tools are not delicate enough to distinguish between $k$-rich and $(k-1)$-rich points. Moreover, it is possible that the two problems are not asymptotically equivalent, and the maximum number of rich points is larger than the maximum number of $k$-sets. If you have additional observations concerning this problem, I will be happy to hear about them.