We describe in some detail ﬁve main techniques that have emerged for generalized inter- section searching over the past few years. Brieﬂy, these include: an approach based on a geometric transformation, an approach based on generating a sparse representation of the input, an approach based on persistent data structures, a generic method that is applicable to any reporting problem, and an approach for searching on a subset of the input satisfying a speciﬁed range restriction. We illustrate each method with examples.
A Transformation-Based Approach
We ﬁrst illustrate a transformation-based approach for the reporting and counting problems, which converts the original generalized reporting/counting problem to an instance of a related standard reporting/counting problem on which eﬃcient known solutions can be brought to bear. We illustrate this approach by considering the generalized 1-dimensional range searching problem. Let S be a set of n colored points on the x-axis. We show how to preprocess S so that for any query interval q, we can solve eﬃciently the dynamic reporting problem, the static and dynamic counting problems, and the static type-2 counting problem. The solutions for the dynamic reporting problem and the static and dynamic counting problems are from . The type-2 counting solution is from .
We ﬁrst describe the transformation. For each color c, we sort the distinct points of that color by increasing x-coordinate. For each point p of color c, let pred (p) be its predecessor of color c in the sorted order; for the leftmost point of color c, we take the predecessor to be the point −∞. We then map p to the point pl = (p, pred (p)) in the plane and associate with it the color c. Let Sl be the resulting set of points. Given a query interval q = [l, r], we map it to the grounded rectangle ql = [l, r] × (−∞, l).
LEMMA 64.1 There is a point of color c in S that is in q = [l, r] if and only if there is a point of color c in Sl that is in ql = [l, r] × (−∞, l). Moreover, if there is a point of color c in ql, then this point is unique.
Proof Let pl be a c-colored point in ql, where pl = (p, pred (p)) for some c-colored point p ∈ S. Since pl is in [l, r] × (−∞, l), it is clear that l ≤ p ≤ r and so p ∈ [l, r].
For the converse, let p be the leftmost point of color c in [l, r]. Thus l ≤ p ≤ r and since pred (p) /∈ [l, r], we have l > pred (p). It follows that pl = (p, pred (p)) is in [l, r] × (−∞, l).
We prove that pl is the only point of color c in ql. Suppose for a contradiction that tl = (t, pred (t)) is another point of color c in ql. Thus we have l ≤ t ≤ r. Since t > p, we also have pred (t) ≥ p ≥ l. Thus tl = (t, pred (t)) cannot lie in ql—a contradiction.
Lemma 64.1 implies that we can solve the generalized 1-dimensional range reporting (resp., counting) problem by simply reporting the points in ql (resp., counting the number of points in ql), without regard to colors. In other words, we have reduced the generalized reporting (resp., counting) problem to the standard grounded range reporting (resp., count- ing) problem in two dimensions. In the dynamic case, we also need to update Sl when S is updated. We discuss these issues in more detail below.
The Dynamic Reporting Problem
Our data structure consists of the following: For each color c, we maintain a balanced binary search tree, Tc, in which the c-colored points of S are stored in increasing x-order. We maintain the colors themselves in a balanced search tree CT , and store with each color c in CT a pointer to Tc. We also store the points of Sl in a balanced priority search tree (PST ) . (Recall that a PST on m points occupies O(m) space, supports insertions and deletions in O(log m) time, and can be used to report the k points lying inside a grounded query rectangle in O(log m + k) time . Although this query is designed for query ranges of the form [l, r] × (−∞, l], it can be trivially modiﬁed to ignore the points on the upper edge of the range without aﬀecting its performance.) Clearly, the space used by the entire data structure is O(n), where n = |S|.
To answer a query q = [l, r], we simply query the PST with ql = [l, r] × (−∞, l) and report the colors of the points found. Correctness follows from Lemma 64.1. The query time is O(log n + k), where k is the number of points inside ql. By Lemma 64.1, k = i, and so the query time is O(log n + i).
Suppose that a c-colored point p is to be inserted into S. If c /∈ CT , then we create a tree Tc containing p, insert pl = (p, −∞) into the PST , and insert c, with a pointer to Tc, into CT . Suppose that c ∈ CT . Let u be the successor of p in Tc. If u exists, then we set pred (p) to pred (u) and pred (u) to p; otherwise, we set pred (p) to the rightmost point in Tc. We then insert p into Tc, pl = (p, pred (p)) into the PST , delete the old ul from the PST , and insert the new ul into it.
Deletion of a point p of color c is essentially the reverse. We delete p from Tc. Then we delete pl from the PST and if p had a successor, u, in Tc then we reset pred (u) to pred (p), delete the old ul from the PST , and insert the new one. If Tc becomes empty in the process, then we delete c from CT . Clearly, the update operations are correct and take O(log n) time.
THEOREM 64.1 Let S be a set of n colored points on the real line. S can be preprocessed into a data structure of size O(n) such that the i distinct colors of the points of S that are contained in any query interval can be reported in O(log n+i) time and points can be inserted and deleted online in S in O(log n) time.
For the static reporting problem, we can dispense with CT and the Tc’s and simply use a static form of the PST to answer queries. This provides a simple O(n)-space, O(log n + i)- query time alternative to another solution given in .
The static counting problem
We store the points of Sl in non-decreasing x-order at the leaves of a balanced binary search tree, T , and store at each internal node t of T an array At containing the points in t’s subtree in non-decreasing y-order. The total space is clearly O(n log n). To answer a query, we determine O(log n) canonical nodes v in T such that the query interval [l, r] covers v’s range but not the range of v’s parent. Using binary search we determine in each canonical node’s array the highest array position containing an entry less than l (and thus the number of points in that node’s subtree that lie in ql) and add up the positions thus found at all canonical nodes. The correctness of this algorithm follows from Lemma 64.1.
The total query time is O(log2 n).
We can reduce the query time to O(log n) as follows: At each node t we create a linked list, Bt, which contains the same elements as At and maintain a pointer from each entry of Bt to the same entry in At. We then apply the technique of fractional cascading  to the B-lists, so that after an initial O(log n)-time binary search in the B-list of the root, the correct positions in the B-lists of all the canonical nodes can be found directly in O(log n) total time. (To facilitate binary search in the root’s B-list, we build a balanced search tree on it after the fractional cascading step.) Once the position in a B-list is known, the appropriate position in the corresponding A-array can be found in O(1) time.
It is possible to reduce the space slightly (to O(n log n/ log log n)) at the expense of a larger query time (O(log2 n/ log log n)), by partitioning the points of Sl recursively into horizontal strips of a certain size and doing binary search, augmented with fractional cascading, within the strips. Details can be found in .
THEOREM 64.2 Let S be a set of n colored points on the real line. S can be preprocessed into a data structure of size O(n log n) (resp., O(n log n/ log log n)) such that the number of distinctly-colored points of S that are contained in any query interval can be determined in O(log n) (resp., O(log2 n/ log log n)) time.
The dynamic counting problem
We store the points of Sl using the same basic two-level tree structure as in the ﬁrst solution for the static counting problem. However, T is now a BB (α) tree  and the auxiliary structure, D(t), at each node t of T is a balanced binary search tree where the points are stored at the leaves in left to right order by non-decreasing y-coordinate. To facilitate the querying, each node v of D(t) stores a count of the number of points in its subtree. Given a real number, l, we can determine in O(log n) time the number of points in D(t) that have y-coordinate less than l by searching for l in D(t) and adding up the count for each node of D(t) that is not on the search path but is the left child of a node on the path. It should be clear that D(t) can be maintained in O(log n) time under updates.
In addition to the two-level structure, we also use the trees Tc and the tree CT , described previously, to maintain the correspondence between S and Sl. We omit further discussion about the maintenance of these trees.
Queries are answered as in the static case, except that at each auxiliary structure we use the above-mentioned method to determine the number of points with y-coordinate less than l. Thus the query time is O(log2 n). (We cannot use fractional cascading here.)
Insertion/deletion of a point is done using the worst-case updating strategy for BB (α) trees, and take O(log2 n) time.
THEOREM 64.3 Let S be a set of n colored points on the real line. S can be preprocessed into a data structure of size O(n log n) such that the number of distinctly-colored points of S that are contained in any query interval can be determined in O(log2 n) time and points can be inserted and deleted online in S in O(log2 n) worst-case time.
The static type-2 problem
We wish to preprocess a set S of n colored points on the x-axis, so that for each color intersected by a query interval q = [l, r], the number of points of that color in q can be reported eﬃciently. The solution for this problem originally proposed in  takes O(n log n) space and supports queries in O(log n + i) time. The space bound was improved to O(n) in , as follows.
The solution consists of two priority search trees, PST 1 and PST 2. PST 1 is similar to the priority search tree built on Sl in the solution for the dynamic reporting problem, with an additional count stored at each node. Let pl = (p, pred (p)) be the point that is stored at a node in PST 1 and c the color of p. Then at this node, we store an additional number t1(pl), which is the number of points of color c to the right of p.
PST 2 is based on a transformation that is symmetric to the one used for PST 1. For each color c, we sort the distinct points of that color by increasing x-coordinate. For each point p of color c, let next (p) be its successor in the sorted order; for the rightmost point of color c, we take the successor to be the point +∞. We then map p to the point pll = (p, next (p)) in the plane and associate with it the color c. Let Sll be the resulting set of points. We build PST 2 on Sll, with an additional count stored at each node. Let pll = (p, next(p)) be the point that is stored at a node in PST 2 and c the color of p. Then at this node, we store an additional number t2(pll), which is the number of points of color c to the right of next (p).
We also maintain an auxiliary array A of size n. Given a query q = [l, r], we query PST 1 with ql = [l, r] × (−∞, l) and for each color c found, we set A[c] = t1(pl), where pl is the point stored at the node where we found c. Then we query PST 2 with qll = [l, r] × (r, +∞) and for each color c found, we report c and A[c] − t2(pll), where pll is the point stored at the node where we found c. This works because the queries on PST 1 and PST 2 eﬀectively ﬁnd the leftmost and rightmost points of color c in q = [l, r] (cf. Thus, A[c] − t2(pll) gives the number of points of color c in q.
proof of Lemma 64.1).
THEOREM 64.4 A set S of n colored points on the real line can be preprocessed into a data structure of size O(n) such that for any query interval, a type-2 counting query can be answered in O(log n + i) time, where i is the output size.
A Sparsiﬁcation-Based Approach
The idea behind this approach is to generate from the given set, S, of colored objects a colored set, Sl—possibly consisting of diﬀerent objects than those in S—such that a query object q intersects an object in S if and only if it intersects at most a constant number of objects in Sl. This allows us to use a solution to a standard problem on Sl to solve the generalized reporting problem on S. (In the case of a generalized counting problem, the requirement is more stringent: exactly one object in Sl must be intersected.) We illustrate this method with the generalized halfspace range searching problem in Rd, d = 2, 3.
Generalized halfspace range searching in R2 and R3
Let S be a set of n colored points in Rd, d = 2, 3. We show how to preprocess S so that for any query hyperplane Q, the i distinct colors of the points lying in the closed halfspace Q− (i.e., below Q) can be reported or counted eﬃciently. Without loss of generality, we may assume that Q is non-vertical since vertical queries are easy to handle. The approach described here is from .
We denote the coordinate directions by x1, x2,…, xd. Let F denote the well-known point-hyperplane duality transform : If p = (p1,…, pd) is a point in Rd, then F (p) is the hyperplane xd = p1x1 + ··· + pd−1xd−1 − pd. If H : xd = a1x1 + ··· + ad−1xd−1 + ad is a (non-vertical) hyperplane in Rd, then F (H) is the point (a1,... , ad −1, −ad). It is easily veriﬁed that p is above (resp. on, below) H, in the xd -direction, if and only if F (p) is below (resp. on, above) F (H). Note also that F (F (p)) = p and F (F (H)) = H.
Using F we map S to a set Sl of hyperplanes and map Q to the point q = F (Q), both in Rd. Our problem is now equivalent to: “Report or count the i distinct colors of the hyperplanes lying on or above q, i.e., the hyperplanes that are intersected by the vertical ray r emanating upwards from q.”
Let Sc be the set of hyperplanes of color c. For each color c, we compute the upper envelope Ec of the hyperplanes in Sc. Ec is the locus of the points of Sc of maximum xd-coordinate for each point on the plane xd = 0. Ec is a d-dimensional convex polytope which is unbounded in the positive xd-direction. Its boundary is composed of j-faces, 0 ≤ j ≤ d−1, where each j–face is a j-dimensional convex polytope. Of particular interest to us are the (d − 1)-faces of Ec, called facets. For instance, in R2, Ec is an unbounded convex chain and its facets are line segments; in R3, Ec is an unbounded convex polytope whose facets are convex polygons.
Let us assume that r is well-behaved in the sense that for no color c does r intersect two or more facets of Ec at a common boundary—for instance, a vertex in R2 and an edge or a vertex in R3. (This assumption can be removed; details can be found in .) Then, by deﬁnition of the upper envelope, it follows that (i) r intersects a c-colored hyperplane if and only if r intersects Ec and, moreover, (ii) if r intersects Ec, then r intersects a unique facet of Ec (in the interior of the facet). Let E be the collection of the envelopes of the diﬀerent colors. By the above discussion, our problem is equivalent to: “Report or count the facets of E that are intersected by r”, which is a standard intersection searching problem. We will show how to solve eﬃciently this ray-envelope intersection problem in R2 and in R3. This approach does not give an eﬃcient solution to the generalized halfspace searching problem in Rd for d> 3; for this case, we will give a diﬀerent solution in Section 64.3.4.
To solve the ray–envelope intersection problem in R2, we project the endpoints of the line segments of E on the x-axis, thus partitioning it into 2n + 1 elementary intervals (some of which may be empty). We build a segment tree T which stores these elementary intervals at the leaves. Let v be any node of T . We associate with v an x-interval I(v), which is the union of the elementary intervals stored at the leaves in v’s subtree. Let Strip(v) be the vertical strip deﬁned by I(v). We say that a segment s ∈E is allocated to a node v ∈ T if and only if I(v) /= ∅ and s crosses Strip(v) but not Strip(parent (v)). Let E(v) be the set of segments allocated to v. Within Strip(v), the segments of E(v) can be viewed as lines since they cross Strip(v) completely. Let E l(v) be the set of points dual to these lines. We store El(v) in an instance D(v) of the standard halfplane reporting (resp. counting) structure for R2 given in  (resp. ). This structure uses O(m) space and has a query time of O(log m + kv ) (resp. O(m1/2 )), where m = |E (v)| and kv is the output size at v.
To answer a query, we search in T using q’s x-coordinate. At each node v visited, we need to report or count the lines intersected by r. But, by duality, this is equivalent to answering, in R2, a halfplane query at v using the query F (q)− = Q−, which we do using D(v). For the reporting problem, we simply output what is returned by the query at each visited node; for the counting problem, we return the sum of the counts obtained at the visited nodes.
THEOREM 64.5 A set S of n colored points in R2 can be stored in a data structure of size O(n log n) so that the i distinct colors of the points contained in any query halfplane can be reported (resp. counted) in time O(log2 n + i) (resp. O(n1/2)).
Proof Correctness follows from the preceding discussion. As noted earlier, there are O(|Sc |) line segments (facets) in Ec; thus |E | = O(),c |Sc|) = O(n) and so |T | = O(n).
Hence each segment of E can get allocated to O(log n) nodes of T . Since the structure D(v) has size linear in m = |E (v)|, the total space used is O(n log n). For the reporting problem, the query time at a node v is O(log m + kv ) = O(log n + kv ). When summed over the O(log n) nodes visited, this gives O(log2 n + i). To see this, recall that the ray r can intersect at most one envelope segment of any color; thus the terms kv , taken over all nodes
v visited, sum to i.
For the counting problem, the query time at v is O(m1/2 ). It can be shown that if v has
depth j in T , then m = |E (v)| = O(n/2j ).
(See, for instance, [12, page 675].)
overall query time is O(),O(log n)(n/2j )1/2), which is O(n1/2).
In R3, the approach is similar, but more complex. Our goal is to solve the ray–envelope intersection problem in R3. As shown in , this problem can be reduced to certain standard halfspace range queries in R3 on a set of triangles (obtained by triangulating the Ec’s.) This problem can be solved by building a segment tree on the x-spans of the triangles projected to the xy-plane and augmenting each node of this tree with a data structure based on partition trees  or cutting trees  to answer the halfplane queries. Details may be found in .
THEOREM 64.6 The reporting version of the generalized halfspace range searching problem for a set of n colored points in R3 can be solved in O(n log2 n) (resp. O(n2+E)) space and O(n1/2+E + i) (resp. O(log2 n + i)) query time, where i is the output size and E > 0 is an arbitrarily small constant. The counting version is solvable in O(n log n) space and O(n2/3+E ) query time.
Additional examples of the sparsiﬁcation-based approach may be found in . (An example also appears in the next section, enroute to a persistence-based solution of a generalized problem.)
A Persistence-Based Approach
Roughly speaking, we use persistence as follows: To solve a given generalized problem we ﬁrst identify a diﬀerent, but simpler, generalized problem and devise a data structure for it that also supports updates (usually just insertions). We then make this structure partially persistent  and query this persistent structure appropriately to solve the original problem.
We illustrate this approach for the generalized 3-dimensional range searching problem, where we are required to preprocess a set, S, of n colored points in R3 so that for any query box q = [a, b] × [c, d] × [e, f ] the i distinct colors of the points inside q can be reported eﬃciently. We ﬁrst show how to build a semi-dynamic (i.e., insertions-only) data structure for the generalized versions of the quadrant searching and 2-dimensional range searching problems. These two structures will be the building blocks of our solution for the 2- dimensional problem.
Generalized semi-dynamic quadrant searching
Let S be a set of n colored points in the plane. For any point q = (a, b), the northeast quadrant of q, denoted by NE (q), is the set of all points (x, y) in the plane such that x ≥ a and y ≥ b. We show how to preprocess S so that for any query point q, the distinct colors of the points of S contained in NE (q) can be reported, and how points can be inserted into S. The data structure uses O(n) space, has a query time of O(log2 n + i), and an amortized insertion time of O(log n). This solution is based on the sparsiﬁcation approach described previously.
For each color c, we determine the c-maximal points. (A point p is called c–maximal if it has color c and there are no points of color c in p’s northeast quadrant.) We discard all points of color c that are not c-maximal. In the resulting set, let the predecessor, pred (p), of a c-colored point p be the c-colored point that lies immediately to the left of p. (For the leftmost point of color c, the predecessor is the point (−∞, ∞).) With each point p = (a, b), we associate the horizontal segment with endpoints (al, b) and (a, b), where al is the x-coordinate of pred (p). This segment gets the same color as p. Let Sc be the set of such segments of color c. The data structure consists of two parts, as follows.
The ﬁrst part is a structure T storing the segments in the sets Sc, where c runs over all colors. T supports the following query: given a point q in the plane, report the segments that are intersected by the upward-vertical ray starting at q. Moreover, it allows segments to be inserted and deleted. We implement T as the structure given in . This structure uses O(n) space, supports insertions and deletions in O(log n) time, and has a query time of O(log2 n + l), where l is the number of segments intersected.
The second part is a balanced search tree CT , storing all colors. For each color c, we maintain a balanced search tree, Tc, storing the segments of Sc by increasing y-coordinate. This structure allows us to dynamically maintain Sc when a new c-colored point p is inserted. The general approach (omitting some special cases; see ) is as follows: By doing a binary search in Tc we can determine whether or not p is c-maximal in the current set of c-maximal points, i.e., the set of right endpoints of the segments of Sc. If p is not c-maximal, then we simply discard it. If p is c-maximal, then let s1,…, sk be the segments of Sc whose left endpoints are in the southwest quadrant of p. We do the following: (i) delete s2,... , sk from Tc; (ii) insert into Tc the horizontal segment which starts at p and extends leftwards upto the x-coordinate of the left endpoint of sk ; and (iii) truncate the segment s1 by keeping only the part of it that extends leftwards upto p’s x-coordinate. The entire operation can be done in O(log n + k) time.
Let us now consider how to answer a quadrant query, NE (q), and how to insert a point into S. To answer NE (q), we query T with the upward-vertical ray from q and report the colors of the segments intersected. The correctness of this algorithm follows from the easily proved facts that (i) a c-colored point lies in NE (q) if and only if a c-maximal point lies in NE (q) and (ii) if a c-maximal point is in NE (q), then the upward-vertical ray from q must intersect a segment of Sc. The correctness of T guarantees that only the segments intersected by this ray are reported. Since the query can intersect at most two segments in any Sc, we have l ≤ 2i, and so the query time is O(log2 n + i).
Let p be a c-colored point that is to be inserted into S. If c is not in CT , then we insert it into CT and insert the horizontal, leftward-directed ray emanating from p into a new structure Tc. If c is present already, then we update Tc as just described. In both cases, we then perform the same updates on T . Hence, an insertion takes O((k + 1) log n) time.
What is the total time for n insertions into an initially empty set S? For each insertion, we can charge the O(log n) time to delete a segment si, 2 ≤ i ≤ k, to si itself. Notice that none of these segments will reappear. Thus each segment is charged at most once. Moreover, each of these segments has some previously inserted point as a right endpoint. It follows that the number of segments existing over the entire sequence of insertions is O(n) and so the total charge to them is O(n log n). The rest of the cost for each insertion (O(log n) for the binary search plus O(1) for steps (ii) and (iii)) we charge to p itself. Since any p is charged in this mode only once, the total charge incurred in this mode by all the inserted points is O(n log n). Thus the time for n insertions is O(n log n), which implies an amortized insertion time of O(log n).
LEMMA 64.2 Let S be a set of n colored points in the plane. There exists a data structure of size O(n) such that for any query point q, we can report the i distinct colors of the points that are contained in the northeast quadrant of q in O(log2 n+ i) time. Moreover, if we do n insertions into an initially-empty set then the amortized insertion time is O(log n).
Generalized semidynamic 2-dimensional range searching
Our goal here is to preprocess a set S of n colored points in the plane so that for any axes- parallel query rectangle q = [a, b] × [c, d], we can solve the semi-dynamic reporting problem eﬃciently.
Our solution is based on the quadrant reporting structure of Lemma 64.2. We ﬁrst show how to solve the problem for query rectangles ql = [a, b] × [c, ∞). We store the points of S in sorted order by x-coordinate at the leaves of a BB (α) tree T l. At each internal node v, we store an instance of the structure of Lemma 64.2 for NE -queries (resp., NW -queries) built on the points in v’s left (resp., right) subtree. Let X(v) denote the average of the x-coordinate in the rightmost leaf in v’s left subtree and the x-coordinate in the leftmost leaf of v’s right subtree; for a leaf v, we take X(v) to be the x-coordinate of the point stored at v.
To answer a query ql, we do a binary search down T l, using [a, b], until either the search runs oﬀ T l or a (highest) node v is reached such that [a, b] intersects X(v). In the former case, we stop. In the latter case, if v is a leaf, then if v’s point is in ql we report its color.
If v is a non-leaf, then we query the structures at v using the NE -quadrant and the NW – quadrant derived from ql (i.e., the quadrants with corners at (a, c) and (b, c), respectively), and then combine the answers. Updates on T l are performed using the amortized-case updating strategy for BB (α) trees . The correctness of the method should be clear.
The space and query time bounds follow from Lemma 64.2. Since the amortized insertion time of the quadrant searching structure is O(log n), the insertion in the BB (α) tree takes amortized time O(log2 n) .
To solve the problem for general query rectangles q = [a, b] × [c, d], we use the above approach again, except that we store the points in the tree by sorted y-coordinates. At each internal node v, we store an instance of the data structure above to answer queries of
the form [a, b] × [c, ∞) (resp. [a, b] × (−∞, d]) on the points in v’s left (resp. right) subtree.
The query strategy is similar to the previous one, except that we use the interval [c, d] to search in the tree. The query time is as before, while the space and update times increase by a logarithmic factor.
LEMMA 64.3 Let S be a set of n colored points in the plane. There exists a data structure of size O(n log2 n) such that for any query rectangle [a, b] × [c, d], we can report the i distinct colors of the points that are contained in it in O(log2 n + i) time. Moreover, points can be inserted into this data structure in O(log3 n) amortized time.
Generalized 3-dimensional range searching
The semi-dynamic structure of Lemma 64.3 coupled with persistence allows us to go up one dimension and solve the original problem of interest: Preprocess a set S of n colored points in R3 so that for any query box q = [a, b] × [c, d] × [e, f ] the i distinct colors of the points inside q can be reported eﬃciently.
First consider queries of the form ql = [a, b] × [c, d] × [e, ∞). We sort the points of S by non-increasing z-coordinates, and insert them in this order into a partially persistent version of the structure of Lemma 64.3, taking only the ﬁrst two coordinates into account. To answer ql, we access the version corresponding to the smallest z-coordinate greater than or equal to e and query it with [a, b] × [c, d].
To see that the query algorithm is correct, observe that the version accessed contains the projections on the xy-plane of exactly those points of S whose z-coordinate is at least e.
Lemma 64.3 then guarantees that among these only the distinct colors of the ones in [a, b] × [c, d] are reported. These are precisely the distinct colors of the points contained in [a, b] × [c, d] × [e, ∞). The query time follows from Lemma 64.3. To analyze the space requirement, we note that the structure of Lemma 64.3 satisﬁes the conditions given in . Speciﬁcally, it is a pointer-based structure, where each node is pointed to by only O(1) other nodes. As shown in , any modiﬁcation made by a persistent update operation on such a structure adds only O(1) amortized space to the resulting persistent structure. By Lemma 64.3, the total time for creating the persistent structure, via insertions, is O(n log3 n). This implies the same bound for the number of modiﬁcations in the structure, so the total space is O(n log3 n).
To solve the problem for general query boxes q = [a, b] × [c, d] × [e, f ], we follow an approach similar to that described for the 2-dimensional case: We store the points in a
balanced binary search tree, sorted by z-coordinates. We associate with each internal node in the tree the auxiliary structure described above for answering queries of the form [a, b] × [c, d] × [e, ∞) (resp. [a, b] × [c, d] × (−∞,f ]) on the points in v’s left (resp. right)
subtree. (Note that since we do not need to do updates here the tree need not be a BB (α) tree.) Queries are done by searching down the tree using the interval [e, f ]. The query time is as before, but the space increases by a logarithmic factor.
THEOREM 64.7 Let S be a set of n colored points in 3-space. S can be stored in a data structure of size O(n log4 n) such that for any query box [a, b] × [c, d] × [e, f ], we can report the i distinct colors of the points that are contained in it in O(log2 n + i) time.
Additional applications of the persistence-based approach to generalized intersection problems can be found in [18, 19, 21].
We describe a general method from  for solving any generalized reporting problem given a data structure for a “related” standard decision problem.
Let S be a set of n colored geometric objects and let q be any query object. In preprocessing, we store the distinct colors in S at the leaves of a balanced binary tree CT (in no particular order). For any node v of CT , let C(v) be the set of colors stored in the leaves of v’s subtree and let S(v) be the set of those objects of S colored with the colors in C(v). At v, we store a data structure DEC (v) to solve the following standard decision problem on S(v): “Decide whether or not q intersects any object of S(v).” DEC (v) returns “true” if and only if there is an intersection.
To answer a generalized reporting query on S, we do a depth-ﬁrst search in CT and query DEC (v) with q at each node v visited. If v is a non-leaf node then we continue searching below v if and only if the query returns “true”; if v is a leaf, then we output the color stored
there if and only if the query returns “true”.
THEOREM 64.8 Assume that a set of n geometric objects can be stored in a data structure of size M (n) such that it can be decided in f (n) time whether or not a query object intersects any of the n objects. Assume that M (n)/n and f (n) are non-decreasing functions for non-negative values of n. Then a set S of n colored geometric objects can be preprocessed into a data structure of size O(M (n) log n) such that the i distinct colors of the objects in S that are intersected by a query object q can be reported in time O(f (n) + i · f (n) log n).
Proof We argue that a color c is reported if and only if there is a c-colored object in S intersecting q. Suppose that c is reported. This implies that a leaf v is reached in the search such that v stores c and the query on DEC (v) returns “true”. Thus, some object in S(v) intersects q. Since v is a leaf, all objects in S(v) have the same color c and the claim follows.
For the converse, suppose that q intersects a c-colored object p. Let v be the leaf storing c. Thus, p ∈ S(vl) for every node vl on the root-to-v path in CT . Thus, for each vl, the query on DEC (vl) will return “true”, which implies that v will be visited and c will be output.
are the nodes at any level, then the total space used by CT at that level is
i=1 M (|S(vi)|) =
i=1 |S(vi)|· (M (|S(vi)|)/|S(vi)|) ≤ ),
|S(vi)|· (M (n)/n) = M (n),
|S(vi)| = n and since |S(vi)| ≤ n implies that M (|S(vi)|)/|S(vi)| ≤ M (n)/n.
Now since there are O(log n) levels, the overall space is O(M (n) log n). The query time
can be upper-bounded as follows: If i = 0, then the query on DEC (root) returns “false”
and we abandon the search at the root itself; in this case, the query time is just O(f (n)). Suppose that i /= 0. Call a visited node v fruitful if the query on DEC (v) returns “true”
and fruitless otherwise. Each fruitful node can be charged to some color in its subtree that gets reported. Since the number of times any reported color can be charged is O(log n) (the height of CT ) and since i colors are reported, the number of fruitful nodes is O(i log n). Since each fruitless node has a fruitful parent and CT is a binary tree, it follows that there are only O(i log n) fruitless nodes. Hence the number of nodes visited by the search is O(i log n). At each such node, v, we spend time f (|S(v)|), which is O(f (n)) since |S(v)|≤ n and f is non-decreasing. Thus the total time spent in doing queries at the visited nodes is O(i · f (n) log n). The claimed query time follows.
As an application of this method, consider the generalized halfspace range searching in Rd, for any ﬁxed d ≥ 2. For d = 2, 3, we discussed a solution for this problem in Section 64.3.2. For d > 3, the problem can be solved by extending (signiﬁcantly) the ray- envelope intersection algorithm outlined in Section 64.3.2. However, the bounds are not very satisfactory—O(ndld/2J+E) space and logarithmic query time or near-linear space and superlinear query time. The solution we give below has more desirable bounds.
The colored objects for this problem are points in Rd and the query is a closed halfspace in Rd. We store the objects in CT , as described previously. The standard decision problem that we need to solve at each node v of CT is “Does a query halfspace contain any point of S(v).” The answer to this query is “true” if and only if the query halfspace is nonempty. We take the data structure, DEC (v), for this problem to be the one given in . If |Sv | = nv , then DEC (v) uses O(nld/2J/(log n )ld/2J−E ) space and has query time O(log n )
. The conditions in Theorem 64.8 hold, so applying it gives the following result.
THEOREM 64.9 For any ﬁxed d ≥ 2, a set S of n colored points in Rd can be stored in a data structure of size O(nld/2J /(log n)ld/2J−1−E ) such that the i distinct colors of the points contained in a query halfspace Q− can be reported in time O(log n + i log2 n). Here E > 0 is an arbitrarily small constant.
Adding Range Restrictions
We describe the general technique of  that adds a range restriction to a generalized intersection searching problem.
Let PR be a generalized intersection searching problem on a set S of n colored objects and query objects q belonging to a class Q. We denote the answer to a query by PR(q, S).
To add a range restriction, we associate with each element p ∈ S a real number kp. In a range-restricted generalized intersection searching problem, denoted by TPR, a query consists of an element q ∈ Q and an interval [l, r], and
THEOREM 64.11 Let S be a set of n colored points in Rd, where d ≥ 1 is a constant. There exists a data structure of size O(n1+E ) such that for any query box in Rd, we can report the i distinct colors of the points that are contained in it in O(log n + i) time.
Proof The proof is by induction on d. For d = 1, the claim follows from Theorem 64.1. Let d ≥ 2, and let DS be a data structure of size O(n1+E ) that answers generalized (d − 1)-dimensional range queries in O(log n + i) time. Observe that for the generalized d-dimensional range searching problem, there are only polynomially many distinct semi- inﬁnite queries. Hence, there exists a data structure TDS of polynomial size that answers generalized d-dimensional semi-inﬁnite range queries in O(log n + i) time. Applying Theo- rem 64.10 to DS and TDS proves the claim.
Conclusion and Future Directions
We have reviewed recent research on a class of geometric query-retrieval problems, where the objects to be queried come aggregated in disjoint groups and of interest are questions concerning the intersection of the query object with the groups (rather than with the individual objects). These problems include the well-studied standard intersection problems as a special case and have many applications. We have described several general techniques that have been identiﬁed for these problems and have illustrated them with examples.
Some potential directions for future work include: (i) extending the transformation-based approach to higher dimensions; (ii) improving the time bounds for some of the problems discussed here—for instance, can the generalized orthogonal range searching problem in Rd, for d ≥ 4, be solved with O(polylog(n) + i) query time and O(n(log n)O(1)n) space;
(i) developing general dynamization techniques for generalized problems, along the lines of, for instance,  for standard problems; (iv) developing eﬃcient solutions to generalized problems where the objects may be in time-dependent motion; and (v) implementing and testing experimentally some of the solutions presented here.