Techniques
We describe in some detail ﬁve main techniques that have emerged for generalized inter section searching over the past few years. Brieﬂy, these include: an approach based on a geometric transformation, an approach based on generating a sparse representation of the input, an approach based on persistent data structures, a generic method that is applicable to any reporting problem, and an approach for searching on a subset of the input satisfying a speciﬁed range restriction. We illustrate each method with examples.
A TransformationBased Approach
We ﬁrst illustrate a transformationbased approach for the reporting and counting problems, which converts the original generalized reporting/counting problem to an instance of a related standard reporting/counting problem on which eﬃcient known solutions can be brought to bear. We illustrate this approach by considering the generalized 1dimensional range searching problem. Let S be a set of n colored points on the xaxis. We show how to preprocess S so that for any query interval q, we can solve eﬃciently the dynamic reporting problem, the static and dynamic counting problems, and the static type2 counting problem. The solutions for the dynamic reporting problem and the static and dynamic counting problems are from [18]. The type2 counting solution is from [6].
We ﬁrst describe the transformation. For each color c, we sort the distinct points of that color by increasing xcoordinate. For each point p of color c, let pred (p) be its predecessor of color c in the sorted order; for the leftmost point of color c, we take the predecessor to be the point −∞. We then map p to the point pl = (p, pred (p)) in the plane and associate with it the color c. Let Sl be the resulting set of points. Given a query interval q = [l, r], we map it to the grounded rectangle ql = [l, r] × (−∞, l).
LEMMA 64.1 There is a point of color c in S that is in q = [l, r] if and only if there is a point of color c in Sl that is in ql = [l, r] × (−∞, l). Moreover, if there is a point of color c in ql, then this point is unique.
Proof Let pl be a ccolored point in ql, where pl = (p, pred (p)) for some ccolored point p ∈ S. Since pl is in [l, r] × (−∞, l), it is clear that l ≤ p ≤ r and so p ∈ [l, r].
For the converse, let p be the leftmost point of color c in [l, r]. Thus l ≤ p ≤ r and since pred (p) /∈ [l, r], we have l > pred (p). It follows that pl = (p, pred (p)) is in [l, r] × (−∞, l).
We prove that pl is the only point of color c in ql. Suppose for a contradiction that tl = (t, pred (t)) is another point of color c in ql. Thus we have l ≤ t ≤ r. Since t > p, we also have pred (t) ≥ p ≥ l. Thus tl = (t, pred (t)) cannot lie in ql—a contradiction.
Lemma 64.1 implies that we can solve the generalized 1dimensional range reporting (resp., counting) problem by simply reporting the points in ql (resp., counting the number of points in ql), without regard to colors. In other words, we have reduced the generalized reporting (resp., counting) problem to the standard grounded range reporting (resp., count ing) problem in two dimensions. In the dynamic case, we also need to update Sl when S is updated. We discuss these issues in more detail below.
The Dynamic Reporting Problem
Our data structure consists of the following: For each color c, we maintain a balanced binary search tree, Tc, in which the ccolored points of S are stored in increasing xorder. We maintain the colors themselves in a balanced search tree CT , and store with each color c in CT a pointer to Tc. We also store the points of Sl in a balanced priority search tree (PST ) [28]. (Recall that a PST on m points occupies O(m) space, supports insertions and deletions in O(log m) time, and can be used to report the k points lying inside a grounded query rectangle in O(log m + k) time [28]. Although this query is designed for query ranges of the form [l, r] × (−∞, l], it can be trivially modiﬁed to ignore the points on the upper edge of the range without aﬀecting its performance.) Clearly, the space used by the entire data structure is O(n), where n = S.
To answer a query q = [l, r], we simply query the PST with ql = [l, r] × (−∞, l) and report the colors of the points found. Correctness follows from Lemma 64.1. The query time is O(log n + k), where k is the number of points inside ql. By Lemma 64.1, k = i, and so the query time is O(log n + i).
Suppose that a ccolored point p is to be inserted into S. If c /∈ CT , then we create a tree Tc containing p, insert pl = (p, −∞) into the PST , and insert c, with a pointer to Tc, into CT . Suppose that c ∈ CT . Let u be the successor of p in Tc. If u exists, then we set pred (p) to pred (u) and pred (u) to p; otherwise, we set pred (p) to the rightmost point in Tc. We then insert p into Tc, pl = (p, pred (p)) into the PST , delete the old ul from the PST , and insert the new ul into it.
Deletion of a point p of color c is essentially the reverse. We delete p from Tc. Then we delete pl from the PST and if p had a successor, u, in Tc then we reset pred (u) to pred (p), delete the old ul from the PST , and insert the new one. If Tc becomes empty in the process, then we delete c from CT . Clearly, the update operations are correct and take O(log n) time.
THEOREM 64.1 Let S be a set of n colored points on the real line. S can be preprocessed into a data structure of size O(n) such that the i distinct colors of the points of S that are contained in any query interval can be reported in O(log n+i) time and points can be inserted and deleted online in S in O(log n) time.
For the static reporting problem, we can dispense with CT and the Tc’s and simply use a static form of the PST to answer queries. This provides a simple O(n)space, O(log n + i) query time alternative to another solution given in [23].
The static counting problem
We store the points of Sl in nondecreasing xorder at the leaves of a balanced binary search tree, T , and store at each internal node t of T an array At containing the points in t’s subtree in nondecreasing yorder. The total space is clearly O(n log n). To answer a query, we determine O(log n) canonical nodes v in T such that the query interval [l, r] covers v’s range but not the range of v’s parent. Using binary search we determine in each canonical node’s array the highest array position containing an entry less than l (and thus the number of points in that node’s subtree that lie in ql) and add up the positions thus found at all canonical nodes. The correctness of this algorithm follows from Lemma 64.1.
The total query time is O(log2 n).
We can reduce the query time to O(log n) as follows: At each node t we create a linked list, Bt, which contains the same elements as At and maintain a pointer from each entry of Bt to the same entry in At. We then apply the technique of fractional cascading [9] to the Blists, so that after an initial O(log n)time binary search in the Blist of the root, the correct positions in the Blists of all the canonical nodes can be found directly in O(log n) total time. (To facilitate binary search in the root’s Blist, we build a balanced search tree on it after the fractional cascading step.) Once the position in a Blist is known, the appropriate position in the corresponding Aarray can be found in O(1) time.
It is possible to reduce the space slightly (to O(n log n/ log log n)) at the expense of a larger query time (O(log2 n/ log log n)), by partitioning the points of Sl recursively into horizontal strips of a certain size and doing binary search, augmented with fractional cascading, within the strips. Details can be found in [18].
THEOREM 64.2 Let S be a set of n colored points on the real line. S can be preprocessed into a data structure of size O(n log n) (resp., O(n log n/ log log n)) such that the number of distinctlycolored points of S that are contained in any query interval can be determined in O(log n) (resp., O(log2 n/ log log n)) time.
The dynamic counting problem
We store the points of Sl using the same basic twolevel tree structure as in the ﬁrst solution for the static counting problem. However, T is now a BB (α) tree [32] and the auxiliary structure, D(t), at each node t of T is a balanced binary search tree where the points are stored at the leaves in left to right order by nondecreasing ycoordinate. To facilitate the querying, each node v of D(t) stores a count of the number of points in its subtree. Given a real number, l, we can determine in O(log n) time the number of points in D(t) that have ycoordinate less than l by searching for l in D(t) and adding up the count for each node of D(t) that is not on the search path but is the left child of a node on the path. It should be clear that D(t) can be maintained in O(log n) time under updates.
In addition to the twolevel structure, we also use the trees Tc and the tree CT , described previously, to maintain the correspondence between S and Sl. We omit further discussion about the maintenance of these trees.
Queries are answered as in the static case, except that at each auxiliary structure we use the abovementioned method to determine the number of points with ycoordinate less than l. Thus the query time is O(log2 n). (We cannot use fractional cascading here.)
Insertion/deletion of a point is done using the worstcase updating strategy for BB (α) trees, and take O(log2 n) time.
THEOREM 64.3 Let S be a set of n colored points on the real line. S can be preprocessed into a data structure of size O(n log n) such that the number of distinctlycolored points of S that are contained in any query interval can be determined in O(log2 n) time and points can be inserted and deleted online in S in O(log2 n) worstcase time.
The static type2 problem
We wish to preprocess a set S of n colored points on the xaxis, so that for each color intersected by a query interval q = [l, r], the number of points of that color in q can be reported eﬃciently. The solution for this problem originally proposed in [18] takes O(n log n) space and supports queries in O(log n + i) time. The space bound was improved to O(n) in [6], as follows.
The solution consists of two priority search trees, PST 1 and PST 2. PST 1 is similar to the priority search tree built on Sl in the solution for the dynamic reporting problem, with an additional count stored at each node. Let pl = (p, pred (p)) be the point that is stored at a node in PST 1 and c the color of p. Then at this node, we store an additional number t1(pl), which is the number of points of color c to the right of p.
PST 2 is based on a transformation that is symmetric to the one used for PST 1. For each color c, we sort the distinct points of that color by increasing xcoordinate. For each point p of color c, let next (p) be its successor in the sorted order; for the rightmost point of color c, we take the successor to be the point +∞. We then map p to the point pll = (p, next (p)) in the plane and associate with it the color c. Let Sll be the resulting set of points. We build PST 2 on Sll, with an additional count stored at each node. Let pll = (p, next(p)) be the point that is stored at a node in PST 2 and c the color of p. Then at this node, we store an additional number t2(pll), which is the number of points of color c to the right of next (p).
We also maintain an auxiliary array A of size n. Given a query q = [l, r], we query PST 1 with ql = [l, r] × (−∞, l) and for each color c found, we set A[c] = t1(pl), where pl is the point stored at the node where we found c. Then we query PST 2 with qll = [l, r] × (r, +∞) and for each color c found, we report c and A[c] − t2(pll), where pll is the point stored at the node where we found c. This works because the queries on PST 1 and PST 2 eﬀectively ﬁnd the leftmost and rightmost points of color c in q = [l, r] (cf. Thus, A[c] − t2(pll) gives the number of points of color c in q.
proof of Lemma 64.1).
THEOREM 64.4 A set S of n colored points on the real line can be preprocessed into a data structure of size O(n) such that for any query interval, a type2 counting query can be answered in O(log n + i) time, where i is the output size.
A SparsiﬁcationBased Approach
The idea behind this approach is to generate from the given set, S, of colored objects a colored set, Sl—possibly consisting of diﬀerent objects than those in S—such that a query object q intersects an object in S if and only if it intersects at most a constant number of objects in Sl. This allows us to use a solution to a standard problem on Sl to solve the generalized reporting problem on S. (In the case of a generalized counting problem, the requirement is more stringent: exactly one object in Sl must be intersected.) We illustrate this method with the generalized halfspace range searching problem in Rd, d = 2, 3.
Generalized halfspace range searching in R2 and R3
Let S be a set of n colored points in Rd, d = 2, 3. We show how to preprocess S so that for any query hyperplane Q, the i distinct colors of the points lying in the closed halfspace Q− (i.e., below Q) can be reported or counted eﬃciently. Without loss of generality, we may assume that Q is nonvertical since vertical queries are easy to handle. The approach described here is from [19].
We denote the coordinate directions by x1, x2,…, xd. Let F denote the wellknown pointhyperplane duality transform [15]: If p = (p1,…, pd) is a point in Rd, then F (p) is the hyperplane xd = p1x1 + ··· + pd−1xd−1 − pd. If H : xd = a1x1 + ··· + ad−1xd−1 + ad is a (nonvertical) hyperplane in Rd, then F (H) is the point (a1,... , ad −1, −ad). It is easily veriﬁed that p is above (resp. on, below) H, in the xd direction, if and only if F (p) is below (resp. on, above) F (H). Note also that F (F (p)) = p and F (F (H)) = H.
Using F we map S to a set Sl of hyperplanes and map Q to the point q = F (Q), both in Rd. Our problem is now equivalent to: “Report or count the i distinct colors of the hyperplanes lying on or above q, i.e., the hyperplanes that are intersected by the vertical ray r emanating upwards from q.”
Let Sc be the set of hyperplanes of color c. For each color c, we compute the upper envelope Ec of the hyperplanes in Sc. Ec is the locus of the points of Sc of maximum xdcoordinate for each point on the plane xd = 0. Ec is a ddimensional convex polytope which is unbounded in the positive xddirection. Its boundary is composed of jfaces, 0 ≤ j ≤ d−1, where each j–face is a jdimensional convex polytope. Of particular interest to us are the (d − 1)faces of Ec, called facets. For instance, in R2, Ec is an unbounded convex chain and its facets are line segments; in R3, Ec is an unbounded convex polytope whose facets are convex polygons.
Let us assume that r is wellbehaved in the sense that for no color c does r intersect two or more facets of Ec at a common boundary—for instance, a vertex in R2 and an edge or a vertex in R3. (This assumption can be removed; details can be found in [19].) Then, by deﬁnition of the upper envelope, it follows that (i) r intersects a ccolored hyperplane if and only if r intersects Ec and, moreover, (ii) if r intersects Ec, then r intersects a unique facet of Ec (in the interior of the facet). Let E be the collection of the envelopes of the diﬀerent colors. By the above discussion, our problem is equivalent to: “Report or count the facets of E that are intersected by r”, which is a standard intersection searching problem. We will show how to solve eﬃciently this rayenvelope intersection problem in R2 and in R3. This approach does not give an eﬃcient solution to the generalized halfspace searching problem in Rd for d> 3; for this case, we will give a diﬀerent solution in Section 64.3.4.
To solve the ray–envelope intersection problem in R2, we project the endpoints of the line segments of E on the xaxis, thus partitioning it into 2n + 1 elementary intervals (some of which may be empty). We build a segment tree T which stores these elementary intervals at the leaves. Let v be any node of T . We associate with v an xinterval I(v), which is the union of the elementary intervals stored at the leaves in v’s subtree. Let Strip(v) be the vertical strip deﬁned by I(v). We say that a segment s ∈E is allocated to a node v ∈ T if and only if I(v) /= ∅ and s crosses Strip(v) but not Strip(parent (v)). Let E(v) be the set of segments allocated to v. Within Strip(v), the segments of E(v) can be viewed as lines since they cross Strip(v) completely. Let E l(v) be the set of points dual to these lines. We store El(v) in an instance D(v) of the standard halfplane reporting (resp. counting) structure for R2 given in [10] (resp. [26]). This structure uses O(m) space and has a query time of O(log m + kv ) (resp. O(m1/2 )), where m = E (v) and kv is the output size at v.
To answer a query, we search in T using q’s xcoordinate. At each node v visited, we need to report or count the lines intersected by r. But, by duality, this is equivalent to answering, in R2, a halfplane query at v using the query F (q)− = Q−, which we do using D(v). For the reporting problem, we simply output what is returned by the query at each visited node; for the counting problem, we return the sum of the counts obtained at the visited nodes.
THEOREM 64.5 A set S of n colored points in R2 can be stored in a data structure of size O(n log n) so that the i distinct colors of the points contained in any query halfplane can be reported (resp. counted) in time O(log2 n + i) (resp. O(n1/2)).
Proof Correctness follows from the preceding discussion. As noted earlier, there are O(Sc ) line segments (facets) in Ec; thus E  = O(),c Sc) = O(n) and so T  = O(n).
Hence each segment of E can get allocated to O(log n) nodes of T . Since the structure D(v) has size linear in m = E (v), the total space used is O(n log n). For the reporting problem, the query time at a node v is O(log m + kv ) = O(log n + kv ). When summed over the O(log n) nodes visited, this gives O(log2 n + i). To see this, recall that the ray r can intersect at most one envelope segment of any color; thus the terms kv , taken over all nodes
v visited, sum to i.
For the counting problem, the query time at v is O(m1/2 ). It can be shown that if v has
depth j in T , then m = E (v) = O(n/2j ).
(See, for instance, [12, page 675].)
Thus, the

overall query time is O(),O(log n)(n/2j )1/2), which is O(n1/2).
In R3, the approach is similar, but more complex. Our goal is to solve the ray–envelope intersection problem in R3. As shown in [19], this problem can be reduced to certain standard halfspace range queries in R3 on a set of triangles (obtained by triangulating the Ec’s.) This problem can be solved by building a segment tree on the xspans of the triangles projected to the xyplane and augmenting each node of this tree with a data structure based on partition trees [25] or cutting trees [24] to answer the halfplane queries. Details may be found in [19].
THEOREM 64.6 The reporting version of the generalized halfspace range searching problem for a set of n colored points in R3 can be solved in O(n log2 n) (resp. O(n2+E)) space and O(n1/2+E + i) (resp. O(log2 n + i)) query time, where i is the output size and E > 0 is an arbitrarily small constant. The counting version is solvable in O(n log n) space and O(n2/3+E ) query time.
Additional examples of the sparsiﬁcationbased approach may be found in [23]. (An example also appears in the next section, enroute to a persistencebased solution of a generalized problem.)
A PersistenceBased Approach
Roughly speaking, we use persistence as follows: To solve a given generalized problem we ﬁrst identify a diﬀerent, but simpler, generalized problem and devise a data structure for it that also supports updates (usually just insertions). We then make this structure partially persistent [14] and query this persistent structure appropriately to solve the original problem.
We illustrate this approach for the generalized 3dimensional range searching problem, where we are required to preprocess a set, S, of n colored points in R3 so that for any query box q = [a, b] × [c, d] × [e, f ] the i distinct colors of the points inside q can be reported eﬃciently. We ﬁrst show how to build a semidynamic (i.e., insertionsonly) data structure for the generalized versions of the quadrant searching and 2dimensional range searching problems. These two structures will be the building blocks of our solution for the 2 dimensional problem.
Generalized semidynamic quadrant searching
Let S be a set of n colored points in the plane. For any point q = (a, b), the northeast quadrant of q, denoted by NE (q), is the set of all points (x, y) in the plane such that x ≥ a and y ≥ b. We show how to preprocess S so that for any query point q, the distinct colors of the points of S contained in NE (q) can be reported, and how points can be inserted into S. The data structure uses O(n) space, has a query time of O(log2 n + i), and an amortized insertion time of O(log n). This solution is based on the sparsiﬁcation approach described previously.
For each color c, we determine the cmaximal points. (A point p is called c–maximal if it has color c and there are no points of color c in p’s northeast quadrant.) We discard all points of color c that are not cmaximal. In the resulting set, let the predecessor, pred (p), of a ccolored point p be the ccolored point that lies immediately to the left of p. (For the leftmost point of color c, the predecessor is the point (−∞, ∞).) With each point p = (a, b), we associate the horizontal segment with endpoints (al, b) and (a, b), where al is the xcoordinate of pred (p). This segment gets the same color as p. Let Sc be the set of such segments of color c. The data structure consists of two parts, as follows.
The ﬁrst part is a structure T storing the segments in the sets Sc, where c runs over all colors. T supports the following query: given a point q in the plane, report the segments that are intersected by the upwardvertical ray starting at q. Moreover, it allows segments to be inserted and deleted. We implement T as the structure given in [11]. This structure uses O(n) space, supports insertions and deletions in O(log n) time, and has a query time of O(log2 n + l), where l is the number of segments intersected.
The second part is a balanced search tree CT , storing all colors. For each color c, we maintain a balanced search tree, Tc, storing the segments of Sc by increasing ycoordinate. This structure allows us to dynamically maintain Sc when a new ccolored point p is inserted. The general approach (omitting some special cases; see [18]) is as follows: By doing a binary search in Tc we can determine whether or not p is cmaximal in the current set of cmaximal points, i.e., the set of right endpoints of the segments of Sc. If p is not cmaximal, then we simply discard it. If p is cmaximal, then let s1,…, sk be the segments of Sc whose left endpoints are in the southwest quadrant of p. We do the following: (i) delete s2,... , sk from Tc; (ii) insert into Tc the horizontal segment which starts at p and extends leftwards upto the xcoordinate of the left endpoint of sk ; and (iii) truncate the segment s1 by keeping only the part of it that extends leftwards upto p’s xcoordinate. The entire operation can be done in O(log n + k) time.
Let us now consider how to answer a quadrant query, NE (q), and how to insert a point into S. To answer NE (q), we query T with the upwardvertical ray from q and report the colors of the segments intersected. The correctness of this algorithm follows from the easily proved facts that (i) a ccolored point lies in NE (q) if and only if a cmaximal point lies in NE (q) and (ii) if a cmaximal point is in NE (q), then the upwardvertical ray from q must intersect a segment of Sc. The correctness of T guarantees that only the segments intersected by this ray are reported. Since the query can intersect at most two segments in any Sc, we have l ≤ 2i, and so the query time is O(log2 n + i).
Let p be a ccolored point that is to be inserted into S. If c is not in CT , then we insert it into CT and insert the horizontal, leftwarddirected ray emanating from p into a new structure Tc. If c is present already, then we update Tc as just described. In both cases, we then perform the same updates on T . Hence, an insertion takes O((k + 1) log n) time.
What is the total time for n insertions into an initially empty set S? For each insertion, we can charge the O(log n) time to delete a segment si, 2 ≤ i ≤ k, to si itself. Notice that none of these segments will reappear. Thus each segment is charged at most once. Moreover, each of these segments has some previously inserted point as a right endpoint. It follows that the number of segments existing over the entire sequence of insertions is O(n) and so the total charge to them is O(n log n). The rest of the cost for each insertion (O(log n) for the binary search plus O(1) for steps (ii) and (iii)) we charge to p itself. Since any p is charged in this mode only once, the total charge incurred in this mode by all the inserted points is O(n log n). Thus the time for n insertions is O(n log n), which implies an amortized insertion time of O(log n).
LEMMA 64.2 Let S be a set of n colored points in the plane. There exists a data structure of size O(n) such that for any query point q, we can report the i distinct colors of the points that are contained in the northeast quadrant of q in O(log2 n+ i) time. Moreover, if we do n insertions into an initiallyempty set then the amortized insertion time is O(log n).
Generalized semidynamic 2dimensional range searching
Our goal here is to preprocess a set S of n colored points in the plane so that for any axes parallel query rectangle q = [a, b] × [c, d], we can solve the semidynamic reporting problem eﬃciently.
Our solution is based on the quadrant reporting structure of Lemma 64.2. We ﬁrst show how to solve the problem for query rectangles ql = [a, b] × [c, ∞). We store the points of S in sorted order by xcoordinate at the leaves of a BB (α) tree T l. At each internal node v, we store an instance of the structure of Lemma 64.2 for NE queries (resp., NW queries) built on the points in v’s left (resp., right) subtree. Let X(v) denote the average of the xcoordinate in the rightmost leaf in v’s left subtree and the xcoordinate in the leftmost leaf of v’s right subtree; for a leaf v, we take X(v) to be the xcoordinate of the point stored at v.
To answer a query ql, we do a binary search down T l, using [a, b], until either the search runs oﬀ T l or a (highest) node v is reached such that [a, b] intersects X(v). In the former case, we stop. In the latter case, if v is a leaf, then if v’s point is in ql we report its color.
If v is a nonleaf, then we query the structures at v using the NE quadrant and the NW – quadrant derived from ql (i.e., the quadrants with corners at (a, c) and (b, c), respectively), and then combine the answers. Updates on T l are performed using the amortizedcase updating strategy for BB (α) trees [32]. The correctness of the method should be clear.
The space and query time bounds follow from Lemma 64.2. Since the amortized insertion time of the quadrant searching structure is O(log n), the insertion in the BB (α) tree takes amortized time O(log2 n) [32].
To solve the problem for general query rectangles q = [a, b] × [c, d], we use the above approach again, except that we store the points in the tree by sorted ycoordinates. At each internal node v, we store an instance of the data structure above to answer queries of
the form [a, b] × [c, ∞) (resp. [a, b] × (−∞, d]) on the points in v’s left (resp. right) subtree.
The query strategy is similar to the previous one, except that we use the interval [c, d] to search in the tree. The query time is as before, while the space and update times increase by a logarithmic factor.
LEMMA 64.3 Let S be a set of n colored points in the plane. There exists a data structure of size O(n log2 n) such that for any query rectangle [a, b] × [c, d], we can report the i distinct colors of the points that are contained in it in O(log2 n + i) time. Moreover, points can be inserted into this data structure in O(log3 n) amortized time.
Generalized 3dimensional range searching
The semidynamic structure of Lemma 64.3 coupled with persistence allows us to go up one dimension and solve the original problem of interest: Preprocess a set S of n colored points in R3 so that for any query box q = [a, b] × [c, d] × [e, f ] the i distinct colors of the points inside q can be reported eﬃciently.
First consider queries of the form ql = [a, b] × [c, d] × [e, ∞). We sort the points of S by nonincreasing zcoordinates, and insert them in this order into a partially persistent version of the structure of Lemma 64.3, taking only the ﬁrst two coordinates into account. To answer ql, we access the version corresponding to the smallest zcoordinate greater than or equal to e and query it with [a, b] × [c, d].
To see that the query algorithm is correct, observe that the version accessed contains the projections on the xyplane of exactly those points of S whose zcoordinate is at least e.
Lemma 64.3 then guarantees that among these only the distinct colors of the ones in [a, b] × [c, d] are reported. These are precisely the distinct colors of the points contained in [a, b] × [c, d] × [e, ∞). The query time follows from Lemma 64.3. To analyze the space requirement, we note that the structure of Lemma 64.3 satisﬁes the conditions given in [14]. Speciﬁcally, it is a pointerbased structure, where each node is pointed to by only O(1) other nodes. As shown in [14], any modiﬁcation made by a persistent update operation on such a structure adds only O(1) amortized space to the resulting persistent structure. By Lemma 64.3, the total time for creating the persistent structure, via insertions, is O(n log3 n). This implies the same bound for the number of modiﬁcations in the structure, so the total space is O(n log3 n).
To solve the problem for general query boxes q = [a, b] × [c, d] × [e, f ], we follow an approach similar to that described for the 2dimensional case: We store the points in a
balanced binary search tree, sorted by zcoordinates. We associate with each internal node in the tree the auxiliary structure described above for answering queries of the form [a, b] × [c, d] × [e, ∞) (resp. [a, b] × [c, d] × (−∞,f ]) on the points in v’s left (resp. right)
subtree. (Note that since we do not need to do updates here the tree need not be a BB (α) tree.) Queries are done by searching down the tree using the interval [e, f ]. The query time is as before, but the space increases by a logarithmic factor.
THEOREM 64.7 Let S be a set of n colored points in 3space. S can be stored in a data structure of size O(n log4 n) such that for any query box [a, b] × [c, d] × [e, f ], we can report the i distinct colors of the points that are contained in it in O(log2 n + i) time.
Additional applications of the persistencebased approach to generalized intersection problems can be found in [18, 19, 21].
A General Approach for Reporting Problems
We describe a general method from [21] for solving any generalized reporting problem given a data structure for a “related” standard decision problem.
Let S be a set of n colored geometric objects and let q be any query object. In preprocessing, we store the distinct colors in S at the leaves of a balanced binary tree CT (in no particular order). For any node v of CT , let C(v) be the set of colors stored in the leaves of v’s subtree and let S(v) be the set of those objects of S colored with the colors in C(v). At v, we store a data structure DEC (v) to solve the following standard decision problem on S(v): “Decide whether or not q intersects any object of S(v).” DEC (v) returns “true” if and only if there is an intersection.
To answer a generalized reporting query on S, we do a depthﬁrst search in CT and query DEC (v) with q at each node v visited. If v is a nonleaf node then we continue searching below v if and only if the query returns “true”; if v is a leaf, then we output the color stored
there if and only if the query returns “true”.
THEOREM 64.8 Assume that a set of n geometric objects can be stored in a data structure of size M (n) such that it can be decided in f (n) time whether or not a query object intersects any of the n objects. Assume that M (n)/n and f (n) are nondecreasing functions for nonnegative values of n. Then a set S of n colored geometric objects can be preprocessed into a data structure of size O(M (n) log n) such that the i distinct colors of the objects in S that are intersected by a query object q can be reported in time O(f (n) + i · f (n) log n).
Proof We argue that a color c is reported if and only if there is a ccolored object in S intersecting q. Suppose that c is reported. This implies that a leaf v is reached in the search such that v stores c and the query on DEC (v) returns “true”. Thus, some object in S(v) intersects q. Since v is a leaf, all objects in S(v) have the same color c and the claim follows.
For the converse, suppose that q intersects a ccolored object p. Let v be the leaf storing c. Thus, p ∈ S(vl) for every node vl on the roottov path in CT . Thus, for each vl, the query on DEC (vl) will return “true”, which implies that v will be visited and c will be output.
are the nodes at any level, then the total space used by CT at that level is
),r
),r r

i=1 M (S(vi)) =
i=1 S(vi)· (M (S(vi))/S(vi)) ≤ ),
S(vi)· (M (n)/n) = M (n),

since ),r
S(vi) = n and since S(vi) ≤ n implies that M (S(vi))/S(vi) ≤ M (n)/n.
Now since there are O(log n) levels, the overall space is O(M (n) log n). The query time
can be upperbounded as follows: If i = 0, then the query on DEC (root) returns “false”
and we abandon the search at the root itself; in this case, the query time is just O(f (n)). Suppose that i /= 0. Call a visited node v fruitful if the query on DEC (v) returns “true”
and fruitless otherwise. Each fruitful node can be charged to some color in its subtree that gets reported. Since the number of times any reported color can be charged is O(log n) (the height of CT ) and since i colors are reported, the number of fruitful nodes is O(i log n). Since each fruitless node has a fruitful parent and CT is a binary tree, it follows that there are only O(i log n) fruitless nodes. Hence the number of nodes visited by the search is O(i log n). At each such node, v, we spend time f (S(v)), which is O(f (n)) since S(v)≤ n and f is nondecreasing. Thus the total time spent in doing queries at the visited nodes is O(i · f (n) log n). The claimed query time follows.
As an application of this method, consider the generalized halfspace range searching in Rd, for any ﬁxed d ≥ 2. For d = 2, 3, we discussed a solution for this problem in Section 64.3.2. For d > 3, the problem can be solved by extending (signiﬁcantly) the ray envelope intersection algorithm outlined in Section 64.3.2. However, the bounds are not very satisfactory—O(ndld/2J+E) space and logarithmic query time or nearlinear space and superlinear query time. The solution we give below has more desirable bounds.
The colored objects for this problem are points in Rd and the query is a closed halfspace in Rd. We store the objects in CT , as described previously. The standard decision problem that we need to solve at each node v of CT is “Does a query halfspace contain any point of S(v).” The answer to this query is “true” if and only if the query halfspace is nonempty. We take the data structure, DEC (v), for this problem to be the one given in [27]. If Sv  = nv , then DEC (v) uses O(nld/2J/(log n )ld/2J−E ) space and has query time O(log n )
[27]. The conditions in Theorem 64.8 hold, so applying it gives the following result.
THEOREM 64.9 For any ﬁxed d ≥ 2, a set S of n colored points in Rd can be stored in a data structure of size O(nld/2J /(log n)ld/2J−1−E ) such that the i distinct colors of the points contained in a query halfspace Q− can be reported in time O(log n + i log2 n). Here E > 0 is an arbitrarily small constant.
Other applications of the general method may be found in [21].
Adding Range Restrictions
We describe the general technique of [20] that adds a range restriction to a generalized intersection searching problem.
Let PR be a generalized intersection searching problem on a set S of n colored objects and query objects q belonging to a class Q. We denote the answer to a query by PR(q, S).
To add a range restriction, we associate with each element p ∈ S a real number kp. In a rangerestricted generalized intersection searching problem, denoted by TPR, a query consists of an element q ∈ Q and an interval [l, r], and
THEOREM 64.11 Let S be a set of n colored points in Rd, where d ≥ 1 is a constant. There exists a data structure of size O(n1+E ) such that for any query box in Rd, we can report the i distinct colors of the points that are contained in it in O(log n + i) time.
Proof The proof is by induction on d. For d = 1, the claim follows from Theorem 64.1. Let d ≥ 2, and let DS be a data structure of size O(n1+E ) that answers generalized (d − 1)dimensional range queries in O(log n + i) time. Observe that for the generalized ddimensional range searching problem, there are only polynomially many distinct semi inﬁnite queries. Hence, there exists a data structure TDS of polynomial size that answers generalized ddimensional semiinﬁnite range queries in O(log n + i) time. Applying Theo rem 64.10 to DS and TDS proves the claim.
Conclusion and Future Directions
We have reviewed recent research on a class of geometric queryretrieval problems, where the objects to be queried come aggregated in disjoint groups and of interest are questions concerning the intersection of the query object with the groups (rather than with the individual objects). These problems include the wellstudied standard intersection problems as a special case and have many applications. We have described several general techniques that have been identiﬁed for these problems and have illustrated them with examples.
Some potential directions for future work include: (i) extending the transformationbased approach to higher dimensions; (ii) improving the time bounds for some of the problems discussed here—for instance, can the generalized orthogonal range searching problem in Rd, for d ≥ 4, be solved with O(polylog(n) + i) query time and O(n(log n)O(1)n) space;
(i) developing general dynamization techniques for generalized problems, along the lines of, for instance, [5] for standard problems; (iv) developing eﬃcient solutions to generalized problems where the objects may be in timedependent motion; and (v) implementing and testing experimentally some of the solutions presented here.