cupyx.scipy.spatial.KDTree#

class cupyx.scipy.spatial.KDTree(data, leafsize=10, compact_nodes=True, copy_data=False, balanced_tree=True, boxsize=None)[source]#

KDTree(data, leafsize=16, compact_nodes=True, copy_data=False,: balanced_tree=True, boxsize=None)

kd-tree for quick nearest-neighbor lookup

This class provides an index into a set of k-dimensional points which can be used to rapidly look up the nearest neighbors of any point.

Parameters:

data (array_like, shape (n,m)) – The n data points of dimension m to be indexed. This array is not copied unless this is necessary to produce a contiguous array of doubles, and so modifying this data will result in bogus results. The data are also copied if the kd-tree is built with copy_data=True.
leafsize (positive int, optional) – The number of points at which the algorithm switches over to brute-force. Default: 16.
compact_nodes (bool, optional) – If True, the kd-tree is built to shrink the hyperrectangles to the actual data range. This usually gives a more compact tree that is robust against degenerated input data and gives faster queries at the expense of longer build time. Default: True.
copy_data (bool, optional) – If True the data is always copied to protect the kd-tree against data corruption. Default: False.
balanced_tree (bool, optional) – If True, the median is used to split the hyperrectangles instead of the midpoint. This usually gives a more compact tree and faster queries at the expense of longer build time. Default: True.
boxsize (array_like or scalar, optional) – Apply a m-d toroidal topology to the KDTree.. The topology is generated by \(x_i + n_i L_i\) where \(n_i\) are integers and \(L_i\) is the boxsize along i-th dimension. The input data shall be wrapped into \([0, L_i)\). A ValueError is raised if any of the data is outside of this bound.

Notes

The algorithm used is described in Wald, I. 2022 [1]. The general idea is that the kd-tree is a binary tree, each of whose nodes represents an axis-aligned hyperrectangle. Each node specifies an axis and splits the set of points based on whether their coordinate along that axis is greater than or less than a particular value.

The tree can be queried for the r closest neighbors of any given point (optionally returning only those within some maximum distance of the point). It can also be queried, with a substantial gain in efficiency, for the r approximate closest neighbors. See [2] for more information regarding the implementation.

For large dimensions (20 is already large) do not expect this to run significantly faster than brute force. High-dimensional nearest-neighbor queries are a substantial open problem in computer science.

Variables:

data (ndarray, shape (n,m)) – The n data points of dimension m to be indexed. This array is not copied unless this is necessary to produce a contiguous array of doubles. The data are also copied if the kd-tree is built with copy_data=True.
leafsize (positive int) – The number of points at which the algorithm switches over to brute-force.
m (int) – The dimension of a single data-point.
n (int) – The number of data points.
maxes (ndarray, shape (m,)) – The maximum value in each dimension of the n data points.
mins (ndarray, shape (m,)) – The minimum value in each dimension of the n data points.
tree (ndarray) – This attribute exposes the array representation of the tree.
size (int) – The number of nodes in the tree.

References

Methods

count_neighbors(other, r, p=2.0, weights=None, cumulative=True)[source]#

Count how many nearby pairs can be formed.

Count the number of pairs (x1,x2) can be formed, with x1 drawn from self and x2 drawn from other, and where distance(x1, x2, p) <= r.

Data points on self and other are optionally weighted by the weights argument. (See below)

This is adapted from the “two-point correlation” algorithm described by Gray and Moore [1]. See notes for further discussion.

Parameters:

other (KDTree instance) – The other tree to draw points from, can be the same tree as self.
r (float or one-dimensional array of floats) – The radius to produce a count for. Multiple radii are searched with a single tree traversal. If the count is non-cumulative(cumulative=False), r defines the edges of the bins, and must be non-decreasing.
p (float, optional) – 1<=p<=infinity. Which Minkowski p-norm to use. Default 2.0. A finite large p may cause a ValueError if overflow can occur.
weights (tuple, array_like, or None, optional) – If None, the pair-counting is unweighted. If given as a tuple, weights[0] is the weights of points in self, and weights[1] is the weights of points in other; either can be None to indicate the points are unweighted. If given as an array_like, weights is the weights of points in self and other. For this to make sense, self and other must be the same tree. If self and other are two different trees, a ValueError is raised. Default: None
cumulative (bool, optional) – Whether the returned counts are cumulative. When cumulative is set to False the algorithm is optimized to work with a large number of bins (>10) specified by r. When cumulative is set to True, the algorithm is optimized to work with a small number of r. Default: True

Returns:

result – The number of pairs. For unweighted counts, the result is integer. For weighted counts, the result is float. If cumulative is False, result[i] contains the counts with (-inf if i == 0 else r[i-1]) < R <= r[i]

Return type:

scalar or 1-D array

query(x, k=1, eps=0.0, p=2.0, distance_upper_bound=inf)[source]#

Query the kd-tree for nearest neighbors.

Parameters:

x (array_like, last dimension self.m) – An array of points to query.
k (list of integer or integer) – The list of k-th nearest neighbors to return. If k is an integer it is treated as a list of [1, … k] (range(1, k+1)). Note that the counting starts from 1.
eps (non-negative float) – Return approximate nearest neighbors; the k-th returned value is guaranteed to be no further than (1+eps) times the distance to the real k-th nearest neighbor.
p (float, 1<=p<=infinity) – Which Minkowski p-norm to use. 1 is the sum-of-absolute-values “Manhattan” distance 2 is the usual Euclidean distance infinity is the maximum-coordinate-difference distance A finite large p may cause a ValueError if overflow can occur.
distance_upper_bound (nonnegative float) – Return only neighbors within this distance. This is used to prune tree searches, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point.

Returns:

d (array of floats) – The distances to the nearest neighbors. If x has shape tuple+(self.m,), then d has shape tuple+(k,). When k == 1, the last dimension of the output is squeezed. Missing neighbors are indicated with infinite distances.
i (ndarray of ints) – The index of each neighbor in self.data. If x has shape tuple+(self.m,), then i has shape tuple+(k,). When k == 1, the last dimension of the output is squeezed. Missing neighbors are indicated with self.n.

Notes

If the KD-Tree is periodic, the position x is wrapped into the box.

When the input k is a list, a query for arange(max(k)) is performed, but only columns that store the requested values of k are preserved. This is implemented in a manner that reduces memory usage.

Examples

>>> import cupy as cp
>>> from cupyx.scipy.spatial import KDTree
>>> x, y = cp.mgrid[0:5, 2:8]
>>> tree = KDTree(cp.c_[x.ravel(), y.ravel()])

To query the nearest neighbours and return squeezed result, use

>>> dd, ii = tree.query([[0, 0], [2.2, 2.9]], k=1)
>>> print(dd, ii, sep='\n')
[2.         0.2236068]
[ 0 13]

To query the nearest neighbours and return unsqueezed result, use

>>> dd, ii = tree.query([[0, 0], [2.2, 2.9]], k=[1])
>>> print(dd, ii, sep='\n')
[[2.        ]
 [0.2236068]]
[[ 0]
 [13]]

To query the second nearest neighbours and return unsqueezed result, use

>>> dd, ii = tree.query([[0, 0], [2.2, 2.9]], k=[2])
>>> print(dd, ii, sep='\n')
[[2.23606798]
 [0.80622577]]
[[ 6]
 [19]]

To query the first and second nearest neighbours, use

>>> dd, ii = tree.query([[0, 0], [2.2, 2.9]], k=2)
>>> print(dd, ii, sep='\n')
[[2.         2.23606798]
 [0.2236068  0.80622577]]
[[ 0  6]
 [13 19]]

or, be more specific

>>> dd, ii = tree.query([[0, 0], [2.2, 2.9]], k=[1, 2])
>>> print(dd, ii, sep='\n')
[[2.         2.23606798]
 [0.2236068  0.80622577]]
[[ 0  6]
 [13 19]]

query_ball_point(x, r, p=2.0, eps=0, return_sorted=None, return_length=False)[source]#

Find all points within distance r of point(s) x.

Parameters:

x (array_like, shape tuple + (self.m,)) – The point or points to search for neighbors of.
r (array_like, float) – The radius of points to return, shall broadcast to the length of x.
p (float, optional) – Which Minkowski p-norm to use. Should be in the range [1, inf]. A finite large p may cause a ValueError if overflow can occur.
eps (nonnegative float, optional) – Approximate search. Branches of the tree are not explored if their nearest points are further than r / (1 + eps), and branches are added in bulk if their furthest points are nearer than r * (1 + eps).
return_sorted (bool, optional) – Sorts returned indices if True and does not sort them if False. If None, does not sort single point queries, but does sort multi-point queries which was the behavior before this option was added in SciPy.
return_length (bool, optional) – Return the number of points inside the radius instead of a list of the indices.

Returns:

results – If x is a single point, returns a list of the indices of the neighbors of x. If x is an array of points, returns an object array of shape tuple containing lists of neighbors.

Return type:

list or array of lists

Notes

If you have many points whose neighbors you want to find, you may save substantial amounts of time by putting them in a KDTree and using query_ball_tree.

Examples

>>> import cupy as cp
>>> from cupyx.scipy import spatial
>>> x, y = cp.mgrid[0:4, 0:4]
>>> points = cp.c_[x.ravel(), y.ravel()]
>>> tree = spatial.KDTree(points)
>>> tree.query_ball_point([2, 0], 1)
[4, 8, 9, 12]

query_ball_tree(other, r, p=2.0, eps=0.0)[source]#

Find all pairs of points between self and other whose distance is at most r.

Parameters:

other (KDTree instance) – The tree containing points to search against.
r (float) – The maximum distance, has to be positive.
p (float, optional) – Which Minkowski norm to use. p has to meet the condition 1 <= p <= infinity. A finite large p may cause a ValueError if overflow can occur.
eps (float, optional) – Approximate search. Branches of the tree are not explored if their nearest points are further than r/(1+eps), and branches are added in bulk if their furthest points are nearer than r * (1+eps). eps has to be non-negative.

Returns:

results – For each element self.data[i] of this tree, results[i] is a list of the indices of its neighbors in other.data.

Return type:

list of ndarrays

Examples

You can search all pairs of points between two kd-trees within a distance:

>>> import matplotlib.pyplot as plt
>>> import cupy as cp
>>> from cupyx.scipy.spatial import KDTree
>>> points1 = cp.random.rand((15, 2))
>>> points2 = cp.random.rand((15, 2))
>>> plt.figure(figsize=(6, 6))
>>> plt.plot(points1[:, 0], points1[:, 1], "xk", markersize=14)
>>> plt.plot(points2[:, 0], points2[:, 1], "og", markersize=14)
>>> kd_tree1 = KDTree(points1)
>>> kd_tree2 = KDTree(points2)
>>> indexes = kd_tree1.query_ball_tree(kd_tree2, r=0.2)
>>> for i in range(len(indexes)):
...     for j in indexes[i]:
...         plt.plot([points1[i, 0], points2[j, 0]],
...             [points1[i, 1], points2[j, 1]], "-r")
>>> plt.show()

query_pairs(r, p=2.0, eps=0, output_type='ndarray')[source]#

Find all pairs of points in self whose distance is at most r.

Parameters:

r (positive float) – The maximum distance.
p (float, optional) – Which Minkowski norm to use. p has to meet the condition 1 <= p <= infinity. A finite large p may cause a ValueError if overflow can occur.
eps (float, optional) – Approximate search. Branches of the tree are not explored if their nearest points are further than r/(1+eps), and branches are added in bulk if their furthest points are nearer than r * (1+eps). eps has to be non-negative.
output_type (string, optional) – Choose the output container, ‘set’ or ‘ndarray’. Default: ‘ndarray’ Note: ‘set’ output is not supported.

Returns:

results – An ndarray of size (total_pairs, 2), containing each pair (i,j), with i < j, for which the corresponding positions are close.

Return type:

ndarray

Notes

This method does not support the set output type.

Examples

You can search all pairs of points in a kd-tree within a distance:

>>> import matplotlib.pyplot as plt
>>> import cupy as cp
>>> from cupyx.scipy.spatial import KDTree
>>> points = cp.random.rand((20, 2))
>>> plt.figure(figsize=(6, 6))
>>> plt.plot(points[:, 0], points[:, 1], "xk", markersize=14)
>>> kd_tree = KDTree(points)
>>> pairs = kd_tree.query_pairs(r=0.2)
>>> for (i, j) in pairs:
...     plt.plot([points[i, 0], points[j, 0]],
...             [points[i, 1], points[j, 1]], "-r")
>>> plt.show()

sparse_distance_matrix(other, max_distance, p=2.0, output_type='coo_matrix')[source]#

Compute a sparse distance matrix

Computes a distance matrix between two KDTrees, leaving as zero any distance greater than max_distance.

Parameters:

other (KDTree) –
max_distance (positive float) –
p (float, 1<=p<=infinity) – Which Minkowski p-norm to use. A finite large p may cause a ValueError if overflow can occur.
output_type (string, optional) – Which container to use for output data. Options: ‘coo_matrix’ or ‘ndarray’. Default: ‘coo_matrix’.

Returns:

result – Sparse matrix representing the results in “dictionary of keys” format. If output_type is ‘ndarray’ an NxM distance matrix will be returned.

Return type:

coo_matrix or ndarray

Examples

You can compute a sparse distance matrix between two kd-trees:

>>> import cupy
>>> from cupyx.scipy.spatial import KDTree
>>> points1 = cupy.random.rand((5, 2))
>>> points2 = cupy.random.rand((5, 2))
>>> kd_tree1 = KDTree(points1)
>>> kd_tree2 = KDTree(points2)
>>> sdm = kd_tree1.sparse_distance_matrix(kd_tree2, 0.3)
>>> sdm.toarray()
array([[0.        , 0.        , 0.12295571, 0.        , 0.        ],
   [0.        , 0.        , 0.        , 0.        , 0.        ],
   [0.28942611, 0.        , 0.        , 0.2333084 , 0.        ],
   [0.        , 0.        , 0.        , 0.        , 0.        ],
   [0.24617575, 0.29571802, 0.26836782, 0.        , 0.        ]])

You can check distances above the max_distance are zeros:

>>> from cupyx.scipy.spatial import distance_matrix
>>> distance_matrix(points1, points2)
array([[0.56906522, 0.39923701, 0.12295571, 0.8658745 , 0.79428925],
   [0.37327919, 0.7225693 , 0.87665969, 0.32580855, 0.75679479],
   [0.28942611, 0.30088013, 0.6395831 , 0.2333084 , 0.33630734],
   [0.31994999, 0.72658602, 0.71124834, 0.55396483, 0.90785663],
   [0.24617575, 0.29571802, 0.26836782, 0.57714465, 0.6473269 ]])

__eq__(value, /)#: Return self==value.

__ne__(value, /)#: Return self!=value.

__lt__(value, /)#: Return self<value.

__le__(value, /)#: Return self<=value.

__gt__(value, /)#: Return self>value.

__ge__(value, /)#: Return self>=value.