# [Numpy-discussion] huge array calculation speed

Andrew Dalke dalke@dalkescientific....
Thu Jul 10 18:48:12 CDT 2008

```On Jul 10, 2008, at 6:38 PM, Dan Lussier wrote:
> What I'm trying to do is to calculate the total total potential
> energy and coordination number of each atom within a relatively large
> simulation.

> you could try a three-dimensional grid (if your atoms aren't
> too clumpy) or octrees (if they are somewhat clumpy); kd-trees are
> probably overkill (since they're designed for higher-dimensional
> problems).

I implemented something like what you did in VMD about 14(!) years ago.
(VMD is a molecular structure visualization program.)  I needed to
find which atoms might be bonded to each other.  I assume you have
a cutoff value, which means you don't need to worry about the general
nearest-neighbor problem.

Molecules are nice because the distributions are not clumpy.  Atoms
don't get that close to nor that far from other atoms.  I implemented
a grid.  It goes something like this:

import collections

# Search within 3 A
d = 3.0

coordinates = [
("C1", 34.287,  50.970, 115.006),
("O1", 34.972,  51.144, 113.870),
("C2", 33.929,  52.255, 115.739),
("N2", 34.753,  52.387, 116.954),
("C3", 32.448,  52.219, 116.121),
("O3", 32.033,  50.877, 116.336),
("C4", 31.528,  52.817, 115.054),
("C5", 30.095,  53.013, 115.558),
("C6", 29.226,  53.835, 114.609),
("C7", 29.807,  55.217, 114.304),
("C8", 29.092,  55.920, 113.147),
("C9", 29.525,  57.375, 112.971),
("C10", 28.409,  58.267, 112.422),
("C11", 28.828,  59.734, 112.294),
("C12", 27.902,  60.542, 111.385),
("C13", 26.617,  60.996, 112.085),
("C14", 26.182,  62.401, 111.667),
("C15", 24.739,  62.731, 112.054),
("C16", 24.251,  64.046, 111.441),
("C17", 23.026,  64.624, 112.150),
("C18", 22.631,  66.007, 111.623),]

def dict_of_list():
return collections.defaultdict(list)

def dict_of_dict():
return collections.defaultdict(dict_of_list)

grid = collections.defaultdict(dict_of_dict)

for atom in coordinates:
name,x,y,z = atom
i = int(x/d)
j = int(y/d)
k = int(z/d)
grid[i][j][k].append(atom)

query_name, query_x, query_y, query_z = coordinates[8]
query_i = int(query_x/d)
query_j = int(query_y/d)
query_k = int(query_z/d)

# Given the search distance 'd', only need to check cells up to 1
unit away
within_names = set()
for i in range(query_i-1, query_i+2):
for j in range(query_j-1, query_j+2):
for k in range(query_k-1, query_k+2):
for atom in grid[i][j][k]:
name, x, y, z = atom
print "Check", atom,
query_d2 = (x-query_x)**2+(y-query_y)**2+(z-query_z)**2
if query_d2 < d*d:
print "Within!", query_d2
else:
print "Too far", query_d2

print len(within_names), "were close enough"

# Linear search to verify
count = 0
for name, x, y, z in coordinates:
query_d2 = (x-query_x)**2+(y-query_y)**2+(z-query_z)**2
if query_d2 < d*d:
print "Within", name
if name not in within_names:
raise AssertionError(name)
count += 1

if count != len(within_names):
raise AssertionError("count problem")

You can also grab the KDTree from Biopython, which is implemented in C.

http://www.biopython.org/DIST/docs/api/Bio.KDTree.KDTree'-module.html

It was designed for just this task.

Andrew
dalke@dalkescientific.com

```