Python library: ScaNN

import numpy as np
import scann
Github repository about-python, path: /libraries/ScaNN/example/import-libraries.py
We create 3 million vectors: 1 million are centered around (1, 0, 0), 1 million around (0, 1, 0) and 1 million around (0, 0, 1):
v1 = np.array([
        np.array([
           np.random.normal(1, 0.01),
           np.random.normal(0, 0.01),
           np.random.normal(0, 0.01)
        ])
        for _ in range(1000000)
     ]).astype(np.float32)

v2 = np.array([
        np.array([
           np.random.normal(0, 0.01),
           np.random.normal(1, 0.01),
           np.random.normal(0, 0.01)
        ])
        for _ in range(1000000)
     ]).astype(np.float32)

v3 = np.array([
        np.array([
           np.random.normal(0, 0.01),
           np.random.normal(0, 0.01),
           np.random.normal(1, 0.01)
        ])
        for _ in range(1000000)
     ]).astype(np.float32)
Github repository about-python, path: /libraries/ScaNN/example/vectors.py
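The list comprehensions above call np.random.normal three million times per cluster, which is slow in pure Python. As a minimal sketch of a vectorized alternative, the same kind of cluster can be drawn with a single call. The helper name make_cluster and the variable v1_fast are hypothetical and not part of the original example, and the fixed seed is only there to make the sketch reproducible:

```python
import numpy as np

def make_cluster(center, n, sigma=0.01, rng=None):
    """Draw n float32 vectors, normally distributed around center (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    center = np.asarray(center, dtype=np.float64)
    # loc is broadcast against size, so each column gets its own mean.
    samples = rng.normal(loc=center, scale=sigma, size=(n, center.size))
    return samples.astype(np.float32)

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
v1_fast = make_cluster((1, 0, 0), 1_000_000, rng=rng)
```

The result has the same shape and dtype as v1; only the way the random numbers are drawn differs.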
We additionally create 10000 vectors that are centered around (0.5, 0.5, 0.5). The goal of this example is to find these vectors (which is why they're called needles here):
needles = np.array([
        np.array([
           np.random.normal(0.5, 0.01),
           np.random.normal(0.5, 0.01),
           np.random.normal(0.5, 0.01)
        ])
        for _ in range(10000)
     ]).astype(np.float32)
Github repository about-python, path: /libraries/ScaNN/example/needles.py
The vectors are combined and randomly shuffled:
data = np.concatenate( (v1, v2, v3, needles) )
np.random.shuffle(data)
Github repository about-python, path: /libraries/ScaNN/example/data.py
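After np.random.shuffle, it is no longer known which rows of data are needles. If the search result should be checked later, one option is to shuffle with an explicit permutation and apply it to a boolean mask as well. The following is a sketch on small stand-in arrays; the names haystack, needles_demo, data_demo and is_needle, and the sizes, are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily

# Small stand-ins for the large arrays built above.
haystack = rng.normal((1, 0, 0), 0.01, size=(300, 3)).astype(np.float32)
needles_demo = rng.normal(0.5, 0.01, size=(10, 3)).astype(np.float32)

combined = np.concatenate((haystack, needles_demo))

# Mark the rows that came from the needles before shuffling.
is_needle = np.zeros(len(combined), dtype=bool)
is_needle[len(haystack):] = True

# Apply the same permutation to the data and to the mask.
perm = rng.permutation(len(combined))
data_demo = combined[perm]
is_needle = is_needle[perm]
```

With such a mask, is_needle[neighbors] would tell whether the indices returned by a search point at needles.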
Creating a builder:
builder = scann.scann_ops_pybind.builder(
    data,
    num_neighbors    =  10,
    distance_measure = 'squared_l2'  # or, alternatively: 'dot_product'
)
Github repository about-python, path: /libraries/ScaNN/example/builder.py
builder = builder.tree(
    num_leaves           =   10000,
    num_leaves_to_search =   10000,
    training_sample_size = 1000000
)
Github repository about-python, path: /libraries/ScaNN/example/tree.py
builder = builder.score_ah(
    10000,
    anisotropic_quantization_threshold = 0.001
)
Github repository about-python, path: /libraries/ScaNN/example/score_ah.py
builder = builder.reorder(
    1000
)
Github repository about-python, path: /libraries/ScaNN/example/reorder.py
Creating a searcher:
searcher = builder.build()
Github repository about-python, path: /libraries/ScaNN/example/searcher.py
Executing the query:
query = np.array([ 0.5, 0.5, 0.5 ]).astype(np.float32)
neighbors, distances = searcher.search(query, final_num_neighbors=10)
Github repository about-python, path: /libraries/ScaNN/example/query.py
Printing the result:
for x in zip(neighbors,distances):
    print(x)
Github repository about-python, path: /libraries/ScaNN/example/result.py
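Because ScaNN performs an approximate search, it can be useful to compare its result against an exact search. The sketch below computes exact nearest neighbors by squared L2 distance with NumPy only. The function brute_force_neighbors and the small demo array are hypothetical and not part of the original example:

```python
import numpy as np

def brute_force_neighbors(data, query, k):
    """Exact k nearest neighbors of query by squared L2 distance (hypothetical helper)."""
    diffs = data - query
    sq_dists = np.einsum('ij,ij->i', diffs, diffs)  # row-wise squared norms
    # Select the k smallest distances, then sort just those k.
    idx = np.argpartition(sq_dists, k - 1)[:k]
    idx = idx[np.argsort(sq_dists[idx])]
    return idx, sq_dists[idx]

rng = np.random.default_rng(0)  # seed chosen arbitrarily
demo = np.concatenate((
    rng.normal((1, 0, 0), 0.01, size=(1000, 3)),  # haystack rows 0..999
    rng.normal(0.5, 0.01, size=(20, 3)),          # needle rows 1000..1019
)).astype(np.float32)

query_demo = np.array([0.5, 0.5, 0.5], dtype=np.float32)
exact_idx, exact_dist = brute_force_neighbors(demo, query_demo, k=10)
```

Applied to the full data array and query from above, the overlap between exact_idx and the neighbors returned by the searcher gives a rough recall estimate, for instance len(set(neighbors) & set(exact_idx)) / 10.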
