@maxwelljin
Contributor

Currently, the subindex feature is not available on the exact in-memory NNIndexer. This PR aims to support nested documents for ExactNNIndexer.

Signed-off-by: maxwelljin2 <gejin@berkeley.edu>
@maxwelljin maxwelljin linked an issue Jun 7, 2023 that may be closed by this pull request
@maxwelljin
Contributor Author

This PR introduces support for sub-indexing in exactNNSearch by enabling nested documents. Here's a summary of the changes:

  1. Abstract class methods are used to create a parent_id and run the initialization for each nested document.
  2. The backend stores everything in the new_schema type, including the parent_id, but users need the raw type. To handle this, the PR includes helper methods that convert stored documents back into the original type the user passed to the Indexer.
  3. To ensure that the data type stored internally in the backend is not changed during casting, a copy method is included. The built-in document copy method does not support PyTorch tensors, so a new shallow copy method was created and tested.
  4. The PR also includes test code, which is similar to the tests for other backends.

These changes aim to enhance the functionality of exactNNSearch by supporting nested documents.
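
To make points 2 and 3 concrete, here is a minimal sketch of the idea in plain Python. It does not use the code from this PR: `Chunk`, `StoredChunk`, `to_stored`, and `to_raw` are hypothetical names, and dataclasses stand in for the real document types. It only illustrates storing a nested document with a `parent_id` and handing back the user's original type through a shallow copy.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Chunk:
    """Hypothetical nested document type, as the user defines it."""
    id: str
    embedding: List[float]


@dataclass
class StoredChunk(Chunk):
    """Hypothetical stored variant: the same fields plus parent_id."""
    parent_id: str = ""


def to_stored(chunk: Chunk, parent_id: str) -> StoredChunk:
    # Attach the parent's id when the nested document is indexed.
    return StoredChunk(id=chunk.id, embedding=chunk.embedding, parent_id=parent_id)


def to_raw(stored: StoredChunk) -> Chunk:
    # Shallow copy: field values are reused, not deep-copied, and the
    # stored object itself is left untouched.
    values = {f.name: getattr(stored, f.name) for f in fields(Chunk)}
    return Chunk(**values)


stored = to_stored(Chunk(id="c1", embedding=[0.1, 0.2]), parent_id="p1")
raw = to_raw(stored)
```

After the round trip, `raw` is a plain `Chunk` without `parent_id`, while `stored` keeps its type and its `parent_id`. Because the copy is shallow, `raw.embedding` and `stored.embedding` are the same object.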

@maxwelljin maxwelljin marked this pull request as ready for review June 7, 2023 09:43
Member

@JoanFM JoanFM left a comment

What happens when I call persist? Does it persist and reload all the subindices?

@maxwelljin
Contributor Author

I'll take a look at that. All doc contents should be persisted. If the load_binary function calls the initialization, it should reconstruct the subindices. I'll try some examples.

@JoanFM
Member

JoanFM commented Jun 7, 2023

> I'll take a look at that. All doc contents should be persisted. If the load_binary function calls the initialization, it should reconstruct the subindices. I'll try some examples.

And test it

Signed-off-by: maxwelljin2 <gejin@berkeley.edu>
@maxwelljin
Contributor Author

This new commit supports persisting subindex documents and includes the relevant test.
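
As a rough illustration of the persist-and-reload behaviour discussed in this thread, the sketch below uses a toy index in plain Python. `ToyIndex` and its methods are hypothetical and are not the implementation in this PR. The assumption it shows is that only flat documents (each nested one carrying a `parent_id`) are written to disk, and that the parent-to-children mapping is rebuilt during initialization when the file is loaded.

```python
import json
import os
import tempfile
from collections import defaultdict


class ToyIndex:
    """Hypothetical stand-in for an in-memory index with one subindex."""

    def __init__(self, path=None):
        self.docs = {}      # root documents by id
        self.sub_docs = {}  # nested documents by id, each with a parent_id
        if path is not None and os.path.exists(path):
            with open(path) as f:
                data = json.load(f)
            self.docs = data["docs"]
            self.sub_docs = data["sub_docs"]
        # Initialization always rebuilds the subindex mapping.
        self._rebuild()

    def _rebuild(self):
        self.children = defaultdict(list)
        for sub_id, sub in self.sub_docs.items():
            self.children[sub["parent_id"]].append(sub_id)

    def index(self, doc_id, chunks):
        self.docs[doc_id] = {"id": doc_id}
        for i, text in enumerate(chunks):
            sub_id = f"{doc_id}-{i}"
            self.sub_docs[sub_id] = {"id": sub_id, "text": text, "parent_id": doc_id}
        self._rebuild()

    def persist(self, path):
        # Only the flat documents are written; the mapping is derived data.
        with open(path, "w") as f:
            json.dump({"docs": self.docs, "sub_docs": self.sub_docs}, f)


with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "index.json")
    original = ToyIndex()
    original.index("a", ["x", "y"])
    original.persist(file_path)
    reloaded = ToyIndex(file_path)
```

Here `reloaded` ends up with the same nested documents and the same parent-to-children mapping as `original`, without the mapping itself ever being stored.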

my_tens: NdArray[30]


def test_subindex_index(tmp_path):
Member

I think we should also check that find and find_subindex work

Contributor Author

Sure, I'll add that
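
For reference, a check of this kind could look roughly like the sketch below. It is written against a toy exact nearest-neighbour search in plain Python rather than the indexer in this PR: `toy_find` and `toy_find_subindex` are hypothetical helpers, and the real `find` and `find_subindex` signatures and return values may differ.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_find(docs, query, limit):
    # Exact search: score every document and keep the best `limit`.
    ranked = sorted(docs, key=lambda d: cosine(d["vec"], query), reverse=True)
    return ranked[:limit]


def toy_find_subindex(roots, query, limit):
    # Search the nested documents, then map each hit back to its root
    # document through parent_id.
    flat = [dict(chunk, parent_id=root["id"]) for root in roots for chunk in root["chunks"]]
    hits = toy_find(flat, query, limit)
    by_id = {root["id"]: root for root in roots}
    return [by_id[hit["parent_id"]] for hit in hits], hits


def test_find_and_find_subindex():
    roots = [
        {"id": "r0", "vec": [1.0, 0.0], "chunks": [
            {"id": "r0-0", "vec": [1.0, 0.1]},
            {"id": "r0-1", "vec": [0.0, 1.0]},
        ]},
        {"id": "r1", "vec": [0.0, 1.0], "chunks": [
            {"id": "r1-0", "vec": [0.9, 0.5]},
        ]},
    ]
    # Root-level search returns the closest root document.
    assert [d["id"] for d in toy_find(roots, [1.0, 0.0], limit=1)] == ["r0"]
    # Subindex search returns the closest nested document and its root.
    parents, hits = toy_find_subindex(roots, [0.0, 1.0], limit=1)
    assert hits[0]["id"] == "r0-1"
    assert parents[0]["id"] == "r0"
```

The point of checking both calls is that a subindex hit is only useful if it can be traced back to the right root document.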

Signed-off-by: maxwelljin2 <gejin@berkeley.edu>
Development

Successfully merging this pull request may close these issues.

Provide SubIndex feature on ExactNNSearchIndexer

2 participants