feat: check if a document is already index #1633
Signed-off-by: maxwelljin2 <gejin@berkeley.edu>
docarray/index/backends/hnswlib.py
Outdated
    rows = self._sqlite_cursor.fetchall()
    return len(rows) > 0
else:
    raise NotImplementedError
This does not seem like the proper exception to raise here.
This PR is still in progress; I'll change it later :) It should give users a proper hint.
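To make the exception discussion concrete, here is a minimal sketch of a SQLite-backed membership check that raises an explanatory `TypeError` instead of a bare `NotImplementedError`. It is not the code from this PR: the class `TinySqliteIndex`, the table `docs`, the column `doc_id`, and the choice of `TypeError` are all assumptions made for illustration.

```python
import sqlite3


class TinySqliteIndex:
    """Hypothetical stand-in for an index that stores document ids in SQLite."""

    def __init__(self) -> None:
        # In-memory database; table and column names are illustrative only.
        self._conn = sqlite3.connect(":memory:")
        self._cursor = self._conn.cursor()
        self._cursor.execute("CREATE TABLE docs (doc_id TEXT PRIMARY KEY)")

    def add(self, doc_id: str) -> None:
        self._cursor.execute(
            "INSERT OR IGNORE INTO docs (doc_id) VALUES (?)", (doc_id,)
        )
        self._conn.commit()

    def __contains__(self, item: object) -> bool:
        if not isinstance(item, str):
            # Tell the caller what went wrong instead of raising a bare
            # NotImplementedError.
            raise TypeError(
                f"expected a document id of type str, got {type(item).__name__}"
            )
        # Fetch at most one row; no need to load every match.
        self._cursor.execute(
            "SELECT 1 FROM docs WHERE doc_id = ? LIMIT 1", (item,)
        )
        return self._cursor.fetchone() is not None
```

With this sketch, `"a" in index` runs a single-row lookup, and passing an unsupported type produces a message that names the offending type.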
samsja
left a comment
lgtm. Good job 🎩
docarray/index/abstract.py
Outdated
    return False

if safe_issubclass(type(item), BaseDoc):
    docs = self._get_all_documents()
I don't think this is the right way to do it, right? I think you should call __contains__ on every subindex, and if any of them returns True, the result is True.
It's implemented in lines 1205-1207. I'll remove all uses of the _get_all_documents method, so it will only look for sub-documents inside the DocArray (so the meaning of this subindex_contains method will be similar to that of subindex_find).
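The approach discussed in this thread (ask each subindex, and return True as soon as one matches) can be sketched as follows. This is a toy model, not the docarray implementation: `ToyIndex` and its `ids` and `subindices` fields are hypothetical, and only the method name `subindex_contains` is taken from the comment above.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyIndex:
    """Hypothetical index holding its own ids plus named subindices."""

    ids: Set[str] = field(default_factory=set)
    subindices: Dict[str, "ToyIndex"] = field(default_factory=dict)

    def __contains__(self, doc_id: str) -> bool:
        # Membership at this level only.
        return doc_id in self.ids

    def subindex_contains(self, doc_id: str) -> bool:
        # Delegate to every subindex (recursively); any() short-circuits
        # on the first subindex that reports a match.
        return any(
            doc_id in sub or sub.subindex_contains(doc_id)
            for sub in self.subindices.values()
        )
```

Because `any()` stops at the first match, no subindex has to materialize all of its documents, which is the cost the `_get_all_documents` approach would incur.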
This PR enables the indexer, for all supported backends, to check whether a document has already been indexed. We aim to support the in_memory, hnswlib, elastic, qdrant, and weaviate backends. Because each backend stores and indexes documents differently, the check has to be tailored to each backend.
Progress: