-
Notifications
You must be signed in to change notification settings - Fork 234
feat(v2): add from and to pandas df for documentarray #1161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai>
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
…to-pandas Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
pyproject.toml
Outdated
| rich = ">=13.1.0" | ||
| lz4 = {version= ">=1.0.0", optional = true} | ||
| pydub = {version = "^0.25.1", optional = true } | ||
| pandas = ">=1.1.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be an optional dependecy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add it to common or a as a separate extra pandas ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate. we will reorginaze at some point anyway
samsja
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look goods, I added some comments
docarray/array/array/io.py
Outdated
| ) | ||
|
|
||
| @staticmethod | ||
| def access_path_dict_to_nested_dict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't you already implement something like this in your previous PR? or am i mixing things up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the function _access_path_to_dict that is used within _access_path_dict_to_nested_dict that I implemented in the last PR.
_access_path_to_dict: transforms one access path to nested dict_ access_path_dict_to_nested_dict: transforms dict with (multiple) access path keys to a joint nested dict by calling the former func
I will move the latter to that other helper file, to keep those functions together yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok sounds good, yes keeping them in the same helper file would be ideal
docarray/array/array/io.py
Outdated
| return json.dumps([doc.json() for doc in self]) | ||
|
|
||
| @classmethod | ||
| def _check_for_valid_document_type(cls) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer returning a bool and rasing at the call site. That way it is clearer where the error actually occurs, and this method can be re-used in contexts where only the information is needed, without wanting to raise an exception
docarray/array/array/io.py
Outdated
| ) | ||
|
|
||
| @classmethod | ||
| def _check_for_valid_access_paths(cls, field_names: Optional[List[str]]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
docarray/array/array/io.py
Outdated
|
|
||
| @classmethod | ||
| def _check_for_valid_document_type(cls) -> None: | ||
| if cls.document_type == AnyDocument: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there are plan to make this check more sophisticated as part of this PR? otherwise this doesn't need to be a method imo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, just tried to extract all the duplicate code but maybe I went a bit overboard hehe, moved it back to from_csv and from_pandas again
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai>
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
samsja
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should make pandas install optional. Otherwise looks good
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
|
📝 Docs are deployed on https://ft-feat-from-to-pandas--jina-docs.netlify.app 🎉 |
* feat: load from and to csv Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: from to csv Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * feat: add access path to dict Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: from to csv Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: clean up Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: add docstring and update tmpdir in test Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: merge nested dicts Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: clean up Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: clean up Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * test: update test Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply samis suggestion from code review Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply suggestions from code review wrt access paths Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply johannes suggestion Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai> * fix: apply johannes suggestion Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai> * fix: apply suggestions from code review Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply suggestions from code review Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: typos Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * refactor: move helper functions to helper file Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * test: fix fixture Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * feat: add to and from pandas df for documentarray Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * chore: add pandas to pyproject.toml Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: update docstring Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: mypy Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: clean up Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply suggestions from code review Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply suggestion from johannes Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai> * fix: apply suggestion from johannes Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai> * fix: apply suggestions from code review Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply suggestion Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply suggestions Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> --------- Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai> Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Arnav Zutshi <arnzut1324@gmail.com>
Add from and to pandas to DocumentArray IOMixin
DocumentArray[Doc].from_pandas(df)da.to_pandas()