-
Notifications
You must be signed in to change notification settings - Fork 4.2k
GH-49644: [Python] Support converting list of multi-dimensional arrays to FixedShapeTensor #50203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 4 commits
84154db
e695d01
daed54f
832acee
e7b7f69
1f8307f
fb2b78d
290b330
7c1bbce
cf4aa91
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -908,8 +908,20 @@ class PyListConverter : public ListConverter<T, PyConverter, PyConverterTrait> { | |
|
|
||
| Status AppendNdarray(PyObject* value) { | ||
| PyArrayObject* ndarray = reinterpret_cast<PyArrayObject*>(value); | ||
| OwnedRef flattened; | ||
| if (PyArray_NDIM(ndarray) != 1) { | ||
| return Status::Invalid("Can only convert 1-dimensional array values"); | ||
| // GH-49644: a fixed-size list (e.g. the storage of a fixed-shape tensor) | ||
| // can be built from a multi-dimensional array by flattening it in C | ||
| // order. The total number of elements must still match the list size, | ||
| // which the builder validates below. 0-dimensional arrays and | ||
| // variable-sized lists remain restricted to 1-dimensional values. | ||
|
rok marked this conversation as resolved.
Outdated
|
||
| if (PyArray_NDIM(ndarray) < 2 || this->list_type_->id() != Type::FIXED_SIZE_LIST) { | ||
| return Status::Invalid("Can only convert 1-dimensional array values"); | ||
|
rok marked this conversation as resolved.
Outdated
|
||
| } | ||
|
aboderinsamuel marked this conversation as resolved.
|
||
| flattened.reset(PyArray_Ravel(ndarray, NPY_CORDER)); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why did you decided to use
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. PyArray_Ravel(ndarray, NPY_CORDER) returns a 1-D, C-order view, which lets me reuse the existing 1-D append paths unchanged (the typed fast paths and the PySequence fallback). It's zero-copy when the input is already C-contiguous and only copies for non-contiguous/Fortran-order input, where a copy is unavoidable to get C-order data anyway, which also matches the fixed-shape-tensor's row-major storage. Alternatives: PyArray_Flatten(NPY_CORDER) gives the same result but always copies; a manual NpyIter/buffer loop could avoid even the view allocation but would mean duplicating the typed-append fast paths. Ravel was the smallest change with the best reuse/performance balance.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you have a look at I am not sure which would fit better but am interested in what you think.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also to note here - PyArray_Ravel seems to default to C order. So we should document somewhere that no matter the input arrays order we will return C order. (@AlenkaF please doublecheck me here :) )
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great, I think having more control can be helpful. Thanks for looking into it!
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you 🙏, i'm currently taking a look at the reference
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the pointer to the flags doc, @AlenkaF, really useful read 🙂 Done: switched to PyArray_CheckFromAny with NPY_ARRAY_C_CONTIGUOUS | NPY_ARRAY_ALIGNED, then view it as 1-D and read in C order. The ALIGNED flag was the piece I'd have missed on my own, so thanks for steering me there! For byte order I kept the explicit ISBYTESWAPPED check (moved up top per @rok) instead of NPY_ARRAY_NOTSWAPPED, and left out WRITEABLE since we only read here. Really appreciate you and @rok taking the time on this, learning a ton from it.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, the comment now notes the array is always flattened in C order regardless of the input's layout. @rok
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @AlenkaF I think this is ok now. Do you agree? |
||
| RETURN_IF_PYERROR(); | ||
| value = flattened.obj(); | ||
| ndarray = reinterpret_cast<PyArrayObject*>(value); | ||
| } | ||
| if (PyArray_ISBYTESWAPPED(ndarray)) { | ||
|
rok marked this conversation as resolved.
Outdated
|
||
| // TODO | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.