Support pandas 3.0 #1745
Moves dependency constraints to pyproject.toml. Makes requirements.txt a lockfile.
Fixes an incompatibility caused by click 8.3.0, which passes the default value as-is.
Fixes an incompatibility caused by pyreadstat 1.2.9, which changed original_variable_type from 'NULL' to None.
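One way to tolerate both pyreadstat behaviors is to normalize the value before comparing it. The sketch below is illustrative only: `normalize_variable_type` and the `original_types` dictionary are hypothetical names, not code from this PR, and the dictionary is a plain stand-in for metadata that would come from pyreadstat.

```python
def normalize_variable_type(raw):
    """Treat the old 'NULL' string and the new None as the same missing value."""
    if raw is None or raw == "NULL":
        return None
    return raw


# Hypothetical metadata: one real format, one old-style marker, one new-style marker.
original_types = {"AGE": "F8.2", "SEX": "NULL", "USUBJID": None}

normalized = {
    name: normalize_variable_type(value)
    for name, value in original_types.items()
}
```

With this in place, downstream checks only need to handle None, regardless of which pyreadstat version produced the metadata.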
Works around a behavior change in jsonpath-ng 1.8.0 where Child.str gets wrapped in parentheses.
Fixes tokenization errors when using dask 2024.8.1+. Starting with this version, dask enforces that tokens remain stable across pickle round-trips (dask/dask#11320). Capturing self in a lambda fails this check because instance objects can have non-deterministic pickle representations. Since calculate_variable_value_length is already a static method, replacing self with the class name is enough to remove the capture.
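The difference between the two lambda forms can be seen without dask by inspecting the closure. This is a minimal sketch: `VariableLengthCheck` and `closes_over` are hypothetical names, and only `calculate_variable_value_length` comes from the description above. It does not reproduce dask's tokenization check, only the capture that triggers it.

```python
class VariableLengthCheck:
    """Hypothetical stand-in for the class that owns the static method."""

    @staticmethod
    def calculate_variable_value_length(value):
        return len(str(value))

    def capturing_callback(self):
        # Referencing `self` stores the whole instance in the lambda's closure.
        return lambda value: self.calculate_variable_value_length(value)

    def class_level_callback(self):
        # Referencing the class by name keeps the instance out of the closure.
        return lambda value: VariableLengthCheck.calculate_variable_value_length(value)


def closes_over(func, obj):
    """Return True if `obj` is held in one of `func`'s closure cells."""
    return any(cell.cell_contents is obj for cell in (func.__closure__ or ()))


check = VariableLengthCheck()
before = check.capturing_callback()
after = check.class_level_callback()
```

Both callbacks compute the same result, but only `before` carries the instance along, which is what has to survive a pickle round-trip unchanged.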
Dask 2025.4.0 optimizes multiple DataFrames together, which exposes division mismatches and causes dask to raise an error. This change removes a source of repartitioning, preserving the divisions when assigning a pandas Series to a dask DataFrame.
Fixes a unit test to support pandas 2.2.0+. That pandas release fixes a sorting bug (pandas-dev/pandas#54611). This commit changes the expected results accordingly.
@filippsatverily Could you please resolve the conflicts on the branch so we can move towards validation and merge?
Pympler==1.1
pyreadstat==1.2.7
python-dotenv==1.0.0
pytz==2026.2
@filippsatverily was this dependency declaration intentional?
Yes, pandas 3 dropped it as an implicit dependency, but it's still used in cdisc-rules-engine in one or two places for timezone conversion.
@filippsatverily we have merged your other PR with some tweaks; we are now using a pyproject.toml for the dependency installation. I suspect this PR will need to be reworked given the changes, so I have moved this back to In-Progress. I am happy to re-review once this is ready.
Per separate discussion, splitting this PR into many smaller ones; I'll keep the list below updated:
Summary
Upgrades cdisc-rules-engine to support pandas 3.0. Stacked on top of #1713 (relax dependency constraints) — please merge that first.
Remove pandas <3.0 upper bound and add pytz (no longer a pandas transitive dep)
Replace applymap() with map() (removed in pandas 3.0)
inplace=True mutation patterns (pandas 3.0 Copy-on-Write)
StringDtype in comparison operators
DaskDataset.__setitem__
dd.DataFrame type annotation in parquet_reader
method= and downcast= kwargs removed in pandas 3.0
.apply(set) path in Distinct operation
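A few of the items above can be illustrated with a small sketch. It is not taken from the PR: the `elementwise` helper and the sample data are hypothetical, and it assumes the `method=` kwarg in question is the one on `fillna`, which `ffill()` replaces.

```python
import pandas as pd


def elementwise(frame, func):
    """Apply func to every cell, preferring DataFrame.map (pandas 2.1+)."""
    if hasattr(frame, "map"):
        return frame.map(func)
    # Older pandas only has applymap().
    return frame.applymap(func)


df = pd.DataFrame({"a": [" x ", "y "], "b": ["1", None]})

# applymap() -> map(): strip whitespace from string cells only.
stripped = elementwise(df, lambda v: v.strip() if isinstance(v, str) else v)

# Copy-on-Write friendly: assign the result back instead of mutating
# a selected column with inplace=True.
df["b"] = df["b"].fillna("0")

# Assumed replacement for fillna(method="ffill"): call ffill() directly.
series = pd.Series([1.0, None, 3.0])
filled = series.ffill()
```

Assigning results back to the frame works the same with and without Copy-on-Write, so this style should behave consistently on both pandas 2.x and 3.0.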