Supervised spatial indices by ljwolf · Pull Request #20 · pysal/spatialml

ljwolf · 2025-07-04T09:01:42Z

This adds supervised spatial indices to the package. Supervised spatial indices are a method of structuring a spatial training regime for local models. Basically, you fit a local model for each page in a spatial index, and split the page when the model can be improved by doing so. This adds quite a few classes:

QuadtreeRegressor: Use a Quadtree to structure a spatial feature engineering search. This trains a quadtree on the data where pages are split if they improve the regression loss. After the tree is built, we train a single final model on the set of feature:spatial_index_page interaction terms. Pruning "rolls up" feature:spatial_index_page interaction terms along the tree structure. This works on both rasters (X is n_row,n_col,n_features) and point (X is n_sites, n_features) data.
QuadtreeClassifier: Same as above but for discrete outcomes.
QuadtreeBoostingRegressor: Same as QuadtreeRegressor, but predictions accumulate down the tree. Thus, for each split, the parent prediction plus the child prediction is compared to the parent, rather than comparing the separate child model vs. the parent model in the child.
QuadtreeEnsembleRegressor: Same as QuadtreeRegressor, but instead of training a single global model on the discovered feature:spatial_index_page interaction terms, we use an ensemble of local models, one at each leaf in the spatial index. For any additive model, this will be the same as QuadtreeRegressor.
QuadtreeEnsembleClassifier: Same as QuadtreeEnsembleRegressor, but for discrete outcomes.
KDTreeRegressor: Like QuadtreeRegressor, but using a KDTree instead of a Quadtree. This means that each parent has two children (rather than four), and splits are made at the absolute residual-weighted median of the longest page side by default.
KDTreeClassifier: Like KDTreeRegressor, but for discrete outcomes.
KDTreeBoostingRegressor: Like QuadtreeBoostingRegressor, but with KDTrees.
KDTreeEnsembleRegressor: Like QuadtreeEnsembleRegressor, but with KDTrees.
KDTreeEnsembleClassifier: Like QuadtreeEnsembleClassifier, but with KDTrees.
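
To make the growth rule for QuadtreeRegressor concrete, here is a minimal sketch. It is not the code in this PR: `fit_sse`, `grow`, `min_leaf`, `max_depth`, and the relative `eps` threshold are hypothetical names and choices, ordinary least squares stands in for whatever estimator is actually used, and the improvement is measured in-sample.

```python
import numpy as np

def fit_sse(X, y):
    """Fit OLS with an intercept and return the sum of squared errors."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def grow(coords, X, y, idx, bounds, eps=0.2, min_leaf=20, max_depth=4, depth=0):
    """Return the leaves of the tree as a list of (bounds, row_indices)."""
    xmin, ymin, xmax, ymax = bounds
    leaf = [(bounds, idx)]
    if depth >= max_depth:
        return leaf
    # A quadtree page splits at its center into four children.
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    west = coords[idx, 0] < xmid
    south = coords[idx, 1] < ymid
    quads = [
        ((xmin, ymin, xmid, ymid), idx[west & south]),
        ((xmid, ymin, xmax, ymid), idx[~west & south]),
        ((xmin, ymid, xmid, ymax), idx[west & ~south]),
        ((xmid, ymid, xmax, ymax), idx[~west & ~south]),
    ]
    if any(len(child) < min_leaf for _, child in quads):
        return leaf
    # Keep the split only if the four child models beat the parent model
    # by more than a relative tolerance (assumed form of the split test).
    parent_sse = fit_sse(X[idx], y[idx])
    child_sse = sum(fit_sse(X[child], y[child]) for _, child in quads)
    if parent_sse - child_sse <= eps * parent_sse:
        return leaf
    leaves = []
    for child_bounds, child in quads:
        leaves.extend(
            grow(coords, X, y, child, child_bounds, eps, min_leaf, max_depth, depth + 1)
        )
    return leaves

# Synthetic point data: the slope on X flips sign between west and east.
rng = np.random.default_rng(0)
n = 800
coords = rng.uniform(0, 1, size=(n, 2))
X = rng.normal(size=(n, 1))
slope = np.where(coords[:, 0] < 0.5, 1.0, -1.0)
y = slope * X[:, 0] + 0.05 * rng.normal(size=n)

leaves = grow(coords, X, y, np.arange(n), (0.0, 0.0, 1.0, 1.0))
print(len(leaves), "leaves")
```

On this synthetic data the root page splits once, because a single pooled slope fits badly, and the four children are then left alone because further splits barely change the loss.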

Todo:

tests
Hilbert RTree for areal/lattice data
set split_test='eps' by default.
Allow splines. Each page instantiates a knot (KDTree at the median, Quadtree at the center). Right now, we just fit piecewise-linear models over the domain. But, one could apply a basis function in b(X,Y), too, which would ensure that predictions are smooth from page to page.
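
For reference, the split (and prospective knot) locations mentioned above could be computed roughly like this. It is a sketch under assumptions: the function names are hypothetical, and "absolute residual-weighted median" is read here as the median of the coordinates along the longest page side, weighted by the absolute residuals of the page's model.

```python
import numpy as np

def quadtree_split(bounds):
    """Quadtree pages split at the center of their bounding box."""
    xmin, ymin, xmax, ymax = bounds
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)

def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

def kdtree_split(coords, residuals, bounds):
    """Split along the longest page side at the |residual|-weighted median."""
    xmin, ymin, xmax, ymax = bounds
    axis = 0 if (xmax - xmin) >= (ymax - ymin) else 1
    return axis, weighted_median(coords[:, axis], np.abs(residuals))

coords = np.array([[0.2, 0.1], [0.8, 0.4], [1.2, 0.6], [1.9, 0.9]])
residuals = np.array([0.1, -0.1, 0.1, -2.0])
center = quadtree_split((0.0, 0.0, 2.0, 1.0))
axis, cut = kdtree_split(coords, residuals, (0.0, 0.0, 2.0, 1.0))
print(center, axis, cut)
```

Weighting by absolute residuals pulls the cut toward the part of the page where the current model fits worst, so the large residual at x = 1.9 drags the cut to that point.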

codecov · 2025-07-04T09:05:17Z

Codecov Report

Attention: Patch coverage is 0% with 396 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@5858f2f). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
gwlearn/quadtree.py	0.00%	396 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #20   +/-   ##
=======================================
  Coverage        ?   59.72%           
=======================================
  Files           ?        6           
  Lines           ?     1100           
  Branches        ?        0           
=======================================
  Hits            ?      657           
  Misses          ?      443           
  Partials        ?        0


ljwolf · 2025-07-04T11:19:16Z

After researching it, I am going to say that adding splines here is out of scope for this PR. I thought it would be as easy as applying a simple B-spline recursion function to the dummy variable matrix, but I do not immediately see how to do this.

Naively, we'd do the following.

Start with a global model.
Consider candidate split in branch j. This split introduces extra knots (KDTree adds 2, Quadtree adds 4). Calculate a new scipy.interpolate.bisplrep(*data_coords, z=y, tx=new_knots, ty=new_knots).
Test if the new spline improves the score by eps. If so, keep the new knots and add the resulting splits to the queue. If not, consider branch j fathomed.
After growing, use the same "roll-up" pruning procedure; check the feature importance of the sets of spline terms by knot. If those are not important (in sum), zero that set of coefficients for that feature.
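
The growing steps above (not the pruning step) can be sketched as a small loop. This is an illustration under heavy assumptions, not a proposal for the implementation: it swaps `scipy.interpolate.bisplrep` for an additive piecewise-linear hinge basis fit by least squares in NumPy, each candidate split contributes one x knot and one y knot at the page center, the knots act globally rather than only inside branch j, and `basis`, `sse`, `grow_knots`, `eps`, and `max_depth` are all hypothetical names.

```python
from collections import deque
import numpy as np

def basis(coords, knots_x, knots_y):
    """Intercept, linear terms, and one hinge max(0, t - knot) per knot."""
    x, y = coords[:, 0], coords[:, 1]
    cols = [np.ones(len(x)), x, y]
    cols += [np.maximum(0.0, x - k) for k in knots_x]
    cols += [np.maximum(0.0, y - k) for k in knots_y]
    return np.column_stack(cols)

def sse(coords, z, knots_x, knots_y):
    """Least-squares fit on the hinge basis; returns the sum of squared errors."""
    B = basis(coords, knots_x, knots_y)
    beta, *_ = np.linalg.lstsq(B, z, rcond=None)
    resid = z - B @ beta
    return float(resid @ resid)

def grow_knots(coords, z, bounds, eps=0.05, max_depth=3):
    knots_x, knots_y = [], []
    best = sse(coords, z, knots_x, knots_y)        # start with a global model
    queue = deque([(bounds, 0)])
    while queue:
        (xmin, ymin, xmax, ymax), depth = queue.popleft()
        if depth >= max_depth:
            continue
        # Candidate split: knots at the page center (quadtree-style).
        xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
        trial_x = sorted(set(knots_x) | {xmid})
        trial_y = sorted(set(knots_y) | {ymid})
        trial = sse(coords, z, trial_x, trial_y)
        if best - trial > eps * best:
            # Improvement exceeds the tolerance: keep knots, queue the children.
            knots_x, knots_y, best = trial_x, trial_y, trial
            queue.extend([
                ((xmin, ymin, xmid, ymid), depth + 1),
                ((xmid, ymin, xmax, ymid), depth + 1),
                ((xmin, ymid, xmid, ymax), depth + 1),
                ((xmid, ymid, xmax, ymax), depth + 1),
            ])
        # Otherwise the branch is fathomed and nothing is queued.
    return knots_x, knots_y, best

# Synthetic surface with a single kink at x = 0.5.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(500, 2))
z = np.abs(coords[:, 0] - 0.5) + 0.01 * rng.normal(size=500)

knots_x, knots_y, final_sse = grow_knots(coords, z, (0.0, 0.0, 1.0, 1.0))
print(knots_x, knots_y, final_sse)
```

On this surface the root split is kept because the knot at x = 0.5 captures the kink, and all four child splits are fathomed. A real bivariate spline version would need a tensor-product basis and a decision about how knots added in one branch affect the others.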

I think introducing splines would make this a SpatialMARS. If this were desired, I'd need to dig into the bivariate spline literature to see whether this style of dynamic knot generation has already been figured out. It seems highly likely someone has done this.

ljwolf added 3 commits July 4, 2025 09:29

add tree learners (1b3a9ec)
add ensemble versions (a31dcec)
whitespace (f0ba39b)

ljwolf marked this pull request as draft July 4, 2025 09:02

ljwolf wants to merge 3 commits into pysal:main from ljwolf:main