Supervised spatial indices #20
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #20 +/- ##
=======================================
Coverage ? 59.72%
=======================================
Files ? 6
Lines ? 1100
Branches ? 0
=======================================
Hits ? 657
Misses ? 443
Partials ? 0
After researching it, I am going to say that adding splines here is out of scope for this PR. I thought it would be as easy as applying a simple B-spline recursion function to the dummy variable matrix, but I do not immediately see how to do this. Naively, we'd do the following.
I think introducing splines would make this a SpatialMARS. With a little bit of digging, I think that if this were desired, I'd need to look into the bivariate spline literature to see whether this style of dynamic knot generation has already been figured out. It seems highly likely someone has done this.
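For reference, the B-spline recursion mentioned here is usually written as the Cox-de Boor recursion. The sketch below is a minimal NumPy version for fixed knots, plus a tensor-product helper for two coordinates. The names `bspline_basis` and `tensor_basis` are hypothetical and not part of this PR, and the sketch does not address the open question of generating knots dynamically from the tree.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x via the Cox-de Boor recursion.

    Returns an array of shape (len(x), len(knots) - degree - 1).
    """
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    basis = ((x[:, None] >= knots[None, :-1]) & (x[:, None] < knots[None, 1:])).astype(float)
    for k in range(1, degree + 1):
        n_out = len(knots) - k - 1
        new = np.zeros((len(x), n_out))
        for i in range(n_out):
            left_den = knots[i + k] - knots[i]
            right_den = knots[i + k + 1] - knots[i + 1]
            # Terms with a zero denominator (repeated knots) are treated as zero.
            if left_den > 0:
                new[:, i] += (x - knots[i]) / left_den * basis[:, i]
            if right_den > 0:
                new[:, i] += (knots[i + k + 1] - x) / right_den * basis[:, i + 1]
        basis = new
    return basis

def tensor_basis(x, y, knots_x, knots_y, degree):
    """Bivariate basis as the row-wise outer product of two univariate bases."""
    bx = bspline_basis(x, knots_x, degree)
    by = bspline_basis(y, knots_y, degree)
    return np.einsum("ni,nj->nij", bx, by).reshape(len(bx), -1)

# Toy usage: clamped cubic knots on [0, 1) with a single interior knot at 0.5.
knots = np.array([0, 0, 0, 0, 0.5, 1, 1, 1, 1], dtype=float)
x = np.linspace(0.0, 0.99, 50)
y = x[::-1]
B = tensor_basis(x, y, knots, knots, degree=3)
```

A fixed tensor-product grid like this places knots globally, which is exactly what a tree-driven scheme would need to avoid, so it only illustrates the recursion itself.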
This adds supervised spatial indices to the package. Supervised spatial indices are a method of structuring a spatial training regime for local models. Basically, you fit a local model for each page in a spatial index, and split the page when the model can be improved by doing so. This adds quite a few classes:
- `QuadtreeRegressor`: Use a Quadtree to structure a spatial feature engineering search. This trains a quadtree on the data where pages are split if they improve the regression loss. After the tree is built, we train a single final model on the set of `feature:spatial_index_page` interaction terms. Pruning "rolls up" `feature:spatial_index_page` interaction terms along the tree structure. This works on both raster (`X` is `n_row, n_col, n_features`) and point (`X` is `n_sites, n_features`) data.
- `QuadtreeClassifier`: Same as above, but for discrete outcomes.
- `QuadtreeBoostingRegressor`: Same as `QuadtreeRegressor`, but predictions accumulate down the tree. Thus, for each split, the parent prediction plus the child prediction is compared to the parent, rather than comparing the separate child model vs. the parent model in the child.
- `QuadtreeEnsembleRegressor`: Same as `QuadtreeRegressor`, but instead of training a single global model on the discovered feature:spatial index interaction terms, we use an ensemble of local models at each leaf in the spatial index. For any additive model, this will be the same as `QuadtreeRegressor`.
- `QuadtreeEnsembleClassifier`: Same as `QuadtreeEnsembleRegressor`, but for discrete outcomes.
- `KDTreeRegressor`: Like `QuadtreeRegressor`, but using a KDTree instead of a Quadtree. This means that each parent has two children (rather than four), and splits are made at the absolute residual-weighted median of the longest page side by default.
- `KDTreeClassifier`: Like `KDTreeRegressor`, but for discrete outcomes.
- `KDTreeBoostingRegressor`: Like `QuadtreeBoostingRegressor`, but with KDTrees.
- `KDTreeEnsembleRegressor`: Like `QuadtreeEnsembleRegressor`, but with KDTrees.
- `KDTreeEnsembleClassifier`: Like `QuadtreeEnsembleClassifier`, but with KDTrees.

Todo:
- `split_test='eps'` by default.
- Allow splines. Each page instantiates a knot (KDTree at the median, Quadtree at the center). Right now, we just fit piecewise-linear models over the domain. But one could apply a basis function in `b(X,Y)`, too, which would ensure that predictions are smooth from page to page.
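The core idea in the description (fit a local model in each page, and split the page only when the children improve the loss) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `fit_sse`, `grow_quadtree`, `min_leaf`, `max_depth`, and the relative-improvement threshold `eps` are hypothetical names, and `eps` here is not necessarily what `split_test='eps'` means in the package. Ordinary least squares stands in for whatever local model is actually used.

```python
import numpy as np

def fit_sse(X, y):
    """Fit least squares with an intercept and return the sum of squared errors."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def grow_quadtree(coords, X, y, bounds, min_leaf=20, eps=0.1, depth=0, max_depth=6):
    """Return the leaf pages as a list of (xmin, ymin, xmax, ymax) boxes."""
    xmin, ymin, xmax, ymax = bounds
    if depth >= max_depth:
        return [bounds]
    # Quadtree pages split at the center of the page.
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    east = coords[:, 0] >= xmid
    north = coords[:, 1] >= ymid
    quad = east.astype(int) + 2 * north.astype(int)
    child_bounds = [
        (xmin, ymin, xmid, ymid), (xmid, ymin, xmax, ymid),
        (xmin, ymid, xmid, ymax), (xmid, ymid, xmax, ymax),
    ]
    masks = [quad == q for q in range(4)]
    if min(int(m.sum()) for m in masks) < min_leaf:
        return [bounds]
    parent_sse = fit_sse(X, y)
    child_sse = sum(fit_sse(X[m], y[m]) for m in masks)
    # Four child models always fit at least as well in-sample as one parent
    # model, so require a relative improvement larger than eps before splitting.
    if parent_sse - child_sse <= eps * parent_sse:
        return [bounds]
    leaves = []
    for m, b in zip(masks, child_bounds):
        leaves += grow_quadtree(coords[m], X[m], y[m], b, min_leaf, eps, depth + 1, max_depth)
    return leaves

# Toy point data: the slope on the feature flips sign between west and east.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(2000, 2))
X = rng.normal(size=(2000, 1))
slope = np.where(coords[:, 0] < 0.5, 1.0, -1.0)
y = slope * X[:, 0] + rng.normal(scale=0.01, size=2000)
leaves = grow_quadtree(coords, X, y, bounds=(0.0, 0.0, 1.0, 1.0))
```

On this toy data the root page should split once, because a single global slope fits poorly, while the four children are internally homogeneous and should not split further. A KDTree variant would differ mainly in producing two children per split and choosing the split location from the data rather than the page center.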