ignore: scope compiled parent matchers by root#3420
Conversation
| }; | ||
| let mut compiled = self.0.compiled.write().unwrap(); | ||
| if let Some(weak) = compiled.get(parent.as_os_str()) { | ||
| if let Some(weak) = compiled.get(&key) { |
There was a problem hiding this comment.
I'm not very familiar with this code base, but this makes me wonder if it's still worth caching here.
for roots: /tmp/project/src, /tmp/project/tests:
- Before: The entry for
/tmp/projectwas shared - After: Each root uses its own pre-built
Ignore, making the caching here useless except for the case where someone callsadd_parentstwice with the exact same paths
The caching here also served two purposes:
- Avoid parsing the same ignore files multiple times
- Deduplicate errors (because we only built the ignore once)
Now, the main issue why sharing the Ignore across base paths is incorrect is because we set ignore.absolute_base to Some(absolute_base).
I haven't traced through all the changes necessary but I think a better solution is to decouple absolute_path from what we store in the cache, e.g. by:
- Moving
absolute_pathfromIgnoreInnerto theIgnorestruct (makes cloning more expensive) - Use a different struct for caching
There was a problem hiding this comment.
Yeah I think @MichaReiser is right here. In particular, I think this solution will fall apart when there are a lot of roots. For example, rg foo bar/** where bar contains many paths.
I'm on mobile so I don't have a good idea right now of what the simplest change is yet. But Micha's ideas at a very high level seem plausible.
There was a problem hiding this comment.
Thanks, I pushed a followup aligning with Micha's first suggestion. This largely restores the performance of the pre-existing code while still fixing the bug.
| Arc<IgnoreInner>, | ||
| // Parent matchers are cached independently of the root they are used | ||
| // from, but matching them still needs the current root to rewrite paths. | ||
| Option<Arc<PathBuf>>, |
There was a problem hiding this comment.
If using a named field is too disruptive, can we document what root means in this context.
There was a problem hiding this comment.
Moved to named fields, wasn't too bad
| } | ||
| } | ||
|
|
||
| /// An iterator over all parents of an ignore matcher, including itself. |
There was a problem hiding this comment.
This comment here is now misplaced
|
I got this benchmark from Codex that I think I believe. It finds a regression on the initial commit in this PR, but not on master or on the last commit in this PR. Do: I find this quite promising! |
ca8f3c2 to
9818a2c
Compare
9818a2c to
aff90bc
Compare
Parent matchers are cached by directory, but they also stored the canonicalized path passed to `Ignore::add_parents`. Reusing cached matchers while walking another root could therefore rewrite paths relative to the wrong root and apply scoped parent gitignore rules incorrectly. Keep the cached matcher chain independent of the walk root. Carry the absolute base path in `Ignore` instead, and add a regression test that checks cached parent matchers can be reused across roots without sharing their path semantics. Fixes BurntSushi#3419
aff90bc to
b6722c1
Compare
|
|
||
| #[derive(Clone, Copy)] | ||
| pub(crate) struct IgnoreRef<'a> { | ||
| inner: &'a IgnoreInner, |
There was a problem hiding this comment.
We use a named field here to reduce churn in the diff
|
I reworded the commits. I also had codex write a few benchmarks and verified that none of them regress (beyond noise). We could consider adding those benchmarks to the repo, but I won't do so as part of this PR. @BurntSushi I think this is good to go Benchmarks/*!
This module benchmarks ignore file matching.
*/
#![feature(test)]
extern crate test;
use std::{
fmt::Write as _,
fs,
path::{Path, PathBuf},
sync::atomic::{AtomicUsize, Ordering},
time::{SystemTime, UNIX_EPOCH},
};
use ignore::{WalkBuilder, WalkState};
const GITIGNORE_PATTERNS: usize = 100;
const DEEP_DIRS: usize = 64;
const LAYERED_DIRS: usize = 32;
const LAYERED_PARENTS: usize = 8;
const MANY_ROOTS: usize = 64;
const PARTIALLY_SHARED_GROUPS: usize = 8;
const PARTIALLY_SHARED_ROOTS_PER_GROUP: usize = 8;
const WIDE_DIRS: usize = 128;
struct TempDir(PathBuf);
impl TempDir {
fn new() -> TempDir {
let nanos =
SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_nanos();
let path = std::env::temp_dir()
.join(format!("ignore-bench-{}-{nanos}", std::process::id(),));
fs::create_dir(&path).unwrap();
TempDir(path)
}
fn path(&self) -> &Path {
&self.0
}
}
impl Drop for TempDir {
fn drop(&mut self) {
fs::remove_dir_all(&self.0).unwrap();
}
}
struct ParentGitignore {
_temp_dir: TempDir,
builder: WalkBuilder,
roots: usize,
}
impl ParentGitignore {
fn new(roots: usize) -> ParentGitignore {
let gitignore = gitignore_patterns("ignored");
ParentGitignore::with_paths(&gitignore, |project| {
(0..roots).map(|i| project.join(format!("root-{i:03}"))).collect()
})
}
fn empty(roots: usize) -> ParentGitignore {
ParentGitignore::with_paths("", |project| {
(0..roots).map(|i| project.join(format!("root-{i:03}"))).collect()
})
}
fn partially_shared() -> ParentGitignore {
let gitignore = gitignore_patterns("ignored");
ParentGitignore::with_paths(&gitignore, |project| {
(0..PARTIALLY_SHARED_GROUPS)
.flat_map(|group| {
(0..PARTIALLY_SHARED_ROOTS_PER_GROUP).map(move |root| {
project
.join(format!("group-{group:03}"))
.join(format!("root-{root:03}"))
})
})
.collect()
})
}
fn deep_kept() -> ParentGitignore {
ParentGitignore::with_paths("ignored/**\n", |project| {
let root = project.join("root");
let mut current = root.clone();
for i in 0..DEEP_DIRS {
current = current.join(format!("dir-{i:03}"));
fs::create_dir_all(¤t).unwrap();
}
fs::write(current.join("file"), "").unwrap();
vec![root]
})
}
fn layered_parent_gitignores_misses() -> ParentGitignore {
let gitignore = gitignore_patterns("ignored-project");
ParentGitignore::with_paths(&gitignore, |project| {
let mut parent = project.to_path_buf();
for layer in 0..LAYERED_PARENTS {
parent = parent.join(format!("parent-{layer:03}"));
fs::create_dir_all(&parent).unwrap();
fs::write(
parent.join(".gitignore"),
gitignore_patterns(&format!("ignored-{layer:03}")),
)
.unwrap();
}
let root = parent.join("root");
for i in 0..LAYERED_DIRS {
let dir = root.join(format!("dir-{i:03}"));
fs::create_dir_all(&dir).unwrap();
fs::write(dir.join("file"), "").unwrap();
}
vec![root]
})
}
fn wide_kept() -> ParentGitignore {
ParentGitignore::with_paths("ignored/**\n", |project| {
let root = project.join("root");
for i in 0..WIDE_DIRS {
let dir = root.join(format!("dir-{i:03}"));
fs::create_dir_all(&dir).unwrap();
fs::write(dir.join("file"), "").unwrap();
}
vec![root]
})
}
fn with_parent_matches(roots: usize) -> ParentGitignore {
ParentGitignore::with_paths("root-*/ignored/\n", |project| {
let paths = (0..roots)
.map(|i| project.join(format!("root-{i:03}")))
.collect::<Vec<_>>();
for path in &paths {
fs::create_dir_all(path.join("ignored")).unwrap();
fs::create_dir_all(path.join("kept")).unwrap();
fs::write(path.join("ignored/file"), "").unwrap();
fs::write(path.join("kept/file"), "").unwrap();
}
paths
})
}
fn with_paths(
gitignore: &str,
paths: impl FnOnce(&Path) -> Vec<PathBuf>,
) -> ParentGitignore {
let temp_dir = TempDir::new();
let project = temp_dir.path().join("project");
fs::create_dir_all(project.join(".git")).unwrap();
fs::write(project.join(".gitignore"), gitignore).unwrap();
let paths = paths(&project);
for path in &paths {
fs::create_dir_all(path).unwrap();
}
let mut builder = WalkBuilder::new(&paths[0]);
for path in &paths[1..] {
builder.add(path);
}
builder.max_depth(Some(0)).git_global(false).git_exclude(false);
ParentGitignore { _temp_dir: temp_dir, builder, roots: paths.len() }
}
fn walk(&self) {
assert_eq!(self.roots, self.builder.build().count());
}
fn walk_parallel(&self, expected_entries: usize) {
let entries = AtomicUsize::new(0);
self.builder.build_parallel().run(|| {
Box::new(|_| {
entries.fetch_add(1, Ordering::Relaxed);
WalkState::Continue
})
});
assert_eq!(expected_entries, entries.load(Ordering::Relaxed));
}
fn walk_entries(&self, expected_entries: usize) {
assert_eq!(expected_entries, self.builder.build().count());
}
}
fn gitignore_patterns(prefix: &str) -> String {
let mut gitignore = String::new();
for i in 0..GITIGNORE_PATTERNS {
writeln!(gitignore, "{prefix}/{i:03}/**").unwrap();
}
gitignore
}
#[bench]
fn add_parents_one_root(b: &mut test::Bencher) {
let fixture = ParentGitignore::new(1); |
BurntSushi
left a comment
There was a problem hiding this comment.
I love the fix! Rather elegant. Thank you @jelle-openai and @MichaReiser for getting this over the finish line. :-)
|
This PR is on crates.io in |
Wow, this wask quick. Thank you |
Fixes #3419.