Perf: Optimize pages loading (Filecache path like approach)#2549
Conversation
5d10baa to
5e6b2e9
Compare
mejo-
left a comment
There was a problem hiding this comment.
Thanks a lot @Koc, this looks really promising 🤩
I have some comments, but I'm genuinely curious what you think about the comments.
If you do further changes to the PR, could you do them in separate fixup commits (and don't force-push changes to the existing commit for now) so it's easier to review your changes?
| * @throws NotFoundException | ||
| * @throws NotPermittedException | ||
| */ | ||
| public function getPagesFromFolderV2(int $collectiveId, Folder $folder, string $userId, bool $recurse = false, bool $forceIndex = false): array { |
There was a problem hiding this comment.
Is there a reason to introduce this as a new function instead of replacing getPagesFromFolder() altogether? There's still one call left to the old function (lib/Service/PageService.php:984), was this left out on purpose?
I'd prefer to replace the old function and update the Unit tests along, as it would be great to not loose the existing unit testing for getPagesFromFolder() for the new function.
There was a problem hiding this comment.
I would say that this is temporary POC method to have possibility quickly switch between old an new implementation. So it's easier to add V2 prefix to call.
But, anyway, I guess that old implementation should be preserved for background jobs/console commands to re-create tree.
| $qb->select('fileid', 'storage', 'path', 'parent', 'name', 'mimetype', 'mimepart', | ||
| 'size', 'mtime', 'storage_mtime', 'encrypted', 'etag', 'permissions') | ||
| ->from('filecache') | ||
| ->where($qb->expr()->eq('storage', $qb->createNamedParameter($storageId, IQueryBuilder::PARAM_INT))) |
There was a problem hiding this comment.
Any reason to not further filter on mimetype here and only get folders and Markdown files? If I understand the code further down correctly, we only process folders and Markdown files anyway, right?
There was a problem hiding this comment.
My goal was to introduce a new index that will cover columns used in filter. So, in this case we should extend it with mime type then
| * @throws NotFoundException | ||
| * @throws \OCP\DB\Exception | ||
| */ | ||
| public function getFileCacheForCollective(int $collectiveId): array { |
There was a problem hiding this comment.
Wouldn't it be an option to pass the folder name here as well and further narrow down the query to only contain files in this subfolder? This would make the query less heavy when getPagesFromFolderV2() is called with a subfolder and not for the whole collective folder.
There was a problem hiding this comment.
This would be particularly helpful for building the templateFolder page tree, as that one doesn't need to process all the files from the collective rood folder.
There was a problem hiding this comment.
yeah, but as far I understand we should load all possible descendants (not only direct children). Filecache is not nested tree/closure tree structure, so we can't efficiently load all descendants
| $this->setTimestamp($fileInfo->mtime); | ||
| $this->setSize($fileInfo->size); | ||
| $this->setFileName($fileInfo->name); | ||
| if ($collectivePath !== null) { |
There was a problem hiding this comment.
I guess these checks can be omitted as the default is null anyway, so setting them to null again doesn't hurt.
There was a problem hiding this comment.
yeah, but in this case we should update setter to accept null. And we should do the same for many other setters. I'd prefer to keep same code style like we have in PageInfo::fromFile() method.
| /** | ||
| * Build the page info from a lightweight filecache entry (see PageService::getPagesFromFolderV2). | ||
| */ | ||
| public function fromFileInfo( |
There was a problem hiding this comment.
This copies much of the logic of fromFile(). Maybe worth refactoring to have less code duplication?
There was a problem hiding this comment.
I had an attempt to do that but with this approach we will have large method with all possible parameters. So we will have even more lines of code comparing to what we have now.
Signed-off-by: Kostiantyn Miakshyn <molodchick@gmail.com>
5e6b2e9 to
2901fd7
Compare
📝 Summary
This is alternative approach to #2390 that fixes same performance issue (closes #2380).
Benefits comparing to previous implementation:
So, we're just load all necessary pages via simple query
SELECT * FROM filecache WHERE storage_id = <storageId> AND path LIKE 'appdata_<instanceId>/collectives/<collectiveId>/%'🖼️ Screenshots
Collective with 390 pages with various nesting level
🚧 TODO
filecachetable tostorage_id, pathcolumns (but this requires extra PR to nextcloud/server)🏁 Checklist
npm run lint/npm run stylelint/composer run cs:check)🤖 AI (if applicable)