Crashed worker to update job itself (and make the UI clearer on status) #602
Merged
Changes from 8 commits
13 commits
e4ae314 create test datasource (dale-wahl)
a44cfc0 update job.py to allow for "parking" crashed jobs for a restart (dale-wahl)
a78f3ca update worker.py to officially mark crashed jobs. (dale-wahl)
b6eef6e note in queue.py on new Job.STATUS_PARKED (dale-wahl)
a3fd241 update test datasource to let me have more than one running (dale-wahl)
03b8eab update api.py to use and pass parked/crashed jobs; show on Active Wor… (dale-wahl)
add712b Update jobs.html and allow retrying crashed jobs (without restart!) (dale-wahl)
af877ac fix attempt counting bug i introduced (dale-wahl)
a7bc04c Merge branch 'master' into pr/602 (dale-wahl)
926f407 skip no import warning if datasource_disabled (dale-wahl)
9da6df8 park reoccurring jobs as well on crash (dale-wahl)
fe12829 update test datasource for intervals (and note datasources cannot be … (dale-wahl)
b5938d8 word-trees: ruff says remove unused assignment (dale-wahl)
@@ -0,0 +1,27 @@

```python
"""
Test datasource (development only)

Provides a dummy "search" worker that creates datasets in deliberately distinct
states (completing normally, running forever, or crashing) so the worker-status
and queue admin pages can be exercised without running a real data collection.

The search worker only registers itself when the
``FOURCAT_ENABLE_TEST_DATASOURCE`` environment variable is set to a truthy
value, so this datasource is inert (no worker, nothing runnable) on a normal or
production instance even though the folder is present.

See ``helper-scripts/create_test_jobs.py`` for enqueuing one of each state.
"""
import os

# only register this datasource when explicitly enabled, so it is inert on a
# normal/production instance. This MUST match the gate on the search worker in
# search_test.py: if the datasource registers without its worker,
# manager.validate_datasources() errors with "No search worker defined".
if os.environ.get("FOURCAT_ENABLE_TEST_DATASOURCE", "").lower() in ("1", "true", "yes", "on"):
    # Use default data source init function
    from common.lib.helpers import init_datasource as init_datasource

# Internal identifier for this data source
DATASOURCE = "test"
NAME = "Test datasource (dev only)"
```
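The comment in this file says its gate must match the one in search_test.py, and both files spell out the same truthy-value check inline. As a minimal sketch (not part of this PR), the check could live in one shared helper so the two gates cannot drift apart. The names `env_flag_enabled` and `TRUTHY_VALUES` are hypothetical, and the optional `environ` argument exists only to make the sketch testable without touching the real environment.

```python
import os

# Values treated as "enabled"; copied from the inline check in the diff.
TRUTHY_VALUES = ("1", "true", "yes", "on")


def env_flag_enabled(name, environ=None):
    """Return True if the named environment variable holds a truthy value.

    Hypothetical helper: `environ` defaults to os.environ and can be
    replaced with a plain dict for testing.
    """
    env = os.environ if environ is None else environ
    # Same semantics as the diff: a missing variable counts as "",
    # and the comparison is case-insensitive.
    return env.get(name, "").lower() in TRUTHY_VALUES


# Example with explicit dictionaries instead of the process environment.
enabled = env_flag_enabled("FOURCAT_ENABLE_TEST_DATASOURCE",
                           {"FOURCAT_ENABLE_TEST_DATASOURCE": "True"})
disabled = env_flag_enabled("FOURCAT_ENABLE_TEST_DATASOURCE", {})
```

Like the inline check, this does not strip whitespace, so a value such as `" on "` would count as disabled.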
@@ -0,0 +1,131 @@

```python
"""
Test datasource search worker (development only)

This worker only registers itself when the ``FOURCAT_ENABLE_TEST_DATASOURCE``
environment variable is set to a truthy value, so it never loads on a normal or
production instance. It produces dummy datasets in one of three deliberately
distinct states, selected via the ``mode`` parameter:

- ``complete``: writes a few dummy rows and finishes normally
- ``forever``: runs indefinitely (until interrupted), updating progress
- ``crash``: raises a generic exception; because an unhandled exception
  leaves the job claimed but releases no worker, this reproduces
  the ``is_maybe_crashed`` state (claimed job, no live worker)

Use ``helper-scripts/create_test_jobs.py`` to enqueue one of each. Note that the
backend daemon must also have ``FOURCAT_ENABLE_TEST_DATASOURCE`` set for the
jobs to actually be picked up and run.
"""
import os
import time

from backend.lib.search import Search
from common.lib.user_input import UserInput
from common.lib.item_mapping import MappedItem
from common.lib.exceptions import ProcessorInterruptedException

# only make this worker available when explicitly enabled, so it never loads on
# a normal/production instance (the datasource folder is always discovered, but
# without this class there is no `test-search` worker and nothing can run)
TEST_DATASOURCE_ENABLED = os.environ.get("FOURCAT_ENABLE_TEST_DATASOURCE", "").lower() in ("1", "true", "yes", "on")

if TEST_DATASOURCE_ENABLED:

    class SearchTest(Search):
        """
        Dummy search worker for exercising the worker/queue status pages
        """
        type = "test-search"  # job ID
        category = "Search"  # category
        title = "Test datasource (dev only)"  # title displayed in UI
        description = "Development-only datasource that creates dummy datasets in various states (complete, forever, crash) to exercise admin status pages."
        extension = "ndjson"  # extension of result file

        # not offered as a processor for existing datasets
        accepts = [None]

        @classmethod
        def get_queue_id(cls, remote_id, details, dataset) -> str:
            # one queue per job so the dummy jobs run concurrently instead of
            # serialising behind one another (a 'forever' job would otherwise block
            # the rest).
            return f"{cls.type}-{remote_id}"

        @classmethod
        def get_options(cls, parent_dataset=None, config=None):
            return {
                "mode": {
                    "type": UserInput.OPTION_CHOICE,
                    "help": "Test mode",
                    "options": {
                        "complete": "Complete normally (writes dummy rows)",
                        "forever": "Run forever (until interrupted)",
                        "crash": "Crash (raise an exception)",
                    },
                    "default": "complete",
                },
                "amount": {
                    "type": UserInput.OPTION_TEXT,
                    "help": "Number of dummy rows (complete mode)",
                    "coerce_type": int,
                    "default": 5,
                    "min": 0,
                },
            }

        def get_items(self, query):
            """
            Generate dummy items, or run forever, or crash - depending on mode

            :param dict query: Query parameters, expects a `mode` key
            :return: Iterable of dummy items (complete mode) or None (forever)
            """
            mode = query.get("mode", "complete")

            if mode == "crash":
                # leaves the job claimed with no live worker once the thread
                # ends -> shows up as `is_maybe_crashed` on the status page
                self.dataset.update_status("Test datasource: about to raise an exception")
                raise Exception("Test datasource intentional crash (mode=crash)")

            if mode == "forever":
                # block here until interrupted; this holds a worker slot so the
                # job shows up as actively running with a moving progress bar
                tick = 0
                while True:
                    if self.interrupted:
                        raise ProcessorInterruptedException("Interrupted while running forever (mode=forever)")
                    tick += 1
                    self.dataset.update_status("Test datasource: running forever (tick %i)" % tick)
                    # oscillate progress 0..1 so the bar is visibly active
                    self.dataset.update_progress((tick % 20) / 20)
                    time.sleep(2)

            # mode == "complete": write some dummy rows and finish
            amount = query.get("amount", 5)
            items = []
            for i in range(amount):
                if self.interrupted:
                    raise ProcessorInterruptedException("Interrupted while generating dummy data (mode=complete)")
                self.dataset.update_progress((i + 1) / amount if amount else 1)
                items.append({
                    "id": str(i),
                    "thread_id": str(i),
                    "subject": "Dummy item %i" % i,
                    "body": "This is dummy test item %i." % i,
                    "author": "test_user",
                    "timestamp": "1970-01-01 00:00:00",
                })

            return items

        @staticmethod
        def map_item(item):
            return MappedItem({
                "id": item.get("id", ""),
                "thread_id": item.get("thread_id", ""),
                "subject": item.get("subject", ""),
                "body": item.get("body", ""),
                "author": item.get("author", ""),
                "timestamp": item.get("timestamp", "1970-01-01 00:00:00"),
            })
```
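The docstring describes the `is_maybe_crashed` state as "claimed job, no live worker". The diff shown here does not include how the status page computes it, so the following is only a rough sketch of that rule under stated assumptions. `JobRecord`, its `claimed_at` field (with 0 meaning unclaimed), and the `live_worker_job_ids` set are all hypothetical stand-ins, not 4CAT's actual job or worker-manager API.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical stand-in for a queue row; not 4CAT's actual Job class."""
    job_id: int
    claimed_at: int  # assumed convention: 0 means the job was never claimed


def is_maybe_crashed(job, live_worker_job_ids):
    """Flag a job that is claimed but has no live worker attached to it."""
    return job.claimed_at > 0 and job.job_id not in live_worker_job_ids


jobs = [
    JobRecord(job_id=1, claimed_at=0),           # still queued, never claimed
    JobRecord(job_id=2, claimed_at=1700000000),  # claimed, worker alive (like mode=forever)
    JobRecord(job_id=3, claimed_at=1700000000),  # claimed, worker gone (like mode=crash)
]
live_worker_job_ids = {2}

suspected = [job.job_id for job in jobs if is_maybe_crashed(job, live_worker_job_ids)]
```

Under these assumptions only job 3 is flagged. A `forever` job stays claimed but keeps a live worker, and an unclaimed job is simply waiting in the queue.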