From ae140b7581681041330a47cf05b10890623211d7 Mon Sep 17 00:00:00 2001 From: KusevskaElena Date: Wed, 3 Mar 2021 13:08:58 +0100 Subject: [PATCH 01/11] Add CODE_OF_CONDUCT.md and CONTRIBUTING.md --- CODE_OF_CONDUCT.md | 76 ++++++++++++++++++++++++++++++++++++++++++++++ CONTRIBUTING.md | 27 ++++++++++++++++ 2 files changed, 103 insertions(+) create mode 100644 CODE_OF_CONDUCT.md create mode 100644 CONTRIBUTING.md diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 00000000..04b029f9 --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,76 @@ +# Contributor Covenant Code of Conduct + +## Our Pledge + +In the interest of fostering an open and welcoming environment, we as +contributors and maintainers pledge to making participation in our project and +our community a harassment-free experience for everyone, regardless of age, body +size, disability, ethnicity, sex characteristics, gender identity and expression, +level of experience, education, socio-economic status, nationality, personal +appearance, race, religion, or sexual identity and orientation. 
+ +## Our Standards + +Examples of behavior that contributes to creating a positive environment +include: + +* Using welcoming and inclusive language +* Being respectful of differing viewpoints and experiences +* Gracefully accepting constructive criticism +* Focusing on what is best for the community +* Showing empathy towards other community members + +Examples of unacceptable behavior by participants include: + +* The use of sexualized language or imagery and unwelcome sexual attention or + advances +* Trolling, insulting/derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or electronic + address, without explicit permission +* Other conduct which could reasonably be considered inappropriate in a + professional setting + +## Our Responsibilities + +Project maintainers are responsible for clarifying the standards of acceptable +behavior and are expected to take appropriate and fair corrective action in +response to any instances of unacceptable behavior. + +Project maintainers have the right and responsibility to remove, edit, or +reject comments, commits, code, wiki edits, issues, and other contributions +that are not aligned to this Code of Conduct, or to ban temporarily or +permanently any contributor for other behaviors that they deem inappropriate, +threatening, offensive, or harmful. + +## Scope + +This Code of Conduct applies both within project spaces and in public spaces +when an individual is representing the project or its community. Examples of +representing a project or community include using an official project e-mail +address, posting via an official social media account, or acting as an appointed +representative at an online or offline event. Representation of a project may be +further defined and clarified by project maintainers. 
+ +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported by contacting the project team at info@keitaro.com. All +complaints will be reviewed and investigated and will result in a response that +is deemed necessary and appropriate to the circumstances. The project team is +obligated to maintain confidentiality with regard to the reporter of an incident. +Further details of specific enforcement policies may be posted separately. + +Project maintainers who do not follow or enforce the Code of Conduct in good +faith may face temporary or permanent repercussions as determined by other +members of the project's leadership. + +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, +available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html + +[homepage]: https://www.contributor-covenant.org + +For answers to common questions about this code of conduct, see +https://www.contributor-covenant.org/faq diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 00000000..234d2591 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,27 @@ +## How to contribute to ckanext-archiver + +#### **Did you find a bug?** + +* **Ensure the bug was not already reported** by searching on GitHub under [Issues](https://github.com/keitaroinc/ckanext-archiver/issues). + +* If you're unable to find an open issue addressing the problem, [open a new one](https://github.com/keitaroinc/ckanext-archiver/issues/new). Be sure to include a **title and clear description** and as much relevant information as possible. We include an issue template to help you fill in the issue. + +#### **Did you write a patch that fixes a bug?** + +* Open a new GitHub pull request with the patch. + +* Ensure the PR description clearly describes the problem and solution. Include the relevant issue number if applicable.
+ +#### **Do you intend to add a new feature or change an existing one?** + +* [Create a new feature issue](https://github.com/keitaroinc/ckanext-archiver/issues/new) using the Feature Request template and describe your proposed changes. + +* Submit a pull request referring to the relevant feature issue(s). + +#### **Do you have questions about the source code?** + +* Ask any question about how to use ckanext-archiver in our [gitter chat](https://gitter.im/keitaroinc/ckan). + +Thanks! + +Keitaro Team From adbb5ec32ba50cd1a0e1ba954945ea49cdc96205 Mon Sep 17 00:00:00 2001 From: antuarc Date: Fri, 16 Sep 2022 09:24:18 +1000 Subject: [PATCH 02/11] reduce duplicate GitHub Actions builds, #2 - Only run on pull request if targeting master, since 'push' covers most cases except for cross-repo pull requests. --- .github/workflows/test.yml | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index eccc5206..fdeb9ac6 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -1,5 +1,11 @@ +--- name: Tests -on: [push, pull_request] +on: + push: + pull_request: + branches: + - master + jobs: lint: runs-on: ubuntu-latest From f92face333305d3a50945452669a417a7227136d Mon Sep 17 00:00:00 2001 From: antuarc Date: Fri, 16 Sep 2022 09:26:53 +1000 Subject: [PATCH 03/11] use 'ckan_cli' script to unify test setup logic across CKAN versions, #2 - This script calls either 'paster' or 'ckan' according to what is available, adjusting parameters as needed --- .github/workflows/test.yml | 19 +++++----- bin/ckan_cli | 75 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 85 insertions(+), 9 deletions(-) create mode 100644 bin/ckan_cli diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index fdeb9ac6..95eb9e9e 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -53,6 +53,8 @@ jobs: steps: - uses: actions/checkout@v3 + timeout-minutes: 1 + + - name: Install
requirements run: | pip install -r requirements.txt @@ -60,15 +62,14 @@ jobs: pip install -e . # Replace default path to CKAN core config file with the one on the container sed -i -e 's/use = config:.*/use = config:\/srv\/app\/src\/ckan\/test-core.ini/' test.ini - - name: Setup extension (CKAN >= 2.9) - if: ${{ matrix.ckan-version != '2.7' && matrix.ckan-version != '2.8' }} - run: | - ckan -c test.ini db init - ckan -c test.ini archiver init - - name: Setup extension (CKAN < 2.9) - if: ${{ matrix.ckan-version == '2.7' || matrix.ckan-version == '2.8' }} + timeout-minutes: 10 + + - name: Setup extension run: | - paster --plugin=ckan db init -c test.ini - paster --plugin=ckanext-archiver archiver init -c test.ini + export CKAN_INI=test.ini + chmod u+x bin/ckan_cli + bin/ckan_cli db init + PASTER_PLUGIN=ckanext-archiver bin/ckan_cli archiver init + - name: Run tests run: pytest --ckan-ini=test.ini --cov=ckanext.archiver --disable-warnings ckanext/archiver/tests diff --git a/bin/ckan_cli b/bin/ckan_cli new file mode 100644 index 00000000..7757dc8b --- /dev/null +++ b/bin/ckan_cli @@ -0,0 +1,75 @@ +#!/bin/sh + +# Call either 'ckan' (from CKAN >= 2.9) or 'paster' (from CKAN <= 2.8) +# with appropriate syntax, depending on what is present on the system. +# This is intended to smooth the upgrade process from 2.8 to 2.9. +# Eg: +# ckan_cli jobs list +# could become either: +# paster --plugin=ckan jobs list -c /etc/ckan/default/production.ini +# or: +# ckan -c /etc/ckan/default/production.ini jobs list + +# This script is aware of the VIRTUAL_ENV environment variable, and will +# attempt to respect it with similar behaviour to commands like 'pip'. +# Eg placing this script in a virtualenv 'bin' directory will cause it +# to call the 'ckan' or 'paster' command in that directory, while +# placing this script elsewhere will cause it to rely on the VIRTUAL_ENV +# variable, or if that is not set, the system PATH. 
+ +# Since the positioning of the CKAN configuration file is central to the +# differences between 'paster' and 'ckan', this script needs to be aware +# of the config file location. It will use the CKAN_INI environment +# variable if it exists, or default to /etc/ckan/default/production.ini. + +# If 'paster' is being used, the default plugin is 'ckan'. A different +# plugin can be specified by setting the PASTER_PLUGIN environment +# variable. This variable is irrelevant if using the 'ckan' command. + +CKAN_INI="${CKAN_INI:-/etc/ckan/default/production.ini}" +PASTER_PLUGIN="${PASTER_PLUGIN:-ckan}" +# First, look for a command alongside this file +ENV_DIR=$(dirname "$0") +if [ -f "$ENV_DIR/ckan" ]; then + COMMAND=ckan +elif [ -f "$ENV_DIR/paster" ]; then + COMMAND=paster +elif [ "$VIRTUAL_ENV" != "" ]; then + # If command not found alongside this file, check the virtualenv + ENV_DIR="$VIRTUAL_ENV/bin" + if [ -f "$ENV_DIR/ckan" ]; then + COMMAND=ckan + elif [ -f "$ENV_DIR/paster" ]; then + COMMAND=paster + fi +else + # if no virtualenv is active, try the system path + if (which ckan > /dev/null 2>&1); then + ENV_DIR=$(dirname $(which ckan)) + COMMAND=ckan + elif (which paster > /dev/null 2>&1); then + ENV_DIR=$(dirname $(which paster)) + COMMAND=paster + else + echo "Unable to locate 'ckan' or 'paster' command" >&2 + exit 1 + fi +fi + +if [ "$COMMAND" = "ckan" ]; then + # adjust args to match ckan expectations + COMMAND=$(echo "$1" | sed -e 's/create-test-data/seed/') + echo "Using 'ckan' command from $ENV_DIR with config ${CKAN_INI} to run $COMMAND..." >&2 + shift + exec $ENV_DIR/ckan -c ${CKAN_INI} $COMMAND "$@" $CLICK_ARGS +elif [ "$COMMAND" = "paster" ]; then + # adjust args to match paster expectations + COMMAND=$1 + echo "Using 'paster' command from $ENV_DIR with config ${CKAN_INI} to run $COMMAND..." 
>&2 + shift + if [ "$1" = "show" ]; then shift; fi + exec $ENV_DIR/paster --plugin=$PASTER_PLUGIN $COMMAND "$@" -c ${CKAN_INI} +else + echo "Unable to locate 'ckan' or 'paster' command in $ENV_DIR" >&2 + exit 1 +fi From 55ac6a1339b0c908dccdde1daa34b34fca37d154 Mon Sep 17 00:00:00 2001 From: antuarc Date: Fri, 16 Sep 2022 10:06:57 +1000 Subject: [PATCH 04/11] clean up import ordering, typos, whitespace, #2 - Also combine duplicate exception blocks --- ckanext/archiver/bin/common.py | 2 + ckanext/archiver/bin/migrate_task_status.py | 4 +- ckanext/archiver/bin/running_stats.py | 6 +- ckanext/archiver/cli.py | 4 +- ckanext/archiver/command_celery.py | 2 + ckanext/archiver/commands.py | 9 ++- ckanext/archiver/lib.py | 6 +- ckanext/archiver/model.py | 16 ++--- ckanext/archiver/plugin.py | 12 ++-- ckanext/archiver/reports.py | 8 ++- ckanext/archiver/tasks.py | 67 ++++++++------------- ckanext/archiver/tests/test_archiver.py | 7 +-- ckanext/archiver/tests/test_model.py | 2 + ckanext/archiver/utils.py | 34 +++++------ setup.py | 17 +++--- 15 files changed, 94 insertions(+), 102 deletions(-) diff --git a/ckanext/archiver/bin/common.py b/ckanext/archiver/bin/common.py index 093d005d..5d551903 100644 --- a/ckanext/archiver/bin/common.py +++ b/ckanext/archiver/bin/common.py @@ -1,3 +1,5 @@ +# encoding: utf-8 + from __future__ import print_function import os import ckan.plugins as p diff --git a/ckanext/archiver/bin/migrate_task_status.py b/ckanext/archiver/bin/migrate_task_status.py index 67d58486..a1c35ea0 100644 --- a/ckanext/archiver/bin/migrate_task_status.py +++ b/ckanext/archiver/bin/migrate_task_status.py @@ -95,7 +95,7 @@ def migrate(options): archival = Archival.get_for_resource(res.id) if archival: changed = None - for field, value in list(fields.items()): + for field, value in fields.items(): if getattr(archival, field) != value: if options.write: setattr(archival, field, value) @@ -107,7 +107,7 @@ def migrate(options): else: archival = Archival.create(res.id) if 
options.write: - for field, value in list(fields.items()): + for field, value in fields.items(): setattr(archival, field, value) model.Session.add(archival) add_stat('Added to archival table', res, stats) diff --git a/ckanext/archiver/bin/running_stats.py b/ckanext/archiver/bin/running_stats.py index fc5e115f..6acc2722 100644 --- a/ckanext/archiver/bin/running_stats.py +++ b/ckanext/archiver/bin/running_stats.py @@ -14,7 +14,7 @@ package_stats.increment('deleted') else: package_stats.increment('not deleted') -print package_stats.report() +print(package_stats.report()) > deleted: 30 > not deleted: 70 @@ -26,7 +26,7 @@ package_stats.add('deleted', package.name) else: package_stats.add('not deleted' package.name) -print package_stats.report() +print(package_stats.report()) > deleted: 30 pollution-uk, flood-regions, river-quality, ... > not deleted: 70 spending-bristol, ... @@ -65,7 +65,7 @@ def report(self, indent=1, order_by_title=False, show_time_taken=True): lines = [] indent_str = '\t' * indent report_dict = dict() - for category in list(self.keys()): + for category in self.keys(): report_dict[category] = self.report_value(category) if order_by_title: diff --git a/ckanext/archiver/cli.py b/ckanext/archiver/cli.py index 53adb4d9..c338b620 100644 --- a/ckanext/archiver/cli.py +++ b/ckanext/archiver/cli.py @@ -1,5 +1,7 @@ +# encoding: utf-8 + import click -from ckanext.archiver import utils +from . 
import utils def get_commands(): diff --git a/ckanext/archiver/command_celery.py b/ckanext/archiver/command_celery.py index 66b3f32e..32386f9d 100644 --- a/ckanext/archiver/command_celery.py +++ b/ckanext/archiver/command_celery.py @@ -1,3 +1,5 @@ +# encoding: utf-8 + from __future__ import print_function from future import standard_library import sys diff --git a/ckanext/archiver/commands.py b/ckanext/archiver/commands.py index 5f628fce..847c59c3 100644 --- a/ckanext/archiver/commands.py +++ b/ckanext/archiver/commands.py @@ -1,13 +1,12 @@ +# encoding: utf-8 + from __future__ import print_function import logging import sys from ckan.lib.cli import CkanCommand -from ckanext.archiver import utils - - -REQUESTS_HEADER = {'content-type': 'application/json'} +from . import utils class Archiver(CkanCommand): @@ -27,7 +26,7 @@ class Archiver(CkanCommand): package or group, if specified paster archiver update-test [{package-name/id}|{group-name/id}] - - Does an archive in the current process i.e. avoiding Celery queue + - Does an archive in the current process i.e. avoiding worker queue so that you can test on the command-line more easily. 
paster archiver clean-status diff --git a/ckanext/archiver/lib.py b/ckanext/archiver/lib.py index e6b21be8..3b8327a6 100644 --- a/ckanext/archiver/lib.py +++ b/ckanext/archiver/lib.py @@ -1,3 +1,5 @@ +# encoding: utf-8 + from builtins import str import logging import ckan.plugins as p @@ -31,14 +33,14 @@ def create_archiver_resource_task(resource, queue): compat_enqueue('archiver.update_resource', update_resource, queue, [resource.id]) - log.debug('Archival of resource put into celery queue %s: %s/%s url=%r', + log.debug('Archival of resource put into queue %s: %s/%s url=%r', queue, package.name, resource.id, resource.url) def create_archiver_package_task(package, queue): compat_enqueue('archiver.update_package', update_package, queue, [package.id]) - log.debug('Archival of package put into celery queue %s: %s', + log.debug('Archival of package put into queue %s: %s', queue, package.name) diff --git a/ckanext/archiver/model.py b/ckanext/archiver/model.py index 419ec9a0..14ff8827 100644 --- a/ckanext/archiver/model.py +++ b/ckanext/archiver/model.py @@ -1,8 +1,8 @@ -import itertools +# encoding: utf-8 + from builtins import str -from builtins import object -import uuid from datetime import datetime +import uuid from sqlalchemy import Column, MetaData from sqlalchemy import types @@ -27,29 +27,23 @@ def make_uuid(): # enum of all the archival statuses (singleton) # NB Be very careful changing these status strings. They are also used in # ckanext-qa tasks.py. -class Status(object): +class Status: _instance = None def __init__(self): - not_broken = { + self._by_id = { # is_broken = False 0: 'Archived successfully', 1: 'Content has not changed', - } - broken = { # is_broken = True 10: 'URL invalid', 11: 'URL request failed', 12: 'Download error', - } - not_sure = { # is_broken = None i.e. 
not sure 21: 'Chose not to download', 22: 'Download failure', 23: 'System error during archival', } - self._by_id = dict(itertools.chain(not_broken.items(), broken.items())) - self._by_id.update(not_sure) self._by_text = dict((value, key) for key, value in self._by_id.items()) diff --git a/ckanext/archiver/plugin.py b/ckanext/archiver/plugin.py index beb37e23..3e343472 100644 --- a/ckanext/archiver/plugin.py +++ b/ckanext/archiver/plugin.py @@ -1,3 +1,5 @@ +# encoding: utf-8 + import logging from ckan import model @@ -39,9 +41,7 @@ def notify(self, entity, operation=None): log.debug('Notified of package event: %s %s', entity.name, operation) - run_archiver = \ - self._is_it_sufficient_change_to_run_archiver(entity, operation) - if not run_archiver: + if not self._is_it_sufficient_change_to_run_archiver(entity, operation): return log.debug('Creating archiver task: %s', entity.name) @@ -170,7 +170,7 @@ def get_actions(self): return { 'archiver_resource_show': action.archiver_resource_show, 'archiver_dataset_show': action.archiver_dataset_show, - } + } # IAuthFunctions @@ -178,13 +178,13 @@ def get_auth_functions(self): return { 'archiver_resource_show': auth.archiver_resource_show, 'archiver_dataset_show': auth.archiver_dataset_show, - } + } # ITemplateHelpers def get_helpers(self): return dict((name, function) for name, function - in list(helpers.__dict__.items()) + in helpers.__dict__.items() if callable(function) and name[0] != '_') # IPackageController diff --git a/ckanext/archiver/reports.py b/ckanext/archiver/reports.py index f50e2421..138b58c0 100644 --- a/ckanext/archiver/reports.py +++ b/ckanext/archiver/reports.py @@ -1,3 +1,5 @@ +# encoding: utf-8 + import copy try: from collections import OrderedDict # from python 2.7 @@ -97,7 +99,7 @@ def broken_links_index(include_sub_organizations=False): ('broken_package_percent', lib.percent(org_counts['broken_packages'], org_counts['packages'])), ('broken_resource_count', org_counts['broken_resources']), 
('broken_resource_percent', lib.percent(org_counts['broken_resources'], org_counts['resources'])), - ))) + ))) # Totals - always use the counts, rather than counts_with_sub_orgs, to # avoid counting a package in both its org and parent org org_counts_ = counts[org_name] @@ -197,7 +199,7 @@ def broken_links_for_organization(organization, include_sub_organizations=False) ('reason', archival.reason), ('status', archival.status), ('failure_count', archival.failure_count), - )) + )) results.append(row_data) @@ -246,7 +248,7 @@ def broken_links_option_combinations(): 'option_combinations': broken_links_option_combinations, 'generate': broken_links, 'template': 'report/broken_links.html', - } +} def add_progress_bar(iterable, caption=None): diff --git a/ckanext/archiver/tasks.py b/ckanext/archiver/tasks.py index 06a1758c..0146a03a 100644 --- a/ckanext/archiver/tasks.py +++ b/ckanext/archiver/tasks.py @@ -1,32 +1,39 @@ +# encoding: utf-8 + from __future__ import absolute_import from builtins import str -import os +import copy +import datetime import hashlib import http.client -import requests import json -import tempfile -import shutil -import datetime -import copy import mimetypes +import os +import requests import re +import shutil +import tempfile from time import sleep from requests.packages import urllib3 from future.moves.urllib.parse import urlparse, urljoin, quote, urlunparse +from ckan import model, plugins as p from ckan.common import _ from ckan.lib import uploader -from ckan import plugins as p -from ckanext.archiver import interfaces as archiver_interfaces +from ckan.lib.search.index import PackageSearchIndex +from ckan.plugins import toolkit +from ckan.plugins.toolkit import config + +from . 
import interfaces as archiver_interfaces, \ + default_settings as settings +from .model import Status, Archival +from .requests_ssl import SSLv3Adapter import logging log = logging.getLogger(__name__) -toolkit = p.toolkit - ALLOWED_SCHEMES = set(('http', 'https', 'ftp')) USER_AGENT = 'ckanext-archiver' @@ -155,8 +162,6 @@ def update_package(package_id, queue='bulk'): def _update_package(package_id, queue, log): - from ckan import model - get_action = toolkit.get_action num_archived = 0 @@ -188,8 +193,6 @@ def _update_search_index(package_id, log): ''' Tells CKAN to update its search index for a given package. ''' - from ckan import model - from ckan.lib.search.index import PackageSearchIndex package_index = PackageSearchIndex() context_ = {'model': model, 'ignore_auth': True, 'session': model.Session, 'use_cache': False, 'validate': False} @@ -221,11 +224,6 @@ def _update_resource(resource_id, queue, log): If not successful, returns None. """ - from ckan import model - from ckan.plugins.toolkit import config - from ckanext.archiver import default_settings as settings - from ckanext.archiver.model import Status, Archival - get_action = toolkit.get_action assert is_id(resource_id), resource_id @@ -263,7 +261,7 @@ def _save(status_id, exception, resource, url_redirected_to=None, hosted_externally = not url.startswith(config['ckan.site_url']) or urlparse(filepath).scheme != '' # if resource.get('resource_type') == 'file.upload' and not hosted_externally: if not hosted_externally: - log.info("Won't attemp to archive resource uploaded locally: %s" % resource['url']) + log.info("Won't attempt to archive resource uploaded locally: %s", resource['url']) try: hash, length = _file_hashnlength(filepath) @@ -307,7 +305,7 @@ def _save(status_id, exception, resource, url_redirected_to=None, 'site_url': config.get('ckan.site_url_internally') or config['ckan.site_url'], 'cache_url_root': config.get('ckanext-archiver.cache_url_root'), 'previous': 
Archival.get_for_resource(resource_id) - } + } err = None try: @@ -321,11 +319,7 @@ def _save(status_id, exception, resource, url_redirected_to=None, download_status_id = Status.by_text('URL invalid') try_as_api = False err = e - except DownloadException as e: - download_status_id = Status.by_text('Download error') - try_as_api = True - err = e - except DownloadError as e: + except (DownloadException, DownloadError) as e: download_status_id = Status.by_text('Download error') try_as_api = True err = e @@ -402,8 +396,6 @@ def download(context, resource, url_timeout=30, Returns a dict of results of a successful download: mimetype, size, hash, headers, saved_file, url_redirected_to ''' - from ckanext.archiver import default_settings as settings - from ckan.plugins.toolkit import config if max_content_length == 'default': max_content_length = settings.MAX_CONTENT_LENGTH @@ -411,8 +403,8 @@ def download(context, resource, url_timeout=30, url = resource['url'] url = tidy_url(url) - if (resource.get('url_type') == 'upload' and - not url.startswith('http')): + if (resource.get('url_type') == 'upload' + and not url.startswith('http')): url = context['site_url'].rstrip('/') + url hosted_externally = not url.startswith(config['ckan.site_url']) @@ -425,7 +417,7 @@ def download(context, resource, url_timeout=30, if not config.get('ckanext-archiver.archive_cloud', False): raise ChooseNotToDownload('Skipping resource hosted externally to download resource: %s' - % url, url) + % url, url) headers = _set_user_agent_string({}) @@ -476,7 +468,7 @@ def download(context, resource, url_timeout=30, 'Resource: %s %r', content_length, max_content_length, resource['id'], url) raise ChooseNotToDownload(_('Content-length %s exceeds maximum ' - 'allowed value %s') % + 'allowed value %s') % (content_length, max_content_length), url_redirected_to) # content_length in the headers is useful but can be unreliable, so when we @@ -555,7 +547,6 @@ def archive_resource(context, resource, log, 
result=None, url_timeout=30): Returns: {cache_filepath, cache_url} """ - from ckanext.archiver import default_settings as settings relative_archive_path = os.path.join(resource['id'][:2], resource['id']) archive_dir = os.path.join(settings.ARCHIVE_DIR, relative_archive_path) if not os.path.exists(archive_dir): @@ -585,7 +576,7 @@ def archive_resource(context, resource, log, result=None, url_timeout=30): 'ckanext-archiver.cache_url_root in config') raise ArchiveError(_('No value for ckanext-archiver.cache_url_root in config')) cache_url = urljoin(str(context['cache_url_root']), - '%s/%s' % (str(relative_archive_path), str(file_name))) + '%s/%s' % (relative_archive_path, file_name)) return {'cache_filepath': saved_file, 'cache_url': cache_url} @@ -618,7 +609,6 @@ def get_plugins_waiting_on_ipipe(): def verify_https(): - from ckan.plugins.toolkit import config return toolkit.asbool(config.get('ckanext-archiver.verify_https', True)) @@ -635,7 +625,6 @@ def _set_user_agent_string(headers): Update the passed headers object with a `User-Agent` key, if there is a USER_AGENT_STRING option in settings. ''' - from ckanext.archiver import default_settings as settings ua_str = settings.USER_AGENT_STRING if ua_str is not None: headers['User-Agent'] = ua_str @@ -684,7 +673,7 @@ def tidy_url(url): return url -def _save_resource(resource, response, max_file_size, chunk_size=1024*16): +def _save_resource(resource, response, max_file_size, chunk_size=1024 * 16): """ Write the response content to disk. 
@@ -724,9 +713,6 @@ def save_archival(resource, status_id, reason, url_redirected_to, ''' now = datetime.datetime.now() - from ckanext.archiver.model import Archival, Status - from ckan import model - archival = Archival.get_for_resource(resource['id']) first_archival = not archival previous_archival_was_broken = None @@ -792,7 +778,6 @@ def requests_wrapper(log, func, *args, **kwargs): runs: res = requests.get(url, timeout=url_timeout) ''' - from .requests_ssl import SSLv3Adapter try: try: response = func(*args, **kwargs) diff --git a/ckanext/archiver/tests/test_archiver.py b/ckanext/archiver/tests/test_archiver.py index 731f9639..bf326df8 100644 --- a/ckanext/archiver/tests/test_archiver.py +++ b/ckanext/archiver/tests/test_archiver.py @@ -68,7 +68,7 @@ def test_bad_url(self): def test_non_escaped_url(self, client): url = client + '/+/http://www.homeoffice.gov.uk/publications/science-research-statistics/research-statistics/' \ - + 'drugs-alcohol-research/hosb1310/hosb1310-ann2tabs?view=Binary' + + 'drugs-alcohol-research/hosb1310/hosb1310-ann2tabs?view=Binary' context = json.dumps({}) data = json.dumps({'url': url}) res = link_checker(context, data) @@ -156,7 +156,7 @@ def initial_data(cls, clean_db): def _test_package(self, url, format=None): pkg = {'resources': [ {'url': url, 'format': format or 'TXT', 'description': 'Test'} - ]} + ]} pkg = ckan_factories.Dataset(**pkg) return pkg @@ -336,7 +336,6 @@ class TestDownload: @pytest.fixture(autouse=True) @pytest.mark.usefixtures(u"clean_index") def initialData(cls, clean_db): - config cls.fake_context = { 'site_url': config.get('ckan.site_url_internally') or config['ckan.site_url'], 'cache_url_root': config.get('ckanext-archiver.cache_url_root'), @@ -346,7 +345,7 @@ def _test_resource(self, url, format=None): context = {'model': model, 'ignore_auth': True, 'session': model.Session, 'user': 'test'} pkg = {'name': 'testpkg', 'resources': [ {'url': url, 'format': format or 'TXT', 'description': 'Test'} - ]} + ]} pkg = 
get_action('package_create')(context, pkg) return pkg['resources'][0] diff --git a/ckanext/archiver/tests/test_model.py b/ckanext/archiver/tests/test_model.py index 6c0e4bfe..c86bd829 100644 --- a/ckanext/archiver/tests/test_model.py +++ b/ckanext/archiver/tests/test_model.py @@ -1,3 +1,5 @@ +# encoding: utf-8 + from builtins import object import ckanext.archiver.model as archiver_model from ckan.tests import factories as ckan_factories diff --git a/ckanext/archiver/utils.py b/ckanext/archiver/utils.py index 15def84b..be681d86 100644 --- a/ckanext/archiver/utils.py +++ b/ckanext/archiver/utils.py @@ -1,12 +1,13 @@ +# encoding: utf-8 + import itertools import logging -import sys -from time import sleep - import os import re import shutil from sqlalchemy import func +import sys +from time import sleep import ckan.plugins as p from ckan.plugins.toolkit import config @@ -16,7 +17,6 @@ except ImportError: from sqlalchemy.util import OrderedDict - log = logging.getLogger(__name__) @@ -26,7 +26,8 @@ def update(identifiers, queue): _get_packages_and_resources_in_args(identifiers, queue): if is_pkg: package = pkg_or_res - log.info('Queuing dataset %s (%s resources) Q:%s', package.name, num_resources_for_pkg, queue) + log.info('Queuing dataset %s (%s resources) Q:%s', + package.name, num_resources_for_pkg, queue) lib.create_archiver_package_task(package, queue) sleep(0.1) # to try to avoid Redis getting overloaded else: @@ -38,7 +39,7 @@ def update(identifiers, queue): def _get_packages_and_resources_in_args(identifiers, queue): - '''Given identifies that specify one or more datasets or + '''Given identifiers that specify one or more datasets or resources, it generates a list of those packages & resources with some basic properties. 
@@ -109,10 +110,10 @@ def _get_packages_and_resources_in_args(identifiers, queue): # earlier CKANs had ResourceGroup pkg_resources = \ [resource for resource in - itertools.chain.from_iterable( - (rg.resources_all - for rg in package.resource_groups_all) - ) + itertools.chain.from_iterable( + (rg.resources_all + for rg in package.resource_groups_all) + ) if res.state == 'active'] else: pkg_resources = \ @@ -330,8 +331,7 @@ def migrate(): "last_modified": "ALTER TABLE archival ADD COLUMN last_modified character varying" }) - MIGRATIONS_MODIFY = OrderedDict({ - }) + MIGRATIONS_MODIFY = OrderedDict({}) q = "select column_name from INFORMATION_SCHEMA.COLUMNS where table_name = 'archival';" current_cols = list([m[0] for m in model.Session.execute(q)]) @@ -431,13 +431,13 @@ def size_report(): from ckan import model from ckanext.archiver.model import Archival kb = 1024 - mb = 1024*1024 + mb = 1024 * 1024 gb = pow(1024, 3) size_bins = [ - (kb, '<1 KB'), (10*kb, '1-10 KB'), (100*kb, '10-100 KB'), - (mb, '100 KB - 1 MB'), (10*mb, '1-10 MB'), (100*mb, '10-100 MB'), - (gb, '100 MB - 1 GB'), (10*gb, '1-10 GB'), (100*gb, '10-100 GB'), - (gb*gb, '>100 GB'), + (kb, '<1 KB'), (10 * kb, '1-10 KB'), (100 * kb, '10-100 KB'), + (mb, '100 KB - 1 MB'), (10 * mb, '1-10 MB'), (100 * mb, '10-100 MB'), + (gb, '100 MB - 1 GB'), (10 * gb, '1-10 GB'), (100 * gb, '10-100 GB'), + (gb * gb, '>100 GB'), ] previous_bin = (0, '') counts = [] diff --git a/setup.py b/setup.py index d40306ec..b67c9f8e 100644 --- a/setup.py +++ b/setup.py @@ -1,3 +1,4 @@ +# encoding: utf-8 from setuptools import setup, find_packages # Always prefer setuptools over distutils from codecs import open # To use a consistent encoding from os import path @@ -44,6 +45,9 @@ # that you indicate whether you support Python 2, Python 3 or both. 
'Programming Language :: Python :: 2.6', 'Programming Language :: Python :: 2.7', + 'Programming Language :: Python :: 3.6', + 'Programming Language :: Python :: 3.7', + 'Programming Language :: Python :: 3.8', 'Environment :: Console', 'Intended Audience :: Developers', @@ -58,12 +62,13 @@ # You can just specify the packages manually here if your project is # simple. Or you can use find_packages(). packages=find_packages(exclude=['contrib', 'docs', 'tests*']), + namespace_packages=['ckanext'], install_requires=[ - # CKAN extensions should not list dependencies here, but in a separate - # ``requirements.txt`` file. - # - # http://docs.ckan.org/en/latest/extensions/best-practices.html#add-third-party-libraries-to-requirements-txt + # CKAN extensions should not list dependencies here, but in a separate + # ``requirements.txt`` file. + # + # http://docs.ckan.org/en/latest/extensions/best-practices.html#add-third-party-libraries-to-requirements-txt ], # If there are data files included in your packages that need to be @@ -108,7 +113,5 @@ ('**.js', 'javascript', None), ('**/templates/**.html', 'ckan', None), ], - }, - - namespace_packages=['ckanext'], + } ) From 6f87680e02d233020d3c09d0fb1c86ec1b492e61 Mon Sep 17 00:00:00 2001 From: antuarc Date: Fri, 16 Sep 2022 10:17:14 +1000 Subject: [PATCH 05/11] store time internally as UTC so we can render it consistently, #2 --- ckanext/archiver/bin/running_stats.py | 4 ++-- ckanext/archiver/model.py | 2 +- ckanext/archiver/tasks.py | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/ckanext/archiver/bin/running_stats.py b/ckanext/archiver/bin/running_stats.py index 6acc2722..3a031c81 100644 --- a/ckanext/archiver/bin/running_stats.py +++ b/ckanext/archiver/bin/running_stats.py @@ -43,7 +43,7 @@ class StatsCount(dict): report_value_limit = 150 def __init__(self, *args, **kwargs): - self._start_time = datetime.datetime.now() + self._start_time = datetime.datetime.utcnow() super(StatsCount, self).__init__(*args, 
**kwargs) def _init_category(self, category): @@ -81,7 +81,7 @@ def report(self, indent=1, order_by_title=False, show_time_taken=True): lines = [indent_str + 'None'] if show_time_taken: - time_taken = datetime.datetime.now() - self._start_time + time_taken = datetime.datetime.utcnow() - self._start_time lines.append(indent_str + 'Time taken (h:m:s): %s' % time_taken) return '\n'.join(lines) diff --git a/ckanext/archiver/model.py b/ckanext/archiver/model.py index 14ff8827..7635f950 100644 --- a/ckanext/archiver/model.py +++ b/ckanext/archiver/model.py @@ -112,7 +112,7 @@ class Archival(Base): last_success = Column(types.DateTime) failure_count = Column(types.Integer, default=0) - created = Column(types.DateTime, default=datetime.now) + created = Column(types.DateTime, default=datetime.utcnow) updated = Column(types.DateTime) def __repr__(self): diff --git a/ckanext/archiver/tasks.py b/ckanext/archiver/tasks.py index 0146a03a..bcc80653 100644 --- a/ckanext/archiver/tasks.py +++ b/ckanext/archiver/tasks.py @@ -711,7 +711,7 @@ def save_archival(resource, status_id, reason, url_redirected_to, May propagate a CkanError. 
''' - now = datetime.datetime.now() + now = datetime.datetime.utcnow() archival = Archival.get_for_resource(resource['id']) first_archival = not archival From ba731ece613589fa9f3b4de179c0ef674ce9f890 Mon Sep 17 00:00:00 2001 From: antuarc Date: Fri, 16 Sep 2022 11:12:05 +1000 Subject: [PATCH 06/11] use helper function to look up status, #2 - This allows us to easily apply null-checks to every lookup --- ckanext/archiver/model.py | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/ckanext/archiver/model.py b/ckanext/archiver/model.py index 7635f950..9358baaf 100644 --- a/ckanext/archiver/model.py +++ b/ckanext/archiver/model.py @@ -80,6 +80,12 @@ def is_ok(cls, status_id): False: 'Downloaded OK'} +def _get_status_by_id(status_id): + if status_id is None: + return None + return Status.by_id(status_id) + + class Archival(Base): """ Details of the archival of resources. Has the filepath for successfully @@ -151,9 +157,7 @@ def create(cls, resource_id): @property def status(self): - if self.status_id is None: - return None - return Status.by_id(self.status_id) + return _get_status_by_id(self.status_id) def as_dict(self): context = {'model': model} @@ -186,7 +190,7 @@ def aggregate_archivals_for_a_dataset(archivals): archival_dict['reason'] = archival.reason if archivals: - archival_dict['status'] = Status.by_id(archival_dict['status_id']) + archival_dict['status'] = _get_status_by_id(archival_dict['status_id']) archival_dict['is_broken'] = \ Status.is_status_broken(archival_dict['status_id']) return archival_dict From a2888d0c1770c2eb360bc8c2bc6709ebb3fa1e1c Mon Sep 17 00:00:00 2001 From: antuarc Date: Fri, 16 Sep 2022 11:24:24 +1000 Subject: [PATCH 07/11] clean up more imports, #2 --- ckanext/archiver/tasks.py | 2 +- ckanext/archiver/tests/test_archiver.py | 4 +++- ckanext/archiver/utils.py | 10 ++++------ 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/ckanext/archiver/tasks.py b/ckanext/archiver/tasks.py index 
bcc80653..b817606b 100644 --- a/ckanext/archiver/tasks.py +++ b/ckanext/archiver/tasks.py @@ -39,7 +39,7 @@ USER_AGENT = 'ckanext-archiver' # CKAN 2.7 introduces new jobs system -if p.toolkit.check_ckan_version(max_version='2.6.99'): +if toolkit.check_ckan_version(max_version='2.6.99'): from ckan.lib.celery_app import celery @celery.task(name="archiver.update_resource") diff --git a/ckanext/archiver/tests/test_archiver.py b/ckanext/archiver/tests/test_archiver.py index bf326df8..a06f4715 100644 --- a/ckanext/archiver/tests/test_archiver.py +++ b/ckanext/archiver/tests/test_archiver.py @@ -1,9 +1,11 @@ +# encoding: utf-8 + from __future__ import print_function +import json import logging import os import shutil import tempfile -import json from future.moves.urllib.parse import quote_plus from ckan.plugins.toolkit import config diff --git a/ckanext/archiver/utils.py b/ckanext/archiver/utils.py index be681d86..46470a4a 100644 --- a/ckanext/archiver/utils.py +++ b/ckanext/archiver/utils.py @@ -9,9 +9,7 @@ import sys from time import sleep -import ckan.plugins as p -from ckan.plugins.toolkit import config - +from ckan.plugins.toolkit import check_ckan_version, config try: from collections import OrderedDict # from python 2.7 except ImportError: @@ -106,7 +104,7 @@ def _get_packages_and_resources_in_args(identifiers, queue): log.info('Queue: %s', queue) for package in packages: - if p.toolkit.check_ckan_version(max_version='2.2.99'): + if check_ckan_version(max_version='2.2.99'): # earlier CKANs had ResourceGroup pkg_resources = \ [resource for resource in @@ -122,7 +120,7 @@ def _get_packages_and_resources_in_args(identifiers, queue): yield package, True, len(pkg_resources), None for resource in resources: - if p.toolkit.check_ckan_version(max_version='2.2.99'): + if check_ckan_version(max_version='2.2.99'): package = resource.resource_group.package else: package = resource.package @@ -377,7 +375,7 @@ def migrate_archiver_dirs(): # check the package isn't deleted # Need 
to refresh the resource's session resource = model.Session.query(model.Resource).get(resource.id) - if p.toolkit.check_ckan_version(max_version='2.2.99'): + if check_ckan_version(max_version='2.2.99'): package = None if resource.resource_group: package = resource.resource_group.package From 1f875cf4b77098a3fdf4ad6b4001dc6ef9549c0c Mon Sep 17 00:00:00 2001 From: antuarc Date: Fri, 16 Sep 2022 11:30:50 +1000 Subject: [PATCH 08/11] use helper functions to simplify tests, #2 --- ckanext/archiver/tests/mock_remote_server.py | 75 +++++++++++--------- ckanext/archiver/tests/test_archiver.py | 22 +++--- 2 files changed, 56 insertions(+), 41 deletions(-) diff --git a/ckanext/archiver/tests/mock_remote_server.py b/ckanext/archiver/tests/mock_remote_server.py index 0355907d..0a3b988f 100644 --- a/ckanext/archiver/tests/mock_remote_server.py +++ b/ckanext/archiver/tests/mock_remote_server.py @@ -6,6 +6,7 @@ from builtins import range from builtins import object from contextlib import contextmanager +from http.client import responses from threading import Thread from time import sleep from wsgiref.simple_server import make_server @@ -15,6 +16,16 @@ from functools import reduce +def _get_str_params(request): + """ Get parameters from the request. If 'str_params' is available, + use that, otherwise just use 'params'. + """ + if hasattr(request, 'str_params'): + return request.str_params + else: + return request.params + + class MockHTTPServer(object): """ Mock HTTP server that can take the place of a remote server for testing @@ -42,7 +53,7 @@ def serve(self, host='localhost', port_range=(8000, 9000)): This uses context manager to make sure the server is stopped:: >>> with MockTestServer().serve() as addr: - ... print urlopen('%s/?content=hello+world').read() + ... print(urlopen('%s/?content=hello+world').read()) ... 
'hello world' """ @@ -112,27 +123,26 @@ class MockEchoTestServer(MockHTTPServer): def __call__(self, environ, start_response): - from http.client import responses from webob import Request request = Request(environ) - status = int(request.str_params.get('status', '200')) - # if 'redirect' in redirect.str_params: - # params = dict([(key, value) for param in request.str_params \ + status = int(_get_str_params(request).get('status', '200')) + # if 'redirect' in _get_str_params(redirect): + # params = dict([(key, value) for param in _get_str_params(request) \ # if key != 'redirect']) - # redirect_status = int(request.str_params['redirect']) - # status = int(request.str_params.get('status', '200')) + # redirect_status = int(_get_str_params(request)['redirect']) + # status = int(_get_str_params(request).get('status', '200')) # resp = make_response(render_template('error.html'), redirect_status) # resp.headers['Location'] = url_for(request.path, params) # return resp - if 'content_var' in request.str_params: - content = request.str_params.get('content_var') + if 'content_var' in _get_str_params(request): + content = _get_str_params(request).get('content_var') content = self.get_content(content) - elif 'content_long' in request.str_params: + elif 'content_long' in _get_str_params(request): content = '*' * 1000001 else: - content = request.str_params.get('content', '') - if 'method' in request.str_params \ - and request.method.lower() != request.str_params['method'].lower(): + content = _get_str_params(request).get('content', '') + if 'method' in _get_str_params(request) \ + and request.method.lower() != _get_str_params(request)['method'].lower(): content = '' status = 405 @@ -141,14 +151,17 @@ def __call__(self, environ, start_response): headers = [ item - for item in list(request.str_params.items()) + for item in _get_str_params(request).items() if item[0] not in ('content', 'status') ] - if 'length' in request.str_params: - cl = request.str_params.get('length') + if 
'length' in _get_str_params(request): + cl = _get_str_params(request).get('length') headers += [('Content-Length', cl)] - elif content and 'no-content-length' not in request.str_params: - headers += [('Content-Length', bytes(len(content)))] + elif content and 'no-content-length' not in _get_str_params(request): + # Python 2 with old WebOb wants bytes, + # Python 3 with new WebOb wants text, + # so both want 'str' + headers += [('Content-Length', str(len(content)))] start_response( '%d %s' % (status, responses[status]), headers @@ -187,20 +200,19 @@ def __init__(self, wms_version='1.3'): super(MockWmsServer, self).__init__() def __call__(self, environ, start_response): - from http.client import responses from webob import Request request = Request(environ) - status = int(request.str_params.get('status', '200')) - headers = {'Content-Type': 'text/plain'} + status = int(_get_str_params(request).get('status', '200')) + headers = [('Content-Type', 'text/plain')] # e.g. params ?service=WMS&request=GetCapabilities&version=1.1.1 - if request.str_params.get('service') != 'WMS': + if _get_str_params(request).get('service') != 'WMS': status = 200 content = ERROR_WRONG_SERVICE - elif request.str_params.get('request') != 'GetCapabilities': + elif _get_str_params(request).get('request') != 'GetCapabilities': status = 405 content = '"request" param wrong' - elif 'version' in request.str_params and \ - request.str_params.get('version') != self.wms_version: + elif 'version' in _get_str_params(request) and \ + _get_str_params(request).get('version') != self.wms_version: status = 405 content = '"version" not compatible - need to be %s' % self.wms_version elif self.wms_version == '1.1.1': @@ -211,7 +223,7 @@ def __call__(self, environ, start_response): content = get_file_content('wms_getcap_1.3.xml') start_response( '%d %s' % (status, responses[status]), - list(headers.items()) + headers ) return [content] @@ -223,16 +235,15 @@ def __init__(self): super(MockWfsServer, self).__init__() 
def __call__(self, environ, start_response): - from http.client import responses from webob import Request request = Request(environ) - status = int(request.str_params.get('status', '200')) - headers = {'Content-Type': 'text/plain'} + status = int(_get_str_params(request).get('status', '200')) + headers = [('Content-Type', 'text/plain')] # e.g. params ?service=WFS&request=GetCapabilities - if request.str_params.get('service') != 'WFS': + if _get_str_params(request).get('service') != 'WFS': status = 200 content = ERROR_WRONG_SERVICE - elif request.str_params.get('request') != 'GetCapabilities': + elif _get_str_params(request).get('request') != 'GetCapabilities': status = 405 content = '"request" param wrong' else: @@ -240,7 +251,7 @@ def __call__(self, environ, start_response): content = get_file_content('wfs_getcap.xml') start_response( '%d %s' % (status, responses[status]), - list(headers.items()) + headers ) return [content] diff --git a/ckanext/archiver/tests/test_archiver.py b/ckanext/archiver/tests/test_archiver.py index a06f4715..95503b40 100644 --- a/ckanext/archiver/tests/test_archiver.py +++ b/ckanext/archiver/tests/test_archiver.py @@ -13,7 +13,6 @@ from ckan import model from ckan import plugins -from ckan.logic import get_action from ckan.tests import factories as ckan_factories from ckanext.archiver import model as archiver_model @@ -187,7 +186,7 @@ def test_bad_url(self): def test_resource_hash_and_content_length(self, client): url = client + '/?status=200&content=test&content-type=csv' res_id = self._test_resource(url)['id'] - result = json.loads(update_resource(res_id)) + result = self._get_update_resource_json(res_id) assert result['size'] == len('test') from hashlib import sha1 assert result['hash'] == sha1('test'.encode('utf-8')).hexdigest(), result @@ -196,7 +195,7 @@ def test_resource_hash_and_content_length(self, client): def test_archived_file(self, client): url = client + '/?status=200&content=test&content-type=csv' res_id = 
self._test_resource(url)['id'] - result = json.loads(update_resource(res_id)) + result = self._get_update_resource_json(res_id) assert result['cache_filepath'] assert os.path.exists(result['cache_filepath']) @@ -211,14 +210,14 @@ def test_archived_file(self, client): def test_update_url_with_unknown_content_type(self, client): url = client + '/?content-type=application/foo&content=test' res_id = self._test_resource(url, format='foo')['id'] # format has no effect - result = json.loads(update_resource(res_id)) + result = self._get_update_resource_json(res_id) assert result, result assert result['mimetype'] == 'application/foo' # stored from the header def test_wms_1_3(self, client): url = client + '/WMS_1_3/' res_id = self._test_resource(url)['id'] - result = json.loads(update_resource(res_id)) + result = self._get_update_resource_json(res_id) assert result, result assert result['request_type'] == 'WMS 1.3' @@ -268,7 +267,7 @@ def test_file_too_large_2(self, client): def test_content_length_not_integer(self, client): url = client + '/?status=200&content=content&length=abc&content-type=csv' res_id = self._test_resource(url)['id'] - result = json.loads(update_resource(res_id)) + result = self._get_update_resource_json(res_id) assert result, result def test_content_length_repeated(self, client): @@ -276,7 +275,7 @@ def test_content_length_repeated(self, client): # listing the Content-Length header twice causes requests to # store the value as a comma-separated list res_id = self._test_resource(url)['id'] - result = json.loads(update_resource(res_id)) + result = self._get_update_resource_json(res_id) assert result, result def test_url_with_30x_follows_and_records_redirect(self, client): @@ -284,7 +283,7 @@ def test_url_with_30x_follows_and_records_redirect(self, client): redirect_url = url + u'?status=200&content=test&content-type=text/csv' url += u'?status=301&location=%s' % quote_plus(redirect_url) res_id = self._test_resource(url)['id'] - result = 
json.loads(update_resource(res_id)) + result = self._get_update_resource_json(res_id) assert result assert result['url_redirected_to'] == redirect_url @@ -329,6 +328,11 @@ def test_ipipe_notified_dataset(self, client): assert params.get('package_id') == pkg['id'] assert params.get('resource_id') is None + def _get_update_resource_json(self, id): + result = update_resource(resource_id=id) + assert result, "update_resource returned: {}".format(result) + return json.loads(result) + class TestDownload: '''Tests of the download method (and things it calls). @@ -348,7 +352,7 @@ def _test_resource(self, url, format=None): pkg = {'name': 'testpkg', 'resources': [ {'url': url, 'format': format or 'TXT', 'description': 'Test'} ]} - pkg = get_action('package_create')(context, pkg) + pkg = ckan_factories.Dataset(**pkg) return pkg['resources'][0] def test_head_unsupported(self, client): From 91f7ee04d81816a492ddad5818c1c8bb0b0bdccf Mon Sep 17 00:00:00 2001 From: antuarc Date: Fri, 16 Sep 2022 11:34:14 +1000 Subject: [PATCH 09/11] document the 'ckan.download_proxy' option, #2 --- README.rst | 1 + ckanext/archiver/tests/test_archiver.py | 1 - 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index f0b7c5c4..ca595944 100644 --- a/README.rst +++ b/README.rst @@ -244,6 +244,7 @@ Config settings * ``ckanext-archiver.max_content_length`` = the maximum size (in bytes) of files to archive (default ``50000000`` =50MB) * ``ckanext-archiver.user_agent_string`` = identifies the archiver to servers it archives from * ``ckanext-archiver.verify_https`` = true/false whether you want to verify https connections and therefore fail if it is specified in the URL but does not verify. + * ``ckan.download_proxy`` = URL to a HTTP/S proxy server that will be used to download resources. 4. 
Nightly report generation diff --git a/ckanext/archiver/tests/test_archiver.py b/ckanext/archiver/tests/test_archiver.py index 95503b40..95aefb38 100644 --- a/ckanext/archiver/tests/test_archiver.py +++ b/ckanext/archiver/tests/test_archiver.py @@ -348,7 +348,6 @@ def initialData(cls, clean_db): } def _test_resource(self, url, format=None): - context = {'model': model, 'ignore_auth': True, 'session': model.Session, 'user': 'test'} pkg = {'name': 'testpkg', 'resources': [ {'url': url, 'format': format or 'TXT', 'description': 'Test'} ]} From 76b52ae0cd7b675aad6d9ce9e9b50821bd22c3af Mon Sep 17 00:00:00 2001 From: antuarc Date: Fri, 16 Sep 2022 11:37:24 +1000 Subject: [PATCH 10/11] add Flake8 config file, #2 --- .flake8 | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 .flake8 diff --git a/.flake8 b/.flake8 new file mode 100644 index 00000000..9a0d11a5 --- /dev/null +++ b/.flake8 @@ -0,0 +1,20 @@ +[flake8] +# @see https://flake8.pycqa.org/en/latest/user/configuration.html?highlight=.flake8 + +exclude = + ckan + .git + +# Extended output format. +format = pylint + +# Show the source of errors. +show_source = True + +max-complexity = 10 +max-line-length = 127 + +# List ignore rules one per line. 
+ignore = + C901 + W503 From 56519f44c0f362d2d0a0020bc4b55a82306a6f34 Mon Sep 17 00:00:00 2001 From: ThrawnCA Date: Thu, 6 Oct 2022 13:15:25 +1000 Subject: [PATCH 11/11] drop Keitaro-specific files, #2 --- CODE_OF_CONDUCT.md | 76 ---------------------------------------------- CONTRIBUTING.md | 27 ---------------- 2 files changed, 103 deletions(-) delete mode 100644 CODE_OF_CONDUCT.md delete mode 100644 CONTRIBUTING.md diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md deleted file mode 100644 index 04b029f9..00000000 --- a/CODE_OF_CONDUCT.md +++ /dev/null @@ -1,76 +0,0 @@ -# Contributor Covenant Code of Conduct - -## Our Pledge - -In the interest of fostering an open and welcoming environment, we as -contributors and maintainers pledge to making participation in our project and -our community a harassment-free experience for everyone, regardless of age, body -size, disability, ethnicity, sex characteristics, gender identity and expression, -level of experience, education, socio-economic status, nationality, personal -appearance, race, religion, or sexual identity and orientation. 
- -## Our Standards - -Examples of behavior that contributes to creating a positive environment -include: - -* Using welcoming and inclusive language -* Being respectful of differing viewpoints and experiences -* Gracefully accepting constructive criticism -* Focusing on what is best for the community -* Showing empathy towards other community members - -Examples of unacceptable behavior by participants include: - -* The use of sexualized language or imagery and unwelcome sexual attention or - advances -* Trolling, insulting/derogatory comments, and personal or political attacks -* Public or private harassment -* Publishing others' private information, such as a physical or electronic - address, without explicit permission -* Other conduct which could reasonably be considered inappropriate in a - professional setting - -## Our Responsibilities - -Project maintainers are responsible for clarifying the standards of acceptable -behavior and are expected to take appropriate and fair corrective action in -response to any instances of unacceptable behavior. - -Project maintainers have the right and responsibility to remove, edit, or -reject comments, commits, code, wiki edits, issues, and other contributions -that are not aligned to this Code of Conduct, or to ban temporarily or -permanently any contributor for other behaviors that they deem inappropriate, -threatening, offensive, or harmful. - -## Scope - -This Code of Conduct applies both within project spaces and in public spaces -when an individual is representing the project or its community. Examples of -representing a project or community include using an official project e-mail -address, posting via an official social media account, or acting as an appointed -representative at an online or offline event. Representation of a project may be -further defined and clarified by project maintainers. 
- -## Enforcement - -Instances of abusive, harassing, or otherwise unacceptable behavior may be -reported by contacting the project team at info@keitaro.com. All -complaints will be reviewed and investigated and will result in a response that -is deemed necessary and appropriate to the circumstances. The project team is -obligated to maintain confidentiality with regard to the reporter of an incident. -Further details of specific enforcement policies may be posted separately. - -Project maintainers who do not follow or enforce the Code of Conduct in good -faith may face temporary or permanent repercussions as determined by other -members of the project's leadership. - -## Attribution - -This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, -available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html - -[homepage]: https://www.contributor-covenant.org - -For answers to common questions about this code of conduct, see -https://www.contributor-covenant.org/faq diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md deleted file mode 100644 index 234d2591..00000000 --- a/CONTRIBUTING.md +++ /dev/null @@ -1,27 +0,0 @@ -## How to contribute to ckanext-archiver - -#### **Did you find a bug?** - -* **Ensure the bug was not already reported** by searching on GitHub under [Issues](https://github.com/keitaroinc/ckanext-archiver/issues). - -* If you're unable to find an open issue addressing the problem, [open a new one](https://github.com/keitaroinc/ckanext-archiver/issues/new). Be sure to include a **title and clear description**, as much relevant information as possible, we include an issue template to help out in filling-in the issue. - -#### **Did you write a patch that fixes a bug?** - -* Open a new GitHub pull request with the patch. - -* Ensure the PR description clearly describes the problem and solution. Include the relevant issue number if applicable. 
- -#### **Do you intend to add a new feature or change an existing one?** - -* [Create a new feature issue](https://github.com/keitaroinc/ckanext-archiver/issues/new) using the Feature Request template and describe your proposed changes - -* Submit a pull request referring to the relevant feature issue/s - -#### **Do you have questions about the source code?** - -* Ask any question about how to use ckanext-archiver in our [gitter chat](https://gitter.im/keitaroinc/ckan). - -Thanks! - -Keitaro Team