Merged
Changes from 9 commits
Commits
28 commits
0750e91
feature/union-data
fivetran-catfritz Oct 16, 2025
5e7b061
add new files
fivetran-catfritz Oct 16, 2025
8731319
fixes
fivetran-catfritz Oct 16, 2025
7996d58
adjust staging
fivetran-catfritz Oct 16, 2025
edb9fb5
update tests
fivetran-catfritz Oct 16, 2025
6e48fb4
fixes
fivetran-catfritz Oct 17, 2025
0847422
changelog
fivetran-catfritz Oct 17, 2025
b80a7d6
update docs
fivetran-catfritz Oct 17, 2025
5ee26fb
Generate dbt docs via GitHub Actions
github-actions[bot] Oct 17, 2025
66af67a
update tests
fivetran-catfritz Oct 20, 2025
d374b1c
update tests
fivetran-catfritz Oct 20, 2025
49031b5
changelog
fivetran-catfritz Oct 20, 2025
b866815
one more test update
fivetran-catfritz Oct 20, 2025
5393797
update union_connections
fivetran-catfritz Oct 21, 2025
fd38316
update union_connections
fivetran-catfritz Oct 21, 2025
1315d4a
update union_connections
fivetran-catfritz Oct 22, 2025
5efbf7a
update github_union_connections
fivetran-catfritz Oct 22, 2025
210b65c
put back the thing
fivetran-catfritz Oct 22, 2025
a66527a
Apply suggestions from code review
fivetran-catfritz Oct 22, 2025
453f703
update source enablement
fivetran-catfritz Oct 22, 2025
b3f000c
changelog
fivetran-catfritz Oct 23, 2025
7712c9c
changelog
fivetran-catfritz Oct 23, 2025
5aab18a
Apply suggestions from code review
fivetran-catfritz Oct 23, 2025
05653d2
update src configs
fivetran-catfritz Oct 23, 2025
6139bcd
Update CHANGELOG.md
fivetran-catfritz Oct 24, 2025
a5f4825
formatting
fivetran-catfritz Oct 27, 2025
b2ee9bf
Merge branch 'feature/union-data' of https://github.com/fivetran/dbt_…
fivetran-catfritz Oct 27, 2025
7ec576b
Generate dbt docs via GitHub Actions
github-actions[bot] Oct 27, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -71,3 +71,5 @@ env/
env.bak/
venv/
venv.bak/

CLAUDE.md
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,15 @@
# dbt_github v1.1.0

## Schema/Data Change
**1 total change • 0 possible breaking changes**

| Data Model(s) | Change type | Old | New | Notes |
| ------------- | ----------- | ----| --- | ----- |
| All models | New column | | `source_relation` | Identifies the source connection when using multiple GitHub connections |

## Feature Update
- **Union Data Functionality**: This release supports running the package on multiple GitHub source connections. See the [README](https://github.com/fivetran/dbt_github/tree/main?tab=readme-ov-file#step-3-define-database-and-schema-variables) for details on how to leverage this feature.

# dbt_github v1.0.0

[PR #67](https://github.com/fivetran/dbt_github/pull/67) includes the following updates:
60 changes: 58 additions & 2 deletions README.md
@@ -60,19 +60,75 @@ Include the following github package version in your `packages.yml` file.
```yaml
packages:
  - package: fivetran/github
-    version: [">=1.0.0", "<1.1.0"] # we recommend using ranges to capture non-breaking changes automatically
+    version: [">=1.1.0", "<1.2.0"] # we recommend using ranges to capture non-breaking changes automatically
```

> All required sources and staging models are now bundled into this transformation package. Do not include `fivetran/github_source` in your `packages.yml` since this package has been deprecated.

### Step 3: Define database and schema variables

#### Option A: Single connection
By default, this package runs using your [destination](https://docs.getdbt.com/docs/running-a-dbt-project/using-the-command-line-interface/configure-your-profile) and the `github` schema. If this is not where your GitHub data is (for example, if your github schema is named `github_fivetran`), add the following configuration to your root `dbt_project.yml` file:

```yml
vars:
  github:
    github_database: your_database_name
    github_schema: your_schema_name
```

#### Option B: Union multiple connections
If you have multiple GitHub connections in Fivetran and would like to use this package on all of them simultaneously, we have provided functionality to do so. For each source table, the package will union all of the data together and pass the unioned table into the transformations. The `source_relation` column in each model indicates the origin of each record.

To use this functionality, set the `github_sources` variable in your root `dbt_project.yml` file:

```yml
# dbt_project.yml

vars:
  github:
    github_sources:
      - database: connection_1_destination_name # Required
        schema: connection_1_schema_name # Required
        name: connection_1_source_name # Required only if you follow the step in the next subsection

      - database: connection_2_destination_name
        schema: connection_2_schema_name
        name: connection_2_source_name
```
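
To make the unioning behavior concrete, here is a minimal Python sketch (not part of the package) of what happens to one source table across two connections. The sample rows and the `union_with_source_relation` helper are hypothetical. The `database.schema` format of `source_relation` is taken from the `github_union_relations` macro in this PR.

```python
# Illustrative only: mimics the effect of unioning one source table
# across several GitHub connections. The helper and rows are made up.

def union_with_source_relation(tables_by_connection):
    """Stack rows from every connection and tag each row with its origin."""
    unioned = []
    for (database, schema), rows in tables_by_connection.items():
        for row in rows:
            tagged = dict(row)  # copy so the input rows are left untouched
            tagged["source_relation"] = f"{database}.{schema}"
            unioned.append(tagged)
    return unioned


# The same issue_id exists in both connections.
issue_tables = {
    ("connection_1_destination_name", "connection_1_schema_name"): [{"issue_id": 1}],
    ("connection_2_destination_name", "connection_2_schema_name"): [{"issue_id": 1}],
}

rows = union_with_source_relation(issue_tables)
for row in rows:
    print(row)
```

Because the same `issue_id` can appear in more than one connection, downstream logic would likely need to treat `source_relation` as part of the key when joining or deduplicating.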

##### Recommended: Incorporate unioned sources into DAG
> *If you are running the package through [Fivetran Transformations for dbt Core™](https://fivetran.com/docs/transformations/dbt#transformationsfordbtcore), the below step is necessary in order to synchronize model runs with your GitHub connections. Alternatively, you may choose to run the package through Fivetran [Quickstart](https://fivetran.com/docs/transformations/quickstart), which would create separate sets of models for each GitHub source rather than one set of unioned models.*

By default, this package defines one single-connection source, called `github`, which will be disabled if you are unioning multiple connections. This means that your DAG will not include your GitHub sources, though the package will run successfully.

To properly incorporate all of your GitHub connections into your project's DAG:
1. Define each of your sources in a `.yml` file in your project. Utilize the following template for the `source`-level configurations, and, **most importantly**, copy and paste the table and column-level definitions from the package's `src_github.yml` [file](https://github.com/fivetran/dbt_github/blob/main/models/staging/src_github.yml).

```yml
# a .yml file in your root project
sources:
  - name: <name> # ex: Should match name in github_sources
    schema: <schema_name>
    database: <database_name>
    loader: fivetran
    loaded_at_field: _fivetran_synced

    freshness: # feel free to adjust to your liking
      warn_after: {count: 72, period: hour}
      error_after: {count: 168, period: hour}

    tables: # copy and paste from github/models/staging/src_github.yml - see https://support.atlassian.com/bitbucket-cloud/docs/yaml-anchors/ for how to use anchors to only do so once
```

> **Note**: If there are source tables you do not have (see [Step 4](https://github.com/fivetran/dbt_github?tab=readme-ov-file#step-4-disable-models-for-non-existent-sources)), you may still include them, as long as you have set the right variables to `False`. Otherwise, you may remove them from your source definition.

2. Set the `has_defined_sources` variable (scoped to the `github` package) to `True`, as follows:
```yml
# dbt_project.yml
vars:
  github:
    has_defined_sources: true
```

### Step 4: Disable models for non-existent sources
3 changes: 2 additions & 1 deletion dbt_project.yml
@@ -1,6 +1,6 @@
config-version: 2
name: 'github'
- version: '1.0.0'
+ version: '1.1.0'
require-dbt-version: [">=1.3.0", "<2.0.0"]
models:
github:
@@ -15,6 +15,7 @@ models:
+materialized: view
vars:
  github:
    github_sources: []
    issue_assignee: "{{ source('github', 'issue_assignee') }}"
    issue_closed_history: "{{ source('github', 'issue_closed_history') }}"
    issue_comment: "{{ source('github', 'issue_comment') }}"
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion integration_tests/dbt_project.yml
@@ -1,5 +1,5 @@
name: 'github_integration_tests'
- version: '1.0.0'
+ version: '1.1.0'
config-version: 2
profile: 'integration_tests'
vars:
15 changes: 15 additions & 0 deletions macros/union/apply_source_relation.sql
@@ -0,0 +1,15 @@
{% macro apply_source_relation() -%}

{{ adapter.dispatch('apply_source_relation', 'github') () }}

{%- endmacro %}

{% macro default__apply_source_relation() -%}

{% if var('github_sources', []) != [] %}
, _dbt_source_relation as source_relation
{% else %}
, '{{ var("github_database", target.database) }}' || '.'|| '{{ var("github_schema", "github") }}' as source_relation
{% endif %}

{%- endmacro %}
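
The branching in `default__apply_source_relation` can be sketched in plain Python to show which SQL fragment is emitted in each mode. This is a hypothetical re-implementation for illustration only: the `variables` dict stands in for dbt's `var()` lookup and `target_database` stands in for `target.database`.

```python
# Illustrative re-implementation of the macro's branching, not dbt code.

def apply_source_relation(variables, target_database):
    """Return the SQL fragment the macro would render for the given vars."""
    if variables.get("github_sources", []) != []:
        # Union mode: reuse the column produced by the union macro.
        return ", _dbt_source_relation as source_relation"
    # Single-connection mode: build a literal from database and schema.
    database = variables.get("github_database", target_database)
    schema = variables.get("github_schema", "github")
    return f", '{database}' || '.'|| '{schema}' as source_relation"


single = apply_source_relation({}, "analytics")
unioned = apply_source_relation(
    {"github_sources": [{"database": "db1", "schema": "gh_a"}]}, "analytics"
)
print(single)
print(unioned)
```

In both modes the resulting `source_relation` value has the form `database.schema`, so downstream models can treat the column the same way whether or not unioning is enabled.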
76 changes: 76 additions & 0 deletions macros/union/github_union_connections.sql
@@ -0,0 +1,76 @@
{% macro github_union_connections(connection_dictionary, single_source_name, single_table_name, default_identifier) %}

Contributor


Should we remove the default_identifier argument? It was necessary for Netsuite since the Netsuite1 vs Netsuite2 names are slightly different (which is why the table names != the source table names), but here the single_table_name and default_identifier are always the same

I don't think it's necessary for the other packages we plan to roll union_data out to as well (I think)

Contributor Author


From standup, we decided to use default_identifier=single_table_name so we can keep this argument available for other packages to use if necessary. We'll store macros like this in a central area, but just for reference and not as a package.


{{ adapter.dispatch('github_union_connections', 'github') (connection_dictionary, single_source_name, single_table_name, default_identifier) }}

{%- endmacro %}

{% macro default__github_union_connections(connection_dictionary, single_source_name, single_table_name, default_identifier) %}

{% if connection_dictionary %}
{# For unioning #}
{%- set relations = [] -%}
{%- for connection in connection_dictionary -%}

{%- set relation=adapter.get_relation(
database=source(connection.name, single_table_name).database,
schema=source(connection.name, single_table_name).schema,
identifier=source(connection.name, single_table_name).identifier)
if var('has_defined_sources', false)

else adapter.get_relation(
database=connection.database if connection.database else target.database,
schema=connection.schema if connection.schema else single_source_name,
identifier=default_identifier
)
-%}

{%- if relation is not none -%}
{%- do relations.append(relation) -%}
{%- endif -%}

{%- endfor -%}

{%- if relations != [] -%}
{{ github.github_union_relations(relations) }}
{%- else -%}
{% if execute and not var('fivetran__remove_empty_table_warnings', false) -%}
{{ exceptions.warn("\n\nPlease be aware: The " ~ single_source_name ~ "." ~ single_table_name ~ " table was not found in your schema(s). The Fivetran Data Model will create a completely empty staging model as to not break downstream transformations. To turn off these warnings, set the `fivetran__remove_empty_table_warnings` variable to TRUE (see https://github.com/fivetran/dbt_fivetran_utils/tree/releases/v0.4.latest#union_data-source for details).\n") }}
{% endif -%}
select
cast(null as {{ dbt.type_string() }}) as _dbt_source_relation
limit {{ '0' if target.type != 'redshift' else '1' }}
{%- endif -%}

{% else %}
{# Not unioning #}

{% set identifier_var = single_source_name + "_" + single_table_name + "_identifier"%}

{%- set relation=adapter.get_relation(
database=source(single_source_name, single_table_name).database,
schema=source(single_source_name, single_table_name).schema,
identifier=source(single_source_name, single_table_name).identifier
) -%}
-- ** Values passed to adapter.get_relation:
{{ '-- full-identifier_var: ' ~ identifier_var }}
{{ '-- database: ' ~ source(single_source_name, single_table_name).database }}
{{ '-- schema: ' ~ source(single_source_name, single_table_name).schema }}
{{ '-- identifier: ' ~ source(single_source_name, single_table_name).identifier }}

{% if relation is not none -%}
select
{{ dbt_utils.star(from=source(single_source_name, single_table_name)) }}
from {{ source(single_source_name, single_table_name) }} as source_table

{% else %}
{% if execute and not var('fivetran__remove_empty_table_warnings', false) -%}
{{ exceptions.warn("\n\nPlease be aware: The " ~ single_source_name|upper ~ "." ~ single_table_name|upper ~ " table was not found in your schema(s). The Fivetran Data Model will create a completely empty staging model as to not break downstream transformations. To turn off these warnings, set the `fivetran__remove_empty_table_warnings` variable to TRUE (see https://github.com/fivetran/dbt_fivetran_utils/tree/releases/v0.4.latest#union_data-source for details).\n") }}
{% endif -%}

select
cast(null as {{ dbt.type_string() }}) as _dbt_source_relation
limit {{ '0' if target.type != 'redshift' else '1' }}
{%- endif -%}
{% endif -%}

{%- endmacro %}
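
The union branch of `default__github_union_connections` can be summarized with a small Python sketch. It covers only the case where `has_defined_sources` is false. The `existing` set is a hypothetical stand-in for `adapter.get_relation(...)` returning a relation, and all database and schema names are made up.

```python
# Illustrative only: how each connection entry is resolved to a relation.

def resolve_relations(connections, existing, target_database,
                      single_source_name, default_identifier):
    """Return (database, schema, identifier) for connections whose table exists."""
    relations = []
    for connection in connections:
        # Missing or empty values fall back the same way the macro does.
        database = connection.get("database") or target_database
        schema = connection.get("schema") or single_source_name
        candidate = (database, schema, default_identifier)
        if candidate in existing:  # stands in for a non-none get_relation result
            relations.append(candidate)
    return relations


existing_tables = {("db1", "gh_a", "issue"), ("analytics", "github", "issue")}
connections = [
    {"database": "db1", "schema": "gh_a"},  # found
    {"database": "db2", "schema": "gh_b"},  # table missing, skipped
    {},                                     # falls back to analytics.github
]

found = resolve_relations(connections, existing_tables, "analytics", "github", "issue")
print(found)
```

When the resulting list is empty, the macro warns (unless `fivetran__remove_empty_table_warnings` is set) and emits an empty `select` so downstream models still compile.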
131 changes: 131 additions & 0 deletions macros/union/github_union_relations.sql
@@ -0,0 +1,131 @@
{# Adapted from dbt_utils.union_relations() #}

{%- macro github_union_relations(relations, aliases=none, column_override=none, include=[], exclude=[], source_column_name='_dbt_source_relation', where=none) -%}
{{ return(adapter.dispatch('github_union_relations', 'github')(relations, aliases, column_override, include, exclude, source_column_name, where)) }}
{% endmacro %}

{%- macro default__github_union_relations(relations, aliases=none, column_override=none, include=[], exclude=[], source_column_name='_dbt_source_relation', where=none) -%}

{%- if exclude and include -%}
{{ exceptions.raise_compiler_error("Both an exclude and include list were provided to the `union` macro. Only one is allowed") }}
{%- endif -%}

{#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. -#}
{%- if not execute %}
{{ return('') }}
{% endif -%}

{%- set column_override = column_override if column_override is not none else {} -%}

{%- set relation_columns = {} -%}
{%- set column_superset = {} -%}
{%- set all_excludes = [] -%}
{%- set all_includes = [] -%}

{%- if exclude -%}
{%- for exc in exclude -%}
{%- do all_excludes.append(exc | lower) -%}
{%- endfor -%}
{%- endif -%}

{%- if include -%}
{%- for inc in include -%}
{%- do all_includes.append(inc | lower) -%}
{%- endfor -%}
{%- endif -%}

{%- for relation in relations -%}

{%- do relation_columns.update({relation: []}) -%}

{%- do dbt_utils._is_relation(relation, 'github_union_relations') -%}
{%- do dbt_utils._is_ephemeral(relation, 'github_union_relations') -%}
{%- set cols = adapter.get_columns_in_relation(relation) -%}
{%- for col in cols -%}

{#- If an exclude list was provided and the column is in the list, do nothing -#}
{%- if exclude and col.column | lower in all_excludes -%}

{#- If an include list was provided and the column is not in the list, do nothing -#}
{%- elif include and col.column | lower not in all_includes -%}

{#- Otherwise add the column to the column superset -#}
{%- else -%}

{#- update the list of columns in this relation -#}
{%- do relation_columns[relation].append(col.column) -%}

{%- if col.column in column_superset -%}

{%- set stored = column_superset[col.column] -%}
{%- if col.is_string() and stored.is_string() and col.string_size() > stored.string_size() -%}

{%- do column_superset.update({col.column: col}) -%}

{%- endif %}

{%- else -%}

{%- do column_superset.update({col.column: col}) -%}

{%- endif -%}

{%- endif -%}

{%- endfor -%}
{%- endfor -%}

{%- set ordered_column_names = column_superset.keys() -%}
{%- set dbt_command = flags.WHICH -%}


{% if dbt_command in ['run', 'build'] %}
{% if (include | length > 0 or exclude | length > 0) and not column_superset.keys() %}
{%- set relations_string -%}
{%- for relation in relations -%}
{{ relation.name }}
{%- if not loop.last %}, {% endif -%}
{%- endfor -%}
{%- endset -%}

{%- set error_message -%}
There were no columns found to union for relations {{ relations_string }}
{%- endset -%}

{{ exceptions.raise_compiler_error(error_message) }}
{%- endif -%}
{%- endif -%}

{%- for relation in relations %}

(
select

{%- if source_column_name is not none %}
cast({{ dbt.string_literal(relation.database ~ '.' ~ relation.schema) }} as {{ dbt.type_string() }}) as {{ source_column_name }},
{%- endif %}

{% for col_name in ordered_column_names -%}

{%- set col = column_superset[col_name] %}
{%- set col_type = column_override.get(col.column, col.data_type) %}
{%- set col_name = adapter.quote(col_name) if col_name in relation_columns[relation] else 'null' %}
cast({{ col_name }} as {{ col_type }}) as {{ col.quoted }} {% if not loop.last %},{% endif -%}

{%- endfor %}

{# This alias is the only addition made to the dbt_utils.union_relations() code. Avoids errors if the table is named a reserved keyword #}
from {{ aliases[loop.index0] if aliases else relation }} as unioned_relation_{{ loop.index }}

{% if where -%}
where {{ where }}
{%- endif %}
)

{% if not loop.last -%}
union all
{% endif -%}

{%- endfor -%}

{%- endmacro -%}
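
The core idea of `github_union_relations` is to build a superset of columns and fill gaps with `null`. The Python sketch below illustrates that idea under simplifying assumptions: column metadata is passed in as plain dicts instead of coming from `adapter.get_columns_in_relation`, the first data type seen for a column wins (the macro additionally widens string sizes), identifiers are not quoted, and `text` stands in for `dbt.type_string()`.

```python
# Illustrative only: generates union SQL from a column superset.

def build_union_sql(relations):
    """relations maps 'database.schema.table' to {column_name: data_type}."""
    # 1. Collect every column seen in any relation.
    superset = {}
    for columns in relations.values():
        for name, data_type in columns.items():
            superset.setdefault(name, data_type)

    # 2. Emit one select per relation, casting missing columns from null.
    selects = []
    for index, (relation, columns) in enumerate(relations.items(), start=1):
        database, schema, _table = relation.split(".")
        parts = [f"cast('{database}.{schema}' as text) as _dbt_source_relation"]
        for name, data_type in superset.items():
            expr = name if name in columns else "null"
            parts.append(f"cast({expr} as {data_type}) as {name}")
        selects.append(
            "(select " + ", ".join(parts)
            + f" from {relation} as unioned_relation_{index})"
        )
    return "\nunion all\n".join(selects)


sql = build_union_sql({
    "db1.gh_a.issue": {"id": "integer", "title": "text"},
    "db2.gh_b.issue": {"id": "integer"},  # no title column in this connection
})
print(sql)
```

The `unioned_relation_N` alias mirrors the one addition this PR makes to the original `dbt_utils.union_relations()` code, which avoids errors when a table is named after a reserved keyword.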
1 change: 1 addition & 0 deletions models/docs.md
@@ -0,0 +1 @@
{% docs source_relation %} Identifies the record's source. {% enddocs %}