Merged
2 changes: 1 addition & 1 deletion .github/workflows/jekyll.yml
@@ -25,7 +25,7 @@ jobs:
- name: Setup Ruby
uses: ruby/setup-ruby@v1.204.0
with:
ruby-version: '3.3'
ruby-version: '3.1'
bundler-cache: true
cache-version: 0

53 changes: 42 additions & 11 deletions _data/sidebars/main.yml
@@ -12,23 +12,54 @@ subitems:
url: /access#duration
- title: Connecting to BioShell
url: /access#connecting
subitems:
- title: Generate an SSH key
url: /access#ssh-key
- title: Submit your public key
url: /access#submit-key
- title: Connect
url: /access#connect
- title: Using BioShell
url: /using-bioshell
subitems:
- title: Choosing a flavour
url: /using-bioshell#flavours
- title: Reproducible tools
url: /using-bioshell#tools
- title: Bio-Shelley
url: /using-bioshell#bio-shelley
- title: Reference datasets
url: /using-bioshell#reference-data
url: /flavours
subitems:
- title: Flavour types
url: /flavours#flavour-types
- title: Quick sizing guide
url: /flavours#sizing-guide
- title: Worked examples
url: /flavours#worked-examples
- title: How to choose
url: /flavours#how-to-choose
- title: Tools and reference data
url: /tools
subitems:
- title: Verify CVMFS
url: /tools#cvmfs
- title: Find and install with Bio-Shelley
url: /tools#bio-shelley
- title: Load a tool
url: /tools#load-tool
- title: Reference datasets
url: /tools#reference-data
- title: Advanced — manual SHPC
url: /tools#shpc-manual
- title: Interactive environments
url: /using-bioshell#interactive
url: /interactive
subitems:
- title: JupyterLab
url: /interactive#jupyterlab
- title: RStudio
url: /interactive#rstudio
- title: Using Nextflow with CVMFS
url: /using-bioshell#nextflow
- title: Advanced — manual SHPC
url: /using-bioshell#shpc-manual
url: /nextflow
subitems:
- title: CVMFS containers
url: /nextflow#cvmfs-containers
- title: SHPC modules
url: /nextflow#shpc-module
- title: BioShell in practice
url: /community
subitems:
108 changes: 103 additions & 5 deletions pages/access.md
@@ -77,15 +77,113 @@ application.

## Connecting to BioShell {#connecting}

Once your environment is provisioned you will receive connection details by email. Connect
via SSH from your local terminal:
Once your environment is provisioned you will receive connection details by email.

### Step 1 — Generate an SSH key {#ssh-key}

BioShell uses SSH key authentication. If you do not already have an SSH key pair, generate
one on your local machine before connecting.

**Quick start** (works on macOS, Linux, and Windows with OpenSSH):

```bash
ssh-keygen -t ed25519 -C "user@example.com" -f ~/.ssh/bioshell_key
```

Replace `user@example.com` with your own email address. Accept the default prompts, or set
a passphrase when asked (recommended).

**Complete setup — macOS:**

```bash
# Create .ssh directory if it doesn't exist
mkdir -p ~/.ssh && chmod 700 ~/.ssh

# Generate the key
ssh-keygen -t ed25519 -C "user@example.com" -f ~/.ssh/bioshell_key

# Start ssh-agent and load the key into the macOS Keychain
eval "$(ssh-agent -s)"
ssh-add --apple-use-keychain ~/.ssh/bioshell_key

# Set correct permissions on the private key
chmod 600 ~/.ssh/bioshell_key

# Copy the public key to your clipboard
pbcopy < ~/.ssh/bioshell_key.pub
```

**Complete setup — Linux:**

```bash
# Create .ssh directory if it doesn't exist
mkdir -p ~/.ssh && chmod 700 ~/.ssh

# Generate the key
ssh-keygen -t ed25519 -C "user@example.com" -f ~/.ssh/bioshell_key

# Start ssh-agent and add the key
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/bioshell_key

# Set correct permissions on the private key
chmod 600 ~/.ssh/bioshell_key

# Copy the public key to your clipboard (requires xclip)
xclip -selection clipboard < ~/.ssh/bioshell_key.pub
```

**Complete setup — Windows (PowerShell with OpenSSH):**

```powershell
# Generate the key
ssh-keygen -t ed25519 -C "user@example.com" -f "$env:USERPROFILE\.ssh\bioshell_key"

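# Enable and start the ssh-agent service, then add the key.
# The service is disabled by default, so run these lines from an elevated PowerShell.
Get-Service ssh-agent | Set-Service -StartupType Manual
Start-Service ssh-agent
ssh-add "$env:USERPROFILE\.ssh\bioshell_key"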

# Copy the public key to your clipboard
Get-Content "$env:USERPROFILE\.ssh\bioshell_key.pub" | Set-Clipboard
```

> **Note:** OpenSSH ships with Windows 10 (version 1809 and later) and Windows 11. If
> `ssh-keygen` is not found, go to **Settings → Apps → Optional features** and install
> **OpenSSH Client**.

> **Tip:** Your public key is the `.pub` file — this is what you share with others or
> submit when registering for access. Never share your private key (the file without `.pub`).
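
Before submitting the key, it can help to confirm that both files exist and to note the key's fingerprint. The sketch below assumes the `~/.ssh/bioshell_key` path used in the examples above; `check_bioshell_key` is a hypothetical helper name, not part of BioShell or OpenSSH.

```bash
# Hypothetical helper: confirm the key pair exists and print its fingerprint.
# Defaults to the key path used in the examples above.
check_bioshell_key() {
  local key="${1:-$HOME/.ssh/bioshell_key}"
  if [ ! -f "$key" ] || [ ! -f "$key.pub" ]; then
    echo "Key pair not found at $key" >&2
    return 1
  fi
  # -l prints the fingerprint of the key in the file given by -f
  ssh-keygen -l -f "$key.pub"
}
```

Run `check_bioshell_key` with no arguments to check the default path, or pass another private key path as the first argument.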

### Step 2 — Submit your public key {#submit-key}

[AUTHOR TO SUPPLY — confirm how users submit their public key as part of the BioShell
provisioning process, e.g. via the access request form or a separate step after approval]

### Step 3 — Connect {#connect}

Add an entry to `~/.ssh/config` so SSH automatically uses your BioShell key without
needing to specify it each time:

```
Host bioshell
  HostName <your-bioshell-ip>
  User <username>
  IdentityFile ~/.ssh/bioshell_key
```

Then connect with:

```bash
ssh bioshell
```

Or connect directly without the config entry:

```bash
ssh <username>@<your-bioshell-ip>
ssh -i ~/.ssh/bioshell_key <username>@<your-bioshell-ip>
```
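
If the connection fails, one way to check the `~/.ssh/config` entry is to ask SSH which settings it resolves for the alias, without connecting. This is a minimal sketch that assumes the `bioshell` alias shown above and an OpenSSH client recent enough to support `ssh -G`; `show_ssh_settings` is a hypothetical helper name.

```bash
# Hypothetical helper: show the settings SSH resolves for a host alias, without connecting.
# An optional second argument names an alternative config file (passed to ssh -F).
show_ssh_settings() {
  local host="${1:-bioshell}"
  ssh ${2:+-F "$2"} -G "$host" | grep -E '^(hostname|user|identityfile) '
}
```

The `hostname`, `user`, and `identityfile` lines it prints should match the values you entered in the config entry.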

[AUTHOR TO SUPPLY — confirm username format and whether SSH key or password authentication
is used]
[AUTHOR TO SUPPLY — confirm username format]

Once connected, you can also open interactive environments directly in your browser:

110 changes: 110 additions & 0 deletions pages/flavours.md
@@ -0,0 +1,110 @@
---
title: Choosing a flavour
description: How to select the right combination of vCPUs and memory for your BioShell environment.
---

A flavour is the combination of virtual CPUs (vCPUs) and memory (RAM) allocated to your
BioShell environment. BioShell is a shared national resource, so you are encouraged to
request a flavour that closely matches your actual needs — this keeps capacity available for
everyone.

Estimating your requirements can be hard at the start of a project. Use the guidance below
to make a reasonable starting choice. You can always request a larger flavour later if your
analysis grows.

---

## Flavour types at a glance {#flavour-types}

| Type | Characteristics | Best for |
|------|----------------|----------|
| **t3** | Low memory relative to CPUs | Testing, small jobs, getting started |
| **m3** | Balanced CPU and memory | Most general-purpose workloads |
| **c3** | CPU-optimised (same memory ratio as m3) | Compute-heavy tasks such as alignment and assembly |
| **r3** | High memory per CPU | Memory-intensive analysis such as variant calling or large data processing |

### Available flavours

| vCPUs | RAM (GB) | Nectar flavours | Nirin flavours |
|-------|----------|----------------|----------------|
| 1 | 1 | t3.xsmall | c3.1c1m5d |
| 1 | 2 | m3.xsmall, c3.xsmall | c3.1c2m10d |
| 1 | 4 | r3.xsmall | — |
| 2 | 4 | m3.small, c3.small | c3.2c4m20d, c3.2c4m10d |
| 2 | 8 | r3.small | — |
| 4 | 8 | m3.medium, c3.medium | c3.4c8m20d, c3.4c8m10d |
| 4 | 16 | r3.medium | — |
| 8 | 16 | m3.large, c3.large | c3.8c16m20d, c3.8c16m10d |
| 8 | 32 | r3.large | — |
| 16 | 32 | m3.xlarge, c3.xlarge | — |
| 16 | 64 | r3.xlarge | — |
| 32 | 64 | m3.xxlarge, c3.xxlarge | — |
| 32 | 128 | r3.xxlarge | — |
| 64 | 128 | c3.3xlarge | — |
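
The Nirin flavour names in the table appear to encode their size: `c3.4c8m20d` lines up with 4 vCPUs and 8 GB RAM. The sketch below splits a name on that inferred pattern. The meaning of the trailing `d` value is not stated on this page, so it is reported as-is, and `parse_nirin_flavour` is a hypothetical helper.

```bash
# Hypothetical helper: split a Nirin flavour name into its numeric parts.
# The c3.<vcpus>c<ram>m<d>d pattern is inferred from the table above, not from Nirin documentation.
parse_nirin_flavour() {
  printf '%s\n' "$1" |
    sed -nE 's/^c3\.([0-9]+)c([0-9]+)m([0-9]+)d$/vcpus=\1 ram_gb=\2 d=\3/p'
}
```

For example, `parse_nirin_flavour c3.4c8m20d` prints `vcpus=4 ram_gb=8 d=20`. Names that do not fit the pattern, such as `m3.medium`, print nothing.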

---

## Quick sizing guide {#sizing-guide}

| Task type | Suggested resources |
|-----------|-------------------|
| Light preprocessing (QC, trimming, filtering) | 1–4 vCPUs, 2–8 GB RAM |
| Alignment and assembly (e.g. `bwa`, `STAR`, `SPAdes`) | 8–16 vCPUs, 16–32 GB RAM |
| Memory-intensive analysis (variant calling, genome-wide statistics) | 16–32+ vCPUs, 32–128 GB RAM |
| Interactive analysis and visualisation (JupyterLab, RStudio) | 2–4 vCPUs, 8–16 GB RAM |
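
To compare these suggestions with an environment you already have, you can check what the instance reports. This is a minimal sketch that assumes a Linux instance with `nproc` and `/proc/meminfo` available; `show_resources` is a hypothetical helper name.

```bash
# Hypothetical helper: report the vCPUs and RAM visible on the current Linux instance.
show_resources() {
  echo "vCPUs: $(nproc)"
  # MemTotal in /proc/meminfo is in kB; dividing by 1048576 converts to GB
  awk '/^MemTotal:/ {printf "RAM (GB): %.1f\n", $2 / 1048576}' /proc/meminfo
}
```

The reported RAM is usually a little below the nominal flavour size because the kernel reserves some memory for itself.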

---

## Worked examples {#worked-examples}

### Light processing

John is starting a project analysing drought-resistant genes from 20 crop samples (~7 GB
raw sequencing data per sample). His pipeline includes quality control with `FastQC`,
adapter trimming, alignment to a reference genome, annotation, and phylogenetic tree
construction.

The most resource-intensive steps require moderate CPU and memory. Because John is analysing
a subset of genes rather than whole genomes, each run is relatively small:

- 2–4 vCPUs
- Up to 10 GB RAM

A balanced `m3` flavour is a good starting point:

- **Nectar:** `m3.medium` (4 vCPUs / 8 GB RAM)
- **Nirin:** `c3.4c8m10d` or `c3.4c8m20d` (4 vCPUs / 8 GB RAM)

### Memory-intensive processing

Georgie is running `GATK4` best-practice variant calling on 15 human exome samples. Each
sample produces BAM files of ~15 GB, with similar-sized temporary files during processing.
Total storage is approximately 1 TB. `GATK4` tools benefit from both high memory and
multiple CPU cores:

- 8 vCPUs
- 32 GB RAM

Recommended flavours for ~15 exomes:

- **Nectar:** `r3.large` (8 vCPUs / 32 GB RAM)
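- **Nirin:** `c3.8c16m20d` or `c3.8c16m10d` (8 vCPUs / 16 GB RAM). These are the largest Nirin flavours in the table above and provide half of the estimated 32 GB RAM.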

If processing ~30 exomes or running multiple jobs in parallel, consider `r3.xlarge`
(16 vCPUs / 64 GB RAM) on Nectar, or multiple 8 vCPU Nirin instances.

---

## How to choose if you are unsure {#how-to-choose}

1. **Check software documentation** — most bioinformatics tools publish minimum and
recommended system requirements.
2. **Start small and scale up** — begin with a smaller flavour. If jobs run slowly with
CPU usage consistently near 100%, you need more vCPUs. If jobs fail with memory errors,
you need more RAM.
3. **Review previous runs** — if you have run similar analyses before, check your peak CPU
and RAM usage from those logs to guide your estimate.
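
The steps above depend on knowing how much memory and CPU a job actually used. One way to measure a test run is sketched below. It assumes GNU `time` is installed at `/usr/bin/time` on the instance; `measure` and `peak_rss_gb` are hypothetical helper names.

```bash
# Run a command under GNU time and write the resource report to a log file.
# Usage: measure <logfile> <command> [args...]
measure() {
  local log="$1"
  shift
  /usr/bin/time -v -o "$log" "$@"
}

# Extract peak memory in GB from a GNU time -v report.
# The report gives "Maximum resident set size" in kB.
peak_rss_gb() {
  awk -F': ' '/Maximum resident set size/ {printf "%.1f\n", $2 / 1048576}' "$1"
}
```

For example, `measure usage.log fastqc sample.fastq.gz` followed by `peak_rss_gb usage.log` prints the peak memory of that run, where `sample.fastq.gz` stands in for one of your own files. The same report includes a `Percent of CPU this job got` line that indicates how many cores the job kept busy.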

> **Note:** For interactive work in JupyterLab or RStudio, 2–4 vCPUs and 8–16 GB RAM are
> sufficient for most datasets. Increase memory if you are loading large files directly
> into your session.
56 changes: 56 additions & 0 deletions pages/interactive.md
@@ -0,0 +1,56 @@
---
title: Interactive environments
description: How to use JupyterLab and RStudio browser-based coding environments in BioShell.
---

BioShell supports two browser-based interactive environments for notebook and script-based
work. Both run on your BioShell instance and are accessible through your browser once you
have connected via SSH.

> **Important:** You must have an active SSH connection to your BioShell instance before
> opening either environment in your browser. See [Connecting to BioShell](/access#connecting).

---

## JupyterLab {#jupyterlab}

JupyterLab is a browser-based environment for writing and running Python, R, and shell
notebooks. It is well suited to exploratory analysis, visualisation, and combining code
with documentation.

Open your browser and go to:

```
http://<your-bioshell-ip>:8888
```

![](images/bioshell/SCREENSHOT_NEEDED_jupyterlab.png)

*Fig 4. JupyterLab open in a browser connected to a BioShell instance.*

---

## RStudio {#rstudio}

RStudio is a browser-based environment for R-based analysis. If you are familiar with the
RStudio desktop application, the browser version works in much the same way.

Open your browser and go to:

```
http://<your-bioshell-ip>:8787
```

![](images/bioshell/SCREENSHOT_NEEDED_rstudio.png)

*Fig 5. RStudio open in a browser connected to a BioShell instance.*

---

> **Tip:** Your SSH terminal and browser environments connect to the same BioShell instance.
> Any tool you have loaded with `module load` in your terminal is also available inside
> JupyterLab and RStudio.

> **Note:** If you cannot reach JupyterLab or RStudio in your browser, check that your SSH
> connection is still active. If you are connecting from outside your institution's network,
> you may need an SSH tunnel. Contact your local IT support for help with this.
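
An SSH tunnel of the kind mentioned in the note can be sketched as follows. This assumes the `bioshell` host alias from [Connecting to BioShell](/access#connecting) and that JupyterLab (port 8888) or RStudio (port 8787) accepts connections on the instance's own `localhost`. Whether a tunnel is needed or permitted depends on how your BioShell instance is set up, and `bioshell_tunnel` is a hypothetical helper name.

```bash
# Hypothetical helper: forward a local port to the same port on the BioShell instance.
# Usage: bioshell_tunnel <port> [host]
bioshell_tunnel() {
  local port="$1" host="${2:-bioshell}"
  case "$port" in
    ''|*[!0-9]*)
      echo "usage: bioshell_tunnel <port> [host]" >&2
      return 2
      ;;
  esac
  # -N: do not run a remote command; -L: forward the local port to the instance
  ssh -N -L "${port}:localhost:${port}" "$host"
}
```

With `bioshell_tunnel 8888` running in a terminal, JupyterLab would be reachable at `http://localhost:8888`; use `bioshell_tunnel 8787` for RStudio. Press Ctrl+C to close the tunnel.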