
PacificAnalytics/unified-roundtrip-roadmap

Epic: Researcher User Workflow — End-to-End Execution Experience

Owner: Anurag Gupta

Contributors: Javed Habib (execution backend), Khoa Nguyen (sandboxing / code execution), Samyak Jain (MDI / ingestion)

Version: 1.1 (March 2026)


Purpose of This Document

This document defines the frontend architecture and implementation plan for the researcher-facing workflow execution experience. It covers the full roundtrip from cohort export through samplesheet assembly, pipeline execution, and result retrieval.

It is also the integration reference for backend contributors. The contracts defined here — API shapes, data handoff points, error formats — determine what the backend must deliver for the UI to function. Where those contracts are not yet finalised, this document flags them as open dependencies.


The Problem Being Solved

The platform already has several pieces: a cohort builder, a file registry, a pipeline execution engine, an ingestion service. What it does not have is a continuous path through them. Today, a researcher who builds a cohort has no clear next step inside the platform. They would have to export a file, edit it externally, manually assemble a Nextflow command, and run it somewhere else. That is not a platform — that is a collection of tools.

This epic closes the gap. It makes the platform the place where the full roundtrip happens:

Cohort → Samplesheet → Edit → Run → Results

Everything in scope here is in service of that single journey.


User Journey

  1. The researcher has built a cohort — a named collection of samples — in the cohort builder. They click Export to Samplesheet.
  2. The platform generates a CSV pre-filled with sample IDs and file paths for all samples in the cohort. It opens immediately in an in-platform spreadsheet editor.
  3. The researcher reviews the samplesheet. They:
    • Add or remove rows (mix in their own uploaded files alongside SRA data)
    • Fill in pipeline-specific columns the platform cannot auto-fill (e.g. strandedness, contrast group labels)
    • Copy file paths from the project file browser directly into cells, without leaving the editor
  4. When ready, they click Save & Run, select a pipeline, and submit.
  5. The run status updates automatically — Queued → Running → Complete — without page refreshes.
  6. On completion, the researcher opens the output browser and downloads results.

No command line. No spreadsheet software. No leaving the platform.


Reference

  • PA Platform Roadmap v1.0 (March 2026) — UC-2, §1.1 (Labels as Soft Associations), §1.4 (Platform as Workspace), TASK-03-A/B
  • Boris meeting notes, March 3 2026:
    • "For the simplest use case, the user provides the sample sheet."
    • "Users must have the ability to edit the sample sheet CSV — they may need to add their own data locations or provide specific parameters."
    • "The sample sheet can be a mix of SRA data and user-uploaded data."
    • "The cohort function generates soft links to samples — it does not trigger downloads or run workflows."

Integration Map

This document sits at the intersection of four backend systems. The frontend does not own any of them, but it is the component that must make them coherent to the user. Each integration point is a dependency that needs a defined contract before the corresponding frontend task can ship.

┌─────────────────────┐     export      ┌──────────────────────────┐
│   Cohort Builder    │ ─────────────── │                          │
│                     │                 │   Samplesheet Editor     │
└─────────────────────┘                 │   (this epic)            │
                                        │                          │
┌─────────────────────┐   file paths    │                          │
│   DRS / File        │ ─────────────── │                          │
│   Registry          │                 └──────────┬───────────────┘
└─────────────────────┘                            │
                                                   │ submit run
┌─────────────────────┐   pipeline list            ▼
│   Pipeline Registry │ ─────────────── ┌──────────────────────────┐
│   TRS               │                 │   Run Monitor            │
└─────────────────────┘                 │   (this epic)            │
                                        │                          │
┌─────────────────────┐   run status    │                          │
│   Execution Engine  │ ─────────────── │                          │
│   (WES / Metis)     │
└─────────────────────┘

Open API contracts:

#     Contract
C-1   Cohort export endpoint — shape of the CSV returned, how blank fields are represented
C-2   File registry query — how to list project files and extract their storage paths
C-3   Pipeline list endpoint — fields returned, how to filter DSL2-only
C-4   Run submission payload — samplesheet reference format, params file format
C-5   Run status response — status enum, error message format

What the Platform Fills vs What the User Fills

The platform fills what it knows from the cohort and the file registry. The user fills what only they know. Importantly, user-uploaded files can carry more pre-filled information than SRA data, because the researcher annotated them with custom metadata at upload time.

Column                                           | SRA-ingested files | User-uploaded files     | Source
sample                                           | Platform           | Platform                | SRR accession ID / upload filename
fastq_1 / fastq_2                                | Platform           | Platform                | DRS storage path (S3 URI)
strandedness                                     | User               | Platform (if annotated) | SRA metadata unreliable; upload annotation is explicit
organism                                         | Platform           | Platform (if annotated) | SRA metadata / upload annotation
paired_end                                       | Platform           | Platform (if annotated) | Detected from SRA layout / upload annotation
Custom fields (e.g. condition, traveller_status) | Not available      | Platform (if annotated) | User-defined at upload time
cohort_arm / contrast group                      | User               | User                    | Experimental design decision — platform never knows this

The key principle: metadata the researcher already provided when uploading a file should not need to be re-entered in the samplesheet. The file picker panel surfaces that metadata and inserts it automatically where column names match.

Cells the platform filled are visually distinct. Cells still requiring user input are highlighted. The editor does not block saving on empty cells — it trusts the researcher.


Scope

In Scope

  • Cohort export → samplesheet generation (with backend contract definition)
  • In-platform spreadsheet editor: view, edit, add/remove rows and columns
  • File path and metadata picker integrated into the editor — browse project files (SRA-ingested and user-uploaded), insert paths and pre-annotated metadata directly into samplesheet rows
  • Save samplesheet to project (persisted as a named file, accessible later)
  • Pipeline catalogue (browse, search, select)
  • Run submission from the editor (samplesheet + pipeline → submit in one action)
  • Optional parameters file upload at submission time
  • Run list with live status polling (no page refresh required)
  • Run detail view with plain-English error messages
  • Read-only output file browser with download links

Out of Scope

  • Auto-filling strandedness, cohort_arm, or other experimental design columns
  • Pre-flight samplesheet validation against pipeline column schema — post-MVP
  • Dynamic parameter forms from pipeline schemas — post-MVP
  • Cohort builder itself — separate epic (this epic begins at the Export button)
  • Data ingestion / SRA download UI — separate epic (Samyak)
  • Statistical analysis / sandboxing UI — separate epic (Khoa)
  • Live log streaming during a run
  • Stop / resume / delete a run
  • AI-assisted parameter suggestions — post-MVP

Design Decisions

In-platform editor, not download-and-reupload

The simpler path is to export a CSV and let the user edit it in Excel. We are not doing this. The problem: the user needs to fill in file paths that only exist inside the platform. If they leave to edit, they have no way to get those paths without coming back in anyway. The round-trip is: export → open Excel → come back to copy a path → paste → repeat for every row. On 50 samples, this is untenable.

The in-platform editor solves this by bringing the file browser into the editing context. The user never needs to leave.

Samplesheet is a first-class project file

The samplesheet is saved to the project, not used once and discarded. This has two consequences: (1) the researcher can iterate — modify the samplesheet and re-run the pipeline without starting from scratch; (2) the samplesheet becomes part of the project's scientific record, which matters for publishing and reproducibility. The roadmap (§1.4) explicitly positions the platform as a workspace, not a one-shot tool. Saving samplesheets is consistent with that principle.

Polling, not push notifications

Run status is updated by polling the backend every 10–15 seconds. This is adequate for runs that take minutes to hours, requires no additional infrastructure, and aligns with the WES spec's polling-first design. Event-driven updates are additive later.
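The polling loop can be sketched as below. The `Run` shape and the `fetchRuns` wrapper are assumptions pending contract C-5, not a finalised API; the sketch only illustrates the stop-on-terminal-state behaviour.

```typescript
// Minimal sketch of the run-status polling loop. `Run` and `fetchRuns`
// are placeholders for the contract C-5 response, not a finalised shape.
type RunStatus = "QUEUED" | "RUNNING" | "COMPLETE" | "FAILED";

interface Run {
  run_id: string;
  state: RunStatus;
}

// Terminal states stop the polling loop.
function isTerminal(state: RunStatus): boolean {
  return state === "COMPLETE" || state === "FAILED";
}

// Polls every `intervalMs` (10-15 s per the decision above) until all
// runs are terminal. Returns a cancel function for component unmount.
function pollRuns(
  fetchRuns: () => Promise<Run[]>,
  onUpdate: (runs: Run[]) => void,
  intervalMs = 12_000,
): () => void {
  let cancelled = false;
  const tick = async (): Promise<void> => {
    if (cancelled) return;
    const runs = await fetchRuns();
    onUpdate(runs);
    if (!cancelled && !runs.every((r) => isTerminal(r.state))) {
      setTimeout(tick, intervalMs);
    }
  };
  void tick();
  return () => { cancelled = true; };
}
```

Scheduling the next poll only after the previous response arrives (rather than a fixed `setInterval`) avoids overlapping requests on a slow backend.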

Error messages are the UI's responsibility

When a run fails, the error originates in the execution engine (Metis / Nextflow). The frontend must translate that error into something the researcher can act on. "Exit status 1" is not useful. "Column 'strandedness' was not found in your samplesheet — this pipeline requires it" is. Translating backend errors into researcher-legible messages is explicitly frontend work and is budgeted into the run monitoring tasks.
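One way to structure that mapping is a list of pattern-to-message rules with a fallback to the raw text. The patterns below are illustrative guesses, since Metis / Nextflow error text is not a stable API; the real list grows as failure modes are observed.

```typescript
// Sketch of the error-translation layer described above. Patterns and
// messages are illustrative assumptions, not observed backend output.
interface ErrorRule {
  pattern: RegExp;
  message: (match: RegExpMatchArray) => string;
}

const ERROR_RULES: ErrorRule[] = [
  {
    pattern: /column '?(\w+)'? (?:not found|missing)/i,
    message: (m) =>
      `Column '${m[1]}' was not found in your samplesheet — this pipeline requires it.`,
  },
  {
    pattern: /no such file|not found: s3:\/\//i,
    message: () =>
      "A file path in your samplesheet does not point to an existing file.",
  },
];

// Returns a researcher-legible message, or null when the error is
// unrecognised (the UI then shows the raw text in a collapsible section).
function translateRunError(raw: string): string | null {
  for (const rule of ERROR_RULES) {
    const m = raw.match(rule.pattern);
    if (m) return rule.message(m);
  }
  return null;
}
```

Returning `null` for unknown errors keeps the fallback behaviour explicit: the caller decides to show the raw message collapsed rather than guessing.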


Implementation Plan

Workstream 1: Cohort → Samplesheet Handoff

Goal: Define and implement the handoff between the cohort builder and the samplesheet editor.

This is a cross-team integration. The backend (Javed) owns the export endpoint. The frontend (Anurag) owns the editor that receives it. Both sides need to agree on the contract before either builds.

[TASK-FE-01-A] Define and agree the cohort export API contract

Before any code, align with Javed on:

  • Endpoint: GET /cohorts/{cohort_id}/samplesheet?pipeline={pipeline_id} (proposed)
  • Response: CSV (or JSON that the frontend renders as CSV) with columns sample, fastq_1, fastq_2 filled where available; pipeline-specific columns (e.g. strandedness) included as blank columns
  • For samples with no ingested files: row is included, fastq_1 / fastq_2 are empty, a flag indicates "not yet ingested"
  • For single-end samples: fastq_2 column is present but empty

This is a design task, not a coding task. Output is a shared API spec document, not code.

Done when: Javed and Anurag have signed off on the endpoint shape and the frontend can begin building against a mock.
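While C-1 is open, the frontend can build against a mock of the proposed response. A minimal parser sketch, assuming the response is plain CSV with a header row and no quoted or escaped fields (a simplification; the real contract may differ):

```typescript
// Sketch of parsing the proposed C-1 export response into editor rows.
// Assumes plain CSV, no quoting; the actual contract is still open.
interface SamplesheetRow {
  cells: Record<string, string>;
  // True when fastq_1 is empty, i.e. the sample is not yet ingested.
  notYetIngested: boolean;
}

function parseExportCsv(csv: string): { columns: string[]; rows: SamplesheetRow[] } {
  const lines = csv.trim().split("\n");
  const columns = lines[0].split(",");
  const rows = lines.slice(1).map((line) => {
    const values = line.split(",");
    const cells: Record<string, string> = {};
    columns.forEach((col, i) => { cells[col] = values[i] ?? ""; });
    return { cells, notYetIngested: (cells["fastq_1"] ?? "") === "" };
  });
  return { columns, rows };
}
```

Deriving the "not yet ingested" flag from the empty `fastq_1` cell is one option; the contract may instead carry an explicit flag per row, which would be more robust.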

[TASK-FE-01-B] Export button in the cohort builder

Add an Export to Samplesheet button to the cohort builder UI.

  • Calls the export endpoint from FE-01-A
  • Opens the response in the samplesheet editor (same-page transition, not a new tab)
  • If cohort contains samples with no ingested files: show a banner in the editor explaining which rows are incomplete and why — "3 samples haven't been ingested yet. File paths for these rows are empty."
  • If the cohort is empty: show an error before calling the endpoint

Acceptance: A user with a 10-sample cohort clicks Export and sees all 10 rows in the editor with file paths filled, within 2 seconds.


Workstream 2: Samplesheet Editor

Goal: The user can review and complete the samplesheet without leaving the platform.

[TASK-FE-02-A] Spreadsheet editor component

Build a spreadsheet-style grid editor. Evaluate AG Grid Community Edition first — it handles large datasets with virtual scrolling, is MIT-licensed, and has React bindings. Only build a custom grid if AG Grid proves unsuitable.

Minimum feature set:

  • Editable cells (click to type)
  • Add row / delete row
  • Add column (user names it) / delete column (with confirmation)
  • Editable column headers
  • Auto-filled cells visually distinct (light background) — user can still overwrite
  • Empty cells in required-looking columns highlighted in amber (non-blocking)
  • Undo for the last action
  • Auto-save draft to browser local storage every 30 seconds (with restore prompt on return)

The editor renders whatever columns come from the export. It does not enforce a fixed schema — that is intentional, because different pipelines expect different columns.

Performance requirement: must handle 500 rows without visible lag. Test this before shipping.

Acceptance: A user can add a column named strandedness, type reverse into 10 cells, delete a row, undo the deletion, and save — all without a page reload.
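The draft auto-save can be sketched as follows. Storage is injected so the same code targets `window.localStorage` in the browser; the key naming is an assumption.

```typescript
// Sketch of the 30-second draft autosave described above.
interface DraftStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const DRAFT_KEY_PREFIX = "samplesheet-draft:";

function saveDraft(storage: DraftStorage, samplesheetId: string, csv: string): void {
  storage.setItem(
    DRAFT_KEY_PREFIX + samplesheetId,
    JSON.stringify({ csv, savedAt: Date.now() }),
  );
}

// Returns the draft CSV if one exists, so the editor can show the
// "restore unsaved changes?" prompt on return.
function loadDraft(storage: DraftStorage, samplesheetId: string): string | null {
  const raw = storage.getItem(DRAFT_KEY_PREFIX + samplesheetId);
  if (raw === null) return null;
  return (JSON.parse(raw) as { csv: string }).csv;
}

// In the editor, roughly:
//   setInterval(() => saveDraft(localStorage, sheetId, gridToCsv()), 30_000);
```

Storing `savedAt` alongside the CSV lets the restore prompt say how old the draft is.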

[TASK-FE-02-B] File path picker with metadata panel

A side panel that lets the user browse all project files — both SRA-ingested and user-uploaded — and insert paths and metadata into the editor. This panel is the primary mechanism for referencing user-uploaded datasets in the samplesheet.

  • Panel is toggled open/closed without losing editor state
  • Lists all files registered in the project (from DRS GET /objects), regardless of origin — SRA download or user upload are treated identically
  • For each file, shows:
    • Filename, file type, size
    • All metadata attached to that file: organism, assay_type, paired_end, strandedness where available; and any custom fields the user annotated at upload time (e.g. condition, tissue_type, traveller_status)
  • Two interaction modes:
    • Insert path: clicking a file inserts its storage path into the currently active cell
    • Insert row: for user-uploaded files with metadata, a button inserts a full new row into the samplesheet, pre-filling sample (from filename), fastq_1 / fastq_2 (from storage path), and any metadata fields that match existing column names in the samplesheet
  • Supports text search and filter by file type or metadata field within the panel
  • Metadata is read from the DRS object's registered attributes — the panel does not require the user to re-enter anything they already provided at upload time

This panel is particularly important for user-uploaded data. When a researcher uploads their own FASTQ files and annotates them with custom metadata (e.g. strandedness = reverse, condition = treated), that information should flow directly into the samplesheet without the user having to recall or retype it.

Acceptance: A user with an uploaded FASTQ file annotated with strandedness = reverse opens the picker, clicks "Insert row", and a new row appears in the samplesheet with fastq_1, fastq_2, and strandedness already filled from the file's registered metadata.
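The "Insert row" pre-fill can be sketched as a pure function. The `ProjectFile` shape and field names below are illustrative assumptions, not the DRS contract (C-2); the point is the matching rule, where only metadata keys that match an existing column are inserted.

```typescript
// Sketch of "Insert row": metadata registered on a project file flows
// into samplesheet cells wherever the field name matches a column.
interface ProjectFile {
  filename: string;
  storagePath: string;              // e.g. an S3 URI from the registry
  matePath?: string;                // second read for paired-end data
  metadata: Record<string, string>; // annotations made at upload time
}

function buildRowFromFile(columns: string[], file: ProjectFile): Record<string, string> {
  const row: Record<string, string> = {};
  for (const col of columns) row[col] = "";
  if ("sample" in row) row["sample"] = file.filename.replace(/\.(fastq|fq)(\.gz)?$/, "");
  if ("fastq_1" in row) row["fastq_1"] = file.storagePath;
  if ("fastq_2" in row && file.matePath) row["fastq_2"] = file.matePath;
  // Only metadata whose key matches an existing column is inserted;
  // everything else stays blank for the user to complete.
  for (const [key, value] of Object.entries(file.metadata)) {
    if (key in row && row[key] === "") row[key] = value;
  }
  return row;
}
```

Leaving unmatched columns blank (rather than guessing) keeps the editor's highlighting honest: the user sees exactly what still needs their input.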

[TASK-FE-02-C] Save samplesheet to project

  • Save: persists current state with a default name (samplesheet-{cohort-name}-{date}.csv). Subsequent saves overwrite.
  • Save As: prompts for a name — allows multiple versions of the same samplesheet
  • Saved samplesheets appear in the project file browser alongside other project files
  • Storage: uploaded to MinIO and registered as a DRS object via the existing file registration path
  • Loading a saved samplesheet: clicking it in the file browser opens it back in the editor

Acceptance: A user saves a samplesheet, closes the browser, returns, opens the file from the project browser, and sees all their edits intact.
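The default name scheme (`samplesheet-{cohort-name}-{date}.csv`) can be sketched as below. The slug rules are an assumption; name collisions are handled by the Save As flow, not by this helper.

```typescript
// Sketch of the default save name. Slugging is an assumption: lowercase,
// non-alphanumeric runs collapsed to a single hyphen.
function defaultSamplesheetName(cohortName: string, date: Date): string {
  const slug = cohortName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "");
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `samplesheet-${slug}-${day}.csv`;
}
```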


Workstream 3: Pipeline Selection & Run Submission

Goal: From the samplesheet editor, the user selects a pipeline and submits the run in one action.

[TASK-FE-03-A] Pipeline catalogue

A searchable list of available pipelines, accessible from the editor toolbar and as a standalone page.

  • Data source: pipeline registry API (GET /tools from TRS — contract needed from Alex/Javed, see C-3)
  • Display: pipeline name, description, version, pipeline type (RNA-seq, WGS, etc.)
  • Filter: text search client-side
  • DSL2-only filter applied server-side — frontend should not need to handle this

Acceptance: A user searches "rnaseq" and sees the correct pipeline listed with its version.

[TASK-FE-03-B] Run submission from the editor

The editor toolbar has a Save & Run button.

Submission flow:

  1. User selects a pipeline from the catalogue panel (or a dropdown in the toolbar)
  2. Optional: user attaches a parameters file (advanced pipeline settings, .yaml or .json)
  3. User clicks Save & Run
  4. Frontend saves the current samplesheet (same as explicit Save)
  5. Frontend submits the run: POST /runs with pipeline ID + samplesheet file reference (as DRS URI or S3 path — per contract C-4)
  6. Spinner shown; on success, redirect to run detail view; on failure, inline error

Helper text adjacent to the button: "File paths in the samplesheet must be valid storage paths. The platform does not resolve or correct paths automatically."

Acceptance: A user selects a pipeline, clicks Save & Run, and within 5 seconds sees the run in the run list with status Queued.
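Step 5 of the flow can be sketched as a pure request builder, kept separate from `fetch` so it is easy to mock while contract C-4 is open. The payload field names and the `run_id` response shape are assumptions, not the agreed contract.

```typescript
// Sketch of building the POST /runs request (step 5 above). Field names
// are assumptions pending contract C-4.
interface RunSubmission {
  pipeline_id: string;
  samplesheet: string;   // DRS URI or S3 path, per contract C-4
  params_file?: string;  // optional parameters file reference
}

function buildRunRequest(
  baseUrl: string,
  submission: RunSubmission,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: `${baseUrl}/runs`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(submission),
    },
  };
}

// Usage in the Save & Run handler, roughly:
//   const { url, init } = buildRunRequest(apiBase, submission);
//   const res = await fetch(url, init);
//   if (!res.ok) showInlineError(await res.text());   // step 6
//   else redirectToRunDetail((await res.json()).run_id); // assumed shape
```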


Workstream 4: Run Monitoring

Goal: The user can track runs and understand what happened when something goes wrong.

[TASK-FE-04-A] Run list

A table of all runs in the current project, ordered by most recent.

  • Columns: Pipeline, Status, Submitted, Duration, Samplesheet (link to the file used)
  • Status badge: Queued (grey) / Running (blue, animated) / Complete (green) / Failed (red)
  • Polling: refresh every 10–15 seconds using GET /runs (WES endpoint, OPA-scoped to project)
  • No page reload required

Acceptance: A run submitted on another tab or by a colleague appears in the list within 15 seconds.

[TASK-FE-04-B] Run detail view

  • Pipeline name, version, status, timestamps (submitted / started / finished)
  • Link to the samplesheet file that was used
  • If Failed: plain-English error message. The frontend is responsible for translating backend error codes/messages into readable text. Maintain a mapping of common failure modes (missing column, invalid path, resource limit exceeded) to human-readable explanations. For unknown errors, show the raw message in a collapsible section rather than as the primary display.
  • If Complete: prominent button to open the output browser

Acceptance: A user with a Failed run sees a sentence they can act on, not a stack trace.


Workstream 5: Output Browser

Goal: Once a run completes, the user can access and download the results.

[TASK-FE-05-A] Output file browser

Read-only file list scoped to a single completed run's output directory.

  • Columns: filename, type, size
  • Download button per file — links point directly to storage (S3 presigned URLs or DRS access URLs), not proxied through the frontend server
  • No delete, rename, or move
  • Large file warning: if a file is >1 GB, add a note near the download button

Acceptance: A user opens the output browser for a completed run, sees the result files, and successfully downloads one.


Full Roundtrip Summary

Cohort Builder
    └── [Export to Samplesheet]


    Samplesheet Editor
    ┌────────────────────────────────────────────────┐
    │  sample  │  fastq_1      │  fastq_2      │ ... │
    │  SRR001  │  s3://…/r1.fq │  s3://…/r2.fq │     │  ← platform filled
    │  SRR002  │  s3://…/r1.fq │  s3://…/r2.fq │     │
    │  upload1 │  [file picker]│               │     │  ← user fills
    └────────────────────────────────────────────────┘
    + user adds strandedness column, fills values
    + user adds cohort_arm column, assigns groups

    [Save & Run] — pipeline selected


    Run List: Queued → Running → Complete


    Output Browser — download results

Risks & Mitigations

API contracts not finalised before frontend build starts

The editor and run submission both depend on backend endpoints that are not yet fully defined (C-1 through C-5 above). If frontend build starts before these are agreed, rework is likely. Mitigation: define contracts in week 1 using shared API spec docs. Frontend builds against mocks in parallel.

File paths wrong in the samplesheet

If a storage path is malformed or points to a file the user doesn't have access to, the pipeline fails. The file picker (FE-02-B) is the primary mitigation — it inserts correct paths directly from the registry, so the user rarely has to type one manually. The run detail view (FE-04-B) surfaces the error clearly. Post-MVP: pre-flight path validation before submission.

User-uploaded file metadata is incomplete or absent

If a user uploads a file without annotating it, the file picker panel will show it with empty metadata fields. The "Insert row" feature will pre-fill only what's available; the rest stays blank for the user to complete. The UI must make this obvious — blank metadata cells in the picker should be visually distinct from populated ones, so the user knows what they still need to fill in the samplesheet. This is a data quality problem, not a UI bug, but the UI should not silently insert empty values without the user noticing.

Samplesheet columns don't match what the pipeline expects

Different pipelines require different column names. For MVP, the pipeline fails and the error is surfaced in the run detail view. Post-MVP: validate samplesheet columns against the pipeline schema before submission.

Editor performance with large cohorts

500+ row samplesheets are realistic. A naive HTML table will freeze the browser. AG Grid with virtual scrolling handles this. Performance test with 500 rows before any release.

Cohort has samples with no ingested files

A cohort built from MDI search may contain accessions not yet downloaded. The export will have empty file paths for those rows. Mitigation: the editor clearly marks incomplete rows and explains why. The user can still proceed with the complete rows, or trigger ingestion first (separate flow, separate epic).

Editor state lost on refresh before saving

Mitigated by auto-saving a draft to browser local storage every 30 seconds, with a restore prompt on return. Explicit Save still required for permanent persistence.
