PA Platform Roadmap v5.2

Status: Version 5.2 — March 2026
Audience: Leadership, engineering leads, procurement reviewers
Supersedes: All prior Roadmap versions

The PA platform is a sovereign operating system for scientific discovery. It connects data ingestion, workflow execution, AI-driven analysis, and research publishing into a single integrated pipeline — enabling biomedical researchers to move from a research question to a statistically validated, citable result without leaving the platform.

Product Strategy

Two deployment models, sequenced by market readiness:

PA Cloud — Hosted on trusted European cloud providers (Exoscale/SWITCH in Switzerland, Nebius in the Netherlands, Genesis Cloud in Germany, Scaleway in France). Targets individual researchers and labs. Sovereignty through choice of national cloud provider. Primary development focus for the first 6 months (M1–M3).
PA On-Prem — Sovereign deployment on institutional infrastructure. Full compliance and data security scope. From M4 onward, every feature ships for both Cloud and On-Prem simultaneously; what differs is the deployment modality, compliance posture, and governance layer.

Platform Verticals

The platform is organised around five verticals (see Atlas v4.0 for full strategic context). Each vertical builds on the previous one:

Vertical	Description	First Appears
V1 Biomedical Research OS	Ingest → Execute → Analyse → Publish	M1 (Alpha)
V2 Project Presence	Persistent AI research companion	M3 (foundation)
V3 Sovereign Enterprise Suite	Secure productivity (files, chat, docs, video)	M4 (foundation)
V4 Government OS	Federated analytics, compliance, multi-jurisdiction	M5 (foundation)
V5 DeSci Marketplace	On-chain research IP, data exchange protocol	M6 (foundation)

Development Principle

Every capability ships in its most basic viable form at the milestone where it first appears. Refinement is driven by user feedback, not speculative feature completeness. This applies across all verticals and both products. The scope of M4–M6 is intentionally higher-level — it will be shaped by what we learn from real users in M1–M3.

Milestones (Bimonthly, 12 Months)

Milestone	Target	Primary Product	Vertical Focus
M1 Alpha	April 6, 2026	Cloud	V1 foundation
M2 Beta-1	June 1, 2026	Cloud	V1 full pipeline
M3 Beta-2	August 3, 2026	Cloud	V1 hardening + V2 foundation
M4	October 6, 2026	Cloud + On-Prem	V2 companion + V3 foundation
M5	December 1, 2026	Cloud + On-Prem	V3 productivity + V4 foundation
M6 V1 Full	March 2, 2027	Cloud + On-Prem	V4 government + V5 foundation

Team: 6 (4 backend, 1 frontend, 1 DevOps) + bioinformatics part-time through M3. Seed funding and team expansion to 10–14 happen in October 2026; the larger team is available from M4 onward.

Key Risks

Ares DRS readiness — Multiple services depend on it; status is "under discussion." PostgreSQL shim fallback if unresolved by pre-sprint.
LLM-generated analysis correctness — Plausible but wrong results are worse than crashes. Three-stage validation framework defined.
Manual S3 URI entry UX — Error-prone. File picker and client-side validator mitigate. DRS URI resolution promoted if it becomes a top-3 adoption blocker.

1. Vision and Strategy

1.1 The Research Pipeline

The platform enables researchers to move from a research question to a statistically validated answer through an integrated pipeline, then publish that work as a citable, portable data artefact. Every file at every stage is a DRS object. Publishing is the exit point.

1.2 The Workspace Model

The platform is a workspace, not a repository. Metadata burden does not fall on the researcher during active work. Files are ingested without annotation, cohorts are assembled without rigid labels, workflows run without provenance overhead. When it is time to publish, the Notes ELN entries are the primary narrative thread; datasets, runs, and results are the supporting evidence.

1.3 Labels as Soft Associations

Metadata labels (cohort_arm, contrast_label, assay_type) are soft associations on DRS objects, not structural constraints. The same dataset can participate in multiple contrasts without duplication. Labels are resolved at runtime by the Cohorts Service, not baked in at ingestion. This avoids data redundancy, supports overlapping experimental designs, and keeps ingestion decoupled from analysis context.
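As an illustration of soft association, the sketch below keeps labels in a separate store keyed by DRS ID and resolves a contrast only when asked. It is a minimal Python sketch under assumptions: LabelStore, LabelAssignment, the example DRS IDs, and the contrast names are hypothetical and do not describe the actual Cohorts Service API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LabelAssignment:
    """One soft association: a label attached to a DRS object within a contrast."""
    drs_id: str
    contrast_label: str
    cohort_arm: str


@dataclass
class LabelStore:
    """Hypothetical store: labels live beside DRS objects, never inside them."""
    assignments: list[LabelAssignment] = field(default_factory=list)

    def assign(self, drs_id: str, contrast_label: str, cohort_arm: str) -> None:
        self.assignments.append(LabelAssignment(drs_id, contrast_label, cohort_arm))

    def resolve(self, contrast_label: str) -> dict[str, list[str]]:
        """Resolve a contrast at runtime: map each cohort arm to its DRS object IDs."""
        arms: dict[str, list[str]] = {}
        for a in self.assignments:
            if a.contrast_label == contrast_label:
                arms.setdefault(a.cohort_arm, []).append(a.drs_id)
        return arms


store = LabelStore()
# The same dataset (obj-1) participates in two contrasts without being copied.
store.assign("drs://example/obj-1", "traveller_vs_local", "traveller")
store.assign("drs://example/obj-2", "traveller_vs_local", "local")
store.assign("drs://example/obj-1", "pre_vs_post_2020", "post")
```

Because the object itself is never rewritten, adding a third contrast later is one more assign call rather than a re-ingestion.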

1.4 Existing Foundation

The platform builds on a substantial existing base: the Medical Data Index (MDI) with billions of curated SRA metadata points; 1,600+ catalogued nf-core modules across 10 biological domains; a Notes service providing an immutable, cryptographically chained Electronic Lab Notebook; a multi-agent AI system with gateway routing, hypothesis generation, policy analysis, and natural-language database access; three-tier multi-tenancy with OPA enforcement; zero-trust networking with Istio ambient mTLS; four deployment tiers including ephemeral preview environments; and a white-labelled frontend with 9 feature modules. The full service inventory is in Appendix D.

1.5 Anchor Use Case: MRSA Surveillance

A researcher tests whether MRSA strains from international pilgrimage travellers are genetically distinct from locally acquired strains:

Query MDI for Staphylococcus aureus WGS data filtered by traveller status.
Assemble two cohort groups via the Cohort Builder.
Ingest relevant SRR accessions; files register as DRS objects.
Assemble a samplesheet in the in-platform editor, submit an nf-core WGS pipeline via Metis WES.
Analysis agent runs cohort-level SNP comparison in a sandboxed container.
Record decisions in the Notes ELN throughout.
Publish cohort, run, results, and notes as an RO-Crate.

2. Milestones

2.0 Milestone 0: Resolve Pre-Sprint Blockers

#	Blocker	Action	Owner	Deadline
B1	Ares DRS readiness	Confirm operational by M1 Week 2, or commit to PostgreSQL-backed DRS shim	Architecture lead	Pre-sprint
B2	Notes chain export	Confirm GET /notes/:id/chain is implemented; if not, estimate effort and assign	Notes team	Pre-sprint
B3	API contracts for samplesheet editor	Agree endpoint shapes C-1 through C-5 (see Anurag's Researcher Workflow epic)	Anurag + backend lead	M2 Week 1
B4	Alpha hard deadline	April 6, 2026 — deployed and accessible to invited external users	Boris	Confirmed

2.1 M1: Alpha — April 6, 2026 (~6 Weeks)

Vertical: V1 Biomedical Research OS (foundation)
Product: PA Cloud
Goal: A researcher can ingest public data, run a Nextflow pipeline, and see registered outputs. Minimum viable proof that the pipeline works end-to-end.

No compliance work in M1. The cloud deployment targets European providers where infrastructure-level security is handled by the hosting provider.

Scope

In:

SRA data ingestion — FASTQ download (AWS Open Data, SRA Toolkit fallback), Upgate upload, DRS registration with source metadata, pre-flight size estimation with user acknowledgement, partial job status (per-file tracking, auto-retry with backoff), controlled-access flagging
nf-core DSL2 workflow execution via Metis WES on Kubernetes — output DRS registration via direct path (batch POST /objects), platform-controlled --outdir
RabbitMQ event topology — pipeline_events exchange, pipeline.ingestion.complete and pipeline.workflow.complete routing keys
Frontend — ingestion UI (accession input, pre-flight, per-file progress), workflow selector (TRS query), dataset browser with copyable S3 URIs and "Copy all" bulk action, config upload with client-side S3 URI validator, run status dashboard with 10–15s polling, result browser with download links
Cloud deployment on European provider (Exoscale or equivalent), accessible to invited users
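The "auto-retry with backoff" behaviour named in the ingestion item above could look roughly like the following. This is a sketch, not the orchestrator's implementation: the function names, the default of 5 attempts, the 1 s base delay, and the 60 s cap are all illustrative assumptions, and the jitter strategy is one common choice among several.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(max_attempts: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Upper bound on the wait before each retry: base * 2**n, capped at `cap`."""
    return [min(cap, base * (2 ** n)) for n in range(max_attempts - 1)]


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds or attempts run out, then re-raise the last error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays(max_attempts, base, cap)
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the error to per-file tracking
            # Random jitter keeps parallel file downloads from retrying in lockstep.
            sleep(random.uniform(0, delays[attempt]))
```

Passing sleep as a parameter keeps the function testable without real waiting; a per-file download would be wrapped as retry_with_backoff(lambda: download(accession)), where download is whatever fetch routine the download manager uses.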

Out (deferred to M2+):

Analysis agent, publishing module, samplesheet editor, oCIS, accounting, compliance/Wazuh, Agentic OS

Timeline

Week	Backend: ingestion (2 engineers)	Backend: workflow execution (2 engineers)	Frontend	DevOps
1	Orchestrator scaffold (FastAPI, pa-auth, CNPG migrations, Redis heartbeat)	Nextflow gRPC plugin — TRS/DRS resolution, DSL2 param separation	—	Cloud environment provisioning
1–2	SRA download manager (accession resolution via MDI, dedup, AWS Open Data + fallback)	Nextflow plugin — --outdir enforcement. Target: "Hello World" acceptance	Ingestion UI: accession input, pre-flight, progress	K8s namespace, ServiceAccount, RBAC
2–3	Upgate integration, DRS registration, pre-flight estimation	trs-syncer nf-core plugin, trs-cache-filer validation	Dataset browser with S3 URIs, bulk copy	RabbitMQ exchange + bindings
3–4	RabbitMQ events, partial job status, controlled-access flagging	S3 provisioning + DRS output registration (batch). WES endpoint stabilisation.	Workflow selector, config upload with URI validator	OTEL instrumentation (Go + Python)
5	TRS /tests for top 20 modules	TES engine (trusted + sandboxed profiles), K8sJobWatcher	Run status dashboard, result browser	—
6	Integration testing, bug fixes	End-to-end demo on real SRA data	UX polish	Deploy to cloud

Critical-Path Tasks

Nextflow gRPC Plugin for Metis

TRS URI resolution (workflow repo) and DRS URI resolution (top-level config files submitted via WES)
DSL2 parameter separation: workflow_params → -params-file, resource overrides → -c
Platform-controlled --outdir enforcement (CLI flags override user-provided outdir)
Does not parse contents of user-provided params or sample sheets for embedded DRS URIs
Acceptance: "Hello World" module runs via POST /runs → successful k8s pod execution
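The parameter separation and --outdir enforcement described above can be sketched as a function that assembles the Nextflow command line. This is an illustration only: build_nextflow_command, the file names params.json and overrides.config, and the example S3 path are assumptions, and the real plugin talks to Metis over gRPC rather than building argv directly. The sketch relies on Nextflow's documented behaviour that a parameter given on the command line takes precedence over the same parameter in a params file.

```python
import json
import tempfile
from pathlib import Path


def build_nextflow_command(
    workflow_repo: str,
    workflow_params: dict,
    resource_overrides: str,
    platform_outdir: str,
    work_dir: Path,
) -> list[str]:
    """Assemble a `nextflow run` argv with params and config kept separate."""
    # Pipeline parameters go into a JSON file passed via -params-file.
    params_file = work_dir / "params.json"
    params_file.write_text(json.dumps(workflow_params, indent=2))

    # Resource overrides go into a config file passed via -c.
    overrides_file = work_dir / "overrides.config"
    overrides_file.write_text(resource_overrides)

    return [
        "nextflow", "run", workflow_repo,
        "-params-file", str(params_file),
        "-c", str(overrides_file),
        # The command-line --outdir wins over any outdir in the params file,
        # so a user-supplied value cannot redirect outputs.
        "--outdir", platform_outdir,
    ]


# Usage: the user asks for their own outdir, but the platform value takes precedence.
with tempfile.TemporaryDirectory() as tmp:
    argv = build_nextflow_command(
        "nf-core/rnaseq",
        {"input": "samplesheet.csv", "outdir": "/tmp/user-choice"},
        "process { cpus = 2 }",
        "s3://project-123/runs/run-1",
        Path(tmp),
    )
    saved_params = json.loads((Path(tmp) / "params.json").read_text())
```

Note that the user's params are written through untouched, consistent with the rule that the plugin does not parse their contents.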

Project-Scoped S3 Provisioning + DRS Output Registration

Generate S3 output paths from OPA project context
Post-execution: recursive --outdir parse, compute sizes + MD5 checksums, batch POST /objects to Ares
Acceptance: Workflow outputs in correct S3 location AND registered in DRS
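The post-execution step above (walk --outdir, compute sizes and MD5 checksums, build one batch for POST /objects) could be sketched as follows. Assumptions are labelled: the output directory is taken to be readable as a local path, the payload fields follow the general shape of a GA4GH DRS object (name, size, checksums, access_methods) but the exact body Ares expects is not specified in this roadmap, and the function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large outputs are not loaded into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_batch_payload(outdir: Path, s3_prefix: str) -> list[dict]:
    """Walk outdir recursively and describe each file as a DRS-style object."""
    prefix = s3_prefix.rstrip("/")
    objects = []
    for path in sorted(p for p in outdir.rglob("*") if p.is_file()):
        rel = path.relative_to(outdir).as_posix()
        objects.append({
            "name": rel,
            "size": path.stat().st_size,
            "checksums": [{"type": "md5", "checksum": md5_of(path)}],
            "access_methods": [{"type": "s3", "access_url": {"url": f"{prefix}/{rel}"}}],
        })
    return objects


# Usage with a throwaway directory standing in for a workflow's --outdir.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "multiqc").mkdir()
    (root / "multiqc" / "report.html").write_text("hello")
    payload = build_batch_payload(root, "s3://project-123/runs/run-1/")
```

The resulting list would be sent in a single request, which is what keeps registration to one round trip per run rather than one per file.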

WES Run Management Endpoint Stabilisation

Cursor-based pagination, correct lifecycle states (QUEUED, RUNNING, COMPLETE, SYSTEM_ERROR)
CancelRun → explicit 501 Not Implemented
Acceptance: GET /runs/{id} reflects correct state
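Cursor-based pagination over runs, together with the four lifecycle states listed above, might be sketched like this. The runs/next_page_token response shape and the page_size/page_token inputs mirror the GA4GH WES ListRuns convention; the opaque base64 cursor and the in-memory list are assumptions made for illustration, since Metis keeps run state in MongoDB.

```python
import base64
from enum import Enum


class RunState(str, Enum):
    """The subset of WES states the M1 scope commits to."""
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    COMPLETE = "COMPLETE"
    SYSTEM_ERROR = "SYSTEM_ERROR"


def encode_cursor(last_run_id: str) -> str:
    """Wrap the last seen run_id in an opaque token."""
    return base64.urlsafe_b64encode(last_run_id.encode()).decode()


def decode_cursor(token: str) -> str:
    return base64.urlsafe_b64decode(token.encode()).decode()


def list_runs(runs: list[dict], page_size: int, page_token: str = "") -> dict:
    """Return one page of runs ordered by run_id, plus a token for the next page."""
    ordered = sorted(runs, key=lambda r: r["run_id"])
    if page_token:
        after = decode_cursor(page_token)
        # Resume strictly after the last run the client has already seen.
        ordered = [r for r in ordered if r["run_id"] > after]
    page = ordered[:page_size]
    has_more = len(ordered) > page_size
    return {
        "runs": page,
        "next_page_token": encode_cursor(page[-1]["run_id"]) if has_more else "",
    }
```

Unlike offset pagination, the cursor stays valid when new runs are inserted between requests, which matters for a dashboard that polls every 10–15 seconds.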

Exit Criteria

Ingest 2+ SRA accessions → DRS objects in Ares (or shim)
nf-core "Hello World" runs → k8s pod completes → outputs registered as DRS objects
nf-core/rnaseq runs on minimal dataset with samplesheet
Platform deployed to cloud and accessible to invited external users

2.2 M2: Beta-1 — June 1, 2026 (~8 Weeks)

Vertical: V1 Biomedical Research OS (full pipeline)
Product: PA Cloud
Goal: The full research pipeline works end-to-end: Ingest → Execute → Analyse → Publish. The samplesheet editor closes the cohort-to-pipeline gap. oCIS workspace is live.

Scope (Adds to M1)

Analysis and Publishing:

pa-analysis-agent — methodology selection (DESeq2, edgeR, limma-voom via constrained decision matrix), LLM code generation (Python and R), three-stage validation framework (pre-execution, post-execution, plausibility)
Feature selection (MRMR, Boruta, SHAP) — same infrastructure as DESeq2/edgeR: new rows in the decision matrix + new validation rules, no new services
LLM Sandbox Phases 1–2 (per Khoa's epic): pa-sandbox-mcp-server wrapping llm-sandbox, pre-built images (pa-bio:3.11, bioconductor:3.18), K8s namespace + NetworkPolicy, pa-db-mcp-server (read-only PostgreSQL, OPA gating), pa-storage-mcp-server (DRS resolution, presigned URLs), self-healing debugger (basic)
Publishing module — minimal metadata form, resource selection (notes primary), DRS URI resolution for packaging, RO-Crate 1.2 generation, ZIP to MinIO, background task with per-item tracking
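One way to read "constrained decision matrix" is a table of rows that decides which methods are permitted for a given design, with the LLM's answer accepted only if it names a permitted row. The sketch below follows that reading. Everything in it is an assumption for illustration: the DesignFacts fields, the row conditions (which are placeholders, not statistical guidance), and the parser behaviour are not taken from the pa-analysis-agent design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DesignFacts:
    """Facts about the experiment that the selector is allowed to look at."""
    task: str  # "differential_expression" or "feature_selection"
    min_replicates_per_arm: int


@dataclass(frozen=True)
class MatrixRow:
    method: str
    applies: Callable[[DesignFacts], bool]


# Row conditions are placeholders for illustration, not statistical guidance.
DECISION_MATRIX = [
    MatrixRow("DESeq2", lambda f: f.task == "differential_expression"
              and f.min_replicates_per_arm < 6),
    MatrixRow("limma-voom", lambda f: f.task == "differential_expression"
              and f.min_replicates_per_arm >= 6),
    # Supporting feature selection is one more row, not a new service.
    MatrixRow("Boruta", lambda f: f.task == "feature_selection"),
]


def allowed_methods(facts: DesignFacts) -> list[str]:
    return [row.method for row in DECISION_MATRIX if row.applies(facts)]


def parse_choice(llm_output: str, facts: DesignFacts) -> str:
    """Constrained parser: accept the model's answer only if the matrix allows it."""
    choice = llm_output.strip()
    if choice not in allowed_methods(facts):
        raise ValueError(f"{choice!r} is not permitted for this design")
    return choice
```

The point of the constraint is that the model can pick among vetted options but cannot invent a methodology, which narrows the space the validation framework has to cover.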

Samplesheet Editor (per Anurag's Researcher Workflow epic):

Cohort export → in-platform spreadsheet editor (AG Grid or equivalent)
File path and metadata picker panel integrated into editor
Save samplesheet as project file (DRS-registered, reusable)
Pipeline selection + "Save & Run" submission from editor toolbar
Run list with polling, run detail with human-readable error translation, output browser

oCIS Workspace:

oCIS v8 with s3ng driver on MinIO, Keycloak SSO, OPA access control
Project Spaces model: each project maps to an oCIS Space
Basic file browsing, sharing, and organisation through web UI
Not on the pipeline critical path but required for the workspace experience

Metadata Module (per Samyak's epic, partial):

Workstream 1 begin: unconstrained SRA fetcher with checkpointing, Postgres loader optimisation
Workstream 2: dynamic query builder, filter API endpoints
Workstreams 3–4 (Contrast data model, Ares integration) deferred until post-M3 clarity on user flow. The distinction between intra- and inter-cohort contrasts needs further design; saving sub-cohorts (original filters + additional filters) may be sufficient. Ares integration also deferred pending readiness confirmation.

Timeline

Weeks	Backend (Analysis)	Backend (Platform)	Frontend	DevOps
7–8	Agent scaffold. Methodology selector + constrained parser. Code generator.	LLM Sandbox Phase 1 (spike, images, pa-sandbox-mcp-server, NetworkPolicy).	Samplesheet editor (AG Grid eval, edit/add/delete). Cohort export button.	oCIS deployment (s3ng, Keycloak, OPA).
9–10	Validation framework (3 stages). Sandboxed TES execution. RabbitMQ wiring.	LLM Sandbox Phase 2 (pa-db-mcp-server, pa-storage-mcp-server, MCP auth).	File path picker with metadata panel. Save samplesheet.	—
11–12	Publishing module (form, selection, DRS resolution, RO-Crate, ZIP).	Self-healing debugger. E2E integration test (agent → MCP → sandbox → S3 → DB).	Pipeline catalogue. "Save & Run." Run list + detail + error translation.	—
13–14	Feature selection (matrix rows + validation rules). Golden file CI tests.	Metadata Module WS1 begin (fetcher, checkpointing). WS2 (query builder, filter API).	Output browser. oCIS frontend integration.	Postgres loader optimisation.

Exit Criteria

Full pipeline: Ingest → Execute → Analyse → Publish on MRSA anchor use case
DESeq2 sandbox → results registered + validation passes
Validation framework catches all known-bad test cases (golden file: pasilla dataset, Spearman > 0.9)
Publish job → valid RO-Crate ZIP with correct metadata
Samplesheet: export cohort → edit → insert paths via picker → Save & Run → pipeline executes
oCIS: file browsing, sharing, and project Spaces operational
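The golden-file criterion (Spearman > 0.9 on the pasilla dataset) implies a CI check that compares agent output against a stored reference. A self-contained sketch of such a check follows. The roadmap does not say which quantity is correlated; the sketch assumes two equal-length lists of per-gene statistics (for example log2 fold changes) in matching gene order, and the function names are hypothetical.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def passes_golden_check(candidate: list[float], golden: list[float],
                        threshold: float = 0.9) -> bool:
    """True when the candidate ranks genes nearly the same way as the reference."""
    return spearman(candidate, golden) > threshold
```

A rank correlation tolerates small numeric drift between library versions while still failing a run whose gene ordering has materially changed.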

2.3 M3: Beta-2 — August 3, 2026 (~8 Weeks)

Verticals: V1 hardening + V2 Project Presence (foundation)
Product: PA Cloud (On-Prem preparation begins)
Goal: Platform ready for pilot UAT. Accounting live. Security baseline for procurement conversations. First elements of the AI research companion.

Scope (Adds to M2)

V1 Hardening:

Accounting/billing — usage metering (CPU-hours, GB-months, ingestion volume per tenant), Stripe for cloud, institutional PO for On-Prem pipeline, storage tier quotas, overage alerts, free academic tier, billing audit trail, usage dashboard
LLM Sandbox Phases 3–4 (per Khoa's epic): agent MCP client wiring, multi-step reasoning loop, bio prompt engineering (DESeq2, limma, scanpy; validate 5+ patterns), Monaco Editor workbench (syntax highlighting, runtime selector, run/stop, SSE log streaming, inline output rendering, .ipynb export)
Metadata Module WS1 continued/completed (full SRA ingestion pipeline), WS2 completed (filter API endpoints with dynamic counts)
ADR-10 benchmark (10M records in PostgreSQL, ClickHouse, DuckDB) — database engine decision
Load testing: 5 concurrent ingestions, 10 concurrent workflows, 3 concurrent analyses, 2 concurrent publishes. Targets: 500 MB/s ingestion, <5s workflow submission, <5s sandbox cold start, 10 GB artefact in <10 min
Failure injection: RabbitMQ drop, MinIO kill, Ares unavailable, ResourceQuota exhaustion, invalid S3 URIs
Documentation, API reference, deployment runbook
Pilot UAT with anchor partner
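The metering units named in the accounting item above (CPU-hours and GB-months per tenant) reduce to simple arithmetic over usage records. The sketch below shows one way to compute them. The record types, the tenant names, and the 730-hour billing month (365 × 24 / 12) are assumptions for illustration; the roadmap does not define the billing period or the record schema.

```python
from dataclasses import dataclass

HOURS_PER_BILLING_MONTH = 730  # assumption: 365 * 24 / 12


@dataclass(frozen=True)
class ComputeRecord:
    tenant: str
    cpu_cores: float
    duration_seconds: float


@dataclass(frozen=True)
class StorageRecord:
    tenant: str
    gigabytes: float
    hours_stored: float


def cpu_hours(records: list[ComputeRecord], tenant: str) -> float:
    """Sum of cores x wall-clock hours for one tenant."""
    return sum(r.cpu_cores * r.duration_seconds / 3600
               for r in records if r.tenant == tenant)


def gb_months(records: list[StorageRecord], tenant: str) -> float:
    """GB-hours divided by the hours in a billing month."""
    gb_hours = sum(r.gigabytes * r.hours_stored
                   for r in records if r.tenant == tenant)
    return gb_hours / HOURS_PER_BILLING_MONTH


compute = [
    ComputeRecord("lab-a", cpu_cores=4, duration_seconds=1800),  # 2 CPU-hours
    ComputeRecord("lab-a", cpu_cores=8, duration_seconds=3600),  # 8 CPU-hours
    ComputeRecord("lab-b", cpu_cores=2, duration_seconds=3600),
]
storage = [StorageRecord("lab-a", gigabytes=100, hours_stored=365)]
```

Here lab-a accrues 10 CPU-hours, and 100 GB held for half a billing month counts as 50 GB-months.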

V2 Foundation (basic):

Persistent research memory — soul documents applied to per-researcher context, loaded at session start from Qdrant
Proactive monitoring seed — heartbeat cron for pipeline completion notifications (pipeline.analysis.complete → researcher notification via in-app or email, not yet Matrix)
Notes ELN AI integration as the first element of the research companion (contextual suggestions based on project history)

Security Hardening (On-Prem preparation):

Wazuh Phases 1–4 (cluster provisioning, TLS, agent rollout, log forwarding)
Alertmanager → calert wiring
Cinder encryption on ClickHouse volumes
mTLS on observability cluster
S3 cold storage with 12-month retention

Standards Framework

Capability	Evidence	Test Gate
Data at rest	Cinder encryption	Volume metadata confirms
Data in transit	Istio mTLS + internal mTLS	Cert validation
Alert delivery	Alertmanager → calert → Google Chat	Synthetic alert within 60s
Log retention	S3 cold storage, 12-month lifecycle	Oldest log ≥ 365 days
SIEM baseline	Wazuh Phases 1–4	Dashboard: all agents connected
Access control	OPA on all services	Policy test suite passes
Audit trail	Billing + research provenance	Export covers 30 days
Analysis correctness	Three-stage validation	CI: all known-bad caught

Exit Criteria

All M2 criteria hold
Billing: at least one tenant metered, can generate invoice
Monaco workbench: write/run code, view outputs, export .ipynb
ADR-10 benchmark complete, database decision made
Wazuh Phases 1–4 operational
Encryption, mTLS, cold storage active
Load test targets met, P0/P1 bugs resolved
Pilot partner UAT sign-off
V2: researcher receives pipeline completion notification without polling; soul document loaded per session

2.4 M4 — October 6, 2026 (~8 Weeks)

Verticals: V2 Project Presence (companion) + V3 Sovereign Enterprise Suite (foundation)
Product: Cloud + On-Prem
Goal: The AI research companion is operational. The sovereign productivity stack begins deployment. First On-Prem institutional deployment.

Team: Seed funding closes. Team expands from 6 to 10–14. New hires: +2 backend, +1 frontend, +1 ML/agent engineer, +1 DevOps, +1 bioinformatics full-time, +1 product/design optional.

Scope from M4 onward is shaped by user feedback from M1–M3. The items below represent the planned direction; specifics will be adjusted.

V2 Scope (basic)

pa-relay service — Matrix → LiteLLM bridge, session management, audit logging, NO_REPLY suppression
Researcher-specific soul documents with persistent memory (project history, active hypotheses, dataset context maintained across sessions via Qdrant)
Heartbeat cron loop — configurable per deployment: pipeline completion alerts, cohort match notifications, surveillance signal thresholds
Skills registry YAML + CLI wrappers — priority conversions: pa-drs-fetch, pa-policy-search, pa-cohort-query, pa-compliance-check

V3 Scope (foundation, basic)

Matrix — Deployment for institutional messaging and chat. Element as the client. Bridges to pa-relay for agent-accessible communication channels.
oCIS hardened — Document governance: fine-grained access controls, audit trails, retention policies. oCIS Spaces enforced per project/team.
Collaborative editing — Begin integration of OnlyOffice or Collabora via WOPI protocol in oCIS. Basic document and spreadsheet co-editing. Tool selection decision required early in M4.
Video conferencing — Evaluate and select: Jitsi Meet, Element Call (Matrix-native), or BigBlueButton. Deploy basic instance integrated with institutional auth (Keycloak SSO). Decision based on: sovereign deployability, Matrix integration quality, and institutional fit.

On-Prem

Wazuh Phases 5–7 (FIM, CVE scanning, CIS benchmarks, compliance modules)
Wazuh architecture documentation for procurement
First institutional On-Prem deployment preparation
Full SRA ingestion completion (149M records, using ADR-10 engine)

Infrastructure: MinIO → SeaweedFS Migration

MinIO Community Edition has been phased out, creating a licensing and business continuity risk for the platform. M4 begins the migration to SeaweedFS as the S3-compatible object storage layer.

Deploy SeaweedFS on private K8s
Migrate all services currently using MinIO: Upgate (file upload), Nextflow workDir, Velero backup, Harbor registry backend, DRS/presigned URLs, oCIS s3ng driver, publishing staging area
Validate S3 API compatibility across all integration points
Decommission MinIO

This migration is feasible in M4 due to the expanded team (10–14 engineers). Assign 1 backend + 1 DevOps engineer to the migration track in parallel with V2/V3 feature work.

Infrastructure: Storage Layer Autoscaling (On-Prem)

On-Prem deployments run on OpenStack environments that lack Magnum (managed Kubernetes) and Octavia (load balancing as a service), so custom autoscaling is required:

SeaweedFS volume server horizontal scaling triggered by Prometheus capacity metrics
Cinder volume auto-expansion for persistent storage
K8s worker node autoscaling via OpenStack Nova (Terraform + Ansible triggered by Prometheus)
Capacity alerting integrated into the observability stack

This work begins in M4 and may extend into M5 depending on the complexity of each institution's OpenStack environment.

Exit Criteria

V2: researcher has persistent AI companion that remembers project context across sessions, receives proactive alerts
V3: Matrix operational for team chat, collaborative editing functional (basic), video conferencing deployed
On-Prem: first deployment environment provisioned with Wazuh active
Full SRA queryable via ChatNexus
SeaweedFS deployed and validated; at least one major service (Upgate or Nextflow workDir) migrated from MinIO

2.5 M5 — December 1, 2026 (~8 Weeks)

Verticals: V3 Sovereign Enterprise Suite (productivity) + V4 Government OS (foundation)
Product: Cloud + On-Prem
Goal: The full sovereign productivity stack is operational. Government deployment architecture is validated.

V3 Scope (operational)

Matrix: channels, threads, file sharing, search — operational for institutional use
Collaborative editing: OnlyOffice/Collabora hardened — version history, comment threads, review workflows
Video conferencing: operational, integrated with calendar/scheduling if applicable
Governance layer: role-based access, audit trails, retention policies, compliance reporting integrated across oCIS + Matrix + editing tools
AI-assisted document work: summarisation, classification, policy gap analysis within sovereign perimeter

V4 Scope (foundation, basic)

Federated analytics architecture — design and prototype the hub-and-spoke model for multi-institutional collaboration (building on PAHO syndromic surveillance pattern)
Multi-jurisdictional policy intelligence — extend Policy Team agent for cross-country regulatory analysis
Compliance reporting framework — HIPAA, GDPR, ISO 27001 templates and continuous monitoring
Multi-region deployment preparation (EU sovereign + MENA)

Platform

MinIO → SeaweedFS migration completion (if not finished in M4) and decommission
Storage layer autoscaling completion (SeaweedFS scaling, Cinder expansion, K8s node autoscaling on OpenStack)
DRS URI resolution inside workflow files (post-ADR-3 — if user feedback from M1–M3 confirms need)
Repository connectors (Zenodo, Dataverse) — basic publish-to-external
Incremental re-ingestion (download only delta when cohort updated)
CLI/API ingestion for power users
Metadata Module WS3–4 revisited: Contrast data model and Ares integration, if user feedback and design clarity warrant it

Exit Criteria

V3: full productivity suite operational (files + chat + docs + video) under institutional governance
V4: federated analytics prototype functional between two test environments
At least one On-Prem institutional client in active deployment
Multi-region: deployment architecture validated

2.6 M6: V1 Full — March 2, 2027 (~12 Weeks)

Verticals: V4 Government OS (operational) + V5 DeSci Marketplace (foundation)
Product: Cloud + On-Prem
Goal: Full platform at production scale. Paying institutional clients. Agentic OS mature.

Team assumption: The seed-funded team of 10–14 established at M4 (hires listed in §2.4) remains in place through M6.

V4 Scope (operational)

Federated hub-and-spoke productised — municipality/institution nodes transmit aggregates only, raw data never leaves origin
Compliance reporting operational for HIPAA, GDPR, ISO 27001
Multi-region: at least one EU and one MENA deployment live
Air-gapped deployment validation for government contexts
Sovereign AI inference: full reasoning pipelines inside government security perimeter
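The "aggregates only" rule in the hub-and-spoke item above can be illustrated with a small sketch in which each spoke reduces its raw rows to category counts before anything is transmitted, and the hub only ever sums those counts. The function names and the syndrome field are hypothetical, and the sketch deliberately leaves out any privacy-preserving step beyond aggregation itself.

```python
from collections import Counter


def spoke_aggregate(raw_records: list[dict], key: str) -> dict[str, int]:
    """Runs inside the institution: reduce raw rows to counts per category."""
    return dict(Counter(record[key] for record in raw_records))


def hub_merge(aggregates: list[dict[str, int]]) -> dict[str, int]:
    """Runs at the hub: combine per-node counts without seeing any raw row."""
    total: Counter = Counter()
    for aggregate in aggregates:
        total.update(aggregate)
    return dict(total)


# Two nodes report counts; patient-level rows stay where they were collected.
node_a = spoke_aggregate(
    [
        {"patient": "p1", "syndrome": "fever"},
        {"patient": "p2", "syndrome": "fever"},
        {"patient": "p3", "syndrome": "rash"},
    ],
    "syndrome",
)
node_b = spoke_aggregate([{"patient": "p9", "syndrome": "fever"}], "syndrome")
combined = hub_merge([node_a, node_b])
```

Only node_a and node_b cross the institutional boundary, so the hub's view is limited to totals per category.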

V5 Scope (foundation, design + prototype only)

On-chain hypothesis registration — design, smart contract prototype, timestamp-based priority claim
Research data objects — DRS-registered datasets with on-chain provenance records
Federated data marketplace architecture — design for governed dataset access through protocol
Protocol fee model — design for transaction-based revenue on dataset access, hypothesis citation, federated analytics

Agentic OS (mature)

Validation Triad — Naysayer + ELO Rater + Risk Analyser, two-round deliberation, DRS audit artefact for every analysis output
Agent self-modification (constrained) — operational sections of soul.md modifiable with human confirmation, constitutional constraints hash-verified
Skills registry complete — all stateless MCP servers converted to CLI wrappers
Full heartbeat configurations operational: WHO/PAHO (30-min), Saudi MoH (daily), pharma (event-triggered)

Platform Capabilities

Custom workflows (non-nf-core, user-uploaded)
Live log streaming, cancel/resume for long-running pipelines
RAG-driven methodology selection (replaces decision matrix)
Workflow parameter UI generation from nextflow_schema.json

Exit Criteria

V4: federated deployment operational between 2+ institutions
V5: hypothesis registration prototype functional on testnet
Agentic OS: heartbeat delivers proactive alerts for at least one live deployment; Validation Triad runs on analysis outputs
At least 2 institutional clients with active billing
Multi-region live
Platform handles 50+ concurrent researchers without degradation

2.7 Dependency Graph

   ┌──────────────────────────────┐
   │  M0: BLOCKERS                │
   │  Ares confirmed/shimmed      │
   │  Notes chain endpoint        │
   │  Alpha hard deadline         │
   └──────────────┬───────────────┘
                  │
   ┌──────────────▼───────────────┐
   │  M1 ALPHA (Apr)              │
   │  V1: Ingest + Execute        │
   │  Cloud deployment            │
   │  ⚠ CRITICAL PATH:           │
   │  Nextflow gRPC plugin        │
   └──────────────┬───────────────┘
                  │
   ┌──────────────▼───────────────┐
   │  M2 BETA-1 (Jun)             │
   │  V1: + Analyse + Publish     │
   │  + Samplesheet editor        │
   │  + oCIS + LLM Sandbox 1–2    │
   │  + Metadata WS1–2 begin      │
   └──────────────┬───────────────┘
                  │
   ┌──────────────▼───────────────┐
   │  M3 BETA-2 (Aug)             │
   │  V1: hardening + billing     │
   │  V2: foundation (memory,     │
   │      heartbeat, ELN AI)      │
   │  + LLM Sandbox 3–4           │
   │  + Security baseline         │
   │  + Pilot UAT                 │
   └──────────────┬───────────────┘
                  │
   ┌──────────────▼───────────────┐
   │  M4 (Oct)                    │
   │  V2: companion (relay, souls,│
   │      skills, heartbeat)      │
   │  V3: foundation (Matrix,     │
   │      collab editing, video)  │
   │  + First On-Prem deploy      │
   │  + Full SRA (149M)           │
   └──────────────┬───────────────┘
                  │
   ┌──────────────▼───────────────┐
   │  M5 (Dec)                    │
   │  V3: operational (full       │
   │      productivity suite)     │
   │  V4: foundation (federated   │
   │      analytics, compliance)  │
   │  + Multi-region prep         │
   └──────────────┬───────────────┘
                  │
   ┌──────────────▼───────────────┐
   │  M6 V1 FULL (Mar 2027)       │
   │  V4: operational (gov OS)    │
   │  V5: foundation (DeSci)      │
   │  + Agentic OS mature         │
   │  + Paying clients            │
   └──────────────────────────────┘

Critical path through M1–M2: Ares readiness → SRA download + DRS registration (M1 Wk 1–2) → Nextflow gRPC plugin (M1 Wk 1–4) → DRS output registration (M1 Wk 3–5) → Analysis sandboxed execution (M2) → End-to-end integration (M2 Wk 11–12).

3. Blockers, Risks, and Open Questions

3.1 Pre-Sprint Blockers

See §2.0 (B1–B4).

3.2 Critical Risks

R1: LLM-Generated Analysis Correctness. Plausible but wrong results are worse than crashes. Mitigation: Three-stage validation, golden file CI, validation_failed as distinct status, researcher review always required. Residual: Novel failure modes not in validation rules; partially mitigated by Validation Triad (M6).

R2: Manual S3 URI Entry UX. Error-prone for non-technical users. Mitigation: Client-side validator, file picker panel (Anurag's epic FE-02-B), bulk copy. Trigger: If top-3 adoption blocker, promote DRS URI resolution.
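The validation logic behind the client-side validator could be as small as the sketch below. It is written in Python to show the checks; the production validator would live in the frontend. The bucket pattern encodes the general S3 naming rules (3 to 63 characters; lowercase letters, digits, hyphens, and dots; starting and ending with a letter or digit). The function name and error strings are hypothetical.

```python
import re

# General S3 bucket naming: 3-63 chars, lowercase alphanumerics, hyphens, dots.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def validate_s3_uri(uri: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the URI looks valid."""
    errors = []
    stripped = uri.strip()
    if stripped != uri:
        errors.append("leading or trailing whitespace")
    if not stripped.startswith("s3://"):
        errors.append("must start with s3://")
        return errors  # nothing further can be checked without the scheme
    bucket, _, key = stripped[len("s3://"):].partition("/")
    if not _BUCKET_RE.match(bucket):
        errors.append("invalid bucket name")
    if not key:
        errors.append("missing object key")
    return errors
```

Returning every problem at once, rather than failing on the first, lets the UI highlight all issues in a pasted samplesheet in a single pass. The check is purely syntactic: it cannot tell whether the object exists or whether the researcher may read it.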

R3: Nextflow gRPC Plugin (Schedule). The Nextflow plugin is the critical path. Mitigation: Most experienced engineer, "Hello World" first, weekly check-in with escalation.

R4: API Contracts Not Finalised. Samplesheet editor depends on 5 backend contracts (C-1 through C-5). Mitigation: Agree in M2 Week 1, frontend builds against mocks.

R5: V3 Tool Selection. Collaborative editing (OnlyOffice vs Collabora) and video conferencing (Jitsi vs Element Call vs BBB) require evaluation against sovereign deployability, Matrix integration, and institutional fit. Wrong choice means migration cost. Mitigation: Spike evaluation in M4 Week 1, decide before committing engineering time.

R5b: MinIO Community Edition Phased Out. MinIO CE is no longer available, creating licensing and business continuity risk. Every storage-dependent service (Upgate, Nextflow, Velero, Harbor, DRS, oCIS, publishing) is affected. Mitigation: SeaweedFS migration planned for M4 with expanded team. Validate S3 compatibility early. If SeaweedFS reveals incompatibilities, evaluate Garage or Ceph RGW as fallback.

3.3 Moderate Risks

R6: Workflow DSL2 compatibility — trs-syncer filters, TRS /tests, fallback allowlist.
R7: Output hijacking via publishDir — test with nf-core/rnaseq, batch DRS registration.
R8: Publishing large artefacts — background task, per-item tracking, 50 GB cap.
R9: Samplesheet column mismatch — fail and surface error (MVP), schema validation (post-MVP).
R10: Editor state loss — auto-save to browser storage every 30s.
R11: Metadata Module scale — 149M records may exceed PostgreSQL. ADR-10 benchmark in M3.
R12: V4 federated analytics complexity — hub-and-spoke with privacy-preserving aggregation is architecturally complex. PAHO deployment is the proof point but generalising it is non-trivial.

3.4 Open Questions

#	Question	Blocking?	Deadline
Q1	Alpha hard deadline	M1	Resolved: April 6, deployed for invited users
Q2	Cloud provider selection (primary EU provider)	M1 deploy	M1 Wk 1
Q3	Billing model — Stripe vs PO vs hybrid	M3 billing	M2 end
Q4	oCIS scope for M2 — read-only browser or full workspace?	No	M2 Wk 7
Q5	ADR-10 database engine (PostgreSQL vs ClickHouse vs DuckDB)	Metadata expansion	M3 Wk 19
Q6	Contrast model design — sub-cohorts sufficient vs explicit contrasts?	Metadata WS3–4	Post-M3
Q7	Samplesheet generation per workflow type (column mapping)	Auto-generation	Post-M3
Q8	Publishing module extraction to standalone service	No	Post-M2
Q9	V3 collaborative editor: OnlyOffice vs Collabora	M4	M4 Wk 1
Q10	V3 video conferencing: Jitsi vs Element Call vs BBB	M4	M4 Wk 1
Q11	Monorepo (Nx) adoption timing across all services	No	Post-M1
Q12	Istio ambient mode fallback — sidecar mode pre-configured?	No	M3
Q13	TUS vs S3 presigned URL upload consolidation	No	Post-M3
Q14	Wazuh Indexer storage growth rate — cold-tier from day one?	No	M3
Q15	R language prompt engineering and validation — testing effort for Bioconductor workflows? R execution included from M2 but prompt quality needs validation.	No	M2
Q16	MinIO → SeaweedFS migration — S3 compatibility gaps? Fallback to Garage or Ceph RGW if SeaweedFS insufficient?	M4	M4 Wk 2

4. High-Level Architecture

4.1 Pipeline Overview

Hypothesis Generator
    │
    ▼
Cohorts Service ◄── MDI Postgres
    │
    ▼
┌──────────────────────────────────────────────────────────────────────┐
│  pa-pipeline-orchestrator                                            │
│                                                                      │
│  Ingest ──► Execute (Metis/WES) ──► event ──► pa-analysis-agent     │
│  (SRA → MinIO/DRS)  (Nextflow on k8s)        (DEA + publish)       │
└──────────────────────────────────────────────────────────────────────┘
    │                                                   │
    ▼                                                   ▼
MinIO ◄──► Ares (DRS)                        Results (DRS objects)
                                                        │
                                                        ▼
                                              Publishing Module
                                              (RO-Crate → ZIP → MinIO)

4.2 Two-Service Model

pa-pipeline-orchestrator — data ingestion and atomic task execution. Ingestion jobs, per-file tracking, TES task state. Exposes /ingestion/jobs and /ga4gh/tes/v1/tasks.

pa-analysis-agent — methodology selection, code generation, validation, publishing. Consumes pipeline.workflow.complete, publishes pipeline.analysis.complete and publish.artefact.ready. Exposes /analyses, /methodologies, /artefacts.

Metis (WES) — existing GA4GH workflow execution. TRS/DRS resolution, Nextflow on k8s, MongoDB for run state.

Notes (ELN) — existing immutable append-only chain. Chain export for publishing.

4.3 Event Topology

RabbitMQ is the sole durable event bus. Redis is used only for caching, heartbeats, and ephemeral pub/sub.

Exchange: pipeline_events (topic)
  ├── pipeline.ingestion.complete  → orchestrator
  ├── pipeline.workflow.complete   → analysis-agent
  ├── pipeline.analysis.complete   → notification service
  ├── publish.artefact.ready       → notification service
  └── drs.object.registered        → downstream consumers
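
As a rough illustration of the topology above, the sketch below models the bindings as a plain table and matches routing keys with AMQP-style topic rules (`*` is exactly one word, `#` is zero or more). The `BINDINGS` table and the function names are hypothetical. The platform itself relies on RabbitMQ for this matching, and the wildcard support is shown only because `pipeline_events` is a topic exchange.

```python
# Hypothetical in-memory model of the pipeline_events bindings.
# Routing keys and consumers are taken from the topology listing;
# everything else is illustrative.
BINDINGS = {
    "pipeline.ingestion.complete": "orchestrator",
    "pipeline.workflow.complete": "analysis-agent",
    "pipeline.analysis.complete": "notification service",
    "publish.artefact.ready": "notification service",
    "drs.object.registered": "downstream consumers",
}


def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""

    def match(p: list, k: list) -> bool:
        if not p:
            return not k
        if p[0] == "#":
            # '#' either absorbs nothing, or absorbs one word and stays active.
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return p[0] in ("*", k[0]) and match(p[1:], k[1:])

    return match(pattern.split("."), routing_key.split("."))


def consumers_for(routing_key: str) -> list:
    """Return every consumer whose binding pattern matches the routing key."""
    return [c for pattern, c in BINDINGS.items() if topic_matches(pattern, routing_key)]
```

With the bindings listed above every key is exact, so `consumers_for("pipeline.workflow.complete")` yields only the analysis agent. A wildcard binding such as `pipeline.#` would be one way to attach an audit consumer later without touching publishers.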

4.4 LLM Inference

All inference on-premises. GPT-OSS 120B via vLLM (31.3 tokens/sec, 50 concurrent users). Embeddings: Qwen3-Embedding-0.6B (2×A100). No data leaves the sovereign environment.

4.5 oCIS Integration (M2+)

oCIS v8 provides the file workspace layer. s3ng storage driver on MinIO (shared storage backend with the pipeline). Keycloak SSO for authentication. OPA for access control consistent with the rest of the platform. Project Spaces model: each PA project maps to an oCIS Space, giving researchers a familiar file browsing and sharing experience.

Not on the pipeline critical path. The pipeline operates on DRS objects via MinIO; oCIS is the human-facing view of the same storage.

4.6 V3 Sovereign Productivity Stack (M4–M5)

The target architecture for V3 Sovereign Enterprise Suite combines four components, all self-hosted within the institutional or cloud environment:

oCIS — file management, project spaces, governance layer
Matrix (Element) — institutional messaging, channels, threads; bridges to pa-relay for agent interaction
OnlyOffice or Collabora — collaborative document/spreadsheet editing via WOPI protocol in oCIS (tool selection: Q9, decided M4 Wk 1)
Jitsi, Element Call, or BBB — video conferencing with Keycloak SSO (tool selection: Q10, decided M4 Wk 1)

All four share Keycloak for authentication and OPA for access policy. The governance layer (audit trails, retention, compliance reporting) spans all components.

4.7 Service Pattern

All PA services follow: FastAPI + Keycloak OIDC + structured logging (OpenTelemetry → SigNoz) + CNPG Postgres (RW/RO split) + Redis heartbeat + Harbor images + ArgoCD GitOps.

5. Use Cases

UC-1: Ingest Data from NCBI SRA (M1)

Orchestrator resolves accessions to SRR via MDI, deduplicates, surfaces pre-flight estimate, downloads from AWS Open Data (fallback: SRA Toolkit), uploads via Upgate, registers DRS objects. Partial success first-class. See §6.

UC-2: Execute a Nextflow Workflow (M1)

User selects nf-core workflow from TRS, copies S3 URIs from dataset browser (client-side validator), submits to Metis. Platform enforces --outdir. Outputs batch-registered. See §7.

UC-3: Assemble Samplesheet and Submit Run (M2)

Researcher exports cohort to in-platform editor. Platform pre-fills sample IDs and file paths. Researcher adds pipeline-specific columns, inserts file paths via picker panel, saves as project file, selects pipeline, clicks "Save & Run." Full spec: Anurag's Researcher Workflow epic.

UC-4: Run Statistical Analysis (M2)

Analysis agent selects method, generates code, validates (3 stages), submits sandboxed TES task, registers results. Full spec: §8 + Khoa's LLM Sandbox epic.

UC-5: Publish a Data Artefact (M2)

Minimal metadata form, resource selection, RO-Crate 1.2 ZIP packaging. Background task. See §9.

6. Data Ingestion — Design Detail

6.1 Download Strategy

Primary: AWS Open Data (no credentials, negligible cost). Fallback: SRA Toolkit Docker image. Accession resolution always to SRR via MDI first, E-utils only if not indexed. Paired-end: two DRS objects per accession.

6.2 Registration Paths

External uploads via Upgate (chunked, resumable → RabbitMQ → Ares). Internal outputs via direct path (POST /objects, milliseconds).

6.3 Ingestion Does NOT (M1–M2)

Assign cohort/contrast labels. Determine pipelines. Parse/validate file contents. Support controlled-access or non-SRA repositories.

7. Nextflow Execution — Design Detail

7.1 Tradeoffs

Manual S3 URIs. Client-side validator. WES via Metis. Native k8s (TES later). See ADR-2, ADR-3.

7.2 Output Capture

Recursive --outdir parse → sizes + MD5 → batch DRS registration → tag with pipeline/version/output_type/run_id/project_id → publish pipeline.workflow.complete.
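
The capture step can be sketched roughly as follows: walk the output directory, compute size and MD5 per file, and attach the tags listed above. This is a sketch under assumptions: the record shape is hypothetical, `output_type` is derived from the file extension purely for illustration, and the actual DRS registration call and event publish are left out.

```python
import hashlib
from pathlib import Path


def capture_outputs(outdir, *, pipeline, version, run_id, project_id):
    """Build one registration record per file found under outdir."""
    root = Path(outdir)
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.md5()
        with path.open("rb") as fh:
            # Hash in 1 MiB chunks so large outputs are never fully in memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        rel = path.relative_to(root)
        records.append(
            {
                "name": rel.as_posix(),
                "size": path.stat().st_size,
                "md5": digest.hexdigest(),
                "tags": {
                    "pipeline": pipeline,
                    "version": version,
                    # Assumption: output_type inferred from the extension.
                    "output_type": rel.suffix.lstrip(".") or "unknown",
                    "run_id": run_id,
                    "project_id": project_id,
                },
            }
        )
    return records
```

The returned list would then be sent in a single batch registration request, followed by publishing pipeline.workflow.complete.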

8. Analysis Agent — Design Detail

8.1 Methodology Selection

Config-driven decision matrix. DESeq2, edgeR, limma-voom. Feature selection (MRMR, Boruta, SHAP) follows the identical pattern: new matrix rows + validation rules. Conditions evaluated by constrained parser — whitelisted variables + numeric literals + comparisons only. eval() prohibited.

8.2 Validation Framework

Three stages: pre-execution (imports, paths, no network, cohort_arm used), post-execution (required columns, no NaN/Inf, padj in [0,1], plot valid), plausibility (p-value KS test, fold-change variance, sample count cross-check). validation_failed is distinct from failed. Golden file CI: pasilla dataset, Spearman > 0.9. Full spec: Khoa's LLM Sandbox epic.
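
A minimal sketch of the post-execution stage, assuming the result table has already been loaded as a list of dicts. The column names and the `completed` status are assumptions; the checks themselves (required columns, no NaN/Inf, padj in [0,1]) and the `validation_failed` status come from the description above.

```python
import math

# Assumed column names, modelled on typical DESeq2 output.
REQUIRED_COLUMNS = ("gene_id", "log2FoldChange", "pvalue", "padj")
NUMERIC_COLUMNS = ("log2FoldChange", "pvalue", "padj")


def post_execution_errors(rows):
    """Return human-readable validation errors; an empty list means pass."""
    if not rows:
        return ["result table is empty"]
    missing = [c for c in REQUIRED_COLUMNS if c not in rows[0]]
    if missing:
        return [f"missing required columns: {', '.join(missing)}"]

    errors = []
    for i, row in enumerate(rows):
        for col in NUMERIC_COLUMNS:
            value = row[col]
            if not isinstance(value, (int, float)) or not math.isfinite(value):
                errors.append(f"row {i}: {col} is not a finite number")
            elif col == "padj" and not 0.0 <= value <= 1.0:
                errors.append(f"row {i}: padj {value} is outside [0, 1]")
    return errors


def job_status(errors):
    """Map validation outcome to a job status, keeping it distinct from 'failed'."""
    return "validation_failed" if errors else "completed"
```

Keeping the errors as plain strings makes it straightforward to store them in analysis_jobs.validation_errors and to show them to the researcher unchanged.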

8.3 LLM Sandbox

Four MCP servers: pa-sandbox-mcp-server, pa-db-mcp-server, pa-storage-mcp-server, pa-search-mcp-server. Pre-built images (pa-bio:3.11, bioconductor:3.18, pa-bio-ml:3.11) in Harbor with DaemonSet pre-pull. R included from M2. Self-healing debugger: PII strip → diagnose → web search → patch → retry (max 2). Full spec: Khoa's LLM Sandbox epic.

9. Publishing Module — Design Detail

Exit point. Packages workspace resources into RO-Crate 1.2 ZIP. Notes are primary narrative source. Create staging → resolve/download → fetch note chains → ro-crate-metadata.json → ZIP → MinIO → notify. Background task, per-item tracking, 50 GB cap. Lives within pa-analysis-agent for M2.

10. Operational Readiness

10.1 Compliance Timeline

Compliance is minimal for M1–M2 (cloud product, provider handles infrastructure security). Hardening begins in M3 for On-Prem preparation, completes across M4–M5.

Item	Milestone	Product Target
Alertmanager → calert	M3	On-Prem
Cinder encryption	M3	On-Prem
mTLS observability	M3	On-Prem
S3 cold storage 12-month	M3	On-Prem
Wazuh Phases 1–4	M3	On-Prem
Wazuh Phases 5–7 (FIM, CVE, compliance)	M4	On-Prem
Wazuh architecture documentation	M4	On-Prem (procurement)
Full compliance reporting (HIPAA, GDPR, ISO 27001)	M5	On-Prem

10.2 Capacity Planning

Year 1 projected (10–20 researchers): 5–10 concurrent runs (burst 20), ~5 TB/month ingestion, ~85 TB MinIO Year 1, ~1,700–3,400 AUD/month storage, GPU at 10–30%, no cloud API spend for core pipeline.

Quotas per project: 16 CPU / 64 GB RAM / 500 GB ephemeral default. Per-workflow max 8 CPU / 32 GB RAM. Per-sandbox max 4 CPU / 8 GB RAM / 2h timeout.
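
One way to picture how these limits interact is an admission check that first applies the per-task ceiling and then the remaining project quota. The sketch below is illustrative only: the `Limits` type and `admit` function are hypothetical, ephemeral storage is omitted, and in practice enforcement would more likely sit in Kubernetes ResourceQuota and LimitRange objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    cpu: int
    ram_gb: int


# Figures taken from the quota paragraph above.
PROJECT_DEFAULT = Limits(cpu=16, ram_gb=64)
TASK_MAX = {
    "workflow": Limits(cpu=8, ram_gb=32),
    "sandbox": Limits(cpu=4, ram_gb=8),
}
SANDBOX_TIMEOUT_SECONDS = 2 * 60 * 60


def admit(kind, request, in_use, project=PROJECT_DEFAULT):
    """Return (allowed, reason) for a new workflow or sandbox request."""
    cap = TASK_MAX[kind]
    if request.cpu > cap.cpu or request.ram_gb > cap.ram_gb:
        return False, f"{kind} request exceeds the per-{kind} maximum"
    if (
        in_use.cpu + request.cpu > project.cpu
        or in_use.ram_gb + request.ram_gb > project.ram_gb
    ):
        return False, "project quota exhausted"
    return True, "ok"
```

For example, two workflows at the 8 CPU / 32 GB maximum fill a default project exactly, so a further sandbox request would be rejected until one of them finishes.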

10.3 Testing Strategy

Functional progression (M1–M2): TES echo → ingest 2 accessions → Hello World → rnaseq → DESeq2 sandbox → E2E → publish.

Validation CI (M2+): Known-bad inputs/outputs, golden file (pasilla, Spearman > 0.9).

Load testing (M3): 5 ingestions, 10 workflows, 3 analyses, 2 publishes concurrent. Targets: 500 MB/s, <5s submission, <5s cold start, 10 GB in <10 min.

Failure injection (M3): RabbitMQ kill, MinIO kill, Ares unavailable, quota exhaustion, invalid URIs.

Frontend profiles (M3+): 4 curated (Full, Research, Surveillance, Minimal) in CI.

11. Agentic OS Layer

Distributed across milestones rather than big-bang:

Component	Milestone	Notes
Soul documents (per-researcher)	M3	V2 foundation — persistent memory
Pipeline heartbeat notifications	M3	V2 foundation — basic proactive alerts
pa-relay (Matrix → LiteLLM)	M4	V2 companion — messaging channel
Skills registry + CLI wrappers	M4	V2 companion — lightweight tool access
Full heartbeat (multi-config)	M4–M5	WHO/PAHO, Saudi MoH, pharma
Validation Triad	M6	Two-round deliberation, DRS audit record
Agent self-modification	M6	Constrained, human confirmation, hash check

Appendix A: Architecture Decision Records

ADR	Decision	Status
ADR-1	Two services, not four	Active
ADR-2	WES (Metis), native k8s backend	Active
ADR-3	No DRS URI resolution in workflow files (MVP)	Active
ADR-4	Dual file registration path	Active
ADR-5	RabbitMQ as primary event bus	Active
ADR-6	Decision matrix, constrained parser, no eval()	Active
ADR-8	Nx monorepo, Bazel on pain points	Active
ADR-9	Adopt agentic OS patterns, don't fork OpenClaw	Active
ADR-10	Database for full SRA (149M). Benchmark M3.	Pending
ADR-11	LLM Sandbox engine (llm-sandbox, MIT)	Active
ADR-12	Three-tier multi-tenancy, OPA	Implemented
ADR-13	Four-tier deployment environments	Implemented
ADR-14	Zero-trust (Istio ambient, OPA sidecars)	Implemented
ADR-15	Frontend 9 modules, runtime white-labelling	Implemented
ADR-16	SIEM (Wazuh)	Planned (M3)
ADR-17	MinIO → SeaweedFS migration. Deploy SeaweedFS, migrate all S3 consumers, validate compat, decommission MinIO.	Planned (M4)
ADR-18	Storage layer autoscaling (On-Prem). Custom Terraform + Ansible scaling for SeaweedFS, Cinder, K8s nodes on OpenStack (no Magnum/Octavia).	Planned (M4–M5)

Document	Owner	Maps To
LLM Sandbox Epic (PA-SANDBOX)	Khoa	Phases 1–2 → M2, Phases 3–4 → M3
Researcher Workflow Epic	Anurag	Samplesheet editor + run monitoring → M2
Metadata Module Epic	Samyak	WS1–2 → M2/M3, WS3–4 deferred
Admin Docs Hub Epic	Alex	Independent track
Observability Cluster Summary	—	Compliance detail for §10.1
PA Atlas v4.0	Boris	Strategic vision, 5 verticals, commercial model
Alpha Release Roadmap	CTO	Superseded by §2.1
Platform Functionality Roadmap	Boris	Strategic framing, superseded by §2
Milestones & v4→v5 Delta	Boris	Cross-reference driving this version

Appendix C: Database Schemas

C.1 Pipeline Orchestrator

CREATE TABLE ingestion_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    target_dataset_id UUID NOT NULL,
    accessions JSONB NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',
    requested_by TEXT NOT NULL,
    error_message TEXT,
    created_at TIMESTAMPTZ DEFAULT now(),
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ
);

CREATE TABLE ingestion_files (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    ingestion_job_id UUID NOT NULL REFERENCES ingestion_jobs(id),
    accession TEXT NOT NULL,
    filename TEXT NOT NULL,
    drs_uri TEXT,
    status TEXT NOT NULL DEFAULT 'pending',
    size_bytes BIGINT,
    checksum_md5 TEXT,
    error_msg TEXT
);

CREATE TABLE tes_tasks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    analysis_job_id UUID,
    name TEXT,
    state TEXT NOT NULL DEFAULT 'QUEUED',
    security_profile TEXT NOT NULL DEFAULT 'trusted',
    submitted_by TEXT NOT NULL,
    inputs JSONB NOT NULL,
    outputs JSONB NOT NULL,
    executors JSONB NOT NULL,
    resources JSONB,
    code_bundle JSONB,
    logs JSONB,
    output_drs_uris JSONB,
    created_at TIMESTAMPTZ DEFAULT now(),
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ
);

C.2 Analysis Agent

CREATE TABLE analysis_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    hypothesis_id UUID,
    workflow_run_id TEXT NOT NULL,
    cohort_id UUID NOT NULL,
    assay_type TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    methodology JSONB,
    validation_errors JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    completed_at TIMESTAMPTZ,
    created_by UUID NOT NULL
);

CREATE TABLE analysis_results (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    analysis_job_id UUID NOT NULL REFERENCES analysis_jobs(id),
    tes_task_id UUID NOT NULL,
    method_name TEXT NOT NULL,
    method_role TEXT NOT NULL,
    filename TEXT NOT NULL,
    file_type TEXT NOT NULL,
    drs_uri TEXT NOT NULL,
    result_role TEXT NOT NULL,
    validation_status TEXT DEFAULT 'pending',
    validation_details JSONB,
    created_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE publish_artefacts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    name TEXT NOT NULL,
    description TEXT NOT NULL,
    license TEXT NOT NULL,
    date_published DATE NOT NULL DEFAULT CURRENT_DATE,
    status TEXT NOT NULL DEFAULT 'staging',
    minio_path TEXT,
    zip_size_bytes BIGINT,
    error_msg TEXT,
    created_at TIMESTAMPTZ DEFAULT now(),
    completed_at TIMESTAMPTZ
);

CREATE TABLE publish_artefact_items (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    artefact_id UUID NOT NULL REFERENCES publish_artefacts(id),
    resource_type TEXT NOT NULL,
    resource_id TEXT NOT NULL,
    display_name TEXT,
    zip_path TEXT,
    size_bytes BIGINT,
    status TEXT NOT NULL DEFAULT 'pending'
);

Appendix D: Existing Services

Service	Language	Status
IAM	Python/FastAPI	Production
Upgate	Rust/Axum	Production
Ares (TRS/DRS)	Rust/Axum	Production
Metis (WES)	Rust	Production
zipRS	Python	Production
Notes (ELN)	—	Production
GAS (Gateway)	Python	Production
ChatMDI/ChatNexus	Python	Production
Forge (Hypothesis)	Python	Production
Research Team	Python	Production
Policy Team	Python	Production
General Agent	Python	Production
RMS (RAG)	Python	Production
Frontend	React 18/TS	Production
pa-pipeline-orchestrator	Python/FastAPI	New (M1)
pa-analysis-agent	Python/FastAPI	New (M2)

Appendix E: Frontend Architecture

Technology Stack

React 18, TypeScript 5.6, Vite 6. State management: Redux Toolkit + RTK Query with 14 backend service slices, redux-persist for workspace and upload state. UI: Radix UI + shadcn/ui + Tailwind CSS v3. Rich text: TipTap v3 (ProseMirror). Code editor: Monaco. Charts: Recharts. Auth: oidc-client-ts + Keycloak with cross-tab session sync. GA4GH components: Elixir Cloud Components (@elixir-cloud/*).

SSE Streaming Protocol

All AI-facing endpoints use Server-Sent Events with 8 event types: data-conversation, data-agent, text-start, text-delta, text-end, tool-output-available, error, finish. A 1.5-second thinking stall indicator fires during long reasoning steps. Frontend consumes via Vercel AI SDK v5 with 16 composable chat primitives.
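
To make the protocol concrete, here is a hedged sketch of a consumer written in Python rather than in the frontend's TypeScript. It assumes each SSE `data:` line carries one JSON object with a `type` field, that text chunks arrive in a `delta` field, that errors carry `errorText`, and that the stream may end with a `[DONE]` sentinel. None of these wire details are confirmed by this document; only the eight event type names are.

```python
import json

EVENT_TYPES = frozenset(
    {
        "data-conversation",
        "data-agent",
        "text-start",
        "text-delta",
        "text-end",
        "tool-output-available",
        "error",
        "finish",
    }
)


def iter_events(lines):
    """Yield decoded events from an iterable of SSE lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            return
        event = json.loads(payload)
        if event.get("type") not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {event.get('type')!r}")
        yield event


def collect_text(events):
    """Concatenate text-delta chunks until the stream finishes."""
    parts = []
    for event in events:
        if event["type"] == "text-delta":
            parts.append(event.get("delta", ""))
        elif event["type"] == "error":
            raise RuntimeError(event.get("errorText", "stream error"))
        elif event["type"] == "finish":
            break
    return "".join(parts)
```

Rejecting unknown event types early keeps a backend change from silently dropping content on the client side.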

White-Labelling

Single Docker image serves all environments. runtime-env.js generated at container startup injects: VITE_BRAND_NAME, VITE_PRIMARY_COLOR (OKLCH), logo URLs, backend service URLs, and VITE_ENABLED_MODULES. Three-tier tenancy (Org → Tenant → Project) with workspace auto-provisioning. Cache isolation: switching projects resets all 9 RTK Query service caches simultaneously.

Feature Modules

Module	Status
Datasets (DRS CRUD, file tree, downloads)	Production
Cohort Builder (faceted filtering, visualisation)	Production
Workflows (TRS registry, public browser)	Production
Runs (WES submission, monitoring, outputs)	Production
Research (multi-tab AI chat)	Production
RAG Knowledge Bases (upload, tag, publish)	Production
Notes (TipTap, autosave, PDF export, AI)	Production
Exploratory Analysis (JupyterLab → Monaco in M3)	Production
Sidebar (AI assistant + Notes, global)	Production

Appendix F: Deployment Checklists

F.1 pa-pipeline-orchestrator (M1)

☐ CNPG migrations (ingestion_jobs, ingestion_files, tes_tasks)
☐ FastAPI scaffold (pa-auth, pa-logging, Redis heartbeat, Taskfile)
☐ Download pod image (ncbi/sra-tools + Upgate client)
☐ K8s namespace + ServiceAccount + RBAC
☐ MinIO credentials Secret
☐ RabbitMQ exchange pipeline_events + bindings
☐ NetworkPolicy for sandboxed TES tasks
☐ Istio routing for /ingestion/, /ga4gh/tes/
☐ OTEL + Prometheus scrape config
☐ CI → Harbor → Helm values
☐ Helm sub-chart in pa-platform

F.2 Metis (M1)

☐ Nextflow gRPC plugin
☐ WES endpoint stabilisation
☐ DRS output registration post-execution
☐ trs-syncer nf-core plugin
☐ trs-cache-filer Nextflow validation
☐ TRS /tests endpoint
☐ Frontend: workflow selector + dataset browser + result viewer
☐ OTEL integration

F.3 pa-analysis-agent (M2)

☐ CNPG migrations (analysis_jobs, analysis_results, publish_artefacts, publish_artefact_items)
☐ FastAPI scaffold
☐ pa-bio:3.11 + bioconductor:3.18 images → Harbor
☐ Methodology matrix YAML + constrained parser
☐ Validation framework + CI test suite
☐ RabbitMQ consumer (workflow.complete) + publisher (artefact.ready)
☐ MinIO credentials (staging + artefacts)
☐ OTEL + Prometheus
☐ Helm sub-chart + Keycloak client
☐ Frontend: analysis UI + publish section

Appendix G: Deployment Infrastructure

GitOps and Sync Waves

All 30+ services are managed by ArgoCD via the App-of-Apps pattern. Terraform handles one-time bootstrap (cluster, service mesh, GitOps controller, messaging operator). Helm charts with environment-specific values files. 9 ordered sync waves:

1. cert-manager
2. Operators (CloudNativePG, MongoDB, RabbitMQ)
3. Infrastructure (Keycloak, PostgreSQL, Redis, MinIO/SeaweedFS, RabbitMQ clusters)
4. Platform services (IAM, Gateway, domain services)
5. RAG/AI services (Qdrant, RMS, GAS, agent teams)
6–9. Progressive application layers

CI/CD: image tag propagation → infra repo webhook → ArgoCD detect → auto-deploy with exponential backoff retry (500 attempts).

Preview Environments (ADR-13)

Preview environments are auto-provisioned by GitHub Actions on preview/ branches: Talos cluster on cloud → Terraform bootstrap (mesh, ArgoCD, RabbitMQ Operator) → full platform via ArgoCD → auto-cleanup on branch deletion. Each preview gets isolated Terraform state. Provisioning completes within minutes.

Constraints: concurrent preview environments limited to 3 via GitHub Actions concurrency groups. Auto-teardown after 48-hour inactive TTL. Monthly cloud budget tracked with alerts at 80%.

Deployment Tiers

Tier	Infrastructure	Sync Mode	Purpose
Stable	kubeadm	Manual gate	Production
Staging	kubeadm	Continuous	Integration testing
Dev	kubeadm	Continuous	Development (relaxed constraints)
Preview	Talos	Ephemeral per-branch	Feature isolation

Appendix H: Ingestion Design Q&A

Key decisions from the team design review:

Pre-flight estimates required, explicit acknowledgement before download
Partial success first-class — per-file status, don't abort whole job
Auto-retry 2–3× with backoff; distinguish transient vs source errors
Cohort editing post-ingestion supported; incremental re-ingestion without re-download
Group reassignment possible without re-downloading
Background jobs, async notification; all progress info (%, count, ETA)
6+ hour downloads acceptable; always offer free tier
Human-readable errors only; never raw stack traces
Re-runs with different parameters tracked distinctly

Full transcript: v4.0 Appendix C.
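
The retry and partial-success decisions above could look roughly like the following. This is a sketch, not the orchestrator's code: the exception classes, the `fetch` callable, the status strings other than the per-file idea itself, and the backoff parameters are all assumptions.

```python
import time


class TransientError(Exception):
    """Network hiccup or timeout: worth retrying."""


class SourceError(Exception):
    """The source cannot serve this accession: retrying will not help."""


def download_with_retry(fetch, accession, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Download one accession, retrying only transient failures with backoff."""
    for attempt in range(max_retries + 1):
        try:
            return {"accession": accession, "status": "completed", "result": fetch(accession)}
        except SourceError as exc:
            # Source errors fail fast; the message is kept human-readable.
            return {"accession": accession, "status": "failed", "error": str(exc)}
        except TransientError as exc:
            if attempt == max_retries:
                return {"accession": accession, "status": "failed", "error": str(exc)}
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


def run_job(fetch, accessions, **retry_options):
    """Process every accession; one failure never aborts the whole job."""
    files = [download_with_retry(fetch, a, **retry_options) for a in accessions]
    ok = sum(f["status"] == "completed" for f in files)
    status = "completed" if ok == len(files) else "partial" if ok else "failed"
    return {"status": status, "files": files}
```

Passing `sleep` as a parameter keeps the backoff testable without real delays, and the per-file records map naturally onto rows of ingestion_files.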

Appendix I: RO-Crate Metadata Structure

{
  "@context": "https://w3id.org/ro/crate/1.2/context",
  "@graph": [
    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2"},
      "about": {"@id": "./"}
    },
    {
      "@type": "Dataset",
      "@id": "./",
      "name": "{user-provided}",
      "description": "{user-provided}",
      "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
      "datePublished": "2026-07-01",
      "author": {"@id": "#author-1"},
      "hasPart": [
        {"@id": "notes/{note_id}.jsonl"},
        {"@id": "datasets/{drs_id}/{filename}"},
        {"@id": "runs/{run_id}/outputs/"},
        {"@id": "workflows/{trs_id}/"}
      ]
    },
    {
      "@type": "Person",
      "@id": "#author-1",
      "name": "{from Keycloak}",
      "identifier": "https://orcid.org/..."
    }
  ]
}

Output ZIP structure:

data-artefact-{date}-{id}.zip
  ├── ro-crate-metadata.json
  ├── notes/
  ├── datasets/
  │     └── {drs_object_id}/{filename}
  ├── runs/
  │     └── {run_id}/outputs/
  └── workflows/
        └── {trs_tool_id}/
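
A small sketch of the packaging step, assuming the resources have already been resolved to bytes. The function name and arguments are hypothetical, and the ZIP is built in memory only to keep the example short; with a 50 GB cap the real module would need to stream to disk or object storage. The sketch also adds one File entity per packaged file, which goes slightly beyond the metadata listing above.

```python
import io
import json
import zipfile


def build_crate(name, description, author_name, date_published, files):
    """Return ZIP bytes containing ro-crate-metadata.json plus the given files.

    `files` maps a path inside the ZIP (e.g. "notes/n1.jsonl") to its bytes.
    """
    metadata = {
        "@context": "https://w3id.org/ro/crate/1.2/context",
        "@graph": [
            {
                "@type": "CreativeWork",
                "@id": "ro-crate-metadata.json",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2"},
                "about": {"@id": "./"},
            },
            {
                "@type": "Dataset",
                "@id": "./",
                "name": name,
                "description": description,
                "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
                "datePublished": date_published,
                "author": {"@id": "#author-1"},
                "hasPart": [{"@id": path} for path in files],
            },
            {"@type": "Person", "@id": "#author-1", "name": author_name},
            # One data entity per packaged file (an addition in this sketch).
            *({"@type": "File", "@id": path} for path in files),
        ],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("ro-crate-metadata.json", json.dumps(metadata, indent=2))
        for path, content in files.items():
            zf.writestr(path, content)
    return buffer.getvalue()
```

Writing the metadata file first means a consumer can read the manifest without scanning the full archive listing.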

Appendix J: Methodology Decision Matrix

methodology_matrix:
  bulk_rnaseq:
    counts:
      - condition: "n_min >= 3"
        primary: DESeq2
        alternative: edgeR
      - condition: "n_min < 3"
        primary: limma-voom
        alternative: edgeR
  # Feature selection follows the same pattern:
  # feature_selection:
  #   high_dimensional:
  #     - condition: "n_features > 1000"
  #       primary: MRMR
  #       alternative: Boruta
  #     - condition: "n_features <= 1000"
  #       primary: SHAP
  #       alternative: Boruta

Conditions are evaluated by a constrained recursive-descent parser. Accepted tokens: whitelisted variable names (n_min, n_total, n_group_a, n_group_b, n_features), numeric literals, comparison operators (<, >, <=, >=, ==). eval() is prohibited. Unparseable conditions are rejected at config load time.
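
A minimal sketch of such a parser follows. It assumes the grammar is a single comparison of the form `operand operator operand`, which is all the token list above strictly implies; the actual parser may accept more. Function names and the `parsed` key are hypothetical.

```python
import operator
import re

ALLOWED_VARS = frozenset({"n_min", "n_total", "n_group_a", "n_group_b", "n_features"})
OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
       ">=": operator.ge, "==": operator.eq}
TOKEN = re.compile(
    r"\s*(?:(?P<num>\d+(?:\.\d+)?)|(?P<var>[A-Za-z_]\w*)|(?P<op><=|>=|==|<|>))"
)


def tokenize(text):
    """Split a condition into (kind, value) tokens; reject anything else."""
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character at position {pos} in {text!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def _operand(token):
    kind, value = token
    if kind == "num":
        return float(value)
    if kind == "var" and value in ALLOWED_VARS:
        return value
    raise ValueError(f"operand not allowed: {value!r}")


def parse_condition(text):
    """Parse at config load time; raises ValueError on anything unparseable."""
    tokens = tokenize(text)
    if len(tokens) != 3 or tokens[1][0] != "op":
        raise ValueError(f"expected '<operand> <operator> <operand>': {text!r}")
    return _operand(tokens[0]), tokens[1][1], _operand(tokens[2])


def evaluate(parsed, variables):
    """Evaluate a parsed condition against measured cohort variables."""
    left, op, right = parsed
    resolve = lambda x: variables[x] if isinstance(x, str) else x
    return OPS[op](resolve(left), resolve(right))


def select_method(rules, variables):
    """Return the primary method of the first rule whose condition holds."""
    for rule in rules:
        if evaluate(rule["parsed"], variables):
            return rule["primary"]
    return None
```

Because parsing happens once at load time, a malformed or malicious condition string never reaches evaluation, and no code path calls eval().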
