Platform Roadmap
PacificAnalytics/unified-roundtrip-roadmap
PA Platform Roadmap v5.2
Status: Version 5.2 — March 2026
Audience: Leadership, engineering leads, procurement reviewers
Supersedes: All prior Roadmap versions
Executive Summary
What We Are Building
The PA platform is a sovereign operating system for scientific discovery. It connects data ingestion, workflow execution, AI-driven analysis, and research publishing into a single integrated pipeline — enabling biomedical researchers to move from a research question to a statistically validated, citable result without leaving the platform.
Product Strategy
Two deployment models, sequenced by market readiness:
- PA Cloud — Hosted on trusted European cloud providers (Exoscale/SWITCH in Switzerland, Nebius in the Netherlands, Genesis Cloud in Germany, Scaleway in France). Targets individual researchers and labs. Sovereignty through choice of national cloud provider. Primary development focus for the first 6 months (M1–M3).
- PA On-Prem — Sovereign deployment on institutional infrastructure. Full compliance and data security scope. From M4 onward, every feature ships for both Cloud and On-Prem simultaneously; what differs is the deployment modality, compliance posture, and governance layer.
Platform Verticals
The platform is organised around five verticals (see Atlas v4.0 for full strategic context). Each vertical builds on the prior:
| Vertical | Description | First Appears |
|---|---|---|
| V1 Biomedical Research OS | Ingest → Execute → Analyse → Publish | M1 (Alpha) |
| V2 Project Presence | Persistent AI research companion | M3 (foundation) |
| V3 Sovereign Enterprise Suite | Secure productivity (files, chat, docs, video) | M4 (foundation) |
| V4 Government OS | Federated analytics, compliance, multi-jurisdiction | M5 (foundation) |
| V5 DeSci Marketplace | On-chain research IP, data exchange protocol | M6 (foundation) |
Development Principle
Every capability ships in its most basic viable form at the milestone where it first appears. Refinement is driven by user feedback, not speculative feature completeness. This applies across all verticals and both products. The scope of M4–M6 is intentionally higher-level — it will be shaped by what we learn from real users in M1–M3.
Milestones (Bimonthly, 12 Months)
| Milestone | Target | Primary Product | Vertical Focus |
|---|---|---|---|
| M1 Alpha | April 6, 2026 | Cloud | V1 foundation |
| M2 Beta-1 | June 1, 2026 | Cloud | V1 full pipeline |
| M3 Beta-2 | August 3, 2026 | Cloud | V1 hardening + V2 foundation |
| M4 | October 6, 2026 | Cloud + On-Prem | V2 companion + V3 foundation |
| M5 | December 1, 2026 | Cloud + On-Prem | V3 productivity + V4 foundation |
| M6 V1 Full | March 2, 2027 | Cloud + On-Prem | V4 government + V5 foundation |
Team: 6 (4 backend, 1 frontend, 1 DevOps) + bioinformatics part-time through M3. Seed funding and team expansion to 10–14 happens in October 2026; the larger team is available from M4 onward.
Key Risks
- Ares DRS readiness — Multiple services depend on it; status is "under discussion." A PostgreSQL-backed DRS shim is the fallback if this is unresolved by the end of the pre-sprint.
- LLM-generated analysis correctness — Plausible but wrong results are worse than crashes. Three-stage validation framework defined.
- Manual S3 URI entry UX — Error-prone. File picker and client-side validator mitigate. DRS URI resolution promoted if it becomes a top-3 adoption blocker.
1. Vision and Strategy
1.1 The Research Pipeline
The platform enables researchers to move from a research question to a statistically validated answer through an integrated pipeline, then publish that work as a citable, portable data artefact. Every file at every stage is a DRS object. Publishing is the exit point.
1.2 The Workspace Model
The platform is a workspace, not a repository. Metadata burden does not fall on the researcher during active work. Files are ingested without annotation, cohorts are assembled without rigid labels, workflows run without provenance overhead. When it is time to publish, the Notes ELN entries are the primary narrative thread; datasets, runs, and results are the supporting evidence.
1.3 Labels as Soft Associations
Metadata labels (cohort_arm, contrast_label, assay_type) are soft associations on DRS objects, not structural constraints. The same dataset can participate in multiple contrasts without duplication. Labels are resolved at runtime by the Cohorts Service, not baked in at ingestion. This avoids data redundancy, supports overlapping experimental designs, and keeps ingestion decoupled from analysis context.
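To make the soft-association idea concrete, here is a minimal Python sketch. `LabelIndex` is a hypothetical in-memory stand-in, not the Cohorts Service API. It only shows that labels sit beside DRS objects and that a contrast is resolved when queried, so the same object can appear in two contrasts without being copied.

```python
from collections import defaultdict


class LabelIndex:
    """Hypothetical in-memory stand-in for soft labels on DRS objects.

    Labels are stored beside the objects (object_id -> key -> set of values),
    so one dataset can carry several contrast labels without being duplicated.
    """

    def __init__(self) -> None:
        self._labels = defaultdict(lambda: defaultdict(set))

    def tag(self, object_id: str, key: str, value: str) -> None:
        self._labels[object_id][key].add(value)

    def resolve(self, key: str, value: str) -> list:
        # Resolved at query time; nothing about the contrast is written into the object.
        return sorted(oid for oid, kv in self._labels.items() if value in kv.get(key, set()))


idx = LabelIndex()
idx.tag("obj-1", "contrast_label", "traveller_vs_local")
idx.tag("obj-1", "contrast_label", "resistant_vs_susceptible")
idx.tag("obj-2", "contrast_label", "traveller_vs_local")
idx.tag("obj-2", "cohort_arm", "local")

traveller_contrast = idx.resolve("contrast_label", "traveller_vs_local")          # ['obj-1', 'obj-2']
resistance_contrast = idx.resolve("contrast_label", "resistant_vs_susceptible")   # ['obj-1']
```

Here `obj-1` participates in both contrasts while existing once; adding a third contrast is another `tag` call, not a re-ingestion.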
1.4 Existing Foundation
The platform builds on a substantial existing base: the Medical Data Index (MDI) with billions of curated SRA metadata points; 1,600+ catalogued nf-core modules across 10 biological domains; a Notes service providing an immutable, cryptographically chained Electronic Lab Notebook; a multi-agent AI system with gateway routing, hypothesis generation, policy analysis, and natural-language database access; three-tier multi-tenancy with OPA enforcement; zero-trust networking with Istio ambient mTLS; four deployment tiers including ephemeral preview environments; and a white-labelled frontend with 9 feature modules. The full service inventory is in Appendix D.
1.5 Anchor Use Case: MRSA Surveillance
A researcher tests whether MRSA strains from international pilgrimage travellers are genetically distinct from locally-acquired strains:
- Query MDI for Staphylococcus aureus WGS data filtered by traveller status.
- Assemble two cohort groups via the Cohort Builder.
- Ingest relevant SRR accessions; files register as DRS objects.
- Assemble a samplesheet in the in-platform editor and submit an nf-core WGS pipeline via Metis WES.
- The analysis agent runs a cohort-level SNP comparison in a sandboxed container.
- Record decisions in the Notes ELN throughout.
- Publish cohort, run, results, and notes as an RO-Crate.
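The final publishing step can be sketched with the Python standard library alone. This is a minimal illustration, not the publishing module's implementation: `build_ro_crate` and `zip_crate` are hypothetical names, the title and licence are placeholders, and the RO-Crate 1.2 context URL is assumed to follow the same w3id pattern as earlier releases. A real crate would also describe the cohort, run, and result entities in more detail.

```python
import json
import tempfile
import zipfile
from pathlib import Path

RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.2/context"
RO_CRATE_SPEC = "https://w3id.org/ro/crate/1.2"


def build_ro_crate(root: Path, name: str, description: str, date_published: str,
                   license_url: str, files: list) -> dict:
    """Write ro-crate-metadata.json describing `files` (paths relative to `root`)."""
    graph = [
        {   # metadata descriptor: states which spec this file conforms to
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": RO_CRATE_SPEC},
            "about": {"@id": "./"},
        },
        {   # root data entity: the published artefact as a whole
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_url},
            "hasPart": [{"@id": f} for f in files],
        },
    ]
    graph += [{"@id": f, "@type": "File", "name": Path(f).name} for f in files]
    crate = {"@context": RO_CRATE_CONTEXT, "@graph": graph}
    (root / "ro-crate-metadata.json").write_text(json.dumps(crate, indent=2))
    return crate


def zip_crate(root: Path, out_zip: Path) -> None:
    """Package every file under `root` (metadata + payload) into one ZIP."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(root).as_posix())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "crate"
    root.mkdir()
    (root / "notes.md").write_text("# ELN export\n")
    crate = build_ro_crate(root, "MRSA traveller study", "Illustrative crate",
                           "2026-06-01", "https://spdx.org/licenses/CC-BY-4.0", ["notes.md"])
    zip_crate(root, Path(tmp) / "crate.zip")
    with zipfile.ZipFile(Path(tmp) / "crate.zip") as zf:
        packaged = sorted(zf.namelist())
```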
2. Milestones
2.0 Milestone 0: Resolve Pre-Sprint Blockers
| # | Blocker | Action | Owner | Deadline |
|---|---|---|---|---|
| B1 | Ares DRS readiness | Confirm operational by M1 Week 2, or commit to PostgreSQL-backed DRS shim | Architecture lead | Pre-sprint |
| B2 | Notes chain export | Confirm GET /notes/:id/chain is implemented; if not, estimate effort and assign | Notes team | Pre-sprint |
| B3 | API contracts for samplesheet editor | Agree endpoint shapes C-1 through C-5 (see Anurag's Researcher Workflow epic) | Anurag + backend lead | M2 Week 1 |
| B4 | Alpha hard deadline | April 6, 2026 — deployed and accessible to invited external users | Boris | Confirmed |
2.1 M1: Alpha — April 6, 2026 (~6 Weeks)
Vertical: V1 Biomedical Research OS (foundation)
Product: PA Cloud
Goal: A researcher can ingest public data, run a Nextflow pipeline, and see registered outputs. Minimum viable proof that the pipeline works end-to-end.
No compliance work in M1. The cloud deployment targets European providers where infrastructure-level security is handled by the hosting provider.
Scope
In:
- SRA data ingestion — FASTQ download (AWS Open Data, SRA Toolkit fallback), Upgate upload, DRS registration with source metadata, pre-flight size estimation with user acknowledgement, partial job status (per-file tracking, auto-retry with backoff), controlled-access flagging
- nf-core DSL2 workflow execution via Metis WES on Kubernetes — output DRS registration via direct path (batch POST /objects), platform-controlled --outdir
- RabbitMQ event topology — pipeline_events exchange, ingestion.complete and workflow.complete routing
- Frontend — ingestion UI (accession input, pre-flight, per-file progress), workflow selector (TRS query), dataset browser with copyable S3 URIs and "Copy all" bulk action, config upload with client-side S3 URI validator, run status dashboard with 10–15s polling, result browser with download links
- Cloud deployment on European provider (Exoscale or equivalent), accessible to invited users
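Per-file tracking with auto-retry could look roughly like the Python sketch below. It assumes transient download failures surface as `OSError` (which includes `ConnectionError`) and uses exponential backoff with full jitter; `run_job`, `FileStatus`, and the accession IDs are hypothetical, and `download` stands in for the AWS Open Data / SRA Toolkit fetch.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FileStatus:
    accession: str
    state: str = "PENDING"        # PENDING -> COMPLETE or FAILED
    attempts: int = 0
    error: Optional[str] = None


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def run_job(accessions: list, download: Callable[[str], None],
            max_attempts: int = 4, sleep: Callable[[float], None] = time.sleep) -> dict:
    """Track and retry each file independently, so one bad file does not fail the job."""
    statuses = {acc: FileStatus(acc) for acc in accessions}
    for acc, status in statuses.items():
        while status.attempts < max_attempts:
            status.attempts += 1
            try:
                download(acc)
                status.state, status.error = "COMPLETE", None
                break
            except OSError as exc:  # assumption: transient failures surface as OSError
                status.error = str(exc)
                if status.attempts < max_attempts:
                    sleep(backoff_delay(status.attempts - 1))
        else:
            status.state = "FAILED"  # all attempts used; surfaced per file in the UI
    return statuses


# Usage with a fake downloader: SRR_A succeeds on the 3rd try, SRR_C never succeeds.
calls = {"SRR_A": 0}


def flaky_download(accession: str) -> None:
    if accession == "SRR_A":
        calls[accession] += 1
        if calls[accession] < 3:
            raise ConnectionError("read timeout")
    elif accession == "SRR_C":
        raise ConnectionError("not found on mirror")


job = run_job(["SRR_A", "SRR_B", "SRR_C"], flaky_download, sleep=lambda s: None)
```

The job ends partially successful: two files complete and one is reported as failed with its last error, which is what the per-file progress view needs to display.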
Out (deferred to M2+):
- Analysis agent, publishing module, samplesheet editor, oCIS, accounting, compliance/Wazuh, Agentic OS
Timeline
| Week | Backend (Ingestion, 2 engineers) | Backend (Execution, 2 engineers) | Frontend | DevOps |
|---|---|---|---|---|
| 1 | Orchestrator scaffold (FastAPI, pa-auth, CNPG migrations, Redis heartbeat) | Nextflow gRPC plugin — TRS/DRS resolution, DSL2 param separation | — | Cloud environment provisioning |
| 1–2 | SRA download manager (accession resolution via MDI, dedup, AWS Open Data + fallback) | Nextflow plugin — --outdir enforcement. Target: "Hello World" acceptance | Ingestion UI: accession input, pre-flight, progress | K8s namespace, ServiceAccount, RBAC |
| 2–3 | Upgate integration, DRS registration, pre-flight estimation | trs-syncer nf-core plugin, trs-cache-filer validation | Dataset browser with S3 URIs, bulk copy | RabbitMQ exchange + bindings |
| 3–4 | RabbitMQ events, partial job status, controlled-access flagging | S3 provisioning + DRS output registration (batch). WES endpoint stabilisation. | Workflow selector, config upload with URI validator | OTEL instrumentation (Go + Python) |
| 5 | TRS /tests for top 20 modules | TES engine (trusted + sandboxed profiles), K8sJobWatcher | Run status dashboard, result browser | — |
| 6 | Integration testing, bug fixes | End-to-end demo on real SRA data | UX polish | Deploy to cloud |
Critical-Path Tasks
Nextflow gRPC Plugin for Metis
- TRS URI resolution (workflow repo) and DRS URI resolution (top-level config files submitted via WES)
- DSL2 parameter separation: workflow_params → -params-file, resource overrides → -c
- Platform-controlled --outdir enforcement (CLI flags override user-provided outdir)
- Does not parse contents of user-provided params or sample sheets for embedded DRS URIs
- Acceptance: "Hello World" module runs via POST /runs → successful k8s pod execution
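The parameter separation above can be illustrated by the `nextflow run` argument list the plugin would need to produce. The Python sketch below shows only the argument layout, whatever language the plugin is actually written in. `build_nextflow_cmd`, the bucket path, and the pinned revision are illustrative. It relies on Nextflow giving command-line `--param` values precedence over `-params-file`, which is what makes the platform's `--outdir` win.

```python
import json
import tempfile
from pathlib import Path


def build_nextflow_cmd(pipeline: str, revision: str, workflow_params: dict,
                       resource_config: str, outdir: str, workdir: Path) -> list:
    """Assemble a `nextflow run` argv that keeps params and resources separate.

    - workflow_params are written verbatim to a JSON file for -params-file
      (values are not scanned for embedded DRS URIs);
    - resource overrides are written to a config file passed with -c;
    - --outdir is set on the command line. Nextflow gives command-line params
      precedence over -params-file, so a user-supplied outdir is overridden.
    """
    params_path = workdir / "params.json"
    params_path.write_text(json.dumps(workflow_params))
    config_path = workdir / "resources.config"
    config_path.write_text(resource_config)
    return [
        "nextflow", "run", pipeline,
        "-r", revision,
        "-params-file", str(params_path),
        "-c", str(config_path),
        "--outdir", outdir,
    ]


with tempfile.TemporaryDirectory() as tmp:
    cmd = build_nextflow_cmd(
        pipeline="nf-core/rnaseq",
        revision="3.14.0",
        workflow_params={"input": "samplesheet.csv", "outdir": "s3://user-chosen/elsewhere"},
        resource_config="process {\n  cpus = 4\n}\n",
        outdir="s3://pa-project-bucket/tenant-a/mrsa/runs/run-42/",
        workdir=Path(tmp),
    )
    user_params = json.loads((Path(tmp) / "params.json").read_text())
```

The user's params file is left untouched (it still contains their `outdir`). The override happens at launch time, which keeps the "does not parse user params" guarantee intact.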
Project-Scoped S3 Provisioning + DRS Output Registration
- Generate S3 output paths from OPA project context
- Post-execution: recursive --outdir parse, compute sizes + MD5 checksums, batch POST /objects to Ares
- Acceptance: Workflow outputs in correct S3 location AND registered in DRS
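A rough Python sketch of the post-execution step follows. Field names follow the GA4GH DRS object schema (`size`, `created_time`, `checksums`, `access_methods`). The S3 layout from `output_prefix` and the exact envelope Ares accepts for batch `POST /objects` are assumptions. The sketch walks a local directory as a stand-in for listing the S3 `--outdir`.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def output_prefix(bucket: str, tenant: str, project: str, run_id: str) -> str:
    """Hypothetical project-scoped output layout derived from the OPA project context."""
    return f"s3://{bucket}/{tenant}/{project}/runs/{run_id}/"


def md5_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so large outputs are not read into memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_drs_batch(outdir: Path, s3_prefix: str) -> list:
    """Walk --outdir recursively; one DRS-style object per file, ready for a batch POST."""
    created = datetime.now(timezone.utc).isoformat()
    batch = []
    for path in sorted(p for p in outdir.rglob("*") if p.is_file()):
        rel = path.relative_to(outdir).as_posix()
        batch.append({
            "name": rel,
            "size": path.stat().st_size,
            "created_time": created,
            "checksums": [{"type": "md5", "checksum": md5_file(path)}],
            "access_methods": [{"type": "s3", "access_url": {"url": s3_prefix + rel}}],
        })
    return batch


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "multiqc").mkdir()
    (out / "multiqc" / "report.html").write_bytes(b"<html></html>")
    (out / "counts.tsv").write_bytes(b"gene\tcount\n")
    batch = build_drs_batch(out, output_prefix("pa-data", "tenant-a", "mrsa", "run-42"))
```

Building the whole list first and sending it in one request matches the "batch POST" design: a run with hundreds of outputs costs one registration call rather than hundreds.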
WES Run Management Endpoint Stabilisation
- Cursor-based pagination, correct lifecycle states (QUEUED, RUNNING, COMPLETE, SYSTEM_ERROR)
- CancelRun → explicit 501 Not Implemented
- Acceptance: GET /runs/{id} reflects correct state
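The lifecycle and pagination requirements can be sketched as follows. The transition table is an assumption covering only the four states listed above. `list_runs` mirrors the WES `page_size` / `page_token` / `next_page_token` convention but returns bare run IDs rather than full run-status objects, for brevity.

```python
import base64
import json
from enum import Enum
from typing import Optional


class RunState(str, Enum):
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    COMPLETE = "COMPLETE"
    SYSTEM_ERROR = "SYSTEM_ERROR"


# Assumed legal transitions for the M1 subset of WES states; terminal states have none.
TRANSITIONS = {
    RunState.QUEUED: {RunState.RUNNING, RunState.SYSTEM_ERROR},
    RunState.RUNNING: {RunState.COMPLETE, RunState.SYSTEM_ERROR},
    RunState.COMPLETE: set(),
    RunState.SYSTEM_ERROR: set(),
}


def advance(current: RunState, new: RunState) -> RunState:
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


def list_runs(run_ids: list, page_size: int, page_token: Optional[str] = None) -> dict:
    """Keyset (cursor) pagination: the opaque token encodes the last run ID returned."""
    ordered = sorted(run_ids)
    start = 0
    if page_token:
        after = json.loads(base64.urlsafe_b64decode(page_token))["after"]
        start = next((i for i, r in enumerate(ordered) if r > after), len(ordered))
    page = ordered[start:start + page_size]
    has_more = start + page_size < len(ordered)
    token = ""
    if has_more:
        token = base64.urlsafe_b64encode(json.dumps({"after": page[-1]}).encode()).decode()
    return {"runs": page, "next_page_token": token}


runs = ["run-01", "run-02", "run-03", "run-04", "run-05"]
first = list_runs(runs, page_size=2)
second = list_runs(runs, page_size=2, page_token=first["next_page_token"])
third = list_runs(runs, page_size=2, page_token=second["next_page_token"])
```

A cursor keyed on the last ID does not shift when new runs are inserted between requests, so clients never see the same run twice, unlike offset-based paging.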
Exit Criteria
- Ingest 2+ SRA accessions → DRS objects in Ares (or shim)
- nf-core "Hello World" runs → k8s pod completes → outputs registered as DRS objects
- nf-core/rnaseq runs on minimal dataset with samplesheet
- Platform deployed to cloud and accessible to invited external users
2.2 M2: Beta-1 — June 1, 2026 (~8 Weeks)
Vertical: V1 Biomedical Research OS (full pipeline)
Product: PA Cloud
Goal: The full research pipeline works end-to-end: Ingest → Execute → Analyse → Publish. The samplesheet editor closes the cohort-to-pipeline gap. oCIS workspace is live.
Scope (Adds to M1)
Analysis and Publishing:
- pa-analysis-agent — methodology selection (DESeq2, edgeR, limma-voom via constrained decision matrix), LLM code generation (Python and R), three-stage validation framework (pre-execution, post-execution, plausibility)
- Feature selection (MRMR, Boruta, SHAP) — same infrastructure as DESeq2/edgeR: new rows in the decision matrix + new validation rules, no new services
- LLM Sandbox Phases 1–2 (per Khoa's epic): pa-sandbox-mcp-server wrapping llm-sandbox, pre-built images (pa-bio:3.11, bioconductor:3.18), K8s namespace + NetworkPolicy, pa-db-mcp-server (read-only PostgreSQL, OPA gating), pa-storage-mcp-server (DRS resolution, presigned URLs), self-healing debugger (basic)
- Publishing module — minimal metadata form, resource selection (notes primary), DRS URI resolution for packaging, RO-Crate 1.2 generation, ZIP to MinIO, background task with per-item tracking
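A constrained decision matrix can be as simple as a table of rows plus a parser that rejects any LLM answer outside the permitted set. The Python sketch below illustrates the idea only: the row fields, the replicate thresholds (placeholders, not statistical recommendations), and the function names are all hypothetical. Feature selection appears purely as extra rows, matching the "no new services" note above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatrixRow:
    task: str            # "differential_expression" or "feature_selection"
    data_type: str       # "counts" or "expression_matrix"
    min_replicates: int  # placeholder threshold, not a statistical recommendation
    method: str


# Illustrative rows. Feature selection is just more rows, no new service.
DECISION_MATRIX = [
    MatrixRow("differential_expression", "counts", 2, "DESeq2"),
    MatrixRow("differential_expression", "counts", 2, "edgeR"),
    MatrixRow("differential_expression", "counts", 3, "limma-voom"),
    MatrixRow("feature_selection", "expression_matrix", 10, "MRMR"),
    MatrixRow("feature_selection", "expression_matrix", 10, "Boruta"),
]


def candidates(task: str, data_type: str, replicates_per_group: int) -> list:
    """Methods the matrix permits for this design; the LLM may only choose among these."""
    return [row.method for row in DECISION_MATRIX
            if row.task == task and row.data_type == data_type
            and replicates_per_group >= row.min_replicates]


def parse_choice(llm_output: str, allowed: list) -> str:
    """Constrained parser: the first non-empty line must name an allowed method exactly."""
    lines = [line.strip() for line in llm_output.splitlines() if line.strip()]
    choice = lines[0] if lines else ""
    if choice not in allowed:
        raise ValueError(f"LLM proposed {choice!r}; allowed: {allowed}")
    return choice


allowed = candidates("differential_expression", "counts", replicates_per_group=2)
picked = parse_choice("edgeR\nRationale: small design, count data.", allowed)
```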
Samplesheet Editor (per Anurag's Researcher Workflow epic):
- Cohort export → in-platform spreadsheet editor (AG Grid or equivalent)
- File path and metadata picker panel integrated into editor
- Save samplesheet as project file (DRS-registered, reusable)
- Pipeline selection + "Save & Run" submission from editor toolbar
- Run list with polling, run detail with human-readable error translation, output browser
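The cohort-export step behind the editor might look like the following Python sketch. It assumes the nf-core/rnaseq samplesheet layout (`sample`, `fastq_1`, `fastq_2`, `strandedness`), which is also the M1 exit-criteria pipeline; other workflows need their own column mapping (Q7). It fails with a readable message on missing values, the MVP behaviour described in R9. `cohort_to_samplesheet` and the S3 paths are hypothetical.

```python
import csv
import io

# Column layout used by nf-core/rnaseq samplesheets; other pipelines differ (see Q7).
COLUMNS = ("sample", "fastq_1", "fastq_2", "strandedness")
REQUIRED = ("sample", "fastq_1", "strandedness")  # fastq_2 may be empty for single-end


class SamplesheetError(ValueError):
    """Raised with a readable message instead of letting the pipeline fail later (R9)."""


def cohort_to_samplesheet(rows: list) -> str:
    problems = [f"row {i}: missing '{col}'"
                for i, row in enumerate(rows, start=1)
                for col in REQUIRED if not row.get(col)]
    if problems:
        raise SamplesheetError("; ".join(problems))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Cohort labels such as cohort_arm stay in the platform; only pipeline columns are written.
        writer.writerow({col: row.get(col, "") for col in COLUMNS})
    return buf.getvalue()


export = [
    {"sample": "TRAVELLER_1", "fastq_1": "s3://pa-data/a_R1.fastq.gz",
     "fastq_2": "s3://pa-data/a_R2.fastq.gz", "strandedness": "auto", "cohort_arm": "traveller"},
    {"sample": "LOCAL_1", "fastq_1": "s3://pa-data/b_R1.fastq.gz",
     "strandedness": "auto", "cohort_arm": "local"},
]
sheet = cohort_to_samplesheet(export)
```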
oCIS Workspace:
- oCIS v8 with s3ng driver on MinIO, Keycloak SSO, OPA access control
- Project Spaces model: each project maps to an oCIS Space
- Basic file browsing, sharing, and organisation through web UI
- Not on the pipeline critical path but required for the workspace experience
Metadata Module (per Samyak's epic, partial):
- Workstream 1 (begins): unconstrained SRA fetcher with checkpointing, Postgres loader optimisation
- Workstream 2: dynamic query builder, filter API endpoints
- Workstreams 3–4 (Contrast data model, Ares integration) are deferred until the user flow is clearer after M3. The distinction between intra- and inter-cohort contrasts needs further design; saving sub-cohorts (original filters plus additional filters) may be sufficient. Ares integration is also deferred pending readiness confirmation.
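A dynamic query builder, and the "sub-cohort = parent filters plus extra filters" option, can be sketched in Python as below. Column names are hypothetical and whitelisted, and all values are bound as parameters. The demo uses `sqlite3` so it runs anywhere. Against PostgreSQL, a driver such as psycopg uses `%s` placeholders instead of `?`.

```python
import sqlite3

FILTERABLE = {"organism", "assay_type", "traveller_status", "collection_year"}  # hypothetical columns
OPERATORS = {"eq": "=", "gte": ">=", "lte": "<="}


def build_where(filters: list) -> tuple:
    """Turn (column, op, value) filters into a parameterised WHERE clause.

    Column names come only from a whitelist and values are always bound as
    parameters, so user input never reaches the SQL text.
    """
    clauses, params = [], []
    for column, op, value in filters:
        if column not in FILTERABLE:
            raise ValueError(f"unknown filter column {column!r}")
        if op == "in":
            if not value:
                raise ValueError("'in' filter needs at least one value")
            clauses.append(f"{column} IN ({', '.join('?' for _ in value)})")
            params.extend(value)
        elif op in OPERATORS:
            clauses.append(f"{column} {OPERATORS[op]} ?")
            params.append(value)
        else:
            raise ValueError(f"unknown operator {op!r}")
    return (" AND ".join(clauses) or "1=1"), params


def sub_cohort(parent_filters: list, extra_filters: list) -> list:
    """A saved sub-cohort is the parent's filters plus additional ones."""
    return list(parent_filters) + list(extra_filters)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sra_runs (accession TEXT, organism TEXT, assay_type TEXT,"
             " traveller_status TEXT, collection_year INTEGER)")
conn.executemany("INSERT INTO sra_runs VALUES (?, ?, ?, ?, ?)", [
    ("RUN1", "Staphylococcus aureus", "WGS", "pilgrim", 2019),
    ("RUN2", "Staphylococcus aureus", "WGS", "local", 2020),
    ("RUN3", "Escherichia coli", "WGS", "pilgrim", 2021),
])

parent = [("organism", "eq", "Staphylococcus aureus"), ("assay_type", "eq", "WGS")]
child = sub_cohort(parent, [("traveller_status", "in", ["pilgrim"])])


def accessions(filters: list) -> list:
    where, params = build_where(filters)
    rows = conn.execute(f"SELECT accession FROM sra_runs WHERE {where} ORDER BY accession", params)
    return [r[0] for r in rows]
```

Because a sub-cohort is only a longer filter list, it can be saved and re-run without copying any data, which is why it may be enough to cover the contrast use case.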
Timeline
| Weeks | Backend (Analysis) | Backend (Platform) | Frontend | DevOps |
|---|---|---|---|---|
| 7–8 | Agent scaffold. Methodology selector + constrained parser. Code generator. | LLM Sandbox Phase 1 (spike, images, pa-sandbox-mcp-server, NetworkPolicy). | Samplesheet editor (AG Grid eval, edit/add/delete). Cohort export button. | oCIS deployment (s3ng, Keycloak, OPA). |
| 9–10 | Validation framework (3 stages). Sandboxed TES execution. RabbitMQ wiring. | LLM Sandbox Phase 2 (pa-db-mcp-server, pa-storage-mcp-server, MCP auth). | File path picker with metadata panel. Save samplesheet. | — |
| 11–12 | Publishing module (form, selection, DRS resolution, RO-Crate, ZIP). | Self-healing debugger. E2E integration test (agent → MCP → sandbox → S3 → DB). | Pipeline catalogue. "Save & Run." Run list + detail + error translation. | — |
| 13–14 | Feature selection (matrix rows + validation rules). Golden file CI tests. | Metadata Module WS1 begin (fetcher, checkpointing). WS2 (query builder, filter API). | Output browser. oCIS frontend integration. | Postgres loader optimisation. |
Exit Criteria
- Full pipeline: Ingest → Execute → Analyse → Publish on MRSA anchor use case
- DESeq2 sandbox → results registered + validation passes
- Validation framework catches all known-bad test cases (golden file: pasilla dataset, Spearman > 0.9)
- Publish job → valid RO-Crate ZIP with correct metadata
- Samplesheet: export cohort → edit → insert paths via picker → Save & Run → pipeline executes
- oCIS: file browsing, sharing, and project Spaces operational
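The golden-file gate is a rank-correlation check. Below is a minimal pure-Python sketch, assuming the golden file stores one statistic per gene (here a log2 fold change) from a trusted reference run on the pasilla dataset. The gene values in the usage are made up. In practice `scipy.stats.spearmanr` computes the same coefficient.

```python
import math


def average_ranks(values: list) -> list:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list, y: list) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for a constant vector; fails the gate below
    return cov / (sx * sy)


def golden_file_check(golden: dict, produced: dict, threshold: float = 0.9) -> bool:
    """Pass if the per-gene statistic agrees in rank with the stored golden file."""
    genes = sorted(golden.keys() & produced.keys())
    if len(genes) < 3:
        return False
    rho = spearman([golden[g] for g in genes], [produced[g] for g in genes])
    return rho > threshold


golden = {"g1": 2.1, "g2": -1.3, "g3": 0.4, "g4": 3.0, "g5": -0.2}
rerun = {"g1": 2.0, "g2": -1.1, "g3": 0.5, "g4": 2.8, "g5": -0.3}     # same ordering: passes
shuffled = {"g1": -1.3, "g2": 3.0, "g3": 2.1, "g4": -0.2, "g5": 0.4}  # scrambled: fails
```

Comparing ranks rather than raw values tolerates small numerical drift between tool versions while still catching results whose gene ordering has changed.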
2.3 M3: Beta-2 — August 3, 2026 (~8 Weeks)
Verticals: V1 hardening + V2 Project Presence (foundation)
Product: PA Cloud (On-Prem preparation begins)
Goal: Platform ready for pilot UAT. Accounting live. Security baseline for procurement conversations. First elements of the AI research companion.
Scope (Adds to M2)
V1 Hardening:
- Accounting/billing — usage metering (CPU-hours, GB-months, ingestion volume per tenant), Stripe for cloud, institutional PO for On-Prem pipeline, storage tier quotas, overage alerts, free academic tier, billing audit trail, usage dashboard
- LLM Sandbox Phases 3–4 (per Khoa's epic): agent MCP client wiring, multi-step reasoning loop, bio prompt engineering (DESeq2, limma, scanpy; validate 5+ patterns), Monaco Editor workbench (syntax highlighting, runtime selector, run/stop, SSE log streaming, inline output rendering, .ipynb export)
- Metadata Module WS1 continued/completed (full SRA ingestion pipeline), WS2 completed (filter API endpoints with dynamic counts)
- ADR-10 benchmark (10M records in PostgreSQL, ClickHouse, DuckDB) — database engine decision
- Load testing: 5 concurrent ingestions, 10 concurrent workflows, 3 concurrent analyses, 2 concurrent publishes. Targets: 500 MB/s ingestion, <5s workflow submission, <5s sandbox cold start, 10 GB artefact in <10 min
- Failure injection: RabbitMQ drop, MinIO kill, Ares unavailable, ResourceQuota exhaustion, invalid S3 URIs
- Documentation, API reference, deployment runbook
- Pilot UAT with anchor partner
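The metering units can be pinned down with a small Python sketch. It assumes CPU-hours are cores multiplied by wall-clock hours per task, and that GB-months are the average stored GB across daily snapshots. Providers define these units in slightly different ways, so treat the conventions, names, and 80% / 100% alert levels as illustrative.

```python
from dataclasses import dataclass


@dataclass
class TaskUsage:
    cpu_cores: float
    runtime_seconds: float


def cpu_hours(tasks: list) -> float:
    """CPU-hours = sum over tasks of cores x wall-clock hours."""
    return sum(t.cpu_cores * t.runtime_seconds for t in tasks) / 3600


def gb_months(daily_gb: list, days_in_month: int) -> float:
    """GB-months = average stored GB over the month, from one snapshot per day."""
    return sum(daily_gb) / days_in_month


def crossed_thresholds(used: float, quota: float, levels: tuple = (0.8, 1.0)) -> list:
    """Quota fractions that usage has reached; the 80% / 100% levels are illustrative."""
    return [level for level in levels if used >= level * quota]


compute = cpu_hours([TaskUsage(4, 1800), TaskUsage(2, 3600)])   # 2 + 2 = 4.0 CPU-hours
storage = gb_months([100] * 15 + [200] * 15, days_in_month=30)  # 150.0 GB-months
alerts = crossed_thresholds(used=storage, quota=160)            # [0.8]
```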
V2 Foundation (basic):
- Persistent research memory — soul documents applied to per-researcher context, loaded at session start from Qdrant
- Proactive monitoring seed — heartbeat cron for pipeline completion notifications (pipeline.analysis.complete → researcher notification via in-app or email, not yet Matrix)
- Notes ELN AI integration as the first element of the research companion (contextual suggestions based on project history)
Security Hardening (On-Prem preparation):
- Wazuh Phases 1–4 (cluster provisioning, TLS, agent rollout, log forwarding)
- Alertmanager → calert wiring
- Cinder encryption on ClickHouse volumes
- mTLS on observability cluster
- S3 cold storage with 12-month retention
Standards Framework
| Capability | Evidence | Test Gate |
|---|---|---|
| Data at rest | Cinder encryption | Volume metadata confirms |
| Data in transit | Istio mTLS + internal mTLS | Cert validation |
| Alert delivery | Alertmanager → calert → Google Chat | Synthetic alert within 60s |
| Log retention | S3 cold storage, 12-month lifecycle | Oldest log ≥ 365 days |
| SIEM baseline | Wazuh Phases 1–4 | Dashboard: all agents connected |
| Access control | OPA on all services | Policy test suite passes |
| Audit trail | Billing + research provenance | Export covers 30 days |
| Analysis correctness | Three-stage validation | CI: all known-bad caught |
Exit Criteria
- All M2 criteria hold
- Billing: at least one tenant metered, can generate invoice
- Monaco workbench: write/run code, view outputs, export .ipynb
- ADR-10 benchmark complete, database decision made
- Wazuh Phases 1–4 operational
- Encryption, mTLS, cold storage active
- Load test targets met, P0/P1 bugs resolved
- Pilot partner UAT sign-off
- V2: researcher receives pipeline completion notification without polling; soul document loaded per session
2.4 M4 — October 6, 2026 (~8 Weeks)
Verticals: V2 Project Presence (companion) + V3 Sovereign Enterprise Suite (foundation)
Product: Cloud + On-Prem
Goal: The AI research companion is operational. The sovereign productivity stack begins deployment. First On-Prem institutional deployment.
Team: Seed funding closes. Team expands from 6 to 10–14. New hires: +2 backend, +1 frontend, +1 ML/agent engineer, +1 DevOps, +1 bioinformatics full-time, +1 product/design optional.
Scope from M4 onward is shaped by user feedback from M1–M3. The items below represent the planned direction; specifics will be adjusted.
V2 Scope (basic)
- pa-relay service — Matrix → LiteLLM bridge, session management, audit logging, NO_REPLY suppression
- Researcher-specific soul documents with persistent memory (project history, active hypotheses, dataset context maintained across sessions via Qdrant)
- Heartbeat cron loop — configurable per deployment: pipeline completion alerts, cohort match notifications, surveillance signal thresholds
- Skills registry YAML + CLI wrappers — priority conversions: pa-drs-fetch, pa-policy-search, pa-cohort-query, pa-compliance-check
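A per-deployment heartbeat configuration could be as simple as an interval plus a list of checks, with event-triggered deployments skipped by the clock. The Python sketch below reuses the cadences given later for the operational configurations (30-minute, daily, event-triggered). The check names, `HeartbeatConfig`, and `due_checks` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class HeartbeatConfig:
    deployment: str
    interval: Optional[timedelta]  # None = event-triggered only (woken by RabbitMQ, not the clock)
    checks: tuple


def is_due(cfg: HeartbeatConfig, last_run: Optional[datetime], now: datetime) -> bool:
    if cfg.interval is None:
        return False
    return last_run is None or now - last_run >= cfg.interval


# Cadences match the deployment profiles named in this roadmap; check names are hypothetical.
CONFIGS = [
    HeartbeatConfig("who-paho", timedelta(minutes=30), ("surveillance_signal_thresholds",)),
    HeartbeatConfig("saudi-moh", timedelta(days=1), ("surveillance_signal_thresholds", "cohort_matches")),
    HeartbeatConfig("pharma", None, ("pipeline_completion",)),
]


def due_checks(last_runs: dict, now: datetime) -> list:
    """One pass of the cron loop: which (deployment, check) pairs should run now."""
    return [(cfg.deployment, check)
            for cfg in CONFIGS if is_due(cfg, last_runs.get(cfg.deployment), now)
            for check in cfg.checks]


now = datetime(2026, 10, 1, 12, 0)
last = {"who-paho": now - timedelta(minutes=31), "saudi-moh": now - timedelta(hours=5)}
pending = due_checks(last, now)
```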
V3 Scope (foundation, basic)
- Matrix — Deployment for institutional messaging and chat. Element as the client. Bridges to pa-relay for agent-accessible communication channels.
- oCIS hardened — Document governance: fine-grained access controls, audit trails, retention policies. oCIS Spaces enforced per project/team.
- Collaborative editing — Begin integration of OnlyOffice or Collabora via WOPI protocol in oCIS. Basic document and spreadsheet co-editing. Tool selection decision required early in M4.
- Video conferencing — Evaluate and select: Jitsi Meet, Element Call (Matrix-native), or BigBlueButton. Deploy basic instance integrated with institutional auth (Keycloak SSO). Decision based on: sovereign deployability, Matrix integration quality, and institutional fit.
On-Prem
- Wazuh Phases 5–7 (FIM, CVE scanning, CIS benchmarks, compliance modules)
- Wazuh architecture documentation for procurement
- First institutional On-Prem deployment preparation
- Full SRA ingestion completion (149M records, using ADR-10 engine)
Infrastructure: MinIO → SeaweedFS Migration
MinIO Community Edition has been phased out, creating a licensing and business continuity risk for the platform. M4 begins the migration to SeaweedFS as the S3-compatible object storage layer.
- Deploy SeaweedFS on private K8s
- Migrate all services currently using MinIO: Upgate (file upload), Nextflow workDir, Velero backup, Harbor registry backend, DRS/presigned URLs, oCIS s3ng driver, publishing staging area
- Validate S3 API compatibility across all integration points
- Decommission MinIO
This migration is feasible in M4 due to the expanded team (10–14 engineers). Assign 1 backend + 1 DevOps engineer to the migration track in parallel with V2/V3 feature work.
Infrastructure: Storage Layer Autoscaling (On-Prem)
On-Prem deployments run on OpenStack without managed Kubernetes or load-balancer services (no Magnum or Octavia), so custom autoscaling is required:
- SeaweedFS volume server horizontal scaling triggered by Prometheus capacity metrics
- Cinder volume auto-expansion for persistent storage
- K8s worker node autoscaling via OpenStack Nova (Terraform + Ansible triggered by Prometheus)
- Capacity alerting integrated into the observability stack
This work begins in M4 and may extend into M5 depending on the complexity of each institution's OpenStack environment.
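The scaling decision itself (separate from the Prometheus query and the Terraform/Ansible actuation, which are not shown) might reduce to a small pure function. In this Python sketch the target utilisation, the per-evaluation step limit, and the "never scale in" rule are assumptions chosen to keep the policy conservative.

```python
import math


def desired_volume_servers(used_bytes: float, capacity_per_server: float, current: int,
                           target_utilisation: float = 0.7, max_step: int = 2) -> int:
    """Scale-out decision from capacity metrics (hypothetical policy).

    Keeps projected utilisation at or below target_utilisation, never scales in
    (draining volume servers is treated as a separate, manual operation here),
    and adds at most max_step servers per evaluation to rate-limit Nova provisioning.
    """
    needed = math.ceil(used_bytes / (capacity_per_server * target_utilisation))
    if needed <= current:
        return current
    return min(needed, current + max_step)


TB = 10 ** 12
small_growth = desired_volume_servers(5 * TB, 2 * TB, current=3)    # 4
large_growth = desired_volume_servers(10 * TB, 2 * TB, current=3)   # 5 (capped by max_step)
shrinking = desired_volume_servers(1 * TB, 2 * TB, current=3)       # 3 (never scales in)
```

Capping the step means a sudden spike triggers several evaluation cycles rather than one large provisioning burst, which is easier to observe and roll back on an unmanaged OpenStack cluster.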
Exit Criteria
- V2: researcher has persistent AI companion that remembers project context across sessions, receives proactive alerts
- V3: Matrix operational for team chat, collaborative editing functional (basic), video conferencing deployed
- On-Prem: first deployment environment provisioned with Wazuh active
- Full SRA queryable via ChatNexus
- SeaweedFS deployed and validated; at least one major service (Upgate or Nextflow workDir) migrated from MinIO
2.5 M5 — December 1, 2026 (~8 Weeks)
Verticals: V3 Sovereign Enterprise Suite (productivity) + V4 Government OS (foundation)
Product: Cloud + On-Prem
Goal: The full sovereign productivity stack is operational. Government deployment architecture is validated.
V3 Scope (operational)
- Matrix: channels, threads, file sharing, search — operational for institutional use
- Collaborative editing: OnlyOffice/Collabora hardened — version history, comment threads, review workflows
- Video conferencing: operational, integrated with calendar/scheduling if applicable
- Governance layer: role-based access, audit trails, retention policies, compliance reporting integrated across oCIS + Matrix + editing tools
- AI-assisted document work: summarisation, classification, policy gap analysis within sovereign perimeter
V4 Scope (foundation, basic)
- Federated analytics architecture — design and prototype the hub-and-spoke model for multi-institutional collaboration (building on PAHO syndromic surveillance pattern)
- Multi-jurisdictional policy intelligence — extend Policy Team agent for cross-country regulatory analysis
- Compliance reporting framework — HIPAA, GDPR, ISO 27001 templates and continuous monitoring
- Multi-region deployment preparation (EU sovereign + MENA)
Platform
- MinIO → SeaweedFS migration completion (if not finished in M4) and decommission
- Storage layer autoscaling completion (SeaweedFS scaling, Cinder expansion, K8s node autoscaling on OpenStack)
- DRS URI resolution inside workflow files (post-ADR-3 — if user feedback from M1–M3 confirms need)
- Repository connectors (Zenodo, Dataverse) — basic publish-to-external
- Incremental re-ingestion (download only delta when cohort updated)
- CLI/API ingestion for power users
- Metadata Module WS3–4 revisited: Contrast data model and Ares integration, if user feedback and design clarity warrant it
Exit Criteria
- V3: full productivity suite operational (files + chat + docs + video) under institutional governance
- V4: federated analytics prototype functional between two test environments
- At least one On-Prem institutional client in active deployment
- Multi-region: deployment architecture validated
2.6 M6: V1 Full — March 2, 2027 (~12 Weeks)
Verticals: V4 Government OS (operational) + V5 DeSci Marketplace (foundation)
Product: Cloud + On-Prem
Goal: Full platform at production scale. Paying institutional clients. Agentic OS mature.
Team: 10–14, following the seed-funded expansion in M4 (see §2.4).
V4 Scope (operational)
- Federated hub-and-spoke productised — municipality/institution nodes transmit aggregates only, raw data never leaves origin
- Compliance reporting operational for HIPAA, GDPR, ISO 27001
- Multi-region: at least one EU and one MENA deployment live
- Air-gapped deployment validation for government contexts
- Sovereign AI inference: full reasoning pipelines inside government security perimeter
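The "aggregates only" rule can be illustrated with a Python sketch in which each node emits a count and a sum, and the hub pools them. This is enough for an exact pooled mean without any raw record crossing a boundary. The small-cell suppression threshold is an assumption chosen for illustration. The roadmap does not specify the privacy mechanism, and stronger guarantees may need techniques beyond suppression.

```python
from dataclasses import dataclass
from typing import Optional

MIN_CELL_SIZE = 5  # assumption: counts below this are suppressed before leaving a node


@dataclass(frozen=True)
class NodeAggregate:
    node: str
    count: int
    total: float  # sum of the measure; raw rows never leave the node


def local_aggregate(node: str, values: list) -> Optional[NodeAggregate]:
    """Runs inside the institution's perimeter and emits only count and sum."""
    if len(values) < MIN_CELL_SIZE:
        return None  # small-cell suppression
    return NodeAggregate(node, len(values), float(sum(values)))


def hub_combine(aggregates: list) -> dict:
    """Pool node aggregates; counts and sums are enough for an exact pooled mean."""
    present = [a for a in aggregates if a is not None]
    n = sum(a.count for a in present)
    mean = sum(a.total for a in present) / n if n else None
    return {"nodes_reporting": len(present), "count": n, "mean": mean}


result = hub_combine([
    local_aggregate("node-a", [1, 2, 3, 4, 5]),
    local_aggregate("node-b", [10, 10, 10, 10, 10, 10]),
    local_aggregate("node-c", [7, 9]),  # suppressed: fewer than MIN_CELL_SIZE records
])
```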
V5 Scope (foundation, design + prototype only)
- On-chain hypothesis registration — design, smart contract prototype, timestamp-based priority claim
- Research data objects — DRS-registered datasets with on-chain provenance records
- Federated data marketplace architecture — design for governed dataset access through protocol
- Protocol fee model — design for transaction-based revenue on dataset access, hypothesis citation, federated analytics
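One common way to realise a timestamp-based priority claim without publishing the hypothesis is a commit-reveal scheme. This is named here as an illustrative technique, not the chosen design. Only a SHA-256 commitment would be written on-chain, and the block timestamp of that transaction establishes priority. The later reveal proves what was committed. The smart contract and chain interaction are not shown, and all names are hypothetical.

```python
import hashlib
import json
import secrets
from datetime import datetime, timezone


def _digest(record: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so the same record always hashes the same.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def commit_hypothesis(text: str, author_id: str) -> tuple:
    """Return (commitment, reveal). Only the commitment would be written on-chain."""
    reveal = {
        "author": author_id,
        "hypothesis": text,
        "salt": secrets.token_hex(16),  # stops anyone confirming a guessed hypothesis
        "created": datetime.now(timezone.utc).isoformat(),
    }
    return _digest(reveal), reveal


def verify(commitment: str, reveal: dict) -> bool:
    return _digest(reveal) == commitment


commitment, reveal = commit_hypothesis(
    "Pilgrim-associated MRSA isolates form a distinct clade.", "researcher-123")
```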
Agentic OS (mature)
- Validation Triad — Naysayer + ELO Rater + Risk Analyser, two-round deliberation, DRS audit artefact for every analysis output
- Agent self-modification (constrained) — operational sections of soul.md modifiable with human confirmation, constitutional constraints hash-verified
- Skills registry complete — all stateless MCP servers converted to CLI wrappers
- Full heartbeat configurations operational: WHO/PAHO (30-min), Saudi MoH (daily), pharma (event-triggered)
Platform Capabilities
- Custom workflows (non-nf-core, user-uploaded)
- Live log streaming, cancel/resume for long-running pipelines
- RAG-driven methodology selection (replaces decision matrix)
- Workflow parameter UI generation from nextflow_schema.json
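Parameter UI generation mostly means flattening the schema's parameter groups into form fields. nf-core schemas group parameters under `definitions` in older templates and `$defs` in newer ones, and mark internal parameters with `hidden`. The Python sketch below accepts either layout. The widget mapping and the decision to hide `outdir` (because the platform controls it) are assumptions. The example schema is trimmed and illustrative, not copied from a real pipeline.

```python
PLATFORM_CONTROLLED = {"outdir"}  # set by the platform (see --outdir enforcement), never shown

WIDGETS = {"boolean": "checkbox", "integer": "number", "number": "number"}


def schema_to_fields(schema: dict) -> list:
    """Flatten nf-core-style parameter groups into form-field specs for the UI."""
    groups = schema.get("$defs") or schema.get("definitions") or {}
    sections = list(groups.items())
    if "properties" in schema:
        sections.append(("other", schema))  # ungrouped top-level parameters
    fields = []
    for group_id, group in sections:
        required = set(group.get("required", []))
        for name, prop in group.get("properties", {}).items():
            if prop.get("hidden") or name in PLATFORM_CONTROLLED:
                continue
            fields.append({
                "group": group.get("title", group_id),
                "name": name,
                "widget": "select" if "enum" in prop else WIDGETS.get(prop.get("type"), "text"),
                "required": name in required,
                "default": prop.get("default"),
                "options": prop.get("enum"),
                "help": prop.get("description", ""),
            })
    return fields


# Trimmed, illustrative schema in the nf-core layout.
EXAMPLE_SCHEMA = {
    "definitions": {
        "input_output_options": {
            "title": "Input/output options",
            "type": "object",
            "required": ["input", "outdir"],
            "properties": {
                "input": {"type": "string", "format": "file-path", "description": "Samplesheet CSV"},
                "outdir": {"type": "string", "format": "directory-path"},
                "publish_dir_mode": {"type": "string", "hidden": True},
            },
        },
        "alignment_options": {
            "title": "Alignment options",
            "type": "object",
            "properties": {
                "aligner": {"type": "string", "enum": ["star_salmon", "hisat2"], "default": "star_salmon"},
                "skip_alignment": {"type": "boolean"},
            },
        },
    },
}

fields = schema_to_fields(EXAMPLE_SCHEMA)
```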
Exit Criteria
- V4: federated deployment operational between 2+ institutions
- V5: hypothesis registration prototype functional on testnet
- Agentic OS: heartbeat delivers proactive alerts for at least one live deployment; Validation Triad runs on analysis outputs
- At least 2 institutional clients with active billing
- Multi-region live
- Platform handles 50+ concurrent researchers without degradation
2.7 Dependency Graph
┌──────────────────────────────┐
│ M0: BLOCKERS │
│ Ares confirmed/shimmed │
│ Notes chain endpoint │
│ Alpha hard deadline │
└──────────────┬───────────────┘
│
┌──────────────▼───────────────┐
│ M1 ALPHA (Apr) │
│ V1: Ingest + Execute │
│ Cloud deployment │
│ ⚠ CRITICAL PATH: │
│ Nextflow gRPC plugin │
└──────────────┬───────────────┘
│
┌──────────────▼───────────────┐
│ M2 BETA-1 (Jun) │
│ V1: + Analyse + Publish │
│ + Samplesheet editor │
│ + oCIS + LLM Sandbox 1–2 │
│ + Metadata WS1–2 begin │
└──────────────┬───────────────┘
│
┌──────────────▼───────────────┐
│ M3 BETA-2 (Aug) │
│ V1: hardening + billing │
│ V2: foundation (memory, │
│ heartbeat, ELN AI) │
│ + LLM Sandbox 3–4 │
│ + Security baseline │
│ + Pilot UAT │
└──────────────┬───────────────┘
│
┌──────────────▼───────────────┐
│ M4 (Oct) │
│ V2: companion (relay, souls,│
│ skills, heartbeat) │
│ V3: foundation (Matrix, │
│ collab editing, video) │
│ + First On-Prem deploy │
│ + Full SRA (149M) │
└──────────────┬───────────────┘
│
┌──────────────▼───────────────┐
│ M5 (Dec) │
│ V3: operational (full │
│ productivity suite) │
│ V4: foundation (federated │
│ analytics, compliance) │
│ + Multi-region prep │
└──────────────┬───────────────┘
│
┌──────────────▼───────────────┐
│ M6 V1 FULL (Mar 2027) │
│ V4: operational (gov OS) │
│ V5: foundation (DeSci) │
│ + Agentic OS mature │
│ + Paying clients │
└──────────────────────────────┘
Critical path through M1–M2: Ares readiness → SRA download + DRS registration (M1 Wk 1–2) → Nextflow gRPC plugin (M1 Wk 1–4) → DRS output registration (M1 Wk 3–5) → Analysis sandboxed execution (M2) → End-to-end integration (M3).
3. Blockers, Risks, and Open Questions
3.1 Pre-Sprint Blockers
See §2.0 (B1–B4).
3.2 Critical Risks
R1: LLM-Generated Analysis Correctness. Plausible but wrong results are worse than crashes. Mitigation: Three-stage validation, golden file CI, validation_failed as distinct status, researcher review always required. Residual: Novel failure modes not in validation rules; partially mitigated by Validation Triad (M6).
R2: Manual S3 URI Entry UX. Error-prone for non-technical users. Mitigation: Client-side validator, file picker panel (Anurag's epic FE-02-B), bulk copy. Trigger: If top-3 adoption blocker, promote DRS URI resolution.
R3: Nextflow gRPC Plugin (Schedule). The Nextflow plugin is the critical path. Mitigation: Most experienced engineer, "Hello World" first, weekly check-in with escalation.
R4: API Contracts Not Finalised. Samplesheet editor depends on 5 backend contracts (C-1 through C-5). Mitigation: Agree in M2 Week 1, frontend builds against mocks.
R5: V3 Tool Selection. Collaborative editing (OnlyOffice vs Collabora) and video conferencing (Jitsi vs Element Call vs BBB) require evaluation against sovereign deployability, Matrix integration, and institutional fit. Wrong choice means migration cost. Mitigation: Spike evaluation in M4 Week 1, decide before committing engineering time.
R5b: MinIO Community Edition Phased Out. MinIO CE is no longer available, creating licensing and business continuity risk. Every storage-dependent service (Upgate, Nextflow, Velero, Harbor, DRS, oCIS, publishing) is affected. Mitigation: SeaweedFS migration planned for M4 with expanded team. Validate S3 compatibility early. If SeaweedFS reveals incompatibilities, evaluate Garage or Ceph RGW as fallback.
3.3 Moderate Risks
R6: Workflow DSL2 compatibility — trs-syncer filters, TRS /tests, fallback allowlist.
R7: Output hijacking via publishDir — test with nf-core/rnaseq, batch DRS registration.
R8: Publishing large artefacts — background task, per-item tracking, 50 GB cap.
R9: Samplesheet column mismatch — fail and surface error (MVP), schema validation (post-MVP).
R10: Editor state loss — auto-save to browser storage every 30s.
R11: Metadata Module scale — 149M records may exceed PostgreSQL. ADR-10 benchmark in M3.
R12: V4 federated analytics complexity — hub-and-spoke with privacy-preserving aggregation is architecturally complex. PAHO deployment is the proof point but generalising it is non-trivial.
3.4 Open Questions
| # | Question | Blocking? | Deadline |
|---|---|---|---|
| Q1 | Alpha hard deadline (resolved: April 6, 2026, deployed for invited users) | n/a | Resolved |
| Q2 | Cloud provider selection (primary EU provider) | M1 deploy | M1 Wk 1 |
| Q3 | Billing model — Stripe vs PO vs hybrid | M3 billing | M2 end |
| Q4 | oCIS scope for M2 — read-only browser or full workspace? | No | M2 Wk 7 |
| Q5 | ADR-10 database engine (PostgreSQL vs ClickHouse vs DuckDB) | Metadata expansion | M3 Wk 19 |
| Q6 | Contrast model design — sub-cohorts sufficient vs explicit contrasts? | Metadata WS3–4 | Post-M3 |
| Q7 | Samplesheet generation per workflow type (column mapping) | Auto-generation | Post-M3 |
| Q8 | Publishing module extraction to standalone service | No | Post-M2 |
| Q9 | V3 collaborative editor: OnlyOffice vs Collabora | M4 | M4 Wk 1 |
| Q10 | V3 video conferencing: Jitsi vs Element Call vs BBB | M4 | M4 Wk 1 |
| Q11 | Monorepo (Nx) adoption timing across all services | No | Post-M1 |
| Q12 | Istio ambient mode fallback — sidecar mode pre-configured? | No | M3 |
| Q13 | TUS vs S3 presigned URL upload consolidation | No | Post-M3 |
| Q14 | Wazuh Indexer storage growth rate — cold-tier from day one? | No | M3 |
| Q15 | R language prompt engineering and validation — testing effort for Bioconductor workflows? R execution included from M2 but prompt quality needs validation. | No | M2 |
| Q16 | MinIO → SeaweedFS migration — S3 compatibility gaps? Fallback to Garage or Ceph RGW if SeaweedFS insufficient? | M4 | M4 Wk 2 |
4. High-Level Architecture
4.1 Pipeline Overview
Hypothesis Generator
        │
        ▼
Cohorts Service ◄── MDI Postgres
        │
        ▼
┌──────────────────────────────────────────────────────────────────────┐
│                       pa-pipeline-orchestrator                       │
│                                                                      │
│  Ingest ──► Execute (Metis/WES) ──► event ──► pa-analysis-agent      │
│  (SRA → MinIO/DRS)  (Nextflow on k8s)         (DEA + publish)        │
└──────────────────────────────────────────────────────────────────────┘
        │                                               │
        ▼                                               ▼
MinIO ◄──► Ares (DRS)                         Results (DRS objects)
                                                        │
                                                        ▼
                                                Publishing Module
                                            (RO-Crate → ZIP → MinIO)
4.2 Two-Service Model
pa-pipeline-orchestrator — data ingestion and atomic task execution. Ingestion jobs, per-file tracking, TES task state. Exposes /ingestion/jobs and /ga4gh/tes/v1/tasks.
pa-analysis-agent — methodology selection, code generation, validation, publishing. Consumes pipeline.workflow.complete, publishes pipeline.analysis.complete and publish.artefact.ready. Exposes /analyses, /methodologies, /artefacts.
Metis (WES) — existing GA4GH workflow execution. TRS/DRS resolution, Nextflow on k8s, MongoDB for run state.
Notes (ELN) — existing immutable append-only chain. Chain export for publishing.
4.3 Event Topology
RabbitMQ is the sole durable event bus. Redis for caching, heartbeats, ephemeral pub/sub only.
Exchange: pipeline_events (topic)
├── pipeline.ingestion.complete → orchestrator
├── pipeline.workflow.complete → analysis-agent
├── pipeline.analysis.complete → notification service
├── publish.artefact.ready → notification service
└── drs.object.registered → downstream consumers
4.4 LLM Inference
All inference runs on-premises: GPT-OSS 120B is served via vLLM (31.3 tokens/sec at 50 concurrent users), and embeddings use Qwen3-Embedding-0.6B (2×A100). No data leaves the sovereign environment.
4.5 oCIS Integration (M2+)
oCIS v8 provides the file workspace layer. s3ng storage driver on MinIO (shared storage backend with the pipeline). Keycloak SSO for authentication. OPA for access control consistent with the rest of the platform. Project Spaces model: each PA project maps to an oCIS Space, giving researchers a familiar file browsing and sharing experience.
oCIS is not on the pipeline critical path. The pipeline operates on DRS objects via MinIO; oCIS is the human-facing view of the same storage.
4.6 V3 Sovereign Productivity Stack (M4–M5)
The target architecture for V3 Sovereign Enterprise Suite combines four components, all self-hosted within the institutional or cloud environment:
- oCIS — file management, project spaces, governance layer
- Matrix (Element) — institutional messaging, channels, threads; bridges to pa-relay for agent interaction
- OnlyOffice or Collabora — collaborative document/spreadsheet editing via WOPI protocol in oCIS (tool selection: Q9, decided M4 Wk 1)
- Jitsi, Element Call, or BBB — video conferencing with Keycloak SSO (tool selection: Q10, decided M4 Wk 1)
All four share Keycloak for authentication and OPA for access policy. The governance layer (audit trails, retention, compliance reporting) spans all components.
4.7 Service Pattern
All PA services follow: FastAPI + Keycloak OIDC + structured logging (OpenTelemetry → SigNoz) + CNPG Postgres (RW/RO split) + Redis heartbeat + Harbor images + ArgoCD GitOps.
5. Use Cases
UC-1: Ingest Data from NCBI SRA (M1)
Orchestrator resolves accessions to SRR via MDI, deduplicates, surfaces a pre-flight estimate, downloads from AWS Open Data (fallback: SRA Toolkit), uploads via Upgate, and registers DRS objects. Partial success is a first-class outcome: failures are tracked per file and do not abort the job. See §6.
UC-2: Execute a Nextflow Workflow (M1)
User selects an nf-core workflow from TRS, copies S3 URIs from the dataset browser (checked by a client-side validator), and submits to Metis. The platform enforces --outdir, and outputs are batch-registered as DRS objects. See §7.
UC-3: Assemble Samplesheet and Submit Run (M2)
Researcher exports cohort to in-platform editor. Platform pre-fills sample IDs and file paths. Researcher adds pipeline-specific columns, inserts file paths via picker panel, saves as project file, selects pipeline, clicks "Save & Run." Full spec: Anurag's Researcher Workflow epic.
UC-4: Run Statistical Analysis (M2)
Analysis agent selects method, generates code, validates (3 stages), submits sandboxed TES task, registers results. Full spec: §8 + Khoa's LLM Sandbox epic.
UC-5: Publish a Data Artefact (M2)
Minimal metadata form, resource selection, RO-Crate 1.2 ZIP packaging. Background task. See §9.
6. Data Ingestion — Design Detail
6.1 Download Strategy
Primary: AWS Open Data (no credentials, negligible cost). Fallback: SRA Toolkit Docker image. Accessions are always resolved to SRR run accessions, via MDI first and NCBI E-utilities only if the accession is not indexed in MDI. Paired-end runs produce two DRS objects per accession.
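The resolution order and the paired-end rule can be sketched as follows. This is a minimal illustration, assuming hypothetical `mdi_lookup` and `eutils_lookup` callables that map any accession to a list of SRR run accessions (returning None when the accession is unknown); the `_1`/`_2` FASTQ naming for paired-end runs is also an assumption, not the platform's convention.

```python
from typing import Callable, Optional

# Hypothetical lookup signature: accession -> SRR run accessions, or None if unknown.
Lookup = Callable[[str], Optional[list]]


def resolve_to_runs(accessions: list, mdi_lookup: Lookup, eutils_lookup: Lookup) -> list:
    """Resolve every accession to SRR runs: MDI first, E-utils only when MDI has no entry."""
    runs, seen = [], set()
    for acc in accessions:
        resolved = mdi_lookup(acc)
        if resolved is None:                  # not indexed in MDI
            resolved = eutils_lookup(acc) or []
        for run in resolved:
            if run not in seen:               # deduplicate across all inputs
                seen.add(run)
                runs.append(run)
    return runs


def planned_objects(run: str, paired_end: bool) -> list:
    """One DRS object per FASTQ file: two for a paired-end run, one otherwise."""
    if paired_end:
        return [f"{run}_1.fastq.gz", f"{run}_2.fastq.gz"]   # assumed file naming
    return [f"{run}.fastq.gz"]
```

Because E-utils is only consulted on an MDI miss, the external API stays off the hot path for anything already indexed, and deduplication happens after expansion so a study accession and one of its own runs do not download twice.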
6.2 Registration Paths
External uploads go through Upgate (chunked, resumable → RabbitMQ → Ares). Internal outputs use the direct path (POST /objects), which registers in milliseconds.
6.3 Ingestion Does NOT (M1–M2)
- Assign cohort/contrast labels
- Determine pipelines
- Parse or validate file contents
- Support controlled-access or non-SRA repositories
7. Nextflow Execution — Design Detail
7.1 Tradeoffs
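Users paste S3 URIs manually (no DRS URI resolution in workflow files for MVP), and a client-side validator checks them before submission. Execution goes through WES via Metis on a native Kubernetes backend, with a TES backend to follow later. See ADR-2, ADR-3.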
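The client-side validator runs in the browser, but the kind of check it performs can be illustrated in Python. This sketch assumes URIs of the form `s3://bucket/key` and applies a simplified version of the AWS general-purpose bucket naming rules (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit); whether the real validator also confirms the object exists is not stated.

```python
import re

# Simplified AWS general-purpose bucket naming rules: 3-63 chars of lowercase
# letters, digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def validate_s3_uri(uri: str) -> list:
    """Return human-readable problems with an S3 URI; an empty list means valid."""
    if not uri.startswith("s3://"):
        return ["URI must start with s3://"]
    bucket, sep, key = uri[len("s3://"):].partition("/")
    problems = []
    if not _BUCKET.match(bucket):
        problems.append(f"invalid bucket name: {bucket!r}")
    if ".." in bucket:
        problems.append("bucket name must not contain consecutive dots")
    if not sep or not key:
        problems.append("URI must include an object key after the bucket")
    return problems
```

Returning a list of readable messages, rather than raising on the first problem, lets the form show every issue at once before anything is sent to Metis.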
7.2 Output Capture
Recursive --outdir parse → sizes + MD5 → batch DRS registration → tag with pipeline/version/output_type/run_id/project_id → publish pipeline.workflow.complete.
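The capture step might look like the sketch below: it walks --outdir, streams MD5 digests, and builds one registration record per file. The record shape (`name`, `size`, `checksums`, `tags`) is an assumption loosely modelled on GA4GH DRS objects, and deriving `output_type` from the top-level folder is a hypothetical convention; the batch POST itself and the pipeline.workflow.complete publish are left out.

```python
import hashlib
import os
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so large outputs never sit fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture_outputs(outdir: str, *, pipeline: str, version: str,
                    run_id: str, project_id: str) -> list:
    """Walk --outdir recursively and build one registration record per file."""
    root = Path(outdir)
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                      # deterministic traversal order
        for name in sorted(filenames):
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            records.append({
                "name": rel,
                "size": path.stat().st_size,
                "checksums": [{"type": "md5", "checksum": md5_of(path)}],
                "tags": {
                    "pipeline": pipeline,
                    "version": version,
                    # Hypothetical convention: output_type = top-level folder.
                    "output_type": rel.split("/", 1)[0] if "/" in rel else "root",
                    "run_id": run_id,
                    "project_id": project_id,
                },
            })
    return records
```

Collecting everything first and registering in one batch keeps the number of registration calls independent of how many files a pipeline writes.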
8. Analysis Agent — Design Detail
8.1 Methodology Selection
A config-driven decision matrix (Appendix J) selects between DESeq2, edgeR, and limma-voom. Feature selection (MRMR, Boruta, SHAP) follows the identical pattern: new matrix rows plus validation rules. Conditions are evaluated by a constrained parser that accepts only whitelisted variables, numeric literals, and comparisons; eval() is prohibited.
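A constrained parser of this kind can be very small. The sketch below is a minimal recursive-descent parser for the grammar `operand COMPARATOR operand`, assuming the variable whitelist listed in Appendix J and no boolean connectives (none are mentioned). The names `parse_condition`, `Condition`, and `ConditionError` are illustrative, not the platform's API.

```python
import re
from dataclasses import dataclass
from typing import Union

# Whitelist from Appendix J; any other identifier is rejected.
ALLOWED_VARS = {"n_min", "n_total", "n_group_a", "n_group_b", "n_features"}
COMPARATORS = {
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
    "<=": lambda a, b: a <= b,
    ">=": lambda a, b: a >= b,
    "==": lambda a, b: a == b,
}
# Number | identifier | operator; two-character operators are tried first.
_TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|(<=|>=|==|<|>))")

Operand = Union[str, float]


class ConditionError(ValueError):
    """Raised (at config load time) for anything outside the grammar."""


@dataclass(frozen=True)
class Condition:
    left: Operand
    op: str
    right: Operand

    def evaluate(self, ctx: dict) -> bool:
        def value(operand: Operand) -> float:
            return ctx[operand] if isinstance(operand, str) else operand
        return COMPARATORS[self.op](value(self.left), value(self.right))


def tokenize(text: str) -> list:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ConditionError(f"unexpected input: {text[pos:]!r}")
        number, name, op = m.groups()
        if number is not None:
            tokens.append(("num", number))
        elif name is not None:
            tokens.append(("name", name))
        else:
            tokens.append(("op", op))
        pos = m.end()
    return tokens


class _Parser:
    """Grammar: condition := operand OP operand ; operand := NAME | NUMBER"""

    def __init__(self, tokens: list):
        self.tokens, self.i = tokens, 0

    def expect(self, *kinds: str) -> tuple:
        if self.i >= len(self.tokens) or self.tokens[self.i][0] not in kinds:
            raise ConditionError(f"expected {' or '.join(kinds)} at token {self.i}")
        self.i += 1
        return self.tokens[self.i - 1]

    def operand(self) -> Operand:
        kind, text = self.expect("name", "num")
        if kind == "num":
            return float(text)
        if text not in ALLOWED_VARS:
            raise ConditionError(f"variable {text!r} is not whitelisted")
        return text

    def condition(self) -> Condition:
        left = self.operand()
        _, op = self.expect("op")
        right = self.operand()
        if self.i != len(self.tokens):
            raise ConditionError("trailing input after the comparison")
        return Condition(left, op, right)


def parse_condition(text: str) -> Condition:
    return _Parser(tokenize(text)).condition()
```

Parsing every condition when the matrix is loaded means inputs such as `__import__('os')` fail at startup with a ConditionError, long before any analysis runs, and evaluation only ever touches the comparison lambdas.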
8.2 Validation Framework
Three stages: pre-execution (imports, paths, no network, cohort_arm used), post-execution (required columns, no NaN/Inf, padj in [0,1], plot valid), plausibility (p-value KS test, fold-change variance, sample count cross-check). validation_failed is distinct from failed. Golden file CI: pasilla dataset, Spearman > 0.9. Full spec: Khoa's LLM Sandbox epic.
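The post-execution stage could be implemented roughly as below for a tab-separated, DESeq2-style results table. The required column names are assumptions (DESeq2's own result columns include `log2FoldChange`, `pvalue`, and `padj`), and the sketch applies the no-NaN/Inf rule literally: DESeq2 reports `NA` for genes removed by independent filtering, and how the real rules treat those rows is not stated here.

```python
import csv
import io
import math

# Assumed column names; the real required set lives in the validation rules.
REQUIRED_COLUMNS = ("gene_id", "log2FoldChange", "pvalue", "padj")


def validate_results(tsv_text: str) -> list:
    """Post-execution checks: required columns, finite numbers, padj in [0, 1]."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing required columns: {', '.join(missing)}"]
    errors = []
    for line_no, row in enumerate(reader, start=2):   # line 1 is the header
        for col in REQUIRED_COLUMNS[1:]:
            try:
                value = float(row[col])
            except (TypeError, ValueError):           # e.g. "NA" or a short row
                errors.append(f"line {line_no}: {col}={row[col]!r} is not numeric")
                continue
            if not math.isfinite(value):
                errors.append(f"line {line_no}: {col} is NaN or Inf")
            elif col == "padj" and not 0.0 <= value <= 1.0:
                errors.append(f"line {line_no}: padj={value} outside [0, 1]")
    return errors
```

A non-empty error list would put the job into `validation_failed` rather than `failed`: the code ran, but its output cannot be trusted.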
8.3 LLM Sandbox
Four MCP servers: pa-sandbox-mcp-server, pa-db-mcp-server, pa-storage-mcp-server, pa-search-mcp-server. Pre-built images (pa-bio:3.11, bioconductor:3.18, pa-bio-ml:3.11) in Harbor with DaemonSet pre-pull. R included from M2. Self-healing debugger: PII strip → diagnose → web search → patch → retry (max 2). Full spec: Khoa's LLM Sandbox epic.
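The debugger loop reduces to a bounded retry around four injected steps. In this sketch `diagnose`, `search`, and `patch` are hypothetical stand-ins for the LLM and MCP server calls, and the PII filter only redacts e-mail addresses and IPv4 addresses as examples; a production filter would need a much broader rule set.

```python
import re
from typing import Callable

_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MAX_RETRIES = 2


def strip_pii(text: str) -> str:
    """Redact obvious identifiers before the error text leaves the sandbox (illustrative only)."""
    return _IPV4.sub("[IP]", _EMAIL.sub("[EMAIL]", text))


def run_with_self_healing(code: str,
                          execute: Callable[[str], tuple],
                          diagnose: Callable[[str], str],
                          search: Callable[[str], str],
                          patch: Callable[[str, str, str], str]) -> tuple:
    """Run code; on failure diagnose, search, patch and retry, at most MAX_RETRIES times.

    Returns (succeeded, final_code, retries_used).
    """
    ok, log = execute(code)
    retries = 0
    while not ok and retries < MAX_RETRIES:
        error = strip_pii(log)                  # 1. PII strip
        diagnosis = diagnose(error)             # 2. diagnose
        hints = search(diagnosis)               # 3. web search
        code = patch(code, diagnosis, hints)    # 4. patch
        retries += 1
        ok, log = execute(code)                 # 5. retry
    return ok, code, retries
```

Stripping PII before the diagnose step matters because everything after it, including the web search, may leave the sandbox.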
9. Publishing Module — Design Detail
The Publishing Module is the platform's exit point. It packages workspace resources into an RO-Crate 1.2 ZIP, with notes as the primary narrative source. Steps: create staging → resolve/download resources → fetch note chains → write ro-crate-metadata.json → ZIP → upload to MinIO → notify. It runs as a background task with per-item tracking and a 50 GB cap, and lives within pa-analysis-agent for M2.
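The packaging steps can be sketched with the standard library alone, assuming the staged resources are already available as bytes keyed by their path inside the ZIP. Staging, note-chain fetching, the MinIO upload, and notification are out of scope, and building the ZIP in memory is a simplification that a real 50 GB artefact could not use. Treating the 50 GB cap as a limit on the whole artefact is also an assumption.

```python
import io
import json
import zipfile
from datetime import date
from typing import Optional

# Assumption: the 50 GB cap applies to the artefact as a whole.
MAX_ARTEFACT_BYTES = 50 * 1024**3


def build_metadata(name: str, description: str, license_url: str, author_name: str,
                   parts: list, published: Optional[date] = None) -> dict:
    """ro-crate-metadata.json following the Appendix I layout (RO-Crate 1.2)."""
    return {
        "@context": "https://w3id.org/ro/crate/1.2/context",
        "@graph": [
            {"@type": "CreativeWork", "@id": "ro-crate-metadata.json",
             "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2"},
             "about": {"@id": "./"}},
            {"@type": "Dataset", "@id": "./", "name": name, "description": description,
             "license": {"@id": license_url},
             "datePublished": (published or date.today()).isoformat(),
             "author": {"@id": "#author-1"},
             "hasPart": [{"@id": p} for p in parts]},
            {"@type": "Person", "@id": "#author-1", "name": author_name},
            # One data entity per packaged file, so every hasPart target is described.
            *({"@type": "File", "@id": p} for p in parts),
        ],
    }


def package(items: dict, **meta) -> bytes:
    """Write staged items plus ro-crate-metadata.json into an in-memory ZIP."""
    total = sum(len(data) for data in items.values())
    if total > MAX_ARTEFACT_BYTES:
        raise ValueError(f"artefact is {total} bytes, above the 50 GB cap")
    metadata = build_metadata(parts=sorted(items), **meta)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("ro-crate-metadata.json", json.dumps(metadata, indent=2))
        for zip_path, data in items.items():
            zf.writestr(zip_path, data)
    return buf.getvalue()
```

Checking the size before writing anything means an oversized selection fails fast instead of after hours of packaging.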
10. Operational Readiness
10.1 Compliance Timeline
Compliance is minimal for M1–M2 (cloud product, provider handles infrastructure security). Hardening begins in M3 for On-Prem preparation, completes across M4–M5.
| Item | Milestone | Product Target |
|---|---|---|
| Alertmanager → calert | M3 | On-Prem |
| Cinder encryption | M3 | On-Prem |
| mTLS observability | M3 | On-Prem |
| S3 cold storage 12-month | M3 | On-Prem |
| Wazuh Phases 1–4 | M3 | On-Prem |
| Wazuh Phases 5–7 (FIM, CVE, compliance) | M4 | On-Prem |
| Wazuh architecture documentation | M4 | On-Prem (procurement) |
| Full compliance reporting (HIPAA, GDPR, ISO 27001) | M5 | On-Prem |
10.2 Capacity Planning
Year 1 projected (10–20 researchers): 5–10 concurrent runs (burst 20), ~5 TB/month ingestion, ~85 TB MinIO Year 1, ~1,700–3,400 AUD/month storage, GPU utilisation at 10–30%, no cloud API spend for core pipeline.
Quotas per project: 16 CPU / 64 GB RAM / 500 GB ephemeral default. Per-workflow max 8 CPU / 32 GB RAM. Per-sandbox max 4 CPU / 8 GB RAM / 2h timeout.
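These quotas imply a two-level admission check: a request must fit its per-item ceiling and the project's remaining headroom. The sketch below is illustrative only; `Resources` and `admit` are hypothetical names, no per-item ephemeral limit is stated (so none is enforced), and in Kubernetes the same limits would more likely be enforced with a namespace `ResourceQuota` plus a `LimitRange` than in application code.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class Resources:
    cpu: float
    mem_gb: float
    ephemeral_gb: float = 0.0

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu + other.cpu, self.mem_gb + other.mem_gb,
                         self.ephemeral_gb + other.ephemeral_gb)

    def fits_within(self, limit: "Resources") -> bool:
        return (self.cpu <= limit.cpu and self.mem_gb <= limit.mem_gb
                and self.ephemeral_gb <= limit.ephemeral_gb)


# Figures from the quota paragraph; no per-item ephemeral limit is given, hence INF.
PROJECT_DEFAULT = Resources(cpu=16, mem_gb=64, ephemeral_gb=500)
PER_ITEM_MAX = {
    "workflow": Resources(cpu=8, mem_gb=32, ephemeral_gb=INF),
    "sandbox": Resources(cpu=4, mem_gb=8, ephemeral_gb=INF),
}
SANDBOX_TIMEOUT_SECONDS = 2 * 60 * 60


def admit(request: Resources, kind: str, project_in_use: Resources,
          project_quota: Resources = PROJECT_DEFAULT) -> tuple:
    """Check the per-item ceiling first, then the project's remaining headroom."""
    if not request.fits_within(PER_ITEM_MAX[kind]):
        return False, f"{kind} request exceeds the per-{kind} maximum"
    if not (project_in_use + request).fits_within(project_quota):
        return False, "project quota exhausted; queue until resources free up"
    return True, "admitted"
```

Separating the two failure reasons matters for the user: an oversized request must be changed, while a quota-blocked one only has to wait.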
10.3 Testing Strategy
Functional progression (M1–M2): TES echo → ingest 2 accessions → Hello World → rnaseq → DESeq2 sandbox → E2E → publish.
Validation CI (M2+): Known-bad inputs/outputs, golden file (pasilla, Spearman > 0.9).
Load testing (M3): 5 ingestions, 10 workflows, 3 analyses, 2 publishes concurrent. Targets: 500 MB/s, <5s submission, <5s cold start, 10 GB in <10 min.
Failure injection (M3): RabbitMQ kill, MinIO kill, Ares unavailable, quota exhaustion, invalid URIs.
Frontend profiles (M3+): 4 curated (Full, Research, Surveillance, Minimal) in CI.
11. Agentic OS Layer
Distributed across milestones rather than big-bang:
| Component | Milestone | Notes |
|---|---|---|
| Soul documents (per-researcher) | M3 | V2 foundation — persistent memory |
| Pipeline heartbeat notifications | M3 | V2 foundation — basic proactive alerts |
| pa-relay (Matrix → LiteLLM) | M4 | V2 companion — messaging channel |
| Skills registry + CLI wrappers | M4 | V2 companion — lightweight tool access |
| Full heartbeat (multi-config) | M4–M5 | WHO/PAHO, Saudi MoH, pharma |
| Validation Triad | M6 | Two-round deliberation, DRS audit record |
| Agent self-modification | M6 | Constrained, human confirmation, hash check |
Appendix A: Architecture Decision Records
| ADR | Decision | Status |
|---|---|---|
| ADR-1 | Two services, not four | Active |
| ADR-2 | WES (Metis), native k8s backend | Active |
| ADR-3 | No DRS URI resolution in workflow files (MVP) | Active |
| ADR-4 | Dual file registration path | Active |
| ADR-5 | RabbitMQ as primary event bus | Active |
| ADR-6 | Decision matrix, constrained parser, no eval() | Active |
| ADR-8 | Nx monorepo, Bazel on pain points | Active |
| ADR-9 | Adopt agentic OS patterns, don't fork OpenClaw | Active |
| ADR-10 | Database for full SRA (149M). Benchmark M3. | Pending |
| ADR-11 | LLM Sandbox engine (llm-sandbox, MIT) | Active |
| ADR-12 | Three-tier multi-tenancy, OPA | Implemented |
| ADR-13 | Four-tier deployment environments | Implemented |
| ADR-14 | Zero-trust (Istio ambient, OPA sidecars) | Implemented |
| ADR-15 | Frontend 9 modules, runtime white-labelling | Implemented |
| ADR-16 | SIEM (Wazuh) | Planned (M3) |
| ADR-17 | MinIO → SeaweedFS migration. Deploy SeaweedFS, migrate all S3 consumers, validate compat, decommission MinIO. | Planned (M4) |
| ADR-18 | Storage layer autoscaling (On-Prem). Custom Terraform + Ansible scaling for SeaweedFS, Cinder, K8s nodes on OpenStack (no Magnum/Octavia). | Planned (M4–M5) |
Appendix B: Related Documents
| Document | Owner | Maps To |
|---|---|---|
| LLM Sandbox Epic (PA-SANDBOX) | Khoa | Phases 1–2 → M2, Phases 3–4 → M3 |
| Researcher Workflow Epic | Anurag | Samplesheet editor + run monitoring → M2 |
| Metadata Module Epic | Samyak | WS1–2 → M2/M3, WS3–4 deferred |
| Admin Docs Hub Epic | Alex | Independent track |
| Observability Cluster Summary | — | Compliance detail for §10.1 |
| PA Atlas v4.0 | Boris | Strategic vision, 5 verticals, commercial model |
| Alpha Release Roadmap | CTO | Superseded by §2.1 |
| Platform Functionality Roadmap | Boris | Strategic framing, superseded by §2 |
| Milestones & v4→v5 Delta | Boris | Cross-reference driving this version |
Appendix C: Database Schemas
C.1 Pipeline Orchestrator
CREATE TABLE ingestion_jobs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
target_dataset_id UUID NOT NULL,
accessions JSONB NOT NULL,
status TEXT NOT NULL DEFAULT 'queued',
requested_by TEXT NOT NULL,
error_message TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ
);
CREATE TABLE ingestion_files (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
ingestion_job_id UUID NOT NULL REFERENCES ingestion_jobs(id),
accession TEXT NOT NULL,
filename TEXT NOT NULL,
drs_uri TEXT,
status TEXT NOT NULL DEFAULT 'pending',
size_bytes BIGINT,
checksum_md5 TEXT,
error_msg TEXT
);
CREATE TABLE tes_tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
analysis_job_id UUID,
name TEXT,
state TEXT NOT NULL DEFAULT 'QUEUED',
security_profile TEXT NOT NULL DEFAULT 'trusted',
submitted_by TEXT NOT NULL,
inputs JSONB NOT NULL,
outputs JSONB NOT NULL,
executors JSONB NOT NULL,
resources JSONB,
code_bundle JSONB,
logs JSONB,
output_drs_uris JSONB,
created_at TIMESTAMPTZ DEFAULT now(),
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ
);
C.2 Analysis Agent
CREATE TABLE analysis_jobs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
hypothesis_id UUID,
workflow_run_id TEXT NOT NULL,
cohort_id UUID NOT NULL,
assay_type TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
methodology JSONB,
validation_errors JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
completed_at TIMESTAMPTZ,
created_by UUID NOT NULL
);
CREATE TABLE analysis_results (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
analysis_job_id UUID NOT NULL REFERENCES analysis_jobs(id),
tes_task_id UUID NOT NULL,
method_name TEXT NOT NULL,
method_role TEXT NOT NULL,
filename TEXT NOT NULL,
file_type TEXT NOT NULL,
drs_uri TEXT NOT NULL,
result_role TEXT NOT NULL,
validation_status TEXT DEFAULT 'pending',
validation_details JSONB,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE publish_artefacts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL,
name TEXT NOT NULL,
description TEXT NOT NULL,
license TEXT NOT NULL,
date_published DATE NOT NULL DEFAULT CURRENT_DATE,
status TEXT NOT NULL DEFAULT 'staging',
minio_path TEXT,
zip_size_bytes BIGINT,
error_msg TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
completed_at TIMESTAMPTZ
);
CREATE TABLE publish_artefact_items (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
artefact_id UUID NOT NULL REFERENCES publish_artefacts(id),
resource_type TEXT NOT NULL,
resource_id TEXT NOT NULL,
display_name TEXT,
zip_path TEXT,
size_bytes BIGINT,
status TEXT NOT NULL DEFAULT 'pending'
);
Appendix D: Existing Services
| Service | Language | Status |
|---|---|---|
| IAM | Python/FastAPI | Production |
| Upgate | Rust/Axum | Production |
| Ares (TRS/DRS) | Rust/Axum | Production |
| Metis (WES) | Rust | Production |
| zipRS | Python | Production |
| Notes (ELN) | — | Production |
| GAS (Gateway) | Python | Production |
| ChatMDI/ChatNexus | Python | Production |
| Forge (Hypothesis) | Python | Production |
| Research Team | Python | Production |
| Policy Team | Python | Production |
| General Agent | Python | Production |
| RMS (RAG) | Python | Production |
| Frontend | React 18/TS | Production |
| pa-pipeline-orchestrator | Python/FastAPI | New (M1) |
| pa-analysis-agent | Python/FastAPI | New (M2) |
Appendix E: Frontend Architecture
Technology Stack
React 18, TypeScript 5.6, Vite 6. State management: Redux Toolkit + RTK Query with 14 backend service slices, redux-persist for workspace and upload state. UI: Radix UI + shadcn/ui + Tailwind CSS v3. Rich text: TipTap v3 (ProseMirror). Code editor: Monaco. Charts: Recharts. Auth: oidc-client-ts + Keycloak with cross-tab session sync. GA4GH components: Elixir Cloud Components (@elixir-cloud/*).
SSE Streaming Protocol
All AI-facing endpoints use Server-Sent Events with 8 event types: data-conversation, data-agent, text-start, text-delta, text-end, tool-output-available, error, finish. A 1.5-second thinking stall indicator fires during long reasoning steps. Frontend consumes via Vercel AI SDK v5 with 16 composable chat primitives.
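A consumer of this stream has to split SSE frames and dispatch on the event type. The sketch below follows the WHATWG SSE parsing rules for `data:` lines, comment lines, and blank-line dispatch, and assumes, in line with the AI SDK v5 UI message stream, that each frame's `data:` payload is a JSON object whose `type` field carries one of the eight event types. The stall rule (no event for 1.5 s) is an assumed reading of the thinking indicator.

```python
import json
import re

# The eight event types listed above.
EVENT_TYPES = {
    "data-conversation", "data-agent", "text-start", "text-delta",
    "text-end", "tool-output-available", "error", "finish",
}
STALL_SECONDS = 1.5
_EOL = re.compile(r"\r\n|\r|\n")   # SSE accepts CRLF, CR or LF line endings


def parse_sse(stream: str):
    """Yield JSON object payloads; a blank line dispatches the buffered event."""
    *lines, _unterminated = _EOL.split(stream)   # a trailing partial line is dropped
    data_lines = []
    for line in lines:
        if line == "":
            if data_lines:
                raw, data_lines = "\n".join(data_lines), []
                try:
                    payload = json.loads(raw)
                except json.JSONDecodeError:
                    continue              # skip non-JSON sentinels, if any
                if isinstance(payload, dict):
                    yield payload
        elif line.startswith(":"):
            continue                      # comment line, e.g. a keep-alive
        else:
            field, _, value = line.partition(":")
            if field == "data":           # the spec strips one leading space
                data_lines.append(value[1:] if value.startswith(" ") else value)
    # Per the SSE spec, an event not closed by a blank line is never dispatched.


def dispatch(stream: str) -> list:
    """Parse a stream and reject any event type outside the protocol."""
    events = []
    for payload in parse_sse(stream):
        if payload.get("type") not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {payload.get('type')!r}")
        events.append(payload)
    return events


def stalled(last_event_at: float, now: float) -> bool:
    """Assumed trigger for the thinking indicator: 1.5 s without any event."""
    return now - last_event_at >= STALL_SECONDS
```

Rejecting unknown types, rather than ignoring them, surfaces backend and frontend protocol drift during development instead of silently dropping content.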
White-Labelling
Single Docker image serves all environments. runtime-env.js generated at container startup injects: VITE_BRAND_NAME, VITE_PRIMARY_COLOR (OKLCH), logo URLs, backend service URLs, and VITE_ENABLED_MODULES. Three-tier tenancy (Org → Tenant → Project) with workspace auto-provisioning. Cache isolation: switching projects resets all 9 RTK Query service caches simultaneously.
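A minimal sketch of the startup step, assuming the script exposes every `VITE_`-prefixed variable on a global named `window.__RUNTIME_ENV__`. The global's name is hypothetical, as are any variable names beyond those listed above; in a container this would run from the entrypoint before the web server starts.

```python
import json
import os
from typing import Mapping, Optional


def render_runtime_env(env: Optional[Mapping[str, str]] = None,
                       global_name: str = "__RUNTIME_ENV__") -> str:
    """Render runtime-env.js from VITE_* variables present at container start."""
    env = os.environ if env is None else env
    # Only VITE_-prefixed variables are exposed, mirroring Vite's own rule for
    # which variables client code may see; other env vars (secrets) stay server-side.
    config = {key: value for key, value in env.items() if key.startswith("VITE_")}
    # JSON text is a valid JavaScript object literal.
    return f"window.{global_name} = {json.dumps(config, sort_keys=True)};\n"
```

The app then reads the global instead of `import.meta.env`, because Vite replaces `import.meta.env` values at build time, which is exactly what would otherwise force one image per environment.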
Feature Modules
| Module | Status |
|---|---|
| Datasets (DRS CRUD, file tree, downloads) | Production |
| Cohort Builder (faceted filtering, visualisation) | Production |
| Workflows (TRS registry, public browser) | Production |
| Runs (WES submission, monitoring, outputs) | Production |
| Research (multi-tab AI chat) | Production |
| RAG Knowledge Bases (upload, tag, publish) | Production |
| Notes (TipTap, autosave, PDF export, AI) | Production |
| Exploratory Analysis (JupyterLab → Monaco in M3) | Production |
| Sidebar (AI assistant + Notes, global) | Production |
Appendix F: Deployment Checklists
F.1 pa-pipeline-orchestrator (M1)
- ☐ CNPG migrations (ingestion_jobs, ingestion_files, tes_tasks)
- ☐ FastAPI scaffold (pa-auth, pa-logging, Redis heartbeat, Taskfile)
- ☐ Download pod image (ncbi/sra-tools + Upgate client)
- ☐ K8s namespace + ServiceAccount + RBAC
- ☐ MinIO credentials Secret
- ☐ RabbitMQ exchange pipeline_events + bindings
- ☐ NetworkPolicy for sandboxed TES tasks
- ☐ Istio routing for /ingestion/, /ga4gh/tes/
- ☐ OTEL + Prometheus scrape config
- ☐ CI → Harbor → Helm values
- ☐ Helm sub-chart in pa-platform
F.2 Metis (M1)
- ☐ Nextflow gRPC plugin
- ☐ WES endpoint stabilisation
- ☐ DRS output registration post-execution
- ☐ trs-syncer nf-core plugin
- ☐ trs-cache-filer Nextflow validation
- ☐ TRS /tests endpoint
- ☐ Frontend: workflow selector + dataset browser + result viewer
- ☐ OTEL integration
F.3 pa-analysis-agent (M2)
- ☐ CNPG migrations (analysis_jobs, analysis_results, publish_artefacts, publish_artefact_items)
- ☐ FastAPI scaffold
- ☐ pa-bio:3.11 + bioconductor:3.18 images → Harbor
- ☐ Methodology matrix YAML + constrained parser
- ☐ Validation framework + CI test suite
- ☐ RabbitMQ consumer (workflow.complete) + publisher (artefact.ready)
- ☐ MinIO credentials (staging + artefacts)
- ☐ OTEL + Prometheus
- ☐ Helm sub-chart + Keycloak client
- ☐ Frontend: analysis UI + publish section
Appendix G: Deployment Infrastructure
GitOps and Sync Waves
All 30+ services are managed by ArgoCD via the App-of-Apps pattern. Terraform handles one-time bootstrap (cluster, service mesh, GitOps controller, messaging operator). Helm charts with environment-specific values files. 9 ordered sync waves:
- Wave 1: cert-manager
- Wave 2: Operators (CloudNativePG, MongoDB, RabbitMQ)
- Wave 3: Infrastructure (Keycloak, PostgreSQL, Redis, MinIO/SeaweedFS, RabbitMQ clusters)
- Wave 4: Platform services (IAM, Gateway, domain services)
- Wave 5: RAG/AI services (Qdrant, RMS, GAS, agent teams)
- Waves 6–9: Progressive application layers
CI/CD: image tag propagation → infra repo webhook → ArgoCD detect → auto-deploy with exponential backoff retry (500 attempts).
Preview Environments (ADR-13)
Preview environments are auto-provisioned by GitHub Actions on preview/ branches: Talos cluster on cloud → Terraform bootstrap (mesh, ArgoCD, RabbitMQ Operator) → full platform via ArgoCD → auto-cleanup on branch deletion. Each preview gets isolated Terraform state. Provisioning completes within minutes.
Constraints: concurrent preview environments limited to 3 via GitHub Actions concurrency groups. Auto-teardown after 48-hour inactive TTL. Monthly cloud budget tracked with alerts at 80%.
Deployment Tiers
| Tier | Infrastructure | Sync Mode | Purpose |
|---|---|---|---|
| Stable | kubeadm | Manual gate | Production |
| Staging | kubeadm | Continuous | Integration testing |
| Dev | kubeadm | Continuous | Development (relaxed constraints) |
| Preview | Talos | Ephemeral per-branch | Feature isolation |
Appendix H: Ingestion Design Q&A
Key decisions from the team design review:
- Pre-flight estimates required, explicit acknowledgement before download
- Partial success first-class — per-file status, don't abort whole job
- Auto-retry 2–3× with backoff; distinguish transient vs source errors
- Cohort editing post-ingestion supported; incremental re-ingestion without re-download
- Group reassignment possible without re-downloading
- Background jobs, async notification; all progress info (%, count, ETA)
- 6+ hour downloads acceptable; always offer free tier
- Human-readable errors only; never raw stack traces
- Re-runs with different parameters tracked distinctly
Full transcript: v4.0 Appendix C.
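Two of these decisions, bounded retries that distinguish transient from source errors and partial success as a first-class job state, fit together as in the sketch below. The exception classes, status strings, and delay constants are hypothetical; `max_attempts=3` gives the "retry 2–3×" behaviour (up to two retries after the first attempt).

```python
import random
import time


class TransientError(Exception):
    """Network blips, throttling, 5xx responses: worth retrying."""


class SourceError(Exception):
    """Missing or withdrawn accession, corrupt source file: retrying will not help."""


def download_with_retry(fetch, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Retry transient failures with exponential backoff and jitter; never retry source errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))


def run_job(files, fetch_file, **retry_opts):
    """Per-file status; one failed file never aborts the whole job."""
    status = {}
    for name in files:
        try:
            download_with_retry(lambda: fetch_file(name), **retry_opts)
            status[name] = "completed"
        except TransientError:
            status[name] = "failed_transient"
        except SourceError:
            status[name] = "failed_source"
    if all(s == "completed" for s in status.values()):
        job = "completed"
    elif any(s == "completed" for s in status.values()):
        job = "partially_completed"
    else:
        job = "failed"
    return job, status
```

Keeping the two failure kinds apart is what lets the UI give a human-readable reason: "try again later" versus "this accession cannot be downloaded".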
Appendix I: RO-Crate Metadata Structure
{
  "@context": "https://w3id.org/ro/crate/1.2/context",
  "@graph": [
    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2"},
      "about": {"@id": "./"}
    },
    {
      "@type": "Dataset",
      "@id": "./",
      "name": "{user-provided}",
      "description": "{user-provided}",
      "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
      "datePublished": "2026-07-01",
      "author": {"@id": "#author-1"},
      "hasPart": [
        {"@id": "notes/{note_id}.jsonl"},
        {"@id": "datasets/{drs_id}/{filename}"},
        {"@id": "runs/{run_id}/outputs/"},
        {"@id": "workflows/{trs_id}/"}
      ]
    },
    {
      "@type": "Person",
      "@id": "#author-1",
      "name": "{from Keycloak}",
      "identifier": "https://orcid.org/..."
    }
  ]
}
Output ZIP structure:
data-artefact-{date}-{id}.zip
├── ro-crate-metadata.json
├── notes/
├── datasets/
│   └── {drs_object_id}/{filename}
├── runs/
│   └── {run_id}/outputs/
└── workflows/
    └── {trs_tool_id}/
Appendix J: Methodology Decision Matrix
methodology_matrix:
  bulk_rnaseq:
    counts:
      - condition: "n_min >= 3"
        primary: DESeq2
        alternative: edgeR
      - condition: "n_min < 3"
        primary: limma-voom
        alternative: edgeR
  # Feature selection follows the same pattern:
  # feature_selection:
  #   high_dimensional:
  #     - condition: "n_features > 1000"
  #       primary: MRMR
  #       alternative: Boruta
  #     - condition: "n_features <= 1000"
  #       primary: SHAP
  #       alternative: Boruta
Conditions are evaluated by a constrained recursive-descent parser. Accepted tokens: whitelisted variable names (n_min, n_total, n_group_a, n_group_b, n_features), numeric literals, comparison operators (<, >, <=, >=, ==). eval() is prohibited. Unparseable conditions are rejected at config load time.
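Selecting a method from this matrix can be sketched as follows. To keep the focus on row selection, this sketch uses a simplified full-match regex (`operand OP operand`) in place of the recursive-descent parser, and it assumes a two-group comparison in which `n_min` is the smaller group size and rows are checked in order with the first match winning; none of these semantics is confirmed by the text above. `load_matrix` and `select_method` are hypothetical names.

```python
import operator
import re

# The matrix above, as it would look after YAML loading.
METHODOLOGY_MATRIX = {
    "bulk_rnaseq": {
        "counts": [
            {"condition": "n_min >= 3", "primary": "DESeq2", "alternative": "edgeR"},
            {"condition": "n_min < 3", "primary": "limma-voom", "alternative": "edgeR"},
        ],
    },
}
ALLOWED = {"n_min", "n_total", "n_group_a", "n_group_b", "n_features"}
OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
       ">=": operator.ge, "==": operator.eq}
_OPERAND = r"([A-Za-z_]\w*|\d+(?:\.\d+)?)"
_COND = re.compile(rf"^\s*{_OPERAND}\s*(<=|>=|==|<|>)\s*{_OPERAND}\s*$")


def compile_condition(text: str):
    """Turn 'operand OP operand' into a predicate, or raise at load time."""
    m = _COND.match(text)
    if m is None:
        raise ValueError(f"unparseable condition: {text!r}")
    left, op, right = m.groups()
    for tok in (left, right):
        if not tok[0].isdigit() and tok not in ALLOWED:
            raise ValueError(f"variable {tok!r} is not whitelisted")

    def value(tok, ctx):
        return float(tok) if tok[0].isdigit() else ctx[tok]

    return lambda ctx: OPS[op](value(left, ctx), value(right, ctx))


def load_matrix(matrix: dict) -> dict:
    """Compile every row up front, so one bad condition rejects the whole config."""
    compiled = {}
    for assay, by_type in matrix.items():
        for data_type, rows in by_type.items():
            compiled[(assay, data_type)] = [
                (compile_condition(r["condition"]), r["primary"], r["alternative"])
                for r in rows
            ]
    return compiled


def select_method(compiled: dict, assay: str, data_type: str, group_a: int, group_b: int):
    """Rows are checked in order and the first match wins (assumed semantics)."""
    ctx = {"n_group_a": group_a, "n_group_b": group_b,
           "n_min": min(group_a, group_b), "n_total": group_a + group_b}
    for predicate, primary, alternative in compiled[(assay, data_type)]:
        if predicate(ctx):
            return primary, alternative
    raise LookupError(f"no matrix row matches {ctx}")
```

With four and three replicates this returns DESeq2 (primary) with edgeR as the alternative; with two and five it falls through to limma-voom.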