PacificAnalytics/unified-roundtrip-roadmap

PA Platform Roadmap v5.2

Status: Version 5.2 — March 2026
Audience: Leadership, engineering leads, procurement reviewers
Supersedes: All prior Roadmap versions


Executive Summary

What We Are Building

The PA platform is a sovereign operating system for scientific discovery. It connects data ingestion, workflow execution, AI-driven analysis, and research publishing into a single integrated pipeline — enabling biomedical researchers to move from a research question to a statistically validated, citable result without leaving the platform.

Product Strategy

Two deployment models, sequenced by market readiness:

  • PA Cloud — Hosted on trusted European cloud providers (Exoscale/SWITCH in Switzerland, Nebius in the Netherlands, Genesis Cloud in Germany, Scaleway in France). Targets individual researchers and labs. Sovereignty through choice of national cloud provider. Primary development focus for the first 6 months (M1–M3).
  • PA On-Prem — Sovereign deployment on institutional infrastructure. Full compliance and data security scope. From M4 onward, every feature ships for both Cloud and On-Prem simultaneously; what differs is the deployment modality, compliance posture, and governance layer.

Platform Verticals

The platform is organised around five verticals (see Atlas v4.0 for full strategic context). Each vertical builds on the previous one:

Vertical | Description | First Appears
V1 Biomedical Research OS | Ingest → Execute → Analyse → Publish | M1 (Alpha)
V2 Project Presence | Persistent AI research companion | M3 (foundation)
V3 Sovereign Enterprise Suite | Secure productivity (files, chat, docs, video) | M4 (foundation)
V4 Government OS | Federated analytics, compliance, multi-jurisdiction | M5 (foundation)
V5 DeSci Marketplace | On-chain research IP, data exchange protocol | M6 (foundation)

Development Principle

Every capability ships in its most basic viable form at the milestone where it first appears. Refinement is driven by user feedback, not speculative feature completeness. This applies across all verticals and both products. The scope of M4–M6 is intentionally higher-level — it will be shaped by what we learn from real users in M1–M3.

Milestones (Bimonthly, 12 Months)

Milestone | Target | Primary Product | Vertical Focus
M1 Alpha | April 6, 2026 | Cloud | V1 foundation
M2 Beta-1 | June 1, 2026 | Cloud | V1 full pipeline
M3 Beta-2 | August 3, 2026 | Cloud | V1 hardening + V2 foundation
M4 | October 6, 2026 | Cloud + On-Prem | V2 companion + V3 foundation
M5 | December 1, 2026 | Cloud + On-Prem | V3 productivity + V4 foundation
M6 V1 Full | March 2, 2027 | Cloud + On-Prem | V4 government + V5 foundation

Team: 6 (4 backend, 1 frontend, 1 DevOps) + bioinformatics part-time through M3. Seed funding and team expansion to 10–14 happen in October 2026; the larger team is available from M4 onward.

Key Risks

  1. Ares DRS readiness — Multiple services depend on it; status is "under discussion." Fallback: a PostgreSQL-backed DRS shim if unresolved by the pre-sprint deadline.
  2. LLM-generated analysis correctness — Plausible but wrong results are worse than crashes. Three-stage validation framework defined.
  3. Manual S3 URI entry UX — Error-prone. A file picker and client-side validator mitigate this. DRS URI resolution is promoted if manual entry becomes a top-3 adoption blocker.

1. Vision and Strategy

1.1 The Research Pipeline

The platform enables researchers to move from a research question to a statistically validated answer through an integrated pipeline, then publish that work as a citable, portable data artefact. Every file at every stage is a DRS object. Publishing is the exit point.

1.2 The Workspace Model

The platform is a workspace, not a repository. Metadata burden does not fall on the researcher during active work. Files are ingested without annotation, cohorts are assembled without rigid labels, workflows run without provenance overhead. When it is time to publish, the Notes ELN entries are the primary narrative thread; datasets, runs, and results are the supporting evidence.

1.3 Labels as Soft Associations

Metadata labels (cohort_arm, contrast_label, assay_type) are soft associations on DRS objects, not structural constraints. The same dataset can participate in multiple contrasts without duplication. Labels are resolved at runtime by the Cohorts Service, not baked in at ingestion. This avoids data redundancy, supports overlapping experimental designs, and keeps ingestion decoupled from analysis context.
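
The soft-association idea can be illustrated with a minimal sketch. It assumes labels are kept in a store beside the DRS objects and looked up at query time. The class names `DrsObject` and `LabelStore` and their methods are hypothetical and are not the Cohorts Service API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DrsObject:
    """Hypothetical stand-in for a registered DRS object."""
    drs_id: str
    name: str


@dataclass
class LabelStore:
    """Hypothetical label store: labels live beside objects, not inside them."""
    # Maps (drs_id, label_key) -> set of label values.
    _labels: dict = field(default_factory=dict)

    def attach(self, drs_id, key, value):
        # Attaching a label never copies or mutates the underlying object.
        self._labels.setdefault((drs_id, key), set()).add(value)

    def resolve(self, objects, key, value):
        # Resolution happens at query time, so one object can match many contrasts.
        return [
            obj for obj in objects
            if value in self._labels.get((obj.drs_id, key), set())
        ]


objects = [DrsObject("drs://a", "s1.fastq.gz"), DrsObject("drs://b", "s2.fastq.gz")]
store = LabelStore()
store.attach("drs://a", "contrast_label", "traveller_vs_local")
store.attach("drs://a", "contrast_label", "early_vs_late")
store.attach("drs://b", "contrast_label", "traveller_vs_local")
```

Here `drs://a` participates in two contrasts without being duplicated, and ingestion never needs to know about either contrast.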

1.4 Existing Foundation

The platform builds on a substantial existing base: the Medical Data Index (MDI) with billions of curated SRA metadata points; 1,600+ catalogued nf-core modules across 10 biological domains; a Notes service providing an immutable, cryptographically chained Electronic Lab Notebook; a multi-agent AI system with gateway routing, hypothesis generation, policy analysis, and natural-language database access; three-tier multi-tenancy with OPA enforcement; zero-trust networking with Istio ambient mTLS; four deployment tiers including ephemeral preview environments; and a white-labelled frontend with 9 feature modules. The full service inventory is in Appendix D.

1.5 Anchor Use Case: MRSA Surveillance

A researcher tests whether MRSA strains from international pilgrimage travellers are genetically distinct from locally acquired strains:

  1. Query MDI for Staphylococcus aureus WGS data filtered by traveller status.
  2. Assemble two cohort groups via the Cohort Builder.
  3. Ingest relevant SRR accessions; files register as DRS objects.
  4. Assemble a samplesheet in the in-platform editor, then submit an nf-core WGS pipeline via Metis WES.
  5. Analysis agent runs cohort-level SNP comparison in a sandboxed container.
  6. Record decisions in the Notes ELN throughout.
  7. Publish cohort, run, results, and notes as an RO-Crate.

2. Milestones

2.0 Milestone 0: Resolve Pre-Sprint Blockers

# | Blocker | Action | Owner | Deadline
B1 | Ares DRS readiness | Confirm operational by M1 Week 2, or commit to PostgreSQL-backed DRS shim | Architecture lead | Pre-sprint
B2 | Notes chain export | Confirm GET /notes/:id/chain is implemented; if not, estimate effort and assign | Notes team | Pre-sprint
B3 | API contracts for samplesheet editor | Agree endpoint shapes C-1 through C-5 (see Anurag's Researcher Workflow epic) | Anurag + backend lead | M2 Week 1
B4 | Alpha hard deadline | April 6, 2026 — deployed and accessible to invited external users | Boris | Confirmed

2.1 M1: Alpha — April 6, 2026 (~6 Weeks)

Vertical: V1 Biomedical Research OS (foundation)
Product: PA Cloud
Goal: A researcher can ingest public data, run a Nextflow pipeline, and see registered outputs. Minimum viable proof that the pipeline works end-to-end.

No compliance work in M1. The cloud deployment targets European providers where infrastructure-level security is handled by the hosting provider.

Scope

In:

  • SRA data ingestion — FASTQ download (AWS Open Data, SRA Toolkit fallback), Upgate upload, DRS registration with source metadata, pre-flight size estimation with user acknowledgement, partial job status (per-file tracking, auto-retry with backoff), controlled-access flagging
  • nf-core DSL2 workflow execution via Metis WES on Kubernetes — output DRS registration via direct path (batch POST /objects), platform-controlled --outdir
  • RabbitMQ event topology — pipeline_events exchange, pipeline.ingestion.complete and pipeline.workflow.complete routing
  • Frontend — ingestion UI (accession input, pre-flight, per-file progress), workflow selector (TRS query), dataset browser with copyable S3 URIs and "Copy all" bulk action, config upload with client-side S3 URI validator, run status dashboard with 10–15s polling, result browser with download links
  • Cloud deployment on European provider (Exoscale or equivalent), accessible to invited users

Out (deferred to M2+):

  • Analysis agent, publishing module, samplesheet editor, oCIS, accounting, compliance/Wazuh, Agentic OS
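
The S3 URI validator mentioned in the frontend scope could apply checks along the following lines. This is a sketch of the validation rules only, written in Python for illustration; the function name and error messages are assumptions, and the real validator runs client-side in the frontend. The bucket pattern follows the general S3 naming rules (3 to 63 characters, lowercase letters, digits, dots, and hyphens, starting and ending with a letter or digit).

```python
import re

# General S3 bucket naming rule: 3-63 chars, lowercase alphanumerics, dots,
# hyphens, and an alphanumeric first and last character.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def validate_s3_uri(uri: str) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    if not uri.startswith("s3://"):
        return ["URI must start with s3://"]

    errors = []
    if any(ch.isspace() for ch in uri):
        errors.append("URI must not contain whitespace")

    # Everything up to the first slash is the bucket, the rest is the key.
    bucket, _, key = uri[len("s3://"):].partition("/")
    if not _BUCKET_RE.match(bucket):
        errors.append(f"invalid bucket name: {bucket!r}")
    if not key:
        errors.append("missing object key after the bucket name")
    return errors
```

Returning every problem at once, rather than failing on the first, lets the UI annotate a pasted config in a single pass.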

Timeline

Week | Backend (2 engineers) | Backend (2 engineers) | Frontend | DevOps
1 | Orchestrator scaffold (FastAPI, pa-auth, CNPG migrations, Redis heartbeat) | Nextflow gRPC plugin — TRS/DRS resolution, DSL2 param separation | - | Cloud environment provisioning
1–2 | SRA download manager (accession resolution via MDI, dedup, AWS Open Data + fallback) | Nextflow plugin — --outdir enforcement. Target: "Hello World" acceptance | Ingestion UI: accession input, pre-flight, progress | K8s namespace, ServiceAccount, RBAC
2–3 | Upgate integration, DRS registration, pre-flight estimation | trs-syncer nf-core plugin, trs-cache-filer validation | Dataset browser with S3 URIs, bulk copy | RabbitMQ exchange + bindings
3–4 | RabbitMQ events, partial job status, controlled-access flagging | S3 provisioning + DRS output registration (batch). WES endpoint stabilisation. | Workflow selector, config upload with URI validator | OTEL instrumentation (Go + Python)
5 | TRS /tests for top 20 modules | TES engine (trusted + sandboxed profiles), K8sJobWatcher | Run status dashboard, result browser | -
6 | Integration testing, bug fixes | End-to-end demo on real SRA data | UX polish | Deploy to cloud

Critical-Path Tasks

Nextflow gRPC Plugin for Metis

  • TRS URI resolution (workflow repo) and DRS URI resolution (top-level config files submitted via WES)
  • DSL2 parameter separation: workflow_params → -params-file, resource overrides → -c
  • Platform-controlled --outdir enforcement (CLI flags override user-provided outdir)
  • Does not parse contents of user-provided params or sample sheets for embedded DRS URIs
  • Acceptance: "Hello World" module runs via POST /runs → successful k8s pod execution
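
The parameter separation and --outdir enforcement described above can be sketched as a command builder. This is an illustration under stated assumptions, not the plugin's implementation: the function name, file names, and argument names are hypothetical. It relies on documented Nextflow behaviour, namely that -params-file accepts a JSON file, -c adds a config file, and parameters given on the command line take precedence over the params file.

```python
import json
import tempfile
from pathlib import Path


def build_nextflow_command(workflow, workflow_params, resource_config,
                           platform_outdir, staging_dir):
    """Assemble a `nextflow run` argv with params and overrides kept apart."""
    staging = Path(staging_dir)

    # Pipeline parameters go into a JSON file passed via -params-file.
    # A user-supplied outdir is dropped; the platform decides the output path.
    params = {k: v for k, v in workflow_params.items() if k != "outdir"}
    params_file = staging / "params.json"
    params_file.write_text(json.dumps(params, indent=2))

    # Resource overrides (Nextflow config syntax) go into a file passed via -c.
    config_file = staging / "overrides.config"
    config_file.write_text(resource_config)

    return [
        "nextflow", "run", workflow,
        "-params-file", str(params_file),
        "-c", str(config_file),
        # Command-line parameters win over the params file.
        "--outdir", platform_outdir,
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(build_nextflow_command(
            "nf-core/rnaseq",
            {"input": "s3://example-bucket/samplesheet.csv", "outdir": "/tmp/x"},
            "process { cpus = 2 }",
            "s3://example-bucket/projects/p1/runs/r1",
            tmp,
        ))
```

Consistent with the scope note above, the sketch never opens or parses the samplesheet itself; it only controls how parameters reach Nextflow.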

Project-Scoped S3 Provisioning + DRS Output Registration

  • Generate S3 output paths from OPA project context
  • Post-execution: recursive --outdir parse, compute sizes + MD5 checksums, batch POST /objects to Ares
  • Acceptance: Workflow outputs in correct S3 location AND registered in DRS
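
The post-execution step (walk the output directory, compute sizes and MD5 checksums, group into batches) might look like the sketch below. The record fields mirror the GA4GH DRS object shape (name, size, checksums, access_methods), but the exact request schema of Ares' POST /objects is an assumption, as are the function names and the batch size. The HTTP call itself is left out.

```python
import hashlib
from pathlib import Path


def md5_hex(path, chunk_size=1024 * 1024):
    """Stream the file so large outputs do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_output_objects(outdir, s3_prefix):
    """Recursively describe every file under outdir as a DRS-style record."""
    root = Path(outdir)
    prefix = s3_prefix.rstrip("/")
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        records.append({
            "name": rel,
            "size": path.stat().st_size,
            "checksums": [{"type": "md5", "checksum": md5_hex(path)}],
            "access_methods": [
                {"type": "s3", "access_url": {"url": f"{prefix}/{rel}"}}
            ],
        })
    return records


def in_batches(items, batch_size):
    """Yield consecutive slices so each POST carries a bounded payload."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Each batch from `in_batches(records, 100)` would then become the body of one registration request; the value 100 is illustrative only.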

WES Run Management Endpoint Stabilisation

  • Cursor-based pagination, correct lifecycle states (QUEUED, RUNNING, COMPLETE, SYSTEM_ERROR)
  • CancelRun → explicit 501 Not Implemented
  • Acceptance: GET /runs/{id} reflects correct state
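
Cursor-based pagination over runs can be sketched as follows. The `page_size`, `page_token`, and `next_page_token` names follow the GA4GH WES ListRuns convention; encoding the cursor as the base64 of the last returned run_id, and holding runs in an in-memory list, are assumptions made for illustration.

```python
import base64
from enum import Enum


class RunState(str, Enum):
    """The lifecycle states named in the M1 scope."""
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    COMPLETE = "COMPLETE"
    SYSTEM_ERROR = "SYSTEM_ERROR"


def list_runs(runs, page_size, page_token=None):
    """Return one page of runs plus an opaque token for the next page."""
    ordered = sorted(runs, key=lambda run: run["run_id"])
    if page_token:
        # The cursor is the last run_id already returned to the client.
        last_id = base64.urlsafe_b64decode(page_token.encode()).decode()
        ordered = [run for run in ordered if run["run_id"] > last_id]

    page = ordered[:page_size]
    has_more = len(ordered) > page_size
    next_token = (
        base64.urlsafe_b64encode(page[-1]["run_id"].encode()).decode()
        if has_more else ""
    )
    return {"runs": page, "next_page_token": next_token}
```

Unlike offset pagination, a cursor keyed on run_id stays stable when new runs are inserted while a client is paging.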

Exit Criteria

  1. Ingest 2+ SRA accessions → DRS objects in Ares (or shim)
  2. nf-core "Hello World" runs → k8s pod completes → outputs registered as DRS objects
  3. nf-core/rnaseq runs on minimal dataset with samplesheet
  4. Platform deployed to cloud and accessible to invited external users

2.2 M2: Beta-1 — June 1, 2026 (~8 Weeks)

Vertical: V1 Biomedical Research OS (full pipeline)
Product: PA Cloud
Goal: The full research pipeline works end-to-end: Ingest → Execute → Analyse → Publish. The samplesheet editor closes the cohort-to-pipeline gap. oCIS workspace is live.

Scope (Adds to M1)

Analysis and Publishing:

  • pa-analysis-agent — methodology selection (DESeq2, edgeR, limma-voom via constrained decision matrix), LLM code generation (Python and R), three-stage validation framework (pre-execution, post-execution, plausibility)
  • Feature selection (MRMR, Boruta, SHAP) — same infrastructure as DESeq2/edgeR: new rows in the decision matrix + new validation rules, no new services
  • LLM Sandbox Phases 1–2 (per Khoa's epic): pa-sandbox-mcp-server wrapping llm-sandbox, pre-built images (pa-bio:3.11, bioconductor:3.18), K8s namespace + NetworkPolicy, pa-db-mcp-server (read-only PostgreSQL, OPA gating), pa-storage-mcp-server (DRS resolution, presigned URLs), self-healing debugger (basic)
  • Publishing module — minimal metadata form, resource selection (notes primary), DRS URI resolution for packaging, RO-Crate 1.2 generation, ZIP to MinIO, background task with per-item tracking
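
For orientation, the metadata file at the heart of an RO-Crate can be generated along these lines. This is a minimal sketch, assuming the publishing module already knows each file's relative path, name, and size; the function and argument names are hypothetical, and full RO-Crate 1.2 conformance has requirements beyond what is shown here.

```python
def build_ro_crate_metadata(name, description, date_published, license_id, files):
    """Build the JSON-LD structure stored as ro-crate-metadata.json."""
    graph = [
        # Metadata descriptor: identifies this file and points at the root dataset.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2"},
            "about": {"@id": "./"},
        },
        # Root dataset: the published artefact as a whole.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_id},
            "hasPart": [{"@id": f["path"]} for f in files],
        },
    ]
    # One File entity per packaged item (notes, datasets, results).
    for f in files:
        graph.append({
            "@id": f["path"],
            "@type": "File",
            "name": f["name"],
            "contentSize": str(f["size"]),
        })
    return {"@context": "https://w3id.org/ro/crate/1.2/context", "@graph": graph}
```

Serialising the returned dictionary with `json.dumps(..., indent=2)` into `ro-crate-metadata.json` at the root of the ZIP gives the crate its entry point.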

Samplesheet Editor (per Anurag's Researcher Workflow epic):

  • Cohort export → in-platform spreadsheet editor (AG Grid or equivalent)
  • File path and metadata picker panel integrated into editor
  • Save samplesheet as project file (DRS-registered, reusable)
  • Pipeline selection + "Save & Run" submission from editor toolbar
  • Run list with polling, run detail with human-readable error translation, output browser

oCIS Workspace:

  • oCIS v8 with s3ng driver on MinIO, Keycloak SSO, OPA access control
  • Project Spaces model: each project maps to an oCIS Space
  • Basic file browsing, sharing, and organisation through web UI
  • Not on the pipeline critical path but required for the workspace experience

Metadata Module (per Samyak's epic, partial):

  • Workstream 1 begin: unconstrained SRA fetcher with checkpointing, Postgres loader optimisation
  • Workstream 2: dynamic query builder, filter API endpoints
  • Workstreams 3–4 (Contrast data model, Ares integration) deferred until post-M3 clarity on user flow. The distinction between intra- and inter-cohort contrasts needs further design; saving sub-cohorts (original filters + additional filters) may be sufficient. Ares integration also deferred pending readiness confirmation.

Timeline

Weeks | Backend (Analysis) | Backend (Platform) | Frontend | DevOps
7–8 | Agent scaffold. Methodology selector + constrained parser. Code generator. | LLM Sandbox Phase 1 (spike, images, pa-sandbox-mcp-server, NetworkPolicy). | Samplesheet editor (AG Grid eval, edit/add/delete). Cohort export button. | oCIS deployment (s3ng, Keycloak, OPA).
9–10 | Validation framework (3 stages). Sandboxed TES execution. RabbitMQ wiring. | LLM Sandbox Phase 2 (pa-db-mcp-server, pa-storage-mcp-server, MCP auth). | File path picker with metadata panel. Save samplesheet. | -
11–12 | Publishing module (form, selection, DRS resolution, RO-Crate, ZIP). | Self-healing debugger. E2E integration test (agent → MCP → sandbox → S3 → DB). | Pipeline catalogue. "Save & Run." Run list + detail + error translation. | -
13–14 | Feature selection (matrix rows + validation rules). Golden file CI tests. | Metadata Module WS1 begin (fetcher, checkpointing). WS2 (query builder, filter API). | Output browser. oCIS frontend integration. | Postgres loader optimisation.

Exit Criteria

  1. Full pipeline: Ingest → Execute → Analyse → Publish on MRSA anchor use case
  2. DESeq2 sandbox → results registered + validation passes
  3. Validation framework catches all known-bad test cases (golden file: pasilla dataset, Spearman > 0.9)
  4. Publish job → valid RO-Crate ZIP with correct metadata
  5. Samplesheet: export cohort → edit → insert paths via picker → Save & Run → pipeline executes
  6. oCIS: file browsing, sharing, and project Spaces operational

2.3 M3: Beta-2 — August 3, 2026 (~8 Weeks)

Verticals: V1 hardening + V2 Project Presence (foundation)
Product: PA Cloud (On-Prem preparation begins)
Goal: Platform ready for pilot UAT. Accounting live. Security baseline for procurement conversations. First elements of the AI research companion.

Scope (Adds to M2)

V1 Hardening:

  • Accounting/billing — usage metering (CPU-hours, GB-months, ingestion volume per tenant), Stripe for cloud, institutional PO for On-Prem pipeline, storage tier quotas, overage alerts, free academic tier, billing audit trail, usage dashboard
  • LLM Sandbox Phases 3–4 (per Khoa's epic): agent MCP client wiring, multi-step reasoning loop, bio prompt engineering (DESeq2, limma, scanpy; validate 5+ patterns), Monaco Editor workbench (syntax highlighting, runtime selector, run/stop, SSE log streaming, inline output rendering, .ipynb export)
  • Metadata Module WS1 continued/completed (full SRA ingestion pipeline), WS2 completed (filter API endpoints with dynamic counts)
  • ADR-10 benchmark (10M records in PostgreSQL, ClickHouse, DuckDB) — database engine decision
  • Load testing: 5 concurrent ingestions, 10 concurrent workflows, 3 concurrent analyses, 2 concurrent publishes. Targets: 500 MB/s ingestion, <5s workflow submission, <5s sandbox cold start, 10 GB artefact in <10 min
  • Failure injection: RabbitMQ drop, MinIO kill, Ares unavailable, ResourceQuota exhaustion, invalid S3 URIs
  • Documentation, API reference, deployment runbook
  • Pilot UAT with anchor partner
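
The metering units named above can be made concrete with a small arithmetic sketch. The definitions are assumptions, since the roadmap does not fix them: a CPU-hour is one core for one hour, and a GB-month is the time-weighted average of gigabytes stored over a nominal 730-hour month.

```python
HOURS_PER_MONTH = 730  # Assumed billing convention: 8,760 hours / 12 months.


def cpu_hours(cores, runtime_seconds):
    """One core for one hour is one CPU-hour."""
    return cores * runtime_seconds / 3600


def gb_months(usage_intervals, hours_in_month=HOURS_PER_MONTH):
    """usage_intervals is a list of (gigabytes_stored, hours_held) pairs."""
    gb_hours = sum(gb * hours for gb, hours in usage_intervals)
    return gb_hours / hours_in_month
```

For example, a 4-core task that runs for 30 minutes is metered as 2 CPU-hours, and holding 100 GB for half a month followed by 200 GB for the other half is metered as 150 GB-months.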

V2 Foundation (basic):

  • Persistent research memory — soul documents applied to per-researcher context, loaded at session start from Qdrant
  • Proactive monitoring seed — heartbeat cron for pipeline completion notifications (pipeline.analysis.complete → researcher notification via in-app or email, not yet Matrix)
  • Notes ELN AI integration as the first element of the research companion (contextual suggestions based on project history)

Security Hardening (On-Prem preparation):

  • Wazuh Phases 1–4 (cluster provisioning, TLS, agent rollout, log forwarding)
  • Alertmanager → calert wiring
  • Cinder encryption on ClickHouse volumes
  • mTLS on observability cluster
  • S3 cold storage with 12-month retention

Standards Framework

Capability | Evidence | Test Gate
Data at rest | Cinder encryption | Volume metadata confirms
Data in transit | Istio mTLS + internal mTLS | Cert validation
Alert delivery | Alertmanager → calert → Google Chat | Synthetic alert within 60s
Log retention | S3 cold storage, 12-month lifecycle | Oldest log ≥ 365 days
SIEM baseline | Wazuh Phases 1–4 | Dashboard: all agents connected
Access control | OPA on all services | Policy test suite passes
Audit trail | Billing + research provenance | Export covers 30 days
Analysis correctness | Three-stage validation | CI: all known-bad caught

Exit Criteria

  1. All M2 criteria hold
  2. Billing: at least one tenant metered, can generate invoice
  3. Monaco workbench: write/run code, view outputs, export .ipynb
  4. ADR-10 benchmark complete, database decision made
  5. Wazuh Phases 1–4 operational
  6. Encryption, mTLS, cold storage active
  7. Load test targets met, P0/P1 bugs resolved
  8. Pilot partner UAT sign-off
  9. V2: researcher receives pipeline completion notification without polling; soul document loaded per session

2.4 M4 — October 6, 2026 (~8 Weeks)

Verticals: V2 Project Presence (companion) + V3 Sovereign Enterprise Suite (foundation)
Product: Cloud + On-Prem
Goal: The AI research companion is operational. The sovereign productivity stack begins deployment. First On-Prem institutional deployment.

Team: Seed funding closes. Team expands from 6 to 10–14. New hires: +2 backend, +1 frontend, +1 ML/agent engineer, +1 DevOps, +1 bioinformatics full-time, +1 product/design optional.

Scope from M4 onward is shaped by user feedback from M1–M3. The items below represent the planned direction; specifics will be adjusted.

V2 Scope (basic)

  • pa-relay service — Matrix → LiteLLM bridge, session management, audit logging, NO_REPLY suppression
  • Researcher-specific soul documents with persistent memory (project history, active hypotheses, dataset context maintained across sessions via Qdrant)
  • Heartbeat cron loop — configurable per deployment: pipeline completion alerts, cohort match notifications, surveillance signal thresholds
  • Skills registry YAML + CLI wrappers — priority conversions: pa-drs-fetch, pa-policy-search, pa-cohort-query, pa-compliance-check

V3 Scope (foundation, basic)

  • Matrix — Deployment for institutional messaging and chat. Element as the client. Bridges to pa-relay for agent-accessible communication channels.
  • oCIS hardened — Document governance: fine-grained access controls, audit trails, retention policies. oCIS Spaces enforced per project/team.
  • Collaborative editing — Begin integration of OnlyOffice or Collabora via WOPI protocol in oCIS. Basic document and spreadsheet co-editing. Tool selection decision required early in M4.
  • Video conferencing — Evaluate and select: Jitsi Meet, Element Call (Matrix-native), or BigBlueButton. Deploy basic instance integrated with institutional auth (Keycloak SSO). Decision based on: sovereign deployability, Matrix integration quality, and institutional fit.

On-Prem

  • Wazuh Phases 5–7 (FIM, CVE scanning, CIS benchmarks, compliance modules)
  • Wazuh architecture documentation for procurement
  • First institutional On-Prem deployment preparation
  • Full SRA ingestion completion (149M records, using ADR-10 engine)

Infrastructure: MinIO → SeaweedFS Migration

MinIO Community Edition has been phased out, creating a licensing and business continuity risk for the platform. M4 begins the migration to SeaweedFS as the S3-compatible object storage layer.

  • Deploy SeaweedFS on private K8s
  • Migrate all services currently using MinIO: Upgate (file upload), Nextflow workDir, Velero backup, Harbor registry backend, DRS/presigned URLs, oCIS s3ng driver, publishing staging area
  • Validate S3 API compatibility across all integration points
  • Decommission MinIO

This migration is feasible in M4 due to the expanded team (10–14 engineers). Assign 1 backend + 1 DevOps engineer to the migration track in parallel with V2/V3 feature work.

Infrastructure: Storage Layer Autoscaling (On-Prem)

On-Prem deployments run on OpenStack without managed Kubernetes or load-balancer services (no Magnum/Octavia), so custom autoscaling is required:

  • SeaweedFS volume server horizontal scaling triggered by Prometheus capacity metrics
  • Cinder volume auto-expansion for persistent storage
  • K8s worker node autoscaling via OpenStack Nova (Terraform + Ansible triggered by Prometheus)
  • Capacity alerting integrated into the observability stack
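
The scaling trigger for the volume servers could be as simple as a sustained-threshold rule over capacity samples. This is a sketch of the decision logic only; the threshold, the number of consecutive samples, and the function name are assumptions, and fetching the samples from Prometheus and invoking Terraform or Ansible are out of scope.

```python
def should_scale_out(used_fraction_samples, threshold=0.8, sustained=3):
    """Scale out only when the last `sustained` samples all breach the threshold.

    Requiring consecutive breaches avoids reacting to a single spike.
    """
    recent = used_fraction_samples[-sustained:]
    return len(recent) == sustained and all(s >= threshold for s in recent)
```

With the defaults, the samples 0.50, 0.85, 0.90, 0.82 trigger a scale-out, while 0.90, 0.70, 0.90 do not.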

This work begins in M4 and may extend into M5 depending on the complexity of each institution's OpenStack environment.

Exit Criteria

  1. V2: researcher has persistent AI companion that remembers project context across sessions, receives proactive alerts
  2. V3: Matrix operational for team chat, collaborative editing functional (basic), video conferencing deployed
  3. On-Prem: first deployment environment provisioned with Wazuh active
  4. Full SRA queryable via ChatNexus
  5. SeaweedFS deployed and validated; at least one major service (Upgate or Nextflow workDir) migrated from MinIO

2.5 M5 — December 1, 2026 (~8 Weeks)

Verticals: V3 Sovereign Enterprise Suite (productivity) + V4 Government OS (foundation)
Product: Cloud + On-Prem
Goal: The full sovereign productivity stack is operational. Government deployment architecture is validated.

V3 Scope (operational)

  • Matrix: channels, threads, file sharing, search — operational for institutional use
  • Collaborative editing: OnlyOffice/Collabora hardened — version history, comment threads, review workflows
  • Video conferencing: operational, integrated with calendar/scheduling if applicable
  • Governance layer: role-based access, audit trails, retention policies, compliance reporting integrated across oCIS + Matrix + editing tools
  • AI-assisted document work: summarisation, classification, policy gap analysis within sovereign perimeter

V4 Scope (foundation, basic)

  • Federated analytics architecture — design and prototype the hub-and-spoke model for multi-institutional collaboration (building on PAHO syndromic surveillance pattern)
  • Multi-jurisdictional policy intelligence — extend Policy Team agent for cross-country regulatory analysis
  • Compliance reporting framework — HIPAA, GDPR, ISO 27001 templates and continuous monitoring
  • Multi-region deployment preparation (EU sovereign + MENA)

Platform

  • MinIO → SeaweedFS migration completion (if not finished in M4) and decommission
  • Storage layer autoscaling completion (SeaweedFS scaling, Cinder expansion, K8s node autoscaling on OpenStack)
  • DRS URI resolution inside workflow files (post-ADR-3 — if user feedback from M1–M3 confirms need)
  • Repository connectors (Zenodo, Dataverse) — basic publish-to-external
  • Incremental re-ingestion (download only delta when cohort updated)
  • CLI/API ingestion for power users
  • Metadata Module WS3–4 revisited: Contrast data model and Ares integration, if user feedback and design clarity warrant it

Exit Criteria

  1. V3: full productivity suite operational (files + chat + docs + video) under institutional governance
  2. V4: federated analytics prototype functional between two test environments
  3. At least one On-Prem institutional client in active deployment
  4. Multi-region: deployment architecture validated

2.6 M6: V1 Full — March 2027 (~12 Weeks)

Verticals: V4 Government OS (operational) + V5 DeSci Marketplace (foundation)
Product: Cloud + On-Prem
Goal: Full platform at production scale. Paying institutional clients. Agentic OS mature.

Team assumption: The seed-funded expansion from 6 to 10–14 (see §2.4) has taken place. Hires: +2 backend, +1 frontend, +1 ML/agent, +1 DevOps, +1 bioinformatics full-time, +1 product/design optional.

V4 Scope (operational)

  • Federated hub-and-spoke productised — municipality/institution nodes transmit aggregates only, raw data never leaves origin
  • Compliance reporting operational for HIPAA, GDPR, ISO 27001
  • Multi-region: at least one EU and one MENA deployment live
  • Air-gapped deployment validation for government contexts
  • Sovereign AI inference: full reasoning pipelines inside government security perimeter

V5 Scope (foundation, design + prototype only)

  • On-chain hypothesis registration — design, smart contract prototype, timestamp-based priority claim
  • Research data objects — DRS-registered datasets with on-chain provenance records
  • Federated data marketplace architecture — design for governed dataset access through protocol
  • Protocol fee model — design for transaction-based revenue on dataset access, hypothesis citation, federated analytics

Agentic OS (mature)

  • Validation Triad — Naysayer + ELO Rater + Risk Analyser, two-round deliberation, DRS audit artefact for every analysis output
  • Agent self-modification (constrained) — operational sections of soul.md modifiable with human confirmation, constitutional constraints hash-verified
  • Skills registry complete — all stateless MCP servers converted to CLI wrappers
  • Full heartbeat configurations operational: WHO/PAHO (30-min), Saudi MoH (daily), pharma (event-triggered)

Platform Capabilities

  • Custom workflows (non-nf-core, user-uploaded)
  • Live log streaming, cancel/resume for long-running pipelines
  • RAG-driven methodology selection (replaces decision matrix)
  • Workflow parameter UI generation from nextflow_schema.json

Exit Criteria

  1. V4: federated deployment operational between 2+ institutions
  2. V5: hypothesis registration prototype functional on testnet
  3. Agentic OS: heartbeat delivers proactive alerts for at least one live deployment; Validation Triad runs on analysis outputs
  4. At least 2 institutional clients with active billing
  5. Multi-region live
  6. Platform handles 50+ concurrent researchers without degradation

2.7 Dependency Graph

   ┌──────────────────────────────┐
   │  M0: BLOCKERS                │
   │  Ares confirmed/shimmed      │
   │  Notes chain endpoint        │
   │  Alpha hard deadline          │
   └──────────────┬───────────────┘

   ┌──────────────▼───────────────┐
   │  M1 ALPHA (Apr)              │
   │  V1: Ingest + Execute        │
   │  Cloud deployment            │
   │  ⚠ CRITICAL PATH:           │
   │  Nextflow gRPC plugin        │
   └──────────────┬───────────────┘

   ┌──────────────▼───────────────┐
   │  M2 BETA-1 (Jun)            │
   │  V1: + Analyse + Publish     │
   │  + Samplesheet editor        │
   │  + oCIS + LLM Sandbox 1–2   │
   │  + Metadata WS1–2 begin     │
   └──────────────┬───────────────┘

   ┌──────────────▼───────────────┐
   │  M3 BETA-2 (Aug)            │
   │  V1: hardening + billing     │
   │  V2: foundation (memory,     │
   │      heartbeat, ELN AI)      │
   │  + LLM Sandbox 3–4          │
   │  + Security baseline         │
   │  + Pilot UAT                 │
   └──────────────┬───────────────┘

   ┌──────────────▼───────────────┐
   │  M4 (Oct)                    │
   │  V2: companion (relay, souls,│
   │      skills, heartbeat)      │
   │  V3: foundation (Matrix,     │
   │      collab editing, video)  │
   │  + First On-Prem deploy      │
   │  + Full SRA (149M)           │
   └──────────────┬───────────────┘

   ┌──────────────▼───────────────┐
   │  M5 (Dec)                    │
   │  V3: operational (full       │
   │      productivity suite)     │
   │  V4: foundation (federated   │
   │      analytics, compliance)  │
   │  + Multi-region prep         │
   └──────────────┬───────────────┘

   ┌──────────────▼───────────────┐
   │  M6 V1 FULL (Mar 2027)      │
   │  V4: operational (gov OS)    │
   │  V5: foundation (DeSci)      │
   │  + Agentic OS mature         │
   │  + Paying clients            │
   └──────────────────────────────┘

Critical path through M1–M3: Ares readiness → SRA download + DRS registration (M1 Wk 1–2) → Nextflow gRPC plugin (M1 Wk 1–4) → DRS output registration (M1 Wk 3–5) → Analysis sandboxed execution (M2) → End-to-end integration (M3).


3. Blockers, Risks, and Open Questions

3.1 Pre-Sprint Blockers

See §2.0 (B1–B4).

3.2 Critical Risks

R1: LLM-Generated Analysis Correctness. Plausible but wrong results are worse than crashes. Mitigation: Three-stage validation, golden file CI, validation_failed as distinct status, researcher review always required. Residual: Novel failure modes not in validation rules; partially mitigated by Validation Triad (M6).

R2: Manual S3 URI Entry UX. Error-prone for non-technical users. Mitigation: Client-side validator, file picker panel (Anurag's epic FE-02-B), bulk copy. Trigger: If top-3 adoption blocker, promote DRS URI resolution.

R3: Nextflow gRPC Plugin (Schedule). The Nextflow plugin is the critical path. Mitigation: Assign the most experienced engineer, target "Hello World" acceptance first, and hold a weekly check-in with escalation.

R4: API Contracts Not Finalised. Samplesheet editor depends on 5 backend contracts (C-1 through C-5). Mitigation: Agree in M2 Week 1, frontend builds against mocks.

R5: V3 Tool Selection. Collaborative editing (OnlyOffice vs Collabora) and video conferencing (Jitsi vs Element Call vs BBB) require evaluation against sovereign deployability, Matrix integration, and institutional fit. Wrong choice means migration cost. Mitigation: Spike evaluation in M4 Week 1, decide before committing engineering time.

R5b: MinIO Community Edition Phased Out. MinIO CE is no longer available, creating licensing and business continuity risk. Every storage-dependent service (Upgate, Nextflow, Velero, Harbor, DRS, oCIS, publishing) is affected. Mitigation: SeaweedFS migration planned for M4 with expanded team. Validate S3 compatibility early. If SeaweedFS reveals incompatibilities, evaluate Garage or Ceph RGW as fallback.

3.3 Moderate Risks

R6: Workflow DSL2 compatibility — trs-syncer filters, TRS /tests, fallback allowlist.
R7: Output hijacking via publishDir — test with nf-core/rnaseq, batch DRS registration.
R8: Publishing large artefacts — background task, per-item tracking, 50 GB cap.
R9: Samplesheet column mismatch — fail and surface error (MVP), schema validation (post-MVP).
R10: Editor state loss — auto-save to browser storage every 30s.
R11: Metadata Module scale — 149M records may exceed PostgreSQL. ADR-10 benchmark in M3.
R12: V4 federated analytics complexity — hub-and-spoke with privacy-preserving aggregation is architecturally complex. PAHO deployment is the proof point but generalising it is non-trivial.

3.4 Open Questions

# | Question | Blocking? | Deadline
Q1 | Alpha hard deadline | M1 | Resolved: April 6, deployed for invited users
Q2 | Cloud provider selection (primary EU provider) | M1 deploy | M1 Wk 1
Q3 | Billing model — Stripe vs PO vs hybrid | M3 billing | M2 end
Q4 | oCIS scope for M2 — read-only browser or full workspace? | No | M2 Wk 7
Q5 | ADR-10 database engine (PostgreSQL vs ClickHouse vs DuckDB) | Metadata expansion | M3 Wk 19
Q6 | Contrast model design — sub-cohorts sufficient vs explicit contrasts? | Metadata WS3–4 | Post-M3
Q7 | Samplesheet generation per workflow type (column mapping) | Auto-generation | Post-M3
Q8 | Publishing module extraction to standalone service | No | Post-M2
Q9 | V3 collaborative editor: OnlyOffice vs Collabora | M4 | M4 Wk 1
Q10 | V3 video conferencing: Jitsi vs Element Call vs BBB | M4 | M4 Wk 1
Q11 | Monorepo (Nx) adoption timing across all services | No | Post-M1
Q12 | Istio ambient mode fallback — sidecar mode pre-configured? | No | M3
Q13 | TUS vs S3 presigned URL upload consolidation | No | Post-M3
Q14 | Wazuh Indexer storage growth rate — cold-tier from day one? | No | M3
Q15 | R language prompt engineering and validation — testing effort for Bioconductor workflows? R execution included from M2 but prompt quality needs validation. | No | M2
Q16 | MinIO → SeaweedFS migration — S3 compatibility gaps? Fallback to Garage or Ceph RGW if SeaweedFS insufficient? | M4 | M4 Wk 2

4. High-Level Architecture

4.1 Pipeline Overview

Hypothesis Generator
    │
    ▼
Cohorts Service ◄── MDI Postgres
    │
    ▼
┌──────────────────────────────────────────────────────────────────────┐
│  pa-pipeline-orchestrator                                            │
│                                                                      │
│  Ingest ──► Execute (Metis/WES) ──► event ──► pa-analysis-agent     │
│  (SRA → MinIO/DRS)  (Nextflow on k8s)        (DEA + publish)       │
└──────────────────────────────────────────────────────────────────────┘
    │                                                   │
    ▼                                                   ▼
MinIO ◄──► Ares (DRS)                        Results (DRS objects)
                                                        │
                                                        ▼
                                              Publishing Module
                                              (RO-Crate → ZIP → MinIO)

4.2 Two-Service Model

pa-pipeline-orchestrator — data ingestion and atomic task execution. Ingestion jobs, per-file tracking, TES task state. Exposes /ingestion/jobs and /ga4gh/tes/v1/tasks.

pa-analysis-agent — methodology selection, code generation, validation, publishing. Consumes pipeline.workflow.complete, publishes pipeline.analysis.complete and publish.artefact.ready. Exposes /analyses, /methodologies, /artefacts.

Metis (WES) — existing GA4GH workflow execution. TRS/DRS resolution, Nextflow on k8s, MongoDB for run state.

Notes (ELN) — existing immutable append-only chain. Chain export for publishing.

4.3 Event Topology

RabbitMQ is the sole durable event bus. Redis is used only for caching, heartbeats, and ephemeral pub/sub.

Exchange: pipeline_events (topic)
  ├── pipeline.ingestion.complete  → orchestrator
  ├── pipeline.workflow.complete   → analysis-agent
  ├── pipeline.analysis.complete   → notification service
  ├── publish.artefact.ready       → notification service
  └── drs.object.registered        → downstream consumers
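
A minimal sketch of how the bindings above resolve a routing key to its consumers. The binding table mirrors the listing; `topic_matches` and `consumers_for` are hypothetical helper names used only for illustration. In a real deployment the RabbitMQ topic exchange performs this matching itself, so this is a model of the behaviour rather than code the platform needs.

```python
# Illustrative model of the pipeline_events topic exchange bindings.
# Consumer names are copied from the topology listing; helpers are hypothetical.
BINDINGS = {
    "pipeline.ingestion.complete": ["orchestrator"],
    "pipeline.workflow.complete": ["analysis-agent"],
    "pipeline.analysis.complete": ["notification service"],
    "publish.artefact.ready": ["notification service"],
    "drs.object.registered": ["downstream consumers"],
}


def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""
    p_words = pattern.split(".")
    k_words = routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(p_words):
            return j == len(k_words)
        if p_words[i] == "#":
            # '#' may absorb any number of remaining words, including none.
            return any(match(i + 1, jj) for jj in range(j, len(k_words) + 1))
        if j == len(k_words):
            return False
        if p_words[i] == "*" or p_words[i] == k_words[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


def consumers_for(routing_key: str, bindings=BINDINGS) -> list:
    """Return every consumer whose binding pattern matches the routing key."""
    found = []
    for pattern, consumers in bindings.items():
        if topic_matches(pattern, routing_key):
            for consumer in consumers:
                if consumer not in found:
                    found.append(consumer)
    return found
```

For example, `consumers_for("pipeline.workflow.complete")` yields the analysis agent only, while a wildcard binding such as `pipeline.*.complete` would match all three pipeline completion events.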

4.4 LLM Inference

All inference on-premises. GPT-OSS 120B via vLLM (31.3 tokens/sec, 50 concurrent users). Embeddings: Qwen3-Embedding-0.6B (2×A100). No data leaves the sovereign environment.

4.5 oCIS Integration (M2+)

oCIS v8 provides the file workspace layer. s3ng storage driver on MinIO (shared storage backend with the pipeline). Keycloak SSO for authentication. OPA for access control consistent with the rest of the platform. Project Spaces model: each PA project maps to an oCIS Space, giving researchers a familiar file browsing and sharing experience.

oCIS is not on the pipeline critical path. The pipeline operates on DRS objects via MinIO; oCIS is the human-facing view of the same storage.

4.6 V3 Sovereign Productivity Stack (M4–M5)

The target architecture for V3 Sovereign Enterprise Suite combines four components, all self-hosted within the institutional or cloud environment:

  • oCIS — file management, project spaces, governance layer
  • Matrix (Element) — institutional messaging, channels, threads; bridges to pa-relay for agent interaction
  • OnlyOffice or Collabora — collaborative document/spreadsheet editing via WOPI protocol in oCIS (tool selection: Q9, decided M4 Wk 1)
  • Jitsi, Element Call, or BBB — video conferencing with Keycloak SSO (tool selection: Q10, decided M4 Wk 1)

All four share Keycloak for authentication and OPA for access policy. The governance layer (audit trails, retention, compliance reporting) spans all components.

4.7 Service Pattern

All PA services follow: FastAPI + Keycloak OIDC + structured logging (OpenTelemetry → SigNoz) + CNPG Postgres (RW/RO split) + Redis heartbeat + Harbor images + ArgoCD GitOps.


5. Use Cases

UC-1: Ingest Data from NCBI SRA (M1)

Orchestrator resolves accessions to SRR via MDI, deduplicates, surfaces pre-flight estimate, downloads from AWS Open Data (fallback: SRA Toolkit), uploads via Upgate, registers DRS objects. Partial success first-class. See §6.

UC-2: Execute a Nextflow Workflow (M1)

User selects nf-core workflow from TRS, copies S3 URIs from dataset browser (client-side validator), submits to Metis. Platform enforces --outdir. Outputs batch-registered. See §7.

UC-3: Assemble Samplesheet and Submit Run (M2)

Researcher exports cohort to in-platform editor. Platform pre-fills sample IDs and file paths. Researcher adds pipeline-specific columns, inserts file paths via picker panel, saves as project file, selects pipeline, clicks "Save & Run." Full spec: Anurag's Researcher Workflow epic.

UC-4: Run Statistical Analysis (M2)

Analysis agent selects method, generates code, validates (3 stages), submits sandboxed TES task, registers results. Full spec: §8 + Khoa's LLM Sandbox epic.

UC-5: Publish a Data Artefact (M2)

Minimal metadata form, resource selection, RO-Crate 1.2 ZIP packaging. Background task. See §9.


6. Data Ingestion — Design Detail

6.1 Download Strategy

Primary: AWS Open Data (no credentials, negligible cost). Fallback: SRA Toolkit Docker image. Accessions are always resolved to SRR via MDI first, with E-utils used only when an accession is not indexed. Paired-end: two DRS objects per accession.

6.2 Registration Paths

External uploads via Upgate (chunked, resumable → RabbitMQ → Ares). Internal outputs via direct path (POST /objects, milliseconds).

6.3 Ingestion Does NOT (M1–M2)

Assign cohort/contrast labels. Determine pipelines. Parse/validate file contents. Support controlled-access or non-SRA repositories.


7. Nextflow Execution — Design Detail

7.1 Tradeoffs

Manual S3 URIs. Client-side validator. WES via Metis. Native k8s (TES later). See ADR-2, ADR-3.

7.2 Output Capture

Recursive --outdir parse → sizes + MD5 → batch DRS registration → tag with pipeline/version/output_type/run_id/project_id → publish pipeline.workflow.complete.
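
The capture step can be sketched roughly as follows, assuming the run's `--outdir` is available as a local directory. The function names, the record shape, and the rule that derives `output_type` from the top-level subdirectory are all assumptions for illustration; the actual DRS registration call and the event publish are omitted.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large outputs are not loaded into memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture_outputs(outdir, *, pipeline, version, run_id, project_id):
    """Walk outdir recursively and build one registration record per file."""
    root = Path(outdir)
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root)
        records.append({
            "name": relative.as_posix(),
            "size": path.stat().st_size,
            "checksum_md5": md5_of(path),
            "tags": {
                "pipeline": pipeline,
                "version": version,
                # Assumption: output_type is the top-level subdirectory name.
                "output_type": relative.parts[0] if len(relative.parts) > 1 else "root",
                "run_id": run_id,
                "project_id": project_id,
            },
        })
    return records
```

The returned list is what a batch DRS registration would consume in a single request, after which `pipeline.workflow.complete` would be published.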


8. Analysis Agent — Design Detail

8.1 Methodology Selection

Config-driven decision matrix. DESeq2, edgeR, limma-voom. Feature selection (MRMR, Boruta, SHAP) follows the identical pattern: new matrix rows + validation rules. Conditions evaluated by constrained parser — whitelisted variables + numeric literals + comparisons only. eval() prohibited.

8.2 Validation Framework

Three stages: pre-execution (imports, paths, no network, cohort_arm used), post-execution (required columns, no NaN/Inf, padj in [0,1], plot valid), plausibility (p-value KS test, fold-change variance, sample count cross-check). validation_failed is distinct from failed. Golden file CI: pasilla dataset, Spearman > 0.9. Full spec: Khoa's LLM Sandbox epic.
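
As an illustration of the post-execution stage only, the checks on a results table might look like the sketch below. The column names follow common DESeq2 output but are an assumption here, as are the function names and the `completed` status label; only `validation_failed` comes from the text above. The plot check and the plausibility stage are not covered.

```python
import math

# Assumed column names (typical DESeq2 output); the real list lives in the spec.
REQUIRED_COLUMNS = ("gene_id", "log2FoldChange", "pvalue", "padj")


def validate_results(rows, required=REQUIRED_COLUMNS):
    """Return a list of human-readable validation errors (empty if clean)."""
    errors = []
    for index, row in enumerate(rows):
        missing = [col for col in required if col not in row]
        if missing:
            errors.append(f"row {index}: missing columns {missing}")
            continue
        for col in required:
            value = row[col]
            if isinstance(value, float) and not math.isfinite(value):
                errors.append(f"row {index}: {col} is NaN or Inf")
        padj = row["padj"]
        if isinstance(padj, (int, float)) and math.isfinite(padj) and not 0.0 <= padj <= 1.0:
            errors.append(f"row {index}: padj {padj} outside [0, 1]")
    return errors


def job_status(errors):
    """validation_failed is kept distinct from an execution failure."""
    return "validation_failed" if errors else "completed"
```

Returning the full error list rather than stopping at the first problem fits the `validation_errors` JSONB column in the `analysis_jobs` schema.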

8.3 LLM Sandbox

Four MCP servers: pa-sandbox-mcp-server, pa-db-mcp-server, pa-storage-mcp-server, pa-search-mcp-server. Pre-built images (pa-bio:3.11, bioconductor:3.18, pa-bio-ml:3.11) in Harbor with DaemonSet pre-pull. R included from M2. Self-healing debugger: PII strip → diagnose → web search → patch → retry (max 2). Full spec: Khoa's LLM Sandbox epic.


9. Publishing Module — Design Detail

Exit point. Packages workspace resources into RO-Crate 1.2 ZIP. Notes are primary narrative source. Create staging → resolve/download → fetch note chains → ro-crate-metadata.json → ZIP → MinIO → notify. Background task, per-item tracking, 50 GB cap. Lives within pa-analysis-agent for M2.


10. Operational Readiness

10.1 Compliance Timeline

Compliance is minimal for M1–M2 (cloud product, provider handles infrastructure security). Hardening begins in M3 for On-Prem preparation, completes across M4–M5.

| Item | Milestone | Product Target |
|---|---|---|
| Alertmanager → calert | M3 | On-Prem |
| Cinder encryption | M3 | On-Prem |
| mTLS observability | M3 | On-Prem |
| S3 cold storage 12-month | M3 | On-Prem |
| Wazuh Phases 1–4 | M3 | On-Prem |
| Wazuh Phases 5–7 (FIM, CVE, compliance) | M4 | On-Prem |
| Wazuh architecture documentation | M4 | On-Prem (procurement) |
| Full compliance reporting (HIPAA, GDPR, ISO 27001) | M5 | On-Prem |

10.2 Capacity Planning

Year 1 projected (10–20 researchers): 5–10 concurrent runs (burst 20), ~5 TB/month ingestion, ~85 TB MinIO Year 1, ~1,700–3,400 AUD/month storage, GPU at 10–30%, no cloud API spend for core pipeline.

Quotas per project: 16 CPU / 64 GB RAM / 500 GB ephemeral default. Per-workflow max 8 CPU / 32 GB RAM. Per-sandbox max 4 CPU / 8 GB RAM / 2h timeout.
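
A rough sketch of how these limits could be applied at submission time, using CPU and RAM only. The `Resources` type and `admit` function are hypothetical; ephemeral storage and the sandbox timeout are left out for brevity, and in practice Kubernetes ResourceQuota and LimitRange objects would likely enforce the same numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu: int
    ram_gb: int


# Figures taken from the quota paragraph above.
PROJECT_DEFAULT = Resources(cpu=16, ram_gb=64)
WORKFLOW_MAX = Resources(cpu=8, ram_gb=32)
SANDBOX_MAX = Resources(cpu=4, ram_gb=8)


def admit(request, per_task_max, in_use, quota=PROJECT_DEFAULT):
    """Check the per-task ceiling first, then the remaining project quota."""
    if request.cpu > per_task_max.cpu or request.ram_gb > per_task_max.ram_gb:
        return False, "exceeds per-task maximum"
    if (in_use.cpu + request.cpu > quota.cpu
            or in_use.ram_gb + request.ram_gb > quota.ram_gb):
        return False, "project quota exhausted"
    return True, "ok"
```

With the default project quota, two maximum-size workflows fit side by side; a third is rejected with the quota message rather than being queued silently.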

10.3 Testing Strategy

Functional progression (M1–M2): TES echo → ingest 2 accessions → Hello World → rnaseq → DESeq2 sandbox → E2E → publish.

Validation CI (M2+): Known-bad inputs/outputs, golden file (pasilla, Spearman > 0.9).

Load testing (M3): 5 ingestions, 10 workflows, 3 analyses, 2 publishes concurrent. Targets: 500 MB/s, <5s submission, <5s cold start, 10 GB in <10 min.

Failure injection (M3): RabbitMQ kill, MinIO kill, Ares unavailable, quota exhaustion, invalid URIs.

Frontend profiles (M3+): 4 curated (Full, Research, Surveillance, Minimal) in CI.


11. Agentic OS Layer

Distributed across milestones rather than big-bang:

| Component | Milestone | Notes |
|---|---|---|
| Soul documents (per-researcher) | M3 | V2 foundation: persistent memory |
| Pipeline heartbeat notifications | M3 | V2 foundation: basic proactive alerts |
| pa-relay (Matrix → LiteLLM) | M4 | V2 companion: messaging channel |
| Skills registry + CLI wrappers | M4 | V2 companion: lightweight tool access |
| Full heartbeat (multi-config) | M4–M5 | WHO/PAHO, Saudi MoH, pharma |
| Validation Triad | M6 | Two-round deliberation, DRS audit record |
| Agent self-modification | M6 | Constrained, human confirmation, hash check |

Appendix A: Architecture Decision Records

| ADR | Decision | Status |
|---|---|---|
| ADR-1 | Two services, not four | Active |
| ADR-2 | WES (Metis), native k8s backend | Active |
| ADR-3 | No DRS URI resolution in workflow files (MVP) | Active |
| ADR-4 | Dual file registration path | Active |
| ADR-5 | RabbitMQ as primary event bus | Active |
| ADR-6 | Decision matrix, constrained parser, no eval() | Active |
| ADR-8 | Nx monorepo, Bazel on pain points | Active |
| ADR-9 | Adopt agentic OS patterns, don't fork OpenClaw | Active |
| ADR-10 | Database for full SRA (149M). Benchmark M3. | Pending |
| ADR-11 | LLM Sandbox engine (llm-sandbox, MIT) | Active |
| ADR-12 | Three-tier multi-tenancy, OPA | Implemented |
| ADR-13 | Four-tier deployment environments | Implemented |
| ADR-14 | Zero-trust (Istio ambient, OPA sidecars) | Implemented |
| ADR-15 | Frontend 9 modules, runtime white-labelling | Implemented |
| ADR-16 | SIEM (Wazuh) | Planned (M3) |
| ADR-17 | MinIO → SeaweedFS migration. Deploy SeaweedFS, migrate all S3 consumers, validate compat, decommission MinIO. | Planned (M4) |
| ADR-18 | Storage layer autoscaling (On-Prem). Custom Terraform + Ansible scaling for SeaweedFS, Cinder, K8s nodes on OpenStack (no Magnum/Octavia). | Planned (M4–M5) |

Appendix B: Related Documents

| Document | Owner | Maps To |
|---|---|---|
| LLM Sandbox Epic (PA-SANDBOX) | Khoa | Phases 1–2 → M2, Phases 3–4 → M3 |
| Researcher Workflow Epic | Anurag | Samplesheet editor + run monitoring → M2 |
| Metadata Module Epic | Samyak | WS1–2 → M2/M3, WS3–4 deferred |
| Admin Docs Hub Epic | Alex | Independent track |
| Observability Cluster Summary | | Compliance detail for §10.1 |
| PA Atlas v4.0 | Boris | Strategic vision, 5 verticals, commercial model |
| Alpha Release Roadmap | CTO | Superseded by §2.1 |
| Platform Functionality Roadmap | Boris | Strategic framing, superseded by §2 |
| Milestones & v4→v5 Delta | Boris | Cross-reference driving this version |

Appendix C: Database Schemas

C.1 Pipeline Orchestrator

CREATE TABLE ingestion_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    target_dataset_id UUID NOT NULL,
    accessions JSONB NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',
    requested_by TEXT NOT NULL,
    error_message TEXT,
    created_at TIMESTAMPTZ DEFAULT now(),
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ
);

CREATE TABLE ingestion_files (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    ingestion_job_id UUID NOT NULL REFERENCES ingestion_jobs(id),
    accession TEXT NOT NULL,
    filename TEXT NOT NULL,
    drs_uri TEXT,
    status TEXT NOT NULL DEFAULT 'pending',
    size_bytes BIGINT,
    checksum_md5 TEXT,
    error_msg TEXT
);

CREATE TABLE tes_tasks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    analysis_job_id UUID,
    name TEXT,
    state TEXT NOT NULL DEFAULT 'QUEUED',
    security_profile TEXT NOT NULL DEFAULT 'trusted',
    submitted_by TEXT NOT NULL,
    inputs JSONB NOT NULL,
    outputs JSONB NOT NULL,
    executors JSONB NOT NULL,
    resources JSONB,
    code_bundle JSONB,
    logs JSONB,
    output_drs_uris JSONB,
    created_at TIMESTAMPTZ DEFAULT now(),
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ
);

C.2 Analysis Agent

CREATE TABLE analysis_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    hypothesis_id UUID,
    workflow_run_id TEXT NOT NULL,
    cohort_id UUID NOT NULL,
    assay_type TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    methodology JSONB,
    validation_errors JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    completed_at TIMESTAMPTZ,
    created_by UUID NOT NULL
);

CREATE TABLE analysis_results (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    analysis_job_id UUID NOT NULL REFERENCES analysis_jobs(id),
    tes_task_id UUID NOT NULL,
    method_name TEXT NOT NULL,
    method_role TEXT NOT NULL,
    filename TEXT NOT NULL,
    file_type TEXT NOT NULL,
    drs_uri TEXT NOT NULL,
    result_role TEXT NOT NULL,
    validation_status TEXT DEFAULT 'pending',
    validation_details JSONB,
    created_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE publish_artefacts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    name TEXT NOT NULL,
    description TEXT NOT NULL,
    license TEXT NOT NULL,
    date_published DATE NOT NULL DEFAULT CURRENT_DATE,
    status TEXT NOT NULL DEFAULT 'staging',
    minio_path TEXT,
    zip_size_bytes BIGINT,
    error_msg TEXT,
    created_at TIMESTAMPTZ DEFAULT now(),
    completed_at TIMESTAMPTZ
);

CREATE TABLE publish_artefact_items (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    artefact_id UUID NOT NULL REFERENCES publish_artefacts(id),
    resource_type TEXT NOT NULL,
    resource_id TEXT NOT NULL,
    display_name TEXT,
    zip_path TEXT,
    size_bytes BIGINT,
    status TEXT NOT NULL DEFAULT 'pending'
);

Appendix D: Existing Services

| Service | Language | Status |
|---|---|---|
| IAM | Python/FastAPI | Production |
| Upgate | Rust/Axum | Production |
| Ares (TRS/DRS) | Rust/Axum | Production |
| Metis (WES) | Rust | Production |
| zipRS | Python | Production |
| Notes (ELN) | | Production |
| GAS (Gateway) | Python | Production |
| ChatMDI/ChatNexus | Python | Production |
| Forge (Hypothesis) | Python | Production |
| Research Team | Python | Production |
| Policy Team | Python | Production |
| General Agent | Python | Production |
| RMS (RAG) | Python | Production |
| Frontend | React 18/TS | Production |
| pa-pipeline-orchestrator | Python/FastAPI | New (M1) |
| pa-analysis-agent | Python/FastAPI | New (M2) |

Appendix E: Frontend Architecture

Technology Stack

React 18, TypeScript 5.6, Vite 6. State management: Redux Toolkit + RTK Query with 14 backend service slices, redux-persist for workspace and upload state. UI: Radix UI + shadcn/ui + Tailwind CSS v3. Rich text: TipTap v3 (ProseMirror). Code editor: Monaco. Charts: Recharts. Auth: oidc-client-ts + Keycloak with cross-tab session sync. GA4GH components: Elixir Cloud Components (@elixir-cloud/*).

SSE Streaming Protocol

All AI-facing endpoints use Server-Sent Events with 8 event types: data-conversation, data-agent, text-start, text-delta, text-end, tool-output-available, error, finish. A 1.5-second thinking stall indicator fires during long reasoning steps. Frontend consumes via Vercel AI SDK v5 with 16 composable chat primitives.

White-Labelling

Single Docker image serves all environments. runtime-env.js generated at container startup injects: VITE_BRAND_NAME, VITE_PRIMARY_COLOR (OKLCH), logo URLs, backend service URLs, and VITE_ENABLED_MODULES. Three-tier tenancy (Org → Tenant → Project) with workspace auto-provisioning. Cache isolation: switching projects resets all 9 RTK Query service caches simultaneously.
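
To make the startup injection concrete, here is a small sketch of rendering a `runtime-env.js` file from container environment variables. The global name `__RUNTIME_ENV__`, the function name, and the choice of Python are assumptions for illustration; the real entrypoint may be a shell script and also injects logo and backend service URLs, which are omitted here because their variable names are not given.

```python
import json

# Variable names taken from the paragraph above; the list is not exhaustive.
INJECTED_KEYS = ("VITE_BRAND_NAME", "VITE_PRIMARY_COLOR", "VITE_ENABLED_MODULES")


def render_runtime_env(environ, keys=INJECTED_KEYS, global_name="__RUNTIME_ENV__"):
    """Render an allowlisted subset of the environment as a JS assignment."""
    values = {key: environ[key] for key in keys if key in environ}
    payload = json.dumps(values, indent=2, sort_keys=True)
    return f"window.{global_name} = {payload};\n"
```

Using an explicit allowlist means unrelated container variables, such as credentials, never reach the browser bundle.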

Feature Modules

| Module | Status |
|---|---|
| Datasets (DRS CRUD, file tree, downloads) | Production |
| Cohort Builder (faceted filtering, visualisation) | Production |
| Workflows (TRS registry, public browser) | Production |
| Runs (WES submission, monitoring, outputs) | Production |
| Research (multi-tab AI chat) | Production |
| RAG Knowledge Bases (upload, tag, publish) | Production |
| Notes (TipTap, autosave, PDF export, AI) | Production |
| Exploratory Analysis (JupyterLab → Monaco in M3) | Production |
| Sidebar (AI assistant + Notes, global) | Production |

Appendix F: Deployment Checklists

F.1 pa-pipeline-orchestrator (M1)

  1. ☐ CNPG migrations (ingestion_jobs, ingestion_files, tes_tasks)
  2. ☐ FastAPI scaffold (pa-auth, pa-logging, Redis heartbeat, Taskfile)
  3. ☐ Download pod image (ncbi/sra-tools + Upgate client)
  4. ☐ K8s namespace + ServiceAccount + RBAC
  5. ☐ MinIO credentials Secret
  6. ☐ RabbitMQ exchange pipeline_events + bindings
  7. ☐ NetworkPolicy for sandboxed TES tasks
  8. ☐ Istio routing for /ingestion/, /ga4gh/tes/
  9. ☐ OTEL + Prometheus scrape config
  10. ☐ CI → Harbor → Helm values
  11. ☐ Helm sub-chart in pa-platform

F.2 Metis (M1)

  1. ☐ Nextflow gRPC plugin
  2. ☐ WES endpoint stabilisation
  3. ☐ DRS output registration post-execution
  4. ☐ trs-syncer nf-core plugin
  5. ☐ trs-cache-filer Nextflow validation
  6. ☐ TRS /tests endpoint
  7. ☐ Frontend: workflow selector + dataset browser + result viewer
  8. ☐ OTEL integration

F.3 pa-analysis-agent (M2)

  1. ☐ CNPG migrations (analysis_jobs, analysis_results, publish_artefacts, publish_artefact_items)
  2. ☐ FastAPI scaffold
  3. ☐ pa-bio:3.11 + bioconductor:3.18 images → Harbor
  4. ☐ Methodology matrix YAML + constrained parser
  5. ☐ Validation framework + CI test suite
  6. ☐ RabbitMQ consumer (workflow.complete) + publisher (artefact.ready)
  7. ☐ MinIO credentials (staging + artefacts)
  8. ☐ OTEL + Prometheus
  9. ☐ Helm sub-chart + Keycloak client
  10. ☐ Frontend: analysis UI + publish section

Appendix G: Deployment Infrastructure

GitOps and Sync Waves

All 30+ services are managed by ArgoCD via the App-of-Apps pattern. Terraform handles one-time bootstrap (cluster, service mesh, GitOps controller, messaging operator). Helm charts with environment-specific values files. 9 ordered sync waves:

  1. cert-manager
  2. Operators (CloudNativePG, MongoDB, RabbitMQ)
  3. Infrastructure (Keycloak, PostgreSQL, Redis, MinIO/SeaweedFS, RabbitMQ clusters)
  4. Platform services (IAM, Gateway, domain services)
  5. RAG/AI services (Qdrant, RMS, GAS, agent teams)
  6–9. Progressive application layers

CI/CD: image tag propagation → infra repo webhook → ArgoCD detect → auto-deploy with exponential backoff retry (500 attempts).

Preview Environments (ADR-13)

Preview environments are auto-provisioned by GitHub Actions on preview/ branches: Talos cluster on cloud → Terraform bootstrap (mesh, ArgoCD, RabbitMQ Operator) → full platform via ArgoCD → auto-cleanup on branch deletion. Each preview gets isolated Terraform state. Provisioning completes within minutes.

Constraints: concurrent preview environments limited to 3 via GitHub Actions concurrency groups. Auto-teardown after 48-hour inactive TTL. Monthly cloud budget tracked with alerts at 80%.

Deployment Tiers

| Tier | Infrastructure | Sync Mode | Purpose |
|---|---|---|---|
| Stable | kubeadm | Manual gate | Production |
| Staging | kubeadm | Continuous | Integration testing |
| Dev | kubeadm | Continuous | Development (relaxed constraints) |
| Preview | Talos | Ephemeral per-branch | Feature isolation |

Appendix H: Ingestion Design Q&A

Key decisions from the team design review:

  • Pre-flight estimates required, explicit acknowledgement before download
  • Partial success first-class — per-file status, don't abort whole job
  • Auto-retry 2–3× with backoff; distinguish transient vs source errors
  • Cohort editing post-ingestion supported; incremental re-ingestion without re-download
  • Group reassignment possible without re-downloading
  • Background jobs, async notification; all progress info (%, count, ETA)
  • 6+ hour downloads acceptable; always offer free tier
  • Human-readable errors only; never raw stack traces
  • Re-runs with different parameters tracked distinctly

Full transcript: v4.0 Appendix C.
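
The retry decision above (bounded retries with backoff, and no retry for source-side errors) could be sketched as follows. The exception classes, function name, and delay values are hypothetical; how a real error is classified as transient or source is not specified here.

```python
import time


class TransientError(Exception):
    """Temporary failure (e.g. network timeout) that is worth retrying."""


class SourceError(Exception):
    """Failure at the source (e.g. accession not found); retrying will not help."""


def download_with_retry(fetch, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); retry transient failures with exponential backoff."""
    attempt = 0
    while True:
        try:
            return fetch()
        except SourceError:
            # Surface immediately so the file is marked failed without delay.
            raise
        except TransientError:
            if attempt >= max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

Because each file is retried independently, one exhausted retry budget marks that file as failed without aborting the rest of the job, which matches the partial-success decision.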


Appendix I: RO-Crate Metadata Structure

{
  "@context": "https://w3id.org/ro/crate/1.2/context",
  "@graph": [
    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2"},
      "about": {"@id": "./"}
    },
    {
      "@type": "Dataset",
      "@id": "./",
      "name": "{user-provided}",
      "description": "{user-provided}",
      "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
      "datePublished": "2026-07-01",
      "author": {"@id": "#author-1"},
      "hasPart": [
        {"@id": "notes/{note_id}.jsonl"},
        {"@id": "datasets/{drs_id}/{filename}"},
        {"@id": "runs/{run_id}/outputs/"},
        {"@id": "workflows/{trs_id}/"}
      ]
    },
    {
      "@type": "Person",
      "@id": "#author-1",
      "name": "{from Keycloak}",
      "identifier": "https://orcid.org/..."
    }
  ]
}

Output ZIP structure:

data-artefact-{date}-{id}.zip
  ├── ro-crate-metadata.json
  ├── notes/
  ├── datasets/
  │     └── {drs_object_id}/{filename}
  ├── runs/
  │     └── {run_id}/outputs/
  └── workflows/
        └── {trs_tool_id}/
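
A minimal sketch of the packaging step, assuming the resolved resources are already in memory as bytes keyed by their path inside the ZIP. The `build_crate` name and its signature are hypothetical, and the licence, author, and date fields from the metadata example are omitted for brevity. A production version would stream files to disk instead of holding them in memory, given the 50 GB cap.

```python
import io
import json
import zipfile


def build_crate(name: str, description: str, files: dict) -> bytes:
    """Package files (zip path -> bytes) with a minimal ro-crate-metadata.json."""
    paths = sorted(files)
    metadata = {
        "@context": "https://w3id.org/ro/crate/1.2/context",
        "@graph": [
            {
                "@type": "CreativeWork",
                "@id": "ro-crate-metadata.json",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2"},
                "about": {"@id": "./"},
            },
            {
                "@type": "Dataset",
                "@id": "./",
                "name": name,
                "description": description,
                "hasPart": [{"@id": path} for path in paths],
            },
            # One File entity per packaged item so every hasPart reference resolves.
            *({"@type": "File", "@id": path} for path in paths),
        ],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("ro-crate-metadata.json", json.dumps(metadata, indent=2))
        for path in paths:
            archive.writestr(path, files[path])
    return buffer.getvalue()
```

Writing the metadata file first keeps it at the front of the archive, so a consumer can inspect the crate description without extracting the payload.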

Appendix J: Methodology Decision Matrix

methodology_matrix:
  bulk_rnaseq:
    counts:
      - condition: "n_min >= 3"
        primary: DESeq2
        alternative: edgeR
      - condition: "n_min < 3"
        primary: limma-voom
        alternative: edgeR
  # Feature selection follows the same pattern:
  # feature_selection:
  #   high_dimensional:
  #     - condition: "n_features > 1000"
  #       primary: MRMR
  #       alternative: Boruta
  #     - condition: "n_features <= 1000"
  #       primary: SHAP
  #       alternative: Boruta

Conditions are evaluated by a constrained recursive-descent parser. Accepted tokens: whitelisted variable names (n_min, n_total, n_group_a, n_group_b, n_features), numeric literals, comparison operators (<, >, <=, >=, ==). eval() is prohibited. Unparseable conditions are rejected at config load time.
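
A simplified sketch of such a parser, restricted to a single `operand operator operand` comparison, which is all the matrix above needs. Function names are hypothetical; the whitelist and operators come from the paragraph above. Anything outside the accepted token set raises `ValueError`, which is where a config loader would reject the condition.

```python
import re

ALLOWED_VARS = {"n_min", "n_total", "n_group_a", "n_group_b", "n_features"}
OPERATORS = {
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
    "<=": lambda a, b: a <= b,
    ">=": lambda a, b: a >= b,
    "==": lambda a, b: a == b,
}
# Two-character operators are listed first so "<=" is not read as "<".
TOKEN = re.compile(
    r"\s*(?:(?P<num>\d+(?:\.\d+)?)|(?P<var>[A-Za-z_]\w*)|(?P<op><=|>=|==|<|>))"
)


def tokenize(text):
    text = text.rstrip()
    position, tokens = 0, []
    while position < len(text):
        match = TOKEN.match(text, position)
        if match is None:
            raise ValueError(f"unexpected input at {position}: {text[position:]!r}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        position = match.end()
    return tokens


def _operand(token):
    kind, value = token
    if kind == "num":
        number = float(value)
        return lambda env: number
    if kind == "var" and value in ALLOWED_VARS:
        return lambda env: env[value]
    raise ValueError(f"operand not allowed: {value!r}")


def parse_condition(text):
    """Parse 'operand op operand' into a callable taking a variable mapping."""
    tokens = tokenize(text)
    if len(tokens) != 3 or tokens[1][0] != "op":
        raise ValueError(f"expected '<operand> <op> <operand>': {text!r}")
    left, right = _operand(tokens[0]), _operand(tokens[2])
    compare = OPERATORS[tokens[1][1]]
    return lambda env: compare(left(env), right(env))
```

For instance, `parse_condition("n_min >= 3")` returns a function that is true for `{"n_min": 3}`, while a string such as `__import__('os')` fails during tokenisation and never reaches evaluation.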
