Pacific Analytics Infrastructure (PacificAnalytics/infra)

Architecture and Components

The platform infrastructure is broadly split into two parts, both consisting of IaC-managed resources operated with a GitOps methodology:

Infrastructure (Terraform)

  • Kubernetes Cluster provisioning (nodes, networking, storage)
  • ArgoCD installation and initial configuration
  • Istio Base components (istiod, istio-base)
  • Core Infrastructure setup (namespaces, RBAC, storage classes)

Application (ArgoCD)

  • Self-Signed Certificate issuers and Certificate resources
  • Istio Configuration (Gateways, VirtualServices, DestinationRules)
  • Platform Services (databases, auth, monitoring)
  • Application Services (our applications)
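
The application half of this split is driven by Argo CD Application resources. As a minimal sketch of what one such resource might look like for a dev environment (the repository URL, chart path, values file, and namespaces are illustrative assumptions, not taken from the actual configuration):

```yaml
# Illustrative Argo CD Application; repo URL, path, values file,
# and namespaces are assumptions for the sketch.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: pa-platform-dev
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/pacificAnalytics/infra.git
    targetRevision: dev
    path: pa-platform
    helm:
      valueFiles:
        - values-dev.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: pa-platform
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

With automated sync, prune, and selfHeal enabled, Argo CD both deploys new Git revisions and reverts any manual drift in the cluster back to the state declared in Git.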

Components

The platform's architecture comprises the following services and datastores:

Component | Type | Description
--------- | ---- | -----------
Client | Frontend | Web-based user interface, providing the primary interaction point with the platform's services.
API Gateway | Backend | Serves as the ingress point for all API requests, responsible for routing, rate limiting, and authentication orchestration. Also utilizes PostgreSQL for its data persistence needs.
WESkit | Backend | GA4GH WES (Workflow Execution Service) compliant engine for executing containerized computational workflows.
↳ WESkit API | Backend (sub-component) | RESTful API for WESkit, handling workflow submission, status polling, and results retrieval.
↳ WESkit Worker | Backend (sub-component) | Celery-based distributed task queue workers responsible for the actual execution of workflow steps.
↳ WESkit Celery Beat | Backend (sub-component) | Celery scheduler for periodic tasks within the WESkit ecosystem, if required.
TRS Filer | Backend | GA4GH TRS (Tool Registry Service) compliant implementation for indexing and discovering workflow tools and their metadata.
DRS Filer | Backend | GA4GH DRS (Data Repository Service) compliant implementation for resolving data object identifiers to fetchable access URLs.
Keycloak | Backend (Auth Service) | Centralized Identity and Access Management (IAM) solution based on OAuth 2.0 and OpenID Connect for platform-wide security.
JupyterHub | Backend (Compute Service) | Multi-user platform for provisioning and managing sandboxed Jupyter notebook environments, integrated with Keycloak for SSO.
PostgreSQL | Database | Relational database used by Keycloak and the API Gateway as their persistent data store. Configured as an external dependency.
MongoDB | Database | NoSQL document store, primarily leveraged by WESkit for persisting workflow state, run history, and operational metadata.
Redis | Database / Broker | In-memory data structure store, utilized by WESkit as a Celery message broker and for caching intermediate state or results.
MinIO | Storage | S3-compatible distributed object storage system, serving as the backend for datasets, workflow artifacts, execution logs, and large-scale results.

Multi-Environment Support

Branch | Environment | Domain | Deployment
------ | ----------- | ------ | ----------
main | Production | pacificanalytics.com | ApplicationSet
staging | Staging | staging-internal.pacificanalytics.com | ApplicationSet
uat | UAT | uat.pacificanalytics.com | ApplicationSet
stable | Stable | app.pacificanalytics.com | ApplicationSet
preview/* | Preview | {branch}-internal.pacificanalytics.com | Individual App
dev | Development | dev-internal.pacificanalytics.com | Individual App
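
The four ApplicationSet-deployed environments in the table could be covered by a single list generator. The following is a hedged sketch, not the project's actual configuration — the repository URL, chart path, value-file naming, and namespaces are assumptions:

```yaml
# Illustrative ApplicationSet for the long-lived environments; repo URL,
# paths, and value-file names are assumptions for the sketch.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: pa-platform-environments
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - branch: main
            env: production
          - branch: staging
            env: staging
          - branch: uat
            env: uat
          - branch: stable
            env: stable
  template:
    metadata:
      name: 'pa-platform-{{env}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/pacificAnalytics/infra.git
        targetRevision: '{{branch}}'
        path: pa-platform
        helm:
          valueFiles:
            - 'values-{{env}}.yaml'
      destination:
        server: https://kubernetes.default.svc
        namespace: 'pa-platform-{{env}}'
      syncPolicy:
        automated:
          prune: true
```

One generator element per environment keeps branch-to-environment mapping in a single resource, while the preview/* and dev rows in the table remain individually managed Applications.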
Deployment Workflow

  1. Kubernetes Cluster Infrastructure:

    • The platform is deployed onto one or more Kubernetes clusters, corresponding to different environments (e.g., dev, staging, production).
    • Terraform, via HCL configurations in the terraform/ directory, is likely employed for provisioning and managing the underlying Kubernetes cluster infrastructure (e.g., Minikube for local/on-prem development environments, or managed Kubernetes services such as EKS, GKE, or AKS for other environments). It may also handle the initial bootstrapping of Argo CD instances or core Application CRDs.
  2. Application Packaging (Helm):

    • The pa-platform Helm chart encapsulates all Kubernetes manifest templates (e.g., Deployments, StatefulSets, Services, ConfigMaps, Secrets, PersistentVolumeClaims, Ingress resources) required for the platform's components and their interdependencies.
    • External dependencies (Keycloak, MongoDB, Redis, MinIO) are managed as subcharts or external chart dependencies declared in the pa-platform chart's Chart.yaml.
  3. Continuous Integration and Delivery (CI/CD via GitHub Actions & Argo CD):

    • Continuous Integration (CI) for each component (e.g., using GitHub Actions):

      1. Trigger: A push to a significant branch (e.g., dev, staging, master) in a component's individual Git repository initiates its CI pipeline.
      2. Build & Push Docker Image: The pipeline checks out the component's code, builds a new Docker image, and tags it (e.g., with the short commit SHA and full commit SHA).
      3. Push to Container Registry: The tagged Docker image is pushed to a central container registry (e.g., Docker Hub, under an organization like pacificanalytics). Docker build caching is often used to speed up this process.
      4. Update Infrastructure Repository: Upon a successful build and push (especially for environment-specific branches like dev), the CI pipeline for the component automatically checks out the pacificAnalytics/infra Git repository (targeting the corresponding environment branch, e.g., dev branch in infra for a dev component build).
      5. It then uses a tool like yq to update the relevant environment-specific Helm values file (e.g., values.minikube.yaml for a Minikube/dev setup, or more generally values-<environment>.yaml) within the infra repository. The update involves setting the image.tag for the specific component (e.g., client.image.tag, api-gateway.image.tag) to the newly built Docker image tag (e.g., the short commit SHA).
      6. Commit & Push to Infrastructure: The change to the Helm values file is committed and pushed back to the corresponding branch in the pacificAnalytics/infra repository. This commit signifies the intent to deploy the new component version to that environment.
    • Continuous Delivery (CD with Argo CD & GitOps):

      1. Git Repository as Source of Truth: Argo CD continuously monitors the pacificAnalytics/infra repository, which contains the pa-platform Helm chart and its environment-specific configurations.
      2. Monitoring: Argo CD Application controllers (defined via Application CRDs) are configured to watch specific paths and branches (e.g., dev, staging, master branches in infra) for changes.
      3. Automated Synchronization: When Argo CD detects a change in the monitored Git source (e.g., the updated Helm values file resulting from a component's CI pipeline), it initiates a sync operation.
      4. Deployment: Argo CD applies these changes to the target Kubernetes cluster and namespace for the respective environment. This typically involves Argo CD rendering the Helm templates with the updated values and applying the resulting Kubernetes manifests, ensuring the deployed application state converges to the state defined in Git.
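
The CI-side update of the infrastructure repository (steps 4–6 of the CI description above) might be sketched as a GitHub Actions job fragment like the one below. The job name, repository, secret name, and yq key path are assumptions for illustration, not the actual pipeline:

```yaml
# Illustrative GitHub Actions job fragment; repo, secret, and yq paths
# are assumptions. Bumps the component's image tag in the dev values
# file of the infra repository and pushes the change.
update-infra:
  runs-on: ubuntu-latest
  needs: build-and-push
  if: github.ref == 'refs/heads/dev'
  steps:
    - uses: actions/checkout@v4
      with:
        repository: pacificAnalytics/infra
        ref: dev
        token: ${{ secrets.INFRA_REPO_TOKEN }}
    - name: Bump image tag in Helm values
      run: yq -i '.client.image.tag = strenv(IMAGE_TAG)' values-dev.yaml
      env:
        IMAGE_TAG: ${{ github.sha }}
    - name: Commit and push
      run: |
        git config user.name "ci-bot"
        git config user.email "ci-bot@users.noreply.github.com"
        git commit -am "client: deploy ${IMAGE_TAG} to dev"
        git push origin dev
      env:
        IMAGE_TAG: ${{ github.sha }}
```

The commit pushed in the final step is exactly what Argo CD's monitoring detects, closing the loop between a component's CI and the GitOps-driven deployment.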

Branch Management and Environments

A Git branching strategy underpins the environment promotion model. Each component repository likely follows a similar branching model, and the pacificAnalytics/infra repository mirrors these for environment configurations. Argo CD applications are configured to track specific Git revisions (branches, tags) and Helm value files from the infra repository for each environment.

  • dev Environment:

    • Component Source: dev branch in individual component repositories.
    • Infrastructure Source: dev branch in the pacificAnalytics/infra repository (containing Helm values like values.minikube.yaml or values-dev.yaml).
    • Purpose: Integration environment for active development and initial QA. New features and component versions are deployed here continuously via the automated CI/CD process described above.
    • Argo CD Configuration: An Argo CD Application CRD targets the dev branch of the infra repository and its associated Helm value files.
  • staging Environment:

    • Component Source: staging branch, or a dedicated release candidate branch (e.g., release/vX.Y.Z) in component repositories, branched from dev.
    • Infrastructure Source: staging branch (or corresponding release branch) in the pacificAnalytics/infra repository, with Helm values like values-staging.yaml.
    • Purpose: Pre-production environment for UAT, performance testing, and regression testing, closely mimicking production.
    • Promotion: Promotion to staging typically involves merging the dev branch of a component into its staging branch. The CI process then builds the image, and the update-infra step in its pipeline would target the staging branch of the infra repository.
    • Argo CD Configuration: A separate Argo CD Application CRD targets the staging branch of the infra repository.
  • master (Production) Environment:

    • Component Source: master (or main) branch, or specific Git tags (e.g., vX.Y.Z) in component repositories. Using tags is recommended for immutable production deployments.
    • Infrastructure Source: master branch (or corresponding tags) in the pacificAnalytics/infra repository, with Helm values like values-prod.yaml.
    • Purpose: Live, end-user-facing environment.
    • Promotion: Promotion to production typically involves merging a well-tested staging branch (or release branch) into master. The CI process for the component builds the production-tagged image, and its update-infra job targets the master branch of the infra repository.
    • Argo CD Configuration: An Argo CD Application CRD targets the master branch or a stable Git tag in the infra repository.
  • On-Demand Feature Branch Deployments (Future Plan):

    • Objective: Enable dynamic provisioning of isolated environments per feature branch of individual components for development and QA, without affecting the shared dev environment.
    • Potential Implementation: Integration between the component's CI system (triggered by feature branch activity such as PR creation) and Argo CD. A CI job could:

      1. Create a temporary namespace in Kubernetes.
      2. Dynamically generate or update an environment-specific Helm values file (e.g., values-feature-X.yaml) in a designated path within the infra repository (perhaps on a short-lived branch or a specific directory monitored by an Argo CD ApplicationSet).
      3. Trigger, via this change, an Argo CD ApplicationSet to create a new Argo CD Application CRD targeting the feature branch's Docker image and the feature-specific Helm values.
      4. On pull request closure/merge, run a corresponding cleanup job that removes the Helm values, leading Argo CD to delete the Application and its resources, after which the temporary namespace is deleted.
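
This future plan maps naturally onto Argo CD's ApplicationSet pull request generator, which creates and destroys Applications as PRs open and close. The sketch below is a hypothetical illustration only — the component repository, organization, value-file layout, and parameter key are all assumptions:

```yaml
# Hypothetical preview-environment ApplicationSet using the pull request
# generator; repository, organization, and value paths are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: pa-platform-previews
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: pacificAnalytics
          repo: client
        requeueAfterSeconds: 300
  template:
    metadata:
      name: 'preview-{{branch_slug}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/pacificAnalytics/infra.git
        targetRevision: dev
        path: pa-platform
        helm:
          valueFiles:
            - values-dev.yaml
          parameters:
            - name: client.image.tag
              value: '{{head_sha}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: 'preview-{{branch_slug}}'
      syncPolicy:
        automated:
          prune: true
        syncOptions:
          - CreateNamespace=true
```

Because the generator removes the generated Application when the PR closes, the pruning sync policy would tear down the preview resources without a separate cleanup job; namespace creation is handled by the CreateNamespace sync option.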

Helm Chart Operations (Primarily for Local Development/Understanding)

While Argo CD automates deployments, direct Helm CLI usage is relevant for local chart development, linting, and initial bootstrapping before Argo CD management.

  1. Prerequisites: helm CLI installed, kubectl configured to a Kubernetes cluster.

  2. Clone Repository: git clone <repository_url_of_infra_repo_or_chart_source>

  3. Navigate to Chart Directory: cd path/to/pa-platform (within the infra repo structure)

  4. Dependency Update:

    helm dependency update

    (Resolves and fetches chart dependencies specified in Chart.yaml into the charts/ subdirectory.)

  5. Configuration: Customize values.yaml or provide environment-specific overrides (e.g., -f values-dev.yaml).

  6. Lint & Template (Dry Run):

    helm lint .
    helm template <release-name> . -f values.yaml -n <namespace> --debug > manifests.yaml

Note: For clusters where Argo CD manages the pa-platform application, avoid direct helm install/upgrade operations on that application, as Argo CD will override these changes based on its Git source of truth (the pacificAnalytics/infra repository). Manual Helm operations are generally confined to non-Argo-managed contexts or for chart development phases.
