# PacificAnalytics/infra

## Architecture and Components
The platform infrastructure is broadly split into two parts, both consisting of IaC-managed resources following a GitOps methodology:
### Infrastructure (Terraform)
- Kubernetes Cluster provisioning (nodes, networking, storage)
- ArgoCD installation and initial configuration
- Istio Base components (istiod, istio-base)
- Core Infrastructure setup (namespaces, RBAC, storage classes)
### Application (ArgoCD)
- Self-Signed Certificate issuers and Certificate resources
- Istio Configuration (Gateways, VirtualServices, DestinationRules)
- Platform Services (databases, auth, monitoring)
- Application Services (our applications)
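The hand-off between the two layers is commonly implemented with an "app-of-apps" pattern: Terraform installs Argo CD and applies a single root `Application` that points at the application-layer manifests in Git, after which Argo CD manages everything else. A minimal sketch of such a root Application — the repository URL, path, and names are assumptions, not taken from this repository:

```yaml
# Hypothetical root "app-of-apps" Application bootstrapped by Terraform.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/PacificAnalytics/infra.git
    targetRevision: main
    path: argocd/apps            # assumed path holding child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true                # remove child apps deleted from Git
      selfHeal: true             # revert manual drift to the Git state
```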
## Components
The platform's architecture comprises the following services and datastores:
| Component | Type | Description |
|---|---|---|
| Client | Frontend | Web-based user interface, providing the primary interaction point with the platform's services. |
| API Gateway | Backend | Serves as the ingress point for all API requests, responsible for routing, rate limiting, and authentication orchestration. Also utilizes PostgreSQL for its data persistence needs. |
| WESkit | Backend | GA4GH WES (Workflow Execution Service) compliant engine for executing containerized computational workflows. |
| ↳ WESkit API | Backend (sub-component) | RESTful API for WESkit, handling workflow submission, status polling, and results retrieval. |
| ↳ WESkit Worker | Backend (sub-component) | Celery-based distributed task queue workers responsible for the actual execution of workflow steps. |
| ↳ WESkit Celery Beat | Backend (sub-component) | Celery scheduler for periodic tasks within the WESkit ecosystem, if required. |
| TRS Filer | Backend | GA4GH TRS (Tool Registry Service) compliant implementation for indexing and discovering workflow tools and their metadata. |
| DRS Filer | Backend | GA4GH DRS (Data Repository Service) compliant implementation for resolving data object identifiers to fetchable access URLs. |
| Keycloak | Backend (Auth Service) | Centralized Identity and Access Management (IAM) solution based on OAuth 2.0 and OpenID Connect for platform-wide security. |
| JupyterHub | Backend (Compute Service) | Multi-user platform for provisioning and managing sandboxed Jupyter notebook environments, integrated with Keycloak for SSO. |
| PostgreSQL | Database | Relational database, used by Keycloak and the API Gateway for their persistent data stores. Configured as an external dependency. |
| MongoDB | Database | NoSQL document store, primarily leveraged by WESkit for persisting workflow state, run history, and operational metadata. |
| Redis | Database / Broker | In-memory data structure store, utilized by WESkit as a Celery message broker and for caching intermediate state or results. |
| MinIO | Storage | S3-compatible distributed object storage system, serving as the backend for datasets, workflow artifacts, execution logs, and large-scale results. |
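These components typically surface as top-level keys in the chart's Helm values files, which is also where CI later bumps image tags. A hypothetical excerpt — all keys, repositories, and values here are illustrative, not taken from the actual chart:

```yaml
# Illustrative values.yaml excerpt; key names mirror the components above.
client:
  image:
    repository: pacificanalytics/client
    tag: a1b2c3d            # placeholder; updated by CI per commit
api-gateway:
  image:
    repository: pacificanalytics/api-gateway
    tag: a1b2c3d
weskit:
  api:
    replicas: 2
  worker:
    concurrency: 4          # Celery worker concurrency
# External dependencies toggled as subcharts:
keycloak:
  enabled: true
mongodb:
  enabled: true
redis:
  enabled: true
minio:
  enabled: true
```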
## Multi-Environment Support
| Branch | Environment | Domain | Deployment |
|---|---|---|---|
| `main` | Production | pacificanalytics.com | ApplicationSet |
| `staging` | Staging | staging-internal.pacificanalytics.com | ApplicationSet |
| `uat` | UAT | uat.pacificanalytics.com | ApplicationSet |
| `stable` | Stable | app.pacificanalytics.com | ApplicationSet |
| `preview/*` | Preview | {branch}-internal.pacificanalytics.com | Individual App |
| `dev` | Development | dev-internal.pacificanalytics.com | Individual App |
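The ApplicationSet-deployed environments in the table above could all be generated from a single manifest. A hedged sketch using a list generator — the repository URL, chart path, and naming conventions are assumptions:

```yaml
# Illustrative ApplicationSet producing one Application per environment branch.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: pa-platform-environments
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - { branch: main, env: production }
          - { branch: staging, env: staging }
          - { branch: uat, env: uat }
          - { branch: stable, env: stable }
  template:
    metadata:
      name: pa-platform-{{env}}
    spec:
      project: default
      source:
        repoURL: https://github.com/PacificAnalytics/infra.git
        targetRevision: '{{branch}}'
        path: charts/pa-platform        # assumed chart location in the repo
        helm:
          valueFiles:
            - values-{{env}}.yaml       # assumed per-environment values naming
      destination:
        server: https://kubernetes.default.svc
        namespace: pa-platform-{{env}}
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```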
- Kubernetes Cluster Infrastructure:
  - The platform is deployed onto one or more Kubernetes clusters, corresponding to different environments (e.g., `dev`, `staging`, `production`).
  - Terraform, via HCL configurations in the `terraform/` directory, is likely employed for provisioning and managing the underlying Kubernetes cluster infrastructure (e.g., Minikube for local/on-prem development environments, or managed Kubernetes services such as EKS, GKE, or AKS for other environments) and potentially for the initial bootstrapping of Argo CD instances or core `Application` CRDs.
- Application Packaging (Helm):
  - The `pa-platform` Helm chart encapsulates all Kubernetes manifest templates (e.g., `Deployments`, `StatefulSets`, `Services`, `ConfigMaps`, `Secrets`, `PersistentVolumeClaims`, `Ingress` resources) required for the platform's components and their interdependencies.
  - External dependencies (Keycloak, MongoDB, Redis, MinIO) are managed as subcharts or external chart dependencies declared in the `pa-platform` chart's `Chart.yaml`.
- Continuous Integration and Delivery (CI/CD via GitHub Actions & Argo CD):
  - Continuous Integration (CI) for each component (e.g., using GitHub Actions):
    - Trigger: A push to a significant branch (e.g., `dev`, `staging`, `master`) in a component's individual Git repository initiates its CI pipeline.
    - Build & Push Docker Image: The pipeline checks out the component's code, builds a new Docker image, and tags it (e.g., with the short and full commit SHA).
    - Push to Container Registry: The tagged Docker image is pushed to a central container registry (e.g., Docker Hub, under an organization such as `pacificanalytics`). Docker build caching is often used to speed up this step.
    - Update Infrastructure Repository: Upon a successful build and push (especially for environment-specific branches such as `dev`), the component's CI pipeline automatically checks out the `PacificAnalytics/infra` Git repository, targeting the corresponding environment branch (e.g., the `dev` branch in `infra` for a `dev` component build).
    - It then uses a tool such as `yq` to update the relevant environment-specific Helm values file (e.g., `values.minikube.yaml` for a Minikube/dev setup, or more generally `values-<environment>.yaml`) within the `infra` repository. The update sets the `image.tag` for the specific component (e.g., `client.image.tag`, `api-gateway.image.tag`) to the newly built Docker image tag (e.g., the short commit SHA).
    - Commit & Push to Infrastructure: The change to the Helm values file is committed and pushed back to the corresponding branch in the `PacificAnalytics/infra` repository. This commit signifies the intent to deploy the new component version to that environment.
  - Continuous Delivery (CD with Argo CD & GitOps):
    - Git Repository as Source of Truth: Argo CD continuously monitors the `PacificAnalytics/infra` repository, which contains the `pa-platform` Helm chart and its environment-specific configurations.
    - Monitoring: Argo CD Application controllers (defined via `Application` CRDs) are configured to watch specific paths and branches (e.g., the `dev`, `staging`, and `master` branches in `infra`) for changes.
    - Automated Synchronization: When Argo CD detects a change in the monitored Git source (e.g., the updated Helm values file resulting from a component's CI pipeline), it initiates a sync operation.
    - Deployment: Argo CD applies these changes to the target Kubernetes cluster and namespace for the respective environment. This typically involves Argo CD rendering the Helm templates with the updated values and applying the resulting Kubernetes manifests, ensuring the deployed application state converges to the state defined in Git.
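The component CI steps described above can be sketched as a single GitHub Actions workflow. All names, secrets, and file paths here are assumptions rather than the project's actual pipeline:

```yaml
# Hypothetical CI workflow for one component (e.g., client).
name: component-ci
on:
  push:
    branches: [dev, staging, master]
jobs:
  build-and-promote:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build and push image (with build cache)
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: pacificanalytics/client:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - name: Check out infra repo on the matching environment branch
        uses: actions/checkout@v4
        with:
          repository: PacificAnalytics/infra
          ref: ${{ github.ref_name }}
          token: ${{ secrets.INFRA_PUSH_TOKEN }}   # assumed PAT with push rights
          path: infra
      - name: Bump the component image tag in the environment values file
        run: |
          yq -i '.client.image.tag = strenv(GITHUB_SHA)' \
            infra/values-${{ github.ref_name }}.yaml
      - name: Commit and push the values change
        run: |
          cd infra
          git config user.name "ci-bot"
          git config user.email "ci-bot@users.noreply.github.com"
          git commit -am "deploy: client ${GITHUB_SHA::7} to ${{ github.ref_name }}"
          git push
```

The final `git push` to `infra` is the deployment trigger: Argo CD picks up the values change and syncs the environment, so the CI pipeline never touches the cluster directly.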
## Branch Management and Environments
A Git branching strategy underpins the environment promotion model. Each component repository likely follows a similar branching model, and the `PacificAnalytics/infra` repository mirrors these branches for environment configurations. Argo CD applications are configured to track specific Git revisions (branches, tags) and Helm value files from the `infra` repository for each environment.
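Such an environment-tracking `Application` might look like the following sketch, shown here for `dev`; the chart path, namespace, and naming are illustrative assumptions:

```yaml
# Illustrative Application tracking one environment branch of the infra repo.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: pa-platform-dev
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/PacificAnalytics/infra.git
    targetRevision: dev           # production would pin an immutable tag instead
    path: charts/pa-platform      # assumed chart location
    helm:
      valueFiles:
        - values-dev.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: pa-platform-dev
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```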
- `dev` Environment:
  - Component Source: `dev` branch in individual component repositories.
  - Infrastructure Source: `dev` branch in the `PacificAnalytics/infra` repository (containing Helm values such as `values.minikube.yaml` or `values-dev.yaml`).
  - Purpose: Integration environment for active development and initial QA. New features and component versions are deployed here continuously via the automated CI/CD process described above.
  - Argo CD Configuration: An Argo CD `Application` CRD targets the `dev` branch of the `infra` repository and its associated Helm value files.
- `staging` Environment:
  - Component Source: `staging` branch, or a dedicated release candidate branch (e.g., `release/vX.Y.Z`) in component repositories, branched from `dev`.
  - Infrastructure Source: `staging` branch (or corresponding release branch) in the `PacificAnalytics/infra` repository, with Helm values such as `values-staging.yaml`.
  - Purpose: Pre-production environment for UAT, performance testing, and regression testing, closely mimicking production.
  - Promotion: Promotion to staging typically involves merging the `dev` branch of a component into its `staging` branch. The CI process then builds the image, and the update-infra step in its pipeline targets the `staging` branch of the `infra` repository.
  - Argo CD Configuration: A separate Argo CD `Application` CRD targets the `staging` branch of the `infra` repository.
- `master` (Production) Environment:
  - Component Source: `master` (or `main`) branch, or specific Git tags (e.g., `vX.Y.Z`) in component repositories. Using tags is recommended for immutable production deployments.
  - Infrastructure Source: `master` branch (or corresponding tags) in the `PacificAnalytics/infra` repository, with Helm values such as `values-prod.yaml`.
  - Purpose: Live, end-user-facing environment.
  - Promotion: Promotion to production typically involves merging a well-tested `staging` branch (or release branch) into `master`. The CI process for the component builds the production-tagged image, and its update-infra job targets the `master` branch of the `infra` repository.
  - Argo CD Configuration: An Argo CD `Application` CRD targets the `master` branch or a stable Git tag in the `infra` repository.
- On-Demand Feature Branch Deployments (Future Plan):
  - Objective: Enable dynamic provisioning of isolated environments per feature branch of individual components for development and QA, without affecting the shared `dev` environment.
  - Potential Implementation: Integration between the component's CI system (triggered by feature branch activity such as PR creation) and Argo CD. A CI job could:
    1. Create a temporary namespace in Kubernetes.
    2. Dynamically generate or update an environment-specific Helm values file (e.g., `values-feature-X.yaml`) in a designated path within the `infra` repository (perhaps on a short-lived branch or in a specific directory monitored by an Argo CD ApplicationSet).
    3. This change would trigger an Argo CD ApplicationSet to create a new Argo CD `Application` CRD targeting the feature branch's Docker image and the feature-specific Helm values.
    4. On pull request closure/merge, a corresponding job would clean up the Helm values, leading Argo CD to remove the `Application` and its resources, and the temporary namespace would be deleted.
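One way to realize steps 2–4 without bespoke scripting is an ApplicationSet with a pull-request generator, which creates and prunes preview `Application`s automatically as PRs open and close. A hedged sketch — the owner/repo, secret name, and paths are assumptions:

```yaml
# Illustrative per-PR preview environments via the pullRequest generator.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: pa-platform-previews
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: PacificAnalytics
          repo: client                  # assumed component repository
          tokenRef:
            secretName: github-token    # assumed secret holding a GitHub token
            key: token
        requeueAfterSeconds: 300
  template:
    metadata:
      name: preview-{{branch_slug}}
    spec:
      project: default
      source:
        repoURL: https://github.com/PacificAnalytics/infra.git
        targetRevision: dev
        path: charts/pa-platform
        helm:
          valueFiles:
            - values-dev.yaml
          parameters:
            - name: client.image.tag
              value: '{{head_sha}}'     # image built from the PR head commit
      destination:
        server: https://kubernetes.default.svc
        namespace: preview-{{branch_slug}}
      syncPolicy:
        automated:
          prune: true                   # closing the PR removes the Application
        syncOptions:
          - CreateNamespace=true        # covers step 1 (temporary namespace)
```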
## Helm Chart Operations (Primarily for Local Development/Understanding)
While Argo CD automates deployments, direct Helm CLI usage is relevant for local chart development, linting, and initial bootstrapping before Argo CD management.
- Prerequisites: `helm` CLI installed, `kubectl` configured against a Kubernetes cluster.
- Clone Repository:
  ```shell
  git clone <repository_url_of_infra_repo_or_chart_source>
  ```
- Navigate to Chart Directory: `cd path/to/pa-platform` (within the `infra` repo structure).
- Dependency Update:
  ```shell
  helm dependency update
  ```
  (Resolves and fetches the chart dependencies specified in `Chart.yaml` into the `charts/` subdirectory.)
- Configuration: Customize `values.yaml` or provide environment-specific overrides (e.g., `-f values-dev.yaml`).
- Lint & Template (Dry Run):
  ```shell
  helm lint .
  helm template <release-name> . -f values.yaml -n <namespace> --debug > manifests.yaml
  ```
Note: For clusters where Argo CD manages the `pa-platform` application, avoid direct `helm install`/`helm upgrade` operations on that application, as Argo CD will override such changes based on its Git source of truth (the `PacificAnalytics/infra` repository). Manual Helm operations are generally confined to non-Argo-managed contexts or to chart development phases.