Resources List

Resources Created in LOCI Self-Hosted Deployment

This document lists all resources created by the standalone deployment infrastructure.

Table of Contents

  1. Infrastructure & Networking

  2. EKS Cluster & Node Groups

  3. Kubernetes Resources

  4. AWS Services

  5. Route53 & DNS

  6. Monitoring & Logging

  7. Security & IAM

  8. Storage

  9. Summary

  10. Node Group Assignments


Notes

  • All resources are tagged with cost allocation tags for tracking

  • Resources follow AWS Well-Architected Framework principles

  • High availability is configured for critical components (databases, node groups)

  • Backup and disaster recovery are configured for databases (automated PostgreSQL and Neo4j backups to S3)

  • Monitoring and logging cover all services (Prometheus, Grafana, Loki, Fluent-bit, CloudWatch)

  • Security follows least-privilege IAM principles

  • All data is encrypted at rest and in transit

Infrastructure & Networking

VPC Resources

  • VPC (example-cluster)

    • CIDR: 10.3.0.0/16 (example CIDR)

    • DNS hostnames enabled

    • DNS support enabled

Subnets

  • Public Subnets

    • 10.3.1.0/24 (Availability Zone A)

    • 10.3.3.0/24 (Availability Zone B)

    • Tagged for ELB: kubernetes.io/role/elb = 1

  • Private Subnets

    • 10.3.2.0/24 (Availability Zone A)

    • 10.3.4.0/24 (Availability Zone B)

    • Tagged for internal ELB: kubernetes.io/role/internal-elb = 1
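
The subnet layout above can be sanity-checked with Python's standard ipaddress module. The sketch below is a minimal check under stated assumptions, not part of the deployment tooling: the subnet labels and the check_subnets helper are illustrative names, and the CIDRs are the example values listed above.

```python
import ipaddress

# Example values from this document; real values come from your configuration.
VPC_CIDR = ipaddress.ip_network("10.3.0.0/16")
SUBNETS = {
    "public-a": ipaddress.ip_network("10.3.1.0/24"),
    "public-b": ipaddress.ip_network("10.3.3.0/24"),
    "private-a": ipaddress.ip_network("10.3.2.0/24"),
    "private-b": ipaddress.ip_network("10.3.4.0/24"),
}

def check_subnets(vpc, subnets):
    """Return a list of problems: subnets outside the VPC or overlapping each other."""
    problems = []
    names = sorted(subnets)
    for name in names:
        if not subnets[name].subnet_of(vpc):
            problems.append(f"{name} is outside {vpc}")
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems

print(check_subnets(VPC_CIDR, SUBNETS) or "subnet layout is consistent")
```

Running the same check against your own CIDRs before deployment catches overlapping or out-of-range subnets early.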

Networking Components

  • Internet Gateway (IGW)

  • NAT Gateway (Single NAT Gateway for cost optimization)

  • Route Tables (Public and Private)

  • VPC Endpoints (for private access to AWS services):

    • ECR API Endpoint (com.amazonaws.{region}.ecr.api)

    • ECR DKR Endpoint (com.amazonaws.{region}.ecr.dkr)

    • CloudWatch Logs Endpoint (com.amazonaws.{region}.logs)

    • EventBridge Endpoint (com.amazonaws.{region}.events) - Optional

    • S3 Gateway Endpoint (com.amazonaws.{region}.s3)

    • SageMaker Runtime Endpoint (com.amazonaws.{region}.sagemaker.runtime)

    • SageMaker API Endpoint (com.amazonaws.{region}.sagemaker.api)
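
The {region} placeholder in the endpoint service names above is filled in per deployment. As a rough illustration, the names can be expanded like this; the function name and the include_optional flag are hypothetical, and "us-east-1" is only an example region.

```python
# Service suffixes copied from the list above.
REQUIRED_SERVICES = ["ecr.api", "ecr.dkr", "logs", "s3", "sagemaker.runtime", "sagemaker.api"]
OPTIONAL_SERVICES = ["events"]  # EventBridge is listed as optional

def endpoint_service_names(region, include_optional=False):
    """Expand com.amazonaws.{region}.<service> for each listed endpoint."""
    suffixes = REQUIRED_SERVICES + (OPTIONAL_SERVICES if include_optional else [])
    return [f"com.amazonaws.{region}.{suffix}" for suffix in suffixes]

for name in endpoint_service_names("us-east-1", include_optional=True):
    print(name)
```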

Security Groups

  • EKS Cluster Security Group

  • Node Security Groups (for each node group)

  • SageMaker Endpoint Security Group

  • VPC Endpoint Security Group


EKS Cluster & Node Groups

EKS Cluster

  • Cluster Name: example-cluster (example name, will be set from config.ini)

  • Kubernetes Version: Latest supported

  • CloudWatch Logging: Enabled (14-day retention)

  • Public Endpoint Access: Enabled

Node Groups

1. Private Node Group

  • Instance Types: t3.medium

  • Desired Size: 2

  • Min Size: 2

  • Max Size: 3

  • Subnets: Private subnets

  • Capacity Type: ON_DEMAND

  • Workload Type: Private workloads

  • Labels: workload-type=private

  • Services: Backend and Frontend applications, monitoring and logging services (Prometheus, Loki, Fluent-bit, and typically Grafana)

2. Database Node Group

  • Instance Types: t3.xlarge

  • Desired Size: 2

  • Min Size: 2

  • Max Size: 5

  • Subnets: Private subnets

  • Disk Size: 400GB (gp3, encrypted)

  • Workload Type: Database workloads

  • Labels: workload-type=database

3. Public Node Group (Disabled)

  • Status: Disabled (desired_size = 0)

  • Note: Public node group is not created. All workloads run on private nodes.

4. MCP Node Group

  • Instance Types: t3.xlarge

  • Desired Size: 2

  • Min Size: 2

  • Max Size: 4

  • Subnets: Private subnets

  • Capacity Type: ON_DEMAND

  • Workload Type: MCP workloads

  • Labels: workload-type=mcp

  • Services: MCP Server, MCP Agent, Code Optim Agent
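
The scaling settings of the three active node groups can be captured as data and checked for consistency. This is a minimal sketch, assuming the sizes listed above; the NodeGroup class and helper names are illustrative and do not come from the deployment code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeGroup:
    name: str
    instance_type: str
    min_size: int
    desired_size: int
    max_size: int

    def is_valid(self):
        # Desired size must sit between the minimum and maximum.
        return 0 <= self.min_size <= self.desired_size <= self.max_size

# Values from the node group descriptions above (Public group omitted: disabled).
NODE_GROUPS = [
    NodeGroup("private", "t3.medium", 2, 2, 3),
    NodeGroup("database", "t3.xlarge", 2, 2, 5),
    NodeGroup("mcp", "t3.xlarge", 2, 2, 4),
]

def capacity_range(groups):
    """Return (minimum, maximum) total node count across all groups."""
    return sum(g.min_size for g in groups), sum(g.max_size for g in groups)

assert all(g.is_valid() for g in NODE_GROUPS)
print(capacity_range(NODE_GROUPS))  # (6, 12)
```

With these values the cluster runs between 6 and 12 worker nodes, which is useful when estimating cost or checking EC2 quotas.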

EKS Add-ons

  • EBS CSI Driver (for persistent volumes)

  • AWS Load Balancer Controller

  • CoreDNS (for service discovery)

  • kube-proxy

  • VPC CNI


Kubernetes Resources

Namespaces

  • backend - Backend application namespace

  • frontend - Frontend application namespace

  • db - Database namespace (PostgreSQL)

  • neo4j - Neo4j database namespace

  • lambda - Lambda functions namespace

  • mcp - MCP services namespace (MCP Server, MCP Agent, Code Optim Agent)

  • monitoring - Monitoring stack namespace (Grafana, Prometheus)

  • logging - Logging stack namespace (Loki, Fluent-bit)

Helm Charts Deployed

1. Backend (charts/backend/)

  • Deployment: Backend application pods

  • Service: ClusterIP service for backend

  • Service (LB): LoadBalancer service for backend

  • Ingress: ALB Ingress for external access

  • HPA: Horizontal Pod Autoscaler

  • Image: loci-backend:latest

2. Frontend (charts/frontend/)

  • Deployment: Frontend application pods

  • NodeSelector: workload-type: private (explicitly configured - runs on Private Node Group)

  • Service: ClusterIP service for frontend

  • Ingress: ALB Ingress for external access

  • Image: example-frontend:latest (example image name)

  • Hostname: example.com (example hostname, will be set from config.ini)

3. Neo4j (charts/neo4j/)

  • Helm Chart: Neo4j Community Edition

  • Deployment: Neo4j database pods

  • NodeSelector: workload-type: database (explicitly configured - runs on Database Node Group)

  • Service: ClusterIP and LoadBalancer services

  • Backup CronJob: Automated Neo4j backups to S3

  • Credentials: Managed via Kubernetes secrets

4. PostgreSQL (charts/postgres-pgo/)

  • Operator: PGO (Postgres Operator)

  • PostgreSQL Cluster: High Availability PostgreSQL

  • NodeSelector: workload-type: database (explicitly configured - runs on Database Node Group)

  • Service: loci-postgres-ha service

  • Backup CronJob: Automated PostgreSQL backups to S3

  • Credentials: Managed via Kubernetes secrets

5. Grafana (charts/grafana/)

  • Helm Chart: Grafana

  • Namespace: monitoring

  • Deployment: Grafana pods

  • NodeSelector: None (can run on any node group)

  • Service: ClusterIP service

  • Ingress: ALB Ingress for external access

  • Hostname: grafana.example.com (example hostname, will be set from config.ini)

  • Storage: 10Gi persistent volume

  • Dashboards: Pre-configured dashboards (cluster-logs.json)

  • Data Sources: Prometheus, Loki

6. Prometheus (charts/prometheus/)

  • Helm Chart: Prometheus

  • Namespace: monitoring

  • Deployment: Prometheus pods

  • NodeSelector: workload-type: private (explicitly configured - runs on Private Node Group)

  • Service: ClusterIP service (prometheus.monitoring.svc.cluster.local:9090)

  • Storage: 50Gi persistent volume

  • Retention: 7 days

  • Scraping: Metrics collection from cluster

7. Loki (charts/loki/)

  • Helm Chart: Loki

  • Namespace: logging

  • Deployment: Loki pods (SingleBinary mode)

  • NodeSelector: workload-type: private (explicitly configured - runs on Private Node Group)

  • Service: ClusterIP service (loci-logs-loki.logging.svc.cluster.local:3100)

  • Storage: 50Gi persistent volume

  • Retention: 48 hours (2 days)

  • External Service: For Fluent-bit integration

8. Fluent-bit (charts/fluent-bit/)

  • DaemonSet: Fluent-bit pods

  • Namespace: logging

  • NodeSelector: workload-type: private (explicitly configured - runs on all Private Node Group nodes)

  • ConfigMap: Fluent-bit configuration

  • ServiceAccount: With IRSA for CloudWatch access

  • ClusterRole: Permissions for log collection

  • Log Forwarding: To Loki and CloudWatch

9. Lambda (charts/lambda/)

  • Kubernetes Functions: Lambda functions deployed as K8s resources

  • RBAC: ServiceAccount and RoleBindings

  • Functions:

    • Version Uploaded Lambda

    • Service Status Updated Lambda

    • SageMaker Status Lambda

10. MCP Server (charts/mcp/mcp-server/)

  • Helm Chart: MCP Server

  • Namespace: mcp

  • Deployment: MCP Server pods

  • NodeSelector: workload-type: mcp (explicitly configured - runs on MCP Node Group)

  • Service: ClusterIP service (loci-mcp-server.mcp.svc.cluster.local:7000)

  • Image: loci-mcp-server:latest

  • Resources: 1 CPU limit, 3Gi memory limit

  • Environment: Backend URL configured

11. MCP Agent (charts/mcp/mcp-agent/)

  • Helm Chart: MCP Agent

  • Namespace: mcp

  • Deployment: MCP Agent pods

  • NodeSelector: workload-type: mcp (explicitly configured - runs on MCP Node Group)

  • Service: ClusterIP service (port 8000)

  • Ingress: ALB Ingress for external access

  • Hostname: agent.example.com (example hostname, will be set from config.ini)

  • Image: example-mcp-agent:latest (example image name)

  • Resources: 2 CPU limit, 3Gi memory limit

  • Environment: Backend URL and AWS credentials configured

  • MCP Server URL: example-mcp-server.mcp.svc.cluster.local (example service name)
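
The in-cluster service addresses listed above follow the standard Kubernetes pattern <service>.<namespace>.svc.cluster.local. A small helper, shown here as a sketch with an illustrative function name, makes the pattern explicit; cluster.local is the default cluster domain and is assumed here.

```python
def service_dns(service, namespace, port=None, cluster_domain="cluster.local"):
    """Build the in-cluster DNS name of a Kubernetes Service."""
    host = f"{service}.{namespace}.svc.{cluster_domain}"
    return f"{host}:{port}" if port is not None else host

print(service_dns("prometheus", "monitoring", 9090))
# prometheus.monitoring.svc.cluster.local:9090
print(service_dns("loci-logs-loki", "logging", 3100))
# loci-logs-loki.logging.svc.cluster.local:3100
print(service_dns("loci-mcp-server", "mcp", 7000))
# loci-mcp-server.mcp.svc.cluster.local:7000
```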


AWS Services

ECS (Elastic Container Service)

  • ECS Cluster: LociPlatform{LociEnv}Cluster

  • Capacity Providers: FARGATE, FARGATE_SPOT

  • Task Definitions:

    • Static Analysis Task: LociPlatform{LociEnv}StaticAnalysis

      • CPU: 8192 (8 vCPU)

      • Memory: 61440 MB (60 GB)

      • Container: static-analysis

    • LCLM Task: LociPlatform{LociEnv}LCLM

      • CPU: 8192 (8 vCPU)

      • Memory: 61440 MB (60 GB)

      • Container: lclm
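
ECS expresses task CPU in units, where 1024 units equal 1 vCPU, and memory in MiB. The conversion behind the figures above can be sketched as follows; the function name is illustrative.

```python
def describe_task_size(cpu_units, memory_mib):
    """Convert ECS task CPU units and memory (MiB) into vCPU and GB figures."""
    vcpu = cpu_units / 1024   # 1024 CPU units = 1 vCPU
    gb = memory_mib / 1024    # 1024 MiB = 1 GB as quoted in this document
    return f"{vcpu:g} vCPU, {gb:g} GB"

print(describe_task_size(8192, 61440))  # 8 vCPU, 60 GB
```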

SageMaker

  • SageMaker Model: LociPlatform{LociEnv}Model

    • Model Package ARN: From config (lclm-multi)

    • VPC Configuration: Enabled

  • Endpoint Configuration: LociPlatform{LociEnv}EndpointConfig

    • Instance Type: ml.g5.xlarge

    • Async Inference: Enabled

    • SNS Notifications: Configured

  • SageMaker Endpoint: LociPlatform{LociEnv}Endpoint

    • Auto Scaling: Enabled (min: 1, max: configured)

    • CloudWatch Alarms: CPU and Memory utilization

DynamoDB

  • Table: LociPlatform{LociEnv}PipelineDB

    • Billing Mode: PAY_PER_REQUEST

    • Primary Key: version_id (Hash), artifact_action (Range)

    • Point-in-Time Recovery: Enabled

    • Encryption: Server-side encryption enabled
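
The table uses a composite primary key: version_id as the hash (partition) key and artifact_action as the range (sort) key. The sketch below builds a key in DynamoDB's attribute-value format. It assumes both attributes are strings, which this document does not state, and the sample values are hypothetical.

```python
def pipeline_key(version_id, artifact_action):
    """Build a composite key in DynamoDB attribute-value format.

    Assumption: both key attributes are of type String ("S").
    """
    return {
        "version_id": {"S": version_id},            # hash (partition) key
        "artifact_action": {"S": artifact_action},  # range (sort) key
    }

# Hypothetical values, for illustration only.
print(pipeline_key("v-123", "binary#static-analysis"))
```

A key of this shape is what low-level DynamoDB calls such as GetItem expect in their Key parameter.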

S3 Buckets

  • Storage Bucket: loci-{environment} (or configured name)

    • Versioning: Suspended

    • Encryption: AES256

    • Public Access: Blocked

    • Lifecycle Rules: Transition to IA (30 days), Glacier (90 days)

  • Backup Bucket: loci-backups-{environment}

    • Used for: PostgreSQL backups, Neo4j backups

  • Terraform State Bucket: loci-terraform (or configured)

    • State file: cluster/terraform.tfstate
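
The lifecycle rules on the storage bucket (Infrequent Access after 30 days, Glacier after 90 days) correspond to an S3 lifecycle configuration document. The sketch below builds one in the shape used by the S3 PutBucketLifecycleConfiguration API. The rule ID is hypothetical, the rule is assumed to apply to the whole bucket, and "IA" is assumed to mean the STANDARD_IA storage class.

```python
import json

def lifecycle_configuration(ia_days=30, glacier_days=90):
    """Build an S3 lifecycle configuration with two storage-class transitions."""
    return {
        "Rules": [
            {
                "ID": "tiered-storage",    # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # assumption: applies to all objects
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }

print(json.dumps(lifecycle_configuration(), indent=2))
```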

EventBridge

  • Event Bus: LociPlatform{LociEnv}Events

  • Event Rules:

    • LociPlatform{LociEnv}PipelineStarted

    • LociPlatform{LociEnv}PipelineCompleted

    • LociPlatform{LociEnv}PipelineFailed

    • LociPlatform{LociEnv}VersionUploaded

    • LociPlatform{LociEnv}ServiceStatusUpdated
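
Resource names throughout this document embed the {LociEnv} placeholder. As an illustration of how the names resolve for a given environment, the templates above can be expanded with Python's str.format. The environment value "Dev" and the resolve_names helper are hypothetical.

```python
EVENT_BUS_TEMPLATE = "LociPlatform{LociEnv}Events"
RULE_TEMPLATES = [
    "LociPlatform{LociEnv}PipelineStarted",
    "LociPlatform{LociEnv}PipelineCompleted",
    "LociPlatform{LociEnv}PipelineFailed",
    "LociPlatform{LociEnv}VersionUploaded",
    "LociPlatform{LociEnv}ServiceStatusUpdated",
]

def resolve_names(loci_env):
    """Substitute the environment name into the bus and rule templates."""
    return {
        "event_bus": EVENT_BUS_TEMPLATE.format(LociEnv=loci_env),
        "rules": [t.format(LociEnv=loci_env) for t in RULE_TEMPLATES],
    }

names = resolve_names("Dev")  # "Dev" is a hypothetical environment name
print(names["event_bus"])     # LociPlatformDevEvents
for rule in names["rules"]:
    print(rule)
```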

SNS (Simple Notification Service)

  • Topic: LociPlatform{LociEnv}Notifications

    • Used for: SageMaker async inference notifications

    • Used for: Pipeline event notifications

CloudWatch

  • Log Groups:

    • /ecs/LociPlatform{LociEnv}StaticAnalysis (14-day retention)

    • /ecs/LociPlatform{LociEnv}LCLM (14-day retention)

    • /aws/sagemaker/Endpoints/LociPlatform{LociEnv}Endpoint (14-day retention)

    • /aws/lambda/LociPlatform{LociEnv}-VersionUploadedLambda (14-day retention)

    • /aws/lambda/LociPlatform{LociEnv}-ServiceStatusUpdatedLambda (14-day retention)

    • /aws/lambda/LociPlatform{LociEnv}-SageMakerStatusLambda (14-day retention)

  • Alarms:

    • SageMaker High CPU Utilization

    • SageMaker High Memory Utilization

Lambda Functions (Kubernetes-based)

  • Version Uploaded Lambda: Handles S3 upload events

  • Service Status Updated Lambda: Handles ECS task status updates

  • SageMaker Status Lambda: Handles SageMaker endpoint status


Route53 & DNS

Public DNS (Route53 Hosted Zone)

  • Hosted Zone: example.com (example zone ID: Z08012303MU06OUHG8U2M)

  • Records Created:

    • example.com → Frontend ALB (example hostname)

    • api.example.com → Backend ALB (example hostname)

    • grafana.example.com → Grafana ALB (example hostname)

    • agent.example.com → MCP Agent ALB (example hostname)

Private DNS (Route53 Private Hosted Zone)

  • Private Zone: k8s.{environment}.internal

  • VPC Association: Associated with VPC

  • Service Discovery Records:

    • neo4j.k8s.{environment}.internal → Neo4j LoadBalancer

    • loci-postgres-ha.k8s.{environment}.internal → PostgreSQL LoadBalancer

    • loci-backend.k8s.{environment}.internal → Backend LoadBalancer

Route53 Resolver

  • Resolver Endpoint: {environment}-k8s-dns-forwarder (OUTBOUND)

  • Resolver Rule: {environment}-k8s-service-forwarding

    • Forwards svc.cluster.local queries to CoreDNS

  • Rule Association: Associated with VPC
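
A forwarding rule for svc.cluster.local applies to that domain and every name beneath it. The matching logic can be approximated as below. This is a simplified sketch with an illustrative function name; it ignores the case where several rules match and the most specific one is chosen.

```python
def is_forwarded(query_name, rule_domain="svc.cluster.local"):
    """Return True if a DNS query falls under the forwarding rule's domain."""
    query = query_name.rstrip(".").lower()
    domain = rule_domain.rstrip(".").lower()
    return query == domain or query.endswith("." + domain)

print(is_forwarded("loci-logs-loki.logging.svc.cluster.local"))  # True
print(is_forwarded("grafana.example.com"))                       # False
```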


Monitoring & Logging

Prometheus

  • Metrics Collection: Cluster-wide metrics

  • Scraping Targets: All Kubernetes services

  • Storage: Persistent volume

Grafana

  • Dashboards: Pre-configured dashboards

    • Cluster Logs Dashboard

  • Data Sources: Prometheus, Loki

  • Access: https://grafana.example.com (example URL, will be set from config.ini)

Loki

  • Log Aggregation: Centralized log storage

  • Service: loci-logs-loki.logging.svc.cluster.local

  • Integration: Fluent-bit → Loki

Fluent-bit

  • Log Collection: DaemonSet on all nodes

  • Log Forwarding:

    • To Loki (cluster logs)

    • To CloudWatch (AWS service logs)

  • Filters: Log parsing and enrichment

CloudWatch

  • Container Insights: Enabled on ECS cluster

  • Log Groups: All application and service logs

  • Metrics: ECS, SageMaker, Lambda metrics


Security & IAM

IAM Roles

ECS Roles

  • ECS Task Execution Role: LociPlatform{LociEnv}ECSTaskExecutionRole

    • ECR access

    • CloudWatch Logs access

    • S3 read access (model artifacts)

  • ECS Task Role: LociPlatform{LociEnv}ECSTaskRole

    • Full S3 access

    • EventBridge access

    • DynamoDB access

    • SageMaker access

    • ECS access

    • SNS access

    • CloudWatch Logs access

SageMaker Roles

  • SageMaker Execution Role: LociPlatform{LociEnv}SageMakerExecutionRole

    • SageMaker Full Access

    • S3 access (storage bucket)

    • SNS access (notifications)

    • ECR access (cross-account)

    • Model Package access (cross-account)

Lambda Roles

  • Lambda Execution Roles: Created via Kubernetes IRSA

    • ECS access

    • S3 access

    • EventBridge access

    • DynamoDB access

    • SageMaker access

Kubernetes Service Accounts

  • lambda-secret-updater: ServiceAccount with IRSA

    • Updates Kubernetes secrets from AWS Secrets Manager

  • fluent-bit: ServiceAccount with IRSA

    • CloudWatch Logs write access

IAM Policies

  • DynamoDB Pipeline Access Policy: LociPlatform{LociEnv}DynamoDBAccess

  • SageMaker S3 Access Policy: LociPlatform{LociEnv}SageMakerS3Access

  • SageMaker SNS Access Policy: LociPlatform{LociEnv}SageMakerSNSAccess

  • SageMaker Model Package Access Policy: LociPlatform{LociEnv}SageMakerModelPackageAccess

  • SageMaker ECR Access Policy: LociPlatform{LociEnv}SageMakerECRAccess

Security Groups

  • EKS Cluster Security Group: Cluster control plane access

  • Node Security Groups: Node group access rules

  • SageMaker Endpoint Security Group: Endpoint access (HTTPS from VPC)

  • VPC Endpoint Security Group: VPC endpoint access (HTTPS from VPC)

Secrets Management

  • Kubernetes Secrets:

    • Neo4j credentials (neo4j namespace)

    • PostgreSQL credentials (db namespace)

    • Lambda environment variables (lambda namespace)

    • AWS credentials (mcp namespace) - Used by MCP Agent and Code Optim Agent

  • AWS Secrets Manager (if configured):

    • Database passwords

    • API keys


Storage

EBS Volumes

  • Database Node Volumes: 400GB gp3 encrypted volumes

  • Persistent Volumes: Created via EBS CSI Driver

    • PostgreSQL data volumes

    • Neo4j data volumes

    • Prometheus storage

S3 Storage

  • Application Data: loci-{environment} bucket

    • Model artifacts

    • SageMaker input/output

    • Application uploads

  • Backups: loci-backups-{environment} bucket

    • PostgreSQL backups (pgBackRest)

    • Neo4j backups

  • Terraform State: loci-terraform bucket

    • Infrastructure state files

Storage Classes (Kubernetes)

  • gp3: General purpose SSD (default)

  • io1: Provisioned IOPS SSD (if configured)


Summary

Total Resource Count (Approximate)

  • VPC Resources: ~15 (VPC, subnets, gateways, route tables, endpoints)

  • EKS Resources: 1 cluster + 3 node groups (Private + Database + MCP; Public disabled)

  • Kubernetes Namespaces: 8 (backend, frontend, db, neo4j, lambda, mcp, monitoring, logging)

  • Helm Charts: 12

  • ECS Resources: 1 cluster + 2 task definitions

  • SageMaker Resources: 1 model + 1 endpoint config + 1 endpoint

  • DynamoDB Tables: 1

  • S3 Buckets: 2-3

  • Route53 Zones: 2 (1 public + 1 private)

  • Route53 Records: ~8-10

  • IAM Roles: ~5-7

  • IAM Policies: ~8-10

  • Security Groups: ~5-7

  • CloudWatch Log Groups: ~6

  • CloudWatch Alarms: 2

  • EventBridge Resources: 1 event bus + 5 rules

  • SNS Topics: 1

  • Lambda Functions: 3 (Kubernetes-based)


Node Group Assignments

Services by Node Group

Private Node Group (workload-type: private)

  • Backend (backend namespace)

    • Application pods

    • NodeSelector: workload-type: private (explicitly configured)

    • Service: loci-backend

    • LoadBalancer: Internal NLB for backend access

  • Frontend (frontend namespace)

    • Application pods

    • NodeSelector: workload-type: private (explicitly configured)

    • Service: Frontend service

    • Ingress: ALB for external access

  • Prometheus (monitoring namespace)

    • Prometheus pods

    • NodeSelector: workload-type: private (explicitly configured)

    • Storage: 50Gi persistent volume

    • Service: ClusterIP (prometheus.monitoring.svc.cluster.local:9090)

  • Loki (logging namespace)

    • Loki pods

    • NodeSelector: workload-type: private (explicitly configured)

    • Storage: 50Gi persistent volume

    • Service: ClusterIP (loci-logs-loki.logging.svc.cluster.local:3100)

  • Fluent-bit (logging namespace)

    • DaemonSet pods (runs on all private nodes)

    • NodeSelector: workload-type: private (explicitly configured)

    • Log collection from all nodes

Database Node Group (workload-type: database)

  • Neo4j (neo4j namespace)

    • Neo4j database pods

    • NodeSelector: workload-type: database (explicitly configured)

    • Storage: 120Gi EBS volumes

    • Service: neo4j (ClusterIP) + LoadBalancer (Internal NLB)

  • PostgreSQL (db namespace)

    • PostgreSQL cluster pods (managed by PGO)

    • NodeSelector: workload-type: database (explicitly configured)

    • Replicas: 2 instances

    • Storage: 400Gi EBS volumes per instance

    • Service: loci-postgres-ha (LoadBalancer - Internal NLB)

    • PGO Operator: Deployed in db namespace (no nodeSelector - control plane component)

MCP Node Group (workload-type: mcp)

  • MCP Server (mcp namespace)

    • MCP Server pods

    • NodeSelector: workload-type: mcp (explicitly configured)

    • Service: loci-mcp-server (ClusterIP)

    • Resources: 1 CPU, 3Gi memory

  • MCP Agent (mcp namespace)

    • MCP Agent pods

    • NodeSelector: workload-type: mcp (explicitly configured)

    • Service: ClusterIP + Ingress (ALB)

    • Hostname: agent.example.com (example hostname)

    • Resources: 2 CPU, 3Gi memory

Public Node Group (workload-type: public) - DISABLED

  • Status: Not created (disabled in configuration)

  • Note: Public node group is disabled. System pods (CoreDNS, kube-proxy, AWS Load Balancer Controller) run on private nodes.

All Node Groups

  • Fluent-bit (logging namespace)

    • DaemonSet deployed on all nodes

    • No nodeSelector (runs on all node types)

    • Collects logs from all nodes

Node Group Summary

  • Public Node Group: DISABLED (not created)

  • Private Node Group: Application workloads and system pods run here:

    • Backend (explicitly configured with workload-type: private)

    • Frontend (explicitly configured with workload-type: private)

    • Prometheus (explicitly configured with workload-type: private)

    • Loki (explicitly configured with workload-type: private)

    • Fluent-bit DaemonSet (explicitly configured with workload-type: private)

    • Min Size: 2 nodes

  • Database Node Group: Database workloads only:

    • Neo4j (explicitly configured with workload-type: database)

    • PostgreSQL (explicitly configured with workload-type: database)

    • Min Size: 2 nodes

  • MCP Node Group: MCP/AI workloads:

    • MCP Server (explicitly configured with workload-type: mcp)

    • MCP Agent (explicitly configured with workload-type: mcp)

    • Code Optim Agent (explicitly configured with workload-type: mcp)

    • Min Size: 2 nodes

    • Instance Type: t3.xlarge (4 vCPU, 16GB RAM)
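
The assignments above come down to a workload-type label on each node group and a matching nodeSelector on each workload. The mapping can be sketched as a lookup table. The workload keys and the node_selector helper are illustrative names, and Fluent-bit is left out of this sketch.

```python
# Workload -> workload-type label, following the assignments above.
# None means no nodeSelector is set (Grafana can run on any node group).
WORKLOAD_TYPES = {
    "backend": "private",
    "frontend": "private",
    "prometheus": "private",
    "loki": "private",
    "neo4j": "database",
    "postgres": "database",
    "mcp-server": "mcp",
    "mcp-agent": "mcp",
    "grafana": None,
}

def node_selector(workload):
    """Return the nodeSelector mapping for a workload, or {} if it is unpinned."""
    workload_type = WORKLOAD_TYPES[workload]
    return {"workload-type": workload_type} if workload_type else {}

print(node_selector("neo4j"))    # {'workload-type': 'database'}
print(node_selector("grafana"))  # {}
```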

Monitoring Services Details

  • Namespaces: monitoring (Grafana, Prometheus) and logging (Loki, Fluent-bit)

  • Node Assignment:

    • Prometheus: Explicitly configured with workload-type: private - runs on Private Node Group

    • Loki: Explicitly configured with workload-type: private - runs on Private Node Group

    • Fluent-bit: Explicitly configured with workload-type: private - DaemonSet runs on all Private Node Group nodes

    • Grafana: No explicit nodeSelector (can run on any node group, typically scheduled on Private Node Group)

  • Storage: Grafana, Prometheus, and Loki use persistent volumes (Grafana: 10Gi, Prometheus: 50Gi, Loki: 50Gi)

