Architecture Decision Records

Only decisions that materially constrain the architecture belong here. Domain rules and terminology live directly in Domain.

ADR-001: Brand-as-tenant multi-tenancy

Status: accepted · Date: 2026-06-11

Decision: Use one shared application and database. Every brand-owned record carries brand_id; repository/query APIs enforce tenant scope. GTIN ownership is an explicit Product Catalog mapping. A Brand User belongs to exactly one Brand.

Why: It provides testable brand isolation and cheap onboarding without multiplying infrastructure.

Trade-off: A tenant-scoping defect can leak data, so isolation requires central tests.
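
A minimal sketch of how a repository API could enforce tenant scope, assuming an in-memory table in place of the real database. The names ProductRecord and BrandScopedProducts are hypothetical and do not come from the codebase. The tenant is bound once at construction, so no call site can supply a different brand per query.

```typescript
// Hypothetical sketch: type names and the in-memory table are illustrative assumptions.
interface ProductRecord {
  id: string;
  brandId: string; // stands in for the brand_id column
  name: string;
}

class BrandScopedProducts {
  private readonly brandId: string;
  private readonly table: ProductRecord[];

  // The brand is fixed here; individual queries never accept a brand argument.
  constructor(brandId: string, table: ProductRecord[]) {
    this.brandId = brandId;
    this.table = table;
  }

  // Every read filters on the bound brand, including lookups by primary key.
  findById(id: string): ProductRecord | undefined {
    return this.table.find((r) => r.id === id && r.brandId === this.brandId);
  }

  list(): ProductRecord[] {
    return this.table.filter((r) => r.brandId === this.brandId);
  }
}
```

A central isolation test can then seed rows for two brands and assert that a repository bound to one brand never returns the other brand's rows.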

ADR-002: Modular monolith first

Status: superseded by ADR-007 · Date: 2026-06-11

Decision: Deploy the MVP as a modular monolith. Product Passport, Personalization & Verdict, Fridge, Identity & Consent, and Brand Analytics are code and ownership boundaries, not independently deployed services.

Why: The budget, team size and iteration speed do not justify microservices.

Trade-off: The course assignment may later require extracting modules into services, so boundaries must remain explicit enough to permit that extraction.

ADR-003: Resolve from a local catalog, assemble progressively

Status: accepted · Date: 2026-06-11

Decision: PackyTrace implements the GS1 Digital Link Resolver. A scan resolves a per-GTIN ProductCatalogEntry, while lot, serial and expiry remain per-item ScannedItem data. ResolvedPassport progressively combines the catalog entry with independent external sections. Open Food Facts and Agribalyse are the MVP sources behind ACL adapters and circuit breakers.

Why: Local catalog data keeps identity and Verdict computation reliable while external failures degrade individual sections instead of the whole scan.

Trade-off: Catalog data must be cached and kept fresh.
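
The per-section degradation can be sketched with a small circuit breaker. This is an illustration under stated assumptions, not the project's adapter code: the Section type, the CircuitBreaker class, the reason strings and the thresholds are all hypothetical, and the call is synchronous for brevity where a real adapter would be asynchronous.

```typescript
// Hypothetical sketch: a section is either present or marked unavailable,
// so one failing source never fails the whole passport.
type Section<T> =
  | { status: "ok"; data: T }
  | { status: "unavailable"; reason: string };

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly maxFailures: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  // `now` is injectable so the cooldown can be tested without waiting.
  constructor(maxFailures: number, cooldownMs: number, now: () => number = Date.now) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  call<T>(fetchSection: () => T): Section<T> {
    if (this.openedAt !== null) {
      if (this.now() - this.openedAt < this.cooldownMs) {
        // Open: skip the external call entirely.
        return { status: "unavailable", reason: "circuit-open" };
      }
      this.openedAt = null; // Cooldown elapsed: allow one trial call.
    }
    try {
      const data = fetchSection();
      this.failures = 0; // Success closes the circuit.
      return { status: "ok", data };
    } catch {
      this.failures += 1;
      if (this.failures >= this.maxFailures) {
        this.openedAt = this.now(); // Too many failures: open the circuit.
      }
      return { status: "unavailable", reason: "source-error" };
    }
  }
}
```

With one breaker per external source, the catalog entry is always returned and each external section carries its own status.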

ADR-004: Server-owned consumer state with pseudonymous visitors

Status: accepted · Date: 2026-06-11

Decision: Create a stable pseudonymous Visitor ID at first scan and link it when an Account is created. Health Profiles, consent records and Fridges are server-owned. Consent revocation, profile deletion and account deletion remain distinct operations; account deletion nullifies Account references on retained anonymous ScanRecords.

Why: This supports measurement, re-screening, expiry alerts and verifiable erasure without exposing health data in scan requests.

Trade-off: Visitor IDs undercount across cleared storage, devices and shared devices.
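
As an illustration of the account-deletion rule, the sketch below nullifies the Account reference on retained scan records without touching anything else. The ScanRecord field names are assumptions, and the function returns copies instead of mutating storage.

```typescript
// Hypothetical record shape; only accountId matters for this sketch.
interface ScanRecord {
  scanId: string;
  visitorId: string;
  accountId: string | null;
}

// Account deletion keeps the scan records but removes the link to the deleted account.
function nullifyAccountReferences(records: ScanRecord[], deletedAccountId: string): ScanRecord[] {
  return records.map((r) =>
    r.accountId === deletedAccountId ? { ...r, accountId: null } : r,
  );
}
```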

ADR-005: Aggregate before the privacy wall

Status: accepted · Date: 2026-06-11

Decision: Raw scan, Verdict, Fridge-save and account-link facts remain consumer-side. Only minimum-group-size BrandMetricBatchPublished aggregates cross into Brand Analytics.

Why: Brands need engagement metrics, but no per-scan or per-visitor trail may cross the privacy wall.

Trade-off: Small groups and real-time individual events cannot appear in dashboards.
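
A minimal sketch of aggregation with small-group suppression, assuming groups are keyed by brand and GTIN and that group size is measured in distinct visitors. Both choices, the type names and the threshold value are assumptions made for illustration; the ADR only fixes that a minimum group size applies.

```typescript
// Hypothetical consumer-side fact; it never leaves the consumer side.
interface ScanFact {
  brandId: string;
  gtin14: string;
  visitorId: string;
}

// Hypothetical aggregate row: counts only, no visitor identifiers.
interface BrandMetric {
  brandId: string;
  gtin14: string;
  scans: number;
  distinctVisitors: number;
}

function aggregateForBrands(facts: ScanFact[], minGroupSize: number): BrandMetric[] {
  const groups = new Map<string, { brandId: string; gtin14: string; scans: number; visitors: Set<string> }>();
  for (const f of facts) {
    const key = `${f.brandId}|${f.gtin14}`;
    let g = groups.get(key);
    if (g === undefined) {
      g = { brandId: f.brandId, gtin14: f.gtin14, scans: 0, visitors: new Set<string>() };
      groups.set(key, g);
    }
    g.scans += 1;
    g.visitors.add(f.visitorId);
  }
  const metrics: BrandMetric[] = [];
  groups.forEach((g) => {
    // Groups below the threshold are dropped, not rounded or merged.
    if (g.visitors.size >= minGroupSize) {
      metrics.push({ brandId: g.brandId, gtin14: g.gtin14, scans: g.scans, distinctVisitors: g.visitors.size });
    }
  });
  return metrics;
}
```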

ADR-006: Versioned domain-event contract

Status: accepted · Date: 2026-06-11

Decision: Events use a common envelope with event ID, type, occurrence time, schema version, correlation ID and causation ID. Payloads contain domain reason codes, not localized presentation messages.

Why: Versioning and tracing are required for reliable asynchronous workflows.

Trade-off: Producers and consumers must maintain schema compatibility.
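
The envelope could be typed roughly as follows. The field spellings, the string timestamp, the nullable causation ID, the VerdictComputed payload shape and the followUp helper are assumptions for illustration; only the list of envelope fields comes from the decision above.

```typescript
// Hypothetical envelope shape built from the fields named in the decision.
interface EventEnvelope<P> {
  eventId: string;
  type: string;
  occurredAt: string; // assumed to be an ISO-8601 timestamp
  schemaVersion: number;
  correlationId: string;
  causationId: string | null; // assumed null for the first event in a workflow
  payload: P;
}

// Payloads carry reason codes, never localized messages.
interface VerdictComputedPayload {
  gtin14: string;
  reasonCodes: string[];
}

// Builds an event caused by `cause`: the correlation ID is inherited,
// and the causation ID points at the event that triggered this one.
function followUp<P>(
  cause: EventEnvelope<unknown>,
  next: { eventId: string; type: string; occurredAt: string; schemaVersion: number; payload: P },
): EventEnvelope<P> {
  return {
    eventId: next.eventId,
    type: next.type,
    occurredAt: next.occurredAt,
    schemaVersion: next.schemaVersion,
    correlationId: cause.correlationId,
    causationId: cause.eventId,
    payload: next.payload,
  };
}
```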

ADR-007: Microservices from the start

Status: accepted · supersedes ADR-002 · Date: 2026-06-11

Decision: Deploy all seven services independently from day one: the five context services plus api-gateway and measurement-pipeline. The target topology of the microservices decomposition is built directly, with no intermediate monolith phase.

Why: The course's Part II/III deliverables — per-service containers, Kafka, and Kubernetes — are the project's real goal. Building the target topology directly avoids a throwaway monolith phase and exercises the service boundaries from the first commit.

Trade-off: More operational surface for a solo developer; MVP iteration is slower than inside a single deployable.

ADR-008: Polyglot service stacks by fit

Status: accepted · Date: 2026-06-11

Decision: Each service uses the language that fits its job. Go (chi, pgx + sqlc, franz-go) for api-gateway, passport-service, fridge-service and measurement-pipeline — proxying, external-source resilience, event-sourcing folds and Kafka throughput. TypeScript/Node (Fastify, Kysely, Confluent Kafka client) for identity-service, personalization-service and brand-analytics-service — auth ecosystem, fast-iterating rule policies and dashboard-shaped queries. All services share the same hexagonal layout (domain / application / adapters).

Why: Each language goes where it is strongest, and a polyglot fleet demonstrates real microservice independence rather than asserting it.

Trade-off: Two toolchains to maintain, and cross-cutting plumbing (config, logging, metrics) is implemented twice.

ADR-009: Single Postgres, schema-per-service with per-service roles

Status: accepted · Date: 2026-06-11

Decision: One Postgres instance with six schemas (passport, personalization, fridge, identity, measurement, brand_analytics) and six roles, one per service, each GRANT-restricted to its own schema. No cross-schema access. The Fridge event store is an append-only events table plus projections in the fridge schema; the measurement schema holds short-retention raw facts and aggregation windows.

Why: A single infrastructure piece keeps local and deployed environments simple, while per-service roles make data ownership enforced by the database rather than by discipline.

Trade-off: A shared instance couples availability: if Postgres is down, every stateful service is down.

ADR-010: Kafka for asynchronous facts, JSON Schema contracts

Status: accepted · Date: 2026-06-11

Decision: Run Apache Kafka (KRaft, single node) from day one, carrying only asynchronous domain facts (ProductScanned, VerdictComputed, ItemAddedToFridge, ItemConsumed/ItemDiscarded, VisitorLinkedToAccount, the consent/erasure events, AlertRaised, BrandMetricBatchPublished). Immediate request/response interactions remain synchronous internal REST per API Design §3. Event contracts are versioned JSON Schemas in a shared contracts/ directory — the ADR-006 envelope plus per-event payloads only, never domain entities or database models — with code generated for both Go and TypeScript. Consumers tolerate unknown optional fields; payloads evolve additively.

Why: Independently deployed services need a real broker for the event catalog anyway, and JSON Schema gives contract governance across two languages without a schema registry or entity coupling.

Trade-off: Contract discipline lives in CI checks rather than a registry, and Part III becomes a Kafka deepening (topic design, partitioning, consumer groups, delivery semantics) rather than a re-engineering.
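
The "tolerate unknown optional fields" rule can be sketched as a tolerant reader. The ItemConsumed payload fields shown here are hypothetical, and the hand-written checks stand in for the code that would be generated from the JSON Schemas.

```typescript
// Hypothetical v1 payload; the real fields are defined by the JSON Schema contract.
interface ItemConsumedV1 {
  itemId: string;
  consumedAt: string;
}

// Reads only the fields this consumer knows. Extra fields added by a newer
// producer are ignored; a missing or mistyped required field rejects the message.
function readItemConsumed(raw: string): ItemConsumedV1 | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // Not valid JSON.
  }
  if (typeof parsed !== "object" || parsed === null) {
    return null;
  }
  const fields = parsed as Record<string, unknown>;
  const itemId = fields["itemId"];
  const consumedAt = fields["consumedAt"];
  if (typeof itemId !== "string" || typeof consumedAt !== "string") {
    return null;
  }
  return { itemId, consumedAt };
}
```

Because the reader copies known fields instead of passing the parsed object through, an additive producer change cannot leak unexpected data into the consumer's domain code.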

ADR-011: Delegate authentication to Keycloak

Status: accepted · Date: 2026-06-11

Decision: Authentication is delegated to a self-hosted Keycloak (OIDC). The api-gateway validates Keycloak-issued tokens; identity-service keeps only the domain parts: Visitor identities, visitor→account linking, the consent ledger, and Brand/BrandUser.

Why: Identity is a generic subdomain, and health-adjacent data must not ride on hand-rolled password auth — password reset, refresh rotation, token revocation and session handling come for free from a hardened provider.

Trade-off: A heavyweight JVM container joins the fleet and its configuration must be versioned alongside the code.

ADR-012: Services own and apply their migrations at startup

Status: accepted · Date: 2026-06-13

Decision: Each service carries its schema migrations in its own repository directory and applies them itself at startup, connecting as its own database role (ADR-009 already confines every role to its schema via search_path, so unqualified DDL and the migration version table land in the right schema automatically). Go services embed plain-SQL migrations and run them with goose as a library; TypeScript services use Kysely's built-in Migrator. Generated data access (sqlc) reads the same SQL files as its schema source.

Why: No extra containers, init steps or cross-language tooling — make up and make smoke keep working unchanged, and a service plus its database schema deploy as one unit, preserving exclusive data ownership.

Trade-off: Concurrent replicas of one service could race on startup migrations (acceptable while each service runs as a single instance; revisit before horizontal scaling), and there is no central migration audit across services.

ADR-013: GS1 anchoring — pragmatic-conformant resolver, strict GTIN, own SDK

Status: accepted · Date: 2026-06-13

Decision: This refines how PackyTrace implements the resolver promised in ADR-003.

  • Resolver: pragmatic now, conformant-ready. The Resolver parses the Digital Link and 302-redirects to the product page, but carries an internal linkType model (default gs1:pip, the product-information page). A GS1 Conformant Resolver surface (/.well-known/gs1resolver, application/linkset+json, content-negotiated link resolution) is therefore an additive later step, not a rewrite.
  • GTIN handling is strict. The canonical catalog key is a mod-10 check-digit-validated, zero-padded GTIN-14. GTIN-8/12/13/14 normalize to GTIN-14 before lookup; an invalid check digit is rejected, never silently mis-resolved. The catalog stores GTIN-14.
  • Canonical AIs: 01 (GTIN), 10 (lot), 21 (serial), 17 (expiry, YYMMDD, date-validated). Non-standard "friendly" path forms are dropped.
  • Parsing/validation lives in a standalone GS1 Digital Link SDK owned by the organisation and bound for open source — pure GS1 General Specifications / Syntax Dictionary logic with zero PackyTrace domain concepts (no Brand, catalog, or verdict ever enter it). It ships parity Go and TypeScript implementations over one shared golden test-vector corpus, and is the single source of truth; the existing frontend gs1-decoder.ts is demoted to an offline UX hint and replaced by the SDK's TS package. For the thin slice the SDK lives in-repo as an isolated, independently versioned package; extraction to its own repository and publication (Go module + npm) is post-slice.

Why: GS1 is the platform's namesake standard, so correctness (no wrong-product resolves) and a credible path to interoperability outweigh a certified resolver on day one. A generic SDK keeps the standard logic in one tested place across the Go backend and TS frontend, prevents parser drift, and turns a course requirement into a reusable open-source artefact. Because the SDK is generic GS1 logic and not PackyTrace code, it is an ordinary dependency under ADR-010, not cross-service code sharing — a boundary that holds only while it stays free of PackyTrace domain concepts.

Trade-off: Go/TS parity must be enforced by the shared vector corpus rather than a single binary; full resolver conformance (/.well-known/gs1resolver, linkset) and the open-source extraction are deferred.
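
The strict GTIN rule can be sketched as below. The function name normalizeGtin and the null-on-invalid return convention are assumptions, not the SDK's actual API. The check digit follows the standard GS1 mod-10 scheme, with weights alternating 3 and 1 across the first thirteen digits of the padded GTIN-14.

```typescript
// Hypothetical sketch: returns the canonical zero-padded GTIN-14,
// or null when the length or the check digit is invalid.
function normalizeGtin(input: string): string | null {
  // Only GTIN-8, GTIN-12, GTIN-13 and GTIN-14 lengths are accepted, digits only.
  if (!/^(\d{8}|\d{12}|\d{13}|\d{14})$/.test(input)) {
    return null;
  }
  const gtin14 = input.padStart(14, "0");

  // Mod-10 check: in a GTIN-14 the first digit has weight 3, then 1, 3, 1, ...
  // Leading zeros from padding contribute nothing, so shorter GTINs validate identically.
  let sum = 0;
  for (let i = 0; i < 13; i++) {
    const digit = gtin14.charCodeAt(i) - 48;
    sum += digit * (i % 2 === 0 ? 3 : 1);
  }
  const expectedCheck = (10 - (sum % 10)) % 10;
  const actualCheck = gtin14.charCodeAt(13) - 48;

  // An invalid check digit is rejected, never corrected.
  return expectedCheck === actualCheck ? gtin14 : null;
}
```

Entries in the shared golden test-vector corpus could take the same form: an input string paired with either the expected GTIN-14 or a rejection.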

ADR-014: AWS deployment (single x86 box, Compose, Terraform)

Status: accepted · Date: 2026-06-16

Context: The platform must run on a real public URL within AWS free-plan credits (~$100–200, ~6 months). The managed-service shape (ECS Fargate per service + RDS + MSK) costs ~$200+/month — MSK alone exhausts the credits in weeks — so it is not viable on a free budget. The HTTPS requirement is hard: barcode scanning needs camera access, which browsers only grant over TLS.

Decision: Deploy the existing container stack onto one EC2 instance (c7i-flex.large, 4 GB x86_64) running Docker Compose, provisioned with Terraform (deployment/aws/, account 398152419692, eu-central-1). The instance type is constrained by the AWS Free Plan, which only permits launching a fixed allowlist of types (ARM t4g.medium is not on it; c7i-flex.large is the roomiest 4 GB option available).

  • Self-hosted data plane. Postgres, the broker, and Keycloak run as containers on the box, not as RDS/MSK. The schema-per-service role isolation (ADR-009) is preserved exactly — the prod Postgres init creates the same GRANT-limited roles, with passwords injected from SSM instead of the dev defaults. Keycloak gets its own Postgres database (production mode forbids the dev H2 store) and stays internal.
  • Redpanda replaces Apache Kafka in the deployed stack only (Kafka-API compatible, JVM-free) so the broker fits the box's RAM. Dev/CI keep Apache Kafka; franz-go is unchanged. No contract or ADR-010 change — same topics, same envelope.
  • Caddy is the only public surface. It terminates TLS (Let's Encrypt for the configured host), serves the SPA, and proxies /api/*, /01/*, /health — same-origin, no CORS. Postgres, Redpanda, Keycloak, and all 7 services are internal-only. Because auth is server-side ROPC (ADR-011), Keycloak is never exposed and the token issuer stays its internal address.
  • Images are built amd64 in CI and pushed to GHCR; the box pulls via a read:packages PAT. Shell access is SSM Session Manager (no SSH, no key pairs). Secrets are generated by Terraform and stored as SSM SecureStrings; state and data persist on a separate EBS volume so the instance is replaceable.

Why: A single box is the only shape that fits the budget while keeping the architecture's load-bearing invariants (DB-enforced tenant/schema isolation, Keycloak auth, contracts-only coupling) intact. Terraform makes the whole footprint reproducible and destroy-able to stop spending.

Trade-off: No high availability — the box is a single point of failure, and a replacement incurs cold-start + cert re-issue. This is deployment topology only; it does not relax any service boundary (ADR-008/009/010) or the auth/privacy rules (ADR-001/005/011). Migrating to managed services later is a deployment change, not a code change. Backups (EBS snapshots / pg_dump to S3) and a CDN are deferred.