Telemetry-Driven Development: Observability from Day One
Building observable systems with the OTEL doctrine and :telemetry
Prismatic Engineering
Prismatic Platform
Observability Is Not Optional
The OTEL doctrine (Observability Telemetry Enforcement Layer) is one of the platform's 18 enforcement pillars. It mandates that every GenServer, controller, LiveView, and external API call must emit telemetry events. Observability is not something you bolt on after a production incident -- it is a design constraint from day one.
The :telemetry Foundation
The :telemetry library gives Elixir a lightweight, VM-native event system. Each event has three parts: a name (a list of atoms), a map of measurements, and a map of metadata:
:telemetry.execute(
  [:prismatic, :osint, :adapter, :execute],
  %{duration: duration_ms, result_count: length(results)},
  %{adapter: adapter_name, query: sanitized_query}
)
The key principle is separation of emission and handling. The code that does work emits events. Completely separate code decides what to do with those events -- log them, aggregate them, send them to Prometheus, or trigger alerts.
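To make the separation concrete, here is a minimal sketch of the handling side. The module name AdapterLogger, the handler id, and the sample values are assumptions for illustration; only the event name comes from the example above. The handler simply forwards what it sees to the attaching process.

```elixir
# Standalone script; inside a Mix project, list :telemetry in mix.exs instead.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule AdapterLogger do
  # Hypothetical handler. It runs in the process that emits the event,
  # so it should stay fast and must not raise (a failing handler is detached).
  def handle_event([:prismatic, :osint, :adapter, :execute], measurements, metadata, config) do
    send(config.notify, {:adapter_executed, metadata.adapter, measurements.duration})
    :ok
  end
end

# Attach once, typically at application start. The handler id must be unique.
:ok =
  :telemetry.attach(
    "adapter-logger",
    [:prismatic, :osint, :adapter, :execute],
    &AdapterLogger.handle_event/4,
    %{notify: self()}
  )

# The emitting code is unchanged and knows nothing about the handler.
:telemetry.execute(
  [:prismatic, :osint, :adapter, :execute],
  %{duration: 42, result_count: 3},
  %{adapter: :whois, query: "example.org"}
)

receive do
  {:adapter_executed, adapter, duration} ->
    IO.puts("adapter=#{adapter} duration=#{duration}")
end
```

Swapping the handler for one that updates a metric or raises an alert requires no change to the code that emits the event.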
GenServer Telemetry
Every GenServer in the platform emits telemetry for three lifecycle events:
init/1
def init(config) do
  start_time = System.monotonic_time()

  # ... initialization logic that builds the process state ...
  state = %{config: config}

  # Duration is in native time units; convert it before reporting milliseconds.
  duration = System.monotonic_time() - start_time

  :telemetry.execute(
    [:prismatic, :genserver, :init],
    %{duration: duration},
    %{module: __MODULE__, config_keys: Map.keys(config)}
  )

  {:ok, state}
end
handle_call/handle_cast
Every message handler emits duration, queue length, and result status. This data reveals which GenServers are bottlenecks and which message types are slowest.
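A handler instrumented this way might look like the sketch below. The event name [:prismatic, :genserver, :handle_call], the measurement keys, and the InstrumentedCounter module are assumptions; the text does not give the platform's actual names.

```elixir
# Standalone script; inside a Mix project, list :telemetry in mix.exs instead.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule InstrumentedCounter do
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)
  def increment(pid), do: GenServer.call(pid, :increment)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, count) do
    start_time = System.monotonic_time()
    new_count = count + 1

    # Number of messages still waiting in this process's mailbox.
    {:message_queue_len, queue_len} = Process.info(self(), :message_queue_len)

    :telemetry.execute(
      [:prismatic, :genserver, :handle_call],
      %{duration: System.monotonic_time() - start_time, queue_length: queue_len},
      %{module: __MODULE__, message: :increment, result: :ok}
    )

    {:reply, new_count, new_count}
  end
end

{:ok, counter} = InstrumentedCounter.start_link(0)
InstrumentedCounter.increment(counter)
```

In a real codebase the timing and emission would usually be extracted into a shared helper or macro so that each handler does not repeat it.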
terminate/2
Termination events capture the reason and the final state size. Unexpected terminations (anything other than :normal or :shutdown) trigger escalation alerts.
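As a rough sketch, a terminate/2 callback could report the reason and state size as follows. The event name, the metadata keys, and the use of :erlang.external_size/1 as the size measure are assumptions, not the platform's confirmed implementation.

```elixir
# Standalone script; inside a Mix project, list :telemetry in mix.exs instead.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule InstrumentedWorker do
  use GenServer

  def start_link(state), do: GenServer.start_link(__MODULE__, state)

  @impl true
  def init(state) do
    # Without trapping exits, terminate/2 is skipped when a supervisor
    # or linked process shuts this process down with an exit signal.
    Process.flag(:trap_exit, true)
    {:ok, state}
  end

  @impl true
  def terminate(reason, state) do
    :telemetry.execute(
      [:prismatic, :genserver, :terminate],
      # Size of the state in bytes when encoded in the external term format.
      %{state_size_bytes: :erlang.external_size(state)},
      %{module: __MODULE__, reason: reason, expected?: reason in [:normal, :shutdown]}
    )
  end
end
```

An alerting handler can then escalate on any event whose expected? flag is false. Note that terminate/2 is not guaranteed to run (for example after a brutal kill), so it cannot be the only signal for crashes.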
Controller and LiveView Instrumentation
Phoenix already emits telemetry for HTTP requests via Plug.Telemetry. The platform extends this with:
Request Context
:telemetry.execute(
  [:prismatic, :request, :complete],
  %{duration: duration_ms, status: status_code},
  %{
    path: conn.request_path,
    method: conn.method,
    user_id: get_user_id(conn),
    request_id: Logger.metadata()[:request_id]
  }
)
LiveView Mount and Event Handling
LiveView mounts and event handlers emit timing data that feeds into the performance gate system. The PERF doctrine sets thresholds for these timings. Any violation is flagged in the telemetry dashboard and blocks deployment if it persists across multiple measurements.
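One way to check timings like these is to listen to the stop events that Phoenix LiveView itself emits for mounts and event handlers. The sketch below is illustrative only: the 100 ms budget, the LiveViewBudget module, and the message it sends are assumptions, since the doctrine's actual limits are not stated here.

```elixir
# Standalone script; inside a Mix project, list :telemetry in mix.exs instead.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule LiveViewBudget do
  # Hypothetical budget, not the PERF doctrine's real threshold.
  @budget_ms 100

  @events [
    [:phoenix, :live_view, :mount, :stop],
    [:phoenix, :live_view, :handle_event, :stop]
  ]

  def attach(notify_pid) do
    :telemetry.attach_many("liveview-budget", @events, &__MODULE__.handle_event/4, notify_pid)
  end

  def handle_event([:phoenix, :live_view, kind, :stop], %{duration: native}, _metadata, notify_pid) do
    # LiveView reports durations in native time units.
    ms = System.convert_time_unit(native, :native, :millisecond)
    if ms > @budget_ms, do: send(notify_pid, {:perf_violation, kind, ms})
    :ok
  end
end
```

A gate that blocks deployment only on persistent violations would additionally need to count violations per view over several measurements, which is left out here.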
Span Creation for Distributed Tracing
For operations that span multiple processes or external calls, the platform creates trace spans:
def execute_investigation(case_id) do
  span_id = generate_span_id()
  metadata = %{case_id: case_id, span_id: span_id}

  :telemetry.span(
    [:prismatic, :investigation, :execute],
    metadata,
    fn ->
      result = do_investigation(case_id)
      # Start metadata is not merged into the stop event, so repeat the IDs here.
      {result, Map.put(metadata, :entity_count, length(result.entities))}
    end
  )
end
:telemetry.span/3 automatically emits start and stop events (or exception events on failure), with duration calculated precisely using monotonic time. Spans can be nested and correlated using span IDs.
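The following sketch shows what a consumer of those span events can look like. SpanObserver and the messages it sends are hypothetical. One detail worth noting: the stop event carries only the metadata returned by the span function, so the span ID is returned there again to keep start and stop events correlatable.

```elixir
# Standalone script; inside a Mix project, list :telemetry in mix.exs instead.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule SpanObserver do
  @prefix [:prismatic, :investigation, :execute]

  def attach(notify_pid) do
    :telemetry.attach_many(
      "investigation-span",
      [@prefix ++ [:start], @prefix ++ [:stop], @prefix ++ [:exception]],
      &__MODULE__.handle_event/4,
      notify_pid
    )
  end

  def handle_event(event, measurements, metadata, notify_pid) do
    case List.last(event) do
      :start ->
        send(notify_pid, {:span_start, metadata.span_id})

      phase when phase in [:stop, :exception] ->
        # Both carry :duration in native time units.
        ms = System.convert_time_unit(measurements.duration, :native, :millisecond)
        send(notify_pid, {:span_end, phase, metadata.span_id, ms})
    end

    :ok
  end
end

:ok = SpanObserver.attach(self())

:telemetry.span([:prismatic, :investigation, :execute], %{span_id: "abc"}, fn ->
  # The second tuple element becomes the stop event's metadata.
  {:done, %{span_id: "abc"}}
end)
```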
Metric Collection and Aggregation
Raw telemetry events are ephemeral -- they fire and are gone. The platform uses Telemetry.Metrics to define persistent aggregations:
prismatic.osint.adapter.execute.count
prismatic.osint.adapter.execute.result_count
prismatic.genserver.mailbox_length
prismatic.request.duration
These metrics are exported to Prometheus for long-term storage and Grafana for visualization.
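Definitions for the metrics named above could be sketched with Telemetry.Metrics as follows. The metric types chosen here (counter, summary, last_value, distribution) and the PrismaticMetrics module are assumptions, since the source only preserves the metric names. A reporter library is still needed to turn these definitions into exported values.

```elixir
# Standalone script; inside a Mix project, list :telemetry_metrics in mix.exs instead.
Mix.install([{:telemetry_metrics, "~> 1.0"}])

defmodule PrismaticMetrics do
  import Telemetry.Metrics

  # Each name is split into an event name and a measurement key:
  # "prismatic.request.duration" listens to [:prismatic, :request]
  # and reads the :duration measurement.
  def metrics do
    [
      # Counts events; the measurement value itself is ignored.
      counter("prismatic.osint.adapter.execute.count"),
      summary("prismatic.osint.adapter.execute.result_count"),
      last_value("prismatic.genserver.mailbox_length"),
      distribution("prismatic.request.duration", unit: :millisecond)
    ]
  end
end
```

The list returned by metrics/0 is what gets passed to a reporter when it is started in the supervision tree.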
Structured Logging
The platform uses structured logging exclusively. No unstructured string interpolation:
require Logger

# Correct: structured metadata
Logger.info("Investigation completed",
  case_id: case_id,
  entity_count: length(entities),
  duration_ms: duration,
  source: :dd_engine
)

# Incorrect: unstructured string
# Logger.info("Investigation #{case_id} found #{length(entities)} entities in #{duration}ms")
Structured logs are machine-parseable, searchable, and can be correlated across services using request IDs and span IDs.
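Correlation usually relies on process-level metadata, which Logger attaches to every later log call in the same process. The sketch below uses made-up IDs; where the platform actually sets them (a plug, a job wrapper, or elsewhere) is not stated in the text.

```elixir
require Logger

# Set once per process, for example at the start of a request or job.
# Every later log call in this process carries these keys automatically.
Logger.metadata(request_id: "req-123", span_id: "span-456")

# Per-call metadata is merged with the process-level keys.
Logger.info("Investigation completed", case_id: 42, entity_count: 7)
```

The keys only show up in the output if the configured log formatter is set to include them.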
The ErrorFeed Real-Time Dashboard
The ErrorFeed is a LiveView dashboard at /admin/error-feed that provides real-time visibility into platform errors:
Features
Architecture
Application Code
|
v
StreamLoggerBackend (captures all log levels)
|
v
PatternTracker (classifies and groups)
|
v
PubSub "error_patterns" topic
|
v
ErrorFeedLive (renders in browser)
The StreamLoggerBackend is a custom Logger backend that intercepts all log events at runtime. It filters for error-level events and forwards them to the PatternTracker, which maintains a sliding window of recent errors and identifies patterns.
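The sliding-window idea can be illustrated with a small pure module. This is a hypothetical sketch, not the platform's PatternTracker: the 60-second window, the digit-masking fingerprint, and all names are assumptions, and the PubSub broadcast step is omitted.

```elixir
defmodule PatternWindow do
  # Hypothetical window length in milliseconds.
  @window_ms 60_000

  def new, do: []

  # Record an error message seen at `now_ms`, dropping entries older than the window.
  def record(window, message, now_ms) do
    [{now_ms, fingerprint(message)} | prune(window, now_ms)]
  end

  # Occurrences per fingerprint within the window, most frequent first.
  def patterns(window, now_ms) do
    window
    |> prune(now_ms)
    |> Enum.frequencies_by(fn {_ts, fp} -> fp end)
    |> Enum.sort_by(fn {_fp, count} -> count end, :desc)
  end

  defp prune(window, now_ms) do
    Enum.filter(window, fn {ts, _fp} -> now_ms - ts < @window_ms end)
  end

  # Mask digit runs so "timeout after 30ms" and "timeout after 45ms" group together.
  defp fingerprint(message), do: String.replace(message, ~r/\d+/, "N")
end

window =
  PatternWindow.new()
  |> PatternWindow.record("timeout after 30ms", 1_000)
  |> PatternWindow.record("timeout after 45ms", 2_000)
  |> PatternWindow.record("connection refused", 3_000)

IO.inspect(PatternWindow.patterns(window, 3_000))
# → [{"timeout after Nms", 2}, {"connection refused", 1}]
```

In a running system this state would live in a process, with each update followed by a broadcast on the "error_patterns" topic.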
OTEL Doctrine Enforcement
The OTEL pillar is enforced at two levels:
1. Pre-commit: violations are reported as warnings, so they can be fixed before pushing.
2. CI/CD: mix check.doctrines --pillar otel runs a comprehensive audit of all modules, flagging any that lack the required telemetry integration.
Violations are advisory in pre-commit (warning) but blocking in CI (the build fails). This gives developers a chance to fix issues before pushing while ensuring nothing reaches production without proper observability.
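The advisory-versus-blocking split can be sketched as a single check with two modes. Everything here is hypothetical: the real rules behind mix check.doctrines are not described in the text, and a plain substring scan is only a crude stand-in for an actual audit.

```elixir
defmodule OtelAudit do
  # Flags source files that define a GenServer but never call :telemetry.
  # `sources` is a list of {path, source_code} tuples.
  def violations(sources) do
    for {path, code} <- sources,
        String.contains?(code, "use GenServer"),
        not String.contains?(code, ":telemetry."),
        do: path
  end

  # Pre-commit would pass `false` (warn only); CI would pass `true` (fail the build).
  def exit_code(violations, blocking?) do
    if blocking? and violations != [], do: 1, else: 0
  end
end
```

Running the same check in both places keeps the warning a developer sees locally identical to the failure CI would report.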
If you cannot observe it, you cannot improve it. If you cannot measure it, you cannot manage it.