Visualizing Metrics and Trace Data with Cloud Run Sidecars
Introduction
Recently, sidecar container support was added to Cloud Run.
Previously, a Cloud Run service could run only a single container, but now you can deploy two or more containers together.
This makes it possible to run Envoy as a sidecar, or to place an OpenTelemetry Collector next to the application and offload distributed tracing configuration from the application itself.
Goal
Using this mechanism, I will walk through sending Prometheus-compatible metrics and trace information from a Cloud Run application through a sidecar OpenTelemetry Collector: to Victoria Metrics and Grafana Tempo when running locally, and to Cloud Monitoring and Cloud Trace when running on Cloud Run, where they can be visualized.
The final architecture is expected to look like this:

The verification code used this time is managed in the following repository:
Setup
Preparation
First, we will add the implementation to the application side to output Prometheus metrics and trace information.
I am using a sample application built with Echo.
Since the Echo implementation itself is out of scope this time, I will only provide a brief excerpt.
Metrics
Echo provides a contrib library for exposing Prometheus metrics, so we will use that.
import (
	"github.com/labstack/echo-contrib/echoprometheus"
	"github.com/labstack/echo/v4"
)

func main() {
	e := echo.New()
	// DefaultComponentName is a constant defined elsewhere in the application.
	e.Use(echoprometheus.NewMiddleware(DefaultComponentName))
	e.GET("/metrics", echoprometheus.NewHandler())
	e.Logger.Fatal(e.Start(":8080"))
}
By adding the implementation above, you can output HTTP-related metrics.
# HELP app_request_duration_seconds The HTTP request latencies in seconds.
# TYPE app_request_duration_seconds histogram
app_request_duration_seconds_bucket{code="200",host="app:8080",method="GET",url="/metrics",le="0.005"} 0
app_request_duration_seconds_bucket{code="200",host="app:8080",method="GET",url="/metrics",le="0.01"} 4
app_request_duration_seconds_bucket{code="200",host="app:8080",method="GET",url="/metrics",le="0.025"} 35
app_request_duration_seconds_bucket{code="200",host="app:8080",method="GET",url="/metrics",le="0.05"} 36
app_request_duration_seconds_bucket{code="200",host="app:8080",method="GET",url="/metrics",le="0.1"} 37
app_request_duration_seconds_bucket{code="200",host="app:8080",method="GET",url="/metrics",le="0.25"} 37
app_request_duration_seconds_bucket{code="200",host="app:8080",method="GET",url="/metrics",le="0.5"} 37
app_request_duration_seconds_bucket{code="200",host="app:8080",method="GET",url="/metrics",le="1"} 37
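The output above is truncated, but since these are Prometheus histograms, latency percentiles can be derived from the buckets once the metrics reach a data store. As a sketch (the exact label set depends on your setup), a 5-minute p99 latency:

```
histogram_quantile(0.99, sum by (le) (rate(app_request_duration_seconds_bucket[5m])))
```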
Trace Information
Tracing follows the OpenTelemetry implementation.
In this setup, the exporter sends spans to the OpenTelemetry Collector running as a sidecar.
The OpenTelemetry Collector can receive requests via both HTTP and gRPC, but we are using gRPC in this instance.
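As an aside, if you prefer OTLP over HTTP, the receiver can expose both protocols side by side; a sketch (4318 is the collector's default OTLP/HTTP port):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"
```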
Create an exporter as follows:
func NewExporter(ctx context.Context, cfg *config.Config) (sdktrace.SpanExporter, error) {
	client := otlptracegrpc.NewClient(
		otlptracegrpc.WithInsecure(),
		// Specify the OpenTelemetry Collector endpoint here.
		// When using docker-compose locally, this is otel:4317.
		otlptracegrpc.WithEndpoint(cfg.OtelCollectorEndpoint),
		otlptracegrpc.WithDialOption(grpc.WithBlock()),
	)
	exporter, err := otlptrace.New(ctx, client)
	if err != nil {
		return nil, err
	}
	return exporter, nil
}
Create a tracer provider using the exporter created above.
r := resource.NewWithAttributes(
	semconv.SchemaURL,
	semconv.ServiceNameKey.String(DefaultComponentName),
)
traceProvider := sdktrace.NewTracerProvider(
	sdktrace.WithBatcher(exporter),
	sdktrace.WithSampler(sdktrace.AlwaysSample()),
	sdktrace.WithResource(r),
)
otel.SetTracerProvider(traceProvider)
With the provider registered, use a tracer inside any handler to generate spans.
func home(c echo.Context) error {
	_, span := tracer.Start(c.Request().Context(), "home")
	defer span.End()
	return c.JSON(http.StatusOK, nil)
}
OpenTelemetry Collector
We will build the OpenTelemetry Collector as a component to receive metrics and trace information.
Local
This describes the OpenTelemetry Collector configuration for running locally.
config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
  prometheus:
    config:
      global:
        external_labels: {}
      scrape_configs:
        - job_name: cloud-run-otel
          scrape_interval: 10s
          static_configs:
            - targets:
                - localhost:8888
        - job_name: cloud-run
          scrape_interval: 10s
          metrics_path: /metrics
          static_configs:
            - targets:
                - app:8080

exporters:
  prometheusremotewrite:
    endpoint: http://victoriametrics:8428/api/v1/write
  otlp:
    endpoint: http://tempo:4317
    tls:
      insecure: true

service:
  telemetry:
    logs:
      level: WARN
      encoding: json
  extensions:
    - health_check
  pipelines:
    metrics:
      receivers:
        - prometheus
      exporters:
        - prometheusremotewrite
    traces:
      receivers:
        - otlp
      exporters:
        - otlp

extensions:
  health_check:
Cloud Run
This describes the OpenTelemetry Collector configuration for running on Cloud Run.
config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
  prometheus:
    config:
      global:
        external_labels:
          service: ${K_SERVICE}
          revision: ${K_REVISION}
      scrape_configs:
        - job_name: cloud-run-otel
          scrape_interval: 10s
          static_configs:
            - targets:
                - localhost:8888
        - job_name: cloud-run
          scrape_interval: 10s
          metrics_path: /metrics
          static_configs:
            - targets:
                - localhost:8080

exporters:
  googlemanagedprometheus:
    project: ${PROJECT_ID}
  googlecloud:
    trace:
      endpoint: cloudtrace.googleapis.com:443

service:
  telemetry:
    logs:
      level: WARN
      encoding: json
  extensions:
    - health_check
  pipelines:
    metrics:
      receivers:
        - prometheus
      processors:
        - batch
        - resourcedetection
        - resource
      exporters:
        - googlemanagedprometheus
    traces:
      receivers:
        - otlp
      exporters:
        - googlecloud

processors:
  batch:
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s
  resourcedetection:
    detectors:
      - env
      - gcp
  resource:
    attributes:
      - key: service.name
        value: ${K_SERVICE}
        action: upsert
      - key: service.instance.id
        from_attribute: faas.id
        action: insert

extensions:
  health_check:
Configuration Details
First, we describe the receiver configuration for the OpenTelemetry Collector to receive metrics and tracing information. As mentioned earlier, we define settings to receive tracing information via gRPC.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
For metrics, we use a configuration similar to Prometheus settings to scrape /metrics from the application container. To make monitoring easier during service operation, we add the Cloud Run reserved environment variables K_SERVICE and K_REVISION as labels to all metrics using global.external_labels, allowing us to distinguish metrics by Cloud Run revision.
receivers:
  prometheus:
    config:
      global:
        external_labels:
          # Adding Cloud Run revision information as metric labels
          service: ${K_SERVICE}
          revision: ${K_REVISION}
      scrape_configs:
        - job_name: cloud-run-otel
          scrape_interval: 10s
          static_configs:
            - targets:
                - localhost:8888
        - job_name: cloud-run
          scrape_interval: 10s
          metrics_path: /metrics
          static_configs:
            - targets:
                - app:8080
When running on Cloud Run, the application container and the OpenTelemetry Collector belong to the same network, so static_configs must be changed to the following.
static_configs:
  - targets:
      - localhost:8080
Next, we describe the exporter settings for sending the metrics and trace information received by the OpenTelemetry Collector to the actual data stores.
Locally, we use Victoria Metrics as the data store for metrics.
exporters:
  prometheusremotewrite:
    endpoint: http://victoriametrics:8428/api/v1/write
When running on Cloud Run, we use Google-managed Prometheus, so we use the following configuration for the exporter.
*Note: PROJECT_ID is set as an environment variable in the OpenTelemetry Collector container on Cloud Run.
exporters:
  googlemanagedprometheus:
    project: ${PROJECT_ID}
For local trace information, we use Grafana Tempo.
exporters:
  otlp:
    endpoint: http://tempo:4317
    tls:
      insecure: true
When running on Cloud Run, we use Cloud Trace, so we use the following configuration for the exporter.
exporters:
  googlecloud:
    trace:
      endpoint: cloudtrace.googleapis.com:443
Then, we describe the service section, which wires these components into pipelines, starting with the local configuration.
service:
  telemetry:
    logs:
      level: WARN
      encoding: json
  extensions:
    - health_check
  pipelines:
    metrics:
      receivers:
        - prometheus
      exporters:
        - prometheusremotewrite
    traces:
      receivers:
        - otlp
      exporters:
        - otlp

extensions:
  health_check:
For Cloud Run, we use the following configuration.
service:
  telemetry:
    logs:
      level: WARN
      encoding: json
  extensions:
    - health_check
  pipelines:
    metrics:
      receivers:
        - prometheus
      processors:
        - batch
        - resourcedetection
        - resource
      exporters:
        - googlemanagedprometheus
    traces:
      receivers:
        - otlp
      exporters:
        - googlecloud
Additionally, for Cloud Run, we add processor settings that attach the resource attributes required by Google-managed Prometheus. For the resourcedetection settings, refer to the resource detection processor documentation.
processors:
  batch:
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s
  resourcedetection:
    detectors:
      - env
      - gcp
  resource:
    attributes:
      - key: service.name
        value: ${K_SERVICE}
        action: upsert
      - key: service.instance.id
        from_attribute: faas.id
        action: insert
Data Stores
We won't go into the details of Victoria Metrics or Tempo configurations as they are out of scope for this article, but the settings for running them locally are provided in the GitHub repository if you are interested.
Verification
Verification Locally
I have created a Docker Compose environment so that we can run all the settings mentioned above locally. Let's start it up and check the operations.
Use the following command to start the environment.
docker compose up -d
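For reference, here is a minimal sketch of the Compose topology implied by the configuration above; the service and image names are my assumptions, and the actual file in the repository may differ:

```yaml
services:
  app:
    build: .               # the Echo application, serving :8080
    environment:
      OTEL_COLLECTOR_ENDPOINT: otel:4317
  otel:
    image: otel/opentelemetry-collector-contrib
    volumes:
      - ./config.yaml:/etc/otelcol-contrib/config.yaml
  victoriametrics:
    image: victoriametrics/victoria-metrics   # remote-write target on :8428
  tempo:
    image: grafana/tempo                      # OTLP target on :4317
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
```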
Once it is up, access Grafana at localhost:3000.
The default login credentials are admin for both the username and password.
After logging in, verify the metrics and trace information in Explore.
For metrics, with the data source set to VictoriaMetrics, you can confirm that Echo's metrics are being visualized by running a query like the one below.
promhttp_metric_handler_requests_total
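The middleware's histogram also yields a request counter; as a sketch, the total request throughput over the last five minutes can be queried with:

```
sum(rate(app_request_duration_seconds_count[5m]))
```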

For trace information, you can check the trace information emitted from the application by setting the data source to Tempo.

Verification on Cloud Run
Now that the local verification is complete, I will describe the configuration for actually running it on Cloud Run.
The Cloud Run settings use a Knative manifest.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: cloud-run-mco
  annotations:
    run.googleapis.com/launch-stage: BETA
    run.googleapis.com/ingress: all
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: '1'
    spec:
      containerConcurrency: 1
      timeoutSeconds: 300
      serviceAccountName: "" # Specify a Google Service Account with roles/cloudtrace.agent and roles/monitoring.metricWriter permissions
      containers:
        - name: app
          image: "" # Specify the built application container
          env:
            - name: OTEL_COLLECTOR_ENDPOINT
              value: localhost:4317
          ports:
            - name: http1
              containerPort: 8080
          resources:
            limits:
              cpu: 500m
              memory: 256Mi
          startupProbe:
            timeoutSeconds: 5
            periodSeconds: 5
            failureThreshold: 3
            httpGet:
              path: /
              port: 8080
        - image: "" # Specify the built OpenTelemetry Collector container
          env:
            - name: PROJECT_ID
              value: "" # Specify the GCP project you are running in
          resources:
            limits:
              cpu: 200m
              memory: 128Mi
          startupProbe:
            initialDelaySeconds: 10
            timeoutSeconds: 10
            periodSeconds: 30
            failureThreshold: 3
            httpGet:
              path: /
              port: 13133 # default port of the collector's health_check extension
You can confirm that the metrics and trace information are visualized in Cloud Monitoring and Cloud Trace, respectively.


Summary
By delegating the transmission and collection of metrics and trace information from the application implementation to the OpenTelemetry Collector, it is now possible to achieve a similar monitoring mechanism not only in Cloud Run but in any environment.
OpenTelemetry Collector settings must be managed in YAML, and the local and Cloud Run configurations often diverge. While not introduced in this article, tools like Cue can keep the common parts shared while applying per-environment differences, which makes managing the configuration files much simpler.
If you are interested, please check the GitHub repository.
If you have any feedback or suggestions, I look forward to hearing from you in the comments or on X (Twitter).