# Best Practices for Product Telemetry with OpenTelemetry

## Deciding on a Signal Type (Traces vs Metrics vs Logs)

There are three main signal types in OpenTelemetry (and in observability generally), and each one has tradeoffs that suit it to specific use cases. In OpenTelemetry, each signal type has a specific structure.

- Metrics contain groups of counter or measurement records, and these records contain numeric properties and metric scopes
- Traces are groupings of individual events (spans) that share parent/child relationships or context
- Logs contain a text body

### Metrics

If an event or piece of code you want telemetry for can be represented numerically, then metrics are the best fit. Examples include:

- User login event (increment by 1)
- Query event in AI Catalog (increment by 1)
- Response time in milliseconds (add to distribution)
- RAM utilized by a machine in megabytes (gauge the signal)

This does not necessarily mean that the primary interest in the telemetry is the number of user login events. You could be collecting this telemetry event because you're more interested in the characteristics of who logged in, which you can add via attributes. The metric signal type is still the best fit because the telemetry record is inherently a metric.

OpenTelemetry packages metrics of the same name together, grouping the recorded events into data points by attribute set. It uses a set structure and fixed properties for all metrics so that payload contents stay consistent. This makes metrics easier for metric backends to store and query, and you'll be able to see the data points that carry each specific set of labels.
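
The grouping described above can be sketched in plain Python. This is a toy model written for illustration only, not the OpenTelemetry SDK: the `CounterModel` class, the `user.login` metric name, and the second user ID are hypothetical. It shows the idea that a counter keeps one running sum per distinct attribute set.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class CounterModel:
    """Toy stand-in for a counter instrument (NOT the OpenTelemetry SDK)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # One running sum per distinct attribute set.
        self._sums = defaultdict(int)

    def add(self, amount: int, attributes: Optional[Dict[str, str]] = None) -> None:
        if amount < 0:
            raise ValueError("a monotonic counter only accepts non-negative values")
        # A frozenset of items makes the attribute set usable as a dict key.
        key = frozenset((attributes or {}).items())
        self._sums[key] += amount

    def data_points(self) -> List[dict]:
        """Return one data point per attribute set seen so far."""
        return [
            {"attributes": dict(key), "value": value}
            for key, value in self._sums.items()
        ]


# Hypothetical "user login" counter: each login increments by 1.
logins = CounterModel("user.login")
logins.add(1, {"user.id": "test123"})
logins.add(1, {"user.id": "test123"})
logins.add(1, {"user.id": "other456"})

points = logins.data_points()
for point in points:
    print(logins.name, point)
```

Even though the instrument only counts logins, the attributes keep the "who logged in" detail queryable, which is the point made in the paragraph on attributes.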
#### Visual Example of a Metric

```
{
  "scopeMetrics": [
    {
      "scope": {
        "name": "test_service",
        "version": "dev-build"
      },
      "metrics": [
        {
          "name": "test",
          "description": "No description.",
          "unit": "#",
          "sum": {
            "dataPoints": [
              {
                "attributes": [
                  {
                    "key": "user.id",
                    "value": {
                      "stringValue": "test123"
                    }
                  }
                ],
                "startTimeUnixNano": "1767886298183762000",
                "timeUnixNano": "1767886298183888000",
                "asInt": "1"
              }
            ],
            "aggregationTemporality": 2
          }
        }
      ]
    }
  ]
}
```

### Traces

When the context of a group of events matters, such as a request path or the route of multiple interconnected events, traces are the signal type to use. The most common use case is request tracing across distributed systems, with user journeys being another possibility. In tracing, the context is the important part. If the events in code do not matter in relation to each other, they should not be recorded as a trace.

Traces are made up of spans. Spans are the individual events, such as a request to Service A, and a trace is the container that groups spans together. Instrumentation performs this grouping by sharing context, which is created on the initial call and typically passed via request headers. The context could also be passed via function parameters for a use case like tracking user journeys within a single application.

Spans have parent/child relationships. The context contains the most recent span, so when a new span is created and there is context to extract, the new span can refer to the previous span as its parent. Consider the following example:

- Request from client to Service A
- The request to Service A results in a call to Service B, where the chain of events ends

Trace telemetry for this event would be a single trace. It would contain the parent span representing the call from the client to Service A, and that span's child span, which represents the call from Service A to Service B. If Service B were also instrumented, there would be a third span.
This span would be the child of the span representing the call from Service A to Service B, and it would have no child spans of its own because this API route generates no further requests.

#### Visual Example of a Trace Payload

In this example, Service A calls Service B, which calls Service C, and the chain ends there. Note the null `parent_id` for the root span, and how the `parent_id` of each subsequent span matches its parent's `span_id`.

```
{
  "name": "serviceC.process",
  "context": {
    "trace_id": "0x16beb6ad8f552cd02b12810da6e7ba40",
    "span_id": "0x3f9017e98c9dd191",
    "trace_state": "[]"
  },
  "kind": "SpanKind.INTERNAL",
  "parent_id": "0x1d8619fc37fb1436",
  "start_time": "2026-01-08T20:27:42.584024Z",
  "end_time": "2026-01-08T20:27:42.584052Z"
}
{
  "name": "serviceB.process",
  "context": {
    "trace_id": "0x16beb6ad8f552cd02b12810da6e7ba40",
    "span_id": "0x1d8619fc37fb1436",
    "trace_state": "[]"
  },
  "kind": "SpanKind.INTERNAL",
  "parent_id": "0x5df37dadbdc8180b",
  "start_time": "2026-01-08T20:27:42.582757Z",
  "end_time": "2026-01-08T20:27:42.585265Z"
}
{
  "name": "serviceA.process",
  "context": {
    "trace_id": "0x16beb6ad8f552cd02b12810da6e7ba40",
    "span_id": "0x5df37dadbdc8180b",
    "trace_state": "[]"
  },
  "kind": "SpanKind.INTERNAL",
  "parent_id": null,
  "start_time": "2026-01-08T20:27:42.581063Z",
  "end_time": "2026-01-08T20:27:42.585825Z"
}
```

### Logs

When many developers think of logs, they think of application/developer logs. If you want those logs to be part of your telemetry, use the log signal.

Log telemetry also acts a bit like a catch-all. The main component of the OpenTelemetry log payload is the log body, so you can send multi-line data as telemetry just by exporting logs. For data that doesn't clearly fit metrics or traces, logs are usually the best fit.
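
As a rough illustration of the multi-line body point above, the sketch below uses only Python's standard `logging` module and emits a pretty-printed JSON string as a single log record. It assumes nothing about how the record is exported: the in-memory `ListHandler`, the sample payload, and the `request_id` attribute are hypothetical stand-ins, and in a real setup a handler that forwards records to OpenTelemetry would take the place of `ListHandler`.

```python
import json
import logging


class ListHandler(logging.Handler):
    """Collects records in memory so the example needs no exporter."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("my_logger")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep this example's output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

# A multi-line JSON payload that would be awkward to split into attributes.
payload = {"query": "SELECT 1", "filters": ["a", "b"]}
body = json.dumps(payload, indent=2)

# The whole string becomes the log body; `extra` attaches a key/value
# attribute (hypothetical name) to the same record.
logger.warning(body, extra={"request_id": "req-123"})

record = handler.records[0]
print(record.name, record.levelname)
print(record.getMessage())
```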

Examples of good log telemetry:

- JSON payloads
- Large strings that can't be parsed into individual attributes
- Multi-line messages

The reason to use logs, rather than forcing telemetry data into a poorly fitting metric or trace, comes down to querying and processing. Log payloads are designed to be queried by their log body. Only the log body, logger name, and attributes are of high interest. In comparison, metrics and traces carry additional fields in their payloads, so they take more work to parse.

#### Visual Example of a Log Payload

```
{
  "scopeLogs": [
    {
      "scope": {
        "name": "my_logger"
      },
      "logRecords": [
        {
          "timeUnixNano": "1767993488934098944",
          "observedTimeUnixNano": "1767993488934126000",
          "severityNumber": 13,
          "severityText": "WARN",
          "body": {
            "stringValue": "Hello"
          },
          "attributes": [
            {
              "key": "code.file.path",
              "value": {
                "stringValue": "/Users/rhettsaunders/Documents/GitHub/anaconda-otel-python/telemetry_test_local.py"
              }
            },
            {
              "key": "code.function.name",
              "value": {
                "stringValue": ""
              }
            },
            {
              "key": "code.line.number",
              "value": {
                "intValue": "33"
              }
            }
          ]
        }
      ]
    }
  ]
}
```
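
To make the "fields of high interest" point concrete, here is a small sketch that pulls the logger name, body, and attributes out of a payload shaped like the one above. It is plain Python over a trimmed copy of that payload; the `summarize` helper is a hypothetical name, and the code assumes the payload has already been parsed from JSON and that the body is a string value.

```python
import json

# Trimmed copy of the payload shape shown above.
payload = json.loads("""
{
  "scopeLogs": [
    {
      "scope": {"name": "my_logger"},
      "logRecords": [
        {
          "severityText": "WARN",
          "body": {"stringValue": "Hello"},
          "attributes": [
            {"key": "code.line.number", "value": {"intValue": "33"}}
          ]
        }
      ]
    }
  ]
}
""")


def summarize(payload: dict) -> list:
    """Return (logger name, body, attributes) for every log record."""
    rows = []
    for scope_logs in payload.get("scopeLogs", []):
        logger_name = scope_logs.get("scope", {}).get("name", "")
        for rec in scope_logs.get("logRecords", []):
            body = rec.get("body", {}).get("stringValue", "")
            attrs = {}
            for attr in rec.get("attributes", []):
                # Each value is a one-key object such as {"stringValue": ...};
                # take whatever single value it holds.
                attrs[attr["key"]] = next(iter(attr["value"].values()), None)
            rows.append((logger_name, body, attrs))
    return rows


rows = summarize(payload)
for row in rows:
    print(row)
```

Only three lookups are needed per record, which reflects why a log payload is comparatively simple to query.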