# The `gcl_writer` Connector
The `gcl_writer` connector integrates Google Cloud Logging from the Google Cloud Platform Operations Suite. It allows JSON logs to be published to Cloud Logging.
## Configuration
```tremor
use std::time::nanos;

define connector gcl_writer from gcl_writer
with
  config = {
    "connect_timeout": nanos::from_seconds(1),  # defaults to 1 second
    "request_timeout": nanos::from_seconds(10), # defaults to 10 seconds
    # Concurrency - number of simultaneous in-flight requests ( defaults to 4 )
    # "concurrency": 4,
  }
end;
```
The timeouts are in nanoseconds.
| option | description |
|---|---|
| `log_name` | The default `log_name` for this configuration, or `default` if not provided. The `log_name` can be overridden on a per-event basis via metadata |
| `resource` | A default monitored resource object that is assigned to all log entries that do not specify a value for `resource`. A comprehensive list of resources is available, and resources can be discovered via the gcloud client with `gcloud logging resource-descriptors list` |
| `partial_success` | Whether valid entries should be written even if some entries in a batch sent to Google Cloud Logging are invalid. Defaults to `false` |
| `dry_run` | Enables a sanity check that validates log entries are well formed by exercising the connector's write path without persisting the resulting entries. Useful primarily during initial exploration and configuration, or as a sanity check after large configuration changes. Defaults to `false` |
| `default_severity` | A default log severity that can be overridden on a per-event basis through metadata |
| `labels` | A default set of labels that can be overridden on a per-event basis through metadata |
| `connect_timeout` | The timeout in nanoseconds for connecting to the Google API |
| `request_timeout` | The timeout in nanoseconds for each request to the Google API |
| `concurrency` | The number of simultaneous in-flight requests (defaults to 4) |
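Taken together, a fuller definition might look like the sketch below. This is illustrative only: the project ID, log name and label values are placeholders, and the severity constant is the one used in the worked example later in this section.

```tremor
use std::time::nanos;
use google::cloud::logging as gcl;

define connector app_logger from gcl_writer
with
  config = {
    # `my-project-id` and the log name are placeholders
    "log_name": "projects/my-project-id/logs/app",
    "resource": {
      "type": "global",
      "labels": { "project_id": "my-project-id" }
    },
    # Write the valid entries of a batch even if some are invalid
    "partial_success": true,
    # Validate entries without persisting them while exploring
    "dry_run": true,
    "default_severity": gcl::severity::INFO,
    "labels": { "environment": "staging" },
    "connect_timeout": nanos::from_seconds(1),
    "request_timeout": nanos::from_seconds(10),
    "concurrency": 4
  }
end;
```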
## Metadata

Metadata can optionally be provided on a per-event basis for events flowing to this connector's sink. The metadata is encapsulated in the `$gcl_writer` record and may optionally specify one or more of the following fields.
| field | description |
|---|---|
| `log_name` | Overrides the default configured `log_name` for this event only |
| `log_severity` | Overrides the default log severity for this event only |
| `resource` | Overrides the default configured `resource`, if provided |
| `insert_id` | An optional unique identifier for the log entry. If you provide a value, then Logging considers other log entries in the same project, with the same timestamp, and with the same `insert_id` to be duplicates which are removed in a single query result. However, there are no guarantees of de-duplication in the export of logs |
| `http_request` | Optional information about the HTTP request associated with this log entry, if applicable |
| `labels` | An optional map of system-defined or user-defined key-value string pairs related to the entry |
| `operation` | Optional information about an operation associated with the log entry, if applicable |
| `trace` | Optional. The REST resource name of the trace being written to Cloud Trace in association with this log entry. For example, if your trace data is stored in the Cloud project "my-trace-project" and the service creating the log entry receives a trace header that includes the trace ID "12345", then the service should use "projects/my-trace-project/traces/12345". The trace field provides the link between logs and traces; by using this field, you can navigate from a log entry to a trace |
| `span_id` | Optional. The ID of the Cloud Trace span associated with the current operation in which the log is being written. For example, if a span has the REST resource name "projects/some-project/traces/some-trace/spans/some-span-id", then the `span_id` field is "some-span-id". A span represents a single operation within a trace: whereas a trace may involve multiple different microservices running on multiple different machines, a span generally corresponds to a single logical operation performed in a single instance of a microservice on one specific machine. Spans are the nodes within the tree that is a trace. Applications that are instrumented for tracing will generally assign a new, unique span ID on each incoming request, and commonly also create additional spans for internal processing elements and for requests to dependencies. The span ID is expected to be a 16-character hexadecimal encoding of an 8-byte array, should not be zero, should be unique within the trace, and should, ideally, be generated in a uniformly random manner |
| `trace_sampled` | The sampling decision of the trace associated with the log entry. `true` means that the trace resource name in the `trace` field was sampled for storage in a trace backend. `false` means that the trace was not sampled for storage when this log entry was written, or the sampling decision was unknown at the time. A non-sampled trace value is still useful as a request correlation identifier. Defaults to `false` |
| `source_location` | Optional. Source code location information associated with the log entry, if any |
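In a pipeline script, any of these fields can be set per event. A minimal sketch (the script would be created inside a pipeline, as in the worked example below; the log name is a placeholder):

```tremor
define script set_gcl_metadata
script
  use google::cloud::logging as gcl;
  # Route this event to a different log, at INFO severity,
  # with a label identifying its origin.
  let $gcl_writer = {
    "log_name": "projects/my-project-id/logs/audit", # placeholder
    "log_severity": gcl::severity::INFO,
    "labels": { "origin": "audit-pipeline" },
  };
  event
end;
```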
### HTTP Request metadata

An optional set of HTTP request data related to the log entry's JSON payload.
| field | description |
|---|---|
| `request_method` | The HTTP verb for the request |
| `request_url` | The URL, path and params for the request |
| `request_size` | The size in bytes of the request body |
| `status` | The status of the response to the request |
| `response_size` | The size in bytes of the response body |
| `user_agent` | The `user_agent` header value |
| `remote_ip` | The recorded remote IP address, if available |
| `server_ip` | The server IP address, if available |
| `referer` | The referer, if available |
| `latency` | The round-trip latency of the request, in nanoseconds |
| `cache_lookup` | True if there was a cache lookup for the request |
| `cache_hit` | True if there was a cache lookup, and it was a hit |
| `cache_validated_with_origin_server` | True if there was a cache hit and the response was validated with the origin server |
| `cache_fill_bytes` | The size in bytes of the cache response, if there was a cache hit |
| `protocol` | The effective protocol, e.g. `websockets`, `grpc` |
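Taken together, an `http_request` record has the form below; the values shown are purely illustrative:

```tremor
{
  "request_method": "GET",
  "request_url": "https://www.tremor.rs/",
  "request_size": 0,
  "status": 200,
  "response_size": 1024,
  "user_agent": "tremor",
  "remote_ip": "10.0.0.1",
  "server_ip": "10.0.0.2",
  "referer": "https://www.tremor.rs/docs/",
  "latency": nanos::from_millis(10), # requires `use std::time::nanos;`
  "cache_lookup": false,
  "cache_hit": false,
}
```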
### Operation metadata

Optional operation metadata relevant to the log entry, of the form:

```tremor
{
  "id": "a unique id for the operation",
  "producer": "id of the producer of the operation",
  "first": true, # is this the first of a related sequence?
  "last": true,  # is this the last of a related sequence?
}
```
### Source location metadata

Optional source code location information, if available, of the form:

```tremor
{
  "file": "path/to/file.rs",
  "line": 200,
  "function": "snot_badger_transformer",
}
```
## Payload structure

The event value is transformed to JSON and transmitted as the JSON payload of the log entry, together with any optional metadata provided.
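For example, a pipeline emitting the record below would produce log entries whose JSON payload is that record; the log name, severity and labels come from the connector configuration unless overridden via `$gcl_writer`. A sketch, with illustrative field names:

```tremor
define pipeline logs
pipeline
  # The event value becomes the log entry's JSON payload verbatim.
  select {
    "service": "checkout",
    "message": "order accepted",
    "order_id": 1234
  } from in into out;
end;
```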
## Example

A worked example flow that uses a metronome source to periodically inject log events into Google Cloud Logging, with the following basic structure:
```tremor
define flow main
flow
  use std::time::nanos;
  use tremor::connectors;
  use tremor::system;
  use integration;
  use google::cloud::logging as gcl;

  # We use a metronome as an event source in this
  # example. We fire periodic events every 500
  # milliseconds
  define connector metronome from metronome
  with
    config = {"interval": nanos::from_millis(500)}
  end;

  # Our connection to the GCP cloud logging service
  define connector google_cloud_logging from gcl_writer
  with
    config = {
      # Default log_name
      "log_name": "projects/my-project-id/logs/test-gcl",
      # If connecting external from GCP, use a global resource
      "resource": {
        "type": "global",
        "labels": {
          "project_id": "my-project-id"
        }
      },
      # This is not a test
      # "partial_success": false,
      # This is not a dry run
      # "dry_run": false,
      # 500ms connection timeout
      "connect_timeout": nanos::from_millis(500),
      # 1s request timeout
      "request_timeout": nanos::from_seconds(1),
      # Use `debug` log severity by default
      "default_severity": gcl::severity::DEBUG,
      # Indicate tremor version
      "labels": {
        "tremor-version": system::version()
      }
    }
  end;

  define pipeline main
  pipeline
    define script add_metadata_overrides
    script
      use std::time::nanos;
      use google::cloud::logging as gcl;

      # Example of setting metadata for each log event
      let $gcl_writer = {
        # "log_name": "projects/my-project-id/logs/test-gcl2",
        "log_severity": gcl::severity::INFO,
        "insert_id": "x" + gcl::gen_trace_id_string(),
        "http_request": {
          "request_method": "GET",
          "request_url": "https://www.tremor.rs/",
          "request_size": 0,
          "status": 200,
          "response_size": 1024,
          "user_agent": "tremor",
          "remote_ip": "164.90.232.184",
          "server_ip": "localhost",
          "referer": "https://www.tremor.rs",
          "latency": nanos::from_millis(10),
        },
        "labels": {
          "tremor-override": "crash-overrun",
        },
        "operation": {
          "id": "snot-id-" + gcl::gen_span_id_string(),
          "producer": "github.com/tremor-rs/gcl_writer/test",
          "first": true,
          "last": true,
        },
        "trace": gcl::gen_trace_id_string(),
        "span_id": gcl::gen_span_id_string(),
        "trace_sampled": false,
        "source_location": { "file": "snot.rs", "line": 10, "function": "badger" },
      };
      event
    end;

    create script add_metadata_overrides;

    select event from in into add_metadata_overrides;
    select event from add_metadata_overrides into out;
  end;

  define pipeline exit
  pipeline
    select {
      "exit": 0,
    } from in into out;
  end;

  # create connector exit from connectors::exit;
  create connector file from integration::write_file;
  create connector metronome;
  create connector google_cloud_logging;

  create pipeline main;
  # create pipeline exit;

  connect /connector/metronome to /pipeline/main;
  connect /pipeline/main to /connector/google_cloud_logging;
  connect /pipeline/main to /connector/file;
  # connect /pipeline/main to /connector/exit;
end;

deploy flow main;
```