GCP stackdriver-agent installed on a VM sends strange logs every minute

Can you please help me with the following issue?

I have a Node.js backend service deployed on a GCE VM. It works fine, but after installing the logging and monitoring agents I see very strange logs in the Logs Viewer. I looked up the PID that generates these logs: it belongs to stackdriver-agent.

Here they are:

A 2020-05-15T22:45:26Z write_gcm: can not take infinite value
A 2020-05-15T22:45:26Z write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing. 
A 2020-05-15T22:45:26Z write_gcm: can not take infinite value 
A 2020-05-15T22:45:26Z write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing. 
A 2020-05-15T22:45:26Z write_gcm: can not take infinite value 
A 2020-05-15T22:45:26Z write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing. 
A 2020-05-15T22:45:28Z write_gcm: Server response (CollectdTimeseriesRequest) contains errors:#012{#012  "payloadErrors": [#012    {#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 5,#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 10,#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 15,#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 20,#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 25 
A 2020-05-15T22:45:29Z write_gcm: Server response (CollectdTimeseriesRequest) contains errors:#012{#012  "payloadErrors": [#012    {#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    }#012  ]#012} 
A 2020-05-15T22:45:29Z write_gcm: Unsuccessful HTTP request 400: {#012  "error": {#012    "code": 400,#012    "message": "Field timeSeries[3].points[0].interval.start_time had an invalid value of \"2020-05-15T15:45:27.348251-07:00\": The start time must be before the end time (2020-05-15T15:45:27.348251-07:00) for the non-gauge metric 'agent.googleapis.com/agent/api_request_count'.",#012    "status": "INVALID_ARGUMENT"#012  }#012} 
A 2020-05-15T22:45:29Z write_gcm: Error talking to the endpoint. 
A 2020-05-15T22:45:29Z write_gcm: wg_transmit_unique_segment failed. 
A 2020-05-15T22:45:29Z write_gcm: wg_transmit_unique_segments failed. Flushing.

These logs appear every minute. When I stop the stackdriver-agent service, they disappear. I have 4 VMs in my project, and the issue appears on only two of them: a CentOS 7 VM and an Ubuntu 18 VM.
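The `#012` sequences in the log lines above make the JSON error bodies hard to read. They look like the octal escapes that a syslog daemon such as rsyslog writes in place of control characters (octal 012 is a newline). A minimal sketch for expanding them, assuming that is where they come from; the function name `decode_syslog_escapes` is made up for this example:

```python
import re


def decode_syslog_escapes(line: str) -> str:
    """Replace '#ooo' octal escapes (e.g. '#012' for newline) with the real character.

    Assumption: every '#' followed by three octal digits is an escape.
    A message that literally contains such text would be decoded wrongly.
    """
    return re.sub(r"#([0-7]{3})", lambda m: chr(int(m.group(1), 8)), line)


# Shortened variant of one of the log lines from the question.
sample = 'write_gcm: Unsuccessful HTTP request 400: {#012  "error": {#012    "code": 400#012  }#012}'
print(decode_syslog_escapes(sample))
```

After decoding, the payload prints as ordinary multi-line JSON, which makes the individual `payloadErrors` entries easier to compare.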

So far there are 2 PITs:

The latest one has a Google engineer's explanation for the 400 error:

These messages are annoying but harmless. You are not losing any metrics. You can safely ignore these logs.

The root cause is a server-side config change and affects all agents. That change only affected the verbosity of the responses, not the processing of the requests. Some of the incoming metrics were silently dropped before that change, and are now dropped noisily.

The metrics are sent by default by the upstream collectd plugin, and there are no controls for us to completely prevent those metrics from being sent. The log spam messages result from collectd's internal processing of those metrics.

If you'd like to filter out all the noisy logs you're seeing, you can create a Log Exclusion[1][2] or Log Sink[3][4]. A Log Exclusion will match logs against a specified filter and drop them before they reach the Logs Viewer, and a Log Sink will take logs and direct them to a Cloud Storage bucket, BigQuery table, or Pub/Sub topic.

[1] https://cloud.google.com/logging/docs/exclusions#overview

[2] https://cloud.google.com/logging/docs/exclusions#create-filter

[3] https://cloud.google.com/logging/docs/export

[4] https://cloud.google.com/logging/docs/export/configure_export_v2
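As a rough illustration of the kind of filter a Log Exclusion would use, the sketch below builds a filter string in the Logging query language. It assumes the agent's messages arrive as `textPayload` on `gce_instance` resources; check one entry in the Logs Viewer first, since the message may sit in a different field on your setup. The helper name `build_exclusion_filter` is made up for this example.

```python
def build_exclusion_filter(substrings, resource_type="gce_instance"):
    """Build a Logging query-language filter matching entries whose
    textPayload contains any of the given substrings.

    Assumptions: the noisy messages are stored in textPayload, and the
    VMs report as resource.type="gce_instance".
    """
    def quote(s):
        # Escape backslashes and double quotes so the string literal stays valid.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    # ':' is the "has" (substring) operator in the Logging query language.
    clauses = " OR ".join(f"textPayload:{quote(s)}" for s in substrings)
    return f'resource.type="{resource_type}" AND ({clauses})'


print(build_exclusion_filter(["write_gcm:"]))
```

The printed string can be pasted into the exclusion filter field described in [2]. Matching on `write_gcm:` hides every message from that plugin, so a narrower substring may be preferable if you still want to see other errors from it.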

Regarding the swap error, there is a blog post:

https://myshittycode.com/2020/06/13/gcp-stackdriver-agent-write_gcm-can-not-take-infinite-value-error/

This error occurs because the VM instance does not have swap memory, so the swap metric plugin tries to divide by 0.

To fix this, remove the swap plugin configuration described in the post and restart stackdriver-agent.
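To confirm that a VM really has no swap before changing anything, you can look at the `SwapTotal` line in `/proc/meminfo`. The sketch below parses that text and shows why a percentage is undefined when the total is zero. The function names are made up for this example, and the percentage formula is only an illustration of the division, not the plugin's actual code.

```python
def swap_total_kb(meminfo_text):
    """Return the SwapTotal value in kB from /proc/meminfo-style text, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line format: "SwapTotal:       0 kB"
            return int(line.split()[1])
    return None


def swap_used_percent(used_kb, total_kb):
    """Illustrative percentage; returns None instead of dividing by zero."""
    if not total_kb:
        return None
    return 100.0 * used_kb / total_kb


# Sample text standing in for the contents of /proc/meminfo on a VM without swap.
sample_meminfo = "MemTotal:        3781772 kB\nSwapTotal:             0 kB\nSwapFree:              0 kB\n"
total = swap_total_kb(sample_meminfo)
print(total, swap_used_percent(0, total))
```

On the VM itself you would pass in the real file contents, for example `open("/proc/meminfo").read()`. A `SwapTotal` of 0 matches the situation the blog post describes.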
