This document describes the alarms that a platform integrating Cygnus-twitter should raise when an incident happens. It is therefore addressed to professional operators and administrators of such platforms.
Cygnus messages are explained first; the alarm conditions derived from those messages are described afterwards.
For each alarm, the following information is given:
- Alarm identifier. A unique numerical identifier, starting at 1.
- Severity. CRITICAL or WARNING.
- Detection strategy. An example log trace which identifies the related alarm.
- Stop condition. An example log trace which means that the related problem is no longer active.
- Description. A detailed explanation of the situation which triggers the alarm.
- Action. A detailed plan to cope with this situation (e.g. reboots, connectivity checks, etc.).
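
As an illustration, the six fields above could be captured in a small record type when alarms are loaded into a monitoring tool. This is only a sketch: the names `AlarmDefinition` and `Severity` are hypothetical, and the sample values (including the severity) are placeholders, not values taken from the alarm table.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "CRITICAL"
    WARNING = "WARNING"


@dataclass(frozen=True)
class AlarmDefinition:
    """Hypothetical record mirroring the fields given for each alarm."""
    alarm_id: int             # unique numerical identifier, starting at 1
    severity: Severity        # CRITICAL or WARNING
    detection_strategy: str   # example log trace that identifies the alarm
    stop_condition: str       # example log trace meaning the problem is over
    description: str          # situation that triggers the alarm
    action: str               # plan to cope with the situation

    def __post_init__(self) -> None:
        if self.alarm_id < 1:
            raise ValueError("alarm identifiers start at 1")


# Placeholder values, for illustration only.
example = AlarmDefinition(
    alarm_id=1,
    severity=Severity.CRITICAL,
    detection_strategy="Persistence error (Could not connect to the HDFS)",
    stop_condition="N/A",
    description="Placeholder description.",
    action="Placeholder action.",
)
```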
Cygnus logs are categorized under seven message types, each one identified by a tag in the custom message part of the trace. These are the tags:
- Fatal error (`FATAL` level). This kind of error may cause Cygnus to stop, and thus must be reported to the development team through stackoverflow.com (please tag it with fiware). Example: `Fatal error (SSL cannot be used, no such algorithm. Details=...)`
- Runtime error (`ERROR` level). This kind of error may cause Cygnus to fail, and thus must be reported to the development team through stackoverflow.com (please tag it with fiware). Example: `Runtime error (The Hive table cannot be created. Hive query=.... Details="...)`
- Bad configuration (`ERROR` level). This kind of error relates to a bad configuration parameter, and may eventually lead to a Cygnus failure. Example: `Bad configuration (Unrecognized HDFS API. The sink can start, but the data is not going to be persisted!)`
- Channel error (`ERROR` level). This kind of error reports problems with the internal channel of the agent. This channel is used as part of the failover mechanisms of Flume, storing those events that cannot be processed by the sinks. Nevertheless, the channel itself may fail, either because the HTTP source is not able to put the event (a channel error, or simply because the channel is full) or because the sink cannot get a new event. Example: `Channel error (The event could not be got. Details=...)`
- Persistence error (`ERROR` level). This kind of error reports problems with the persistence backend: the backend cannot be reached, or a folder does not exist (when the backend requires a container to be provisioned for the data, e.g. a folder in HDFS). These errors are thrown exclusively by the sinks. Note that Cygnus itself may solve the problem thanks to the channel-based failover mechanism of Flume and the Flume Failover Sink Processor, which switches to a passive sink (if configured). Example: `Persistence error (Could not connect to the HDFS)`
- Streaming error (`ERROR` level). This kind of error reports problems with the Twitter API: the API cannot be reached because of invalid credentials or temporary unavailability of Twitter. These errors are thrown exclusively by the component that streams tweets. Example: `Exception while streaming tweets`
Debug messages are labeled as Debug, with a logging level of `DEBUG`. Informational messages, such as the Cygnus version, transaction start/end and others, are labeled as Informational, with a logging level of `INFO`.
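
A monitoring script could use these tags to classify traces before deciding which alarm applies. The following is a minimal sketch, assuming that the tag is immediately followed by ` (` and the details, as in the example traces above. The names `TAG_LEVELS`, `STREAMING_MARKER` and `classify` are hypothetical and not part of Cygnus. The streaming case is matched on its example text because that example carries no tag prefix.

```python
import re
from typing import Optional, Tuple

# Tags and logging levels as listed in this document.
TAG_LEVELS = {
    "Fatal error": "FATAL",
    "Runtime error": "ERROR",
    "Bad configuration": "ERROR",
    "Channel error": "ERROR",
    "Persistence error": "ERROR",
}

# Assumption: a tagged message looks like "<tag> (<details>)".
_TAG_RE = re.compile(
    r"(" + "|".join(re.escape(tag) for tag in TAG_LEVELS) + r") \((.*)\)"
)

# The streaming example in this document has no "<tag> (" prefix.
STREAMING_MARKER = "Exception while streaming tweets"


def classify(trace: str) -> Optional[Tuple[str, str, str]]:
    """Return (tag, level, details) for a recognised trace, else None."""
    match = _TAG_RE.search(trace)
    if match is not None:
        tag, details = match.group(1), match.group(2)
        return tag, TAG_LEVELS[tag], details
    if STREAMING_MARKER in trace:
        return "Streaming error", "ERROR", ""
    return None
```

For instance, `classify("Persistence error (Could not connect to the HDFS)")` yields the tag, the `ERROR` level and the details string, while an untagged trace yields `None`.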
|Alarm ID||Severity||Detection strategy||Stop condition||Description||Action|
||For each configured Cygnus-twitter component (i.e.
||A problem has happened at Cygnus startup. The
||Fix the issue that is precluding Cygnus startup, e.g. if the problem was due to an invalid Twitter API key or invalid coordinates for the geoquery, then change such values.|
||N/A||A runtime error has happened. The
||Restart Cygnus. If the error persists (e.g. new Runtime errors appear within the next hour), escalate the problem to the development team.|
||For each configured Cygnus component (i.e.
||A Cygnus component has not been configured in the appropriate way.||Configure the component in the appropriate way.|
||Flume events, put by the sources, cannot be retrieved by the sinks from the internal channel due to a problem with the channel (most probably) or the sink itself.||A runtime error has happened. The
||Once the problem with the storage is solved, Cygnus should be able to fix this kind of error automatically by means of the internal channel, which works as a temporary buffer for Flume events not yet processed (containing context data to be persisted).|
||Once you have checked that the problem is not due to external Twitter unavailability, ensure your API credentials (consumerKey, consumerSecret, accessToken and accessTokenSecret) are valid and active.|
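
The action for runtime errors in the table suggests escalating when new Runtime errors appear within the hour after a restart. That rule could be checked as in the sketch below. The names `ESCALATION_WINDOW` and `should_escalate` are hypothetical, and the timestamps are assumed to have been parsed from the log traces beforehand.

```python
from datetime import datetime, timedelta
from typing import Iterable

# "Within the next hour", as stated in the action for runtime errors.
ESCALATION_WINDOW = timedelta(hours=1)


def should_escalate(restart_time: datetime,
                    runtime_error_times: Iterable[datetime]) -> bool:
    """True if any Runtime error was logged after the restart, within the window."""
    deadline = restart_time + ESCALATION_WINDOW
    return any(restart_time < t <= deadline for t in runtime_error_times)
```

Errors logged before the restart, or more than one hour after it, do not trigger escalation under this rule; an operator may prefer a different window.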