Metric Monitoring

Four key components:

  • The TSDB (time series database) stores the billions of individual measurements, such as page load times

  • The "retrieval" section collects data from various sources, for example by parsing it out of log messages or measuring how long individual jobs took, and writes it to the database

  • The alert manager sends notifications to the relevant people when an incident occurs

  • The web UI is primarily used to understand the metrics after someone has been alerted to a fault
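
To make the flow between the first three components concrete, here is a minimal Python sketch. It is a toy model, not how any particular monitoring system is implemented: the class InMemoryTSDB, the functions retrieve_from_log and check_alert, the metric name page_load_time_ms, and the log format "load_time_ms=<number>" are all hypothetical names chosen for illustration. The web UI is left out, since it only reads what the database already holds.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) log format: "... load_time_ms=<number>"
LOAD_TIME_PATTERN = re.compile(r"load_time_ms=(\d+(?:\.\d+)?)")


class InMemoryTSDB:
    """Toy stand-in for a TSDB: metric name -> list of (timestamp, value)."""

    def __init__(self):
        self._series = defaultdict(list)

    def append(self, metric, timestamp, value):
        self._series[metric].append((timestamp, value))

    def latest(self, metric, count):
        """Return the values of the most recent `count` samples."""
        if count <= 0:
            return []
        samples = self._series.get(metric, [])
        return [value for _, value in samples[-count:]]


def retrieve_from_log(tsdb, timestamp, log_line):
    """Retrieval: parse one measurement from a log line and store it."""
    match = LOAD_TIME_PATTERN.search(log_line)
    if match is None:
        return False  # line carries no measurement we recognise
    tsdb.append("page_load_time_ms", timestamp, float(match.group(1)))
    return True


def check_alert(tsdb, metric, threshold, window):
    """Alerting: return a message if the recent average exceeds the threshold."""
    recent = tsdb.latest(metric, window)
    if not recent:
        return None
    average = sum(recent) / len(recent)
    if average > threshold:
        return f"ALERT: {metric} averaged {average:.1f} over last {len(recent)} samples"
    return None


if __name__ == "__main__":
    tsdb = InMemoryTSDB()
    lines = [
        "GET /home load_time_ms=900",
        "GET /home load_time_ms=2100",
        "GET /health ok",  # no measurement, skipped by retrieval
    ]
    for timestamp, line in enumerate(lines):
        retrieve_from_log(tsdb, timestamp, line)
    print(check_alert(tsdb, "page_load_time_ms", threshold=1000, window=2))
```

In a real deployment the alert check would run on a schedule and hand its result to a separate notification service; here it simply returns a string so the decision logic is easy to see.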
