Metric Monitoring

The TSDB (time series database) stores the billions of individual measurements, such as page load times
The "retrieval" section collects data from various sources, for example by parsing log messages or measuring how long individual jobs took, and writes it to the database
The alert manager sends notifications to the relevant people when an incident occurs
The web UI is primarily used to understand the metrics after someone has been alerted of a fault
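
The flow through the components listed above can be illustrated with a minimal sketch. It is not the implementation of any particular monitoring system: the class TinyTSDB, the functions retrieve_from_log, retrieve_job_duration and check_alert, the "name=value" log format, and the threshold rule are all hypothetical names and assumptions chosen for illustration.

```python
import re
import time
from collections import defaultdict

# Hypothetical log format, e.g. "page_load_ms=812"; an assumption for this sketch.
LOG_PATTERN = re.compile(r"(?P<name>\w+)=(?P<value>\d+(?:\.\d+)?)")


class TinyTSDB:
    """In-memory stand-in for the TSDB: metric name -> list of (timestamp, value)."""

    def __init__(self):
        self.series = defaultdict(list)

    def append(self, name, timestamp, value):
        self.series[name].append((timestamp, value))

    def latest(self, name):
        # Use .get so that looking up an unknown metric does not create a series.
        points = self.series.get(name)
        return points[-1][1] if points else None


def retrieve_from_log(db, line, timestamp):
    """Retrieval path 1: parse a measurement out of a log message."""
    match = LOG_PATTERN.search(line)
    if match:
        db.append(match["name"], timestamp, float(match["value"]))


def retrieve_job_duration(db, name, job):
    """Retrieval path 2: time a job and store its duration in seconds."""
    start = time.monotonic()
    job()
    db.append(name, time.time(), time.monotonic() - start)


def check_alert(db, name, threshold):
    """Alert manager stand-in: return a message when the latest value exceeds the threshold."""
    value = db.latest(name)
    if value is not None and value > threshold:
        return f"ALERT: {name}={value} exceeds {threshold}"
    return None
```

In this sketch, calling retrieve_from_log on a line containing "page_load_ms=812" stores the value 812.0, and check_alert with a threshold of 500 then returns an alert message, while a threshold of 1000 returns None. A real alert manager would route the message to people rather than return a string, and a web UI would read from the stored series to plot them.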
