What is Log Aggregation?
It's a way of collecting and tagging application logs from many different services in a single place, so they can be searched easily from one dashboard.
Popular Open Source Option: ELK (Elasticsearch, Logstash, Kibana)
Logstash:
Receives all the log messages
Parses each message into fields and uses each field as a tag (for example: time, user, what the user did)
Does not store the messages itself; it forwards them to Elasticsearch
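To make the tagging step concrete, here is a minimal Python sketch of the idea. It is not Logstash itself: the log line format, the field names (time, user, action) and the helper names tag_fields and to_document are all assumptions made up for illustration.

```python
import json
import re

# Hypothetical log format (an assumption, not a Logstash default):
#   "<ISO timestamp> user=<name> action=<what the user did>"
LINE_PATTERN = re.compile(
    r"^(?P<time>\S+)\s+user=(?P<user>\S+)\s+action=(?P<action>.+)$"
)


def tag_fields(line: str) -> dict:
    """Split one raw log line into named fields (tags)."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        # Keep unparseable lines, marked as such, rather than dropping them.
        return {"message": line.strip(), "tags": ["parse_failure"]}
    return match.groupdict()


def to_document(line: str) -> str:
    """Serialize the tagged fields as JSON, ready to forward to a store."""
    return json.dumps(tag_fields(line), sort_keys=True)


if __name__ == "__main__":
    print(to_document("2024-01-01T12:00:00Z user=alice action=clicked checkout"))
```

The point is only that each message becomes a set of named fields before it is forwarded, which is what makes it searchable by time, user, or action later on.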
Elasticsearch:
Searchable database that stores and indexes all the info coming in from Logstash
Kibana:
You as the admin would connect to Kibana
Kibana would query Elasticsearch (the DB)
You would get the results needed to identify the fault
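As a rough illustration of the kind of request that ends up being sent to Elasticsearch, the sketch below builds a search body in Elasticsearch's Query DSL (a bool query with a match clause and a range filter). The field names message and @timestamp, the 15-minute default and the function name build_search_body are assumptions; the real names depend on how the logs were indexed.

```python
import json


def build_search_body(text: str, since: str = "now-15m", size: int = 50) -> dict:
    """Build a Query DSL body: full-text match plus a time filter.

    Assumes documents have a "message" text field and an "@timestamp"
    date field; adjust to the actual index mapping.
    """
    return {
        "size": size,
        # Newest entries first.
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "must": [{"match": {"message": text}}],
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("12345"), indent=2))
```

In practice you would type the search into Kibana rather than write this by hand; the sketch only shows what "Kibana queries the DB" amounts to.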
How to use ELK to diagnose a PROD problem?
Report: a user saw error code 12345 when they tried to perform action "x"
With ELK set up, all we have to do is go to Kibana and query recent logs for 12345
That might give us the name of the service: "backend"
Then we could narrow down the time to the five-second interval around the error and filter for log messages emitted by either the backend or the database
Finally, we'd read those log messages to find the context for the bug
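The steps above can be sketched in plain Python over an in-memory list of records. This is an illustration of the workflow only, not how Kibana does it: the sample records, the field names (time, service, message) and the helpers find_error and context_around are all made up for the example.

```python
from datetime import datetime, timedelta

# Hypothetical log records, shaped like documents an aggregator might return.
LOGS = [
    {"time": datetime(2024, 1, 1, 12, 0, 0), "service": "frontend", "message": "GET /x"},
    {"time": datetime(2024, 1, 1, 12, 0, 1), "service": "database", "message": "query timeout"},
    {"time": datetime(2024, 1, 1, 12, 0, 2), "service": "backend", "message": "error code 12345"},
    {"time": datetime(2024, 1, 1, 12, 0, 30), "service": "backend", "message": "healthcheck ok"},
]


def find_error(logs, code):
    """Step 1: return the most recent record mentioning the error code, or None."""
    hits = [r for r in logs if code in r["message"]]
    return max(hits, key=lambda r: r["time"]) if hits else None


def context_around(logs, hit, services, window=timedelta(seconds=5)):
    """Steps 2-3: records from the given services inside the window centered on the hit."""
    half = window / 2
    nearby = (
        r for r in logs
        if r["service"] in services and abs(r["time"] - hit["time"]) <= half
    )
    return sorted(nearby, key=lambda r: r["time"])


if __name__ == "__main__":
    hit = find_error(LOGS, "12345")
    print("service:", hit["service"])
    for record in context_around(LOGS, hit, {"backend", "database"}):
        print(record["time"].isoformat(), record["service"], record["message"])
```

With this sample data the search for 12345 points at the backend, and the narrowed window surfaces the database timeout logged one second before the error, which is the kind of context the last step is looking for.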
Bonus: Add log aggregation as an extra test
Could be useful to see whether any error messages come back from running the stack
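One possible shape for such a check, as a sketch: after the stack has run, collect its log lines and fail the test if any of them look like errors. The marker strings ERROR and FATAL and the helper names are assumptions; how the lines are fetched from the aggregation platform is left out.

```python
def error_lines(log_lines, markers=("ERROR", "FATAL")):
    """Return the log lines that contain any of the error markers."""
    return [line for line in log_lines if any(m in line for m in markers)]


def assert_no_errors(log_lines):
    """Fail, listing the offending lines, if the stack logged any errors."""
    found = error_lines(log_lines)
    assert not found, "errors found in aggregated logs:\n" + "\n".join(found)


if __name__ == "__main__":
    # Hypothetical lines collected after running the stack.
    assert_no_errors(["INFO backend started", "INFO database ready"])
    print("no errors logged")
```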
Log aggregation platform options:
ELK
Fluentd
DataDog
LogDNA
AWS CloudWatch Logs