Kubernetes Monitoring and Logging: Tools and Best Practices

Kubernetes (K8s) is an open-source project under the Cloud Native Computing Foundation (CNCF) that handles container orchestration: it simplifies the deployment and management of containerized applications. It is widely used across the DevOps and cloud-native space, and it is hard to imagine a modern DevOps workflow without it. As these applications grow in complexity and scale, however, keeping track of what the containers are doing becomes increasingly difficult.

Hence, a proper monitoring and logging setup is essential to make sure things don’t break unexpectedly. In one line: monitoring (or observability) is the practice of watching the state of the application and surfacing problems through alerts, while logs record every small event happening inside the containers (e.g. ‘namespace created’ –> ‘pod is yet to start’ –> ‘pod is running’ –> ‘pod is restarting’, and so on).

Table of Contents

  • What is Kubernetes?
  • What is Kubernetes Monitoring And Why Should You Care About It?
  • Kubernetes Logging Architecture
  • System Component Logs
  • Cluster Logging Architecture
  • Types of Kubernetes Logs
  • Which Metrics Should You Monitor?
  • What Options Are Available For Monitoring Kubernetes Cluster?
  • How to perform Kubernetes Monitoring and Logging? A Step-By-Step Guide
  • Features of monitoring and logging
  • How does Logging in Kubernetes Work?
  • How is Logging in Kubernetes different from Others?
  • Popular Kubernetes Logging Topics
  • Kubernetes Logging Tools
  • Kubectl logs and Other Useful kubectl Commands
  • Best Practices of Kubernetes Monitoring
  • Kubernetes Logging Best Practices
  • Conclusion
  • Kubernetes Monitoring and Logging – FAQs

What is Kubernetes?

Kubernetes is an open-source container orchestrator that manages containers in the form of pods. It provides a wide range of functionality around deployment, scaling, and self-healing, along with some built-in monitoring features. Kubernetes itself is a large and complex project hosted by the CNCF (Cloud Native Computing Foundation), and it greatly simplifies the management of complex microservice architectures in production environments.

What is Kubernetes Monitoring And Why Should You Care About It?

Kubernetes monitoring, or simply monitoring, is a set of practices used to make sure that our Kubernetes cluster is working properly. When something unusual happens to the cluster, for example pods crashing repeatedly, pods failing to start, or authentication errors, these practices help us identify the cause of the issue and then troubleshoot it. For this purpose we watch things called ‘metrics’: the parameters we measure to judge the health of the cluster. In the cloud-native world, monitoring is also commonly referred to as ‘observability’.

Kubernetes Logging Architecture

The Kubernetes logging architecture is designed for capturing, storing, and analyzing logs from the applications and system components running within a cluster. It ensures that logs are accessible for debugging, monitoring, and auditing purposes. The architecture mainly involves collecting logs from various sources, aggregating them, and storing them in a centralized system where they can be analyzed. Its key stages are listed below, and the on-node log file layout that underpins them is shown after the list:

  • Log Collection: Logs are collected from application containers, system components, and nodes.
  • Log Aggregation: Collected logs are aggregated to a central storage system.
  • Log Analysis: Centralized logs are analyzed for monitoring, debugging, and auditing.
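On each node, the kubelet and container runtime write container output to log files under /var/log/pods (with convenience symlinks under /var/log/containers); this is the layer that node-level logging agents typically read from. A quick way to see it, assuming shell access to a worker node (file names below are illustrative):

# One directory per pod, named <namespace>_<pod-name>_<pod-uid>
ls /var/log/pods/

# Symlinks named <pod-name>_<namespace>_<container-name>-<container-id>.log
ls /var/log/containers/

# Follow a container's log file directly (the pattern here is illustrative)
tail -f /var/log/containers/my-app-*_default_*.log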

System Component Logs

System component logs in Kubernetes are the logs produced by the core components that manage the cluster. These logs are essential for monitoring and troubleshooting the cluster’s health and performance. The main system component logs are listed below, followed by example commands for viewing them:

  • kube-apiserver Logs: Record API requests and responses, including authentication and authorization details.
  • kube-scheduler Logs: Record scheduling decisions and scheduling errors.
  • kube-controller-manager Logs: Record the actions taken by the various controllers, such as the node and replication controllers.
  • kubelet Logs: Logs activities related to node management, pod lifecycle, and container runtime interactions.
  • kube-proxy Logs: Logs network rules and traffic details for service discovery and load balancing.
  • etcd Logs: Logs database operations for Kubernetes state storage.
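How you read these logs depends on how the component runs. On kubeadm-style clusters the control-plane components run as static pods in the kube-system namespace, so kubectl logs works for them, while the kubelet usually runs as a systemd service on each node. A rough sketch (pod names include your node name, so adjust them):

# List the control-plane pods
kubectl get pods -n kube-system

# Fetch logs of individual components (the suffix after the component name is the node name)
kubectl logs -n kube-system kube-apiserver-<node-name>
kubectl logs -n kube-system kube-scheduler-<node-name>
kubectl logs -n kube-system etcd-<node-name>

# The kubelet is not a pod; read its logs from the node's journal
journalctl -u kubelet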

Cluster Logging Architecture

Cluster logging architecture in Kubernetes refers to the setup that collects, processes, and stores logs from all nodes and applications in a cluster. It typically involves running logging agents on each node that forward logs to a central log management system; a trimmed-down agent sketch follows the list below.

  • Logging Agents: Deployed on each node (e.g., Fluentd, Logstash) to collect logs from containers and system components.
  • Log Forwarding: Agents forward logs to a central logging system (e.g., Elasticsearch, Splunk).
  • Centralized Storage: Logs are stored in a scalable and searchable central repository.
  • Log Processing: Logs are processed and parsed for better analysis and monitoring.
  • Visualization Tools: Use of tools like Kibana or Grafana to visualize and query logs for insights and troubleshooting.
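As an illustration of the agent layer, below is a heavily trimmed sketch of a Fluentd DaemonSet. It assumes the public fluent/fluentd-kubernetes-daemonset image and an Elasticsearch service reachable at elasticsearch.logging.svc; both are assumptions you would replace with the specifics of your own cluster:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        # assumed public image; pick the variant matching your log backend
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.logging.svc"   # hypothetical Elasticsearch service
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log               # read container log files written on the node
      volumes:
      - name: varlog
        hostPath:
          path: /var/log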

Types of Kubernetes Logs

Kubernetes logs are generally categorized into three types, as follows:

1. Application Logs
2. Node Logs
3. System Component Logs

  1. Application Logs: Generated by the containerized applications themselves, captured from standard output (stdout) and standard error (stderr). These are mainly useful for monitoring application performance and for debugging.
  2. Node Logs: Generated on the Kubernetes nodes, including operating system and container runtime logs. They provide insight into node health and performance.
  3. System Component Logs: Generated by the Kubernetes system components such as kube-apiserver, kube-scheduler, kube-controller-manager, and kube-proxy. These are essential for troubleshooting cluster operations and monitoring overall system health.

Which Metrics Should You Monitor?

There are countless parameters you could measure, but tracking all of them is not feasible. Listed below are some of the most important metrics; together they cover most of what matters for a typical application, and you can add further metrics as your use case demands. A few quick commands for checking several of them follow the list.

  • Node resource usage
  • How many pods are running in a node
  • Deployments and Daemonsets
  • Pods (Which are failing, restarting, in CrashLoopBackOff)
  • Memory utilisation by pods and cluster
  • Application health and performance
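Several of these metrics can be checked straight from the command line, assuming the metrics-server add-on is installed in the cluster:

# Node resource usage (CPU/memory) – requires metrics-server
kubectl top nodes

# Per-pod resource usage across all namespaces
kubectl top pods --all-namespaces

# Pods that are not in the Running phase (failing, pending, etc.)
kubectl get pods --all-namespaces --field-selector=status.phase!=Running

# Overview of deployments and daemonsets
kubectl get deployments,daemonsets --all-namespaces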

What Options Are Available For Monitoring Kubernetes Cluster?

  • Kubernetes Dashboard: Kubernetes itself provides a web-based dashboard that roughly covers the main metrics and gives a quick glimpse of what is happening in the cluster.
  • Prometheus: One of the most widely used monitoring tools on the market. It provides a powerful metrics model, a rich feature set, and first-class integration with Kubernetes.
  • Grafana: Popular for its visual dashboards, which make it easy to track different clusters and metrics. It is often paired with Prometheus to build a powerful monitoring setup; an install sketch for this combination follows the list.
  • EFK Stack: Provides a centralized way to collect logs and then present them on a dashboard. EFK stands for ‘Elasticsearch’, ‘Fluentd’, and ‘Kibana’: Fluentd collects and forwards the logs, Elasticsearch stores and indexes them, and Kibana is the dashboard used to visualize them.
  • Cloud-based monitoring: Many cloud providers offer their own managed monitoring services, so users of that cloud do not have to assemble separate tools; the whole monitoring setup is available as a single, cloud-specific offering.
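For the Prometheus + Grafana combination, one common installation path is the community Helm chart. The sketch below assumes Helm is installed and uses the kube-prometheus-stack chart, which bundles Prometheus, Grafana, Alertmanager, and the usual exporters:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Installs Prometheus, Grafana, Alertmanager and exporters into a 'monitoring' namespace
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace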

How to perform Kubernetes Monitoring and Logging? A Step-By-Step Guide

Step 1: Install the Kubernetes Dashboard in your cluster:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

Step 2: Enable access to the dashboard using the ‘kubectl proxy’ command; the proxy serves it on port 8001, or you can use port forwarding instead:

kubectl proxy
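With the proxy running, the dashboard installed from the recommended manifest above is typically reachable at the following proxy URL:

http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/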

Step 3: You can also access this UI dashboard more securely with a service account and a cluster role binding.

  • For that, first you have to create a service account (a manifest file through which you define permissions, roles, and access control; Kubernetes uses RBAC to make sure that only a handful of authenticated users can access and edit it).
  • List the existing service accounts with the command below; on any reasonably recent Kubernetes version the output will look like the sample that follows. If you want to create your own service account, you can use the ‘touch’ and ‘vi’ commands as shown further down.
kubectl get serviceaccounts
  • On running the above command it will list the service accounts as follows:
NAME      SECRETS   AGE
default   1         4d
  • Use the following commands to create a YAML file and then add the manifest contents to it:
touch w3wiki.yaml
vi w3wiki.yaml
# Write contents of your manifest as something like below after pressing "i" and then save it 
# using (ESC + : + x)

apiVersion: v1
kind: ServiceAccount
metadata:
  name: w3wiki
  namespace: default
  • Now, configure the defined YAML manifest file with the kubectl apply command:
kubectl apply -f w3wiki.yaml
  • Verify whether your resources are configured properly or not using the following command:
# You can check if your service a/c is configured properly or not, using below cmd:

kubectl get serviceaccounts/w3wiki -o yaml

Step 4: Create roles and permissions using a ClusterRole and a RoleBinding (manifests through which we grant permissions to specific accounts and groups).

  • These steps, as a best practice, set up an application-specific service account; additionally, you can apply the same changes to other namespaces as well. In the example below, we use the “default” namespace with “w3wiki” as our service account.
# It will grant 'read-only' permission to our service a/c, which means we will be able to only view 
# dashboard using our service a/c. You can change these permissions for different service a/c's.

kubectl create rolebinding w3wiki-view \
--clusterrole=view \
--serviceaccount=default:w3wiki \
--namespace=default
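The RoleBinding above scopes the account to the default namespace. If you want the dashboard to show resources across all namespaces, you can instead bind the same built-in ‘view’ ClusterRole cluster-wide; grant this wider access with care:

# Cluster-wide read-only access for the same service account
kubectl create clusterrolebinding w3wiki-view-all \
--clusterrole=view \
--serviceaccount=default:w3wiki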

Step 5: We need a token to access the Kubernetes dashboard. Create the token with the command below and paste it into the browser to log in to your cluster (this is done for security purposes).

  • You will get a long random string; copy that token and paste it into the dashboard login page, and you will be logged in.
kubectl create token w3wiki
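kubectl create token is available on Kubernetes v1.24 and later, and the token it issues is short-lived by default. If you need the login to survive longer, you can request a longer validity (subject to limits configured in your cluster):

# Request a token valid for 24 hours instead of the default
kubectl create token w3wiki --duration=24h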

Step 6: Create a deployment manifest file for your application. For example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: w3wiki
spec:
  replicas: 1
  selector:
    matchLabels:
      app: w3wiki
  template:
    metadata:
      labels:
        app: w3wiki
    spec:
      containers:
      - name: w3wiki
        image: your-app-image
        command: ["your-app-command"]

Step 7: Access container logs using the ‘logs’ command:

kubectl logs <pod-name>
or, for a specific container inside the pod:
kubectl logs <pod-name> -c <container-name>
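A few commonly used flags of the logs command make day-to-day debugging easier:

kubectl logs -f <pod-name>                  # stream (follow) logs in real time
kubectl logs <pod-name> --tail=100          # show only the last 100 lines
kubectl logs <pod-name> --since=1h          # show only logs from the last hour
kubectl logs <pod-name> --all-containers    # logs from every container in the pod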

Step 8: After opening the dashboard you will see a UI showing your pods and cluster info.

  • Now you can explore the monitoring features under the different tabs, such as Overview, Nodes, Workloads, Storage, Configuration, CRDs, Metrics, Events, etc.
  • You will see information about your replicas, pods, deployments, etc. in a pie-chart format on the home screen, and you can monitor different namespaces seamlessly.
  • You can now easily use these features to monitor your cluster.

Step 9: You can create and update pod, replica, deployment, and other manifest files without using kubectl commands, with just a few clicks on this dashboard.

  • To do so, click the “+” icon at the top-right corner; a pop-up screen will appear where you can fill in the details of the manifest file you want to create, and the manifest will be created live.
  • To update an existing manifest, select it, click the inspect option, and you will be able to edit its contents.

Step 10: To view live logs for a workload, open it, click the three-dots icon at the top right, and then click the logs option to view or download the logs.

Features of monitoring and logging

A good logging and monitoring setup ensures the application can be used reliably while taking care of the security of the cluster and the application. Some of the key points that describe this importance are listed below:

Logging

  • Debugging: Logs help in troubleshooting issues, as they provide detailed, step-by-step information about every change.
  • Auditing and Compliance: Logs keep a track record of all activities.
  • Analysis: They help in analyzing resource utilization and spotting optimization opportunities.
  • Security: They help in identifying threats and security flaws.

Monitoring

  • Bug identification: Helps in identifying any abnormalities in the cluster.
  • Efficient resource allocation: Helps use resources efficiently by allocating them according to the requirements of the pods, which saves resources and eases scalability issues.
  • Trust factor: Increases user trust and improves customer support, as issues can be identified in real time.
  • Advantageous functionalities: A human might miss an incident, but a machine won’t; monitoring tools therefore ship functionality such as automated alerting.
  • Reliability: A good logging and monitoring setup increases the reliability of the overall application.
  • Insights: Gives a forecast of the performance and health of the cluster and its components.

How does Logging in Kubernetes Work?

Logging in Kubernetes involves collecting, storing, and analyzing logs from applications and system components running within the cluster. Kubernetes setups use logging agents on each node to collect logs from various sources and forward them to a centralized logging system. This centralized system allows for efficient log aggregation, search, and analysis, helping with debugging, monitoring, and auditing. The Kubernetes logging workflow is as follows:

  • Log Collection: Logs are collected from containers, nodes, and system components.
  • Logging Agents: Tools like Fluentd, Logstash, or Filebeat are used to gather logs.
  • Log Forwarding: Logs are sent to centralized storage systems (e.g., Elasticsearch, Splunk).
  • Centralized Analysis: Aggregated logs are analyzed using tools like Kibana or Grafana.

How is Logging in Kubernetes different from Others?

Logging in Kubernetes differs from traditional logging due to its containerized and distributed nature. Traditional logging typically deals with logs from monolithic applications running on static servers, whereas Kubernetes logging deals with dynamic and ephemeral containers running across a distributed cluster. The main ways Kubernetes logging differs are:

  • Dynamic Environments: Handles logs from containers that can be created and destroyed frequently.
  • Distributed Systems: Manages logs from multiple nodes in a cluster.
  • Centralized Log Management: Aggregates logs from various sources for centralized analysis.
  • Container Context: Provides context about the container, pod, and node from which the log originates.

Popular Kubernetes Logging Topics

Popular Kubernetes logging topics focus on the best practices, tools, and strategies that enable effective log management in Kubernetes environments: choosing the right logging agents, setting up centralized logging, and ensuring log security and compliance. Some of the most popular topics are:

  1. Logging Best Practices: Techniques for efficient log collection, storage, and analysis that ensure optimal performance and resource utilization.
  2. Logging Tools: Comparison and usage of tools like Fluentd, Elasticsearch, and Kibana for log collection, aggregation, visualization, and analysis.
  3. Centralized Logging: Strategies for collecting logs from distributed sources across the Kubernetes cluster and aggregating them into a centralized logging system for easier management and analysis.
  4. Log Retention and Compliance: Making sure logs are stored securely and meet compliance requirements such as data retention policies, encryption, and access control.
  5. Performance Optimization: Improving the performance of the logging system so it can handle large volumes of logs without degradation, by optimizing resource usage and implementing scaling strategies.

Kubernetes Logging Tools

The following are some of the logging tools that help manage logs efficiently in a Kubernetes environment:

  • Fluentd: A flexible logging agent used for collecting, processing, and forwarding logs.
  • Elasticsearch: A search and analytics engine commonly used for storing and querying logs.
  • Kibana: A visualization tool that works with Elasticsearch to analyze and visualize logs.
  • Logstash: A data processing pipeline tool for collecting and parsing logs before storing them in Elasticsearch.
  • Filebeat: A lightweight log forwarder that sends logs to Elasticsearch or Logstash.

Kubectl logs and Other Useful kubectl Commands

kubectl logs is the command used to fetch logs from Kubernetes pods. It is a vital tool for debugging applications and monitoring their behavior directly from the command line. The following are some common forms of the command; a few other useful diagnostic commands are listed after them.

1. kubectl logs [pod-name]

  • Fetches the logs from a specific pod, for example a pod named my-pod:
kubectl logs my-pod

2. kubectl logs [pod-name] -c [container-name]

  • Fetches the logs from a specific container within a pod named “my-pod”:
kubectl logs my-pod -c my-container

3. kubectl logs --previous [pod-name]

  • Fetches the logs from the previously terminated container in a pod named “my-pod”:
kubectl logs --previous my-pod

4. kubectl describe pod [pod-name]

  • Provides detailed information about a pod named “my-pod”, including its configuration and recent events:
kubectl describe pod my-pod
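Beyond kubectl logs and kubectl describe, a few other commands are handy when chasing down an issue; the label selector below (app=w3wiki) matches the deployment created earlier and is purely illustrative:

# Recent cluster events, oldest first
kubectl get events --sort-by='.lastTimestamp'

# Logs from every pod matching a label selector
kubectl logs -l app=w3wiki --all-containers

# Which node each pod is scheduled on, plus its IP
kubectl get pods -o wide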

Best Practices of Kubernetes Monitoring

The following are best practices for Kubernetes monitoring:

  • Use an advanced monitoring solution such as Prometheus + Grafana or Telegraf + InfluxDB + Grafana for richer monitoring functionality and metrics.
  • Use a centralized log system such as the EFK stack (Elasticsearch + Fluentd + Kibana) for log collection and management.
  • Use cloud-based or cloud-native tooling for monitoring and logging where it fits your environment.
  • Enable audit logging in Kubernetes to keep a record of all requests made via the API server, for enhanced security.
  • Monitor and reduce the costs associated with your clusters by keeping a check on the cloud resources they consume and provisioning only what is required.
  • Monitor only relevant metrics; don’t complicate things by tracking dozens of metrics for each cluster.
  • Set up and integrate an alerting system with Prometheus to get notified about cluster health in a centralized way via Slack, e-mail, and so on (a minimal alert rule is sketched after this list).
  • Write automation scripts and integrate them with your monitoring setup so that common issues can be remediated automatically.
  • Always be prepared with a disaster recovery setup for your application in case something unusual happens to your application or system.
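As a sketch of the alerting bullet above, here is a minimal Prometheus alerting rule that fires when a container keeps restarting. It assumes kube-state-metrics is being scraped (it exposes kube_pod_container_status_restarts_total) and uses the plain Prometheus rule-file format; the thresholds are illustrative:

groups:
- name: kubernetes-pods
  rules:
  - alert: PodRestartingTooOften
    # more than 3 restarts of any container within 15 minutes
    expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting repeatedly"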

Kubernetes Logging Best Practices

Implementing best practices in Kubernetes logging ensures efficient log management, enhances performance, and aids quick troubleshooting. These practices help maintain a robust logging infrastructure. The following are some of the best practices of Kubernetes logging; a sample structured log line is shown after the list:

  • Centralized Logging: Aggregate logs from all sources into a central system for easier management and analysis.
  • Structured Logging: Use structured log formats (e.g., JSON) to facilitate easier parsing and querying.
  • Log Rotation and Retention: Implement log rotation and retention policies to manage log file sizes and comply with data retention regulations.
  • Security and Compliance: Ensure log data is encrypted and access-controlled to meet security and compliance requirements.
  • Monitor Log Performance: Regularly monitor the performance of the logging system to handle high volumes of logs without degradation.
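For the structured-logging point, the idea is that each log line is a self-describing JSON object rather than free text, so fields can be parsed and queried directly. A hypothetical example of such a line emitted to stdout:

{"timestamp":"2024-05-01T10:15:30Z","level":"error","service":"checkout","pod":"checkout-6d5f7c9b8-x2k4q","message":"payment gateway timeout","order_id":"12345"}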

Conclusion

Monitoring and logging are crucial for troubleshooting a cluster. Monitoring, or observability, is the process of watching the current and changing state of the containers and components in the cluster and making us aware of the application’s state through alerts. Logging, or logs, records every small event happening inside the containers (e.g. ‘namespace created’ –> ‘pod is yet to start’ –> ‘pod is running’ –> ‘pod is restarting’, and so on). Different monitoring and logging solutions exist for different requirements and use cases, and the right setup depends on your use case and the functionality you need. The official documentation and the tools’ Slack communities can be consulted if you need further help.

Kubernetes Monitoring and Logging – FAQs

Should I use (K8s native monitoring setup) or (Prometheus & Grafana) or (Cloud services)?

It depends on the use case. If you are a beginner, new to DevOps, or don’t need much functionality, the native Kubernetes dashboard is a simple tool that gets the task done. If you are working on a complex project, need advanced functionality, or want a proper setup that makes troubleshooting easier, Prometheus and Grafana are usually the best option. And if you are already familiar and comfortable with a particular cloud, then, keeping cloud costs in mind, either Prometheus and Grafana or that cloud’s managed monitoring services are a good choice.

Which resources should I follow for further learning?

Refer to the official documentation of Kubernetes, Prometheus, and the other DevOps tools you use to stay up to date with common issues, official blogs, and new features.

What to do if I get stuck with monitoring?

The Slack communities of these DevOps tools are very active and welcoming. If you get stuck somewhere and cannot make progress, reach out via their Slack community, and make sure you phrase your question properly.

What is Observability/Monitoring?

Monitoring, or observability, is the process of watching the current and changing state of the containers and components in the cluster and making us aware of the application’s state through alerts.

What are Logs?

Logging, or logs, are records of every small event happening inside the containers (e.g. ‘namespace created’ –> ‘pod is yet to start’ –> ‘pod is running’ –> ‘pod is restarting’, and so on).

Should I learn cloud specific tooling or cloud-native based tooling?

Cloud-based and cloud-native tooling are being adopted by more and more companies nowadays. Learning cloud-native tooling is usually the better option, since such tools are vendor-neutral and reliable; most companies are happy to use them over cloud-specific tooling unless most of their workloads already live in a specific cloud or they run their own cloud platform.

Some of the most recommended tools to learn are listed below:

  • Prometheus and Grafana (Highly recommended)
  • EFK Stack (Optional)
  • Cloud-specific tools of at least one Tier-1 cloud {AWS, Azure, GCP} (Highly recommended)
  • Cloud-specific tools of at least one Tier-2 cloud {DO, Heroku, Civo} (Highly recommended)
  • Datadog/Sysdig for monitoring and logging (Optional)
  • Jaeger, Loki, Thanos, or Cortex alongside Prometheus and Grafana (Optional but recommended)
