DevOps Learning checklist

Linux

  1. Understand the Linux booting process

  2. Understand Linux File System structure.

  3. Learn user/group management and permissions.

  4. Learn file permissions

  5. Understand systemd

  6. Install and configure web servers (Apache, Nginx, Tomcat, etc.) and learn how web servers work.

  7. Learn how Linux processes work.

  8. Learn about Linux Namespaces and Control Groups

  9. Learn how SSH works.

  10. Learn how volumes work in Linux.

  11. Learn about system logging, monitoring, and troubleshooting.

  12. Learn about important protocols (SSL, TLS, TCP, UDP, FTP, SFTP, SCP, SSH)

  13. Learn to manage services and try to create a service on your own (init.d, systemd).

  14. Host static/dynamic websites on web servers and play around with different configurations.

  15. Understand the difference between a load balancer and a reverse proxy.

  16. Set up load balancers and reverse proxies (Nginx, HAProxy, etc.). Understand each configuration and the algorithms behind load balancing.

  17. Learn to optimize Linux performance.

  18. Set up a database and understand its configuration and management (MySQL, PostgreSQL, MongoDB, etc.).

  19. Break something and learn to troubleshoot.
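
For the file permissions item above, here is a minimal sketch of reading and changing permission bits from Python. It assumes a Linux or other POSIX system; the helper names `describe_mode` and `octal_mode` are made up for illustration, and the temporary file only exists so the example has something to inspect.

```python
import os
import stat
import tempfile


def describe_mode(path: str) -> str:
    """Return the permission string for a path, in the style of `ls -l`."""
    return stat.filemode(os.stat(path).st_mode)


def octal_mode(path: str) -> str:
    """Return only the permission bits of a path as an octal string."""
    return oct(stat.S_IMODE(os.stat(path).st_mode))


# Create a throwaway file, set owner=rw, group=r, others=none, then inspect it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

os.chmod(demo_path, 0o640)
demo_symbolic = describe_mode(demo_path)
demo_octal = octal_mode(demo_path)
os.remove(demo_path)

print(demo_symbolic, demo_octal)
```

Running `chmod 640 <file>` followed by `ls -l <file>` in a shell shows the same information; the point is to connect the octal form with the symbolic form.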

Learn How IT Infrastructure Components Work

Networking

1. OSI Model/TCP-IP Model
2. Network Topologies
3. CIDR Notations
4. Subnetting
5. Public network
6. Private network
7. Static/Dynamic IPs
8. Firewall
9. Proxy
10. NAT
11. Public & Private DNS
12. VPN
13. IPv4 & IPv6 Protocols

Storage

1. SAN
2. Backups
3. NFS
5. Object storage
6. Disk IOPS/throughput/latency
7. Databases
8. Key-Value Stores
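
For the CIDR notation and subnetting items in the networking list, a small sketch using Python's standard `ipaddress` module can make the arithmetic concrete. The address ranges below are arbitrary examples, not a recommendation.

```python
import ipaddress

# A /24 network leaves 8 host bits, so it contains 2**8 addresses.
network = ipaddress.ip_network("10.0.0.0/24")
num_addresses = network.num_addresses

# Splitting the /24 into /26 subnets gives four blocks of 64 addresses each.
subnets = [str(s) for s in network.subnets(new_prefix=26)]

# Membership and private/public checks for individual addresses.
host = ipaddress.ip_address("10.0.0.17")
in_network = host in network
host_is_private = host.is_private
dns_is_private = ipaddress.ip_address("8.8.8.8").is_private

print(num_addresses, subnets, in_network, host_is_private, dns_is_private)
```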

High Availability

1. Distributed Systems (Clusters)
2. Fail-Over Mechanisms
3. Disaster Recovery
4. Vertical scaling
5. Horizontal scaling

Single Sign-On

1. Active Directory/LDAP
2. Okta
3. OpenID Connect

Security

1. SSL certificates (Single & Mutual)
2. PKI Infrastructure
3. Zero trust security
4. Password/secret rotation
5. Security Compliance
6. Site-to-site VPN
7. Client-to-site VPN

Load Balancers

1. L4 Load Balancers
2. L7 Load Balancers
3. Load balancing algorithms
4. Reverse Proxy
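
To make the load balancing algorithms item more concrete, here is a minimal sketch of two common strategies, round robin and least connections. The class names and backend labels are hypothetical; real load balancers such as Nginx or HAProxy implement these with health checks, weights, and much more.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation, ignoring their current load."""

    def __init__(self, backends):
        self._cycle = cycle(list(backends))

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Pick the backend with the fewest active connections."""

    def __init__(self, backends):
        self.active = {backend: 0 for backend in backends}

    def pick(self):
        # On a tie, min() returns the first backend in insertion order.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call this when a connection to the backend closes.
        self.active[backend] -= 1


rr = RoundRobinBalancer(["app-1", "app-2", "app-3"])
rr_picks = [rr.pick() for _ in range(4)]

lc = LeastConnectionsBalancer(["app-1", "app-2"])
lc_picks = [lc.pick(), lc.pick()]
lc.release("app-1")
lc_picks.append(lc.pick())

print(rr_picks, lc_picks)
```

Round robin is simple and works well when requests cost about the same; least connections tends to behave better when some requests are long-lived.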

Learn Virtualization & Cloud Computing

You can start learning virtualization concepts using tools like VirtualBox (a type 2 hypervisor) and Vagrant. Virtualization is the foundation of cloud computing.

In terms of cloud computing, you need to learn and get certified on cloud platforms.

When I say “get certified,” I do not mean passing the exam with exam dumps. A certification earned that way adds very little value to you, even if it helps your organization show clients that it has certified cloud engineers.

Pick any one public cloud, preferably AWS, and learn about all its core infrastructure services. Get hands-on practice with each core service and understand how it works.

Learn Infrastructure Automation Tools

For Dev Environment

1. Vagrant
2. Docker Desktop
3. Minikube (k8s)
4. Minishift (k8s)
5. Kind (k8s)

For Infrastructure Provisioning

1. Terraform (preferred)
2. CloudFormation for AWS
3. CLIs (of respective cloud provider)
4. Pulumi

For Configuration Management

1. Ansible (preferred)
2. Chef
3. Puppet
4. SaltStack

VM image/Container management

1. HashiCorp Packer
2. Docker
3. Podman

Tips:

  1. Learn the basics from the official documentation or through a course.

  2. If you want to write an Ansible playbook for Nginx, first configure Nginx manually and see how the components and configs work. Then start writing the playbook.

  3. Ensure you learn test-driven infrastructure development. There are testing tools for every automation tool (ansible-test, Terratest, etc.).

  4. Community modules are a great reference for learning. You can learn complex logic from community modules.

  5. When using community modules, ensure you know what each block of code does.

Learn About Microservices, Containers & Kubernetes

Once you understand Docker, you can start learning about the container orchestration tool Kubernetes.

These platforms are best suited for microservices-based architecture.

Service mesh is an advanced topic in the container space. If you are a beginner to container toolsets, you can learn this after gaining a good amount of knowledge in container orchestration and microservices-based architecture.

Here are the steps involved in deploying containers on multiple hosts and managing networking without any orchestration system:

  1. Set up multiple servers: You would need to set up multiple servers, either on-premise or in the cloud, to host your containers.

  2. Install a container runtime: You would need to install a container runtime, such as Docker, on each of the servers.

  3. Deploy containers: Next, you would need to manually deploy the containers on each of the servers. This could involve pulling container images and starting the containers on each server.

  4. Manage networking: You would need to manage the networking between the containers on each server, ensuring that they can communicate with each other as needed. This could involve setting up network bridges, creating network segments, or using other network configurations.

  5. Manage scaling: To scale your application, you would need to manually add or remove containers from each server as needed.

  6. Manage load balancing: You would need custom tooling or a separately managed load balancer to distribute traffic across the containers.

This process can be time-consuming and error-prone and does not provide automation and scalability features.

This is where a container orchestration tool like Kubernetes comes into the picture.

With Kubernetes, you can focus on application development and deployment. The heavy lifting, such as networking, service-to-service communication across nodes, load balancing, resource scheduling, scalability, and high availability, is handled by Kubernetes.

Overall, Kubernetes helps you achieve the following:

  1. Container orchestration: Automates the deployment, scaling, and management of containers

  2. Automatic scaling: Horizontal and Vertical scaling

  3. Self-healing: Detects and replaces failed containers.

  4. Load balancing: Distributes incoming requests across multiple containers

  5. Service discovery and networking: Manage communication between containers.

  6. Rolling updates and rollbacks: Deploy applications with zero downtime.

  7. Resource management: Manage CPU, memory, and storage of containers

  8. Volume management: Manage persistent storage for your containers.

  9. Config and Secret management: Provides ways to externalize application configs and secrets and makes them accessible to applications running inside containers.
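
The self-healing and scaling items above both come down to comparing a desired state with the actual state and correcting the difference. The following is a toy sketch of that idea only; it does not use the Kubernetes API, and the function name, pod names, and status strings are made up for illustration.

```python
from typing import Dict, List


def reconcile(desired: int, pods: Dict[str, str]) -> List[str]:
    """Run one reconcile pass over a dict of pod name -> status.

    Failed pods are dropped, then pods are created or deleted until the
    number of running pods equals the desired replica count.
    Returns the list of actions taken.
    """
    actions = []

    # Self-healing: remove anything that is not running.
    for name, status in list(pods.items()):
        if status != "Running":
            del pods[name]
            actions.append(f"delete {name}")

    # Scale up: create pods with the first unused names.
    counter = 0
    while len(pods) < desired:
        name = f"pod-{counter}"
        counter += 1
        if name in pods:
            continue
        pods[name] = "Running"
        actions.append(f"create {name}")

    # Scale down: delete surplus pods, highest name first.
    while len(pods) > desired:
        name = sorted(pods)[-1]
        del pods[name]
        actions.append(f"delete {name}")

    return actions


cluster = {"pod-0": "Running", "pod-1": "Failed"}
heal_actions = reconcile(3, cluster)
steady_actions = reconcile(3, cluster)
scale_down_actions = reconcile(1, cluster)

print(heal_actions, steady_actions, scale_down_actions)
```

Calling the function again when nothing has changed produces no actions, which is the property that makes this style of control loop safe to run repeatedly.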

Prerequisites to Learn Kubernetes

  1. Distributed system: Learn about distributed system basics & their use cases in modern IT infrastructure.

  2. Linux: Kubernetes can run on both Linux and Windows operating systems. However, a majority of the production deployments and testing are done on Linux, so it is recommended to have a basic understanding of Linux systems and administration. Having a good grasp of Linux would also help in understanding the underlying components and architecture of Kubernetes.

  3. Authentication & Authorization: These are basic concepts in IT, but engineers starting their careers often confuse them. Authentication verifies who you are; authorization decides what you are allowed to do. Learning them through analogies helps. You will often encounter these terms when learning about Kubernetes.

  4. Key Value Store: It is a type of NoSQL Database. Understand just enough basics and their use cases.

  5. API: Kubernetes is an API-driven system, so you need to understand RESTful APIs. gRPC is also good to know.

  6. YAML: YAML stands for YAML Ain’t Markup Language. It is a data serialization language that can be used for data storage and configuration files. It’s very easy to learn and from a Kubernetes standpoint, we will use it to define and manage configurations for containers and applications. So understanding YAML syntax is very important.

  7. Container: Kubernetes is all about managing containers, so a good understanding of how containers work is essential. You should be familiar with Docker or Podman and have some experience working with containers. I would also suggest reading about the Open Container Initiative (OCI) and the Container Runtime Interface (CRI).

  8. Service Discovery: It is one of the key areas of Kubernetes. You need to have basic knowledge of client-side and server-side service discovery. To put it simply, in client-side service discovery, the request goes to a service registry to get the endpoints available for backend services. In server-side service discovery, the request goes to a load balancer and the load balancer uses the service registry to get the endpoints of backend services.

  9. Networking Basics

    1. CIDR Notation & Type of IP Addresses

    2. L3, L4 & L7 Layers (OSI Layers)

    3. SSL/TLS: One way & Mutual TLS

    4. Proxy

    5. DNS & Ports

    6. iptables

    7. Software Defined Networking (SDN)

    8. Virtual Interfaces

    9. Overlay networking

  10. Familiarity with Command Line Interface (CLI): Kubernetes is primarily managed using the command line, so it is important to be comfortable using the CLI.

  11. Knowledge of Git: We will make use of Git for version control and source code management, so it is recommended to have some experience with Git.
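
To illustrate the service discovery item in the list above, here is a minimal in-memory sketch of the two patterns. Everything in it (`ServiceRegistry`, `client_side_call`, `LoadBalancer`, the service name, and the addresses) is hypothetical; a real setup would use something like DNS or a dedicated registry.

```python
from typing import Dict, List


class ServiceRegistry:
    """Keeps track of which endpoints currently serve each service."""

    def __init__(self):
        self._endpoints: Dict[str, List[str]] = {}

    def register(self, service: str, endpoint: str) -> None:
        self._endpoints.setdefault(service, []).append(endpoint)

    def lookup(self, service: str) -> List[str]:
        return list(self._endpoints.get(service, []))


def client_side_call(registry: ServiceRegistry, service: str) -> str:
    """Client-side discovery: the client queries the registry and picks an endpoint itself."""
    endpoints = registry.lookup(service)
    if not endpoints:
        raise LookupError(f"no endpoints registered for {service}")
    return endpoints[0]  # a real client might pick randomly or round robin


class LoadBalancer:
    """Server-side discovery: the client only knows the balancer, which consults the registry."""

    def __init__(self, registry: ServiceRegistry):
        self._registry = registry
        self._counter = 0

    def route(self, service: str) -> str:
        endpoints = self._registry.lookup(service)
        if not endpoints:
            raise LookupError(f"no endpoints registered for {service}")
        endpoint = endpoints[self._counter % len(endpoints)]
        self._counter += 1
        return endpoint


registry = ServiceRegistry()
registry.register("orders", "10.0.0.5:8080")
registry.register("orders", "10.0.0.6:8080")

direct = client_side_call(registry, "orders")
balancer = LoadBalancer(registry)
routed = [balancer.route("orders") for _ in range(3)]

print(direct, routed)
```

In both cases the registry is the source of truth; the difference is whether the caller or an intermediary reads it.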

Learn Logging, Monitoring & Observability

All apps deployed in the infrastructure will produce logs and metrics. Logs are pushed and stored in a logging infrastructure based on architecture and design.

Every company would have a logging and monitoring infrastructure. Commonly used logging stacks are Splunk and ELK. Also, there are a few SaaS companies like Loggly, which provide logging infrastructure.

For monitoring, there are open-source tools like Prometheus and Nagios, and enterprise tools like AppDynamics, Datadog, SignalFx, etc.

Developers, operations teams, and security teams use logging systems to monitor, troubleshoot, and audit applications and infrastructure. Log data also plays a key role in AIOps.

In every organization, mission-critical applications are monitored 24/7 using monitoring dashboards. Generally, dashboards use data from logging sources or metrics generated by the application.

Also, there would be alerting systems that use the rules configured in the monitoring systems for alerting.

For example, an alert could be triggered as a Slack notification, Jira ticket, email alert, ServiceNow incident ticket, or xMatters phone call. Alerting workflows differ from organization to organization.

As a DevOps engineer, you should be able to query logs and troubleshoot issues in non-prod and prod environments. Understanding regular expressions is very important to query logs in any logging tool.
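
As a small illustration of querying logs with regular expressions, the sketch below pulls error entries out of some sample log lines. The log format, service names, and messages are invented for the example; real logging tools have their own query syntax, but the pattern-matching idea carries over.

```python
import re

# Assumed line format: "<timestamp> <LEVEL> <service>: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<service>[\w-]+): (?P<message>.*)$"
)

SAMPLE_LOGS = """\
2024-01-15T10:00:01Z INFO checkout: order 1001 created
2024-01-15T10:00:02Z ERROR checkout: payment gateway timeout after 30s
2024-01-15T10:00:03Z WARN inventory: stock low for sku-42
2024-01-15T10:00:04Z ERROR inventory: database connection refused
"""


def find_errors(log_text):
    """Return (service, message) pairs for every ERROR line that matches the pattern."""
    results = []
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line)
        if match and match.group("level") == "ERROR":
            results.append((match.group("service"), match.group("message")))
    return results


errors = find_errors(SAMPLE_LOGS)
for service, message in errors:
    print(f"{service}: {message}")
```

Named groups such as `(?P<level>...)` keep the pattern readable and make it easy to filter on one field at a time.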

Learn Security Best Practices (DevSecOps)

Learn Programming & Scripting

Scripting is essential for a DevOps Engineer. Nowadays, for DevOps interviews, every decent company has a preliminary scripting/coding round.

The official Google Cloud blog, in a post about the skills needed to become a cloud engineer, puts it this way: “Code is non-negotiable.”

Recommended:

  • Bash/Shell

  • Python

  • Golang

  • Node.js

  • Rust

Pick a programming language and build an application from scratch.

Also, in today’s world, we treat everything as code. Even though there are enough tools to automate everything, you might need custom functionality that a tool may not offer. In such cases, coding/scripting comes in handy to achieve those functionalities.

For example,

  1. Jenkins pipeline as code requires an understanding of Groovy.

  2. Writing a custom Ansible module requires an understanding of Python.

  3. Writing a Kubernetes operator requires Golang experience.

Also, if you look at AWS CDK or an IaC tool like Pulumi, you can use a programming language to define the infrastructure and do test-driven infrastructure development, just as you would when developing applications.

Learn Git & GitOps

End To End Application Delivery Lifecycle

Learn to use any of the following CI/CD tools.

CI Tools          CD Tools
Jenkins           ArgoCD
GitHub Actions    FluxCD
Drone CI          Jenkins X
Travis CI         GoCD

Also, here is a list of topics related to the application development and release lifecycle. You can connect with people in the industry and understand how it is done in their organization.

  1. Planning process.

  2. Architectural approval & signoff process by Enterprise architects.

  3. Enterprise Security signoff on infrastructure & application design/tools.

  4. Data Compliance

  5. Config/Secret management

  6. QA/Performance testing & Approvals

  7. Monitoring KPIs documentation & setup

  8. Change management process.

  9. Production release process.

  10. Production Deployment Strategies

    • Blue Green Deployment

    • Canary Deployment

  11. Post-production validation activities

  12. Application rollback scenarios and strategies.
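
For the canary deployment strategy listed above, the core idea is sending a small, stable share of traffic to the new version. Here is one possible sketch using a hash of the user ID; the function name `route_version` and the user IDs are hypothetical, and in practice this split is usually done by a load balancer, ingress, or service mesh rather than in application code.

```python
import hashlib


def route_version(user_id: str, canary_percent: int) -> str:
    """Map a user to "canary" or "stable" so the same user always gets the same version.

    The user ID is hashed into one of 100 buckets; buckets below
    canary_percent go to the canary release.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


users = [f"user-{i}" for i in range(1000)]
canary_users = [u for u in users if route_version(u, 10) == "canary"]
print(f"{len(canary_users)} of {len(users)} users routed to the canary")
```

Raising `canary_percent` step by step widens the rollout without moving users who are already on the canary back to the stable version. Setting it to 0 or 100 behaves like the all-at-once switch of a blue-green deployment.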

How To Learn a DevOps Tool

Here is a systematic approach to learning new technologies.

  1. First, understand what problem the tool solves.

  2. Start by reading the official documentation and getting started guides and then move on to community-written blogs if the official documentation is not helpful.

  3. Learn the key concepts, architecture/design, and use cases. Then, if it is a complex tool, I try to read a book on that tool.

  4. Then, I watch official videos to learn how the tool is used and the problems it solves for other companies. Most tools will have conference presentations on YouTube. For example, AWS re:Invent videos, KubeCon videos, etc.

  5. Then, I pick up a use case and gain hands-on experience. It could be a use case available online or I design my own. The real learning starts here.

  6. Once I gain adequate hands-on knowledge, I start reading blogs. Mostly blogs on experiences shared by engineering teams on using the tool in production. It helps me understand the best practices and learnings from others.

  7. Then, I start digging into conversations on Reddit, StackOverflow, etc. These forums are a gold mine of information.

  8. Lastly, I reach out to professionals in my network to understand how they use the tool in their projects. If the tool is relatively new, there is less chance of getting information. But you can share your knowledge with them.

You can choose a method and order that suits your needs. Do not rush. Have patience and trust in the learning process.

Sharing your knowledge with others and teaching others helps you retain what you have learned. This is also an important aspect of being a DevOps engineer.

Read & Document Your Learnings

If you want to be a knowledgeable DevOps engineer, read more. Read at least one DevOps tech blog related to engineering. Read topics that are not part of your day-to-day job to broaden your thinking.

Follow the engineering blogs of companies like Netflix, Twitter, and Google. Learn how they use their toolsets and deployment strategies, and keep up with their latest open-source projects.

Follow like-minded people on LinkedIn, Reddit, Medium, Quora, etc.

It’s good to share your experiences and learnings with others. You can publish tutorials, learnings, and experiences on your blog.

Or you can document them in a GitHub repo.

Whenever you learn something new about DevOps, you can write about it. It will be a reference to you as well as others.

Recommended Certifications for DevOps Engineers

Source: devopscube.com; modified and revised, with additional materials added.