🐧 Ubuntu Linux: Complete Guide (Desktop / Server / Cloud / AWS)

Ubuntu = a “production-friendly” Linux distribution: stability, security, support, cloud, ecosystem. (IDEO-Lab category: O/S & Platforms)

1.1

What is Ubuntu?

Positioning, philosophy, reputation (desktop + server + cloud), and why it is a “pro” standard.

O/S & Platforms Linux Enterprise-ready

1.2

Versions & release cycle (LTS)

LTS vs interim, support, how to choose (prod/dev), examples of current versions.

LTS Release cycle Support

2.1

Installation (Desktop/Server)

ISO, partitioning, UEFI, SSH, cloud-init (server), “clean” post-install.

Install UEFI SSH

7.2

Software Management

Ubuntu App Center, .deb packages, Snap, Flatpak, PPA repositories, software sources and safe production usage.

App Center DEB PPA

2.2

Basic functions (CLI)

Files, users, permissions, services, logs, network, storage: the “sysadmin” kit.

Terminal Systemd Troubleshoot

3.1

Packages: APT & Snap

Repositories, pinning, updates, security, snaps, best practices (prod).

APT Snap Repos

7.3

Mastering the Terminal

BASH, fundamental commands, file navigation, sudo, permissions, chmod, chown and sysadmin reflexes.

BASH CLI Permissions

4.1

Security (hardening)

UFW, SSH, fail2ban, security updates, users/roles, audit & cloud best practices.

Security UFW SSH

4.2

Performance & robustness

Kernel, IO, memory, CPU, tuning, monitoring, why Ubuntu is “stable” in prod.

Perf Robust Monitoring

7.4

Maintenance & Security

System updates, UFW firewall, Timeshift restore points, logs, journald and safe maintenance routines.

Updates UFW Timeshift

5.1

Cloud & AWS (Ubuntu images)

Official AMIs, Canonical as owner, cloud-init, user data, SSH keys, EC2 patterns.

AWS EC2 cloud-init

5.2

Containers & Virtualization

Docker, LXD/LXC, KVM, virt-manager, use cases (CI/CD, lab, prod).

Docker LXD KVM

7.5

Customization & Optimization

GNOME extensions, themes, icons, keyboard shortcuts, battery management, swappiness and safe cleanup routines.

GNOME Themes Optimize

6.1

Troubleshooting (methodology)

systemd logs, journald, network, DNS, disk, boot, services: playbook.

Debug Logs Incidents

7.1

Ubuntu Cheat Sheet

Essential commands + “prod server” checklists + cloud best practices.

Quick Checklist Ops

1.1 Ubuntu: definition, positioning, reputation, server, desktop, cloud and professional usage

Definition

Ubuntu is a Linux distribution maintained by Canonical. It is built on the Linux kernel and provides a complete operating system: package management, system services, security updates, networking, storage, user management, desktop environment, server tools and cloud images.

In professional environments, Ubuntu is popular because it is predictable, widely documented, cloud-friendly, developer-friendly and available in long-term support releases. It is commonly used for web servers, APIs, containers, DevOps tooling, CI/CD runners, databases, monitoring, AI workloads and desktop development.

Category: Operating system / Linux distribution

Vendor: Canonical

Kernel: Linux

Main strengths: LTS, packages, cloud, documentation

Common roles: server, desktop, cloud VM, container host

Professional value: DevOps, backend, SRE, infrastructure

Simple definition: Ubuntu is a production-friendly Linux operating system used to run applications, services, containers, databases, automation pipelines and developer workstations.

Where Ubuntu sits in the technology landscape

Layer	Ubuntu role	Examples
Hardware / VM	Runs on physical machines or virtual machines.	Server, laptop, AWS EC2, Azure VM, KVM.
Kernel	Uses Linux kernel for process, memory, network and filesystem control.	scheduler, TCP/IP, ext4, drivers.
User space	Provides tools, libraries, shells and services.	bash, systemd, apt, ssh, journald.
Applications	Hosts business and infrastructure services.	Nginx, PostgreSQL, Redis, Docker, Django.
Operations	Provides operational surface for admins and DevOps.	logs, units, firewall, packages, users.

Mental classification

Ubuntu is not:
- a programming language
- a framework
- a database
- a cloud provider
- a container engine

Ubuntu is:
- an operating system
- a Linux distribution
- a server platform
- a desktop platform
- a cloud image baseline
- a container host
- a DevOps execution environment

Why Ubuntu became a professional standard

Ubuntu became a common professional choice because it offers a practical balance: easier than many traditional server distributions for newcomers, stable enough for production when using LTS, and supported by a huge ecosystem of packages, tutorials, cloud images and vendor documentation.

Reason	Professional impact	Concrete example
LTS releases	Stable baseline for servers and production workloads.	Choose one version and patch it for years.
Large package ecosystem	Fast installation of standard infrastructure tools.	`apt install nginx postgresql redis`
Cloud images	Quick deployment on public cloud providers.	EC2, Azure, GCP, OpenStack.
Documentation	Faster troubleshooting and onboarding.	Server docs, community docs, vendor guides.
Developer tooling	Good fit for Python, Node.js, Go, Java, Docker and CI/CD.	Local dev and production parity.
Enterprise support	Commercial support path exists if needed.	Canonical support, Ubuntu Pro, security services.

Professional value map

Ubuntu knowledge helps in:

Backend engineering
├── deploy APIs
├── manage services
├── inspect logs
└── debug network and permissions

DevOps
├── automate installs
├── configure systemd
├── harden SSH
├── manage packages
└── operate containers

SRE / Production
├── monitor CPU/RAM/disk/network
├── investigate incidents
├── patch security updates
├── tune services
└── write runbooks

Cloud engineering
├── boot cloud images
├── use cloud-init
├── configure storage
├── set firewall rules
└── deploy workloads

Recruiter-friendly summary

Strong positioning: Ubuntu skills show the ability to operate real services: SSH, systemd, networking, logs, permissions, packages, firewalling, hardening, scripting, Docker, Nginx, databases and cloud deployment.

Ubuntu operating system architecture

Applications / Services
├── nginx
├── postgres
├── redis
├── django
├── docker
└── monitoring agents
│
▼
User space
├── bash / shell
├── GNU tools
├── systemd
├── journald
├── apt / dpkg
├── ssh
└── libraries
│
▼
Linux kernel
├── process scheduler
├── memory management
├── filesystem layer
├── network stack
├── security modules
└── drivers
│
▼
Hardware / Hypervisor
├── CPU
├── RAM
├── disk
├── network card
├── KVM / VMware
└── cloud hypervisor

What each layer means in operations

Layer	Typical admin action	Diagnostic command
Application	Restart service, inspect config, read logs.	`systemctl status nginx`
User space	Install package, manage users, run scripts.	`apt list --installed`
Systemd	Enable boot services and dependencies.	`journalctl -u service`
Kernel	Check memory, processes, sockets, I/O.	`dmesg`, `ss`, `top`
Storage	Mount disks, inspect usage, tune I/O.	`df -h`, `lsblk`
Network	Check IP, routes, DNS, firewall.	`ip a`, `ip r`, `resolvectl`

Common mistake: debugging at random. On Ubuntu, pick one layer first and rule it out before moving on: service, logs, process, network, storage, permissions, package, kernel.

Ubuntu Desktop, Server and Cloud

Edition / usage	Main purpose	Typical user	Key components
Ubuntu Desktop	Workstation, development, daily OS.	Developer, engineer, analyst.	GNOME, terminal, browser, IDEs, Docker.
Ubuntu Server	Production services and infrastructure.	DevOps, SRE, backend engineer.	SSH, systemd, apt, netplan, firewall.
Ubuntu Cloud Image	Cloud VM baseline.	Cloud engineer, platform team.	cloud-init, optimized kernel, cloud agent.
Ubuntu Container Base	Base image for containers.	DevOps, application engineer.	minimal packages, apt, runtime libraries.
Ubuntu Core	IoT and embedded-oriented variant.	IoT platform team.	snap-based, transactional updates.

Use-case examples

Desktop:
- Python development
- Docker-based local stack
- SSH into production servers
- Kubernetes and cloud CLI tools

Server:
- Nginx reverse proxy
- Django or Node.js API
- PostgreSQL or Redis host
- monitoring server
- VPN or bastion host

Cloud:
- EC2 instance
- Azure VM
- GCP Compute Engine
- OpenStack instance
- Kubernetes node

Edition decision tree

Need a local workstation?
└── Ubuntu Desktop

Need a production VM?
└── Ubuntu Server LTS

Need a cloud instance?
└── Ubuntu cloud image

Need a container base?
└── Ubuntu minimal/base image

Need IoT appliance-like OS?
└── Ubuntu Core

Need enterprise security extensions?
└── Ubuntu LTS + Ubuntu Pro

Practical distinction

Desktop vs Server: Desktop has a graphical environment and workstation tools. Server is lighter, usually SSH-only, and optimized for services. In production, Ubuntu Server LTS is the common baseline.

LTS model, releases and upgrade strategy

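Ubuntu is frequently chosen in production because of the LTS model. LTS stands for long-term support: Canonical publishes an LTS release every two years, in April, and maintains it with security updates for five years as standard, which makes it a stable base for servers, cloud images and enterprise deployments. Interim (non-LTS) releases appear every six months and are supported for nine months; they are useful for newer software, but less common as a conservative production baseline.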

Release type	Best for	Production recommendation
LTS	Servers, cloud, enterprise, long-lived systems.	Default choice for production.
Interim release	Newer packages, testing, short-lived environments.	Use only with clear upgrade discipline.
Rolling behavior	Not Ubuntu's main model.	Use another distro if rolling release is required.
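The release type can usually be read from the version number alone. The sketch below assumes the long-standing convention that LTS releases are the `.04` releases of even-numbered years (22.04, 24.04 and so on); `release_type` is a hypothetical helper name, not an Ubuntu tool.

```shell
# release_type VERSION -> prints "LTS" or "interim".
# Heuristic (assumption): LTS = YY.04 where YY is an even year.
release_type() {
    version=$1
    year=${version%%.*}     # "24" from "24.04" or "24.04.1"
    rest=${version#*.}      # "04" or "04.1"
    month=${rest%%.*}       # "04"
    if [ "$month" = "04" ] && [ $((year % 2)) -eq 0 ]; then
        echo "LTS"
    else
        echo "interim"
    fi
}

release_type 24.04   # prints LTS
release_type 24.10   # prints interim
```

On a running system the version string can be taken from the `VERSION_ID` field of `/etc/os-release`.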

Upgrade strategy

Safe production upgrade path:

1. Inventory servers and services
2. Confirm current Ubuntu version
3. Check application compatibility
4. Snapshot or backup
5. Test upgrade on staging
6. Review package changes
7. Schedule maintenance window
8. Upgrade one node first
9. Validate services and logs
10. Roll out progressively
11. Keep rollback plan ready

Version management commands

# Show Ubuntu version
lsb_release -a

# Show OS release file
cat /etc/os-release

# Show kernel version
uname -a

# Update package lists
sudo apt update

# Upgrade installed packages
sudo apt upgrade

# Full upgrade with dependency changes
sudo apt full-upgrade

# Check reboot requirement
test -f /var/run/reboot-required && cat /var/run/reboot-required
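For inventories and scripts, the same facts are easier to use as single values. A minimal sketch, assuming the usual `KEY=value` layout of `/etc/os-release`; the helper names `os_release_field` and `reboot_pending` are made up for this example.

```shell
# os_release_field KEY [FILE] -> prints the value of KEY, without quotes.
os_release_field() {
    key=$1
    file=${2:-/etc/os-release}
    sed -n "s/^${key}=//p" "$file" 2>/dev/null | head -n 1 | tr -d "\"'"
}

# reboot_pending [FLAG_FILE] -> succeeds when the reboot flag file exists.
reboot_pending() {
    [ -f "${1:-/var/run/reboot-required}" ]
}

# One-line summary suitable for an inventory report.
echo "release: $(os_release_field VERSION_ID) ($(os_release_field VERSION_CODENAME))"
echo "kernel:  $(uname -r)"
if reboot_pending; then echo "reboot:  required"; else echo "reboot:  not required"; fi
```

Both helpers accept an alternative file path, which makes them easy to try against a copy of the file from another server.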

Release risk table

Risk	Cause	Control
Package incompatibility	Runtime or library version changes.	Test staging before production.
Service restart failure	Config syntax or dependency change.	Validate configs before restart.
Kernel reboot required	Security kernel update.	Plan reboot window.
Repository mismatch	Third-party packages not ready.	Audit external repositories.

Ubuntu in enterprise and production

In enterprise environments, Ubuntu is used when teams need a stable Linux baseline with strong cloud support, broad package availability, automation compatibility and a known operational model. It is especially common for backend platforms, DevOps infrastructure, Kubernetes nodes, CI runners and cloud-hosted services.

Enterprise requirement	Ubuntu answer	Operational practice
Security patching	Regular package and kernel updates.	Patch windows and reboot strategy.
Repeatable deployment	Cloud images, apt, automation tools.	Ansible, Terraform, cloud-init.
Service supervision	systemd standard service manager.	Unit files, restart policy, journald logs.
Access control	Linux users, groups, sudo, SSH.	Least privilege and key-based access.
Observability	journald, syslog, metrics agents.	Central logging and monitoring.
Cloud integration	Images and cloud-init.	Bootstrap on first boot.

Production server lifecycle

Provision
│
├── select Ubuntu LTS image
├── configure cloud-init
├── attach disk
└── configure network
│
▼
Harden
│
├── SSH keys
├── disable root login
├── firewall
├── unattended upgrades policy
└── least-privilege users
│
▼
Deploy
│
├── install packages
├── configure services
├── systemd unit files
└── application release
│
▼
Operate
│
├── logs
├── metrics
├── backups
├── patching
└── incident response

Professional checklist

[ ] LTS release selected
[ ] SSH key access only
[ ] sudo policy controlled
[ ] firewall enabled
[ ] services managed by systemd
[ ] logs visible through journalctl
[ ] backups configured
[ ] monitoring installed
[ ] security updates planned
[ ] disk usage monitored
[ ] certificates tracked
[ ] rollback plan documented
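Some checklist items can be verified mechanically. The sketch below covers the SSH item by reading one sshd configuration file; `sshd_setting` and `ssh_hardened` are hypothetical helpers. It does not follow `Include` files or `Match` blocks, so on a real host `sudo sshd -T` remains the reference for the effective configuration.

```shell
# sshd_setting KEYWORD [FILE] -> prints the first value set for KEYWORD.
# sshd keeps the first value it obtains for a keyword, so the first match wins.
sshd_setting() {
    keyword=$1
    file=${2:-/etc/ssh/sshd_config}
    awk -v k="$keyword" '
        tolower($1) == tolower(k) { print tolower($2); exit }
    ' "$file" 2>/dev/null
}

# ssh_hardened [FILE] -> succeeds only if root login and password auth are off.
ssh_hardened() {
    [ "$(sshd_setting PermitRootLogin "$@")" = "no" ] &&
    [ "$(sshd_setting PasswordAuthentication "$@")" = "no" ]
}

if ssh_hardened; then
    echo "SSH: key-only access, no root login"
else
    echo "SSH: review PermitRootLogin and PasswordAuthentication"
fi
```

Commented-out lines are ignored because their first field starts with `#` and never equals the keyword.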

Core Ubuntu administration toolkit

Area	Tools	Typical command
Packages	apt, dpkg	`sudo apt install nginx`
Services	systemd, systemctl	`sudo systemctl restart nginx`
Logs	journalctl, syslog	`journalctl -u nginx -f`
Network	ip, ss, resolvectl, netplan	`ss -lntp`
Firewall	ufw, nftables	`sudo ufw status verbose`
Storage	df, du, lsblk, mount	`df -h`
Processes	ps, top, htop, kill	`ps aux | grep nginx`
Users	useradd, usermod, sudoers	`sudo usermod -aG sudo user`

First diagnostic commands

# System identity
hostnamectl
cat /etc/os-release
uptime

# CPU and memory
top
free -h

# Disk usage
df -h
du -sh /var/log/*

# Network
ip a
ip r
ss -lntp
resolvectl status

# Services
systemctl status nginx
journalctl -u nginx --since "30 min ago"

# Packages
apt policy nginx
dpkg -l | grep nginx

# Security
sudo ufw status verbose
sudo journalctl -u ssh --since today

Good operator habit: before changing anything, collect facts: service status, logs, listening ports, disk space, memory, recent package changes and firewall state.
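The fact-collection habit is easy to script so that every incident starts from the same snapshot. This is a sketch only: `collect_facts` is a hypothetical helper, the command list is one possible selection of read-only commands, and the default output path under `/tmp` is an assumption.

```shell
# collect_facts [OUTFILE] -> writes a snapshot of read-only diagnostics and
# prints the path of the file. Missing tools are skipped, not treated as errors.
collect_facts() {
    out=${1:-/tmp/facts-$(date +%Y%m%d-%H%M%S).txt}
    : > "$out"
    for cmd in "uptime" "free -h" "df -h" "ip a" "ss -lntp" \
               "systemctl --failed --no-pager" "ufw status verbose"; do
        tool=${cmd%% *}
        {
            echo "### $cmd"
            if command -v "$tool" >/dev/null 2>&1; then
                # Unquoted on purpose: the string is split into command + args.
                $cmd 2>&1 || echo "(exit status $?)"
            else
                echo "(skipped: $tool not installed)"
            fi
            echo
        } >> "$out"
    done
    echo "$out"
}
```

Running `collect_facts` before the first change gives a baseline to compare against afterwards; some commands, such as `ufw status verbose`, only return full output when run as root.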

Typical Ubuntu production stacks

Stack	Components	Ubuntu role
Django / Python API	Nginx, Gunicorn, Django, PostgreSQL, Redis.	Host services, packages, systemd units, logs.
Node.js API	Nginx, Node.js, PM2/systemd, database.	Runtime host and reverse proxy.
Docker host	Docker Engine, Compose, images, volumes.	Container runtime platform.
Database server	PostgreSQL, MySQL, MariaDB, backups.	Storage, service control, tuning, logs.
Monitoring server	Prometheus, Grafana, Loki, exporters.	Observability host.
Bastion host	SSH gateway, audit, restricted access.	Secure entry point.

Django deployment example

Internet
│
▼
Nginx
│
├── TLS termination
├── static files
└── reverse proxy
│
▼
Gunicorn systemd service
│
▼
Django application
│
├── PostgreSQL
├── Redis
├── Celery workers
└── media/static storage

Example service units

Common systemd units:
- nginx.service
- postgresql.service
- redis-server.service
- docker.service
- gunicorn.service
- celery.service
- celerybeat.service
- prometheus-node-exporter.service

Typical commands:
sudo systemctl enable nginx
sudo systemctl restart gunicorn
sudo systemctl status redis-server
journalctl -u celery -f
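Units such as gunicorn.service are not shipped by a package; the operator writes them. A minimal sketch of what one could contain, generated from a shell function. `render_unit` is a hypothetical helper, and the `deploy` user, the `/srv/app` path and the Gunicorn command line are placeholders for illustration only.

```shell
# render_unit DESCRIPTION USER WORKDIR COMMAND -> prints a systemd service unit
# that restarts the process automatically when it exits with an error.
render_unit() {
    cat <<EOF
[Unit]
Description=$1
After=network.target

[Service]
User=$2
WorkingDirectory=$3
ExecStart=$4
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}

# Placeholder values, not a recommended layout.
render_unit "Gunicorn for example app" deploy /srv/app \
    "/srv/app/venv/bin/gunicorn --bind 127.0.0.1:8000 app.wsgi:application"
```

The output would typically be saved as `/etc/systemd/system/gunicorn.service`, followed by `sudo systemctl daemon-reload` and `sudo systemctl enable --now gunicorn`.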

Minimal web server setup flow

1. Create server
2. Update packages
3. Create deploy user
4. Configure SSH
5. Install Nginx
6. Install app runtime
7. Configure database
8. Create systemd service
9. Configure TLS
10. Enable firewall
11. Add monitoring
12. Add backup
13. Document runbook

Backend angle: Ubuntu is where application theory becomes production reality: processes, ports, permissions, logs, memory, disk and security.

Common risks, anti-patterns and production mistakes

Anti-pattern	Risk	Correction
Logging in as root directly	Weak audit and high blast radius.	Use named users, sudo and SSH keys.
Public SSH with passwords	Brute-force exposure.	Key-only SSH, firewall, fail2ban or VPN.
Ignoring package updates	Known vulnerabilities remain active.	Patch policy and reboot planning.
No service manager	App dies and does not restart.	Use systemd with restart policy.
No log strategy	Incidents are hard to diagnose.	Use journald, logrotate and central logs.
Manual untracked changes	Server becomes unreproducible.	Use automation and versioned configs.
No disk monitoring	Full disk causes outage.	Monitor filesystem usage and logs.
No rollback plan	Failed upgrade becomes long outage.	Snapshot, backup and tested restore path.
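Of the corrections above, disk monitoring is the quickest to prototype. The sketch below filters `df -P` output against a threshold; `disk_alerts` is a hypothetical helper and the 85% limit is an arbitrary example value. Mount points containing spaces are not handled.

```shell
# disk_alerts THRESHOLD -> reads `df -P` output on stdin and prints
# "<mount point> <use>%" for each filesystem at or above THRESHOLD percent.
disk_alerts() {
    awk -v limit="$1" '
        NR > 1 {                       # skip the header line
            use = $5                   # Capacity column, e.g. "91%"
            sub(/%$/, "", use)
            if (use + 0 >= limit + 0) printf "%s %s%%\n", $6, use
        }
    '
}

# Example: list local filesystems that are at least 85% full.
df -P | disk_alerts 85
```

A real deployment would feed the same signal into a monitoring agent; the script form is mainly useful in cron jobs or during an incident.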

Incident diagnostic decision tree

Application is down
│
├── Is server reachable?
│       ├── no  -> network, firewall, cloud, DNS
│       └── yes
│
├── Is service running?
│       ├── no  -> systemctl status + journalctl
│       └── yes
│
├── Is port listening?
│       ├── no  -> config or bind failure
│       └── yes
│
├── Is reverse proxy healthy?
│       ├── no  -> nginx config/logs
│       └── yes
│
├── Is database reachable?
│       ├── no  -> DB service/network/auth
│       └── yes
│
└── Is app throwing errors?
        ├── yes -> application logs
        └── no  -> upstream routing/cache/client issue

First-response commands

systemctl status nginx
journalctl -u nginx --since "15 min ago"
ss -lntp
df -h
free -h
top
sudo ufw status
curl -I http://localhost
curl -I https://example.com
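The "service running?" and "port listening?" branches of the decision tree can be combined into one check. A sketch under stated assumptions: `port_listening` and `triage` are hypothetical helpers, and the parsing relies on the usual `ss -lnt` column layout, where the fourth column is `address:port`.

```shell
# port_listening PORT -> reads `ss -lnt` output on stdin and succeeds if a
# socket is listening on PORT (IPv4 or IPv6).
port_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")      # last piece is the port
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# triage SERVICE PORT -> reports which branch of the tree to follow next.
triage() {
    service=$1
    port=$2
    if ! systemctl is-active --quiet "$service"; then
        echo "$service is not running: check systemctl status and journalctl -u $service"
    elif ! ss -lnt | port_listening "$port"; then
        echo "$service runs but nothing listens on $port: check config or bind failure"
    else
        echo "$service runs and listens on $port: check proxy, database and app logs"
    fi
}
```

For example, `triage nginx 80` on the affected host narrows the problem to one branch before any change is made.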

Production rule: never change several things at once during an incident. Observe, isolate, change one thing, verify, document.

Official links and useful references

Resource	URL	Usage
Ubuntu main site	`https://ubuntu.com/`	Product overview and downloads.
Download Ubuntu	`https://ubuntu.com/download`	Desktop, server and cloud downloads.
Ubuntu documentation	`https://documentation.ubuntu.com/`	Official documentation portal.
Ubuntu Server docs	`https://documentation.ubuntu.com/server/`	Server administration reference.
Ubuntu releases	`https://releases.ubuntu.com/`	Release images and versions.
Ubuntu packages	`https://packages.ubuntu.com/`	Package lookup.
Ubuntu security notices	`https://ubuntu.com/security/notices`	Security update tracking.

Learning roadmap

Ubuntu learning path:

1. Shell basics
2. Filesystem and permissions
3. Users, groups and sudo
4. apt and packages
5. systemd services
6. journald and logs
7. networking and DNS
8. firewall and SSH hardening
9. storage and mounts
10. Nginx reverse proxy
11. database service operation
12. Docker host usage
13. backups and restore
14. monitoring and alerting
15. cloud-init and automation

One-line positioning

Ubuntu is a professional Linux platform for running and operating real systems: web servers, APIs, databases, CI/CD, containers, cloud workloads, monitoring and developer environments.

7.2 Ubuntu Software Management: App Center, DEB, Snap, Flatpak, PPAs and repository governance

Software management on Ubuntu

Ubuntu provides several ways to install software. The most important are: graphical installation through Ubuntu App Center, traditional Debian packages through APT and .deb files, Snap packages, Flatpak applications and third-party repositories such as PPAs.

The right method depends on the context. A desktop user may prefer App Center, Snap or Flatpak. A server administrator usually prefers APT and controlled repositories. A developer may use a vendor repository for Docker, PostgreSQL, Node.js or cloud tooling. A production team must control package origin, version, update policy and rollback.

Method	Best for	Strength	Risk
App Center	Desktop users and simple installs.	Easy graphical installation.	Less precise for production governance.
APT / DEB	Servers, system packages, standard tools.	Native Ubuntu package management.	Repository conflicts if unmanaged.
Snap	Sandboxed apps and some Canonical-supported tools.	Bundled dependencies and automatic refresh.	Refresh policy and confinement must be understood.
Flatpak	Desktop applications, especially cross-distro apps.	Good desktop app ecosystem.	Another runtime and update channel to govern.
PPA	Newer versions or community packages.	Access to versions not in official repos.	Trust, lifecycle and upgrade conflicts.
Vendor repo	Official upstream packages.	Best path for many professional tools.	Keys, pinning and repository ownership matter.

Core rule: for production servers, prefer official Ubuntu repositories or official vendor repositories. Use PPAs and manual .deb installs only with explicit justification.
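Applying this rule starts with knowing which sources a server actually uses. The sketch below lists every active APT source entry that does not point at an `ubuntu.com` host. `third_party_sources` is a hypothetical helper, and treating `ubuntu.com` as the marker for official archives is an assumption that needs adjusting when an internal mirror is in use.

```shell
# third_party_sources [PATH...] -> prints "file:entry" for each active APT
# source whose URL does not mention ubuntu.com.
third_party_sources() {
    [ $# -gt 0 ] || set -- /etc/apt/sources.list /etc/apt/sources.list.d
    for path in "$@"; do
        [ -e "$path" ] || continue
        # One-line entries start with "deb"; deb822 (.sources) files use "URIs:".
        grep -rHE '^(deb(-src)?[[:space:]]|URIs:)' "$path" 2>/dev/null |
            grep -v 'ubuntu\.com' || true
    done
}

third_party_sources
```

Each line of output is a PPA, vendor repository or mirror that deserves a documented owner and reason.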

Software source map

Ubuntu software sources
│
├── App Center
│       ├── graphical install
│       ├── desktop apps
│       └── simple discovery
│
├── APT repositories
│       ├── official Ubuntu repos
│       ├── security updates
│       ├── vendor repos
│       └── PPAs
│
├── Local DEB files
│       ├── downloaded installer
│       ├── vendor package
│       └── manual install
│
├── Snap
│       ├── snap store
│       ├── channels
│       ├── sandbox
│       └── auto refresh
│
└── Flatpak
        ├── Flathub
        ├── desktop app runtimes
        ├── sandbox permissions
        └── user-level installs

Decision shortcut

Need a server package?
└── APT from Ubuntu or official vendor repository

Need a desktop application?
└── App Center, Snap or Flatpak

Need a newer application version?
├── check official vendor repo first
├── then consider PPA
└── document the reason

Need a one-off local installer?
└── .deb file with verified source

Need strict production reproducibility?
└── APT + pinned repositories + automation
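Pinning is expressed as a small stanza in an APT preferences file. The sketch below generates a version pin; `render_pin` is a hypothetical helper, and the package name and version pattern are examples only. A priority of 1000 or more tells APT to keep the pinned version even if that means not upgrading.

```shell
# render_pin PACKAGE VERSION_GLOB [PRIORITY] -> prints an APT preferences stanza.
render_pin() {
    cat <<EOF
Package: $1
Pin: version $2
Pin-Priority: ${3:-1001}
EOF
}

# Hypothetical example: keep nginx on the 1.24 series.
render_pin nginx '1.24.*'
```

The stanza belongs in a file under `/etc/apt/preferences.d/` (no extension, or `.pref`), and `apt policy nginx` shows whether the pin took effect. For a simple freeze of the installed version, `sudo apt-mark hold nginx` is a lighter alternative.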

Ubuntu App Center: simplified graphical installation

Ubuntu App Center is the graphical software interface on Ubuntu Desktop. It is designed for easy discovery, installation and removal of common applications. It is convenient for desktop workflows, but it is not the primary tool for server automation or strict production package governance.

Use case	App Center fit	Comment
Install browser, editor, media tool	Excellent.	Simple desktop workflow.
Discover common applications	Excellent.	Good for non-terminal users.
Install developer desktop tools	Good.	Check whether package is Snap or DEB.
Production server package	Poor fit.	Use APT, automation or vendor repo.
Fleet management	Poor fit.	Use Ansible, cloud-init, image build or MDM.

Typical App Center flow

Ubuntu Desktop
│
├── Open App Center
├── Search application
├── Review publisher and package type
├── Click Install
├── Authenticate if required
├── Launch application
└── Update through system update flow

Desktop rule: App Center is excellent for user convenience, but administrators should still understand which packaging technology is actually used behind the install.

What to verify before installing

Before installing a desktop app:
[ ] Is the publisher trusted?
[ ] Is it a Snap, DEB or Flatpak package?
[ ] Is the app maintained?
[ ] Does it need sensitive permissions?
[ ] Is there an official vendor package?
[ ] Is it needed system-wide or only for one user?
[ ] Is it appropriate for a professional workstation?

Graphical vs CLI management

Approach	Strength	Weakness
App Center	Easy, visual, good for desktop users.	Less scriptable and less auditable.
APT CLI	Scriptable, auditable, server-friendly.	Requires terminal knowledge.
Snap CLI	Precise Snap control.	Requires understanding channels and refresh.
Flatpak CLI	Good app and permission control.	Separate ecosystem and runtimes.

Useful desktop package checks

# Show installed Snap packages
snap list

# Show installed DEB packages
dpkg -l | less

# Search APT package
apt search package-name

# Show package origin
apt policy package-name

# Show Flatpak apps if installed
flatpak list

Governance warning: easy installation does not mean safe installation. Always check the publisher, update channel, permissions and package origin.
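One quick way to answer the "Snap, DEB or Flatpak?" question for an installed command is to look at where its executable lives. The sketch below is a heuristic only: the function name `classify_app_path` is hypothetical, and the path prefixes are the usual defaults on Ubuntu, which a customized system may not follow.

```shell
# Classify an executable path by the packaging system that most likely provided it.
# Heuristic only: prefixes are the common default locations, not a guarantee.
classify_app_path() {
    case "$1" in
        /snap/*|/var/lib/snapd/*)                      echo "snap" ;;
        /var/lib/flatpak/*|*/.local/share/flatpak/*)   echo "flatpak" ;;
        /usr/*|/bin/*|/sbin/*|/opt/*)                  echo "deb-or-manual" ;;
        *)                                             echo "unknown" ;;
    esac
}
```

A typical call would be `classify_app_path "$(command -v firefox)"`. A "deb-or-manual" result still needs confirmation with `dpkg -S /path/to/file`, because files copied in by hand live in the same directories as packaged ones.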

DEB packages and APT: native Ubuntu package management

Ubuntu is based on Debian packaging. A .deb file is a Debian package. APT is the higher-level tool that downloads packages from repositories, resolves dependencies, installs upgrades and tracks package versions.

Concept	Meaning	Command
`.deb`	Local Debian package file.	`sudo apt install ./file.deb`
`apt`	High-level package manager.	`sudo apt install nginx`
`dpkg`	Low-level package tool.	`dpkg -l`
Repository	Package source.	`ls /etc/apt/sources.list.d/`
Dependency	Package required by another package.	Resolved by APT.
Candidate version	Version APT would install.	`apt policy package`

APT essentials

# Update package metadata
                            sudo apt update

                            # Install package from repository
                            sudo apt install nginx

                            # Install local DEB file with dependency resolution
                            sudo apt install ./package.deb

                            # Remove package but keep config
                            sudo apt remove package-name

                            # Remove package and config
                            sudo apt purge package-name

                            # Upgrade packages
                            sudo apt upgrade

                            # Show package details
                            apt show package-name

                            # Show installed and candidate versions
                            apt policy package-name

DEB installation flow

Local DEB file
                            │
                            ├── verify source
                            ├── check vendor signature or checksum if available
                            ├── install with apt
                            │       └── sudo apt install ./package.deb
                            ├── inspect installed package
                            │       └── dpkg -l | grep package
                            ├── verify service or binary
                            └── document install source
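The "check vendor signature or checksum" step can be scripted so that installation only proceeds when the digest matches. This is a minimal sketch, assuming the vendor publishes a SHA-256 digest; the function name `verify_sha256` is hypothetical.

```shell
# Return 0 only when the file's SHA-256 digest equals the expected value.
# Runs in a subshell (parentheses body) so its variables stay local.
verify_sha256() (
    file=$1
    expected=$2
    [ -f "$file" ] || { echo "missing file: $file" >&2; exit 2; }
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "checksum OK: $file"
    else
        echo "checksum MISMATCH: $file" >&2
        exit 1
    fi
)
```

It can gate the install step, for example `verify_sha256 ./package.deb "$PUBLISHED_DIGEST" && sudo apt install ./package.deb`, where `PUBLISHED_DIGEST` is a placeholder for the value copied from the vendor's site. A checksum only proves the download is intact; it does not replace checking who published it.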

Package inspection commands

# List installed packages
                            dpkg -l

                            # Filter installed packages
                            dpkg -l | grep nginx

                            # Show package status
                            dpkg -s nginx

                            # Show files installed by package
                            dpkg -L nginx

                            # Find package owning a file
                            dpkg -S /usr/sbin/nginx

                            # Show APT history
                            less /var/log/apt/history.log

                            # Show available versions
                            apt-cache madison nginx
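Reading `history.log` with `less` works, but a one-line-per-transaction view is easier to review after a change. The sketch below assumes the usual `Start-Date:` and `Commandline:` fields of the APT history log; the function name `apt_history_summary` is hypothetical, and entries without a `Commandline:` field are simply skipped.

```shell
# Print one line per APT transaction: start date, then the command line that ran.
# $1 = log path (default: /var/log/apt/history.log)
apt_history_summary() {
    awk '
        /^Start-Date:/  { sub(/^Start-Date: */, "");  start = $0 }
        /^Commandline:/ { sub(/^Commandline: */, ""); print start " | " $0 }
    ' "${1:-/var/log/apt/history.log}"
}
```

Piping the result through `tail -n 20` gives a quick list of the most recent package operations.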

Production DEB rules

Do:
                            - prefer repository installation over random downloads
                            - use official vendor DEB if needed
                            - keep package source documented
                            - automate installation in scripts or Ansible
                            - review apt history after changes

                            Avoid:
                            - random DEB files from unknown sites
                            - manual installs without documentation
                            - local DEB files with no update path
                            - mixing multiple competing repositories
                            - installing critical server packages from untrusted sources

Server rule: APT and DEB are the default professional path for Ubuntu server software because they integrate with repositories, updates, logs and automation.

Snap packages: bundled apps, channels, confinement and refresh

Snap packages bundle applications with their dependencies and run with a confinement model. They are distributed through the Snap ecosystem and can use channels such as stable, candidate, beta or edge. Snap refresh behavior is important because packages can update automatically.

Snap concept	Meaning	Operational impact
Channel	Release track.	Stable is safer than beta or edge.
Revision	Specific build of a Snap.	Allows reverting to a previous revision.
Confinement	Sandbox permissions.	May restrict filesystem/device access.
Interface	Permission connection.	May require manual connection.
Refresh	Update mechanism.	Needs maintenance policy for servers.

Snap commands

# List installed snaps
                            snap list

                            # Search for app
                            snap find package-name

                            # Show package info
                            snap info package-name

                            # Install stable channel
                            sudo snap install package-name --channel=stable

                            # Refresh snaps
                            sudo snap refresh

                            # Show refresh schedule
                            snap refresh --time

                            # Remove snap
                            sudo snap remove package-name

Snap operations

# Show changes
                            snap changes

                            # Show connections
                            snap connections package-name

                            # Connect interface
                            sudo snap connect package-name:interface

                            # Revert to previous revision if available
                            sudo snap revert package-name

                            # Hold refresh temporarily
                            sudo snap refresh --hold=24h package-name

                            # Logs for snap service
                            snap logs package-name

Snap decision tree

Considering Snap?
                            │
                            ├── Desktop application?
                            │       └── often acceptable
                            │
                            ├── Server daemon?
                            │       ├── check refresh policy
                            │       ├── check confinement
                            │       ├── check logs
                            │       └── check rollback
                            │
                            ├── Need strict package timing?
                            │       └── prefer APT or control refresh window
                            │
                            └── Need sandboxed app delivery?
                                    └── Snap can be a good fit

Snap strengths and cautions

Strength	Caution
Bundled dependencies.	More disk usage than native package in some cases.
Simple install path.	Refresh behavior must be understood.
Sandbox confinement.	Permissions may surprise users or services.
Channels and revert.	Wrong channel can increase instability.

Production warning: before using Snap for a server component, define refresh policy, rollback, monitoring and service ownership.
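Because the wrong channel can increase instability, a periodic check for snaps that do not track a stable channel is a cheap control. This sketch reads `snap list` output on standard input and assumes the fourth column is "Tracking"; the function name `snaps_off_stable` is hypothetical.

```shell
# Read `snap list` output on stdin and print snaps whose tracking channel
# has no "stable" risk level. A "-" in the Tracking column means the snap
# follows no channel (for example a locally installed one) and is skipped.
snaps_off_stable() {
    awk 'NR > 1 && $4 != "-" {
        n = split($4, part, "/")
        stable = 0
        for (i = 1; i <= n; i++) {
            if (part[i] == "stable") { stable = 1 }
        }
        if (!stable) { print $1 "\t" $4 }
    }'
}
```

Usage is `snap list | snaps_off_stable`. An empty result means every channel-tracked snap is on a stable risk level.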

Flatpak: desktop application distribution and Flathub ecosystem

Flatpak is a cross-distribution packaging system often used for desktop applications. Applications run with a sandbox model and rely on runtimes. Flatpak is especially common when users want recent desktop applications independently from the system package version.

Flatpak concept	Meaning	Operational note
Remote	Package source.	Flathub is the common public remote.
Runtime	Shared dependency platform.	Required by Flatpak apps.
Application ID	Unique app identifier.	Example: `org.gimp.GIMP`.
Sandbox	Permission model.	Filesystem and device access can be restricted.
User install	Install for one user.	Useful on shared desktops.
System install	Install for all users.	Requires admin privileges.

Install Flatpak support

# Install Flatpak
                            sudo apt update
                            sudo apt install flatpak

                            # Add Flathub remote
                            flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo

                            # Search app
                            flatpak search gimp

                            # Install app
                            flatpak install flathub org.gimp.GIMP

                            # Run app
                            flatpak run org.gimp.GIMP

Flatpak operations

# List installed apps
                            flatpak list

                            # List remotes
                            flatpak remotes

                            # Update apps
                            flatpak update

                            # Show app info
                            flatpak info org.gimp.GIMP

                            # Uninstall app
                            flatpak uninstall org.gimp.GIMP

                            # Remove unused runtimes
                            flatpak uninstall --unused

                            # Show app permissions
                            flatpak info --show-permissions org.gimp.GIMP

Flatpak fit

Context	Flatpak fit	Comment
Desktop apps	Strong.	Especially when recent versions matter.
Server daemons	Weak.	APT or vendor repo is usually better.
Developer workstation	Good.	Useful for GUI tools.
Production fleet	Limited.	Needs desktop app governance.

Flatpak vs Snap mental model

Snap:
                            - integrated by default on Ubuntu
                            - used for desktop apps and selected system tools
                            - has channels and refresh behavior

                            Flatpak:
                            - popular for cross-distro desktop apps
                            - commonly uses Flathub
                            - strong desktop application ecosystem
                            - often installed separately on Ubuntu

Flatpak rule: good for desktop applications, not the default path for production server packages.

PPA repositories: newer software versions and controlled exceptions

A PPA (Personal Package Archive) is a third-party APT repository hosted on Launchpad. PPAs are useful when the official Ubuntu repository does not provide the needed version, but each one is a trust decision: adding a PPA can change package candidates, dependencies and upgrade behavior.

PPA use case	Good reason?	Production caution
Need newer desktop app	Sometimes.	Check maintainer and update history.
Need newer dev tool	Sometimes.	Prefer official vendor repo when available.
Need critical server package	Rarely.	Use official Ubuntu or vendor repo if possible.
Random tutorial says add PPA	No.	Understand why before adding.
Temporary test machine	Acceptable.	Disposable environment lowers risk.

PPA commands

# Install helper if needed
                            sudo apt install software-properties-common

                            # Add PPA
                            sudo add-apt-repository ppa:owner/name

                            # Update metadata
                            sudo apt update

                            # Install package
                            sudo apt install package-name

                            # Show package origin and candidate
                            apt policy package-name

                            # Remove PPA source
                            sudo add-apt-repository --remove ppa:owner/name

PPA governance flow

Need a PPA?
                            │
                            ├── Is package available in official Ubuntu repo?
                            │       ├── yes -> prefer official repo
                            │       └── no
                            │
                            ├── Is there an official vendor repository?
                            │       ├── yes -> prefer vendor repo
                            │       └── no
                            │
                            ├── Is PPA trusted and maintained?
                            │       ├── no -> reject
                            │       └── yes
                            │
                            ├── Is this production?
                            │       ├── yes -> document and test in staging
                            │       └── no -> acceptable for lab if understood
                            │
                            └── Add with owner, reason and review date

Inspect repository sources

# Source list files
                            ls -lah /etc/apt/sources.list.d/

                            # Search active source entries (one-line .list and deb822 .sources formats)
                            grep -RE "^(deb|URIs:)" /etc/apt/sources.list /etc/apt/sources.list.d/

                            # Show package candidate and priorities
                            apt policy package-name

                            # Show all versions
                            apt-cache madison package-name

                            # Recent repository changes
                            sudo find /etc/apt -type f -mtime -30 -ls

PPA risk matrix

Risk	Cause	Control
Wrong package version selected	PPA has higher candidate version.	Check `apt policy`.
Upgrade conflict	PPA dependencies diverge.	Test in staging.
Abandoned package	Maintainer stops updates.	Review regularly.
Supply-chain concern	Untrusted publisher.	Prefer official source.

PPA rule: a PPA is not just a package. It is a new repository that may influence dependency resolution across the system.
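One way to limit that influence is an APT pin that ranks the PPA below the Ubuntu archive, so PPA packages are chosen only when nothing else provides them. This is a sketch under assumptions: the function name `write_ppa_pin` is hypothetical, and the origin label `LP-PPA-owner-name` is the common Launchpad pattern, which should be confirmed against the `o=` value shown by `apt policy` before relying on it.

```shell
# Write an APT pin that ranks every package from one PPA below the default
# priority (500), so the Ubuntu archive wins when both provide a package.
# $1 = Launchpad owner, $2 = PPA name, $3 = target dir (default: /etc/apt/preferences.d)
# ASSUMPTION: the PPA's origin label is LP-PPA-<owner>-<name>; verify with `apt policy`.
write_ppa_pin() (
    owner=$1
    name=$2
    dir=${3:-/etc/apt/preferences.d}
    printf 'Package: *\nPin: release o=LP-PPA-%s-%s\nPin-Priority: 400\n' \
        "$owner" "$name" > "$dir/ppa-$owner-$name.pref"
)
```

After writing the file (from a root shell when targeting `/etc/apt/preferences.d`), run `sudo apt update` and check `apt policy package-name` to see whether the candidate version changed as intended.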

DEB vs Snap vs Flatpak vs PPA: practical comparison

Criterion	DEB / APT	Snap	Flatpak	PPA
Best target	Server and system packages.	Desktop apps and selected tools.	Desktop apps.	Newer APT packages.
Dependency model	System dependencies.	Bundled dependencies.	Runtimes and bundled app parts.	APT dependencies from repo.
Update model	APT updates.	Snap refresh.	Flatpak update.	APT updates from PPA.
Sandboxing	Usually no app sandbox.	Confinement model.	Sandbox model.	Same as APT package.
Production servers	Best default.	Case-by-case.	Usually no.	Exception only.
Desktop apps	Good.	Good.	Good.	Sometimes.
Governance complexity	Medium.	Medium.	Medium.	High if unmanaged.

Choice diagram

Choose package format
                            │
                            ├── Is this a production server dependency?
                            │       ├── yes -> APT / DEB / official vendor repo
                            │       └── no
                            │
                            ├── Is this a desktop GUI app?
                            │       ├── yes -> App Center, Snap or Flatpak
                            │       └── no
                            │
                            ├── Do you need newest upstream version?
                            │       ├── yes -> official vendor repo first
                            │       ├── then PPA if trusted
                            │       └── document exception
                            │
                            ├── Do you need sandboxed desktop app?
                            │       ├── yes -> Snap or Flatpak
                            │       └── no
                            │
                            └── Need reproducible fleet?
                                    └── automate APT and pin sources

Use-case recommendations

Use case	Preferred option	Reason
Nginx on server	APT.	Native service integration.
PostgreSQL production	Ubuntu repo or official PostgreSQL repo.	Clear lifecycle and updates.
Docker Engine	Official Docker repo or Ubuntu package by policy.	Version and support clarity.
Desktop editor	App Center, Snap, DEB or vendor repo.	Depends on vendor support.
Graphic design app	Flatpak or Snap often acceptable.	Desktop app freshness.

Simple rule: server software should be boring and governed. Desktop software can be more flexible if publisher and permissions are understood.

Security, provenance and update governance

Software installation is a supply-chain decision. Every package source can install code with user or system privileges. Good governance means knowing where software comes from, how it updates, who maintains it and how to roll back when it breaks.

Risk	Example	Control
Untrusted publisher	Random DEB or PPA.	Use official source or trusted vendor.
Unexpected updates	Snap refresh, PPA version change.	Control channels, windows and policy.
Dependency conflict	PPA overrides Ubuntu package.	Check `apt policy` and pin if needed.
Abandoned package	No security patches.	Review source health.
Secret exposure	Install script writes credentials.	Inspect scripts, avoid long-lived secrets.
No rollback path	Manual install with no version record.	Document package version and source.

Pre-install security checklist

[ ] Is the source official?
                            [ ] Is the publisher trusted?
                            [ ] Is the package maintained?
                            [ ] Is the update mechanism known?
                            [ ] Is the package type known?
                            [ ] Is the installation reversible?
                            [ ] Is the version documented?
                            [ ] Does it add a repository?
                            [ ] Does it add a signing key?
                            [ ] Does it run a script as root?
                            [ ] Does it request sensitive permissions?
                            [ ] Is it approved for production?

Install script warning pattern

Risky pattern:
                            curl https://example.com/install.sh | sudo bash

                            Safer pattern:
                            1. Download script
                            2. Inspect script
                            3. Verify source
                            4. Verify checksum or signature if available
                            5. Run intentionally
                            6. Record package source
                            7. Test in staging first

Repository audit commands

# Show APT sources (one-line .list and deb822 .sources formats)
                            grep -RE "^(deb|URIs:)" /etc/apt/sources.list /etc/apt/sources.list.d/

                            # List source files
                            ls -lah /etc/apt/sources.list.d/

                            # Show package origin
                            apt policy package-name

                            # Show installed Snap packages
                            snap list

                            # Show Flatpak remotes and apps
                            flatpak remotes
                            flatpak list

                            # Show recent package operations
                            less /var/log/apt/history.log

Production software governance

Governance record:
                            - package name
                            - package type
                            - source repository
                            - publisher
                            - installed version
                            - update policy
                            - rollback method
                            - owner
                            - reason
                            - review date
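The record above can be kept as a simple tab-separated log that scripts append to whenever software is installed. This is a minimal sketch: the function name `record_package`, the log location and the field order are illustrative choices, not an established format.

```shell
# Append one governance record as a tab-separated line, prefixed with a UTC timestamp.
# $1 = log file, then exactly 10 fields in this order:
# name type source publisher version update_policy rollback owner reason review_date
record_package() (
    log=$1
    shift
    [ "$#" -eq 10 ] || { echo "expected 10 fields, got $#" >&2; exit 1; }
    line=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    for field in "$@"; do
        line=$(printf '%s\t%s' "$line" "$field")
    done
    printf '%s\n' "$line" >> "$log"
)
```

Requiring all ten fields forces the owner, reason and review date to be written down at install time rather than reconstructed later.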

Supply-chain rule: package installation is code execution. Treat unknown packages as a security risk, not as a convenience.

Troubleshooting software installation and updates

Symptom	Likely cause	First command	Fix direction
APT lock error	Another apt/dpkg process running.	`ps aux | grep -E 'apt|dpkg'`	Wait or investigate process.
Broken packages	Interrupted install or dependency conflict.	`sudo dpkg --configure -a`	Repair dpkg and dependencies.
Repository signature error	Missing or wrong signing key.	`sudo apt update`	Fix keyring or remove repo.
Package version unexpected	PPA or vendor repo changes candidate.	`apt policy package-name`	Pin, remove repo or choose version.
Snap app cannot access file	Confinement or interface issue.	`snap connections app`	Connect interface or adjust path.
Flatpak app missing permission	Sandbox permission.	`flatpak info --show-permissions app`	Adjust permission intentionally.

APT repair commands

# Repair interrupted package configuration
                            sudo dpkg --configure -a

                            # Fix broken dependencies
                            sudo apt -f install

                            # Refresh metadata
                            sudo apt update

                            # Clean package cache
                            sudo apt clean

                            # Check holds
                            apt-mark showhold

                            # Review package history
                            less /var/log/apt/history.log
                            less /var/log/apt/term.log

Package troubleshooting decision tree

Software install failed
                            │
                            ├── Read exact error
                            │
                            ├── APT lock?
                            │       └── check apt/dpkg process
                            │
                            ├── DNS or network?
                            │       └── resolvectl, dig, curl
                            │
                            ├── Signature or key?
                            │       └── inspect source and keyring
                            │
                            ├── Dependency conflict?
                            │       └── apt policy, apt -f install, holds
                            │
                            ├── PPA conflict?
                            │       └── disable source, apt update
                            │
                            ├── Snap confinement?
                            │       └── snap connections
                            │
                            └── Flatpak permission?
                                    └── flatpak info --show-permissions

Disable source temporarily

# Disable a repository source file
                            sudo mv /etc/apt/sources.list.d/vendor.list \
                            /etc/apt/sources.list.d/vendor.list.disabled

                            # Refresh metadata
                            sudo apt update

                            # Check package candidate again
                            apt policy package-name
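The rename works because APT only reads source files ending in `.list` or `.sources`. A small wrapper makes the change reversible with one command. This is a sketch; the function name `toggle_apt_source` is hypothetical, and `vendor.list` stands for whatever source file is under suspicion.

```shell
# Disable or re-enable an APT source file by renaming it.
# $1 = action (disable|enable), $2 = file name, $3 = directory (default below)
toggle_apt_source() (
    action=$1
    name=$2
    dir=${3:-/etc/apt/sources.list.d}
    case "$action" in
        disable) mv -- "$dir/$name" "$dir/$name.disabled" ;;
        enable)  mv -- "$dir/$name.disabled" "$dir/$name" ;;
        *) echo "usage: toggle_apt_source disable|enable FILE [DIR]" >&2; exit 2 ;;
    esac
)
```

Run it from a root shell when working on the real directory, then follow with `apt update` and `apt policy package-name` as in the commands above. Re-enable the source once the test is done so the package does not silently lose its update path.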

Snap and Flatpak diagnostics

# Snap
                            snap list
                            snap info package-name
                            snap changes
                            snap connections package-name
                            snap logs package-name

                            # Flatpak
                            flatpak list
                            flatpak remotes
                            flatpak info app-id
                            flatpak info --show-permissions app-id
                            flatpak update

Troubleshooting rule: package problems usually have a precise error. Read it before applying repair commands.

Final checklist and command cheat sheet

Software management checklist

[ ] Package type is understood
                            [ ] Source is trusted
                            [ ] Publisher is verified
                            [ ] Update mechanism is known
                            [ ] Rollback path exists
                            [ ] Repository additions are documented
                            [ ] PPAs are justified
                            [ ] Vendor repos are preferred over random PPAs
                            [ ] Local DEB files are avoided unless necessary
                            [ ] Snap refresh behavior is understood
                            [ ] Flatpak remotes are known
                            [ ] Production servers use governed sources
                            [ ] Package changes are traceable
                            [ ] Staging test exists for critical software
                            [ ] Security updates are planned

APT / DEB cheat sheet

sudo apt update
                            sudo apt install package-name
                            sudo apt install ./package.deb
                            sudo apt remove package-name
                            sudo apt purge package-name
                            sudo apt upgrade
                            apt search package-name
                            apt show package-name
                            apt policy package-name
                            dpkg -l | grep package-name
                            dpkg -L package-name
                            dpkg -S /path/to/file
                            less /var/log/apt/history.log

Snap / Flatpak / PPA cheat sheet

# Snap
                            snap list
                            snap find package-name
                            snap info package-name
                            sudo snap install package-name
                            sudo snap refresh
                            snap refresh --time
                            sudo snap remove package-name

                            # Flatpak
                            flatpak remotes
                            flatpak search package-name
                            flatpak install flathub app-id
                            flatpak run app-id
                            flatpak update
                            flatpak uninstall app-id

                            # PPA
                            sudo apt install software-properties-common
                            sudo add-apt-repository ppa:owner/name
                            sudo apt update
                            apt policy package-name
                            sudo add-apt-repository --remove ppa:owner/name

Final rule

Ubuntu software management is source governance.
Installing software means trusting a publisher, an update channel and a dependency chain. Use App Center for simple desktop workflows, APT and DEB for professional server management, Snap or Flatpak for selected desktop/application cases, and PPAs only as controlled exceptions.

Production default

Production software default:
                            - Ubuntu LTS
                            - official Ubuntu repositories
                            - official vendor repositories when needed
                            - no random PPAs
                            - no unknown DEB downloads
                            - package baseline automated
                            - update policy documented
                            - rollback path tested
                            - package history reviewed after changes

1.2 Ubuntu Versions & Release Cycle: LTS, interim, support, upgrade strategy and production choice

Ubuntu release model

Ubuntu follows a predictable release model with two main families: LTS releases and interim releases. LTS means Long-Term Support and is the default choice for production systems. Interim releases provide newer software faster, but with a much shorter support window.

In professional environments, version choice is not cosmetic. It impacts security patching, kernel behavior, package versions, compatibility, cloud images, automation, compliance, upgrade windows and rollback strategy.

Release type	Typical cadence	Support model	Best usage
LTS	Every 2 years	Five years of standard security maintenance, production-oriented.	Servers, cloud, enterprise, databases, Kubernetes nodes.
Interim	Every 6 months, between LTS releases	Nine months of support.	Testing, newer kernels, recent desktop features, short-lived dev systems.
Point release	LTS refresh images	Updated installer media for the same LTS family.	Fresh installs with fewer post-install updates.
ESM / Ubuntu Pro	After standard support or for broader package coverage	Extended security maintenance model.	Long-lived enterprise systems that cannot upgrade quickly.

Simple rule: for production, choose an Ubuntu LTS baseline unless there is a very specific reason to use an interim release.

Release cycle mental diagram

Ubuntu release cycle
                            │
                            ├── LTS release
                            │       ├── stable baseline
                            │       ├── long security maintenance
                            │       ├── enterprise-friendly
                            │       ├── common cloud image
                            │       └── recommended for production
                            │
                            ├── Interim release
                            │       ├── newer kernel
                            │       ├── newer user-space
                            │       ├── shorter support
                            │       ├── useful for testing
                            │       └── requires upgrade discipline
                            │
                            ├── Point release
                            │       ├── refreshed installer image
                            │       ├── accumulated updates
                            │       └── useful for new deployments
                            │
                            └── ESM / extended support
                                    ├── longer security coverage
                                    ├── used when upgrade is delayed
                                    └── enterprise lifecycle tool

What version choice affects

Version choice affects:
                            - kernel version
                            - driver support
                            - OpenSSL version
                            - Python / PHP / Node packages
                            - systemd behavior
                            - Netplan / network stack
                            - cloud-init behavior
                            - container runtime support
                            - security patch horizon
                            - application certification
                            - upgrade planning
                            - operational risk

LTS vs interim: practical comparison

Criterion	LTS	Interim
Primary goal	Stability and long-term operation.	Newer features and faster evolution.
Production fit	Excellent default choice.	Only if justified and actively managed.
Security maintenance	Long support window.	Short support window.
Kernel freshness	Stable, sometimes less recent.	More recent.
Package freshness	Conservative.	Newer versions.
Operational burden	Lower.	Higher, because upgrades are frequent.
Cloud image standardization	Excellent.	Less common for long-lived fleets.
Best for	Servers, DBs, APIs, CI runners, cloud nodes.	Labs, test machines, recent hardware, feature validation.

Decision shortcut

Choose LTS when:
                            - server is production
                            - database is production
                            - uptime matters
                            - patching must be predictable
                            - infrastructure must be standardized
                            - cloud images are reused
                            - upgrade windows are rare
                            - compliance matters

                            Choose interim when:
                            - testing newer kernel
                            - testing new desktop stack
                            - testing new hardware support
                            - environment is disposable
                            - upgrade cadence is accepted
                            - production risk is low

Common professional rule

LTS-first policy: in enterprise, servers usually standardize on one LTS version. Interim releases are exceptions that must be justified, documented and upgraded before support ends.

Bad decision examples

Bad:
                            - installing an interim release on a long-lived database server
                            - using different Ubuntu versions randomly across servers
                            - upgrading production without staging validation
                            - ignoring end-of-support dates
                            - choosing a release because it is "newer" only

                            Better:
                            - define one production LTS baseline
                            - define patch cadence
                            - define upgrade window
                            - keep rollback images
                            - document exceptions

Current release examples and how to read them

Ubuntu versions use a year.month format. For example, 24.04 means a release from April 2024. LTS releases are usually April releases in even-numbered years. Point releases such as 24.04.4 are refreshed installation images for the same LTS family.

Example	Meaning	Use case	What to remember
24.04 LTS	Noble Numbat LTS family.	Production baseline, servers, cloud.	Long-term support release.
24.04.x LTS	Point release inside the 24.04 LTS family.	Fresh install image with accumulated updates.	Still same LTS generation.
25.10	Interim release.	Short-lived dev/test or recent features.	Requires faster upgrade planning.
22.04 LTS	Previous LTS generation.	Existing production fleets.	Plan migration before support constraints become urgent.
20.04 LTS	Older LTS generation, past standard support.	Legacy systems.	Requires ESM via Ubuntu Pro or a migration plan.

Practical reading: 24.04.4 LTS means the 4th point-release image of Ubuntu 24.04 LTS, not a completely different major OS generation.

Version naming pattern

Ubuntu version format:
                            YY.MM

                            Examples:
                            22.04 = April 2022
                            24.04 = April 2024
                            25.10 = October 2025

                            LTS examples:
                            20.04 LTS
                            22.04 LTS
                            24.04 LTS

                            Point release examples:
                            22.04.5 LTS
                            24.04.3 LTS
                            24.04.4 LTS

                            Meaning:
                            major LTS family + refreshed installer media
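
The naming pattern above can be sketched as a small shell function. This is a minimal illustration, not an official tool: `describe_ubuntu_version` is a hypothetical name, it only accepts two-digit years and single-digit point releases, and the LTS guess is the naming heuristic from this section (April release, even-numbered year), so always confirm against the official release list.

```shell
# Hypothetical helper: split an Ubuntu version string (YY.MM or YY.MM.P).
describe_ubuntu_version() {
    local version="$1" yy rest mm point kind
    case "$version" in
        [0-9][0-9].[0-9][0-9])       point="none" ;;
        [0-9][0-9].[0-9][0-9].[0-9]) point="${version##*.}" ;;
        *) echo "unrecognized version: $version" >&2; return 1 ;;
    esac
    yy="${version%%.*}"     # text before the first dot (year)
    rest="${version#*.}"    # text after the first dot
    mm="${rest%%.*}"        # month part
    # Naming heuristic only: April releases in even-numbered years are usually LTS.
    if [ "$mm" = "04" ] && [ $(( ${yy#0} % 2 )) -eq 0 ]; then
        kind="LTS"
    else
        kind="interim"
    fi
    echo "year=20$yy month=$mm point=$point kind=$kind (by naming heuristic)"
}

describe_ubuntu_version 24.04.4   # year=2024 month=04 point=4 kind=LTS (by naming heuristic)
describe_ubuntu_version 25.10     # year=2025 month=10 point=none kind=interim (by naming heuristic)
```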

Release interpretation flow

See a version number
                            │
                            ▼
                            Is it marked LTS?
                            ├── yes
                            │   ├── good production candidate
                            │   ├── check standard support date
                            │   └── check Pro/ESM if long-lived
                            │
                            └── no
                                ├── interim release
                                ├── check short support date
                                └── use mainly for dev/test unless justified
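
The flow above can be applied to a real host by reading `/etc/os-release`, where LTS releases carry "LTS" in the `VERSION` field. The sketch below assumes that convention; `classify_release` is a hypothetical helper name, and the optional file argument exists only so the function can be tried against a sample file.

```shell
# Hypothetical helper: classify a host from an os-release style file.
classify_release() {
    local file="${1:-/etc/os-release}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # Source the file in a subshell so its variables do not leak into the caller.
    (
        . "$file"
        case "${VERSION:-}" in
            *LTS*) echo "${NAME:-unknown} ${VERSION_ID:-?}: LTS, check standard support date and Pro/ESM" ;;
            *)     echo "${NAME:-unknown} ${VERSION_ID:-?}: not marked LTS, check short support date" ;;
        esac
    )
}

# Classify the current host when the file is available.
if [ -r /etc/os-release ]; then
    classify_release
fi
```

The result only tells you which branch of the flow applies; the actual support dates still come from the official release cycle page.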

Useful official sources

Ubuntu release cycle:
                            https://ubuntu.com/about/release-cycle

                            Ubuntu releases:
                            https://releases.ubuntu.com/

                            Ubuntu release list:
                            https://documentation.ubuntu.com/project/release-team/list-of-releases/

                            Ubuntu release notes:
                            https://documentation.ubuntu.com/release-notes/

Support timeline: standard support, ESM and end of life

Support lifecycle matters because an unsupported server becomes a security and compliance risk. Once a release is out of standard support, teams must either upgrade, use an extended maintenance option if available, or retire the system.

Lifecycle phase	Meaning	Operational action
Active standard support	Normal security and maintenance updates.	Patch regularly, monitor advisories.
Point release phase	Refreshed install media for LTS family.	Use latest point image for new servers.
Approaching end of standard support	Upgrade planning becomes urgent.	Inventory, staging test, migration window.
ESM / extended maintenance	Extended security coverage for supported scenarios.	Use as a controlled bridge, not as an excuse to avoid upgrades forever.
End of life	No normal support path for that release.	Upgrade, isolate, replace or retire.

Support responsibility map

Operating system lifecycle
                            │
                            ├── security updates
                            ├── kernel updates
                            ├── package patches
                            ├── repository availability
                            ├── vendor support
                            └── compliance status

                            Operations team responsibility
                            │
                            ├── know release version
                            ├── know support end date
                            ├── patch regularly
                            ├── plan reboots
                            ├── test upgrades
                            └── avoid unsupported servers

Timeline diagram

LTS release
                            │
                            ├── Year 0
                            │       └── release becomes production candidate
                            │
                            ├── Years 0-5
                            │       ├── standard security maintenance
                            │       ├── point releases
                            │       ├── cloud images maintained
                            │       └── normal production usage
                            │
                            ├── After standard support
                            │       ├── upgrade recommended
                            │       └── ESM / Ubuntu Pro may be used
                            │
                            └── Long-lived legacy phase
                                    ├── higher operational risk
                                    ├── stronger justification required
                                    └── migration plan should exist

Operational policy example

Company Ubuntu policy:
                            - production servers use LTS only
                            - new projects use current LTS point image
                            - old LTS versions are reviewed quarterly
                            - unsupported releases are forbidden
                            - interim releases require architecture approval
                            - upgrade tests must pass in staging
                            - rollback image must exist
                            - patching window is monthly
                            - emergency CVE patching is immediate

Risk: an out-of-support OS may continue to run, but it becomes harder to patch, audit, insure, certify and defend during incidents.
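
Tracking support end dates is easier when the remaining time is computed rather than remembered. The sketch below is one possible approach: `days_until` is a hypothetical helper, it relies on GNU `date` (the version shipped with Ubuntu), and the dates in the example are placeholders, not real end-of-support dates.

```shell
# Hypothetical helper: whole days from TODAY (default: now, UTC) until END_DATE.
# Both dates use the YYYY-MM-DD form; requires GNU date for the -d option.
days_until() {
    local end_date="$1" today="${2:-$(date -u +%Y-%m-%d)}" end_s now_s
    end_s="$(date -u -d "$end_date" +%s)" || return 1
    now_s="$(date -u -d "$today" +%s)" || return 1
    echo $(( (end_s - now_s) / 86400 ))
}

# Placeholder dates for illustration only; take real dates from the
# official release cycle page before using this in a report.
days_until 2030-06-30 2030-06-01   # prints 29
```

A negative result means the date has already passed, which is the signal to move the server into the upgrade, ESM or retirement path described above.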

How to choose the right Ubuntu version

Context	Recommended choice	Reason
Production web server	Latest stable LTS point release.	Security support, standardization, predictable patching.
Database server	LTS only.	Data systems need stability and tested upgrade windows.
Kubernetes node	LTS supported by your Kubernetes distribution.	Kernel, container runtime and vendor compatibility.
CI runner	LTS by default.	Reproducible builds and stable toolchains.
Developer workstation	LTS for stability, interim for recent desktop features.	Depends on tolerance for upgrades.
Recent hardware	LTS with HWE kernel or interim if required.	Driver and kernel support may matter.
Short-lived lab	Interim can be acceptable.	Easy to rebuild if support ends.

Production decision matrix

Production workload?
                            ├── yes -> LTS
                            └── no
                                │
                                ▼
                                Long-lived machine?
                                ├── yes -> LTS
                                └── no
                                    │
                                    ▼
                                    Need newest kernel/userspace?
                                    ├── yes -> interim or LTS HWE
                                    └── no -> LTS

                            Compliance or security audit?
                            └── LTS + documented patch policy
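
The decision matrix above is small enough to encode directly, which can be handy in provisioning scripts or review checklists. This is a sketch under the assumption that each question is answered with a literal `yes` or `no`; `recommend_release` is a hypothetical name.

```shell
# Hypothetical helper: apply the decision matrix to three yes/no answers.
# Arguments: production workload? long-lived machine? needs newest kernel/userspace?
recommend_release() {
    local production="$1" long_lived="$2" needs_newest="$3"
    if [ "$production" = "yes" ]; then
        echo "LTS"
    elif [ "$long_lived" = "yes" ]; then
        echo "LTS"
    elif [ "$needs_newest" = "yes" ]; then
        echo "interim or LTS HWE"
    else
        echo "LTS"
    fi
}

recommend_release yes no no    # LTS
recommend_release no no yes    # interim or LTS HWE
```

Note that only one combination leads away from plain LTS, which matches the LTS-first policy described earlier.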

Version choice scoring

Question	If yes	Impact
Will this server live more than 12 months?	Choose LTS.	Reduces upgrade pressure.
Does it host production data?	Choose LTS.	Stability matters more than novelty.
Is it part of a fleet?	Standardize on one LTS.	Improves automation and support.
Does hardware need a newer kernel?	Evaluate HWE or interim.	Driver support may override default.
Is it disposable?	Interim is acceptable.	Lower lifecycle risk.

Practical production baseline: for new servers, use the current LTS generation and the latest point-release image unless a compatibility requirement forces another choice.

Upgrade strategy: from one Ubuntu generation to another

Ubuntu upgrades should be treated as infrastructure changes, not casual package updates. A release upgrade may change kernel, libraries, system services, defaults, packages, Python versions, OpenSSL behavior, firewall tooling or network configuration.

Safe upgrade process

1. Inventory
                            - server role
                            - Ubuntu version
                            - kernel version
                            - installed packages
                            - services
                            - external repositories

                            2. Prepare
                            - backup data
                            - snapshot VM
                            - export configs
                            - check disk space
                            - review release notes

                            3. Test
                            - clone staging
                            - upgrade staging
                            - run application tests
                            - validate logs and services

                            4. Execute
                            - schedule maintenance
                            - stop risky jobs
                            - upgrade
                            - reboot
                            - validate services

                            5. Verify
                            - application health
                            - network ports
                            - logs
                            - performance
                            - monitoring

                            6. Rollback if needed
                            - restore snapshot
                            - restore old image
                            - revert DNS or load balancer
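
Step 1 (Inventory) can be captured as files, so the state before the upgrade can be compared with the state after it. The following is a minimal sketch, assuming a hypothetical `collect_inventory` helper and an output directory of your choice; each section is skipped when the corresponding tool is not present.

```shell
# Hypothetical helper: write a pre-upgrade inventory into OUTPUT_DIR.
collect_inventory() {
    local out="$1"
    if [ -z "$out" ]; then
        echo "usage: collect_inventory OUTPUT_DIR" >&2
        return 1
    fi
    mkdir -p "$out" || return 1

    # Ubuntu version and kernel version
    if [ -r /etc/os-release ]; then
        cp /etc/os-release "$out/os-release.txt"
    fi
    uname -a > "$out/kernel.txt"

    # Installed packages (only where dpkg is available)
    if command -v dpkg-query >/dev/null 2>&1; then
        dpkg-query -W -f='${Package} ${Version}\n' > "$out/packages.txt" || true
    fi

    # Running services (only on systemd hosts)
    if command -v systemctl >/dev/null 2>&1; then
        systemctl list-units --type=service --state=running --no-pager \
            > "$out/services.txt" 2>/dev/null || true
    fi

    # External repositories are usually declared under sources.list.d
    if [ -d /etc/apt/sources.list.d ]; then
        ls -1 /etc/apt/sources.list.d > "$out/apt-sources.txt"
    fi

    echo "inventory written to $out"
}

collect_inventory "$(mktemp -d)/inventory"
```

Keeping this directory next to the snapshot or backup gives the Verify and Rollback steps a concrete reference point.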

Upgrade architecture

Current production server
                            │
                            ├── snapshot / AMI / backup
                            ├── package inventory
                            ├── config export
                            └── staging clone
                            │
                            ▼
                            Staging upgrade
                            │
                            ├── do-release-upgrade
                            ├── reboot
                            ├── service validation
                            ├── application tests
                            └── performance checks
                            │
                            ▼
                            Production rollout
                            │
                            ├── one node first
                            ├── monitor
                            ├── continue rollout
                            └── keep rollback window

Blue/green alternative

Instead of in-place upgrade:

                            1. Build new Ubuntu LTS image
                            2. Install application stack
                            3. Restore or connect data
                            4. Run smoke tests
                            5. Attach to load balancer
                            6. Shift traffic gradually
                            7. Keep old server as rollback
                            8. Retire old server after validation

                            Often safer for:
                            - web apps
                            - stateless APIs
                            - container hosts
                            - cloud workloads

Production rule: blue/green replacement is often safer than in-place upgrade for application servers. In-place upgrade is more common for carefully controlled single-server or legacy systems.

Commands to identify version, support and upgrade state

Version inspection

# Ubuntu version
                            lsb_release -a

                            # OS release metadata
                            cat /etc/os-release

                            # Kernel version
                            uname -a

                            # Host and OS summary
                            hostnamectl

                            # Architecture
                            dpkg --print-architecture

                            # Check codename only
                            lsb_release -cs

Package maintenance

# Refresh package indexes
                            sudo apt update

                            # Show upgradeable packages
                            apt list --upgradable

                            # Upgrade installed packages
                            sudo apt upgrade

                            # Full upgrade with dependency changes
                            sudo apt full-upgrade

                            # Remove unused packages
                            sudo apt autoremove

                            # Check held packages
                            apt-mark showhold

Reboot and upgrade readiness

# Check if reboot is required
                            test -f /var/run/reboot-required && cat /var/run/reboot-required

                            # See packages requiring reboot if available
                            cat /var/run/reboot-required.pkgs 2>/dev/null

                            # Check disk space before upgrades
                            df -h

                            # Check for running package manager processes (they hold the apt/dpkg locks)
                            pgrep -a 'apt|dpkg'

                            # Repair interrupted package operation
                            sudo dpkg --configure -a
                            sudo apt -f install
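
The readiness checks above can be combined into a short pre-flight script. This is a sketch only: the function names are hypothetical and the 5120 MiB threshold is an arbitrary example value, not an official requirement.

```shell
# Free space in MiB on the filesystem that holds the given path.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 }'
}

# Succeeds when at least MIN_MB MiB are free on the filesystem holding PATH.
has_free_space() {
    local path="$1" min_mb="$2" avail
    avail="$(free_mb "$path")" || return 1
    [ "$avail" -ge "$min_mb" ]
}

# Succeeds when Ubuntu has flagged that a reboot is required.
reboot_pending() {
    [ -f /var/run/reboot-required ]
}

# Example run; 5120 MiB is a placeholder threshold.
if has_free_space / 5120; then
    echo "disk: ok"
else
    echo "disk: less than 5120 MiB free on /"
fi

if reboot_pending; then
    echo "reboot: pending, reboot before upgrading"
else
    echo "reboot: none pending"
fi
```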

Release upgrade

# Install release upgrade tool if missing
                            sudo apt install update-manager-core

                            # Check release upgrader configuration
                            cat /etc/update-manager/release-upgrades

                            # Start release upgrade (local console)
                            sudo do-release-upgrade

                            # Over SSH: run the same upgrade inside screen so a dropped connection does not interrupt it
                            sudo apt install screen
                            screen -S upgrade
                            sudo do-release-upgrade

Warning: do not run a production release upgrade without backup, snapshot, staging test, disk-space check and rollback plan.
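
The `Prompt=` line in `/etc/update-manager/release-upgrades` controls which releases the upgrader offers: `lts` (only LTS to LTS), `normal` (any newer supported release) or `never`. The sketch below reads that value; `upgrade_prompt_policy` is a hypothetical helper and the sample file is created only for demonstration.

```shell
# Hypothetical helper: print the Prompt= policy from a release-upgrades file.
upgrade_prompt_policy() {
    local file="${1:-/etc/update-manager/release-upgrades}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # Keep the last uncommented Prompt= line if several are present.
    sed -n 's/^Prompt=[[:space:]]*//p' "$file" | tail -n 1
}

# Demonstration against a sample file with the same layout as the real one.
sample="$(mktemp)"
printf '[DEFAULT]\nPrompt=lts\n' > "$sample"
upgrade_prompt_policy "$sample"   # prints lts
```

On production servers, `Prompt=lts` matches the LTS-first policy. Running `do-release-upgrade -c` reports whether a new release is available without starting the upgrade.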

Ubuntu versions in cloud images, AMIs and automation

In cloud environments, Ubuntu versioning becomes part of your infrastructure standard. Teams usually define a base image: Ubuntu LTS version, packages, users, SSH hardening, monitoring agent, logging agent, cloud-init behavior and security baseline.

Cloud concept	Ubuntu version impact	Best practice
AMI / image	Defines OS baseline and package versions.	Use approved LTS image family.
cloud-init	Bootstraps users, packages and config.	Test with target LTS version.
Terraform	References image IDs or filters.	Avoid unpinned surprise changes in production.
Golden image	Pre-baked hardened server template.	Rebuild regularly with patches.
Autoscaling	New nodes inherit image baseline.	Validate image before rollout.
Patch management	Images age quickly if not rebuilt.	Rebuild and replace, not only patch in place.

Cloud image lifecycle

Official Ubuntu LTS cloud image
                            │
                            ▼
                            Golden image pipeline
                            │
                            ├── install baseline packages
                            ├── configure SSH
                            ├── add monitoring agent
                            ├── apply security hardening
                            ├── apply updates
                            └── run validation tests
                            │
                            ▼
                            Approved image
                            │
                            ├── used by Terraform
                            ├── used by autoscaling groups
                            ├── used by Kubernetes nodes
                            └── used by application servers
                            │
                            ▼
                            Periodic rebuild
                            ├── security patches
                            ├── config changes
                            └── new point release

Cloud version rules

Recommended:
                            - use LTS for production cloud VMs
                            - pin or control image selection
                            - rebuild images regularly
                            - test cloud-init on target release
                            - document image version
                            - keep rollback image available
                            - avoid unmanaged snowflake servers

                            Avoid:
                            - latest image without validation
                            - random Ubuntu versions across fleet
                            - old images with no patch process
                            - manual changes after boot with no automation

Cloud rule: Ubuntu version is part of the infrastructure contract. Treat it like code: pinned, reviewed, tested and rolled out progressively.

Version-related risks and anti-patterns

Anti-pattern	Risk	Correction
Using interim release for long-lived production	Support ends quickly, forced upgrade under pressure.	Use LTS for production.
No inventory of Ubuntu versions	Unsupported servers remain hidden.	Maintain fleet inventory.
Ignoring release notes	Breaking changes surprise production.	Review release notes before upgrade.
Mixing many versions randomly	Automation, debugging and support become harder.	Define approved baselines.
No rollback image	Failed upgrade becomes long outage.	Snapshot or blue/green rollout.
Third-party repositories unmanaged	Upgrade conflicts and broken packages.	Audit external apt sources.
Kernel upgrade without reboot plan	Security patch is installed but not active.	Track reboot-required state.
Old LTS kept forever	Security and compliance risk grows.	Plan migration or use ESM as a temporary bridge.
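
Several of these anti-patterns (no inventory, random versions, old LTS kept forever) become visible once the fleet inventory is checked against the approved baselines. The sketch below assumes a hypothetical plain-text inventory with one `hostname version` pair per line; `find_unapproved` is an illustrative name.

```shell
# Hypothetical helper: list hosts whose Ubuntu version is not an approved baseline.
# INVENTORY holds "hostname version" per line; APPROVED is a space-separated list.
find_unapproved() {
    local inventory="$1" approved="$2"
    awk -v approved="$approved" '
        BEGIN {
            n = split(approved, list, " ")
            for (i = 1; i <= n; i++) ok[list[i]] = 1
        }
        $1 ~ /^#/ { next }                          # skip comment lines
        NF >= 2 && !($2 in ok) { print $1, $2 }     # report unapproved hosts
    ' "$inventory"
}

# Example with made-up hosts.
fleet="$(mktemp)"
printf '# host version\nweb1 24.04\ndb1 22.04\nlab1 25.10\nold1 18.04\n' > "$fleet"
find_unapproved "$fleet" "22.04 24.04"
# lab1 25.10
# old1 18.04
```

Each reported host should either have a documented exception or appear in the migration plan.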

Version risk decision tree

Server has old Ubuntu version
                            │
                            ▼
                            Is it still in standard support?
                            ├── yes
                            │   ├── keep patched
                            │   └── plan future migration
                            │
                            └── no
                                │
                                ▼
                                Is ESM / Pro enabled and valid?
                                ├── yes
                                │   ├── use as temporary bridge
                                │   └── plan upgrade
                                │
                                └── no
                                    │
                                    ▼
                                    Risk is high
                                    ├── isolate if necessary
                                    ├── snapshot
                                    ├── test upgrade path
                                    └── migrate or retire

Upgrade failure symptoms

After upgrade, check:
                            - service fails to start
                            - port no longer listens
                            - Python or PHP version changed
                            - OpenSSL behavior changed
                            - Nginx config warning becomes fatal
                            - database extension mismatch
                            - kernel module missing
                            - firewall rule behavior changed
                            - DNS resolution changed
                            - cloud-init or network config changed

Production rule: a release upgrade is not just an apt upgrade. It is a platform change and must be tested like one.

Production checklist for Ubuntu version strategy

Version governance checklist

[ ] Approved Ubuntu LTS baseline is defined
                            [ ] Interim releases require explicit exception
                            [ ] Fleet inventory contains Ubuntu version
                            [ ] Fleet inventory contains kernel version
                            [ ] Support end dates are tracked
                            [ ] Old LTS migration plan exists
                            [ ] ESM / Pro usage is documented if used
                            [ ] Golden images are versioned
                            [ ] Cloud image selection is controlled
                            [ ] Third-party apt repositories are inventoried
                            [ ] Release notes are reviewed before upgrade
                            [ ] Staging upgrade test is mandatory
                            [ ] Rollback method is documented
                            [ ] Reboot policy exists for kernel updates
                            [ ] Patch cadence is documented

Minimum production baseline

Production Ubuntu baseline:
                            - LTS release
                            - latest approved point image
                            - security updates enabled
                            - patch window defined
                            - reboot policy defined
                            - monitored support end date
                            - standard package repositories
                            - controlled third-party repositories
                            - backup/snapshot before major upgrade
                            - staging validation before production rollout

Final decision summary

Question	Answer
What should I use for production?	Ubuntu LTS.
Should I use the latest interim release on a server?	Only for a short-lived or explicitly justified case.
Should I standardize versions?	Yes, define one or two approved LTS baselines.
Should I upgrade in place?	Only with backup, staging test and rollback plan.
Is ESM a replacement for upgrading?	No, it is usually a bridge for long-lived systems.
What matters most?	Support horizon, patching, compatibility and rollback.

Final rule

Ubuntu version strategy is production risk management.
Choose LTS for stability, track support dates, patch regularly, test upgrades in staging, keep rollback images, and never let unsupported servers become invisible infrastructure.

7.3 Ubuntu Terminal & BASH: shell, file commands, sudo, chmod, chown and operational reflexes

Why the terminal matters

The terminal is the fastest and most precise way to operate Ubuntu. It gives direct access to files, processes, services, logs, permissions, packages, networking, storage and automation. On a server, there is often no graphical interface: SSH plus terminal is the normal administration model.

BASH is the default interactive shell for user accounts on Ubuntu; system scripts that call `/bin/sh` run under dash instead. BASH lets you run commands, chain them, inspect output, redirect logs, write scripts and automate repeatable tasks. A developer who understands BASH can deploy, debug and operate systems more effectively.

Use case	Terminal advantage	Example
Server administration	Works remotely over SSH.	`ssh deploy@server`
Debugging	Direct logs and service state.	`journalctl -u nginx`
File operations	Fast navigation, copy, move, search.	`find /var/log -name "*.log"`
Automation	Repeatable scripts.	`backup.sh`, `deploy.sh`
Security	Precise control of users and permissions.	`chmod`, `chown`, `sudo`
Performance	Immediate resource inspection.	`top`, `df -h`, `free -h`

Core rule: the terminal is not “old school”; it is the professional control plane for Linux systems, servers, cloud VMs, containers and automation.

Terminal control map

Ubuntu terminal
                            │
                            ├── Files
                            │       ├── ls
                            │       ├── cd
                            │       ├── pwd
                            │       ├── cp
                            │       ├── mv
                            │       └── rm
                            │
                            ├── Text and search
                            │       ├── cat
                            │       ├── less
                            │       ├── head
                            │       ├── tail
                            │       ├── grep
                            │       └── find
                            │
                            ├── Permissions
                            │       ├── sudo
                            │       ├── chmod
                            │       ├── chown
                            │       ├── groups
                            │       └── id
                            │
                            ├── System operations
                            │       ├── systemctl
                            │       ├── journalctl
                            │       ├── apt
                            │       └── ssh
                            │
                            └── Automation
                                    ├── variables
                                    ├── pipes
                                    ├── redirects
                                    ├── loops
                                    └── scripts

Mental model

Command anatomy:
                            command [options] [arguments]

                            Examples:
                            ls -lah /var/log
                            cp -a source destination
                            rm old-file.log
                            sudo systemctl restart nginx

                            Where:
                            - command   = program to run
                            - options   = behavior modifiers
                            - arguments = files, directories, services, values

BASH basics: prompt, paths, history, completion, pipes and redirects

BASH is both an interactive shell and a scripting language. It receives commands, expands variables, resolves paths, runs programs, connects outputs to inputs and lets you automate tasks through scripts.

Concept	Meaning	Example
Prompt	Where you type commands.	`user@host:~$`
Home directory	Your personal directory.	`~`, `/home/deploy`
Current directory	Where commands operate by default.	`pwd`
Absolute path	Path from root `/`.	`/var/log/syslog`
Relative path	Path from current directory.	`../backup`
History	Previous commands.	`history`
Tab completion	Auto-complete command or path.	Press `TAB`

BASH essentials

# Show current directory
                            pwd

                            # Show current user
                            whoami

                            # Show command history
                            history

                            # Clear screen
                            clear

                            # Show login shell ($0 shows the shell currently running)
                            echo "$SHELL"

                            # Show environment variables
                            env

                            # Show PATH
                            echo $PATH

                            # Show command location
                            which bash
                            which python3
                            which nginx

Pipes and redirects

# Pipe output to another command
                            ps aux | grep nginx

                            # Redirect output to a file
                            ls -lah /var/log > files.txt

                            # Append output to a file
                            date >> audit.log

                            # Redirect output and errors to separate files
                            command > output.log 2> error.log

                            # Redirect output and errors together
                            command > all.log 2>&1

                            # View long output page by page
                            journalctl -u nginx | less
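
The difference between these redirects is easiest to see with a command that writes to both streams. The sketch below uses a hypothetical `emit_both` function in place of a real program and writes into a temporary directory.

```shell
# Hypothetical command: one line on stdout, one line on stderr.
emit_both() {
    echo "normal output"
    echo "error output" >&2
}

workdir="$(mktemp -d)"

# Separate files: stdout and stderr end up in different logs.
emit_both > "$workdir/output.log" 2> "$workdir/error.log"

# One file: stderr is sent to wherever stdout already points.
emit_both > "$workdir/all.log" 2>&1

# Order matters: here stderr is attached to the terminal first,
# so only stdout reaches the file and the error line is still displayed.
emit_both 2>&1 > "$workdir/stdout-only.log"

wc -l "$workdir"/*.log
```

Redirections are processed left to right, which is why `> file 2>&1` captures both streams while `2>&1 > file` does not.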

Useful keyboard shortcuts

Shortcut	Action
`TAB`	Complete command or filename.
`Ctrl + C`	Interrupt current command.
`Ctrl + L`	Clear screen.
`Ctrl + R`	Search command history.
`Ctrl + A`	Move to beginning of line.
`Ctrl + E`	Move to end of line.

Efficiency rule: use history search and tab completion constantly. They reduce typing errors and speed up operations.

Navigation: pwd, ls, cd and filesystem orientation

Navigation is the first terminal skill. You need to know where you are, what files are present, how to move between directories and how to distinguish absolute and relative paths.

Command	Purpose	Example
`pwd`	Print current directory.	`pwd`
`ls`	List files.	`ls`
`ls -lah`	Detailed list, hidden files, human sizes.	`ls -lah /etc`
`cd`	Change directory.	`cd /var/log`
`cd ..`	Move to parent directory.	`cd ..`
`cd ~`	Move to home directory.	`cd ~`
`cd -`	Return to previous directory.	`cd -`

Navigation examples

# Where am I?
                            pwd

                            # List current directory
                            ls

                            # Detailed list with hidden files
                            ls -lah

                            # Go to logs
                            cd /var/log

                            # Go home
                            cd ~

                            # Go one level up
                            cd ..

                            # Go to previous directory
                            cd -

                            # List directory without entering it
                            ls -lah /etc/nginx

Ubuntu filesystem map

/
                            ├── etc      system configuration
                            ├── home     user home directories
                            ├── var      logs, cache, databases, runtime data
                            ├── srv      service/application data
                            ├── opt      optional third-party software
                            ├── usr      installed programs and libraries
                            ├── tmp      temporary files
                            ├── boot     bootloader and kernel files
                            ├── dev      device files
                            ├── proc     process and kernel virtual filesystem
                            └── root     root user's home directory

Path examples

Absolute paths:
                            - /etc/nginx/nginx.conf
                            - /var/log/syslog
                            - /srv/myapp
                            - /home/deploy/.ssh/authorized_keys

                            Relative paths:
                            - ./script.sh
                            - ../backup
                            - logs/app.log
                            - ../../etc/example.conf

                            Special paths:
                            - .  current directory
                            - .. parent directory
                            - ~  current user's home directory
                            - /  filesystem root

Path warning: commands like rm, chmod and chown depend heavily on the path you give. Always verify with pwd and ls before destructive actions.
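
One way to follow this advice is to resolve a relative path to its absolute form before acting on it. The sketch below assumes a hypothetical `abs_dir` helper and a throwaway directory tree created with `mktemp`.

```shell
# Hypothetical helper: print the absolute, symlink-free form of an existing directory.
# The subshell keeps the caller's working directory unchanged.
abs_dir() {
    ( cd "$1" 2>/dev/null && pwd -P )
}

# Throwaway tree for the demonstration.
demo="$(mktemp -d)"
mkdir -p "$demo/app/logs" "$demo/backup"
cd "$demo/app/logs"

abs_dir .              # the logs directory itself
abs_dir ..             # its parent, ending in /app
abs_dir ../../backup   # a sibling of app, ending in /backup
```

Printing the resolved path first makes it obvious which directory a following `rm`, `chmod` or `chown` would actually touch.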

File operations: cp, mv, rm, mkdir, touch and safe handling

File operations are powerful and dangerous. cp, mv and rm act immediately and do not ask for confirmation unless you pass -i. In production, back up configuration files before editing or deleting them.

Command	Purpose	Safe example
`cp`	Copy files.	`cp file.txt file.bak`
`cp -a`	Copy preserving metadata.	`cp -a /etc/nginx /etc/nginx.bak`
`mv`	Move or rename.	`mv app.conf app.conf.disabled`
`rm`	Remove file.	`rm old.log`
`mkdir`	Create directory.	`mkdir -p /srv/myapp/logs`
`touch`	Create empty file or update timestamp.	`touch deploy.log`

File operation examples

# Create a directory tree
                            mkdir -p /srv/myapp/releases

                            # Create an empty file
                            touch /tmp/test.txt

                            # Copy a file
                            cp config.ini config.ini.bak

                            # Copy a directory with attributes
                            cp -a /etc/nginx /etc/nginx.bak.$(date +%Y%m%d-%H%M%S)

                            # Rename a file
                            mv old.conf new.conf

                            # Move a file aside (note: /tmp may be emptied at reboot, so it is not a backup location)
                            mv app.log /tmp/app.log.bak

                            # Remove a file
                            rm old-file.txt

Danger zone: rm

# Remove one file
                            rm file.txt

                            # Ask before deleting
                            rm -i file.txt

                            # Remove directory recursively
                            rm -r directory

                            # Force recursive delete - dangerous
                            rm -rf directory

                            # Extremely dangerous if path is wrong
                            sudo rm -rf /some/path

Safe deletion workflow

# Before deleting:
                            # 1. Show current directory
                            pwd

                            # 2. List target
                            ls -lah target

                            # 3. Check size if directory
                            du -sh target

                            # 4. Move to quarantine first (/tmp may be emptied at reboot; use a dedicated directory for anything you may need back)
                            mv target /tmp/target.to-delete

                            # 5. Verify service still works

                            # 6. Delete later if safe
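
The quarantine step can be wrapped in a small function so that it is always done the same way. This is a minimal sketch, not a standard tool: `quarantine` and the `QUARANTINE_DIR` variable are hypothetical names, and the default location under `$HOME` is an assumption.

```shell
# quarantine: move a target into a quarantine directory instead of deleting it.
# QUARANTINE_DIR is an assumed variable; pick a directory on the same
# filesystem as the data so the move is a fast rename.
quarantine() {
    local target="$1"
    local qdir="${QUARANTINE_DIR:-$HOME/.quarantine}"
    local dest
    if [ ! -e "$target" ]; then
        echo "not found: $target" >&2
        return 1
    fi
    mkdir -p "$qdir" || return 1
    # A timestamp suffix keeps earlier quarantined copies from being overwritten.
    dest="$qdir/$(basename -- "$target").$(date +%Y%m%d-%H%M%S)"
    mv -- "$target" "$dest" && echo "$dest"
}
```

The function prints the new location, which is the path to restore from if the service turns out to need the file.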

Backup-before-edit pattern

# Backup config before edit
                            sudo cp -a /etc/nginx/nginx.conf \
                            /etc/nginx/nginx.conf.bak.$(date +%Y%m%d-%H%M%S)

                            # Edit file
                            sudo vim /etc/nginx/nginx.conf

                            # Validate before reload
                            sudo nginx -t

                            # Reload if valid
                            sudo systemctl reload nginx

Production rule: prefer moving risky files to a quarantine location before deleting. Deletion is not a rollback strategy.

Read, inspect and search files: cat, less, head, tail, grep, find

Reading and searching files is a core Linux skill. Configuration files, service units, environment files, scripts and most logs are plain text (the systemd journal is binary and is read through journalctl). The right command depends on file size and whether you need the beginning, the end, a live follow or a keyword search.

Command	Best for	Example
`cat`	Small files.	`cat /etc/os-release`
`less`	Large files, page navigation.	`less /var/log/syslog`
`head`	First lines of a file.	`head -50 app.log`
`tail`	Last lines of a file.	`tail -100 app.log`
`tail -f`	Follow a log live.	`tail -f /var/log/syslog`
`grep`	Search text.	`grep -i error app.log`
`find`	Find files by name, size, age.	`find /var/log -name "*.log"`

Read commands

# Small file
                            cat /etc/os-release

                            # Large file, scroll
                            less /var/log/syslog

                            # First lines
                            head -50 /var/log/syslog

                            # Last lines
                            tail -100 /var/log/syslog

                            # Follow live
                            tail -f /var/log/syslog

                            # Number lines
                            nl config.ini | less

Search examples

# Case-insensitive search
                            grep -i "error" app.log

                            # Search recursively
                            grep -R "server_name" /etc/nginx

                            # Show line numbers
                            grep -n "listen" /etc/nginx/sites-enabled/*

                            # Exclude noisy files
                            grep -R "DEBUG" /srv/myapp --exclude="*.pyc"

                            # Search compressed logs
                            zgrep -i "error" /var/log/syslog.*.gz

                            # Find files by name
                            find /etc -name "*.conf"

                            # Find large files
                            find /var -type f -size +100M -exec ls -lh {} \;

                            # Find recently modified files
                            find /etc -type f -mtime -2 -ls

Log reading pattern

When debugging logs:
                            1. Identify time window
                            2. Read service-specific logs first
                            3. Search for first real error
                            4. Correlate with recent deploy/update
                            5. Avoid reading huge files without filters

                            Examples:
                            journalctl -u nginx --since "30 min ago"
                            grep -i "permission denied" app.log
                            grep -i "connection refused" app.log

Reading rule: use less for large files, tail for recent lines, grep for patterns, and find for unknown locations.
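
Step 3 of the log reading pattern, "search for first real error", can be sketched with grep alone. The function names and the keyword list below are assumptions chosen for illustration; adjust the pattern to what your application actually logs.

```shell
# first_error: print the first line that matches a common failure keyword,
# prefixed with its line number. The keyword list is an assumption.
first_error() {
    grep -n -i -m 1 -E 'error|failed|denied|refused' -- "$1"
}

# count_errors: count how many lines match the same keywords.
count_errors() {
    grep -c -i -E 'error|failed|denied|refused' -- "$1"
}

# Example:
# first_error app.log
# count_errors app.log
```

The first matching line is often closer to the root cause than the last one, because later errors tend to be consequences of it.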

sudo: super-user privileges and safe administration

sudo runs a command with elevated privileges, usually as root. On Ubuntu, normal users do not directly administer protected system areas. Instead, trusted users are added to the sudo group and elevate only when needed.

Command	Meaning	Example
`sudo command`	Run one command as root.	`sudo apt update`
`sudo -l`	List allowed sudo commands.	`sudo -l`
`sudo -u user command`	Run command as another user.	`sudo -u postgres psql`
`sudo -i`	Start root login shell.	Use rarely and carefully.
`visudo`	Edit sudoers safely.	`sudo visudo`

sudo examples

# Update packages
                            sudo apt update

                            # Edit protected config
                            sudo vim /etc/ssh/sshd_config

                            # Restart service
                            sudo systemctl restart nginx

                            # Read protected log
                            sudo tail -100 /var/log/auth.log

                            # Run command as postgres user
                            sudo -u postgres psql

                            # Check your sudo privileges
                            sudo -l

sudo mental model

Normal user
                            │
                            ├── can read/write own files
                            ├── cannot edit system files
                            ├── cannot restart system services
                            └── cannot install packages
                            │
                            ▼
                            sudo
                            │
                            ├── asks for authentication
                            ├── checks sudoers policy
                            ├── logs action
                            └── runs command with elevated privilege

User sudo setup

# Create user
                            sudo adduser deploy

                            # Add to sudo group
                            sudo usermod -aG sudo deploy

                            # Check group membership
                            groups deploy

                            # Show sudo group
                            getent group sudo

                            # Edit sudoers safely
                            sudo visudo

sudo safety rules

Do:
                            - use sudo for specific commands
                            - keep named admin users
                            - review sudo group members
                            - use visudo for sudoers edits
                            - log administrative changes

                            Avoid:
                            - logging in directly as root
                            - running long sessions as root
                            - using sudo with unknown scripts
                            - using sudo rm -rf without path verification
                            - granting sudo to every user

Security warning: sudo is not just “permission accepted”. It is root-level control. Treat every sudo command as potentially system-changing.
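
Reviewing who can use sudo usually comes down to checking group membership. The sketch below assumes the default Ubuntu policy, where members of the `sudo` group may elevate; `has_group` is a hypothetical helper that works on the space-separated list printed by `id -nG`.

```shell
# has_group: succeed if a space-separated group list contains the wanted group.
# Comparison is exact, so "sudoers" does not count as "sudo".
has_group() {
    local groups_list="$1"
    local wanted="$2"
    local g
    for g in $groups_list; do
        if [ "$g" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Example (assumes a user named deploy exists on the machine):
# has_group "$(id -nG deploy)" sudo && echo "deploy can use sudo"
```

Custom rules in sudoers can grant rights outside the `sudo` group, so `sudo -l` remains the authoritative check for a given user.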

chmod: file permission modes and practical examples

chmod changes file permissions. Linux permissions apply to three classes of users: the owner, the group and others. Each class can have read, write and execute permissions. For directories, execute means “can enter/traverse”.

Permission notation

Example:
                            -rw-r--r-- 1 root root 1200 app.conf

                            Breakdown:
                            -       file type
                            rw-     owner permissions
                            r--     group permissions
                            r--     others permissions

                            r = read
                            w = write
                            x = execute / enter directory

Mode	Meaning	Typical use
`600`	Owner read/write only.	Private keys, secret files.
`640`	Owner read/write, group read.	App env files readable by service group.
`644`	Owner read/write, group and others read.	Normal config or static files.
`700`	Owner full access only.	`.ssh` directory.
`755`	Owner read/write/execute, group and others read/execute.	Directories and public scripts.
`777`	Everyone can read/write/execute.	Almost never acceptable.

chmod examples

# Normal file readable by everyone, writable by owner
                            chmod 644 config.ini

                            # Directory accessible by everyone, writable by owner
                            chmod 755 /srv/myapp

                            # Private SSH directory
                            chmod 700 ~/.ssh

                            # Private SSH key
                            chmod 600 ~/.ssh/id_ed25519

                            # Authorized keys
                            chmod 600 ~/.ssh/authorized_keys

                            # Make script executable
                            chmod +x deploy.sh

                            # Remove write access for group and others
                            chmod go-w file.txt

Numeric mode logic

Permission values:
                            r = 4
                            w = 2
                            x = 1

                            Examples:
                            7 = 4 + 2 + 1 = rwx
                            6 = 4 + 2     = rw-
                            5 = 4 + 1     = r-x
                            4 = 4         = r--

                            chmod 755:
                            owner = 7 = rwx
                            group = 5 = r-x
                            other = 5 = r-x

                            chmod 640:
                            owner = 6 = rw-
                            group = 4 = r--
                            other = 0 = ---
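
The arithmetic above can be checked with a small converter from a three-digit numeric mode to the symbolic form shown by `ls -l`. This is an illustrative sketch: `digit_to_rwx` and `mode_to_symbolic` are hypothetical names, and special bits (setuid, setgid, sticky) are deliberately left out.

```shell
# digit_to_rwx: map one octal digit to its rwx triplet.
digit_to_rwx() {
    case "$1" in
        0) echo '---' ;;
        1) echo '--x' ;;
        2) echo '-w-' ;;
        3) echo '-wx' ;;
        4) echo 'r--' ;;
        5) echo 'r-x' ;;
        6) echo 'rw-' ;;
        7) echo 'rwx' ;;
        *) return 1 ;;
    esac
}

# mode_to_symbolic: convert a mode such as 755 to rwxr-xr-x.
mode_to_symbolic() {
    local mode="$1"
    local rest owner group other
    case "$mode" in
        [0-7][0-7][0-7]) ;;
        *) echo "expected three octal digits: $mode" >&2; return 1 ;;
    esac
    owner=$(digit_to_rwx "${mode%??}")   # first digit
    rest=${mode#?}                       # last two digits
    group=$(digit_to_rwx "${rest%?}")    # second digit
    other=$(digit_to_rwx "${mode#??}")   # third digit
    printf '%s%s%s\n' "$owner" "$group" "$other"
}

# Example:
# mode_to_symbolic 640
```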

Permission troubleshooting

# Show permissions
                            ls -lah file

                            # Show full path permissions
                            namei -l /srv/myapp/current/.env

                            # Show current user groups
                            id

                            # Test as service user
                            sudo -u myapp cat /srv/myapp/.env

chmod rule: do not use chmod 777 to “fix” access. It hides the real ownership problem and creates a security risk.
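
To find places where a past `chmod 777` or `666` is still in effect, one option is to list everything writable by "others" under a directory. A minimal sketch, assuming GNU or BSD `find`; `list_world_writable` is a hypothetical name.

```shell
# list_world_writable: print files and directories under a path
# that have the write bit set for "others".
list_world_writable() {
    # -perm -0002 matches any mode that includes the others-write bit.
    # -xdev keeps the scan on one filesystem.
    find "$1" -xdev \( -type f -o -type d \) -perm -0002 -print
}

# Example:
# list_world_writable /srv/myapp
```

An empty result is the expected state for application code and configuration directories.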

chown: file ownership, groups and service users

chown changes file owner and group. Many permission errors are not caused by missing chmod, but by wrong ownership. Services such as Nginx, Gunicorn, PostgreSQL or application workers must be able to read the files they need, but should not own everything as root.

Command	Meaning	Example
`chown user file`	Change owner.	`sudo chown deploy app.log`
`chown user:group file`	Change owner and group.	`sudo chown deploy:www-data app`
`chown :group file`	Change group only.	`sudo chown :www-data static`
`chown -R`	Recursive ownership change.	Use carefully on directories.
`id user`	Show UID, GID and groups.	`id myapp`

chown examples

# Change one file owner
                            sudo chown deploy app.log

                            # Change owner and group
                            sudo chown deploy:www-data /srv/myapp

                            # Change group only
                            sudo chown :www-data /srv/myapp/static

                            # Recursive change, use carefully
                            sudo chown -R deploy:www-data /srv/myapp

                            # App env file owned by root, readable by app group
                            sudo chown root:myapp /srv/myapp/.env
                            sudo chmod 640 /srv/myapp/.env

Ownership model for web app

/srv/myapp
                            │
                            ├── code files
                            │       ├── owner: deploy
                            │       └── group: www-data
                            │
                            ├── static files
                            │       ├── readable by nginx
                            │       └── not writable by public users
                            │
                            ├── .env secrets
                            │       ├── owner: root
                            │       ├── group: myapp
                            │       └── mode: 640
                            │
                            └── runtime logs/uploads
                                    ├── owner: myapp
                                    └── controlled write access
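
The permission side of this model can be tried out in a scratch directory without root. In the sketch below, `make_app_layout` and the directory names `code`, `static` and `runtime` are assumptions made for the demonstration; the ownership commands need root, so they appear only as comments.

```shell
# make_app_layout: create the directory layout with explicit modes.
# Directory names are illustrative; only the .env mode (640) comes from the model above.
make_app_layout() {
    local root="$1"
    # Code and static files: readable by everyone, writable by owner only.
    install -d -m 755 "$root" "$root/code" "$root/static" || return 1
    # Runtime data: no access for others.
    install -d -m 750 "$root/runtime" || return 1
    # Secrets file: owner read/write, group read, nothing for others.
    install -m 640 /dev/null "$root/.env" || return 1
    # Ownership requires root, so it is not applied here:
    #   chown -R deploy:www-data "$root/code" "$root/static"
    #   chown root:myapp "$root/.env"
    #   chown myapp "$root/runtime"
}

# Example:
# make_app_layout /tmp/myapp-demo && ls -la /tmp/myapp-demo
```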

Service user checks

# Show service user in unit file
                            systemctl cat myapp

                            # Show process user (the [g] keeps grep from matching itself)
                            ps -eo user,pid,cmd | grep '[g]unicorn'

                            # Check user groups
                            id myapp

                            # Check path permissions
                            namei -l /srv/myapp/current/.env

                            # Test access as service user
                            sudo -u myapp test -r /srv/myapp/.env && echo readable

Common ownership mistakes

Mistake	Consequence	Better approach
Everything owned by root	App cannot write needed runtime files.	Use service-specific owner/group.
Everything owned by app user	App can modify its own code/secrets.	Separate code, secrets and runtime dirs.
Recursive chown on wrong path	System or app permissions broken.	Verify path with `pwd` and `ls`.
Using chmod instead of chown	Permissions become too broad.	Fix ownership first.

Ownership rule: permissions answer “what can be done”; ownership answers “who controls the file”. Both must be correct.

Safety patterns: avoid destructive mistakes

The terminal is powerful because it does exactly what you ask. That also makes it dangerous. Professional terminal usage means verifying targets, backing up before edits, validating configs before restart and avoiding irreversible commands when a reversible action is possible.

Risky action	Safer pattern	Reason
Delete directly with `rm -rf`	Move to quarantine first.	Allows rollback.
Edit config without backup	`cp -a file file.bak.DATE`	Easy restore.
Restart service blindly	Validate config and read logs first.	Avoid making outage worse.
Recursive chmod/chown on broad path	Check target with `pwd`, `ls`, `du`.	Prevents system-wide damage.
Run unknown script with sudo	Download, inspect, verify, then run.	Supply-chain safety.
Disable SSH password auth immediately	Test SSH key in second terminal first.	Prevents lockout.

Safe config edit workflow

# 1. Backup
                            sudo cp -a /etc/ssh/sshd_config \
                            /etc/ssh/sshd_config.bak.$(date +%Y%m%d-%H%M%S)

                            # 2. Edit
                            sudo vim /etc/ssh/sshd_config

                            # 3. Validate
                            sudo sshd -t

                            # 4. Restart only if valid
                            sudo systemctl restart ssh

                            # 5. Check logs
                            journalctl -u ssh --since "10 min ago"
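
The backup, edit, validate sequence can also restore the backup automatically when validation fails. This is a sketch under assumptions: `safe_apply` is a hypothetical function, the new content is prepared in a separate file, and the validator is any command that takes the file path and returns non-zero on error.

```shell
# safe_apply FILE VALIDATOR NEW_CONTENT_FILE
# Back up FILE, replace its content, run VALIDATOR on it,
# and restore the backup if validation fails.
safe_apply() {
    local file="$1"
    local validate="$2"
    local new_content="$3"
    local backup
    backup="$file.bak.$(date +%Y%m%d-%H%M%S)"
    cp -a -- "$file" "$backup" || return 1
    # Overwriting an existing file keeps its owner and mode.
    cp -- "$new_content" "$file" || return 1
    if "$validate" "$file"; then
        echo "applied; backup kept at $backup"
    else
        cp -a -- "$backup" "$file"
        echo "validation failed; restored from $backup" >&2
        return 1
    fi
}

# Example with a trivial validator (for sshd, a wrapper around
# `sshd -t -f "$1"` would play this role):
# has_port() { grep -q '^Port [0-9]' "$1"; }
# safe_apply ./sshd_config.test has_port ./sshd_config.new
```

The service is reloaded or restarted only after the function returns success, which keeps a broken file from ever reaching the running daemon.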

Dangerous command patterns

Dangerous:
                            sudo rm -rf /
                            sudo rm -rf *
                            sudo chown -R user:user /
                            sudo chmod -R 777 /
                            sudo chmod -R 777 /var/www
                            curl URL | sudo bash
                            sudo mv /etc /tmp
                            docker compose down -v

                            Safer:
                            - verify path first
                            - backup first
                            - move instead of delete
                            - inspect scripts
                            - target exact directory
                            - keep rollback possible

Pre-flight checklist

Before destructive command:
                            [ ] Am I on the right server?
                            [ ] Am I in the right directory?
                            [ ] Did I list the target?
                            [ ] Did I check the size?
                            [ ] Do I have a backup?
                            [ ] Can I rollback?
                            [ ] Is the command scoped enough?
                            [ ] Am I using sudo unnecessarily?
                            [ ] Did I understand wildcard expansion?
                            [ ] Did I test on staging if production?

Know your context

# Confirm server
                            hostnamectl

                            # Confirm user
                            whoami

                            # Confirm directory
                            pwd

                            # Confirm target
                            ls -lah target

                            # Confirm disk and space
                            df -h
                            du -sh target

                            # Preview the command before running it (echo prints it with wildcards expanded)
                            echo sudo rm -rf target

Safety rule: the most dangerous Linux commands are short, recursive and executed with sudo. Slow down before using them.
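
One way to force that pause is a wrapper that shows the context and waits for a typed confirmation. `confirm_run` below is a hypothetical helper, offered as a sketch rather than a replacement for judgment.

```shell
# confirm_run: print host, user, directory and the exact command,
# then run it only if the operator types "yes".
confirm_run() {
    local answer
    printf 'host: %s\nuser: %s\ncwd:  %s\ncmd:  %s\n' \
        "$(uname -n)" "$(id -un 2>/dev/null || id -u)" "$(pwd)" "$*" >&2
    printf 'Type "yes" to run: ' >&2
    read -r answer || return 1
    if [ "$answer" != "yes" ]; then
        echo "aborted" >&2
        return 1
    fi
    "$@"
}

# Example:
# confirm_run rm -r ./old-release
```

Because the arguments are printed after the shell has expanded them, a wildcard that matches more than intended is visible before anything runs.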

Terminal and permissions cheat sheet

Fundamental commands

# Navigation
                            pwd
                            ls
                            ls -lah
                            cd /path
                            cd ..
                            cd ~
                            cd -

                            # Files
                            cp file file.bak
                            cp -a dir dir.bak
                            mv old new
                            rm file
                            rm -i file
                            mkdir -p path/to/dir
                            touch file

                            # Read and search
                            cat file
                            less file
                            head -50 file
                            tail -100 file
                            tail -f file
                            grep -i "error" file
                            find /path -name "*.log"

                            # Context
                            whoami
                            id
                            hostnamectl
                            history
                            which command

sudo, chmod, chown cheat sheet

# sudo
                            sudo apt update
                            sudo systemctl restart nginx
                            sudo -l
                            sudo -u postgres psql
                            sudo visudo

                            # chmod
                            chmod 644 file
                            chmod 640 secret.env
                            chmod 755 directory
                            chmod 700 ~/.ssh
                            chmod 600 ~/.ssh/id_ed25519
                            chmod +x script.sh

                            # chown
                            sudo chown user file
                            sudo chown user:group file
                            sudo chown :group file
                            sudo chown -R user:group directory

                            # Diagnose permissions
                            ls -lah file
                            namei -l /path/to/file
                            id user
                            groups user

Final rule

Mastering the Ubuntu terminal means mastering control with discipline.
Use BASH to navigate, inspect, modify, automate and troubleshoot. Use sudo only when required. Use chmod for permissions, chown for ownership, and always verify the target before destructive commands.

Minimal professional reflexes

[ ] I know where I am with pwd
                            [ ] I inspect before changing with ls -lah
                            [ ] I backup config files before editing
                            [ ] I validate configs before restart
                            [ ] I avoid chmod 777
                            [ ] I understand sudo impact
                            [ ] I test SSH access before hardening
                            [ ] I move risky files before deleting
                            [ ] I use logs before restarting blindly
                            [ ] I document production changes

7.4 Ubuntu Maintenance & Security: updates, UFW, Timeshift, backups, logs and system recovery

Maintenance and security objective

Ubuntu maintenance is the set of recurring actions that keep a system secure, stable, recoverable and understandable. It includes package updates, security patching, reboot planning, firewall control, restore points, backups, log review and incident diagnosis.

On a desktop, maintenance protects the user from data loss and broken upgrades. On a server, maintenance protects services from outages, vulnerabilities, full disks, misconfiguration and unrecoverable incidents.

Area	Purpose	Main tools	Failure prevented
System updates	Apply fixes and security patches.	`apt`, Software Updater, unattended upgrades.	Known vulnerabilities, outdated packages.
Firewall	Limit network exposure.	`ufw`, security groups, router firewall.	Open services reachable from outside.
Restore points	Return system state after bad change.	Timeshift, snapshots.	Broken updates, bad configuration.
Backups	Protect personal or business data.	rsync, external disk, cloud backup, database dumps.	Data loss, disk failure, accidental deletion.
Logs	Understand what happened.	`journalctl`, `/var/log`, app logs.	Blind troubleshooting and repeated incidents.
Routine checks	Detect problems before they grow.	`df`, `systemctl`, `journalctl`.	Full disk, failed services, unnoticed errors.

Core rule: maintenance is not a one-time setup. It is a routine: update, verify, protect, observe and document.

Maintenance architecture map

Ubuntu maintenance
                            │
                            ├── Updates
                            │       ├── apt update
                            │       ├── apt upgrade
                            │       ├── security fixes
                            │       └── reboot policy
                            │
                            ├── Firewall
                            │       ├── default deny incoming
                            │       ├── allow required services
                            │       ├── restrict SSH
                            │       └── review open ports
                            │
                            ├── Recovery
                            │       ├── Timeshift restore points
                            │       ├── backups
                            │       ├── snapshots
                            │       └── restore testing
                            │
                            ├── Logs
                            │       ├── journalctl
                            │       ├── auth logs
                            │       ├── system logs
                            │       └── app logs
                            │
                            └── Routine
                                    ├── weekly checks
                                    ├── monthly cleanup
                                    ├── update review
                                    └── documentation

Desktop vs server emphasis

Context	Priority	Example
Desktop	Restore points, data backup, safe updates.	Timeshift before big upgrade.
Server	Security patches, firewall, monitoring, backups.	Patch window and reboot plan.
Cloud VM	Snapshots, security groups, logs, replacement.	AMI or EBS snapshot before change.
Developer workstation	Tool updates, project backup, SSH keys.	Backup home and dotfiles.

System updates: why and how to perform them

System updates fix bugs, close security vulnerabilities, improve hardware support and keep installed packages consistent with the Ubuntu release. Updates should be frequent enough to reduce exposure, but controlled enough to avoid surprise downtime on important machines.

Command	Purpose	When to use
`sudo apt update`	Refresh package metadata.	Before installing or upgrading packages.
`apt list --upgradable`	Show available upgrades.	Before applying updates.
`sudo apt upgrade`	Upgrade packages safely without removals.	Regular maintenance.
`sudo apt full-upgrade`	Upgrade and allow removal of installed packages.	When an upgrade cannot complete without removing packages.
`sudo apt autoremove`	Remove unused dependencies.	After upgrades or package removals.
`sudo apt clean`	Clean package cache.	When disk cleanup is needed.

Standard update flow

# 1. Refresh package metadata
                            sudo apt update

                            # 2. Review available upgrades
                            apt list --upgradable

                            # 3. Apply regular upgrades
                            sudo apt upgrade

                            # 4. Remove unused packages
                            sudo apt autoremove

                            # 5. Check if reboot is required
                            test -f /var/run/reboot-required && cat /var/run/reboot-required

                            # 6. Verify system state
                            systemctl --failed
                            journalctl -p warning --since "30 min ago"

Update decision diagram

Updates available
                            │
                            ├── Desktop workstation?
                            │       ├── create Timeshift snapshot if major change
                            │       ├── apply updates
                            │       └── reboot if required
                            │
                            ├── Production server?
                            │       ├── review packages
                            │       ├── confirm backup/snapshot
                            │       ├── test staging if critical
                            │       ├── schedule maintenance window
                            │       └── apply and verify
                            │
                            └── Cloud VM?
                                    ├── snapshot or image
                                    ├── patch
                                    ├── reboot if required
                                    └── validate health checks

Reboot-required checks

# Check if reboot is required
                            test -f /var/run/reboot-required && cat /var/run/reboot-required || echo "no reboot flag"

                            # Show packages that requested reboot if available
                            cat /var/run/reboot-required.pkgs 2>/dev/null

                            # Current kernel
                            uname -a

                            # Boot time
                            uptime
                            last reboot | head

Graphical update path

Ubuntu Desktop
                            │
                            ├── Open Software Updater
                            ├── Review proposed updates
                            ├── Install updates
                            ├── Reboot if requested
                            └── Verify desktop and main apps

Update warning: a kernel update is not active until reboot. Always check /var/run/reboot-required after maintenance.
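
For scripts and scheduled checks, the reboot flag test can be wrapped in a function with a usable exit status. A minimal sketch: `reboot_needed` is a hypothetical name, and the optional argument exists only so the function can be tried against a test file.

```shell
# reboot_needed: succeed (exit 0) if the reboot flag file exists.
# Defaults to the Ubuntu flag path; pass another path for testing.
reboot_needed() {
    local flag="${1:-/var/run/reboot-required}"
    if [ -f "$flag" ]; then
        echo "reboot required"
        # The .pkgs file, when present, lists packages that asked for the reboot.
        if [ -f "$flag.pkgs" ]; then
            sort -u "$flag.pkgs"
        fi
        return 0
    fi
    echo "no reboot flag"
    return 1
}

# Example:
# reboot_needed && echo "plan a reboot window"
```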

Update strategy: safe patching, unattended upgrades and rollback

A good update strategy balances speed and safety. Security updates should not be postponed indefinitely, but critical systems need backups, staging tests and rollback paths. The more important the machine, the more controlled the update process must be.

Strategy	Best for	Strength	Risk
Manual updates	Personal desktop, small servers.	Human review before changes.	Can be forgotten.
Unattended security upgrades	Standard servers.	Faster security patching.	Needs reboot policy.
Scheduled patch window	Production systems.	Predictable maintenance.	Emergency patches still need fast track.
Snapshot before update	Desktop, VM, cloud instances.	Rollback-friendly.	Snapshot does not replace data backup.
Blue/green replacement	Cloud application servers.	Safer than in-place update.	Requires automation.

Unattended upgrades

# Install unattended upgrades
                            sudo apt update
                            sudo apt install unattended-upgrades

                            # Enable basic automatic security updates
                            sudo dpkg-reconfigure unattended-upgrades

                            # Main configuration files
                            ls -l /etc/apt/apt.conf.d/20auto-upgrades \
                            /etc/apt/apt.conf.d/50unattended-upgrades

                            # Logs
                            sudo less /var/log/unattended-upgrades/unattended-upgrades.log
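
After enabling the feature, it is worth confirming that the periodic setting is actually switched on. The sketch below checks for the `APT::Periodic::Unattended-Upgrade "1";` line in the auto-upgrades file; `auto_upgrades_enabled` is a hypothetical name, and the check assumes the setting lives in that one file rather than in another fragment under `/etc/apt/apt.conf.d/`.

```shell
# auto_upgrades_enabled: succeed if the file enables unattended upgrades.
# Defaults to the standard path; pass another path for testing.
auto_upgrades_enabled() {
    local conf="${1:-/etc/apt/apt.conf.d/20auto-upgrades}"
    grep -Eq '^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]+"1";' "$conf"
}

# Example:
# auto_upgrades_enabled && echo "unattended upgrades are on"
```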

Safe production patch workflow

Patch workflow
                            │
                            ├── Inventory
                            │       ├── OS version
                            │       ├── kernel version
                            │       ├── critical packages
                            │       └── running services
                            │
                            ├── Protect
                            │       ├── backup
                            │       ├── snapshot
                            │       ├── Timeshift on desktop
                            │       └── rollback plan
                            │
                            ├── Apply
                            │       ├── apt update
                            │       ├── review upgrades
                            │       ├── apt upgrade
                            │       └── reboot if required
                            │
                            └── Verify
                                    ├── systemctl --failed
                                    ├── logs
                                    ├── ports
                                    ├── app health
                                    └── user validation

Post-update validation

# Failed services
                            systemctl --failed

                            # Recent warnings
                            journalctl -p warning --since "30 min ago"

                            # Listening ports
                            ss -lntp

                            # Disk and memory
                            df -h
                            free -h

                            # Web smoke test
                            curl -I http://localhost

                            # Package history
                            less /var/log/apt/history.log
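When the question is "what changed just before this broke?", the apt history can be condensed to one line per transaction. This is a sketch only, assuming the usual `Start-Date:` / `Commandline:` stanza layout of `history.log`; the function name `recent_apt_commands` is hypothetical.

```shell
# Hypothetical helper: show the last N apt command lines with their start time.
# Stanzas without a Commandline entry are skipped.
recent_apt_commands() {
    log="${1:-/var/log/apt/history.log}"
    count="${2:-5}"
    awk '
        /^Start-Date:/  { start = $2 " " $3 }
        /^Commandline:/ { sub(/^Commandline: /, ""); print start " | " $0 }
    ' "$log" | tail -n "$count"
}

# Usage:
#   recent_apt_commands                       # last 5 transactions
#   recent_apt_commands /var/log/apt/history.log 10
```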

Update risk matrix

Update type	Risk	Control
Kernel	Requires reboot, driver risk.	Snapshot and reboot window.
OpenSSL / libc	Service restart may be needed.	Restart affected services.
Database packages	Service compatibility.	Backup and staging test.
Nginx / SSH	Access or web outage if config breaks.	Validate config before restart.

Maintenance rule: update strategy is risk management. The critical question is not only “can I update?”, but “can I recover if the update fails?”.
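The "validate config before restart" control from the risk matrix can be sketched as a small guard. The helper name `validate_then` is hypothetical; `nginx -t` and `sshd -t` are the configuration tests shipped with those services, and `ssh` is assumed to be the unit name of the OpenSSH server on Ubuntu.

```shell
# Hypothetical helper: run a validation command, then run the action only if it passes.
validate_then() {
    check="$1"
    action="$2"
    if sh -c "$check"; then
        sh -c "$action"
    else
        echo "validation failed, skipped: $action" >&2
        return 1
    fi
}

# Usage:
#   validate_then "sudo nginx -t" "sudo systemctl reload nginx"
#   validate_then "sudo sshd -t"  "sudo systemctl reload ssh"
```

A failed check leaves the running service untouched, which matters most for SSH on a remote machine.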

Basic security with UFW firewall

UFW (Uncomplicated Firewall) is Ubuntu’s simplified front end for host firewall rules. It helps expose only the ports the machine actually needs. A safe default is to deny incoming traffic, allow outgoing traffic, then explicitly allow SSH, web traffic and any other required service.

Port	Service	Typical exposure	Comment
`22/tcp`	SSH	Restricted source IP if possible.	Administration access.
`80/tcp`	HTTP	Public only for web server or redirect.	Often redirects to HTTPS.
`443/tcp`	HTTPS	Public for web application.	Main public web port.
`3306/tcp`	MySQL / MariaDB	Private only.	Never expose casually.
`5432/tcp`	PostgreSQL	Private only.	Restrict to app server.
`6379/tcp`	Redis	Private only.	Should not be public.

UFW baseline

# Check current firewall status
                            sudo ufw status verbose

                            # Default policies
                            sudo ufw default deny incoming
                            sudo ufw default allow outgoing

                            # Allow SSH before enabling firewall
                            sudo ufw allow OpenSSH

                            # Allow web traffic if needed
                            sudo ufw allow 80/tcp
                            sudo ufw allow 443/tcp

                            # Enable firewall
                            sudo ufw enable

                            # Verify rules
                            sudo ufw status verbose
                            sudo ufw status numbered

Firewall decision diagram

New service installed
                            │
                            ├── Does it need network access?
                            │       ├── no -> keep local only
                            │       └── yes
                            │
                            ├── Should it be public?
                            │       ├── yes -> open exact required port
                            │       └── no
                            │
                            ├── Should it be private?
                            │       ├── yes -> restrict by source IP or subnet
                            │       └── no
                            │
                            └── Is rule documented?
                                    ├── yes -> apply rule
                                    └── no -> do not expose yet
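The decision diagram can be turned into a helper that prints the matching rule instead of applying it, so the rule can be documented first. This is a sketch under assumptions: the function name `ufw_rule_for` and the three exposure labels are hypothetical, and only TCP is handled.

```shell
# Hypothetical helper: print the ufw command for a service, without running it.
#   exposure: local | public | private
#   port:     for example 443 (TCP is assumed in this sketch)
#   source:   IP or CIDR, required for "private"
ufw_rule_for() {
    exposure="$1"
    port="$2"
    source_net="${3:-}"
    case "$exposure" in
        local)
            echo "# no rule: keep the service bound to localhost" ;;
        public)
            echo "sudo ufw allow ${port}/tcp" ;;
        private)
            if [ -z "$source_net" ]; then
                echo "private exposure needs a source IP or subnet" >&2
                return 1
            fi
            echo "sudo ufw allow from ${source_net} to any port ${port} proto tcp" ;;
        *)
            echo "unknown exposure: $exposure" >&2
            return 1 ;;
    esac
}

# Usage:
#   ufw_rule_for public 443
#   ufw_rule_for private 5432 10.0.1.25
```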

Restrict by source

# Allow SSH from one admin IP
                            sudo ufw allow from 203.0.113.10 to any port 22 proto tcp

                            # Allow PostgreSQL from one app server
                            sudo ufw allow from 10.0.1.25 to any port 5432 proto tcp

                            # Delete a numbered rule (numbers shift after each delete, so list again before the next one)
                            sudo ufw status numbered
                            sudo ufw delete 3

                            # Deny a specific IP
                            sudo ufw deny from 198.51.100.44

UFW troubleshooting

# Firewall status
                            sudo ufw status verbose

                            # Listening ports
                            ss -lntp

                            # Local service test
                            curl -I http://localhost

                            # Kernel firewall logs if enabled
                            sudo ufw logging on
                            sudo journalctl -k --since "30 min ago" | grep UFW
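When a port looks blocked or unexpectedly open, it helps to separate services bound to loopback from those listening on a reachable address. The filter below is a sketch; the function name `non_loopback_listeners` is hypothetical, and it assumes the default column layout of `ss -lnt`, where the fourth column is the local address.

```shell
# Hypothetical helper: read `ss -lnt` output on stdin and print listening
# addresses that are not bound to loopback only.
non_loopback_listeners() {
    awk '
        $1 == "LISTEN" {
            addr = $4
            if (addr ~ /^127\./ || addr ~ /^\[::1\]/) next
            print addr
        }
    ' | sort -u
}

# Usage:
#   ss -lnt | non_loopback_listeners
```

Each address printed should correspond to a documented UFW rule or be blocked by the default deny policy.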

Firewall warning: always allow and test SSH before enabling or tightening UFW on a remote server.

Timeshift: system restore points for safer changes

Timeshift creates system restore points. It is useful on Ubuntu Desktop and some workstation scenarios before major updates, driver changes, package experiments or risky configuration changes. It is not a full personal-data backup solution by itself: it mainly protects system state.

Timeshift concept	Meaning	Operational note
Snapshot	Restore point of system files.	Useful before upgrades.
RSYNC mode	File-based snapshot mode.	Works on common filesystems.
BTRFS mode	Filesystem snapshot mode.	Requires BTRFS layout.
Schedule	Automatic snapshot frequency.	Daily, weekly, monthly policies.
Restore	Return system to previous state.	Can recover from bad update or config.
Exclusions	Paths not included.	Home directories are excluded by default; check before relying on a snapshot for data.

Install Timeshift

# Install Timeshift
                            sudo apt update
                            sudo apt install timeshift

                            # Launch graphical interface (if sudo cannot open the display, start Timeshift from the application menu)
                            sudo timeshift-gtk

                            # CLI help
                            timeshift --help

                            # List snapshots
                            sudo timeshift --list

Timeshift workflow

Before risky change
                            │
                            ├── Open Timeshift
                            ├── Create snapshot
                            ├── Name or comment the snapshot
                            ├── Apply update or configuration change
                            ├── Reboot if required
                            ├── Verify system works
                            └── Keep or delete snapshot later
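The same workflow can be done from the terminal. The sketch below only prints the command so it can be reviewed first; `--create` and `--comments` are Timeshift CLI options, while the function name `timeshift_snapshot_cmd` and the "date: reason" comment format are assumptions.

```shell
# Hypothetical helper: print the snapshot command for a planned change.
# The second argument overrides the date, which defaults to today.
timeshift_snapshot_cmd() {
    reason="$1"
    stamp="${2:-$(date +%Y-%m-%d)}"
    printf 'sudo timeshift --create --comments "%s: %s"\n' "$stamp" "$reason"
}

# Usage: review the printed command, run it, then confirm the snapshot exists.
#   timeshift_snapshot_cmd "before NVIDIA driver install"
#   sudo timeshift --list
```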

When to create a snapshot

Create a Timeshift snapshot before:
                            - major system update
                            - release upgrade
                            - driver installation
                            - desktop environment change
                            - kernel experiment
                            - repository or PPA experiment
                            - risky configuration edit
                            - important package removal

Timeshift vs backup

Need	Timeshift	Data backup
Restore broken system update	Excellent.	Not primary role.
Recover deleted personal file	Usually not: home directories are excluded by default.	Best tool.
Recover from disk failure	Only if snapshot stored elsewhere.	Required.
Recover database state	Not ideal.	Use database backup.

Recovery warning: Timeshift is not a replacement for backups. A restore point helps with system rollback, while backups protect personal or business data.

Backup model: system restore, personal data and server data

A complete protection strategy separates system restore from data backup. Timeshift can help restore the OS state. Personal files, project folders, databases, uploads, secrets and configuration must also be backed up separately.

Data type	Recommended protection	Example path
System files	Timeshift or VM snapshot.	`/etc`, packages, system state.
Personal files	File backup to external disk or cloud.	`/home/user/Documents`
Project code	Git remote and file backup.	`/home/user/projects`
Databases	Database-native dump and volume backup.	PostgreSQL, MySQL, MariaDB.
Application uploads	File backup with retention.	`/srv/app/media`
Secrets	Secure secret backup or vault.	`.env`, keys, certificates.

Simple rsync backup example

# Backup home directory to external disk
                            rsync -aHAX --info=progress2 \
                            /home/user/ \
                            /media/user/backup/home-user/

                            # Mirror project directory (--delete also removes files from the backup that no longer exist in the source)
                            rsync -a --delete \
                            /srv/myapp/ \
                            /backup/myapp/

                            # Dry run first
                            rsync -a --dry-run /source/ /destination/

Backup strategy diagram

Protection strategy
                            │
                            ├── System restore
                            │       ├── Timeshift
                            │       ├── VM snapshot
                            │       └── cloud image
                            │
                            ├── Data backup
                            │       ├── documents
                            │       ├── projects
                            │       ├── uploads
                            │       └── databases
                            │
                            ├── Configuration backup
                            │       ├── /etc
                            │       ├── service units
                            │       ├── nginx configs
                            │       └── SSH configs
                            │
                            └── Restore test
                                    ├── can files be restored?
                                    ├── can database be restored?
                                    ├── can server boot?
                                    └── is procedure documented?

Database backup examples

# PostgreSQL dump
                            pg_dump -U app_user -h localhost app_db > app_db.sql

                            # PostgreSQL compressed dump
                            pg_dump -U app_user -h localhost app_db | gzip > app_db.sql.gz

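                            # MySQL / MariaDB dump (--single-transaction gives a consistent snapshot of InnoDB tables)
                            mysqldump --single-transaction -u app_user -p app_db > app_db.sql

                            # MySQL / MariaDB compressed dump
                            mysqldump --single-transaction -u app_user -p app_db | gzip > app_db.sql.gz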
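Compressed dumps are easier to manage with a timestamp in the name and a quick integrity check after writing. This is a sketch; the helper names `dump_name` and `verify_dump` and the naming scheme are assumptions.

```shell
# Hypothetical helpers for compressed dumps.

# Build a timestamped file name such as app_db-20240501-101500.sql.gz
dump_name() {
    printf '%s-%s.sql.gz\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# Fail if the dump is missing, empty, or not a valid gzip stream.
verify_dump() {
    [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}

# Usage:
#   out="$(dump_name app_db)"
#   pg_dump -U app_user -h localhost app_db | gzip > "$out"
#   verify_dump "$out" || echo "dump $out looks broken" >&2
```

`gzip -t` checks the archive, not the SQL inside it: a dump tool that fails halfway can still leave a valid but incomplete archive, so a periodic restore test remains necessary.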

Backup quality checklist

[ ] Backup is automatic
                            [ ] Backup includes data, not only system files
                            [ ] Backup destination is separate from source disk
                            [ ] Backup has retention policy
                            [ ] Backup is encrypted if sensitive
                            [ ] Restore has been tested
                            [ ] Database backups are consistent
                            [ ] Secrets are protected
                            [ ] Backup logs are reviewed
                            [ ] Owner and schedule are documented

Backup rule: a backup is only proven when restore has been tested.
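A restore test can be as simple as restoring into a scratch directory and comparing it with the source. The sketch below uses `diff -r -q`; the function name `verify_restore` and the example paths are hypothetical. It compares file names and content only, not ownership or permissions.

```shell
# Hypothetical helper: compare a source tree with a restored copy.
# Succeeds only if both trees hold the same files with the same content.
verify_restore() {
    src="$1"
    restored="$2"
    if diff -r -q "$src" "$restored" >/dev/null; then
        echo "restore OK: $restored matches $src"
    else
        echo "restore MISMATCH: $restored differs from $src" >&2
        return 1
    fi
}

# Usage: restore into a scratch directory, never over the live data.
#   rsync -a /backup/myapp/ /tmp/restore-test/
#   verify_restore /srv/myapp /tmp/restore-test
```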

Log management: reading system journals when problems occur

Logs are the first source of truth when Ubuntu behaves unexpectedly. They show service failures, authentication attempts, package operations, kernel events, disk errors, network issues and application errors.

Log source	Contains	Command
systemd journal	Service and system events.	`journalctl`
Service logs	One daemon timeline.	`journalctl -u SERVICE`
Kernel logs	OOM, disk, driver, hardware events.	`journalctl -k`
Authentication logs	SSH, sudo, login attempts.	`/var/log/auth.log`
System log	General system messages.	`/var/log/syslog`
APT logs	Package updates and installs.	`/var/log/apt/history.log`

journalctl essentials

# Recent errors and context
                            journalctl -xe

                            # Current boot logs
                            journalctl -b

                            # Previous boot logs
                            journalctl -b -1

                            # Service logs
                            journalctl -u nginx

                            # Service logs with time window
                            journalctl -u nginx --since "1 hour ago"

                            # Follow service logs live
                            journalctl -u nginx -f

                            # Warnings and errors
                            journalctl -p warning --since today

                            # Kernel logs
                            journalctl -k --since today

Classic log commands

# System log
                            sudo tail -200 /var/log/syslog

                            # Authentication log
                            sudo tail -200 /var/log/auth.log

                            # APT history
                            less /var/log/apt/history.log

                            # Search errors
                            grep -i "error" /var/log/syslog

                            # Search failed SSH attempts
                            sudo grep -i "failed password" /var/log/auth.log | tail -100

                            # Search sudo usage
                            sudo grep -i "sudo" /var/log/auth.log | tail -100

                            # Search OOM events
                            journalctl -k --since today | grep -i -E "oom|killed process"
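Raw `grep` output on `auth.log` becomes more useful once it is grouped by source address. The filter below is a sketch, assuming the usual OpenSSH message format "Failed password for ... from IP port ..."; the function name `failed_ssh_by_ip` is hypothetical.

```shell
# Hypothetical helper: read auth.log lines on stdin and print
# "<count> <ip>" for failed SSH password attempts, busiest source first.
failed_ssh_by_ip() {
    awk '
        /Failed password/ {
            for (i = 1; i < NF; i++)
                if ($i == "from") { print $(i + 1); break }
        }
    ' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Usage:
#   sudo cat /var/log/auth.log | failed_ssh_by_ip | head -20
```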

Log reading workflow

Problem detected
                            │
                            ├── Identify time window
                            ├── Check failed services
                            ├── Read service journal
                            ├── Read system warnings
                            ├── Read kernel logs
                            ├── Check auth logs if access issue
                            ├── Check apt history if after update
                            └── Find first meaningful error

Journal size control

# Show journal disk usage
                            journalctl --disk-usage

                            # Keep only last 14 days
                            sudo journalctl --vacuum-time=14d

                            # Keep journal under 1 GB
                            sudo journalctl --vacuum-size=1G

Log rule: use time windows. Filtering with `--since "30 min ago"` is usually more useful than reading thousands of old lines.

Troubleshooting maintenance problems

Maintenance can fail: updates may be interrupted, repositories may break, firewall rules may block access, Timeshift snapshots may fill disk space, logs may grow, or services may fail after a package upgrade. Diagnose from the exact symptom.

Symptom	Likely cause	First command	Fix direction
APT locked	Another package process running.	`ps aux | grep -E 'apt|dpkg'`	Wait or investigate process.
Broken packages	Interrupted install.	`sudo dpkg --configure -a`	Repair package state.
No network after UFW	Required port blocked.	`sudo ufw status numbered`	Allow required rule or rollback.
SSH locked out	Firewall or SSH config error.	Console access, UFW and SSH status.	Restore SSH path safely.
Disk full	Logs, snapshots, cache, Docker.	`df -h`, `du -sh`	Clean safely and add retention.
Service failed after update	Config change or dependency issue.	`systemctl status SERVICE`	Read logs, rollback or fix config.

APT repair commands

# Finish interrupted package configuration
                            sudo dpkg --configure -a

                            # Fix broken dependencies
                            sudo apt -f install

                            # Refresh metadata
                            sudo apt update

                            # Clean package cache
                            sudo apt clean

                            # Remove unused dependencies
                            sudo apt autoremove

                            # Review update history
                            less /var/log/apt/history.log
                            less /var/log/apt/term.log

Maintenance failure decision tree

Maintenance issue
                            │
                            ├── Package manager error?
                            │       ├── lock -> check apt/dpkg process
                            │       ├── broken -> dpkg --configure -a
                            │       └── repo -> inspect apt sources
                            │
                            ├── Firewall issue?
                            │       ├── check UFW rules
                            │       ├── verify SSH rule
                            │       └── test required ports
                            │
                            ├── Disk issue?
                            │       ├── check df -h
                            │       ├── check logs
                            │       ├── check snapshots
                            │       └── clean safely
                            │
                            ├── Service issue?
                            │       ├── systemctl status
                            │       ├── journalctl -u service
                            │       └── validate config
                            │
                            └── Bad update?
                                    ├── use Timeshift if desktop
                                    ├── use snapshot if VM
                                    └── rollback package or config

Disk cleanup for maintenance

# Disk usage
                            df -h

                            # Large top-level directories
                            sudo du -xhd1 / 2>/dev/null | sort -h

                            # Journal usage and cleanup
                            journalctl --disk-usage
                            sudo journalctl --vacuum-time=14d

                            # APT cleanup
                            sudo apt clean
                            sudo apt autoremove

                            # Timeshift snapshots
                            sudo timeshift --list
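To decide where cleanup is needed first, the `df` output can be filtered by a usage threshold. This is a sketch, assuming the POSIX column layout of `df -P` and mount points without spaces; the function name `full_filesystems` and the 80 percent default are assumptions.

```shell
# Hypothetical helper: read `df -P` output on stdin and print mount points
# whose usage is at or above a threshold (default 80 percent).
full_filesystems() {
    threshold="${1:-80}"
    awk -v t="$threshold" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)
            if (use + 0 >= t + 0) print $6, use "%"
        }
    '
}

# Usage:
#   df -P | full_filesystems 85
```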

Recovery rule: if the system is unstable after a major update, prefer a known restore point over random manual changes.

Maintenance routine: daily, weekly, monthly and before major changes

A simple routine prevents many incidents. The goal is not to spend hours every day, but to maintain visibility: update status, disk usage, failed services, logs, backup state and restore readiness.

Frequency	Actions	Commands / tools
Daily	Check failed services and critical alerts.	`systemctl --failed`, monitoring.
Weekly	Review updates, disk usage and warnings.	`apt list --upgradable`, `df -h`.
Monthly	Apply updates, reboot if needed, verify backups.	`apt upgrade`, backup logs.
Before major change	Create restore point or snapshot.	Timeshift, VM snapshot, cloud snapshot.
After incident	Review logs and add prevention.	`journalctl`, runbook update.

Weekly maintenance command block

echo "== SYSTEM =="
                            hostnamectl
                            uptime

                            echo "== UPDATES =="
                            sudo apt update
                            apt list --upgradable

                            echo "== DISK =="
                            df -h

                            echo "== FAILED SERVICES =="
                            systemctl --failed

                            echo "== WARNINGS TODAY =="
                            journalctl -p warning --since today --no-pager | tail -100

                            echo "== REBOOT REQUIRED =="
                            test -f /var/run/reboot-required && cat /var/run/reboot-required || echo "no reboot flag"
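The reboot check can also report which packages asked for the reboot. The sketch below assumes the companion file `reboot-required.pkgs` sits next to the flag file; the function name `reboot_status` is hypothetical, and the directory is a parameter so the function can be tried outside `/var/run`.

```shell
# Hypothetical helper: report whether a reboot is pending and which packages requested it.
reboot_status() {
    dir="${1:-/var/run}"
    if [ -f "$dir/reboot-required" ]; then
        echo "reboot required"
        # reboot-required.pkgs lists the requesting packages, when present
        [ -f "$dir/reboot-required.pkgs" ] && sort -u "$dir/reboot-required.pkgs"
        return 0
    fi
    echo "no reboot flag"
}

# Usage:
#   reboot_status
```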

Maintenance calendar

Daily
                            ├── monitor alerts
                            ├── failed services
                            └── backup success

                            Weekly
                            ├── package update review
                            ├── disk space review
                            ├── log warnings review
                            └── firewall exposure review

                            Monthly
                            ├── apply updates
                            ├── reboot if required
                            ├── test restore sample
                            ├── cleanup old logs/snapshots
                            └── review users and sudo

                            Before major upgrade
                            ├── backup data
                            ├── Timeshift or VM snapshot
                            ├── record current version
                            ├── apply change
                            └── verify and document

Server maintenance record

Maintenance record:
                            - date
                            - hostname
                            - Ubuntu version
                            - packages updated
                            - reboot required
                            - reboot performed
                            - services checked
                            - disk usage
                            - backup status
                            - warnings found
                            - actions taken
                            - rollback point
                            - operator

Routine rule: the best maintenance routine is the one you can actually repeat. Keep it simple, observable and documented.

Final maintenance and security checklist

Maintenance checklist

[ ] Ubuntu LTS version is known
                            [ ] Package updates are reviewed regularly
                            [ ] Security updates are applied
                            [ ] Reboot-required status is checked
                            [ ] Reboot window exists for servers
                            [ ] Failed services are checked
                            [ ] Disk usage is monitored
                            [ ] Journal size is controlled
                            [ ] APT history is reviewed after updates
                            [ ] Timeshift is configured on desktop/workstation
                            [ ] Restore point is created before major changes
                            [ ] Data backup exists
                            [ ] Restore has been tested
                            [ ] Logs are readable
                            [ ] Maintenance actions are documented

Security checklist

[ ] UFW is enabled when appropriate
                            [ ] Default incoming policy is deny
                            [ ] Only required ports are open
                            [ ] SSH is protected
                            [ ] SSH source is restricted if possible
                            [ ] Root SSH login is disabled
                            [ ] Password SSH is disabled after key test
                            [ ] Users and sudo group are reviewed
                            [ ] Secrets are not world-readable
                            [ ] Backups are protected
                            [ ] Firewall rules are documented
                            [ ] Logs are reviewed after suspicious activity
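The two SSH items in this checklist can be spot-checked from the configuration file. This is a simplified sketch: it reads one file, takes the first value set for a keyword, and ignores `Include` drop-ins (such as files under `/etc/ssh/sshd_config.d/`) and `Match` blocks, so `sudo sshd -T` remains the reference for the effective configuration. The function names are hypothetical.

```shell
# Hypothetical helper: print the first value set for a keyword in an sshd_config file.
sshd_setting() {
    key="$1"
    file="${2:-/etc/ssh/sshd_config}"
    awk -v k="$key" '
        /^[[:space:]]*#/ { next }
        tolower($1) == tolower(k) { print tolower($2); exit }
    ' "$file"
}

# Hypothetical check: report each of the two settings that is not explicitly "no".
ssh_checklist() {
    file="${1:-/etc/ssh/sshd_config}"
    rc=0
    [ "$(sshd_setting PermitRootLogin "$file")" = "no" ] || { echo "PermitRootLogin is not 'no'"; rc=1; }
    [ "$(sshd_setting PasswordAuthentication "$file")" = "no" ] || { echo "PasswordAuthentication is not 'no'"; rc=1; }
    return "$rc"
}

# Usage:
#   ssh_checklist && echo "SSH baseline OK"
```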

Command cheat sheet

# Updates
                            sudo apt update
                            apt list --upgradable
                            sudo apt upgrade
                            sudo apt autoremove
                            sudo apt clean
                            test -f /var/run/reboot-required && cat /var/run/reboot-required

                            # Firewall
                            sudo ufw status verbose
                            sudo ufw default deny incoming
                            sudo ufw default allow outgoing
                            sudo ufw allow OpenSSH
                            sudo ufw allow 80/tcp
                            sudo ufw allow 443/tcp
                            sudo ufw enable

                            # Timeshift
                            sudo apt install timeshift
                            sudo timeshift-gtk
                            sudo timeshift --list

                            # Logs
                            journalctl -xe
                            journalctl -p warning --since today
                            journalctl -u SERVICE --since "1 hour ago"
                            journalctl -k --since today
                            sudo tail -100 /var/log/auth.log
                            less /var/log/apt/history.log

                            # Health
                            systemctl --failed
                            df -h
                            free -h
                            ss -lntp

Final rule

A well-maintained Ubuntu system is updated, protected, observable and recoverable.
Apply updates with a rollback plan, restrict network exposure with UFW, create restore points before risky changes, back up real data, read logs when problems occur, and keep a repeatable maintenance routine.

Minimal safe maintenance profile

Minimum safe profile:
                            - updates applied regularly
                            - reboot-required checked
                            - UFW configured
                            - SSH protected
                            - Timeshift or snapshot before major changes
                            - real data backup
                            - restore tested
                            - logs reviewed
                            - failed services checked
                            - disk usage monitored
                            - maintenance documented

2.1 Ubuntu Installation: Desktop, Server, ISO, UEFI, partitions, SSH, cloud-init and clean post-install

Installation scope

Ubuntu installation depends on the target: desktop workstation, production server, cloud VM, container host, lab machine or hardened bastion. The installation itself is only the first step. A clean Ubuntu setup also includes users, SSH, updates, firewall, time sync, storage layout, service baseline, logs, monitoring and backup strategy.

The professional approach is to install with a clear target architecture: what the machine will host, how it will be accessed, how it will be patched, how it will be monitored, and how it can be rebuilt.

Target	Installer	Key choices	Post-install priority
Desktop workstation	Ubuntu Desktop ISO	GUI, disk encryption, drivers, developer tools.	Updates, IDE, Docker, SSH keys, backups.
Server VM	Ubuntu Server ISO or cloud image	SSH, LVM, static IP, no GUI, minimal packages.	Hardening, firewall, systemd, monitoring.
Cloud server	Cloud image	cloud-init, SSH key, security group, disk size.	Bootstrap automation, logging, backup, patching.
Database server	Server ISO or image	Disk layout, filesystem, I/O, backup volume.	Storage monitoring, backup, security, tuning.
Container host	Server LTS	Disk for Docker, cgroups, kernel, network.	Docker, log rotation, registry access, metrics.

Core rule: an Ubuntu installation is not complete when the machine boots. It is complete when access, security, updates, logs, storage, monitoring and recovery are under control.

Installation flow map

Installation workflow
                            │
                            ├── Choose target
                            │       ├── desktop
                            │       ├── server
                            │       ├── cloud VM
                            │       └── container host
                            │
                            ├── Prepare media
                            │       ├── download ISO
                            │       ├── verify checksum
                            │       ├── create USB key
                            │       └── boot in UEFI mode
                            │
                            ├── Install system
                            │       ├── language and keyboard
                            │       ├── network
                            │       ├── disk layout
                            │       ├── user account
                            │       ├── SSH server
                            │       └── base packages
                            │
                            └── Post-install
                                    ├── update packages
                                    ├── harden SSH
                                    ├── configure firewall
                                    ├── enable monitoring
                                    ├── configure backups
                                    └── document server
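The "verify checksum" step in the flow above can be sketched with `sha256sum`, assuming the `SHA256SUMS` file published next to the image has been downloaded into the same directory as the ISO. The function name `verify_download` is hypothetical.

```shell
# Hypothetical helper: verify downloaded images against a SHA256SUMS list.
# Run it from the directory that holds both the image and the list.
verify_download() {
    sums="${1:-SHA256SUMS}"
    # --ignore-missing skips entries for images that were not downloaded
    sha256sum --check --ignore-missing "$sums"
}

# Usage:
#   cd ~/Downloads && verify_download SHA256SUMS
```

The command fails when a listed file is present but does not match, and also when none of the listed files is present at all.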

Official URLs

Ubuntu downloads:
                            https://ubuntu.com/download

                            Ubuntu Server documentation:
                            https://documentation.ubuntu.com/server/

                            Ubuntu Desktop documentation:
                            https://documentation.ubuntu.com/desktop/

                            Ubuntu release images:
                            https://releases.ubuntu.com/

                            Ubuntu cloud images:
                            https://cloud-images.ubuntu.com/

                            cloud-init documentation:
                            https://cloudinit.readthedocs.io/

Ubuntu Desktop installation

Ubuntu Desktop installation is designed for workstations: developers, engineers, analysts and general desktop users. The main choices are language, keyboard, network, installation type, disk encryption, user account and optional third-party drivers.

Desktop install path

1. Download Ubuntu Desktop ISO
                            2. Verify checksum if required
                            3. Create bootable USB key
                            4. Boot in UEFI mode
                            5. Select language and keyboard
                            6. Connect to network
                            7. Choose normal or minimal install
                            8. Enable third-party drivers if needed
                            9. Choose disk layout
                            10. Enable encryption if laptop or sensitive data
                            11. Create admin user
                            12. Install and reboot
                            13. Remove USB key
                            14. Run updates
                            15. Install development tools

Choice	Recommended option	Reason
Release	LTS for stable workstation.	Less upgrade pressure.
Install type	Normal for general use, minimal for clean dev setup.	Controls preinstalled apps.
Disk encryption	Yes on laptop.	Protects data if machine is lost.
Third-party drivers	Enable if NVIDIA or Wi-Fi requires it.	Improves hardware compatibility.
Partitioning	Automatic unless dual boot or advanced layout.	Simple and safe for most users.

Desktop post-install developer baseline

# Update system
                            sudo apt update
                            sudo apt upgrade

                            # Install useful tools
                            sudo apt install curl wget git vim htop tree unzip ca-certificates

                            # Install build basics
                            sudo apt install build-essential pkg-config

                            # Install Python essentials
                            sudo apt install python3 python3-venv python3-pip

                            # Check version
                            lsb_release -a
                            uname -a
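After the baseline install, a quick check confirms that the expected commands are on `PATH`. This is a sketch; the function name `missing_tools` is hypothetical, and the tool list in the usage line mirrors the packages installed above (`gcc` and `make` come from `build-essential`).

```shell
# Hypothetical helper: print the commands from the argument list that are not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage:
#   missing_tools curl wget git vim htop tree unzip gcc make python3 pip3
```

An empty result means every listed command was found.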

Developer workstation map

Ubuntu Desktop
                            │
                            ├── Terminal
                            ├── Git
                            ├── Python / Node / Java / Go
                            ├── Docker Desktop or Docker Engine
                            ├── IDE
                            ├── SSH keys
                            ├── browser dev tools
                            ├── cloud CLIs
                            └── VPN / security tooling

Desktop rule: for a serious developer machine, keep the OS stable, version your dotfiles, back up important files, use SSH keys, and avoid random system modifications that cannot be reproduced.

Ubuntu Server installation

Ubuntu Server installation is usually text-based and focused on production readiness: network, storage, user account, SSH, package selection and minimal attack surface. A server should normally be installed without a desktop environment.

Server install path

1. Download Ubuntu Server ISO
                            2. Boot in UEFI mode
                            3. Select language and keyboard
                            4. Configure network
                            - DHCP for simple cases
                            - static IP for fixed infrastructure
                            5. Configure proxy if needed
                            6. Configure apt mirror
                            7. Choose disk layout
                            - guided LVM for most servers
                            - manual for advanced storage
                            8. Create admin user
                            9. Install OpenSSH server
                            10. Import SSH key if available
                            11. Select minimal server packages
                            12. Install bootloader
                            13. Reboot
                            14. Connect by SSH
                            15. Run post-install baseline

Server choice	Recommendation	Why
GUI	No GUI on production server.	Lower resource usage and smaller attack surface.
SSH	Install OpenSSH during setup.	Remote administration required.
User	Named sudo user.	Avoid direct root workflow.
Disk	LVM for flexible servers.	Easier resizing and volume management.
Packages	Minimal baseline.	Install only what is needed.

Server install architecture

Bare metal or VM
                            │
                            ▼
                            Ubuntu Server installer
                            │
                            ├── network setup
                            ├── disk layout
                            ├── user creation
                            ├── SSH setup
                            ├── package baseline
                            └── bootloader
                            │
                            ▼
                            First boot
                            │
                            ├── SSH login
                            ├── update packages
                            ├── harden access
                            ├── configure firewall
                            ├── install services
                            └── enable monitoring

First server commands

# Update package index and upgrade
                            sudo apt update
                            sudo apt upgrade

                            # Install baseline tools
                            sudo apt install curl wget vim git htop tree unzip net-tools dnsutils

                            # Check services
                            systemctl --failed
                            systemctl status ssh

                            # Check network
                            ip a
                            ip r
                            ss -lntp

                            # Check storage
                            lsblk
                            df -h

Production warning: do not expose a fresh server directly without SSH hardening, firewall rules, update policy and monitoring.

UEFI, BIOS, boot media and installation verification

Modern Ubuntu installations should normally boot in UEFI mode. UEFI affects the boot partition, bootloader installation and compatibility with Secure Boot. If the USB key is booted in legacy BIOS mode, the system is installed for legacy boot and may not start once the firmware is set to UEFI-only.

Boot concept	Meaning	Practical rule
UEFI	Modern firmware boot mode.	Preferred for new machines and servers.
Legacy BIOS	Older boot mode.	Use only if hardware requires it.
ESP	EFI System Partition.	Required for UEFI boot.
Secure Boot	Firmware validation of boot chain.	Usually supported; test third-party or self-built kernel modules, which may need to be signed.
Boot order	Firmware decides which disk or USB boots first.	Verify after installation.

Boot media preparation

Recommended flow:
                            1. Download ISO from official source
                            2. Verify ISO checksum if required
                            3. Write USB with Rufus, balenaEtcher or dd
                            4. Boot USB in UEFI mode
                            5. Install Ubuntu
                            6. Reboot without USB key
                            7. Confirm system boots from target disk
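
Step 2 of the flow above can be scripted. The following is a minimal sketch, assuming the ISO and the SHA256SUMS file published next to it were downloaded into the same directory; the function name verify_iso and the file names are illustrative only.

```shell
# Check one ISO against the SHA256SUMS file stored in the same directory.
# Returns 0 only if the digest listed for that ISO matches the file on disk.
verify_iso() {
  local dir="$1" iso="$2"
  # Keep only the checksum line(s) naming this ISO, then let sha256sum
  # recompute and compare. An empty selection also counts as a failure.
  ( cd "$dir" && grep -F -- "$iso" SHA256SUMS | sha256sum -c - )
}

# Typical use (hypothetical download directory and ISO name):
# verify_iso ~/Downloads ubuntu-server.iso && echo "checksum OK"
```

A matching checksum only proves the download is intact. Verifying the signature of the SHA256SUMS file itself is a separate step and is not covered by this sketch.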

UEFI disk layout sketch

Disk /dev/sda
                            │
                            ├── EFI System Partition
                            │       ├── size: 512 MB to 1 GB
                            │       ├── filesystem: FAT32
                            │       └── mount: /boot/efi
                            │
                            ├── /boot
                            │       ├── optional separate partition
                            │       └── kernel and initramfs
                            │
                            ├── LVM physical volume
                            │       ├── root volume /
                            │       ├── var volume /var
                            │       ├── home volume /home
                            │       └── swap volume or swapfile
                            │
                            └── free space or data volumes

Boot verification commands

# Check if system booted in UEFI mode
                            test -d /sys/firmware/efi && echo "UEFI boot" || echo "Legacy boot"

                            # Show block devices
                            lsblk -f

                            # Show EFI boot entries
                            sudo efibootmgr -v

                            # Show mounted filesystems
                            findmnt

                            # Show boot partition
                            findmnt /boot/efi

UEFI rule: boot the installer in the same mode you want the installed system to use. Mixing legacy and UEFI often creates bootloader confusion.

Disk layout, partitions, LVM, encryption and swap

Disk layout should reflect the machine's role. A laptop usually benefits from full-disk encryption. A server often benefits from LVM. A database server needs careful storage planning. A Docker host needs enough space under /var/lib/docker.

Pattern	Best for	Strength	Watch out
Automatic layout	Desktop, lab, simple VM.	Fast and low-risk.	Less control over growth areas.
LVM	Servers and VMs.	Flexible resizing and volume management.	Requires basic LVM knowledge.
Encrypted disk	Laptops and sensitive systems.	Protects data at rest.	Remote boot can be harder.
Separate /var	Servers with logs, caches, Docker.	Protects root filesystem from log growth.	Size must be planned.
Separate data volume	Database and application data.	Cleaner backup and scaling.	Mount and permission discipline required.

Example server layout

Small web server:
                            - /boot/efi     512 MB to 1 GB
                            - /             30 GB to 50 GB
                            - /var          20 GB to 100 GB
                            - /home         optional
                            - swap          swapfile or LV
                            - /srv          application data if needed

                            Docker host:
                            - /             30 GB to 50 GB
                            - /var          large volume
                            - /var/lib/docker on dedicated volume if possible

                            Database host:
                            - /             30 GB to 50 GB
                            - /var/log      separate or monitored
                            - /data         dedicated fast volume
                            - /backup       separate volume or external storage

Disk decision tree

Is it a laptop?
                            ├── yes -> enable disk encryption
                            └── no
                            │
                            ▼
                            Is it a production server?
                            ├── yes -> prefer LVM or cloud volume strategy
                            └── no -> automatic layout is acceptable

                            Will logs, Docker or DB grow?
                            ├── yes -> separate /var or data volume
                            └── no -> simple root filesystem

                            Need easy snapshot/resize?
                            ├── yes -> LVM or cloud block volumes
                            └── no -> simple partitioning

Storage inspection commands

# Show disks and partitions
                            lsblk

                            # Show filesystems
                            lsblk -f

                            # Show disk usage
                            df -h

                            # Show directory usage
                            sudo du -sh /var/*

                            # Show mounts
                            findmnt

                            # Show LVM volumes
                            sudo pvs
                            sudo vgs
                            sudo lvs

                            # Show swap
                            swapon --show
                            free -h

Production risk: if /var fills up, logs, Docker, package installs, databases and services can fail. Monitor disk usage from day one.
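
As a starting point for that monitoring, here is a minimal sketch of a usage check. It reads `df -P` style output on standard input, so it can be tried without touching real disks. The function name over_threshold and the 80% limit are illustrative assumptions, and mount points containing spaces are not handled.

```shell
# Print "mountpoint usage%" for every filesystem at or above the limit.
# Expects POSIX `df -P` output on stdin: column 5 is capacity, column 6 the mount point.
over_threshold() {
  local limit="$1"
  awk -v limit="$limit" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)                 # strip the percent sign
      if (use + 0 >= limit) print $6 " " use "%"
    }'
}

# Typical use (hypothetical 80% threshold, ignoring in-memory filesystems):
# df -P -x tmpfs -x devtmpfs | over_threshold 80
```

Run from cron or a systemd timer, the output can feed an alert. A real monitoring agent remains the better long-term answer.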

Network, SSH and remote access baseline

Server installation must make remote access reliable and safe. The minimum baseline is: one named sudo user, SSH key access, password authentication disabled where possible, root login disabled, firewall enabled and only required ports opened.

Area	Baseline	Reason
Admin user	Named user with sudo rights.	Audit and safer administration.
SSH keys	Key-based access.	Stronger than passwords.
Root login	Disabled.	Reduces brute-force exposure and limits blast radius.
Password auth	Disabled after key validation.	Reduces attack surface.
Firewall	Default deny incoming.	Only expose required services.
Network config	DHCP for simple cases, static for infrastructure.	Predictable access.

SSH hardening example

# Backup SSH config
                            sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak

                            # Edit SSH config
                            sudo vim /etc/ssh/sshd_config

                            # Recommended directives in sshd_config (drop-ins under /etc/ssh/sshd_config.d/ can override them)
                            PermitRootLogin no
                            PasswordAuthentication no
                            PubkeyAuthentication yes
                            AllowUsers deploy

                            # Validate and restart
                            sudo sshd -t
                            sudo systemctl restart ssh
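
Editing the file does not guarantee that the values are the ones sshd actually applies, because included files can take precedence. A minimal sketch of a check on the effective configuration follows, assuming the output of `sudo sshd -T` is piped in; the function name check_sshd_effective is illustrative.

```shell
# Read `sshd -T` output on stdin and verify the three hardening directives.
# sshd -T prints the effective configuration with lowercase keywords.
check_sshd_effective() {
  awk '
    $1 == "permitrootlogin"        { root = $2 }
    $1 == "passwordauthentication" { pass = $2 }
    $1 == "pubkeyauthentication"   { pub  = $2 }
    END {
      printf "permitrootlogin=%s passwordauthentication=%s pubkeyauthentication=%s\n", root, pass, pub
      ok = (root == "no" && pass == "no" && pub == "yes")
      exit (ok ? 0 : 1)               # non-zero if any value differs
    }'
}

# Typical use on the server:
# sudo sshd -T | check_sshd_effective || echo "hardening not effective yet"
```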

Firewall baseline

# Enable UFW
                            sudo ufw default deny incoming
                            sudo ufw default allow outgoing

                            # Allow SSH
                            sudo ufw allow OpenSSH

                            # Web server example
                            sudo ufw allow 80/tcp
                            sudo ufw allow 443/tcp

                            # Enable firewall
                            sudo ufw enable

                            # Check status
                            sudo ufw status verbose

Network diagnostic commands

# IP addresses
                            ip a

                            # Routes
                            ip r

                            # DNS status
                            resolvectl status

                            # Listening ports
                            ss -lntp

                            # Test local service
                            curl -I http://localhost

                            # Test remote host
                            ping -c 3 8.8.8.8

                            # Trace path
                            tracepath ubuntu.com

Important: before disabling SSH password authentication, verify that key-based login works in a second terminal. Otherwise, you can lock yourself out.

cloud-init for automated server bootstrap

cloud-init is the standard way to initialize Ubuntu cloud images. It can create users, install packages, add SSH keys, write files, run commands, configure timezone and prepare the machine during first boot.

cloud-init feature	Usage	Example
users	Create admin users.	deploy user with sudo.
ssh_authorized_keys	Install public keys.	Key-based access from first boot.
packages	Install baseline packages.	curl, git, htop, nginx.
write_files	Create config files.	systemd unit, app config, banner.
runcmd	Run final bootstrap commands.	enable firewall, restart service.
package_update	Refresh apt cache.	Update before package install.

Minimal cloud-init example

#cloud-config
package_update: true
package_upgrade: true

users:
  - name: deploy
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh_authorized_keys:
      - ssh-ed25519 AAAA_REPLACE_WITH_PUBLIC_KEY deploy-key

packages:
  - curl
  - wget
  - git
  - vim
  - htop
  - ufw

runcmd:
  - ufw allow OpenSSH
  - ufw --force enable
  - timedatectl set-timezone UTC
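
Hand-editing the key placeholder is error-prone because YAML indentation is significant. One possible approach is to render the user-data from a small shell function. This is a sketch only: the function name render_user_data is hypothetical, and it emits a reduced subset of the example above (no sudo rule, packages or runcmd).

```shell
# Print a small cloud-config document for one user and one public key.
# The heredoc keeps the YAML indentation exactly as cloud-init expects it.
render_user_data() {
  local user="$1" pubkey="$2"
  cat <<EOF
#cloud-config
package_update: true
users:
  - name: ${user}
    groups: sudo
    shell: /bin/bash
    ssh_authorized_keys:
      - ${pubkey}
EOF
}

# Typical use (hypothetical key path and output file):
# render_user_data deploy "$(cat ~/.ssh/id_ed25519.pub)" > user-data.yaml
```

The generated file can then be checked with the schema command shown in the diagnostics before it is passed to a cloud instance.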

cloud-init lifecycle

Cloud VM first boot
                            │
                            ▼
                            cloud-init starts
                            │
                            ├── reads metadata
                            ├── reads user-data
                            ├── configures hostname
                            ├── creates users
                            ├── installs SSH keys
                            ├── installs packages
                            ├── writes files
                            ├── runs commands
                            └── marks initialization done
                            │
                            ▼
                            Server ready for automation
                            ├── Ansible
                            ├── deploy pipeline
                            ├── monitoring
                            └── application install

cloud-init diagnostics

# Show cloud-init status
                            cloud-init status

                            # Wait until finished
                            cloud-init status --wait

                            # Inspect logs
                            sudo less /var/log/cloud-init.log
                            sudo less /var/log/cloud-init-output.log

                            # Validate config file if tool is available
                            cloud-init schema --config-file user-data.yaml

                            # Re-run is not trivial on production
                            # Prefer rebuilding disposable cloud instances

Cloud rule: use cloud-init for first-boot bootstrap, then use Ansible, Terraform, scripts or configuration management for repeatable lifecycle operations.

Clean post-install baseline

Post-install is where a raw Ubuntu machine becomes a clean operating platform. The goal is to make the system secure, updated, observable, recoverable and ready for application deployment.

Post-install baseline commands

# Update system
                            sudo apt update
                            sudo apt upgrade

                            # Install useful admin tools
                            sudo apt install curl wget vim git htop tree unzip ca-certificates dnsutils

                            # Set timezone
                            timedatectl
                            sudo timedatectl set-timezone UTC

                            # Check failed units
                            systemctl --failed

                            # Check logs
                            journalctl -p warning --since today

                            # Check reboot requirement
                            test -f /var/run/reboot-required && cat /var/run/reboot-required

Post-install action	Command / file	Why
Update packages	`sudo apt update && sudo apt upgrade`	Apply latest security fixes.
Create admin user	`adduser`, `usermod -aG sudo`	Avoid root workflow.
Harden SSH	`/etc/ssh/sshd_config`	Reduce remote access risk.
Enable firewall	`ufw`	Expose only required ports.
Configure time	`timedatectl`	Correct logs and certificates.
Install monitoring	agent or exporter	Detect issues early.

Clean server baseline

Fresh Ubuntu Server
                            │
                            ├── system update
                            ├── admin user
                            ├── SSH key access
                            ├── root login disabled
                            ├── password auth disabled
                            ├── firewall enabled
                            ├── timezone configured
                            ├── monitoring installed
                            ├── log policy checked
                            ├── backups configured
                            ├── service manager ready
                            └── runbook documented

Server documentation template

Server record:
                            - hostname
                            - Ubuntu version
                            - kernel version
                            - role
                            - owner
                            - public IP
                            - private IP
                            - SSH port
                            - open firewall ports
                            - installed services
                            - data volumes
                            - backup policy
                            - monitoring URL
                            - patching window
                            - rollback method
                            - emergency contact

Professional habit: document the server immediately after installation. Six months later, this avoids guessing what the machine is and how it was built.
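
Part of the record can be filled in from the machine itself. The sketch below is a minimal illustration, not a complete inventory tool: the function name server_record is hypothetical, and only fields that can be read locally are populated. The role and owner are passed as arguments.

```shell
# Print the start of a server record using facts available on the host.
# Arguments: role and owner (both default to TODO so gaps stay visible).
server_record() {
  local os="unknown"
  if [ -r /etc/os-release ]; then
    # Source the file in a subshell so its variables do not leak.
    os="$(. /etc/os-release && printf '%s' "${PRETTY_NAME:-unknown}")"
  fi
  printf 'hostname: %s\n' "$(uname -n)"
  printf 'Ubuntu version: %s\n' "$os"
  printf 'kernel version: %s\n' "$(uname -r)"
  printf 'role: %s\n' "${1:-TODO}"
  printf 'owner: %s\n' "${2:-TODO}"
  printf 'recorded at: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

# Typical use (hypothetical output path):
# server_record "web server" "ops team" > ~/server-record.txt
```

The remaining fields (IP addresses, backup policy, patching window and so on) still need to be completed by hand or by the provisioning tool.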

Installation troubleshooting

Problem	Likely cause	First diagnostic	Correction
USB does not boot	Bad USB image, wrong boot mode, firmware order.	Check UEFI boot menu.	Rewrite USB, select UEFI USB entry.
Installed system does not boot	Bootloader installed in wrong mode or disk.	Check UEFI entries.	Repair bootloader or reinstall in correct mode.
No network during install	Driver, cable, DHCP, VLAN, Wi-Fi issue.	Check link and IP.	Use wired network or configure static IP.
Cannot SSH after install	SSH not installed, firewall, wrong IP, bad key.	Console login and `systemctl status ssh`.	Install SSH, fix firewall, verify key.
Disk full after install	Small root, logs, Docker, wrong partition plan.	`df -h`, `du -sh`.	Resize volume, clean logs, separate /var.
Package install fails	Broken apt state, no DNS, mirror issue.	`apt update`, DNS check.	Fix DNS or the mirror, then run `dpkg --configure -a`.

Diagnostic decision tree

Fresh server problem
                            │
                            ├── Does it boot?
                            │       ├── no -> UEFI, bootloader, disk
                            │       └── yes
                            │
                            ├── Does it have network?
                            │       ├── no -> IP, route, DNS, driver
                            │       └── yes
                            │
                            ├── Can you SSH?
                            │       ├── no -> ssh service, firewall, key, IP
                            │       └── yes
                            │
                            ├── Are packages working?
                            │       ├── no -> DNS, apt mirror, dpkg lock
                            │       └── yes
                            │
                            └── Are baseline services healthy?
                                    ├── no -> systemctl and journalctl
                                    └── yes -> server ready

Emergency commands

# SSH service
                            sudo systemctl status ssh
                            sudo systemctl restart ssh

                            # Firewall
                            sudo ufw status verbose

                            # Network
                            ip a
                            ip r
                            resolvectl status

                            # Package repair
                            sudo dpkg --configure -a
                            sudo apt -f install
                            sudo apt update

                            # Logs
                            journalctl -p err --since today
                            sudo dmesg -T | tail -100

Incident rule: on a new install, most failures are boot mode, network, SSH, firewall, disk or package mirror. Diagnose in that order.
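
The "diagnose in that order" idea can be expressed as a tiny runner that executes named checks in sequence and stops at the first failure. This is a sketch under stated assumptions: the function name triage is hypothetical, and the example checks in the comments are one plausible choice, not a complete list.

```shell
# Run "name command" pairs in order and report the first check that fails.
# Each command is a string executed with sh -c; its output is discarded.
triage() {
  local name cmd
  while [ "$#" -ge 2 ]; do
    name="$1"; cmd="$2"; shift 2
    if ! sh -c "$cmd" >/dev/null 2>&1; then
      printf 'first failing check: %s\n' "$name"
      return 1
    fi
  done
  printf 'all checks passed\n'
}

# Typical use on a fresh server (example checks, adjust to the host):
# triage \
#   network  'ip route show default | grep -q .' \
#   ssh      'systemctl is-active --quiet ssh' \
#   firewall 'sudo ufw status | grep -q "Status: active"' \
#   mirror   'getent hosts archive.ubuntu.com'
```

Stopping at the first failure keeps attention on the earliest broken layer instead of on its downstream symptoms.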

Final installation checklist

Before install

[ ] Target role is defined
                            [ ] Ubuntu edition selected
                            [ ] LTS version selected
                            [ ] ISO downloaded from official source
                            [ ] Checksum verified if required
                            [ ] Boot media created
                            [ ] UEFI mode confirmed
                            [ ] Disk layout planned
                            [ ] Static IP or DHCP decision made
                            [ ] Hostname chosen
                            [ ] Admin user chosen
                            [ ] SSH key available
                            [ ] Backup of existing data done
                            [ ] Rollback plan exists if replacing server

During install

[ ] Correct disk selected
                            [ ] EFI partition created if UEFI
                            [ ] LVM selected if server needs flexibility
                            [ ] Encryption enabled if needed
                            [ ] OpenSSH server installed
                            [ ] Admin user created
                            [ ] Network works
                            [ ] Bootloader installed correctly
                            [ ] Machine reboots without USB

After install

[ ] System updated
                            [ ] Reboot performed if required
                            [ ] SSH key login tested
                            [ ] Root SSH login disabled
                            [ ] Password SSH disabled after key validation
                            [ ] Firewall enabled
                            [ ] Only required ports open
                            [ ] Timezone and time sync configured
                            [ ] Hostname correct
                            [ ] Disk usage checked
                            [ ] Failed systemd units checked
                            [ ] Monitoring installed
                            [ ] Backup configured
                            [ ] Server documented
                            [ ] Snapshot or image created if needed

Final rule

A clean Ubuntu installation is reproducible, secure and observable.
The machine should boot correctly, be reachable through controlled SSH, have a clear disk layout, expose only required ports, receive updates, produce usable logs, be monitored, be backed up and be documented.

Minimal safe server baseline

Minimum safe server:
                            - Ubuntu Server LTS
                            - named sudo user
                            - SSH key access
                            - root login disabled
                            - firewall enabled
                            - system updated
                            - timezone configured
                            - disk monitored
                            - logs accessible
                            - backup and rollback plan
                            - server record documented

2.2 Ubuntu CLI Basics: files, users, permissions, services, logs, network, storage and troubleshooting

What “Ubuntu CLI basics” means

The Ubuntu command line is the operational control layer of a Linux server. It is used to inspect files, manage users, control services, read logs, diagnose network problems, check storage, install packages, secure access and troubleshoot production incidents.

A good sysadmin workflow is not memorizing thousands of commands. It is knowing which subsystem to inspect first: files, permissions, users, service manager, logs, network, storage, packages or security.

Area	Purpose	Main tools	Typical question
Files	Navigate, copy, move, inspect, search.	`ls`, `cp`, `mv`, `find`, `du`	Where is the file? How large is it?
Permissions	Control who can read, write or execute.	`chmod`, `chown`, `umask`, `stat`	Why can this process not access this file?
Users	Create accounts, groups and sudo rights.	`adduser`, `usermod`, `id`, `sudo`	Who can administer this machine?
Services	Start, stop, enable and debug daemons.	`systemctl`, `journalctl`	Is Nginx, SSH, Redis or PostgreSQL running?
Logs	Understand what happened.	`journalctl`, `tail`, `grep`	What error occurred and when?
Network	Inspect IP, routes, DNS, ports, sockets.	`ip`, `ss`, `curl`, `dig`	Can the server reach or expose the service?
Storage	Inspect disks, mounts, free space, I/O.	`df`, `du`, `lsblk`, `findmnt`	Is the disk full or mounted correctly?

Core rule: in production, do not guess. Inspect facts first: service status, logs, ports, permissions, disk space, memory and recent changes.

CLI diagnostic mental model

Problem on Ubuntu
                            │
                            ├── Is the file present?
                            │       └── ls, find, stat
                            │
                            ├── Are permissions correct?
                            │       └── ls -l, chmod, chown, id
                            │
                            ├── Is the service running?
                            │       └── systemctl status
                            │
                            ├── What do logs say?
                            │       └── journalctl, tail, grep
                            │
                            ├── Is the port listening?
                            │       └── ss -lntp
                            │
                            ├── Is the network path OK?
                            │       └── ip, ping, curl, dig
                            │
                            ├── Is storage full?
                            │       └── df, du, lsblk
                            │
                            └── Did something recently change?
                                    └── apt history, logs, config diff

First 60 seconds on a server

hostnamectl
                            uptime
                            who
                            df -h
                            free -h
                            systemctl --failed
                            ss -lntp
                            journalctl -p warning --since "30 min ago"

Bad reflex: restarting random services without reading logs. It may hide the root cause and make the incident harder to understand.
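
To read before acting, it can help to capture the first-look commands into a single snapshot that is kept with the incident notes. A minimal sketch follows, assuming the default list mirrors the commands above; the function name first_look is hypothetical, and any other list of commands can be passed as arguments.

```shell
# Run each command, label its output, and keep going if one is missing or fails.
# With no arguments, a default first-look list is used.
first_look() {
  local cmd tool
  if [ "$#" -eq 0 ]; then
    set -- "hostnamectl" "uptime" "who" "df -h" "free -h" \
           "systemctl --failed" "ss -lntp" \
           "journalctl -p warning --since '30 min ago' --no-pager"
  fi
  for cmd in "$@"; do
    tool="${cmd%% *}"                      # first word is the program name
    printf '== %s ==\n' "$cmd"
    if command -v "$tool" >/dev/null 2>&1; then
      sh -c "$cmd" 2>&1 || printf '(exit status %s)\n' "$?"
    else
      printf '(%s not installed)\n' "$tool"
    fi
  done
}

# Typical use (hypothetical output file):
# first_look > "first-look-$(date +%Y%m%d-%H%M%S).txt"
```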

Files and directories: navigate, inspect, copy, search

Most Ubuntu administration starts with files: configuration files, logs, service units, application folders, SSH keys, certificates, scripts, backups and data directories.

Command	Usage	Example
`pwd`	Show current directory.	`pwd`
`ls -lah`	List files with details and hidden files.	`ls -lah /etc/nginx`
`cd`	Change directory.	`cd /var/log`
`cp -a`	Copy while preserving attributes.	`cp -a app app.bak`
`mv`	Move or rename.	`mv old.conf new.conf`
`rm`	Remove files.	`rm old.log`
`find`	Search files by name, type, age or size.	`find /var/log -type f -name "*.log"`
`du -sh`	Show directory size.	`du -sh /var/lib/docker`

Essential file commands

# Where am I?
                            pwd

                            # List files with permissions, owner, size and hidden files
                            ls -lah

                            # Copy a directory safely, preserving metadata
                            cp -a /etc/nginx /etc/nginx.bak

                            # Move or rename
                            mv app.conf app.conf.disabled

                            # Remove carefully
                            rm file.txt

                            # Dangerous: recursive delete
                            rm -rf path

                            # Find recent logs
                            find /var/log -type f -name "*.log" -mtime -7

                            # Find large files
                            find /var -type f -size +100M -exec ls -lh {} \;

Linux filesystem map

/
                            ├── bin      essential binaries
                            ├── boot     kernel and boot files
                            ├── dev      devices
                            ├── etc      system configuration
                            ├── home     user home directories
                            ├── lib      system libraries
                            ├── media    removable media
                            ├── mnt      temporary mounts
                            ├── opt      optional software
                            ├── proc     kernel/process virtual filesystem
                            ├── root     root user home
                            ├── run      runtime state
                            ├── sbin     system binaries
                            ├── srv      service/application data
                            ├── sys      kernel/device virtual filesystem
                            ├── tmp      temporary files
                            ├── usr      user-space programs and libraries
                            └── var      logs, cache, spool, databases, runtime data

Useful inspection commands

# Show file type
                            file /path/to/file

                            # Show file metadata
                            stat /path/to/file

                            # Read first lines
                            head -50 /var/log/syslog

                            # Read last lines
                            tail -100 /var/log/syslog

                            # Follow a log live
                            tail -f /var/log/syslog

                            # Search inside files
                            grep -R "error" /etc/nginx

                            # Compare two files
                            diff -u old.conf new.conf

Production habit: before editing a config file, create a timestamped backup: `sudo cp -a file file.bak.$(date +%Y%m%d-%H%M%S)`.
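
The same habit can be wrapped in a small helper so the backup name is never typed by hand. This is a minimal sketch; the function name backup_file is hypothetical, and files owned by root would need the call to run under sudo.

```shell
# Copy a file next to itself with a timestamp suffix and print the backup path.
# cp -a preserves mode, ownership (when permitted) and timestamps.
backup_file() {
  local src="$1" dest
  dest="${src}.bak.$(date +%Y%m%d-%H%M%S)"
  cp -a -- "$src" "$dest" && printf '%s\n' "$dest"
}

# Typical use (hypothetical file in the current directory):
# backup_file app.conf
```

Printing the destination path makes it easy to run `diff -u` against the backup after the edit.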

Permissions: rwx, ownership, groups, umask and safe defaults

Linux permissions define who can read, write or execute a file. Many application failures on Ubuntu servers trace back to one of these causes: wrong owner, wrong group, missing execute bit on a parent directory, an overly permissive file, incorrect SSH key permissions, or a service user that cannot access the application files.

Permission notation

Example:
                            -rw-r--r-- 1 root root 1200 app.conf

                            Breakdown:
                            -       file type
                            rw-     owner permissions
                            r--     group permissions
                            r--     others permissions

                            r = read
                            w = write
                            x = execute / enter directory

Mode	Meaning	Typical use
`600`	Owner read/write only.	Private keys, secrets.
`644`	Owner read/write, everyone else read-only.	Config files, static files.
`700`	Owner full access only.	Private directories, `.ssh`.
`755`	Owner full access, everyone else read/execute.	Directories, scripts, web static dirs.
`777`	Everyone can do everything.	Almost never acceptable.
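
Each octal digit is simply the sum of read (4), write (2) and execute (1) for owner, group and others. The sketch below makes that mapping explicit by converting a numeric mode into the rwx notation shown earlier; the function name mode_to_rwx is hypothetical.

```shell
# Convert an octal mode such as 644 into its rwx string (rw-r--r--).
# Each digit is handled independently: 4 = r, 2 = w, 1 = x.
mode_to_rwx() {
  local mode="$1" out="" rest digit
  while [ -n "$mode" ]; do
    rest="${mode#?}"              # everything after the first character
    digit="${mode%"$rest"}"       # the first character itself
    mode="$rest"
    case "$digit" in
      0) out="${out}---" ;;
      1) out="${out}--x" ;;
      2) out="${out}-w-" ;;
      3) out="${out}-wx" ;;
      4) out="${out}r--" ;;
      5) out="${out}r-x" ;;
      6) out="${out}rw-" ;;
      7) out="${out}rwx" ;;
      *) printf 'invalid mode digit: %s\n' "$digit" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$out"
}

# Typical use:
# mode_to_rwx 755
```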

Permission commands

# Show permissions
                            ls -lah /srv/app

                            # Show user and group identity
                            id deploy

                            # Change owner
                            sudo chown deploy:www-data /srv/app

                            # Change owner recursively
                            sudo chown -R deploy:www-data /srv/app

                            # Change file permissions
                            chmod 644 config.ini

                            # Change directory permissions
                            chmod 755 /srv/app

                            # SSH key permissions
                            chmod 700 ~/.ssh
                            chmod 600 ~/.ssh/id_ed25519
                            chmod 600 ~/.ssh/authorized_keys

                            # Show default creation mask
                            umask
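
The umask removes permission bits from newly created files (which start from 666) and directories (which start from 777). A minimal sketch of that arithmetic, using a hypothetical umask_effect function that works in a temporary directory and leaves the caller's umask untouched:

```shell
# Hypothetical demo: show the modes a given umask produces for new entries.
umask_effect() {
    (
        # Subshell: the umask change does not leak into the calling shell
        umask "$1"
        work=$(mktemp -d)
        touch "$work/file"          # new files start from 666
        mkdir "$work/dir"           # new directories start from 777
        echo "$(stat -c %a "$work/file") $(stat -c %a "$work/dir")"
        rm -rf "$work"
    )
}

umask_effect 022    # common default: files 644, directories 755
umask_effect 027    # stricter: files 640, directories 750
```

A stricter value such as 027 is one possible safe default for service accounts, since nothing they create is readable by "others".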

Permission troubleshooting flow

Permission denied
                            │
                            ├── Which user runs the process?
                            │       └── ps aux | grep service
                            │
                            ├── Who owns the file?
                            │       └── ls -lah file
                            │
                            ├── Can the user access parent directories?
                            │       └── namei -l /path/to/file
                            │
                            ├── Is the group correct?
                            │       └── id user
                            │
                            └── Are permissions too strict or too broad?
                                    └── chmod / chown carefully

Production rule: never solve permission problems with chmod 777. Fix ownership, groups and minimal required permissions.
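
One way to audit an application tree for leftovers of a chmod 777 fix is to search for the "others write" bit. A sketch, with find_world_writable as a hypothetical helper name:

```shell
# Hypothetical helper: list world-writable files and directories under a tree.
# -xdev stays on one filesystem; -perm -0002 matches the "others write" bit.
# Symlinks are skipped because their own mode bits carry no meaning.
find_world_writable() {
    find "$1" -xdev ! -type l -perm -0002 -print
}
```

Running find_world_writable /srv/app should ideally print nothing. Anything it lists is a candidate for chmod o-w combined with the correct owner and group.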

Users, groups, sudo and SSH access

Ubuntu administration should use named users with sudo privileges, not direct root logins. This improves traceability, reduces operational risk and supports least-privilege access. For production, SSH keys should be preferred over passwords.

Task	Command	Purpose
Create user	`sudo adduser deploy`	Create named account.
Add sudo rights	`sudo usermod -aG sudo deploy`	Allow admin actions.
Inspect identity	`id deploy`	Show UID, GID and groups.
Show groups	`groups deploy`	Confirm group membership.
Check sudo rights	`sudo -l`	Show allowed sudo commands.
Lock password	`sudo usermod -L user`	Disable password login; SSH key login is not affected.

User management examples

# Create admin user
                            sudo adduser deploy
                            sudo usermod -aG sudo deploy

                            # Check user
                            id deploy
                            groups deploy

                            # Switch user
                            su - deploy

                            # Test sudo permissions
                            sudo -l

                            # Add user to web group
                            sudo usermod -aG www-data deploy

                            # Lock user password
                            sudo passwd -l deploy

SSH access model

Admin workstation
                            │
                            ├── private key
                            └── public key
                            │
                            ▼
                            Ubuntu server
                            │
                            ├── /home/deploy/.ssh/authorized_keys
                            ├── sshd service
                            ├── firewall allows SSH
                            └── sudo controls privilege escalation
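
The server side of this model comes down to one file with strict modes. The sketch below shows roughly what ssh-copy-id does on the target account; install_pubkey is a hypothetical name and the key shown in the usage note is a placeholder.

```shell
# Hypothetical helper: append a public key to <home>/.ssh/authorized_keys
# with the strict modes sshd expects. Run it as the account owner.
install_pubkey() {
    home_dir="$1"
    pubkey="$2"
    mkdir -p "$home_dir/.ssh" && chmod 700 "$home_dir/.ssh" || return 1
    touch "$home_dir/.ssh/authorized_keys"
    chmod 600 "$home_dir/.ssh/authorized_keys"
    # Skip the append if this exact line is already present
    grep -qxF -- "$pubkey" "$home_dir/.ssh/authorized_keys" ||
        printf '%s\n' "$pubkey" >> "$home_dir/.ssh/authorized_keys"
}
```

Example: install_pubkey /home/deploy "ssh-ed25519 AAAA...placeholder deploy@workstation". In practice, ssh-copy-id deploy@server from the workstation is the usual way to do the same thing.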

SSH hardening baseline

# Backup SSH config
                            sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak

                            # Recommended settings in /etc/ssh/sshd_config
                            PermitRootLogin no
                            PasswordAuthentication no
                            PubkeyAuthentication yes
                            AllowUsers deploy

                            # Validate syntax
                            sudo sshd -t

                            # Restart SSH
                            sudo systemctl restart ssh

                            # Check logs
                            journalctl -u ssh --since today

Access control checklist

[ ] One named admin user
                            [ ] SSH key installed
                            [ ] User belongs to sudo group only if required
                            [ ] Root SSH login disabled
                            [ ] Password authentication disabled after key test
                            [ ] Unused users disabled
                            [ ] sudoers changes made with visudo
                            [ ] SSH access logs reviewed

Safe habit: keep one open SSH session while changing SSH configuration, then test a second connection before closing the first one.
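
Part of the checklist can be verified mechanically. The following sketch checks a single sshd_config-style file for three of the baseline settings; audit_sshd_config is a hypothetical helper, and it deliberately ignores Include files and Match blocks.

```shell
# Hypothetical checker: report which baseline settings are missing from a
# sshd_config-style file. Matching is case-insensitive, as sshd keywords are.
audit_sshd_config() {
    cfg="$1"
    missing=0
    for want in "PermitRootLogin no" "PasswordAuthentication no" "PubkeyAuthentication yes"; do
        key=${want% *}
        val=${want#* }
        if grep -Eiq "^[[:space:]]*$key[[:space:]]+$val[[:space:]]*$" "$cfg"; then
            echo "ok       $want"
        else
            echo "MISSING  $want"
            missing=1
        fi
    done
    return "$missing"
}
```

Because drop-in files and defaults also matter, the effective configuration is better confirmed with sudo sshd -T, which prints the values sshd would actually use.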

Services with systemd: status, start, stop, enable, logs

Ubuntu uses systemd to manage services. A service can be running now, enabled at boot, failed, disabled, masked or waiting on dependencies. Most production daemons such as SSH, Nginx, PostgreSQL, Redis, Docker, Gunicorn and Celery are managed by systemd.

Command	Meaning	Example
`status`	Show state, PID, recent logs.	`systemctl status nginx`
`start`	Start now.	`sudo systemctl start nginx`
`stop`	Stop now.	`sudo systemctl stop nginx`
`restart`	Stop and start again.	`sudo systemctl restart nginx`
`reload`	Reload config without full restart if supported.	`sudo systemctl reload nginx`
`enable`	Start automatically at boot.	`sudo systemctl enable nginx`
`disable`	Do not start automatically at boot.	`sudo systemctl disable nginx`

Essential systemd commands

# Service status
                            systemctl status nginx

                            # Start / stop / restart
                            sudo systemctl start nginx
                            sudo systemctl stop nginx
                            sudo systemctl restart nginx

                            # Enable at boot
                            sudo systemctl enable nginx

                            # Disable at boot
                            sudo systemctl disable nginx

                            # Show failed services
                            systemctl list-units --type=service --state=failed

                            # Show enabled services
                            systemctl list-unit-files --type=service --state=enabled

Service troubleshooting flow

Service is down
                            │
                            ├── Check status
                            │       └── systemctl status service
                            │
                            ├── Read service logs
                            │       └── journalctl -u service
                            │
                            ├── Validate config
                            │       └── nginx -t / sshd -t / app-specific check
                            │
                            ├── Check port binding
                            │       └── ss -lntp
                            │
                            ├── Check permissions
                            │       └── ls -lah, id service-user
                            │
                            └── Restart only after understanding error
                                    └── systemctl restart service

Custom service unit example

[Unit]
                            Description=Gunicorn Django application
                            After=network.target

                            [Service]
                            User=deploy
                            Group=www-data
                            WorkingDirectory=/srv/myapp
                            Environment="DJANGO_SETTINGS_MODULE=config.settings"
                            ExecStart=/srv/myapp/venv/bin/gunicorn config.wsgi:application \
                            --bind 127.0.0.1:8000 \
                            --workers 3
                            Restart=always
                            RestartSec=5

                            [Install]
                            WantedBy=multi-user.target

Install custom unit

sudo cp gunicorn.service /etc/systemd/system/gunicorn.service
                            sudo systemctl daemon-reload
                            sudo systemctl enable gunicorn
                            sudo systemctl start gunicorn
                            systemctl status gunicorn
                            journalctl -u gunicorn -f
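
Before copying a unit file into /etc/systemd/system, a rough pre-flight check can catch obvious omissions. The sketch below only looks for the pieces used in the example unit; check_unit_file is a hypothetical helper, not a parser, and it expects a file without leading indentation.

```shell
# Hypothetical pre-flight check for a service unit file before installing it.
check_unit_file() {
    unit="$1"
    rc=0
    for section in '[Unit]' '[Service]' '[Install]'; do
        grep -qxF -- "$section" "$unit" || { echo "missing section $section"; rc=1; }
    done
    grep -q '^ExecStart=' "$unit" || { echo "missing ExecStart="; rc=1; }
    # Without User=, a system service runs as root
    grep -q '^User=' "$unit" || echo "note: no User= set, service would run as root"
    return "$rc"
}
```

For a real validation, systemd-analyze verify ./gunicorn.service performs a much more complete check of the unit and its dependencies.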

Reflex: if a service fails, use systemctl status then journalctl -u service. Do not debug blindly.

Logs: journald, syslog, auth logs and application logs

Logs record what the system and its services reported at the time of an incident. On Ubuntu, the systemd journal is read with journalctl, and traditional text logs usually live under /var/log. Applications may log to journald, to files, to Docker logs or to external observability tools.

Log source	What it contains	Command or path
systemd journal	Service logs and system events.	`journalctl`
Service unit logs	Specific service output.	`journalctl -u nginx`
Auth logs	SSH, sudo, authentication events.	`/var/log/auth.log`
Syslog	General system messages.	`/var/log/syslog`
Kernel logs	Kernel and hardware messages.	`dmesg`
Application logs	App-specific runtime errors.	App path, journald or Docker logs.

journalctl essentials

# Jump to the newest entries, with explanatory text where available
                            journalctl -xe

                            # Logs for one service
                            journalctl -u nginx

                            # Follow service logs live
                            journalctl -u nginx -f

                            # Logs since a time
                            journalctl -u nginx --since "1 hour ago"

                            # Logs since today
                            journalctl -u ssh --since today

                            # Warnings and errors
                            journalctl -p warning --since today

                            # Boot logs
                            journalctl -b

                            # Previous boot
                            journalctl -b -1

Classic log commands

# Last lines
                            tail -n 200 /var/log/syslog
                            tail -n 200 /var/log/auth.log

                            # Follow file live
                            tail -f /var/log/syslog

                            # Search errors
                            grep -i "error" /var/log/syslog

                            # Search SSH failures
                            grep -i "failed" /var/log/auth.log

                            # Compressed rotated logs
                            zgrep -i "error" /var/log/syslog.*.gz

                            # Recent kernel messages (sudo is needed where kernel.dmesg_restrict is enabled)
                            sudo dmesg -T | tail -100
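
Searching for "failed" gives raw lines; a short pipeline can turn them into a per-address summary. This sketch assumes the usual OpenSSH message shape ("Failed password for ... from ADDRESS port N"), and failed_ssh_by_ip is a hypothetical helper that reads log text on stdin.

```shell
# Hypothetical helper: count failed SSH password attempts per source address.
# Reads auth.log-style text on stdin, so it works with grep, zcat or journalctl.
failed_ssh_by_ip() {
    grep 'Failed password' |
        grep -oE 'from [0-9a-fA-F:.]+ port' |
        awk '{ print $2 }' |
        sort | uniq -c | sort -rn
}
```

Possible uses: sudo cat /var/log/auth.log | failed_ssh_by_ip | head, or journalctl -u ssh --since today --no-pager | failed_ssh_by_ip. The output is a count followed by the address, highest count first.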

Log diagnosis map

Incident type
                            │
                            ├── Service fails
                            │       └── journalctl -u service
                            │
                            ├── SSH login issue
                            │       └── journalctl -u ssh, /var/log/auth.log
                            │
                            ├── Kernel or hardware issue
                            │       └── dmesg -T
                            │
                            ├── Package install issue
                            │       └── /var/log/apt/history.log
                            │
                            ├── Web server issue
                            │       └── nginx/apache logs + journal
                            │
                            └── App issue
                                    └── app logs + service journal

Apt history

# See package changes
                            less /var/log/apt/history.log

                            # See apt terminal output
                            less /var/log/apt/term.log

Production habit: always include a time window in log commands, for example --since "30 min ago". It cuts noise and keeps the output focused on the incident.

Network: IP, routes, DNS, ports, sockets, firewall

Network troubleshooting should follow a strict order: local IP, route, DNS, firewall, listening socket, service health, upstream application. This avoids confusing a DNS issue with a service issue, or a firewall issue with an application crash.

Layer	Question	Command
Interface	Does the server have an IP?	`ip a`
Route	Does it know where to send traffic?	`ip r`
DNS	Can names resolve?	`resolvectl status`, `dig`
Port	Is the service listening?	`ss -lntp`
Firewall	Is traffic allowed?	`ufw status verbose`
HTTP test	Does the endpoint respond?	`curl -I`

Network essentials

# IP addresses
                            ip a

                            # Routes
                            ip r

                            # Listening TCP ports with process
                            ss -lntp

                            # Established TCP connections with process
                            ss -ntp state established

                            # DNS status
                            resolvectl status

                            # DNS query
                            dig example.com

                            # HTTP check
                            curl -I https://example.com

                            # Basic reachability
                            ping -c 3 1.1.1.1

                            # Path test
                            tracepath example.com

Network troubleshooting flow

Network problem
                            │
                            ├── Local IP present?
                            │       └── ip a
                            │
                            ├── Default route present?
                            │       └── ip r
                            │
                            ├── DNS working?
                            │       └── dig domain
                            │
                            ├── Firewall allows traffic?
                            │       └── ufw status verbose
                            │
                            ├── Service listening?
                            │       └── ss -lntp
                            │
                            ├── Local curl works?
                            │       └── curl -I http://localhost
                            │
                            └── Remote curl works?
                                    └── curl -I https://public-domain

Firewall commands

# Status
                            sudo ufw status verbose

                            # Default rules
                            sudo ufw default deny incoming
                            sudo ufw default allow outgoing

                            # Allow SSH
                            sudo ufw allow OpenSSH

                            # Allow web ports
                            sudo ufw allow 80/tcp
                            sudo ufw allow 443/tcp

                            # Enable
                            sudo ufw enable

                            # Delete a rule
                            sudo ufw delete allow 80/tcp

Pattern: IP → route → DNS → firewall → listening port → service logs → application.
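
The "listening port" step of this pattern is easy to script. The sketch below parses the header-less output of ss -lntH, where the fourth field is the local address and port; listening_on is a hypothetical helper and reads stdin so it can also be tried on captured output.

```shell
# Hypothetical helper: succeed if a TCP port appears in `ss -lntH` output.
listening_on() {
    awk -v port="$1" '
        # Field 4 is Local Address:Port; compare the text after the last colon
        { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
        END { exit !found }
    '
}
```

Example: ss -lntH | listening_on 8000 && curl -I http://localhost:8000. If the function fails, the problem is on the service side (not started, wrong bind address or port) rather than in DNS or the firewall.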

Storage: disks, mounts, usage, LVM, swap and full-disk incidents

Storage problems are among the most common Linux incidents. A full root filesystem, a full /var, a missing mount, broken permissions on a data directory or uncontrolled Docker logs can stop services even when CPU and memory look fine.

Command	Purpose	Example
`df -h`	Show filesystem free space.	`df -h`
`du -sh`	Show directory size.	`du -sh /var/*`
`lsblk`	Show disks and partitions.	`lsblk -f`
`findmnt`	Show mounted filesystems.	`findmnt /var`
`swapon`	Show swap devices/files.	`swapon --show`
`lvs`	Show LVM logical volumes.	`sudo lvs`

Storage essentials

# Filesystem usage
                            df -h

                            # Directory usage
                            sudo du -sh /var/*
                            sudo du -sh /var/log/*
                            sudo du -sh /var/lib/docker/*

                            # Disks and filesystems
                            lsblk
                            lsblk -f

                            # Mounted filesystems
                            findmnt

                            # Swap
                            swapon --show
                            free -h

                            # LVM if used
                            sudo pvs
                            sudo vgs
                            sudo lvs
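
When df -h shows a filesystem filling up, the next question is which directory is responsible. A small sketch for ranking first-level directories by size; biggest_dirs is a hypothetical name, and sizes are printed in KiB so the sort is exact.

```shell
# Hypothetical helper: largest first-level directories under a path, in KiB.
# -x stays on one filesystem; unreadable subtrees are skipped silently.
biggest_dirs() {
    du -xk --max-depth=1 -- "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

Example: biggest_dirs /var 5. The first line is normally the total for the directory itself. Run it from a root shell when the tree contains directories your user cannot read, otherwise those sizes are undercounted.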

Full disk incident flow

Disk alert or service failure
                            │
                            ├── Check filesystems
                            │       └── df -h
                            │
                            ├── Identify large directories
                            │       └── du -sh /*
                            │
                            ├── Focus common growth areas
                            │       ├── /var/log
                            │       ├── /var/lib/docker
                            │       ├── /var/lib/postgresql
                            │       ├── /tmp
                            │       └── application uploads
                            │
                            ├── Clean safely
                            │       ├── rotate logs
                            │       ├── prune Docker carefully
                            │       └── archive/delete known files
                            │
                            └── Prevent recurrence
                                    ├── monitoring
                                    ├── logrotate
                                    ├── retention policy
                                    └── larger/separate volume
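
The "monitoring" branch can start as a very small check. The sketch below reads df -P output (POSIX format, one filesystem per line) and prints mount points at or above a usage threshold; fs_over_threshold is a hypothetical helper and does not handle mount points containing spaces.

```shell
# Hypothetical check: print filesystems at or above a usage threshold.
# Exit status is 0 when at least one filesystem is over the limit.
fs_over_threshold() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            use = $5; sub(/%/, "", use)      # "98%" -> "98"
            if (use + 0 >= limit + 0) { print $6, $5; hit = 1 }
        }
        END { exit !hit }
    '
}
```

Example: df -P -x tmpfs -x devtmpfs | fs_over_threshold 85 && echo "disk usage needs attention". A real deployment would normally rely on a monitoring agent, with a script like this only as a stopgap.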

Safe cleanup examples

# Clean apt cache
                            sudo apt clean

                            # Remove unused packages
                            sudo apt autoremove

                            # Vacuum journal logs older than 14 days
                            sudo journalctl --vacuum-time=14d

                            # Show Docker usage
                            docker system df

                            # Docker cleanup - use carefully
                            docker system prune

Production warning: never delete unknown database files manually. For PostgreSQL, MySQL, MariaDB, Redis or Docker volumes, understand the data path before cleanup.

Troubleshooting patterns: from symptom to root cause

Troubleshooting on Ubuntu should follow a repeatable sequence: observe, isolate, verify, change one thing, measure again, document. Most incidents can be reduced to service state, logs, ports, permissions, network, storage, memory or recent changes.

Symptom	First checks	Common causes
Service down	`systemctl status`, `journalctl -u`	Bad config, dependency, permission, port conflict.
502 from Nginx	Nginx logs, upstream service, socket/port.	Gunicorn down, wrong socket, app error.
SSH blocked	SSH service, firewall, key, auth logs.	Bad key, password disabled, UFW, fail2ban.
Cannot install package	`apt update`, DNS, locks, dpkg state.	Mirror, DNS, interrupted install, lock file.
Disk full	`df -h`, `du -sh`.	Logs, Docker, DB, uploads, backups.
App permission error	`ls -lah`, `id`, `namei -l`.	Wrong owner, group, parent directory permissions.
DNS issue	`resolvectl status`, `dig`.	Resolver config, firewall, network, cloud DNS.

Universal incident decision tree

Application not working
                            │
                            ├── Is the server alive?
                            │       └── ping, SSH, cloud console
                            │
                            ├── Is disk full?
                            │       └── df -h
                            │
                            ├── Is memory exhausted?
                            │       └── free -h, top
                            │
                            ├── Is the service running?
                            │       └── systemctl status service
                            │
                            ├── What do logs say?
                            │       └── journalctl -u service
                            │
                            ├── Is the port listening?
                            │       └── ss -lntp
                            │
                            ├── Is firewall blocking?
                            │       └── ufw status verbose
                            │
                            ├── Is DNS/routing OK?
                            │       └── ip r, resolvectl, dig
                            │
                            └── Did a recent change happen?
                                    └── apt history, deploy logs, config diff

Useful “one screen” diagnostic

echo "== HOST ==" && hostnamectl
                            echo "== UPTIME ==" && uptime
                            echo "== DISK ==" && df -h
                            echo "== MEMORY ==" && free -h
                            echo "== FAILED UNITS ==" && systemctl --failed
                            echo "== PORTS ==" && ss -lntp
                            echo "== WARNINGS ==" && journalctl -p warning --since "30 min ago" --no-pager

Incident discipline: isolate scope first. Is it one service, one port, one user, one disk, one host, one network path or the whole platform?

Ubuntu CLI cheat sheet and production checklist

Core cheat sheet

# Files
                            ls -lah
                            cp -a src dst
                            mv old new
                            rm file
                            find /path -name "*.log"
                            du -sh *
                            df -h

                            # Permissions
                            ls -l
                            chmod 644 file
                            chmod 755 dir
                            sudo chown user:group file
                            id user
                            namei -l /path/to/file

                            # Users
                            sudo adduser user
                            sudo usermod -aG sudo user
                            groups user
                            sudo -l

                            # Services
                            systemctl status service
                            sudo systemctl restart service
                            sudo systemctl enable service
                            systemctl --failed

                            # Logs
                            journalctl -u service -f
                            journalctl -p warning --since today
                            tail -f /var/log/syslog

                            # Network
                            ip a
                            ip r
                            ss -lntp
                            curl -I http://localhost
                            dig domain
                            resolvectl status

                            # Storage
                            lsblk -f
                            findmnt
                            swapon --show
                            free -h

Production sysadmin baseline

[ ] I know the server role
                            [ ] I know the Ubuntu version
                            [ ] I know which services must run
                            [ ] I know which ports must listen
                            [ ] I know where logs are
                            [ ] I know which user runs each app
                            [ ] I know where configs are
                            [ ] I know where data is stored
                            [ ] I know backup location
                            [ ] I know firewall rules
                            [ ] I know how to restart safely
                            [ ] I know how to rollback
                            [ ] I avoid chmod 777
                            [ ] I avoid root direct login
                            [ ] I document changes

Final rule

The Ubuntu CLI is a production microscope.
It lets you inspect the real state of the machine: files, permissions, users, services, logs, ports, network paths, disks and failures. Good troubleshooting means reading evidence before making changes.

Troubleshooting order

1. Observe symptoms
                            2. Check server health
                            3. Check disk and memory
                            4. Check service state
                            5. Read logs
                            6. Check ports
                            7. Check network and DNS
                            8. Check permissions
                            9. Check recent changes
                            10. Apply one fix
                            11. Verify
                            12. Document

3.1 Ubuntu Packages: APT, Snap, repositories, updates, pinning, security and production practices

Package management on Ubuntu

Ubuntu package management is mainly based on APT, which installs, upgrades and removes software from configured repositories and resolves dependencies along the way. Ubuntu also supports Snap, a package format designed for sandboxed applications with automatic refresh behavior.

In production, package management is not only about installing software. It controls security patching, dependency stability, reproducibility, rollback strategy, package provenance, compliance and operational risk.

Tool	Role	Typical usage	Production concern
APT	Main Ubuntu package manager frontend.	Install Nginx, PostgreSQL, Redis, Python packages from Ubuntu repos.	Repository control, upgrade policy, dependency stability.
dpkg	Low-level Debian package tool.	Inspect installed packages or install local `.deb` files.	Does not resolve dependencies like APT.
Snap	Sandboxed application packaging.	Desktop apps, selected server tools, Canonical ecosystem packages.	Automatic refresh, policy control, mixed packaging strategy.
PPA	Personal Package Archive: a third-party repository hosted on Launchpad.	Newer package versions or vendor-specific builds.	Trust, support, upgrade conflicts, governance.
Vendor repo	Repository maintained by software vendor.	Docker, PostgreSQL, NodeSource, Elastic, HashiCorp.	Key management, package pinning, lifecycle tracking.

Core rule: production package management must be intentional: approved repositories, known package versions, tested updates, documented rollback and clear ownership.

Package management architecture

Ubuntu package flow
                            │
                            ├── Repository configuration
                            │       ├── Ubuntu official repositories
                            │       ├── security repositories
                            │       ├── updates repositories
                            │       ├── PPAs
                            │       └── vendor repositories
                            │
                            ├── APT metadata
                            │       ├── package lists
                            │       ├── versions
                            │       ├── dependencies
                            │       └── priorities
                            │
                            ├── Package operations
                            │       ├── install
                            │       ├── upgrade
                            │       ├── remove
                            │       ├── purge
                            │       └── autoremove
                            │
                            └── Operational controls
                                    ├── pinning
                                    ├── holds
                                    ├── unattended upgrades
                                    ├── reboot policy
                                    └── rollback plan

Decision map

Need standard server package?
                            └── use APT from Ubuntu repositories

                            Need vendor-supported latest version?
                            └── use official vendor repository

                            Need experimental or community package?
                            └── use PPA only with governance

                            Need desktop-style sandboxed app?
                            └── Snap can be acceptable

                            Need strict production reproducibility?
                            └── prefer APT + pinned versions + image build
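
As an illustration of the "pinned versions" branch, the sketch below generates an APT preferences stanza in a staging directory so it can be reviewed before being copied to /etc/apt/preferences.d/. The function name write_apt_pin, the version pattern and the default priority of 1001 are assumptions for the example; apt_preferences(5) describes what each priority range means.

```shell
# Hypothetical generator: write an APT pin stanza into a staging directory.
# Arguments: package, version pattern, output directory, optional priority.
write_apt_pin() {
    pkg="$1"; version_glob="$2"; out_dir="$3"; prio="${4:-1001}"
    mkdir -p "$out_dir" || return 1
    cat > "$out_dir/$pkg.pref" <<EOF
Package: $pkg
Pin: version $version_glob
Pin-Priority: $prio
EOF
    echo "$out_dir/$pkg.pref"
}
```

Example: write_apt_pin nginx '1.24.*' ./staging, then inspect the file, copy it with sudo, and confirm the result with apt policy nginx. For simply freezing a package at its installed version, sudo apt-mark hold nginx is the lighter option.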

APT basics: install, upgrade, remove, inspect

APT is the standard daily tool for Ubuntu package operations. It downloads package metadata, resolves dependencies, installs software, upgrades packages and removes software cleanly.

Command	Purpose	Example
`apt update`	Refresh repository metadata.	`sudo apt update`
`apt upgrade`	Upgrade installed packages without removing packages.	`sudo apt upgrade`
`apt full-upgrade`	Upgrade with dependency changes, installs/removals if needed.	`sudo apt full-upgrade`
`apt install`	Install package.	`sudo apt install nginx`
`apt remove`	Remove package but keep config files.	`sudo apt remove nginx`
`apt purge`	Remove package and config files.	`sudo apt purge nginx`
`apt autoremove`	Remove unused dependencies.	`sudo apt autoremove`
`apt policy`	Show installed and candidate version.	`apt policy nginx`

Essential APT commands

# Refresh package metadata
                            sudo apt update

                            # Show upgradeable packages
                            apt list --upgradable

                            # Upgrade packages
                            sudo apt upgrade

                            # Install package
                            sudo apt install nginx

                            # Show package information
                            apt show nginx

                            # Show package versions and source repository
                            apt policy nginx

                            # Search package
                            apt search postgresql

                            # Remove package but keep configuration
                            sudo apt remove nginx

                            # Remove package and configuration
                            sudo apt purge nginx

                            # Remove unused dependencies
                            sudo apt autoremove

APT vs dpkg

Tool	Best for	Important detail
`apt`	Normal package management.	Resolves dependencies from repositories.
`apt-cache`	Older metadata inspection commands.	Still useful in scripts and diagnostics.
`dpkg`	Inspect or install local Debian packages.	Does not automatically resolve dependencies.
`apt-file`	Find which package provides a file.	Must be installed first, then indexed with `sudo apt-file update`.

Package inspection

# List installed packages
                            dpkg -l

                            # Filter installed packages
                            dpkg -l | grep nginx

                            # Show files installed by package
                            dpkg -L nginx

                            # Find which package owns a file
                            dpkg -S /usr/sbin/nginx

                            # Show package version
                            dpkg -s nginx | grep Version

                            # Show apt history
                            less /var/log/apt/history.log

                            # Show apt terminal logs
                            less /var/log/apt/term.log

Production habit: before upgrading, run apt list --upgradable and review critical packages such as kernel, OpenSSL, database, web server and runtime.
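
The review step can be narrowed to a watch list. This sketch filters the output of apt list --upgradable (lines of the form name/suite version ...) for a few critical package families; critical_upgrades is a hypothetical helper and the pattern list is only an example to adapt.

```shell
# Hypothetical filter: keep only upgradable packages that match a watch list.
# Reads `apt list --upgradable` output on stdin.
critical_upgrades() {
    grep -E '^(linux-image|openssl|libssl|nginx|postgresql|python3)[^/]*/'
}
```

Example: apt list --upgradable 2>/dev/null | critical_upgrades. To preview what an upgrade would do without changing anything, apt-get -s upgrade runs a simulation.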

Repositories: official sources, PPAs, vendor repos and trust

APT installs packages from repositories. Repository governance is critical: every repository added to a production server becomes part of the trust and upgrade surface. Too many uncontrolled PPAs or vendor repositories can make upgrades unpredictable.

Repository type	Usage	Risk	Production rule
Ubuntu main	Official supported packages.	Low.	Default baseline.
Ubuntu universe	Community-maintained packages.	Support scope differs.	Accept with awareness.
Security repo	Security updates.	Low; disabling it leaves known CVEs unpatched.	Never disable casually.
PPA	Community or project-specific builds.	Trust and compatibility risk.	Use only with explicit approval.
Vendor repo	Official software vendor packages.	Key, pinning and lifecycle complexity.	Document and monitor.
Local mirror	Enterprise-controlled package mirror.	Mirror freshness.	Useful for controlled fleets.

Repository locations

# Main APT source files
                            /etc/apt/sources.list
                            /etc/apt/sources.list.d/

                            # Newer Ubuntu systems may use deb822 source files
                            /etc/apt/sources.list.d/*.sources

                            # Trusted keyring locations
                            /etc/apt/keyrings/
                            /usr/share/keyrings/

                            # Apt preferences and pinning
                            /etc/apt/preferences
                            /etc/apt/preferences.d/

Repository inspection commands

# Show active source files
                            ls -lah /etc/apt/sources.list.d/
                            cat /etc/apt/sources.list

                            # Search configured repositories (one-line and deb822 formats)
                            grep -rE "^(deb|Types:|URIs:|Suites:)" /etc/apt/sources.list /etc/apt/sources.list.d/

                            # Refresh repository metadata
                            sudo apt update

                            # Show repository used for package candidate
                            apt policy nginx

                            # Show all versions available
                            apt-cache madison nginx

                            # Same origin and priority details via the older apt-cache interface (useful in scripts)
                            apt-cache policy nginx

Vendor repository pattern

Recommended vendor repo pattern:
                            1. Add vendor signing key into /etc/apt/keyrings/
                            2. Add repository source referencing signed-by key
                            3. Run apt update
                            4. Check apt policy package
                            5. Install exact package
                            6. Document repository owner and reason
                            7. Monitor vendor release notes
                            8. Pin if required
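
Steps 1 and 2 of the pattern can be sketched as a small generator for a deb822 source stanza. The function name `render_vendor_source`, the repository URL and the keyring file name are hypothetical placeholders; the field names (`Types`, `URIs`, `Suites`, `Components`, `Signed-By`) are the standard deb822 source fields.

```shell
# render_vendor_source URI SUITE COMPONENT KEYRING
# Print a deb822 source stanza that ties the repository to one signing key.
# All example values below (apt.example.com, vendor.gpg) are hypothetical.
render_vendor_source() {
    local uri="$1" suite="$2" component="$3" keyring="$4"
    cat <<EOF
Types: deb
URIs: $uri
Suites: $suite
Components: $component
Signed-By: $keyring
EOF
}

# Review the stanza first, then install it deliberately, for example:
#   render_vendor_source https://apt.example.com/ubuntu noble main \
#       /etc/apt/keyrings/vendor.gpg | sudo tee /etc/apt/sources.list.d/vendor.sources
#   sudo apt update && apt policy package-name
```

Printing to stdout instead of writing straight into /etc/apt keeps the change reviewable before it becomes part of the trust surface.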

Repository risk diagram

New repository added
                            │
                            ├── Can replace existing packages?
                            ├── Can introduce newer dependencies?
                            ├── Can break upgrade path?
                            ├── Is signing key controlled?
                            ├── Is vendor trusted?
                            ├── Is lifecycle documented?
                            └── Is rollback possible?

Production warning: every PPA is a supply-chain and compatibility decision. Do not add PPAs casually on long-lived production servers.

Updates: patching, reboot policy, golden images and upgrade windows

Ubuntu updates must balance security and stability. Security patches should be applied quickly, but critical production systems often require staging validation, maintenance windows and rollback plans. Kernel and libc-related updates may require service restart or full reboot.

Update strategy	Best for	Strength	Watch out
Manual updates	Small systems, controlled maintenance.	Maximum human control.	Can be forgotten.
Unattended security updates	Standard servers.	Fast CVE patching.	Needs reboot/service restart policy.
Monthly patch window	Critical production.	Testing and coordination.	Emergency CVEs still need fast path.
Golden image replacement	Cloud fleets and autoscaling.	Reproducible and rollback-friendly.	Requires image pipeline.
Rolling patching	Clusters and HA services.	No full downtime.	Requires health checks and drain logic.

Update commands

# Refresh metadata
                            sudo apt update

                            # Show upgradeable packages
                            apt list --upgradable

                            # Upgrade packages
                            sudo apt upgrade

                            # Full upgrade: may also remove packages to resolve dependency changes
                            sudo apt full-upgrade

                            # Remove unused dependencies
                            sudo apt autoremove

                            # Check if reboot is required
                            test -f /var/run/reboot-required && cat /var/run/reboot-required

                            # Show packages requiring reboot
                            cat /var/run/reboot-required.pkgs 2>/dev/null
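
The two reboot checks above can be wrapped into one reporting helper, for example for a login banner or a monitoring probe. This is a minimal sketch; the function name `reboot_status` and its optional path argument are assumptions added so the logic can be exercised against a test directory.

```shell
# reboot_status [FLAG_FILE]
# Report whether the reboot-required flag exists and which packages asked for it.
# FLAG_FILE defaults to /var/run/reboot-required.
reboot_status() {
    local flag="${1:-/var/run/reboot-required}"
    if [ -f "$flag" ]; then
        echo "reboot required"
        # The companion .pkgs file lists the packages that requested the reboot.
        if [ -f "$flag.pkgs" ]; then
            sort -u "$flag.pkgs" | sed 's/^/  requested by: /'
        fi
    else
        echo "no reboot required"
    fi
}

# Typical use on a real host:
#   reboot_status
```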

Patch workflow

Patch workflow
                            │
                            ├── Inventory
                            │       ├── OS version
                            │       ├── kernel version
                            │       ├── critical services
                            │       └── package list
                            │
                            ├── Prepare
                            │       ├── backup
                            │       ├── snapshot
                            │       ├── staging test
                            │       └── maintenance window
                            │
                            ├── Patch
                            │       ├── apt update
                            │       ├── apt upgrade
                            │       ├── service validation
                            │       └── reboot if required
                            │
                            └── Verify
                                    ├── systemctl --failed
                                    ├── journalctl warnings
                                    ├── listening ports
                                    ├── application smoke tests
                                    └── monitoring green

Unattended upgrades

# Install unattended upgrades
                            sudo apt install unattended-upgrades

                            # Configure automatic updates
                            sudo dpkg-reconfigure unattended-upgrades

                            # Main config files
                            /etc/apt/apt.conf.d/20auto-upgrades
                            /etc/apt/apt.conf.d/50unattended-upgrades

                            # Check logs
                            less /var/log/unattended-upgrades/unattended-upgrades.log
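
For reference, the content that enables the periodic job in `20auto-upgrades` is two settings. The sketch below writes them to a path of your choice so the file can be reviewed before it is installed; the function name `write_auto_upgrades` is an assumption, while the two `APT::Periodic` keys are the standard ones.

```shell
# write_auto_upgrades TARGET
# Write the two periodic settings that turn on daily metadata refresh and
# unattended upgrades. TARGET is a parameter so the file can be generated and
# reviewed before it is copied to /etc/apt/apt.conf.d/20auto-upgrades.
write_auto_upgrades() {
    local target="$1"
    cat > "$target" <<'EOF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF
}

# Example:
#   write_auto_upgrades /tmp/20auto-upgrades
#   sudo install -m 0644 /tmp/20auto-upgrades /etc/apt/apt.conf.d/20auto-upgrades
#   sudo unattended-upgrade --dry-run --debug   # preview what would be applied
```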

Production rule: security updates without reboot planning can create false confidence. A patched kernel is not active until the system boots into it.

Security: CVEs, package provenance, keys and audit trail

Package security is about more than installing updates. It includes repository trust, signing keys, CVE awareness, dependency origin, package version visibility, automatic security updates, rollback and auditability.

Security concern	Diagnostic	Control
Known vulnerable package	Security notices, scanner, package version.	Patch quickly, reboot/restart if needed.
Untrusted repository	Inspect sources and keys.	Remove unused PPAs and vendor repos.
Unsigned or broken repository	`apt update` errors.	Fix keyring or disable repository.
Package replaced by PPA	`apt policy package`.	Pin or remove repository.
No audit trail	APT history is not reviewed or archived.	Record update windows and package changes.

Security inspection commands

# Show installed version and candidate
                            apt policy openssl
                            apt policy nginx

                            # Show package details
                            apt show openssl

                            # Show package changelog if available
                            apt changelog openssl

                            # Review apt history
                            less /var/log/apt/history.log

                            # Show recently modified source files
                            sudo find /etc/apt -type f -mtime -30 -ls

                            # Check Ubuntu Pro status if available
                            pro status

Package security flow

Security advisory or CVE
                            │
                            ├── Identify affected package
                            │       └── apt policy package
                            │
                            ├── Check installed version
                            │       └── dpkg -s package
                            │
                            ├── Check available update
                            │       └── apt list --upgradable
                            │
                            ├── Apply patch
                            │       └── apt install --only-upgrade package
                            │
                            ├── Restart service if needed
                            │       └── systemctl restart service
                            │
                            ├── Reboot if kernel/system library
                            │       └── reboot-required
                            │
                            └── Verify
                                    ├── version updated
                                    ├── service healthy
                                    └── logs clean

Key management principles

Good:
                            - vendor keys stored in /etc/apt/keyrings/
                            - repository line uses signed-by=
                            - repository owner documented
                            - old repositories removed
                            - package origin checked with apt policy

                            Avoid:
                            - legacy apt-key usage
                            - unknown curl | sudo bash scripts
                            - unmanaged PPAs
                            - repositories kept after one-time install
                            - blind upgrades without package review

Supply-chain rule: never pipe unknown install scripts directly into a root shell on production servers. Download, inspect, verify source, then execute intentionally.

Pinning, holds and version control

Pinning and holds control package versions. They are useful when a service depends on a specific version, when a repository offers unwanted newer packages, or when an upgrade must be temporarily blocked. They should be documented because forgotten pins can create security and maintenance risks.

Mechanism	Purpose	Example use	Risk
`apt-mark hold`	Prevent package upgrades.	Freeze PostgreSQL or Nginx temporarily.	Security patches may be blocked.
APT preferences	Control repository priority.	Prefer Ubuntu repo over PPA.	Misconfiguration can select wrong packages.
Exact version install	Install specific version.	`apt install package=version`	Version may disappear from repo.
Golden image	Freeze whole system baseline.	Cloud server fleet.	Image must be rebuilt for patches.

Hold commands

# Hold a package
                            sudo apt-mark hold nginx

                            # Show held packages
                            apt-mark showhold

                            # Remove hold
                            sudo apt-mark unhold nginx

                            # Install exact version
                            sudo apt install nginx=1.24.0-2ubuntu7

                            # Show available versions
                            apt-cache madison nginx
                            apt policy nginx

APT preferences example

# Example file:
                            # /etc/apt/preferences.d/nginx-pin

                            Package: nginx*
                            Pin: release o=Ubuntu
                            Pin-Priority: 700

                            Package: nginx*
                            Pin: origin "ppa.launchpadcontent.net"
                            Pin-Priority: 400

Version governance flow

Need version control?
                            │
                            ├── Is this temporary?
                            │       ├── yes -> apt-mark hold + ticket + expiry date
                            │       └── no
                            │
                            ├── Is repo priority wrong?
                            │       ├── yes -> APT preferences pinning
                            │       └── no
                            │
                            ├── Need fleet reproducibility?
                            │       ├── yes -> golden image or IaC
                            │       └── no
                            │
                            └── Document package policy
                                    ├── package
                                    ├── desired version
                                    ├── reason
                                    ├── owner
                                    └── review date

Pinning risks

Risk	Cause	Control
Missed security update	Package held too long.	Review holds regularly.
Dependency conflict	Package versions drift.	Test upgrades in staging.
Wrong repo selected	Bad pin priority.	Check `apt policy`.
Hidden operational debt	No owner or expiry.	Document every hold and pin.

Production rule: every hold or pin must have a reason, owner and review date. Otherwise, it becomes invisible technical debt.
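
One way to enforce this rule is to compare the live hold list with a small registry kept under version control. The sketch below assumes a hypothetical registry format of `package|owner|review-date|reason` (one line per hold, ISO dates); the function name `audit_holds` and the status labels are also assumptions.

```shell
# audit_holds REGISTRY [TODAY]
# stdin:    held package names, one per line (for example from `apt-mark showhold`)
# REGISTRY: hypothetical text file, one line per hold: package|owner|review-date|reason
# TODAY:    YYYY-MM-DD, defaults to the current date
audit_holds() {
    local registry="$1" today="${2:-$(date +%F)}" pkg review
    while IFS= read -r pkg; do
        [ -n "$pkg" ] || continue
        # Exact match on the first field; the "x" prefix distinguishes
        # "no entry" from "entry with an empty date".
        review=$(awk -F'|' -v p="$pkg" '$1 == p { print "x" $3; exit }' "$registry")
        if [ -z "$review" ]; then
            echo "UNDOCUMENTED $pkg"
            continue
        fi
        review="${review#x}"
        if [ -z "$review" ]; then
            echo "NO-REVIEW-DATE $pkg"
            continue
        fi
        # ISO dates compare correctly as integers once the dashes are removed.
        if [ "$(printf '%s' "$review" | tr -d '-')" -lt "$(printf '%s' "$today" | tr -d '-')" ]; then
            echo "OVERDUE $pkg (review date $review)"
        fi
    done
}

# Typical use on a real host:
#   apt-mark showhold | audit_holds /path/to/holds-registry.txt
```

Silent output means every hold has an entry and a review date that has not passed yet.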

Snap: concept, commands, refresh behavior and production policy

Snap packages bundle applications with their dependencies and run with confinement rules. Snaps are useful for some desktop applications and selected server tools, but production teams must understand refresh behavior, confinement, channels and operational policy before relying on them.

Snap concept	Meaning	Operational impact
Channel	Release track such as stable, candidate, beta, edge.	Controls risk level.
Confinement	Sandbox permissions model.	Can affect filesystem and device access.
Refresh	Automatic update behavior.	Needs maintenance window policy.
Revision	Specific snap build version.	Rollback may use previous revision.
Interface	Permission connection between snap and system resource.	May require manual connection.

Snap essentials

# List installed snaps
                            snap list

                            # Find package
                            snap find code

                            # Install snap
                            sudo snap install package-name

                            # Install from specific channel
                            sudo snap install package-name --channel=stable

                            # Refresh snaps
                            sudo snap refresh

                            # Show refresh schedule
                            snap refresh --time

                            # Show snap information
                            snap info package-name

                            # Remove snap
                            sudo snap remove package-name

Snap operational commands

# Show connections/interfaces
                            snap connections package-name

                            # Connect interface manually
                            sudo snap connect package-name:interface

                            # Revert to previous revision if available
                            sudo snap revert package-name

                            # Hold refresh temporarily
                            sudo snap refresh --hold=24h package-name

                            # Hold all refreshes temporarily
                            sudo snap refresh --hold=24h

                            # Show changes
                            snap changes

                            # Show logs for snap service if applicable
                            snap logs package-name

APT vs Snap decision table

Need	Prefer APT	Prefer Snap
Core server packages	Yes.	Usually no.
Desktop applications	Sometimes.	Often acceptable.
Strict patch window	Easier to control.	Refresh policy must be managed.
Sandboxed app delivery	Less direct.	Good fit.
Traditional system service	Usually better.	Depends on package and support model.

Production warning: avoid unmanaged mixing of APT and Snap for the same role. Define which package system owns each component.
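
A quick way to spot ownership overlap is to intersect the APT and Snap package lists. This is a minimal sketch: the function name `dual_managed` is an assumption, and matching is by identical name only, so software published under different names in the two systems will not be detected.

```shell
# dual_managed APT_LIST SNAP_LIST
# Print names that appear in both files (one package name per line each).
# Treat the result as a starting point for review, not a complete audit.
dual_managed() {
    awk 'FILENAME == ARGV[1] { apt[$0] = 1; next } ($0 in apt)' "$1" "$2"
}

# Building the two lists on a real host:
#   dpkg-query -W -f='${Package}\n' > /tmp/apt.txt
#   snap list | awk 'NR > 1 { print $1 }' > /tmp/snap.txt
#   dual_managed /tmp/apt.txt /tmp/snap.txt
```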

APT and package troubleshooting

Package problems often come from broken dependencies, interrupted installs, repository errors, DNS issues, expired keys, dpkg locks, held packages or third-party repository conflicts. Troubleshooting should start by reading the actual APT error.

Symptom	Likely cause	First command
`Could not get lock`	Another apt or dpkg process is running.	`ps aux | grep -E 'apt|dpkg'`
`Temporary failure resolving`	DNS problem.	`resolvectl status`
`NO_PUBKEY`	Missing repository signing key.	Inspect repository and keyring.
`held broken packages`	Dependency conflict or holds.	`apt-mark showhold`
Package version unexpected	PPA or pinning changed candidate.	`apt policy package`
Install interrupted	dpkg half-configured packages.	`sudo dpkg --configure -a`

Repair commands

# Repair interrupted dpkg configuration
                            sudo dpkg --configure -a

                            # Fix broken dependencies
                            sudo apt -f install

                            # Refresh metadata
                            sudo apt update

                            # Clean local package cache
                            sudo apt clean

                            # Check held packages
                            apt-mark showhold

                            # Check locks safely
                            ps aux | grep -E 'apt|dpkg'

                            # Review apt history
                            less /var/log/apt/history.log
                            less /var/log/apt/term.log

Troubleshooting decision tree

APT operation fails
                            │
                            ├── Read exact error
                            │
                            ├── Lock error?
                            │       └── wait or inspect apt/dpkg processes
                            │
                            ├── Network or DNS error?
                            │       └── check ip, route, DNS, proxy
                            │
                            ├── Repository signature error?
                            │       └── check source file and keyring
                            │
                            ├── Dependency conflict?
                            │       └── apt -f install, apt policy, holds
                            │
                            ├── Interrupted install?
                            │       └── dpkg --configure -a
                            │
                            └── Third-party repo conflict?
                                    └── disable repo, update, retry in staging
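
The decision tree can be sketched as a lookup from the error text to the first diagnostic step, for example inside a runbook script. The function name `apt_first_step` is an assumption; the matched strings are fragments of common APT messages and the list is not exhaustive.

```shell
# apt_first_step "ERROR TEXT"
# Map a well-known APT error message to the first diagnostic step.
apt_first_step() {
    case "$1" in
        *"Could not get lock"*)
            echo "wait, then inspect: ps aux | grep -E 'apt|dpkg'" ;;
        *"Temporary failure resolving"*)
            echo "check DNS: resolvectl status" ;;
        *NO_PUBKEY*)
            echo "inspect the source file and its keyring" ;;
        *"held broken packages"*)
            echo "check holds and candidates: apt-mark showhold; apt policy <package>" ;;
        *"dpkg was interrupted"*)
            echo "resume configuration: sudo dpkg --configure -a" ;;
        *)
            echo "read the full error, then isolate third-party repositories in staging" ;;
    esac
}

# Example:
#   apt_first_step "E: Could not get lock /var/lib/dpkg/lock-frontend"
```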

Repository isolation technique

# Temporarily disable a source file
                            sudo mv /etc/apt/sources.list.d/vendor.list \
                            /etc/apt/sources.list.d/vendor.list.disabled

                            # Refresh metadata
                            sudo apt update

                            # Re-check package candidate
                            apt policy package-name

Do not: delete dpkg lock files blindly while package operations are running. You can corrupt the package database. First identify the active process.

Production best practices: governance, reproducibility and rollback

In production, package management must be reproducible. The same server role should use the same repositories, packages, versions, configuration and patching process. Manual package drift is a major source of incidents.

Practice	Why it matters	Implementation
Approved repository list	Controls supply-chain risk.	Document Ubuntu, security and vendor repos.
Package baseline	Improves reproducibility.	Ansible, Packer, Terraform, cloud-init.
Patch windows	Reduces surprise outages.	Monthly standard, emergency CVE fast path.
Staging validation	Catches dependency and config breakage.	Upgrade staging before production.
Rollback plan	Limits outage duration.	Snapshot, AMI, previous image, package downgrade plan.
Change log	Enables incident diagnosis.	Ticket, deployment log, apt history archive.

Production package lifecycle

Package change request
                            │
                            ├── Why is package needed?
                            ├── Which repository provides it?
                            ├── Is vendor trusted?
                            ├── Is version pinned or floating?
                            ├── Has staging been tested?
                            ├── Is rollback possible?
                            └── Is owner documented?
                            │
                            ▼
                            Approved installation
                            │
                            ├── update IaC
                            ├── apply in staging
                            ├── validate
                            ├── apply in production
                            └── document result

Production rules

Do:
                            - use Ubuntu LTS for production
                            - keep security repository enabled
                            - document every external repository
                            - prefer vendor official repositories over random PPAs
                            - test updates in staging
                            - track reboot-required state
                            - keep rollback snapshot or image
                            - automate package baseline
                            - review apt history after changes
                            - monitor security advisories

                            Avoid:
                            - unmanaged PPAs
                            - curl | sudo bash without review
                            - compiling manually into /usr/local without documentation
                            - mixing APT and Snap for the same service role
                            - holding packages forever
                            - patching critical systems without rollback

Infrastructure-as-code examples

Package baseline can be expressed in:
                            - Ansible apt module
                            - cloud-init packages section
                            - Packer image build
                            - Terraform user_data
                            - Dockerfile for containers
                            - shell bootstrap script under version control

                            Goal:
                            rebuild server from code, not memory.
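
As a small illustration of the last option, a bootstrap script can compare a version-controlled baseline with what is installed. This sketch assumes a hypothetical baseline file with one package per line (blank lines and `#` comments allowed); the function name `missing_from_baseline` is also an assumption.

```shell
# missing_from_baseline BASELINE INSTALLED
# Print packages listed in BASELINE that are absent from INSTALLED.
missing_from_baseline() {
    local baseline="$1" installed="$2"
    awk '
        FILENAME == ARGV[1] { have[$1] = 1; next }   # first file read: installed packages
        NF == 0 || $1 ~ /^#/ { next }                # baseline: skip blank lines and comments
        !($1 in have) { print $1 }                   # baseline entry that is not installed
    ' "$installed" "$baseline"
}

# Typical use on a real host:
#   dpkg-query -W -f='${Package}\n' > /tmp/installed.txt
#   missing_from_baseline packages.txt /tmp/installed.txt
```

The helper only reports drift. Installing the missing packages stays a deliberate, reviewed step.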

Production rule: if a package is installed manually and nobody documents why, the server has started to become a snowflake.

Package management cheat sheet and final checklist

APT cheat sheet

# Metadata and updates
                            sudo apt update
                            apt list --upgradable
                            sudo apt upgrade
                            sudo apt full-upgrade

                            # Install and remove
                            sudo apt install package-name
                            sudo apt remove package-name
                            sudo apt purge package-name
                            sudo apt autoremove

                            # Inspect
                            apt show package-name
                            apt policy package-name
                            apt-cache madison package-name
                            dpkg -l | grep package-name
                            dpkg -L package-name
                            dpkg -S /path/to/file

                            # Troubleshoot
                            sudo dpkg --configure -a
                            sudo apt -f install
                            apt-mark showhold
                            less /var/log/apt/history.log

                            # Hold
                            sudo apt-mark hold package-name
                            sudo apt-mark unhold package-name

Snap cheat sheet

# Inspect
                            snap list
                            snap find package-name
                            snap info package-name

                            # Install and remove
                            sudo snap install package-name
                            sudo snap install package-name --channel=stable
                            sudo snap remove package-name

                            # Refresh
                            sudo snap refresh
                            snap refresh --time
                            sudo snap refresh --hold=24h package-name

                            # Operations
                            snap changes
                            snap connections package-name
                            snap logs package-name
                            sudo snap revert package-name

Final production checklist

[ ] Ubuntu official repositories are enabled
                            [ ] Security repository is enabled
                            [ ] External repositories are documented
                            [ ] Repository keys are managed in keyrings
                            [ ] PPAs are justified or avoided
                            [ ] Package baseline is automated
                            [ ] Critical package versions are known
                            [ ] Holds and pins are documented
                            [ ] Update policy is defined
                            [ ] Reboot policy is defined
                            [ ] Staging update test exists
                            [ ] Rollback image or snapshot exists
                            [ ] Apt history is reviewed after changes
                            [ ] Snap policy is defined
                            [ ] Security advisories are monitored

Final rule

Package management is production governance.
APT and Snap are not just installation tools. They define what software runs, where it comes from, how it is patched, how it is upgraded, and how safely the system can recover when a package change goes wrong.

7.5 Ubuntu Customization & Optimization: GNOME, themes, keyboard shortcuts, battery, swappiness and cleanup

Customization and optimization objective

Ubuntu can be customized at several levels: desktop interface, GNOME extensions, themes, icons, fonts, keyboard shortcuts, startup applications, power settings, memory behavior and cleanup routines. The objective is to improve usability and performance without making the system fragile.

Good customization is controlled, reversible and documented. Bad customization creates unstable extensions, broken themes, slow login, excessive startup services, battery drain, hidden disk growth and difficult troubleshooting.

Area	Goal	Main tools	Risk if unmanaged
GNOME interface	Improve desktop workflow.	Settings, Tweaks, Extensions.	Shell instability or visual inconsistency.
Themes and icons	Adapt visual style.	GTK themes, icon themes, user themes.	Broken UI after updates.
Keyboard shortcuts	Accelerate daily workflow.	Settings, custom commands, terminal shortcuts.	Conflicts and hard-to-remember mappings.
Battery	Reduce power usage on laptops.	Power profiles, TLP, powertop.	Thermal issues or poor battery life.
Memory tuning	Control swap behavior.	`vm.swappiness`, monitoring.	Slow system if tuned blindly.
Cleanup	Keep disk usage healthy.	APT cleanup, journal vacuum, cache review.	Disk full or accidental data loss.

Core rule: customize for productivity, not for complexity. Every optimization should be measurable, reversible and safe after system updates.

Optimization map

Ubuntu workstation optimization
                            │
                            ├── Interface
                            │       ├── GNOME Settings
                            │       ├── GNOME Tweaks
                            │       ├── dock behavior
                            │       ├── workspace behavior
                            │       └── display settings
                            │
                            ├── Extensions
                            │       ├── shell extensions
                            │       ├── app indicators
                            │       ├── tiling helpers
                            │       └── workflow enhancers
                            │
                            ├── Visual style
                            │       ├── GTK theme
                            │       ├── icon theme
                            │       ├── cursor theme
                            │       └── fonts
                            │
                            ├── Productivity
                            │       ├── keyboard shortcuts
                            │       ├── terminal shortcuts
                            │       ├── custom commands
                            │       └── launcher workflow
                            │
                            └── Performance
                                    ├── startup apps
                                    ├── battery profile
                                    ├── swappiness
                                    ├── cache cleanup
                                    └── logs and disk hygiene

Decision shortcut

Want a better desktop?
                            ├── first use built-in Settings
                            ├── then GNOME Tweaks
                            ├── then a few trusted extensions
                            └── avoid stacking many shell modifications

                            Want better performance?
                            ├── remove useless startup apps
                            ├── check disk and memory
                            ├── tune battery profile
                            ├── clean caches safely
                            └── measure before changing kernel parameters
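
"Measure before changing kernel parameters" can start with a snapshot of memory pressure taken before and after any tuning. The sketch below reads the standard `/proc/meminfo` fields; the function name `memory_snapshot` and the output format are assumptions.

```shell
# memory_snapshot [MEMINFO_FILE]
# Summarise memory pressure from /proc/meminfo: share of RAM still available
# and swap currently in use. Values in /proc/meminfo are reported in kB.
memory_snapshot() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "SwapTotal:"    { stotal = $2 }
        $1 == "SwapFree:"     { sfree = $2 }
        END {
            if (total > 0)
                printf "mem_available_pct=%d swap_used_kb=%d\n", avail * 100 / total, stotal - sfree
        }
    ' "${1:-/proc/meminfo}"
}

# Record a baseline alongside the current setting before changing anything:
#   memory_snapshot; cat /proc/sys/vm/swappiness
```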

GNOME interface: built-in customization first

Ubuntu Desktop uses GNOME with Ubuntu-specific defaults. Before installing extensions or themes, start with built-in settings: dock placement, appearance, workspaces, display scaling, night light, keyboard layout, privacy, notifications and power profile.

Interface area	Where to configure	Useful for
Appearance	Settings → Appearance.	Light/dark mode, accent style, dock behavior.
Displays	Settings → Displays.	Resolution, scaling, multi-monitor layout.
Keyboard	Settings → Keyboard.	Shortcuts, input sources, custom commands.
Power	Settings → Power.	Battery profile, screen blank, suspend behavior.
Notifications	Settings → Notifications.	Reduce distractions.
Privacy	Settings → Privacy.	Location, file history, camera, microphone.

GNOME Tweaks installation

# Install GNOME Tweaks
                            sudo apt update
                            sudo apt install gnome-tweaks

                            # Launch from terminal
                            gnome-tweaks

                            # Install extension app if available
                            sudo apt install gnome-shell-extension-manager

Interface customization flow

Customize desktop
                            │
                            ├── Built-in Settings
                            │       ├── appearance
                            │       ├── display
                            │       ├── keyboard
                            │       ├── power
                            │       └── privacy
                            │
                            ├── GNOME Tweaks
                            │       ├── fonts
                            │       ├── window behavior
                            │       ├── startup apps
                            │       └── themes if enabled
                            │
                            ├── Extensions
                            │       ├── install only useful ones
                            │       ├── verify compatibility
                            │       └── disable if shell breaks
                            │
                            └── Backup preferences
                                    ├── document installed extensions
                                    ├── export dotfiles if needed
                                    └── keep restore point

Useful inspection commands

# GNOME Shell version
                            gnome-shell --version

                            # Current desktop session
                            echo $XDG_CURRENT_DESKTOP
                            echo $XDG_SESSION_TYPE

                            # Display environment
                            echo $WAYLAND_DISPLAY
                            echo $DISPLAY

                            # Installed GNOME packages
                            dpkg -l | grep -i gnome | head

                            # User configuration directories
                            ls -lah ~/.config
                            ls -lah ~/.local/share

Interface rule: use the simplest native setting first. Extensions should solve real workflow problems, not replace every part of the desktop.
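The session variables inspected above also decide how GNOME Shell can be restarted after a change. The following is a minimal sketch, assuming only the standard `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY` and `DISPLAY` variables; the function names `session_kind` and `restart_hint` are hypothetical helpers, not part of GNOME.

```shell
# Hypothetical helper: report the session type from standard environment variables.
session_kind() {
    if [ -n "${XDG_SESSION_TYPE:-}" ]; then
        printf '%s\n' "$XDG_SESSION_TYPE"
    elif [ -n "${WAYLAND_DISPLAY:-}" ]; then
        printf 'wayland\n'
    elif [ -n "${DISPLAY:-}" ]; then
        printf 'x11\n'
    else
        printf 'unknown\n'
    fi
}

# Print the matching way to restart GNOME Shell for the detected session.
restart_hint() {
    case "$(session_kind)" in
        x11)     echo "Xorg: press Alt+F2, type r, press Enter" ;;
        wayland) echo "Wayland: log out and log back in" ;;
        *)       echo "No graphical session detected (tty or SSH?)" ;;
    esac
}

restart_hint
```

Run it inside the desktop session; over SSH or on a text console it reports that no graphical session was found.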

GNOME extensions: workflow power with compatibility discipline

GNOME extensions modify the behavior of GNOME Shell. They can add indicators, tiling, dock improvements, clipboard managers, system monitors or workflow enhancements. However, extensions run inside the desktop shell environment and can break after GNOME upgrades if not maintained.

Extension type	Use case	Risk
App indicators	Tray icons for apps.	Low to medium.
Dock customization	Dock behavior and visual changes.	Medium if overlapping Ubuntu dock.
Tiling assistants	Window snapping and layouts.	Medium if shell version changes.
System monitors	CPU, RAM, network indicators.	Can add overhead if badly implemented.
Theme/user shell	Shell visual customization.	Can break visual consistency.

Install and manage extensions

# Install Extension Manager if available
                            sudo apt update
                            sudo apt install gnome-shell-extension-manager

                            # List enabled extensions
                            gnome-extensions list --enabled

                            # List all extensions
                            gnome-extensions list

                            # Show extension info
                            gnome-extensions info extension-name

                            # Disable extension
                            gnome-extensions disable extension-name

                            # Enable extension
                            gnome-extensions enable extension-name

Extension safety flow

Before installing extension
                            │
                            ├── Is it really needed?
                            ├── Is it compatible with GNOME version?
                            ├── Is it maintained?
                            ├── Does it overlap with another extension?
                            ├── Can it be disabled easily?
                            └── Is there a restore point before major desktop changes?
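The compatibility question in the flow above can be partly checked from the command line. This is a rough sketch, assuming the usual per-user layout `~/.local/share/gnome-shell/extensions/<uuid>/metadata.json` with a `"shell-version"` array; the function `extension_supports` is a hypothetical helper, and the text matching is deliberately simple rather than a full JSON parser.

```shell
# Hypothetical helper: does an extension's metadata.json list this GNOME Shell version?
# Returns 0 if listed, 1 if not listed, 2 if metadata.json is missing.
extension_supports() {
    ext_dir=$1
    major=$2
    meta="$ext_dir/metadata.json"
    [ -f "$meta" ] || return 2
    # Flatten the file, isolate the "shell-version" array, then look for the quoted version.
    tr -d '\n' < "$meta" \
        | grep -o '"shell-version"[^]]*]' \
        | grep -q "\"$major\""
}

# Usage sketch: derive the major version from `gnome-shell --version` (e.g. "GNOME Shell 46.0").
if command -v gnome-shell >/dev/null 2>&1; then
    current=$(gnome-shell --version | awk '{print $3}' | cut -d. -f1)
    for dir in "$HOME"/.local/share/gnome-shell/extensions/*/; do
        [ -d "$dir" ] || continue
        if extension_supports "$dir" "$current"; then
            echo "ok:     $dir"
        else
            echo "review: $dir"
        fi
    done
fi
```

An extension marked "review" is not necessarily broken; it only means the declared versions do not include the running shell, so it deserves a closer look before an upgrade.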

Extension troubleshooting

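# Disable all user extensions for diagnosis (reversible)
                            gsettings set org.gnome.shell disable-user-extensions true

                            # Re-enable user extensions after testing
                            gsettings set org.gnome.shell disable-user-extensions false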

                            # Check GNOME Shell logs
                            journalctl /usr/bin/gnome-shell --since "1 hour ago"

                            # Check session errors
                            journalctl --user -p warning --since "1 hour ago"

                            # Restart GNOME Shell on Xorg
                            # Press Alt+F2, type r, press Enter

                            # On Wayland, log out and log back in

Recommended extension policy

Good:
                            - install only a few extensions
                            - prefer maintained extensions
                            - remove unused extensions
                            - document core workflow extensions
                            - test after Ubuntu upgrade

                            Avoid:
                            - stacking many visual extensions
                            - installing abandoned extensions
                            - relying on extensions for critical access
                            - changing many extensions at once
                            - ignoring shell errors after login

Extension warning: a broken GNOME extension can make the desktop unstable. Keep the list small and know how to disable extensions.
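One way to keep the list documented is to snapshot the enabled extensions before and after a change, so the most recent addition is easy to identify and disable. The sketch below assumes the `gnome-extensions` CLI is present on the desktop; the function names and file names are illustrative only.

```shell
# Hypothetical helper: save the sorted list of enabled extensions to a file.
snapshot_extensions() {
    if command -v gnome-extensions >/dev/null 2>&1; then
        gnome-extensions list --enabled | sort > "$1"
    else
        : > "$1"    # GNOME tooling not present; keep an empty snapshot
    fi
}

# Print UUIDs enabled in the second snapshot but not in the first.
# Both files must be sorted, which snapshot_extensions guarantees.
added_extensions() {
    comm -13 "$1" "$2"
}

# Usage sketch (file names are illustrative):
#   snapshot_extensions ~/extensions.before
#   ... install or enable one extension ...
#   snapshot_extensions ~/extensions.after
#   added_extensions ~/extensions.before ~/extensions.after
```

Each UUID printed by `added_extensions` can be passed to `gnome-extensions disable` if the shell becomes unstable.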

GTK themes, icon themes, cursor themes and visual consistency

Ubuntu visual customization can use GTK themes, icon themes, cursor themes and fonts. Themes can improve comfort and readability, but deep theming may break after application or desktop updates. Expect inconsistencies in particular with GTK 4/libadwaita applications, Qt applications and sandboxed Snap or Flatpak packages, which often ignore a custom GTK theme.

Theme element	What it changes	Typical location
GTK theme	Window and widget appearance.	`~/.themes`, `/usr/share/themes`
Icon theme	Application and file icons.	`~/.icons`, `~/.local/share/icons`
Cursor theme	Mouse pointer style.	`~/.icons`, system icon paths.
Shell theme	GNOME Shell top bar, menus, overview.	Requires the User Themes extension.
Fonts	UI and document typography.	GNOME Tweaks.

Theme directories

# User theme directories
                            mkdir -p ~/.themes
                            mkdir -p ~/.icons
                            mkdir -p ~/.local/share/icons

                            # System theme directories
                            ls -lah /usr/share/themes
                            ls -lah /usr/share/icons

                            # User config
                            ls -lah ~/.config
                            ls -lah ~/.local/share

Theme installation flow

Install theme safely
                            │
                            ├── Download from trusted source
                            ├── Extract theme
                            ├── Place in user directory
                            │       ├── ~/.themes
                            │       └── ~/.icons
                            ├── Open GNOME Tweaks
                            ├── Select theme
                            ├── Verify apps look correct
                            └── Keep original theme as fallback

Visual customization checklist

[ ] Theme source is trusted
                            [ ] Theme supports current GNOME/GTK version
                            [ ] Original theme remains available
                            [ ] Icons are readable in light and dark mode
                            [ ] Terminal colors remain readable
                            [ ] File manager remains usable
                            [ ] Browser and developer tools remain clear
                            [ ] Screenshots and presentations look professional
                            [ ] Theme can be reverted quickly

Common theme problems

Problem	Likely cause	Correction
Invisible text	Theme color mismatch.	Return to default or compatible theme.
Broken window controls	Unsupported shell or GTK version.	Use maintained theme.
Icons missing	Incomplete icon theme.	Install fallback icon set.
App does not follow theme	Different toolkit or sandbox package.	Accept limitation or configure app separately.

Visual rule: prioritize readability and stability over extreme theming, especially on a professional workstation.
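Before selecting a downloaded theme, it can help to see which toolkit directories it actually ships. The sketch below assumes the common theme layout with `gtk-3.0`, `gtk-4.0` and `gnome-shell` subdirectories; `theme_components` is a hypothetical helper name.

```shell
# Hypothetical helper: list which toolkit directories a theme folder provides, one per line.
theme_components() {
    theme_dir=$1
    for part in gtk-3.0 gtk-4.0 gnome-shell; do
        if [ -d "$theme_dir/$part" ]; then
            printf '%s\n' "$part"
        fi
    done
}

# Usage sketch: inspect user and system theme directories if they exist.
for base in "$HOME/.themes" /usr/share/themes; do
    [ -d "$base" ] || continue
    for theme in "$base"/*/; do
        [ -d "$theme" ] || continue
        echo "$theme: $(theme_components "$theme" | tr '\n' ' ')"
    done
done
```

A theme that lists only `gtk-3.0` is likely to leave GTK 4 applications unchanged, which is one common source of visual inconsistency.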

Keyboard shortcuts: customize workflow and reduce friction

Keyboard shortcuts are one of the highest-return customizations. They reduce mouse use, speed up window management, launch tools quickly and make development workflows smoother. The best shortcuts are easy to remember and do not conflict with application shortcuts.

Shortcut area	Example action	Good candidate
Terminal	Open terminal quickly.	`Ctrl + Alt + T`
Window management	Move, maximize, tile windows.	Super + arrows.
Workspaces	Switch between focused contexts.	Super + Page Up/Page Down.
Screenshots	Capture screen or region.	Print Screen shortcuts.
Custom app launch	Open IDE, browser, file manager.	Custom commands.
Scripts	Run productivity automation.	Custom script binding.

Custom shortcut flow

Create custom shortcut
                            │
                            ├── Open Settings
                            ├── Go to Keyboard
                            ├── Open Keyboard Shortcuts
                            ├── Add Custom Shortcut
                            ├── Enter name
                            ├── Enter command
                            ├── Assign key combination
                            └── Test immediately

Useful custom commands

# Open terminal
                            gnome-terminal

                            # Open file manager
                            nautilus

                            # Open browser
                            firefox

                            # Open specific project directory
                            gnome-terminal --working-directory=/home/user/projects

                            # Run a custom script
                            /home/user/bin/daily-check.sh

                            # Lock screen (gnome-screensaver is not installed on current Ubuntu desktops)
                            loginctl lock-session

Shortcut design principles

Good shortcuts:
                            - easy to remember
                            - close to existing habits
                            - not conflicting with IDE/browser
                            - consistent by category
                            - documented if custom
                            - limited to high-frequency actions

                            Avoid:
                            - too many shortcuts
                            - hard-to-type combinations
                            - overriding critical app shortcuts
                            - shortcuts that run destructive scripts
                            - undocumented production scripts

Developer workflow example

Workflow:
                            Super + Enter       -> terminal
                            Super + E           -> file manager
                            Super + B           -> browser
                            Super + D           -> IDE (on recent Ubuntu releases Super + D hides all windows by default; rebind that first)
                            Super + Shift + L   -> lock screen
                            Super + Shift + M   -> monitoring dashboard
                            Super + Shift + T   -> project terminal

Shortcut rule: customize shortcuts for actions you perform every day. If you use an action once a month, it does not need a shortcut.
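Custom shortcuts created in Settings are stored in GSettings, so they can also be scripted and documented. The sketch below prints the `gsettings` calls for one shortcut and only runs them when `APPLY=1` is set. The slot name `custom0`, the shortcut itself and the helper names `run` and `add_shortcut` are assumptions for illustration.

```shell
# Sketch: print (or apply) the gsettings calls behind one custom shortcut.
# APPLY=1 runs them; by default they are only printed for review.
SCHEMA=org.gnome.settings-daemon.plugins.media-keys
BASE=/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings

run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# add_shortcut SLOT NAME COMMAND BINDING
add_shortcut() {
    slot=$1
    name=$2
    cmd=$3
    binding=$4
    path="$BASE/$slot/"
    # Values are passed as quoted GVariant strings; names containing ' are not handled here.
    run gsettings set "$SCHEMA.custom-keybinding:$path" name "'$name'"
    run gsettings set "$SCHEMA.custom-keybinding:$path" command "'$cmd'"
    run gsettings set "$SCHEMA.custom-keybinding:$path" binding "'$binding'"
    # Caution: this replaces the whole list; merge with existing entries first.
    run gsettings set "$SCHEMA" custom-keybindings "['$path']"
}

add_shortcut custom0 "Project terminal" \
    "gnome-terminal --working-directory=$HOME/projects" "<Super><Shift>t"
```

Check the current list with `gsettings get org.gnome.settings-daemon.plugins.media-keys custom-keybindings` before applying, because the last call overwrites it.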

Battery and power optimization for laptops

Battery optimization on Ubuntu starts with power profiles, screen brightness, sleep behavior, background applications and hardware drivers. More advanced users can use tools like TLP or powertop, but should avoid applying random power tweaks without verifying their effect.

Power area	Optimization	Trade-off
Power profile	Use power saver on battery.	Lower performance.
Screen brightness	Reduce brightness.	Less visibility in bright environment.
Sleep behavior	Shorter idle suspend.	May interrupt background tasks.
Startup apps	Disable unnecessary background apps.	Some apps need manual launch.
Bluetooth	Disable when unused.	Peripheral inconvenience.
GPU mode	Use integrated graphics if possible.	Lower graphics performance.

Power commands

# Show power profiles if supported
                            powerprofilesctl

                            # Set power saver
                            powerprofilesctl set power-saver

                            # Set balanced
                            powerprofilesctl set balanced

                            # Set performance if available
                            powerprofilesctl set performance

                            # Battery status
                            upower -i $(upower -e | grep BAT) 2>/dev/null

                            # Show running processes
                            top

                            # Show startup applications through GUI (if the Startup Applications tool is installed)
                            gnome-session-properties

TLP and powertop

# Install TLP
                            sudo apt update
                            sudo apt install tlp

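                            # Enable TLP (do not run it alongside power-profiles-daemon;
                            # once TLP manages power, powerprofilesctl no longer applies)
                            sudo systemctl enable --now tlp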

                            # Show TLP status
                            sudo tlp-stat -s

                            # Install powertop
                            sudo apt install powertop

                            # Run powertop
                            sudo powertop

Battery optimization flow

Battery drains quickly
                            │
                            ├── Check power profile
                            ├── Reduce screen brightness
                            ├── Close high CPU apps
                            ├── Disable unused Bluetooth
                            ├── Review startup apps
                            ├── Check browser tabs
                            ├── Check GPU mode
                            ├── Use TLP if needed
                            └── Measure again

Laptop routine

On battery:
                            [ ] power-saver profile
                            [ ] lower brightness
                            [ ] close heavy browser tabs
                            [ ] stop unused containers or VMs
                            [ ] disable Bluetooth if unused
                            [ ] avoid heavy indexing jobs
                            [ ] monitor CPU usage
                            [ ] suspend when idle

Battery warning: Docker containers, VMs, IDE indexers, browsers and video calls can dominate power usage. Tune applications before blaming the OS.
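The "check power profile" step can be tied to the current power source. The sketch below reads the kernel's power supply entries under `/sys/class/power_supply` and suggests a profile. It assumes a charger that reports type `Mains`; some USB-C chargers report a different type, and peripheral batteries can appear as `Battery`, so treat the result as a hint. The function names are hypothetical.

```shell
# Hypothetical helper: print "ac" or "battery" based on sysfs power supply entries.
# An optional first argument overrides the sysfs root (useful for testing).
power_source() {
    root=${1:-/sys/class/power_supply}
    has_battery=0
    for supply in "$root"/*; do
        [ -r "$supply/type" ] || continue
        kind=$(cat "$supply/type")
        if [ "$kind" = "Mains" ] && [ "$(cat "$supply/online" 2>/dev/null)" = "1" ]; then
            echo ac
            return 0
        fi
        if [ "$kind" = "Battery" ]; then
            has_battery=1
        fi
    done
    # No online mains supply: on battery if one exists, otherwise assume a desktop on AC.
    if [ "$has_battery" = "1" ]; then echo battery; else echo ac; fi
}

suggest_profile() {
    if [ "$(power_source "$@")" = "battery" ]; then echo power-saver; else echo balanced; fi
}

profile=$(suggest_profile)
echo "Suggested profile: $profile"
# Apply only after reviewing the suggestion and if the tool exists:
# powerprofilesctl set "$profile"
```

The actual switch is left commented out so the suggestion can be verified against measured battery behavior first.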

Swappiness: memory behavior and swap tuning

Swappiness (`vm.swappiness`) controls how readily the Linux kernel swaps out application memory pages instead of dropping file cache when it needs to reclaim memory. The default is 60; lower values make the kernel prefer reclaiming cache, higher values allow more swapping. It is not a magic performance setting: the correct value depends on RAM size, workload, disk speed and latency tolerance.

Context	Typical approach	Reason
Desktop with enough RAM	Moderately low swappiness.	Keep apps responsive.
Small laptop	Do not disable swap blindly.	Swap can prevent abrupt OOM.
Database server	Avoid active swapping.	Swap can hurt latency heavily.
Batch workload	Some swap may be acceptable.	Throughput may tolerate latency.
VM with slow disk	Be careful with swap activity.	Slow storage amplifies latency.

Inspect memory and swappiness

# Current swappiness
                            cat /proc/sys/vm/swappiness
                            sysctl vm.swappiness

                            # Memory overview
                            free -h

                            # Swap devices/files
                            swapon --show

                            # Swap activity
                            vmstat 1

                            # Top memory processes
                            ps aux --sort=-%mem | head -30

Temporary and persistent swappiness

# Temporary change until reboot
                            sudo sysctl -w vm.swappiness=10

                            # Persistent configuration: write the setting to a drop-in file
                            echo 'vm.swappiness = 10' | sudo tee /etc/sysctl.d/99-custom-swappiness.conf

                            # Apply persistent sysctl files
                            sudo sysctl --system

                            # Verify
                            sysctl vm.swappiness

Swappiness decision tree

Considering swappiness change?
                            │
                            ├── Is there real swap activity?
                            │       ├── no -> do not tune yet
                            │       └── yes
                            │
                            ├── Is system slow because of swapping?
                            │       ├── no -> investigate app first
                            │       └── yes
                            │
                            ├── Is RAM insufficient?
                            │       ├── yes -> reduce workload or add RAM
                            │       └── no
                            │
                            └── Test lower value
                                    ├── apply temporarily
                                    ├── measure behavior
                                    ├── document result
                                    └── make persistent only if useful

Memory interpretation

Signal	Meaning
High used memory	Normal if Linux is using cache.
Low available memory	Possible pressure.
Swap used but stable	Not always a problem.
Active swap in/out	Performance warning.
OOM logs	Memory exhaustion occurred.

Swappiness rule: tune only after observing memory pressure. The best fix for constant swapping is often less workload or more RAM, not only a sysctl value.
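"Observing memory pressure" can be made concrete by sampling the kernel's swap counters twice and comparing them. The sketch below reads the `pswpin` and `pswpout` fields from `/proc/vmstat`; the helper names and the one-second interval are illustrative assumptions.

```shell
# Hypothetical helper: total pages swapped in + out since boot, from a vmstat-style file.
swap_pages_total() {
    awk '$1 == "pswpin" || $1 == "pswpout" { total += $2 } END { print total + 0 }' \
        "${1:-/proc/vmstat}"
}

# Sample twice and print the difference; non-zero means swap is active right now.
swap_activity() {
    interval=${1:-5}
    before=$(swap_pages_total)
    sleep "$interval"
    after=$(swap_pages_total)
    echo $((after - before))
}

if [ -r /proc/vmstat ]; then
    echo "Pages swapped during the last second: $(swap_activity 1)"
fi
```

A result that stays at zero over several samples suggests that used swap is stale rather than active, which matches the "swap used but stable" case in the table above.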

Cleanup: temporary files, caches, logs and safe disk hygiene

Cleanup keeps Ubuntu healthy, but careless cleanup can delete useful data. Focus on safe areas first: APT cache, unused packages, journal size, trash, thumbnails, old downloads and application caches. Be very careful with database directories, Docker volumes and project folders.

Cleanup target	Command / location	Safety level
APT cache	`sudo apt clean`	Safe.
Unused packages	`sudo apt autoremove`	Usually safe, review output.
Systemd journal	`sudo journalctl --vacuum-time=14d`	Safe if retention is acceptable.
User trash	File manager or trash path.	Safe if reviewed.
Downloads	`~/Downloads`	Manual review recommended.
Docker data	`docker system df`	Careful, volumes may contain data.
Database files	`/var/lib/mysql`, `/var/lib/postgresql`	Dangerous to delete manually.

Safe cleanup commands

# Check disk usage first
                            df -h

                            # Show top-level directory sizes
                            sudo du -xhd1 / 2>/dev/null | sort -h

                            # Clean APT cache
                            sudo apt clean

                            # Remove unused packages
                            sudo apt autoremove

                            # Show journal size
                            journalctl --disk-usage

                            # Vacuum journal by time
                            sudo journalctl --vacuum-time=14d

                            # Vacuum journal by size
                            sudo journalctl --vacuum-size=1G

User cache cleanup

# Check user cache size
                            du -sh ~/.cache 2>/dev/null

                            # Check thumbnails
                            du -sh ~/.cache/thumbnails 2>/dev/null

                            # Remove thumbnail cache
                            rm -rf ~/.cache/thumbnails/*

                            # Review downloads manually
                            du -sh ~/Downloads/*
                            ls -lah ~/Downloads

Docker cleanup caution

# Show Docker disk usage
                            docker system df

                            # Remove unused images only
                            docker image prune

                            # Remove stopped containers
                            docker container prune

                            # More aggressive cleanup, use carefully
                            docker system prune

                            # Dangerous for persistent data if volumes included
                            docker system prune --volumes

Cleanup decision tree

Need disk space?
                            │
                            ├── Check filesystem
                            │       └── df -h
                            │
                            ├── Find large directories
                            │       └── du -xhd1 /
                            │
                            ├── Safe cleanup first
                            │       ├── apt clean
                            │       ├── apt autoremove
                            │       ├── journal vacuum
                            │       └── trash/downloads review
                            │
                            ├── App-specific cleanup
                            │       ├── browser cache
                            │       ├── Docker images
                            │       └── old build artifacts
                            │
                            └── Dangerous data zones
                                    ├── databases
                                    ├── Docker volumes
                                    ├── project data
                                    └── backups

Cleanup warning: never delete files in database directories or Docker volumes unless you fully understand what owns them and have a backup.
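The first step of the decision tree, checking the filesystem before deleting anything, can be scripted as a simple threshold check. The sketch below parses `df -P` output; the 80% threshold and the helper names `usage_pct` and `needs_cleanup` are assumptions for illustration.

```shell
# Hypothetical helper: print the used percentage (without %) of the filesystem holding a path.
usage_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# Succeed when the percentage is at or above the threshold (default 80).
needs_cleanup() {
    [ "$1" -ge "${2:-80}" ]
}

pct=$(usage_pct /)
if needs_cleanup "$pct" 80; then
    echo "/ is ${pct}% full: start with apt clean, apt autoremove and a journal vacuum"
else
    echo "/ is ${pct}% full: no cleanup needed yet"
fi
```

The script only reports; it never deletes, which keeps the dangerous data zones out of reach of automation.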

Troubleshooting customization and optimization problems

Customization problems usually appear after installing extensions, changing themes, modifying startup apps, changing power tools, altering sysctl settings or cleaning too aggressively. The fastest recovery is to isolate the last change and revert it.

Symptom	Likely cause	First check	Fix direction
Desktop shell unstable	Broken GNOME extension.	`gnome-extensions list --enabled`	Disable recent extension.
Text unreadable	Theme mismatch.	GNOME Tweaks theme settings.	Return to default theme.
Login slow	Startup apps or extensions.	User journal and startup apps.	Disable nonessential startup entries.
Battery drains fast	High CPU app, containers, VM, browser.	`top`, power profile.	Stop heavy workload, set power saver.
System slow after tuning	Bad sysctl or swap behavior.	`vmstat 1`, sysctl values.	Revert tuning.
Missing files after cleanup	Over-aggressive delete.	Trash, backup, shell history.	Restore from backup if possible.

Diagnostic commands

# GNOME version and session
                            gnome-shell --version
                            echo $XDG_SESSION_TYPE

                            # Enabled extensions
                            gnome-extensions list --enabled

                            # User session warnings
                            journalctl --user -p warning --since "1 hour ago"

                            # GNOME Shell logs
                            journalctl /usr/bin/gnome-shell --since "1 hour ago"

                            # Resource usage
                            top
                            free -h
                            df -h

                            # Swappiness
                            sysctl vm.swappiness

Rollback flow

Customization issue
                            │
                            ├── What changed last?
                            │       ├── extension
                            │       ├── theme
                            │       ├── shortcut
                            │       ├── startup app
                            │       ├── power tool
                            │       └── sysctl value
                            │
                            ├── Disable or revert one change
                            ├── Log out and log back in if needed
                            ├── Check user journal
                            ├── Verify desktop stability
                            └── Document stable configuration

Safe mode mindset

If desktop is unstable:
                            1. Switch to a text console if possible (for example Ctrl+Alt+F3)
                            2. Disable recent extensions
                            3. Return to default theme
                            4. Remove recent startup app
                            5. Reboot or log out
                            6. Restore Timeshift snapshot if needed

Useful reset targets

# Disable one extension
                            gnome-extensions disable extension-name

                            # List user autostart entries
                            ls -lah ~/.config/autostart

                            # Move suspicious autostart entry away
                            mkdir -p ~/.config/autostart.disabled
                            mv ~/.config/autostart/app.desktop ~/.config/autostart.disabled/

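                            # Revert sysctl custom file
                            sudo mv /etc/sysctl.d/99-custom-swappiness.conf /tmp/
                            sudo sysctl --system

                            # Removing the file does not reset the running value; restore the default explicitly
                            sudo sysctl -w vm.swappiness=60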

Troubleshooting rule: customization failures are easiest to fix when you change one thing at a time and keep a restore point before major changes.
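Changing one thing at a time is easier when each change has a matching undo. As a small sketch, the autostart commands above can be wrapped into a reversible pair. The `autostart.disabled` directory follows the example above; the function names are hypothetical.

```shell
# Resolve the user autostart directory at call time (honours XDG_CONFIG_HOME).
autostart_dir() {
    printf '%s\n' "${XDG_CONFIG_HOME:-$HOME/.config}/autostart"
}

# Park one autostart entry in autostart.disabled instead of deleting it.
disable_autostart() {
    src="$(autostart_dir)/$1"
    dst="$(autostart_dir).disabled"
    [ -f "$src" ] || { echo "no such entry: $1" >&2; return 1; }
    mkdir -p "$dst" && mv "$src" "$dst/"
}

# Move a parked entry back so it starts again at the next login.
restore_autostart() {
    src="$(autostart_dir).disabled/$1"
    [ -f "$src" ] || { echo "not parked: $1" >&2; return 1; }
    mkdir -p "$(autostart_dir)" && mv "$src" "$(autostart_dir)/"
}

# Usage sketch (entry name is illustrative):
#   disable_autostart app.desktop
#   ... log out, log in, verify ...
#   restore_autostart app.desktop
```

Because nothing is deleted, the test is fully reversible even without a restore point.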

Final customization and optimization checklist

Customization checklist

[ ] Built-in Settings used before extensions
                            [ ] GNOME Tweaks installed if needed
                            [ ] Extension list is short and useful
                            [ ] Extensions are compatible with GNOME version
                            [ ] Unused extensions removed
                            [ ] Theme source is trusted
                            [ ] Default theme remains available
                            [ ] Icons remain readable
                            [ ] Terminal colors remain readable
                            [ ] Keyboard shortcuts are documented
                            [ ] No shortcut runs destructive command
                            [ ] Startup applications are reviewed
                            [ ] Restore point exists before major desktop changes

Optimization checklist

[ ] Power profile configured
                            [ ] Battery behavior reviewed
                            [ ] Heavy startup apps disabled
                            [ ] Disk usage checked
                            [ ] APT cache cleaned when needed
                            [ ] Journal size controlled
                            [ ] User caches reviewed
                            [ ] Docker usage reviewed if installed
                            [ ] Swappiness observed before tuning
                            [ ] sysctl changes documented
                            [ ] Performance measured before and after changes
                            [ ] Cleanup avoids databases and important volumes

Command cheat sheet

# GNOME and extensions
                            gnome-shell --version
                            gnome-extensions list
                            gnome-extensions list --enabled
                            gnome-extensions disable extension-name
                            sudo apt install gnome-tweaks
                            sudo apt install gnome-shell-extension-manager

                            # Power
                            powerprofilesctl
                            powerprofilesctl set power-saver
                            powerprofilesctl set balanced
                            sudo apt install tlp
                            sudo systemctl enable --now tlp
                            sudo tlp-stat -s

                            # Memory and swappiness
                            free -h
                            swapon --show
                            vmstat 1
                            sysctl vm.swappiness
                            sudo sysctl -w vm.swappiness=10

                            # Cleanup
                            df -h
                            sudo du -xhd1 / 2>/dev/null | sort -h
                            sudo apt clean
                            sudo apt autoremove
                            journalctl --disk-usage
                            sudo journalctl --vacuum-time=14d

Final rule

A good Ubuntu workstation is comfortable, fast and recoverable.
Customize GNOME carefully, keep extensions minimal, use readable themes, build a keyboard-driven workflow, optimize battery and memory only with evidence, clean disk space safely, and keep rollback options before major changes.

Minimal safe profile

Minimum safe customization profile:
                            - default theme fallback
                            - small extension set
                            - documented shortcuts
                            - reviewed startup apps
                            - power profile selected
                            - disk cleanup routine
                            - no blind sysctl tuning
                            - no dangerous cleanup
                            - restore point before major changes
                            - stable desktop after logout/reboot test

4.1 Ubuntu Security Hardening: SSH, UFW, users, roles, updates, audit, fail2ban and cloud security

Security hardening objective

Ubuntu hardening means reducing the attack surface of a machine while keeping it maintainable. The goal is not to make the server impossible to use. The goal is to control access, reduce exposed ports, keep packages patched, monitor suspicious events, protect secrets, isolate services and keep a clear recovery path.

A secure Ubuntu server is built layer by layer: SSH access, users and sudo, firewall, package updates, service isolation, log visibility, intrusion throttling, cloud network rules, backups and incident procedures.

Security layer	Goal	Main tools	Failure prevented
SSH	Control remote administration.	`sshd_config`, SSH keys, logs.	Brute force, root login abuse, password compromise.
Firewall	Expose only required ports.	`ufw`, `nftables`, cloud security groups.	Unwanted network exposure.
Users and sudo	Apply least privilege.	`adduser`, `usermod`, `sudoers`.	Shared accounts, excessive privileges, poor auditability.
Updates	Patch known vulnerabilities.	`apt`, unattended upgrades, reboot policy.	Known CVEs left exploitable.
Audit	See what happened.	`journalctl`, `auth.log`, auditd, central logs.	Blind incidents and no forensic trail.
Cloud	Control external exposure and identity.	Security groups, IAM, metadata settings, snapshots.	Public services, leaked secrets, weak recovery.

Core rule: hardening must remain observable and reversible. Every security change should be documented, testable and recoverable.

Hardening architecture map

Internet
                            │
                            ├── DNS
                            ├── CDN / WAF / Load Balancer
                            └── Cloud security group
                            │
                            ▼
                            Ubuntu server
                            │
                            ├── UFW / nftables
                            ├── SSH daemon
                            ├── system users and sudo
                            ├── systemd services
                            ├── package security updates
                            ├── logs and audit trail
                            ├── fail2ban or rate controls
                            ├── secrets and permissions
                            └── backups and restore plan
                            │
                            ▼
                            Application layer
                            ├── Nginx
                            ├── app runtime
                            ├── database
                            ├── Redis
                            └── monitoring agent

Security baseline priorities

Priority 1:
                            - SSH keys
                            - no root SSH login
                            - firewall enabled
                            - security updates
                            - backups

                            Priority 2:
                            - fail2ban or equivalent
                            - sudo policy
                            - service users
                            - secret permissions
                            - log review

                            Priority 3:
                            - auditd
                            - central logging
                            - file integrity checks
                            - vulnerability scanning
                            - CIS-style benchmark review

                            Priority 4:
                            - bastion host
                            - VPN-only administration
                            - WAF
                            - immutable images
                            - automated rebuild

SSH hardening: keys, root login, password policy and safe reload

SSH is usually the main administrative entry point. On a public server, weak SSH configuration is one of the first risks to address. The safe baseline is key-based login, no direct root login, no password authentication once key login has been validated, and an explicit list of allowed users.

Setting	Recommended value	Why
`PermitRootLogin`	`no`	Forces named-user login and sudo audit trail.
`PasswordAuthentication`	`no`	Blocks password brute-force login.
`PubkeyAuthentication`	`yes`	Uses SSH keys.
`AllowUsers`	Specific admin users only.	Reduces account exposure.
`X11Forwarding`	`no` on servers.	Reduces unused features.
`MaxAuthTries`	Small value such as `3`.	Limits repeated authentication attempts.

Generate and install key

# On admin workstation
                            ssh-keygen -t ed25519 -C "admin-server-access"

                            # Copy public key to server
                            ssh-copy-id deploy@server.example.com

                            # Test key login before changing server policy
                            ssh deploy@server.example.com

Safe SSH hardening flow

Open current SSH session
                            │
                            ├── Create deploy user
                            ├── Add SSH key
                            ├── Test second SSH session
                            ├── Backup sshd_config
                            ├── Apply hardening
                            ├── Validate syntax
                            ├── Restart SSH
                            ├── Test third SSH session
                            └── Close old session only after success

Server-side SSH configuration

# Create backup
                            sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak.$(date +%Y%m%d-%H%M%S)

                            # Edit configuration
                            sudo vim /etc/ssh/sshd_config

                            # Recommended baseline (directives to set inside sshd_config, not shell commands)
                            PermitRootLogin no
                            PasswordAuthentication no
                            PubkeyAuthentication yes
                            X11Forwarding no
                            MaxAuthTries 3
                            AllowUsers deploy

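                            # Validate syntax before restart
                            sudo sshd -t

                            # Confirm the effective values: on Ubuntu, drop-in files in /etc/ssh/sshd_config.d/ are included first and can override this file
                            sudo sshd -T | grep -Ei '^(permitrootlogin|passwordauthentication|pubkeyauthentication)'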

                            # Restart SSH
                            sudo systemctl restart ssh

                            # Check service and logs
                            systemctl status ssh
                            journalctl -u ssh --since "15 min ago"
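
The baseline table can also be checked mechanically. The sketch below assumes the "keyword value" output format of `sshd -T` (lowercase keywords); the function name `check_sshd_baseline` and the exact list of checked settings are illustrative choices, not part of OpenSSH.

```shell
# Sketch: compare effective sshd settings with the baseline from the table.
# Reads "keyword value" lines on stdin, as printed by `sshd -T`.
check_sshd_baseline() {
    awk '
        BEGIN {
            bad = 0
            want["permitrootlogin"] = "no"
            want["passwordauthentication"] = "no"
            want["pubkeyauthentication"] = "yes"
            want["x11forwarding"] = "no"
        }
        {
            key = tolower($1)
            val = tolower($2)
        }
        key in want {
            seen[key] = 1
            if (val != want[key]) {
                printf "DEVIATION %s=%s (expected %s)\n", key, val, want[key]
                bad = 1
            }
        }
        key == "maxauthtries" && ($2 + 0) > 3 {
            printf "DEVIATION maxauthtries=%s (expected 3 or less)\n", $2
            bad = 1
        }
        END {
            # Report baseline settings that never appeared in the input.
            for (k in want) {
                if (!(k in seen)) {
                    printf "MISSING %s\n", k
                    bad = 1
                }
            }
            exit bad
        }
    '
}

# Typical use on the server (sshd -T normally needs root):
#   sudo sshd -T | check_sshd_baseline && echo "baseline OK"
```

The function exits non-zero on any deviation, so it can gate the restart step in a script. `AllowUsers` is deliberately not checked because its expected value is site-specific.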

SSH diagnostic commands

# Service status
                            systemctl status ssh

                            # Listening port (sudo is needed to see the owning process name)
                            sudo ss -lntp | grep ssh

                            # Authentication logs
                            journalctl -u ssh --since today
                            sudo tail -100 /var/log/auth.log

                            # Current sessions
                            who
                            w

                            # Show user key file permissions
                            ls -lah ~/.ssh
                            ls -lah ~/.ssh/authorized_keys

Lockout warning: never disable password login until key login has been tested from a separate terminal.

UFW firewall: minimal exposure and safe activation

UFW is a simple firewall frontend commonly used on Ubuntu. The baseline is to deny incoming traffic by default, allow outgoing traffic, then open only the required service ports. On cloud servers, UFW complements cloud security groups; it does not replace them.

Port	Service	Exposure rule	Comment
`22/tcp`	SSH	Restrict by source IP if possible.	Administration path.
`80/tcp`	HTTP	Open only for web server or redirect.	Often redirects to HTTPS.
`443/tcp`	HTTPS	Open for public web apps.	Primary web entry point.
`5432/tcp`	PostgreSQL	Private network only.	Never public unless heavily controlled.
`6379/tcp`	Redis	Private network only.	Do not expose publicly.
`3306/tcp`	MySQL/MariaDB	Private network only.	Restrict by source and credentials.

Safe UFW baseline

# Show current status
                            sudo ufw status verbose

                            # Default policy
                            sudo ufw default deny incoming
                            sudo ufw default allow outgoing

                            # Allow SSH before enabling firewall
                            sudo ufw allow OpenSSH

                            # Web server ports if needed
                            sudo ufw allow 80/tcp
                            sudo ufw allow 443/tcp

                            # Enable firewall
                            sudo ufw enable

                            # Verify
                            sudo ufw status verbose
                            sudo ufw status numbered

Firewall decision diagram

New service installed
                            │
                            ├── Does it need network access?
                            │       ├── no  -> keep local only
                            │       └── yes
                            │
                            ├── Is it public-facing?
                            │       ├── yes -> allow only required public port
                            │       └── no
                            │
                            ├── Is it internal only?
                            │       ├── yes -> restrict to private CIDR or source IP
                            │       └── no
                            │
                            └── Is exposure documented?
                                    ├── yes -> add rule
                                    └── no  -> do not expose

Restrict access by source

# Allow SSH only from one admin IP
                            sudo ufw allow from 203.0.113.10 to any port 22 proto tcp

                            # Allow PostgreSQL only from app server
                            sudo ufw allow from 10.0.1.25 to any port 5432 proto tcp

                            # Delete rule by number
                            sudo ufw status numbered
                            sudo ufw delete 3

                            # Deny a specific IP
                            sudo ufw deny from 198.51.100.55
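
The decision diagram and the source-restricted rules can be combined into a small dry-run helper. This is a minimal sketch: the function name `ufw_rule` and the three exposure labels (`local`, `public`, `internal`) are hypothetical, and the helper only prints the command so it can be reviewed before anyone runs it.

```shell
# Sketch: print (not run) the UFW command for a documented exposure decision.
# Usage: ufw_rule <local|public|internal> <port/proto> [source-ip-or-cidr]
ufw_rule() {
    exposure=$1
    portproto=$2
    src=${3:-}
    port=${portproto%/*}     # "5432/tcp" -> "5432"
    proto=${portproto#*/}    # "5432/tcp" -> "tcp"

    case $exposure in
        local)
            echo "# $portproto: keep bound to localhost, no firewall rule needed"
            ;;
        public)
            echo "sudo ufw allow $portproto"
            ;;
        internal)
            if [ -z "$src" ]; then
                echo "error: internal exposure needs a source IP or CIDR" >&2
                return 1
            fi
            echo "sudo ufw allow from $src to any port $port proto $proto"
            ;;
        *)
            echo "error: unknown exposure '$exposure'" >&2
            return 2
            ;;
    esac
}
```

For example, `ufw_rule internal 5432/tcp 10.0.1.25` prints the same PostgreSQL rule shown above, while an internal service without a documented source is rejected instead of being opened to everyone.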

UFW diagnostics

# UFW status
                            sudo ufw status verbose

                            # Listening ports
                            ss -lntp

                            # Check service locally
                            curl -I http://localhost

                            # Check logs if logging enabled
                            sudo ufw logging on
                            sudo journalctl -k --since "30 min ago" | grep UFW

Cloud warning: a port may be blocked by UFW, cloud security group, NACL, load balancer, application bind address or service config. Check every layer.

Users, groups, sudo, service accounts and least privilege

Least privilege means each human and service gets only the permissions needed to do its job. Avoid shared admin accounts, avoid running applications as root, and keep secrets readable only by the users that need them.

Identity type	Recommended practice	Example
Human admin	Named account with sudo if required.	`deploy`, `ops_admin`
Application user	Dedicated non-login user.	`myapp`, `www-data`
Database user	Application-specific DB account.	`myapp_db_user`
Root	Avoid direct login.	Use sudo with audit trail.
Shared account	Avoid.	Hard to audit and revoke safely.

User and group commands

# Create admin user
                            sudo adduser deploy
                            sudo usermod -aG sudo deploy

                            # Create service user without login shell
                            sudo adduser --system --group --home /srv/myapp myapp

                            # Show user identity
                            id deploy
                            groups deploy

                            # Show sudo permissions
                            sudo -l

                            # Edit sudoers safely
                            sudo visudo

                            # Add sudoers file safely
                            sudo visudo -f /etc/sudoers.d/deploy

Least privilege model

Human admin
                            │
                            ├── SSH key login
                            ├── sudo for admin actions
                            └── no direct root login

                            Application service
                            │
                            ├── dedicated user
                            ├── limited filesystem access
                            ├── systemd service unit
                            └── no shell login if not needed

                            Secrets
                            │
                            ├── owned by service user or root
                            ├── mode 600 or 640
                            ├── not world-readable
                            └── not committed to git
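
The "mode 600 or 640" rule for secrets is easy to verify in a script. The sketch below assumes GNU `stat` (the default on Ubuntu); the function name `secret_mode_ok` is illustrative.

```shell
# Sketch: succeed only if a secret file has mode 600 or 640.
# Usage: secret_mode_ok /srv/myapp/.env
secret_mode_ok() {
    # GNU stat prints the octal permission bits with %a.
    mode=$(stat -c '%a' "$1") || return 2
    case $mode in
        600|640)
            return 0
            ;;
        *)
            echo "$1: mode $mode is outside the 600/640 baseline" >&2
            return 1
            ;;
    esac
}
```

A deployment script could call `secret_mode_ok /srv/myapp/.env || exit 1` before starting the service, so that a world-readable secret stops the rollout instead of going unnoticed.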

Secret and file permissions

# Private key
                            chmod 600 /home/deploy/.ssh/id_ed25519

                            # SSH directory
                            chmod 700 /home/deploy/.ssh

                            # Application directory (run the recursive commands first)
                            sudo chown -R myapp:www-data /srv/myapp
                            sudo chmod -R u=rwX,g=rX,o= /srv/myapp

                            # Application env file (set last, otherwise the recursive chown above hands it to the www-data group)
                            sudo chown root:myapp /srv/myapp/.env
                            sudo chmod 640 /srv/myapp/.env

Account review commands

# List users
                            cut -d: -f1 /etc/passwd

                            # Show users with shell access
                            grep -E "/bin/bash|/bin/sh|/usr/bin/zsh" /etc/passwd

                            # Show sudo group members
                            getent group sudo

                            # Show recent sudo usage
                            sudo grep sudo /var/log/auth.log | tail -100
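
Reviewing the sudo group is more reliable when it is compared against an explicit allowlist. The following sketch assumes the standard `getent group` line format (`name:x:gid:member1,member2`); the function name `unexpected_sudoers` and the example account names are hypothetical.

```shell
# Sketch: list sudo group members that are not on an approved list.
# Usage: getent group sudo | unexpected_sudoers "deploy ops_admin"
unexpected_sudoers() {
    allowed=" $1 "
    # Field 4 of a group entry is the comma-separated member list.
    awk -F: '{
        n = split($4, m, ",")
        for (i = 1; i <= n; i++) if (m[i] != "") print m[i]
    }' |
    while read -r user; do
        case $allowed in
            *" $user "*) ;;          # approved, print nothing
            *) echo "$user" ;;       # not on the allowlist
        esac
    done
}
```

Empty output means every member is approved. Any name that is printed should be investigated or removed from the group.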

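Production rule: a service should not need root to run. To serve ports below 1024, put Nginx in front as a reverse proxy or grant the service the `CAP_NET_BIND_SERVICE` capability in its systemd unit (`AmbientCapabilities=`) rather than running the application as root.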

Security updates, patch windows and reboot policy

Security updates close known vulnerabilities. On Ubuntu, patching must include package updates, service restarts, kernel reboots when required, and validation after patching. Production teams should define standard patch windows and emergency patch paths.

Patch model	Best for	Advantage	Risk
Manual patching	Critical systems with maintenance windows.	Control and validation.	Can be delayed.
Unattended security updates	Standard servers.	Fast CVE response.	Needs restart and reboot policy.
Golden image rebuild	Cloud fleets and stateless systems.	Reproducible and rollback-friendly.	Requires image pipeline.
Rolling patching	HA clusters.	Minimizes downtime.	Requires health checks and drain logic.

Patch commands

# Refresh metadata
                            sudo apt update

                            # Show upgradeable packages
                            apt list --upgradable

                            # Apply upgrades
                            sudo apt upgrade

                            # Full upgrade with dependency changes
                            sudo apt full-upgrade

                            # Remove obsolete packages
                            sudo apt autoremove

                            # Check reboot requirement
                            test -f /var/run/reboot-required && cat /var/run/reboot-required
                            cat /var/run/reboot-required.pkgs 2>/dev/null

Patch workflow diagram

Security update required
                            │
                            ├── Identify affected packages
                            ├── Check staging compatibility
                            ├── Snapshot or backup
                            ├── Apply apt updates
                            ├── Restart affected services
                            ├── Reboot if required
                            ├── Validate application
                            ├── Check logs
                            └── Document package changes

Unattended upgrades

# Install unattended upgrades
                            sudo apt install unattended-upgrades

                            # Enable basic automatic security updates
                            sudo dpkg-reconfigure --priority=low unattended-upgrades

                            # Config files
                            cat /etc/apt/apt.conf.d/20auto-upgrades
                            less /etc/apt/apt.conf.d/50unattended-upgrades

                            # Logs
                            sudo less /var/log/unattended-upgrades/unattended-upgrades.log
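
Whether automatic updates are really switched on can be confirmed from `20auto-upgrades`. This sketch assumes the usual two `APT::Periodic` lines written by the package; the function name `auto_upgrades_enabled` is illustrative, and `apt-config dump` remains the authoritative view when several config files overlap.

```shell
# Sketch: succeed only if both periodic settings are set to a non-zero value.
# Usage: auto_upgrades_enabled [path-to-20auto-upgrades]
auto_upgrades_enabled() {
    conf=${1:-/etc/apt/apt.conf.d/20auto-upgrades}
    for key in Update-Package-Lists Unattended-Upgrade; do
        # Expected line shape: APT::Periodic::<key> "1";
        grep -Eq "^[[:space:]]*APT::Periodic::${key}[[:space:]]+\"[1-9][0-9]*\";" "$conf" || {
            echo "APT::Periodic::${key} is not enabled in $conf" >&2
            return 1
        }
    done
}
```

Running `auto_upgrades_enabled && echo "unattended upgrades active"` after the `dpkg-reconfigure` step gives a quick yes/no answer that fits into a provisioning script.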

Post-patch validation

# Failed services
                            systemctl --failed

                            # Warnings since patch
                            journalctl -p warning --since "30 min ago"

                            # Listening ports
                            ss -lntp

                            # Application smoke test
                            curl -I https://example.com

                            # Confirm kernel after reboot
                            uname -a

Patch risk: installing a kernel security update without rebooting leaves the old kernel running. Track reboot-required status.
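
Tracking the reboot-required state is simple to automate. The sketch below relies on the flag files already shown in the patch commands (`/var/run/reboot-required` and its `.pkgs` companion); the function name `reboot_status` and the optional path argument are illustrative additions for testing.

```shell
# Sketch: report whether a reboot is pending and which packages asked for it.
# Usage: reboot_status [flag-file]   (defaults to /var/run/reboot-required)
reboot_status() {
    flag=${1:-/var/run/reboot-required}
    if [ ! -f "$flag" ]; then
        echo "no reboot flag"
        return 0
    fi
    echo "reboot required"
    if [ -f "$flag.pkgs" ]; then
        # The .pkgs file can repeat names, so de-duplicate before listing.
        sort -u "$flag.pkgs" | sed 's/^/  - /'
    fi
    return 1
}
```

The non-zero exit status when a reboot is pending makes the function usable in monitoring checks or in a patch script that must schedule a maintenance window.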

fail2ban: throttling brute-force attempts and noisy clients

fail2ban watches logs and temporarily bans IP addresses that match suspicious patterns, such as repeated SSH authentication failures. It is not a replacement for key-based SSH and firewall rules, but it is useful as an extra layer against brute-force noise.

Component	Meaning	Example
Jail	Protection rule for a service.	`sshd`
Filter	Log pattern that detects failures.	SSH failed login regex.
Action	What to do when matched.	Ban IP with firewall.
findtime	Time window for counting failures.	`10m`
maxretry	Number of failures before ban.	`5`
bantime	Ban duration.	`1h`
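
To make `maxretry` concrete, the sketch below counts failed password attempts per source address in auth-log style input. It is an illustration of the counting idea only, not fail2ban's implementation: it ignores the `findtime` window, and it assumes the common OpenSSH message shape "Failed password for ... from ADDRESS port ...". The function name `failed_ssh_sources` is hypothetical.

```shell
# Sketch: list source addresses with at least <maxretry> failed password attempts.
# Usage: failed_ssh_sources 5 < /var/log/auth.log
failed_ssh_sources() {
    awk -v maxretry="$1" '
        /Failed password/ {
            # The address is the field right after the word "from".
            for (i = 1; i < NF; i++) {
                if ($i == "from") {
                    count[$(i + 1)]++
                    break
                }
            }
        }
        END {
            for (ip in count) {
                if (count[ip] >= maxretry) {
                    print ip, count[ip]
                }
            }
        }
    ' | sort
}
```

Each output line is an address followed by its failure count, which helps when choosing a `maxretry` value that bans noisy sources without catching an administrator who mistyped a password once.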

Install and baseline

# Install
                            sudo apt update
                            sudo apt install fail2ban

                            # Create local jail config with only your overrides
                            # (do not edit jail.conf; a full copy would also shadow future default changes)
                            sudo vim /etc/fail2ban/jail.local

                            # Restart and enable
                            sudo systemctl enable fail2ban
                            sudo systemctl restart fail2ban

                            # Check status
                            sudo systemctl status fail2ban

Example SSH jail

[sshd]
                            enabled = true
                            port = ssh
                            filter = sshd
                            logpath = %(sshd_log)s
                            maxretry = 5
                            findtime = 10m
                            bantime = 1h

fail2ban operations

# Overall status
                            sudo fail2ban-client status

                            # Jail status
                            sudo fail2ban-client status sshd

                            # Ban an IP manually
                            sudo fail2ban-client set sshd banip 198.51.100.10

                            # Unban an IP
                            sudo fail2ban-client set sshd unbanip 198.51.100.10

                            # Logs
                            sudo journalctl -u fail2ban --since today
                            sudo tail -100 /var/log/fail2ban.log

Layered protection

SSH security layers
                            │
                            ├── SSH keys
                            ├── no root login
                            ├── no password auth
                            ├── AllowUsers
                            ├── UFW source restriction
                            ├── fail2ban
                            └── bastion or VPN for stricter environments

Practical rule: fail2ban reduces noise and slows attackers, but real SSH security starts with key-based access and minimal exposure.

Audit, logs, detection and security visibility

Hardening without visibility is incomplete. You need to know who logged in, who used sudo, which services failed, what ports are listening, which packages changed, and whether suspicious authentication events occurred.

Question	Command / source	Why it matters
Who logged in?	`last`, `who`, `w`	Session visibility.
Who used sudo?	`/var/log/auth.log`	Privilege escalation audit.
Which SSH attempts failed?	`journalctl -u ssh`	Brute-force or misconfiguration detection.
Which packages changed?	`/var/log/apt/history.log`	Patch and change traceability.
Which services failed?	`systemctl --failed`	Operational health.
Which ports are open?	`ss -lntp`	Exposure check.

Security log commands

# SSH logs
                            journalctl -u ssh --since today

                            # Authentication logs
                            sudo tail -200 /var/log/auth.log

                            # Failed SSH attempts
                            sudo grep -i "failed password" /var/log/auth.log | tail -100

                            # Sudo usage
                            sudo grep -i "sudo" /var/log/auth.log | tail -100

                            # Recent logins
                            last -a | head -30

                            # Current sessions
                            who
                            w
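
Raw `grep` output for sudo is noisy. The sketch below reduces it to "user: command" pairs, assuming the usual sudo log line shape (`sudo:   USER : TTY=... ; PWD=... ; USER=root ; COMMAND=...`); the function name `sudo_commands` is illustrative.

```shell
# Sketch: extract "user: command" pairs from sudo entries in auth-log style input.
# Usage: sudo cat /var/log/auth.log | sudo_commands | tail -50
sudo_commands() {
    # Group 2 is the invoking user, group 3 is everything after COMMAND=.
    sed -nE 's/.*sudo(\[[0-9]+\])?: +([^ ]+) : .*COMMAND=(.*)$/\2: \3/p'
}
```

Denied attempts also carry a `COMMAND=` field, so they appear in the output too. Piping the result through `sort | uniq -c | sort -rn` gives a quick view of who runs which privileged commands most often.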

Audit architecture

Ubuntu host
                            │
                            ├── journald
                            ├── auth.log
                            ├── apt history
                            ├── service logs
                            ├── firewall logs
                            ├── fail2ban logs
                            └── application logs
                            │
                            ▼
                            Central logging
                            │
                            ├── CloudWatch
                            ├── ELK / OpenSearch
                            ├── Loki
                            ├── SIEM
                            └── long-term archive

auditd baseline

# Install audit daemon
                            sudo apt install auditd audispd-plugins

                            # Enable service
                            sudo systemctl enable auditd
                            sudo systemctl start auditd

                            # Status
                            sudo systemctl status auditd

                            # Search audit logs
                            sudo ausearch -m USER_LOGIN
                            sudo ausearch -m USER_CMD
                            sudo aureport --summary

Security review snapshot

echo "== USERS WITH SHELL =="
                            grep -E "/bin/bash|/bin/sh|/usr/bin/zsh" /etc/passwd

                            echo "== SUDO GROUP =="
                            getent group sudo

                            echo "== OPEN PORTS =="
                            ss -lntp

                            echo "== FAILED UNITS =="
                            systemctl --failed

                            echo "== RECENT SSH LOGS =="
                            journalctl -u ssh --since "24 hours ago" --no-pager

Detection rule: collect logs before you need them. During an incident, missing logs cannot be reconstructed.

Cloud security: security groups, metadata, IAM, snapshots and bastion design

On cloud servers, Ubuntu security is shared between the operating system and the cloud perimeter. A safe design uses security groups, private subnets, IAM roles, metadata protection, snapshots, central logs and restricted administration paths.

Cloud control	Purpose	Ubuntu-side complement
Security group	Cloud firewall at instance or interface level.	UFW local firewall.
Private subnet	Keep databases and internal services non-public.	Bind services to private IP or localhost.
Bastion host	Controlled admin entry point.	SSH restricted to bastion IP.
IAM role	Grant cloud API permissions without static keys.	Avoid storing cloud keys on disk.
Metadata service controls	Reduce credential exposure risk.	Limit local process access and use least privilege.
Snapshots	Rollback and disaster recovery.	Test restore and document recovery.
Cloud logs	Centralize evidence and monitoring.	Forward Ubuntu logs and app logs.

Cloud exposure model

Public Internet
                            │
                            ├── HTTPS only
                            ▼
                            Load balancer / WAF
                            │
                            ├── forwards to app subnet
                            ▼
                            Application server
                            │
                            ├── UFW allows LB source only
                            ├── SSH allowed from bastion only
                            └── app talks to DB privately
                            │
                            ▼
                            Database server
                            ├── no public IP
                            ├── private subnet
                            └── port allowed only from app server

Cloud hardening checklist

[ ] Only required public ports are open
                            [ ] SSH is restricted by source IP or bastion
                            [ ] Database has no public exposure
                            [ ] Redis has no public exposure
                            [ ] Security groups are documented
                            [ ] UFW rules match cloud security model
                            [ ] IAM role uses least privilege
                            [ ] No static cloud keys in home directories
                            [ ] Instance metadata policy is reviewed
                            [ ] Snapshots are scheduled
                            [ ] Restore has been tested
                            [ ] Logs are shipped centrally
                            [ ] Monitoring alerts are configured

Cloud diagnostic commands

# Local listening ports
                            ss -lntp

                            # Local firewall
                            sudo ufw status verbose

                            # Instance view of routes and IPs
                            ip a
                            ip r

                            # Check outbound cloud metadata access if policy allows it
                            curl -s --max-time 2 http://169.254.169.254/ || true

                            # Check the service locally from the server itself
                            curl -I http://localhost

                            # Check logs
                            journalctl -p warning --since "30 min ago"

Cloud rule: do not rely on one firewall layer only. Use security groups and UFW together for critical servers.
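
One way to compare the host with the intended cloud model is to list which TCP ports listen on all interfaces. This sketch assumes the column layout of `ss -lnt` (local address in the fourth column); the function name `public_listeners` is illustrative.

```shell
# Sketch: print TCP ports that listen on every interface (0.0.0.0, [::] or *).
# Usage: ss -lnt | public_listeners
public_listeners() {
    awk '
        $1 == "LISTEN" {
            # Split "address:port" at the last colon so IPv6 addresses survive.
            n = split($4, parts, ":")
            port = parts[n]
            host = substr($4, 1, length($4) - length(port) - 1)
            if (host == "0.0.0.0" || host == "[::]" || host == "*") {
                print port
            }
        }
    ' | sort -un
}
```

Every port in the output should be accounted for in both the security group and the UFW rules. A database or Redis port appearing here means the service is not bound to localhost or a private address.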

Security incident response: brute force, exposed port, compromise suspicion

Security incidents must be handled carefully. The first objective is to preserve evidence and stop further damage. Avoid making random changes before collecting logs, current sessions, open ports and process state.

Incident	Immediate checks	Containment
SSH brute force	Auth logs, fail2ban, source IPs.	Restrict SSH, disable passwords, ban sources.
Unexpected open port	`ss -lntp`, service status, firewall.	Stop service or close firewall rule.
Suspicious user	`/etc/passwd`, sudo group, auth logs.	Lock account, preserve logs.
Package tampering	Apt history, modified repositories.	Disable unknown repos, rebuild if needed.
Possible compromise	Processes, ports, cron, users, logs.	Isolate host, snapshot disk, rotate credentials.
Secret exposure	Access logs, shell history, app logs.	Rotate keys, revoke tokens, audit access.

First response commands

# Current users and sessions
                            who
                            w
                            last -a | head -50

                            # Listening ports and processes
                            ss -lntp
                            ps aux --sort=-%cpu | head -30

                            # Failed services
                            systemctl --failed

                            # Recent auth activity
                            sudo tail -300 /var/log/auth.log

                            # SSH logs
                            journalctl -u ssh --since "24 hours ago"

                            # Recent package changes
                            less /var/log/apt/history.log
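
Because evidence should be preserved before anything is changed, the same commands can write their output to files instead of the terminal. The helper names `capture` and `collect_first_response` and the default directory under `/root` are assumptions for illustration; where possible, point the output at storage outside the suspect host.

```shell
# Sketch: save the output of one command, with a small header, to <outdir>/<name>.txt.
# Usage: capture <outdir> <name> <command...>
capture() {
    outdir=$1
    name=$2
    shift 2
    mkdir -p "$outdir" || return 1
    {
        echo "# command: $*"
        echo "# captured: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        "$@" 2>&1
    } > "$outdir/$name.txt"
}

# Sketch: run the first-response commands and keep their output together.
collect_first_response() {
    dir=${1:-/root/evidence-$(date -u +%Y%m%dT%H%M%SZ)}
    capture "$dir" sessions who
    capture "$dir" logins last -a
    capture "$dir" ports ss -lntp
    capture "$dir" processes ps aux
    capture "$dir" failed-units systemctl --failed --no-pager
    capture "$dir" ssh-log journalctl -u ssh --since "24 hours ago" --no-pager
    echo "evidence written to $dir"
}
```

Run `sudo` with a root shell or adapt the directory, then call `collect_first_response`. Each file records the exact command and a UTC timestamp, which helps when the timeline is reconstructed for the postmortem.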

Incident response flow

Security alert
                            │
                            ├── Preserve evidence
                            │       ├── logs
                            │       ├── sessions
                            │       ├── ports
                            │       └── process list
                            │
                            ├── Determine scope
                            │       ├── one account
                            │       ├── one service
                            │       ├── one host
                            │       └── multiple systems
                            │
                            ├── Contain
                            │       ├── firewall rule
                            │       ├── disable account
                            │       ├── stop service
                            │       └── isolate instance
                            │
                            ├── Eradicate
                            │       ├── patch
                            │       ├── remove access
                            │       ├── rotate secrets
                            │       └── rebuild if needed
                            │
                            └── Recover
                                    ├── restore service
                                    ├── validate logs
                                    ├── monitor closely
                                    └── write postmortem

Credential rotation checklist

[ ] SSH keys reviewed
                            [ ] Unknown keys removed
                            [ ] Sudo users reviewed
                            [ ] Application secrets rotated
                            [ ] Database passwords rotated
                            [ ] Cloud API keys revoked or rotated
                            [ ] CI/CD tokens rotated
                            [ ] Webhook secrets rotated
                            [ ] TLS private key checked
                            [ ] Backup access reviewed

Compromise rule: if root compromise is credible, rebuilding from a trusted image is usually safer than trying to clean the machine manually.

Final hardening checklist and command cheat sheet

Ubuntu security baseline checklist

[ ] Ubuntu LTS is used
                            [ ] Packages are updated
                            [ ] Reboot-required state is checked
                            [ ] Named admin user exists
                            [ ] Root SSH login is disabled
                            [ ] SSH key login is validated
                            [ ] Password SSH login is disabled
                            [ ] UFW default deny incoming is enabled
                            [ ] Only required ports are open
                            [ ] Database ports are private only
                            [ ] Redis ports are private only
                            [ ] fail2ban is installed if public SSH exists
                            [ ] Sudo group is reviewed
                            [ ] Service users are non-root
                            [ ] Secrets are not world-readable
                            [ ] Logs are reviewed and centralized if possible
                            [ ] Backups or snapshots exist
                            [ ] Restore has been tested
                            [ ] Cloud security groups are minimal
                            [ ] Incident response procedure exists

Quick security snapshot

echo "== OS =="
                            lsb_release -a

                            echo "== REBOOT REQUIRED =="
                            test -f /var/run/reboot-required && cat /var/run/reboot-required || echo "no reboot flag"

                            echo "== UFW =="
                            sudo ufw status verbose

                            echo "== OPEN PORTS =="
                            ss -lntp

                            echo "== SUDO USERS =="
                            getent group sudo

                            echo "== SSH LOGS =="
                            journalctl -u ssh --since "24 hours ago" --no-pager | tail -100

Command cheat sheet

# SSH
                            sudo sshd -t
                            sudo systemctl restart ssh
                            journalctl -u ssh --since today

                            # Firewall
                            sudo ufw status verbose
                            sudo ufw allow OpenSSH
                            sudo ufw allow 443/tcp
                            sudo ufw enable

                            # Users
                            id deploy
                            getent group sudo
                            sudo visudo
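                            sudo passwd -l username                # locks the password only; SSH key login still works
                            sudo usermod --expiredate 1 username   # disables the account entirely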

                            # Updates
                            sudo apt update
                            apt list --upgradable
                            sudo apt upgrade
                            test -f /var/run/reboot-required && cat /var/run/reboot-required

                            # fail2ban
                            sudo fail2ban-client status
                            sudo fail2ban-client status sshd

                            # Logs
                            sudo tail -100 /var/log/auth.log
                            journalctl -p warning --since today
                            systemctl --failed

Final rule

Ubuntu hardening is a living process.
Secure the access path, minimize exposed ports, patch regularly, run services with least privilege, monitor logs, keep backups, test recovery, and document every exception.

Minimal secure server profile

Minimum secure Ubuntu server:
                            - Ubuntu LTS
                            - SSH key access only
                            - no root SSH login
                            - UFW enabled
                            - only required ports open
                            - security updates applied
                            - non-root service users
                            - secrets protected
                            - logs available
                            - backups tested
                            - cloud perimeter restricted

4.2 Ubuntu Performance & Robustness: CPU, RAM, IO, kernel, tuning, monitoring and production stability

Performance and robustness objective

Ubuntu performance engineering is not random tuning. It is a disciplined loop: observe real metrics, identify the bottleneck, make one controlled change, measure again, document the result, and keep rollback possible.

Production robustness comes from stable LTS releases, predictable package updates, systemd service supervision, good logs, monitoring, disk hygiene, firewalling, backup, capacity planning and incident runbooks. Ubuntu is considered stable in production because these operational primitives are mature, widely documented and automation-friendly.

Layer	Observe	Typical bottleneck	Main tools
CPU	Load average, CPU %, run queue, per-process usage.	Too many workers, hot loop, compression, TLS, DB query CPU.	`top`, `htop`, `pidstat`, `mpstat`
Memory	Used RAM, available RAM, swap, OOM kills.	Memory leak, cache pressure, too many processes.	`free`, `vmstat`, `journalctl`
Disk / IO	IO wait, disk latency, queue depth, filesystem usage.	Slow volume, log growth, DB writes, Docker layers.	`iostat`, `iotop`, `df`, `du`
Network	Ports, connections, packet errors, latency, throughput.	Firewall, DNS, saturation, SYN flood, bad route.	`ss`, `ip`, `mtr`, `nload`
Services	systemd state, restarts, logs, health checks.	Crash loop, bad config, missing dependency.	`systemctl`, `journalctl`

Core method: measure first, tune second. Most bad tuning comes from changing kernel parameters before proving where the bottleneck is.

Performance investigation map

Application is slow
                            │
                            ├── CPU saturated?
                            │       ├── yes -> top, htop, pidstat, app profiler
                            │       └── no
                            │
                            ├── Memory pressure?
                            │       ├── yes -> free, vmstat, OOM logs, process RSS
                            │       └── no
                            │
                            ├── IO wait high?
                            │       ├── yes -> iostat, iotop, df, DB/log writes
                            │       └── no
                            │
                            ├── Network slow?
                            │       ├── yes -> ss, ip, mtr, DNS, firewall
                            │       └── no
                            │
                            ├── Service unstable?
                            │       ├── yes -> systemctl, journalctl, restart policy
                            │       └── no
                            │
                            └── Application bottleneck?
                                    ├── DB query
                                    ├── external API
                                    ├── lock contention
                                    ├── cache miss
                                    └── bad algorithm

Install performance toolkit

sudo apt update

                            sudo apt install -y \
                            sysstat \
                            iotop \
                            htop \
                            nload \
                            iftop \
                            mtr-tiny \
                            dstat \
                            strace \
                            lsof \
                            curl \
                            dnsutils

Production warning: tools such as strace, perf or heavy tracing can add overhead. Use carefully on busy production systems.

CPU: load average, saturation, processes and worker sizing

CPU performance issues usually appear as high load average, high user CPU, high system CPU, excessive context switching or too many runnable processes. On web servers, wrong worker counts can create CPU contention or memory pressure.

Metric	Meaning	Warning sign	Command
Load average	Runnable plus uninterruptible (IO-waiting) tasks, averaged over 1, 5 and 15 min.	Load consistently above the CPU core count.	`uptime`
User CPU	Application code CPU usage.	One process dominates.	`top`, `pidstat`
System CPU	Kernel work.	High network, IO, syscall overhead.	`mpstat`
IO wait	CPU waiting on disk.	App slow but CPU not busy.	`iostat`, `top`
Steal time	VM CPU stolen by hypervisor.	Cloud instance contention.	`mpstat`
Context switches	Task switching overhead.	Too many workers or threads.	`vmstat`
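
The "load above CPU cores" warning sign is easier to judge as a ratio. The sketch below divides the 1-minute load average by the core count. The helper name `load_per_core` and its optional arguments are assumptions made for illustration, not a standard tool.

```shell
# Sketch: normalise the 1-minute load average by the CPU core count.
# load_per_core is a hypothetical helper; both arguments are optional
# and exist mainly so the arithmetic can be checked with fixed values.
load_per_core() {
    local load="${1:-$(cut -d' ' -f1 /proc/loadavg)}"
    local cores="${2:-$(nproc)}"
    awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}
```

Called without arguments it reads `/proc/loadavg` and `nproc`. A value that stays well above 1.00 suggests more runnable or IO-blocked tasks than cores, which is a hint to look at `mpstat` and `pidstat` next, not a diagnosis by itself.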

CPU commands

# Load average and uptime
                            uptime

                            # Interactive CPU/process view
                            top
                            htop

                            # Per-CPU statistics
                            mpstat -P ALL 1

                            # Per-process CPU every second
                            pidstat -u 1

                            # Top CPU-consuming processes
                            ps aux --sort=-%cpu | head -30

                            # Threads of a process
                            ps -L -p PID -o pid,tid,pcpu,pmem,comm

CPU diagnosis flow

High CPU or high load
                            │
                            ├── Is load higher than CPU core count?
                            │       └── uptime, nproc
                            │
                            ├── Is CPU user, system, iowait or steal?
                            │       └── top, mpstat
                            │
                            ├── Which process dominates?
                            │       └── ps aux --sort=-%cpu
                            │
                            ├── Is it app code, DB, web server, backup, cron?
                            │       └── systemctl, logs, cron
                            │
                            ├── Did traffic increase?
                            │       └── nginx logs, app metrics
                            │
                            └── Did a deployment or package update happen?
                                    └── deploy logs, apt history

Worker sizing examples

Gunicorn starting point:
                            workers = (2 * CPU cores) + 1

                            Example:
                            2 vCPU -> 5 workers
                            4 vCPU -> 9 workers

                            But verify with:
                            - memory per worker
                            - request latency
                            - DB connection limit
                            - CPU saturation
                            - queue time
                            - error rate

                            Celery:
                            - CPU-bound tasks: concurrency near CPU cores
                            - IO-bound tasks: higher concurrency can help
                            - DB-heavy tasks: limit by database capacity

CPU rule: more workers are not always better. Too many workers can increase memory usage, DB connections, context switches and latency.
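
The Gunicorn starting point and the "verify with memory per worker" caveat can be sketched as two small functions. The names `gunicorn_workers` and `capped_workers`, and the idea of capping by available memory divided by per-worker memory, are illustrative assumptions. Measure real per-worker RSS before trusting any number.

```shell
# Sketch: (2 * cores) + 1 as a starting point, then cap by memory.
# Both helper names are hypothetical.
gunicorn_workers() {
    local cores="${1:-$(nproc)}"
    echo $(( 2 * cores + 1 ))
}

# capped_workers CORES AVAILABLE_MB PER_WORKER_MB
# Returns the smaller of the CPU-based and memory-based worker counts.
capped_workers() {
    local cores="$1" avail_mb="$2" per_worker_mb="$3" by_cpu by_mem
    by_cpu=$(gunicorn_workers "$cores")
    by_mem=$(( avail_mb / per_worker_mb ))
    if [ "$by_mem" -lt "$by_cpu" ]; then echo "$by_mem"; else echo "$by_cpu"; fi
}
```

For example, `capped_workers 4 2048 512` yields 4 because memory, not CPU, is the limiting resource in that case.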

Memory: RAM, cache, swap, OOM killer and service limits

Linux uses free memory for cache, so “used memory” is not automatically a problem. The important indicators are available memory, swap activity, OOM kills, growing process RSS, and whether memory pressure correlates with latency or crashes.

Metric	Meaning	Bad sign	Command
Available RAM	Memory that can be used without heavy reclaim.	Very low for sustained period.	`free -h`
Swap used	Memory pages moved to disk.	Growing swap + latency.	`swapon --show`
si / so	Memory swapped in from / out to disk per second.	Sustained non-zero values under load.	`vmstat 1`
RSS	Resident memory per process.	Process grows without bound.	`ps`, `top`
OOM kill	Kernel killed process due to memory exhaustion.	Service disappears suddenly.	`journalctl -k`
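
Because "available" matters more than "used", a simple check can read `MemAvailable` directly from `/proc/meminfo`. This is a minimal sketch. The helper name `mem_available_pct` is an assumption, and the optional file argument exists only so the parsing can be exercised against a sample file.

```shell
# Sketch: percentage of RAM the kernel reports as available.
# mem_available_pct is a hypothetical helper name.
mem_available_pct() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }' "$meminfo"
}
```

A low value sustained over time is the signal worth alerting on. A single low reading during a batch job usually is not.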

Memory commands

# Memory overview
                            free -h

                            # Swap
                            swapon --show

                            # VM activity
                            vmstat 1

                            # Top memory processes
                            ps aux --sort=-%mem | head -30

                            # Kernel OOM events
                            journalctl -k --since today | grep -i -E "oom|killed process"

                            # Memory per service process
                            systemctl status myservice
                            ps -o pid,rss,vsz,cmd -C gunicorn
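
To see which processes the OOM killer actually targeted, the kernel log lines can be summarised. The sketch below assumes the common kernel message form containing `Killed process <pid> (<name>)`. The exact wording varies between kernel versions, and `oom_victims` is a hypothetical helper name.

```shell
# Sketch: count OOM-killed processes by name from kernel log lines on stdin.
# Assumes messages contain "Killed process <pid> (<name>)".
oom_victims() {
    sed -nE 's/.*Killed process [0-9]+ \(([^)]*)\).*/\1/p' | sort | uniq -c | sort -nr
}

# Typical use on a host with journald:
#   journalctl -k --since today --no-pager | oom_victims
```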

Memory diagnosis flow

Service slow or killed
                            │
                            ├── Is available memory low?
                            │       └── free -h
                            │
                            ├── Is swap active?
                            │       └── swapon --show, vmstat 1
                            │
                            ├── Any OOM kills?
                            │       └── journalctl -k | grep -i oom
                            │
                            ├── Which process uses memory?
                            │       └── ps aux --sort=-%mem
                            │
                            ├── Did memory grow after deploy?
                            │       └── compare metrics before/after
                            │
                            └── Can service be limited?
                                    └── systemd MemoryMax, worker count, app config

systemd memory limit example

# /etc/systemd/system/myapp.service.d/limits.conf
                            [Service]
                            MemoryMax=1G
                            MemoryHigh=800M
                            Restart=on-failure
                            RestartSec=5

                            # Apply
                            sudo systemctl daemon-reload
                            sudo systemctl restart myapp
                            systemctl status myapp

Swap policy

Context	Swap recommendation	Reason
Small VM	Small swap can prevent abrupt OOM.	Graceful degradation.
Latency-sensitive DB	Avoid heavy swap activity.	Swap can destroy latency.
Batch worker	Some swap acceptable.	Throughput may tolerate latency.

Memory rule: swap used is not always fatal. Active swapping under load is the real warning sign.
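
The difference between "swap used" and "active swapping" can be measured from the kernel's cumulative `pswpin` / `pswpout` counters in `/proc/vmstat`. The following is a sketch under that assumption. `swap_counters` and `swap_activity` are hypothetical helper names, and the 5-second default interval is arbitrary.

```shell
# Sketch: distinguish static swap usage from active swapping.
# swap_counters prints "pswpin pswpout" (pages since boot).
swap_counters() {
    awk '$1 == "pswpin"  { i = $2 }
         $1 == "pswpout" { o = $2 }
         END { print i + 0, o + 0 }' "${1:-/proc/vmstat}"
}

# swap_activity [SECONDS]: difference between two samples.
swap_activity() {
    local interval="${1:-5}" before after
    before=$(swap_counters)
    sleep "$interval"
    after=$(swap_counters)
    set -- $before $after
    echo "swap-in pages: $(( $3 - $1 )), swap-out pages: $(( $4 - $2 ))"
}
```

Deltas that stay at zero while `swapon --show` reports used swap usually mean old, idle pages were moved out. Growing deltas under load match the warning sign described above.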

Disk and IO: latency, throughput, filesystem usage and log growth

Disk IO bottlenecks often look like application slowness. CPU may appear idle while requests are stuck waiting for disk. Common causes: database writes, slow cloud volume, Docker logs, journal growth, backups, missing indexes, swap activity or full filesystem.

Symptom	Likely cause	Verification	Correction
High app latency	IO wait or DB disk pressure.	`iostat -xz 1`	Faster disk, batching, DB tuning.
Disk full	Logs, Docker, uploads, backups.	`df -h`, `du -sh`	Retention, cleanup, bigger volume.
Swap activity	RAM shortage.	`vmstat 1`	Reduce workers, add RAM, tune app.
Docker grows fast	Images, containers, logs, volumes.	`docker system df`	Log rotation, prune carefully.
Journal too large	Systemd journal retention unmanaged.	`journalctl --disk-usage`	Vacuum or configure retention.

Disk and IO commands

# Filesystem usage
                            df -h

                            # Largest top-level directories
                            sudo du -sh /* 2>/dev/null | sort -h

                            # Common growth areas
                            sudo du -sh /var/log/*
                            sudo du -sh /var/lib/docker/*
                            sudo du -sh /var/lib/postgresql/* 2>/dev/null

                            # Block devices
                            lsblk -f
                            findmnt

                            # IO statistics
                            iostat -xz 1

                            # IO by process
                            sudo iotop -o

                            # Journal usage
                            journalctl --disk-usage

IO bottleneck flow

Application latency spike
                            │
                            ├── Is CPU iowait high?
                            │       └── top, iostat
                            │
                            ├── Which disk is busy?
                            │       └── iostat -xz 1
                            │
                            ├── Which process writes?
                            │       └── iotop -o
                            │
                            ├── Is filesystem near full?
                            │       └── df -h
                            │
                            ├── Did logs or Docker grow?
                            │       └── du -sh /var/log /var/lib/docker
                            │
                            └── Is database the writer?
                                    ├── check slow queries
                                    ├── check checkpoints
                                    └── check volume IOPS

Safe cleanup examples

# Clean apt cache
                            sudo apt clean

                            # Remove unused packages
                            sudo apt autoremove

                            # Vacuum systemd journal by time
                            sudo journalctl --vacuum-time=14d

                            # Vacuum systemd journal by size
                            sudo journalctl --vacuum-size=1G

                            # Docker usage
                            docker system df

                            # Docker cleanup - careful on production
                            docker image prune
                            docker container prune
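
Vacuuming is a one-off action. Retention can also be made persistent with a journald drop-in. The sketch below writes one using the `SystemMaxUse` and `MaxRetentionSec` options from `journald.conf`. The function name, the file name `retention.conf` and the 1G / 14-day values are assumptions that mirror the vacuum examples above.

```shell
# Sketch: persistent journald retention via a drop-in file.
# write_journald_retention and retention.conf are hypothetical names.
write_journald_retention() {
    local dir="${1:-/etc/systemd/journald.conf.d}"
    mkdir -p "$dir"
    cat > "$dir/retention.conf" <<'EOF'
[Journal]
SystemMaxUse=1G
MaxRetentionSec=14day
EOF
}

# On a real host (as root):
#   write_journald_retention && systemctl restart systemd-journald
```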

Preventive controls

Prevent disk incidents:
                            - alert when filesystem > 80%
                            - separate /var for log-heavy servers
                            - configure logrotate
                            - configure Docker log rotation
                            - monitor journal size
                            - monitor database volume
                            - keep backup volume separate
                            - test volume resize procedure
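
The first control, alerting when a filesystem passes 80%, can be prototyped with `df -P` before a monitoring agent is in place. This is a minimal sketch. `fs_over_threshold` is a hypothetical helper, and mount points containing spaces are not handled.

```shell
# Sketch: list mount points above a usage threshold.
# Reads `df -P` output on stdin; $1 is the threshold in percent.
fs_over_threshold() {
    local limit="${1:-80}"
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 > limit) print $6, use "%"
    }'
}

# Typical use:
#   df -P -x tmpfs -x devtmpfs | fs_over_threshold 80
```

The same pattern works for inodes by feeding it `df -P -i`.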

Critical rule: never manually delete unknown files inside database directories. Use database-native cleanup, backups, vacuum or retention procedures.

Network performance: ports, connections, latency, packet loss and throughput

Network issues may appear as API latency, timeouts, intermittent failures, failed database connections, slow downloads or connection storms. Diagnose from local socket state outward: listening ports, established connections, interface errors, DNS, routing, packet loss and remote latency.

Question	Command	What to look for
Which ports listen?	`ss -lntp`	Expected services only.
How many connections?	`ss -s`	Established, time-wait, orphaned sockets.
Interface errors?	`ip -s link`	RX/TX errors, dropped packets.
DNS working?	`dig`, `resolvectl`	Resolver latency and correctness.
Packet loss or route issue?	`mtr -rw`	Loss, latency, bad hop.
Bandwidth usage?	`nload`, `iftop`	Unexpected egress or ingress.

Network commands

# Socket summary
                            ss -s

                            # Listening TCP ports
                            ss -lntp

                            # Established connections
                            ss -ntp state established

                            # Network interfaces and counters
                            ip -s link

                            # Routes
                            ip r

                            # DNS
                            resolvectl status
                            dig example.com

                            # Latency and packet loss
                            ping -c 5 1.1.1.1
                            mtr -rw example.com

                            # Live bandwidth
                            nload
                            sudo iftop

Network diagnosis flow

Network latency or timeout
                            │
                            ├── Local service listening?
                            │       └── ss -lntp
                            │
                            ├── Firewall blocking?
                            │       └── ufw status, cloud security group
                            │
                            ├── DNS slow or wrong?
                            │       └── dig, resolvectl
                            │
                            ├── Route correct?
                            │       └── ip r
                            │
                            ├── Packet loss?
                            │       └── ping, mtr
                            │
                            ├── Interface drops?
                            │       └── ip -s link
                            │
                            └── Too many connections?
                                    └── ss -s, logs, rate limits

Connection pressure examples

# Count connections by state
                            ss -ant | awk 'NR>1 {state[$1]++} END {for (s in state) print s, state[s]}'

                            # Top remote IPs connected to port 443
                            ss -ant '( sport = :443 )' | awk 'NR>1 {print $5}' | sed -E 's/:[^:]+$//' | sort | uniq -c | sort -nr | head

                            # Check Nginx access bursts
                            sudo tail -1000 /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head

Network rule: separate reachability, DNS, firewall, service binding and application errors. They are different failure layers.
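
Checking the layers in a fixed order makes the failing one explicit. The sketch below tests name resolution first and a TCP connect second. `diagnose_endpoint` is a hypothetical helper, the 3-second timeout is an arbitrary choice, and the TCP probe relies on bash's `/dev/tcp` feature.

```shell
# Sketch: report which layer fails first for HOST PORT.
# diagnose_endpoint is a hypothetical helper; requires getent, timeout and bash.
diagnose_endpoint() {
    local host="$1" port="$2"
    if ! getent ahosts "$host" > /dev/null 2>&1; then
        echo "dns: cannot resolve $host"
        return 1
    fi
    if ! timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2> /dev/null; then
        echo "tcp: $host:$port unreachable (firewall, routing or nothing listening)"
        return 2
    fi
    echo "ok: $host:$port accepts TCP connections"
}
```

A successful TCP connect only proves reachability and service binding. Application-level errors still need `curl` or the service logs.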

Kernel and sysctl: controlled tuning, limits and safe defaults

Kernel tuning should be conservative. Ubuntu defaults are reasonable for general production. Change kernel parameters only when you understand the workload and can measure before and after. Keep changes versioned and reversible.

Area	Parameter / control	Why it matters	Warning
File descriptors	`LimitNOFILE`, `ulimit`	Many sockets/files.	Must align with app and systemd.
Swappiness	`vm.swappiness`	Swap tendency.	Do not blindly set to zero.
TCP backlog	`net.core.somaxconn`	Connection bursts.	App backlog must also match.
Ephemeral ports	`net.ipv4.ip_local_port_range`	Outbound connection scale.	Usually not the first bottleneck.
Kernel logs	`dmesg`, `journalctl -k`	OOM, disk, driver, network errors.	Read before tuning.

Inspect kernel and limits

# Kernel version
                            uname -a

                            # CPU cores
                            nproc

                            # Current sysctl values
                            sysctl vm.swappiness
                            sysctl net.core.somaxconn
                            sysctl net.ipv4.ip_local_port_range

                            # Current shell limits
                            ulimit -a

                            # systemd service limits
                            systemctl show nginx | grep -E "LimitNOFILE|LimitNPROC"

                            # Kernel messages
                            sudo dmesg -T | tail -n 100
                            journalctl -k --since today

Safe sysctl pattern

# Temporary test until reboot
                            sudo sysctl -w net.core.somaxconn=4096

                            # Persistent setting
                            sudo vim /etc/sysctl.d/99-custom-performance.conf

                            # Example content
                            net.core.somaxconn = 4096
                            vm.swappiness = 10

                            # Apply
                            sudo sysctl --system

                            # Verify
                            sysctl net.core.somaxconn
                            sysctl vm.swappiness

systemd limit example

# Create override
                            sudo systemctl edit nginx

                            # Add:
                            [Service]
                            LimitNOFILE=65535

                            # Apply
                            sudo systemctl daemon-reload
                            sudo systemctl restart nginx

                            # Verify
                            systemctl show nginx | grep LimitNOFILE

Tuning decision tree

Want to tune kernel?
                            │
                            ├── Is bottleneck measured?
                            │       ├── no -> measure first
                            │       └── yes
                            │
                            ├── Is app configured consistently?
                            │       ├── no -> tune app first
                            │       └── yes
                            │
                            ├── Is change reversible?
                            │       ├── no -> do not apply
                            │       └── yes
                            │
                            └── Apply one change
                                    ├── measure again
                                    ├── document
                                    └── keep rollback

Tuning rule: kernel parameters are not magic. Bad sysctl tuning can reduce stability or hide the real application bottleneck.
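
Keeping a change reversible is easier when the previous values are saved in sysctl file format before anything is touched. This is a sketch of that idea. `sysctl_snapshot` is a hypothetical helper, and the `SYSCTL_ROOT` variable is an assumption introduced only so the function can be pointed at a test directory.

```shell
# Sketch: save current values of selected sysctl keys in "key = value" form.
# sysctl_snapshot and SYSCTL_ROOT are hypothetical names.
sysctl_snapshot() {
    local root="${SYSCTL_ROOT:-/proc/sys}" key path
    for key in "$@"; do
        path="$root/$(printf '%s' "$key" | tr . /)"
        if [ -r "$path" ]; then
            printf '%s = %s\n' "$key" "$(tr '\t' ' ' < "$path")"
        fi
    done
}

# Typical use before a change:
#   sysctl_snapshot vm.swappiness net.core.somaxconn > ~/sysctl-before.conf
# Rollback:
#   sudo sysctl -p ~/sysctl-before.conf
```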

Robustness with systemd: restart policy, health checks, limits and dependencies

Production robustness depends on what happens when a process fails. systemd can restart services, limit resources, order dependencies, isolate users, set environment files and expose logs. A fragile script becomes a production service when it has a proper unit.

systemd feature	Purpose	Example
`Restart`	Restart process after failure.	`Restart=on-failure`
`RestartSec`	Delay before restart.	`RestartSec=5`
`StartLimitBurst`	Prevent infinite crash loops.	`StartLimitBurst=5`
`MemoryMax`	Limit memory usage.	`MemoryMax=1G`
`LimitNOFILE`	Raise file descriptor limit.	`LimitNOFILE=65535`
`User`	Run service as non-root user.	`User=myapp`

Robust service unit

[Unit]
                            Description=My application service
                            After=network.target
                            StartLimitIntervalSec=60
                            StartLimitBurst=5

                            [Service]
                            User=myapp
                            Group=myapp
                            WorkingDirectory=/srv/myapp
                            EnvironmentFile=/srv/myapp/.env
                            ExecStart=/srv/myapp/venv/bin/gunicorn config.wsgi:application \
                            --bind 127.0.0.1:8000 \
                            --workers 3
                            Restart=on-failure
                            RestartSec=5
                            TimeoutStopSec=30
                            LimitNOFILE=65535
                            MemoryMax=1G

                            [Install]
                            WantedBy=multi-user.target

Service robustness flow

Service process
                            │
                            ├── runs as non-root user
                            ├── starts after dependencies
                            ├── has environment file
                            ├── logs to journald
                            ├── restarts on failure
                            ├── has restart delay and start limit
                            ├── has resource limits
                            └── is enabled at boot
                            │
                            ▼
                            Operations
                            ├── systemctl status
                            ├── journalctl -u service
                            ├── systemctl restart service
                            ├── systemctl show service
                            └── alerts on restart count

Service diagnostics

# Status
                            systemctl status myapp

                            # Logs
                            journalctl -u myapp --since "1 hour ago"
                            journalctl -u myapp -f

                            # Check restart count and limits
                            systemctl show myapp | grep -E "NRestarts|Restart|Memory|LimitNOFILE"

                            # Failed units
                            systemctl --failed

                            # Reload unit changes
                            sudo systemctl daemon-reload

Robustness rule: every production daemon should be a systemd-managed service with logs, restart policy, non-root user and clear operational commands.
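
Restart counts are only useful if something reads them. The sketch below extracts `NRestarts` from `systemctl show` output and flags a unit that has restarted too often. The helper names and the default threshold of 3 are assumptions. The counter reflects automatic restarts as reported by systemd for the unit.

```shell
# Sketch: detect a crash-looping unit from its NRestarts property.
# restart_count and crash_looping are hypothetical helpers.
restart_count() {
    # Reads KEY=VALUE lines (as printed by `systemctl show`) on stdin.
    awk -F= '$1 == "NRestarts" { print $2; found = 1 }
             END { if (!found) print 0 }'
}

# crash_looping UNIT [LIMIT]: succeeds when restarts >= LIMIT.
crash_looping() {
    local unit="$1" limit="${2:-3}" count
    count=$(systemctl show "$unit" -p NRestarts | restart_count)
    [ "$count" -ge "$limit" ]
}
```

For example, `crash_looping myapp 5 && echo "myapp is crash-looping"` could be run from a cron job or a monitoring check.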

Monitoring: host metrics, service metrics, logs, alerts and SLOs

Monitoring makes performance visible before users complain. The minimum production stack should monitor CPU, memory, disk, IO, network, service state, open ports, logs, reboot-required state, certificate expiry, backups and application health.

Metric family	Examples	Alert idea
CPU	CPU %, load average, steal, iowait.	Sustained saturation over baseline.
Memory	Available RAM, swap activity, OOM events.	Low available memory or OOM kill.
Disk	Filesystem %, inode %, IO latency.	Filesystem above 80-90%.
Network	Throughput, packet drops, connection count.	Unexpected drop/error spike.
Services	systemd failed units, restart count.	Service failed or crash-looping.
Security	SSH failures, auth failures, UFW denies.	Spike above baseline.

Monitoring stack example

Ubuntu server
                            │
                            ├── node exporter
                            ├── journald logs
                            ├── application metrics
                            ├── nginx metrics/logs
                            ├── database exporter
                            └── backup status
                            │
                            ▼
                            Observability platform
                            ├── Prometheus
                            ├── Grafana
                            ├── Loki / ELK
                            ├── Alertmanager
                            └── incident channel

Local monitoring commands

# CPU and memory
                            top
                            free -h
                            vmstat 1

                            # Disk and IO
                            df -h
                            iostat -xz 1
                            sudo iotop -o

                            # Network
                            ss -s
                            ip -s link
                            nload

                            # Services and logs
                            systemctl --failed
                            journalctl -p warning --since "30 min ago"

                            # Reboot required
                            test -f /var/run/reboot-required && cat /var/run/reboot-required

Alerting baseline

Recommended alerts:
                            [ ] Disk filesystem > 80%
                            [ ] Disk filesystem > 90%
                            [ ] Inode usage high
                            [ ] Service failed
                            [ ] Reboot required too long
                            [ ] OOM kill detected
                            [ ] Swap activity sustained
                            [ ] CPU saturation sustained
                            [ ] IO wait sustained
                            [ ] Backup failed
                            [ ] Certificate expires soon
                            [ ] SSH failure spike
                            [ ] HTTP 5xx spike
                            [ ] Database unavailable

Monitoring rule: alerts must be actionable. Every alert needs severity, owner, impact, first checks and definition of done.
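
As one example of an actionable check, the "Reboot required too long" alert can be derived from the age of the `/var/run/reboot-required` flag file. This is a sketch that assumes GNU `stat`. `reboot_required_days` is a hypothetical helper, and the threshold that should trigger an alert is a policy decision left to the operator.

```shell
# Sketch: how many whole days a reboot has been pending.
# Prints 0 when no reboot is required. Assumes GNU stat (-c %Y).
reboot_required_days() {
    local flag="${1:-/var/run/reboot-required}" now mtime
    [ -f "$flag" ] || { echo 0; return 0; }
    now=$(date +%s)
    mtime=$(stat -c %Y "$flag")
    echo $(( (now - mtime) / 86400 ))
}
```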

Troubleshooting playbooks: slow server, crash loop, disk full, memory pressure

Symptom matrix

Symptom	First checks	Likely cause	Action
Server slow	`top`, `free -h`, `iostat`	CPU, memory or IO saturation.	Identify resource, reduce load, scale.
Service crash loop	`systemctl status`, `journalctl -u`	Bad config, dependency, permission, OOM.	Fix root cause, then restart.
Disk full	`df -h`, `du -sh`	Logs, Docker, DB, backups.	Clean safely, add retention, resize.
Memory pressure	`free`, `vmstat`, OOM logs.	Leak, too many workers, cache pressure.	Reduce workers, limit service, add RAM.
Network timeout	`ss`, `ip`, `mtr`, DNS.	Firewall, DNS, route, saturation.	Fix correct layer.

One-shot diagnostic

echo "== HOST =="
                            hostnamectl

                            echo "== UPTIME =="
                            uptime

                            echo "== CPU/MEM =="
                            free -h
                            top -b -n1 | head -30

                            echo "== DISK =="
                            df -h

                            echo "== PORTS =="
                            ss -lntp

                            echo "== FAILED SERVICES =="
                            systemctl --failed

                            echo "== WARNINGS =="
                            journalctl -p warning --since "30 min ago" --no-pager | tail -100

Universal performance decision tree

Production issue
                            │
                            ├── Is it user-visible?
                            │       ├── yes -> check app SLO, HTTP 5xx, latency
                            │       └── no  -> check monitoring and trend
                            │
                            ├── Resource saturation?
                            │       ├── CPU -> top, pidstat
                            │       ├── RAM -> free, vmstat, OOM
                            │       ├── IO  -> iostat, iotop
                            │       └── NET -> ss, ip, mtr
                            │
                            ├── Service instability?
                            │       └── systemctl, journalctl
                            │
                            ├── Recent change?
                            │       ├── deployment
                            │       ├── apt upgrade
                            │       ├── config change
                            │       └── traffic spike
                            │
                            └── Fix strategy
                                    ├── rollback
                                    ├── reduce load
                                    ├── scale resource
                                    ├── tune one parameter
                                    └── monitor result

Change discipline

During performance incident:
                            [ ] Do not change many things at once
                            [ ] Capture metrics before change
                            [ ] Identify the bottleneck
                            [ ] Apply one controlled change
                            [ ] Measure again
                            [ ] Keep rollback possible
                            [ ] Document root cause
                            [ ] Add alert if missing
                            [ ] Add runbook step if useful

Incident rule: random restarts can temporarily hide symptoms but lose evidence. Capture logs and metrics first when possible.
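
Capturing evidence before a restart can be a single command. The sketch below writes the output of the diagnostic commands already used in this section into a timestamped directory. `capture_evidence`, the default `/tmp/incident-...` path and the file names are assumptions. Commands that are missing or fail are tolerated so the capture never blocks the response.

```shell
# Sketch: save a point-in-time snapshot before restarting anything.
# capture_evidence is a hypothetical helper; failures are ignored on purpose.
capture_evidence() {
    local out="${1:-/tmp/incident-$(date +%Y%m%dT%H%M%S)}"
    mkdir -p "$out"
    uptime                        > "$out/uptime.txt"  2>&1 || true
    free -h                       > "$out/memory.txt"  2>&1 || true
    df -h                         > "$out/disk.txt"    2>&1 || true
    ss -s                         > "$out/sockets.txt" 2>&1 || true
    top -b -n1                    > "$out/top.txt"     2>&1 || true
    systemctl --failed --no-pager > "$out/failed.txt"  2>&1 || true
    journalctl -p warning --since "30 min ago" --no-pager \
                                  > "$out/journal.txt" 2>&1 || true
    echo "$out"
}
```

The function prints the directory it wrote to, so it can be attached to the incident ticket or archived with `tar`.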

Performance and robustness checklist

Production performance baseline

[ ] Ubuntu LTS is used
                            [ ] Packages are updated
                            [ ] Reboot policy exists
                            [ ] CPU metrics are monitored
                            [ ] Memory and swap are monitored
                            [ ] Disk usage is monitored
                            [ ] IO latency is monitored
                            [ ] Network errors are monitored
                            [ ] systemd failed units are alerted
                            [ ] Service restart count is monitored
                            [ ] Logs are centralized if possible
                            [ ] Backups are monitored
                            [ ] Restore has been tested
                            [ ] Capacity baseline is documented
                            [ ] Load test exists for critical services
                            [ ] Runbooks exist for CPU/RAM/IO/disk incidents

Robust service checklist

[ ] Service runs under non-root user
                            [ ] systemd unit is versioned
                            [ ] Restart policy is configured
                            [ ] Resource limits are configured if needed
                            [ ] Logs are visible with journalctl
                            [ ] Health check exists
                            [ ] Environment file permissions are strict
                            [ ] Deployment rollback is possible
                            [ ] Service starts at boot
                            [ ] Dependencies are documented

Command cheat sheet

# CPU
                            uptime
                            top
                            htop
                            mpstat -P ALL 1
                            pidstat -u 1

                            # Memory
                            free -h
                            vmstat 1
                            swapon --show
                            journalctl -k | grep -i oom

                            # Disk / IO
                            df -h
                            du -sh /var/*
                            iostat -xz 1
                            iotop -o
                            journalctl --disk-usage

                            # Network
                            ss -s
                            ss -lntp
                            ip -s link
                            mtr -rw example.com
                            nload

                            # Services
                            systemctl --failed
                            systemctl status service
                            journalctl -u service -f

                            # Kernel / limits
                            uname -a
                            sysctl vm.swappiness
                            ulimit -a

Final rule

Ubuntu is stable in production when it is operated properly.
Stability comes from LTS discipline, measured capacity, controlled updates, systemd supervision, monitored resources, clean logs, safe rollback, tested backups and calm incident handling.

Minimal robust server profile

Minimum robust Ubuntu server:
                            - Ubuntu LTS
                            - systemd-managed services
                            - restart policies
                            - monitoring for CPU/RAM/disk/IO/network
                            - alerting on failed services and disk growth
                            - log retention
                            - patch and reboot policy
                            - backup and restore test
                            - documented runbook

5.1 Ubuntu Cloud & AWS: official AMIs, Canonical owner, EC2 patterns, cloud-init, userdata and SSH keys

Ubuntu on cloud: what it means

Ubuntu is one of the most common Linux baselines for cloud servers. On AWS, it is typically deployed as an EC2 instance launched from an official Ubuntu AMI. AWS attaches storage, places the instance in a subnet and applies security groups; on first boot, cloud-init installs the SSH public key and runs the server bootstrap.

In production, the cloud image is part of the infrastructure contract. It defines the operating system version, kernel, package baseline, boot behavior, cloud-init behavior, default users, storage layout and initial security posture.

Concept	Meaning	Production impact
AMI	Amazon Machine Image used to boot EC2.	Defines OS baseline and initial package state.
Official Ubuntu image	Image published by Canonical for AWS.	Preferred baseline for Ubuntu EC2 servers.
Owner ID	AWS account that owns the public AMI.	Used to avoid fake or untrusted public images.
cloud-init	First-boot initialization system.	Creates users, installs packages, writes files, runs commands.
User data	Bootstrap content passed at EC2 launch.	Automates first boot configuration.
Security group	AWS network firewall attached to instance or ENI.	Controls inbound and outbound exposure.
Key pair	SSH access credential used at launch.	Controls first admin access.

Core rule: do not treat a cloud VM as a manually configured machine. Treat it as infrastructure generated from an approved image, controlled user data, security groups, monitoring and a reproducible deployment process.

AWS Ubuntu mental model

AWS EC2 Ubuntu instance
                            │
                            ├── AMI
                            │       ├── Ubuntu release
                            │       ├── kernel
                            │       ├── cloud-init
                            │       └── base packages
                            │
                            ├── Instance configuration
                            │       ├── instance type
                            │       ├── EBS volume
                            │       ├── subnet
                            │       ├── security group
                            │       ├── IAM role
                            │       └── SSH key pair
                            │
                            ├── First boot
                            │       ├── cloud-init metadata
                            │       ├── user data
                            │       ├── SSH key injection
                            │       ├── package installation
                            │       └── service bootstrap
                            │
                            └── Operations
                                    ├── patching
                                    ├── monitoring
                                    ├── backups
                                    ├── logs
                                    ├── snapshots
                                    └── replacement strategy

Official URLs

Ubuntu on AWS:
                            https://documentation.ubuntu.com/aws/

                            Find Ubuntu images on AWS:
                            https://documentation.ubuntu.com/aws/aws-how-to/instances/find-ubuntu-images/

                            Ubuntu cloud images:
                            https://cloud-images.ubuntu.com/

                            AWS EC2:
                            https://docs.aws.amazon.com/ec2/

                            cloud-init:
                            https://cloudinit.readthedocs.io/

Official Ubuntu AMIs and Canonical owner filtering

Public AMI catalogs contain many images. In production, you should avoid selecting a random public image called “Ubuntu”. Use official Canonical images and verify the owner. This reduces the risk of using an untrusted image with unknown modifications.

Item	Value / practice	Reason
Canonical AWS owner ID	`099720109477`	Filters official Ubuntu AMIs published by Canonical.
Release choice	Ubuntu Server LTS for production.	Longer support and safer lifecycle.
Architecture	`amd64` or `arm64`.	Must match EC2 instance family.
Storage type	EBS-backed AMI.	Standard for modern EC2 instances.
Virtualization	HVM.	Modern EC2 virtualization mode.
Image lifecycle	Pin or approve AMI IDs for production.	Avoid surprise image changes.

Console filtering pattern

EC2 Console
                            │
                            ├── Images
                            ├── AMIs
                            ├── Public images
                            ├── Owner = 099720109477
                            ├── Name starts with ubuntu/images/hvm-ssd (newer releases use hvm-ssd-gp3)
                            ├── Select LTS release
                            └── Verify architecture and region

AWS CLI AMI search example

aws ec2 describe-images \
                            --owners 099720109477 \
                            --filters \
                            "Name=name,Values=ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-*" \
                            "Name=state,Values=available" \
                            "Name=architecture,Values=x86_64" \
                            --query 'Images | sort_by(@, &CreationDate)[-5:].{Name:Name,ImageId:ImageId,CreationDate:CreationDate}' \
                            --output table

AMI selection decision tree

Need Ubuntu EC2 image?
                            │
                            ├── Is this production?
                            │       ├── yes -> choose LTS
                            │       └── no  -> LTS still preferred, interim only if justified
                            │
                            ├── Is owner Canonical?
                            │       ├── yes -> continue
                            │       └── no  -> reject for production
                            │
                            ├── Does architecture match instance?
                            │       ├── yes -> continue
                            │       └── no  -> select amd64 or arm64 correctly
                            │
                            └── Is AMI approved or pinned?
                                    ├── yes -> launch
                                    └── no  -> review before production

Production warning: never use a random public AMI because the name looks correct. Verify owner, release, architecture, creation date and approval status.
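
The owner check from the decision tree can be scripted so that it runs before every launch. This is a minimal sketch, assuming the AWS CLI is installed and configured; the function names `is_canonical_owner` and `verify_ami_owner` are illustrative and not part of any tool.

```shell
CANONICAL_OWNER_ID="099720109477"

# Succeeds only when the given owner ID is Canonical's.
is_canonical_owner() {
  [ "$1" = "$CANONICAL_OWNER_ID" ]
}

# Look up the owner of an AMI and reject anything not published by Canonical.
verify_ami_owner() {
  local ami_id="$1" region="$2" owner
  owner=$(aws ec2 describe-images --region "$region" --image-ids "$ami_id" \
    --query 'Images[0].OwnerId' --output text) || return 2
  if is_canonical_owner "$owner"; then
    echo "OK: $ami_id is owned by Canonical"
  else
    echo "REJECT: $ami_id is owned by $owner" >&2
    return 1
  fi
}
```

Example call: `verify_ami_owner ami-0123456789abcdef0 eu-west-1` (placeholder ID and region). A golden AMI built in your own account will fail this check by design, so apply it to the base image rather than to derived images.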

EC2 launch pattern: instance type, storage, network, security and bootstrap

Launching Ubuntu on EC2 is a sequence of infrastructure decisions. The AMI is only the OS. Production quality also depends on instance sizing, EBS volume type, network placement, security groups, IAM role, SSH access, user data and monitoring.

EC2 decision	Typical production practice	Risk if ignored
Instance type	Size by CPU, RAM, network and workload.	CPU steal, memory pressure, throttling.
EBS root volume	Adequate size; gp3 with IOPS and throughput raised above baseline if needed.	Disk full or IO bottleneck.
Subnet	Public only if it must receive internet traffic.	Unnecessary exposure.
Security group	Only required ports, source restricted.	Public SSH, public DB, attack surface.
IAM role	Least privilege role attached to instance.	Static credentials on disk.
User data	Minimal bootstrap, versioned and tested.	Unreproducible snowflake server.
Tags	Name, environment, owner, cost center, role.	Poor inventory and cost tracking.

EC2 launch flow

Launch EC2
                            │
                            ├── Choose official Ubuntu LTS AMI
                            ├── Choose instance type
                            ├── Configure EBS root volume
                            ├── Select VPC and subnet
                            ├── Attach security group
                            ├── Attach IAM role
                            ├── Select SSH key pair
                            ├── Add user data
                            ├── Add tags
                            └── Launch instance
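
The same launch flow can be expressed as one AWS CLI call, which makes it reviewable and repeatable. This is a sketch under assumptions: the instance type, volume size, tag values and the `user-data.yaml` file name are placeholders, and the function name `launch_ubuntu_instance` is illustrative. `/dev/sda1` is the usual root device name for Ubuntu AMIs; confirm it for the image you use.

```shell
# Launch one Ubuntu instance from an approved AMI and print its instance ID.
# All IDs and names are supplied by the caller.
launch_ubuntu_instance() {
  local ami_id="$1" subnet_id="$2" sg_id="$3" key_name="$4" profile_name="$5"
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type t3.small \
    --subnet-id "$subnet_id" \
    --security-group-ids "$sg_id" \
    --key-name "$key_name" \
    --iam-instance-profile "Name=$profile_name" \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=30,VolumeType=gp3,Encrypted=true}' \
    --metadata-options 'HttpTokens=required,HttpEndpoint=enabled' \
    --user-data file://user-data.yaml \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=web-01},{Key=Environment,Value=staging}]' \
    --query 'Instances[0].InstanceId' --output text
}
```

For anything beyond a one-off test, the same parameters usually belong in a launch template or infrastructure-as-code rather than in an ad hoc command.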

Reference AWS Ubuntu architecture

Internet
                            │
                            ▼
                            AWS Load Balancer
                            │
                            ├── HTTPS listener
                            ├── certificate
                            └── health checks
                            │
                            ▼
                            Public or private app subnet
                            │
                            ├── Ubuntu EC2 app server
                            │       ├── Nginx
                            │       ├── Gunicorn / app runtime
                            │       ├── CloudWatch agent
                            │       └── UFW local firewall
                            │
                            └── Security group
                                    ├── inbound from load balancer only
                                    └── SSH from bastion or admin IP only

                            Private data subnet
                            │
                            ├── RDS / database
                            ├── Redis / cache
                            └── no public exposure

Sizing examples

Use case	Starting point	Watch metric
Small Nginx reverse proxy	Small general-purpose instance.	Network, CPU, connections.
Django / API server	Balanced CPU/RAM instance.	CPU, memory, latency, worker count.
Celery worker	CPU or RAM based on task type.	Queue depth, CPU, memory.
Database on EC2	Memory and IO optimized.	IOPS, latency, cache hit ratio.

EC2 rule: launch is not enough. The instance must be tagged, monitored, patched, backed up and replaceable.

SSH keys, default users and safe access patterns

Ubuntu cloud images use SSH keys for initial access. On AWS, the public half of the selected EC2 key pair is added to the default user's `authorized_keys` file during first boot; the private key never leaves your workstation. For Ubuntu images, the default username is `ubuntu`.

Access element	Production practice	Reason
Default user	`ubuntu` for initial access.	Standard cloud image behavior.
SSH key pair	Use protected private key.	Controls initial admin access.
SSH exposure	Restrict by source IP or bastion.	Reduces brute-force surface.
Root login	Disabled.	Use named users and sudo.
Long-term access	Create named admin users or SSM access.	Improves audit and revocation.
Emergency access	Document SSM, console or recovery procedure.	Prevents lockout during incidents.

SSH examples

# Secure private key permissions
                            chmod 600 my-aws-key.pem

                            # Connect to Ubuntu EC2
                            ssh -i my-aws-key.pem ubuntu@EC2_PUBLIC_IP

                            # First checks
                            hostnamectl
                            whoami
                            sudo -l
                            ip a
                            systemctl status ssh

Safe access architecture

Admin workstation
                            │
                            ├── SSH private key
                            └── fixed public IP if possible
                            │
                            ▼
                            Security group
                            │
                            ├── allow SSH only from admin IP
                            └── or allow SSH only from bastion
                            │
                            ▼
                            Ubuntu EC2 instance
                            │
                            ├── default ubuntu user
                            ├── sudo for admin tasks
                            ├── no root SSH login
                            └── logs in auth/journal

Hardening after first login

# Update packages
                            sudo apt update
                            sudo apt upgrade

                            # Create named admin user if needed
                            sudo adduser deploy
                            sudo usermod -aG sudo deploy

                            # Add SSH key for deploy user
                            sudo mkdir -p /home/deploy/.ssh
                            sudo cp /home/ubuntu/.ssh/authorized_keys /home/deploy/.ssh/authorized_keys
                            sudo chown -R deploy:deploy /home/deploy/.ssh
                            sudo chmod 700 /home/deploy/.ssh
                            sudo chmod 600 /home/deploy/.ssh/authorized_keys

                            # Test deploy login from a second terminal before restricting access
                            ssh -i my-aws-key.pem deploy@EC2_PUBLIC_IP

Access alternatives

Pattern	When useful	Comment
Direct SSH	Small setups, restricted IP.	Simple but exposed if public.
Bastion host	Multiple private instances.	Centralized admin entry point.
AWS Systems Manager	No public SSH desired.	Requires IAM, agent and network access.
VPN	Private operations network.	Good for strict environments.

Lockout warning: before removing old keys or restricting SSH, test the new access path from a separate session.
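
Once the new access path is confirmed, root and password logins can be disabled with an sshd drop-in file. This is a minimal sketch: the file name `50-hardening.conf` and the function name are assumptions. sshd keeps the first value it reads for each keyword, so a lower-numbered file in the same directory takes precedence.

```shell
# Write an sshd drop-in that disables root and password logins.
# The directory is a parameter so the file can be generated in a scratch path first.
write_sshd_hardening() {
  local conf_dir="$1"
  cat > "$conf_dir/50-hardening.conf" <<'EOF'
PermitRootLogin no
PasswordAuthentication no
KbdInteractiveAuthentication no
PubkeyAuthentication yes
EOF
}
```

On the instance, place the file in /etc/ssh/sshd_config.d as root, validate with `sudo sshd -t`, then apply with `sudo systemctl reload ssh`. Keep the current session open and confirm a fresh login works before closing it.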

cloud-init: first boot automation for Ubuntu cloud images

cloud-init is the standard first-boot initialization system for Ubuntu cloud images. It reads cloud metadata and user data, then applies configuration such as users, SSH keys, packages, files, commands, hostname, timezone and service setup.

cloud-init section	Purpose	Example usage
`package_update`	Refresh package metadata.	Prepare apt before install.
`package_upgrade`	Upgrade packages at first boot.	Apply pending updates, including security patches.
`users`	Create users and SSH keys.	Provision deploy user.
`packages`	Install packages.	Nginx, fail2ban, monitoring agent.
`write_files`	Create config files.	Systemd unit, app env template.
`runcmd`	Run final commands.	Enable services, configure firewall.

cloud-init lifecycle

EC2 instance first boot
                            │
                            ├── Query AWS metadata service
                            ├── Read user data
                            ├── Configure hostname
                            ├── Inject SSH key
                            ├── Create users
                            ├── Configure packages
                            ├── Write files
                            ├── Run commands
                            ├── Start services
                            └── Mark initialization complete

Minimal cloud-init baseline

#cloud-config
                            package_update: true
                            package_upgrade: true

                            timezone: UTC

                            packages:
                            - curl
                            - wget
                            - git
                            - vim
                            - htop
                            - ufw
                            - fail2ban
                            - nginx

                            runcmd:
                            - ufw allow OpenSSH
                            - ufw allow 80/tcp
                            - ufw allow 443/tcp
                            - ufw --force enable
                            - systemctl enable --now nginx
                            - systemctl enable --now fail2ban

cloud-init diagnostics

# Status
                            cloud-init status

                            # Wait for completion
                            cloud-init status --wait

                            # Main logs
                            sudo less /var/log/cloud-init.log
                            sudo less /var/log/cloud-init-output.log

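                            # Show instance metadata (IMDSv2: request a session token first)
                            TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
                            curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/ || true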

                            # Validate config if tool supports it
                            cloud-init schema --config-file user-data.yaml

cloud-init rule: use user data for minimal first-boot bootstrap. For complex configuration, call a versioned script or configuration management tool.
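
A bootstrap is only useful if its outcome is checked. The sketch below waits for cloud-init and turns the result into an exit code that a deployment pipeline can act on. It assumes the `status: done` / `status: error` lines printed by `cloud-init status`; the function names are illustrative.

```shell
# Map the text printed by `cloud-init status` to an exit code:
# 0 = done, 1 = error, 2 = still running or unrecognized.
classify_cloud_init_status() {
  case "$1" in
    *"status: done"*)  return 0 ;;
    *"status: error"*) return 1 ;;
    *)                 return 2 ;;
  esac
}

# Block until cloud-init finishes, then report the result.
wait_for_cloud_init() {
  local status
  status=$(cloud-init status --wait 2>&1)
  if classify_cloud_init_status "$status"; then
    echo "cloud-init finished cleanly"
  else
    echo "cloud-init did not finish cleanly: $status" >&2
    echo "see /var/log/cloud-init-output.log" >&2
    return 1
  fi
}
```

Calling `wait_for_cloud_init` as the first step of a post-launch check prevents tests from running against a half-configured instance.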

User data patterns: simple bootstrap, web server, app server and config handoff

User data should be small, readable and reliable. A good pattern is to install only base packages, harden basic access, install monitoring and call a versioned bootstrap script from a trusted source. Avoid stuffing an entire production deployment into a long untested user-data block.

Pattern 1: simple HTTP test server

#cloud-config
                            package_update: true

                            packages:
                            - nginx

                            write_files:
                              - path: /var/www/html/index.html
                                permissions: '0644'
                                content: |
                                  Ubuntu EC2 is running.

                            runcmd:
                            - systemctl enable --now nginx

Pattern 2: baseline security bootstrap

#cloud-config
                            package_update: true
                            package_upgrade: true

                            packages:
                            - ufw
                            - fail2ban
                            - curl
                            - htop

                            runcmd:
                            - ufw default deny incoming
                            - ufw default allow outgoing
                            - ufw allow OpenSSH
                            - ufw --force enable
                            - systemctl enable --now fail2ban

Pattern 3: handoff to versioned script

#cloud-config
                            package_update: true

                            packages:
                            - curl
                            - ca-certificates

                            runcmd:
                            - curl -fsSL https://example.com/bootstrap/ubuntu-app.sh -o /root/bootstrap.sh
                            - chmod 700 /root/bootstrap.sh
                            - /root/bootstrap.sh --role app --env prod

Better handoff pattern

Preferred production pattern:
                            1. cloud-init creates minimal baseline
                            2. instance has IAM role
                            3. script is downloaded from trusted private source
                            4. script checksum or signature is verified
                            5. configuration is versioned
                            6. logs are written to /var/log/bootstrap.log
                            7. monitoring reports success or failure
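
Steps 3, 4 and 6 of this pattern can be sketched as follows. The S3 location and checksum are placeholders supplied by the caller, the function names are illustrative, and the instance role is assumed to allow reading the object.

```shell
# Succeeds only when the file's SHA-256 digest equals the expected hex string.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual=$(sha256sum "$file" | awk '{print $1}')
  [ "$actual" = "$expected" ]
}

# Download a bootstrap script from a private bucket, verify it, then run it with logging.
fetch_and_run_bootstrap() {
  local s3_uri="$1" expected_sha256="$2" target=/root/bootstrap.sh
  aws s3 cp "$s3_uri" "$target" || return 1
  if ! verify_sha256 "$target" "$expected_sha256"; then
    echo "checksum mismatch for $target, refusing to run" >&2
    return 1
  fi
  chmod 700 "$target"
  "$target" >> /var/log/bootstrap.log 2>&1
}
```

The expected checksum should come from a different channel than the script itself, for example from the versioned user data or a parameter store entry, otherwise the check adds little.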

Bootstrap logging example

#!/usr/bin/env bash
                            set -euo pipefail

                            exec > >(tee -a /var/log/bootstrap.log) 2>&1

                            echo "bootstrap started at $(date -Is)"

                            apt-get update
                            DEBIAN_FRONTEND=noninteractive apt-get install -y nginx

                            systemctl enable --now nginx

                            echo "bootstrap finished at $(date -Is)"

Supply-chain warning: avoid blindly running remote scripts as root. Use trusted sources, checksums, IAM controls and private artifact storage.

Golden AMI pattern: reproducible Ubuntu servers

A golden AMI is a prebuilt, approved image containing a hardened baseline: Ubuntu LTS, patches, standard packages, users, agents, logging, monitoring and security defaults. It reduces boot time, improves repeatability and makes replacement safer than manual repair.

Golden AMI content	Purpose	Example
Ubuntu LTS base	Approved OS baseline.	24.04 LTS server image.
Security updates	Reduce patch work at boot.	apt upgrade during image build.
Agents	Monitoring, logs, SSM, backup.	CloudWatch agent, SSM agent.
Hardening	Common security defaults.	SSH policy, sysctl, UFW baseline.
Tags and metadata	Inventory and lifecycle.	version, build date, git commit.
Validation tests	Prove image boots and works.	SSH, cloud-init, services, logs.

Golden AMI build flow

Official Ubuntu AMI
                            │
                            ▼
                            Packer image build
                            │
                            ├── apply apt updates
                            ├── install baseline packages
                            ├── install monitoring agents
                            ├── apply hardening
                            ├── clean temporary files
                            ├── validate services
                            └── create AMI
                            │
                            ▼
                            Approved AMI
                            │
                            ├── tagged with version
                            ├── tested in staging
                            ├── used by launch templates
                            └── rolled out progressively

Replace, do not repair

Traditional server:
                            - SSH into machine
                            - manually patch
                            - manually edit config
                            - server becomes unique
                            - recovery depends on memory

                            Cloud-native server:
                            - build image
                            - deploy new instance
                            - attach to load balancer
                            - drain old instance
                            - terminate old instance
                            - rollback by previous image

Golden AMI governance

[ ] Base AMI owner verified
                            [ ] Ubuntu LTS version recorded
                            [ ] Build script versioned
                            [ ] Security updates applied
                            [ ] Image tests pass
                            [ ] AMI is tagged
                            [ ] AMI ID is published to parameter store or IaC
                            [ ] Rollback AMI is retained
                            [ ] Staging rollout completed
                            [ ] Production rollout is progressive
                            [ ] Old AMIs are retired safely
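
Publishing the approved AMI ID to Parameter Store gives launch templates and IaC one place to read it from. This is a minimal sketch: the `/golden-ami/ubuntu/24.04` hierarchy and the function name `publish_ami_id` are hypothetical naming choices.

```shell
# Record an approved AMI ID under a versioned name, then move the "current" pointer to it.
publish_ami_id() {
  local ami_id="$1" version="$2" base=/golden-ami/ubuntu/24.04
  aws ssm put-parameter --name "$base/$version" --type String \
    --data-type aws:ec2:image --value "$ami_id" || return 1
  aws ssm put-parameter --name "$base/current" --type String \
    --data-type aws:ec2:image --value "$ami_id" --overwrite
}
```

Because every version keeps its own parameter, rollback is a matter of pointing `current` back at the previous version's AMI ID.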

Launch template model

Launch Template
                            │
                            ├── approved AMI ID
                            ├── instance type
                            ├── IAM role
                            ├── security groups
                            ├── EBS configuration
                            ├── user data
                            └── tags
                            │
                            ▼
                            Auto Scaling Group
                            ├── desired capacity
                            ├── health checks
                            ├── rolling replacement
                            └── rollback to previous template version
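
A rolling replacement with this model takes two calls: create a new launch template version with the new AMI, then start an instance refresh. The sketch assumes the Auto Scaling group is configured to follow the latest template version; the preference values and the function name `roll_out_ami` are illustrative.

```shell
# Create a launch template version that only changes the AMI, then roll the group.
# source_version is the existing template version to copy the other settings from.
roll_out_ami() {
  local template_name="$1" source_version="$2" asg_name="$3" ami_id="$4"
  aws ec2 create-launch-template-version \
    --launch-template-name "$template_name" \
    --source-version "$source_version" \
    --launch-template-data "{\"ImageId\":\"$ami_id\"}" || return 1
  aws autoscaling start-instance-refresh \
    --auto-scaling-group-name "$asg_name" \
    --preferences 'MinHealthyPercentage=90,InstanceWarmup=120'
}
```

Example call with placeholder names: `roll_out_ami web-template 7 web-asg ami-0123456789abcdef0`.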

Cloud robustness rule: if a server can be rebuilt from image and code, incidents become easier to recover from than manually maintained machines.

AWS security for Ubuntu EC2: security groups, IAM, metadata and private networking

Ubuntu hardening and AWS security must work together. Security groups restrict network access before traffic reaches the instance. UFW adds host-level defense. IAM roles avoid static cloud keys. Private subnets prevent unnecessary exposure.

Security control	AWS side	Ubuntu side
Network filtering	Security groups, NACLs, load balancer.	UFW or nftables.
Admin access	Bastion, VPN, SSM Session Manager.	SSH keys, no root login, auth logs.
Cloud permissions	IAM role attached to instance.	No static AWS keys stored on disk.
Secrets	Secrets Manager, SSM Parameter Store.	Strict file permissions if cached locally.
Observability	CloudWatch, VPC Flow Logs, CloudTrail.	journald, auth logs, application logs.
Recovery	EBS snapshots, AMIs, backups.	Restore tests and runbooks.

Security group examples

Public web server:
                            - inbound 443/tcp from 0.0.0.0/0
                            - inbound 80/tcp from 0.0.0.0/0 only if redirect is needed
                            - inbound 22/tcp only from admin IP or bastion
                            - outbound restricted if strict policy is required

                            Private app server behind load balancer:
                            - inbound app port only from load balancer security group
                            - inbound SSH only from bastion security group
                            - no direct public access

                            Database server:
                            - inbound DB port only from app security group
                            - no public IP
                            - no public SSH
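
The "only from another security group" and "only from admin IP" rules above map to two AWS CLI calls. This is a sketch with placeholder IDs; the function names are illustrative.

```shell
# Allow a TCP port only from members of another security group (for example the load balancer's).
allow_from_security_group() {
  local target_sg="$1" source_sg="$2" port="$3"
  aws ec2 authorize-security-group-ingress \
    --group-id "$target_sg" --protocol tcp --port "$port" \
    --source-group "$source_sg"
}

# Allow SSH only from a single admin address, given as a CIDR such as 203.0.113.10/32.
allow_ssh_from_cidr() {
  local target_sg="$1" admin_cidr="$2"
  aws ec2 authorize-security-group-ingress \
    --group-id "$target_sg" --protocol tcp --port 22 \
    --cidr "$admin_cidr"
}
```

Referencing a source security group instead of IP ranges keeps the rule valid when load balancer or bastion addresses change.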

Layered AWS Ubuntu security diagram

Internet
                            │
                            ▼
                            AWS perimeter
                            ├── Route 53
                            ├── CloudFront / WAF
                            ├── Load Balancer
                            └── Security Groups
                            │
                            ▼
                            Ubuntu host
                            ├── UFW
                            ├── SSH hardening
                            ├── non-root services
                            ├── package updates
                            ├── logs
                            └── monitoring agent
                            │
                            ▼
                            Application
                            ├── TLS
                            ├── secrets management
                            ├── app logs
                            ├── DB access
                            └── health checks

Metadata and credentials

Recommended:
                            - use IAM roles instead of static AWS keys
                            - keep role permissions minimal
                            - avoid storing credentials in user data
                            - avoid secrets in AMI images
                            - use Parameter Store or Secrets Manager
                            - monitor CloudTrail for suspicious API calls
                            - review instance profile permissions

                            Avoid:
                            - AWS_ACCESS_KEY_ID in .bashrc
                            - secrets embedded in user data
                            - secrets baked into AMIs
                            - overly broad IAM roles
                            - instance metadata and role credentials exposed through SSRF-vulnerable apps

Security warning: user data is readable by any local user or process that can reach the instance metadata service, and by IAM principals allowed to describe instance attributes. Do not place long-lived secrets in user data.
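
Requiring IMDSv2 session tokens reduces the risk of metadata and role credentials leaking through simple SSRF requests. A minimal sketch for an existing instance, assuming the AWS CLI is configured; the function name is illustrative. A hop limit of 1 can block containers on the host from reaching the metadata service, so raise it if workloads need that.

```shell
# Require IMDSv2 tokens on an existing instance and keep responses on the host itself.
require_imdsv2() {
  local instance_id="$1"
  aws ec2 modify-instance-metadata-options \
    --instance-id "$instance_id" \
    --http-tokens required \
    --http-put-response-hop-limit 1 \
    --http-endpoint enabled
}
```

For new instances, the same setting belongs in the launch template so that it never depends on a follow-up call.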

Operations: monitoring, logs, snapshots, patching and recovery

Ubuntu EC2 operations combine Linux administration and AWS lifecycle management. The system must be patched, monitored, backed up, logged, tagged, replaceable and tested. A server that cannot be rebuilt is a long-term operational risk.

Operational area	AWS control	Ubuntu control	Question to answer
Metrics	CloudWatch metrics and agent.	node exporter, system metrics.	Is the host saturated?
Logs	CloudWatch Logs, S3 archive.	journald, app logs, auth logs.	Can we diagnose incidents?
Backups	EBS snapshots, AWS Backup.	Application-aware backup hooks.	Can we restore?
Patching	SSM Patch Manager, image rebuild.	apt, unattended upgrades.	Are CVEs patched?
Recovery	AMI, launch template, autoscaling.	cloud-init, bootstrap scripts.	Can we replace the server?
Inventory	Tags, AWS Config, Systems Manager.	hostname, OS version, package list.	Do we know what this server is?

Operational metrics

Host:
                            - CPU utilization
                            - memory usage
                            - disk usage
                            - disk IO latency
                            - network throughput
                            - systemd failed services
                            - reboot-required state

                            Application:
                            - HTTP latency
                            - HTTP 5xx
                            - worker queue depth
                            - database connections
                            - error logs
                            - health check status

                            AWS:
                            - instance status checks
                            - EBS burst balance (gp2, st1 and sc1 volumes)
                            - EBS latency
                            - load balancer health
                            - security group changes
                            - CloudTrail events

Recovery patterns

Pattern 1: EBS snapshot restore
                            ├── create volume from snapshot
                            ├── attach to instance
                            ├── mount and recover data
                            └── validate application

                            Pattern 2: AMI rollback
                            ├── select previous AMI
                            ├── launch replacement instance
                            ├── attach to load balancer
                            ├── validate health
                            └── terminate bad instance

                            Pattern 3: Blue/green replacement
                            ├── build new Ubuntu image
                            ├── launch green environment
                            ├── smoke test
                            ├── shift traffic
                            └── keep blue as rollback
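
Pattern 1 can be rehearsed with three AWS CLI calls, which is also a practical way to run a restore test. This is a sketch with placeholder IDs: the device name `/dev/sdf` and the function name are assumptions, and the volume must be created in the same Availability Zone as the instance.

```shell
# Create a volume from a snapshot, wait until it is usable, attach it, and print its ID.
restore_snapshot_to_instance() {
  local snapshot_id="$1" az="$2" instance_id="$3" volume_id
  volume_id=$(aws ec2 create-volume --snapshot-id "$snapshot_id" \
    --availability-zone "$az" --volume-type gp3 \
    --query 'VolumeId' --output text) || return 1
  aws ec2 wait volume-available --volume-ids "$volume_id" || return 1
  aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$instance_id" --device /dev/sdf > /dev/null || return 1
  echo "$volume_id"
}
```

On the instance, identify the new disk with `lsblk` before mounting it; the kernel device name often differs from the name requested at attach time. Mounting read-only first avoids altering the restored data during inspection.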

Ubuntu EC2 health commands

# OS and kernel
                            hostnamectl
                            uname -a
                            lsb_release -a

                            # Cloud-init status
                            cloud-init status
                            sudo tail -100 /var/log/cloud-init-output.log

                            # System health
                            uptime
                            df -h
                            free -h
                            systemctl --failed
                            journalctl -p warning --since "30 min ago"

                            # Network and ports
                            ip a
                            ip r
                            ss -lntp

                            # Reboot required
                            test -f /var/run/reboot-required && cat /var/run/reboot-required

Operations rule: backups only count if restore has been tested. Snapshots without restore tests are assumptions, not recovery guarantees.

Final AWS Ubuntu checklist

AMI and launch checklist

[ ] Official Ubuntu AMI selected
                            [ ] Canonical owner ID verified
                            [ ] LTS release selected for production
                            [ ] Architecture matches instance type
                            [ ] AMI ID is approved or pinned
                            [ ] Instance type matches workload
                            [ ] EBS volume size is sufficient
                            [ ] EBS performance is appropriate
                            [ ] VPC and subnet are correct
                            [ ] Security group is minimal
                            [ ] SSH access is restricted
                            [ ] IAM role uses least privilege
                            [ ] User data is tested
                            [ ] Tags are complete
                            [ ] Monitoring is enabled

Cloud-init checklist

[ ] User data starts with #cloud-config if YAML
                            [ ] package_update is intentional
                            [ ] package_upgrade is intentional
                            [ ] No long-lived secrets in user data
                            [ ] Bootstrap logs to file
                            [ ] cloud-init status is checked
                            [ ] /var/log/cloud-init-output.log is reviewed
                            [ ] Failed commands are visible
                            [ ] Complex setup is delegated to versioned script
                            [ ] Script source is trusted
                            [ ] Rebuild process is documented

Production operations checklist

[ ] UFW matches security group policy
                            [ ] Root SSH login is disabled
                            [ ] SSH keys are controlled
                            [ ] Patch policy exists
                            [ ] Reboot policy exists
                            [ ] CloudWatch or equivalent metrics enabled
                            [ ] Logs are shipped centrally
                            [ ] EBS snapshots are scheduled
                            [ ] Restore has been tested
                            [ ] Launch template is versioned
                            [ ] Golden AMI pipeline exists if fleet is large
                            [ ] Rollback AMI is retained
                            [ ] Instance is replaceable
                            [ ] Runbook exists
                            [ ] Owner and cost tags are present

Final rule

Ubuntu on AWS is reliable when the server is reproducible.
Use official Canonical AMIs, LTS baselines, controlled user data, minimal security groups, SSH keys or SSM, IAM roles, monitoring, snapshots, tested restore and image-based replacement where possible.

Minimal safe EC2 Ubuntu baseline

Minimum safe baseline:
                            - official Ubuntu LTS AMI
                            - Canonical owner verified
                            - SSH restricted
                            - security group minimal
                            - IAM role instead of static keys
                            - cloud-init bootstrap tested
                            - packages updated
                            - UFW enabled if needed
                            - monitoring installed
                            - snapshots configured
                            - restore tested
                            - instance documented and tagged

5.2 Ubuntu Containers & Virtualisation: Docker, LXD/LXC, KVM, virt-manager, CI/CD, labs and production patterns

Containers and virtualization on Ubuntu

Ubuntu is a strong platform for both containers and virtualization. Docker is typically used for application containers. LXD/LXC is used for system containers that behave more like lightweight machines. KVM/QEMU is used for full virtual machines with their own kernel. virt-manager provides a graphical management interface for KVM.

The key difference is isolation level. Docker containers share the host kernel and are optimized for application packaging. LXD containers also share the host kernel but feel closer to small Linux systems. KVM virtual machines run a full guest OS with stronger isolation and more overhead.

Technology	Category	Best for	Isolation	Typical command
Docker	Application containers	Apps, microservices, dev stacks, CI jobs.	Process/container isolation, shared kernel.	`docker run nginx`
Docker Compose	Multi-container orchestration	Local stacks, demos, small deployments.	Same as Docker.	`docker compose up`
LXD / LXC	System containers	Mini Linux systems, labs, isolated services.	OS-level isolation, shared kernel.	`lxc launch ubuntu:24.04 c1`
KVM / QEMU	Full virtualization	VMs, different kernels, stronger isolation.	Hardware-assisted VM isolation.	`virsh list --all`
virt-manager	GUI for KVM/libvirt	Desktop/lab VM management.	Manages KVM guests.	Graphical interface.

Core rule: use Docker for application packaging, LXD for lightweight Linux environments, and KVM when you need a full VM with its own kernel.

Isolation model diagram

Ubuntu host
                            │
                            ├── Docker containers
                            │       ├── app process
                            │       ├── image layers
                            │       ├── container network
                            │       └── shared host kernel
                            │
                            ├── LXD system containers
                            │       ├── init/systemd inside container
                            │       ├── full Ubuntu userspace
                            │       ├── container profiles
                            │       └── shared host kernel
                            │
                            └── KVM virtual machines
                                    ├── guest kernel
                                    ├── guest OS
                                    ├── virtual CPU/RAM/disk/NIC
                                    └── stronger isolation boundary

Decision shortcut

Need to ship an application?
                            └── Docker

                            Need several services locally?
                            └── Docker Compose

                            Need a mini Ubuntu machine?
                            └── LXD / LXC

                            Need another kernel or full VM isolation?
                            └── KVM / QEMU

                            Need a graphical VM manager?
                            └── virt-manager

                            Need production orchestration at scale?
                            └── Kubernetes, ECS, Nomad or managed platform
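
The shortcut above can be written as a small lookup. This is only a sketch: `pick_runtime` and its keywords (`app`, `local-stack`, `mini-machine`, `own-kernel`, `vm-gui`, `scale`) are hypothetical names chosen for illustration, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a need (keyword) to the technology suggested above.
pick_runtime() {
  case "$1" in
    app)          echo "Docker" ;;
    local-stack)  echo "Docker Compose" ;;
    mini-machine) echo "LXD / LXC" ;;
    own-kernel)   echo "KVM / QEMU" ;;
    vm-gui)       echo "virt-manager" ;;
    scale)        echo "Kubernetes, ECS, Nomad or managed platform" ;;
    *)            echo "unknown need: $1" >&2; return 1 ;;
  esac
}

pick_runtime own-kernel   # prints: KVM / QEMU
```

Real decisions usually combine several needs, so treat the lookup as a starting point rather than a rule engine.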

Common mistake: using containers as if they were VMs without understanding storage, networking, privilege, logs, lifecycle and security boundaries.

Docker on Ubuntu: images, containers, networks, volumes and logs

Docker packages an application and its runtime dependencies into an image. A container is a running instance of that image. On Ubuntu, Docker is commonly used for development, CI/CD, local demos, staging environments and production workloads behind a reverse proxy or orchestrator.

Concept	Meaning	Example
Image	Immutable package template.	`nginx:latest`, `postgres:16`
Container	Running process from an image.	`docker run nginx`
Volume	Persistent storage outside container lifecycle.	Database data, uploads.
Network	Container communication layer.	bridge network, app network.
Registry	Image storage and distribution.	Docker Hub, GHCR, ECR.
Dockerfile	Build recipe for an image.	Python app image.

Install Docker baseline

# Install from Ubuntu repository for simple usage
                            sudo apt update
                            sudo apt install docker.io docker-compose-v2

                            # Enable Docker
                            sudo systemctl enable --now docker

                            # Check status
                            systemctl status docker

                            # Add current user to docker group
                            sudo usermod -aG docker $USER

                            # Re-login before using docker without sudo
                            docker version
                            docker info

Core Docker commands

# List running containers
                            docker ps

                            # List all containers
                            docker ps -a

                            # List images
                            docker images

                            # Run Nginx in the background (-d) so the shell stays free
                            docker run -d --name web -p 8080:80 nginx:latest

                            # Logs
                            docker logs web
                            docker logs -f web

                            # Shell inside container
                            docker exec -it web bash

                            # Inspect container
                            docker inspect web

                            # Stop and remove (do this last: logs and exec need the container)
                            docker stop web
                            docker rm web

                            # Disk usage
                            docker system df

Docker architecture

Developer or CI
                            │
                            ├── Dockerfile
                            ├── build image
                            ├── tag image
                            └── push image
                            │
                            ▼
                            Registry
                            │
                            ├── Docker Hub
                            ├── GitHub Container Registry
                            ├── AWS ECR
                            └── private registry
                            │
                            ▼
                            Ubuntu host
                            │
                            ├── pull image
                            ├── run container
                            ├── attach volume
                            ├── expose port
                            └── collect logs

Security warning: membership in the docker group is effectively root-equivalent on the host. Do not grant it casually on production servers.
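
One way to audit that warning is to list who is in the group. The sketch below assumes the standard `getent group` record format (`name:password:gid:member1,member2`). `group_members` is a hypothetical helper name. Users whose primary group is `docker` would not appear in this list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one member per line from a `getent group`-style
# record ("name:password:gid:member1,member2") read on stdin.
group_members() {
  awk -F: '{
    n = split($4, m, ",")
    for (i = 1; i <= n; i++) if (m[i] != "") print m[i]
  }'
}

# On a live host (assumes the docker group exists):
#   getent group docker | group_members
```

An empty result means no supplementary members, which is a reasonable default for production servers.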

Docker Compose: local stacks, demos, CI environments and small deployments

Docker Compose defines several containers in one YAML file. It is excellent for local development, prototypes, demos, test stacks and small internal deployments. For large production environments, Compose is usually replaced by Kubernetes, ECS, Nomad or another orchestrator.

Use case	Compose fit	Comment
Local Django + Postgres + Redis	Excellent.	Reproducible dev environment.
Demo platform	Excellent.	Easy to start and stop.
CI integration tests	Good.	Start dependencies for test run.
Single-server production	Possible with discipline.	Needs backups, monitoring, update strategy.
Large multi-node production	Not ideal.	Use orchestrator.

Example Compose stack

services:
  web:
    build: .
    command: gunicorn config.wsgi:application --bind 0.0.0.0:8000
    ports:
      - "8000:8000"
    environment:
      DJANGO_SETTINGS_MODULE: config.settings
      DATABASE_URL: postgres://app:app@db:5432/app
      REDIS_URL: redis://redis:6379/0
    depends_on:
      - db
      - redis

  db:
    image: postgres:16
    environment:
      POSTGRES_DB: app
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: redis:7

volumes:
  pgdata:

Compose commands

# Start stack
                            docker compose up

                            # Start in background
                            docker compose up -d

                            # Show containers
                            docker compose ps

                            # Show logs
                            docker compose logs
                            docker compose logs -f web

                            # Execute command
                            docker compose exec web bash

                            # Stop stack
                            docker compose down

                            # Stop and remove volumes - destructive
                            docker compose down -v

                            # Rebuild
                            docker compose build
                            docker compose up -d --build

Compose lifecycle

docker-compose.yml
                            │
                            ├── services
                            ├── networks
                            ├── volumes
                            ├── environment
                            ├── ports
                            └── dependencies
                            │
                            ▼
                            docker compose up
                            │
                            ├── creates network
                            ├── creates volumes
                            ├── starts containers
                            ├── streams logs
                            └── exposes ports

Production cautions

If using Compose in production:
                            [ ] pin image versions
                            [ ] avoid latest tags
                            [ ] define restart policies
                            [ ] configure log rotation
                            [ ] persist data in named volumes
                            [ ] backup volumes
                            [ ] monitor containers
                            [ ] document upgrade process
                            [ ] keep secrets out of git
                            [ ] place behind Nginx or load balancer
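
The first two items can be checked mechanically. The sketch below uses plain text matching, not a YAML parser, so it assumes each image is declared on its own `image:` line. `unpinned_images` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list `image:` entries in a Compose file that use
# `latest` or carry no tag at all. Digest-pinned images are accepted.
unpinned_images() {
  awk -v sq="'" '$1 == "image:" {
    ref = $2
    gsub(/"/, "", ref); gsub(sq, "", ref)   # strip optional quotes
    if (ref ~ /@sha256:/) next              # pinned by digest
    name = ref
    sub(/^.*\//, "", name)                  # drop registry/path so a registry port is not read as a tag
    if (name !~ /:/ || name ~ /:latest$/) print ref
  }' "$1"
}

# Usage (the file name is an assumption):
#   unpinned_images docker-compose.yml
```

No output means every image line carries an explicit tag other than `latest`, or a digest.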

Compose rule: great for reproducibility and demos. For serious production, add the missing operational pieces: backups, monitoring, secrets, updates and rollback.

LXD / LXC: system containers and lightweight Ubuntu environments

LXD manages LXC system containers. Unlike Docker, which usually runs one application process per container, LXD containers can behave like lightweight Linux machines with init, SSH, packages, services and multiple processes. This makes LXD excellent for labs, training, test environments, network simulations and isolated system services.

Feature	LXD / LXC behavior	Usefulness
System container	Full Linux userspace.	Feels like a mini VM.
Shared kernel	Uses host kernel.	Lightweight compared to VM.
Images	Launch Ubuntu and other Linux images.	Fast lab creation.
Profiles	Reusable config for containers.	Standardized CPU/RAM/network/storage.
Snapshots	Snapshot and restore container state.	Safe experimentation.
Networking	Bridge, routed, macvlan patterns.	Complex labs and isolated networks.

LXD install and init

# Install LXD
                            sudo snap install lxd

                            # Add user to lxd group
                            sudo usermod -aG lxd $USER

                            # Re-login, then initialize
                            lxd init

                            # Launch Ubuntu container
                            lxc launch ubuntu:24.04 test1

                            # List containers
                            lxc list

                            # Shell inside container
                            lxc exec test1 -- bash

LXD command examples

# Start / stop
                            lxc start test1
                            lxc stop test1

                            # Execute command
                            lxc exec test1 -- apt update

                            # Copy file into container
                            lxc file push local.txt test1/root/local.txt

                            # Snapshot
                            lxc snapshot test1 before-change

                            # Restore snapshot
                            lxc restore test1 before-change

                            # Delete container
                            lxc delete test1 --force

                            # Show configuration
                            lxc config show test1

                            # Limit memory
                            lxc config set test1 limits.memory 1GiB

                            # Limit CPU
                            lxc config set test1 limits.cpu 2

LXD lab architecture

Ubuntu host
                            │
                            ├── lxdbr0 bridge
                            │
                            ├── container: web01
                            │       ├── nginx
                            │       └── app service
                            │
                            ├── container: db01
                            │       └── PostgreSQL
                            │
                            ├── container: monitor01
                            │       └── Prometheus / Grafana
                            │
                            └── snapshots
                                    ├── before-upgrade
                                    ├── before-network-test
                                    └── clean-baseline
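
A lab like this can be scripted. The sketch below only prints the commands so the plan can be reviewed before anything is created. `lab_plan` is a hypothetical helper. The container names and the `clean-baseline` snapshot name come from the diagram above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the commands that would build the lab
# and take a clean-baseline snapshot of each container.
lab_plan() {
  local image="${1:-ubuntu:24.04}" name
  for name in web01 db01 monitor01; do
    echo "lxc launch $image $name"
    echo "lxc snapshot $name clean-baseline"
  done
}

lab_plan            # review the plan
# lab_plan | sh     # run it once the plan looks right
```

Printing first and piping to `sh` later keeps the destructive step explicit, which suits a lab that gets rebuilt often.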

LXD rule: use LXD when you want machine-like Linux environments without the full overhead of VMs.

KVM / QEMU / libvirt: full virtualization on Ubuntu

KVM is Linux kernel-based virtualization. With QEMU and libvirt, Ubuntu can host full virtual machines. Each VM has its own virtual CPU, memory, disk, network card and guest operating system. This is heavier than containers but gives stronger isolation and supports different kernels and operating systems.

Component	Role	Example usage
KVM	Kernel virtualization acceleration.	Runs VM workloads efficiently.
QEMU	Machine emulator and virtualizer.	Emulates devices and runs guests.
libvirt	Management layer for VMs.	`virsh`, virt-manager.
virt-install	CLI VM installer.	Create VM from ISO or cloud image.
virsh	CLI administration tool.	List, start, stop, inspect VMs.
qcow2	Common VM disk image format.	Snapshots and thin provisioning.

Install KVM stack

# Check CPU virtualization support (0 means no vmx/svm flag was found)
                            grep -Ec '(vmx|svm)' /proc/cpuinfo

                            # Install KVM/libvirt tools
                            sudo apt update
                            sudo apt install -y \
                            qemu-kvm \
                            libvirt-daemon-system \
                            libvirt-clients \
                            bridge-utils \
                            virtinst

                            # Add user to groups
                            sudo usermod -aG libvirt,kvm $USER

                            # Re-login, then check
                            virsh list --all
                            systemctl status libvirtd
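
The CPU check at the top of this block returns a count, which does not say which vendor extension is present. The sketch below reports the flag itself. It assumes x86-style `/proc/cpuinfo` text with a `flags` line. `virt_flag` is a hypothetical helper name. A flag being listed does not guarantee that virtualization is enabled in firmware.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which hardware virtualization flag appears in
# /proc/cpuinfo-style text read on stdin (vmx = Intel VT-x, svm = AMD-V).
virt_flag() {
  awk '/^flags/ {
         for (i = 1; i <= NF; i++) {
           if ($i == "vmx") { print "vmx"; found = 1; exit }
           if ($i == "svm") { print "svm"; found = 1; exit }
         }
       }
       END { if (!found) print "none" }'
}

# On a live host:
#   virt_flag < /proc/cpuinfo
```

A result of `none` is also what a guest VM without nested virtualization would typically show.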

KVM architecture

Ubuntu host
                            │
                            ├── Linux kernel with KVM
                            ├── QEMU processes
                            ├── libvirt daemon
                            ├── virtual networks
                            ├── storage pools
                            └── VM guests
                                    │
                                    ├── Ubuntu VM
                                    ├── Debian VM
                                    ├── Windows VM
                                    └── lab appliance VM

virsh commands

# List VMs
                            virsh list --all

                            # Start VM
                            virsh start vm1

                            # Stop gracefully
                            virsh shutdown vm1

                            # Force stop
                            virsh destroy vm1

                            # VM info
                            virsh dominfo vm1

                            # Autostart VM
                            virsh autostart vm1

                            # Show networks
                            virsh net-list --all

                            # Show storage pools
                            virsh pool-list --all

When KVM is better than containers

Use KVM when:
                            - guest needs its own kernel
                            - running another OS
                            - stronger isolation is required
                            - testing kernel-level behavior
                            - simulating production VM topology
                            - running legacy software
                            - needing VM snapshots and full machine state

Resource warning: VMs reserve CPU, RAM and disk much like physical machines, so capacity planning matters more than with lightweight containers.

virt-manager: graphical VM management on Ubuntu

virt-manager is a desktop GUI for managing KVM/libvirt virtual machines. It is useful for labs, local testing, training, troubleshooting, VM console access and visual VM configuration. On servers, CLI tools such as virsh and automation are more common.

virt-manager feature	Purpose	Typical usage
VM creation wizard	Create new VM from ISO or image.	Ubuntu or Windows lab VM.
Console view	Access VM screen.	Install OS, fix boot issues.
Hardware editor	Configure CPU, RAM, disks, NICs.	Adjust VM resources.
Snapshots	Capture VM state.	Before risky change.
Network view	Manage virtual networks.	NAT, bridge, isolated network.
Storage pools	Manage VM disks.	qcow2 images and volumes.

Install virt-manager

sudo apt update
                            sudo apt install virt-manager

                            # Start GUI from desktop
                            virt-manager

                            # Check libvirt service
                            systemctl status libvirtd

                            # List VMs from CLI
                            virsh list --all

VM creation flow with virt-manager

virt-manager
                            │
                            ├── New virtual machine
                            ├── Choose ISO or cloud image
                            ├── Select OS type
                            ├── Allocate CPU and RAM
                            ├── Create virtual disk
                            ├── Choose network
                            ├── Start installation
                            └── Install guest tools if needed

Lab topology example

Ubuntu desktop host
                            │
                            ├── virt-manager
                            │
                            ├── VM: router-lab
                            │       ├── NIC 1: NAT
                            │       └── NIC 2: isolated lab network
                            │
                            ├── VM: web-server
                            │       └── lab network
                            │
                            └── VM: db-server
                                    └── lab network

Good usage boundaries

Use virt-manager for:
                            - desktop labs
                            - OS installation
                            - visual debugging
                            - VM console access
                            - local experiments

                            Prefer CLI/IaC for:
                            - production servers
                            - repeatable deployment
                            - remote headless hosts
                            - large VM fleets
                            - automated rebuilds

virt-manager rule: excellent for learning and labs. For production, prefer versioned definitions, automation and CLI-controlled operations.

CI/CD, labs and developer workflows

Ubuntu containers and virtualization are extremely useful for reproducible development, automated tests, CI runners, network labs, database experiments, security sandboxes and integration environments. The goal is to reduce “works on my machine” problems.

Workflow	Best technology	Why
Local web app stack	Docker Compose	Fast, reproducible dependencies.
CI integration tests	Docker services	Start DB/cache/message broker for tests.
Linux admin training	LXD	Fast mini Ubuntu machines.
Network topology lab	LXD or KVM	Multiple nodes and networks.
Kernel or OS testing	KVM	Full guest kernel isolation.
Security sandbox	KVM	Stronger isolation boundary.

CI pipeline example

Git push
                            │
                            ▼
                            CI runner on Ubuntu
                            │
                            ├── checkout code
                            ├── build Docker image
                            ├── start Compose services
                            │       ├── app
                            │       ├── postgres
                            │       └── redis
                            ├── run tests
                            ├── scan image
                            ├── push image to registry
                            └── deploy to target environment
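
The middle of that pipeline could look roughly like the script below. It is a sketch under assumptions: the image name, the `web` service name and the `pytest` test command are hypothetical. The scan step is left as a comment because this guide does not name a scanner. Checkout and deploy are left to the CI system.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical value: adjust the image name to the project.
IMAGE="${IMAGE:-registry.example.com/myapp:dev}"

# With DRY_RUN=1 (the default here) each step is printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

ci_pipeline() {
  run docker build -t "$IMAGE" .
  run docker compose up -d
  run docker compose exec -T web pytest   # hypothetical test command
  # Image scanning would go here.
  run docker push "$IMAGE"
  run docker compose down -v              # discard the test stack and its volumes
}

ci_pipeline
```

Because of `set -e`, a failing test stops the script before the push. In a real run the final cleanup would then be skipped, so a `trap` for teardown is worth adding.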

Demo architecture: one Ubuntu host

Ubuntu host
                            │
                            ├── Docker service
                            │       ├── nginx container
                            │       ├── app container
                            │       └── redis container
                            │
                            ├── LXD container
                            │       └── ubuntu:24.04 system lab
                            │
                            ├── KVM VM
                            │       └── isolated test machine
                            │
                            └── Monitoring
                                    ├── node exporter
                                    ├── docker stats
                                    └── systemd status

Useful demo commands

# Docker demo
                            docker run -d --name demo-nginx -p 8080:80 nginx:latest
                            curl -I http://localhost:8080
                            docker logs demo-nginx

                            # LXD demo
                            lxc launch ubuntu:24.04 lab1
                            lxc exec lab1 -- bash -lc "hostnamectl && apt update"

                            # KVM visibility
                            virsh list --all

                            # Host monitoring
                            top
                            df -h
                            ss -lntp

Portfolio demo idea: one Ubuntu host running one Docker app, one LXD system container, one KVM VM, plus a small monitoring page. It clearly shows platform breadth.

Production patterns: Docker host, reverse proxy, volumes, logs, updates and orchestration

Containers in production require more than docker run. You need image governance, non-root containers, pinned versions, health checks, persistent volumes, log rotation, backup, monitoring, network boundaries, secrets management and a clear update strategy.

Production topic	Good practice	Risk if ignored
Image versions	Pin tags or digests.	Unexpected changes from `latest`.
Volumes	Persist state outside container.	Data loss on container removal.
Logs	Configure rotation and centralization.	Disk fills under `/var/lib/docker`.
Secrets	Use secret store or strict env file permissions.	Secrets leaked in git or inspect output.
Networking	Expose only reverse proxy, keep internal networks private.	DB/cache exposed accidentally.
Health checks	Define container and load balancer health.	Dead service appears running.
Backups	Backup volumes and databases.	No recovery path.

Single-host container production pattern

Internet
                            │
                            ▼
                            Nginx on Ubuntu host
                            │
                            ├── TLS termination
                            ├── rate limiting
                            ├── static files
                            └── reverse proxy
                            │
                            ▼
                            Docker network
                            │
                            ├── app container
                            ├── worker container
                            ├── redis container
                            └── internal-only database or external DB

Docker daemon log rotation

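# /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}

# Apply. The new limits affect only containers created after the restart;
# existing containers keep their previous logging settings until recreated.
sudo systemctl restart docker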
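
With these options, each container keeps at most `max-file` log files of up to `max-size` each, so the disk budget is roughly their product. The sketch below does that arithmetic. `log_budget_mb` is a hypothetical helper, and the result is an approximate upper bound, not a measurement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough upper bound, in MB, for json-file logs.
# Arguments: max-size in MB, max-file count, number of containers (default 1).
log_budget_mb() {
  local size_mb="$1" files="$2" containers="${3:-1}"
  echo $(( size_mb * files * containers ))
}

log_budget_mb 50 5       # one container with the settings above: 250
log_budget_mb 50 5 20    # twenty containers: 5000
```

Comparing this figure with the free space under `/var/lib/docker` gives a quick sanity check before raising either limit.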

Production Compose example

services:
  app:
    image: registry.example.com/myapp:1.4.2
    restart: unless-stopped
    env_file:
      - /srv/myapp/app.env
    networks:
      - internal
    healthcheck:
      # curl must be installed in the image for this check to work
      test: ["CMD", "curl", "-f", "http://localhost:8000/health/"]
      interval: 30s
      timeout: 5s
      retries: 3

  redis:
    image: redis:7.2
    restart: unless-stopped
    networks:
      - internal
    volumes:
      - redisdata:/data

networks:
  internal:

volumes:
  redisdata:

When to move beyond Compose

Move to orchestrator when:
                            - multiple hosts are needed
                            - rolling updates are required
                            - autoscaling is required
                            - service discovery is complex
                            - secrets and config need governance
                            - many teams deploy independently
                            - high availability is mandatory

Production warning: containers do not remove the need for Linux administration. The host still needs patching, disk monitoring, firewalling, backups and incident response.

Troubleshooting containers and virtualization

Troubleshooting should start at the right layer: host health, Docker daemon, container logs, network binding, volume permissions, image version, LXD profile, libvirt daemon, VM console or storage pool. Avoid deleting containers or volumes before understanding where persistent data lives.

Symptom	First checks	Common cause
Docker container exits	`docker ps -a`, `docker logs`	Bad config, missing env, app crash.
Port not reachable	`docker ps`, `ss -lntp`, UFW.	Port not published, firewall, bind address.
Disk full	`docker system df`, `du -sh /var/lib/docker`	Logs, images, old containers, volumes.
Permission denied on volume	`ls -lah`, container user, UID/GID.	Host volume ownership mismatch.
LXD container has no network	`lxc list`, `lxc network list`.	Bridge, DNS, profile or firewall issue.
KVM VM will not start	`virsh list --all`, libvirt logs.	Missing storage, permission, CPU virtualization.
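
For the "Port not reachable" row, the bind address is often the quickest clue. The sketch below filters `ss -lnt`-style output for one port, assuming the usual layout where column 4 is the local address and port. `listeners_on` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given `ss -lnt`-style text on stdin, print the local
# addresses listening on a port. The first line is treated as the header.
listeners_on() {
  local port="$1"
  awk -v p="$port" 'NR > 1 && $4 ~ (":" p "$") { print $4 }'
}

# On a live host:
#   ss -lnt | listeners_on 8080
```

A result such as `127.0.0.1:8080` means the service is reachable only from the host itself. No output means nothing is listening on that port, so the port was probably never published.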

Docker troubleshooting commands

docker ps
                            docker ps -a
                            docker logs CONTAINER
                            docker inspect CONTAINER
                            docker exec -it CONTAINER bash
                            docker stats
                            docker system df
                            docker network ls
                            docker volume ls
                            systemctl status docker
                            journalctl -u docker --since "30 min ago"
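For the "container exits" symptom, the first useful step is usually to spot which containers stopped with a non-zero exit code. The sketch below is one possible way to filter that out of `docker ps -a`. The function name `crashed_containers` and the `NAME|STATUS` input format are assumptions made for this example, not part of Docker itself.

```shell
#!/bin/sh
# Hypothetical helper: reads "NAME|STATUS" lines on stdin and prints the
# containers whose status is "Exited (N) ..." with N different from 0.
crashed_containers() {
    awk -F '|' '
        $2 ~ /^Exited \([0-9]+\)/ {
            code = $2
            sub(/^Exited \(/, "", code)   # drop the "Exited (" prefix
            sub(/\).*$/, "", code)        # drop everything after the exit code
            if (code + 0 != 0) print $1 " exit=" code
        }'
}

# Typical use on a Docker host (not run here):
#   docker ps -a --format '{{.Names}}|{{.Status}}' | crashed_containers
```

Containers that exited with code 0 are skipped on purpose: they usually finished normally. For each name printed, `docker logs` and `docker inspect` are the next checks.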

LXD and KVM troubleshooting commands

# LXD
                            lxc list
                            lxc info CONTAINER
                            lxc config show CONTAINER
                            lxc network list
                            lxc storage list
                            lxc exec CONTAINER -- bash
                            journalctl -u snap.lxd.daemon --since "30 min ago"

                            # KVM / libvirt
                            virsh list --all
                            virsh dominfo VM
                            virsh net-list --all
                            virsh pool-list --all
                            systemctl status libvirtd
                            journalctl -u libvirtd --since "30 min ago"

Decision tree

Container or VM issue
                            │
                            ├── Is host healthy?
                            │       └── CPU, RAM, disk, network
                            │
                            ├── Is manager running?
                            │       ├── Docker daemon
                            │       ├── LXD daemon
                            │       └── libvirt daemon
                            │
                            ├── Is workload running?
                            │       ├── docker ps -a
                            │       ├── lxc list
                            │       └── virsh list --all
                            │
                            ├── What do logs say?
                            │       ├── docker logs
                            │       ├── lxc info CONTAINER --show-log
                            │       └── journalctl
                            │
                            └── Is it network, storage or permissions?
                                    ├── ports
                                    ├── volumes
                                    ├── bridges
                                    └── UID/GID

Data warning: docker compose down -v deletes named volumes. Never run it on production unless you explicitly want to delete persistent data.
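One way to make this warning harder to ignore is a small wrapper that refuses the destructive form unless it is explicitly allowed. This is only a sketch: `compose_down_deletes_volumes`, `safe_compose` and the `ALLOW_VOLUME_DELETE` variable are hypothetical names, and the check only recognises `-v` and `--volumes` written as separate arguments after `down`.

```shell
#!/bin/sh
# Hypothetical guard: succeeds (exit 0) when the given "docker compose"
# arguments contain "down" followed by -v or --volumes.
compose_down_deletes_volumes() {
    seen_down=0
    for arg in "$@"; do
        case "$arg" in
            down) seen_down=1 ;;
            -v|--volumes) [ "$seen_down" -eq 1 ] && return 0 ;;
        esac
    done
    return 1
}

# Hypothetical wrapper: blocks "down -v" unless ALLOW_VOLUME_DELETE=yes is set.
safe_compose() {
    if compose_down_deletes_volumes "$@" && [ "${ALLOW_VOLUME_DELETE:-no}" != "yes" ]; then
        echo "refusing: 'down -v' deletes named volumes" >&2
        return 1
    fi
    docker compose "$@"
}
```

With this in place, `safe_compose down` behaves like `docker compose down`, while `safe_compose down -v` fails until someone sets `ALLOW_VOLUME_DELETE=yes` on purpose.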

Final checklist and command cheat sheet

Technology choice checklist

[ ] Docker selected for application containers
                            [ ] Compose selected for local or small multi-service stacks
                            [ ] LXD selected for system-container labs
                            [ ] KVM selected for full VM isolation
                            [ ] virt-manager selected for GUI lab management
                            [ ] Production orchestrator considered if multi-node
                            [ ] Persistent data location is documented
                            [ ] Backup strategy exists for volumes and VM disks
                            [ ] Network exposure is documented
                            [ ] Host firewall rules are known
                            [ ] Logs are rotated
                            [ ] Images are pinned
                            [ ] Secrets are not stored in git
                            [ ] Monitoring covers host and workloads
                            [ ] Update and rollback process exists

Docker cheat sheet

docker ps
                            docker ps -a
                            docker images
                            docker run -d --name web -p 8080:80 nginx
                            docker logs -f web
                            docker exec -it web bash
                            docker stop web
                            docker rm web
                            docker system df
                            docker compose up -d
                            docker compose logs -f
                            docker compose down

LXD / KVM cheat sheet

# LXD
                            lxd init
                            lxc launch ubuntu:24.04 c1
                            lxc list
                            lxc exec c1 -- bash
                            lxc snapshot c1 before-change
                            lxc restore c1 before-change
                            lxc delete c1 --force

                            # KVM / libvirt
                            virsh list --all
                            virsh start vm1
                            virsh shutdown vm1
                            virsh dominfo vm1
                            virsh net-list --all
                            virsh pool-list --all

                            # Host checks
                            systemctl status docker
                            systemctl status libvirtd
                            df -h
                            free -h
                            ss -lntp

Final rule

Ubuntu is a strong virtualization and container host when the lifecycle is controlled.
Docker gives fast application packaging, Compose gives reproducible stacks, LXD gives machine-like containers, and KVM gives full VM isolation. Production quality depends on security, storage, networking, logs, monitoring, backups and rollback.

Minimal robust host profile

Minimum robust Ubuntu container/VM host:
                            - Ubuntu LTS
                            - patched kernel and packages
                            - Docker/LXD/KVM installed intentionally
                            - non-root operational model
                            - storage sized and monitored
                            - log rotation enabled
                            - firewall rules documented
                            - images or VM templates versioned
                            - backups tested
                            - monitoring and alerts enabled
                            - runbook documented

6.1 Ubuntu Troubleshooting Playbook: logs, systemd, network, DNS, disk, boot, services and incidents

Professional troubleshooting method

Ubuntu troubleshooting must be systematic. The objective is not to try random commands until something changes. The objective is to identify the failing layer: application, service manager, process, logs, permissions, network, DNS, firewall, storage, memory, CPU, kernel, package update, boot or recent configuration change.

A good incident workflow follows a stable sequence: define the symptom, determine the scope, collect evidence, isolate the layer, apply one minimal fix, verify, document, then add prevention.

Step	Question	Command family	Output expected
1. Symptom	What exactly is failing?	curl, browser, user report, monitoring	Precise error, time, scope.
2. Scope	One service, one host, one network, all users?	health checks, ping, curl, dashboard	Incident boundary.
3. Logs	What did the system report?	journalctl, tail, grep, dmesg	Error message and timeline.
4. Services	Is the daemon running and enabled?	systemctl, ss	Running state, PID, port.
5. Resources	Is the host saturated?	top, free, df, iostat, vmstat	CPU/RAM/disk/IO pressure.
6. Network	Can traffic reach the service?	ip, ss, ufw, dig, curl	IP, route, DNS, port, firewall status.
7. Change	What changed recently?	apt history, deploy logs, git, config diff	Likely trigger.
8. Fix	What is the smallest safe correction?	rollback, restart, config fix, cleanup	Service restored and verified.

Core rule: observe before acting. During an incident, every random restart or untracked change can destroy evidence and make the root cause harder to find.

Global diagnostic decision tree

Ubuntu incident
                            │
                            ├── Is the host reachable?
                            │       ├── no  -> cloud, network, firewall, boot, provider
                            │       └── yes
                            │
                            ├── Is disk full?
                            │       ├── yes -> disk playbook
                            │       └── no
                            │
                            ├── Are critical services running?
                            │       ├── no  -> systemd playbook
                            │       └── yes
                            │
                            ├── Are ports listening?
                            │       ├── no  -> service config / bind / crash
                            │       └── yes
                            │
                            ├── Is network path valid?
                            │       ├── no  -> DNS / route / firewall / SG
                            │       └── yes
                            │
                            ├── Are resources saturated?
                            │       ├── CPU -> CPU playbook
                            │       ├── RAM -> memory playbook
                            │       ├── IO  -> disk / IO playbook
                            │       └── no
                            │
                            └── Application layer likely
                                    ├── app logs
                                    ├── DB connectivity
                                    ├── cache connectivity
                                    ├── external dependency
                                    └── recent deploy

Do / avoid

Do	Avoid
Collect logs with time window.	Reading huge logs without filtering.
Change one thing at a time.	Restarting everything blindly.
Check disk early.	Debugging app while root FS is full.
Validate config before restart.	Restarting with unvalidated config.
Keep rollback possible.	Deleting unknown files in production.

First 5 minutes: collect facts quickly

The first minutes of an incident are for orientation. You need to know whether the machine is alive, whether the disk is full, whether memory is exhausted, whether services failed, which ports are listening, and whether logs show a clear error.

One-screen diagnostic

echo "== HOST =="
                            hostnamectl

                            echo "== UPTIME =="
                            uptime

                            echo "== WHO IS CONNECTED =="
                            who

                            echo "== DISK =="
                            df -h

                            echo "== MEMORY =="
                            free -h

                            echo "== FAILED UNITS =="
                            systemctl --failed

                            echo "== LISTENING PORTS =="
                            ss -lntp

                            echo "== RECENT WARNINGS =="
                            journalctl -p warning --since "30 min ago" --no-pager | tail -100

Signal	Good sign	Bad sign	Next action
Uptime	Stable, expected boot time.	Unexpected reboot.	Check previous boot logs.
Disk	Filesystem below alert threshold.	`/` or `/var` near 100%.	Disk playbook.
Memory	Available RAM healthy.	Swap active, OOM events.	Memory playbook.
Failed units	No failed services.	Critical unit failed.	systemd playbook.
Ports	Expected ports listening.	Missing 80/443/app/DB port.	Service and network playbook.

Minimum incident facts

Incident facts to capture:
                            - exact symptom
                            - first detection time
                            - impacted users or services
                            - hostname
                            - Ubuntu version
                            - kernel version
                            - uptime
                            - recent deployments
                            - recent package upgrades
                            - failed services
                            - resource saturation
                            - relevant logs
                            - immediate workaround
                            - rollback option
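Several of these facts can be captured automatically at the start of an incident. The following is a minimal sketch, assuming a standard `/etc/os-release` file. The function name `incident_facts` and the key names are illustrative choices, and the remaining items (symptom, impact, workaround) still have to be written down by a person.

```shell
#!/bin/sh
# Hypothetical helper: prints a few host facts as key=value lines.
# Every lookup falls back to "unknown" so the function never aborts mid-incident.
incident_facts() {
    os=""
    if [ -r /etc/os-release ]; then
        # PRETTY_NAME looks like: PRETTY_NAME="Ubuntu 24.04 LTS"
        os=$(sed -n 's/^PRETTY_NAME="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' /etc/os-release)
    fi
    echo "captured_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "hostname=$(hostname 2>/dev/null || echo unknown)"
    echo "os=${os:-unknown}"
    echo "kernel=$(uname -r 2>/dev/null || echo unknown)"
    echo "uptime=$(uptime 2>/dev/null || echo unknown)"
}

# Typical use (not run here):
#   incident_facts | tee "incident-$(date +%Y%m%d-%H%M).txt"
```

Writing the output to a timestamped file keeps the state of the host at detection time, before any fix changes it.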

Recent change checks

# Apt package changes
                            less /var/log/apt/history.log
                            less /var/log/apt/term.log

                            # Recently modified config files under /etc
                            sudo find /etc -type f -mtime -2 -printf '%TY-%Tm-%Td %TH:%TM  %p\n' | sort

                            # Recent system boots
                            last reboot | head

                            # Current users
                            who
                            w

                            # Cron logs if syslog is available
                            grep CRON /var/log/syslog | tail -100

Immediate triage matrix

Observation	Likely category
Service failed in `systemctl`	Service/config/dependency issue.
Port not listening	Service did not bind or crashed.
Port listening locally but unreachable remotely	Firewall, route, security group, load balancer.
Disk full	Logs, Docker, DB, uploads, backups.
OOM kill in kernel logs	Memory leak or insufficient RAM.

Fast triage: first separate host failure, service failure, network failure, resource saturation and application failure.

systemd and service troubleshooting

Most production daemons on Ubuntu are managed by systemd. When a service is down, start with systemctl status, then read the journal, validate the config, check ports, check permissions, and only then restart.

Question	Command	What it tells you
Is service running?	`systemctl status nginx`	Active state, PID, exit code, recent logs.
Why did it fail?	`journalctl -u nginx`	Service logs and errors.
Did unit fail?	`systemctl --failed`	Failed systemd units.
Does it start at boot?	`systemctl is-enabled nginx`	Boot activation state.
Which ports are bound?	`ss -lntp`	Listening sockets and processes.
What are unit properties?	`systemctl show nginx`	Restart policy, limits, user, environment.

Service commands

# Status
                            systemctl status SERVICE

                            # Logs
                            journalctl -u SERVICE --since "1 hour ago"
                            journalctl -u SERVICE -f

                            # Restart
                            sudo systemctl restart SERVICE

                            # Reload config if supported
                            sudo systemctl reload SERVICE

                            # Enable at boot
                            sudo systemctl enable SERVICE

                            # Failed units
                            systemctl --failed

                            # Unit file
                            systemctl cat SERVICE

                            # Runtime properties
                            systemctl show SERVICE | less

Service failure decision tree

Service failed
                            │
                            ├── Read status
                            │       └── systemctl status SERVICE
                            │
                            ├── Read logs with time window
                            │       └── journalctl -u SERVICE --since "30 min ago"
                            │
                            ├── Config syntax valid?
                            │       ├── nginx -t
                            │       ├── sshd -t
                            │       └── app-specific check
                            │
                            ├── Dependency available?
                            │       ├── database
                            │       ├── redis
                            │       ├── network
                            │       └── filesystem mount
                            │
                            ├── Permissions correct?
                            │       ├── service user
                            │       ├── config files
                            │       └── runtime directories
                            │
                            ├── Port conflict?
                            │       └── ss -lntp
                            │
                            └── Restart with monitoring
                                    └── systemctl restart SERVICE

Common service failures

Error pattern	Likely cause	Fix direction
Exit code 1 after deploy	Bad config or app error.	Validate config, rollback deploy.
Permission denied	Wrong owner/group/path.	Check service user and `namei -l`.
Address already in use	Port conflict.	Find process with `ss -lntp`.
Start request repeated too quickly	Crash loop.	Fix root cause, then `systemctl reset-failed`.
Dependency failed	Database, network, mount, Redis missing.	Restore dependency first.

Service rule: restart is a recovery action, not a diagnosis. Read the reason before restarting when possible.
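The "validate config before restart" habit can be wrapped in a small helper so that a broken config never reaches the restart step. This is a sketch, not a standard tool: `validate_then_restart` is a hypothetical name, and both arguments are plain command strings chosen by the operator.

```shell
#!/bin/sh
# Hypothetical helper: run a validation command first and only run the
# restart command when validation succeeds.
validate_then_restart() {
    validate_cmd=$1
    restart_cmd=$2
    if sh -c "$validate_cmd"; then
        sh -c "$restart_cmd"
    else
        echo "validation failed, service left untouched: $validate_cmd" >&2
        return 1
    fi
}

# Typical use (not run here):
#   validate_then_restart "sudo nginx -t" "sudo systemctl restart nginx"
#   validate_then_restart "sudo sshd -t"  "sudo systemctl restart ssh"
```

If validation fails, the running service keeps its last working configuration, which is usually the safer state during an incident.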

Logs and journald: finding the real error

Logs are the timeline of the incident. On Ubuntu, the main tools are journalctl, service-specific logs, /var/log/auth.log, /var/log/syslog, kernel logs and application logs. The most useful log queries are scoped by service and time.

Need	Command	Use case
Recent critical context	`journalctl -xe`	Quick overview of recent errors.
Service logs	`journalctl -u nginx`	Why one service failed.
Current boot logs	`journalctl -b`	Boot-time errors and service startup.
Previous boot logs	`journalctl -b -1`	Debug crash/reboot before current boot.
Kernel logs	`journalctl -k`	OOM, disk, driver, network errors.
Authentication logs	`/var/log/auth.log`	SSH, sudo, login attempts.

journalctl commands

# Recent errors and context
                            journalctl -xe

                            # Service logs today
                            journalctl -u SERVICE --since today

                            # Service logs last 30 minutes
                            journalctl -u SERVICE --since "30 min ago"

                            # Follow service logs
                            journalctl -u SERVICE -f

                            # Warnings and errors today
                            journalctl -p warning --since today

                            # Current boot
                            journalctl -b

                            # Previous boot
                            journalctl -b -1

                            # Kernel logs
                            journalctl -k --since today

Classic log files

# System log
                            sudo tail -200 /var/log/syslog

                            # Authentication log
                            sudo tail -200 /var/log/auth.log

                            # Nginx logs
                            sudo tail -200 /var/log/nginx/error.log
                            sudo tail -200 /var/log/nginx/access.log

                            # Apt history
                            less /var/log/apt/history.log
                            less /var/log/apt/term.log

                            # Compressed rotated logs
                            zgrep -i "error" /var/log/syslog.*.gz

                            # Kernel ring buffer
                            dmesg -T | tail -100

Log investigation flow

Need root cause from logs
                            │
                            ├── Identify incident time window
                            │
                            ├── Read service journal
                            │       └── journalctl -u SERVICE --since TIME
                            │
                            ├── Read system warnings
                            │       └── journalctl -p warning --since TIME
                            │
                            ├── Read kernel logs
                            │       └── journalctl -k --since TIME
                            │
                            ├── Read application logs
                            │
                            ├── Correlate with deploy/update
                            │       └── apt history / deploy log
                            │
                            └── Extract first error, not last symptom

Useful grep patterns

grep -i "error" app.log
                            grep -i "permission denied" app.log
                            grep -i "connection refused" app.log
                            grep -i "no space left" /var/log/syslog
                            grep -i "killed process" /var/log/syslog
                            grep -i "failed password" /var/log/auth.log

Log rule: find the first meaningful error in the timeline. Later messages often describe consequences, not causes.
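As a rough illustration of this rule, the helper below prints only the earliest matching line of a log stream instead of the most recent one. The function name `first_error` and the keyword list are assumptions for the example. Real applications may need different patterns.

```shell
#!/bin/sh
# Hypothetical helper: print the first line (in input order) that matches a
# small set of error keywords. Reads the given files, or stdin when none.
first_error() {
    grep -i -E 'error|fatal|panic|permission denied|connection refused|no space left' "$@" \
        | head -n 1
}

# Typical use (not run here):
#   journalctl -u SERVICE --since "30 min ago" --no-pager | first_error
```

The output is a starting point for reading the surrounding lines, not a verdict: the line just before the first error often explains it.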

Network and DNS troubleshooting

Network debugging should be layered: IP address, route, DNS, firewall, listening port, local service response, remote response. Do not assume an application is broken until the network path is verified.

Layer	Question	Command	Bad sign
Interface	Does the host have an IP?	`ip a`	No expected IP.
Route	Is default route present?	`ip r`	No default route.
DNS	Can names resolve?	`dig`, `resolvectl`	Timeout or wrong answer.
Firewall	Is traffic allowed?	`ufw status verbose`	Required port denied.
Socket	Is service listening?	`ss -lntp`	Port missing.
HTTP local	Does local endpoint respond?	`curl -I localhost`	Connection refused or 5xx.
Remote path	Does public endpoint respond?	`curl -I domain`	Timeout, TLS, 5xx, wrong IP.

Network commands

# Interfaces
                            ip a

                            # Routes
                            ip r

                            # DNS status
                            resolvectl status

                            # DNS query
                            dig example.com
                            dig A example.com
                            dig AAAA example.com

                            # Listening ports
                            ss -lntp

                            # Connection summary
                            ss -s

                            # Firewall
                            sudo ufw status verbose

                            # Local HTTP check
                            curl -I http://localhost

                            # Public HTTP check
                            curl -I https://example.com

Network decision tree

Service unreachable
                            │
                            ├── Is service listening locally?
                            │       ├── no  -> service/config issue
                            │       └── yes
                            │
                            ├── Does local curl work?
                            │       ├── no  -> app/service issue
                            │       └── yes
                            │
                            ├── Is firewall open?
                            │       ├── no  -> UFW/cloud security group
                            │       └── yes
                            │
                            ├── Does DNS point to correct IP?
                            │       ├── no  -> DNS provider / record
                            │       └── yes
                            │
                            ├── Does remote curl reach?
                            │       ├── no  -> route/LB/firewall/provider
                            │       └── yes
                            │
                            └── Is response app error?
                                    └── app logs / upstream logs

Common network symptoms

Symptom	Likely cause	Check
Connection refused	No service listening on target port.	`ss -lntp`, service status.
Connection timeout	Firewall, route, security group, provider.	UFW, cloud firewall, route.
DNS resolves wrong IP	Bad DNS record or stale cache.	`dig`, DNS console.
Works locally, not remotely	Firewall, bind address, reverse proxy, LB.	`ss`, UFW, Nginx.
TLS error	Wrong certificate, expired cert, SNI issue.	Nginx logs, certbot, openssl.

Network rule: "works on localhost" and "works on the public domain" are two different tests. Always verify both.
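A frequent reason for "works locally, not remotely" is a service bound only to the loopback address. The sketch below classifies the binding of a TCP port from `ss -lnt` output. The function names `listen_addresses` and `bind_scope` are hypothetical, and the parsing assumes the usual column layout where the local address is the fourth field of each `LISTEN` line.

```shell
#!/bin/sh
# Hypothetical helper: reads `ss -lnt` output on stdin and prints the local
# addresses bound to the given TCP port, one per line.
listen_addresses() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            addr = $4
            n = split(addr, parts, ":")
            if (parts[n] == port) {
                sub(/:[0-9]+$/, "", addr)   # strip the trailing ":PORT"
                print addr
            }
        }'
}

# Hypothetical helper: prints "none", "loopback-only" or "non-loopback".
bind_scope() {
    addrs=$(listen_addresses "$1")
    if [ -z "$addrs" ]; then
        echo "none"
    elif echo "$addrs" | grep -v -q -E '^(127\.|\[::1\])'; then
        echo "non-loopback"
    else
        echo "loopback-only"
    fi
}

# Typical use (not run here):
#   ss -lnt | bind_scope 5432
```

A `loopback-only` result means remote clients cannot connect regardless of firewall rules. A `non-loopback` result only shows the socket is bound to a reachable address, so UFW and cloud security groups still need to be checked.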

Disk, filesystem and IO troubleshooting

Disk problems can break everything: package installs, logs, databases, Docker, SSH, application uploads and systemd services. Always check disk early in an incident. A full / or /var often creates misleading application errors.

Problem	Command	Likely cause	Safe first action
Filesystem full	`df -h`	Logs, Docker, DB, backups, uploads.	Identify large directories.
Large logs	`du -sh /var/log/*`	Log storm or missing rotation.	Vacuum journal, rotate logs.
Docker disk growth	`docker system df`	Images, volumes, logs.	Prune only understood objects.
Mount missing	`findmnt`, `lsblk -f`	fstab error, disk detach.	Fix mount, do not write to wrong path.
High IO wait	`iostat -xz 1`	Slow disk, DB writes, swap, backup.	Find process with iotop.
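To make the early disk check quick to repeat, the usage column of `df` can be filtered against a threshold. This is a minimal sketch that assumes the POSIX output format of `df -P` and mount points without spaces. The function name `disks_over` and the default of 90 percent are arbitrary choices for the example.

```shell
#!/bin/sh
# Hypothetical helper: reads `df -P` output on stdin and prints the mount
# points whose usage is at or above the given percentage (default 90).
disks_over() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            use = $5
            sub(/%$/, "", use)            # "95%" -> "95"
            if (use + 0 >= limit + 0) print $6 " " use "%"
        }'
}

# Typical use (not run here):
#   df -P | disks_over 85
```

An empty result means no filesystem reached the threshold. Inode exhaustion is a separate case and still needs `df -ih`.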

Disk commands

# Filesystem usage
                            df -h

                            # Inode usage
                            df -ih

                            # Top-level directory sizes
                            sudo du -xhd1 / 2>/dev/null | sort -h

                            # Common growth areas
                            sudo du -sh /var/log/*
                            sudo du -sh /var/lib/docker/* 2>/dev/null
                            sudo du -sh /var/lib/postgresql/* 2>/dev/null
                            sudo du -sh /tmp/* 2>/dev/null

                            # Mounts and disks
                            lsblk -f
                            findmnt
                            cat /etc/fstab

                            # IO statistics
                            iostat -xz 1
                            sudo iotop -o

Disk full decision tree

Disk full
                            │
                            ├── Which filesystem?
                            │       └── df -h
                            │
                            ├── Is it root or /var?
                            │       └── du -xhd1 /
                            │
                            ├── Is journal huge?
                            │       └── journalctl --disk-usage
                            │
                            ├── Are app logs huge?
                            │       └── du -sh /var/log/*
                            │
                            ├── Is Docker huge?
                            │       └── docker system df
                            │
                            ├── Is database huge?
                            │       └── do not delete manually
                            │
                            └── Prevent recurrence
                                    ├── logrotate
                                    ├── monitoring
                                    ├── retention
                                    └── resize or separate volume

Safe cleanup commands

# Clean apt cache
                            sudo apt clean

                            # Remove unused packages
                            sudo apt autoremove

                            # Show journal size
                            journalctl --disk-usage

                            # Vacuum journal by time
                            sudo journalctl --vacuum-time=7d

                            # Vacuum journal by size
                            sudo journalctl --vacuum-size=1G

                            # Docker usage
                            docker system df

                            # Docker image cleanup - use with care
                            docker image prune

Dangerous cleanup commands

Dangerous in production:
                            rm -rf /var/lib/postgresql/*
                            rm -rf /var/lib/mysql/*
                            rm -rf /var/lib/docker/volumes/*
                            docker compose down -v
                            truncate unknown database files
                            delete random files under /var/lib

                            Safer:
                            - understand owner service
                            - stop service if required
                            - backup first
                            - use native cleanup tools
                            - document action

Disk rule: never delete unknown files under database or volume directories. Freeing space quickly can create permanent data loss.

CPU, memory, swap and process troubleshooting

Resource saturation explains many incidents: slow responses, timeouts, SSH lag, services killed by OOM, high load, worker backlog, database slowness and container instability. Identify whether the bottleneck is CPU, RAM, swap, IO wait or one process.

Signal	Command	Interpretation	Next action
High load	`uptime`	Runnable/waiting tasks high.	Check CPU vs IO wait.
High CPU	`top`, `pidstat`	Process consuming CPU.	Profile app or reduce load.
Low memory	`free -h`	Available memory low.	Find the top memory consumer.
Swap activity	`vmstat 1`	RAM pressure causing latency.	Reduce workers, add RAM.
OOM kill	`journalctl -k`	Kernel killed process.	Fix memory pressure.
High IO wait	`iostat`, `top`	CPU waiting on disk.	Disk/IO playbook.

Resource commands

# CPU/load
                            uptime
                            top
                            htop
                            ps aux --sort=-%cpu | head -30

                            # Memory
                            free -h
                            ps aux --sort=-%mem | head -30
                            vmstat 1

                            # Swap
                            swapon --show

                            # OOM events
                            journalctl -k --since today | grep -i -E "oom|killed process"

                            # Per-process stats if sysstat installed
                            pidstat -u -r 1

Resource decision tree

Server slow
                            │
                            ├── Load high?
                            │       └── uptime
                            │
                            ├── CPU saturated?
                            │       ├── yes -> top, process, app profiler
                            │       └── no
                            │
                            ├── IO wait high?
                            │       ├── yes -> iostat, iotop, disk playbook
                            │       └── no
                            │
                            ├── Memory low?
                            │       ├── yes -> ps by memory, OOM logs
                            │       └── no
                            │
                            ├── Swap active?
                            │       ├── yes -> reduce workers or add RAM
                            │       └── no
                            │
                            └── App-level bottleneck
                                    ├── DB query
                                    ├── lock
                                    ├── external API
                                    ├── cache miss
                                    └── queue backlog

Common resource fixes

Cause	Short-term action	Long-term fix
Too many app workers	Reduce workers, restart app.	Right-size worker count.
Memory leak	Restart the service in a controlled way.	Fix code, add monitoring, set systemd `MemoryMax=`.
Traffic spike	Rate limit, scale, cache.	Autoscaling, CDN, capacity plan.
Slow database query	Kill/limit bad query if safe.	Index, query optimization, DB scaling.
Backup job overload	Pause or throttle job.	Schedule and IO limits.

Resource rule: high load does not always mean high CPU. It can also mean tasks waiting on disk or blocked resources.
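A load average only becomes meaningful when compared with the number of CPUs. The sketch below normalises it. The function name `load_per_cpu` is hypothetical, and the result is a rough indicator only: on Linux the load average also counts tasks blocked in uninterruptible IO, so a high ratio still has to be split into CPU versus IO wait with `top` or `iostat`.

```shell
#!/bin/sh
# Hypothetical helper: divide a load average by the number of CPUs.
# A value well above 1.00 means more runnable or blocked tasks than CPUs.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (cpus <= 0) { print "invalid"; exit 1 }
        printf "%.2f\n", load / cpus
    }'
}

# Typical use on Linux (not run here):
#   load_per_cpu "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```

For example, a 1-minute load of 8.00 on a 4-CPU host gives 2.00, which suggests queued work, while the same load on a 16-CPU host gives 0.50.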

Boot, kernel, emergency mode and recovery troubleshooting

Boot issues usually come from filesystem errors, broken /etc/fstab, failed mounts, bootloader problems, bad kernel update, disk issues or cloud volume attachment problems. Recovery may require console access, rescue mode, previous kernel or mounting the disk on another instance.

Symptom	Likely cause	Diagnostic	Recovery direction
Emergency mode	Broken fstab or failed mount.	Console logs, `journalctl -xb`.	Fix fstab or mount issue.
Boot hangs after update	Kernel/driver issue.	GRUB previous kernel.	Boot previous kernel, rollback.
No SSH after reboot	Network, firewall, ssh service, boot incomplete.	Cloud console / serial log.	Console recovery.
Filesystem check fails	Disk corruption or unclean shutdown.	fsck from recovery.	Repair with backup ready.
Wrong boot disk	Bootloader or cloud volume mapping.	UEFI/GRUB/cloud console.	Fix boot order or volume attachment.

Boot diagnostics

# Current boot logs
                            journalctl -b

                            # Previous boot logs
                            journalctl -b -1

                            # Boot errors
                            journalctl -b -p err

                            # Kernel logs
                            journalctl -k -b

                            # Filesystems
                            lsblk -f
                            findmnt
                            cat /etc/fstab

                            # Failed units
                            systemctl --failed

                            # Kernel version
                            uname -a

Boot failure decision tree

Server did not come back after reboot
                            │
                            ├── Cloud or physical console available?
                            │       ├── yes -> read boot output
                            │       └── no  -> use provider recovery tools
                            │
                            ├── Reaches GRUB?
                            │       ├── yes -> try previous kernel
                            │       └── no  -> bootloader/disk issue
                            │
                            ├── Emergency mode?
                            │       ├── yes -> check fstab and mounts
                            │       └── no
                            │
                            ├── Network failed?
                            │       ├── yes -> check netplan/cloud-init
                            │       └── no
                            │
                            ├── SSH failed?
                            │       ├── yes -> ssh service/firewall/keys
                            │       └── no
                            │
                            └── Application failed after boot
                                    └── systemd service playbook

fstab recovery checks

# Check fstab content
                            cat /etc/fstab

                            # Test mounts without reboot
                            sudo mount -a

                            # Show current mounts
                            findmnt

                            # Validate UUIDs
                            blkid
                            lsblk -f

Cloud recovery pattern

Broken cloud VM
                            │
                            ├── Stop instance
                            ├── Detach root volume
                            ├── Attach volume to rescue instance
                            ├── Mount filesystem
                            ├── Fix fstab/config/keys
                            ├── Unmount cleanly
                            ├── Reattach as root volume
                            └── Boot and verify

Boot rule: any change to /etc/fstab, bootloader, kernel or network config should be tested before rebooting a remote server.
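One way to test an /etc/fstab change before rebooting is to confirm that every UUID it references exists on the host. The sketch below assumes plain `UUID=...` entries (no quoting); `fstab_uuids` and `missing_uuids` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: find fstab UUIDs that no block device currently provides.
# Helper names are illustrative assumptions.

# fstab_uuids FILE: print UUID values from non-comment fstab lines
fstab_uuids() {
    awk '!/^[[:space:]]*#/ && $1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$1"
}

# missing_uuids FSTAB KNOWN: print fstab UUIDs absent from KNOWN
# (KNOWN is a file with one UUID per line)
missing_uuids() {
    local uuid
    fstab_uuids "$1" | while read -r uuid; do
        grep -qxF "$uuid" "$2" || echo "$uuid"
    done
}

# Example usage on a live host:
#   sudo blkid -s UUID -o value > /tmp/known-uuids
#   missing_uuids /etc/fstab /tmp/known-uuids
#   sudo findmnt --verify    # util-linux syntax check of fstab
```

Any UUID printed by `missing_uuids` is a mount that is likely to fail at boot unless the entry uses `nofail`.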

Incident playbooks: common Ubuntu production failures

Playbook matrix

Incident	First command	Likely root causes	Safe correction
Website down	`curl -I localhost`	Nginx, app service, DB, firewall.	Fix failed layer, rollback deploy if needed.
502 Bad Gateway	`systemctl status app`	Upstream app down, socket path, port mismatch.	Fix app service or Nginx upstream.
SSH unavailable	Cloud console / provider console.	Firewall, SSH config, key, fail2ban, network.	Console recovery, avoid closing existing session.
Disk full	`df -h`	Logs, Docker, DB, backups.	Safe cleanup and retention fix.
High CPU	`top`	Traffic spike, hot process, backup, worker count.	Limit, scale, rollback, profile.
OOM kill	`journalctl -k | grep -i oom`	Memory leak, too many workers, low RAM.	Reduce memory pressure, add limits.
DNS failure	`dig domain`	Bad record, resolver, TTL, provider issue.	Fix DNS or resolver path.
Package update broke service	`less /var/log/apt/history.log`	Dependency change, config prompt, version mismatch.	Rollback package or restore previous image.

502 Nginx playbook

502 Bad Gateway
                            │
                            ├── Check Nginx config
                            │       └── sudo nginx -t
                            │
                            ├── Check Nginx logs
                            │       └── tail -100 /var/log/nginx/error.log
                            │
                            ├── Check upstream app service
                            │       └── systemctl status gunicorn
                            │
                            ├── Check upstream port/socket
                            │       └── ss -lntp
                            │
                            ├── Check app logs
                            │       └── journalctl -u gunicorn
                            │
                            └── Fix app or upstream config

SSH lockout playbook

SSH unavailable
                            │
                            ├── Is server reachable?
                            │       └── ping / cloud status checks
                            │
                            ├── Is port open externally?
                            │       └── security group / firewall
                            │
                            ├── Console access possible?
                            │       └── provider console / serial console
                            │
                            ├── Check ssh service
                            │       └── systemctl status ssh
                            │
                            ├── Check firewall
                            │       └── ufw status verbose
                            │
                            ├── Check SSH config syntax
                            │       └── sshd -t
                            │
                            ├── Check keys and user
                            │       └── authorized_keys, permissions
                            │
                            └── Restore safe access before hardening again

Disk full playbook

Disk full
                            │
                            ├── df -h
                            ├── du -xhd1 /
                            ├── du -sh /var/log/*
                            ├── journalctl --disk-usage
                            ├── docker system df
                            ├── apt clean
                            ├── journalctl --vacuum-time=7d
                            ├── prune Docker carefully if applicable
                            ├── resize volume if needed
                            └── add monitoring and retention

Post-incident actions

After restoration:
                            [ ] Confirm user-visible service is healthy
                            [ ] Confirm logs are clean
                            [ ] Confirm monitoring is green
                            [ ] Record exact root cause
                            [ ] Record commands executed
                            [ ] Record rollback option used or not used
                            [ ] Add missing alert
                            [ ] Add missing dashboard panel
                            [ ] Add missing runbook step
                            [ ] Schedule permanent fix

Incident rule: the incident is not finished when the service is back. It is finished when the cause is understood and recurrence is reduced.

Ubuntu troubleshooting cheat sheet and final checklist

Command cheat sheet

# Host
                            hostnamectl
                            uptime
                            who
                            w
                            last reboot | head

                            # Services
                            systemctl status SERVICE
                            systemctl --failed
                            systemctl cat SERVICE
                            journalctl -u SERVICE --since "30 min ago"
                            journalctl -u SERVICE -f

                            # Logs
                            journalctl -xe
                            journalctl -p warning --since today
                            journalctl -k --since today
                            tail -100 /var/log/syslog
                            tail -100 /var/log/auth.log

                            # Network
                            ip a
                            ip r
                            ss -lntp
                            ss -s
                            resolvectl status
                            dig example.com
                            curl -I http://localhost
                            sudo ufw status verbose

                            # Disk
                            df -h
                            df -ih
                            du -xhd1 /
                            lsblk -f
                            findmnt
                            journalctl --disk-usage

                            # Resources
                            free -h
                            top
                            ps aux --sort=-%cpu | head
                            ps aux --sort=-%mem | head
                            vmstat 1
                            iostat -xz 1

Final troubleshooting checklist

[ ] Symptom is precisely described
                            [ ] Incident start time is known
                            [ ] Scope is known
                            [ ] Host health is checked
                            [ ] Disk usage is checked
                            [ ] Memory and CPU are checked
                            [ ] Failed systemd units are checked
                            [ ] Relevant service logs are read
                            [ ] Kernel logs are checked if needed
                            [ ] Network path is verified
                            [ ] DNS is verified
                            [ ] Firewall is verified
                            [ ] Listening ports are verified
                            [ ] Recent deploys are checked
                            [ ] Recent apt updates are checked
                            [ ] Fix is minimal and reversible
                            [ ] Service health is verified after fix
                            [ ] Postmortem notes are written
                            [ ] Preventive action is created

Final rule

Ubuntu troubleshooting is evidence-driven.
Start with facts: logs, service status, ports, disk, memory, CPU, network and recent changes. Apply one controlled fix, verify the result, document the root cause, and add monitoring or a runbook so the same incident is easier to handle next time.

Minimal incident report template

Incident report:
                            - title
                            - start time
                            - detection method
                            - impacted service
                            - user impact
                            - root cause
                            - immediate fix
                            - commands executed
                            - rollback used
                            - prevention action
                            - owner
                            - deadline

7.1 Ubuntu Cheat Sheet: essential commands, production checklists, cloud ops and incident shortcuts

Ubuntu operator quick map

This cheat sheet is a compact operational reference for Ubuntu servers: first checks, package operations, systemd services, journald logs, network, DNS, disk, memory, security, cloud patterns and production readiness.

Need	First command	What it answers
Host identity	`hostnamectl`	Hostname, OS, kernel, machine type.
System load	`uptime`	Load average and uptime.
Failed services	`systemctl --failed`	Broken units.
Service status	`systemctl status SERVICE`	Service state, PID, exit code, recent logs.
Service logs	`journalctl -u SERVICE`	Service timeline and errors.
Listening ports	`ss -lntp`	Open TCP ports and owning processes.
Disk usage	`df -h`	Filesystem capacity.
Memory usage	`free -h`	RAM, available memory and swap.
Firewall	`sudo ufw status verbose`	Host-level network exposure.
Recent package changes	`less /var/log/apt/history.log`	Updates, installs and removals.

Fast rule: in an incident, check disk, memory, failed units, ports and recent logs before changing configuration.

First 90 seconds on a server

echo "== HOST =="
                            hostnamectl

                            echo "== UPTIME =="
                            uptime

                            echo "== USERS =="
                            who

                            echo "== DISK =="
                            df -h

                            echo "== MEMORY =="
                            free -h

                            echo "== FAILED UNITS =="
                            systemctl --failed

                            echo "== PORTS =="
                            ss -lntp

                            echo "== WARNINGS =="
                            journalctl -p warning --since "30 min ago" --no-pager | tail -100

Triage decision tree

Problem reported
                            │
                            ├── Host unreachable?
                            │       └── cloud, network, boot, firewall
                            │
                            ├── Disk full?
                            │       └── df -h, du, journal size, Docker
                            │
                            ├── Service failed?
                            │       └── systemctl status, journalctl
                            │
                            ├── Port missing?
                            │       └── service bind, config, crash
                            │
                            ├── Network path broken?
                            │       └── IP, route, DNS, UFW, security group
                            │
                            └── App problem?
                                    └── app logs, DB, cache, deploy

Bad practice: changing several things at once. Use one change, one verification, one rollback path.
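"One change, one verification, one rollback path" can be as simple as a timestamped copy taken before each edit. A minimal sketch, assuming the change is a single config file; `backup_config` and `rollback_config` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: keep a rollback copy before editing a config file.
# Helper names are illustrative assumptions.

# backup_config FILE: copy FILE to FILE.bak.<timestamp>, print the backup path
backup_config() {
    local src="$1" dst
    dst="$1.bak.$(date +%Y%m%d-%H%M%S)"
    cp -p -- "$src" "$dst" && echo "$dst"
}

# rollback_config BACKUP FILE: restore FILE from BACKUP
rollback_config() {
    cp -p -- "$1" "$2"
}

# Example usage (run with sufficient privileges for the target file):
#   bak=$(backup_config /etc/nginx/nginx.conf)
#   ... edit, then validate with: sudo nginx -t
#   rollback_config "$bak" /etc/nginx/nginx.conf   # only if validation fails
```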

System identity, users, processes and host state

Host and OS

# Host and OS summary
                            hostnamectl

                            # Ubuntu release metadata
                            cat /etc/os-release
                            lsb_release -a

                            # Kernel
                            uname -a

                            # Architecture
                            dpkg --print-architecture

                            # Boot and uptime
                            uptime
                            last reboot | head

                            # Current users
                            who
                            w

Processes

# Interactive process view
                            top
                            htop

                            # Top CPU processes
                            ps aux --sort=-%cpu | head -30

                            # Top memory processes
                            ps aux --sort=-%mem | head -30

                            # Process tree
                            pstree -ap

                            # Find process
                            pgrep -af nginx

                            # Open files by process
                            sudo lsof -p PID

Users and groups

# Current identity
                            whoami
                            id

                            # User identity
                            id deploy
                            groups deploy

                            # Create user
                            sudo adduser deploy

                            # Add sudo rights
                            sudo usermod -aG sudo deploy

                            # Show sudo group
                            getent group sudo

                            # Show shell users
                            grep -E "/bin/bash|/bin/sh|/usr/bin/zsh" /etc/passwd

                            # Lock user password
                            sudo passwd -l username

Permissions

# File permissions
                            ls -lah /srv/app

                            # Path permissions
                            namei -l /srv/app/current/.env

                            # Change owner
                            sudo chown deploy:www-data file

                            # Recursive owner change
                            sudo chown -R deploy:www-data /srv/app

                            # File mode
                            chmod 644 file

                            # Directory mode
                            chmod 755 directory

                            # Secret file mode
                            chmod 600 secret.key

Permission rule: never solve production permissions with chmod 777. Fix owner, group and minimal access.
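To spot leftovers of a past `chmod 777` or `chmod 666`, a quick audit can list files that any user may write. A minimal sketch using standard `find` options; the function name `world_writable_files` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: list world-writable regular files under a directory.
# The function name is an illustrative assumption.

# world_writable_files DIR: print regular files with the "other write" bit set,
# staying on the same filesystem (-xdev)
world_writable_files() {
    find "$1" -xdev -type f -perm -0002 -print
}

# Example usage:
#   world_writable_files /srv/app
```

Each hit is better fixed by adjusting owner and group, then tightening the mode (for example 640 or 644), than by leaving it writable for everyone.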

Packages, APT, repositories and updates

APT essentials

# Refresh package metadata
                            sudo apt update

                            # Show upgradeable packages
                            apt list --upgradable

                            # Upgrade packages
                            sudo apt upgrade

                            # Full dependency-aware upgrade
                            sudo apt full-upgrade

                            # Install package
                            sudo apt install PACKAGE

                            # Remove package, keep config
                            sudo apt remove PACKAGE

                            # Remove package and config
                            sudo apt purge PACKAGE

                            # Remove unused dependencies
                            sudo apt autoremove

                            # Clean package cache
                            sudo apt clean

Package inspection

# Search package
                            apt search nginx

                            # Package details
                            apt show nginx

                            # Installed and candidate version
                            apt policy nginx

                            # Installed packages
                            dpkg -l | grep nginx

                            # Files installed by package
                            dpkg -L nginx

                            # Package owning a file
                            dpkg -S /usr/sbin/nginx

                            # Available versions
                            apt-cache madison nginx

Repositories and history

# Source files
                            cat /etc/apt/sources.list
                            ls -lah /etc/apt/sources.list.d/

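                            # Search repo lines (one-line .list format)
                            grep -R "^deb" /etc/apt/sources.list /etc/apt/sources.list.d/

                            # Search repo URIs (deb822 .sources format, default on Ubuntu 24.04+)
                            grep -R "^URIs:" /etc/apt/sources.list.d/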

                            # APT history
                            less /var/log/apt/history.log

                            # APT terminal logs
                            less /var/log/apt/term.log

                            # Held packages
                            apt-mark showhold

                            # Hold package
                            sudo apt-mark hold PACKAGE

                            # Unhold package
                            sudo apt-mark unhold PACKAGE

Package repair

# Finish interrupted dpkg operation
                            sudo dpkg --configure -a

                            # Fix broken dependencies
                            sudo apt -f install

                            # Check for running apt/dpkg processes before touching locks
                            pgrep -af 'apt|dpkg'

                            # Re-run metadata refresh
                            sudo apt update

Update safety

# Reboot required?
                            test -f /var/run/reboot-required && cat /var/run/reboot-required

                            # Packages requiring reboot if present
                            cat /var/run/reboot-required.pkgs 2>/dev/null

                            # Security automation
                            sudo apt install unattended-upgrades
                            sudo dpkg-reconfigure unattended-upgrades

Production rule: review external repositories and package changes before major upgrades. Unknown PPAs create upgrade and supply-chain risk.
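Reviewing external repositories can start with a filter that hides the official Ubuntu archives and shows everything else. The sketch below only handles one-line `deb` entries (not deb822 `.sources` files) and treats any host under `ubuntu.com` as official, which is a simplifying assumption; `external_repos` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list active one-line "deb" entries that do not point at ubuntu.com.
# The function name and the "ubuntu.com means official" rule are assumptions.

# external_repos FILE...: print non-Ubuntu repository lines
external_repos() {
    grep -hE '^deb(-src)?[[:space:]]' "$@" 2>/dev/null |
        grep -vE 'https?://([a-z0-9.-]+\.)?ubuntu\.com/' || true
}

# Example usage:
#   external_repos /etc/apt/sources.list /etc/apt/sources.list.d/*.list
```

Every line it prints is a third-party source worth checking for ownership, signing key and continued need before a major upgrade.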

systemd services, unit files and runtime control

Service commands

# Service status
                            systemctl status SERVICE

                            # Start / stop / restart
                            sudo systemctl start SERVICE
                            sudo systemctl stop SERVICE
                            sudo systemctl restart SERVICE

                            # Reload config if supported
                            sudo systemctl reload SERVICE

                            # Enable / disable at boot
                            sudo systemctl enable SERVICE
                            sudo systemctl disable SERVICE

                            # Is active / enabled?
                            systemctl is-active SERVICE
                            systemctl is-enabled SERVICE

                            # Failed units
                            systemctl --failed

                            # Reset failed state
                            sudo systemctl reset-failed SERVICE

Unit inspection

# Show unit file
                            systemctl cat SERVICE

                            # Show runtime properties
                            systemctl show SERVICE | less

                            # Show service logs
                            journalctl -u SERVICE --since "1 hour ago"

                            # Follow logs
                            journalctl -u SERVICE -f

                            # Reload unit files after edit
                            sudo systemctl daemon-reload

Service failure flow

Service broken
                            │
                            ├── systemctl status SERVICE
                            ├── journalctl -u SERVICE --since "30 min ago"
                            ├── systemctl cat SERVICE
                            ├── validate config
                            ├── check dependencies
                            ├── check permissions
                            ├── check ports
                            ├── restart only after cause is understood
                            └── verify logs and health check

Common config validators

# Nginx
                            sudo nginx -t

                            # SSH
                            sudo sshd -t

                            # Apache
                            sudo apachectl configtest

                            # PostgreSQL (connectivity check, not a config validator)
                            sudo -u postgres psql -c "select version();"

                            # Redis (liveness check, expects PONG)
                            redis-cli ping

                            # Local HTTP health
                            curl -I http://localhost
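The checks above are most useful when they gate the reload or restart itself. A minimal sketch of that pattern, assuming both steps are given as command strings; `validate_then` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: run an action only when a validation command succeeds.
# The function name is an illustrative assumption.

# validate_then VALIDATE ACTION: run ACTION only if VALIDATE exits 0
validate_then() {
    local validate="$1" action="$2"
    if bash -c "$validate"; then
        bash -c "$action"
    else
        echo "validation failed, skipped: $action" >&2
        return 1
    fi
}

# Example usage:
#   validate_then "sudo nginx -t" "sudo systemctl reload nginx"
#   validate_then "sudo sshd -t"  "sudo systemctl reload ssh"
```

If validation fails, the running service keeps its last known-good configuration, which matters most for SSH on a remote host.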

Robust unit pattern

[Unit]
                            Description=My app
                            After=network.target

                            [Service]
                            User=myapp
                            Group=myapp
                            WorkingDirectory=/srv/myapp
                            EnvironmentFile=/srv/myapp/.env
                            ExecStart=/srv/myapp/venv/bin/gunicorn config.wsgi:application
                            Restart=on-failure
                            RestartSec=5
                            LimitNOFILE=65535

                            [Install]
                            WantedBy=multi-user.target

Service rule: every production daemon should be systemd-managed, non-root, restartable, logged and documented.
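A memory ceiling can be added to an existing unit without editing it, through a drop-in file. The sketch below writes such a drop-in to a directory you pass in; `MemoryMax=` is a systemd resource-control setting, while the function name, the `limits.conf` file name and the `myapp` unit are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: write a systemd drop-in that caps a service's memory.
# Function name, file name and unit name are assumptions.

# write_memory_dropin DIR LIMIT: create DIR/limits.conf with MemoryMax=LIMIT
write_memory_dropin() {
    local dir="$1" limit="$2"
    mkdir -p "$dir"
    printf '[Service]\nMemoryMax=%s\n' "$limit" > "$dir/limits.conf"
}

# Example usage on a live host (as root):
#   write_memory_dropin /etc/systemd/system/myapp.service.d 512M
#   systemctl daemon-reload
#   systemctl restart myapp
#   systemctl show myapp -p MemoryMax
```

With a limit in place, a leaking process is constrained inside its own cgroup instead of pushing the whole host into memory pressure.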

Logs, journald, auth logs, kernel logs and audit trail

journald essentials

# Recent diagnostic context
                            journalctl -xe

                            # Current boot
                            journalctl -b

                            # Previous boot
                            journalctl -b -1

                            # Service logs
                            journalctl -u SERVICE

                            # Service logs since today
                            journalctl -u SERVICE --since today

                            # Service logs last 30 minutes
                            journalctl -u SERVICE --since "30 min ago"

                            # Follow service logs
                            journalctl -u SERVICE -f

                            # Warnings and errors
                            journalctl -p warning --since today

                            # Kernel logs
                            journalctl -k --since today

Classic log files

# System log
                            sudo tail -200 /var/log/syslog

                            # Authentication log
                            sudo tail -200 /var/log/auth.log

                            # Nginx
                            sudo tail -200 /var/log/nginx/error.log
                            sudo tail -200 /var/log/nginx/access.log

                            # APT
                            less /var/log/apt/history.log
                            less /var/log/apt/term.log

                            # Kernel ring buffer
                            dmesg -T | tail -100

Search patterns

# Generic errors
                            grep -i "error" app.log
                            grep -i "failed" app.log
                            grep -i "permission denied" app.log
                            grep -i "connection refused" app.log
                            grep -i "no space left" /var/log/syslog

                            # SSH failures
                            sudo grep -i "failed password" /var/log/auth.log | tail -100

                            # Sudo usage
                            sudo grep -i "sudo" /var/log/auth.log | tail -100

                            # OOM events
                            journalctl -k --since today | grep -i -E "oom|killed process"

                            # Compressed rotated logs
                            zgrep -i "error" /var/log/syslog.*.gz

Journal size control

# Show journal size
                            journalctl --disk-usage

                            # Vacuum by time
                            sudo journalctl --vacuum-time=14d

                            # Vacuum by size
                            sudo journalctl --vacuum-size=1G

Log investigation flow

Find root cause
                            │
                            ├── define incident time window
                            ├── read service journal
                            ├── read system warnings
                            ├── read kernel logs
                            ├── read app logs
                            ├── check apt/deploy history
                            └── identify first meaningful error

Log rule: the first meaningful error is more valuable than the last visible symptom.
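Finding the first meaningful error is mostly a matter of stopping at the first match instead of reading the tail. A minimal sketch reusing the search patterns listed earlier; `first_error` is a hypothetical helper name and the pattern list is a starting point, not a complete one.

```shell
#!/usr/bin/env bash
# Sketch: print the first log line matching common failure patterns.
# The function name is an illustrative assumption.

# first_error FILE: print "LINE_NUMBER:content" of the first matching line
first_error() {
    grep -n -i -m 1 -E 'error|failed|permission denied|connection refused|no space left' "$1"
}

# Example usage:
#   first_error /var/log/nginx/error.log
#   journalctl -u SERVICE --since "30 min ago" --no-pager > /tmp/svc.log
#   first_error /tmp/svc.log
```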

Network, DNS, firewall and HTTP checks

Network commands

# Interfaces
                            ip a

                            # Routes
                            ip r

                            # Interface counters
                            ip -s link

                            # Listening TCP ports
                            ss -lntp

                            # Established connections
                            ss -antp

                            # Socket summary
                            ss -s

                            # DNS status
                            resolvectl status

                            # DNS query
                            dig example.com
                            dig A example.com
                            dig AAAA example.com

                            # Reachability
                            ping -c 3 1.1.1.1
                            tracepath example.com
                            mtr -rw example.com

HTTP and TLS checks

# Local HTTP
                            curl -I http://localhost

                            # Public HTTP
                            curl -I https://example.com

                            # Follow redirects
                            curl -IL https://example.com

                            # Verbose TLS/HTTP
                            curl -vI https://example.com

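                            # Check certificate with openssl (close stdin so the command exits)
                            openssl s_client -connect example.com:443 -servername example.com </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates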

Firewall commands

# Status
                            sudo ufw status verbose
                            sudo ufw status numbered

                            # Baseline
                            sudo ufw default deny incoming
                            sudo ufw default allow outgoing

                            # Allow SSH
                            sudo ufw allow OpenSSH

                            # Allow web
                            sudo ufw allow 80/tcp
                            sudo ufw allow 443/tcp

                            # Restrict SSH by source
                            sudo ufw allow from 203.0.113.10 to any port 22 proto tcp

                            # Delete numbered rule
                            sudo ufw delete RULE_NUMBER

                            # Enable firewall
                            sudo ufw enable

Network diagnostic flow

Service unreachable
                            │
                            ├── service listening locally?
                            │       └── ss -lntp
                            │
                            ├── local curl works?
                            │       └── curl -I localhost
                            │
                            ├── firewall open?
                            │       └── ufw status
                            │
                            ├── DNS points correctly?
                            │       └── dig domain
                            │
                            ├── remote curl works?
                            │       └── curl -I domain
                            │
                            └── app returns error?
                                    └── app logs / upstream logs

Cloud note: for AWS or another cloud, check both host firewall and cloud security group. Either one can block traffic.
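The first step of the flow, "service listening locally?", can be scripted by parsing `ss -lnt` output. A minimal sketch, assuming the default column layout where the fourth field is the local address and port; `is_listening` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check whether a TCP port appears in "ss -lnt" output.
# The function name is an illustrative assumption.

# is_listening PORT: reads "ss -lnt" output on stdin,
# exits 0 if some local address ends in :PORT, 1 otherwise
is_listening() {
    awk -v port="$1" '
        { n = split($4, a, ":"); if (a[n] == port) found = 1 }
        END { exit !found }
    '
}

# Example usage:
#   ss -lnt | is_listening 443 && echo "443 is listening" || echo "443 is NOT listening"
```

If the port is listening locally but remote requests still fail, the remaining suspects are the host firewall, the cloud security group and DNS.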

Disk, filesystem, memory, CPU and IO

Disk and filesystem

# Filesystem usage
                            df -h

                            # Inode usage
                            df -ih

                            # Block devices
                            lsblk -f

                            # Mounts
                            findmnt

                            # fstab
                            cat /etc/fstab

                            # Top-level usage
                            sudo du -xhd1 / 2>/dev/null | sort -h

                            # Common growth paths
                            sudo du -sh /var/log/*
                            sudo du -sh /var/lib/docker/* 2>/dev/null
                            sudo du -sh /var/lib/postgresql/* 2>/dev/null
                            sudo du -sh /tmp/* 2>/dev/null

Safe cleanup

# APT cache
                            sudo apt clean
                            sudo apt autoremove

                            # Journal
                            journalctl --disk-usage
                            sudo journalctl --vacuum-time=14d
                            sudo journalctl --vacuum-size=1G

                            # Docker usage
                            docker system df

                            # Docker cleanup, use carefully: removes dangling images and all stopped containers
                            docker image prune
                            docker container prune

CPU, memory and IO

# CPU/load
                            uptime
                            top
                            htop
                            ps aux --sort=-%cpu | head -30

                            # Memory
                            free -h
                            ps aux --sort=-%mem | head -30

                            # Swap
                            swapon --show
                            vmstat 1

                            # OOM
                            journalctl -k --since today | grep -i -E "oom|killed process"

                            # IO, requires sysstat
                            iostat -xz 1

                            # Per-process IO, requires iotop
                            sudo iotop -o
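The "available" column of `free -h` comes from the `MemAvailable` field in `/proc/meminfo`. For scripts, a single percentage is often easier to compare than the full table. This sketch assumes the standard `/proc/meminfo` layout; `mem_available_pct` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the percentage of RAM the kernel estimates is still available.
# Reads /proc/meminfo-style text on stdin (values are in kB).
mem_available_pct() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

# Usage:
#   mem_available_pct < /proc/meminfo
```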

Disk full playbook

Disk full
                            │
                            ├── df -h
                            ├── du -xhd1 /
                            ├── journalctl --disk-usage
                            ├── du -sh /var/log/*
                            ├── docker system df
                            ├── apt clean
                            ├── journalctl --vacuum-time=14d
                            ├── resize volume if needed
                            └── add alert and retention policy
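When the directory-level `du` steps in the playbook do not make the culprit obvious, listing the largest individual files can narrow it down. This is a sketch assuming GNU `find`, which is standard on Ubuntu; `largest_files` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the N largest regular files under a directory as "bytes<TAB>path".
# -xdev keeps the search on one filesystem, like `du -x`.
largest_files() {
  local dir=${1:-/} count=${2:-10}
  find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null \
    | sort -nr \
    | head -n "$count"
}

# Usage (run as root to see every file):
#   largest_files /var 20
```

If `df` reports a full filesystem but `du` cannot account for the space, a process may still hold deleted files open; `sudo lsof +L1` lists such files.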

Resource interpretation

Signal	Likely issue
High load + high CPU	CPU-bound workload or traffic spike.
High load + high IO wait	Disk or database bottleneck.
Low available RAM + swap activity	Memory pressure.
OOM kill logs	Process killed by kernel due to memory exhaustion.
Filesystem 100%	Services may fail unpredictably.
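
The first two rows of the table can be turned into a rough classifier. This is only a sketch: the cut-offs (load above the CPU count, IO wait at or above 20%) are assumptions chosen for illustration, and `classify_load` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Classify a load situation from three numbers:
#   $1 = 1-minute load average, $2 = CPU count, $3 = IO wait percentage.
# Thresholds are illustrative assumptions, not values taken from the guide.
classify_load() {
  local load=$1 cpus=$2 iowait=$3
  awk -v l="$load" -v c="$cpus" -v w="$iowait" 'BEGIN {
    if (l <= c)       print "load within CPU capacity"
    else if (w >= 20) print "high load + high IO wait: check disk or database"
    else              print "high load + low IO wait: likely CPU-bound"
  }'
}

# Usage (the IO wait value comes from the %iowait column of iostat):
#   classify_load "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" 12
```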

Data warning: never delete unknown files in database directories or Docker volumes without understanding ownership and backup state.

Security hardening and quick audit

SSH hardening

# Backup config
                            sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak.$(date +%Y%m%d-%H%M%S)

                            # Recommended directives in /etc/ssh/sshd_config (replace "deploy" with your admin user)
                            PermitRootLogin no
                            PasswordAuthentication no
                            PubkeyAuthentication yes
                            X11Forwarding no
                            MaxAuthTries 3
                            AllowUsers deploy

                            # Validate, then restart only if the config parses; keep the current session open until key login is confirmed
                            sudo sshd -t && sudo systemctl restart ssh

                            # Logs
                            journalctl -u ssh --since today
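
Editing `sshd_config` does not guarantee that the directives are in effect, because drop-in files under `/etc/ssh/sshd_config.d/` may override them. `sudo sshd -T` prints the effective configuration with lowercase keys, and the sketch below compares that output with the directives listed above. `check_sshd_effective` is a hypothetical helper name, and `AllowUsers` is left out because its value is site-specific.

```shell
#!/usr/bin/env bash
# Compare effective sshd settings with the hardening values from this section.
# Input on stdin: `sshd -T` output, e.g. "permitrootlogin no".
# Prints one line per mismatch or missing key; exits non-zero if any are found.
check_sshd_effective() {
  awk '
    BEGIN {
      want["permitrootlogin"]        = "no"
      want["passwordauthentication"] = "no"
      want["pubkeyauthentication"]   = "yes"
      want["x11forwarding"]          = "no"
      want["maxauthtries"]           = "3"
    }
    ($1 in want) {
      seen[$1] = 1
      if ($2 != want[$1]) { print "MISMATCH " $1 "=" $2; bad = 1 }
    }
    END {
      for (k in want) if (!(k in seen)) { print "MISSING " k; bad = 1 }
      exit bad + 0
    }'
}

# Usage:
#   sudo sshd -T | check_sshd_effective && echo "sshd hardening in effect"
```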

SSH key permissions

chmod 700 ~/.ssh
                            chmod 600 ~/.ssh/id_ed25519
                            chmod 644 ~/.ssh/id_ed25519.pub
                            chmod 600 ~/.ssh/authorized_keys
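
To audit these modes instead of resetting them blindly, the expected values can be compared with what `stat` reports. This is a sketch assuming GNU `stat` (the `-c` flag) and the same file names as above; `check_ssh_perms` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report SSH files whose mode differs from the expected one.
# $1 is the .ssh directory (defaults to ~/.ssh); files that do not exist are skipped.
# Returns 0 when everything matches, 1 otherwise.
check_ssh_perms() {
  local dir=${1:-$HOME/.ssh} rc=0 entry path want got
  for entry in ".:700" "id_ed25519:600" "id_ed25519.pub:644" "authorized_keys:600"; do
    path=$dir/${entry%%:*}
    want=${entry##*:}
    [ -e "$path" ] || continue
    got=$(stat -c '%a' "$path")
    if [ "$got" != "$want" ]; then
      printf '%s is %s, expected %s\n' "$path" "$got" "$want"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   check_ssh_perms            # audits ~/.ssh
#   check_ssh_perms /home/deploy/.ssh
```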

fail2ban

sudo apt install fail2ban
                            sudo systemctl enable --now fail2ban
                            sudo fail2ban-client status
                            sudo fail2ban-client status sshd
                            sudo journalctl -u fail2ban --since today

Security snapshot

echo "== UFW =="
                            sudo ufw status verbose

                            echo "== OPEN PORTS =="
                            ss -lntp

                            echo "== SUDO USERS =="
                            getent group sudo

                            echo "== SHELL USERS =="
                            grep -E "/bin/bash|/bin/sh|/usr/bin/zsh" /etc/passwd

                            echo "== SSH LOGS =="
                            journalctl -u ssh --since "24 hours ago" --no-pager | tail -100

                            echo "== AUTH LOG =="
                            sudo tail -100 /var/log/auth.log
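
The open-ports part of the snapshot is easier to review when only listeners bound to every interface are shown, since those are the ones a firewall has to cover. This sketch assumes the column layout of `ss -lntH` (`-H` suppresses the header); `public_listeners` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# From `ss -lntH` output on stdin, print local address:port pairs that are
# bound to all interfaces (0.0.0.0, * or [::]). Loopback-only listeners are skipped.
public_listeners() {
  awk '$4 ~ /^(0\.0\.0\.0|\*|\[::\]):[0-9]+$/ { print $4 }'
}

# Usage:
#   ss -lntH | public_listeners
```

Anything in that list that is not in the "required ports" set is a candidate for rebinding to `127.0.0.1` or for a firewall rule.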

Security checklist

[ ] Ubuntu LTS
                            [ ] Packages updated
                            [ ] Reboot-required checked
                            [ ] Named admin user
                            [ ] Root SSH login disabled
                            [ ] SSH key login validated
                            [ ] Password SSH disabled
                            [ ] UFW enabled
                            [ ] Only required ports open
                            [ ] Database ports private
                            [ ] Redis ports private
                            [ ] fail2ban enabled if SSH is publicly reachable
                            [ ] Service users are non-root
                            [ ] Secrets are not world-readable
                            [ ] Backups exist
                            [ ] Restore tested

Security rule: hardening is useful only if access remains recoverable and changes are documented.
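
The "Reboot-required checked" item can be scripted. Ubuntu signals a pending reboot with the flag file `/var/run/reboot-required`, usually accompanied by `/var/run/reboot-required.pkgs` naming the packages that asked for it. This is a minimal sketch; `reboot_status` is a hypothetical helper name, and the path argument exists only so the function can be pointed at a test file.

```shell
#!/usr/bin/env bash
# Report whether Ubuntu has flagged a pending reboot.
# $1 = flag file path (defaults to /var/run/reboot-required).
reboot_status() {
  local flag=${1:-/var/run/reboot-required}
  if [ -f "$flag" ]; then
    echo "reboot required"
    # List the packages that requested the reboot, if the companion file exists.
    if [ -f "$flag.pkgs" ]; then
      sort -u "$flag.pkgs"
    fi
  else
    echo "no reboot required"
  fi
}

# Usage:
#   reboot_status
```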

Cloud and AWS Ubuntu quick reference

AWS Ubuntu baseline

Production EC2 Ubuntu baseline:
                            - official Ubuntu LTS AMI
                            - Canonical owner verified
                            - minimal security group
                            - SSH restricted by source or bastion
                            - IAM role instead of static keys
                            - cloud-init tested
                            - packages updated
                            - UFW aligned with security group
                            - monitoring installed
                            - snapshots scheduled
                            - restore tested
                            - tags complete
                            - instance replaceable

Canonical AMI owner

Canonical AWS owner ID:
                            099720109477

                            Use it to filter official Ubuntu AMIs.

AWS CLI AMI search

aws ec2 describe-images \
                            --owners 099720109477 \
                            --filters \
                            "Name=name,Values=ubuntu/images/hvm-ssd*/ubuntu-*-24.04-amd64-server-*" \
                            "Name=state,Values=available" \
                            "Name=architecture,Values=x86_64" \
                            --query 'Images | sort_by(@, &CreationDate)[-5:].{Name:Name,ImageId:ImageId,CreationDate:CreationDate}' \
                            --output table
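
In automation, a single AMI ID is usually more useful than a table. One way to get it, sketched under the assumption that the CLI is asked for `--output text` with `CreationDate` and `ImageId` as the two columns, is to sort the rows and keep the last one. `latest_ami_id` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Pick the newest image ID from lines of "CreationDate<TAB>ImageId" on stdin.
# ISO 8601 timestamps sort correctly as plain text, so a byte-wise sort is enough.
latest_ami_id() {
  LC_ALL=C sort | tail -n 1 | cut -f 2
}

# Usage (owner ID as above; requires configured AWS credentials and region):
#   aws ec2 describe-images \
#     --owners 099720109477 \
#     --filters "Name=name,Values=ubuntu/images/hvm-ssd*/ubuntu-*-24.04-amd64-server-*" \
#               "Name=state,Values=available" \
#     --query 'Images[].[CreationDate,ImageId]' \
#     --output text | latest_ami_id
```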

cloud-init quick pattern

#cloud-config
                            package_update: true
                            package_upgrade: true

                            timezone: UTC

                            packages:
                            - curl
                            - wget
                            - git
                            - htop
                            - ufw
                            - fail2ban
                            - nginx

                            runcmd:
                            - ufw allow OpenSSH
                            - ufw allow 80/tcp
                            - ufw allow 443/tcp
                            - ufw --force enable
                            - systemctl enable --now nginx
                            - systemctl enable --now fail2ban
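
cloud-init only treats user data as cloud-config when the first line is exactly `#cloud-config`, so a pre-flight check before launching an instance can save a debugging round. This is a minimal sketch; `has_cloud_config_header` is a hypothetical helper name and `user-data.yaml` a placeholder file name.

```shell
#!/usr/bin/env bash
# Return 0 only if the first line of the file is exactly "#cloud-config".
has_cloud_config_header() {
  [ "$(head -n 1 "$1")" = "#cloud-config" ]
}

# Usage (on a host with a recent cloud-init, the schema subcommand can also
# validate the keys; older releases expose it as `cloud-init devel schema`):
#   has_cloud_config_header user-data.yaml \
#     && cloud-init schema --config-file user-data.yaml
```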

Cloud-init diagnostics

cloud-init status
                            cloud-init status --wait
                            sudo tail -200 /var/log/cloud-init.log
                            sudo tail -200 /var/log/cloud-init-output.log

Official links

Resource	URL
Ubuntu downloads	`https://ubuntu.com/download`
Ubuntu documentation	`https://documentation.ubuntu.com/`
Ubuntu Server docs	`https://documentation.ubuntu.com/server/`
Ubuntu release cycle	`https://ubuntu.com/about/release-cycle`
Ubuntu releases	`https://releases.ubuntu.com/`
Ubuntu on AWS	`https://documentation.ubuntu.com/aws/`
AWS AMI concepts	`https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html`

Cloud rule: do not put long-lived secrets in user data or baked AMIs. Use IAM, Parameter Store or a proper secret manager.
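
A cheap guard for this rule is to scan user data for credential-looking strings before it is attached to an instance or committed. The patterns below are illustrative assumptions (AWS access key ID prefix, PEM private key header, common key names), not an exhaustive detector, and `scan_user_data` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print numbered lines of a user-data file that look like embedded secrets.
# Exit status follows grep: 0 if something suspicious was found, 1 if not.
scan_user_data() {
  grep -nE 'AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----|(password|secret|token)[[:space:]]*[:=]' "$1"
}

# Usage (user-data.yaml is a placeholder file name):
#   if scan_user_data user-data.yaml; then
#     echo "possible secret in user data, aborting" >&2
#     exit 1
#   fi
```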

Production server checklist and mini demo

Production readiness checklist

[System]
                            [ ] Ubuntu LTS selected
                            [ ] Hostname correct
                            [ ] Timezone configured
                            [ ] Packages updated
                            [ ] Reboot-required checked
                            [ ] Server role documented

                            [Security]
                            [ ] SSH keys only
                            [ ] Root SSH login disabled
                            [ ] UFW enabled
                            [ ] Only required ports open
                            [ ] Users and sudo controlled
                            [ ] Service users non-root
                            [ ] Secrets protected

                            [Operations]
                            [ ] systemd services enabled
                            [ ] Logs visible with journalctl
                            [ ] Monitoring installed
                            [ ] Alerts configured
                            [ ] Backups scheduled
                            [ ] Restore tested
                            [ ] Patch policy defined
                            [ ] Runbook written

                            [Cloud]
                            [ ] Official LTS image
                            [ ] Security groups minimal
                            [ ] IAM role least privilege
                            [ ] Snapshots configured
                            [ ] Tags complete
                            [ ] Replacement strategy documented

Mini demo for portfolio

Demo: production-minded Ubuntu EC2

                            Architecture:
                            Internet
                            │
                            ▼
                            Security Group
                            │
                            ├── 22/tcp from admin IP only
                            ├── 80/tcp public
                            └── 443/tcp public
                            │
                            ▼
                            Ubuntu LTS EC2
                            ├── cloud-init installs nginx
                            ├── UFW enabled
                            ├── fail2ban enabled
                            ├── logs checked
                            ├── metrics installed
                            └── backup snapshot configured

Mini demo validation commands

hostnamectl
                            cloud-init status
                            systemctl status nginx
                            sudo ufw status verbose
                            ss -lntp
                            curl -I http://localhost
                            journalctl -u nginx --since today
                            df -h
                            free -h
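
For a demo or a handover, the same checks read better as a pass/fail report. The sketch below runs each command quietly and summarises the result; `run_checks` is a hypothetical helper name, and the commands in the usage comment are one possible selection, not a fixed list.

```shell
#!/usr/bin/env bash
# Run every command given as an argument, print PASS or FAIL for each,
# and return the number of failed checks (0 means everything passed).
run_checks() {
  local failures=0 cmd
  for cmd in "$@"; do
    if bash -c "$cmd" >/dev/null 2>&1; then
      printf 'PASS  %s\n' "$cmd"
    else
      printf 'FAIL  %s\n' "$cmd"
      failures=$((failures + 1))
    fi
  done
  return "$failures"
}

# Usage:
#   run_checks \
#     "systemctl is-active --quiet nginx" \
#     "curl -fsSI http://localhost" \
#     "sudo ufw status | grep -q 'Status: active'"
```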

Final rule: a professional Ubuntu server is not just installed. It is secured, monitored, patched, backed up, documented and recoverable.