Project Oxygen & Ideo-Lab | IDEO LAB Dashboard 2026
IDEO·LAB

🐧 Ubuntu Linux: Complete Guide (Desktop / Server / Cloud / AWS)

Ubuntu = a "production-friendly" Linux distribution: stability, security, support, cloud, ecosystem. (IDEO-Lab category: O/S & Platforms)

1.1

Ubuntu: what is it?

Positioning, philosophy, reputation (desktop + server + cloud), and why it is a "pro" standard.

O/S & Platforms Linux Enterprise-ready
1.2

Versions & release cycle (LTS)

LTS vs interim, support, how to choose (prod/dev), examples of current versions.

LTS Release cycle Support
2.1

Installation (Desktop/Server)

ISO, partitioning, UEFI, SSH, cloud-init (server), "clean" post-install.

Install UEFI SSH
7.2

Software Management

Ubuntu App Center, .deb packages, Snap, Flatpak, PPA repositories, software sources and safe production usage.

App Center DEB PPA
2.2

Basic functions (CLI)

Files, users, permissions, services, logs, network, storage: the "sysadmin" kit.

Terminal Systemd Troubleshoot
3.1

Packages: APT & Snap

Repositories, pinning, updates, security, snaps, best practices (prod).

APT Snap Repos
7.3

Mastering the Terminal

BASH, fundamental commands, file navigation, sudo, permissions, chmod, chown and sysadmin reflexes.

BASH CLI Permissions
4.1

Security (hardening)

UFW, SSH, fail2ban, security updates, users/roles, audit & cloud best practices.

Security UFW SSH
4.2

Performance & robustness

Kernel, IO, memory, CPU, tuning, monitoring, why Ubuntu is "stable" in prod.

Perf Robust Monitoring
7.4

Maintenance & Security

System updates, UFW firewall, Timeshift restore points, logs, journald and safe maintenance routines.

Updates UFW Timeshift
5.1

Cloud & AWS (Ubuntu images)

Official AMIs, Canonical as owner, cloud-init, user data, SSH keys, EC2 patterns.

AWS EC2 cloud-init
5.2

Containers & Virtualization

Docker, LXD/LXC, KVM, virt-manager, use cases (CI/CD, lab, prod).

Docker LXD KVM
7.5

Customization & Optimization

GNOME extensions, themes, icons, keyboard shortcuts, battery management, swappiness and safe cleanup routines.

GNOME Themes Optimize
6.1

Troubleshooting (methodology)

systemd logs, journald, network, DNS, disk, boot, services: playbook.

Debug Logs Incidents
7.1

Cheat-Sheet Ubuntu

Essential commands + "prod server" checklists + cloud best practices.

Quick Checklist Ops
1.1 Ubuntu: definition, positioning, reputation, server, desktop, cloud and professional usage
Definition

Ubuntu is a Linux distribution maintained by Canonical. It is built on the Linux kernel and provides a complete operating system: package management, system services, security updates, networking, storage, user management, desktop environment, server tools and cloud images.

In professional environments, Ubuntu is popular because it is predictable, widely documented, cloud-friendly, developer-friendly and available in long-term support releases. It is commonly used for web servers, APIs, containers, DevOps tooling, CI/CD runners, databases, monitoring, AI workloads and desktop development.

Category: Operating system / Linux distribution
Vendor: Canonical
Kernel: Linux
Main strengths: LTS, packages, cloud, documentation
Common roles: server, desktop, cloud VM, container host
Professional value: DevOps, backend, SRE, infrastructure
Simple definition: Ubuntu is a production-friendly Linux operating system used to run applications, services, containers, databases, automation pipelines and developer workstations.
Where Ubuntu sits in the technology landscape
Layer | Ubuntu role | Examples
Hardware / VM | Runs on physical machines or virtual machines. | Server, laptop, AWS EC2, Azure VM, KVM.
Kernel | Uses the Linux kernel for process, memory, network and filesystem control. | scheduler, TCP/IP, ext4, drivers.
User space | Provides tools, libraries, shells and services. | bash, systemd, apt, ssh, journald.
Applications | Hosts business and infrastructure services. | Nginx, PostgreSQL, Redis, Docker, Django.
Operations | Provides operational surface for admins and DevOps. | logs, units, firewall, packages, users.
Mental classification
Ubuntu is not:
- a programming language
- a framework
- a database
- a cloud provider
- a container engine

Ubuntu is:
- an operating system
- a Linux distribution
- a server platform
- a desktop platform
- a cloud image baseline
- a container host
- a DevOps execution environment
Why Ubuntu became a professional standard

Ubuntu became a common professional choice because it offers a practical balance: easier than many traditional server distributions for newcomers, stable enough for production when using LTS, and supported by a huge ecosystem of packages, tutorials, cloud images and vendor documentation.

Reason | Professional impact | Concrete example
LTS releases | Stable baseline for servers and production workloads. | Choose one version and patch it for years.
Large package ecosystem | Fast installation of standard infrastructure tools. | apt install nginx postgresql redis
Cloud images | Quick deployment on public cloud providers. | EC2, Azure, GCP, OpenStack.
Documentation | Faster troubleshooting and onboarding. | Server docs, community docs, vendor guides.
Developer tooling | Good fit for Python, Node.js, Go, Java, Docker and CI/CD. | Local dev and production parity.
Enterprise support | Commercial support path exists if needed. | Canonical support, Ubuntu Pro, security services.
Professional value map
Ubuntu knowledge helps in:

Backend engineering
├── deploy APIs
├── manage services
├── inspect logs
└── debug network and permissions

DevOps
├── automate installs
├── configure systemd
├── harden SSH
├── manage packages
└── operate containers

SRE / Production
├── monitor CPU/RAM/disk/network
├── investigate incidents
├── patch security updates
├── tune services
└── write runbooks

Cloud engineering
├── boot cloud images
├── use cloud-init
├── configure storage
├── set firewall rules
└── deploy workloads
Recruiter-friendly summary
Strong positioning: Ubuntu skills show the ability to operate real services: SSH, systemd, networking, logs, permissions, packages, firewalling, hardening, scripting, Docker, Nginx, databases and cloud deployment.
Ubuntu operating system architecture
Applications / Services
├── nginx
├── postgres
├── redis
├── django
├── docker
└── monitoring agents
│
▼
User space
├── bash / shell
├── GNU tools
├── systemd
├── journald
├── apt / dpkg
├── ssh
└── libraries
│
▼
Linux kernel
├── process scheduler
├── memory management
├── filesystem layer
├── network stack
├── security modules
└── drivers
│
▼
Hardware / Hypervisor
├── CPU
├── RAM
├── disk
├── network card
├── KVM / VMware
└── cloud hypervisor
What each layer means in operations
Layer | Typical admin action | Diagnostic command
Application | Restart service, inspect config, read logs. | systemctl status nginx
User space | Install package, manage users, run scripts. | apt list --installed
Systemd | Enable boot services and dependencies. | journalctl -u service
Kernel | Check memory, processes, sockets, I/O. | dmesg, ss, top
Storage | Mount disks, inspect usage, tune I/O. | df -h, lsblk
Network | Check IP, routes, DNS, firewall. | ip a, ip r, resolvectl
Common mistake: debugging randomly. On Ubuntu, start with a layer: service, logs, process, network, storage, permissions, package, kernel.
Ubuntu Desktop, Server and Cloud
Edition / usage | Main purpose | Typical user | Key components
Ubuntu Desktop | Workstation, development, daily OS. | Developer, engineer, analyst. | GNOME, terminal, browser, IDEs, Docker.
Ubuntu Server | Production services and infrastructure. | DevOps, SRE, backend engineer. | SSH, systemd, apt, netplan, firewall.
Ubuntu Cloud Image | Cloud VM baseline. | Cloud engineer, platform team. | cloud-init, optimized kernel, cloud agent.
Ubuntu Container Base | Base image for containers. | DevOps, application engineer. | minimal packages, apt, runtime libraries.
Ubuntu Core | IoT and embedded-oriented variant. | IoT platform team. | snap-based, transactional updates.
Use-case examples
Desktop:
- Python development
- Docker-based local stack
- SSH into production servers
- Kubernetes and cloud CLI tools

Server:
- Nginx reverse proxy
- Django or Node.js API
- PostgreSQL or Redis host
- monitoring server
- VPN or bastion host

Cloud:
- EC2 instance
- Azure VM
- GCP Compute Engine
- OpenStack instance
- Kubernetes node
Edition decision tree
Need a local workstation?
└── Ubuntu Desktop

Need a production VM?
└── Ubuntu Server LTS

Need a cloud instance?
└── Ubuntu cloud image

Need a container base?
└── Ubuntu minimal/base image

Need IoT appliance-like OS?
└── Ubuntu Core

Need enterprise security extensions?
└── Ubuntu LTS + Ubuntu Pro
Practical distinction
Desktop vs Server: Desktop has a graphical environment and workstation tools. Server is lighter, usually SSH-only, and optimized for services. In production, Ubuntu Server LTS is the common baseline.
LTS model, releases and upgrade strategy

Ubuntu is frequently chosen in production because of the LTS model. LTS means long-term support: a stable base release used for servers, cloud images and enterprise deployments. Non-LTS releases are useful for newer software, but less common as a conservative production baseline.

Release type | Best for | Production recommendation
LTS | Servers, cloud, enterprise, long-lived systems. | Default choice for production.
Interim release | Newer packages, testing, short-lived environments. | Use only with clear upgrade discipline.
Rolling behavior | Not Ubuntu's main model. | Use another distro if rolling release is required.
Upgrade strategy
Safe production upgrade path:

1. Inventory servers and services
2. Confirm current Ubuntu version
3. Check application compatibility
4. Snapshot or backup
5. Test upgrade on staging
6. Review package changes
7. Schedule maintenance window
8. Upgrade one node first
9. Validate services and logs
10. Roll out progressively
11. Keep rollback plan ready
Version management commands
# Show Ubuntu version
lsb_release -a

# Show OS release file
cat /etc/os-release

# Show kernel version
uname -a

# Update package lists
sudo apt update

# Upgrade installed packages
sudo apt upgrade

# Full upgrade with dependency changes
sudo apt full-upgrade

# Check reboot requirement
test -f /var/run/reboot-required && cat /var/run/reboot-required
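For scripts, the version checks above can be wrapped in small helpers. This is a minimal sketch, not an official tool: the function names os_release_field and is_lts are hypothetical, and the LTS test assumes that the VERSION field of /etc/os-release contains the string "LTS" on long-term support releases.

```shell
# Hypothetical helper: print one field from an os-release style file.
# $1 = field name (for example VERSION_ID), $2 = file (default /etc/os-release)
os_release_field() {
    local field="$1" file="${2:-/etc/os-release}"
    # Lines look like KEY="value" or KEY=value: drop the key, then the quotes.
    sed -n "s/^${field}=//p" "$file" | head -n 1 | tr -d '"'
}

# Hypothetical helper: succeed when the release describes itself as LTS.
# Assumption: the VERSION field contains "LTS" on long-term support releases.
is_lts() {
    case "$(os_release_field VERSION "${1:-/etc/os-release}")" in
        *LTS*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

A possible use in a provisioning script is `is_lts || echo "warning: not an LTS release"`, run before any production-only steps.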
Release risk table
Risk | Cause | Control
Package incompatibility | Runtime or library version changes. | Test on staging before production.
Service restart failure | Config syntax or dependency change. | Validate configs before restart.
Kernel reboot required | Security kernel update. | Plan reboot window.
Repository mismatch | Third-party packages not ready. | Audit external repositories.
Ubuntu in enterprise and production

In enterprise environments, Ubuntu is used when teams need a stable Linux baseline with strong cloud support, broad package availability, automation compatibility and a known operational model. It is especially common for backend platforms, DevOps infrastructure, Kubernetes nodes, CI runners and cloud-hosted services.

Enterprise requirement | Ubuntu answer | Operational practice
Security patching | Regular package and kernel updates. | Patch windows and reboot strategy.
Repeatable deployment | Cloud images, apt, automation tools. | Ansible, Terraform, cloud-init.
Service supervision | systemd standard service manager. | Unit files, restart policy, journald logs.
Access control | Linux users, groups, sudo, SSH. | Least privilege and key-based access.
Observability | journald, syslog, metrics agents. | Central logging and monitoring.
Cloud integration | Images and cloud-init. | Bootstrap on first boot.
Production server lifecycle
Provision
│
├── select Ubuntu LTS image
├── configure cloud-init
├── attach disk
└── configure network
│
▼
Harden
│
├── SSH keys
├── disable root login
├── firewall
├── unattended upgrades policy
└── least-privilege users
│
▼
Deploy
│
├── install packages
├── configure services
├── systemd unit files
└── application release
│
▼
Operate
│
├── logs
├── metrics
├── backups
├── patching
└── incident response
Professional checklist
[ ] LTS release selected
[ ] SSH key access only
[ ] sudo policy controlled
[ ] firewall enabled
[ ] services managed by systemd
[ ] logs visible through journalctl
[ ] backups configured
[ ] monitoring installed
[ ] security updates planned
[ ] disk usage monitored
[ ] certificates tracked
[ ] rollback plan documented
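Some checklist items can be verified with a script. The sketch below covers only the SSH item: it reports the first uncommented value of two sshd_config keywords. The function name ssh_hardening_report is hypothetical, and the check is deliberately simplified: it reads a single file and ignores Include drop-ins and Match blocks, so `sudo sshd -T` remains the reference for the effective configuration.

```shell
# Hypothetical helper: report two hardening-related sshd_config keywords.
# $1 = config file (default /etc/ssh/sshd_config)
ssh_hardening_report() {
    local file="${1:-/etc/ssh/sshd_config}" key value
    for key in PasswordAuthentication PermitRootLogin; do
        # sshd uses the first value it reads for a keyword, so stop at the
        # first uncommented match. Keywords are case-insensitive.
        value=$(awk -v k="$key" 'tolower($1) == tolower(k) { print $2; exit }' "$file")
        printf '%s=%s\n' "$key" "${value:-unset}"
    done
}
```

A value of "unset" means the file does not set the keyword explicitly, so the compiled-in or drop-in default applies and should be checked separately.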
Core Ubuntu administration toolkit
Area | Tools | Typical command
Packages | apt, dpkg | sudo apt install nginx
Services | systemd, systemctl | sudo systemctl restart nginx
Logs | journalctl, syslog | journalctl -u nginx -f
Network | ip, ss, resolvectl, netplan | ss -lntp
Firewall | ufw, nftables | sudo ufw status verbose
Storage | df, du, lsblk, mount | df -h
Processes | ps, top, htop, kill | ps aux | grep nginx
Users | useradd, usermod, sudoers | sudo usermod -aG sudo user
First diagnostic commands
# System identity
hostnamectl
cat /etc/os-release
uptime

# CPU and memory
top
free -h

# Disk usage
df -h
du -sh /var/log/*

# Network
ip a
ip r
ss -lntp
resolvectl status

# Services
systemctl status nginx
journalctl -u nginx --since "30 min ago"

# Packages
apt policy nginx
dpkg -l | grep nginx

# Security
sudo ufw status verbose
sudo journalctl -u ssh --since today
Good operator habit: before changing anything, collect facts: service status, logs, listening ports, disk space, memory, recent package changes and firewall state.
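One way to make this habit repeatable is to capture the facts into a single file before touching anything. The following is a minimal sketch under stated assumptions: the function name collect_facts and the selection of commands are illustrative, and commands that are missing on the host are recorded as unavailable rather than treated as errors.

```shell
# Hypothetical helper: write a read-only snapshot of basic system facts.
# $1 = output file
collect_facts() {
    local out="$1" cmd
    : > "$out"                                   # start with an empty file
    for cmd in "uptime" "df -h" "free -h" "ss -lntp" "ip a"; do
        printf '### %s\n' "$cmd" >> "$out"
        # Only run the command if its binary exists on this host.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            $cmd >> "$out" 2>&1 || true          # keep going on failure
        else
            printf '(command not available)\n' >> "$out"
        fi
    done
}
```

For example, `collect_facts "/tmp/facts-$(date +%Y%m%d-%H%M%S).txt"` produces a timestamped snapshot that can be attached to an incident note.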
Typical Ubuntu production stacks
Stack | Components | Ubuntu role
Django / Python API | Nginx, Gunicorn, Django, PostgreSQL, Redis. | Host services, packages, systemd units, logs.
Node.js API | Nginx, Node.js, PM2/systemd, database. | Runtime host and reverse proxy.
Docker host | Docker Engine, Compose, images, volumes. | Container runtime platform.
Database server | PostgreSQL, MySQL, MariaDB, backups. | Storage, service control, tuning, logs.
Monitoring server | Prometheus, Grafana, Loki, exporters. | Observability host.
Bastion host | SSH gateway, audit, restricted access. | Secure entry point.
Django deployment example
Internet
│
▼
Nginx
│
├── TLS termination
├── static files
└── reverse proxy
│
▼
Gunicorn systemd service
│
▼
Django application
│
├── PostgreSQL
├── Redis
├── Celery workers
└── media/static storage
Example service units
Common systemd units:
- nginx.service
- postgresql.service
- redis-server.service
- docker.service
- gunicorn.service
- celery.service
- celerybeat.service
- prometheus-node-exporter.service

Typical commands:
sudo systemctl enable nginx
sudo systemctl restart gunicorn
sudo systemctl status redis-server
journalctl -u celery -f
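Units such as gunicorn.service are not shipped by a package and have to be written by the operator. The sketch below generates an example unit file with a restart policy. Everything specific in it is an assumption made for illustration: the function name write_example_unit, the deploy user, the /srv/app paths and the app.wsgi module are placeholders.

```shell
# Hypothetical helper: write an example Gunicorn unit file to the given path.
# $1 = target file, e.g. /etc/systemd/system/gunicorn.service (needs root)
write_example_unit() {
    local target="$1"
    cat > "$target" <<'EOF'
[Unit]
Description=Gunicorn application server (example)
After=network.target

[Service]
# Placeholder user, paths and WSGI module: adapt to the real deployment.
User=deploy
WorkingDirectory=/srv/app
ExecStart=/srv/app/venv/bin/gunicorn --bind 127.0.0.1:8000 app.wsgi:application
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}
```

After the file is placed under /etc/systemd/system/, `sudo systemctl daemon-reload` followed by `sudo systemctl enable --now gunicorn` would load and start it.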
Minimal web server setup flow
1. Create server
2. Update packages
3. Create deploy user
4. Configure SSH
5. Install Nginx
6. Install app runtime
7. Configure database
8. Create systemd service
9. Configure TLS
10. Enable firewall
11. Add monitoring
12. Add backup
13. Document runbook
Backend angle: Ubuntu is where application theory becomes production reality: processes, ports, permissions, logs, memory, disk and security.
Common risks, anti-patterns and production mistakes
Anti-pattern | Risk | Correction
Logging in as root directly | Weak audit and high blast radius. | Use named users, sudo and SSH keys.
Public SSH with passwords | Brute-force exposure. | Key-only SSH, firewall, fail2ban or VPN.
Ignoring package updates | Known vulnerabilities remain active. | Patch policy and reboot planning.
No service manager | App dies and does not restart. | Use systemd with restart policy.
No log strategy | Incidents are hard to diagnose. | Use journald, logrotate and central logs.
Manual untracked changes | Server becomes unreproducible. | Use automation and versioned configs.
No disk monitoring | Full disk causes outage. | Monitor filesystem usage and logs.
No rollback plan | Failed upgrade becomes long outage. | Snapshot, backup and tested restore path.
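As an illustration of the disk monitoring correction, the sketch below flags filesystems at or above a usage threshold. It is not a replacement for a monitoring agent. The function name filesystems_over is hypothetical, and the parser assumes the POSIX `df -P` column layout with mount points that contain no spaces.

```shell
# Hypothetical helper: read `df -P` output on stdin and print
# "<mount point> <use%>" for filesystems at or above the threshold.
# $1 = threshold in percent
filesystems_over() {
    awk -v t="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # "90%" -> "90"
        if (use + 0 >= t + 0) print $6, $5
    }'
}

# Usage: df -P | filesystems_over 80
```

Reading from stdin keeps the parsing separate from the data source, so the same function can be fed saved `df -P` output from another host.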
Incident diagnostic decision tree
Application is down
│
├── Is server reachable?
│       ├── no  -> network, firewall, cloud, DNS
│       └── yes
│
├── Is service running?
│       ├── no  -> systemctl status + journalctl
│       └── yes
│
├── Is port listening?
│       ├── no  -> config or bind failure
│       └── yes
│
├── Is reverse proxy healthy?
│       ├── no  -> nginx config/logs
│       └── yes
│
├── Is database reachable?
│       ├── no  -> DB service/network/auth
│       └── yes
│
└── Is app throwing errors?
        ├── yes -> application logs
        └── no  -> upstream routing/cache/client issue
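The ordered questions in this tree can be sketched as a small runner that executes named checks in sequence and stops at the first failure. This is an illustrative sketch: the function name first_failing_check and the name=command argument format are assumptions, and the checks in the usage comment are placeholders to adapt per service.

```shell
# Hypothetical helper: run "name=command" checks in order and print the
# name of the first check that fails (exit status 1), or a success line.
first_failing_check() {
    local entry name cmd
    for entry in "$@"; do
        name="${entry%%=*}"          # text before the first "="
        cmd="${entry#*=}"            # text after the first "="
        if ! sh -c "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$name"
            return 1
        fi
    done
    printf 'all checks passed\n'
}

# Usage (service name and URL are placeholders):
# first_failing_check \
#     "service=systemctl is-active --quiet nginx" \
#     "port=ss -lnt | grep -q ':80 '" \
#     "http=curl -fsS -o /dev/null http://localhost"
```

Stopping at the first failing check mirrors the tree: it points to the layer to investigate next instead of reporting every downstream symptom.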
First-response commands
systemctl status nginx
journalctl -u nginx --since "15 min ago"
ss -lntp
df -h
free -h
top
sudo ufw status
curl -I http://localhost
curl -I https://example.com
Production rule: never change several things at once during an incident. Observe, isolate, change one thing, verify, document.
Official links and useful references
Resource | URL | Usage
Ubuntu main site | https://ubuntu.com/ | Product overview and downloads.
Download Ubuntu | https://ubuntu.com/download | Desktop, server and cloud downloads.
Ubuntu documentation | https://documentation.ubuntu.com/ | Official documentation portal.
Ubuntu Server docs | https://documentation.ubuntu.com/server/ | Server administration reference.
Ubuntu releases | https://releases.ubuntu.com/ | Release images and versions.
Ubuntu packages | https://packages.ubuntu.com/ | Package lookup.
Ubuntu security notices | https://ubuntu.com/security/notices | Security update tracking.
Learning roadmap
Ubuntu learning path:

1. Shell basics
2. Filesystem and permissions
3. Users, groups and sudo
4. apt and packages
5. systemd services
6. journald and logs
7. networking and DNS
8. firewall and SSH hardening
9. storage and mounts
10. Nginx reverse proxy
11. database service operation
12. Docker host usage
13. backups and restore
14. monitoring and alerting
15. cloud-init and automation
One-line positioning
Ubuntu is a professional Linux platform for running and operating real systems: web servers, APIs, databases, CI/CD, containers, cloud workloads, monitoring and developer environments.
7.2 Ubuntu Software Management: App Center, DEB, Snap, Flatpak, PPAs and repository governance
Software management on Ubuntu

Ubuntu provides several ways to install software. The most important are: graphical installation through Ubuntu App Center, traditional Debian packages through APT and .deb files, Snap packages, Flatpak applications and third-party repositories such as PPAs.

The right method depends on the context. A desktop user may prefer App Center, Snap or Flatpak. A server administrator usually prefers APT and controlled repositories. A developer may use a vendor repository for Docker, PostgreSQL, Node.js or cloud tooling. A production team must control package origin, version, update policy and rollback.

Method | Best for | Strength | Risk
App Center | Desktop users and simple installs. | Easy graphical installation. | Less precise for production governance.
APT / DEB | Servers, system packages, standard tools. | Native Ubuntu package management. | Repository conflicts if unmanaged.
Snap | Sandboxed apps and some Canonical-supported tools. | Bundled dependencies and automatic refresh. | Refresh policy and confinement must be understood.
Flatpak | Desktop applications, especially cross-distro apps. | Good desktop app ecosystem. | Another runtime and update channel to govern.
PPA | Newer versions or community packages. | Access to versions not in official repos. | Trust, lifecycle and upgrade conflicts.
Vendor repo | Official upstream packages. | Best path for many professional tools. | Keys, pinning and repository ownership matter.
Core rule: for production servers, prefer official Ubuntu repositories or official vendor repositories. Use PPAs and manual .deb installs only with explicit justification.
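Applying this rule starts with knowing which repositories a server actually uses. The following sketch lists the active entries under an APT configuration directory. The function name list_apt_sources is hypothetical. It assumes the two common file layouts: classic one-line *.list files and deb822-style *.sources files, for which it prints only the URIs lines.

```shell
# Hypothetical helper: list active repository entries.
# $1 = APT configuration directory (default /etc/apt)
list_apt_sources() {
    local dir="${1:-/etc/apt}"
    # Classic one-line format: uncommented "deb" / "deb-src" lines.
    find "$dir" -type f -name '*.list' \
        -exec grep -hE '^deb(-src)? ' {} + 2>/dev/null || true
    # deb822 format: one "URIs:" line per stanza.
    find "$dir" -type f -name '*.sources' \
        -exec grep -hE '^URIs:' {} + 2>/dev/null || true
    return 0
}
```

Any entry that points to neither an Ubuntu archive nor a documented vendor repository is a candidate for review, for example a PPA added during debugging.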
Software source map
Ubuntu software sources
│
├── App Center
│       ├── graphical install
│       ├── desktop apps
│       └── simple discovery
│
├── APT repositories
│       ├── official Ubuntu repos
│       ├── security updates
│       ├── vendor repos
│       └── PPAs
│
├── Local DEB files
│       ├── downloaded installer
│       ├── vendor package
│       └── manual install
│
├── Snap
│       ├── snap store
│       ├── channels
│       ├── sandbox
│       └── auto refresh
│
└── Flatpak
        ├── Flathub
        ├── desktop app runtimes
        ├── sandbox permissions
        └── user-level installs
Decision shortcut
Need a server package?
└── APT from Ubuntu or official vendor repository

Need a desktop application?
└── App Center, Snap or Flatpak

Need a newer application version?
├── check official vendor repo first
├── then consider PPA
└── document the reason

Need a one-off local installer?
└── .deb file with verified source

Need strict production reproducibility?
└── APT + pinned repositories + automation
Ubuntu App Center: simplified graphical installation

Ubuntu App Center is the graphical software interface on Ubuntu Desktop. It is designed for easy discovery, installation and removal of common applications. It is convenient for desktop workflows, but it is not the primary tool for server automation or strict production package governance.

Use case | App Center fit | Comment
Install browser, editor, media tool | Excellent. | Simple desktop workflow.
Discover common applications | Excellent. | Good for non-terminal users.
Install developer desktop tools | Good. | Check whether package is Snap or DEB.
Production server package | Poor fit. | Use APT, automation or vendor repo.
Fleet management | Poor fit. | Use Ansible, cloud-init, image build or MDM.
Typical App Center flow
Ubuntu Desktop
│
├── Open App Center
├── Search application
├── Review publisher and package type
├── Click Install
├── Authenticate if required
├── Launch application
└── Update through system update flow
Desktop rule: App Center is excellent for user convenience, but administrators should still understand which packaging technology is actually used behind the install.
What to verify before installing
Before installing a desktop app:
[ ] Is the publisher trusted?
[ ] Is it a Snap, DEB or Flatpak package?
[ ] Is the app maintained?
[ ] Does it need sensitive permissions?
[ ] Is there an official vendor package?
[ ] Is it needed system-wide or only for one user?
[ ] Is it appropriate for a professional workstation?
Graphical vs CLI management
Approach | Strength | Weakness
App Center | Easy, visual, good for desktop users. | Less scriptable and less auditable.
APT CLI | Scriptable, auditable, server-friendly. | Requires terminal knowledge.
Snap CLI | Precise Snap control. | Requires understanding channels and refresh.
Flatpak CLI | Good app and permission control. | Separate ecosystem and runtimes.
Useful desktop package checks
# Show installed Snap packages
snap list

# Show installed DEB packages
dpkg -l | less

# Search APT package
apt search package-name

# Show package origin
apt policy package-name

# Show Flatpak apps if installed
flatpak list
Governance warning: easy installation does not mean safe installation. Always check the publisher, update channel, permissions and package origin.
DEB packages and APT: native Ubuntu package management

Ubuntu is based on Debian packaging. A .deb file is a Debian package. APT is the higher-level tool that downloads packages from repositories, resolves dependencies, installs upgrades and tracks package versions.

Concept | Meaning | Command
.deb | Local Debian package file. | sudo apt install ./file.deb
apt | High-level package manager. | sudo apt install nginx
dpkg | Low-level package tool. | dpkg -l
Repository | Package source. | /etc/apt/sources.list.d/
Dependency | Package required by another package. | Resolved by APT.
Candidate version | Version APT would install. | apt policy package
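The candidate version is easiest to understand by comparing it with the installed version. The sketch below assumes a hypothetical helper, `policy_summary`, that reads the text printed by `apt-cache policy <package>` on standard input and summarizes the two fields. It assumes the English `Installed:` and `Candidate:` labels, so run the command with `LC_ALL=C` on localized systems.

```shell
# Hypothetical helper: read "apt-cache policy <pkg>" text on stdin and
# report how the installed version relates to the candidate version.
policy_summary() {
    awk '
        /^  Installed:/ { installed = $2 }
        /^  Candidate:/ { candidate = $2 }
        END {
            if (installed == "" && candidate == "") {
                print "no policy data"
                exit 1
            }
            if (installed == "(none)")       state = "not installed"
            else if (installed == candidate) state = "up to date"
            else                             state = "candidate differs"
            printf "installed=%s candidate=%s state=%s\n", installed, candidate, state
        }
    '
}

# Usage on a real system:
#   LC_ALL=C apt-cache policy nginx | policy_summary
```

A `candidate differs` result is the cue to look at the version table in the full output, because a newly added repository or a pin may be what changed the candidate.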
APT essentials
# Update package metadata
sudo apt update

# Install package from repository
sudo apt install nginx

# Install local DEB file with dependency resolution
sudo apt install ./package.deb

# Remove package but keep config
sudo apt remove package-name

# Remove package and config
sudo apt purge package-name

# Upgrade packages
sudo apt upgrade

# Show package details
apt show package-name

# Show installed and candidate versions
apt policy package-name
DEB installation flow
Local DEB file
│
├── verify source
├── check vendor signature or checksum if available
├── install with apt
│       └── sudo apt install ./package.deb
├── inspect installed package
│       └── dpkg -l | grep package
├── verify service or binary
└── document install source
Package inspection commands
# List installed packages
dpkg -l

# Filter installed packages
dpkg -l | grep nginx

# Show package status
dpkg -s nginx

# Show files installed by package
dpkg -L nginx

# Find package owning a file
dpkg -S /usr/sbin/nginx

# Show APT history
less /var/log/apt/history.log

# Show available versions
apt-cache madison nginx
Production DEB rules
Do:
- prefer repository installation over random downloads
- use official vendor DEB if needed
- keep package source documented
- automate installation in scripts or Ansible
- review apt history after changes

Avoid:
- random DEB files from unknown sites
- manual installs without documentation
- local DEB files with no update path
- mixing multiple competing repositories
- installing critical server packages from untrusted sources
Server rule: APT and DEB are the default professional path for Ubuntu server software because they integrate with repositories, updates, logs and automation.
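Automating installation in scripts usually means making the install step idempotent: compare a documented baseline with what dpkg reports, and call APT only for the difference. The following is a rough sketch under that assumption. `missing_packages` and `ensure_installed` are hypothetical helper names, not part of APT.

```shell
# Hypothetical helper: print the packages from a baseline list (arguments)
# that dpkg does not report as fully installed.
missing_packages() {
    for pkg in "$@"; do
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null || true)
        case $status in
            *"install ok installed"*) ;;      # already present
            *) echo "$pkg" ;;                 # absent, or only config files left
        esac
    done
}

# Idempotent install: only call apt-get when something is actually missing.
ensure_installed() {
    missing=$(missing_packages "$@")
    if [ -z "$missing" ]; then
        echo "baseline satisfied"
        return 0
    fi
    # Word splitting of the newline-separated list is intended here.
    echo "installing:" $missing
    sudo apt-get install -y $missing
}

# Usage sketch:
#   ensure_installed nginx curl ca-certificates
```

Running the same call twice should install on the first run and print `baseline satisfied` on the second, which is the property that makes such a script safe to re-run from cloud-init or a provisioning job.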
Snap packages: bundled apps, channels, confinement and refresh

Snap packages bundle applications with their dependencies and run with a confinement model. They are distributed through the Snap ecosystem and can use channels such as stable, candidate, beta or edge. Snap refresh behavior is important because packages can update automatically.

Snap concept | Meaning | Operational impact
Channel | Release track. | Stable is safer than beta or edge.
Revision | Specific build of a Snap. | Can support revert to previous revision.
Confinement | Sandbox permissions. | May restrict filesystem/device access.
Interface | Permission connection. | May require manual connection.
Refresh | Update mechanism. | Needs maintenance policy for servers.
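Because the channel determines how risky a refresh is, it can help to audit which installed snaps track something other than stable. The sketch below assumes a hypothetical helper, `non_stable_snaps`, that reads the output of `snap list` on standard input and uses the fourth column (Tracking). Locally installed snaps that show `-` in that column are reported as well.

```shell
# Hypothetical helper: read "snap list" output on stdin and print snaps
# whose tracking channel does not contain the "stable" risk level.
non_stable_snaps() {
    # Columns: Name Version Rev Tracking Publisher Notes (header on line 1).
    # "stable" must appear as a whole path component, e.g. latest/stable
    # or latest/stable/branch, but not as part of another word.
    awk 'NR > 1 && $4 !~ /(^|\/)stable(\/|$)/ { print $1, $4 }'
}

# Usage on a real system:
#   snap list | non_stable_snaps
```

An empty result means every snap follows a stable channel. Any line printed is a candidate for the refresh and rollback policy that a server component needs.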
Snap commands
# List installed snaps
snap list

# Search for app
snap find package-name

# Show package info
snap info package-name

# Install stable channel
sudo snap install package-name --channel=stable

# Refresh snaps
sudo snap refresh

# Show refresh schedule
snap refresh --time

# Remove snap
sudo snap remove package-name
Snap operations
# Show changes
snap changes

# Show connections
snap connections package-name

# Connect interface
sudo snap connect package-name:interface

# Revert to previous revision if available
sudo snap revert package-name

# Hold refresh temporarily
sudo snap refresh --hold=24h package-name

# Logs for snap service
snap logs package-name
Snap decision tree
Considering Snap?
│
├── Desktop application?
│       └── often acceptable
│
├── Server daemon?
│       ├── check refresh policy
│       ├── check confinement
│       ├── check logs
│       └── check rollback
│
├── Need strict package timing?
│       └── prefer APT or control refresh window
│
└── Need sandboxed app delivery?
        └── Snap can be a good fit
Snap strengths and cautions
Strength | Caution
Bundled dependencies. | More disk usage than native package in some cases.
Simple install path. | Refresh behavior must be understood.
Sandbox confinement. | Permissions may surprise users or services.
Channels and revert. | Wrong channel can increase instability.
Production warning: before using Snap for a server component, define refresh policy, rollback, monitoring and service ownership.
Flatpak: desktop application distribution and Flathub ecosystem

Flatpak is a cross-distribution packaging system often used for desktop applications. Applications run in a sandbox and rely on shared runtimes. Flatpak is especially common when users want recent desktop applications independently of the version shipped in the system repositories.

Flatpak concept | Meaning | Operational note
Remote | Package source. | Flathub is the common public remote.
Runtime | Shared dependency platform. | Required by Flatpak apps.
Application ID | Unique app identifier. | Example: org.gimp.GIMP.
Sandbox | Permission model. | Filesystem and device access can be restricted.
User install | Install for one user. | Useful on shared desktops.
System install | Install for all users. | Requires admin privileges.
Install Flatpak support
# Install Flatpak
sudo apt update
sudo apt install flatpak

# Add Flathub remote
flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo

# Search app
flatpak search gimp

# Install app
flatpak install flathub org.gimp.GIMP

# Run app
flatpak run org.gimp.GIMP
Flatpak operations
# List installed apps
flatpak list

# List remotes
flatpak remotes

# Update apps
flatpak update

# Show app info
flatpak info org.gimp.GIMP

# Uninstall app
flatpak uninstall org.gimp.GIMP

# Remove unused runtimes
flatpak uninstall --unused

# Show app permissions
flatpak info --show-permissions org.gimp.GIMP
Flatpak fit
Context | Flatpak fit | Comment
Desktop apps | Strong. | Especially when recent versions matter.
Server daemons | Weak. | APT or vendor repo is usually better.
Developer workstation | Good. | Useful for GUI tools.
Production fleet | Limited. | Needs desktop app governance.
Flatpak vs Snap mental model
Snap:
- integrated by default on Ubuntu
- used for desktop apps and selected system tools
- has channels and refresh behavior

Flatpak:
- popular for cross-distro desktop apps
- commonly uses Flathub
- strong desktop application ecosystem
- often installed separately on Ubuntu
Flatpak rule: good for desktop applications, not the default path for production server packages.
PPA repositories: newer software versions and controlled exceptions

A PPA (Personal Package Archive) is a third-party APT repository hosted on Launchpad. PPAs are useful when the official Ubuntu repository does not provide the needed version, but each one is a trust decision: adding a PPA can change package candidates, dependencies and upgrade behavior.

PPA use case | Good reason? | Production caution
Need newer desktop app | Sometimes. | Check maintainer and update history.
Need newer dev tool | Sometimes. | Prefer official vendor repo when available.
Need critical server package | Rarely. | Use official Ubuntu or vendor repo if possible.
Random tutorial says add PPA | No. | Understand why before adding.
Temporary test machine | Acceptable. | Disposable environment lowers risk.
PPA commands
# Install helper if needed
sudo apt install software-properties-common

# Add PPA
sudo add-apt-repository ppa:owner/name

# Update metadata
sudo apt update

# Install package
sudo apt install package-name

# Show package origin and candidate
apt policy package-name

# Remove PPA source
sudo add-apt-repository --remove ppa:owner/name
PPA governance flow
Need a PPA?
│
├── Is package available in official Ubuntu repo?
│       ├── yes -> prefer official repo
│       └── no
│
├── Is there an official vendor repository?
│       ├── yes -> prefer vendor repo
│       └── no
│
├── Is PPA trusted and maintained?
│       ├── no -> reject
│       └── yes
│
├── Is this production?
│       ├── yes -> document and test in staging
│       └── no -> acceptable for lab if understood
│
└── Add with owner, reason and review date
Inspect repository sources
# Source list files
ls -lah /etc/apt/sources.list.d/

# Search active deb lines (classic one-line format)
grep -R "^deb" /etc/apt/sources.list /etc/apt/sources.list.d/

# Search deb822-style sources (.sources files, used by recent Ubuntu releases)
grep -R "^URIs:" /etc/apt/sources.list.d/

# Show package candidate and priorities
apt policy package-name

# Show all versions
apt-cache madison package-name

# Recent repository changes
sudo find /etc/apt -type f -mtime -30 -ls
PPA risk matrix
Risk | Cause | Control
Wrong package version selected | PPA has higher candidate version. | Check apt policy.
Upgrade conflict | PPA dependencies diverge. | Test in staging.
Abandoned package | Maintainer stops updates. | Review regularly.
Supply-chain concern | Untrusted publisher. | Prefer official source.
PPA rule: a PPA is not just a package. It is a new repository that may influence dependency resolution across the system.
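One way to limit that influence is an APT preferences entry that gives the extra repository a lower priority than the default of 500, so its packages are only chosen when no regular repository offers them or when they are requested explicitly. The sketch below is an illustration under stated assumptions: `pin_stanza` is a hypothetical helper, and the origin string must be read from the `o=` field that `apt-cache policy` prints for the repository rather than guessed.

```shell
# Hypothetical helper: print an APT preferences stanza that sets the
# priority of every package from one repository origin.
# ORIGIN is the "o=" value shown by "apt-cache policy" for that source.
pin_stanza() {
    origin=$1
    priority=${2:-400}   # below the default 500 of regular repositories
    printf 'Package: *\nPin: release o=%s\nPin-Priority: %s\n' "$origin" "$priority"
}

# Usage sketch (origin string and file name are illustrative assumptions):
#   pin_stanza LP-PPA-owner-name | sudo tee /etc/apt/preferences.d/ppa-owner-name
#   apt-cache policy package-name   # confirm the priorities changed as intended
```

After writing the file, re-running `apt policy package-name` is the check that matters: the version table should show the lowered priority next to the PPA lines.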
DEB vs Snap vs Flatpak vs PPA: practical comparison
Criterion | DEB / APT | Snap | Flatpak | PPA
Best target | Server and system packages. | Desktop apps and selected tools. | Desktop apps. | Newer APT packages.
Dependency model | System dependencies. | Bundled dependencies. | Runtimes and bundled app parts. | APT dependencies from repo.
Update model | APT updates. | Snap refresh. | Flatpak update. | APT updates from PPA.
Sandboxing | Usually no app sandbox. | Confinement model. | Sandbox model. | Same as APT package.
Production servers | Best default. | Case-by-case. | Usually no. | Exception only.
Desktop apps | Good. | Good. | Good. | Sometimes.
Governance complexity | Medium. | Medium. | Medium. | High if unmanaged.
Choice diagram
Choose package format
│
├── Is this a production server dependency?
│       ├── yes -> APT / DEB / official vendor repo
│       └── no
│
├── Is this a desktop GUI app?
│       ├── yes -> App Center, Snap or Flatpak
│       └── no
│
├── Do you need newest upstream version?
│       ├── yes -> official vendor repo first
│       ├── then PPA if trusted
│       └── document exception
│
├── Do you need sandboxed desktop app?
│       ├── yes -> Snap or Flatpak
│       └── no
│
└── Need reproducible fleet?
        └── automate APT and pin sources
Use-case recommendations
Use case | Preferred option | Reason
Nginx on server | APT. | Native service integration.
PostgreSQL production | Ubuntu repo or official PostgreSQL repo. | Clear lifecycle and updates.
Docker Engine | Official Docker repo or Ubuntu package by policy. | Version and support clarity.
Desktop editor | App Center, Snap, DEB or vendor repo. | Depends on vendor support.
Graphic design app | Flatpak or Snap often acceptable. | Desktop app freshness.
Simple rule: server software should be boring and governed. Desktop software can be more flexible if publisher and permissions are understood.
Security, provenance and update governance

Software installation is a supply-chain decision. Every package source can install code with user or system privileges. Good governance means knowing where software comes from, how it updates, who maintains it and how to roll back when it breaks.

Risk | Example | Control
Untrusted publisher | Random DEB or PPA. | Use official source or trusted vendor.
Unexpected updates | Snap refresh, PPA version change. | Control channels, windows and policy.
Dependency conflict | PPA overrides Ubuntu package. | Check apt policy and pin if needed.
Abandoned package | No security patches. | Review source health.
Secret exposure | Install script writes credentials. | Inspect scripts, avoid long-lived secrets.
No rollback path | Manual install with no version record. | Document package version and source.
Pre-install security checklist
[ ] Is the source official?
[ ] Is the publisher trusted?
[ ] Is the package maintained?
[ ] Is the update mechanism known?
[ ] Is the package type known?
[ ] Is the installation reversible?
[ ] Is the version documented?
[ ] Does it add a repository?
[ ] Does it add a signing key?
[ ] Does it run a script as root?
[ ] Does it request sensitive permissions?
[ ] Is it approved for production?
Install script warning pattern
Risky pattern:
curl https://example.com/install.sh | sudo bash

Safer pattern:
1. Download script
2. Inspect script
3. Verify source
4. Verify checksum or signature if available
5. Run intentionally
6. Record package source
7. Test in staging first
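The safer pattern can be partly enforced in code by refusing to execute a downloaded script unless its checksum matches the value published by the vendor. The following is a minimal sketch, assuming a hypothetical helper named `run_if_verified` and a vendor that publishes a SHA-256 value. The URL and checksum in the usage comment are placeholders. The helper deliberately runs the script with `sh` and without `sudo`, so any privilege escalation stays an explicit decision.

```shell
# Hypothetical helper: run a downloaded script only when its SHA-256
# matches the expected value; otherwise refuse and return an error.
run_if_verified() {
    script=$1
    expected=$2
    actual=$(sha256sum "$script" | cut -d' ' -f1)
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $script: refusing to run" >&2
        return 1
    fi
    sh "$script"
}

# Usage sketch (URL and checksum are placeholders, not real values):
#   curl -fsSLo install.sh https://example.com/install.sh
#   less install.sh                       # inspect before running
#   run_if_verified install.sh "<sha256 published by the vendor>"
```

A checksum only proves the file is the one the publisher described. It does not replace reading the script or testing it in staging first.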
Repository audit commands
# Show APT sources (classic one-line format)
grep -R "^deb" /etc/apt/sources.list /etc/apt/sources.list.d/

# Show APT sources (deb822 .sources files, used by recent Ubuntu releases)
grep -R "^URIs:" /etc/apt/sources.list.d/

# List source files
ls -lah /etc/apt/sources.list.d/

# Show package origin
apt policy package-name

# Show installed Snap packages
snap list

# Show Flatpak remotes and apps
flatpak remotes
flatpak list

# Show recent package operations
less /var/log/apt/history.log
Production software governance
Governance record:
- package name
- package type
- source repository
- publisher
- installed version
- update policy
- rollback method
- owner
- reason
- review date
Supply-chain rule: package installation is code execution. Treat unknown packages as a security risk, not as a convenience.
Troubleshooting software installation and updates
Symptom | Likely cause | First command | Fix direction
APT lock error | Another apt/dpkg process running. | ps -C apt,apt-get,dpkg -o pid,cmd | Wait or investigate process.
Broken packages | Interrupted install or dependency conflict. | sudo dpkg --configure -a | Repair dpkg and dependencies.
Repository signature error | Missing or wrong signing key. | sudo apt update | Fix keyring or remove repo.
Package version unexpected | PPA or vendor repo changes candidate. | apt policy package-name | Pin, remove repo or choose version.
Snap app cannot access file | Confinement or interface issue. | snap connections app | Connect interface or adjust path.
Flatpak app missing permission | Sandbox permission. | flatpak info --show-permissions app | Adjust permission intentionally.
APT repair commands
# Repair interrupted package configuration
sudo dpkg --configure -a

# Fix broken dependencies
sudo apt -f install

# Refresh metadata
sudo apt update

# Clean package cache
sudo apt clean

# Check holds
apt-mark showhold

# Review package history
less /var/log/apt/history.log
less /var/log/apt/term.log
Package troubleshooting decision tree
Software install failed
│
├── Read exact error
│
├── APT lock?
│       └── check apt/dpkg process
│
├── DNS or network?
│       └── resolvectl, dig, curl
│
├── Signature or key?
│       └── inspect source and keyring
│
├── Dependency conflict?
│       └── apt policy, apt -f install, holds
│
├── PPA conflict?
│       └── disable source, apt update
│
├── Snap confinement?
│       └── snap connections
│
└── Flatpak permission?
        └── flatpak info --show-permissions
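The APT branches of the decision tree can be sketched as a small triage function that maps the error text to the first diagnostic step. This is an illustration only: `triage_apt_error` is a hypothetical helper, and the matched phrases are common fragments of English APT error messages, not an exhaustive list.

```shell
# Hypothetical helper: read an APT error message on stdin and print the
# matching first step from the decision tree.
triage_apt_error() {
    msg=$(cat)
    case $msg in
        *"Could not get lock"*)
            echo "lock: check running apt/dpkg processes" ;;
        *"Temporary failure resolving"*)
            echo "network: check DNS with resolvectl, dig, curl" ;;
        *NO_PUBKEY*|*"is not signed"*)
            echo "signature: inspect source and keyring" ;;
        *"unmet dependencies"*)
            echo "dependency: apt policy, apt -f install, holds" ;;
        *)
            echo "unknown: read the exact error" ;;
    esac
}

# Usage sketch:
#   sudo apt-get update 2>&1 | triage_apt_error
```

The fallback branch matters as much as the others: when no known phrase matches, the exact message is still the best starting point.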
Disable source temporarily
# Disable a repository source file
sudo mv /etc/apt/sources.list.d/vendor.list \
    /etc/apt/sources.list.d/vendor.list.disabled

# Refresh metadata
sudo apt update

# Check package candidate again
apt policy package-name
Snap and Flatpak diagnostics
# Snap
snap list
snap info package-name
snap changes
snap connections package-name
snap logs package-name

# Flatpak
flatpak list
flatpak remotes
flatpak info app-id
flatpak info --show-permissions app-id
flatpak update
Troubleshooting rule: package problems usually have a precise error. Read it before applying repair commands.
Final checklist and command cheat sheet
Software management checklist
[ ] Package type is understood
[ ] Source is trusted
[ ] Publisher is verified
[ ] Update mechanism is known
[ ] Rollback path exists
[ ] Repository additions are documented
[ ] PPAs are justified
[ ] Vendor repos are preferred over random PPAs
[ ] Local DEB files are avoided unless necessary
[ ] Snap refresh behavior is understood
[ ] Flatpak remotes are known
[ ] Production servers use governed sources
[ ] Package changes are traceable
[ ] Staging test exists for critical software
[ ] Security updates are planned
APT / DEB cheat sheet
sudo apt update
sudo apt install package-name
sudo apt install ./package.deb
sudo apt remove package-name
sudo apt purge package-name
sudo apt upgrade
apt search package-name
apt show package-name
apt policy package-name
dpkg -l | grep package-name
dpkg -L package-name
dpkg -S /path/to/file
less /var/log/apt/history.log
Snap / Flatpak / PPA cheat sheet
# Snap
snap list
snap find package-name
snap info package-name
sudo snap install package-name
sudo snap refresh
snap refresh --time
sudo snap remove package-name

# Flatpak
flatpak remotes
flatpak search package-name
flatpak install flathub app-id
flatpak run app-id
flatpak update
flatpak uninstall app-id

# PPA
sudo apt install software-properties-common
sudo add-apt-repository ppa:owner/name
sudo apt update
apt policy package-name
sudo add-apt-repository --remove ppa:owner/name
Final rule
Ubuntu software management is source governance.
Installing software means trusting a publisher, an update channel and a dependency chain. Use App Center for simple desktop workflows, APT and DEB for professional server management, Snap or Flatpak for selected desktop/application cases, and PPAs only as controlled exceptions.
Production default
Production software default:
- Ubuntu LTS
- official Ubuntu repositories
- official vendor repositories when needed
- no random PPAs
- no unknown DEB downloads
- package baseline automated
- update policy documented
- rollback path tested
- package history reviewed after changes
1.2 Ubuntu Versions & Release Cycle: LTS, interim, support, upgrade strategy and production choice
Ubuntu release model

Ubuntu follows a predictable release model with two main families: LTS releases and interim releases. LTS means Long-Term Support and is the default choice for production systems. Interim releases provide newer software faster, but with a much shorter support window.

In professional environments, version choice is not cosmetic. It impacts security patching, kernel behavior, package versions, compatibility, cloud images, automation, compliance, upgrade windows and rollback strategy.

Release type | Typical cadence | Support model | Best usage
LTS | Every 2 years (April, even-numbered years) | 5 years of standard support, production-oriented. | Servers, cloud, enterprise, databases, Kubernetes nodes.
Interim | Every 6 months, between LTS releases | 9 months of support. | Testing, newer kernels, recent desktop features, short-lived dev systems.
Point release | LTS refresh images | Updated installer media for the same LTS family. | Fresh installs with fewer post-install updates.
ESM / Ubuntu Pro | After standard support or for broader package coverage | Extended security maintenance model. | Long-lived enterprise systems that cannot upgrade quickly.
Simple rule: for production, choose an Ubuntu LTS baseline unless there is a very specific reason to use an interim release.
Release cycle mental diagram
Ubuntu release cycle
│
├── LTS release
│       ├── stable baseline
│       ├── long security maintenance
│       ├── enterprise-friendly
│       ├── common cloud image
│       └── recommended for production
│
├── Interim release
│       ├── newer kernel
│       ├── newer user-space
│       ├── shorter support
│       ├── useful for testing
│       └── requires upgrade discipline
│
├── Point release
│       ├── refreshed installer image
│       ├── accumulated updates
│       └── useful for new deployments
│
└── ESM / extended support
        ├── longer security coverage
        ├── used when upgrade is delayed
        └── enterprise lifecycle tool
What version choice affects
Version choice affects:
- kernel version
- driver support
- OpenSSL version
- Python / PHP / Node packages
- systemd behavior
- Netplan / network stack
- cloud-init behavior
- container runtime support
- security patch horizon
- application certification
- upgrade planning
- operational risk
LTS vs interim: practical comparison
Criterion | LTS | Interim
Primary goal | Stability and long-term operation. | Newer features and faster evolution.
Production fit | Excellent default choice. | Only if justified and actively managed.
Security maintenance | Long support window. | Short support window.
Kernel freshness | Stable, sometimes less recent. | More recent.
Package freshness | Conservative. | Newer versions.
Operational burden | Lower. | Higher, because upgrades are frequent.
Cloud image standardization | Excellent. | Less common for long-lived fleets.
Best for | Servers, DBs, APIs, CI runners, cloud nodes. | Labs, test machines, recent hardware, feature validation.
Decision shortcut
Choose LTS when:
- server is production
- database is production
- uptime matters
- patching must be predictable
- infrastructure must be standardized
- cloud images are reused
- upgrade windows are rare
- compliance matters

Choose interim when:
- testing newer kernel
- testing new desktop stack
- testing new hardware support
- environment is disposable
- upgrade cadence is accepted
- production risk is low
Common professional rule
LTS-first policy: in enterprise, servers usually standardize on one LTS version. Interim releases are exceptions that must be justified, documented and upgraded before support ends.
Bad decision examples
Bad:
                            - installing an interim release on a long-lived database server
                            - using different Ubuntu versions randomly across servers
                            - upgrading production without staging validation
                            - ignoring end-of-support dates
                            - choosing a release because it is "newer" only

                            Better:
                            - define one production LTS baseline
                            - define patch cadence
                            - define upgrade window
                            - keep rollback images
                            - document exceptions
Current release examples and how to read them

Ubuntu versions use a year.month format. For example, 24.04 means a release from April 2024. LTS releases are usually April releases in even-numbered years. Point releases such as 24.04.4 are refreshed installation images for the same LTS family.

Example | Meaning | Use case | What to remember
24.04 LTS | Noble Numbat LTS family. | Production baseline, servers, cloud. | Long-term support release.
24.04.x LTS | Point release inside the 24.04 LTS family. | Fresh install image with accumulated updates. | Still same LTS generation.
25.10 | Interim release. | Short-lived dev/test or recent features. | Requires faster upgrade planning.
22.04 LTS | Previous LTS generation. | Existing production fleets. | Plan migration before support constraints become urgent.
20.04 LTS | Older LTS generation. | Legacy systems. | Often requires ESM/Pro or migration plan.
Practical reading: 24.04.4 LTS means the 4th point-release image of Ubuntu 24.04 LTS, not a completely different major OS generation.
Version naming pattern
Ubuntu version format:
                            YY.MM

                            Examples:
                            22.04 = April 2022
                            24.04 = April 2024
                            25.10 = October 2025

                            LTS examples:
                            20.04 LTS
                            22.04 LTS
                            24.04 LTS

                            Point release examples:
                            22.04.5 LTS
                            24.04.3 LTS
                            24.04.4 LTS

                            Meaning:
                            major LTS family + refreshed installer media
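The naming pattern above can be read mechanically. The following is a minimal sketch, not an official tool: the function name `describe_ubuntu_version` is hypothetical, and the LTS test (April release in an even-numbered year) is only the usual pattern, so the result is labelled a candidate rather than a fact.

```shell
#!/bin/sh
# Sketch: interpret a version string of the form YY.MM or YY.MM.P.
# Assumption: "April + even year" means LTS, which is the usual rule only.
describe_ubuntu_version() {
    version=$1
    yy=${version%%.*}            # text before the first dot, e.g. 24
    rest=${version#*.}           # text after the first dot, e.g. 04.4
    mm=${rest%%.*}               # month part, e.g. 04
    case $rest in
        *.*) point=${rest#*.} ;; # point-release number, e.g. 4
        *)   point="" ;;
    esac

    kind="interim"
    if [ "$mm" = "04" ] && [ $((yy % 2)) -eq 0 ]; then
        kind="LTS candidate"
    fi

    if [ -n "$point" ]; then
        printf '%s: released 20%02d-%s, %s, point release %s\n' \
            "$version" "$yy" "$mm" "$kind" "$point"
    else
        printf '%s: released 20%02d-%s, %s\n' "$version" "$yy" "$mm" "$kind"
    fi
}
```

For example, `describe_ubuntu_version 24.04.4` reports an April 2024 release, LTS candidate, point release 4. Always confirm the LTS status against the official release list.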
Release interpretation flow
See a version number
│
▼
Is it marked LTS?
├── yes
│   ├── good production candidate
│   ├── check standard support date
│   └── check Pro/ESM if long-lived
│
└── no
    ├── interim release
    ├── check short support date
    └── use mainly for dev/test unless justified
Useful official sources
Ubuntu release cycle:
                            https://ubuntu.com/about/release-cycle

                            Ubuntu releases:
                            https://releases.ubuntu.com/

                            Ubuntu release list:
                            https://documentation.ubuntu.com/project/release-team/list-of-releases/

                            Ubuntu release notes:
                            https://documentation.ubuntu.com/release-notes/
Support timeline: standard support, ESM and end of life

Support lifecycle matters because an unsupported server becomes a security and compliance risk. Once a release is out of standard support, teams must either upgrade, use an extended maintenance option if available, or retire the system.

Lifecycle phase | Meaning | Operational action
Active standard support | Normal security and maintenance updates. | Patch regularly, monitor advisories.
Point release phase | Refreshed install media for LTS family. | Use latest point image for new servers.
Approaching end of standard support | Upgrade planning becomes urgent. | Inventory, staging test, migration window.
ESM / extended maintenance | Extended security coverage for supported scenarios. | Use as controlled bridge, not as excuse to avoid upgrades forever.
End of life | No normal support path for that release. | Upgrade, isolate, replace or retire.
Support responsibility map
Operating system lifecycle
│
├── security updates
├── kernel updates
├── package patches
├── repository availability
├── vendor support
└── compliance status

Operations team responsibility
│
├── know release version
├── know support end date
├── patch regularly
├── plan reboots
├── test upgrades
└── avoid unsupported servers
Timeline diagram
LTS release
│
├── Year 0
│       └── release becomes production candidate
│
├── Years 0-5
│       ├── standard security maintenance
│       ├── point releases
│       ├── cloud images maintained
│       └── normal production usage
│
├── After standard support
│       ├── upgrade recommended
│       └── ESM / Ubuntu Pro may be used
│
└── Long-lived legacy phase
        ├── higher operational risk
        ├── stronger justification required
        └── migration plan should exist
Operational policy example
Company Ubuntu policy:
                            - production servers use LTS only
                            - new projects use current LTS point image
                            - old LTS versions are reviewed quarterly
                            - unsupported releases are forbidden
                            - interim releases require architecture approval
                            - upgrade tests must pass in staging
                            - rollback image must exist
                            - patching window is monthly
                            - emergency CVE patching is immediate
Risk: an out-of-support OS may continue to run, but it becomes harder to patch, audit, insure, certify and defend during incidents.
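Tracking support end dates is easier when the remaining time is computed rather than remembered. The sketch below assumes GNU `date` (for the `-d` option). The function names and the 365-day planning threshold are hypothetical. The end date itself is an input: take it from the official release-cycle page, not from this example.

```shell
#!/bin/sh
# Sketch: days between a reference date (default: today, UTC) and a support end date.
# Assumption: GNU date is available; dates use the YYYY-MM-DD format.
days_until() {
    end_date=$1
    from_date=${2:-$(date -u +%Y-%m-%d)}
    end_s=$(date -u -d "$end_date" +%s) || return 1
    from_s=$(date -u -d "$from_date" +%s) || return 1
    echo $(( (end_s - from_s) / 86400 ))
}

# Classify the result. The 365-day threshold is an illustrative policy value.
support_state() {
    days=$(days_until "$1" "${2:-}") || return 1
    if [ "$days" -lt 0 ]; then
        echo "out of standard support"
    elif [ "$days" -lt 365 ]; then
        echo "plan migration ($days days left)"
    else
        echo "supported ($days days left)"
    fi
}
```

A quarterly review job could call `support_state <end-date>` for each approved baseline and raise a ticket whenever the answer is not "supported".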
How to choose the right Ubuntu version
Context | Recommended choice | Reason
Production web server | Latest stable LTS point release. | Security support, standardization, predictable patching.
Database server | LTS only. | Data systems need stability and tested upgrade windows.
Kubernetes node | LTS supported by your Kubernetes distribution. | Kernel, container runtime and vendor compatibility.
CI runner | LTS by default. | Reproducible builds and stable toolchains.
Developer workstation | LTS for stability, interim for recent desktop features. | Depends on tolerance for upgrades.
Recent hardware | LTS with HWE kernel or interim if required. | Driver and kernel support may matter.
Short-lived lab | Interim can be acceptable. | Easy to rebuild if support ends.
Production decision matrix
Production workload?
├── yes -> LTS
└── no
    │
    ▼
    Long-lived machine?
    ├── yes -> LTS
    └── no
        │
        ▼
        Need newest kernel/userspace?
        ├── yes -> interim or LTS HWE
        └── no -> LTS

Compliance or security audit?
└── LTS + documented patch policy
Version choice scoring
Question | If yes | Impact
Will this server live more than 12 months? | Choose LTS. | Reduces upgrade pressure.
Does it host production data? | Choose LTS. | Stability matters more than novelty.
Is it part of a fleet? | Standardize on one LTS. | Improves automation and support.
Does hardware need a newer kernel? | Evaluate HWE or interim. | Driver support may override default.
Is it disposable? | Interim is acceptable. | Lower lifecycle risk.
Practical production baseline: for new servers, use the current LTS generation and the latest point-release image unless a compatibility requirement forces another choice.
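The production decision matrix above is small enough to encode directly, which helps when the same questions are asked in a provisioning script or a review checklist. This is a sketch only: the function name `recommend_release` and the yes/no argument convention are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of the decision matrix. Each argument is "yes" or "no":
#   $1 production workload?  $2 long-lived machine?  $3 needs newest kernel/userspace?
recommend_release() {
    production=$1 long_lived=$2 needs_newest=$3
    if [ "$production" = "yes" ] || [ "$long_lived" = "yes" ]; then
        echo "LTS"
    elif [ "$needs_newest" = "yes" ]; then
        echo "interim or LTS with HWE kernel"
    else
        echo "LTS"
    fi
}
```

For instance, `recommend_release no no yes` describes a disposable lab machine on recent hardware, the only branch where an interim release is suggested.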
Upgrade strategy: from one Ubuntu generation to another

Ubuntu upgrades should be treated as infrastructure changes, not casual package updates. A release upgrade may change kernel, libraries, system services, defaults, packages, Python versions, OpenSSL behavior, firewall tooling or network configuration.

Safe upgrade process
1. Inventory
                            - server role
                            - Ubuntu version
                            - kernel version
                            - installed packages
                            - services
                            - external repositories

                            2. Prepare
                            - backup data
                            - snapshot VM
                            - export configs
                            - check disk space
                            - review release notes

                            3. Test
                            - clone staging
                            - upgrade staging
                            - run application tests
                            - validate logs and services

                            4. Execute
                            - schedule maintenance
                            - stop risky jobs
                            - upgrade
                            - reboot
                            - validate services

                            5. Verify
                            - application health
                            - network ports
                            - logs
                            - performance
                            - monitoring

                            6. Rollback if needed
                            - restore snapshot
                            - restore old image
                            - revert DNS or load balancer
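Step 1 (Inventory) is the easiest part of the process to script. The sketch below writes a snapshot of the host state into a directory so it can be archived before the upgrade and compared afterwards. The function name, the output file names and the default directory are hypothetical; each tool is checked before use so the function also runs on hosts where one of them is missing.

```shell
#!/bin/sh
# Sketch: collect a pre-upgrade inventory into a directory and print its path.
collect_inventory() {
    out=${1:-./inventory-$(date +%Y%m%d)}
    mkdir -p "$out" || return 1

    # Ubuntu version metadata and kernel version
    if [ -r /etc/os-release ]; then
        cp /etc/os-release "$out/os-release"
    fi
    uname -r > "$out/kernel.txt"

    # Installed packages with versions (Debian/Ubuntu hosts)
    if command -v dpkg-query >/dev/null 2>&1; then
        dpkg-query -W -f='${Package} ${Version}\n' > "$out/packages.txt" || true
    fi

    # Running services
    if command -v systemctl >/dev/null 2>&1; then
        systemctl list-units --type=service --state=running \
            --no-pager --no-legend > "$out/services.txt" 2>/dev/null || true
    fi

    # External repositories: names of the configured apt source files
    ls /etc/apt/sources.list /etc/apt/sources.list.d/ \
        > "$out/apt-sources.txt" 2>/dev/null || true

    echo "$out"
}
```

Running `collect_inventory /root/pre-upgrade` before the maintenance window gives the rollback and verification steps something concrete to compare against.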
Upgrade architecture
Current production server
│
├── snapshot / AMI / backup
├── package inventory
├── config export
└── staging clone
│
▼
Staging upgrade
│
├── do-release-upgrade
├── reboot
├── service validation
├── application tests
└── performance checks
│
▼
Production rollout
│
├── one node first
├── monitor
├── continue rollout
└── keep rollback window
Blue/green alternative
Instead of in-place upgrade:

                            1. Build new Ubuntu LTS image
                            2. Install application stack
                            3. Restore or connect data
                            4. Run smoke tests
                            5. Attach to load balancer
                            6. Shift traffic gradually
                            7. Keep old server as rollback
                            8. Retire old server after validation

                            Often safer for:
                            - web apps
                            - stateless APIs
                            - container hosts
                            - cloud workloads
Production rule: blue/green replacement is often safer than in-place upgrade for application servers. In-place upgrade is more common for carefully controlled single-server or legacy systems.
Commands to identify version, support and upgrade state
Version inspection
# Ubuntu version
                            lsb_release -a

                            # OS release metadata
                            cat /etc/os-release

                            # Kernel version
                            uname -a

                            # Host and OS summary
                            hostnamectl

                            # Architecture
                            dpkg --print-architecture

                            # Check codename only
                            lsb_release -cs
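For fleet scripts it is often enough to know whether a host is on an LTS release. The sketch below reads the `VERSION` field of an os-release file, which on Ubuntu LTS releases contains the string "LTS" (for example `VERSION="24.04.4 LTS (Noble Numbat)"`). The function name `release_kind` is hypothetical, and the file path is a parameter so the function can be tried on a copy.

```shell
#!/bin/sh
# Sketch: classify a host as LTS or interim from os-release metadata.
release_kind() {
    file=${1:-/etc/os-release}
    [ -r "$file" ] || { echo "unknown"; return 1; }

    # Only Ubuntu uses the LTS marker this way
    if ! grep -q '^ID=ubuntu' "$file"; then
        echo "not ubuntu"
        return 1
    fi

    if grep -q '^VERSION=.*LTS' "$file"; then
        echo "LTS"
    else
        echo "interim"
    fi
}
```

Called without arguments, `release_kind` inspects the local `/etc/os-release`.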
Package maintenance
# Refresh package indexes
                            sudo apt update

                            # Show upgradeable packages
                            apt list --upgradable

                            # Upgrade installed packages
                            sudo apt upgrade

                            # Full upgrade with dependency changes
                            sudo apt full-upgrade

                            # Remove unused packages
                            sudo apt autoremove

                            # Check held packages
                            apt-mark showhold
Reboot and upgrade readiness
# Check if reboot is required
                            test -f /var/run/reboot-required && cat /var/run/reboot-required

                            # See packages requiring reboot if available
                            cat /var/run/reboot-required.pkgs 2>/dev/null

                            # Check disk space before upgrades
                            df -h

                            # Check for running package manager processes (they hold the apt/dpkg locks)
                            ps aux | grep -E '[a]pt|[d]pkg'

                            # Repair interrupted package operation
                            sudo dpkg --configure -a
                            sudo apt -f install
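The readiness checks above can be wrapped into small functions that return success or failure, so a maintenance script can stop early. This is a sketch: the function names are hypothetical, and the amount of free space an upgrade really needs depends on the system, so the threshold is left to the caller.

```shell
#!/bin/sh
# Sketch: succeed when the filesystem holding path $1 has at least $2 KiB available.
has_free_space() {
    path=$1 required_kib=$2
    # df -P prints one line per filesystem; column 4 is the available space
    avail_kib=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge "$required_kib" ]
}

# Sketch: succeed when the reboot marker file exists.
# The path is a parameter so the check can be tried without root.
reboot_pending() {
    [ -f "${1:-/var/run/reboot-required}" ]
}
```

A typical use is `has_free_space / 5000000 || echo "not enough disk space"` at the top of an upgrade script, with the figure adjusted to your own baseline.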
Release upgrade
# Install release upgrade tool if missing
                            sudo apt install update-manager-core

                            # Check release upgrader configuration
                            cat /etc/update-manager/release-upgrades

                            # Start release upgrade
                            sudo do-release-upgrade

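                            # Alternative over SSH: run the upgrade inside screen so it survives a dropped connection
                            sudo apt install screen
                            screen -S upgrade
                            sudo do-release-upgrade
                            # After a disconnect, reattach with: screen -r upgrade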
Warning: do not run a production release upgrade without backup, snapshot, staging test, disk-space check and rollback plan.
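Before starting, it helps to know which upgrades the tool will offer. The `Prompt=` line in `/etc/update-manager/release-upgrades` takes the values `lts` (only LTS-to-LTS upgrades), `normal` (any newer supported release) or `never`. The sketch below extracts that value. The function name `upgrade_prompt_policy` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the Prompt= policy used by do-release-upgrade.
upgrade_prompt_policy() {
    file=${1:-/etc/update-manager/release-upgrades}
    [ -r "$file" ] || { echo "unknown"; return 1; }
    # Commented lines start with "#", so only active Prompt= lines match;
    # if there are several, the last one is printed
    sed -n 's/^Prompt=\(.*\)$/\1/p' "$file" | tail -n 1
}
```

On a production LTS server the expected answer is usually `lts`; anything else deserves a second look before running the upgrade.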
Ubuntu versions in cloud images, AMIs and automation

In cloud environments, Ubuntu versioning becomes part of your infrastructure standard. Teams usually define a base image: Ubuntu LTS version, packages, users, SSH hardening, monitoring agent, logging agent, cloud-init behavior and security baseline.

Cloud concept | Ubuntu version impact | Best practice
AMI / image | Defines OS baseline and package versions. | Use approved LTS image family.
cloud-init | Bootstraps users, packages and config. | Test with target LTS version.
Terraform | References image IDs or filters. | Avoid unpinned surprise changes in production.
Golden image | Pre-baked hardened server template. | Rebuild regularly with patches.
Autoscaling | New nodes inherit image baseline. | Validate image before rollout.
Patch management | Images age quickly if not rebuilt. | Rebuild and replace, not only patch in place.
Cloud image lifecycle
Official Ubuntu LTS cloud image
│
▼
Golden image pipeline
│
├── install baseline packages
├── configure SSH
├── add monitoring agent
├── apply security hardening
├── apply updates
└── run validation tests
│
▼
Approved image
│
├── used by Terraform
├── used by autoscaling groups
├── used by Kubernetes nodes
└── used by application servers
│
▼
Periodic rebuild
├── security patches
├── config changes
└── new point release
Cloud version rules
Recommended:
                            - use LTS for production cloud VMs
                            - pin or control image selection
                            - rebuild images regularly
                            - test cloud-init on target release
                            - document image version
                            - keep rollback image available
                            - avoid unmanaged snowflake servers

                            Avoid:
                            - latest image without validation
                            - random Ubuntu versions across fleet
                            - old images with no patch process
                            - manual changes after boot with no automation
Cloud rule: Ubuntu version is part of the infrastructure contract. Treat it like code: pinned, reviewed, tested and rolled out progressively.
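One way to enforce the rule against random versions across a fleet is to compare an inventory with the approved baselines. The sketch below assumes a hypothetical plain-text inventory with one `hostname version` pair per line; both the file format and the function name `off_baseline_hosts` are illustrative, not part of any Ubuntu or cloud tool.

```shell
#!/bin/sh
# Sketch: list hosts whose Ubuntu version is not an approved baseline.
# Usage: off_baseline_hosts INVENTORY_FILE APPROVED_VERSION...
off_baseline_hosts() {
    inventory=$1
    shift
    approved="$*"   # approved versions, space separated
    awk -v approved="$approved" '
        BEGIN {
            n = split(approved, list, " ")
            # Append "" so versions are compared as text, not as numbers
            for (i = 1; i <= n; i++) ok[list[i] ""] = 1
        }
        NF >= 2 && !(($2 "") in ok) { print $1, $2 }
    ' "$inventory"
}
```

With `24.04` and `22.04` approved, a host recorded as `lab1 25.10` would be printed and could then be documented as an exception or scheduled for rebuild.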
Version-related risks and anti-patterns
Anti-pattern | Risk | Correction
Using interim release for long-lived production | Support ends quickly, forced upgrade under pressure. | Use LTS for production.
No inventory of Ubuntu versions | Unsupported servers remain hidden. | Maintain fleet inventory.
Ignoring release notes | Breaking changes surprise production. | Review release notes before upgrade.
Mixing many versions randomly | Automation, debugging and support become harder. | Define approved baselines.
No rollback image | Failed upgrade becomes long outage. | Snapshot or blue/green rollout.
Third-party repositories unmanaged | Upgrade conflicts and broken packages. | Audit external apt sources.
Kernel upgrade without reboot plan | Security patch is installed but not active. | Track reboot-required state.
Old LTS kept forever | Security and compliance risk grows. | Plan migration or use ESM as a temporary bridge.
Version risk decision tree
Server has old Ubuntu version
│
▼
Is it still in standard support?
├── yes
│   ├── keep patched
│   └── plan future migration
│
└── no
    │
    ▼
    Is ESM / Pro enabled and valid?
    ├── yes
    │   ├── use as temporary bridge
    │   └── plan upgrade
    │
    └── no
        │
        ▼
        Risk is high
        ├── isolate if necessary
        ├── snapshot
        ├── test upgrade path
        └── migrate or retire
Upgrade failure symptoms
After upgrade, check:
                            - service fails to start
                            - port no longer listens
                            - Python or PHP version changed
                            - OpenSSL behavior changed
                            - Nginx config warning becomes fatal
                            - database extension mismatch
                            - kernel module missing
                            - firewall rule behavior changed
                            - DNS resolution changed
                            - cloud-init or network config changed
Production rule: a release upgrade is not just an apt upgrade. It is a platform change and must be tested like one.
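Several of the failure symptoms listed above (a service that no longer starts, a port that no longer listens) show up as something that existed before the upgrade and is missing afterwards. A simple sketch is to save one-item-per-line snapshots before and after, then print what disappeared. The function name and the snapshot file names are hypothetical.

```shell
#!/bin/sh
# Sketch: print entries present before the upgrade but missing afterwards.
# Both files hold one item per line (listening sockets, running services, ...).
missing_after_upgrade() {
    before=$1 after=$2
    # -F fixed strings, -x whole-line match, -v keep lines NOT found in "after"
    grep -Fxv -f "$after" "$before"
}

# Possible ways to create the snapshots (run before and again after the upgrade):
#   ss -Htln | awk '{ print $4 }' | sort -u > ports.before
#   systemctl list-units --type=service --state=running --no-legend \
#       | awk '{ print $1 }' | sort -u > services.before
```

An empty result means nothing from the "before" list went missing; it does not prove that the application is healthy, so the application tests still apply.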
Production checklist for Ubuntu version strategy
Version governance checklist
[ ] Approved Ubuntu LTS baseline is defined
                            [ ] Interim releases require explicit exception
                            [ ] Fleet inventory contains Ubuntu version
                            [ ] Fleet inventory contains kernel version
                            [ ] Support end dates are tracked
                            [ ] Old LTS migration plan exists
                            [ ] ESM / Pro usage is documented if used
                            [ ] Golden images are versioned
                            [ ] Cloud image selection is controlled
                            [ ] Third-party apt repositories are inventoried
                            [ ] Release notes are reviewed before upgrade
                            [ ] Staging upgrade test is mandatory
                            [ ] Rollback method is documented
                            [ ] Reboot policy exists for kernel updates
                            [ ] Patch cadence is documented
Minimum production baseline
Production Ubuntu baseline:
                            - LTS release
                            - latest approved point image
                            - security updates enabled
                            - patch window defined
                            - reboot policy defined
                            - monitored support end date
                            - standard package repositories
                            - controlled third-party repositories
                            - backup/snapshot before major upgrade
                            - staging validation before production rollout
Final decision summary
Question | Answer
What should I use for production? | Ubuntu LTS.
Should I use the latest interim release on a server? | Only for a short-lived or explicitly justified case.
Should I standardize versions? | Yes, define one or two approved LTS baselines.
Should I upgrade in place? | Only with backup, staging test and rollback plan.
Is ESM a replacement for upgrading? | No, it is usually a bridge for long-lived systems.
What matters most? | Support horizon, patching, compatibility and rollback.
Final rule
Ubuntu version strategy is production risk management.
Choose LTS for stability, track support dates, patch regularly, test upgrades in staging, keep rollback images, and never let unsupported servers become invisible infrastructure.
7.3 Ubuntu Terminal & BASH: shell, file commands, sudo, chmod, chown and operational reflexes
Why the terminal matters

The terminal is the fastest and most precise way to operate Ubuntu. It gives direct access to files, processes, services, logs, permissions, packages, networking, storage and automation. On a server, there is often no graphical interface: SSH plus terminal is the normal administration model.

BASH is the default interactive shell on Ubuntu (scripts that start with #!/bin/sh run under dash instead). It lets you run commands, chain them, inspect output, redirect logs, write scripts and automate repeatable tasks. A developer who understands BASH can deploy, debug and operate systems more effectively.

Use case | Terminal advantage | Example
Server administration | Works remotely over SSH. | ssh deploy@server
Debugging | Direct logs and service state. | journalctl -u nginx
File operations | Fast navigation, copy, move, search. | find /var/log -name "*.log"
Automation | Repeatable scripts. | backup.sh, deploy.sh
Security | Precise control of users and permissions. | chmod, chown, sudo
Performance | Immediate resource inspection. | top, df -h, free -h
Core rule: the terminal is not "old school"; it is the professional control plane for Linux systems, servers, cloud VMs, containers and automation.
Terminal control map
Ubuntu terminal
│
├── Files
│       ├── ls
│       ├── cd
│       ├── pwd
│       ├── cp
│       ├── mv
│       └── rm
│
├── Text and search
│       ├── cat
│       ├── less
│       ├── head
│       ├── tail
│       ├── grep
│       └── find
│
├── Permissions
│       ├── sudo
│       ├── chmod
│       ├── chown
│       ├── groups
│       └── id
│
├── System operations
│       ├── systemctl
│       ├── journalctl
│       ├── apt
│       └── ssh
│
└── Automation
        ├── variables
        ├── pipes
        ├── redirects
        ├── loops
        └── scripts
Mental model
Command anatomy:
                            command [options] [arguments]

                            Examples:
                            ls -lah /var/log
                            cp -a source destination
                            rm old-file.log
                            sudo systemctl restart nginx

                            Where:
                            - command   = program to run
                            - options   = behavior modifiers
                            - arguments = files, directories, services, values
BASH basics: prompt, paths, history, completion, pipes and redirects

BASH is both an interactive shell and a scripting language. It receives commands, expands variables, resolves paths, runs programs, connects outputs to inputs and lets you automate tasks through scripts.

Concept | Meaning | Example
Prompt | Where you type commands. | user@host:~$
Home directory | Your personal directory. | ~, /home/deploy
Current directory | Where commands operate by default. | pwd
Absolute path | Path from root /. | /var/log/syslog
Relative path | Path from current directory. | ../backup
History | Previous commands. | history
Tab completion | Auto-complete command or path. | Press TAB
BASH essentials
# Show current directory
                            pwd

                            # Show current user
                            whoami

                            # Show command history
                            history

                            # Clear screen
                            clear

                            # Show login shell (for the shell running right now: ps -p $$ -o comm=)
                            echo "$SHELL"

                            # Show environment variables
                            env

                            # Show PATH
                            echo $PATH

                            # Show command location
                            which bash
                            which python3
                            which nginx
Pipes and redirects
# Pipe output to another command
                            ps aux | grep nginx

                            # Redirect output to a file
                            ls -lah /var/log > files.txt

                            # Append output to a file
                            date >> audit.log

                            # Redirect output and errors to separate files ("command" stands for any program)
                            command > output.log 2> error.log

                            # Redirect output and errors together
                            command > all.log 2>&1

                            # View long output page by page
                            journalctl -u nginx | less
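To see exactly where each stream ends up, the redirects above can be tried on a tiny function that writes one line to standard output and one to standard error. This is a sketch for experimentation: `emit` and the temporary directory are hypothetical helpers, not standard commands.

```shell
#!/bin/sh
# Sketch: a command that writes to both streams.
emit() {
    echo "normal output"          # goes to stdout (file descriptor 1)
    echo "error output" >&2       # goes to stderr (file descriptor 2)
}

workdir=$(mktemp -d)

# Separate files: output.log gets stdout, error.log gets stderr
emit > "$workdir/output.log" 2> "$workdir/error.log"

# One file for both: 2>&1 must come after the stdout redirect
emit > "$workdir/all.log" 2>&1

# A pipe carries stdout only; stderr is discarded here on purpose
emit 2>/dev/null | grep -c "output"   # prints 1
```

Inspecting the three files with `cat` afterwards shows one line in each of the first two and both lines in `all.log`.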
Useful keyboard shortcuts
Shortcut | Action
TAB | Complete command or filename.
Ctrl + C | Interrupt current command.
Ctrl + L | Clear screen.
Ctrl + R | Search command history.
Ctrl + A | Move to beginning of line.
Ctrl + E | Move to end of line.
Efficiency rule: use history search and tab completion constantly. They reduce typing errors and speed up operations.
Navigation: pwd, ls, cd and filesystem orientation

Navigation is the first terminal skill. You need to know where you are, what files are present, how to move between directories and how to distinguish absolute and relative paths.

Command | Purpose | Example
pwd | Print current directory. | pwd
ls | List files. | ls
ls -lah | Detailed list, hidden files, human sizes. | ls -lah /etc
cd | Change directory. | cd /var/log
cd .. | Move to parent directory. | cd ..
cd ~ | Move to home directory. | cd ~
cd - | Return to previous directory. | cd -
Navigation examples
# Where am I?
                            pwd

                            # List current directory
                            ls

                            # Detailed list with hidden files
                            ls -lah

                            # Go to logs
                            cd /var/log

                            # Go home
                            cd ~

                            # Go one level up
                            cd ..

                            # Go to previous directory
                            cd -

                            # List directory without entering it
                            ls -lah /etc/nginx
Ubuntu filesystem map
/
├── etc      system configuration
├── home     user home directories
├── var      logs, cache, databases, runtime data
├── srv      service/application data
├── opt      optional third-party software
├── usr      installed programs and libraries
├── tmp      temporary files
├── boot     bootloader and kernel files
├── dev      device files
├── proc     process and kernel virtual filesystem
└── root     root user's home directory
Path examples
Absolute paths:
                            - /etc/nginx/nginx.conf
                            - /var/log/syslog
                            - /srv/myapp
                            - /home/deploy/.ssh/authorized_keys

                            Relative paths:
                            - ./script.sh
                            - ../backup
                            - logs/app.log
                            - ../../etc/example.conf

                            Special paths:
                            - .  current directory
                            - .. parent directory
                            - ~  current user's home directory
                            - /  filesystem root
Path warning: commands like rm, chmod and chown depend heavily on the path you give. Always verify with pwd and ls before destructive actions.
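As a minimal sketch of the absolute versus relative distinction listed above, the function below classifies a path by its first character. The name `path_kind` is hypothetical and chosen only for illustration; it does not resolve symlinks or check that the path exists.

```shell
# Hypothetical helper: report how the shell will interpret a path argument.
path_kind() {
    case "$1" in
        "")  echo "empty"; return 1 ;;   # nothing to classify
        /*)  echo "absolute" ;;          # starts at the filesystem root
        *)   echo "relative" ;;          # resolved against the output of pwd
    esac
}

path_kind /etc/nginx/nginx.conf
path_kind ../backup
path_kind logs/app.log
```

A "relative" result means the command will act relative to the current directory, which is why running pwd first matters before rm, chmod or chown.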
File operations: cp, mv, rm, mkdir, touch and safe handling

File operations are powerful and dangerous. Copying, moving and deleting files from the terminal is fast, but usually does not ask for confirmation unless you request it. In production, create backups before editing or deleting configuration files.

Command | Purpose | Safe example
cp | Copy files. | cp file.txt file.bak
cp -a | Copy preserving metadata. | cp -a /etc/nginx /etc/nginx.bak
mv | Move or rename. | mv app.conf app.conf.disabled
rm | Remove file. | rm old.log
mkdir | Create directory. | mkdir -p /srv/myapp/logs
touch | Create empty file or update timestamp. | touch deploy.log
File operation examples
# Create a directory tree
                            mkdir -p /srv/myapp/releases

                            # Create an empty file
                            touch /tmp/test.txt

                            # Copy a file
                            cp config.ini config.ini.bak

                            # Copy a directory with attributes
                            cp -a /etc/nginx /etc/nginx.bak.$(date +%Y%m%d-%H%M%S)

                            # Rename a file
                            mv old.conf new.conf

                            # Move a file to backup directory
                            mv app.log /tmp/app.log.bak

                            # Remove a file
                            rm old-file.txt
Danger zone: rm
# Remove one file
                            rm file.txt

                            # Ask before deleting
                            rm -i file.txt

                            # Remove directory recursively
                            rm -r directory

                            # Force recursive delete - dangerous
                            rm -rf directory

                            # Extremely dangerous if path is wrong
                            sudo rm -rf /some/path
Safe deletion workflow
# Before deleting:

# 1. Show current directory
pwd

# 2. List target
ls -lah target

# 3. Check size if directory
du -sh target

# 4. Move to quarantine first
mv target /tmp/target.to-delete

# 5. Verify service still works

# 6. Delete later if safe
Backup-before-edit pattern
# Backup config before edit
                            sudo cp -a /etc/nginx/nginx.conf \
                            /etc/nginx/nginx.conf.bak.$(date +%Y%m%d-%H%M%S)

                            # Edit file
                            sudo vim /etc/nginx/nginx.conf

                            # Validate before reload
                            sudo nginx -t

                            # Reload if valid
                            sudo systemctl reload nginx
Production rule: prefer moving risky files to a quarantine location before deleting. Deletion is not a rollback strategy.
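The quarantine idea can be sketched as a small shell function. This is an illustration under stated assumptions, not a standard Ubuntu tool: the function name `quarantine`, the `QUARANTINE_DIR` variable and its default location are all hypothetical.

```shell
# Hypothetical helper: move a target into a quarantine directory instead of deleting it.
# QUARANTINE_DIR and its default are assumptions made for this sketch.
quarantine() {
    local target="$1"
    local dir="${QUARANTINE_DIR:-/var/tmp/quarantine}"
    local stamp dest

    # Refuse to act on an empty or missing target.
    if [ -z "$target" ] || [ ! -e "$target" ]; then
        echo "quarantine: no such target: $target" >&2
        return 1
    fi

    mkdir -p "$dir" || return 1
    stamp="$(date +%Y%m%d-%H%M%S)"
    dest="$dir/$(basename "$target").$stamp"

    # Print the new location so the move can be reversed with a single mv.
    mv -- "$target" "$dest" && echo "$dest"
}

# Usage (commented out so that sourcing this file changes nothing):
# quarantine /srv/myapp/old-release
```

Because the function prints the destination path, rolling back is a single mv in the other direction.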
Read, inspect and search files: cat, less, head, tail, grep, find

Reading and searching files is a core Linux skill. Logs, configuration, service units, environment files and scripts are plain text. The right command depends on file size and whether you need the beginning, the end, live follow or keyword search.

Command | Best for | Example
cat | Small files. | cat /etc/os-release
less | Large files, page navigation. | less /var/log/syslog
head | First lines of a file. | head -50 app.log
tail | Last lines of a file. | tail -100 app.log
tail -f | Follow a log live. | tail -f /var/log/syslog
grep | Search text. | grep -i error app.log
find | Find files by name, size, age. | find /var/log -name "*.log"
Read commands
# Small file
                            cat /etc/os-release

                            # Large file, scroll
                            less /var/log/syslog

                            # First lines
                            head -50 /var/log/syslog

                            # Last lines
                            tail -100 /var/log/syslog

                            # Follow live
                            tail -f /var/log/syslog

                            # Number lines
                            nl config.ini | less
Search examples
# Case-insensitive search
                            grep -i "error" app.log

                            # Search recursively
                            grep -R "server_name" /etc/nginx

                            # Show line numbers
                            grep -n "listen" /etc/nginx/sites-enabled/*

                            # Exclude noisy files
                            grep -R "DEBUG" /srv/myapp --exclude="*.pyc"

                            # Search compressed logs
                            zgrep -i "error" /var/log/syslog.*.gz

                            # Find files by name
                            find /etc -name "*.conf"

                            # Find large files
                            find /var -type f -size +100M -exec ls -lh {} \;

                            # Find recently modified files
                            find /etc -type f -mtime -2 -ls
Log reading pattern
When debugging logs:
                            1. Identify time window
                            2. Read service-specific logs first
                            3. Search for first real error
                            4. Correlate with recent deploy/update
                            5. Avoid reading huge files without filters

                            Examples:
                            journalctl -u nginx --since "30 min ago"
                            grep -i "permission denied" app.log
                            grep -i "connection refused" app.log
Reading rule: use less for large files, tail for recent lines, grep for patterns, and find for unknown locations.
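Step 3 of the log reading pattern ("search for first real error") can be sketched with grep alone. The function names `first_error` and `count_errors` are hypothetical, and the default pattern "error" is an assumption; real logs often need a more specific pattern.

```shell
# Hypothetical helpers for a first pass over a log file.

# Print the first matching line, prefixed with its line number.
first_error() {
    local logfile="$1" pattern="${2:-error}"
    grep -n -i -m 1 -- "$pattern" "$logfile"
}

# Count how many lines match, to judge whether the error is isolated or repeated.
count_errors() {
    local logfile="$1" pattern="${2:-error}"
    grep -c -i -- "$pattern" "$logfile"
}

# Usage (commented out; app.log is the example file name used above):
# first_error app.log "permission denied"
# count_errors app.log "connection refused"
```

The line number from `first_error` gives a starting point for less, which accepts `+N` to open a file at line N.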
sudo: super-user privileges and safe administration

sudo runs a command with elevated privileges, usually as root. On Ubuntu, normal users do not directly administer protected system areas. Instead, trusted users are added to the sudo group and elevate only when needed.

Command | Meaning | Example
sudo command | Run one command as root. | sudo apt update
sudo -l | List allowed sudo commands. | sudo -l
sudo -u user command | Run command as another user. | sudo -u postgres psql
sudo -i | Start root login shell. | Use rarely and carefully.
visudo | Edit sudoers safely. | sudo visudo
sudo examples
# Update packages
                            sudo apt update

                            # Edit protected config
                            sudo vim /etc/ssh/sshd_config

                            # Restart service
                            sudo systemctl restart nginx

                            # Read protected log
                            sudo tail -100 /var/log/auth.log

                            # Run command as postgres user
                            sudo -u postgres psql

                            # Check your sudo privileges
                            sudo -l
sudo mental model
Normal user
│
├── can read/write own files
├── cannot edit system files
├── cannot restart system services
└── cannot install packages
│
▼
sudo
│
├── asks for authentication
├── checks sudoers policy
├── logs action
└── runs command with elevated privilege
User sudo setup
# Create user
                            sudo adduser deploy

                            # Add to sudo group
                            sudo usermod -aG sudo deploy

                            # Check group membership
                            groups deploy

                            # Show sudo group
                            getent group sudo

                            # Edit sudoers safely
                            sudo visudo
sudo safety rules
Do:
                            - use sudo for specific commands
                            - keep named admin users
                            - review sudo group members
                            - use visudo for sudoers edits
                            - log administrative changes

                            Avoid:
                            - logging in directly as root
                            - running long sessions as root
                            - using sudo with unknown scripts
                            - using sudo rm -rf without path verification
                            - granting sudo to every user
Security warning: sudo is not just "permission accepted". It is root-level control. Treat every sudo command as potentially system-changing.
chmod: file permission modes and practical examples

chmod changes file permissions. Linux permissions are split into three groups: owner, group and others. Each can have read, write and execute permissions. For directories, execute means "can enter/traverse".

Permission notation
Example:
                            -rw-r--r-- 1 root root 1200 app.conf

                            Breakdown:
                            -       file type
                            rw-     owner permissions
                            r--     group permissions
                            r--     others permissions

                            r = read
                            w = write
                            x = execute / enter directory
Mode | Meaning | Typical use
600 | Owner read/write only. | Private keys, secret files.
640 | Owner read/write, group read. | App env files readable by service group.
644 | Owner read/write, group and others read. | Normal config or static files.
700 | Owner full access only. | .ssh directory.
755 | Owner full access, group and others read/execute. | Directories and public scripts.
777 | Everyone can read/write/execute. | Almost never acceptable.
chmod examples
# Normal file readable by everyone, writable by owner
                            chmod 644 config.ini

                            # Directory accessible by everyone, writable by owner
                            chmod 755 /srv/myapp

                            # Private SSH directory
                            chmod 700 ~/.ssh

                            # Private SSH key
                            chmod 600 ~/.ssh/id_ed25519

                            # Authorized keys
                            chmod 600 ~/.ssh/authorized_keys

                            # Make script executable
                            chmod +x deploy.sh

                            # Remove write access for group and others
                            chmod go-w file.txt
Numeric mode logic
Permission values:
                            r = 4
                            w = 2
                            x = 1

                            Examples:
                            7 = 4 + 2 + 1 = rwx
                            6 = 4 + 2     = rw-
                            5 = 4 + 1     = r-x
                            4 = 4         = r--

                            chmod 755:
                            owner = 7 = rwx
                            group = 5 = r-x
                            other = 5 = r-x

                            chmod 640:
                            owner = 6 = rw-
                            group = 4 = r--
                            other = 0 = ---
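The arithmetic above can be sketched as a small conversion function. The name `perm_to_octal` is hypothetical; the sketch only adds up r, w and x values per group of three characters and does not validate character positions or handle special bits such as setuid.

```shell
# Hypothetical helper: convert a 9-character permission string (for example rwxr-xr-x)
# into its numeric mode by summing r=4, w=2, x=1 for each group of three.
perm_to_octal() {
    local perms="$1" out="" value=0 pos=0 char

    if [ "${#perms}" -ne 9 ]; then
        echo "perm_to_octal: expected 9 characters such as rwxr-xr-x" >&2
        return 1
    fi

    while [ -n "$perms" ]; do
        char="${perms%"${perms#?}"}"   # first character
        perms="${perms#?}"             # remaining characters
        case "$char" in
            r) value=$((value + 4)) ;;
            w) value=$((value + 2)) ;;
            x) value=$((value + 1)) ;;
        esac
        pos=$((pos + 1))
        # Every third character closes one group: owner, group, others.
        if [ $((pos % 3)) -eq 0 ]; then
            out="$out$value"
            value=0
        fi
    done
    echo "$out"
}

perm_to_octal rwxr-xr-x   # prints 755
perm_to_octal rw-r-----   # prints 640
```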
Permission troubleshooting
# Show permissions
                            ls -lah file

                            # Show full path permissions
                            namei -l /srv/myapp/current/.env

                            # Show current user groups
                            id

                            # Test as service user
                            sudo -u myapp cat /srv/myapp/.env
chmod rule: do not use chmod 777 to "fix" access. It hides the real ownership problem and creates a security risk.
chown: file ownership, groups and service users

chown changes file owner and group. Many permission errors are not caused by missing chmod, but by wrong ownership. Services such as Nginx, Gunicorn, PostgreSQL or application workers must be able to read the files they need, but should not own everything as root.

Command | Meaning | Example
chown user file | Change owner. | sudo chown deploy app.log
chown user:group file | Change owner and group. | sudo chown deploy:www-data app
chown :group file | Change group only. | sudo chown :www-data static
chown -R | Recursive ownership change. | Use carefully on directories.
id user | Show UID, GID and groups. | id myapp
chown examples
# Change one file owner
                            sudo chown deploy app.log

                            # Change owner and group
                            sudo chown deploy:www-data /srv/myapp

                            # Change group only
                            sudo chown :www-data /srv/myapp/static

                            # Recursive change, use carefully
                            sudo chown -R deploy:www-data /srv/myapp

                            # App env file owned by root, readable by app group
                            sudo chown root:myapp /srv/myapp/.env
                            sudo chmod 640 /srv/myapp/.env
Ownership model for web app
/srv/myapp
│
├── code files
│       ├── owner: deploy
│       └── group: www-data
│
├── static files
│       ├── readable by nginx
│       └── not writable by public users
│
├── .env secrets
│       ├── owner: root
│       ├── group: myapp
│       └── mode: 640
│
└── runtime logs/uploads
        ├── owner: myapp
        └── controlled write access
Service user checks
# Show service user in unit file
                            systemctl cat myapp

                            # Show process user
                            ps aux | grep gunicorn

                            # Check user groups
                            id myapp

                            # Check path permissions
                            namei -l /srv/myapp/current/.env

                            # Test access as service user
                            sudo -u myapp test -r /srv/myapp/.env && echo readable
Common ownership mistakes
Mistake | Consequence | Better approach
Everything owned by root | App cannot write needed runtime files. | Use service-specific owner/group.
Everything owned by app user | App can modify its own code/secrets. | Separate code, secrets and runtime dirs.
Recursive chown on wrong path | System or app permissions broken. | Verify path with pwd and ls.
Using chmod instead of chown | Permissions become too broad. | Fix ownership first.
Ownership rule: permissions answer "what can be done"; ownership answers "who controls the file". Both must be correct.
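To check that both halves are correct without changing anything, a read-only check with GNU `stat` can help. The function name `check_mode` is hypothetical, and the sketch assumes the GNU coreutils `stat` shipped with Ubuntu (the `-c` format option differs on other systems).

```shell
# Hypothetical helper: compare a file's numeric mode with the expected one
# and show its owner and group, without modifying anything.
check_mode() {
    local path="$1" expected="$2" actual owner

    actual="$(stat -c '%a' -- "$path")" || return 1
    owner="$(stat -c '%U:%G' -- "$path")" || return 1

    if [ "$actual" = "$expected" ]; then
        echo "ok: $path is $actual ($owner)"
    else
        echo "mismatch: $path is $actual ($owner), expected $expected"
        return 1
    fi
}

# Usage (commented out; the path is the example used in this section):
# check_mode /srv/myapp/.env 640
```

A mismatch is a prompt to look at ownership first, then mode, rather than to widen permissions.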
Safety patterns: avoid destructive mistakes

The terminal is powerful because it does exactly what you ask. That also makes it dangerous. Professional terminal usage means verifying targets, backing up before edits, validating configs before restart and avoiding irreversible commands when a reversible action is possible.

Risky action | Safer pattern | Reason
Delete directly with rm -rf | Move to quarantine first. | Allows rollback.
Edit config without backup | cp -a file file.bak.DATE | Easy restore.
Restart service blindly | Validate config and read logs first. | Avoid making outage worse.
Recursive chmod/chown on broad path | Check target with pwd, ls, du. | Prevents system-wide damage.
Run unknown script with sudo | Download, inspect, verify, then run. | Supply-chain safety.
Disable SSH password auth immediately | Test SSH key in second terminal first. | Prevents lockout.
Safe config edit workflow
# 1. Backup
                            sudo cp -a /etc/ssh/sshd_config \
                            /etc/ssh/sshd_config.bak.$(date +%Y%m%d-%H%M%S)

                            # 2. Edit
                            sudo vim /etc/ssh/sshd_config

                            # 3. Validate
                            sudo sshd -t

                            # 4. Restart only if valid
                            sudo systemctl restart ssh

                            # 5. Check logs
                            journalctl -u ssh --since "10 min ago"
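Step 1 of this workflow is easy to skip under pressure, so it can be wrapped in a function. The name `backup_file` is hypothetical; the sketch reuses the same `cp -a` and timestamp suffix shown above and refuses anything that is not a regular file.

```shell
# Hypothetical helper: create a timestamped copy next to the original file
# and print the backup path so it can be restored later.
backup_file() {
    local file="$1" stamp backup

    if [ ! -f "$file" ]; then
        echo "backup_file: not a regular file: $file" >&2
        return 1
    fi

    stamp="$(date +%Y%m%d-%H%M%S)"
    backup="$file.bak.$stamp"

    # -a preserves mode, ownership (when permitted) and timestamps.
    cp -a -- "$file" "$backup" && echo "$backup"
}

# Usage (commented out; on a root-owned file the function must run in a root shell):
# backup_file /etc/ssh/sshd_config
```

Restoring is then a matter of copying the printed backup path back over the original and validating again before restart.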
Dangerous command patterns
Dangerous:
                            sudo rm -rf /
                            sudo rm -rf *
                            sudo chown -R user:user /
                            sudo chmod -R 777 /
                            sudo chmod -R 777 /var/www
                            curl URL | sudo bash
                            sudo mv /etc /tmp
                            docker compose down -v

                            Safer:
                            - verify path first
                            - backup first
                            - move instead of delete
                            - inspect scripts
                            - target exact directory
                            - keep rollback possible
Pre-flight checklist
Before destructive command:
                            [ ] Am I on the right server?
                            [ ] Am I in the right directory?
                            [ ] Did I list the target?
                            [ ] Did I check the size?
                            [ ] Do I have a backup?
                            [ ] Can I rollback?
                            [ ] Is the command scoped enough?
                            [ ] Am I using sudo unnecessarily?
                            [ ] Did I understand wildcard expansion?
                            [ ] Did I test on staging if production?
Know your context
# Confirm server
                            hostnamectl

                            # Confirm user
                            whoami

                            # Confirm directory
                            pwd

                            # Confirm target
                            ls -lah target

                            # Confirm disk and space
                            df -h
                            du -sh target

                            # Confirm command before sudo
                            echo sudo rm -rf target
Safety rule: the most dangerous Linux commands are short, recursive and executed with sudo. Slow down before using them.
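One way to build that pause into a session is a wrapper that prints the command after wildcard expansion and runs it only on an explicit "yes". This is a sketch under assumptions: the name `confirm_run` is hypothetical, and it is not a replacement for backups or path verification.

```shell
# Hypothetical helper: show the fully expanded command, then run it only
# if the operator types "yes".
confirm_run() {
    local answer

    # "$*" shows the arguments after the shell has expanded wildcards.
    echo "About to run: $*" >&2
    printf 'Type "yes" to continue: ' >&2
    read -r answer || return 1

    if [ "$answer" = "yes" ]; then
        "$@"
    else
        echo "Aborted." >&2
        return 1
    fi
}

# Usage (commented out):
# confirm_run rm -r ./old-releases/*
```

Seeing the expanded argument list is the same idea as the `echo sudo rm -rf target` preview above, with the confirmation step made explicit.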
Terminal and permissions cheat sheet
Fundamental commands
# Navigation
                            pwd
                            ls
                            ls -lah
                            cd /path
                            cd ..
                            cd ~
                            cd -

                            # Files
                            cp file file.bak
                            cp -a dir dir.bak
                            mv old new
                            rm file
                            rm -i file
                            mkdir -p path/to/dir
                            touch file

                            # Read and search
                            cat file
                            less file
                            head -50 file
                            tail -100 file
                            tail -f file
                            grep -i "error" file
                            find /path -name "*.log"

                            # Context
                            whoami
                            id
                            hostnamectl
                            history
                            which command
sudo, chmod, chown cheat sheet
# sudo
                            sudo apt update
                            sudo systemctl restart nginx
                            sudo -l
                            sudo -u postgres psql
                            sudo visudo

                            # chmod
                            chmod 644 file
                            chmod 640 secret.env
                            chmod 755 directory
                            chmod 700 ~/.ssh
                            chmod 600 ~/.ssh/id_ed25519
                            chmod +x script.sh

                            # chown
                            sudo chown user file
                            sudo chown user:group file
                            sudo chown :group file
                            sudo chown -R user:group directory

                            # Diagnose permissions
                            ls -lah file
                            namei -l /path/to/file
                            id user
                            groups user
Final rule
Mastering the Ubuntu terminal means mastering control with discipline.
Use BASH to navigate, inspect, modify, automate and troubleshoot. Use sudo only when required. Use chmod for permissions, chown for ownership, and always verify the target before destructive commands.
Minimal professional reflexes
[ ] I know where I am with pwd
                            [ ] I inspect before changing with ls -lah
                            [ ] I backup config files before editing
                            [ ] I validate configs before restart
                            [ ] I avoid chmod 777
                            [ ] I understand sudo impact
                            [ ] I test SSH access before hardening
                            [ ] I move risky files before deleting
                            [ ] I use logs before restarting blindly
                            [ ] I document production changes
7.4 Ubuntu Maintenance & Security: updates, UFW, Timeshift, backups, logs and system recovery
Maintenance and security objective

Ubuntu maintenance is the set of recurring actions that keep a system secure, stable, recoverable and understandable. It includes package updates, security patching, reboot planning, firewall control, restore points, backups, log review and incident diagnosis.

On a desktop, maintenance protects the user from data loss and broken upgrades. On a server, maintenance protects services from outages, vulnerabilities, full disks, misconfiguration and unrecoverable incidents.

Area | Purpose | Main tools | Failure prevented
System updates | Apply fixes and security patches. | apt, Software Updater, unattended upgrades. | Known vulnerabilities, outdated packages.
Firewall | Limit network exposure. | ufw, security groups, router firewall. | Open services reachable from outside.
Restore points | Return system state after bad change. | Timeshift, snapshots. | Broken updates, bad configuration.
Backups | Protect personal or business data. | rsync, external disk, cloud backup, database dumps. | Data loss, disk failure, accidental deletion.
Logs | Understand what happened. | journalctl, /var/log, app logs. | Blind troubleshooting and repeated incidents.
Routine checks | Detect problems before they grow. | df, systemctl, journalctl. | Full disk, failed services, unnoticed errors.
Core rule: maintenance is not a one-time setup. It is a routine: update, verify, protect, observe and document.
Maintenance architecture map
Ubuntu maintenance
│
├── Updates
│       ├── apt update
│       ├── apt upgrade
│       ├── security fixes
│       └── reboot policy
│
├── Firewall
│       ├── default deny incoming
│       ├── allow required services
│       ├── restrict SSH
│       └── review open ports
│
├── Recovery
│       ├── Timeshift restore points
│       ├── backups
│       ├── snapshots
│       └── restore testing
│
├── Logs
│       ├── journalctl
│       ├── auth logs
│       ├── system logs
│       └── app logs
│
└── Routine
        ├── weekly checks
        ├── monthly cleanup
        ├── update review
        └── documentation
Desktop vs server emphasis
Context | Priority | Example
Desktop | Restore points, data backup, safe updates. | Timeshift before big upgrade.
Server | Security patches, firewall, monitoring, backups. | Patch window and reboot plan.
Cloud VM | Snapshots, security groups, logs, replacement. | AMI or EBS snapshot before change.
Developer workstation | Tool updates, project backup, SSH keys. | Backup home and dotfiles.
System updates: why and how to perform them

System updates fix bugs, close security vulnerabilities, improve hardware support and keep installed packages consistent with the Ubuntu release. Updates should be frequent enough to reduce exposure, but controlled enough to avoid surprise downtime on important machines.

Command | Purpose | When to use
sudo apt update | Refresh package metadata. | Before installing or upgrading packages.
apt list --upgradable | Show available upgrades. | Before applying updates.
sudo apt upgrade | Upgrade packages safely without removals. | Regular maintenance.
sudo apt full-upgrade | Upgrade with dependency changes. | When upgrade requires installs/removals.
sudo apt autoremove | Remove unused dependencies. | After upgrades or package removals.
sudo apt clean | Clean package cache. | When disk cleanup is needed.
Standard update flow
# 1. Refresh package metadata
                            sudo apt update

                            # 2. Review available upgrades
                            apt list --upgradable

                            # 3. Apply regular upgrades
                            sudo apt upgrade

                            # 4. Remove unused packages
                            sudo apt autoremove

                            # 5. Check if reboot is required
                            test -f /var/run/reboot-required && cat /var/run/reboot-required

                            # 6. Verify system state
                            systemctl --failed
                            journalctl -p warning --since "30 min ago"
Update decision diagram
Updates available
│
├── Desktop workstation?
│       ├── create Timeshift snapshot if major change
│       ├── apply updates
│       └── reboot if required
│
├── Production server?
│       ├── review packages
│       ├── confirm backup/snapshot
│       ├── test staging if critical
│       ├── schedule maintenance window
│       └── apply and verify
│
└── Cloud VM?
        ├── snapshot or image
        ├── patch
        ├── reboot if required
        └── validate health checks
Reboot-required checks
# Check if reboot is required
                            test -f /var/run/reboot-required && cat /var/run/reboot-required || echo "no reboot flag"

                            # Show packages that requested reboot if available
                            cat /var/run/reboot-required.pkgs 2>/dev/null

                            # Current kernel
                            uname -a

                            # Boot time
                            uptime
                            last reboot | head
Graphical update path
Ubuntu Desktop
│
├── Open Software Updater
├── Review proposed updates
├── Install updates
├── Reboot if requested
└── Verify desktop and main apps
Update warning: a kernel update is not active until reboot. Always check /var/run/reboot-required after maintenance.
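The reboot check can be wrapped so that scripts act on its exit status. This is a sketch, not an Ubuntu-provided command: the function name `reboot_required` and its optional path argument are hypothetical, and the argument exists only so the logic can be exercised against a test file.

```shell
# Hypothetical helper: exit status 0 when the reboot flag file exists, 1 otherwise.
# The optional first argument overrides the flag path (an assumption for testing).
reboot_required() {
    local flag="${1:-/var/run/reboot-required}"

    if [ -f "$flag" ]; then
        echo "reboot required"
        # The companion .pkgs file, when present, lists the requesting packages.
        if [ -f "$flag.pkgs" ]; then
            sed 's/^/  requested by: /' "$flag.pkgs"
        fi
        return 0
    fi

    echo "no reboot flag"
    return 1
}

# Usage (commented out):
# if reboot_required; then echo "schedule a reboot window"; fi
```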
Update strategy: safe patching, unattended upgrades and rollback

A good update strategy balances speed and safety. Security updates should not be postponed indefinitely, but critical systems need backups, staging tests and rollback paths. The more important the machine, the more controlled the update process must be.

Strategy | Best for | Strength | Risk
Manual updates | Personal desktop, small servers. | Human review before changes. | Can be forgotten.
Unattended security upgrades | Standard servers. | Faster security patching. | Needs reboot policy.
Scheduled patch window | Production systems. | Predictable maintenance. | Emergency patches still need fast track.
Snapshot before update | Desktop, VM, cloud instances. | Rollback-friendly. | Snapshot does not replace data backup.
Blue/green replacement | Cloud application servers. | Safer than in-place update. | Requires automation.
Unattended upgrades
# Install unattended upgrades
                            sudo apt update
                            sudo apt install unattended-upgrades

                            # Enable basic automatic security updates
                            sudo dpkg-reconfigure unattended-upgrades

                            # Main configuration files (view them; a bare path is not a command)
                            cat /etc/apt/apt.conf.d/20auto-upgrades
                            less /etc/apt/apt.conf.d/50unattended-upgrades

                            # Logs
                            sudo less /var/log/unattended-upgrades/unattended-upgrades.log
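To confirm that automatic upgrades are actually switched on, the `APT::Periodic::Unattended-Upgrade` setting in `20auto-upgrades` can be checked. The sketch below assumes the usual one-setting-per-line layout of that file and treats any non-zero day interval as enabled; the function name `auto_upgrades_enabled` is hypothetical.

```shell
# auto_upgrades_enabled [CONFIG_FILE]
# Hypothetical helper: reports whether APT::Periodic::Unattended-Upgrade
# is set to a non-zero interval in the given file.
auto_upgrades_enabled() {
    au_file="${1:-/etc/apt/apt.conf.d/20auto-upgrades}"
    if grep -Eq '^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]+"[1-9][0-9]*";' "$au_file" 2>/dev/null; then
        echo "unattended upgrades: enabled"
    else
        echo "unattended upgrades: not enabled"
    fi
}

# A dry run shows what would be upgraded without changing the system:
#   sudo unattended-upgrade --dry-run --debug
```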
Safe production patch workflow
Patch workflow
│
├── Inventory
│       ├── OS version
│       ├── kernel version
│       ├── critical packages
│       └── running services
│
├── Protect
│       ├── backup
│       ├── snapshot
│       ├── Timeshift on desktop
│       └── rollback plan
│
├── Apply
│       ├── apt update
│       ├── review upgrades
│       ├── apt upgrade
│       └── reboot if required
│
└── Verify
        ├── systemctl --failed
        ├── logs
        ├── ports
        ├── app health
        └── user validation
Post-update validation
# Failed services
                            systemctl --failed

                            # Recent warnings
                            journalctl -p warning --since "30 min ago"

                            # Listening ports
                            ss -lntp

                            # Disk and memory
                            df -h
                            free -h

                            # Web smoke test
                            curl -I http://localhost

                            # Package history
                            less /var/log/apt/history.log
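The validation commands above produce a lot of output to read by eye. One possible approach is a tiny pass/fail runner that reduces each check to a single line. This is a sketch only: `run_check` is a hypothetical helper, and the example checks in the comments are suggestions to adapt.

```shell
# run_check LABEL COMMAND [ARGS...]
# Hypothetical helper: runs a command quietly and prints PASS or FAIL with a label.
run_check() {
    rc_label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $rc_label"
    else
        echo "FAIL $rc_label"
    fi
}

# Example calls (commented out so the block has no side effects):
#   run_check "system state"   systemctl is-system-running
#   run_check "web smoke test" curl -fsI http://localhost
```

`systemctl is-system-running` exits non-zero when the system is degraded, and `curl -f` exits non-zero on HTTP error responses, which is what lets a simple exit-status test work here.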
Update risk matrix
| Update type | Risk | Control |
|---|---|---|
| Kernel | Requires reboot, driver risk. | Snapshot and reboot window. |
| OpenSSL / libc | Service restart may be needed. | Restart affected services. |
| Database packages | Service compatibility. | Backup and staging test. |
| Nginx / SSH | Access or web outage if config breaks. | Validate config before restart. |
Maintenance rule: update strategy is risk management. The critical question is not only "can I update?", but "can I recover if the update fails?".
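For services such as Nginx and SSH, "validate config before restart" can be made mechanical: run the service's own syntax check and reload only when it succeeds. The following is a minimal sketch; `safe_reload` is a hypothetical helper, while `nginx -t` and `sshd -t` are the services' built-in configuration tests.

```shell
# safe_reload VALIDATE_CMD RELOAD_CMD
# Hypothetical helper: runs RELOAD_CMD only if VALIDATE_CMD succeeds.
safe_reload() {
    sr_validate="$1"
    sr_reload="$2"
    if sh -c "$sr_validate" >/dev/null 2>&1; then
        if sh -c "$sr_reload"; then
            echo "reloaded"
        else
            echo "reload failed"
        fi
    else
        echo "validation failed, not reloading"
    fi
}

# Example calls, to be run as root:
#   safe_reload "nginx -t" "systemctl reload nginx"
#   safe_reload "sshd -t"  "systemctl reload ssh"
```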
Basic security with UFW firewall

UFW is Ubuntu's simple firewall interface. It helps expose only the ports required by the machine. A safe default is to deny incoming traffic, allow outgoing traffic, then explicitly allow SSH, web traffic or other required services.

| Port | Service | Typical exposure | Comment |
|---|---|---|---|
| 22/tcp | SSH | Restricted source IP if possible. | Administration access. |
| 80/tcp | HTTP | Public only for web server or redirect. | Often redirects to HTTPS. |
| 443/tcp | HTTPS | Public for web application. | Main public web port. |
| 3306/tcp | MySQL / MariaDB | Private only. | Never expose casually. |
| 5432/tcp | PostgreSQL | Private only. | Restrict to app server. |
| 6379/tcp | Redis | Private only. | Should not be public. |
UFW baseline
# Check current firewall status
                            sudo ufw status verbose

                            # Default policies
                            sudo ufw default deny incoming
                            sudo ufw default allow outgoing

                            # Allow SSH before enabling firewall
                            sudo ufw allow OpenSSH

                            # Allow web traffic if needed
                            sudo ufw allow 80/tcp
                            sudo ufw allow 443/tcp

                            # Enable firewall
                            sudo ufw enable

                            # Verify rules
                            sudo ufw status verbose
                            sudo ufw status numbered
Firewall decision diagram
New service installed
│
├── Does it need network access?
│       ├── no -> keep local only
│       └── yes
│
├── Should it be public?
│       ├── yes -> open exact required port
│       └── no
│
├── Should it be private?
│       ├── yes -> restrict by source IP or subnet
│       └── no
│
└── Is rule documented?
        ├── yes -> apply rule
        └── no -> do not expose yet
Restrict by source
# Allow SSH from one admin IP
                            sudo ufw allow from 203.0.113.10 to any port 22 proto tcp

                            # Allow PostgreSQL from one app server
                            sudo ufw allow from 10.0.1.25 to any port 5432 proto tcp

                            # Delete a numbered rule
                            sudo ufw status numbered
                            sudo ufw delete 3

                            # Deny a specific IP
                            sudo ufw deny from 198.51.100.44
UFW troubleshooting
# Firewall status
                            sudo ufw status verbose

                            # Listening ports
                            ss -lntp

                            # Local service test
                            curl -I http://localhost

                            # Kernel firewall logs if enabled
                            sudo ufw logging on
                            sudo journalctl -k --since "30 min ago" | grep UFW
Firewall warning: always allow and test SSH before enabling or tightening UFW on a remote server.
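One way to make that warning harder to forget is a guard that inspects the configured rules before `ufw enable` runs. The sketch below is a coarse text match, not a full rule parser: `ssh_rule_present` is a hypothetical helper, and it assumes the rule text comes from `sudo ufw show added`, which lists rules in command form even while the firewall is inactive.

```shell
# ssh_rule_present
# Hypothetical helper: reads rule text on stdin and succeeds when an
# allow or limit rule mentions the OpenSSH profile or port 22.
ssh_rule_present() {
    grep -Ei 'openssh|(^|[^0-9.])22(/tcp)?([^0-9.]|$)' | grep -Eiq 'allow|limit'
}

# Example guard, to be run on the server itself:
#   sudo ufw show added | ssh_rule_present && sudo ufw enable
```

If SSH listens on a non-standard port, the pattern needs to be adjusted, and a successful match still does not replace testing a second SSH session before closing the first.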
Timeshift: system restore points for safer changes

Timeshift creates system restore points. On Ubuntu Desktop and other workstations it is useful before major updates, driver changes, package experiments or risky configuration changes. It is not a full personal-data backup solution by itself: it mainly protects system state.

| Timeshift concept | Meaning | Operational note |
|---|---|---|
| Snapshot | Restore point of system files. | Useful before upgrades. |
| RSYNC mode | File-based snapshot mode. | Works on common filesystems. |
| BTRFS mode | Filesystem snapshot mode. | Requires BTRFS layout. |
| Schedule | Automatic snapshot frequency. | Daily, weekly, monthly policies. |
| Restore | Return system to previous state. | Can recover from bad update or config. |
| Exclusions | Paths not included. | Understand home/data behavior. |
Install Timeshift
# Install Timeshift
                            sudo apt update
                            sudo apt install timeshift

                            # Launch graphical interface
                            sudo timeshift-gtk

                            # CLI help
                            timeshift --help

                            # List snapshots
                            sudo timeshift --list
Timeshift workflow
Before risky change
│
├── Open Timeshift
├── Create snapshot
├── Name or comment the snapshot
├── Apply update or configuration change
├── Reboot if required
├── Verify system works
└── Keep or delete snapshot later
When to create a snapshot
Create a Timeshift snapshot before:
                            - major system update
                            - release upgrade
                            - driver installation
                            - desktop environment change
                            - kernel experiment
                            - repository or PPA experiment
                            - risky configuration edit
                            - important package removal
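For the situations listed above, a snapshot can also be taken from the command line with a comment that records why it exists. A minimal sketch, assuming Timeshift is already installed and configured; `snapshot_comment` is a hypothetical helper that only builds the label.

```shell
# snapshot_comment REASON
# Hypothetical helper: builds a dated comment string for a snapshot.
snapshot_comment() {
    printf 'pre-change: %s (%s)\n' "$1" "$(date +%Y-%m-%d)"
}

# Create an on-demand snapshot with that comment (requires root):
#   sudo timeshift --create --comments "$(snapshot_comment 'PPA experiment')"
```

A descriptive comment makes it much easier to pick the right entry from `sudo timeshift --list` when a rollback is needed.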
Timeshift vs backup
| Need | Timeshift | Data backup |
|---|---|---|
| Restore broken system update | Excellent. | Not primary role. |
| Recover deleted personal file | Not always sufficient. | Best tool. |
| Recover from disk failure | Only if snapshot stored elsewhere. | Required. |
| Recover database state | Not ideal. | Use database backup. |
Recovery warning: Timeshift is not a replacement for backups. A restore point helps with system rollback, while backups protect personal or business data.
Backup model: system restore, personal data and server data

A complete protection strategy separates system restore from data backup. Timeshift can help restore the OS state. Personal files, project folders, databases, uploads, secrets and configuration must also be backed up separately.

| Data type | Recommended protection | Example path |
|---|---|---|
| System files | Timeshift or VM snapshot. | /etc, packages, system state. |
| Personal files | File backup to external disk or cloud. | /home/user/Documents |
| Project code | Git remote and file backup. | /home/user/projects |
| Databases | Database-native dump and volume backup. | PostgreSQL, MySQL, MariaDB. |
| Application uploads | File backup with retention. | /srv/app/media |
| Secrets | Secure secret backup or vault. | .env, keys, certificates. |
Simple rsync backup example
# Backup home directory to external disk
                            rsync -aHAX --info=progress2 \
                            /home/user/ \
                            /media/user/backup/home-user/

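                            # Backup project directory
                            # --delete removes destination files that no longer exist in the source,
                            # so preview the run with --dry-run before executing it
                            rsync -a --delete --dry-run \
                            /srv/myapp/ \
                            /backup/myapp/

                            rsync -a --delete \
                            /srv/myapp/ \
                            /backup/myapp/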
Backup strategy diagram
Protection strategy
│
├── System restore
│       ├── Timeshift
│       ├── VM snapshot
│       └── cloud image
│
├── Data backup
│       ├── documents
│       ├── projects
│       ├── uploads
│       └── databases
│
├── Configuration backup
│       ├── /etc
│       ├── service units
│       ├── nginx configs
│       └── SSH configs
│
└── Restore test
        ├── can files be restored?
        ├── can database be restored?
        ├── can server boot?
        └── is procedure documented?
Database backup examples
# PostgreSQL dump
                            pg_dump -U app_user -h localhost app_db > app_db.sql

                            # PostgreSQL compressed dump
                            pg_dump -U app_user -h localhost app_db | gzip > app_db.sql.gz

                            # MySQL / MariaDB dump
                            mysqldump -u app_user -p app_db > app_db.sql

                            # MySQL / MariaDB compressed dump
                            mysqldump -u app_user -p app_db | gzip > app_db.sql.gz
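A compressed dump is only useful if the file is intact. The sketch below checks that a dump exists, is non-empty and passes `gzip -t`; it verifies the archive only, not that the SQL inside restores cleanly. `verify_dump` is a hypothetical helper name.

```shell
# verify_dump FILE
# Hypothetical helper: basic integrity check for a gzip-compressed dump.
verify_dump() {
    vd_file="$1"
    if [ ! -s "$vd_file" ]; then
        echo "missing or empty: $vd_file"
    elif gzip -t "$vd_file" 2>/dev/null; then
        echo "ok: $vd_file"
    else
        echo "corrupt: $vd_file"
    fi
}

# In "pg_dump ... | gzip > file" the pipeline's exit status is gzip's, so a failed
# dump can go unnoticed. In bash, "set -o pipefail" makes the pipeline fail as well.
# Timestamped file names keep earlier dumps from being overwritten:
#   pg_dump -U app_user -h localhost app_db | gzip > "app_db_$(date +%Y%m%d_%H%M%S).sql.gz"
```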
Backup quality checklist
[ ] Backup is automatic
                            [ ] Backup includes data, not only system files
                            [ ] Backup destination is separate from source disk
                            [ ] Backup has retention policy
                            [ ] Backup is encrypted if sensitive
                            [ ] Restore has been tested
                            [ ] Database backups are consistent
                            [ ] Secrets are protected
                            [ ] Backup logs are reviewed
                            [ ] Owner and schedule are documented
Backup rule: a backup is only proven when restore has been tested.
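The retention item in the checklist can be implemented with `find`. This is a minimal sketch under stated assumptions: dumps are stored flat in one directory and named `*.sql.gz`, and `prune_backups` is a hypothetical helper. It deletes files, so it is worth running the same `find` without `-delete` first.

```shell
# prune_backups DIR DAYS
# Hypothetical helper: deletes *.sql.gz files in DIR whose age exceeds DAYS
# whole days, printing each path as it is removed.
prune_backups() {
    pb_dir="$1"
    pb_days="$2"
    find "$pb_dir" -maxdepth 1 -type f -name '*.sql.gz' -mtime +"$pb_days" -print -delete
}

# Example: keep roughly two weeks of dumps
#   prune_backups /backup/db 14
```

The directory `/backup/db` in the example is illustrative, not a path defined elsewhere in this guide.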
Log management: reading system journals when problems occur

Logs are the first source of truth when Ubuntu behaves unexpectedly. They show service failures, authentication attempts, package operations, kernel events, disk errors, network issues and application errors.

| Log source | Contains | Command / path |
|---|---|---|
| systemd journal | Service and system events. | journalctl |
| Service logs | One daemon timeline. | journalctl -u SERVICE |
| Kernel logs | OOM, disk, driver, hardware events. | journalctl -k |
| Authentication logs | SSH, sudo, login attempts. | /var/log/auth.log |
| System log | General system messages. | /var/log/syslog |
| APT logs | Package updates and installs. | /var/log/apt/history.log |
journalctl essentials
# Recent errors and context
                            journalctl -xe

                            # Current boot logs
                            journalctl -b

                            # Previous boot logs
                            journalctl -b -1

                            # Service logs
                            journalctl -u nginx

                            # Service logs with time window
                            journalctl -u nginx --since "1 hour ago"

                            # Follow service logs live
                            journalctl -u nginx -f

                            # Warnings and errors
                            journalctl -p warning --since today

                            # Kernel logs
                            journalctl -k --since today
Classic log commands
# System log
                            sudo tail -200 /var/log/syslog

                            # Authentication log
                            sudo tail -200 /var/log/auth.log

                            # APT history
                            less /var/log/apt/history.log

                            # Search errors
                            grep -i "error" /var/log/syslog

                            # Search failed SSH attempts
                            sudo grep -i "failed password" /var/log/auth.log | tail -100

                            # Search sudo usage
                            sudo grep -i "sudo" /var/log/auth.log | tail -100

                            # Search OOM events
                            journalctl -k --since today | grep -i -E "oom|killed process"
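Raw `grep` output for failed SSH logins is hard to scan when there are hundreds of lines. A possible next step is to group the attempts by source address. The sketch assumes the standard sshd message shape ("Failed password for ... from ADDRESS port ..."); `failed_ssh_by_ip` is a hypothetical helper.

```shell
# failed_ssh_by_ip [LOG_FILE]
# Hypothetical helper: counts "Failed password" lines per source address,
# most active address first. LOG_FILE defaults to /var/log/auth.log.
failed_ssh_by_ip() {
    fs_log="${1:-/var/log/auth.log}"
    grep -i "failed password" "$fs_log" \
        | awk '{ for (i = 1; i < NF; i++) if ($i == "from") { print $(i + 1); break } }' \
        | sort | uniq -c | sort -rn
}

# Example, on a server (reading auth.log usually needs root or the adm group):
#   sudo sh -c 'grep -i "failed password" /var/log/auth.log | wc -l'
```

An address that dominates the list is a candidate for `ufw deny from ADDRESS` or for automated blocking with a tool such as fail2ban.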
Log reading workflow
Problem detected
│
├── Identify time window
├── Check failed services
├── Read service journal
├── Read system warnings
├── Read kernel logs
├── Check auth logs if access issue
├── Check apt history if after update
└── Find first meaningful error
Journal size control
# Show journal disk usage
                            journalctl --disk-usage

                            # Keep only last 14 days
                            sudo journalctl --vacuum-time=14d

                            # Keep journal under 1 GB
                            sudo journalctl --vacuum-size=1G
Log rule: use time windows. --since "30 min ago" is often more useful than reading thousands of old lines.
Troubleshooting maintenance problems

Maintenance can fail: updates may be interrupted, repositories may break, firewall rules may block access, Timeshift snapshots may fill disk space, logs may grow, or services may fail after a package upgrade. Diagnose from the exact symptom.

| Symptom | Likely cause | First command | Fix direction |
|---|---|---|---|
| APT locked | Another package process running. | `ps aux \| grep -E 'apt\|dpkg'` | Wait or investigate process. |
| Broken packages | Interrupted install. | `sudo dpkg --configure -a` | Repair package state. |
| No network after UFW | Required port blocked. | `sudo ufw status numbered` | Allow required rule or rollback. |
| SSH locked out | Firewall or SSH config error. | Console access, UFW and SSH status. | Restore SSH path safely. |
| Disk full | Logs, snapshots, cache, Docker. | `df -h`, `du -sh` | Clean safely and add retention. |
| Service failed after update | Config change or dependency issue. | `systemctl status SERVICE` | Read logs, rollback or fix config. |
APT repair commands
# Finish interrupted package configuration
                            sudo dpkg --configure -a

                            # Fix broken dependencies
                            sudo apt -f install

                            # Refresh metadata
                            sudo apt update

                            # Clean package cache
                            sudo apt clean

                            # Remove unused dependencies
                            sudo apt autoremove

                            # Review update history
                            less /var/log/apt/history.log
                            less /var/log/apt/term.log
Maintenance failure decision tree
Maintenance issue
│
├── Package manager error?
│       ├── lock -> check apt/dpkg process
│       ├── broken -> dpkg --configure -a
│       └── repo -> inspect apt sources
│
├── Firewall issue?
│       ├── check UFW rules
│       ├── verify SSH rule
│       └── test required ports
│
├── Disk issue?
│       ├── check df -h
│       ├── check logs
│       ├── check snapshots
│       └── clean safely
│
├── Service issue?
│       ├── systemctl status
│       ├── journalctl -u service
│       └── validate config
│
└── Bad update?
        ├── use Timeshift if desktop
        ├── use snapshot if VM
        └── rollback package or config
Disk cleanup for maintenance
# Disk usage
                            df -h

                            # Large top-level directories
                            sudo du -xhd1 / 2>/dev/null | sort -h

                            # Journal usage and cleanup
                            journalctl --disk-usage
                            sudo journalctl --vacuum-time=14d

                            # APT cleanup
                            sudo apt clean
                            sudo apt autoremove

                            # Timeshift snapshots
                            sudo timeshift --list
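Disk cleanup works best when it is triggered before the disk is full. The sketch below reads the usage percentage from `df -P` and compares it with a threshold. The helper names and the 85% default are illustrative assumptions, and the column parsing assumes a mount point without spaces.

```shell
# fs_usage_percent [PATH]
# Hypothetical helper: prints the used-space percentage (digits only) of the
# filesystem containing PATH, taken from the POSIX-format df output.
fs_usage_percent() {
    df -P "${1:-/}" | awk 'NR == 2 { print $(NF - 1) }' | tr -d '%'
}

# disk_alert [PATH] [LIMIT]
# Hypothetical helper: prints WARN when usage is at or above LIMIT percent.
disk_alert() {
    da_path="${1:-/}"
    da_limit="${2:-85}"
    da_used=$(fs_usage_percent "$da_path")
    if [ "$da_used" -ge "$da_limit" ]; then
        echo "WARN $da_path at ${da_used}% (limit ${da_limit}%)"
    else
        echo "OK $da_path at ${da_used}%"
    fi
}
```

Running `disk_alert / 85` from a scheduled job gives an early signal that journal, APT cache or snapshot cleanup is due.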
Recovery rule: if the system is unstable after a major update, prefer a known restore point over random manual changes.
Maintenance routine: daily, weekly, monthly and before major changes

A simple routine prevents many incidents. The goal is not to spend hours every day, but to maintain visibility: update status, disk usage, failed services, logs, backup state and restore readiness.

| Frequency | Actions | Commands / tools |
|---|---|---|
| Daily | Check failed services and critical alerts. | systemctl --failed, monitoring. |
| Weekly | Review updates, disk usage and warnings. | apt list --upgradable, df -h. |
| Monthly | Apply updates, reboot if needed, verify backups. | apt upgrade, backup logs. |
| Before major change | Create restore point or snapshot. | Timeshift, VM snapshot, cloud snapshot. |
| After incident | Review logs and add prevention. | journalctl, runbook update. |
Weekly maintenance command block
echo "== SYSTEM =="
                            hostnamectl
                            uptime

                            echo "== UPDATES =="
                            sudo apt update
                            apt list --upgradable

                            echo "== DISK =="
                            df -h

                            echo "== FAILED SERVICES =="
                            systemctl --failed

                            echo "== WARNINGS TODAY =="
                            journalctl -p warning --since today --no-pager | tail -100

                            echo "== REBOOT REQUIRED =="
                            test -f /var/run/reboot-required && cat /var/run/reboot-required || echo "no reboot flag"
Maintenance calendar
Daily
├── monitor alerts
├── failed services
└── backup success

Weekly
├── package update review
├── disk space review
├── log warnings review
└── firewall exposure review

Monthly
├── apply updates
├── reboot if required
├── test restore sample
├── cleanup old logs/snapshots
└── review users and sudo

Before major upgrade
├── backup data
├── Timeshift or VM snapshot
├── record current version
├── apply change
└── verify and document
Server maintenance record
Maintenance record:
                            - date
                            - hostname
                            - Ubuntu version
                            - packages updated
                            - reboot required
                            - reboot performed
                            - services checked
                            - disk usage
                            - backup status
                            - warnings found
                            - actions taken
                            - rollback point
                            - operator
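Part of this record can be filled in automatically so that the operator only adds the judgment calls. A minimal sketch: `maintenance_record` is a hypothetical helper that pre-fills the fields the system already knows and leaves the rest for manual entry.

```shell
# maintenance_record [OPERATOR]
# Hypothetical helper: prints a partially pre-filled maintenance record.
maintenance_record() {
    mr_os="unknown"
    if [ -r /etc/os-release ]; then
        # Read PRETTY_NAME in a subshell so the current shell is not polluted
        mr_os=$(. /etc/os-release && echo "$PRETTY_NAME")
    fi
    mr_reboot="no"
    if [ -f /var/run/reboot-required ]; then
        mr_reboot="yes"
    fi
    cat <<EOF
Maintenance record:
- date: $(date +%Y-%m-%d)
- hostname: $(uname -n)
- Ubuntu version: $mr_os
- kernel: $(uname -r)
- reboot required: $mr_reboot
- operator: ${1:-unknown}
EOF
}
```

The output can be appended to a text file or pasted into a ticket; where the record is stored is left to the team's own conventions.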
Routine rule: the best maintenance routine is the one you can actually repeat. Keep it simple, observable and documented.
Final maintenance and security checklist
Maintenance checklist
[ ] Ubuntu LTS version is known
                            [ ] Package updates are reviewed regularly
                            [ ] Security updates are applied
                            [ ] Reboot-required status is checked
                            [ ] Reboot window exists for servers
                            [ ] Failed services are checked
                            [ ] Disk usage is monitored
                            [ ] Journal size is controlled
                            [ ] APT history is reviewed after updates
                            [ ] Timeshift is configured on desktop/workstation
                            [ ] Restore point is created before major changes
                            [ ] Data backup exists
                            [ ] Restore has been tested
                            [ ] Logs are readable
                            [ ] Maintenance actions are documented
Security checklist
[ ] UFW is enabled when appropriate
                            [ ] Default incoming policy is deny
                            [ ] Only required ports are open
                            [ ] SSH is protected
                            [ ] SSH source is restricted if possible
                            [ ] Root SSH login is disabled
                            [ ] Password SSH is disabled after key test
                            [ ] Users and sudo group are reviewed
                            [ ] Secrets are not world-readable
                            [ ] Backups are protected
                            [ ] Firewall rules are documented
                            [ ] Logs are reviewed after suspicious activity
Command cheat sheet
# Updates
                            sudo apt update
                            apt list --upgradable
                            sudo apt upgrade
                            sudo apt autoremove
                            sudo apt clean
                            test -f /var/run/reboot-required && cat /var/run/reboot-required

                            # Firewall
                            sudo ufw status verbose
                            sudo ufw default deny incoming
                            sudo ufw default allow outgoing
                            sudo ufw allow OpenSSH
                            sudo ufw allow 80/tcp
                            sudo ufw allow 443/tcp
                            sudo ufw enable

                            # Timeshift
                            sudo apt install timeshift
                            sudo timeshift-gtk
                            sudo timeshift --list

                            # Logs
                            journalctl -xe
                            journalctl -p warning --since today
                            journalctl -u SERVICE --since "1 hour ago"
                            journalctl -k --since today
                            sudo tail -100 /var/log/auth.log
                            less /var/log/apt/history.log

                            # Health
                            systemctl --failed
                            df -h
                            free -h
                            ss -lntp
Final rule
A well-maintained Ubuntu system is updated, protected, observable and recoverable.
Apply updates with a rollback plan, restrict network exposure with UFW, create restore points before risky changes, back up real data, read logs when problems occur, and keep a repeatable maintenance routine.
Minimal safe maintenance profile
Minimum safe profile:
                            - updates applied regularly
                            - reboot-required checked
                            - UFW configured
                            - SSH protected
                            - Timeshift or snapshot before major changes
                            - real data backup
                            - restore tested
                            - logs reviewed
                            - failed services checked
                            - disk usage monitored
                            - maintenance documented
2.1 Ubuntu Installation: Desktop, Server, ISO, UEFI, partitions, SSH, cloud-init and clean post-install
Installation scope

Ubuntu installation depends on the target: desktop workstation, production server, cloud VM, container host, lab machine or hardened bastion. The installation itself is only the first step. A clean Ubuntu setup also includes users, SSH, updates, firewall, time sync, storage layout, service baseline, logs, monitoring and backup strategy.

The professional approach is to install with a clear target architecture: what the machine will host, how it will be accessed, how it will be patched, how it will be monitored, and how it can be rebuilt.

| Target | Installer | Key choices | Post-install priority |
|---|---|---|---|
| Desktop workstation | Ubuntu Desktop ISO | GUI, disk encryption, drivers, developer tools. | Updates, IDE, Docker, SSH keys, backups. |
| Server VM | Ubuntu Server ISO or cloud image | SSH, LVM, static IP, no GUI, minimal packages. | Hardening, firewall, systemd, monitoring. |
| Cloud server | Cloud image | cloud-init, SSH key, security group, disk size. | Bootstrap automation, logging, backup, patching. |
| Database server | Server ISO or image | Disk layout, filesystem, I/O, backup volume. | Storage monitoring, backup, security, tuning. |
| Container host | Server LTS | Disk for Docker, cgroups, kernel, network. | Docker, log rotation, registry access, metrics. |
Core rule: an Ubuntu installation is not complete when the machine boots. It is complete when access, security, updates, logs, storage, monitoring and recovery are under control.
Installation flow map
Installation workflow
│
├── Choose target
│       ├── desktop
│       ├── server
│       ├── cloud VM
│       └── container host
│
├── Prepare media
│       ├── download ISO
│       ├── verify checksum
│       ├── create USB key
│       └── boot in UEFI mode
│
├── Install system
│       ├── language and keyboard
│       ├── network
│       ├── disk layout
│       ├── user account
│       ├── SSH server
│       └── base packages
│
└── Post-install
        ├── update packages
        ├── harden SSH
        ├── configure firewall
        ├── enable monitoring
        ├── configure backups
        └── document server
Official URLs
Ubuntu downloads:
                            https://ubuntu.com/download

                            Ubuntu Server documentation:
                            https://documentation.ubuntu.com/server/

                            Ubuntu Desktop documentation:
                            https://documentation.ubuntu.com/desktop/

                            Ubuntu release images:
                            https://releases.ubuntu.com/

                            Ubuntu cloud images:
                            https://cloud-images.ubuntu.com/

                            cloud-init documentation:
                            https://cloudinit.readthedocs.io/
Ubuntu Desktop installation

Ubuntu Desktop installation is designed for workstations: developers, engineers, analysts and general desktop users. The main choices are language, keyboard, network, installation type, disk encryption, user account and optional third-party drivers.

Desktop install path
1. Download Ubuntu Desktop ISO
                            2. Verify checksum if required
                            3. Create bootable USB key
                            4. Boot in UEFI mode
                            5. Select language and keyboard
                            6. Connect to network
                            7. Choose normal or minimal install
                            8. Enable third-party drivers if needed
                            9. Choose disk layout
                            10. Enable encryption if laptop or sensitive data
                            11. Create admin user
                            12. Install and reboot
                            13. Remove USB key
                            14. Run updates
                            15. Install development tools
Choice | Recommended option | Reason
Release | LTS for stable workstation. | Less upgrade pressure.
Install type | Normal for general use, minimal for clean dev setup. | Controls preinstalled apps.
Disk encryption | Yes on laptop. | Protects data if machine is lost.
Third-party drivers | Enable if NVIDIA or Wi-Fi requires it. | Improves hardware compatibility.
Partitioning | Automatic unless dual boot or advanced layout. | Simple and safe for most users.
Desktop post-install developer baseline
# Update system
                            sudo apt update
                            sudo apt upgrade

                            # Install useful tools
                            sudo apt install curl wget git vim htop tree unzip ca-certificates

                            # Install build basics
                            sudo apt install build-essential pkg-config

                            # Install Python essentials
                            sudo apt install python3 python3-venv python3-pip

                            # Check version
                            lsb_release -a
                            uname -a
Developer workstation map
Ubuntu Desktop
│
├── Terminal
├── Git
├── Python / Node / Java / Go
├── Docker Desktop or Docker Engine
├── IDE
├── SSH keys
├── browser dev tools
├── cloud CLIs
└── VPN / security tooling
Desktop rule: for a serious developer machine, keep the OS stable, version your dotfiles, back up important files, use SSH keys, and avoid random system modifications that cannot be reproduced.
Ubuntu Server installation

Ubuntu Server installation is usually text-based and focused on production readiness: network, storage, user account, SSH, package selection and minimal attack surface. A server should normally be installed without a desktop environment.

Server install path
1. Download Ubuntu Server ISO
                            2. Boot in UEFI mode
                            3. Select language and keyboard
                            4. Configure network
                            - DHCP for simple cases
                            - static IP for fixed infrastructure
                            5. Configure proxy if needed
                            6. Configure apt mirror
                            7. Choose disk layout
                            - guided LVM for most servers
                            - manual for advanced storage
                            8. Create admin user
                            9. Install OpenSSH server
                            10. Import SSH key if available
                            11. Select minimal server packages
                            12. Install bootloader
                            13. Reboot
                            14. Connect by SSH
                            15. Run post-install baseline
Server choice | Recommendation | Why
GUI | No GUI on production server. | Lower resource usage and smaller attack surface.
SSH | Install OpenSSH during setup. | Remote administration required.
User | Named sudo user. | Avoid direct root workflow.
Disk | LVM for flexible servers. | Easier resizing and volume management.
Packages | Minimal baseline. | Install only what is needed.
Server install architecture
Bare metal or VM
│
▼
Ubuntu Server installer
│
├── network setup
├── disk layout
├── user creation
├── SSH setup
├── package baseline
└── bootloader
│
▼
First boot
│
├── SSH login
├── update packages
├── harden access
├── configure firewall
├── install services
└── enable monitoring
First server commands
# Update package index and upgrade
                            sudo apt update
                            sudo apt upgrade

                            # Install baseline tools
                            sudo apt install curl wget vim git htop tree unzip net-tools dnsutils

                            # Check services
                            systemctl --failed
                            systemctl status ssh

                            # Check network
                            ip a
                            ip r
                            ss -lntp

                            # Check storage
                            lsblk
                            df -h
Production warning: do not expose a fresh server directly without SSH hardening, firewall rules, update policy and monitoring.
UEFI, BIOS, boot media and installation verification

Modern Ubuntu installations should normally boot in UEFI mode. UEFI affects the boot partition, bootloader installation and compatibility with Secure Boot. If the USB key is booted in legacy BIOS mode, the final installation may not match the target firmware configuration.

Boot concept | Meaning | Practical rule
UEFI | Modern firmware boot mode. | Preferred for new machines and servers.
Legacy BIOS | Older boot mode. | Use only if hardware requires it.
ESP | EFI System Partition. | Required for UEFI boot.
Secure Boot | Firmware validation of boot chain. | Usually supported, but test with custom drivers.
Boot order | Firmware decides which disk or USB boots first. | Verify after installation.
Boot media preparation
Recommended flow:
                            1. Download ISO from official source
                            2. Verify ISO checksum if required
                            3. Write USB with Rufus, Balena Etcher or dd
                            4. Boot USB in UEFI mode
                            5. Install Ubuntu
                            6. Reboot without USB key
                            7. Confirm system boots from target disk
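Step 2 of the flow above can be scripted. The following is a minimal sketch, assuming the ISO and the SHA256SUMS file published next to it have both been downloaded; the function name verify_iso is hypothetical and not part of any Ubuntu tool.

```shell
#!/usr/bin/env bash
# verify_iso ISO SUMS_FILE
# Hypothetical helper: returns 0 only when the SHA-256 of ISO matches
# the entry recorded for the same file name in SUMS_FILE.
verify_iso() {
    local iso="$1" sums="$2" name expected actual
    name="$(basename "$iso")"
    # Entries look like "<hash> *<name>" (binary mode) or "<hash>  <name>" (text mode)
    expected="$(awk -v f="$name" '{ n = $2; sub(/^\*/, "", n); if (n == f) { print $1; exit } }' "$sums")"
    actual="$(sha256sum "$iso" | awk '{ print $1 }')"
    # Fail when the file is not listed at all, or when the hashes differ
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

Typical use would be something like: verify_iso ubuntu.iso SHA256SUMS && echo "checksum OK" (file names are placeholders). A matching checksum shows the download is intact; confirming that the SHA256SUMS file itself is authentic requires checking its published GPG signature separately.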
UEFI disk layout sketch
Disk /dev/sda
│
├── EFI System Partition
│   ├── size: 512 MB to 1 GB
│   ├── filesystem: FAT32
│   └── mount: /boot/efi
│
├── /boot
│   ├── optional separate partition
│   └── kernel and initramfs
│
├── LVM physical volume
│   ├── root volume /
│   ├── var volume /var
│   ├── home volume /home
│   └── swap volume or swapfile
│
└── free space or data volumes
Boot verification commands
# Check if system booted in UEFI mode
                            test -d /sys/firmware/efi && echo "UEFI boot" || echo "Legacy boot"

                            # Show block devices
                            lsblk -f

                            # Show EFI boot entries
                            sudo efibootmgr -v

                            # Show mounted filesystems
                            findmnt

                            # Show boot partition
                            findmnt /boot/efi
UEFI rule: boot the installer in the same mode you want the installed system to use. Mixing legacy and UEFI often creates bootloader confusion.
Disk layout, partitions, LVM, encryption and swap

Disk layout should reflect the machine's role. A laptop usually benefits from full-disk encryption. A server often benefits from LVM. A database server needs careful storage planning. A Docker host needs enough space under /var/lib/docker.

Pattern | Best for | Strength | Watch out
Automatic layout | Desktop, lab, simple VM. | Fast and low-risk. | Less control over growth areas.
LVM | Servers and VMs. | Flexible resizing and volume management. | Requires basic LVM knowledge.
Encrypted disk | Laptops and sensitive systems. | Protects data at rest. | Remote boot can be harder.
Separate /var | Servers with logs, caches, Docker. | Protects root filesystem from log growth. | Size must be planned.
Separate data volume | Database and application data. | Cleaner backup and scaling. | Mount and permission discipline required.
Example server layout
Small web server:
                            - /boot/efi     512 MB to 1 GB
                            - /             30 GB to 50 GB
                            - /var          20 GB to 100 GB
                            - /home         optional
                            - swap          swapfile or LV
                            - /srv          application data if needed

                            Docker host:
                            - /             30 GB to 50 GB
                            - /var          large volume
                            - /var/lib/docker on dedicated volume if possible

                            Database host:
                            - /             30 GB to 50 GB
                            - /var/log      separate or monitored
                            - /data         dedicated fast volume
                            - /backup       separate volume or external storage
Disk decision tree
Is it a laptop?
├── yes -> enable disk encryption
└── no
    │
    ▼
    Is it a production server?
    ├── yes -> prefer LVM or cloud volume strategy
    └── no -> automatic layout is acceptable

Will logs, Docker or DB grow?
├── yes -> separate /var or data volume
└── no -> simple root filesystem

Need easy snapshot/resize?
├── yes -> LVM or cloud block volumes
└── no -> simple partitioning
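The decision tree above reads as three independent questions. As a rough sketch only (the function name and its yes/no arguments are invented for illustration), it could be encoded like this:

```shell
#!/usr/bin/env bash
# recommend_layout LAPTOP PRODUCTION GROWTH RESIZE   (each argument: yes|no)
# Illustrative encoding of the decision tree; prints one recommendation per question.
recommend_layout() {
    local laptop="$1" production="$2" growth="$3" resize="$4"

    # Question 1: laptop first, then production server
    if [ "$laptop" = yes ]; then
        echo "enable disk encryption"
    elif [ "$production" = yes ]; then
        echo "prefer LVM or cloud volume strategy"
    else
        echo "automatic layout is acceptable"
    fi

    # Question 2: will logs, Docker or a database grow?
    if [ "$growth" = yes ]; then
        echo "separate /var or data volume"
    else
        echo "simple root filesystem"
    fi

    # Question 3: is easy snapshot/resize needed?
    if [ "$resize" = yes ]; then
        echo "LVM or cloud block volumes"
    else
        echo "simple partitioning"
    fi
}
```

For example, recommend_layout no yes yes no describes a production server with growing data and no special resize needs. The answers are combined by a human: the sketch does not try to reconcile recommendations that pull in different directions.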
Storage inspection commands
# Show disks and partitions
                            lsblk

                            # Show filesystems
                            lsblk -f

                            # Show disk usage
                            df -h

                            # Show directory usage
                            sudo du -sh /var/*

                            # Show mounts
                            findmnt

                            # Show LVM volumes
                            sudo pvs
                            sudo vgs
                            sudo lvs

                            # Show swap
                            swapon --show
                            free -h
Production risk: if /var fills up, logs, Docker, package installs, databases and services can fail. Monitor disk usage from day one.
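Monitoring disk usage can start with a very small check. The sketch below assumes a POSIX df and an arbitrary threshold chosen by the operator; the function names usage_percent and check_usage are hypothetical, and a real setup would normally hand this job to a monitoring agent.

```shell
#!/usr/bin/env bash
# usage_percent MOUNT: print the used-space percentage of MOUNT as a bare number.
usage_percent() {
    # -P forces the portable one-line-per-filesystem format; column 5 is "Capacity"
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_usage MOUNT LIMIT: return 1 and print a warning when usage >= LIMIT percent.
check_usage() {
    local mount="$1" limit="$2" used
    used="$(usage_percent "$mount")"
    # Empty output means df could not inspect the path
    [ -n "$used" ] || { echo "ERROR: cannot read usage for $mount" >&2; return 2; }
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $mount is at ${used}% (limit ${limit}%)"
        return 1
    fi
    echo "OK: $mount is at ${used}%"
}
```

A call such as check_usage /var 80 could run from cron or a systemd timer; the value 80 is only an example and should match how fast the volume is expected to grow.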
Network, SSH and remote access baseline

Server installation must make remote access reliable and safe. The minimum baseline is: one named sudo user, SSH key access, password authentication disabled where possible, root login disabled, firewall enabled and only required ports opened.

Area | Baseline | Reason
Admin user | Named user with sudo rights. | Audit and safer administration.
SSH keys | Key-based access. | Stronger than passwords.
Root login | Disabled. | Reduces brute-force exposure and blast radius.
Password auth | Disabled after key validation. | Reduces attack surface.
Firewall | Default deny incoming. | Only expose required services.
Network config | DHCP for simple cases, static for infrastructure. | Predictable access.
SSH hardening example
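# Backup SSH config
sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak

# Edit SSH config
sudo vim /etc/ssh/sshd_config

# Recommended directives (lines to set inside sshd_config, not shell commands)
# AllowUsers assumes the admin account is named "deploy"; adjust to your user
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers deploy

# Validate, and restart only if the configuration is valid
sudo sshd -t && sudo systemctl restart ssh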
Firewall baseline
# Set default policies
                            sudo ufw default deny incoming
                            sudo ufw default allow outgoing

                            # Allow SSH
                            sudo ufw allow OpenSSH

                            # Web server example
                            sudo ufw allow 80/tcp
                            sudo ufw allow 443/tcp

                            # Enable firewall
                            sudo ufw enable

                            # Check status
                            sudo ufw status verbose
Network diagnostic commands
# IP addresses
                            ip a

                            # Routes
                            ip r

                            # DNS status
                            resolvectl status

                            # Listening ports
                            ss -lntp

                            # Test local service
                            curl -I http://localhost

                            # Test remote host
                            ping -c 3 8.8.8.8

                            # Trace path
                            tracepath ubuntu.com
Important: before disabling SSH password authentication, verify that key-based login works in a second terminal. Otherwise, you can lock yourself out.
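Part of this baseline can be checked mechanically. The following is a sketch under stated assumptions: it reads a single configuration file, ignores Include files and Match blocks, and uses hypothetical function names (sshd_setting, audit_sshd). For the effective configuration of a running system, sudo sshd -T remains the more reliable source.

```shell
#!/usr/bin/env bash
# sshd_setting KEYWORD FILE: print the first value given for KEYWORD in FILE, lowercased.
# sshd uses the first value it obtains for each keyword.
sshd_setting() {
    local keyword file="$2"
    keyword="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    awk -v k="$keyword" '
        /^[ \t]*#/ { next }                            # skip commented lines
        tolower($1) == k { print tolower($2); exit }   # keywords are case-insensitive
    ' "$file"
}

# audit_sshd FILE: return non-zero unless root login and password login are explicitly "no".
audit_sshd() {
    local file="$1" status=0 key
    for key in PermitRootLogin PasswordAuthentication; do
        if [ "$(sshd_setting "$key" "$file")" != "no" ]; then
            echo "REVIEW: $key is not set to 'no' in $file"
            status=1
        fi
    done
    return "$status"
}
```

Running audit_sshd /etc/ssh/sshd_config after editing gives a quick second look before the restart. It does not replace the key-login test in a second terminal described above.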
cloud-init for automated server bootstrap

cloud-init is the standard way to initialize Ubuntu cloud images. It can create users, install packages, add SSH keys, write files, run commands, configure timezone and prepare the machine during first boot.

cloud-init feature | Usage | Example
users | Create admin users. | deploy user with sudo.
ssh_authorized_keys | Install public keys. | Key-based access from first boot.
packages | Install baseline packages. | curl, git, htop, nginx.
write_files | Create config files. | systemd unit, app config, banner.
runcmd | Run final bootstrap commands. | enable firewall, restart service.
package_update | Refresh apt cache. | Update before package install.
Minimal cloud-init example
#cloud-config
package_update: true
package_upgrade: true

users:
  - name: deploy
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh_authorized_keys:
      - ssh-ed25519 AAAA_REPLACE_WITH_PUBLIC_KEY deploy-key

packages:
  - curl
  - wget
  - git
  - vim
  - htop
  - ufw

runcmd:
  - ufw allow OpenSSH
  - ufw --force enable
  - timedatectl set-timezone UTC
cloud-init lifecycle
Cloud VM first boot
│
▼
cloud-init starts
│
├── reads metadata
├── reads user-data
├── configures hostname
├── creates users
├── installs SSH keys
├── installs packages
├── writes files
├── runs commands
└── marks initialization done
│
▼
Server ready for automation
├── Ansible
├── deploy pipeline
├── monitoring
└── application install
cloud-init diagnostics
# Show cloud-init status
                            cloud-init status

                            # Wait until finished
                            cloud-init status --wait

                            # Inspect logs
                            sudo less /var/log/cloud-init.log
                            sudo less /var/log/cloud-init-output.log

                            # Validate config file if tool is available
                            cloud-init schema --config-file user-data.yaml

                            # Re-run is not trivial on production
                            # Prefer rebuilding disposable cloud instances
Cloud rule: use cloud-init for first-boot bootstrap, then use Ansible, Terraform, scripts or configuration management for repeatable lifecycle operations.
Clean post-install baseline

Post-install is where a raw Ubuntu machine becomes a clean operating platform. The goal is to make the system secure, updated, observable, recoverable and ready for application deployment.

Post-install baseline commands
# Update system
                            sudo apt update
                            sudo apt upgrade

                            # Install useful admin tools
                            sudo apt install curl wget vim git htop tree unzip ca-certificates dnsutils

                            # Set timezone
                            timedatectl
                            sudo timedatectl set-timezone UTC

                            # Check failed units
                            systemctl --failed

                            # Check logs
                            journalctl -p warning --since today

                            # Check reboot requirement
                            test -f /var/run/reboot-required && cat /var/run/reboot-required
Post-install action | Command / file | Why
Update packages | apt update && apt upgrade | Apply latest security fixes.
Create admin user | adduser, usermod -aG sudo | Avoid root workflow.
Harden SSH | /etc/ssh/sshd_config | Reduce remote access risk.
Enable firewall | ufw | Expose only required ports.
Configure time | timedatectl | Correct logs and certificates.
Install monitoring | agent or exporter | Detect issues early.
Clean server baseline
Fresh Ubuntu Server
│
├── system update
├── admin user
├── SSH key access
├── root login disabled
├── password auth disabled
├── firewall enabled
├── timezone configured
├── monitoring installed
├── log policy checked
├── backups configured
├── service manager ready
└── runbook documented
Server documentation template
Server record:
                            - hostname
                            - Ubuntu version
                            - kernel version
                            - role
                            - owner
                            - public IP
                            - private IP
                            - SSH port
                            - open firewall ports
                            - installed services
                            - data volumes
                            - backup policy
                            - monitoring URL
                            - patching window
                            - rollback method
                            - emergency contact
Professional habit: document the server immediately after installation. Six months later, this avoids guessing what the machine is and how it was built.
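Some fields of the server record can be collected from the machine itself. This is a minimal sketch, assuming /etc/os-release is present (it is on Ubuntu); the function name server_record is hypothetical, and the remaining fields (owner, backup policy, contacts and so on) still have to be written by a person.

```shell
#!/usr/bin/env bash
# server_record: print the record fields that can be read from the system itself.
server_record() {
    local os="unknown"
    if [ -r /etc/os-release ]; then
        # PRETTY_NAME is a standard os-release key holding the distribution name and version.
        # The command substitution runs in a subshell, so sourcing does not leak variables.
        os="$(. /etc/os-release && printf '%s' "${PRETTY_NAME:-unknown}")"
    fi
    echo "hostname: $(uname -n)"
    echo "Ubuntu version: $os"
    echo "kernel version: $(uname -r)"
}
```

The output can be redirected into whatever file or wiki page holds the record, for example server_record > server-record.txt (the file name is a placeholder).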
Installation troubleshooting
Problem | Likely cause | First diagnostic | Correction
USB does not boot | Bad USB image, wrong boot mode, firmware order. | Check UEFI boot menu. | Rewrite USB, select UEFI USB entry.
Installed system does not boot | Bootloader installed in wrong mode or disk. | Check UEFI entries. | Repair bootloader or reinstall in correct mode.
No network during install | Driver, cable, DHCP, VLAN, Wi-Fi issue. | Check link and IP. | Use wired network or configure static IP.
Cannot SSH after install | SSH not installed, firewall, wrong IP, bad key. | Console login and systemctl status ssh. | Install SSH, fix firewall, verify key.
Disk full after install | Small root, logs, Docker, wrong partition plan. | df -h, du -sh. | Resize volume, clean logs, separate /var.
Package install fails | Broken apt state, no DNS, mirror issue. | apt update, DNS check. | Fix DNS, mirror, dpkg configure.
Diagnostic decision tree
Fresh server problem
│
├── Does it boot?
│   ├── no -> UEFI, bootloader, disk
│   └── yes
│
├── Does it have network?
│   ├── no -> IP, route, DNS, driver
│   └── yes
│
├── Can you SSH?
│   ├── no -> ssh service, firewall, key, IP
│   └── yes
│
├── Are packages working?
│   ├── no -> DNS, apt mirror, dpkg lock
│   └── yes
│
└── Are baseline services healthy?
    ├── no -> systemctl and journalctl
    └── yes -> server ready
Emergency commands
# SSH service
                            sudo systemctl status ssh
                            sudo systemctl restart ssh

                            # Firewall
                            sudo ufw status verbose

                            # Network
                            ip a
                            ip r
                            resolvectl status

                            # Package repair
                            sudo dpkg --configure -a
                            sudo apt -f install
                            sudo apt update

                            # Logs
                            journalctl -p err --since today
                            dmesg -T | tail -100
Incident rule: on a new install, most failures come from boot mode, network, SSH, firewall, disk or the package mirror. Diagnose in that order.
Final installation checklist
Before install
[ ] Target role is defined
                            [ ] Ubuntu edition selected
                            [ ] LTS version selected
                            [ ] ISO downloaded from official source
                            [ ] Checksum verified if required
                            [ ] Boot media created
                            [ ] UEFI mode confirmed
                            [ ] Disk layout planned
                            [ ] Static IP or DHCP decision made
                            [ ] Hostname chosen
                            [ ] Admin user chosen
                            [ ] SSH key available
                            [ ] Backup of existing data done
                            [ ] Rollback plan exists if replacing server
During install
[ ] Correct disk selected
                            [ ] EFI partition created if UEFI
                            [ ] LVM selected if server needs flexibility
                            [ ] Encryption enabled if needed
                            [ ] OpenSSH server installed
                            [ ] Admin user created
                            [ ] Network works
                            [ ] Bootloader installed correctly
                            [ ] Machine reboots without USB
After install
[ ] System updated
                            [ ] Reboot performed if required
                            [ ] SSH key login tested
                            [ ] Root SSH login disabled
                            [ ] Password SSH disabled after key validation
                            [ ] Firewall enabled
                            [ ] Only required ports open
                            [ ] Timezone and time sync configured
                            [ ] Hostname correct
                            [ ] Disk usage checked
                            [ ] Failed systemd units checked
                            [ ] Monitoring installed
                            [ ] Backup configured
                            [ ] Server documented
                            [ ] Snapshot or image created if needed
Final rule
A clean Ubuntu installation is reproducible, secure and observable.
The machine should boot correctly, be reachable through controlled SSH, have a clear disk layout, expose only required ports, receive updates, produce usable logs, be monitored, be backed up and be documented.
Minimal safe server baseline
Minimum safe server:
                            - Ubuntu Server LTS
                            - named sudo user
                            - SSH key access
                            - root login disabled
                            - firewall enabled
                            - system updated
                            - timezone configured
                            - disk monitored
                            - logs accessible
                            - backup and rollback plan
                            - server record documented
2.2 Ubuntu CLI Basics: files, users, permissions, services, logs, network, storage and troubleshooting
What “Ubuntu CLI basics” means

The Ubuntu command line is the operational control layer of a Linux server. It is used to inspect files, manage users, control services, read logs, diagnose network problems, check storage, install packages, secure access and troubleshoot production incidents.

A good sysadmin workflow is not memorizing thousands of commands. It is knowing which subsystem to inspect first: files, permissions, users, service manager, logs, network, storage, packages or security.

Area | Purpose | Main tools | Typical question
Files | Navigate, copy, move, inspect, search. | ls, cp, mv, find, du | Where is the file? How large is it?
Permissions | Control who can read, write or execute. | chmod, chown, umask, stat | Why can this process not access this file?
Users | Create accounts, groups and sudo rights. | adduser, usermod, id, sudo | Who can administer this machine?
Services | Start, stop, enable and debug daemons. | systemctl, journalctl | Is Nginx, SSH, Redis or PostgreSQL running?
Logs | Understand what happened. | journalctl, tail, grep | What error occurred and when?
Network | Inspect IP, routes, DNS, ports, sockets. | ip, ss, curl, dig | Can the server reach or expose the service?
Storage | Inspect disks, mounts, free space, I/O. | df, du, lsblk, findmnt | Is the disk full or mounted correctly?
Core rule: in production, do not guess. Inspect facts first: service status, logs, ports, permissions, disk space, memory and recent changes.
CLI diagnostic mental model
Problem on Ubuntu
│
├── Is the file present?
│   └── ls, find, stat
│
├── Are permissions correct?
│   └── ls -l, chmod, chown, id
│
├── Is the service running?
│   └── systemctl status
│
├── What do logs say?
│   └── journalctl, tail, grep
│
├── Is the port listening?
│   └── ss -lntp
│
├── Is the network path OK?
│   └── ip, ping, curl, dig
│
├── Is storage full?
│   └── df, du, lsblk
│
└── Did something recently change?
    └── apt history, logs, config diff
First 60 seconds on a server
hostnamectl
                            uptime
                            who
                            df -h
                            free -h
                            systemctl --failed
                            ss -lntp
                            journalctl -p warning --since "30 min ago"
Bad reflex: restarting random services without reading logs. It may hide the root cause and make the incident harder to understand.
Files and directories: navigate, inspect, copy, search

Most Ubuntu administration starts with files: configuration files, logs, service units, application folders, SSH keys, certificates, scripts, backups and data directories.

Command | Usage | Example
pwd | Show current directory. | pwd
ls -lah | List files with details and hidden files. | ls -lah /etc/nginx
cd | Change directory. | cd /var/log
cp -a | Copy while preserving attributes. | cp -a app app.bak
mv | Move or rename. | mv old.conf new.conf
rm | Remove files. | rm old.log
find | Search files by name, type, age or size. | find /var/log -type f -name "*.log"
du -sh | Show directory size. | du -sh /var/lib/docker
Essential file commands
# Where am I?
                            pwd

                            # List files with permissions, owner, size and hidden files
                            ls -lah

                            # Copy a directory safely, preserving metadata
                            cp -a /etc/nginx /etc/nginx.bak

                            # Move or rename
                            mv app.conf app.conf.disabled

                            # Remove carefully
                            rm file.txt

                            # Dangerous: recursive delete
                            rm -rf path

                            # Find recent logs
                            find /var/log -type f -name "*.log" -mtime -7

                            # Find large files
                            find /var -type f -size +100M -exec ls -lh {} \;
Linux filesystem map
/
├── bin      essential binaries
├── boot     kernel and boot files
├── dev      devices
├── etc      system configuration
├── home     user home directories
├── lib      system libraries
├── media    removable media
├── mnt      temporary mounts
├── opt      optional software
├── proc     kernel/process virtual filesystem
├── root     root user home
├── run      runtime state
├── sbin     system binaries
├── srv      service/application data
├── sys      kernel/device virtual filesystem
├── tmp      temporary files
├── usr      user-space programs and libraries
└── var      logs, cache, spool, databases, runtime data
Useful inspection commands
# Show file type
file /path/to/file

# Show file metadata
stat /path/to/file

# Read first lines
head -n 50 /var/log/syslog

# Read last lines
tail -n 100 /var/log/syslog

# Follow a log live
tail -f /var/log/syslog

# Search inside files
grep -R "error" /etc/nginx

# Compare two files
diff -u old.conf new.conf
Production habit: before editing a config file, create a timestamped backup: sudo cp -a file file.bak.$(date +%Y%m%d-%H%M%S).
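The backup habit above can be wrapped in a small helper so it is applied the same way every time. This is a minimal sketch: the function name backup_file is hypothetical, and it assumes GNU cp and date as shipped with Ubuntu.

```shell
# backup_file: copy FILE to FILE.bak.<timestamp>, preserving mode, owner and mtime.
# Hypothetical helper; run the calling script with sudo for root-owned files.
backup_file() {
    local src="$1" dest
    if [ ! -e "$src" ]; then
        echo "backup_file: $src does not exist" >&2
        return 1
    fi
    dest="${src}.bak.$(date +%Y%m%d-%H%M%S)"
    # Print the backup path only if the copy succeeded
    cp -a -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

Calling backup_file /etc/nginx/nginx.conf before an edit prints the path of the copy, which can then be used with diff -u to review the change.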
Permissions: rwx, ownership, groups, umask and safe defaults

Linux permissions define who can read, write or execute a file. Many permission-related failures on Ubuntu servers come down to one of these: a wrong owner, a wrong group, a missing execute bit on a directory, an overly permissive file, incorrect SSH key permissions or a service user that cannot access the application files.

Permission notation
Example:
-rw-r--r-- 1 root root 1200 app.conf

Breakdown:
-       file type
rw-     owner permissions
r--     group permissions
r--     others permissions

r = read
w = write
x = execute / enter directory
Mode | Meaning | Typical use
600 | Owner read/write only. | Private keys, secrets.
644 | Owner write, everyone read. | Config files, static files.
700 | Owner full access only. | Private directories, .ssh.
755 | Owner write, everyone read/execute. | Directories, scripts, web static dirs.
777 | Everyone can do everything. | Almost never acceptable.
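To make the link between the octal modes in the table and the rwx notation explicit, the sketch below converts one into the other. The function name mode_to_symbolic is hypothetical; each octal digit is read as three bits (4 = read, 2 = write, 1 = execute).

```shell
# mode_to_symbolic: translate a three-digit octal mode into rwx notation.
mode_to_symbolic() {
    local mode="$1" rest digit out=""
    case "$mode" in
        [0-7][0-7][0-7]) ;;
        *) echo "mode_to_symbolic: expected three octal digits" >&2; return 1 ;;
    esac
    rest="$mode"
    while [ -n "$rest" ]; do
        digit="${rest%"${rest#?}"}"   # first remaining character
        rest="${rest#?}"              # drop it for the next round
        if [ $(( digit & 4 )) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
        if [ $(( digit & 2 )) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
        if [ $(( digit & 1 )) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
    done
    printf '%s\n' "$out"
}

# Print the modes from the table next to their symbolic form
for m in 600 644 700 755 777; do
    printf '%s  %s\n' "$m" "$(mode_to_symbolic "$m")"
done
```

The first three characters apply to the owner, the next three to the group and the last three to everyone else, which is the same order ls -l uses.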
Permission commands
# Show permissions
ls -lah /srv/app

# Show user and group identity
id deploy

# Change owner
sudo chown deploy:www-data /srv/app

# Change owner recursively
sudo chown -R deploy:www-data /srv/app

# Change file permissions
chmod 644 config.ini

# Change directory permissions
chmod 755 /srv/app

# SSH key permissions (authorized_keys should not be writable by group or others)
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_ed25519
chmod 600 ~/.ssh/authorized_keys

# Show default creation mask
umask
Permission troubleshooting flow
Permission denied
│
├── Which user runs the process?
│       └── ps aux | grep service
│
├── Who owns the file?
│       └── ls -lah file
│
├── Can the user access parent directories?
│       └── namei -l /path/to/file
│
├── Is the group correct?
│       └── id user
│
└── Are permissions too strict or too broad?
        └── chmod / chown carefully
Production rule: never solve permission problems with chmod 777. Fix ownership, groups and minimal required permissions.
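One way to check whether the rule above has already been broken is to look for paths that any user can write to. The following is a sketch only: list_world_writable is a hypothetical name and /srv/app in the usage note is just the example path used earlier.

```shell
# list_world_writable: print files and directories under DIR that any user can write to.
# -xdev stays on one filesystem; symlinks are skipped because their mode bits are not meaningful.
list_world_writable() {
    find "$1" -xdev ! -type l -perm -0002 -print
}
```

Running list_world_writable /srv/app should normally print nothing. Directories such as /tmp are world-writable by design, so results need to be read in context rather than fixed blindly.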
Users, groups, sudo and SSH access

Ubuntu administration should use named users with sudo privileges, not direct root logins. This improves traceability, reduces operational risk and supports least-privilege access. For production, SSH keys should be preferred over passwords.

Task | Command | Purpose
Create user | sudo adduser deploy | Create named account.
Add sudo rights | sudo usermod -aG sudo deploy | Allow admin actions.
Inspect identity | id deploy | Show UID, GID and groups.
Show groups | groups deploy | Confirm group membership.
Check sudo rights | sudo -l | Show allowed sudo commands.
Lock account | sudo usermod -L user | Lock the password; SSH key login still works.
User management examples
# Create admin user
sudo adduser deploy
sudo usermod -aG sudo deploy

# Check user
id deploy
groups deploy

# Switch user
su - deploy

# Test sudo permissions
sudo -l

# Add user to web group
sudo usermod -aG www-data deploy

# Lock user password
sudo passwd -l deploy
SSH access model
Admin workstation
│
├── private key
└── public key
        │
        ▼
Ubuntu server
│
├── /home/deploy/.ssh/authorized_keys
├── sshd service
├── firewall allows SSH
└── sudo controls privilege escalation
SSH hardening baseline
# Backup SSH config
sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak

# Recommended settings in /etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers deploy

# Validate syntax
sudo sshd -t

# Restart SSH
sudo systemctl restart ssh

# Check logs
journalctl -u ssh --since today
Access control checklist
[ ] One named admin user
[ ] SSH key installed
[ ] User belongs to sudo group only if required
[ ] Root SSH login disabled
[ ] Password authentication disabled after key test
[ ] Unused users disabled
[ ] sudoers changes made with visudo
[ ] SSH access logs reviewed
Safe habit: keep one open SSH session while changing SSH configuration, then test a second connection before closing the first one.
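Part of the checklist can be verified mechanically. The sketch below reads a sshd_config-style file and compares three of the baseline settings with the values recommended above. Both function names are hypothetical, and the parser is deliberately simple: it assumes "Keyword value" lines and does not follow Include directives or Match blocks.

```shell
# sshd_setting: print the first active value of KEY in a sshd_config-style FILE.
# Keyword names are case-insensitive; commented-out lines are ignored.
sshd_setting() {
    awk -v key="$2" '
        /^[[:space:]]*#/ { next }
        tolower($1) == tolower(key) { print $2; exit }
    ' "$1"
}

# check_ssh_baseline: report each baseline setting; return non-zero if any differs.
check_ssh_baseline() {
    local file="$1" status=0 pair key want got
    for pair in PermitRootLogin=no PasswordAuthentication=no PubkeyAuthentication=yes; do
        key="${pair%%=*}"
        want="${pair#*=}"
        got="$(sshd_setting "$file" "$key")"
        if [ "$got" = "$want" ]; then
            echo "OK    $key $got"
        else
            echo "CHECK $key is '${got:-unset}', expected '$want'"
            status=1
        fi
    done
    return "$status"
}
```

A typical call is check_ssh_baseline /etc/ssh/sshd_config. Because drop-in files can override the main file, sudo sshd -T, which prints the effective configuration, remains the more reliable check on a real server.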
Services with systemd: status, start, stop, enable, logs

Ubuntu uses systemd to manage services. A service can be running now, enabled at boot, failed, disabled, masked or waiting on dependencies. Most production daemons such as SSH, Nginx, PostgreSQL, Redis, Docker, Gunicorn and Celery are managed by systemd.

Command | Meaning | Example
status | Show state, PID, recent logs. | systemctl status nginx
start | Start now. | sudo systemctl start nginx
stop | Stop now. | sudo systemctl stop nginx
restart | Stop and start again. | sudo systemctl restart nginx
reload | Reload config without full restart if supported. | sudo systemctl reload nginx
enable | Start automatically at boot. | sudo systemctl enable nginx
disable | Do not start automatically at boot. | sudo systemctl disable nginx
Essential systemd commands
# Service status
systemctl status nginx

# Start / stop / restart
sudo systemctl start nginx
sudo systemctl stop nginx
sudo systemctl restart nginx

# Enable at boot
sudo systemctl enable nginx

# Disable at boot
sudo systemctl disable nginx

# Show failed services
systemctl list-units --type=service --state=failed

# Show enabled services
systemctl list-unit-files --type=service --state=enabled
Service troubleshooting flow
Service is down
│
├── Check status
│       └── systemctl status service
│
├── Read service logs
│       └── journalctl -u service
│
├── Validate config
│       └── nginx -t / sshd -t / app-specific check
│
├── Check port binding
│       └── ss -lntp
│
├── Check permissions
│       └── ls -lah, id service-user
│
└── Restart only after understanding error
        └── systemctl restart service
Custom service unit example
[Unit]
Description=Gunicorn Django application
After=network.target

[Service]
User=deploy
Group=www-data
WorkingDirectory=/srv/myapp
Environment="DJANGO_SETTINGS_MODULE=config.settings"
ExecStart=/srv/myapp/venv/bin/gunicorn config.wsgi:application \
    --bind 127.0.0.1:8000 \
    --workers 3
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
Install custom unit
sudo cp gunicorn.service /etc/systemd/system/gunicorn.service
sudo systemctl daemon-reload
sudo systemctl enable gunicorn
sudo systemctl start gunicorn
systemctl status gunicorn
journalctl -u gunicorn -f
Reflex: if a service fails, use systemctl status then journalctl -u service. Do not debug blindly.
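The reflex above can be captured in one function that always gathers the same evidence before anything is restarted. This is a sketch under the assumption that systemctl and journalctl are available. The name triage_service and the default of 30 log lines are illustrative choices.

```shell
# triage_service: print run state, boot-time enablement and the last log lines for one unit.
# Each command is allowed to fail so the report is always complete.
triage_service() {
    local unit="$1" lines="${2:-30}"
    echo "== ACTIVE =="
    systemctl is-active "$unit" || true
    echo "== ENABLED =="
    systemctl is-enabled "$unit" || true
    echo "== LAST LOGS =="
    journalctl -u "$unit" -n "$lines" --no-pager || true
}
```

For example, triage_service gunicorn 50 shows whether the unit is running, whether it starts at boot and its 50 most recent journal lines in one screen.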
Logs: journald, syslog, auth logs and application logs

Logs record what the system and its services reported at the time of an incident. On Ubuntu, systemd logs are read with journalctl. Traditional logs often live under /var/log. Applications may log to journald, files, Docker logs or external observability tools.

Log source | What it contains | Command
systemd journal | Service logs and system events. | journalctl
Service unit logs | Specific service output. | journalctl -u nginx
Auth logs | SSH, sudo, authentication events. | /var/log/auth.log
Syslog | General system messages. | /var/log/syslog
Kernel logs | Kernel and hardware messages. | dmesg
Application logs | App-specific runtime errors. | App path, journald or Docker logs.
journalctl essentials
# Recent critical context
journalctl -xe

# Logs for one service
journalctl -u nginx

# Follow service logs live
journalctl -u nginx -f

# Logs since a time
journalctl -u nginx --since "1 hour ago"

# Logs since today
journalctl -u ssh --since today

# Warnings and errors
journalctl -p warning --since today

# Boot logs
journalctl -b

# Previous boot
journalctl -b -1
Classic log commands
# Last lines
tail -n 200 /var/log/syslog
tail -n 200 /var/log/auth.log

# Follow file live
tail -f /var/log/syslog

# Search errors
grep -i "error" /var/log/syslog

# Search SSH failures
grep -i "failed" /var/log/auth.log

# Compressed rotated logs
zgrep -i "error" /var/log/syslog.*.gz

# Kernel recent messages
dmesg -T | tail -n 100
Log diagnosis map
Incident type
│
├── Service fails
│       └── journalctl -u service
│
├── SSH login issue
│       └── journalctl -u ssh, /var/log/auth.log
│
├── Kernel or hardware issue
│       └── dmesg -T
│
├── Package install issue
│       └── /var/log/apt/history.log
│
├── Web server issue
│       └── nginx/apache logs + journal
│
└── App issue
        └── app logs + service journal
Apt history
# See package changes
less /var/log/apt/history.log

# See apt terminal output
less /var/log/apt/term.log
Production habit: always include time windows in log commands. It reduces noise: --since "30 min ago".
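Once a time window is chosen, the log text can be summarized instead of read line by line. The sketch below counts failed SSH password attempts per source address. The function name is hypothetical, and it assumes the usual OpenSSH wording "Failed password for ... from ADDRESS port ...". On a server with password authentication disabled there may be no such lines at all.

```shell
# failed_ssh_by_ip: count "Failed password" lines per source address, busiest first.
# Reads log text on stdin, so it works with auth.log, zcat output or journalctl output.
failed_ssh_by_ip() {
    awk '/Failed password/ {
            # The address is the field that follows the word "from"
            for (i = 1; i < NF; i++) if ($i == "from") { count[$(i + 1)]++; break }
         }
         END { for (ip in count) print count[ip], ip }' | sort -rn
}
```

It can be fed from either log source, for example journalctl -u ssh --since "1 hour ago" --no-pager | failed_ssh_by_ip, or sudo cat /var/log/auth.log | failed_ssh_by_ip.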
Network: IP, routes, DNS, ports, sockets, firewall

Network troubleshooting should follow a strict order: local IP, route, DNS, firewall, listening socket, service health, upstream application. This avoids confusing a DNS issue with a service issue, or a firewall issue with an application crash.

Layer | Question | Command
Interface | Does the server have an IP? | ip a
Route | Does it know where to send traffic? | ip r
DNS | Can names resolve? | resolvectl status, dig
Port | Is the service listening? | ss -lntp
Firewall | Is traffic allowed? | ufw status verbose
HTTP test | Does the endpoint respond? | curl -I
Network essentials
# IP addresses
ip a

# Routes
ip r

# Listening TCP ports with process
ss -lntp

# All TCP sockets, listening and established, with process
ss -antp

# DNS status
resolvectl status

# DNS query
dig example.com

# HTTP check
curl -I https://example.com

# Basic reachability
ping -c 3 1.1.1.1

# Path test
tracepath example.com
Network troubleshooting flow
Network problem
│
├── Local IP present?
│       └── ip a
│
├── Default route present?
│       └── ip r
│
├── DNS working?
│       └── dig domain
│
├── Firewall allows traffic?
│       └── ufw status verbose
│
├── Service listening?
│       └── ss -lntp
│
├── Local curl works?
│       └── curl -I http://localhost
│
└── Remote curl works?
        └── curl -I https://public-domain
Firewall commands
# Status
sudo ufw status verbose

# Default rules
sudo ufw default deny incoming
sudo ufw default allow outgoing

# Allow SSH
sudo ufw allow OpenSSH

# Allow web ports
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Enable
sudo ufw enable

# Delete a rule
sudo ufw delete allow 80/tcp
Pattern: IP → route → DNS → firewall → listening port → service logs → application.
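The "listening port" step of this pattern is easy to script. The sketch below reads the output of ss -lnt on stdin and succeeds only if a given port is bound. The function name is hypothetical, and it assumes the TCP-only column layout of ss -lnt, where the local address is the fourth field.

```shell
# is_listening: succeed if PORT appears as a local listening TCP port.
# Expects the output of `ss -lnt` on stdin.
is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For the Gunicorn example used earlier, ss -lnt | is_listening 8000 && echo "port 8000 is bound" confirms the upstream is listening before Nginx or the firewall are suspected.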
Storage: disks, mounts, usage, LVM, swap and full-disk incidents

Storage problems are among the most common Linux incidents. A full root filesystem, a full /var, a missing mount, broken permissions on a data directory or uncontrolled Docker logs can stop services even when CPU and memory look fine.

Command | Purpose | Example
df -h | Show filesystem free space. | df -h
du -sh | Show directory size. | du -sh /var/*
lsblk | Show disks and partitions. | lsblk -f
findmnt | Show mounted filesystems. | findmnt /var
swapon | Show swap devices/files. | swapon --show
lvs | Show LVM logical volumes. | sudo lvs
Storage essentials
# Filesystem usage
df -h

# Directory usage
sudo du -sh /var/*
sudo du -sh /var/log/*
sudo du -sh /var/lib/docker/*

# Disks and filesystems
lsblk
lsblk -f

# Mounted filesystems
findmnt

# Swap
swapon --show
free -h

# LVM if used
sudo pvs
sudo vgs
sudo lvs
Full disk incident flow
Disk alert or service failure
│
├── Check filesystems
│       └── df -h
│
├── Identify large directories
│       └── du -sh /*
│
├── Focus common growth areas
│       ├── /var/log
│       ├── /var/lib/docker
│       ├── /var/lib/postgresql
│       ├── /tmp
│       └── application uploads
│
├── Clean safely
│       ├── rotate logs
│       ├── prune Docker carefully
│       └── archive/delete known files
│
└── Prevent recurrence
        ├── monitoring
        ├── logrotate
        ├── retention policy
        └── larger/separate volume
Safe cleanup examples
# Clean apt cache
sudo apt clean

# Remove unused packages
sudo apt autoremove

# Vacuum journal logs older than 14 days
sudo journalctl --vacuum-time=14d

# Show Docker usage
docker system df

# Docker cleanup - use carefully
docker system prune
Production warning: never delete unknown database files manually. For PostgreSQL, MySQL, MariaDB, Redis or Docker volumes, understand the data path before cleanup.
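Full-disk incidents are easier to prevent than to clean up, so a threshold check is a common first piece of monitoring. The sketch below filters df -P output for filesystems at or above a given usage percentage. The function name and the 85 percent figure in the usage note are illustrative, and mount points containing spaces are not handled.

```shell
# filesystems_over: print mount points whose usage is at or above PERCENT.
# Expects `df -P` output on stdin (POSIX format, one filesystem per line).
filesystems_over() {
    awk -v limit="$1" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)                       # "90%" becomes "90"
            if (use + 0 >= limit + 0) print $6, use "%"
        }'
}
```

Running df -P | filesystems_over 85 from cron or a monitoring agent prints nothing while every filesystem is below the limit, which makes any output a signal worth alerting on.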
Troubleshooting patterns: from symptom to root cause

Troubleshooting on Ubuntu should follow a repeatable sequence: observe, isolate, verify, change one thing, measure again, document. Most incidents can be reduced to service state, logs, ports, permissions, network, storage, memory or recent changes.

Symptom | First checks | Common causes
Service down | systemctl status, journalctl -u | Bad config, dependency, permission, port conflict.
502 from Nginx | Nginx logs, upstream service, socket/port. | Gunicorn down, wrong socket, app error.
SSH blocked | SSH service, firewall, key, auth logs. | Bad key, password disabled, UFW, fail2ban.
Cannot install package | apt update, DNS, locks, dpkg state. | Mirror, DNS, interrupted install, lock file.
Disk full | df -h, du -sh. | Logs, Docker, DB, uploads, backups.
App permission error | ls -lah, id, namei -l. | Wrong owner, group, parent directory permissions.
DNS issue | resolvectl status, dig. | Resolver config, firewall, network, cloud DNS.
Universal incident decision tree
Application not working
│
├── Is the server alive?
│       └── ping, SSH, cloud console
│
├── Is disk full?
│       └── df -h
│
├── Is memory exhausted?
│       └── free -h, top
│
├── Is the service running?
│       └── systemctl status service
│
├── What do logs say?
│       └── journalctl -u service
│
├── Is the port listening?
│       └── ss -lntp
│
├── Is firewall blocking?
│       └── ufw status verbose
│
├── Is DNS/routing OK?
│       └── ip r, resolvectl, dig
│
└── Did a recent change happen?
        └── apt history, deploy logs, config diff
Useful "one screen" diagnostic
echo "== HOST ==" && hostnamectl
echo "== UPTIME ==" && uptime
echo "== DISK ==" && df -h
echo "== MEMORY ==" && free -h
echo "== FAILED UNITS ==" && systemctl --failed
echo "== PORTS ==" && ss -lntp
echo "== WARNINGS ==" && journalctl -p warning --since "30 min ago" --no-pager
Incident discipline: isolate scope first. Is it one service, one port, one user, one disk, one host, one network path or the whole platform?
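During an incident some tools may be missing or failing, and a diagnostic script should keep going regardless. The sketch below restates the one-screen diagnostic as functions with that property. The names section and health_report are hypothetical, and the command list simply mirrors the commands shown above.

```shell
# section: print a header, then run a command.
# A missing or failing command is reported but does not stop the report.
section() {
    local title="$1"
    shift
    echo "== $title =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "(command not found: $1)"
    fi
}

# health_report: the one-screen diagnostic as a single call.
health_report() {
    section "HOST" hostnamectl
    section "UPTIME" uptime
    section "DISK" df -h
    section "MEMORY" free -h
    section "FAILED UNITS" systemctl --failed --no-pager
    section "PORTS" ss -lntp
    section "WARNINGS" journalctl -p warning --since "30 min ago" --no-pager
}
```

Saving the output with health_report > /tmp/health-$(date +%H%M).txt before changing anything gives a baseline to compare against after the fix.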
Ubuntu CLI cheat sheet and production checklist
Core cheat sheet
# Files
ls -lah
cp -a src dst
mv old new
rm file
find /path -name "*.log"
du -sh *
df -h

# Permissions
ls -l
chmod 644 file
chmod 755 dir
chown user:group file
id user
namei -l /path/to/file

# Users
adduser user
usermod -aG sudo user
groups user
sudo -l

# Services
systemctl status service
systemctl restart service
systemctl enable service
systemctl --failed

# Logs
journalctl -u service -f
journalctl -p warning --since today
tail -f /var/log/syslog

# Network
ip a
ip r
ss -lntp
curl -I http://localhost
dig domain
resolvectl status

# Storage
lsblk -f
findmnt
swapon --show
free -h
Production sysadmin baseline
[ ] I know the server role
[ ] I know the Ubuntu version
[ ] I know which services must run
[ ] I know which ports must listen
[ ] I know where logs are
[ ] I know which user runs each app
[ ] I know where configs are
[ ] I know where data is stored
[ ] I know backup location
[ ] I know firewall rules
[ ] I know how to restart safely
[ ] I know how to rollback
[ ] I avoid chmod 777
[ ] I avoid root direct login
[ ] I document changes
Final rule
The Ubuntu CLI is a production microscope.
It lets you inspect the real state of the machine: files, permissions, users, services, logs, ports, network paths, disks and failures. Good troubleshooting means reading evidence before making changes.
Troubleshooting order
1. Observe symptoms
2. Check server health
3. Check disk and memory
4. Check service state
5. Read logs
6. Check ports
7. Check network and DNS
8. Check permissions
9. Check recent changes
10. Apply one fix
11. Verify
12. Document
3.1 Ubuntu Packages: APT, Snap, repositories, updates, pinning, security and production practices
Package management on Ubuntu

Ubuntu package management is mainly based on APT, which installs, upgrades, removes and resolves software dependencies from configured repositories. Ubuntu also supports Snap, a package format designed for sandboxed applications with automatic refresh behavior.

In production, package management is not only about installing software. It controls security patching, dependency stability, reproducibility, rollback strategy, package provenance, compliance and operational risk.

Tool | Role | Typical usage | Production concern
APT | Main Ubuntu package manager frontend. | Install Nginx, PostgreSQL, Redis, Python packages from Ubuntu repos. | Repository control, upgrade policy, dependency stability.
dpkg | Low-level Debian package tool. | Inspect installed packages or install local .deb files. | Does not resolve dependencies like APT.
Snap | Sandboxed application packaging. | Desktop apps, selected server tools, Canonical ecosystem packages. | Automatic refresh, policy control, mixed packaging strategy.
PPA | Third-party repository hosted on Launchpad. | Newer package versions or vendor-specific builds. | Trust, support, upgrade conflicts, governance.
Vendor repo | Repository maintained by software vendor. | Docker, PostgreSQL, NodeSource, Elastic, HashiCorp. | Key management, package pinning, lifecycle tracking.
Core rule: production package management must be intentional: approved repositories, known package versions, tested updates, documented rollback and clear ownership.
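"Known package versions" usually translates into an APT pin or a hold. The sketch below writes a preferences file that pins one package to a version pattern. The function name, the nginx example and the version string 1.24.* are hypothetical. A priority above 1000 tells APT to keep that version even if it means a downgrade, so it should be used deliberately.

```shell
# write_apt_pin: write an APT preferences file that pins PACKAGE to VERSION.
# DIR defaults to /etc/apt/preferences.d (root is needed there); PRIORITY defaults to 1001.
write_apt_pin() {
    local pkg="$1" version="$2" dir="${3:-/etc/apt/preferences.d}" prio="${4:-1001}"
    cat > "$dir/$pkg.pref" <<EOF
Package: $pkg
Pin: version $version
Pin-Priority: $prio
EOF
}
```

After a call such as write_apt_pin nginx '1.24.*', apt policy nginx shows which candidate version APT now selects. For a simple "do not upgrade this package" requirement, sudo apt-mark hold nginx is a lighter alternative.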
Package management architecture
Ubuntu package flow
│
├── Repository configuration
│       ├── Ubuntu official repositories
│       ├── security repositories
│       ├── updates repositories
│       ├── PPAs
│       └── vendor repositories
│
├── APT metadata
│       ├── package lists
│       ├── versions
│       ├── dependencies
│       └── priorities
│
├── Package operations
│       ├── install
│       ├── upgrade
│       ├── remove
│       ├── purge
│       └── autoremove
│
└── Operational controls
        ├── pinning
        ├── holds
        ├── unattended upgrades
        ├── reboot policy
        └── rollback plan
Decision map
Need standard server package?
└── use APT from Ubuntu repositories

Need vendor-supported latest version?
└── use official vendor repository

Need experimental or community package?
└── use PPA only with governance

Need desktop-style sandboxed app?
└── Snap can be acceptable

Need strict production reproducibility?
└── prefer APT + pinned versions + image build
APT basics: install, upgrade, remove, inspect

APT is the standard daily tool for Ubuntu package operations. It downloads package metadata, resolves dependencies, installs software, upgrades packages and removes software cleanly.

Command | Purpose | Example
apt update | Refresh repository metadata. | sudo apt update
apt upgrade | Upgrade installed packages without removing packages. | sudo apt upgrade
apt full-upgrade | Upgrade with dependency changes, installs/removals if needed. | sudo apt full-upgrade
apt install | Install package. | sudo apt install nginx
apt remove | Remove package but keep config files. | sudo apt remove nginx
apt purge | Remove package and config files. | sudo apt purge nginx
apt autoremove | Remove unused dependencies. | sudo apt autoremove
apt policy | Show installed and candidate version. | apt policy nginx
Essential APT commands
# Refresh package metadata
sudo apt update

# Show upgradeable packages
apt list --upgradable

# Upgrade packages
sudo apt upgrade

# Install package
sudo apt install nginx

# Show package information
apt show nginx

# Show package versions and source repository
apt policy nginx

# Search package
apt search postgresql

# Remove package but keep configuration
sudo apt remove nginx

# Remove package and configuration
sudo apt purge nginx

# Remove unused dependencies
sudo apt autoremove
APT vs dpkg
Tool | Best for | Important detail
apt | Normal package management. | Resolves dependencies from repositories.
apt-cache | Older metadata inspection commands. | Still useful in scripts and diagnostics.
dpkg | Inspect or install local Debian packages. | Does not automatically resolve dependencies.
apt-file | Find which package provides a file. | Requires installing apt-file and running apt-file update first.
Package inspection
# List installed packages
dpkg -l

# Filter installed packages
dpkg -l | grep nginx

# Show files installed by package
dpkg -L nginx

# Find which package owns a file
dpkg -S /usr/sbin/nginx

# Show package version
dpkg -s nginx | grep Version

# Show apt history
less /var/log/apt/history.log

# Show apt terminal logs
less /var/log/apt/term.log
Production habit: before upgrading, run apt list --upgradable and review critical packages such as kernel, OpenSSL, database, web server and runtime.
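The review habit above can be sketched as a small filter. This is a minimal sketch, not a definitive tool: the function name critical_upgrades and the CRITICAL_PATTERN watch list are assumptions made for illustration, and the pattern should be adapted to the server role.

```shell
# Hypothetical watch list of package-name prefixes considered critical.
CRITICAL_PATTERN='^(linux-image|openssl|libssl|nginx|postgresql|python3)'

# Reads `apt list --upgradable` output on stdin and prints "name new-version"
# for every upgradable package whose name matches the watch list.
critical_upgrades() {
    # Lines look like: name/suite new-version arch [upgradable from: old-version]
    awk -v pat="$CRITICAL_PATTERN" '
        /\[upgradable from:/ {
            split($1, parts, "/")      # parts[1] is the package name
            if (parts[1] ~ pat) print parts[1], $2
        }'
}

# Usage on a real host:
#   apt list --upgradable 2>/dev/null | critical_upgrades
```

An empty result only means that nothing on the watch list is pending; the full apt list --upgradable output still deserves a look before a patch window.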
Repositories: official sources, PPAs, vendor repos and trust

APT installs packages from repositories. Repository governance is critical: every repository added to a production server becomes part of the trust and upgrade surface. Too many uncontrolled PPAs or vendor repositories can make upgrades unpredictable.

Repository type | Usage | Risk | Production rule
Ubuntu main | Official supported packages. | Low. | Default baseline.
Ubuntu universe | Community-maintained packages. | Support scope differs. | Accept with awareness.
Security repo | Security updates. | Must stay enabled. | Never disable casually.
PPA | Community or project-specific builds. | Trust and compatibility risk. | Use only with explicit approval.
Vendor repo | Official software vendor packages. | Key, pinning and lifecycle complexity. | Document and monitor.
Local mirror | Enterprise-controlled package mirror. | Mirror freshness. | Useful for controlled fleets.
Repository locations
# Main APT source files
/etc/apt/sources.list
/etc/apt/sources.list.d/

# Newer Ubuntu systems may use deb822 source files
/etc/apt/sources.list.d/*.sources

# Trusted keyring locations
/etc/apt/keyrings/
/usr/share/keyrings/

# Apt preferences and pinning
/etc/apt/preferences
/etc/apt/preferences.d/
Repository inspection commands
# Show active source files
ls -lah /etc/apt/sources.list.d/
cat /etc/apt/sources.list

# Search configured repositories (one-line .list format)
grep -R "^deb" /etc/apt/sources.list /etc/apt/sources.list.d/

# Search configured repositories (deb822 .sources format)
grep -RE "^(Types|URIs|Suites|Components|Signed-By):" /etc/apt/sources.list.d/

# Refresh repository metadata
sudo apt update

# Show repository used for package candidate
apt policy nginx

# Show all versions available
apt-cache madison nginx

# Show package origin details
apt-cache policy nginx
Vendor repository pattern
Recommended vendor repo pattern:
1. Add vendor signing key into /etc/apt/keyrings/
2. Add repository source referencing signed-by key
3. Run apt update
4. Check apt policy package
5. Install exact package
6. Document repository owner and reason
7. Monitor vendor release notes
8. Pin if required
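Step 2 of the pattern above can be sketched as a function that renders a deb822 source stanza bound to a single keyring file. This is an illustrative sketch: the function name render_vendor_source and every value in the usage line (URL, suite, file names) are hypothetical placeholders, not a real vendor's settings.

```shell
# Render a deb822 stanza on stdout so it can be reviewed before installation.
# Arguments: repository URI, suite, component, path to the vendor keyring.
render_vendor_source() {
    local uri="$1" suite="$2" component="$3" keyring="$4"
    cat <<EOF
Types: deb
URIs: ${uri}
Suites: ${suite}
Components: ${component}
Signed-By: ${keyring}
EOF
}

# Usage (placeholder values; review the file, then move it into
# /etc/apt/sources.list.d/ and run sudo apt update):
#   render_vendor_source https://packages.example.com/ubuntu noble main \
#       /etc/apt/keyrings/example-vendor.gpg > example-vendor.sources
```

Writing to stdout rather than directly into /etc/apt keeps the review step explicit, and the Signed-By field limits the vendor key to this one repository instead of trusting it for every source.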
Repository risk diagram
New repository added
│
├── Can replace existing packages?
├── Can introduce newer dependencies?
├── Can break upgrade path?
├── Is signing key controlled?
├── Is vendor trusted?
├── Is lifecycle documented?
└── Is rollback possible?
Production warning: every PPA is a supply-chain and compatibility decision. Do not add PPAs casually on long-lived production servers.
Updates: patching, reboot policy, golden images and upgrade windows

Ubuntu updates must balance security and stability. Security patches should be applied quickly, but critical production systems often require staging validation, maintenance windows and rollback plans. Kernel and libc-related updates may require service restart or full reboot.

Update strategy | Best for | Strength | Watch out
Manual updates | Small systems, controlled maintenance. | Maximum human control. | Can be forgotten.
Unattended security updates | Standard servers. | Fast CVE patching. | Needs reboot/service restart policy.
Monthly patch window | Critical production. | Testing and coordination. | Emergency CVEs still need fast path.
Golden image replacement | Cloud fleets and autoscaling. | Reproducible and rollback-friendly. | Requires image pipeline.
Rolling patching | Clusters and HA services. | No full downtime. | Requires health checks and drain logic.
Update commands
# Refresh metadata
sudo apt update

# Show upgradeable packages
apt list --upgradable

# Upgrade packages
sudo apt upgrade

# More complete dependency-aware upgrade
sudo apt full-upgrade

# Remove unused dependencies
sudo apt autoremove

# Check if reboot is required
test -f /var/run/reboot-required && cat /var/run/reboot-required

# Show packages requiring reboot
cat /var/run/reboot-required.pkgs 2>/dev/null
Patch workflow
Patch workflow
│
├── Inventory
│       ├── OS version
│       ├── kernel version
│       ├── critical services
│       └── package list
│
├── Prepare
│       ├── backup
│       ├── snapshot
│       ├── staging test
│       └── maintenance window
│
├── Patch
│       ├── apt update
│       ├── apt upgrade
│       ├── service validation
│       └── reboot if required
│
└── Verify
        ├── systemctl --failed
        ├── journalctl warnings
        ├── listening ports
        ├── application smoke tests
        └── monitoring green
Unattended upgrades
# Install unattended upgrades
sudo apt install unattended-upgrades

# Configure automatic updates
sudo dpkg-reconfigure unattended-upgrades

# Review main config files
cat /etc/apt/apt.conf.d/20auto-upgrades
less /etc/apt/apt.conf.d/50unattended-upgrades

# Simulate a run without changing the system
sudo unattended-upgrade --dry-run --debug

# Check logs
less /var/log/unattended-upgrades/unattended-upgrades.log
Production rule: security updates without reboot planning can create false confidence. A patched kernel is not active until the system boots into it.
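The reboot-tracking point above can be sketched as a function built on the flag files already shown in the update commands. The name reboot_state and the optional directory argument are assumptions made for illustration; the argument exists only so the function can be tried against a scratch directory.

```shell
# Print the reboot state based on the reboot-required flag files.
# Optional argument: directory holding the flag files (defaults to /var/run).
reboot_state() {
    local dir="${1:-/var/run}"
    if [ -f "$dir/reboot-required" ]; then
        echo "reboot-required"
        # List the packages that requested the reboot, when recorded.
        if [ -f "$dir/reboot-required.pkgs" ]; then
            sort -u "$dir/reboot-required.pkgs" | sed 's/^/  /'
        fi
    else
        echo "no-reboot-needed"
    fi
}

# Usage on a real host:
#   reboot_state
```

A check like this can feed monitoring or a patch report so that a pending kernel reboot stays visible after unattended upgrades have run.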
Security: CVEs, package provenance, keys and audit trail

Package security is about more than installing updates. It includes repository trust, signing keys, CVE awareness, dependency origin, package version visibility, automatic security updates, rollback and auditability.

Security concern | Diagnostic | Control
Known vulnerable package | Security notices, scanner, package version. | Patch quickly, reboot/restart if needed.
Untrusted repository | Inspect sources and keys. | Remove unused PPAs and vendor repos.
Unsigned or broken repository | apt update errors. | Fix keyring or disable repository.
Package replaced by PPA | apt policy package. | Pin or remove repository.
No audit trail | APT history is not reviewed or archived as part of the process. | Record update windows and package changes.
Security inspection commands
# Show installed version and candidate
apt policy openssl
apt policy nginx

# Show package details
apt show openssl

# Show package changelog if available
apt changelog openssl

# Review apt history
less /var/log/apt/history.log

# Show recently modified source files
sudo find /etc/apt -type f -mtime -30 -ls

# Check Ubuntu Pro status if available
pro status
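The installed-versus-candidate check above can be sketched as a small parser for apt policy output. The function name patch_state and its output labels are assumptions made for illustration, and the sketch only compares the two reported versions; it does not decide whether a given CVE is fixed.

```shell
# Reads `apt policy <package>` output for one package on stdin and reports
# whether the installed version differs from the candidate version.
patch_state() {
    awk '
        $1 == "Installed:" { installed = $2 }
        $1 == "Candidate:" { candidate = $2 }
        END {
            if (installed == "" || installed == "(none)") print "not-installed"
            else if (installed == candidate) print "up-to-date " installed
            else print "update-available " installed " -> " candidate
        }'
}

# Usage on a real host:
#   apt policy openssl 2>/dev/null | patch_state
```

Because the candidate is computed from the configured repositories and pins, a hold or a low pin priority can make a package look up to date, so the result is best read together with apt-mark showhold.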
Package security flow
Security advisory or CVE
│
├── Identify affected package
│       └── apt policy package
│
├── Check installed version
│       └── dpkg -s package
│
├── Check available update
│       └── apt list --upgradable
│
├── Apply patch
│       └── apt install --only-upgrade package
│
├── Restart service if needed
│       └── systemctl restart service
│
├── Reboot if kernel/system library
│       └── reboot-required
│
└── Verify
        ├── version updated
        ├── service healthy
        └── logs clean
Key management principles
Good:
- vendor keys stored in /etc/apt/keyrings/
- repository line uses signed-by=
- repository owner documented
- old repositories removed
- package origin checked with apt policy

Avoid:
- legacy apt-key usage
- unknown curl | sudo bash scripts
- unmanaged PPAs
- repositories kept after one-time install
- blind upgrades without package review
Supply-chain rule: never pipe unknown install scripts directly into a root shell on production servers. Download, inspect, verify source, then execute intentionally.
Pinning, holds and version control

Pinning and holds control package versions. They are useful when a service depends on a specific version, when a repository offers unwanted newer packages, or when an upgrade must be temporarily blocked. They should be documented because forgotten pins can create security and maintenance risks.

Mechanism | Purpose | Example use | Risk
apt-mark hold | Prevent package upgrades. | Freeze PostgreSQL or Nginx temporarily. | Security patches may be blocked.
APT preferences | Control repository priority. | Prefer Ubuntu repo over PPA. | Misconfiguration can select wrong packages.
Exact version install | Install specific version. | apt install package=version | Version may disappear from repo.
Golden image | Freeze whole system baseline. | Cloud server fleet. | Image must be rebuilt for patches.
Hold commands
# Hold a package
sudo apt-mark hold nginx

# Show held packages
apt-mark showhold

# Remove hold
sudo apt-mark unhold nginx

# Install exact version
sudo apt install nginx=1.24.0-2ubuntu7

# Show available versions
apt-cache madison nginx
apt policy nginx
APT preferences example
# Example file:
# /etc/apt/preferences.d/nginx-pin

Package: nginx*
Pin: release o=Ubuntu
Pin-Priority: 700

Package: nginx*
Pin: origin "ppa.launchpadcontent.net"
Pin-Priority: 400
Version governance flow
Need version control?
│
├── Is this temporary?
│       ├── yes -> apt-mark hold + ticket + expiry date
│       └── no
│
├── Is repo priority wrong?
│       ├── yes -> APT preferences pinning
│       └── no
│
├── Need fleet reproducibility?
│       ├── yes -> golden image or IaC
│       └── no
│
└── Document package policy
        ├── package
        ├── desired version
        ├── reason
        ├── owner
        └── review date
Pinning risks
Risk | Cause | Control
Missed security update | Package held too long. | Review holds regularly.
Dependency conflict | Package versions drift. | Test upgrades in staging.
Wrong repo selected | Bad pin priority. | Check apt policy.
Hidden operational debt | No owner or expiry. | Document every hold and pin.
Production rule: every hold or pin must have a reason, owner and review date. Otherwise, it becomes invisible technical debt.
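The rule above can be sketched as an audit that compares held packages against a register kept under version control. The register format (package|reason|owner|review-date with ISO dates), the function name audit_holds and the labels it prints are all assumptions made for illustration, not an existing APT feature.

```shell
# Reads held package names on stdin (one per line) and checks each against
# a register file with lines of the form: package|reason|owner|YYYY-MM-DD
# Arguments: register file, optional reference date (defaults to today).
audit_holds() {
    local register="$1" today="${2:-$(date +%F)}" pkg
    while IFS= read -r pkg; do
        [ -n "$pkg" ] || continue
        awk -F'|' -v pkg="$pkg" -v today="$today" '
            $1 == pkg {
                found = 1
                if ($2 == "" || $3 == "" || $4 == "") print "INCOMPLETE " pkg
                else if ($4 < today) print "OVERDUE " pkg " " $4   # ISO dates sort as text
            }
            END { if (!found) print "UNDOCUMENTED " pkg }' "$register"
    done
}

# Usage on a real host (holds-register.txt is a hypothetical file name):
#   apt-mark showhold | audit_holds holds-register.txt
```

Holds that are documented and not yet due produce no output, so an empty result means the register and the system agree.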
Snap: concept, commands, refresh behavior and production policy

Snap packages bundle applications with their dependencies and run with confinement rules. Snaps are useful for some desktop applications and selected server tools, but production teams must understand refresh behavior, confinement, channels and operational policy before relying on them.

Snap concept | Meaning | Operational impact
Channel | Release track such as stable, candidate, beta, edge. | Controls risk level.
Confinement | Sandbox permissions model. | Can affect filesystem and device access.
Refresh | Automatic update behavior. | Needs maintenance window policy.
Revision | Specific snap build version. | Rollback may use previous revision.
Interface | Permission connection between snap and system resource. | May require manual connection.
Snap essentials
# List installed snaps
snap list

# Find package
snap find code

# Install snap
sudo snap install package-name

# Install from specific channel
sudo snap install package-name --channel=stable

# Refresh snaps
sudo snap refresh

# Show refresh schedule
snap refresh --time

# Show snap information
snap info package-name

# Remove snap
sudo snap remove package-name
Snap operational commands
# Show connections/interfaces
snap connections package-name

# Connect interface manually
sudo snap connect package-name:interface

# Revert to previous revision if available
sudo snap revert package-name

# Hold refresh temporarily
sudo snap refresh --hold=24h package-name

# Hold all refreshes temporarily
sudo snap refresh --hold=24h

# Show changes
snap changes

# Show logs for snap service if applicable
snap logs package-name
APT vs Snap decision table
Need | Prefer APT | Prefer Snap
Core server packages | Yes. | Usually no.
Desktop applications | Sometimes. | Often acceptable.
Strict patch window | Easier to control. | Refresh policy must be managed.
Sandboxed app delivery | Less direct. | Good fit.
Traditional system service | Usually better. | Depends on package and support model.
Production warning: avoid unmanaged mixing of APT and Snap for the same role. Define which package system owns each component.
APT and package troubleshooting

Package problems often come from broken dependencies, interrupted installs, repository errors, DNS issues, expired keys, dpkg locks, held packages or third-party repository conflicts. Troubleshooting should start by reading the actual APT error.

Symptom | Likely cause | First command
Could not get lock | Another apt or dpkg process is running. | ps aux | grep -E 'apt|dpkg'
Temporary failure resolving | DNS problem. | resolvectl status
NO_PUBKEY | Missing repository signing key. | Inspect repository and keyring.
held broken packages | Dependency conflict or holds. | apt-mark showhold
Package version unexpected | PPA or pinning changed candidate. | apt policy package
Install interrupted | dpkg half-configured packages. | sudo dpkg --configure -a
Repair commands
# Repair interrupted dpkg configuration
sudo dpkg --configure -a

# Fix broken dependencies
sudo apt -f install

# Refresh metadata
sudo apt update

# Clean local package cache
sudo apt clean

# Check held packages
apt-mark showhold

# Check locks safely
ps aux | grep -E 'apt|dpkg'

# Review apt history
less /var/log/apt/history.log
less /var/log/apt/term.log
Troubleshooting decision tree
APT operation fails
│
├── Read exact error
│
├── Lock error?
│       └── wait or inspect apt/dpkg processes
│
├── Network or DNS error?
│       └── check ip, route, DNS, proxy
│
├── Repository signature error?
│       └── check source file and keyring
│
├── Dependency conflict?
│       └── apt -f install, apt policy, holds
│
├── Interrupted install?
│       └── dpkg --configure -a
│
└── Third-party repo conflict?
        └── disable repo, update, retry in staging
Repository isolation technique
# Temporarily disable a source file
sudo mv /etc/apt/sources.list.d/vendor.list \
    /etc/apt/sources.list.d/vendor.list.disabled

# Refresh metadata
sudo apt update

# Re-check package candidate
apt policy package-name
Do not: delete dpkg lock files blindly while package operations are running. You can corrupt the package database. First identify the active process.
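The symptom table and decision tree in this section can be sketched as a first-pass classifier for APT error text. The function name classify_apt_error and the wording of each hint are assumptions made for illustration; the matched strings are the error fragments listed in the symptom table, plus the "dpkg was interrupted" message APT prints after an interrupted install.

```shell
# Reads an APT error message on stdin and prints a likely cause with a
# first action, following the troubleshooting table in this section.
classify_apt_error() {
    local msg
    msg="$(cat)"
    case "$msg" in
        *"Could not get lock"*)          echo "lock: inspect running apt/dpkg processes" ;;
        *"Temporary failure resolving"*) echo "dns: check resolvectl status, route and proxy" ;;
        *NO_PUBKEY*)                     echo "signature: check source file and keyring" ;;
        *"held broken packages"*)        echo "dependency: check apt-mark showhold and apt policy" ;;
        *"dpkg was interrupted"*)        echo "interrupted: run sudo dpkg --configure -a" ;;
        *)                               echo "unknown: read the full error and term.log" ;;
    esac
}

# Usage on a real host:
#   sudo apt update 2>&1 | classify_apt_error
```

The classifier only points at where to look first; reading the exact error remains the first step of the decision tree.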
Production best practices: governance, reproducibility and rollback

In production, package management must be reproducible. The same server role should use the same repositories, packages, versions, configuration and patching process. Manual package drift is a major source of incidents.

Practice | Why it matters | Implementation
Approved repository list | Controls supply-chain risk. | Document Ubuntu, security and vendor repos.
Package baseline | Improves reproducibility. | Ansible, Packer, Terraform, cloud-init.
Patch windows | Reduces surprise outages. | Monthly standard, emergency CVE fast path.
Staging validation | Catches dependency and config breakage. | Upgrade staging before production.
Rollback plan | Limits outage duration. | Snapshot, AMI, previous image, package downgrade plan.
Change log | Enables incident diagnosis. | Ticket, deployment log, apt history archive.
Production package lifecycle
Package change request
│
├── Why is package needed?
├── Which repository provides it?
├── Is vendor trusted?
├── Is version pinned or floating?
├── Has staging been tested?
├── Is rollback possible?
└── Is owner documented?
│
▼
Approved installation
│
├── update IaC
├── apply in staging
├── validate
├── apply in production
└── document result
Production rules
Do:
- use Ubuntu LTS for production
- keep security repository enabled
- document every external repository
- prefer vendor official repositories over random PPAs
- test updates in staging
- track reboot-required state
- keep rollback snapshot or image
- automate package baseline
- review apt history after changes
- monitor security advisories

Avoid:
- unmanaged PPAs
- curl | sudo bash without review
- compiling manually into /usr/local without documentation
- mixing APT and Snap for the same service role
- holding packages forever
- patching critical systems without rollback
Infrastructure-as-code examples
Package baseline can be expressed in:
- Ansible apt module
- cloud-init packages section
- Packer image build
- Terraform user_data
- Dockerfile for containers
- shell bootstrap script under version control

Goal:
rebuild server from code, not memory.
Production rule: if a package is installed manually and nobody documents why, the server has started to become a snowflake.
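Snowflake drift can be sketched as a comparison between a versioned baseline file and the packages marked as manually installed. The baseline format (one package name per line, with # comments), the function name baseline_drift and the MISSING/UNMANAGED labels are assumptions made for illustration.

```shell
# Reads manually installed package names on stdin and compares them with a
# baseline file (one package per line; blank lines and # comments ignored).
# Prints MISSING for baseline packages that are absent and UNMANAGED for
# installed packages that the baseline does not mention.
baseline_drift() {
    local baseline="$1" installed
    installed="$(mktemp)"
    sort -u > "$installed"                      # stdin: installed package names
    grep -Ev '^[[:space:]]*(#|$)' "$baseline" | sort -u |
        comm -3 - "$installed" |
        awk -F'\t' '{ if ($1 != "") print "MISSING " $1; else print "UNMANAGED " $2 }'
    rm -f "$installed"
}

# Usage on a real host (baseline.txt is a hypothetical file under version control):
#   apt-mark showmanual | baseline_drift baseline.txt
```

Each UNMANAGED line is a candidate for either a documented addition to the baseline or removal from the server.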
Package management cheat sheet and final checklist
APT cheat sheet
# Metadata and updates
sudo apt update
apt list --upgradable
sudo apt upgrade
sudo apt full-upgrade

# Install and remove
sudo apt install package-name
sudo apt remove package-name
sudo apt purge package-name
sudo apt autoremove

# Inspect
apt show package-name
apt policy package-name
apt-cache madison package-name
dpkg -l | grep package-name
dpkg -L package-name
dpkg -S /path/to/file

# Troubleshoot
sudo dpkg --configure -a
sudo apt -f install
apt-mark showhold
less /var/log/apt/history.log

# Hold
sudo apt-mark hold package-name
sudo apt-mark unhold package-name
Snap cheat sheet
# Inspect
snap list
snap find package-name
snap info package-name

# Install and remove
sudo snap install package-name
sudo snap install package-name --channel=stable
sudo snap remove package-name

# Refresh
sudo snap refresh
snap refresh --time
sudo snap refresh --hold=24h package-name

# Operations
snap changes
snap connections package-name
snap logs package-name
sudo snap revert package-name
Final production checklist
[ ] Ubuntu official repositories are enabled
[ ] Security repository is enabled
[ ] External repositories are documented
[ ] Repository keys are managed in keyrings
[ ] PPAs are justified or avoided
[ ] Package baseline is automated
[ ] Critical package versions are known
[ ] Holds and pins are documented
[ ] Update policy is defined
[ ] Reboot policy is defined
[ ] Staging update test exists
[ ] Rollback image or snapshot exists
[ ] Apt history is reviewed after changes
[ ] Snap policy is defined
[ ] Security advisories are monitored
Final rule
Package management is production governance.
APT and Snap are not just installation tools. They define what software runs, where it comes from, how it is patched, how it is upgraded, and how safely the system can recover when a package change goes wrong.
7.5 Ubuntu Customization & Optimization: GNOME, themes, keyboard shortcuts, battery, swappiness and cleanup
Customization and optimization objective

Ubuntu can be customized at several levels: desktop interface, GNOME extensions, themes, icons, fonts, keyboard shortcuts, startup applications, power settings, memory behavior and cleanup routines. The objective is to improve usability and performance without making the system fragile.

Good customization is controlled, reversible and documented. Bad customization creates unstable extensions, broken themes, slow login, excessive startup services, battery drain, hidden disk growth and difficult troubleshooting.

Area | Goal | Main tools | Risk if unmanaged
GNOME interface | Improve desktop workflow. | Settings, Tweaks, Extensions. | Shell instability or visual inconsistency.
Themes and icons | Adapt visual style. | GTK themes, icon themes, user themes. | Broken UI after updates.
Keyboard shortcuts | Accelerate daily workflow. | Settings, custom commands, terminal shortcuts. | Conflicts and hard-to-remember mappings.
Battery | Reduce power usage on laptops. | Power profiles, TLP, powertop. | Thermal issues or poor battery life.
Memory tuning | Control swap behavior. | vm.swappiness, monitoring. | Slow system if tuned blindly.
Cleanup | Keep disk usage healthy. | APT cleanup, journal vacuum, cache review. | Disk full or accidental data loss.
Core rule: customize for productivity, not for complexity. Every optimization should be measurable, reversible and safe after system updates.
Optimization map
Ubuntu workstation optimization
│
├── Interface
│       ├── GNOME Settings
│       ├── GNOME Tweaks
│       ├── dock behavior
│       ├── workspace behavior
│       └── display settings
│
├── Extensions
│       ├── shell extensions
│       ├── app indicators
│       ├── tiling helpers
│       └── workflow enhancers
│
├── Visual style
│       ├── GTK theme
│       ├── icon theme
│       ├── cursor theme
│       └── fonts
│
├── Productivity
│       ├── keyboard shortcuts
│       ├── terminal shortcuts
│       ├── custom commands
│       └── launcher workflow
│
└── Performance
        ├── startup apps
        ├── battery profile
        ├── swappiness
        ├── cache cleanup
        └── logs and disk hygiene
Decision shortcut
Want a better desktop?
├── first use built-in Settings
├── then GNOME Tweaks
├── then a few trusted extensions
└── avoid stacking many shell modifications

Want better performance?
├── remove useless startup apps
├── check disk and memory
├── tune battery profile
├── clean caches safely
└── measure before changing kernel parameters
GNOME interface: built-in customization first

Ubuntu Desktop uses GNOME with Ubuntu-specific defaults. Before installing extensions or themes, start with built-in settings: dock placement, appearance, workspaces, display scaling, night light, keyboard layout, privacy, notifications and power profile.

Interface area | Where to configure | Useful for
Appearance | Settings → Appearance. | Light/dark mode, accent style, dock behavior.
Displays | Settings → Displays. | Resolution, scaling, multi-monitor layout.
Keyboard | Settings → Keyboard. | Shortcuts, input sources, custom commands.
Power | Settings → Power. | Battery profile, screen blank, suspend behavior.
Notifications | Settings → Notifications. | Reduce distractions.
Privacy | Settings → Privacy. | Location, file history, camera, microphone.
GNOME Tweaks installation
# Install GNOME Tweaks
sudo apt update
sudo apt install gnome-tweaks

# Launch from terminal
gnome-tweaks

# Install Extension Manager (package availability depends on the Ubuntu release)
sudo apt install gnome-shell-extension-manager
Interface customization flow
Customize desktop
│
├── Built-in Settings
│       ├── appearance
│       ├── display
│       ├── keyboard
│       ├── power
│       └── privacy
│
├── GNOME Tweaks
│       ├── fonts
│       ├── window behavior
│       ├── startup apps
│       └── themes if enabled
│
├── Extensions
│       ├── install only useful ones
│       ├── verify compatibility
│       └── disable if shell breaks
│
└── Backup preferences
        ├── document installed extensions
        ├── export dotfiles if needed
        └── keep restore point
Useful inspection commands
# GNOME Shell version
gnome-shell --version

# Current desktop session (x11 or wayland)
echo "$XDG_CURRENT_DESKTOP"
echo "$XDG_SESSION_TYPE"

# Display environment (an empty value means the variable is not set)
echo "$WAYLAND_DISPLAY"
echo "$DISPLAY"

# Installed GNOME packages (first lines only)
dpkg -l | grep -i gnome | head

# User configuration directories
ls -lah ~/.config
ls -lah ~/.local/share
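The "Backup preferences" branch of the customization flow can be sketched as a small snapshot helper. This is a minimal sketch, not a complete backup: the function name `snapshot_desktop_prefs`, the default output directory and the file names are hypothetical choices, and the helper assumes the standard `gnome-extensions` and `dconf` command-line tools, skipping whichever is missing.

```shell
#!/bin/sh
# Sketch: save a dated snapshot of desktop preferences before customizing.
# snapshot_desktop_prefs and the file names below are hypothetical choices.

snapshot_desktop_prefs() {
    out_dir="${1:-$HOME/desktop-prefs/$(date +%Y%m%d-%H%M%S)}"
    mkdir -p "$out_dir" || return 1

    # Record enabled extensions (UUIDs) if the GNOME CLI is present.
    if command -v gnome-extensions >/dev/null 2>&1; then
        gnome-extensions list --enabled > "$out_dir/extensions-enabled.txt" 2>/dev/null || true
    fi

    # Dump the GNOME settings stored in dconf if the tool is present.
    if command -v dconf >/dev/null 2>&1; then
        dconf dump /org/gnome/ > "$out_dir/gnome-settings.dconf" 2>/dev/null || true
    fi

    # Print the snapshot location so it can be noted in documentation.
    echo "$out_dir"
}
```

Calling `snapshot_desktop_prefs` with no argument writes to a timestamped directory under the home directory. A saved dump can later be reviewed and, if appropriate, loaded back with `dconf load /org/gnome/ < gnome-settings.dconf`, which overwrites the current values of the keys contained in the dump.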
Interface rule: use the simplest native setting first. Extensions should solve real workflow problems, not replace every part of the desktop.
GNOME extensions: workflow power with compatibility discipline

GNOME extensions modify the behavior of GNOME Shell. They can add indicators, tiling, dock improvements, clipboard managers, system monitors or workflow enhancements. However, extensions are loaded into the GNOME Shell process itself, so a faulty extension can destabilize the whole desktop, and an unmaintained one can stop working after a GNOME upgrade.

Extension type | Use case | Risk
App indicators | Tray icons for apps. | Low to medium.
Dock customization | Dock behavior and visual changes. | Medium if overlapping Ubuntu dock.
Tiling assistants | Window snapping and layouts. | Medium if shell version changes.
System monitors | CPU, RAM, network indicators. | Can add overhead if badly implemented.
Theme/user shell | Shell visual customization. | Can break visual consistency.
Install and manage extensions
# Install Extension Manager if available
sudo apt update
sudo apt install gnome-shell-extension-manager

# List enabled extensions
gnome-extensions list --enabled

# List all extensions
gnome-extensions list

# In the commands below, extension-name stands for the extension UUID
# exactly as printed by "gnome-extensions list"

# Show extension info
gnome-extensions info extension-name

# Disable extension
gnome-extensions disable extension-name

# Enable extension
gnome-extensions enable extension-name
Extension safety flow
Before installing extension
│
├── Is it really needed?
├── Is it compatible with GNOME version?
├── Is it maintained?
├── Does it overlap with another extension?
├── Can it be disabled easily?
└── Is there a restore point before major desktop changes?
Extension troubleshooting
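# Disable all user extensions at once for diagnostics
gsettings set org.gnome.shell disable-user-extensions true

# Re-enable user extensions once the faulty one is identified
gsettings set org.gnome.shell disable-user-extensions false

# Disable a single extension (extension-name stands for its UUID)
gnome-extensions disable extension-name

# Check GNOME Shell logs
journalctl /usr/bin/gnome-shell --since "1 hour ago"

# Check session errors
journalctl --user -p warning --since "1 hour ago"

# Restart GNOME Shell on Xorg
# Press Alt+F2, type r, press Enter

# On Wayland, log out and log back in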
Recommended extension policy
Good:
- install only a few extensions
- prefer maintained extensions
- remove unused extensions
- document core workflow extensions
- test after Ubuntu upgrade

Avoid:
- stacking many visual extensions
- installing abandoned extensions
- relying on extensions for critical access
- changing many extensions at once
- ignoring shell errors after login
Extension warning: a broken GNOME extension can make the desktop unstable. Keep the list small and know how to disable extensions.
GTK themes, icon themes, cursor themes and visual consistency

Ubuntu visual customization can use GTK themes, icon themes, cursor themes and fonts. Themes can improve comfort and readability, but deep theming may break after application or desktop updates, especially when applications use different toolkit versions.

Theme element | What it changes | Typical location
GTK theme | Window and widget appearance. | ~/.themes, /usr/share/themes
Icon theme | Application and file icons. | ~/.icons, ~/.local/share/icons
Cursor theme | Mouse pointer style. | ~/.icons, system icon paths.
Shell theme | GNOME Shell top bar, menus, overview. | Requires user theme support.
Fonts | UI and document typography. | GNOME Tweaks.
Theme directories
# User theme directories
mkdir -p ~/.themes
mkdir -p ~/.icons
mkdir -p ~/.local/share/icons

# System theme directories
ls -lah /usr/share/themes
ls -lah /usr/share/icons

# User config
ls -lah ~/.config
ls -lah ~/.local/share
Theme installation flow
Install theme safely
│
├── Download from trusted source
├── Extract theme
├── Place in user directory
│       ├── ~/.themes
│       └── ~/.icons
├── Open GNOME Tweaks
├── Select theme
├── Verify apps look correct
└── Keep original theme as fallback
Visual customization checklist
[ ] Theme source is trusted
[ ] Theme supports current GNOME/GTK version
[ ] Original theme remains available
[ ] Icons are readable in light and dark mode
[ ] Terminal colors remain readable
[ ] File manager remains usable
[ ] Browser and developer tools remain clear
[ ] Screenshots and presentations look professional
[ ] Theme can be reverted quickly
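The "extract" and "place in user directory" steps of the theme installation flow can be sketched as two small helpers. This is an illustrative sketch under stated assumptions: the function names `list_theme_archive` and `install_theme_archive` are hypothetical, the archive is assumed to be a tar file (optionally compressed) whose top-level entry is the theme directory, and `~/.themes` is used as the default destination.

```shell
#!/bin/sh
# Sketch: review and unpack a theme archive into a user theme directory.
# Both function names are hypothetical.

# List the top-level entries of an archive so it can be reviewed first.
list_theme_archive() {
    tar -tf "$1" | cut -d/ -f1 | sort -u
}

# Unpack the archive into a destination directory (default: ~/.themes).
install_theme_archive() {
    archive="$1"
    dest="${2:-$HOME/.themes}"
    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
    mkdir -p "$dest" && tar -xf "$archive" -C "$dest"
}
```

Reviewing the listing before unpacking helps confirm that the archive contains a single theme directory rather than loose files. Because everything lands in a user directory, reverting is a matter of selecting the default theme again and deleting the unpacked directory.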
Common theme problems
Problem | Likely cause | Correction
Invisible text | Theme color mismatch. | Return to default or compatible theme.
Broken window controls | Unsupported shell or GTK version. | Use maintained theme.
Icons missing | Incomplete icon theme. | Install fallback icon set.
App does not follow theme | Different toolkit or sandbox package. | Accept limitation or configure app separately.
Visual rule: prioritize readability and stability over extreme theming, especially on a professional workstation.
Keyboard shortcuts: customize workflow and reduce friction

Keyboard shortcuts are one of the highest-return customizations. They reduce mouse use, speed up window management, launch tools quickly and make development workflows smoother. The best shortcuts are easy to remember and do not conflict with application shortcuts.

Shortcut area | Example action | Good candidate
Terminal | Open terminal quickly. | Ctrl + Alt + T
Window management | Move, maximize, tile windows. | Super + arrows.
Workspaces | Switch between focused contexts. | Super + Page Up/Page Down.
Screenshots | Capture screen or region. | Print Screen shortcuts.
Custom app launch | Open IDE, browser, file manager. | Custom commands.
Scripts | Run productivity automation. | Custom script binding.
Custom shortcut flow
Create custom shortcut
│
├── Open Settings
├── Go to Keyboard
├── Open Keyboard Shortcuts
├── Add Custom Shortcut
├── Enter name
├── Enter command
├── Assign key combination
└── Test immediately
Useful custom commands
# Open terminal
gnome-terminal

# Open file manager
nautilus

# Open browser
firefox

# Open specific project directory (replace /home/user with your home directory)
gnome-terminal --working-directory=/home/user/projects

# Run a custom script
/home/user/bin/daily-check.sh

# Lock screen through systemd-logind
loginctl lock-session
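Custom shortcuts can also be created from the command line through `gsettings`. The sketch below only prints the commands for review instead of running them. The function name `print_shortcut_commands` and the slot name `custom0` are hypothetical; the schema and key names are the ones GNOME uses for custom keybindings, to the best of my knowledge. Note the assumption that no other custom shortcut exists: the first printed command replaces the whole `custom-keybindings` list, so existing entries must be merged in by hand.

```shell
#!/bin/sh
# Sketch: print the gsettings commands that define one custom shortcut.
# Nothing is executed; review the output before running it.
# print_shortcut_commands is a hypothetical helper name.

print_shortcut_commands() {
    slot="$1"   # e.g. custom0 (hypothetical slot name)
    name="$2"   # label shown in Settings
    cmd="$3"    # command to run
    keys="$4"   # key combination, e.g. <Super>Return

    schema="org.gnome.settings-daemon.plugins.media-keys"
    path="/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/$slot/"

    # WARNING: this line replaces the full list of custom shortcuts.
    printf "gsettings set %s custom-keybindings \"['%s']\"\n" "$schema" "$path"
    printf "gsettings set %s.custom-keybinding:%s name '%s'\n" "$schema" "$path" "$name"
    printf "gsettings set %s.custom-keybinding:%s command '%s'\n" "$schema" "$path" "$cmd"
    printf "gsettings set %s.custom-keybinding:%s binding '%s'\n" "$schema" "$path" "$keys"
}
```

For example, `print_shortcut_commands custom0 'Terminal' 'gnome-terminal' '<Super>Return'` prints four commands. Values containing single quotes are not escaped by this sketch, so keep names and commands simple or wrap complex commands in a script.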
Shortcut design principles
Good shortcuts:
- easy to remember
- close to existing habits
- not conflicting with IDE/browser
- consistent by category
- documented if custom
- limited to high-frequency actions

Avoid:
- too many shortcuts
- hard-to-type combinations
- overriding critical app shortcuts
- shortcuts that run destructive scripts
- undocumented production scripts
Developer workflow example
Workflow:
Super + Enter       -> terminal
Super + E           -> file manager
Super + B           -> browser
Super + D           -> IDE
Super + Shift + L   -> lock screen
Super + Shift + M   -> monitoring dashboard
Super + Shift + T   -> project terminal
Shortcut rule: customize shortcuts for actions you perform every day. If you use an action once a month, it does not need a shortcut.
Battery and power optimization for laptops

Battery optimization on Ubuntu starts with power profiles, screen brightness, sleep behavior, background applications and hardware drivers. More advanced users can use tools like TLP or powertop, but should avoid applying random power tweaks without verifying their effect.

Power area | Optimization | Trade-off
Power profile | Use power saver on battery. | Lower performance.
Screen brightness | Reduce brightness. | Less visibility in bright environments.
Sleep behavior | Shorter idle suspend. | May interrupt background tasks.
Startup apps | Disable unnecessary background apps. | Some apps need manual launch.
Bluetooth | Disable when unused. | Peripheral inconvenience.
GPU mode | Use integrated graphics if possible. | Lower graphics performance.
Power commands
# Show power profiles if supported
powerprofilesctl

# Set power saver
powerprofilesctl set power-saver

# Set balanced
powerprofilesctl set balanced

# Set performance if available
powerprofilesctl set performance

# Battery status
upower -i $(upower -e | grep BAT) 2>/dev/null

# Show running processes
top

# Show startup applications through GUI
gnome-session-properties
TLP and powertop
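# TLP and power-profiles-daemon manage overlapping settings: run only one of them.
# Installing tlp may remove power-profiles-daemon, after which powerprofilesctl
# is no longer available.

# Install TLP
sudo apt update
sudo apt install tlp

# Enable TLP
sudo systemctl enable --now tlp

# Show TLP status
sudo tlp-stat -s

# Install powertop
sudo apt install powertop

# Run powertop
sudo powertop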
Battery optimization flow
Battery drains quickly
│
├── Check power profile
├── Reduce screen brightness
├── Close high CPU apps
├── Disable unused Bluetooth
├── Review startup apps
├── Check browser tabs
├── Check GPU mode
├── Use TLP if needed
└── Measure again
Laptop routine
On battery:
[ ] power-saver profile
[ ] lower brightness
[ ] close heavy browser tabs
[ ] stop unused containers or VMs
[ ] disable Bluetooth if unused
[ ] avoid heavy indexing jobs
[ ] monitor CPU usage
[ ] suspend when idle
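To make the "measure again" habit concrete, battery level can be read directly from the kernel's sysfs interface and logged before and after a change. This is a minimal sketch: the function name `battery_report` is hypothetical, and it assumes batteries appear as `BAT*` directories under `/sys/class/power_supply` with `capacity` and `status` files, which is the usual layout on Linux laptops but not guaranteed on every device.

```shell
#!/bin/sh
# Sketch: print "<battery> <capacity>% <status>" for each battery found.
# battery_report is a hypothetical helper name.

battery_report() {
    base="${1:-/sys/class/power_supply}"
    for bat in "$base"/BAT*; do
        # Skip when no battery matches or the capacity file is unreadable.
        [ -r "$bat/capacity" ] || continue
        printf '%s %s%% %s\n' \
            "${bat##*/}" \
            "$(cat "$bat/capacity")" \
            "$(cat "$bat/status" 2>/dev/null || echo unknown)"
    done
}
```

Running `battery_report` at the start and end of a fixed work period, once per power profile, gives a rough but comparable drain figure. On machines without a battery the function prints nothing.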
Battery warning: Docker containers, VMs, IDE indexers, browsers and video calls can dominate power usage. Tune applications before blaming the OS.
Swappiness: memory behavior and swap tuning

Swappiness controls how aggressively the Linux kernel moves memory pages to swap. Lower values reduce the tendency to swap; higher values allow more swapping. The kernel default is 60. It is not a magic performance setting: the correct value depends on RAM size, workload, disk speed and latency tolerance.

Context | Typical approach | Reason
Desktop with enough RAM | Moderately low swappiness. | Keep apps responsive.
Small laptop | Do not disable swap blindly. | Swap can prevent abrupt OOM.
Database server | Avoid active swapping. | Swap can hurt latency heavily.
Batch workload | Some swap may be acceptable. | Throughput may tolerate latency.
VM with slow disk | Be careful with swap activity. | Slow storage amplifies latency.
Inspect memory and swappiness
# Current swappiness
cat /proc/sys/vm/swappiness
sysctl vm.swappiness

# Memory overview
free -h

# Swap devices/files
swapon --show

# Swap activity (watch the si and so columns)
vmstat 1

# Top memory processes
ps aux --sort=-%mem | head -30
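Whether there is real swap activity can also be checked without watching `vmstat` by hand. The sketch below reads the kernel's cumulative `pswpin` and `pswpout` counters (pages swapped in and out since boot) from `/proc/vmstat` twice and prints the difference. The function names `read_swap_counters`, `swap_delta` and `swap_activity` are hypothetical, and the 5-second default interval is an arbitrary assumption.

```shell
#!/bin/sh
# Sketch: measure swap activity as the change in pswpin/pswpout counters.
# All function names here are hypothetical.

# Print "pswpin pswpout" from a vmstat-style file (default: /proc/vmstat).
read_swap_counters() {
    awk 'BEGIN { i = 0; o = 0 }
         $1 == "pswpin"  { i = $2 }
         $1 == "pswpout" { o = $2 }
         END { print i, o }' "${1:-/proc/vmstat}"
}

# Print the in/out page deltas between two samples.
# $1 $2 = first sample, $3 $4 = second sample.
swap_delta() {
    echo "$(( $3 - $1 )) $(( $4 - $2 ))"
}

# Sample twice, interval seconds apart, and print "pages_in pages_out".
swap_activity() {
    interval="${1:-5}"
    file="${2:-/proc/vmstat}"
    first=$(read_swap_counters "$file")
    sleep "$interval"
    second=$(read_swap_counters "$file")
    # Unquoted on purpose: each sample expands to two words.
    swap_delta $first $second
}
```

A result of `0 0` from `swap_activity 10` under normal workload suggests that swap is not actively in use, in which case tuning swappiness is unlikely to change anything. Sustained non-zero values while the system feels slow are the signal worth investigating.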
Temporary and persistent swappiness
# Temporary change until reboot
sudo sysctl -w vm.swappiness=10

# Persistent configuration: write a drop-in file
echo 'vm.swappiness = 10' | sudo tee /etc/sysctl.d/99-custom-swappiness.conf

# Apply persistent sysctl files
sudo sysctl --system

# Verify
sysctl vm.swappiness
Swappiness decision tree
Considering swappiness change?
│
├── Is there real swap activity?
│       ├── no -> do not tune yet
│       └── yes
│
├── Is system slow because of swapping?
│       ├── no -> investigate app first
│       └── yes
│
├── Is RAM insufficient?
│       ├── yes -> reduce workload or add RAM
│       └── no
│
└── Test lower value
        ├── apply temporarily
        ├── measure behavior
        ├── document result
        └── make persistent only if useful
Memory interpretation
Signal | Meaning
High used memory | Normal if Linux is using cache.
Low available memory | Possible pressure.
Swap used but stable | Not always a problem.
Active swap in/out | Performance warning.
OOM logs | Memory exhaustion occurred.
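The "low available memory" signal can be turned into a number by comparing `MemAvailable` with `MemTotal` in `/proc/meminfo`. This is a small sketch: the function name `mem_available_percent` is hypothetical, and any threshold applied to its result (for example treating values under 10 as pressure) is an assumption to validate against the actual workload.

```shell
#!/bin/sh
# Sketch: print available memory as an integer percentage of total memory.
# mem_available_percent is a hypothetical helper name.

mem_available_percent() {
    awk '$1 == "MemTotal:"     { total = $2 }
         $1 == "MemAvailable:" { avail = $2 }
         END {
             if (total > 0) printf "%d\n", avail * 100 / total
             else exit 1
         }' "${1:-/proc/meminfo}"
}
```

Unlike the "used" column of `free`, this figure already accounts for reclaimable cache, so a high "used" value with a comfortable available percentage matches the first row of the table: normal cache usage rather than pressure.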
Swappiness rule: tune only after observing memory pressure. The best fix for constant swapping is often less workload or more RAM, not only a sysctl value.
Cleanup: temporary files, caches, logs and safe disk hygiene

Cleanup keeps Ubuntu healthy, but careless cleanup can delete useful data. Focus on safe areas first: APT cache, unused packages, journal size, trash, thumbnails, old downloads and application caches. Be very careful with database directories, Docker volumes and project folders.

Cleanup target | Command / location | Safety level
APT cache | sudo apt clean | Safe.
Unused packages | sudo apt autoremove | Usually safe, review output.
Systemd journal | sudo journalctl --vacuum-time=14d | Safe if retention is acceptable.
User trash | File manager or trash path. | Safe if reviewed.
Downloads | ~/Downloads | Manual review recommended.
Docker data | docker system df | Careful, volumes may contain data.
Database files | /var/lib/mysql, /var/lib/postgresql | Dangerous to delete manually.
Safe cleanup commands
# Check disk usage first
df -h

# Show top-level directory sizes
sudo du -xhd1 / 2>/dev/null | sort -h

# Clean APT cache
sudo apt clean

# Remove unused packages
sudo apt autoremove

# Show journal size
journalctl --disk-usage

# Vacuum journal by time
sudo journalctl --vacuum-time=14d

# Vacuum journal by size
sudo journalctl --vacuum-size=1G
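The "check disk usage first" step can be reduced to a quick filter that shows only filesystems above a chosen usage level. In this sketch the function name `filesystems_over` is hypothetical and the 85 percent threshold in the usage note is an arbitrary example. The function reads `df -P` output on standard input and assumes mount points without spaces.

```shell
#!/bin/sh
# Sketch: list mount points whose usage is at or above a threshold.
# Reads "df -P" output on stdin; filesystems_over is a hypothetical name.

filesystems_over() {
    threshold="$1"
    awk -v limit="$threshold" '
        NR > 1 {
            pct = $5            # capacity column, e.g. "90%"
            sub(/%/, "", pct)
            if (pct + 0 >= limit + 0) print $6, $5
        }'
}
```

A typical call is `df -P | filesystems_over 85`. Empty output means no filesystem has reached the threshold, in which case cleanup is a matter of hygiene rather than urgency.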
User cache cleanup
# Check user cache size
du -sh ~/.cache 2>/dev/null

# Check thumbnails
du -sh ~/.cache/thumbnails 2>/dev/null

# Remove thumbnail cache (thumbnails are regenerated on demand)
rm -rf ~/.cache/thumbnails/*

# Review downloads manually
du -sh ~/Downloads/*
ls -lah ~/Downloads
Docker cleanup caution
# Show Docker disk usage
docker system df

# Remove dangling images only (add -a to remove all unused images)
docker image prune

# Remove stopped containers
docker container prune

# More aggressive cleanup, use carefully
docker system prune

# Dangerous for persistent data: also removes unused volumes
docker system prune --volumes
Cleanup decision tree
Need disk space?
│
├── Check filesystem
│       └── df -h
│
├── Find large directories
│       └── du -xhd1 /
│
├── Safe cleanup first
│       ├── apt clean
│       ├── apt autoremove
│       ├── journal vacuum
│       └── trash/downloads review
│
├── App-specific cleanup
│       ├── browser cache
│       ├── Docker images
│       └── old build artifacts
│
└── Dangerous data zones
        ├── databases
        ├── Docker volumes
        ├── project data
        └── backups
Cleanup warning: never delete files in database directories or Docker volumes unless you fully understand what owns them and have a backup.
Troubleshooting customization and optimization problems

Customization problems usually appear after installing extensions, changing themes, modifying startup apps, changing power tools, altering sysctl settings or cleaning too aggressively. The fastest recovery is to isolate the last change and revert it.

Symptom | Likely cause | First check | Fix direction
Desktop shell unstable | Broken GNOME extension. | gnome-extensions list --enabled | Disable recent extension.
Text unreadable | Theme mismatch. | GNOME Tweaks theme settings. | Return to default theme.
Login slow | Startup apps or extensions. | User journal and startup apps. | Disable nonessential startup entries.
Battery drains fast | High CPU app, containers, VM, browser. | top, power profile. | Stop heavy workload, set power saver.
System slow after tuning | Bad sysctl or swap behavior. | vmstat 1, sysctl values. | Revert tuning.
Missing files after cleanup | Over-aggressive delete. | Trash, backup, shell history. | Restore from backup if possible.
Diagnostic commands
# GNOME version and session
gnome-shell --version
echo "$XDG_SESSION_TYPE"

# Enabled extensions
gnome-extensions list --enabled

# User session warnings
journalctl --user -p warning --since "1 hour ago"

# GNOME Shell logs
journalctl /usr/bin/gnome-shell --since "1 hour ago"

# Resource usage
top
free -h
df -h

# Swappiness
sysctl vm.swappiness
Rollback flow
Customization issue
│
├── What changed last?
│       ├── extension
│       ├── theme
│       ├── shortcut
│       ├── startup app
│       ├── power tool
│       └── sysctl value
│
├── Disable or revert one change
├── Log out and log back in if needed
├── Check user journal
├── Verify desktop stability
└── Document stable configuration
Safe mode mindset
If desktop is unstable:
1. Switch to terminal if possible
2. Disable recent extensions
3. Return to default theme
4. Remove recent startup app
5. Reboot or log out
6. Restore Timeshift snapshot if needed
Useful reset targets
# Disable one extension
gnome-extensions disable extension-name

# List user autostart entries
ls -lah ~/.config/autostart

# Move suspicious autostart entry away
mkdir -p ~/.config/autostart.disabled
mv ~/.config/autostart/app.desktop ~/.config/autostart.disabled/

# Revert sysctl custom file
sudo mv /etc/sysctl.d/99-custom-swappiness.conf /tmp/
sudo sysctl --system

# Removing the file does not reset the running value:
# set the kernel default back explicitly (or reboot)
sudo sysctl -w vm.swappiness=60
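Moving an autostart entry aside is only reversible if the way back is equally simple. The sketch below wraps both directions in two functions. The names `disable_autostart` and `restore_autostart`, and the environment variables `AUTOSTART_DIR` and `AUTOSTART_DISABLED_DIR`, are hypothetical; the defaults match the directories used in the commands above.

```shell
#!/bin/sh
# Sketch: disable and restore a user autostart entry without deleting it.
# Function names and the two environment variables are hypothetical.

disable_autostart() {
    entry="$1"
    src="${AUTOSTART_DIR:-$HOME/.config/autostart}"
    dst="${AUTOSTART_DISABLED_DIR:-$HOME/.config/autostart.disabled}"
    [ -f "$src/$entry" ] || { echo "no such entry: $src/$entry" >&2; return 1; }
    mkdir -p "$dst" && mv -- "$src/$entry" "$dst/"
}

restore_autostart() {
    entry="$1"
    src="${AUTOSTART_DIR:-$HOME/.config/autostart}"
    dst="${AUTOSTART_DISABLED_DIR:-$HOME/.config/autostart.disabled}"
    [ -f "$dst/$entry" ] || { echo "no disabled entry: $dst/$entry" >&2; return 1; }
    mkdir -p "$src" && mv -- "$dst/$entry" "$src/"
}
```

A typical session is `disable_autostart app.desktop`, then a logout and login to test, then `restore_autostart app.desktop` if the entry turns out not to be the cause. Disabling one entry per test keeps the result attributable to a single change.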
Troubleshooting rule: customization failures are easiest to fix when you change one thing at a time and keep a restore point before major changes.
Final customization and optimization checklist
Customization checklist
[ ] Built-in Settings used before extensions
[ ] GNOME Tweaks installed if needed
[ ] Extension list is short and useful
[ ] Extensions are compatible with GNOME version
[ ] Unused extensions removed
[ ] Theme source is trusted
[ ] Default theme remains available
[ ] Icons remain readable
[ ] Terminal colors remain readable
[ ] Keyboard shortcuts are documented
[ ] No shortcut runs destructive command
[ ] Startup applications are reviewed
[ ] Restore point exists before major desktop changes
Optimization checklist
[ ] Power profile configured
[ ] Battery behavior reviewed
[ ] Heavy startup apps disabled
[ ] Disk usage checked
[ ] APT cache cleaned when needed
[ ] Journal size controlled
[ ] User caches reviewed
[ ] Docker usage reviewed if installed
[ ] Swappiness observed before tuning
[ ] sysctl changes documented
[ ] Performance measured before and after changes
[ ] Cleanup avoids databases and important volumes
Command cheat sheet
# GNOME and extensions
gnome-shell --version
gnome-extensions list
gnome-extensions list --enabled
gnome-extensions disable extension-name
sudo apt install gnome-tweaks
sudo apt install gnome-shell-extension-manager

# Power
powerprofilesctl
powerprofilesctl set power-saver
powerprofilesctl set balanced
sudo apt install tlp
sudo systemctl enable --now tlp
sudo tlp-stat -s

# Memory and swappiness
free -h
swapon --show
vmstat 1
sysctl vm.swappiness
sudo sysctl -w vm.swappiness=10

# Cleanup
df -h
sudo du -xhd1 / 2>/dev/null | sort -h
sudo apt clean
sudo apt autoremove
journalctl --disk-usage
sudo journalctl --vacuum-time=14d
Final rule
A good Ubuntu workstation is comfortable, fast and recoverable.
Customize GNOME carefully, keep extensions minimal, use readable themes, build a keyboard-driven workflow, optimize battery and memory only with evidence, clean disk space safely, and keep rollback options before major changes.
Minimal safe profile
Minimum safe customization profile:
                            - default theme fallback
                            - small extension set
                            - documented shortcuts
                            - reviewed startup apps
                            - power profile selected
                            - disk cleanup routine
                            - no blind sysctl tuning
                            - no dangerous cleanup
                            - restore point before major changes
                            - stable desktop after logout/reboot test
4.1 Ubuntu Security Hardening: SSH, UFW, users, roles, updates, audit, fail2ban and cloud security
Security hardening objective

Ubuntu hardening means reducing the attack surface of a machine while keeping it maintainable. The goal is not to make the server impossible to use. The goal is to control access, reduce exposed ports, keep packages patched, monitor suspicious events, protect secrets, isolate services and keep a clear recovery path.

A secure Ubuntu server is built layer by layer: SSH access, users and sudo, firewall, package updates, service isolation, log visibility, intrusion throttling, cloud network rules, backups and incident procedures.

Security layer | Goal | Main tools | Failure prevented
SSH | Control remote administration. | sshd_config, SSH keys, logs. | Brute force, root login abuse, password compromise.
Firewall | Expose only required ports. | ufw, nftables, cloud security groups. | Unwanted network exposure.
Users and sudo | Apply least privilege. | adduser, usermod, sudoers. | Shared accounts, excessive privileges, poor auditability.
Updates | Patch known vulnerabilities. | apt, unattended upgrades, reboot policy. | Known CVEs left exploitable.
Audit | See what happened. | journalctl, auth.log, auditd, central logs. | Blind incidents and no forensic trail.
Cloud | Control external exposure and identity. | Security groups, IAM, metadata settings, snapshots. | Public services, leaked secrets, weak recovery.
Core rule: hardening must remain observable and reversible. Every security change should be documented, testable and recoverable.
Hardening architecture map
Internet
│
├── DNS
├── CDN / WAF / Load Balancer
└── Cloud security group
│
▼
Ubuntu server
│
├── UFW / nftables
├── SSH daemon
├── system users and sudo
├── systemd services
├── package security updates
├── logs and audit trail
├── fail2ban or rate controls
├── secrets and permissions
└── backups and restore plan
│
▼
Application layer
├── Nginx
├── app runtime
├── database
├── Redis
└── monitoring agent
Security baseline priorities
Priority 1:
                            - SSH keys
                            - no root SSH login
                            - firewall enabled
                            - security updates
                            - backups

                            Priority 2:
                            - fail2ban or equivalent
                            - sudo policy
                            - service users
                            - secret permissions
                            - log review

                            Priority 3:
                            - auditd
                            - central logging
                            - file integrity checks
                            - vulnerability scanning
                            - CIS-style benchmark review

                            Priority 4:
                            - bastion host
                            - VPN-only administration
                            - WAF
                            - immutable images
                            - automated rebuild
SSH hardening: keys, root login, password policy and safe reload

SSH is usually the main administration door. On a public server, weak SSH configuration is one of the first risks to address. The safe baseline is key-based login, no direct root login, no password authentication when keys are validated, and limited users.

Setting | Recommended value | Why
PermitRootLogin | no | Forces named-user login and sudo audit trail.
PasswordAuthentication | no | Blocks password brute-force login.
PubkeyAuthentication | yes | Uses SSH keys.
AllowUsers | Specific admin users only. | Reduces account exposure.
X11Forwarding | no on servers. | Reduces unused features.
MaxAuthTries | Small value such as 3. | Limits repeated authentication attempts.
Generate and install key
# On admin workstation
                            ssh-keygen -t ed25519 -C "admin-server-access"

                            # Copy public key to server
                            ssh-copy-id deploy@server.example.com

                            # Test key login before changing server policy
                            ssh deploy@server.example.com
Safe SSH hardening flow
Open current SSH session
│
├── Create deploy user
├── Add SSH key
├── Test second SSH session
├── Backup sshd_config
├── Apply hardening
├── Validate syntax
├── Restart SSH
├── Test third SSH session
└── Close old session only after success
Server-side SSH configuration
# Create backup
                            sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak.$(date +%Y%m%d-%H%M%S)

                            # Edit configuration
                            sudo vim /etc/ssh/sshd_config

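                            # Recommended baseline (directives inside sshd_config, not shell commands)
                            PermitRootLogin no
                            PasswordAuthentication no
                            PubkeyAuthentication yes
                            X11Forwarding no
                            MaxAuthTries 3
                            AllowUsers deploy

                            # Validate syntax before restart
                            sudo sshd -t

                            # Confirm the effective values: files in /etc/ssh/sshd_config.d/ are
                            # included first and can override sshd_config
                            sudo sshd -T | grep -Ei '^(permitrootlogin|passwordauthentication|pubkeyauthentication) '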

                            # Restart SSH
                            sudo systemctl restart ssh

                            # Check service and logs
                            systemctl status ssh
                            journalctl -u ssh --since "15 min ago"
SSH diagnostic commands
# Service status
                            systemctl status ssh

                            # Listening port (sudo is needed to see the owning process)
                            sudo ss -lntp | grep -E 'sshd|:22\b'

                            # Authentication logs
                            journalctl -u ssh --since today
                            sudo tail -100 /var/log/auth.log

                            # Current sessions
                            who
                            w

                            # Show user key file permissions
                            ls -lah ~/.ssh
                            ls -lah ~/.ssh/authorized_keys
Lockout warning: never disable password login until key login has been tested from a separate terminal.
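The baseline table can be turned into a repeatable check instead of a visual review. The following is a minimal sketch, assuming the lowercase "keyword value" output format of `sshd -T`; the function name check_sshd_policy and the choice of four settings are illustrative, not an official tool.

```shell
#!/usr/bin/env bash
# check_sshd_policy: compare `sshd -T` style output (stdin) with the
# baseline table above. Prints one line per problem; returns 1 if any.
check_sshd_policy() {
    awk '
        BEGIN {
            bad = 0
            want["permitrootlogin"]        = "no"
            want["passwordauthentication"] = "no"
            want["pubkeyauthentication"]   = "yes"
            want["x11forwarding"]          = "no"
        }
        {
            key = tolower($1)
            if (key in want) {
                seen[key] = 1
                if ($2 != want[key]) {
                    printf "MISMATCH %s: got %s, want %s\n", key, $2, want[key]
                    bad = 1
                }
            }
        }
        END {
            # A setting that never appeared is also reported.
            for (key in want)
                if (!(key in seen)) { printf "MISSING %s\n", key; bad = 1 }
            exit bad
        }
    '
}

# Usage on a real server (needs root):
#   sudo sshd -T | check_sshd_policy && echo "baseline OK"
```

Because it reads the effective configuration rather than one file, the check also catches overrides coming from included configuration files.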
UFW firewall: minimal exposure and safe activation

UFW is a simple firewall frontend commonly used on Ubuntu. The baseline is to deny incoming traffic by default, allow outgoing traffic, then open only the required service ports. On cloud servers, UFW complements cloud security groups; it does not replace them.

Port | Service | Exposure rule | Comment
22/tcp | SSH | Restrict by source IP if possible. | Administration path.
80/tcp | HTTP | Open only for web server or redirect. | Often redirects to HTTPS.
443/tcp | HTTPS | Open for public web apps. | Primary web entry point.
5432/tcp | PostgreSQL | Private network only. | Never public unless heavily controlled.
6379/tcp | Redis | Private network only. | Do not expose publicly.
3306/tcp | MySQL/MariaDB | Private network only. | Restrict by source and credentials.
Safe UFW baseline
# Show current status
                            sudo ufw status verbose

                            # Default policy
                            sudo ufw default deny incoming
                            sudo ufw default allow outgoing

                            # Allow SSH before enabling firewall
                            sudo ufw allow OpenSSH

                            # Web server ports if needed
                            sudo ufw allow 80/tcp
                            sudo ufw allow 443/tcp

                            # Enable firewall
                            sudo ufw enable

                            # Verify
                            sudo ufw status verbose
                            sudo ufw status numbered
Firewall decision diagram
New service installed
│
├── Does it need network access?
│       ├── no  -> keep local only
│       └── yes
│
├── Is it public-facing?
│       ├── yes -> allow only required public port
│       └── no
│
├── Is it internal only?
│       ├── yes -> restrict to private CIDR or source IP
│       └── no
│
└── Is exposure documented?
        ├── yes -> add rule
        └── no  -> do not expose
Restrict access by source
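# Allow SSH only from one admin IP (a broader OpenSSH rule, if present,
                            # still admits everyone until removed: sudo ufw delete allow OpenSSH)
                            sudo ufw allow from 203.0.113.10 to any port 22 proto tcp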

                            # Allow PostgreSQL only from app server
                            sudo ufw allow from 10.0.1.25 to any port 5432 proto tcp

                            # Delete rule by number
                            sudo ufw status numbered
                            sudo ufw delete 3

                            # Deny a specific IP (rules match in order, so insert the deny first)
                            sudo ufw insert 1 deny from 198.51.100.55
UFW diagnostics
# UFW status
                            sudo ufw status verbose

                            # Listening ports
                            ss -lntp

                            # Check service locally
                            curl -I http://localhost

                            # Check logs if logging enabled
                            sudo ufw logging on
                            sudo journalctl -k --since "30 min ago" | grep UFW
Cloud warning: a port may be blocked by UFW, cloud security group, NACL, load balancer, application bind address or service config. Check every layer.
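One way to check the host layer is to compare what is actually listening with the ports you decided to expose. The following is a minimal sketch, assuming the column layout of `ss -lntH` (local address in the fourth field); the function name unexpected_listeners and the allow-list passed as arguments are hypothetical.

```shell
#!/usr/bin/env bash
# unexpected_listeners PORT...: read `ss -lntH` output on stdin and print
# "port address" for every TCP listener that is not bound to loopback
# and whose port is not in the allow-list given as arguments.
unexpected_listeners() {
    awk -v allowed="$*" '
        BEGIN {
            n = split(allowed, list, " ")
            for (i = 1; i <= n; i++) ok[list[i]] = 1
        }
        $1 == "LISTEN" {
            port = $4; sub(/.*:/, "", port)      # text after the last colon
            host = $4; sub(/:[^:]*$/, "", host)  # text before the last colon
            if (host ~ /^127\./ || host == "[::1]") next   # loopback only
            if (!(port in ok)) print port, host
        }
    '
}

# Usage, with the public ports from the baseline above as the allow-list:
#   ss -lntH | unexpected_listeners 22 80 443
```

An empty result means every non-loopback listener is on an expected port. It says nothing about UFW or the cloud security group, which still need their own review.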
Users, groups, sudo, service accounts and least privilege

Least privilege means each human and service gets only the permissions needed to do its job. Avoid shared admin accounts, avoid running applications as root, and keep secrets readable only by the users that need them.

Identity type | Recommended practice | Example
Human admin | Named account with sudo if required. | deploy, ops_admin
Application user | Dedicated non-login user. | myapp, www-data
Database user | Application-specific DB account. | myapp_db_user
Root | Avoid direct login. | Use sudo with audit trail.
Shared account | Avoid. | Hard to audit and revoke safely.
User and group commands
# Create admin user
                            sudo adduser deploy
                            sudo usermod -aG sudo deploy

                            # Create service user without login shell
                            sudo adduser --system --group --home /srv/myapp myapp

                            # Show user identity
                            id deploy
                            groups deploy

                            # Show sudo permissions
                            sudo -l

                            # Edit sudoers safely
                            sudo visudo

                            # Add sudoers file safely
                            sudo visudo -f /etc/sudoers.d/deploy
Least privilege model
Human admin
│
├── SSH key login
├── sudo for admin actions
└── no direct root login

Application service
│
├── dedicated user
├── limited filesystem access
├── systemd service unit
└── no shell login if not needed

Secrets
│
├── owned by service user or root
├── mode 600 or 640
├── not world-readable
└── not committed to git
Secret and file permissions
# Private key
                            chmod 600 /home/deploy/.ssh/id_ed25519

                            # SSH directory
                            chmod 700 /home/deploy/.ssh

                            # Application directory (run the recursive commands first)
                            sudo chown -R myapp:www-data /srv/myapp
                            sudo chmod -R u=rwX,g=rX,o= /srv/myapp

                            # Application env file (set last so the recursive commands do not override it)
                            sudo chown root:myapp /srv/myapp/.env
                            sudo chmod 640 /srv/myapp/.env
Account review commands
# List users
                            cut -d: -f1 /etc/passwd

                            # Show users with shell access
                            grep -E "/bin/bash|/bin/sh|/usr/bin/zsh" /etc/passwd

                            # Show sudo group members
                            getent group sudo

                            # Show recent sudo usage
                            sudo grep sudo /var/log/auth.log | tail -100
Production rule: a service should not need root to run. Bind privileged ports through Nginx or systemd capabilities rather than running the application as root.
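The "not world-readable" rule for secrets is easy to verify mechanically. The following is a minimal sketch using `find` with a symbolic permission test; the function name world_readable_files is hypothetical, and /srv/myapp is the example path used in this section.

```shell
#!/usr/bin/env bash
# world_readable_files DIR: list regular files under DIR that "other"
# users can read. An empty result means no file is world-readable.
world_readable_files() {
    find "$1" -type f -perm -o=r -print | sort
}

# Usage (run as a user allowed to traverse the tree):
#   world_readable_files /srv/myapp
```

Any path printed for a directory that holds secrets is a candidate for `chmod o=` after checking who legitimately needs to read it.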
Security updates, patch windows and reboot policy

Security updates close known vulnerabilities. On Ubuntu, patching must include package updates, service restarts, kernel reboots when required, and validation after patching. Production teams should define standard patch windows and emergency patch paths.

Patch model | Best for | Advantage | Risk
Manual patching | Critical systems with maintenance windows. | Control and validation. | Can be delayed.
Unattended security updates | Standard servers. | Fast CVE response. | Needs restart and reboot policy.
Golden image rebuild | Cloud fleets and stateless systems. | Reproducible and rollback-friendly. | Requires image pipeline.
Rolling patching | HA clusters. | Minimizes downtime. | Requires health checks and drain logic.
Patch commands
# Refresh metadata
                            sudo apt update

                            # Show upgradeable packages
                            apt list --upgradable

                            # Apply upgrades
                            sudo apt upgrade

                            # Full upgrade with dependency changes
                            sudo apt full-upgrade

                            # Remove obsolete packages
                            sudo apt autoremove

                            # Check reboot requirement
                            test -f /var/run/reboot-required && cat /var/run/reboot-required
                            cat /var/run/reboot-required.pkgs 2>/dev/null
Patch workflow diagram
Security update required
│
├── Identify affected packages
├── Check staging compatibility
├── Snapshot or backup
├── Apply apt updates
├── Restart affected services
├── Reboot if required
├── Validate application
├── Check logs
└── Document package changes
Unattended upgrades
# Install unattended upgrades
                            sudo apt install unattended-upgrades

                            # Enable basic automatic security updates
                            sudo dpkg-reconfigure --priority=low unattended-upgrades

                            # Config files (paths to review, not commands)
                            #   /etc/apt/apt.conf.d/20auto-upgrades
                            #   /etc/apt/apt.conf.d/50unattended-upgrades

                            # Logs
                            sudo less /var/log/unattended-upgrades/unattended-upgrades.log
Post-patch validation
# Failed services
                            systemctl --failed

                            # Warnings since patch
                            journalctl -p warning --since "30 min ago"

                            # Listening ports
                            ss -lntp

                            # Application smoke test
                            curl -I https://example.com

                            # Confirm kernel after reboot
                            uname -a
Patch risk: installing a kernel security update without rebooting leaves the old kernel running. Track reboot-required status.
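Tracking the reboot-required state is simple to script for a monitoring job or a login banner. The following is a minimal sketch, assuming the flag file /var/run/reboot-required and its companion .pkgs file shown in the patch commands; the function name reboot_status and the non-zero return convention are illustrative choices.

```shell
#!/usr/bin/env bash
# reboot_status [FLAG_FILE]: print whether a reboot is pending and which
# packages asked for it. Returns 1 when a reboot is required, 0 otherwise.
reboot_status() {
    local flag="${1:-/var/run/reboot-required}"
    local pkgs="${flag}.pkgs"
    if [ -f "$flag" ]; then
        echo "REBOOT REQUIRED"
        # The .pkgs file can list a package more than once; show each once.
        [ -f "$pkgs" ] && sort -u "$pkgs" | sed 's/^/  - /'
        return 1
    fi
    echo "no reboot flag"
}

# Usage:
#   reboot_status || echo "schedule a reboot in the next patch window"
```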
fail2ban: throttling brute-force attempts and noisy clients

fail2ban watches logs and temporarily bans IP addresses that match suspicious patterns, such as repeated SSH authentication failures. It is not a replacement for key-based SSH and firewall rules, but it is useful as an extra layer against brute-force noise.

Component | Meaning | Example
Jail | Protection rule for a service. | sshd
Filter | Log pattern that detects failures. | SSH failed login regex.
Action | What to do when matched. | Ban IP with firewall.
findtime | Time window for counting failures. | 10m
maxretry | Number of failures before ban. | 5
bantime | Ban duration. | 1h
Install and baseline
# Install
                            sudo apt update
                            sudo apt install fail2ban

                            # Create a local override file; keep only your own changes in it,
                            # because jail.conf can be replaced by package updates
                            sudo vim /etc/fail2ban/jail.local

                            # Restart and enable
                            sudo systemctl enable fail2ban
                            sudo systemctl restart fail2ban

                            # Check status
                            sudo systemctl status fail2ban
Example SSH jail
[sshd]
                            enabled = true
                            port = ssh
                            filter = sshd
                            logpath = %(sshd_log)s
                            maxretry = 5
                            findtime = 10m
                            bantime = 1h
fail2ban operations
# Overall status
                            sudo fail2ban-client status

                            # Jail status
                            sudo fail2ban-client status sshd

                            # Ban an IP manually
                            sudo fail2ban-client set sshd banip 198.51.100.10

                            # Unban an IP
                            sudo fail2ban-client set sshd unbanip 198.51.100.10

                            # Logs
                            sudo journalctl -u fail2ban --since today
                            sudo tail -100 /var/log/fail2ban.log
Layered protection
SSH security layers
│
├── SSH keys
├── no root login
├── no password auth
├── AllowUsers
├── UFW source restriction
├── fail2ban
└── bastion or VPN for stricter environments
Practical rule: fail2ban reduces noise and slows attackers, but real SSH security starts with key-based access and minimal exposure.
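To see what the maxretry threshold means in practice, the counting step can be reproduced by hand on an auth log. The following is a simplified sketch, not fail2ban's actual implementation: it counts "Failed password" lines per source address and ignores the findtime window. The function name failed_ssh_sources is hypothetical.

```shell
#!/usr/bin/env bash
# failed_ssh_sources [MAXRETRY]: read auth log lines on stdin and print
# "count address" for each source with at least MAXRETRY failed password
# attempts (default 5), highest count first.
failed_ssh_sources() {
    local maxretry="${1:-5}"
    awk -v max="$maxretry" '
        /Failed password/ {
            # The source address is the word that follows "from".
            for (i = 1; i < NF; i++)
                if ($i == "from") { count[$(i + 1)]++; break }
        }
        END { for (ip in count) if (count[ip] >= max) print count[ip], ip }
    ' | sort -k1,1nr -k2,2
}

# Usage:
#   sudo cat /var/log/auth.log | failed_ssh_sources 5
```

Addresses that appear here but not in `fail2ban-client status sshd` usually spread their attempts over more than one findtime window.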
Audit, logs, detection and security visibility

Hardening without visibility is incomplete. You need to know who logged in, who used sudo, which services failed, what ports are listening, which packages changed, and whether suspicious authentication events occurred.

Question | Command / source | Why it matters
Who logged in? | last, who, w | Session visibility.
Who used sudo? | /var/log/auth.log | Privilege escalation audit.
Which SSH attempts failed? | journalctl -u ssh | Brute-force or misconfiguration detection.
Which packages changed? | /var/log/apt/history.log | Patch and change traceability.
Which services failed? | systemctl --failed | Operational health.
Which ports are open? | ss -lntp | Exposure check.
Security log commands
# SSH logs
                            journalctl -u ssh --since today

                            # Authentication logs
                            sudo tail -200 /var/log/auth.log

                            # Failed SSH attempts
                            sudo grep -i "failed password" /var/log/auth.log | tail -100

                            # Sudo usage
                            sudo grep -i "sudo" /var/log/auth.log | tail -100

                            # Recent logins
                            last -a | head -30

                            # Current sessions
                            who
                            w
Audit architecture
Ubuntu host
│
├── journald
├── auth.log
├── apt history
├── service logs
├── firewall logs
├── fail2ban logs
└── application logs
│
▼
Central logging
│
├── CloudWatch
├── ELK / OpenSearch
├── Loki
├── SIEM
└── long-term archive
auditd baseline
# Install audit daemon
                            sudo apt install auditd audispd-plugins

                            # Enable service
                            sudo systemctl enable auditd
                            sudo systemctl start auditd

                            # Status
                            sudo systemctl status auditd

                            # Search audit logs
                            sudo ausearch -m USER_LOGIN
                            sudo ausearch -m USER_CMD
                            sudo aureport --summary
Security review snapshot
echo "== USERS WITH SHELL =="
                            grep -E "/bin/bash|/bin/sh|/usr/bin/zsh" /etc/passwd

                            echo "== SUDO GROUP =="
                            getent group sudo

                            echo "== OPEN PORTS =="
                            ss -lntp

                            echo "== FAILED UNITS =="
                            systemctl --failed

                            echo "== RECENT SSH LOGS =="
                            journalctl -u ssh --since "24 hours ago" --no-pager
Detection rule: collect logs before you need them. During an incident, missing logs cannot be reconstructed.
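The "who used sudo" question can be answered with a per-user summary rather than a raw grep. The following is a minimal sketch, assuming sudo's usual log line shape ("sudo: USER : ... COMMAND=..."); the function name sudo_commands_by_user is hypothetical.

```shell
#!/usr/bin/env bash
# sudo_commands_by_user: read auth log lines on stdin and print
# "count user" for every user who ran a command through sudo.
sudo_commands_by_user() {
    awk '
        /COMMAND=/ {
            # The invoking user is the word that follows the "sudo:" tag.
            for (i = 1; i < NF; i++)
                if ($i == "sudo:") { count[$(i + 1)]++; break }
        }
        END { for (user in count) print count[user], user }
    ' | sort -k1,1nr -k2,2
}

# Usage:
#   sudo cat /var/log/auth.log | sudo_commands_by_user
```

An account that appears here without being an expected administrator is worth checking against `getent group sudo`.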
Cloud security: security groups, metadata, IAM, snapshots and bastion design

On cloud servers, Ubuntu security is shared between the operating system and the cloud perimeter. A safe design uses security groups, private subnets, IAM roles, metadata protection, snapshots, central logs and restricted administration paths.

Cloud control | Purpose | Ubuntu-side complement
Security group | Cloud firewall at instance or interface level. | UFW local firewall.
Private subnet | Keep databases and internal services non-public. | Bind services to private IP or localhost.
Bastion host | Controlled admin entry point. | SSH restricted to bastion IP.
IAM role | Grant cloud API permissions without static keys. | Avoid storing cloud keys on disk.
Metadata service controls | Reduce credential exposure risk. | Limit local process access and use least privilege.
Snapshots | Rollback and disaster recovery. | Test restore and document recovery.
Cloud logs | Centralize evidence and monitoring. | Forward Ubuntu logs and app logs.
Cloud exposure model
Public Internet
│
├── HTTPS only
▼
Load balancer / WAF
│
├── forwards to app subnet
▼
Application server
│
├── UFW allows LB source only
├── SSH allowed from bastion only
└── app talks to DB privately
│
▼
Database server
├── no public IP
├── private subnet
└── port allowed only from app server
Cloud hardening checklist
[ ] Only required public ports are open
                            [ ] SSH is restricted by source IP or bastion
                            [ ] Database has no public exposure
                            [ ] Redis has no public exposure
                            [ ] Security groups are documented
                            [ ] UFW rules match cloud security model
                            [ ] IAM role uses least privilege
                            [ ] No static cloud keys in home directories
                            [ ] Instance metadata policy is reviewed
                            [ ] Snapshots are scheduled
                            [ ] Restore has been tested
                            [ ] Logs are shipped centrally
                            [ ] Monitoring alerts are configured
Cloud diagnostic commands
# Local listening ports
                            ss -lntp

                            # Local firewall
                            sudo ufw status verbose

                            # Instance view of routes and IPs
                            ip a
                            ip r

                            # Check outbound cloud metadata access if policy allows it
                            curl -s --max-time 2 http://169.254.169.254/ || true

                            # Check the web service locally from the server
                            curl -I http://localhost

                            # Check logs
                            journalctl -p warning --since "30 min ago"
Cloud rule: do not rely on one firewall layer only. Use security groups and UFW together for critical servers.
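The "instance metadata policy is reviewed" item can be checked from inside the instance. The following is a minimal sketch, assuming an AWS EC2 instance, where the metadata endpoint answers an unauthenticated request with 401 once session tokens (IMDSv2) are required. The function names classify_imds_status and probe_imds are hypothetical, and other clouds behave differently.

```shell
#!/usr/bin/env bash
# classify_imds_status CODE: interpret the HTTP status returned by an
# unauthenticated request to the metadata endpoint (AWS EC2 assumption).
classify_imds_status() {
    case "$1" in
        200) echo "IMDSv1 answers without a token: consider requiring IMDSv2" ;;
        401) echo "token required: IMDSv2 appears to be enforced" ;;
        000) echo "metadata endpoint not reachable" ;;
        *)   echo "unexpected status $1: check manually" ;;
    esac
}

# probe_imds: send the request and classify the result.
# curl prints 000 as the status code when no HTTP response was received.
probe_imds() {
    local code
    code=$(curl -s -o /dev/null --max-time 2 -w '%{http_code}' \
        http://169.254.169.254/latest/meta-data/ || true)
    classify_imds_status "${code:-000}"
}

# Usage (only where policy allows querying the metadata service):
#   probe_imds
```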
Security incident response: brute force, exposed port, compromise suspicion

Security incidents must be handled carefully. The first objective is to preserve evidence and stop further damage. Avoid making random changes before collecting logs, current sessions, open ports and process state.

Incident | Immediate checks | Containment
SSH brute force | Auth logs, fail2ban, source IPs. | Restrict SSH, disable passwords, ban sources.
Unexpected open port | ss -lntp, service status, firewall. | Stop service or close firewall rule.
Suspicious user | /etc/passwd, sudo group, auth logs. | Lock account, preserve logs.
Package tampering | Apt history, modified repositories. | Disable unknown repos, rebuild if needed.
Possible compromise | Processes, ports, cron, users, logs. | Isolate host, snapshot disk, rotate credentials.
Secret exposure | Access logs, shell history, app logs. | Rotate keys, revoke tokens, audit access.
First response commands
# Current users and sessions
                            who
                            w
                            last -a | head -50

                            # Listening ports and processes
                            ss -lntp
                            ps aux --sort=-%cpu | head -30

                            # Failed services
                            systemctl --failed

                            # Recent auth activity
                            sudo tail -300 /var/log/auth.log

                            # SSH logs
                            journalctl -u ssh --since "24 hours ago"

                            # Recent package changes
                            less /var/log/apt/history.log
Incident response flow
Security alert
│
├── Preserve evidence
│       ├── logs
│       ├── sessions
│       ├── ports
│       └── process list
│
├── Determine scope
│       ├── one account
│       ├── one service
│       ├── one host
│       └── multiple systems
│
├── Contain
│       ├── firewall rule
│       ├── disable account
│       ├── stop service
│       └── isolate instance
│
├── Eradicate
│       ├── patch
│       ├── remove access
│       ├── rotate secrets
│       └── rebuild if needed
│
└── Recover
        ├── restore service
        ├── validate logs
        ├── monitor closely
        └── write postmortem
Credential rotation checklist
[ ] SSH keys reviewed
                            [ ] Unknown keys removed
                            [ ] Sudo users reviewed
                            [ ] Application secrets rotated
                            [ ] Database passwords rotated
                            [ ] Cloud API keys revoked or rotated
                            [ ] CI/CD tokens rotated
                            [ ] Webhook secrets rotated
                            [ ] TLS private key checked
                            [ ] Backup access reviewed
Compromise rule: if root compromise is credible, rebuilding from a trusted image is usually safer than trying to clean the machine manually.
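The "preserve evidence" step is easier to follow under pressure when it is already scripted. The following is a minimal sketch that saves the output of the first response commands into one directory and records checksums. The function names capture and collect_evidence and the file layout are hypothetical, and this is a starting point, not a forensic procedure.

```shell
#!/usr/bin/env bash
# capture OUTDIR NAME CMD [ARGS...]: save a command's output to
# OUTDIR/NAME.txt. A missing or failing command never aborts the run.
capture() {
    local out="$1" name="$2"
    shift 2
    if command -v "$1" >/dev/null 2>&1; then
        "$@" > "$out/$name.txt" 2>&1 || true
    else
        echo "command not found: $1" > "$out/$name.txt"
    fi
}

# collect_evidence OUTDIR: snapshot sessions, logins, ports, processes
# and failed units, then write SHA256SUMS for later integrity checks.
collect_evidence() {
    local out="$1"
    mkdir -p "$out" && chmod 700 "$out"
    date -u +%Y-%m-%dT%H:%M:%SZ > "$out/collected_at.txt"
    capture "$out" sessions who
    capture "$out" logins last -a
    capture "$out" ports ss -lntp
    capture "$out" processes ps aux
    capture "$out" failed_units systemctl --failed --no-pager
    (cd "$out" && sha256sum -- *.txt > SHA256SUMS)
}

# Usage (run as root so that ss can show process names):
#   collect_evidence "/root/incident-$(date +%Y%m%d-%H%M%S)"
```

Copy the resulting directory off the host before containment changes anything; `sha256sum -c SHA256SUMS` can later confirm that the files were not altered.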
Final hardening checklist and command cheat sheet
Ubuntu security baseline checklist
[ ] Ubuntu LTS is used
                            [ ] Packages are updated
                            [ ] Reboot-required state is checked
                            [ ] Named admin user exists
                            [ ] Root SSH login is disabled
                            [ ] SSH key login is validated
                            [ ] Password SSH login is disabled
                            [ ] UFW default deny incoming is enabled
                            [ ] Only required ports are open
                            [ ] Database ports are private only
                            [ ] Redis ports are private only
                            [ ] fail2ban is installed if public SSH exists
                            [ ] Sudo group is reviewed
                            [ ] Service users are non-root
                            [ ] Secrets are not world-readable
                            [ ] Logs are reviewed and centralized if possible
                            [ ] Backups or snapshots exist
                            [ ] Restore has been tested
                            [ ] Cloud security groups are minimal
                            [ ] Incident response procedure exists
Quick security snapshot
echo "== OS =="
                            lsb_release -a

                            echo "== REBOOT REQUIRED =="
                            test -f /var/run/reboot-required && cat /var/run/reboot-required || echo "no reboot flag"

                            echo "== UFW =="
                            sudo ufw status verbose

                            echo "== OPEN PORTS =="
                            ss -lntp

                            echo "== SUDO USERS =="
                            getent group sudo

                            echo "== SSH LOGS =="
                            journalctl -u ssh --since "24 hours ago" --no-pager | tail -100
Command cheat sheet
# SSH
                            sudo sshd -t
                            sudo systemctl restart ssh
                            journalctl -u ssh --since today

                            # Firewall
                            sudo ufw status verbose
                            sudo ufw allow OpenSSH
                            sudo ufw allow 443/tcp
                            sudo ufw enable

                            # Users
                            id deploy
                            getent group sudo
                            sudo visudo
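                            sudo passwd -l username              # locks the password only; SSH key logins still work
                            sudo usermod --expiredate 1 username # disables the account entirely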

                            # Updates
                            sudo apt update
                            apt list --upgradable
                            sudo apt upgrade
                            test -f /var/run/reboot-required && cat /var/run/reboot-required

                            # fail2ban
                            sudo fail2ban-client status
                            sudo fail2ban-client status sshd

                            # Logs
                            sudo tail -100 /var/log/auth.log
                            journalctl -p warning --since today
                            systemctl --failed
Final rule
Ubuntu hardening is a living process.
Secure the access path, minimize exposed ports, patch regularly, run services with least privilege, monitor logs, keep backups, test recovery, and document every exception.
Minimal secure server profile
Minimum secure Ubuntu server:
                            - Ubuntu LTS
                            - SSH key access only
                            - no root SSH login
                            - UFW enabled
                            - only required ports open
                            - security updates applied
                            - non-root service users
                            - secrets protected
                            - logs available
                            - backups tested
                            - cloud perimeter restricted
4.2 Ubuntu Performance & Robustness: CPU, RAM, IO, kernel, tuning, monitoring and production stability
Performance and robustness objective

Ubuntu performance engineering is not random tuning. It is a disciplined loop: observe real metrics, identify the bottleneck, make one controlled change, measure again, document the result, and keep rollback possible.

Production robustness comes from stable LTS releases, predictable package updates, systemd service supervision, good logs, monitoring, disk hygiene, firewalling, backup, capacity planning and incident runbooks. Ubuntu is considered stable in production because these operational primitives are mature, widely documented and automation-friendly.

Layer | Observe | Typical bottleneck | Main tools
CPU | Load average, CPU %, run queue, per-process usage. | Too many workers, hot loop, compression, TLS, DB query CPU. | top, htop, pidstat, mpstat
Memory | Used RAM, available RAM, swap, OOM kills. | Memory leak, cache pressure, too many processes. | free, vmstat, journalctl
Disk / IO | IO wait, disk latency, queue depth, filesystem usage. | Slow volume, log growth, DB writes, Docker layers. | iostat, iotop, df, du
Network | Ports, connections, packet errors, latency, throughput. | Firewall, DNS, saturation, SYN flood, bad route. | ss, ip, mtr, nload
Services | systemd state, restarts, logs, health checks. | Crash loop, bad config, missing dependency. | systemctl, journalctl
Core method: measure first, tune second. Most bad tuning comes from changing kernel parameters before proving where the bottleneck is.
Performance investigation map
Application is slow
│
├── CPU saturated?
│       ├── yes -> top, htop, pidstat, app profiler
│       └── no
│
├── Memory pressure?
│       ├── yes -> free, vmstat, OOM logs, process RSS
│       └── no
│
├── IO wait high?
│       ├── yes -> iostat, iotop, df, DB/log writes
│       └── no
│
├── Network slow?
│       ├── yes -> ss, ip, mtr, DNS, firewall
│       └── no
│
├── Service unstable?
│       ├── yes -> systemctl, journalctl, restart policy
│       └── no
│
└── Application bottleneck?
        ├── DB query
        ├── external API
        ├── lock contention
        ├── cache miss
        └── bad algorithm
Install performance toolkit
sudo apt update

                            sudo apt install -y \
                            sysstat \
                            iotop \
                            htop \
                            nload \
                            iftop \
                            mtr-tiny \
                            dstat \
                            strace \
                            lsof \
                            curl \
                            dnsutils
Production warning: tools such as strace, perf or heavy tracing can add overhead. Use carefully on busy production systems.
CPU: load average, saturation, processes and worker sizing

CPU performance issues usually appear as high load average, high user CPU, high system CPU, excessive context switching or too many runnable processes. On web servers, wrong worker counts can create CPU contention or memory pressure.

Metric | Meaning | Warning sign | Command
Load average | Runnable plus uninterruptible (typically IO-waiting) tasks, averaged over 1, 5 and 15 min. | Load consistently above CPU core count. | uptime
User CPU | Application code CPU usage. | One process dominates. | top, pidstat
System CPU | Kernel work. | High network, IO, syscall overhead. | mpstat
IO wait | CPU idle while waiting on disk. | App slow but CPU not busy. | iostat, top
Steal time | VM CPU time taken by the hypervisor. | Cloud instance contention. | mpstat
Context switches | Task switching overhead. | Too many workers or threads. | vmstat
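The load-average warning sign in the table above can be checked with a small helper. This is a minimal sketch, assuming a Linux host with /proc/loadavg and nproc; the function names load_per_core and load_status and the 1.00 threshold are illustrative choices, not Ubuntu defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: divide a load average by a core count.
# $1 = load average (e.g. 3.20), $2 = number of CPU cores.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Hypothetical helper: print "saturated" when load per core exceeds 1.00
# (an illustrative threshold), otherwise "ok".
load_status() {
    awk -v load="$1" -v cores="$2" \
        'BEGIN { if (load / cores > 1.0) print "saturated"; else print "ok" }'
}

# Live usage: the first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ]; then
    read -r load1 _ < /proc/loadavg
    cores=$(nproc)
    echo "load per core: $(load_per_core "$load1" "$cores") ($(load_status "$load1" "$cores"))"
fi
```

A single reading above 1.00 per core is not an incident by itself; the table's warning sign is load that stays above the core count, so compare the 5 and 15 minute values as well.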
CPU commands
# Load average and uptime
                            uptime

                            # Interactive CPU/process view
                            top
                            htop

                            # Per-CPU statistics
                            mpstat -P ALL 1

                            # Per-process CPU every second
                            pidstat -u 1

                            # Top processes by CPU usage
                            ps aux --sort=-%cpu | head -30

                            # Threads of a process
                            ps -L -p PID -o pid,tid,pcpu,pmem,comm
CPU diagnosis flow
High CPU or high load
│
├── Is load higher than CPU core count?
│       └── uptime, nproc
│
├── Is CPU user, system, iowait or steal?
│       └── top, mpstat
│
├── Which process dominates?
│       └── ps aux --sort=-%cpu
│
├── Is it app code, DB, web server, backup, cron?
│       └── systemctl, logs, cron
│
├── Did traffic increase?
│       └── nginx logs, app metrics
│
└── Did a deployment or package update happen?
        └── deploy logs, apt history
Worker sizing examples
Gunicorn starting point:
                            workers = (2 * CPU cores) + 1

                            Example:
                            2 vCPU -> 5 workers
                            4 vCPU -> 9 workers

                            But verify with:
                            - memory per worker
                            - request latency
                            - DB connection limit
                            - CPU saturation
                            - queue time
                            - error rate

                            Celery:
                            - CPU-bound tasks: concurrency near CPU cores
                            - IO-bound tasks: higher concurrency can help
                            - DB-heavy tasks: limit by database capacity
CPU rule: more workers are not always better. Too many workers can increase memory usage, DB connections, context switches and latency.
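The starting-point formula and the memory check can be combined in one calculation. The sketch below is an assumption-laden illustration: the function name gunicorn_workers, the RAM budget and the per-worker memory figure are hypothetical inputs that you would replace with measured values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Gunicorn starting point (2 * cores) + 1,
# capped by how many workers fit in a memory budget.
# $1 = CPU cores
# $2 = RAM budget for the app in MB (assumed figure)
# $3 = measured RSS per worker in MB (assumed figure; measure it with ps)
gunicorn_workers() {
    local cores=$1 budget_mb=$2 per_worker_mb=$3
    local by_cpu=$(( 2 * cores + 1 ))
    local by_mem=$(( budget_mb / per_worker_mb ))
    # Always allow at least one worker.
    if [ "$by_mem" -lt 1 ]; then by_mem=1; fi
    # The smaller of the two limits wins.
    if [ "$by_mem" -lt "$by_cpu" ]; then echo "$by_mem"; else echo "$by_cpu"; fi
}

# Example: 4 vCPU, 1024 MB budget, 300 MB per worker.
gunicorn_workers 4 1024 300
```

With these example inputs the memory budget, not the CPU formula, is the limiting factor. The result is still only a starting point to validate against latency, DB connection limits and error rate.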
Memory: RAM, cache, swap, OOM killer and service limits

Linux uses free memory for cache, so "used memory" is not automatically a problem. The important indicators are available memory, swap activity, OOM kills, growing process RSS, and whether memory pressure correlates with latency or crashes.

Metric | Meaning | Bad sign | Command
Available RAM | Memory that can be used without heavy reclaim. | Very low for a sustained period. | free -h
Swap used | Memory pages moved to disk. | Growing swap + latency. | swapon --show
si / so | Swap in / swap out activity. | Non-zero under load. | vmstat 1
RSS | Resident memory per process. | Process grows without bound. | ps, top
OOM kill | Kernel killed process due to memory exhaustion. | Service disappears suddenly. | journalctl -k
Memory commands
# Memory overview
                            free -h

                            # Swap
                            swapon --show

                            # VM activity
                            vmstat 1

                            # Top memory processes
                            ps aux --sort=-%mem | head -30

                            # Kernel OOM events
                            journalctl -k --since today | grep -i -E "oom|killed process"

                            # Memory per service process
                            systemctl status myservice
                            ps -o pid,rss,vsz,cmd -C gunicorn
Memory diagnosis flow
Service slow or killed
│
├── Is available memory low?
│       └── free -h
│
├── Is swap active?
│       └── swapon --show, vmstat 1
│
├── Any OOM kills?
│       └── journalctl -k | grep -i oom
│
├── Which process uses memory?
│       └── ps aux --sort=-%mem
│
├── Did memory grow after deploy?
│       └── compare metrics before/after
│
└── Can service be limited?
        └── systemd MemoryMax, worker count, app config
systemd memory limit example
# /etc/systemd/system/myapp.service.d/limits.conf
                            [Service]
                            MemoryMax=1G
                            MemoryHigh=800M
                            Restart=on-failure
                            RestartSec=5

                            # Apply
                            sudo systemctl daemon-reload
                            sudo systemctl restart myapp
                            systemctl status myapp
Swap policy
Context | Swap recommendation | Reason
Small VM | Small swap can prevent abrupt OOM. | Graceful degradation.
Latency-sensitive DB | Avoid heavy swap activity. | Swap can destroy latency.
Batch worker | Some swap acceptable. | Throughput may tolerate latency.
Memory rule: swap used is not always fatal. Active swapping under load is the real warning sign.
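One way to tell swap that is merely in use from active swapping is to count vmstat samples whose si or so columns are non-zero. This is a rough sketch, assuming the default vmstat layout where si and so are the 7th and 8th columns; the function name swap_active_samples is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `vmstat` output on stdin and print how many
# samples show swap-in (si) or swap-out (so) activity.
# Assumes the default vmstat column order: si = column 7, so = column 8.
# The first data row is skipped because it reports averages since boot.
swap_active_samples() {
    awk '$1 ~ /^[0-9]+$/ {
             n++
             if (n > 1 && ($7 > 0 || $8 > 0)) active++
         }
         END { print active + 0 }'
}

# Usage (not run here, since vmstat may be missing until sysstat/procps is installed):
#   vmstat 1 6 | swap_active_samples
```

A result of 0 over several runs under load suggests swap is parked rather than active. Repeated non-zero counts that line up with latency spikes are the warning sign described above.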
Disk and IO: latency, throughput, filesystem usage and log growth

Disk IO bottlenecks often look like application slowness. CPU may appear idle while requests are stuck waiting for disk. Common causes: database writes, slow cloud volume, Docker logs, journal growth, backups, missing indexes, swap activity or full filesystem.

Symptom | Likely cause | Verification | Correction
High app latency | IO wait or DB disk pressure. | iostat -xz 1 | Faster disk, batching, DB tuning.
Disk full | Logs, Docker, uploads, backups. | df -h, du -sh | Retention, cleanup, bigger volume.
Swap activity | RAM shortage. | vmstat 1 | Reduce workers, add RAM, tune app.
Docker grows fast | Images, containers, logs, volumes. | docker system df | Log rotation, prune carefully.
Journal too large | Systemd journal retention unmanaged. | journalctl --disk-usage | Vacuum or configure retention.
Disk and IO commands
# Filesystem usage
                            df -h

                            # Largest top-level directories
                            sudo du -sh /* 2>/dev/null | sort -h

                            # Common growth areas
                            sudo du -sh /var/log/*
                            # /var/lib/docker is readable by root only, so expand the glob inside a root shell
                            sudo sh -c 'du -sh /var/lib/docker/*'
                            sudo du -sh /var/lib/postgresql/* 2>/dev/null

                            # Block devices
                            lsblk -f
                            findmnt

                            # IO statistics
                            iostat -xz 1

                            # IO by process
                            sudo iotop -o

                            # Journal usage
                            journalctl --disk-usage
IO bottleneck flow
Application latency spike
│
├── Is CPU iowait high?
│       └── top, iostat
│
├── Which disk is busy?
│       └── iostat -xz 1
│
├── Which process writes?
│       └── iotop -o
│
├── Is filesystem near full?
│       └── df -h
│
├── Did logs or Docker grow?
│       └── du -sh /var/log /var/lib/docker
│
└── Is database the writer?
        ├── check slow queries
        ├── check checkpoints
        └── check volume IOPS
Safe cleanup examples
# Clean apt cache
                            sudo apt clean

                            # Remove unused packages
                            sudo apt autoremove

                            # Vacuum systemd journal by time
                            sudo journalctl --vacuum-time=14d

                            # Vacuum systemd journal by size
                            sudo journalctl --vacuum-size=1G

                            # Docker usage
                            docker system df

                            # Docker cleanup - careful on production
                            docker image prune
                            docker container prune
Preventive controls
Prevent disk incidents:
                            - alert when filesystem > 80%
                            - separate /var for log-heavy servers
                            - configure logrotate
                            - configure Docker log rotation
                            - monitor journal size
                            - monitor database volume
                            - keep backup volume separate
                            - test volume resize procedure
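The first control in the list, alerting when a filesystem passes 80%, can be approximated locally before a monitoring stack exists. This is a minimal sketch, assuming POSIX-format df output (`df -P`) and mount points without spaces; the function name fs_over_threshold is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `df -P` output on stdin and print
# "mountpoint percent" for every filesystem above the threshold.
# $1 = threshold in percent (e.g. 80).
# Assumes the POSIX layout: column 5 = Capacity, column 6 = Mounted on.
fs_over_threshold() {
    local threshold=$1
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)          # "85%" -> "85"
        if (use + 0 > t) print $6, use
    }'
}

# Usage: local filesystems above 80%, ignoring tmpfs mounts.
df -P -x tmpfs -x devtmpfs | fs_over_threshold 80
```

The same function can read `df -Pi` to report inode usage, since that output keeps the percentage in the fifth column. No output means nothing is above the threshold.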
Critical rule: never manually delete unknown files inside database directories. Use database-native cleanup, backups, vacuum or retention procedures.
Network performance: ports, connections, latency, packet loss and throughput

Network issues may appear as API latency, timeouts, intermittent failures, failed database connections, slow downloads or connection storms. Diagnose from local socket state outward: listening ports, established connections, interface errors, DNS, routing, packet loss and remote latency.

Question | Command | What to look for
Which ports listen? | ss -lntp | Expected services only.
How many connections? | ss -s | Established, time-wait, orphaned sockets.
Interface errors? | ip -s link | RX/TX errors, dropped packets.
DNS working? | dig, resolvectl | Resolver latency and correctness.
Packet loss or route issue? | mtr -rw | Loss, latency, bad hop.
Bandwidth usage? | nload, iftop | Unexpected egress or ingress.
Network commands
# Socket summary
                            ss -s

                            # Listening TCP ports
                            ss -lntp

                            # Established connections
                            ss -antp

                            # Network interfaces and counters
                            ip -s link

                            # Routes
                            ip r

                            # DNS
                            resolvectl status
                            dig example.com

                            # Latency and packet loss
                            ping -c 5 1.1.1.1
                            mtr -rw example.com

                            # Live bandwidth
                            nload
                            sudo iftop
Network diagnosis flow
Network latency or timeout
│
├── Local service listening?
│       └── ss -lntp
│
├── Firewall blocking?
│       └── ufw status, cloud security group
│
├── DNS slow or wrong?
│       └── dig, resolvectl
│
├── Route correct?
│       └── ip r
│
├── Packet loss?
│       └── ping, mtr
│
├── Interface drops?
│       └── ip -s link
│
└── Too many connections?
        └── ss -s, logs, rate limits
Connection pressure examples
# Count connections by state
                            ss -ant | awk 'NR>1 {state[$1]++} END {for (s in state) print s, state[s]}'

                            # Top remote IPs connected to port 443 (strips the trailing :port, so IPv6 peers are handled too)
                            ss -ant '( sport = :443 )' | awk 'NR>1 {sub(/:[0-9*]+$/, "", $5); print $5}' | sort | uniq -c | sort -nr | head

                            # Check Nginx access bursts
                            sudo tail -1000 /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head
Network rule: separate reachability, DNS, firewall, service binding and application errors. They are different failure layers.
Kernel and sysctl: controlled tuning, limits and safe defaults

Kernel tuning should be conservative. Ubuntu defaults are reasonable for general production. Change kernel parameters only when you understand the workload and can measure before and after. Keep changes versioned and reversible.

Area | Parameter / control | Why it matters | Warning
File descriptors | LimitNOFILE, ulimit | Many sockets/files. | Must align with app and systemd.
Swappiness | vm.swappiness | Swap tendency. | Do not blindly set to zero.
TCP backlog | net.core.somaxconn | Connection bursts. | App backlog must also match.
Ephemeral ports | net.ipv4.ip_local_port_range | Outbound connection scale. | Usually not the first bottleneck.
Kernel logs | dmesg, journalctl -k | OOM, disk, driver, network errors. | Read before tuning.
Inspect kernel and limits
# Kernel version
                            uname -a

                            # CPU cores
                            nproc

                            # Current sysctl values
                            sysctl vm.swappiness
                            sysctl net.core.somaxconn
                            sysctl net.ipv4.ip_local_port_range

                            # Current shell limits
                            ulimit -a

                            # systemd service limits
                            systemctl show nginx | grep -E "LimitNOFILE|LimitNPROC"

                            # Kernel messages
                            dmesg -T | tail -100
                            journalctl -k --since today
Safe sysctl pattern
# Temporary test until reboot
                            sudo sysctl -w net.core.somaxconn=4096

                            # Persistent setting
                            sudo vim /etc/sysctl.d/99-custom-performance.conf

                            # Example content
                            net.core.somaxconn = 4096
                            vm.swappiness = 10

                            # Apply
                            sudo sysctl --system

                            # Verify
                            sysctl net.core.somaxconn
                            sysctl vm.swappiness
systemd limit example
# Create override
                            sudo systemctl edit nginx

                            # Add:
                            [Service]
                            LimitNOFILE=65535

                            # Apply
                            sudo systemctl daemon-reload
                            sudo systemctl restart nginx

                            # Verify
                            systemctl show nginx | grep LimitNOFILE
Tuning decision tree
Want to tune kernel?
│
├── Is bottleneck measured?
│       ├── no -> measure first
│       └── yes
│
├── Is app configured consistently?
│       ├── no -> tune app first
│       └── yes
│
├── Is change reversible?
│       ├── no -> do not apply
│       └── yes
│
└── Apply one change
        ├── measure again
        ├── document
        └── keep rollback
Tuning rule: kernel parameters are not magic. Bad sysctl tuning can reduce stability or hide the real application bottleneck.
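Keeping a change reversible is easier when the current values are saved before anything is modified. The sketch below reads values straight from /proc/sys and prints them in sysctl.conf syntax. The function name sysctl_snapshot and the SYSCTL_ROOT override are assumptions made for this example, and keys containing dots inside an interface name are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "key = value" for each sysctl key given as an
# argument, reading the value from /proc/sys (vm.swappiness -> vm/swappiness).
# SYSCTL_ROOT is an assumption of this sketch so the function can be pointed
# at another directory for testing.
sysctl_snapshot() {
    local root=${SYSCTL_ROOT:-/proc/sys} key path
    for key in "$@"; do
        path="$root/$(printf '%s' "$key" | tr . /)"
        if [ -r "$path" ]; then
            # Multi-value keys are tab-separated in /proc; use spaces instead.
            printf '%s = %s\n' "$key" "$(tr '\t' ' ' < "$path")"
        else
            printf '# %s: not readable\n' "$key"
        fi
    done
}

# Usage: save current values before a change, restore them if needed.
#   sysctl_snapshot vm.swappiness net.core.somaxconn > ~/sysctl-rollback.conf
#   sudo sysctl -p ~/sysctl-rollback.conf
```

Storing the snapshot next to the change record gives the "keep rollback" step of the decision tree a concrete artifact.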
Robustness with systemd: restart policy, health checks, limits and dependencies

Production robustness depends on what happens when a process fails. systemd can restart services, limit resources, order dependencies, isolate users, set environment files and expose logs. A fragile script becomes a production service when it has a proper unit.

systemd feature | Purpose | Example
Restart | Restart process after failure. | Restart=on-failure
RestartSec | Delay before restart. | RestartSec=5
StartLimitBurst | Prevent infinite crash loops. | StartLimitBurst=5
MemoryMax | Limit memory usage. | MemoryMax=1G
LimitNOFILE | Raise file descriptor limit. | LimitNOFILE=65535
User | Run service as non-root user. | User=myapp
Robust service unit
[Unit]
                            Description=My application service
                            After=network.target
                            StartLimitIntervalSec=60
                            StartLimitBurst=5

                            [Service]
                            User=myapp
                            Group=myapp
                            WorkingDirectory=/srv/myapp
                            EnvironmentFile=/srv/myapp/.env
                            ExecStart=/srv/myapp/venv/bin/gunicorn config.wsgi:application \
                            --bind 127.0.0.1:8000 \
                            --workers 3
                            Restart=on-failure
                            RestartSec=5
                            TimeoutStopSec=30
                            LimitNOFILE=65535
                            MemoryMax=1G

                            [Install]
                            WantedBy=multi-user.target
Service robustness flow
Service process
│
├── runs as non-root user
├── starts after dependencies
├── has environment file
├── logs to journald
├── restarts on failure
├── has restart backoff
├── has resource limits
└── is enabled at boot
│
▼
Operations
├── systemctl status
├── journalctl -u service
├── systemctl restart service
├── systemctl show service
└── alerts on restart count
Service diagnostics
# Status
                            systemctl status myapp

                            # Logs
                            journalctl -u myapp --since "1 hour ago"
                            journalctl -u myapp -f

                            # Check restart count and limits
                            systemctl show myapp | grep -E "NRestarts|Restart|Memory|LimitNOFILE"

                            # Failed units
                            systemctl --failed

                            # Reload unit changes
                            sudo systemctl daemon-reload
Robustness rule: every production daemon should be a systemd-managed service with logs, restart policy, non-root user and clear operational commands.
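A restart policy can hide a crash loop, so it helps to turn the restart counter into an explicit signal. The sketch below parses the Key=Value output of `systemctl show`. The function name restart_check, the status words and the threshold are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `systemctl show UNIT` output (Key=Value lines)
# on stdin and print a one-line verdict.
# $1 = maximum tolerated restart count (illustrative threshold).
restart_check() {
    local max=$1
    awk -F= -v max="$max" '
        $1 == "NRestarts"   { n = $2 }
        $1 == "ActiveState" { state = $2 }
        END {
            if (state == "failed")  print "CRITICAL state=failed restarts=" (n + 0)
            else if (n + 0 > max)   print "WARNING restarts=" (n + 0)
            else                    print "OK restarts=" (n + 0)
        }'
}

# Usage on a host running systemd (myapp is the example unit used above):
#   systemctl show myapp -p ActiveState -p NRestarts | restart_check 3
```

NRestarts counts automatic restarts since the unit was last started manually, so a value that keeps climbing between checks is the crash-loop signal to alert on.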
Monitoring: host metrics, service metrics, logs, alerts and SLOs

Monitoring makes performance visible before users complain. The minimum production stack should monitor CPU, memory, disk, IO, network, service state, open ports, logs, reboot-required state, certificate expiry, backups and application health.

Metric family | Examples | Alert idea
CPU | CPU %, load average, steal, iowait. | Sustained saturation over baseline.
Memory | Available RAM, swap activity, OOM events. | Low available memory or OOM kill.
Disk | Filesystem %, inode %, IO latency. | Filesystem above 80-90%.
Network | Throughput, packet drops, connection count. | Unexpected drop/error spike.
Services | systemd failed units, restart count. | Service failed or crash-looping.
Security | SSH failures, auth failures, UFW denies. | Spike above baseline.
Monitoring stack example
Ubuntu server
│
├── node exporter
├── journald logs
├── application metrics
├── nginx metrics/logs
├── database exporter
└── backup status
│
▼
Observability platform
├── Prometheus
├── Grafana
├── Loki / ELK
├── Alertmanager
└── incident channel
Local monitoring commands
# CPU and memory
                            top
                            free -h
                            vmstat 1

                            # Disk and IO
                            df -h
                            iostat -xz 1
                            sudo iotop -o

                            # Network
                            ss -s
                            ip -s link
                            nload

                            # Services and logs
                            systemctl --failed
                            journalctl -p warning --since "30 min ago"

                            # Reboot required
                            test -f /var/run/reboot-required && cat /var/run/reboot-required
Alerting baseline
Recommended alerts:
                            [ ] Disk filesystem > 80%
                            [ ] Disk filesystem > 90%
                            [ ] Inode usage high
                            [ ] Service failed
                            [ ] Reboot required too long
                            [ ] OOM kill detected
                            [ ] Swap activity sustained
                            [ ] CPU saturation sustained
                            [ ] IO wait sustained
                            [ ] Backup failed
                            [ ] Certificate expires soon
                            [ ] SSH failure spike
                            [ ] HTTP 5xx spike
                            [ ] Database unavailable
Monitoring rule: alerts must be actionable. Every alert needs severity, owner, impact, first checks and definition of done.
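As one example of an actionable check, the "Reboot required too long" alert can be derived from the age of the reboot flag file. This is a sketch, assuming GNU stat and date as shipped on Ubuntu; the function name reboot_flag_age_days and the 7-day limit are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the age in whole days of the reboot-required
# flag, or -1 when the flag does not exist.
# $1 = flag path (defaults to Ubuntu's /var/run/reboot-required).
reboot_flag_age_days() {
    local flag=${1:-/var/run/reboot-required} now mtime
    [ -f "$flag" ] || { echo -1; return 0; }
    now=$(date +%s)
    mtime=$(stat -c %Y "$flag")     # GNU stat: modification time in epoch seconds
    echo $(( (now - mtime) / 86400 ))
}

# Usage: warn when a reboot has been pending for 7 days or more
# (the limit is an illustrative choice).
age=$(reboot_flag_age_days)
if [ "$age" -ge 7 ]; then
    echo "ALERT: reboot pending for ${age} days"
fi
```

The output is deliberately a single number so a cron job or a metrics exporter can attach severity, owner and first checks to it.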
Troubleshooting playbooks: slow server, crash loop, disk full, memory pressure
Symptom matrix
Symptom | First checks | Likely cause | Action
Server slow | top, free -h, iostat | CPU, memory or IO saturation. | Identify resource, reduce load, scale.
Service crash loop | systemctl status, journalctl -u | Bad config, dependency, permission, OOM. | Fix root cause, then restart.
Disk full | df -h, du -sh | Logs, Docker, DB, backups. | Clean safely, add retention, resize.
Memory pressure | free, vmstat, OOM logs | Leak, too many workers, cache pressure. | Reduce workers, limit service, add RAM.
Network timeout | ss, ip, mtr, DNS | Firewall, DNS, route, saturation. | Fix correct layer.
One-shot diagnostic
echo "== HOST =="
                            hostnamectl

                            echo "== UPTIME =="
                            uptime

                            echo "== CPU/MEM =="
                            free -h
                            top -b -n1 | head -30

                            echo "== DISK =="
                            df -h

                            echo "== PORTS =="
                            ss -lntp

                            echo "== FAILED SERVICES =="
                            systemctl --failed

                            echo "== WARNINGS =="
                            journalctl -p warning --since "30 min ago" --no-pager | tail -100
Universal performance decision tree
Production issue
│
├── Is it user-visible?
│       ├── yes -> check app SLO, HTTP 5xx, latency
│       └── no  -> check monitoring and trend
│
├── Resource saturation?
│       ├── CPU -> top, pidstat
│       ├── RAM -> free, vmstat, OOM
│       ├── IO  -> iostat, iotop
│       └── NET -> ss, ip, mtr
│
├── Service instability?
│       └── systemctl, journalctl
│
├── Recent change?
│       ├── deployment
│       ├── apt upgrade
│       ├── config change
│       └── traffic spike
│
└── Fix strategy
        ├── rollback
        ├── reduce load
        ├── scale resource
        ├── tune one parameter
        └── monitor result
Change discipline
During performance incident:
                            [ ] Do not change many things at once
                            [ ] Capture metrics before change
                            [ ] Identify the bottleneck
                            [ ] Apply one controlled change
                            [ ] Measure again
                            [ ] Keep rollback possible
                            [ ] Document root cause
                            [ ] Add alert if missing
                            [ ] Add runbook step if useful
Incident rule: random restarts can temporarily hide symptoms but lose evidence. Capture logs and metrics first when possible.
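One way to follow the incident rule is to snapshot basic evidence into a timestamped directory before restarting anything. The sketch below is only an illustration: the function name capture_evidence, the default path /var/tmp/incident and the file names are assumptions, and the command list reuses the tools already shown in this section.

```shell
#!/usr/bin/env bash
# Sketch: save a quick evidence snapshot before touching a misbehaving host.
# Directory layout and file names are assumptions, not a standard.
capture_evidence() {
    local base="${1:-/var/tmp/incident}"
    local dir
    dir="${base}/$(date +%Y%m%dT%H%M%S)"
    mkdir -p "$dir" || return 1

    # Run a command only if it exists; never abort the capture on failure.
    snap() {
        local name="$1"; shift
        if command -v "$1" >/dev/null 2>&1; then
            "$@" >"${dir}/${name}.txt" 2>&1 || true
        fi
    }

    snap uptime  uptime
    snap memory  free -h
    snap disk    df -h
    snap top     top -b -n1
    snap sockets ss -s
    snap failed  systemctl --failed --no-pager
    snap journal journalctl -p warning --since "30 min ago" --no-pager

    # Print the snapshot directory so it can be attached to the incident.
    echo "$dir"
}

# Usage:
#   sudo bash -c 'source ./capture.sh; capture_evidence'
```

The snapshot is deliberately cheap, so taking it does not delay mitigation by more than a few seconds.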
Performance and robustness checklist
Production performance baseline
[ ] Ubuntu LTS is used
                            [ ] Packages are updated
                            [ ] Reboot policy exists
                            [ ] CPU metrics are monitored
                            [ ] Memory and swap are monitored
                            [ ] Disk usage is monitored
                            [ ] IO latency is monitored
                            [ ] Network errors are monitored
                            [ ] systemd failed units are alerted
                            [ ] Service restart count is monitored
                            [ ] Logs are centralized if possible
                            [ ] Backups are monitored
                            [ ] Restore has been tested
                            [ ] Capacity baseline is documented
                            [ ] Load test exists for critical services
                            [ ] Runbooks exist for CPU/RAM/IO/disk incidents
Robust service checklist
[ ] Service runs under non-root user
                            [ ] systemd unit is versioned
                            [ ] Restart policy is configured
                            [ ] Resource limits are configured if needed
                            [ ] Logs are visible with journalctl
                            [ ] Health check exists
                            [ ] Environment file permissions are strict
                            [ ] Deployment rollback is possible
                            [ ] Service starts at boot
                            [ ] Dependencies are documented
Command cheat sheet
# CPU
                            uptime
                            top
                            htop
                            mpstat -P ALL 1
                            pidstat -u 1

                            # Memory
                            free -h
                            vmstat 1
                            swapon --show
                            journalctl -k | grep -i oom

                            # Disk / IO
                            df -h
                            du -sh /var/*
                            iostat -xz 1
                            iotop -o
                            journalctl --disk-usage

                            # Network
                            ss -s
                            ss -lntp
                            ip -s link
                            mtr -rw example.com
                            nload

                            # Services
                            systemctl --failed
                            systemctl status service
                            journalctl -u service -f

                            # Kernel / limits
                            uname -a
                            sysctl vm.swappiness
                            ulimit -a
Final rule
Ubuntu is stable in production when it is operated properly.
Stability comes from LTS discipline, measured capacity, controlled updates, systemd supervision, monitored resources, clean logs, safe rollback, tested backups and calm incident handling.
Minimal robust server profile
Minimum robust Ubuntu server:
                            - Ubuntu LTS
                            - systemd-managed services
                            - restart policies
                            - monitoring for CPU/RAM/disk/IO/network
                            - alerting on failed services and disk growth
                            - log retention
                            - patch and reboot policy
                            - backup and restore test
                            - documented runbook
5.1 Ubuntu Cloud & AWS: official AMIs, Canonical owner, EC2 patterns, cloud-init, userdata and SSH keys
Ubuntu on cloud: what it means

Ubuntu is one of the most common Linux baselines for cloud servers. On AWS, it is typically deployed as an EC2 instance launched from an official Ubuntu AMI. At launch, the instance is given its storage, subnet and security groups; on first boot, cloud-init injects the SSH key and runs the server bootstrap.

In production, the cloud image is part of the infrastructure contract. It defines the operating system version, kernel, package baseline, boot behavior, cloud-init behavior, default users, storage layout and initial security posture.

Concept | Meaning | Production impact
AMI | Amazon Machine Image used to boot EC2. | Defines OS baseline and initial package state.
Official Ubuntu image | Image published by Canonical for AWS. | Preferred baseline for Ubuntu EC2 servers.
Owner ID | AWS account that owns the public AMI. | Used to avoid fake or untrusted public images.
cloud-init | First-boot initialization system. | Creates users, installs packages, writes files, runs commands.
User data | Bootstrap content passed at EC2 launch. | Automates first boot configuration.
Security group | AWS network firewall attached to instance or ENI. | Controls inbound and outbound exposure.
Key pair | SSH access credential used at launch. | Controls first admin access.
Core rule: do not treat a cloud VM as a manually configured machine. Treat it as infrastructure generated from an approved image, controlled user data, security groups, monitoring and a reproducible deployment process.
AWS Ubuntu mental model
AWS EC2 Ubuntu instance
│
├── AMI
│       ├── Ubuntu release
│       ├── kernel
│       ├── cloud-init
│       └── base packages
│
├── Instance configuration
│       ├── instance type
│       ├── EBS volume
│       ├── subnet
│       ├── security group
│       ├── IAM role
│       └── SSH key pair
│
├── First boot
│       ├── cloud-init metadata
│       ├── user data
│       ├── SSH key injection
│       ├── package installation
│       └── service bootstrap
│
└── Operations
        ├── patching
        ├── monitoring
        ├── backups
        ├── logs
        ├── snapshots
        └── replacement strategy
Official URLs
Ubuntu on AWS:
                            https://documentation.ubuntu.com/aws/

                            Find Ubuntu images on AWS:
                            https://documentation.ubuntu.com/aws/aws-how-to/instances/find-ubuntu-images/

                            Ubuntu cloud images:
                            https://cloud-images.ubuntu.com/

                            AWS EC2:
                            https://docs.aws.amazon.com/ec2/

                            cloud-init:
                            https://cloudinit.readthedocs.io/
Official Ubuntu AMIs and Canonical owner filtering

Public AMI catalogs contain many images. In production, avoid picking an arbitrary public image just because it has "Ubuntu" in its name. Use official Canonical images and verify the owner ID, which reduces the risk of booting an untrusted image with unknown modifications.

Item | Value / practice | Reason
Canonical AWS owner ID | 099720109477 | Filters official Ubuntu AMIs published by Canonical.
Release choice | Ubuntu Server LTS for production. | Longer support and safer lifecycle.
Architecture | amd64 (x86_64 in AWS filters) or arm64. | Must match EC2 instance family.
Storage type | EBS-backed AMI. | Standard for modern EC2 instances.
Virtualization | HVM. | Modern EC2 virtualization mode.
Image lifecycle | Pin or approve AMI IDs for production. | Avoid surprise image changes.
Console filtering pattern
EC2 Console
│
├── Images
├── AMIs
├── Public images
├── Owner = 099720109477
├── Name contains ubuntu/images/hvm-ssd (newer releases use hvm-ssd-gp3)
├── Select LTS release
└── Verify architecture and region
AWS CLI AMI search example
# hvm-ssd* matches both the hvm-ssd and hvm-ssd-gp3 image paths
# (24.04 images are published under hvm-ssd-gp3)
aws ec2 describe-images \
    --owners 099720109477 \
    --filters \
        "Name=name,Values=ubuntu/images/hvm-ssd*/ubuntu-*-24.04-amd64-server-*" \
        "Name=state,Values=available" \
        "Name=architecture,Values=x86_64" \
    --query 'Images | sort_by(@, &CreationDate)[-5:].{Name:Name,ImageId:ImageId,CreationDate:CreationDate}' \
    --output table
AMI selection decision tree
Need Ubuntu EC2 image?
│
├── Is this production?
│       ├── yes -> choose LTS
│       └── no  -> LTS still preferred, interim only if justified
│
├── Is owner Canonical?
│       ├── yes -> continue
│       └── no  -> reject for production
│
├── Does architecture match instance?
│       ├── yes -> continue
│       └── no  -> select amd64 or arm64 correctly
│
└── Is AMI approved or pinned?
        ├── yes -> launch
        └── no  -> review before production
Production warning: never use a random public AMI because the name looks correct. Verify owner, release, architecture, creation date and approval status.
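The owner check can also be scripted before an AMI ID is approved. The following is a minimal sketch, assuming the AWS CLI is installed and has credentials; the function names is_canonical_owner and verify_ami_owner are hypothetical, and the AMI ID and region in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# Canonical's AWS owner ID, as listed in the table above.
CANONICAL_OWNER_ID="099720109477"

# Return 0 only when the given owner ID is Canonical's.
is_canonical_owner() {
    [ "$1" = "$CANONICAL_OWNER_ID" ]
}

# Look up the owner of an AMI and reject anything not published by Canonical.
# Requires the AWS CLI; AMI ID and region are supplied by the caller.
verify_ami_owner() {
    local ami_id="$1" region="$2" owner
    owner="$(aws ec2 describe-images \
        --region "$region" \
        --image-ids "$ami_id" \
        --query 'Images[0].OwnerId' \
        --output text)" || return 2
    if is_canonical_owner "$owner"; then
        echo "OK: ${ami_id} is owned by Canonical"
    else
        echo "REJECT: ${ami_id} is owned by ${owner}" >&2
        return 1
    fi
}

# Usage (placeholder values):
#   verify_ami_owner ami-0123456789abcdef0 eu-west-1
```

This check applies to base images only. A golden AMI built from an official image is owned by your own account, so its approval would be tracked separately.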
EC2 launch pattern: instance type, storage, network, security and bootstrap

Launching Ubuntu on EC2 is a sequence of infrastructure decisions. The AMI is only the OS. Production quality also depends on instance sizing, EBS volume type, network placement, security groups, IAM role, SSH access, user data and monitoring.

EC2 decision | Typical production practice | Risk if ignored
Instance type | Size by CPU, RAM, network and workload. | CPU steal, memory pressure, throttling.
EBS root volume | Sized with headroom; gp3, with IOPS and throughput tuned if needed. | Disk full or IO bottleneck.
Subnet | Public only if it must receive internet traffic. | Unnecessary exposure.
Security group | Only required ports, source restricted. | Public SSH, public DB, attack surface.
IAM role | Least privilege role attached to instance. | Static credentials on disk.
User data | Minimal bootstrap, versioned and tested. | Unreproducible snowflake server.
Tags | Name, environment, owner, cost center, role. | Poor inventory and cost tracking.
EC2 launch flow
Launch EC2
│
├── Choose official Ubuntu LTS AMI
├── Choose instance type
├── Configure EBS root volume
├── Select VPC and subnet
├── Attach security group
├── Attach IAM role
├── Select SSH key pair
├── Add user data
├── Add tags
└── Launch instance
Reference AWS Ubuntu architecture
Internet
│
▼
AWS Load Balancer
│
├── HTTPS listener
├── certificate
└── health checks
│
▼
Public or private app subnet
│
├── Ubuntu EC2 app server
│       ├── Nginx
│       ├── Gunicorn / app runtime
│       ├── CloudWatch agent
│       └── UFW local firewall
│
└── Security group
        ├── inbound from load balancer only
        └── SSH from bastion or admin IP only

Private data subnet
│
├── RDS / database
├── Redis / cache
└── no public exposure
Sizing examples
Use case | Starting point | Watch metric
Small Nginx reverse proxy | Small general-purpose instance. | Network, CPU, connections.
Django / API server | Balanced CPU/RAM instance. | CPU, memory, latency, worker count.
Celery worker | CPU or RAM based on task type. | Queue depth, CPU, memory.
Database on EC2 | Memory and IO optimized. | IOPS, latency, cache hit ratio.
EC2 rule: launch is not enough. The instance must be tagged, monitored, patched, backed up and replaceable.
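The launch decisions above can be captured in a script instead of console clicks. The sketch below is an illustration only: build_tag_spec and launch_ubuntu_instance are hypothetical helper names, every variable (AMI_ID, SUBNET_ID and so on) is a placeholder the caller must set, and the tag values are examples.

```shell
#!/usr/bin/env bash
# Sketch only: all IDs come from caller-supplied placeholder variables.

# Turn "Key=Value" arguments into an EC2 tag specification string.
# Values containing commas or spaces are not handled by this simple form.
build_tag_spec() {
    local tags="" pair
    for pair in "$@"; do
        tags="${tags:+${tags},}{Key=${pair%%=*},Value=${pair#*=}}"
    done
    echo "ResourceType=instance,Tags=[${tags}]"
}

# Launch one instance from an approved AMI with role, key, user data and tags.
launch_ubuntu_instance() {
    aws ec2 run-instances \
        --image-id "$AMI_ID" \
        --instance-type "$INSTANCE_TYPE" \
        --subnet-id "$SUBNET_ID" \
        --security-group-ids "$SECURITY_GROUP_ID" \
        --iam-instance-profile "Name=${INSTANCE_PROFILE}" \
        --key-name "$KEY_NAME" \
        --user-data "file://user-data.yaml" \
        --tag-specifications "$(build_tag_spec \
            "Name=web-01" "Environment=prod" "Owner=platform" "Role=web")"
}

# Usage (placeholder values):
#   AMI_ID=ami-0123456789abcdef0 INSTANCE_TYPE=t3.small SUBNET_ID=subnet-0abc \
#   SECURITY_GROUP_ID=sg-0abc INSTANCE_PROFILE=app-role KEY_NAME=my-aws-key \
#   launch_ubuntu_instance
```

In practice the same parameters usually live in a launch template or infrastructure-as-code definition; the script form is mainly useful for labs and one-off tests.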
SSH keys, default users and safe access patterns

Ubuntu cloud images use SSH keys for initial access. On AWS, the public key of the selected EC2 key pair is written by cloud-init into the default user's ~/.ssh/authorized_keys during first boot. On official Ubuntu images, that default user is ubuntu.

Access element | Production practice | Reason
Default user | ubuntu for initial access. | Standard cloud image behavior.
SSH key pair | Keep the private key protected (mode 600, never shared). | Controls initial admin access.
SSH exposure | Restrict by source IP or bastion. | Reduces brute-force surface.
Root login | Disabled. | Use named users and sudo.
Long-term access | Create named admin users or SSM access. | Improves audit and revocation.
Emergency access | Document SSM, console or recovery procedure. | Prevents lockout during incidents.
SSH examples
# Secure private key permissions
                            chmod 600 my-aws-key.pem

                            # Connect to Ubuntu EC2
                            ssh -i my-aws-key.pem ubuntu@EC2_PUBLIC_IP

                            # First checks
                            hostnamectl
                            whoami
                            sudo -l
                            ip a
                            systemctl status ssh
Safe access architecture
Admin workstation
│
├── SSH private key
└── fixed public IP if possible
│
▼
Security group
│
├── allow SSH only from admin IP
└── or allow SSH only from bastion
│
▼
Ubuntu EC2 instance
│
├── default ubuntu user
├── sudo for admin tasks
├── no root SSH login
└── logs in auth/journal
Hardening after first login
# Update packages
                            sudo apt update
                            sudo apt upgrade

                            # Create named admin user if needed
                            sudo adduser deploy
                            sudo usermod -aG sudo deploy

                            # Add SSH key for deploy user
                            sudo mkdir -p /home/deploy/.ssh
                            sudo cp /home/ubuntu/.ssh/authorized_keys /home/deploy/.ssh/authorized_keys
                            sudo chown -R deploy:deploy /home/deploy/.ssh
                            sudo chmod 700 /home/deploy/.ssh
                            sudo chmod 600 /home/deploy/.ssh/authorized_keys

                            # Test deploy login from a new terminal before restricting access
                            ssh -i my-aws-key.pem deploy@EC2_PUBLIC_IP
Access alternatives
Pattern | When useful | Comment
Direct SSH | Small setups, restricted IP. | Simple but exposed if public.
Bastion host | Multiple private instances. | Centralized admin entry point.
AWS Systems Manager | No public SSH desired. | Requires IAM, agent and network access.
VPN | Private operations network. | Good for strict environments.
Lockout warning: before removing old keys or restricting SSH, test the new access path from a separate session.
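The "no root login, keys only" policy from the access table can be expressed as an sshd drop-in file. This is a hedged sketch: the function name write_sshd_hardening and the file name 50-hardening.conf are assumptions, and it presumes an Ubuntu release whose sshd_config includes /etc/ssh/sshd_config.d/*.conf.

```shell
#!/usr/bin/env bash
# Sketch: write a small sshd drop-in that disables root and password logins.
# The default target path is an assumption; pass another path to test safely.
write_sshd_hardening() {
    local target="${1:-/etc/ssh/sshd_config.d/50-hardening.conf}"
    cat >"$target" <<'EOF'
# Managed drop-in: SSH keys only, no direct root login.
PermitRootLogin no
PasswordAuthentication no
EOF
}

# Usage (as root, with a second session kept open):
#   write_sshd_hardening
#   sshd -t && systemctl restart ssh
#   # then confirm a fresh key-based login works before closing the old session
```

Running sshd -t first validates the configuration, so a typo does not take down the only access path.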
cloud-init: first boot automation for Ubuntu cloud images

cloud-init is the standard first-boot initialization system for Ubuntu cloud images. It reads cloud metadata and user data, then applies configuration such as users, SSH keys, packages, files, commands, hostname, timezone and service setup.

cloud-init section | Purpose | Example usage
package_update | Refresh package metadata. | Prepare apt before install.
package_upgrade | Upgrade packages at first boot. | Apply latest security patches.
users | Create users and SSH keys. | Provision deploy user.
packages | Install packages. | Nginx, fail2ban, monitoring agent.
write_files | Create config files. | Systemd unit, app env template.
runcmd | Run final commands. | Enable services, configure firewall.
cloud-init lifecycle
EC2 instance first boot
│
├── Query AWS metadata service
├── Read user data
├── Configure hostname
├── Inject SSH key
├── Create users
├── Configure packages
├── Write files
├── Run commands
├── Start services
└── Mark initialization complete
Minimal cloud-init baseline
#cloud-config
package_update: true
package_upgrade: true

timezone: UTC

packages:
  - curl
  - wget
  - git
  - vim
  - htop
  - ufw
  - fail2ban
  - nginx

runcmd:
  - ufw allow OpenSSH
  - ufw allow 80/tcp
  - ufw allow 443/tcp
  - ufw --force enable
  - systemctl enable --now nginx
  - systemctl enable --now fail2ban
cloud-init diagnostics
# Status
                            cloud-init status

                            # Wait for completion
                            cloud-init status --wait

                            # Main logs
                            sudo less /var/log/cloud-init.log
                            sudo less /var/log/cloud-init-output.log

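                            # Show instance metadata if allowed (IMDSv2 requires a session token)
                            TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
                            curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/ || true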

                            # Validate config if tool supports it
                            cloud-init schema --config-file user-data.yaml
cloud-init rule: use user data for minimal first-boot bootstrap. For complex configuration, call a versioned script or configuration management tool.
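When a later step (a deployment, a health check) depends on first boot having finished, the status command can be wrapped in a small gate. The sketch below assumes that cloud-init status prints a line of the form "status: <state>"; the helper names cloud_init_state and cloud_init_ok are hypothetical.

```shell
#!/usr/bin/env bash
# Read `cloud-init status` output on stdin and print the state word
# (for example: done, running, error). Progress dots are ignored.
cloud_init_state() {
    awk -F': *' '$1 == "status" { print $2; exit }'
}

# Block until cloud-init finishes, then succeed only if the state is "done".
cloud_init_ok() {
    local state
    state="$(cloud-init status --wait 2>/dev/null | cloud_init_state)"
    [ "$state" = "done" ]
}

# Usage on the instance:
#   if cloud_init_ok; then
#       echo "first boot complete"
#   else
#       echo "cloud-init failed, see /var/log/cloud-init-output.log" >&2
#   fi
```

Parsing the state keeps the gate explicit; checking the exit code of cloud-init status --wait is a reasonable alternative where its semantics are known for the installed version.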
User data patterns: simple bootstrap, web server, app server and config handoff

User data should be small, readable and reliable. A good pattern is to install only base packages, harden basic access, install monitoring and call a versioned bootstrap script from a trusted source. Avoid stuffing an entire production deployment into a long untested user-data block.

Pattern 1: simple HTTP test server
#cloud-config
package_update: true

packages:
  - nginx

write_files:
  - path: /var/www/html/index.html
    permissions: '0644'
    content: |
      Ubuntu EC2 is running.

runcmd:
  - systemctl enable --now nginx
Pattern 2: baseline security bootstrap
#cloud-config
package_update: true
package_upgrade: true

packages:
  - ufw
  - fail2ban
  - curl
  - htop

runcmd:
  - ufw default deny incoming
  - ufw default allow outgoing
  - ufw allow OpenSSH
  - ufw --force enable
  - systemctl enable --now fail2ban
Pattern 3: handoff to versioned script
#cloud-config
package_update: true

packages:
  - curl
  - ca-certificates

runcmd:
  - curl -fsSL https://example.com/bootstrap/ubuntu-app.sh -o /root/bootstrap.sh
  - chmod 700 /root/bootstrap.sh
  - /root/bootstrap.sh --role app --env prod
Better handoff pattern
Preferred production pattern:
                            1. cloud-init creates minimal baseline
                            2. instance has IAM role
                            3. script is downloaded from trusted private source
                            4. script checksum or signature is verified
                            5. configuration is versioned
                            6. logs are written to /var/log/bootstrap.log
                            7. monitoring reports success or failure
Bootstrap logging example
#!/usr/bin/env bash
                            set -euo pipefail

                            exec > >(tee -a /var/log/bootstrap.log) 2>&1

                            echo "bootstrap started at $(date -Is)"

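                            # apt-get is preferred in scripts (stable command-line interface)
                            apt-get update
                            DEBIAN_FRONTEND=noninteractive apt-get install -y nginx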

                            systemctl enable --now nginx

                            echo "bootstrap finished at $(date -Is)"
Supply-chain warning: avoid blindly running remote scripts as root. Use trusted sources, checksums, IAM controls and private artifact storage.
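Checksum verification, step 4 of the preferred handoff pattern, can be sketched as follows. The function names verify_sha256 and run_verified_bootstrap are hypothetical; the expected checksum is assumed to come from a trusted, versioned location (for example the same repository as the user data), not from the server that hosts the script.

```shell
#!/usr/bin/env bash
# Return 0 only if the file exists and its SHA-256 matches the expected value.
verify_sha256() {
    local file="$1" expected="$2" actual
    actual="$(sha256sum "$file" 2>/dev/null | awk '{print $1}')"
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Download a bootstrap script, verify it, and run it only when it matches.
# The --role/--env arguments mirror Pattern 3 above.
run_verified_bootstrap() {
    local url="$1" expected="$2" script
    script="$(mktemp)" || return 1
    curl -fsSL "$url" -o "$script" || return 1
    if ! verify_sha256 "$script" "$expected"; then
        echo "checksum mismatch for ${url}; refusing to run" >&2
        rm -f "$script"
        return 1
    fi
    bash "$script" --role app --env prod
}

# Usage (placeholder URL and checksum):
#   run_verified_bootstrap https://example.com/bootstrap/ubuntu-app.sh "<sha256>"
```

A checksum only proves the file is the one that was approved; it does not replace access control on the artifact store or review of the script itself.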
Golden AMI pattern: reproducible Ubuntu servers

A golden AMI is a prebuilt, approved image containing a hardened baseline: Ubuntu LTS, patches, standard packages, users, agents, logging, monitoring and security defaults. It reduces boot time, improves repeatability and makes replacement safer than manual repair.

Golden AMI content | Purpose | Example
Ubuntu LTS base | Approved OS baseline. | 24.04 LTS server image.
Security updates | Reduce patch work at boot. | apt upgrade during image build.
Agents | Monitoring, logs, SSM, backup. | CloudWatch agent, SSM agent.
Hardening | Common security defaults. | SSH policy, sysctl, UFW baseline.
Tags and metadata | Inventory and lifecycle. | version, build date, git commit.
Validation tests | Prove image boots and works. | SSH, cloud-init, services, logs.
Golden AMI build flow
Official Ubuntu AMI
│
▼
Packer image build
│
├── apply apt updates
├── install baseline packages
├── install monitoring agents
├── apply hardening
├── clean temporary files
├── validate services
└── create AMI
│
▼
Approved AMI
│
├── tagged with version
├── tested in staging
├── used by launch templates
└── rolled out progressively
Replace, do not repair
Traditional server:
                            - SSH into machine
                            - manually patch
                            - manually edit config
                            - server becomes unique
                            - recovery depends on memory

                            Cloud-native server:
                            - build image
                            - deploy new instance
                            - attach to load balancer
                            - drain old instance
                            - terminate old instance
                            - rollback by previous image
Golden AMI governance
[ ] Base AMI owner verified
                            [ ] Ubuntu LTS version recorded
                            [ ] Build script versioned
                            [ ] Security updates applied
                            [ ] Image tests pass
                            [ ] AMI is tagged
                            [ ] AMI ID is published to parameter store or IaC
                            [ ] Rollback AMI is retained
                            [ ] Staging rollout completed
                            [ ] Production rollout is progressive
                            [ ] Old AMIs are retired safely
Launch template model
Launch Template
│
├── approved AMI ID
├── instance type
├── IAM role
├── security groups
├── EBS configuration
├── user data
└── tags
│
▼
Auto Scaling Group
├── desired capacity
├── health checks
├── rolling replacement
└── rollback to previous template version
Cloud robustness rule: if a server can be rebuilt from an image and code, recovering from an incident is easier than it is on a manually maintained machine.
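The governance item "AMI ID is published to parameter store" might look like the sketch below. The helper names is_valid_ami_id and publish_ami_id and the parameter name /golden-ami/ubuntu/current are assumptions; the AWS CLI call requires credentials with permission to write the parameter.

```shell
#!/usr/bin/env bash
# Accept only strings shaped like an AMI ID (ami- plus 8 or 17 hex digits).
is_valid_ami_id() {
    [[ "$1" =~ ^ami-([0-9a-f]{8}|[0-9a-f]{17})$ ]]
}

# Publish the approved AMI ID so launch templates and IaC can read it.
# The parameter name is a hypothetical convention.
publish_ami_id() {
    local ami_id="$1" param="${2:-/golden-ami/ubuntu/current}"
    if ! is_valid_ami_id "$ami_id"; then
        echo "invalid AMI ID: ${ami_id}" >&2
        return 1
    fi
    aws ssm put-parameter \
        --name "$param" \
        --value "$ami_id" \
        --type String \
        --overwrite
}

# Usage (placeholder ID):
#   publish_ami_id ami-0123456789abcdef0
```

Keeping the previous value under a second parameter (for example a "previous" suffix) would be one simple way to retain the rollback AMI mentioned in the checklist.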
AWS security for Ubuntu EC2: security groups, IAM, metadata and private networking

Ubuntu hardening and AWS security must work together. Security groups restrict network access before traffic reaches the instance. UFW adds host-level defense. IAM roles avoid static cloud keys. Private subnets prevent unnecessary exposure.

Security control | AWS side | Ubuntu side
Network filtering | Security groups, NACLs, load balancer. | UFW or nftables.
Admin access | Bastion, VPN, SSM Session Manager. | SSH keys, no root login, auth logs.
Cloud permissions | IAM role attached to instance. | No static AWS keys stored on disk.
Secrets | Secrets Manager, SSM Parameter Store. | Strict file permissions if cached locally.
Observability | CloudWatch, VPC Flow Logs, CloudTrail. | journald, auth logs, application logs.
Recovery | EBS snapshots, AMIs, backups. | Restore tests and runbooks.
Security group examples
Public web server:
- inbound 443/tcp from 0.0.0.0/0
- inbound 80/tcp from 0.0.0.0/0 only if redirect is needed
- inbound 22/tcp only from admin IP or bastion
- outbound restricted if strict policy is required

Private app server behind load balancer:
- inbound app port only from load balancer security group
- inbound SSH only from bastion security group
- no direct public access

Database server:
- inbound DB port only from app security group
- no public IP
- no public SSH
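The public web server rules above could be expressed with the AWS CLI roughly as follows. This is a minimal sketch, not a definitive setup: the group ID and the admin address are placeholders, and `sg_ingress_cmd` is a hypothetical helper that only prints each command so it can be reviewed before anything is applied.

```shell
# Hypothetical helper: print (not run) an ingress rule command for review.
# Arguments: security group ID, TCP port, source CIDR.
sg_ingress_cmd() {
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr %s\n' "$1" "$2" "$3"
}

SG_ID="sg-0123456789abcdef0"   # placeholder group ID
ADMIN_CIDR="203.0.113.10/32"   # placeholder admin IP (documentation range)

# HTTPS open to the world, SSH only from the admin address.
sg_ingress_cmd "$SG_ID" 443 0.0.0.0/0
sg_ingress_cmd "$SG_ID" 22 "$ADMIN_CIDR"
```

Printing first keeps the change reviewable. For the private app and database patterns, the source would be another security group rather than a CIDR range.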
Layered AWS Ubuntu security diagram
Internet
│
▼
AWS perimeter
├── Route 53
├── CloudFront / WAF
├── Load Balancer
└── Security Groups
│
▼
Ubuntu host
├── UFW
├── SSH hardening
├── non-root services
├── package updates
├── logs
└── monitoring agent
│
▼
Application
├── TLS
├── secrets management
├── app logs
├── DB access
└── health checks
Metadata and credentials
Recommended:
- use IAM roles instead of static AWS keys
- keep role permissions minimal
- avoid storing credentials in user data
- avoid secrets in AMI images
- use Parameter Store or Secrets Manager
- monitor CloudTrail for suspicious API calls
- review instance profile permissions

Avoid:
- AWS_ACCESS_KEY_ID in .bashrc
- secrets embedded in user data
- secrets baked into AMIs
- overly broad IAM roles
- instance metadata exposed through SSRF-vulnerable apps
Security warning: user data may be visible to users or processes with instance metadata access. Do not place long-lived secrets in user data.
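One way to check the "no static keys on disk" rule is to search for strings shaped like long-term access key IDs. The sketch below assumes the common `AKIA` prefix followed by 16 uppercase letters or digits; `find_static_aws_keys` is a hypothetical helper, and its matches are candidates for review rather than proof of a leak.

```shell
# List files under a directory that contain something shaped like a
# long-term AWS access key ID ("AKIA" plus 16 uppercase letters or digits).
# Assumption: this pattern covers the keys of interest; it will not find
# secret access keys or temporary credentials.
find_static_aws_keys() {
  grep -rlE 'AKIA[0-9A-Z]{16}' "$1" 2>/dev/null
}

# Example usage (prints matching file paths, one per line):
# find_static_aws_keys "$HOME"
```

Running it against home directories and application config paths before baking an AMI is a cheap guard against shipping credentials inside an image.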
Operations: monitoring, logs, snapshots, patching and recovery

Ubuntu EC2 operations combine Linux administration and AWS lifecycle management. The system must be patched, monitored, backed up, logged, tagged, replaceable and tested. A server that cannot be rebuilt is a long-term operational risk.

Operational area | AWS control | Ubuntu control | Question to answer
Metrics | CloudWatch metrics and agent. | node exporter, system metrics. | Is the host saturated?
Logs | CloudWatch Logs, S3 archive. | journald, app logs, auth logs. | Can we diagnose incidents?
Backups | EBS snapshots, AWS Backup. | Application-aware backup hooks. | Can we restore?
Patching | SSM Patch Manager, image rebuild. | apt, unattended upgrades. | Are CVEs patched?
Recovery | AMI, launch template, autoscaling. | cloud-init, bootstrap scripts. | Can we replace the server?
Inventory | Tags, AWS Config, Systems Manager. | hostname, OS version, package list. | Do we know what this server is?
Operational metrics
Host:
- CPU utilization
- memory usage
- disk usage
- disk IO latency
- network throughput
- systemd failed services
- reboot-required state

Application:
- HTTP latency
- HTTP 5xx
- worker queue depth
- database connections
- error logs
- health check status

AWS:
- instance status checks
- EBS burst balance
- EBS latency
- load balancer health
- security group changes
- CloudTrail events
Recovery patterns
Pattern 1: EBS snapshot restore
├── create volume from snapshot
├── attach to instance
├── mount and recover data
└── validate application

Pattern 2: AMI rollback
├── select previous AMI
├── launch replacement instance
├── attach to load balancer
├── validate health
└── terminate bad instance

Pattern 3: Blue/green replacement
├── build new Ubuntu image
├── launch green environment
├── smoke test
├── shift traffic
└── keep blue as rollback
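The first two steps of Pattern 1 might look like this with the AWS CLI. This is a sketch under stated assumptions: every identifier is a placeholder, the `run` wrapper and `restore_from_snapshot` are hypothetical helpers, and dry-run mode is the default so nothing is changed until `DRY_RUN=0` is set deliberately.

```shell
# DRY_RUN=1 (the default here) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Placeholder identifiers; replace with real values.
SNAPSHOT_ID="snap-0123456789abcdef0"
INSTANCE_ID="i-0123456789abcdef0"
VOLUME_ID="vol-0123456789abcdef0"   # the ID returned by create-volume
AZ="eu-west-1a"                     # must match the instance's availability zone

restore_from_snapshot() {
  # Step 1: create a volume from the snapshot.
  run aws ec2 create-volume --snapshot-id "$SNAPSHOT_ID" --availability-zone "$AZ"
  # Step 2: attach it to the recovery instance.
  run aws ec2 attach-volume --volume-id "$VOLUME_ID" --instance-id "$INSTANCE_ID" --device /dev/sdf
}

restore_from_snapshot
```

Mounting and validation then happen on the instance itself. The device name seen by the OS can differ from the one requested at attach time, so check `lsblk` before mounting.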
Ubuntu EC2 health commands
# OS and kernel
hostnamectl
uname -a
lsb_release -a

# Cloud-init status
cloud-init status
sudo tail -100 /var/log/cloud-init-output.log

# System health
uptime
df -h
free -h
systemctl --failed
journalctl -p warning --since "30 min ago"

# Network and ports
ip a
ip r
ss -lntp

# Reboot required
test -f /var/run/reboot-required && cat /var/run/reboot-required
Operations rule: backups only count if restore has been tested. Snapshots without restore tests are assumptions, not recovery guarantees.
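The disk usage check from the health commands can be turned into a simple threshold alert. The sketch below is one possible approach: `disks_over_threshold` is a hypothetical helper that reads `df -P` style output on stdin, which keeps it testable with canned data. It assumes mount points contain no spaces.

```shell
# Print "mountpoint usage%" for every filesystem at or above the threshold.
# Argument: threshold in percent. Input: `df -P` output on stdin.
disks_over_threshold() {
  threshold="$1"
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)              # strip the percent sign from the Capacity column
    if (use + 0 >= t + 0) print $6 " " use "%"
  }'
}

# Example usage:
# df -P | disks_over_threshold 80
```

A cron job or systemd timer could run this and forward any output to the central log pipeline.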
Final AWS Ubuntu checklist
AMI and launch checklist
[ ] Official Ubuntu AMI selected
[ ] Canonical owner ID verified
[ ] LTS release selected for production
[ ] Architecture matches instance type
[ ] AMI ID is approved or pinned
[ ] Instance type matches workload
[ ] EBS volume size is sufficient
[ ] EBS performance is appropriate
[ ] VPC and subnet are correct
[ ] Security group is minimal
[ ] SSH access is restricted
[ ] IAM role uses least privilege
[ ] User data is tested
[ ] Tags are complete
[ ] Monitoring is enabled
Cloud-init checklist
[ ] User data starts with #cloud-config if YAML
[ ] package_update is intentional
[ ] package_upgrade is intentional
[ ] No long-lived secrets in user data
[ ] Bootstrap logs to file
[ ] cloud-init status is checked
[ ] /var/log/cloud-init-output.log is reviewed
[ ] Failed commands are visible
[ ] Complex setup is delegated to versioned script
[ ] Script source is trusted
[ ] Rebuild process is documented
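The first checklist item can be verified automatically before launch. This is a minimal sketch; `is_cloud_config` is a hypothetical helper and the file name in the usage line is a placeholder.

```shell
# Succeed only if the first line of the file is exactly "#cloud-config".
# A YAML user data file without this header is not treated as cloud-config.
is_cloud_config() {
  [ "$(head -n 1 "$1")" = "#cloud-config" ]
}

# Example usage in a pre-launch check:
# is_cloud_config user-data.yaml || echo "user-data.yaml: missing #cloud-config header" >&2
```

A check like this fits naturally into a CI job that validates user data before a launch template version is published.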
Production operations checklist
[ ] UFW matches security group policy
[ ] Root SSH login is disabled
[ ] SSH keys are controlled
[ ] Patch policy exists
[ ] Reboot policy exists
[ ] CloudWatch or equivalent metrics enabled
[ ] Logs are shipped centrally
[ ] EBS snapshots are scheduled
[ ] Restore has been tested
[ ] Launch template is versioned
[ ] Golden AMI pipeline exists if fleet is large
[ ] Rollback AMI is retained
[ ] Instance is replaceable
[ ] Runbook exists
[ ] Owner and cost tags are present
Final rule
Ubuntu on AWS is reliable when the server is reproducible.
Use official Canonical AMIs, LTS baselines, controlled user data, minimal security groups, SSH keys or SSM, IAM roles, monitoring, snapshots, tested restore and image-based replacement where possible.
Minimal safe EC2 Ubuntu baseline
Minimum safe baseline:
- official Ubuntu LTS AMI
- Canonical owner verified
- SSH restricted
- security group minimal
- IAM role instead of static keys
- cloud-init bootstrap tested
- packages updated
- UFW enabled if needed
- monitoring installed
- snapshots configured
- restore tested
- instance documented and tagged
5.2 Ubuntu Containers & Virtualisation: Docker, LXD/LXC, KVM, virt-manager, CI/CD, labs and production patterns
Containers and virtualization on Ubuntu

Ubuntu is a strong platform for both containers and virtualization. Docker is typically used for application containers. LXD/LXC is used for system containers that behave more like lightweight machines. KVM/QEMU is used for full virtual machines with their own kernel. virt-manager provides a graphical management interface for KVM.

The key difference is isolation level. Docker containers share the host kernel and are optimized for application packaging. LXD containers also share the host kernel but feel closer to small Linux systems. KVM virtual machines run a full guest OS with stronger isolation and more overhead.

Technology | Category | Best for | Isolation | Typical command
Docker | Application containers | Apps, microservices, dev stacks, CI jobs. | Process/container isolation, shared kernel. | docker run nginx
Docker Compose | Multi-container orchestration | Local stacks, demos, small deployments. | Same as Docker. | docker compose up
LXD / LXC | System containers | Mini Linux systems, labs, isolated services. | OS-level isolation, shared kernel. | lxc launch ubuntu:24.04 c1
KVM / QEMU | Full virtualization | VMs, different kernels, stronger isolation. | Hardware-assisted VM isolation. | virsh list --all
virt-manager | GUI for KVM/libvirt | Desktop/lab VM management. | Manages KVM guests. | Graphical interface.
Core rule: use Docker for application packaging, LXD for lightweight Linux environments, and KVM when you need a full VM with its own kernel.
Isolation model diagram
Ubuntu host
│
├── Docker containers
│       ├── app process
│       ├── image layers
│       ├── container network
│       └── shared host kernel
│
├── LXD system containers
│       ├── init/systemd inside container
│       ├── full Ubuntu userspace
│       ├── container profiles
│       └── shared host kernel
│
└── KVM virtual machines
        ├── guest kernel
        ├── guest OS
        ├── virtual CPU/RAM/disk/NIC
        └── stronger isolation boundary
Decision shortcut
Need to ship an application?
└── Docker

Need several services locally?
└── Docker Compose

Need a mini Ubuntu machine?
└── LXD / LXC

Need another kernel or full VM isolation?
└── KVM / QEMU

Need a graphical VM manager?
└── virt-manager

Need production orchestration at scale?
└── Kubernetes, ECS, Nomad or managed platform
Common mistake: using containers as if they were VMs without understanding storage, networking, privilege, logs, lifecycle and security boundaries.
Docker on Ubuntu: images, containers, networks, volumes and logs

Docker packages an application and its runtime dependencies into an image. A container is a running instance of that image. On Ubuntu, Docker is commonly used for development, CI/CD, local demos, staging environments and production workloads behind a reverse proxy or orchestrator.

Concept | Meaning | Example
Image | Immutable package template. | nginx:latest, postgres:16
Container | Running process from an image. | docker run nginx
Volume | Persistent storage outside container lifecycle. | Database data, uploads.
Network | Container communication layer. | bridge network, app network.
Registry | Image storage and distribution. | Docker Hub, GHCR, ECR.
Dockerfile | Build recipe for an image. | Python app image.
Install Docker baseline
# Install from Ubuntu repository for simple usage
sudo apt update
sudo apt install docker.io docker-compose-v2

# Enable Docker
sudo systemctl enable --now docker

# Check status
systemctl status docker

# Add current user to docker group
sudo usermod -aG docker "$USER"

# Re-login before using docker without sudo
docker version
docker info
Core Docker commands
# List running containers
docker ps

# List all containers
docker ps -a

# List images
docker images

# Run Nginx in the background (-d) so the shell stays usable
docker run -d --name web -p 8080:80 nginx:latest

# Logs
docker logs web
docker logs -f web

# Shell inside container
docker exec -it web bash

# Inspect container
docker inspect web

# Stop and remove (do this last: the commands above need the container)
docker stop web
docker rm web

# Disk usage
docker system df
Docker architecture
Developer or CI
│
├── Dockerfile
├── build image
├── tag image
└── push image
│
▼
Registry
│
├── Docker Hub
├── GitHub Container Registry
├── AWS ECR
└── private registry
│
▼
Ubuntu host
│
├── pull image
├── run container
├── attach volume
├── expose port
└── collect logs
Security warning: membership in the docker group is effectively root-equivalent on the host. Do not grant it casually on production servers.
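Because docker group membership is that powerful, it is worth auditing who holds it. The sketch below parses a group database line as printed by `getent group`; `group_members` is a hypothetical helper, not part of Docker or Ubuntu.

```shell
# Read a group database line ("name:x:gid:member1,member2") on stdin
# and print one member per line. Prints nothing if the group is empty.
group_members() {
  cut -d: -f4 | tr ',' '\n' | sed '/^$/d'
}

# Example usage on a Docker host:
# getent group docker | group_members
```

Comparing this list against an approved set of administrators is a simple periodic control for production servers.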
Docker Compose: local stacks, demos, CI environments and small deployments

Docker Compose defines several containers in one YAML file. It is excellent for local development, prototypes, demos, test stacks and small internal deployments. For large production environments, Compose is usually replaced by Kubernetes, ECS, Nomad or another orchestrator.

Use case | Compose fit | Comment
Local Django + Postgres + Redis | Excellent. | Reproducible dev environment.
Demo platform | Excellent. | Easy to start and stop.
CI integration tests | Good. | Start dependencies for test run.
Single-server production | Possible with discipline. | Needs backups, monitoring, update strategy.
Large multi-node production | Not ideal. | Use orchestrator.
Example Compose stack
services:
  web:
    build: .
    command: gunicorn config.wsgi:application --bind 0.0.0.0:8000
    ports:
      - "8000:8000"
    environment:
      DJANGO_SETTINGS_MODULE: config.settings
      DATABASE_URL: postgres://app:app@db:5432/app
      REDIS_URL: redis://redis:6379/0
    depends_on:
      - db
      - redis

  db:
    image: postgres:16
    environment:
      POSTGRES_DB: app
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: redis:7

volumes:
  pgdata:
Compose commands
# Start stack
docker compose up

# Start in background
docker compose up -d

# Show containers
docker compose ps

# Show logs
docker compose logs
docker compose logs -f web

# Execute command
docker compose exec web bash

# Stop stack
docker compose down

# Stop and remove volumes - destructive
docker compose down -v

# Rebuild
docker compose build
docker compose up -d --build
Compose lifecycle
docker-compose.yml
│
├── services
├── networks
├── volumes
├── environment
├── ports
└── dependencies
│
▼
docker compose up
│
├── creates network
├── creates volumes
├── starts containers
├── streams logs
└── exposes ports
Production cautions
If using Compose in production:
[ ] pin image versions
[ ] avoid latest tags
[ ] define restart policies
[ ] configure log rotation
[ ] persist data in named volumes
[ ] backup volumes
[ ] monitor containers
[ ] document upgrade process
[ ] keep secrets out of git
[ ] place behind Nginx or load balancer
Compose rule: great for reproducibility and demos. For serious production, add the missing operational pieces: backups, monitoring, secrets, updates and rollback.
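The "pin image versions" and "avoid latest tags" items can be checked mechanically. The sketch below is a rough line-based scan, not a YAML parser: `unpinned_images` is a hypothetical helper that assumes each image reference sits on its own `image:` line, and it reports references with no tag or with `:latest`.

```shell
# Print image references from a Compose file that have no tag or use :latest.
# A colon in the registry host part (for example a port) is ignored by
# looking only at the last path component; digest references (@sha256:...)
# count as pinned.
unpinned_images() {
  sed -n 's/^[[:space:]]*image:[[:space:]]*//p' "$1" | tr -d "\"'" |
    awk '{
      ref = $1
      n = split(ref, parts, "/")
      last = parts[n]
      if (last !~ /[:@]/ || ref ~ /:latest$/) print ref
    }'
}

# Example usage:
# unpinned_images docker-compose.yml
```

An empty result means every image line carries an explicit tag or digest. Services that use `build:` instead of `image:` are not covered by this check.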
LXD / LXC: system containers and lightweight Ubuntu environments

LXD manages LXC system containers. Unlike Docker, which usually runs one application process per container, LXD containers can behave like lightweight Linux machines with init, SSH, packages, services and multiple processes. This makes LXD excellent for labs, training, test environments, network simulations and isolated system services.

Feature | LXD / LXC behavior | Usefulness
System container | Full Linux userspace. | Feels like a mini VM.
Shared kernel | Uses host kernel. | Lightweight compared to VM.
Images | Launch Ubuntu and other Linux images. | Fast lab creation.
Profiles | Reusable config for containers. | Standardized CPU/RAM/network/storage.
Snapshots | Snapshot and restore container state. | Safe experimentation.
Networking | Bridge, routed, macvlan patterns. | Complex labs and isolated networks.
LXD install and init
# Install LXD
sudo snap install lxd

# Add user to lxd group
sudo usermod -aG lxd "$USER"

# Re-login, then initialize
lxd init

# Launch Ubuntu container
lxc launch ubuntu:24.04 test1

# List containers
lxc list

# Shell inside container
lxc exec test1 -- bash
LXD command examples
# Start / stop
lxc start test1
lxc stop test1

# Execute command
lxc exec test1 -- apt update

# Copy file into container
lxc file push local.txt test1/root/local.txt

# Snapshot
lxc snapshot test1 before-change

# Restore snapshot
lxc restore test1 before-change

# Delete container
lxc delete test1 --force

# Show configuration
lxc config show test1

# Limit memory
lxc config set test1 limits.memory 1GiB

# Limit CPU
lxc config set test1 limits.cpu 2
LXD lab architecture
Ubuntu host
│
├── lxdbr0 bridge
│
├── container: web01
│       ├── nginx
│       └── app service
│
├── container: db01
│       └── PostgreSQL
│
├── container: monitor01
│       └── Prometheus / Grafana
│
└── snapshots
        ├── before-upgrade
        ├── before-network-test
        └── clean-baseline
LXD rule: use LXD when you want machine-like Linux environments without the full overhead of VMs.
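The snapshot and restore commands shown earlier combine naturally into a "snapshot, try, roll back on failure" routine for safe experimentation. This is a sketch under stated assumptions: `run_with_snapshot` is a hypothetical wrapper, and the `LXC` variable exists only so the client binary can be swapped out for testing.

```shell
# Client binary; override (for example LXC=echo) to rehearse without LXD.
LXC="${LXC:-lxc}"

# Take a snapshot, run a command inside the instance, restore on failure.
# Arguments: instance name, snapshot name, then the command to run.
run_with_snapshot() {
  instance="$1"; snap="$2"; shift 2
  "$LXC" snapshot "$instance" "$snap" || return 1
  if "$LXC" exec "$instance" -- "$@"; then
    echo "change applied; snapshot $snap kept"
  else
    echo "change failed; restoring $snap" >&2
    "$LXC" restore "$instance" "$snap"
    return 1
  fi
}

# Example usage:
# run_with_snapshot test1 before-upgrade apt-get -y upgrade
```

The snapshot is kept after a successful change so that a later manual rollback remains possible. Old snapshots should be pruned once the change is validated.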
KVM / QEMU / libvirt: full virtualization on Ubuntu

KVM is Linux kernel-based virtualization. With QEMU and libvirt, Ubuntu can host full virtual machines. Each VM has its own virtual CPU, memory, disk, network card and guest operating system. This is heavier than containers but gives stronger isolation and supports different kernels and operating systems.

Component | Role | Example usage
KVM | Kernel virtualization acceleration. | Runs VM workloads efficiently.
QEMU | Machine emulator and virtualizer. | Emulates devices and runs guests.
libvirt | Management layer for VMs. | virsh, virt-manager.
virt-install | CLI VM installer. | Create VM from ISO or cloud image.
virsh | CLI administration tool. | List, start, stop, inspect VMs.
qcow2 | Common VM disk image format. | Snapshots and thin provisioning.
Install KVM stack
# Check CPU virtualization support (0 means no vmx/svm flag was found)
grep -Ec '(vmx|svm)' /proc/cpuinfo

# Install KVM/libvirt tools
sudo apt update
sudo apt install -y \
  qemu-kvm \
  libvirt-daemon-system \
  libvirt-clients \
  bridge-utils \
  virtinst

# Add user to groups
sudo usermod -aG libvirt,kvm "$USER"

# Re-login, then check
virsh list --all
systemctl status libvirtd
KVM architecture
Ubuntu host
│
├── Linux kernel with KVM
├── QEMU processes
├── libvirt daemon
├── virtual networks
├── storage pools
└── VM guests
        │
        ├── Ubuntu VM
        ├── Debian VM
        ├── Windows VM
        └── lab appliance VM
virsh commands
# List VMs
virsh list --all

# Start VM
virsh start vm1

# Stop gracefully
virsh shutdown vm1

# Force stop
virsh destroy vm1

# VM info
virsh dominfo vm1

# Autostart VM
virsh autostart vm1

# Show networks
virsh net-list --all

# Show storage pools
virsh pool-list --all
When KVM is better than containers
Use KVM when:
- the guest needs its own kernel
- running another OS
- stronger isolation is required
- testing kernel-level behavior
- simulating production VM topology
- running legacy software
- VM snapshots with full machine state are needed
Resource warning: VMs consume reserved CPU/RAM/disk more like real machines. Capacity planning matters more than with lightweight containers.
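A first step in capacity planning is to total the memory promised to all guests and compare it with what the host has. The sketch below assumes `virsh dominfo` prints a line of the form `Max memory: <number> KiB`; `sum_vm_memory_mib` is a hypothetical helper that reads that output on stdin.

```shell
# Sum the "Max memory" values (KiB) from `virsh dominfo` output on stdin
# and print the total in MiB.
sum_vm_memory_mib() {
  awk '/^Max memory:/ { kib += $3 } END { printf "%d\n", kib / 1024 }'
}

# Example usage on a KVM host (compare the result with `free -m`):
# for vm in $(virsh list --all --name); do virsh dominfo "$vm"; done | sum_vm_memory_mib
```

If the total approaches or exceeds host RAM, guests that are all started at once will compete for memory, which is the situation the warning above describes.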
virt-manager: graphical VM management on Ubuntu

virt-manager is a desktop GUI for managing KVM/libvirt virtual machines. It is useful for labs, local testing, training, troubleshooting, VM console access and visual VM configuration. On servers, CLI tools such as virsh and automation are more common.

virt-manager feature | Purpose | Typical usage
VM creation wizard | Create new VM from ISO or image. | Ubuntu or Windows lab VM.
Console view | Access VM screen. | Install OS, fix boot issues.
Hardware editor | Configure CPU, RAM, disks, NICs. | Adjust VM resources.
Snapshots | Capture VM state. | Before risky change.
Network view | Manage virtual networks. | NAT, bridge, isolated network.
Storage pools | Manage VM disks. | qcow2 images and volumes.
Install virt-manager
sudo apt update
sudo apt install virt-manager

# Start GUI from desktop
virt-manager

# Check libvirt service
systemctl status libvirtd

# List VMs from CLI
virsh list --all
VM creation flow with virt-manager
virt-manager
│
├── New virtual machine
├── Choose ISO or cloud image
├── Select OS type
├── Allocate CPU and RAM
├── Create virtual disk
├── Choose network
├── Start installation
└── Install guest tools if needed
Lab topology example
Ubuntu desktop host
│
├── virt-manager
│
├── VM: router-lab
│       ├── NIC 1: NAT
│       └── NIC 2: isolated lab network
│
├── VM: web-server
│       └── lab network
│
└── VM: db-server
        └── lab network
Good usage boundaries
Use virt-manager for:
- desktop labs
- OS installation
- visual debugging
- VM console access
- local experiments

Prefer CLI/IaC for:
- production servers
- repeatable deployment
- remote headless hosts
- large VM fleets
- automated rebuilds
virt-manager rule: excellent for learning and labs. For production, prefer versioned definitions, automation and CLI-controlled operations.
CI/CD, labs and developer workflows

Ubuntu containers and virtualization are extremely useful for reproducible development, automated tests, CI runners, network labs, database experiments, security sandboxes and integration environments. The goal is to reduce “works on my machine” problems.

Workflow | Best technology | Why
Local web app stack | Docker Compose | Fast, reproducible dependencies.
CI integration tests | Docker services | Start DB/cache/message broker for tests.
Linux admin training | LXD | Fast mini Ubuntu machines.
Network topology lab | LXD or KVM | Multiple nodes and networks.
Kernel or OS testing | KVM | Full guest kernel isolation.
Security sandbox | KVM | Stronger isolation boundary.
CI pipeline example
Git push
│
▼
CI runner on Ubuntu
│
├── checkout code
├── build Docker image
├── start Compose services
│       ├── app
│       ├── postgres
│       └── redis
├── run tests
├── scan image
├── push image to registry
└── deploy to target environment
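The build, test and push steps of the pipeline above could be scripted along these lines. Treat it as a sketch rather than a reference pipeline: the registry name, the `web` service name and the `pytest` test command are assumptions, `run` and `ci_pipeline` are hypothetical helpers, and the image scan and deploy steps are left out because they depend on the chosen tools.

```shell
# DRY_RUN=1 (the default here) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Arguments: image name, image tag.
ci_pipeline() {
  image="$1"; tag="$2"
  run docker build -t "$image:$tag" . &&
    run docker compose up -d &&
    run docker compose exec -T web pytest &&   # assumed service name and test command
    run docker push "$image:$tag"
  status=$?
  # Always tear the test stack down, including its volumes.
  run docker compose down -v
  return "$status"
}

ci_pipeline registry.example.com/app "$(date +%Y%m%d%H%M%S)"
```

The `&&` chain stops at the first failing step so a broken image is never pushed, while the teardown runs regardless of the outcome.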
Demo architecture: one Ubuntu host
Ubuntu host
│
├── Docker service
│       ├── nginx container
│       ├── app container
│       └── redis container
│
├── LXD container
│       └── ubuntu:24.04 system lab
│
├── KVM VM
│       └── isolated test machine
│
└── Monitoring
        ├── node exporter
        ├── docker stats
        └── systemd status
Useful demo commands
# Docker demo
                            docker run -d --name demo-nginx -p 8080:80 nginx:latest
                            curl -I http://localhost:8080
                            docker logs demo-nginx

                            # LXD demo
                            lxc launch ubuntu:24.04 lab1
                            lxc exec lab1 -- bash -lc "hostnamectl && apt update"

                            # KVM visibility
                            virsh list --all

                            # Host monitoring
                            top
                            df -h
                            ss -lntp
Portfolio demo idea: one Ubuntu host running one Docker app, one LXD system container, one KVM VM, plus a small monitoring page. It clearly shows platform breadth.
Production patterns: Docker host, reverse proxy, volumes, logs, updates and orchestration

Containers in production require more than docker run. You need image governance, non-root containers, pinned versions, health checks, persistent volumes, log rotation, backup, monitoring, network boundaries, secrets management and a clear update strategy.

Production topic | Good practice | Risk if ignored
Image versions | Pin tags or digests. | Unexpected changes from latest.
Volumes | Persist state outside container. | Data loss on container removal.
Logs | Configure rotation and centralization. | Disk fills under /var/lib/docker.
Secrets | Use secret store or strict env file permissions. | Secrets leaked in git or inspect output.
Networking | Expose only reverse proxy, keep internal networks private. | DB/cache exposed accidentally.
Health checks | Define container and load balancer health. | Dead service appears running.
Backups | Backup volumes and databases. | No recovery path.
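The "pin tags or digests" practice can be checked mechanically before a deploy. The sketch below treats a reference as pinned when it carries a digest or an explicit tag other than `latest`. The function name is illustrative, and the check is deliberately simple: a version tag can still be re-pushed, so only a digest is a strict pin.

```shell
#!/bin/sh
# Return 0 if an image reference is pinned (digest, or a tag other than "latest").
is_pinned_image() {
    ref=$1
    case "$ref" in
        *@sha256:*) return 0 ;;   # pinned by digest
    esac
    last=${ref##*/}               # drop registry/namespace, which may contain a port
    case "$last" in
        *:latest) return 1 ;;     # floating tag
        *:*)      return 0 ;;     # explicit tag
        *)        return 1 ;;     # no tag means implicit "latest"
    esac
}

# Example: flag unpinned references from a list.
for ref in redis:7.2 nginx nginx:latest registry.example.com/myapp:1.4.2; do
    if is_pinned_image "$ref"; then
        echo "pinned:   $ref"
    else
        echo "UNPINNED: $ref"
    fi
done
```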
Single-host container production pattern
Internet
│
▼
Nginx on Ubuntu host
│
├── TLS termination
├── rate limiting
├── static files
└── reverse proxy
        │
        ▼
        Docker network
        │
        ├── app container
        ├── worker container
        ├── redis container
        └── internal-only database or external DB
Docker daemon log rotation
# /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}

                            # Apply
                            sudo systemctl restart docker
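To size the disk for these settings, the worst case per container is roughly max-size multiplied by max-file. A small arithmetic sketch (the function name is illustrative, and the container count is an example value):

```shell
#!/bin/sh
# Approximate upper bound of json-file log usage, in MB.
# Args: max-size in MB, max-file count, number of containers (default 1).
log_cap_mb() {
    echo $(( $1 * $2 * ${3:-1} ))
}

log_cap_mb 50 5       # one container:     250
log_cap_mb 50 5 20    # twenty containers: 5000
```

Note that daemon-level logging options apply to containers created after the change. Existing containers keep their previous log configuration until they are recreated.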
Production Compose example
services:
  app:
    image: registry.example.com/myapp:1.4.2
    restart: unless-stopped
    env_file:
      - /srv/myapp/app.env
    networks:
      - internal
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health/"]
      interval: 30s
      timeout: 5s
      retries: 3

  redis:
    image: redis:7.2
    restart: unless-stopped
    networks:
      - internal
    volumes:
      - redisdata:/data

networks:
  internal:

volumes:
  redisdata:
When to move beyond Compose
Move to orchestrator when:
                            - multiple hosts are needed
                            - rolling updates are required
                            - autoscaling is required
                            - service discovery is complex
                            - secrets and config need governance
                            - many teams deploy independently
                            - high availability is mandatory
Production warning: containers do not remove the need for Linux administration. The host still needs patching, disk monitoring, firewalling, backups and incident response.
Troubleshooting containers and virtualization

Troubleshooting should start at the right layer: host health, Docker daemon, container logs, network binding, volume permissions, image version, LXD profile, libvirt daemon, VM console or storage pool. Avoid deleting containers or volumes before understanding where persistent data lives.

Symptom | First checks | Common cause
Docker container exits | docker ps -a, docker logs | Bad config, missing env, app crash.
Port not reachable | docker ps, ss -lntp, UFW | Port not published, firewall, bind address.
Disk full | docker system df, du -sh /var/lib/docker | Logs, images, old containers, volumes.
Permission denied on volume | ls -lah, container user, UID/GID | Host volume ownership mismatch.
LXD container has no network | lxc list, lxc network list | Bridge, DNS, profile or firewall issue.
KVM VM will not start | virsh list --all, libvirt logs | Missing storage, permission, CPU virtualization.
Docker troubleshooting commands
docker ps
                            docker ps -a
                            docker logs CONTAINER
                            docker inspect CONTAINER
                            docker exec -it CONTAINER bash
                            docker stats
                            docker system df
                            docker network ls
                            docker volume ls
                            systemctl status docker
                            journalctl -u docker --since "30 min ago"
LXD and KVM troubleshooting commands
# LXD
                            lxc list
                            lxc info CONTAINER
                            lxc config show CONTAINER
                            lxc network list
                            lxc storage list
                            lxc exec CONTAINER -- bash
                            journalctl -u snap.lxd.daemon --since "30 min ago"

                            # KVM / libvirt
                            virsh list --all
                            virsh dominfo VM
                            virsh net-list --all
                            virsh pool-list --all
                            systemctl status libvirtd
                            journalctl -u libvirtd --since "30 min ago"
Decision tree
Container or VM issue
│
├── Is host healthy?
│       └── CPU, RAM, disk, network
│
├── Is manager running?
│       ├── Docker daemon
│       ├── LXD daemon
│       └── libvirt daemon
│
├── Is workload running?
│       ├── docker ps -a
│       ├── lxc list
│       └── virsh list --all
│
├── What do logs say?
│       ├── docker logs CONTAINER
│       ├── lxc info CONTAINER --show-log
│       └── journalctl
│
└── Is it network, storage or permissions?
        ├── ports
        ├── volumes
        ├── bridges
        └── UID/GID
Data warning: docker compose down -v deletes named volumes. Never run it on production unless you explicitly want to delete persistent data.
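One way to make this warning harder to ignore is a small guard in front of Compose. The sketch below only inspects arguments and never runs Docker itself. The function name and the `ALLOW_VOLUME_DELETE` variable are hypothetical conventions, not part of Docker.

```shell
#!/bin/sh
# Return 1 (refuse) when a compose command would delete volumes,
# unless ALLOW_VOLUME_DELETE=yes is set. The variable name is hypothetical.
compose_guard() {
    is_down=no
    drops_volumes=no
    for arg in "$@"; do
        case "$arg" in
            down)         is_down=yes ;;
            -v|--volumes) drops_volumes=yes ;;
        esac
    done
    if [ "$is_down" = yes ] && [ "$drops_volumes" = yes ] \
        && [ "${ALLOW_VOLUME_DELETE:-no}" != yes ]; then
        echo "refusing: 'down -v' deletes named volumes" >&2
        return 1
    fi
    return 0
}

# Usage: compose_guard down -v && docker compose down -v
```

A wrapper like this fits in a deploy script or shell alias on production hosts, where an explicit opt-in is a reasonable price for protecting persistent data.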
Final checklist and command cheat sheet
Technology choice checklist
[ ] Docker selected for application containers
                            [ ] Compose selected for local or small multi-service stacks
                            [ ] LXD selected for system-container labs
                            [ ] KVM selected for full VM isolation
                            [ ] virt-manager selected for GUI lab management
                            [ ] Production orchestrator considered if multi-node
                            [ ] Persistent data location is documented
                            [ ] Backup strategy exists for volumes and VM disks
                            [ ] Network exposure is documented
                            [ ] Host firewall rules are known
                            [ ] Logs are rotated
                            [ ] Images are pinned
                            [ ] Secrets are not stored in git
                            [ ] Monitoring covers host and workloads
                            [ ] Update and rollback process exists
Docker cheat sheet
docker ps
                            docker ps -a
                            docker images
                            docker run -d --name web -p 8080:80 nginx
                            docker logs -f web
                            docker exec -it web bash
                            docker stop web
                            docker rm web
                            docker system df
                            docker compose up -d
                            docker compose logs -f
                            docker compose down
LXD / KVM cheat sheet
# LXD
                            lxd init
                            lxc launch ubuntu:24.04 c1
                            lxc list
                            lxc exec c1 -- bash
                            lxc snapshot c1 before-change
                            lxc restore c1 before-change
                            lxc delete c1 --force

                            # KVM / libvirt
                            virsh list --all
                            virsh start vm1
                            virsh shutdown vm1
                            virsh dominfo vm1
                            virsh net-list --all
                            virsh pool-list --all

                            # Host checks
                            systemctl status docker
                            systemctl status libvirtd
                            df -h
                            free -h
                            ss -lntp
Final rule
Ubuntu is a strong virtualization and container host when the lifecycle is controlled.
Docker gives fast application packaging, Compose gives reproducible stacks, LXD gives machine-like containers, and KVM gives full VM isolation. Production quality depends on security, storage, networking, logs, monitoring, backups and rollback.
Minimal robust host profile
Minimum robust Ubuntu container/VM host:
                            - Ubuntu LTS
                            - patched kernel and packages
                            - Docker/LXD/KVM installed intentionally
                            - non-root operational model
                            - storage sized and monitored
                            - log rotation enabled
                            - firewall rules documented
                            - images or VM templates versioned
                            - backups tested
                            - monitoring and alerts enabled
                            - runbook documented
6.1 Ubuntu Troubleshooting Playbook: logs, systemd, network, DNS, disk, boot, services and incidents
Professional troubleshooting method

Ubuntu troubleshooting must be systematic. The objective is not to try random commands until something changes. The objective is to identify the failing layer: application, service manager, process, logs, permissions, network, DNS, firewall, storage, memory, CPU, kernel, package update, boot or recent configuration change.

A good incident workflow follows a stable sequence: define the symptom, determine the scope, collect evidence, isolate the layer, apply one minimal fix, verify, document, then add prevention.

Step | Question | Command family | Output expected
1. Symptom | What exactly is failing? | curl, browser, user report, monitoring | Precise error, time, scope.
2. Scope | One service, one host, one network, all users? | health checks, ping, curl, dashboard | Incident boundary.
3. Logs | What did the system report? | journalctl, tail, grep, dmesg | Error message and timeline.
4. Services | Is the daemon running and enabled? | systemctl, ss | Running state, PID, port.
5. Resources | Is the host saturated? | top, free, df, iostat, vmstat | CPU/RAM/disk/IO pressure.
6. Network | Can traffic reach the service? | ip, ss, ufw, dig, curl | IP, route, DNS, port, firewall status.
7. Change | What changed recently? | apt history, deploy logs, git, config diff | Likely trigger.
8. Fix | What is the smallest safe correction? | rollback, restart, config fix, cleanup | Service restored and verified.
Core rule: observe before acting. During an incident, every random restart or untracked change can destroy evidence and make the root cause harder to find.
Global diagnostic decision tree
Ubuntu incident
│
├── Is the host reachable?
│       ├── no  -> cloud, network, firewall, boot, provider
│       └── yes
│
├── Is disk full?
│       ├── yes -> disk playbook
│       └── no
│
├── Are critical services running?
│       ├── no  -> systemd playbook
│       └── yes
│
├── Are ports listening?
│       ├── no  -> service config / bind / crash
│       └── yes
│
├── Is network path valid?
│       ├── no  -> DNS / route / firewall / SG
│       └── yes
│
├── Are resources saturated?
│       ├── CPU -> CPU playbook
│       ├── RAM -> memory playbook
│       ├── IO  -> disk / IO playbook
│       └── no
│
└── Application layer likely
        ├── app logs
        ├── DB connectivity
        ├── cache connectivity
        ├── external dependency
        └── recent deploy
Do / avoid
Do | Avoid
Collect logs with time window. | Reading huge logs without filtering.
Change one thing at a time. | Restarting everything blindly.
Check disk early. | Debugging app while root FS is full.
Validate config before restart. | Restarting with unvalidated config.
Keep rollback possible. | Deleting unknown files in production.
First 5 minutes: collect facts quickly

The first minutes of an incident are for orientation. You need to know whether the machine is alive, whether the disk is full, whether memory is exhausted, whether services failed, which ports are listening, and whether logs show a clear error.

One-screen diagnostic
echo "== HOST =="
                            hostnamectl

                            echo "== UPTIME =="
                            uptime

                            echo "== WHO IS CONNECTED =="
                            who

                            echo "== DISK =="
                            df -h

                            echo "== MEMORY =="
                            free -h

                            echo "== FAILED UNITS =="
                            systemctl --failed

                            echo "== LISTENING PORTS =="
                            ss -lntp

                            echo "== RECENT WARNINGS =="
                            journalctl -p warning --since "30 min ago" --no-pager | tail -100
Signal | Good sign | Bad sign | Next action
Uptime | Stable, expected boot time. | Unexpected reboot. | Check previous boot logs.
Disk | Filesystem below alert threshold. | / or /var near 100%. | Disk playbook.
Memory | Available RAM healthy. | Swap active, OOM events. | Memory playbook.
Failed units | No failed services. | Critical unit failed. | systemd playbook.
Ports | Expected ports listening. | Missing 80/443/app/DB port. | Service and network playbook.
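The disk signal in the table can be turned into a quick filter over `df` output. This is a minimal sketch: the 90% default is an arbitrary example threshold, the function name is illustrative, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Read `df -P` output on stdin and print mount points at or above a usage threshold.
# Arg 1: threshold in percent (default 90, an arbitrary example value).
disk_over_threshold() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            use = $5                 # "Capacity" column, e.g. "95%"
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print $6 " " use "%"
        }'
}

# Usage: df -P | disk_over_threshold 90
```

Empty output means no filesystem crossed the threshold, which makes the function easy to use in a cron job or a monitoring check.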
Minimum incident facts
Incident facts to capture:
                            - exact symptom
                            - first detection time
                            - impacted users or services
                            - hostname
                            - Ubuntu version
                            - kernel version
                            - uptime
                            - recent deployments
                            - recent package upgrades
                            - failed services
                            - resource saturation
                            - relevant logs
                            - immediate workaround
                            - rollback option
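The host-level items in this list (hostname, Ubuntu version, kernel version, uptime) can be captured automatically at the start of an incident. The sketch below is one possible shape; the field names and the output file naming are illustrative choices, and the remaining facts still have to be written down by hand.

```shell
#!/bin/sh
# Print a small incident header to stdout. Field names are illustrative.
incident_facts() {
    echo "captured_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "hostname=$(uname -n)"
    echo "kernel=$(uname -r)"
    if [ -r /etc/os-release ]; then
        # PRETTY_NAME holds the distribution name and version.
        echo "os=$(. /etc/os-release && echo "$PRETTY_NAME")"
    fi
    echo "uptime=$(uptime 2>/dev/null || echo unknown)"
}

# Usage: incident_facts > "incident-$(date +%Y%m%d-%H%M).txt"
```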
Recent change checks
# Apt package changes
                            less /var/log/apt/history.log
                            less /var/log/apt/term.log

                            # Recently modified config files under /etc
                            sudo find /etc -type f -mtime -2 -printf '%TY-%Tm-%Td %TH:%TM  %p\n' | sort

                            # Recent system boots
                            last reboot | head

                            # Current users
                            who
                            w

                            # Cron logs if syslog is available
                            grep CRON /var/log/syslog | tail -100
Immediate triage matrix
Observation | Likely category
Service failed in systemctl | Service/config/dependency issue.
Port not listening | Service did not bind or crashed.
Port listening locally but unreachable remotely | Firewall, route, security group, load balancer.
Disk full | Logs, Docker, DB, uploads, backups.
OOM kill in kernel logs | Memory leak or insufficient RAM.
Fast triage: first separate host failure, service failure, network failure, resource saturation and application failure.
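For the "listening locally but unreachable remotely" case, it helps to rule out a loopback-only bind before blaming the firewall. The sketch below classifies one port from `ss -lnt` output; the function name and the three result labels are illustrative.

```shell
#!/bin/sh
# Read `ss -lnt` output on stdin; report how PORT is bound.
# Prints: not-listening, loopback-only, or all-or-specific-interface.
bind_scope() {
    awk -v port="$1" '
        NR > 1 {
            addr = $4                          # Local Address:Port column
            n = split(addr, parts, ":")
            if (parts[n] != port) next
            found = 1
            host = substr(addr, 1, length(addr) - length(parts[n]) - 1)
            if (host !~ /^(127\.|\[::1\])/) public = 1
        }
        END {
            if (!found)      print "not-listening"
            else if (public) print "all-or-specific-interface"
            else             print "loopback-only"
        }'
}

# Usage: ss -lnt | bind_scope 8000
```

A `loopback-only` result means remote clients can never connect, whatever the firewall says, so the fix belongs in the service's bind address or in the reverse proxy.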
systemd and service troubleshooting

Most production daemons on Ubuntu are managed by systemd. When a service is down, start with systemctl status, then read the journal, validate the config, check ports, check permissions, and only then restart.

Question | Command | What it tells you
Is service running? | systemctl status nginx | Active state, PID, exit code, recent logs.
Why did it fail? | journalctl -u nginx | Service logs and errors.
Did unit fail? | systemctl --failed | Failed systemd units.
Does it start at boot? | systemctl is-enabled nginx | Boot activation state.
Which ports are bound? | ss -lntp | Listening sockets and processes.
What are unit properties? | systemctl show nginx | Restart policy, limits, user, environment.
Service commands
# Status
                            systemctl status SERVICE

                            # Logs
                            journalctl -u SERVICE --since "1 hour ago"
                            journalctl -u SERVICE -f

                            # Restart
                            sudo systemctl restart SERVICE

                            # Reload config if supported
                            sudo systemctl reload SERVICE

                            # Enable at boot
                            sudo systemctl enable SERVICE

                            # Failed units
                            systemctl --failed

                            # Unit file
                            systemctl cat SERVICE

                            # Runtime properties
                            systemctl show SERVICE | less
Service failure decision tree
Service failed
│
├── Read status
│       └── systemctl status SERVICE
│
├── Read logs with time window
│       └── journalctl -u SERVICE --since "30 min ago"
│
├── Config syntax valid?
│       ├── nginx -t
│       ├── sshd -t
│       └── app-specific check
│
├── Dependency available?
│       ├── database
│       ├── redis
│       ├── network
│       └── filesystem mount
│
├── Permissions correct?
│       ├── service user
│       ├── config files
│       └── runtime directories
│
├── Port conflict?
│       └── ss -lntp
│
└── Restart with monitoring
        └── systemctl restart SERVICE
Common service failures
Error pattern | Likely cause | Fix direction
Exit code 1 after deploy | Bad config or app error. | Validate config, rollback deploy.
Permission denied | Wrong owner/group/path. | Check service user and namei -l.
Address already in use | Port conflict. | Find process with ss -lntp.
Start request repeated too quickly | Crash loop. | Fix root cause, then systemctl reset-failed.
Dependency failed | Database, network, mount, Redis missing. | Restore dependency first.
Service rule: restart is a recovery action, not a diagnosis. Read the reason before restarting when possible.
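The "validate before restart" habit can be encoded so that the action never runs after a failed check. A minimal sketch, with a hypothetical helper name; the check is passed as one string and the action as separate arguments:

```shell
#!/bin/sh
# Run a validation command; run the action only if validation succeeds.
# Arg 1: check command as a single string. Remaining args: the action to run.
safe_apply() {
    check=$1
    shift
    if ! sh -c "$check"; then
        echo "validation failed, not applying: $*" >&2
        return 1
    fi
    "$@"
}

# Usage:
#   safe_apply "sudo nginx -t" sudo systemctl reload nginx
#   safe_apply "sudo sshd -t"  sudo systemctl restart ssh
```

With a broken configuration the running service is left untouched, which matters most for SSH, where a bad restart can lock you out of the host.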
Logs and journald: finding the real error

Logs are the timeline of the incident. On Ubuntu, the main tools are journalctl, service-specific logs, /var/log/auth.log, /var/log/syslog, kernel logs and application logs. The most useful log queries are scoped by service and time.

Need | Command | Use case
Recent critical context | journalctl -xe | Quick overview of recent errors.
Service logs | journalctl -u nginx | Why one service failed.
Current boot logs | journalctl -b | Boot-time errors and service startup.
Previous boot logs | journalctl -b -1 | Debug crash/reboot before current boot.
Kernel logs | journalctl -k | OOM, disk, driver, network errors.
Authentication logs | /var/log/auth.log | SSH, sudo, login attempts.
journalctl commands
# Recent errors and context
                            journalctl -xe

                            # Service logs today
                            journalctl -u SERVICE --since today

                            # Service logs last 30 minutes
                            journalctl -u SERVICE --since "30 min ago"

                            # Follow service logs
                            journalctl -u SERVICE -f

                            # Warnings and errors today
                            journalctl -p warning --since today

                            # Current boot
                            journalctl -b

                            # Previous boot
                            journalctl -b -1

                            # Kernel logs
                            journalctl -k --since today
Classic log files
# System log
                            sudo tail -200 /var/log/syslog

                            # Authentication log
                            sudo tail -200 /var/log/auth.log

                            # Nginx logs
                            sudo tail -200 /var/log/nginx/error.log
                            sudo tail -200 /var/log/nginx/access.log

                            # Apt history
                            less /var/log/apt/history.log
                            less /var/log/apt/term.log

                            # Compressed rotated logs
                            zgrep -i "error" /var/log/syslog.*.gz

                            # Kernel ring buffer
                            dmesg -T | tail -100
Log investigation flow
Need root cause from logs
│
├── Identify incident time window
│
├── Read service journal
│       └── journalctl -u SERVICE --since TIME
│
├── Read system warnings
│       └── journalctl -p warning --since TIME
│
├── Read kernel logs
│       └── journalctl -k --since TIME
│
├── Read application logs
│
├── Correlate with deploy/update
│       └── apt history / deploy log
│
└── Extract first error, not last symptom
Useful grep patterns
grep -i "error" app.log
                            grep -i "permission denied" app.log
                            grep -i "connection refused" app.log
                            grep -i "no space left" /var/log/syslog
                            grep -i "killed process" /var/log/syslog
                            grep -i "failed password" /var/log/auth.log
Log rule: find the first meaningful error in the timeline. Later messages often describe consequences, not causes.
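The log rule translates directly into "stop at the first match". The sketch below prints only the earliest line that matches a few common failure keywords. The keyword list is an assumption to adapt per application, and `grep -m` relies on GNU grep, which is what Ubuntu ships.

```shell
#!/bin/sh
# Print the first line on stdin that matches common failure keywords.
# The keyword list is a starting point, not an exhaustive pattern.
first_error() {
    grep -m 1 -iE 'error|fail|denied|refused|no space left|killed process'
}

# Usage:
#   journalctl -u SERVICE --since "30 min ago" --no-pager | first_error
```

Once the first match is known, read a few lines around its timestamp in the full log. The matching line is a pointer into the timeline and may not be the root cause itself.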
Network and DNS troubleshooting

Network debugging should be layered: IP address, route, DNS, firewall, listening port, local service response, remote response. Do not assume an application is broken until the network path is verified.

Layer | Question | Command | Bad sign
Interface | Does the host have an IP? | ip a | No expected IP.
Route | Is default route present? | ip r | No default route.
DNS | Can names resolve? | dig, resolvectl | Timeout or wrong answer.
Firewall | Is traffic allowed? | ufw status verbose | Required port denied.
Socket | Is service listening? | ss -lntp | Port missing.
HTTP local | Does local endpoint respond? | curl -I localhost | Connection refused or 5xx.
Remote path | Does public endpoint respond? | curl -I domain | Timeout, TLS, 5xx, wrong IP.
Network commands
# Interfaces
                            ip a

                            # Routes
                            ip r

                            # DNS status
                            resolvectl status

                            # DNS query
                            dig example.com
                            dig A example.com
                            dig AAAA example.com

                            # Listening ports
                            ss -lntp

                            # Connection summary
                            ss -s

                            # Firewall
                            sudo ufw status verbose

                            # Local HTTP check
                            curl -I http://localhost

                            # Public HTTP check
                            curl -I https://example.com
Network decision tree
Service unreachable
│
├── Is service listening locally?
│       ├── no  -> service/config issue
│       └── yes
│
├── Does local curl work?
│       ├── no  -> app/service issue
│       └── yes
│
├── Is firewall open?
│       ├── no  -> UFW/cloud security group
│       └── yes
│
├── Does DNS point to correct IP?
│       ├── no  -> DNS provider / record
│       └── yes
│
├── Does remote curl reach?
│       ├── no  -> route/LB/firewall/provider
│       └── yes
│
└── Is response app error?
        └── app logs / upstream logs
Common network symptoms
Symptom | Likely cause | Check
Connection refused | No service listening on target port. | ss -lntp, service status.
Connection timeout | Firewall, route, security group, provider. | UFW, cloud firewall, route.
DNS resolves wrong IP | Bad DNS record or stale cache. | dig, DNS console.
Works locally, not remotely | Firewall, bind address, reverse proxy, LB. | ss, UFW, Nginx.
TLS error | Wrong certificate, expired cert, SNI issue. | Nginx logs, certbot, openssl.
Network rule: "localhost works" and "the public domain works" are two different tests. Always verify both.
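When running the local and public checks from a script, curl's exit code already hints at the failing layer. The sketch below maps a few well-known curl exit codes to the layers used in this section. The labels are illustrative, and codes not listed fall through to `other`.

```shell
#!/bin/sh
# Map a curl exit code to the network layer it most likely points at.
classify_curl_exit() {
    case "$1" in
        0)     echo "ok" ;;
        6)     echo "dns" ;;       # could not resolve host
        7)     echo "connect" ;;   # failed to connect, e.g. connection refused
        28)    echo "timeout" ;;   # often firewall, route or security group
        35|60) echo "tls" ;;       # handshake or certificate verification failure
        *)     echo "other" ;;
    esac
}

# Usage:
#   curl -sS -o /dev/null --max-time 5 http://localhost;     classify_curl_exit $?
#   curl -sS -o /dev/null --max-time 5 https://example.com;  classify_curl_exit $?
```

Without `--fail`, an HTTP 5xx response still gives exit code 0, so an `ok` result only proves the network path. The HTTP status has to be checked separately.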
Disk, filesystem and IO troubleshooting

Disk problems can break everything: package installs, logs, databases, Docker, SSH, application uploads and systemd services. Always check disk early in an incident. A full / or /var often creates misleading application errors.

Problem | Command | Likely cause | Safe first action
Filesystem full | df -h | Logs, Docker, DB, backups, uploads. | Identify large directories.
Large logs | du -sh /var/log/* | Log storm or missing rotation. | Vacuum journal, rotate logs.
Docker disk growth | docker system df | Images, volumes, logs. | Prune only understood objects.
Mount missing | findmnt, lsblk -f | fstab error, disk detach. | Fix mount, do not write to wrong path.
High IO wait | iostat -xz 1 | Slow disk, DB writes, swap, backup. | Find process with iotop.
Disk commands
# Filesystem usage
                            df -h

                            # Inode usage
                            df -ih

                            # Top-level directory sizes
                            sudo du -xhd1 / 2>/dev/null | sort -h

                            # Common growth areas
                            sudo du -sh /var/log/*
                            sudo du -sh /var/lib/docker/* 2>/dev/null
                            sudo du -sh /var/lib/postgresql/* 2>/dev/null
                            sudo du -sh /tmp/* 2>/dev/null

                            # Mounts and disks
                            lsblk -f
                            findmnt
                            cat /etc/fstab

                            # IO statistics
                            iostat -xz 1
                            sudo iotop -o
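The df checks above can be turned into a simple threshold filter. This is a sketch, not a monitoring replacement: the function name over_threshold and the 85 percent figure are assumptions, and mount points containing spaces are not handled. It reads POSIX-format df output (df -P) on stdin so it can also be run against saved output.

```shell
# Print "mountpoint usage" for filesystems at or above a usage threshold (percent).
# over_threshold is a hypothetical helper; input must be in `df -P` column layout.
over_threshold() {
    threshold=$1
    awk -v t="$threshold" 'NR > 1 {
        use = $5              # fifth column is the capacity, for example "90%"
        sub(/%/, "", use)
        if (use + 0 >= t) print $6, $5
    }'
}

# Usage:
# df -P  | over_threshold 85    # block usage
# df -Pi | over_threshold 85    # inode usage (GNU df)
```

Checking inodes with the same filter matters because a filesystem can report free space in df -h and still refuse new files when inodes are exhausted.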
Disk full decision tree
Disk full
                            │
                            ├── Which filesystem?
                            │       └── df -h
                            │
                            ├── Is it root or /var?
                            │       └── du -xhd1 /
                            │
                            ├── Is journal huge?
                            │       └── journalctl --disk-usage
                            │
                            ├── Are app logs huge?
                            │       └── du -sh /var/log/*
                            │
                            ├── Is Docker huge?
                            │       └── docker system df
                            │
                            ├── Is database huge?
                            │       └── do not delete manually
                            │
                            └── Prevent recurrence
                                    ├── logrotate
                                    ├── monitoring
                                    ├── retention
                                    └── resize or separate volume
Safe cleanup commands
# Clean apt cache
                            sudo apt clean

                            # Remove unused packages
                            sudo apt autoremove

                            # Show journal size
                            journalctl --disk-usage

                            # Vacuum journal by time
                            sudo journalctl --vacuum-time=7d

                            # Vacuum journal by size
                            sudo journalctl --vacuum-size=1G

                            # Docker usage
                            docker system df

                            # Docker image cleanup - use with care
                            docker image prune
Dangerous cleanup commands
Dangerous in production:
                            rm -rf /var/lib/postgresql/*
                            rm -rf /var/lib/mysql/*
                            rm -rf /var/lib/docker/volumes/*
                            docker compose down -v
                            truncate unknown database files
                            delete random files under /var/lib

                            Safer:
                            - understand owner service
                            - stop service if required
                            - backup first
                            - use native cleanup tools
                            - document action
Disk rule: never delete unknown files under database or volume directories. Freeing space quickly can create permanent data loss.
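One way to apply the "Safer" list is to produce a review list before removing anything. The sketch below only lists candidates and never deletes; the function name list_old_files and the example directory are hypothetical, and it is meant for log-like directories, not for database or volume paths.

```shell
# List regular files under a directory whose last modification is older than N days,
# with their sizes. Nothing is deleted: the output is for review only.
# list_old_files is a hypothetical helper.
list_old_files() {
    dir=$1
    days=$2
    # -xdev keeps find on one filesystem; -mtime +N selects files older than N days
    find "$dir" -xdev -type f -mtime +"$days" -exec du -h {} +
}

# Usage (the path is an assumed application log directory):
# list_old_files /var/log/myapp 30 | sort -h
```

Once the list has been reviewed and the owning service is known, the service's own retention or rotation settings are usually a better fix than a manual delete.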
CPU, memory, swap and process troubleshooting

Resource saturation explains many incidents: slow responses, timeouts, SSH lag, services killed by OOM, high load, worker backlog, database slowness and container instability. Identify whether the bottleneck is CPU, RAM, swap, IO wait or one process.

Signal | Command | Interpretation | Next action
High load | uptime | Runnable/waiting tasks high. | Check CPU vs IO wait.
High CPU | top, pidstat | Process consuming CPU. | Profile app or reduce load.
Low memory | free -h | Available memory low. | Find memory process.
Swap activity | vmstat 1 | RAM pressure causing latency. | Reduce workers, add RAM.
OOM kill | journalctl -k | Kernel killed process. | Fix memory pressure.
High IO wait | iostat, top | CPU waiting on disk. | Disk/IO playbook.
Resource commands
# CPU/load
                            uptime
                            top
                            htop
                            ps aux --sort=-%cpu | head -30

                            # Memory
                            free -h
                            ps aux --sort=-%mem | head -30
                            vmstat 1

                            # Swap
                            swapon --show

                            # OOM events
                            journalctl -k --since today | grep -i -E "oom|killed process"

                            # Per-process stats if sysstat installed
                            pidstat -u -r 1
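The OOM grep above shows raw kernel lines. A short filter can summarize which process names were killed and how often. This is a sketch that assumes the kernel message contains the text "Killed process PID (name)", which is the usual wording on current kernels; the function name oom_victims is hypothetical, and process names containing spaces are not handled.

```shell
# Summarize OOM kills as "process_name count", most frequent first.
# Reads kernel log lines on stdin; oom_victims is a hypothetical helper.
oom_victims() {
    # Keep only lines with "Killed process PID (name)" and extract the name
    sed -n 's/.*Killed process [0-9][0-9]* (\([^)]*\)).*/\1/p' |
    sort | uniq -c | sort -rn |
    awk '{ print $2, $1 }'
}

# Usage:
# journalctl -k --since today --no-pager | oom_victims
```

If one name dominates the list, the memory-leak and worker-count rows of the fixes table are the first places to look.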
Resource decision tree
Server slow
                            │
                            ├── Load high?
                            │       └── uptime
                            │
                            ├── CPU saturated?
                            │       ├── yes -> top, process, app profiler
                            │       └── no
                            │
                            ├── IO wait high?
                            │       ├── yes -> iostat, iotop, disk playbook
                            │       └── no
                            │
                            ├── Memory low?
                            │       ├── yes -> ps by memory, OOM logs
                            │       └── no
                            │
                            ├── Swap active?
                            │       ├── yes -> reduce workers or add RAM
                            │       └── no
                            │
                            └── App-level bottleneck
                                    ├── DB query
                                    ├── lock
                                    ├── external API
                                    ├── cache miss
                                    └── queue backlog
Common resource fixes
Cause | Short-term action | Long-term fix
Too many app workers | Reduce workers, restart app. | Right-size worker count.
Memory leak | Restart controlled service. | Fix code, add monitoring, MemoryMax.
Traffic spike | Rate limit, scale, cache. | Autoscaling, CDN, capacity plan.
Slow database query | Kill/limit bad query if safe. | Index, query optimization, DB scaling.
Backup job overload | Pause or throttle job. | Schedule and IO limits.
Resource rule: high load does not always mean high CPU. It can also mean tasks waiting on disk or blocked resources.
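A load average only means something relative to the number of CPUs. The sketch below normalizes it; the function name load_pressure and the 0.7 and 1.0 cut-offs are assumptions (a common rule of thumb, not an Ubuntu standard).

```shell
# Print the load-per-CPU ratio and a rough label for it.
# load_pressure is a hypothetical helper; thresholds are rule-of-thumb assumptions.
load_pressure() {
    load=$1
    cpus=$2
    awk -v l="$load" -v c="$cpus" 'BEGIN {
        ratio = l / c
        if (ratio >= 1)        state = "saturated"
        else if (ratio >= 0.7) state = "busy"
        else                   state = "ok"
        printf "%.2f %s\n", ratio, state
    }'
}

# Usage on Linux (1-minute load average and online CPU count):
# load_pressure "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A "saturated" result still does not say whether the cause is CPU or IO. The wa column of vmstat 1 or the %iowait figure from iostat separates the two.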
Boot, kernel, emergency mode and recovery troubleshooting

Boot issues usually come from filesystem errors, a broken /etc/fstab, failed mounts, bootloader problems, a bad kernel update, disk issues or cloud volume attachment problems. Recovery may require console access, rescue mode, booting a previous kernel or mounting the disk on another instance.

Symptom | Likely cause | Diagnostic | Recovery direction
Emergency mode | Broken fstab or failed mount. | Console logs, journalctl -xb. | Fix fstab or mount issue.
Boot hangs after update | Kernel/driver issue. | GRUB previous kernel. | Boot previous kernel, rollback.
No SSH after reboot | Network, firewall, ssh service, boot incomplete. | Cloud console / serial log. | Console recovery.
Filesystem check fails | Disk corruption or unclean shutdown. | fsck from recovery. | Repair with backup ready.
Wrong boot disk | Bootloader or cloud volume mapping. | UEFI/GRUB/cloud console. | Fix boot order or volume attachment.
Boot diagnostics
# Current boot logs
                            journalctl -b

                            # Previous boot logs
                            journalctl -b -1

                            # Boot errors
                            journalctl -b -p err

                            # Kernel logs
                            journalctl -k -b

                            # Filesystems
                            lsblk -f
                            findmnt
                            cat /etc/fstab

                            # Failed units
                            systemctl --failed

                            # Kernel version
                            uname -a
Boot failure decision tree
Server did not come back after reboot
                            │
                            ├── Cloud or physical console available?
                            │       ├── yes -> read boot output
                            │       └── no  -> use provider recovery tools
                            │
                            ├── Reaches GRUB?
                            │       ├── yes -> try previous kernel
                            │       └── no  -> bootloader/disk issue
                            │
                            ├── Emergency mode?
                            │       ├── yes -> check fstab and mounts
                            │       └── no
                            │
                            ├── Network failed?
                            │       ├── yes -> check netplan/cloud-init
                            │       └── no
                            │
                            ├── SSH failed?
                            │       ├── yes -> ssh service/firewall/keys
                            │       └── no
                            │
                            └── Application failed after boot
                                    └── systemd service playbook
fstab recovery checks
# Check fstab content
                            cat /etc/fstab

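                            # Check fstab syntax and mount targets without mounting anything
                            sudo findmnt --verify

                            # Mount everything listed in fstab without rebooting
                            sudo mount -a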

                            # Show current mounts
                            findmnt

                            # Validate UUIDs
                            blkid
                            lsblk -f
Cloud recovery pattern
Broken cloud VM
                            │
                            ├── Stop instance
                            ├── Detach root volume
                            ├── Attach volume to rescue instance
                            ├── Mount filesystem
                            ├── Fix fstab/config/keys
                            ├── Unmount cleanly
                            ├── Reattach as root volume
                            └── Boot and verify
Boot rule: any change to /etc/fstab, bootloader, kernel or network config should be tested before rebooting a remote server.
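For /etc/fstab specifically, one pre-reboot test is to confirm that every UUID it references still exists on the machine. This is a sketch under stated assumptions: the function name fstab_missing_uuids and the temporary file path are hypothetical, and only unquoted UUID=... entries in the first field are checked (LABEL=, PARTUUID= and device paths are ignored).

```shell
# Print every UUID referenced in an fstab file that is absent from a list of known UUIDs.
# $1: path to the fstab file, $2: file with one known UUID per line.
# fstab_missing_uuids is a hypothetical helper.
fstab_missing_uuids() {
    fstab=$1
    known=$2
    # Commented lines never match because their first field starts with "#"
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$fstab" |
    while read -r uuid; do
        grep -qxF "$uuid" "$known" || echo "$uuid"
    done
}

# Usage:
# sudo blkid -s UUID -o value > /tmp/known-uuids
# fstab_missing_uuids /etc/fstab /tmp/known-uuids
```

Empty output means every referenced UUID was found. Any UUID printed is a mount that is likely to fail at the next boot unless the entry uses the nofail option.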
Incident playbooks: common Ubuntu production failures
Playbook matrix
Incident | First command | Likely root causes | Safe correction
Website down | curl -I localhost | Nginx, app service, DB, firewall. | Fix failed layer, rollback deploy if needed.
502 Bad Gateway | systemctl status app | Upstream app down, socket path, port mismatch. | Fix app service or Nginx upstream.
SSH unavailable | Cloud console / provider console. | Firewall, SSH config, key, fail2ban, network. | Console recovery, avoid closing existing session.
Disk full | df -h | Logs, Docker, DB, backups. | Safe cleanup and retention fix.
High CPU | top | Traffic spike, hot process, backup, worker count. | Limit, scale, rollback, profile.
OOM kill | journalctl -k | grep -i oom | Memory leak, too many workers, low RAM. | Reduce memory pressure, add limits.
DNS failure | dig domain | Bad record, resolver, TTL, provider issue. | Fix DNS or resolver path.
Package update broke service | less /var/log/apt/history.log | Dependency change, config prompt, version mismatch. | Rollback package or restore previous image.
502 Nginx playbook
502 Bad Gateway
                            │
                            ├── Check Nginx config
                            │       └── sudo nginx -t
                            │
                            ├── Check Nginx logs
                            │       └── tail -100 /var/log/nginx/error.log
                            │
                            ├── Check upstream app service
                            │       └── systemctl status gunicorn
                            │
                            ├── Check upstream port/socket
                            │       └── ss -lntp
                            │
                            ├── Check app logs
                            │       └── journalctl -u gunicorn
                            │
                            └── Fix app or upstream config
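The "check upstream port" step can be scripted by filtering ss output. This is a sketch: the function name port_listening and port 8000 are assumptions, and it covers TCP listeners only. For an upstream on a Unix socket, test -S on the socket path is the equivalent check.

```shell
# Succeed (exit 0) if a TCP listener on the given port appears in `ss -lnt` output on stdin.
# port_listening is a hypothetical helper.
port_listening() {
    port=$1
    awk -v p="$port" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")      # $4 is "address:port"; the port is the last piece
            if (parts[n] == p) found = 1
        }
        END { exit !found }
    '
}

# Usage (8000 is an assumed upstream port; use the one from the Nginx upstream config):
# if ss -lnt | port_listening 8000; then echo "upstream is listening"; else echo "nothing listens on 8000"; fi
```

If Nginx proxies to a port that nothing listens on, the 502 is explained before any application log needs to be read.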
SSH lockout playbook
SSH unavailable
                            │
                            ├── Is server reachable?
                            │       └── ping / cloud status checks
                            │
                            ├── Is port open externally?
                            │       └── security group / firewall
                            │
                            ├── Console access possible?
                            │       └── provider console / serial console
                            │
                            ├── Check ssh service
                            │       └── systemctl status ssh
                            │
                            ├── Check firewall
                            │       └── ufw status verbose
                            │
                            ├── Check SSH config syntax
                            │       └── sshd -t
                            │
                            ├── Check keys and user
                            │       └── authorized_keys, permissions
                            │
                            └── Restore safe access before hardening again
Disk full playbook
Disk full
                            │
                            ├── df -h
                            ├── du -xhd1 /
                            ├── du -sh /var/log/*
                            ├── journalctl --disk-usage
                            ├── docker system df
                            ├── apt clean
                            ├── journalctl --vacuum-time=7d
                            ├── prune Docker carefully if applicable
                            ├── resize volume if needed
                            └── add monitoring and retention
Post-incident actions
After restoration:
                            [ ] Confirm user-visible service is healthy
                            [ ] Confirm logs are clean
                            [ ] Confirm monitoring is green
                            [ ] Record exact root cause
                            [ ] Record commands executed
                            [ ] Record rollback option used or not used
                            [ ] Add missing alert
                            [ ] Add missing dashboard panel
                            [ ] Add missing runbook step
                            [ ] Schedule permanent fix
Incident rule: the incident is not finished when the service is back. It is finished when the cause is understood and recurrence is reduced.
Ubuntu troubleshooting cheat sheet and final checklist
Command cheat sheet
# Host
                            hostnamectl
                            uptime
                            who
                            w
                            last reboot | head

                            # Services
                            systemctl status SERVICE
                            systemctl --failed
                            systemctl cat SERVICE
                            journalctl -u SERVICE --since "30 min ago"
                            journalctl -u SERVICE -f

                            # Logs
                            journalctl -xe
                            journalctl -p warning --since today
                            journalctl -k --since today
                            tail -100 /var/log/syslog
                            tail -100 /var/log/auth.log

                            # Network
                            ip a
                            ip r
                            ss -lntp
                            ss -s
                            resolvectl status
                            dig example.com
                            curl -I http://localhost
                            ufw status verbose

                            # Disk
                            df -h
                            df -ih
                            du -xhd1 /
                            lsblk -f
                            findmnt
                            journalctl --disk-usage

                            # Resources
                            free -h
                            top
                            ps aux --sort=-%cpu | head
                            ps aux --sort=-%mem | head
                            vmstat 1
                            iostat -xz 1
Final troubleshooting checklist
[ ] Symptom is precisely described
                            [ ] Incident start time is known
                            [ ] Scope is known
                            [ ] Host health is checked
                            [ ] Disk usage is checked
                            [ ] Memory and CPU are checked
                            [ ] Failed systemd units are checked
                            [ ] Relevant service logs are read
                            [ ] Kernel logs are checked if needed
                            [ ] Network path is verified
                            [ ] DNS is verified
                            [ ] Firewall is verified
                            [ ] Listening ports are verified
                            [ ] Recent deploys are checked
                            [ ] Recent apt updates are checked
                            [ ] Fix is minimal and reversible
                            [ ] Service health is verified after fix
                            [ ] Postmortem notes are written
                            [ ] Preventive action is created
Final rule
Ubuntu troubleshooting is evidence-driven.
Start with facts: logs, service status, ports, disk, memory, CPU, network and recent changes. Apply one controlled fix, verify the result, document the root cause, and add monitoring or a runbook so the same incident becomes easier next time.
Minimal incident report template
Incident report:
                            - title
                            - start time
                            - detection method
                            - impacted service
                            - user impact
                            - root cause
                            - immediate fix
                            - commands executed
                            - rollback used
                            - prevention action
                            - owner
                            - deadline
7.1 Ubuntu Cheat Sheet: essential commands, production checklists, cloud ops and incident shortcuts
Ubuntu operator quick map

This cheat sheet is a compact operational reference for Ubuntu servers: first checks, package operations, systemd services, journald logs, network, DNS, disk, memory, security, cloud patterns and production readiness.

Need | First command | What it answers
Host identity | hostnamectl | Hostname, OS, kernel, machine type.
System load | uptime | Load average and uptime.
Failed services | systemctl --failed | Broken units.
Service status | systemctl status SERVICE | Service state, PID, exit code, recent logs.
Service logs | journalctl -u SERVICE | Service timeline and errors.
Listening ports | ss -lntp | Open TCP ports and owning processes.
Disk usage | df -h | Filesystem capacity.
Memory usage | free -h | RAM, available memory and swap.
Firewall | sudo ufw status verbose | Host-level network exposure.
Recent package changes | less /var/log/apt/history.log | Updates, installs and removals.
Fast rule: in an incident, check disk, memory, failed units, ports and recent logs before changing configuration.
First 90 seconds on a server
echo "== HOST =="
                            hostnamectl

                            echo "== UPTIME =="
                            uptime

                            echo "== USERS =="
                            who

                            echo "== DISK =="
                            df -h

                            echo "== MEMORY =="
                            free -h

                            echo "== FAILED UNITS =="
                            systemctl --failed

                            echo "== PORTS =="
                            ss -lntp

                            echo "== WARNINGS =="
                            journalctl -p warning --since "30 min ago" --no-pager | tail -100
Triage decision tree
Problem reported
                            │
                            ├── Host unreachable?
                            │       └── cloud, network, boot, firewall
                            │
                            ├── Disk full?
                            │       └── df -h, du, journal size, Docker
                            │
                            ├── Service failed?
                            │       └── systemctl status, journalctl
                            │
                            ├── Port missing?
                            │       └── service bind, config, crash
                            │
                            ├── Network path broken?
                            │       └── IP, route, DNS, UFW, security group
                            │
                            └── App problem?
                                    └── app logs, DB, cache, deploy
Bad practice: changing several things at once. Use one change, one verification, one rollback path.
System identity, users, processes and host state
Host and OS
# Host and OS summary
                            hostnamectl

                            # Ubuntu release metadata
                            cat /etc/os-release
                            lsb_release -a

                            # Kernel
                            uname -a

                            # Architecture
                            dpkg --print-architecture

                            # Boot and uptime
                            uptime
                            last reboot | head

                            # Current users
                            who
                            w
Processes
# Interactive process view
                            top
                            htop

                            # Top CPU processes
                            ps aux --sort=-%cpu | head -30

                            # Top memory processes
                            ps aux --sort=-%mem | head -30

                            # Process tree
                            pstree -ap

                            # Find process
                            pgrep -af nginx

                            # Open files by process
                            sudo lsof -p PID
Users and groups
# Current identity
                            whoami
                            id

                            # User identity
                            id deploy
                            groups deploy

                            # Create user
                            sudo adduser deploy

                            # Add sudo rights
                            sudo usermod -aG sudo deploy

                            # Show sudo group
                            getent group sudo

                            # Show shell users
                            grep -E "/bin/bash|/bin/sh|/usr/bin/zsh" /etc/passwd

                            # Lock user password
                            sudo passwd -l username
Permissions
# File permissions
                            ls -lah /srv/app

                            # Path permissions
                            namei -l /srv/app/current/.env

                            # Change owner
                            sudo chown deploy:www-data file

                            # Recursive owner change
                            sudo chown -R deploy:www-data /srv/app

                            # File mode
                            chmod 644 file

                            # Directory mode
                            chmod 755 directory

                            # Secret file mode
                            chmod 600 secret.key
Permission rule: never solve production permissions with chmod 777. Fix owner, group and minimal access.
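To find where a chmod 777 style fix has already been applied, an application tree can be scanned for world-writable entries. A minimal sketch; the function name world_writable is hypothetical and /srv/app is the example path used above.

```shell
# List files and directories under a path that are writable by "other".
# world_writable is a hypothetical helper. An application tree should normally return nothing.
world_writable() {
    # -perm -0002 matches entries whose "other" write bit is set; -xdev stays on one filesystem
    find "$1" -xdev \( -type f -o -type d \) -perm -0002
}

# Usage:
# world_writable /srv/app
```

Each hit is better fixed with chown and a narrower mode (644, 755 or 600 as listed above) than left in place.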
Packages, APT, repositories and updates
APT essentials
# Refresh package metadata
                            sudo apt update

                            # Show upgradeable packages
                            apt list --upgradable

                            # Upgrade packages
                            sudo apt upgrade

                            # Full dependency-aware upgrade
                            sudo apt full-upgrade

                            # Install package
                            sudo apt install PACKAGE

                            # Remove package, keep config
                            sudo apt remove PACKAGE

                            # Remove package and config
                            sudo apt purge PACKAGE

                            # Remove unused dependencies
                            sudo apt autoremove

                            # Clean package cache
                            sudo apt clean
Package inspection
# Search package
                            apt search nginx

                            # Package details
                            apt show nginx

                            # Installed and candidate version
                            apt policy nginx

                            # Installed packages
                            dpkg -l | grep nginx

                            # Files installed by package
                            dpkg -L nginx

                            # Package owning a file
                            dpkg -S /usr/sbin/nginx

                            # Available versions
                            apt-cache madison nginx
Repositories and history
# Source files
                            cat /etc/apt/sources.list
                            ls -lah /etc/apt/sources.list.d/

                            # Search repo lines (one-line "deb" format and deb822 ".sources" format)
                            grep -RE "^(deb|Types:|URIs:|Suites:)" /etc/apt/sources.list /etc/apt/sources.list.d/

                            # APT history
                            less /var/log/apt/history.log

                            # APT terminal logs
                            less /var/log/apt/term.log

                            # Held packages
                            apt-mark showhold

                            # Hold package
                            sudo apt-mark hold PACKAGE

                            # Unhold package
                            sudo apt-mark unhold PACKAGE
Package repair
# Finish interrupted dpkg operation
                            sudo dpkg --configure -a

                            # Fix broken dependencies
                            sudo apt -f install

                            # Check whether apt or dpkg is still running (do not delete lock files)
                            pgrep -af 'apt|dpkg'

                            # Re-run metadata refresh
                            sudo apt update
Update safety
# Reboot required?
                            test -f /var/run/reboot-required && cat /var/run/reboot-required

                            # Packages requiring reboot if present
                            cat /var/run/reboot-required.pkgs 2>/dev/null

                            # Security automation
                            sudo apt install unattended-upgrades
                            sudo dpkg-reconfigure unattended-upgrades
Production rule: review external repositories and package changes before major upgrades. Unknown PPAs create upgrade and supply-chain risk.
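Reviewing package changes is easier with a condensed view of the APT history. The sketch below assumes the usual history.log layout, where each transaction has a "Start-Date:" line followed by a "Commandline:" line; the function name apt_history_summary is hypothetical.

```shell
# Print one line per APT transaction: "date time | command line".
# Reads the files given as arguments, or stdin when none are given.
# apt_history_summary is a hypothetical helper.
apt_history_summary() {
    awk '
        /^Start-Date:/  { when = $2 " " $3 }
        /^Commandline:/ { sub(/^Commandline: /, ""); print when " | " $0 }
    ' "$@"
}

# Usage:
# apt_history_summary /var/log/apt/history.log | tail -20
# zcat /var/log/apt/history.log.1.gz | apt_history_summary    # rotated log
```

Matching these timestamps against the incident start time is a quick way to confirm or rule out a package update as the trigger.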
systemd services, unit files and runtime control
Service commands
# Service status
                            systemctl status SERVICE

                            # Start / stop / restart
                            sudo systemctl start SERVICE
                            sudo systemctl stop SERVICE
                            sudo systemctl restart SERVICE

                            # Reload config if supported
                            sudo systemctl reload SERVICE

                            # Enable / disable at boot
                            sudo systemctl enable SERVICE
                            sudo systemctl disable SERVICE

                            # Is active / enabled?
                            systemctl is-active SERVICE
                            systemctl is-enabled SERVICE

                            # Failed units
                            systemctl --failed

                            # Reset failed state
                            sudo systemctl reset-failed SERVICE
Unit inspection
# Show unit file
                            systemctl cat SERVICE

                            # Show runtime properties
                            systemctl show SERVICE | less

                            # Show service logs
                            journalctl -u SERVICE --since "1 hour ago"

                            # Follow logs
                            journalctl -u SERVICE -f

                            # Reload unit files after edit
                            sudo systemctl daemon-reload
Service failure flow
Service broken
                            │
                            ├── systemctl status SERVICE
                            ├── journalctl -u SERVICE --since "30 min ago"
                            ├── systemctl cat SERVICE
                            ├── validate config
                            ├── check dependencies
                            ├── check permissions
                            ├── check ports
                            ├── restart only after cause is understood
                            └── verify logs and health check
Common config validators
# Nginx
                            sudo nginx -t

                            # SSH
                            sudo sshd -t

                            # Apache
                            sudo apachectl configtest

                            # PostgreSQL (connectivity check, not a config validator)
                            sudo -u postgres psql -c "select version();"

                            # Redis (connectivity check)
                            redis-cli ping

                            # Local HTTP health
                            curl -I http://localhost
Robust unit pattern
[Unit]
                            Description=My app
                            After=network.target

                            [Service]
                            User=myapp
                            Group=myapp
                            WorkingDirectory=/srv/myapp
                            EnvironmentFile=/srv/myapp/.env
                            ExecStart=/srv/myapp/venv/bin/gunicorn config.wsgi:application
                            Restart=on-failure
                            RestartSec=5
                            LimitNOFILE=65535

                            [Install]
                            WantedBy=multi-user.target
Service rule: every production daemon should be systemd-managed, non-root, restartable, logged and documented.
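Two parts of that rule, non-root and restartable, can be checked mechanically. This sketch greps a unit file for User= and Restart=; the function name unit_lint and the temporary file path are hypothetical. It does not understand drop-in overrides beyond what systemctl cat prints, and it does not cover DynamicUser= or other ways of dropping privileges.

```shell
# Warn when a unit file has no non-root User= or no restart policy.
# Returns 0 when both checks pass, 1 otherwise. unit_lint is a hypothetical helper.
unit_lint() {
    unit=$1
    rc=0
    if ! grep -Eq '^User=' "$unit" || grep -Eq '^User=root[[:space:]]*$' "$unit"; then
        echo "warn: service runs as root (User= missing or set to root)"
        rc=1
    fi
    if ! grep -Eq '^Restart=' "$unit" || grep -Eq '^Restart=no[[:space:]]*$' "$unit"; then
        echo "warn: no restart policy (Restart= missing or set to no)"
        rc=1
    fi
    if [ "$rc" -eq 0 ]; then
        echo "ok"
    fi
    return "$rc"
}

# Usage (SERVICE is a placeholder, as elsewhere in this cheat sheet):
# systemctl cat SERVICE > /tmp/unit.txt && unit_lint /tmp/unit.txt
```

The remaining parts of the rule, logged and documented, still need a human check.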
Logs, journald, auth logs, kernel logs and audit trail
journald essentials
# Recent diagnostic context
                            journalctl -xe

                            # Current boot
                            journalctl -b

                            # Previous boot
                            journalctl -b -1

                            # Service logs
                            journalctl -u SERVICE

                            # Service logs since today
                            journalctl -u SERVICE --since today

                            # Service logs last 30 minutes
                            journalctl -u SERVICE --since "30 min ago"

                            # Follow service logs
                            journalctl -u SERVICE -f

                            # Warnings and errors
                            journalctl -p warning --since today

                            # Kernel logs
                            journalctl -k --since today
Classic log files
# System log
                            sudo tail -200 /var/log/syslog

                            # Authentication log
                            sudo tail -200 /var/log/auth.log

                            # Nginx
                            sudo tail -200 /var/log/nginx/error.log
                            sudo tail -200 /var/log/nginx/access.log

                            # APT
                            less /var/log/apt/history.log
                            less /var/log/apt/term.log

                            # Kernel ring buffer (restricted to root by default on recent Ubuntu)
                            sudo dmesg -T | tail -100
Search patterns
# Generic errors
                            grep -i "error" app.log
                            grep -i "failed" app.log
                            grep -i "permission denied" app.log
                            grep -i "connection refused" app.log
                            grep -i "no space left" /var/log/syslog

                            # SSH failures
                            sudo grep -i "failed password" /var/log/auth.log | tail -100

                            # Sudo usage
                            sudo grep -i "sudo" /var/log/auth.log | tail -100

                            # OOM events
                            journalctl -k --since today | grep -i -E "oom|killed process"

                            # Compressed rotated logs
                            zgrep -i "error" /var/log/syslog.*.gz
Journal size control
# Show journal size
                            journalctl --disk-usage

                            # Vacuum by time
                            sudo journalctl --vacuum-time=14d

                            # Vacuum by size
                            sudo journalctl --vacuum-size=1G
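The vacuum commands are one-off. To make the same limits persistent, journald reads `SystemMaxUse=` and `MaxRetentionSec=` from its configuration. A minimal sketch that prints a drop-in; the helper name and the file name `retention.conf` are assumptions:

```shell
# journald_retention_conf [MAX_SIZE] [MAX_AGE]
# Prints a journald drop-in; defaults mirror the 1G / 14 day vacuum values.
journald_retention_conf() {
    max_use=${1:-1G}
    max_age=${2:-14day}
    printf '[Journal]\nSystemMaxUse=%s\nMaxRetentionSec=%s\n' "$max_use" "$max_age"
}

# Example usage on a real host (not executed here):
#   sudo mkdir -p /etc/systemd/journald.conf.d
#   journald_retention_conf | sudo tee /etc/systemd/journald.conf.d/retention.conf
#   sudo systemctl restart systemd-journald
```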
Log investigation flow
Find root cause
│
├── define incident time window
├── read service journal
├── read system warnings
├── read kernel logs
├── read app logs
├── check apt/deploy history
└── identify first meaningful error
Log rule: the first meaningful error is more valuable than the last visible symptom.
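One way to apply the log rule is to print only the earliest matching line of a time window instead of scrolling from the end. A rough sketch, assuming the keyword list is a reasonable starting point for your application; `first_meaningful_error` is a hypothetical name:

```shell
# first_meaningful_error   (reads log text on stdin)
# Prints "LINE_NUMBER:text" for the first line that matches a failure keyword.
first_meaningful_error() {
    grep -n -i -E -m 1 'error|failed|fatal|denied|refused|no space left|out of memory'
}

# Example usage on a real host (not executed here):
#   journalctl -u SERVICE --since "30 min ago" --no-pager | first_meaningful_error
```

The exit status is non-zero when nothing matches, which usually means the window or the keyword list needs widening.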
Network, DNS, firewall and HTTP checks
Network commands
# Interfaces
                            ip a

                            # Routes
                            ip r

                            # Interface counters
                            ip -s link

                            # Listening TCP ports (sudo shows processes owned by other users)
                            sudo ss -lntp

                            # All TCP sockets, listening and connected
                            sudo ss -antp

                            # Socket summary
                            ss -s

                            # DNS status
                            resolvectl status

                            # DNS query
                            dig example.com
                            dig A example.com
                            dig AAAA example.com

                            # Reachability
                            ping -c 3 1.1.1.1
                            tracepath example.com
                            mtr -rw example.com
HTTP and TLS checks
# Local HTTP
                            curl -I http://localhost

                            # Public HTTP
                            curl -I https://example.com

                            # Follow redirects
                            curl -IL https://example.com

                            # Verbose TLS/HTTP
                            curl -vI https://example.com

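                            # Check certificate subject, issuer and validity dates with openssl
                            openssl s_client -connect example.com:443 -servername example.com </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates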
Firewall commands
# Status
                            sudo ufw status verbose
                            sudo ufw status numbered

                            # Baseline
                            sudo ufw default deny incoming
                            sudo ufw default allow outgoing

                            # Allow SSH
                            sudo ufw allow OpenSSH

                            # Allow web
                            sudo ufw allow 80/tcp
                            sudo ufw allow 443/tcp

                            # Restrict SSH by source (use instead of the open OpenSSH rule, not in addition to it)
                            sudo ufw allow from 203.0.113.10 to any port 22 proto tcp

                            # Delete numbered rule
                            sudo ufw delete RULE_NUMBER

                            # Enable firewall
                            sudo ufw enable
Network diagnostic flow
Service unreachable
│
├── service listening locally?
│       └── ss -lntp
│
├── local curl works?
│       └── curl -I localhost
│
├── firewall open?
│       └── ufw status
│
├── DNS points correctly?
│       └── dig domain
│
├── remote curl works?
│       └── curl -I domain
│
└── app returns error?
        └── app logs / upstream logs
Cloud note: for AWS or another cloud, check both host firewall and cloud security group. Either one can block traffic.
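A frequent cause of "works locally, unreachable remotely" is a service bound to loopback only, which neither UFW nor a security group can fix. The sketch below classifies a port from `ss -lnt` output; `listen_scope` and its three result strings are hypothetical names.

```shell
# listen_scope PORT   (reads `ss -lnt` output on stdin)
# Prints: non-loopback | loopback-only | not-listening
listen_scope() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            addr = $4
            p = addr
            sub(/.*:/, "", p)                 # text after the last colon is the port
            if (p != port) next
            host = substr(addr, 1, length(addr) - length(p) - 1)
            if (host ~ /^127\./ || host == "[::1]") loopback = 1
            else other = 1
        }
        END {
            if (other) print "non-loopback"
            else if (loopback) print "loopback-only"
            else print "not-listening"
        }'
}

# Example usage on a real host (not executed here):
#   ss -lnt | listen_scope 5432
```

A `loopback-only` result for a port that should be public points at the application's bind address rather than at the firewall.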
Disk, filesystem, memory, CPU and IO
Disk and filesystem
# Filesystem usage
                            df -h

                            # Inode usage
                            df -ih

                            # Block devices
                            lsblk -f

                            # Mounts
                            findmnt

                            # fstab
                            cat /etc/fstab

                            # Top-level usage
                            sudo du -xhd1 / 2>/dev/null | sort -h

                            # Common growth paths
                            sudo du -sh /var/log/*
                            sudo du -sh /var/lib/docker/* 2>/dev/null
                            sudo du -sh /var/lib/postgresql/* 2>/dev/null
                            sudo du -sh /tmp/* 2>/dev/null
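The same `df` data can feed a simple threshold check, for example from cron. A minimal sketch, assuming POSIX-format output (`df -P`) and mount points without spaces; `disks_over` is a hypothetical helper:

```shell
# disks_over PERCENT   (reads `df -P` output on stdin)
# Prints "MOUNTPOINT USE%" for every filesystem at or above PERCENT.
disks_over() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)                    # "91%" -> "91"
        if (use + 0 >= limit + 0) print $6 " " use "%"
    }'
}

# Example usage on a real host (not executed here):
#   df -P | disks_over 85
```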
Safe cleanup
# APT cache
                            sudo apt clean
                            sudo apt autoremove

                            # Journal
                            journalctl --disk-usage
                            sudo journalctl --vacuum-time=14d
                            sudo journalctl --vacuum-size=1G

                            # Docker usage
                            docker system df

                            # Docker cleanup, use carefully
                            docker image prune
                            docker container prune
CPU, memory and IO
# CPU/load
                            uptime
                            top
                            htop
                            ps aux --sort=-%cpu | head -30

                            # Memory
                            free -h
                            ps aux --sort=-%mem | head -30

                            # Swap
                            swapon --show
                            vmstat 1

                            # OOM
                            journalctl -k --since today | grep -i -E "oom|killed process"

                            # IO, requires sysstat
                            iostat -xz 1

                            # Per-process IO
                            sudo iotop -o
Disk full playbook
Disk full
│
├── df -h
├── du -xhd1 /
├── journalctl --disk-usage
├── du -sh /var/log/*
├── docker system df
├── apt clean
├── journalctl --vacuum-time=14d
├── resize volume if needed
└── add alert and retention policy
Resource interpretation
Signal | Likely issue
High load + high CPU | CPU-bound workload or traffic spike.
High load + high IO wait | Disk or database bottleneck.
Low available RAM + swap activity | Memory pressure.
OOM kill logs | Process killed by kernel due to memory exhaustion.
Filesystem 100% | Services may fail unpredictably.
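"High load" only means something relative to the number of CPUs. A small sketch that normalises the 1-minute load average; `load_per_cpu` is a hypothetical helper, and reading a sustained value above 1.0 as saturation is a rule of thumb, not a hard limit. On Linux the load average also counts tasks waiting on IO, so it still needs to be read together with CPU and IO-wait figures.

```shell
# load_per_cpu LOAD CPUS
# Prints the load average divided by the CPU count, with two decimals.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Example usage on a real host (not executed here):
#   load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```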
Data warning: never delete unknown files in database directories or Docker volumes without understanding ownership and backup state.
Security hardening and quick audit
SSH hardening
# Backup config
                            sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak.$(date +%Y%m%d-%H%M%S)

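                            # Recommended directives: edit /etc/ssh/sshd_config (these are config lines, not commands).
                            # Files in /etc/ssh/sshd_config.d/ are included first and may take precedence.
                            PermitRootLogin no
                            PasswordAuthentication no
                            PubkeyAuthentication yes
                            X11Forwarding no
                            MaxAuthTries 3
                            # Replace "deploy" with your admin account; keep a session open while testing
                            AllowUsers deploy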

                            # Validate and restart
                            sudo sshd -t
                            sudo systemctl restart ssh

                            # Logs
                            journalctl -u ssh --since today
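After a restart it is worth confirming which values sshd actually applies, since an earlier include can win over an edit. The sketch below checks three of the directives against the effective configuration printed by `sshd -T`; `audit_sshd` is a hypothetical helper, and the expected values are simply the ones recommended in this section.

```shell
# audit_sshd   (reads `sshd -T` output, or an sshd_config file, on stdin)
# Prints one line per mismatch; exit status 0 means all three values match.
audit_sshd() {
    awk '
        BEGIN {
            want["permitrootlogin"] = "no"
            want["passwordauthentication"] = "no"
            want["pubkeyauthentication"] = "yes"
        }
        /^[[:space:]]*#/ { next }                       # skip comment lines
        {
            key = tolower($1)
            # sshd keeps the first value it reads for a keyword
            if ((key in want) && !(key in got)) got[key] = tolower($2)
        }
        END {
            bad = 0
            for (key in want) {
                found = (key in got) ? got[key] : "unset"
                if (found != want[key]) {
                    printf "%s: expected %s, found %s\n", key, want[key], found
                    bad = 1
                }
            }
            exit bad
        }'
}

# Example usage on a real host (not executed here):
#   sudo sshd -T | audit_sshd
```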
SSH key permissions
chmod 700 ~/.ssh
                            chmod 600 ~/.ssh/id_ed25519
                            chmod 644 ~/.ssh/id_ed25519.pub
                            chmod 600 ~/.ssh/authorized_keys
fail2ban
sudo apt install fail2ban
                            sudo systemctl enable --now fail2ban
                            sudo fail2ban-client status
                            sudo fail2ban-client status sshd
                            sudo journalctl -u fail2ban --since today
Security snapshot
echo "== UFW =="
                            sudo ufw status verbose

                            echo "== OPEN PORTS =="
                            ss -lntp

                            echo "== SUDO USERS =="
                            getent group sudo

                            echo "== SHELL USERS =="
                            grep -E '(/bin/bash|/bin/sh|/usr/bin/zsh)$' /etc/passwd

                            echo "== SSH LOGS =="
                            journalctl -u ssh --since "24 hours ago" --no-pager | tail -100

                            echo "== AUTH LOG =="
                            sudo tail -100 /var/log/auth.log
Security checklist
[ ] Ubuntu LTS
                            [ ] Packages updated
                            [ ] Reboot-required checked
                            [ ] Named admin user
                            [ ] Root SSH login disabled
                            [ ] SSH key login validated
                            [ ] Password SSH disabled
                            [ ] UFW enabled
                            [ ] Only required ports open
                            [ ] Database ports private
                            [ ] Redis ports private
                            [ ] fail2ban enabled if SSH public
                            [ ] Service users are non-root
                            [ ] Secrets are not world-readable
                            [ ] Backups exist
                            [ ] Restore tested
Security rule: hardening is useful only if access remains recoverable and changes are documented.
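The "secrets are not world-readable" item can be checked with `find`. A minimal sketch; `world_readable` is a hypothetical helper, and the directory in the usage comment is borrowed from the unit example as an assumption:

```shell
# world_readable DIR
# Lists regular files under DIR that have the "other" read bit set.
world_readable() {
    find "$1" -type f -perm -004 -print
}

# Example usage on a real host (not executed here):
#   world_readable /srv/myapp
#   chmod 600 /srv/myapp/.env     # typical fix for a flagged env file
```

An empty result is the goal for directories that hold `.env` files, keys or credentials.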
Cloud and AWS Ubuntu quick reference
AWS Ubuntu baseline
Production EC2 Ubuntu baseline:
                            - official Ubuntu LTS AMI
                            - Canonical owner verified
                            - minimal security group
                            - SSH restricted by source or bastion
                            - IAM role instead of static keys
                            - cloud-init tested
                            - packages updated
                            - UFW aligned with security group
                            - monitoring installed
                            - snapshots scheduled
                            - restore tested
                            - tags complete
                            - instance replaceable
Canonical AMI owner
Canonical AWS owner ID:
                            099720109477

                            Use it to filter official Ubuntu AMIs.
AWS CLI AMI search
aws ec2 describe-images \
                            --owners 099720109477 \
                            --filters \
                            "Name=name,Values=ubuntu/images/hvm-ssd*/ubuntu-*-24.04-amd64-server-*" \
                            "Name=state,Values=available" \
                            "Name=architecture,Values=x86_64" \
                            --query 'Images | sort_by(@, &CreationDate)[-5:].{Name:Name,ImageId:ImageId,CreationDate:CreationDate}' \
                            --output table
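For scripting, a single image ID is usually more useful than a table. The sketch below picks the newest entry from tab-separated `CreationDate` / `ImageId` lines, relying on ISO 8601 timestamps sorting correctly as plain text; `latest_ami` is a hypothetical helper.

```shell
# latest_ami   (reads "CreationDate<TAB>ImageId" lines on stdin)
# Prints the ImageId with the most recent CreationDate.
latest_ami() {
    LC_ALL=C sort -k1,1 | tail -n 1 | cut -f2
}

# Example usage on a real host (not executed here):
#   aws ec2 describe-images --owners 099720109477 \
#     --filters "Name=name,Values=ubuntu/images/hvm-ssd*/ubuntu-*-24.04-amd64-server-*" \
#               "Name=state,Values=available" "Name=architecture,Values=x86_64" \
#     --query 'Images[].[CreationDate,ImageId]' --output text | latest_ami
```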
cloud-init quick pattern
#cloud-config
package_update: true
package_upgrade: true

timezone: UTC

packages:
  - curl
  - wget
  - git
  - htop
  - ufw
  - fail2ban
  - nginx

runcmd:
  - ufw allow OpenSSH
  - ufw allow 80/tcp
  - ufw allow 443/tcp
  - ufw --force enable
  - systemctl enable --now nginx
  - systemctl enable --now fail2ban
Cloud-init diagnostics
cloud-init status
                            cloud-init status --wait
                            sudo tail -200 /var/log/cloud-init.log
                            sudo tail -200 /var/log/cloud-init-output.log
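In provisioning scripts the status line can gate the next step. A small sketch that extracts the state from `cloud-init status` output; both helper names are hypothetical, and only the plain `status: ...` line is parsed.

```shell
# cloud_init_state   (reads `cloud-init status` output on stdin)
# Prints the state word, for example: done, running, error.
cloud_init_state() {
    sed -n 's/^status:[[:space:]]*//p' | head -n 1
}

# cloud_init_ok   (same input) succeeds only when the state is "done".
cloud_init_ok() {
    [ "$(cloud_init_state)" = "done" ]
}

# Example usage on a real host (not executed here):
#   cloud-init status --wait | cloud_init_ok || sudo tail -50 /var/log/cloud-init-output.log
```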
Official links
Resource | URL
Ubuntu downloads | https://ubuntu.com/download
Ubuntu documentation | https://documentation.ubuntu.com/
Ubuntu Server docs | https://documentation.ubuntu.com/server/
Ubuntu release cycle | https://ubuntu.com/about/release-cycle
Ubuntu releases | https://releases.ubuntu.com/
Ubuntu on AWS | https://documentation.ubuntu.com/aws/
AWS AMI concepts | https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html
Cloud rule: do not put long-lived secrets in user data or baked AMIs. Use IAM, Parameter Store or a proper secret manager.
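A quick scan of a user-data file before launch can catch the most obvious leaks. The patterns below are a heuristic sketch, not a complete secret scanner: they look for the `AKIA` prefix of AWS access key IDs, PEM private key headers, and assignments to names such as password or token. `scan_user_data` is a hypothetical helper.

```shell
# scan_user_data FILE
# Prints "LINE_NUMBER:text" for suspicious lines; exit status 0 means something was found.
scan_user_data() {
    grep -n -i -E 'AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----|(password|passwd|secret|token)[[:space:]]*[:=]' "$1"
}

# Example usage (the file name is an assumption):
#   scan_user_data user-data.yaml && echo "review before launch"
```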
Production server checklist and mini demo
Production readiness checklist
[System]
                            [ ] Ubuntu LTS selected
                            [ ] Hostname correct
                            [ ] Timezone configured
                            [ ] Packages updated
                            [ ] Reboot-required checked
                            [ ] Server role documented

                            [Security]
                            [ ] SSH keys only
                            [ ] Root SSH login disabled
                            [ ] UFW enabled
                            [ ] Only required ports open
                            [ ] Users and sudo controlled
                            [ ] Service users non-root
                            [ ] Secrets protected

                            [Operations]
                            [ ] systemd services enabled
                            [ ] Logs visible with journalctl
                            [ ] Monitoring installed
                            [ ] Alerts configured
                            [ ] Backups scheduled
                            [ ] Restore tested
                            [ ] Patch policy defined
                            [ ] Runbook written

                            [Cloud]
                            [ ] Official LTS image
                            [ ] Security groups minimal
                            [ ] IAM role least privilege
                            [ ] Snapshots configured
                            [ ] Tags complete
                            [ ] Replacement strategy documented
Mini demo for portfolio
Demo: production-minded Ubuntu EC2

Architecture:
Internet
   │
   ▼
Security Group
   │
   ├── 22/tcp from admin IP only
   ├── 80/tcp public
   └── 443/tcp public
   │
   ▼
Ubuntu LTS EC2
   ├── cloud-init installs nginx
   ├── UFW enabled
   ├── fail2ban enabled
   ├── logs checked
   ├── metrics installed
   └── backup snapshot configured
Mini demo validation commands
hostnamectl
                            cloud-init status
                            systemctl status nginx
                            sudo ufw status verbose
                            ss -lntp
                            curl -I http://localhost
                            journalctl -u nginx --since today
                            df -h
                            free -h
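For a demo it helps to turn the validation commands into pass/fail lines. A minimal sketch of a check runner, assuming each check is written as `label|command`; `run_checks` is a hypothetical helper, and the checks in the usage comment are examples to adapt.

```shell
# run_checks   (reads "label|command" lines on stdin)
# Prints PASS or FAIL per check and returns non-zero if any check failed.
run_checks() {
    failed=0
    while IFS='|' read -r label cmd; do
        [ -n "$label" ] || continue
        # stdin is redirected so the command cannot consume the remaining checks
        if sh -c "$cmd" >/dev/null 2>&1 </dev/null; then
            echo "PASS $label"
        else
            echo "FAIL $label"
            failed=1
        fi
    done
    return "$failed"
}

# Example usage on a real host (not executed here):
#   run_checks <<'EOF'
#   nginx active|systemctl is-active --quiet nginx
#   local http|curl -fsS -o /dev/null http://localhost
#   cloud-init done|cloud-init status | grep -q 'status: done'
#   EOF
```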
Final rule: a professional Ubuntu server is not just installed. It is secured, monitored, patched, backed up, documented and recoverable.