
Ubuntu Linux: Complete Guide (Desktop / Server / Cloud / AWS)
Ubuntu is a "production-friendly" Linux distribution: stability, security, support, cloud, ecosystem. (IDEO-Lab category: O/S & Platforms)
Definition
Ubuntu is a Linux distribution maintained by Canonical. It is built on the Linux kernel and provides a complete operating system: package management, system services, security updates, networking, storage, user management, desktop environment, server tools and cloud images.
In professional environments, Ubuntu is popular because it is predictable, widely documented, cloud-friendly, developer-friendly and available in long-term support releases. It is commonly used for web servers, APIs, containers, DevOps tooling, CI/CD runners, databases, monitoring, AI workloads and desktop development.
Where Ubuntu sits in the technology landscape
| Layer | Ubuntu role | Examples |
|---|---|---|
| Hardware / VM | Runs on physical machines or virtual machines. | Server, laptop, AWS EC2, Azure VM, KVM. |
| Kernel | Uses Linux kernel for process, memory, network and filesystem control. | scheduler, TCP/IP, ext4, drivers. |
| User space | Provides tools, libraries, shells and services. | bash, systemd, apt, ssh, journald. |
| Applications | Hosts business and infrastructure services. | Nginx, PostgreSQL, Redis, Docker, Django. |
| Operations | Provides operational surface for admins and DevOps. | logs, units, firewall, packages, users. |
Mental classification
Ubuntu is not:
- a programming language
- a framework
- a database
- a cloud provider
- a container engine
Ubuntu is:
- an operating system
- a Linux distribution
- a server platform
- a desktop platform
- a cloud image baseline
- a container host
- a DevOps execution environment
Why Ubuntu became a professional standard
Ubuntu became a common professional choice because it offers a practical balance: easier than many traditional server distributions for newcomers, stable enough for production when using LTS, and supported by a huge ecosystem of packages, tutorials, cloud images and vendor documentation.
| Reason | Professional impact | Concrete example |
|---|---|---|
| LTS releases | Stable baseline for servers and production workloads. | Choose one version and patch it for years. |
| Large package ecosystem | Fast installation of standard infrastructure tools. | apt install nginx postgresql redis |
| Cloud images | Quick deployment on public cloud providers. | EC2, Azure, GCP, OpenStack. |
| Documentation | Faster troubleshooting and onboarding. | Server docs, community docs, vendor guides. |
| Developer tooling | Good fit for Python, Node.js, Go, Java, Docker and CI/CD. | Local dev and production parity. |
| Enterprise support | Commercial support path exists if needed. | Canonical support, Ubuntu Pro, security services. |
Professional value map
Ubuntu knowledge helps in:
Backend engineering
├── deploy APIs
├── manage services
├── inspect logs
└── debug network and permissions
DevOps
├── automate installs
├── configure systemd
├── harden SSH
├── manage packages
└── operate containers
SRE / Production
├── monitor CPU/RAM/disk/network
├── investigate incidents
├── patch security updates
├── tune services
└── write runbooks
Cloud engineering
├── boot cloud images
├── use cloud-init
├── configure storage
├── set firewall rules
└── deploy workloads
Recruiter-friendly summary
Ubuntu operating system architecture
Applications / Services
├── nginx
├── postgres
├── redis
├── django
├── docker
└── monitoring agents
│
▼
User space
├── bash / shell
├── GNU tools
├── systemd
├── journald
├── apt / dpkg
├── ssh
└── libraries
│
▼
Linux kernel
├── process scheduler
├── memory management
├── filesystem layer
├── network stack
├── security modules
└── drivers
│
▼
Hardware / Hypervisor
├── CPU
├── RAM
├── disk
├── network card
├── KVM / VMware
└── cloud hypervisor
What each layer means in operations
| Layer | Typical admin action | Diagnostic command |
|---|---|---|
| Application | Restart service, inspect config, read logs. | systemctl status nginx |
| User space | Install package, manage users, run scripts. | apt list --installed |
| Systemd | Enable boot services and dependencies. | journalctl -u service |
| Kernel | Check memory, processes, sockets, I/O. | dmesg, ss, top |
| Storage | Mount disks, inspect usage, tune I/O. | df -h, lsblk |
| Network | Check IP, routes, DNS, firewall. | ip a, ip r, resolvectl |
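
The per-layer diagnostics in the table above can be collected into one triage pass. The sketch below is a minimal illustration, not an official Ubuntu tool: the function names `run_if_present` and `triage` are hypothetical, and the selection of commands is an assumption drawn from the table. Each command runs only if its binary is installed, so the same script behaves sensibly on a minimal container image and on a full server.

```shell
#!/usr/bin/env bash
# Run a diagnostic command only if its binary exists; otherwise note the skip.
# Output is capped at 20 lines per command to keep the report readable.
run_if_present() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "== $* =="
    "$@" 2>&1 | head -n 20
  else
    echo "== skip: $1 not installed =="
  fi
}

# One representative check per layer (hypothetical selection).
triage() {
  run_if_present systemctl --no-pager --failed   # services
  run_if_present df -h                           # storage usage
  run_if_present lsblk                           # block devices
  run_if_present ip -brief addr                  # network addresses
  run_if_present ss -lnt                         # listening TCP sockets
  run_if_present free -h                         # memory
}
```

Calling `triage` on a host prints one short section per layer; redirecting it to a file gives a snapshot to attach to an incident ticket.
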
Ubuntu Desktop, Server and Cloud
| Edition / usage | Main purpose | Typical user | Key components |
|---|---|---|---|
| Ubuntu Desktop | Workstation, development, daily OS. | Developer, engineer, analyst. | GNOME, terminal, browser, IDEs, Docker. |
| Ubuntu Server | Production services and infrastructure. | DevOps, SRE, backend engineer. | SSH, systemd, apt, netplan, firewall. |
| Ubuntu Cloud Image | Cloud VM baseline. | Cloud engineer, platform team. | cloud-init, optimized kernel, cloud agent. |
| Ubuntu Container Base | Base image for containers. | DevOps, application engineer. | minimal packages, apt, runtime libraries. |
| Ubuntu Core | IoT and embedded-oriented variant. | IoT platform team. | snap-based, transactional updates. |
Use-case examples
Desktop:
- Python development
- Docker-based local stack
- SSH into production servers
- Kubernetes and cloud CLI tools
Server:
- Nginx reverse proxy
- Django or Node.js API
- PostgreSQL or Redis host
- monitoring server
- VPN or bastion host
Cloud:
- EC2 instance
- Azure VM
- GCP Compute Engine
- OpenStack instance
- Kubernetes node
Edition decision tree
Need a local workstation?
└── Ubuntu Desktop
Need a production VM?
└── Ubuntu Server LTS
Need a cloud instance?
└── Ubuntu cloud image
Need a container base?
└── Ubuntu minimal/base image
Need IoT appliance-like OS?
└── Ubuntu Core
Need enterprise security extensions?
└── Ubuntu LTS + Ubuntu Pro
Practical distinction
LTS model, releases and upgrade strategy
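Ubuntu is frequently chosen in production because of the LTS (long-term support) model. An LTS release ships every two years, in April, and receives five years of standard security maintenance, which makes it a stable base for servers, cloud images and enterprise deployments. Interim (non-LTS) releases ship every six months and are supported for nine months; they are useful for newer software, but less common as a conservative production baseline.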
| Release type | Best for | Production recommendation |
|---|---|---|
| LTS | Servers, cloud, enterprise, long-lived systems. | Default choice for production. |
| Interim release | Newer packages, testing, short-lived environments. | Use only with clear upgrade discipline. |
| Rolling behavior | Not Ubuntu's main model. | Use another distro if rolling release is required. |
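
Before treating a host as a production baseline, it can help to confirm that it really runs an LTS release. The sketch below assumes the convention that the `VERSION` field in `/etc/os-release` contains the string "LTS" on LTS releases (for example `24.04.1 LTS (Noble Numbat)`). The function names `os_release_field` and `is_lts` are illustrative, not part of any Ubuntu tool.

```shell
#!/usr/bin/env bash
# Read one KEY=value field from an os-release style file (quotes stripped).
# Assumption: the file follows the format used by /etc/os-release.
os_release_field() {
  local key="$1" file="${2:-/etc/os-release}"
  sed -n "s/^${key}=//p" "$file" | head -n 1 | tr -d '"'
}

# Succeed when the VERSION field mentions "LTS".
is_lts() {
  case "$(os_release_field VERSION "${1:-/etc/os-release}")" in
    *LTS*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Example: report the release type of the current host, if the file exists.
if [ -r /etc/os-release ]; then
  if is_lts; then
    echo "LTS release: $(os_release_field PRETTY_NAME)"
  else
    echo "Not an LTS release: $(os_release_field PRETTY_NAME)"
  fi
fi
```

Both functions accept an optional file path, which makes them easy to run against a mounted image or a test fixture instead of the live system.
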
Upgrade strategy
Safe production upgrade path:
1. Inventory servers and services
2. Confirm current Ubuntu version
3. Check application compatibility
4. Snapshot or backup
5. Test upgrade on staging
6. Review package changes
7. Schedule maintenance window
8. Upgrade one node first
9. Validate services and logs
10. Roll out progressively
11. Keep rollback plan ready
Version management commands
# Show Ubuntu version
lsb_release -a
# Show OS release file
cat /etc/os-release
# Show kernel version
uname -a
# Update package lists
sudo apt update
# Upgrade installed packages
sudo apt upgrade
# Full upgrade with dependency changes
sudo apt full-upgrade
# Check reboot requirement
test -f /var/run/reboot-required && cat /var/run/reboot-required
Release risk table
| Risk | Cause | Control |
|---|---|---|
| Package incompatibility | Runtime or library version changes. | Test staging before production. |
| Service restart failure | Config syntax or dependency change. | Validate configs before restart. |
| Kernel reboot required | Security kernel update. | Plan reboot window. |
| Repository mismatch | Third-party packages not ready. | Audit external repositories. |
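
Two controls from the table above, "validate configs before restart" and "plan reboot window", are easy to script. The following is a minimal sketch under stated assumptions: `safe_restart` and `reboot_pending` are hypothetical helper names, and the validation and restart commands are passed in as strings by the caller.

```shell
#!/usr/bin/env bash
# Run a validation command; run the restart command only if validation passes.
# Usage: safe_restart "<validation command>" "<restart command>"
safe_restart() {
  local check="$1" restart="$2"
  if sh -c "$check"; then
    sh -c "$restart"
  else
    echo "validation failed, service left untouched" >&2
    return 1
  fi
}

# Succeed when the reboot flag file exists.
# The default path is the flag file checked earlier in this guide.
reboot_pending() {
  [ -f "${1:-/var/run/reboot-required}" ]
}

# Example on a live host (not executed here):
#   safe_restart "sudo nginx -t" "sudo systemctl reload nginx"
#   reboot_pending && echo "reboot required: schedule a window"
```

Keeping the validation step in front of every restart means a syntax error in a config file leaves the running service untouched instead of taking it down.
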
Ubuntu in enterprise and production
In enterprise environments, Ubuntu is used when teams need a stable Linux baseline with strong cloud support, broad package availability, automation compatibility and a known operational model. It is especially common for backend platforms, DevOps infrastructure, Kubernetes nodes, CI runners and cloud-hosted services.
| Enterprise requirement | Ubuntu answer | Operational practice |
|---|---|---|
| Security patching | Regular package and kernel updates. | Patch windows and reboot strategy. |
| Repeatable deployment | Cloud images, apt, automation tools. | Ansible, Terraform, cloud-init. |
| Service supervision | systemd standard service manager. | Unit files, restart policy, journald logs. |
| Access control | Linux users, groups, sudo, SSH. | Least privilege and key-based access. |
| Observability | journald, syslog, metrics agents. | Central logging and monitoring. |
| Cloud integration | Images and cloud-init. | Bootstrap on first boot. |
Production server lifecycle
Provision
│
├── select Ubuntu LTS image
├── configure cloud-init
├── attach disk
└── configure network
│
▼
Harden
│
├── SSH keys
├── disable root login
├── firewall
├── unattended upgrades policy
└── least-privilege users
│
▼
Deploy
│
├── install packages
├── configure services
├── systemd unit files
└── application release
│
▼
Operate
│
├── logs
├── metrics
├── backups
├── patching
└── incident response
Professional checklist
[ ] LTS release selected
[ ] SSH key access only
[ ] sudo policy controlled
[ ] firewall enabled
[ ] services managed by systemd
[ ] logs visible through journalctl
[ ] backups configured
[ ] monitoring installed
[ ] security updates planned
[ ] disk usage monitored
[ ] certificates tracked
[ ] rollback plan documented
Core Ubuntu administration toolkit
| Area | Tools | Typical command |
|---|---|---|
| Packages | apt, dpkg | sudo apt install nginx |
| Services | systemd, systemctl | sudo systemctl restart nginx |
| Logs | journalctl, syslog | journalctl -u nginx -f |
| Network | ip, ss, resolvectl, netplan | ss -lntp |
| Firewall | ufw, nftables | sudo ufw status verbose |
| Storage | df, du, lsblk, mount | df -h |
| Processes | ps, top, htop, kill, pgrep | pgrep -a nginx |
| Users | useradd, usermod, sudoers | sudo usermod -aG sudo user |
First diagnostic commands
# System identity
hostnamectl
cat /etc/os-release
uptime
# CPU and memory
top
free -h
# Disk usage
df -h
du -sh /var/log/*
# Network
ip a
ip r
ss -lntp
resolvectl status
# Services
systemctl status nginx
journalctl -u nginx --since "30 min ago"
# Packages
apt policy nginx
dpkg -l | grep nginx
# Security
sudo ufw status verbose
sudo journalctl -u ssh --since today
Typical Ubuntu production stacks
| Stack | Components | Ubuntu role |
|---|---|---|
| Django / Python API | Nginx, Gunicorn, Django, PostgreSQL, Redis. | Host services, packages, systemd units, logs. |
| Node.js API | Nginx, Node.js, PM2/systemd, database. | Runtime host and reverse proxy. |
| Docker host | Docker Engine, Compose, images, volumes. | Container runtime platform. |
| Database server | PostgreSQL, MySQL, MariaDB, backups. | Storage, service control, tuning, logs. |
| Monitoring server | Prometheus, Grafana, Loki, exporters. | Observability host. |
| Bastion host | SSH gateway, audit, restricted access. | Secure entry point. |
Django deployment example
Internet
│
▼
Nginx
│
├── TLS termination
├── static files
└── reverse proxy
│
▼
Gunicorn systemd service
│
▼
Django application
│
├── PostgreSQL
├── Redis
├── Celery workers
└── media/static storage
Example service units
Common systemd units:
- nginx.service
- postgresql.service
- redis-server.service
- docker.service
- gunicorn.service
- celery.service
- celerybeat.service
- prometheus-node-exporter.service
Typical commands:
sudo systemctl enable nginx
sudo systemctl restart gunicorn
sudo systemctl status redis-server
journalctl -u celery -f
Minimal web server setup flow
1. Create server
2. Update packages
3. Create deploy user
4. Configure SSH
5. Install Nginx
6. Install app runtime
7. Configure database
8. Create systemd service
9. Configure TLS
10. Enable firewall
11. Add monitoring
12. Add backup
13. Document runbook
Common risks, anti-patterns and production mistakes
| Anti-pattern | Risk | Correction |
|---|---|---|
| Logging in as root directly | Weak audit and high blast radius. | Use named users, sudo and SSH keys. |
| Public SSH with passwords | Brute-force exposure. | Key-only SSH, firewall, fail2ban or VPN. |
| Ignoring package updates | Known vulnerabilities remain active. | Patch policy and reboot planning. |
| No service manager | App dies and does not restart. | Use systemd with restart policy. |
| No log strategy | Incidents are hard to diagnose. | Use journald, logrotate and central logs. |
| Manual untracked changes | Server becomes unreproducible. | Use automation and versioned configs. |
| No disk monitoring | Full disk causes outage. | Monitor filesystem usage and logs. |
| No rollback plan | Failed upgrade becomes long outage. | Snapshot, backup and tested restore path. |
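
The "no disk monitoring" row above is one of the cheapest to fix. As a minimal sketch, the function below reads `df -P` formatted text on standard input and prints every mount point at or above a usage threshold. The name `disks_over` and the default threshold of 80 percent are assumptions for illustration. Mount points containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Print "<mount point> <usage>" for filesystems at or above a threshold.
# Reads `df -P` formatted text on stdin so it can be tested with sample data.
# Column 5 is the capacity (for example "91%"), column 6 the mount point.
disks_over() {
  local limit="${1:-80}"
  awk -v limit="$limit" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 >= limit) print $6 " " $5
  }'
}

# Example on a live host (not executed here):
#   df -P | disks_over 80
```

Run from cron or a systemd timer, the output can be mailed or pushed to an alerting channel whenever it is non-empty.
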
Incident diagnostic decision tree
Application is down
│
├── Is server reachable?
│   ├── no -> network, firewall, cloud, DNS
│   └── yes
│
├── Is service running?
│   ├── no -> systemctl status + journalctl
│   └── yes
│
├── Is port listening?
│   ├── no -> config or bind failure
│   └── yes
│
├── Is reverse proxy healthy?
│   ├── no -> nginx config/logs
│   └── yes
│
├── Is database reachable?
│   ├── no -> DB service/network/auth
│   └── yes
│
└── Is app throwing errors?
    ├── yes -> application logs
    └── no -> upstream routing/cache/client issue
First-response commands
systemctl status nginx
journalctl -u nginx --since "15 min ago"
ss -lntp
df -h
free -h
top
sudo ufw status
curl -I http://localhost
curl -I https://example.com
Official links and useful references
| Resource | URL | Usage |
|---|---|---|
| Ubuntu main site | https://ubuntu.com/ | Product overview and downloads. |
| Download Ubuntu | https://ubuntu.com/download | Desktop, server and cloud downloads. |
| Ubuntu documentation | https://documentation.ubuntu.com/ | Official documentation portal. |
| Ubuntu Server docs | https://documentation.ubuntu.com/server/ | Server administration reference. |
| Ubuntu releases | https://releases.ubuntu.com/ | Release images and versions. |
| Ubuntu packages | https://packages.ubuntu.com/ | Package lookup. |
| Ubuntu security notices | https://ubuntu.com/security/notices | Security update tracking. |
Learning roadmap
Ubuntu learning path:
1. Shell basics
2. Filesystem and permissions
3. Users, groups and sudo
4. apt and packages
5. systemd services
6. journald and logs
7. networking and DNS
8. firewall and SSH hardening
9. storage and mounts
10. Nginx reverse proxy
11. database service operation
12. Docker host usage
13. backups and restore
14. monitoring and alerting
15. cloud-init and automation
One-line positioning
Software management on Ubuntu
Ubuntu provides several ways to install software. The most important are: graphical installation through Ubuntu App Center, traditional Debian packages through APT and .deb files, Snap packages, Flatpak applications and third-party repositories such as PPAs.
The right method depends on the context. A desktop user may prefer App Center, Snap or Flatpak. A server administrator usually prefers APT and controlled repositories. A developer may use a vendor repository for Docker, PostgreSQL, Node.js or cloud tooling. A production team must control package origin, version, update policy and rollback.
| Method | Best for | Strength | Risk |
|---|---|---|---|
| App Center | Desktop users and simple installs. | Easy graphical installation. | Less precise for production governance. |
| APT / DEB | Servers, system packages, standard tools. | Native Ubuntu package management. | Repository conflicts if unmanaged. |
| Snap | Sandboxed apps and some Canonical-supported tools. | Bundled dependencies and automatic refresh. | Refresh policy and confinement must be understood. |
| Flatpak | Desktop applications, especially cross-distro apps. | Good desktop app ecosystem. | Another runtime and update channel to govern. |
| PPA | Newer versions or community packages. | Access to versions not in official repos. | Trust, lifecycle and upgrade conflicts. |
| Vendor repo | Official upstream packages. | Best path for many professional tools. | Keys, pinning and repository ownership matter. |
.deb installs only with explicit justification.
Software source map
Ubuntu software sources
│
├── App Center
│   ├── graphical install
│   ├── desktop apps
│   └── simple discovery
│
├── APT repositories
│   ├── official Ubuntu repos
│   ├── security updates
│   ├── vendor repos
│   └── PPAs
│
├── Local DEB files
│   ├── downloaded installer
│   ├── vendor package
│   └── manual install
│
├── Snap
│   ├── snap store
│   ├── channels
│   ├── sandbox
│   └── auto refresh
│
└── Flatpak
    ├── Flathub
    ├── desktop app runtimes
    ├── sandbox permissions
    └── user-level installs
Decision shortcut
Need a server package?
└── APT from Ubuntu or official vendor repository
Need a desktop application?
└── App Center, Snap or Flatpak
Need a newer application version?
├── check official vendor repo first
├── then consider PPA
└── document the reason
Need a one-off local installer?
└── .deb file with verified source
Need strict production reproducibility?
└── APT + pinned repositories + automation
Ubuntu App Center: simplified graphical installation
Ubuntu App Center is the graphical software interface on Ubuntu Desktop. It is designed for easy discovery, installation and removal of common applications. It is convenient for desktop workflows, but it is not the primary tool for server automation or strict production package governance.
| Use case | App Center fit | Comment |
|---|---|---|
| Install browser, editor, media tool | Excellent. | Simple desktop workflow. |
| Discover common applications | Excellent. | Good for non-terminal users. |
| Install developer desktop tools | Good. | Check whether package is Snap or DEB. |
| Production server package | Poor fit. | Use APT, automation or vendor repo. |
| Fleet management | Poor fit. | Use Ansible, cloud-init, image build or MDM. |
Typical App Center flow
Ubuntu Desktop
│
├── Open App Center
├── Search application
├── Review publisher and package type
├── Click Install
├── Authenticate if required
├── Launch application
└── Update through system update flow
What to verify before installing
Before installing a desktop app:
[ ] Is the publisher trusted?
[ ] Is it a Snap, DEB or Flatpak package?
[ ] Is the app maintained?
[ ] Does it need sensitive permissions?
[ ] Is there an official vendor package?
[ ] Is it needed system-wide or only for one user?
[ ] Is it appropriate for a professional workstation?
Graphical vs CLI management
| Approach | Strength | Weakness |
|---|---|---|
| App Center | Easy, visual, good for desktop users. | Less scriptable and less auditable. |
| APT CLI | Scriptable, auditable, server-friendly. | Requires terminal knowledge. |
| Snap CLI | Precise Snap control. | Requires understanding channels and refresh. |
| Flatpak CLI | Good app and permission control. | Separate ecosystem and runtimes. |
Useful desktop package checks
# Show installed Snap packages
snap list
# Show installed DEB packages
dpkg -l | less
# Search APT package
apt search package-name
# Show package origin
apt policy package-name
# Show Flatpak apps if installed
flatpak list
DEB packages and APT: native Ubuntu package management
Ubuntu is based on Debian packaging. A .deb file is a Debian package. APT is the higher-level tool that downloads packages from repositories, resolves dependencies, installs upgrades and tracks package versions.
| Concept | Meaning | Command |
|---|---|---|
| .deb | Local Debian package file. | sudo apt install ./file.deb |
| apt | High-level package manager. | sudo apt install nginx |
| dpkg | Low-level package tool. | dpkg -l |
| Repository | Package source. | /etc/apt/sources.list.d/ |
| Dependency | Package required by another package. | Resolved by APT. |
| Candidate version | Version APT would install. | apt policy package |
APT essentials
# Update package metadata
sudo apt update
# Install package from repository
sudo apt install nginx
# Install local DEB file with dependency resolution
sudo apt install ./package.deb
# Remove package but keep config
sudo apt remove package-name
# Remove package and config
sudo apt purge package-name
# Upgrade packages
sudo apt upgrade
# Show package details
apt show package-name
# Show installed and candidate versions
apt policy package-name
DEB installation flow
Local DEB file
│
├── verify source
├── check vendor signature or checksum if available
├── install with apt
│   └── sudo apt install ./package.deb
├── inspect installed package
│   └── dpkg -l | grep package
├── verify service or binary
└── document install source
Package inspection commands
# List installed packages
dpkg -l
# Filter installed packages
dpkg -l | grep nginx
# Show package status
dpkg -s nginx
# Show files installed by package
dpkg -L nginx
# Find package owning a file
dpkg -S /usr/sbin/nginx
# Show APT history
less /var/log/apt/history.log
# Show available versions
apt-cache madison nginx
Production DEB rules
Do:
- prefer repository installation over random downloads
- use official vendor DEB if needed
- keep package source documented
- automate installation in scripts or Ansible
- review apt history after changes
Avoid:
- random DEB files from unknown sites
- manual installs without documentation
- local DEB files with no update path
- mixing multiple competing repositories
- installing critical server packages from untrusted sources
Snap packages: bundled apps, channels, confinement and refresh
Snap packages bundle applications with their dependencies and run with a confinement model. They are distributed through the Snap ecosystem and can use channels such as stable, candidate, beta or edge. Snap refresh behavior is important because packages can update automatically.
| Snap concept | Meaning | Operational impact |
|---|---|---|
| Channel | Release track. | Stable is safer than beta or edge. |
| Revision | Specific build of a Snap. | Can support revert to previous revision. |
| Confinement | Sandbox permissions. | May restrict filesystem/device access. |
| Interface | Permission connection. | May require manual connection. |
| Refresh | Update mechanism. | Needs maintenance policy for servers. |
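
Because the channel a snap tracks determines how risky its automatic refreshes are, a quick audit of tracked channels is a useful habit. The sketch below assumes the usual column layout of `snap list` output (Name, Version, Rev, Tracking, Publisher, Notes) and prints every snap whose tracking channel does not end in `stable`. The function name `non_stable_snaps` is hypothetical.

```shell
#!/usr/bin/env bash
# Print "<snap name> <tracking channel>" for snaps not tracking a stable channel.
# Reads `snap list` formatted text on stdin; column 4 is assumed to be Tracking.
# Locally installed snaps show "-" as tracking channel and are listed as well.
non_stable_snaps() {
  awk 'NR > 1 && $4 !~ /stable$/ { print $1 " " $4 }'
}

# Example on a live host (not executed here):
#   snap list | non_stable_snaps
```

An empty result means every installed snap follows a stable channel; anything listed deserves a documented reason or a switch back to stable.
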
Snap commands
# List installed snaps
snap list
# Search for app
snap find package-name
# Show package info
snap info package-name
# Install stable channel
sudo snap install package-name --channel=stable
# Refresh snaps
sudo snap refresh
# Show refresh schedule
snap refresh --time
# Remove snap
sudo snap remove package-name
Snap operations
# Show changes
snap changes
# Show connections
snap connections package-name
# Connect interface
sudo snap connect package-name:interface
# Revert to previous revision if available
sudo snap revert package-name
# Hold refresh temporarily
sudo snap refresh --hold=24h package-name
# Logs for snap service
snap logs package-name
Snap decision tree
Considering Snap?
│
├── Desktop application?
│   └── often acceptable
│
├── Server daemon?
│   ├── check refresh policy
│   ├── check confinement
│   ├── check logs
│   └── check rollback
│
├── Need strict package timing?
│   └── prefer APT or control refresh window
│
└── Need sandboxed app delivery?
    └── Snap can be a good fit
Snap strengths and cautions
| Strength | Caution |
|---|---|
| Bundled dependencies. | More disk usage than native package in some cases. |
| Simple install path. | Refresh behavior must be understood. |
| Sandbox confinement. | Permissions may surprise users or services. |
| Channels and revert. | Wrong channel can increase instability. |
Flatpak: desktop application distribution and Flathub ecosystem
Flatpak is a cross-distribution packaging system often used for desktop applications. Applications run with a sandbox model and rely on runtimes. Flatpak is especially common when users want recent desktop applications independently from the system package version.
| Flatpak concept | Meaning | Operational note |
|---|---|---|
| Remote | Package source. | Flathub is the common public remote. |
| Runtime | Shared dependency platform. | Required by Flatpak apps. |
| Application ID | Unique app identifier. | Example: org.gimp.GIMP. |
| Sandbox | Permission model. | Filesystem and device access can be restricted. |
| User install | Install for one user. | Useful on shared desktops. |
| System install | Install for all users. | Requires admin privileges. |
Install Flatpak support
# Install Flatpak
sudo apt update
sudo apt install flatpak
# Add Flathub remote
flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
# Search app
flatpak search gimp
# Install app
flatpak install flathub org.gimp.GIMP
# Run app
flatpak run org.gimp.GIMP
Flatpak operations
# List installed apps
flatpak list
# List remotes
flatpak remotes
# Update apps
flatpak update
# Show app info
flatpak info org.gimp.GIMP
# Uninstall app
flatpak uninstall org.gimp.GIMP
# Remove unused runtimes
flatpak uninstall --unused
# Show app permissions
flatpak info --show-permissions org.gimp.GIMP
Flatpak fit
| Context | Flatpak fit | Comment |
|---|---|---|
| Desktop apps | Strong. | Especially when recent versions matter. |
| Server daemons | Weak. | APT or vendor repo is usually better. |
| Developer workstation | Good. | Useful for GUI tools. |
| Production fleet | Limited. | Needs desktop app governance. |
Flatpak vs Snap mental model
Snap:
- integrated by default on Ubuntu
- used for desktop apps and selected system tools
- has channels and refresh behavior
Flatpak:
- popular for cross-distro desktop apps
- commonly uses Flathub
- strong desktop application ecosystem
- often installed separately on Ubuntu
PPA repositories: newer software versions and controlled exceptions
A PPA (Personal Package Archive) is a third-party APT repository hosted on Launchpad. PPAs are useful when the official Ubuntu repository does not provide the needed version, but each one is a trust decision: adding a PPA can change package candidates, dependencies and upgrade behavior.
| PPA use case | Good reason? | Production caution |
|---|---|---|
| Need newer desktop app | Sometimes. | Check maintainer and update history. |
| Need newer dev tool | Sometimes. | Prefer official vendor repo when available. |
| Need critical server package | Rarely. | Use official Ubuntu or vendor repo if possible. |
| Random tutorial says add PPA | No. | Understand why before adding. |
| Temporary test machine | Acceptable. | Disposable environment lowers risk. |
PPA commands
# Install helper if needed
sudo apt install software-properties-common
# Add PPA
sudo add-apt-repository ppa:owner/name
# Update metadata
sudo apt update
# Install package
sudo apt install package-name
# Show package origin and candidate
apt policy package-name
# Remove PPA source
sudo add-apt-repository --remove ppa:owner/name
PPA governance flow
Need a PPA?
│
├── Is package available in official Ubuntu repo?
│   ├── yes -> prefer official repo
│   └── no
│
├── Is there an official vendor repository?
│   ├── yes -> prefer vendor repo
│   └── no
│
├── Is PPA trusted and maintained?
│   ├── no -> reject
│   └── yes
│
├── Is this production?
│   ├── yes -> document and test in staging
│   └── no -> acceptable for lab if understood
│
└── Add with owner, reason and review date
Inspect repository sources
# Source list files
ls -lah /etc/apt/sources.list.d/
# Search active sources (one-line "deb" entries and deb822 "Types:" stanzas in .sources files)
grep -RE "^(deb|Types:)" /etc/apt/sources.list /etc/apt/sources.list.d/
# Show package candidate and priorities
apt policy package-name
# Show all versions
apt-cache madison package-name
# Recent repository changes
sudo find /etc/apt -type f -mtime -30 -ls
PPA risk matrix
| Risk | Cause | Control |
|---|---|---|
| Wrong package version selected | PPA has higher candidate version. | Check apt policy. |
| Upgrade conflict | PPA dependencies diverge. | Test in staging. |
| Abandoned package | Maintainer stops updates. | Review regularly. |
| Supply-chain concern | Untrusted publisher. | Prefer official source. |
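
The "check apt policy" control in the matrix above can be automated. As a sketch, the helpers below parse the text printed by `apt-cache policy <package>`: `policy_field` extracts the `Installed` or `Candidate` version, and `mentions_ppa` reports whether any listed source points at a Launchpad PPA host. Both function names are hypothetical, and the PPA host pattern (`ppa.launchpadcontent.net` and the older `ppa.launchpad.net`) is an assumption about how PPA URLs usually look.

```shell
#!/usr/bin/env bash
# Extract a field such as "Installed" or "Candidate" from apt-cache policy text.
# Reads the policy output on stdin and prints the first matching value.
policy_field() {
  awk -v key="$1:" '$1 == key { print $2; exit }'
}

# Succeed when the policy text on stdin lists a Launchpad PPA as a source.
mentions_ppa() {
  grep -Eq 'ppa\.launchpad(content)?\.net'
}

# Example on a live host (not executed here):
#   apt-cache policy nginx | policy_field Candidate
#   apt-cache policy nginx | mentions_ppa && echo "nginx has a PPA source"
```

Comparing the `Candidate` value before and after adding a PPA shows immediately whether the new repository would take over the package on the next upgrade.
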
DEB vs Snap vs Flatpak vs PPA: practical comparison
| Criterion | DEB / APT | Snap | Flatpak | PPA |
|---|---|---|---|---|
| Best target | Server and system packages. | Desktop apps and selected tools. | Desktop apps. | Newer APT packages. |
| Dependency model | System dependencies. | Bundled dependencies. | Runtimes and bundled app parts. | APT dependencies from repo. |
| Update model | APT updates. | Snap refresh. | Flatpak update. | APT updates from PPA. |
| Sandboxing | Usually no app sandbox. | Confinement model. | Sandbox model. | Same as APT package. |
| Production servers | Best default. | Case-by-case. | Usually no. | Exception only. |
| Desktop apps | Good. | Good. | Good. | Sometimes. |
| Governance complexity | Medium. | Medium. | Medium. | High if unmanaged. |
Choice diagram
Choose package format
│
├── Is this a production server dependency?
│   ├── yes -> APT / DEB / official vendor repo
│   └── no
│
├── Is this a desktop GUI app?
│   ├── yes -> App Center, Snap or Flatpak
│   └── no
│
├── Do you need newest upstream version?
│   ├── yes -> official vendor repo first
│   ├── then PPA if trusted
│   └── document exception
│
├── Do you need sandboxed desktop app?
│   ├── yes -> Snap or Flatpak
│   └── no
│
└── Need reproducible fleet?
    └── automate APT and pin sources
Use-case recommendations
| Use case | Preferred option | Reason |
|---|---|---|
| Nginx on server | APT. | Native service integration. |
| PostgreSQL production | Ubuntu repo or official PostgreSQL repo. | Clear lifecycle and updates. |
| Docker Engine | Official Docker repo or Ubuntu package by policy. | Version and support clarity. |
| Desktop editor | App Center, Snap, DEB or vendor repo. | Depends on vendor support. |
| Graphic design app | Flatpak or Snap often acceptable. | Desktop app freshness. |
Security, provenance and update governance
Software installation is a supply-chain decision. Every package source can install code with user or system privileges. Good governance means knowing where software comes from, how it updates, who maintains it and how to roll back when it breaks.
| Risk | Example | Control |
|---|---|---|
| Untrusted publisher | Random DEB or PPA. | Use official source or trusted vendor. |
| Unexpected updates | Snap refresh, PPA version change. | Control channels, windows and policy. |
| Dependency conflict | PPA overrides Ubuntu package. | Check apt policy and pin if needed. |
| Abandoned package | No security patches. | Review source health. |
| Secret exposure | Install script writes credentials. | Inspect scripts, avoid long-lived secrets. |
| No rollback path | Manual install with no version record. | Document package version and source. |
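As one illustration of the controls above, a downloaded file can be compared against a published SHA-256 digest before anything runs with elevated privileges. This is a minimal sketch that assumes GNU coreutils `sha256sum` is available; the function name `verify_sha256` is hypothetical and not part of any Ubuntu tool.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if a file matches an expected SHA-256 digest.
# verify_sha256 is a hypothetical helper name used for illustration.
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}
```

A possible use is `verify_sha256 install.sh "<digest published by the vendor>" && sudo bash install.sh`, after the script has been downloaded and read. A matching digest only shows the file is the one the publisher described; it does not replace trusting the publisher.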
Pre-install security checklist
[ ] Is the source official?
[ ] Is the publisher trusted?
[ ] Is the package maintained?
[ ] Is the update mechanism known?
[ ] Is the package type known?
[ ] Is the installation reversible?
[ ] Is the version documented?
[ ] Does it add a repository?
[ ] Does it add a signing key?
[ ] Does it run a script as root?
[ ] Does it request sensitive permissions?
[ ] Is it approved for production?
Install script warning pattern
Risky pattern:
curl https://example.com/install.sh | sudo bash
Safer pattern:
1. Download script
2. Inspect script
3. Verify source
4. Verify checksum or signature if available
5. Run intentionally
6. Record package source
7. Test in staging first
Repository audit commands
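# Show APT sources (one-line format)
grep -R "^deb" /etc/apt/sources.list /etc/apt/sources.list.d/
# Show APT sources (deb822 .sources files, the default layout since Ubuntu 24.04)
grep -R -E "^(Types|URIs|Suites):" /etc/apt/sources.list.d/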
# List source files
ls -lah /etc/apt/sources.list.d/
# Show package origin
apt policy package-name
# Show installed Snap packages
snap list
# Show Flatpak remotes and apps
flatpak remotes
flatpak list
# Show recent package operations
less /var/log/apt/history.log
Production software governance
Governance record:
- package name
- package type
- source repository
- publisher
- installed version
- update policy
- rollback method
- owner
- reason
- review date
Troubleshooting software installation and updates
| Symptom | Likely cause | First command | Fix direction |
|---|---|---|---|
| APT lock error | Another apt/dpkg process running. | ps aux \| grep -E 'apt\|dpkg' | Wait or investigate process. |
| Broken packages | Interrupted install or dependency conflict. | sudo dpkg --configure -a | Repair dpkg and dependencies. |
| Repository signature error | Missing or wrong signing key. | sudo apt update | Fix keyring or remove repo. |
| Package version unexpected | PPA or vendor repo changes candidate. | apt policy package-name | Pin, remove repo or choose version. |
| Snap app cannot access file | Confinement or interface issue. | snap connections app | Connect interface or adjust path. |
| Flatpak app missing permission | Sandbox permission. | flatpak info --show-permissions app | Adjust permission intentionally. |
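The symptom-to-first-command mapping in the table above could be captured in a small helper so that runbooks stay consistent. The sketch below is only an illustration: `suggest_first_command` is a hypothetical name, and the message fragments it matches are examples of common APT errors, not a complete list.

```shell
#!/usr/bin/env bash
# Sketch: map a fragment of an APT error message to a first diagnostic command.
# suggest_first_command is a hypothetical helper; the patterns are examples only.
suggest_first_command() {
  local message="$1"
  case "$message" in
    *"Could not get lock"*)
      echo "ps aux | grep -E 'apt|dpkg'" ;;
    *"dpkg was interrupted"*)
      echo "sudo dpkg --configure -a" ;;
    *"NO_PUBKEY"*|*"is not signed"*)
      echo "sudo apt update" ;;
    *"unmet dependencies"*|*"held broken packages"*)
      echo "apt policy package-name" ;;
    *)
      echo "read the full error, then check /var/log/apt/term.log" ;;
  esac
}
```

The function only prints a suggestion; it runs nothing itself, which keeps the decision with the operator.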
APT repair commands
# Repair interrupted package configuration
sudo dpkg --configure -a
# Fix broken dependencies
sudo apt -f install
# Refresh metadata
sudo apt update
# Clean package cache
sudo apt clean
# Check holds
apt-mark showhold
# Review package history
less /var/log/apt/history.log
less /var/log/apt/term.log
Package troubleshooting decision tree
Software install failed
│
├── Read exact error
│
├── APT lock?
│   └── check apt/dpkg process
│
├── DNS or network?
│   └── resolvectl, dig, curl
│
├── Signature or key?
│   └── inspect source and keyring
│
├── Dependency conflict?
│   └── apt policy, apt -f install, holds
│
├── PPA conflict?
│   └── disable source, apt update
│
├── Snap confinement?
│   └── snap connections
│
└── Flatpak permission?
    └── flatpak info --show-permissions
Disable source temporarily
# Disable a repository source file
sudo mv /etc/apt/sources.list.d/vendor.list \
/etc/apt/sources.list.d/vendor.list.disabled
# Refresh metadata
sudo apt update
# Check package candidate again
apt policy package-name
Snap and Flatpak diagnostics
# Snap
snap list
snap info package-name
snap changes
snap connections package-name
snap logs package-name
# Flatpak
flatpak list
flatpak remotes
flatpak info app-id
flatpak info --show-permissions app-id
flatpak update
Final checklist and command cheat sheet
Software management checklist
[ ] Package type is understood
[ ] Source is trusted
[ ] Publisher is verified
[ ] Update mechanism is known
[ ] Rollback path exists
[ ] Repository additions are documented
[ ] PPAs are justified
[ ] Vendor repos are preferred over random PPAs
[ ] Local DEB files are avoided unless necessary
[ ] Snap refresh behavior is understood
[ ] Flatpak remotes are known
[ ] Production servers use governed sources
[ ] Package changes are traceable
[ ] Staging test exists for critical software
[ ] Security updates are planned
APT / DEB cheat sheet
sudo apt update
sudo apt install package-name
sudo apt install ./package.deb
sudo apt remove package-name
sudo apt purge package-name
sudo apt upgrade
apt search package-name
apt show package-name
apt policy package-name
dpkg -l | grep package-name
dpkg -L package-name
dpkg -S /path/to/file
less /var/log/apt/history.log
Snap / Flatpak / PPA cheat sheet
# Snap
snap list
snap find package-name
snap info package-name
sudo snap install package-name
sudo snap refresh
snap refresh --time
sudo snap remove package-name
# Flatpak
flatpak remotes
flatpak search package-name
flatpak install flathub app-id
flatpak run app-id
flatpak update
flatpak uninstall app-id
# PPA
sudo apt install software-properties-common
sudo add-apt-repository ppa:owner/name
sudo apt update
apt policy package-name
sudo add-apt-repository --remove ppa:owner/name
Final rule
Installing software means trusting a publisher, an update channel and a dependency chain. Use App Center for simple desktop workflows, APT and DEB for professional server management, Snap or Flatpak for selected desktop/application cases, and PPAs only as controlled exceptions.
Production default
Production software default:
- Ubuntu LTS
- official Ubuntu repositories
- official vendor repositories when needed
- no random PPAs
- no unknown DEB downloads
- package baseline automated
- update policy documented
- rollback path tested
- package history reviewed after changes
Ubuntu release model
Ubuntu follows a predictable release model with two main families: LTS releases and interim releases. LTS means Long-Term Support and is the default choice for production systems. Interim releases provide newer software faster, but with a much shorter support window.
In professional environments, version choice is not cosmetic. It impacts security patching, kernel behavior, package versions, compatibility, cloud images, automation, compliance, upgrade windows and rollback strategy.
| Release type | Typical cadence | Support model | Best usage |
|---|---|---|---|
| LTS | Every 2 years | Long support window, production-oriented. | Servers, cloud, enterprise, databases, Kubernetes nodes. |
| Interim | Between LTS releases | Short support window. | Testing, newer kernels, recent desktop features, short-lived dev systems. |
| Point release | LTS refresh images | Updated installer media for the same LTS family. | Fresh installs with fewer post-install updates. |
| ESM / Ubuntu Pro | After standard support or for broader package coverage | Extended security maintenance model. | Long-lived enterprise systems that cannot upgrade quickly. |
Release cycle mental diagram
Ubuntu release cycle
│
├── LTS release
│   ├── stable baseline
│   ├── long security maintenance
│   ├── enterprise-friendly
│   ├── common cloud image
│   └── recommended for production
│
├── Interim release
│   ├── newer kernel
│   ├── newer user-space
│   ├── shorter support
│   ├── useful for testing
│   └── requires upgrade discipline
│
├── Point release
│   ├── refreshed installer image
│   ├── accumulated updates
│   └── useful for new deployments
│
└── ESM / extended support
    ├── longer security coverage
    ├── used when upgrade is delayed
    └── enterprise lifecycle tool
What version choice affects
Version choice affects:
- kernel version
- driver support
- OpenSSL version
- Python / PHP / Node packages
- systemd behavior
- Netplan / network stack
- cloud-init behavior
- container runtime support
- security patch horizon
- application certification
- upgrade planning
- operational risk
LTS vs interim: practical comparison
| Criterion | LTS | Interim |
|---|---|---|
| Primary goal | Stability and long-term operation. | Newer features and faster evolution. |
| Production fit | Excellent default choice. | Only if justified and actively managed. |
| Security maintenance | Long support window. | Short support window. |
| Kernel freshness | Stable, sometimes less recent. | More recent. |
| Package freshness | Conservative. | Newer versions. |
| Operational burden | Lower. | Higher, because upgrades are frequent. |
| Cloud image standardization | Excellent. | Less common for long-lived fleets. |
| Best for | Servers, DBs, APIs, CI runners, cloud nodes. | Labs, test machines, recent hardware, feature validation. |
Decision shortcut
Choose LTS when:
- server is production
- database is production
- uptime matters
- patching must be predictable
- infrastructure must be standardized
- cloud images are reused
- upgrade windows are rare
- compliance matters
Choose interim when:
- testing newer kernel
- testing new desktop stack
- testing new hardware support
- environment is disposable
- upgrade cadence is accepted
- production risk is low
Common professional rule
Bad decision examples
Bad:
- installing an interim release on a long-lived database server
- using different Ubuntu versions randomly across servers
- upgrading production without staging validation
- ignoring end-of-support dates
- choosing a release because it is "newer" only
Better:
- define one production LTS baseline
- define patch cadence
- define upgrade window
- keep rollback images
- document exceptions
Current release examples and how to read them
Ubuntu versions use a year.month format. For example, 24.04 means a release from April 2024. LTS releases are usually April releases in even-numbered years. Point releases such as 24.04.4 are refreshed installation images for the same LTS family.
| Example | Meaning | Use case | What to remember |
|---|---|---|---|
| 24.04 LTS | Noble Numbat LTS family. | Production baseline, servers, cloud. | Long-term support release. |
| 24.04.x LTS | Point release inside the 24.04 LTS family. | Fresh install image with accumulated updates. | Still same LTS generation. |
| 25.10 | Interim release. | Short-lived dev/test or recent features. | Requires faster upgrade planning. |
| 22.04 LTS | Previous LTS generation. | Existing production fleets. | Plan migration before support constraints become urgent. |
| 20.04 LTS | Older LTS generation. | Legacy systems. | Often requires ESM/Pro or migration plan. |
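The year.month reading described above can be sketched as a small parser. This is an illustration under stated assumptions: `describe_ubuntu_version` is a hypothetical helper, it assumes a two-digit year in the 2000s, and its "LTS pattern" label only reflects the April-of-an-even-year convention. The official release list remains the authority on which releases are LTS.

```shell
#!/usr/bin/env bash
# Sketch: split an Ubuntu version string into year, month and point release.
# describe_ubuntu_version is a hypothetical helper name.
describe_ubuntu_version() {
  local version="$1" yy rest mm point family
  case "$version" in
    [0-9][0-9].[0-9][0-9]) point="none" ;;
    [0-9][0-9].[0-9][0-9].[0-9]*) point="${version#*.*.}" ;;
    *) echo "unrecognized version: $version" >&2; return 1 ;;
  esac
  yy="${version%%.*}"      # text before the first dot
  rest="${version#*.}"     # text after the first dot
  mm="${rest%%.*}"         # month, without any point-release suffix
  family="interim pattern"
  # Convention only: April release in an even-numbered year.
  case "$mm:$yy" in
    04:*[02468]) family="LTS pattern" ;;
  esac
  echo "20${yy}-${mm} point=${point} ${family}"
}
```

For example, `describe_ubuntu_version 24.04.4` prints `2024-04 point=4 LTS pattern`, and `describe_ubuntu_version 25.10` prints `2025-10 point=none interim pattern`.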
24.04.4 LTS means the fourth point-release image of Ubuntu 24.04 LTS, not a completely different major OS generation.
Version naming pattern
Ubuntu version format:
YY.MM
Examples:
22.04 = April 2022
24.04 = April 2024
25.10 = October 2025
LTS examples:
20.04 LTS
22.04 LTS
24.04 LTS
Point release examples:
22.04.5 LTS
24.04.3 LTS
24.04.4 LTS
Meaning:
major LTS family + refreshed installer media
Release interpretation flow
See a version number
│
▼
Is it marked LTS?
├── yes
│   ├── good production candidate
│   ├── check standard support date
│   └── check Pro/ESM if long-lived
│
└── no
    ├── interim release
    ├── check short support date
    └── use mainly for dev/test unless justified
Useful official sources
Ubuntu release cycle:
https://ubuntu.com/about/release-cycle
Ubuntu releases:
https://releases.ubuntu.com/
Ubuntu release list:
https://documentation.ubuntu.com/project/release-team/list-of-releases/
Ubuntu release notes:
https://documentation.ubuntu.com/release-notes/
Support timeline: standard support, ESM and end of life
Support lifecycle matters because an unsupported server becomes a security and compliance risk. Once a release is out of standard support, teams must either upgrade, use an extended maintenance option if available, or retire the system.
| Lifecycle phase | Meaning | Operational action |
|---|---|---|
| Active standard support | Normal security and maintenance updates. | Patch regularly, monitor advisories. |
| Point release phase | Refreshed install media for LTS family. | Use latest point image for new servers. |
| Approaching end of standard support | Upgrade planning becomes urgent. | Inventory, staging test, migration window. |
| ESM / extended maintenance | Extended security coverage for supported scenarios. | Use as controlled bridge, not as excuse to avoid upgrades forever. |
| End of life | No normal support path for that release. | Upgrade, isolate, replace or retire. |
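To make "approaching end of standard support" measurable, a team could compute the number of days left until a recorded support end date. The sketch below assumes GNU `date` (the default on Ubuntu); `days_until` is a hypothetical helper, and the end date must come from the official release cycle page rather than from this example.

```shell
#!/usr/bin/env bash
# Sketch: whole days between a start date (default: today, UTC) and an end date.
# Assumes GNU date; days_until is a hypothetical helper name.
days_until() {
  local end_date="$1" from_date="${2:-$(date -u +%F)}"
  local end_s from_s
  end_s="$(date -u -d "$end_date" +%s)" || return 1
  from_s="$(date -u -d "$from_date" +%s)" || return 1
  # 86400 seconds per day; UTC avoids daylight-saving offsets.
  echo $(( (end_s - from_s) / 86400 ))
}
```

Called as `days_until YYYY-MM-DD` with the tracked end date, the result can feed a monitoring check that warns well before the migration window becomes urgent. A negative result means the date has already passed.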
Support responsibility map
Operating system lifecycle
│
├── security updates
├── kernel updates
├── package patches
├── repository availability
├── vendor support
└── compliance status
Operations team responsibility
│
├── know release version
├── know support end date
├── patch regularly
├── plan reboots
├── test upgrades
└── avoid unsupported servers
Timeline diagram
LTS release
│
├── Year 0
│   └── release becomes production candidate
│
├── Years 0-5
│   ├── standard security maintenance
│   ├── point releases
│   ├── cloud images maintained
│   └── normal production usage
│
├── After standard support
│   ├── upgrade recommended
│   └── ESM / Ubuntu Pro may be used
│
└── Long-lived legacy phase
    ├── higher operational risk
    ├── stronger justification required
    └── migration plan should exist
Operational policy example
Company Ubuntu policy:
- production servers use LTS only
- new projects use current LTS point image
- old LTS versions are reviewed quarterly
- unsupported releases are forbidden
- interim releases require architecture approval
- upgrade tests must pass in staging
- rollback image must exist
- patching window is monthly
- emergency CVE patching is immediate
How to choose the right Ubuntu version
| Context | Recommended choice | Reason |
|---|---|---|
| Production web server | Latest stable LTS point release. | Security support, standardization, predictable patching. |
| Database server | LTS only. | Data systems need stability and tested upgrade windows. |
| Kubernetes node | LTS supported by your Kubernetes distribution. | Kernel, container runtime and vendor compatibility. |
| CI runner | LTS by default. | Reproducible builds and stable toolchains. |
| Developer workstation | LTS for stability, interim for recent desktop features. | Depends on tolerance for upgrades. |
| Recent hardware | LTS with HWE kernel or interim if required. | Driver and kernel support may matter. |
| Short-lived lab | Interim can be acceptable. | Easy to rebuild if support ends. |
Production decision matrix
Production workload?
├── yes -> LTS
└── no
    │
    ▼
Long-lived machine?
├── yes -> LTS
└── no
    │
    ▼
Need newest kernel/userspace?
├── yes -> interim or LTS HWE
└── no -> LTS
Compliance or security audit?
└── LTS + documented patch policy
Version choice scoring
| Question | If yes | Impact |
|---|---|---|
| Will this server live more than 12 months? | Choose LTS. | Reduces upgrade pressure. |
| Does it host production data? | Choose LTS. | Stability matters more than novelty. |
| Is it part of a fleet? | Standardize on one LTS. | Improves automation and support. |
| Does hardware need a newer kernel? | Evaluate HWE or interim. | Driver support may override default. |
| Is it disposable? | Interim is acceptable. | Lower lifecycle risk. |
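The scoring questions above could be encoded so that provisioning scripts apply the same rule every time. This is one possible reading, not the definitive policy: `recommend_release` is a hypothetical helper that takes three yes/no answers, and the choice to return "LTS with HWE kernel" for production machines that need a newer kernel is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: turn three yes/no answers into a release recommendation.
# recommend_release is a hypothetical helper name.
recommend_release() {
  local production="$1" long_lived="$2" needs_newer_kernel="$3"
  if [ "$production" = "yes" ] || [ "$long_lived" = "yes" ]; then
    # Production or long-lived machines stay on LTS.
    if [ "$needs_newer_kernel" = "yes" ]; then
      echo "LTS with HWE kernel"
    else
      echo "LTS"
    fi
  elif [ "$needs_newer_kernel" = "yes" ]; then
    # Disposable machine that needs a recent kernel or userspace.
    echo "interim or LTS with HWE kernel"
  else
    echo "LTS"
  fi
}
```

For example, `recommend_release no no yes` prints `interim or LTS with HWE kernel`, while any production answer of `yes` keeps the result on LTS.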
Upgrade strategy: from one Ubuntu generation to another
Ubuntu upgrades should be treated as infrastructure changes, not casual package updates. A release upgrade may change kernel, libraries, system services, defaults, packages, Python versions, OpenSSL behavior, firewall tooling or network configuration.
Safe upgrade process
1. Inventory
- server role
- Ubuntu version
- kernel version
- installed packages
- services
- external repositories
2. Prepare
- backup data
- snapshot VM
- export configs
- check disk space
- review release notes
3. Test
- clone staging
- upgrade staging
- run application tests
- validate logs and services
4. Execute
- schedule maintenance
- stop risky jobs
- upgrade
- reboot
- validate services
5. Verify
- application health
- network ports
- logs
- performance
- monitoring
6. Rollback if needed
- restore snapshot
- restore old image
- revert DNS or load balancer
Upgrade architecture
Current production server
│
├── snapshot / AMI / backup
├── package inventory
├── config export
└── staging clone
│
▼
Staging upgrade
│
├── do-release-upgrade
├── reboot
├── service validation
├── application tests
└── performance checks
│
▼
Production rollout
│
├── one node first
├── monitor
├── continue rollout
└── keep rollback window
Blue/green alternative
Instead of in-place upgrade:
1. Build new Ubuntu LTS image
2. Install application stack
3. Restore or connect data
4. Run smoke tests
5. Attach to load balancer
6. Shift traffic gradually
7. Keep old server as rollback
8. Retire old server after validation
Often safer for:
- web apps
- stateless APIs
- container hosts
- cloud workloads
Commands to identify version, support and upgrade state
Version inspection
# Ubuntu version
lsb_release -a
# OS release metadata
cat /etc/os-release
# Kernel version
uname -a
# Host and OS summary
hostnamectl
# Architecture
dpkg --print-architecture
# Check codename only
lsb_release -cs
Package maintenance
# Refresh package indexes
sudo apt update
# Show upgradeable packages
apt list --upgradable
# Upgrade installed packages
sudo apt upgrade
# Full upgrade with dependency changes
sudo apt full-upgrade
# Remove unused packages
sudo apt autoremove
# Check held packages
apt-mark showhold
Reboot and upgrade readiness
# Check if reboot is required
test -f /var/run/reboot-required && cat /var/run/reboot-required
# See packages requiring reboot if available
cat /var/run/reboot-required.pkgs 2>/dev/null
# Check disk space before upgrades
df -h
# Check package manager locks
ps aux | grep -E 'apt|dpkg'
# Repair interrupted package operation
sudo dpkg --configure -a
sudo apt -f install
Release upgrade
# Install release upgrade tool if missing
sudo apt install update-manager-core
# Check release upgrader configuration
cat /etc/update-manager/release-upgrades
# Start release upgrade
sudo do-release-upgrade
# Server session safety
sudo apt install screen
screen -S upgrade
sudo do-release-upgrade
Ubuntu versions in cloud images, AMIs and automation
In cloud environments, Ubuntu versioning becomes part of your infrastructure standard. Teams usually define a base image: Ubuntu LTS version, packages, users, SSH hardening, monitoring agent, logging agent, cloud-init behavior and security baseline.
| Cloud concept | Ubuntu version impact | Best practice |
|---|---|---|
| AMI / image | Defines OS baseline and package versions. | Use approved LTS image family. |
| cloud-init | Bootstraps users, packages and config. | Test with target LTS version. |
| Terraform | References image IDs or filters. | Avoid unpinned surprise changes in production. |
| Golden image | Pre-baked hardened server template. | Rebuild regularly with patches. |
| Autoscaling | New nodes inherit image baseline. | Validate image before rollout. |
| Patch management | Images age quickly if not rebuilt. | Rebuild and replace, not only patch in place. |
Cloud image lifecycle
Official Ubuntu LTS cloud image
│
▼
Golden image pipeline
│
├── install baseline packages
├── configure SSH
├── add monitoring agent
├── apply security hardening
├── apply updates
└── run validation tests
│
▼
Approved image
│
├── used by Terraform
├── used by autoscaling groups
├── used by Kubernetes nodes
└── used by application servers
│
▼
Periodic rebuild
├── security patches
├── config changes
└── new point release
Cloud version rules
Recommended:
- use LTS for production cloud VMs
- pin or control image selection
- rebuild images regularly
- test cloud-init on target release
- document image version
- keep rollback image available
- avoid unmanaged snowflake servers
Avoid:
- latest image without validation
- random Ubuntu versions across fleet
- old images with no patch process
- manual changes after boot with no automation
Version-related risks and anti-patterns
| Anti-pattern | Risk | Correction |
|---|---|---|
| Using interim release for long-lived production | Support ends quickly, forced upgrade under pressure. | Use LTS for production. |
| No inventory of Ubuntu versions | Unsupported servers remain hidden. | Maintain fleet inventory. |
| Ignoring release notes | Breaking changes surprise production. | Review release notes before upgrade. |
| Mixing many versions randomly | Automation, debugging and support become harder. | Define approved baselines. |
| No rollback image | Failed upgrade becomes long outage. | Snapshot or blue/green rollout. |
| Third-party repositories unmanaged | Upgrade conflicts and broken packages. | Audit external apt sources. |
| Kernel upgrade without reboot plan | Security patch is installed but not active. | Track reboot-required state. |
| Old LTS kept forever | Security and compliance risk grows. | Plan migration or use ESM as a temporary bridge. |
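Two of the corrections above, keeping a fleet inventory and tracking the reboot-required state, can start from a one-line report per host. The sketch below reads the standard `/etc/os-release` fields and checks for `/var/run/reboot-required`; `inventory_line` is a hypothetical helper, and both paths can be overridden for testing.

```shell
#!/usr/bin/env bash
# Sketch: print one inventory line with OS version and pending-reboot state.
# inventory_line is a hypothetical helper name.
inventory_line() {
  local os_release="${1:-/etc/os-release}"
  local reboot_flag="${2:-/var/run/reboot-required}"
  local reboot="no"
  if [ -f "$reboot_flag" ]; then
    reboot="yes"
  fi
  (
    # os-release is a shell-compatible KEY=value file; source it in a subshell
    # so its variables do not leak into the caller.
    . "$os_release"
    printf 'id=%s version=%s codename=%s reboot_required=%s\n' \
      "${ID:-unknown}" "${VERSION_ID:-unknown}" "${VERSION_CODENAME:-unknown}" "$reboot"
  )
}
```

Running `inventory_line` with no arguments on each server, for example over SSH from a management host, gives lines that can be collected into a central inventory.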
Version risk decision tree
Server has old Ubuntu version
│
▼
Is it still in standard support?
├── yes
│   ├── keep patched
│   └── plan future migration
│
└── no
    │
    ▼
Is ESM / Pro enabled and valid?
├── yes
│   ├── use as temporary bridge
│   └── plan upgrade
│
└── no
    │
    ▼
Risk is high
├── isolate if necessary
├── snapshot
├── test upgrade path
└── migrate or retire
Upgrade failure symptoms
After upgrade, check:
- service fails to start
- port no longer listens
- Python or PHP version changed
- OpenSSL behavior changed
- Nginx config warning becomes fatal
- database extension mismatch
- kernel module missing
- firewall rule behavior changed
- DNS resolution changed
- cloud-init or network config changed
Production checklist for Ubuntu version strategy
Version governance checklist
[ ] Approved Ubuntu LTS baseline is defined
[ ] Interim releases require explicit exception
[ ] Fleet inventory contains Ubuntu version
[ ] Fleet inventory contains kernel version
[ ] Support end dates are tracked
[ ] Old LTS migration plan exists
[ ] ESM / Pro usage is documented if used
[ ] Golden images are versioned
[ ] Cloud image selection is controlled
[ ] Third-party apt repositories are inventoried
[ ] Release notes are reviewed before upgrade
[ ] Staging upgrade test is mandatory
[ ] Rollback method is documented
[ ] Reboot policy exists for kernel updates
[ ] Patch cadence is documented
Minimum production baseline
Production Ubuntu baseline:
- LTS release
- latest approved point image
- security updates enabled
- patch window defined
- reboot policy defined
- monitored support end date
- standard package repositories
- controlled third-party repositories
- backup/snapshot before major upgrade
- staging validation before production rollout
Final decision summary
| Question | Answer |
|---|---|
| What should I use for production? | Ubuntu LTS. |
| Should I use the latest interim release on a server? | Only for a short-lived or explicitly justified case. |
| Should I standardize versions? | Yes, define one or two approved LTS baselines. |
| Should I upgrade in place? | Only with backup, staging test and rollback plan. |
| Is ESM a replacement for upgrading? | No, it is usually a bridge for long-lived systems. |
| What matters most? | Support horizon, patching, compatibility and rollback. |
Final rule
Choose LTS for stability, track support dates, patch regularly, test upgrades in staging, keep rollback images, and never let unsupported servers become invisible infrastructure.
Why the terminal matters
The terminal is the fastest and most precise way to operate Ubuntu. It gives direct access to files, processes, services, logs, permissions, packages, networking, storage and automation. On a server, there is often no graphical interface: SSH plus terminal is the normal administration model.
BASH is the default interactive shell for user accounts on Ubuntu, while scripts that call /bin/sh run under dash. It lets you run commands, chain them, inspect output, redirect logs, write scripts and automate repeatable tasks. A developer who understands BASH can deploy, debug and operate systems more effectively.
| Use case | Terminal advantage | Example |
|---|---|---|
| Server administration | Works remotely over SSH. | ssh deploy@server |
| Debugging | Direct logs and service state. | journalctl -u nginx |
| File operations | Fast navigation, copy, move, search. | find /var/log -name "*.log" |
| Automation | Repeatable scripts. | backup.sh, deploy.sh |
| Security | Precise control of users and permissions. | chmod, chown, sudo |
| Performance | Immediate resource inspection. | top, df -h, free -h |
Terminal control map
Ubuntu terminal
│
├── Files
│   ├── ls
│   ├── cd
│   ├── pwd
│   ├── cp
│   ├── mv
│   └── rm
│
├── Text and search
│   ├── cat
│   ├── less
│   ├── head
│   ├── tail
│   ├── grep
│   └── find
│
├── Permissions
│   ├── sudo
│   ├── chmod
│   ├── chown
│   ├── groups
│   └── id
│
├── System operations
│   ├── systemctl
│   ├── journalctl
│   ├── apt
│   └── ssh
│
└── Automation
    ├── variables
    ├── pipes
    ├── redirects
    ├── loops
    └── scripts
Mental model
Command anatomy:
command [options] [arguments]
Examples:
ls -lah /var/log
cp -a source destination
rm old-file.log
sudo systemctl restart nginx
Where:
- command = program to run
- options = behavior modifiers
- arguments = files, directories, services, values
BASH basics: prompt, paths, history, completion, pipes and redirects
BASH is both an interactive shell and a scripting language. It receives commands, expands variables, resolves paths, runs programs, connects outputs to inputs and lets you automate tasks through scripts.
| Concept | Meaning | Example |
|---|---|---|
| Prompt | Where you type commands. | user@host:~$ |
| Home directory | Your personal directory. | ~, /home/deploy |
| Current directory | Where commands operate by default. | pwd |
| Absolute path | Path from root /. | /var/log/syslog |
| Relative path | Path from current directory. | ../backup |
| History | Previous commands. | history |
| Tab completion | Auto-complete command or path. | Press TAB |
BASH essentials
# Show current directory
pwd
# Show current user
whoami
# Show command history
history
# Clear screen
clear
# Show login shell (the shell actually running is shown by: ps -p $$)
echo "$SHELL"
# Show environment variables
env
# Show PATH
echo $PATH
# Show command location
which bash
which python3
which nginx
Pipes and redirects
# Pipe output to another command
ps aux | grep nginx
# Redirect output to a file
ls -lah /var/log > files.txt
# Append output to a file
date >> audit.log
# Redirect output and errors to separate files
command > output.log 2> error.log
# Redirect output and errors together
command > all.log 2>&1
# View long output page by page
journalctl -u nginx | less
Useful keyboard shortcuts
| Shortcut | Action |
|---|---|
| TAB | Complete command or filename. |
| Ctrl + C | Interrupt current command. |
| Ctrl + L | Clear screen. |
| Ctrl + R | Search command history. |
| Ctrl + A | Move to beginning of line. |
| Ctrl + E | Move to end of line. |
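The building blocks from this section (a variable, a pipe and a redirect) can be combined in a few lines. The sketch below is illustrative only: `unique_error_count` and its two file arguments are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: combine a variable, a pipe and a redirect in one small function.
# unique_error_count is a hypothetical helper name.
unique_error_count() {
  local log_file="$1" report_file="$2"
  # Pipe: filter matching lines, then de-duplicate them.
  # Redirect: write the result to the report file instead of the terminal.
  grep -i "error" "$log_file" | sort -u > "$report_file"
  # Arithmetic expansion strips any padding that wc may print.
  echo $(( $(wc -l < "$report_file") ))
}
```

Called as `unique_error_count app.log errors.txt`, it leaves the distinct error lines in `errors.txt` and prints how many there are.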
Navigation: pwd, ls, cd and filesystem orientation
Navigation is the first terminal skill. You need to know where you are, what files are present, how to move between directories and how to distinguish absolute and relative paths.
| Command | Purpose | Example |
|---|---|---|
| pwd | Print current directory. | pwd |
| ls | List files. | ls |
| ls -lah | Detailed list, hidden files, human sizes. | ls -lah /etc |
| cd | Change directory. | cd /var/log |
| cd .. | Move to parent directory. | cd .. |
| cd ~ | Move to home directory. | cd ~ |
| cd - | Return to previous directory. | cd - |
Navigation examples
# Where am I?
pwd
# List current directory
ls
# Detailed list with hidden files
ls -lah
# Go to logs
cd /var/log
# Go home
cd ~
# Go one level up
cd ..
# Go to previous directory
cd -
# List directory without entering it
ls -lah /etc/nginx
Ubuntu filesystem map
/
├── etc     system configuration
├── home    user home directories
├── var     logs, cache, databases, runtime data
├── srv     service/application data
├── opt     optional third-party software
├── usr     installed programs and libraries
├── tmp     temporary files
├── boot    bootloader and kernel files
├── dev     device files
├── proc    process and kernel virtual filesystem
└── root    root user's home directory
Path examples
Absolute paths:
- /etc/nginx/nginx.conf
- /var/log/syslog
- /srv/myapp
- /home/deploy/.ssh/authorized_keys
Relative paths:
- ./script.sh
- ../backup
- logs/app.log
- ../../etc/example.conf
Special paths:
- . current directory
- .. parent directory
- ~ current user's home directory
- / filesystem root
rm, chmod and chown depend heavily on the path you give. Always verify with pwd and ls before destructive actions.
File operations: cp, mv, rm, mkdir, touch and safe handling
File operations are powerful and dangerous. Copying, moving and deleting files from the terminal is fast, but usually does not ask for confirmation unless you request it. In production, create backups before editing or deleting configuration files.
| Command | Purpose | Safe example |
|---|---|---|
| cp | Copy files. | cp file.txt file.bak |
| cp -a | Copy preserving metadata. | cp -a /etc/nginx /etc/nginx.bak |
| mv | Move or rename. | mv app.conf app.conf.disabled |
| rm | Remove file. | rm old.log |
| mkdir | Create directory. | mkdir -p /srv/myapp/logs |
| touch | Create empty file or update timestamp. | touch deploy.log |
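Because these commands rarely ask for confirmation, the "backup before editing" habit is worth wrapping in a helper. The following is a minimal sketch: `backup_file` is a hypothetical name, and the timestamp suffix is just one possible convention.

```shell
#!/usr/bin/env bash
# Sketch: copy a file to a timestamped backup next to the original
# and print the backup path. backup_file is a hypothetical helper name.
backup_file() {
  local target="$1" backup
  if [ ! -e "$target" ]; then
    echo "not found: $target" >&2
    return 1
  fi
  backup="${target}.bak.$(date +%Y%m%d-%H%M%S)"
  # -a preserves mode and timestamps (and ownership when run with enough privilege).
  cp -a -- "$target" "$backup" && echo "$backup"
}
```

Printing the backup path makes the result easy to record in a change log or to use later with `mv` as a rollback.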
File operation examples
# Create a directory tree
mkdir -p /srv/myapp/releases
# Create an empty file
touch /tmp/test.txt
# Copy a file
cp config.ini config.ini.bak
# Copy a directory with attributes
cp -a /etc/nginx /etc/nginx.bak.$(date +%Y%m%d-%H%M%S)
# Rename a file
mv old.conf new.conf
# Move a file to backup directory
mv app.log /tmp/app.log.bak
# Remove a file
rm old-file.txt
Danger zone: rm
# Remove one file
rm file.txt
# Ask before deleting
rm -i file.txt
# Remove directory recursively
rm -r directory
# Force recursive delete - dangerous
rm -rf directory
# Extremely dangerous if path is wrong
sudo rm -rf /some/path
Safe deletion workflow
Before deleting:
1. Show current directory
pwd
2. List target
ls -lah target
3. Check size if directory
du -sh target
4. Move to quarantine first
mv target /tmp/target.to-delete
5. Verify service still works
6. Delete later if safe
Backup-before-edit pattern
# Backup config before edit
sudo cp -a /etc/nginx/nginx.conf \
/etc/nginx/nginx.conf.bak.$(date +%Y%m%d-%H%M%S)
# Edit file
sudo vim /etc/nginx/nginx.conf
# Validate before reload
sudo nginx -t
# Reload if valid
sudo systemctl reload nginx
Read, inspect and search files: cat, less, head, tail, grep, find
Reading and searching files is a core Linux skill. Logs, configuration, service units, environment files and scripts are plain text. The right command depends on file size and whether you need the beginning, the end, live follow or keyword search.
| Command | Best for | Example |
|---|---|---|
| cat | Small files. | cat /etc/os-release |
| less | Large files, page navigation. | less /var/log/syslog |
| head | First lines of a file. | head -50 app.log |
| tail | Last lines of a file. | tail -100 app.log |
| tail -f | Follow a log live. | tail -f /var/log/syslog |
| grep | Search text. | grep -i error app.log |
| find | Find files by name, size, age. | find /var/log -name "*.log" |
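When a log is large, the first matching line is often more useful than every match. A small sketch of that idea, using only standard `grep` options; `first_match` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the first line matching a pattern, prefixed with its line number.
# first_match is a hypothetical helper name.
first_match() {
  local pattern="$1" file="$2"
  # -n adds line numbers, -i ignores case, -m 1 stops after the first match.
  grep -n -i -m 1 -- "$pattern" "$file"
}
```

The line number in the output can then be used to open the file at that position, for example with `less +N file`, where N is the reported line.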
Read commands
# Small file
cat /etc/os-release
# Large file, scroll
less /var/log/syslog
# First lines
head -50 /var/log/syslog
# Last lines
tail -100 /var/log/syslog
# Follow live
tail -f /var/log/syslog
# Number lines
nl config.ini | less
Search examples
# Case-insensitive search
grep -i "error" app.log
# Search recursively
grep -R "server_name" /etc/nginx
# Show line numbers
grep -n "listen" /etc/nginx/sites-enabled/*
# Exclude noisy files
grep -R "DEBUG" /srv/myapp --exclude="*.pyc"
# Search compressed logs
zgrep -i "error" /var/log/syslog.*.gz
# Find files by name
find /etc -name "*.conf"
# Find large files
find /var -type f -size +100M -exec ls -lh {} \;
# Find recently modified files
find /etc -type f -mtime -2 -ls
Log reading pattern
When debugging logs:
1. Identify time window
2. Read service-specific logs first
3. Search for first real error
4. Correlate with recent deploy/update
5. Avoid reading huge files without filters
Examples:
journalctl -u nginx --since "30 min ago"
grep -i "permission denied" app.log
grep -i "connection refused" app.log
Use less for large files, tail for recent lines, grep for patterns, and find for unknown locations.
sudo: super-user privileges and safe administration
sudo runs a command with elevated privileges, usually as root. On Ubuntu, normal users do not directly administer protected system areas. Instead, trusted users are added to the sudo group and elevate only when needed.
| Command | Meaning | Example |
|---|---|---|
| sudo command | Run one command as root. | sudo apt update |
| sudo -l | List allowed sudo commands. | sudo -l |
| sudo -u user command | Run command as another user. | sudo -u postgres psql |
| sudo -i | Start root login shell. | Use rarely and carefully. |
| visudo | Edit sudoers safely. | sudo visudo |
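Before relying on sudo, it can help to confirm group membership without elevating at all. The sketch below is one possible check; the helper name `in_group` is hypothetical, and it only inspects group names as reported by `id`, not the full sudoers policy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if USER belongs to GROUP (GROUP defaults to "sudo").
in_group() {
  user=$1
  group=${2:-sudo}
  # id -nG prints the user's group names separated by spaces
  id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$group"
}

me=$(id -un)
if in_group "$me"; then
  echo "$me is in the sudo group"
else
  echo "$me is not in the sudo group"
fi
```

Membership in the sudo group is only part of the picture: `sudo -l` remains the authoritative view of what the sudoers policy allows.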
sudo examples
# Update packages
sudo apt update
# Edit protected config
sudo vim /etc/ssh/sshd_config
# Restart service
sudo systemctl restart nginx
# Read protected log
sudo tail -100 /var/log/auth.log
# Run command as postgres user
sudo -u postgres psql
# Check your sudo privileges
sudo -l
sudo mental model
Normal user
│
├── can read/write own files
├── cannot edit system files
├── cannot restart system services
└── cannot install packages
│
▼
sudo
│
├── asks for authentication
├── checks sudoers policy
├── logs action
└── runs command with elevated privilege
User sudo setup
# Create user
sudo adduser deploy
# Add to sudo group
sudo usermod -aG sudo deploy
# Check group membership
groups deploy
# Show sudo group
getent group sudo
# Edit sudoers safely
sudo visudo
sudo safety rules
Do:
- use sudo for specific commands
- keep named admin users
- review sudo group members
- use visudo for sudoers edits
- log administrative changes
Avoid:
- logging in directly as root
- running long sessions as root
- using sudo with unknown scripts
- using sudo rm -rf without path verification
- granting sudo to every user
sudo is not just “permission accepted”. It is root-level control. Treat every sudo command as potentially system-changing.
chmod: file permission modes and practical examples
chmod changes file permissions. Linux permissions are split into three groups: owner, group and others. Each can have read, write and execute permissions. For directories, execute means “can enter/traverse”.
Permission notation
Example:
-rw-r--r-- 1 root root 1200 app.conf
Breakdown:
- file type
rw- owner permissions
r-- group permissions
r-- others permissions
r = read
w = write
x = execute / enter directory
| Mode | Meaning | Typical use |
|---|---|---|
| 600 | Owner read/write only. | Private keys, secret files. |
| 640 | Owner read/write, group read. | App env files readable by service group. |
| 644 | Owner read/write, everyone else read. | Normal config or static files. |
| 700 | Owner full access only. | .ssh directory. |
| 755 | Owner full access, everyone else read/execute. | Directories and public scripts. |
| 777 | Everyone can read/write/execute. | Almost never acceptable. |
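Each digit in the modes above is the sum of read (4), write (2) and execute (1). The following sketch decodes a numeric mode into the rwx string shown by `ls -l`; the function name `mode_to_rwx` is hypothetical and it handles plain three-digit modes only (no setuid, setgid or sticky bits).

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an octal mode such as 755 into rwxr-xr-x.
mode_to_rwx() {
  mode=$1
  out=""
  while [ -n "$mode" ]; do
    rest=${mode#?}          # everything after the first digit
    digit=${mode%"$rest"}   # the first digit itself
    mode=$rest
    # Test each permission bit: 4 = read, 2 = write, 1 = execute
    if [ $((digit & 4)) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
    if [ $((digit & 2)) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
    if [ $((digit & 1)) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
  done
  printf '%s\n' "$out"
}

mode_to_rwx 755   # prints: rwxr-xr-x
mode_to_rwx 640   # prints: rw-r-----
```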
chmod examples
# Normal file readable by everyone, writable by owner
chmod 644 config.ini
# Directory accessible by everyone, writable by owner
chmod 755 /srv/myapp
# Private SSH directory
chmod 700 ~/.ssh
# Private SSH key
chmod 600 ~/.ssh/id_ed25519
# Authorized keys
chmod 600 ~/.ssh/authorized_keys
# Make script executable
chmod +x deploy.sh
# Remove write access for group and others
chmod go-w file.txt
Numeric mode logic
Permission values:
r = 4
w = 2
x = 1
Examples:
7 = 4 + 2 + 1 = rwx
6 = 4 + 2 = rw-
5 = 4 + 1 = r-x
4 = 4 = r--
chmod 755:
owner = 7 = rwx
group = 5 = r-x
other = 5 = r-x
chmod 640:
owner = 6 = rw-
group = 4 = r--
other = 0 = ---
Permission troubleshooting
# Show permissions
ls -lah file
# Show full path permissions
namei -l /srv/myapp/current/.env
# Show current user groups
id
# Test as service user
sudo -u myapp cat /srv/myapp/.env
Do not use chmod 777 to “fix” access. It hides the real ownership problem and creates a security risk.
chown: file ownership, groups and service users
chown changes file owner and group. Many permission errors are not caused by missing chmod, but by wrong ownership. Services such as Nginx, Gunicorn, PostgreSQL or application workers must be able to read the files they need, but should not own everything as root.
| Command | Meaning | Example |
|---|---|---|
| chown user file | Change owner. | sudo chown deploy app.log |
| chown user:group file | Change owner and group. | sudo chown deploy:www-data app |
| chown :group file | Change group only. | sudo chown :www-data static |
| chown -R | Recursive ownership change. | Use carefully on directories. |
| id user | Show UID, GID and groups. | id myapp |
chown examples
# Change one file owner
sudo chown deploy app.log
# Change owner and group
sudo chown deploy:www-data /srv/myapp
# Change group only
sudo chown :www-data /srv/myapp/static
# Recursive change, use carefully
sudo chown -R deploy:www-data /srv/myapp
# App env file owned by root, readable by app group
sudo chown root:myapp /srv/myapp/.env
sudo chmod 640 /srv/myapp/.env
Ownership model for web app
/srv/myapp
│
├── code files
│   ├── owner: deploy
│   └── group: www-data
│
├── static files
│   ├── readable by nginx
│   └── not writable by public users
│
├── .env secrets
│   ├── owner: root
│   ├── group: myapp
│   └── mode: 640
│
└── runtime logs/uploads
    ├── owner: myapp
    └── controlled write access
Service user checks
# Show service user in unit file
systemctl cat myapp
# Show process user
ps aux | grep gunicorn
# Check user groups
id myapp
# Check path permissions
namei -l /srv/myapp/current/.env
# Test access as service user
sudo -u myapp test -r /srv/myapp/.env && echo readable
Common ownership mistakes
| Mistake | Consequence | Better approach |
|---|---|---|
| Everything owned by root | App cannot write needed runtime files. | Use service-specific owner/group. |
| Everything owned by app user | App can modify its own code/secrets. | Separate code, secrets and runtime dirs. |
| Recursive chown on wrong path | System or app permissions broken. | Verify path with pwd and ls. |
| Using chmod instead of chown | Permissions become too broad. | Fix ownership first. |
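Since several of these mistakes come from changing permissions before looking at ownership, a quick read-only report can be a useful first step. The sketch below assumes GNU `stat` (the default on Ubuntu); the helper name `describe_path` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "owner:group mode path" for one path.
# Relies on GNU stat format codes: %U owner, %G group, %a octal mode, %n name.
describe_path() {
  stat -c '%U:%G %a %n' -- "$1"
}

# Demo on a throwaway file
demo=$(mktemp)
chmod 640 "$demo"
describe_path "$demo"
rm -f "$demo"
```

Reading owner, group and mode together usually shows whether the fix belongs to chown or chmod.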
Safety patterns: avoid destructive mistakes
The terminal is powerful because it does exactly what you ask. That also makes it dangerous. Professional terminal usage means verifying targets, backing up before edits, validating configs before restart and avoiding irreversible commands when a reversible action is possible.
| Risky action | Safer pattern | Reason |
|---|---|---|
| Delete directly with rm -rf | Move to quarantine first. | Allows rollback. |
| Edit config without backup | cp -a file file.bak.DATE | Easy restore. |
| Restart service blindly | Validate config and read logs first. | Avoid making outage worse. |
| Recursive chmod/chown on broad path | Check target with pwd, ls, du. | Prevents system-wide damage. |
| Run unknown script with sudo | Download, inspect, verify, then run. | Supply-chain safety. |
| Disable SSH password auth immediately | Test SSH key in second terminal first. | Prevents lockout. |
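The backup-before-edit row can be wrapped in a small function so the timestamped copy is never skipped. This is a minimal sketch; the name `backup_file` is hypothetical, and for files under /etc the call would need sudo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy FILE to FILE.bak.YYYYMMDD-HHMMSS, preserving
# mode, ownership and timestamps, then print the backup path.
backup_file() {
  src=$1
  dest="$src.bak.$(date +%Y%m%d-%H%M%S)"
  cp -a -- "$src" "$dest" && printf '%s\n' "$dest"
}

# Demo on a throwaway file
demo=$(mktemp)
echo "listen 80;" > "$demo"
backup=$(backup_file "$demo")
echo "backup written to $backup"
rm -f "$demo" "$backup"
```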
Safe config edit workflow
# 1. Backup
sudo cp -a /etc/ssh/sshd_config \
/etc/ssh/sshd_config.bak.$(date +%Y%m%d-%H%M%S)
# 2. Edit
sudo vim /etc/ssh/sshd_config
# 3. Validate
sudo sshd -t
# 4. Restart only if valid
sudo systemctl restart ssh
# 5. Check logs
journalctl -u ssh --since "10 min ago"
Dangerous command patterns
Dangerous:
sudo rm -rf /
sudo rm -rf *
sudo chown -R user:user /
sudo chmod -R 777 /
sudo chmod -R 777 /var/www
curl URL | sudo bash
sudo mv /etc /tmp
docker compose down -v
Safer:
- verify path first
- backup first
- move instead of delete
- inspect scripts
- target exact directory
- keep rollback possible
Pre-flight checklist
Before destructive command:
[ ] Am I on the right server?
[ ] Am I in the right directory?
[ ] Did I list the target?
[ ] Did I check the size?
[ ] Do I have a backup?
[ ] Can I rollback?
[ ] Is the command scoped enough?
[ ] Am I using sudo unnecessarily?
[ ] Did I understand wildcard expansion?
[ ] Did I test on staging if production?
Know your context
# Confirm server
hostnamectl
# Confirm user
whoami
# Confirm directory
pwd
# Confirm target
ls -lah target
# Confirm disk and space
df -h
du -sh target
# Confirm command before sudo
echo sudo rm -rf target
Terminal and permissions cheat sheet
Fundamental commands
# Navigation
pwd
ls
ls -lah
cd /path
cd ..
cd ~
cd -
# Files
cp file file.bak
cp -a dir dir.bak
mv old new
rm file
rm -i file
mkdir -p path/to/dir
touch file
# Read and search
cat file
less file
head -50 file
tail -100 file
tail -f file
grep -i "error" file
find /path -name "*.log"
# Context
whoami
id
hostnamectl
history
which command
sudo, chmod, chown cheat sheet
# sudo
sudo apt update
sudo systemctl restart nginx
sudo -l
sudo -u postgres psql
sudo visudo
# chmod
chmod 644 file
chmod 640 secret.env
chmod 755 directory
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_ed25519
chmod +x script.sh
# chown
sudo chown user file
sudo chown user:group file
sudo chown :group file
sudo chown -R user:group directory
# Diagnose permissions
ls -lah file
namei -l /path/to/file
id user
groups user
Final rule
Use BASH to navigate, inspect, modify, automate and troubleshoot. Use sudo only when required. Use chmod for permissions, chown for ownership, and always verify the target before destructive commands.
Minimal professional reflexes
[ ] I know where I am with pwd
[ ] I inspect before changing with ls -lah
[ ] I backup config files before editing
[ ] I validate configs before restart
[ ] I avoid chmod 777
[ ] I understand sudo impact
[ ] I test SSH access before hardening
[ ] I move risky files before deleting
[ ] I use logs before restarting blindly
[ ] I document production changes
Maintenance and security objective
Ubuntu maintenance is the set of recurring actions that keep a system secure, stable, recoverable and understandable. It includes package updates, security patching, reboot planning, firewall control, restore points, backups, log review and incident diagnosis.
On a desktop, maintenance protects the user from data loss and broken upgrades. On a server, maintenance protects services from outages, vulnerabilities, full disks, misconfiguration and unrecoverable incidents.
| Area | Purpose | Main tools | Failure prevented |
|---|---|---|---|
| System updates | Apply fixes and security patches. | apt, Software Updater, unattended upgrades. | Known vulnerabilities, outdated packages. |
| Firewall | Limit network exposure. | ufw, security groups, router firewall. | Open services reachable from outside. |
| Restore points | Return system state after bad change. | Timeshift, snapshots. | Broken updates, bad configuration. |
| Backups | Protect personal or business data. | rsync, external disk, cloud backup, database dumps. | Data loss, disk failure, accidental deletion. |
| Logs | Understand what happened. | journalctl, /var/log, app logs. | Blind troubleshooting and repeated incidents. |
| Routine checks | Detect problems before they grow. | df, systemctl, journalctl. | Full disk, failed services, unnoticed errors. |
Maintenance architecture map
Ubuntu maintenance
│
├── Updates
│   ├── apt update
│   ├── apt upgrade
│   ├── security fixes
│   └── reboot policy
│
├── Firewall
│   ├── default deny incoming
│   ├── allow required services
│   ├── restrict SSH
│   └── review open ports
│
├── Recovery
│   ├── Timeshift restore points
│   ├── backups
│   ├── snapshots
│   └── restore testing
│
├── Logs
│   ├── journalctl
│   ├── auth logs
│   ├── system logs
│   └── app logs
│
└── Routine
    ├── weekly checks
    ├── monthly cleanup
    ├── update review
    └── documentation
Desktop vs server emphasis
| Context | Priority | Example |
|---|---|---|
| Desktop | Restore points, data backup, safe updates. | Timeshift before big upgrade. |
| Server | Security patches, firewall, monitoring, backups. | Patch window and reboot plan. |
| Cloud VM | Snapshots, security groups, logs, replacement. | AMI or EBS snapshot before change. |
| Developer workstation | Tool updates, project backup, SSH keys. | Backup home and dotfiles. |
System updates: why and how to perform them
System updates fix bugs, close security vulnerabilities, improve hardware support and keep installed packages consistent with the Ubuntu release. Updates should be frequent enough to reduce exposure, but controlled enough to avoid surprise downtime on important machines.
| Command | Purpose | When to use |
|---|---|---|
| sudo apt update | Refresh package metadata. | Before installing or upgrading packages. |
| apt list --upgradable | Show available upgrades. | Before applying updates. |
| sudo apt upgrade | Upgrade packages safely without removals. | Regular maintenance. |
| sudo apt full-upgrade | Upgrade with dependency changes. | When upgrade requires installs/removals. |
| sudo apt autoremove | Remove unused dependencies. | After upgrades or package removals. |
| sudo apt clean | Clean package cache. | When disk cleanup is needed. |
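After an upgrade, Ubuntu signals a pending reboot with the flag file /var/run/reboot-required. The sketch below wraps that check in a function; the name `reboot_status` and the optional path argument (useful for testing) are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the reboot flag file exists.
# The optional argument overrides the default flag path.
reboot_status() {
  flag=${1:-/var/run/reboot-required}
  if [ -f "$flag" ]; then
    echo "reboot required"
  else
    echo "no reboot flag"
  fi
}

reboot_status
```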
Standard update flow
# 1. Refresh package metadata
sudo apt update
# 2. Review available upgrades
apt list --upgradable
# 3. Apply regular upgrades
sudo apt upgrade
# 4. Remove unused packages
sudo apt autoremove
# 5. Check if reboot is required
test -f /var/run/reboot-required && cat /var/run/reboot-required
# 6. Verify system state
systemctl --failed
journalctl -p warning --since "30 min ago"
Update decision diagram
Updates available
│
├── Desktop workstation?
│   ├── create Timeshift snapshot if major change
│   ├── apply updates
│   └── reboot if required
│
├── Production server?
│   ├── review packages
│   ├── confirm backup/snapshot
│   ├── test staging if critical
│   ├── schedule maintenance window
│   └── apply and verify
│
└── Cloud VM?
    ├── snapshot or image
    ├── patch
    ├── reboot if required
    └── validate health checks
Reboot-required checks
# Check if reboot is required
test -f /var/run/reboot-required && cat /var/run/reboot-required || echo "no reboot flag"
# Show packages that requested reboot if available
cat /var/run/reboot-required.pkgs 2>/dev/null
# Current kernel
uname -a
# Boot time
uptime
last reboot | head
Graphical update path
Ubuntu Desktop
│
├── Open Software Updater
├── Review proposed updates
├── Install updates
├── Reboot if requested
└── Verify desktop and main apps
Check /var/run/reboot-required after maintenance.
Update strategy: safe patching, unattended upgrades and rollback
A good update strategy balances speed and safety. Security updates should not be postponed indefinitely, but critical systems need backups, staging tests and rollback paths. The more important the machine, the more controlled the update process must be.
| Strategy | Best for | Strength | Risk |
|---|---|---|---|
| Manual updates | Personal desktop, small servers. | Human review before changes. | Can be forgotten. |
| Unattended security upgrades | Standard servers. | Faster security patching. | Needs reboot policy. |
| Scheduled patch window | Production systems. | Predictable maintenance. | Emergency patches still need fast track. |
| Snapshot before update | Desktop, VM, cloud instances. | Rollback-friendly. | Snapshot does not replace data backup. |
| Blue/green replacement | Cloud application servers. | Safer than in-place update. | Requires automation. |
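For the unattended security upgrades strategy, the switch normally lives in /etc/apt/apt.conf.d/20auto-upgrades as `APT::Periodic::Unattended-Upgrade "1";`. The sketch below checks for that line; the function name `auto_upgrades_enabled` is hypothetical, and it reads a single file only, so settings overridden elsewhere under apt.conf.d are not detected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given APT config file enables
# unattended upgrades. Only this one file is inspected.
auto_upgrades_enabled() {
  conf=${1:-/etc/apt/apt.conf.d/20auto-upgrades}
  grep -Eq '^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]+"1";' "$conf" 2>/dev/null
}

if auto_upgrades_enabled; then
  echo "unattended upgrades: enabled"
else
  echo "unattended upgrades: not enabled in 20auto-upgrades"
fi
```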
Unattended upgrades
# Install unattended upgrades
sudo apt update
sudo apt install unattended-upgrades
# Enable basic automatic security updates
sudo dpkg-reconfigure unattended-upgrades
# Main configuration files
/etc/apt/apt.conf.d/20auto-upgrades
/etc/apt/apt.conf.d/50unattended-upgrades
# Logs
sudo less /var/log/unattended-upgrades/unattended-upgrades.log
Safe production patch workflow
Patch workflow
│
├── Inventory
│   ├── OS version
│   ├── kernel version
│   ├── critical packages
│   └── running services
│
├── Protect
│   ├── backup
│   ├── snapshot
│   ├── Timeshift on desktop
│   └── rollback plan
│
├── Apply
│   ├── apt update
│   ├── review upgrades
│   ├── apt upgrade
│   └── reboot if required
│
└── Verify
    ├── systemctl --failed
    ├── logs
    ├── ports
    ├── app health
    └── user validation
Post-update validation
# Failed services
systemctl --failed
# Recent warnings
journalctl -p warning --since "30 min ago"
# Listening ports
ss -lntp
# Disk and memory
df -h
free -h
# Web smoke test
curl -I http://localhost
# Package history
less /var/log/apt/history.log
Update risk matrix
| Update type | Risk | Control |
|---|---|---|
| Kernel | Requires reboot, driver risk. | Snapshot and reboot window. |
| OpenSSL / libc | Service restart may be needed. | Restart affected services. |
| Database packages | Service compatibility. | Backup and staging test. |
| Nginx / SSH | Access or web outage if config breaks. | Validate config before restart. |
Basic security with UFW firewall
UFW is Ubuntu’s simple firewall interface. It helps expose only the ports required by the machine. A safe default is to deny incoming traffic, allow outgoing traffic, then explicitly allow SSH, web traffic or other required services.
| Port | Service | Typical exposure | Comment |
|---|---|---|---|
| 22/tcp | SSH | Restricted source IP if possible. | Administration access. |
| 80/tcp | HTTP | Public only for web server or redirect. | Often redirects to HTTPS. |
| 443/tcp | HTTPS | Public for web application. | Main public web port. |
| 3306/tcp | MySQL / MariaDB | Private only. | Never expose casually. |
| 5432/tcp | PostgreSQL | Private only. | Restrict to app server. |
| 6379/tcp | Redis | Private only. | Should not be public. |
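To compare the table with reality, it helps to list which TCP ports are actually listening on all interfaces, since those are the ones a firewall rule could expose. The sketch below filters `ss -lntH` output; the function name `wildcard_ports` is hypothetical, and it assumes the local address is the fourth column, as in current iproute2 output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ss -lntH` output on stdin and print the ports
# bound to a wildcard address (0.0.0.0, * or [::]), one per line.
wildcard_ports() {
  awk '$4 ~ /^(0\.0\.0\.0|\*|\[::\]):[0-9]+$/ {
         n = split($4, parts, ":")   # the port is the last colon-separated part
         print parts[n]
       }' | sort -un
}

if command -v ss >/dev/null 2>&1; then
  ss -lntH | wildcard_ports
fi
```

A database port such as 5432 appearing in this list would be a reason to bind the service to 127.0.0.1 or restrict it by source before anything else.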
UFW baseline
# Check current firewall status
sudo ufw status verbose
# Default policies
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Allow SSH before enabling firewall
sudo ufw allow OpenSSH
# Allow web traffic if needed
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Enable firewall
sudo ufw enable
# Verify rules
sudo ufw status verbose
sudo ufw status numbered
Firewall decision diagram
New service installed
│
├── Does it need network access?
│   ├── no -> keep local only
│   └── yes
│
├── Should it be public?
│   ├── yes -> open exact required port
│   └── no
│
├── Should it be private?
│   ├── yes -> restrict by source IP or subnet
│   └── no
│
└── Is rule documented?
    ├── yes -> apply rule
    └── no -> do not expose yet
Restrict by source
# Allow SSH from one admin IP
sudo ufw allow from 203.0.113.10 to any port 22 proto tcp
# Allow PostgreSQL from one app server
sudo ufw allow from 10.0.1.25 to any port 5432 proto tcp
# Delete a numbered rule
sudo ufw status numbered
sudo ufw delete 3
# Deny a specific IP
sudo ufw deny from 198.51.100.44
UFW troubleshooting
# Firewall status
sudo ufw status verbose
# Listening ports
ss -lntp
# Local service test
curl -I http://localhost
# Kernel firewall logs if enabled
sudo ufw logging on
sudo journalctl -k --since "30 min ago" | grep UFW
Timeshift: system restore points for safer changes
Timeshift creates system restore points. It is useful on Ubuntu Desktop and some workstation scenarios before major updates, driver changes, package experiments or risky configuration changes. It is not a full personal-data backup solution by itself: it mainly protects system state.
| Timeshift concept | Meaning | Operational note |
|---|---|---|
| Snapshot | Restore point of system files. | Useful before upgrades. |
| RSYNC mode | File-based snapshot mode. | Works on common filesystems. |
| BTRFS mode | Filesystem snapshot mode. | Requires BTRFS layout. |
| Schedule | Automatic snapshot frequency. | Daily, weekly, monthly policies. |
| Restore | Return system to previous state. | Can recover from bad update or config. |
| Exclusions | Paths not included. | Understand home/data behavior. |
Install Timeshift
# Install Timeshift
sudo apt update
sudo apt install timeshift
# Launch graphical interface
sudo timeshift-gtk
# CLI help
timeshift --help
# List snapshots
sudo timeshift --list
Timeshift workflow
Before risky change
│
├── Open Timeshift
├── Create snapshot
├── Name or comment the snapshot
├── Apply update or configuration change
├── Reboot if required
├── Verify system works
└── Keep or delete snapshot later
When to create a snapshot
Create a Timeshift snapshot before:
- major system update
- release upgrade
- driver installation
- desktop environment change
- kernel experiment
- repository or PPA experiment
- risky configuration edit
- important package removal
Timeshift vs backup
| Need | Timeshift | Data backup |
|---|---|---|
| Restore broken system update | Excellent. | Not primary role. |
| Recover deleted personal file | Not always sufficient. | Best tool. |
| Recover from disk failure | Only if snapshot stored elsewhere. | Required. |
| Recover database state | Not ideal. | Use database backup. |
Backup model: system restore, personal data and server data
A complete protection strategy separates system restore from data backup. Timeshift can help restore the OS state. Personal files, project folders, databases, uploads, secrets and configuration must also be backed up separately.
| Data type | Recommended protection | Example path |
|---|---|---|
| System files | Timeshift or VM snapshot. | /etc, packages, system state. |
| Personal files | File backup to external disk or cloud. | /home/user/Documents |
| Project code | Git remote and file backup. | /home/user/projects |
| Databases | Database-native dump and volume backup. | PostgreSQL, MySQL, MariaDB. |
| Application uploads | File backup with retention. | /srv/app/media |
| Secrets | Secure secret backup or vault. | .env, keys, certificates. |
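A file backup is only useful if the archive can be read back. The sketch below uses tar rather than rsync, purely because a single archive is easy to verify: it writes a timestamped archive and lists it as a basic readability check. The function name `backup_dir` and the paths in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive SRC into DESTDIR as NAME-YYYYMMDD-HHMMSS.tar.gz,
# verify that the archive can be listed, then print its path.
backup_dir() {
  src=$1
  destdir=$2
  archive="$destdir/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  tar -tzf "$archive" > /dev/null || return 1   # basic readability check
  printf '%s\n' "$archive"
}

# Example call (paths are placeholders):
# backup_dir /srv/app/media /backup
```

Listing the archive is a weaker check than a real restore test, which still needs to be done periodically.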
Simple rsync backup example
# Backup home directory to external disk
rsync -aHAX --info=progress2 \
/home/user/ \
/media/user/backup/home-user/
# Backup project directory
rsync -a --delete \
/srv/myapp/ \
/backup/myapp/
# Dry run first
rsync -a --dry-run /source/ /destination/
Backup strategy diagram
Protection strategy
│
├── System restore
│   ├── Timeshift
│   ├── VM snapshot
│   └── cloud image
│
├── Data backup
│   ├── documents
│   ├── projects
│   ├── uploads
│   └── databases
│
├── Configuration backup
│   ├── /etc
│   ├── service units
│   ├── nginx configs
│   └── SSH configs
│
└── Restore test
    ├── can files be restored?
    ├── can database be restored?
    ├── can server boot?
    └── is procedure documented?
Database backup examples
# PostgreSQL dump
pg_dump -U app_user -h localhost app_db > app_db.sql
# PostgreSQL compressed dump
pg_dump -U app_user -h localhost app_db | gzip > app_db.sql.gz
# MySQL / MariaDB dump
mysqldump -u app_user -p app_db > app_db.sql
# MySQL / MariaDB compressed dump
mysqldump -u app_user -p app_db | gzip > app_db.sql.gz
Backup quality checklist
[ ] Backup is automatic
[ ] Backup includes data, not only system files
[ ] Backup destination is separate from source disk
[ ] Backup has retention policy
[ ] Backup is encrypted if sensitive
[ ] Restore has been tested
[ ] Database backups are consistent
[ ] Secrets are protected
[ ] Backup logs are reviewed
[ ] Owner and schedule are documented
Log management: reading system journals when problems occur
Logs are the first source of truth when Ubuntu behaves unexpectedly. They show service failures, authentication attempts, package operations, kernel events, disk errors, network issues and application errors.
| Log source | Contains | Command |
|---|---|---|
| systemd journal | Service and system events. | journalctl |
| Service logs | One daemon timeline. | journalctl -u SERVICE |
| Kernel logs | OOM, disk, driver, hardware events. | journalctl -k |
| Authentication logs | SSH, sudo, login attempts. | /var/log/auth.log |
| System log | General system messages. | /var/log/syslog |
| APT logs | Package updates and installs. | /var/log/apt/history.log |
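As an example of turning an authentication log into something readable, the sketch below counts failed SSH password attempts per source IP. It assumes the usual sshd message shape ("Failed password for ... from IP port N ssh2"); the function name `failed_ssh_by_ip` and the sample lines are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read auth log lines on stdin and print
# "count ip" for failed password attempts, most frequent first.
failed_ssh_by_ip() {
  grep -i "failed password" |
    awk '{ for (i = 1; i < NF; i++) if ($i == "from") { print $(i + 1); break } }' |
    sort | uniq -c | sort -rn |
    awk '{ print $1, $2 }'
}

# Demo with made-up log lines; on a real host the input would be
# /var/log/auth.log (read with sudo).
failed_ssh_by_ip <<'EOF'
Jan 10 10:00:01 host sshd[101]: Failed password for root from 203.0.113.5 port 4242 ssh2
Jan 10 10:00:03 host sshd[102]: Failed password for invalid user admin from 203.0.113.5 port 4243 ssh2
Jan 10 10:00:09 host sshd[103]: Accepted publickey for deploy from 198.51.100.7 port 5000 ssh2
Jan 10 10:01:00 host sshd[104]: Failed password for root from 198.51.100.44 port 6000 ssh2
EOF
```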
journalctl essentials
# Recent errors and context
journalctl -xe
# Current boot logs
journalctl -b
# Previous boot logs
journalctl -b -1
# Service logs
journalctl -u nginx
# Service logs with time window
journalctl -u nginx --since "1 hour ago"
# Follow service logs live
journalctl -u nginx -f
# Warnings and errors
journalctl -p warning --since today
# Kernel logs
journalctl -k --since today
Classic log commands
# System log
sudo tail -200 /var/log/syslog
# Authentication log
sudo tail -200 /var/log/auth.log
# APT history
less /var/log/apt/history.log
# Search errors
grep -i "error" /var/log/syslog
# Search failed SSH attempts
sudo grep -i "failed password" /var/log/auth.log | tail -100
# Search sudo usage
sudo grep -i "sudo" /var/log/auth.log | tail -100
# Search OOM events
journalctl -k --since today | grep -i -E "oom|killed process"
Log reading workflow
Problem detected
│
├── Identify time window
├── Check failed services
├── Read service journal
├── Read system warnings
├── Read kernel logs
├── Check auth logs if access issue
├── Check apt history if after update
└── Find first meaningful error
Journal size control
# Show journal disk usage
journalctl --disk-usage
# Keep only last 14 days
sudo journalctl --vacuum-time=14d
# Keep journal under 1 GB
sudo journalctl --vacuum-size=1G
--since "30 min ago" is often more useful than reading thousands of old lines.
Troubleshooting maintenance problems
Maintenance can fail: updates may be interrupted, repositories may break, firewall rules may block access, Timeshift snapshots may fill disk space, logs may grow, or services may fail after a package upgrade. Diagnose from the exact symptom.
| Symptom | Likely cause | First command | Fix direction |
|---|---|---|---|
| APT locked | Another package process running. | ps aux \| grep -E 'apt\|dpkg' | Wait or investigate process. |
| Broken packages | Interrupted install. | sudo dpkg --configure -a | Repair package state. |
| No network after UFW | Required port blocked. | sudo ufw status numbered | Allow required rule or rollback. |
| SSH locked out | Firewall or SSH config error. | Console access, UFW and SSH status. | Restore SSH path safely. |
| Disk full | Logs, snapshots, cache, Docker. | df -h, du -sh | Clean safely and add retention. |
| Service failed after update | Config change or dependency issue. | systemctl status SERVICE | Read logs, rollback or fix config. |
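For the disk-full row, a threshold filter over `df -P` output gives an early warning before cleanup becomes urgent. This is a minimal sketch; the function name `over_threshold` and the 90% limit are assumptions, not recommended values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `df -P` output on stdin and print
# "mountpoint usage%" for filesystems at or above LIMIT percent.
over_threshold() {
  limit=$1
  awk -v limit="$limit" 'NR > 1 {
    use = $5
    sub(/%/, "", use)               # strip the percent sign
    if (use + 0 >= limit + 0) print $6, use "%"
  }'
}

df -P | over_threshold 90
```

The parsing assumes mount points without spaces, which holds for typical server layouts.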
APT repair commands
# Finish interrupted package configuration
sudo dpkg --configure -a
# Fix broken dependencies
sudo apt -f install
# Refresh metadata
sudo apt update
# Clean package cache
sudo apt clean
# Remove unused dependencies
sudo apt autoremove
# Review update history
less /var/log/apt/history.log
less /var/log/apt/term.log
Maintenance failure decision tree
Maintenance issue
│
├── Package manager error?
│   ├── lock -> check apt/dpkg process
│   ├── broken -> dpkg --configure -a
│   └── repo -> inspect apt sources
│
├── Firewall issue?
│   ├── check UFW rules
│   ├── verify SSH rule
│   └── test required ports
│
├── Disk issue?
│   ├── check df -h
│   ├── check logs
│   ├── check snapshots
│   └── clean safely
│
├── Service issue?
│   ├── systemctl status
│   ├── journalctl -u service
│   └── validate config
│
└── Bad update?
    ├── use Timeshift if desktop
    ├── use snapshot if VM
    └── rollback package or config
Disk cleanup for maintenance
# Disk usage
df -h
# Large top-level directories
sudo du -xhd1 / 2>/dev/null | sort -h
# Journal usage and cleanup
journalctl --disk-usage
sudo journalctl --vacuum-time=14d
# APT cleanup
sudo apt clean
sudo apt autoremove
# Timeshift snapshots
sudo timeshift --list
Maintenance routine: daily, weekly, monthly and before major changes
A simple routine prevents many incidents. The goal is not to spend hours every day, but to maintain visibility: update status, disk usage, failed services, logs, backup state and restore readiness.
| Frequency | Actions | Commands / tools |
|---|---|---|
| Daily | Check failed services and critical alerts. | systemctl --failed, monitoring. |
| Weekly | Review updates, disk usage and warnings. | apt list --upgradable, df -h. |
| Monthly | Apply updates, reboot if needed, verify backups. | apt upgrade, backup logs. |
| Before major change | Create restore point or snapshot. | Timeshift, VM snapshot, cloud snapshot. |
| After incident | Review logs and add prevention. | journalctl, runbook update. |
Weekly maintenance command block
echo "== SYSTEM =="
hostnamectl
uptime
echo "== UPDATES =="
sudo apt update
apt list --upgradable
echo "== DISK =="
df -h
echo "== FAILED SERVICES =="
systemctl --failed
echo "== WARNINGS TODAY =="
journalctl -p warning --since today --no-pager | tail -100
echo "== REBOOT REQUIRED =="
test -f /var/run/reboot-required && cat /var/run/reboot-required || echo "no reboot flag"
Maintenance calendar
Daily
├── monitor alerts
├── failed services
└── backup success
Weekly
├── package update review
├── disk space review
├── log warnings review
└── firewall exposure review
Monthly
├── apply updates
├── reboot if required
├── test restore sample
├── cleanup old logs/snapshots
└── review users and sudo
Before major upgrade
├── backup data
├── Timeshift or VM snapshot
├── record current version
├── apply change
└── verify and document
Server maintenance record
Maintenance record:
- date
- hostname
- Ubuntu version
- packages updated
- reboot required
- reboot performed
- services checked
- disk usage
- backup status
- warnings found
- actions taken
- rollback point
- operator
Final maintenance and security checklist
Maintenance checklist
[ ] Ubuntu LTS version is known
[ ] Package updates are reviewed regularly
[ ] Security updates are applied
[ ] Reboot-required status is checked
[ ] Reboot window exists for servers
[ ] Failed services are checked
[ ] Disk usage is monitored
[ ] Journal size is controlled
[ ] APT history is reviewed after updates
[ ] Timeshift is configured on desktop/workstation
[ ] Restore point is created before major changes
[ ] Data backup exists
[ ] Restore has been tested
[ ] Logs are readable
[ ] Maintenance actions are documented
Security checklist
[ ] UFW is enabled when appropriate
[ ] Default incoming policy is deny
[ ] Only required ports are open
[ ] SSH is protected
[ ] SSH source is restricted if possible
[ ] Root SSH login is disabled
[ ] Password SSH is disabled after key test
[ ] Users and sudo group are reviewed
[ ] Secrets are not world-readable
[ ] Backups are protected
[ ] Firewall rules are documented
[ ] Logs are reviewed after suspicious activity
Command cheat sheet
# Updates
sudo apt update
apt list --upgradable
sudo apt upgrade
sudo apt autoremove
sudo apt clean
test -f /var/run/reboot-required && cat /var/run/reboot-required
# Firewall
sudo ufw status verbose
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
# Timeshift
sudo apt install timeshift
sudo timeshift-gtk
sudo timeshift --list
# Logs
journalctl -xe
journalctl -p warning --since today
journalctl -u SERVICE --since "1 hour ago"
journalctl -k --since today
sudo tail -100 /var/log/auth.log
less /var/log/apt/history.log
# Health
systemctl --failed
df -h
free -h
ss -lntp
Final rule
Apply updates with a rollback plan, restrict network exposure with UFW, create restore points before risky changes, back up real data, read logs when problems occur, and keep a repeatable maintenance routine.
Minimal safe maintenance profile
Minimum safe profile:
- updates applied regularly
- reboot-required checked
- UFW configured
- SSH protected
- Timeshift or snapshot before major changes
- real data backup
- restore tested
- logs reviewed
- failed services checked
- disk usage monitored
- maintenance documented
Installation scope
Ubuntu installation depends on the target: desktop workstation, production server, cloud VM, container host, lab machine or hardened bastion. The installation itself is only the first step. A clean Ubuntu setup also includes users, SSH, updates, firewall, time sync, storage layout, service baseline, logs, monitoring and backup strategy.
The professional approach is to install with a clear target architecture: what the machine will host, how it will be accessed, how it will be patched, how it will be monitored, and how it can be rebuilt.
| Target | Installer | Key choices | Post-install priority |
|---|---|---|---|
| Desktop workstation | Ubuntu Desktop ISO | GUI, disk encryption, drivers, developer tools. | Updates, IDE, Docker, SSH keys, backups. |
| Server VM | Ubuntu Server ISO or cloud image | SSH, LVM, static IP, no GUI, minimal packages. | Hardening, firewall, systemd, monitoring. |
| Cloud server | Cloud image | cloud-init, SSH key, security group, disk size. | Bootstrap automation, logging, backup, patching. |
| Database server | Server ISO or image | Disk layout, filesystem, I/O, backup volume. | Storage monitoring, backup, security, tuning. |
| Container host | Server LTS | Disk for Docker, cgroups, kernel, network. | Docker, log rotation, registry access, metrics. |
Installation flow map
Installation workflow
│
├── Choose target
│   ├── desktop
│   ├── server
│   ├── cloud VM
│   └── container host
│
├── Prepare media
│   ├── download ISO
│   ├── verify checksum
│   ├── create USB key
│   └── boot in UEFI mode
│
├── Install system
│   ├── language and keyboard
│   ├── network
│   ├── disk layout
│   ├── user account
│   ├── SSH server
│   └── base packages
│
└── Post-install
    ├── update packages
    ├── harden SSH
    ├── configure firewall
    ├── enable monitoring
    ├── configure backups
    └── document server
Official URLs
Ubuntu downloads:
https://ubuntu.com/download
Ubuntu Server documentation:
https://documentation.ubuntu.com/server/
Ubuntu Desktop documentation:
https://documentation.ubuntu.com/desktop/
Ubuntu release images:
https://releases.ubuntu.com/
Ubuntu cloud images:
https://cloud-images.ubuntu.com/
cloud-init documentation:
https://cloudinit.readthedocs.io/
Ubuntu Desktop installation
Ubuntu Desktop installation is designed for workstations: developers, engineers, analysts and general desktop users. The main choices are language, keyboard, network, installation type, disk encryption, user account and optional third-party drivers.
Desktop install path
1. Download Ubuntu Desktop ISO
2. Verify checksum if required
3. Create bootable USB key
4. Boot in UEFI mode
5. Select language and keyboard
6. Connect to network
7. Choose normal or minimal install
8. Enable third-party drivers if needed
9. Choose disk layout
10. Enable encryption if laptop or sensitive data
11. Create admin user
12. Install and reboot
13. Remove USB key
14. Run updates
15. Install development tools
| Choice | Recommended option | Reason |
|---|---|---|
| Release | LTS for stable workstation. | Less upgrade pressure. |
| Install type | Normal for general use, minimal for clean dev setup. | Controls preinstalled apps. |
| Disk encryption | Yes on laptop. | Protects data if machine is lost. |
| Third-party drivers | Enable if NVIDIA or Wi-Fi requires it. | Improves hardware compatibility. |
| Partitioning | Automatic unless dual boot or advanced layout. | Simple and safe for most users. |
Desktop post-install developer baseline
# Update system
sudo apt update
sudo apt upgrade
# Install useful tools
sudo apt install curl wget git vim htop tree unzip ca-certificates
# Install build basics
sudo apt install build-essential pkg-config
# Install Python essentials
sudo apt install python3 python3-venv python3-pip
# Check version
lsb_release -a
uname -a
Developer workstation map
Ubuntu Desktop
│
├── Terminal
├── Git
├── Python / Node / Java / Go
├── Docker Desktop or Docker Engine
├── IDE
├── SSH keys
├── browser dev tools
├── cloud CLIs
└── VPN / security tooling
Ubuntu Server installation
Ubuntu Server installation is usually text-based and focused on production readiness: network, storage, user account, SSH, package selection and minimal attack surface. A server should normally be installed without a desktop environment.
Server install path
1. Download Ubuntu Server ISO
2. Boot in UEFI mode
3. Select language and keyboard
4. Configure network
- DHCP for simple cases
- static IP for fixed infrastructure
5. Configure proxy if needed
6. Configure apt mirror
7. Choose disk layout
- guided LVM for most servers
- manual for advanced storage
8. Create admin user
9. Install OpenSSH server
10. Import SSH key if available
11. Select minimal server packages
12. Install bootloader
13. Reboot
14. Connect by SSH
15. Run post-install baseline
| Server choice | Recommendation | Why |
|---|---|---|
| GUI | No GUI on production server. | Lower resource usage and smaller attack surface. |
| SSH | Install OpenSSH during setup. | Remote administration required. |
| User | Named sudo user. | Avoid direct root workflow. |
| Disk | LVM for flexible servers. | Easier resizing and volume management. |
| Packages | Minimal baseline. | Install only what is needed. |
Server install architecture
Bare metal or VM
│
▼
Ubuntu Server installer
│
├── network setup
├── disk layout
├── user creation
├── SSH setup
├── package baseline
└── bootloader
│
▼
First boot
│
├── SSH login
├── update packages
├── harden access
├── configure firewall
├── install services
└── enable monitoring
First server commands
# Update package index and upgrade
sudo apt update
sudo apt upgrade
# Install baseline tools
sudo apt install curl wget vim git htop tree unzip net-tools dnsutils
# Check services
systemctl --failed
systemctl status ssh
# Check network
ip a
ip r
ss -lntp
# Check storage
lsblk
df -h
UEFI, BIOS, boot media and installation verification
Modern Ubuntu installations should normally boot in UEFI mode. UEFI affects the boot partition, bootloader installation and compatibility with Secure Boot. If the USB key is booted in legacy BIOS mode, the final installation may not match the target firmware configuration.
| Boot concept | Meaning | Practical rule |
|---|---|---|
| UEFI | Modern firmware boot mode. | Preferred for new machines and servers. |
| Legacy BIOS | Older boot mode. | Use only if hardware requires it. |
| ESP | EFI System Partition. | Required for UEFI boot. |
| Secure Boot | Firmware validation of boot chain. | Usually supported, but test with custom drivers. |
| Boot order | Firmware decides which disk or USB boots first. | Verify after installation. |
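Before any boot media is written, the downloaded ISO can be compared against the SHA256SUMS file published next to it on releases.ubuntu.com. The sketch below shows one way to do that comparison. The function name verify_iso and the file names in the usage line are illustrative assumptions, not part of any Ubuntu tool, and checking the GPG signature of SHA256SUMS itself is a further step not shown here.

```shell
# Compare the SHA-256 of a downloaded ISO with its entry in a SHA256SUMS file.
# verify_iso is a hypothetical helper name used only for this sketch.
verify_iso() {
    local iso="$1" sums="$2" name expected actual
    name="$(basename -- "$iso")"
    # SHA256SUMS lines look like "<hash> *<filename>" or "<hash>  <filename>".
    expected="$(awk -v f="$name" '{ n = $2; sub(/^\*/, "", n); if (n == f) print $1 }' "$sums")"
    actual="$(sha256sum -- "$iso" | awk '{ print $1 }')"
    # Succeed only if an entry exists and matches the computed hash.
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage (file names are placeholders):
#   verify_iso ubuntu-server.iso SHA256SUMS && echo "checksum OK"
```

A non-zero exit status means either the ISO has no entry in the sums file or the hash differs; in both cases the image should be downloaded again rather than written to USB.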
Boot media preparation
Recommended flow:
1. Download ISO from official source
2. Verify ISO checksum if required
3. Write USB with Rufus, Balena Etcher or dd
4. Boot USB in UEFI mode
5. Install Ubuntu
6. Reboot without USB key
7. Confirm system boots from target disk
UEFI disk layout sketch
Disk /dev/sda
│
├── EFI System Partition
│   ├── size: 512 MB to 1 GB
│   ├── filesystem: FAT32
│   └── mount: /boot/efi
│
├── /boot
│   ├── optional separate partition
│   └── kernel and initramfs
│
├── LVM physical volume
│   ├── root volume /
│   ├── var volume /var
│   ├── home volume /home
│   └── swap volume or swapfile
│
└── free space or data volumes
Boot verification commands
# Check if system booted in UEFI mode
test -d /sys/firmware/efi && echo "UEFI boot" || echo "Legacy boot"
# Show block devices
lsblk -f
# Show EFI boot entries
sudo efibootmgr -v
# Show mounted filesystems
findmnt
# Show boot partition
findmnt /boot/efi
Disk layout, partitions, LVM, encryption and swap
Disk layout should reflect the machine's role. A laptop usually benefits from full-disk encryption. A server often benefits from LVM. A database server needs careful storage planning. A Docker host needs enough space under /var/lib/docker.
| Pattern | Best for | Strength | Watch out |
|---|---|---|---|
| Automatic layout | Desktop, lab, simple VM. | Fast and low-risk. | Less control over growth areas. |
| LVM | Servers and VMs. | Flexible resizing and volume management. | Requires basic LVM knowledge. |
| Encrypted disk | Laptops and sensitive systems. | Protects data at rest. | Remote boot can be harder. |
| Separate /var | Servers with logs, caches, Docker. | Protects root filesystem from log growth. | Size must be planned. |
| Separate data volume | Database and application data. | Cleaner backup and scaling. | Mount and permission discipline required. |
Example server layout
Small web server:
- /boot/efi 512 MB to 1 GB
- / 30 GB to 50 GB
- /var 20 GB to 100 GB
- /home optional
- swap swapfile or LV
- /srv application data if needed
Docker host:
- / 30 GB to 50 GB
- /var large volume
- /var/lib/docker on dedicated volume if possible
Database host:
- / 30 GB to 50 GB
- /var/log separate or monitored
- /data dedicated fast volume
- /backup separate volume or external storage
Disk decision tree
Is it a laptop?
├── yes -> enable disk encryption
└── no
    │
    ▼
Is it a production server?
├── yes -> prefer LVM or cloud volume strategy
└── no -> automatic layout is acceptable
Will logs, Docker or DB grow?
├── yes -> separate /var or data volume
└── no -> simple root filesystem
Need easy snapshot/resize?
├── yes -> LVM or cloud block volumes
└── no -> simple partitioning
Storage inspection commands
# Show disks and partitions
lsblk
# Show filesystems
lsblk -f
# Show disk usage
df -h
# Show directory usage
sudo du -sh /var/*
# Show mounts
findmnt
# Show LVM volumes
sudo pvs
sudo vgs
sudo lvs
# Show swap
swapon --show
free -h
If /var fills up, logs, Docker, package installs, databases and services can fail. Monitor disk usage from day one.
Network, SSH and remote access baseline
Server installation must make remote access reliable and safe. The minimum baseline is: one named sudo user, SSH key access, password authentication disabled where possible, root login disabled, firewall enabled and only required ports opened.
| Area | Baseline | Reason |
|---|---|---|
| Admin user | Named user with sudo rights. | Audit and safer administration. |
| SSH keys | Key-based access. | Stronger than passwords. |
| Root login | Disabled. | Reduces brute-force and blast radius. |
| Password auth | Disabled after key validation. | Reduces attack surface. |
| Firewall | Default deny incoming. | Only expose required services. |
| Network config | DHCP for simple cases, static for infrastructure. | Predictable access. |
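The SSH rows of this baseline can be spot-checked with a small text audit of a config file. This is a minimal sketch: audit_sshd_config is a hypothetical helper name, and it only inspects the one file it is given. On recent Ubuntu releases, drop-in files under /etc/ssh/sshd_config.d/ can override the main file, so sudo sshd -T remains the authoritative view of the effective settings.

```shell
# Report which baseline directives are not explicitly set in an sshd config file.
# audit_sshd_config is a hypothetical helper; it does a plain text check only.
audit_sshd_config() {
    local file="$1" missing=0 want
    for want in "PermitRootLogin no" "PasswordAuthentication no" "PubkeyAuthentication yes"; do
        # ${want% *} is the keyword, ${want#* } is the value.
        # Commented-out lines do not match; keywords are matched case-insensitively.
        if ! grep -Eiq "^[[:space:]]*${want% *}[[:space:]]+${want#* }[[:space:]]*$" "$file"; then
            echo "missing: $want"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   audit_sshd_config /etc/ssh/sshd_config
```

The function exits non-zero when at least one directive is absent, which makes it usable as a gate in a provisioning script.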
SSH hardening example
# Backup SSH config
sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak
# Edit SSH config
sudo vim /etc/ssh/sshd_config
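# Recommended directives (set these inside sshd_config; they are not shell commands)
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers deploy
# Validate and restart; keep the current session open until a new key-based login succeeds
sudo sshd -t
sudo systemctl restart ssh
Firewall baseline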
# Set default policies
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Allow SSH
sudo ufw allow OpenSSH
# Web server example
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Enable firewall
sudo ufw enable
# Check status
sudo ufw status verbose
Network diagnostic commands
# IP addresses
ip a
# Routes
ip r
# DNS status
resolvectl status
# Listening ports
ss -lntp
# Test local service
curl -I http://localhost
# Test remote host
ping -c 3 8.8.8.8
# Trace path
tracepath ubuntu.com
cloud-init for automated server bootstrap
cloud-init is the standard way to initialize Ubuntu cloud images. It can create users, install packages, add SSH keys, write files, run commands, configure timezone and prepare the machine during first boot.
| cloud-init feature | Usage | Example |
|---|---|---|
| users | Create admin users. | deploy user with sudo. |
| ssh_authorized_keys | Install public keys. | Key-based access from first boot. |
| packages | Install baseline packages. | curl, git, htop, nginx. |
| write_files | Create config files. | systemd unit, app config, banner. |
| runcmd | Run final bootstrap commands. | enable firewall, restart service. |
| package_update | Refresh apt cache. | Update before package install. |
Minimal cloud-init example
#cloud-config
package_update: true
package_upgrade: true
users:
  - name: deploy
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh_authorized_keys:
      - ssh-ed25519 AAAA_REPLACE_WITH_PUBLIC_KEY deploy-key
packages:
  - curl
  - wget
  - git
  - vim
  - htop
  - ufw
runcmd:
  - ufw allow OpenSSH
  - ufw --force enable
  - timedatectl set-timezone UTC
cloud-init lifecycle
Cloud VM first boot
│
▼
cloud-init starts
│
├── reads metadata
├── reads user-data
├── configures hostname
├── creates users
├── installs SSH keys
├── installs packages
├── writes files
├── runs commands
└── marks initialization done
│
▼
Server ready for automation
├── Ansible
├── deploy pipeline
├── monitoring
└── application install
cloud-init diagnostics
# Show cloud-init status
cloud-init status
# Wait until finished
cloud-init status --wait
# Inspect logs
sudo less /var/log/cloud-init.log
sudo less /var/log/cloud-init-output.log
# Validate config file if tool is available
cloud-init schema --config-file user-data.yaml
# Re-running cloud-init on a production host is not trivial
# Prefer rebuilding disposable cloud instances
Clean post-install baseline
Post-install is where a raw Ubuntu machine becomes a clean operating platform. The goal is to make the system secure, updated, observable, recoverable and ready for application deployment.
Post-install baseline commands
# Update system
sudo apt update
sudo apt upgrade
# Install useful admin tools
sudo apt install curl wget vim git htop tree unzip ca-certificates dnsutils
# Set timezone
timedatectl
sudo timedatectl set-timezone UTC
# Check failed units
systemctl --failed
# Check logs
journalctl -p warning --since today
# Check reboot requirement
test -f /var/run/reboot-required && cat /var/run/reboot-required
| Post-install action | Command / file | Why |
|---|---|---|
| Update packages | apt update && apt upgrade | Apply latest security fixes. |
| Create admin user | adduser, usermod -aG sudo | Avoid root workflow. |
| Harden SSH | /etc/ssh/sshd_config | Reduce remote access risk. |
| Enable firewall | ufw | Expose only required ports. |
| Configure time | timedatectl | Correct logs and certificates. |
| Install monitoring | agent or exporter | Detect issues early. |
Clean server baseline
Fresh Ubuntu Server
│
├── system update
├── admin user
├── SSH key access
├── root login disabled
├── password auth disabled
├── firewall enabled
├── timezone configured
├── monitoring installed
├── log policy checked
├── backups configured
├── service manager ready
└── runbook documented
Server documentation template
Server record:
- hostname
- Ubuntu version
- kernel version
- role
- owner
- public IP
- private IP
- SSH port
- open firewall ports
- installed services
- data volumes
- backup policy
- monitoring URL
- patching window
- rollback method
- emergency contact
Installation troubleshooting
| Problem | Likely cause | First diagnostic | Correction |
|---|---|---|---|
| USB does not boot | Bad USB image, wrong boot mode, firmware order. | Check UEFI boot menu. | Rewrite USB, select UEFI USB entry. |
| Installed system does not boot | Bootloader installed in wrong mode or disk. | Check UEFI entries. | Repair bootloader or reinstall in correct mode. |
| No network during install | Driver, cable, DHCP, VLAN, Wi-Fi issue. | Check link and IP. | Use wired network or configure static IP. |
| Cannot SSH after install | SSH not installed, firewall, wrong IP, bad key. | Console login and systemctl status ssh. | Install SSH, fix firewall, verify key. |
| Disk full after install | Small root, logs, Docker, wrong partition plan. | df -h, du -sh. | Resize volume, clean logs, separate /var. |
| Package install fails | Broken apt state, no DNS, mirror issue. | apt update, DNS check. | Fix DNS, mirror, dpkg configure. |
Diagnostic decision tree
Fresh server problem
│
├── Does it boot?
│   ├── no -> UEFI, bootloader, disk
│   └── yes
│
├── Does it have network?
│   ├── no -> IP, route, DNS, driver
│   └── yes
│
├── Can you SSH?
│   ├── no -> ssh service, firewall, key, IP
│   └── yes
│
├── Are packages working?
│   ├── no -> DNS, apt mirror, dpkg lock
│   └── yes
│
└── Are baseline services healthy?
    ├── no -> systemctl and journalctl
    └── yes -> server ready
Emergency commands
# SSH service
sudo systemctl status ssh
sudo systemctl restart ssh
# Firewall
sudo ufw status verbose
# Network
ip a
ip r
resolvectl status
# Package repair
sudo dpkg --configure -a
sudo apt -f install
sudo apt update
# Logs
journalctl -p err --since today
sudo dmesg -T | tail -100
Final installation checklist
Before install
[ ] Target role is defined
[ ] Ubuntu edition selected
[ ] LTS version selected
[ ] ISO downloaded from official source
[ ] Checksum verified if required
[ ] Boot media created
[ ] UEFI mode confirmed
[ ] Disk layout planned
[ ] Static IP or DHCP decision made
[ ] Hostname chosen
[ ] Admin user chosen
[ ] SSH key available
[ ] Backup of existing data done
[ ] Rollback plan exists if replacing server
During install
[ ] Correct disk selected
[ ] EFI partition created if UEFI
[ ] LVM selected if server needs flexibility
[ ] Encryption enabled if needed
[ ] OpenSSH server installed
[ ] Admin user created
[ ] Network works
[ ] Bootloader installed correctly
[ ] Machine reboots without USB
After install
[ ] System updated
[ ] Reboot performed if required
[ ] SSH key login tested
[ ] Root SSH login disabled
[ ] Password SSH disabled after key validation
[ ] Firewall enabled
[ ] Only required ports open
[ ] Timezone and time sync configured
[ ] Hostname correct
[ ] Disk usage checked
[ ] Failed systemd units checked
[ ] Monitoring installed
[ ] Backup configured
[ ] Server documented
[ ] Snapshot or image created if needed
Final rule
The machine should boot correctly, be reachable through controlled SSH, have a clear disk layout, expose only required ports, receive updates, produce usable logs, be monitored, be backed up and be documented.
Minimal safe server baseline
Minimum safe server:
- Ubuntu Server LTS
- named sudo user
- SSH key access
- root login disabled
- firewall enabled
- system updated
- timezone configured
- disk monitored
- logs accessible
- backup and rollback plan
- server record documented
What "Ubuntu CLI basics" means
The Ubuntu command line is the operational control layer of a Linux server. It is used to inspect files, manage users, control services, read logs, diagnose network problems, check storage, install packages, secure access and troubleshoot production incidents.
A good sysadmin workflow is not memorizing thousands of commands. It is knowing which subsystem to inspect first: files, permissions, users, service manager, logs, network, storage, packages or security.
| Area | Purpose | Main tools | Typical question |
|---|---|---|---|
| Files | Navigate, copy, move, inspect, search. | ls, cp, mv, find, du | Where is the file? How large is it? |
| Permissions | Control who can read, write or execute. | chmod, chown, umask, stat | Why can this process not access this file? |
| Users | Create accounts, groups and sudo rights. | adduser, usermod, id, sudo | Who can administer this machine? |
| Services | Start, stop, enable and debug daemons. | systemctl, journalctl | Is Nginx, SSH, Redis or PostgreSQL running? |
| Logs | Understand what happened. | journalctl, tail, grep | What error occurred and when? |
| Network | Inspect IP, routes, DNS, ports, sockets. | ip, ss, curl, dig | Can the server reach or expose the service? |
| Storage | Inspect disks, mounts, free space, I/O. | df, du, lsblk, findmnt | Is the disk full or mounted correctly? |
CLI diagnostic mental model
Problem on Ubuntu
│
├── Is the file present?
│   └── ls, find, stat
│
├── Are permissions correct?
│   └── ls -l, chmod, chown, id
│
├── Is the service running?
│   └── systemctl status
│
├── What do logs say?
│   └── journalctl, tail, grep
│
├── Is the port listening?
│   └── ss -lntp
│
├── Is the network path OK?
│   └── ip, ping, curl, dig
│
├── Is storage full?
│   └── df, du, lsblk
│
└── Did something recently change?
    └── apt history, logs, config diff
First 60 seconds on a server
hostnamectl
uptime
who
df -h
free -h
systemctl --failed
ss -lntp
journalctl -p warning --since "30 min ago"
Files and directories: navigate, inspect, copy, search
Most Ubuntu administration starts with files: configuration files, logs, service units, application folders, SSH keys, certificates, scripts, backups and data directories.
| Command | Usage | Example |
|---|---|---|
| pwd | Show current directory. | pwd |
| ls -lah | List files with details and hidden files. | ls -lah /etc/nginx |
| cd | Change directory. | cd /var/log |
| cp -a | Copy while preserving attributes. | cp -a app app.bak |
| mv | Move or rename. | mv old.conf new.conf |
| rm | Remove files. | rm old.log |
| find | Search files by name, type, age or size. | find /var/log -type f -name "*.log" |
| du -sh | Show directory size. | du -sh /var/lib/docker |
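The find and size tools in this table combine naturally when the question is "which files take the most space here?". The sketch below is one possible helper, assuming GNU find (the default on Ubuntu) for its -printf option; the name largest_files is illustrative.

```shell
# List the largest regular files under a directory, biggest first.
# largest_files is a hypothetical helper; -printf requires GNU find.
largest_files() {
    local dir="${1:-.}" count="${2:-10}"
    # -xdev stays on one filesystem; %s is the size in bytes, %p the path.
    find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null \
        | sort -rn \
        | head -n "$count"
}

# Usage:
#   sudo bash -c "$(declare -f largest_files); largest_files /var 20"
#   largest_files ~/projects 5
```

Sizes are printed in bytes so that the numeric sort stays exact; piping the result through numfmt --to=iec is an optional way to make it human-readable.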
Essential file commands
# Where am I?
pwd
# List files with permissions, owner, size and hidden files
ls -lah
# Copy a directory safely, preserving metadata
cp -a /etc/nginx /etc/nginx.bak
# Move or rename
mv app.conf app.conf.disabled
# Remove carefully
rm file.txt
# Dangerous: recursive delete
rm -rf path
# Find recent logs
find /var/log -type f -name "*.log" -mtime -7
# Find large files
find /var -type f -size +100M -exec ls -lh {} \;
Linux filesystem map
/
├── bin     essential binaries
├── boot    kernel and boot files
├── dev     devices
├── etc     system configuration
├── home    user home directories
├── lib     system libraries
├── media   removable media
├── mnt     temporary mounts
├── opt     optional software
├── proc    kernel/process virtual filesystem
├── root    root user home
├── run     runtime state
├── sbin    system binaries
├── srv     service/application data
├── sys     kernel/device virtual filesystem
├── tmp     temporary files
├── usr     user-space programs and libraries
└── var     logs, cache, spool, databases, runtime data
Useful inspection commands
# Show file type
file /path/to/file
# Show file metadata
stat /path/to/file
# Read first lines
head -50 /var/log/syslog
# Read last lines
tail -100 /var/log/syslog
# Follow a log live
tail -f /var/log/syslog
# Search inside files
grep -R "error" /etc/nginx
# Compare two files
diff -u old.conf new.conf
Back up a file before editing it: sudo cp -a file file.bak.$(date +%Y%m%d-%H%M%S).
Permissions: rwx, ownership, groups, umask and safe defaults
Linux permissions define who can read, write or execute a file. Most application failures on Ubuntu servers eventually involve one of these: wrong owner, wrong group, missing execute bit on directory, overly permissive file, SSH key permissions or service user unable to access application files.
Permission notation
Example:
-rw-r--r-- 1 root root 1200 app.conf
Breakdown:
- file type
rw- owner permissions
r-- group permissions
r-- others permissions
r = read
w = write
x = execute / enter directory
| Mode | Meaning | Typical use |
|---|---|---|
| 600 | Owner read/write only. | Private keys, secrets. |
| 644 | Owner write, everyone read. | Config files, static files. |
| 700 | Owner full access only. | Private directories, .ssh. |
| 755 | Owner write, everyone read/execute. | Directories, scripts, web static dirs. |
| 777 | Everyone can do everything. | Almost never acceptable. |
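Each octal digit in these modes is the sum of read (4), write (2) and execute (1), applied in order to owner, group and others. As an illustration of that mapping, the following sketch converts a three-digit mode into the rwx string that ls -l shows; mode_to_rwx is a hypothetical helper name and it ignores special bits such as setuid or sticky.

```shell
# Translate a three-digit octal mode (e.g. 644) into its rwx notation.
# mode_to_rwx is a hypothetical helper; special bits (setuid, sticky) are not handled.
mode_to_rwx() {
    local mode="$1" out="" digit
    case "$mode" in
        [0-7][0-7][0-7]) ;;
        *) echo "expected three octal digits, got: $mode" >&2; return 1 ;;
    esac
    # Walk the digits in order: owner, group, others.
    for digit in $(printf '%s' "$mode" | sed 's/./& /g'); do
        case "$digit" in
            0) out="${out}---" ;;
            1) out="${out}--x" ;;
            2) out="${out}-w-" ;;
            3) out="${out}-wx" ;;
            4) out="${out}r--" ;;
            5) out="${out}r-x" ;;
            6) out="${out}rw-" ;;
            7) out="${out}rwx" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Usage:
#   mode_to_rwx 644   # prints rw-r--r--
#   mode_to_rwx 755   # prints rwxr-xr-x
```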
Permission commands
# Show permissions
ls -lah /srv/app
# Show user and group identity
id deploy
# Change owner
sudo chown deploy:www-data /srv/app
# Change owner recursively
sudo chown -R deploy:www-data /srv/app
# Change file permissions
chmod 644 config.ini
# Change directory permissions
chmod 755 /srv/app
# SSH key permissions
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_ed25519
chmod 600 ~/.ssh/authorized_keys
# Show default creation mask
umask
Permission troubleshooting flow
Permission denied
│
├── Which user runs the process?
│   └── ps aux | grep service
│
├── Who owns the file?
│   └── ls -lah file
│
├── Can the user access parent directories?
│   └── namei -l /path/to/file
│
├── Is the group correct?
│   └── id user
│
└── Are permissions too strict or too broad?
    └── chmod / chown carefully
Avoid chmod 777. Fix ownership, groups and minimal required permissions.
Users, groups, sudo and SSH access
Ubuntu administration should use named users with sudo privileges, not direct root logins. This improves traceability, reduces operational risk and supports least-privilege access. For production, SSH keys should be preferred over passwords.
| Task | Command | Purpose |
|---|---|---|
| Create user | sudo adduser deploy | Create named account. |
| Add sudo rights | sudo usermod -aG sudo deploy | Allow admin actions. |
| Inspect identity | id deploy | Show UID, GID and groups. |
| Show groups | groups deploy | Confirm group membership. |
| Check sudo rights | sudo -l | Show allowed sudo commands. |
| Lock account | sudo usermod -L user | Disable password login. |
User management examples
# Create admin user
sudo adduser deploy
sudo usermod -aG sudo deploy
# Check user
id deploy
groups deploy
# Switch user
su - deploy
# Test sudo permissions
sudo -l
# Add user to web group
sudo usermod -aG www-data deploy
# Lock user password
sudo passwd -l deploy
SSH access model
Admin workstation
│
├── private key
└── public key
    │
    ▼
Ubuntu server
│
├── /home/deploy/.ssh/authorized_keys
├── sshd service
├── firewall allows SSH
└── sudo controls privilege escalation
SSH hardening baseline
# Backup SSH config
sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak
# Recommended settings in /etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers deploy
# Validate syntax
sudo sshd -t
# Restart SSH
sudo systemctl restart ssh
# Check logs
journalctl -u ssh --since today
Access control checklist
[ ] One named admin user
[ ] SSH key installed
[ ] User belongs to sudo group only if required
[ ] Root SSH login disabled
[ ] Password authentication disabled after key test
[ ] Unused users disabled
[ ] sudoers changes made with visudo
[ ] SSH access logs reviewed
Services with systemd: status, start, stop, enable, logs
Ubuntu uses systemd to manage services. A service can be running now, enabled at boot, failed, disabled, masked or waiting on dependencies. Most production daemons such as SSH, Nginx, PostgreSQL, Redis, Docker, Gunicorn and Celery are managed by systemd.
| Command | Meaning | Example |
|---|---|---|
| status | Show state, PID, recent logs. | systemctl status nginx |
| start | Start now. | sudo systemctl start nginx |
| stop | Stop now. | sudo systemctl stop nginx |
| restart | Stop and start again. | sudo systemctl restart nginx |
| reload | Reload config without full restart if supported. | sudo systemctl reload nginx |
| enable | Start automatically at boot. | sudo systemctl enable nginx |
| disable | Do not start automatically at boot. | sudo systemctl disable nginx |
Essential systemd commands
# Service status
systemctl status nginx
# Start / stop / restart
sudo systemctl start nginx
sudo systemctl stop nginx
sudo systemctl restart nginx
# Enable at boot
sudo systemctl enable nginx
# Disable at boot
sudo systemctl disable nginx
# Show failed services
systemctl list-units --type=service --state=failed
# Show enabled services
systemctl list-unit-files --type=service --state=enabled
Service troubleshooting flow
Service is down
│
├── Check status
│   └── systemctl status service
│
├── Read service logs
│   └── journalctl -u service
│
├── Validate config
│   └── nginx -t / sshd -t / app-specific check
│
├── Check port binding
│   └── ss -lntp
│
├── Check permissions
│   └── ls -lah, id service-user
│
└── Restart only after understanding error
    └── systemctl restart service
Custom service unit example
[Unit]
Description=Gunicorn Django application
After=network.target
[Service]
User=deploy
Group=www-data
WorkingDirectory=/srv/myapp
Environment="DJANGO_SETTINGS_MODULE=config.settings"
ExecStart=/srv/myapp/venv/bin/gunicorn config.wsgi:application \
    --bind 127.0.0.1:8000 \
    --workers 3
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Install custom unit
sudo cp gunicorn.service /etc/systemd/system/gunicorn.service
sudo systemctl daemon-reload
sudo systemctl enable gunicorn
sudo systemctl start gunicorn
systemctl status gunicorn
journalctl -u gunicorn -f
Start with systemctl status, then journalctl -u service. Do not debug blindly.
Logs: journald, syslog, auth logs and application logs
Logs tell what the system and services reported at the time of the incident. On Ubuntu, systemd logs are read with journalctl. Traditional logs often live under /var/log. Applications may log to journald, files, Docker logs or external observability tools.
| Log source | What it contains | Command |
|---|---|---|
| systemd journal | Service logs and system events. | journalctl |
| Service unit logs | Specific service output. | journalctl -u nginx |
| Auth logs | SSH, sudo, authentication events. | /var/log/auth.log |
| Syslog | General system messages. | /var/log/syslog |
| Kernel logs | Kernel and hardware messages. | dmesg |
| Application logs | App-specific runtime errors. | App path, journald or Docker logs. |
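Auth logs become more useful once they are summarised. As a rough sketch, the helper below counts failed SSH password attempts per source address from log lines on standard input. It assumes the usual OpenSSH message shape "Failed password for ... from <address> port ..."; the name count_failed_ssh is illustrative, and hosts with password authentication disabled will log different messages that this filter does not cover.

```shell
# Count failed SSH password attempts per source address.
# count_failed_ssh is a hypothetical helper; it reads log lines on stdin and
# prints "<count> <address>", highest count first.
count_failed_ssh() {
    awk '/sshd.*Failed password/ {
             # The source address is the field right after the word "from".
             for (i = 1; i < NF; i++) if ($i == "from") { hits[$(i + 1)]++; break }
         }
         END { for (ip in hits) print hits[ip], ip }' | sort -rn
}

# Usage:
#   sudo cat /var/log/auth.log | count_failed_ssh | head
#   journalctl -u ssh --since today | count_failed_ssh
```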
journalctl essentials
# Recent critical context
journalctl -xe
# Logs for one service
journalctl -u nginx
# Follow service logs live
journalctl -u nginx -f
# Logs since a time
journalctl -u nginx --since "1 hour ago"
# Logs since today
journalctl -u ssh --since today
# Warnings and errors
journalctl -p warning --since today
# Boot logs
journalctl -b
# Previous boot
journalctl -b -1
Classic log commands
# Last lines
tail -n 200 /var/log/syslog
tail -n 200 /var/log/auth.log
# Follow file live
tail -f /var/log/syslog
# Search errors
grep -i "error" /var/log/syslog
# Search SSH failures
grep -i "failed" /var/log/auth.log
# Compressed rotated logs
zgrep -i "error" /var/log/syslog.*.gz
# Kernel recent messages
sudo dmesg -T | tail -100
Log diagnosis map
Incident type
│
├── Service fails
│   └── journalctl -u service
│
├── SSH login issue
│   └── journalctl -u ssh, /var/log/auth.log
│
├── Kernel or hardware issue
│   └── dmesg -T
│
├── Package install issue
│   └── /var/log/apt/history.log
│
├── Web server issue
│   └── nginx/apache logs + journal
│
└── App issue
    └── app logs + service journal
Apt history
# See package changes
less /var/log/apt/history.log
# See apt terminal output
less /var/log/apt/term.log
Narrow log output to the incident window, for example with --since "30 min ago".
Network: IP, routes, DNS, ports, sockets, firewall
Network troubleshooting should follow a strict order: local IP, route, DNS, firewall, listening socket, service health, upstream application. This avoids confusing a DNS issue with a service issue, or a firewall issue with an application crash.
| Layer | Question | Command |
|---|---|---|
| Interface | Does the server have an IP? | ip a |
| Route | Does it know where to send traffic? | ip r |
| DNS | Can names resolve? | resolvectl status, dig |
| Port | Is the service listening? | ss -lntp |
| Firewall | Is traffic allowed? | ufw status verbose |
| HTTP test | Does the endpoint respond? | curl -I |
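The "Port" row is easy to script when a check has to run unattended. The sketch below reads the output of ss -lntH (listening TCP sockets, numeric, no header) on standard input and succeeds when some local address ends in the requested port. The name port_listening is an assumption made for this example.

```shell
# Succeed if the given TCP port appears as a listening local port.
# port_listening is a hypothetical helper; it expects `ss -lntH` output on stdin,
# where column 4 is "Local Address:Port".
port_listening() {
    local port="$1"
    awk -v p="$port" '
        { n = split($4, parts, ":"); if (parts[n] == p) found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Usage:
#   ss -lntH | port_listening 443 || echo "nothing is listening on 443"
```

Reading from standard input keeps the parsing separate from the ss call, so the same function works on output captured from another host.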
Network essentials
# IP addresses
ip a
# Routes
ip r
# Listening TCP ports with process
ss -lntp
# Established connections
ss -antp
# DNS status
resolvectl status
# DNS query
dig example.com
# HTTP check
curl -I https://example.com
# Basic reachability
ping -c 3 1.1.1.1
# Path test
tracepath example.com
Network troubleshooting flow
Network problem
│
├── Local IP present?
│   └── ip a
│
├── Default route present?
│   └── ip r
│
├── DNS working?
│   └── dig domain
│
├── Firewall allows traffic?
│   └── ufw status verbose
│
├── Service listening?
│   └── ss -lntp
│
├── Local curl works?
│   └── curl -I http://localhost
│
└── Remote curl works?
    └── curl -I https://public-domain
Firewall commands
# Status
sudo ufw status verbose
# Default rules
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Allow SSH
sudo ufw allow OpenSSH
# Allow web ports
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Enable
sudo ufw enable
# Delete a rule
sudo ufw delete allow 80/tcp
Storage: disks, mounts, usage, LVM, swap and full-disk incidents
Storage problems are among the most common Linux incidents. A full root filesystem, a full /var, a missing mount, broken permissions on a data directory or uncontrolled Docker logs can stop services even when CPU and memory look fine.
| Command | Purpose | Example |
|---|---|---|
| df -h | Show filesystem free space. | df -h |
| du -sh | Show directory size. | du -sh /var/* |
| lsblk | Show disks and partitions. | lsblk -f |
| findmnt | Show mounted filesystems. | findmnt /var |
| swapon | Show swap devices/files. | swapon --show |
| lvs | Show LVM logical volumes. | sudo lvs |
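Since a full filesystem can stop services while CPU and memory look fine, a simple threshold check on df output is a reasonable first alert. The following is a minimal sketch, not a monitoring solution: filesystems_over is a hypothetical name, the 80 percent default is an arbitrary assumption, and mount points containing spaces are not handled.

```shell
# Print filesystems whose usage is at or above a percentage threshold.
# filesystems_over is a hypothetical helper; it expects `df -P` output on stdin
# (POSIX format: column 5 is "Capacity", column 6 is the mount point).
filesystems_over() {
    local threshold="${1:-80}"
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= t + 0) print use "% " $6
    }'
}

# Usage:
#   df -P | filesystems_over 90
```

The -P flag keeps every filesystem on a single line, which avoids the wrapped output that plain df can produce for long device names.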
Storage essentials
# Filesystem usage
df -h
# Directory usage
sudo du -sh /var/*
sudo du -sh /var/log/*
sudo du -sh /var/lib/docker/*
# Disks and filesystems
lsblk
lsblk -f
# Mounted filesystems
findmnt
# Swap
swapon --show
free -h
# LVM if used
sudo pvs
sudo vgs
sudo lvs
Full disk incident flow
Disk alert or service failure
│
├── Check filesystems
│   └── df -h
│
├── Identify large directories
│   └── du -sh /*
│
├── Focus common growth areas
│   ├── /var/log
│   ├── /var/lib/docker
│   ├── /var/lib/postgresql
│   ├── /tmp
│   └── application uploads
│
├── Clean safely
│   ├── rotate logs
│   ├── prune Docker carefully
│   └── archive/delete known files
│
└── Prevent recurrence
    ├── monitoring
    ├── logrotate
    ├── retention policy
    └── larger/separate volume
Safe cleanup examples
# Clean apt cache
sudo apt clean
# Remove unused packages
sudo apt autoremove
# Vacuum journal logs older than 14 days
sudo journalctl --vacuum-time=14d
# Show Docker usage
docker system df
# Docker cleanup - use carefully
docker system prune
Troubleshooting patterns: from symptom to root cause
Troubleshooting on Ubuntu should follow a repeatable sequence: observe, isolate, verify, change one thing, measure again, document. Most incidents can be reduced to service state, logs, ports, permissions, network, storage, memory or recent changes.
| Symptom | First checks | Common causes |
|---|---|---|
| Service down | systemctl status, journalctl -u | Bad config, dependency, permission, port conflict. |
| 502 from Nginx | Nginx logs, upstream service, socket/port. | Gunicorn down, wrong socket, app error. |
| SSH blocked | SSH service, firewall, key, auth logs. | Bad key, password disabled, UFW, fail2ban. |
| Cannot install package | apt update, DNS, locks, dpkg state. | Mirror, DNS, interrupted install, lock file. |
| Disk full | df -h, du -sh. | Logs, Docker, DB, uploads, backups. |
| App permission error | ls -lah, id, namei -l. | Wrong owner, group, parent directory permissions. |
| DNS issue | resolvectl status, dig. | Resolver config, firewall, network, cloud DNS. |
Universal incident decision tree
Application not working
│
├── Is the server alive?
│   └── ping, SSH, cloud console
│
├── Is disk full?
│   └── df -h
│
├── Is memory exhausted?
│   └── free -h, top
│
├── Is the service running?
│   └── systemctl status service
│
├── What do logs say?
│   └── journalctl -u service
│
├── Is the port listening?
│   └── ss -lntp
│
├── Is firewall blocking?
│   └── ufw status verbose
│
├── Is DNS/routing OK?
│   └── ip r, resolvectl, dig
│
└── Did a recent change happen?
    └── apt history, deploy logs, config diff
Useful "one-screen" diagnostic
echo "== HOST ==" && hostnamectl
echo "== UPTIME ==" && uptime
echo "== DISK ==" && df -h
echo "== MEMORY ==" && free -h
echo "== FAILED UNITS ==" && systemctl --failed
echo "== PORTS ==" && ss -lntp
echo "== WARNINGS ==" && journalctl -p warning --since "30 min ago" --no-pager
Ubuntu CLI cheat sheet and production checklist
Core cheat sheet
# Files
ls -lah
cp -a src dst
mv old new
rm file
find /path -name "*.log"
du -sh *
df -h
# Permissions
ls -l
chmod 644 file
chmod 755 dir
chown user:group file
id user
namei -l /path/to/file
# Users
adduser user
usermod -aG sudo user
groups user
sudo -l
# Services
systemctl status service
systemctl restart service
systemctl enable service
systemctl --failed
# Logs
journalctl -u service -f
journalctl -p warning --since today
tail -f /var/log/syslog
# Network
ip a
ip r
ss -lntp
curl -I http://localhost
dig domain
resolvectl status
# Storage
lsblk -f
findmnt
swapon --show
free -h
Production sysadmin baseline
[ ] I know the server role
[ ] I know the Ubuntu version
[ ] I know which services must run
[ ] I know which ports must listen
[ ] I know where logs are
[ ] I know which user runs each app
[ ] I know where configs are
[ ] I know where data is stored
[ ] I know the backup location
[ ] I know firewall rules
[ ] I know how to restart safely
[ ] I know how to roll back
[ ] I avoid chmod 777
[ ] I avoid direct root login
[ ] I document changes
Final rule
The command line lets you inspect the real state of the machine: files, permissions, users, services, logs, ports, network paths, disks and failures. Good troubleshooting means reading evidence before making changes.
Troubleshooting order
1. Observe symptoms
2. Check server health
3. Check disk and memory
4. Check service state
5. Read logs
6. Check ports
7. Check network and DNS
8. Check permissions
9. Check recent changes
10. Apply one fix
11. Verify
12. Document
Package management on Ubuntu
Ubuntu package management is mainly based on APT, which installs, upgrades, removes and resolves software dependencies from configured repositories. Ubuntu also supports Snap, a package format designed for sandboxed applications with automatic refresh behavior.
In production, package management is not only about installing software. It controls security patching, dependency stability, reproducibility, rollback strategy, package provenance, compliance and operational risk.
| Tool | Role | Typical usage | Production concern |
|---|---|---|---|
| APT | Main Ubuntu package manager frontend. | Install Nginx, PostgreSQL, Redis, Python packages from Ubuntu repos. | Repository control, upgrade policy, dependency stability. |
| dpkg | Low-level Debian package tool. | Inspect installed packages or install local .deb files. | Does not resolve dependencies like APT. |
| Snap | Sandboxed application packaging. | Desktop apps, selected server tools, Canonical ecosystem packages. | Automatic refresh, policy control, mixed packaging strategy. |
| PPA | Third-party repository hosted on Launchpad. | Newer package versions or vendor-specific builds. | Trust, support, upgrade conflicts, governance. |
| Vendor repo | Repository maintained by software vendor. | Docker, PostgreSQL, NodeSource, Elastic, HashiCorp. | Key management, package pinning, lifecycle tracking. |
Package management architecture
Ubuntu package flow
│
├── Repository configuration
│   ├── Ubuntu official repositories
│   ├── security repositories
│   ├── updates repositories
│   ├── PPAs
│   └── vendor repositories
│
├── APT metadata
│   ├── package lists
│   ├── versions
│   ├── dependencies
│   └── priorities
│
├── Package operations
│   ├── install
│   ├── upgrade
│   ├── remove
│   ├── purge
│   └── autoremove
│
└── Operational controls
    ├── pinning
    ├── holds
    ├── unattended upgrades
    ├── reboot policy
    └── rollback plan
Decision map
Need standard server package?
└── use APT from Ubuntu repositories
Need vendor-supported latest version?
└── use official vendor repository
Need experimental or community package?
└── use PPA only with governance
Need desktop-style sandboxed app?
└── Snap can be acceptable
Need strict production reproducibility?
└── prefer APT + pinned versions + image build
APT basics: install, upgrade, remove, inspect
APT is the standard daily tool for Ubuntu package operations. It downloads package metadata, resolves dependencies, installs software, upgrades packages and removes software cleanly.
| Command | Purpose | Example |
|---|---|---|
| apt update | Refresh repository metadata. | sudo apt update |
| apt upgrade | Upgrade installed packages without removing packages. | sudo apt upgrade |
| apt full-upgrade | Upgrade with dependency changes, installs/removals if needed. | sudo apt full-upgrade |
| apt install | Install package. | sudo apt install nginx |
| apt remove | Remove package but keep config files. | sudo apt remove nginx |
| apt purge | Remove package and config files. | sudo apt purge nginx |
| apt autoremove | Remove unused dependencies. | sudo apt autoremove |
| apt policy | Show installed and candidate version. | apt policy nginx |
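Before running an upgrade it can help to see whether any critical package is about to change. The sketch below is one way to do that, assuming the `Inst <package> ...` lines printed by `apt-get -s upgrade` (simulation mode, nothing is installed); the function name `critical_upgrades` and the list of "critical" prefixes are illustrative.

```shell
# critical_upgrades PREFIX...
# Reads `apt-get -s upgrade` output on stdin and prints the names of
# packages that would be upgraded and whose name starts with one of the
# given prefixes. Prefixes are used as regular expressions, so names with
# characters such as "+" would need escaping.
critical_upgrades() {
    pattern="$(printf '%s|' "$@")"
    pattern="${pattern%|}"              # drop the trailing "|"
    awk -v re="^(${pattern})" '$1 == "Inst" && $2 ~ re { print $2 }'
}

# Usage on a live system (prefix list is an example):
#   apt-get -s upgrade | critical_upgrades linux-image openssl nginx postgresql
```

`apt-get -s` is used instead of `apt list --upgradable` because `apt` warns that its command-line output is not stable for scripts.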
Essential APT commands
# Refresh package metadata
sudo apt update
# Show upgradeable packages
apt list --upgradable
# Upgrade packages
sudo apt upgrade
# Install package
sudo apt install nginx
# Show package information
apt show nginx
# Show package versions and source repository
apt policy nginx
# Search package
apt search postgresql
# Remove package but keep configuration
sudo apt remove nginx
# Remove package and configuration
sudo apt purge nginx
# Remove unused dependencies
sudo apt autoremove
APT vs dpkg
| Tool | Best for | Important detail |
|---|---|---|
| apt | Normal package management. | Resolves dependencies from repositories. |
| apt-cache | Older metadata inspection commands. | Still useful in scripts and diagnostics. |
| dpkg | Inspect or install local Debian packages. | Does not automatically resolve dependencies. |
| apt-file | Find which package provides a file. | Requires package metadata installation. |
Package inspection
# List installed packages
dpkg -l
# Filter installed packages
dpkg -l | grep nginx
# Show files installed by package
dpkg -L nginx
# Find which package owns a file
dpkg -S /usr/sbin/nginx
# Show package version
dpkg -s nginx | grep Version
# Show apt history
less /var/log/apt/history.log
# Show apt terminal logs
less /var/log/apt/term.log
Before upgrading, run apt list --upgradable and review critical packages such as the kernel, OpenSSL, database, web server and runtime.
Repositories: official sources, PPAs, vendor repos and trust
APT installs packages from repositories. Repository governance is critical: every repository added to a production server becomes part of the trust and upgrade surface. Too many uncontrolled PPAs or vendor repositories can make upgrades unpredictable.
| Repository type | Usage | Risk | Production rule |
|---|---|---|---|
| Ubuntu main | Official supported packages. | Low. | Default baseline. |
| Ubuntu universe | Community-maintained packages. | Support scope differs. | Accept with awareness. |
| Security repo | Security updates. | Must stay enabled. | Never disable casually. |
| PPA | Community or project-specific builds. | Trust and compatibility risk. | Use only with explicit approval. |
| Vendor repo | Official software vendor packages. | Key, pinning and lifecycle complexity. | Document and monitor. |
| Local mirror | Enterprise-controlled package mirror. | Mirror freshness. | Useful for controlled fleets. |
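For vendor repositories, the key and the source entry are the two pieces worth standardizing. The sketch below generates a deb822-style source stanza that ties a repository to one specific keyring through `Signed-By`. The function name, the URL `packages.example.com`, the suite `noble` and the keyring path are all hypothetical placeholders, not a real vendor setup.

```shell
# vendor_source_stanza URI SUITE COMPONENT KEYRING
# Prints a deb822 source stanza that pins the repository to one keyring.
vendor_source_stanza() {
    printf 'Types: deb\nURIs: %s\nSuites: %s\nComponents: %s\nSigned-By: %s\n' \
        "$1" "$2" "$3" "$4"
}

# Usage (all values are placeholders):
#   vendor_source_stanza https://packages.example.com/ubuntu noble main \
#       /etc/apt/keyrings/example.gpg |
#       sudo tee /etc/apt/sources.list.d/example.sources
#   sudo apt update && apt policy package-name
```

Generating the stanza from parameters keeps every vendor entry in the same shape, which makes reviews and later removal simpler.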
Repository locations
# Main APT source files
/etc/apt/sources.list
/etc/apt/sources.list.d/
# Newer Ubuntu systems may use deb822 source files
/etc/apt/sources.list.d/*.sources
# Trusted keyring locations
/etc/apt/keyrings/
/usr/share/keyrings/
# Apt preferences and pinning
/etc/apt/preferences
/etc/apt/preferences.d/
Repository inspection commands
# Show active source files
ls -lah /etc/apt/sources.list.d/
cat /etc/apt/sources.list
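# Search configured repositories (one-line format)
grep -R "^deb" /etc/apt/sources.list /etc/apt/sources.list.d/
# Search deb822 source files, if present
grep -R "^URIs:" /etc/apt/sources.list.d/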
# Refresh repository metadata
sudo apt update
# Show repository used for package candidate
apt policy nginx
# Show all versions available
apt-cache madison nginx
# Show package origin details
apt-cache policy nginx
Vendor repository pattern
Recommended vendor repo pattern:
1. Add vendor signing key into /etc/apt/keyrings/
2. Add repository source referencing signed-by key
3. Run apt update
4. Check apt policy package
5. Install exact package
6. Document repository owner and reason
7. Monitor vendor release notes
8. Pin if required
Repository risk diagram
New repository added
│
├── Can replace existing packages?
├── Can introduce newer dependencies?
├── Can break upgrade path?
├── Is signing key controlled?
├── Is vendor trusted?
├── Is lifecycle documented?
└── Is rollback possible?
Updates: patching, reboot policy, golden images and upgrade windows
Ubuntu updates must balance security and stability. Security patches should be applied quickly, but critical production systems often require staging validation, maintenance windows and rollback plans. Kernel and libc-related updates may require service restart or full reboot.
| Update strategy | Best for | Strength | Watch out |
|---|---|---|---|
| Manual updates | Small systems, controlled maintenance. | Maximum human control. | Can be forgotten. |
| Unattended security updates | Standard servers. | Fast CVE patching. | Needs reboot/service restart policy. |
| Monthly patch window | Critical production. | Testing and coordination. | Emergency CVEs still need fast path. |
| Golden image replacement | Cloud fleets and autoscaling. | Reproducible and rollback-friendly. | Requires image pipeline. |
| Rolling patching | Clusters and HA services. | No full downtime. | Requires health checks and drain logic. |
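Several of these strategies depend on knowing whether a reboot is pending. A minimal sketch, assuming the standard Ubuntu flag file `/var/run/reboot-required` and its companion `.pkgs` list; the function name `reboot_status` and the exit-code convention (0 = nothing pending, 1 = reboot pending) are choices made for this example.

```shell
# reboot_status [FLAG_FILE]
# Prints whether a reboot is pending and, if so, which packages asked for it.
# Exit code: 0 when no reboot is pending, 1 when one is.
reboot_status() {
    flag="${1:-/var/run/reboot-required}"
    if [ -f "$flag" ]; then
        echo "reboot required"
        # The .pkgs file can list the same package more than once.
        [ -f "${flag}.pkgs" ] && sort -u "${flag}.pkgs"
        return 1
    fi
    echo "no reboot required"
}

# Usage after a patch run:
#   reboot_status || echo "schedule a reboot in the next maintenance window"
```

Taking the flag path as an optional argument keeps the function testable without touching the real system state.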
Update commands
# Refresh metadata
sudo apt update
# Show upgradeable packages
apt list --upgradable
# Upgrade packages
sudo apt upgrade
# More complete dependency-aware upgrade
sudo apt full-upgrade
# Remove unused dependencies
sudo apt autoremove
# Check if reboot is required
test -f /var/run/reboot-required && cat /var/run/reboot-required
# Show packages requiring reboot
cat /var/run/reboot-required.pkgs 2>/dev/null
Patch workflow
Patch workflow
│
├── Inventory
│   ├── OS version
│   ├── kernel version
│   ├── critical services
│   └── package list
│
├── Prepare
│   ├── backup
│   ├── snapshot
│   ├── staging test
│   └── maintenance window
│
├── Patch
│   ├── apt update
│   ├── apt upgrade
│   ├── service validation
│   └── reboot if required
│
└── Verify
    ├── systemctl --failed
    ├── journalctl warnings
    ├── listening ports
    ├── application smoke tests
    └── monitoring green
Unattended upgrades
# Install unattended upgrades
sudo apt install unattended-upgrades
# Configure automatic updates
sudo dpkg-reconfigure unattended-upgrades
# Main config files
/etc/apt/apt.conf.d/20auto-upgrades
/etc/apt/apt.conf.d/50unattended-upgrades
# Check logs
less /var/log/unattended-upgrades/unattended-upgrades.log
Security: CVEs, package provenance, keys and audit trail
Package security is about more than installing updates. It includes repository trust, signing keys, CVE awareness, dependency origin, package version visibility, automatic security updates, rollback and auditability.
| Security concern | Diagnostic | Control |
|---|---|---|
| Known vulnerable package | Security notices, scanner, package version. | Patch quickly, reboot/restart if needed. |
| Untrusted repository | Inspect sources and keys. | Remove unused PPAs and vendor repos. |
| Unsigned or broken repository | apt update errors. | Fix keyring or disable repository. |
| Package replaced by PPA | apt policy package. | Pin or remove repository. |
| No audit trail | APT history is not captured by the change process. | Record update windows and package changes. |
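One quick way to inspect sources and keys is to list repository entries that are not tied to a specific keyring. The sketch below covers only the one-line `deb ...` format in `.list` files; deb822 `.sources` files use a `Signed-By:` field and would need a separate check. The function name `sources_without_signed_by` is illustrative. An entry without `signed-by=` is not necessarily unsigned; it is simply trusted through the global keyrings rather than a dedicated one.

```shell
# sources_without_signed_by FILE...
# Prints "file:line" for active one-line deb/deb-src entries that do not
# carry a signed-by= option. Commented-out entries are ignored.
sources_without_signed_by() {
    grep -HE '^deb(-src)?[[:space:]]' "$@" | grep -v 'signed-by='
}

# Usage on a live system:
#   sources_without_signed_by /etc/apt/sources.list /etc/apt/sources.list.d/*.list
```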
Security inspection commands
# Show installed version and candidate
apt policy openssl
apt policy nginx
# Show package details
apt show openssl
# Show package changelog if available
apt changelog openssl
# Review apt history
less /var/log/apt/history.log
# Show recently modified source files
sudo find /etc/apt -type f -mtime -30 -ls
# Check Ubuntu Pro status if available
pro status
Package security flow
Security advisory or CVE
│
├── Identify affected package
│   └── apt policy package
│
├── Check installed version
│   └── dpkg -s package
│
├── Check available update
│   └── apt list --upgradable
│
├── Apply patch
│   └── apt install --only-upgrade package
│
├── Restart service if needed
│   └── systemctl restart service
│
├── Reboot if kernel/system library
│   └── reboot-required
│
└── Verify
    ├── version updated
    ├── service healthy
    └── logs clean
Key management principles
Good:
- vendor keys stored in /etc/apt/keyrings/
- repository line uses signed-by=
- repository owner documented
- old repositories removed
- package origin checked with apt policy
Avoid:
- legacy apt-key usage
- unknown curl | sudo bash scripts
- unmanaged PPAs
- repositories kept after one-time install
- blind upgrades without package review
Pinning, holds and version control
Pinning and holds control package versions. They are useful when a service depends on a specific version, when a repository offers unwanted newer packages, or when an upgrade must be temporarily blocked. They should be documented because forgotten pins can create security and maintenance risks.
| Mechanism | Purpose | Example use | Risk |
|---|---|---|---|
| apt-mark hold | Prevent package upgrades. | Freeze PostgreSQL or Nginx temporarily. | Security patches may be blocked. |
| APT preferences | Control repository priority. | Prefer Ubuntu repo over PPA. | Misconfiguration can select wrong packages. |
| Exact version install | Install specific version. | apt install package=version | Version may disappear from repo. |
| Golden image | Freeze whole system baseline. | Cloud server fleet. | Image must be rebuilt for patches. |
Hold commands
# Hold a package
sudo apt-mark hold nginx
# Show held packages
apt-mark showhold
# Remove hold
sudo apt-mark unhold nginx
# Install exact version
sudo apt install nginx=1.24.0-2ubuntu7
# Show available versions
apt-cache madison nginx
apt policy nginx
APT preferences example
# Example file:
# /etc/apt/preferences.d/nginx-pin
# Stanzas must be separated by a blank line.
Package: nginx*
Pin: release o=Ubuntu
Pin-Priority: 700

Package: nginx*
Pin: origin "ppa.launchpadcontent.net"
Pin-Priority: 400
Version governance flow
Need version control?
│
├── Is this temporary?
│   ├── yes -> apt-mark hold + ticket + expiry date
│   └── no
│
├── Is repo priority wrong?
│   ├── yes -> APT preferences pinning
│   └── no
│
├── Need fleet reproducibility?
│   ├── yes -> golden image or IaC
│   └── no
│
└── Document package policy
    ├── package
    ├── desired version
    ├── reason
    ├── owner
    └── review date
Pinning risks
| Risk | Cause | Control |
|---|---|---|
| Missed security update | Package held too long. | Review holds regularly. |
| Dependency conflict | Package versions drift. | Test upgrades in staging. |
| Wrong repo selected | Bad pin priority. | Check apt policy. |
| Hidden operational debt | No owner or expiry. | Document every hold and pin. |
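The "document every hold" control can be checked mechanically. The sketch below compares the output of `apt-mark showhold` with a file of documented holds and prints any hold that is missing from it. The function name and the file `/etc/apt/holds.documented` (one package name per line) are hypothetical conventions invented for this example.

```shell
# undocumented_holds DOCUMENTED_FILE
# Reads held package names on stdin (one per line) and prints those that
# do not appear, as an exact whole-line match, in DOCUMENTED_FILE.
undocumented_holds() {
    grep -v -x -F -f "$1"
}

# Usage on a live system (file name is an assumed convention):
#   apt-mark showhold | undocumented_holds /etc/apt/holds.documented
```

An empty result means every hold is documented; `grep` then exits with status 1, which is expected here and not an error.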
Snap: concept, commands, refresh behavior and production policy
Snap packages bundle applications with their dependencies and run with confinement rules. Snaps are useful for some desktop applications and selected server tools, but production teams must understand refresh behavior, confinement, channels and operational policy before relying on them.
| Snap concept | Meaning | Operational impact |
|---|---|---|
| Channel | Release track such as stable, candidate, beta, edge. | Controls risk level. |
| Confinement | Sandbox permissions model. | Can affect filesystem and device access. |
| Refresh | Automatic update behavior. | Needs maintenance window policy. |
| Revision | Specific snap build version. | Rollback may use previous revision. |
| Interface | Permission connection between snap and system resource. | May require manual connection. |
Snap essentials
# List installed snaps
snap list
# Find package
snap find code
# Install snap
sudo snap install package-name
# Install from specific channel
sudo snap install package-name --channel=stable
# Refresh snaps
sudo snap refresh
# Show refresh schedule
snap refresh --time
# Show snap information
snap info package-name
# Remove snap
sudo snap remove package-name
Snap operational commands
# Show connections/interfaces
snap connections package-name
# Connect interface manually
sudo snap connect package-name:interface
# Revert to previous revision if available
sudo snap revert package-name
# Hold refresh temporarily
sudo snap refresh --hold=24h package-name
# Hold all refreshes temporarily
sudo snap refresh --hold=24h
# Show changes
snap changes
# Show logs for snap service if applicable
snap logs package-name
APT vs Snap decision table
| Need | Prefer APT | Prefer Snap |
|---|---|---|
| Core server packages | Yes. | Usually no. |
| Desktop applications | Sometimes. | Often acceptable. |
| Strict patch window | Easier to control. | Refresh policy must be managed. |
| Sandboxed app delivery | Less direct. | Good fit. |
| Traditional system service | Usually better. | Depends on package and support model. |
APT and package troubleshooting
Package problems often come from broken dependencies, interrupted installs, repository errors, DNS issues, expired keys, dpkg locks, held packages or third-party repository conflicts. Troubleshooting should start by reading the actual APT error.
| Symptom | Likely cause | First command |
|---|---|---|
| Could not get lock | Another apt or dpkg process is running. | ps aux \| grep -E 'apt\|dpkg' |
| Temporary failure resolving | DNS problem. | resolvectl status |
| NO_PUBKEY | Missing repository signing key. | Inspect repository and keyring. |
| held broken packages | Dependency conflict or holds. | apt-mark showhold |
| Package version unexpected | PPA or pinning changed candidate. | apt policy package |
| Install interrupted | dpkg half-configured packages. | sudo dpkg --configure -a |
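The symptom table can also be kept as a small triage helper so the first check is consistent across a team. This is only a sketch: the function name `apt_error_hint` is illustrative, and the matched substrings are fragments of common APT messages, not an exhaustive list.

```shell
# apt_error_hint "ERROR TEXT"
# Maps a fragment of an APT error message to a suggested first check.
apt_error_hint() {
    case "$1" in
        *"Could not get lock"*)          echo "inspect running apt/dpkg processes" ;;
        *"Temporary failure resolving"*) echo "resolvectl status" ;;
        *NO_PUBKEY*)                     echo "inspect repository source and keyring" ;;
        *"held broken packages"*)        echo "apt-mark showhold" ;;
        *"dpkg was interrupted"*)        echo "sudo dpkg --configure -a" ;;
        *)                               echo "read the full error, then check apt policy" ;;
    esac
}

# Usage:
#   apt_error_hint "E: Could not get lock /var/lib/dpkg/lock-frontend"
```

The helper only prints a suggestion; it deliberately runs nothing, so it is safe to call from any script.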
Repair commands
# Repair interrupted dpkg configuration
sudo dpkg --configure -a
# Fix broken dependencies
sudo apt -f install
# Refresh metadata
sudo apt update
# Clean local package cache
sudo apt clean
# Check held packages
apt-mark showhold
# Check locks safely
ps aux | grep -E 'apt|dpkg'
# Review apt history
less /var/log/apt/history.log
less /var/log/apt/term.log
Troubleshooting decision tree
APT operation fails
│
├── Read exact error
│
├── Lock error?
│   └── wait or inspect apt/dpkg processes
│
├── Network or DNS error?
│   └── check ip, route, DNS, proxy
│
├── Repository signature error?
│   └── check source file and keyring
│
├── Dependency conflict?
│   └── apt -f install, apt policy, holds
│
├── Interrupted install?
│   └── dpkg --configure -a
│
└── Third-party repo conflict?
    └── disable repo, update, retry in staging
Repository isolation technique
# Temporarily disable a source file
sudo mv /etc/apt/sources.list.d/vendor.list \
/etc/apt/sources.list.d/vendor.list.disabled
# Refresh metadata
sudo apt update
# Re-check package candidate
apt policy package-name
Production best practices: governance, reproducibility and rollback
In production, package management must be reproducible. The same server role should use the same repositories, packages, versions, configuration and patching process. Manual package drift is a major source of incidents.
| Practice | Why it matters | Implementation |
|---|---|---|
| Approved repository list | Controls supply-chain risk. | Document Ubuntu, security and vendor repos. |
| Package baseline | Improves reproducibility. | Ansible, Packer, Terraform, cloud-init. |
| Patch windows | Reduces surprise outages. | Monthly standard, emergency CVE fast path. |
| Staging validation | Catches dependency and config breakage. | Upgrade staging before production. |
| Rollback plan | Limits outage duration. | Snapshot, AMI, previous image, package downgrade plan. |
| Change log | Enables incident diagnosis. | Ticket, deployment log, apt history archive. |
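A package baseline is only useful if drift from it can be detected. The sketch below compares a baseline list with the installed package list and prints what is missing. The function name `missing_packages` and the file paths in the usage comment are hypothetical; `dpkg-query -W -f='${Package}\n'` is one way to produce the installed list.

```shell
# missing_packages BASELINE_FILE INSTALLED_FILE
# Both files hold one package name per line. Prints baseline packages
# that are absent from the installed list. Blank lines are ignored.
missing_packages() {
    awk '
        FILENAME == ARGV[1] { have[$1] = 1; next }   # first file: installed set
        NF && !($1 in have) { print $1 }             # second file: baseline
    ' "$2" "$1"
}

# Usage on a live system (paths are examples):
#   dpkg-query -W -f='${Package}\n' > /tmp/installed.txt
#   missing_packages /srv/baseline/web-role.txt /tmp/installed.txt
```

Comparing on `FILENAME` rather than the usual `NR == FNR` idiom keeps the result correct even when the installed list is empty.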
Production package lifecycle
Package change request
│
├── Why is package needed?
├── Which repository provides it?
├── Is vendor trusted?
├── Is version pinned or floating?
├── Has staging been tested?
├── Is rollback possible?
└── Is owner documented?
│
▼
Approved installation
│
├── update IaC
├── apply in staging
├── validate
├── apply in production
└── document result
Production rules
Do:
- use Ubuntu LTS for production
- keep security repository enabled
- document every external repository
- prefer vendor official repositories over random PPAs
- test updates in staging
- track reboot-required state
- keep rollback snapshot or image
- automate package baseline
- review apt history after changes
- monitor security advisories
Avoid:
- unmanaged PPAs
- curl | sudo bash without review
- compiling manually into /usr/local without documentation
- mixing APT and Snap for the same service role
- holding packages forever
- patching critical systems without rollback
Infrastructure-as-code examples
Package baseline can be expressed in:
- Ansible apt module
- cloud-init packages section
- Packer image build
- Terraform user_data
- Dockerfile for containers
- shell bootstrap script under version control
Goal:
rebuild server from code, not memory.
Package management cheat sheet and final checklist
APT cheat sheet
# Metadata and updates
sudo apt update
apt list --upgradable
sudo apt upgrade
sudo apt full-upgrade
# Install and remove
sudo apt install package-name
sudo apt remove package-name
sudo apt purge package-name
sudo apt autoremove
# Inspect
apt show package-name
apt policy package-name
apt-cache madison package-name
dpkg -l | grep package-name
dpkg -L package-name
dpkg -S /path/to/file
# Troubleshoot
sudo dpkg --configure -a
sudo apt -f install
apt-mark showhold
less /var/log/apt/history.log
# Hold
sudo apt-mark hold package-name
sudo apt-mark unhold package-name
Snap cheat sheet
# Inspect
snap list
snap find package-name
snap info package-name
# Install and remove
sudo snap install package-name
sudo snap install package-name --channel=stable
sudo snap remove package-name
# Refresh
sudo snap refresh
snap refresh --time
sudo snap refresh --hold=24h package-name
# Operations
snap changes
snap connections package-name
snap logs package-name
sudo snap revert package-name
Final production checklist
[ ] Ubuntu official repositories are enabled
[ ] Security repository is enabled
[ ] External repositories are documented
[ ] Repository keys are managed in keyrings
[ ] PPAs are justified or avoided
[ ] Package baseline is automated
[ ] Critical package versions are known
[ ] Holds and pins are documented
[ ] Update policy is defined
[ ] Reboot policy is defined
[ ] Staging update test exists
[ ] Rollback image or snapshot exists
[ ] Apt history is reviewed after changes
[ ] Snap policy is defined
[ ] Security advisories are monitored
Final rule
APT and Snap are not just installation tools. They define what software runs, where it comes from, how it is patched, how it is upgraded, and how safely the system can recover when a package change goes wrong.
Customization and optimization objective
Ubuntu can be customized at several levels: desktop interface, GNOME extensions, themes, icons, fonts, keyboard shortcuts, startup applications, power settings, memory behavior and cleanup routines. The objective is to improve usability and performance without making the system fragile.
Good customization is controlled, reversible and documented. Bad customization creates unstable extensions, broken themes, slow login, excessive startup services, battery drain, hidden disk growth and difficult troubleshooting.
| Area | Goal | Main tools | Risk if unmanaged |
|---|---|---|---|
| GNOME interface | Improve desktop workflow. | Settings, Tweaks, Extensions. | Shell instability or visual inconsistency. |
| Themes and icons | Adapt visual style. | GTK themes, icon themes, user themes. | Broken UI after updates. |
| Keyboard shortcuts | Accelerate daily workflow. | Settings, custom commands, terminal shortcuts. | Conflicts and hard-to-remember mappings. |
| Battery | Reduce power usage on laptops. | Power profiles, TLP, powertop. | Thermal issues or poor battery life. |
| Memory tuning | Control swap behavior. | vm.swappiness, monitoring. | Slow system if tuned blindly. |
| Cleanup | Keep disk usage healthy. | APT cleanup, journal vacuum, cache review. | Disk full or accidental data loss. |
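For memory tuning, the safe order is to read the current value, test a change temporarily, and persist it only after measuring. A minimal sketch of that sequence follows. The function name `show_swappiness`, the value `10` and the file name `99-swappiness.conf` are illustrative assumptions, not recommendations.

```shell
# show_swappiness [FILE]
# Prints the current vm.swappiness value. FILE defaults to the kernel's
# procfs entry and is a parameter only so the function can be tested.
show_swappiness() {
    printf 'vm.swappiness=%s\n' "$(cat "${1:-/proc/sys/vm/swappiness}")"
}

# Measure first, then change temporarily (the change is lost at reboot):
#   show_swappiness
#   sudo sysctl vm.swappiness=10      # example value only
# Persist only after testing, for instance in a file such as
# /etc/sysctl.d/99-swappiness.conf (hypothetical name) containing:
#   vm.swappiness = 10
```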
Optimization map
Ubuntu workstation optimization
│
├── Interface
│   ├── GNOME Settings
│   ├── GNOME Tweaks
│   ├── dock behavior
│   ├── workspace behavior
│   └── display settings
│
├── Extensions
│   ├── shell extensions
│   ├── app indicators
│   ├── tiling helpers
│   └── workflow enhancers
│
├── Visual style
│   ├── GTK theme
│   ├── icon theme
│   ├── cursor theme
│   └── fonts
│
├── Productivity
│   ├── keyboard shortcuts
│   ├── terminal shortcuts
│   ├── custom commands
│   └── launcher workflow
│
└── Performance
    ├── startup apps
    ├── battery profile
    ├── swappiness
    ├── cache cleanup
    └── logs and disk hygiene
Decision shortcut
Want a better desktop?
├── first use built-in Settings
├── then GNOME Tweaks
├── then a few trusted extensions
└── avoid stacking many shell modifications
Want better performance?
├── remove useless startup apps
├── check disk and memory
├── tune battery profile
├── clean caches safely
└── measure before changing kernel parameters
GNOME interface: built-in customization first
Ubuntu Desktop uses GNOME with Ubuntu-specific defaults. Before installing extensions or themes, start with built-in settings: dock placement, appearance, workspaces, display scaling, night light, keyboard layout, privacy, notifications and power profile.
| Interface area | Where to configure | Useful for |
|---|---|---|
| Appearance | Settings → Appearance. | Light/dark mode, accent style, dock behavior. |
| Displays | Settings → Displays. | Resolution, scaling, multi-monitor layout. |
| Keyboard | Settings → Keyboard. | Shortcuts, input sources, custom commands. |
| Power | Settings → Power. | Battery profile, screen blank, suspend behavior. |
| Notifications | Settings → Notifications. | Reduce distractions. |
| Privacy | Settings → Privacy. | Location, file history, camera, microphone. |
GNOME Tweaks installation
# Install GNOME Tweaks
sudo apt update
sudo apt install gnome-tweaks
# Launch from terminal
gnome-tweaks
# Install extension app if available
sudo apt install gnome-shell-extension-manager
Interface customization flow
Customize desktop
│
├── Built-in Settings
│   ├── appearance
│   ├── display
│   ├── keyboard
│   ├── power
│   └── privacy
│
├── GNOME Tweaks
│   ├── fonts
│   ├── window behavior
│   ├── startup apps
│   └── themes if enabled
│
├── Extensions
│   ├── install only useful ones
│   ├── verify compatibility
│   └── disable if shell breaks
│
└── Backup preferences
    ├── document installed extensions
    ├── export dotfiles if needed
    └── keep restore point
Useful inspection commands
# GNOME Shell version
gnome-shell --version
# Current desktop session
echo "$XDG_CURRENT_DESKTOP"
echo "$XDG_SESSION_TYPE"
# Display environment
echo "$WAYLAND_DISPLAY"
echo "$DISPLAY"
# Installed GNOME packages
dpkg -l | grep -i gnome | head
# User configuration directories
ls -lah ~/.config
ls -lah ~/.local/share
GNOME extensions: workflow power with compatibility discipline
GNOME extensions modify the behavior of GNOME Shell. They can add indicators, tiling, dock improvements, clipboard managers, system monitors or workflow enhancements. However, extensions run inside the desktop shell environment and can break after GNOME upgrades if not maintained.
| Extension type | Use case | Risk |
|---|---|---|
| App indicators | Tray icons for apps. | Low to medium. |
| Dock customization | Dock behavior and visual changes. | Medium if overlapping Ubuntu dock. |
| Tiling assistants | Window snapping and layouts. | Medium if shell version changes. |
| System monitors | CPU, RAM, network indicators. | Can add overhead if badly implemented. |
| Theme/user shell | Shell visual customization. | Can break visual consistency. |
Install and manage extensions
# Install Extension Manager if available
sudo apt update
sudo apt install gnome-shell-extension-manager
# List enabled extensions
gnome-extensions list --enabled
# List all extensions
gnome-extensions list
# Show extension info
gnome-extensions info extension-name
# Disable extension
gnome-extensions disable extension-name
# Enable extension
gnome-extensions enable extension-name
Extension safety flow
Before installing extension
│
├── Is it really needed?
├── Is it compatible with GNOME version?
├── Is it maintained?
├── Does it overlap with another extension?
├── Can it be disabled easily?
└── Is there a restore point before major desktop changes?
Extension troubleshooting
# Disable one suspect extension (repeat for each candidate)
gnome-extensions disable extension-name
# Disable all user extensions at once for diagnosis
gsettings set org.gnome.shell disable-user-extensions true
# Check GNOME Shell logs
journalctl /usr/bin/gnome-shell --since "1 hour ago"
# Check session errors
journalctl --user -p warning --since "1 hour ago"
# Restart GNOME Shell on Xorg
# Press Alt+F2, type r, press Enter
# On Wayland, log out and log back in
Recommended extension policy
Good:
- install only a few extensions
- prefer maintained extensions
- remove unused extensions
- document core workflow extensions
- test after Ubuntu upgrade
Avoid:
- stacking many visual extensions
- installing abandoned extensions
- relying on extensions for critical access
- changing many extensions at once
- ignoring shell errors after login
GTK themes, icon themes, cursor themes and visual consistency
Ubuntu visual customization can use GTK themes, icon themes, cursor themes and fonts. Themes can improve comfort and readability, but deep theming may break after application or desktop updates, especially when applications use different toolkit versions.
| Theme element | What it changes | Typical location |
|---|---|---|
| GTK theme | Window and widget appearance. | ~/.themes, /usr/share/themes |
| Icon theme | Application and file icons. | ~/.icons, ~/.local/share/icons |
| Cursor theme | Mouse pointer style. | ~/.icons, system icon paths. |
| Shell theme | GNOME Shell top bar, menus, overview. | Requires user theme support. |
| Fonts | UI and document typography. | GNOME Tweaks. |
Theme directories
# User theme directories
mkdir -p ~/.themes
mkdir -p ~/.icons
mkdir -p ~/.local/share/icons
# System theme directories
ls -lah /usr/share/themes
ls -lah /usr/share/icons
# User config
ls -lah ~/.config
ls -lah ~/.local/share
Theme installation flow
Install theme safely
│
├── Download from trusted source
├── Extract theme
├── Place in user directory
│   ├── ~/.themes
│   └── ~/.icons
├── Open GNOME Tweaks
├── Select theme
├── Verify apps look correct
└── Keep original theme as fallback
Visual customization checklist
[ ] Theme source is trusted
[ ] Theme supports current GNOME/GTK version
[ ] Original theme remains available
[ ] Icons are readable in light and dark mode
[ ] Terminal colors remain readable
[ ] File manager remains usable
[ ] Browser and developer tools remain clear
[ ] Screenshots and presentations look professional
[ ] Theme can be reverted quickly
Common theme problems
| Problem | Likely cause | Correction |
|---|---|---|
| Invisible text | Theme color mismatch. | Return to default or compatible theme. |
| Broken window controls | Unsupported shell or GTK version. | Use maintained theme. |
| Icons missing | Incomplete icon theme. | Install fallback icon set. |
| App does not follow theme | Different toolkit or sandbox package. | Accept limitation or configure app separately. |
Keyboard shortcuts: customize workflow and reduce friction
Keyboard shortcuts are one of the highest-return customizations. They reduce mouse use, speed up window management, launch tools quickly and make development workflows smoother. The best shortcuts are easy to remember and do not conflict with application shortcuts.
| Shortcut area | Example action | Good candidate |
|---|---|---|
| Terminal | Open terminal quickly. | Ctrl + Alt + T |
| Window management | Move, maximize, tile windows. | Super + arrows. |
| Workspaces | Switch between focused contexts. | Super + Page Up/Page Down. |
| Screenshots | Capture screen or region. | Print Screen shortcuts. |
| Custom app launch | Open IDE, browser, file manager. | Custom commands. |
| Scripts | Run productivity automation. | Custom script binding. |
Custom shortcut flow
Create custom shortcut
│
├── Open Settings
├── Go to Keyboard
├── Open Keyboard Shortcuts
├── Add Custom Shortcut
├── Enter name
├── Enter command
├── Assign key combination
└── Test immediately
Useful custom commands
# Open terminal
gnome-terminal
# Open file manager
nautilus
# Open browser
firefox
# Open specific project directory
gnome-terminal --working-directory=/home/user/projects
# Run a custom script
/home/user/bin/daily-check.sh
# Lock screen (gnome-screensaver-command -l only works where gnome-screensaver is installed)
loginctl lock-session
Shortcut design principles
Good shortcuts:
- easy to remember
- close to existing habits
- not conflicting with IDE/browser
- consistent by category
- documented if custom
- limited to high-frequency actions
Avoid:
- too many shortcuts
- hard-to-type combinations
- overriding critical app shortcuts
- shortcuts that run destructive scripts
- undocumented production scripts
Developer workflow example
Workflow:
Super + Enter -> terminal
Super + E -> file manager
Super + B -> browser
Super + D -> IDE
Super + Shift + L -> lock screen
Super + Shift + M -> monitoring dashboard
Super + Shift + T -> project terminal
Battery and power optimization for laptops
Battery optimization on Ubuntu starts with power profiles, screen brightness, sleep behavior, background applications and hardware drivers. More advanced users can use tools like TLP or powertop, but should avoid applying random power tweaks without verifying their effect.
| Power area | Optimization | Trade-off |
|---|---|---|
| Power profile | Use power saver on battery. | Lower performance. |
| Screen brightness | Reduce brightness. | Less visibility in bright environment. |
| Sleep behavior | Shorter idle suspend. | May interrupt background tasks. |
| Startup apps | Disable unnecessary background apps. | Some apps need manual launch. |
| Bluetooth | Disable when unused. | Peripheral inconvenience. |
| GPU mode | Use integrated graphics if possible. | Lower graphics performance. |
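Verifying the effect of a power tweak starts with reading the battery state before and after the change. The following is a minimal sketch, assuming the kernel exposes batteries as BAT* directories under /sys/class/power_supply; the function name battery_report and its optional directory argument are illustrative, not part of any Ubuntu tool.

```shell
# Sketch: print name, capacity and charge status for each battery found.
# battery_report and its optional base-directory argument are hypothetical
# helpers introduced for this example.
battery_report() {
  local base="${1:-/sys/class/power_supply}"
  local dir name capacity state
  for dir in "$base"/BAT*; do
    # Skip when the glob did not match or the battery exposes no capacity file
    [ -r "$dir/capacity" ] || continue
    name="$(basename "$dir")"
    capacity="$(cat "$dir/capacity")"
    state="$(cat "$dir/status" 2>/dev/null || echo unknown)"
    printf '%s %s%% %s\n' "$name" "$capacity" "$state"
  done
}
```

Running battery_report before and after switching power profile, with the same workload and a fixed time interval, gives a rough but repeatable comparison.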
Power commands
# Show power profiles if supported
powerprofilesctl
# Set power saver
powerprofilesctl set power-saver
# Set balanced
powerprofilesctl set balanced
# Set performance if available
powerprofilesctl set performance
# Battery status
upower -i $(upower -e | grep BAT) 2>/dev/null
# Show running processes
top
# Show startup applications through GUI
gnome-session-properties
TLP and powertop
# Install TLP
sudo apt update
sudo apt install tlp
# Enable TLP (TLP and power-profiles-daemon conflict; run only one of them)
sudo systemctl enable --now tlp
# Show TLP status
sudo tlp-stat -s
# Install powertop
sudo apt install powertop
# Run powertop
sudo powertop
Battery optimization flow
Battery drains quickly
│
├── Check power profile
├── Reduce screen brightness
├── Close high CPU apps
├── Disable unused Bluetooth
├── Review startup apps
├── Check browser tabs
├── Check GPU mode
├── Use TLP if needed
└── Measure again
Laptop routine
On battery:
[ ] power-saver profile
[ ] lower brightness
[ ] close heavy browser tabs
[ ] stop unused containers or VMs
[ ] disable Bluetooth if unused
[ ] avoid heavy indexing jobs
[ ] monitor CPU usage
[ ] suspend when idle
Swappiness: memory behavior and swap tuning
Swappiness (the vm.swappiness sysctl, 60 by default) controls how strongly the Linux kernel prefers swapping out anonymous memory pages over dropping file cache when it has to reclaim memory. Lower values reduce the tendency to swap; higher values allow more swapping. It is not a magic performance setting: the right value depends on RAM size, workload, disk speed and latency tolerance.
| Context | Typical approach | Reason |
|---|---|---|
| Desktop with enough RAM | Moderately low swappiness. | Keep apps responsive. |
| Small laptop | Do not disable swap blindly. | Swap can prevent abrupt OOM. |
| Database server | Avoid active swapping. | Swap can hurt latency heavily. |
| Batch workload | Some swap may be acceptable. | Throughput may tolerate latency. |
| VM with slow disk | Be careful with swap activity. | Slow storage amplifies latency. |
Inspect memory and swappiness
# Current swappiness
cat /proc/sys/vm/swappiness
sysctl vm.swappiness
# Memory overview
free -h
# Swap devices/files
swapon --show
# Swap activity
vmstat 1
# Top memory processes
ps aux --sort=-%mem | head -30
Temporary and persistent swappiness
# Temporary change until reboot
sudo sysctl -w vm.swappiness=10
# Persistent configuration: write the setting to a drop-in file
echo 'vm.swappiness = 10' | sudo tee /etc/sysctl.d/99-custom-swappiness.conf
# Apply persistent sysctl files
sudo sysctl --system
# Verify
sysctl vm.swappiness
Swappiness decision tree
Considering swappiness change?
│
├── Is there real swap activity?
│   ├── no -> do not tune yet
│   └── yes
│
├── Is system slow because of swapping?
│   ├── no -> investigate app first
│   └── yes
│
├── Is RAM insufficient?
│   ├── yes -> reduce workload or add RAM
│   └── no
│
└── Test lower value
    ├── apply temporarily
    ├── measure behavior
    ├── document result
    └── make persistent only if useful
Memory interpretation
| Signal | Meaning |
|---|---|
| High used memory | Normal if Linux is using cache. |
| Low available memory | Possible pressure. |
| Swap used but stable | Not always a problem. |
| Active swap in/out | Performance warning. |
| OOM logs | Memory exhaustion occurred. |
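To tell "swap used but stable" apart from "active swap in/out", the kernel's cumulative swap counters can be compared between two points in time. The sketch below assumes the pswpin and pswpout counters in /proc/vmstat; the helper names read_swap_counters and swap_delta are hypothetical and only illustrate the idea.

```shell
# read_swap_counters [FILE]: print "pswpin pswpout" (pages swapped in/out since boot)
read_swap_counters() {
  awk '$1 == "pswpin"  { i = $2 }
       $1 == "pswpout" { o = $2 }
       END { print i + 0, o + 0 }' "${1:-/proc/vmstat}"
}

# swap_delta BEFORE_FILE AFTER_FILE: pages swapped between two saved snapshots
swap_delta() {
  local before after
  before="$(read_swap_counters "$1")"
  after="$(read_swap_counters "$2")"
  # Intentional word splitting: $1 $2 = before in/out, $3 $4 = after in/out
  set -- $before $after
  echo "in=$(( $3 - $1 )) out=$(( $4 - $2 ))"
}
```

A possible usage: copy /proc/vmstat to a file, wait a minute under normal load, copy it again, then run swap_delta on the two files. Values that stay at zero suggest swappiness tuning is not the first thing to change.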
Cleanup: temporary files, caches, logs and safe disk hygiene
Cleanup keeps Ubuntu healthy, but careless cleanup can delete useful data. Focus on safe areas first: APT cache, unused packages, journal size, trash, thumbnails, old downloads and application caches. Be very careful with database directories, Docker volumes and project folders.
| Cleanup target | Command / location | Safety level |
|---|---|---|
| APT cache | sudo apt clean | Safe. |
| Unused packages | sudo apt autoremove | Usually safe, review output. |
| Systemd journal | sudo journalctl --vacuum-time=14d | Safe if retention is acceptable. |
| User trash | File manager or trash path. | Safe if reviewed. |
| Downloads | ~/Downloads | Manual review recommended. |
| Docker data | docker system df | Careful, volumes may contain data. |
| Database files | /var/lib/mysql, /var/lib/postgresql | Dangerous to delete manually. |
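Before cleaning anything, it helps to know which filesystem is actually full. This is a small sketch that filters `df -P` output by usage percentage; the function name filesystems_over and the default 80% limit are assumptions chosen for illustration, and mount points containing spaces are not handled.

```shell
# filesystems_over [LIMIT]: read `df -P` output on stdin and print
# "mountpoint usage%" for every filesystem at or above LIMIT percent.
filesystems_over() {
  local limit="${1:-80}"
  awk -v limit="$limit" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)          # "90%" -> "90"
      if (use + 0 >= limit) print $6, use "%"
    }'
}
```

For example, `df -P | filesystems_over 85` lists only the mount points that deserve attention, which keeps cleanup focused on one location at a time.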
Safe cleanup commands
# Check disk usage first
df -h
# Show top-level directory sizes
sudo du -xhd1 / 2>/dev/null | sort -h
# Clean APT cache
sudo apt clean
# Remove unused packages
sudo apt autoremove
# Show journal size
journalctl --disk-usage
# Vacuum journal by time
sudo journalctl --vacuum-time=14d
# Vacuum journal by size
sudo journalctl --vacuum-size=1G
User cache cleanup
# Check user cache size
du -sh ~/.cache 2>/dev/null
# Check thumbnails
du -sh ~/.cache/thumbnails 2>/dev/null
# Remove thumbnail cache
rm -rf ~/.cache/thumbnails/*
# Review downloads manually
du -sh ~/Downloads/*
ls -lah ~/Downloads
Docker cleanup caution
# Show Docker disk usage
docker system df
# Remove unused images only
docker image prune
# Remove stopped containers
docker container prune
# More aggressive cleanup, use carefully
docker system prune
# Dangerous for persistent data if volumes included
docker system prune --volumes
Cleanup decision tree
Need disk space?
│
├── Check filesystem
│   └── df -h
│
├── Find large directories
│   └── du -xhd1 /
│
├── Safe cleanup first
│   ├── apt clean
│   ├── apt autoremove
│   ├── journal vacuum
│   └── trash/downloads review
│
├── App-specific cleanup
│   ├── browser cache
│   ├── Docker images
│   └── old build artifacts
│
└── Dangerous data zones
    ├── databases
    ├── Docker volumes
    ├── project data
    └── backups
Troubleshooting customization and optimization problems
Customization problems usually appear after installing extensions, changing themes, modifying startup apps, changing power tools, altering sysctl settings or cleaning too aggressively. The fastest recovery is to isolate the last change and revert it.
| Symptom | Likely cause | First check | Fix direction |
|---|---|---|---|
| Desktop shell unstable | Broken GNOME extension. | gnome-extensions list --enabled | Disable recent extension. |
| Text unreadable | Theme mismatch. | GNOME Tweaks theme settings. | Return to default theme. |
| Login slow | Startup apps or extensions. | User journal and startup apps. | Disable nonessential startup entries. |
| Battery drains fast | High CPU app, containers, VM, browser. | top, power profile. | Stop heavy workload, set power saver. |
| System slow after tuning | Bad sysctl or swap behavior. | vmstat 1, sysctl values. | Revert tuning. |
| Missing files after cleanup | Over-aggressive delete. | Trash, backup, shell history. | Restore from backup if possible. |
Diagnostic commands
# GNOME version and session
gnome-shell --version
echo $XDG_SESSION_TYPE
# Enabled extensions
gnome-extensions list --enabled
# User session warnings
journalctl --user -p warning --since "1 hour ago"
# GNOME Shell logs
journalctl /usr/bin/gnome-shell --since "1 hour ago"
# Resource usage
top
free -h
df -h
# Swappiness
sysctl vm.swappiness
Rollback flow
Customization issue
│
├── What changed last?
│   ├── extension
│   ├── theme
│   ├── shortcut
│   ├── startup app
│   ├── power tool
│   └── sysctl value
│
├── Disable or revert one change
├── Log out and log back in if needed
├── Check user journal
├── Verify desktop stability
└── Document stable configuration
Safe mode mindset
If desktop is unstable:
1. Switch to terminal if possible
2. Disable recent extensions
3. Return to default theme
4. Remove recent startup app
5. Reboot or log out
6. Restore Timeshift snapshot if needed
Useful reset targets
# Disable one extension
gnome-extensions disable extension-name
# List user autostart entries
ls -lah ~/.config/autostart
# Move suspicious autostart entry away
mkdir -p ~/.config/autostart.disabled
mv ~/.config/autostart/app.desktop ~/.config/autostart.disabled/
# Revert sysctl custom file, then reset the running value (60 is the kernel default)
sudo mv /etc/sysctl.d/99-custom-swappiness.conf /tmp/
sudo sysctl --system
sudo sysctl -w vm.swappiness=60
Final customization and optimization checklist
Customization checklist
[ ] Built-in Settings used before extensions
[ ] GNOME Tweaks installed if needed
[ ] Extension list is short and useful
[ ] Extensions are compatible with GNOME version
[ ] Unused extensions removed
[ ] Theme source is trusted
[ ] Default theme remains available
[ ] Icons remain readable
[ ] Terminal colors remain readable
[ ] Keyboard shortcuts are documented
[ ] No shortcut runs destructive command
[ ] Startup applications are reviewed
[ ] Restore point exists before major desktop changes
Optimization checklist
[ ] Power profile configured
[ ] Battery behavior reviewed
[ ] Heavy startup apps disabled
[ ] Disk usage checked
[ ] APT cache cleaned when needed
[ ] Journal size controlled
[ ] User caches reviewed
[ ] Docker usage reviewed if installed
[ ] Swappiness observed before tuning
[ ] sysctl changes documented
[ ] Performance measured before and after changes
[ ] Cleanup avoids databases and important volumes
Command cheat sheet
# GNOME and extensions
gnome-shell --version
gnome-extensions list
gnome-extensions list --enabled
gnome-extensions disable extension-name
sudo apt install gnome-tweaks
sudo apt install gnome-shell-extension-manager
# Power
powerprofilesctl
powerprofilesctl set power-saver
powerprofilesctl set balanced
sudo apt install tlp
sudo systemctl enable --now tlp
sudo tlp-stat -s
# Memory and swappiness
free -h
swapon --show
vmstat 1
sysctl vm.swappiness
sudo sysctl -w vm.swappiness=10
# Cleanup
df -h
sudo du -xhd1 / 2>/dev/null | sort -h
sudo apt clean
sudo apt autoremove
journalctl --disk-usage
sudo journalctl --vacuum-time=14d
Final rule
Customize GNOME carefully, keep extensions minimal, use readable themes, build a keyboard-driven workflow, optimize battery and memory only with evidence, clean disk space safely, and keep rollback options before major changes.
Minimal safe profile
Minimum safe customization profile:
- default theme fallback
- small extension set
- documented shortcuts
- reviewed startup apps
- power profile selected
- disk cleanup routine
- no blind sysctl tuning
- no dangerous cleanup
- restore point before major changes
- stable desktop after logout/reboot test
Security hardening objective
Ubuntu hardening means reducing the attack surface of a machine while keeping it maintainable. The goal is not to make the server impossible to use. The goal is to control access, reduce exposed ports, keep packages patched, monitor suspicious events, protect secrets, isolate services and keep a clear recovery path.
A secure Ubuntu server is built layer by layer: SSH access, users and sudo, firewall, package updates, service isolation, log visibility, intrusion throttling, cloud network rules, backups and incident procedures.
| Security layer | Goal | Main tools | Failure prevented |
|---|---|---|---|
| SSH | Control remote administration. | sshd_config, SSH keys, logs. | Brute force, root login abuse, password compromise. |
| Firewall | Expose only required ports. | ufw, nftables, cloud security groups. | Unwanted network exposure. |
| Users and sudo | Apply least privilege. | adduser, usermod, sudoers. | Shared accounts, excessive privileges, poor auditability. |
| Updates | Patch known vulnerabilities. | apt, unattended upgrades, reboot policy. | Known CVEs left exploitable. |
| Audit | See what happened. | journalctl, auth.log, auditd, central logs. | Blind incidents and no forensic trail. |
| Cloud | Control external exposure and identity. | Security groups, IAM, metadata settings, snapshots. | Public services, leaked secrets, weak recovery. |
Hardening architecture map
Internet
│
├── DNS
├── CDN / WAF / Load Balancer
├── Cloud security group
│
▼
Ubuntu server
│
├── UFW / nftables
├── SSH daemon
├── system users and sudo
├── systemd services
├── package security updates
├── logs and audit trail
├── fail2ban or rate controls
├── secrets and permissions
├── backups and restore plan
│
▼
Application layer
├── Nginx
├── app runtime
├── database
├── Redis
└── monitoring agent
Security baseline priorities
Priority 1:
- SSH keys
- no root SSH login
- firewall enabled
- security updates
- backups
Priority 2:
- fail2ban or equivalent
- sudo policy
- service users
- secret permissions
- log review
Priority 3:
- auditd
- central logging
- file integrity checks
- vulnerability scanning
- CIS-style benchmark review
Priority 4:
- bastion host
- VPN-only administration
- WAF
- immutable images
- automated rebuild
SSH hardening: keys, root login, password policy and safe reload
SSH is usually the main administration door. On a public server, weak SSH configuration is one of the first risks to address. The safe baseline is key-based login, no direct root login, no password authentication when keys are validated, and limited users.
| Setting | Recommended value | Why |
|---|---|---|
| PermitRootLogin | no | Forces named-user login and sudo audit trail. |
| PasswordAuthentication | no | Blocks password brute-force login. |
| PubkeyAuthentication | yes | Uses SSH keys. |
| AllowUsers | Specific admin users only. | Reduces account exposure. |
| X11Forwarding | no on servers. | Reduces unused features. |
| MaxAuthTries | Small value such as 3. | Limits repeated authentication attempts. |
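Values written in sshd_config can be overridden by drop-in files under /etc/ssh/sshd_config.d/, so it is worth checking the effective configuration that `sshd -T` prints (lowercase keyword, then value). The following sketch compares that output with the baseline in the table; check_sshd_baseline is a hypothetical helper name, and the set of checked keys is an assumption limited to the yes/no settings above.

```shell
# check_sshd_baseline: read `sshd -T` output on stdin, report deviations
# from the yes/no baseline, and return non-zero if any are found.
check_sshd_baseline() {
  awk '
    BEGIN {
      want["permitrootlogin"] = "no"
      want["passwordauthentication"] = "no"
      want["pubkeyauthentication"] = "yes"
      want["x11forwarding"] = "no"
    }
    ($1 in want) {
      seen[$1] = 1
      if ($2 != want[$1]) {
        printf "MISMATCH %s=%s (expected %s)\n", $1, $2, want[$1]
        bad = 1
      }
    }
    END {
      for (k in want) if (!(k in seen)) { printf "MISSING %s\n", k; bad = 1 }
      exit bad + 0
    }'
}
```

A possible usage is `sudo sshd -T | check_sshd_baseline`, run from a second session after reloading SSH and before closing the original one.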
Generate and install key
# On admin workstation
ssh-keygen -t ed25519 -C "admin-server-access"
# Copy public key to server
ssh-copy-id deploy@server.example.com
# Test key login before changing server policy
ssh deploy@server.example.com
Safe SSH hardening flow
Open current SSH session
│
├── Create deploy user
├── Add SSH key
├── Test second SSH session
├── Backup sshd_config
├── Apply hardening
├── Validate syntax
├── Restart SSH
├── Test third SSH session
└── Close old session only after success
Server-side SSH configuration
# Create backup
sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak.$(date +%Y%m%d-%H%M%S)
# Edit configuration
sudo vim /etc/ssh/sshd_config
# Recommended baseline (directives to set inside sshd_config, not shell commands)
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
X11Forwarding no
MaxAuthTries 3
AllowUsers deploy
# Validate syntax before restart
sudo sshd -t
# Restart SSH
sudo systemctl restart ssh
# Check service and logs
systemctl status ssh
journalctl -u ssh --since "15 min ago"
SSH diagnostic commands
# Service status
systemctl status ssh
# Listening port (sudo is needed to show the owning process)
sudo ss -lntp | grep -E 'sshd|:22 '
# Authentication logs
journalctl -u ssh --since today
sudo tail -100 /var/log/auth.log
# Current sessions
who
w
# Show user key file permissions
ls -lah ~/.ssh
ls -lah ~/.ssh/authorized_keys
UFW firewall: minimal exposure and safe activation
UFW is a simple firewall frontend commonly used on Ubuntu. The baseline is to deny incoming traffic by default, allow outgoing traffic, then open only the required service ports. On cloud servers, UFW complements cloud security groups; it does not replace them.
| Port | Service | Exposure rule | Comment |
|---|---|---|---|
| 22/tcp | SSH | Restrict by source IP if possible. | Administration path. |
| 80/tcp | HTTP | Open only for web server or redirect. | Often redirects to HTTPS. |
| 443/tcp | HTTPS | Open for public web apps. | Primary web entry point. |
| 5432/tcp | PostgreSQL | Private network only. | Never public unless heavily controlled. |
| 6379/tcp | Redis | Private network only. | Do not expose publicly. |
| 3306/tcp | MySQL/MariaDB | Private network only. | Restrict by source and credentials. |
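The exposure rules above are easier to apply once you know which ports actually listen on non-loopback addresses. This is a rough sketch that filters `ss -lntH` output; the function name public_listeners is hypothetical, and the loopback test (addresses starting with 127. or equal to [::1]) is a simplifying assumption.

```shell
# public_listeners: read `ss -lntH` output on stdin and print the TCP ports
# that listen on something other than a loopback address.
public_listeners() {
  awk '
    $1 != "LISTEN" { next }                  # ignore headers and other states
    {
      n = split($4, parts, ":")
      port = parts[n]                        # text after the last colon
      addr = substr($4, 1, length($4) - length(port) - 1)
      if (addr ~ /^127\./ || addr == "[::1]") next
      print port
    }' | sort -un
}
```

Running `ss -lntH | public_listeners` and comparing the result with `sudo ufw status` shows whether each reachable port has a documented reason to be open.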
Safe UFW baseline
# Show current status
sudo ufw status verbose
# Default policy
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Allow SSH before enabling firewall
sudo ufw allow OpenSSH
# Web server ports if needed
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Enable firewall
sudo ufw enable
# Verify
sudo ufw status verbose
sudo ufw status numbered
Firewall decision diagram
New service installed
│
├── Does it need network access?
│   ├── no -> keep local only
│   └── yes
│
├── Is it public-facing?
│   ├── yes -> allow only required public port
│   └── no
│
├── Is it internal only?
│   ├── yes -> restrict to private CIDR or source IP
│   └── no
│
└── Is exposure documented?
    ├── yes -> add rule
    └── no -> do not expose
Restrict access by source
# Allow SSH only from one admin IP
sudo ufw allow from 203.0.113.10 to any port 22 proto tcp
# Allow PostgreSQL only from app server
sudo ufw allow from 10.0.1.25 to any port 5432 proto tcp
# Delete rule by number
sudo ufw status numbered
sudo ufw delete 3
# Deny a specific IP (insert at position 1 so it is evaluated before existing allow rules)
sudo ufw insert 1 deny from 198.51.100.55
UFW diagnostics
# UFW status
sudo ufw status verbose
# Listening ports
ss -lntp
# Check service locally
curl -I http://localhost
# Check logs if logging enabled
sudo ufw logging on
sudo journalctl -k --since "30 min ago" | grep UFW
Users, groups, sudo, service accounts and least privilege
Least privilege means each human and service gets only the permissions needed to do its job. Avoid shared admin accounts, avoid running applications as root, and keep secrets readable only by the users that need them.
| Identity type | Recommended practice | Example |
|---|---|---|
| Human admin | Named account with sudo if required. | deploy, ops_admin |
| Application user | Dedicated non-login user. | myapp, www-data |
| Database user | Application-specific DB account. | myapp_db_user |
| Root | Avoid direct login. | Use sudo with audit trail. |
| Shared account | Avoid. | Hard to audit and revoke safely. |
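One quick way to check that secrets are readable only by the users that need them is to list files that any local user can read. A minimal sketch, assuming the application lives in a single directory tree; find_world_readable is an illustrative helper name.

```shell
# find_world_readable DIR: list regular files under DIR whose mode grants
# read permission to "other" users, sorted for stable output.
find_world_readable() {
  find "$1" -type f -perm -o=r -print 2>/dev/null | sort
}
```

For a directory such as /srv/myapp, an empty result from `sudo find_world_readable /srv/myapp` is not possible as written because sudo cannot call shell functions; run it from a root shell, or run the find command directly with sudo.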
User and group commands
# Create admin user
sudo adduser deploy
sudo usermod -aG sudo deploy
# Create service user without login shell
sudo adduser --system --group --home /srv/myapp myapp
# Show user identity
id deploy
groups deploy
# Show sudo permissions
sudo -l
# Edit sudoers safely
sudo visudo
# Add sudoers file safely
sudo visudo -f /etc/sudoers.d/deploy
Least privilege model
Human admin
│
├── SSH key login
├── sudo for admin actions
└── no direct root login
Application service
│
├── dedicated user
├── limited filesystem access
├── systemd service unit
└── no shell login if not needed
Secrets
│
├── owned by service user or root
├── mode 600 or 640
├── not world-readable
└── not committed to git
Secret and file permissions
# Private key
chmod 600 /home/deploy/.ssh/id_ed25519
# SSH directory
chmod 700 /home/deploy/.ssh
# Application directory (apply the recursive change first)
sudo chown -R myapp:www-data /srv/myapp
sudo chmod -R u=rwX,g=rX,o= /srv/myapp
# Application env file (set last so the recursive commands above do not undo it)
sudo chown root:myapp /srv/myapp/.env
sudo chmod 640 /srv/myapp/.env
Account review commands
# List users
cut -d: -f1 /etc/passwd
# Show users with shell access
grep -E "/bin/bash|/bin/sh|/usr/bin/zsh" /etc/passwd
# Show sudo group members
getent group sudo
# Show recent sudo usage
sudo grep sudo /var/log/auth.log | tail -100
Security updates, patch windows and reboot policy
Security updates close known vulnerabilities. On Ubuntu, patching must include package updates, service restarts, kernel reboots when required, and validation after patching. Production teams should define standard patch windows and emergency patch paths.
| Patch model | Best for | Advantage | Risk |
|---|---|---|---|
| Manual patching | Critical systems with maintenance windows. | Control and validation. | Can be delayed. |
| Unattended security updates | Standard servers. | Fast CVE response. | Needs restart and reboot policy. |
| Golden image rebuild | Cloud fleets and stateless systems. | Reproducible and rollback-friendly. | Requires image pipeline. |
| Rolling patching | HA clusters. | Minimizes downtime. | Requires health checks and drain logic. |
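Whichever patch model is used, the reboot policy needs a reliable signal. On Ubuntu that signal is the /var/run/reboot-required flag file, with the triggering packages listed in the matching .pkgs file. The sketch below wraps the check in a function with a non-zero exit status so that a script or monitoring job can branch on it; the name reboot_status and the optional path argument are assumptions for illustration.

```shell
# reboot_status [FLAG_FILE]: report whether a reboot is pending.
# Returns 0 when no reboot is needed, 1 when the flag file exists.
reboot_status() {
  local flag="${1:-/var/run/reboot-required}"
  if [ -f "$flag" ]; then
    echo "reboot required"
    # List the packages that requested the reboot, without duplicates
    if [ -f "$flag.pkgs" ]; then
      sort -u "$flag.pkgs"
    fi
    return 1
  fi
  echo "no reboot flag"
}
```

In a patch window script this could be used as `reboot_status || echo "schedule reboot"`, leaving the actual reboot decision to the team's policy.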
Patch commands
# Refresh metadata
sudo apt update
# Show upgradeable packages
apt list --upgradable
# Apply upgrades
sudo apt upgrade
# Full upgrade with dependency changes
sudo apt full-upgrade
# Remove obsolete packages
sudo apt autoremove
# Check reboot requirement
test -f /var/run/reboot-required && cat /var/run/reboot-required
cat /var/run/reboot-required.pkgs 2>/dev/null
Patch workflow diagram
Security update required
│
├── Identify affected packages
├── Check staging compatibility
├── Snapshot or backup
├── Apply apt updates
├── Restart affected services
├── Reboot if required
├── Validate application
├── Check logs
└── Document package changes
Unattended upgrades
# Install unattended upgrades
sudo apt install unattended-upgrades
# Enable basic automatic security updates
sudo dpkg-reconfigure unattended-upgrades
# Config files (paths to review, not commands)
# /etc/apt/apt.conf.d/20auto-upgrades
# /etc/apt/apt.conf.d/50unattended-upgrades
# Logs
sudo less /var/log/unattended-upgrades/unattended-upgrades.log
Post-patch validation
# Failed services
systemctl --failed
# Warnings since patch
journalctl -p warning --since "30 min ago"
# Listening ports
ss -lntp
# Application smoke test
curl -I https://example.com
# Confirm kernel after reboot
uname -a
fail2ban: throttling brute-force attempts and noisy clients
fail2ban watches logs and temporarily bans IP addresses that match suspicious patterns, such as repeated SSH authentication failures. It is not a replacement for key-based SSH and firewall rules, but it is useful as an extra layer against brute-force noise.
| Component | Meaning | Example |
|---|---|---|
| Jail | Protection rule for a service. | sshd |
| Filter | Log pattern that detects failures. | SSH failed login regex. |
| Action | What to do when matched. | Ban IP with firewall. |
| findtime | Time window for counting failures. | 10m |
| maxretry | Number of failures before ban. | 5 |
| bantime | Ban duration. | 1h |
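The maxretry idea can be illustrated by counting failed SSH password attempts per source address in a log. This is only a sketch of the counting logic, not how fail2ban itself is implemented: it ignores the findtime window, and the function name failed_ssh_sources is hypothetical. It assumes OpenSSH's usual "Failed password for ... from ADDRESS port ..." message format.

```shell
# failed_ssh_sources [THRESHOLD]: read auth log lines on stdin and print
# "count address" for every source with at least THRESHOLD failures.
failed_ssh_sources() {
  local threshold="${1:-5}"
  awk -v t="$threshold" '
    /Failed password/ {
      # Walk backwards so the last "from" (the one before the address) wins
      for (i = NF - 1; i >= 1; i--) {
        if ($i == "from") { count[$(i + 1)]++; break }
      }
    }
    END { for (ip in count) if (count[ip] >= t) print count[ip], ip }
  ' | sort -rn
}
```

For example, `sudo cat /var/log/auth.log | failed_ssh_sources 5` gives a quick view of which sources a jail with maxretry = 5 would be likely to act on.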
Install and baseline
# Install
sudo apt update
sudo apt install fail2ban
# Create local jail config
sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
# Edit local config
sudo vim /etc/fail2ban/jail.local
# Restart and enable
sudo systemctl enable fail2ban
sudo systemctl restart fail2ban
# Check status
sudo systemctl status fail2ban
Example SSH jail
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = %(sshd_log)s
maxretry = 5
findtime = 10m
bantime = 1h
fail2ban operations
# Overall status
sudo fail2ban-client status
# Jail status
sudo fail2ban-client status sshd
# Ban an IP manually
sudo fail2ban-client set sshd banip 198.51.100.10
# Unban an IP
sudo fail2ban-client set sshd unbanip 198.51.100.10
# Logs
sudo journalctl -u fail2ban --since today
sudo tail -100 /var/log/fail2ban.log
Layered protection
SSH security layers
│
├── SSH keys
├── no root login
├── no password auth
├── AllowUsers
├── UFW source restriction
├── fail2ban
└── bastion or VPN for stricter environments
Audit, logs, detection and security visibility
Hardening without visibility is incomplete. You need to know who logged in, who used sudo, which services failed, what ports are listening, which packages changed, and whether suspicious authentication events occurred.
| Question | Command / source | Why it matters |
|---|---|---|
| Who logged in? | last, who, w | Session visibility. |
| Who used sudo? | /var/log/auth.log | Privilege escalation audit. |
| Which SSH attempts failed? | journalctl -u ssh | Brute-force or misconfiguration detection. |
| Which packages changed? | /var/log/apt/history.log | Patch and change traceability. |
| Which services failed? | systemctl --failed | Operational health. |
| Which ports are open? | ss -lntp | Exposure check. |
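For the "who used sudo?" question, the relevant auth.log entries contain the invoking user and a COMMAND= field. The sketch below reduces those lines to "user command"; sudo_commands is an illustrative helper name, and the pattern assumes the standard sudo log format ("sudo: USER : TTY=... ; COMMAND=...").

```shell
# sudo_commands: read auth log lines on stdin and print "user command"
# for each logged sudo invocation. Other sudo lines (PAM session
# messages, for example) are ignored because they have no COMMAND= field.
sudo_commands() {
  sed -n 's/.*sudo: *\([^ ]*\) : .*COMMAND=\(.*\)$/\1 \2/p'
}
```

A possible usage is `sudo cat /var/log/auth.log | sudo_commands | sort | uniq -c | sort -rn`, which summarizes the most frequent privileged commands per user.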
Security log commands
# SSH logs
journalctl -u ssh --since today
# Authentication logs
sudo tail -200 /var/log/auth.log
# Failed SSH attempts
sudo grep -i "failed password" /var/log/auth.log | tail -100
# Sudo usage
sudo grep -i "sudo" /var/log/auth.log | tail -100
# Recent logins
last -a | head -30
# Current sessions
who
w
Audit architecture
Ubuntu host
│
├── journald
├── auth.log
├── apt history
├── service logs
├── firewall logs
├── fail2ban logs
├── application logs
│
▼
Central logging
│
├── CloudWatch
├── ELK / OpenSearch
├── Loki
├── SIEM
└── long-term archive
auditd baseline
# Install audit daemon
sudo apt install auditd audispd-plugins
# Enable service
sudo systemctl enable auditd
sudo systemctl start auditd
# Status
sudo systemctl status auditd
# Search audit logs
sudo ausearch -m USER_LOGIN
sudo ausearch -m USER_CMD
sudo aureport --summary
Security review snapshot
echo "== USERS WITH SHELL =="
grep -E "/bin/bash|/bin/sh|/usr/bin/zsh" /etc/passwd
echo "== SUDO GROUP =="
getent group sudo
echo "== OPEN PORTS =="
ss -lntp
echo "== FAILED UNITS =="
systemctl --failed
echo "== RECENT SSH LOGS =="
journalctl -u ssh --since "24 hours ago" --no-pager
Cloud security: security groups, metadata, IAM, snapshots and bastion design
On cloud servers, Ubuntu security is shared between the operating system and the cloud perimeter. A safe design uses security groups, private subnets, IAM roles, metadata protection, snapshots, central logs and restricted administration paths.
| Cloud control | Purpose | Ubuntu-side complement |
|---|---|---|
| Security group | Cloud firewall at instance or interface level. | UFW local firewall. |
| Private subnet | Keep databases and internal services non-public. | Bind services to private IP or localhost. |
| Bastion host | Controlled admin entry point. | SSH restricted to bastion IP. |
| IAM role | Grant cloud API permissions without static keys. | Avoid storing cloud keys on disk. |
| Metadata service controls | Reduce credential exposure risk. | Limit local process access and use least privilege. |
| Snapshots | Rollback and disaster recovery. | Test restore and document recovery. |
| Cloud logs | Centralize evidence and monitoring. | Forward Ubuntu logs and app logs. |
Cloud exposure model
Public Internet
│
├── HTTPS only
▼
Load balancer / WAF
│
├── forwards to app subnet
▼
Application server
│
├── UFW allows LB source only
├── SSH allowed from bastion only
├── app talks to DB privately
│
▼
Database server
├── no public IP
├── private subnet
└── port allowed only from app server
Cloud hardening checklist
[ ] Only required public ports are open
[ ] SSH is restricted by source IP or bastion
[ ] Database has no public exposure
[ ] Redis has no public exposure
[ ] Security groups are documented
[ ] UFW rules match cloud security model
[ ] IAM role uses least privilege
[ ] No static cloud keys in home directories
[ ] Instance metadata policy is reviewed
[ ] Snapshots are scheduled
[ ] Restore has been tested
[ ] Logs are shipped centrally
[ ] Monitoring alerts are configured
Cloud diagnostic commands
# Local listening ports
ss -lntp
# Local firewall
sudo ufw status verbose
# Instance view of routes and IPs
ip a
ip r
# Check outbound cloud metadata access if policy allows it
curl -s --max-time 2 http://169.254.169.254/ || true
# Check the service locally from the server itself
curl -I http://localhost
# Check logs
journalctl -p warning --since "30 min ago"
Security incident response: brute force, exposed port, compromise suspicion
Security incidents must be handled carefully. The first objective is to preserve evidence and stop further damage. Avoid making random changes before collecting logs, current sessions, open ports and process state.
| Incident | Immediate checks | Containment |
|---|---|---|
| SSH brute force | Auth logs, fail2ban, source IPs. | Restrict SSH, disable passwords, ban sources. |
| Unexpected open port | ss -lntp, service status, firewall. | Stop service or close firewall rule. |
| Suspicious user | /etc/passwd, sudo group, auth logs. | Lock account, preserve logs. |
| Package tampering | Apt history, modified repositories. | Disable unknown repos, rebuild if needed. |
| Possible compromise | Processes, ports, cron, users, logs. | Isolate host, snapshot disk, rotate credentials. |
| Secret exposure | Access logs, shell history, app logs. | Rotate keys, revoke tokens, audit access. |
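Because evidence should be captured before anything is changed, it can help to save sessions, ports and process state to a timestamped directory in one step. The following is a minimal sketch, not a forensic tool: the helper names _capture and collect_evidence, the output layout and the manifest file are all assumptions made for this example, and missing tools are recorded as skipped rather than treated as errors.

```shell
# _capture OUT_DIR NAME CMD [ARGS...]: save a command's output if the tool exists
_capture() {
  local out="$1" name="$2"
  shift 2
  if command -v "$1" >/dev/null 2>&1; then
    "$@" > "$out/$name.txt" 2>&1 || true
    echo "$name: captured" >> "$out/manifest.txt"
  else
    echo "$name: skipped ($1 not found)" >> "$out/manifest.txt"
  fi
}

# collect_evidence [OUT_DIR]: snapshot current state and print the directory used
collect_evidence() {
  local out="${1:-./evidence-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$out" || return 1
  date -u +"collected %Y-%m-%dT%H:%M:%SZ" > "$out/manifest.txt"
  _capture "$out" sessions who
  _capture "$out" logins last -a
  _capture "$out" ports ss -lntp
  _capture "$out" processes ps aux
  _capture "$out" failed-units systemctl --failed --no-pager
  echo "$out"
}
```

Run it from a root shell so that process names appear in the port listing, and copy the resulting directory off the host, since files left on a possibly compromised machine cannot be trusted later.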
First response commands
# Current users and sessions
who
w
last -a | head -50
# Listening ports and processes
ss -lntp
ps aux --sort=-%cpu | head -30
# Failed services
systemctl --failed
# Recent auth activity
sudo tail -300 /var/log/auth.log
# SSH logs
journalctl -u ssh --since "24 hours ago"
# Recent package changes
less /var/log/apt/history.log
Incident response flow
Security alert
│
├── Preserve evidence
│   ├── logs
│   ├── sessions
│   ├── ports
│   └── process list
│
├── Determine scope
│   ├── one account
│   ├── one service
│   ├── one host
│   └── multiple systems
│
├── Contain
│   ├── firewall rule
│   ├── disable account
│   ├── stop service
│   └── isolate instance
│
├── Eradicate
│   ├── patch
│   ├── remove access
│   ├── rotate secrets
│   └── rebuild if needed
│
└── Recover
    ├── restore service
    ├── validate logs
    ├── monitor closely
    └── write postmortem
Credential rotation checklist
[ ] SSH keys reviewed
[ ] Unknown keys removed
[ ] Sudo users reviewed
[ ] Application secrets rotated
[ ] Database passwords rotated
[ ] Cloud API keys revoked or rotated
[ ] CI/CD tokens rotated
[ ] Webhook secrets rotated
[ ] TLS private key checked
[ ] Backup access reviewed
Final hardening checklist and command cheat sheet
Ubuntu security baseline checklist
[ ] Ubuntu LTS is used
[ ] Packages are updated
[ ] Reboot-required state is checked
[ ] Named admin user exists
[ ] Root SSH login is disabled
[ ] SSH key login is validated
[ ] Password SSH login is disabled
[ ] UFW default deny incoming is enabled
[ ] Only required ports are open
[ ] Database ports are private only
[ ] Redis ports are private only
[ ] fail2ban is installed if public SSH exists
[ ] Sudo group is reviewed
[ ] Service users are non-root
[ ] Secrets are not world-readable
[ ] Logs are reviewed and centralized if possible
[ ] Backups or snapshots exist
[ ] Restore has been tested
[ ] Cloud security groups are minimal
[ ] Incident response procedure exists
Quick security snapshot
echo "== OS =="
lsb_release -a
echo "== REBOOT REQUIRED =="
test -f /var/run/reboot-required && cat /var/run/reboot-required || echo "no reboot flag"
echo "== UFW =="
sudo ufw status verbose
echo "== OPEN PORTS =="
ss -lntp
echo "== SUDO USERS =="
getent group sudo
echo "== SSH LOGS =="
journalctl -u ssh --since "24 hours ago" --no-pager | tail -100
Command cheat sheet
# SSH
sudo sshd -t
sudo systemctl restart ssh
journalctl -u ssh --since today
# Firewall
sudo ufw status verbose
sudo ufw allow OpenSSH
sudo ufw allow 443/tcp
sudo ufw enable
# Users
id deploy
getent group sudo
sudo visudo
sudo passwd -l username
# Updates
sudo apt update
apt list --upgradable
sudo apt upgrade
test -f /var/run/reboot-required && cat /var/run/reboot-required
# fail2ban
sudo fail2ban-client status
sudo fail2ban-client status sshd
# Logs
sudo tail -100 /var/log/auth.log
journalctl -p warning --since today
systemctl --failed
Final rule
Secure the access path, minimize exposed ports, patch regularly, run services with least privilege, monitor logs, keep backups, test recovery, and document every exception.
Minimal secure server profile
Minimum secure Ubuntu server:
- Ubuntu LTS
- SSH key access only
- no root SSH login
- UFW enabled
- only required ports open
- security updates applied
- non-root service users
- secrets protected
- logs available
- backups tested
- cloud perimeter restricted
Performance and robustness objective
Ubuntu performance engineering is not random tuning. It is a disciplined loop: observe real metrics, identify the bottleneck, make one controlled change, measure again, document the result, and keep rollback possible.
Production robustness comes from stable LTS releases, predictable package updates, systemd service supervision, good logs, monitoring, disk hygiene, firewalling, backup, capacity planning and incident runbooks. Ubuntu is considered stable in production because these operational primitives are mature, widely documented and automation-friendly.
| Layer | Observe | Typical bottleneck | Main tools |
|---|---|---|---|
| CPU | Load average, CPU %, run queue, per-process usage. | Too many workers, hot loop, compression, TLS, DB query CPU. | top, htop, pidstat, mpstat |
| Memory | Used RAM, available RAM, swap, OOM kills. | Memory leak, cache pressure, too many processes. | free, vmstat, journalctl |
| Disk / IO | IO wait, disk latency, queue depth, filesystem usage. | Slow volume, log growth, DB writes, Docker layers. | iostat, iotop, df, du |
| Network | Ports, connections, packet errors, latency, throughput. | Firewall, DNS, saturation, SYN flood, bad route. | ss, ip, mtr, nload |
| Services | systemd state, restarts, logs, health checks. | Crash loop, bad config, missing dependency. | systemctl, journalctl |
Performance investigation map
Application is slow
│
├── CPU saturated?
│   ├── yes -> top, htop, pidstat, app profiler
│   └── no
│
├── Memory pressure?
│   ├── yes -> free, vmstat, OOM logs, process RSS
│   └── no
│
├── IO wait high?
│   ├── yes -> iostat, iotop, df, DB/log writes
│   └── no
│
├── Network slow?
│   ├── yes -> ss, ip, mtr, DNS, firewall
│   └── no
│
├── Service unstable?
│   ├── yes -> systemctl, journalctl, restart policy
│   └── no
│
└── Application bottleneck?
    ├── DB query
    ├── external API
    ├── lock contention
    ├── cache miss
    └── bad algorithm
Install performance toolkit
sudo apt update
sudo apt install -y \
sysstat \
iotop \
htop \
nload \
iftop \
mtr-tiny \
dstat \
strace \
lsof \
curl \
dnsutils
strace, perf or heavy tracing can add overhead. Use them carefully on busy production systems.
CPU: load average, saturation, processes and worker sizing
CPU performance issues usually appear as high load average, high user CPU, high system CPU, excessive context switching or too many runnable processes. On web servers, wrong worker counts can create CPU contention or memory pressure.
| Metric | Meaning | Warning sign | Command |
|---|---|---|---|
| Load average | Runnable or waiting tasks over 1, 5, 15 min. | Load consistently above CPU cores. | uptime |
| User CPU | Application code CPU usage. | One process dominates. | top, pidstat |
| System CPU | Kernel work. | High network, IO, syscall overhead. | mpstat |
| IO wait | CPU waiting on disk. | App slow but CPU not busy. | iostat, top |
| Steal time | VM CPU stolen by hypervisor. | Cloud instance contention. | mpstat |
| Context switches | Task switching overhead. | Too many workers or threads. | vmstat |
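The "load consistently above CPU cores" warning sign can be expressed as a ratio check. The sketch below is illustrative only: the function name `load_state` and the 0.7 / 1.0 ratios are assumptions, not Ubuntu defaults, and a single sample says nothing about whether the load is sustained.

```shell
# Hypothetical helper: compare a load average with the number of CPU cores.
# $1 = load average (for example the 1-minute value), $2 = core count.
# Prints "ok", "busy" or "saturated"; thresholds are illustrative.
load_state() {
  awk -v la="$1" -v n="$2" 'BEGIN {
    if (n <= 0) { print "unknown"; exit }
    r = la / n
    if (r >= 1.0)      print "saturated"
    else if (r >= 0.7) print "busy"
    else               print "ok"
  }'
}
```

On a live host it could be called as `load_state "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"`. High load with idle CPU usually points to IO wait rather than CPU saturation, so the result should be read together with `top` or `mpstat`.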
CPU commands
# Load average and uptime
uptime
# Interactive CPU/process view
top
htop
# Per-CPU statistics
mpstat -P ALL 1
# Per-process CPU every second
pidstat -u 1
# Top CPU-consuming processes
ps aux --sort=-%cpu | head -30
# Threads of a process
ps -L -p PID -o pid,tid,pcpu,pmem,comm
CPU diagnosis flow
High CPU or high load
│
├── Is load higher than CPU core count?
│   └── uptime, nproc
│
├── Is CPU user, system, iowait or steal?
│   └── top, mpstat
│
├── Which process dominates?
│   └── ps aux --sort=-%cpu
│
├── Is it app code, DB, web server, backup, cron?
│   └── systemctl, logs, cron
│
├── Did traffic increase?
│   └── nginx logs, app metrics
│
└── Did a deployment or package update happen?
    └── deploy logs, apt history
Worker sizing examples
Gunicorn starting point:
workers = (2 * CPU cores) + 1
Example:
2 vCPU -> 5 workers
4 vCPU -> 9 workers
But verify with:
- memory per worker
- request latency
- DB connection limit
- CPU saturation
- queue time
- error rate
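The Gunicorn rule of thumb and the memory check from the list above can be combined in a short sketch. The function names are hypothetical, and the per-worker memory figure has to come from measuring your own application; nothing here is a Gunicorn API.

```shell
# Starting-point worker count from the (2 * cores) + 1 rule of thumb.
# $1 = core count (defaults to the local core count).
gunicorn_workers() {
  local cores="${1:-$(nproc)}"
  echo $(( 2 * cores + 1 ))
}

# Cap a CPU-based count by memory.
# $1 = memory available to the app in MB, $2 = measured MB per worker,
# $3 = CPU-based worker count. Prints the smaller of the two limits.
cap_workers_by_memory() {
  local by_mem=$(( $1 / $2 ))
  if [ "$by_mem" -lt "$3" ]; then echo "$by_mem"; else echo "$3"; fi
}
```

For example, `cap_workers_by_memory 1024 300 "$(gunicorn_workers 2)"` keeps a 2 vCPU host with about 1 GB for the app at 3 workers instead of 5.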
Celery:
- CPU-bound tasks: concurrency near CPU cores
- IO-bound tasks: higher concurrency can help
- DB-heavy tasks: limit by database capacity
Memory: RAM, cache, swap, OOM killer and service limits
Linux uses free memory for cache, so "used memory" is not automatically a problem. The important indicators are available memory, swap activity, OOM kills, growing process RSS, and whether memory pressure correlates with latency or crashes.
| Metric | Meaning | Bad sign | Command |
|---|---|---|---|
| Available RAM | Memory that can be used without heavy reclaim. | Very low for sustained period. | free -h |
| Swap used | Memory pages moved to disk. | Growing swap + latency. | swapon --show |
| si / so | Swap in / swap out activity. | Non-zero under load. | vmstat 1 |
| RSS | Resident memory per process. | Process grows without bound. | ps, top |
| OOM kill | Kernel killed process due to memory exhaustion. | Service disappears suddenly. | journalctl -k |
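Because available memory matters more than used memory, a check can be built on the `MemAvailable` field rather than on "used". The sketch below assumes `/proc/meminfo`-formatted input on stdin; the function name `mem_available_pct` is hypothetical.

```shell
# Reads /proc/meminfo-formatted text on stdin and prints available memory
# as an integer percentage of total memory (truncated, not rounded).
mem_available_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%d\n", avail * 100 / total }'
}
```

On a live host: `mem_available_pct < /proc/meminfo`. Which percentage counts as "very low" depends on the workload, so any alert threshold is a local decision.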
Memory commands
# Memory overview
free -h
# Swap
swapon --show
# VM activity
vmstat 1
# Top memory processes
ps aux --sort=-%mem | head -30
# Kernel OOM events
journalctl -k --since today | grep -i -E "oom|killed process"
# Memory per service process
systemctl status myservice
ps -o pid,rss,vsz,cmd -C gunicorn
Memory diagnosis flow
Service slow or killed
│
├── Is available memory low?
│   └── free -h
│
├── Is swap active?
│   └── swapon --show, vmstat 1
│
├── Any OOM kills?
│   └── journalctl -k | grep -i oom
│
├── Which process uses memory?
│   └── ps aux --sort=-%mem
│
├── Did memory grow after deploy?
│   └── compare metrics before/after
│
└── Can service be limited?
    └── systemd MemoryMax, worker count, app config
systemd memory limit example
# /etc/systemd/system/myapp.service.d/limits.conf
[Service]
MemoryMax=1G
MemoryHigh=800M
Restart=on-failure
RestartSec=5
# Apply
sudo systemctl daemon-reload
sudo systemctl restart myapp
systemctl status myapp
Swap policy
| Context | Swap recommendation | Reason |
|---|---|---|
| Small VM | Small swap can prevent abrupt OOM. | Graceful degradation. |
| Latency-sensitive DB | Avoid heavy swap activity. | Swap can destroy latency. |
| Batch worker | Some swap acceptable. | Throughput may tolerate latency. |
Disk and IO: latency, throughput, filesystem usage and log growth
Disk IO bottlenecks often look like application slowness. CPU may appear idle while requests are stuck waiting for disk. Common causes: database writes, slow cloud volume, Docker logs, journal growth, backups, missing indexes, swap activity or full filesystem.
| Symptom | Likely cause | Verification | Correction |
|---|---|---|---|
| High app latency | IO wait or DB disk pressure. | iostat -xz 1 | Faster disk, batching, DB tuning. |
| Disk full | Logs, Docker, uploads, backups. | df -h, du -sh | Retention, cleanup, bigger volume. |
| Swap activity | RAM shortage. | vmstat 1 | Reduce workers, add RAM, tune app. |
| Docker grows fast | Images, containers, logs, volumes. | docker system df | Log rotation, prune carefully. |
| Journal too large | Systemd journal retention unmanaged. | journalctl --disk-usage | Vacuum or configure retention. |
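The "disk full" row can be turned into an early warning by scanning `df -P` output for filesystems at or above a threshold. This is a sketch under stated assumptions: the function name `fs_over_threshold` is hypothetical, the default of 80 is only an example, and mount points containing spaces are not handled.

```shell
# Reads `df -P` output on stdin and prints "<mountpoint> <percent>" for every
# filesystem at or above the threshold given as $1 (default 80).
fs_over_threshold() {
  local limit="${1:-80}"
  awk -v limit="$limit" 'NR > 1 {
    pct = $5
    sub(/%/, "", pct)
    if (pct + 0 >= limit + 0) print $6, pct
  }'
}
```

On a live host: `df -P | fs_over_threshold 80`. The same function works for inode usage with `df -Pi`, since the column layout is the same.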
Disk and IO commands
# Filesystem usage
df -h
# Largest top-level directories
sudo du -sh /* 2>/dev/null | sort -h
# Common growth areas
sudo du -sh /var/log/*
sudo du -sh /var/lib/docker/*
sudo du -sh /var/lib/postgresql/* 2>/dev/null
# Block devices
lsblk -f
findmnt
# IO statistics
iostat -xz 1
# IO by process
sudo iotop -o
# Journal usage
journalctl --disk-usage
IO bottleneck flow
Application latency spike
│
├── Is CPU iowait high?
│   └── top, iostat
│
├── Which disk is busy?
│   └── iostat -xz 1
│
├── Which process writes?
│   └── iotop -o
│
├── Is filesystem near full?
│   └── df -h
│
├── Did logs or Docker grow?
│   └── du -sh /var/log /var/lib/docker
│
└── Is database the writer?
    ├── check slow queries
    ├── check checkpoints
    └── check volume IOPS
Safe cleanup examples
# Clean apt cache
sudo apt clean
# Remove unused packages
sudo apt autoremove
# Vacuum systemd journal by time
sudo journalctl --vacuum-time=14d
# Vacuum systemd journal by size
sudo journalctl --vacuum-size=1G
# Docker usage
docker system df
# Docker cleanup - careful on production
docker image prune
docker container prune
Preventive controls
Prevent disk incidents:
- alert when filesystem > 80%
- separate /var for log-heavy servers
- configure logrotate
- configure Docker log rotation
- monitor journal size
- monitor database volume
- keep backup volume separate
- test volume resize procedure
Network performance: ports, connections, latency, packet loss and throughput
Network issues may appear as API latency, timeouts, intermittent failures, failed database connections, slow downloads or connection storms. Diagnose from local socket state outward: listening ports, established connections, interface errors, DNS, routing, packet loss and remote latency.
| Question | Command | What to look for |
|---|---|---|
| Which ports listen? | ss -lntp | Expected services only. |
| How many connections? | ss -s | Established, time-wait, orphaned sockets. |
| Interface errors? | ip -s link | RX/TX errors, dropped packets. |
| DNS working? | dig, resolvectl | Resolver latency and correctness. |
| Packet loss or route issue? | mtr -rw | Loss, latency, bad hop. |
| Bandwidth usage? | nload, iftop | Unexpected egress or ingress. |
Network commands
# Socket summary
ss -s
# Listening TCP ports
ss -lntp
# Established connections
ss -antp
# Network interfaces and counters
ip -s link
# Routes
ip r
# DNS
resolvectl status
dig example.com
# Latency and packet loss
ping -c 5 1.1.1.1
mtr -rw example.com
# Live bandwidth
nload
sudo iftop
Network diagnosis flow
Network latency or timeout
│
├── Local service listening?
│   └── ss -lntp
│
├── Firewall blocking?
│   └── ufw status, cloud security group
│
├── DNS slow or wrong?
│   └── dig, resolvectl
│
├── Route correct?
│   └── ip r
│
├── Packet loss?
│   └── ping, mtr
│
├── Interface drops?
│   └── ip -s link
│
└── Too many connections?
    └── ss -s, logs, rate limits
Connection pressure examples
# Count connections by state
ss -ant | awk 'NR>1 {state[$1]++} END {for (s in state) print s, state[s]}'
# Top remote IPs connected to port 443 (strips only the trailing :port, so IPv6 peers stay intact)
ss -ant '( sport = :443 )' | awk 'NR>1 {sub(/:[0-9]+$/, "", $5); print $5}' | sort | uniq -c | sort -nr | head
# Check Nginx access bursts
sudo tail -1000 /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head
Kernel and sysctl: controlled tuning, limits and safe defaults
Kernel tuning should be conservative. Ubuntu defaults are reasonable for general production. Change kernel parameters only when you understand the workload and can measure before and after. Keep changes versioned and reversible.
| Area | Parameter / control | Why it matters | Warning |
|---|---|---|---|
| File descriptors | LimitNOFILE, ulimit | Many sockets/files. | Must align with app and systemd. |
| Swappiness | vm.swappiness | Swap tendency. | Do not blindly set to zero. |
| TCP backlog | net.core.somaxconn | Connection bursts. | App backlog must also match. |
| Ephemeral ports | ip_local_port_range | Outbound connection scale. | Usually not first bottleneck. |
| Kernel logs | dmesg, journalctl -k | OOM, disk, driver, network errors. | Read before tuning. |
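Keeping changes reversible starts with recording the current values. The sketch below relies on the fact that a sysctl key maps to a file under `/proc/sys` with dots replaced by slashes; the function names are hypothetical, and keys that embed dots in an interface name (VLAN interfaces, for example) are not handled.

```shell
# Map a sysctl key such as net.core.somaxconn to its /proc/sys path.
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print "key = value" lines for the given keys, in sysctl.d syntax,
# so the output can be saved as a rollback file before a change.
sysctl_snapshot() {
  local key
  for key in "$@"; do
    printf '%s = %s\n' "$key" "$(cat "$(sysctl_path "$key")" 2>/dev/null)"
  done
}
```

For example, `sysctl_snapshot vm.swappiness net.core.somaxconn > before-change.conf` captures the values that a later `sysctl -p before-change.conf` could restore.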
Inspect kernel and limits
# Kernel version
uname -a
# CPU cores
nproc
# Current sysctl values
sysctl vm.swappiness
sysctl net.core.somaxconn
sysctl net.ipv4.ip_local_port_range
# Current shell limits
ulimit -a
# systemd service limits
systemctl show nginx | grep -E "LimitNOFILE|LimitNPROC"
# Kernel messages
dmesg -T | tail -100
journalctl -k --since today
Safe sysctl pattern
# Temporary test until reboot
sudo sysctl -w net.core.somaxconn=4096
# Persistent setting
sudo vim /etc/sysctl.d/99-custom-performance.conf
# Example content
net.core.somaxconn = 4096
vm.swappiness = 10
# Apply
sudo sysctl --system
# Verify
sysctl net.core.somaxconn
sysctl vm.swappiness
systemd limit example
# Create override
sudo systemctl edit nginx
# Add:
[Service]
LimitNOFILE=65535
# Apply
sudo systemctl daemon-reload
sudo systemctl restart nginx
# Verify
systemctl show nginx | grep LimitNOFILE
Tuning decision tree
Want to tune kernel?
│
├── Is bottleneck measured?
│   ├── no -> measure first
│   └── yes
│
├── Is app configured consistently?
│   ├── no -> tune app first
│   └── yes
│
├── Is change reversible?
│   ├── no -> do not apply
│   └── yes
│
└── Apply one change
    ├── measure again
    ├── document
    └── keep rollback
Robustness with systemd: restart policy, health checks, limits and dependencies
Production robustness depends on what happens when a process fails. systemd can restart services, limit resources, order dependencies, isolate users, set environment files and expose logs. A fragile script becomes a production service when it has a proper unit.
| systemd feature | Purpose | Example |
|---|---|---|
| Restart | Restart process after failure. | Restart=on-failure |
| RestartSec | Delay before restart. | RestartSec=5 |
| StartLimitBurst | Prevent infinite crash loops. | StartLimitBurst=5 |
| MemoryMax | Limit memory usage. | MemoryMax=1G |
| LimitNOFILE | Raise file descriptor limit. | LimitNOFILE=65535 |
| User | Run service as non-root user. | User=myapp |
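A restart policy hides crashes unless the restart count is watched. The sketch below parses the `NRestarts` property that `systemctl show` reports; the function name `restarts_exceed` and the default limit of 3 are assumptions chosen for illustration.

```shell
# Reads `systemctl show <unit> -p NRestarts` output ("NRestarts=<n>") on stdin.
# Exit status 0 when the restart count is greater than the limit in $1.
restarts_exceed() {
  local limit="${1:-3}"
  local count
  count="$(sed -n 's/^NRestarts=//p' | head -n 1)"
  [ "${count:-0}" -gt "$limit" ]
}
```

Typical use: `systemctl show myapp -p NRestarts | restarts_exceed 3 && echo "myapp may be crash-looping"`. The counter resets when the unit is stopped and started manually, so it reflects automatic restarts since the last clean start.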
Robust service unit
[Unit]
Description=My application service
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=5
[Service]
User=myapp
Group=myapp
WorkingDirectory=/srv/myapp
EnvironmentFile=/srv/myapp/.env
ExecStart=/srv/myapp/venv/bin/gunicorn config.wsgi:application \
--bind 127.0.0.1:8000 \
--workers 3
Restart=on-failure
RestartSec=5
TimeoutStopSec=30
LimitNOFILE=65535
MemoryMax=1G
[Install]
WantedBy=multi-user.target
Service robustness flow
Service process
│
├── runs as non-root user
├── starts after dependencies
├── has environment file
├── logs to journald
├── restarts on failure
├── has restart backoff
├── has resource limits
└── is enabled at boot
│
▼
Operations
├── systemctl status
├── journalctl -u service
├── systemctl restart service
├── systemctl show service
└── alerts on restart count
Service diagnostics
# Status
systemctl status myapp
# Logs
journalctl -u myapp --since "1 hour ago"
journalctl -u myapp -f
# Check restart count and limits
systemctl show myapp | grep -E "NRestarts|Restart|Memory|LimitNOFILE"
# Failed units
systemctl --failed
# Reload unit changes
sudo systemctl daemon-reload
Monitoring: host metrics, service metrics, logs, alerts and SLOs
Monitoring makes performance visible before users complain. The minimum production stack should monitor CPU, memory, disk, IO, network, service state, open ports, logs, reboot-required state, certificate expiry, backups and application health.
| Metric family | Examples | Alert idea |
|---|---|---|
| CPU | CPU %, load average, steal, iowait. | Sustained saturation over baseline. |
| Memory | Available RAM, swap activity, OOM events. | Low available memory or OOM kill. |
| Disk | Filesystem %, inode %, IO latency. | Filesystem above 80-90%. |
| Network | Throughput, packet drops, connection count. | Unexpected drop/error spike. |
| Services | systemd failed units, restart count. | Service failed or crash-looping. |
| Security | SSH failures, auth failures, UFW denies. | Spike above baseline. |
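The "spike above baseline" idea for SSH failures needs a count to compare against. The sketch below counts failed password attempts per source address from sshd log lines on stdin. The function name is hypothetical and the parsing assumes the usual OpenSSH message form "Failed password for ... from <address> port ...".

```shell
# Reads sshd log lines on stdin and prints "<count> <source-address>",
# highest count first, for lines that report a failed password.
ssh_failures_by_ip() {
  awk '/Failed password/ {
         for (i = 1; i < NF; i++)
           if ($i == "from") { print $(i + 1); break }
       }' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

On a live host: `sudo cat /var/log/auth.log | ssh_failures_by_ip | head`, or feed it from `journalctl -u ssh --since "1 hour ago" --no-pager`. An alert would then compare the top counts with what is normal for that host.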
Monitoring stack example
Ubuntu server
│
├── node exporter
├── journald logs
├── application metrics
├── nginx metrics/logs
├── database exporter
└── backup status
│
▼
Observability platform
├── Prometheus
├── Grafana
├── Loki / ELK
├── Alertmanager
└── incident channel
Local monitoring commands
# CPU and memory
top
free -h
vmstat 1
# Disk and IO
df -h
iostat -xz 1
sudo iotop -o
# Network
ss -s
ip -s link
nload
# Services and logs
systemctl --failed
journalctl -p warning --since "30 min ago"
# Reboot required
test -f /var/run/reboot-required && cat /var/run/reboot-required
Alerting baseline
Recommended alerts:
[ ] Disk filesystem > 80%
[ ] Disk filesystem > 90%
[ ] Inode usage high
[ ] Service failed
[ ] Reboot required too long
[ ] OOM kill detected
[ ] Swap activity sustained
[ ] CPU saturation sustained
[ ] IO wait sustained
[ ] Backup failed
[ ] Certificate expires soon
[ ] SSH failure spike
[ ] HTTP 5xx spike
[ ] Database unavailable
Troubleshooting playbooks: slow server, crash loop, disk full, memory pressure
Symptom matrix
| Symptom | First checks | Likely cause | Action |
|---|---|---|---|
| Server slow | top, free -h, iostat | CPU, memory or IO saturation. | Identify resource, reduce load, scale. |
| Service crash loop | systemctl status, journalctl -u | Bad config, dependency, permission, OOM. | Fix root cause, then restart. |
| Disk full | df -h, du -sh | Logs, Docker, DB, backups. | Clean safely, add retention, resize. |
| Memory pressure | free, vmstat, OOM logs. | Leak, too many workers, cache pressure. | Reduce workers, limit service, add RAM. |
| Network timeout | ss, ip, mtr, DNS. | Firewall, DNS, route, saturation. | Fix correct layer. |
One-shot diagnostic
echo "== HOST =="
hostnamectl
echo "== UPTIME =="
uptime
echo "== CPU/MEM =="
free -h
top -b -n1 | head -30
echo "== DISK =="
df -h
echo "== PORTS =="
ss -lntp
echo "== FAILED SERVICES =="
systemctl --failed
echo "== WARNINGS =="
journalctl -p warning --since "30 min ago" --no-pager | tail -100
Universal performance decision tree
Production issue
│
├── Is it user-visible?
│   ├── yes -> check app SLO, HTTP 5xx, latency
│   └── no -> check monitoring and trend
│
├── Resource saturation?
│   ├── CPU -> top, pidstat
│   ├── RAM -> free, vmstat, OOM
│   ├── IO -> iostat, iotop
│   └── NET -> ss, ip, mtr
│
├── Service instability?
│   └── systemctl, journalctl
│
├── Recent change?
│   ├── deployment
│   ├── apt upgrade
│   ├── config change
│   └── traffic spike
│
└── Fix strategy
    ├── rollback
    ├── reduce load
    ├── scale resource
    ├── tune one parameter
    └── monitor result
Change discipline
During performance incident:
[ ] Do not change many things at once
[ ] Capture metrics before change
[ ] Identify the bottleneck
[ ] Apply one controlled change
[ ] Measure again
[ ] Keep rollback possible
[ ] Document root cause
[ ] Add alert if missing
[ ] Add runbook step if useful
Performance and robustness checklist
Production performance baseline
[ ] Ubuntu LTS is used
[ ] Packages are updated
[ ] Reboot policy exists
[ ] CPU metrics are monitored
[ ] Memory and swap are monitored
[ ] Disk usage is monitored
[ ] IO latency is monitored
[ ] Network errors are monitored
[ ] systemd failed units are alerted
[ ] Service restart count is monitored
[ ] Logs are centralized if possible
[ ] Backups are monitored
[ ] Restore has been tested
[ ] Capacity baseline is documented
[ ] Load test exists for critical services
[ ] Runbooks exist for CPU/RAM/IO/disk incidents
Robust service checklist
[ ] Service runs under non-root user
[ ] systemd unit is versioned
[ ] Restart policy is configured
[ ] Resource limits are configured if needed
[ ] Logs are visible with journalctl
[ ] Health check exists
[ ] Environment file permissions are strict
[ ] Deployment rollback is possible
[ ] Service starts at boot
[ ] Dependencies are documented
Command cheat sheet
# CPU
uptime
top
htop
mpstat -P ALL 1
pidstat -u 1
# Memory
free -h
vmstat 1
swapon --show
journalctl -k | grep -i oom
# Disk / IO
df -h
du -sh /var/*
iostat -xz 1
sudo iotop -o
journalctl --disk-usage
# Network
ss -s
ss -lntp
ip -s link
mtr -rw example.com
nload
# Services
systemctl --failed
systemctl status service
journalctl -u service -f
# Kernel / limits
uname -a
sysctl vm.swappiness
ulimit -a
Final rule
Stability comes from LTS discipline, measured capacity, controlled updates, systemd supervision, monitored resources, clean logs, safe rollback, tested backups and calm incident handling.
Minimal robust server profile
Minimum robust Ubuntu server:
- Ubuntu LTS
- systemd-managed services
- restart policies
- monitoring for CPU/RAM/disk/IO/network
- alerting on failed services and disk growth
- log retention
- patch and reboot policy
- backup and restore test
- documented runbook
Ubuntu on cloud: what it means
Ubuntu is one of the most common Linux baselines for cloud servers. On AWS, it is typically deployed as an EC2 instance using an official Ubuntu AMI. The instance then boots with cloud-init, receives an SSH key, attaches storage, joins a network, applies security groups and runs the server bootstrap.
In production, the cloud image is part of the infrastructure contract. It defines the operating system version, kernel, package baseline, boot behavior, cloud-init behavior, default users, storage layout and initial security posture.
| Concept | Meaning | Production impact |
|---|---|---|
| AMI | Amazon Machine Image used to boot EC2. | Defines OS baseline and initial package state. |
| Official Ubuntu image | Image published by Canonical for AWS. | Preferred baseline for Ubuntu EC2 servers. |
| Owner ID | AWS account that owns the public AMI. | Used to avoid fake or untrusted public images. |
| cloud-init | First-boot initialization system. | Creates users, installs packages, writes files, runs commands. |
| User data | Bootstrap content passed at EC2 launch. | Automates first boot configuration. |
| Security group | AWS network firewall attached to instance or ENI. | Controls inbound and outbound exposure. |
| Key pair | SSH access credential used at launch. | Controls first admin access. |
AWS Ubuntu mental model
AWS EC2 Ubuntu instance
│
├── AMI
│   ├── Ubuntu release
│   ├── kernel
│   ├── cloud-init
│   └── base packages
│
├── Instance configuration
│   ├── instance type
│   ├── EBS volume
│   ├── subnet
│   ├── security group
│   ├── IAM role
│   └── SSH key pair
│
├── First boot
│   ├── cloud-init metadata
│   ├── user data
│   ├── SSH key injection
│   ├── package installation
│   └── service bootstrap
│
└── Operations
    ├── patching
    ├── monitoring
    ├── backups
    ├── logs
    ├── snapshots
    └── replacement strategy
Official URLs
Ubuntu on AWS:
https://documentation.ubuntu.com/aws/
Find Ubuntu images on AWS:
https://documentation.ubuntu.com/aws/aws-how-to/instances/find-ubuntu-images/
Ubuntu cloud images:
https://cloud-images.ubuntu.com/
AWS EC2:
https://docs.aws.amazon.com/ec2/
cloud-init:
https://cloudinit.readthedocs.io/
Official Ubuntu AMIs and Canonical owner filtering
Public AMI catalogs contain many images. In production, you should avoid selecting a random public image called "Ubuntu". Use official Canonical images and verify the owner. This reduces the risk of using an untrusted image with unknown modifications.
| Item | Value / practice | Reason |
|---|---|---|
| Canonical AWS owner ID | 099720109477 | Filters official Ubuntu AMIs published by Canonical. |
| Release choice | Ubuntu Server LTS for production. | Longer support and safer lifecycle. |
| Architecture | amd64 or arm64. | Must match EC2 instance family. |
| Storage type | EBS-backed AMI. | Standard for modern EC2 instances. |
| Virtualization | HVM. | Modern EC2 virtualization mode. |
| Image lifecycle | Pin or approve AMI IDs for production. | Avoid surprise image changes. |
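The owner check in the table can be enforced in a launch script rather than left to the console. This is a minimal sketch: the function name `is_canonical_owner` is hypothetical, `ami-EXAMPLE` is a placeholder rather than a real image ID, and the owner ID is the Canonical value given in the table above.

```shell
# Canonical's AWS account ID, as listed in the table above.
CANONICAL_OWNER_ID="099720109477"

# Exit status 0 only when the given owner ID is Canonical's.
is_canonical_owner() {
  [ "$1" = "$CANONICAL_OWNER_ID" ]
}

# Usage with the AWS CLI (ami-EXAMPLE is a placeholder):
# owner="$(aws ec2 describe-images --image-ids ami-EXAMPLE \
#            --query 'Images[0].OwnerId' --output text)"
# is_canonical_owner "$owner" || echo "reject: AMI is not published by Canonical"
```

Comparing the numeric owner ID is safer than matching on the image name, because anyone can publish a public AMI whose name contains "ubuntu".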
Console filtering pattern
EC2 Console
│
└── Images
    └── AMIs
        ├── Public images
        ├── Owner = 099720109477
        ├── Name contains ubuntu/images/hvm-ssd (hvm-ssd-gp3 for recent releases such as 24.04)
        ├── Select LTS release
        └── Verify architecture and region
AWS CLI AMI search example
# The hvm-ssd* wildcard also matches the hvm-ssd-gp3 prefix used by recent releases such as 24.04
aws ec2 describe-images \
  --owners 099720109477 \
  --filters \
    "Name=name,Values=ubuntu/images/hvm-ssd*/ubuntu-*-24.04-amd64-server-*" \
    "Name=state,Values=available" \
    "Name=architecture,Values=x86_64" \
  --query 'Images | sort_by(@, &CreationDate)[-5:].{Name:Name,ImageId:ImageId,CreationDate:CreationDate}' \
  --output table
AMI selection decision tree
Need Ubuntu EC2 image?
│
├── Is this production?
│   ├── yes -> choose LTS
│   └── no -> LTS still preferred, interim only if justified
│
├── Is owner Canonical?
│   ├── yes -> continue
│   └── no -> reject for production
│
├── Does architecture match instance?
│   ├── yes -> continue
│   └── no -> select amd64 or arm64 correctly
│
└── Is AMI approved or pinned?
    ├── yes -> launch
    └── no -> review before production
EC2 launch pattern: instance type, storage, network, security and bootstrap
Launching Ubuntu on EC2 is a sequence of infrastructure decisions. The AMI is only the OS. Production quality also depends on instance sizing, EBS volume type, network placement, security groups, IAM role, SSH access, user data and monitoring.
| EC2 decision | Typical production practice | Risk if ignored |
|---|---|---|
| Instance type | Size by CPU, RAM, network and workload. | CPU steal, memory pressure, throttling. |
| EBS root volume | Enough size, gp3 baseline tuned if needed. | Disk full or IO bottleneck. |
| Subnet | Public only if it must receive internet traffic. | Unnecessary exposure. |
| Security group | Only required ports, source restricted. | Public SSH, public DB, attack surface. |
| IAM role | Least privilege role attached to instance. | Static credentials on disk. |
| User data | Minimal bootstrap, versioned and tested. | Unreproducible snowflake server. |
| Tags | Name, environment, owner, cost center, role. | Poor inventory and cost tracking. |
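The tagging row can be checked automatically after launch. The sketch below reads tag keys, one per line, and prints the required keys that are missing. The exact key spellings (`Name`, `Environment`, `Owner`, `CostCenter`, `Role`) are assumptions derived from the table and should be replaced by your own tagging convention.

```shell
# Reads tag keys on stdin, one per line.
# Prints each required key that is absent; prints nothing when all are present.
# The required key names are illustrative, not an AWS standard.
missing_tags() {
  local present key
  present="$(cat)"
  for key in Name Environment Owner CostCenter Role; do
    printf '%s\n' "$present" | grep -qxF -- "$key" || echo "$key"
  done
}

# Usage with the AWS CLI (i-EXAMPLE is a placeholder instance ID):
# aws ec2 describe-tags --filters "Name=resource-id,Values=i-EXAMPLE" \
#   --query 'Tags[].Key' --output text | tr '\t' '\n' | missing_tags
```

An empty result means the instance carries every required tag; any output can fail a deployment pipeline step.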
EC2 launch flow
Launch EC2
│
├── Choose official Ubuntu LTS AMI
├── Choose instance type
├── Configure EBS root volume
├── Select VPC and subnet
├── Attach security group
├── Attach IAM role
├── Select SSH key pair
├── Add user data
├── Add tags
└── Launch instance
Reference AWS Ubuntu architecture
Internet
│
▼
AWS Load Balancer
│
├── HTTPS listener
├── certificate
└── health checks
│
▼
Public or private app subnet
│
├── Ubuntu EC2 app server
│   ├── Nginx
│   ├── Gunicorn / app runtime
│   ├── CloudWatch agent
│   └── UFW local firewall
│
└── Security group
    ├── inbound from load balancer only
    └── SSH from bastion or admin IP only

Private data subnet
│
├── RDS / database
├── Redis / cache
└── no public exposure
Sizing examples
| Use case | Starting point | Watch metric |
|---|---|---|
| Small Nginx reverse proxy | Small general-purpose instance. | Network, CPU, connections. |
| Django / API server | Balanced CPU/RAM instance. | CPU, memory, latency, worker count. |
| Celery worker | CPU or RAM based on task type. | Queue depth, CPU, memory. |
| Database on EC2 | Memory and IO optimized. | IOPS, latency, cache hit ratio. |
SSH keys, default users and safe access patterns
Ubuntu cloud images use SSH keys for initial access. On AWS, the selected EC2 key pair is injected into the default Ubuntu user account during first boot. For Ubuntu images, the common default username is ubuntu.
| Access element | Production practice | Reason |
|---|---|---|
| Default user | ubuntu for initial access. | Standard cloud image behavior. |
| SSH key pair | Use protected private key. | Controls initial admin access. |
| SSH exposure | Restrict by source IP or bastion. | Reduces brute-force surface. |
| Root login | Disabled. | Use named users and sudo. |
| Long-term access | Create named admin users or SSM access. | Improves audit and revocation. |
| Emergency access | Document SSM, console or recovery procedure. | Prevents lockout during incidents. |
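A protected private key means, at minimum, a file that only its owner can access. The sketch below checks for the two modes commonly used for key files; the function name `key_perms_ok` is hypothetical and `stat -c` assumes GNU coreutils, as shipped on Ubuntu.

```shell
# Exit status 0 when the file mode is 600 or 400,
# 1 when the mode is anything else, 2 when the file cannot be read.
key_perms_ok() {
  local mode
  mode="$(stat -c '%a' "$1" 2>/dev/null)" || return 2
  case "$mode" in
    600|400) return 0 ;;
    *)       return 1 ;;
  esac
}
```

For example, `key_perms_ok my-aws-key.pem || chmod 600 my-aws-key.pem` fixes a freshly downloaded key before the first connection attempt.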
SSH examples
# Secure private key permissions
chmod 600 my-aws-key.pem
# Connect to Ubuntu EC2
ssh -i my-aws-key.pem ubuntu@EC2_PUBLIC_IP
# First checks
hostnamectl
whoami
sudo -l
ip a
systemctl status ssh
Safe access architecture
Admin workstation
│
├── SSH private key
└── fixed public IP if possible
│
▼
Security group
│
├── allow SSH only from admin IP
└── or allow SSH only from bastion
│
▼
Ubuntu EC2 instance
│
├── default ubuntu user
├── sudo for admin tasks
├── no root SSH login
└── logs in auth/journal
Hardening after first login
# Update packages
sudo apt update
sudo apt upgrade
# Create named admin user if needed
sudo adduser deploy
sudo usermod -aG sudo deploy
# Add SSH key for deploy user
sudo mkdir -p /home/deploy/.ssh
sudo cp /home/ubuntu/.ssh/authorized_keys /home/deploy/.ssh/authorized_keys
sudo chown -R deploy:deploy /home/deploy/.ssh
sudo chmod 700 /home/deploy/.ssh
sudo chmod 600 /home/deploy/.ssh/authorized_keys
# Test deploy login from a second terminal before restricting access
ssh -i my-aws-key.pem deploy@EC2_PUBLIC_IP
Access alternatives
| Pattern | When useful | Comment |
|---|---|---|
| Direct SSH | Small setups, restricted IP. | Simple but exposed if public. |
| Bastion host | Multiple private instances. | Centralized admin entry point. |
| AWS Systems Manager | No public SSH desired. | Requires IAM, agent and network access. |
| VPN | Private operations network. | Good for strict environments. |
cloud-init: first boot automation for Ubuntu cloud images
cloud-init is the standard first-boot initialization system for Ubuntu cloud images. It reads cloud metadata and user data, then applies configuration such as users, SSH keys, packages, files, commands, hostname, timezone and service setup.
| cloud-init section | Purpose | Example usage |
|---|---|---|
| package_update | Refresh package metadata. | Prepare apt before install. |
| package_upgrade | Upgrade packages at first boot. | Apply latest security patches. |
| users | Create users and SSH keys. | Provision deploy user. |
| packages | Install packages. | Nginx, fail2ban, monitoring agent. |
| write_files | Create config files. | Systemd unit, app env template. |
| runcmd | Run final commands. | Enable services, configure firewall. |
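A cheap check before launch: cloud-init only treats YAML user data as cloud-config when the first line is `#cloud-config`. The sketch below is a hypothetical helper (`is_cloud_config` is an illustrative name, not part of cloud-init) that checks only that header. It does not validate the YAML itself; `cloud-init schema` remains the tool for that.

```bash
# Hypothetical helper: succeed only if FILE's first line is exactly "#cloud-config".
is_cloud_config() {
  local file="$1" first_line=""
  # A missing or unreadable file fails the check.
  [ -r "$file" ] || return 1
  # Read the first line and drop a carriage return left by CRLF line endings.
  first_line="$(head -n 1 "$file" | tr -d '\r')"
  [ "$first_line" = "#cloud-config" ]
}

# Usage (commented out so the block has no side effects):
# is_cloud_config user-data.yaml || echo "user data is missing the #cloud-config header" >&2
```

Running this in CI next to `cloud-init schema --config-file user-data.yaml` catches a missing header before an instance boots with silently ignored user data.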
cloud-init lifecycle
EC2 instance first boot
│
├── Query AWS metadata service
├── Read user data
├── Configure hostname
├── Inject SSH key
├── Create users
├── Configure packages
├── Write files
├── Run commands
├── Start services
└── Mark initialization complete
Minimal cloud-init baseline
#cloud-config
package_update: true
package_upgrade: true
timezone: UTC
packages:
- curl
- wget
- git
- vim
- htop
- ufw
- fail2ban
- nginx
runcmd:
- ufw allow OpenSSH
- ufw allow 80/tcp
- ufw allow 443/tcp
- ufw --force enable
- systemctl enable --now nginx
- systemctl enable --now fail2ban
cloud-init diagnostics
# Status
cloud-init status
# Wait for completion
cloud-init status --wait
# Main logs
sudo less /var/log/cloud-init.log
sudo less /var/log/cloud-init-output.log
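# Show instance metadata if allowed (IMDSv2 needs a session token first)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/ || true
# Validate config if tool supports it
cloud-init schema --config-file user-data.yaml
User data patterns: simple bootstrap, web server, app server and config handoff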
User data should be small, readable and reliable. A good pattern is to install only base packages, harden basic access, install monitoring and call a versioned bootstrap script from a trusted source. Avoid stuffing an entire production deployment into a long untested user-data block.
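One way to make "a versioned bootstrap script from a trusted source" concrete is to pin the script's SHA-256 alongside its version and refuse to run anything that does not match. The sketch below assumes GNU coreutils `sha256sum`, which Ubuntu ships. `verify_sha256`, `BOOTSTRAP_URL` and `BOOTSTRAP_SHA256` are illustrative names, not from the original guide.

```bash
# Hypothetical helper: succeed only if FILE's SHA-256 equals EXPECTED_HEX.
verify_sha256() {
  local file="$1" expected="$2" actual=""
  [ -r "$file" ] || return 1
  # sha256sum prints "<hex>  <filename>"; keep only the hex digest.
  actual="$(sha256sum "$file" | awk '{print $1}')"
  [ "$actual" = "$expected" ]
}

# Usage inside a bootstrap step (commented out; the URL and checksum are placeholders):
# curl -fsSL "$BOOTSTRAP_URL" -o /root/bootstrap.sh
# verify_sha256 /root/bootstrap.sh "$BOOTSTRAP_SHA256" && bash /root/bootstrap.sh
```

The expected digest should come from a different channel than the script itself, for example the launch template or a parameter store entry. Otherwise the check only detects corruption, not tampering.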
Pattern 1: simple HTTP test server
#cloud-config
package_update: true
packages:
  - nginx
write_files:
  - path: /var/www/html/index.html
    permissions: '0644'
    content: |
      Ubuntu EC2 is running.
runcmd:
  - systemctl enable --now nginx
Pattern 2: baseline security bootstrap
#cloud-config
package_update: true
package_upgrade: true
packages:
- ufw
- fail2ban
- curl
- htop
runcmd:
- ufw default deny incoming
- ufw default allow outgoing
- ufw allow OpenSSH
- ufw --force enable
- systemctl enable --now fail2ban
Pattern 3: handoff to versioned script
#cloud-config
package_update: true
packages:
- curl
- ca-certificates
runcmd:
- curl -fsSL https://example.com/bootstrap/ubuntu-app.sh -o /root/bootstrap.sh
- chmod 700 /root/bootstrap.sh
- /root/bootstrap.sh --role app --env prod
Better handoff pattern
Preferred production pattern:
1. cloud-init creates minimal baseline
2. instance has IAM role
3. script is downloaded from trusted private source
4. script checksum or signature is verified
5. configuration is versioned
6. logs are written to /var/log/bootstrap.log
7. monitoring reports success or failure
Bootstrap logging example
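#!/usr/bin/env bash
set -euo pipefail
# Send all output to the bootstrap log as well as the console
exec > >(tee -a /var/log/bootstrap.log) 2>&1
echo "bootstrap started at $(date -Is)"
# apt-get has a stable interface for scripts; noninteractive avoids prompts at boot
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get install -y nginx
systemctl enable --now nginx
echo "bootstrap finished at $(date -Is)"
Golden AMI pattern: reproducible Ubuntu servers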
A golden AMI is a prebuilt, approved image containing a hardened baseline: Ubuntu LTS, patches, standard packages, users, agents, logging, monitoring and security defaults. It reduces boot time, improves repeatability and makes replacement safer than manual repair.
| Golden AMI content | Purpose | Example |
|---|---|---|
| Ubuntu LTS base | Approved OS baseline. | 24.04 LTS server image. |
| Security updates | Reduce patch work at boot. | apt upgrade during image build. |
| Agents | Monitoring, logs, SSM, backup. | CloudWatch agent, SSM agent. |
| Hardening | Common security defaults. | SSH policy, sysctl, UFW baseline. |
| Tags and metadata | Inventory and lifecycle. | version, build date, git commit. |
| Validation tests | Prove image boots and works. | SSH, cloud-init, services, logs. |
Golden AMI build flow
Official Ubuntu AMI
│
▼
Packer image build
│
├── apply apt updates
├── install baseline packages
├── install monitoring agents
├── apply hardening
├── clean temporary files
├── validate services
├── create AMI
│
▼
Approved AMI
│
├── tagged with version
├── tested in staging
├── used by launch templates
└── rolled out progressively
Replace, do not repair
Traditional server:
- SSH into machine
- manually patch
- manually edit config
- server becomes unique
- recovery depends on memory
Cloud-native server:
- build image
- deploy new instance
- attach to load balancer
- drain old instance
- terminate old instance
- rollback by previous image
Golden AMI governance
[ ] Base AMI owner verified
[ ] Ubuntu LTS version recorded
[ ] Build script versioned
[ ] Security updates applied
[ ] Image tests pass
[ ] AMI is tagged
[ ] AMI ID is published to parameter store or IaC
[ ] Rollback AMI is retained
[ ] Staging rollout completed
[ ] Production rollout is progressive
[ ] Old AMIs are retired safely
Launch template model
Launch Template
│
├── approved AMI ID
├── instance type
├── IAM role
├── security groups
├── EBS configuration
├── user data
├── tags
│
▼
Auto Scaling Group
├── desired capacity
├── health checks
├── rolling replacement
└── rollback to previous template version
AWS security for Ubuntu EC2: security groups, IAM, metadata and private networking
Ubuntu hardening and AWS security must work together. Security groups restrict network access before traffic reaches the instance. UFW adds host-level defense. IAM roles avoid static cloud keys. Private subnets prevent unnecessary exposure.
| Security control | AWS side | Ubuntu side |
|---|---|---|
| Network filtering | Security groups, NACLs, load balancer. | UFW or nftables. |
| Admin access | Bastion, VPN, SSM Session Manager. | SSH keys, no root login, auth logs. |
| Cloud permissions | IAM role attached to instance. | No static AWS keys stored on disk. |
| Secrets | Secrets Manager, SSM Parameter Store. | Strict file permissions if cached locally. |
| Observability | CloudWatch, VPC Flow Logs, CloudTrail. | journald, auth logs, application logs. |
| Recovery | EBS snapshots, AMIs, backups. | Restore tests and runbooks. |
Security group examples
Public web server:
- inbound 443/tcp from 0.0.0.0/0
- inbound 80/tcp from 0.0.0.0/0 only if redirect is needed
- inbound 22/tcp only from admin IP or bastion
- outbound restricted if strict policy is required
Private app server behind load balancer:
- inbound app port only from load balancer security group
- inbound SSH only from bastion security group
- no direct public access
Database server:
- inbound DB port only from app security group
- no public IP
- no public SSH
Layered AWS Ubuntu security diagram
Internet
│
▼
AWS perimeter
├── Route 53
├── CloudFront / WAF
├── Load Balancer
├── Security Groups
│
▼
Ubuntu host
├── UFW
├── SSH hardening
├── non-root services
├── package updates
├── logs
├── monitoring agent
│
▼
Application
├── TLS
├── secrets management
├── app logs
├── DB access
└── health checks
Metadata and credentials
Recommended:
- use IAM roles instead of static AWS keys
- keep role permissions minimal
- avoid storing credentials in user data
- avoid secrets in AMI images
- use Parameter Store or Secrets Manager
- monitor CloudTrail for suspicious API calls
- review instance profile permissions
Avoid:
- AWS_ACCESS_KEY_ID in .bashrc
- secrets embedded in user data
- secrets baked into AMIs
- overly broad IAM roles
- public metadata exposure through SSRF-vulnerable apps
Operations: monitoring, logs, snapshots, patching and recovery
Ubuntu EC2 operations combine Linux administration and AWS lifecycle management. The system must be patched, monitored, backed up, logged, tagged, replaceable and tested. A server that cannot be rebuilt is a long-term operational risk.
| Operational area | AWS control | Ubuntu control | Question to answer |
|---|---|---|---|
| Metrics | CloudWatch metrics and agent. | node exporter, system metrics. | Is the host saturated? |
| Logs | CloudWatch Logs, S3 archive. | journald, app logs, auth logs. | Can we diagnose incidents? |
| Backups | EBS snapshots, AWS Backup. | Application-aware backup hooks. | Can we restore? |
| Patching | SSM Patch Manager, image rebuild. | apt, unattended upgrades. | Are CVEs patched? |
| Recovery | AMI, launch template, autoscaling. | cloud-init, bootstrap scripts. | Can we replace the server? |
| Inventory | Tags, AWS Config, Systems Manager. | hostname, OS version, package list. | Do we know what this server is? |
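The "Is the host saturated?" question can be partly automated for disk usage. The sketch below is a hypothetical helper (`disk_over_threshold` is an illustrative name) that reads POSIX-format `df -P` output on standard input and lists filesystems at or above a usage limit. It assumes mount points without spaces, which holds for typical EC2 layouts.

```bash
# Hypothetical helper: read `df -P` output on stdin and print "MOUNTPOINT USE"
# for every filesystem whose usage is at or above LIMIT percent (default 85).
disk_over_threshold() {
  local limit="${1:-85}"
  awk -v limit="$limit" '
    NR == 1 { next }                  # skip the header line
    {
      use = $5
      sub(/%$/, "", use)              # "91%" becomes "91"
      if (use + 0 >= limit + 0) print $6, use
    }
  '
}

# Usage on a live host (commented out so the block has no side effects):
# df -P | disk_over_threshold 85
```

Reading from standard input keeps the function testable against captured output and lets the same check run over `df` output collected from several hosts.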
Operational metrics
Host:
- CPU utilization
- memory usage
- disk usage
- disk IO latency
- network throughput
- systemd failed services
- reboot-required state
Application:
- HTTP latency
- HTTP 5xx
- worker queue depth
- database connections
- error logs
- health check status
AWS:
- instance status checks
- EBS burst balance
- EBS latency
- load balancer health
- security group changes
- CloudTrail events
Recovery patterns
Pattern 1: EBS snapshot restore
├── create volume from snapshot
├── attach to instance
├── mount and recover data
└── validate application
Pattern 2: AMI rollback
├── select previous AMI
├── launch replacement instance
├── attach to load balancer
├── validate health
└── terminate bad instance
Pattern 3: Blue/green replacement
├── build new Ubuntu image
├── launch green environment
├── smoke test
├── shift traffic
└── keep blue as rollback
Ubuntu EC2 health commands
# OS and kernel
hostnamectl
uname -a
lsb_release -a
# Cloud-init status
cloud-init status
sudo tail -100 /var/log/cloud-init-output.log
# System health
uptime
df -h
free -h
systemctl --failed
journalctl -p warning --since "30 min ago"
# Network and ports
ip a
ip r
ss -lntp
# Reboot required
test -f /var/run/reboot-required && cat /var/run/reboot-required
Final AWS Ubuntu checklist
AMI and launch checklist
[ ] Official Ubuntu AMI selected
[ ] Canonical owner ID verified
[ ] LTS release selected for production
[ ] Architecture matches instance type
[ ] AMI ID is approved or pinned
[ ] Instance type matches workload
[ ] EBS volume size is sufficient
[ ] EBS performance is appropriate
[ ] VPC and subnet are correct
[ ] Security group is minimal
[ ] SSH access is restricted
[ ] IAM role uses least privilege
[ ] User data is tested
[ ] Tags are complete
[ ] Monitoring is enabled
Cloud-init checklist
[ ] User data starts with #cloud-config if YAML
[ ] package_update is intentional
[ ] package_upgrade is intentional
[ ] No long-lived secrets in user data
[ ] Bootstrap logs to file
[ ] cloud-init status is checked
[ ] /var/log/cloud-init-output.log is reviewed
[ ] Failed commands are visible
[ ] Complex setup is delegated to versioned script
[ ] Script source is trusted
[ ] Rebuild process is documented
Production operations checklist
[ ] UFW matches security group policy
[ ] Root SSH login is disabled
[ ] SSH keys are controlled
[ ] Patch policy exists
[ ] Reboot policy exists
[ ] CloudWatch or equivalent metrics enabled
[ ] Logs are shipped centrally
[ ] EBS snapshots are scheduled
[ ] Restore has been tested
[ ] Launch template is versioned
[ ] Golden AMI pipeline exists if fleet is large
[ ] Rollback AMI is retained
[ ] Instance is replaceable
[ ] Runbook exists
[ ] Owner and cost tags are present
Final rule
Use official Canonical AMIs, LTS baselines, controlled user data, minimal security groups, SSH keys or SSM, IAM roles, monitoring, snapshots, tested restore and image-based replacement where possible.
Minimal safe EC2 Ubuntu baseline
Minimum safe baseline:
- official Ubuntu LTS AMI
- Canonical owner verified
- SSH restricted
- security group minimal
- IAM role instead of static keys
- cloud-init bootstrap tested
- packages updated
- UFW enabled if needed
- monitoring installed
- snapshots configured
- restore tested
- instance documented and tagged
Containers and virtualization on Ubuntu
Ubuntu is a strong platform for both containers and virtualization. Docker is typically used for application containers. LXD/LXC is used for system containers that behave more like lightweight machines. KVM/QEMU is used for full virtual machines with their own kernel. virt-manager provides a graphical management interface for KVM.
The key difference is isolation level. Docker containers share the host kernel and are optimized for application packaging. LXD containers also share the host kernel but feel closer to small Linux systems. KVM virtual machines run a full guest OS with stronger isolation and more overhead.
| Technology | Category | Best for | Isolation | Typical command |
|---|---|---|---|---|
| Docker | Application containers | Apps, microservices, dev stacks, CI jobs. | Process/container isolation, shared kernel. | docker run nginx |
| Docker Compose | Multi-container orchestration | Local stacks, demos, small deployments. | Same as Docker. | docker compose up |
| LXD / LXC | System containers | Mini Linux systems, labs, isolated services. | OS-level isolation, shared kernel. | lxc launch ubuntu:24.04 c1 |
| KVM / QEMU | Full virtualization | VMs, different kernels, stronger isolation. | Hardware-assisted VM isolation. | virsh list --all |
| virt-manager | GUI for KVM/libvirt | Desktop/lab VM management. | Manages KVM guests. | Graphical interface. |
Isolation model diagram
Ubuntu host
│
├── Docker containers
│   ├── app process
│   ├── image layers
│   ├── container network
│   └── shared host kernel
│
├── LXD system containers
│   ├── init/systemd inside container
│   ├── full Ubuntu userspace
│   ├── container profiles
│   └── shared host kernel
│
└── KVM virtual machines
    ├── guest kernel
    ├── guest OS
    ├── virtual CPU/RAM/disk/NIC
    └── stronger isolation boundary
Decision shortcut
Need to ship an application?
└── Docker
Need several services locally?
└── Docker Compose
Need a mini Ubuntu machine?
└── LXD / LXC
Need another kernel or full VM isolation?
└── KVM / QEMU
Need a graphical VM manager?
└── virt-manager
Need production orchestration at scale?
└── Kubernetes, ECS, Nomad or managed platform
Docker on Ubuntu: images, containers, networks, volumes and logs
Docker packages an application and its runtime dependencies into an image. A container is a running instance of that image. On Ubuntu, Docker is commonly used for development, CI/CD, local demos, staging environments and production workloads behind a reverse proxy or orchestrator.
| Concept | Meaning | Example |
|---|---|---|
| Image | Immutable package template. | nginx:latest, postgres:16 |
| Container | Running process from an image. | docker run nginx |
| Volume | Persistent storage outside container lifecycle. | Database data, uploads. |
| Network | Container communication layer. | bridge network, app network. |
| Registry | Image storage and distribution. | Docker Hub, GHCR, ECR. |
| Dockerfile | Build recipe for an image. | Python app image. |
Install Docker baseline
# Install from Ubuntu repository for simple usage
sudo apt update
sudo apt install docker.io docker-compose-v2
# Enable Docker
sudo systemctl enable --now docker
# Check status
systemctl status docker
# Add current user to docker group
sudo usermod -aG docker $USER
# Re-login before using docker without sudo
docker version
docker info
Core Docker commands
# List running containers
docker ps
# List all containers
docker ps -a
# List images
docker images
# Run Nginx in the background (-d) so the shell stays free for the next commands
docker run -d --name web -p 8080:80 nginx:latest
# Logs
docker logs web
docker logs -f web
# Shell inside container
docker exec -it web bash
# Inspect container
docker inspect web
# Disk usage
docker system df
# Stop and remove (last, because logs/exec/inspect need the container to exist)
docker stop web
docker rm web
Docker architecture
Developer or CI
│
├── Dockerfile
├── build image
├── tag image
├── push image
│
▼
Registry
│
├── Docker Hub
├── GitHub Container Registry
├── AWS ECR
├── private registry
│
▼
Ubuntu host
│
├── pull image
├── run container
├── attach volume
├── expose port
└── collect logs
The docker group is effectively root-equivalent on the host. Do not grant it casually on production servers.
Docker Compose: local stacks, demos, CI environments and small deployments
Docker Compose defines several containers in one YAML file. It is excellent for local development, prototypes, demos, test stacks and small internal deployments. For large production environments, Compose is usually replaced by Kubernetes, ECS, Nomad or another orchestrator.
| Use case | Compose fit | Comment |
|---|---|---|
| Local Django + Postgres + Redis | Excellent. | Reproducible dev environment. |
| Demo platform | Excellent. | Easy to start and stop. |
| CI integration tests | Good. | Start dependencies for test run. |
| Single-server production | Possible with discipline. | Needs backups, monitoring, update strategy. |
| Large multi-node production | Not ideal. | Use orchestrator. |
Example Compose stack
services:
  web:
    build: .
    command: gunicorn config.wsgi:application --bind 0.0.0.0:8000
    ports:
      - "8000:8000"
    environment:
      DJANGO_SETTINGS_MODULE: config.settings
      DATABASE_URL: postgres://app:app@db:5432/app
      REDIS_URL: redis://redis:6379/0
    depends_on:
      - db
      - redis
  db:
    image: postgres:16
    environment:
      POSTGRES_DB: app
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:7
volumes:
  pgdata:
Compose commands
# Start stack
docker compose up
# Start in background
docker compose up -d
# Show containers
docker compose ps
# Show logs
docker compose logs
docker compose logs -f web
# Execute command
docker compose exec web bash
# Stop stack
docker compose down
# Stop and remove volumes - destructive
docker compose down -v
# Rebuild
docker compose build
docker compose up -d --build
Compose lifecycle
docker-compose.yml
│
├── services
├── networks
├── volumes
├── environment
├── ports
├── dependencies
│
▼
docker compose up
│
├── creates network
├── creates volumes
├── starts containers
├── streams logs
└── exposes ports
Production cautions
If using Compose in production:
[ ] pin image versions
[ ] avoid latest tags
[ ] define restart policies
[ ] configure log rotation
[ ] persist data in named volumes
[ ] backup volumes
[ ] monitor containers
[ ] document upgrade process
[ ] keep secrets out of git
[ ] place behind Nginx or load balancer
LXD / LXC: system containers and lightweight Ubuntu environments
LXD manages LXC system containers. Unlike Docker, which usually runs one application process per container, LXD containers can behave like lightweight Linux machines with init, SSH, packages, services and multiple processes. This makes LXD excellent for labs, training, test environments, network simulations and isolated system services.
| Feature | LXD / LXC behavior | Usefulness |
|---|---|---|
| System container | Full Linux userspace. | Feels like a mini VM. |
| Shared kernel | Uses host kernel. | Lightweight compared to VM. |
| Images | Launch Ubuntu and other Linux images. | Fast lab creation. |
| Profiles | Reusable config for containers. | Standardized CPU/RAM/network/storage. |
| Snapshots | Snapshot and restore container state. | Safe experimentation. |
| Networking | Bridge, routed, macvlan patterns. | Complex labs and isolated networks. |
LXD install and init
# Install LXD
sudo snap install lxd
# Add user to lxd group
sudo usermod -aG lxd $USER
# Re-login, then initialize
lxd init
# Launch Ubuntu container
lxc launch ubuntu:24.04 test1
# List containers
lxc list
# Shell inside container
lxc exec test1 -- bash
LXD command examples
# Start / stop
lxc start test1
lxc stop test1
# Execute command
lxc exec test1 -- apt update
# Copy file into container
lxc file push local.txt test1/root/local.txt
# Snapshot
lxc snapshot test1 before-change
# Restore snapshot
lxc restore test1 before-change
# Delete container
lxc delete test1 --force
# Show configuration
lxc config show test1
# Limit memory
lxc config set test1 limits.memory 1GiB
# Limit CPU
lxc config set test1 limits.cpu 2
LXD lab architecture
Ubuntu host
│
├── lxdbr0 bridge
│
├── container: web01
│   ├── nginx
│   └── app service
│
├── container: db01
│   └── PostgreSQL
│
├── container: monitor01
│   └── Prometheus / Grafana
│
└── snapshots
    ├── before-upgrade
    ├── before-network-test
    └── clean-baseline
KVM / QEMU / libvirt: full virtualization on Ubuntu
KVM is Linux kernel-based virtualization. With QEMU and libvirt, Ubuntu can host full virtual machines. Each VM has its own virtual CPU, memory, disk, network card and guest operating system. This is heavier than containers but gives stronger isolation and supports different kernels and operating systems.
| Component | Role | Example usage |
|---|---|---|
| KVM | Kernel virtualization acceleration. | Runs VM workloads efficiently. |
| QEMU | Machine emulator and virtualizer. | Emulates devices and runs guests. |
| libvirt | Management layer for VMs. | virsh, virt-manager. |
| virt-install | CLI VM installer. | Create VM from ISO or cloud image. |
| virsh | CLI administration tool. | List, start, stop, inspect VMs. |
| qcow2 | Common VM disk image format. | Snapshots and thin provisioning. |
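Before installing the stack, it helps to confirm that the CPU exposes hardware virtualization, since KVM acceleration depends on the `vmx` (Intel) or `svm` (AMD) flag. The sketch below is a hypothetical helper (`count_virt_cpus` is an illustrative name) that counts logical CPUs advertising either flag. It takes the cpuinfo path as an optional argument so it can be run against a saved copy. On non-x86 hosts, where `/proc/cpuinfo` has no `flags` lines, it prints 0.

```bash
# Hypothetical helper: print how many logical CPUs list vmx or svm in their flags.
count_virt_cpus() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  # Keep only "flags" lines (one per logical CPU), then count whole-word matches.
  # grep -c exits non-zero on a count of 0, so "|| true" keeps the status clean.
  grep -E '^flags[[:space:]]*:' "$cpuinfo" | grep -Ecw 'vmx|svm' || true
}

# Usage (commented out so the block has no side effects):
# [ "$(count_virt_cpus)" -gt 0 ] || echo "no VT-x/AMD-V visible: check BIOS or nested virtualization" >&2
```

Matching whole words on `flags` lines avoids false positives from other fields that happen to contain the same letters.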
Install KVM stack
# Check CPU virtualization support (a count of 0 means no VT-x/AMD-V is exposed)
grep -Ec '(vmx|svm)' /proc/cpuinfo
# Install KVM/libvirt tools
sudo apt update
sudo apt install -y \
qemu-kvm \
libvirt-daemon-system \
libvirt-clients \
bridge-utils \
virtinst
# Add user to groups
sudo usermod -aG libvirt,kvm $USER
# Re-login, then check
virsh list --all
systemctl status libvirtd
KVM architecture
Ubuntu host
│
├── Linux kernel with KVM
├── QEMU processes
├── libvirt daemon
├── virtual networks
├── storage pools
└── VM guests
    │
    ├── Ubuntu VM
    ├── Debian VM
    ├── Windows VM
    └── lab appliance VM
virsh commands
# List VMs
virsh list --all
# Start VM
virsh start vm1
# Stop gracefully
virsh shutdown vm1
# Force stop
virsh destroy vm1
# VM info
virsh dominfo vm1
# Autostart VM
virsh autostart vm1
# Show networks
virsh net-list --all
# Show storage pools
virsh pool-list --all
When KVM is better than containers
Use KVM when:
- guest needs its own kernel
- running another OS
- stronger isolation is required
- testing kernel-level behavior
- simulating production VM topology
- running legacy software
- VM snapshots and full machine state are needed
virt-manager: graphical VM management on Ubuntu
virt-manager is a desktop GUI for managing KVM/libvirt virtual machines. It is useful for labs, local testing, training, troubleshooting, VM console access and visual VM configuration. On servers, CLI tools such as virsh and automation are more common.
| virt-manager feature | Purpose | Typical usage |
|---|---|---|
| VM creation wizard | Create new VM from ISO or image. | Ubuntu or Windows lab VM. |
| Console view | Access VM screen. | Install OS, fix boot issues. |
| Hardware editor | Configure CPU, RAM, disks, NICs. | Adjust VM resources. |
| Snapshots | Capture VM state. | Before risky change. |
| Network view | Manage virtual networks. | NAT, bridge, isolated network. |
| Storage pools | Manage VM disks. | qcow2 images and volumes. |
Install virt-manager
sudo apt update
sudo apt install virt-manager
# Start GUI from desktop
virt-manager
# Check libvirt service
systemctl status libvirtd
# List VMs from CLI
virsh list --all
VM creation flow with virt-manager
virt-manager
│
├── New virtual machine
├── Choose ISO or cloud image
├── Select OS type
├── Allocate CPU and RAM
├── Create virtual disk
├── Choose network
├── Start installation
└── Install guest tools if needed
Lab topology example
Ubuntu desktop host
│
├── virt-manager
│
├── VM: router-lab
│   ├── NIC 1: NAT
│   └── NIC 2: isolated lab network
│
├── VM: web-server
│   └── lab network
│
└── VM: db-server
    └── lab network
Good usage boundaries
Use virt-manager for:
- desktop labs
- OS installation
- visual debugging
- VM console access
- local experiments
Prefer CLI/IaC for:
- production servers
- repeatable deployment
- remote headless hosts
- large VM fleets
- automated rebuilds
CI/CD, labs and developer workflows
Ubuntu containers and virtualization are extremely useful for reproducible development, automated tests, CI runners, network labs, database experiments, security sandboxes and integration environments. The goal is to reduce "works on my machine" problems.
| Workflow | Best technology | Why |
|---|---|---|
| Local web app stack | Docker Compose | Fast, reproducible dependencies. |
| CI integration tests | Docker services | Start DB/cache/message broker for tests. |
| Linux admin training | LXD | Fast mini Ubuntu machines. |
| Network topology lab | LXD or KVM | Multiple nodes and networks. |
| Kernel or OS testing | KVM | Full guest kernel isolation. |
| Security sandbox | KVM | Stronger isolation boundary. |
CI pipeline example
Git push
│
▼
CI runner on Ubuntu
│
├── checkout code
├── build Docker image
├── start Compose services
│   ├── app
│   ├── postgres
│   └── redis
├── run tests
├── scan image
├── push image to registry
└── deploy to target environment
Demo architecture: one Ubuntu host
Ubuntu host
│
├── Docker service
│   ├── nginx container
│   ├── app container
│   └── redis container
│
├── LXD container
│   └── ubuntu:24.04 system lab
│
├── KVM VM
│   └── isolated test machine
│
└── Monitoring
    ├── node exporter
    ├── docker stats
    └── systemd status
Useful demo commands
# Docker demo
docker run -d --name demo-nginx -p 8080:80 nginx:latest
curl -I http://localhost:8080
docker logs demo-nginx
# LXD demo
lxc launch ubuntu:24.04 lab1
lxc exec lab1 -- bash -lc "hostnamectl && apt update"
# KVM visibility
virsh list --all
# Host monitoring
top
df -h
ss -lntp
Production patterns: Docker host, reverse proxy, volumes, logs, updates and orchestration
Containers in production require more than docker run. You need image governance, non-root containers, pinned versions, health checks, persistent volumes, log rotation, backup, monitoring, network boundaries, secrets management and a clear update strategy.
| Production topic | Good practice | Risk if ignored |
|---|---|---|
| Image versions | Pin tags or digests. | Unexpected changes from latest. |
| Volumes | Persist state outside container. | Data loss on container removal. |
| Logs | Configure rotation and centralization. | Disk fills under /var/lib/docker. |
| Secrets | Use secret store or strict env file permissions. | Secrets leaked in git or inspect output. |
| Networking | Expose only reverse proxy, keep internal networks private. | DB/cache exposed accidentally. |
| Health checks | Define container and load balancer health. | Dead service appears running. |
| Backups | Backup volumes and databases. | No recovery path. |
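The "pin tags or digests" row lends itself to a simple pre-deploy check. The sketch below is a hypothetical helper (`find_unpinned_images` is an illustrative name) that scans a Compose file for `image:` lines using `:latest` or no tag at all. It is a plain text scan under the assumption of one `image: ref` pair per line, not a YAML parser, so unusual formatting may slip through.

```bash
# Hypothetical helper: print image references in a Compose file that use
# :latest or carry neither a tag nor a digest.
find_unpinned_images() {
  local file="$1"
  awk -v sq="'" '
    $1 == "image:" {
      ref = $2
      gsub(/"/, "", ref); gsub(sq, "", ref)   # drop surrounding quotes
      if (ref ~ /@sha256:/) next              # digest-pinned, always fine
      last = ref
      sub(/^.*\//, "", last)                  # keep the part after the last slash
      if (last !~ /:/ || last ~ /:latest$/) print ref
    }
  ' "$file"
}

# Usage (commented out; the file name is a placeholder):
# find_unpinned_images docker-compose.yml
```

Only the component after the last slash is inspected for a tag, so a registry port such as `registry.example.com:5000/myapp` is not mistaken for a version.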
Single-host container production pattern
Internet
│
▼
Nginx on Ubuntu host
│
├── TLS termination
├── rate limiting
├── static files
├── reverse proxy
│
▼
Docker network
│
├── app container
├── worker container
├── redis container
└── internal-only database or external DB
Docker daemon log rotation
# /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}
# Apply
sudo systemctl restart docker
Production Compose example
services:
  app:
    image: registry.example.com/myapp:1.4.2
    restart: unless-stopped
    env_file:
      - /srv/myapp/app.env
    networks:
      - internal
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health/"]
      interval: 30s
      timeout: 5s
      retries: 3
  redis:
    image: redis:7.2
    restart: unless-stopped
    networks:
      - internal
    volumes:
      - redisdata:/data
networks:
  internal:
volumes:
  redisdata:
When to move beyond Compose
Move to orchestrator when:
- multiple hosts are needed
- rolling updates are required
- autoscaling is required
- service discovery is complex
- secrets and config need governance
- many teams deploy independently
- high availability is mandatory
Troubleshooting containers and virtualization
Troubleshooting should start at the right layer: host health, Docker daemon, container logs, network binding, volume permissions, image version, LXD profile, libvirt daemon, VM console or storage pool. Avoid deleting containers or volumes before understanding where persistent data lives.
| Symptom | First checks | Common cause |
|---|---|---|
| Docker container exits | docker ps -a, docker logs | Bad config, missing env, app crash. |
| Port not reachable | docker ps, ss -lntp, UFW. | Port not published, firewall, bind address. |
| Disk full | docker system df, du -sh /var/lib/docker | Logs, images, old containers, volumes. |
| Permission denied on volume | ls -lah, container user, UID/GID. | Host volume ownership mismatch. |
| LXD container has no network | lxc list, lxc network list. | Bridge, DNS, profile or firewall issue. |
| KVM VM will not start | virsh list --all, libvirt logs. | Missing storage, permission, CPU virtualization. |
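For the "Port not reachable" row, the bind address is often the deciding detail: a service listening on 127.0.0.1 is invisible from outside the host even when the port looks open. The sketch below is a hypothetical helper (`listeners_for_port` is an illustrative name) that reads `ss -lnt` output on standard input and prints the local addresses bound to a given port.

```bash
# Hypothetical helper: read `ss -lnt` output on stdin and print the local
# addresses bound to PORT; exit status is 1 when nothing listens on it.
listeners_for_port() {
  local port="$1"
  awk -v port="$port" '
    $1 == "LISTEN" {
      addr = $4
      p = addr
      sub(/^.*:/, "", p)              # text after the last colon is the port
      if (p == port) {
        sub(/:[^:]*$/, "", addr)      # keep only the bind address
        print addr
        found = 1
      }
    }
    END { exit(found ? 0 : 1) }
  '
}

# Usage on a live host (commented out so the block has no side effects):
# ss -lnt | listeners_for_port 8080
```

An output of `127.0.0.1` points to a bind-address problem, `0.0.0.0` or `[::]` shifts suspicion to UFW or the security group, and a non-zero exit status means the service is not listening at all.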
Docker troubleshooting commands
docker ps
docker ps -a
docker logs CONTAINER
docker inspect CONTAINER
docker exec -it CONTAINER bash
docker stats
docker system df
docker network ls
docker volume ls
systemctl status docker
journalctl -u docker --since "30 min ago"
LXD and KVM troubleshooting commands
# LXD
lxc list
lxc info CONTAINER
lxc config show CONTAINER
lxc network list
lxc storage list
lxc exec CONTAINER -- bash
journalctl -u snap.lxd.daemon --since "30 min ago"
# KVM / libvirt
virsh list --all
virsh dominfo VM
virsh net-list --all
virsh pool-list --all
systemctl status libvirtd
journalctl -u libvirtd --since "30 min ago"
Decision tree
Container or VM issue
│
├── Is host healthy?
│   └── CPU, RAM, disk, network
│
├── Is manager running?
│   ├── Docker daemon
│   ├── LXD daemon
│   └── libvirt daemon
│
├── Is workload running?
│   ├── docker ps -a
│   ├── lxc list
│   └── virsh list --all
│
├── What do logs say?
│   ├── docker logs
│   ├── lxc info --show-log
│   └── journalctl
│
└── Is it network, storage or permissions?
    ├── ports
    ├── volumes
    ├── bridges
    └── UID/GID
docker compose down -v deletes named volumes. Never run it on production unless you explicitly want to delete persistent data.
Final checklist and command cheat sheet
Technology choice checklist
[ ] Docker selected for application containers
[ ] Compose selected for local or small multi-service stacks
[ ] LXD selected for system-container labs
[ ] KVM selected for full VM isolation
[ ] virt-manager selected for GUI lab management
[ ] Production orchestrator considered if multi-node
[ ] Persistent data location is documented
[ ] Backup strategy exists for volumes and VM disks
[ ] Network exposure is documented
[ ] Host firewall rules are known
[ ] Logs are rotated
[ ] Images are pinned
[ ] Secrets are not stored in git
[ ] Monitoring covers host and workloads
[ ] Update and rollback process exists
Docker cheat sheet
docker ps
docker ps -a
docker images
docker run -d --name web -p 8080:80 nginx
docker logs -f web
docker exec -it web bash
docker stop web
docker rm web
docker system df
docker compose up -d
docker compose logs -f
docker compose down
LXD / KVM cheat sheet
# LXD
lxd init
lxc launch ubuntu:24.04 c1
lxc list
lxc exec c1 -- bash
lxc snapshot c1 before-change
lxc restore c1 before-change
lxc delete c1 --force
# KVM / libvirt
virsh list --all
virsh start vm1
virsh shutdown vm1
virsh dominfo vm1
virsh net-list --all
virsh pool-list --all
# Host checks
systemctl status docker
systemctl status libvirtd
df -h
free -h
ss -lntp
Final rule
Docker gives fast application packaging, Compose gives reproducible stacks, LXD gives machine-like containers, and KVM gives full VM isolation. Production quality depends on security, storage, networking, logs, monitoring, backups and rollback.
Minimal robust host profile
Minimum robust Ubuntu container/VM host:
- Ubuntu LTS
- patched kernel and packages
- Docker/LXD/KVM installed intentionally
- non-root operational model
- storage sized and monitored
- log rotation enabled
- firewall rules documented
- images or VM templates versioned
- backups tested
- monitoring and alerts enabled
- runbook documented
Professional troubleshooting method
Ubuntu troubleshooting must be systematic. The objective is not to try random commands until something changes. The objective is to identify the failing layer: application, service manager, process, logs, permissions, network, DNS, firewall, storage, memory, CPU, kernel, package update, boot or recent configuration change.
A good incident workflow follows a stable sequence: define the symptom, determine the scope, collect evidence, isolate the layer, apply one minimal fix, verify, document, then add prevention.
| Step | Question | Command family | Output expected |
|---|---|---|---|
| 1. Symptom | What exactly is failing? | curl, browser, user report, monitoring | Precise error, time, scope. |
| 2. Scope | One service, one host, one network, all users? | health checks, ping, curl, dashboard | Incident boundary. |
| 3. Logs | What did the system report? | journalctl, tail, grep, dmesg | Error message and timeline. |
| 4. Services | Is the daemon running and enabled? | systemctl, ss | Running state, PID, port. |
| 5. Resources | Is the host saturated? | top, free, df, iostat, vmstat | CPU/RAM/disk/IO pressure. |
| 6. Network | Can traffic reach the service? | ip, ss, ufw, dig, curl | IP, route, DNS, port, firewall status. |
| 7. Change | What changed recently? | apt history, deploy logs, git, config diff | Likely trigger. |
| 8. Fix | What is the smallest safe correction? | rollback, restart, config fix, cleanup | Service restored and verified. |
Global diagnostic decision tree
Ubuntu incident
│
├── Is the host reachable?
│   ├── no -> cloud, network, firewall, boot, provider
│   └── yes
│
├── Is disk full?
│   ├── yes -> disk playbook
│   └── no
│
├── Are critical services running?
│   ├── no -> systemd playbook
│   └── yes
│
├── Are ports listening?
│   ├── no -> service config / bind / crash
│   └── yes
│
├── Is network path valid?
│   ├── no -> DNS / route / firewall / SG
│   └── yes
│
├── Are resources saturated?
│   ├── CPU -> CPU playbook
│   ├── RAM -> memory playbook
│   ├── IO -> disk / IO playbook
│   └── no
│
└── Application layer likely
    ├── app logs
    ├── DB connectivity
    ├── cache connectivity
    ├── external dependency
    └── recent deploy
Do / avoid
| Do | Avoid |
|---|---|
| Collect logs with time window. | Reading huge logs without filtering. |
| Change one thing at a time. | Restarting everything blindly. |
| Check disk early. | Debugging app while root FS is full. |
| Validate config before restart. | Restarting with unvalidated config. |
| Keep rollback possible. | Deleting unknown files in production. |
First 5 minutes: collect facts quickly
The first minutes of an incident are for orientation. You need to know whether the machine is alive, whether the disk is full, whether memory is exhausted, whether services failed, which ports are listening, and whether logs show a clear error.
One-screen diagnostic
echo "== HOST =="
hostnamectl
echo "== UPTIME =="
uptime
echo "== WHO IS CONNECTED =="
who
echo "== DISK =="
df -h
echo "== MEMORY =="
free -h
echo "== FAILED UNITS =="
systemctl --failed
echo "== LISTENING PORTS =="
ss -lntp
echo "== RECENT WARNINGS =="
journalctl -p warning --since "30 min ago" --no-pager | tail -100
| Signal | Good sign | Bad sign | Next action |
|---|---|---|---|
| Uptime | Stable, expected boot time. | Unexpected reboot. | Check previous boot logs. |
| Disk | Filesystem below alert threshold. | / or /var near 100%. | Disk playbook. |
| Memory | Available RAM healthy. | Swap active, OOM events. | Memory playbook. |
| Failed units | No failed services. | Critical unit failed. | systemd playbook. |
| Ports | Expected ports listening. | Missing 80/443/app/DB port. | Service and network playbook. |
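The Disk row above can be turned into a quick filter. This is a minimal sketch, not part of the original guide: the helper name `fs_over_threshold` is hypothetical and the 90% default is an assumed alert threshold. It reads `df -P` output on stdin, so it can be tried on saved output as well as on a live host. Mount points containing spaces are not handled.

```shell
# Hypothetical helper: print mount points whose usage is at or above a threshold.
# Reads POSIX-format `df -P` output on stdin.
# The threshold defaults to 90 (an assumed value, adjust to your alerting policy).
fs_over_threshold() {
  awk -v limit="${1:-90}" '
    NR > 1 {                        # skip the df header line
      use = $5                      # Capacity column, for example 97%
      sub(/%/, "", use)
      if (use + 0 >= limit) print $6 " " $5
    }'
}

# Usage on a live host:
#   df -P | fs_over_threshold 90
```

An empty result means no filesystem crossed the threshold; any line printed is a candidate for the disk playbook.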
Minimum incident facts
Incident facts to capture:
- exact symptom
- first detection time
- impacted users or services
- hostname
- Ubuntu version
- kernel version
- uptime
- recent deployments
- recent package upgrades
- failed services
- resource saturation
- relevant logs
- immediate workaround
- rollback option
Recent change checks
# Apt package changes
less /var/log/apt/history.log
less /var/log/apt/term.log
# Recently modified config files under /etc
sudo find /etc -type f -mtime -2 -printf '%TY-%Tm-%Td %TH:%TM %p\n' | sort
# Recent system boots
last reboot | head
# Current users
who
w
# Cron logs if syslog is available
grep CRON /var/log/syslog | tail -100
Immediate triage matrix
| Observation | Likely category |
|---|---|
| Service failed in systemctl | Service/config/dependency issue. |
| Port not listening | Service did not bind or crashed. |
| Port listening locally but unreachable remotely | Firewall, route, security group, load balancer. |
| Disk full | Logs, Docker, DB, uploads, backups. |
| OOM kill in kernel logs | Memory leak or insufficient RAM. |
systemd and service troubleshooting
Most production daemons on Ubuntu are managed by systemd. When a service is down, start with systemctl status, then read the journal, validate the config, check ports, check permissions, and only then restart.
| Question | Command | What it tells you |
|---|---|---|
| Is service running? | systemctl status nginx | Active state, PID, exit code, recent logs. |
| Why did it fail? | journalctl -u nginx | Service logs and errors. |
| Did unit fail? | systemctl --failed | Failed systemd units. |
| Does it start at boot? | systemctl is-enabled nginx | Boot activation state. |
| Which ports are bound? | ss -lntp | Listening sockets and processes. |
| What are unit properties? | systemctl show nginx | Restart policy, limits, user, environment. |
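The "validate the config, and only then restart" order can be enforced with a small wrapper. This is a sketch under stated assumptions: `run_if_valid` and `nginx_config_ok` are hypothetical helper names, not commands shipped with Ubuntu; only `nginx -t` and `systemctl reload` come from this guide.

```shell
# Hypothetical helper: run an action only if a validator command succeeds.
# Usage: run_if_valid VALIDATOR ACTION [ARGS...]
# VALIDATOR is a single command or function name that takes no arguments.
run_if_valid() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_if_valid VALIDATOR ACTION [ARGS...]" >&2
    return 2
  fi
  validator=$1
  shift
  if "$validator"; then
    "$@"                                   # validation passed: run the action
  else
    echo "validation failed, not running: $*" >&2
    return 1
  fi
}

# Example validator (assumes Nginx is installed on the host).
nginx_config_ok() { sudo nginx -t; }

# Usage on a real host:
#   run_if_valid nginx_config_ok sudo systemctl reload nginx
```

Keeping the validator as a separate function makes it easy to swap in `sshd -t` or an application-specific check without touching the wrapper.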
Service commands
# Status
systemctl status SERVICE
# Logs
journalctl -u SERVICE --since "1 hour ago"
journalctl -u SERVICE -f
# Restart
sudo systemctl restart SERVICE
# Reload config if supported
sudo systemctl reload SERVICE
# Enable at boot
sudo systemctl enable SERVICE
# Failed units
systemctl --failed
# Unit file
systemctl cat SERVICE
# Runtime properties
systemctl show SERVICE | less
Service failure decision tree
Service failed
│
├── Read status
│   └── systemctl status SERVICE
│
├── Read logs with time window
│   └── journalctl -u SERVICE --since "30 min ago"
│
├── Config syntax valid?
│   ├── nginx -t
│   ├── sshd -t
│   └── app-specific check
│
├── Dependency available?
│   ├── database
│   ├── redis
│   ├── network
│   └── filesystem mount
│
├── Permissions correct?
│   ├── service user
│   ├── config files
│   └── runtime directories
│
├── Port conflict?
│   └── ss -lntp
│
└── Restart with monitoring
    └── systemctl restart SERVICE
Common service failures
| Error pattern | Likely cause | Fix direction |
|---|---|---|
| Exit code 1 after deploy | Bad config or app error. | Validate config, rollback deploy. |
| Permission denied | Wrong owner/group/path. | Check service user and namei -l. |
| Address already in use | Port conflict. | Find process with ss -lntp. |
| Start request repeated too quickly | Crash loop. | Fix root cause, then systemctl reset-failed. |
| Dependency failed | Database, network, mount, Redis missing. | Restore dependency first. |
Logs and journald: finding the real error
Logs are the timeline of the incident. On Ubuntu, the main tools are journalctl, service-specific logs, /var/log/auth.log, /var/log/syslog, kernel logs and application logs. The most useful log queries are scoped by service and time.
| Need | Command | Use case |
|---|---|---|
| Recent critical context | journalctl -xe | Quick overview of recent errors. |
| Service logs | journalctl -u nginx | Why one service failed. |
| Current boot logs | journalctl -b | Boot-time errors and service startup. |
| Previous boot logs | journalctl -b -1 | Debug crash/reboot before current boot. |
| Kernel logs | journalctl -k | OOM, disk, driver, network errors. |
| Authentication logs | /var/log/auth.log | SSH, sudo, login attempts. |
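When a query is scoped by service and time, the earliest failure line is usually more informative than the last one. The sketch below assumes a hypothetical helper name, `first_error`, and an illustrative default keyword list; both are assumptions, and the pattern should be adapted to the application's own log vocabulary.

```shell
# Hypothetical helper: print the first line on stdin that matches a
# case-insensitive extended regex, then stop reading.
# The default pattern is an illustrative keyword list, not an exhaustive one.
first_error() {
  grep -i -E -m 1 -- "${1:-error|fail|denied|refused|no space left}"
}

# Usage on a real host:
#   journalctl -u SERVICE --since "30 min ago" --no-pager | first_error
#   journalctl -u SERVICE --since today --no-pager | first_error "timeout"
```

The function returns a non-zero status when nothing matches, which makes it usable in scripts as a simple "any error in this window?" test.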
journalctl commands
# Recent errors and context
journalctl -xe
# Service logs today
journalctl -u SERVICE --since today
# Service logs last 30 minutes
journalctl -u SERVICE --since "30 min ago"
# Follow service logs
journalctl -u SERVICE -f
# Warnings and errors today
journalctl -p warning --since today
# Current boot
journalctl -b
# Previous boot
journalctl -b -1
# Kernel logs
journalctl -k --since today
Classic log files
# System log
sudo tail -200 /var/log/syslog
# Authentication log
sudo tail -200 /var/log/auth.log
# Nginx logs
sudo tail -200 /var/log/nginx/error.log
sudo tail -200 /var/log/nginx/access.log
# Apt history
less /var/log/apt/history.log
less /var/log/apt/term.log
# Compressed rotated logs
zgrep -i "error" /var/log/syslog.*.gz
# Kernel ring buffer
dmesg -T | tail -100
Log investigation flow
Need root cause from logs
│
├── Identify incident time window
│
├── Read service journal
│   └── journalctl -u SERVICE --since TIME
│
├── Read system warnings
│   └── journalctl -p warning --since TIME
│
├── Read kernel logs
│   └── journalctl -k --since TIME
│
├── Read application logs
│
├── Correlate with deploy/update
│   └── apt history / deploy log
│
└── Extract first error, not last symptom
Useful grep patterns
grep -i "error" app.log
grep -i "permission denied" app.log
grep -i "connection refused" app.log
grep -i "no space left" /var/log/syslog
grep -i "killed process" /var/log/syslog
grep -i "failed password" /var/log/auth.log
Network and DNS troubleshooting
Network debugging should be layered: IP address, route, DNS, firewall, listening port, local service response, remote response. Do not assume an application is broken until the network path is verified.
| Layer | Question | Command | Bad sign |
|---|---|---|---|
| Interface | Does the host have an IP? | ip a | No expected IP. |
| Route | Is default route present? | ip r | No default route. |
| DNS | Can names resolve? | dig, resolvectl | Timeout or wrong answer. |
| Firewall | Is traffic allowed? | ufw status verbose | Required port denied. |
| Socket | Is service listening? | ss -lntp | Port missing. |
| HTTP local | Does local endpoint respond? | curl -I localhost | Connection refused or 5xx. |
| Remote path | Does public endpoint respond? | curl -I domain | Timeout, TLS, 5xx, wrong IP. |
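The Socket layer has a common trap: a service can be listening, but only on a loopback address. The sketch below classifies a port from `ss` output. The helper name `port_bind_scope` and its three result labels are hypothetical; it assumes header-less output as produced by `ss -lntH`, where the fourth column is the local address and port.

```shell
# Hypothetical helper: report how a TCP port is bound, based on `ss -lntH`
# output read from stdin.
# Prints one of: non-loopback, loopback-only, not-listening.
port_bind_scope() {
  awk -v port="$1" '
    {
      addr = $4                                  # Local Address:Port column
      n = split(addr, parts, ":")
      if (parts[n] != port) next                 # last colon-separated field is the port
      host = substr(addr, 1, length(addr) - length(parts[n]) - 1)
      if (host ~ /^127\./ || host == "[::1]") loop = 1
      else wide = 1
    }
    END {
      if (wide) print "non-loopback"
      else if (loop) print "loopback-only"
      else print "not-listening"
    }'
}

# Usage on a real host:
#   ss -lntH | port_bind_scope 5432
```

A `loopback-only` result for a port that must be reached remotely points at the service's bind address rather than at the firewall.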
Network commands
# Interfaces
ip a
# Routes
ip r
# DNS status
resolvectl status
# DNS query
dig example.com
dig A example.com
dig AAAA example.com
# Listening ports
ss -lntp
# Connection summary
ss -s
# Firewall
sudo ufw status verbose
# Local HTTP check
curl -I http://localhost
# Public HTTP check
curl -I https://example.com
Network decision tree
Service unreachable
│
├── Is service listening locally?
│   ├── no -> service/config issue
│   └── yes
│
├── Does local curl work?
│   ├── no -> app/service issue
│   └── yes
│
├── Is firewall open?
│   ├── no -> UFW/cloud security group
│   └── yes
│
├── Does DNS point to correct IP?
│   ├── no -> DNS provider / record
│   └── yes
│
├── Does remote curl reach?
│   ├── no -> route/LB/firewall/provider
│   └── yes
│
└── Is response app error?
    └── app logs / upstream logs
Common network symptoms
| Symptom | Likely cause | Check |
|---|---|---|
| Connection refused | No service listening on target port. | ss -lntp, service status. |
| Connection timeout | Firewall, route, security group, provider. | UFW, cloud firewall, route. |
| DNS resolves wrong IP | Bad DNS record or stale cache. | dig, DNS console. |
| Works locally, not remotely | Firewall, bind address, reverse proxy, LB. | ss, UFW, Nginx. |
| TLS error | Wrong certificate, expired cert, SNI issue. | Nginx logs, certbot, openssl. |
"localhost works" and "public domain works" are different tests. Always verify both.
Disk, filesystem and IO troubleshooting
Disk problems can break everything: package installs, logs, databases, Docker, SSH, application uploads and systemd services. Always check disk early in an incident. A full / or /var often creates misleading application errors.
| Problem | Command | Likely cause | Safe first action |
|---|---|---|---|
| Filesystem full | df -h | Logs, Docker, DB, backups, uploads. | Identify large directories. |
| Large logs | du -sh /var/log/* | Log storm or missing rotation. | Vacuum journal, rotate logs. |
| Docker disk growth | docker system df | Images, volumes, logs. | Prune only understood objects. |
| Mount missing | findmnt, lsblk -f | fstab error, disk detach. | Fix mount, do not write to wrong path. |
| High IO wait | iostat -xz 1 | Slow disk, DB writes, swap, backup. | Find process with iotop. |
Disk commands
# Filesystem usage
df -h
# Inode usage
df -ih
# Top-level directory sizes
sudo du -xhd1 / 2>/dev/null | sort -h
# Common growth areas
sudo du -sh /var/log/*
sudo du -sh /var/lib/docker/* 2>/dev/null
sudo du -sh /var/lib/postgresql/* 2>/dev/null
sudo du -sh /tmp/* 2>/dev/null
# Mounts and disks
lsblk -f
findmnt
cat /etc/fstab
# IO statistics
iostat -xz 1
sudo iotop -o
Disk full decision tree
Disk full
│
├── Which filesystem?
│   └── df -h
│
├── Is it root or /var?
│   └── du -xhd1 /
│
├── Is journal huge?
│   └── journalctl --disk-usage
│
├── Are app logs huge?
│   └── du -sh /var/log/*
│
├── Is Docker huge?
│   └── docker system df
│
├── Is database huge?
│   └── do not delete manually
│
└── Prevent recurrence
    ├── logrotate
    ├── monitoring
    ├── retention
    └── resize or separate volume
Safe cleanup commands
# Clean apt cache
sudo apt clean
# Remove unused packages
sudo apt autoremove
# Show journal size
journalctl --disk-usage
# Vacuum journal by time
sudo journalctl --vacuum-time=7d
# Vacuum journal by size
sudo journalctl --vacuum-size=1G
# Docker usage
docker system df
# Docker image cleanup - use with care
docker image prune
Dangerous cleanup commands
Dangerous in production:
rm -rf /var/lib/postgresql/*
rm -rf /var/lib/mysql/*
rm -rf /var/lib/docker/volumes/*
docker compose down -v
truncate unknown database files
delete random files under /var/lib
Safer:
- understand owner service
- stop service if required
- backup first
- use native cleanup tools
- document action
CPU, memory, swap and process troubleshooting
Resource saturation explains many incidents: slow responses, timeouts, SSH lag, services killed by OOM, high load, worker backlog, database slowness and container instability. Identify whether the bottleneck is CPU, RAM, swap, IO wait or one process.
| Signal | Command | Interpretation | Next action |
|---|---|---|---|
| High load | uptime | Runnable/waiting tasks high. | Check CPU vs IO wait. |
| High CPU | top, pidstat | Process consuming CPU. | Profile app or reduce load. |
| Low memory | free -h | Available memory low. | Find memory process. |
| Swap activity | vmstat 1 | RAM pressure causing latency. | Reduce workers, add RAM. |
| OOM kill | journalctl -k | Kernel killed process. | Fix memory pressure. |
| High IO wait | iostat, top | CPU waiting on disk. | Disk/IO playbook. |
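A load average only means something relative to the number of CPUs. The sketch below normalizes the 1-minute load by core count. The helper name `load_per_core` is hypothetical, and reading values well above 1.0 per core as saturation is a rule of thumb rather than a hard limit: on Linux the load average also counts tasks blocked in uninterruptible IO, which is why the table says to check CPU against IO wait.

```shell
# Hypothetical helper: 1-minute load average divided by CPU count.
# With no arguments it reads live values from /proc/loadavg and nproc;
# explicit arguments (LOAD CORES) allow offline use.
load_per_core() {
  load=${1:-$(cut -d ' ' -f 1 /proc/loadavg)}
  cores=${2:-$(nproc)}
  awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}

# Usage on a live host:
#   load_per_core
# Usage with explicit values (load 8.0 on 4 cores):
#   load_per_core 8.0 4
```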
Resource commands
# CPU/load
uptime
top
htop
ps aux --sort=-%cpu | head -30
# Memory
free -h
ps aux --sort=-%mem | head -30
vmstat 1
# Swap
swapon --show
# OOM events
journalctl -k --since today | grep -i -E "oom|killed process"
# Per-process stats if sysstat installed
pidstat -u -r 1
Resource decision tree
Server slow
│
├── Load high?
│   └── uptime
│
├── CPU saturated?
│   ├── yes -> top, process, app profiler
│   └── no
│
├── IO wait high?
│   ├── yes -> iostat, iotop, disk playbook
│   └── no
│
├── Memory low?
│   ├── yes -> ps by memory, OOM logs
│   └── no
│
├── Swap active?
│   ├── yes -> reduce workers or add RAM
│   └── no
│
└── App-level bottleneck
    ├── DB query
    ├── lock
    ├── external API
    ├── cache miss
    └── queue backlog
Common resource fixes
| Cause | Short-term action | Long-term fix |
|---|---|---|
| Too many app workers | Reduce workers, restart app. | Right-size worker count. |
| Memory leak | Restart controlled service. | Fix code, add monitoring, MemoryMax. |
| Traffic spike | Rate limit, scale, cache. | Autoscaling, CDN, capacity plan. |
| Slow database query | Kill/limit bad query if safe. | Index, query optimization, DB scaling. |
| Backup job overload | Pause or throttle job. | Schedule and IO limits. |
Boot, kernel, emergency mode and recovery troubleshooting
Boot issues usually come from filesystem errors, broken /etc/fstab, failed mounts, bootloader problems, bad kernel update, disk issues or cloud volume attachment problems. Recovery may require console access, rescue mode, previous kernel or mounting the disk on another instance.
| Symptom | Likely cause | Diagnostic | Recovery direction |
|---|---|---|---|
| Emergency mode | Broken fstab or failed mount. | Console logs, journalctl -xb. | Fix fstab or mount issue. |
| Boot hangs after update | Kernel/driver issue. | GRUB previous kernel. | Boot previous kernel, rollback. |
| No SSH after reboot | Network, firewall, ssh service, boot incomplete. | Cloud console / serial log. | Console recovery. |
| Filesystem check fails | Disk corruption or unclean shutdown. | fsck from recovery. | Repair with backup ready. |
| Wrong boot disk | Bootloader or cloud volume mapping. | UEFI/GRUB/cloud console. | Fix boot order or volume attachment. |
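A broken `/etc/fstab` is often a `UUID=` entry that no longer matches any attached disk. The sketch below compares the UUIDs referenced in an fstab file with a list of known UUIDs. The helper name `fstab_missing_uuids` and the temporary file path in the usage note are hypothetical; quoted `UUID="..."` entries and `LABEL=` or `/dev/...` entries are not handled.

```shell
# Hypothetical helper: list UUIDs referenced in an fstab file that do not
# appear in a file of known UUIDs (one UUID per line).
# Usage: fstab_missing_uuids FSTAB_FILE KNOWN_UUIDS_FILE
fstab_missing_uuids() {
  fstab=$1
  known=$2
  # Keep only entries whose first field starts with UUID= (comment lines are skipped).
  awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$fstab" |
    while read -r uuid; do
      grep -q -x -F -- "$uuid" "$known" || echo "$uuid"
    done
}

# Usage on a real host (blkid usually needs root; the path is an example):
#   sudo blkid -s UUID -o value > /tmp/known-uuids
#   fstab_missing_uuids /etc/fstab /tmp/known-uuids
```

Any UUID printed is an entry that can fail at the next boot unless the disk is attached or the line carries an option such as `nofail`.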
Boot diagnostics
# Current boot logs
journalctl -b
# Previous boot logs
journalctl -b -1
# Boot errors
journalctl -b -p err
# Kernel logs
journalctl -k -b
# Filesystems
lsblk -f
findmnt
cat /etc/fstab
# Failed units
systemctl --failed
# Kernel version
uname -a
Boot failure decision tree
Server did not come back after reboot
│
├── Cloud or physical console available?
│   ├── yes -> read boot output
│   └── no -> use provider recovery tools
│
├── Reaches GRUB?
│   ├── yes -> try previous kernel
│   └── no -> bootloader/disk issue
│
├── Emergency mode?
│   ├── yes -> check fstab and mounts
│   └── no
│
├── Network failed?
│   ├── yes -> check netplan/cloud-init
│   └── no
│
├── SSH failed?
│   ├── yes -> ssh service/firewall/keys
│   └── no
│
└── Application failed after boot
    └── systemd service playbook
fstab recovery checks
# Check fstab content
cat /etc/fstab
# Test mounts without reboot
sudo mount -a
# Show current mounts
findmnt
# Validate UUIDs
blkid
lsblk -f
Cloud recovery pattern
Broken cloud VM
│
├── Stop instance
├── Detach root volume
├── Attach volume to rescue instance
├── Mount filesystem
├── Fix fstab/config/keys
├── Unmount cleanly
├── Reattach as root volume
└── Boot and verify
Changes to /etc/fstab, the bootloader, the kernel or network config should be tested before rebooting a remote server.
Incident playbooks: common Ubuntu production failures
Playbook matrix
| Incident | First command | Likely root causes | Safe correction |
|---|---|---|---|
| Website down | curl -I localhost | Nginx, app service, DB, firewall. | Fix failed layer, rollback deploy if needed. |
| 502 Bad Gateway | systemctl status app | Upstream app down, socket path, port mismatch. | Fix app service or Nginx upstream. |
| SSH unavailable | Cloud console / provider console. | Firewall, SSH config, key, fail2ban, network. | Console recovery, avoid closing existing session. |
| Disk full | df -h | Logs, Docker, DB, backups. | Safe cleanup and retention fix. |
| High CPU | top | Traffic spike, hot process, backup, worker count. | Limit, scale, rollback, profile. |
| OOM kill | journalctl -k \| grep -i oom | Memory leak, too many workers, low RAM. | Reduce memory pressure, add limits. |
| DNS failure | dig domain | Bad record, resolver, TTL, provider issue. | Fix DNS or resolver path. |
| Package update broke service | less /var/log/apt/history.log | Dependency change, config prompt, version mismatch. | Rollback package or restore previous image. |
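For the "Website down" and "502 Bad Gateway" rows, the HTTP status code already narrows the failing layer. The sketch below maps a status code to a coarse category. The helper name `classify_http_code` and the category labels are hypothetical; the mapping is a simplification meant to pick the next playbook, not a diagnosis.

```shell
# Hypothetical helper: map an HTTP status code to a coarse triage category.
classify_http_code() {
  case "$1" in
    000)     echo "no-response" ;;    # curl prints 000 when no HTTP reply was received
    2??|3??) echo "ok" ;;
    502|504) echo "gateway" ;;        # reverse proxy could not get a valid upstream reply
    5??)     echo "server-error" ;;
    4??)     echo "client-error" ;;
    *)       echo "unknown" ;;
  esac
}

# Usage on a real host:
#   code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost)
#   classify_http_code "$code"
```

A `no-response` result sends you to the network and service checks, `gateway` to the 502 playbook, and `server-error` to the application logs.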
502 Nginx playbook
502 Bad Gateway
│
├── Check Nginx config
│   └── sudo nginx -t
│
├── Check Nginx logs
│   └── tail -100 /var/log/nginx/error.log
│
├── Check upstream app service
│   └── systemctl status gunicorn
│
├── Check upstream port/socket
│   └── ss -lntp
│
├── Check app logs
│   └── journalctl -u gunicorn
│
└── Fix app or upstream config
SSH lockout playbook
SSH unavailable
│
├── Is server reachable?
│   └── ping / cloud status checks
│
├── Is port open externally?
│   └── security group / firewall
│
├── Console access possible?
│   └── provider console / serial console
│
├── Check ssh service
│   └── systemctl status ssh
│
├── Check firewall
│   └── ufw status verbose
│
├── Check SSH config syntax
│   └── sshd -t
│
├── Check keys and user
│   └── authorized_keys, permissions
│
└── Restore safe access before hardening again
Disk full playbook
Disk full
│
├── df -h
├── du -xhd1 /
├── du -sh /var/log/*
├── journalctl --disk-usage
├── docker system df
├── apt clean
├── journalctl --vacuum-time=7d
├── prune Docker carefully if applicable
├── resize volume if needed
└── add monitoring and retention
Post-incident actions
After restoration:
[ ] Confirm user-visible service is healthy
[ ] Confirm logs are clean
[ ] Confirm monitoring is green
[ ] Record exact root cause
[ ] Record commands executed
[ ] Record rollback option used or not used
[ ] Add missing alert
[ ] Add missing dashboard panel
[ ] Add missing runbook step
[ ] Schedule permanent fix
Ubuntu troubleshooting cheat sheet and final checklist
Command cheat sheet
# Host
hostnamectl
uptime
who
w
last reboot | head
# Services
systemctl status SERVICE
systemctl --failed
systemctl cat SERVICE
journalctl -u SERVICE --since "30 min ago"
journalctl -u SERVICE -f
# Logs
journalctl -xe
journalctl -p warning --since today
journalctl -k --since today
tail -100 /var/log/syslog
tail -100 /var/log/auth.log
# Network
ip a
ip r
ss -lntp
ss -s
resolvectl status
dig example.com
curl -I http://localhost
ufw status verbose
# Disk
df -h
df -ih
du -xhd1 /
lsblk -f
findmnt
journalctl --disk-usage
# Resources
free -h
top
ps aux --sort=-%cpu | head
ps aux --sort=-%mem | head
vmstat 1
iostat -xz 1
Final troubleshooting checklist
[ ] Symptom is precisely described
[ ] Incident start time is known
[ ] Scope is known
[ ] Host health is checked
[ ] Disk usage is checked
[ ] Memory and CPU are checked
[ ] Failed systemd units are checked
[ ] Relevant service logs are read
[ ] Kernel logs are checked if needed
[ ] Network path is verified
[ ] DNS is verified
[ ] Firewall is verified
[ ] Listening ports are verified
[ ] Recent deploys are checked
[ ] Recent apt updates are checked
[ ] Fix is minimal and reversible
[ ] Service health is verified after fix
[ ] Postmortem notes are written
[ ] Preventive action is created
Final rule
Start with facts: logs, service status, ports, disk, memory, CPU, network and recent changes. Apply one controlled fix, verify the result, document the root cause, and add monitoring or a runbook so the same incident is easier to handle next time.
Minimal incident report template
Incident report:
- title
- start time
- detection method
- impacted service
- user impact
- root cause
- immediate fix
- commands executed
- rollback used
- prevention action
- owner
- deadline
Ubuntu operator quick map
This cheat sheet is a compact operational reference for Ubuntu servers: first checks, package operations, systemd services, journald logs, network, DNS, disk, memory, security, cloud patterns and production readiness.
| Need | First command | What it answers |
|---|---|---|
| Host identity | hostnamectl | Hostname, OS, kernel, machine type. |
| System load | uptime | Load average and uptime. |
| Failed services | systemctl --failed | Broken units. |
| Service status | systemctl status SERVICE | Service state, PID, exit code, recent logs. |
| Service logs | journalctl -u SERVICE | Service timeline and errors. |
| Listening ports | ss -lntp | Open TCP ports and owning processes. |
| Disk usage | df -h | Filesystem capacity. |
| Memory usage | free -h | RAM, available memory and swap. |
| Firewall | sudo ufw status verbose | Host-level network exposure. |
| Recent package changes | less /var/log/apt/history.log | Updates, installs and removals. |
First 90 seconds on a server
echo "== HOST =="
hostnamectl
echo "== UPTIME =="
uptime
echo "== USERS =="
who
echo "== DISK =="
df -h
echo "== MEMORY =="
free -h
echo "== FAILED UNITS =="
systemctl --failed
echo "== PORTS =="
ss -lntp
echo "== WARNINGS =="
journalctl -p warning --since "30 min ago" --no-pager | tail -100
Triage decision tree
Problem reported
│
├── Host unreachable?
│   └── cloud, network, boot, firewall
│
├── Disk full?
│   └── df -h, du, journal size, Docker
│
├── Service failed?
│   └── systemctl status, journalctl
│
├── Port missing?
│   └── service bind, config, crash
│
├── Network path broken?
│   └── IP, route, DNS, UFW, security group
│
└── App problem?
    └── app logs, DB, cache, deploy
System identity, users, processes and host state
Host and OS
# Host and OS summary
hostnamectl
# Ubuntu release metadata
cat /etc/os-release
lsb_release -a
# Kernel
uname -a
# Architecture
dpkg --print-architecture
# Boot and uptime
uptime
last reboot | head
# Current users
who
w
Processes
# Interactive process view
top
htop
# Top CPU processes
ps aux --sort=-%cpu | head -30
# Top memory processes
ps aux --sort=-%mem | head -30
# Process tree
pstree -ap
# Find process
pgrep -af nginx
# Open files by process
sudo lsof -p PID
Users and groups
# Current identity
whoami
id
# User identity
id deploy
groups deploy
# Create user
sudo adduser deploy
# Add sudo rights
sudo usermod -aG sudo deploy
# Show sudo group
getent group sudo
# Show shell users
grep -E "/bin/bash|/bin/sh|/usr/bin/zsh" /etc/passwd
# Lock user password
sudo passwd -l username
Permissions
# File permissions
ls -lah /srv/app
# Path permissions
namei -l /srv/app/current/.env
# Change owner
sudo chown deploy:www-data file
# Recursive owner change
sudo chown -R deploy:www-data /srv/app
# File mode
chmod 644 file
# Directory mode
chmod 755 directory
# Secret file mode
chmod 600 secret.key
Avoid chmod 777. Fix owner, group and minimal access.
Packages, APT, repositories and updates
APT essentials
# Refresh package metadata
sudo apt update
# Show upgradeable packages
apt list --upgradable
# Upgrade packages
sudo apt upgrade
# Full dependency-aware upgrade
sudo apt full-upgrade
# Install package
sudo apt install PACKAGE
# Remove package, keep config
sudo apt remove PACKAGE
# Remove package and config
sudo apt purge PACKAGE
# Remove unused dependencies
sudo apt autoremove
# Clean package cache
sudo apt clean
Package inspection
# Search package
apt search nginx
# Package details
apt show nginx
# Installed and candidate version
apt policy nginx
# Installed packages
dpkg -l | grep nginx
# Files installed by package
dpkg -L nginx
# Package owning a file
dpkg -S /usr/sbin/nginx
# Available versions
apt-cache madison nginx
Repositories and history
# Source files
cat /etc/apt/sources.list
ls -lah /etc/apt/sources.list.d/
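# Search repo lines (classic one-line format)
grep -R "^deb" /etc/apt/sources.list /etc/apt/sources.list.d/
# Search repo URIs (deb822 .sources files, the default layout on Ubuntu 24.04)
grep -R "^URIs" /etc/apt/sources.list.d/ 2>/dev/null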
# APT history
less /var/log/apt/history.log
# APT terminal logs
less /var/log/apt/term.log
# Held packages
apt-mark showhold
# Hold package
sudo apt-mark hold PACKAGE
# Unhold package
sudo apt-mark unhold PACKAGE
Package repair
# Finish interrupted dpkg operation
sudo dpkg --configure -a
# Fix broken dependencies
sudo apt -f install
# Check locks safely
ps aux | grep -E 'apt|dpkg'
# Re-run metadata refresh
sudo apt update
Update safety
# Reboot required?
test -f /var/run/reboot-required && cat /var/run/reboot-required
# Packages requiring reboot if present
cat /var/run/reboot-required.pkgs 2>/dev/null
# Security automation
sudo apt install unattended-upgrades
sudo dpkg-reconfigure unattended-upgrades
systemd services, unit files and runtime control
Service commands
# Service status
systemctl status SERVICE
# Start / stop / restart
sudo systemctl start SERVICE
sudo systemctl stop SERVICE
sudo systemctl restart SERVICE
# Reload config if supported
sudo systemctl reload SERVICE
# Enable / disable at boot
sudo systemctl enable SERVICE
sudo systemctl disable SERVICE
# Is active / enabled?
systemctl is-active SERVICE
systemctl is-enabled SERVICE
# Failed units
systemctl --failed
# Reset failed state
sudo systemctl reset-failed SERVICE
Unit inspection
# Show unit file
systemctl cat SERVICE
# Show runtime properties
systemctl show SERVICE | less
# Show service logs
journalctl -u SERVICE --since "1 hour ago"
# Follow logs
journalctl -u SERVICE -f
# Reload unit files after edit
sudo systemctl daemon-reload
Service failure flow
Service broken
│
├── systemctl status SERVICE
├── journalctl -u SERVICE --since "30 min ago"
├── systemctl cat SERVICE
├── validate config
├── check dependencies
├── check permissions
├── check ports
├── restart only after cause is understood
└── verify logs and health check
Common config validators
# Nginx
sudo nginx -t
# SSH
sudo sshd -t
# Apache
sudo apachectl configtest
# PostgreSQL
sudo -u postgres psql -c "select version();"
# Redis
redis-cli ping
# Local HTTP health
curl -I http://localhost
Robust unit pattern
[Unit]
Description=My app
After=network.target
[Service]
User=myapp
Group=myapp
WorkingDirectory=/srv/myapp
EnvironmentFile=/srv/myapp/.env
ExecStart=/srv/myapp/venv/bin/gunicorn config.wsgi:application
Restart=on-failure
RestartSec=5
LimitNOFILE=65535
[Install]
WantedBy=multi-user.target
Logs, journald, auth logs, kernel logs and audit trail
journald essentials
# Recent diagnostic context
journalctl -xe
# Current boot
journalctl -b
# Previous boot
journalctl -b -1
# Service logs
journalctl -u SERVICE
# Service logs since today
journalctl -u SERVICE --since today
# Service logs last 30 minutes
journalctl -u SERVICE --since "30 min ago"
# Follow service logs
journalctl -u SERVICE -f
# Warnings and errors
journalctl -p warning --since today
# Kernel logs
journalctl -k --since today
Classic log files
# System log
sudo tail -200 /var/log/syslog
# Authentication log
sudo tail -200 /var/log/auth.log
# Nginx
sudo tail -200 /var/log/nginx/error.log
sudo tail -200 /var/log/nginx/access.log
# APT
less /var/log/apt/history.log
less /var/log/apt/term.log
# Kernel ring buffer
dmesg -T | tail -100
Search patterns
# Generic errors
grep -i "error" app.log
grep -i "failed" app.log
grep -i "permission denied" app.log
grep -i "connection refused" app.log
grep -i "no space left" /var/log/syslog
# SSH failures
sudo grep -i "failed password" /var/log/auth.log | tail -100
# Sudo usage
sudo grep -i "sudo" /var/log/auth.log | tail -100
# OOM events
journalctl -k --since today | grep -i -E "oom|killed process"
# Compressed rotated logs
zgrep -i "error" /var/log/syslog.*.gz
Journal size control
# Show journal size
journalctl --disk-usage
# Vacuum by time
sudo journalctl --vacuum-time=14d
# Vacuum by size
sudo journalctl --vacuum-size=1G
Log investigation flow
Find root cause
│
├── define incident time window
├── read service journal
├── read system warnings
├── read kernel logs
├── read app logs
├── check apt/deploy history
└── identify first meaningful error
Network, DNS, firewall and HTTP checks
Network commands
# Interfaces
ip a
# Routes
ip r
# Interface counters
ip -s link
# Listening TCP ports
ss -lntp
# Established connections
ss -antp
# Socket summary
ss -s
# DNS status
resolvectl status
# DNS query
dig example.com
dig A example.com
dig AAAA example.com
# Reachability
ping -c 3 1.1.1.1
tracepath example.com
mtr -rw example.com
HTTP and TLS checks
# Local HTTP
curl -I http://localhost
# Public HTTP
curl -I https://example.com
# Follow redirects
curl -IL https://example.com
# Verbose TLS/HTTP
curl -vI https://example.com
# Check certificate with openssl
openssl s_client -connect example.com:443 -servername example.com
Firewall commands
# Status
sudo ufw status verbose
sudo ufw status numbered
# Baseline
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Allow SSH
sudo ufw allow OpenSSH
# Allow web
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Restrict SSH by source
sudo ufw allow from 203.0.113.10 to any port 22 proto tcp
# Delete numbered rule
sudo ufw delete RULE_NUMBER
# Enable firewall
sudo ufw enable
Network diagnostic flow
Service unreachable
│
├── service listening locally?
│   └── ss -lntp
│
├── local curl works?
│   └── curl -I localhost
│
├── firewall open?
│   └── ufw status
│
├── DNS points correctly?
│   └── dig domain
│
├── remote curl works?
│   └── curl -I domain
│
└── app returns error?
    └── app logs / upstream logs
Disk, filesystem, memory, CPU and IO
Disk and filesystem
# Filesystem usage
df -h
# Inode usage
df -ih
# Block devices
lsblk -f
# Mounts
findmnt
# fstab
cat /etc/fstab
# Top-level usage
sudo du -xhd1 / 2>/dev/null | sort -h
# Common growth paths
sudo du -sh /var/log/*
sudo du -sh /var/lib/docker/* 2>/dev/null
sudo du -sh /var/lib/postgresql/* 2>/dev/null
sudo du -sh /tmp/* 2>/dev/null
Safe cleanup
# APT cache
sudo apt clean
sudo apt autoremove
# Journal
journalctl --disk-usage
sudo journalctl --vacuum-time=14d
sudo journalctl --vacuum-size=1G
# Docker usage
docker system df
# Docker cleanup, use carefully
docker image prune
docker container prune
CPU, memory and IO
# CPU/load
uptime
top
htop
ps aux --sort=-%cpu | head -30
# Memory
free -h
ps aux --sort=-%mem | head -30
# Swap
swapon --show
vmstat 1
# OOM
journalctl -k --since today | grep -i -E "oom|killed process"
# IO, requires sysstat
iostat -xz 1
# Per-process IO
sudo iotop -o
Disk full playbook
Disk full
│
├── df -h
├── du -xhd1 /
├── journalctl --disk-usage
├── du -sh /var/log/*
├── docker system df
├── apt clean
├── journalctl --vacuum-time=14d
├── resize volume if needed
└── add alert and retention policy
Resource interpretation
| Signal | Likely issue |
|---|---|
| High load + high CPU | CPU-bound workload or traffic spike. |
| High load + high IO wait | Disk or database bottleneck. |
| Low available RAM + swap activity | Memory pressure. |
| OOM kill logs | Process killed by kernel due to memory exhaustion. |
| Filesystem 100% | Services may fail unpredictably. |
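The "high load" rows in the table above can be turned into a rough first-pass check. This is a minimal sketch, assuming a Linux host with `/proc/loadavg`, `nproc` and `awk` available. The function name `classify_load` and the threshold of 1.0 load per CPU are illustrative assumptions, not values from this guide.

```shell
#!/usr/bin/env bash
# classify_load LOAD CPUS
# Prints "high" when the load average exceeds the CPU count, else "normal".
# The 1.0-per-CPU threshold is an illustrative assumption, not a universal rule.
classify_load() {
  awk -v load="$1" -v cpus="$2" 'BEGIN {
    if (cpus > 0 && load / cpus > 1.0) print "high"; else print "normal"
  }'
}

# Read live values only when the expected Linux interfaces are present.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
  load1=$(cut -d" " -f1 /proc/loadavg)   # 1-minute load average
  cpus=$(nproc)                          # CPUs available to this process
  echo "load1=${load1} cpus=${cpus} -> $(classify_load "$load1" "$cpus")"
fi
```

When the result is "high", `top` helps show whether the time is spent on CPU and `iostat -xz 1` whether it is spent waiting on IO, which maps to the first two rows of the table.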
Security hardening and quick audit
SSH hardening
# Backup config
sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak.$(date +%Y%m%d-%H%M%S)
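# Recommended directives: set these in /etc/ssh/sshd_config (config lines, not shell commands).
# Drop-in files in /etc/ssh/sshd_config.d/ are read first and can override them.
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
X11Forwarding no
MaxAuthTries 3
# AllowUsers denies every account that is not listed; confirm key login for the listed user first.
AllowUsers deploy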
# Validate and restart
sudo sshd -t
sudo systemctl restart ssh
# Logs
journalctl -u ssh --since today
SSH key permissions
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_ed25519
chmod 644 ~/.ssh/id_ed25519.pub
chmod 600 ~/.ssh/authorized_keys
fail2ban
sudo apt install fail2ban
sudo systemctl enable --now fail2ban
sudo fail2ban-client status
sudo fail2ban-client status sshd
sudo journalctl -u fail2ban --since today
Security snapshot
echo "== UFW =="
sudo ufw status verbose
echo "== OPEN PORTS =="
ss -lntp
echo "== SUDO USERS =="
getent group sudo
echo "== SHELL USERS =="
grep -E "/bin/bash|/bin/sh|/usr/bin/zsh" /etc/passwd
echo "== SSH LOGS =="
journalctl -u ssh --since "24 hours ago" --no-pager | tail -100
echo "== AUTH LOG =="
sudo tail -100 /var/log/auth.log
Security checklist
[ ] Ubuntu LTS
[ ] Packages updated
[ ] Reboot-required checked
[ ] Named admin user
[ ] Root SSH login disabled
[ ] SSH key login validated
[ ] Password SSH disabled
[ ] UFW enabled
[ ] Only required ports open
[ ] Database ports private
[ ] Redis ports private
[ ] fail2ban enabled if SSH public
[ ] Service users are non-root
[ ] Secrets are not world-readable
[ ] Backups exist
[ ] Restore tested
Cloud and AWS Ubuntu quick reference
AWS Ubuntu baseline
Production EC2 Ubuntu baseline:
- official Ubuntu LTS AMI
- Canonical owner verified
- minimal security group
- SSH restricted by source or bastion
- IAM role instead of static keys
- cloud-init tested
- packages updated
- UFW aligned with security group
- monitoring installed
- snapshots scheduled
- restore tested
- tags complete
- instance replaceable
Canonical AMI owner
Canonical AWS owner ID:
099720109477
Use it to filter official Ubuntu AMIs.
AWS CLI AMI search
# Ubuntu 24.04 images are published under hvm-ssd-gp3; the wildcard matches both the old and new prefix
aws ec2 describe-images \
  --owners 099720109477 \
  --filters \
    "Name=name,Values=ubuntu/images/hvm-ssd*/ubuntu-*-24.04-amd64-server-*" \
    "Name=state,Values=available" \
    "Name=architecture,Values=x86_64" \
  --query 'Images | sort_by(@, &CreationDate)[-5:].{Name:Name,ImageId:ImageId,CreationDate:CreationDate}' \
  --output table
cloud-init quick pattern
#cloud-config
package_update: true
package_upgrade: true
timezone: UTC
packages:
  - curl
  - wget
  - git
  - htop
  - ufw
  - fail2ban
  - nginx
runcmd:
  - ufw allow OpenSSH
  - ufw allow 80/tcp
  - ufw allow 443/tcp
  - ufw --force enable
  - systemctl enable --now nginx
  - systemctl enable --now fail2ban
Cloud-init diagnostics
cloud-init status
cloud-init status --wait
sudo tail -200 /var/log/cloud-init.log
sudo tail -200 /var/log/cloud-init-output.log
Official links
| Resource | URL |
|---|---|
| Ubuntu downloads | https://ubuntu.com/download |
| Ubuntu documentation | https://documentation.ubuntu.com/ |
| Ubuntu Server docs | https://documentation.ubuntu.com/server/ |
| Ubuntu release cycle | https://ubuntu.com/about/release-cycle |
| Ubuntu releases | https://releases.ubuntu.com/ |
| Ubuntu on AWS | https://documentation.ubuntu.com/aws/ |
| AWS AMI concepts | https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html |
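The "Canonical owner verified" item in the AWS baseline can be checked for one specific image before launch. This is a minimal sketch, assuming the AWS CLI is installed and configured with permission to call `describe-images`. The function name `owner_is_canonical` and the `AMI_ID` environment variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Canonical's AWS owner ID, as listed in this guide.
CANONICAL_OWNER_ID="099720109477"

# owner_is_canonical OWNER_ID
# Exit status 0 when the given owner ID matches Canonical's, non-zero otherwise.
owner_is_canonical() {
  [ "$1" = "$CANONICAL_OWNER_ID" ]
}

# Hypothetical usage: AMI_ID is supplied by the caller; the lookup is skipped otherwise.
if [ -n "${AMI_ID:-}" ] && command -v aws >/dev/null 2>&1; then
  owner=$(aws ec2 describe-images --image-ids "$AMI_ID" \
    --query 'Images[0].OwnerId' --output text)
  if owner_is_canonical "$owner"; then
    echo "OK: $AMI_ID is owned by Canonical"
  else
    echo "WARNING: $AMI_ID owner is $owner, not Canonical" >&2
  fi
fi
```

Comparing the owner ID rather than the image name matters because anyone can publish an AMI whose name merely looks like an official Ubuntu image.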
Production server checklist and mini demo
Production readiness checklist
[System]
[ ] Ubuntu LTS selected
[ ] Hostname correct
[ ] Timezone configured
[ ] Packages updated
[ ] Reboot-required checked
[ ] Server role documented
[Security]
[ ] SSH keys only
[ ] Root SSH login disabled
[ ] UFW enabled
[ ] Only required ports open
[ ] Users and sudo controlled
[ ] Service users non-root
[ ] Secrets protected
[Operations]
[ ] systemd services enabled
[ ] Logs visible with journalctl
[ ] Monitoring installed
[ ] Alerts configured
[ ] Backups scheduled
[ ] Restore tested
[ ] Patch policy defined
[ ] Runbook written
[Cloud]
[ ] Official LTS image
[ ] Security groups minimal
[ ] IAM role least privilege
[ ] Snapshots configured
[ ] Tags complete
[ ] Replacement strategy documented
Mini demo for portfolio
Demo: production-minded Ubuntu EC2
Architecture:
Internet
   │
   ▼
Security Group
   │
   ├── 22/tcp from admin IP only
   ├── 80/tcp public
   ├── 443/tcp public
   │
   ▼
Ubuntu LTS EC2
   ├── cloud-init installs nginx
   ├── UFW enabled
   ├── fail2ban enabled
   ├── logs checked
   ├── metrics installed
   └── backup snapshot configured
Mini demo validation commands
hostnamectl
cloud-init status
systemctl status nginx
sudo ufw status verbose
ss -lntp
curl -I http://localhost
journalctl -u nginx --since today
df -h
free -h
Cheat-sheet poster placeholder

static/img/ubuntu/ubuntu_cheatsheet_poster.png