Project Oxygen & Ideo-Lab | IDEO LAB Dashboard 2026

NGINX – Optimizations & Tuning

Version: 2026-05-18 15:54
Micro-cache (anonymous GETs)

Goal: absorb bursts without hitting the app. **Only** anonymous GETs are cached (no session cookie, no Authorization header).

Key points
  • Short TTL: 1–5 s (smoothing).
  • Bypass when a sessionid cookie or an Authorization header is present.
  • X-Cache header for visibility (HIT/MISS).
Benefits
  • App RPS drops sharply under peak load.
  • Stable TTFB (served from the local Nginx cache).
  • No refactoring on the Django side.

http{} + vhost
# http{}
proxy_cache_path /var/cache/nginx/micro levels=1:2 keys_zone=micro:300m max_size=3g inactive=45m use_temp_path=off;

map $request_method $is_get { default 0; GET 1; }
map $http_cookie $has_session { default 0; ~*sessionid= 1; }
map $http_authorization $has_auth { default 0; ~.+ 1; }
map $arg_nocache $force_bypass { default 0; 1 1; }

# 1 = BYPASS, 0 = cacheable; only "1000" (GET, no session, no auth, no ?nocache=1) is cached
map "$is_get$has_session$has_auth$force_bypass" $skip_cache { default 1; "1000" 0; }

# vhost: location /
proxy_cache           micro;
proxy_cache_bypass    $skip_cache;
proxy_no_cache        $skip_cache;
proxy_cache_valid     200 301 302 5s;
proxy_cache_valid     404 1s;
add_header X-Cache    $upstream_cache_status always;

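Tip: add proxy_cache_lock on; and proxy_cache_background_update on; to avoid a cache stampede. The latter only takes effect together with proxy_cache_use_stale updating (see the Stale-While-Revalidate section further down).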
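The concatenated-flags map is compact but not obvious at first read. The sketch below mirrors its logic in Python, purely as an illustration: the function name skip_cache and its parameters are hypothetical and not part of Nginx. It shows that only an anonymous GET without ?nocache=1 produces the flag string "1000" and is therefore cacheable.

```python
import re

def skip_cache(method: str, cookie: str = "", authorization: str = "", nocache: str = "") -> int:
    """Return 1 to bypass the cache, 0 to allow caching (same convention as $skip_cache)."""
    is_get = "1" if method == "GET" else "0"
    # Case-insensitive search for a session cookie, like the ~* regex in the map.
    has_session = "1" if re.search(r"sessionid=", cookie, re.IGNORECASE) else "0"
    has_auth = "1" if authorization else "0"        # any non-empty Authorization header
    force_bypass = "1" if nocache == "1" else "0"   # ?nocache=1
    flags = is_get + has_session + has_auth + force_bypass
    return 0 if flags == "1000" else 1

print(skip_cache("GET"))                          # 0
print(skip_cache("GET", cookie="sessionid=abc"))  # 1
print(skip_cache("HEAD"))                         # 1
```

Note the last case: a HEAD request is not a GET, so it is bypassed as well. This matters when testing with curl -I, which sends HEAD.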

# 1) Check for a HIT (real GET: the $is_get map bypasses the cache for HEAD, so curl -I never shows HIT)
curl -s -o /dev/null -D - https://site/ | tr -d '\r' | grep -E 'HTTP/|X-Cache'

# 2) Under load (parallel)
seq 1 40 | xargs -P40 -I{} curl -s -o /dev/null -w "%{http_code}\n" https://site/

# 3) Logs (should show cache=HIT)
tail -n 50 /var/log/nginx/access.log | grep 'cache='
HIT > 70% under burst
p95 rt < 200 ms
No Set-Cookie on GET
No 5xx
  • Pitfall: Nginx does not cache a response that carries Set-Cookie → avoid setting cookies on public routes.
  • TTL too long ⇒ stale data (3–10 s is enough for smoothing, no more).
  • Cache key too coarse if the content varies by language/UA (see the cache key and variants section).
# Playbook micro-cache (prod)
1) Add proxy_cache_path + maps (http{}).
2) Apply to location / (vhost) with X-Cache.
3) Deploy: nginx -t && systemctl reload nginx.
4) Measure: HIT ratio, p95 rt/urt, app load.
5) Tune the TTL (3→5→10 s) according to traffic.
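Step 4 of the playbook (HIT ratio, p95, 5xx share) can be computed straight from the access log. The following is a minimal sketch, assuming a log format that writes status=, rt= and cache= tokens as in the log_format lines of this page. The function names are hypothetical, and the HIT ratio is taken over requests that actually went through the cache (BYPASS and "-" are excluded), which is one possible definition among several.

```python
# Cache statuses that correspond to a real cache lookup (BYPASS and "-" are excluded).
CACHE_LOOKUPS = {"HIT", "MISS", "EXPIRED", "STALE", "UPDATING", "REVALIDATED"}

def parse_line(line):
    """Extract the key=value tokens of interest from one access-log line."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep and key in ("status", "rt", "urt", "cache", "ru"):
            fields[key] = value
    return fields

def percentile_95(values):
    """Nearest-rank 95th percentile; None for an empty list."""
    if not values:
        return None
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]

def summarize(lines):
    """Return HIT ratio, 5xx share and p95 request time for an iterable of log lines."""
    total = errors = lookups = hits = 0
    request_times = []
    for line in lines:
        fields = parse_line(line)
        if "status" not in fields:
            continue  # not a line in the expected format
        total += 1
        if fields["status"].startswith("5"):
            errors += 1
        if fields.get("cache") in CACHE_LOOKUPS:
            lookups += 1
            if fields["cache"] == "HIT":
                hits += 1
        try:
            request_times.append(float(fields.get("rt", "")))
        except ValueError:
            pass  # rt missing or "-"
    return {
        "hit_ratio": hits / lookups if lookups else None,
        "pct_5xx": errors / total if total else None,
        "p95_rt": percentile_95(request_times),
    }
```

To use it, pass an open file object for /var/log/nginx/access.log (or the output of tail) to summarize() and compare the result with the targets above.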
Rate-limit & Conn-limit

Goal: **smooth and skim** traffic. Global = queue (no nodelay). Heavy endpoints = nodelay + small burst.

rate 10r/s
burst 60 (global)
endpoint: burst 5 + nodelay
ru=DELAYED/REJECTED

# http{}
limit_req_zone  $binary_remote_addr zone=req_per_ip:20m rate=10r/s;
limit_conn_zone $binary_remote_addr zone=conn_per_ip:20m;
limit_req_status 429;   # default is 503; the %429 target on this page assumes 429

# vhost (global)
limit_req  zone=req_per_ip  burst=60;     # queue
limit_conn conn_per_ip 30;

# expensive endpoint
location /api/search {
  limit_req  zone=req_per_ip  burst=5 nodelay;
  limit_conn conn_per_ip 10;
  proxy_pass http://app;
}

# useful logs
log_format limits '$remote_addr "$request" s=$status ru=$limit_req_status cache=$upstream_cache_status';
access_log /var/log/nginx/access.log limits;
# Parallel test
seq 1 50 | xargs -P50 -I{} curl -s -o /dev/null -w "%{http_code}\n" 'https://site/api/search?q=x' | sort | uniq -c
# Inspect the limiter
tail -n 60 /var/log/nginx/access.log | grep 'ru='
ru=DELAYED > REJECTED
%429 <= 1–2% of legitimate traffic (Nginx answers 503 unless limit_req_status 429 is set)
  • Without set_real_ip_from behind a CDN → every client appears under the CDN's IPs and shares the same buckets.
  • Wrong vhost: put the directives in **the active vhost**.
  • Global nodelay = bursts reach the app unsmoothed and anything beyond the burst is rejected at once (reserve it for heavy endpoints).
# Playbook limiter
1) Declare the zones (http{}).
2) Apply globally + on heavy endpoints.
3) Enable the 'ru=' log.
4) Parallel benchmark (hey/ab).
5) Tune: burst/rate/nodelay.
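To get a feel for how rate, burst and nodelay interact before benchmarking, the sketch below models limit_req for a single client IP. It is a simplified model of the leaky-bucket accounting (backlog counted in thousandths of a request), not Nginx's actual code. The function name and parameters are hypothetical, and limit_conn is not modelled.

```python
def simulate_limit_req(arrivals_ms, rate_per_s, burst, nodelay=False):
    """Simplified limit_req model for one client key.

    arrivals_ms: request arrival times in milliseconds, in non-decreasing order.
    Returns one (verdict, delay_ms) tuple per request; verdict is
    "PASSED", "DELAYED" or "REJECTED".
    """
    results = []
    excess = 0    # current backlog, in thousandths of a request
    last = None   # arrival time of the last accepted request
    for now in arrivals_ms:
        if last is None:
            candidate = 0  # first request from this client: empty bucket
        else:
            # The bucket drains at rate_per_s requests per second; this request adds 1000.
            candidate = max(excess - rate_per_s * (now - last) + 1000, 0)
        if candidate > burst * 1000:
            results.append(("REJECTED", 0))  # state is left untouched on rejection
            continue
        excess, last = candidate, now
        if nodelay or excess == 0:
            results.append(("PASSED", 0))
        else:
            # Queued: wait until the backlog ahead of this request has drained.
            results.append(("DELAYED", excess / rate_per_s))
    return results

# 50 simultaneous requests on the heavy endpoint (10 r/s, burst=5, nodelay)
heavy = simulate_limit_req([0] * 50, rate_per_s=10, burst=5, nodelay=True)
print(sum(v == "PASSED" for v, _ in heavy), sum(v == "REJECTED" for v, _ in heavy))  # 6 44

# Same load on the global limiter (10 r/s, burst=60, queueing)
queued = simulate_limit_req([0] * 50, rate_per_s=10, burst=60)
print(sum(v == "DELAYED" for v, _ in queued), queued[-1][1])  # 49 4900.0
```

Under this model the queueing limiter rejects nothing here, but the last request waits about 4.9 s. That is the trade-off to weigh when tuning burst.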
Upstream & Keepalive

Stabilize the proxy to Gunicorn: **HTTP/1.1 + keepalive** (connection reuse), realistic timeouts, limited retries, and (optionally) **several instances** (8001/8002).

keepalive 64
connect 3s / read 30s
next_upstream 1 try
2 instances = zero downtime

# http{}
upstream django_upstream {
  server 127.0.0.1:8001 max_fails=2 fail_timeout=3s;
  # server 127.0.0.1:8002;
  keepalive 64;
}

# vhost location /
proxy_pass http://django_upstream;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

proxy_connect_timeout 3s;
proxy_send_timeout    30s;
proxy_read_timeout    30s;

proxy_next_upstream error timeout http_502 http_503 http_504;
proxy_next_upstream_tries 1;
# show ESTABLISHED sockets between nginx and gunicorn
ss -tanp | grep ':8001'
# gunicorn journal
journalctl -u gunicorn -n 200 --no-pager
  • Forgetting proxy_set_header Connection "" → prevents upstream keepalive.
  • Too many proxy_next_upstream_tries multiplies upstream load during an outage and can duplicate non-idempotent requests (Nginx only retries POST/LOCK/PATCH already sent upstream when non_idempotent is added to proxy_next_upstream).
# Playbook upstream
1) Declare the upstream + keepalive.
2) Point location / at the upstream.
3) Add a 2nd instance (if RAM allows).
4) Measure p95 urt & 502/504 errors.
5) Tune timeouts & tries.
TLS: Sessions & OCSP

Reduce handshakes with **TLS sessions** + **OCSP stapling**. Enable H2 (H3 if the build includes QUIC).

server {
  listen 443 ssl http2;   # nginx >= 1.25.1: use "listen 443 ssl;" plus "http2 on;" instead
  server_name EX;
  ssl_certificate     /etc/letsencrypt/live/EX/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/EX/privkey.pem;

  ssl_protocols TLSv1.2 TLSv1.3;
  ssl_session_cache   shared:SSL:50m;   # ~200k sessions
  ssl_session_timeout 10m;
  ssl_session_tickets off;

  ssl_stapling on;
  ssl_stapling_verify on;
  resolver 1.1.1.1 1.0.0.1 valid=300s;
  resolver_timeout 5s;
}
# check HTTP/2
curl -sI --http2 https://site/ | grep HTTP/2
# reused TLS sessions (browser: DevTools / connection)
# OCSP stapling (openssl)
echo | openssl s_client -connect site:443 -status 2>/dev/null | grep -i "OCSP Response Status"
  • Broken Certbot/ACME setup → no OCSP stapling. Check the fullchain.pem chain.
  • TLS terminated at the CDN: these directives only take effect if Nginx terminates TLS.
# Playbook TLS
1) Enable http2.
2) TLS sessions (cache/tickets).
3) OCSP stapling + resolver.
4) Scan (ssllabs) and measure TTFB.
Static “immutable”

**Fingerprinted** assets (content hash in the filename) can be marked immutable → no revalidation → **zero Nginx hits** on repeat visits while the asset is still fresh in the browser cache.

location ~* \.(?:css|js|mjs|png|jpg|jpeg|gif|webp|svg|ico|woff2?|ttf)$ {
  expires 30d;
  add_header Cache-Control "public, max-age=2592000, immutable";
  try_files $uri$webp_suffix $uri =404;   # $webp_suffix must be defined by a map in http{} (not shown here)
}
curl -sI https://site/static/app.3f1a9c2.js | tr -d '\r' | grep -E 'Cache-Control|ETag|Last-Modified'

ETag/Last-Modified become unnecessary for immutable files.

  • Do not use “immutable” on unversioned names (main.js).
# Playbook
1) Fingerprinting at build time (Vite/Webpack).
2) Enable immutable.
3) Purge the CDN on every deployment.
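Step 1 is normally handled by the bundler (Vite/Webpack). To make the idea concrete, here is a minimal sketch of what fingerprinting amounts to: the file name embeds a hash of the content, so any change to the content produces a new URL. The helper names and the 8-character digest length are assumptions made for illustration; they do not reproduce how Vite or Webpack name their output.

```python
import hashlib
import shutil
from pathlib import Path

def fingerprinted_name(path: Path, digest_len: int = 8) -> str:
    """Return e.g. 'app.1a2b3c4d.js' for 'app.js', based on the file's content."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:digest_len]
    return f"{path.stem}.{digest}{path.suffix}"

def write_fingerprinted(path: Path) -> Path:
    """Copy the asset next to itself under its fingerprinted name and return the new path."""
    target = path.with_name(fingerprinted_name(path))
    shutil.copyfile(path, target)
    return target
```

Because the name changes whenever the bytes change, the old URL can stay cached indefinitely: the HTML simply references the new name after each build.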
Compression (Gzip / Brotli)

Reduce the size of text responses (HTML/JS/CSS/JSON/SVG). Serve pre-compressed versions when possible.

gzip on;
gzip_comp_level 5;
gzip_types text/plain text/css application/javascript application/json image/svg+xml;
gzip_min_length 1024;
gzip_proxied any;
gzip_vary on;              # emits "Vary: Accept-Encoding"
gzip_static on;            # serves *.gz files if present
# brotli on; brotli_static on;  # if the module is present
curl -s -o /dev/null -D - --compressed https://site/static/app.js | grep -E 'Content-Encoding|Vary'
  • Brotli is not available by default (module required). Keep gzip for compatibility.
1) Enable gzip + types.
2) Generate *.gz/*.br in CI.
3) Enable *_static.
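Step 2 can be a small CI script. The sketch below only covers the gzip half, using Python's standard library (Brotli would need a third-party package and is left out). The function name, the suffix list and the 1024-byte threshold (chosen to match gzip_min_length) are assumptions for illustration.

```python
import gzip
from pathlib import Path

TEXT_SUFFIXES = {".html", ".css", ".js", ".mjs", ".json", ".svg"}

def precompress(root: Path, min_size: int = 1024) -> list[Path]:
    """Write '<name>.gz' next to every text asset under root; return the files written."""
    written = []
    # Materialize the listing first so files created below are not re-visited.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        data = path.read_bytes()
        if len(data) < min_size:
            continue  # tiny files gain little from compression
        target = path.with_name(path.name + ".gz")
        # mtime=0 keeps the output byte-identical across builds.
        target.write_bytes(gzip.compress(data, compresslevel=9, mtime=0))
        written.append(target)
    return written
```

With gzip_static on, Nginx then serves app.js.gz directly to clients that accept gzip instead of compressing app.js on every request.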
Proxy buffering & buffers

Avoid read/write stalls under load; absorb large upstream responses.

proxy_buffering on;
proxy_buffers 32 16k;
proxy_busy_buffers_size 64k;
proxy_temp_file_write_size 64k;
# proxy_buffer_size 16k;  # optional
# watch 'waiting' vs 'writing' in stub_status
# (enable /nginx_status if needed)
  • Buffers too small → responses spill to temp files and writes stall.
1) Enable buffering.
2) Size the buffers according to payload.
3) Monitor CPU/IO & p95.
Buffered access log
log_format micro '$remote_addr - $request status=$status rt=$request_time urt=$upstream_response_time cache=$upstream_cache_status ru=$limit_req_status';
access_log /var/log/nginx/access.log micro buffer=64k flush=5s;

Buffering = less disk I/O; keep the flush interval short so that few events are lost if Nginx crashes.

HTTP/2 & HTTP/3
listen 443 ssl http2;
# H3 if the build includes QUIC:
# listen 443 quic reuseport;
# add_header Alt-Svc 'h3=":443"; ma=86400' always;

H2 = multiplexing over a single connection → fewer connections per second. H3 = gains on unstable networks (mobile).

CDN / WAF
# http{}: real client IPs behind Cloudflare (excerpt)
set_real_ip_from 173.245.48.0/20;
set_real_ip_from 103.21.244.0/22;
real_ip_header   X-Forwarded-For;   # Cloudflare also sends CF-Connecting-IP

The CDN absorbs worldwide peaks and applies page caching. Enable “respect existing headers” for caching.

Kernel & NOFILE
# /etc/sysctl.d/99-tuning.conf
net.core.somaxconn = 8192
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_local_port_range = 10240 65535
sysctl --system

# /etc/systemd/system/nginx.service.d/override.conf
[Service]
LimitNOFILE=65535
systemctl daemon-reload && systemctl restart nginx
Observability
log_format micro '$remote_addr - $request status=$status '
                 'rt=$request_time urt=$upstream_response_time '
                 'cache=$upstream_cache_status ru=$limit_req_status';
access_log /var/log/nginx/access.log micro;
# quick status:
# location /nginx_status { stub_status on; allow 127.0.0.1; deny all; }
p95 rt/urt
HIT ratio
ru=DELAYED/REJECTED
%5xx < 0.5%
Cache key & Variants (language, UA, cookie)

Tailor the **key** to the variants you actually need (language, device, A/B-test cookie). Aim for a **compact** key to keep the cache from exploding.

# http{}
map $http_accept_language $v_lang { default ""; ~^fr  "fr"; ~^en  "en"; }
map $http_user_agent      $v_ua   { default ""; ~*mobile "m"; }
map $cookie_ab            $v_ab   { default "";  A "A"; B "B"; }

# explicit key (avoids an unstable Host); the "|" separators keep the URI and the variants from running together
proxy_cache_key "$scheme$proxy_host$request_uri|$v_lang|$v_ua|$v_ab";

Expose the key in a header for debugging:

add_header X-Cache-Key "$scheme$proxy_host$request_uri|$v_lang|$v_ua|$v_ab" always;
  • Varying without need → explosion in object count & lower HIT ratio.
1) List the indispensable variants.
2) Map lang/UA/cookie.
3) Expose X-Cache-Key in dev.
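To see what the three maps do to the key, the sketch below reproduces them in Python. It is an illustration only: the function names are hypothetical, and the "|" separator between the URI and each variant is an assumption of this sketch, made so that "/page" with variant "fr" cannot collide with a URI "/pagefr".

```python
import re

def variant_lang(accept_language: str) -> str:
    """Mirror of the $v_lang map: case-sensitive prefix match, first pattern wins."""
    if re.match(r"fr", accept_language):
        return "fr"
    if re.match(r"en", accept_language):
        return "en"
    return ""

def variant_ua(user_agent: str) -> str:
    """Mirror of the $v_ua map: case-insensitive search for 'mobile'."""
    return "m" if re.search(r"mobile", user_agent, re.IGNORECASE) else ""

def variant_ab(ab_cookie: str) -> str:
    """Mirror of the $v_ab map (plain map strings are compared ignoring case)."""
    value = ab_cookie.upper()
    return value if value in ("A", "B") else ""

def cache_key(scheme, proxy_host, request_uri, accept_language="", user_agent="", ab_cookie=""):
    """Build the cache key: base URL, then one '|'-separated slot per variant."""
    parts = (variant_lang(accept_language), variant_ua(user_agent), variant_ab(ab_cookie))
    return f"{scheme}{proxy_host}{request_uri}|" + "|".join(parts)

print(cache_key("https", "django_upstream", "/", "fr-FR,fr;q=0.9", "Mozilla/5.0 (iPhone) Mobile", "B"))
# https + django_upstream + / + |fr|m|B
```

With these maps, each URL has at most 3 languages × 2 devices × 3 A/B values = 18 cached variants. Every extra dimension multiplies that number, which is why the key should stay compact.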
Stale-While-Revalidate & lock
# http{}
proxy_cache_lock on;
proxy_cache_lock_timeout 5s;
proxy_cache_background_update on;
proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;

While an entry is being refreshed, a single request goes to the upstream as the “writer”; the other clients receive the previous (stale) version, then the REFRESHED one once the update completes.

Blue-Green, Canary & Reload Nginx
upstream django_upstream {
  server 127.0.0.1:8001;
  server 127.0.0.1:8002 backup;
}
# Simple cookie-based canary:
map $cookie_canary $upstream_pool { default django_upstream; canary django_canary; }
upstream django_canary { server 127.0.0.1:8010; keepalive 64; }

location / { proxy_pass http://$upstream_pool; }
# zero-downtime reload
nginx -t && systemctl reload nginx
Security: Headers, CORS, Cookies
# Hardening headers
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Permissions-Policy "geolocation=(), microphone=()" always;
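# CORS ("*" cannot be combined with credentials; Vary: Origin only matters when the origin is echoed per request)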
add_header Access-Control-Allow-Origin "*" always;
add_header Vary "Origin" always;
# Proxy cookies (HTTPS-only); "~" is an empty regex that matches every cookie name, whereas "*" would be read as a literal name
proxy_cookie_flags ~ secure samesite=lax;
Uploads: client_body, temp & limits
client_max_body_size 50m;
client_body_buffer_size 512k;
client_body_timeout 60s;
client_body_temp_path /var/lib/nginx/body 1 2;

Adapt the timeouts/buffers to real uploads (PDF, images, etc.). For APIs, prefer chunked uploads or pre-signed URLs (S3) for large files.