🟦 SQL Server – Architecture, T-SQL, Administration & Cloud
Guide IDEO-Lab détaillé pour SQL Server 2019, SQL Server 2022 et les versions plus récentes : moteur relationnel, T-SQL, indexation, tuning, sécurité, haute disponibilité, sauvegardes. Part 1
Intro & Positionnement
SQL Server en entreprise : Windows, Linux, Azure, PME, grands comptes.
EnterpriseT-SQLAzureVersions 2019 / 2022 / 2025
Ce qui change : compatibilité, IQP, Query Store, Ledger, IA, vector search.
201920222025Architecture moteur
Database Engine, services, fichiers MDF/NDF/LDF, pages, extents, buffer pool.
EngineMDF/LDFBuffer PoolStockage & fichiers
Filegroups, pages 8 KB, extents, allocation maps, tempdb, autogrowth.
FilegroupsPagesTempDBT-SQL & programmation
Procédures, fonctions, triggers, transactions, TRY/CATCH, MERGE, JSON.
T-SQLProceduresJSONIndexation & design physique
Clustered, nonclustered, columnstore, filtered, included columns, fragmentation.
IndexesColumnstoreStatsOptimizer & Query Store
Plans, cardinalité, parameter sniffing, Query Store, hints, plan forcing.
OptimizerQuery StorePlansTransactions & locks
Isolation levels, locks, latches, deadlocks, RCSI, SNAPSHOT, optimized locking.
LocksMVCCDeadlocksHA / DR
Always On Availability Groups, FCI, log shipping, backup/restore, RPO/RTO.
Always OnDRRPO/RTOSQL Server: the pragmatic enterprise database
SQL Server is not only a relational engine. It is a full enterprise data platform: OLTP engine, T-SQL language, security layer, backup and restore engine, high availability features, administration tools, performance diagnostics, BI integration, cloud integration, and a very strong Microsoft ecosystem.
Its natural territory is the Microsoft stack: Windows Server, Active Directory, .NET, PowerShell, Visual Studio, Azure, Power BI, Microsoft Fabric, and enterprise identity management. However, since SQL Server 2017, the engine also runs on Linux, which changed its positioning for hybrid and containerized environments.
Where SQL Server is especially strong
- Business applications: ERP, CRM, accounting, logistics, health care, insurance, retail systems.
- .NET ecosystems: mature drivers, Entity Framework, Dapper, Windows authentication, Azure DevOps workflows.
- Operational reporting: views, stored procedures, SQL Agent jobs, SSRS, Power BI, Fabric integration.
- DBA productivity: SSMS, Query Store, graphical plans, DMVs, Extended Events, SQL Agent.
- Hybrid cloud: on-premises SQL Server, Azure SQL Database, Azure SQL Managed Instance, Arc-enabled SQL Server.
Executive summary
SQL Server = relational database + DBA tooling + Microsoft ecosystem
Core strengths:
- Very productive administration experience
- Strong T-SQL procedural language
- Excellent tooling: SSMS, Query Store, SQL Agent, XEvents
- Strong integration with .NET, Active Directory, Azure, Power BI
- Solid OLTP, reporting, analytics, HA and DR features
Typical risks:
- Licensing can become expensive
- Bad indexing hurts very fast
- tempdb is often neglected
- parameter sniffing can create unstable plans
- mixed OLTP + reporting workloads must be isolated
- backups are useless if restore tests are never doneSimple mental model
Application
|
v
TDS protocol / driver
|
v
SQL Server engine
|
+-- Parser / optimizer / execution engine
+-- Buffer pool / memory manager
+-- Transaction log / recovery engine
+-- Lock manager / row versioning
+-- Storage engine / data files
+-- Security / permissions / audit
+-- SQL Agent / jobs / automationPositioning map
SQL Server timeline: why version matters
SQL Server is a mature database, but recent versions changed the practical DBA experience. SQL Server 2019 made Intelligent Query Processing a serious operational topic. SQL Server 2022 strengthened Query Store, cloud integration, Ledger, and availability scenarios. SQL Server 2025 moves the platform toward AI, vector search, modern analytics, and deeper cloud/hybrid integration.
| Version | Strategic value | Important features | Operational impact |
|---|---|---|---|
| SQL Server 2016 | Modern baseline for many enterprises | Query Store, temporal tables, Always Encrypted, JSON support | Better performance troubleshooting and security capabilities |
| SQL Server 2017 | Cross-platform shift | Linux support, adaptive query processing | SQL Server becomes possible outside pure Windows environments |
| SQL Server 2019 | Strong enterprise release | Batch mode on rowstore, scalar UDF inlining, memory grant feedback, Big Data Clusters | Major tuning improvements through Intelligent Query Processing |
| SQL Server 2022 | Hybrid cloud and Query Store maturity | Parameter Sensitive Plan optimization, Query Store hints, Ledger, Azure integration | More stable plan management and easier cloud-connected operations |
| SQL Server 2025 | AI-oriented database platform | Vector capabilities, AI integration, modern developer and analytics features | SQL Server becomes more relevant for AI-assisted search and data applications |
Version decision diagram
Is the database new?
|
+-- yes --> Start with SQL Server 2022 or newer unless a vendor blocks it
|
+-- no --> Is the current version supported and stable?
|
+-- yes --> Upgrade only with a business reason and test plan
|
+-- no --> Build an upgrade path:
inventory -> compatibility check -> restore test
-> workload replay -> performance baseline
-> migration window -> rollback planUpgrade mindset
- Never upgrade blind: capture workload, top queries, wait stats, indexes, jobs, and maintenance plans first.
- Compatibility level matters: engine version and database compatibility level are related but different.
- Query Store is your friend: it helps detect regressions and force known good plans when needed.
- Test restores: an upgrade plan without restore validation is not serious.
Pre-upgrade checklist
1. Full backup + restore test
2. DBCC CHECKDB clean result
3. SQL Agent jobs inventory
4. Linked servers inventory
5. CLR / Service Broker / replication check
6. Compatibility level review
7. Top SQL baseline from Query Store
8. Wait stats baseline
9. HA / DR failover test
10. Rollback plan validatedMain enterprise use cases
| Context | Why SQL Server fits | Typical design | Risks to monitor |
|---|---|---|---|
| ERP / CRM | Strong transactions, stored procedures, security, reporting | OLTP database + reporting replica or warehouse | Blocking, nightly batches, index bloat, vendor SQL |
| .NET web applications | Excellent drivers, AD integration, ORM support | App servers + SQL Server primary + HA replica | N+1 queries, parameter sniffing, connection storms |
| Reporting / BI | Power BI, SSRS, SSAS, Fabric integration | OLTP isolated from reporting through ETL or replicas | Heavy SELECT queries harming OLTP latency |
| Data warehouse | Columnstore, partitioning, compression, batch mode | Star schema, fact tables, partitioned loads | Bad model, poor load windows, tempdb pressure |
| Hybrid cloud | Azure SQL, Managed Instance, Arc, backups to Azure | On-prem primary + cloud DR or managed target | Latency, egress costs, identity, compliance |
| Audit and compliance systems | Security, auditing, encryption, predictable operations | Dedicated database with strict RBAC and audit logs | Over-permissioned users, weak backup encryption |
Workload classification
OLTP workload:
- many small transactions
- low latency required
- strict locking discipline
- indexes must serve selective lookups
- transaction log performance is critical
Reporting workload:
- fewer but heavier queries
- scans, aggregations, joins
- columnstore can help
- replicas or ETL pipelines should isolate load
Mixed workload:
- most dangerous pattern
- reporting can destroy OLTP latency
- solution: replicas, warehouse, caching, query governanceSimple workload diagram
+----------------+
Users -----> | Application |
+--------+-------+
|
v
+---------------+
| SQL Server | OLTP path
| Primary |
+-------+-------+
|
+--------------+----------------+
| |
v v
+--------------+ +--------------+
| HA Replica | | ETL / CDC |
| Read routing | | Warehouse |
+--------------+ +--------------+
| |
v v
Reporting users Power BI / FabricSQL Server architecture map
Understanding SQL Server starts with one simple idea: the engine tries to avoid slow disk access by using memory, cached execution plans, efficient locking, and sequential transaction log writes. Performance problems usually appear when one of these layers is saturated or badly configured.
.NET, Java, Python, ODBC, JDBC
connections, sessions, batches
parser, optimizer, executor
compiled execution plans
data pages in memory
locks, latches, isolation
durability and recovery
.mdf / .ndf
.ldf
sorts, spills, version store
full, diff, log
Core components
| Component | Role | Common issue |
|---|---|---|
| Buffer pool | Caches data pages in memory | Memory pressure, low page life expectancy |
| Plan cache | Stores compiled execution plans | Plan instability, parameter sniffing, ad hoc bloat |
| Transaction log | Guarantees durability and recovery | Slow log writes, VLF fragmentation, missing log backups |
| tempdb | Temporary objects, spills, version store | Contention, disk pressure, bad file layout |
| Lock manager | Coordinates concurrency | Blocking, deadlocks, lock escalation |
Read/write flow
SELECT flow:
1. parse T-SQL
2. compile or reuse plan
3. read pages from buffer pool
4. if page missing, read from data file
5. execute operators
6. return result set
UPDATE flow:
1. find target rows
2. acquire locks
3. modify pages in memory
4. write log records to transaction log
5. COMMIT waits for log hardening
6. dirty pages are written later by checkpoint/lazy writerSQL Server editions and deployment choices
Edition choice is not just a commercial decision. It affects scalability, HA features, online operations, compression, partitioning scenarios, resource limits, and operational comfort. A good architecture starts by matching edition, workload, RPO/RTO, and budget.
| Edition / service | Best fit | Strengths | Limits / warnings |
|---|---|---|---|
| Express | Small apps, dev, embedded workloads | Free, easy to deploy | Strict size, CPU, memory limits |
| Developer | Development and testing | Enterprise-like features for non-production use | Not licensed for production |
| Standard | PME/ETI production systems | Good feature set, lower cost than Enterprise | Feature and scale limitations |
| Enterprise | Mission critical workloads | Advanced HA, performance, online operations, scale | High cost, needs license governance |
| Azure SQL Database | Managed cloud database | No OS management, high automation | Feature differences from boxed SQL Server |
| Azure SQL Managed Instance | Lift-and-shift to Azure | Closer to SQL Server instance model | Cloud networking and cost model must be understood |
Decision tree
Need a tiny local database?
|
+-- yes --> Express or LocalDB
Need development only?
|
+-- yes --> Developer Edition
Production on-premises?
|
+-- small/medium workload --> Standard Edition
+-- critical workload, advanced HA, high scale --> Enterprise Edition
Cloud-first project?
|
+-- app database with managed operations --> Azure SQL Database
+-- migration from existing SQL Server instance --> Azure SQL Managed Instance
+-- hybrid governance / inventory --> Arc-enabled SQL ServerSQL Server vs Oracle vs PostgreSQL vs MySQL
No database is universally better. The right question is: which engine minimizes risk and cost for this workload, this team, this ecosystem, and this operational model?
| Criterion | SQL Server | Oracle | PostgreSQL | MySQL / MariaDB |
|---|---|---|---|---|
| Natural ecosystem | Microsoft, .NET, Azure, Power BI | Large enterprise, ERP, OCI | Open-source, cloud-neutral, extensions | Web apps, SaaS, LAMP, high read scale |
| Procedural language | T-SQL | PL/SQL | PL/pgSQL plus extensions | Stored routines, less central in practice |
| Tooling | SSMS, Agent, Query Store, XEvents | OEM, AWR, ASH, RMAN | Many open-source tools, psql, pg_stat | Many tools, simpler operational model |
| HA model | Always On AG, FCI, log shipping | RAC, Data Guard | Streaming replication, Patroni, tools | Replication, Galera, external tooling |
| Licensing | Commercial, edition based | Commercial, often expensive options | Open-source license | Open-source / commercial variants |
| Best fit | Microsoft-centric enterprise apps | Very large mission-critical enterprise systems | Modern open-source applications and extensibility | Web-scale and simple relational workloads |
When SQL Server is a very good choice
- The company already uses Microsoft identity, Windows Server, Azure, Power BI, and .NET.
- The team wants strong GUI tooling and fast operational onboarding.
- The workload is classic OLTP with reporting and business procedures.
- The DBA team wants Query Store, graphical plans, Agent jobs, and a very readable diagnostic workflow.
When SQL Server may not be ideal
- The organization wants to avoid commercial licensing completely.
- The system requires deep open-source extensibility or PostgreSQL-specific extensions.
- The architecture is designed around cloud-native managed open-source services.
- The workload is mostly document, graph, search, or event streaming rather than relational OLTP.
Numbers and limits every DBA should know
Exact limits depend on edition, version, hardware, workload and configuration. But these practical numbers help build intuition and avoid beginner mistakes.
Operational indicators
| Signal | What it means | First check | Typical action |
|---|---|---|---|
| High PAGEIOLATCH waits | Waiting for data pages from disk | Storage latency, missing indexes, scans | Index tuning, query rewrite, storage review |
| High WRITELOG waits | Waiting for transaction log writes | Log disk latency, commit rate, VLFs | Move log, batch commits, size log correctly |
| High CXPACKET / CXCONSUMER | Parallel query coordination | MAXDOP, cost threshold, bad plans | Tune query, indexes, parallelism settings |
| High LCK_M waits | Blocking locks | Blocking chain, open transactions | Fix transaction scope, isolation, indexes |
| High SOS_SCHEDULER_YIELD | CPU pressure or CPU-heavy queries | Top CPU queries, host CPU | Optimize SQL, reduce scans, add CPU only after proof |
| tempdb growth | Sorts, spills, temp tables, version store | Query spills, snapshot isolation, temp objects | Tune memory grants, indexes, tempdb layout |
Visual risk bars
Educational examples: what SQL Server teaches very well
Example 1: find the current database version
SELECT
@@VERSION AS sql_server_version;
SELECT
SERVERPROPERTY('ProductVersion') AS product_version,
SERVERPROPERTY('ProductLevel') AS product_level,
SERVERPROPERTY('Edition') AS edition,
SERVERPROPERTY('EngineEdition') AS engine_edition;Example 2: list database files
SELECT
DB_NAME(database_id) AS database_name,
type_desc,
name AS logical_name,
physical_name,
size * 8 / 1024 AS size_mb,
growth,
is_percent_growth
FROM sys.master_files
ORDER BY database_name, type_desc;Example 3: top active requests
SELECT
r.session_id,
r.status,
r.command,
r.cpu_time,
r.total_elapsed_time,
r.wait_type,
r.blocking_session_id,
DB_NAME(r.database_id) AS database_name,
t.text AS sql_text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
ORDER BY r.total_elapsed_time DESC;Example 4: index usage snapshot
SELECT
OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
OBJECT_NAME(i.object_id) AS table_name,
i.name AS index_name,
s.user_seeks,
s.user_scans,
s.user_lookups,
s.user_updates
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s
ON s.object_id = i.object_id
AND s.index_id = i.index_id
AND s.database_id = DB_ID()
WHERE i.object_id > 100
ORDER BY s.user_updates DESC, s.user_seeks ASC;Example 5: database backup history
SELECT TOP (50)
bs.database_name,
bs.type AS backup_type,
bs.backup_start_date,
bs.backup_finish_date,
DATEDIFF(second, bs.backup_start_date, bs.backup_finish_date) AS duration_sec,
bs.backup_size / 1024 / 1024 AS backup_size_mb
FROM msdb.dbo.backupset bs
ORDER BY bs.backup_finish_date DESC;Example 6: quick blocking check
SELECT
session_id,
blocking_session_id,
wait_type,
wait_time,
wait_resource,
status,
command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0
ORDER BY wait_time DESC;Common SQL Server myths vs reality
| Myth | Reality | Educational lesson |
|---|---|---|
| SQL Server is only for Windows | Modern SQL Server also runs on Linux and containers. | Know both the historical ecosystem and the modern deployment options. |
| SSMS is enough | SSMS is excellent, but serious DBA work needs DMVs, wait stats, Query Store, XEvents, and restore drills. | Tools help, but engine knowledge remains mandatory. |
| Indexes always improve performance | Indexes speed reads but slow writes and consume storage. Bad indexes hurt. | Every index must have a reason, usage evidence, and maintenance cost. |
| More CPU fixes slow SQL | Many slow workloads are caused by bad plans, missing indexes, blocking, memory pressure, or IO latency. | Measure before buying hardware. |
| Full backups are enough | For point-in-time recovery, transaction log backups are essential in full recovery model. | Backup strategy must match RPO and RTO. |
| NOLOCK is a magic performance fix | NOLOCK can return dirty, inconsistent, duplicated or missing rows. | Concurrency must be designed, not hacked. |
Reality from production
Most real incidents are boring:
- disk full
- log not backed up
- bad deployment changed a plan
- missing index after new feature
- reporting query hits primary database
- long transaction blocks users
- SQL Agent job failed silently
- backup exists but restore was never testedWhat a strong DBA does
A strong DBA:
1. baselines normal behavior
2. automates health checks
3. tests restores
4. reviews top queries
5. controls index growth
6. validates HA failover
7. documents runbooks
8. watches capacity trends
9. explains incidents with facts
10. reduces operational surprisesLearning path: how to become operational with SQL Server
SQL Server can be learned in layers. The mistake is trying to learn every feature at once. The correct path is engine basics, administration, backup/restore, performance diagnostics, high availability, security, and finally cloud/hybrid integration.
| Level | Topics | Practical exercises | Expected skill |
|---|---|---|---|
| 1. Foundations | Databases, schemas, tables, indexes, T-SQL basics | Create tables, constraints, joins, stored procedures | Understand relational model and T-SQL workflow |
| 2. Administration | Files, filegroups, log, recovery models, SQL Agent | Create jobs, monitor files, configure maintenance | Run a small production database safely |
| 3. Backup and restore | Full, differential, log backups, point-in-time restore | Restore to another server and validate data | Prove recovery, not just backups |
| 4. Performance | Execution plans, indexes, waits, Query Store, tempdb | Tune a slow query and compare before/after | Diagnose with evidence |
| 5. HA / DR | Always On, FCI, log shipping, RPO/RTO, failover tests | Build a small AG lab and test failover | Design for continuity |
| 6. Security | Logins, users, roles, encryption, audit, least privilege | Implement RBAC and audit access | Protect data and prove compliance |
| 7. Cloud / hybrid | Azure SQL, Managed Instance, Arc, backups to Azure | Migrate a test DB and compare features | Operate SQL Server across on-prem and cloud |
DBA daily checklist
Daily:
- Check failed SQL Agent jobs
- Check backup completion and restore validation status
- Review severe errors in SQL Server logs
- Check disk and log growth
- Look for blocking, deadlocks, long transactions
- Review top Query Store regressions
- Watch tempdb and transaction log behavior
Weekly:
- Review index usage and fragmentation
- Review capacity trends
- Validate HA synchronization
- Test a restore sample
- Review security changes
- Update runbooks after incidents
Monthly:
- Failover exercise if the business requires it
- Patch review
- License and edition review
- Disaster recovery drill
- Performance baseline reportThree generations, three different reasons to care
SQL Server 2019, 2022 and 2025 should not be seen as a simple chronological list. Each generation answers a different operational need. SQL Server 2019 is a strong enterprise baseline. SQL Server 2022 is the Query Store and hybrid-cloud maturity release. SQL Server 2025 moves the product toward AI-assisted data applications, vector workloads, modern development, and deeper cloud integration.
Fast reading
2019:
- IQP becomes operationally important
- ADR changes rollback and recovery behavior
- Linux is mature enough for real use
- Good target from older versions
2022:
- Query Store becomes more strategic
- PSP reduces some parameter sniffing cases
- Ledger improves tamper-evidence scenarios
- Stronger Azure hybrid story
2025:
- Vector data type and vector search
- AI-oriented database scenarios
- Modern developer and hybrid features
- New tests needed before production adoptionVersion timeline
SQL Server 2019
Major operational value around Intelligent Query Processing, Accelerated Database Recovery, better Linux maturity, and a very solid enterprise engine for classic OLTP and reporting.
IQP ADR Linux Batch mode on rowstoreSQL Server 2022
Strong release for plan stability and hybrid operations. Parameter Sensitive Plan optimization addresses cases where one cached plan is not good for all parameter values.
Query Store hints PSP Ledger Azure integrationSQL Server 2025
New strategic direction: vector data type, vector search, AI-oriented capabilities, and more modern application patterns directly closer to the relational engine.
Vector data type AI integration Semantic search Hybrid cloudVisual positioning
SQL Server 2019: the strong modern baseline
SQL Server 2019 remains very relevant in production environments. It is stable, well-known by DBA teams, widely supported by vendors, and modern enough to include several performance and recovery improvements that changed daily operations.
Main reasons to care
- Intelligent Query Processing: multiple optimizer improvements can improve existing workloads with minimal code changes.
- Batch mode on rowstore: analytical-style processing can benefit even without columnstore indexes in some scenarios.
- Table variable deferred compilation: better cardinality estimates for table variables compared with older behavior.
- Scalar UDF inlining: some scalar functions can be transformed to reduce row-by-row execution cost.
- Memory grant feedback: SQL Server can adjust memory grants after executions to reduce spills or waste.
- Accelerated Database Recovery: crash recovery and rollback behavior becomes more predictable, especially with long transactions.
- Linux maturity: SQL Server is no longer only a Windows Server story.
Good production profile
SQL Server 2019 is a strong choice when:
- the application vendor supports it
- the team wants a stable enterprise release
- the company migrates from 2012, 2014 or 2016
- the environment is mostly Windows and .NET
- the workload is classic OLTP plus reporting
- the team needs modern Query Store and IQP features
- production risk must remain moderateWatch points
Do not assume everything is automatic:
- IQP depends on compatibility level and feature scope
- Query regressions can still happen
- tempdb design remains critical
- ADR changes recovery internals, so test it
- Linux deployment requires OS-specific skills
- vendor applications may restrict compatibility levelSQL Server 2019 feature impact table
| Feature | Problem addressed | DBA value | Testing focus |
|---|---|---|---|
| Batch mode on rowstore | Heavy analytic queries on regular rowstore tables | Can improve CPU efficiency for eligible queries | Compare plans and CPU before/after |
| Scalar UDF inlining | Slow scalar function calls executed row by row | Can remove a major hidden bottleneck | Check behavior, plans, and edge cases |
| Table variable deferred compilation | Poor estimates for table variables | Better join choices and grants | Validate procedures using table variables |
| Memory grant feedback | Spills to tempdb or excessive memory grants | More adaptive execution over time | Observe repeated executions |
| Accelerated Database Recovery | Long rollback and slow recovery after long transactions | Improved availability and more predictable recovery | Test rollback, log growth, version store behavior |
2019 simplified engine diagram
SQL Server 2022: Query Store and hybrid maturity
SQL Server 2022 is especially important because it strengthens the DBA control loop: observe performance, identify regressions, influence behavior, and stabilize workloads without necessarily changing application code.
Main themes
- Query Store is more central: new databases have Query Store enabled by default in many scenarios.
- Query Store hints: influence query behavior without editing application SQL text.
- Parameter Sensitive Plan optimization: multiple plan variants can be used for different parameter cardinality patterns.
- Ledger: tamper-evident history for compliance and audit scenarios.
- Azure integration: stronger hybrid cloud story with managed services and cloud-connected features.
- ADR improvements: recovery and transaction behavior continue to mature.
What PSP changes
Old parameter sniffing problem:
- first execution uses parameter A
- optimizer builds a plan for A
- cached plan is reused for parameter B
- B has very different selectivity
- performance collapses
SQL Server 2022 PSP idea:
- detect sensitive parameter patterns
- keep multiple useful plan variants
- choose a better variant for future executions
- reduce some classic parameter sniffing painQuery Store mental model
Query Store captures:
- query text
- plan history
- runtime statistics
- wait statistics
- regressions over time
DBA uses it to:
- compare before/after upgrade
- detect plan regression
- force or guide a better plan
- produce evidence for tuning decisionsSQL Server 2022 feature table
| Feature | Practical value | Who cares | Operational warning |
|---|---|---|---|
| Parameter Sensitive Plan optimization | Reduces some unstable plan cases caused by skewed parameter values | DBA, app teams, legacy systems | Not a universal fix for all bad plans |
| Query Store hints | Allows plan influence without changing code | DBA and production support | Must be documented and reviewed |
| Improved memory grant feedback | Better stability for queries with variable memory needs | BI, reporting, mixed workloads | Requires observation over repeated runs |
| Ledger | Tamper-evident data history for audit cases | Compliance, finance, legal | Not a replacement for security design |
| Azure integration | Easier hybrid scenarios and cloud-connected management | Platform teams, cloud architects | Network, latency and cost must be designed |
PSP visual example
SQL Server 2025: AI-ready database direction
SQL Server 2025 represents a strategic shift. The relational database is no longer positioned only as a transaction engine. It becomes closer to modern AI and semantic search applications by adding native vector-oriented capabilities and stronger developer patterns.
Core ideas
- Vector data type: store embeddings more naturally inside the SQL database engine.
- Vector search: support similarity search patterns closer to relational data.
- AI integration: reduce the distance between business data and AI-assisted retrieval.
- Hybrid and cloud direction: continue the path toward Azure, Fabric, Arc and managed scenarios.
- Developer value: better fit for applications that combine structured filters and semantic search.
Why vector support matters
Classic search:
- exact text match
- LIKE predicates
- full-text search
- filters and keywords
Vector search:
- semantic similarity
- embeddings represent meaning
- useful for AI search, recommendations, RAG
- can combine:
business filters
security filters
relational joins
semantic similarityPrudent production stance
Before adopting 2025 in production:
1. confirm vendor support
2. validate drivers and tooling
3. test compatibility level behavior
4. benchmark top queries
5. test Query Store baselines
6. test HA and DR
7. test backup and restore
8. validate AI/vector use cases separately
9. document rollback
10. avoid novelty-driven migrationVector search architecture pattern
2025 feature value table
| Capability | What it enables | Example use case | DBA concern |
|---|---|---|---|
| Vector data type | Store embeddings inside SQL Server | Semantic product search | Storage, indexing, backup size, query cost |
| Vector search | Similarity-based retrieval | RAG over enterprise documents | Plan quality, latency, index maintenance |
| AI-oriented patterns | Closer integration between data and AI apps | Customer support assistant over SQL data | Security, governance, data leakage |
| Hybrid integration | Connect on-premises estate to cloud services | Fabric reporting over operational data | Cost, latency, compliance |
Compatibility level: the hidden switch behind many surprises
A SQL Server instance version and a database compatibility level are not the same thing. The instance engine can be upgraded while a database keeps an older compatibility level. This gives DBAs a safer way to separate infrastructure upgrade from optimizer behavior changes.
Engine version
The installed SQL Server binaries and engine capabilities. It defines available engine features, patch level, instance behavior, and global capabilities.
Example:
SQL Server 2022 instance
running a database at
compatibility level 150Database compatibility level
A database-level setting that influences query processing behavior, optimizer rules, and availability of some database-scoped features.
Example:
ALTER DATABASE Sales
SET COMPATIBILITY_LEVEL = 160;Common compatibility levels
| Compatibility level | Associated generation | Typical reason to keep it | Typical reason to change it |
|---|---|---|---|
| 140 | SQL Server 2017 | Vendor or legacy optimizer stability | Too old for newer IQP behavior |
| 150 | SQL Server 2019 | Stable modern baseline | Need 2022 optimizer features |
| 160 | SQL Server 2022 | Enable 2022 query processing improvements | Requires regression testing |
| 170 | SQL Server 2025 | Use latest behavior and new scenarios | Only after lab and workload validation |
Safe compatibility change workflow
Useful commands
SELECT
name AS database_name,
compatibility_level
FROM sys.databases
ORDER BY name;
ALTER DATABASE YourDatabaseName
SET COMPATIBILITY_LEVEL = 160;
ALTER DATABASE YourDatabaseName
SET QUERY_STORE = ON;Intelligent Query Processing evolution
Intelligent Query Processing is not one feature. It is a family of optimizer and execution improvements designed to make SQL Server adapt better to real workloads. The practical DBA value is simple: fewer bad plans, better memory usage, better execution mode choices, and less manual intervention in selected scenarios.
| Feature family | Typical generation | Problem addressed | Example symptom |
|---|---|---|---|
| Batch mode on rowstore | 2019 | Improve analytic query execution on rowstore data | High CPU on aggregation queries |
| Scalar UDF inlining | 2019 | Reduce row-by-row scalar function overhead | Function looks small but query is extremely slow |
| Table variable deferred compilation | 2019 | Improve estimates for table variables | Nested loops chosen for large intermediate set |
| Memory grant feedback | 2019 / 2022 improvements | Adjust excessive or insufficient query memory | Sort/hash spills to tempdb |
| Parameter Sensitive Plan optimization | 2022 | Support multiple plan variants for skewed parameters | Same procedure is fast for one customer and slow for another |
IQP visual map
Impact chart
Query Store: the DBA black box recorder
Query Store is one of the most important SQL Server features for modern operations. It records query text, plans, runtime statistics and wait statistics over time. During upgrades, it becomes the main tool for comparing before and after behavior.
Why it matters by version
| Version | Query Store role | DBA value |
|---|---|---|
| 2019 | Baseline and regression analysis | Capture top queries before upgrade or tuning |
| 2022 | More central to optimizer features | Hints, PSP visibility, stronger plan governance |
| 2025 | Still central for modern workloads | Measure new workload patterns, including AI-related queries |
Good Query Store setup
ALTER DATABASE YourDatabaseName
SET QUERY_STORE = ON;
ALTER DATABASE YourDatabaseName
SET QUERY_STORE (
OPERATION_MODE = READ_WRITE,
CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30),
DATA_FLUSH_INTERVAL_SECONDS = 900,
INTERVAL_LENGTH_MINUTES = 60,
MAX_STORAGE_SIZE_MB = 2048,
QUERY_CAPTURE_MODE = AUTO
);What Query Store gives you
For each important query:
- query text
- query id
- plan id
- execution count
- duration
- CPU
- logical reads
- memory consumption
- waits
- plan changes over time
This allows:
- before/after upgrade comparison
- regression detection
- plan forcing if justified
- evidence-based tuning reportOperational anti-pattern
Bad:
- enable Query Store after the incident
- keep default storage too small
- never review forced plans
- force plans without documenting reason
- ignore Query Store cleanup
Good:
- enable before migration
- capture baseline
- review top regressions
- document all hints and forced plans
- review monthlyQuery Store upgrade diagram
Security and compliance evolution
Version choice also affects security posture. SQL Server already has strong foundations: logins, users, roles, schemas, certificates, encryption, auditing and row-level controls. Recent versions add stronger compliance narratives, especially with Ledger and cloud governance.
| Area | SQL Server 2019 | SQL Server 2022 | SQL Server 2025 direction |
|---|---|---|---|
| Encryption | TDE, Always Encrypted, backup encryption | Same foundation, stronger cloud governance contexts | More AI data governance pressure |
| Auditing | SQL Server Audit, Extended Events, custom logs | Ledger supports tamper-evident scenarios | Audit becomes essential for AI-facing data access |
| Identity | Windows Auth, SQL Auth, AD patterns | Hybrid identity becomes more common | Cloud and AI workloads increase identity complexity |
| Data exposure | Row-level security, dynamic data masking | Same plus stronger cloud-connected governance | Semantic search must respect business permissions |
Ledger simplified
Ledger goal:
- make data tampering evident
- preserve cryptographic history
- support audit and compliance scenarios
Ledger is useful for:
- financial records
- sensitive business events
- compliance evidence
- external audit confidence
Ledger is not:
- a replacement for backups
- a replacement for permissions
- a magic security layer
- a reason to ignore least privilegeAI and vector security warning
If SQL Server stores embeddings:
- embeddings may leak semantic meaning
- search results must respect tenant and user permissions
- vector queries need business filters
- audit must include AI-facing queries
- backups contain vectors too
- data retention policies must include embeddingsSecurity architecture for modern SQL Server
Upgrade strategy: serious, measured, reversible
A SQL Server upgrade is not a setup wizard. It is an engineering project. The goal is not only to install a new version. The goal is to prove that the workload, backups, jobs, HA, monitoring, security and performance remain correct.
Phase 1: inventory
- SQL Server version and edition
- Database list and compatibility levels
- SQL Agent jobs
- Linked servers
- Replication, CDC, Service Broker
- SSIS, SSRS, SSAS dependencies
- Third-party backup or monitoring tools
Phase 2: baseline
- Enable Query Store
- Capture top queries
- Capture wait statistics
- Capture file growth and disk latency
- Capture job durations
- Capture backup durations
- Document business peak periods
Phase 3: lab upgrade
- Restore production backup to lab
- Run DBCC CHECKDB
- Run application smoke tests
- Replay business workload if possible
- Compare Query Store metrics
- Test compatibility level change
- Document regressions
Phase 4: production cutover
- Validated backup
- Rollback script
- Maintenance window
- Communication plan
- Post-upgrade monitoring
- Performance comparison
- Final acceptance report
Upgrade pipeline diagram
Risk table
| Risk | Cause | Detection | Mitigation |
|---|---|---|---|
| Plan regression | Optimizer behavior changes | Query Store before/after | Compatibility level, Query Store hints, tuning |
| Vendor incompatibility | Application not certified | Vendor matrix and lab test | Delay upgrade or keep older compatibility level |
| Job failure | SQL Agent, paths, permissions, proxies | Job history and test runs | Pre-cutover job validation |
| HA failure | Always On or cluster mismatch | Failover exercise | Rehearsed DR plan |
| Rollback impossible | No restore path or data changed too far | Rollback rehearsal | Backup, log chain, snapshot, cutover design |
Decision matrix: which version should you choose?
| Scenario | Recommended target | Reason | Risk level |
|---|---|---|---|
| Legacy SQL Server 2012 / 2014 | 2019 or 2022 | Modern baseline, better tooling, supported ecosystem | Medium |
| Stable vendor ERP | Vendor-certified version | Certification matters more than novelty | Low if certified |
| Heavy parameter sniffing issues | 2022 | PSP and Query Store improvements may help | Medium |
| Hybrid Azure strategy | 2022 or newer | Better cloud-connected capabilities | Medium |
| AI semantic search application | 2025 lab first | Vector and AI-oriented features are strategic | High until validated |
| Mission-critical conservative production | 2022 after mature testing | Modern but less novelty-driven than 2025 | Medium |
Version fit chart
DBA scripts for version, compatibility and upgrade preparation
Server version and edition
SELECT
@@VERSION AS full_version;
SELECT
SERVERPROPERTY('MachineName') AS machine_name,
SERVERPROPERTY('ServerName') AS server_name,
SERVERPROPERTY('InstanceName') AS instance_name,
SERVERPROPERTY('ProductVersion') AS product_version,
SERVERPROPERTY('ProductLevel') AS product_level,
SERVERPROPERTY('ProductUpdateLevel') AS update_level,
SERVERPROPERTY('Edition') AS edition,
SERVERPROPERTY('EngineEdition') AS engine_edition;Database compatibility levels
SELECT
name AS database_name,
compatibility_level,
recovery_model_desc,
state_desc,
is_query_store_on
FROM sys.databases
ORDER BY compatibility_level, name;Query Store status
SELECT
d.name AS database_name,
d.is_query_store_on,
q.actual_state_desc,
q.desired_state_desc,
q.current_storage_size_mb,
q.max_storage_size_mb,
q.query_capture_mode_desc
FROM sys.databases d
LEFT JOIN sys.database_query_store_options q
ON d.database_id = DB_ID(d.name)
ORDER BY d.name;Top queries by total duration
SELECT TOP (25)
qt.query_sql_text,
rs.count_executions,
rs.avg_duration / 1000.0 AS avg_duration_ms,
rs.avg_cpu_time / 1000.0 AS avg_cpu_ms,
rs.avg_logical_io_reads,
p.plan_id
FROM sys.query_store_query_text qt
JOIN sys.query_store_query q
ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan p
ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats rs
ON p.plan_id = rs.plan_id
ORDER BY rs.avg_duration DESC;Check latest backups
SELECT
d.name AS database_name,
MAX(CASE WHEN b.type = 'D' THEN b.backup_finish_date END) AS last_full_backup,
MAX(CASE WHEN b.type = 'I' THEN b.backup_finish_date END) AS last_diff_backup,
MAX(CASE WHEN b.type = 'L' THEN b.backup_finish_date END) AS last_log_backup
FROM sys.databases d
LEFT JOIN msdb.dbo.backupset b
ON d.name = b.database_name
GROUP BY d.name
ORDER BY d.name;Find databases not at expected compatibility level
DECLARE @expected_level int = 160;
SELECT
name AS database_name,
compatibility_level,
CASE
WHEN compatibility_level = @expected_level THEN 'OK'
ELSE 'Review'
END AS status
FROM sys.databases
WHERE database_id > 4
ORDER BY status DESC, name;Hands-on lab plan: learn versions by proving behavior
The best way to understand SQL Server versions is to build a small lab and observe behavior. Do not only read feature lists. Measure plans, waits, Query Store history, compatibility level effects, backup behavior and rollback scenarios.
| Lab | Goal | Steps | Expected lesson |
|---|---|---|---|
| Compatibility level test | Understand optimizer behavior changes | Run same workload at level 150 then 160 | Compatibility level can change plans |
| Query Store baseline | Capture before/after metrics | Enable Query Store, execute workload, compare stats | Performance must be measured historically |
| Parameter sensitivity | Observe skewed data behavior | Create uneven customer/order distribution | One plan is not always good for all parameters |
| ADR rollback | Understand recovery changes | Run long transaction, rollback, observe behavior | Recovery design affects availability |
| Vector search prototype | Explore AI-ready patterns | Store text, embeddings, business filters | Semantic search must still respect relational rules |
| Upgrade rehearsal | Practice safe migration | Restore backup, upgrade, compare Query Store, test rollback | Upgrade is a controlled engineering process |
Lab architecture
Final checklist
A serious version comparison report should include:
1. server version and edition
2. database compatibility level
3. Query Store status
4. workload description
5. top 20 queries by duration
6. top 20 queries by CPU
7. top waits
8. backup and restore validation
9. HA and DR test result
10. regressions and mitigations
11. final recommendation
12. rollback strategySQL Server Database Engine: the central machine
SQL Server is not just a service that stores tables. It is a complete execution platform: network protocol, query parser, binder, optimizer, execution engine, storage engine, buffer manager, transaction log manager, lock manager, memory manager, schedulers, workers, metadata, system databases and automation services.
For a DBA, the goal is not to memorize every internal detail. The goal is to understand how a SQL request becomes physical work: CPU work, memory usage, page reads, log writes, locks, waits, tempdb activity and file growth.
Six core principles
- Data is read in pages: SQL Server works with 8 KB pages. Tables and indexes are collections of pages.
- Memory is central: the Buffer Pool caches pages to avoid repeated disk reads.
- The optimizer chooses a plan: the plan determines joins, scans, seeks, sorts and memory needs.
- The log protects durability: committed changes must be hardened in the transaction log.
- SQLOS schedules execution: workers, schedulers, tasks and waits are managed internally.
- tempdb is a shared work area: sorts, spills, row versioning and temp objects can create heavy pressure.
Complete mental diagram
Client Application
|
| TDS protocol
v
SQL Server Network Interface
|
v
Relational Engine
+-- Parser
+-- Algebrizer / Binder
+-- Query Optimizer
+-- Execution Engine
|
v
Storage Engine
+-- Access Methods
+-- Buffer Manager
+-- Lock Manager
+-- Transaction Manager
+-- Log Manager
|
v
SQLOS
+-- Schedulers
+-- Workers
+-- Tasks
+-- Waits
+-- Memory Clerks
|
v
Physical Storage
+-- MDF primary data file
+-- NDF secondary data files
+-- LDF transaction log files
+-- tempdb files
+-- backup filesDBA translation of “the database is slow”
User says:
"The application is slow."
DBA translates:
- Is CPU saturated?
- Is the query waiting on I/O?
- Is the query blocked?
- Is the transaction log slow?
- Is the plan wrong?
- Are statistics stale?
- Is tempdb spilling?
- Is memory pressure high?
- Is the storage path slow?
- Is the application sending too many queries?Internal responsibility map
| Layer | Main role | What it controls | Typical symptom |
|---|---|---|---|
| Relational Engine | Compiles and runs SQL requests | Plans, joins, filters, aggregates | High CPU, wrong plan, bad cardinality |
| Query Optimizer | Chooses the cheapest estimated plan | Index usage, join order, memory grant | Regression after statistics or parameter change |
| Storage Engine | Reads and writes pages | Data access, index traversal, locking | PAGEIOLATCH waits, scans, hot pages |
| Buffer Manager | Caches data pages in memory | Read performance and memory pressure | Physical reads, low cache efficiency |
| Log Manager | Writes transaction log records | Commit latency and recovery | WRITELOG waits, log full, slow commits |
| SQLOS | Internal scheduling and resource coordination | Schedulers, workers, waits, memory clerks | SOS_SCHEDULER_YIELD, THREADPOOL, memory waits |
Main SQL Server services
A SQL Server platform can include several services. The Database Engine is the core. Other services provide scheduling, reporting, ETL, analytics, text search or instance discovery. In production, each enabled service must have a reason, a service account, monitoring and a patching policy.
| Service | Role | Production use |
|---|---|---|
| Database Engine | Relational engine, storage, transactions, security | Core OLTP and reporting databases |
| SQL Server Agent | Jobs, alerts, schedules, operators | Backups, maintenance, ETL, monitoring tasks |
| SQL Server Browser | Instance name and port discovery | Named instances, dynamic ports |
| Integration Services | ETL packages and data workflows | Data warehouse loads, migrations, exports |
| Analysis Services | Tabular or multidimensional analytical models | BI semantic layer, cubes, analytical models |
| Reporting Services | Paginated reports and reporting portal | Operational reports, PDF and Excel exports |
| Full-Text Search | Advanced text indexes and linguistic search | Search over descriptions, documents, tickets |
Typical enterprise layout
SQL Server host
|
+-- SQL Server Database Engine
| +-- master
| +-- model
| +-- msdb
| +-- tempdb
| +-- user databases
|
+-- SQL Server Agent
| +-- backup jobs
| +-- index maintenance jobs
| +-- integrity checks
| +-- ETL jobs
| +-- alert jobs
|
+-- Optional services
+-- SSIS
+-- SSRS
+-- SSAS
+-- Full-Text SearchService account checklist
- Dedicated accounts: one account per important SQL Server service when possible.
- No personal accounts: never run a production service with a named user account.
- Least privilege: grant only required rights on files, shares and network resources.
- Password policy: coordinate password rotation with service restart windows.
- SPN management: Kerberos requires correct Service Principal Names in domain environments.
- Monitoring: detect service stopped, failed Agent jobs and startup errors.
System databases
| Database | Purpose | DBA warning |
|---|---|---|
| master | Instance-level metadata, logins, endpoints, configuration | Must be backed up after instance-level changes |
| model | Template for new databases | Bad settings here propagate to new databases |
| msdb | SQL Agent jobs, backup history, alerts, operators | Critical for operations and recovery visibility |
| tempdb | Temporary objects, spills, version store, internal worktables | Performance-critical shared resource |
From T-SQL text to physical execution
SQL Server does not execute a query as raw text. The relational engine transforms SQL text into a logical tree, validates names and types, estimates cardinalities, explores possible plans, chooses one plan, then the execution engine runs operators against the storage engine.
Execution pipeline
1. Client sends T-SQL batch
2. Parser checks syntax
3. Algebrizer resolves names and data types
4. Optimizer builds candidate plans
5. Cost model estimates relative cost
6. Best plan is selected
7. Memory grant is requested if needed
8. Execution engine runs plan operators
9. Storage engine reads or writes pages
10. Results are returned to clientPipeline diagram
T-SQL text
|
v
Parser
|
v
Algebrizer / Binder
|
v
Optimizer
|
+-- statistics
+-- indexes
+-- constraints
+-- cardinality estimator
+-- cost model
|
v
Execution Plan
|
v
Execution Engine
|
+-- scans
+-- seeks
+-- joins
+-- sorts
+-- aggregates
|
v
Storage EnginePlan operators explained
| Operator | Meaning | Good sign | Warning sign |
|---|---|---|---|
| Index Seek | Targeted access through an index | Selective predicate, low reads | Many key lookups after seek |
| Index Scan | Reads a large part of an index | Useful for large reporting query | Bad for selective OLTP lookup |
| Key Lookup | Fetches missing columns from clustered index | Acceptable for few rows | Terrible when repeated thousands of times |
| Nested Loops | Repeated lookup join strategy | Good for small outer input | Bad when outer input is huge |
| Hash Match | Builds a hash table for joins or aggregates | Good for large sets | Can spill to tempdb if memory grant is wrong |
| Sort | Sorts rows for ORDER BY, merge join or window function | Expected for final ordering | Spill to tempdb, high memory grant |
Educational example
-- A selective predicate may use an index seek
SELECT order_id, order_date, customer_id
FROM dbo.orders
WHERE customer_id = 1001;
-- A broad predicate may use a scan
SELECT order_id, order_date, customer_id
FROM dbo.orders
WHERE order_date >= '2024-01-01';
-- The same table can legitimately need different access methods.
-- The question is not "seek good, scan bad".
-- The question is "is the plan correct for the number of rows returned?"Memory architecture
SQL Server uses memory for data pages, compiled plans, query execution, locks, metadata, columnstore objects and internal structures. Memory pressure does not always mean the server needs more RAM. It can also mean bad queries, wrong indexes, poor memory grants or excessive ad hoc plan compilation.
| Memory area | Role | Symptom if unhealthy |
|---|---|---|
| Buffer Pool | Caches 8 KB data and index pages | High physical reads, storage pressure |
| Plan Cache | Stores compiled execution plans | High compilation CPU, cache pollution |
| Memory Grants | Memory for sort, hash and exchange operators | Spills or RESOURCE_SEMAPHORE waits |
| Lock memory | Tracks locks held by transactions | Lock escalation, blocking, memory pressure |
| Columnstore object pool | Caches columnstore segments | Slow analytical workloads |
Memory pressure diagram
SQL Server memory
|
+-- Buffer Pool
| +-- data pages
| +-- index pages
|
+-- Plan Cache
| +-- compiled plans
| +-- ad hoc plans
|
+-- Query Execution
| +-- sort memory
| +-- hash memory
| +-- exchange memory
|
+-- Internal Structures
+-- locks
+-- metadata
+-- clerks
+-- columnstoreCommon memory problems
Problem: queries spill to tempdb
Cause:
- memory grant too low
- cardinality estimate wrong
- statistics stale
- large sort/hash operation
Problem: RESOURCE_SEMAPHORE waits
Cause:
- queries waiting for memory grant
- grants too large
- concurrency too high
Problem: high compilation CPU
Cause:
- ad hoc workload
- plan cache pollution
- parameterization issuesDMV: memory clerks
SELECT TOP (30)
type,
pages_kb,
virtual_memory_committed_kb,
awe_allocated_kb,
shared_memory_committed_kb
FROM sys.dm_os_memory_clerks
ORDER BY pages_kb DESC;DMV: memory grants
SELECT
session_id,
request_id,
scheduler_id,
dop,
requested_memory_kb,
granted_memory_kb,
required_memory_kb,
used_memory_kb,
max_used_memory_kb,
wait_time_ms,
is_next_candidate
FROM sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC;Buffer Pool: why memory saves the database
SQL Server reads data from disk into memory as 8 KB pages. Once a page is in the Buffer Pool, future reads can be served from memory instead of storage. This is why RAM sizing, query design and index strategy are directly connected.
Read path
Query needs a row
|
v
Storage Engine checks Buffer Pool
|
+-- Page found in memory
| |
| v
| Logical read
|
+-- Page not found
|
v
Physical read from data file
|
v
Page added to Buffer PoolImportant vocabulary
| Term | Meaning |
|---|---|
| Logical read | Page read from Buffer Pool |
| Physical read | Page read from disk because it was not cached |
| Dirty page | Page modified in memory but not yet written to data file |
| Checkpoint | Process that helps flush dirty pages and reduce recovery work |
| Lazy writer | Frees buffers when memory pressure requires it |
DBA warning
High logical reads:
- query may be inefficient
- missing index
- broad scan
- bad join strategy
High physical reads:
- memory pressure
- cold cache
- storage issue
- working set larger than memoryBuffer Pool health queries
SELECT
counter_name,
cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
AND counter_name IN (
'Page life expectancy',
'Buffer cache hit ratio',
'Page reads/sec',
'Page writes/sec',
'Lazy writes/sec',
'Checkpoint pages/sec'
);Top cached objects by pages
SELECT TOP (25)
DB_NAME(database_id) AS database_name,
COUNT(*) * 8 / 1024 AS cached_mb
FROM sys.dm_os_buffer_descriptors
WHERE database_id <> 32767
GROUP BY database_id
ORDER BY cached_mb DESC;Data files, log files and filegroups
SQL Server stores data in data files and transaction log records in log files. Data files can be grouped into filegroups. Filegroups help organize large databases, isolate objects, support partial restore strategies and distribute I/O.
| File type | Extension | Role | Best practice |
|---|---|---|---|
| Primary data file | .mdf | Main data file, contains system allocation metadata | Do not rely on a tiny autogrowth value |
| Secondary data file | .ndf | Additional data file, often in specific filegroups | Useful for large databases and filegroup strategy |
| Transaction log file | .ldf | Sequential log for recovery and durability | Place on low-latency storage, control growth |
| tempdb data file | .mdf / .ndf | Shared temporary workspace | Pre-size, multiple files when needed |
| Backup file | .bak / .trn | Full, differential or log backup | Test restores, not only backup creation |
File architecture diagram
Database Sales
|
+-- PRIMARY filegroup
| +-- Sales.mdf
|
+-- FG_DATA
| +-- Sales_Data_01.ndf
| +-- Sales_Data_02.ndf
|
+-- FG_INDEX
| +-- Sales_Index_01.ndf
|
+-- Transaction Log
+-- Sales_Log.ldfAutogrowth mistakes
Bad:
- growth by 1 MB
- percent growth on large files
- no monitoring
- log file growing every few minutes
- disk nearly full
Good:
- pre-size files
- fixed MB growth
- monitor growth events
- separate data and log latency
- test restore duration
- align file layout with workloadFile inventory query
SELECT
DB_NAME(database_id) AS database_name,
type_desc,
name AS logical_name,
physical_name,
size * 8 / 1024 AS size_mb,
growth,
is_percent_growth
FROM sys.master_files
ORDER BY database_name, type_desc, logical_name;Pages and extents: the physical unit of SQL Server storage
SQL Server stores table and index data in 8 KB pages. Eight contiguous pages form an extent of 64 KB. Understanding pages and extents helps explain fragmentation, page splits, fill factor, reads, allocation contention and why row width matters.
Storage hierarchy
Database
|
+-- Filegroup
|
+-- Data file
|
+-- Extent = 64 KB
|
+-- Page = 8 KB
|
+-- RowsImportant page concepts
| Concept | Explanation |
|---|---|
| 8 KB page | Basic unit of data and index storage |
| 64 KB extent | Group of 8 pages |
| Page split | Occurs when an insert/update requires space in a full page |
| Fill factor | Controls free space left in index pages during rebuild |
| Forwarded record | Heap row moved to another page, leaving a pointer behind |
Why row width matters
| Design choice | Impact on pages | Performance consequence |
|---|---|---|
| Wide rows | Fewer rows per page | More pages read for same row count |
| Too many included columns | Larger nonclustered indexes | More memory, more I/O, slower writes |
| Random GUID clustered key | Random inserts across pages | Page splits and fragmentation |
| Monotonic key | Mostly append-only pattern | Less fragmentation but possible hot last page |
Index physical stats
SELECT
OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS table_name,
i.name AS index_name,
ips.index_type_desc,
ips.avg_fragmentation_in_percent,
ips.page_count,
ips.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
DB_ID(),
NULL,
NULL,
NULL,
'LIMITED'
) ips
JOIN sys.indexes i
ON ips.object_id = i.object_id
AND ips.index_id = i.index_id
WHERE ips.page_count > 1000
ORDER BY ips.avg_fragmentation_in_percent DESC;Transaction log: durability and recovery
The transaction log is one of the most critical components of SQL Server. Every data modification is logged before it is considered durable. The log allows rollback, crash recovery, high availability, replication, log shipping and point-in-time restore.
Write path
UPDATE statement
|
v
Acquire locks
|
v
Modify page in Buffer Pool
|
v
Generate log record
|
v
COMMIT
|
v
Flush log block to LDF
|
v
Transaction durableLog-related terms
| Term | Meaning |
|---|---|
| LSN | Log Sequence Number, position in the log chain |
| VLF | Virtual Log File, internal division of the log file |
| Log backup | Captures log records and allows log truncation in full recovery |
| Recovery model | Controls log behavior and restore possibilities |
| WRITELOG | Wait type for log flush latency |
Common log incidents
Log grows continuously:
- missing log backups
- long open transaction
- replication or availability issue
- recovery model misunderstood
Slow commits:
- slow log disk
- too many tiny transactions
- synchronous HA commit latency
- log file autogrowth eventRecovery model comparison
| Recovery model | Point-in-time restore | Log backup needed | Typical use |
|---|---|---|---|
| Simple | No | No | Dev, staging, non-critical workloads |
| Full | Yes | Yes | Production OLTP, critical systems |
| Bulk logged | Limited depending on operations | Yes | Controlled bulk load windows |
Log health queries
SELECT
name,
recovery_model_desc,
log_reuse_wait_desc
FROM sys.databases
ORDER BY name;
DBCC SQLPERF(LOGSPACE);SQLOS: the internal operating layer
SQLOS is SQL Server’s internal layer for scheduling, workers, memory, waits and synchronization. It is not an external operating system, but it acts like a specialized runtime that allows SQL Server to coordinate many concurrent tasks efficiently.
Key components
- Scheduler: logical scheduler usually associated with a visible CPU.
- Worker: execution context used to run a task.
- Task: unit of work scheduled by SQL Server.
- Wait: measured time spent waiting for a resource.
- Yielding: cooperative scheduling behavior when a worker gives up CPU.
SQLOS mental model
CPU visible to SQL Server
|
v
Scheduler
|
+-- runnable queue
+-- current worker
+-- suspended workers
|
v
Workers execute tasks
|
+-- running
+-- runnable
+-- suspended
|
v
Wait stats reveal why work is delayedCommon SQLOS symptoms
SOS_SCHEDULER_YIELD:
- CPU pressure
- CPU-heavy queries
- inefficient plans
THREADPOOL:
- worker starvation
- too many concurrent requests
- blocking chains consuming workers
RESOURCE_SEMAPHORE:
- queries waiting for memory grants
- large sort/hash operations
- concurrency too highScheduler query
SELECT
scheduler_id,
cpu_id,
status,
current_tasks_count,
runnable_tasks_count,
current_workers_count,
active_workers_count,
work_queue_count,
load_factor
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255
ORDER BY scheduler_id;Wait stats query
SELECT TOP (30)
wait_type,
waiting_tasks_count,
wait_time_ms,
signal_wait_time_ms,
wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT LIKE 'SLEEP%'
AND wait_type NOT LIKE 'BROKER%'
AND wait_type NOT LIKE 'XE%'
ORDER BY wait_time_ms DESC;tempdb: the shared scratch database
tempdb is used by all databases on the instance. It stores temporary tables, table variables, internal worktables, sorts, hash spills, row versioning data and online operation work areas. A weak tempdb design can slow the entire instance.
What uses tempdb?
- Temporary tables: local and global temp tables.
- Table variables: depending on usage and size.
- Sorts and hashes: when memory grant is insufficient.
- Version store: snapshot isolation and read committed snapshot.
- Index rebuilds: depending on options and operations.
- DBCC CHECKDB: can use significant tempdb space.
I/O architecture
SQL Server I/O paths
|
+-- Data reads
| +-- random reads
| +-- sequential scans
|
+-- Data writes
| +-- dirty page flush
| +-- checkpoint
|
+-- Log writes
| +-- sequential writes
| +-- commit latency critical
|
+-- tempdb I/O
+-- spills
+-- temp objects
+-- version storetempdb mistakes
Bad:
- one tiny tempdb data file
- autogrowth every few minutes
- tempdb on slow disk
- no monitoring of version store
- ignoring spills in plans
Good:
- pre-size tempdb
- use multiple data files when needed
- fixed-size autogrowth
- monitor waits and file usage
- tune queries that spilltempdb file usage
SELECT
SUM(user_object_reserved_page_count) * 8 / 1024 AS user_objects_mb,
SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb,
SUM(version_store_reserved_page_count) * 8 / 1024 AS version_store_mb,
SUM(unallocated_extent_page_count) * 8 / 1024 AS free_space_mb
FROM tempdb.sys.dm_db_file_space_usage;I/O latency by database file
SELECT
DB_NAME(vfs.database_id) AS database_name,
mf.type_desc,
mf.physical_name,
vfs.num_of_reads,
vfs.num_of_writes,
vfs.io_stall_read_ms,
vfs.io_stall_write_ms,
CASE WHEN vfs.num_of_reads = 0 THEN 0
ELSE vfs.io_stall_read_ms / vfs.num_of_reads END AS avg_read_ms,
CASE WHEN vfs.num_of_writes = 0 THEN 0
ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
JOIN sys.master_files mf
ON vfs.database_id = mf.database_id
AND vfs.file_id = mf.file_id
ORDER BY avg_write_ms DESC, avg_read_ms DESC;Essential DMV toolkit for architecture diagnostics
DMVs are the DBA cockpit. They expose what SQL Server is doing now and what happened since startup. A serious diagnostic should combine active requests, wait stats, file latency, memory clerks, query stats, Query Store, locks and Agent job history.
Active requests
SELECT
r.session_id,
r.status,
r.command,
r.cpu_time,
r.total_elapsed_time,
r.wait_type,
r.wait_time,
r.blocking_session_id,
DB_NAME(r.database_id) AS database_name,
t.text AS sql_text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
ORDER BY r.total_elapsed_time DESC;Top cached queries by CPU
SELECT TOP (25)
qs.total_worker_time / 1000 AS total_cpu_ms,
qs.execution_count,
qs.total_worker_time / NULLIF(qs.execution_count, 0) / 1000 AS avg_cpu_ms,
qs.total_logical_reads,
qs.total_elapsed_time / 1000 AS total_elapsed_ms,
SUBSTRING(
st.text,
(qs.statement_start_offset / 2) + 1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(st.text)
ELSE qs.statement_end_offset
END - qs.statement_start_offset) / 2) + 1
) AS statement_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_worker_time DESC;Blocking overview
SELECT
r.session_id,
r.blocking_session_id,
r.wait_type,
r.wait_time,
r.wait_resource,
r.status,
r.command,
DB_NAME(r.database_id) AS database_name,
t.text AS sql_text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC;Database file sizes
SELECT
DB_NAME(database_id) AS database_name,
type_desc,
name AS logical_name,
physical_name,
size * 8 / 1024 AS size_mb,
growth,
is_percent_growth
FROM sys.master_files
ORDER BY database_name, type_desc, logical_name;Index usage
SELECT
OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
OBJECT_NAME(i.object_id) AS table_name,
i.name AS index_name,
s.user_seeks,
s.user_scans,
s.user_lookups,
s.user_updates
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s
ON s.object_id = i.object_id
AND s.index_id = i.index_id
AND s.database_id = DB_ID()
WHERE i.object_id > 100
ORDER BY s.user_updates DESC, s.user_seeks ASC;Diagnostic decision table
| Symptom | First DMV | Next step |
|---|---|---|
| Users blocked | sys.dm_exec_requests | Find blocker, transaction, SQL text, application owner |
| High CPU | sys.dm_exec_query_stats | Find top CPU plans and compare Query Store |
| Slow disk | sys.dm_io_virtual_file_stats | Separate data, log and tempdb latency |
| Memory pressure | sys.dm_os_memory_clerks | Check grants, cache, PLE, OS pressure |
| Plan regression | Query Store views | Compare plans, runtime stats and waits |
DBA architecture checklist
This checklist is designed for a real SQL Server production review. It focuses on architecture, not only query tuning. The goal is to detect structural weaknesses before they become incidents.
| Area | Question | Good signal | Bad signal |
|---|---|---|---|
| Version | Is the SQL Server version supported and patched? | Documented build and patch policy | Unknown build, no patch schedule |
| Memory | Is max server memory configured? | SQL Server leaves memory for OS and agents | Default memory setting on production |
| Files | Are data and log files pre-sized? | Controlled growth and monitored capacity | Frequent autogrowth, percent growth |
| Log | Are log backups aligned with RPO? | Log chain valid and restore tested | Full recovery with no log backups |
| tempdb | Is tempdb sized and monitored? | Stable files, no frequent growth, low contention | tempdb full, spills ignored |
| Indexes | Are indexes useful and measured? | Usage review, missing index review, write cost known | Index explosion, unused indexes |
| Query Store | Is Query Store enabled and reviewed? | Baselines and regression detection | No plan history during incident |
| SQL Agent | Are failed jobs monitored? | Operators, alerts, failure notifications | Silent job failures |
| Backups | Are restores tested? | Restore report and RTO proof | Backups exist but no restore validation |
| HA / DR | Has failover been tested? | Documented RPO/RTO and rehearsal | Architecture diagram only, no proof |
Daily architecture checks
Daily:
1. Failed SQL Agent jobs
2. Backup status
3. Log reuse wait
4. Disk free space
5. Blocking sessions
6. Severe SQL errors
7. tempdb usage
8. Top waits
9. Long-running requests
10. Query Store regressionsMonthly architecture checks
Monthly:
1. Restore test
2. File growth review
3. Index usage review
4. Query Store report
5. Patch level review
6. Security permission review
7. HA failover drill
8. Capacity forecast
9. Service account review
10. Documentation updateFinal architecture diagnosis model
A strong SQL Server diagnosis follows this order:
1. Is the instance healthy?
- CPU, memory, waits, workers, services
2. Is storage healthy?
- data latency, log latency, tempdb latency, autogrowth
3. Is the workload healthy?
- top queries, blocking, plans, Query Store, jobs
4. Is recoverability proven?
- backup chain, restore tests, HA, DR, RPO/RTO
5. Is the design sustainable?
- file layout, index strategy, capacity, security, monitoringWhy storage design matters in SQL Server
SQL Server performance is strongly linked to storage design. Tables and indexes are not abstract objects floating inside a database. They are stored in pages, grouped into extents, allocated through allocation maps, written into data files, protected by the transaction log, and often supported by tempdb during sorting, hashing, versioning and internal operations.
A poor storage design creates production problems that look like query problems: slow reads, slow commits, blocking during autogrowth, tempdb saturation, log full incidents, restore delays, fragmentation, high I/O latency and capacity surprises.
Core storage principles
- Data pages are 8 KB: SQL Server reads and writes data through fixed-size pages.
- Extents are 64 KB: one extent contains 8 contiguous pages.
- Data files hold pages: MDF and NDF files contain table and index pages.
- Log files hold log records: LDF files are sequential and critical for commits.
- Filegroups organize data files: useful for large databases, partitioning and restore strategy.
- tempdb is shared: every database on the instance can create pressure on tempdb.
- Autogrowth is emergency capacity: not a normal sizing strategy.
Complete storage map
SQL Server Instance
|
+-- System databases
| +-- master
| +-- model
| +-- msdb
| +-- tempdb
|
+-- User database
|
+-- Filegroups
| |
| +-- Data files
| +-- MDF primary data file
| +-- NDF secondary data files
|
+-- Transaction log
+-- LDF log file
Inside data files:
Database -> Filegroup -> File -> Extent -> Page -> Row
Page size:
8 KB
Extent size:
8 pages = 64 KBDBA reading model
If users report slowness, storage questions are:
1. Are reads slow?
2. Are writes slow?
3. Is the log slow?
4. Is tempdb slow or full?
5. Are files growing during workload?
6. Is the database poorly distributed?
7. Is restore time acceptable?
8. Are backups too slow?
9. Are indexes wasting pages?
10. Is storage aligned with RPO and RTO?Storage responsibility table
| Object | Size / type | Role | Common production issue |
|---|---|---|---|
| Page | 8 KB | Basic unit for table and index data | Too many reads caused by wide rows or bad indexes |
| Extent | 64 KB | Allocation unit containing 8 pages | Allocation contention or poor space usage |
| MDF | Primary data file | Main data file of a database | Overloaded single file, uncontrolled growth |
| NDF | Secondary data file | Additional data files, often in filegroups | Poor distribution or unused complexity |
| LDF | Transaction log file | Durability, rollback, recovery, log backups | WRITELOG waits, log full, missing log backups |
| tempdb | Shared system database | Temporary work area for the whole instance | Spills, version store growth, allocation contention |
Pages and extents: the physical base
SQL Server stores data in 8 KB pages. A page is the basic unit of I/O. Even if a query needs only one small row, SQL Server reads the page containing that row. This is why row width, index design and page density directly affect performance.
Hierarchy
Database
|
+-- Filegroup
|
+-- Data file
|
+-- Extent = 64 KB
|
+-- Page 1 = 8 KB
+-- Page 2 = 8 KB
+-- Page 3 = 8 KB
+-- Page 4 = 8 KB
+-- Page 5 = 8 KB
+-- Page 6 = 8 KB
+-- Page 7 = 8 KB
+-- Page 8 = 8 KBImportant concepts
| Concept | Meaning | DBA impact |
|---|---|---|
| Page | 8 KB storage unit | More pages read means more I/O and memory pressure |
| Extent | 8 pages, 64 KB | Allocation and scan efficiency |
| Row density | Number of rows per page | Wide rows reduce density and increase reads |
| Page split | Page divided to make room for insert/update | Fragmentation, extra writes, possible latch pressure |
| Fill factor | Free space left in index pages during rebuild | Useful for write-heavy indexes, harmful if misused |
How page design affects workload
| Design | Storage effect | Performance effect | DBA action |
|---|---|---|---|
| Very wide table | Fewer rows per page | More logical reads for same row count | Normalize carefully or split cold columns |
| Random clustered key | Inserts across many pages | Page splits and fragmentation | Use better key strategy or fill factor |
| Many included columns | Large nonclustered indexes | More memory and slower writes | Keep indexes justified by usage |
| Heap with updates | Forwarded records possible | Extra lookups and poor scan behavior | Prefer clustered index in most OLTP cases |
| Low page density | Wasted space | More I/O, larger backups | Review fragmentation and fill factor |
Page and index physical stats
SELECT
OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS table_name,
i.name AS index_name,
ips.index_type_desc,
ips.avg_fragmentation_in_percent,
ips.avg_page_space_used_in_percent,
ips.page_count
FROM sys.dm_db_index_physical_stats(
DB_ID(),
NULL,
NULL,
NULL,
'SAMPLED'
) AS ips
JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND ips.index_id = i.index_id
WHERE ips.page_count > 1000
ORDER BY ips.page_count DESC;Allocation maps: how SQL Server tracks space
SQL Server uses special internal pages to track allocated extents, mixed extents, free space, object ownership and changed pages. These allocation maps are essential for space management, backups, page allocation and internal navigation.
Main allocation map pages
| Map | Full name | Role |
|---|---|---|
| PFS | Page Free Space | Tracks page allocation and approximate free space |
| GAM | Global Allocation Map | Tracks which extents are allocated |
| SGAM | Shared Global Allocation Map | Tracks mixed extents with free pages |
| IAM | Index Allocation Map | Maps extents used by an allocation unit |
| DCM | Differential Changed Map | Tracks extents changed since last full backup |
| BCM | Bulk Changed Map | Tracks extents changed by minimally logged operations |
Allocation map mental model
Data file
|
+-- Allocation map pages
| +-- PFS: page free space
| +-- GAM: allocated extents
| +-- SGAM: mixed extents
| +-- IAM: object allocation map
| +-- DCM: differential backup tracking
| +-- BCM: bulk operation tracking
|
+-- Data pages
+-- Index pages
+-- LOB pages
+-- Row-overflow pagesWhy DBAs care
Allocation maps explain:
- why tempdb can have allocation contention
- why differential backups know what changed
- how SQL Server tracks object space
- why page allocation is not random
- why file layout matters under high concurrencyAllocation-related symptoms
| Symptom | Possible storage reason | Typical check | Possible action |
|---|---|---|---|
| tempdb contention | Hot allocation pages | PAGELATCH waits on tempdb | Multiple tempdb files, modern SQL Server settings |
| Large differential backup | Many extents changed since full backup | Backup size trend | Review full backup frequency |
| Unexpected file growth | Allocation pressure from objects or indexes | Object space report | Capacity planning, archive, purge, compression |
| High latch waits | Hot pages or allocation structures | Wait stats and page resource | Schema, tempdb, index or workload redesign |
Space by allocation unit
SELECT
s.name AS schema_name,
o.name AS object_name,
i.name AS index_name,
au.type_desc AS allocation_type,
SUM(au.total_pages) * 8 / 1024 AS total_mb,
SUM(au.used_pages) * 8 / 1024 AS used_mb,
SUM(au.data_pages) * 8 / 1024 AS data_mb
FROM sys.allocation_units AS au
JOIN sys.partitions AS p
ON au.container_id = p.hobt_id
JOIN sys.objects AS o
ON p.object_id = o.object_id
JOIN sys.schemas AS s
ON o.schema_id = s.schema_id
LEFT JOIN sys.indexes AS i
ON p.object_id = i.object_id
AND p.index_id = i.index_id
WHERE o.is_ms_shipped = 0
GROUP BY s.name, o.name, i.name, au.type_desc
ORDER BY total_mb DESC;MDF, NDF, LDF: file types and responsibilities
SQL Server databases normally have at least one data file and one log file. The data file stores database pages. The log file stores the transaction log. They have completely different I/O patterns: data files are often random-read heavy, while log files are sequential-write sensitive.
| File | Extension | Contains | I/O pattern | DBA rule |
|---|---|---|---|---|
| Primary data file | .mdf | Data pages, metadata, allocation structures | Reads and writes | Must exist; do not let it grow blindly |
| Secondary data file | .ndf | Additional data pages | Reads and writes | Useful for filegroups and large databases |
| Transaction log | .ldf | Log records and VLFs | Sequential writes | Low-latency storage and log backup discipline |
| Backup file | .bak / .trn | Database or log backup | Sequential read/write | Restore tests are mandatory |
Bad file layout
Single volume:
C:\SQL\
Sales.mdf
Sales_log.ldf
tempdb.mdf
tempdb_log.ldf
backups\Sales.bak
Risks:
- OS, data, log, tempdb and backups compete
- log latency affects commits
- backup can saturate production disk
- tempdb can hurt every database
- disk full becomes catastrophicBetter file layout
Separated paths:
D:\SQLData\
Sales.mdf
Sales_Data01.ndf
L:\SQLLog\
Sales_log.ldf
T:\SQLTempDB\
tempdb.mdf
tempdb_02.ndf
tempdb_log.ldf
B:\SQLBackups\
Sales_full.bak
Sales_log.trn
Benefits:
- clearer capacity monitoring
- better latency isolation
- easier troubleshooting
- safer backup strategyFile inventory
SELECT
DB_NAME(database_id) AS database_name,
type_desc,
name AS logical_name,
physical_name,
size * 8 / 1024 AS size_mb,
growth,
is_percent_growth
FROM sys.master_files
ORDER BY database_name, type_desc, logical_name;Filegroups: logical organization of data files
A filegroup is a logical container for one or more data files. Filegroups are useful when a database becomes large, when partitioning is needed, when some data is read-only, when backup and restore strategies need more granularity, or when different storage tiers are used.
Why use filegroups?
- Separate large objects: isolate tables, indexes or archives.
- Support partitioning: place partitions on different filegroups.
- Improve restore strategy: partial restore can reduce recovery time for large databases.
- Support read-only data: historical filegroups can become read-only.
- Improve administration: make capacity and ownership clearer.
Filegroup architecture
Database SalesDB
|
+-- PRIMARY
| +-- SalesDB.mdf
|
+-- FG_ACTIVE_DATA
| +-- SalesDB_Active_01.ndf
| +-- SalesDB_Active_02.ndf
|
+-- FG_ARCHIVE_2022
| +-- SalesDB_Archive_2022.ndf
|
+-- FG_ARCHIVE_2023
+-- SalesDB_Archive_2023.ndfTypical strategy
Active data:
- fast storage
- frequent backup
- full read/write
Archive data:
- cheaper storage
- possible read-only filegroup
- less frequent access
- separate restore strategy
Indexes:
- sometimes separate filegroup
- useful for administration
- not magic for performance by itselfCreate database with filegroups
CREATE DATABASE SalesDB
ON PRIMARY
(
NAME = SalesDB_Primary,
FILENAME = 'D:\SQLData\SalesDB.mdf',
SIZE = 2048MB,
FILEGROWTH = 512MB
),
FILEGROUP FG_DATA
(
NAME = SalesDB_Data01,
FILENAME = 'E:\SQLData\SalesDB_Data01.ndf',
SIZE = 8192MB,
FILEGROWTH = 1024MB
),
FILEGROUP FG_INDEX
(
NAME = SalesDB_Index01,
FILENAME = 'F:\SQLIndex\SalesDB_Index01.ndf',
SIZE = 4096MB,
FILEGROWTH = 512MB
)
LOG ON
(
NAME = SalesDB_Log,
FILENAME = 'L:\SQLLog\SalesDB.ldf',
SIZE = 4096MB,
FILEGROWTH = 512MB
);Move an index to a filegroup
CREATE INDEX IX_orders_customer_date
ON dbo.orders(customer_id, order_date)
WITH (DROP_EXISTING = ON)
ON FG_INDEX;Filegroup inventory
SELECT
fg.name AS filegroup_name,
fg.type_desc,
df.name AS logical_file_name,
df.physical_name,
df.size * 8 / 1024 AS size_mb
FROM sys.filegroups AS fg
JOIN sys.database_files AS df
ON fg.data_space_id = df.data_space_id
ORDER BY fg.name, df.name;Indexes are storage objects, not only performance objects
Every index consumes pages. Every insert, update and delete may need to maintain several indexes. A good index can save millions of reads. A bad index can waste memory, slow writes, increase backups, increase maintenance time and create more fragmentation.
| Index type | Storage behavior | Best use | Risk |
|---|---|---|---|
| Clustered index | Defines physical row order at leaf level | Most OLTP tables need a good clustered key | Bad key affects every nonclustered index |
| Nonclustered index | Separate B-tree pointing to base rows | Selective searches and covering queries | Too many indexes slow writes |
| Columnstore index | Column-oriented compressed segments | Analytics, large scans, aggregations | Not always ideal for small OLTP lookups |
| Filtered index | Index only a subset of rows | Highly selective conditions | Predicate must match query pattern |
| XML / spatial / full-text | Specialized structures | Specific data access patterns | Maintenance and storage overhead |
B-tree simplified
Index root page
|
+-- intermediate pages
|
+-- leaf pages
|
+-- key values
+-- row locator or included columns
Index seek:
root -> intermediate -> leaf -> row
Index scan:
many leaf pages read sequentiallyStorage cost mindset
One new index can mean:
- more pages in data files
- more pages in memory
- more log records on writes
- more backup size
- more index maintenance
- more statistics to update
- slower inserts and updates
Therefore:
Every index must have evidence.Index size report
SELECT
s.name AS schema_name,
o.name AS table_name,
i.name AS index_name,
i.type_desc,
SUM(a.total_pages) * 8 / 1024 AS total_mb,
SUM(a.used_pages) * 8 / 1024 AS used_mb,
SUM(a.data_pages) * 8 / 1024 AS data_mb
FROM sys.indexes AS i
JOIN sys.objects AS o
ON i.object_id = o.object_id
JOIN sys.schemas AS s
ON o.schema_id = s.schema_id
JOIN sys.partitions AS p
ON i.object_id = p.object_id
AND i.index_id = p.index_id
JOIN sys.allocation_units AS a
ON p.partition_id = a.container_id
WHERE o.is_ms_shipped = 0
GROUP BY s.name, o.name, i.name, i.type_desc
ORDER BY total_mb DESC;Index usage report
SELECT
OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
OBJECT_NAME(i.object_id) AS table_name,
i.name AS index_name,
s.user_seeks,
s.user_scans,
s.user_lookups,
s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
ON s.object_id = i.object_id
AND s.index_id = i.index_id
AND s.database_id = DB_ID()
WHERE i.object_id > 100
ORDER BY s.user_updates DESC, s.user_seeks ASC;Transaction log storage
The transaction log is not just another file. It is the sequential history of database changes. SQL Server uses it for rollback, crash recovery, high availability, replication, log shipping and point-in-time restore. A slow or full log can stop production.
Commit path
Application sends COMMIT
|
v
SQL Server has dirty pages in memory
|
v
Log records are flushed to LDF
|
v
Commit is acknowledged
|
v
Data pages can be written laterLog design rules
- Use low-latency storage: log writes are commit-critical.
- Pre-size the log: avoid repeated growth during workload.
- Use fixed growth: avoid tiny growth and percent growth on large logs.
- Monitor VLF count: too many VLFs can hurt recovery and operations.
- Back up the log: in full recovery model, log backups are mandatory.
- Watch log_reuse_wait_desc: it explains why the log cannot be reused.
Common incident pattern
Database in FULL recovery model
|
+-- no transaction log backups
|
v
Log cannot be truncated
|
v
LDF file grows
|
v
Disk becomes full
|
v
Database stops accepting writesRecovery model table
| Recovery model | Log behavior | Point-in-time restore | Production fit |
|---|---|---|---|
| Simple | Log is automatically truncated after checkpoint when possible | No | Non-critical databases, dev, staging |
| Full | Log chain must be backed up | Yes | Most critical OLTP systems |
| Bulk logged | Some operations can be minimally logged | Limited in some scenarios | Controlled bulk load windows |
Log monitoring
SELECT
name,
recovery_model_desc,
log_reuse_wait_desc
FROM sys.databases
ORDER BY name;
DBCC SQLPERF(LOGSPACE);VLF information
SELECT *
FROM sys.dm_db_log_info(DB_ID())
ORDER BY vlf_sequence_number;tempdb: the shared pressure point
tempdb is recreated at every SQL Server startup and is shared by the whole instance. It is used by user objects, internal objects, row versioning, sorts, hash joins, index operations, triggers, snapshot isolation, online operations and DBCC commands.
tempdb users
| Usage | Examples | Risk |
|---|---|---|
| User objects | #temp tables, global temp tables, table variables | Space pressure and allocation contention |
| Internal objects | Sorts, hashes, worktables, spools | Spills and heavy I/O |
| Version store | RCSI, SNAPSHOT, online index operations | Long transactions can grow tempdb |
| Maintenance | DBCC CHECKDB, index rebuilds | Large temporary space requirement |
tempdb architecture
tempdb
|
+-- Data files
| +-- tempdb.mdf
| +-- tempdb_02.ndf
| +-- tempdb_03.ndf
| +-- tempdb_04.ndf
|
+-- Log file
| +-- templog.ldf
|
+-- Consumers
+-- temp tables
+-- worktables
+-- spills
+-- version store
+-- DBCC
+-- online operationstempdb design rules
- Pre-size files: avoid growth during workload.
- Use fixed growth: predictable expansion.
- Use multiple data files when needed: helps reduce allocation contention.
- Keep equal file sizes: proportional fill works better.
- Monitor version store: long transactions can retain versions.
- Fix spills: do not hide query problems by only adding disk.
tempdb file space usage
SELECT
SUM(user_object_reserved_page_count) * 8 / 1024 AS user_objects_mb,
SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb,
SUM(version_store_reserved_page_count) * 8 / 1024 AS version_store_mb,
SUM(unallocated_extent_page_count) * 8 / 1024 AS free_space_mb
FROM tempdb.sys.dm_db_file_space_usage;tempdb task usage
SELECT
session_id,
request_id,
user_objects_alloc_page_count * 8 / 1024 AS user_alloc_mb,
user_objects_dealloc_page_count * 8 / 1024 AS user_dealloc_mb,
internal_objects_alloc_page_count * 8 / 1024 AS internal_alloc_mb,
internal_objects_dealloc_page_count * 8 / 1024 AS internal_dealloc_mb
FROM sys.dm_db_task_space_usage
ORDER BY internal_objects_alloc_page_count DESC;Autogrowth: emergency mechanism, not capacity planning
Autogrowth prevents immediate failure when a file runs out of space, but it should not be the normal way a database grows. Growth events can pause workload, generate latency, fragment files at the storage level and create unpredictable incidents.
Bad pattern
Database file size:
50 GB
Autogrowth:
1 MB
Workload:
bulk load inserts 20 GB
Result:
thousands of growth events
production latency
storage fragmentation
angry users
slow maintenance windowGood pattern
Database file size:
pre-sized for next 3 to 6 months
Autogrowth:
fixed size, for example 512 MB or 1 GB
Monitoring:
alert at 70 percent
warning at 80 percent
critical at 90 percent
Capacity:
reviewed weekly or monthly
growth forecast documentedRules
- Use fixed MB or GB growth: avoid percent growth for large files.
- Avoid tiny growth: 1 MB growth is almost always wrong for production.
- Pre-size files: especially data, log and tempdb files.
- Monitor growth frequency: frequent growth means bad capacity planning.
- Keep log growth controlled: log autogrowth can block commits.
Autogrowth configuration report
SELECT
DB_NAME(database_id) AS database_name,
type_desc,
name AS logical_name,
size * 8 / 1024 AS size_mb,
CASE
WHEN is_percent_growth = 1 THEN CAST(growth AS varchar(20)) + ' percent'
ELSE CAST(growth * 8 / 1024 AS varchar(20)) + ' MB'
END AS growth_setting,
is_percent_growth,
physical_name
FROM sys.master_files
ORDER BY database_name, type_desc, logical_name;File growth events from default trace, when available
SELECT
te.name AS event_name,
t.DatabaseName,
t.FileName,
t.StartTime,
t.Duration / 1000 AS duration_ms,
t.IntegerData * 8 / 1024 AS growth_mb
FROM sys.traces AS tr
CROSS APPLY sys.fn_trace_gettable(tr.path, DEFAULT) AS t
JOIN sys.trace_events AS te
ON t.EventClass = te.trace_event_id
WHERE tr.is_default = 1
AND te.name IN ('Data File Auto Grow', 'Log File Auto Grow')
ORDER BY t.StartTime DESC;Growth risk table
| Setting | Risk | Recommended action |
|---|---|---|
| 1 MB growth | Too many growth events | Change to realistic fixed MB or GB |
| Percent growth | Growth becomes huge as file grows | Use fixed growth |
| No free disk alert | Production outage | Add disk and SQL file alerts |
| Log grows often | Commit latency and log full risk | Check log backups, VLFs, transactions |
I/O design: separate workload patterns
SQL Server storage is not only about capacity. Latency matters. Data reads, data writes, transaction log writes, tempdb spills and backups have different I/O patterns. Mixing everything on the same saturated volume makes troubleshooting difficult and increases incident probability.
I/O pattern map
Data files:
- random reads
- sequential scans
- checkpoint writes
- index maintenance writes
Log files:
- sequential writes
- commit critical
- sensitive to latency
tempdb:
- mixed read/write
- spills
- temp tables
- version store
Backups:
- large sequential reads from data
- large sequential writes to backup targetStorage design model
Recommended separation:
D:\SQLData
user database MDF/NDF
L:\SQLLog
user database LDF
T:\SQLTempDB
tempdb data and log
B:\SQLBackups
backup output
M:\SQLMaintenance
exports, scripts, staging files
Goal:
isolate latency
simplify monitoring
reduce blast radius
improve recovery operationsI/O latency interpretation
| File type | Latency impact | Common wait | DBA action |
|---|---|---|---|
| Data file reads | Queries wait for pages | PAGEIOLATCH_SH | Index tuning, memory, storage review |
| Data file writes | Checkpoint and dirty page flush slow | PAGEIOLATCH_EX, IO waits | Storage throughput, checkpoint review |
| Log writes | Commits slow | WRITELOG | Move log, improve storage, batch commits |
| tempdb I/O | Sorts, hashes and temp objects slow | PAGELATCH, IO_COMPLETION | tempdb layout, query spills, memory grants |
| Backup I/O | Backup window too long | Backup waits | Compression, striping, target throughput |
I/O latency query
SELECT
DB_NAME(vfs.database_id) AS database_name,
mf.type_desc,
mf.physical_name,
vfs.num_of_reads,
vfs.num_of_writes,
vfs.io_stall_read_ms,
vfs.io_stall_write_ms,
CASE WHEN vfs.num_of_reads = 0 THEN 0
ELSE vfs.io_stall_read_ms / vfs.num_of_reads END AS avg_read_ms,
CASE WHEN vfs.num_of_writes = 0 THEN 0
ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
ON vfs.database_id = mf.database_id
AND vfs.file_id = mf.file_id
ORDER BY avg_write_ms DESC, avg_read_ms DESC;Storage DMV toolkit
A storage diagnosis must be evidence-based. File size, growth settings, file latency, table size, index size, tempdb usage, log reuse waits and backup history should be reviewed together.
Database file overview
SELECT
DB_NAME(database_id) AS database_name,
type_desc,
name AS logical_name,
physical_name,
size * 8 / 1024 AS size_mb,
max_size,
growth,
is_percent_growth
FROM sys.master_files
ORDER BY database_name, type_desc, logical_name;Database size summary
SELECT
DB_NAME(database_id) AS database_name,
SUM(size) * 8 / 1024 AS total_size_mb
FROM sys.master_files
GROUP BY database_id
ORDER BY total_size_mb DESC;Object space usage
SELECT TOP (50)
s.name AS schema_name,
o.name AS object_name,
SUM(a.total_pages) * 8 / 1024 AS total_mb,
SUM(a.used_pages) * 8 / 1024 AS used_mb,
SUM(a.data_pages) * 8 / 1024 AS data_mb
FROM sys.objects AS o
JOIN sys.schemas AS s
ON o.schema_id = s.schema_id
JOIN sys.partitions AS p
ON o.object_id = p.object_id
JOIN sys.allocation_units AS a
ON p.partition_id = a.container_id
WHERE o.is_ms_shipped = 0
GROUP BY s.name, o.name
ORDER BY total_mb DESC;Log reuse wait
SELECT
name,
recovery_model_desc,
log_reuse_wait_desc
FROM sys.databases
ORDER BY name;tempdb usage
SELECT
SUM(user_object_reserved_page_count) * 8 / 1024 AS user_objects_mb,
SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb,
SUM(version_store_reserved_page_count) * 8 / 1024 AS version_store_mb,
SUM(unallocated_extent_page_count) * 8 / 1024 AS free_space_mb
FROM tempdb.sys.dm_db_file_space_usage;Backup history size
SELECT TOP (100)
database_name,
type AS backup_type,
backup_start_date,
backup_finish_date,
DATEDIFF(second, backup_start_date, backup_finish_date) AS duration_sec,
backup_size / 1024 / 1024 AS backup_size_mb,
compressed_backup_size / 1024 / 1024 AS compressed_backup_size_mb
FROM msdb.dbo.backupset
ORDER BY backup_finish_date DESC;Diagnostic flow
Storage incident workflow:
1. Check disk free space
2. Check SQL file sizes and growth settings
3. Check recent autogrowth events
4. Check file latency by data, log and tempdb
5. Check log_reuse_wait_desc
6. Check tempdb usage
7. Check top objects and indexes by size
8. Check backup history and duration
9. Check current active requests and waits
10. Decide: capacity, query, index, log, tempdb or storage issue?Production DBA checklist for storage
This checklist is designed for production review. It covers physical storage, SQL files, log behavior, tempdb, growth settings, filegroups, capacity, backup and restore implications.
| Area | Question | Good signal | Bad signal |
|---|---|---|---|
| Data files | Are files pre-sized? | Stable size and predictable growth | Frequent emergency growth |
| Log files | Is log storage low-latency? | Low WRITELOG waits | Slow commits, log growth during peak |
| Autogrowth | Is growth fixed and realistic? | Fixed MB or GB growth | 1 MB growth or percent growth |
| tempdb | Is tempdb sized and monitored? | Stable usage, no frequent growth | Spills, contention, full tempdb |
| Filegroups | Are filegroups useful or accidental? | Clear admin or restore purpose | Complexity without benefit |
| Indexes | Are large indexes justified? | Usage evidence and maintenance plan | Unused large indexes |
| Backups | Does storage design support restore goals? | Restore duration tested | Backup exists but restore unknown |
| Capacity | Is growth forecasted? | Monthly capacity report | Disk alerts only when nearly full |
| I/O | Are data, log and tempdb latencies known? | Measured by file type | Single black-box storage metric |
| Monitoring | Are growth events and disk usage alerted? | Actionable alerts | Manual discovery after incident |
Daily checks
Daily:
1. Disk free space
2. Failed backups
3. Log backup chain
4. Log reuse wait
5. tempdb usage
6. Recent autogrowth events
7. File latency
8. Blocking caused by storage waits
9. Backup duration
10. SQL Agent storage alertsMonthly checks
Monthly:
1. Database growth trend
2. Top tables by size
3. Top indexes by size
4. Unused large indexes
5. tempdb peak usage
6. Restore test duration
7. Filegroup review
8. VLF review
9. Autogrowth setting review
10. Storage capacity forecastFinal storage diagnosis model
A solid SQL Server storage review answers:
1. Where is the data?
2. Where is the log?
3. Where is tempdb?
4. How fast are reads?
5. How fast are writes?
6. How often do files grow?
7. Which objects consume space?
8. Which indexes waste space?
9. Can we restore within RTO?
10. Can the storage survive the next 6 months of growth?T-SQL: SQL plus procedural power
T-SQL is Microsoft SQL Server's dialect of SQL. It combines relational SQL with procedural extensions: variables, batches, stored procedures, functions, transactions, error handling, temporary objects, table variables, dynamic SQL, cursors, triggers, JSON functions and administrative commands.
In production, T-SQL is not only a query language. It often becomes a stable API layer between applications and data. Stored procedures can centralize business rules, security, validation, logging and transaction control. But badly written T-SQL can also create hidden complexity, slow queries, blocking, parameter sniffing, tempdb pressure and maintenance pain.
What T-SQL is good at
- Set-based data processing: operate on sets of rows instead of loops.
- Transactional logic: keep related changes atomic and recoverable.
- Stored procedures: expose stable database-side APIs to applications.
- Reporting queries: joins, aggregations, windows, grouping and filtering.
- JSON integration: parse and produce JSON for APIs and hybrid schemas.
- Operational automation: maintenance scripts, health checks, DBA reports.
Mental map
T-SQL programming
|
+-- Query language
| +-- SELECT
| +-- JOIN
| +-- GROUP BY
| +-- window functions
|
+-- Data modification
| +-- INSERT
| +-- UPDATE
| +-- DELETE
| +-- MERGE
|
+-- Programmability
| +-- stored procedures
| +-- scalar functions
| +-- inline table-valued functions
| +-- triggers
|
+-- Reliability
| +-- transactions
| +-- TRY/CATCH
| +-- THROW
| +-- XACT_STATE
|
+-- Semi-structured data
+-- JSON_VALUE
+-- JSON_QUERY
+-- OPENJSON
+-- FOR JSONDBA warning
T-SQL can be the best friend or the worst enemy.
Good T-SQL:
- set-based
- parameterized
- predictable transactions
- measured with execution plans
- clear error handling
- documented procedures
Bad T-SQL:
- row-by-row loops
- SELECT *
- implicit conversions
- non-sargable predicates
- long transactions
- uncontrolled dynamic SQL
- hidden trigger side effectsProgramming object comparison
| Object | Main use | Strength | Risk |
|---|---|---|---|
| Stored procedure | Database-side API and transaction logic | Stable interface, security, reuse | Business logic hidden in database |
| Inline table-valued function | Parameterized reusable query | Often optimizes well like a view | Can still hide complexity |
| Scalar function | Reusable scalar calculation | Readable abstraction | Can hurt performance if executed row by row |
| Trigger | Audit, validation, side effects on DML | Automatic enforcement | Invisible behavior and cascading problems |
| View | Reusable query abstraction | Security and simplification | Nested views can become unreadable |
| Dynamic SQL | Runtime-built queries | Flexible search and admin scripts | SQL injection and plan cache pollution |
Core T-SQL syntax
T-SQL extends standard SQL with variables, control flow, batches, temporary objects, system functions and SQL Server-specific expressions. The most important habit is to write set-based, deterministic and measurable code.
Variables and basic flow
DECLARE @customer_id int = 1001;
DECLARE @min_amount decimal(12,2) = 500.00;
DECLARE @status varchar(20) = 'ACTIVE';
IF @status = 'ACTIVE'
BEGIN
SELECT
order_id,
order_date,
total_amount
FROM dbo.orders
WHERE customer_id = @customer_id
AND total_amount >= @min_amount;
END
ELSE
BEGIN
SELECT 'Customer is not active' AS message;
END;CASE expression
SELECT
order_id,
total_amount,
CASE
WHEN total_amount >= 10000 THEN 'HIGH'
WHEN total_amount >= 1000 THEN 'MEDIUM'
ELSE 'LOW'
END AS order_band
FROM dbo.orders;Common building blocks
| Feature | Example | Use |
|---|---|---|
| Variable | DECLARE @id int | Store a local value |
| Batch separator | GO | Client-side batch separation |
| Temporary table | #orders | Store intermediate results |
| Table variable | @items table | Small scoped rowset |
| CTE | WITH cte AS (...) | Readable query decomposition |
| Window function | ROW_NUMBER() | Rank, running totals, analytics |
CTE and window function
WITH ranked_orders AS (
SELECT
customer_id,
order_id,
order_date,
total_amount,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date DESC
) AS rn
FROM dbo.orders
)
SELECT
customer_id,
order_id,
order_date,
total_amount
FROM ranked_orders
WHERE rn = 1;Set-based versus row-by-row thinking
| Approach | Pattern | Performance profile | Recommended? |
|---|---|---|---|
| Set-based | One statement processes many rows | Optimizer can choose efficient plan | Yes, default choice |
| Cursor | One row at a time | Often slow and log-heavy | Only when truly required |
| WHILE loop | Manual iteration | Can be slow at scale | Use for controlled batches only |
| Batch processing | Set-based chunks | Good for large updates and deletes | Excellent for maintenance |
Batch delete example
WHILE 1 = 1
BEGIN
DELETE TOP (5000)
FROM dbo.audit_log
WHERE created_at < DATEADD(day, -180, SYSUTCDATETIME());
IF @@ROWCOUNT = 0
BREAK;
WAITFOR DELAY '00:00:01';
END;Stored procedures as database-side APIs
Stored procedures are one of the most important SQL Server programming tools. They can encapsulate validation, transactions, security, logging and stable access patterns. A clean procedure is predictable, parameterized, small enough to maintain, and measurable through Query Store or execution plans.
Good stored procedure design
- Clear name: use a verb and business object.
- Typed parameters: match column data types to avoid implicit conversions.
- SET NOCOUNT ON: reduce unnecessary row count messages.
- Transaction scope: keep it as short as possible.
- Error handling: use TRY/CATCH and THROW.
- Security: grant EXECUTE instead of direct table access when appropriate.
- Observability: log business errors and monitor performance.
Procedure template
CREATE OR ALTER PROCEDURE dbo.create_customer_order
@customer_id int,
@order_date datetime2(0),
@created_by sysname
AS
BEGIN
SET NOCOUNT ON;
SET XACT_ABORT ON;
BEGIN TRY
BEGIN TRAN;
IF NOT EXISTS (
SELECT 1
FROM dbo.customers
WHERE customer_id = @customer_id
AND status = 'ACTIVE'
)
BEGIN
THROW 50001, 'Customer is not active or does not exist.', 1;
END;
INSERT INTO dbo.orders (
customer_id,
order_date,
created_by,
created_at
)
VALUES (
@customer_id,
@order_date,
@created_by,
SYSUTCDATETIME()
);
SELECT SCOPE_IDENTITY() AS new_order_id;
COMMIT;
END TRY
BEGIN CATCH
IF XACT_STATE() <> 0
ROLLBACK;
THROW;
END CATCH;
END;Procedure design table
| Design point | Good practice | Bad practice | Risk |
|---|---|---|---|
| Naming | dbo.create_customer_order | sp_process | Ambiguous maintenance |
| Parameters | Match table column types | Use generic varchar for everything | Implicit conversion and bad plans |
| Result shape | Stable columns and types | Different result sets by branch | Application fragility |
| Transactions | Short and explicit | Long transaction around slow work | Blocking and log growth |
| Error handling | TRY/CATCH, XACT_STATE, THROW | Ignore errors or return magic codes only | Silent data corruption or partial work |
Execution and permissions
GRANT EXECUTE ON dbo.create_customer_order TO app_role;
EXEC dbo.create_customer_order
@customer_id = 1001,
@order_date = '2026-01-15T10:30:00',
@created_by = 'api-service';Functions: useful, but dangerous if misunderstood
T-SQL functions help reuse logic, but not all functions have the same performance profile. Inline table-valued functions are often excellent because the optimizer can expand them. Multi-statement table-valued functions and scalar functions can be problematic when they hide row-by-row execution, poor estimates or expensive logic.
Function types
| Type | Returns | Performance profile |
|---|---|---|
| Scalar function | Single value | Can be costly when called per row |
| Inline TVF | Table expression | Usually best function pattern |
| Multi-statement TVF | Table variable built inside function | Can suffer from poor estimates |
Inline TVF example
CREATE OR ALTER FUNCTION dbo.get_customer_orders
(
@customer_id int,
@from_date date
)
RETURNS TABLE
AS
RETURN
(
SELECT
o.order_id,
o.customer_id,
o.order_date,
o.total_amount,
o.status
FROM dbo.orders AS o
WHERE o.customer_id = @customer_id
AND o.order_date >= @from_date
);Usage
SELECT
order_id,
order_date,
total_amount
FROM dbo.get_customer_orders(1001, '2026-01-01')
WHERE status = 'PAID'
ORDER BY order_date DESC;Scalar function warning
-- Dangerous pattern when executed for many rows
SELECT
order_id,
dbo.calculate_discount(total_amount, customer_id) AS discount
FROM dbo.orders;
-- Better options:
-- 1. inline the expression when simple
-- 2. use an inline table-valued function
-- 3. precompute in a persisted computed column if appropriate
-- 4. move complex logic outside hot queriesFunction decision table
| Need | Recommended object | Why |
|---|---|---|
| Reusable filtered query | Inline TVF | Composable and optimizer-friendly |
| Simple formatting | Application layer | Avoid doing presentation work in database |
| Heavy calculation for every row | Precompute or redesign | Scalar row-by-row cost can be high |
| Security filter | Inline TVF or row-level security predicate | Centralized and reusable |
Transactions: ACID in practice
Transactions make a group of changes atomic. Either all changes are committed, or the work is rolled back. In SQL Server, transaction design directly affects locks, blocking, log growth, deadlocks, recovery time and application latency.
ACID reminder
| Property | Meaning | SQL Server mechanism |
|---|---|---|
| Atomicity | All or nothing | Transaction log and rollback |
| Consistency | Rules remain valid | Constraints, triggers, application logic |
| Isolation | Concurrent work is controlled | Locks, row versioning, isolation levels |
| Durability | Committed work survives crash | Transaction log flush |
Safe transfer example
CREATE OR ALTER PROCEDURE dbo.transfer_funds
@from_account_id int,
@to_account_id int,
@amount decimal(19,4)
AS
BEGIN
SET NOCOUNT ON;
SET XACT_ABORT ON;
BEGIN TRY
BEGIN TRAN;
UPDATE dbo.accounts
SET balance = balance - @amount
WHERE account_id = @from_account_id
AND balance >= @amount;
IF @@ROWCOUNT <> 1
THROW 50010, 'Insufficient funds or source account missing.', 1;
UPDATE dbo.accounts
SET balance = balance + @amount
WHERE account_id = @to_account_id;
IF @@ROWCOUNT <> 1
THROW 50011, 'Target account missing.', 1;
INSERT INTO dbo.account_movements (
from_account_id,
to_account_id,
amount,
created_at
)
VALUES (
@from_account_id,
@to_account_id,
@amount,
SYSUTCDATETIME()
);
COMMIT;
END TRY
BEGIN CATCH
IF XACT_STATE() <> 0
ROLLBACK;
THROW;
END CATCH;
END;Isolation levels
| Isolation level | Behavior | Risk / cost | Typical use |
|---|---|---|---|
| READ UNCOMMITTED | Can read uncommitted data | Dirty, duplicated or missing rows | Rarely justified |
| READ COMMITTED | Default locking behavior | Readers can wait for writers | Classic OLTP default |
| READ COMMITTED SNAPSHOT | Readers use row versions | tempdb version store pressure | Reduce read/write blocking |
| SNAPSHOT | Transaction-level consistent version | Update conflicts possible | Consistent reads without blocking |
| SERIALIZABLE | Strongest isolation with range locks | Blocking and deadlock risk | Strict correctness cases |
Transaction monitoring
SELECT
s.session_id,
s.login_name,
s.host_name,
r.status,
r.command,
r.blocking_session_id,
r.wait_type,
r.total_elapsed_time,
t.text AS sql_text
FROM sys.dm_exec_sessions AS s
JOIN sys.dm_exec_requests AS r
ON s.session_id = r.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE s.is_user_process = 1
ORDER BY r.total_elapsed_time DESC;TRY/CATCH and reliable error handling
Error handling is not decoration. It decides whether a failed operation leaves the database clean or inconsistent. SQL Server procedures that modify data should normally use TRY/CATCH, XACT_ABORT, XACT_STATE and THROW.
Core functions
| Function / command | Purpose |
|---|---|
| ERROR_NUMBER() | Returns the error number |
| ERROR_MESSAGE() | Returns the error message |
| ERROR_LINE() | Returns the line where error occurred |
| ERROR_PROCEDURE() | Returns procedure or trigger name when available |
| XACT_STATE() | Shows whether transaction is committable, doomed or absent |
| THROW | Raises or rethrows an error |
Robust error template
BEGIN TRY
SET XACT_ABORT ON;
BEGIN TRAN;
-- Write work here
INSERT INTO dbo.audit_event (
event_type,
event_payload,
created_at
)
VALUES (
'ORDER_CREATED',
N'{"status":"created"}',
SYSUTCDATETIME()
);
COMMIT;
END TRY
BEGIN CATCH
IF XACT_STATE() <> 0
ROLLBACK;
INSERT INTO dbo.error_log (
error_number,
error_message,
error_line,
error_procedure,
created_at
)
VALUES (
ERROR_NUMBER(),
ERROR_MESSAGE(),
ERROR_LINE(),
ERROR_PROCEDURE(),
SYSUTCDATETIME()
);
THROW;
END CATCH;XACT_STATE interpretation
| XACT_STATE | Meaning | Recommended action |
|---|---|---|
| 1 | Active and committable transaction | Usually rollback in CATCH unless business logic says otherwise |
| 0 | No active transaction | No rollback needed |
| -1 | Active but uncommittable transaction | Rollback required |
Anti-patterns in error handling
Bad:
- swallowing errors
- returning only 0 or 1 without detail
- COMMIT inside CATCH without checking state
- no rollback after partial data changes
- using RAISERROR inconsistently in modern code
- logging error but not rethrowing it
Better:
- TRY/CATCH around transaction
- SET XACT_ABORT ON
- XACT_STATE check
- structured error logging
- THROW to preserve error contextTriggers: powerful but dangerous
Triggers execute automatically after or instead of data modifications. They are useful for audit, history, validation and synchronization patterns. But they are also invisible to many developers, can create unexpected workload, and can turn simple DML into complex side effects.
Trigger rules
- Think set-based: inserted and deleted can contain many rows.
- Keep trigger short: no slow remote calls, no complex workflow.
- Never assume one row: multi-row INSERT, UPDATE and DELETE are normal.
- Document side effects: application teams must know they exist.
- Monitor performance: trigger cost is paid by the original DML statement.
Audit trigger example
CREATE OR ALTER TRIGGER dbo.trg_orders_audit
ON dbo.orders
AFTER UPDATE
AS
BEGIN
SET NOCOUNT ON;
INSERT INTO dbo.orders_audit (
order_id,
old_status,
new_status,
changed_at
)
SELECT
i.order_id,
d.status AS old_status,
i.status AS new_status,
SYSUTCDATETIME()
FROM inserted AS i
JOIN deleted AS d
ON i.order_id = d.order_id
WHERE ISNULL(i.status, '') <> ISNULL(d.status, '');
END;inserted and deleted tables
| DML operation | inserted contains | deleted contains |
|---|---|---|
| INSERT | New rows | No rows |
| DELETE | No rows | Old rows |
| UPDATE | New version of rows | Old version of rows |
Bad trigger pattern
-- Bad because it assumes only one row
DECLARE @order_id int;
SELECT @order_id = order_id
FROM inserted;
-- If inserted contains 500 rows, this keeps only one arbitrary row.
-- Always write triggers as set-based logic.Find triggers
SELECT
OBJECT_SCHEMA_NAME(parent_id) AS schema_name,
OBJECT_NAME(parent_id) AS table_name,
name AS trigger_name,
is_disabled,
create_date,
modify_date
FROM sys.triggers
WHERE parent_class_desc = 'OBJECT_OR_COLUMN'
ORDER BY schema_name, table_name, trigger_name;MERGE and UPSERT logic
MERGE is designed to synchronize source and target sets: update matching rows, insert missing rows, and optionally delete rows absent from the source. It is elegant, but in production it must be used carefully because concurrency, duplicate source rows and complex logic can create subtle bugs.
When MERGE is tempting
- Synchronizing staging table into target table.
- Loading dimension tables in a warehouse.
- Applying changes from an external system.
- Building idempotent import routines.
MERGE warnings
- Source must be unique: duplicate source keys can break logic.
- Concurrency matters: two sessions can race without proper locking.
- Test triggers: MERGE can fire triggers with inserted and deleted sets.
- Keep it simple: complex MERGE statements become hard to audit.
MERGE example
MERGE dbo.product AS target
USING dbo.stage_product AS source
ON target.product_code = source.product_code
WHEN MATCHED THEN
UPDATE SET
target.product_name = source.product_name,
target.price = source.price,
target.updated_at = SYSUTCDATETIME()
WHEN NOT MATCHED BY TARGET THEN
INSERT (
product_code,
product_name,
price,
created_at
)
VALUES (
source.product_code,
source.product_name,
source.price,
SYSUTCDATETIME()
)
OUTPUT
$action AS merge_action,
inserted.product_code,
inserted.product_name;Alternative UPSERT pattern
BEGIN TRAN;
UPDATE tgt
SET
tgt.product_name = src.product_name,
tgt.price = src.price,
tgt.updated_at = SYSUTCDATETIME()
FROM dbo.product AS tgt
JOIN dbo.stage_product AS src
ON tgt.product_code = src.product_code;
INSERT INTO dbo.product (
product_code,
product_name,
price,
created_at
)
SELECT
src.product_code,
src.product_name,
src.price,
SYSUTCDATETIME()
FROM dbo.stage_product AS src
WHERE NOT EXISTS (
SELECT 1
FROM dbo.product AS tgt
WHERE tgt.product_code = src.product_code
);
COMMIT;MERGE decision table
| Scenario | MERGE fit | Preferred approach |
|---|---|---|
| Simple warehouse dimension load | Good if tested | MERGE or separated UPDATE/INSERT |
| High-concurrency OLTP upsert | Risky | Explicit transaction with locking strategy |
| Complex business rules | Risky | Separate statements for readability |
| Need detailed audit output | Possible | MERGE with OUTPUT, or explicit audit table |
JSON and XML in SQL Server
SQL Server can store and process semi-structured data. JSON is commonly used for API payloads, application metadata, audit details, configuration and flexible attributes. XML is older and more strongly typed, with XML data type, XQuery and XML indexes.
JSON functions
| Function | Use |
|---|---|
| ISJSON | Validate JSON text |
| JSON_VALUE | Extract scalar value |
| JSON_QUERY | Extract JSON object or array |
| JSON_MODIFY | Update JSON text |
| OPENJSON | Parse JSON into rows |
| FOR JSON | Return query result as JSON |
Extract JSON values
DECLARE @payload nvarchar(max) = N'{
"customer": "ACME",
"amount": 1200.50,
"items": [
{ "sku": "A-100", "qty": 2 },
{ "sku": "B-200", "qty": 1 }
]
}';
SELECT
JSON_VALUE(@payload, '$.customer') AS customer_name,
TRY_CAST(JSON_VALUE(@payload, '$.amount') AS decimal(12,2)) AS amount,
JSON_QUERY(@payload, '$.items') AS items_json;Parse JSON array
SELECT
sku,
qty
FROM OPENJSON(@payload, '$.items')
WITH (
sku varchar(30) '$.sku',
qty int '$.qty'
);Return JSON from relational rows
SELECT
c.customer_id,
c.customer_name,
(
SELECT
o.order_id,
o.order_date,
o.total_amount
FROM dbo.orders AS o
WHERE o.customer_id = c.customer_id
FOR JSON PATH
) AS orders
FROM dbo.customers AS c
WHERE c.customer_id = 1001
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER;JSON design table
| Use case | Good fit? | Warning |
|---|---|---|
| API payload archive | Yes | Keep searchable fields relational |
| Flexible attributes | Sometimes | Can become schema chaos |
| High-frequency filtered fields | No, unless indexed carefully | Extract to computed columns when needed |
| Audit details | Yes | Validate JSON and control size |
Computed column for JSON search
ALTER TABLE dbo.api_event
ADD customer_code AS JSON_VALUE(payload_json, '$.customerCode');
CREATE INDEX IX_api_event_customer_code
ON dbo.api_event(customer_code);Dynamic SQL: flexibility with strict safety
Dynamic SQL is useful when table names, column lists, filters or administrative statements must be built at runtime. It is also one of the easiest ways to create SQL injection, plan cache pollution and unreadable systems.
Good dynamic SQL rules
- Use sp_executesql: parameterize values.
- Use QUOTENAME: protect identifiers such as schema, table or column names.
- Never concatenate user values: values must be parameters.
- Log generated SQL: helpful for debugging and audits.
- Keep scope limited: dynamic SQL should solve a real problem, not become default style.
Safe parameterized dynamic SQL
DECLARE @sql nvarchar(max);
DECLARE @customer_id int = 1001;
DECLARE @from_date date = '2026-01-01';
SET @sql = N'
SELECT
order_id,
order_date,
total_amount
FROM dbo.orders
WHERE customer_id = @p_customer_id
AND order_date >= @p_from_date
ORDER BY order_date DESC;
';
EXEC sys.sp_executesql
@sql,
N'@p_customer_id int, @p_from_date date',
@p_customer_id = @customer_id,
@p_from_date = @from_date;Dynamic object name example
DECLARE @schema_name sysname = N'dbo';
DECLARE @table_name sysname = N'orders';
DECLARE @sql nvarchar(max);
SET @sql = N'
SELECT COUNT(*) AS row_count
FROM ' + QUOTENAME(@schema_name) + N'.' + QUOTENAME(@table_name) + N';';
EXEC sys.sp_executesql @sql;Dangerous pattern
-- Bad: value concatenation creates SQL injection risk
SET @sql = N'
SELECT *
FROM dbo.users
WHERE user_name = ''' + @user_input + N''';
';
-- If @user_input contains malicious text, the query is compromised.
-- Use sp_executesql with parameters instead.Dynamic SQL decision table
| Need | Dynamic SQL justified? | Safer option |
|---|---|---|
| Variable filter values | No | Regular parameterized SQL |
| Variable table name | Yes | QUOTENAME plus whitelist |
| Optional filters | Sometimes | sp_executesql with parameters |
| Administrative script over many databases | Yes | Controlled dynamic SQL with logging |
T-SQL performance patterns
T-SQL performance is mostly about giving the optimizer good options: sargable predicates, useful indexes, accurate statistics, correct data types, controlled transactions, and set-based logic. Most slow T-SQL is not slow because SQL Server is weak. It is slow because the code hides the search condition, reads too much, loops row by row, or forces bad estimates.
| Anti-pattern | Consequence | Better pattern |
|---|---|---|
| SELECT * | Extra I/O, wider memory grants, unstable application coupling | Select only required columns |
| Function on filtered column | Index seek may become scan | Rewrite predicate to be sargable |
| Implicit conversion | Bad estimates or index not used | Match parameter and column data types |
| Cursor for mass update | Slow row-by-row execution | Set-based update or controlled batches |
| Long transaction | Blocking, log growth, deadlocks | Short transaction scope |
| NOLOCK everywhere | Dirty and inconsistent reads | Proper isolation strategy |
Non-sargable predicate
-- Bad: function applied to column
SELECT order_id
FROM dbo.orders
WHERE YEAR(order_date) = 2026;
-- Better: range predicate
SELECT order_id
FROM dbo.orders
WHERE order_date >= '2026-01-01'
AND order_date < '2027-01-01';Implicit conversion risk
-- Bad if customer_code is varchar
DECLARE @customer_code nvarchar(30) = N'ACME';
SELECT customer_id
FROM dbo.customers
WHERE customer_code = @customer_code;
-- Better: parameter type matches column type
DECLARE @customer_code_v varchar(30) = 'ACME';
SELECT customer_id
FROM dbo.customers
WHERE customer_code = @customer_code_v;Covering index example
CREATE INDEX IX_orders_customer_date
ON dbo.orders(customer_id, order_date)
INCLUDE (total_amount, status);
SELECT
order_date,
total_amount,
status
FROM dbo.orders
WHERE customer_id = 1001
AND order_date >= '2026-01-01'
ORDER BY order_date DESC;Parameter sniffing symptom
Same procedure:
- fast for small customer
- slow for large customer
- same cached plan reused
- data distribution is skewed
Possible actions:
- better indexes
- update statistics
- Query Store analysis
- PSP on SQL Server 2022+
- OPTION RECOMPILE for specific cases
- procedure redesignTop query DMV
SELECT TOP (25)
qs.execution_count,
qs.total_worker_time / 1000 AS total_cpu_ms,
qs.total_elapsed_time / 1000 AS total_elapsed_ms,
qs.total_logical_reads,
qs.total_logical_writes,
SUBSTRING(
st.text,
(qs.statement_start_offset / 2) + 1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(st.text)
ELSE qs.statement_end_offset
END - qs.statement_start_offset) / 2) + 1
) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;DBA and developer checklist for T-SQL
This checklist is designed for reviewing production T-SQL code. It helps detect reliability, performance, security and maintainability risks before they become incidents.
| Area | Question | Good signal | Bad signal |
|---|---|---|---|
| Parameters | Do parameter types match columns? | No implicit conversions | Generic nvarchar parameters everywhere |
| Transactions | Is transaction scope short? | Only required writes inside transaction | User interaction or slow work inside transaction |
| Error handling | Does code use TRY/CATCH correctly? | Rollback and THROW | Errors swallowed silently |
| Predicates | Are filters sargable? | Range filters and direct column comparison | Functions applied to indexed columns |
| Indexes | Does the query have a useful access path? | Seek or justified scan | Large scan for selective lookup |
| Dynamic SQL | Is it parameterized? | sp_executesql and QUOTENAME | String concatenation of user input |
| Triggers | Are triggers documented and set-based? | Short and auditable | Hidden complex side effects |
| JSON | Are searchable fields indexed or extracted? | Computed columns when needed | Full JSON scan on hot workload |
| MERGE | Is source unique and concurrency tested? | Controlled staging and tests | Blind MERGE in OLTP hot path |
| Observability | Can performance be measured? | Query Store and clear procedure names | Anonymous ad hoc workload only |
Code review checklist
Before production:
1. Check execution plan
2. Check logical reads
3. Check CPU and duration
4. Check missing or excessive indexes
5. Check transaction scope
6. Check error handling
7. Check parameter data types
8. Check concurrency behavior
9. Check tempdb spills
10. Check Query Store after deploymentIncident checklist
When T-SQL causes an incident:
1. Identify SQL text or procedure
2. Capture plan
3. Check waits
4. Check blocking
5. Check Query Store history
6. Compare previous plan
7. Check statistics and indexes
8. Check parameter values
9. Apply safe mitigation
10. Document permanent fixFinal T-SQL design model
Good T-SQL is:
Correct:
- transactionally safe
- error-aware
- constraint-friendly
Fast:
- set-based
- sargable
- indexed correctly
- measured with plans and Query Store
Secure:
- parameterized
- least privilege
- no SQL injection
Maintainable:
- readable names
- small procedures
- documented side effects
- testable behaviorIndexes are physical access paths
An index is not just a tuning object. It is a physical structure stored in pages and extents. Every index consumes disk, memory, transaction log, backup space and maintenance time. A good index can reduce millions of logical reads. A bad index can slow every insert, update and delete.
SQL Server indexing is the art of matching workload patterns with physical access paths: WHERE filters, JOIN predicates, ORDER BY clauses, GROUP BY patterns, foreign keys, reporting scans, update frequency and data distribution.
Core principle
A useful index is not defined by beauty.
It is defined by evidence:
1. Which query needs it?
2. Which predicate does it support?
3. How selective is the key?
4. Does it avoid a scan?
5. Does it avoid a sort?
6. Does it avoid a key lookup?
7. What is the write cost?
8. How often is it used?
9. How large is it?
10. Can Query Store prove the benefit?Index family map
SQL Server indexes
|
+-- Rowstore indexes
| |
| +-- Clustered index
| +-- Nonclustered index
| +-- Unique index
| +-- Filtered index
| +-- Included columns
|
+-- Columnstore indexes
| |
| +-- Clustered columnstore
| +-- Nonclustered columnstore
|
+-- Specialized indexes
|
+-- Full-text index
+-- XML index
+-- Spatial index
+-- Memory-optimized indexesRead versus write trade-off
One extra index can improve:
- SELECT with selective predicates
- JOIN lookup
- ORDER BY without sort
- GROUP BY with ordered access
- covering query performance
But it also costs:
- more pages on disk
- more memory pressure
- more log records
- slower INSERT
- slower UPDATE
- slower DELETE
- longer backups
- longer maintenanceIndex types overview
| Index type | Best use | Strength | Warning |
|---|---|---|---|
| Clustered index | Main table access path | Defines leaf-level row order | Bad clustered key affects every nonclustered index |
| Nonclustered index | Secondary access path | Fast seeks and covering queries | Too many indexes slow writes |
| Unique index | Enforce uniqueness | Data integrity plus optimizer knowledge | Must match real business rule |
| Filtered index | Small subset of rows | Very efficient for selective states | Predicate must match query pattern |
| Columnstore index | Analytics and large scans | Compression and batch mode | Not always ideal for pure OLTP |
| Full-text index | Linguistic text search | Better than LIKE '%word%' | Not a full replacement for Elasticsearch/Lucene |
B-tree rowstore index model
Most traditional SQL Server indexes are B-tree structures. The root page points to intermediate pages, which point to leaf pages. For a clustered index, the leaf level is the actual data. For a nonclustered index, the leaf level contains index keys plus a row locator or included columns.
B-tree diagram
Root page
|
+-- Intermediate page A
| |
| +-- Leaf page A1
| +-- Leaf page A2
|
+-- Intermediate page B
|
+-- Leaf page B1
+-- Leaf page B2
Index seek:
root -> intermediate -> leaf -> matching rows
Index scan:
read many or all leaf pagesSeek, scan, lookup
| Operation | Meaning | Good / bad? |
|---|---|---|
| Index Seek | Targeted access using index key order | Usually good for selective queries |
| Index Scan | Reads many pages from an index | Good for large analytical reads, bad for selective OLTP |
| Key Lookup | Fetch missing columns from clustered index | Good for few rows, bad for many rows |
| RID Lookup | Fetch missing columns from a heap | Often a sign the heap design needs review |
| Bookmark Lookup | Generic term for lookup back to base row | Can dominate query cost |
Access path intuition
Selective predicate:
WHERE customer_id = 1001
-> index seek can be excellent
Broad predicate:
WHERE order_date >= '2020-01-01'
-> scan may be legitimate
Important:
seek is not always good
scan is not always bad
the real question is:
how many pages are read for the business result?Logical reads are the first performance currency
| Signal | Meaning | Index interpretation |
|---|---|---|
| Low logical reads | Few pages read from memory | Good access path for the query |
| High logical reads | Many pages read even from cache | Possible scan, bad predicate or wide index |
| High physical reads | Pages had to be read from disk | Memory pressure or cold cache |
| Large key lookup count | Repeated base-row lookups | Consider covering index or query rewrite |
Measure logical reads
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT
order_id,
order_date,
total_amount
FROM dbo.orders
WHERE customer_id = 1001
AND order_date >= '2026-01-01';
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;Clustered index: the table’s main physical path
A clustered index defines the logical order of rows at the leaf level. In SQL Server, a table can have only one clustered index because the data rows themselves live at the leaf level of that structure. The clustered key is also stored in nonclustered indexes as the row locator, so choosing it badly has a global cost.
Good clustered key qualities
- Narrow: small key means smaller nonclustered indexes.
- Stable: avoid updates to the clustered key.
- Unique or made unique: duplicate clustered keys require internal uniquifier overhead.
- Usually increasing: reduces random page splits for heavy insert workloads.
- Useful for access pattern: should support the way the table is queried.
Clustered index diagram
Clustered index on orders(order_id)
Root
|
+-- Intermediate
|
+-- Leaf pages
|
+-- Full data row order_id = 1
+-- Full data row order_id = 2
+-- Full data row order_id = 3
The leaf level IS the table data.Create clustered index
CREATE TABLE dbo.orders
(
order_id bigint NOT NULL,
customer_id int NOT NULL,
order_date datetime2(0) NOT NULL,
total_amount decimal(12,2) NOT NULL,
status varchar(20) NOT NULL,
CONSTRAINT PK_orders
PRIMARY KEY CLUSTERED (order_id)
);Clustered key decision table
| Candidate key | Pros | Cons | Verdict |
|---|---|---|---|
| bigint identity | Narrow, increasing, simple | Can create last-page insert hotspot under extreme concurrency | Very common OLTP choice |
| random uniqueidentifier | Globally unique | Wide, random inserts, page splits, fragmentation | Usually bad as clustered key |
| sequential GUID | More insert-friendly than random GUID | Still wide | Acceptable when GUID is required |
| natural business key | Meaningful and searchable | May be wide, mutable or not truly stable | Use only if stable and narrow |
| date key | Useful for time-range queries | Duplicates, hot ranges, poor uniqueness alone | Usually needs composite design |
Find clustered indexes
SELECT
s.name AS schema_name,
o.name AS table_name,
i.name AS clustered_index_name,
i.is_unique,
i.fill_factor
FROM sys.indexes AS i
JOIN sys.objects AS o
ON i.object_id = o.object_id
JOIN sys.schemas AS s
ON o.schema_id = s.schema_id
WHERE i.type_desc = 'CLUSTERED'
AND o.is_ms_shipped = 0
ORDER BY s.name, o.name;Nonclustered indexes: secondary access paths
A nonclustered index is a separate B-tree structure. Its leaf pages contain the nonclustered key, optional included columns, and a row locator. It is designed to help specific query patterns without changing the clustered structure of the table.
Nonclustered index anatomy
Nonclustered index:
key columns:
customer_id, order_date
included columns:
total_amount, status
row locator:
clustered key value
or RID if table is a heap
Query can be covered if all required columns are in the index.Example
CREATE INDEX IX_orders_customer_date
ON dbo.orders(customer_id, order_date DESC)
INCLUDE (total_amount, status);Query it supports
SELECT
order_date,
total_amount,
status
FROM dbo.orders
WHERE customer_id = 1001
AND order_date >= '2026-01-01'
ORDER BY order_date DESC;Why it works
WHERE:
customer_id = equality predicate
order_date = range predicate
ORDER BY:
order_date DESC matches index order
SELECT:
total_amount and status are included
Result:
seek + ordered read + no key lookupKey order rules
| Rule | Explanation | Example |
|---|---|---|
| Equality first | Columns filtered with equals are often strong leading keys | customer_id = @id |
| Range after equality | Range columns work well after equality keys | order_date >= @date |
| Order matters | Index key order can avoid explicit sort | ORDER BY order_date DESC |
| Join keys matter | Foreign keys often need supporting indexes | orders.customer_id |
| Do not over-cover | Included columns increase index size | Avoid including large unused columns |
Multiple query patterns: one index is not universal
Pattern A:
WHERE customer_id = @customer_id
ORDER BY order_date DESC
Good index:
(customer_id, order_date DESC)
Pattern B:
WHERE order_date >= @from_date
AND status = 'PAID'
Possible index:
(status, order_date)
Pattern C:
WHERE sales_rep_id = @rep
AND status = 'OPEN'
Possible index:
(sales_rep_id, status)
One table can need several indexes, but each index must be justified.Included columns: covering without changing key order
Included columns live at the leaf level of a nonclustered index. They do not participate in the B-tree navigation order, but they can satisfy SELECT columns and avoid key lookups. They are excellent when used carefully, but dangerous when they turn every index into a copy of the table.
Key versus included columns
| Part | Used for navigation? | Used for covering? |
|---|---|---|
| Key column | Yes | Yes |
| Included column | No | Yes |
Covering index example
CREATE INDEX IX_invoice_customer_date
ON dbo.invoice(customer_id, invoice_date)
INCLUDE (invoice_number, total_amount, payment_status);Covered query
SELECT
invoice_number,
invoice_date,
total_amount,
payment_status
FROM dbo.invoice
WHERE customer_id = @customer_id
AND invoice_date >= @from_date;Effect
Without included columns:
seek index -> key lookup for missing columns
With included columns:
seek index -> return directly from index leaf
Benefit:
fewer random lookups
fewer logical reads
lower CPU
better latencyIncluded column decision table
| Column type | Good include? | Reason |
|---|---|---|
| Small frequently selected column | Yes | Good covering value with limited size cost |
| Large varchar(max) or nvarchar(max) | Usually no | Can bloat index and hurt memory |
| Column used only in WHERE equality | Key, not include | Must help navigation |
| Column used only in SELECT | Include candidate | Can avoid lookup |
| Frequently updated column | Be careful | Every update must maintain the index |
Detect lookup-heavy plans
-- Use actual execution plan and look for:
-- - Key Lookup
-- - RID Lookup
-- - high number of executions
-- - high logical reads on base table
-- - nested loops repeatedly calling lookup
-- Then decide:
-- 1. add included columns
-- 2. rewrite query
-- 3. accept lookup if row count is low
-- 4. create a different index if predicate is wrongFiltered indexes: small, precise and powerful
A filtered index indexes only rows that match a predicate. This can be extremely efficient for sparse or status-based data, such as active rows, unprocessed jobs, open orders, non-deleted records or rows with a specific type.
Good filtered index cases
- Soft delete:
WHERE is_deleted = 0 - Open workflow:
WHERE status = 'OPEN' - Queue processing:
WHERE processed_at IS NULL - Sparse data:
WHERE optional_code IS NOT NULL - Hot subset: small active subset inside a huge historical table
Filtered index example
CREATE INDEX IX_orders_open_customer_date
ON dbo.orders(customer_id, order_date)
INCLUDE (total_amount, status)
WHERE status = 'OPEN'
AND is_deleted = 0;Query must match filter
SELECT
order_id,
order_date,
total_amount
FROM dbo.orders
WHERE customer_id = @customer_id
AND status = 'OPEN'
AND is_deleted = 0
ORDER BY order_date DESC;Why it is efficient
Full table:
100,000,000 rows
Open active orders:
350,000 rows
Filtered index:
only stores open active rows
Result:
smaller index
less memory
less I/O
faster maintenance
better selectivityFiltered index warnings
| Problem | Why it hurts | Fix |
|---|---|---|
| Query predicate does not match filter | Optimizer may not use the filtered index | Align query and filter exactly |
| Parameterization hides value | Optimizer may not prove the filter applies | Test with real procedure pattern |
| Filter is too broad | Index becomes almost full-table | Use only genuinely selective filters |
| Filter changes frequently | Rows constantly enter and leave index | Measure write cost |
Filtered unique index example
-- Allow only one active subscription per customer
CREATE UNIQUE INDEX UX_subscription_one_active
ON dbo.subscription(customer_id)
WHERE status = 'ACTIVE';Columnstore indexes: analytics and compression
Columnstore indexes store data by columns instead of rows. They are excellent for analytical workloads: large scans, aggregations, fact tables, reporting and data warehouses. They can compress strongly and use batch mode execution, which can greatly reduce CPU for large sets.
Columnstore concepts
| Concept | Meaning |
|---|---|
| Rowgroup | Large group of rows compressed together |
| Column segment | Column data inside a rowgroup |
| Delta store | Rowstore area for new rows before compression |
| Tuple mover | Background process that compresses rowgroups |
| Segment elimination | Skip segments based on min/max metadata |
| Batch mode | Vectorized execution for many rows at once |
Columnstore architecture
Fact table
|
+-- Rowgroup 1
| +-- column segment: date_key
| +-- column segment: product_key
| +-- column segment: quantity
| +-- column segment: amount
|
+-- Rowgroup 2
| +-- column segment: date_key
| +-- column segment: product_key
| +-- column segment: quantity
| +-- column segment: amount
|
v
Compression + batch mode + segment eliminationCreate clustered columnstore
CREATE CLUSTERED COLUMNSTORE INDEX CCI_fact_sales
ON dbo.fact_sales;Create nonclustered columnstore
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_orders_analytics
ON dbo.orders (
customer_id,
order_date,
status,
total_amount
);Columnstore fit table
| Workload | Columnstore fit | Reason |
|---|---|---|
| Large fact table scans | Excellent | Compression and batch mode |
| Aggregations over millions of rows | Excellent | Columnar reads and vectorized operators |
| Small OLTP point lookup | Poor | Rowstore B-tree is usually better |
| Mixed OLTP/reporting table | Maybe | Nonclustered columnstore can help if tested |
| High-frequency singleton updates | Risky | Columnstore prefers batch-style operations |
Columnstore health
SELECT
OBJECT_SCHEMA_NAME(rg.object_id) AS schema_name,
OBJECT_NAME(rg.object_id) AS table_name,
i.name AS index_name,
rg.row_group_id,
rg.state_desc,
rg.total_rows,
rg.deleted_rows,
rg.size_in_bytes / 1024 / 1024 AS size_mb
FROM sys.dm_db_column_store_row_group_physical_stats AS rg
JOIN sys.indexes AS i
ON rg.object_id = i.object_id
AND rg.index_id = i.index_id
ORDER BY table_name, row_group_id;Statistics: the optimizer’s eyesight
Statistics describe data distribution. The optimizer uses them to estimate how many rows will flow through each operator. Bad estimates lead to bad joins, wrong memory grants, bad parallelism decisions, unnecessary scans and tempdb spills.
Statistics contain
- Header: metadata such as update time and row count.
- Density vector: information about uniqueness and combinations.
- Histogram: distribution of values for the leading statistics column.
Critical idea
The optimizer does not know the data perfectly.
It estimates.
Bad estimate:
expected rows = 1
actual rows = 1,000,000
Possible consequences:
wrong join type
wrong memory grant
spills to tempdb
nested loops disaster
underestimated parallelism
plan regressionInspect statistics
DBCC SHOW_STATISTICS (
'dbo.orders',
'IX_orders_customer_date'
);Update statistics
UPDATE STATISTICS dbo.orders IX_orders_customer_date;
UPDATE STATISTICS dbo.orders WITH FULLSCAN;
EXEC sp_updatestats;Statistics metadata
SELECT
s.name AS statistics_name,
s.auto_created,
s.user_created,
s.no_recompute,
sp.last_updated,
sp.rows,
sp.rows_sampled,
sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(
s.object_id,
s.stats_id
) AS sp
WHERE s.object_id = OBJECT_ID('dbo.orders')
ORDER BY sp.last_updated DESC;Statistics maintenance decision table
| Situation | Symptom | Action |
|---|---|---|
| Large data change | Plans suddenly poor | Update statistics on affected tables |
| Ascending key problem | New dates or IDs underestimated | Review statistics and compatibility features |
| Skewed distribution | Parameter-sensitive performance | Query Store, PSP, filtered stats or rewrite |
| Bulk load | Plans bad after load | Update stats after load window |
| Complex predicate | Wrong cardinality estimate | Composite index or computed column statistics |
Filtered statistics example
CREATE STATISTICS ST_orders_open_customer
ON dbo.orders(customer_id)
WHERE status = 'OPEN'
AND is_deleted = 0;Fragmentation: real, but often over-obsessed
Fragmentation means index pages are not physically or logically ordered as efficiently as they could be, or that page density is poor. It can hurt scans and increase I/O, but it is not always the first cause of poor performance. Statistics, bad plans and missing indexes often matter more.
Types of issues
| Issue | Meaning | Impact |
|---|---|---|
| Logical fragmentation | Pages not in logical order | Can hurt range scans |
| Low page density | Pages have too much empty space | More pages, more memory, more I/O |
| Page splits | Page split to make room for row | Extra writes and fragmentation |
| Forwarded records | Heap row moved with pointer left behind | Extra lookups in heaps |
Maintenance options
REORGANIZE:
- lighter operation
- online
- incremental cleanup
- useful for moderate fragmentation
REBUILD:
- rebuilds index structure
- updates statistics with fullscan for index stats
- more expensive
- more log generation
- online availability depends on edition/options
UPDATE STATISTICS:
- often more important than defragmentation
- cheaper than rebuild
- can fix bad estimatesExample commands
ALTER INDEX IX_orders_customer_date
ON dbo.orders
REORGANIZE;
ALTER INDEX IX_orders_customer_date
ON dbo.orders
REBUILD WITH (
FILLFACTOR = 90,
SORT_IN_TEMPDB = ON
);
UPDATE STATISTICS dbo.orders IX_orders_customer_date;Fragmentation report
SELECT
OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS table_name,
i.name AS index_name,
ips.index_type_desc,
ips.avg_fragmentation_in_percent,
ips.avg_page_space_used_in_percent,
ips.page_count
FROM sys.dm_db_index_physical_stats(
DB_ID(),
NULL,
NULL,
NULL,
'LIMITED'
) AS ips
JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND ips.index_id = i.index_id
WHERE ips.page_count > 1000
ORDER BY ips.avg_fragmentation_in_percent DESC;Maintenance decision table
| Condition | Recommended action | Comment |
|---|---|---|
| Small index under 1000 pages | Usually ignore | Fragmentation percentage is less meaningful |
| Moderate fragmentation | REORGANIZE or do nothing if no scan pain | Measure workload first |
| High fragmentation on large scanned index | REBUILD in maintenance window | Watch log, locks and edition options |
| Bad plan after data changes | UPDATE STATISTICS first | Often more useful than rebuild |
| Heavy write table | Review fill factor and key choice | Do not rebuild endlessly without fixing cause |
Indexing anti-patterns
Bad indexing is one of the most common causes of SQL Server performance problems. The danger is not only missing indexes. The opposite problem is also common: too many overlapping, unused, wide or badly ordered indexes.
| Anti-pattern | Consequence | Correction |
|---|---|---|
| Index every column | Write performance collapses | Index only measured query patterns |
| Many duplicate indexes | Storage and maintenance waste | Consolidate overlapping indexes |
| Wide clustered key | All nonclustered indexes become larger | Use narrow, stable clustered key |
| Random GUID clustered key | Page splits and fragmentation | Use sequential key or different clustering strategy |
| Included columns abuse | Indexes become table copies | Include only columns needed to cover hot queries |
| Ignoring foreign key indexes | Joins and deletes become expensive | Index important FK columns |
| Rebuild everything nightly | Log, I/O and maintenance waste | Targeted maintenance based on evidence |
| Trust missing index DMVs blindly | Index explosion | Consolidate and validate with workload |
Overlapping indexes example
IX_orders_customer:
(customer_id)
IX_orders_customer_date:
(customer_id, order_date)
IX_orders_customer_date_status:
(customer_id, order_date, status)
Possible issue:
first two may be redundant depending on workload.
DBA action:
compare usage
compare included columns
compare query patterns
consolidate carefully
test before droppingNon-sargable query problem
-- Bad: function on indexed column
SELECT order_id
FROM dbo.orders
WHERE YEAR(order_date) = 2026;
-- Better: range predicate
SELECT order_id
FROM dbo.orders
WHERE order_date >= '2026-01-01'
AND order_date < '2027-01-01';
Reason:
the second predicate can use index order.Implicit conversion example
-- If customer_code is varchar(30), this parameter can cause conversion problems
DECLARE @customer_code nvarchar(30) = N'ACME';
SELECT customer_id
FROM dbo.customers
WHERE customer_code = @customer_code;
-- Better: match the column type
DECLARE @customer_code_v varchar(30) = 'ACME';
SELECT customer_id
FROM dbo.customers
WHERE customer_code = @customer_code_v;Index DMV toolkit
Index decisions should be driven by evidence: usage, size, fragmentation, page count, logical reads, Query Store regressions, write cost and business criticality. DMVs are not perfect, but they give a strong operational starting point.
Index usage stats
SELECT
OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
OBJECT_NAME(i.object_id) AS table_name,
i.name AS index_name,
i.type_desc,
s.user_seeks,
s.user_scans,
s.user_lookups,
s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
ON s.object_id = i.object_id
AND s.index_id = i.index_id
AND s.database_id = DB_ID()
WHERE i.object_id > 100
ORDER BY s.user_updates DESC, s.user_seeks ASC;Index size report
SELECT
s.name AS schema_name,
o.name AS table_name,
i.name AS index_name,
i.type_desc,
SUM(a.total_pages) * 8 / 1024 AS total_mb,
SUM(a.used_pages) * 8 / 1024 AS used_mb,
SUM(a.data_pages) * 8 / 1024 AS data_mb
FROM sys.indexes AS i
JOIN sys.objects AS o
ON i.object_id = o.object_id
JOIN sys.schemas AS s
ON o.schema_id = s.schema_id
JOIN sys.partitions AS p
ON i.object_id = p.object_id
AND i.index_id = p.index_id
JOIN sys.allocation_units AS a
ON p.partition_id = a.container_id
WHERE o.is_ms_shipped = 0
GROUP BY s.name, o.name, i.name, i.type_desc
ORDER BY total_mb DESC;Missing index suggestions
SELECT TOP (50)
DB_NAME(mid.database_id) AS database_name,
OBJECT_NAME(mid.object_id, mid.database_id) AS table_name,
migs.user_seeks,
migs.avg_total_user_cost,
migs.avg_user_impact,
mid.equality_columns,
mid.inequality_columns,
mid.included_columns
FROM sys.dm_db_missing_index_group_stats AS migs
JOIN sys.dm_db_missing_index_groups AS mig
ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details AS mid
ON mig.index_handle = mid.index_handle
WHERE mid.database_id = DB_ID()
ORDER BY migs.avg_user_impact DESC;Warning about missing index DMVs
Missing index DMVs:
- are suggestions, not truth
- ignore existing index consolidation
- can recommend many overlapping indexes
- reset after restart
- do not know write cost
- do not replace plan analysis
Use them as clues, not commands.Foreign keys without supporting index
SELECT
OBJECT_SCHEMA_NAME(fk.parent_object_id) AS schema_name,
OBJECT_NAME(fk.parent_object_id) AS table_name,
fk.name AS foreign_key_name
FROM sys.foreign_keys AS fk
WHERE NOT EXISTS (
SELECT 1
FROM sys.index_columns AS ic
JOIN sys.foreign_key_columns AS fkc
ON fkc.parent_object_id = ic.object_id
AND fkc.parent_column_id = ic.column_id
WHERE ic.object_id = fk.parent_object_id
AND fkc.constraint_object_id = fk.object_id
);Index investigation workflow
When a query is slow:
1. Capture actual execution plan
2. Check logical reads
3. Identify scans, lookups, sorts and spills
4. Check predicate sargability
5. Check existing indexes
6. Check statistics freshness
7. Check missing index suggestions
8. Design one candidate index
9. Test with representative data
10. Measure CPU, duration, reads, writes
11. Check write overhead
12. Deploy with rollback planDBA checklist for index design
This checklist is designed for production index review. The goal is not to create as many indexes as possible. The goal is to create the smallest useful set of indexes that supports the workload with acceptable write cost and maintenance effort.
| Area | Question | Good signal | Bad signal |
|---|---|---|---|
| Clustered key | Is it narrow, stable and useful? | Small key, low fragmentation, clear access pattern | Wide random GUID clustered key |
| Nonclustered indexes | Do they match real queries? | Seeks, useful scans, reduced lookups | Many unused or overlapping indexes |
| Included columns | Do they cover hot queries? | Fewer key lookups | Huge include lists copying the table |
| Filtered indexes | Is the filter selective and matched? | Small targeted index | Predicate mismatch or broad filter |
| Columnstore | Is workload analytical? | Large scans, compression, batch mode | Pure singleton OLTP workload |
| Statistics | Are estimates accurate? | Estimated rows close to actual rows | Huge estimate mismatch |
| Fragmentation | Does it matter for this workload? | Targeted maintenance | Blind rebuild everything every night |
| Write cost | How expensive are updates? | Index count justified | OLTP table with 30 indexes |
| Storage | How large are indexes? | Capacity and backup impact known | Index size never reviewed |
| Observability | Can benefit be proven? | Query Store, plans, reads, CPU comparison | Index created because a DMV suggested it |
Weekly index review
Weekly:
1. Top slow queries from Query Store
2. Top logical read queries
3. Missing index suggestions
4. Unused large indexes
5. Duplicate or overlapping indexes
6. Fragmentation on large scanned indexes
7. Statistics freshness
8. Write-heavy tables
9. tempdb spills caused by bad plans
10. Index changes after deploymentBefore creating a new index
Ask:
1. Which query needs it?
2. What is the current plan?
3. What are current logical reads?
4. Is the predicate sargable?
5. Are statistics correct?
6. Does an existing index almost solve it?
7. Can we modify an existing index?
8. What is the write overhead?
9. How large will it be?
10. How will we prove success?Final index design model
Good index design balances:
Read benefit:
- seeks
- fewer scans
- fewer lookups
- fewer sorts
- better joins
- better grouping
Write cost:
- insert overhead
- update overhead
- delete overhead
- transaction log growth
- maintenance duration
- storage and backup size
Operational proof:
- execution plan
- logical reads
- CPU
- duration
- Query Store
- index usage
- business impactOptimizer: the decision engine behind every query
The SQL Server optimizer is responsible for transforming a T-SQL request into an execution plan. It does not search every possible plan forever. It uses metadata, statistics, indexes, constraints, cardinality estimates, cost models, transformation rules and available features to choose a plan that is estimated to be efficient.
The optimizer is not magic and it is not omniscient. It works with estimates. When estimates are wrong, the chosen plan can be wrong: bad join type, excessive scan, underestimated row count, oversized memory grant, tempdb spill, bad parallelism or catastrophic key lookup loops.
Optimizer inputs
- T-SQL text: predicates, joins, projections, grouping, ordering.
- Statistics: histograms and density information about data distribution.
- Indexes: possible access paths, key order, included columns, filtered predicates.
- Constraints: primary keys, foreign keys, unique constraints, check constraints.
- Parameters: values may influence compilation and plan choice.
- Database compatibility level: controls optimizer behavior and feature availability.
- Cost model: estimates CPU, I/O and memory-related operations.
Optimizer mental model
T-SQL query
|
v
Parser
|
v
Algebrizer / Binder
|
v
Optimizer
|
+-- Reads metadata
+-- Reads statistics
+-- Evaluates indexes
+-- Estimates cardinality
+-- Explores join order
+-- Chooses physical operators
+-- Requests memory grant
+-- Decides parallelism
|
v
Execution plan
|
v
Execution engine
|
v
Runtime feedback:
- Query Store
- wait stats
- actual plan
- memory grant feedback
- runtime statsDBA translation
When a plan is bad, ask:
1. Was the SQL written correctly?
2. Are predicates sargable?
3. Are statistics accurate?
4. Is cardinality estimated correctly?
5. Are useful indexes available?
6. Is parameter sniffing involved?
7. Is memory grant wrong?
8. Is tempdb spilling?
9. Has the plan changed recently?
10. What does Query Store show?Optimizer responsibility table
| Decision | What optimizer chooses | Bad symptom | Common root cause |
|---|---|---|---|
| Access method | Index seek, index scan, table scan | Large scan for selective query | Missing index, bad predicate, stale stats |
| Join type | Nested loops, hash join, merge join | Nested loops over millions of rows | Bad cardinality estimate |
| Join order | Which table is accessed first | Huge intermediate result | Wrong estimates or missing constraints |
| Memory grant | Memory for sort, hash, exchange | Spill to tempdb or wasted memory | Estimate error or skew |
| Parallelism | Serial or parallel plan | CX waits, CPU pressure, skewed threads | Bad MAXDOP, low cost threshold, bad plan |
| Plan reuse | Compile once, reuse plan | Fast once, slow later | Parameter sniffing or data skew |
Query pipeline: from text to execution
SQL Server transforms a query through several stages before returning rows. Understanding this pipeline helps distinguish syntax problems, binding problems, optimization problems, execution problems and storage problems.
Pipeline
1. Client sends T-SQL batch
2. Parser validates syntax
3. Algebrizer resolves:
- object names
- column names
- data types
- permissions
4. Optimizer estimates costs
5. Optimizer selects execution plan
6. Execution engine runs operators
7. Storage engine reads or writes pages
8. Results are returned to clientCompilation versus execution
| Phase | What happens | Typical problem |
|---|---|---|
| Parsing | Syntax checked | Invalid T-SQL syntax |
| Binding | Names and types resolved | Invalid object, ambiguous column |
| Optimization | Plan generated | Bad estimate, bad index choice |
| Execution | Operators run | Waits, blocking, spills, I/O |
| Runtime feedback | Metrics captured | Need Query Store and actual plan |
Important distinction
Estimated execution plan:
- based on optimizer estimates
- does not run the query
- useful before execution
Actual execution plan:
- collected during execution
- includes actual rows
- reveals estimate errors
- reveals spills and warnings
For tuning, actual plan is usually essential.Pipeline diagram
T-SQL text
|
v
Parser
|
v
Algebrizer / Binding
|
+-- resolve tables
+-- resolve columns
+-- resolve data types
+-- validate permissions
|
v
Optimizer
|
+-- explore alternatives
+-- estimate row counts
+-- estimate costs
+-- choose operators
|
v
Execution Plan
|
v
Runtime
|
+-- CPU
+-- memory grant
+-- logical reads
+-- physical reads
+-- waits
+-- spills
+-- locksCompilation metrics query
SELECT
counter_name,
cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%SQL Statistics%'
AND counter_name IN (
'Batch Requests/sec',
'SQL Compilations/sec',
'SQL Re-Compilations/sec'
);Execution plans: the DBA’s X-ray
An execution plan shows how SQL Server intends to access data and process the query. It exposes access methods, join types, row estimates, memory grants, sorts, scans, seeks, lookups, parallelism and warnings.
Important plan operators
| Operator | Meaning | Warning sign |
|---|---|---|
| Index Seek | Targeted access through index key | May still return many rows |
| Index Scan | Reads many or all index pages | Bad for selective OLTP lookup |
| Key Lookup | Fetches missing columns from clustered index | Disaster if repeated many times |
| Nested Loops | For each row, lookup matching rows | Bad if outer input is huge |
| Hash Match | Builds hash table for join or aggregate | Can spill if memory grant is low |
| Sort | Sorts rows | Can spill or request large memory |
| Exchange | Parallelism data movement | Skew or parallel overhead |
Plan reading workflow
Read a plan like this:
1. Start with expensive operators
2. Check estimated rows vs actual rows
3. Look for warnings
- spills
- missing indexes
- implicit conversions
4. Check access methods
- seek
- scan
- lookup
5. Check join choices
6. Check memory grant
7. Check parallelism
8. Check predicates
9. Check sort operations
10. Validate with logical reads and durationEstimated versus actual rows
Good estimate:
estimated rows = 1000
actual rows = 1100
Bad estimate:
estimated rows = 1
actual rows = 850000
Consequences:
wrong join type
wrong memory grant
wrong parallelism
spills
bad plan reuseCommon plan warnings
| Warning | Meaning | Common fix |
|---|---|---|
| Missing index | Optimizer saw a possible useful index | Validate, consolidate, test |
| Implicit conversion | Data types do not match cleanly | Align parameter and column types |
| Sort spill | Sort wrote to tempdb | Stats, index order, memory grant, rewrite |
| Hash spill | Hash operation wrote to tempdb | Stats, join order, memory grant, indexing |
| Excessive grant | Query received too much memory | Stats, plan review, memory grant feedback |
Enable actual execution plan and I/O metrics
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- Run the query with actual execution plan enabled in SSMS.
SELECT
order_id,
order_date,
total_amount
FROM dbo.orders
WHERE customer_id = 1001
AND order_date >= '2026-01-01';
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;Cardinality estimation: the root of many plan choices
Cardinality estimation is the process of estimating how many rows will be returned by each operator. It is one of the most important parts of optimization. If row estimates are wrong, the optimizer can choose the wrong join type, wrong memory grant, wrong index strategy and wrong parallelism.
Why estimates go wrong
- Stale statistics: data changed but stats did not reflect it.
- Data skew: one value is extremely common, another is rare.
- Correlated columns: optimizer assumes independence where data is related.
- Non-sargable predicates: functions or expressions hide selectivity.
- Implicit conversions: column conversion prevents normal index use.
- Table variables: historically poor estimates in older compatibility levels.
- Local variables: unknown values can produce generic estimates.
Cardinality error diagram
Predicate:
WHERE customer_id = @customer_id
Actual data:
customer_id = 1001 -> 12 rows
customer_id = 9000 -> 4,500,000 rows
If optimizer compiles for 1001:
chooses nested loops + seeks
If reused for 9000:
nested loops become catastrophic
This is one form of parameter sniffing.Estimate error impact
Underestimate:
- memory grant too small
- hash/sort spills to tempdb
- nested loops chosen incorrectly
- parallelism not chosen
Overestimate:
- memory grant too large
- concurrency reduced
- hash join chosen unnecessarily
- excessive parallelismCardinality problem patterns
| Pattern | Example | Risk | Possible action |
|---|---|---|---|
| Skewed column | Status = OPEN versus ARCHIVED | One plan does not fit all values | Filtered stats, filtered index, PSP, rewrite |
| Function on column | YEAR(order_date) = 2026 | Index and stats less useful | Use range predicate |
| Implicit conversion | varchar column compared to nvarchar parameter | Scan or bad estimate | Match data types |
| Correlated predicates | country = Spain and city = Alicante | Independence assumption can fail | Composite stats or index |
| Ascending key | New dates not represented in histogram | Recent data underestimated | Stats update and compatibility features |
Bad versus good predicate
-- Bad: non-sargable predicate
SELECT order_id
FROM dbo.orders
WHERE YEAR(order_date) = 2026;
-- Better: range predicate
SELECT order_id
FROM dbo.orders
WHERE order_date >= '2026-01-01'
AND order_date < '2027-01-01';Statistics: the optimizer’s eyesight
Statistics describe the distribution of values in columns or indexes. Without good statistics, the optimizer is almost blind. It may choose the wrong join order, wrong join type, wrong memory grant or wrong access method.
Statistics components
| Component | Purpose |
|---|---|
| Header | Metadata: rows, sampled rows, modification count, update time |
| Density vector | Average selectivity and uniqueness information |
| Histogram | Distribution of values for the leading statistics column |
Important rule
Statistics histogram exists only on the leading column.
Index:
(customer_id, order_date)
Histogram:
customer_id
Density:
may include combinations
Consequence:
key order matters for both access and estimation.Inspect statistics
DBCC SHOW_STATISTICS (
'dbo.orders',
'IX_orders_customer_date'
);Statistics metadata
SELECT
OBJECT_SCHEMA_NAME(s.object_id) AS schema_name,
OBJECT_NAME(s.object_id) AS table_name,
s.name AS stats_name,
s.auto_created,
s.user_created,
s.no_recompute,
sp.last_updated,
sp.rows,
sp.rows_sampled,
sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(
s.object_id,
s.stats_id
) AS sp
WHERE s.object_id = OBJECT_ID('dbo.orders')
ORDER BY sp.last_updated DESC;Update statistics
UPDATE STATISTICS dbo.orders IX_orders_customer_date;
UPDATE STATISTICS dbo.orders WITH FULLSCAN;
EXEC sp_updatestats;Statistics decision table
| Situation | Symptom | Action |
|---|---|---|
| Large data load | Bad plans after import | Update stats after load |
| Skewed values | Same query fast for some parameters, slow for others | PSP, filtered stats, filtered index, Query Store |
| Stale statistics | Estimated rows far from actual rows | Update relevant stats |
| Correlated filters | Combined predicate badly estimated | Composite index or multi-column stats |
| Ascending date or identity | Recent values underestimated | Stats maintenance and compatibility review |
Create filtered statistics
CREATE STATISTICS ST_orders_open_customer
ON dbo.orders(customer_id)
WHERE status = 'OPEN'
AND is_deleted = 0;Parameter sniffing: feature, not always bug
Parameter sniffing means SQL Server uses parameter values available at compilation time to optimize the plan. This is usually good: the optimizer can build a plan for the real value. It becomes a problem when the first compiled value is not representative and the cached plan is reused for very different values.
Classic scenario
Stored procedure:
dbo.get_orders_by_customer @customer_id
Data distribution:
customer 1001 -> 10 orders
customer 9000 -> 5,000,000 orders
Compilation:
first execution uses 1001
optimizer chooses nested loops + seeks
Later:
execution uses 9000
same plan reused
nested loops become disastrousSymptoms
- Procedure is fast for one parameter and slow for another.
- Clearing plan cache temporarily changes performance.
- Recompiling fixes one case and breaks another.
- Query Store shows multiple plans with very different performance.
- Estimated rows and actual rows vary dramatically by parameter.
Remediation options
| Option | Use carefully when |
|---|---|
| Better index | Plan is bad because access path is missing |
| Update statistics | Estimates are stale |
| PSP optimization | SQL Server 2022+ and skewed parameter patterns |
| OPTION RECOMPILE | Compile cost acceptable and value varies widely |
| OPTIMIZE FOR | A representative value is known |
| Query Store forcing | Known stable plan is proven better |
| Procedure branching | Small and large cases need genuinely different logic |
Procedure branching example
CREATE OR ALTER PROCEDURE dbo.get_customer_orders
@customer_id int
AS
BEGIN
SET NOCOUNT ON;
DECLARE @order_count bigint;
SELECT @order_count = COUNT_BIG(*)
FROM dbo.orders
WHERE customer_id = @customer_id;
IF @order_count < 10000
BEGIN
SELECT
order_id,
order_date,
total_amount
FROM dbo.orders
WHERE customer_id = @customer_id
ORDER BY order_date DESC;
END
ELSE
BEGIN
SELECT
order_id,
order_date,
total_amount
FROM dbo.orders WITH (INDEX(IX_orders_customer_date))
WHERE customer_id = @customer_id
ORDER BY order_date DESC
OPTION (RECOMPILE);
END
END;Parameter sniffing diagnosis checklist
1. Capture slow parameter value
2. Capture fast parameter value
3. Compare actual execution plans
4. Compare estimated rows vs actual rows
5. Check Query Store plan history
6. Check statistics histogram
7. Check index suitability
8. Test PSP behavior if SQL Server 2022+
9. Test RECOMPILE only as controlled option
10. Document final mitigationQuery Store: the flight recorder for SQL Server performance
Query Store captures query text, plans, runtime statistics and wait statistics over time. It is essential for upgrade validation, regression detection, plan comparison, production tuning and evidence-based troubleshooting.
What Query Store stores
- Query text: normalized query statements.
- Query IDs: stable identifiers for tracked queries.
- Plan IDs: different execution plans for the same query.
- Runtime stats: duration, CPU, reads, writes, executions.
- Wait stats: wait categories associated with query execution.
- Plan forcing state: whether a plan is forced and whether forcing succeeded.
Enable Query Store
ALTER DATABASE YourDatabaseName
SET QUERY_STORE = ON;
ALTER DATABASE YourDatabaseName
SET QUERY_STORE (
OPERATION_MODE = READ_WRITE,
CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30),
DATA_FLUSH_INTERVAL_SECONDS = 900,
INTERVAL_LENGTH_MINUTES = 60,
MAX_STORAGE_SIZE_MB = 2048,
QUERY_CAPTURE_MODE = AUTO
);Query Store workflow
Normal operations:
capture baseline
Incident:
find regressed query
Analysis:
compare old plan vs new plan
Mitigation:
tune query / update stats / index / force plan
Validation:
monitor runtime stats after fix
Governance:
review forced plans regularlyTop queries by average duration
SELECT TOP (25)
q.query_id,
p.plan_id,
rs.count_executions,
rs.avg_duration / 1000.0 AS avg_duration_ms,
rs.avg_cpu_time / 1000.0 AS avg_cpu_ms,
rs.avg_logical_io_reads,
qt.query_sql_text
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs
ON p.plan_id = rs.plan_id
ORDER BY rs.avg_duration DESC;Query Store use cases
| Use case | What to compare | Decision |
|---|---|---|
| Upgrade validation | Before and after duration, CPU, reads, plans | Accept, tune or rollback compatibility level |
| Regression detection | Old good plan versus new bad plan | Force plan temporarily or fix root cause |
| Parameter sniffing | Multiple plans and runtime variation | PSP, rewrite, hints or plan governance |
| Index tuning | Reads and duration before/after index | Keep index only if benefit is proven |
| Production reporting | Top queries by CPU, duration, reads | Monthly tuning roadmap |
Plan forcing: useful, but not a cure-all
Plan forcing tells SQL Server to use a specific known plan for a query. It can be a powerful emergency mitigation when a regression appears after statistics changes, upgrade, parameter sniffing or plan cache churn. But it should not replace root cause analysis.
When plan forcing is useful
- A query suddenly regressed and Query Store shows a previous stable plan.
- Application code cannot be changed quickly.
- A production incident requires fast mitigation.
- The forced plan is tested and monitored.
- The root cause will be addressed later.
When plan forcing is risky
- Data distribution is changing rapidly.
- The old plan is good only for one parameter value.
- Indexes used by the plan may change or disappear.
- Forcing hides the real issue permanently.
- No one reviews forced plans after deployment.
Force a plan
EXEC sys.sp_query_store_force_plan
@query_id = 123,
@plan_id = 456;Unforce a plan
EXEC sys.sp_query_store_unforce_plan
@query_id = 123,
@plan_id = 456;Find forced plans
SELECT
q.query_id,
p.plan_id,
p.is_forced_plan,
p.force_failure_count,
p.last_force_failure_reason_desc,
qt.query_sql_text
FROM sys.query_store_plan AS p
JOIN sys.query_store_query AS q
ON p.query_id = q.query_id
JOIN sys.query_store_query_text AS qt
ON q.query_text_id = qt.query_text_id
WHERE p.is_forced_plan = 1
ORDER BY p.force_failure_count DESC;Plan forcing governance table
| Step | Question | Expected evidence |
|---|---|---|
| Before forcing | Is the old plan clearly better? | Query Store runtime comparison |
| During forcing | Does forcing succeed? | No force failure reason |
| After forcing | Did duration, CPU and reads improve? | Query Store after mitigation |
| Root cause | Why did the plan regress? | Stats, index, parameter, compatibility review |
| Review | Should this plan still be forced? | Monthly forced plan report |
Emergency decision flow
Production regression detected
|
v
Query Store shows old fast plan and new slow plan
|
+-- Old plan still valid?
| |
| +-- yes --> force plan as mitigation
| |
| +-- no --> do not force blindly
|
v
Monitor CPU, duration, reads, waits
|
v
Investigate root cause:
- stats changed?
- parameter sniffing?
- index changed?
- compatibility level changed?
- data distribution changed?Hints: surgical tools, not default style
Hints influence optimizer behavior. They can be useful when the optimizer lacks information or when an emergency mitigation is needed. But hints can also freeze a bad assumption and make future data changes dangerous.
Common hint categories
| Hint | Use | Risk |
|---|---|---|
| OPTION RECOMPILE | Compile plan for current parameter values | Higher CPU from compilation |
| OPTIMIZE FOR | Optimize for a known representative value | Bad if value stops being representative |
| OPTIMIZE FOR UNKNOWN | Use generic estimate | May be mediocre for all values |
| MAXDOP | Control parallelism per query | Can hide global config problem |
| FORCESEEK | Force seek access path | Bad when scan is better |
| USE HINT | Modern hint mechanism | Requires precise documentation |
Examples
-- Compile for current parameter values
SELECT
order_id,
order_date,
total_amount
FROM dbo.orders
WHERE customer_id = @customer_id
OPTION (RECOMPILE);
-- Optimize for a representative value
SELECT
order_id,
order_date,
total_amount
FROM dbo.orders
WHERE customer_id = @customer_id
OPTION (OPTIMIZE FOR (@customer_id = 1001));
-- Limit parallelism for one query
SELECT
customer_id,
SUM(total_amount) AS revenue
FROM dbo.orders
GROUP BY customer_id
OPTION (MAXDOP 4);Hint governance
Before adding a hint:
1. prove the problem
2. check statistics
3. check indexes
4. check Query Store history
5. test with representative parameters
6. document the reason
7. add review date
8. monitor after deploymentHint decision table
| Problem | Hint candidate | Better first check |
|---|---|---|
| Plan bad for different parameter values | RECOMPILE, OPTIMIZE FOR, PSP | Data skew, stats, Query Store |
| Query goes too parallel | MAXDOP | Cost threshold, query shape, indexes |
| Optimizer chooses scan | FORCESEEK | Sargability, index design, estimates |
| Regression after upgrade | Query Store hint or plan forcing | Compatibility level and Query Store comparison |
Waits and plans: connect symptoms to operators
Execution plans show the chosen strategy. Wait stats show where time is lost during execution. A good DBA uses both. A plan can look reasonable but wait on I/O. A query can wait on locks because another transaction is blocking it. A memory spill can appear both in the plan and through tempdb-related waits.
Wait interpretation
| Wait family | Meaning | Plan connection |
|---|---|---|
| PAGEIOLATCH | Waiting for data page from disk | Scans, missing indexes, memory pressure |
| WRITELOG | Waiting for log flush | Heavy writes, tiny commits, slow log disk |
| LCK_M_* | Waiting for locks | Long transactions, bad access path, blocking |
| RESOURCE_SEMAPHORE | Waiting for memory grant | Large sorts/hashes, overestimated grants |
| SOS_SCHEDULER_YIELD | CPU pressure or CPU-heavy query | Bad joins, scans, scalar functions |
| CXPACKET / CXCONSUMER | Parallel query coordination | Parallel plan, skew, high DOP |
Tuning workflow
1. Identify user symptom
2. Find active requests
3. Check wait type
4. Capture actual execution plan
5. Compare estimated and actual rows
6. Check logical reads
7. Check Query Store history
8. Check recent plan changes
9. Validate stats and indexes
10. Test fix with representative workloadWait stats query
SELECT TOP (30)
wait_type,
waiting_tasks_count,
wait_time_ms,
signal_wait_time_ms,
wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT LIKE 'SLEEP%'
AND wait_type NOT LIKE 'BROKER%'
AND wait_type NOT LIKE 'XE%'
AND wait_type NOT LIKE 'SQLTRACE%'
ORDER BY wait_time_ms DESC;Active request waits with SQL text
SELECT
r.session_id,
r.status,
r.command,
r.wait_type,
r.wait_time,
r.blocking_session_id,
r.cpu_time,
r.total_elapsed_time,
r.logical_reads,
r.reads,
r.writes,
DB_NAME(r.database_id) AS database_name,
t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID
ORDER BY r.total_elapsed_time DESC;Symptom-to-action matrix
| Symptom | Likely focus | First action |
|---|---|---|
| High CPU | Plan shape, scans, joins, scalar functions | Top CPU queries and actual plans |
| High I/O waits | Reads, storage, missing indexes | Logical reads and file latency |
| Blocking | Transactions and access paths | Find blocker and transaction age |
| tempdb spikes | Spills, version store, temp objects | Check plan warnings and tempdb usage |
| Regression after deployment | Plan change or parameter change | Query Store before/after comparison |
Optimizer and Query Store DMV toolkit
A reliable tuning workflow needs repeatable queries. These DMV and Query Store queries help locate expensive queries, plan regressions, active waits, memory grants and compilation pressure.
Top cached queries by CPU
SELECT TOP (25)
qs.execution_count,
qs.total_worker_time / 1000 AS total_cpu_ms,
qs.total_worker_time / NULLIF(qs.execution_count, 0) / 1000 AS avg_cpu_ms,
qs.total_elapsed_time / 1000 AS total_elapsed_ms,
qs.total_logical_reads,
qs.total_logical_writes,
SUBSTRING(
st.text,
(qs.statement_start_offset / 2) + 1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(st.text)
ELSE qs.statement_end_offset
END - qs.statement_start_offset) / 2) + 1
) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;Current memory grants
SELECT
session_id,
request_id,
scheduler_id,
dop,
requested_memory_kb,
granted_memory_kb,
required_memory_kb,
used_memory_kb,
max_used_memory_kb,
wait_time_ms,
is_next_candidate
FROM sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC;Query Store plans for one query
SELECT
q.query_id,
p.plan_id,
p.is_forced_plan,
p.last_execution_time,
rs.count_executions,
rs.avg_duration / 1000.0 AS avg_duration_ms,
rs.avg_cpu_time / 1000.0 AS avg_cpu_ms,
rs.avg_logical_io_reads,
qt.query_sql_text
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt
ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p
ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs
ON p.plan_id = rs.plan_id
WHERE q.query_id = 123
ORDER BY rs.avg_duration DESC;Forced plans review
SELECT
q.query_id,
p.plan_id,
p.is_forced_plan,
p.force_failure_count,
p.last_force_failure_reason_desc,
qt.query_sql_text
FROM sys.query_store_plan AS p
JOIN sys.query_store_query AS q
ON p.query_id = q.query_id
JOIN sys.query_store_query_text AS qt
ON q.query_text_id = qt.query_text_id
WHERE p.is_forced_plan = 1
ORDER BY p.force_failure_count DESC;Compilation pressure query
SELECT
counter_name,
cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%SQL Statistics%'
AND counter_name IN (
'Batch Requests/sec',
'SQL Compilations/sec',
'SQL Re-Compilations/sec'
);Missing indexes: clue, not truth
SELECT TOP (30)
DB_NAME(mid.database_id) AS database_name,
OBJECT_NAME(mid.object_id, mid.database_id) AS table_name,
migs.user_seeks,
migs.avg_total_user_cost,
migs.avg_user_impact,
mid.equality_columns,
mid.inequality_columns,
mid.included_columns
FROM sys.dm_db_missing_index_group_stats AS migs
JOIN sys.dm_db_missing_index_groups AS mig
ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details AS mid
ON mig.index_handle = mid.index_handle
WHERE mid.database_id = DB_ID()
ORDER BY migs.avg_user_impact DESC;DBA checklist for optimizer and Query Store
This checklist is designed for real production tuning. The objective is to move from vague complaints to measured facts: query text, plan, row estimates, waits, reads, CPU, duration, Query Store history and controlled mitigation.
| Area | Question | Good signal | Bad signal |
|---|---|---|---|
| Query Store | Is it enabled and sized? | Stable history, enough storage | No history during incident |
| Plans | Do actual rows match estimated rows? | Reasonable estimate accuracy | Estimated 1, actual 1,000,000 |
| Statistics | Are stats fresh and relevant? | Recent stats after large changes | Old stats after bulk load |
| Indexes | Does the query have a good access path? | Seek or justified scan, low reads | Large scan for selective predicate |
| Parameter sniffing | Does performance vary by parameter? | PSP or stable plan strategy | Fast for one customer, terrible for another |
| Memory grants | Are grants reasonable? | No spills, no grant starvation | RESOURCE_SEMAPHORE or tempdb spills |
| Parallelism | Is parallelism helpful? | Balanced CPU and runtime improvement | Thread skew, CX waits, CPU storm |
| Hints | Are hints documented? | Clear reason and review date | Old hints nobody understands |
| Plan forcing | Are forced plans reviewed? | Monthly forced plan report | Permanent emergency workaround |
| Regression | Can before/after be proven? | Query Store comparison | Subjective “it feels slower” |
Slow query workflow
1. Identify the exact query
2. Capture actual execution plan
3. Measure duration, CPU, reads
4. Check waits
5. Compare estimated and actual rows
6. Check Query Store history
7. Check stats freshness
8. Check index design
9. Test fix in representative conditions
10. Deploy with rollback optionRegression workflow
1. Open Query Store
2. Find regressed query
3. Compare old plan and new plan
4. Check recent changes:
- deployment
- stats update
- index change
- compatibility level
- parameter pattern
5. Apply mitigation:
- update stats
- tune index
- Query Store force plan
- Query Store hint
- code rewrite
6. Monitor after fixFinal optimizer diagnosis model
A strong SQL Server tuning diagnosis answers:
1. What exact query is slow?
2. Is this a new regression or always slow?
3. What plan is used now?
4. What plan was used before?
5. Are estimates accurate?
6. Are statistics valid?
7. Are indexes appropriate?
8. Are parameters skewed?
9. Are waits CPU, I/O, lock, memory or tempdb?
10. Can Query Store prove the improvement?Concurrency: the art of allowing many users without corrupting data
SQL Server concurrency is the set of mechanisms that allows many sessions to read and write the same database at the same time while preserving transactional correctness. The main tools are locks, latches, isolation levels, row versioning, transaction log, deadlock detection and the optimizer’s choice of access paths.
A concurrency problem is often not a pure locking problem. It can be caused by missing indexes, long transactions, poor access order, bad query plans, excessive isolation, tempdb pressure, hot rows, hot pages, parameter sniffing or application behavior.
Core concurrency ideas
- Locks protect logical data: rows, keys, pages, tables, metadata and ranges.
- Latches protect internal memory structures: pages and allocation structures while being accessed.
- Isolation controls visibility: each level defines what one transaction can see from another.
- Blocking is normal: some waiting is expected; excessive blocking is the problem.
- Deadlocks are detected: SQL Server chooses a victim so the system can continue.
- Row versioning reduces reader/writer blocking: RCSI and SNAPSHOT use version store in tempdb.
- Indexes reduce lock footprint: a targeted seek usually locks fewer resources than a broad scan.
Concurrency mental map
Application sessions
|
+-- Session A reads data
+-- Session B updates data
+-- Session C deletes data
+-- Session D reports data
|
v
SQL Server concurrency control
|
+-- Transactions
+-- Isolation levels
+-- Locks
+-- Latches
+-- Row versioning
+-- Deadlock monitor
+-- Transaction log
|
v
Possible symptoms
|
+-- blocking
+-- deadlocks
+-- lock waits
+-- latch waits
+-- tempdb version store growth
+-- long rollback
+-- poor throughputDBA translation
User says:
"The application freezes."
DBA asks:
1. Is someone blocking everyone?
2. Is there a long open transaction?
3. Which wait type dominates?
4. Which SQL statement holds locks?
5. Is the query scanning too many rows?
6. Is an index missing?
7. Are deadlocks occurring?
8. Is tempdb version store growing?
9. Is isolation level too strong?
10. Did a deployment change access order?Concurrency symptom map
| Symptom | Likely mechanism | First diagnostic | Typical correction |
|---|---|---|---|
| Sessions waiting on LCK_M_* | Lock blocking | Find blocker with sys.dm_exec_requests | Shorter transactions, better indexes, access order |
| Deadlock errors 1205 | Cyclic dependency | Deadlock graph | Consistent access order, indexes, retry logic |
| PAGELATCH waits | Internal page contention | Wait resource and hot page analysis | tempdb layout, key strategy, workload redesign |
| tempdb grows under RCSI | Version store retention | Version store DMV and long transactions | Fix long readers/writers, monitor tempdb |
| Long rollback | Large transaction undo | Transaction DMV and log usage | Batch work, reduce transaction scope |
| Throughput collapse | Blocking chain or worker starvation | Blocking tree, THREADPOOL waits | Kill blocker if justified, fix root cause |
ACID: the foundation of transaction correctness
Transactions exist to guarantee that a set of changes is applied correctly, even with many users, errors, crashes and concurrent operations. SQL Server uses the transaction log, locks, isolation rules and recovery mechanisms to implement ACID behavior.
| Property | Meaning | SQL Server mechanism |
|---|---|---|
| Atomicity | All changes succeed or none do | Transaction log and rollback |
| Consistency | Data remains valid after transaction | Constraints, triggers, application rules |
| Isolation | Transactions do not interfere incorrectly | Locks or row versions |
| Durability | Committed work survives crash | Log flush before commit acknowledgement |
Transaction design rules
- Keep transactions short: do not wait for user input inside a transaction.
- Touch data in consistent order: reduces deadlock probability.
- Use proper error handling: TRY/CATCH, XACT_STATE and THROW.
- Index predicates: reduce lock footprint and scan duration.
- Batch large modifications: avoid massive log, blocking and rollback pain.
Safe transaction template
CREATE OR ALTER PROCEDURE dbo.transfer_funds
@from_account_id int,
@to_account_id int,
@amount decimal(19,4)
AS
BEGIN
SET NOCOUNT ON;
SET XACT_ABORT ON;
BEGIN TRY
BEGIN TRAN;
UPDATE dbo.accounts
SET balance = balance - @amount
WHERE account_id = @from_account_id
AND balance >= @amount;
IF @@ROWCOUNT <> 1
THROW 50010, 'Source account missing or insufficient funds.', 1;
UPDATE dbo.accounts
SET balance = balance + @amount
WHERE account_id = @to_account_id;
IF @@ROWCOUNT <> 1
THROW 50011, 'Target account missing.', 1;
INSERT INTO dbo.account_movement (
from_account_id,
to_account_id,
amount,
created_at
)
VALUES (
@from_account_id,
@to_account_id,
@amount,
SYSUTCDATETIME()
);
COMMIT;
END TRY
BEGIN CATCH
IF XACT_STATE() <> 0
ROLLBACK;
THROW;
END CATCH;
END;Good versus bad transaction scope
| Pattern | Impact | Verdict |
|---|---|---|
| Open transaction, update rows, commit immediately | Locks held briefly | Good |
| Open transaction, call remote API, then update | Locks or transaction context may last too long | Bad |
| Open transaction, wait for user confirmation | Extreme blocking risk | Very bad |
| Delete 50 million rows in one transaction | Huge log, blocking, rollback risk | Batch instead |
| Use consistent object access order | Lower deadlock risk | Good |
Open transaction check
DBCC OPENTRAN;
SELECT
s.session_id,
s.login_name,
s.host_name,
r.status,
r.command,
r.blocking_session_id,
r.wait_type,
r.total_elapsed_time,
t.text AS sql_text
FROM sys.dm_exec_sessions AS s
LEFT JOIN sys.dm_exec_requests AS r
ON s.session_id = r.session_id
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE s.open_transaction_count > 0
ORDER BY s.session_id;Isolation levels: visibility and blocking trade-offs
Isolation level defines what one transaction can see from another transaction. Stronger isolation usually gives stronger correctness guarantees but increases blocking and concurrency cost. Row versioning-based isolation can reduce reader/writer blocking but moves pressure to tempdb.
| Isolation level | Dirty reads | Non-repeatable reads | Phantoms | Main risk |
|---|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible | Incorrect results |
| READ COMMITTED | Prevented | Possible | Possible | Reader/writer blocking |
| READ COMMITTED SNAPSHOT | Prevented | Statement-level version | Possible by statement | tempdb version store pressure |
| REPEATABLE READ | Prevented | Prevented for read rows | Possible | More locks held longer |
| SNAPSHOT | Prevented | Transaction-level version | Transaction-level consistent view | Update conflicts and tempdb usage |
| SERIALIZABLE | Prevented | Prevented | Prevented | Range locks, blocking, deadlocks |
Set isolation level
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRAN;
SELECT
order_id,
status,
total_amount
FROM dbo.orders
WHERE customer_id = 1001;
COMMIT;Serializable example
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
SELECT
*
FROM dbo.orders
WHERE customer_id = 1001
AND status = 'OPEN';
-- Range is protected from phantom inserts
COMMIT;Isolation decision guide
Need maximum correctness for a small critical range?
-> SERIALIZABLE may be justified
Need normal OLTP behavior?
-> READ COMMITTED is common
Need fewer reader/writer blocks?
-> RCSI can help
Need transaction-level consistent snapshot?
-> SNAPSHOT can help
Need dirty reads for speed?
-> Usually a bad idea
Need reporting over OLTP?
-> Prefer replica, warehouse, RCSI or readable secondaryEnable row versioning options
ALTER DATABASE YourDatabaseName
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
ALTER DATABASE YourDatabaseName
SET ALLOW_SNAPSHOT_ISOLATION ON;Lock types: what SQL Server protects
Locks are logical concurrency controls. They protect data correctness when multiple sessions access the same resources. Locks can be held on rows, keys, pages, tables, databases, metadata and ranges. The lock mode defines compatibility with other locks.
Common lock modes
| Mode | Name | Typical use |
|---|---|---|
| S | Shared | Reading data under locking isolation |
| X | Exclusive | Modifying data |
| U | Update | Preparing to update, helps avoid conversion deadlocks |
| IS | Intent Shared | Intention to read lower-level resources |
| IX | Intent Exclusive | Intention to modify lower-level resources |
| SIX | Shared with Intent Exclusive | Read object and modify some lower-level rows |
| Range locks | Key-range protection | Serializable isolation phantom protection |
Lock compatibility intuition
Shared locks:
many readers can coexist
Exclusive lock:
writer blocks other readers/writers
depending on isolation level
Update lock:
one session declares intent to update
reduces conversion deadlock risk
Intent locks:
table-level signals that lower-level locks exist
Range locks:
protect gaps between keys
used under SERIALIZABLEBasic lock DMV
SELECT
request_session_id,
resource_type,
resource_database_id,
resource_associated_entity_id,
request_mode,
request_status,
request_owner_type
FROM sys.dm_tran_locks
ORDER BY request_session_id, resource_type, request_mode;Lock mode interpretation
| Observation | Meaning | DBA interpretation |
|---|---|---|
| Many S locks | Readers under locking isolation | Normal unless blocking writers too long |
| X lock waiting | Writer cannot modify resource | Find reader or writer holding incompatible lock |
| IX table lock | Session intends lower-level writes | Normal for updates/deletes |
| Range locks | Serializable protection | Can create heavy blocking |
| Escalated table lock | Many row/page locks converted to table lock | May block broad workload |
Lock hints: use surgically
-- Example: reserve row for update pattern
BEGIN TRAN;
SELECT
queue_id,
payload
FROM dbo.work_queue WITH (UPDLOCK, READPAST, ROWLOCK)
WHERE status = 'READY'
ORDER BY created_at;
-- Then update selected row quickly
COMMIT;Lock hierarchy and escalation
SQL Server can lock at multiple levels: row, key, page, object, database and metadata. Fine-grained locks improve concurrency but consume memory. When too many locks are taken, SQL Server may escalate to a larger lock, commonly a table-level lock.
Lock hierarchy
Database
|
+-- Schema / metadata
|
+-- Table / object
|
+-- HoBT / partition
|
+-- Page
|
+-- Key / rowWhy escalation matters
Before escalation:
update many rows
row locks on many keys
other sessions can still access other rows
After escalation:
table lock
many sessions blocked
concurrency drops sharplyEscalation risk factors
- Large updates or deletes: many rows touched in one transaction.
- Missing indexes: scan locks many rows instead of seeking targeted rows.
- Long transaction: locks are held longer.
- Serializable isolation: range locks increase footprint.
- Foreign key checks: missing FK indexes can broaden locking.
- Partitioning strategy: escalation behavior may interact with partitions.
Batching pattern
WHILE 1 = 1
BEGIN
DELETE TOP (5000)
FROM dbo.audit_log
WHERE created_at < DATEADD(day, -180, SYSUTCDATETIME());
IF @@ROWCOUNT = 0
BREAK;
WAITFOR DELAY '00:00:01';
END;Lock footprint by query shape
| Query shape | Lock footprint | Concurrency consequence | Correction |
|---|---|---|---|
| Seek by primary key | Small | Good concurrency | Usually fine |
| Update using missing index predicate | Large scan | Blocks many unrelated rows | Add targeted index |
| Delete millions of rows | Huge | Escalation, log growth, blocking | Batch delete or partition switch |
| Serializable range query | Range locks | Blocks inserts into range | Use only when required |
Current lock count by session
SELECT
request_session_id,
resource_type,
request_mode,
request_status,
COUNT(*) AS lock_count
FROM sys.dm_tran_locks
GROUP BY
request_session_id,
resource_type,
request_mode,
request_status
ORDER BY lock_count DESC;Blocking: normal mechanism, abnormal when excessive
Blocking happens when one session holds a lock that another session needs. This is not automatically bad. It is how SQL Server protects correctness. The problem appears when blocking lasts too long, forms chains, consumes workers, or stops critical business operations.
Blocking chain example
Session 51
holds X lock on dbo.orders row
transaction open for 8 minutes
|
blocks
v
Session 62
waiting to update same customer
|
blocks
v
Session 74
report waiting behind session 62
Root blocker:
session 51Blocking causes
| Cause | Effect | Fix |
|---|---|---|
| Long transaction | Locks held too long | Reduce scope, commit earlier |
| Missing index | Scan locks too many rows | Add targeted index |
| Slow application | Transaction open while app waits | Move external work outside transaction |
| Reporting on primary | Readers/writers compete | Replica, RCSI, warehouse, read routing |
| Too strong isolation | Unnecessary lock retention | Review isolation choice |
Blocking diagnostic query
SELECT
r.session_id,
r.blocking_session_id,
r.wait_type,
r.wait_time,
r.wait_resource,
r.status,
r.command,
r.cpu_time,
r.total_elapsed_time,
DB_NAME(r.database_id) AS database_name,
t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC;Root blocker query
SELECT
s.session_id,
s.login_name,
s.host_name,
s.program_name,
s.open_transaction_count,
r.status,
r.command,
r.wait_type,
r.total_elapsed_time,
t.text AS sql_text
FROM sys.dm_exec_sessions AS s
LEFT JOIN sys.dm_exec_requests AS r
ON s.session_id = r.session_id
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE s.session_id IN (
SELECT DISTINCT blocking_session_id
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0
)
ORDER BY s.session_id;Blocking response ladder
| Severity | Action | Warning |
|---|---|---|
| Minor short wait | Observe | Do not overreact to normal locking |
| Repeated blocker | Identify SQL, plan and transaction scope | Fix root cause, not only symptom |
| Business outage | Consider killing root blocker after approval | Rollback may take time and create more log activity |
| Recurring incident | Change index, code, isolation or workflow | Requires application and DBA cooperation |
Deadlocks: when sessions block each other in a cycle
A deadlock occurs when two or more sessions hold resources that the others need, creating a cycle that cannot resolve by waiting. SQL Server detects the cycle and kills one victim with error 1205 so the other transaction can continue.
Classic deadlock
Session A:
UPDATE customer 1001
then wants order 500
Session B:
UPDATE order 500
then wants customer 1001
Cycle:
A waits for B
B waits for A
SQL Server:
chooses deadlock victim
returns error 1205 to victim sessionCommon causes
- Different access order: sessions update tables in different sequences.
- Missing indexes: scans take more locks and increase overlap.
- Long transactions: locks are held longer.
- Serializable range locks: gap protection increases conflicts.
- Triggers: hidden extra access paths create cycles.
- Foreign key validation: missing indexes on child tables can increase locking.
Prevention rules
Reduce deadlocks:
1. Standardize object access order
2. Keep transactions short
3. Add targeted indexes
4. Avoid user interaction in transaction
5. Reduce scan footprint
6. Use retry logic in application
7. Review triggers
8. Review isolation level
9. Batch large modifications
10. Capture and analyze deadlock graphDeadlock cause table
| Cause | Symptom in graph | Correction |
|---|---|---|
| Opposite table access order | Two sessions hold different object locks and request each other’s | Standardize update order in code |
| Missing index | Large scan involved in deadlock | Add narrow targeted index |
| Lookup deadlock | Key lookup and update conflict | Covering index or query rewrite |
| Serializable range lock | RangeS, RangeX locks | Review isolation or index range design |
| Trigger side effect | Extra object appears unexpectedly | Review trigger logic and access order |
Application retry pattern
Deadlock victim error:
1205
Application behavior:
- catch error 1205
- wait short random delay
- retry limited number of times
- log final failure if retries exhausted
Important:
retry is not a substitute for fixing frequent deadlocks.
It is a resilience layer for rare concurrency races.Find deadlock events with system health, when available
SELECT
XEvent.query('(event/data/value/deadlock)[1]') AS deadlock_graph
FROM (
SELECT CAST(target_data AS xml) AS TargetData
FROM sys.dm_xe_session_targets AS st
JOIN sys.dm_xe_sessions AS s
ON s.address = st.event_session_address
WHERE s.name = 'system_health'
AND st.target_name = 'ring_buffer'
) AS Data
CROSS APPLY TargetData.nodes(
'RingBufferTarget/event[@name="xml_deadlock_report"]'
) AS XEventData(XEvent);RCSI and SNAPSHOT: reducing reader/writer blocking
Row versioning allows readers to read older committed versions of rows instead of waiting behind writers. SQL Server stores row versions in tempdb version store. This can dramatically reduce reader/writer blocking, but it increases tempdb dependency and changes concurrency behavior.
RCSI versus SNAPSHOT
| Mode | Scope | Reader behavior | Risk |
|---|---|---|---|
| RCSI | Statement level | READ COMMITTED reads last committed version as of statement start | tempdb version store growth |
| SNAPSHOT | Transaction level | Transaction sees consistent version as of transaction start | Update conflicts, tempdb growth |
When row versioning helps
- Reporting queries block OLTP writers or are blocked by writers.
- Read-heavy applications suffer from read/write contention.
- Business accepts committed snapshot semantics.
- tempdb is sized and monitored seriously.
Row versioning diagram
Writer updates row
|
+-- new row version in data page
|
+-- old committed version stored in tempdb
|
v
Reader under RCSI
|
+-- does not wait for writer
+-- reads older committed version
|
v
Less blocking
but more tempdb version store usageEnable options
ALTER DATABASE YourDatabaseName
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
ALTER DATABASE YourDatabaseName
SET ALLOW_SNAPSHOT_ISOLATION ON;Use SNAPSHOT isolation
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
SELECT
order_id,
status,
total_amount
FROM dbo.orders
WHERE customer_id = 1001;
COMMIT;Row versioning risks
| Risk | Cause | Mitigation |
|---|---|---|
| tempdb growth | Long transactions retain old versions | Monitor version store and transaction age |
| Update conflict | SNAPSHOT transaction updates row changed after snapshot began | Retry logic and transaction design |
| Incorrect business assumption | App expects blocking-based behavior | Functional testing before enabling |
| Hidden tempdb dependency | Readers depend on version store | tempdb capacity, latency and alerts |
Version store monitoring
SELECT
SUM(version_store_reserved_page_count) * 8 / 1024 AS version_store_mb,
SUM(user_object_reserved_page_count) * 8 / 1024 AS user_objects_mb,
SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb,
SUM(unallocated_extent_page_count) * 8 / 1024 AS free_space_mb
FROM tempdb.sys.dm_db_file_space_usage;Long transactions that can retain versions
SELECT
at.transaction_id,
at.name,
at.transaction_begin_time,
DATEDIFF(minute, at.transaction_begin_time, SYSUTCDATETIME()) AS age_minutes,
at.transaction_type,
at.transaction_state
FROM sys.dm_tran_active_transactions AS at
ORDER BY at.transaction_begin_time;Latches: internal lightweight synchronization
Locks protect logical data consistency. Latches protect internal memory structures while SQL Server is reading or changing them. Latch waits are not solved by changing isolation level. They often indicate hot pages, allocation contention, tempdb contention, or very high insert pressure on the same physical area.
Locks versus latches
| Mechanism | Protects | Typical wait |
|---|---|---|
| Lock | Logical data correctness | LCK_M_* |
| Latch | Internal memory page access | PAGELATCH_* |
| I/O latch | Waiting for page read from disk | PAGEIOLATCH_* |
Common latch scenarios
PAGELATCH on tempdb:
- temp table allocation contention
- heavy tempdb object creation
- allocation map contention
PAGELATCH on user database:
- hot last page insert
- sequential clustered key under extreme concurrency
- hot index page
- queue table hotspot
PAGEIOLATCH:
- waiting for page from disk
- storage latency
- memory pressure
- scan-heavy workloadLatch diagnosis mindset
Do not confuse:
LCK_M_* -> logical blocking
PAGELATCH_* -> in-memory page contention
PAGEIOLATCH_* -> waiting for disk I/O
Different waits, different fixes.Latch-related waits
SELECT TOP (30)
wait_type,
waiting_tasks_count,
wait_time_ms,
signal_wait_time_ms,
wait_time_ms - signal_wait_time_ms AS resource_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH%'
OR wait_type LIKE 'PAGEIOLATCH%'
ORDER BY wait_time_ms DESC;Possible fixes by latch type
| Wait | Possible cause | Possible fix |
|---|---|---|
| PAGELATCH_UP on tempdb | Allocation contention | Multiple tempdb files, reduce temp object churn |
| PAGELATCH_EX on user table | Hot insert page | Review key strategy, batching, partitioning, optimized feature options |
| PAGEIOLATCH_SH | Slow reads from disk | Index tuning, memory, storage latency review |
| PAGEIOLATCH_EX | Waiting for page for modification | Storage and write path review |
Optimized locking and modern concurrency improvements
Recent SQL Server versions introduce improvements that reduce lock memory, lock duration, and concurrency overhead in some workloads. These features can help, but they do not replace proper transaction design, correct indexing, short transactions and workload testing.
What modern locking improvements try to solve
- Reduce lock footprint: fewer locks held for some operations.
- Improve concurrency: less reader/writer or writer/writer interference in supported scenarios.
- Reduce memory pressure: fewer lock objects can reduce lock memory consumption.
- Improve high-throughput OLTP: especially when combined with good indexing and short transactions.
Important warning
Optimized locking is not a license to write bad transactions.
Still mandatory:
- short transactions
- good indexes
- consistent access order
- no user interaction inside transaction
- no massive unbatched updates
- deadlock monitoring
- realistic load testingConcurrency checklist for modern versions
Before relying on modern locking improvements:
1. Identify current blocking pattern
2. Capture wait stats
3. Capture deadlock graphs
4. Check isolation level
5. Check RCSI / SNAPSHOT usage
6. Check transaction duration
7. Check index access path
8. Check lock escalation
9. Check tempdb version store
10. Replay real workload after upgradeVersion-sensitive topics
| Topic | Why test? |
|---|---|
| Optimized locking | Effect depends on workload and database options |
| RCSI | Can reduce blocking but increase tempdb usage |
| SNAPSHOT | Can create update conflicts |
| Query plan changes | Different plan can change lock footprint |
| Parameter sniffing | Bad plan can scan and lock too much |
Before / after concurrency test plan
Test scenario:
100 concurrent sessions
80 percent reads
20 percent writes
realistic parameter distribution
same data volume as production
Measure:
- throughput
- average latency
- p95 latency
- p99 latency
- LCK_M waits
- PAGELATCH waits
- deadlock count
- tempdb version store
- transaction log latency
- CPU and memory grants
Decision:
keep change only if business workload improves.Concurrency DMV toolkit
Concurrency diagnosis must be factual. The DBA needs to identify who is waiting, who is blocking, which SQL text is involved, which locks are held, whether deadlocks are occurring, whether tempdb version store is growing, and whether waits point to locks, latches, I/O or worker starvation.
Active blocking
SELECT
r.session_id,
r.blocking_session_id,
r.wait_type,
r.wait_time,
r.wait_resource,
r.status,
r.command,
r.cpu_time,
r.total_elapsed_time,
DB_NAME(r.database_id) AS database_name,
t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC;Current locks
SELECT
request_session_id,
resource_type,
resource_database_id,
resource_description,
request_mode,
request_status,
request_owner_type
FROM sys.dm_tran_locks
ORDER BY request_session_id, resource_type, request_mode;Sessions with open transactions
SELECT
s.session_id,
s.login_name,
s.host_name,
s.program_name,
s.open_transaction_count,
r.status,
r.command,
r.wait_type,
r.total_elapsed_time
FROM sys.dm_exec_sessions AS s
LEFT JOIN sys.dm_exec_requests AS r
ON s.session_id = r.session_id
WHERE s.open_transaction_count > 0
ORDER BY s.open_transaction_count DESC;Wait stats for locks and latches
SELECT TOP (50)
wait_type,
waiting_tasks_count,
wait_time_ms,
signal_wait_time_ms,
wait_time_ms - signal_wait_time_ms AS resource_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'LCK_M%'
OR wait_type LIKE 'PAGELATCH%'
OR wait_type LIKE 'PAGEIOLATCH%'
OR wait_type IN ('THREADPOOL', 'WRITELOG')
ORDER BY wait_time_ms DESC;tempdb version store
SELECT
SUM(version_store_reserved_page_count) * 8 / 1024 AS version_store_mb,
SUM(user_object_reserved_page_count) * 8 / 1024 AS user_objects_mb,
SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb,
SUM(unallocated_extent_page_count) * 8 / 1024 AS free_space_mb
FROM tempdb.sys.dm_db_file_space_usage;Transaction log usage
SELECT
DB_NAME(database_id) AS database_name,
database_transaction_log_bytes_used / 1024 / 1024 AS log_used_mb,
database_transaction_log_bytes_reserved / 1024 / 1024 AS log_reserved_mb
FROM sys.dm_tran_database_transactions
ORDER BY log_used_mb DESC;Diagnostic workflow
Concurrency incident workflow:
1. Is there blocking?
-> find blocker and blocked sessions
2. Is there a deadlock?
-> collect deadlock graph
3. Is it lock or latch?
-> LCK_M versus PAGELATCH versus PAGEIOLATCH
4. Is tempdb involved?
-> version store, spills, temp objects
5. Is the transaction long?
-> open_transaction_count and transaction age
6. Is the plan scanning too much?
-> actual plan and logical reads
7. Is an index missing?
-> access path and predicate review
8. Is application behavior wrong?
-> user interaction, remote call, retry logic
9. Is isolation appropriate?
-> READ COMMITTED, RCSI, SNAPSHOT, SERIALIZABLE
10. Is the fix code, index, isolation, batch size or architecture?DBA checklist for transactions and locks
This checklist is designed for production concurrency review. It helps identify blocking, deadlocks, long transactions, incorrect isolation, missing indexes, tempdb row versioning pressure and application-level transaction mistakes.
| Area | Question | Good signal | Bad signal |
|---|---|---|---|
| Transaction scope | Are transactions short? | Only necessary writes inside transaction | User input or remote calls inside transaction |
| Access order | Do procedures update objects in consistent order? | Standard order documented | Different flows update same tables in opposite order |
| Indexes | Do updates use targeted access paths? | Seeks or justified scans | Missing index causes broad locking scan |
| Isolation | Is isolation level appropriate? | Business correctness and concurrency balanced | NOLOCK everywhere or SERIALIZABLE by default |
| RCSI | Is row versioning monitored? | tempdb version store under control | tempdb grows unexpectedly |
| Deadlocks | Are deadlock graphs collected? | Graph analysis and fix tracking | Only retry forever without diagnosis |
| Blocking | Can root blocker be identified quickly? | Blocking report and alerting | Manual guessing during incident |
| Latches | Are latch waits separated from lock waits? | PAGELATCH and LCK_M diagnosed differently | Wrong fix applied to wrong wait type |
| Batching | Are large modifications batched? | Controlled chunks and pauses | Millions of rows modified in one transaction |
| Application retry | Does app retry deadlock victims? | Limited retry with backoff and logging | No retry or infinite retry loop |
Daily checks
Daily:
1. Top blocking sessions
2. Deadlock count
3. Long open transactions
4. LCK_M waits
5. PAGELATCH waits
6. tempdb version store size
7. Failed jobs caused by blocking
8. Query Store regressions
9. Log reuse wait
10. Application timeout reportsDesign review checklist
Before deploying write-heavy code:
1. Transaction scope reviewed
2. Access order documented
3. Required indexes present
4. Isolation level justified
5. Deadlock retry implemented
6. Batch size tested
7. Lock footprint measured
8. tempdb impact tested
9. Rollback duration considered
10. Monitoring addedFinal concurrency diagnosis model
A serious SQL Server concurrency diagnosis answers:
1. Is the problem blocking, deadlock, latch or I/O?
2. Which session is the root blocker?
3. Which statement holds the lock?
4. How long has the transaction been open?
5. What isolation level is used?
6. Does the query have a good index?
7. Is the plan scanning too much data?
8. Is row versioning enabled?
9. Is tempdb version store healthy?
10. Is application retry logic correct?
11. Can access order be standardized?
12. Can the workload be batched or redesigned?HA / DR: two different problems
High Availability and Disaster Recovery are often confused. HA is about keeping service available when a local component fails: node crash, SQL Server service failure, planned patch, storage path failure, or local maintenance. DR is about surviving a larger disaster: datacenter loss, regional outage, ransomware, destructive deployment, logical corruption or accidental mass deletion.
A SQL Server HA / DR architecture is serious only when it connects business objectives to technical mechanisms: RPO, RTO, failover type, data loss tolerance, distance, latency, licensing, monitoring, runbooks, restore tests and application reconnection behavior.
Core principles
- HA is not backup: a highly available corrupted database is still corrupted.
- DR is not only replication: DR must handle human error, ransomware, bad deployment and logical corruption.
- RPO defines data loss: how much committed data the business can lose.
- RTO defines downtime: how long the service can be unavailable.
- Failover must be tested: architecture diagrams do not prove recovery.
- Applications matter: connection strings, retry logic and DNS/listener behavior decide real recovery time.
- Backups must be restored: backup success is not equal to recovery success.
HA / DR architecture map
Business requirement
|
+-- RPO: acceptable data loss
+-- RTO: acceptable downtime
+-- SLA: expected availability
+-- budget and licensing
|
v
SQL Server options
|
+-- Always On Availability Groups
| +-- local synchronous HA
| +-- remote asynchronous DR
| +-- readable secondary
|
+-- Failover Cluster Instance
| +-- instance-level HA
| +-- shared storage dependency
|
+-- Log Shipping
| +-- simple DR
| +-- backup / copy / restore chain
|
+-- Backup and restore
+-- full backup
+-- differential backup
+-- transaction log backup
+-- point-in-time restoreDBA translation
Question:
"Are we protected?"
DBA answer must include:
1. protected against what?
2. acceptable data loss?
3. acceptable downtime?
4. last successful backup?
5. last successful restore test?
6. last failover test?
7. application reconnect tested?
8. jobs/logins/linked servers synchronized?
9. monitoring and alerting active?
10. documented rollback and escalation?HA / DR solution map
| Technology | Protects against | Strength | Weakness |
|---|---|---|---|
| Always On Availability Groups | Database-level replica failure, node failure, DR site | Flexible HA/DR, sync/async, listener, readable secondary | More complex governance, database-level scope |
| Failover Cluster Instance | SQL Server instance/node failure | Protects whole instance, simpler for apps | Shared storage remains critical dependency |
| Log Shipping | Database loss, site-level DR, delayed copy | Simple, robust, cheap, predictable | Manual failover, higher RTO/RPO |
| Backup / restore | Logical corruption, human error, ransomware, point-in-time recovery | Essential recovery foundation | Recovery time depends on tested restore process |
| Storage replication | Storage-level failure or site replication | Can protect many systems | Can replicate corruption and does not replace SQL-aware restore |
RPO and RTO: the business contract
RPO and RTO are not technical decoration. They are the business contract that decides the architecture. A system with RPO near zero and RTO under minutes needs very different design, budget, automation and testing from a system where several hours of downtime are acceptable.
Definitions
| Metric | Question | Example |
|---|---|---|
| RPO | How much data can we lose? | 0 seconds, 5 minutes, 1 hour |
| RTO | How long can the service be down? | 30 seconds, 5 minutes, 2 hours |
| SLA | What availability do we promise? | 99.9%, 99.95%, 99.99% |
| MTTR | How long do we really take to recover? | Measured during drills |
| MTBF | How often do failures happen? | Reliability trend over time |
RPO / RTO diagram
Timeline:
T0 ---------------- T1 ---------------- T2
| | |
last protected failure service restored
transaction occurs
RPO:
data gap between last protected point and failure
RTO:
time between failure and restored service
Example:
last log backup: 10:55
failure: 11:00
service back: 11:12
RPO = 5 minutes
RTO = 12 minutesBusiness categories
Tier 0:
payment, orders, critical production
RPO: near zero
RTO: seconds/minutes
Tier 1:
operational business apps
RPO: minutes
RTO: minutes/hour
Tier 2:
reporting, internal tools
RPO: hours
RTO: hours
Tier 3:
archive, non-critical
RPO: day
RTO: day or moreRPO / RTO solution table
| Requirement | Typical solution | Comment | Risk |
|---|---|---|---|
| RPO = 0 or near 0 | Synchronous AG or FCI | Requires low latency and serious monitoring | Sync commit can affect transaction latency |
| RTO under 1 minute | Automatic failover with AG/FCI | Application reconnect must be tested | Failover can still be blocked by dependencies |
| Regional DR | Asynchronous AG or log shipping | Distance usually implies async | Some data loss possible |
| Logical corruption recovery | Point-in-time restore from backups | HA does not solve bad DELETE or ransomware | Restore time can exceed business expectation |
| Cheap DR | Log shipping | Simple and robust | Manual failover and higher RTO |
RPO / RTO interview checklist
Ask the business:
1. How many minutes of data loss are acceptable?
2. How long can the app be unavailable?
3. Is downtime allowed during working hours?
4. Is read-only mode acceptable during incident?
5. Which applications are Tier 0?
6. Which databases must fail over together?
7. Are external systems dependent on this database?
8. Who approves failover?
9. Who communicates to users?
10. How often must we test failover and restore?Always On Availability Groups
Always On Availability Groups provide database-level high availability and disaster recovery. A primary replica sends transaction log records to secondary replicas. Depending on commit mode, secondaries can be synchronous for local HA or asynchronous for remote DR.
Main components
- Availability Group: logical group of databases that fail over together.
- Primary replica: accepts read/write workload.
- Secondary replica: receives log blocks and hardens/replays them.
- Availability database: database participating in the AG.
- Listener: virtual network name used by applications.
- Endpoint: communication channel between replicas.
- Commit mode: synchronous or asynchronous.
- Failover mode: automatic or manual depending on design.
AG architecture
Application
|
v
AG Listener
|
v
Primary Replica
|
+-- DB_A
+-- DB_B
+-- DB_C
|
| synchronous commit
v
Secondary Replica - local HA
|
| asynchronous commit
v
Secondary Replica - remote DR
Optional:
readable secondary
backup preference
read-only routingCommit flow simplified
Transaction on primary
|
v
Log record generated
|
v
Primary hardens log locally
|
+-- sync replica:
| waits for secondary harden
|
+-- async replica:
does not wait for remote harden
Commit acknowledged based on commit mode.Always On AG advantages and warnings
| Topic | Value | Warning |
|---|---|---|
| Database-level failover | Fail over selected databases together | Instance-level objects are not automatically included |
| Synchronous commit | Low or zero data loss local HA | Latency affects transaction commits |
| Asynchronous commit | Good for remote DR | Data loss possible during disaster |
| Readable secondary | Offload reporting workloads | Licensing, stale reads, routing complexity |
| Backups on secondary | Reduce primary backup load | Backup chain and preference must be governed |
| Listener | Application endpoint abstraction | Connection retry logic still required |
AG monitoring query
SELECT
ag.name AS availability_group_name,
ar.replica_server_name,
ars.role_desc,
ars.synchronization_health_desc,
ar.availability_mode_desc,
ar.failover_mode_desc,
ars.connected_state_desc,
ars.operational_state_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states AS ars
ON ar.replica_id = ars.replica_id
ORDER BY ag.name, ar.replica_server_name;Synchronous versus asynchronous commit
Commit mode is one of the most important design choices. Synchronous commit can reduce or eliminate data loss, but adds latency because the primary waits for the synchronous secondary to harden log records. Asynchronous commit is better for long distance DR, but the remote replica can lag behind.
| Mode | Primary waits? | Data loss risk | Best fit |
|---|---|---|---|
| Synchronous commit | Yes, waits for secondary harden | Very low / near zero if healthy | Same datacenter or low-latency metro HA |
| Asynchronous commit | No | Possible if primary lost before replica catches up | Remote DR, long distance, high latency links |
Commit latency model
Synchronous commit latency includes:
- primary log write
- network send
- secondary log harden
- acknowledgement back to primary
If network or secondary log disk is slow:
primary transactions slow down.Topology examples
Local HA:
Primary A
|
| synchronous commit
v
Secondary B
same site or low-latency zone
Remote DR:
Primary A
|
| asynchronous commit
v
Secondary C
remote region or DR site
Hybrid:
A -> B synchronous for HA
A -> C asynchronous for DRReplica lag concepts
Important measurements:
- log send queue
- redo queue
- synchronization state
- synchronization health
- last hardened LSN
- last redone LSN
- redo rate
- send rate
Large queue:
secondary is behind
RPO risk increases
failover may take longerAG database synchronization status
SELECT
DB_NAME(drs.database_id) AS database_name,
ar.replica_server_name,
drs.synchronization_state_desc,
drs.synchronization_health_desc,
drs.database_state_desc,
drs.is_suspended,
drs.suspend_reason_desc,
drs.log_send_queue_size,
drs.redo_queue_size,
drs.redo_rate,
drs.log_send_rate
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
ON drs.replica_id = ar.replica_id
ORDER BY database_name, ar.replica_server_name;Mode decision table
| Business need | Recommended mode | Reason |
|---|---|---|
| Zero or near-zero data loss locally | Synchronous | Primary waits for secondary harden |
| Remote DR across long distance | Asynchronous | Distance latency too high for sync commits |
| Critical trading/payment system | Synchronous local HA plus async DR | Balances local RPO with remote survival |
| Reporting secondary | Depends on latency tolerance | Readable secondary may be slightly behind |
AG Listener: the application entry point
The listener is a virtual network name and IP endpoint that allows applications to connect to the current primary replica without hardcoding a physical server name. It is one of the key pieces that makes failover manageable from the application point of view.
Listener responsibilities
- Abstract primary replica: application connects to listener, not server node.
- Support failover: listener moves with the AG role.
- Support read-only routing: direct read-only connections to readable secondaries.
- Reduce manual changes: no connection string rewrite during failover.
- Require client retry logic: active connections can still break during failover.
Listener diagram
Application connection string
|
v
tcp:SalesAGListener,1433
|
v
Current primary replica
|
+-- before failover: SQLNODE01
|
+-- after failover: SQLNODE02
Application should:
- use listener name
- set connection timeout properly
- retry transient failures
- avoid hardcoded node namesConnection string ideas
Server=SalesAGListener,1433;
Database=SalesDB;
Integrated Security=True;
MultiSubnetFailover=True;
ApplicationIntent=ReadWrite;
For readable secondary:
ApplicationIntent=ReadOnly;Application behavior during failover
| Application behavior | Result during failover | Recommendation |
|---|---|---|
| Hardcoded primary node | App fails after failover | Use listener |
| No retry logic | Transient errors become user-visible incidents | Add retry with backoff |
| Long transactions during failover | Rollback or broken connection | Keep transactions short |
| Read-only routing configured | Reporting can be offloaded | Validate stale-read tolerance |
| MultiSubnetFailover missing | Slow reconnection in multi-subnet designs | Enable when appropriate |
Listener inventory
SELECT
ag.name AS availability_group_name,
l.dns_name,
l.port,
ip.ip_address,
ip.subnet_mask,
ip.network_subnet_ip
FROM sys.availability_group_listeners AS l
JOIN sys.availability_groups AS ag
ON l.group_id = ag.group_id
LEFT JOIN sys.availability_group_listener_ip_addresses AS ip
ON l.listener_id = ip.listener_id
ORDER BY ag.name, l.dns_name;Failover Cluster Instance
A Failover Cluster Instance protects the SQL Server instance as a whole. The instance can fail over between cluster nodes, usually with shared storage. From the application viewpoint, the SQL Server instance name remains stable while the active node changes.
FCI architecture
Windows Server Failover Cluster
|
+-- Node 1
| +-- SQL Server active
|
+-- Node 2
| +-- SQL Server passive
|
+-- Shared storage
+-- data files
+-- log files
+-- system databases
Failover:
SQL Server stops on Node 1
shared storage moves
SQL Server starts on Node 2FCI versus AG
| Topic | FCI | AG |
|---|---|---|
| Scope | Instance-level | Database-level |
| Storage | Shared storage | Separate storage per replica |
| Readable secondary | No passive readable copy | Possible |
| DR distance | Depends on storage design | Strong async DR option |
| Instance objects | Move with instance | Must synchronize outside AG |
FCI strengths
- Protects the entire SQL Server instance.
- Application connection model can be simple.
- Good for applications expecting instance-level continuity.
- SQL Agent jobs and instance-level objects remain with the instance.
FCI warnings
| Warning | Why it matters | Mitigation |
|---|---|---|
| Shared storage dependency | Storage failure can affect whole instance | Enterprise storage, redundancy, storage monitoring |
| Failover restart time | SQL Server service must start on another node | Measure real failover time |
| No readable passive node | Cannot offload reporting like readable AG secondary | Use AG, replica or reporting architecture if needed |
| Patch and cluster complexity | Windows cluster health is critical | Run cluster validation and patch runbooks |
FCI fit decision
FCI is a good fit when:
- instance-level protection is required
- shared storage is already enterprise-grade
- application expects one SQL instance
- readable secondary is not required
- local HA is the main objective
AG is usually better when:
- database-level DR is required
- remote async replica is required
- readable secondary is valuable
- storage independence is desiredLog Shipping: simple, robust DR
Log shipping is one of the most reliable and understandable SQL Server DR patterns. The primary database takes transaction log backups. Those backups are copied to a secondary server and restored there with NORECOVERY or STANDBY mode.
Log shipping pipeline
Primary database
|
+-- Full backup initializes secondary
|
+-- Transaction log backup job
|
v
Backup share
|
v
Copy job on secondary
|
v
Restore job on secondary
|
v
Secondary database
- restoring mode
- or standby read-only modeWhy log shipping still matters
- Simple: backup, copy, restore.
- Robust: less moving parts than complex HA architectures.
- Cheap: often lower license and infrastructure complexity.
- Delayed restore possible: useful against accidental delete or bad deployment.
- Good for DR: especially when RTO/RPO are not ultra-aggressive.
Limitations
- Failover is usually manual.
- RPO depends on log backup frequency and copy/restore delay.
- RTO includes tail-log backup, restore final logs, application redirection.
- Secondary is not continuously writable.
Log shipping jobs
| Job | Runs where | Purpose | Failure impact |
|---|---|---|---|
| Backup job | Primary | Create transaction log backups | RPO increases, log may grow |
| Copy job | Secondary | Copy log backups from primary/share | Secondary falls behind |
| Restore job | Secondary | Restore copied log backups | Secondary not current |
| Monitor job | Monitor server or SQL Agent | Detect delay and failures | Silent DR drift |
Log shipping failover runbook
Manual log shipping failover:
1. Confirm primary is unavailable or failover approved
2. If primary is reachable, take tail-log backup
3. Copy all remaining log backups to secondary
4. Restore all logs WITH NORECOVERY
5. Restore final log WITH RECOVERY
6. Validate database consistency
7. Redirect application connection
8. Validate application transactions
9. Disable old primary jobs
10. Document new primary state and rebuild DR pathBackup and restore: the foundation of all recovery
No HA architecture replaces backups. Backups are the only reliable answer to many incidents: accidental delete, bad deployment, corruption, ransomware, logical error, data poisoning, application bug or delayed discovery of a problem.
Backup types
| Backup type | Purpose | Typical frequency |
|---|---|---|
| Full | Complete database backup | Daily or weekly depending on size |
| Differential | Changes since last full backup | Hourly or several times per day |
| Transaction log | Log chain and point-in-time recovery | Every 5, 10, 15 minutes depending on RPO |
| Copy-only | Ad hoc backup without disrupting backup chain | Before risky maintenance or migration |
| Tail-log | Final log backup before restore/failover | During disaster or migration |
Restore chain example
Restore sequence for point-in-time recovery:
1. Restore full backup WITH NORECOVERY
2. Restore latest differential WITH NORECOVERY
3. Restore log backups in order WITH NORECOVERY
4. Restore final log to target time WITH RECOVERY
Example:
full: Sunday 00:00
diff: Monday 12:00
logs: every 15 minutes
restore target: Monday 14:37Restore commands
RESTORE DATABASE SalesDB
FROM DISK = 'B:\backup\SalesDB_full.bak'
WITH NORECOVERY;
RESTORE DATABASE SalesDB
FROM DISK = 'B:\backup\SalesDB_diff.bak'
WITH NORECOVERY;
RESTORE LOG SalesDB
FROM DISK = 'B:\backup\SalesDB_log_1430.trn'
WITH NORECOVERY;
RESTORE LOG SalesDB
FROM DISK = 'B:\backup\SalesDB_log_1445.trn'
WITH STOPAT = '2026-05-06T14:37:00',
RECOVERY;Backup strategy by RPO
| RPO target | Backup pattern | Warning |
|---|---|---|
| 24 hours | Daily full backup | Large data loss possible |
| 4 hours | Daily full plus differential every few hours | No point-in-time unless log backups exist |
| 15 minutes | Full plus differential plus log backups every 15 minutes | Requires full recovery model and log chain monitoring |
| Near zero | HA sync plus log backups | HA still does not replace backups |
Backup history query
SELECT TOP (100)
database_name,
type AS backup_type,
backup_start_date,
backup_finish_date,
DATEDIFF(second, backup_start_date, backup_finish_date) AS duration_sec,
backup_size / 1024 / 1024 AS backup_size_mb,
compressed_backup_size / 1024 / 1024 AS compressed_backup_size_mb
FROM msdb.dbo.backupset
ORDER BY backup_finish_date DESC;Last backup per database
SELECT
d.name AS database_name,
MAX(CASE WHEN b.type = 'D' THEN b.backup_finish_date END) AS last_full_backup,
MAX(CASE WHEN b.type = 'I' THEN b.backup_finish_date END) AS last_diff_backup,
MAX(CASE WHEN b.type = 'L' THEN b.backup_finish_date END) AS last_log_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
ON d.name = b.database_name
GROUP BY d.name
ORDER BY d.name;DR scenarios: design for real disasters, not only server crashes
A serious DR plan must cover multiple failure classes. A local node crash is not the same as storage corruption, ransomware, accidental delete, regional outage, identity failure or a bad application deployment. Each scenario needs a different response.
| Scenario | Impact | Best response | What not to do |
|---|---|---|---|
| SQL Server service crash | Local outage | FCI or AG failover | Restore backups unnecessarily |
| Primary node hardware failure | Instance unavailable | Automatic or manual failover | Manually modify apps if listener exists |
| Storage corruption | Data may be damaged | DBCC, restore, page restore, backup strategy | Blind failover if corruption replicated |
| Accidental DELETE | Logical data loss | Point-in-time restore or delayed log shipping | Fail over to replica that already received delete |
| Ransomware | Encrypted or poisoned environment | Isolated immutable/offline backups | Trust online replicas blindly |
| Region outage | Site unavailable | Remote DR replica or restore in DR site | Assume local HA is enough |
| Bad deployment | Schema/data damaged | Rollback scripts, PITR, deployment gates | Rely only on AG failover |
Logical corruption timeline
10:00 deployment starts
10:05 bad script deletes rows
10:06 AG replicates delete
10:10 users report missing data
Failover result:
delete already exists on secondary
Real recovery:
restore backup to 10:04
extract missing data
or point-in-time restore
or use delayed log shipping copyRansomware recovery mindset
Required protections:
- immutable backups
- offline backup copy
- separate credentials
- backup encryption
- restore test in isolated network
- documented clean-room recovery
- identity recovery plan
- application secrets recovery
- DNS and network recovery
- communication planScenario decision tree
Incident detected
|
+-- Is data logically wrong?
| |
| +-- yes --> do not blindly fail over
| use PITR / delayed copy / data repair
|
+-- Is primary node down but data healthy?
| |
| +-- yes --> AG or FCI failover
|
+-- Is whole site down?
| |
| +-- yes --> remote DR activation
|
+-- Is ransomware suspected?
|
+-- yes --> isolate, preserve evidence,
restore from clean immutable backupFailover runbook: planned AG failover
Planned AG failover runbook:
1. Announce maintenance window
2. Confirm replicas are healthy
3. Confirm synchronous replica is synchronized
4. Pause non-critical SQL Agent jobs
5. Check application connection string uses listener
6. Run pre-failover smoke test
7. Perform planned failover
8. Confirm new primary role
9. Confirm databases are online
10. Test application login and critical transaction
11. Check SQL Agent jobs on new primary
12. Check linked servers and external dependencies
13. Monitor waits, errors and AG health
14. Document real RTO
15. Communicate completionEmergency DR activation runbook
Emergency DR runbook:
1. Declare incident severity
2. Confirm primary site status
3. Determine data health
4. Estimate data loss
5. Get business approval if data loss possible
6. Promote DR replica or restore backups
7. Redirect application traffic
8. Validate identity, network, DNS and secrets
9. Execute smoke tests
10. Resume critical jobs only
11. Monitor error rates and performance
12. Communicate new operating mode
13. Preserve evidence and logs
14. Plan failback or rebuild primary
15. Produce incident reportRestore test runbook
Monthly restore test:
1. Select production database sample
2. Restore latest full backup to isolated test server
3. Restore latest differential if used
4. Restore log backups to chosen point in time
5. Run DBCC CHECKDB
6. Validate row counts for key tables
7. Validate application connection
8. Measure total restore duration
9. Compare measured RTO with target RTO
10. Document result and corrective actionsRunbook quality table
| Runbook quality | Bad | Good |
|---|---|---|
| Ownership | Nobody knows who decides | Named incident commander and DBA owner |
| Commands | Vague instructions | Exact commands and validation queries |
| Application tests | Database online only | Business transaction tested |
| Timing | No measured RTO | Start/end times recorded |
| Rollback | No failback plan | Failback/rebuild path documented |
Smoke test examples
After failover, validate:
Database:
- SELECT @@SERVERNAME;
- SELECT DB_NAME();
- check AG role
- check database state
Application:
- user login
- read customer
- create test order
- update test status
- rollback or clean test data
Operations:
- SQL Agent jobs
- linked servers
- file shares
- credentials
- external APIs
- monitoring alertsHA / DR DMV toolkit
HA / DR monitoring must be continuous. The DBA needs visibility on replica role, synchronization health, send queue, redo queue, database state, suspended movement, backup recency, log reuse wait and SQL Agent job failures.
AG replica health
SELECT
ag.name AS availability_group_name,
ar.replica_server_name,
ars.role_desc,
ars.synchronization_health_desc,
ars.connected_state_desc,
ars.operational_state_desc,
ar.availability_mode_desc,
ar.failover_mode_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states AS ars
ON ar.replica_id = ars.replica_id
ORDER BY ag.name, ar.replica_server_name;AG database queues
SELECT
DB_NAME(drs.database_id) AS database_name,
ar.replica_server_name,
drs.synchronization_state_desc,
drs.synchronization_health_desc,
drs.database_state_desc,
drs.is_suspended,
drs.suspend_reason_desc,
drs.log_send_queue_size,
drs.redo_queue_size,
drs.log_send_rate,
drs.redo_rate
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
ON drs.replica_id = ar.replica_id
ORDER BY database_name, ar.replica_server_name;Last backups
SELECT
d.name AS database_name,
d.recovery_model_desc,
MAX(CASE WHEN b.type = 'D' THEN b.backup_finish_date END) AS last_full_backup,
MAX(CASE WHEN b.type = 'I' THEN b.backup_finish_date END) AS last_diff_backup,
MAX(CASE WHEN b.type = 'L' THEN b.backup_finish_date END) AS last_log_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
ON d.name = b.database_name
GROUP BY d.name, d.recovery_model_desc
ORDER BY d.name;Log reuse wait
SELECT
name AS database_name,
recovery_model_desc,
log_reuse_wait_desc
FROM sys.databases
ORDER BY name;SQL Agent failed jobs
SELECT TOP (50)
j.name AS job_name,
h.run_date,
h.run_time,
h.run_duration,
h.message
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j
ON h.job_id = j.job_id
WHERE h.run_status = 0
ORDER BY h.instance_id DESC;HA / DR diagnostic workflow
When HA / DR alert fires:
1. Is the primary online?
2. Are replicas connected?
3. Is synchronization healthy?
4. Is log send queue growing?
5. Is redo queue growing?
6. Is data movement suspended?
7. Are log backups running?
8. Are SQL Agent jobs failing?
9. Is listener reachable?
10. Is application able to connect?
11. Does RPO target remain valid?
12. Does RTO target remain achievable?Monitoring thresholds example
| Signal | Warning | Critical |
|---|---|---|
| Last log backup age | Above RPO target | Several intervals missed |
| Log send queue | Growing steadily | RPO at risk |
| Redo queue | Secondary lag increasing | Failover time at risk |
| Replica disconnected | Short transient disconnect | Replica unavailable |
| Failed backup job | One failure | Backup chain at risk |
DBA checklist for SQL Server HA / DR
This checklist is designed for production readiness reviews. It verifies that the architecture is not only installed, but actually recoverable, monitored, documented and understood by operations and application teams.
| Area | Question | Good signal | Bad signal |
|---|---|---|---|
| RPO | Is data loss target documented? | RPO per application tier | “As little as possible” |
| RTO | Is recovery time measured? | Recent failover or restore drill result | Architecture diagram only |
| AG health | Are replicas synchronized as expected? | Healthy sync state and monitored queues | Unknown lag or suspended data movement |
| Listener | Do apps connect through listener? | Connection strings validated | Hardcoded node names |
| Backups | Are backups restored regularly? | Restore test report | Backup success only |
| Log chain | Are log backups aligned with RPO? | Log backup schedule and alerts | Full recovery with no log backups |
| Instance objects | Are logins, jobs, credentials synchronized? | Automated sync or checklist | Failover succeeds but jobs/logins fail |
| Application retry | Does app survive transient failover errors? | Retry and reconnect tested | Manual restart required |
| DR site | Can DR run the workload? | Capacity and dependencies validated | Replica exists but app stack missing |
| Security | Are credentials and keys recoverable? | Documented secure recovery | Backup restored but encrypted data inaccessible |
Daily checks
Daily:
1. Replica health
2. Database synchronization state
3. Log send queue
4. Redo queue
5. Failed SQL Agent jobs
6. Last full/diff/log backup
7. Log reuse wait
8. Listener availability
9. Error log HA messages
10. Monitoring alert statusMonthly checks
Monthly:
1. Restore test
2. Planned failover test
3. Application reconnect test
4. SQL Agent job validation after failover
5. Login and credential sync review
6. Linked server validation
7. RPO/RTO report
8. DR capacity review
9. Backup retention review
10. Runbook updateFinal HA / DR diagnosis model
A serious HA / DR review answers:
1. What failure are we protecting against?
2. What is the business RPO?
3. What is the business RTO?
4. Which technology covers local HA?
5. Which technology covers remote DR?
6. Which technology covers logical corruption?
7. Are backups restorable?
8. Are applications using the right endpoint?
9. Are logins, jobs and credentials synchronized?
10. Was failover tested recently?
11. Was restore tested recently?
12. Is the measured RTO inside the target?