This post explains:
how to quickly estimate CT growth,
how the cleanup job actually deletes rows,
and the three main tuning knobs to keep CDC stable.
Quick Health Check: Approximate CT Row Counts
To get a fast row-count estimate of each CDC Change Table:
SELECT c.object_id, t.name, p.rows
FROM cdc.change_tables c
JOIN sys.tables t ON c.object_id = t.object_id
JOIN sys.partitions p ON t.object_id = p.object_id
WHERE p.index_id IN (0, 1)
ORDER BY c.object_id;

Example output:

object_id    name                 rows
-----------  -------------------  ---------
82099333     dbo_customer_CT      167702
98099390     dbo_district_CT      270010
114099447    dbo_item_CT          0
370100359    dbo_new_order_CT     134634
386100416    dbo_warehouse_CT     135522
562101043    dbo_order_line_CT    2018297
578101100    dbo_stock_CT         1343322
1973582069   dbo_orders_CT        191732
-----------  -------------------  ---------
Total                             4,261,219

CDC Cleanup Defaults (Retention + Threshold)
CDC cleanup behavior is driven mainly by:
- Retention (how long CT rows are kept)
- Threshold (rows deleted per batch)
Check current settings:
EXEC sys.sp_cdc_help_jobs;

Example output:

job_type  job_name           retention  threshold
--------  -----------------  ---------  ---------
capture   cdc.tpcc_capture   0          0
cleanup   cdc.tpcc_cleanup   4320       5000

Defaults:
- Retention = 4320 minutes (72 hours / 3 days)
- Threshold = 5000 rows per delete batch
What the Cleanup Job Actually Does
An Extended Events trace typically reveals that the cleanup job processes CTs sequentially via a cursor:
DECLARE #hchange_table CURSOR LOCAL FAST_FORWARD FOR
SELECT capture_instance, start_lsn
FROM [cdc].[change_tables]
WHERE (@capture_instance IS NULL)
   OR (capture_instance = @capture_instance);

Because cdc.change_tables is clustered by object_id, cleanup tends to delete from CTs in object_id order. With @p1 = 5000 (the threshold), you'll often see patterns like:

DELETE TOP (@p1) FROM [cdc].[dbo_customer_CT] WHERE __$start_lsn < @p2  -- 35 times
DELETE TOP (@p1) FROM [cdc].[dbo_district_CT] WHERE __$start_lsn < @p2  -- 56 times
...
DELETE TOP (@p1) FROM [cdc].[dbo_stock_CT]    WHERE __$start_lsn < @p2  -- 323 times

Large tables can dominate runtime and prevent cleanup from ever catching up.
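As a rough planning aid, the health-check query above can be extended to estimate how many delete batches cleanup will need per CT at a given threshold (a sketch only; substitute your configured threshold for the 5000 literal, and expect the real batch counts to drift upward as new changes accumulate during the run):

```sql
-- Estimate how many DELETE TOP (@threshold) iterations cleanup
-- needs per change table, assuming the default threshold of 5000.
SELECT t.name,
       p.rows,
       CEILING(p.rows / 5000.0) AS est_delete_batches
FROM cdc.change_tables c
JOIN sys.tables t ON c.object_id = t.object_id
JOIN sys.partitions p ON t.object_id = p.object_id
WHERE p.index_id IN (0, 1)
ORDER BY p.rows DESC;
```

Sorting by row count puts the CTs that dominate cleanup runtime at the top of the result.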
The Three Tuning Knobs
1) Adjust the Threshold
Higher threshold = deletes more per batch (often more efficient), but can increase contention.
Lower threshold = smaller deletes, but more loops and potentially longer runtime.
Example:
EXEC sys.sp_cdc_change_job
    @job_type = N'cleanup',
    @threshold = 2000;

2) Reduce Retention
Reducing retention means less data to keep, making cleanup easier. But consumers must be able to ingest changes within the retention window.
Example (2160 minutes = 36 hours):
EXEC sys.sp_cdc_change_job
    @job_type = N'cleanup',
    @retention = 2160;

3) Run Cleanup More Frequently
First, find the schedule ID:
USE msdb;
GO
SELECT j.name AS job_name, s.schedule_id, s.name AS schedule_name
FROM dbo.sysjobs j
JOIN dbo.sysjobschedules js ON j.job_id = js.job_id
JOIN dbo.sysschedules s ON js.schedule_id = s.schedule_id
WHERE j.name = N'cdc.tpcc_cleanup';

Then update it (e.g., every 15 minutes):
USE msdb;
GO
EXEC dbo.sp_update_schedule
    @schedule_id = 171,           -- from the query above
    @enabled = 1,
    @freq_type = 4,               -- daily
    @freq_interval = 1,
    @freq_subday_type = 4,        -- minutes
    @freq_subday_interval = 15,
    @active_start_time = 000000;  -- midnight

Last Resort: Truncating CT Tables (High Risk)
If cleanup still can’t keep up, truncating the CT tables may be the fastest recovery path, but it irreversibly deletes change history.
Safe sequence:
- Stop the capture job
- Ensure consumers ingest remaining changes
- Truncate CT tables
- Restart capture job
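The sequence above might look like the following sketch. It is illustrative only: dbo_customer_CT stands in for your own capture-instance tables, and step 2 must be verified out of band before any TRUNCATE runs.

```sql
-- 1) Stop the capture job so no new rows land in the CT tables.
EXEC sys.sp_cdc_stop_job @job_type = N'capture';

-- 2) (Out of band) confirm every downstream consumer has ingested
--    all remaining changes before proceeding.

-- 3) Truncate each change table (repeat per capture instance).
TRUNCATE TABLE cdc.dbo_customer_CT;

-- 4) Restart the capture job.
EXEC sys.sp_cdc_start_job @job_type = N'capture';
```

Stopping the capture job first matters: it prevents new change rows from arriving mid-truncate and gives consumers a stable cutoff point to reconcile against.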
Warnings:
- You may create data gaps for downstream consumers.
- Consumers may need a full reload/re-baseline after truncation.
- Do this only with stakeholder approval and a clear recovery plan.