Change Data Capture (CDC) is great—until it isn’t. Under heavy change volume, the CDC cleanup job may fall behind. When that happens, Change Tables (CTs) grow rapidly, and the cleanup job can block the capture job, which can in turn delay or stall downstream consumers.
This post explains:
how to quickly estimate CT growth,
how the cleanup job actually deletes rows,
and the three main tuning knobs to keep CDC stable.
Quick Health Check: Approximate CT Row Counts
To get a fast row-count estimate of each CDC Change Table:
SELECT
    c.object_id,
    t.name,
    SUM(p.rows) AS rows          -- sum across partitions to avoid double counting
FROM cdc.change_tables c
JOIN sys.tables t
    ON c.object_id = t.object_id
JOIN sys.partitions p
    ON t.object_id = p.object_id
WHERE p.index_id IN (0, 1)       -- heap or clustered index only
GROUP BY c.object_id, t.name
ORDER BY c.object_id;
Example output:
object_id name rows
----------- --------------------- --------
82099333 dbo_customer_CT 167702
98099390 dbo_district_CT 270010
114099447 dbo_item_CT 0
370100359 dbo_new_order_CT 134634
386100416 dbo_warehouse_CT 135522
562101043 dbo_order_line_CT 2018297
578101100 dbo_stock_CT 1343322
1973582069 dbo_orders_CT 191732
--------------------------------------------
Total 4,261,219
CDC Cleanup Defaults (Retention + Threshold)
CDC cleanup behavior is driven mainly by:
- Retention (how long CT rows are kept)
- Threshold (rows deleted per batch)
Check current settings:
EXEC sys.sp_cdc_help_jobs;
Example output:
job_type job_name retention threshold
-------- ----------------- --------- ---------
capture cdc.tpcc_capture 0 0
cleanup cdc.tpcc_cleanup 4320 5000
Defaults:
Retention = 4320 minutes (72 hours / 3 days)
Threshold = 5000 rows per delete batch
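To get a rough feel for the cleanup workload these defaults imply, divide each CT's row count by the threshold. This is a minimal sketch that reuses the row-count query from earlier and assumes every CT row is already eligible for cleanup (in practice, only rows older than the retention window are deleted):

```sql
-- Rough estimate of delete batches per CT at the default threshold of 5000.
SELECT
    t.name,
    SUM(p.rows) AS rows,
    CEILING(SUM(p.rows) / 5000.0) AS estimated_delete_batches
FROM cdc.change_tables c
JOIN sys.tables t ON c.object_id = t.object_id
JOIN sys.partitions p ON t.object_id = p.object_id
WHERE p.index_id IN (0, 1)
GROUP BY t.name
ORDER BY estimated_delete_batches DESC;
```

For the example totals above, roughly 4.26 million rows at 5000 rows per batch works out to about 850 delete statements in a single cleanup pass.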
What the Cleanup Job Actually Does
An Extended Events trace typically reveals that the cleanup job processes the CTs sequentially via a cursor:
DECLARE #hchange_table CURSOR LOCAL FAST_FORWARD
FOR
SELECT capture_instance, start_lsn
FROM [cdc].[change_tables]
WHERE (@capture_instance IS NULL)
OR (capture_instance = @capture_instance);
Because cdc.change_tables is clustered on object_id, the cleanup job tends to process CTs in object_id order. With @p1 = 5000 (the threshold), you'll often see patterns like:
DELETE TOP (@p1) FROM [cdc].[dbo_customer_CT] WHERE __$start_lsn < @p2 (35 times)
DELETE TOP (@p1) FROM [cdc].[dbo_district_CT] WHERE __$start_lsn < @p2 (56 times)
...
DELETE TOP (@p1) FROM [cdc].[dbo_stock_CT] WHERE __$start_lsn < @p2 (323 times)
Large tables can dominate runtime and prevent cleanup from ever catching up.
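The behavior above can be sketched as a batched-delete loop. This is a simplified illustration, not the actual sp_cdc_cleanup_change_table source; @threshold and @cleanup_lsn are stand-ins for the job's real parameters:

```sql
-- Simplified sketch of the per-CT cleanup loop (illustrative only).
DECLARE @threshold bigint = 5000;
DECLARE @cleanup_lsn binary(10);   -- low watermark derived from the retention setting
DECLARE @rowcount int = 1;

WHILE @rowcount > 0
BEGIN
    DELETE TOP (@threshold)
    FROM [cdc].[dbo_customer_CT]   -- repeated for each CT, in object_id order
    WHERE __$start_lsn < @cleanup_lsn;

    SET @rowcount = @@ROWCOUNT;    -- stop once a batch deletes nothing
END
```

This loop shape explains the trace output: each CT produces ceiling(eligible_rows / threshold) DELETE statements, so a single large CT like dbo_stock_CT can dominate the entire cleanup run.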
The Three Tuning Knobs
1) Adjust the Threshold
A higher threshold deletes more rows per batch, which is often more efficient but can increase blocking and lock contention.
A lower threshold keeps each delete small and light, but requires more loop iterations and can lengthen the total runtime.
Example:
EXEC sys.sp_cdc_change_job
@job_type = N'cleanup',
@threshold = 2000;
2) Reduce Retention
Reducing retention means less data to keep, making cleanup easier. But consumers must be able to ingest changes within the retention window.
Example (2160 minutes = 36 hours):
EXEC sys.sp_cdc_change_job
@job_type = N'cleanup',
@retention = 2160;
3) Run Cleanup More Frequently
First, find the schedule ID:
USE msdb;
GO
SELECT j.name AS job_name,
s.schedule_id,
s.name AS schedule_name
FROM dbo.sysjobs j
JOIN dbo.sysjobschedules js
ON j.job_id = js.job_id
JOIN dbo.sysschedules s
ON js.schedule_id = s.schedule_id
WHERE j.name = N'cdc.tpcc_cleanup';
Then update it (e.g., every 15 minutes):
USE msdb;
GO
EXEC dbo.sp_update_schedule
@schedule_id = 171, -- from query above
@enabled = 1,
@freq_type = 4, -- daily
@freq_interval = 1,
@freq_subday_type = 4, -- minutes
@freq_subday_interval = 15,
@active_start_time = 000000; -- midnight
Last Resort: Truncating CT Tables (High Risk)
If cleanup still can’t keep up, truncating CT tables may be the fastest recovery path—but it irreversibly deletes change history.
Safe sequence:
- Stop the capture job
- Ensure consumers ingest remaining changes
- Truncate CT tables
- Restart capture job
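The sequence above can be sketched as follows. This assumes your environment permits truncating CT tables directly, and the CT names shown are examples from this post's database; substitute your own capture instances. Only run something like this with stakeholder sign-off, because the change history is lost:

```sql
-- 1) Stop the capture job
EXEC sys.sp_cdc_stop_job @job_type = N'capture';

-- 2) Confirm downstream consumers have ingested all remaining changes
--    (verified outside this script)

-- 3) Truncate the change tables
TRUNCATE TABLE [cdc].[dbo_order_line_CT];
TRUNCATE TABLE [cdc].[dbo_stock_CT];
-- ...repeat for each CT

-- 4) Restart the capture job
EXEC sys.sp_cdc_start_job @job_type = N'capture';
```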
Warnings:
- You may create data gaps for downstream consumers.
- Consumers may need a full reload/re-baseline after truncation.
- Do this only with stakeholder approval and a clear recovery plan.