The distribution cleanup job may block the Log Reader Agent for several minutes when there is a large number of rows to keep and the last "delete TOP N" iteration deletes fewer than N rows.
The Delete TOP N Behavior
When N or more rows qualify, delete TOP N stops scanning the range as soon as N rows have been deleted.
When only N-1 rows qualify, delete TOP N scans the entire range looking for an Nth row. If the range contains a very large number of rows, the scan can take a long time, and the locks are held until the scan finishes, which can cause blocking.
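The difference is easy to reproduce on any table. The sketch below uses a hypothetical table (not part of the replication schema) and SET STATISTICS IO to compare the logical reads of both cases:

```sql
-- Hypothetical repro table; names are illustrative only.
create table dbo.delete_top_demo
(
    id  int not null primary key clustered,
    val int not null
)

-- 100,000 rows; only the first 5 have val = 1.
insert dbo.delete_top_demo (id, val)
select n, case when n <= 5 then 1 else 2 end
from (select top (100000) row_number() over (order by (select null)) as n
      from sys.all_objects as a cross join sys.all_objects as b) as nums

set statistics io on

-- 5 qualifying rows: delete TOP (3) stops after the 3rd match (few logical reads).
delete top (3) from dbo.delete_top_demo where val = 1

-- Only 2 qualifying rows remain: delete TOP (3) never reaches 3 matches,
-- so it scans all 100,000 rows looking for a 3rd one (many logical reads).
delete top (3) from dbo.delete_top_demo where val = 1
```

The first delete stops after the third matching row; the second never finds a third match, so it reads the whole table before giving up, and it holds its locks for the entire scan.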
For example, the distribution cleanup job calls the delete statements in a "WHILE 1 = 1" loop from the following stored procedures:
-- sp_MSdelete_publisherdb_trans line 202:
delete TOP(2000) MSrepl_commands WITH (PAGLOCK) from MSrepl_commands with (INDEX(ucMSrepl_commands))
-- sp_MSdelete_dodelete line 38:
delete TOP(5000) MSrepl_transactions WITH (PAGLOCK) from MSrepl_transactions with (INDEX(ucMSrepl_transactions))
The following extended event session can be used to capture the query plans for line 202 and line 38 of the respective stored procedures:
CREATE EVENT SESSION [distribution_cleanup] ON SERVER
ADD EVENT sqlserver.query_post_execution_showplan(
ACTION(sqlserver.query_plan_hash,sqlserver.tsql_stack)
WHERE (([object_name]=N'sp_MSdelete_publisherdb_trans'
AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[tsql_stack],N'%frame level=''1''%line=''202''%frame level=''2''%')
OR ([object_name]=N'sp_MSdelete_dodelete'
AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[tsql_stack],N'%frame level=''1''%line=''38''%frame level=''2''%'))))),
ADD EVENT sqlserver.sp_statement_completed(SET collect_object_name=(1),collect_statement=(1)
ACTION(sqlserver.query_plan_hash,sqlserver.tsql_stack)
WHERE (([object_name]=N'sp_MSdelete_publisherdb_trans'
AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[tsql_stack],N'%frame level=''1''%line=''202''%frame level=''2''%')
OR ([object_name]=N'sp_MSdelete_dodelete'
AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[tsql_stack],N'%frame level=''1''%line=''38''%frame level=''2''%')))))
ADD TARGET package0.event_file(SET filename=N'c:\temp\distribution_cleanup.xel')
WITH (TRACK_CAUSALITY=ON)
GO
In this particular case, the table MSrepl_commands had about 600 million rows. Each iteration deleted 2,000 rows in a few milliseconds, but the last iteration deleted only 1,352 rows and took 19 minutes.
The following query extracts the query plans from the .xel file so we can use SSMS to visualize them:
SELECT event_data = CONVERT(XML, event_data)
INTO #tmp
FROM sys.fn_xe_file_target_read_file( N'C:\temp\distribution_cleanup_0_131593922293630000*.xel', NULL, NULL, NULL);
SELECT
t.event_data.value('(/event/@timestamp)[1]', 'datetime2') AS event_time
,t.event_data.value('(/event/data[@name="duration"]/value)[1]', 'bigint') AS duration
,t.event_data.value('(/event/data[@name="object_name"]/value)[1]', 'varchar(255)') AS object_name
,actual_plan = x.xml_fragment.query('.')
FROM #tmp AS t
CROSS APPLY t.event_data.nodes(N'/event/data[@name="showplan_xml"]/value/*')
AS x(xml_fragment)
order by event_time
Comparing the query plan of an iteration that deletes 2,000 rows with the plan of the last iteration that deletes 1,352 rows, the former reads 2,000 rows while the latter reads more than 200 million rows because of the residual predicate.
This explains why the lock is held for a long time; the blocking itself is explained by the clustered index definition described in the next section.
The Clustered Index
Both tables, MSrepl_commands and MSrepl_transactions, have a similar clustered index with publisher_database_id as the leading key column, so rows are stored in order of publisher_database_id. An example for MSrepl_transactions is shown below.
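A reconstruction of that index for illustration (verify against your own distribution database, since exact definitions can vary by version):

```sql
-- Reconstructed for illustration; verify with:
--   exec sp_helpindex 'dbo.MSrepl_transactions'
CREATE UNIQUE CLUSTERED INDEX ucMSrepl_transactions
    ON dbo.MSrepl_transactions (publisher_database_id, xact_seqno)
```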
If there is more than one publisher_database_id, the last page of the first publisher_database_id range contains not only the newest rows for the first publisher_database_id but also the oldest rows for the second one. As a result, a delete with (PAGLOCK) of the oldest rows for the second publisher_database_id will block the log reader's insert of new rows for the first publisher_database_id.
Example
In my test environment, two publisher databases, pub1 and pub2, have publisher_database_id = 3 and publisher_database_id = 4 respectively.
-- create databases
use master
create database pub1
create database pub2
go
-- create tables
use pub1
create table test1(a int not null identity, b datetime, c varchar(255)
constraint test1_pk primary key (a)
)
use pub2
create table test2(a int not null identity, b datetime, c varchar(255)
constraint test2_pk primary key (a)
)
GO
-- Adding the transactional publication with immediate synch
use [pub1]
exec sp_replicationdboption @dbname = N'pub1', @optname = N'publish', @value = N'true'
exec sp_addpublication @publication = N'test_pub1', @allow_anonymous = N'true', @immediate_sync = N'true', @independent_agent = N'true', @status = N'active'
exec sp_addpublication_snapshot @publication = N'test_pub1', @frequency_type = 1
exec sp_addarticle @publication = N'test_pub1', @article = N'test1', @source_owner = N'dbo', @source_object = N'test1'
GO
-- Adding the transactional publication with immediate synch
use [pub2]
exec sp_replicationdboption @dbname = N'pub2', @optname = N'publish', @value = N'true'
exec sp_addpublication @publication = N'test_pub2', @allow_anonymous = N'true', @immediate_sync = N'true', @independent_agent = N'true', @status = N'active'
exec sp_addpublication_snapshot @publication = N'test_pub2', @frequency_type = 1
exec sp_addarticle @publication = N'test_pub2', @article = N'test2', @source_owner = N'dbo', @source_object = N'test2'
GO
-- disable distribution cleanup job
exec msdb..sp_update_job @job_name = 'Distribution clean up: distribution', @enabled = 0
-- add 5999 rows at 2:43 PM
use pub2
insert test2(b,c)
values (getdate(),replicate('x',200))
go 5999
-- add 21000 rows at 3:44 PM (one hour later)
use pub1
insert test1(b,c)
values (getdate(),replicate('x',200))
go 1000
use pub2
insert test2(b,c)
values (getdate(),replicate('x',200))
go 20000
-- verify there are 26,999 rows
use distribution
select count(*) from MSrepl_transactions
select count(*) from MSrepl_commands
-- get the index pages (the root page has the highest page_level)
select allocated_page_file_id, allocated_page_page_id, next_page_page_id, previous_page_page_id, page_level
from sys.dm_db_database_page_allocations(db_id(),object_id('MSrepl_transactions'),1,null,'DETAILED')
where page_type_desc = 'INDEX_PAGE'
order by page_level desc
DBCC TRACEON(3604)
DBCC PAGE (distribution, 1, 72305, 3);
DBCC TRACEOFF(3604)
From the DBCC PAGE output, the last page of the publisher_database_id = 3 range is 72702, and the first page whose first key belongs to publisher_database_id = 4 is 72629; the boundary page 72702 also holds the oldest rows for publisher_database_id = 4.
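As an alternative to DBCC PAGE, the undocumented %%physloc%% virtual column together with sys.fn_PhysLocFormatter (available since SQL Server 2008) can show which page each row lives on. This is a sketch, not part of the original steps:

```sql
use distribution
-- Newest rows for publisher_database_id = 3 ...
select top (3) publisher_database_id, xact_seqno,
       sys.fn_PhysLocFormatter(%%physloc%%) as file_page_slot
from dbo.MSrepl_transactions
where publisher_database_id = 3
order by xact_seqno desc

-- ... and oldest rows for publisher_database_id = 4.
-- If the boundary page is shared, both queries report the same page id.
select top (3) publisher_database_id, xact_seqno,
       sys.fn_PhysLocFormatter(%%physloc%%) as file_page_slot
from dbo.MSrepl_transactions
where publisher_database_id = 4
order by xact_seqno asc
```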
To simulate a long-running delete, we can execute the distribution cleanup procedure inside an open transaction:
use distribution
begin tran
EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 1
Removed 5999 replicated transactions consisting of 5999 statements in 1 seconds (11998 rows/sec).
The cleanup job removed the 5,999 rows we added 1 hour ago for publisher_database_id = 4 and held an X lock on page 72702, which contains the oldest rows for publisher_database_id = 4. We can see the locks with the following query (session 68 is the session running the cleanup):
select * from sys.dm_tran_locks where request_session_id = 68 and request_mode = 'X'
In another session, we can execute the following script to add new rows for publisher_database_id = 3 (pub1):
use pub1
insert test1(b,c)
values (getdate(),replicate('x',200))
go
The log reader agent for publisher_database_id = 3 is blocked by the delete of the publisher_database_id = 4 rows, as shown in the sp_WhoIsActive output, because page 72702 also contains the newest rows for publisher_database_id = 3.
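If sp_WhoIsActive is not installed, a plain DMV query (a minimal sketch) shows the same blocking chain:

```sql
-- Blocked requests, their blockers, and the wait resource (the page, in our case)
select r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_resource,
       t.text as sql_text
from sys.dm_exec_requests as r
cross apply sys.dm_exec_sql_text(r.sql_handle) as t
where r.blocking_session_id <> 0
```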
Workaround
We can avoid the case where "delete TOP N" finds fewer than N rows, and so prevent it from reading millions of rows, by modifying the delete loop in the sp_MSdelete_publisherdb_trans stored procedure to iterate only while full batches of N rows remain.
Code snippet for the MSrepl_commands table:
declare @i int = 1, @max int
select @max = count(*)/2000 from MSrepl_commands with (INDEX(ucMSrepl_commands), nolock) where
publisher_database_id = @publisher_database_id and
xact_seqno <= @max_xact_seqno and
(type & ~@snapshot_bit) not in (@directory_type, @alt_directory_type) and
(type & ~@replpost_bit) <> @scriptexec_type
OPTION (MAXDOP 4)
WHILE @i <= @max
BEGIN
DELETE TOP(2000) MSrepl_commands WITH (PAGLOCK) from MSrepl_commands with (INDEX(ucMSrepl_commands)) where
publisher_database_id = @publisher_database_id and
xact_seqno <= @max_xact_seqno and
(type & ~@snapshot_bit) not in (@directory_type, @alt_directory_type) and
(type & ~@replpost_bit) <> @scriptexec_type
OPTION (MAXDOP 1)
select @row_count = @@rowcount
-- Update output parameter
select @num_commands = @num_commands + @row_count
set @i=@i+1
END
Code snippet for the MSrepl_transactions table:
declare @i int = 1, @max int
select @max = count(*)/5000 from MSrepl_transactions with (INDEX(ucMSrepl_transactions), nolock) where
publisher_database_id = @publisher_database_id and
xact_seqno <= @max_xact_seqno and
xact_seqno <> @last_xact_seqno and
xact_seqno <> @last_log_xact_seqno
OPTION (MAXDOP 4)
WHILE @i <= @max
BEGIN
exec dbo.sp_MSdelete_dodelete @publisher_database_id,
@max_xact_seqno,
@last_xact_seqno,
@last_log_xact_seqno,
@has_immediate_sync
select @row_count = @@rowcount
-- Update output parameter
select @num_transactions = @num_transactions + @row_count
set @i=@i+1
END
Note that Microsoft does not support modifying system stored procedures, so apply this workaround at your own risk and test it thoroughly.