The Delete TOP N Behavior
When at least N qualifying rows exist, DELETE TOP (N) stops scanning the range as soon as N rows have been deleted.
When fewer than N qualifying rows exist (for example, N-1), DELETE TOP (N) scans the entire range. If the range contains a very large number of rows, the scan can take a long time, and the locks are held until the scan finishes, which can cause blocking.
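This behavior can be observed on a small test table. The sketch below uses hypothetical table and column names and SET STATISTICS IO to compare logical reads between the two cases:

```sql
-- A minimal sketch (table name and data are hypothetical).
-- Populate a table with 10,000 qualifying rows.
create table t (id int identity primary key, v int);
insert t (v)
select top (10000) 1
from sys.all_objects a cross join sys.all_objects b;

set statistics io on;

-- Case 1: more than N qualifying rows: the scan stops after 100 rows
-- are found, so logical reads stay small.
delete top (100) from t where v = 1;

-- Case 2: fewer than N qualifying rows: reduce the table to 99 rows,
-- then the TOP (100) delete must scan the whole range to prove that
-- fewer than 100 rows qualify.
delete from t where id > 99;
delete top (100) from t where v = 1;

set statistics io off;
```

On a large table, the second case is where the long scan and the extended lock duration come from.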
For example, the distribution cleanup job calls the delete statements in a "WHILE 1 = 1" loop from the following stored procedures:
-- sp_MSdelete_publisherdb_trans, line 202:
delete TOP(2000) MSrepl_commands WITH (PAGLOCK)
from MSrepl_commands with (INDEX(ucMSrepl_commands))

-- sp_MSdelete_dodelete, line 38:
delete TOP(5000) MSrepl_transactions WITH (PAGLOCK)
from MSrepl_transactions with (INDEX(ucMSrepl_transactions))
The following extended event session can be used to capture the query plans for lines 202 and 38 of the respective stored procedures:
CREATE EVENT SESSION [distribution_cleanup] ON SERVER
ADD EVENT sqlserver.query_post_execution_showplan(
    ACTION(sqlserver.query_plan_hash, sqlserver.tsql_stack)
    WHERE (
        ([object_name] = N'sp_MSdelete_publisherdb_trans'
         AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[tsql_stack],
             N'%frame level=''1''%line=''202''%frame level=''2''%'))
        OR
        ([object_name] = N'sp_MSdelete_dodelete'
         AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[tsql_stack],
             N'%frame level=''1''%line=''38''%frame level=''2''%'))
    )),
ADD EVENT sqlserver.sp_statement_completed(
    SET collect_object_name = (1), collect_statement = (1)
    ACTION(sqlserver.query_plan_hash, sqlserver.tsql_stack)
    WHERE (
        ([object_name] = N'sp_MSdelete_publisherdb_trans'
         AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[tsql_stack],
             N'%frame level=''1''%line=''202''%frame level=''2''%'))
        OR
        ([object_name] = N'sp_MSdelete_dodelete'
         AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[tsql_stack],
             N'%frame level=''1''%line=''38''%frame level=''2''%'))
    ))
ADD TARGET package0.event_file(SET filename = N'c:\temp\distribution_cleanup.xel')
WITH (TRACK_CAUSALITY = ON)
GO
In this particular case, the table MSrepl_commands had about 600 million rows. Each iteration deleted 2,000 rows in a few milliseconds, but the last iteration deleted 1,352 rows and took 19 minutes.
The following query extracts the query plans from the .xel file so we can visualize them in SSMS:
SELECT event_data = CONVERT(XML, event_data)
INTO #tmp
FROM sys.fn_xe_file_target_read_file(
    N'C:\temp\distribution_cleanup_0_131593922293630000*.xel', NULL, NULL, NULL);

SELECT t.event_data.value('(/event/@timestamp)[1]', 'datetime2') AS event_time
     , t.event_data.value('(/event/data[@name="duration"]/value)[1]', 'bigint') AS duration
     , t.event_data.value('(/event/data[@name="object_name"]/value)[1]', 'varchar(255)') AS object_name
     , actual_plan = x.xml_fragment.query('.')
FROM #tmp AS t
CROSS APPLY t.event_data.nodes(N'/event/data[@name="showplan_xml"]/value/*') AS x(xml_fragment)
ORDER BY event_time
Comparing the query plan that deletes 2,000 rows with the plan that deletes 1,352 rows, the former reads 2,000 rows while the latter reads more than 200 million rows due to the residual predicate.
This explains why the locks are held for a long time. The blocking itself is explained by the clustered index definition shown below.
The Clustered Index
Both tables, MSrepl_commands and MSrepl_transactions, have a similar clustered index with publisher_database_id as the leading key column. Thus, rows are stored in order of publisher_database_id. An example for MSrepl_transactions is shown below.
If there is more than one publisher_database_id, the last page of the first publisher_database_id's range contains not only the newest rows for the first publisher_database_id but also the oldest rows for the second one. Consequently, a delete WITH (PAGLOCK) of the oldest rows for the second publisher_database_id will block inserts of new rows by the log reader for the first publisher_database_id.
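One way to confirm that rows for both publisher_database_id values share a page is the %%physloc%% virtual column. This is a sketch only: %%physloc%% and sys.fn_PhysLocFormatter are undocumented and unsupported, and the publisher_database_id values 3 and 4 are from this article's test environment.

```sql
-- Sketch (undocumented, unsupported): show the file:page:slot of the
-- boundary rows on each side of the publisher_database_id ranges.
select top (5) publisher_database_id, xact_seqno,
       sys.fn_PhysLocFormatter(%%physloc%%) as page_location
from MSrepl_transactions
where publisher_database_id = 3
order by xact_seqno desc;   -- newest rows for the first publisher

select top (5) publisher_database_id, xact_seqno,
       sys.fn_PhysLocFormatter(%%physloc%%) as page_location
from MSrepl_transactions
where publisher_database_id = 4
order by xact_seqno asc;    -- oldest rows for the second publisher
```

If the two result sets report the same page id, both ranges share that boundary page.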
Example
In my test environment, the two publisher databases pub1 and pub2 have publisher_database_id = 3 and publisher_database_id = 4, respectively.
-- create databases
use master
create database pub1
create database pub2
go

-- create tables
use pub1
create table test1(a int not null identity, b datetime, c varchar(255)
    constraint test1_pk primary key (a))
use pub2
create table test2(a int not null identity, b datetime, c varchar(255)
    constraint test2_pk primary key (a))
GO

-- Adding the transactional publication with immediate sync
use [pub1]
exec sp_replicationdboption @dbname = N'pub1', @optname = N'publish', @value = N'true'
exec sp_addpublication @publication = N'test_pub1', @allow_anonymous = N'true',
    @immediate_sync = N'true', @independent_agent = N'true', @status = N'active'
exec sp_addpublication_snapshot @publication = N'test_pub1', @frequency_type = 1
exec sp_addarticle @publication = N'test_pub1', @article = N'test1',
    @source_owner = N'dbo', @source_object = N'test1'
GO

-- Adding the transactional publication with immediate sync
use [pub2]
exec sp_replicationdboption @dbname = N'pub2', @optname = N'publish', @value = N'true'
exec sp_addpublication @publication = N'test_pub2', @allow_anonymous = N'true',
    @immediate_sync = N'true', @independent_agent = N'true', @status = N'active'
exec sp_addpublication_snapshot @publication = N'test_pub2', @frequency_type = 1
exec sp_addarticle @publication = N'test_pub2', @article = N'test2',
    @source_owner = N'dbo', @source_object = N'test2'
GO

-- disable distribution cleanup job
exec msdb..sp_update_job @job_name = 'Distribution clean up: distribution', @enabled = 0

-- add 5999 rows at 2:43 PM
use pub2
insert test2(b,c) values (getdate(), replicate('x',200))
go 5999

-- add 21000 rows at 3:44 PM (one hour later)
use pub1
insert test1(b,c) values (getdate(), replicate('x',200))
go 1000
use pub2
insert test2(b,c) values (getdate(), replicate('x',200))
go 20000

-- verify there are 26999 rows
use distribution
select count(*) from MSrepl_transactions
select count(*) from MSrepl_commands

-- get root page
select allocated_page_file_id, allocated_page_page_id,
    next_page_page_id, previous_page_page_id, page_level
from sys.dm_db_database_page_allocations(db_id(), object_id('MSrepl_transactions'), 1, null, 'DETAILED')
where page_type_desc is not null and page_type_desc = 'INDEX_PAGE'
DBCC TRACEON(3604)
DBCC PAGE (distribution, 1, 72305, 3);
DBCC TRACEOFF(3604)
The last page for the publisher_database_id = 3 range is 72702 and the first page for the publisher_database_id = 4 range is 72629.
To simulate a long duration, we can execute the distribution cleanup procedure inside an open transaction:
use distribution
begin tran
EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 1

Output:
Removed 5999 replicated transactions consisting of 5999 statements in 1 seconds (11998 rows/sec).
The cleanup job removed the 5,999 rows we added one hour earlier for publisher_database_id = 4 and held an X lock on page 72702, which contains the oldest rows for publisher_database_id = 4. We can see the locks using this query (68 is the session id of the open cleanup transaction):
select * from sys.dm_tran_locks where request_session_id = 68 and request_mode = 'X'
Then, from another session, insert a new row into pub1 so the log reader has work to do:

use pub1
insert test1(b,c) values (getdate(), replicate('x',200))
go
The log reader agent for publisher_database_id = 3 is blocked by the delete for publisher_database_id = 4, as shown in the sp_WhoIsActive output, because page 72702 also contains the newest rows for publisher_database_id = 3.
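If sp_WhoIsActive is not available, the same blocking chain can be observed directly from the DMVs. This is a sketch using only documented views:

```sql
-- Sketch: list blocked sessions, their blockers, and the blocked statement.
select r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text as blocked_statement
from sys.dm_exec_requests as r
cross apply sys.dm_exec_sql_text(r.sql_handle) as t
where r.blocking_session_id <> 0;
```

In this scenario, the log reader's session should appear with blocking_session_id equal to the cleanup session holding the page X lock.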
Workaround
We can prevent "delete TOP N" from deleting fewer than N rows, and therefore from reading millions of rows, by modifying the delete loop in the sp_MSdelete_publisherdb_trans stored procedure to iterate a fixed number of times so that only a multiple of N rows is deleted per run.
Code snippet for the MSrepl_commands table:
select @max = count(*)/2000
from MSrepl_commands with (INDEX(ucMSrepl_commands), nolock)
where publisher_database_id = @publisher_database_id
    and xact_seqno <= @max_xact_seqno
    and (type & ~@snapshot_bit) not in (@directory_type, @alt_directory_type)
    and (type & ~@replpost_bit) <> @scriptexec_type
OPTION (MAXDOP 4)

WHILE @i <= @max
BEGIN
    DELETE TOP(2000) MSrepl_commands WITH (PAGLOCK)
    from MSrepl_commands with (INDEX(ucMSrepl_commands))
    where publisher_database_id = @publisher_database_id
        and xact_seqno <= @max_xact_seqno
        and (type & ~@snapshot_bit) not in (@directory_type, @alt_directory_type)
        and (type & ~@replpost_bit) <> @scriptexec_type
    OPTION (MAXDOP 1)

    select @row_count = @@rowcount

    -- Update output parameter
    select @num_commands = @num_commands + @row_count
    set @i = @i + 1
END
Code snippet for the MSrepl_transactions table:
select @max = count(*)/5000
from MSrepl_transactions with (INDEX(ucMSrepl_transactions), nolock)
where publisher_database_id = @publisher_database_id
    and xact_seqno <= @max_xact_seqno
    and xact_seqno <> @last_xact_seqno
    and xact_seqno <> @last_log_xact_seqno
OPTION (MAXDOP 4)

WHILE @i <= @max
BEGIN
    exec dbo.sp_MSdelete_dodelete @publisher_database_id, @max_xact_seqno,
        @last_xact_seqno, @last_log_xact_seqno, @has_immediate_sync

    select @row_count = @@rowcount

    -- Update output parameter
    select @num_transactions = @num_transactions + @row_count
    set @i = @i + 1
END
Note that Microsoft does not support modifying system stored procedures, so test this at your own risk.