Pages

Showing posts with label Replication. Show all posts
Showing posts with label Replication. Show all posts

Tuesday, December 30, 2025

Not Able to Add Article or Subscription After Upgrading

In SQL 2022, there is change in the linked server format used by replication for the listener subscriber with non default port, as shown below, the old format name is LISTENER,54321 and the new format is LISTENER and the port is in the provider string select name, provider_string from sys.servers
name              provider_string
LISTENER,54321    NULL
LISTENER          addr=tcp:LISTENER,54321
After the upgrade to SQL 2022 a couple of behaviors where observed: 

The first one is adding an article to a publication of a subscriber using the old format won’t generate a snapshot: A snapshot was not generated because no subscription needed initialization 

The second one is creating a new publication for an existing subscriber in the old format will get this error:

'SQL1' is not defined as a Subscriber for 'SQL2'. Could not update the distribution database subscription table. The subscription status could not be changed. The subscription could not be created. The subscription could not be found. Changed database context to 'dummy'. (Microsoft SQL Server, Error: 20032) 

Or 

Cannot insert the value NULL into column ‘freq_subday_interval’, table ‘distribution.dbo.MSrepl_agent_jobs’; column doen not allow nulls. UPDATE fails. Could not update the distribution database subscription table. The subscription status could not be changed. The subscription could not be created. The subscription could not be found. Changed database context to 'dummy'. (Microsoft SQL Server, Error: 20032) 

The workaround is to enable TF 15005 in both the publisher and distributor as well as creating aliases for the listener subscribers with non default port numbers, example LISTENER3 

If publisher and distributor are AG then aliases for the listener publisher and listener distributor are also needed, LISTENER1 and LISTENER2, respectively, in both 32 and 64-bit SQL Native Client, as shown below, in a PowerShell script
$aliases = @(
   @{name = 'LISTENER1'; value = 'DBMSSOCN,LISTENER1,54321'}
   @{name = 'LISTENER2'; value = 'DBMSSOCN,LISTENER2,54321'}
   @{name = 'LISTENER3'; value = 'DBMSSOCN,LISTENER3,54321'}
)

$registryPaths = @(
   'HKLM:\SOFTWARE\Microsoft\MSSQLServer\Client\ConnectTo'
   'HKLM:\SOFTWARE\WOW6432Node\Microsoft\MSSQLServer\Client\ConnectTo'
)
foreach ($alias in $aliases) {
   foreach ($registryPath in $registryPaths) {
      if (-not (Test-Path $registryPath)) {
          New-Item -Path $registryPath -Force
      }
      New-ItemProperty -Path $registryPath -Name $alias.name `
      -Value $alias.value -PropertyType String -Force
   }
}
Then you should be able to add the article or to create the subscription without specifying the port number

Friday, May 17, 2024

Multi-Subnet Log Reader Agent

To connect the replication log reader agent to a multi-subnet Always On publisher you must add the parameter -MultiSubnetFailover 1 to the job step as documented here. Example: 

-Publisher [SQLPUB1] -PublisherDB [dummy] -Distributor [SQLDIST] -DistributorSecurityMode 1 -Continuous -MultiSubnetFailover 1 

However the log reader fails with error below when the publisher SQLPUB1 and distributor SQLDIST are in different subnets

TCP Provider: The wait operation timed out. (Source: MSSQL_REPL, Error number: MSSQL_REPL22037) Get help: http://help/MSSQL_REPL22037 

A trace shows that the log reader agent executes the following on the publisher:
declare @p3 nvarchar(128)
set @p3=NULL
declare @p4 nvarchar(128)
set @p4=NULL
exec "master"."sys"."sp_executesql";1 N'exec master.sys.sp_helpdistributor
@distributor = @p1 output, @distribdb = @p2 output',N'@p1 sysname output, 
@p2 sysname output',@p3 output,@p4 output
select @p3, @p4
and it will fail with a similar error

OLE DB provider "MSOLEDBSQL" for linked server "repl_distributor" returned message "Login timeout expired". OLE DB provider "MSOLEDBSQL" for linked server "repl_distributor" returned message "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.". Msg 258, Level 16, State 1, Line 0 TCP Provider: The wait operation timed out. 

As you can see, the log reader is accessing the distributor from the publisher using the link server

The workaround is to add the MultiSubnetFailover = ‘Yes’ to link server repl_distributor using sp_serveroption which as of this writing is not documented
USE master;
GO
EXEC sp_serveroption
    @server = N'repl_distributor',
    @optname = N'provider string',
    @optvalue = N'MultiSubnetFailover=Yes';
GO
 

Wednesday, July 1, 2020

Not Defined as Subscriber Error

In SQL 2019, if you create a transnational push subscription to a database that participates in an Availability Group, using the subscriber name format <AGListener>, <port number> then it will fail with the error:

Msg 20032, Level 16, State 1, Procedure distribution.dbo.sp_MSadd_subscription, Line 175 [Batch Start Line 0]
'AGLISTENER,54321' is not defined as a Subscriber for 'SERVER1\INSTANCE1'.
Msg 14070, Level 16, State 1, Procedure sys.sp_MSrepl_changesubstatus, Line 1249 [Batch Start Line 0]
Could not update the distribution database subscription table. The subscription status could not be changed.
Msg 14057, Level 16, State 1, Procedure sys.sp_MSrepl_addsubscription_article, Line 384 [Batch Start Line 0]
The subscription could not be created.
Msg 20021, Level 16, State 1, Procedure sys.sp_MSrepl_addpushsubscription_agent, Line 258 [Batch Start Line 0]
The subscription could not be found.

Similar error will happen if you upgrade the remote distributor to SQL 2019. It is a bug fixed in CU5 released last week.

Assuming you have a replication environment like this

  • Publisher: SERVER1/INSTANCE1
  • Distributor: SERVER2/INSTANCE2
  • Subscriber: AGLISTENER,54321

The script to reproduce the issue is:

-- Publisher
create database pub_test
go
use pub_test
create table test(a int primary key, b varchar(255))
go
exec sp_replicationdboption @dbname = N'pub_test', @optname = N'publish', @value = N'true'
exec sp_addpublication @publication = N'test', @repl_freq = N'continuous', @status = N'active', @independent_agent = N'true'
exec sp_addpublication_snapshot @publication = N'test'
exec sp_addarticle @publication = N'test', @article = N'test', @source_owner = N'dbo', @source_object = N'test'
go

-- Susbcriber
create database sub_test
go

-- Publisher

exec sp_addsubscription @publication = N'test', @subscriber = N'AGLISTENER,54321', @destination_db = N'sub_test', @subscription_type = N'Push'
exec sp_addpushsubscription_agent @publication = N'test', @subscriber = N'AGLISTENER,54321', @subscriber_db = N'sub_test'

Thursday, November 29, 2018

Unable to Query the Redirected Server for Original Publisher

sp_redirect_publisher uses the account credential used when the Publisher was added to the Remote distributor and stored in msdb.dbo.MSdistpublishers. So if a SQL authentication account was used such as 'sa' or 'distributor_admin,' replication may fail when the password expired or the account doesn't exist.

For example, if we execute the following, the login column should be blank

select * from msdb.dbo.MSdistpublishers



To fix it run the following

exec [sys].[sp_changedistpublisher] 'host\instance', 'security_mode', '1'
exec [sys].[sp_changedistpublisher] 'host\instance', 'security_mode', '1'

Now the login column will be blank



Otherwise we will get these errors when the password expired

Unable to query the redirected server ',' for original publisher '\' and publisher database '' to determine the name of the remote server; Error 18456, Error message 'Error 18456, Level 14, State 1, Message: Login failed for user 'sa'.'.'. 

Unable to query the redirected server ',' for original publisher '\' and publisher database '' to determine the name of the remote server; Error 18456, Error message 'Error 18456, Level 14, State 1, Message: Login failed for user 'distributor_admin'.'.'.

Friday, August 24, 2018

Adding the Distributor to the AG Publisher May Remove Replication

If we have an Availability Group (AG) publisher and we plan to rebuild the secondary without impacting transactional replication, then we may see unexpected results if the secondary was the original publisher. Because adding the distributor will delete the metadata in the remote distributor, and replication will need to be rebuild.

Scenario 

Replication was created when HostA\InstanceA was the primary so it is the original publisher, as shown below.

Primary: HostA\InstanceA (Original Publisher)
Secondary: HostB\InstanceB
Distributor: HostR\InstanceR

After fail over HostA\InstanceA is the secondary, as shown below.

Secondary: HostA\InstanceA (Original Publisher)
Primary: HostB\InstanceB
Distributor: HostR\InstanceR

Then HostA\InstanceA is rebuild on new hardware without production impact, as shown below.

Secondary: HostA\InstanceA (Original Publisher Rebuild)
Primary: HostB\InstanceB
Distributor: HostR\InstanceR

But now we need to run sp_addistributor 'HostR\InstanceR' in the newly built secondary as outlined here

When we run sp_addistributor in the secondary, it executes the following in the distributor, passing the name of the secondary that was the original publisher

repl_distributor.distribution_db.sys.sp_MSdistpublisher_cleanup @@servername

So it will delete all the metadata and replication will stop working, then we will need to run sp_removereplication in the primary and rebuild replication

Workaround 

Since sp_adddistributor passes @@servername, we can rename the secondary to HostANew\InstanceA using sp_dropserver/sp_addserver as outlined here, then sp_MSdistpublisher_cleanup @@servername won't find any matching entries of the original publisher in the distributor and replication will continue to work as expected.

Wednesday, August 15, 2018

The Nulls in the Infamous Error "The Row Was Not Found ..."

While I was troubleshooting an issue, I wondered why there were nulls on the error:

"The row was not found at the Subscriber when applying the replicated (null) command for Table '(null)' with Primary Key(s): (null) (Source: MSSQLServer, Error number: 20598)"

After a little research I found out that since SQL 2012 SP4, sp_MSreplraiserror and sp_MSdel_dbo% have 3 additional parameters used in the raise error 20598 command, and they will show null depending on the SQL version of the subscriber or if the subscriber was upgraded in place

sp_MSreplraiserror

It now expects 3 parameters in the raise error 20598 command, as shown below. It gets called in the subscriber by the delete stored procedure for the respective table, e.g. sp_MSdel_dbotb1. Thus, if the publisher is SQL 2016 SP2 and the subscriber is SQL 2012 SP3, then the error message will show nulls

Before

create procedure sys.sp_MSreplraiserror @errorid int, @param1 sysname = null, @param2 sysname= null
as
    if @errorid = 20508 raiserror (20508, 11, 1)
    ...
    else if @errorid = 20598 raiserror (20598, 16, 1)

After

create procedure sys.sp_MSreplraiserror @errorid int, @param1 sysname = null, @param2 sysname= null, @param3 int = null
as
    if @errorid = 20508 raiserror (20508, 11, 1)
    ...
    else if @errorid = 20598 raiserror (20598, 16, 1, @param3, @param1, @param2)

Workaround

Since it is a system stored procedure there is no other option than upgrading the subscriber

sp_MSdel_dbo%

It now passes values to the 3 parameters in sp_MSreplraiserror, as shown below. Thus, if the subscriber was upgraded in place e.g from SQL 2012 SP3 to SQL 2016 SP2, then the error message with show nulls

Before

create procedure [dbo].[sp_MSdel_dbotb1] (@pkc1 int)
as
begin
delete [dbo].[tb1]
where [sno] = @pkc1
if @@rowcount = 0
    if @@microsoftversion>0x07320000
        exec sp_MSreplraiserror @errorid=20598
end

After

create procedure [dbo].[sp_MSdel_dbotb1] (@pkc1 int)
as
begin
declare @primarykey_text nvarchar(100) = ''
delete [dbo].[tb1]
where [sno] = @pkc1
if @@rowcount = 0
    if @@microsoftversion>0x07320000
    Begin
        if exists (Select * from sys.all_parameters where object_id = OBJECT_ID('sp_MSreplraiserror') and [name] = '@param3')
        Begin
        set @primarykey_text = @primarykey_text + '[sno] = ' + convert(nvarchar(100),@pkc1,1)
        exec sp_MSreplraiserror @errorid=20598, @param1=N'[dbo].[tb1]', @param2=@primarykey_text, @param3=13234
        End
        Else
        exec sp_MSreplraiserror @errorid=20598
    End
end

Workaround

Redeploy the replication stored procedures using sp_scriptpublicationcustomprocs

If the 20598 error message shows nulls, we can still use the transaction sequence number as outlined here to get more info. But it is nice to have the values in the error message to easily find the culprit, example:

"The row was not found at the Subscriber when applying the replicated DELETE command for Table '[dbo].[tb1]' with Primary Key(s): [sno] = 3 (Source: MSSQLServer, Error number: 20598)"



Monday, January 15, 2018

Distribution Cleanup Job Blocking

The distribution cleanup job may block the log reader agent for several minutes, if there is a large number of rows to keep and "delete TOP N" deletes N-1 rows.

The Delete TOP N Behavior

When there are N or N+1 rows, delete TOP N stops scanning the range once N rows are reached
When there are N-1 rows, delete TOP N will scan the entire range. If the range contains a very large amount rows then the scan may take long time, and the locks will be held until the scan finishes, which can cause blocking.

For example, the distribution cleanup job calls the delete statements in a "WHILE 1 =1" loop from the following stored procedures

-- sp_MSdelete_publisherdb_trans line 202:
delete TOP(2000) MSrepl_commands WITH (PAGLOCK) from MSrepl_commands with (INDEX(ucMSrepl_commands))

-- sp_MSdelete_dodelete line 38:
delete TOP(5000) MSrepl_transactions WITH (PAGLOCK) from MSrepl_transactions with (INDEX(ucMSrepl_transactions))

The following extended event session can be used to get the query plans for line 202 and 38 of the respective stored procedures

CREATE EVENT SESSION [distribution_cleanup] ON SERVER 
ADD EVENT sqlserver.query_post_execution_showplan(
    ACTION(sqlserver.query_plan_hash,sqlserver.tsql_stack)
    WHERE (([object_name]=N'sp_MSdelete_publisherdb_trans' 
 AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[tsql_stack],N'%frame level=''1''%line=''202''%frame level=''2''%') 
 OR ([object_name]=N'sp_MSdelete_dodelete'
 AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[tsql_stack],N'%frame level=''1''%line=''38''%frame level=''2''%'))))),
ADD EVENT sqlserver.sp_statement_completed(SET collect_object_name=(1),collect_statement=(1)
    ACTION(sqlserver.query_plan_hash,sqlserver.tsql_stack)
    WHERE (([object_name]=N'sp_MSdelete_publisherdb_trans' 
 AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[tsql_stack],N'%frame level=''1''%line=''202''%frame level=''2''%') 
 OR ([object_name]=N'sp_MSdelete_dodelete'
 AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[tsql_stack],N'%frame level=''1''%line=''38''%frame level=''2''%')))))
ADD TARGET package0.event_file(SET filename=N'c:\temp\distribution_cleanup.xel')
WITH (TRACK_CAUSALITY=ON)
GO

In this particular case, the table MSrepl_commands had about 600 million rows, each iteration deletes 2,000 rows with a few milliseconds of duration, but the last iteration deleted 1,352 rows with a 19 minute duration


The following query can be used to extract the query plan from the .xel file, so we can just use SSMS to visualize it

SELECT event_data = CONVERT(XML, event_data)
INTO #tmp
FROM sys.fn_xe_file_target_read_file( N'C:\temp\distribution_cleanup_0_131593922293630000*.xel', NULL, NULL, NULL);

SELECT 
t.event_data.value('(/event/@timestamp)[1]', 'datetime2') AS event_time
,t.event_data.value('(/event/data[@name="duration"]/value)[1]', 'bigint') AS duration
,t.event_data.value('(/event/data[@name="object_name"]/value)[1]', 'varchar(255)') AS object_name
,actual_plan = x.xml_fragment.query('.')
FROM #tmp AS t
CROSS APPLY t.event_data.nodes(N'/event/data[@name="showplan_xml"]/value/*')
AS x(xml_fragment)
order by event_time 


Comparing the query plan when it deletes 2,000 rows vs. the plan that deletes 1.356 rows, the former reads 2,000 rows and the latter reads more than 200 million rows due to the residual predicate


This explains why the lock is held for a long time, and the blocking is explained by the cluster index definition shown below

The Cluster Index

Both tables MSrepl_commands and MSrepl_transaction have a similar cluster index that has the publisher_database_id as the leading key column. Thus, rows are stored in order by publisher_database_id. An example for MSrepl_transactions is shown below


If there is more than one publisher_database_id then the last page of the first publisher_database_id range will contain not only the newest rows for the first publisher_database_id but also the oldest rows for the second publisher_database_id. So, the delete with (PAGLOCK) of the oldest rows for the second publisher_database_id will block the insert of new rows by the log reader for the first publisher_database_id .

Example

In my test environment two publisher databases pub1 and pub2, have publisher_database_id = 3 and publisher_database_id = 4 respectively.

-- create databases
use master
create database pub1
create database pub2
go

-- create tables
use pub1
create table test1(a int not null identity, b datetime, c varchar(255)
constraint test1_pk primary key (a)
)
use pub2
create table test2(a int not null identity, b datetime, c varchar(255)
constraint test2_pk primary key (a)
)
GO

-- Adding the transactional publication with immediate synch
use [pub1]
exec sp_replicationdboption @dbname = N'pub1', @optname = N'publish', @value = N'true'
exec sp_addpublication @publication = N'test_pub1', @allow_anonymous = N'true', @immediate_sync = N'true', @independent_agent = N'true', @status = N'active'
exec sp_addpublication_snapshot @publication = N'test_pub1', @frequency_type = 1
exec sp_addarticle @publication = N'test_pub1', @article = N'test1', @source_owner = N'dbo', @source_object = N'test1'
GO

-- Adding the transactional publication with immediate synch
use [pub2]
exec sp_replicationdboption @dbname = N'pub2', @optname = N'publish', @value = N'true'
exec sp_addpublication @publication = N'test_pub2', @allow_anonymous = N'true', @immediate_sync = N'true', @independent_agent = N'true', @status = N'active'
exec sp_addpublication_snapshot @publication = N'test_pub2', @frequency_type = 1
exec sp_addarticle @publication = N'test_pub2', @article = N'test2', @source_owner = N'dbo', @source_object = N'test2'
GO

-- disable distribution cleanup job
exec msdb..sp_update_job @job_name = 'Distribution clean up: distribution', @enabled = 0

-- add 5999 rows at 2:43 PM
use pub2
insert test2(b,c)
values (getdate(),replicate('x',200))
go 5999

-- add 21000 rows at 3:44 PM (one hour later)
use pub1
insert test1(b,c)
values (getdate(),replicate('x',200))
go 1000

use pub2
insert test2(b,c)
values (getdate(),replicate('x',200))
go 20000

-- verify there is 26999 rows
use distribution
select count(*) from MSrepl_transactions
select count(*) from MSrepl_commands

-- get root page
select allocated_page_file_id, allocated_page_page_id, next_page_page_id, previous_page_page_id, page_level 
from sys.dm_db_database_page_allocations(db_id(),object_id('MSrepl_transactions'),1,null,'DETAILED') 
where page_type_desc is not null and page_type_desc = 'INDEX_PAGE'


DBCC TRACEON(3604)
DBCC PAGE (distribution, 1, 72305, 3);
DBCC TRACEOFF(3604)



The last page for the publisher_database_id = 3 range is 72702 and the first page for the publisher_database_id = 4 range is 72629.

To simulate a long duration we can execute the distribution cleanup proc with an open transaction

use distribution
begin tran
EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 1

Removed 5999 replicated transactions consisting of 5999 statements in 1 seconds (11998 rows/sec).

The cleanup job removed the 5,999 rows we added 1 hour ago for publisher_database_id = 4 and held an X lock on page 72702 that contains the oldest rows for publisher_database_id = 4. We can see the locks using this query

select * from sys.dm_tran_locks where request_session_id = 68 and request_mode = 'X'



In another session, we can execute the following script to add new rows to publisher_database_id = 3 (pub1)

use pub1
insert test1(b,c)
values (getdate(),replicate('x',200))
go 

The log reader agent for publisher_database_id = 3 is blocked by the delete of publisher_database_id = 4, as shown in the spWhoIsActive output, because page 72702 also contains the newest rows for publisher_database_id = 3


Workaround

We can avoid deleting N-1 rows by "delete TOP N" and prevent it from reading million of rows, if we modify the delete loop to iterate a multiple of N on the sp_MSdelete_publisherdb_trans stored procedure.

Code snipped for the MSrepl_commands table

select @max = count(*)/2000 from MSrepl_commands with (INDEX(ucMSrepl_commands), nolock) where
 publisher_database_id = @publisher_database_id and
 xact_seqno <= @max_xact_seqno and
 (type & ~@snapshot_bit) not in (@directory_type, @alt_directory_type) and
 (type & ~@replpost_bit) <> @scriptexec_type
 OPTION (MAXDOP 4)

WHILE @i <= @max
BEGIN

 DELETE TOP(2000) MSrepl_commands WITH (PAGLOCK) from MSrepl_commands with (INDEX(ucMSrepl_commands)) where
  publisher_database_id = @publisher_database_id and
  xact_seqno <= @max_xact_seqno and
  (type & ~@snapshot_bit) not in (@directory_type, @alt_directory_type) and
  (type & ~@replpost_bit) <> @scriptexec_type
  OPTION (MAXDOP 1)
  
 select @row_count = @@rowcount
 -- Update output parameter
 select @num_commands = @num_commands + @row_count
    
 set @i=@i+1
END

Code snipped for the MSrepl_transactions table

select @max = count(*)/5000 from MSrepl_transactions with (INDEX(ucMSrepl_transactions), nolock) where
 publisher_database_id = @publisher_database_id and
 xact_seqno <= @max_xact_seqno and
 xact_seqno <> @last_xact_seqno and
 xact_seqno <> @last_log_xact_seqno
 OPTION (MAXDOP 4)

WHILE @i <= @max
BEGIN
 exec dbo.sp_MSdelete_dodelete @publisher_database_id, 
  @max_xact_seqno, 
  @last_xact_seqno, 
  @last_log_xact_seqno,
  @has_immediate_sync


 select @row_count = @@rowcount

 -- Update output parameter
 select @num_transactions = @num_transactions + @row_count
   
 set @i=@i+1
END

Note that Microsoft doesn't support modifying the system stored procedures, so test it at your own risk



Wednesday, June 14, 2017

Version Ghost Records Causing Replication Latency

Setting snapshot isolation level on the subscriber database is a common practice to avoid user queries blocking transactional replication threads.

However, a side effect may be an increase of replication latency during long running user queries because the version ghost records from the MSreplication_subscription table cannot to be cleanup until those long running queries complete.

MSreplication_subscriptions 

There is one MSreplication_subscriptions table on each subscriber database, it contains one row per subscription, and it is used to keep track of the last transaction delivered to the subscriber. Using extended events filtered by the session id of the distribution agent

CREATE EVENT SESSION [subscription_thread_trace] ON SERVER 
ADD EVENT sqlserver.prepare_sql(
    WHERE ([sqlserver].[session_id]=(59))),
ADD EVENT sqlserver.rpc_starting(
    WHERE ([sqlserver].[session_id]=(59))),
ADD EVENT sqlserver.sql_batch_starting(
    WHERE ([sqlserver].[session_id]=(59))),
ADD EVENT sqlserver.sql_statement_starting(
    WHERE ([sqlserver].[session_id]=(59)))
WITH (TRACK_CAUSALITY=ON)
GO

We can see that the table is updated after each transaction has been committed to the subscriber database



If there are thousands of transactions replicated, this table will be updated thousands of times too

Row Versioning 

If the subscriber database has snapshot isolation enabled, example

ALTER DATABASE [subscriber] SET READ_COMMITTED_SNAPSHOT ON 

then each update on the MSreplication_subscription table will mark the old row on the page as ghost, add a pointer to the row version in the version store on tempdb, and add the new row to the page.

Ghost records cannot be cleanup if they point to a row version on the version store, and the row version cannot be cleanup if there is a query using the version store. More info can be found here and here.

So a long running query will cause MSreplication_subscription to grow full of version ghost records 

Example 

To simulate a log running query in a test environment, we can leave a transaction open on any table on the subscriber database

USE [subscriber_db]
BEGIN TRAN
DELETE TOP(1) [table]

Execute the statement obtained from the extended event session

exec sp_executesql N'update MSreplication_subscriptions 
set transaction_timestamp = cast(@P1 as binary(15)) + cast(substring(transaction_timestamp, 16, 1) as binary(1))
, "time" = @P2 
where UPPER(publisher) = UPPER(@P3) 
and publisher_db = @P4 
and publication = @P5 
and subscription_type = 0 
and (substring(transaction_timestamp, 16, 1) = 0 or datalength(transaction_timestamp) < 16)'
,N'@P1 varbinary(14)
,@P2 datetime
,@P3 nvarchar(24)
,@P4 nvarchar(8)
,@P5 nvarchar(5)'
,0x0
,'2017-06-13 15:03:47'
,N'subscriber'
,N'subscriber_db'
,N'publication'
go 

Find the data page id

DBCC IND ('susbcriber_db', 'MSreplication_subscriptions', 1);



Examine the page content

DBCC TRACEON(3604)
DBCC PAGE (subscriber_db, 1, 1767872, 3);
DBCC TRACEOFF(3604)

After the update, the old row on the page is marked as ghost in slot 0 that point to the version on the version store on tempdb. The new row is added in slot 1, so internally an update will behave as a delete and insert pair, as shown below, some output has been omitted.

Slot 0 Offset 0x18e Length 302

Record Type = GHOST_DATA_RECORD     Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS VERSIONING_INFO
Record Size = 302                   
Memory Dump @0x000000F2084FA18E

0000000000000000:   7c003e00 00000000 2d880391 a7000000 80440d57  |.>.....-..‘§....D.W
0000000000000014:   fd7f0000 d820e17f c2027dae 62de7eb6 754898b0  ý...Ø á.Â.}®bÞ~¶uH.°
0000000000000028:   f2bb6c6c 0b260000 05000000 00000000 01000000  ò»ll.&..............
000000000000003C:   fd7f0e00 00150500 58006800 9800a800 20017400  ý.......X.h...¨. .t.
0000000000000050:   65007300 74003100 74007000 63006300 5f007300  e.s.t.1.t.p.c.c._.s.
0000000000000064:   73006400 42004f00 57004e00 56004400 49004d00  s.d.B.O.W.N.V.D.I.M.
0000000000000078:   4d003100 5c004200 4f004d00 53005300 4e005600  M.1.\.B.O.M.S.S.N.V.
000000000000008C:   44004900 4d004d00 30003100 00000000 00000000  D.I.M.M.0.1.........
00000000000000A0:   00000000 00000000 42004f00 57004e00 56004400  ........B.O.W.N.V.D.
00000000000000B4:   49004d00 4d003100 5c004200 4f004d00 53005300  I.M.M.1.\.B.O.M.S.S.
00000000000000C8:   4e005600 44004900 4d002d00 74007000 63006300  N.V.D.I.M.-.t.p.c.c.
00000000000000DC:   5f007300 73006400 2d007400 65007300 74003100  _.s.s.d.-.t.e.s.t.1.
00000000000000F0:   2d004200 4f005700 53005300 44005400 45005300  -.B.O.W.S.S.D.T.E.S.
0000000000000104:   54003000 33005c00 42004f00 4d005300 53005300  T.0.3.\.B.O.M.S.S.S.
0000000000000118:   53004400 2d003100 f01e0200 05000100 745a2f07  S.D.-.1.ð.......tZ/.
000000000000012C:   0000                                          ..     

Version Information = 
 Transaction Timestamp: 120543860
 Version Pointer: (file 5 page 138992 currentSlotId 1)

Slot 1 Offset 0x2bc Length 302

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS VERSIONING_INFO
Record Size = 302                   
Memory Dump @0x000000F2084FA2BC

0000000000000000:   70003e00 00000000 01880391 a7000000 00100000  p.>........‘§.......
0000000000000014:   20000000 00000001 00e07dae 62de7eb6 754898b0   ........à}®bÞ~¶uH.°
0000000000000028:   f2bb6c6c 0b260000 f0bceb56 fd7f0000 d7fa7cfd  ò»ll.&..ð¼Ã«Vý...×ú|ý
000000000000003C:   7f000e00 00150500 58006800 9800a800 20017400  ........X.h...¨. .t.
0000000000000050:   65007300 74003100 74007000 63006300 5f007300  e.s.t.1.t.p.c.c._.s.
0000000000000064:   73006400 42004f00 57004e00 56004400 49004d00  s.d.B.O.W.N.V.D.I.M.
0000000000000078:   4d003100 5c004200 4f004d00 53005300 4e005600  M.1.\.B.O.M.S.S.N.V.
000000000000008C:   44004900 4d004d00 30003100 01000000 00000000  D.I.M.M.0.1.........
00000000000000A0:   00000000 00000000 42004f00 57004e00 56004400  ........B.O.W.N.V.D.
00000000000000B4:   49004d00 4d003100 5c004200 4f004d00 53005300  I.M.M.1.\.B.O.M.S.S.
00000000000000C8:   4e005600 44004900 4d002d00 74007000 63006300  N.V.D.I.M.-.t.p.c.c.
00000000000000DC:   5f007300 73006400 2d007400 65007300 74003100  _.s.s.d.-.t.e.s.t.1.
00000000000000F0:   2d004200 4f005700 53005300 44005400 45005300  -.B.O.W.S.S.D.T.E.S.
0000000000000104:   54003000 33005c00 42004f00 4d005300 53005300  T.0.3.\.B.O.M.S.S.S.
0000000000000118:   53004400 2d003100 00000000 00000000 745a2f07  S.D.-.1.........tZ/.
000000000000012C:   0000                                          ..     

Version Information = 
 Transaction Timestamp: 120543860
 Version Pointer: Null

Eventually, after thousands of updates this table will have a high number of version ghost records, on the following example the table had 17,085 version ghost records and the table grew to 1,315 pages instead of 1 page

SELECT page_count, record_count, ghost_record_count, version_ghost_record_count 
FROM sys.dm_db_index_physical_stats(db_id('tpcc_ssd'), object_id('MSreplication_subscriptions'), NULL, NULL , 'SAMPLED');


Replication Latency 

The problem of having MSreplication_subscriptions with a high number of ghost records is that the update to this table will take longer and longer, for example we may see that the logical reads are 1,363 pages

set statistics IO on
go
exec sp_executesql N'update MSreplication_subscriptions 
set transaction_timestamp = cast(@P1 as binary(15)) + cast(substring(transaction_timestamp, 16, 1) as binary(1))
, "time" = @P2 
where UPPER(publisher) = UPPER(@P3) 
and publisher_db = @P4 
and publication = @P5 
and subscription_type = 0 
and (substring(transaction_timestamp, 16, 1) = 0 or datalength(transaction_timestamp) < 16)'
,N'@P1 varbinary(14)
,@P2 datetime
,@P3 nvarchar(24)
,@P4 nvarchar(8)
,@P5 nvarchar(5)'
,0x000009B600088BB0001D00000000
,'2017-06-13 15:03:47'
,N'subscriber'
,N'subscriber_db'
,N'publication'
go 
set statistics IO off
go

Table 'MSreplication_subscriptions'. Scan count 1, logical reads 1362, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)


Because the where clause of the update is not a SARG, the query plan will show a cluster index scan 


Workaround 

If we find that the MSreplication_susbcriptions has a high number of version ghost records

SELECT page_count, record_count, ghost_record_count, version_ghost_record_count 
FROM sys.dm_db_index_physical_stats(db_id('tpcc_ssd'), object_id('MSreplication_subscriptions'), NULL, NULL , 'SAMPLED');

Then it means there is a long running query using the version store not allowing the ghost cleanup.
One way to force the ghost cleanup is to rebuild the index

ALTER INDEX uc1MSReplication_subscriptions on MSreplication_subscriptions REBUILD WITH(ONLINE=ON)

The updates will now scan one page and the replication latency should come down

Thursday, March 23, 2017

Replication Commands That May Cause Blocking

Besides the replication snapshot agent, there are other replication commands that require a Schema Modification (SCH-M) lock for a very short duration on the article(s) of the publication, and they may cause blocking in very busy publishers.

To find out which commands use SCH-M, you can use the lock acquired extended event filtered by the object id of the article (table) in the publication database and the lock mode = SCH-M while executing the replication commands:

CREATE EVENT SESSION [replication_sch_m_locks] ON SERVER 
ADD EVENT sqlserver.lock_acquired(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text,sqlserver.tsql_stack)
    WHERE ([mode]=(2) 
    AND ([object_id]=(2106086889) 
    OR [object_id]=(2138087003) 
    OR [object_id]=(22603469)) 
    AND [database_id]=(37)))
ADD TARGET package0.event_file(SET filename=N'c:\temp\replication_sch_m_locks'
    ,max_file_size=(100),max_rollover_files=(10))
WITH (TRACK_CAUSALITY=ON)
GO

Below are the commands that use SCH-M lock on articles (tables)
  • Sp_addarticle @article='article'
  • Sp_changearticle @article='article '
  • Sp_dropsubscription @article='article' 
  • Sp_dropsubscription @article=’All’ 
  • Initialize with a snapshot the articles being added 
For example, if you add add one article (object id) to an existing publication that has two articles (object id = 2106086889 and object id = 2138087003), sp_addarticle requires SCH-M lock on the table you are adding (object id = 22603469)


However, when you start the snapshot agent, it will require SCH-M locks on all the tables of the publication (object id = 2106086889, object id = 2138087003, and object id = 22603469), not just the one you just added (object id = 22603469)

Wednesday, March 22, 2017

Duplicate Key or Row Not Found Errors

For example, we will get this error in the distribution agent

Violation of PRIMARY KEY constraint 'PK_Sales_OrderLines'. Cannot insert duplicate key in object 'Sales.OrderLines'. The duplicate key value is (2). (Source: MSSQLServer, Error number: 2627)

Or

The row was not found at the Subscriber when applying the replicated DELETE command for Table '[Sales].[OrderLines]' with Primary Key(s): [OrderLineID] = 2 (Source: MSSQLServer, Error number: 20568)

The most common reasons for these errors I have found are: duplicate article subscription or another process modifying the data in the subscriber table.

Duplicate article subscription

To find out if it is due to duplicate article subscription, we can run in the publisher database (e.g. WideWorldImporters) the following stored procedure providing the subscriber instance name and the article name from the error message above (e.g. OrderLines)

exec sp_helpsubscription @subscriber = 'InstanceName', @article = 'OrderLines'


We will see that we are subscribing to the same article OrderLines from 2 publications BigTables and SmallTables

Thus a single insert in the table OrderLines in the publisher will be replicated twice to the subscriber generating the duplicate key error. In similar manner, a single delete in the table OrderLines in the publisher will be replicated twice to the subscriber generating the row not found error

To fix it



Another Process Modifying the Data

To find out if it is due to another process modifying the data we can create a SQL Audit on the subscriber table (e.g. OrderLines). We can filter by the account that have access or use public to show all accounts, as shown below.
USE [master]
GO

CREATE SERVER AUDIT [Audit-OrderLines]
TO FILE 
( FILEPATH = N'c:\temp\'
 ,MAXSIZE = 1024 MB
 ,MAX_FILES = 10
 ,RESERVE_DISK_SPACE = OFF
)
WITH
( QUEUE_DELAY = 1000
 ,ON_FAILURE = CONTINUE
 ,AUDIT_GUID = '146fab87-58c8-4521-b4b5-9ffb175f739a'
)
ALTER SERVER AUDIT [Audit-OrderLines] WITH (STATE = ON)
GO

USE [WideWorldImportersDSS]
GO

CREATE DATABASE AUDIT SPECIFICATION [DatabaseAuditSpecification-OrdeLines]
FOR SERVER AUDIT [Audit-OrderLines]
ADD (DELETE ON OBJECT::[Sales].[OrderLines] BY [public]),
ADD (INSERT ON OBJECT::[Sales].[OrderLines] BY [public]),
ADD (UPDATE ON OBJECT::[Sales].[OrderLines] BY [public])
WITH (STATE = ON)
GO

The SQL Audit file will show who, when, and what modified records from the table. In the example below, the first line was a trigger, the second was a stored procedure, the third was an ad-hoc SQL, and the rest of the lines were done by the replication account


To fix it, we need to stop the other processes are modifying data and then resynch the data

Friday, December 30, 2016

Querying the Data Validation Results

Typically, I use replication monitor to validate subscriptions and we can see the results on the subscription details window, as shown below. But if the publication has many tables, we can use a query instead, to easily spot the table that is out of synchronization.


We can run this query on the distribution database and provide the publisher database (e.g. WideWorldImporters) and the publication (e.g. TransactionTables)

 
select 
a.name
,h.comments
from MSdistribution_history h with (nolock)
join MSdistribution_agents a with (nolock)
on h.agent_id = a.id
where a.publisher_db = 'WideWorldImporters'
and a.publication = 'TransactionTables'
and h.comments like 'Table% out of synchronization%'


Thursday, August 4, 2016

Replication Snapshot Agent Blocking

The snapshot agent is used to initialize the subscription in transaction replication, and it requires a very short duration schema modification (SCH-M) lock on all the articles of the publication. If the articles are being queried thousands per second or queried for a long time then they will block the snapshot agent because SCH-M lock is not compatible with any other lock, once the SCH-M lock is waiting, everybody else will wait.

If blocking for few seconds is an issue in your system then you can initialize when there is low activity or initialize without a snapshot documented in books online.

Below I will show you how to reproduce this blocking scenario using a long running query.
 

-- on publisher: create publication database

use master
create database publisher_test_snapshot_agent_blocking
go

use publisher_test_snapshot_agent_blocking
create table paul_test1(a int primary key, b int)
create table paul_test2(a int primary key, b int)

-- on subscriber: create subscription database

use master
create database subscriber_test_snapshot_agent_blocking
go

-- on publisher: create publication and subscription

use publisher_test_snapshot_agent_blocking
exec sp_replicationdboption @dbname = N'publisher_test_snapshot_agent_blocking', @optname = N'publish', @value = N'true'
exec sp_addpublication @publication = N'test_snapshot_agent_blocking', @description = N'test', @sync_method = N'concurrent', @retention = 0, @allow_push = N'true', @allow_pull = N'true', @allow_anonymous = N'false', @enabled_for_internet = N'false', @snapshot_in_defaultfolder = N'true', @compress_snapshot = N'false', @ftp_port = 21, @ftp_login = N'anonymous', @allow_subscription_copy = N'false', @add_to_active_directory = N'false', @repl_freq = N'continuous', @status = N'active', @independent_agent = N'true', @immediate_sync = N'false', @allow_sync_tran = N'false', @autogen_sync_procs = N'false', @allow_queued_tran = N'false', @allow_dts = N'false', @replicate_ddl = 1, @allow_initialize_from_backup = N'false', @enabled_for_p2p = N'false', @enabled_for_het_sub = N'false'
exec sp_addpublication_snapshot @publication = N'test_snapshot_agent_blocking', @frequency_type = 1, @frequency_interval = 0, @frequency_relative_interval = 0, @frequency_recurrence_factor = 0, @frequency_subday = 0, @frequency_subday_interval = 0, @active_start_time_of_day = 0, @active_end_time_of_day = 235959, @active_start_date = 0, @active_end_date = 0, @job_login = null, @job_password = null, @publisher_security_mode = 1
exec sp_addarticle @publication = N'test_snapshot_agent_blocking', @article = N'paul_test1', @source_owner = N'dbo', @source_object = N'paul_test1', @type = N'logbased', @description = N'', @creation_script = N'', @pre_creation_cmd = N'drop', @schema_option = 0x000000000803509F, @identityrangemanagementoption = N'none', @destination_table = N'paul_test1', @destination_owner = N'dbo', @status = 24, @vertical_partition = N'false', @ins_cmd = N'CALL [sp_MSins_dbopaul_test1]', @del_cmd = N'CALL [sp_MSdel_dbopaul_test1]', @upd_cmd = N'SCALL [sp_MSupd_dbopaul_test1]'
exec sp_addarticle @publication = N'test_snapshot_agent_blocking', @article = N'paul_test2', @source_owner = N'dbo', @source_object = N'paul_test2', @type = N'logbased', @description = N'', @creation_script = N'', @pre_creation_cmd = N'drop', @schema_option = 0x000000000803509F, @identityrangemanagementoption = N'none', @destination_table = N'paul_test2', @destination_owner = N'dbo', @status = 24, @vertical_partition = N'false', @ins_cmd = N'CALL [sp_MSins_dbopaul_test2]', @del_cmd = N'CALL [sp_MSdel_dbopaul_test2]', @upd_cmd = N'SCALL [sp_MSupd_dbopaul_test2]'
exec sp_addsubscription @publication = N'test_snapshot_agent_blocking', @subscriber = N'(subscriber)', @destination_db = N'subscriber_test_snapshot_agent_blocking', @subscription_type = N'Push', @sync_type = N'automatic', @article = N'all', @update_mode = N'read only', @subscriber_type = 0
GO

-- on a new query window: run this long running select on the publisher

use publisher_test_snapshot_agent_blocking
select *
from master..spt_values a
cross join master..spt_values b
cross join paul_test1

-- on a new query window: start the snapshot agent  on the distributor

use distribution
exec sp_MSstartsnapshot_agent @publisher = N'(publisher)', @publisher_db = N'publisher_test_snapshot_agent_blocking', @publication = N'test_snapshot_agent_blocking'

-- on a new query window: run this long running select on the publisher

use publisher_test_snapshot_agent_blocking
select *
from master..spt_values a
cross join master..spt_values b
cross join paul_test1

-- on a new query window: run this long running select on the publisher

use publisher_test_snapshot_agent_blocking
select *
from master..spt_values a
cross join master..spt_values b
cross join paul_test1

-- on a new query window: run sp_WhoIsActive

exec sp_WhoIsActive 
@get_outer_command = 1
,@get_plans = 1
,@get_transaction_info = 1
,@find_block_leaders = 1
,@get_locks = 1

You will see that the long running query (session id = 88) is blocking the snapshot agent (session id = 95) that is requesting SCH-M locks, which in turn is blocking the other 2 queries (session_ id = 52 and 86) requesting Intent Shared (IS) locks.


Saturday, October 31, 2015

Analyzing Replication Performance Statistics

As mentioned in this KB article, useful data to troubleshoot replication performance is stored in the MSlogreader_history and MSdistribution_history tables in the distribution database.

I was troubleshooting latency from the distribution to the subscriber database. So here is the query to get the info from MSdistribution_history for a particular distribution agent id. Similar query can be done for the log reader.
 
declare @agent_id int = 662;

with a as(
select 
h.agent_id
,h.time
,cast(h.comments as xml) comments
from MSdistribution_history h with (nolock)     
where             
h.[comments] like '<stats state="1"%'  
and h.agent_id = @agent_id
)
select 
a.agent_id
,a.time 
,c.value('(/stats/@state)[1]', 'int') AS [state]      
,c.value('(/stats/@work)[1]', 'int') AS [work]      
,c.value('(/stats/@idle)[1]', 'int') AS [idle]            
,c.value('(/stats/reader/@fetch)[1]', 'int') AS [reader_fetch]      
,c.value('(/stats/reader/@wait)[1]', 'int')  AS [reader_wait]            
,c.value('(/stats/writer/@write)[1]', 'int') AS [writer_write]      
,c.value('(/stats/writer/@wait)[1]', 'int')  AS [writer_wait]            
,c.value('(/stats/sincelaststats/@elapsedtime)[1]', 'int') AS [sincelaststats_elapsedtime]      
,c.value('(/stats/sincelaststats/@work)[1]', 'int') AS [sincelaststats_work]      
,c.value('(/stats/sincelaststats/@cmds)[1]', 'int') AS [sincelaststats_cmds]      
,c.value('(/stats/sincelaststats/@cmdspersec)[1]', 'float') AS [sincelaststats_cmdspersec]       
,c.value('(/stats/sincelaststats/reader/@fetch)[1]', 'int') AS [sincelaststats_reader_fetch]      
,c.value('(/stats/sincelaststats/reader/@wait)[1]', 'int')  AS [sincelaststats_reader_wait]            
,c.value('(/stats/sincelaststats/writer/@write)[1]', 'int') AS [sincelaststats_writer_write]      
,c.value('(/stats/sincelaststats/writer/@wait)[1]', 'int')  AS [sincelaststats_writer_wait]    
from a
cross apply a.comments.nodes('/stats[@state="1"]') b(c)
order by time;

Replication uses a memory buffer structure to queue up the rows between the reader thread and the writer thread. Thus a high “since last stats reader wait” means it is waiting from the writer thread to consume the rows from the buffer and deliver them to the subscriber, while a high “since last stats writer wait” means it is waiting for the reader thread to produce rows from the distribution to the buffer.

Next, you can use your favorite tool (I used SSRS) to create a graph so we can visualize the information better. In the graph below, we can see there is a correlation between the number of commands replicated and the replication latency. So we can either decrease the number of commands replicated or increase the performance of the subscriber. I this particular case we focused on the subscriber side and found that IO was the bottleneck.


Wednesday, April 29, 2015

Find subscriptions soon to be deactivated

By default, a transaction that has been delivered to the subscriber is deleted from the distribution database by the Distribution clean job that runs every 10 minutes. If for some reason the transaction cannot be delivered it will be deleted after the max retention of 72 hrs, the subscription will be marked inactive, and we will get this error message:

"The subscription(s) have been marked inactive and must be reinitialized. NoSync subscriptions will need to be dropped and recreated."

Once it is deactivated only reinitializing it will activate it back because data will be missing so it is the only way to assure consistency.

To avoid the deactivation, we should find why the transaction cannot be delivered and fix it before the max retention. However, depending on how the subscriptions are being monitored, an error may not surface until it is too late, for example if the max number of retries of the distribution agent is set to the default, replication monitor won’t show an error until 2,147,483,647 retries.

To find the subscriptions soon to be deactivated, we can use the same logic the stored procedure sp_MSsubscription_cleanup uses, but instead of using the max retention period of the distribution database, we can use a custom value on the variable @cutoff_time.

 
use distribution

declare 
@ACTIVE   tinyint
,@INACTIVE   tinyint
,@SUBSCRIBED  tinyint
,@VIRTUAL   smallint
,@SNAPSHOT_BIT  int
,@max_time   datetime
,@cutoff_time   datetime
,@max_distret   int

select 
@ACTIVE   = 2
,@INACTIVE   = 0
,@SUBSCRIBED  = 1
,@VIRTUAL   = -1
,@SNAPSHOT_BIT  = 0x80000000

-- set @cutoff_time to how old the transaction need to be in the queue to show results

select @cutoff_time = dateadd(hour, -1, getdate())
select @max_time = dateadd(hour, 1, getdate())

-- get the distribution agent with the oldest record in the queue

select derivedInfo.agent_id, derivedInfo.xact_seqno, derivedInfo.publisher_database_id
into #temp
 
-- exact code from sp_MSsubscription_cleanup
 from (
   -- Here we are retrieving the agent id, publisher database id, 
   -- min subscription sequence number, and the transaction seqno 
   -- related to the max timestamp row in the history table. this is
   -- important since the tran seqno can go back to lower values in 
   -- the case of reinit with immediate sync.
   select s.agent_id as agent_id,
    s.publisher_database_id as publisher_database_id,
    min(s.subscription_seqno) as subscription_seqno,
    isnull(h.xact_seqno, 0x0) as xact_seqno
   from MSsubscriptions s with (nolock)
    left join (MSdistribution_history h with (nolock)
      join (select agent_id, 
         max(timestamp) as timestamp
        from MSdistribution_history with (nolock)
        group by agent_id) as h2 
       on h.agent_id = h2.agent_id 
        and h.timestamp = h2.timestamp)
     on s.agent_id = h.agent_id
   where s.status = @ACTIVE                       
    and s.subscriber_id >= 0  -- Only well-known agent
   group by s.agent_id,            -- agent and pubdbid as a pair can never be differnt
    s.publisher_database_id,      
    isnull(h.xact_seqno, 0x0) -- because of join above we can include this
  ) derivedInfo
 where @cutoff_time >= (
      -- get the entry_time of the first transaction that cannot be
      -- cleaned up normally because of this agent.
      -- use history if it exists and is larger
      case when derivedInfo.xact_seqno >= derivedInfo.subscription_seqno
      then
       -- join with commands table to filter out transactions that do not have commands
       isnull((select top 1 entry_time 
          from MSrepl_transactions t with (nolock), 
            MSrepl_commands c with (nolock), 
            MSsubscriptions sss with (nolock)
          where sss.agent_id = derivedInfo.agent_id 
           and t.publisher_database_id = derivedInfo.publisher_database_id 
           and c.publisher_database_id = derivedInfo.publisher_database_id 
           and c.xact_seqno = t.xact_seqno
           -- filter out snapshot transactions not for this subscription 
           -- because they do not represent significant data changes
           and ((c.type & @SNAPSHOT_BIT) <> @SNAPSHOT_BIT 
             or (c.xact_seqno >= sss.subscription_seqno 
              and c.xact_seqno <= sss.ss_cplt_seqno)) 
           -- filter out non-subscription articles for independent agents
           and c.article_id = sss.article_id 
           -- history xact_seqno can be cleaned up
           and t.xact_seqno > isnull( derivedInfo.xact_seqno, 0x0 ) 
           and c.xact_seqno > isnull( derivedInfo.xact_seqno, 0x0 )
          order by t.xact_seqno asc), @max_time)
      else
       isnull((select top 1 entry_time 
          from MSrepl_transactions t with (nolock), 
            MSrepl_commands c with (nolock), 
            MSsubscriptions sss with (nolock)
          where sss.agent_id = derivedInfo.agent_id 
           and t.publisher_database_id = derivedInfo.publisher_database_id 
           and c.publisher_database_id = derivedInfo.publisher_database_id
           and c.xact_seqno = t.xact_seqno
           -- filter out snapshot transactions not for this subscription 
           -- because they do not represent significant data changes
           and ((c.type & @SNAPSHOT_BIT ) <> @SNAPSHOT_BIT 
             or (c.xact_seqno >= sss.subscription_seqno 
              and c.xact_seqno <= sss.ss_cplt_seqno))
           -- filter out non-subscription articles for independent agents
           and c.article_id = sss.article_id
           -- sub xact_seqno cannot be cleaned up
           and t.xact_seqno >= derivedInfo.subscription_seqno
           and c.xact_seqno >= derivedInfo.subscription_seqno
          order by t.xact_seqno asc), @max_time)
      end)

-- get max distribution retention

select @max_distret = d.max_distretention
from msdb.dbo.MSdistributiondbs d with (nolock)
where d.name = db_name()

-- get subscription name of oldest record in the queue soon to be deactivated

select 
ps.name publisher
,a.publisher_db
,a.publication
,ss.name subscriber
,'Last replicated transaction time: ' 
+cast(r.entry_time as varchar(25))
+'. Hours left before deactivation : ' 
+cast(@max_distret - datediff(hour, r.entry_time, getdate()) as varchar(10)) status_message
from #temp t
join MSrepl_transactions r with (nolock)
on t.xact_seqno = r.xact_seqno
and t.publisher_database_id = r.publisher_database_id
join MSdistribution_agents a
on t.agent_id = a.id
join sys.servers ps
on a.publisher_id = ps.server_id
join sys.servers ss
on a.subscriber_id = ss.server_id

drop table #temp