If we have an Availability Group (AG) publisher and we plan to rebuild the secondary without impacting transactional replication, then we may see unexpected results if the secondary was the original publisher. Because adding the distributor will delete the metadata in the remote distributor, and replication will need to be rebuild.
Scenario
Replication was created when HostA\InstanceA was the primary so it is the original publisher, as shown below.
Primary: HostA\InstanceA (Original Publisher)
Secondary: HostB\InstanceB
Distributor: HostR\InstanceR
After fail over HostA\InstanceA is the secondary, as shown below.
Secondary: HostA\InstanceA (Original Publisher)
Primary: HostB\InstanceB
Distributor: HostR\InstanceR
Then HostA\InstanceA is rebuild on new hardware without production impact, as shown below.
Secondary: HostA\InstanceA (Original Publisher Rebuild)
Primary: HostB\InstanceB
Distributor: HostR\InstanceR
But now we need to run sp_addistributor 'HostR\InstanceR' in the newly built secondary as outlined here
When we run sp_addistributor in the secondary, it executes the following in the distributor, passing the name of the secondary that was the original publisher
repl_distributor.distribution_db.sys.sp_MSdistpublisher_cleanup @@servername
So it will delete all the metadata and replication will stop working, then we will need to run sp_removereplication in the primary and rebuild replication
Workaround
Since sp_adddistributor passes @@servername, we can rename the secondary to HostANew\InstanceA using sp_dropserver/sp_addserver as outlined here, then sp_MSdistpublisher_cleanup @@servername won't find any matching entries of the original publisher in the distributor and replication will continue to work as expected.
Friday, August 24, 2018
Wednesday, August 15, 2018
The Nulls in the Infamous Error "The Row Was Not Found ..."
While I was troubleshooting an issue, I wondered why there were nulls on the error:
"The row was not found at the Subscriber when applying the replicated (null) command for Table '(null)' with Primary Key(s): (null) (Source: MSSQLServer, Error number: 20598)"
After a little research I found out that since SQL 2012 SP4, sp_MSreplraiserror and sp_MSdel_dbo% have 3 additional parameters used in the raise error 20598 command, and they will show null depending on the SQL version of the subscriber or if the subscriber was upgraded in place
sp_MSreplraiserror
It now expects 3 parameters in the raise error 20598 command, as shown below. It gets called in the subscriber by the delete stored procedure for the respective table, e.g. sp_MSdel_dbotb1. Thus, if the publisher is SQL 2016 SP2 and the subscriber is SQL 2012 SP3, then the error message will show nulls
Before
After
Workaround
Since it is a system stored procedure there is no other option than upgrading the subscriber
sp_MSdel_dbo%
It now passes values to the 3 parameters in sp_MSreplraiserror, as shown below. Thus, if the subscriber was upgraded in place e.g from SQL 2012 SP3 to SQL 2016 SP2, then the error message with show nulls
Before
After
Workaround
Redeploy the replication stored procedures using sp_scriptpublicationcustomprocs
If the 20598 error message shows nulls, we can still use the transaction sequence number as outlined here to get more info. But it is nice to have the values in the error message to easily find the culprit, example:
"The row was not found at the Subscriber when applying the replicated DELETE command for Table '[dbo].[tb1]' with Primary Key(s): [sno] = 3 (Source: MSSQLServer, Error number: 20598)"
"The row was not found at the Subscriber when applying the replicated (null) command for Table '(null)' with Primary Key(s): (null) (Source: MSSQLServer, Error number: 20598)"
After a little research I found out that since SQL 2012 SP4, sp_MSreplraiserror and sp_MSdel_dbo% have 3 additional parameters used in the raise error 20598 command, and they will show null depending on the SQL version of the subscriber or if the subscriber was upgraded in place
sp_MSreplraiserror
It now expects 3 parameters in the raise error 20598 command, as shown below. It gets called in the subscriber by the delete stored procedure for the respective table, e.g. sp_MSdel_dbotb1. Thus, if the publisher is SQL 2016 SP2 and the subscriber is SQL 2012 SP3, then the error message will show nulls
Before
create procedure sys.sp_MSreplraiserror @errorid int, @param1 sysname = null, @param2 sysname= null as if @errorid = 20508 raiserror (20508, 11, 1) ... else if @errorid = 20598 raiserror (20598, 16, 1)
After
create procedure sys.sp_MSreplraiserror @errorid int, @param1 sysname = null, @param2 sysname= null, @param3 int = null as if @errorid = 20508 raiserror (20508, 11, 1) ... else if @errorid = 20598 raiserror (20598, 16, 1, @param3, @param1, @param2)
Workaround
Since it is a system stored procedure there is no other option than upgrading the subscriber
sp_MSdel_dbo%
It now passes values to the 3 parameters in sp_MSreplraiserror, as shown below. Thus, if the subscriber was upgraded in place e.g from SQL 2012 SP3 to SQL 2016 SP2, then the error message with show nulls
Before
create procedure [dbo].[sp_MSdel_dbotb1] (@pkc1 int) as begin delete [dbo].[tb1] where [sno] = @pkc1 if @@rowcount = 0 if @@microsoftversion>0x07320000 exec sp_MSreplraiserror @errorid=20598 end
After
create procedure [dbo].[sp_MSdel_dbotb1] (@pkc1 int) as begin declare @primarykey_text nvarchar(100) = '' delete [dbo].[tb1] where [sno] = @pkc1 if @@rowcount = 0 if @@microsoftversion>0x07320000 Begin if exists (Select * from sys.all_parameters where object_id = OBJECT_ID('sp_MSreplraiserror') and [name] = '@param3') Begin set @primarykey_text = @primarykey_text + '[sno] = ' + convert(nvarchar(100),@pkc1,1) exec sp_MSreplraiserror @errorid=20598, @param1=N'[dbo].[tb1]', @param2=@primarykey_text, @param3=13234 End Else exec sp_MSreplraiserror @errorid=20598 End end
Workaround
Redeploy the replication stored procedures using sp_scriptpublicationcustomprocs
If the 20598 error message shows nulls, we can still use the transaction sequence number as outlined here to get more info. But it is nice to have the values in the error message to easily find the culprit, example:
"The row was not found at the Subscriber when applying the replicated DELETE command for Table '[dbo].[tb1]' with Primary Key(s): [sno] = 3 (Source: MSSQLServer, Error number: 20598)"
Subscribe to:
Posts (Atom)