Pages

Wednesday, June 14, 2017

Version Ghost Records Causing Replication Latency

Setting snapshot isolation level on the subscriber database is a common practice to avoid user queries blocking transactional replication threads.

However, a side effect may be an increase of replication latency during long running user queries because the version ghost records from the MSreplication_subscription table cannot to be cleanup until those long running queries complete.

MSreplication_subscriptions 

There is one MSreplication_subscriptions table on each subscriber database, it contains one row per subscription, and it is used to keep track of the last transaction delivered to the subscriber. Using extended events filtered by the session id of the distribution agent

CREATE EVENT SESSION [subscription_thread_trace] ON SERVER 
ADD EVENT sqlserver.prepare_sql(
    WHERE ([sqlserver].[session_id]=(59))),
ADD EVENT sqlserver.rpc_starting(
    WHERE ([sqlserver].[session_id]=(59))),
ADD EVENT sqlserver.sql_batch_starting(
    WHERE ([sqlserver].[session_id]=(59))),
ADD EVENT sqlserver.sql_statement_starting(
    WHERE ([sqlserver].[session_id]=(59)))
WITH (TRACK_CAUSALITY=ON)
GO

We can see that the table is updated after each transaction has been committed to the subscriber database



If there are thousands of transactions replicated, this table will be updated thousands of times too

Row Versioning 

If the subscriber database has snapshot isolation enabled, example

ALTER DATABASE [subscriber] SET READ_COMMITTED_SNAPSHOT ON 

then each update on the MSreplication_subscription table will mark the old row on the page as ghost, add a pointer to the row version in the version store on tempdb, and add the new row to the page.

Ghost records cannot be cleanup if they point to a row version on the version store, and the row version cannot be cleanup if there is a query using the version store. More info can be found here and here.

So a long running query will cause MSreplication_subscription to grow full of version ghost records 

Example 

To simulate a log running query in a test environment, we can leave a transaction open on any table on the subscriber database

USE [subscriber_db]
BEGIN TRAN
DELETE TOP(1) [table]

Execute the statement obtained from the extended event session

exec sp_executesql N'update MSreplication_subscriptions 
set transaction_timestamp = cast(@P1 as binary(15)) + cast(substring(transaction_timestamp, 16, 1) as binary(1))
, "time" = @P2 
where UPPER(publisher) = UPPER(@P3) 
and publisher_db = @P4 
and publication = @P5 
and subscription_type = 0 
and (substring(transaction_timestamp, 16, 1) = 0 or datalength(transaction_timestamp) < 16)'
,N'@P1 varbinary(14)
,@P2 datetime
,@P3 nvarchar(24)
,@P4 nvarchar(8)
,@P5 nvarchar(5)'
,0x0
,'2017-06-13 15:03:47'
,N'subscriber'
,N'subscriber_db'
,N'publication'
go 

Find the data page id

DBCC IND ('susbcriber_db', 'MSreplication_subscriptions', 1);



Examine the page content

DBCC TRACEON(3604)
DBCC PAGE (subscriber_db, 1, 1767872, 3);
DBCC TRACEOFF(3604)

After the update, the old row on the page is marked as ghost in slot 0 that point to the version on the version store on tempdb. The new row is added in slot 1, so internally an update will behave as a delete and insert pair, as shown below, some output has been omitted.

Slot 0 Offset 0x18e Length 302

Record Type = GHOST_DATA_RECORD     Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS VERSIONING_INFO
Record Size = 302                   
Memory Dump @0x000000F2084FA18E

0000000000000000:   7c003e00 00000000 2d880391 a7000000 80440d57  |.>.....-..Ԥ....D.W
0000000000000014:   fd7f0000 d820e17f c2027dae 62de7eb6 754898b0  ý...Ø á.Â.}®bÞ~¶uH.°
0000000000000028:   f2bb6c6c 0b260000 05000000 00000000 01000000  ò»ll.&..............
000000000000003C:   fd7f0e00 00150500 58006800 9800a800 20017400  ý.......X.h...¨. .t.
0000000000000050:   65007300 74003100 74007000 63006300 5f007300  e.s.t.1.t.p.c.c._.s.
0000000000000064:   73006400 42004f00 57004e00 56004400 49004d00  s.d.B.O.W.N.V.D.I.M.
0000000000000078:   4d003100 5c004200 4f004d00 53005300 4e005600  M.1.\.B.O.M.S.S.N.V.
000000000000008C:   44004900 4d004d00 30003100 00000000 00000000  D.I.M.M.0.1.........
00000000000000A0:   00000000 00000000 42004f00 57004e00 56004400  ........B.O.W.N.V.D.
00000000000000B4:   49004d00 4d003100 5c004200 4f004d00 53005300  I.M.M.1.\.B.O.M.S.S.
00000000000000C8:   4e005600 44004900 4d002d00 74007000 63006300  N.V.D.I.M.-.t.p.c.c.
00000000000000DC:   5f007300 73006400 2d007400 65007300 74003100  _.s.s.d.-.t.e.s.t.1.
00000000000000F0:   2d004200 4f005700 53005300 44005400 45005300  -.B.O.W.S.S.D.T.E.S.
0000000000000104:   54003000 33005c00 42004f00 4d005300 53005300  T.0.3.\.B.O.M.S.S.S.
0000000000000118:   53004400 2d003100 f01e0200 05000100 745a2f07  S.D.-.1.ð.......tZ/.
000000000000012C:   0000                                          ..     

Version Information = 
 Transaction Timestamp: 120543860
 Version Pointer: (file 5 page 138992 currentSlotId 1)

Slot 1 Offset 0x2bc Length 302

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS VERSIONING_INFO
Record Size = 302                   
Memory Dump @0x000000F2084FA2BC

0000000000000000:   70003e00 00000000 01880391 a7000000 00100000  p.>........Ԥ.......
0000000000000014:   20000000 00000001 00e07dae 62de7eb6 754898b0   ........à}®bÞ~¶uH.°
0000000000000028:   f2bb6c6c 0b260000 f0bceb56 fd7f0000 d7fa7cfd  ò»ll.&..ð¼ëVý...×ú|ý
000000000000003C:   7f000e00 00150500 58006800 9800a800 20017400  ........X.h...¨. .t.
0000000000000050:   65007300 74003100 74007000 63006300 5f007300  e.s.t.1.t.p.c.c._.s.
0000000000000064:   73006400 42004f00 57004e00 56004400 49004d00  s.d.B.O.W.N.V.D.I.M.
0000000000000078:   4d003100 5c004200 4f004d00 53005300 4e005600  M.1.\.B.O.M.S.S.N.V.
000000000000008C:   44004900 4d004d00 30003100 01000000 00000000  D.I.M.M.0.1.........
00000000000000A0:   00000000 00000000 42004f00 57004e00 56004400  ........B.O.W.N.V.D.
00000000000000B4:   49004d00 4d003100 5c004200 4f004d00 53005300  I.M.M.1.\.B.O.M.S.S.
00000000000000C8:   4e005600 44004900 4d002d00 74007000 63006300  N.V.D.I.M.-.t.p.c.c.
00000000000000DC:   5f007300 73006400 2d007400 65007300 74003100  _.s.s.d.-.t.e.s.t.1.
00000000000000F0:   2d004200 4f005700 53005300 44005400 45005300  -.B.O.W.S.S.D.T.E.S.
0000000000000104:   54003000 33005c00 42004f00 4d005300 53005300  T.0.3.\.B.O.M.S.S.S.
0000000000000118:   53004400 2d003100 00000000 00000000 745a2f07  S.D.-.1.........tZ/.
000000000000012C:   0000                                          ..     

Version Information = 
 Transaction Timestamp: 120543860
 Version Pointer: Null

Eventually, after thousands of updates this table will have a high number of version ghost records, on the following example the table had 17,085 version ghost records and the table grew to 1,315 pages instead of 1 page

SELECT page_count, record_count, ghost_record_count, version_ghost_record_count 
FROM sys.dm_db_index_physical_stats(db_id('tpcc_ssd'), object_id('MSreplication_subscriptions'), NULL, NULL , 'SAMPLED');


Replication Latency 

The problem of having MSreplication_subscriptions with a high number of ghost records is that the update to this table will take longer and longer, for example we may see that the logical reads are 1,363 pages

set statistics IO on
go
exec sp_executesql N'update MSreplication_subscriptions 
set transaction_timestamp = cast(@P1 as binary(15)) + cast(substring(transaction_timestamp, 16, 1) as binary(1))
, "time" = @P2 
where UPPER(publisher) = UPPER(@P3) 
and publisher_db = @P4 
and publication = @P5 
and subscription_type = 0 
and (substring(transaction_timestamp, 16, 1) = 0 or datalength(transaction_timestamp) < 16)'
,N'@P1 varbinary(14)
,@P2 datetime
,@P3 nvarchar(24)
,@P4 nvarchar(8)
,@P5 nvarchar(5)'
,0x000009B600088BB0001D00000000
,'2017-06-13 15:03:47'
,N'subscriber'
,N'subscriber_db'
,N'publication'
go 
set statistics IO off
go

Table 'MSreplication_subscriptions'. Scan count 1, logical reads 1362, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)


Because the where clause of the update is not a SARG, the query plan will show a cluster index scan 


Workaround 

If we find that the MSreplication_susbcriptions has a high number of version ghost records

SELECT page_count, record_count, ghost_record_count, version_ghost_record_count 
FROM sys.dm_db_index_physical_stats(db_id('tpcc_ssd'), object_id('MSreplication_subscriptions'), NULL, NULL , 'SAMPLED');

Then it means there is a long running query using the version store not allowing the ghost cleanup.
One way to force the ghost cleanup is to rebuild the index

ALTER INDEX uc1MSReplication_subscriptions on MSreplication_subscriptions REBUILD WITH(ONLINE=ON)

The updates will now scan one page and the replication latency should come down