Friday, December 30, 2016

Querying the Data Validation Results

Typically I use Replication Monitor to validate subscriptions, and the results show up in the subscription details window, as shown below. But if the publication has many tables, a query makes it easier to spot the tables that are out of synchronization.


We can run this query on the distribution database, providing the publisher database (e.g. WideWorldImporters) and the publication (e.g. TransactionTables):

 
select 
a.name
,h.comments
from MSdistribution_history h with (nolock)
join MSdistribution_agents a with (nolock)
on h.agent_id = a.id
where a.publisher_db = 'WideWorldImporters'
and a.publication = 'TransactionTables'
and h.comments like 'Table% out of synchronization%'
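
The validation itself can also be kicked off from T-SQL instead of Replication Monitor. A minimal sketch using the same sample names, run at the publisher on the publication database (the distribution agents then process the validation requests and write the results to MSdistribution_history):

use WideWorldImporters
exec sp_publication_validation
@publication = N'TransactionTables'
,@rowcount_only = 2 -- 2 = rowcount and binary checksum validation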


Friday, November 18, 2016

The Power of Window Functions

Window functions are usually the most efficient way to calculate certain aggregates, such as running totals.

For example, we could use the CROSS APPLY operator to calculate the running total, as shown below, using the sample database Wide World Importers:
 
SELECT t1.CustomerID, t1.InvoiceID, t1.TransactionAmount
,c.RunningTotal
FROM [WideWorldImporters].[Sales].[CustomerTransactions] t1
CROSS APPLY(
SELECT SUM(t2.TransactionAmount) RunningTotal
FROM [WideWorldImporters].[Sales].[CustomerTransactions] t2
WHERE t2.InvoiceID is not null
AND t2.InvoiceID <= t1.InvoiceID
AND t1.CustomerID = t2.CustomerID
) c
WHERE t1.InvoiceID is not null
ORDER BY t1.CustomerID, t1.InvoiceID

We could refactor this query to use window functions, as shown below:
 
SELECT t1.CustomerID, t1.InvoiceID, t1.TransactionAmount,
SUM(t1.TransactionAmount) OVER(PARTITION BY t1.CustomerID
ORDER BY t1.InvoiceID
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) RunningTotal
FROM [WideWorldImporters].[Sales].[CustomerTransactions] t1
WHERE t1.InvoiceID is not null
ORDER BY t1.CustomerID, t1.InvoiceID

Both queries return the same results.


However, the performance difference is dramatic.


As shown above, the CROSS APPLY query (first statement) takes about 5 minutes and 2 million reads, while the window function query (second statement) completes in 264 milliseconds with about a thousand reads.
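
If you want to reproduce the comparison yourself, SET STATISTICS IO and TIME print the reads and timings of each statement to the Messages tab:

SET STATISTICS IO ON
SET STATISTICS TIME ON

-- run the CROSS APPLY query and then the window function query,
-- and compare the logical reads and elapsed times reported for each

SET STATISTICS IO OFF
SET STATISTICS TIME OFF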

Enjoy the power.

Thursday, August 4, 2016

Replication Snapshot Agent Blocking

The snapshot agent is used to initialize the subscription in transactional replication, and it requires a very short schema modification (SCH-M) lock on all the articles of the publication. If the articles are queried thousands of times per second, or by long-running queries, those queries will block the snapshot agent, because the SCH-M lock is not compatible with any other lock. And once the SCH-M lock is waiting, everybody else waits behind it.

If blocking for a few seconds is an issue in your system, you can initialize when there is low activity, or initialize without a snapshot, as documented in Books Online.
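
For reference, initializing without a snapshot boils down to creating the subscription with @sync_type = N'replication support only'. A minimal sketch using the same names as the repro script below, assuming the subscriber already has a copy of the schema and data (for example, restored from a backup):

exec sp_addsubscription @publication = N'test_snapshot_agent_blocking', @subscriber = N'(subscriber)', @destination_db = N'subscriber_test_snapshot_agent_blocking', @subscription_type = N'Push', @sync_type = N'replication support only'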

Below I will show you how to reproduce this blocking scenario using a long running query.
 

-- on publisher: create publication database

use master
create database publisher_test_snapshot_agent_blocking
go

use publisher_test_snapshot_agent_blocking
create table paul_test1(a int primary key, b int)
create table paul_test2(a int primary key, b int)

-- on subscriber: create subscription database

use master
create database subscriber_test_snapshot_agent_blocking
go

-- on publisher: create publication and subscription

use publisher_test_snapshot_agent_blocking
exec sp_replicationdboption @dbname = N'publisher_test_snapshot_agent_blocking', @optname = N'publish', @value = N'true'
exec sp_addpublication @publication = N'test_snapshot_agent_blocking', @description = N'test', @sync_method = N'concurrent', @retention = 0, @allow_push = N'true', @allow_pull = N'true', @allow_anonymous = N'false', @enabled_for_internet = N'false', @snapshot_in_defaultfolder = N'true', @compress_snapshot = N'false', @ftp_port = 21, @ftp_login = N'anonymous', @allow_subscription_copy = N'false', @add_to_active_directory = N'false', @repl_freq = N'continuous', @status = N'active', @independent_agent = N'true', @immediate_sync = N'false', @allow_sync_tran = N'false', @autogen_sync_procs = N'false', @allow_queued_tran = N'false', @allow_dts = N'false', @replicate_ddl = 1, @allow_initialize_from_backup = N'false', @enabled_for_p2p = N'false', @enabled_for_het_sub = N'false'
exec sp_addpublication_snapshot @publication = N'test_snapshot_agent_blocking', @frequency_type = 1, @frequency_interval = 0, @frequency_relative_interval = 0, @frequency_recurrence_factor = 0, @frequency_subday = 0, @frequency_subday_interval = 0, @active_start_time_of_day = 0, @active_end_time_of_day = 235959, @active_start_date = 0, @active_end_date = 0, @job_login = null, @job_password = null, @publisher_security_mode = 1
exec sp_addarticle @publication = N'test_snapshot_agent_blocking', @article = N'paul_test1', @source_owner = N'dbo', @source_object = N'paul_test1', @type = N'logbased', @description = N'', @creation_script = N'', @pre_creation_cmd = N'drop', @schema_option = 0x000000000803509F, @identityrangemanagementoption = N'none', @destination_table = N'paul_test1', @destination_owner = N'dbo', @status = 24, @vertical_partition = N'false', @ins_cmd = N'CALL [sp_MSins_dbopaul_test1]', @del_cmd = N'CALL [sp_MSdel_dbopaul_test1]', @upd_cmd = N'SCALL [sp_MSupd_dbopaul_test1]'
exec sp_addarticle @publication = N'test_snapshot_agent_blocking', @article = N'paul_test2', @source_owner = N'dbo', @source_object = N'paul_test2', @type = N'logbased', @description = N'', @creation_script = N'', @pre_creation_cmd = N'drop', @schema_option = 0x000000000803509F, @identityrangemanagementoption = N'none', @destination_table = N'paul_test2', @destination_owner = N'dbo', @status = 24, @vertical_partition = N'false', @ins_cmd = N'CALL [sp_MSins_dbopaul_test2]', @del_cmd = N'CALL [sp_MSdel_dbopaul_test2]', @upd_cmd = N'SCALL [sp_MSupd_dbopaul_test2]'
exec sp_addsubscription @publication = N'test_snapshot_agent_blocking', @subscriber = N'(subscriber)', @destination_db = N'subscriber_test_snapshot_agent_blocking', @subscription_type = N'Push', @sync_type = N'automatic', @article = N'all', @update_mode = N'read only', @subscriber_type = 0
GO

-- on a new query window: run this long running select on the publisher

use publisher_test_snapshot_agent_blocking
select *
from master..spt_values a
cross join master..spt_values b
cross join paul_test1

-- on a new query window: start the snapshot agent on the distributor

use distribution
exec sp_MSstartsnapshot_agent @publisher = N'(publisher)', @publisher_db = N'publisher_test_snapshot_agent_blocking', @publication = N'test_snapshot_agent_blocking'

-- on a new query window: run this long running select on the publisher

use publisher_test_snapshot_agent_blocking
select *
from master..spt_values a
cross join master..spt_values b
cross join paul_test1

-- on a new query window: run this long running select on the publisher

use publisher_test_snapshot_agent_blocking
select *
from master..spt_values a
cross join master..spt_values b
cross join paul_test1

-- on a new query window: run sp_WhoIsActive

exec sp_WhoIsActive 
@get_outer_command = 1
,@get_plans = 1
,@get_transaction_info = 1
,@find_block_leaders = 1
,@get_locks = 1

You will see that the long running query (session id 88) is blocking the snapshot agent (session id 95), which is requesting SCH-M locks and which in turn is blocking the other two queries (session ids 52 and 86) that are requesting Intent Shared (IS) locks.
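
If you don't have sp_WhoIsActive installed, a minimal sketch against sys.dm_os_waiting_tasks shows the same blocking chain:

-- who is waiting on whom, and on which wait type
select
w.session_id
,w.blocking_session_id
,w.wait_type
,w.wait_duration_ms
,w.resource_description
from sys.dm_os_waiting_tasks w
where w.blocking_session_id is not null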


Friday, April 8, 2016

Querying the Lock Acquired Extended Event

In this post, I will show how to query the lock acquired extended event that I used in the previous post. This query summarizes the data by aggregating the count of locks of the same type while preserving the sequence in which they appeared. For this purpose, I used the SQL 2012 window function LAG to solve the “islands” problem described by Itzik Ben-Gan in his T-SQL Querying book.

I used the same sample data as the previous post. Next, I created the following extended event session, filtered by the session id I used to run the delete query, as shown below.
 
CREATE EVENT SESSION [lock_acquired] ON SERVER 
ADD EVENT sqlserver.lock_acquired(SET collect_resource_description=(1)
    ACTION(package0.callstack,sqlserver.session_id,sqlserver.sql_text,sqlserver.tsql_stack)
    WHERE ([package0].[equal_uint64]([sqlserver].[session_id],(91))))
ADD TARGET package0.event_file(SET filename=N'c:\temp\lock_acquired')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=ON,STARTUP_STATE=OFF)
GO

alter event session [lock_acquired]
on server
state = start;

delete top (1000) child_table
from child_table c
left join parent_table p
on c.parent_id = p.parent_id
where p.parent_id is null

alter event session [lock_acquired]
on server
state = stop;

Then you can run the following query to get the locks acquired:

with a as(
SELECT
t.c.value ('@name', 'nvarchar(50)') AS event_name
,t.c.value ('@timestamp', 'datetime2') AS event_time
,t.c.value ('(data[@name="associated_object_id"]/value)[1]', 'numeric(20)') AS associated_object_id
,t.c.value ('(data[@name="resource_0"]/value)[1]', 'bigint') AS resource_0
,t.c.value ('(data[@name="resource_1"]/value)[1]', 'bigint') AS resource_1
,t.c.value ('(data[@name="resource_2"]/value)[1]', 'bigint') AS resource_3
,t.c.value ('(data[@name="mode"]/text)[1]', 'varchar(50)') AS mode
,t.c.value ('(data[@name="resource_type"]/text)[1]', 'varchar(50)') AS resource_type
,t.c.value ('(data[@name="database_id"]/value)[1]', 'int') AS database_id
,t.c.value ('(data[@name="object_id"]/value)[1]', 'int') AS object_id
,t.c.value ('(action[@name="attach_activity_id"]/value)[1]', 'varchar(200)') AS attach_activity_id
FROM (
SELECT
    CAST(event_data AS XML) AS event_xml
 FROM sys.fn_xe_file_target_read_file
 (
 'C:\temp\lock_acquired_0_131015242967150000.xel'
 , null
 , null
 , null
)) target_read_file
CROSS APPLY event_xml.nodes ('//event') AS t (c)
where t.c.value ('@name', 'nvarchar(50)') = 'lock_acquired'
and t.c.value ('(data[@name="associated_object_id"]/value)[1]', 'numeric(20)') > 0
and t.c.value ('(data[@name="database_id"]/value)[1]', 'int') <> 2
) 
select
event_time
,object_name(isnull(p.object_id,a.object_id),a.database_id) object_name
,resource_type
,mode
,count(*) count
into #temp
from a
left join sys.partitions p
on a.associated_object_id = p.hobt_id
group by
event_time
,object_name(isnull(p.object_id,a.object_id),a.database_id)
,resource_type
,mode

select object_name, resource_type, mode, count from #temp

You will notice the output has 200+ rows, which may be difficult to read. This is the “islands” task that Itzik describes as “Identifying islands means returning the ranges of consecutive values.” We want the consecutive S locks on the parent table, as well as the U locks on the child table, aggregated while preserving the sequence, as shown below.



The SQL 2012 window function LAG does the trick:
 
with a as(
-- number the rows, preserving the event sequence
select *
,row_number() over(
order by
event_time
,object_name
,resource_type
,mode
) row_num
from #temp
), b as(
-- flag the rows where the lock group changes from the previous row
select *
,case when object_name + resource_type + mode <>
lag(object_name + resource_type + mode) over(
order by
row_num
)
or
lag(object_name + resource_type + mode) over(
order by
row_num
) is null
then 1
else 0
end grp1
from a
), c as(
-- a running sum of the flags assigns a group id to each island
select *
,sum(grp1) over(
order by row_num
) grp2
from b
)
select
object_name
,resource_type
,mode
,sum(count) count
from c
group by
grp2
,object_name
,resource_type
,mode
order by
grp2
,object_name
,resource_type
,mode


Monday, February 22, 2016

Lock Escalation and Outer Joins

A common workaround to avoid lock escalation is to limit the number of rows per transaction, for example deleting 1,000 rows at a time. However, if the delete is based on an outer join, for example a left join, and the left table is large enough that reading it requires more than about 5,000 locks, the Database Engine will escalate those locks to a table lock regardless of how many rows are actually deleted.
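
For completeness, escalation can also be disabled per table (SQL 2008 and later), trading the table lock for thousands of fine-grained locks and the memory they consume; the temp table workaround at the end of this post avoids the problem instead:

-- disable lock escalation on the child table; the scan will hold
-- row and page locks instead of escalating to a table lock
alter table child_table set (lock_escalation = disable)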

I can reproduce the issue by running the queries below, which populate a parent table with 20,000 rows and a child table with 200,000 rows.
 
create table parent_table(
parent_id int identity primary key
,parent_col char(255) not null
)

insert parent_table(parent_col)
values('a')
go 20000

create table child_table(
child_id int identity primary key
,parent_id int not null
,child_col char(255) not null
)

insert child_table(parent_id, child_col)
select parent_id, parent_col
from parent_table
go 10

Then delete from the child table the rows that don't have a match in the parent table, using a left outer join. In this example there are no orphans, so no rows will actually be deleted.
 
delete top (1000) child_table
from child_table c
left join parent_table p
on c.parent_id = p.parent_id
where p.parent_id is null

I used the lock acquired event to show which locks were acquired (I will explain the extended events I used in the next post). You will see that, due to the left outer join logic, the query has to scan all the rows in the child table taking Update (U) locks, and it eventually escalates to an Exclusive (X) table lock, blocking any subsequent query on the child table for the duration of the transaction.
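
You can also confirm the escalation with sys.dm_db_index_operational_stats, which counts lock promotion attempts and successes per index; a minimal sketch, run in the same database:

select
object_name(s.object_id) object_name
,s.index_id
,s.index_lock_promotion_attempt_count
,s.index_lock_promotion_count
from sys.dm_db_index_operational_stats(db_id(), object_id('child_table'), null, null) s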


A workaround is to use a temp table as an intermediate step to find the orphaned child ids:
 
select c.child_id
into #temp
from child_table c
left join parent_table p
on c.parent_id = p.parent_id
where p.parent_id is null

delete top (1000) child_table
from child_table c
join #temp t
on c.child_id = t.child_id

Now you will see on the child table an Intent Shared (IS) table lock during the select into portion, and an Intent Exclusive (IX) table lock during the delete, instead of the X table lock of the previous query, allowing more concurrency on the child table.
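
To verify the locks yourself without an extended event session, wrap the statement in a transaction and check sys.dm_tran_locks before rolling back; a minimal sketch for the delete portion:

begin tran

delete top (1000) child_table
from child_table c
join #temp t
on c.child_id = t.child_id

-- object-level locks held by this session; expect IX on child_table
select
resource_type
,request_mode
,request_status
from sys.dm_tran_locks
where request_session_id = @@spid
and resource_type = 'OBJECT'

rollback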