Pages

Wednesday, November 26, 2014

How many targets can a centralized MDW support?

In Data Collection terminology in books online, a target is an instance of the Database Engine in an edition of SQL Server that supports Data Collection.

The Management Data Warehouse (MDW) is a relational database used to store collected data. This database can be centralized and shared among multiple targets.

But I haven’t seen any guideline of how many.


We can test it out doing the following.

TEST SETUP

It is a 2 step process outlined below.

  1. Create the centralized MDW in a SQL Server, e.g. SQL01 using the Configure the Management Data Warehouse Wizard as outlined on books online http://msdn.microsoft.com/en-us/library/bb933864.aspx
  2. Configure Data Collection on a couple or more targets using the same data collection schedule. In this test I set up about 50 targets (e.g. SQL02, SQL03, etc) with a 15 min collection schedule using the following T-SQL:

 
exec msdb.dbo.sp_syscollector_set_warehouse_database_name @database_name = 'mdw'
exec msdb.dbo.sp_syscollector_set_warehouse_instance_name @instance_name = 'server'
exec msdb.dbo.sp_syscollector_set_cache_directory @cache_directory = 'c:\temp'
exec msdb.dbo.sp_syscollector_enable_collector
exec msdb.dbo.sp_syscollector_update_collection_set @name = 'Disk Usage', @days_until_expiration = 2
exec msdb.dbo.sp_syscollector_update_collection_set @name = 'Server Activity', @days_until_expiration = 2
exec msdb.dbo.sp_syscollector_update_collection_set @name = 'Query Statistics', @days_until_expiration = 2
exec msdb.dbo.sp_syscollector_update_collection_set @name = 'Server Activity', @schedule_name=N'CollectorSchedule_Every_15min'
exec msdb.dbo.sp_syscollector_update_collection_set @name = 'Query Statistics', @schedule_name=N'CollectorSchedule_Every_15min'

OBSERVATIONS

In one of the targets, we can look at the agent job activity for the next upload time to the MDW. As configured, all targets will upload at the same time, e.g. collection_set_2_upload next run is 5:15 PM. 


On the next run time, e.g. 5:15 PM run the following query on the MDW server (SQL01) to see what is going on.

 
select
r.session_id
,r.blocking_session_id
,r.status
,r.last_wait_type
,r.wait_resource
,r.wait_type
,r.wait_time
,t.text
from sys.dm_exec_requests r
cross apply sys.dm_exec_sql_text(r.plan_handle) t


In the output above, each data uploading session is waiting on the same KEY resource lock. We can query sys.partitions using hobt_id = 72057594046447616 and find that the index is UN_performance_counter_path on the table performance_counter_instance.

Why a bulk insert is causing blocking on an index key?

If we look a the index properties, we find that the “Ignore duplicated values” option have been turned on.


If we look at the SSIS package of the data collector PerfCountersUpload, we can find the data flow for performance_counters_instances table.


If we look at the data flow, we can see that the performance counters file is bulk loaded into performance_counter_instances table to find new performance counters instance paths.


What is the logic to find the new performance counters instance paths?

Well it appears that logic is pretty simple: load the file and ignore the duplicate paths, since we found earlier that the index UN_performance_counter_path has the “Ignore duplicate values” option turn on.

This would answer the question of why the bulk insert is causing blocking: because it is bulk inserting duplicates and has to place a LCK_M_RS_U lock on the index key.

The side effect of this behavior is that it serializes the bulk inserts. No matter how many targets are loading data concurrently to the central MDW, it will insert/ignore one duplicate path at the time.

How long can the blocked data loads wait?

On my test of 50 targets some of the data collection jobs failed with this message

The thread "ExecMasterPackage" has timed out 3600 seconds after being signaled to stop.  Process Exit Code 259.  The step failed.

We can infer that the number of targets supported is a function of how long a data load can wait for the KEY lock before the job times out.

CONCLUSION

The number of targets a shared MDW can support could be estimated by the number of sequential data loads that can be done in less than 1 hour.

Example: to make the math simple, let's assume each data load takes about 2 min and all start at the same time, since the inserts are serialized due to KEY lock, the first 30 targets will succeed and the 31st target will time out.

If we have hundreds of SQL Servers, it would be better to have local MDW databases and then have a custom job that gathers the data in a custom central MDW database. I will explore this option in a future post.