Friday, March 20, 2015

Exploring the MDW Components

After enabling Data Collection, there are 4 System Data Collection Sets in the Data Collection folder in SSMS, as shown below. The last one, Utility Information, is in a stopped state, we can ignore it, it was not intended to be started, as mentioned by Kendra Little and Bill Ramos in this post

In fact, Utility Control Point jobs have different user interface called Utility Explorer so not sure why Microsoft left it in the Data Collection folder.

The SQL Server Agent jobs will be configured automatically by the System Data Collection Set properties. Do not modify the job properties directly. These jobs will call dcexec.exe that uses several SSIS packages to collect and upload the data. If the collection job uses cached data, the default temp directory is C:\Users\sqlserveragent\AppData\Local\Temp. We configured it as C:\Temp.

The upload job will insert the data to the respective tables in the centralized data warehouse. The tables in the schema “core” are used for lookups and the ones with schema “snapshot,” as name implies, store snapshots of DMVs or performance counters.

There are a couple of tables named snapshots.distinct_queries and snapshots.distinct_query_to_handle that seem to be not used.

The biggest table will be snapshots.notable_query_plan because it will store the query plans in varchar max data type so it cannot be compressed. On our systems, this table has an average size of 3 GB for 2 day retention.

Speaking of retention, the defaults are as follows:

If we look at the mdw_purge_data job, we will see it executes core.sp_purge_data that deletes the core.snapshot_internal and relies in the foreign key constraints to cascade deletes to the dependent tables.

A couple of exceptions of the cascade deletes, which are worth mentioning, are the snapshot.notable_query_text and the snapshot.notable_query_plan tables. If we look at the primary keys, we will see that they are not depended on the snapshot_id.

The data collection process will insert a row if an entry does not exists for the notable query text or plan, instead of inserting the same notable query text or plan for each snapshot. Thus, the core.sp_purge_data procedure calls other stored procedure to delete the orphan rows if no other snaphot.query_stats references the sql_handle or plan_handle respectively

We can easily visualize the table relationships by creating a database diagram using SSMS

From the database diagram we can deduct that each time a data collection is executed, the data is stored in the respective snaphot.* table, as well as , in the following tables core.* tables:

  • core.source_info_internal
  • core.snapshot_timetable
  • core.snapshots_internal

For example, if we want to get snapshots.query_stats for a particular instance and time, we can use the following query:

select *
from core.snapshots_internal i
join core.snapshot_timetable_internal t
on i.snapshot_time_id = t.snapshot_time_id
join core.source_info_internal s
on i.source_id = s.source_id
join snapshots.query_stats q
on i.snapshot_id = q.snapshot_id
where s.instance_name = ''
and t.snapshot_time = ''