Pages

Friday, May 8, 2015

Data Collection (MDW) Errors after Mirror or Availability Group Failover

Symptoms

After mirror or availability group failover, we get the following errors on the data collection jobs on the mirror or secondary server

Sysssislog messages:

SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80040E14.
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 11.0"  Hresult: 0x80040E14  Description: "The target database, 'xxxx', is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group.  For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online.".

OLE DB error occurred while fetching parameterized rowset. Check SQLCommand and SqlCommandParam properties.

SSIS Error Code DTS_E_PROCESSINPUTFAILED.  The ProcessInput method on component "LKUP - Look up query plans on target server" (2) failed with error code 0xC0208253 while processing input "Lookup Input" (16). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.  There may be error messages posted before this with more information about the failure.

Root Cause 

When the collector finds an “interesting” query from sys.dm_exec_query_stats, it calls the stored procedure msdb.dbo.sp_syscollector_text_query_plan_lookup to get the query plan from cache using sys.dm_exec_text_query_plan. That function will error out if the plan references a database that has the role mirror or secondary. As explained in this blog, a failover does not flush the plan cache.

Workaround

We can add error handling to the procedure to continue when the respective error occurs

 
USE [msdb]
GO
ALTER PROCEDURE [dbo].[sp_syscollector_text_query_plan_lookpup]
    @plan_handle varbinary(64),
    @statement_start_offset int,
    @statement_end_offset int
AS
BEGIN
    SET NOCOUNT ON
 BEGIN TRY
  SELECT    
   @plan_handle AS plan_handle,
   @statement_start_offset AS statement_start_offset,
   @statement_end_offset AS statement_end_offset,
   [dbid] AS database_id,
   [objectid] AS object_id,
   OBJECT_NAME(objectid, dbid) AS object_name,
   [query_plan] AS query_plan
  FROM    
   [sys].[dm_exec_text_query_plan](@plan_handle, @statement_start_offset, @statement_end_offset) dm
 END TRY
 BEGIN CATCH
  DECLARE @ErrorNumber INT = ERROR_NUMBER();
  DECLARE @ErrorSeverity INT = ERROR_SEVERITY();
  DECLARE @ErrorState INT = ERROR_STATE();
  DECLARE @ErrorMessage NVARCHAR(4000) = ERROR_MESSAGE();
  --when a plan in cache refereces a db with role = mirror or role = secondary, dm_exec_text_query_plan will return this error:
  --Msg 954, Level 14, State 1, Procedure sp_syscollector_text_query_plan_lookpup, Line 8
  --The database "xxxx" cannot be opened. It is acting as a mirror database.
  --Msg 976, Level 14, State 1, Line 5
  --The target database, 'xxxx', is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group.  For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online.
  IF @ErrorNumber <> 954 and @ErrorNumber <> 976
   RAISERROR(@ErrorMessage, @ErrorSeverity, @ErrorState);
 END CATCH
END

Connect item link

Monday, May 4, 2015

Data Collection (MDW) Errors after Applying SQL 2012 SP2 CU4

Symptoms

After applying CU4, we get the following errors.

Sysssislog messages:

There was an error with ODS - Get snapshot of dm_exec_requests.Outputs[OLE DB Source Output].Columns[command] on ODS - Get snapshot of dm_exec_requests.Outputs[OLE DB Source Output]. The column status returned was: "Text was truncated or one or more characters had no match in the target code page.".

The "ODS - Get snapshot of dm_exec_requests.Outputs[OLE DB Source Output].Columns[command]" failed because truncation occurred, and the truncation row disposition on "ODS - Get snapshot of dm_exec_requests.Outputs[OLE DB Source Output].Columns[command]" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.

Truncation may occur due to retrieving data from database column "command" with a length of 32 to data flow column "command" with a length of 16.

Syscollector_execution_log_internal message:

Failed to create kernel event for collection set: {2DC02BD6-E230-4C05-8516-4E8C0EF21F95}. Inner Error ------------------>
Cannot create a file when that file already exists.

Root cause

CU4 contains this fix:

"An update enables you to exclude system databases from Query Statistics collection sets when you use the management data warehouse in SQL Server"

To implement it, Microsoft added stored procedures, one of them is sp_syscollector_snapshot_dm_exec_requests_internal that returns the column "command" nvarchar(32). However, the SSIS packages are expecting the "command" field to be nvarchar(16).

Workaround

A workaround is to modify this line of code in the stored procedure to return "command" nvarchar(16)
 
CAST(ISNULL (req.command, 'AWAITING COMMAND') AS nvarchar(16)) AS command

Eventually Microsoft needs to modify the "command" length on both the stored procedure and the packages to be nvarchar(32)

Connect item link



Friday, May 1, 2015

Enabling Additional Tracing for Data Collection (MDW)

If sysssislog or syscollector_execution_log_internal_message don’t give us a clue of what is the issue, we can enable additional tracing by adding a registry key as posted here

To enable tracing in a remote computer, we can execute this PowerShell script

 
$computer = "mycomputer"
$cred = Get-Credential mydomain\myadminaccount
Enter-PSSession $computer -Credential $cred

Push-Location 
if (-not(Test-Path("HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\SQLTools")))
{
    Set-Location "HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server"
    New-Item -Name SQLTools
}
if (-not(Test-Path("HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\SQLTools\dcexec")))
{
    Set-Location "HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\SQLTools"
    New-Item -Name dcexec
    New-ItemProperty -Name Components -PropertyType string -Path dcexec -Value "DCEXEC,TxDataCollector,DataCollectorController,DataCollectorTasks,Microsoft.SqlServer.Management.CollectorTasks.dll"
    New-ItemProperty -Name Tracelvl -PropertyType dword -Path dcexec -Value "4294967287"
    New-ItemProperty -Name Traceloc -PropertyType dword -Path dcexec -Value 3
    New-ItemProperty -Name Prefix -PropertyType string -Path dcexec -Value "date,time,pid,tid"
    New-ItemProperty -Name LogDir -PropertyType string -Path dcexec -Value "c:\temp\tracing"
    New-ItemProperty -Name LogFileMode -PropertyType string -Path dcexec -Value "Unique"
}
Pop-Location
Exit  

Then restart data collection using the script of a previous post

A trace file per process id will be created on c:\temp\tracing, once the error in question occurs, we can open the respective file to see the information logged. File name example:

c:\Temp\Tracing\dcexec_04_27_2015_09_22_28_PID30316_n.log

To disable tracing in a remote computer, we can execute this PowerShell script

 
$computer = "mycomputer"
$cred = Get-Credential mydomain\myadminaccount
Enter-PSSession $computer -Credential $cred

Push-Location 
if (Test-Path("HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\SQLTools"))
{
    Remove-Item "HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\SQLTools" -Recurse 
}
Pop-Location  
Exit  

Then restart data collection using the script of a previous post

Troubleshooting Data Collection (MDW) Errors

This is documented in books online

In a nutshell, data collector SSIS package errors are logged in the table sysssislog.

 
select * from msdb..sysssislog where event = 'OnError'

And dcexec.exe errors are logged in the table syscollector_execution_log_internal. We can use the view syscollector_execution_log_full but it is a bit slow to query.

 
select * from msdb.dbo.syscollector_execution_log_internal with(nolock) where failure_message is not null

The most common SSIS errors are shown below and they may cause the dcexec.exe to hang. Thus, those errors can be resolved by stopping the data collection jobs, killing any rogue dcexec.exe processes, deleting all cache files, and restarting the jobs. These tasks were scripted in the previous post.

Connection timeout error

Sysssislog messages:

SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 11.0"  Hresult: 0x80004005  Description: "Login timeout expired".
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 11.0"  Hresult: 0x80004005  Description: "Unable to complete login process due to delay in login response".
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 11.0"  Hresult: 0x80004005  Description: "Shared Memory Provider: Timeout error [258]. ".

Syscollector_execution_log_internal message:

Failed to create kernel event for collection set: {49268954-4FD4-4EB6-AA04-CD59D9BB5714}. Inner Error ------------------>
Cannot create a file when that file already exists.

Deadlock error

Sysssislog messages:

SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 11.0"  Hresult: 0x80004005  Description: "Unspecified error".
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 11.0"  Hresult: 0x80004005  Description: "Transaction (Process ID 106) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.".

Syscollector_execution_log_internal message:

Failed to create kernel event for collection set: {2DC02BD6-E230-4C05-8516-4E8C0EF21F95}. Inner Error ------------------>
Cannot create a file when that file already exists.

Corrupt cache file error

Sysssislog messages:

The binary field is too large. The adapter attempted to read a binary field that was 134220032 bytes long, but expected a field no longer than 64 bytes at offset 57711. This usually occurs when the input file is not valid. The file contains a string length that is too large for the buffer column.
SSIS Error Code DTS_E_PRIMEOUTPUTFAILED.  The PrimeOutput method on component "RFS - Read Current Upload Data" (1) returned error code 0x80004005.  The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.  There may be error messages posted before this with more information about the failure.

Syscollector_execution_log_internal message:

SSIS error. Component name: DFT - Upload collection snapshot, Code: -1073450952, Subcomponent: SSIS.Pipeline, Description: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component "RFS - Read Current Upload Data" (1) returned error code 0x80004005. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code

What does the error “Cannot create a file when that file already exists” mean?

If we enable additional tracing, as shown in the previous post, we can see that there was an unhandled exception pointing to the line in the source code file that does not exist in our machine. However the root cause was the previous error.

DataCollectorController!766c!4318!2015/04/27!09:49:43:: e             ERROR: SSIS Error. Code: -1071636471, Subcomponent: Connection manager "ConfigConnection", Description: SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 11.0"  Hresult: 0x80004005  Description: "Login timeout expired".
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 11.0"  Hresult: 0x80004005  Description: "Unable to complete login process due to delay in prelogin response".
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 11.0"  Hresult: 0x80004005  Description: "TCP Provider: Timeout error [258]. ".

DataCollectorController!766c!1208!2015/04/28!05:05:20:: e          ERROR: Generic Error: retCode=0x5, function CDataCollectorController::ExecuteMasterPackage, line 1164, file e:\sql11_main_t\sql\mpu\shared\dc\runtime\controller\src\datacollectorcontroller.cpp
DataCollectorController!766c!1208!2015/04/28!05:05:20:: e          ERROR: The master package exited with error, previous error messages should explain the cause.

DataCollectorController!766c!7e18!2015/04/28!05:05:30:: e        ERROR: Win32 Error: GetLastError=183, retCode=0x4, function CDataCollectorController::CreateControlEvents, line 476, file e:\sql11_main_t\sql\mpu\shared\dc\runtime\controller\src\datacollectorcontroller.cpp
DataCollectorController!766c!7e18!2015/04/28!05:05:30:: e        ERROR: Failed to create kernel event for collection set: {49268954-4FD4-4EB6-AA04-CD59D9BB5714}. Inner Error ------------------>
Cannot create a file when that file already exists.

Restarting Data Collection (MDW)

Most data collection errors can be resolved by restarting data collection. A PowerShell script can be used to:

  1. Stop data collection in all the SQL Server instances running on the machine
  2. Kill dcexec.exe processes
  3. Delete all cache files (our cache directory is c:\temp)
  4. Start data collection in all the SQL Server instances running on the machine

 
function RestartDataCollection($SrcSrv)
{
    try
    {
        $SrcCon  = New-Object System.Data.SqlClient.SQLConnection("Data Source=$SrcSrv;Initial Catalog=master;Integrated Security=True;Connection Timeout=60") 
        $SrcCon.Open() 
                
        $SrcCmd = New-Object system.Data.SqlClient.SqlCommand("select @@version", $SrcCon)  
        $SrcCmd.CommandTimeout = 300

        $query = "
        declare @instances table( 
        value nvarchar(100)
        ,instance_name nvarchar(100)
        ,data nvarchar(100)
        );

        declare @count int;

        with a as(
        select *, ROW_NUMBER() over(partition by l.collection_set_id order by l.log_id desc) row_num
        from msdb.dbo.syscollector_execution_log_internal l with(nolock)
        )
        select @count = count(*)
        from a
        where a.row_num = 1
        and a.failure_message is not null

        --if @count > 0 

        insert into @instances
        exec xp_regread
        @rootkey = 'HKEY_LOCAL_MACHINE'
        ,@key = 'SOFTWARE\Microsoft\Microsoft SQL Server'
        ,@value_name = 'InstalledInstances'

        select 
        case when instance_name = 'MSSQLSERVER' then SERVERPROPERTY('ServerName') else instance_name end instance_name 
        ,SERVERPROPERTY('ComputerNamePhysicalNetBIOS') computer
        from @instances
        "

        $SrcCmd.CommandText = $query
        $instances = $SrcCmd.ExecuteReader()
        $a = New-Object System.Collections.ArrayList
        $i = 0

        $query = "
        exec msdb.dbo.sp_syscollector_stop_collection_set @name = 'Server Activity'
        exec msdb.dbo.sp_syscollector_stop_collection_set @name = 'Query Statistics'
        exec msdb.dbo.sp_syscollector_stop_collection_set @name = 'Disk Usage'
        "
        
        "stop collection"

        $computer = ""

        while ($instances.Read()) 
        {
            $SrcSrv = $instances.GetValue(0) 
            $computer = $instances.GetValue(1)
            $r = $a.Add($i) 
            $a[$i] = $SrcSrv
            $i = $i + 1
            $SrcSrv
            try
            {
                Invoke-Sqlcmd -ServerInstance $SrcSrv -Query $query -QueryTimeout 300
            }
            catch
            {
                $e = $_ | select -ExpandProperty InvocationInfo
                $m = $_.Exception.Message.TrimEnd().Replace("'","") + ", " + $e.ScriptLineNumber.ToString() + ", " + $e.OffsetInLine.ToString()
                $m                
            }
        }
        $instances.Close()

        "killing dcexec processes"

        $processes = Get-Process -ComputerName $computer -Name "DCEXEC" -ErrorAction SilentlyContinue
        
        foreach ($process in $processes)
        {
            try
            {
                taskkill /F /T /S $computer /PID $process.Id
            }
            catch
            {
                $e = $_ | select -ExpandProperty InvocationInfo
                $m = $_.Exception.Message.TrimEnd().Replace("'","") + ", " + $e.ScriptLineNumber.ToString() + ", " + $e.OffsetInLine.ToString()
                $m                
            }
        }

        "deleting cache files"

        $files = get-childitem "\\$computer\c$\temp\*.cache" -ErrorAction SilentlyContinue

        foreach ($file in $files)
        {
            try
            {
                $file.Delete()
            }
            catch
            {
                $e = $_ | select -ExpandProperty InvocationInfo
                $m = $_.Exception.Message.TrimEnd().Replace("'","") + ", " + $e.ScriptLineNumber.ToString() + ", " + $e.OffsetInLine.ToString()
                $m                
            }
        }

        $query = "
        exec msdb.dbo.sp_syscollector_start_collection_set @name = 'Server Activity'
        exec msdb.dbo.sp_syscollector_start_collection_set @name = 'Query Statistics'
        exec msdb.dbo.sp_syscollector_start_collection_set @name = 'Disk Usage'
        "
        
        "start collection"

        foreach ($SrcSrv in $a)
        {
            $SrcSrv
            try
            {
                Invoke-Sqlcmd -ServerInstance $SrcSrv -Query $query -QueryTimeout 300
            }
            catch
            {
                $e = $_ | select -ExpandProperty InvocationInfo
                $m = $_.Exception.Message.TrimEnd().Replace("'","") + ", " + $e.ScriptLineNumber.ToString() + ", " + $e.OffsetInLine.ToString()
                $m                
            }
        }

        $SrcCon.Close() 
        $SrcCon.Dispose() 
    }
    catch
    {
        $e = $_ | select -ExpandProperty InvocationInfo
        $m = $_.Exception.Message.TrimEnd().Replace("'","") + ", " + $e.ScriptLineNumber.ToString() + ", " + $e.OffsetInLine.ToString()
        $m
    }
}

("Server1", "Server2", "Server3") | foreach{RestartDataCollection $_}