Friday, May 24, 2019

Copy Data with PowerShell Benchmark

To copy a large number of rows from one SQL Server (server1) to another (server2), I compared the .Net data provider I used back in PS 2.0 against the new SqlServer module with -OutputAs DataTable. The results are shown below.

Using .Net we need to write more code, but it is faster. Using -OutputAs DataTable we can write much less code, but it takes longer since it needs to load the entire table into memory. Note that -OutputAs DataRows is the default, which is even slower since it copies one row at a time.
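
For reference, the default DataRows form would look like the sketch below (it uses the server, database, and table names from the repro that follows):

# Default output is DataRows: rows are piped to Write-SqlTableData one at a time,
# which is the slowest option
Invoke-Sqlcmd -ServerInstance server1 -Database master -Query "select * from my_spt_values" |
Write-SqlTableData -ServerInstance server2 -Database mydb -SchemaName dbo -TableName test_stage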

Repro

On server1, create a view in master (where spt_values lives) to easily generate 6.4 million rows

create view my_spt_values
as
select c1.* from spt_values c1 cross join spt_values c2
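
To sanity-check the row count: the view cross joins spt_values (roughly 2,500 rows) with itself, so it should return about 2,500 x 2,500 ≈ 6.4 million rows:

Invoke-Sqlcmd -ServerInstance server1 -Database master -Query "select count(*) as cnt from my_spt_values"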

On server2, create a staging table in mydb

select * into test_stage from master.dbo.spt_values where 1=2

Create the scripts below to copy from server1 to server2

Sqlclient.ps1

$conTargetString = "Data Source=server2;Initial Catalog=mydb;Integrated Security=True"

$conTarget = New-Object System.Data.SQLClient.SQLConnection($conTargetString)
$conTarget.Open()

# TableLock takes a bulk update lock on the target table; the large BatchSize
# means the whole copy is sent and committed as a single batch
$conBulk = New-Object System.Data.SQLClient.SQLBulkCopy($conTargetString, [System.Data.SQLClient.SqlBulkCopyOptions]::TableLock)
$conBulk.DestinationTableName = 'test_stage'
$conBulk.BatchSize = 10000000

$cmdTarget = New-Object System.Data.SQLClient.SQLCommand
$cmdTarget.Connection = $conTarget

$cmdTarget.CommandText = "truncate table test_stage"
$cmdTarget.ExecuteNonQuery()

# the source view lives in master
$conSourceString = "Data Source=server1;Initial Catalog=master;Integrated Security=True"
$conSource = New-Object System.Data.SQLClient.SQLConnection($conSourceString)
$conSource.Open()

$cmdSource = New-Object System.Data.SQLClient.SQLCommand
$cmdSource.CommandText = "select * from my_spt_values"
$cmdSource.Connection = $conSource

# ExecuteReader streams rows to the bulk copy without buffering the whole
# result set in memory
$tabSource = $cmdSource.ExecuteReader()
$conBulk.WriteToServer($tabSource)

$tabSource.Close()
$conSource.Close()
$conSource.Dispose()

$conBulk.Close()
$conTarget.Close()
$conTarget.Dispose()
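
Optionally, SqlBulkCopy can report progress during a long copy via its NotifyAfter property and SqlRowsCopied event; a minimal sketch, to be added before the WriteToServer call:

# Report progress every 1 million rows
$conBulk.NotifyAfter = 1000000
$conBulk.add_SqlRowsCopied({
    param($sender, $e)
    Write-Host "$($e.RowsCopied) rows copied"
})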

Readcmd.ps1

Invoke-Sqlcmd -ServerInstance server2 -Database mydb -Query "truncate table test_stage"

Read-SqlViewData -ServerInstance server1 -Database master -SchemaName dbo -ViewName my_spt_values -OutputAs DataTable |
Write-SqlTableData -ServerInstance server2 -Database mydb -SchemaName dbo -TableName test_stage -Passthru

Sqlcmd.ps1

Invoke-Sqlcmd -ServerInstance server2 -Database mydb -Query "truncate table test_stage"

Invoke-Sqlcmd -ServerInstance server1 -Database master -Query "select * from my_spt_values" -OutputAs DataTables |
Write-SqlTableData -ServerInstance server2 -Database mydb -SchemaName dbo -TableName test_stage -Passthru 
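
Note the -OutputAs values differ slightly between the cmdlets: Invoke-Sqlcmd takes DataTables while Read-SqlViewData takes DataTable; neither is a typo above.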

Measure

Measure-Command -Expression {.\sqlclient.ps1} | select seconds
Measure-Command -Expression {.\readcmd.ps1} | select seconds
Measure-Command -Expression {.\sqlcmd.ps1} | select seconds


Seconds
-------
     12
     36
     45
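
Note that Seconds is only the seconds component of the TimeSpan returned by Measure-Command; for runs that could exceed a minute, select TotalSeconds instead:

Measure-Command -Expression {.\sqlclient.ps1} | select TotalSeconds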

Reading the Data

While measuring, open Task Manager to watch the memory consumption of each of the scripts; we will see that the first method does not load the entire result set into memory, while the latter two methods do.
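
The working set can also be sampled from PowerShell itself (run in a second window while a copy is active):

# Working set of each PowerShell process, in MB
Get-Process powershell | select Id, @{n='WS_MB';e={[math]::Round($_.WorkingSet64/1MB)}}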


On server1, a SQL trace will show the duration to select the same number of rows. The first method is the fastest at 12 seconds. The last two methods take longer (18 and 24 seconds respectively) since they have to wait until the rows are loaded into memory, so they also hold a shared lock on the table longer, which may cause blocking on the source if other processes are writing to it.



Also, sp_WhoIsActive will show a wait type of ASYNC_NETWORK_IO, which means the session is waiting for the client (PowerShell) to process the rows.
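
If sp_WhoIsActive is not installed, the same wait shows up in the sys.dm_exec_requests DMV:

Invoke-Sqlcmd -ServerInstance server1 -Query "select session_id, wait_type, wait_time from sys.dm_exec_requests where wait_type = 'ASYNC_NETWORK_IO'"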


Writing the Data

On server2, a SQL trace will show the other benefit of using the .Net bulk copy: the table lock hint, so the logical reads are much lower. In addition, bulk copy lets us control the batch size so we do not fill up the transaction log when there are many concurrent transactions.
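
For example, to commit in smaller batches (a sketch; the 10,000 value is arbitrary):

# Each batch commits separately, so the log can be reused between batches
# instead of growing for one huge open transaction
$conBulk.BatchSize = 10000
$conBulk.BulkCopyTimeout = 0   # 0 = no timeout, for long copies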


We can see that to copy a large number of rows, the .Net data provider is much faster because it does not load the entire result set into memory, and it allows more control through hints such as table lock, which saves IO.