Tips and Tricks: SQL Server Bulk Import Organization
Author: Joseph Cuesta
October 2009
SituationYou have a large data warehouse where data gets loaded nightly and/or weekly
ProblemYour batch jobs are taking longer to run and are using up more disk space
SolutionOptimize your bulk-insert jobs
Optimization OverviewDetermine which method of Bulk Import Operations you are using or want to use:
• Bcp: A command-line utility (Bcp.exe) that bulk exports and bulk imports data and generates format files
• BULK INSERT Statement: A Transact-SQL statement that imports data directly from a data file into a database table or nonpartitioned view
• OPENROWSET(BULK) Statement: A Transact-SQL statement that uses the OPENROWSET bulk rowset provider to bulk import data into a SQL Server table by specifying the OPENROWSET(BULK…) function to select data in an INSERT statement.
You will also need to consider the following factors which will work best with your Import Operation:
• Whether the table has constraints or triggers, or both
• The recovery model used by the database
• Whether the table into which data is copied is empty
• Whether the table has indexes
• Whether TABLOCK is being specified
• Whether the data is being copied from a single client or copied in parallel from multiple clients
• Whether the data is to be copied between two computers on which SQL Server is running
Once you have determined the factors above you can decide which optimization methods you want to use.
Four Methods to Optimize Bulk Import
• Minimal logging
For bulk-import operations, minimal logging is more efficient than full logging and reduces the possibility that a bulk-import operation will fill the log space. To minimally log a bulk-import operation on a database that normally uses the full recovery model, you can first switch the database to the bulk-logged recovery model. After bulk importing the data, switch the recovery model back to the full recovery model.
• Import data in parallel
SQL Server allows data to be bulk imported into a single table from multiple clients in parallel. All three bulk import mechanisms support the parallel import of data. This can improve the performance of data import operations.
• Use batches
Importing a large data file as a single batch can be problematic. Each batch is imported and logged in a separate transaction, and after a given transaction is committed, the rows imported by that transaction are committed. If the operation fails, only rows imported from the current batch are rolled back, and you can resume importing data starting at the beginning of the failed batch rather than at the beginning of the data file.
• Table Locking
All three Bulk insert operations let you specify that the table is to be locked during a bulk-import operation. When you specify table locking, a bulk update table-level lock is taken for the duration of the bulk-import operation. Table locking can improve performance of the bulk-import operation by reducing lock contention on the table.
More Tips
• Consider using Full Recovery Model and switching to simple recovery model for bulk inserts. Bulk Logged recovery model actually uses more log space than Full Model during bulk imports
• Disabling triggers and constraints during the bulk import can improve load times also
• No special optimization techniques exist for bulk-export operations. These operations simply select the data from the source table by using a SELECT statement
• If using simple recovery model, Microsoft recommends that you take a full or differential backup after the bulk-import operation finishes
• Dropping indexes can improve performance on a large bulk-insert only if the amount of data is large compared to the data already in the table
• You cannot bulk insert into remote tables
• If you want to perform a parallel bulk import you cannot use table locks (TABLOCK) on a table with a clustered index
• Keeping the batchsize small will reduce log size and decrease recovery time if the job fails
These are the main factors to increase bulk-import operations; there are many more considerations to take into account. Please read Microsoft Books Online and any documentation before making any changes to your database.
If you need more information on how to optimize your SQL Server databases, contact Gary Codeluppi at 937-427-3069 x1123 or visit the Ross Group Inc web site at www.rossgroupinc.com.