Saturday, 4 April 2015

How to use the fast load option in the OLE DB destination (avoiding bulk data rejection due to constraint violations)?

The common issue with the OLE DB destination's fast load option is that if an error occurs due to a constraint violation, the whole batch fails and is redirected to the error flow. In this blog I will show you a workaround:
  1. Remove all the constraints from the table.
  2. Use a trigger to validate row-level constraints such as NOT NULL and CHECK, which apply to a single row at a time.
  3. Use a stored procedure to validate cross-row constraints such as primary key, unique, and foreign key.
Let me illustrate this by an example:

There is a table called Emp, which has the following columns and constraints:

The DeptID column references the DeptID column of the Dept table. The structure of the Dept table is below:


The data inside the Dept table is shown below:



For the workaround I have created two tables, Emp_Temp and Emp_Error.

Emp_Temp: Same structure as Emp, but without any constraints.
Emp_Error: Same structure as Emp_Temp, with an extra column named ErrorDesc to store the error description.
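
Since the table structures are only shown in screenshots, here is a sketch of the two helper tables. The column names come from the trigger and procedure code later in the post; the data types are assumptions for illustration:

```sql
-- Staging table: same columns as Emp, but deliberately without any
-- constraints so that the fast load itself can never fail.
-- Data types are assumed for illustration.
CREATE TABLE dbo.Emp_Temp (
    EMPID   INT            NULL,
    EMPNAME VARCHAR(100)   NULL,
    DeptID  INT            NULL,
    Salary  DECIMAL(18, 2) NULL
);

-- Error table: same columns as Emp_Temp plus a description of the failure.
CREATE TABLE dbo.Emp_Error (
    EMPID     INT            NULL,
    EMPNAME   VARCHAR(100)   NULL,
    DeptID    INT            NULL,
    Salary    DECIMAL(18, 2) NULL,
    ErrorDesc VARCHAR(200)   NULL
);
```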

Now we will populate the Emp table with data from a flat file source. The following data is present in the flat file:





The package should look like this:


A trigger should be created on the Emp_Temp table, as follows:


CREATE TRIGGER dbo.Tri_EMP
   ON dbo.Emp_Temp
   AFTER INSERT
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    -- Redirect rows that violate the NOT NULL constraints to the error table.
    INSERT INTO dbo.Emp_Error (EMPID, EMPNAME, DeptID, Salary, ErrorDesc)
    SELECT EMPID, EMPNAME, DeptID, Salary, 'Not null violation'
    FROM inserted
    WHERE Salary IS NULL
       OR EMPNAME IS NULL;
END
GO

The following property needs to be set on the OLE DB Destination so that the trigger will fire (bulk loads do not fire triggers by default, so FIRE_TRIGGERS must be included in the FastLoadOptions property):


There should be a procedure like the one below, which enforces the PK and FK constraints:

CREATE PROCEDURE dbo.EMP_Constraint
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    -- Redirect duplicate EMPIDs (primary key violations) to the error table.
    INSERT INTO dbo.Emp_Error (EMPID, EMPNAME, DeptID, Salary, ErrorDesc)
    SELECT EMPID, EMPNAME, DeptID, Salary, 'PK Violation.'
    FROM dbo.Emp_Temp
    WHERE EMPID IN (SELECT EMPID FROM dbo.Emp_Temp GROUP BY EMPID HAVING COUNT(1) > 1);

    -- Redirect rows whose DeptID has no match in Dept (foreign key violations).
    INSERT INTO dbo.Emp_Error (EMPID, EMPNAME, DeptID, Salary, ErrorDesc)
    SELECT EMPID, EMPNAME, Emp_Temp.DeptID, Salary, 'FK Violation.'
    FROM dbo.Emp_Temp
    LEFT JOIN dbo.Dept
        ON Emp_Temp.DeptID = Dept.DeptID
    WHERE Dept.DeptID IS NULL;

    -- Remove every rejected row from the staging table.
    DELETE FROM dbo.Emp_Temp
    WHERE EMPID IN (SELECT EMPID FROM dbo.Emp_Error);
END
GO

Once the data has landed in Emp_Temp and been validated, it needs to be transferred to the actual Emp table.
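
This final load step is not shown in the post; a minimal sketch, assuming the trigger and the EMP_Constraint procedure have already removed the rejected rows, could look like this:

```sql
-- Move the validated rows from the staging table into the real Emp table.
-- Emp still has all of its constraints, but the surviving rows satisfy them.
INSERT INTO dbo.Emp (EMPID, EMPNAME, DeptID, Salary)
SELECT EMPID, EMPNAME, DeptID, Salary
FROM dbo.Emp_Temp;

-- Clear the staging table for the next run (assumed housekeeping step).
TRUNCATE TABLE dbo.Emp_Temp;
```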
Try it out for your scenario; it works even for bulk volumes of data. I have used it for loads of around 50 million rows. Good luck.