Saturday, 4 April 2015

How to use the fast load option in the OLE DB destination (avoiding bulk data rejection due to constraint violations)?

The common issue with the OLE DB destination's fast load option is that if an error occurs due to a constraint violation, the whole batch fails and is redirected to the error flow. In this blog I will show you a workaround:
  1. Remove all the constraints from the table.
  2. Use a trigger to validate row-level constraints such as NOT NULL and CHECK, which apply to a single row at a time.
  3. Use a stored procedure to validate cross-row constraints such as primary key, unique, and foreign key.
Let me illustrate this by an example:

There is a table called Emp, which has the following columns and constraints:

The DeptID column references the DeptID column of the Dept table. The structure of the Dept table is below:


The data inside the Dept table is shown below:



For the workaround I have created two tables, Emp_Temp and Emp_Error.

Emp_Temp: Same structure as Emp, but without any constraints.
Emp_Error: Same structure as Emp_Temp, with an extra column named ErrorDesc to store the error description.
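
Since the table structures are only shown in screenshots, here is a sketch of the two helper tables. The column names come from the trigger and procedure code later in the post; the data types are assumptions for illustration:

```sql
-- Staging table: same columns as Emp, but deliberately without any
-- constraints so that the fast load itself can never fail.
-- Data types are assumed for illustration.
CREATE TABLE dbo.Emp_Temp (
    EMPID   INT            NULL,
    EMPNAME VARCHAR(100)   NULL,
    DeptID  INT            NULL,
    Salary  DECIMAL(18, 2) NULL
);

-- Error table: same columns as Emp_Temp plus a description of the failure.
CREATE TABLE dbo.Emp_Error (
    EMPID     INT            NULL,
    EMPNAME   VARCHAR(100)   NULL,
    DeptID    INT            NULL,
    Salary    DECIMAL(18, 2) NULL,
    ErrorDesc VARCHAR(200)   NULL
);
```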

Now we will populate the Emp table with data from a flat file source. The following data is present in the flat file:





The package should look like this:


A trigger should be created on the Emp_Temp table, as follows:


CREATE TRIGGER dbo.Tri_EMP
   ON dbo.Emp_Temp
   AFTER INSERT
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    -- Redirect rows that violate the NOT NULL constraints to the error table.
    INSERT INTO dbo.Emp_Error (EMPID, EMPNAME, DeptID, Salary, ErrorDesc)
    SELECT EMPID, EMPNAME, DeptID, Salary, 'Not null violation'
    FROM inserted
    WHERE Salary IS NULL
       OR EMPNAME IS NULL;
END
GO

The following property needs to be set on the OLE DB Destination so that the trigger will fire (bulk loads do not fire triggers by default, so FIRE_TRIGGERS must be included in the FastLoadOptions property):


There should be a procedure like the one below, which enforces the PK and FK constraints:

CREATE PROCEDURE dbo.EMP_Constraint
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    -- Redirect duplicate EMPIDs (primary key violations) to the error table.
    INSERT INTO dbo.Emp_Error (EMPID, EMPNAME, DeptID, Salary, ErrorDesc)
    SELECT EMPID, EMPNAME, DeptID, Salary, 'PK Violation.'
    FROM dbo.Emp_Temp
    WHERE EMPID IN (SELECT EMPID FROM dbo.Emp_Temp GROUP BY EMPID HAVING COUNT(1) > 1);

    -- Redirect rows whose DeptID has no match in Dept (foreign key violations).
    INSERT INTO dbo.Emp_Error (EMPID, EMPNAME, DeptID, Salary, ErrorDesc)
    SELECT EMPID, EMPNAME, Emp_Temp.DeptID, Salary, 'FK Violation.'
    FROM dbo.Emp_Temp
    LEFT JOIN dbo.Dept
        ON Emp_Temp.DeptID = Dept.DeptID
    WHERE Dept.DeptID IS NULL;

    -- Remove every rejected row from the staging table.
    DELETE FROM dbo.Emp_Temp
    WHERE EMPID IN (SELECT EMPID FROM dbo.Emp_Error);
END
GO

Once the data has landed in Emp_Temp and been validated, it needs to be transferred to the actual Emp table.
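
This final load step is not shown in the post; a minimal sketch, assuming the trigger and the EMP_Constraint procedure have already removed the rejected rows, could look like this:

```sql
-- Move the validated rows from the staging table into the real Emp table.
-- Emp still has all of its constraints, but the surviving rows satisfy them.
INSERT INTO dbo.Emp (EMPID, EMPNAME, DeptID, Salary)
SELECT EMPID, EMPNAME, DeptID, Salary
FROM dbo.Emp_Temp;

-- Clear the staging table for the next run (assumed housekeeping step).
TRUNCATE TABLE dbo.Emp_Temp;
```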
Try it out for your scenario; it works even for bulk volumes of data. I have used it for loads of around 50 million rows. Good luck.