Saturday, 28 September 2013

Splitting a Flat files to small files using SSIS Package

Introduction:

Its a common scenario when you have a huge flat file in GB containing millions of records and you want to split it into small files. There are different ways of doing the same. I will demonstrate you how you can spit the file by using SSIS.

Method-1: By using For each loop, Conditional split and Row Number

Control Flow

 
 As you can see there are two Data Flow Tasks, one for loop container and two flat file connection managers are present.
 

Data Flow Task 1: DFT To get Source Count

 
This data flow is just to get the record count of the Big flat file. It has a Flat file source that reads data from the Big flat file and counts the row by using a row count transformation.
 

Variables

 
 
These are 4 variables:
  1. Counter: Used as the Counter in for loop
  2. endCounter: Upper bound of the counter in for loop
  3. reCount: Store the record count of the Big flat file
  4. splitCount: Count of record to be present in each split files

For Loop Container

 
This the condition placed inside for loop container.
 

Data Flow Task 2: Data File Splitter

 
As you can see it fetches the data from the Big flat file and passes it through a Script Component which attach a row number to each of the records.
 
This is the code present inside the Script component:
 
 
Once the row number is attached a conditional split transformation spits the file according to the row number. Following is the condition present inside the conditional split transformation:
 
 
 
After the file get spitted for the first cut it is being saved by a flat file destination. For each cut the flat file connection string changes dynamically and the flat file is saved with a different name. Following is the expression written inside the flat file destination for the Connection string property.

 

Pro

Simple easy to develop.
 

Con

Takes a lot of time.

Method-2: By using Script Component

 Control Flow

 
 Here as you can see there is only one data flow and inside that one flat file source is present. This flat file source fetches all the records from the file and passes it to a script component. The Script component splits the file into multiple flat files. Following code is present inside that:
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.IO;
#endregion
 
 [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    string copiedAddressFile;
    private StreamWriter textWriter;
    private string columnDelimiter = ",";
    private string filepath = @"C:\Work Area\Innovation\Flatfile spit on a fly\test.txt";
    private string[] columns;
 
 
    public override void PreExecute()
    {
       
        IDTSInput100 input = ComponentMetaData.InputCollection[0];
        columns = new string[input.InputColumnCollection.Count];
        for (int i = 0; i < input.InputColumnCollection.Count; i++)
        columns[i] = input.InputColumnCollection[i].Name;
    }
 
    public override void PostExecute()
    {
       
    }
 
   
    public override void AcquireConnections(object Transaction)
    {
       
        
    }
 
    public override void ProcessInput(int InputID, Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer buffer)
    {
        while (buffer.NextRow())
        {
            if (buffer[columns.Length-1].ToString() == "1")
        {
            filepath = @"C:\Work Area\Innovation\Flatfile spit on a fly\test" + Convert.ToString(Convert.ToInt32(buffer[columns.Length - 1]) - 1) + ".txt";
            textWriter = new StreamWriter(filepath, false);
            for (int i = 0; i < columns.Length; i++)
            {
                textWriter.Write(columns[i]);
                if (i != columns.Length - 1)
                    textWriter.Write(columnDelimiter);
            }
            textWriter.WriteLine();
        }
            if (Convert.ToInt32(buffer[columns.Length - 1]) % 2000000 == 1)
        {
            textWriter.Close();
            filepath = @"C:\Work Area\Innovation\Flatfile spit on a fly\test" + Convert.ToString(Convert.ToInt32(buffer[columns.Length - 1]) - 1) + ".txt";
            textWriter = new StreamWriter(filepath, false);
            for (int i = 0; i < columns.Length; i++)
            {
                textWriter.Write(columns[i]);
                if (i != columns.Length - 1)
                    textWriter.Write(columnDelimiter);
            }
            textWriter.WriteLine();
        }
 
            for (int j = 0; j < columns.Length; j++)
            {
                textWriter.Write(buffer[j]);
                if (j != columns.Length-1)
                    textWriter.Write(columnDelimiter);
                else
                    textWriter.WriteLine();
            }
 
            if (buffer[columns.Length - 1].ToString() == "10000000000000")
            textWriter.Close();
        }
    }
}

Pros

Faster to split the file.