Friday 27 September 2013

SQL Server, SSIS and SSRS interview questions

Please leave your valuable comments so that we can improve. 
 
1. What is Index? What are the types of Index? Please Explain.
 
Ans. An index is a structure that speeds up lookups of data based on the values in the indexed columns.
If we create an index on the primary key and then search for a row of data based on one of the primary key values, SQL Server finds that value in the index and then uses the index to locate the entire row.

An index is made up of a set of pages (index nodes) organized in a B-tree hierarchy, starting at the root node and ending at the leaf nodes. Pages at the same level are linked together in a doubly linked list.
 
We cannot create index key columns on LOB data types such as image, text or nvarchar(max).

Clustered Index :- contains the actual data rows at the leaf level of the index. The rows are stored in ascending or descending order of the index key, which is why only one clustered index per table is possible.
Non-Clustered Index :- its leaf level contains the clustered index key (for a clustered table) or a row locator/RID (for a heap). Maximum per table: 249 in SQL Server 2005, 999 in SQL Server 2008.
A non-clustered index is typically used alongside a clustered index to improve read performance. Put these on columns used in WHERE clauses, JOINs, etc., and INCLUDE the other columns the query needs to enhance performance even more; this is called a "covering index".
Disadvantage :- slows down INSERT, UPDATE and DELETE operations, and non-clustered indexes occupy additional disk space.
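
A minimal sketch of the two index types on a hypothetical dbo.Employee table (the table and column names are assumptions for illustration):

CREATE TABLE dbo.Employee (EmpID INT NOT NULL, DeptID INT, LastName VARCHAR(50), Salary MONEY);
-- Clustered index: the data rows themselves are ordered by EmpID
CREATE CLUSTERED INDEX CIX_Employee_EmpID ON dbo.Employee (EmpID);
-- Covering non-clustered index: seek on DeptID, Salary answered from the index itself
CREATE NONCLUSTERED INDEX IX_Employee_DeptID ON dbo.Employee (DeptID) INCLUDE (Salary);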
 
2. What is table Partition?
 
Ans. When a database table grows in size to the hundreds of gigabytes or more, it can become more difficult to load new data, remove old data, and maintain indexes. Just the sheer size of the table causes such operations to take much longer. Even the data that must be loaded or removed can be very sizable, making INSERT and DELETE operations on the table impractical. The Microsoft SQL Server 2008 database software provides table partitioning to make such operations more manageable.
Partitioning a large table divides the table and its indexes into smaller partitions, so that maintenance operations can be applied on a partition-by-partition basis, rather than on the entire table. In addition, the SQL Server optimizer can direct properly filtered queries to appropriate partitions rather than the entire table.
Benefits :-Performance,Parallelism
In SSMS: right-click the table > Storage > Create Partition. The Create Partition Wizard appears; choose the partitioning column, give the partition function a name, give the partition scheme a name, map the ranges to filegroups, and finally choose whether to create a script, run immediately or schedule the change.
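
Behind the wizard, partitioning boils down to a partition function and a partition scheme. A minimal sketch (the table, column, boundary values and filegroup layout are assumptions for illustration):

CREATE PARTITION FUNCTION pfOrderDate (DATETIME)
AS RANGE RIGHT FOR VALUES ('2012-01-01', '2013-01-01');
CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate ALL TO ([PRIMARY]);   -- all partitions on one filegroup for simplicity
CREATE TABLE dbo.Orders (OrderID INT, OrderDate DATETIME) ON psOrderDate (OrderDate);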
Window functions return a result for each row in the data set, e.g. ROW_NUMBER(), RANK() and DENSE_RANK().
GROUP BY provides one aggregated value per group, whereas PARTITION BY (in a window function) provides a value for each row in the data set.
 
3. What is JOIN in SQL Server?
 
Ans. Join is the process of joining two or more tables or views on the basis of a relationship to get a desired output result set.
It's of the following types:
 
1. Inner Join
2. Outer Join
    a. Left outer Join
    b. Right outer Join
    c. Full outer Join
 
Inner Join: only the rows that satisfy the join condition are returned.
 
Outer Join:

Left Outer Join: returns all the rows of the left table and only those rows from the right table that satisfy the join condition.

Right Outer Join: returns all the rows of the right table and only those rows from the left table that satisfy the join condition.

Full Outer Join: returns all the rows from both the left and right tables; rows with no match on the other side are padded with NULLs.
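
A quick sketch with hypothetical Emp and Dept tables (names are assumptions for illustration):

-- Only employees whose DeptID matches a department
SELECT e.EmpName, d.DeptName
FROM Emp e
INNER JOIN Dept d ON e.DeptID = d.DeptID;
-- All employees; DeptName is NULL where no department matches
SELECT e.EmpName, d.DeptName
FROM Emp e
LEFT OUTER JOIN Dept d ON e.DeptID = d.DeptID;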

4. How to delete duplicate rows in SQL Server?

Ans. Two ways to delete duplicate rows in SQL Server:

1. Using %%physloc%% pseudo column.

The %%physloc%% pseudo column holds the physical location of the row. You can take the maximum of that column within each group of duplicates and delete the rest.

2. Using Common Table Expression(CTE)

Create a CTE over the table with a ROW_NUMBER() column partitioned by all the columns that define a duplicate. Then keep one row per group (row number 1) and delete the rest.
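
A sketch of the %%physloc%% approach on a hypothetical dbo.Emp table where Col1 and Col2 define a duplicate (note that %%physloc%% is an undocumented pseudo column, so treat it as a last resort):

DELETE FROM dbo.Emp
WHERE %%physloc%% NOT IN
      (SELECT MAX(%%physloc%%)     -- keep one physical row per duplicate group
       FROM dbo.Emp
       GROUP BY Col1, Col2);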

5. What is Common Table Expression(CTE)?

Ans. A Common Table Expression (CTE) lets us write a query, give it a name, and then refer to that result set by the name just like a table in the statement that follows. Because a CTE is just a named query over the base table, changes made through the CTE are reflected in the base table.
Advantages:-
1) When a lot of sub-queries are involved, a CTE can express each sub-query as a named result set that can be referred to afterwards.
2) Readability
3) Recursive queries can be written
4) Calculated fields like Row_Number, Rank and Dense_Rank can be stored in a CTE.
5) When we don't want a view to be stored in metadata, we can use a CTE.
6) Looping

Syntax:
With Temp_Table_Name (Col1,Col2) as (Query_String)

After the above statement, we can write a query referring to the above Temp_Table_Name.
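
As a sketch of the recursion advantage, a CTE that generates the numbers 1 to 10 (purely illustrative):

WITH Numbers (N) AS
(
    SELECT 1            -- anchor member
    UNION ALL
    SELECT N + 1        -- recursive member
    FROM Numbers
    WHERE N < 10
)
SELECT N FROM Numbers;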


6. How to send Multivalued Parameter to a procedure from SSRS? 

Ans. Mark the report parameter as multi-valued. In the dataset, choose the stored procedure from the drop-down, click Parameters, and in the parameter expression type =Join(Parameters!ParameterName.Value, ",") so the selected values are passed as one delimited string. Inside the stored procedure, split that string with a table-valued split function and use the result after IN.

7. 1 million records are coming from the source. If more than 10,000 records fail to match in the Lookup, we want to fail the package. How?

Ans. Use a Row Count transformation on the Lookup's no-match output to capture the count in a variable, and use an expression on that variable together with FailPackageOnFailure on the task to fail the package once the count exceeds 10,000.

8. How to delete duplicate records in Sql Server 2008? 

Ans. %%physloc%% is a pseudo column that gives the physical location (row ID) of every row.

Using a CTE: the CTE re-generates the same table with an additional row number column. In our case Col1 and Col2 together qualify a row as a duplicate; it may be a different set of columns for each such query. Another point to note is that once the CTE is created, a DELETE statement can be run against it: whenever a group contains more than one row, we remove every row except the first. When the DELETE command is executed over the CTE, it in fact deletes from the base table used in the CTE.
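
A sketch of the CTE approach on the same hypothetical dbo.Emp table (Col1 and Col2 define a duplicate):

WITH Dups AS
(
    SELECT Col1, Col2,
           ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY Col1) AS RN
    FROM dbo.Emp
)
DELETE FROM Dups WHERE RN > 1;   -- deletes from the base table dbo.Emp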

 


11. Explain different types of Ranking Functions? 

Ans. Ranking Functions return a ranking value for each row in a partition. Ranking Functions are Row_Number(),Rank(),Dense_Rank() and NTILE().

Row_Number() returns a unique sequential value for each row, regardless of ties in the ORDER BY value.

Dense_Rank() assigns the same ranking value to rows that tie on the ORDER BY value, and the next distinct value gets the next consecutive rank, so there are no gaps.

Rank() returns 1 + the number of rows that rank ahead of the row in question, so tied rows share a rank and the rank after a tie skips ahead. That is why Rank() does not always return consecutive integers.

NTILE(integer) OVER (PARTITION BY ... ORDER BY ...) distributes the rows in an ordered partition into the specified number of groups. The groups are numbered starting at one, and for each row NTILE returns the number of the group to which the row belongs. If the number of rows in a partition is not divisible by the integer expression, the groups will be of two sizes that differ by one member, with the larger groups coming before the smaller groups in the order specified by the OVER clause. For example, if the total number of rows is 53 and the number of groups is five, the first three groups will have 11 rows and the two remaining groups will have 10 rows each. If the total number of rows is divisible by the number of groups, the rows are evenly distributed; for example, with 50 rows and five groups, each bucket contains 10 rows.

Example of ranking functions (C1 = column value, RN = Row_Number, DR = Dense_Rank, R = Rank):

C1    RN    DR    R    NTILE(3)
10     1     1    1    1
20     2     2    2    1
20     3     2    2    1
30     4     3    4    2
30     5     3    4    2
40     6     4    6    2
40     7     4    6    3
40     8     4    6    3
50     9     5    9    3
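
A query that would produce a result like the table above, assuming a hypothetical single-column table T(C1):

SELECT C1,
       ROW_NUMBER() OVER (ORDER BY C1) AS RN,
       DENSE_RANK() OVER (ORDER BY C1) AS DR,
       RANK()       OVER (ORDER BY C1) AS R,
       NTILE(3)     OVER (ORDER BY C1) AS [NTILE(3)]
FROM T;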

13. How to get the nth Maximum id in Sql Server 2008? 

Ans. Select the TOP N DISTINCT ids in descending order as an inner query, then select TOP 1 from that result ordered ascending.
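
A sketch against a hypothetical dbo.Emp(ID) table, with n = 3:

SELECT TOP 1 ID
FROM (SELECT DISTINCT TOP 3 ID FROM dbo.Emp ORDER BY ID DESC) t   -- the 3 highest ids
ORDER BY ID ASC;                                                   -- smallest of those = 3rd maximum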

14. How to get the department name having the nth Maximum id in Sql Server 2008? 

Ans. Use DENSE_RANK() OVER (ORDER BY id DESC) and store it in a CTE.

Then select from the CTE where rank = n.

15. How to get the 5th Maximum Salary for each department? 

Ans. Use DENSE_RANK() OVER (PARTITION BY department ORDER BY Salary DESC) and store it in a CTE.

Then select from the CTE where rank = 5, as shown in the sketch below.

Remember: PARTITION BY divides the result set into groups (partitions) first, and then ORDER BY determines the ranking of each row inside its partition based on the column value.
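
A sketch for the per-department case, assuming a hypothetical dbo.Emp(Department, Salary) table:

WITH Ranked AS
(
    SELECT Department, Salary,
           DENSE_RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS Rnk
    FROM dbo.Emp
)
SELECT Department, Salary
FROM Ranked
WHERE Rnk = 5;   -- 5th highest salary within each department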

16. How to get Cumulative Sum of a column over a period of time? 

Ans. 1) Using a self join:

select r1.priority_key, r1.colour, SUM(r2.priority_key)
from Rep_Priority_Colour r1,
     Rep_Priority_Colour r2
where r2.priority_key <= r1.priority_key
group by r1.priority_key, r1.colour
order by r1.priority_key

2) Using a nested sub-query:

SELECT DayCount,
       Sales,
       Sales + COALESCE((SELECT SUM(Sales)
                         FROM Sales b
                         WHERE b.DayCount < a.DayCount), 0) AS RunningTotal
FROM Sales a
ORDER BY DayCount

3) Using a cursor (a table variable and a variable RunningTotal = RunningTotal + column value inside a WHILE loop)

4) Using CROSS APPLY
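
In SQL Server 2012 and later there is also a simpler option that the list above predates: SUM() with an OVER clause. A minimal sketch against the same hypothetical Sales table:

SELECT DayCount,
       Sales,
       SUM(Sales) OVER (ORDER BY DayCount ROWS UNBOUNDED PRECEDING) AS RunningTotal
FROM Sales
ORDER BY DayCount;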

17. What is Correlated SubQuery? 

Ans. A correlated subquery is a repeated subquery: it executes once for each row returned by the parent (outer) query, because the subquery depends on the outer query for its values.

Eg: select * from dbo.FactTicket where exists (select Priority_key from DimTicketPriority where Priority_key = FactTicket.Priority_key)

18. How to get the last day of the week or first day of the week? 

Ans. 1) Take 0 as a reference date. DATEDIFF(ww, 0, GETDATE()) gives the difference between the reference and GETDATE() in weeks.

2) For the current week use DATEDIFF(ww, 0, GETDATE()) + 0, and for the current week + n use DATEDIFF(ww, 0, GETDATE()) + n. This is simply the number of weeks from the reference.

3) DATEADD(ww, DATEDIFF(ww, 0, GETDATE()) + n, 0) gives the first day of the nth week after the current week.

4) To get the last day of week (n-1), use DATEADD(s, -1, DATEADD(ww, DATEDIFF(ww, 0, GETDATE()) + n, 0)).
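
A sketch for the current week (n = 0 for the first day, n = 1 for the boundary that closes it):

SELECT DATEADD(ww, DATEDIFF(ww, 0, GETDATE()), 0)                     AS FirstDayOfCurrentWeek,
       DATEADD(s, -1, DATEADD(ww, DATEDIFF(ww, 0, GETDATE()) + 1, 0)) AS LastDayOfCurrentWeek;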

19. How to get the last day of the month or first day of the month? 

Ans. 1) Take 0 as a reference date. DATEDIFF(mm, 0, GETDATE()) gives the difference between the reference and GETDATE() in months.

2) For the current month use DATEDIFF(mm, 0, GETDATE()) + 0, and for the current month + n use DATEDIFF(mm, 0, GETDATE()) + n. This is simply the number of months from the reference.

3) DATEADD(mm, DATEDIFF(mm, 0, GETDATE()) + n, 0) gives the first day of the nth month after the current month.

4) To get the last day of month (n-1), use DATEADD(s, -1, DATEADD(mm, DATEDIFF(mm, 0, GETDATE()) + n, 0)).

The same rule applies to the first and last day of any datepart. DATEPART syntax: DATEPART(mm, GETDATE()).
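
A sketch for the current month (n = 0 for the first day, n = 1 for the boundary that closes it):

SELECT DATEADD(mm, DATEDIFF(mm, 0, GETDATE()), 0)                     AS FirstDayOfCurrentMonth,
       DATEADD(s, -1, DATEADD(mm, DATEDIFF(mm, 0, GETDATE()) + 1, 0)) AS LastDayOfCurrentMonth;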

20. Difference between Index Scan and Index Seek? 

Ans. In an index scan, SQL Server reads the whole index looking for matches; the time taken is proportional to the index size. In an index seek, SQL Server uses the B-tree structure of the index to go directly to the matching records. An index seek is generally preferable; however, an index scan is more efficient when the table is small or when a large percentage of records match the predicate.

21. What are the different Transactional Faults?

Ans.

1)Dirty Reads

2)Non Repeatable reads

3)Phantom Rows

4)Lost Updates

5)Deadlocks

22. What is Dirty Reads?

Ans. Transaction 2 can read Transaction 1's uncommitted changes. This is the most egregious transactional fault.

23. What is Non Repeatable Reads?

Ans. Similar to a dirty read, except that Transaction 2 sees data that Transaction 1 has committed in between. Within a single transaction, reading the same row twice should produce the same result; if the two reads return different values, it is a non-repeatable read.

24. What is Phantom Rows?

Ans. An insert or update in Transaction 1 causes the same SELECT in Transaction 2 to return a different set of rows when it is re-executed.

25. What is Lost Updates? 

Ans. When two users update the same data at about the same time and the second user's update overwrites the first user's update.

26. What is Deadlocks? 

Ans. When two transactions, each holding locks on some data, compete for data locked by the other, we get a deadlock. Transaction 1 locks data A and needs to lock data B to complete the transaction; Transaction 2 has locked data B and needs to lock data A to complete the transaction. Each transaction is stuck waiting for the other to release its lock, and neither can complete until the other does. Unless an outside force intercedes, or one of the transactions gives up and quits, this situation could persist indefinitely.

27. What are the different Transaction Isolation Levels? 

Ans. Read Uncommitted, Read committed, Repeatable Read, Serializable, Snapshot, Read Committed Snapshot

28. What is Read Uncommitted? 

Ans. Takes no shared locks; it is the least restrictive level, does not prevent any transactional faults, and is acceptable for reporting queries.

29. What is Read Committed? 

Ans. Prevents dirty reads without bogging the system down with excessive lock contention, which is why it is SQL Server's default isolation level.

30. What is Repeatable Read? 

Ans. By preventing dirty reads and non-repeatable reads, the repeatable read isolation level provides increased transaction isolation without the extreme lock contention of serializable isolation.

31. What is Serializable Read? 

Ans. Prevents every transactional fault at the cost of extreme lock contention; important in systems where transactional integrity matters more than performance, such as banking.

32. What is Snapshot?  

Ans. Prevents reader-versus-writer conflicts. A snapshot copy of the data being updated is kept; when the update is committed it is written back over the original data, so readers keep reading the snapshot copy while the writer is updating the database. It can, however, give rise to writer-versus-writer conflicts.

33. What is Read Committed Snapshot? 

Ans. This variation of snapshot isolation behaves the same as read committed but without the writer versus reader contention.
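
A sketch of how an isolation level is set for a session, plus the database options required before the snapshot-based levels can be used (the database name is an assumption):

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Snapshot-based levels must first be enabled at the database level:
ALTER DATABASE MyDB SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE MyDB SET READ_COMMITTED_SNAPSHOT ON;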

34. What is Event Logging? 

Ans. SSIS logging is a feature that enables tracing of a package execution. It can be applied at the package level or at the task level.

1. Right click on the package body

2. Choose the log provider type (text file, Windows Event Log, XML file, SQL Profiler, SQL Server).

3. Configure it by setting its connection.

4. Specify the events to log in the Details tab. Each container has a LoggingMode property that accepts three values: Enabled, Disabled and UseParentSetting (use the parent's setting for that component to decide whether or not to log the data).

35. What happens when RetainSameConnection Property is set to true? 

Ans. The RetainSameConnection property on the OLE DB connection manager lets us handle SQL Server transactions without using MSDTC (Microsoft Distributed Transaction Coordinator). With RetainSameConnection set to true, the same connection is reused, so a transaction can be maintained across multiple tasks.

36. What is Package Configuration? 

Ans. Package configuration exposes properties of the package so that they can be updated at run time by values coming from a file, a table, an environment variable, a parent package variable or a registry key.

37. Send Mail Task(How to write CC,BCC,From and To)? 

Ans. The To, Cc and Bcc recipients and the From address are set on the Mail page of the Send Mail Task (or driven by expressions). The message source type can be a variable, a file connection or direct input.

38. How to abruptly kill a package running on server? 

Ans. With Windows administrator rights, kill the process that is running the package or stop the SQL Agent job that launched it.

39. How to know which package is running on server? 

Ans. SSIS Package Instances is a performance counter that gives the total number of SSIS packages running simultaneously.

40. What is FTP task? 

Ans. The FTP task lets a package perform file operations (send, receive, delete, create or remove directories) against an FTP server.

41. What is Difference between Lookup and Fuzzy Lookup? 

Ans. Lookup returns exact matches whereas Fuzzy Lookup returns similar (approximate) matches. For each input row, Lookup produces at most one result, while Fuzzy Lookup can return several.

42. What is Difference between Control Flow(CF) and Data Flow(DF)? 

Ans. 1) A CF is process oriented while a DF is data oriented.

2) A CF comprises tasks, containers and precedence constraints while a DF comprises sources, transformations and destinations.

3) A CF's fundamental unit is the task, but a DF's fundamental unit is the component.

4) A CF's outcome may be Success, Failure or Completion, but a DF's can be anything.

43. What is Difference between Merge and Union All? 

Ans. Merge needs Sorted input whereas Union All does not.

Merge gives sorted output but Union All does not.

Merge can take only 2 datasets but Union All can take more than 2 datasets.

44. How to create SubReport? 

Ans. A sub-report is a report item that can be included inside a report; for each row of the main report the sub-report can be processed. A sub-report is actually a reference to a different report.

45. What is Views? 

Ans. A view is a saved SELECT statement; only a SELECT can be used inside a view. Views are used:

•To hide the complexity of the underlying database schema, or customize the data and schema for a set of users.

•To control access to rows and columns of data.

•To aggregate data for performance.

We can insert, update or delete through a view (provided it does not contain aggregate functions or joins).
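
A minimal sketch (the table, view and column names are assumptions for illustration):

CREATE VIEW dbo.vw_ActiveEmployees
AS
SELECT EmpID, EmpName, DeptID
FROM dbo.Employee
WHERE IsActive = 1;
GO
-- Because the view has no joins or aggregates, it is updatable:
UPDATE dbo.vw_ActiveEmployees SET DeptID = 2 WHERE EmpID = 10;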

46. When one package is called from another package and both are deployed on the server, what should the child package's path be?  

Ans. HTTP Connection Manager

47. Which is the most efficient way of reporting: a complicated dataset, a stored procedure, or a dynamic procedure?  

Ans. It depends; compare the options using SQL Profiler and the report execution log.

48. How to create a Drill Down Report? 

Ans. Create the report and add a group over the data region. On the detail rows, set the Hidden property to True and set "Display can be toggled by this report item" to the group's text box.

49. How to get error description in SSIS? 

Ans. In a Script component we need to write the following code to get the error description:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        'Use the incoming error number as a parameter to GetErrorDescription
        Row.ErrorDescription = ComponentMetaData.GetErrorDescription(Row.ErrorCode)
    End Sub

End Class

50. Explain the Architecture of SSIS? 

Ans. SSIS architecture consists of four key parts:

a) Integration Services service: monitors running Integration Services packages and manages the storage of packages.

b) Integration Services object model: includes managed API for accessing Integration Services tools, command-line utilities, and custom applications.

c) Integration Services runtime and run-time executables: it saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions. The Integration Services run-time executables are the package, containers, tasks, and event handlers that Integration Services includes, and custom tasks.

d) Data flow engine: provides the in-memory buffers that move data from source to destination.

51. What is Synchronous Transformations? 

Ans. Row transformations either manipulate data or create new fields using the data that is available in that row. Examples of SSIS components that perform row transformations include Derived Column, Data Conversion, Multicast, and Lookup. While these components might create new columns, row transformations do not create any additional records. Because each output row has a 1:1 relationship with an input row, row transformations are also known as synchronous transformations. Row transformations have the advantage of reusing existing buffers and do not require data to be copied to a new buffer to complete the transformation

52. What is Asynchronous Transformations? 

Ans. Partially blocking transformations are often used to combine datasets. They tend to have multiple data inputs. As a result, their output may have the same, greater, or fewer records than the total number of input records. Since the number of input records will likely not match the number of output records, these transformations are also called asynchronous transformations. Examples of partially blocking transformation components available in SSIS include Merge, Merge Join, and Union All. With partially blocking transformations, the output of the transformation is copied into a new buffer and a new thread may be introduced into the data flow.

53. What is Blocking transformations? 

Ans. Blocking transformations must read and process all input records before creating any output records. Of all of the transformation types, these transformations perform the most work and can have the greatest impact on available resources. Example components in SSIS include Aggregate and Sort. Like partially blocking transformations, blocking transformations are also considered to be asynchronous. Similarly, when a blocking transformation is encountered in the data flow, a new buffer is created for its output and a new thread is introduced into the data flow.

54. What is Fuzzy Grouping? 

Ans. The Fuzzy Grouping transformation performs data cleaning tasks by identifying rows of data that are likely to be duplicates and selecting a canonical row of data to use in standardizing the data.

The transformation produces one output row for each input row, with the following additional columns:

1) _key_in, a column that uniquely identifies each row.

2) _key_out, a column that identifies a group of duplicate rows. The _key_out column has the value of the _key_in column in the canonical data row; rows with the same value in _key_out are part of the same group.

3) _score, a value between 0 and 1 that indicates the similarity of the input row to the canonical row.

We can also specify the value of the Similarity Threshold.

55. How to do Fast load without failure of the batch? 

Ans. You can't; with the fast load option the whole batch fails if any record in it fails to insert.

56. How to execute a Procedure in Execute SQL Task? 

Ans. EXEC ProcedureName ?, ?, ?  Each '?' denotes a parameter, and the parameters are mapped on the task's Parameter Mapping page.

57. How to get the connection string dynamically? 

Ans. By writing an expression on the connection string property of the connection manager.

58. How to run Tasks parallelly? 

Ans. Each package has a MaxConcurrentExecutables property, set to -1 by default, which means the number of tasks that can execute in parallel is n + 2, where n is the number of processors on the system. We can change this to an absolute number of tasks to run in parallel. In the data flow, the EngineThreads property controls how many threads execute the different execution trees of the DFT.

59. Which is the Default container of SSIS? 

Ans. Task Host Container

60. How to check the quality of data at run time? 

Ans. Data Viewer, Data Profiler


61. How to remove Sort transformation? 

Ans. You can avoid the Sort transformation when using an OLE DB source: add an ORDER BY clause to the source query, then set the source output's IsSorted property to true and set the SortKeyPosition of the ORDER BY columns.

62. How to put if else in Data Flow and Control Flow? 

Ans. Control flow- Precedence constraints

Data flow-Conditional Split transformation

63. How to set the Buffer in Data Flow? 

Ans. Each DFT has two properties, DefaultBufferMaxRows and DefaultBufferSize, which control respectively the number of rows that can be stored in a single buffer and the size of a single buffer. DefaultBufferMaxRows defaults to 10,000 rows and DefaultBufferSize defaults to 10 MB. Any data that cannot fit in the buffer (e.g. BLOB data) is spooled to the file system.

64. How to ignore the failure in Control Flow and Data Flow? 

Ans.  Control Flow- Use precedence constraints as on completion

Data Flow- Redirect row on failure

65. How to migrate DTS packages to SSIS packages? 

Ans. Open BIDS, go to the Project menu and click on Migrate DTS 2000 Package. Provide the source and destination locations and complete the wizard steps.

66. How do you upgrade a package from SSIS 2005 to SSIS 2008 R2? 

Ans.

67. How to create a SSAS Cube and deploy it?(Building a cube with data source) 

Ans. 1. First create an Analysis Services project.

2. Create a Data Source (a connection to the server and database, with authentication) by right-clicking on the Data Sources folder of the project, which starts a wizard. (Extension will be .ds)

3. Create a Data Source View (add tables from one or more databases, add the related tables for the added tables, and optionally add filters) by right-clicking on the Data Source Views folder of the project, which starts a wizard. (Extension will be .dsv)

4. Right-click on Cubes and click New Cube, which starts the Cube Wizard and asks for the build method. Here we have to check the measure group tables, the relevant measures for those measure group tables, and the dimension tables. (Extension will be .cube)

5. To hide particular measure groups and dimensions from particular users, create perspectives. We can rename the already created perspective, and for a new perspective we can check and uncheck items based on the requirement.

6. In BIDS, we can build and then deploy the cube. After that we can right-click on the particular cube and browse it. In the browse window we can drag and drop row fields and column fields: in the row fields we basically drag dimension attributes and in the column fields we drag fact measures.

68. How to create a SSAS Cube and deploy it?(Building a cube without data source) 

Ans. 1. Right-click on Cubes. The Cube Wizard asks for 'Select build method - with or without using data source'. Click on "Build the cube without using data source".

2.Then define measures.

3.Define Dimensions and Dimension Usage and complete the wizard.

4. Now create the data source view and a data source in it. The wizard that pops up this time is called the Schema Generation Wizard.

5.Populate the data source with data so that cube can be populated.

6. Process the cube and view the cube.

69. How to create an SSRS report using SSAS?   

Ans. 1. In the data source, set the source type to Analysis Services and provide the Analysis Services server name.

2. In the dataset, click on Query Builder. The Query Builder opens the SSAS query designer window.

3. Then we can drag and drop fields onto the results pane of the query designer as per our requirement.

4. There are several buttons in the toolbar of the Query Builder that are useful for MDX, such as Add Calculated Members, Show Empty Cells, Auto Exec and Show Aggregates.

5. The Auto Exec button turns the automatic execution of queries on and off.

6. Show Empty Cells works the same as the MDX NON EMPTY clause; basically it shows or hides empty cell values.

7. In the upper half of the design area, the query designer contains a filter grid, which is useful for filtering the results and serves as an easy way of creating parameters.

8. Clicking OK in the SSAS Query Builder generates the MDX query in the query string of the dataset.

9. Now we can build any type of report using the fields provided by the dataset.

70. How to create parameters/Multivalued parameters in SSRS report using SSAS? 

Ans. 1. We can add the field to the filter grid in the query designer and check the Parameters checkbox.

2. By default a report parameter will be created, along with a hidden dataset that populates the parameter.

3. If we right-click on the Datasets folder and click on "Show Hidden Datasets", we can see that dataset.

4. When previewing the report we will see two "All" values: one "Select All" and another "All". "Select All" comes from the multi-valued property of the SSRS parameter, and the second one is driven by the MDX query that SSRS runs to populate the parameter list.

5. We can remove the second one by:

making a change in that MDX query, i.e. changing Parameters.AllMembers to Parameters.Children, or filtering on the condition ParameterLevel > 0 (because the "All" member of the MDX query has parameter level 0).

71. What is Cube Scheduling?

Ans. When we deploy an SSAS project, the data sources and data source views are put in the local instance of SQL Server Analysis Services. Whatever changes are made to the production database will not be reflected in the Analysis Services instance until we deploy and process again, so we have to process the cube each time.

We can process the cube manually (using Management Studio).

1) In order to schedule the processing of the cubes at a particular time, we can use the Analysis Services Processing Task in SSIS, give it the cube name, and then create a job that schedules this package.

2) Right-click on the cube and click on Process. Click on Script, click Copy to Clipboard and cancel. Then create a new job, paste the script into the command and schedule the job.

72. What is Dimensions?

Ans. A dimension is a context by which we slice and dice data in a data warehouse. Types: 1) Conformed 2) Junk 3) Role-playing

73. What is Conformed Dimensions?

Ans. Two dimensions are said to be conformed if they are completely identical or strict mathematical subsets of each other. Conformed dimensions also have the same row headers and labels. Eg: a Time dimension used for Fact1 and the same Time dimension used for Fact2.

74. What is Junk Dimensions? 

Ans. After database modelling, if there are some attributes that are not linked to any of the dimensions or facts, are of lesser importance, and carry only a few distinct values (mostly flags or indicators), we can create one dimension that carries all these attributes. Such a dimension is known as a junk dimension. Benefit: one dimension holding all the attributes performs better than an individual dimension for each attribute. Disadvantage: if in the future we want to promote one of the attributes to a fully fledged dimension, we have to restructure the cube.

75. What is Degenerate Dimensions? 

Ans. When the fact table has columns that do not relate to any of the dimensions but are frequently needed for analysis, we can make a separate dimension for that attribute; this is called a degenerate dimension. For example, a transaction number or ticket number that is a frequently requested feature can be stored as distinct values in a degenerate dimension, because the fact table will have multiple records for one transaction id. So when querying or analysing by transaction number or ticket number, we can go via the degenerate dimension table. In some degenerate dimensions we can also add a non-degenerate column, such as a running id for the degenerate transaction id. The difference between a normal dimension and a degenerate dimension is that the degenerate attribute should not be exposed to the end user for analysis in Analysis Services.

76. What is Role Playing Dimensions? 

Ans. Dimensions are often recycled for multiple applications within the same database. A Date dimension can be used for Date_of_Hire, Date_of_Joining, etc., so it is a role-playing dimension.

77. Explain Slowly Changing Dimensions? 

Ans. Slowly Changing Dimensions are dimensions whose data changes slowly, rather than on a regular, time-based schedule.

1) Type 0 - no action is taken.

2) Type 1 - overwrites old data with new data and therefore does not maintain history.

3) Type 2 - maintains history by inserting multiple records for a natural key with different surrogate keys (a running number against the natural key) and version numbers or start/end dates.

4) Type 3 - adds a column that holds the original value of the attribute (e.g. Original_state) alongside the current value (Current_state). It does not maintain a history of the values in between.

5) Type 4 - keeps a table holding the current record and a separate history table with a create-date column.

6) Type 6 - combines the approaches of types 1, 2 and 3. It maintains individual records with a start date, an end date and a flag; the current record's flag is set to Y and all the others are set to N.

78. What is Factless tables or Bridge Tables?

Ans. If a fact table exists to record many-to-many relationships between dimensions, with no figures, measures or values, it is known as a factless fact or bridge table. Such a table is needed to bridge the relationships between dimensions. For example, suppose we have a Product dimension containing the product key and product-related information, and an Offer dimension containing the offer key, discount-related information, its timing and pricing. We can make a bridge table ProductOffer that stores the products that have special offers. If only the fact table stored Product, Offer and Sale as columns, we would surely miss the information about products that had special offers but no sale.

79. What is Snapshot and transaction fact tables? 

Ans. A transaction fact table records events, and for each event certain measurements or values are recorded; when we record a sale we create a new row altogether for the information relating to that sale. A snapshot fact table records the state of something at different points in time, for example the sales of each product every month. A snapshot fact table therefore keeps aggregated information, which results in fewer data rows, smaller cubes, faster cube processing and faster querying, but it has no information at the individual transaction level.

80. What is UDM(Unified Dimensional Modelling)? 

Ans. It provides a bridge between the user and the data sources. A UDM is constructed over one or more physical data sources, and the user issues queries against the UDM.

81. What is Two tier architecture of SSAS? 

Ans. SSAS uses both server and client components to supply OLAP and data mining functionality.

1)Server component is implemented as a Microsoft windows service.

2) Clients interact using standard XML.

82. Difference between Derived measure and Calculated measure? 

Ans. 1) A derived measure is calculated before aggregations are created, and its value is stored in the cube.

2) A calculated measure is calculated after aggregations are created, and its value is not stored in the cube.

83. What is Cube? 

Ans. Cubes are a logical representation of multidimensional data. A cube stores large amounts of historical data, shows point-in-time data, exposes member-level data, and allows very detailed security.

84. Differences between OLTP and OLAP 

Ans. 1) OLTP contains the source information of any system. OLAP contains data drawn from OLTP.

2) OLTP controls and runs the fundamental business tasks such as inserts and updates, while OLAP is meant for planning, analysis and decision support.

3) OLTP performs transactions (insert, update, delete, etc.). OLAP performs querying and analysis of the aggregated OLTP data.

4) OLTP mostly uses smaller and less complex queries, but OLAP uses complex queries.

5) OLTP is normalized to 3NF, but OLAP is denormalized into a star or snowflake schema.

6) OLTP uses a relational schema, but OLAP uses a multidimensional schema.

7) Backup of OLTP is very important, but in the case of OLAP we can always rebuild it from OLTP.

8) Data in OLTP can be seen instantly, but in OLAP a batch or scheduler refreshes the data.

85. Star Schema vs Snowflake Schema 

Ans. Snowflake schema: no redundancy, normalized dimensions, more complex queries, more joins, more foreign keys; worth considering when the dimension tables are very large, since normalizing them saves space.

Star schema: redundant data, denormalized dimensions, simpler and faster queries; the usual choice for most data warehouses, where dimension tables are comparatively small.

86. What is Normalization?

Ans. Normalization is a mathematical model for evaluating the relational quality of a data model. Its purpose is to reduce update anomalies.

87. What is First Normal Form(1 NF)? 

Ans. The 1st Normal Form states that:

1) Every unit of data should be represented by a scalar value.

2) Each and every attribute (i.e. column) should be unique in an entity.

3) Each and every tuple (i.e. row) should be unique in an entity.

Consider a table with columns BaseCamp, Tour1, Tour2, Tour3. This table violates 1NF because several attributes hold the same meaning. To bring it to 1NF we can make two tables: one with BaseCampID and BaseCamp, and another with BaseCampID and Tour.

88. What is Second Normal Form(2 NF)?

Ans. The 2nd Normal Form states that:

1)The entity should be in 1 NF and

2) There should be no partial dependency on the primary key. Suppose in the BaseCamp-Tour table we have the columns BaseCampID, Tour and BaseCampPhoneNumber, and the primary key is the composite of BaseCampID and Tour. The phone number depends on BaseCampID alone, not on Tour, so this violates 2NF. We can make two tables: BaseCamp-Tour with BaseCampID and Tour, and a second table with BaseCampID and PhoneNumber.

89. What is Third Normal Form(3NF)? 

Ans. The 3rd Normal Form states that:

1) The entity should be in 2NF and

2) There should be no transitive dependency on the primary key. Suppose there is a BaseCamp-Tour table with columns BaseCamp_id, Tour, Guide and GuideDateOfHire. Here GuideDateOfHire depends on Guide, which in turn depends on BaseCamp_id, so the table violates 3NF. To resolve this we can make another table, Guide, holding Guide and GuideDateOfHire.

90. What is Boyce Codd Normal Form? 

Ans. It sits between the third and fourth normal forms. It handles entities that have two sets of candidate keys; in such a case BCNF states that the entity should be split into two entities, one for each candidate key.



91. What is Fourth Normal Form? 

Ans. It reduces problems created by complex composite primary keys. If an entity has three attributes in which two combine to form a composite primary key, but the row is not unique without the third attribute, the entity violates 4NF. In such a case we should split the table into two separate tables.

92. What is Fifth Normal Form? 

Ans. 5NF provides the method for designing relationships that involve three or more entities. A three-way or ternary relationship, if properly designed, is in the fifth normal form.

93. What is Recovery Models in SQL Server 2008? 

Ans. Simple, Full, Bulk-Logged

ALTER DATABASE DatabaseName SET RECOVERY SIMPLE

94. What is Simple Recovery Model? 

Ans. 1)SQL Server stores only minimal information in the transaction log.

2)The Changes made to the database are written to the data and to the log.

3)But the log files are not kept permanently.

4) It should be used on a database meant for development or test, where there is less need for frequent database backups.

Backup options for this model are Full and Differential.

95. What is Full Recovery Model? 

Ans. This allows us to recover everything; it supports the fullest set of backup options. Every insert, update and delete is logged in the transaction log, and committed changes are cleared from the log once a log backup is taken. It allows point-in-time recovery, meaning we can restore the database to a particular point in time. Backup options for this model are Full, Differential and Transaction Log.

96. What is Bulk Logged Model? 

Ans. This is the same as the Full recovery model but is used for bulk operations such as CREATE INDEX, SELECT INTO and BULK INSERT. Bulk operations are minimally logged in the transaction log, which improves performance. Backup options for this model are Full, Differential and Transaction Log.

97. What is Backups? 

Ans. Full, Differential and Transaction Log.

98. What is Full Backups? 

Ans. A full backup copies all the data in the database, but by itself it does not let us restore to a specific point in time.

99. What is Differential Backup? 

Ans. A differential backup copies all the data changes since the last full backup, so these backups are usually faster and smaller than full backups. A full backup creates a checkpoint, meaning that the differential backup starts from that point.

100. What is Transaction Log Back up? 

Ans. A transaction log backup copies all the data changes since the last full backup or the previous transaction log backup. These backups are fast and small, but the drawback is recovery: the full backup plus every subsequent log backup must be restored in sequence.
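
A sketch of the three backup types (the database name and file paths are assumptions for illustration):

BACKUP DATABASE MyDB TO DISK = 'D:\Backups\MyDB_Full.bak';                    -- full backup
BACKUP DATABASE MyDB TO DISK = 'D:\Backups\MyDB_Diff.bak' WITH DIFFERENTIAL;  -- changes since last full
BACKUP LOG MyDB TO DISK = 'D:\Backups\MyDB_Log.trn';                          -- changes since last log backup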

101. What is MSDTC? 

Ans. MSDTC is a transaction manager that allows a client application to include several different sources of data within one transaction.

For example, SSIS can enlist different data sources in one transaction to maintain the ACID properties.

102. What is ACID? 

Ans. Atomic, Consistent, Isolated and Durable.

103. What is Report Subscription? 

Ans. A Subscription is a standing request to deliver a report at a specific time or to respond to a specific event, and in an application file format that you specify in the subscription.

Two types- Standard Subscription, Data driven Subscription

Standard subscription - consists of static values and cannot be modified during subscription processing.

Data-driven subscription - gets subscription information at run time by querying an external data source that provides the values used to specify the recipients, parameters or rendering format.

104. What is XML from a select query? 

Ans. <Select query> For XML RAW/AUTO/PATH/EXPLICIT;

select * from emp for xml raw

 

<row EmpName=""Shakti"" DeptID=""2"" Sal=""3000""/><row EmpName=""Praveen"" DeptID=""2"" Sal=""4000""/><row EmpName=""Rohit"" DeptID=""1"" Sal=""2000""/><row EmpName=""Wamsi"" DeptID=""1"" Sal=""1500""/><row EmpName=""Kapil"" DeptID=""2"" Sal=""1300""/><row EmpName=""XXX"" DeptID=""10"" Sal=""1000""/><row EmpName=""Rahul"" DeptID=""1"" Sal=""1000""/>

 

select * from emp for xml auto

<emp EmpName=""Shakti"" DeptID=""2"" Sal=""3000""/><emp EmpName=""Praveen"" DeptID=""2"" Sal=""4000""/><emp EmpName=""Rohit"" DeptID=""1"" Sal=""2000""/><emp EmpName=""Wamsi"" DeptID=""1"" Sal=""1500""/><emp EmpName=""Kapil"" DeptID=""2"" Sal=""1300""/><emp EmpName=""XXX"" DeptID=""10"" Sal=""1000""/><emp EmpName=""Rahul"" DeptID=""1"" Sal=""1000""/>

 

select * from emp for xml path

 

<row><EmpName>Shakti</EmpName><DeptID>2</DeptID><Sal>3000</Sal></row><row><EmpName>Praveen</EmpName><DeptID>2</DeptID><Sal>4000</Sal></row><row><EmpName>Rohit</EmpName><DeptID>1</DeptID><Sal>2000</Sal></row><row><EmpName>Wamsi</EmpName><DeptID>1</DeptID><Sal>1500</Sal></row><row><EmpName>Kapil</EmpName><DeptID>2</DeptID><Sal>1300</Sal></row><row><EmpName>XXX</EmpName><DeptID>10</DeptID><Sal>1000</Sal></row><row><EmpName>Rahul</EmpName><DeptID>1</DeptID><Sal>1000</Sal></row>

 

select 1 as Tag,
  NULL as Parent,
  EmpName as [Employee!1!EmpName],
  NULL as [DeptID!2!DeptID],
  NULL as [Sal!3!SAL]
from emp
union
select 2 as Tag,
  1 as Parent,
  EmpName as [Employee!1!EmpName],
  DeptID as [DeptID!2!DeptID],
  null as [Sal!3!SAL]
from emp
union
select 3 as Tag,
  2 as Parent,
  EmpName as [Employee!1!EmpName],
  DeptID as [DeptID!2!DeptID],
  Sal as [Sal!3!SAL]
from emp
order by EmpName
for xml explicit

 

<Employee EmpName=""Kapil""><DeptID DeptID=""2""><Sal SAL=""1300""/></DeptID></Employee><Employee EmpName=""Praveen""><DeptID DeptID=""2""><Sal SAL=""4000""/></DeptID></Employee><Employee EmpName=""Rahul""><DeptID DeptID=""1""><Sal SAL=""1000""/></DeptID></Employee><Employee EmpName=""Rohit""><DeptID DeptID=""1""><Sal SAL=""2000""/></DeptID></Employee><Employee EmpName=""Shakti""><DeptID DeptID=""2""><Sal SAL=""3000""/></DeptID></Employee><Employee EmpName=""Wamsi""><DeptID DeptID=""1""><Sal SAL=""1500""/></DeptID></Employee><Employee EmpName=""XXX""><DeptID DeptID=""10""><Sal SAL=""1000""/></DeptID></Employee>

 

The values in the Tag and Parent meta columns, the information provided in the column names, and the correct ordering of the rows produce the XML you want when you use EXPLICIT mode.

105. What is SET XACT_ABORT { ON | OFF }? 

Ans. Specifies whether SQL Server automatically rolls back the current transaction when a Transact-SQL statement raises a run-time error.

106. What is Computed Column? 

Ans. A computed column is computed from an expression that can use other columns in the same table. The expression can be a noncomputed column name, constant, function, and any combination of these connected by one or more operators. The expression cannot be a subquery.

 

For example, in the AdventureWorks2008R2 sample database, the TotalDue column of the Sales.SalesOrderHeader table has the definition: TotalDue AS Subtotal + TaxAmt + Freight.

 

Unless otherwise specified, computed columns are virtual columns that are not physically stored in the table. Their values are recalculated every time they are referenced in a query. The Database Engine uses the PERSISTED keyword in the CREATE TABLE and ALTER TABLE statements to physically store computed columns in the table.

Computed columns can be used in select lists, WHERE clauses, ORDER BY clauses, or any other location in which an expression can be used, with the following exceptions:

 

    Computed columns used as CHECK, FOREIGN KEY, or NOT NULL constraints must be marked PERSISTED. A computed column can be used as a key column in an index or as part of any PRIMARY KEY or UNIQUE constraint if the computed column value is defined by a deterministic expression and the data type of the result is allowed in index columns.

 

    For example, if the table has integer columns a and b, the computed column a + b can be indexed, but computed column a + DATEPART(dd, GETDATE()) cannot be indexed because the value may change in subsequent invocations.

 

    A computed column cannot be the target of an INSERT or UPDATE statement.
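
A minimal sketch of a persisted computed column (the table and column names are assumptions for illustration):

CREATE TABLE dbo.OrderLine
(
    Qty       INT,
    UnitPrice MONEY,
    LineTotal AS (Qty * UnitPrice) PERSISTED  -- stored physically, so it can be indexed and used in constraints
);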
