SSIS INTERVIEW QUESTIONS AND ANSWERS: ALL BASIC QUESTIONS ON SQL SERVER INTEGRATION SERVICES
SSIS Interview questions with Answers
• What do we mean by dataflow in SSIS?
Data flow is the flow of data from the configured sources to the chosen destinations. Along the way, data transformations change the data to make it ready for the data warehouse.
• What is a breakpoint in SSIS? How is it set up? How do you disable it?
A breakpoint is a stopping point in the code. A breakpoint gives the Developer\DBA an opportunity to review the status of the data, the variables, and the overall status of the SSIS package. 10 unique conditions exist for each breakpoint.
Breakpoints are set up in BIDS: navigate to the control flow interface, right-click on the object where you want to set the breakpoint, and select the 'Edit Breakpoints…' option.
• Can you name 5 or more of the native SSIS connection managers?
1) OLEDB connection – Used to connect to any data source requiring an OLEDB connection (e.g., SQL Server 2000)
2) Flat file connection – Used to make a connection to a single file in the File System. Required for reading information from a File System flat file
3) ADO.Net connection – Uses the .NET provider to make a connection to SQL Server 2005, or to other connections exposed through managed code (like C#) in a custom task
4) Analysis Services connection – Used to make a connection to an Analysis Services database or project. Required for the Analysis Services DDL Task and Analysis Services Processing Task
5) File connection – Used to reference a file or folder. The options are to either use or create a file or folder
6) Excel connection – Used to make a connection to an Excel workbook
• What is the use of Bulk Insert Task in SSIS?
The Bulk Insert Task is used to upload large amounts of data from flat files into SQL Server. It supports only OLE DB connections for the destination database.
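The task effectively wraps the T-SQL BULK INSERT statement. A minimal sketch of the equivalent statement, with a hypothetical table name and file path:

BULK INSERT dbo.SalesStaging
FROM 'C:\Data\sales.txt'
WITH (
    FIELDTERMINATOR = ',',  -- column delimiter in the flat file
    ROWTERMINATOR = '\n',   -- row delimiter
    FIRSTROW = 2            -- skip the header row
);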
• What is Conditional Split transformation in SSIS?
This is just like an IF condition: it checks the given condition and, based on the evaluation, sends each row to the appropriate OUTPUT path. It has ONE input and MANY outputs. The Conditional Split transformation is used to route rows to different outputs based on conditions. For example, we can send the students in a class who have marks greater than 40 to one path and the students who score 40 or less to another path.
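As a rough T-SQL analogy for the student example above (table and column names are hypothetical):

SELECT StudentId, Marks,
       CASE WHEN Marks > 40 THEN 'HighMarksOutput' ELSE 'DefaultOutput' END AS OutputPath
FROM dbo.Students;

In the transformation itself, each output is defined by an SSIS expression such as Marks > 40, and rows that satisfy no condition go to the default output.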
• How do you eliminate quotes from being uploaded from a flat file to SQL Server?
This can be done using the TEXT QUALIFIER property. In the SSIS package, on the Flat File Connection Manager Editor, enter the quote character into the Text qualifier field, then preview the data to ensure the quotes are not included.
• Can you explain how to setup a checkpoint file in SSIS?
The following items need to be configured on the Properties tab for the SSIS package:
CheckpointFileName – Specify the full path to the checkpoint file that the package uses to save the value of package variables and log completed tasks. Rather than using a hard-coded path, it's a good idea to use an expression that concatenates a path defined in a package variable and the package name.
CheckpointUsage – Determines if/how checkpoints are used. Choose from these options: Never (default), IfExists, or Always. Never indicates that you are not using checkpoints. IfExists is the typical setting and implements the restart-at-the-point-of-failure behavior: if a checkpoint file is found, it is used to restore package variable values and restart at the point of failure; if a checkpoint file is not found, the package starts execution with the first task. The Always choice raises an error if the checkpoint file does not exist.
SaveCheckpoints – Choose from these options: True or False (default). You must select True to implement the checkpoint behavior.
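For instance, the expression-based CheckpointFileName suggested above could be written in SSIS expression syntax like this (the CheckpointPath user variable is a hypothetical name):

@[User::CheckpointPath] + @[System::PackageName] + ".chk"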
• What are the different values you can set for the CheckpointUsage property?
There are three values, which describe how a checkpoint file is used during package execution:
1) Never: The package will not use a checkpoint file and therefore will never restart.
2) If Exists: If a checkpoint file exists in the place you specified for the CheckpointFilename property, then it will be used, and the package will restart according to the checkpoints written.
3) Always: The package will always use a checkpoint file to restart, and if one does not exist, the package will fail.
• What is the ONLY property you need to set on TASKS in order to configure CHECKPOINTS to RESTART a package from failure?
The one property you have to set on the task is FailPackageOnFailure. This must be set for each task or container that you want to be the point for a checkpoint and restart. If you do not set this property to true and the task fails, no file will be written, and the next time you invoke the package, it will start from the beginning again.
• Where can we set CHECKPOINTS, in the Data Flow or the Control Flow?
Checkpoints only happen at the Control Flow; it is not possible to checkpoint transformations or restart inside a Data Flow. The Data Flow Task can be a checkpoint, but it is treated as any other task.
• Can you explain different options for dynamic configurations in SSIS?
1) XML file
2) Custom variables
3) A database per environment with the variables
4) A centralized database with all variables
• What is the use of Percentage Sampling transformation in SSIS?
Percentage Sampling transformation is generally used for data mining. This transformation builds a random sample of output rows by choosing a specified percentage of input rows. For example, if the input has 1000 rows and I specify 10 as the percentage sample, then the transformation returns 10% of the input records, chosen at RANDOM.
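A loose T-SQL analogy (table name hypothetical) is the TABLESAMPLE clause, though TABLESAMPLE samples data pages rather than exact rows, while the SSIS transformation samples individual rows:

SELECT *
FROM dbo.InputData TABLESAMPLE (10 PERCENT);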
• What is the use of Term Extraction transformation in SSIS?
Term Extraction transformation is used to extract nouns, noun phrases, or both, from English text only. It extracts terms from text in a transformation input column and then writes the terms to a transformation output column. It can also be used to profile the content of a dataset.
• What is Data Viewer and what are the different types of Data Viewers in SSIS?
A Data Viewer allows viewing data at a point in time at runtime. If a data viewer is placed before and after the Aggregate transform, we can see the data flowing into the transformation at runtime and what it looks like after the transformation has occurred. The different types of data viewers are:
1. Grid
2. Histogram
3. Scatter Plot
4. Column Chart.
• What is Ignore Failure option in SSIS?
With the Ignore Failure option, the error is ignored and the data row continues on to the next transformation. (A related error-output option, Redirect Row, sends the failing rows to a separate output instead.) Let's say you have some JUNK data (wrong data type or corrupt values) flowing from the source; using these error-output options in SSIS we can REDIRECT the junk records to another path instead of FAILING the package. This helps MOVE only valid data to the destination, while the JUNK can be captured in a separate file.
• Which are the different types of Control Flow components in SSIS?
The different types of Control Flow components are: Data Flow Tasks, SQL Server Tasks, Data Preparation Tasks, Workflow Tasks, Scripting Tasks, Analysis Services Tasks, Maintenance Tasks, and Containers.
• What are containers? What are the different types of containers in SSIS?
Containers are objects that provide structure to packages and extra functionality to tasks. There are four types of containers in SSIS: the Foreach Loop Container, For Loop Container, Sequence Container and Task Host Container.
• What are the different types of Data Flow components in SSIS?
There are 3 data flow components in SSIS.
1. Sources
2. Transformations
3. Destinations
• What are the different types of data sources available in SSIS?
There are 7 types of data sources provided by SSIS: a.) Data Reader source b.) Excel source c.) Flat file source d.) OLEDB source e.) Raw file source f.) XML source g.) Script component
• What is SSIS Designer?
It is a graphical tool for creating packages. It has 4 tabs: Control Flow, Data Flow, Event Handlers and Package Explorer.
• What is Control Flow tab?
It is the tab in SSIS designer where various Tasks can be arranged and configured. This is the tab where we provide and control the program flow of the project. In SSIS a workflow is called a control-flow. A control-flow links together our modular data-flows as a series of operations in order to achieve a desired result.
A control flow consists of one or more tasks and containers that execute when the package runs. To control order or define the conditions for running the next task or container in the package control flow, you use precedence constraints to connect the tasks and containers in a package. A subset of tasks and containers can also be grouped and run repeatedly as a unit within the package control flow.
• What is Data Flow tab?
This is the tab where we do all the work related to the ETL job. It is the tab in SSIS Designer where we extract data from sources, transform it, and then load it into destinations. A data flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations. Before you can add a data flow to a package, the package control flow must include a Data Flow task.
• What is the function of control flow tab in SSIS?
On the Control Flow tab, tasks (including the Data Flow task), containers, and the precedence constraints that connect the containers and tasks can be arranged and configured.
• What is the function of Event handlers tab in SSIS?
On the Event handlers tab, workflows can be configured to respond to package events.
For example, we can configure a workflow to run when ANY task fails, stops, or starts.
• What is the function of Package explorer tab in SSIS?
This tab provides an explorer view of the package. You can see what is happening in the package. The Package is a container at the top of the hierarchy.
• What is Solution Explorer?
It is a pane in BIDS where all the projects, Data Sources, Data Source Views and other miscellaneous files can be viewed and accessed for modification.
• How do we convert data type in SSIS?
The Data Conversion Transformation in SSIS converts the data type of an input column to a different data type.
• How are variables useful in an SSIS package?
Variables can provide communication among objects in the package. Variables can provide communication between parent and child packages. Variables can also be used in expressions and scripts. This helps in providing dynamic values to tasks.
Variables store values that a SSIS package and its containers, tasks, and event handlers can use at run time. The scripts in the Script task and the Script component can also use variables. The precedence constraints that sequence tasks and containers into a workflow can use variables when their constraint definitions include expressions.
Integration Services supports two types of variables: user-defined variables and system variables. User-defined variables are defined by package developers, and system variables are defined by Integration Services. You can create as many user-defined variables as a package requires, but you cannot create additional system variables.
Scope : A variable is created within the scope of a package or within the scope of a container, task, or event handler in the package. Because the package container is at the top of the container hierarchy, variables with package scope function like global variables and can be used by all containers in the package. Similarly, variables defined within the scope of a container such as a For Loop container can be used by all tasks or containers within the For Loop container.
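As a small sketch of the precedence-constraint case (variable name hypothetical): if the constraint's evaluation operation includes an expression such as the one below, the connected task runs only when the expression evaluates to true at run time.

@[User::RowCount] > 0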
• Explain Aggregate Transformation in SSIS?
It aggregates data, similar to applying T-SQL functions like Group By, Min, Max, Avg, and Count. For example, you can get the total quantity and total line items for each product in the Aggregate Transformation Editor. First you choose the input columns, then the output column names in the Output Alias column of the data grid, and the operation for each output alias in the Operation column of the same grid. Some of the operation functions are listed below:
• Group By
• Average
• Count
• Count Distinct : counts distinct, non-null column values
• Min, Max, Sum
- In the Aggregate transformation, Keys and Keys scale refer to the number of groups that are expected to result from a Group by operation.
- Count distinct keys and Count distinct scale refer to the number of distinct values that are expected to result from a Distinct count operation.
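The product example above corresponds to this T-SQL (table and column names hypothetical):

SELECT ProductId,
       SUM(Quantity) AS TotalQuantity,
       COUNT(*) AS TotalLineItems
FROM dbo.OrderDetails
GROUP BY ProductId;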
• Explain Audit Transformation?
It allows you to add auditing information, as required in the auditing world by regulations such as HIPAA and Sarbanes-Oxley (SOX). Auditing options that you can add to transformed data through this transformation are:
1. Execution Instance GUID : ID of the execution instance of the package
2. PackageID : ID of the package
3. PackageName
4. VersionID : GUID version of the package
5. Execution StartTime
6. MachineName
7. UserName
8. TaskName
9. TaskID : uniqueidentifier of the data flow task that contains the Audit transformation.
• Explain Character Map Transformation?
It transforms string data. It gives the option of whether the output will overwrite the existing column or be added as a new column. If you define a new column, specify the new column name. Operations available here are:
1. Uppercase
2. Lowercase
3. Byte reversal : such as from 0x1234 to 0x4321
4. Full width
5. Half width
6. Hiragana/katakana/traditional Chinese/simplified Chinese
7. Linguistic casing
• Explain Conditional split Transformation?
It functions as an if…then…else construct. It sends each input row to the conditional branch whose condition it satisfies. For example, you may want to split product quantities between less than 500 and greater than or equal to 500. You can give each condition a name that easily identifies its purpose. The else section is covered by the Default Output Column name.
After you configure the component and connect it to a subsequent transformation/destination, a dialog box pops up letting you choose which conditional output applies to that transformation/destination.
• Explain Copy column Transformation?
This component simply copies a column to another, new column, just like an ALIASed column in T-SQL.
• Explain Data conversion Transformation?
This component converts data types, similar to the T-SQL CAST or CONVERT functions. If you wish to convert data from one type to another, this is the best bet. But please make sure the column contains COMPATIBLE data.
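A T-SQL analogy (names hypothetical); in the transformation itself you pick SSIS data types such as DT_STR or DT_I4 instead:

SELECT CAST(CustomerId AS varchar(10)) AS CustomerIdText,
       CONVERT(datetime, OrderDateText) AS OrderDate
FROM dbo.Staging;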
• Explain Data Mining query Transformation?
This component performs predictions on the data or fills gaps in it. Some good scenarios for using this component are:
1. Taking input columns such as number of children, household income, and marital status to predict whether someone owns a house or not.
2. Predicting what a customer would buy, based on analysis of the buying patterns in their shopping cart.
3. Filling in blank data or default values when a customer doesn't fill in some items in the questionnaire.
• Explain Derived column Transformation?
Derived Column creates a new column or puts the result of manipulating several columns into a new column. You can directly copy an existing column, or create a new column using more than one column.
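A minimal sketch in SSIS expression syntax (column names hypothetical): a derived column built from two input columns. Note that string literals in SSIS expressions use double quotes.

UPPER(FirstName) + " " + LastName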
• Explain Merge Transformation?
Merge transformation merges two paths into a single path. It is useful when you want to break data out into a path that handles errors; after the errors are handled, the data is merged back into the downstream path. It is also used to merge 2 data sources. It is similar to the Union All transformation, but Merge has some restrictions:
1. Data must be in sorted order.
2. Data type, data length and other metadata attributes must match before being merged.
• Explain Merge Join Transformation?
Merge Join transformation merges the output of 2 inputs, performing an INNER or OUTER join on the data. But if the data comes from 1 OLEDB data source, it is better to join through a SQL query rather than using Merge Join. Merge Join is intended to join 2 different data sources.
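When both inputs do come from the same OLEDB source, the recommended single-query alternative looks like this (names hypothetical):

SELECT o.OrderId, o.OrderDate, c.CustomerName
FROM dbo.Orders AS o
INNER JOIN dbo.Customers AS c
    ON c.CustomerId = o.CustomerId;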
• Explain Multicast Transformation?
This transformation sends output to multiple output paths with no conditions, unlike Conditional Split. It takes ONE input, makes COPIES of the data, and passes the same data through many outputs. In short: give it one input and take many outputs of the same data.
• Explain Percentage and Row Sampling Transformations?
These transformations take data from the source and randomly sample it. Each gives you 2 outputs: the first is the selected data and the second is the unselected data. They are used in situations such as training a data mining model. Both are used to take a SAMPLE of the input data.
• Explain Sort Transformation?
This component sorts data, similar to the T-SQL ORDER BY clause. Some transformations need sorted input.
• Explain Union all Transformation?
It works like the Merge transformation, but without the sorting requirement. It can take output from more than 2 input paths and combines them into a single output path.
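A T-SQL analogy (names hypothetical); like UNION ALL, the transformation neither sorts nor removes duplicates:

SELECT CustomerId, Amount FROM dbo.Sales2013
UNION ALL
SELECT CustomerId, Amount FROM dbo.Sales2014;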
• What are the possible locations to save an SSIS package?
You can save a package to any of these locations:
SQL Server
Package Store
File System
• What is a package?
A discrete executable unit of work composed of a collection of control flow and other objects, including data sources, transformations, process sequence, and rules, errors and event handling, and data destinations.
• What is a workflow in SSIS?
A workflow is a set of instructions on how to execute tasks.
• What is the difference between Control Flow items and Data Flow items?
The control flow is the highest level control process. It allows you to manage the run-time process activities of data flow and other processes within a package.
When we want to extract, transform and load data within a package, we add an SSIS Data Flow task to the package control flow.
• What are the main components of SSIS (project-architecture)?
SSIS architecture has 4 main components:
1. SSIS service
2. SSIS runtime engine & runtime executables
3. SSIS dataflow engine & dataflow components
4. SSIS clients
• Different components in SSIS package?
Control flow
Data flow
Event handler
Package explorer
• What are Connection Managers?
It is a bridge between the package objects and physical data. It provides a logical representation of a connection at design time; the properties of the connection manager describe the physical connection that Integration Services creates when the package is run.
• What is environment variable in SSIS?
An environment variable configuration sets a package property equal to the value in an environment variable.
Environmental configurations are useful for configuring properties that are dependent on the computer that is executing the package.
• How to provide security to packages?
We can provide security in two ways
1. Package encryption
2. Password protection.
• What are Precedence constraints?
Constraints that link executables, containers, and tasks within the package control flow, and specify the conditions that determine the sequence and the conditions that determine whether executables run.
• What is Design time Deployment in SSIS?
When you run a package from within BIDS, it is built and temporarily deployed to a folder. By default the package will be deployed to the BIN folder in the package's project folder, and you can configure a custom folder for deployment. When the package's execution is completed and stopped in BIDS, the deployed package is deleted; this is called Design Time Deployment.
SSIS - SQL Server Integration Services
Q: What is SSIS? How is it related to SQL Server?
SQL Server Integration Services (SSIS) is a component of SQL Server which can be used to perform a wide range of Data Migration and ETL operations. SSIS is a component in MSBI process of SQL Server.
This is a platform for Integration and Workflow applications. It is known for its fast and flexible OLTP and OLAP extensions used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and multidimensional data sets.
Q: What are the tools associated with SSIS?
We use Business Intelligence Development Studio (BIDS) and SQL Server Management Studio (SSMS) to work with Development of SSIS Projects.
We use SSMS to manage the SSIS Packages and Projects.
Q: What are the differences between DTS and SSIS?
Data Transformation Services      | SQL Server Integration Services
Limited Error Handling            | Complex and powerful Error Handling
Message Boxes in ActiveX Scripts  | Message Boxes in .NET Scripting
No Deployment Wizard              | Interactive Deployment Wizard
Limited Set of Transformations    | Good number of Transformations
No BI functionality               | Complete BI Integration
Q: What is a workflow in SSIS?
Workflow is a set of instructions that specifies to the program executor how to execute tasks and containers within SSIS packages.
Q: What is the control flow?
A control flow consists of one or more tasks and containers that execute when the package runs. To control order or define the conditions for running the next task or container in the package control flow, we use precedence constraints to connect the tasks and containers in a package. A subset of tasks and containers can also be grouped and run repeatedly as a unit within the package control flow. SQL Server 2005 Integration Services (SSIS) provides three different types of control flow elements: Containers that provide structures in packages, Tasks that provide functionality, and Precedence Constraints that connect the executables, containers, and tasks into an ordered control flow.
Q: What is a data flow?
A data flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations. The Data Flow task is the executable within the SSIS package that creates, orders, and runs the data flow. A separate instance of the data flow engine is opened for each Data Flow task in a package. Data Sources, Transformations, and Data Destinations are the three important categories in the Data Flow.
Q: How does Error-Handling work in SSIS?
When a data flow component applies a transformation to column data, extracts data from sources, or loads data into destinations, errors can occur. Errors frequently occur because of unexpected data values.
Types of typical errors in SSIS:
- Data Connection Errors, which occur if the connection manager cannot be initialized with the connection string. This applies to both Data Sources and Data Destinations, along with Control Flow tasks that use connection strings.
- Data Transformation Errors, which occur while data is being transformed over a data pipeline from source to destination.
- Expression Evaluation Errors, which occur if expressions that are evaluated at run time perform invalid operations.
Q: What is environment variable in SSIS?
An environment variable configuration sets a package property equal to the value in an environment variable.
Environmental configurations are useful for configuring properties that are dependent on the computer that is executing the package.
Q: What are the Transformations available in SSIS?
AGGREGATE - Applies aggregate functions to record sets to produce new output records from aggregated values.
AUDIT - Adds Package and Task level Metadata - such as Machine Name, Execution Instance, Package Name, Package ID, etc..
CHARACTER MAP - Performs string data changes such as changing data from lower case to upper case.
CONDITIONAL SPLIT – Separates available input into separate output pipelines based on Boolean Expressions configured for each output.
COPY COLUMN - Add a copy of column to the output we can later transform the copy keeping the original for auditing.
DATA CONVERSION - Converts columns data types from one to another type. It stands for Explicit Column Conversion.
DATA MINING QUERY – Used to perform data mining query against analysis services and manage Predictions Graphs and Controls.
DERIVED COLUMN - Create a new (computed) column from given expressions.
EXPORT COLUMN – Used to export an image-specific column from the database to a file.
FUZZY GROUPING – Used for data cleansing by finding rows that are likely duplicates.
FUZZY LOOKUP - Used for Pattern Matching and Ranking based on fuzzy logic.
IMPORT COLUMN - Reads data from files (such as images) into a column in the data flow.
LOOKUP - Performs the lookup (searching) of a given reference object set against a data source. It is used for exact matches only.
MERGE - Merges two sorted data sets into a single data set into a single data flow.
MERGE JOIN - Merges two data sets into a single dataset using a join junction.
MULTI CAST - Sends a copy of supplied Data Source onto multiple Destinations.
ROW COUNT - Stores the resulting row count from the data flow / transformation into a variable.
ROW SAMPLING - Captures sample data by using a row count of the total rows in dataflow specified by rows or percentage.
UNION ALL - Merge multiple data sets into a single dataset.
PIVOT – Used for normalization of data sources to reduce anomalies by converting rows into columns.
UNPIVOT – Used for denormalizing the data structure by converting columns into rows, as when building data warehouses.
Q: How to log SSIS Executions?
SSIS includes logging features that write log entries when run-time events occur and can also write custom messages. This is not enabled by default. Integration Services supports a diverse set of log providers, and gives you the ability to create custom log providers. The Integration Services log providers can write log entries to text files, SQL Server Profiler, SQL Server, Windows Event Log, or XML files. Logs are associated with packages and are configured at the package level. Each task or container in a package can log information to any package log. The tasks and containers in a package can be enabled for logging even if the package itself is not.
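For example, if the SQL Server log provider is configured, entries can be queried from the log table it creates in the target database (dbo.sysdtslog90 on SQL Server 2005; renamed dbo.sysssislog in later versions):

SELECT source, event, starttime, message
FROM dbo.sysdtslog90
WHERE event IN ('OnError', 'OnTaskFailed')
ORDER BY starttime DESC;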
Q: How do you deploy SSIS packages?
BUILDing SSIS projects produces a Deployment Manifest file. We need to run the manifest file and decide whether to deploy onto the File System or onto SQL Server [msdb]. SQL Server deployment is faster and more secure than File System deployment. Alternatively, we can also import the package in SSMS from the File System or SQL Server.
Q: What are variables and what is variable scope?
Variables store values that a SSIS package and its containers, tasks, and event handlers can use at run time. The scripts in the Script task and the Script component can also use variables. The precedence constraints that sequence tasks and containers into a workflow can use variables when their constraint definitions include expressions. Integration Services supports two types of variables: user-defined variables and system variables. User-defined variables are defined by package developers, and system variables are defined by Integration Services. You can create as many user-defined variables as a package requires, but you cannot create additional system variables.
Q: Can you name five of the Perfmon counters for SSIS and the value they provide?
§ SQLServer:SSIS Service: SSIS Package Instances
§ SQLServer:SSIS Pipeline: the counters listed below
BLOB bytes read - Total bytes read from binary large objects during the monitoring period.
BLOB bytes written - Total bytes written to binary large objects during the monitoring period.
BLOB files in use - Number of binary large objects files used during the data flow task during the monitoring period.
Buffer memory - The amount of physical or virtual memory used by the data flow task during the monitoring period.
Buffers in use - The number of buffers in use during the data flow task during the monitoring period.
Buffers spooled - The number of buffers written to disk during the data flow task during the monitoring period.
Flat buffer memory - The total number of blocks of memory in use by the data flow task during the monitoring period.
Flat buffers in use - The number of blocks of memory in use by the data flow task at a point in time.
Private buffer memory - The total amount of physical or virtual memory used by data transformation tasks in the data flow engine during the monitoring period.
Private buffers in use - The number of blocks of memory in use by the transformations in the data flow task at a point in time.
Rows read - Total number of input rows in use by the data flow task at a point in time.
Rows written - Total number of output rows in use by the data flow task at a point in time.
Question: Have you used SSIS Framework?
Answer: This is a common term in the SSIS world which just means that you have templates set up to perform routine tasks like logging, error handling, etc. A "yes" answer would usually indicate an experienced person; "no" is still fine if your project is not very mission critical.
Question: How many different sources and destinations have you used?
Answer: It is very common to get all kinds of sources, so the more the person has worked with, the better for you. Common ones are SQL Server, CSV/TXT, Flat Files, Excel, Access, Oracle, MySQL, but also Salesforce and web data scraping.
Question: How do you apply business rules in SSIS (Transformations….Specific calculations but also cleansing)?
Answer: Some people use SSIS only to extract data and then do everything else with stored procedures; they are usually missing the point of the power of SSIS, which allows you to create "a flow" and apply certain rules at each step. This greatly simplifies the ETL process, and simplicity is very good.
Question: How do you quickly load data into a SQL Server table?
Answer: The Fast Load option. This option is not set by default, so most developers know this answer, as otherwise the load is very slow.
Question: Give an example of handling data quality issues?
Answer: Data quality is almost always a problem and SSIS handles it very well. Examples include importing customers from different sources where customer names can be duplicates. For instance, you can have as a company name: SQL Server Business Intelligence, but also SQL Server BI, SQL Server BI LTD, SQL Server BI Limited, or "inteligence" (with one l). There are different ways to handle it. Robust but time consuming is to create a table with all possible scenarios and update it after each load. You can also use Fuzzy Grouping, which is usually easy to implement and will usually make very good decisions, but it is not 100% accurate, so this approach has to be justified. Other typical quality issues are nulls (missing values), outliers (dates like 2999 or typos like 50000 instead of 5000, especially important if someone is adjusting a value to get a bigger bonus), and incorrect addresses; these are either corrected during ETL, ignored, redirected for further manual updates, or they fail the package, which for big processes is usually not practised.
Question: What is XMLify component?
Answer: It is a free 3rd-party component used rather frequently to output errors into an XML field, which saves development time.
Question: What command line tools do you use with SSIS ?
Answer: dtutil (deployment), dtexec (execution), dtexecui (generation of execution code)
Problem
When you are preparing for an SSIS
interview you need to understand what questions could be asked in the
interview. In this tip series, I will try to cover as much as I can to help you
prepare for an SSIS interview. In this tip we will cover SSIS basics and SSIS
event logging.
Solution
What
are SSIS Connection Managers?
- When we talk of integrating data, we are actually
pulling data from different sources and writing it to a destination. But
how do you get connected to the source and destination systems? This is
where the connection managers come into the picture. A connection manager
represents a connection to a system and includes the data provider
information, the server name, database name, authentication mechanism,
etc.
What
is the RetainSameConnection property and what is its impact?
- Whenever a task uses a connection manager to
connect to source or destination database, a connection is opened and
closed with the execution of that task. Sometimes you might need to open a
connection, execute multiple tasks and close it at the end of the
execution. This is where the RetainSameConnection property of the connection
manager might help you. When you set this property to TRUE, the connection
will be opened the first time it is used and will remain open until execution of
the package completes.
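A classic use case, sketched below in T-SQL under the assumption of two Execute SQL Tasks sharing one connection manager with RetainSameConnection set to TRUE (table names are hypothetical): a temporary table created by the first task is still visible to the second, because the connection is never closed between the tasks.

-- Executed by the first Execute SQL Task
CREATE TABLE #Work (OrderID INT PRIMARY KEY);

-- Executed by a later Execute SQL Task on the same retained connection;
-- #Work still exists because the connection was never closed
INSERT INTO #Work (OrderID)
SELECT OrderID FROM dbo.Orders;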
What
are source and destination adapters?
- A source adapter basically indicates a source in
Data Flow to pull data from. The source adapter uses a connection manager
to connect to a source and along with it you can also specify the query method
and query to pull data from the source.
- Similar to a source adapter, the destination
adapter indicates a destination in the Data Flow to write data to. Again
like the source adapter, the destination adapter also uses a connection
manager to connect to a target system and along with that you also specify
the target table and writing mode, i.e. write one row at a time or do a
bulk insert as well as several other properties.
- Please note, the source and destination adapters
can both use the same connection manager if you are reading and writing to
the same database.
What
is the Data Path and how is it different from a Precedence Constraint?
- Data Path is used in a Data Flow task to connect
to different components of a Data Flow and show transition of the data
from one component to another. A data path contains the meta information
of the data flowing through it, such as the columns, data type, size, etc.
When we talk about differences between the data path and precedence
constraint; the data path is used in the data flow, which shows the flow
of data. Whereas the precedence constraint is used in control flow, which
shows control flow or transition from one task to another task.
What
is a Data Viewer utility and what is it used for?
- The data viewer utility is used in Business
Intelligence Development Studio during development or when troubleshooting
an SSIS Package. The data viewer utility is placed on a data path to see
what data is flowing through that specific data path during execution. The
data viewer utility displays rows from a single buffer at a time, so you
can click on the next or previous icons to go forward and backward to
display data.
What
is an SSIS breakpoint? How do you configure it? How do you disable or delete
it?
- A breakpoint allows you to pause the execution of
the package in Business Intelligence Development Studio during development
or when troubleshooting an SSIS Package. You can right click on the task
in control flow, click on Edit Breakpoint menu and from the Set Breakpoint
window, you specify when you want execution to be halted/paused. For
example OnPreExecute, OnPostExecute, OnError events, etc. To toggle a
breakpoint, delete all breakpoints or disable all breakpoints, go to the
Debug menu and click on the respective menu item. You can even specify
different conditions to hit the breakpoint as well.
What
are the different SSIS log providers?
- There are several log providers to which you can write
execution data generated by SSIS events:
- SSIS log provider for Text files
- SSIS log provider for Windows Event Log
- SSIS log provider for XML files
- SSIS log provider for SQL Profiler
- SSIS log provider for SQL Server, which writes
the data to the msdb..sysdtslog90 or msdb..sysssislog table depending on
the SQL Server version.
How
do you enable SSIS event logging?
- SSIS provides a granular level of control in
deciding what to log and where to log. To enable event logging for an SSIS
Package, right click in the control flow area of the package and click on
Logging. In the Configure SSIS Logs window you will notice all the tasks
of the package are listed on the left side of the tree view. You can
specifically choose the tasks for which you want to enable logging. On the right
side you will notice two tabs; on the Providers and Logs tab you specify
where you want to write the logs, you can write it to one or more log
providers together. On the Details tab you can specify which events you
want to log for the selected task.
- Please note, enabling event logging is immensely
helpful when you are troubleshooting a package, but it also incurs additional
overhead on SSIS in order to log the events and information. Hence you
should only enable event logging when needed and only choose the events
which you want to log. Avoid logging all the events unnecessarily.
What
is the LoggingMode property?
- SSIS packages and all of the associated tasks or
components have a property called LoggingMode. This property accepts three
possible values:
- Enabled - to enable logging of that component,
- Disabled - to disable logging of that component
and
- UseParentSetting - to use parent's setting of
that component to decide whether or not to log the data.
What Are The Isolation Levels in SSIS?
ReadUncommitted:
Does not lock the records being read. This means that an uncommitted
change can be read and then rolled back by another client, resulting in a local
copy of a record that is not consistent with what is stored in the
database. This is called a dirty read because the data is inconsistent.
Chaos: Behaves the
same way as ReadUncommitted, but checks the isolation level of other pending
transactions during a write operation so that transactions with more
restrictive isolation levels are not overwritten.
ReadCommitted:
Locks the records being read and immediately frees the lock as soon as the
records have been read. This prevents any changes from being read before
they are committed, but it does not prevent records from being added, deleted,
or changed by other clients during the transaction.
RepeatableRead:
Locks the records being read and keeps the lock until the transaction
completes. This ensures that the data being read does not change during
the transaction.
Serializable: Locks the
entire data set being read and keeps the lock until the transaction
completes. This ensures that the data and its order within the database
do not change during the transaction.
Snapshot: The data read
within a transaction will never reflect changes made by other simultaneous
transactions. The transaction uses the data row versions that exist when
the transaction begins. No locks are placed on the data when it is read.
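These values map directly to the database engine's transaction isolation levels, so a rough T-SQL illustration (table and database names are hypothetical) looks like this:

-- Engine-side equivalent of the ReadCommitted setting
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;
    SELECT COUNT(*) FROM dbo.Orders; -- reads only committed rows
COMMIT TRANSACTION;

-- Snapshot isolation must first be allowed at the database level
ALTER DATABASE SalesDW SET ALLOW_SNAPSHOT_ISOLATION ON;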
1) Improved
Parallelism of Execution Trees:
The biggest performance improvement in SSIS 2008 is the incorporation of parallelism in the processing of execution trees. In SSIS 2005, each execution tree used a single thread, whereas in SSIS 2008 the data flow engine is redesigned to utilize multiple threads and take advantage of dynamic scheduling to execute multiple components in parallel, including components within the same execution tree.
2) Any .NET language for Scripting:
SSIS 2008 incorporates the new Visual Studio Tools for Applications (VSTA) scripting engine. The advantage of VSTA is that it enables users to use any .NET language for scripting.
3) New ADO.NET Source and Destination Component:
SSIS 2008 gets new Source and Destination components for ADO.NET record sets.
4) Improved Lookup Transformation:
In SSIS 2008, the Lookup Transformation has faster cache loading and lookup operations. It has new caching options, including the ability for the reference dataset to use a cache file (.caw) accessed by the Cache Connection Manager. In addition, the same cache can be shared between multiple Lookup Transformations.
5) New Data Profiling Task:
SSIS 2008 has a new debugging aid, the Data Profiling Task, that can help users analyze the data flowing through the package. In many cases, execution errors are caused by unexpected variations in the data that is being transferred. The Data Profiling Task can help users discover the source of these errors by giving better visibility into the data flow.
6) New Connections Project Wizard:
One of the main usability enhancements in SSIS 2008 is the new Connections Project Wizard, which guides the user through the steps required to create sources and destinations.
Q1 Explain
the architecture of SSIS?
SSIS architecture consists of four key parts:
a) Integration Services service: monitors running Integration Services packages and manages the storage of packages.
b) Integration
Services object model: includes managed API for accessing Integration
Services tools, command-line utilities, and custom applications.
c) Integration
Services runtime and run-time executables: it saves the layout of
packages, runs packages, and provides support for logging, breakpoints,
configuration, connections, and transactions. The Integration Services run-time
executables are the package, containers, tasks, and event handlers that
Integration Services includes, and custom tasks.
d) Data flow
engine: provides the in-memory buffers that move data from source to
destination.
Q2 How would you do Logging in SSIS?
Logging configuration provides an inbuilt feature which can log the details of various events like OnError, OnWarning, etc. to various destinations, say a flat file, a SQL Server table, XML or SQL Profiler.
Q3 How would you do Error Handling?
An SSIS package could mainly have two types of errors:
a) Procedure Error: can be handled in the Control Flow through precedence constraints and by redirecting the execution flow.
b) Data Error: is handled in the Data Flow Task by redirecting the data flow using the Error Output of a component.
Q4 How to pass property value at Run time? How do you implement Package Configuration?
A property value like the connection string of a Connection Manager can be passed to the package using package configurations. Package Configuration provides different options like XML File, Environment Variables, SQL Server Table, Registry Value or Parent package variable.
Q5 How would you deploy a SSIS Package on production?
A) Through Manifest
1. Create the deployment utility by setting its property to true.
2. It will be created in the bin folder of the solution as soon as the package is built.
3. Copy all the files in the utility and use the manifest file to deploy it on production.
B) Using the DtsExec.exe utility
C) Import the package directly into MSDB from SSMS by logging into Integration Services.
Q6 Difference between DTS and SSIS?
Everything, except that both are products of Microsoft :-).
Q7 What are the new features in BIDS?
Q9 What is Execution Tree?
Execution trees demonstrate how a package uses buffers and threads. At run time, the data flow engine breaks down Data Flow task operations into execution trees. These execution trees specify how buffers and threads are allocated in the package. Each tree creates a new buffer and may execute on a different thread. When a new buffer is created, such as when a partially blocking or blocking transformation is added to the pipeline, additional memory is required to handle the data transformation, and each new tree may also give you an additional worker thread.
Q11 You may get a question stating a scenario and then asking you how you would create a package for it, e.g. how would you configure a data flow task so that it can transfer data to different tables based on the city name in a source table column? (A Conditional Split transformation on the city column is the usual answer.)
Q13 Difference between Union All and Merge Join?
a) Merge transformation can accept only two inputs whereas Union all can take more than two inputs
b) Data has to be sorted before the Merge Transformation, whereas Union All doesn't have any condition like that.
Q14 You may get questions regarding what transformation X does. Lookup, Fuzzy Lookup and Fuzzy Grouping are my favorites for you.
Q15 How would you restart a package from the previous failure point? What are Checkpoints and how can we implement them in SSIS?
When a package is configured to use checkpoints, information about package execution is written to a checkpoint file. When the failed package is rerun, the checkpoint file is used to restart the package from the point of failure. If the package runs successfully, the checkpoint file is deleted, and then re-created the next time that the package is run.
Q16 Where are SSIS package stored in the SQL Server?
MSDB.sysdtspackages90 stores the actual content, and sysdtscategories, sysdtslog90, sysdtspackagefolders90, sysdtspackagelog, sysdtssteplog, and sysdtstasklog play supporting roles.
Q17 How would you schedule a SSIS packages?
Using SQL Server Agent; read about scheduling a job with SQL Server Agent. A minimal T-SQL sketch follows.
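As a sketch only, a file-system package can be scheduled from T-SQL with the msdb Agent procedures; the job name and package path below are hypothetical.

-- Create the job
EXEC msdb.dbo.sp_add_job @job_name = N'Nightly ETL';

-- Add an SSIS step; @command uses dtexec-style switches
EXEC msdb.dbo.sp_add_jobstep
    @job_name  = N'Nightly ETL',
    @step_name = N'Run LoadDW package',
    @subsystem = N'SSIS',
    @command   = N'/FILE "C:\ETL\LoadDW.dtsx"';

-- Target the job at the local server so the Agent will run it
EXEC msdb.dbo.sp_add_jobserver @job_name = N'Nightly ETL';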
Q18 Difference between asynchronous and synchronous transformations?
Asynchronous transformations have different input and output buffers, and it is up to the component designer in an async component to provide a column structure for the output buffer and hook up the data from the input. A synchronous transformation, in contrast, processes rows in place and reuses the input buffer for its output.
Q19 How to achieve parallelism in SSIS?
Parallelism is achieved using the MaxConcurrentExecutables property
of the package. Its default is -1, which is interpreted as the number of
processors + 2.
-More questions added-Sept 2011
Q20 How do you do incremental load?
The fastest way to do an incremental load is by
using a Timestamp column in the source table and storing the last ETL timestamp. In the
ETL process, pick all the rows having a Timestamp greater than the stored
Timestamp, so as to pick up only new and updated records.
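A minimal T-SQL sketch of that pattern; the control table, source table and column names are all hypothetical.

-- 1) Read the timestamp stored by the previous run
DECLARE @LastETL DATETIME2;
SELECT @LastETL = LastLoadedAt
FROM etl.LoadControl
WHERE TableName = N'Orders';

-- 2) Extract query for the source adapter: only new and updated rows
SELECT OrderID, Amount, ModifiedAt
FROM dbo.Orders
WHERE ModifiedAt > @LastETL;

-- 3) After a successful load, advance the stored timestamp
UPDATE etl.LoadControl
SET LastLoadedAt = (SELECT MAX(ModifiedAt) FROM dbo.Orders)
WHERE TableName = N'Orders';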
Question 3 – Can you explain different
options for dynamic
configurations in SSIS?
Answer: Use an XML file
Use custom variables
Use a database per environment with the variables
Use a centralized database with all variables
Question 4 – How do you upgrade an SSIS
Package?
Answer: Depending on the complexity of the package, one or
two techniques are typically used:
1) Recode the package based on the functionality in SQL Server DTS
2) Use the Migrate DTS 2000 Package wizard in BIDS then recode any portion of the package that is not accurate
1) What is Business
Intelligence (BI)?
Business Intelligence (BI) refers to technologies,
applications and practices for the collection, integration, analysis, and
presentation of business information and sometimes to the information itself.
The purpose of BI is to support better business decision making. Thus, BI is
also described as a decision support system (DSS).
BI systems provide historical, current, and
predictive views of business operations, most often using data that has been
gathered into a data warehouse or a data mart and occasionally working from
operational data.
2) What is SQL Server
Integration Services (SSIS)?
SSIS is designed
for data integration and processing applications. For example, processing of
data into a data mart or data warehouse, where data is used exclusively for
business intelligence (BI) analytics and reporting. OR, SSIS can be used to
move data from legacy systems into new systems during application migrations,
to integrate data from multiple systems, to cleanse data, to import data, to
define workflows to automate updating multi-dimensional cubes and automating
maintenance tasks for SQL Server databases.
3) Architecture of
Integration Services
SSIS architecture
has five main components:
1) SSIS Designer
SSIS Designer is a graphical tool that you can use
to create and maintain Integration Services packages.
2) Runtime engine
The Integration Services runtime saves the layout
of packages, runs packages, and provides support for logging, breakpoints,
configuration, connections, and transactions.
3) Tasks and other
executables
The Integration Services run-time executables are
the package, containers, tasks, and event handlers that Integration Services
includes. Run-time executables also include custom tasks that you develop.
4) Data Flow engine and
Data Flow components
The Data Flow task encapsulates the data flow
engine. The data flow engine provides the in-memory buffers that move data from
source to destination, and calls the sources that extract data from files and
relational databases. The data flow engine also manages the transformations
that modify data, and the destinations that load data or make data available to
other processes
5) Integration Services
Service
The Integration Services service lets you use SQL
Server Management Studio to monitor running Integration Services packages and
to manage the storage of packages.
4) What is ETL?
ETL is an acronym
for Extract, Transform, and Load and describes the processes
that take place in data warehousing environments for extracting data from
source transaction systems; transforming, cleaning, deduplicating, and
conforming the data; and finally loading it into cubes or other analysis
destinations.
5) What are the tabs
available in SSIS Designer?
SSIS Designer has four permanent
tabs: Control Flow | Data Flow | Event Handlers | Package Explorer. At
run time a fifth tab appears that shows the execution progress of a package:
Progress.
In SQL Server 2012 there are five permanent tabs.
The additional tab is named: Parameters.
6) What is the
difference between control flow and data flow?
The control flow is the workflow
engine and contains control flow tasks, containers, and precedence constraints,
which manage when tasks and containers execute.
The data flow, in contrast, is directly related to processing and transforming data from sources to destinations.
7) What are Precedence
constraints?
Precedence constraints are the control
flow connectors between tasks and containers. Usually, these connectors are
green, red, or blue, indicating the execution success, failure, or completion,
respectively. Precedence constraints allow you to define the
logical sequence of tasks in the order they should be executed. SSIS also
provides advanced constraint capabilities, which allow conditional
and/or expression evaluation criteria.
8) What are Data
Paths?
The connectors between objects in the data flow
are data paths.
There are two primary paths in the data flow:
Data paths The green connectors
that go from one component to another. For these paths, the rows that have
successfully gone through a component are output, as are error rows when the
error output is set to ignore failures.
Error paths The red connectors
between data flow components. They contain data rows that fail in a component
when the error rows are set to be redirected.
9) What is a Data
Viewer utility and what is it used for?
The data viewer utility is used for troubleshooting
data issues especially when you are not able to easily identify the problem row
or issue. SSIS includes the capability to watch rows in the data flow as they
are passing through the pipeline. SSIS implements this capability through data
viewers, a feature you can use when you are running packages in BIDS during
development. For any path in the data flow, you can add a data viewer that
pauses the execution of the data flow and displays the data in the data viewer
in one of four formats. Grid format is the only format available in SSIS 2012.
10) What is a
breakpoint in SSIS?
By using breakpoints during package debugging, you
can pause a package during execution to observe package state information so
that you can troubleshoot potential problems. Breakpoints are available
only in the control flow and not in the data flow.
11) How do you setup
the breakpoints in SSIS?
To set a breakpoint, select the task or container,
and either press F9 or navigate to the Debug/Toggle Breakpoint menu. You can
set multiple breakpoints in a package, and you can embed a breakpoint within a
Script Task at a line of code.
12) How do you see the
variable values during debug?
Open the Locals window to see all the variable
values and the package status. To open this window, choose Windows and then
Locals from the Debug menu.
13) List some of the
Common Tasks in SSIS
Execute SQL Task
Execute Process Task
Execute Package Task
Data Flow Task
File System Task
FTP Task
Script Task
Send Mail Task
Web Service Task
Bulk Insert Task
Data Profiling Task
Execute DTS 2000Package Task
ActiveX Script Task
Analysis Services Execute DDL Task
Analysis Services Processing Task
14) What are the
available Integration Services Containers?
There are three primary containers in SSIS: a
Sequence Container, a For Loop Container, and a Foreach Loop Container.
1) Sequence
Container Groups tasks and containers into control flows that are subsets
of the package control flow, and lets you apply transactions or assign logging
to the container.
2) For Loop
Container Provides the same functionality as the Sequence Container except
that it also lets you run the tasks within it
multiple times based on an evaluation condition, such as looping from 1 to 10.
3) Foreach Loop
Container Also allows looping, but instead of providing a condition
expression, you loop over a set of objects, such as files in a folder or a
record set.
4) Apart from
the above mentioned containers, there is one more container called the Task
Host Container which is not visible from the IDE, but every task is contained
in it (the default container for all the tasks).
15) What are variables
and what is variable scope?
A variable is used to store values. Within SSIS,
there are two types of variables: System Variable and User Variable. System
variables are not editable but can be referenced within tasks and containers.
Variables are referenced as User::[VariableName] or
System::[VariableName].
The scope defines at what level
within a package the variable is created. For example package level variables
are accessible to all the tasks in the package and container level variables
are accessible only to those tasks that are within the container.
16) What types of
objects are available in the Data Flow task toolbox?
The Data Flow Task has three types
of objects in the toolbox:
1) Data flow source
adapters
2) Data flow
transformations
3) Data flow destination
adapters
17) What are the Data
Flow Source adapters available in SSIS?
ADO.NET Source Provides
connections to tables or queries through an ADO.NET provider.
Excel Source Allows
extractions from an Excel worksheet defined in an Excel file.
Flat File Source Connects
to a delimited or fixed-width file created with different code pages.
OLE DB Source Connects
to installed OLE DB providers, such as SQL Server, Access, SSAS, and Oracle.
Raw File Source Stores
native SSIS data in a binary file type useful for data staging.
XML Source Allows
raw data to be extracted from an XML file; requires an XML schema to define
data associations.
18) What are the Data
FLOW Destination adapters available in SSIS?
ADO.NET Destination Allows
insertion of data by using an ADO.NET provider.
Data Mining Model Training Lets you pass
data from the data flow into a data mining model in SSAS.
DataReader Destination Lets you put
data in an ADO.NET record set that can be programmatically referenced.
Dimension Processing Lets SSAS
dimensions be processed directly from data flowing through the data flow.
Excel Destination Used for
inserting data into Excel, including Excel 2007.
Flat File Destination Allows
insertion of data to a flat file such as a comma-delimited or tab-delimited
file.
OLE DB Destination Uses the OLE DB
provider to insert rows into a destination system that allows an OLE DB
connection.
Partition Processing Allows SSAS
partitions to be processed directly from data flowing through the data flow.
Raw File Destination Stores native
SSIS data in a binary file type useful for data staging.
Recordset Destination Takes the data
flow data and creates a record set in a package variable of type Object.
SQL Server Compact Destination Lets you
send data to a mobile device running SQLMobile.
SQL Server Destination Provides a
high-speed destination specific to SQL Server 2008 if the package is running on
SQL Server.
19) What is a
Transformation?
Transformations give you the
ability to modify and manipulate data in the data flow. A transformation
performs an operation either on one row of data at a time or on several rows of
data at once.
20) What are the Data
Flow Transformations adapters available in SSIS?
Logical row-level transformations
1) Audit Adds
additional columns to each row based on system package variables such as ExecutionStartTime and PackageName.
2) Cache Transform Allows data
that will be used in a Lookup Transformation to be cached and available for
multiple Lookup components.
3) Character Map Performs
common text operations, such as Uppercase, and allows advanced linguistic bit
conversion operations.
4) Copy Column Duplicates
column values in each row to new named columns.
5) Data Conversion Creates new
columns in each row based on new data types converted from other columns—for
example, converting text to numeric.
6) Derived Column Uses the SSIS
Expression language to perform in-place calculations on existing values;
alternatively, allows the addition of new columns based on expressions and
calculations from other columns and variables.
7) Export Column Exports
binary large object (BLOB) columns, one row at a time, to a file.
8) Import Column Loads binary
files such as images into the pipeline; intended for a BLOB data type
destination.
9) Row Count Tracks the
number of rows that flow through the transformation and stores the number in a
package variable after the final row.
Multi-input or multi-output
transformations
1) Conditional Split Routes or
filters data based on Boolean expressions to one or more outputs, from which
each row can be sent out only one output path.
2) Lookup Allows
matching between pipeline column values to external database tables; additional
columns can be added to the data flow from the external table.
3) Merge Combines the
rows of two similar sorted inputs, one on top of the other, based on a defined
sort key.
4) Merge Join Joins the
rows of two sorted inputs based on a defined join column(s), adding columns
from each source.
5) Multicast Generates one
or more identical outputs, from which every row is sent out every output.
6) Union All Combines one
or more similar inputs, stacking rows one on top of another, based on matching
columns
Multi-row transformations
1) Aggregate Associates
records based on defined groupings and generates aggregations such as SUM, MAX, MIN,
and COUNT.
2) Percent Sampling Filters the
input rows by allowing only a defined percent to be passed to the output path.
3) Pivot Takes
multiple input rows and pivots the rows to generate an output with more columns
based on the original row values.
4) Row Sampling Outputs a
fixed number of rows, sampling the data from the entire input, no matter how
much larger than the defined output the input is.
5) Sort Orders the
input based on defined sort columns and sort direction and allows the removal
of duplicates across the sort columns.
6) Unpivot Takes a
single row and outputs multiple rows, moving column values to the new row based
on defined columns.
In the cases of the Sort, Aggregate, and Row
Sampling Transformations, all the input rows are blocked, allowing the
transformations to perform the work before sending rows down the output path.
These transformations often require more server resources, memory, and
processor capacity than do other transformations.
Advanced data-preparation
transformations
1) OLE DB Command Performs
database operations such as updates and deletes, one row at a time, based on
mapped parameters from input rows.
2) Slowly Changing
Dimension Processes dimension changes, including tracking dimension history
and updating dimension values. The Slowly Changing Dimension Transformation
handles these common dimension change types: Historical Attributes,
Fixed Attributes, and Changing Attributes.
3) Data Mining Query Applies input
rows against a data mining model for prediction.
4) Fuzzy Grouping Associates
column values with a set of rows based on similarity, for data cleansing.
5) Fuzzy Lookup Joins a data
flow input to a reference table based on column similarity. The Similarity
Threshold setting specifies the closeness of allowed matches—a high setting
means that matching values are closer in similarity.
6) Script Component Provides
VB.NET scripting capabilities against rows, columns, inputs, and outputs in the
data flow pipeline.
7) Term Extraction Analyzes text
input columns for English nouns and noun phrases.
8) Term Lookup Analyzes text
input columns against a user-defined set of words for association.
21) Difference between
Merge and Union All Transformation?
Both the Merge and Union All transforms provide
a way to combine multiple data flows into one, but there are some differences
between the two.
1) Merge transform can
accept only two inputs where as Union all can combine more than two inputs.
2) The Union All
transform does not have any requirements to be sorted like the Merge transform.
3) The Merge transform
is useful when you want to retain the sorted order of rows even when
the rows must be processed in different execution paths.
4) The Union All
transform is useful for rejoining previously split paths—for example, paths
that were split upstream by a Conditional Split or Multicast transform.
22) Difference between
Merge and Merge Join Transformation?
The Merge Join transform is similar to the Merge
transform in that it requires the inputs to be sorted and combines the rows
from the inputs into one output. However, whereas the Merge transform always
produces the same amount of rows as the number flowing into both inputs, the
Merge Join transform provides the added capability to do joins, which might
increase or decrease the total resulting row count.
23) Is there another
way to use Merge Join without sort transformation?
The Sort transformation can become slow on large
datasets. Merge Join requires sorted input. However, in order to get better
performance we can change the source itself and sort
the input data using a T-SQL ORDER BY, and then set the IsSorted and SortKeyPosition
properties of the Source Adapter.
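For example (table and key names are hypothetical), the source query does the sorting, and the adapter is then marked as sorted:

-- Let the engine sort instead of the Sort transformation. Afterwards set
-- IsSorted = True on the source output and SortKeyPosition = 1 on
-- CustomerID in the Advanced Editor of the source adapter.
SELECT CustomerID, CustomerName, City
FROM dbo.Customers
ORDER BY CustomerID;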
24) How can you
terminate a dataflow path without a destination?
The Multicast transform can be used for terminating
a data flow path without requiring a variable or connection manager. This comes
in handy when prototyping data flows. Just make sure the RunInOptimizedMode
property is set to FALSE so the Execution Engine won’t trim the Multicast’s
upstream execution tree.
25) What are the
performance considerations given while dealing with Slowly Changing Dimension?
The Slowly Changing Dimension Wizard and the Slowly
Changing Dimension transformation are general-purpose tools that meet the needs
of most users. However, after you start dealing with larger dimensional datasets,
you’ll probably want to modify the automatically generated data flow to improve
performance.
· Typically, the
slowest components in the Slowly Changing Dimension transformation are the OLE
DB Command transformations that perform UPDATEs against a single row at a time.
Therefore, the most effective way to improve the performance of the Slowly
Changing Dimension transformation is to replace the OLE DB Command
transformations. You can replace these transformations with destination
components that save all rows to be updated to a staging table. Then, you can
add an Execute SQL task that performs a single set-based Transact-SQL UPDATE
against all rows at the same time (see the first sketch after this list).
· Consider using a
Lookup transform with partial caching turned on right before the SCD to quickly
identify rows that don’t exist in the dimension table. Those rows that do not
exist can be directly inserted. This effectively eliminates the rows flowing
through the New Output from the SCD transform and increases the efficiency of
the SCD transform lookup step.
· Consider
alternatives to the SCD: in SQL 2008 the MERGE T-SQL construct allows the merging of
datasets. Instead of the SCD's row-by-row evaluation approach, the MERGE
operation is a set-based operation which compares the whole of the source
dataset to the reference dataset in a single pass. Be aware that
MERGE is a bulk operation: a single bad row will cause failure of the whole
batch (see the MERGE sketch after this list).
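Two hedged T-SQL sketches of the alternatives above; the dimension, staging table and column names are hypothetical.

-- (a) Set-based UPDATE from a staging table, replacing the row-by-row
--     OLE DB Command inside the SCD data flow
UPDATE d
SET d.City = s.City
FROM dbo.DimCustomer AS d
JOIN staging.CustomerUpdates AS s
    ON s.CustomerKey = d.CustomerKey;

-- (b) The SQL 2008 MERGE alternative: one set-based pass over the source
MERGE dbo.DimCustomer AS d
USING staging.Customer AS s
    ON s.CustomerKey = d.CustomerKey
WHEN MATCHED AND d.City <> s.City
    THEN UPDATE SET d.City = s.City
WHEN NOT MATCHED BY TARGET
    THEN INSERT (CustomerKey, City) VALUES (s.CustomerKey, s.City);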
26) What are the
command line tools to execute and manage SQL Server Integration Services packages?
DTExec - is a command line
tool for executing SSIS packages. The required switches must be passed into the
command for successful execution of the SSIS package.
DTUtil - is a command line tool used
for managing, moving, and deleting SSIS packages.
DTExecUI - is the
execution utility with a graphical user interface (GUI). You can use it to
build command lines for DTExec.exe, execute packages directly, and pass
parameters to the package.
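If a package has to be launched from T-SQL, one hedged option is shelling out to DTExec via xp_cmdshell (xp_cmdshell must be enabled, and the package path here is hypothetical):

-- Runs a file-system package through the DTExec command line
EXEC master.dbo.xp_cmdshell 'dtexec /FILE "C:\ETL\LoadDW.dtsx"';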
27) Can you explain how
to setup a checkpoint file in SSIS?
The following items
need to be configured on the properties tab of the SSIS package:
1) Set the SaveCheckpoints property
at the package level to True. This allows SSIS to save checkpoints during
package execution.
2) For the CheckpointFileName property,
provide a valid path and file name to a checkpoint file. Packages use files to
maintain their state information, so if a package fails and is then restarted,
the package can read the checkpoint file to determine where it left off and to
track the state information at the last successful task.
3) Set the CheckpointUsage property
to If Exists, which causes the package to run from the beginning if the
checkpoint file is not present or to run from the identified point if the file
exists.
After you enable checkpoints in a package, the
final step is to set checkpoints at the various tasks within your package. To
do this, set the FailPackageOnFailure property at each task or container to
True.
28) What are the most
common log events in SSIS?
The most common log events are:
OnError
OnPostExecute
OnPreExecute
OnProgress
OnTaskFailed
The OnPreExecute and OnPostExecute events are
captured at the start and end of each container and task when the package runs
and allow you to determine the execution times of each task.
29) If you select SQL
Server as the log provider, where will be the log details saved?
If you select SQL Server as the log provider, the
log details are stored in a system table named dbo.sysssislog.
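For example, recent failures can be pulled out of that table with a query like the one below; the table lives in the database targeted by the log provider's connection manager, often msdb.

-- Most recent error and task-failure events first
SELECT source, event, starttime, endtime, message
FROM dbo.sysssislog
WHERE event IN (N'OnError', N'OnTaskFailed')
ORDER BY starttime DESC;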
30) What are the formats
available in Flat file Connection Manager?
There are three formats available: Delimited, Fixed
Width, and Ragged Right.
Delimited The
columns have some delimiter, such as a semicolon, tab, or comma. Comma
separated value (CSV) files are an example of delimited data.
Fixed Width The
data is aligned into columns whose width is the same for all rows.
Ragged Right The same as fixed
width with one exception: Each row is delimited, by default, with a CR/LF
combination but can be configured to use other delimiting characters.
Ragged-right flat files are a little more flexible because the last column can
be of different widths.
31) How to strip out
double quotes from an import file in SQL Server Integration Services?
On the Flat File Connection Manager Editor, enter
the double-quote character into the Text qualifier property.
The text qualifier specifies what character the
file uses to indicate the beginning and end of a string. Typically, the
character is a double quote or single quote, but other characters can be used
as well.
32) How do you upgrade
an SSIS Package?
There are several options
available for migrating to Integration Services 2008
The Upgrade Advisor tells you what to expect when
you migrate.
The Migration Wizard gets you started on migrating
DTS packages, and the Execute DTS 2000 Packages Task allows you to embed DTS
packages into Integration Services packages.
SSIS 2005 package upgrades are smooth and mostly
trouble free. The easiest ways are:
By loading an SSIS 2005 project inside BIDS
By right-clicking on the SSIS Packages folder of an
SSIS project and selecting the Upgrade All Packages menu option
33) Which option needs
to be enabled to ensure the rows are inserted quickly in a Dataflow
Destination?
Edit the OLE DB Destination and set the Data Access
Mode to Table or view - fast load. When the OLE DB Destination is not configured
with fast load, only one row at a time will be inserted into the destination
table.
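As a loose T-SQL analogy only (file and table names are hypothetical): without fast load the destination behaves like one INSERT statement per row, while fast load pushes batches through the bulk-load API, much closer to a BULK INSERT.

-- Row-by-row behaviour (fast load off): one statement per row
INSERT INTO dbo.SalesStage (OrderID, Amount) VALUES (42, 19.99);

-- Fast-load behaviour: batched bulk loading, roughly comparable to
BULK INSERT dbo.SalesStage
FROM 'C:\data\sales.dat'
WITH (TABLOCK, BATCHSIZE = 10000);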
34) What does
the RunInOptimizedMode setting mean?
RunInOptimizedMode is a data flow-specific
property that tells the Data Flow Task to attempt to trim unused components.
The data flow engine trims components by ignoring them. You might experience
some performance gains from this setting because it eliminates components that
have no side effects.
35) What is new in
Integration Services 2012
36) What are the
discontinued functionalities in Integration Services (SQL Server 2012)?
Data Source Views have been discontinued in this
release.
With SQL Server 2012, data sources and data source
views cannot be added to a project. Instead, these connections can be added to
the individual packages. In prior versions, data sources could be added as
connection managers and were allowed to refer to the included DSVs as the
tables in source components. Using this methodology to access data on SQL
Server was not the optimal way to extract data.
Data Viewer
Previous versions of SSIS included four types of
data viewer—the Grid, Histogram, Scatterplot, and Column Chart data viewers.
Because the Grid data viewer is the most commonly used, the SSIS team removed
the other types of data viewers from SSIS 2012.
Data Transformation Services (DTS)
DTS was supported as a legacy application in SQL
Server 2005, 2008, and 2008 R2. DTS is no longer supported with this newest
release of SQL Server. This means that the Execute DTS 2000 Package task, the
DTS runtime and API, and Package Migration Wizard are all going away.
Additionally, the ActiveX Script task, which was
provided strictly for DTS support, will be removed. Many of the tasks that
ActiveX Script tasks were used for can be handled with precedence constraints,
while more complex tasks can be rewritten as Script tasks.
What is the
difference between data warehousing and business intelligence?
Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart including meta data management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, etc. Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of their business such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions, etc. Typically, the term “business intelligence” is used to encompass OLAP, data visualization, data mining and query/reporting tools