Thursday, October 13, 2016

SSIS INTERVIEW QUESTIONS AND ANSWERS : SSIS BASICS TO ADVANCED QUESTIONS AND ANSWERS


• What do we mean by dataflow in SSIS?
Data flow is the movement of data from sources to destinations. Along the way, data flow transformations modify the data to make it ready for the data warehouse.

• What is a breakpoint in SSIS? How is it setup? How do you disable it?
A breakpoint is a stopping point in package execution. A breakpoint gives the developer or DBA an
opportunity to review the status of the data, variables and the overall state of the SSIS package.
10 unique conditions exist for each breakpoint.
Breakpoints are setup in BIDS. In BIDS, navigate to the control flow interface. Right click on the
object where you want to set the breakpoint and select the ‘Edit Breakpoints…’ option.

• Can you name 5 or more of the native SSIS connection managers?
1) OLEDB connection – Used to connect to any data source that exposes an OLE DB provider (e.g.,
SQL Server)
2) Flat file connection – Used to make a connection to a single file in the File System. Required for reading information from a File System flat file
3) ADO.Net connection – Uses the .Net Provider to make a connection to SQL Server 2005 or other
connection exposed through managed code (like C#) in a custom task
4) Analysis Services connection – Used to make a connection to an Analysis Services database or project. Required for the Analysis Services DDL Task and Analysis Services Processing Task
5) File connection – Used to reference a file or folder. The options are to either use or create a file or folder
6) Excel
• What is the use of Bulk Insert Task in SSIS?
The Bulk Insert Task is used to load large amounts of data from flat files into SQL Server tables. It supports only OLE DB connections for the destination database.
• What is Conditional Split transformation in SSIS?
This works like an IF condition: it evaluates an expression for each row and, based on the result, sends the row to the appropriate OUTPUT path. It has ONE input and MANY outputs. The Conditional Split transformation is used to route rows to different outputs based on conditions. For example, we can send students with marks of 40 or more to one path and students with marks below 40 to another path.
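The routing logic can be sketched in Python (an illustrative analogy only, not SSIS itself; the student data and output names are invented):

```python
# Illustrative sketch of Conditional Split: evaluate a condition per row
# and route each row to the matching output path.
students = [
    {"name": "Asha", "marks": 72},
    {"name": "Ravi", "marks": 35},
    {"name": "Meena", "marks": 40},
]

passed, failed = [], []
for row in students:
    # Condition checked per row, like an SSIS expression on a column
    if row["marks"] >= 40:
        passed.append(row)   # output path 1
    else:
        failed.append(row)   # output path 2 (the Default Output)

print(len(passed), len(failed))  # 2 1
```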

• How do you eliminate quotes from being uploaded from a flat file to SQL Server? 
This can be done using TEXT QUALIFIER property. In the SSIS package on the Flat File Connection Manager Editor, enter quotes into the Text qualifier field then preview the data to ensure the quotes are not included.
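For illustration, Python's csv module exposes the same idea through its quotechar parameter (an analogy to the Text qualifier property, not SSIS itself; the sample data is invented):

```python
import csv
import io

# A flat file where values are wrapped in quotes (the text qualifier).
flat_file = io.StringIO('"id","name"\n"1","O\'Brien"\n"2","Smith, J"\n')

# quotechar plays the role of the Text qualifier: quotes are stripped
# and delimiters embedded inside quoted fields are preserved.
reader = csv.reader(flat_file, delimiter=",", quotechar='"')
rows = list(reader)
print(rows[2])  # ['2', 'Smith, J'] - no quotes, comma kept inside the field
```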

• Can you explain how to setup a checkpoint file in SSIS?
The following items need to be configured on the properties tab for SSIS package:
CheckpointFileName – Specify the full path to the Checkpoint file that the package uses to save the values of package variables and log completed tasks. Rather than hard-coding the path, it’s a good idea to use an expression that concatenates a path defined in a package variable and the package name.
CheckpointUsage – Determines if/how checkpoints are used. Choose from these options: Never(default), IfExists, or Always. Never indicates that you are not using Checkpoints. IfExists is the typical setting and implements the restart at the point of failure behavior. If a Checkpoint file is found it is used to restore package variable values and restart at the point of failure. If a Checkpoint file is not found the package starts execution with the first task. The Always choice raises an error if the Checkpoint file does not exist.
SaveCheckpoints – Choose from these options: True or False (default). You must select True to implement the Checkpoint behavior.
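As a rough sketch of how these three properties cooperate, the restart-from-failure behavior can be imitated in Python (the file name, task names and failure are all hypothetical; real SSIS checkpoints also persist variable values):

```python
import json
import os

CHECKPOINT = "pkg.chk"  # hypothetical file, standing in for CheckpointFileName

def run_package(tasks):
    # IfExists behavior: load completed-task names if a checkpoint file exists
    done = set(json.load(open(CHECKPOINT))) if os.path.exists(CHECKPOINT) else set()
    for name, task in tasks:
        if name in done:
            continue              # restart at point of failure: skip finished work
        task()
        done.add(name)
        with open(CHECKPOINT, "w") as f:
            json.dump(sorted(done), f)   # SaveCheckpoints = True
    os.remove(CHECKPOINT)         # package succeeded: discard the checkpoint file
```

On the first (failed) run the file records which tasks completed; the rerun skips them and resumes at the failed task.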

• What are the different values you can set for CheckpointUsage property ?
There are three values, which describe how a checkpoint file is used during package execution:
1) Never: The package will not use a checkpoint file and therefore will never restart.
2) IfExists: If a checkpoint file exists in the place you specified for the CheckpointFileName property, then it will be used, and the package will restart according to the checkpoints written.
3) Always: The package will always use a checkpoint file to restart, and if one does not exist, the package will fail.
• What is the ONLY Property you need to set on TASKS in order to configure CHECKPOINTS to RESTART package from failure?
The one property you have to set on the task is FailPackageOnFailure. This must be set for each task or container that you want to be the point for a checkpoint and restart. If you do not set this property to true and the task fails, no checkpoint file will be written, and the next time you invoke the package, it will start from the beginning again.

• Where can we set the CHECKPOINTS, in DataFlow or ControlFlow ?
Checkpoints only happen at the Control Flow; it is not possible to checkpoint transformations or restart inside a Data Flow. The Data Flow Task can be a checkpoint, but it is treated as any other task.
• Can you explain different options for dynamic configurations in SSIS?
1) XML file
2) custom variables
3) Database per environment with the variables
4) Use a centralized database with all variables

• What is the use of Percentage Sampling transformation in SSIS?
The Percentage Sampling transformation is generally used for data mining. It builds a random sample of output rows by choosing a specified percentage of input rows. For example, if the input has 1000 rows and the percentage is set to 10, the transformation returns a RANDOM 10% of the records from the input data.
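The idea can be sketched in Python (illustrative only; note the real transformation decides row by row, so its selected count is approximate rather than exact):

```python
import random

# Illustrative sketch of Percentage Sampling: choose ~10% of rows at random,
# producing a "selected" output and an "unselected" output.
rows = list(range(1000))
sample_size = len(rows) * 10 // 100          # 10 percent
sampled = random.sample(rows, sample_size)   # the "selected" output
chosen = set(sampled)
unsampled = [r for r in rows if r not in chosen]   # the "unselected" output
print(len(sampled), len(unsampled))          # 100 900
```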

• What is the use of Term Extraction transformation in SSIS?
The Term Extraction transformation is used to extract nouns, noun phrases, or both, from English text only. It extracts terms from text in a transformation input column and then writes the terms to a transformation output column. It can also be used to discover the content of a dataset.
• What is Data Viewer and what are the different types of Data Viewers in SSIS?
A Data Viewer allows you to view the data in the pipeline at a point in time at runtime. If a Data Viewer is placed before and after the Aggregate transformation, we can see the data flowing into the transformation at runtime and what it looks like after the transformation has occurred. The different types of data viewers are:
1. Grid
2. Histogram
3. Scatter Plot
4. Column Chart.
• What is Ignore Failure option in SSIS?
With the Ignore Failure option, the error is ignored and the data row continues on to the next transformation. (The related Redirect Row option sends failing rows to an error output instead.) Say you have some JUNK data (wrong data type or bad values) flowing from the source; using these error output options in SSIS we can REDIRECT the junk records to another output instead of FAILING the package. This helps MOVE only valid data to the destination while the JUNK is captured into a separate file.
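The three error-handling choices on a data flow column (Fail Component, Ignore Failure, Redirect Row) can be sketched in Python (names and data are invented for illustration):

```python
# Illustrative sketch of SSIS error output options for a failing conversion.
def convert_rows(rows, on_error="redirect"):
    good, junk = [], []
    for raw in rows:
        try:
            good.append(int(raw))          # the "transformation": cast to int
        except ValueError:
            if on_error == "fail":
                raise                      # Fail Component (the default)
            elif on_error == "ignore":
                good.append(None)          # Ignore Failure: keep row, null value
            else:
                junk.append(raw)           # Redirect Row: send to error output
    return good, junk

good, junk = convert_rows(["1", "oops", "3"])
print(good, junk)  # [1, 3] ['oops']
```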
• Which are the different types of Control Flow components in SSIS?
The different types of Control Flow components are: Data Flow Tasks, SQL Server Tasks, Data Preparation Tasks, Work flow Tasks, Scripting Tasks, Analysis Services Tasks, Maintenance Tasks, Containers.
• What are containers? What are the different types of containers in SSIS?
Containers are objects that provide structures to packages and extra functionality to tasks. There are four types of containers in SSIS, they are: Foreach Loop Container, For Loop Container, Sequence Container and Task Host Container.
• What are the different types of Data flow components in SSIS?
There are 3 data flow components in SSIS.
1. Sources
2. Transformations
3. Destinations
• What are the different types of data sources available in SSIS?
There are 7 types of data sources provided by SSIS: a.) Data Reader source b.) Excel source c.) Flat file source d.) OLEDB source e.) Raw file source f.) XML source g.) Script component
• What is SSIS Designer?
It is a graphical tool for creating packages. It has 4 tabs: Control Flow, Data Flow, Event Handlers and Package Explorer.
• What is Control Flow tab?
It is the tab in SSIS designer where various Tasks can be arranged and configured. This is the tab where we provide and control the program flow of the project. 
 In SSIS a workflow is called a control-flow. A control-flow links together our modular data-flows as a series of operations in order to achieve a desired result.

A control flow consists of one or more tasks and containers that execute when the package runs. To control order or define the conditions for running the next task or container in the package control flow, you use precedence constraints to connect the tasks and containers in a package. A subset of tasks and containers can also be grouped and run repeatedly as a unit within the package control flow
• What is Data Flow tab?
This is the tab where we do all the work related to ETL job. It is the tab in SSIS Designer where we can extract data from sources, transform the data and then load them into destinations.
 A data flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations. Before you can add a data flow to a package, the package control flow must include a Data Flow task.
• What is the function of control flow tab in SSIS?
On the control flow tab, the tasks including dataflow task, containers and precedence constraints that connect containers and tasks can be arranged and configured.
• What is the function of Event handlers tab in SSIS?
On the Event handlers tab, workflows can be configured to respond to package events.
For example, we can configure a workflow to run when ANY task Fails, Stops or Starts.
• What is the function of Package explorer tab in SSIS?
This tab provides an explorer view of the package. You can see what is happening in the package. The Package is a container at the top of the hierarchy.
• What is Solution Explorer?
It is a place in SSIS Designer where all the projects, Data Sources, Data Source Views and other miscellaneous files can be viewed and accessed for modification.
• How do we convert data type in SSIS?
The Data Conversion Transformation in SSIS converts the data type of an input column to a different data type.
• How are variables useful in ssis package?
Variables can provide communication among objects in the package. Variables can provide communication between parent and child packages. Variables can also be used in expressions and scripts. This helps in providing dynamic values to tasks.
 Variables store values that a SSIS package and its containers, tasks, and event handlers can use at run time. The scripts in the Script task and the Script component can also use variables. The precedence constraints that sequence tasks and containers into a workflow can use variables when their constraint definitions include expressions.

Integration Services supports two types of variables: user-defined variables and system variables. User-defined variables are defined by package developers, and system variables are defined by Integration Services. You can create as many user-defined variables as a package requires, but you cannot create additional system variables.

Scope : A variable is created within the scope of a package or within the scope of a container, task, or event handler in the package. Because the package container is at the top of the container hierarchy, variables with package scope function like global variables and can be used by all containers in the package. Similarly, variables defined within the scope of a container such as a For Loop container can be used by all tasks or containers within the For Loop container.
• Explain Aggregate Transformation in SSIS?
It aggregates data, similar to applying T-SQL aggregate functions such as Group By, Min, Max, Avg, and Count. For example, you can get the total quantity and total line items for each product in the Aggregate Transformation Editor. First you choose the input columns, then the output column name in the Output Alias column of the data grid, and also the operation for each output alias in the Operation column of the same grid. Some of the operation functions are listed below:
• Group By
• Average
• Count
• Count Distinct : counts distinct, non-null column values
• Min, Max, Sum

  • In the Aggregate transformation, Keys and Keys scale refer to the number of groups that are expected to result from a Group by operation.
  • Count distinct keys and Count distinct scale refer to the number of distinct values that are expected to result from a Distinct count operation.
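A rough Python sketch of a Group By with Sum and Count operations (illustrative only; the product data is invented):

```python
from collections import defaultdict

# Illustrative sketch of the Aggregate transformation: Group By product,
# Sum(quantity) and Count(line items), like the Operation column choices.
lines = [("widget", 5), ("gadget", 2), ("widget", 3)]

totals = defaultdict(lambda: {"qty": 0, "items": 0})
for product, qty in lines:
    totals[product]["qty"] += qty      # Sum operation
    totals[product]["items"] += 1      # Count operation

print(dict(totals))  # widget: qty 8 over 2 items; gadget: qty 2 over 1 item
```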
• Explain Audit Transformation ?
It allows you to add auditing information, as required in regulated environments such as those covered by HIPAA and Sarbanes-Oxley (SOX). Auditing options that you can add to transformed data through this transformation are:
1. Execution Instance GUID : ID of the execution instance of the package
2. PackageID : ID of the package
3. PackageName
4. VersionID : GUID version of the package
5. Execution StartTime
6. MachineName
7. UserName
8. TaskName
9. TaskID : GUID of the data flow task that contains the Audit transformation.
• Explain Character Map Transformation?
It performs character-level transformations on string data. It gives you the option to either overwrite the existing column with the result or add the result as a new column. If you add it as a new column, specify the new column name. Operations available here are:
1. Uppercase
2. Lowercase
3. Byte reversal : such as from 0x1234 to 0x4321
4. Full width
5. Half width
6. Hiragana/katakana/traditional Chinese/simplified Chinese
7. Linguistic casing
• Explain Conditional split Transformation ?
It functions like an if…then…else construct. It sends each input row to the output branch whose condition it satisfies. For example, you may want to split rows on product quantity: less than 500 versus greater than or equal to 500. You can give each condition a name that easily identifies its purpose. The else case is covered by the Default Output name.
After you configure the component and connect it to a subsequent transformation or destination, a dialog box pops up letting you choose which conditional output applies to that transformation/destination.
• Explain Copy column Transformation?
This component simply copies a column into a new column, just like an aliased column in T-SQL.
• Explain Data conversion Transformation?
This component converts data types, similar to the T-SQL functions CAST or CONVERT. If you wish to convert data from one type to another, this is the best bet. But please make sure the column contains COMPATIBLE data.
• Explain Data Mining query Transformation?
This component runs prediction queries against a data mining model, or fills gaps in the data. Some good scenarios for this component are:
1. Taking input columns such as number of children, household income, and marital status to predict whether someone owns a house.
2. Predicting what a customer would buy, based on analysis of the buying patterns in their shopping cart.
3. Filling in blank data or default values when a customer doesn’t fill in some items in a questionnaire.
• Explain Derived column Transformation?
Derived Column creates a new column, or replaces an existing one, from the result of an expression over one or more input columns. You can copy an existing column directly or build a new column from several columns.
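As a sketch, the effect of a derived-column expression concatenating two columns looks like this in Python (an analogy only; the column names are invented):

```python
# Illustrative sketch of Derived Column: build a new column from existing
# ones, the way an SSIS expression like first + " " + last would.
rows = [
    {"first": "Ada", "last": "Lovelace"},
    {"first": "Alan", "last": "Turing"},
]

for row in rows:
    row["full_name"] = f'{row["first"]} {row["last"]}'  # the derived column

print(rows[0]["full_name"])  # Ada Lovelace
```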
• Explain Merge Transformation?
The Merge transformation merges two paths into a single path. It is useful when you want to branch data out to a path that handles errors and, after the errors are handled, merge the rows back into the main downstream path, or when you want to merge two data sources. It is similar to the Union All transformation, but Merge has some restrictions:
1. The data on both inputs must be sorted.
2. The data type, data length and other metadata attributes must match before the inputs can be merged.
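The sorted-input requirement is the key difference from Union All; a Python sketch (using heapq.merge as a stand-in for the Merge transformation, with invented data):

```python
import heapq

# Illustrative sketch of Merge vs. Union All: Merge needs both inputs
# pre-sorted and interleaves them so the single output stays sorted.
path_a = [1, 4, 7]   # must already be sorted
path_b = [2, 3, 9]   # must already be sorted

merged = list(heapq.merge(path_a, path_b))   # Merge: order preserved
union_all = path_a + path_b                  # Union All: simple concatenation

print(merged)     # [1, 2, 3, 4, 7, 9]
print(union_all)  # [1, 4, 7, 2, 3, 9]
```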
• Explain Merge Join Transformation?
The Merge Join transformation merges the output of two inputs, performing an INNER or OUTER join on the data. If the data comes from a single OLE DB data source, it is better to join it in the SQL query rather than using Merge Join. Merge Join is intended for joining two different data sources.
• Explain Multicast Transformation?
This transformation sends its output to multiple output paths with no conditions, unlike Conditional Split. It takes ONE input, makes COPIES of the data, and passes the same data through many outputs. In short: one input, many outputs of the same data.
• Explain Percentage row sampling Transformations?
This transformation takes data from the source and samples it randomly. It gives you two outputs: the first is the selected data and the second is the unselected data. It is used in situations where you are training a data mining model. The two outputs let you take a SAMPLE of data from the input.
• Explain Sort Transformation?
This component will sort data, similar in TSQL command ORDER BY. Some transformations need sorted data.
• Explain Union all Transformation?
Unlike the Merge transformation, it does not require sorted input. It can take output from two or more input paths and combine them into a single output path.
• What are the possible locations to save an SSIS package?
A package can be saved to any of the following locations:
SQL Server
Package Store
File System
• What is a package?
A discrete executable unit of work composed of a collection of control flow and other objects, including data sources, transformations, process sequence, and rules, errors and event handling, and data destinations.
• What is a workflow in SSIS?
A workflow is a set of instructions on how to execute tasks.
• What is the difference between control flow items and data flow items?
The control flow is the highest-level control process. It allows you to manage the run-time activities of data flows and other processes within a package.
When you want to extract, transform and load data within a package, you add an SSIS Data Flow task to the package control flow.
• What are the main components of SSIS (project architecture)?
SSIS architecture has 4 main components:
1.ssis service
2.ssis runtime engine & runtime executables
3.ssis dataflow engine & dataflow components
4.ssis clients
• Different components in SSIS package?
Control flow
Data flow
Event handler
Package explorer
• What are Connection Managers?
A connection manager is a bridge between a package object and physical data. It provides a logical representation of a connection at design time; the properties of the connection manager describe the physical connection that Integration Services creates when the package is run.
• What is environment variable in SSIS?
An environment variable configuration sets a package property equal to the value in an environment variable.
Environmental configurations are useful for configuring properties that are dependent on the computer that is executing the package.
• How to provide security to packages?
We can provide security in two ways
1. Package encryption
2. Password protection.
• What are Precedence constraints?
Constraints that link executables, containers, and tasks within the package control flow and specify the conditions that determine both the sequence in which executables run and whether they run at all.
• What is Design time Deployment in SSIS ?
When you run a package from within BIDS, it is built and temporarily deployed to a folder. By default the package is deployed to the BIN folder in the package’s project folder, though you can configure a custom deployment folder. When the package’s execution completes and stops in BIDS, the deployed package is deleted; this is called Design Time Deployment.

SSIS - SQL Server Integration Services
Q: What is SSIS? How it related with SQL Server.
SQL Server Integration Services (SSIS) is a component of SQL Server which can be used to perform a wide range of Data Migration and ETL operations. SSIS is a component in MSBI process of SQL Server.
This is a platform for Integration and Workflow applications. It is known for a fast and flexible OLTP and OLAP extensions used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and multidimensional data sets.
Q: What are the tools associated with SSIS?
We use Business Intelligence Development Studio (BIDS) and SQL Server Management Studio (SSMS) to work with Development of SSIS Projects.
We use SSMS to manage the SSIS Packages and Projects.
Q: What are the differences between DTS and SSIS?
Data Transformation Services (DTS) vs. SQL Server Integration Services (SSIS):
• Error handling: DTS has limited error handling; SSIS has complex and powerful error handling.
• Scripting: DTS uses message boxes in ActiveX scripts; SSIS uses message boxes in .NET scripting.
• Deployment: DTS has no deployment wizard; SSIS has an interactive Deployment Wizard.
• Transformations: DTS has a limited set of transformations; SSIS has a good number of transformations.
• BI: DTS has no BI functionality; SSIS has complete BI integration.

Q: What is a workflow in SSIS?
A workflow is a set of instructions that tells the runtime engine how to execute the tasks and containers within an SSIS package.
 
Q: What is the control flow?
A control flow consists of one or more tasks and containers that execute when the package runs. To control order or define the conditions for running the next task or container in the package control flow, we use precedence constraints to connect the tasks and containers in a package. A subset of tasks and containers can also be grouped and run repeatedly as a unit within the package control flow. SQL Server 2005 Integration Services (SSIS) provides three different types of control flow elements: Containers that provide structures in packages, Tasks that provide functionality, and Precedence Constraints that connect the executables, containers, and tasks into an ordered control flow.
Q: What is a data flow?
A data flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations. The Data Flow task is the executable within the SSIS package that creates, orders, and runs the data flow. A separate instance of the data flow engine is opened for each Data Flow task in a package. Data Sources, Transformations, and Data Destinations are the three important categories in the Data Flow.
Q: How does Error-Handling work in SSIS
When a data flow component applies a transformation to column data, extracts data from sources, or loads data into destinations, errors can occur. Errors frequently occur because of unexpected data values.
Typical types of errors in SSIS:
-Data Connection Errors, which occur when the connection manager cannot be initialized with the connection string. This applies to both data sources and data destinations, as well as control flow elements that use connection strings.
-Data Transformation Errors, which occur while data is being transformed over a data pipeline from source to destination.
-Expression Evaluation Errors, which occur if expressions that are evaluated at run time perform invalid operations or become syntactically incorrect because of missing or incorrect data values.
Q: What is environment variable in SSIS?
An environment variable configuration sets a package property equal to the value in an environment variable.
Environmental configurations are useful for configuring properties that are dependent on the computer that is executing the package.
Q: What are the Transformations available in SSIS?

AGGREGATE - Applies aggregate functions to record sets to produce new output records from aggregated values.
AUDIT - Adds package- and task-level metadata, such as machine name, execution instance, package name, package ID, etc.
CHARACTER MAP - Performs string-level changes, such as changing data from lower case to upper case.
CONDITIONAL SPLIT – Separates available input into separate output pipelines based on Boolean Expressions configured for each output.
COPY COLUMN - Add a copy of column to the output we can later transform the copy keeping the original for auditing.
DATA CONVERSION - Converts columns data types from one to another type. It stands for Explicit Column Conversion.
DATA MINING QUERY – Used to perform data mining query against analysis services and manage Predictions Graphs and Controls.
DERIVED COLUMN - Create a new (computed) column from given expressions.
EXPORT COLUMN – Used to export an image (BLOB) column from the database to a file.
FUZZY GROUPING – Used for data cleansing by finding rows that are likely duplicates.
FUZZY LOOKUP -  Used for Pattern Matching and Ranking based on fuzzy logic.
IMPORT COLUMN - Reads data from files into an image-type column in the data flow.
LOOKUP - Performs the lookup (searching) of a given reference object set against a data source. It is used for exact matches only.
MERGE - Merges two sorted data sets into a single data flow.
MERGE JOIN - Merges two data sets into a single dataset using a join function.
MULTI CAST - Sends a copy of supplied Data Source onto multiple Destinations.
ROW COUNT - Stores the resulting row count from the data flow / transformation into a variable.
ROW SAMPLING - Captures a sample of data by specifying an exact number of rows to take from the data flow.
UNION ALL - Merge multiple data sets into a single dataset.
PIVOT – Converts rows into columns, denormalizing the data for analysis.
UNPIVOT – Converts columns into rows, normalizing the data structure, e.g. when building data warehouses.
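The Unpivot idea can be sketched in Python (illustrative only; the column names are invented):

```python
# Illustrative sketch of Unpivot: turn repeating columns into rows
# (normalizing), e.g. one row per product per quarter instead of a
# column per quarter.
wide = {"product": "widget", "q1": 10, "q2": 15, "q3": 12}

long_rows = [
    {"product": wide["product"], "quarter": q, "sales": wide[q]}
    for q in ("q1", "q2", "q3")
]

print(long_rows[1])  # {'product': 'widget', 'quarter': 'q2', 'sales': 15}
```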
Q: How to log SSIS Executions?
SSIS includes logging features that write log entries when run-time events occur and can also write custom messages. This is not enabled by default. Integration Services supports a diverse set of log providers, and gives you the ability to create custom log providers. The Integration Services log providers can write log entries to text files, SQL Server Profiler, SQL Server, Windows Event Log, or XML files. Logs are associated with packages and are configured at the package level. Each task or container in a package can log information to any package log. The tasks and containers in a package can be enabled for logging even if the package itself is not.
Q: How do you deploy SSIS packages.
BUILDing an SSIS project produces a deployment manifest file. We run the manifest file and decide whether to deploy onto the file system or onto SQL Server [msdb]. SQL Server deployment is faster and more secure than file system deployment. Alternatively, we can also import the package from SSMS, from the file system or SQL Server.
Q: What are variables and what is variable scope ?
Variables store values that a SSIS package and its containers, tasks, and event handlers can use at run time. The scripts in the Script task and the Script component can also use variables. The precedence constraints that sequence tasks and containers into a workflow can use variables when their constraint definitions include expressions. Integration Services supports two types of variables: user-defined variables and system variables. User-defined variables are defined by package developers, and system variables are defined by Integration Services. You can create as many user-defined variables as a package requires, but you cannot create additional system variables.

Q: Can you name five of the Perfmon counters for SSIS and the value they provide?
§  SQLServer:SSIS Service
§  SSIS Package Instances
SQLServer:SSIS Pipeline 
BLOB bytes read - Total bytes read from binary large objects during the monitoring period. 
BLOB bytes written - Total bytes written to binary large objects during the monitoring period. 
BLOB files in use - Number of binary large objects files used during the data flow task during the monitoring period. 
Buffer memory - The amount of physical or virtual memory used by the data flow task during the monitoring period. 
Buffers in use - The number of buffers in use during the data flow task during the monitoring period. 
Buffers spooled - The number of buffers written to disk during the data flow task during the monitoring period. 
Flat buffer memory - The total amount of memory used by flat buffers during the data flow task during the monitoring period. 
Flat buffers in use - The number of blocks of memory in use by the data flow task at a point in time. 
Private buffer memory - The total amount of physical or virtual memory used by data transformation tasks in the data flow engine during the monitoring period. 
Private buffers in use - The number of blocks of memory in use by the transformations in the data flow task at a point in time. 
Rows read - Total number of rows read from sources by the data flow task during the monitoring period. 
Rows written - Total number of rows written to destinations by the data flow task during the monitoring period.


Question: Have you used SSIS Framework?
Answer: This is a common term in the SSIS world which just means that you have templates set up to perform routine tasks like logging, error handling etc. A yes answer usually indicates an experienced person; a no answer is still fine if your project is not very mission critical.




Question: How many different sources and destinations have you used?
Answer: It is very common to get all kinds of sources, so the more the person has worked with, the better for you. Common ones are SQL Server, CSV/TXT flat files, Excel, Access, Oracle and MySQL, but also Salesforce and web data scraping.


Question: How do you apply business rules in SSIS (Transformations….Specific calculations but also cleansing)?
Answer: Some people use SSIS only to extract data and then rely on stored procedures alone; they are usually missing the point of the power of SSIS, which lets you create "a flow" that applies certain rules at each step. This greatly simplifies the ETL process, and simplicity is very good.

Question: How do you quickly load data into a SQL Server table?
Answer: Use the Fast Load option. This option is not set by default, so most developers know this answer, as otherwise the load is very slow.

Question: Give example of handling data quality issues?
Answer: Data quality is almost always a problem and SSIS handles it very well. Examples include importing customers from different sources where customer names can be duplicates. For instance, the same company name might appear as SQL Server Business Intelligence, but also SQL Server BI, SQL Server BI LTD, SQL Server BI Limited, or inteligence (with one l). There are different ways to handle it. A robust but time-consuming approach is to create a mapping table of all known variants and update it after each load. You can also use Fuzzy Grouping, which is usually easy to implement and will usually make very good decisions, but it is not 100% accurate, so this approach has to be justified. Other typical quality issues are nulls (missing values), outliers (dates like 2999, or typos like 50000 instead of 5000, which is especially important if someone is adjusting a value to get a bigger bonus) and incorrect addresses;
these are either corrected during ETL, ignored, redirected for further manual updates, or they fail the package, which for big processes is usually not practised.
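The idea behind fuzzy name matching can be sketched with Python's difflib (a rough stand-in only; SSIS Fuzzy Grouping uses its own, more sophisticated token-based algorithm, and the names here are from the example above):

```python
from difflib import SequenceMatcher

# Illustrative sketch of fuzzy duplicate detection: score similarity so
# near-duplicate customer names can be grouped for review.
names = ["SQL Server Business Intelligence", "SQL Server BI LTD", "Acme Corp"]
target = "SQL Server BI"

scores = {
    name: SequenceMatcher(None, target.lower(), name.lower()).ratio()
    for name in names
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")   # higher score = more likely a duplicate
```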



Question: What is the XMLify component?
Answer: It is a free 3rd-party component, used rather frequently, that outputs errors into an XML field, which saves development time.

Question: What command line tools do you use with SSIS ?
Answer: dtutil (deployment), dtexec (command-line execution), dtexecui (a GUI that builds and runs dtexec command lines)

Problem
When you are preparing for an SSIS interview you need to understand what questions could be asked in the interview. In this tip series, I will try to cover as much as I can to help you prepare for SSIS interview. In this tip we will cover SSIS basics and SSIS event logging.
Solution
What are SSIS Connection Managers?
  • When we talk of integrating data, we are actually pulling data from different sources and writing it to a destination. But how do you get connected to the source and destination systems? This is where connection managers come into the picture. A connection manager represents a connection to a system and includes the data provider information, the server name, database name, authentication mechanism, etc.
What is the RetainSameConnection property and what is its impact?
  • Whenever a task uses a connection manager to connect to a source or destination database, a connection is opened and closed with the execution of that task. Sometimes you might need to open a connection, execute multiple tasks and close it at the end of the execution. This is where the RetainSameConnection property of the connection manager might help you. When you set this property to TRUE, the connection is opened the first time it is used and remains open until execution of the package completes.
What are source and destination adapters?
  • A source adapter basically indicates a source in the Data Flow to pull data from. The source adapter uses a connection manager to connect to a source, and along with it you can also specify the query method and query to pull data from the source.
  • Similar to a source adaptor, the destination adapter indicates a destination in the Data Flow to write data to. Again like the source adapter, the destination adapter also uses a connection manager to connect to a target system and along with that you also specify the target table and writing mode, i.e. write one row at a time or do a bulk insert as well as several other properties.
  • Please note, the source and destination adapters can both use the same connection manager if you are reading and writing to the same database.
What is the Data Path and how is it different from a Precedence Constraint?
  • Data Path is used in a Data Flow task to connect to different components of a Data Flow and show transition of the data from one component to another. A data path contains the meta information of the data flowing through it, such as the columns, data type, size, etc. When we talk about differences between the data path and precedence constraint; the data path is used in the data flow, which shows the flow of data. Whereas the precedence constraint is used in control flow, which shows control flow or transition from one task to another task.
What is a Data Viewer utility and what is it used for?
  • The data viewer utility is used in Business Intelligence Development Studio during development or when troubleshooting an SSIS Package. The data viewer utility is placed on a data path to see what data is flowing through that specific data path during execution. The data viewer utility displays rows from a single buffer at a time, so you can click on the next or previous icons to go forward and backward to display data.
What is an SSIS breakpoint? How do you configure it? How do you disable or delete it?
  • A breakpoint allows you to pause the execution of the package in Business Intelligence Development Studio during development or when troubleshooting an SSIS Package. You can right click on the task in control flow, click on the Edit Breakpoints menu item and, from the Set Breakpoints window, specify when you want execution to be halted/paused, for example on the OnPreExecute, OnPostExecute, or OnError events. To toggle a breakpoint, delete all breakpoints, or disable all breakpoints, go to the Debug menu and click on the respective menu item. You can even specify different conditions for hitting the breakpoint as well.

What are the different SSIS log providers?
  • There are several places where SSIS can write the execution data generated by its event log; the available log providers are:
    • SSIS log provider for Text files
    • SSIS log provider for Windows Event Log
    • SSIS log provider for XML files
    • SSIS log provider for SQL Profiler
    • SSIS log provider for SQL Server, which writes the data to the msdb..sysdtslog90 or msdb..sysssislog table depending on the SQL Server version.
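For the SQL Server log provider, the logged events can be queried directly. A sketch against the SQL Server 2008+ table name (treat the exact column list as an assumption to verify against your version's documentation):

```sql
-- Most recent errors and warnings logged by the SQL Server log provider
SELECT TOP (50) event, source, starttime, endtime, message
FROM   msdb.dbo.sysssislog
WHERE  event IN ('OnError', 'OnWarning')
ORDER BY starttime DESC;
```

On SQL Server 2005 the same query would target msdb.dbo.sysdtslog90 instead.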
How do you enable SSIS event logging?
  • SSIS provides a granular level of control in deciding what to log and where to log it. To enable event logging for an SSIS Package, right click in the control flow area of the package and click on Logging. In the Configure SSIS Logs window you will notice all the tasks of the package are listed on the left side in a tree view, where you can choose specifically the tasks for which you want to enable logging. On the right side you will notice two tabs: on the Providers and Logs tab you specify where you want to write the logs (you can write to one or more log providers together), and on the Details tab you specify which events you want to log for the selected task.
  • Please note, enabling event logging is immensely helpful when you are troubleshooting a package, but it also incurs additional overhead on SSIS in order to log the events and information. Hence you should only enable event logging when needed and only choose the events you want to log. Avoid logging all the events unnecessarily.
What is the LoggingMode property?
  • SSIS packages and all of the associated tasks or components have a property called LoggingMode. This property accepts three possible values:
  • Enabled - to enable logging of that component,
  • Disabled - to disable logging of that component and
  • UseParentSetting - to use parent's setting of that component to decide whether or not to log the data.
What Are The Isolation Levels in SSIS?
ReadUncommitted:  Does not lock the records being read.  This means that an uncommitted change can be read and then rolled back by another client, resulting in a local copy of a record that is not consistent with what is stored in the database.  This is called a dirty read because the data is inconsistent.
Chaos:  Behaves the same way as ReadUncommitted, but checks the isolation level of other pending transactions during a write operation so that transactions with more restrictive isolation levels are not overwritten.
ReadCommitted:  Locks the records being read and immediately frees the lock as soon as the records have been read.  This prevents any changes from being read before they are committed, but it does not prevent records from being added, deleted, or changed by other clients during the transaction.
RepeatableRead:  Locks the records being read and keeps the lock until the transaction completes.  This ensures that the data being read does not change during the transaction.
Serializable:  Locks the entire data set being read and keeps the lock until the transaction completes.  This ensures that the data and its order within the database do not change during the transaction.
Snapshot:  The data read within a transaction will never reflect changes made by other simultaneous transactions.  The transaction uses the data row versions that exist when the transaction begins.  No locks are placed on the data when it is read.

1) Improved Parallelism of Execution Trees:

The biggest performance improvement in SSIS 2008 is the incorporation of parallelism in the processing of execution trees. In SSIS 2005, each execution tree used a single thread, whereas in SSIS 2008 the data flow engine was redesigned to utilize multiple threads and take advantage of dynamic scheduling to execute multiple components in parallel, including components within the same execution tree.

2) Any .NET language for Scripting:

SSIS 2008 incorporates the new Visual Studio Tools for Applications (VSTA) scripting engine. The advantage of VSTA is that it enables users to use any .NET language for scripting.

3) New ADO.NET Source and Destination Component:

SSIS 2008 adds new Source and Destination components for ADO.NET record sets.

4) Improved Lookup Transformation: 

In SSIS 2008, the Lookup Transformation has faster cache loading and lookup operations. It has new caching options, including the ability for the reference dataset to use a cache file (.caw) accessed by the Cache Connection Manager. In addition, the same cache can be shared between multiple Lookup Transformations.

5) New Data Profiling Task:

SSIS 2008 has a new debugging aid, the Data Profiling Task, that can help users analyze the data flows occurring in the package. In many cases, execution errors are caused by unexpected variations in the data that is being transferred. The Data Profiling Task can help users discover the source of these errors by giving better visibility into the data flow.

6) New Connections Project Wizard:

One of the main usability enhancements in SSIS 2008 is the new Connections Project Wizard, which guides the user through the steps required to create source and destination connections.
Q1 Explain architecture of SSIS?
SSIS architecture consists of four key parts:
a) Integration Services service: monitors running Integration Services packages and manages the storage of packages.
b) Integration Services object model: includes managed API for accessing Integration Services tools, command-line utilities, and custom applications.
c) Integration Services runtime and run-time executables: it saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions. The Integration Services run-time executables are the package, containers, tasks, and event handlers that Integration Services includes, and custom tasks.
d) Data flow engine: provides the in-memory buffers that move data from source to destination.



Q2 How would you do Logging in SSIS?
Logging configuration provides an inbuilt feature which can log the details of various events like OnError, OnWarning, etc. to various destinations such as a flat file, a SQL Server table, an XML file, or SQL Profiler.

Q3 How would you do Error Handling?
An SSIS package can mainly have two types of errors:
a) Procedure Error: can be handled in the Control Flow through precedence constraints and redirecting the execution flow.
b) Data Error: is handled in the Data Flow task by redirecting the data flow using the Error Output of a component.


Q4 How to pass property value at Run time? How do you implement Package Configuration?
A property value, such as the connection string for a Connection Manager, can be passed to the package using package configurations. Package Configuration provides different options like XML File, Environment Variables, SQL Server Table, Registry Value, or Parent Package Variable.

Q5 How would you deploy a SSIS Package on production?
 A) Through a Manifest
1. Create a deployment utility by setting its property to true.
2. It will be created in the bin folder of the solution as soon as the package is built.
3. Copy all the files in the utility and use the manifest file to deploy it on the production server.

B) Using the DTExec.exe utility
C) Import the package directly into MSDB from SSMS by connecting to Integration Services.

Q6 Difference between DTS and SSIS?
Apart from both being Microsoft products :-), they differ substantially. DTS is the legacy ETL tool in SQL Server 2000; SSIS (SQL Server 2005 onwards) is a complete redesign that separates control flow from data flow, uses a buffer-based in-memory data flow engine, and adds a full development environment (BIDS), richer transformations, event handlers, logging, configurations, and checkpoint/restart support.

Q7 What are the new features in BIDS?


Q9 What is Execution Tree?
Execution trees demonstrate how a package uses buffers and threads. At run time, the data flow engine breaks down Data Flow task operations into execution trees. These execution trees specify how buffers and threads are allocated in the package. Each tree creates a new buffer and may execute on a different thread. When a new buffer is created, such as when a partially blocking or blocking transformation is added to the pipeline, additional memory is required to handle the data transformation, and each new tree may also give you an additional worker thread.



Q11 You may get a question stating a scenario and then asking you how would you create a package for that e.g. How would you configure a data flow task so that it can transfer data to different table based on the city name in a source table column?

Q13 Difference between Union All and Merge Join?
a) The Merge transformation can accept only two inputs, whereas Union All can take more than two inputs.

b) Data has to be sorted before the Merge transformation, whereas Union All has no such requirement.


Q14 You may get questions about what a particular transformation does. Lookup, Fuzzy Lookup, and Fuzzy Grouping are common favorites.

Q15 How would you restart package from previous failure point?What are Checkpoints and how can we implement in SSIS?
When a package is configured to use checkpoints, information about package execution is written to a checkpoint file. When the failed package is rerun, the checkpoint file is used to restart the package from the point of failure. If the package runs successfully, the checkpoint file is deleted, and then re-created the next time that the package is run.

Q16 Where are SSIS packages stored in SQL Server?
msdb.dbo.sysdtspackages90 stores the actual package content, and sysdtscategories, sysdtslog90, sysdtspackagefolders90, sysdtspackagelog, sysdtssteplog, and sysdtstasklog play supporting roles.
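As a sketch, the stored packages can be listed by querying these msdb tables directly (SQL Server 2005 table names; treat the exact column list as an assumption to verify against your version):

```sql
-- List packages stored in msdb together with their folder
SELECT p.name, p.description, p.createdate, f.foldername
FROM   msdb.dbo.sysdtspackages90 AS p
LEFT JOIN msdb.dbo.sysdtspackagefolders90 AS f
       ON f.folderid = p.folderid
ORDER BY p.name;
```

On SQL Server 2008 the tables were renamed to sysssispackages and sysssispackagefolders.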


Q17 How would you schedule a SSIS packages?
Using SQL Server Agent: create a job with an SSIS package execution step and attach a schedule to the job.

Q18 Difference between asynchronous and synchronous transformations?
Synchronous transformations (e.g., Derived Column) process each row in place and reuse the same buffer for input and output. Asynchronous transformations (e.g., Sort, Aggregate) have different input and output buffers, and it is up to the component designer of an asynchronous component to provide a column structure for the output buffer and hook up the data from the input.

Q19 How to achieve parallelism in SSIS?
Parallelism is achieved using the MaxConcurrentExecutables property of the package. Its default is -1, which is interpreted as the number of processors + 2.

-More questions added-Sept 2011 
Q20 How do you do incremental load?
The fastest way to do an incremental load is to use a timestamp column in the source table and store the last ETL timestamp. In the ETL process, pick all rows having a timestamp greater than the stored timestamp, so that only new and updated records are picked up.
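The watermark pattern described above might look like this in T-SQL (the control table etl.WatermarkControl and all other names are hypothetical):

```sql
-- 1. Read the last successfully loaded timestamp
DECLARE @LastLoad DATETIME;
SELECT @LastLoad = LastLoadTimestamp
FROM   etl.WatermarkControl
WHERE  TableName = 'Orders';

-- 2. Extract only new and changed rows
SELECT OrderID, CustomerID, Amount, ModifiedDate
FROM   dbo.Orders
WHERE  ModifiedDate > @LastLoad;

-- 3. After a successful load, advance the watermark
UPDATE etl.WatermarkControl
SET    LastLoadTimestamp = (SELECT MAX(ModifiedDate) FROM dbo.Orders)
WHERE  TableName = 'Orders';
```

In a package, step 1 would typically feed an SSIS variable via an Execute SQL Task, step 2 would be the source query of the Data Flow, and step 3 a final Execute SQL Task that runs only on success.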
Question 3 – Can you explain different options for dynamic configurations in SSIS?
Answer: Use an XML file
Use custom variables
Use a database per environment with the variables
Use a centralized database with all variables
Question 4 – How do you upgrade an SSIS Package?
Answer: Depending on the complexity of the package, one or two techniques are typically used:
1) Recode the package based on the functionality in SQL Server DTS
2) Use the Migrate DTS 2000 Package wizard in BIDS then recode any portion of the package that is not accurate

1)         What is Business Intelligence (BI)?
Business Intelligence (BI) refers to technologies, applications and practices for the collection, integration, analysis, and presentation of business information and sometimes to the information itself. The purpose of BI is to support better business decision making. Thus, BI is also described as a decision support system (DSS).
BI systems provide historical, current, and predictive views of business operations, most often using data that has been gathered into a data warehouse or a data mart and occasionally working from operational data.

2)         What is SQL Server Integration Services (SSIS)?
            SSIS is designed for data integration and processing applications. For example, Processing of data into a data mart or data warehouse, where data is used exclusively for business intelligence (BI) analytics and reporting. OR, SSIS can be used to move data from legacy systems into new systems during application migrations, to integrate data from multiple systems, to cleanse data, to import data, to define workflows to automate updating multi-dimensional cubes and automating maintenance tasks for SQL Server databases.

3)         Architecture of Integration Services
            SSIS architecture has five main components:
1)     SSIS Designer
SSIS Designer is a graphical tool that you can use to create and maintain Integration Services packages.
2)     Runtime engine
The Integration Services runtime saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions.
3)     Tasks and other executables
The Integration Services run-time executables are the package, containers, tasks, and event handlers that Integration Services includes. Run-time executables also include custom tasks that you develop.
4)     Data Flow engine and Data Flow components
The Data Flow task encapsulates the data flow engine. The data flow engine provides the in-memory buffers that move data from source to destination, and calls the sources that extract data from files and relational databases. The data flow engine also manages the transformations that modify data, and the destinations that load data or make data available to other processes
5)     Integration Services Service
The Integration Services service lets you use SQL Server Management Studio to monitor running Integration Services packages and to manage the storage of packages.
   
4)         What is ETL?
ETL is an acronym for Extract, Transform, and Load and describes the processes that take place in data warehousing environments for extracting data from source transaction systems; transforming, cleaning, deduplicating, and conforming the data; and finally loading it into cubes or other analysis destinations.

5)         What are the tabs available in SSIS Designer?
SSIS Designer has four permanent tabs: Control Flow | Data Flow | Event Handlers | Package Explorer. At run time a fifth tab appears that shows the execution progress of a package: Progress.
In SQL Server 2012 there are five permanent tabs; the additional tab is Parameters.

6)         What is the difference between control flow and data flow?
The control flow is the workflow engine and contains control flow tasks, containers, and precedence constraints, which manage when tasks and containers execute.
The data flow, in contrast, is directly related to processing and transforming data from sources to destinations.

7)         What are Precedence constraints?
Precedence constraints are the control flow connectors between tasks and containers. Usually, these connectors are green, red, or blue, indicating the execution success, failure, or completion, respectively. Precedence constraints allow you to define the logical sequence of tasks in the order they should be executed. SSIS also provides advanced constraint capabilities, which allow conditional and/or expression evaluation criteria.

8)         What are Data Paths?
The connectors between objects in the data flow are data paths.
There are two primary paths in the data flow:
Data paths The green connectors that go from one component to another. For these paths, the rows that have successfully gone through a component are output, as are error rows when the error output is set to ignore failures.
Error paths The red connectors between data flow components. They contain data rows that fail in a component when the error rows are set to be redirected.

9)         What is a Data Viewer utility and what is it used for?
The data viewer utility is used for troubleshooting data issues, especially when you are not able to easily identify the problem row or issue. SSIS includes the capability to watch rows in the data flow as they are passing through the pipeline. SSIS implements this capability through data viewers, a feature you can use when you are running packages in BIDS during development. For any path in the data flow, you can add a data viewer that pauses the execution of the data flow and displays the data in the data viewer in one of four formats (in SSIS 2012, Grid is the only available format).

10)      What is a breakpoint in SSIS?
By using breakpoints during package debugging, you can pause a package during execution to observe package state information so that you can troubleshoot potential problems. Breakpoints are available only in the control flow, not in the data flow.

11)      How do you setup the breakpoints in SSIS?
To set a breakpoint, select the task or container, and either press F9 or navigate to the Debug/Toggle Breakpoint menu. You can set multiple breakpoints in a package, and you can embed a breakpoint within a Script Task at a line of code.

12)      How do you see the variable values during debug?
Open the Locals window to see all the variable values and the package status. To open this window, choose Windows and then Locals from the Debug menu.

13)      List some of the Common Tasks in SSIS
Execute SQL Task
Execute Process Task
Execute Package Task
Data Flow Task
File System Task
FTP Task
Script Task
Send Mail Task
Web Service Task
Bulk Insert Task
Data Profiling Task
Execute DTS 2000 Package Task
ActiveX Script Task
Analysis Services Execute DDL Task
Analysis Services Processing Task

14)      What are the available Integration Services Containers?
There are three primary containers in SSIS: a Sequence Container, a For Loop Container, and a Foreach Loop Container.
1)     Sequence Container Groups tasks and containers into control flows that are subsets of the package control flow, and lets you apply transactions or assign logging to the container.
2)     For Loop Container Provides the same functionality as the Sequence Container except that it also lets you     run the tasks within it multiple times based on an evaluation condition, such as looping from 1 to 10.
3)     Foreach Loop Container Also allows looping, but instead of providing a condition expression, you loop over a set of objects, such as files in a folder or a record set.
4)      Apart from the above mentioned containers, there is one more container called the Task Host Container which is not visible from the IDE, but every task is contained in it (the default container for all the tasks).

15)      What are variables and what is variable scope?
A variable is used to store values. Within SSIS, there are two types of variables: System Variables and User Variables. System variables are not editable but can be referenced within tasks and containers. Variables are referenced as User::[VariableName] or System::[VariableName].
The scope defines at what level within a package the variable is created. For example package level variables are accessible to all the tasks in the package and container level variables are accessible only to those tasks that are within the container.

16)      What types of objects are available in the Data Flow task toolbox?
The Data Flow Task has three types of objects in the toolbox:
1)         Data flow source adapters
2)         Data flow transformations
3)         Data flow destination adapters

17)      What are the Data Flow Source adapters available in SSIS?
ADO.NET Source    Provides connections to tables or queries through an ADO.NET provider.
Excel Source           Allows extractions from an Excel worksheet defined in an Excel file.
Flat File Source       Connects to a delimited or fixed-width file created with different code pages.
OLE DB Source      Connects to installed OLE DB providers, such as SQL Server, Access, SSAS, and Oracle.
Raw File Source     Stores native SSIS data in a binary file type useful for data staging.
XML Source             Allows raw data to be extracted from an XML file; requires an XML schema to define data associations.

18)      What are the Data FLOW Destination adapters available in SSIS?
ADO.NET Destination            Allows insertion of data by using an ADO.NET provider.
Data Mining Model Training Lets you pass data from the data flow into a data mining model in SSAS.
DataReader Destination Lets you put data in an ADO.NET record set that can be programmatically referenced.
Dimension Processing Lets SSAS dimensions be processed directly from data flowing through the data flow.
Excel Destination Used for inserting data into Excel, including Excel 2007.
Flat File Destination Allows insertion of data to a flat file such as a comma-delimited or tab-delimited file.
OLE DB Destination Uses the OLE DB provider to insert rows into a destination system that allows an OLE DB connection.
Partition Processing Allows SSAS partitions to be processed directly from data flowing through the data flow.
Raw File Destination Stores native SSIS data in a binary file type useful for data staging.
Recordset Destination Takes the data flow data and creates a record set in a package variable of type Object.
SQL Server Compact Destination  Lets you send data to a mobile device running SQLMobile.
SQL Server Destination Provides a high-speed destination specific to SQL Server 2008 if the package is running on SQL Server.

19)      What is a Transformation?
Transformations give you the ability to modify and manipulate data in the data flow. A transformation performs an operation either on one row of data at a time or on several rows of data at once.

20)      What are the Data Flow Transformations adapters available in SSIS?

Logical row-level transformations

1) Audit Adds additional columns to each row based on system package variables such as ExecutionStartTime and PackageName.
2) Cache Transform Allows data that will be used in a Lookup Transformation to be cached and available for multiple Lookup components.
3) Character Map Performs common text operations, such as Uppercase, and allows advanced linguistic bit conversion operations.
4) Copy Column Duplicates column values in each row to new named columns.
5) Data Conversion Creates new columns in each row based on new data types converted from other columns (for example, converting text to numeric).
6) Derived Column Uses the SSIS Expression language to perform in-place calculations on existing values; alternatively, allows the addition of new columns based on expressions and calculations from other columns and variables.
7) Export Column Exports binary large object (BLOB) columns, one row at a time, to a file.
8) Import Column Loads binary files such as images into the pipeline; intended for a BLOB data type destination.
9) Row Count Tracks the number of rows that flow through the transformation and stores the number in a package variable after the final row.

Multi -input or multi -output transformations

1) Conditional Split Routes or filters data based on Boolean expressions to one or more outputs; each row is sent down only one output path.
2) Lookup Allows matching between pipeline column values to external database tables; additional columns can be added to the data flow from the external table.
3) Merge Combines the rows of two similar sorted inputs, one on top of the other, based on a defined sort key.
4) Merge Join Joins the rows of two sorted inputs based on a defined join column(s), adding columns from each source.
5) Multicast Generates one or more identical outputs, from which every row is sent out every output.
6) Union All Combines one or more similar inputs, stacking rows one on top of another, based on matching columns

Multi -row transformations

1) Aggregate Associates records based on defined groupings and generates aggregations such as SUM, MAX, MIN, and COUNT.
2) Percent Sampling Filters the input rows by allowing only a defined percent to be passed to the output path.
3) Pivot Takes multiple input rows and pivots the rows to generate an output with more columns based on the original row values.
4) Row Sampling Outputs a fixed number of rows, sampling the data from the entire input, no matter how much larger than the defined output the input is.
5) Sort Orders the input based on defined sort columns and sort direction and allows the removal of duplicates across the sort columns.
6) Unpivot Takes a single row and outputs multiple rows, moving column values to the new row based on defined columns.

In the cases of the Sort, Aggregate, and Row Sampling Transformations, all the input rows are blocked, allowing the transformations to perform the work before sending rows down the output path. These transformations often require more server resources, memory, and processor capacity than do other transformations.

Advanced data -preparation transformations

1) OLE DB Command Performs database operations such as updates and deletes, one row at a time, based on mapped parameters from input rows.
2) Slowly Changing Dimension Processes dimension changes, including tracking dimension history and updating dimension values. The Slowly Changing Dimension Transformation handles these common dimension change types: Historical Attributes, Fixed Attributes, and Changing Attributes.
3) Data Mining Query Applies input rows against a data mining model for prediction.
4) Fuzzy Grouping Associates column values with a set of rows based on similarity, for data cleansing.
5) Fuzzy Lookup Joins a data flow input to a reference table based on column similarity. The Similarity Threshold setting specifies the closeness of allowed matches—a high setting means that matching values are closer in similarity.
6) Script Component Provides VB.NET scripting capabilities against rows, columns, inputs, and outputs in the data flow pipeline.
7) Term Extraction Analyzes text input columns for English nouns and noun phrases.
8) Term Lookup Analyzes text input columns against a user-defined set of words for association.

21)      Difference between Merge and Union All Transformation?
Both Merge and Union All transform provides a way to combine multiple data flows into one, but there are some differences between two.
1) Merge transform can accept only two inputs where as Union all can combine more than two inputs.
2) The Union All transform does not have any requirements to be sorted like the Merge transform.
3) The Merge transform is useful when you want to retain the sorted order of rows even when the   rows must be processed in different execution paths.
4) The Union All transform is useful for rejoining previously split paths—for example, paths that were split upstream by a Conditional Split or Multicast transform.

22)      Difference between Merge and Merge Join Transformation?
The Merge Join transform is similar to the Merge transform in that it requires the inputs to be sorted and combines the rows from the inputs into one output. However, whereas the Merge transform always produces a number of rows equal to the total flowing into both inputs, the Merge Join transform provides the added capability to do joins, which might increase or decrease the resulting row count.

23)      Is there another way to use Merge Join without sort transformation?
The Sort transformation can become slow on large datasets. Merge Join requires sorted input, but for better performance we can sort the data at the source itself using a T-SQL ORDER BY and then set the IsSorted and SortKeyPosition properties on the source adapter.
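For example, the OLE DB Source could use a query like the following (hypothetical table), after which IsSorted is set to True on the source output and SortKeyPosition = 1 on the sort column in the Advanced Editor:

```sql
-- Sort at the database engine instead of in a Sort transform
SELECT CustomerID, CustomerName
FROM   dbo.Customer
ORDER BY CustomerID;
```

Setting IsSorted only tells SSIS the data is sorted; it does not sort anything itself, so the ORDER BY clause and the SortKeyPosition settings must agree.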

24)      How can you terminate a dataflow path without a destination?
The Multicast transform can be used for terminating a data flow path without requiring a variable or connection manager. This comes in handy when prototyping data flows. Just make sure the RunInOptimizedMode property is set to FALSE so the Execution Engine won’t trim the Multicast’s upstream execution tree.

25)      What are the performance considerations given while dealing with Slowly Changing Dimension?
The Slowly Changing Dimension Wizard and the Slowly Changing Dimension transformation are general-purpose tools that meet the needs of most users. However, after you start dealing with larger dimensional datasets, you’ll probably want to modify the automatically generated data flow to improve performance.
·   Typically, the slowest components in the Slowly Changing Dimension transformation are the OLE DB Command transformations that perform UPDATEs against a single row at a time. Therefore, the most effective way to improve the performance of the Slowly Changing Dimension transformation is to replace the OLE DB Command transformations. You can replace these transformations with destination components that save all rows to be updated to a staging table. Then, you can add an Execute SQL task that performs a single set-based Transact-SQL UPDATE against all rows at the same time.
·   Consider using a Lookup transform with partial caching turned on right before the SCD to quickly identify rows that don’t exist in the dimension table. Those rows that do not exist can be directly inserted. This effectively eliminates the rows flowing through the New Output from the SCD transform and increases the efficiency of the SCD transform lookup step.
·   Consider alternatives to the SCD transformation: SQL Server 2008 introduced the T-SQL MERGE construct, which allows the merging of datasets. Instead of the SCD's row-by-row evaluation, MERGE is a set-based operation that compares the whole source dataset to the reference dataset in a single pass. Be aware, though, that MERGE is a bulk operation: a failure on a single row causes the whole batch to fail.
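A minimal sketch of both ideas, assuming a hypothetical dbo.DimCustomer dimension and a dbo.StagingCustomer table populated by the data flow:

```sql
-- Set-based UPDATE from a staging table (replaces the row-by-row OLE DB Command):
UPDATE d
SET    d.City = s.City
FROM   dbo.DimCustomer AS d
JOIN   dbo.StagingCustomer AS s
  ON   s.CustomerKey = d.CustomerKey;

-- SQL Server 2008+ MERGE alternative for a Type 1 change:
MERGE dbo.DimCustomer AS d
USING dbo.StagingCustomer AS s
  ON  s.CustomerKey = d.CustomerKey
WHEN MATCHED AND d.City <> s.City THEN
    UPDATE SET d.City = s.City
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerKey, City) VALUES (s.CustomerKey, s.City);
```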

26)      What are the command line tools to execute and manage SQL Server Integration Services packages?
DTExec is a command-line tool for executing SSIS packages. The required switches must be passed to the command for the SSIS package to execute successfully.
DTUtil is a command-line tool for managing, moving, and deleting SSIS packages.

DTExecUI is the execution utility with a graphical user interface (GUI). You can use it to build command lines for DTExec.exe, execute packages directly, and pass parameters to the package.
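For example (the package name, file path, and server below are hypothetical):

```
REM Execute a package stored in the file system:
dtexec /File "C:\ETL\LoadCustomers.dtsx"

REM Execute a package stored in msdb on a remote server:
dtexec /SQL "LoadCustomers" /Server "ETLSRV01"

REM Copy a file-system package into msdb with dtutil:
dtutil /File "C:\ETL\LoadCustomers.dtsx" /Copy SQL;LoadCustomers
```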

27)      Can you explain how to setup a checkpoint file in SSIS?
    The following items need to be configured in the properties of the SSIS package:
1) Set the SaveCheckpoints property at the package level to True. This allows SSIS to save checkpoints during package execution.
2) For the CheckpointFileName property, provide a valid path and file name to a checkpoint file. Packages use files to maintain their state information, so if a package fails and is then restarted, the package can read the checkpoint file to determine where it left off and to track the state information at the last successful task.
3) Set the CheckpointUsage property to If Exists, which causes the package to run from the beginning if the checkpoint file is not present, or to run from the identified point if the file exists.
After you enable checkpoints in a package, the final step is to set checkpoints at the various tasks within your package. To do this, set the FailPackageOnFailure property at each task or container to True.

28)      What are the most common log events in SSIS?
The most common log events are:
OnError
OnPostExecute
OnPreExecute
OnProgress
OnTaskFailed
The OnPreExecute and OnPostExecute events are captured at the start and end of each container and task when the package runs, providing a way to determine the execution time of each task.

29)      If you select SQL Server as the log provider, where will be the log details saved?
If you select SQL Server as the log provider, the log details are stored in a system table named dbo.sysssislog.
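For example, recent errors can be pulled from that table as follows (in SSIS 2005 the table was named dbo.sysdtslog90 instead):

```sql
-- Most recent OnError events recorded by the SQL Server log provider:
SELECT TOP (50) source, starttime, message
FROM dbo.sysssislog
WHERE event = 'OnError'
ORDER BY starttime DESC;
```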

30)      What are the formats available in Flat file Connection Manager?
There are three formats available: Delimited, Fixed Width, and Ragged Right.
Delimited: The columns are separated by some delimiter, such as a semicolon, tab, or comma. Comma-separated value (CSV) files are an example of delimited data.
Fixed Width: The data is aligned into columns whose width is the same for all rows.
Ragged Right: The same as fixed width, with one exception: each row is delimited, by default with a CR/LF combination, but this can be configured to use other delimiting characters. Ragged-right flat files are a little more flexible because the last column can be a different width in each row.

31)      How to strip out double quotes from an import file in SQL Server Integration Services?
On the Flat File Connection Manager Editor, enter the double-quote character (") as the Text qualifier.
The text qualifier specifies what character the file uses to indicate the beginning and end of a string. Typically, the character is a double quote or single quote, but other characters can be used as well.
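For example, with the text qualifier set to a double quote, a line such as the following (invented data) imports as three columns, with the comma inside the second value preserved and the surrounding quotes stripped:

```
"John Smith","123 Main St, Apt 4","Seattle"
```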

32)      How do you upgrade an SSIS Package?
There are several options available for migrating to Integration Services 2008:
The Upgrade Advisor tells you what to expect when you migrate.
The Migration Wizard gets you started on migrating DTS packages, and the Execute DTS 2000 Package task allows you to embed DTS packages in Integration Services packages.

SSIS 2005 package upgrades are smooth and mostly trouble free. The easiest ways are:
By loading an SSIS 2005 project inside BIDS
By right-clicking on the SSIS Packages folder of an SSIS project and selecting the Upgrade All Packages menu option

33)      Which option needs to be enabled to ensure the rows are inserted quickly in a Dataflow Destination?
Edit the OLE DB Destination and set the Data Access Mode to "Table or view – fast load". When the OLE DB Destination is not configured for fast load, rows are inserted into the destination table one at a time.

34)      What does the RunInOptimizedMode setting mean?
RunInOptimizedMode is a data flow–specific property that tells the Data Flow task to attempt to trim unused components. The data flow engine trims a component by ignoring it. You might see some performance gain from this setting because it eliminates components that have no side effects.

35)      What is new in Integration Services 2012?
SSIS 2012 introduced the project deployment model with the SSISDB catalog hosted on the Integration Services server, project and package parameters, server environments, built-in execution and performance reports, designer improvements such as undo/redo, and the Change Data Capture (CDC) components.

36)      What are the discontinued functionalities in Integration Services (SQL Server 2012)?
Data Source Views
Data Source Views have been discontinued in this release. With SQL Server 2012, data sources and data source views can no longer be added to a project; instead, these connections are added to the individual packages. In prior versions, data sources could be added as connection managers and could refer to the included DSVs as tables in source components. Accessing SQL Server data this way was not the optimal way to extract it.
Data Viewer
Previous versions of SSIS included four types of data viewer: the Grid, Histogram, Scatter Plot, and Column Chart data viewers. Because the Grid data viewer is the most commonly used, the SSIS team removed the other types from SSIS 2012.
Data Transformation Services (DTS)
DTS was supported as a legacy application in SQL Server 2005, 2008, and 2008 R2. DTS is no longer supported with this newest release of SQL Server. This means that the Execute DTS 2000 Package task, the DTS runtime and API, and Package Migration Wizard are all going away.
Additionally, the ActiveX Script task, which was provided strictly for DTS support, will be removed. Many of the tasks that ActiveX Script tasks were used for can be handled with precedence constraints, while more complex tasks can be rewritten as Script tasks.

37)      What is the difference between data warehousing and business intelligence?
Data warehousing deals with all aspects of managing the development, implementation, and operation of a data warehouse or data mart, including metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, and backup/recovery planning.
Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of its business, such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies, and exceptions. Typically, the term "business intelligence" is used to encompass OLAP, data visualization, data mining, and query/reporting tools.

