What is Azure Data Factory?
Azure Data Factory is a cloud-based data integration service that lets you create data-driven workflows for orchestrating and automating data movement and data transformation.
It helps us implement and automate ETL and ELT solutions. It is very similar to SSIS: we build a workflow from the various components available for ETL and ELT, and then schedule it, much as we have been doing with SSIS and SQL Server Agent.
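As a rough illustration of the scheduling idea, a Data Factory schedule trigger is written as JSON and attached to a pipeline. The sketch below builds such a definition as a Python dictionary. The trigger name, pipeline name, and start time are made up for illustration, and the layout follows the schedule trigger shape as understood here, so check it against the current Data Factory reference before relying on it.

```python
import json


def daily_trigger(trigger_name, pipeline_name, start_time):
    """Build a schedule-trigger definition that runs one pipeline once a day.

    All names passed in are hypothetical; only the overall JSON shape is
    meant to resemble a Data Factory schedule trigger.
    """
    return {
        "name": trigger_name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",   # run daily
                    "interval": 1,        # every 1 day
                    "startTime": start_time,
                    "timeZone": "UTC",
                }
            },
            # The pipeline(s) this trigger starts, referenced by name.
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    }
                }
            ],
        },
    }


trigger = daily_trigger("DailyLoadTrigger", "CopySalesPipeline", "2024-01-01T02:00:00Z")
print(json.dumps(trigger, indent=2))
```

This plays roughly the role that a SQL Server Agent job schedule plays for an SSIS package.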
Azure Data Factory supports around 65 different source types, which can be either in the cloud or on premises, and Microsoft continues to add more source types.
How can we do transformations in Azure Data Factory? There are many options, depending on our requirements. For example, we can run Hive on HDInsight, or U-SQL on Azure Data Lake Analytics, and there are many other ways to do transformations.
Key Components of Azure Data Factory:
Pipeline: A pipeline is a collection of activities, such as extracting data or processing data. Activities can run sequentially or in parallel.
Activity: An activity is a single step or task in a pipeline, such as copying data.
There are three types of activity:
- Data movement activities
- Data transformation activities
- Control activities
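To make the pipeline and activity relationship concrete, here is a small sketch of a pipeline definition written as a Python dictionary in the general shape of Data Factory pipeline JSON. The pipeline and activity names are hypothetical, the activities are trimmed (a real copy activity also needs inputs, outputs, and type properties), and the `execution_waves` helper is not part of Data Factory; it only illustrates which activities could run in parallel and which must wait.

```python
# Hypothetical pipeline: two copy activities (data movement) have no
# dependencies, and a Hive activity (data transformation) waits for both.
pipeline = {
    "name": "CopySalesPipeline",
    "properties": {
        "activities": [
            {"name": "CopyOrders", "type": "Copy", "dependsOn": []},
            {"name": "CopyCustomers", "type": "Copy", "dependsOn": []},
            {
                "name": "TransformSales",
                "type": "HDInsightHive",
                "dependsOn": [
                    {"activity": "CopyOrders", "dependencyConditions": ["Succeeded"]},
                    {"activity": "CopyCustomers", "dependencyConditions": ["Succeeded"]},
                ],
            },
        ]
    },
}


def execution_waves(pipeline_def):
    """Group activity names into waves; activities in one wave can run in parallel."""
    remaining = {
        a["name"]: {d["activity"] for d in a.get("dependsOn", [])}
        for a in pipeline_def["properties"]["activities"]
    }
    done, waves = set(), []
    while remaining:
        # An activity is ready once everything it depends on has finished.
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("circular or unknown dependency between activities")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves


print(execution_waves(pipeline))
# [['CopyCustomers', 'CopyOrders'], ['TransformSales']]
```

The two copy activities land in the same wave because neither depends on the other, while the transformation runs only after both have succeeded.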
Dataset: A dataset is a named data structure that references the data an activity needs to use as its input or output.
Linked Services: A linked service is the connection object for the sources, destinations, or compute resources used.
Integration Runtime: The integration runtime provides integration capabilities across different network environments. It provides capabilities such as:
- Data Movement
- Activity dispatch
- SSIS package execution
The data movement capability is used when you need to transfer data between a public network and a private network, for example between a data lake, which is public, and an on-premises SQL Server, which is private.
The activity dispatch capability is used when a compute service such as Azure HDInsight or SQL Server performs the transformation activities.
The SSIS package execution capability is used when a SQL Server Integration Services (SSIS) package needs to be executed in a managed Azure compute environment.
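The dataset, linked service, and integration runtime described above refer to one another by name. The sketch below shows that chain for an on-premises SQL Server table, again as Python dictionaries in the general shape of Data Factory JSON. Every name here is hypothetical, the connection string is a placeholder, and the property names should be checked against the current connector reference; `resolve_runtime` is an illustrative helper, not a Data Factory API.

```python
# Linked service: the connection object, reached through a named integration runtime.
linked_service = {
    "name": "OnPremSqlServerLS",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {"connectionString": "<connection string>"},
        # connectVia names the integration runtime used to reach the data store.
        "connectVia": {
            "referenceName": "MySelfHostedIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}

# Dataset: points at one table through the linked service above.
dataset = {
    "name": "OrdersTable",
    "properties": {
        "type": "SqlServerTable",
        "linkedServiceName": {
            "referenceName": linked_service["name"],
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"tableName": "dbo.Orders"},
    },
}


def resolve_runtime(dataset_def, linked_services):
    """Follow dataset -> linked service -> integration runtime and return the three names."""
    ls_name = dataset_def["properties"]["linkedServiceName"]["referenceName"]
    ls = next(item for item in linked_services if item["name"] == ls_name)
    ir_name = ls["properties"]["connectVia"]["referenceName"]
    return dataset_def["name"], ls_name, ir_name


print(resolve_runtime(dataset, [linked_service]))
# ('OrdersTable', 'OnPremSqlServerLS', 'MySelfHostedIR')
```

An activity then only needs to name the dataset; the connection and the network path come from the linked service and the runtime it is bound to.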
Three integration runtime types:
- Azure: can be used only in a public network, with the data movement and activity dispatch capabilities.
- Self-hosted: can be used in both public and private networks, also with the data movement and activity dispatch capabilities.
- Azure-SSIS: can also be used in both public and private networks, but only with the SSIS package execution capability.
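The three descriptions above amount to a small lookup table. The following sketch encodes exactly those rules and returns the runtime types that fit a given network and capability. The `SUPPORT` table and the `candidate_runtimes` function are illustrative names, not part of any Data Factory SDK.

```python
# Capability matrix taken directly from the three descriptions above.
SUPPORT = {
    "Azure": {
        "networks": {"public"},
        "capabilities": {"data movement", "activity dispatch"},
    },
    "Self-hosted": {
        "networks": {"public", "private"},
        "capabilities": {"data movement", "activity dispatch"},
    },
    "Azure-SSIS": {
        "networks": {"public", "private"},
        "capabilities": {"ssis package execution"},
    },
}


def candidate_runtimes(network, capability):
    """Return the integration runtime types that support the capability on the network."""
    return [
        name
        for name, support in SUPPORT.items()
        if network in support["networks"] and capability in support["capabilities"]
    ]


print(candidate_runtimes("private", "data movement"))           # ['Self-hosted']
print(candidate_runtimes("public", "data movement"))            # ['Azure', 'Self-hosted']
print(candidate_runtimes("private", "ssis package execution"))  # ['Azure-SSIS']
```

For example, copying from an on-premises SQL Server leaves the self-hosted runtime as the only candidate, which matches the data movement scenario described earlier.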