Hello readers! Hope you are all doing well. In this article, you will learn how to handle error records while processing data in Azure Data Factory. If any junk or badly formatted data is encountered during a load, Azure Data Factory can handle it. In other words, we are going to see how to validate records and route error records in mapping data flows in Azure Data Factory.
Below are the steps you should follow to handle error records:
Error records and unformatted data from the source should be handled during data processing. In this blog we are loading employee data as shown below. The highlighted records do not have the proper date format, so, based on a rule or condition, we will filter those records out and load them into an error table.
Source Data File
The DoJ column follows the dd-MMM-yyyy format, but the highlighted records do not. While loading, you have to reject those records through an Azure Data Factory mapping data flow.
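The date rule itself can be sketched outside ADF. Here is a minimal Python equivalent of the dd-MMM-yyyy check (the column name DoJ is taken from the sample file; `%d-%b-%Y` is the strptime pattern matching that format):

```python
from datetime import datetime

def is_valid_doj(value: str) -> bool:
    """Return True when value matches the dd-MMM-yyyy format, e.g. 01-Jan-2020."""
    try:
        datetime.strptime(value, "%d-%b-%Y")
        return True
    except ValueError:
        return False

print(is_valid_doj("15-Mar-2021"))  # True
print(is_valid_doj("2021/03/15"))   # False
```

In a mapping data flow, the equivalent parsing is done with the data flow expression language rather than Python, but the pass/fail logic is the same.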
In the mapping data flow, as shown in the screenshot, configure details such as the output stream name, source type, and dataset on the source settings tab.
Configure the details on the other tabs and then click Data preview to visualize the source data.
Click the + sign as shown in the screenshot and select Conditional split to filter out the error records.
Based on the split condition, you can view the data in data preview mode; below it shows the error records.
Here you can view the correct records, which are not filtered out by the split condition.
Based on the split condition, you can see that two flows are created: Errorrecords and Correctrecords.
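The conditional split behaves like a partition of the incoming rows into the two flows. A rough Python sketch of that behaviour (the field names Name and DoJ are assumptions based on the sample data):

```python
from datetime import datetime

def split_records(rows):
    """Partition rows into correct and error flows based on the DoJ date format."""
    correct, errors = [], []
    for row in rows:
        try:
            datetime.strptime(row["DoJ"], "%d-%b-%Y")
            correct.append(row)   # matches dd-MMM-yyyy -> Correctrecords flow
        except ValueError:
            errors.append(row)    # bad format -> Errorrecords flow
    return correct, errors

employees = [
    {"Name": "Asha", "DoJ": "02-Feb-2019"},
    {"Name": "Ravi", "DoJ": "2019-02-02"},  # wrong format, goes to error flow
]
correct, errors = split_records(employees)
print(len(correct), len(errors))  # 1 1
```

Each resulting list then maps to its own sink, just as the two flows in the data flow canvas each connect to a separate sink transformation.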
Both the correct records and the error records need to be routed to a sink destination. Check the first sink destination and you can see that the data which is not in the expected date format has been rejected.
Click on the second sink destination and you can view the data with the correct date format.
In summary, in this blog you have seen how to check a split condition on one column and reject or process the data accordingly; the same approach works with multiple columns. This is one solution for this scenario, but there are other ways of doing it as well. In the sink destinations you have to configure the dataset and linked services to load these records into different tables.
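Extending the check to multiple columns simply means combining the per-column rules. A short, hypothetical Python sketch (the EmpID numeric-ID rule is an assumption for illustration, not part of the sample data):

```python
from datetime import datetime

def _valid_date(value):
    """True when value matches dd-MMM-yyyy."""
    try:
        datetime.strptime(value, "%d-%b-%Y")
        return True
    except ValueError:
        return False

def is_valid(row):
    """Apply rules to multiple columns; all must pass for the row to be correct."""
    checks = [
        _valid_date(row.get("DoJ", "")),
        row.get("EmpID", "").isdigit(),  # hypothetical rule: numeric employee ID
    ]
    return all(checks)

print(is_valid({"EmpID": "101", "DoJ": "10-Jan-2022"}))  # True
print(is_valid({"EmpID": "A-1", "DoJ": "10-Jan-2022"}))  # False
```

In a conditional split, the same effect is achieved by combining the per-column conditions in a single split expression.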
Nazir is a senior consultant with over 10 years of experience in data warehouse development and implementation. He has strong experience with databases, ETL tools, and Linux/Unix scripting. He has trained several batches on Azure SQL.