We listened to your feedback and are constantly striving to improve the Azure Data Factory service. Earlier today, we released a new capability in the Azure portal that enables use-case-based samples to be deployed to a data factory in just a few clicks. Deploying a sample takes less than five minutes.
We hope this makes learning about Data Factory easier and more hands-on. The samples deploy as standard pipelines in a data factory, so you can view their source in the ADF Editor to learn how they work and edit or customize them to suit your needs.
Launching the Samples
1. Create a new data factory by following the instructions in Step 1 of this tutorial, or use an existing data factory.
2. On the DATA FACTORY blade, click the Sample pipelines tile as shown in the following image:
Figure 1 – Samples section in the Data Factory landing page
Select the Sample
The Sample pipelines blade shows the list of samples that can be deployed to your data factory. We are starting with the Customer Profiling sample. This sample showcases how a gaming company leverages Azure Data Factory to operationalize the processing of semi-structured game logs in Azure HDInsight to gain insights into customer preferences, usage behavior, and marketing campaign effectiveness. We are iterating rapidly and will release new samples in the coming weeks. If you'd like to see specific samples added, please click Suggest a sample; we'd love to hear your suggestions!
Figure 2 – Sample selection
Select Storage Resources
1. When you click Customer profiling on the Sample pipelines blade, you will see the SAMPLE PIPELINE blade for the sample. Read the description on this blade to understand what the sample does and the operations that are performed when you deploy it.
2. Next, specify values for the sample's configuration parameters before deploying it. All existing Azure storage accounts and Azure SQL servers/databases from your active subscriptions are listed in the blade. From the drop-down lists, select the storage account and the server/database combination that you want to use with the sample. If you do not have an existing Azure storage account or Azure SQL database, use the Create new links to navigate to the Azure Storage and/or Azure SQL Database creation flows, create the resources, and then restart the sample deployment workflow.
3. Click the Create button to deploy the sample.
Figure 3 – Sample description and resource configuration
Deployment
After you complete the configuration of storage resources for the sample and click the Create button, the deployment process begins. It performs the following operations:
- Uploads the sample data to your Azure storage account
- Creates a table in the Azure SQL database
- Deploys all the Data Factory entities (linked services, datasets, and pipelines) corresponding to the sample (see the sketch below)
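Each of these entities is a small JSON document. As a minimal sketch, assuming a hypothetical entity name and placeholder credentials (the exact schema can vary with the service version), an Azure Storage linked service might look like this:

{
    "name": "StorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<yourAccountName>;AccountKey=<yourAccountKey>"
        }
    }
}

The deployment fills in definitions like this using the storage account and database you selected in the previous step.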
Within five minutes, the sample is deployed and running in the data factory. You will see a Deployment succeeded message, as shown in the following image.
Figure 4 – Successful sample deployment
You can view the deployed entities (linked services, datasets, and pipelines) in the DATA FACTORY blade for your data factory.
Figure 5 – Data Factory summary page
Monitor the sample pipelines
You can see the end-to-end data integration workflow in the Data Factory Diagram view and use the rich monitoring capabilities to monitor the datasets and pipelines. To navigate to the Diagram view, click the Deployment succeeded hot spot on the sample icon or click Diagram on the DATA FACTORY summary blade.
Figure 6 – Hotspot on the sample selection for easy navigation
Double-click a dataset in the Diagram view to see all the slices for that dataset and their statuses, as shown in the following image.
Figure 7 – Data Factory diagram view
Figure 8 – Dataset status
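The slices you see in the monitoring views come from each dataset's availability section, which defines how often a slice is produced. As an illustrative sketch (the dataset name, folder path, and schema details here are assumptions, not the sample's actual definitions), an hourly blob dataset might look like this:

{
    "name": "RawGameEventsDataset",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        "typeProperties": {
            "folderPath": "adfcustomerprofilingsample/logs/rawgameevents/"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}

With frequency set to Hour and interval set to 1, Data Factory produces one slice per hour, and each slice moves through statuses such as Pending execution, In progress, and Ready.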
Now that the sample pipelines are deployed and running successfully, you can open them in the Azure Data Factory Editor to view the JSON files for all the deployed entities and learn their format and configuration. To learn more about Data Factory JSON scripting, click here.
Figure 9 – Editor launch tile
Figure 10 – View and edit JSON files in the Editor
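To give a feel for what you will see in the Editor, here is a rough sketch of a pipeline that runs a Hive script on HDInsight. All names, paths, and dates below are hypothetical, and the activity schema may differ from what the deployed sample actually contains:

{
    "name": "ProcessGameLogsPipeline",
    "properties": {
        "activities": [
            {
                "name": "TransformGameLogs",
                "type": "HDInsightHive",
                "linkedServiceName": "HDInsightLinkedService",
                "inputs": [ { "name": "RawGameEventsDataset" } ],
                "outputs": [ { "name": "ProcessedGameEventsDataset" } ],
                "typeProperties": {
                    "scriptPath": "adfcustomerprofilingsample/scripts/transformgamelogs.hql",
                    "scriptLinkedService": "StorageLinkedService"
                }
            }
        ],
        "start": "2015-01-01T00:00:00Z",
        "end": "2015-01-02T00:00:00Z"
    }
}

Editing a deployed sample to work with your own data is largely a matter of changing values like these: the input and output dataset names, the script path, and the active period.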
View the output
Once all the pipelines execute successfully and the datasets are marked Ready, you will see the following folder structure in your Azure storage account under the adfcustomerprofilingsample container:
Figure 11 – View raw and processed logs in the Azure Storage account
The Hive and Pig scripts used in this sample are located in the scripts folder.
Navigate to the Azure SQL database that you selected on the resource configuration blade and query the MarketingCampaignEffectiveness table to see the final output data copied there by an Azure Data Factory copy pipeline.
Figure 12 – View final output in the Azure SQL database table
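The final copy step is itself just another JSON-defined activity inside a pipeline. As an illustrative sketch (the activity and dataset names are hypothetical, and the source/sink schema may vary with the service version), a blob-to-SQL copy activity might look like this:

{
    "name": "EgressToSqlCopyActivity",
    "type": "Copy",
    "inputs": [ { "name": "MarketingCampaignEffectivenessBlobDataset" } ],
    "outputs": [ { "name": "MarketingCampaignEffectivenessSqlDataset" } ],
    "typeProperties": {
        "source": { "type": "BlobSource" },
        "sink": { "type": "SqlSink" }
    }
}

Because the output dataset points at the MarketingCampaignEffectiveness table through the Azure SQL linked service, the results of the Hive and Pig processing land in the database you configured earlier.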
Congratulations, you have now deployed the Customer Profiling sample successfully. You can use all the entities deployed as part of the Customer Profiling sample as templates and make them work with your own data by making simple edits to the JSON files and the Hive and Pig scripts to meet your requirements.
For more details about this sample, see this tutorial on Azure.com.
With the ADF Editor in March and the simplified sample deployments now, our goal is to make the onboarding and deployment experience friction-free and intuitive, enabling you to easily deploy and execute pipelines. We look forward to hearing your feedback. If you are missing specific functionality or encounter any issues, please visit the Azure Data Factory Forums and provide your feedback.