The Azure DocumentDB Data Migration Tool is an open source solution that imports data to DocumentDB, Azure’s NoSQL document database service. With the latest release of the tool, you can now ingest large volumes of data from a variety of sources even faster by partitioning the data across multiple collections during import.
We’ve also enhanced the tool’s existing support for importing CSV files as well as its ability to log import errors. You can find full details on how to use the tool, including command line samples for each data source option, in my article here; keep reading for a brief overview of the new features.
Partitioning Support
The Migration Tool now supports reading from and writing to multiple DocumentDB collections. To read from multiple collections, simply provide a regular expression to match one or more collections.
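To illustrate how a regular expression can select several collections at once, here is a minimal Python sketch. It is not the tool's code: the function name `select_collections` and the collection names are hypothetical, and it assumes the pattern must match the whole collection name, which the tool may or may not require.

```python
import re

def select_collections(pattern, available):
    """Return the collection names that the pattern matches in full."""
    regex = re.compile(pattern)
    # Assumption: the whole name must match; the tool's matching rules may differ.
    return [name for name in available if regex.fullmatch(name)]

# Hypothetical collection names, for illustration only.
names = ["orders2014", "orders2015", "customers"]
print(select_collections(r"orders\d+", names))  # ['orders2014', 'orders2015']
```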
To partition data across multiple collections during import, specify a set of existing collections (or a naming pattern for the tool to use when creating collections*) and provide the property to use as the Partition Key.
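The idea of routing documents by Partition Key can be sketched in a few lines of Python. This post does not describe how the tool maps key values to collections, so the sketch assumes a simple hash-and-modulo scheme; `pick_collection`, `partition`, and the sample data are hypothetical names used only for illustration.

```python
import hashlib
import json

def pick_collection(document, partition_key, collections):
    """Choose a target collection from the document's partition key value."""
    value = document[partition_key]
    # Serialize the value so strings, numbers, and booleans hash consistently.
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(encoded).hexdigest()
    # Assumption: hash modulo collection count; the tool's scheme may differ.
    return collections[int(digest, 16) % len(collections)]

def partition(documents, partition_key, collections):
    """Group documents by the collection each one would be written to."""
    buckets = {name: [] for name in collections}
    for doc in documents:
        buckets[pick_collection(doc, partition_key, collections)].append(doc)
    return buckets

# Hypothetical documents and collection names.
docs = [{"id": "1", "city": "Seattle"}, {"id": "2", "city": "Paris"},
        {"id": "3", "city": "Seattle"}]
buckets = partition(docs, "city", ["data0", "data1", "data2"])
```

Whatever the scheme, the useful property is that documents sharing a Partition Key value always land in the same collection, as the two "Seattle" documents do here.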
Enhanced CSV Support
When importing CSV files, the tool attempts to infer type information for unquoted values (quoted values are always treated as strings). Types are identified in the following order: number, datetime, boolean.
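The inference order can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the function name `infer_value` is hypothetical, the caller is assumed to know whether a field was quoted, and the accepted number and datetime formats (plain decimals and ISO 8601 dates) are guesses, since the post does not list the formats the tool recognizes.

```python
import re
from datetime import datetime

# Assumption: numbers are plain decimals with an optional exponent.
NUMBER = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")

def infer_value(raw, quoted=False):
    """Infer a typed value for one CSV field: number, then datetime, then boolean."""
    if quoted:
        return raw  # quoted values are always treated as strings
    text = raw.strip()
    if NUMBER.fullmatch(text):
        return int(text) if re.fullmatch(r"-?\d+", text) else float(text)
    try:
        # Assumption: only ISO 8601 dates and datetimes are recognized here.
        return datetime.fromisoformat(text)
    except ValueError:
        pass
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    return raw  # nothing matched, so keep the original string

print(infer_value("42"))               # 42
print(infer_value("true"))             # True
print(infer_value("42", quoted=True))  # 42 (still a string)
```

Checking numbers first matters: an unquoted `42` is imported as a number, while `"42"` stays a string.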
Redirect Import Errors to CSV
We’ve added an advanced configuration screen where you can specify the location of the log file to which any import errors are written.
The updated migration tool source code is available on GitHub, and an updated compiled version is available from the Microsoft Download Center. You may either compile the solution or simply download and extract the compiled version to a directory of your choice. We’d also love to hear about additional sources you would like to see in the Data Migration Tool, so please submit your feedback. To learn more about DocumentDB, please visit our service page.
*Note: Collections are billable entities in DocumentDB and, as such, there is a pricing implication to creating multiple collections. See the DocumentDB pricing page for more information.