Inside Azure DocumentDB is a write-optimized, schema-agnostic database engine purpose-built for JSON and JavaScript. DocumentDB does not require developers to provide any schema or secondary indexes in order to index documents. This allows you to quickly define, iterate on, and query application data models using DocumentDB.
By default, as you add documents to a collection, DocumentDB automatically indexes all properties of the documents synchronously before acknowledging the writes to the client. As you add or update documents in a collection, DocumentDB’s database engine converts the documents into trees and efficiently indexes, de-duplicates, replicates, and durably persists the paths of the trees (on a write quorum of replicas). By treating documents as trees and efficiently indexing tree structures, DocumentDB remains agnostic of schemas and does not require any secondary indexes.
Customizing the index
Every DocumentDB collection has an associated “indexing policy” that controls aspects of index management, including index storage overhead, write and query throughput, and query consistency. By default, the indexing policy of a DocumentDB collection automatically indexes all properties of all documents and provides consistent queries. This default policy strikes a balance between index storage overhead and write/query throughput while serving consistent queries. If you want to make finer-grained tradeoffs among index storage overhead, throughput, and query consistency, you can customize the index associated with a collection by updating its indexing policy.
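To make the idea of a policy more concrete, here is a minimal Python sketch that builds an indexing policy as a plain dictionary. The property names (indexingMode, automatic, includedPaths, excludedPaths) are assumptions based on the JSON representation of the policy, the helper make_indexing_policy is hypothetical, and the excluded path /body/* is an invented example. Check the documentation for the exact schema before relying on it.

```python
import json


def make_indexing_policy(mode="consistent", excluded_paths=()):
    """Build an indexing policy as a plain dict (hypothetical helper).

    The field names are assumed from the JSON form of the policy;
    they are not taken from the article itself.
    """
    return {
        "indexingMode": mode,                # assumed values: "consistent" or "lazy"
        "automatic": True,                   # index documents as they are written
        "includedPaths": [{"path": "/*"}],   # index every property by default
        "excludedPaths": [{"path": p} for p in excluded_paths],
    }


# Example: keep the defaults but skip a large free-text property
# ("/body/*" is an invented path) to save index storage.
policy = make_indexing_policy(excluded_paths=["/body/*"])
print(json.dumps(policy, indent=2))
```

Excluding properties you never query is one way to trade query flexibility for lower index storage and cheaper writes.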
Online and in-place index transformation
Previously, indexing policies could be set only on the creation of a collection. Today, we are excited to announce that indexing policies can now be modified on existing collections with the release of the online index transformations feature. The index transformation resulting from a change in the indexing policy is performed online for availability and in-place for storage efficiency.
When you change the indexing policy of a collection, DocumentDB performs an index transformation from the old policy to the new one. This transformation is performed asynchronously and online, so the collection remains available for writes while the transformation is in progress. Documents written to the collection after the indexing policy is updated are available for querying even while the old index is still being transformed into the new one behind the scenes. The index transformation is also designed to be performed in situ (in place), so no extra disk space is required or consumed during indexing policy changes. The provisioned throughput of your collection is fully available during the transformation, which means the performance of your apps will not be affected.
If you’re using the .NET SDK, you can make an indexing policy change using the new ReplaceDocumentCollectionAsync method. Then you can track the percentage progress of the index transformation using the IndexTransformationProgress response property from a ReadDocumentCollectionAsync call. Other SDKs and the REST API support equivalent properties and methods for making indexing policy changes and tracking progress of an ongoing index transformation.
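The progress-tracking pattern described above can be sketched as a simple polling loop. The Python sketch below does not call any SDK: read_progress is a hypothetical callable that you would implement with your SDK's collection read call, returning the reported transformation progress as a percentage from 0 to 100. The function name, the polling interval, and the timeout behavior are all assumptions made for illustration.

```python
import time
from typing import Callable


def wait_for_index_transformation(
    read_progress: Callable[[], int],
    poll_seconds: float = 1.0,
    max_polls: int = 600,
) -> int:
    """Poll until the reported index transformation progress reaches 100.

    read_progress is a hypothetical callback; in a real application it
    would read the collection and return the progress percentage.
    """
    for _ in range(max_polls):
        progress = read_progress()
        if progress >= 100:
            return progress
        time.sleep(poll_seconds)
    raise TimeoutError("index transformation did not finish in time")


# Usage with a fake progress source standing in for a real collection read.
fake_progress = iter([0, 40, 85, 100])
result = wait_for_index_transformation(lambda: next(fake_progress), poll_seconds=0)
print(result)
```

Because the collection stays available during the transformation, polling like this is only needed when a later step depends on the new index being complete.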
When would you make indexing policy changes to your DocumentDB collections? The following are some of the most common use cases:
- When using new indexing features on your current DocumentDB collections, such as Order By and string range queries, which require the newly introduced string range index kind.
- When reducing the throughput consumed by writes, as well as the storage space used, by hand-selecting the properties to be indexed and changing them over time, or by varying the index precision of individual properties.
- When importing bulk data using lazy indexing mode for faster writes, then switching to consistent indexing for normal operation.
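Two of these use cases can be sketched as small changes to a policy document. The Python example below is illustrative only: the policy field names, the "lazy" and "consistent" mode values, and the string range index entry (kind "Range", dataType "String", precision -1) are assumptions about the JSON form of the policy, the two helper functions are hypothetical, and /name/? is an invented path.

```python
import copy


def with_indexing_mode(policy: dict, mode: str) -> dict:
    """Return a copy of the policy with a different indexing mode."""
    updated = copy.deepcopy(policy)
    updated["indexingMode"] = mode
    return updated


def with_string_range_index(policy: dict, path: str) -> dict:
    """Return a copy of the policy with a string range index on one path."""
    updated = copy.deepcopy(policy)
    updated["includedPaths"].append({
        "path": path,
        # Assumed shape of a string range index entry.
        "indexes": [{"kind": "Range", "dataType": "String", "precision": -1}],
    })
    return updated


# Policy used while bulk importing: lazy indexing for faster writes.
bulk_import_policy = {
    "indexingMode": "lazy",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [],
}

# After the import, switch to consistent indexing for normal operation.
steady_state_policy = with_indexing_mode(bulk_import_policy, "consistent")

# Later, add a string range index so Order By can be used on a string property.
order_by_policy = with_string_range_index(steady_state_policy, "/name/?")
```

Each new policy would then be applied to the existing collection, which triggers an online index transformation.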
Get started with some of DocumentDB’s latest query features and indexing policy options by downloading version 1.3.0 of the .NET SDK from NuGet here or one of the other supported platforms (Node.js, Java, Python or JavaScript) here. We also created a GitHub project containing code samples for creating and modifying indexing policies.
If you need any help or have questions, please reach out to us on the developer forums on Stack Overflow, or schedule a 1:1 chat with the DocumentDB engineering team.
To stay up to date on the latest DocumentDB news and features, follow us on Twitter @DocumentDB.