Performance Tips for Azure DocumentDB – Part 1
Azure DocumentDB allows you to optimize the performance of your database to best meet the needs of your application. We’ve prepared a two-part blog series covering a number of areas that influence the performance of Azure DocumentDB.
In part 1, we look at the networking and SDK configuration options available in DocumentDB and their impact on performance. In part 2, we will cover indexing policy, throughput optimization and consistency levels. As with any performance tuning advice, not every tip will apply to your use case, but you can use this information as a guide to making the right design choices for your applications.
Networking
Networking Tip #1: Connection Policy – Use direct connection mode for better performance
First, let’s take a look at Connection Policy. How a client connects to Azure DocumentDB has important implications for performance, especially for observed client-side latency. Client Connection Policy has two key configuration settings: the connection mode and the connection protocol. The two available modes are:
- Gateway Mode (default)
- Direct Mode
Since DocumentDB is a distributed storage system, DocumentDB resources like collections and documents are partitioned across numerous machines, and each partition is replicated for high availability. The logical-to-physical address translation is kept in a routing table, which is also internally available as a resource.
In Gateway Mode, the DocumentDB gateway machines perform this routing, thereby allowing client code to be simple and compact. A client application issues requests to the DocumentDB gateway machines, which translate the logical URI in the request to the physical address of the backend node, and forward the request appropriately. Conversely, in Direct Mode clients must maintain – and periodically refresh – a copy of this routing table, and then directly connect to the backend DocumentDB nodes.
Gateway Mode is supported on all SDK platforms and is the configured default. If your application runs within a corporate network with strict firewall restrictions, Gateway Mode is the best choice, since it uses the standard HTTPS port and a single endpoint. The performance tradeoff, however, is that Gateway Mode involves an additional network hop every time data is read from or written to DocumentDB. Direct Mode avoids this extra hop and therefore offers better performance.
Direct Mode is currently supported only in the .NET SDK, but will be available on other platforms in subsequent SDK refreshes. Read more about client connectivity options in the DocumentDB documentation.
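To make the hop difference concrete, here is a toy model in C#. It is purely illustrative and makes no claim about DocumentDB internals: the routing table entries, the backend addresses, and the helpers `ResolveAddress`, `GatewayRead` and `DirectRead` are all hypothetical. The only thing it models is where the logical-to-physical translation happens and how many one-way hops a request takes in each mode.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical routing table: logical resource link -> physical backend address.
// The entries are invented for illustration only.
var routingTable = new Dictionary<string, string>
{
    ["/dbs/db1/colls/orders"] = "10.0.0.11:10250",
    ["/dbs/db1/colls/users"] = "10.0.0.12:10250",
};

// Logical-to-physical address translation against the routing table.
string ResolveAddress(string logicalLink) =>
    routingTable.TryGetValue(logicalLink, out var address)
        ? address
        : throw new KeyNotFoundException($"No route for {logicalLink}");

// Gateway Mode: the client only knows the gateway endpoint. The gateway
// resolves the address and forwards the request, so the request takes two hops.
(string Backend, int Hops) GatewayRead(string logicalLink)
{
    const int clientToGateway = 1;
    const int gatewayToBackend = 1;
    string backend = ResolveAddress(logicalLink); // performed on the gateway
    return (backend, clientToGateway + gatewayToBackend);
}

// Direct Mode: the client resolves the address from its own cached copy of the
// routing table and connects to the backend node in a single hop.
(string Backend, int Hops) DirectRead(string logicalLink)
{
    string backend = ResolveAddress(logicalLink); // performed on the client
    return (backend, 1);
}

var viaGateway = GatewayRead("/dbs/db1/colls/orders");
var viaDirect = DirectRead("/dbs/db1/colls/orders");
Console.WriteLine($"Gateway Mode: {viaGateway.Hops} hops to {viaGateway.Backend}");
Console.WriteLine($"Direct Mode: {viaDirect.Hops} hop to {viaDirect.Backend}");
```

Both modes reach the same backend node; the difference is only who holds the routing table and therefore how many hops sit between the client and the data.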
Networking Tip #2: Connection Policy – Use the TCP protocol for better performance
When leveraging Direct Mode, there are two protocol options available:
- TCP
- HTTPS
DocumentDB offers a simple and open RESTful programming model over HTTPS. In addition, it offers an efficient TCP protocol that is likewise RESTful in its communication model and is available through the .NET client SDK. For best performance, use the TCP protocol whenever possible.
The connection mode is configured when constructing the DocumentClient instance, through the ConnectionPolicy parameter. If Direct Mode is used, the protocol can also be set in the ConnectionPolicy parameter:
using System;
using Microsoft.Azure.Documents.Client;

// Replace the endpoint and key with the values for your DocumentDB account.
var serviceEndpoint = new Uri("https://contoso.documents.azure.com:443/");
var authKey = "your authKey from the Azure management portal";
DocumentClient client = new DocumentClient(serviceEndpoint, authKey,
    new ConnectionPolicy
    {
        ConnectionMode = ConnectionMode.Direct, // connect to backend nodes directly
        ConnectionProtocol = Protocol.Tcp       // use TCP instead of HTTPS
    });
Because TCP is supported only in Direct Mode, Gateway Mode always uses HTTPS to communicate with the gateway, and the Protocol value in the ConnectionPolicy is ignored.
Networking Tip #3: Call OpenAsync to avoid startup latency on the first request
By default, the first request has higher latency because the client must first fetch the address routing table. To avoid this startup latency on the first request, call OpenAsync() once during initialization:
await client.OpenAsync();
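The effect of OpenAsync() can be illustrated with a small simulation. This is a sketch under stated assumptions: it does not use the DocumentDB SDK, the delays are made-up numbers, and `FetchRoutingTableAsync` and `SendRequestAsync` are hypothetical stand-ins for the routing-table fetch and a data request. The point is only that the one-time fetch runs during initialization, so no user-facing request pays for it.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

int routingTableFetches = 0;

// Hypothetical stand-in for the metadata round trip that fetches the routing table.
async Task<string> FetchRoutingTableAsync()
{
    routingTableFetches++;
    await Task.Delay(200);                 // made-up metadata latency
    return "routing-table";
}

// The fetch runs once; every later caller awaits the same completed task.
var routingTable = new Lazy<Task<string>>(FetchRoutingTableAsync);

// Hypothetical data request: ensures the routing table is available (fetching it
// on first use), performs the read, and reports how long the caller waited.
async Task<long> SendRequestAsync()
{
    var stopwatch = Stopwatch.StartNew();
    await routingTable.Value;              // costs nothing once the table is cached
    await Task.Delay(5);                   // made-up data round-trip latency
    return stopwatch.ElapsedMilliseconds;
}

// Initialization step: plays the role of `await client.OpenAsync();`.
await routingTable.Value;

long firstRequestMs = await SendRequestAsync();
long secondRequestMs = await SendRequestAsync();
Console.WriteLine($"Fetches: {routingTableFetches}, first: {firstRequestMs} ms, second: {secondRequestMs} ms");
```

Without the initialization line, the first call to `SendRequestAsync` would absorb the simulated 200 ms fetch, which is the startup latency this tip avoids.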
Networking Tip #4: Collocate clients in the same Azure region for better performance
Network latency between issuing a request to DocumentDB and receiving the response can be high, and it can vary from request to request depending on the route the request takes from the client to the Azure datacenter boundary. The lowest possible latency is achieved by ensuring that the calling application is located in the same Azure region as the provisioned DocumentDB endpoint.
(Figure: Deployment considerations illustrated)
SDK Usage
SDK Usage Tip #1: Use a singleton DocumentDB client for the lifetime of your application
Each DocumentClient instance is thread-safe and performs efficient connection management and address caching when operating in Direct Mode. To benefit from this, use a single DocumentClient instance per AppDomain for the lifetime of the application.
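One common way to follow this advice in C# is to create the client lazily, exactly once, and share it. The sketch below uses `Lazy<T>` with `LazyThreadSafetyMode.ExecutionAndPublication`, which guarantees the factory runs only once even under concurrent access. To keep it runnable without the SDK, `CreateClient` is a hypothetical factory returning a placeholder object; in a real application it would construct the `DocumentClient` (as in the ConnectionPolicy example earlier in this post), and the `Lazy<T>` would typically live in a `static readonly` field.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

int clientsCreated = 0;

// Hypothetical factory; a real application would return
// new DocumentClient(serviceEndpoint, authKey, connectionPolicy) here.
object CreateClient()
{
    Interlocked.Increment(ref clientsCreated);
    return new object();
}

// One shared instance for the lifetime of the application; the factory runs at most once.
var sharedClient = new Lazy<object>(CreateClient, LazyThreadSafetyMode.ExecutionAndPublication);

// Simulate many concurrent requests that all ask for the client.
var seen = new ConcurrentBag<object>();
Parallel.For(0, 100, _ => seen.Add(sharedClient.Value));

Console.WriteLine($"Clients created: {clientsCreated}");
```

Every concurrent caller receives the same instance, so connections and cached addresses built up by that one client are reused instead of being rebuilt per request.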
SDK Usage Tip #2: Cache document and collection SelfLinks for lower read latency
In Azure DocumentDB, each document has a system-generated selfLink. SelfLinks are guaranteed to be unique and immutable for the lifetime of the document, and reading a document by its selfLink is the most efficient way to retrieve a single document. Because selfLinks are immutable, cache them whenever possible for the best read performance:
Document document = await client.ReadDocumentAsync("/dbs/1234/colls/1234354/docs/2332435465");
That said, it may not always be possible for the application to work with a document’s selfLink for reads. In that case, the next most efficient way to retrieve a document is to query by its user-provided Id property. For example:
using System.Linq;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

// Query by the user-provided Id and read results page by page until a match is found.
IDocumentQuery<Document> query = (from doc in client.CreateDocumentQuery<Document>(colSelfLink)
                                  where doc.Id == "myId"
                                  select doc).AsDocumentQuery();
Document myDocument = null;
while (query.HasMoreResults)
{
    FeedResponse<Document> res = await query.ExecuteNextAsync<Document>();
    if (res.Count != 0)
    {
        myDocument = res.Single();
        break;
    }
}
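Combining the two approaches, an application can pay for the Id query once per document and read by selfLink afterwards. The sketch below shows only the caching logic: `QuerySelfLinkById` is a hypothetical stand-in for the query above and returns made-up selfLinks, and the cache is a `ConcurrentDictionary` keyed by the user-provided Id. Because a selfLink is immutable for the lifetime of its document, a cached entry stays valid until the document is deleted.

```csharp
using System;
using System.Collections.Concurrent;

int queriesIssued = 0;

// Hypothetical stand-in for the Id query; a real selfLink comes from the query result.
string QuerySelfLinkById(string id)
{
    queriesIssued++;
    return $"/dbs/1234/colls/1234354/docs/{queriesIssued}"; // made-up selfLink
}

// Cache keyed by the user-provided Id.
var selfLinkCache = new ConcurrentDictionary<string, string>();

string GetSelfLink(string id) => selfLinkCache.GetOrAdd(id, QuerySelfLinkById);

// A deleted (and possibly re-created) document gets a new selfLink, so evict on delete.
void OnDocumentDeleted(string id) => selfLinkCache.TryRemove(id, out _);

string first = GetSelfLink("myId");   // miss: runs the query
string second = GetSelfLink("myId");  // hit: served from the cache
OnDocumentDeleted("myId");            // document deleted: drop the stale entry
string third = GetSelfLink("myId");   // miss again: runs the query
Console.WriteLine($"Queries issued: {queriesIssued}");
```

Note that `GetOrAdd` may invoke the factory more than once if several threads miss on the same key at the same moment; for a lookup like this that only costs an extra query, not a wrong result.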
SDK Usage Tip #3: Tune page size for queries/read feeds for better performance
When performing a bulk read of documents using read feed functionality (i.e., ReadDocumentFeedAsync) or when issuing a DocumentDB SQL query, the results are returned in segments if the result set is too large. By default, results are returned in pages of up to 100 items or 1 MB, whichever limit is hit first.
To reduce the number of network round trips required to retrieve all applicable results, you can increase the page size to up to 1000 items using the x-ms-max-item-count request header. If you need to display only a few results, for example when your user interface or application API returns only ten results at a time, you can also decrease the page size to 10 to reduce the throughput consumed by reads and queries.
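The effect of the page size on round trips is simple arithmetic: each page holds at most `MaxItemCount` items and at most 1 MB, so the number of pages is the result count divided by the effective items per page, rounded up. The helper below is a hypothetical back-of-the-envelope estimate, not an SDK API; it assumes every document has the same size and treats 1 MB as 1,048,576 bytes of document data, so real page counts will differ somewhat.

```csharp
using System;

const long MaxPageBytes = 1024 * 1024; // assumed 1 MB response limit per page

// Estimated round trips to drain a result set, assuming uniform document size.
long EstimateRoundTrips(long totalItems, int maxItemCount, long avgItemBytes)
{
    long itemsPerPageBySize = Math.Max(1, MaxPageBytes / avgItemBytes);
    long itemsPerPage = Math.Min(maxItemCount, itemsPerPageBySize);
    return (totalItems + itemsPerPage - 1) / itemsPerPage; // ceiling division
}

// 5,000 small (1 KB) documents: the item count is the binding limit.
Console.WriteLine(EstimateRoundTrips(5_000, 100, 1_024));   // default page size
Console.WriteLine(EstimateRoundTrips(5_000, 1_000, 1_024)); // larger pages
// 5,000 larger (4 KB) documents: the 1 MB cap limits each page to 256 items.
Console.WriteLine(EstimateRoundTrips(5_000, 1_000, 4_096));
```

Under these assumptions, raising the page size from 100 to 1000 cuts 50 round trips to 5 for small documents, while for 4 KB documents the 1 MB cap becomes the binding limit and about 20 round trips remain.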
You may also set the page size using the available DocumentDB SDKs. For example:
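using System.Linq;
using Microsoft.Azure.Documents.Client;

// Request up to 1,000 results per page instead of the default 100.
IQueryable<dynamic> authorResults = client.CreateDocumentQuery(
    documentCollection.SelfLink,
    "SELECT p.Author FROM Pages p WHERE p.Title = 'About Seattle'",
    new FeedOptions { MaxItemCount = 1000 });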
Wrapping Up
We hope you found one or more of these tips both useful and applicable to your usage of DocumentDB. In part 2 of this blog series we will continue the discussion and cover indexing policies, throughput optimization and consistency levels. In addition, we’d love to hear from you about the DocumentDB features and experiences you would find most valuable. Please submit your suggestions on the Microsoft Azure DocumentDB feedback forum. If you haven’t tried DocumentDB yet, then get started here.