Hello SIGMOD attendees!
Welcome to Houston, and to what is shaping up to be a great conference. I wanted to take this opportunity to share with you some of the exciting work in data that’s going on in the Azure Data team at Microsoft, and to invite you to take a closer look.
Microsoft has long been a leader in database management with SQL Server, recognized as the top DBMS by Gartner for the past three years in a row. The emergence of the cloud and edge as the new frontiers for computing, and thus data management, is an exciting direction—data is now dispersed within and beyond the enterprise, on-prem, on-cloud, and on edge devices, and we must enable intelligent analysis, transactions, and responsible governance for all data everywhere, from the moment it is created to the moment it is deleted, through the entire life-cycle of ingestion, updates, exploration, data prep, analysis, serving, and archival.
These trends require us to fundamentally re-think data management. Transactional replication can span continents. Data is not just relational. Interactive, real-time, and streaming applications with enterprise-level SLAs are becoming common. Machine learning is a foundational analytic task and must be supported while ensuring that all data governance policies are enforced, taking full advantage of advances in deep learning and hardware acceleration via FPGAs and GPUs. Microsoft is a very data-driven company, and to support product teams such as Bing, Office, Skype, Windows, and Xbox, the Azure Data team operates some of the world’s largest data services for our internal customers. This grounds our thinking about how data management is changing, through close collaborations with some of the most sophisticated application developers on the planet. Of course, even in a company like Microsoft, the majority of data owners and users are domain experts, not database experts. This means that even though the underlying complexity of data management is growing rapidly, we need to greatly simplify how users deal with their data. Indeed, a large cloud database service contains millions of customer databases, and requires us to handle many of the tasks previously handled by database administrators.
We are excited about the opportunity to re-imagine data management. We are making broad and deep investments in SQL Server and open source technologies, in Azure data services that leverage them, as well as in new built-for-cloud technologies. We have an ambitious vision of the future of data management that embraces Data Lakes and NoSQL, on-prem, in the cloud, and on edge devices. There has never been a better time to be part of database systems innovation at Microsoft, and we invite you to explore the opportunities to be part of our team.
This is an exciting time in our industry with many companies competing for talent and customers, and as you consider options, we want to highlight how Microsoft is differentiated in several respects. First, we have a unique combination of a world-class data management ecosystem in SQL Server and a leading public cloud in Azure. Second, our culture puts customers first with a commitment to bringing them the best of open source technologies alongside the best of Microsoft. Third, we have a deep commitment to innovation; product teams collaborate closely with research and advanced development groups at the Cloud and Information Services Lab, the Gray Systems Lab, and Microsoft Research to go farther faster, and to maintain strong ties with the research community.
In this blog post, I’ve listed the many services and ongoing work in the Azure Data group at Microsoft, together with links that will give you a closer look. I hope you will find these of interest.
Have a great SIGMOD conference!
Rohan Kumar
CVP, Azure Data
Azure Data blogs:
The Microsoft data platform continues momentum for cloud-scale innovation and migration
Azure data blog posts by Rohan Kumar
SQL Server
Industry-leading database management system, now available on Linux/Docker and Windows, on-prem and in the cloud.
The SQL Server blog
SQL Server 2017
Azure Cosmos DB: The industry’s first globally-distributed, multi-model database service
Azure Cosmos DB is the first globally-distributed data service that lets you elastically scale throughput and storage across any number of geographical regions while guaranteeing low latency, high availability and consistency – backed by the most comprehensive SLAs in the industry. Azure Cosmos DB is built to power today’s IoT and mobile apps, and tomorrow’s AI-hungry future.
It is the first cloud database to natively support a multitude of data models and popular query APIs, is built on a novel database engine capable of ingesting sustained volumes of data and provides blazing-fast queries – all without having to deal with schema or index management. And it is the first cloud database to offer five well-defined consistency models, so you can choose just the right one for your app.
Azure Cosmos DB: The industry's first globally-distributed, multi-model database service
Azure Data Lake Analytics – An on-demand analytics job service to power intelligent action
Easily develop and run massively parallel data transformation and processing programs in U-SQL, R, Python, and .NET over petabytes of data. With no infrastructure to manage, you can process data on demand, scale instantly, and only pay per job. There is no infrastructure to worry about because there are no servers, virtual machines, or clusters to wait for, manage, or tune. Instantly scale the processing power for any job.
Azure Data Lake Analytics
Azure Data Catalog – Find, understand and govern data
Companies collectively own exabytes of data across hundreds of billions of streams, files, tables, reports, spreadsheets, VMs, and more. But for the most part, companies don't know where their data is, what's in it, how it's being used, or whether it's secure. Our job is to fix that. We are building an automated infrastructure to find and classify any form of data wherever it lives, from on-premises to the cloud, and to figure out what the data contains and how it should be managed and protected. Our global-scale infrastructure will track all of this data and provide a search engine to make the data discoverable and easily reusable. From AI to distributed systems, we are using cutting-edge technologies to make data useful.
Azure Data Factory – Managed, hybrid data integration service at scale
Create, schedule, and manage your data integration at scale. Work with data wherever it lives, in the cloud or on-premises, with enterprise-grade security. Accelerate your data integration projects by taking advantage of over 70 available data source connectors. Use the graphical user interface to build and manage your data pipelines. Transform raw data into finished, shaped data ready for consumption by business intelligence tools or custom applications. Easily lift your SQL Server Integration Services (SSIS) packages to Azure, and let Azure Data Factory manage your resources so you can increase productivity and lower total cost of ownership (TCO).
Azure Data Factory
Azure HDInsight – Managed cluster service for the full spectrum of Hadoop & Spark
Azure HDInsight is a fully-managed cluster service that makes it easy, fast, and cost-effective to process massive amounts of data. Use popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, ML Services, and more. Azure HDInsight enables a broad range of scenarios such as ETL, data warehousing, machine learning, and IoT. The service is cost-effective, powerful, and reliable: pay only for what you use, create clusters on demand, then scale them up or down. Decoupled compute and storage provide better performance and flexibility.
Azure HDInsight
Azure SQL DB – The intelligent relational cloud database service
Azure SQL Database is the world’s first intelligent database service that learns and adapts with your application, enabling you to dynamically maximize performance with very little effort on your part. Threat Detection works around the clock to learn, profile, and detect anomalous database activities, identifying potential threats to the database. Like an airplane’s flight data recorder, Query Store collects detailed historical information about all queries, greatly simplifying performance forensics by reducing the time to diagnose and resolve issues.
Azure SQL Database
Adaptive query processing support for Azure SQL Database
Improved Automatic Tuning boosts your Azure SQL Database performance
Query Store: A flight data recorder for your database
Azure SQL DW – Fast, flexible, and secure analytics platform
A fast, fully managed, elastic, petabyte-scale cloud data warehouse. Azure SQL Data Warehouse lets you independently scale compute and storage, while pausing and resuming your data warehouse within minutes through a distributed processing architecture designed for the cloud. Seamlessly create your hub for analytics along with native connectivity with data integration and visualization services, all while using your existing SQL and BI skills. Through PolyBase, Azure SQL Data Warehouse brings data lake data into the warehouse with support for rich queries over files and unstructured data.
Azure SQL Data Warehouse
Blazing fast data warehousing with Azure SQL Data Warehouse
Azure SQL Database Threat Detection, your built-in security expert
Azure SQL Data Warehouse now generally available in all Azure regions worldwide
Azure Stream Analytics – A serverless real-time analytics service to power intelligent action
Easily develop and run massively parallel real-time analytics on multiple IoT or non-IoT streams of data using a simple SQL-like language. Use custom code for advanced scenarios. With no infrastructure to manage, you can process data on demand, scale instantly, and only pay per job. Azure Stream Analytics seamlessly integrates with Azure IoT Hub and Azure IoT Suite. Azure Stream Analytics is also available on Azure IoT Edge, enabling near-real-time analytical intelligence closer to IoT devices so that they can unlock the full value of device-generated data.
Azure Stream Analytics
Gray Systems Lab (GSL) and Cloud and Information Services Lab (CISL)
The Azure Data Office of the CTO is the home of the Gray Systems Lab (GSL) and the Cloud and Information Services Lab (CISL). These are applied research groups, employing scientists and engineers focusing on everything data: from SQL Server in the cloud to massive-scale distributed systems for Big Data, from hardware-accelerated data processing to self-healing streaming solutions. The labs were founded in 2008 and 2012 respectively, have since produced 70+ papers in top database and systems conferences, have filed numerous patents, and are continuously engaged in deep academic collaborations through internships and sponsorship of academic projects. This research focus is matched with an equally strong commitment to open source (with several committers in Apache projects who together contributed over 500K LOC to open source), and continuous and deep engagement with product teams. Strong product partnerships provide an invaluable path for innovation to affect the real world: the labs’ innovations today govern hundreds of thousands of servers in our Big Data infrastructure, and ship with several cloud and on-premises products. Microsoft’s cloud-first strategy gives researchers in our group a unique vantage point, where PBs of telemetry data and first-party workloads make every step of our innovation process principled and data-driven. The labs have locations on Microsoft’s Redmond and Silicon Valley campuses, as well as on the UW-Madison campus. To hear more, contact Carlo Curino (carlo.curino@microsoft.com) or Alan Halverson (alanhal@microsoft.com).