In case you missed it: August 2018 roundup

September 6, 2018, 8:09 am

≫ Next: Powerful Debugging Tools for Spark for Azure HDInsight

≪ Previous: Helping customers shift to a modern desktop

In case you missed them, here are some articles from August of particular interest to R users.

A guide to installing R and RStudio with packages and multithreaded BLAS on various platforms.

Some tips for Excel users migrating to R for data analysis.

Videos of presentations from the New York R Conference.

The Chartmaker Directory compares and provides examples of data visualizations for dozens of tools, including R.

Siraj Raval's video overview of Azure machine learning services.

A simple R script based on the gganimate package illustrates the luminance illusion.

Roundup of AI, Machine Learning and Data Science news from August 2018.

A package to make R play text as speech.

Microsoft R Open 3.5.1 is now available.

R ranks #14 in the June 2018 Redmonk Language Rankings.

R drops one place to #7 in the 2018 IEEE Language Rankings.

A video tutorial on running R and Python in SQL Server from a Jupyter Notebook.

The cover story for Significance Magazine celebrates 25 years of the R project.

And some general interest stories (not necessarily related to R):

The Curiosity Show, the 80's Australian science program for kids.
A visualization of the prime factors of the first million integers shows surprising structure.
Brexit as the Titanic disaster.
An experimental underwater data center gets a fishcam.
A parody commercial for Australia tourism focuses on the "dangers".

As always, thanks for the comments and please send any suggestions to me at davidsmi@microsoft.com. Don't forget you can follow the blog using an RSS reader, via email using blogtrottr, or by following me on Twitter (I'm @revodavid). You can find roundups of previous months here.

↧

Powerful Debugging Tools for Spark for Azure HDInsight

September 6, 2018, 2:00 am

≫ Next: Exciting new capabilities on Azure HDInsight

≪ Previous: In case you missed it: August 2018 roundup

Microsoft runs one of the largest big data cluster in the world – internally called “Cosmos”. This runs millions of jobs across hundreds of thousands of servers over multiple Exabytes of data. Being able to run and manage jobs of this scale by developers was a huge challenge. Jobs with hundreds of thousands of vertices are common and to even quickly figure out why a job runs slow or narrow down bottlenecks was a huge challenge. We built powerful tools that graphically show the entire job graph including the various vertex execution times, playback etc. which helped developers greatly. While this was built for our internal language in Cosmos (called Scope), we are working very hard to bring this power to all Spark developers.

Today, we are delighted to announce the Public Preview of the Apache Spark Debugging Toolset for HDInsight for Spark 2.3 cluster and forward. The default Spark history server user experience is now enhanced in HDInsight with rich information on your spark jobs with powerful interactive visualization of Job Graphs & Data Flows. The new features greatly assist HDInsight Spark developers in job data management, data sampling, job monitoring and job diagnosis.

Spark History Server Enhancements

The Spark History Server Experience in HDInsight now features two new tabs: Graph and Data.

Graph Tab – Job graph is a powerful interactive visualization of your jobs. This interface enables innovative debugging experiences such as playback, heatmap by progress of job stages, read, written for Spark application and individual jobs.

The Spark job graph displays Spark job executions details with data input and output across stages. For completed jobs, the Spark job graph allows Spark developer to playback the job by progress, data read and written with details. You can now dwell in Spark job diagnosis around performance, data and execution time using this experience which articulates various stage outliers.

Data Tab – Job specific input, output data view, search, download, preview, data copy, data URL copy, data export to CSV, as well as table operations view are visualized in the data tab.

As a developers or data scientist, you can perform various actions such as preview, download, copy and export to CSV file of the data. You can also come here to partially download data as sample data for your local run and local debug. Metadata interpretation and correlation has always been a challenge in debugging. A cool feature has also been added around Table Operations, you are able to view the Hive metadata, investigate table operations at each stage to gain more insights for better troubleshooting and spark job analysis.

Developer Nirvana

HDInsight Spark developers can greatly increase their productivity by leveraging these capabilities:

Preview and download Spark job input and output data, as well as view Spark job table operations.
View and playback Spark application / job graph by progress, data read and written.
Identify the Spark application execution dependencies among stages and jobs for performance tuning.
View data read/write heatmap, identify the outliers and the bottlenecking stage and job for Spark job performance diagnosis.
View Spark job/stage data input/output size and time duration for performance optimization.
Locate failed stage and drill down for failed tasks details for debugging.

Getting Started with Apache Spark Debugging Toolset

These features have been built into HDInsight Spark history server.

Access from the Azure portal - Open the Spark cluster, click Cluster Dashboard from Quick Links, and then click Spark History Server.

Access by URL - Open the Spark History Server.

More features to come

Critical path analysis for Spark application and job
Spark job diagnosis
- Data Skew and Time Skew Analysis
- Executor Usage Analysis
Debugging on failed job

Feedback

We look forward to your comments and feedback. If there is any feature request, customer ask, or suggestion, please send us a note to hdivstool@microsoft.com. For bug submission, please open a new ticket using the template.

For more information, check out the following:

Learn more about today’s announcements on the Azure blog and Big Data blog, and discover more Azure service updates

↧

Exciting new capabilities on Azure HDInsight

September 6, 2018, 3:00 am

≫ Next: Where is the market opportunity for developers of IoT solutions?

≪ Previous: Powerful Debugging Tools for Spark for Azure HDInsight

Friends of Azure HDInsight, it's been a busy summer. I wanted to summarize several noteworthy enhancements we’ve recently brought to HDInsight. We have even more exciting releases coming up at Ignite so please stay tuned!

Product updates

Apache Phoenix and Zeppelin integration

You can now query data in Apache Phoenix from Zeppelin.

Apache Phoenix is an open source, massively parallel relational database layer built on HBase. Phoenix allows you to use SQL like queries over HBase. Phoenix uses Java Data Connectivity (JDBC) drivers underneath to enable users to create, delete, alter SQL tables, indexes, views and sequences, and upset rows individually and in bulk. Phoenix uses NoSQL native compilation rather than using MapReduce to compile queries, enabling the creation of low-latency applications on top of HBase.

Apache Phoenix enables online transaction processing (OLTP) and operational analytics in Hadoop for low latency applications by combining the best of both worlds. In Azure HDInsight Apache Phoenix is delivered as a first class Open Source framework.

Oozie support in HDInsight enterprise security package

Oozie is a workflow scheduler system for managing Apache Hadoop jobs. You can now use Oozie in domain-joined Hadoop clusters to build secure Oozie workflows in Azure HDInsight.

Azure Data Lake Storage Gen2 integration

Microsoft announced a preview of Azure Data Lake Storage Gen2, a globally available HDFS filesystem to store and analyze petabyte-size files and trillions of objects. HDInsight clusters can work with Azure Data Lake Storage Gen 2.

ML Services 9.3 and open-source R capabilities on HDInsight

ML Services on HDInsight provides the latest capabilities for R-based analytics on datasets of virtually any size, loaded to either Azure Blob or Data Lake storage. Since ML Services cluster is built on open-source R, the R-based applications you build can leverage any of the 8,000+ open-source R packages. The routines in ScaleR, Microsoft’s big data analytics package are also available.

Learn More: Introducing ML Services 9.3 in Azure HDInsight

Virtual Network Service Endpoints support

We announced support for Virtual Network Service Endpoints which allows customers to securely connect to Azure Blob Storage, Azure Data Lake Storage Gen2, Cosmos DB and SQL databases. By enabling a Service Endpoint for HDInsight, traffic flows through a secured route from within the Azure data center.

Kafka 1.0 and 1.1 support

Kafka on HDInsight provides a high-throughput, low-latency ingestion platform for your real-time data pipeline. We announced the support for Kafka 1.0 and 1.1

Support for Spark 2.3

We made Apache Spark 2.3.0 for production use on HDInsight. Ranging from bug fixes (more than 1,400 tickets were fixed in this release) to new experimental features, Apache Spark 2.3.0 brings advancements and polish to all areas of its unified data platform.

Move large data sets to Azure using WANDisco on Azure HDInsight

With WANdisco Fusion, you can move data that you have in other large-scale analytics platforms to Azure Blob Storage, ADLS Gen1 and Gen2 without downtime or disruption to your existing environment. Customers can also replicate the data, and metadata (Hive database schema, authorization policies using Apache Ranger, Sentry, and more) across different regions to make the data lake available globally for analytics.

More blogs:

Azure #HDInsight Interactive Query: simplifying big data analytics architecture

How Microsoft drives exabyte analytics on the world’s largest YARN cluster

Top 8 reasons to choose Azure HDInsight

Avoid Big Data pitfalls with Azure HDInsight and these partner solutions

Enterprises get deeper insights with Hadoop and Spark updates on Azure HDInsight

Microsoft deepens its commitment to Apache Hadoop and open source analytics

Siphon: Streaming data ingestion with Apache Kafka

Azure HDInsight Interactive Query: Ten tools to analyze big data faster

How to use DBeaver with Azure #HDInsight

HDInsight HBase: Migrating to new HDInsight version

Customer stories

Chinese vending machine innovator automates shopping with cloud technology

Argentinian company harnesses data to give e-commerce sellers valuable business insights

Data analytics firm signs two major deals after introductions to potential customers

We are excited to see the customer momentum across different industry verticals, and will continue to bring new capabilities to HDInsight to solve new big data challenges.

About HDInsight

Azure HDInsight is Microsoft’s premium managed offering for running open source workloads on Azure. Azure HDInsight powers mission-critical applications across a wide variety of sectors and address a broad range of use cases including ETL, streaming, and interactive querying.

Additional resources

↧

Where is the market opportunity for developers of IoT solutions?

September 6, 2018, 4:00 am

≫ Next: Empowering agents with Office 365—Douglas Elliman boosts the agent experience

≪ Previous: Exciting new capabilities on Azure HDInsight

In this article I discuss the biggest market opportunities for developers of IoT solutions. This is a topic of greatest interest to developers from independent software vendors (ISVs), and system integrators (SIs) who develop custom solutions for individual customers. For more resources, including a solution guide, podcasts, webinars, partner and customer highlights, explore Expanding business opportunities with IoT.

Opportunities for ISVs

The figure below shows the three layers in a typical IoT solution stack where an ISV could target their development.

Figure 1: IoT Solution Stack

Cloud platform: a set of PaaS services used to develop cloud-based solutions. Most cloud platforms also provide specialized analytics and IoT services.
IoT platform: a set of PaaS and SaaS services for rapid development of IoT solutions. IoT platforms are usually built on top of a cloud platform.
IoT solution: the end-user applications that help users in manufacturing companies to extract actionable insights from IoT data. IoT solutions can be built either on top of an IoT platform, or directly on top of a cloud platform.

For this article, the development of cloud platforms is out-of-scope. The cloud platform vendor ecosystem is already crowded, and the barriers to entry are extremely high. Only a handful of ISVs are in a position to develop cloud platforms. See the table below for more details.

That leaves IoT platforms and IoT solutions. It’s in these two layers where the greater opportunity exists for ISVs. Manufacturers need help realizing the value of IoT. This is where the pain — and the opportunity — is today. Addressing this pain is what IoT platforms and solutions do.

Fundamentally, IoT platforms and IoT solutions differ in their scope. The scope of an IoT solution is much narrower. It is often scoped down to a single workload on a single customer. So, for example, an IoT solution may focus on predictive maintenance of pumps in the field of a specific oil and gas company. IoT platforms, on the other hand, are used to build IoT solutions. The following table summarizes the scope for each type of development, along with their target customers, capabilities, typical cloud models, and vendor examples.

Layer	Scope	Target	Capabilities	Typical cloud model	Vendor examples
Cloud platforms (out of scope for this article)	Development of general-purpose cloud solutions	Developers	Set of cloud services and developer tools for application messaging, storage, compute, security, networking, web apps, management, analytics, and IoT	IaaS/PaaS	Microsoft Azure
IoT platforms	Rapid development of IoT solutions	Developers	Pre-built, configurable apps, templates, dashboards, analytics, and ML models. GUI IDEs and data modelers. IoT-specific APIs. Device connectors and protocol adapters. Application security and management services.	PaaS + SaaS	C3 IoT Platform Element Platform, GE Predix PTC ThingWorx relayr Analytics Siemens Mindsphere Symphony Industrial AI
IoT solutions	A specific customer’s workflow	End Users	Specific to the customer and its workload	SaaS	A custom predictive maintenance application for pumps in the field

Note that most IoT platforms also offer IoT apps that look like IoT solutions. For example, PTC Thingworx offers the “Controls Advisor App,” and the “Asset Advisor App.” Although these apps are close to an end-user IoT solution, technically they are not: they still require some level of configuration before they can be used. In addition, these apps target developers, not end users. For the purposes of this discussion, we consider these apps as part of IoT platforms, not IoT solutions themselves.

IoT solution areas

The figure below shows the structure of a typical IoT solution. The components of a solution are divided into three sections: collection, analytics, and action.

Figure 2: Footprint of a typical IoT solution (ISV extensibility points shaded in light blue)

1. Collection: This section is concerned with device connectivity, data ingestion, and raw data storage. It contains the devices, edge devices, a cloud gateway, raw data storage, and other systems. Here, ISVs can develop:

Software or hardware adapters that perform protocol translations between the devices and the cloud. Protocols include Modbus, PROFINET, Siemens S5/S7, OPC UA and MTConnect. These adapters could run either on the cloud gateway or the edge devices.
Sensors that can be retrofitted into legacy equipment to connect them to the IoT.

Azure services that could be used in this space: Azure IoT Hub, Azure IoT Edge, Azure Event Grid, Azure Cosmos DB, Azure Blobs Storage, Azure SQL Server, Azure Functions.

2. Analytics: This section enables users to interact with the IoT data and extract actionable insight from it. This section includes analytics engines and databases. Here, ISVs can develop:

Data visualization and analysis applications such as configurable dashboards and apps. Users can query the IoT data to create dashboards or reports. The data can be recent (such is the case in streaming analytics) or historical.
Pre-built AI/ML models for predictive maintenance, inventory optimization, energy management, production optimization, and more.

In order to enable these analytics capabilities, the solution needs to include:

The necessary data models to support the use case-specific analytics.
Data transformations to transform the raw IoT data into an analyzable form.
For ML-driven analytics solutions:
- Pre-selected ML algorithms.
- The data transformations needed to prepare the data to train the ML models.
- Code to deploy the ML models, connect them to client applications, and refresh them with new training data regularly.

Azure services that could be used in this space: Azure SQL Data Warehouse, Azure Data Lake Analytics, Azure Databricks, Azure Data Factory, Azure Stream Analytics, Azure Machine Learning, Azure Time Series Insights.

3. Action: This section contains components that execute actions informed by the insights extracted from the IoT data. Actions can be messages to line-of-business systems (CRM, PLM, ERP, MES, SLM, etc.) or back to the devices (for example, to perform a corrective action such as stopping the device). Actions can be triggered by:

Rules on the streaming or stored data.
User actions from the end user UIs.
Predictions or recommendations from ML models.
Integrations with other systems

Azure services that could be used in this space: Azure Logic Apps, Azure API Management, Azure Data Factory, Azure Functions.

Opportunities for system integrators

There is a clear opportunity for SIs with practices around IoT and data analytics to develop IoT solutions customized to their customers. This opportunity could come from customers:

With unique or complex data analytics requirements.
With complex OT environments: brownfield environments with devices from many vendors, from different years, sending data in a variety of formats and protocols.
With complex IT environments: with a variety of line-of-business applications to connect to, unique data security and governance requirement, etc.
Without the capabilities or desire to develop the solution themselves.

In order to develop custom IoT solutions for their customers, SIs could:

Build the customer solution on top of an IoT platform.
Build the customer solution on top of a cloud platform.
Provide ongoing data analytics services, such as creating new reports, retraining ML models to keep them fresh, etc.

The IoT space is a complex, fragmented, and crowded ecosystem. As such, it is hard for software developers to identify where exactly the opportunity is.

However, there are still many opportunities for both ISVs and SIs in IoT. The first opportunity is to ease the pain of connecting machines to an edge device or cloud gateway. The second is creating value from IoT data by extracting actionable insights. The actions can greatly improve businesses to be more competitive.

There is also an opportunity for ISVs is in the development of IoT platforms and pre-built IoT apps to accelerate the development of IoT solutions. For SIs, the opportunity is in the development of custom IoT solutions. For either case, ISVs and SIs must have strong IoT, data analytics, data science, and cloud skills. These capabilities are in addition to deep manufacturing expertise.

Recommended next steps

There is much more to discuss about IoT solution development. To get started, read the extracting actionable insights from IOT use case guide.

For more resources, including a solution guide, podcasts, webinars, partner and customer highlights, explore Expanding business opportunities with IoT.

↧

Empowering agents with Office 365—Douglas Elliman boosts the agent experience

September 6, 2018, 9:00 am

≫ Next: Reduce costs by optimizing actuarial risk compute on Azure

≪ Previous: Where is the market opportunity for developers of IoT solutions?

Today’s post was written by Jeffrey Hummel, chief technology officer at Douglas Elliman.

When I began work at Douglas Elliman, I was attracted to the company’s heritage—more than 100 years of premier real estate sales experience delivers a cachet that’s a big part of our brand. I was also intrigued by the opportunities I saw to use IT to transform this historic company with new tools and services. We wanted to empower agents to be even more successful in an internet-based market where closing a deal often depends on how quickly an agent can respond to a customer’s IM. Although Douglas Elliman agents are independent contractors, they face the same challenges as any distributed sales force: how to stay productive while working away from the office. They need easy, highly secure access to their data and their colleagues. So, I made it a priority to empower our agents with Microsoft Office 365.

We view our more than 7,000 agents as our best asset and our competitive advantage. They are some of the most knowledgeable people in the industry—we are a market leader in New York, South Florida, California, Connecticut, Colorado, and New Jersey. And we are one of New York City’s top real estate firms ranked by agents with $10 million-plus listings. Yet, when I arrived two and a half years ago, many agents were worried about technology somehow replacing them. I reassured everyone in our 81 sales offices that Douglas Elliman had a new mission: to improve, enhance, and elevate the agent experience. Today, we use Office 365 to show that we care for our agents more than anything else. And agents have gone from saying that IT kept them from working to their best ability to IT being the reason they now are.

We looked at other cloud platforms, but they did not reflect our core values. The tools we chose had to be easy to use, elegant, and efficient—and Office 365 meets all those requirements. Our agents range in age from 21 to 91. I love it when agents with decades of experience tell me, “Jeff, I just did my marketing report, and it took half the time! I was fully connected to all the data I needed online, and I had no trouble finding it.”

I’m most excited about how agents use these productivity tools to help more customers buy and sell more property. We are launching a new intranet, built on Microsoft SharePoint Online, which offers an agent app store where Office 365 will be front and center. Everyone will go there to access the tools they need to run their business and collaborate with their teams. Like many independent sales reps, each of our agents has unique work styles and demands. It’s a big benefit that we can offer customizable tools flexible enough for individual agents to choose how to run their business.

Some agents have already replaced Slack with Microsoft Teams. I consider Teams the greatest thing since the invention of the telephone. With so many options for collaboration all in one place, there’s something for everyone within a given group to improve virtual teamwork. Our top agents can have up to 10 people working for them in different offices. One agent has three members who create marketing materials and two others who do nothing but research commercial properties. They share everything using OneDrive cloud storage. Now we’re showing that agent the value of augmenting this process with Teams as a hub for teamwork where she can quickly access not only relevant materials but also all related communications among her team members. So, when they are talking to the next big client, they’ll have all the information they need in one place to help find a new storefront.

Personal productivity is way up, too. Another top agent who works with new development clients regularly juggles dozens of units at a time. He has to access enormous amounts of data, some of which is not in the public record. He used to store all the information accumulated from his work experience in 36 filing cabinets at the office. So, when a developer asked about zoning for a building site, for example, the agent had to call someone in the office to go and dig through the files. Not anymore. We scanned, categorized, and uploaded all his documents to OneDrive. Now he can get that information himself in less than a second from his mobile device. Using leading-edge tools, this highly successful agent has more time to build relationships with more developers, and his business is expanding.

Along with the launch of our new intranet, aptly named Douglas, we are going to introduce our AI chatbot, AskDouglas. This will start with some basic questions and answers and then evolve to be the go-to source for our agents to get questions answered about historical and relevant information within Douglas Elliman.

While we move our agents’ data to the cloud and introduce cloud-based business tools, we’re also improving our security posture and complying better with data privacy regulations. By using Microsoft security solutions that notify us when an agent’s account may be compromised, we can take proactive steps to thwart an attack, without the agent even knowing.

In two years, the company has changed the impact of IT through our mission to enhance and support our sales force. Today, we have agents raving to the executive team about the transformation they’ve seen in their technology tools and work styles. With the advantages of online collaboration and productivity services, plus real-time access to information, we recruit and retain top talent. Working with Office 365, we are strengthening our core advantage—the knowledge and experience of our agents—and putting it toward the next 100 years at Douglas Elliman.

—Jeffrey Hummel

The post Empowering agents with Office 365—Douglas Elliman boosts the agent experience appeared first on Microsoft 365 Blog.

↧

Reduce costs by optimizing actuarial risk compute on Azure

September 6, 2018, 5:00 am

≫ Next: Connected arms: A closer look at the winning idea from Imagine Cup 2018

≪ Previous: Empowering agents with Office 365—Douglas Elliman boosts the agent experience

Actuarial risk modeling is a compute-intensive operation. It employs thousands of server cores, with many uneven workloads such as monthly and quarterly valuation and production runs to meet regulatory requirements. With these compute-intensive workloads, actuaries today find themselves trapped by traditional on-premises systems (grid computing on hardware) which lack scale and elasticity. Many find they cannot even finish simple tasks like production runs with enough time to spare to correct an error and rerun before a deadline. With scalability and elasticity being the cornerstone of the cloud, risk modeling is incredibly well suited to take advantage of this near bottomless resource. With Azure, you can access more compute power when needed, without having to manage it. This translates to a great savings in time and money.

Your immediate savings comes from reallocating your costs from hardware investments to operational expenses. With on-demand compute, personnel are released from hardware and software burdens. They can devote their time and expertise to other production costs, such as writing scripts for the optimum deployment of cores.

More power, less time, faster reports

New regulations, such as International Financial Reporting Standard 17 (IFRS 17), Solvency II, and Actuarial Guideline XLIII, are increasing the pressure by requiring significant actuarial modeling and additional compute power. The efficiency of the compute grid helps reduce the time that senior actuaries and expensive outside consultants need to wait for results. This reduces the overall cost of generating needed reports. This efficiency also leads to improved predictive modeling, which helps enhance risk management. Finally, the increase in modeling speed yields more accurately priced products and better optimized financial positions.

Cope with the increase in reports

Partner-based solutions running on Azure have features that facilitate reporting under International Financial Reporting Standard (IFRS) 17, the new standard governing insurance contracts. These partner-based solutions provide insurers with a structure to meet the 2021 implementation deadline for the standard. Solutions must offer insurers an automated, secure, and flexible way to manage a complex, data-intensive process and perform key calculations. The solutions do this while helping to bridge actuarial and financial reporting functions with integration to the existing actuarial systems.

Reduce the friction between systems

IFRS 17 forces interactions between finance and actuarial systems. This necessitates the exchange of more data between the two domains, while placing additional time constraints on the actuarial risk process. And that’s where a managed cloud environment for your applications comes into its own. Free from the constraints of fixed data centers, the cloud’s pay-as-you-go cost model allows you to scale hardware rapidly up and down to your changing requirements. At the same time, you can free up in-house IT resources to add value in other areas – transferring any related technology risks to your managed cloud service provider, with its combined knowledge of both the software application and the cloud.

Modular capabilities

Most insurance companies work with a purchased third party modeling solution. They customize the solution with their add-ons to build their risk analysis solutions. Many of the solutions include add-on packages which allow actuaries to run models in an on-premises grid or Azure. In cases where the vendor does not offer a cloud solution, insurance companies augment the systems with process schedulers and cloud scaling tools. To interpret the results, the insurer adds reporting functionality as needed. On Azure, they can use tools like Power BI to summarize the terabytes of data. Individual users can even use bookmarks in Power BI to tell a story with the reports.

Swing at will from low use to high

For insurance companies, actuarial workloads are increasing. With increasingly complicated risk models and new broader regulations, the demand for calculation capacity will grow as well. The cloud is the only feasible solution to meet this growing need. Azure can meet these compute needs with a wide range of hardware and service options available globally. Azure can meet the wild swing of compute demand throughout the year with calculation capacity on demand orchestrated by different services.

Variable sizes to fit

Insurance companies can choose to build solutions to meet these needs or work with our partners to deliver a solution. Mid and small size insurers may prefer the simplicity of using Software as a Service (SaaS) solutions from one of our partners. Large insurers will utilize a mix of SaaS and non-SaaS solutions from one or more of our partners—again backed by the reliability of Azure. With a non-SaaS solution, insurers can add an orchestration layer to managed automation tools and manual processes.

Recommended next steps

Read the Actuarial risk compute and modeling overview to understand your options for moving your compute workload to Azure.

I post regularly about new developments on social media. If you would like to follow me, you can find me on Linkedin and Twitter.

↧

Connected arms: A closer look at the winning idea from Imagine Cup 2018

September 7, 2018, 3:00 am

≫ Next: Who wrote that anonymous NYT op-ed? Text similarity analyses with R

≪ Previous: Reduce costs by optimizing actuarial risk compute on Azure

This blog post was co-authored by Nile Wilson, Software Engineer Intern, Microsoft.

In an earlier post, we explored how several of the top teams at this year’s Imagine Cup had Artificial Intelligence (AI) at the core of their winning solutions. From helping farmers identify and manage diseased plants to helping the hearing-impaired, this year’s finalists tackled difficult problems that affect people from all walks of life.

In this post, we take a closer look at the champion project of Imagine Cup 2018, smartARM.

smartARM

Samin Khan and Hamayal Choudhry are the two-member team behind smartARM. The story begins with a by-chance meeting of these middle school classmates. Studying machine learning and computer vision at the University of Toronto, Samin decided to register for the January 2018 UofTHacks hackathon, and coincidentally ran into Hamayal, studying mechatronics engineering at the University of Ontario Institute of Technology. Despite his background, Hamayal was more interested in spectating than in participating. But when catching up, they realized that by combining their skillsets in computer vision, machine learning, and mechatronics, they might just be able to create something special. With this realization, they formed a team at the hackathon and have been working together since. From the initial hackathon to the Imagine Cup, Khan and Choudhry’s idea has evolved from a small project to a product designed to have an impact on the world.

A smartARM prototype resting palm-up on a table.

In the early stages of development, Khan and Choudhry were able to work with Annelisa Bowry, who was excited to provide input for the development of a device that she could use herself as a congenital amputee. Listening to her experiences and receiving direct feedback helped the team tremendously.

While robotic upper-limb prostheses are not new, Khan and Choudhry’s AI approach innovates on the idea of prosthesis control and greatly improves the feasibility of a well-functioning, affordable robotic solution for amputees. Let’s explore how smartARM uses AI to bypass difficult challenges faced by advanced prosthesis developers today, and how this solution can impact the community.

State of the (prosthetic) arm

Before jumping into details, it’s important to understand what kinds of upper-limb prostheses are out there today. Modern prosthetic arms can be broadly sorted into two categories: functional and cosmetic.

Within the functional prosthesis category, there is a wide variety of available functions, from simple hooks to neural interfacing robotic arms. Unfortunately, higher function is often accompanied by higher cost.

When going about our daily activities, we tend to use a wide variety and combination of grasps to interact with a multitude of objects and surfaces. Low-cost and simple solutions, such as hooks, typically only perform one type of grasp. These solutions are body-powered and operated by the user moving a certain part of their body, such as their shoulder. Although these simple solutions are low-cost and can provide the user with a sense of force through cable tension, the prostheses have very limited function. There are also electrically powered alternatives for these simple arms that reduce the strain on the user, as they no longer require body movements to power the arm, but they are typically heavier and a bit costlier without providing much more function.

Neural interfacing research prosthetic arm. Photo courtesy of The Johns Hopkins University Applied Physics Laboratory.

In contrast, neural interfacing upper-limb prostheses provide greater function, including more natural control and multiple grips, but cost much more. These myoelectric prostheses pick up on changes in the electrical activity of muscles (EMG) and are programmed to interpret specific changes to then route as commands to the robotic hand. This allows the wearer to more naturally go about trying to control their arm. However, myoelectric prostheses are not yet advanced enough to allow for direct, coordinated translation of individual muscle activations to complex hand movements. Some of these myoelectric devices require surgery to move nerve endings from the end of the limb to the chest through an operation called Targeted Muscle Reinnervation (TMR).

Even without surgery, the high cost and limited availability of these myoelectric prostheses limits accessibility. Many are still in the research stage and not yet commercially available.

SmartARM aims to fill this gap between low-cost, low-function and high-cost, higher-function below-elbow prostheses through its use of low-cost materials and AI.

Simplicity and innovation

Now that we have a better understanding of what’s available on the market today, we can ask, “What makes smartARM so different?”

Unlike myoelectric prostheses, smartARM does not rely on interpreting neural signals to inform and affect hand shape for grasping. Instead of requiring users to undergo surgery or spend time training muscles or remaining nerve endings to control the prosthesis, smartARM uses AI to think for itself. Determining individual finger movement intentions and complex grasps through EMG is a non-trivial problem. By bypassing the peripheral nervous system for control, smartARM allows for easy control that is not limited by our ability to interpret specific changes in muscle activations.

The camera placed on the wrist of the smartARM allows the prosthetic limb to determine what grasp is appropriate for the object in front of it, without requiring explicit user input or EMG signals. Although the arm “thinks” for itself, it does not act by itself. The user has complete control over when the smartARM activates and changes hand position for grasping.

Although the idea of using a camera may seem simple, it’s not something that has been thoroughly pursued in the world of prosthetic arms. Some out-of-the-box thinking around how the world of prosthetics can blend with the world of computer vision might just help with the development of an affordable multi-grip solution.

Another potential standout feature about a smartARM-like solution is its ability to learn. Because it utilizes AI to determine individual finger positions to form various grips, the prosthesis can learn to recognize more types of objects as development continues alongside the wearer’s use.

How does it work?

So, how does the smartARM really work?

The camera placed on the wrist of the smartARM continuously captures live video of the view in front of the palm of the hand. This video feed is processed frame-by-frame by an onboard Raspberry Pi Zero W single-board computer. Each frame is run through a locally deployed multi-class model, developed on Microsoft Azure Machine Learning Studio, to estimate the object’s 3D geometric shape.

Various common grips associated with different shapes are stored in memory. Once the shape is determined, smartARM selects the corresponding common grip that would allow the user to properly interact with the object. However, smartARM does not change hand position until the user activates the grip.

Prior to activating the grip, the user brings the smartARM close to the object they would like to grasp, positioning the hand such that it will hold the object once the grip is activated.

To activate the grip, the user flexes a muscle they have previously calibrated with the trigger muscle sensor. The flex triggers the grip, which moves the fingers to the proper positions based on the pre-calculated grasp. The user is then able to hold the object for as long as the calibrated muscled is flexed.

AI Oriented Architecture for smartARM.

Because smartARM is cloud compatible, it is not limited to the initially loaded model and pre-calculated grips on-board. Over time, the arm can learn to recognize different types of shapes and perform a wider variety of grasps without having to make changes to the hardware.

In addition, users have the option of training the arm to perform new grasps through sending video samples to the cloud with the corresponding finger positions of the desired grip. This allows for individuals to customize the smartARM to recognize objects specific to their daily lives and actuate grips that they commonly use, but others may not.

Digitally transforming the prosthetic arm

By combining low-cost materials with the cloud and AI, Khan and Choudhry present smartARM as an affordable yet personalized alternative to state-of-the-art advanced prosthetic arms. Not only does their prosthesis allow for multiple types of grips, but it also allows for the arm to learn more grips over time without any costly hardware modifications. This has the potential to greatly impact the quality of life for users who cannot afford expensive myoelectric prostheses but still wish to use a customizable multi-grip hand.

“Imagine a future where all assistive devices are infused with AI and designed to work with you. That could have tremendous positive impact on the quality of people’s lives, all over the world.”

Joseph Sirosh, Corporate Vice President and CTO of AI, Microsoft

Khan and Choudhry’s clever, yet simple, infusion of AI into the prosthetic arm is a great example of how the cloud and AI can truly empower people.

If you have your own ideas for how AI can be used to solve problems important to you and want to know where to begin, get started at the Microsoft AI School site where you will find free tutorials on how to implement your own AI solutions. We cannot wait to see the cool AI applications you will build.

Joseph & Nile

↧

Who wrote that anonymous NYT op-ed? Text similarity analyses with R

September 7, 2018, 12:26 pm

≫ Next: Because it’s Friday: Stars in Motion

≪ Previous: Connected arms: A closer look at the winning idea from Imagine Cup 2018

In US politics news, the New York Times took the unusual step this week of publishing an anonymous op-ed from a current member of the White House (assumed to be a cabinet member or senior staffer). Speculation about the identity of the author is, of course, rife. Much of the attention has focused on the use of specific words in the article, but can data science provide additional clues? In the last 48 hours, several R users have employed text similarity analysis to try and identify likely culprits.

David Robinson followed a similar process to the one he used to identify which of President Trump's tweets were written by him personally (as opposed to a staffer). He downloaded the tweets written by 69 Cabinet members and WH staff who maintain accounts on on Twitter (about 1.5 million words total). Then, he used the TD-IDF (term-frequency inverse-document-frequency) distance measure to find the rate at which certain words are more frequently (or less frequently) by the each subject. Comparing those rates to the rate of use in the op-ed. After ruling out the VP (and the President himself), this analysis suggests Secretary of State Pompeo as a suspect, in part due to his preference for the word "malign".

David Robinson's analysis

Michael Kearney also used tweets as the comparison text, but limited the suspect pool to cabinet members only. He compared the text of those tweets to the NYT op-ed on using 107 numeric features, such as use of capitalizaion, punctuation, whitespace, and sentence length and structure. On that basis, again ruling out the President and VP, the closest match is Secretary Pompeo.

Michael Kearney's analysis

Kanishka Misra also used tweets from cabinet members, but used the Burows' Delta metric to compare their text to that of the op-ed; those tweets with similar absolute word frequencies to the op-ed would be "closest" by this measure. Secretary of Transportation Elaine Chao is the closest match using this method.

Kanishka Misra's analysis

The authors take great pains to point out the limitations of these analyses: not all possible authors are active on Twitter; some of those likely don't write all their own tweets; Twitter is a medium that uses very different language than a newspaper editorial; and it's even possible the author was deliberately concealing their authorship through their language choices. But even so, the use of data has highlighted some unusual turns of phrase in the original op-ed that might have otherwise gone unobserved.

In each case the analysis was performed using the R language, and the authors make use of a common set of R packages to assist with the analysis:

rvest to scrape data from the NYT website,
rtweet, to download the text of tweets from likely suspects,
tidytext, to tokenize and otherwise prepare the texts for analysis, and
widyr to reshape the data for distance calculations

The full R code behind the analysis is provided by each author, and if nothing else provides interesting examples of different methods for text similarity analysis. These articles are a great demonstration of the power of rapid, in-depth data science with R: the op-ed was only published two days ago!

↧

Because it’s Friday: Stars in Motion

September 7, 2018, 1:00 pm

≫ Next: The Extremely Promising State of Diabetes Technology in 2018

≪ Previous: Who wrote that anonymous NYT op-ed? Text similarity analyses with R

In the center of our galaxy, in the direction of the constellation Sagittarius, lies the massive black hole around which the entire galaxy revolves. The European Space Agency has observed that region for 26 years, and you can actually see stars orbiting the the black hole! (via @_Astro_Nerd_)

Kinda puts things into perspective, doesn't it? Anyway, that wraps things up for this week. We'll be back again after the weekend — have a good one!

↧

The Extremely Promising State of Diabetes Technology in 2018

September 6, 2018, 6:23 pm

≫ Next: Introducing GitHub Pull Requests for Visual Studio Code

≪ Previous: Because it’s Friday: Stars in Motion

This blog post is an update to these two Diabetes Technology blog posts:

You might also enjoy this video of the talk I gave at WebStock 2018 on Solving Diabetes with an Open Source Artificial Pancreas*.

First, let me tell you that insulin is too expensive in the US.

Between 2002 and 2013, the price of insulin jumped, with the typical cost for patients increasing from about $40 a vial to $130.

Open Source Artificial Pancreas on iPhone For some of the newer insulins like the ones I use, I pay as much as $296 a bottle. I have a Health Savings Plan so this is often out of pocket until I hit the limit for the year.

People in America are rationing insulin. This is demonstrable fact. I've personally mailed extra insulin to folks in need. I've meet young people who lost their insurance at age 26 and have had to skip shots to save vials of insulin.

This is a problem, but on the technology side there's some extremely promising work happening, and it's we have really hit our stride in the last ten years.

I wrote the first Glucose Management system for the PalmPilot in 1998 called GlucoPilot and provided on the go in-depth analysis for the first time. The first thing that struck me was that the PalmPilot and the Blood Sugar Meter were the same size. Why did I need two devices with batteries, screens, buttons and a CPU? Why so many devices?

I've been told every year the a Diabetes Breakthrough is coming "in five years." It's been 25 years.

In 2001 I went on a trip across the country with my wife, an insulin pump and 8 PDAs (personal digital assistants, the "iPhones" of the time) and tried to manage my diabetes using all the latest wireless technology...this was the latest stuff 17 years ago. I had just moved from injections to an insulin pump. Even now in 2018 Insulin Pumps are expensive, mostly proprietary, and super expensive. In fact, many folks use insulin pumps in the states use out of warranty pumps purchased on Craigslist.

Fast forward to 2018 and I've been using an Open Source Artificial Pancreas for two years.

OpenAPS - Open Artificial Pancreas System. A platform for building a closed-loop with open tools.
- AndroidAPS - A branch of OpenAPS running on Android
Loop/LoopKit - Open Source Artificial Pancreas running on the iPhone with a hardware bridge (RileyLink) to the pump.
- I run this pancreas, personally, and have for nearly 2 years.
Watch Dana Lewis (the originator of OpenAPS) talk about OpenAPS at OSCON!

The results speak for themselves. While I do have bad sugars sometimes, and I do struggle, if you look at my blood work my HA1c (the long term measurement of "how I'm doing" shows non-diabetic levels. To be clear - I'm fully and completely Type 1 diabetic, I produce zero insulin of my own. I take between 40 and 50 Units of insulin every day, and have for the last 25 years...but I will likely die of old age.

Open Source Artificial Pancreas === Diabetes results pic.twitter.com/ZSsApTLRXq
— Scott Hanselman (@shanselman) September 10, 2018

This is significant. Why? Because historically diabetics die of diabetes. While we wait (or more accurately, #WeAreNotWaiting) for a biological/medical solution to Type 1 diabetes, the DIY (Do It Yourself) community is just doing it ourselves.

Building on open hardware, open software, and reverse-engineered protocols for proprietary hardware, the online diabetes community literally has their choice of open source pancreases in 2018! Who would have imagined it. You can choose your algorithms, your phone, your pump, your continuous glucose meter.

Today, in 2018, you can literally change the code and recompile a personal branch of your own pancreas.

Watch my 2010 YouTube video "I am Diabetic" as I walk you through the medical hardware (pumps, needles, tubes, wires) in managing diabetes day to day. Then watch my 2018 talk on Solving Diabetes with an Open Source Artificial Pancreas*.

I believe that every diabetic should be offered a pump, a continuous glucose meter, and trained on some kind of artificial pancreas. A cloud based reporting system has also been a joy. My wife and family can see my sugar in real time when I'm away. My wife has even called me overseas to wake me up when I was in a bad sugar situation.

Artificial Pancreas generations

As the closed-hardware and closed-software medical companies work towards their own artificial pancreases, the open source community feel those companies would better serve us by opening up their protocols, using standard Bluetooth ISO profiles and use security best practices.

Looking at the table above, the open source community is squarely in #4 and moving quickly into #5. But why did we have to do this ourselves? We got tired of waiting.

All in all, through open software and hardware, I can tell you that my life is SO MUCH BETTER than it was when I was first diagnosed. I figure we'll have this all figured out in about five years, right? ;)

THANK YOU!

Introducing GitHub Pull Requests for Visual Studio Code

September 9, 2018, 5:00 pm

≫ Next: Announcing Azure Pipelines with unlimited CI/CD minutes for open source

≪ Previous: The Extremely Promising State of Diabetes Technology in 2018

↧

Announcing Azure Pipelines with unlimited CI/CD minutes for open source

September 10, 2018, 12:55 am

≫ Next: Introducing Azure DevOps

≪ Previous: Introducing GitHub Pull Requests for Visual Studio Code

With the introduction of Azure DevOps today, we’re offering developers a new CI/CD service called Azure Pipelines that enables you to continuously build, test, and deploy to any platform or cloud. It has cloud-hosted agents for Linux, macOS, and Windows, powerful workflows with native container support, and flexible deployments to Kubernetes, VMs, and serverless environments.

Microsoft is committed to fueling open source software development. Our next step in this journey is to provide the best CI/CD experience for open source projects. Starting today, Azure Pipelines provides unlimited CI/CD minutes and 10 parallel jobs to every open source project for free. All open source projects run on the same infrastructure that our paying customers use. That means you’ll have the same fast performance and high quality of service. Many of the top open source projects are already using Azure Pipelines for CI/CD, such as Atom, Cpython, Pipenv, Tox, Visual Studio Code, and TypeScript – and the list is growing every day.

In the following, you can see Atom running parallel jobs on Linux, macOS, and Windows for its CI.

Azure Pipelines app on GitHub Marketplace

Azure Pipelines has an app in the GitHub Marketplace so it’s easy to get started. After you install the app in your GitHub account, you can start running CI/CD for all your repositories.

Pull Request and CI Checks

When the GitHub app is setup, you’ll see CI/CD checks on each commit to your default branch and every pull request.

Our integration with the GitHub Checks API makes it easy to see build results in your pull request. If there’s a failure, the call stack is shown as well as the impacted files.

More than just open source

Azure Pipelines is also great for private repositories. It is the CI/CD solution for companies like Columbia, Shell, Accenture, and many others. It’s also used by Microsoft’s biggest projects like Azure, Office 365, and Bing. Our free offer for private projects includes a cloud-hosted job with 1,800 minutes of CI/CD a month or you can run unlimited minutes of CI/CD on your own hardware, hosted in the cloud or your on-premises hardware. You can purchase parallel jobs for private projects from Azure DevOps or the GitHub Marketplace.

In addition to CI, Azure Pipelines has flexible deployments to any platform and cloud, including Azure, Amazon Web Services, and Google Cloud Platform, as well as any of your on-premises server running Linux, macOS or Windows. There are built-in tasks for Kubernetes, serverless, and VM deployments. Also, there’s a rich ecosystem of extensions for the most popular languages and tools. The Azure Pipelines agent and tasks are open source and we’re always reviewing feedback and accepting pull requests on GitHub.

Join our upcoming live streams to learn more about Azure Pipelines and other Azure DevOps services.

Keynote: Watch our live Azure DevOps keynote on September 11, 2018 from 8:00 - 9:30 AM Pacific Time.
Live training: Join our live Mixer workshop with interactive Q&A on September 17, 2018 from 8:30 AM - 2:30 PM Pacific Time.

You can save-the-date and watch both live streams on our events page. There you’ll also find additional on-demand videos and other resources to help get you started.

I’m excited for you to try Azure Pipelines and tell us what you think. You can share your thoughts directly to the product team using @AzureDevOps, Developer Community, or comment on this post.

Jeremy Epling

@jeremy_epling

↧

Introducing Azure DevOps

September 10, 2018, 12:55 am

≫ Next: Learn how Key Vault is used to secure the Healthcare AI Blueprint

≪ Previous: Announcing Azure Pipelines with unlimited CI/CD minutes for open source

Today we are announcing Azure DevOps. Working with our customers and developers around the world, it’s clear DevOps has become increasingly critical to a team’s success. Azure DevOps captures over 15 years of investment and learnings in providing tools to support software development teams. In the last month, over 80,000 internal Microsoft users and thousands of our customers, in teams both small and large, used these services to ship products to you.

The services we are announcing today span the breadth of the development lifecycle to help developers ship software faster and with higher quality. They represent the most complete offering in the public cloud. Azure DevOps includes:

Azure Pipelines

CI/CD that works with any language, platform, and cloud. Connect to GitHub or any Git repository and deploy continuously. Learn More >

Azure Boards

Powerful work tracking with Kanban boards, backlogs, team dashboards, and custom reporting. Learn more >

Azure Artifacts

Maven, npm, and NuGet package feeds from public and private sources. Learn more >

Azure Repos

Unlimited cloud-hosted private Git repos for your project. Collaborative pull requests, advanced file management, and more. Learn more >

Azure Test Plans

All in one planned and exploratory testing solution. Learn more >

Each Azure DevOps service is open and extensible. They work great for any type of application regardless of the framework, platform, or cloud. You can use them together for a full DevOps solution or with other services. If you want to use Azure Pipelines to build and test a Node service from a repo in GitHub and deploy it to a container in AWS, go for it. Azure DevOps supports both public and private cloud configurations. Run them in our cloud or in your own data center. No need to purchase different licenses. Learn more about Azure DevOps pricing.

Here's an example of Azure Pipelines used independently to build a GitHub repo:

Alternatively, here’s an example of a developer using all Azure DevOps services together from the vantage point of Azure Boards.

Open Source projects receive free CI/CD with Azure Pipelines

As an extension of our commitment to provide open and flexible tools for all developers, Azure Pipelines offers free CI/CD with unlimited minutes and 10 parallel jobs for every open source project. With cloud hosted Linux, macOS and Windows pools, Azure Pipelines is great for all types of projects.

Many of the top open source projects are already using Azure Pipelines for CI/CD, such as Atom, Cpython, Pipenv, Tox, Visual Studio Code, and TypeScript – and the list is growing every day.

We want everyone to have extremely high quality of service. Accordingly, we run open source projects on the same infrastructure that our paying customers use.

Azure Pipelines is also now available in the GitHub Marketplace making it easy to get setup for your GitHub repos, open source or otherwise.

Here’s a walkthrough of Azure Pipelines:

Learn more >

The evolution of Visual Studio Team Services (VSTS)

Azure DevOps represents the evolution of Visual Studio Team Services (VSTS). VSTS users will be upgraded into Azure DevOps projects automatically. For existing users, there is no loss of functionally, simply more choice and control. The end to end traceability and integration that has been the hallmark of VSTS is all there. Azure DevOps services work great together. Today is the start of a transformation and over the next few months existing users will begin to see changes show up. What does this mean?

URLs will change from abc.visualstudio.com to dev.azure.com/abc. We will support redirects from visualstudio.com URLs so there will not be broken links.
As part of this change, the services have an updated user experience. We continue to iterate on the experience based on feedback from the preview. Today we’re enabling it by default for new customers. In the coming months we will enable it by default for existing users.
Users of the on-premises Team Foundation Server (TFS) will continue to receive updates based on features live in Azure DevOps. Starting with next version of TFS, the product will be called Azure DevOps Server and will continue to be enhanced through our normal cadence of updates.

Learn more

To learn more about Azure DevOps, please join us:

Keynote: Watch our live Azure DevOps keynote on September 11, 2018 from 8:00 - 9:30 AM Pacific Time.
Live training: Join our live Mixer workshop with interactive Q&A on September 17, 2018 from 8:30 AM - 2:30 PM Pacific Time.

You can save-the-date and watch both live streams on our events page. There you’ll also find additional on-demand videos and other resources to help get you started.

We couldn’t be more excited to offer Azure DevOps to you and your teams. We can’t wait to see what amazing things you create with it.

↧

Learn how Key Vault is used to secure the Healthcare AI Blueprint

September 10, 2018, 6:00 am

≫ Next: Azure.Source – Volume 48

≪ Previous: Introducing Azure DevOps

System security is a top priority for any healthcare organization. There are many types of security including physical, network, application, email and so on. This article covers the system security provided by Azure Key Vault. Specifically, we examine the Key Vault implementation used in the Azure Healthcare blueprint. The intent is to demonstrate how a Key Vault works by seeing it used with the blueprint.

Securing sensitive data in the real world

In a healthcare organization there are potentially dozens (or hundreds) of users that need access to sensitive data from diverse sources. Doctors, technicians, receptionists — some need access to just x-rays, some to payment schedules, and doctors need patient records. The matrix of users and data stores can be large. Managing so many permissions could be a nightmare. For dashboards or other user interfaces, permission needs to be granted to service accounts. For example, in machine learning a data scientist may need to query data from many data repositories to find correlations, and will need appropriate rights to those data stores.

In the blueprint, a Key Vault stores data like passwords and secrets that system users need access to things like databases and Machine Learning studio (MLS).

What Key Vault provides

Most complex systems require a secure and reliable way to store various types of data. This is especially true in regulated environments like healthcare, where patient and other confidential information must be protected.

A Key Vault provides separation of the application from any cryptography needed by that application. For example, a custom application deployed to Azure can access any keys, secrets, or certificates it has permission to access from a URL at runtime.

When using Key Vault to provide reliable technical security, it stores keys, certificates and what are referred to as “secrets.” A secret may be any string that is sensitive, such as a database connection string or a password.

Keys stored in key vaults are safeguarded by hardware security modules (HSM), a special trusted network computer performing a variety of cryptographic operations like key storage and management. Key Vaults take advantage of HSMs and are FIPS 140-2 Level 2 validated.

Using Key Vault

There may be more than one Key Vault in an Azure tenant. It is a good practice to dedicate one Key Vault to a single application or system and another to a different system. This ensures Key Vaults respect the single responsibility principal by separating application or system specific stored items. Single-purpose Key Vaults are easier to manage than storing data from all systems and applications in one vault.

Anyone with an Azure subscription can create and use Key Vaults. The person who creates the Key Vault is the owner for it. In the Healthcare for AI blueprint, the user Alex_siteadmin has full access to the information in the Key Vault created using her credentials during the initial installation.

Users may access data in the vault after the vault owner provides access permissions. The owner provides developers with URIs to call from their applications.

In the blueprint system, Alex_siteadmin may access the sensitive data (certificates, keys, secrets) in the vault. To do this via the portal, simply click the Key Vault resource and look under the “settings” section.

Click the type of setting you wish to see: keys, secrets or certificates. After clicking Secrets, one finds Sqldb-ConnectionString in the list shown below.

Clicking Sqldb-ConnectionString reveals the URL to the connection string secret. Although Alex_siteadmin can see the URL to access the connection string, she cannot see the connection string itself.

Using Key Vault from an application

URLs are provided by Key Vaults for resource access in applications running under authored accounts. PowerShell and Azure CLI commands are available to retrieve keys secrets from vaults but accessing the key from a running application is typically the primary intent for storing items in the vault.

Secrets, keys and certificates are all available via a URL, but only if the current user has permissions to the item in the vault. Access policies must be set for users and service accounts to access vault items.

Managing Key Vaults

As mentioned above, it is a recommended practice to have separate instances of Key Vaults for different applications. This makes managing a Key Vault much simpler than storing secrets from multiple applications in a single vault.

It is also important to manage access policies for the various items in the vault. Accounts should be configured to have access solely to those items they need — and nothing else. Permissions are set via access policies which provide fine-grained control of permissions to items in the vault.

Deleting and restoring Key Vaults

“Soft delete” is a Key Vault feature that may be enabled on a vault. When this is true, if a Key Vault is deleted, it is recoverable for 90 days. It disappears from the Azure portal and it looks like the Key Vault has been completely deleted, like any other resource or service in Azure. This isn’t the case, however. It is held by Azure for 90 days and can be restored for any reason. Because of this precaution, a new Key Vault with the same name cannot be added to the Azure subscription until the “soft deleted” vault is truly deleted.

Restoring a Key Vault

To restore a Key Vault, owners may turn to PowerShell or the Azure CLI. In PowerShell, execute the following command.

Undo-AzureRmKeyVaultRemoval -VaultName <vault name> -ResourceGroupName <resource group> -Location <location>

Deleting a soft delete Key Vault

PowerShell and the Azure CLI also provides commands for permanently deleting a soft deleted vault. This can be used to delete the vault before the 90-day period has elapsed. This is the PowerShell command:

Remove-AzureRmKeyVault -VaultName <vault name> -InRemovedState -Location <location>

Wrapping up

The Azure Healthcare AI Blueprint makes extensive use of Key Vault. It helps to ensure HIPAA compliance and uphold HITRUST certification. Create a separate Key Vault to experiment with in your Azure environment. Do not change any items in the app’s Key Vault — otherwise, the application may stop working.

Key Vaults focus specifically on security. It is the central store for keys, certificates and secrets like database connection strings, passwords, and other sensitive information. When looking to secure your data and security keys, start your journey with Key Vault.

Recommended next steps:

Do the quickstart: Set and retrieve a secret from Azure Key Vault using PowerShell. In this quickstart, you use PowerShell to create a key vault. You then store a secret in the newly created vault.
Read the free solution guide Implementing the Azure blueprint for healthcare. Leverage the knowledge from this guide to deploy the Azure Healthcare AI Blueprint and explore how the blueprint is secured using key vault.

↧

Azure.Source – Volume 48

September 10, 2018, 7:00 am

≫ Next: What’s new in Azure DevOps Launch Update

≪ Previous: Learn how Key Vault is used to secure the Healthcare AI Blueprint

Now in preview

Avere vFXT for Microsoft Azure now in public preview - The Avere vFXT for Azure is a high-performance cache for HPC workloads in the cloud that provides low-latency access to Azure compute from Azure Blob and file-based storage locations. The Avere vFXT for Azure has no charge associated with licensing; however, costs associated with consumption do apply at normal rates. Avere Systems joined Microsoft earlier this year.

Powerful Debugging Tools for Spark for Azure HDInsight - Microsoft runs one of the largest big data cluster in the world – internally called “Cosmos”, which runs millions of jobs across hundreds of thousands of servers over multiple exabytes of data. Being able to run and manage jobs of this scale by developers was a huge challenge. We built powerful tools that graphically show the entire job graph, which helped developers greatly and we want to bring that power to all HDInsight Spark developers. The default Spark history server user experience is now enhanced in HDInsight with rich information on your spark jobs with powerful interactive visualization of Job Graphs & Data Flows. The new features greatly assist Spark developers in job data management, data sampling, job monitoring and job diagnosis.

Screenshot of an example Spark job graph displaying Spark job execution details with data input and output across stages

Also in preview

Azure Friday

Azure Friday | Enhanced productivity using Azure Data Factory visual tools - Gaurav Malhotra joins Scott Hanselman to discuss the Azure Data Factory visual tools, which enable you to iteratively create, configure, test, deploy, and monitor data integration pipelines. We took into account your feedback to enable functional, performance, and security improvements to the visual tools.

Now generally available

The Azure Podcast

The Azure Podcast | Episode 245 - Azure Certifications - Microsoft Consultants Doug Strother and John Miller, both veterans of certifications, share some tips on getting your Azure certification.

News and updates

Exciting new capabilities on Azure HDInsight - Ashish Thapliyal Principal, Program Manager, Azure HDInsight provides a roll-up of the various updates for Azure HDInsight from the past few months, including Apache Phoenix and Zeppelin integration, Oozie support in HDInsight enterprise security package, Azure Data Lake Storage Gen2 integration, Virtual Network Service Endpoints support, support for Spark 2.3, and more.

Additional news and updates

The IoT Show

The IoT Show | Monitoring And Diagnostics of an IoT Solution with Azure IoT Hub - When your IoT solution is misbehaving, what do you do? John Lian, PM on the Azure IoT team, shows us some ways to configure proper alerts and get the right logs to troubleshoot common problems.

The IoT Show | IoT In Action - Introducing Azure Sphere - Check out this sneak peek into the upcoming Special Edition of the IoT in Action Webinar Series – An Introduction to Microsoft Azure Sphere! Azure Sphere is a solution for creating highly-secured, connected microcontroller unit (MCU) powered devices, providing you with the confidence and the power to reimagine your business and create the future.

Technical content and training

Transparent data encryption or always encrypted? - Transparent Data Encryption (TDE) and Always Encrypted are two different encryption technologies offered by SQL Server and Azure SQL Database. Generally, encryption protects data from unauthorized access in different scenarios. They are complementary features, and this blog post will show a side-by-side comparison to help decide which technology to choose and how to combine them to provide a layered security approach.

Connected arms: A closer look at the winning idea from Imagine Cup 2018 - Joseph Sirosh, Corporate Vice President, Artificial Intelligence & Research, takes a closer look at the champion project of Imagine Cup 2018, smartARM. smartARM is a low-cost, low-function and high-cost, higher-function below-elbow prostheses through its use of low-cost materials and AI. By combining low-cost materials with the cloud and AI, Samin Khan and Hamayal Choudhry present smartARM as an affordable yet personalized alternative to state-of-the-art advanced prosthetic arms.

Additional technical content

Azure tips & tricks

Learn about the underlying Software in Azure Cloud Shell - Learn about the software found inside an Azure Cloud Shell instance. Get a deeper look into what happens when you fire an Azure Cloud Shell and what is happening in the underlying operating system.

How to use tags to quickly organize Azure Resources - Learn how to take advantage of tags to organize your Azure resources. Watch how to use tags to add additional metadata, allowing you categorize and search for your resources in the Azure portal.

Events

Join us for two days of live streaming activity that will help you learn all the details of what was announced for Azure DevOps set of services for developers, Azure Pipelines CI/CD platform and new GitHub extensions from Microsoft:

Azure DevOps Keynote - Keynote: Watch our live Azure DevOps keynote on September 11, 2018 from 8:00 - 9:30 AM Pacific Time (GMT -7).

Azure DevOps Workshop - Join our live Mixer workshop with interactive Q&A on September 17, 2018 from 8:30 AM - 2:30 PM Pacific Time (GMT -7).

Tuesdays with Corey

Tuesdays with Corey | Azure Event Hubs with Kafka coolness - Corey Sanders, Corporate VP - Microsoft Azure Compute team sat down with Dan Rosanova, Principal PM responsible for Azure messaging services. Dan shares some new functionality around Event Hubs with Kafka compatibility.

Industries

Delivering innovation in retail with the flexible and productive Microsoft AI platform - Learn how the application of AI is transforming the retail and consumer goods industry resulting in positive business improvements aimed at solving a range of service to production-type problems. Whether you want to use pre-built AI APIs or develop custom models with your data, Microsoft's AI platform offers flexibility using the infrastructure and tools available in the platform. Determine the best path forward by understanding each approach.

Save money on actuarial compute by retiring your on-premises HPC grids - Every insurance company sees the increasing demand from growth in the number of policies processed, and new regulations that require changes to the actuarial and accounting systems. FRS-17 requires changes to workflows, reporting and control throughout the actuarial and accounting process. Read this post to learn why now is the time to move to a cloud-based solution on Azure, and download the Actuarial risk compute and modeling overview to understand your options for moving your compute workload to Azure.

Diagrm showing 27 logos representing various industry compliance programs

Finish your insurance actuarial modeling in hours, not days - Actuarial departments everywhere work to make sure that their financial and other models produce results which can be used to evaluate their business for regulatory and internal needs. Today, it is common for quarterly reporting to require thousands of hours of compute time. Most actuarial packages on the market support in-house grids with the ability to either extend the grid to Azure, or to use Azure as the primary environment. Learn how actuarial teams review their models and processes to maximize the scale benefits of the cloud.

Anti-money laundering – Microsoft Azure helping banks reduce false positives - Anti-Money Laundering (AML) Transactions Monitoring Systems (TMS) identify suspicious transactions that may involve illicit proceeds or legitimate proceeds used for illegal purposes. AML costs across the banking industry due to this regulatory blunt force approach, has resulted in massive inefficiency and high operating costs for banks. The predominant TMS technologies are antiquated batch rule-based systems that have not fundamentally changed since the late ‘90s, and have not kept pace with increasing regulatory expectations and continually evolving money laundering techniques. Learn how the application of ML and AI can quickly flag changes in patterns of activity, whether caused by new product offerings or money laundering.

Describe, diagnose, and predict with IoT Analytics - The whole point of collecting IoT data is to extract actionable insights. Insights that will trigger some sort of action that will result in some business value such as optimized factory operations, improved product quality, better understanding of customer demand, new sources of revenue, and improved customer experience. This post discusses the extraction of value out of IoT data by focusing on the analytics part of the story, the four types of analytics and how they map to IoT, the role of machine learning in IoT analytics, and an overview of a solution architecture that highlights the components of an IoT solution with analytics.

Solution diagram showing components of an IoT Analytics Solution

Current use cases for machine learning in retail and consumer goods - Retail and consumer goods companies are seeing the applicability of machine learning (ML) to drive improvements in customer service and operational efficiency. For example, the Azure cloud is helping retail and consumer brands improve the shopping experience by ensuring shelves are stocked and product is always available when, where and how the consumer wants to shop. Check out 8 ML use cases to improve service and provide benefits of optimization, automation and scale. In addition, this post has links to a lot of other resources you should check out, such as an ML algorithm cheat sheet.

Where is the market opportunity for developers of IoT solutions? - There is a clear opportunity for SIs with practices around IoT and data analytics to develop IoT solutions customized to their customers. The IoT space is a complex, fragmented, and crowded ecosystem. As such, it is hard for software developers to identify where exactly the opportunity is. This post discusses the biggest market opportunities for developers of IoT solutions. There is also an opportunity for ISVs is in the development of IoT platforms and pre-built IoT apps to accelerate the development of IoT solutions.

Reduce costs by optimizing actuarial risk compute on Azure - Actuarial risk modeling is a compute-intensive operation. Employing thousands of server cores, with many uneven workloads such as monthly and quarterly valuation and production runs to meet regulatory requirements, these compute-intensive workloads, are well suited to take advantage of this near bottomless resources of the cloud. Learn how Azure enables you to access more compute power when needed, without having to manage it. And read the actuarial risk compute and modeling overview to understand your options for moving your compute workload to Azure.

A Cloud Guru's Azure This Week

A Cloud Guru's Azure This Week - 7 September 2018 - This Week: In this episode of Azure This Week, Dean takes a look at OMS, and its new home in the Azure Portal, new features of Visual Studio Live Share as well as updating you on a brand new Azure course on A Cloud Guru.

↧

What’s new in Azure DevOps Launch Update

September 10, 2018, 8:06 am

≫ Next: Application Insights improvements for Java and Node.js

≪ Previous: Azure.Source – Volume 48

We shared some exciting news this morning on the Azure blog: We’re bringing Azure DevOps to developers and enterprises around the world who need flexible and efficient tools for their development process. With the evolution of Visual Studio Team Services (VSTS) into Azure DevOps, current VSTS customers will shortly receive a Launch Update that seamlessly... Read More

↧

Application Insights improvements for Java and Node.js

September 11, 2018, 2:00 am

≫ Next: AI helps troubleshoot an intermittent SQL Database performance issue in one day

≪ Previous: What’s new in Azure DevOps Launch Update

Did you know that Application Insights supports Java and Node.js? That's because at Microsoft our mission is to empower every person and every organization on the planet to achieve more. For those of us on the Azure Application Insights team, every person means every developer, DevOps practitioner and site reliability engineer - regardless of the tech stack that they use.

That's why we've been working for over a year now to enable Java and Node.js teams to have a first-class monitoring experience in both their Azure and on-premises environments. So today I'm proud to share with you some of what our team has already accomplished, and I'm excited about the features and improvements that we will be continuing to release over the next several months. But first, let's talk about Java.

Application Insights for Java

The second version of our Application Insights for Java SDK was released to Maven/Gradle and GitHub earlier this year, and the team has continued to crank out improvements since then, most recently with version 2.1.2. In addition to a myriad of bug fixes, the team has also added support for fixed rate sampling, enhanced support for Log4J, and cross-component telemetry correlation. We also auto detect the usage of the most popular app frameworks, storage clients and communication libraries so that developers are not required to instrument their applications in any special way.

Java developers love the Spring framework and so do we. Because of this, we've also spent time improving our overall support of Spring in general and published a Spring Boot Starter compatible with the Spring Initializr application bootstrapping tool to help Spring developers configure Application Insights quickly and easily. As our love continues to grow, expect to see us continue to improve the monitoring experience for Spring teams in the future.

Finally, be sure to check out our quick start on Application Insights for Java to learn more.

Application Insights for Node.js

Now, on to Node.js. The stable version of our Application Insights for Node.js SDK has been available on npm and GitHub for nearly a year now, and we continue to make improvements to it. Most recently we hardened its security to prevent developers from accidentally using compromised versions of the TLS protocol, added support for HTTP(S) proxies, and, like Java, offer a great collection of auto-collectors to reduce the need for manual application instrumentation. As always, our quick start on Application Insights for Node.js is a great place to find out more.

Looking to the future

As mentioned, not only am I proud of what we've accomplished recently, but I'm also excited about what the future has in store. We already dropped some hints when we announced that we've joined the OpenCensus project, which is an open source project with a goal to achieve "a single distribution of libraries for metrics and distributed tracing with minimal overhead." Look for us to make more contributions there, and to expand our support to additional stacks and technologies.

Speaking of contributions, all of Application Insights SDKs are open source, including Java and Node.js. The team loves to hear feedback via issues and pull requests, so feel free to come develop with us!

Lastly, if you have any questions or comments, please leave them in the comments below.

↧

AI helps troubleshoot an intermittent SQL Database performance issue in one day

September 11, 2018, 3:00 am

≫ Next: Using C++17 Parallel Algorithms for Better Performance

≪ Previous: Application Insights improvements for Java and Node.js

In this blogpost, you will learn how Azure SQL Database intelligent performance feature Intelligent Insights has successfully helped a customer troubleshoot a hard to find 6-month intermittent database performance issue in a single day only. You will find out how Intelligent Insights helps an ISV operate 60,000 databases by identifying related performance issues across their database fleet. You will also learn how Intelligent Insights helped an enterprise seamlessly identify a hard to troubleshoot performance degradation issue on a large-scale 35TB database fleet.

Azure SQL Database, the most intelligent cloud database, is empowering small and medium size business, and large enterprises to focus on writing awesome applications while entrusting Azure to autonomously take care of running, scaling, and maintain a peak performance with a minimum of human interaction, or advanced technical skill set required.

Intelligent Insights is a new disruptive intelligent performance technology leveraging the power of artificial intelligence (AI) to continuously monitor and troubleshoot Azure SQL Database performance issues with a pinpoint accuracy and at a large scale simply not possible before. Performance troubleshooting models for the solution where fine-tuned and advanced based on the learning from a massive workload pool of several million Azure SQL Databases.

In the period since its public preview debut in Sep. 2017, we have witnessed some remarkable customer success stories with Intelligent Insights that I would like to share with you.

Troubleshooting hard-to-find intermittent SQL Database performance issues

NewOrbit is an ISV based in the United Kingdom building and running software on Azure for startups and enterprises, including a system for background checking of employees and tenants. Thousands of customers check more than 400,000 people each year and the system does that through integrating information from various services. NewOrbit runs entirely on Azure and use about 200 SQL Databases. NewOrbit has a flat organizational structure consisting of developers and customer account managers only. They do not employ DevOps team or DBAs as they rely on SQL Database built-in intelligence to automatically tune and troubleshoot database performance for them as NewOrbit CTO Frans Lytzen is saying “I have Azure for that”.

NewOrbit experienced an intermittent performance issue lasting for about 6 months that resulted in getting an increased amount of timeouts for existing systems running in production. Their first reaction was to upgrade to a higher pricing tier on Azure with more capacity, however this did not help and it seemed very strange to them. As they’ve seen more than enough spare DTUs on their Azure subscription, NewOrbit understood right away that dialing up the capacity will not make the issue go away.

NewOrbit has decided to try Intelligent Insights. As soon as they fired up the solution, it showed that there seems to exist a memory pressure on their databases. This was not immediately obvious and NewOrbit understood that memory pressure is not the cause but most likely an effect of another underlying problem.

Intelligent Insights has within a day of being turned on pointed out there are several queries suspected as a root cause for the memory pressure. NewOrbit has promptly fixed and deployed new queries to production witnessing an immediate decrease in memory pressure the following day.

“On an earlier occasion, we had a performance issue lasting for about 6 months. Before Intelligent Insights we have not had a way of figuring out where do we even start troubleshooting. Intelligent Insights gave us a list of things to do. What Intelligent Insight does is that it enables us to pinpoint where the problem is and to get a fix deployed within 24hrs.”

Frans Lytzen, CTO, NewOrbit.

As NewOrbit grows as a service, their databases and queries are getting bigger and the complexity grows. They have now implemented a continuous improvement program relying on Intelligent Insights to regularly review and optimize database queries for the best performance of their applications.

Identifying performance issues in the sea of 60,000 SQL Databases

SnelStart is a Dutch ISV developing and running SaaS service for small and large companies providing financial online services such are bookkeeping, invoicing and accounting. SnelStart relies on Azure to deliver their service to about 64,000 customers. The underlying infrastructure employs 60,000+ SQL Databases, mainly in elastic pools. Maintaining a massive amount of databases at scale and quickly reacting to customer issues, especially in cases of services degradation or an outage is an imperative for SnelStart.

To maintain such a large amount of databases typically involves having a large DBA and DevOps team. Once a customer calls on the phone complaining that the service is slow or unavailable, SnelStart team needs to quickly identify and act on resolving the issue. The company had a number of cases where it would find out about a performance or unavailability issues from their customers first. Hoping to improve their response time, SnelStart has decided to use Intelligent Insights as a performance-monitoring tool.

SnelStart uses the solution daily to help identify elastic pools hitting its CPU limits. The built-in intelligence helps the company identify top consuming databases in overutilized elastic pools, providing suggestions of other elastic pools with sufficient capacity where the hot databases could be moved. SnelStart as their DevOps strategy has created separate elastic pools on each of their logical servers where they move high consuming databases. This allows the company to research slow running queries, and other database problems with the help of Azure SQL Analytics until repaired. Once performance of hot databases has improved, SnelStart typically returns them back to the original elastic pools.

With the help of Intelligent Insights, SnelStart was also, in one of the instances, able to quickly identify application queries causing database locking, much before their developers could manually troubleshoot the root cause of the issue. Hotfix for the issue was deployed within minutes, and before customers were impacted at scale. Better yet, there were no customer phone calls received regarding the service performance.

This was an entirely new type of capability for the company, as the service monitoring has shifted from a reactive to proactive relying on SQL Database built-in intelligent performance.

“Intelligent Insights proactively finds a database performance problem in a more efficient way and much faster than humans. With it we can proactively help customers until we have a fix for the problem.”

Bauke Stil, Application Manager, SnelStart.

SnelStart was impressed with the efficiency of detecting database performance issues automatically and at such scale in the sea of 60,000 SQL Databases. As all of their customer databases are having the same structure, the company is also using the solution to identify and resolve common performance issues across their entire database fleet. Once an issue is identified on one database, a hotfix is deployed to all databases immediately benefiting all SnelStart customers.

Finding a locking issue on a large-scale 35TB SQL Database fleet

Microsoft TFS provides an online service supporting our developer community and customers. It is a complex system handling trillion lines of code across the customer base. The service provides sophisticated reporting, builds, labs, tests, ticketing, release automation management and project management – amongst others. Providing a 24/7/365 service globally with a top performance is imperative for TFS. Once you click to check-in your code, or if you are perhaps creating a TFS ticket, you expect a prompt response from the service. TFS massive infrastructure runs on about 900 SQL Databases with combined 35TB of data stored.

TFS has previously used its own monitoring and alerting solution capable of observing slow queries and sustained high CPU periods. On one of the occasions, the existing tools identified a performance issue providing only a shallow analysis that there exists a heavy locking on a database, but nothing else. Further troubleshooting has indicated the locking was table scoped and related to lock partitioning. This meant that the affected slow query visible in the stats is typically not the root cause of locking, but only a surface effect of some other blocking query causing the locking. The workload in such cases keeps on piling up, so upgrading to a larger resource pool could only be a short-lived solution. The only way out of this was to find the blocking query and to fix it.

Troubleshooting in cases such as this one is typically very difficult, as it requires considerable DBA skills and it is very time consuming. TFS turned to Intelligent Insights to help with performance troubleshooting automation. In this particular case, the system has identified application related increase in the workload pile up, and has provided a list the affected and blocking queries.

Analysis of the blocking query has identified that this was a maintenance query scheduled periodically to remove unused attachments. Further analyzing the code, it was determined that the size of the deletion batch was too large causing the issue. New query with reduced batch size was deployed promptly, having an immediate effect in resolving the heavy locking and gradually relieving the workload pressure.

Maintaining TFS demanding infrastructure requires some of the best Microsoft engineers on the job. We have surely met the bar with one of the best SQL performance troubleshooters at Microsoft, Remi Lemarchand, Principal Software Engineer at TFS, hearing him say:

“Intelligent Insights does a great job of finding database performance problems. I find it always right on the spot!”

Remi Lemarchand, Principal Software Engineer, Microsoft TFS.

Remi concludes that he prefers using Intelligent Insights first in troubleshooting database performance issues before moving onto other tools due to the level of depth it provides, and considerable reduction of manual DBA troubleshooting time required. Since applying the fix to the blocking query, the database is purring along nicely!

Summary

Marriage of AI and Azure SQL Database has resulted in disruptive new capabilities of intelligent performance not possible before. Small and medium size business can now do more with Azure on a large scale with a smaller crew and at a considerably lower cost compared to running infrastructure on their own. Large companies and enterprises can have less headaches and relieve pressure from their DBAs and DevOps as SQL Database intelligent performance can help them identify operational issues on tens of thousands of databases in a single day, compared to months in some cases.

To help you start using SQL Database Intelligent Insights for troubleshooting performance issues, see Setup Intelligent Insights with Log Analytics.

For related SQL Database intelligent performance products, see Monitor Azure SQL Databases with Azure SQL Analytics, and Automatic tuning in Azure SQL Database.

Please let us know how do you envision SQL Database intelligent performance helping you?

↧

Using C++17 Parallel Algorithms for Better Performance

September 11, 2018, 9:02 am

≫ Next: .NET Core September 2018 Update

≪ Previous: AI helps troubleshoot an intermittent SQL Database performance issue in one day

C++17 added support for parallel algorithms to the standard library, to help programs take advantage of parallel execution for improved performance. MSVC first added experimental support for some algorithms in 15.5, and the experimental tag was removed in 15.7.

The interface described in the standard for the parallel algorithms doesn’t say exactly how a given workload is to be parallelized. In particular, the interface is intended to express parallelism in a general form that works for heterogeneous machines, allowing SIMD parallelism like that exposed by SSE, AVX, or NEON, vector “lanes” like that exposed in GPU programming models, and traditional threaded parallelism.

Our parallel algorithms implementation currently relies entirely on library support, not on special support from the compiler. This means our implementation will work with any tool currently consuming our standard library, not just MSVC’s compiler. In particular, we test that it works with Clang/LLVM and the version of EDG that powers Intellisense.

How to: Use Parallel Algorithms

To use the parallel algorithms library, you can follow these steps:

Find an algorithm call you wish to optimize with parallelism in your program. Good candidates are algorithms which do more than O(n) work like sort, and show up as taking reasonable amounts of time when profiling your application.
Verify that code you supply to the algorithm is safe to parallelize.
Choose a parallel execution policy. (Execution policies are described below.)
If you aren’t already, #include <execution> to make the parallel execution policies available.
Add one of the execution policies as the first parameter to the algorithm call to parallelize.
Benchmark the result to ensure the parallel version is an improvement. Parallelizing is not always faster, particularly for non-random-access iterators, or when the input size is small, or when the additional parallelism creates contention on external resources like a disk.

For the sake of example, here’s a program we want to make faster. It times how long it takes to sort a million doubles.

// compile with:
//  debug: cl /EHsc /W4 /WX /std:c++latest /Fedebug /MDd .program.cpp
//  release: cl /EHsc /W4 /WX /std:c++latest /Ferelease /MD /O2 .program.cpp
#include <stddef.h>
#include <stdio.h>
#include <algorithm>
#include <chrono>
#include <random>
#include <ratio>
#include <vector>

using std::chrono::duration;
using std::chrono::duration_cast;
using std::chrono::high_resolution_clock;
using std::milli;
using std::random_device;
using std::sort;
using std::vector;

const size_t testSize = 1'000'000;
const int iterationCount = 5;

void print_results(const char *const tag, const vector<double>& sorted,
                   high_resolution_clock::time_point startTime,
                   high_resolution_clock::time_point endTime) {
  printf("%s: Lowest: %g Highest: %g Time: %fmsn", tag, sorted.front(),
         sorted.back(),
         duration_cast<duration<double, milli>>(endTime - startTime).count());
}

int main() {
  random_device rd;

  // generate some random doubles:
  printf("Testing with %zu doubles...n", testSize);
  vector<double> doubles(testSize);
  for (auto& d : doubles) {
    d = static_cast<double>(rd());
  }

  // time how long it takes to sort them:
  for (int i = 0; i < iterationCount; ++i)
  {
    vector<double> sorted(doubles);
    const auto startTime = high_resolution_clock::now();
    sort(sorted.begin(), sorted.end());
    const auto endTime = high_resolution_clock::now();
    print_results("Serial", sorted, startTime, endTime);
  }
}

Parallel algorithms depend on available hardware parallelism, so ensure you test on hardware whose performance you care about. You don’t need a lot of cores to show wins, and many parallel algorithms are divide and conquer problems that won’t show perfect scaling with thread count anyway, but more is still better. For the purposes of this example, we tested on an Intel 7980XE system with 18 cores and 36 threads. In that test, debug and release builds of this program produced the following output:

.debug.exe
Testing with 1000000 doubles...
Serial: Lowest: 1349 Highest: 4.29497e+09 Time: 310.176500ms
Serial: Lowest: 1349 Highest: 4.29497e+09 Time: 304.714800ms
Serial: Lowest: 1349 Highest: 4.29497e+09 Time: 310.345800ms
Serial: Lowest: 1349 Highest: 4.29497e+09 Time: 303.302200ms
Serial: Lowest: 1349 Highest: 4.29497e+09 Time: 290.694300ms

C:UsersbionDesktop>.release.exe
Testing with 1000000 doubles...
Serial: Lowest: 2173 Highest: 4.29497e+09 Time: 74.590400ms
Serial: Lowest: 2173 Highest: 4.29497e+09 Time: 75.703500ms
Serial: Lowest: 2173 Highest: 4.29497e+09 Time: 87.839700ms
Serial: Lowest: 2173 Highest: 4.29497e+09 Time: 73.822300ms
Serial: Lowest: 2173 Highest: 4.29497e+09 Time: 73.757400ms

Next, we need to ensure our sort call is safe to parallelize. Algorithms are safe to parallelize if the “element access functions” — that is, iterator operations, predicates, and anything else you ask the algorithm to do on your behalf follow the normal “any number of readers or at most one writer” rules for data races. Moreover, they must not throw exceptions (or throw exceptions rarely enough that terminating the program if they do throw is OK).

Next, choose an execution policy. Currently, the standard includes the parallel policy, denoted by std::execution::par, and the parallel unsequenced policy, denoted by std::execution::par_unseq. In addition to the requirements exposed by the parallel policy, the parallel unsequenced policy requires that your element access functions tolerate weaker than concurrent forward progress guarantees. That means that they don’t take locks or otherwise perform operations that require threads to concurrently execute to make progress. For example, if a parallel algorithm runs on a GPU and tries to take a spinlock, the thread spinning on the spinlock may prevent other threads on the GPU from ever executing, meaning the spinlock may never be unlocked by the thread holding it, deadlocking the program. You can read more about the nitty gritty requirements in the [algorithms.parallel.defns] and [algorithms.parallel.exec] sections of the C++ standard. If in doubt, use the parallel policy. In this example, we are using the built-in double less-than operator which doesn’t take any locks, and an iterator type provided by the standard library, so we can use the parallel unsequenced policy.

Note that the Visual C++ implementation implements the parallel and parallel unsequenced policies the same way, so you should not expect better performance for using par_unseq on our implementation, but implementations may exist that can use that additional freedom someday.

In the doubles sort example above, we can now add

#include <execution>

to the top of our program. Since we’re using the parallel unsequenced policy, we add std::execution::par_unseq to the algorithm call site. (If we were using the parallel policy, we would use std::execution::par instead.) With this change the for loop in main becomes the following:

for (int i = 0; i < iterationCount; ++i)
{
  vector<double> sorted(doubles);
  const auto startTime = high_resolution_clock::now();
  // same sort call as above, but with par_unseq:
  sort(std::execution::par_unseq, sorted.begin(), sorted.end());
  const auto endTime = high_resolution_clock::now();
  // in our output, note that these are the parallel results:
  print_results("Parallel", sorted, startTime, endTime);
}

Last, we benchmark:

.debug.exe
Testing with 1000000 doubles...
Parallel: Lowest: 6642 Highest: 4.29496e+09 Time: 54.815300ms
Parallel: Lowest: 6642 Highest: 4.29496e+09 Time: 49.613700ms
Parallel: Lowest: 6642 Highest: 4.29496e+09 Time: 49.504200ms
Parallel: Lowest: 6642 Highest: 4.29496e+09 Time: 49.194200ms
Parallel: Lowest: 6642 Highest: 4.29496e+09 Time: 49.162200ms

.release.exe
Testing with 1000000 doubles...
Parallel: Lowest: 18889 Highest: 4.29496e+09 Time: 20.971100ms
Parallel: Lowest: 18889 Highest: 4.29496e+09 Time: 17.510700ms
Parallel: Lowest: 18889 Highest: 4.29496e+09 Time: 17.823800ms
Parallel: Lowest: 18889 Highest: 4.29496e+09 Time: 20.230400ms
Parallel: Lowest: 18889 Highest: 4.29496e+09 Time: 19.461900ms

The result is that the program is faster for this input. How you benchmark your program will depend on your own success criteria. Parallelization does have some overhead and will be slower than the serial version if N is small enough, depending on memory and cache effects, and other factors specific to your particular workload. In this example, if I set N to 1000, the parallel and serial versions run at approximately the same speed, and if I change N to 100, the serial version is 10 times faster. Parallelization can deliver huge wins but choosing where to apply it is important.

Current Limitations of the MSVC Implementation of Parallel Algorithms

We built the parallel reverse, and it was 1.6x slower than the serial version on our test hardware, even for large values of N. We also tested with another parallel algorithms implementation, HPX, and got similar results. That doesn’t mean it was wrong for the standards committee to add those to the STL; it just means the hardware our implementation targets didn’t see improvements. As a result we provide the signatures for, but do not actually parallelize, algorithms which merely permute, copy, or move elements in sequential order. If we get feedback with an example where parallelism would be faster, we will look into parallelizing these. The affected algorithms are:

copy
copy_n
fill
fill_n
move
reverse
reverse_copy
rotate
rotate_copy
swap_ranges

Some algorithms are unimplemented at this time and will be completed in a future release. The algorithms we parallelize in Visual Studio 2017 15.8 are:

adjacent_difference
adjacent_find
all_of
any_of
count
count_if
equal
exclusive_scan
find
find_end
find_first_of
find_if
for_each
for_each_n
inclusive_scan
mismatch
none_of
reduce
remove
remove_if
search
search_n
sort
stable_sort
transform
transform_exclusive_scan
transform_inclusive_scan
transform_reduce

Design Goals for MSVC’s Parallel Algorithms Implementation

While the standard specifies the interface of the parallel algorithms library, it doesn’t say at all how algorithms should be parallelized, or even on what hardware they should be parallelized. Some implementations of C++ may parallelize by using GPUs or other heterogeneous compute hardware if available on the target. copy doesn’t make sense for our implementation to parallelize, but it does make sense on an implementation that targets a GPU or similar accelerator. We value the following aspects in our implementation:

Composition with platform locks

Microsoft previously shipped a parallelism framework, ConcRT, which powered parts of the standard library. ConcRT allows disparate workloads to transparently use the hardware available, and lets threads complete each other’s work, which can increase overall throughput. Basically, whenever a thread would normally go to sleep running a ConcRT workload, it suspends the chore it’s currently executing and runs other ready-to-run chores instead. This non-blocking behavior reduces context switches and can produce higher overall throughput than the Windows threadpool our parallel algorithms implementation uses. However, it also means that ConcRT workloads do not compose with operating system synchronization primitives like SRWLOCK, NT events, semaphores, COM single threaded apartments, window procedures, etc. We believe that is an unacceptable trade-off for the “by default” implementation in the standard library.

The standard’s parallel unsequenced policy allows a user to declare that they support the kinds of limitations lightweight user-mode scheduling frameworks like ConcRT have, so we may look at providing ConcRT-like behavior in the future. At the moment however, we only have plans to make use of the parallel policy. If you can meet the requirements, you should use the parallel unsequenced policy anyway, as that may lead to improved performance on other implementations, or in the future.

Usable performance in debug builds

We care about debugging performance. Solutions that require the optimizer to be turned on to be practical aren’t suitable for use in the standard library. If I add a Concurrency::parallel_sort call to the previous example program, ConcRT’s parallel sort a bit faster in release but almost 100 times slower in debug:

for (int i = 0; i < iterationCount; ++i)
{
  vector<double> sorted(doubles);
  const auto startTime = high_resolution_clock::now();
  Concurrency::parallel_sort(sorted.begin(), sorted.end());
  const auto endTime = high_resolution_clock::now();
  print_results("ConcRT", sorted, startTime, endTime);
}

C:UsersbionDesktop>.debug.exe
Testing with 1000000 doubles...
ConcRT: Lowest: 5564 Highest: 4.29497e+09 Time: 23910.081300ms
ConcRT: Lowest: 5564 Highest: 4.29497e+09 Time: 24096.297700ms
ConcRT: Lowest: 5564 Highest: 4.29497e+09 Time: 23868.098500ms
ConcRT: Lowest: 5564 Highest: 4.29497e+09 Time: 24159.756200ms
ConcRT: Lowest: 5564 Highest: 4.29497e+09 Time: 24950.541500ms

C:UsersbionDesktop>.release.exe
Testing with 1000000 doubles...
ConcRT: Lowest: 1394 Highest: 4.29496e+09 Time: 19.019000ms
ConcRT: Lowest: 1394 Highest: 4.29496e+09 Time: 16.348000ms
ConcRT: Lowest: 1394 Highest: 4.29496e+09 Time: 15.699400ms
ConcRT: Lowest: 1394 Highest: 4.29496e+09 Time: 15.907100ms
ConcRT: Lowest: 1394 Highest: 4.29496e+09 Time: 15.859500ms

Composition with other programs on the system and parallelism frameworks

Scheduling in our implementation is handled by the Windows system thread pool. The thread pool takes advantage of information not available to the standard library, such as what other threads on the system are doing, what kernel resources threads are waiting for, and similar.
It chooses when to create more threads, and when to terminate them. It’s also shared with other system components, including those not using C++.

For more information about the kinds of optimizations the thread pool does on your (and our) behalf, check out Pedro Teixeira’s talk on the thread pool, as well as official documentation for the CreateThreadpoolWork, SubmitThreadpoolWork, WaitForThreadpoolWorkCallbacks, and CloseThreadpoolWork functions.

Above all, parallelism is an optimization

If we can’t come up with a practical benchmark where the parallel algorithm wins for reasonable values of N, it will not be parallelized. We view being twice as fast at N=1’000’000’000 and 3 orders of magnitude slower when N=100 as an unacceptable tradeoff. If you want “parallelism no matter what the cost” there are plenty of other implementations which work with MSVC, including HPX and Threading Building Blocks.

Similarly, the C++ standard permits parallel algorithms to allocate memory, and throw std::bad_alloc when they can’t acquire memory. In our implementation we fall back to a serial version of the algorithm if additional resources can’t be acquired.

Benchmark parallel algorithms and speed up your own applications

Algorithms that take more than O(n) time (like sort) and are called with larger N than 2,000 are good places to consider applying parallelism. We want to make sure this feature behaves as you expect; please give it a try. If you have any feedback or suggestions for us, let us know. We can be reached via the comments below, via email (visualcpp@microsoft.com) and you can provide feedback via Help > Report A Problem in the product, or via Developer Community. You can also find us on Twitter (@VisualC) and Facebook (msftvisualcpp).

↧

.NET Core September 2018 Update

September 11, 2018, 10:00 am

≫ Next: GPUs vs CPUs for Deployment of Deep Learning Models

≪ Previous: Using C++17 Parallel Algorithms for Better Performance

Today, we are releasing the .NET Core September 2018 Update. This update includes .NET Core 2.1.4 and .NET Core SDK 2.1.402 and contains important reliability fixes.

Security

CVE-2018-8409: .NET Core Denial Of Service Vulnerability
A denial of service vulnerability exists in .NET Core 2.1 when System.IO.Pipelines improperly handles requests. An attacker who successfully exploited this vulnerability could cause a denial of service against an application that is leveraging System.IO.Pipelines. The vulnerability can be exploited remotely, without authentication. A remote unauthenticated attacker could exploit this vulnerability by providing specially crafted requests to the application.

CVE-2018-8409: ASP.NET Core Denial Of Service Vulnerability
A denial of service vulnerability exists in ASP.NET Core 2.1 that improperly handles web requests. An attacker who successfully exploited this vulnerability could cause a denial of service against an ASP.NET Core web application. The vulnerability can be exploited remotely, without authentication. A remote unauthenticated attacker could exploit this vulnerability by providing a specially crafted web requests to the ASP.NET Core application.

Getting the Update

The latest .NET Core updates are available on the .NET Core download page. This update is included in Visual Studio 15.8.4, which is also releasing today.

Additional details, such as how to update vulnerable applications, can be found in the ASP.NET Core and .NET Core repo announcements.

See the .NET Core 2.1.4 release notes for details on the release including a detailed commit list.

Docker Images

.NET Docker images have been updated for today’s release. The following repos have been updated.

microsoft/dotnet
microsoft/dotnet-samples

Note: Look at the “Tags” view in each repository to see the updated Docker image tags.

Note: You must re-pull base images in order to get updates. The Docker client does not pull updates automatically.

Azure App Services deployment

Deployment of .NET Core 2.1.4 to Azure App Services has begun and the West Central US region will be live this morning. Remaining regions will be updated over the next few days and deployment progress can be tracked in this Azure App Service announcement.

Previous .NET Core Updates

The last few .NET Core updates follow:

August 2018 Update
July 2018 Update
June 2018 Update
May 2018 Update

↧

Spark History Server Enhancements

Developer Nirvana

Getting Started with Apache Spark Debugging Toolset

More features to come

Feedback

Product updates

Apache Phoenix and Zeppelin integration

Oozie support in HDInsight enterprise security package

Azure Data Lake Storage Gen2 integration

ML Services 9.3 and open-source R capabilities on HDInsight

Virtual Network Service Endpoints support

Kafka 1.0 and 1.1 support

Support for Spark 2.3

Move large data sets to Azure using WANDisco on Azure HDInsight

More blogs:

Customer stories

About HDInsight

Additional resources

Opportunities for ISVs

IoT solution areas

Opportunities for system integrators

Recommended next steps

More power, less time, faster reports

Cope with the increase in reports

Reduce the friction between systems

Modular capabilities

Swing at will from low use to high

Variable sizes to fit

Recommended next steps

smartARM

State of the (prosthetic) arm

Simplicity and innovation

How does it work?

Digitally transforming the prosthetic arm

“Imagine a future where all assistive devices are infused with AI and designed to work with you. That could have tremendous positive impact on the quality of people’s lives, all over the world.”

MORE DIABETES READING

Azure Pipelines app on GitHub Marketplace

Pull Request and CI Checks

More than just open source

Keynote: Watch our live Azure DevOps keynote on September 11, 2018 from 8:00 - 9:30 AM Pacific Time.

Live training: Join our live Mixer workshop with interactive Q&A on September 17, 2018 from 8:30 AM - 2:30 PM Pacific Time.

Azure Pipelines

Azure Boards

Azure Artifacts

Azure Repos

Azure Test Plans

Open Source projects receive free CI/CD with Azure Pipelines

The evolution of Visual Studio Team Services (VSTS)

Learn more

Keynote: Watch our live Azure DevOps keynote on September 11, 2018 from 8:00 - 9:30 AM Pacific Time.

Live training: Join our live Mixer workshop with interactive Q&A on September 17, 2018 from 8:30 AM - 2:30 PM Pacific Time.

Securing sensitive data in the real world

What Key Vault provides

Using Key Vault

Using Key Vault from an application

Managing Key Vaults

Deleting and restoring Key Vaults

Restoring a Key Vault

Deleting a soft delete Key Vault

Wrapping up

Recommended next steps:

Now in preview

Also in preview

Azure Friday

Now generally available

The Azure Podcast

News and updates

Additional news and updates

The IoT Show

Technical content and training

Additional technical content

Azure tips & tricks

Events

Tuesdays with Corey

Industries

A Cloud Guru's Azure This Week

Application Insights for Java

Application Insights for Node.js

Looking to the future

Troubleshooting hard-to-find intermittent SQL Database performance issues