
Python extension for Visual Studio Code version 0.9.0 now available


We’re excited to announce our first update to the Microsoft Python extension for Visual Studio Code. In this release, we’ve added additional support for conda environments and fixed numerous bugs in the editor and debugger.

Support for Conda Environments

Conda environments are now automatically detected from the Anaconda root location. You also have the option of using conda to install Pylint:

You can install packages using conda or pip

The extension now has schema support for the various YAML files that conda uses: meta.yaml, .condarc, and environment.yml. This means that if you use the YAML Support by Red Hat extension, you will now get auto-complete and type checking in these files.

Editor and Debugger Improvements

A total of 84 issues have been closed since our first release; most of these are bug fixes in the debugger and editor that were filed by you on GitHub. A couple of notable ones are:

  • Ctrl+F5 was sometimes not working (#25)
  • IntelliSense interrupting typing inside of strings and comments (#110, #34)

We’ve also removed the python.formatting.formatOnSave and python.linting.lintOnTextChange configuration options in favor of the equivalent general VS Code settings. If you’re using this feature, be sure to switch to the editor.formatOnSave setting.

For more information, you can look at the full list of changes in the 0.9.0 release. Be sure to update to the latest VS Code extensions, and let us know if you find any problems by creating issues on our vscode-python GitHub page.


Image Watch is now available for Visual Studio 2017


Image Watch is a Visual Studio extension that provides a watch window for viewing in-memory bitmaps when debugging native C++ code. It comes with built-in support for OpenCV image types (e.g., cv::Mat, cv::Mat_<>, etc.).

We know that, for many of you, this is an important part of your C++ debugging experience. We have received many requests to support this extension on Visual Studio 2017 via survey responses, blog comments and Reddit conversations.

We’re happy to announce that Image Watch for Visual Studio 2017 is now available for download in Visual Studio Marketplace.

Getting started with OpenCV and Image Watch

You can try Image Watch with any OpenCV program. You will need the OpenCV library installed on your developer machine to be able to compile these programs. You can run the following vcpkg commands to download and compile OpenCV on your machine:

1. Install vcpkg (Skip if you already have a working vcpkg installation):

git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
.\bootstrap-vcpkg.bat
.\vcpkg integrate install

2. Install OpenCV

.\vcpkg install opencv

If you are looking for a place to start with OpenCV, you can use the example provided in the Image Watch tutorial on opencv.org. Create a new C++ project from File > New > Project… > Visual C++ > Windows Desktop, by selecting the Windows Console Application template. Then, replace the content of the main C++ file with the opencv.org example linked above.

During any C++ debugging session, you can bring up the Image Watch window by selecting:

  • View > Other Windows > Image Watch menu or
  • Auto, Watch or Locals window > 🔍 next to any supported variable > Add to Image Watch context menu, or
  • Hover over any variable to display its Data Tip > 🔍 next to the value > Add to Image Watch context menu

To learn more about the various ways you can use Image Watch, check out the Image Watch Help page.

Discover more Visual Studio extensions

We hope you enjoy using Image Watch for Visual Studio 2017. As always, we’re looking forward to your feedback and suggestions.

If you want to explore other useful Visual Studio extensions for C++ development in the Visual Studio Marketplace, check out our article “Visual Studio extensions for C++ developers in Visual Studio 2017”.

Bing Maps Launches Three New Fleet Management APIs


Today, we are pleased to announce the release of three new Fleet Management APIs – Truck Routing, Isochrones, and Snap-to-Road – all available with the Bing Maps V8 Web Control and REST services. These APIs are the latest additions to our Fleet Management collection, which includes the Distance Matrix API that was announced last month.

Let’s dive into the details.

Truck Routing API

Bing Maps Truck Routing API calculates routes optimized for trucks and commercial vehicles. The routes include information such as route instructions, travel duration, and travel distance. The API takes into consideration specific requirements for trucks and larger vehicles (e.g., avoiding low bridges, sharp turns, steep gradients, or following restrictions and permits for hazardous material).

Commercial vehicles have specific transportation requirements. Oftentimes using consumer routing services, made for cars and other consumer vehicles, is not the best solution. Routes tailored to commercial vehicles can help avoid a number of unhappy and potentially dangerous situations, not to mention increase efficiency, ensure compliance with legal restrictions, and help customers realize potential savings on fuel costs, vehicle maintenance, and fines.

For example, the red route in the screenshot below uses a consumer routing service, whereas the green route takes into consideration the height of the truck and routes you around a low-clearance bridge. In addition, the Streetside imagery capabilities of the Bing Maps platform allow you to see why your vehicle needs to take a different route (e.g., low or narrow bridges, tight corners, sharp inclines, waterways, and more).

Bing Maps Truck Routing API

 

With the Bing Maps Truck Routing API, you can set the following parameters and preferences for your vehicle’s attributes to calculate a safe route that takes certain road and legal restrictions into consideration (a request sketch follows the list):

  • Vehicle’s dimensions and weight
  • Number of axles and/or number of trailers
  • Speed limitations
  • Legal restrictions and permit requirements, such as transporting flammable materials
  • Road conditions (strong crosswind areas, construction, road gradient, etc.)
  • Route avoidance preferences, such as toll roads, highways, ferries, and more
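To make this concrete, here is a minimal Python sketch of a Truck Routing request through the REST services. The parameter names and units shown are illustrative assumptions; check the Truck Routing API documentation for the exact contract, and use your own Bing Maps key.

import requests

BING_MAPS_KEY = "YOUR_BING_MAPS_KEY"  # replace with your own key
url = "https://dev.virtualearth.net/REST/v1/Routes/Truck"
params = {
    "wp.0": "Seattle, WA",                     # start waypoint
    "wp.1": "Portland, OR",                    # end waypoint
    "vehicleHeight": 4.1,                      # assumed parameter names and metric units
    "vehicleWidth": 2.5,
    "vehicleWeight": 30000,
    "vehicleAxles": 5,
    "vehicleHazardousMaterials": "Flammable",
    "avoid": "tolls",
    "key": BING_MAPS_KEY,
}
response = requests.get(url, params=params)
response.raise_for_status()
# Route instructions, travel duration, and travel distance come back in the
# standard Bing Maps REST resourceSets/resources structure.
print(response.json())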

Isochrone API

The Bing Maps Isochrone API calculates the area that can be travelled to within a specified distance or time and renders a travel time or distance polygon to visualize the shape of that area on a map. Use this API to plan the area that can be reached from a designated starting point within a set time period.

For example, in a job hunting scenario, you may want to see office locations that are within a 10-, 15-, 20- or 25-minute drive from your home (as shown in the screenshot below). In that case, a time-based search will provide a visualization of the area, using predictive traffic to help you determine your potential commute time.

Bing Maps Isochrone API

The polygon area can also be used as a filter in spatial queries. In a commercial real estate use case, the Isochrone API can help with site selection where you want to see how many of your competitors are within a 10-minute walk or drive of a potential new store location.

Below is an overview of the Isochrone API features that can help you easily optimize planning (a request sketch follows the list):

  • Travel-time or Travel-distance isolines
  • Modes: Walking, driving and public transportation
  • Route avoidance options, such as bridges, ferries, or tolls
  • Supports multiple polygons to determine intersecting area
  • Customizable parameters: Supports arrival or departure times (reverse flow), predicted traffic, user’s location, and more.
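As a sketch of how such a request might look from Python (the endpoint path and parameter names are assumptions made to illustrate the idea; see the Isochrone API documentation for the exact contract):

import requests

BING_MAPS_KEY = "YOUR_BING_MAPS_KEY"  # replace with your own key
url = "https://dev.virtualearth.net/REST/v1/Routes/Isochrones"
params = {
    "waypoint": "Redmond, WA",      # starting point of the isochrone
    "maxTime": 25,                  # reachable area within 25 minutes...
    "timeUnit": "minute",           # ...of travel
    "travelMode": "driving",
    "key": BING_MAPS_KEY,
}
resp = requests.get(url, params=params)
resp.raise_for_status()
# The reachable area comes back as one or more polygons (rings of latitude/longitude
# pairs) that can be rendered with the V8 Web Control or used to filter spatial queries.
print(resp.json())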

Snap-to-Road API

The Snap-to-Road API takes GPS point data, in the form of latitudes and longitudes, and returns a list of objects that form a route snapped to the roads on a map. The information returned on the road segments includes the names of the roads that the GPS points are associated with and their posted speed limits. With a maximum of 100 points per API call, the API can easily support multi-stop route use cases.

Users can also request that the points be interpolated, resulting in a path that smoothly follows the geometry of the road. In the screenshot below, the blue pins represent the input coordinates, and the red line is formed by the snapped points in the response with the speed limits displayed.

Bing Maps Snap to Road API

How does it work? GPS points are not always exact, so the API converts the GPS points of your tracked assets to a corresponding set of interpolated latitude and longitude coordinates of the nearest roads. This feature is important when tracking assets, as GPS devices can lose their connection or encounter interference resulting in an incomplete or inaccurate collection of GPS points. Tracing GPS points, otherwise known as historical bread crumbing, can be used for post-drive analytics or in real-time asset tracking.

You can also use Bing Maps V8 Web Control to display the interpolated points on a map as a route line that nicely follows the geometry of the roads, as shown in the image above. The Bing Maps V8 Web Control Infobox feature can also be used to display information about each GPS point (e.g., truck number, driver’s name, street name of the road, posted speed limit, and driver’s speed).
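A minimal Python sketch of a Snap-to-Road call might look like the following; the request shape is an assumption for illustration, so confirm the field names (and the 100-point limit) against the Snap-to-Road documentation.

import requests

BING_MAPS_KEY = "YOUR_BING_MAPS_KEY"  # replace with your own key
url = "https://dev.virtualearth.net/REST/v1/Routes/SnapToRoad"
body = {
    "points": [
        {"latitude": 47.590868, "longitude": -122.336729},
        {"latitude": 47.601604, "longitude": -122.336042},
        {"latitude": 47.608718, "longitude": -122.340146},
    ],
    "interpolate": True,          # return a smooth path along the road geometry
    "includeSpeedLimit": True,    # include posted speed limits for the snapped points
    "travelMode": "driving",
}
resp = requests.post(url, params={"key": BING_MAPS_KEY}, json=body)
resp.raise_for_status()
# Each snapped point carries a road-aligned coordinate and, when requested,
# the road name and posted speed limit.
print(resp.json())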

Get Started

For more information about each of the APIs, including documentation, how to get licensed, and frequently asked questions, visit our website.

We hope you will enjoy using these APIs as much as we’ve enjoyed building them. As always, we are open to your feedback. Connect with us on the Bing Maps Forum to share your thoughts and let us know what you would like to see in future versions.

- Bing Maps team

Bing launches new intelligent search features, powered by AI

Today we announced new Intelligent Search features for Bing, powered by AI, to give you answers faster, give you more comprehensive and complete information, and enable you to interact more naturally with your search engine.


Intelligent answers:


One of the Intelligent Search features announced today is Intelligent Answers. These answers leverage the latest state of the art in machine reading comprehension, backed by Project Brainwave running on Intel’s FPGAs, to read and analyze billions of documents to understand the web and help you more quickly and confidently get the answers you need.
 
Bing now uses deep neural networks to validate answers by aggregating across multiple reputable sources, rather than just one, so you can feel more confident about the answer you’re getting. 



Many times, you have to click into multiple sources to get a comprehensive answer for your question. Bing now saves you time by bringing together content from across multiple sources. 




Of course, not every question has just one answer. Sometimes you might be looking for expert opinions, different perspectives or collective knowledge. If there are different authoritative perspectives on a topic, such as benefits vs drawbacks, Bing will aggregate the two viewpoints from reputable sources and intelligently surface them to you on the top of the page to save you time.  



If there are multiple ways to answer a question, you’ll get a carousel of intelligent answers, saving you time searching from one blue link to another. 



We’re also expanding our comparison answers beyond just products, so you can get a snapshot of the key differences between two items or topics in an easy-to-read table. Bing’s comparison answers understand entities and their aspects and, using machine reading comprehension, read the web to save you time combing through numerous dense documents to find what you are looking for.



Bing also leverages technology built in Microsoft’s research labs to help make sense of numbers we increasingly encounter in the digital world. Bing translates this data into simple concepts so it’s easier to understand what data like the population of another country means.

Many of these answers are available today and others will be rolling out to users over the next week in the US with expansion to other markets over time.


Reddit on Bing:

 
A key element of Intelligent Search is bringing together different sources of knowledge, like the wisdom of the crowd, to help people make decisions. Today, we’re launching a new partnership with Reddit, an online community of 330M monthly active users, to bring information from the Reddit community, which generates 2.8M comments daily, to Bing. We are launching with three initial experiences, which we’ll continue to develop and expand as we get feedback from users: 
  • Already available in Bing: when you search for a specific Reddit topic or subreddit, like “Reddit Aww”, Bing will surface a sneak peek of the topic with the top conversations for the day from Reddit.
  • When searching for a general topic that is best answered with relevant Reddit conversations, Bing will surface a snippet of those conversations at the top of the page so you can easily get perspectives from the millions of Reddit users. 
  • Bing will be the place to go to search for Reddit AMAs, Q&As with celebrities and everyday heroes hosted by the Reddit community. On Bing you can discover AMA schedules and see snapshots of AMAs that have already been completed. Simply search a person’s name to see their AMA snapshot or search for “Reddit AMAs” to see a carousel of popular AMAs.


 

More conversational search:


We often hear that search would be easier if only Bing could complete your sentences. Half the battle of searching is knowing the right words to query. Combining our expertise in web-scale mining of billions of documents with Conversational AI, we're creating a new way to search that is interactive and can build on your previous searches to get you the best answer. Now if you need help figuring out the right question to ask, Bing will help you with clarifying questions based on your query to better refine your search and get you the best answer the first time around. You’ll start to see this experience in health, tech and sports queries, and we will be adding more topic areas over time. And because we’ve built it with large-scale machine learning, the experience will get better over time as more users engage with it.


 

Intelligent image search:


Today, we also shared more detail on Bing’s advanced image search features. Bing Image Search leverages computer vision and object recognition to give you more ways to find what you’re looking for. Search any image or within images to shop for fashion or home furniture.  Bing detects and highlights different products within images or you can click the magnifying glass icon on the top right of any image to search within an image and find related images or products. We also previewed a new feature that helps you better explore the world around you. If you find a landmark on Bing image search or use a photo from your camera roll, Bing will identify it and share interesting information about that landmark, such as the origins of the landmark and other relevant trivia. For instance, if you are looking at the India Gate, Bing can tell you why it was created and even what kind of stone it was made from. More to come on this feature in the future.



We’re excited for you to try out all of Bing’s new Intelligent Search features and are committed to delivering even more features that will help to save you time and money in the future. To learn more about Intelligent Search visit our site here.

-The Bing Team

Twitter sentiment as a release gate


I’m really excited to talk about a new Twitter sentiment release gate extension in the Visual Studio Team Services (VSTS) marketplace today.

Before I say more, let me step back and give some context…

Any responsible DevOps practice uses techniques to limit the damage done by bugs that get deployed into production. One of the common techniques is to break up a production environment into a set of separate instances of an app and then configure deployments to only update one instance at a time, with a waiting period between them. During that waiting period, you watch for any signs (telemetry, customer complaints, etc.) that there is a problem and, if so, halt the deployment, fix the issue, and then continue the deployment. This way, any bug you deploy only affects a small fraction of your user base. In fact, the first production environment in the sequence is often one available only to internal people in your organization, so you can validate the changes before they hit “real” customers. Nonetheless, sometimes issues make it through.

Release gates are a new VSTS Release Management feature that we announced at our Connect(); event in November. Release gates automate the waiting period between environments. They enable you to configure conditions that will cause the release to wait. Out of the box, we provided two conditions – Azure monitoring alerts and work item queries. Using the first, you can have your release hold if your monitoring alerts indicate that the environments you’ve already deployed to are unhealthy. The second allows you to automatically pause releases if anyone files a “blocking bug” against the release.

However, one of the things I’ve learned is that no amount of monitoring will catch every single problem and, particularly, if you have a popular application, your users will know within seconds and turn very quickly to Twitter to start asking about the problem.  Twitter can be a wonderful “alert” to let you know something is wrong with your app.

The Twitter sentiment release gate we released today enables exactly this. It leverages VSTS, Azure Functions, and Microsoft AI to analyze sentiment on your Twitter handle and gate your release progress based on it. The current implementation of the analysis is relatively simple and serves as a sample as much as anything else. It shows how easy it is to extend VSTS release gates to measure any signal you choose and use that signal to manage your release process.
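To give a sense of the moving parts, here is a heavily simplified sketch of what such a gate endpoint could look like as an HTTP-triggered Azure Function written in Python. This is not the sample that ships with the extension; the threshold, the stubbed sentiment lookup, and the response fields are all illustrative, and a real implementation would call Twitter and a sentiment service such as Text Analytics.

import json
import azure.functions as func

NEGATIVE_THRESHOLD = 0.4  # assumed threshold; tune for your own handle

def get_recent_sentiment(handle: str) -> float:
    # Placeholder: a real implementation would pull recent tweets mentioning the handle
    # and score them with a sentiment service, returning an average score in [0, 1].
    return 0.72

def main(req: func.HttpRequest) -> func.HttpResponse:
    handle = req.params.get("handle", "myproduct")
    score = get_recent_sentiment(handle)
    body = {"handle": handle, "sentimentScore": score, "pass": score >= NEGATIVE_THRESHOLD}
    # The release gate evaluates its success criteria against this JSON response.
    return func.HttpResponse(json.dumps(body), mimetype="application/json")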

Once you install the Twitter sentiment extension from the marketplace, you’ll need to follow the instructions to configure an Azure function to measure and analyze your sentiment.  Then you can go into your VSTS release definition and you will find a new release gate enabled.

Start by clicking on Post-deployment conditions on one of your environments.

Then enable the release gates.

Then choose the Twitter Sentiment item and configure it.

Check it out.  It’s very cool.  And feel free to experiment with other ideas you have for creating interesting release gates.

Note: right now the marketplace extension says it works with TFS. It actually doesn’t; I’m working to get that fixed. TFS doesn’t support release gates yet. It will in a TFS 2018 Update, but not yet.

Thanks,

Brian

 

Top stories from the VSTS community – 2017.12.15

Here are top stories we found in our streams this week related to DevOps, VSTS, TFS and other interesting topics.

TOP STORIES

The fallacy of the rejected backlog item – Martin Hinshelwood
There is a frustrating misunderstanding of reality when one thinks that the Product Owner can reject a single story at the Sprint Review. Creating... Read More

Because it’s Friday: Editing Star Wars


It's said that a movie is written three times: in the screenplay, in the filming, and finally in the editing. This video essay is about how Star Wars — the original 1977 release, that is — was basically recreated in the editing room after the original cut played, well, badly. From adding subtitles to Greedo with different dialogue than the original shoot, to adding tension to the final scene by putting the Rebels at risk (amazingly, in the original cut the Death Star was just out in space waiting to be attacked), you can see why the film received an Academy Award for editing. It may also explain why George Lucas's later re-edits (which reversed some of these original editing decisions) aren't so popular.

On a related note, I was sad to see the wonderful series Every Frame a Painting has ended, but check out the archives for other video essays on editing and other parts of film-making magic.

That's all from us here at the blog for this week. Have a great weekend, and we'll be back on Monday. 

Visualizing your real-time blood sugar values AND a Git Prompt on Windows PowerShell and Linux Bash


My buddy Nate became a Type 1 Diabetic a few weeks back. It sucks...I've been one for 25 years. Nate is like me - an engineer - and the one constant with all engineers who become diabetic is that we try to engineer our way out of it. ;) I use an open source artificial pancreas system with an insulin pump and continuous glucose system. At the heart of that system is some server-side software called Nightscout that has APIs for managing my current and historical blood sugar. It's updated every 5 minutes, 24 hours a day.
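For context, here's a minimal Python sketch of what reading the current value from a Nightscout site's REST API can look like. The URL is a placeholder, and the field names (sgv for the sensor glucose value, direction for the trend) are the ones commonly returned by Nightscout, so verify them against your own instance.

import requests

# Hypothetical Nightscout query; replace the URL with your own Nightscout deployment.
NIGHTSCOUT_URL = "https://your-nightscout-site.example.com"

resp = requests.get(NIGHTSCOUT_URL + "/api/v1/entries.json", params={"count": 1})
resp.raise_for_status()
entry = resp.json()[0]
# 'sgv' is the sensor glucose value (mg/dL) and 'direction' is the trend name
# (e.g. Flat, SingleUp, DoubleDown) in typical Nightscout responses.
print(entry["sgv"], entry["direction"])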

I told Nate to get Nightscout set up ASAP and start playing with the API. Yesterday Nate added his blood sugar to his terminal prompt!

I love this. He uses Linux, but I use Linux (Ubuntu) on Windows 10, so I wanted to see if I could run his little node app from Windows (I'll make it a Windows service).

Yes, you can run cron jobs under Windows 10's Ubuntu, but only when there is an instance of bash running (the Linux subsystem shuts down when it's not used), and upstart doesn't work yet. I could run it from the .bashrc or use various hacks/workarounds to keep WSL (Windows Subsystem for Linux) running, but the benefit of running this as a Windows Service is that I can see my blood sugar in all prompts on Windows, like PowerShell, as well!

I'll use the "non-sucking service manager (NSSM)" to run Nate's non-Windows-service node app as a Windows service. I ran "nssm install nsprompt" and got this GUI. Then I added the --nightscout parameter and passed in my Nightscout blood sugar website. You'll get an error immediately when the service runs if this is wrong.

NSSM Service Installer

From the Log on tab, make sure the service is logged on as you. I log in with my MSA (Microsoft Account), so I used my email address. This is to ensure that when the app writes to ~ on Windows, it's putting your sugars in C:\Users\LOGGEDINUSER.

Next, run the service with "sc start NSPrompt" or from the Services GUI.

My sugar updater runs in a Windows Service

Nate's node app gets blood sugar from Nightscout and puts it in ~/.bgl-cache. However, to be clear, since I'm running it from the Windows side while changing the Bash/Ubuntu on Windows prompt from Linux, it's important to note that from Windows ~/ is really C:\Users\LOGGEDINUSER, so I changed the Bash .profile to load the values from the Windows mnt'ed drives like this:

eval "$(cat /mnt/c/Users/scott/.bgl-cache)"

Also, you need to make sure that you're using a Unicode font in your console. For example, I like using Fira Code Light, but it doesn't have a single character ⇈ double-up arrow (U+21C8), so I replaced it with two singles. You get the idea. You need a font that has the glyphs you want and you need those glyphs displaying properly in your .profile text file.

You'll need a Unicode Font

And boom. It's glorious. My current blood sugar and trends in my prompt. Thanks Nate!

My sugars!

So what about PowerShell as well? I want to update that totally different prompt/world/environment/planet from the same file that's updated by the service. Also, I already have a custom prompt with Git details since I use Posh-Git from Keith Dahlby (as should you).

I can edit $profile.CurrentUserAllHosts with "powershell_ise $profile.CurrentUserAllHosts" and add a prompt function before "import-module posh-git."

Here's Nate's same prompt file, translated into a PowerShell prompt() method, chained with PoshGit. So I can now see my Git Status AND my Blood Sugar. My two main priorities!

NOTE: If you don't use posh-git, you can remove the "Write-VcsStatus" line and the "Import-Module posh-git" line and you should be set!

function prompt {
    # Parse Nate's .bgl-cache file (lines like: local nightscout_bgl="123") into a hashtable.
    Get-Content $ENV:USERPROFILE\.bgl-cache | %{$bgh = @{}} {if ($_ -match "local (.*)=""(.*)""") {$bgh[$matches[1]]=$matches[2].Trim();}}

    # Map the Nightscout trend name to an arrow glyph (requires a Unicode-capable font).
    $trend = "?"
    switch ($bgh.nightscout_trend) { "DoubleUp" {$trend="↑↑"} "SingleUp" {$trend="↑"} "FortyFiveUp" {$trend="↗"} "Flat" {$trend="→"} "FortyFiveDown" {$trend="↘"} "SingleDown" {$trend="↓"} "DoubleDown" {$trend="↓↓"} }

    # Color the reading: yellow above the target range, red below it, green in range.
    $bgcolor = [Console]::ForegroundColor.ToString()
    if ([int]$bgh.nightscout_bgl -ge [int]$bgh.nightscout_target_top) {
        $bgcolor = "Yellow"
    } ElseIf ([int]$bgh.nightscout_bgl -le [int]$bgh.nightscout_target_bottom) {
        $bgcolor = "Red"
    } Else {
        $bgcolor = "Green"
    }

    Write-Host $bgh.nightscout_bgl -NoNewline -ForegroundColor $bgcolor
    Write-Host $trend" " -NoNewline -ForegroundColor $bgcolor
    [Console]::ResetColor()

    # The usual posh-git prompt: current path, Git status, and the nested prompt marker.
    $origLastExitCode = $LASTEXITCODE
    Write-Host $ExecutionContext.SessionState.Path.CurrentLocation -NoNewline
    Write-VcsStatus
    $LASTEXITCODE = $origLastExitCode
    "$('>' * ($nestedPromptLevel + 1)) "
}

Import-Module posh-git

Very cool stuff.

Blood Sugar and Git in PowerShell!

This concept, of course, could be expanded to include your heart rate, FitBit steps, or any health related metrics you'd like! Thanks Nate for the push to get this working on Windows!


Sponsor: Check out JetBrains Rider: a new cross-platform .NET IDE. Edit, refactor, test and debug ASP.NET, .NET Framework, .NET Core, Xamarin or Unity applications. Learn more and download a 30-day trial!



© 2017 Scott Hanselman. All rights reserved.
     

Bring your own vocabulary to Microsoft Video Indexer


Self-service customization for speech recognition

Video Indexer (VI) now supports industry and business specific customization for automatic speech recognition (ASR) through integration with the Microsoft Custom Speech Service!

ASR is an important audio analysis feature in Video Indexer. Speech recognition is artificial intelligence at its best, mimicking the human cognitive ability to extract words from audio. In this blog post, we will learn how to customize ASR in VI, to better fit specialized needs.

Before we get into technical details, let’s take inspiration from a situation we have all experienced. Try to recall your first days on a job. You can probably remember feeling flooded with new words, product names, cryptic acronyms, and ways to use them. After some time, however, you came to understand all these new words. You adapted yourself to the vocabulary.

ASR systems are great, but when it comes to recognizing a specialized vocabulary, ASR systems are just like humans. They need to adapt. Video Indexer now supports a customization layer for speech recognition, which allows you to teach the ASR engine new words, acronyms, and how they are used in your business context.

How does Automatic Speech Recognition work? Why is customization needed?

Roughly speaking, ASR works with two basic models - an acoustic model and a language model. The acoustic model is responsible for translating the audio signal into phonemes, the parts of words. Based on these phonemes, the system generates guesses about how the phonemes can be sequenced into words known to its lexicon. The language model is then used to choose the most reasonable sequence of words out of these guesses, based on the probabilities of words occurring one after the other, as learned from large samples of text.

When input speech contains new words, the system cannot propose them as guesses, and they won’t be recognized correctly. For instance, Kubernetes, a technology featured in a new Azure product, is a word that we will teach VI to recognize in the example below. In other cases, the words exist, but the language model is not expecting them to appear in a certain context. For example, container service is not a 2-word sequence that a non-specialized language model would score as highly probable.

How does customization work?

Video Indexer lets you customize speech recognition by uploading adaptation text, namely text from the domain whose vocabulary you’d like the engine to adapt to. New words appearing in the adaptation text will now be recognized, assuming default pronunciation, and the language model will learn new probable sequences of words.
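To see why this helps, here is a toy Python sketch (nothing like the production engine, just an illustration) of how adaptation sentences change which candidate word sequence a simple bigram language model prefers:

from collections import Counter

def bigram_counts(sentences):
    # Count adjacent word pairs across all training sentences.
    counts = Counter()
    for s in sentences:
        words = s.lower().split()
        counts.update(zip(words, words[1:]))
    return counts

def score(counts, phrase):
    # Simple additive smoothing so unseen bigrams never score zero.
    words = phrase.lower().split()
    return sum(counts[pair] + 1 for pair in zip(words, words[1:]))

general_text = ["we talked with communities online", "developers deploy containers every day"]
adaptation_text = ["you can deploy containers with kubernetes", "run your microservices with kubernetes"]

candidates = ["deploy containers with communities", "deploy containers with kubernetes"]

before = bigram_counts(general_text)
after = bigram_counts(general_text + adaptation_text)
print("before adaptation:", max(candidates, key=lambda c: score(before, c)))
print("after adaptation: ", max(candidates, key=lambda c: score(after, c)))

With only the general text, the model prefers the "communities" transcription; once the adaptation sentences are added, the "kubernetes" sequence wins, which is the same effect described above for the real system.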

An example

Let’s take a video on Azure Containers as an example. First, we upload the video to Video Indexer, without adaptation. Go to the VI portal, click ‘upload’, and choose the file from your machine.

After a few minutes, the video on Kubernetes will be indexed (view result). Let us see where adaptation can help. Go 9 minutes and 46 seconds into the video. The word ‘Kubernetes’ is a new, highly specific word that the system does not know, and it is therefore recognized as “communities”.

Transcript

Here are two other examples. At 00:49, “a VM” was recognized as “IBM”. Again, specific domain vocabulary, this time an acronym. The same happens for “PM” at 00:17, where it is not recognized.

Customization example

To solve these and other issues, we need to apply language adaptation. We will start with a partial solution, which will help us understand the full solution.

Example 1: Partial adaptation – words without context

VI allows you to provide adaptation text that introduces your vocabulary to the speech recognition system. At this point, we will introduce just three lines, each containing a single word: Kubernetes, VM, and PM. The file is available for your review.

Go to the customization settings by clicking on the highlighted icon on the upper-right hand corner of the VI portal, as shown below:

Customization settings

On the next screen, click “add file”, and upload the adaptation file.

Content model customization

Make sure you activate the file as adaptation data.

Activate the file

After the model has been adapted, re-index the file. And… Kubernetes is now recognized!

Re-index the file

VM is also recognized, as well as PM at 00:17.

VM is recognized

However, there is still room for more adaptation. Manually adding words can only help so much, since we cannot cover all the words, and we would also like the language model to learn from real instances of the vocabulary. This will make use of context, parts of speech, and other cues which can be learned from a larger corpus. In the next example, we will take a more complete approach by adding a decent corpus of real sentences from the domain. 

Example 2: Adapting the language model

Similar to what we have done above, let us now use as adaptation text a few pages of documentation about Azure containers. We have collected this adaptation text for your review. Below is an example for this style of adaptation data:

To mount an Azure file share as a volume in Azure Container Instances, you need three values: the storage account name, the share name, and the storage access key… The task of automating and managing a large number of containers and how they interact is known as orchestration. Popular container orchestrators include Kubernetes, DC/OS, and Docker Swarm, all of which are available in the Azure Container Service.

We recommend taking a look at the whole file. Let’s see a few examples of the effect. Let’s go back to 09:46. “Orchestrated” became “orchestrator” because of the context in the adaptation text.

Image 01

Here is another nice example in which highly specific terms become recognizable.

Before adaptation:

Before adaptation

After adaptation:

After adaptation

Do’s and don’ts for language model adaptation

The system learns based on probabilities of word combinations, so to learn best:

  • Give enough real examples of sentences as they would be spoken, hundreds to thousands is a good base.
  • Put only one sentence per line, not more. Otherwise the system will learn probabilities across sentences.
  • It is okay to put one word as a sentence to boost the word against others, but the system learns best from full sentences.
  • When introducing new words or acronyms, if possible, give as many examples of usage in full sentences as you can, to give the system as much context as possible.
  • Try several adaptation options, and see how they work for you.

Some patterns to avoid in adaptation data (a small pre-check sketch follows the list):

  • Repetition of the exact same sentence multiple times. It does not boost further the probability and may create bias against the rest of the input.
  • Including uncommon symbols (~, #, @, %, &), as they will get discarded, along with the sentence they appear in.
  • Providing overly large inputs, such as hundreds of thousands of sentences; these will dilute the effect of boosting.
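As an illustration of these guidelines (not an official tool), a small Python sketch like this one could pre-check an adaptation file before you upload it: it keeps one sentence per line, drops exact repeats, and drops sentences containing the symbols that would be discarded anyway.

def clean_adaptation_file(path):
    uncommon = set("~#@%&")  # symbols the guidelines say will get a sentence discarded
    seen, kept = set(), []
    with open(path, encoding="utf-8") as f:
        for line in f:
            sentence = line.strip()
            if not sentence or sentence in seen:
                continue                      # skip blanks and exact repeats
            if uncommon & set(sentence):
                continue                      # skip sentences that would be discarded
            seen.add(sentence)
            kept.append(sentence)
    return kept

# Example usage (hypothetical file name):
# sentences = clean_adaptation_file("containers-adaptation.txt")
# print(len(sentences), "sentences ready to upload")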

Using the VI language adaptation API

To support adaptation, we have added a new customization tab to the site, and a new web API to manage the adaptation texts, training of the adaptation text, and transcription using adapted models.

In the Api/Partner/LinguisticTrainingData web API you will be able to create, read, update, and delete the adaptation text files. The files are plain *.txt files which contain your adaptation data. For an improved user experience, mainly in the UI, each file belongs to a group. This is especially useful when you want to disable or enable multiple files at once in the UI.

After adaptation data files are uploaded, we need to use them to customize the system using the Api/Partner/LinguisticModel APIs, which create a linguistic model based on one or more files. When more than a single file is provided, we concatenate the files into a single one. Preparing a customized model can take several minutes, and you are required to make sure that your model status is "Complete" before using it in indexing.

The last and most important step is the transcription itself. We added a new field to the upload, named “linguisticModel”, that accepts a valid, customized linguistic model ID to be used for transcription. When re-indexing, we use the same model ID provided in the original indexing.

Important note: There is a slight difference in the user experience when using our site and the API. When using our site, we allow enabling/disabling training data files and groups, and we will choose the active model during file upload/re-index. When using the API, we disregard the active state and index the videos based on the model ID provided at run time. This difference is intentional, to allow both simplicity in our website and a more robust experience for developers.
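Putting the pieces together, a rough Python sketch of the flow could look like the one below. The base URL, authentication header, file name, and response field names are assumptions made for illustration; the endpoint names (LinguisticTrainingData, LinguisticModel, and the linguisticModel upload field) come from the description above, and the exact request shapes are documented in the developer portal.

import requests

# Hypothetical values; replace with your own Video Indexer account details.
BASE = "https://videobreakdown.azure-api.net/Breakdowns/Api/Partner"
HEADERS = {"Ocp-Apim-Subscription-Key": "YOUR_VI_API_KEY"}

# 1. Upload an adaptation text file (plain *.txt) as linguistic training data.
with open("containers-adaptation.txt", "rb") as f:
    data_resp = requests.post(BASE + "/LinguisticTrainingData", headers=HEADERS, files={"file": f})
data_id = data_resp.json()["id"]    # assumed response field

# 2. Create a customized linguistic model from one or more training data files,
#    then wait until its status is "Complete" before using it for indexing.
model_resp = requests.post(BASE + "/LinguisticModel", headers=HEADERS,
                           json={"name": "containers", "dataFileIds": [data_id]})
model_id = model_resp.json()["id"]  # assumed response field

# 3. Index (or re-index) a video, passing the customized model through the linguisticModel field.
upload_resp = requests.post(BASE + "/Breakdowns", headers=HEADERS,
                            params={"name": "kubernetes-talk",
                                    "videoUrl": "https://example.com/kubernetes-talk.mp4",
                                    "linguisticModel": model_id})
print(upload_resp.json())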

The full API can be found in our developer portal.

Conclusion

Adaptation for speech recognition is necessary in order to teach the ASR system new words and how they are being used in a domain’s vocabulary. In Video Indexer, we provide adaptation technology which takes nothing but adaptation text and modifies the language model in the ASR system to make more intelligent guesses, given that the transcribed speech comes from the same domain as the adaptation text. This is useful to teach your system acronyms (e.g. VM), new words (e.g. Kubernetes), and new uses of known words (e.g. container).

Azure HDInsight Integration with Azure Log Analytics is now generally available


I am excited to announce the general availability of HDInsight Integration with Azure Log Analytics.

Azure HDInsight is a fully managed cloud service for customers to do analytics at scale using the most popular open-source engines such as Hadoop, Hive/LLAP, Presto, Spark, Kafka, Storm, HBase, etc.

Thousands of our customers run their big data analytical applications on HDInsight at global scale. The ability to monitor this infrastructure, detect failures quickly and take quick remedial action is key to ensuring a better customer experience.

Log Analytics is part of Microsoft Azure's overall monitoring solution. Log Analytics helps you monitor cloud and on-premises environments to maintain availability and performance.

Our integration with Log Analytics makes it easier for our customers to operate their big data production workloads in a more effective and simple manner.

Monitor & debug full spectrum of big data open source engines at global scale

Typical big data pipelines utilize multiple open source engines such as Kafka for Ingestion, Spark streaming or Storm for stream processing, Hive & Spark for ETL, Interactive Query [LLAP] for blazing fast querying of big data.

Additionally, these pipelines may be running in different datacenters across the globe.

With the new HDInsight monitoring capabilities, our customers can connect different HDInsight clusters to a Log Analytics workspace and monitor them from a single pane of glass.

Image: Monitoring your global big data deployments with a single pane of glass

Collect logs and metrics from open source analytics engines

Once Azure Log Analytics is enabled on your cluster, you will see important logs and metrics from a number of different open source frameworks, as well as cluster VM-level metrics such as CPU usage, memory utilization, and more. Customers will be able to get a full view into their cluster, from one location.

Many of our customers take advantage of elasticity of the cloud by creating and deleting clusters to minimize their costs. However, they want to retain the job logs and other useful information even after the cluster is terminated. With Azure log analytics, customers can retain the job information even after the cluster is deleted.

Below are some of the key metrics and logs collected from your HDInsight clusters.

YARN Resource Manager, YARN applications, Hive, MapReduce, Kafka, Storm, Hive Server 2, Hive Server Interactive, Oozie, Spark, Spark executor and driver, Livy, HBase, Phoenix, Jupyter, LLAP, ZooKeeper, and many more.

logs

 

Image: Logs & Metrics from various Open Source engines.

Visualize key metrics with solution templates

To make it easier, we have created a number of visualizations so that our customers can understand important metrics. We have published multiple solution templates for you to get started quickly. You can install these solution templates directly from the Azure portal, under Monitoring + Management.

install

Image: Installing HDInsight solution templates from Azure portal

Once installed, you can visualize the key metrics. In the example below, you can see the dashboard for your Spark clusters.

omssparkview

Image: Spark dashboard

Troubleshoot issues faster

It’s important to be able to detect and troubleshoot issues faster and find the root cause when developing big data applications in Hive, Spark or Kafka.

With log analytics portal, you can:

  • Write queries to quickly find issues and important data in your logs and metrics
  • Filter, sort, and group results within a time range
  • See your data in tabular format or in a chart

Below is an example query to look at application metrics from a Hive query:

 

search *
| where Type contains "application_stats_dag_CL" and ClusterName_s contains "testhive02"
| order by TimeGenerated desc

troubleshoot

Image: Troubleshooting Hive jobs

Enabling Log Analytics

Log Analytics integration with HDInsight is enabled via the Azure portal, PowerShell, or the Azure SDK.

    Enable-AzureRmHDInsightOperationsManagementSuite
            [-Name] <String>
            [-WorkspaceId] <String>
            [-PrimaryKey] <String>
            [-ResourceGroupName <String>]
            [-DefaultProfile <IAzureContextContainer>]
            [-WhatIf]
            [-Confirm]
            [<CommonParameters>]

     

enablemonitoring

Image: Enabling Log Analytics from the Azure portal

Get started today

HDInsight integration with Azure Log Analytics helps you gain greater visibility into your big data environment. Learn more about the capabilities and simplify monitoring of your big data applications.

Please reach out to AskHDInsight@Microsoft.com in case of any questions.

    Announcing Apache Kafka for Azure HDInsight general availability


    Apache Kafka on Azure HDInsight was added last year as a preview service to help enterprises create real-time big data pipelines. Since then, large companies such as Toyota, Adobe, Bing Ads, and GE have been using this service in production to process over a million events per sec to power scenarios for connected cars, fraud detection, clickstream analysis, and log analytics. HDInsight has worked very closely with these customers to understand the challenges of running a robust, real-time production pipeline at an enterprise scale. Using our learnings, we have implemented key features in the managed Kafka service on HDInsight, which is now generally available.

    A fully managed Kafka service for the enterprise use case

    Running big data streaming pipelines is hard. Doing so with open source technologies for the enterprise is even harder. Apache Kafka, a key open source technology, has emerged as the de-facto technology for ingesting large streaming events in a scalable, low-latency, and low-cost fashion. Enterprises want to leverage this technology, however, there are many challenges with installing, managing, and maintaining a streaming pipeline. Open source bits lack support and in-house talent needs to be well versed with these technologies to ensure the highest levels of up-time. Every second an ingestion pipeline is down, data is lost.

    The HDInsight team learned from the challenges that enterprises faced while installing and operating Apache Kafka, and introduced Apache Kafka on HDInsight as a managed service last November. HDInsight is a managed platform with a 99.9% SLA on open source workloads. With this addition, our enterprise customers no longer worry about managing Kafka clusters, as HDInsight manages and fixes the issues involved with running Kafka at an enterprise scale. Not only did we onboard Apache Kafka as a fully managed service on Azure HDInsight, we used our customer’s feedback to innovate key features within the managed service.

    We introduced native integration with Azure Managed Disks, which reduces costs exponentially as these workloads are scaled out for large enterprises like Toyota and Bing Ads. We also introduced tools for implementing rack awareness in Apache Kafka for the Azure environment to ensure the highest levels of Kafka availability on HDInsight. These key features and the general availability of Apache Kafka on HDInsight complete an end to end streaming pipeline on the Azure platform. Enterprises can deploy highly scalable, fault tolerant, and secure real-time architectures with Apache Kafka, Apache Spark, and Apache Storm on the managed HDInsight platform with a single click.

    Apache Kafka

    Customer success stories

The preview launch of managed Kafka on HDInsight received an overwhelming response. The Kafka team at HDInsight collaborated with large enterprises to help enable their streaming big data scenarios encompassing connected cars, clickstream analytics, fraud detection, real-time patient care, and more. In addition to getting these scenarios off the ground, managed Kafka on HDInsight serves as the backbone that powers them in production today – processing upwards of a trillion events/day. Along with the release of managed disks in June, we showed how millions of Toyota cars are ingesting data every second on Azure through the managed Kafka service on HDInsight. Today we have a few more scenarios to showcase.

    Adobe Experience Cloud

Adobe's Experience Cloud provides industry-leading analytics and data processing for the world's top firms. In early 2017, Adobe's team needed a new way to ingest massive amounts of data for some of their most demanding customers. Not only did Adobe need to develop an entirely new way of accepting this data, but they needed to build, test, and ship this new product feature in just a few months' time. Adobe decided to use HDInsight Kafka to help them build this new pipeline and has been successfully using it to process over a billion transactions each day.

     

    Adobe

    "Azure's HDInsight Kafka was the perfect solution for us. We had used Kafka successfully in other projects, but it would have taken us too long to get an internal Kafka instance deployed and ready to use. Using Azure, we had a development Kafka cluster ready in hours that helped us build our new system in record time. When we were ready to start ingesting live data with our new system we decided to continue using HDInsight Kafka and it has been a solid part of our infrastructure for months."

    – Josh Butikofer, Sr. Software Architect, Adobe

     

    GE Healthcare digital health revolution

    GE Healthcare“At GE Healthcare, we apply cutting edge technological innovations in cloud big data and machine learning to solve problems faced by thousands of clinics, hospitals, health care providers and millions of patients every day. We use Apache Kafka as a key technology we use to power these intelligent scenarios. Azure HDInsight provides Apache Kafka and Apache Spark as managed services, which makes it very easy for us to manage and operate these services in the Azure cloud at GE’s scale. This is just the start – we hope to bring this scale of analytics and intelligence to drive productive transformations in the digital health revolution.”

    – Animesh Mahapatra, Director Software Engineering, GE

     

    Microsoft Office365, Skype, and Bing Ads

    Capture“Data is the backbone of Microsoft's massive scale cloud services such as Bing, Office365, and Skype. Siphon is a service that provides a highly available and reliable distributed Data Bus for ingesting, distributing, and consuming near real-time data streams for processing and analytics for these services. For Siphon, we rely on Azure HDInsight Kafka as a core building block that is highly reliable, scalable, and cost effective. Siphon ingests over a trillion messages per day, and we look forward to leverage HDInsight Kafka to continue to grow in scale and throughput.”

    – Thomas Alex, Principal Program Manager, Microsoft

    Learn more and get started

    To learn more, watch the video and follow the learning path listed below.

    Apache Kafka on Azure HDInsight quick-start

     

Stay up-to-date with HDInsight on the Azure blog and follow us on Twitter @HDInsight for the latest news and updates on the managed platform.

    If you have any questions or feedback, please reach out to AskHDInsight@microsoft.com.

    Azure HDInsight announcements: Significant price reduction and amazing new capabilities


    Today, we are really happy to announce that we are reducing the prices for Azure HDInsight service and making several awesome capabilities generally available.

    Azure HDInsight

    Launched in 2013, Azure HDInsight is a fully-managed, full spectrum, open-source analytics cloud service by Microsoft that makes it easy, fast, and cost-effective to process massive amounts of data. You can use the most popular open-source engines such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, HBase, R and install more open source frameworks from the OSS ecosystem.

    Amazing value for our customers

Customers ranging from startups to enterprises are using Azure HDInsight for their mission-critical applications. The service enables a broad range of scenarios in manufacturing, retail, education, nonprofit, government, healthcare, media, banking, telecommunications, insurance, and many more industries, with use cases ranging from ETL to data warehousing and from machine learning to IoT. Many Fortune 500 customers are running their big data pipelines on Azure HDInsight:

    Customer logos

    AccuWeather is using this technology to gain real-time intelligence into weather and business patterns. Handling 17 billion requests for data each day, AccuWeather is helping 1.5 billion people safeguard and improve their lives and businesses.

    Cornell Lab of Ornithology improved Machine Learning Workflow with Azure HDInsight. By moving their open-source workflow to Microsoft’s scalable Azure HDInsight service, the researchers reduced their analysis run times to 3 hours, generating results for more species and providing quicker results for conservation staff to use in planning.

    Learn more about how other customers are using Azure HDInsight here.

    50+% price reduction with even more value

    Today we are excited to announce the following:

    • Up to 52% price reduction in Azure HDInsight. We are happy to announce that we are lowering the prices in Azure HDInsight. Customers will get even more value from their batch processing, interactive querying, machine learning, streaming analytics, and real-time analytics workloads on Azure HDInsight at a much lower price. According to IDC, using Azure HDInsight, customers saw a 63% decrease in TCO as compared to on-premises solutions. The price reductions will substantially lower Azure HDInsight’s TCO even further. As a part of this price reduction, we will also be replacing the Premium cluster tier with the Enterprise Security Package that can be added to your Azure HDInsight cluster. Enterprise Security Package will still be in preview. The price reduction becomes effective on January 5, 2018. To learn more about the new pricing, read here.
    • Additional 80% price reduction for R Server for Azure HDInsight. Accompanying the overall Azure HDInsight price reduction, the price of the Microsoft R Server for HDInsight is also being reduced by 80%. This reduces the cost to $0.016 per core hour, allowing Microsoft R Server on Azure HDInsight users to run distributed R analytics workloads at a significantly lower price.
    • General availability of Apache Kafka. We are announcing the general availability of Apache Kafka on Azure HDInsight. Apache Kafka on Azure HDInsight enables customers to build enterprise-grade, open-source, real-time analytics solutions such as IoT, fraud detection, clickstream analysis, social analytics and more backed by Azure HDInsight 99.9% SLA.
    • General availability of Azure Log Analytics integration. Azure Log Analytics with Azure HDInsight enables enterprise grade monitoring for mission critical analytics workloads. You can now get alerts, monitor and debug all your Azure HDInsight workloads.
    • Public preview integration with Power BI direct query. We are excited to announce public preview integration with Power BI direct query, which allows you to create dynamic reports based on data and metrics you already have on your Interactive Query clusters in Azure BLOB store or Azure Data Lake Store. You can now also build visualizations on your entire data set much faster with the most recent data.
    • Advanced development tools for Apache Spark. For Scala and Java developers building robust production data pipelines in Azure HDInsight, we now offer plug-ins for IntelliJ and Eclipse with the unique ability to submit and debug Spark jobs on HDInsight clusters, and support for distributed debugging of Spark code running across multiple Spark executors. For PySpark developers, who value the productivity of the Python language, the new Visual Studio Code plugin for HDInsight offers first-class integration with this popular code editor. Developers can now edit their scripts locally on their machines and submit PySpark statements to an Azure HDInsight cluster with an interactive experience.

    Azure HDInsight: Cloud-native full-spectrum open source analytics on Azure

    Today’s announcements build upon the following core capabilities of Azure HDInsight service, which provides enterprises with a robust platform to run their production workloads:

    • Cloud native. Azure HDInsight is the only service in the industry to provide an end-to-end SLA for your production workloads running Hadoop, Spark, Hive, LLAP, Kafka, Storm, HBase and R.

Johnson Controls"At Johnson Controls, we use Azure HDInsight for performing real-time and batch analysis of sensor data that we collect from over 6,000 Connected Industrial Chillers deployed across the world. Azure HDInsight offers us the scalability, reliability, high-availability, performance, security and ease of deployment that we need in our production infrastructure to consistently deliver value to our customers".

    - Vaidhyanathan Venkiteswaran, Platform Engineering Manager, Data Enabled Business, Johnson Controls

    • Low cost. Azure HDInsight is a cost-effective service that is extremely powerful and reliable. You pay only for what you use. You can create clusters on demand and then scale them up or down as needed. The decoupled compute and storage architecture provides better performance and flexibility.

PROS“PROS is a pioneer in using machine learning to give companies accurate and profitable pricing. The PROS Guidance product runs enormously complex pricing calculations based on variables that comprise multiple terabytes of data. In Azure HDInsight, a process that formerly took several days now takes just a few minutes.”

    • Secure and compliant. Azure HDInsight brings enterprise grade protection of your data with monitoring, virtual networks, encryption, Active Directory authentication, authorization, role-based access control and more. The service meets the most popular industry and government compliance standards.

Honeywell“HDInsight as a Big Data Platform has enabled our data engineers and scientists to focus on developing data and analytics products rather than managing infrastructure and troubleshooting day-to-day issues related to very large clusters. The heavy lifting of installing and managing clusters, providing robust security with Apache Ranger, data at rest encryption, monitoring and scaling up/down is taken care of by HDInsight. This platform is used for a variety of use cases like real time streaming, machine learning, visualization, ETL. Overall a very positive experience with HDInsight engineering, product and support teams.”

    - Navaljit Bhasin, Big Data Engineering Director, Honeywell

    • Global. Azure HDInsight is available in more Azure regions than any other Big Data offering in any other cloud. Azure HDInsight is also available in Azure government clouds in the US and sovereign clouds including China and Germany to meet sovereign compliance requirements.
    • Productive. With Azure HDInsight, you can use your preferred productivity tools such as Visual Studio, Eclipse, and IntelliJ for Scala, Python, R, Java and  .NET. Data scientists and machine learning professionals can collaborate on the most popular notebooks such as Jupyter and Zeppelin.
    • Extensible. With Azure HDInsight, you can seamlessly integrate with the most popular big data solutions with a single-click and easily extend your analytics cluster capabilities with applications, edge nodes, and customize using script actions.

AtScale"Azure HDInsight Application Platform is one of the most extensible platforms and has allowed us to make our product easily available to many customers using the Azure Platform. The one-click deploy experience is a game changer and takes away the most common pain point around discovering and installing applications from the Big Data ecosystem. It was by far the easiest way for our enterprise customers to deploy and experience AtScale running on the Azure Cloud."

    -   Eddie White, VP of Business Development, AtScale

    Get started now!

    With Azure HDInsight, our mission is to provide a fully managed, full spectrum of open source technologies combined with the power of the cloud. Customers today are using these open source technologies to build a variety of different applications such as batch processing, ETL, Data Warehousing, Machine Learning, IoT and more. We hope you take full advantage of today’s announcements and we are excited to see what you will build with Azure HDInsight. Read this developer guide and follow the quickstart guide to learn more about implementing these pipelines and architectures on Azure HDInsight. Stay up-to-date on the latest Azure HDInsight news and features by following us on Twitter #HDInsight and @AzureHDInsight. For questions and feedback – please reach out to AskHDInsight@microsoft.com.

    Enterprise Security Package preview for Azure HDInsight


    Azure HDInsight is a fully-managed cloud service that makes it easy, fast, and cost-effective to process massive amounts of data. Use the most popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R & more. Azure HDInsight enables a broad range of scenarios such as ETL, Data Warehousing, Machine Learning, and IoT.

By default, when you provision an HDInsight cluster, you are required to create a local admin user and a local SSH user that have full access to the cluster. The local admin user can access all the files, folders, tables, columns, etc. With a single local user, there is no need for role-based access control. However, as enterprise customers move to the cloud, they must meet strict security requirements in terms of authentication, authorization, auditing, and governance. This is especially important with larger or multiple teams that share the same cluster. Admins don’t want to create individual clusters for individual users. When we talked to customers, we received three main requests as part of enabling cluster access for multiple users:

    1. As a data scientist, I want to use my Active Directory domain credentials to run queries on the cluster.
    2. As a cluster admin, I want to configure role-based access control to restrict access to data only as needed.
    3. As a cluster admin, I want to view audit logs, in terms of who accessed what data, and whether access succeeded or failed.

To meet these requirements, the HDInsight team launched a preview of the HDInsight Premium cluster tier for Hadoop cluster types. We received a tremendous response, and a lot of customers signed up to be part of the preview program. Based on the feedback and customer interest, it became clear that this capability shouldn't be a separate cluster tier but rather an add-on to the regular/standard HDInsight cluster. Offering the security package as an add-on simplifies the cluster creation workflow and improves the user experience.

    Today, we are excited to announce that these features are available as part of the add-on (optional) Enterprise Security Package. As part of provisioning the HDInsight cluster, you can optionally select the Enterprise Security Package.

    HDInsight Cluster Configuration

    Once you select this add-on feature, you will be able to:

• Integrate the HDInsight cluster with your own Active Directory, including Active Directory on-premises, Azure Active Directory Domain Services, or Active Directory on IaaS VMs. As an admin, you can grant domain users access to the cluster. This means that users can use their own corporate (domain) username and password to access the cluster.
    • Configure Role-Based Access Control for Hive, Spark, and Interactive Hive tables using Apache Ranger. Additionally, you can also set file and folder permissions for data stored in Azure Data Lake Store.
    • View the audit logs to see who accessed what data and what policy was enforced as part of the access.

    We have enabled this feature for Hadoop, Spark, and Interactive Query cluster types.

     

To learn more about the Enterprise Security Package, refer to the Azure HDInsight documentation.

      GDPR How-to: Get organized and implement the right processes


Achieving compliance with the General Data Protection Regulation (GDPR), the new data privacy law from the European Union (EU), is not a one-time activity but an ongoing process. When the GDPR goes into effect on May 25, 2018, individuals will have greater control over their personal data. Additionally, the GDPR imposes new obligations on organizations that collect, handle, or analyze personal data. Implementing the right processes and organizational changes to comply with the GDPR will not be an easy task, but Microsoft is here to help. With 10 chapters, 99 articles, and 160 requirements, the GDPR is a complex law, and implementing all of it will be a challenge, so Microsoft has created a highly detailed guide.

      Our colleagues from Microsoft France recently published a detailed implementation guide, GDPR - Get organized and implement the right processes, available in both English and French. The guide provides customers with a methodology for creating and executing a GDPR compliance program in their organization. It describes the necessary steps for achieving GDPR compliance through a plan, do, check, act (PDCA) approach using Microsoft Cloud services such as Azure, as shown in the diagram below.


      Figure 1: Consolidated view of the main GDPR related activities to be carried out, grouped by main categories.

For example, the guide explains when and how to create a data protection impact assessment (DPIA), describes what approval process should be put in place, what governance model should be applied, and what the role of a Data Protection Officer (DPO) is in the context of the GDPR.

Further information about how Azure helps you successfully address the requirements of your GDPR compliance preparation is available at the Microsoft Azure GDPR web page on our Microsoft Trust Center.

      Last week in Azure: Azure France regions, cost savings, and more


      Two new Azure regions are now in preview in France: France Central in Paris, and France South in Marseille. These regions are part of Azure’s global portfolio of announced regions in 42 locations around the world. Availability Zones in France Central can be paired with the geographically separated France South region for regional disaster recovery while maintaining data residency requirements. This past week also saw new capabilities added to help manage cost on Azure. With Azure Cost Management, Azure is the only platform that offers an end-to-end cloud cost management and optimization solution to help customers make the most of cloud investment across multiple clouds. Cost Management is free to all customers to manage their Azure spend.

      Headlines

      Microsoft Azure preview with Azure Availability Zones now open in France - The preview of Microsoft Azure in France is open today to all customers, partners and ISVs worldwide giving them the opportunity to deploy services and test workloads in these latest Azure regions. This is an important step towards offering the Azure cloud platform from our datacenters in France.

      Cloud storage now more affordable: Announcing general availability of Azure Archive Storage - Learn how to reduce your storage costs by storing your rarely accessed data in Archive Blob Storage, the new, third tier of Blob-Level Tiering. Blob-Level Tiering is now generally available and includes hot, cool, and archive tiers.

      New Azure management and cost savings capabilities - Learn about four new management and cost savings capabilities: Azure Policy provides control and governance at scale for your Azure resources; Azure Cost Management is rolling out the support for Azure Virtual Machine Reserved Instances management later this week to help you maximize savings over time; reduced pricing on our Dv3 Series virtual machines in several regions; as mentioned above, our lowest priced Storage tier Azure Archive Storage is generally available.

      Azure Marketplace – New offers in November 2017 - The Azure Marketplace is an online applications and services marketplace that enables start-ups and ISVs to offer their solutions to Azure customers around the world. The Azure Marketplace currently offers virtual machine images, virtual machine extensions, APIs, applications, and machine learning services. In the past month, 35 new offers went live, including Red Hat Enterprise Linux 7, Docker EE for Azure (Basic), Cassandra Cluster from Bitnami, EDB Postgres Ark, and more.

      Microsoft expands scope of Singapore MTCS certification - As part of its commitment to customer satisfaction, Azure has adopted the MTCS standard to meet different cloud user needs for data sensitivity and business criticality. Azure has maintained its MTCS certification for the fourth consecutive year. A Level 3 certification means that in-scope Microsoft cloud services can host high-impact data for regulated organizations with the strictest security requirements.

      Azure Monitor: Send monitoring data to an event hub - Now you can set up your resource-level diagnostic logs and metrics to be streamed to any of three destinations including a storage account, an Event Hubs namespace, or Log Analytics. Sending to an Event Hubs namespace is a convenient way to stream Azure logs from any source into a custom logging solution, 3rd party SIEM product, or other logging tool.

      How cloud speed helps SQL Server DBAs - Learn how Azure SQL Database significantly transformed the SQL Server engineering model and how the evolution continues. The build-and-ship process continues to be streamlined and improved to provide continuous value and innovation to our customers both in Azure SQL Database and SQL Server.

      General availability of Azure Site Recovery Deployment Planner for VMware and Hyper-V - Use this tool to understand your on-premises networking requirements, Microsoft Azure compute and storage requirements for successful Azure Site Recovery replication, and test failover or failover of your applications. In addition, the GA release of this tool includes detailed cost estimates of disaster recovery to Azure for your environment.

      Azure ARM API for consumption usage details - An updated usage details API is now available, which is a first step in the consolidation of Azure cost and usage based APIs in the ARM (Azure Resource Manager) model. The Azure Consumption APIs give you programmatic access to cost and usage data for your Azure resources.

      Announcing the General Availability of Azure Bot Service and Language Understanding, enabling developers to build better conversational bots - Microsoft Azure Bot Service and Microsoft Cognitive Services Language Understanding (LUIS) are both generally available. Azure Bot Service enables developers to create conversational interfaces on multiple channels while LUIS helps developers create customized natural interactions on any platform for any type of application, including bots. Learn more in the Conversational Bots Deep Dive – What’s new with the General Availability of Azure Bot Service and Language Understanding.

      Top themes from KubeCon 2017 (Microsoft + Open Source blog) - Read how the Kubernetes community came together in record numbers at KubeCon earlier this month, with the goal of making it easier than ever to use containers to modernize existing applications and manage new applications.

      Service updates

      Azure shows

AKS cluster upgrades and managed K8s - Corey Sanders, Director of Program Management on the Microsoft Azure Compute team, sat down with Gabe Monroy, a principal PM on the Azure Compute team. Gabe shows off new functionality to auto-upgrade Kubernetes clusters as part of the relaunch of AKS.

      Azure IoT Hub - Olivier Bloch joins Scott Hanselman to discuss Azure IoT and how it is more than just about connecting IoT devices and sending telemetry to the Cloud. They also talk about Azure IoT device topics such as twins, provisioning, and lifecycle management.

      Windows 10 IoT and Azure IoT Device Management Enhancements - David Campbell joins Scott Hanselman to discuss Windows 10 IoT and how it enhances Azure IoT Device Management (DM) capabilities on Windows IoT, simplifying DM and aligning Azure DM with other Windows DM solutions.

      Jenkins CI/CD with Service Fabric - In this episode, Mani Ramaswamy shows Scott Hanselman how to use Jenkins for your CI/CD pipeline with Service Fabric and run your Jenkins build server directly on the Service Fabric cluster. The Service Fabric team uses Jenkins internally for testing on Linux, and you can learn about how it is configured.

      The Azure Podcast: Episode 208 - From College to Azure - We chat with Kendal Roden, an Azure Consultant at Microsoft, about the journey she went through, graduating from college to getting ramped up on Azure and working on real engagements with customers.

Cloud Tech 10 - 18th December 2017 - Bot Service, Language Understanding, Azure France and more - Each week, Mark Whitby, a Cloud Solution Architect at Microsoft UK, covers what's happening with Microsoft Azure in 10 minutes or less. This episode covers the Azure Bot Service, Language Understanding, the new Azure regions in France, and more.


      The Trouble with Bias, by Kate Crawford


      Bias is a major issue in machine learning. But can we develop a system to "un-bias" the results? In this keynote at NIPS 2017, Kate Crawford argues that treating this as a technical problem means ignoring the underlying social problem, and has the potential to make things worse.

      You can read more about biases in AI systems in this article at the Microsoft AI blog.

      VMware virtualization on Azure


      We have had tons of interest in our VMware virtualization on Azure offering. This includes questions about what we are offering and how we will provide an enterprise grade solution. Here are some of the details on the preview.

To enable this solution, we are working with multiple VMware Cloud Provider Program partners and running on existing VMware-certified hardware. For example, our preview hardware will use a FlexPod bare-metal configuration with NetApp storage. This hosted solution is similar to Azure's bare-metal SAP HANA Large Instances solution that we launched last year. With this approach, we will enable you to use the same industry-leading VMware software and services that you currently use in your on-premises datacenters, but running on Azure infrastructure, allowing L3 network connectivity for existing applications to Azure-native services like Azure Active Directory, Azure Cosmos DB, and Azure Functions.

      We are facilitating discussions with VMware and the VCPP partners to ensure you have a great solution and a great support experience when we make this offering generally available next year. More details from VMware on this can be found here. We will share more information on GA plans and partners in the coming months. If you’d like to participate in this preview, please contact your Microsoft sales representative.

      Thanks,

      Corey

      Azure #CosmosDB: Entry point for unlimited containers is now 60% cheaper and other improvements



Azure Cosmos DB is Microsoft’s globally distributed, horizontally partitioned, multi-model database service. The service is designed to allow customers to elastically and independently scale throughput and storage across any number of geographical regions. Azure Cosmos DB offers guaranteed low latency at the 99th percentile, 99.999% high availability, predictable throughput, and multiple well-defined consistency models. Azure Cosmos DB is the first and only globally distributed database service in the industry today to offer comprehensive Service Level Agreements (SLAs) encompassing the four dimensions of global distribution our customers care about most: throughput, latency at the 99th percentile, availability, and consistency. As a cloud service, we have carefully designed and engineered Azure Cosmos DB with multi-tenancy, horizontal scalability, and global distribution in mind.

      We have just rolled out a few long-awaited changes and we wanted to share them with you:

• Entry point for unlimited collections/containers is now 60% cheaper. In February, we lowered the entry point for unlimited containers, making them 75% cheaper. We continue making improvements to our service, and today we are pleased to announce that unlimited containers now have an entry point that is 60% cheaper than before. Instead of provisioning 2,500 RU/sec as a minimum, you can now provision an unlimited collection at 1,000 RU/sec and scale in increments of 100 RU/sec. Unlimited containers (collections) enable you to dynamically scale your provisioning from as low as 1,000 RU/sec to millions of RU/sec with no limit on storage consumption (see the sketch after this list).
• Provisioning massive amounts of throughput is now completely frictionless. Azure Cosmos DB’s design enables developers to elastically scale throughput across multiple geographical regions while maintaining the SLAs. The system is designed to scale throughput across regions and ensures that changes to the provisioned throughput are instantaneous. To empower our users and their mission-critical workloads on Azure Cosmos DB, we now allow them to provision up to 1,000,000 RU/sec per container without raising a support request. This enables customers to elastically scale throughput based on application traffic patterns across different geographical regions to support workloads that fluctuate both by geography and time. The system manages the partitions transparently without compromising the availability, consistency, latency, or throughput of an Azure Cosmos DB container.
      • Azure Cosmos DB is now available as a part of Microsoft Azure in France. Azure Cosmos DB is a foundational service in Azure powering mission-critical applications, services and customer workloads around the world. We are happy to announce that Azure Cosmos DB is a part of the preview of Microsoft Azure in France, which is now open to all customers, partners and ISVs worldwide giving them the opportunity to deploy services and test workloads against Azure Cosmos DB in these latest Azure regions. The new Azure Regions in France are part of our global portfolio of 42 regions in Azure. Azure Cosmos DB in the new regions will offer the same enterprise-grade reliability and performance with the industry-leading comprehensive SLAs to support the mission-critical applications and workloads of businesses and organizations in France. To sign up for Azure France preview, please visit here.
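To make the new entry point concrete, here is a minimal sketch using the azure-cosmos Python SDK. The account endpoint, key, database and container names ("telemetry", "events"), and the "/deviceId" partition key are placeholders for illustration, and the SDK surface shown may differ from the client libraries available at the time of this announcement.

from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint and key -- replace with your own account values.
client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")

# Create (or get) a database and an unlimited, partitioned container
# provisioned at the new 1,000 RU/sec entry point.
database = client.create_database_if_not_exists(id="telemetry")
container = database.create_container_if_not_exists(
    id="events",
    partition_key=PartitionKey(path="/deviceId"),
    offer_throughput=1000)

# Scale the same container up later (in 100 RU/sec increments) as traffic grows,
# without changing the data or the partitioning.
container.replace_throughput(20000)

Because throughput is decoupled from storage, the same container can later be scaled toward the 1,000,000 RU/sec self-serve limit described above without re-provisioning.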

      Azure Cosmos DB addresses fundamental data needs of all modern apps

      Azure Cosmos DB is the database of the future - it is what we believe is the next big thing in the world of massively scalable databases! It makes your data available close to where your users are, worldwide. It is a globally distributed, multi-model database service for building planet scale apps with ease using the API and data model of your choice. You can Try Azure Cosmos DB for free today, no sign up or credit card required. If you need any help or have questions or feedback, please reach out to us on the developer forums on Stack Overflow. Stay up-to-date on the latest Azure Cosmos DB news and features by following us on Twitter #CosmosDB, @AzureCosmosDB.

      -    Your friends at Azure Cosmos DB

Xbox – Analytics on petabytes of gaming data with Azure HDInsight


This blog post was co-authored by Karan Gulati, Senior Software Engineer, Xbox, and Daniel Hagen, Senior Software Engineer, Xbox.

Microsoft Studios produces some of the world’s most popular game titles, including the Halo, Minecraft, and Forza Motorsport series. The Xbox product services team manages thousands of datasets and hundreds of active pipelines, consuming hundreds of gigabytes of data each hour for first-party studios. Game developers need to know the health of their game by measuring acquisition, retention, player progression, and general usage over time. This presents a textbook big data problem where data needs to be cleaned, formatted, aggregated, and reported on, better known as ETL (Extract, Transform, Load).

      HDInsight - Fully managed, full spectrum open source analytics service for enterprises

      Azure HDInsight is a fully-managed cloud service for customers to do analytics at a massive scale using the most popular open-source frameworks such as Hadoop, MapReduce, Hive, LLAP, Presto, Spark, Kafka, and R. HDInsight enables a broad range of customer scenarios such as batch & ETL, data warehousing, machine learning, IoT and streaming over massive volumes of data at a high scale using Open Source Frameworks.

      Key HDInsight benefits

• Cloud native: The only service in the industry to provide an end-to-end SLA on your production workloads. Cloud-optimized clusters for Hadoop, Spark, Hive, Interactive Query, HBase, Storm, Kafka, and Microsoft R Server, backed by a 99.9% SLA.

      • Low cost: Cost-effectively scale workloads up or down through decoupled compute and storage. You pay for only what you use. Spark and Interactive Query users can use SSD memory for interactive performance without additional SSD cost.

• Secure: Protect your data assets by using virtual networks, encryption, Active Directory authentication, user and group authorization, and role-based access control policies for all your enterprise data. HDInsight meets many compliance standards such as HIPAA, PCI, and more.

• Global: Available in more than 25 regions globally. HDInsight is also available in the Azure Government cloud and China, which allows you to meet your needs in key geographical areas.

      • Productive: Rich productivity tools for Hadoop and Spark such as Visual Studio, Eclipse, and IntelliJ for Scala, Python, R, Java, and .NET support. Data scientists can also use the two most popular notebooks, Jupyter and Zeppelin. HDInsight is also the only managed-cloud Hadoop solution with integration to Microsoft R Server.

      • Extensible: Seamless integration with leading certified big data applications via an integrated marketplace which provides a one-click deploy experience.

      The big data problem

To handle this wide range of uses and varying scale of data, Xbox has harnessed the versatility and power of Azure HDInsight. As raw, heterogeneous JSON data lands in Azure Blob Storage, Hive jobs transform that raw data into more performant and indexed formats such as ORC (Optimized Row Columnar). Studio users can then add additional Hive, Spark, or Azure ML jobs to the pipeline to clean, filter, and aggregate further.
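The Xbox pipeline performs this conversion with Hive jobs; as an illustration of the same step, here is a minimal PySpark sketch of reading raw JSON from Blob Storage and rewriting it as ORC. The wasbs:// paths, container names, and storage account are placeholders, not the actual Xbox locations.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-to-orc").getOrCreate()

# Read the raw, heterogeneous JSON telemetry that landed in Blob Storage.
raw = spark.read.json("wasbs://raw@<storageaccount>.blob.core.windows.net/telemetry/2017/12/")

# Rewrite it as ORC so downstream Hive/Spark queries are faster and columnar-indexed.
raw.write.mode("overwrite").orc("wasbs://curated@<storageaccount>.blob.core.windows.net/telemetry-orc/2017/12/")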

      Scalable HDInsight architecture with decoupled compute and underlying storage

Depending on the launch style of a game, Xbox telemetry systems can see huge spikes in data at launch. Beyond an increase in users, the type of analysis and query needed to answer different business questions can vary drastically from game to game and throughout the lifecycle, shifting the compute needed. Xbox uses the ease of creating HDInsight clusters via the Azure APIs to scale and create new clusters as analytic needs and data volumes fluctuate, while maintaining SLAs.

Needing a system that scales up and out while offering a variety of isolation levels, Xbox chose to utilize an array of Azure Storage accounts and a shared Hive metastore. Utilizing an external Azure SQL Database as the Hive metastore allows the creation of many clusters while sharing the same metadata, enabling a seamless query experience across dozens of clusters. Utilizing many Azure Storage accounts, attached with SAS keys to control permissions, allows for a greater degree of consistency and security at the cluster level. Employing this cluster-of-clusters method greatly increases the scale-out ability. In this cluster-of-clusters configuration, Xbox separated processing (ETL) clusters from read-only ad-hoc clusters. Users are able to test queries and read data from these read-only clusters without affecting other users or processing, eliminating noisy-neighbor situations. Users can control the scale and how they utilize their read-only cluster while sharing the same underlying data and metadata.
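As a rough sketch of how a read-only ad-hoc cluster might attach one of those storage accounts with a SAS key, the Spark session below sets the WASB SAS property for a single container. The account, container, token, and the titleId column are placeholders for illustration only.

from pyspark.sql import SparkSession

# The spark.hadoop.* prefix propagates the setting to the underlying Hadoop
# configuration, so the WASB driver authenticates with the read-only SAS token.
spark = (SparkSession.builder
         .appName("adhoc-readonly")
         .config("spark.hadoop.fs.azure.sas.curated.<storageaccount>.blob.core.windows.net",
                 "<read-only-sas-token>")
         .getOrCreate())

# Query the shared, curated ORC data without any write permission on the account.
df = spark.read.orc("wasbs://curated@<storageaccount>.blob.core.windows.net/telemetry-orc/")
df.groupBy("titleId").count().show()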

       


      Shared data and metastore across different analytical engines in Azure HDInsight

We adopted the following best practices when setting up an external metastore with HDInsight for high performance and agility:

      • Use an external metastore. This helped us separate compute and metadata.

• Ensure that a metastore created for one HDInsight cluster version is not shared across different HDInsight cluster versions. This is because different Hive versions use different schemas; for example, Hive 1.2 and Hive 2.1 clusters cannot share the same metastore.

• Back up the custom metastore periodically for OOPS (accidental deletion) recovery and DR needs.

• Keep the metastore, storage accounts, and the HDInsight cluster in the same region.

      Data flow

Xbox devices generate telemetry data, which is consumed by Event Hubs and processed further in the HDInsight clusters by thousands of different Azure Data Factory activities, finally making the data available to users for further insights. The figure below shows the Xbox telemetry data journey.

Figure: Xbox telemetry data flow architecture

      Load balancing of Jobs

We use multiple clusters in our architecture to process thousands of jobs. We built our own custom logic to distribute jobs among a number of different clusters, which helped us optimize job completion times. We typically have long-running jobs and interactive, high-priority jobs that need to finish quickly, and we use the YARN Capacity Scheduler to load-balance cluster capacity between them. Typically, we set the high-priority queue at ~80-90% of the cluster with a maximum capacity of 100%, and a low-priority queue at ~10-20% with a maximum capacity of 100%. With this distribution, long-running jobs can take the maximum cluster capacity until a high-priority interactive job shows up. Once that happens, the high-priority job can take 80-90% of cluster capacity and finish faster.
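The actual distribution logic Xbox built is internal, but as an illustration of the idea, the sketch below routes the next job to the cluster with the fewest pending YARN applications, using the ResourceManager REST API. The cluster endpoints and credentials are placeholders.

import requests

# Placeholder ResourceManager metrics endpoints for two hypothetical clusters.
CLUSTERS = {
    "etl-cluster-1": "https://etl-cluster-1.example.net/ws/v1/cluster/metrics",
    "etl-cluster-2": "https://etl-cluster-2.example.net/ws/v1/cluster/metrics",
}

def pending_apps(url):
    # The YARN ResourceManager REST API reports cluster-wide metrics,
    # including the number of applications waiting to be scheduled.
    response = requests.get(url, auth=("admin", "<password>"), timeout=10)
    return response.json()["clusterMetrics"]["appsPending"]

def pick_cluster():
    # Send the next job to the least-loaded cluster.
    return min(CLUSTERS, key=lambda name: pending_apps(CLUSTERS[name]))

print(pick_cluster())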

      Summary

The Xbox telemetry processing pipeline, which is based on Azure HDInsight, can be applied by any enterprise trying to solve big data processing at massive scale.

For more information or questions, please reach out to AskHDInsight@Microsoft.com.

      ASA Police Data Challenge student visualization contest winners


      The winners of the American Statistical Association Police Data Challenge have been announced. The ASA teamed up with the Police Data Initiative, which provides open data from local law enforcement agencies in the US, to create a competition for high school and college students to analyze crime data from Baltimore, Seattle and Cincinnati. In this post, we take a look at the winners of the visualization category.

The winners of Best Visualization among college students were Julia Nguyen, Katherine Qian, Youbeen Shim, and Catherine Sun from the University of Virginia. Their entry included several visualizations of crime data in Baltimore, including the crime density map shown below. The team used R for all of the data cleaning, manipulation, and visualizations. The tidyverse suite of packages was used for data pipelining (including stringr and lubridate for merging data on latitude/longitude and date), and the ggmap package for visualization.

Figure: Crime density map of Baltimore

The winners of Best Visualization among high school students were Alex Lapuente, Ana Kenefick, and Sara Kenefick from Charlotte Latin School (Charlotte, N.C.). They used Microsoft Excel to look at overall trends in Seattle crime data, the impact of employment and poverty on crime, and the frequency of traffic-related incidents, shown below (note the "pedestrian violation" segment; I can attest from experience that jaywalking is strictly enforced in Seattle!):

Figure: Frequency of traffic-related incidents in Seattle

      For more on the Police Data Challenge and the winners in the Overall and Best Use of External Data categories, follow the link below.

      This is Statistics: Police Data Challenge: Congratulations to our Winners!
