Introduction
Azure Media Indexer is a powerful media processor that extracts meaningful metadata from multimedia files. In the last post, we explored the basic usage of the processor and gave a high-level description of its output files, focusing mainly on the XML-based files.
The focus of this post is the usage of the powerful audio index blob (AIB) file created by indexing jobs.
Audio Index Blob
The AIB file is a lattice-structured textual index of a multimedia file.
At a high level, when compared to the scenarios targeted by the TTML/SAMI output files, you can expect:
- Fewer false negatives: Lattices make it possible to find (sub-)phrase and ‘AND’ matches where the individual words have low confidence, because the fact that they are queried together lets us infer that they may still be correct. Word lattices represent alternative recognition candidates that the recognizer also considered but that did not turn out to be the top-scoring candidate.
- Fewer false positives: Lattices also provide a confidence score for each word match. This can be used to suppress low-confidence matches.
- Time stamps: Lattices, unlike text, retain the start times of spoken words, which is useful for navigation.
This and more information on the technical background of the AIB can be found on the Microsoft Research website here.
Currently, the AIB file can only be used in conjunction with the custom Indexer SQL IFilter Add-on.
Setting up your SQL Server Virtual Machine
In order to use the IFilter, you will need a Windows Server machine on which SQL Server (2008 or later) is installed. Luckily, the Azure VM gallery has a SQL Server preset that will work perfectly for this!
First, go to the Azure Management Portal, and select “New” in the bottom left-hand corner of the screen. For this tutorial, we will use the SQL Server 2012 SP1 Web image in the gallery, as shown below:
Note: be sure to remember the login credentials you create for your virtual machine!
Next, remote desktop into the newly-created machine using the Connect button in the portal, and begin configuring the SQL Server instance to handle your AIB files.
Configuring SQL Server
Open up Internet Explorer on your newly provisioned virtual machine and navigate to http://aka.ms/indexersql.
Note: you may need to add the website to your “allow” list as shown below
Once you have downloaded the custom Indexer SQL IFilter Add-on to the virtual machine, extract all of the files from the archive and run the installer (setup32.msi for 32-bit machines, setup64.msi for 64-bit machines). It copies all of the required executables, DLLs, documentation, and sample code to the directory selected for installation.
Next, a new database will need to be created for audio search:
- Open SQL Server Management Studio and log in to a server as a user with full administrative privileges.
- Create the AudioSearch database
- Right-click on the Databases in the Object Explorer and select “New Database…”
- Enter “AudioSearch” in the Database name field.
- Check that full-text indexing is enabled for the database
- Right-click on the AudioSearch database in the Object Explorer and select “Properties”
- Click on the “Files” node in the “Select a page” tree to open the “Files” property page
- Make sure “Use full-text indexing” is checked.
- Create the “Files” table and full-text index
- Open the SQL script from [InstallDir]\code\Setup.sql
- Ensure the correct database is selected in the database dropdown in the top command bar (e.g. if it is master, change it to AudioSearch)
- Execute the script.
The Files table is now created and configured for audio full-text indexing on the AIB column. The full-text index will be updated automatically every time a new record is inserted into the Files table.
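Before inserting any data, it can be worth confirming that SQL Server actually sees the filter and the new full-text index. The following is a minimal sketch, not part of the add-on's own scripts. It assumes the add-on registers its filter under the `.aib` extension (the same value used for the Ext column later in this post) and relies only on the standard catalog views `sys.fulltext_document_types` and `sys.fulltext_indexes`.

```sql
USE AudioSearch;

-- Assumption: the IFilter add-on registers itself for the '.aib' document type.
-- A row here means SQL Server has a full-text filter loaded for that extension.
SELECT document_type, path
FROM sys.fulltext_document_types
WHERE document_type = '.aib';

-- List every table in this database that has a full-text index,
-- along with whether the index is enabled and how changes are tracked.
SELECT OBJECT_NAME(object_id) AS TableName,
       is_enabled,
       change_tracking_state_desc
FROM sys.fulltext_indexes;
```

If the setup worked as described, the second query would be expected to list the Files table, and a change-tracking state of AUTO would be consistent with the index updating on every insert.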
Trying out the Database
Now that the environment is all set up, we can finally try the SQL Server full-text search using the AIB file! Navigate to [InstallDir]\data and you will find sample_video.wmv and sample_video.wmv.aib, the video and AIB file respectively. Running an indexing job on the video file is out of scope for this blog post, and if you don’t remember how to do it, check out the previous blog post here. (Optionally: re-use the video from the last blog post and compare the results!)
In the last step, we created the Files table, and now we can use the sqlcmd command line utility to insert this sample AIB file into the Files table.
sqlcmd -d AudioSearch -Q "insert into files (Title, Description, Duration, FilePath,Ext,AIB) values ('Bill Gates: Windows and the Cloud', 'Interview with Bill Gates on Channel9 about his transition out of Microsoft and his views on cloud computing.', 79, '[InstallDir]\data\sample_video.wmv','.aib',(select * from openrowset(bulk '[InstallDir]\data\sample_video.wmv.aib', SINGLE_BLOB) as X) )"
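A quick way to confirm that the insert went through is to read the row back. This sketch uses only the column names from the insert statement above. `DATALENGTH` is a standard T-SQL function, and the `AibBytes` alias is just an illustrative name.

```sql
USE AudioSearch;

-- Read back the inserted record; AibBytes should be greater than zero
-- if the contents of the .aib file were loaded into the AIB column.
SELECT Title,
       Duration,
       FilePath,
       Ext,
       DATALENGTH(AIB) AS AibBytes
FROM Files;
```

A NULL or zero value in AibBytes would suggest that the bulk load did not pick up the file, for example because the [InstallDir] placeholder was not replaced with the real installation path.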
Now it is time to use SQL Server Management Studio to ensure that the AIB has been properly indexed by SQL Server.
- Open SQL Server Management Studio and log in to a server as a user with full administrative privileges.
- Select “New Query” from the command bar and connect to the same server if prompted.
- Execute the following SQL query:
select filepath from files where contains(AIB,'spoken:cloud')
The results table should show a single line containing the filepath:
[InstallDir]\data\sample_video.wmv
- Now, execute the following query to ensure that 0 results are returned:
select filepath from files where contains(AIB,'spoken:washington')
If you received the expected results, you have successfully indexed your video file using the Indexer SQL Server IFilter!
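If the first query comes back empty instead, one possible cause is that the full-text index has not finished processing the new row yet. The sketch below checks the index status with the standard `OBJECTPROPERTYEX` full-text properties. It assumes the Files table lives in your default schema, and the column aliases are illustrative.

```sql
USE AudioSearch;

-- PopulateStatus: 0 means the index is idle; non-zero values mean a
--                 population or background update is still running.
-- IndexedRows:    number of rows that were indexed successfully.
-- FailedRows:     number of rows the full-text engine could not index.
SELECT
    OBJECTPROPERTYEX(OBJECT_ID('Files'), 'TableFulltextPopulateStatus') AS PopulateStatus,
    OBJECTPROPERTYEX(OBJECT_ID('Files'), 'TableFulltextItemCount')      AS IndexedRows,
    OBJECTPROPERTYEX(OBJECT_ID('Files'), 'TableFulltextFailCount')      AS FailedRows;
```

A non-zero FailedRows value would point to a problem with filtering the AIB data itself rather than with the query.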
Look forward to the next blog post, which covers how to integrate Azure Media Indexer into your development workflow.