Microsoft Indexing Service How-To

24 Apr 2012, CPOL
This article describes how to provide full text search using Microsoft Indexing Service in .NET applications.

Introduction

Many websites provide search: you type a few words, press a "Search" button, and receive a list of pages that contain those words. It's simple to use, but how do you implement it in your own web application? You need an indexing service that indexes your files or web pages; once the index exists, you can run full text searches against it.

There are many products that provide this functionality. One of them is Microsoft Indexing Service, which is part of Windows 2000 and later Windows versions. So, if you build only Windows solutions (ASP.NET web applications, Windows Forms applications, etc.), this Microsoft product is worth a look.

One of the biggest advantages of Indexing Service is that it's free: you can use it without restrictions or additional licenses. I think this matters a lot, because other indexing products can cost a lot of money. If you are developing a small or medium-sized application, you don't want to pay thousands of dollars for a full text search tool.

If you choose Indexing Service, remember that it can only index the file system. For example, you can't use it to index files stored in your database. This is a significant drawback of Microsoft Indexing Service, but I believe you can easily work around it.

In this article, I'll try to describe how to install, configure, and use the Microsoft Indexing Service. We'll develop a simple application which will allow us to use full text search features for web pages located on our local file system.

Installing and configuring the Microsoft Indexing Service

If you are using Windows XP or later, you'll be using Microsoft Indexing Service 3.0; if you're still using Windows 2000, you'll be using version 2.0. The service is installed by default, but its installation can be disabled during operating system setup, so you should check that it is present on your machine. To do this, open "Add or Remove Programs" in Control Panel and choose "Add/Remove Windows Components". Make sure that "Indexing Service" is checked; if it isn't, install it.

Microsoft Indexing Service Installation

Once Microsoft Indexing Service is installed, you can configure it. Open the "Computer Management" tool and choose "Services and Applications", then "Indexing Service". From this entry, you can manage your Microsoft Indexing Service.

First, create a new catalog in Indexing Service; the catalog's location is the folder which will contain the index files. Open the context menu for "Indexing Service" and choose "Catalog" in the "New" submenu. Type a name, choose a location, and press "OK".

New Catalog Creation

After that, add the folders which will be indexed. Choose the "Directories" entry, open its context menu, and choose "Directory" from the "New" submenu. In the dialog box that opens, choose the folder with your documents and press "OK" to include the selected directory in the index. If you want to exclude a folder from the existing index instead, choose "No" for the "Include in Index?" parameter in this dialog box. This parameter is "Yes" by default.

New Directory Creation

If Indexing Service is already running, it will index the new catalog. Otherwise, start Indexing Service and it will index the catalog automatically. You can also build or rebuild the index for a folder manually: open the context menu for the folder in the existing catalog and choose "Rescan (Full)" or "Rescan (Incremental)" in the "All Tasks" submenu. Of course, Indexing Service has to be running at that time.

If you select the "Indexing Service" entry in "Computer Management", you will see the state of the Indexing Service. This information can help when you have a large store and a file you expect to find doesn't show up in the search results.

There is another important setting for Indexing Service: "Indexing Service Usage". This setting tells Indexing Service how often it should update the indexes. For example, if your application only uses static storage, the service need not update the index as often as it would for dynamic storage, where the data changes frequently. To configure this parameter, open the context menu for the "Indexing Service" entry and choose "Tune Performance" in the "All Tasks" submenu.

Indexing Service Usage Configuration

Now, you can check the index. To do this, choose "Query the Catalog" in your catalog. You'll see a form which allows you to search your index. Start with a simple full text search: enter something in the query field and press the "Search" button, and you will see the files which contain the entered words. You can also execute more complex queries with this tool by choosing "Advanced query". Advanced queries use the Microsoft Indexing Service query language, which is similar to SQL but contains some syntax extensions.

Query Microsoft Indexing Service

You can use SQL to query Microsoft Indexing Service. But, there are several extensions for Indexing Service's SQL dialect which you have to know about.

The most useful command when you work with Microsoft Indexing Service is SELECT. This makes sense, because you don't add, delete, or update information in your indexes yourself; you use SELECT to retrieve information about indexed files. Let's see an example query:

SELECT Path FROM SCOPE() WHERE FREETEXT(Contents, 'Hello World')

This query returns the paths of all files which contain the "Hello World" text. I'll use it to describe Microsoft Indexing Service's SQL extensions.

First of all, let's look at the FROM expression. In this example, we query all the data which the index contains. The SCOPE() function tells Indexing Service which data to examine. Without parameters, it examines all the data in your index. The function can also speed up your queries, because it can limit the part of the index that is searched. For example, with SCOPE('"/books"') you query only the "/books" folder, not all the folders in your index, so the query runs faster than it would with a plain SCOPE(). To narrow the search further, you can specify a traversal type. With SCOPE('DEEP TRAVERSAL OF "/books"'), Indexing Service searches the "/books" directory and all the directories beneath it. With SCOPE('SHALLOW TRAVERSAL OF "/books"'), it examines only the "/books" directory itself.
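To make the SCOPE() variants above concrete, here is a minimal sketch of a helper that builds the FROM argument for a given folder. The ScopeClause class and its For method are hypothetical names introduced only for this illustration, and the sketch assumes the path itself contains no quote characters.

```csharp
using System;

static class ScopeClause
{
    // Builds a SCOPE(...) expression for one folder (hypothetical helper).
    // deep = true  -> the folder and every directory beneath it
    // deep = false -> only the folder itself
    public static string For(string path, bool deep)
    {
        string traversal = deep ? "DEEP" : "SHALLOW";
        return string.Format("SCOPE('{0} TRAVERSAL OF \"{1}\"')", traversal, path);
    }
}

static class Program
{
    static void Main()
    {
        string from = ScopeClause.For("/books", false);
        // Prints: SELECT Path FROM SCOPE('SHALLOW TRAVERSAL OF "/books"') WHERE FREETEXT(Contents, 'Hello World')
        Console.WriteLine("SELECT Path FROM " + from +
                          " WHERE FREETEXT(Contents, 'Hello World')");
    }
}
```

Keeping the scope in one place like this makes it easy to switch a search between a single folder and a whole subtree without touching the rest of the query text.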

The WHERE expression works as it does in SQL, but there are a few extensions for it too. The comparison predicates are listed in this table:

Operator                  Symbol     Example
Equals                    =          WHERE DocAuthor = 'John Doe'
Not equals                != or <>   WHERE DocTitle != 'Finance'
Less than                 <          WHERE WordCount < 1000
Greater than              >          WHERE WordCount > 500
Less than or equal to     <=         WHERE WordCount <= 500
Greater than or equal to  >=         WHERE WordCount >= 500

You also can use Boolean operators which are evaluated using the following rules:

  • NOT is evaluated before AND. NOT can only occur after AND (as in AND NOT; the combination OR NOT is not allowed).
  • AND is evaluated before OR.
  • AND expressions are associative, and can be applied in any order. For example, A AND B AND C is the same as (A AND B) AND C, which is the same as A AND (B AND C).
  • OR expressions are associative, and can be applied in any order.

There is a LIKE predicate too. But, there are several predicates which extend the SQL language:

  • ARRAY. This predicate compares two arrays using logical operators. For example, ... WHERE username = SOME ARRAY ['Admin', 'root']. This example returns files whose username property is 'Admin' or 'root'.
  • CONTAINS. This predicate is used for full text search. For example, ... WHERE CONTAINS(country, '"USA" OR "Russia"'). This example returns files whose country property contains "USA" or "Russia".
  • FREETEXT. This predicate finds words and phrases in indexed files. It's the better choice when you need to find something in the contents of your files. For example, ... WHERE FREETEXT(Contents, 'Hello World !!!').
  • MATCHES. This predicate performs queries using a regular-expression pattern. It's more powerful than the LIKE predicate. For example, ... WHERE MATCHES (Contents, '|(USA|)|{1|}'). This example matches any string in which exactly one instance of the pattern "USA" occurs.

For additional information, see the Indexing Service articles on the MSDN website.

Now you know how to prepare queries for the Microsoft Indexing Service, but you still need a list of the properties which can be used in your queries. Each index has many default properties, which you can find in the following table.

Friendly Name          Data type                   Description
A_HRef                 DBTYPE_WSTR | DBTYPE_BYREF  Text of HTML HREF. This property name was created for the Microsoft® Site Server, and corresponds with the Indexing Service property name HtmlHRef. Can be queried, but not retrieved.
Access                 VT_FILETIME                 Last time a file was accessed.
All                    (not applicable)            Searches every property for a string. Can be queried, but not retrieved.
AllocSize              DBTYPE_I8                   Size of disk allocation for a file.
Attrib                 DBTYPE_UI4                  File attributes. Documented in the Win32 SDK.
ClassId                DBTYPE_GUID                 Class ID of an object, for example, WordPerfect, Word, and so on.
Characterization       DBTYPE_WSTR | DBTYPE_BYREF  Characterization, or abstract, of a document. Computed by Indexing Service.
Contents               (not applicable)            Main contents of the file. Can be queried, but not retrieved.
Create                 VT_FILETIME                 The time the file was created.
Directory              DBTYPE_WSTR | DBTYPE_BYREF  The physical path to the file, not including the file name.
DocAppName             DBTYPE_WSTR | DBTYPE_BYREF  Name of the application that created the file.
DocAuthor              DBTYPE_WSTR | DBTYPE_BYREF  Author of the document.
DocByteCount           DBTYPE_I4                   Number of bytes in a document.
DocCategory            DBTYPE_STR | DBTYPE_BYREF   Type of a document such as a memo, schedule, or whitepaper.
DocCharCount           DBTYPE_I4                   Number of characters in a document.
DocComments            DBTYPE_WSTR | DBTYPE_BYREF  Comments about the document.
DocCompany             DBTYPE_STR | DBTYPE_BYREF   Name of the company for which the document was written.
DocCreatedTm           VT_FILETIME                 The time the document was created.
DocEditTime            VT_FILETIME                 Total time spent editing the document.
DocHiddenCount         DBTYPE_I4                   Number of hidden slides in a Microsoft® PowerPoint document.
DocKeywords            DBTYPE_WSTR | DBTYPE_BYREF  Document keywords.
DocLastAuthor          DBTYPE_WSTR | DBTYPE_BYREF  Most recent user who edited the document.
DocLastPrinted         VT_FILETIME                 The time the document was last printed.
DocLastSavedTm         VT_FILETIME                 The time the document was last saved.
DocLineCount           DBTYPE_I4                   Number of lines contained in a document.
DocManager             DBTYPE_STR | DBTYPE_BYREF   Name of the manager of the document's author.
DocNoteCount           DBTYPE_I4                   Number of pages with notes in a PowerPoint document.
DocPageCount           DBTYPE_I4                   Number of pages in a document.
DocParaCount           DBTYPE_I4                   Number of paragraphs in a document.
DocPartTitles          DBTYPE_STR | DBTYPE_VECTOR  Names of document parts. For example, in Excel, part titles are the names of spreadsheets; in PowerPoint, slide titles; and in Word for Windows, the names of the documents in the master document.
DocPresentationTarget  DBTYPE_STR | DBTYPE_BYREF   Target format (35mm, printer, video, and so on) for a presentation in PowerPoint.
DocRevNumber           DBTYPE_WSTR | DBTYPE_BYREF  Current version number of the document.
DocSlideCount          DBTYPE_I4                   Number of slides in a PowerPoint document.
DocSubject             DBTYPE_WSTR | DBTYPE_BYREF  Subject of the document.
DocTemplate            DBTYPE_WSTR | DBTYPE_BYREF  Name of template for a document.
DocTitle               DBTYPE_WSTR | DBTYPE_BYREF  Title of the document.
DocWordCount           DBTYPE_I4                   Number of words in the document.
FileIndex              DBTYPE_I8                   Unique ID of the file.
FileName               DBTYPE_WSTR | DBTYPE_BYREF  Name of the file.
HitCount               DBTYPE_I4                   Number of hits (words matching a query) in the file.
HtmlHRef               DBTYPE_WSTR | DBTYPE_BYREF  Text of HTML HREF. Can be queried, but not retrieved.
HtmlHeading1           DBTYPE_WSTR | DBTYPE_BYREF  Text of HTML document in style H1. Can be queried, but not retrieved.
HtmlHeading2           DBTYPE_WSTR | DBTYPE_BYREF  Text of HTML document in style H2. Can be queried, but not retrieved.
HtmlHeading3           DBTYPE_WSTR | DBTYPE_BYREF  Text of HTML document in style H3. Can be queried, but not retrieved.
HtmlHeading4           DBTYPE_WSTR | DBTYPE_BYREF  Text of HTML document in style H4. Can be queried, but not retrieved.
HtmlHeading5           DBTYPE_WSTR | DBTYPE_BYREF  Text of HTML document in style H5. Can be queried, but not retrieved.
HtmlHeading6           DBTYPE_WSTR | DBTYPE_BYREF  Text of HTML document in style H6. Can be queried, but not retrieved.
Img_Alt                DBTYPE_WSTR | DBTYPE_BYREF  Alternate text for <IMG> tags. Can be queried, but not retrieved.
Path                   DBTYPE_WSTR | DBTYPE_BYREF  Full physical path to a file, including file name.
Rank                   DBTYPE_I4                   Rank of row. Ranges from 0 to 1000. Larger numbers indicate better matches.
RankVector             DBTYPE_I4 | DBTYPE_VECTOR   Ranks of individual components of a vector query.
ShortFileName          DBTYPE_WSTR | DBTYPE_BYREF  Short (8.3) file name.
Size                   DBTYPE_I8                   Size of file, in bytes.
USN                    DBTYPE_I8                   Update Sequence Number. NTFS drives only.
VPath                  DBTYPE_WSTR | DBTYPE_BYREF  Full virtual path to a file, including the file name. If more than one possible path, then the best match for the specific query is chosen.
WorkId                 DBTYPE_I4                   Internal ID for a file. Used within Indexing Service.
Write                  VT_FILETIME                 Last time the file was written.
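
The predicates and properties above can be combined in a single query. The following is a minimal sketch that joins a comparison predicate on DocAuthor with a FREETEXT search and sorts the results by the Rank property. The SampleQuery class and its ByAuthorAndText method are hypothetical names used only for illustration; the sketch assumes the dialect accepts a standard ORDER BY clause, and I have not verified the resulting query against a live catalog.

```csharp
using System;

static class SampleQuery
{
    // Composes a query from pieces described in the article:
    // a comparison predicate, FREETEXT, and the Rank property.
    // The arguments are assumed to contain no single quotes.
    public static string ByAuthorAndText(string author, string text)
    {
        return string.Format(
            "SELECT Path, Rank FROM SCOPE() " +
            "WHERE DocAuthor = '{0}' AND FREETEXT(Contents, '{1}') " +
            "ORDER BY Rank DESC",
            author, text);
    }
}

static class Program
{
    static void Main()
    {
        // Larger Rank values indicate better matches, so the best hits come first.
        Console.WriteLine(SampleQuery.ByAuthorAndText("John Doe", "Hello World"));
    }
}
```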

As you can see, there are a lot of indexed properties for each file, but sometimes, you want to extend this list.

How to add new properties for an indexed file

First of all, this feature works only for web pages, because it is based on the HTML <meta> tag.

Let's say, you have several indexed web pages and you want to add several special properties for them. For example, if you want to add "country" and "city" properties, you should add <meta> tags to all files which will contain these new properties:

HTML
<meta name="country" content="Russia" />
<meta name="city" content="Moscow" />

After these changes, you have to restart Indexing Service. Now, you can open the entry "Properties" and see that Microsoft Indexing Service knows about your special parameters for files. But still, you can't use these new parameters in your queries.

Select the "Properties" node of your catalog and choose the property which you added to the files using the <meta> tag. Double click on the property, switch on the "Cached" checkbox, and choose the data type for the new property from the opened dialog box.

Microsoft Indexing Service Installation

After that, you should create a Column Definition File which contains information about your newly added properties. The file could have an ".idq" extension, but the extension isn't important. A Column Definition File uses the following format:

[Names]
Propertyname( Data type ) = GUID ["Name" | Property ID]

The data type parameter is optional. If you don't define it, Microsoft Indexing Service will take the data type from the property definition in your catalog.

For my example, it contains this:

[Names]
country = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 "country"
city = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 "city"

All of these values can be taken from the property's configuration dialog box.

After the Column Definition File is created, information about this file has to be added to the Indexing Service Registry settings. Add a string entry named "DefaultColumnFile" to the Registry key "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndexCommon". "DefaultColumnFile" should contain the full path to your Column Definition File.
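
If you prefer to script this Registry change instead of using the Registry editor, it could look like the sketch below. It uses the standard Microsoft.Win32.Registry.SetValue method; the ColumnFileRegistration class name and the file path passed in Main are hypothetical and only serve as an illustration. The code runs on Windows only and needs administrative rights, because it writes under HKEY_LOCAL_MACHINE.

```csharp
using Microsoft.Win32;

static class ColumnFileRegistration
{
    // Registry key and value name as described in the article.
    const string KeyPath =
        @"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndexCommon";
    const string ValueName = "DefaultColumnFile";

    // Stores the full path of the Column Definition File as a string value.
    public static void Register(string columnFilePath)
    {
        Registry.SetValue(KeyPath, ValueName, columnFilePath, RegistryValueKind.String);
    }

    static void Main()
    {
        // Hypothetical location; replace it with the path to your own file.
        Register(@"C:\IndexingService\columns.txt");
    }
}
```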

Restart Microsoft Indexing Service. After that, run a full rescan of your indexed folder. Now, you will be able to use the new parameters in your queries.

Using Microsoft Indexing Service in WinForms applications

Microsoft Indexing Service exposes itself to the developer as an OLE DB provider. Its name is MSIDXS. You can use ADO.NET for querying your Indexing Service. To do this, you have to create a new System.Data.OleDb.OleDbConnection object using this sample connection string:

Provider= "MSIDXS";Data Source="Documents"

In the Data Source parameter, you should use the name of your catalog in Indexing Service.

Let's write sample code which queries Indexing Service for a few words from the file contents. The sample uses a queryString variable, an instance of the SearchParameters structure, which holds the data source (the catalog name) and the query text. Here is the definition of this structure:

C#
struct SearchParameters
{
    private string storage;

    public string Storage
    {
        get { return storage; }
        set { storage = value; }
    }

    private string query;

    public string Query
    {
        get { return query; }
        set { query = value; }
    }
}

First of all, you create a new OleDbConnection object:

C#
string connectionString = 
  string.Format("Provider= \"MSIDXS\";Data Source=\"{0}\";", 
  queryString.Storage);
OleDbConnection connection = new OleDbConnection(connectionString);

After that, you have to create a new OleDbCommand associated with this connection:

C#
string query = string.Format(@"SELECT Path FROM scope() " + 
               @"WHERE FREETEXT(Contents, '{0}')", queryString.Query);
OleDbCommand command = new OleDbCommand(query, connection);

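Note that the MSIDXS provider doesn't support commands with parameters, so the search text has to be formatted directly into the query string. This is unfortunate, because a single quote in the user's input will break the query unless you escape it first. I hope that Microsoft will fix this issue in the next version of the Microsoft Indexing Service.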
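
Since the search text ends up inside a quoted literal, it is worth escaping it before formatting the query. The sketch below doubles every single quote, which is the usual SQL convention for string literals; I'm assuming here that the Indexing Service dialect follows it, so please check the MSDN documentation before relying on it. QueryText and EscapeLiteral are hypothetical names introduced for this example.

```csharp
using System;

static class QueryText
{
    // Doubles single quotes so the text can sit inside a '...' literal.
    // Assumption: the Indexing Service SQL dialect uses the standard
    // SQL escaping rule for quotes in string literals.
    public static string EscapeLiteral(string text)
    {
        if (text == null) throw new ArgumentNullException("text");
        return text.Replace("'", "''");
    }
}

static class Program
{
    static void Main()
    {
        string userInput = "O'Reilly";
        string query = string.Format(
            "SELECT Path FROM SCOPE() WHERE FREETEXT(Contents, '{0}')",
            QueryText.EscapeLiteral(userInput));
        // Prints: SELECT Path FROM SCOPE() WHERE FREETEXT(Contents, 'O''Reilly')
        Console.WriteLine(query);
    }
}
```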

You are now able to execute this command and retrieve a list of files which contain the selected text:

C#
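ArrayList result = new ArrayList();

// The using blocks release the connection, command, and reader
// even if the query throws an exception.
using (connection)
using (command)
{
    connection.Open();

    using (OleDbDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // Column 0 is the Path property selected by the query.
            result.Add(reader.GetString(0));
        }
    }
}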

In this code, checking the returned value for NULL is not necessary, because Indexing Service always returns a path to a found file.

Summary

Microsoft Indexing Service is a free and powerful product which is included with Windows 2000 and later versions. It's simple to use: you can easily create indexes and query them through an OLE DB data provider, which makes it really easy to work with from Microsoft .NET. In this article, I have tried to describe how to install, configure, and query the Microsoft Indexing Service. I also recommend that you look at the example I have attached to this article, which shows how to use the full text search features. I hope that this article will help you to start using Indexing Service effectively.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

