Getting Started with dtSearch

Repost | Tutorial | Editor: 龚雪 | 2014-12-12 11:25:43

Overview: This article is an introductory tutorial on dtSearch, covering what search is, installing and first using dtSearch, and the API.

Search is an important part of desktop and web-based applications but it can be made to seem more difficult than it needs to be. We take a look at how to implement search via an object-oriented API using dtSearch and C#.

You may have noticed that I Programmer's search facility is, to put it bluntly, not very good. It has been on our fix list for a long time but so far no one has had the courage to decide on the technology to use to replace it.

I also have a great interest in desktop search - or rather in how it generally doesn't work under Windows. Since Vista, Windows desktop search has been difficult to use, difficult to configure and difficult to manage. I've tried alternatives such as Windows Search 4.0 and Solr but there are problems with both. They tend to be overcomplex and simply not worth the effort. Now I'm investigating dtSearch and I can tell you now, it's a refreshing return to simplicity.

But see if you agree as I explain how easy it is to get started with it as a system component and as an API.

What is search all about?

This is a difficult question because there are many answers depending on your exact circumstances. Put simply, search is about finding documents based on their content. The dumb way to do this is to search the entire collection of documents each time. The more intelligent way of doing the job is to scan all of the documents once and build an index of the words that they contain. The index is typically much smaller than the collection of documents and much faster to search.
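
To make the idea of an index concrete, here is a minimal sketch in C#. It is not how dtSearch builds or stores its index; the names InvertedIndexSketch, Build and Lookup are made up for illustration, and the word splitting is deliberately naive. It only shows the principle: scan the documents once, then answer queries from the word map without touching the documents again.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only: this is NOT dtSearch's index format or API.
public static class InvertedIndexSketch
{
    // Scan every document once and map each lower-cased word
    // to the set of document names that contain it.
    public static Dictionary<string, HashSet<string>> Build(IDictionary<string, string> documents)
    {
        var index = new Dictionary<string, HashSet<string>>();
        char[] separators = { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

        foreach (KeyValuePair<string, string> doc in documents)
        {
            string[] words = doc.Value.ToLowerInvariant()
                .Split(separators, StringSplitOptions.RemoveEmptyEntries);

            foreach (string word in words)
            {
                HashSet<string> names;
                if (!index.TryGetValue(word, out names))
                {
                    names = new HashSet<string>();
                    index[word] = names;
                }
                names.Add(doc.Key);
            }
        }
        return index;
    }

    // Answer a single-word query from the index alone.
    public static ICollection<string> Lookup(Dictionary<string, HashSet<string>> index, string word)
    {
        HashSet<string> names;
        return index.TryGetValue(word.ToLowerInvariant(), out names)
            ? (ICollection<string>)names
            : new string[0];
    }
}
```

Calling Build on two documents, "a.txt" containing "Hello world" and "b.txt" containing "Goodbye world", gives an index in which Lookup for "world" returns both names and Lookup for "hello" returns only "a.txt". A real engine adds word positions, stemming, compression and much more on top of this.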

In most cases you only have to scan the entire collection of files once and then add any new files to the index. The big problem is that most tools make creating an index a difficult task. Not so with dtSearch. It makes it seem easy and direct and you can see exactly what is going on.

You could simply use the dtSearch console to find documents but in most cases it is preferable to build it into an app of your own - and this is where the API comes in. But before looking at the basic structure of the API and building our first "hello world" applications let's take a quick look at getting started with dtSearch.

Installation and first use

All you have to do is download the 30-day evaluation of dtSearch With Spider and start the installer. Once it has finished, run dtSearch Desktop. This is your command centre for everything from creating an index to searching an index.

[Screenshot: dtSearch Desktop]

Your first task is to create an index. There are a number of different options you can select that make the index more useful, but for the moment you can opt for the defaults. All you have to do is give the index a name (test in the following examples) and specify where the index is stored. You can accept the default location for the moment, but the fact that dtSearch doesn't try to hide where the index is stored by using a fixed internal path is welcome. When you come to use it for real you can specify a path for the index that is, say, included in your regular backup. Notice that you can use a network share for the index location but this will run slower than a local file.

Once you click OK the index is created and your next step is to specify the files that will be indexed. A single index can be used to index multiple locations and multiple location types. You can add additional locations to an index at any time by using the Update Index command. Simply add the data you want to index to the "What to Index" list. You can index a folder, a file, a website, or an Outlook store. You can customize the files that are included in the index using filename extensions - but for this example just accept the defaults. For the example I simply indexed my local documents directory - not a big collection of files but enough to show the principles of operation.

[Screenshot: Update Index dialog]

After a few minutes the index should be complete - you can stop it or pause it any time you want to. The next task is to query it. It is a good idea to become familiar with the query mechanisms before you move on to programming because what you can do using the API is very similar. Simply select the search option and type in your search target. You can type a single word, a phrase or a conditional expression. For example, "Hello or World" finds documents with "Hello" or "World".
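
If you think of each word in the index as having a set of documents attached to it, the difference between "and" and "or" comes down to set intersection versus set union. The sketch below illustrates only that idea; BooleanQuerySketch and its two methods are hypothetical names, and dtSearch's own query handling is far richer than this.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only: NOT part of dtSearch.
public static class BooleanQuerySketch
{
    // "a and b": documents that appear in both sets.
    public static HashSet<string> And(IEnumerable<string> left, IEnumerable<string> right)
    {
        var result = new HashSet<string>(left);
        result.IntersectWith(right);
        return result;
    }

    // "a or b": documents that appear in either set.
    public static HashSet<string> Or(IEnumerable<string> left, IEnumerable<string> right)
    {
        var result = new HashSet<string>(left);
        result.UnionWith(right);
        return result;
    }
}
```

If "Hello" occurs in a.txt and c.txt and "World" occurs in b.txt and c.txt, then And yields just c.txt while Or yields all three files.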

[Screenshot: search results]

The search results give you a list of document locations and various search statistics - hits, score etc. In the lower pane you can also see extracts from the documents where the hits occurred.

There are a great many search options that you can set and the best way to find out about them is to look up what they do and then construct searches using them.

That's about all you need to know about the basic use of dtSearch - set up an index, search it, view documents. It couldn't be simpler. From the programmer's point of view it is made all the more simple because the details of operation aren't hidden from view in some complex admin structure. In this case you see the paths involved and can see the search parameters and results.

There are more details of the way the index is created and searched that you can control but this is enough to move on and start looking at the API.

So before moving on make sure you have an index called test in a known location and that you have tried it out with a search term or two.

The API

You can use dtSearch from any .NET language, Java or C++. In this case I'm going to use the .NET API and C# 4.0, but the ideas are more or less the same in any language because the same classes are provided to do the same job. You could say that only the language changes; the classes are largely unaltered.

Start a new C# Windows forms project and add a reference to:

dtSearchNetApi4.dll

which you will generally find in

C:\Program Files\dtSearch Developer\bin

or

C:\Program Files (x86)\dtSearch Developer\bin

(there are versions for earlier .NET assemblies but in most cases version 4 is what you should be using).

[Screenshot: Add Reference dialog]

Also add:

using dtSearch.Engine;

to save having to type out fully qualified names.

To get started, all you really need to know is that the key class that does most of the work related to search is SearchJob. Whenever you are trying to get to grips with a new API, finding the class (or small number of classes) where it all starts is usually the way to get on top of it fast. In this case, once you know that SearchJob is what you need to set up a search of an index, it is all remarkably easy.

Place a button on the form and in its click event handler we first create an instance of SearchJob:

SearchJob SJob1 = new SearchJob();

Before we can perform the search we need to set up some details. First we need to specify the location of the index:

SJob1.IndexesToSearch.Add(@"C:\Users\ian\AppData\Local\dtSearch\test");

Of course you have to replace the string with the full path to the index you are using. Notice that you can specify multiple indexes because the property is a collection.

Next we need to specify what we are searching for. This can be done in two ways: using the Request property to specify search terms, or using the BooleanConditions property to specify a logical expression involving search terms. For example:

SJob1.BooleanConditions = "Hello and World";

will search for documents containing "Hello" and "World" in the index and

SJob1.BooleanConditions = "Hello or World";

will search for documents containing "Hello" or "World" in the index. Following this there are a range of optional parameters you can set. For example:

SJob1.MaxFilesToRetrieve = 10;

You can set all of the more sophisticated search options at this point - filters, stemming, fuzzy search, exclusions etc.

Now we are all ready to perform the search. You can do it as a blocking call or you can use an event to work asynchronously. The simplest option is to use a blocking call:

SJob1.Execute();

but note that once you call this method your entire application is frozen until the search is complete or an error occurs. This isn't too bad with a small index but of course it quickly becomes unacceptable. As well as using an event to process the data asynchronously you could also use a worker thread to run the search - again not difficult but not specific to using dtSearch.
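
As a rough illustration of the worker-thread idea, the sketch below pushes a blocking call onto a thread-pool thread using the Task class that ships with .NET 4. Nothing in it is dtSearch specific: SlowSearch is a hypothetical stand-in for whatever blocking work you need to do (such as a call to Execute), and the value it returns is invented.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch only: SlowSearch stands in for a real blocking search.
public static class BackgroundSearchSketch
{
    // Hypothetical blocking operation; pretend it runs a search
    // and returns a hit count.
    public static int SlowSearch()
    {
        Thread.Sleep(200);   // simulate a search that takes a while
        return 42;           // invented hit count
    }

    // Start the blocking work on a thread-pool thread and return at once,
    // so the calling (UI) thread is free to keep processing messages.
    public static Task<int> RunInBackground(Func<int> blockingWork)
    {
        return Task.Factory.StartNew(blockingWork);
    }
}
```

In a Windows Forms click handler you would typically call RunInBackground and then attach a continuation with ContinueWith, passing TaskScheduler.FromCurrentSynchronizationContext() so that the code which updates the form runs back on the UI thread. Reading the task's Result property directly from the UI thread would block it again and defeat the purpose.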

When the call to Execute completes, the SearchJob instance has properties that return the results of the search. For example, the HitCount property gives an integer that holds the number of hits the search returned:

MessageBox.Show( SJob1.HitCount.ToString());

More importantly, SearchJob returns a SearchResults object via its Results property. This provides a collection of documents that the search found. To work with this collection you have to use the GetNthDoc method to make the nth document the current document and then you can use various properties to return its details. For example:

SearchResults results = SJob1.Results;
for (int i = 0; i < results.Count; ++i)
{
    results.GetNthDoc(i);                 // make document i the current document
    listBox1.Items.Add(results.DocName);  // DocName now refers to document i
}

This simply adds the document names to a ListBox placed on the form.

Yes it really is this easy.

Of course I've left out the usual error handling to make it easier to follow but this isn't difficult to add - there is an Error property that you can test. It also doesn't take into account that the results object could be very large indeed. In this case garbage collection might be a problem so you should use the "using" construct to ensure that the results object is disposed of when you are finished with it. Again not difficult.
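
For anyone who hasn't met it, this is roughly what the "using" construct buys you. The sketch uses a made-up ResultsStandIn class rather than the real SearchResults, so that it can run without dtSearch installed; the only assumption carried over is that the real results object is disposable, as described above.

```csharp
using System;

// Hypothetical stand-in for a large, disposable results object.
public sealed class ResultsStandIn : IDisposable
{
    public bool IsDisposed { get; private set; }
    public int Count { get { return 3; } }   // invented document count

    public void Dispose() { IsDisposed = true; }
}

public static class UsingSketch
{
    // Returns the stand-in after the using block has ended so that
    // the caller can check it really was disposed.
    public static ResultsStandIn Consume()
    {
        ResultsStandIn results = new ResultsStandIn();
        using (results)
        {
            // Work with the results only inside this block.
            Console.WriteLine("Documents found: " + results.Count);
        }   // Dispose() is called here, even if an exception was thrown above.
        return results;
    }
}
```

The point is that disposal happens at a known place, the closing brace, rather than whenever the garbage collector gets around to it.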

What next?

The next step in most uses of an index search is converting the results to something more suitable. See the FileConverter class for an easy way to convert to HTML, RTF or text. You can also export the results as XML. If you also want to control the construction and maintenance of the index itself then you need to look up the IndexJob object, which is very similar to the SearchJob object.

Building an application around dtSearch is more a matter of what you do with the search results and in many cases how you allow the search to be specified by the user.

Then there are many other features that we haven't even mentioned - CDsearch, Websearch and setting up the web Spider to name just three, but these are other stories.


Original article: http://www.i-programmer.info/programming/database/2701-getting-started-with-dtsearch.html


Reposted from: 慧都控件网
