Feature
posted 24 Nov 2005 in Volume 2 Issue 6
Improving search in a cost-conscious environment
A workshop exploring various search-improvement techniques that will not break the bank.
By Matthew Gibson
If money were no object then we would all employ full-time information architects. We would have dedicated teams of people maintaining and enforcing our taxonomies. And we would have technical staff spending all day refining and tuning our in-house search technology.
Unfortunately we do not all have these things. In an industry such as retail, success is usually measured in the simple terms of sales and costs. In this environment it can be difficult to make the case for the lengthy exercise of introducing a taxonomy. It can be difficult to define the cash benefits of improved search. So what do you do if you find yourself managing search in a cost-conscious environment?
Central cost versus distributed cost
Expensive solutions frequently involve up-scale, centralised resources – teams of people operating expensive hardware and software. Think of this as the ‘super-computer’ model.
So what is the opposite of the super-computer model? It is the ‘grid computing’ model. In terms of enterprise search, it means asking end users to contribute a small amount of expertise when they interact with our information. The aim is to raise the effectiveness of a low-cost solution to something approaching that of the expensive alternative.
Flavours of metadata
Relying on search algorithms to do the hard work for you is always an option. Much has been invested in the search algorithms that are available in today’s enterprise-search tools. There are many search products that provide solid out-of-the-box performance. However, if we are to maximise the effectiveness of our search solution then we must consider the use of metadata.
Metadata can be used to reinforce the context of a content item. This is frequently achieved by employing the ubiquitous ‘keywords’ and ‘description’ meta-tags. This type of metadata is primarily intended to interact with technology – the search engine reads it to help clarify what a content item is about. It is typically invisible to the end user and is aligned to the super-computer model.
In the grid model, the emphasis is on interacting with the end user. We need metadata that talks directly to the user and can be used to make search more powerful.
Keep it simple
In an environment where we expect end users to interact with metadata it is important to keep it simple. Content editors need an easy way to apply metadata and end users need familiar terminology to help them understand what the metadata is telling them about a content item.
Who, who and what
In a business environment it is reasonable to expect that the end users know a little bit about what they are searching for. There are three dimensions possessed by each piece of content that an end user should be able to relate to:
1. Who – who generated the content?
2. Who – who is the content intended for?
3. What – what type of content is it?
It is not unreasonable to expect a content creator to add these three pieces of metadata to a content item. The overhead of doing this can be kept to a minimum by limiting the range of values that each metadata element can have.
Who generated the content?
It is highly likely that details of who generated the content can be captured automatically when it is created. If you use a content-management system then try to derive this information from the security groups that are active in your system. Ideally you will be able to capture this as a department or team name, rather than an individual’s name.
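As a rough illustration of deriving the publisher from security groups, the sketch below assumes a hypothetical mapping from CMS group names to department names. Real group names, and the way a particular content-management system exposes a user’s groups, will differ.

```python
# Hypothetical mapping from CMS security groups to department names.
GROUP_TO_DEPARTMENT = {
    "cms-editors-hs": "Health & safety",
    "cms-editors-hr": "Human resources",
    "cms-editors-trading": "Trading",
}

def publishing_department(author_groups, default="Unassigned"):
    """Return the department for the first recognised group the author belongs to."""
    for group in author_groups:
        department = GROUP_TO_DEPARTMENT.get(group)
        if department is not None:
            return department
    # No recognised group: fall back to a placeholder that an editor can correct.
    return default

print(publishing_department(["all-staff", "cms-editors-hs"]))  # Health & safety
```

Capturing a department rather than a person means the value stays meaningful when individual authors move on.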
Who is the content intended for?
The content creator should know their intended audience. The content might be useful to everyone, or intended only for a particular division of the business. Five options that might describe an entire business are listed below:
1. Administration centres;
2. Call centres;
3. Distribution centres;
4. Global;
5. Stores.
What type of content is it?
Even large amounts of content can generally be broken into a relatively small number of category types. The following eight descriptions might be sufficient to describe an entire intranet:
1. General interest;
2. Listing;
3. Meeting notes;
4. Organisational information;
5. Policy;
6. Procedure;
7. Trading information;
8. Training material.
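Limiting each element to a fixed list of values is easy to enforce in software. A minimal sketch, assuming the audience and content-type lists above are adopted as they stand; the class and function names are illustrative only:

```python
from enum import Enum

class Audience(Enum):
    ADMINISTRATION_CENTRES = "Administration centres"
    CALL_CENTRES = "Call centres"
    DISTRIBUTION_CENTRES = "Distribution centres"
    GLOBAL = "Global"
    STORES = "Stores"

class ContentType(Enum):
    GENERAL_INTEREST = "General interest"
    LISTING = "Listing"
    MEETING_NOTES = "Meeting notes"
    ORGANISATIONAL_INFORMATION = "Organisational information"
    POLICY = "Policy"
    PROCEDURE = "Procedure"
    TRADING_INFORMATION = "Trading information"
    TRAINING_MATERIAL = "Training material"

def validate(audience: str, content_type: str) -> tuple[Audience, ContentType]:
    """Look up both values; Enum raises ValueError for anything off the agreed lists."""
    return Audience(audience), ContentType(content_type)

print(validate("Call centres", "Procedure"))
```

In an editing form the same lists would naturally be offered as drop-downs, so editors pick a value rather than type one.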
Selecting a search tool
Search technology has matured to the point where good products can be found at affordable prices. When reviewing the numerous products at the lower levels of this market it is important to remember that cost is not the only consideration. There are three primary considerations when selecting a search tool:
1. Licence costs (of course);
2. Infrastructure costs;
3. Features.
When looking at the cost of a search tool it is worth noting the charging structure (see Figure 1).
Figure 1: Examples of search engine charging structures and what they mean:
- Product A is sold for £3,000. There is no limit on the number of documents that can be indexed;
- Product B is sold for £3,000. This will only allow you to index up to 50,000 documents. A licence extension is required to index more than 50,000 documents;
- Product C is sold for £3,000. This will only allow you to index up to 50,000 documents. Licence extensions are available in blocks of 50,000 documents.
Product A is a fixed-price product. The absence of a document limit may appear to be good value, but a concern is that this product may not be able to handle large numbers of documents.
Product B is a mid-range product capable of indexing large numbers of documents, but should we be concerned about what happens when indexing many more than 50,000 documents?
Product C is an enterprise-level product. If the charging structure extends into indexing millions of documents then we should be comfortable concluding that the product is capable of running at high volumes.
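To compare the three structures for a given index size, a small cost model helps. The £3,000 price and 50,000-document limit come from Figure 1; the extension and block prices are not stated, so they are parameters here, and Product B’s extension is assumed to be a single flat fee. Treat this as a sketch for plugging in real quotes, not as actual pricing.

```python
BASE_PRICE = 3_000   # £, all three products in Figure 1
BASE_LIMIT = 50_000  # documents covered by the base licence for B and C

def product_a_cost(documents: int) -> int:
    # Fixed price with no document limit.
    return BASE_PRICE

def product_b_cost(documents: int, extension_price: int) -> int:
    # Assumption: one extension, at a price to be quoted, once the limit is exceeded.
    return BASE_PRICE + (extension_price if documents > BASE_LIMIT else 0)

def product_c_cost(documents: int, block_price: int) -> int:
    # Extra capacity bought in whole blocks of 50,000 documents (block price assumed).
    extra = max(0, documents - BASE_LIMIT)
    blocks = -(-extra // BASE_LIMIT)  # integer ceiling division
    return BASE_PRICE + blocks * block_price

print(product_c_cost(1_000_000, block_price=1_000))  # 22000
```

With a hypothetical £1,000 block price, a million-document index needs 19 extra blocks; the point is to see how each structure scales, not the absolute figures.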
Infrastructure cost
Any search system that we select must be capable of running on the hardware and operating systems that are supported by our business. If a search tool requires a backend database then which databases can be used? What are the costs of running these databases in our business?
Feature considerations
All search tools provide a way of indexing and retrieving content. We are entitled to expect more than these basic features, even if the underlying search algorithm is the best in the world.
Auto-indexing and refreshing
It is important that a search tool is capable of indexing and re-indexing content. Ideally this will be fully automated and controlled by a user-defined schedule. End users will not be satisfied if their search tool links to dead content or if new content is not searchable.
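Behind an automated re-index there is a comparison between what exists now and what the index last saw. A minimal sketch, assuming both can be listed as URL-to-timestamp maps; the scheduling itself (the search tool’s own scheduler, cron or similar) is not shown:

```python
from datetime import datetime

def plan_reindex(live: dict[str, datetime], indexed: dict[str, datetime]):
    """Compare live content with the index.

    live:    URL -> last-modified time of content that currently exists
    indexed: URL -> time the search index last processed that URL
    """
    new = sorted(url for url in live if url not in indexed)
    changed = sorted(url for url in live if url in indexed and live[url] > indexed[url])
    dead = sorted(url for url in indexed if url not in live)  # would become dead links
    return new, changed, dead

live = {"/a.htm": datetime(2005, 11, 1), "/b.htm": datetime(2005, 11, 20), "/c.htm": datetime(2005, 11, 22)}
indexed = {"/a.htm": datetime(2005, 11, 10), "/b.htm": datetime(2005, 11, 10), "/old.htm": datetime(2005, 10, 1)}
print(plan_reindex(live, indexed))  # (['/c.htm'], ['/b.htm'], ['/old.htm'])
```

New and changed items are fetched and indexed; dead items are removed so that results never point at content that has gone.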
Security
A search tool might be used to index both public content and protected content. It is important that any search tool respects whatever security model is in place. It is clearly not acceptable to publish private documents to the general population through cached or summarised search results.
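One common way to respect a security model is to filter results against the searcher’s groups before anything, including a cached summary, is rendered. A minimal sketch, assuming each indexed document records which groups may see it (an empty set meaning public); the field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Result:
    url: str
    title: str
    allowed_groups: frozenset = frozenset()  # empty set means public content

def trim_results(results: list[Result], user_groups: set[str]) -> list[Result]:
    """Remove results the searcher may not see, before any title or summary is shown."""
    return [r for r in results if not r.allowed_groups or r.allowed_groups & user_groups]

results = [
    Result("/news/canteen.htm", "Canteen menu"),
    Result("/hr/pay-review.doc", "Pay review", frozenset({"hr"})),
]
print([r.url for r in trim_results(results, {"stores"})])  # ['/news/canteen.htm']
```

Filtering after retrieval is simple but can leave short result pages; some tools apply the same check inside the query instead.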
Filters
If a search tool is licensed according to the number of documents that it indexes then the ability to filter entries is crucial to managing the size of the search index and thus the licensing cost. There may be content areas that do not need to be searched. There may be content areas that include their own internal search mechanism. The search index can be optimised by filtering out unnecessary content.
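Filtering often comes down to a list of excluded content areas. A sketch, assuming areas can be identified by URL prefix; the two prefixes are hypothetical examples of an unsearched archive and a directory with its own search:

```python
# Hypothetical content areas to keep out of the index.
EXCLUDED_PREFIXES = ("/archive/", "/directory/")

def should_index(path: str) -> bool:
    # str.startswith accepts a tuple and matches any of its prefixes.
    return not path.startswith(EXCLUDED_PREFIXES)

def index_size(paths: list[str]) -> int:
    """Number of documents that would count towards a per-document licence."""
    return sum(1 for path in paths if should_index(path))

paths = ["/policies/security.htm", "/archive/1999/memo.htm", "/directory/smith.htm", "/news/today.htm"]
print(index_size(paths))  # 2
```

Running a count like this against a full content listing gives an early estimate of which licence band you actually need.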
Reporting
In order to understand the end user’s experience, it is important to have some form of reporting. What are people searching for? How many searches are being performed?
Information in search reporting can be used to re-shape entire information repositories. At the very least it can be used to provide proof that the search tool is being used and that the investment is not being wasted.
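Even without a reporting module, a search log answers the two basic questions. A minimal sketch, assuming the log can be exported as (query, number of results) pairs; that format is hypothetical. Queries that find nothing are worth listing separately, since they show where content or thesaurus entries are missing.

```python
from collections import Counter

def summarise(log):
    """Summarise (query, number_of_results) pairs from a search log.

    Returns total searches, the ten most common queries and the ten most
    common queries that found nothing.
    """
    queries, zero_hits = Counter(), Counter()
    for query, result_count in log:
        term = " ".join(query.lower().split())  # treat 'Holiday ' and 'holiday' alike
        queries[term] += 1
        if result_count == 0:
            zero_hits[term] += 1
    return sum(queries.values()), queries.most_common(10), zero_hits.most_common(10)

log = [("Holiday", 0), ("holiday ", 0), ("the draft", 12)]
total, top, nothing_found = summarise(log)
print(total, top, nothing_found)  # 3 [('holiday', 2), ('the draft', 1)] [('holiday', 2)]
```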
Metadata
If metadata is being used to describe content, then the search tool must be able to read it and use it to build search queries.
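Reading metadata means extracting it from each page before it can be used as a filter. The sketch below uses Python’s standard `html.parser` to collect `<meta>` name/content pairs and then applies simple equality filters. The ‘coverage’ tag and the filter format are assumptions for illustration; a commercial search tool does this internally.

```python
from html.parser import HTMLParser

class MetaReader(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from one page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content

def read_meta(page_html: str) -> dict:
    reader = MetaReader()
    reader.feed(page_html)
    reader.close()
    return reader.meta

def matches(meta: dict, filters: dict) -> bool:
    """True if every filter (element name -> required value) matches the page's metadata."""
    return all(meta.get(name) == value for name, value in filters.items())

page = '<html><head><meta name="coverage" content="Call Centres" /></head></html>'
meta = read_meta(page)
print(meta)                                         # {'coverage': 'Call Centres'}
print(matches(meta, {"coverage": "Call Centres"}))  # True
```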
Top results
‘Top results’ is the mechanism by which certain documents can be made to appear at the top of search results lists, regardless of their relevance. This is done by linking particular search words or phrases to specific documents.
This mechanism can be particularly useful if generic words are used to describe business functions. For example, imagine that your business has an internal publication called ‘The Draft’. How successful do you imagine a normal search would be if your search words were ‘the draft’? By using a top-results feature you can manage this type of situation and ensure that your end users get the results that they are expecting.
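The top-results idea reduces to a lookup table consulted before the ranked list is shown. A minimal sketch; the table, the URL for ‘The Draft’ and the function names are hypothetical:

```python
# Hypothetical 'top results' table: normalised search phrase -> documents to pin.
TOP_RESULTS = {
    "the draft": ["/publications/the-draft/latest.htm"],
}

def normalise(query: str) -> str:
    """Lower-case and collapse whitespace so 'The  Draft' matches 'the draft'."""
    return " ".join(query.lower().split())

def with_top_results(query: str, ranked: list[str]) -> list[str]:
    pinned = TOP_RESULTS.get(normalise(query), [])
    # Pinned documents come first regardless of relevance; drop them from the
    # ranked list so they are not shown twice.
    return pinned + [url for url in ranked if url not in pinned]

print(with_top_results("The Draft", ["/policies/drafting.htm", "/publications/the-draft/latest.htm"]))
# ['/publications/the-draft/latest.htm', '/policies/drafting.htm']
```

Matching on the whole normalised phrase keeps the pinning narrow: a search for ‘draft’ alone is left to normal ranking.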
Thesaurus
A thesaurus in a search tool operates as you would expect. It links a search word or phrase with a series of synonyms to extend the search deeper and to allow for discrepancies in the language used by the content creator and the end user. A thesaurus goes some way towards fulfilling the role that might otherwise be performed by the keywords meta-tag.
Being able to manage the entries in a search-engine thesaurus opens up some interesting possibilities. For example, imagine that through reporting you discover that some end users are misusing your search tool to search for illicit content. In a business environment it is unlikely that any such material exists, so the search tool will display no results. This situation is acceptable, but now imagine that you add entries to your thesaurus so that explicit search words are linked to phrases like ‘security policy’ and ‘acceptable usage policy’. Your end users will now be shown search results that remind them of relevant company policies, warning them to change their search behaviour and, ultimately, to stop wasting company time.
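Thesaurus expansion amounts to replacing one search phrase with the phrase plus its linked entries. A minimal sketch with a hypothetical thesaurus; the same mechanism would link problem terms to policy phrases as described above:

```python
# Hypothetical thesaurus: search phrase -> extra phrases to search for as well.
THESAURUS = {
    "holiday": ["annual leave", "vacation"],
    "sick pay": ["absence policy"],
}

def expand(query: str) -> list[str]:
    """Return the original phrase followed by any linked phrases, without duplicates."""
    term = " ".join(query.lower().split())
    expanded = [term]
    for synonym in THESAURUS.get(term, []):
        if synonym not in expanded:
            expanded.append(synonym)
    return expanded

# A search engine would typically OR the phrases together.
print(" OR ".join(f'"{t}"' for t in expand("Holiday")))
```

This only expands whole phrases; a real search tool may also expand individual words within a longer query.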
S.U.E.
We now have a metadata strategy and a search tool. In order to tie the two together there are three interfaces that must be developed. Each needs to be handled with sensitivity.
System
The search tool needs to be configured to read the metadata being used. We have a metadata element that describes who our content is intended for. At a technical level we might decide to call this ‘coverage’ and to store it in the familiar meta-tag format:
<meta name="coverage" content="Call Centres" />
User
If we choose to display this metadata to our end users, then ‘coverage’ may not be a natural label. From an end user’s perspective, ‘audience’ might be a better word to describe this metadata.
Audience: Call Centres
Editor
Creating content is a different exercise from searching and reading content. To be sensitive to the editor’s role, consider a further change to the terminology: for a content creator we might label this same element ‘operational relevance’.
Operational Relevance: Call Centres
Referencing the same metadata in different ways may appear to be a purely cosmetic distinction, but these small differences affect how intuitive an interface is. Interfaces are where end users spend much of their working day, so small improvements to ease of use, multiplied across every user, can deliver significant savings across the business as a whole.
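The System/User/Editor split amounts to storing a value once and choosing its label by role. A minimal sketch; only the ‘coverage’ element and its three labels come from the examples above, while the dictionary layout and function name are hypothetical:

```python
# Hypothetical registry: one stored metadata element, three role-specific labels.
LABELS = {
    "coverage": {
        "system": "coverage",               # name in the meta-tag, read by the search tool
        "user": "Audience",                 # label shown on search results
        "editor": "Operational Relevance",  # label shown in the editing screen
    },
}

def display(element: str, role: str, value: str) -> str:
    """Render a stored metadata value with the label appropriate to the role."""
    return f"{LABELS[element][role]}: {value}"

print(display("coverage", "user", "Call Centres"))    # Audience: Call Centres
print(display("coverage", "editor", "Call Centres"))  # Operational Relevance: Call Centres
```

Keeping the labels in one place means a label can be changed for one audience without touching the stored metadata or the search configuration.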
Enforcing metadata standards
Relying on individual content editors to apply metadata to their content can eliminate the need for a central resource. The risk is that content editors may not be as dedicated to metadata as an information professional. Metadata may not be applied consistently and the default metadata options, whatever they may be, might prove to be conspicuously prevalent across your content.
In the absence of ‘metadata police’ it is difficult to eliminate this risk. In our environment we again look for the simple, efficient ways to approach this issue. In this case, the most effective tools we have are peer pressure and public shaming.
Display the metadata
Customise your search results page so that the metadata is displayed alongside the usual document information (title, description, link). This must be done sensitively, so that the metadata is easily understood and does not appear as abstract text. For example, our three metadata tags could be displayed as follows:
Published by: Health & safety
Audience: Call centres
Content type: Procedure
Displayed as part of the search results, this metadata adds real value to the end-user experience. It enables additional, visual filtering of results in a way that is meaningful to the end user. In addition this approach also raises the profile of metadata. It makes it easy to review what metadata is being applied to content and it can be used to highlight any areas where metadata has been applied loosely. Content editors have nowhere to hide when metadata is displayed in search results. This is a ‘name and shame’ mechanism that helps to enforce standards.
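The same metadata that is displayed can also be counted, which turns ‘name and shame’ into a simple report. A sketch, assuming documents are available as dictionaries; the keys `publisher`, `audience` and `type`, the ‘Not set’ text and the choice of ‘Global’ as the default value are all hypothetical:

```python
from collections import Counter

# Hypothetical internal keys mapped to the labels shown in search results.
RESULT_LABELS = {"publisher": "Published by", "audience": "Audience", "type": "Content type"}

def metadata_lines(meta: dict) -> list[str]:
    """Lines to show under a search result; missing values are made visible, not hidden."""
    return [f"{label}: {meta.get(key, 'Not set')}" for key, label in RESULT_LABELS.items()]

def default_share_by_publisher(documents: list[dict], key: str, default: str) -> dict[str, float]:
    """Fraction of each publisher's documents left on the default value for `key`."""
    totals, on_default = Counter(), Counter()
    for doc in documents:
        publisher = doc.get("publisher", "Unknown")
        totals[publisher] += 1
        if doc.get(key, default) == default:
            on_default[publisher] += 1
    return {publisher: on_default[publisher] / totals[publisher] for publisher in totals}

print("\n".join(metadata_lines({"publisher": "Health & safety", "audience": "Call centres", "type": "Procedure"})))
```

A department with most of its content left on the default audience stands out immediately in such a report.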
Encouraging intelligent search
Displaying metadata in search results allows for visual filtering after the search is performed. Clearly, if our end users are able to filter by metadata before the search is performed then the number of search results will be reduced and the overall search experience will be greatly enhanced.
In our example we are able to automatically capture the name of the publishing department as content is created. We could, if we wished, display this metadata on the published content itself, perhaps as part of a footer. Now imagine that we include a link to our search interface in our page footer. Can we use the metadata to pass a filter to the search interface? A link to a search interface that passes a metadata filter might be constructed in the format shown below:
href="/search.asp?Department=
This is a simple piece of coding, but consider what it gives our end users.
With this simple addition to our page footer, every page will display a text link inviting end users to ‘search similar’. On clicking the link they will be taken to our search interface, where they enter their search word and click ‘search’.
This requires no more effort than reaching our search interface by any other means. What the end user gains is automatic filtering: by clicking our ‘search similar’ link they are automatically filtering the results by department. Results will only be displayed if they are relevant to the search word and were published by the same department.
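The footer link could be generated along these lines. This sketch is in Python rather than the ASP the original site appears to use; the `/search.asp` path and `Department` parameter come from the example link earlier, while the function name and link text are illustrative, and it assumes the search page reads `Department` and applies it as a filter. `urlencode` makes department names such as ‘Health & safety’ safe to place in a URL.

```python
from html import escape
from urllib.parse import urlencode

def search_similar_link(department: str) -> str:
    """Footer link that opens the search page pre-filtered to one department."""
    url = "/search.asp?" + urlencode({"Department": department})
    # escape() keeps the URL safe inside an HTML attribute.
    return f'<a href="{escape(url)}">Search similar</a>'

print(search_similar_link("Health & safety"))
# <a href="/search.asp?Department=Health+%26+safety">Search similar</a>
```

Because the department is captured automatically when content is created, the link needs no extra work from editors.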
If a business has 50 departments, and content is spread roughly evenly between them, then this simple enhancement will immediately reduce the number of possible search results by, on average, 98 per cent. If myriad search results are the problem then this simple filtering is a powerful and easy solution.
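The 98 per cent figure follows from simple arithmetic: if documents are spread evenly across N departments, a single-department filter keeps about 1/N of them. A minimal sketch of that calculation, under the even-spread assumption:

```python
def expected_reduction_percent(departments: int) -> float:
    """Average reduction when results are restricted to one of N equally sized departments."""
    # Roughly 1/N of all documents survive the filter, so the reduction is 100 - 100/N.
    return 100 - 100 / departments

print(expected_reduction_percent(50))  # 98.0
```

In practice departments publish unequal amounts, so the saving will be larger for small publishers and smaller for large ones.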
In a cost-conscious environment it is important to get to know your editors, your end users and your search tool. Small changes and simple enhancements can deliver real benefits with minimal investment.
Matthew Gibson has been working in the web industry since 1998. He has operated his own web-design business and more recently worked for a major