posted 20 Jul 2004 in Volume 1 Issue 2
Seek and you shall find?
Search and retrieval software may be the linchpin of an effective enterprise, but it is only as useful as the content it trawls, writes Claudine Beaumont. If information, data and documents are poorly organised or incorrectly archived, it won’t matter how powerful the search engine is – you won’t be able to find what you’re looking for.
“Search tools are ineffective ‘find’ tools because their focus is on retrieving a specific reference and not on discovery,” concluded one white paper [1] by the Delphi Group. With enterprise search and retrieval a critical component of most organisations’ information-management plans, both vendors and businesses would do well to remember that while retrieving data is useful, making sense of it is what really brings improvements in business efficiency.
The marketplace for search and retrieval technology is buoyant, with analysts IDC predicting that worldwide revenues will grow from US$540m in 2001 to over US$2.6bn in 2005, despite the general economic downturn [2]. But strong spending in this area means the search and retrieval market is becoming increasingly congested, with most vendors finding it hard to develop solutions that really stand out from the crowd. And the already fuzzy lines between pure enterprise and browser-based search and retrieval offerings are expected to blur further when Google, the undisputed king of internet search, brings its considerable expertise to bear in a corporate environment.
Google Search Appliance uses the familiar internet interface to search across intranets, extranets and the public websites of businesses. It allows business users to personalise search results by geography or topic of interest, provides redundancy against disk drive failures, and crawls content on the fly to ensure new content appears immediately in search results. Companies such as Boeing, Cisco and Xerox already use Google Search Appliance.
Google’s next offering – codenamed Puffin – is scheduled for release sometime in the next eighteen months and threatens to encroach further on the traditional search and retrieval space; it will provide desktop file, text and document search using the familiar browser interface. It’s a pre-emptive move aimed at nullifying expected improvements to Microsoft’s in-built search functionality in its next version of Windows, due for release in 2006. It all goes to show that the internet search players now want a piece of the lucrative enterprise-search pie.
What many organisations fail to appreciate is that search and retrieval is not a solution in itself, but rather a front-end interface built on complex back-end functionality.
The bedrock of successful enterprise search and retrieval is the metadata and tagging terms applied to documents and information. By applying a standardised vocabulary and synonym set to documentation, organisations can eliminate common data entry errors and maximise the accuracy of search results by ensuring only the most relevant information is retrieved when users perform searches with these key words.
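As a rough illustration of how a standardised vocabulary and synonym set might be applied at tagging time, the minimal Python sketch below folds variant terms into a single preferred term. The vocabulary entries and the function name normalise_tags are invented for this example and do not come from any vendor's product.

```python
# Hypothetical controlled vocabulary: each preferred term maps to the
# variants and synonyms that should be folded into it.
CONTROLLED_VOCABULARY = {
    "invoice": {"invoice", "invoices", "bill", "billing statement"},
    "contract": {"contract", "contracts", "agreement"},
}

# Invert the vocabulary into a lookup from any variant to its preferred term.
VARIANT_TO_PREFERRED = {
    variant: preferred
    for preferred, variants in CONTROLLED_VOCABULARY.items()
    for variant in variants
}


def normalise_tags(raw_tags):
    """Return the sorted, de-duplicated preferred terms for a list of raw tags.

    Tags that are not in the vocabulary are dropped rather than guessed at.
    """
    preferred = set()
    for tag in raw_tags:
        key = tag.strip().lower()
        if key in VARIANT_TO_PREFERRED:
            preferred.add(VARIANT_TO_PREFERRED[key])
    return sorted(preferred)


if __name__ == "__main__":
    print(normalise_tags([" Bill", "Invoices", "agreement", "memo"]))
```

Because every variant resolves to one preferred term, a search for "invoice" would also reach documents originally tagged "bill". Dropping unknown tags is one possible policy; an organisation might equally choose to queue them for review.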
Central to the tagging process is the development of a taxonomy that binds terms together into a coherent, hierarchical structure. Many of the leading vendors in the enterprise search and retrieval space, such as Autonomy, Convera, Inxight and Verity, provide software that will create flexible, scalable taxonomies to automate classification and manage the taxonomy throughout its lifecycle, incorporating new documents and terms as they are generated.
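To make the idea of a hierarchical taxonomy concrete, here is a small Python sketch in which a search on a broad term is expanded to include every narrower term beneath it. The terms and the function names are hypothetical, and commercial taxonomy tools will be considerably more sophisticated than this.

```python
# Hypothetical taxonomy: each term maps to its direct narrower terms.
TAXONOMY = {
    "finance": ["invoice", "budget"],
    "invoice": ["credit note"],
}


def descendants(term, taxonomy):
    """Return every narrower term below `term`, sorted alphabetically."""
    found = set()
    stack = list(taxonomy.get(term, []))
    while stack:
        child = stack.pop()
        if child in found:
            continue  # guard against terms reachable by more than one path
        found.add(child)
        stack.extend(taxonomy.get(child, []))
    return sorted(found)


def expand_query(term, taxonomy):
    """Return the search term followed by all of its narrower terms."""
    return [term] + descendants(term, taxonomy)


if __name__ == "__main__":
    print(expand_query("finance", TAXONOMY))
```

Storing only parent-to-child links keeps the structure easy to extend: adding a new term means adding one entry, and existing queries pick it up automatically.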
These tools crawl existing documents to build the list of basic taxonomy terms. While concept-extraction technology is fairly accurate, organisations usually still need to use ‘training documents’ to teach the taxonomy to correctly identify metadata terms. Training documents provide a best-fit for taxonomy terms, and prevent the classification system from becoming too broad in its scope.
One of the most important factors in the development of search and retrieval software has been the emergence of web services, particularly XML tagging, as a standardised framework for document tagging. XML tags give structure to text-based documents, blurring the line between ‘structured’ and ‘unstructured’ data. Indeed, the market for XML lifecycle solutions is expected to grow from US$1.8bn in 2003 to more than US$11.6bn in 2008 [3]. While the use of web services does not circumvent the need for taxonomy, it does provide an extra layer of structure that makes the tagging process easier. It can be the unifying quality that creates order and consistency across disparate and seemingly incompatible systems.
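The way XML tags lend structure to an otherwise free-text document can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree module. The element names (document, title, author, keyword, body) are an invented schema for illustration only, not a published standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged document: the element names are assumptions.
SAMPLE = """<document>
  <title>Q2 supplier contract</title>
  <author>J. Smith</author>
  <keywords>
    <keyword>contract</keyword>
    <keyword>supplier</keyword>
  </keywords>
  <body>Free text of the document goes here.</body>
</document>"""


def extract_metadata(xml_text):
    """Pull the structured fields out of a tagged document for indexing."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title", default=""),
        "author": root.findtext("author", default=""),
        "keywords": [k.text.strip() for k in root.iter("keyword") if k.text],
    }


if __name__ == "__main__":
    print(extract_metadata(SAMPLE))
```

The body remains unstructured text, but the title, author and keywords can now be indexed as discrete fields, which is the extra layer of structure the tagging process benefits from.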
Many organisations are struggling to fit search and retrieval solutions around existing knowledge silos and legacy systems, where decades of unmonitored, collated data can cause all manner of problems. While a lot of search and retrieval software can comb these systems for useful information, tag it appropriately and draw it into search results, it does not overcome the fact that most of this information will be out of date and useless. Companies should spend some time sorting out this data before drawing it into their retrieval software, rather than leaving it to users to filter the extraneous information at the desktop.
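One simple way to sort out legacy data before it reaches the search index is to screen documents by age. The Python sketch below assumes each document record carries a last_modified date; the field name, the function name and the one-year threshold are all illustrative choices, and in practice age is only one of several tests of whether information is still useful.

```python
from datetime import date, timedelta


def select_current(documents, today, max_age_days):
    """Keep only documents modified within the last `max_age_days` days.

    Each document is a dict with a `last_modified` date (an assumed field).
    """
    cutoff = today - timedelta(days=max_age_days)
    return [doc for doc in documents if doc["last_modified"] >= cutoff]


if __name__ == "__main__":
    legacy_store = [
        {"name": "pricing-2004.doc", "last_modified": date(2004, 6, 1)},
        {"name": "pricing-1998.doc", "last_modified": date(1998, 1, 1)},
    ]
    for doc in select_current(legacy_store, date(2004, 7, 20), 365):
        print(doc["name"])
```

Filtering at ingestion means the decision is made once, centrally, rather than by every user each time a stale document turns up in their results.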
Search tools should have a quantifiable impact on the organisation, improving efficiency by ensuring business-critical information is more readily available; there should be a tangible ROI from a company’s investment in search-and-retrieval software. Ensuring information is ‘clean’ before it is aggregated into the search and retrieval system means users will not waste their time establishing the worth of the data – they can take it as a given.
References
1. ‘Connecting to your knowledge nuggets’, published by the Delphi Group, 2001.
2. Worldwide search and retrieval software forecast 2003-2007, IDC, May 2003.
3. Schmelzer, R., XML in the content lifecycle: Foundation report. ZapThink, January 2003.