Why Enterprise Search Sucks

June 27, 2008 · Posted by Greg Lloyd

Ron Miller of EContent wrote a very good article, AIIM Study Finds Enterprise Search Still Lacking, about an upcoming AIIM report on Findability and disappointed expectations for enterprise search. Ron's title is more polite than some of the words I've heard (and used) to characterize enterprise search. Bluntly - if we all agree that enterprise search sucks, what is to be done?

Ron quotes Dan Keldsen, director of market intelligence at AIIM:

"It’s not that people don’t have search or other tools and techniques to find information. They have too many tools. They have search in their email client, search on the web, the sales force automation software has its own search [and so forth]." The trouble is most organizations don’t have tools to search across everything, he explains. In spite of the fact that federated search has been around for some time, he says, most organizations don’t have it because it’s tricky and expensive to implement.

Therefore it’s not surprising that 82 percent of those surveyed by AIIM agreed or strongly agreed that their experience with the consumer web has "created increased demand for enterprise findability." Whether that’s realistic or not, matters little, says Keldsen, because we have to face the fact that these users are frustrated for whatever reason. "Should we be frustrated that this is what people think and feel, or face it because it’s reality?" he asks.

In a highly unscientific poll of about forty people attending the Network Application Consortium's Fall 2007 Conference on Collaboration Technologies, I asked:

"How many of you think enterprise search sucks?"

and was not surprised to see about forty hands raised. I also believe that the expectations people form by easily finding what they want on the public Web set a high bar for what they expect at work (see Why Can't A Business Work More Like the Web?).

Ron's story goes on to quote Carl Frappaolo, VP at AIIM: "I don’t think the technology is failing us, I think it’s the way we are using the technologies," but he adds, "If I can’t find my content, it doesn’t exist."

I have a slightly different take. Content that isn't indexed can't be found, but as you add more content stores to the index, the signal to noise ratio can get worse even as coverage increases.

The technology of enterprise search is robust and capable of astonishingly deep analysis of great piles of content in almost any format.

But the relevance of search results often gets worse as a larger number of stovepiped and minimally cross-linked content stores are indexed. Email stores are often the worst offenders - but contain much of the most valuable working communication.

On the public Web, page rank and similar algorithms cleverly leverage human intelligence to help determine what people have found relevant in the past and found "link worthy". Web page content can provide valuable and indexable context for other files and pages connected by links.

In the enterprise, there are very few links to use for relevance ranking, and tons of duplicate files (or minor variations of the same file) attached to email that's blasted throughout the company and scattered.

Think of poor Dagwood Bumstead working hard to win the Acme Products account. He drafts a PowerPoint and circulates it for review. Because it's an important account, many people are cc'd. They each squirrel away a copy, make proposed changes and send those modified copies around.

The poor enterprise search engine may have hundreds or thousands of duplicate or near-duplicate PowerPoint files that talk about the Acme Products proposal - but very little context to determine which version is most relevant, or the context in which it was created. The signal to noise ratio of broadly cc'd email discussions with rats' nests of quoted content is even worse.

I believe that blogs, wikis, RSS feeds and tagging metadata (collectively "E2.0 sources") can greatly improve the relevance of enterprise search results. A search engine can treat E2.0 sources as human-authored, highly contextualized indexes that weight the relevance of the content they contain or link to, and provide useful context for it.

For example, the relevance rank of blog posts or wiki pages talking about the Acme Account can contribute to the relevance of any directly or indirectly referenced PowerPoint describing Dagwood Bumstead's plan. The PowerPoint could be stored within an E2.0 source - or stored elsewhere and referenced by one or more E2.0 sources.
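To make the idea concrete, here is a minimal sketch in Python, not a description of how any particular search product works. The `Source` class, the `boosted_scores` function, the `weight` parameter and the file names are all hypothetical, and the sketch only handles direct references (indirect ones would need another pass over the links).

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """A hypothetical E2.0 item (blog post or wiki page)."""
    title: str
    relevance: float  # assumed: the engine's score for this item on the query
    links: list[str] = field(default_factory=list)  # ids of documents it references


def boosted_scores(base: dict[str, float], sources: list[Source],
                   weight: float = 0.5) -> dict[str, float]:
    """Add a share of each referencing source's relevance to the documents it links to."""
    scores = dict(base)
    for src in sources:
        for doc_id in src.links:
            if doc_id in scores:
                scores[doc_id] += weight * src.relevance
    return scores


# Three near-duplicate decks that plain content analysis cannot tell apart.
base = {"acme_v1.ppt": 1.0, "acme_v2.ppt": 1.0, "acme_v3.ppt": 1.0}
sources = [
    Source("Acme status update (blog)", 0.9, ["acme_v3.ppt"]),
    Source("Acme proposal (wiki)", 0.8, ["acme_v3.ppt", "acme_v2.ppt"]),
]

ranked = sorted(boosted_scores(base, sources).items(),
                key=lambda kv: kv[1], reverse=True)
for doc_id, score in ranked:
    print(f"{doc_id}: {score:.2f}")
```

In this toy data the deck that both the blog post and the wiki page point to rises above its otherwise identical siblings, which is the effect described above.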

To me, the most important point is that the E2.0 sources model business context in a form that intelligent enterprise search engines can index and use to provide faceted navigation and relevance ranking based on factors including:

  • General business context: Inferred by correlating content analysis, use of "sales" related content tags, and other contextual clues.
  • Specific business context: The Acme proposal, with resources collected, used, discussed, or referenced to create that proposal.
  • Time line: Items referenced or discussed while developing and discussing the Acme proposal.
  • People involved: Who worked on the Acme proposal? What did they talk about and tag?
  • Space: In what public, private, personal or by invitation collaboration space was the content recorded or referenced?
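The factors above could be combined in many ways. The following Python sketch shows one simple possibility, under stated assumptions: the `Item` fields, the `context_score` function, the 30-day half-life and the use of collaboration space as a visibility filter are all illustrative choices of mine, not features of a specific engine.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    """A hypothetical indexed item with context drawn from E2.0 sources."""
    doc_id: str
    tags: set[str]        # tags applied to the item or to pages that reference it
    people: set[str]      # people who authored, discussed or tagged it
    space: str            # collaboration space where it was recorded or referenced
    last_discussed: date  # most recent post or comment that mentions it


def context_score(item: Item, query_tags: set[str], query_people: set[str],
                  visible_spaces: set[str], today: date,
                  half_life_days: float = 30.0) -> float:
    """Score an item by tag and people overlap, decayed by time since last discussion."""
    if item.space not in visible_spaces:
        return 0.0  # assumption: items in spaces the searcher cannot see are dropped
    overlap = len(item.tags & query_tags) + len(item.people & query_people)
    age_days = (today - item.last_discussed).days
    recency = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
    return overlap * recency


items = [
    Item("acme_old.ppt", {"acme", "sales"}, {"dagwood"}, "sales", date(2008, 3, 1)),
    Item("acme_new.ppt", {"acme", "sales", "proposal"}, {"dagwood", "dithers"},
         "sales", date(2008, 6, 20)),
    Item("acme_hr.doc", {"acme"}, {"dagwood"}, "hr", date(2008, 6, 25)),
]

today = date(2008, 6, 27)
scores = {
    it.doc_id: context_score(it, {"acme", "proposal"}, {"dagwood"}, {"sales"}, today)
    for it in items
}
best = max(scores, key=scores.get)
print(best, scores)
```

A real engine would blend a score like this with content relevance rather than use it alone, and the same tag, people and space fields could also drive faceted navigation.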

When Mr. Dithers shouts: "Bumstead! Where are we on the Acme Account?", the most timely, frequently discussed and contextually relevant version of Dagwood's slide set could pop closer to the top of the result list, along with the cloud of tags and people who have touched or talked about that result.

For more thoughts on how the content of E2.0 sources can be used to provide context - and improved relevance - for enterprise search, see Enterprise 2.0 for Intelligence Analysts. My working title for the internal version of this slide deck was Why enterprise search sucks - and what to do about it.

See also Information Foraging at FASTForward '07
Authority versus Page Rank

A first-order approximation of what I'm talking about:
TeamPage | Attivio Search Module

but the concept of using E2.0 sources to improve the relevance of enterprise search is not limited to correlating information from just one E2.0 source or from Traction Software's products. See Why Can't A Business Work More Like the Web?
