Enriching the search query

Most successful -- that is, used! -- online search tools have a remarkably simple query interface. The single input field is where your query interface needs to be. The wealth of your data is made available not by complicating the user's interface, but instead by automatically and intelligently enriching the query. Enriching the query includes the following, in no particular order:

  • term extraction -- words, acronyms, dates, numbers, email addresses, etc.
  • term ranking -- some words are more important than others
  • hyphenated terms -- query for both with and without hyphens
  • accented terms -- query for all extant variations of the term in the data (in the US, for example, accented characters are rarely used, so Americans tend to drop the diacritics)
  • term spelling variants -- "color" and "colour", etc.
  • term stemming and other transformations -- phonetic codes such as Soundex, for example
  • numbers and number names -- "101" is also "one hundred and one"
  • dates and times and date and time expressions -- "tomorrow" is "Thursday, 8 May 2008"
  • social relationships between authors -- Smith and Jones are frequently co-authors and so a search for Smith should include a search for Jones (within reason)
  • reference relationships between works -- the original linking
  • professional vocabularies -- nurses and doctors don't use the same terms for everything
  • lay vocabularies -- "myocardial infarction" is "heart attack"
  • spelling correction of terms -- correct only to terms that are extant in the data
  • broadening and narrowing of terms -- you say "tomato" and I say "solanum lycopersicum" or "vegetable".
  • ...
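
The hyphen and accent items above lend themselves to a small illustration. What follows is only a sketch under stated assumptions, not a definitive implementation: the names fold, build_fold_index and enrich are hypothetical, and the vocabulary is a made-up stand-in for the terms actually found in your data. Each term is reduced to a comparison key (lowercased, diacritics stripped, hyphens removed), and a query term is then replaced by every extant term that shares its key.

```python
import unicodedata
from collections import defaultdict


def fold(term):
    """Reduce a term to a comparison key: lowercase, no diacritics, no hyphens."""
    decomposed = unicodedata.normalize("NFKD", term.lower())
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.replace("-", "")


def build_fold_index(vocabulary):
    """Map each folded key to the set of terms that actually occur in the data."""
    index = defaultdict(set)
    for term in vocabulary:
        index[fold(term)].add(term)
    return index


def enrich(query, index):
    """Replace each query term with the extant terms that share its folded key."""
    enriched = []
    for term in query.split():
        variants = index.get(fold(term), set())
        # Fall back to the term as typed when the data has no known variant.
        enriched.append(sorted(variants) if variants else [term])
    return enriched


# Hypothetical vocabulary, standing in for the terms found in the indexed data.
vocabulary = ["résumé", "resume", "e-mail", "email", "naïve"]
index = build_fold_index(vocabulary)
print(enrich("resume email naive cat", index))
```

Because the alternatives are drawn from the data's own vocabulary, the query is only ever widened to terms that can match something. Spelling variants such as "color" and "colour" do not share a folded key and would need a separate lookup table.
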
As always, do as much on behalf of the user as you can before asking them for more data.
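
Resolving a date expression is one small case of doing the work on the user's behalf: rather than asking for an explicit date, the query can be enriched with the date the expression implies. The sketch below assumes a hypothetical and deliberately tiny table, RELATIVE_DAYS, and a hypothetical helper, resolve_date_expression; a real tool would need to cover far more expressions.

```python
from datetime import date, timedelta

# Hypothetical, deliberately tiny table of relative expressions and day offsets.
RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def resolve_date_expression(term, today):
    """Return the calendar date a relative expression refers to, or None."""
    offset = RELATIVE_DAYS.get(term.lower())
    if offset is None:
        return None  # Not a date expression this sketch recognizes.
    return today + timedelta(days=offset)


# The query date is passed in explicitly so the result is reproducible.
resolved = resolve_date_expression("tomorrow", today=date(2008, 5, 7))
print(resolved.isoformat())
```

With a query date of 7 May 2008, "tomorrow" resolves to 8 May 2008, a Thursday. The resolved date can then be searched alongside the original word rather than in place of it.
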