Sunday, September 05, 2010
Fuzzy Search – Noise Words, Truncation, and Phonetics
Myth
Some people think that search results will improve if search queries are manipulated into a format that computers understand better. Common "noise" words, such as "the", "to", and "it", are therefore removed.
The remaining words are then expanded by mechanical truncation. For example, "running" is added to a query that includes "run", and vice versa.
To compensate for possible misspellings, queries are also expanded with sounds-like (phonetic) functions that add similar-sounding words.
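The first two mechanical steps can be sketched in a few lines. This is a minimal illustration, assuming a fixed, hard-coded noise-word list and a small made-up vocabulary (`NOISE_WORDS`, `remove_noise_words`, and `truncate_expand` are hypothetical names, not part of any UniFind or KCSL product). It shows the two weaknesses discussed below: the Shakespeare query disappears entirely, and the wildcard `run*` pulls in "rung" and "runway" but still misses "ran".

```python
# Hypothetical fixed noise-word list of the kind a mechanical pipeline might use.
NOISE_WORDS = {"the", "to", "it", "be", "or", "not", "a", "an", "of", "and"}


def remove_noise_words(query):
    """Drop every query word that appears on the fixed noise-word list."""
    return [w for w in query.lower().split() if w not in NOISE_WORDS]


def truncate_expand(words, vocabulary):
    """Treat each word as a wildcard prefix (word*) and add every vocabulary
    term that starts with it; keep the word itself if nothing matches."""
    expanded = []
    for w in words:
        matches = sorted(v for v in vocabulary if v.startswith(w))
        expanded.extend(matches or [w])
    return expanded


vocab = {"run", "running", "runway", "rung", "ran", "race"}
print(remove_noise_words("to be or not to be"))               # []
print(truncate_expand(remove_noise_words("the run"), vocab))  # ['run', 'rung', 'running', 'runway']
```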
Fact
Depending on the question, noise words can be invaluable. A search for information about the Shakespearean line "to be or not to be", for example, consists almost entirely of noise words. At the time of each question, UniFind™ Technology dynamically and automatically determines the importance of every query word, relying primarily on word frequencies in the particular collection of documents being searched. Consequently, UniFind knows when noise words should be discarded and when they should be given great importance.
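The source does not describe UniFind's weighting formula beyond "primarily word frequencies", so the sketch below substitutes a standard inverse-document-frequency (IDF) score as an illustrative stand-in. Weights are computed per query from the collection being searched, and a word is dropped only when it is far less informative than the best word in that same query. The `keep_ratio` threshold and the four sample documents are assumptions made up for this example.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count how many documents in the collection contain each word."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def weight_query(query, docs, keep_ratio=0.5):
    """Weight each query word with an IDF-style score from *this* collection and
    drop only words much less informative than the strongest word in the query.
    keep_ratio is a hypothetical tuning parameter, not a documented UniFind setting."""
    df = document_frequencies(docs)
    n = len(docs)
    terms = query.lower().split()
    # log((N + 1) / (df + 1)): a word that occurs in every document scores 0.
    weights = {t: math.log((n + 1) / (df[t] + 1)) for t in terms}
    top = max(weights.values())
    return {t: w for t, w in weights.items() if w >= keep_ratio * top}


docs = [
    "to be or not to be that is the question",
    "the play is set in denmark",
    "the prince speaks to the ghost",
    "the court is in the castle",
]
print(sorted(weight_query("the prince", docs)))          # ['prince']
print(sorted(weight_query("to be or not to be", docs)))  # ['be', 'not', 'or', 'to']
```

Because the threshold is relative to the query's own strongest word, "the" is discarded next to "prince", while a query made up only of noise words keeps them all.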
While truncation does expand a query properly in some circumstances, it falls short in others. UniFind expands questions only when such expansion is warranted. For instance, mechanically truncating "run" so that it also matches "ran", and vice versa, would require placing the truncation wildcard immediately after the "r" (i.e., r*), which would add every word beginning with "r" to the query. Because UniFind instead uses linguistics to determine the proper derivatives and base forms of words, it is not limited by wildcard truncation: it adds "ran" to a query containing "run" without adding every other word that begins with "r". UniFind can even assign graded weights to words, giving exact matches greater importance than matches on alternative forms of the query words.
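To make the contrast concrete, here is a minimal sketch comparing the `r*` wildcard with a lookup of base forms. It assumes a tiny hand-written lemma table (`LEMMAS`) in place of a real morphological lexicon, and the weights 1.0 for an exact match and 0.5 for an alternative form are illustrative values chosen for this example, not figures from UniFind.

```python
# Hypothetical lemma table; a real system would consult a full morphological lexicon.
LEMMAS = {"run": "run", "ran": "run", "running": "run", "runs": "run"}


def wildcard(prefix, vocabulary):
    """Mechanical truncation: every vocabulary word starting with the prefix."""
    return sorted(v for v in vocabulary if v.startswith(prefix))


def lemma_expand(word, vocabulary, exact_weight=1.0, variant_weight=0.5):
    """Add only words that share the query word's base form, weighting the
    exact match above alternative forms (weights are illustrative)."""
    lemma = LEMMAS.get(word, word)
    weights = {}
    for v in vocabulary:
        if v == word:
            weights[v] = exact_weight
        elif LEMMAS.get(v, v) == lemma:
            weights[v] = variant_weight
    return weights


vocab = {"run", "ran", "running", "runs", "rain", "red", "rung", "race"}
print(wildcard("r", vocab))  # every word in the vocabulary
print(sorted(lemma_expand("run", vocab).items()))
# [('ran', 0.5), ('run', 1.0), ('running', 0.5), ('runs', 0.5)]
```

The same lookup works in the other direction: `lemma_expand("ran", vocab)` gives "ran" full weight and "run", "running", and "runs" the lower variant weight.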
While phonetics will catch some spelling errors, it will miss others. To ensure that all misspellings are found before a question is processed, UniFind spell-checks each question and corrects both typographical and cognitive errors. In addition, UniFind’s list of terms used for spelling correction is created automatically and dynamically so that it matches the particular collection of documents being searched.
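The source does not say how UniFind corrects spelling. The sketch below therefore uses two well-known stand-ins to show why a phonetic code alone can miss an error. American Soundex keeps the first letter, so a keyboard slip such as "qater" for "water" gets a different code. Levenshtein edit distance, checked against a vocabulary built from the searched documents, still recovers the intended word. The sample documents, `max_distance`, and the frequency tie-break are assumptions for illustration.

```python
from collections import Counter

# American Soundex digit groups.
_SOUNDEX = {ch: d for letters, d in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                     ("l", "4"), ("mn", "5"), ("r", "6")) for ch in letters}


def soundex(word):
    """American Soundex: first letter plus three digits."""
    word = word.lower()
    result, prev = word[0].upper(), _SOUNDEX.get(word[0], "")
    for ch in word[1:]:
        code = _SOUNDEX.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return (result + "000")[:4]


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_vocabulary(docs):
    """Correction terms taken from the collection itself, with their frequencies."""
    return Counter(w for doc in docs for w in doc.lower().split())


def correct(word, vocab, max_distance=2):
    """Closest vocabulary term; ties go to the more frequent term."""
    if word in vocab:
        return word
    dist, _, best = min((edit_distance(word, v), -n, v) for v, n in vocab.items())
    return best if dist <= max_distance else word


docs = ["water quality in the river", "the water table is low", "river flooding report"]
vocab = build_vocabulary(docs)
print(soundex("water"), soundex("qater"))                     # W360 Q360
print([v for v in vocab if soundex(v) == soundex("qater")])  # []
print(correct("qater", vocab))                               # water
```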
© 2010 KCSL Inc. All rights reserved.