At 02:02 PM 26/1/95 EST, Jeff Kell wrote:
>Does anyone have, know of, or have source code for a "fuzzy" string
>match? Not just a phonetic key (Soundex), but rather one which could
>tell you "how closely" two strings match?
>
>Ideally it would be "sort of" like a spelling checker, but extended to
>a string. Simple transpositional errors (2 letters reversed), spelling,
>omitted substrings, etc., would be accounted for.
>
>[\] Jeff Kell
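A "how closely do these match" score of the kind asked for can be sketched as an edit distance that also counts a swap of two adjacent letters as a single error (the "optimal string alignment" form of Damerau-Levenshtein distance). The Python below is only an illustration under that assumption: osa_distance and similarity are hypothetical names, the 0.0 to 1.0 scaling is an arbitrary choice, and an omitted substring simply costs one error per missing character.

```python
# Hypothetical helpers for illustration; not taken from agrep or any HP library.

def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions and
    swaps of two adjacent characters (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # two adjacent letters reversed counts as one error
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to 0.0..1.0, where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest
```

For example, osa_distance("recieve", "receive") is 1 and osa_distance("homogenos", "homogeneous") is 2. The table takes time and space proportional to len(a) * len(b), which is fine for short strings such as names or keywords.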
We use the Arizona approximate-grep routines (agrep) on our 9000, but I've not
tried porting them to the 3000. Specifically, agrep provides (amongst other things):
>1) the ability to search for approximate patterns;
> for example, "agrep -2 homogenos foo" will find homogeneous as well
> as any other word that can be obtained from homogenos with at most
> 2 substitutions, insertions, or deletions.
> "agrep -B homogenos foo" will generate a message of the form
> best match has 2 errors, there are 5 matches, output them? (y/n)
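The -2 and -B behaviour quoted above can be approximated with a straightforward dynamic-programming scan (Sellers' algorithm), which finds the fewest substitutions, insertions or deletions needed to match the pattern anywhere inside a line. This is a sketch, not agrep's own code, which uses faster bit-parallel techniques; best_substring_errors, agrep_like and best_match are hypothetical names, and the -B style reporting is modelled loosely on the message shown above.

```python
# Hypothetical helpers for illustration; they mimic, not reproduce, agrep.

def best_substring_errors(pattern: str, text: str) -> int:
    """Fewest insertions, deletions or substitutions needed to turn
    pattern into some substring of text (Sellers' dynamic programming)."""
    # prev[i]: cost of matching pattern[:i] against a substring ending at
    # the current text position; entry 0 is always free, so a match may
    # start anywhere in the text.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        cur = [0]
        for i, p in enumerate(pattern, start=1):
            cur.append(min(prev[i] + 1,               # extra text character
                           cur[i - 1] + 1,            # pattern character missing
                           prev[i - 1] + (p != ch)))  # match or substitution
        best = min(best, cur[-1])
        prev = cur
    return best


def agrep_like(pattern: str, lines: list, k: int) -> list:
    """Lines containing pattern with at most k errors (like agrep -k)."""
    return [line for line in lines if best_substring_errors(pattern, line) <= k]


def best_match(pattern: str, lines: list) -> tuple:
    """(fewest errors, number of lines with that many errors), loosely
    like agrep -B. Assumes lines is non-empty."""
    scores = [best_substring_errors(pattern, line) for line in lines]
    fewest = min(scores)
    return fewest, scores.count(fewest)
```

With lines = ["homogeneous", "homogenous", "xyz"], agrep_like("homogenos", lines, 2) keeps the first two lines, and best_match("homogenos", lines) reports (1, 1): the best match has 1 error and one line achieves it.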
The algorithms used are all clearly described, so it might not take too much
effort to mangle them onto the 3000.
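For anyone weighing the port, the bit-parallel "shift-and" (bitap) search with errors that Wu and Manber developed for agrep is compact enough to sketch. The Python below is a simplified illustration of that technique, not a transcription of agrep's C source; bitap_search is a hypothetical name, and it reports only the text positions at which an approximate match ends.

```python
# Hypothetical sketch of Wu-Manber style bitap; not agrep's actual code.

def bitap_search(pattern: str, text: str, k: int) -> list:
    """Return 0-based end positions in text where pattern occurs with at
    most k insertions, deletions or substitutions. Assumes 0 <= k < len(pattern)."""
    m = len(pattern)
    full = (1 << m) - 1                # keep only m state bits
    goal = 1 << (m - 1)                # bit set => whole pattern matched
    masks = {}                         # char -> bits where it occurs in pattern
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    # R[d], bit i: pattern[:i+1] matches a suffix of the text read so far
    # with at most d errors. Initially the first d pattern chars may be deleted.
    R = [(1 << d) - 1 for d in range(k + 1)]
    hits = []
    for pos, ch in enumerate(text):
        cmask = masks.get(ch, 0)
        old = R[0]                     # R[d-1] as it was before this char
        R[0] = ((R[0] << 1) | 1) & cmask
        for d in range(1, k + 1):
            prev = R[d]
            R[d] = ((((prev << 1) | 1) & cmask)  # exact match step
                    | old                        # extra text character
                    | (old << 1)                 # substitution
                    | (R[d - 1] << 1)            # skipped pattern character
                    | 1) & full                  # pattern[0] itself may be skipped
            old = prev
        if R[k] & goal:
            hits.append(pos)
    return hits
```

For example, bitap_search("abd", "xxabcxx", 1) returns [3, 4]. Python integers have no fixed width, so the pattern may be any length here; a C port would normally keep each R[d] in a single machine word, which limits the pattern length to the number of bits in a word unless multi-word arithmetic is added.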
Unfortunately, I can't tell you where to find the source, other than that it
comes from the University of Arizona, Department of Computer Science. You are
probably better tuned in to that than I am anyway.
Lots of luck.
----
Jim Wowchuk
Vanguard Computer Services Compu$erve: 100036,106
_--_|\ Post: PO Box 18, North Ryde, NSW 2113
/ \ Phone: +61 (2) 888-9688
\.--.__/ <---Sydney NSW Fax: +61 (2) 888-3056
v Australia
|