> Wirt Atmar wrote:
>
> > By comparison, to accomplish the same thing in 30-year-old BASIC, using
> > the 25-year-old IMAGE database that I just referenced requires this much
> > code:
> >
> > CALL XDBGET(B$,"MASTERSET;",M5,S[*],"WORD;",W$,"")
> >
> > That's it. Just stick that line in anywhere in your code. If S[1] = 0,
> > the word is spelled correctly. If not, it's not. But even more than
> > that, it's also really quite efficient.
This line
CALL XDBGET(B$,"MASTERSET;",M5,S[*],"WORD;",W$,"")
is equivalent to this part of the Perl program:
$dict{$_}
> i started to mention that your solution was far more efficient....but i
> figured someone else would point that out. :-)
Not necessarily. If you are spell checking a long document, the Perl
program you gave is probably faster since it holds the dictionary in memory.
That's a reasonable approach for a stand-alone program, but not ideal for a
subroutine.
I just tried it as an experiment.
Test dictionary: 238,640 words (2,757,518 bytes)
Test text: "Losing the War" by Lee Sandlin (about 33,000 words, or 198K bytes).
This program took 2.1 seconds on a Windows/2000 PC with a Pentium III.
Almost all of that was to load the dictionary.
#!perl -w
use strict;

# Load the whole dictionary into an in-memory hash.
my %dict;
open D, "<", "words" or die "Cannot open file 'words': $!\n";
while (<D>) {
    chomp;
    $dict{lc($_)} = 1;
}
close D;

# Check each word of the input against the hash. The grep drops the
# empty leading field that split produces when a line starts with a
# separator, which would otherwise be flagged as a misspelling.
while (<>) {
    my @words = grep { length } split /[^a-zA-Z0-9']+/, $_;
    foreach (@words) {
        if (!$dict{lc($_)}) {
            print "\"$_\" is not in the dictionary\n";
        }
    }
}
By comparison, I loaded the dictionary into a hashed database:
#!perl -w
use strict;
use DB_File;
use Fcntl;    # for the O_RDWR and O_CREAT flags

# Build the on-disk hashed dictionary, one word per input line.
my %dict;
tie %dict, "DB_File", "hashed_dict", O_RDWR|O_CREAT, 0640, $DB_HASH
    or die "Cannot open file 'hashed_dict': $!\n";
while (<>) {
    chomp;
    $dict{lc($_)} = 1;
}
untie %dict;
That took 70 seconds, but of course you only need to do it once.
Then I used the hashed dictionary to spell check the same text file. This
time it took 5.5 seconds. So, I was right. For any lengthy text, the first
approach is faster (but uses a lot more memory).
#!perl -w
use strict;
use DB_File;
use Fcntl;    # for the O_RDONLY flag

# Spell check against the on-disk hashed dictionary; each lookup is a
# single dbm fetch, so nothing is loaded at startup.
my %dict;
tie %dict, "DB_File", "hashed_dict", O_RDONLY, 0640, $DB_HASH
    or die "Cannot open file 'hashed_dict': $!\n";
while (<>) {
    my @words = grep { length } split /[^a-zA-Z0-9']+/, $_;
    foreach (@words) {
        if (!$dict{lc($_)}) {
            print "\"$_\" is not in the dictionary\n";
        }
    }
}
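The subroutine point can be made concrete: the tied lookup wraps naturally into a one-call check, much like the single XDBGET line. This is only a sketch; the sub name spelled_ok is illustrative, and it opens the file with O_RDWR|O_CREAT so it runs even before a dictionary has been built.

```perl
#!perl -w
use strict;
use DB_File;
use Fcntl;

# Tie the on-disk hash once at startup; each lookup afterward is a
# single dbm fetch, with no cost for slurping the word list into memory.
my %dict;
tie %dict, "DB_File", "hashed_dict", O_RDWR|O_CREAT, 0640, $DB_HASH
    or die "Cannot open file 'hashed_dict': $!\n";

# Returns true if the word is in the dictionary (case-insensitive).
sub spelled_ok {
    my ($word) = @_;
    return exists $dict{lc $word};
}
```

A caller would then write something like: print "typo?\n" unless spelled_ok($w);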
For a short file (1359 words), it was the other way around: the in-memory
program took 2.0 seconds, the hash-file program 0.3 seconds.
* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *