HP3000-L Archives

July 2002, Week 2

HP3000-L@RAVEN.UTC.EDU

From: "Jonathan M. Backus" <[log in to unmask]>
Date: Fri, 12 Jul 2002 17:37:21 -0400

Allen,

        You still haven't given quite enough information for a "best" solution.
Are the two files sorted by the "deduping" field?  If so, you don't even need
a program, per se.  Write a command file that simply reads a record from the
50,000 record file, parses the "deduping" data into variable "A", upshifts
it so case is not an issue (if desired), and then reads the 5 million record
file one record at a time, doing the same into variable "B".  If "A" matches
"B", get another "A" record and continue.  If "B" becomes bigger than "A",
kick "A" out to the non-matching file.  Then get another "A" record and
compare it to your current "B" record, kicking "A" records out until "A" is
bigger than or equal to "B".  Then continue with the original compare loop.
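
For anyone who wants to see the compare loop written out, here is a rough
sketch in Python rather than an MPE command file.  It assumes both files are
already sorted ascending on the upshifted key, and that the 40-character name
field starts at column 0 of each record; the offset is a guess, since the
record layout was not given.  The names dedupe_key and non_matching are made
up for illustration.

```python
from typing import Iterable, Iterator

KEY_START = 0   # assumed offset of the name field (not given in the thread)
KEY_LEN = 40    # the 40-character name field from the original question


def dedupe_key(record: str) -> str:
    """Extract the matching field and upshift it so case is not an issue."""
    return record[KEY_START:KEY_START + KEY_LEN].rstrip().upper()


def non_matching(small: Iterable[str], master: Iterable[str]) -> Iterator[str]:
    """Yield records from `small` whose key never appears in `master`.

    Both inputs must already be sorted ascending by dedupe_key().
    """
    master_iter = iter(master)
    b_rec = next(master_iter, None)
    for a_rec in small:
        a = dedupe_key(a_rec)
        # Advance the master file until its key is no longer behind "A".
        while b_rec is not None and dedupe_key(b_rec) < a:
            b_rec = next(master_iter, None)
        if b_rec is None or dedupe_key(b_rec) > a:
            yield a_rec  # "B" passed "A" (or master ran out): kick "A" out
        # Otherwise the keys match: drop "A" and keep the current "B".


if __name__ == "__main__":
    small = ["ADAMS", "Baker", "CLARK", "DAVIS"]
    master = ["ADAMS", "BROWN", "CLARK", "EVANS"]
    for rec in non_matching(small, master):
        print(rec)
```

Each file is read once from front to back, so the 5 million record file is
never rescanned.  If the files were sorted case-sensitively, upshifting during
the compare can put keys out of order, so in that case re-sort on the upshifted
key first or skip the upshift.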

Thanx,
        Jon


-----Original Message-----
From: HP-3000 Systems Discussion [mailto:[log in to unmask]]On
Behalf Of Porter, Allen
Sent: Friday, July 12, 2002 5:14 PM
To: [log in to unmask]
Subject: [HP3000-L] Deduping files


I'm looking for opinions and experiences with deduping large fixed ASCII
files.  For instance, if you have a list of names (50,000 records) and you
want to bounce that against a master list of names (5 million records) to
produce a third file of non-matching records (something less than 50,000
records), what would be the best tool to use?  Also, for this little
example, let's say that the matching field will be a 40-character name
field.

There are a multitude of ways to do this.  If you were patient, you could
even use QEdit, but who has that kind of patience?  So, what would be your
tool of choice...Image? SQL? Access? A custom C program?  Some mystery UNIX
utility?  Whatever your favorite solution would be.  I'm interested in
finding out what everyone thinks is the easiest and the fastest way to
accomplish something like this.

> Allen Porter
> ENVOY
> ISO 9001 Registered
> Phone:  636-827-5704
> Fax:  636-827-5874
>
> Visit our Web site @ http://www.yourenvoy.com



* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
