HP3000-L Archives

January 2001, Week 4

HP3000-L@RAVEN.UTC.EDU

From: Mark Wonsil
Date: Mon, 22 Jan 2001 09:35:06 -0500

What Mark says about large files is very true.  I had a client who was
trying to separate a single file into multiple files using WinBatch, a
Windows-based scripting language.  The job was running for hours.  We
rewrote the script in awk and it ran in 20 seconds.

BEGIN {
    # empty initialization
}
{
    # for each record
    print $0 >> ("data_" substr($0, 1, 1) ".txt")
}
END {
    # empty finalization
}

This is a stripped-down version, but here's the gist.  The variable $0 is
the current record.  The >> operator appends to the file named after it.
The expression "data_" substr($0, 1, 1) ".txt" builds a file name from the
string data_, the one character at position 1 of the record, and the .txt
extension.  Running a data file through this script creates one file for
each distinct first character, and each file holds the records that start
with that character.  (Of course, if the file already exists, awk will just
append to it, so you had better code for that!)  Simple, yet very powerful.
awk is very adept at handling delimited files too.  Once awk knows the
delimiter for the file, you can access each 'field' using the construct $1
for the first field, $2 for the second field, and so on, and you already
know that $0 is the whole record.  Neat stuff.
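
For the delimited case, here is a minimal sketch of the same split driven by the first field instead of the first character.  The file name sample.csv, its contents, and the comma delimiter are made up for illustration; they are not from the client's job.  It also shows one possible way to deal with output files left over from an earlier run: with > instead of >>, awk truncates each output file the first time it opens it during a run and keeps writing to that same file afterwards.

```shell
# Made-up sample input: three comma-delimited records.
cat > sample.csv <<'EOF'
A,widget,10
B,gadget,20
A,gizmo,30
EOF

# Simulate a leftover output file from an earlier run.
echo "stale line from an earlier run" > data_A.txt

# -F, tells awk that fields are separated by commas.
# $1 is the first field; $0 is the whole record.
# The parentheses keep the file-name expression unambiguous across awk versions.
awk -F, '{ print $0 > ("data_" $1 ".txt") }' sample.csv
```

After this runs, data_A.txt holds the two records whose first field is A (the stale line is gone) and data_B.txt holds the one B record.  If you actually want to keep adding to existing files across runs, stay with >> as in the script above.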

Mark Wonsil
4M Enterprises, Inc.

Mark Bixby writes:
>POSIX's /bin/awk is particularly well suited for manipulating large files
>without the overhead of typical CI-based approaches.
>
>To learn more about awk, see:
>
>        :xeq sh.hpbin.sys -L
>        man awk
>
>        or
>
>        http://docs.hp.com/mpeix/pdf/36431-90007.pdf
