Date: Mon, 22 Jan 2001 09:35:06 -0500
What Mark says about large files is very true. I had a client who was
trying to split a single file into multiple files using WinBatch, a
Windows-based scripting language. The job ran for hours. We rewrote the
script in awk and it ran in 20 seconds.
BEGIN {
    # empty initialization
}
{
    # for each record
    print $0 >> ("data_" substr($0, 1, 1) ".txt")
}
END {
    # empty finalization
}
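A minimal run of the same one-record rule might look like this (the file names input.dat, data_a.txt, and data_b.txt are just illustrations, not anything from the original post):

```shell
# Build a small sample file, then split it by first character.
# Each record is appended to a file named after its first character,
# so records starting with 'a' land in data_a.txt, and so on.
printf 'apple\nbanana\navocado\n' > input.dat
awk '{ print $0 >> ("data_" substr($0, 1, 1) ".txt") }' input.dat
cat data_a.txt
cat data_b.txt
```

Note that because >> appends, rerunning the script against the same directory doubles up the output files - exactly the caveat mentioned below.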
This is a stripped-down version, but here's the gist. The variable $0 is
the current record. The >> operator appends to the file named after it.
The expression "data_" substr($0, 1, 1) ".txt" produces file names that
begin with the string data_, followed by the one-character substring at
position 1 of the record, with the .txt extension added. Running a data
file through this script creates one file per distinct value, and each
file holds the records sharing that value. (Of course, if the file name
already exists, awk will just append to it - so you'd better code for
that!) Simple, yet very powerful.

awk is also very adept at handling delimited files. Once awk knows the
delimiter for the file, one can access each 'field' using the construct
$1 for the first field, $2 for the second field, etc. - and as you
already know, $0 is the whole record. Neat stuff.
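The field handling described above can be sketched in one line; -F sets the delimiter, and the sample file phones.csv and its contents are made up for illustration:

```shell
# -F',' tells awk to split each record on commas, so $1, $2, $3
# are the individual fields and $0 is still the whole record.
printf 'Smith,John,x1234\nJones,Mary,x5678\n' > phones.csv
awk -F',' '{ print $2 " " $1 " is at extension " $3 }' phones.csv
```

The same -F flag works for tabs, pipes, or any other single-character delimiter.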
Mark Wonsil
4M Enterprises, Inc.
Mark Bixby writes:
>POSIX's /bin/awk is particularly well suited for manipulating large files
>without the overhead of typical CI-based approaches.
>
>To learn more about awk, see:
>
> :xeq sh.hpbin.sys -L
> man awk
>
> or
>
> http://docs.hp.com/mpeix/pdf/36431-90007.pdf