Date: | Wed, 4 Nov 1998 08:58:24 -0800 |
John Dunlop writes:
>
> Folks,
>
> I am dealing with an application that reads barcoded information into
> flat files onto the HP3000. Unfortunately, sometimes the odd character
> or two gets mis-read (we are only talking say 10 in 5,000) and this
> "invalid character" can cause the next application to refuse to load
> the file. Therefore, I have been experimenting with ways to scan the
> file to pinpoint these bad characters. The only valid characters are
> numbers 0-9, all upper case characters and spaces.
...complicated CI example deleted...
> This works fine but is slow and inefficient.
>
> I would be interested to hear from anyone who could suggest a
> better/faster/more efficient way of scanning each character of a
> datafile.
POSIX to the rescue:
grep -v '^[0-9A-Z ]*$' DATAFILE
This selects every line that does not (-v) consist entirely of zero or more
(*) digits (0-9), uppercase letters (A-Z), and spaces ( ) from the start of
the line (^) through the end of the line ($). In other words, it prints only
the lines that contain at least one invalid character.
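To pinpoint where the bad characters are, the same pattern can be combined with
grep's -n option, and tr can strip out everything valid so only the offenders
remain. What follows is only a rough sketch: the file name sample.dat and its
four records are made up for illustration, and LC_ALL=C is assumed so that the
A-Z range means plain ASCII uppercase.

```shell
# Assumption: force the C locale so A-Z covers only ASCII uppercase letters.
LC_ALL=C
export LC_ALL

# Hypothetical sample data standing in for the real barcode file.
printf '%s\n' 'ABC 123' 'AB#C 45' 'XYZ 789' 'abc 000' > sample.dat

# -n prefixes each rejected line with its line number in the file.
bad_lines=$(grep -vn '^[0-9A-Z ]*$' sample.dat)
printf '%s\n' "$bad_lines"

# Delete every valid character (and the newlines); whatever is left over
# is the set of offending characters, in the order they appear.
bad_chars=$(tr -d '0-9A-Z \n' < sample.dat)
printf '%s\n' "$bad_chars"
```

With this sample, the first command reports lines 2 and 4, and the second
leaves the stray "#" and the lowercase letters.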
POSIX regular expression pattern matching (regexp) packs a lot of power in
just a few characters.
For full regexp documentation, go into sh.hpbin.sys and say "man regexp".
--
Mark Bixby E-mail: [log in to unmask]
Coast Community College Dist. Web: http://www.cccd.edu/~markb/
District Information Services 1370 Adams Ave, Costa Mesa, CA, USA 92626-5429
Technical Support Voice: +1 714 438-4647
"You can tune a file system, but you can't tune a fish." - tunefs(1M)