It is actually pretty standard for comma delimited files treat commas inside
quotes as just part of the field, not a delimiter, and to treat quotes
otherwise as optional.
This can make a "unstring" shaky as a foundation for a bulletproof
csv-reader.
I've rolled my own a few times... following is the closest I can find to
being general-purpose / bullet-proof.
in file section
$DEFINE %MAXINLEN=256#
FD INPUT-FILE RECORD VARYING FROM 1 TO %MAXINLEN
DEPENDING ON INPUT-LEN.
01 INPUT-REC PIC X(%MAXINLEN).
in working-storage
77 WS-READ-CNT PIC S9(09) COMP.
77 SUB PIC S9(09) COMP.
$DEFINE %FIELDLIMIT=25#
$DEFINE %FLENLIMIT=32#
01 INPUT-FIELD-TABLE-AND-PARSERS.
05 INPUT-LEN PIC S9(09) COMP.
05 IN-PTR PIC S9(09) COMP.
05 FIELD-CNT PIC S9(09) COMP.
05 FIELD-SLOT OCCURS %FIELDLIMIT.
09 F-LEN PIC S9(09) COMP SYNC.
09 IN-FIELD PIC X(%FLENLIMIT).
05 WORK-SLOT.
09 WORK-LEN PIC S9(09) COMP SYNC.
09 WORK-FIELD PIC X(%FLENLIMIT).
05 INPUT-FIELD-FLAGS.
09 F PIC X(01).
88 INSIDE-QUOTES VALUE "Q".
88 NOT-INSIDE-QUOTES VALUE "R".
09 F PIC X(01).
88 IN-A-FIELD VALUE "I".
88 NOT-IN-A-FIELD VALUE "J".
FLENLIMIT is a macro for the maximum length of any single field; FIELDLIMIT
is the max number of fields in any record; MAXINLEN is the max number of
bytes in any input record. Adjust to taste.
After you read the csv record into INPUT-REC, the output of the following
routine goes into the FIELD-SLOT table above, with successive fields in
WORK-FIELD(1), WORK-FIELD(2), etc, and the byte-count of the fields in
WORK-LEN(1), WORK-LEN(2), etc. FIELD-CNT says how many fields there were in
the record / how many slots in FIELD-SLOT table are meaningful.
This routine handles empty fields like your "reference number",
commas-in-quotes, etc. It also skips leading $ & % which showed up as
useless distractions in my particular input.
***
************************
**
1200-PARSE-INPUT-REC SECTION.
MOVE 0 TO FIELD-CNT
IN-PTR.
PERFORM 1210-START-A-FIELD.
1200-NEXT-INPUT-BYTE.
ADD 1 TO IN-PTR.
IF IN-PTR > INPUT-LEN
IF IN-A-FIELD
PERFORM 1220-CLOSE-A-FIELD
END-IF
GO TO 1200-PARSE-INPUT-X.
IF INPUT-REC(IN-PTR:1) = ","
IF NOT-INSIDE-QUOTES
PERFORM 1220-CLOSE-A-FIELD
PERFORM 1210-START-A-FIELD
GO TO 1200-NEXT-INPUT-BYTE.
IF INPUT-REC(IN-PTR:1) = """"
IF INSIDE-QUOTES
SET NOT-INSIDE-QUOTES TO TRUE
ELSE
SET INSIDE-QUOTES TO TRUE
END-IF
GO TO 1200-NEXT-INPUT-BYTE.
IF INPUT-REC(IN-PTR:1) = "$" OR "%"
GO TO 1200-NEXT-INPUT-BYTE.
ADD 1 TO WORK-LEN.
IF WORK-LEN > %FLENLIMIT
DISPLAY "FIELD# ", FIELD-CNT
" IS TOO LONG. CONTENTS = ", WORK-FIELD
DISPLAY "INPUT-REC=", INPUT-REC(1:INPUT-LEN)
DISPLAY "REC# = ", WS-READ-CNT
CALL INTRINSIC "QUIT" USING \1220\.
MOVE INPUT-REC(IN-PTR:1) TO WORK-FIELD(WORK-LEN:1).
GO TO 1200-NEXT-INPUT-BYTE.
1200-PARSE-INPUT-X.
COMPUTE SUB = FIELD-CNT + 1.
PERFORM VARYING SUB FROM SUB BY 1
UNTIL SUB > %FIELDLIMIT
INITIALIZE FIELD-SLOT(SUB)
END-PERFORM.
1200-PARSE-DONT-GO-HERE. EXIT.
**************
**
1210-START-A-FIELD SECTION.
ADD 1 TO FIELD-CNT.
IF FIELD-CNT > %FIELDLIMIT
DISPLAY "MORE THAN %FIELDLIMIT FIELDS IN INPUT REC:"
DISPLAY INPUT-REC(1:INPUT-LEN)
DISPLAY "REC# = ", WS-READ-CNT
CALL INTRINSIC "QUIT" USING \1210\.
INITIALIZE WORK-SLOT.
SET IN-A-FIELD TO TRUE.
SET NOT-INSIDE-QUOTES TO TRUE.
1210-START-FIELD-X. EXIT.
**
******************
**
1220-CLOSE-A-FIELD SECTION.
MOVE WORK-SLOT TO FIELD-SLOT(FIELD-CNT).
SET NOT-IN-A-FIELD TO TRUE.
1220-CLOSE-FIELD-X. EXIT.
(Columns line up better in fixed fonts)
-----Original Message-----
From: Tony Girgenti
Sent: Wednesday, August 07, 2013 13:31
To: [log in to unmask]
Subject: Re: [HP3000-L] Cobol: Import CSV text file variable length numeric
field
Thanks to all for their replies.
I did not explain this in my original post, but I have no control over the
format of the input CSV file. The file is coming from a bank and we have to
use what they give us.
I would like to use the UNSTRING command and redefine the two fields that I
need which are check number(Reference Number) and check amount(Amount).
Here is a sample of the first two records in the file:
"Date","ABA Number","Currency","Account Number","Account
Name","Description","BAI Code","Amount","Serial Number","Reference
Number","Detail"
"06/03/2013","999999999","USD","9999999999","XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX","Check
Paid","999","-39,400.65","99999999","999999","CHECK
<br>"
Of course I replaced all of the numbers with nines and the Account Name with
X's.
I realize I need to read past the first record which is a header.
In using the UNSTRING command, I can use "," OR "<br>" as the DELIMITERS
and it should work.
The Amount has a dash in front of it, so I can INSPECT it REPLACING all
SPACES and "-" WITH ZERO.
The only problem I have now, is that one of the records looks like this:
"06/03/2013","999999999","USD","9999999999","XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX","Check
Paid","999","-39,400.65","99999999",,"CHECK
<br>"
As you can see, the Reference Number is empty with no quotes around it. I
don't know if this is a common occurrence with this file. I have not
contacted the bank yet about that.
I'm not sure of how to handle that field with the UNSTRING command, or if
there is some other method I should use.
Thanks,
Tony
* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
|