LISTSERV - HP3000-L Archives

HP3000-L Archives

August 2013, Week 1

HP3000-L@RAVEN.UTC.EDU

	LISTSERV Archives
	HP3000-L Home
	HP3000-L August 2013, Week 1

	Log In
	Register

	Subscribe or Unsubscribe

	Search Archives
Options:	Use Monospaced Font Show Text Part by Default Show All Mail Headers
Message:	[<< First] [< Prev] [Next >] [Last >>]
Topic:	[<< First] [< Prev] [Next >] [Last >>]
Author:	[<< First] [< Prev] [Next >] [Last >>]
Subject:	Re: Cobol: Import CSV text file variable length numeric field
From:	"Dave Powell, MMfab" <[log in to unmask]>
Reply To:	Dave Powell, MMfab
Date:	Wed, 7 Aug 2013 17:18:21 -0700
Content-Type:	text/plain
Parts/Attachments:	text/plain (192 lines)
It is actually pretty standard for comma delimited files treat commas inside 
quotes as just part of the field, not a delimiter, and to treat quotes 
otherwise as optional.
This can make a "unstring" shaky as a foundation for a bulletproof 
csv-reader.
I've rolled my own a few times... following is the closest I can find to 
being general-purpose / bullet-proof.

in file section

$DEFINE  %MAXINLEN=256#
FD  INPUT-FILE      RECORD VARYING  FROM  1  TO  %MAXINLEN
                     DEPENDING ON  INPUT-LEN.
01  INPUT-REC                   PIC  X(%MAXINLEN).


in working-storage

77  WS-READ-CNT         PIC S9(09)  COMP.
77  SUB                     PIC S9(09)  COMP.

$DEFINE  %FIELDLIMIT=25#
$DEFINE  %FLENLIMIT=32#
01  INPUT-FIELD-TABLE-AND-PARSERS.
     05  INPUT-LEN           PIC S9(09)  COMP.
     05  IN-PTR              PIC S9(09)  COMP.
     05  FIELD-CNT           PIC S9(09)  COMP.
     05  FIELD-SLOT          OCCURS  %FIELDLIMIT.
         09  F-LEN               PIC S9(09)  COMP  SYNC.
         09  IN-FIELD            PIC  X(%FLENLIMIT).
     05  WORK-SLOT.
         09  WORK-LEN            PIC S9(09)  COMP  SYNC.
         09  WORK-FIELD          PIC  X(%FLENLIMIT).
     05  INPUT-FIELD-FLAGS.
         09  F                       PIC  X(01).
           88  INSIDE-QUOTES             VALUE  "Q".
           88  NOT-INSIDE-QUOTES         VALUE  "R".
         09  F                       PIC  X(01).
           88  IN-A-FIELD                VALUE  "I".
           88  NOT-IN-A-FIELD            VALUE  "J".

FLENLIMIT is a macro for the maximum length of any single field; FIELDLIMIT 
is the max number of fields in any record; MAXINLEN is the max number of 
bytes in any input record.  Adjust to taste.

After you read the csv record into INPUT-REC, the output of the following 
routine goes into the FIELD-SLOT table above, with successive fields in 
WORK-FIELD(1), WORK-FIELD(2), etc, and the byte-count of the fields in 
WORK-LEN(1), WORK-LEN(2), etc.  FIELD-CNT says how many fields there were in 
the record / how many slots in FIELD-SLOT table are meaningful.
This routine handles empty fields like your "reference number", 
commas-in-quotes, etc.  It also skips leading $ & % which showed up as 
useless distractions in my particular input.

***
************************
**
1200-PARSE-INPUT-REC                    SECTION.
     MOVE  0             TO  FIELD-CNT
                             IN-PTR.

     PERFORM  1210-START-A-FIELD.

1200-NEXT-INPUT-BYTE.
     ADD  1              TO  IN-PTR.
     IF  IN-PTR          >   INPUT-LEN
         IF  IN-A-FIELD
             PERFORM  1220-CLOSE-A-FIELD
         END-IF
         GO TO  1200-PARSE-INPUT-X.

     IF  INPUT-REC(IN-PTR:1)     =   ","
         IF  NOT-INSIDE-QUOTES
             PERFORM  1220-CLOSE-A-FIELD
             PERFORM  1210-START-A-FIELD
             GO TO  1200-NEXT-INPUT-BYTE.

     IF  INPUT-REC(IN-PTR:1)     =   """"
         IF  INSIDE-QUOTES
             SET  NOT-INSIDE-QUOTES      TO  TRUE
         ELSE
             SET  INSIDE-QUOTES          TO  TRUE
         END-IF
         GO  TO  1200-NEXT-INPUT-BYTE.

     IF  INPUT-REC(IN-PTR:1)     =   "$"  OR  "%"
         GO TO  1200-NEXT-INPUT-BYTE.

     ADD  1                      TO  WORK-LEN.
     IF  WORK-LEN            >   %FLENLIMIT
         DISPLAY  "FIELD# ",  FIELD-CNT
                  " IS TOO LONG.  CONTENTS = ",  WORK-FIELD
         DISPLAY  "INPUT-REC=",  INPUT-REC(1:INPUT-LEN)
         DISPLAY  "REC# = ", WS-READ-CNT
         CALL  INTRINSIC  "QUIT"  USING  \1220\.
     MOVE  INPUT-REC(IN-PTR:1)   TO  WORK-FIELD(WORK-LEN:1).

     GO TO 1200-NEXT-INPUT-BYTE.

1200-PARSE-INPUT-X.

     COMPUTE  SUB            =   FIELD-CNT  +  1.
     PERFORM     VARYING     SUB     FROM  SUB  BY  1
                 UNTIL       SUB     >   %FIELDLIMIT
         INITIALIZE      FIELD-SLOT(SUB)
     END-PERFORM.

1200-PARSE-DONT-GO-HERE. EXIT.
**************
**
1210-START-A-FIELD              SECTION.
     ADD  1                  TO  FIELD-CNT.
     IF  FIELD-CNT           >   %FIELDLIMIT
         DISPLAY  "MORE THAN %FIELDLIMIT FIELDS IN INPUT REC:"
         DISPLAY  INPUT-REC(1:INPUT-LEN)
         DISPLAY  "REC# = ", WS-READ-CNT
         CALL  INTRINSIC  "QUIT"  USING  \1210\.

     INITIALIZE  WORK-SLOT.
     SET  IN-A-FIELD         TO  TRUE.
     SET  NOT-INSIDE-QUOTES  TO  TRUE.
1210-START-FIELD-X. EXIT.
**
******************
**
1220-CLOSE-A-FIELD                  SECTION.
     MOVE  WORK-SLOT     TO  FIELD-SLOT(FIELD-CNT).

     SET  NOT-IN-A-FIELD         TO  TRUE.
1220-CLOSE-FIELD-X. EXIT.


(Columns line up better in fixed fonts)



-----Original Message----- 
From: Tony Girgenti
Sent: Wednesday, August 07, 2013 13:31
To: [log in to unmask]
Subject: Re: [HP3000-L] Cobol: Import CSV text file variable length numeric 
field

Thanks to all for their replies.

I did not explain this in my original post, but I have no control over the 
format of the input CSV file.  The file is coming from a bank and we have to 
use what they give us.

I would like to use the UNSTRING command and redefine the two fields that I 
need which are check number(Reference Number) and check amount(Amount). 
Here is a sample of the first two records in the file:

"Date","ABA Number","Currency","Account Number","Account 
Name","Description","BAI Code","Amount","Serial Number","Reference 
Number","Detail"
"06/03/2013","999999999","USD","9999999999","XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX","Check 
Paid","999","-39,400.65","99999999","999999","CHECK 
<br>"

Of course I replaced all of the numbers with nines and the Account Name with 
X's.

I realize I need to read past the first record which is a header.

In using the UNSTRING command, I can use ","  OR "<br>" as the DELIMITERS 
and it should work.

The Amount has a dash in front of it, so I can INSPECT it REPLACING all 
SPACES and "-" WITH ZERO.

The only problem I have now, is that one of the records looks like this:
"06/03/2013","999999999","USD","9999999999","XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX","Check 
Paid","999","-39,400.65","99999999",,"CHECK 
<br>"

As you can see, the Reference Number is empty with no quotes around it.  I 
don't know if this is a common occurrence with this file.  I have not 
contacted the bank yet about that.

I'm not sure of how to handle that field with the UNSTRING command, or if 
there is some other method I should use.

Thanks,
Tony

* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html * 

* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
ATOM RSS1 RSS2
RAVEN.UTC.EDU