LISTSERV - HP3000-L Archives

March 2003, Week 4

HP3000-L@RAVEN.UTC.EDU

Subject:	Re: When does a job not exist ?
From:	David Powell <[log in to unmask]>
Reply To:	[log in to unmask]
Date:	Tue, 18 Mar 2003 17:26:27 -0800
Content-Type:	text/plain
Parts/Attachments:	text/plain (84 lines)

It's not a very big issue, but maybe I didn't make my point very clearly in
my post.  I know why abortjob might fail right after jinfo said the job
existed, cause I know the other job was ending at that split-second.  (It's
built into my logic that the jinfo always happens when the other job EOJs,
unless the other job actually is hung.)

My beef is that pause and jinfo seem to have different criteria for when a
job exists, so when a job is EOJ-ing, pause can decide it DOESN'T exist, and
THEN jinfo can think it DOES exist.

My guess is that when they enhanced pause to be able to wait for a job to
stop existing, they did the fine print a bit differently than when they coded
jinfo.  My *hope* is that it would be just a one-line change to bring them
into sync, and they can clean them up before they all EOJ.

The actual abort doesn't bother me much, cause the job that aborted was just
going to EOJ itself as soon as it was sure that the other job wasn't hung.
I was just startled by the inconsistency between HP's commands.
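
A minimal, untested sketch of the CONTINUE workaround that Olav suggests in
the quoted reply below, applied to the job stream from the original post.
The CONTINUE, COMMENT and ENDIF lines are additions for illustration and are
not part of the original job.  CONTINUE only overrides an error on the
command that immediately follows it, so a failed ABORTJOB (such as CIERR
3042) should no longer flush the rest of the job.  It does not close the
timing window between pause and jinfo; it only makes the job tolerate it.

:PAUSE 900; JOB=#J5648
:IF  JINFO('#J5648', "exists")   =   TRUE
:     TELLOP ABOUT TO TRY TO ABORT #J5648
:     COMMENT If #J5648 logs off first, ABORTJOB fails; CONTINUE keeps us alive
:     CONTINUE
:     ABORTJOB #J5648
:ENDIF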

----- Original Message -----
From: "Olav Kappert" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Tuesday, March 18, 2003 4:33 PM
Subject: Re: [HP3000-L] When does a job not exist ?

> David:
>
> I believe it is a timing issue.
>
> It really would not make a difference if you added a 1 second time delay.
> The same timing issue might happen again but one second later.
>
> The problem is the code described requires several cpu cycles. Between
> these cycles anything can happen, including what you describe.  I see no
> work around unless you want to program in PRIV mode.
>
> You could put a continue before the abort command to make sure the process
> does not abend.
>
> Olav Kappert
>
>
> David Powell wrote:
>
> > Seems that pause and jinfo have a slight timing window when they
> > disagree about when a job no longer exists(?).  I have jobs that wait
> > for a specified other job to end, then blow it off after 'x' seconds if
> > they think it must be hung.  Normally they work just fine, but one
> > time...
> >
> > :PAUSE 900; JOB=#J5648
> > :IF  JINFO('#J5648', "exists")   =   TRUE
> > :     TELLOP ABOUT TO TRY TO ABORT #J5648
> > :     ABORTJOB #J5648
> > ^
> > Job does not exist. (CIERR 3042)
> > REMAINDER OF JOB FLUSHED.
> > CPU sec. = 1.  elapsed min. = 1.  SAT, MAR 1, 2003, 1:03 AM.
> >
> > Both jobs actually finished within 1 minute, so the 15-minute pause
> > plainly ended when it thought the other job no longer existed.  But then
> > jinfo thought it did still exist, and then abortjob voted that it didn't.
> > It was in fact logging off normally during all this.
> >
> > I could easily fix this by adding an extra ':pause 1', but (A) is this a
> > known problem?  (B) any other workarounds?  (C) Any chance of a fix ?
> >
> > Dave ("starved for ON topic threads, and could the rest of you please
> > remember your 'OT:'? ")
> >
> > * To join/leave the list, search archives, change list settings, *
> > * etc., please visit http://raven.utc.edu/archives/hp3000-l.html *

* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *