HP3000-L Archives

September 2011, Week 2

HP3000-L@RAVEN.UTC.EDU

Subject:
From: Craig Lalley <[log in to unmask]>
Reply To: Craig Lalley <[log in to unmask]>
Date: Sun, 11 Sep 2011 07:32:42 -0700

Mark,

Did you check your network status?

LINKCONTROL @,a

and the standard NETTOOL.NET.SYS --> RESOURCE --> DISPLAY?

It's a possibility.

-Craig


--- On Sat, 9/10/11, Mark Ranft <[log in to unmask]> wrote:

From: Mark Ranft <[log in to unmask]>
Subject: Tidal OCS Express - shutting down
To: [log in to unmask]
Date: Saturday, September 10, 2011, 10:24 AM

After running for a year, last week my client's EXPRESSJ,MGR.EXPAGENT jobs
shut down.  This occurred on two systems, both the production and the test
system.  This was not expected.  They shut down normally.  The jobs went to
:EOJ.  I started the jobs again.  After reviewing console log messages, we
blamed the shutdown on network errors that occurred just prior to the jobs
stopping.

What else would cause both jobs to shut down like that?

Note the master scheduler is on a non-MPE system.  These are just agents.

Now today, one week later (minus 18 minutes), both jobs stopped again.
Interestingly, the test system stopped about an hour later, which again was
about one week (minus 4 minutes) from when it was streamed.

What would make a job that normally runs constantly end first after one year
and then after one week?  Will it be stopping after one day next?  I am a
little baffled.

Here is the output from the Express Agent log.

2011 Sep 10 08:54:15 agentd MGR.EXPAGENT 4: Message received by pid 56295826: UPTM
2011 Sep 10 08:54:15 agentd MGR.EXPAGENT 4: Load is 99.9999 (#56295826)
2011 Sep 10 08:54:15 agentd MGR.EXPAGENT 4: Message received by pid 56295826: TIME
2011 Sep 10 08:54:15 agentd MGR.EXPAGENT 4: Time is 1315644855)
2011 Sep 10 08:54:16 cqd MGR.EXPAGENT 1: cqd: shut down (pid=86507785)
2011 Sep 10 08:54:16 jobd MGR.EXPAGENT 4: 0 children waiting
2011 Sep 10 08:54:16 jobd MGR.EXPAGENT 4: 0 children waiting
2011 Sep 10 08:54:16 jobd MGR.EXPAGENT 4: 0 children waiting
2011 Sep 10 08:54:16 jobd MGR.EXPAGENT 4: 0 children waiting
2011 Sep 10 08:54:16 jobd MGR.EXPAGENT 4: 0 children waiting
2011 Sep 10 08:54:16 efiled MGR.EXPAGENT 4: RecvMsg msg=0
2011 Sep 10 08:54:16 efiled MGR.EXPAGENT 4: shutdown msg -- going down
2011 Sep 10 08:54:16 jobd MGR.EXPAGENT 4: 0 children waiting
2011 Sep 10 08:54:16 clockd MGR.EXPAGENT 1: clockd message queue read error -1.
2011 Sep 10 08:54:16 jobd MGR.EXPAGENT 4: 0 children waiting
2011 Sep 10 08:54:16 efiled MGR.EXPAGENT 1: efiled shutting down
2011 Sep 10 08:54:16 efiled MGR.EXPAGENT 1: do_shutdown
2011 Sep 10 08:54:16 agentd MGR.EXPAGENT 4: Message received by pid 56295826:
2011 Sep 10 08:54:16 agentd MGR.EXPAGENT 4: Socket service terminated (#56295826)
2011 Sep 10 08:54:16 clockd MGR.EXPAGENT 1: Server 'clockd' shutdown.
2011 Sep 10 08:54:16 agentd MGR.EXPAGENT 4: Shutdown received (#61210855)
2011 Sep 10 08:54:16 agentd MGR.EXPAGENT 1: Server 'agentd' shutdown.
2011 Sep 10 08:54:16 jobd MGR.EXPAGENT 4: pt_del for 1174847
2011 Sep 10 08:54:16 jobd MGR.EXPAGENT 4: pt_del for 1174876
2011 Sep 10 08:54:16 jobd MGR.EXPAGENT 1: jobd (68878603) shutdown.
2011 Sep 10 08:54:16 efiled MGR.EXPAGENT 4: sending CLEARTIME_OP to clockq
2011 Sep 10 08:54:16 efiled MGR.EXPAGENT 4: closing efileq
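
One small check on the log above: the number on the agentd "Time is" line looks like a count of seconds since 1 January 1970. That reading is an assumption, not something the agent documentation confirms here. The Python sketch below decodes it as UTC so it can be compared with the timestamp printed at the start of the same log line; the function name decode_agent_time is made up for illustration.

```python
from datetime import datetime, timezone

# Value copied from the agentd "Time is" line in the log.
LOG_TIME_VALUE = 1315644855


def decode_agent_time(epoch_seconds):
    """Interpret the value as seconds since 1970-01-01 00:00:00 UTC.

    Assumption: the agent reports a Unix-style epoch count. The helper
    name is hypothetical and not part of the Express agent.
    """
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


decoded = decode_agent_time(LOG_TIME_VALUE)
print(decoded.isoformat())  # 2011-09-10T08:54:15+00:00
```

Read as UTC, the value decodes to the same wall-clock time the log prints on that line (2011 Sep 10 08:54:15), so the TIME reply appears consistent with the clock used for the log timestamps and gives no sign of a clock jump just before the shutdown.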

Does anyone know what may be happening here?

Mark Ranft
Pro 3K

* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
