logo
Another error spotted during update agents to new version…

… and inside logs:

java.lang.Throwable: java.security.PrivilegedActionException: oracle.sysman.emSDK.agent.comm.exception.VerifyConnectionException: unable to connect to http server at https://SERVER_NAME:3873/emd/main/. [peer not authenticated]java.lang.Throwable: java.security.PrivilegedActionException: oracle.sysman.emSDK.agent.comm.exception.VerifyConnectionException: unable to connect to http server at https://SERVER_NAME:3873/emd/main/. [peer not authenticated]

… later agent thrown:

Collection Status                            : [COLLECTIONS_HALTED(
  UPLOAD_SYSTEM Threshold (UploadMaxNumberXML: 5000) exceeded with 51567 files)]


Let’s go step by step to better understand this situation:


001. Management Agent blackout error

001. Agent upgrade are failed at start:


002. peer not authenticated error

002. Let’s check logs:

 


 003. We now try to start them manually in desired host – first we check status:

[root@SERVER_NAME ~]# su - oraagent
[oraagent@SERVER_NAME ~]$ emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Status agent Failure:unable to connect to http server at https://SERVER_NAME:3873/emd/lifecycle/main/. [peer not authenticated]
Agent is Not Running

004. Agent is not running, start them:

[oraagent@SERVER_NAME ~]$ emctl start agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
Starting agent ....... started.

005. Started. Check status?

[oraagent@SERVER_NAME ~]$ emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 12.1.0.2.0
OMS Version : 12.1.0.3.0
Protocol Version : 12.1.0.1.0
Agent Home : /u01/app/oraagent/agent12c/agent_inst
Agent Binaries : /u01/app/oraagent/agent12c/core/12.1.0.2.0
Agent Process ID : 18529
Parent Process ID : 7805
Agent URL : https://SERVER_NAME:3873/emd/main/
Repository URL : https://oem12c:4900/empbs/upload
Started at : 2014-01-16 15:10:15
Started by user : oraagent
Last Reload : (none)
Last successful upload : 2014-04-23 12:05:48
Last attempted upload : 2014-04-23 15:34:46
Total Megabytes of XML files uploaded so far : 238.54
Number of XML files pending upload : 17
Size of XML files pending upload(MB) : 0.01
Available disk space on upload filesystem : 23.71%
Collection Status : Collections enabled
Heartbeat Status : Ok
Last attempted heartbeat to OMS : 2014-04-23 15:34:40
Last successful heartbeat to OMS : 2014-04-23 15:34:40
Next scheduled heartbeat to OMS : 2014-04-23 15:35:41

---------------------------------------------------------------
Agent is Running and Ready

 


 

006. Agent Unreachable

006. After a while agent in agent list got broken state.



 

006a. Example of agent behaviour

006a. Keep in mind, that communication between agent and OMS are not shown in real-time sometime…
I made a small test to be sure what is working.

Agent are manually stopped by me. Even if You try to refresh page (F5 or reload button placed at top-right), this will not update agent Status. Agent is still shown as up and running.

 


 

006. OMS updated agent status

006b. After few minutes OMS will display correct agent state.


 


 

007. Come to host and check his state again:

[oraagent@SERVER_NAME ~]$ emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
---------------------------------------------------------------

Status agent Failure:unable to connect to http server at https://SERVER_NAME:3873/emd/lifecycle/main/. [peer not authenticated]
Agent is Not Running

Agent is died?


008. Try to start agan, last time (this is bad habit, better if i check logs now):

[oraagent@SERVER_NAME ~]$ emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Status agent Failure:unable to connect to http server at https://SERVER_NAME:3873/emd/lifecycle/main/. [peer not authenticated]
Agent is Not Running

[oraagent@SERVER_NAME ~]$ emctl start agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
Starting agent .......... failed.
HTTP Listener failed at Startup
Possible port conflict on port(3873): Retrying the operation...
Failed to start the agent after 1 attempts. Please check that the port(3873) is available.
Consult emctl.log and emagent.nohup in: /u01/app/oraagent/agent12c/agent_inst/sysman/log

Strange errors…


 

009. Ok, now we look at logs:
In emctl.log:

 at oracle.sysman.gcagent.util.system.GCAThread$RunnableWrapper.run(GCAThread.java:141)
 at java.lang.Thread.run(Thread.java:662)
2014-04-23 11:08:57,789 [1412134:GC.Executor.703935 (oracle_database:SERVER_NAME:Response) (oracle_database:SERVER_NAME:Response:Response)] ERROR -
 File not found exception
java.io.FileNotFoundException: /u01/app/oraagent/agent12c/agent_inst/sysman/emd/state/statemgmt/oracle_database/SERVER_NAME/SevDiff/8.tmp (Too
 many open files)
 at java.io.FileOutputStream.open(Native Method)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
 at oracle.sysman.gcagent.state.StateMgr.upsertStateEntry(StateMgr.java:995)
 at oracle.sysman.gcagent.target.interaction.execution.Threshold.recordState(Threshold.java:1703)
 at oracle.sysman.gcagent.target.interaction.execution.Threshold.threshold_clear(Threshold.java:2506)
 at oracle.sysman.gcagent.target.interaction.execution.ExecuteTask.check_Thresholds(ExecuteTask.java:2502)
 at oracle.sysman.gcagent.target.interaction.execution.ExecuteTask.ProcessResultSet(ExecuteTask.java:2039)
 at oracle.sysman.gcagent.target.interaction.execution.ExecuteTask.runTask(ExecuteTask.java:4102)
 at oracle.sysman.gcagent.target.interaction.execution.ExecuteTask.call(ExecuteTask.java:4938)
 at oracle.sysman.gcagent.metadata.impl.collection.MetricColl$1.call(MetricColl.java:536)
 at oracle.sysman.gcagent.metadata.impl.collection.MetricColl$1.call(MetricColl.java:503)
 at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.accountedCall(TaskFutureImpl.java:548)
 at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.call(TaskFutureImpl.java:587)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

010. Too many open files can mean one thing: we reached filesystem limit. Check it:

[root@SERVER_NAME ~]# su - oraagent
[oraagent@SERVER_NAME ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 139264
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024   # <------ Too low!
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 139264
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[oraagent@SERVER_NAME ~]$ exit

011. We increase file limits for this user – go to root account.

vi /etc/security/limits.conf
# add at end of file and save:
oraagent hard nofile 65536
oraagent soft nofile 65536

[root@SERVER_NAME ~]# sysctl -p

012. We login again to user account and check changes:

# su - oraaagent

[oraagent@SERVER_NAME ~]$ ulimit -n
open files (-n) 65536

Ok, limit is increased.


013. Now is time to start agent.

[oraagent@SERVER_NAME ~]$ emctl clearstate agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
EMD clearstate completed successfully
[oraagent@SERVER_NAME ~]$ emctl start agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
Starting agent ........... failed.
HTTP Listener failed at Startup
Possible port conflict on port(3873): Retrying the operation...
Failed to start the agent after 1 attempts. Please check that the port(3873) is available.
Consult emctl.log and emagent.nohup in: /u01/app/oraagent/agent12c/agent_inst/sysman/log

Port is allocated? But why? By who? (use lsof?)


014. Check this log:

UID PID PPID C STIME TTY TIME CMD
oraagent 5723 5722 0 16:41 pts/1 00:00:00 /bin/sh -f /u01/app/oraagent/agent12c/core/12.1.0.2.0/bin/emctl start agent

----
5732 :: Wed Apr 23 16:41:15 2014::EM_SECURE_HOSTNME: SERVER_NAME
5732 :: Wed Apr 23 16:41:15 2014::EM_SECURE_PORT: 3873
5732 :: Wed Apr 23 16:41:15 2014::EM_LISTEN_ON_ALL_NICS: true
5732 :: Wed Apr 23 16:42:18 2014::AgentLifeCycle.pm:status agent returned with retCode=1
6122 :: Wed Apr 23 16:42:18 2014::AgentLifeCycle.pm: Launching the watchdog process.
6122 :: Wed Apr 23 16:42:18 2014::TZ = Poland, Read = Poland. Found = 1.
6122 :: Wed Apr 23 16:42:19 2014::agentTZRegion successfully validated.
6122 :: Wed Apr 23 16:42:19 2014::AgentTuning.pm: Launching emdctl with -Xmx1024m
6122 :: Wed Apr 23 16:42:20 2014::AgentStatus.pm:emdctl run_autotune returned 0
5732 :: Wed Apr 23 16:42:18 2014::AgentLifeCycle.pm: Launched the watchdog process pid=6122
5732 :: Wed Apr 23 16:42:18 2014::AgentLifeCycle.pm: StartCEMD start retryCount=120
5732 :: Wed Apr 23 16:42:25 2014::AgentLifeCycle.pm: StartCEMD status message file detected
5732 :: Wed Apr 23 16:42:25 2014::Found Msg file with: {1398264144584,initializing}
5732 :: Wed Apr 23 16:42:25 2014::AgentLifeCycle.pm: StartCEMD Querying for the real status of the agent
5732 :: Wed Apr 23 16:43:29 2014::AgentLifeCycle.pm: StartCEMD status message file detected
5732 :: Wed Apr 23 16:43:29 2014::Found Msg file with: {1398264144584,initializing}
5732 :: Wed Apr 23 16:43:29 2014::AgentLifeCycle.pm: StartCEMD Querying for the real status of the agent
5732 :: Wed Apr 23 16:44:31 2014::AgentLifeCycle.pm:Watch dog processs id: 6122 exited with an exit code of 55
5732 :: Wed Apr 23 16:44:31 2014::AgentLifeCycle.pm: Exited loop retryCount=112 with retCode=1
5732 :: Wed Apr 23 16:44:31 2014::AgentLifeCycle.pm: StartCEMD Querying for the real status of the agent
5732 :: Wed Apr 23 16:45:34 2014::AgentLifeCycle.pm: Check agent status retCode=1
5732 :: Wed Apr 23 16:45:49 2014::TZ: EmctlLogAvailabilityMarker Operation=start Diag=failed

015. Do You remember that agent start failed? So we don’t expect his process in memory, right?

[oraagent@SERVER_NAME ~]$ ps -fu oraagent
UID PID PPID C STIME TTY TIME CMD
oraagent 4278 1 0 2013 ? 00:00:40 /u01/app/oraagent/agent11g/perl/bin/perl /u01/app/oraagent/agent11g/bin/emwd.pl agent /u0
oraagent 4983 4982 0 16:39 pts/1 00:00:00 -bash
oraagent 7805 1 0 2013 ? 00:00:07 /u01/app/oraagent/agent12c/core/12.1.0.2.0/perl/bin/perl /u01/app/oraagent/agent12c/core/
oraagent 8133 4983 0 16:46 pts/1 00:00:00 ps -fu oraagent
oraagent 13277 4278 0 Mar30 ? 00:44:24 /u01/app/oraagent/agent11g/bin/emagent
oraagent 18529 7805 8 Jan16 ? 8-16:53:59 /u01/app/oraagent/agent12c/core/12.1.0.2.0/jdk/bin/java -Xmx128M -XX:MaxPermSize=96M -s

Agent is occupying memory! He is started – liar!


016. Kill this bastard!

[oraagent@SERVER_NAME ~]$ kill -9 18529 7805

017. Star agent again:

[oraagent@SERVER_NAME ~]$ emctl start agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
Starting agent ....................... started.

018. Started. Yea… we have been lied to…

[oraagent@SERVER_NAME ~]$ emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 12.1.0.2.0
OMS Version : 12.1.0.3.0
Protocol Version : 12.1.0.1.0
Agent Home : /u01/app/oraagent/agent12c/agent_inst
Agent Binaries : /u01/app/oraagent/agent12c/core/12.1.0.2.0
Agent Process ID : 12711
Parent Process ID : 12622
Agent URL : https://SERVER_NAME:3873/emd/main/
Repository URL : https://oem12c:4900/empbs/upload
Started at : 2014-04-23 16:52:33
Started by user : oraagent
Last Reload : (none)
Last successful upload : 2014-04-23 16:53:41
Last attempted upload : 2014-04-23 16:53:41
Total Megabytes of XML files uploaded so far : 0.66
Number of XML files pending upload : 50,605
Size of XML files pending upload(MB) : 24.59
Available disk space on upload filesystem : 23.72%
Collection Status : [COLLECTIONS_HALTED(
 UPLOAD_SYSTEM Threshold (UploadMaxNumberXML: 5000) exceeded with 51567 files)]
Heartbeat Status : Ok
Last attempted heartbeat to OMS : 2014-04-23 16:52:52
Last successful heartbeat to OMS : 2014-04-23 16:52:52
Next scheduled heartbeat to OMS : 2014-04-23 16:53:52

---------------------------------------------------------------
Agent is Running and Ready

Nope. Running and ready but with problem…
COLLECTIONS_HALTED – Agent have more collected records that allows his limit.
We can try to empty them.


019. Try to manually push upload data to OMS:

[oraagent@SERVER_NAME ~]$ emctl upload agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD upload error:full upload has failed: Upload Recent Sending Statistics
________________________________
Channel=severity send[ms]=47 fsz=711
Channel=severity send[ms]=52 fsz=711
Channel=severity send[ms]=50 fsz=711
(....)
Channel=severity send[ms]=50 fsz=711
Channel=severity send[ms]=50 fsz=711
Channel=severity send[ms]=51 fsz=711
Channel=severity send[ms]=47 fsz=711
Channel=severity send[ms]=48 fsz=711
Channel=severity send[ms]=49 fsz=711
Channel=severity send[ms]=48 fsz=711
Channel=severity send[ms]=50 fsz=711
Channel=severity send[ms]=49 fsz=711
Channel=severity send[ms]=51 fsz=711
Channel=severity send[ms]=54 fsz=711
Channel=severity send[ms]=54 fsz=711

Avg send time=53.57441 milliseconds
Backoff Event List
________________________________
Upload Failure List
_______________________________
Connection Event List
_____________________________
Upload timed out before completion.
Number of files to upload before the uploadNow call: 49811, total size (MB): 24.047525
Remaining number of files to upload: 48706, total size (MB): 23.298267 (TIMEOUT)

This helps, but not too much. We got at start 51567 files ready to upload.
Now we still have 48706 remaining in queue.
But good… some of data are uploaded.


020. Solution for this moment? Erase of collected data… Maybe not – we try to upload again?
End of upload process:

(...)
Upload timed out before completion.
Number of files to upload before the uploadNow call: 44886, total size (MB): 20.724644
Remaining number of files to upload: 43764, total size (MB): 19.961292 (TIMEOUT)

Again part of data are sent to the management server.
I’m patient. Try upload again and again. Everytime part of stored data are sent:

(...)
Upload timed out before completion.
Number of files to upload before the uploadNow call: 43314, total size (MB): 19.656172
Remaining number of files to upload: 42197, total size (MB): 18.898792 (TIMEOUT)

021. Check status?

(...)
Available disk space on upload filesystem : 23.76%
Collection Status : [COLLECTIONS_HALTED(
 UPLOAD_SYSTEM Threshold (UploadMaxNumberXML: 5000) exceeded with 47173 files)]
Heartbeat Status : Ok
(...)

022. Sometime during upload process agent will thrown something like this on screen.
Take it easy! Continue….

(...)
Event type=UploadReceiver event ts=2014-04-23 17:03:44 eventMetadata=<Events>
 <EventData TimeStamp="2014-04-23 17:03:44" ExceptionClass="ResponseINTERNALException" ExceptionMsg="ORA-20001: publish_metric_error Failed: (target guid = EE7C7C919BD6CF58E142809A89C1101B)(metric guid = 6E27E48E34845112A89F4E9696AF5FCB)(Coll name = sga_pool_wastage_10i)(Event name = sga_pool_wastage)(error_msg = )ORA-20391: [EVEM_ERR_REP_DATE_LT_PREV: :Reported date should not be earlier than reported date of previous event.]"/>
</Events>
Event type=UploadReceiver event ts=2014-04-23 17:03:44 eventMetadata=<Events>
 <EventData TimeStamp="2014-04-23 17:03:44" ExceptionClass="ResponseINTERNALException" ExceptionMsg="ORA-20001: publish_metric_error Failed: (target guid = EE7C7C919BD6CF58E142809A89C1101B)(metric guid = E041AB28B1B36BECDC5F12E7E4F5EFE4)(Coll name = service_10i)(Event name = service)(error_msg = )ORA-20391: [EVEM_ERR_REP_DATE_LT_PREV: :Reported date should not be earlier than reported date of previous event.]"/>
</Events>
Upload timed out before completion.
Number of files to upload before the uploadNow call: 41039, total size (MB): 18.317247
Remaining number of files to upload: 38737, total size (MB): 17.312742 (TIMEOUT)

023. This process take some time….

(...)
Upload timed out before completion.
Number of files to upload before the uploadNow call: 25305, total size (MB): 11.362581
Remaining number of files to upload: 22893, total size (MB): 10.280284 (TIMEOUT)
(...)
Upload timed out before completion.
Number of files to upload before the uploadNow call: 22204, total size (MB): 9.976499
Remaining number of files to upload: 19779, total size (MB): 8.914592 (TIMEOUT)
(...)
Upload timed out before completion.
Number of files to upload before the uploadNow call: 19304, total size (MB): 8.700016
Remaining number of files to upload: 16851, total size (MB): 7.6233826 (TIMEOUT)
(...)
Upload timed out before completion.
Number of files to upload before the uploadNow call: 4551, total size (MB): 2.0984163
Remaining number of files to upload: 2044, total size (MB): 0.959115 (TIMEOUT)

024. This is maybe last time? “EMD upload completed successfully” !!!

[oraagent@SERVER_NAME ~]$ emctl upload agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD upload completed successfully

025. Check status?

[oraagent@SERVER_NAME ~]$ emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 12.1.0.2.0
OMS Version : 12.1.0.3.0
Protocol Version : 12.1.0.1.0
Agent Home : /u01/app/oraagent/agent12c/agent_inst
Agent Binaries : /u01/app/oraagent/agent12c/core/12.1.0.2.0
Agent Process ID : 16597
Parent Process ID : 16546
Agent URL : https://SERVER_NAME:3873/emd/main/
Repository URL : https://oem12c:4900/empbs/upload
Started at : 2014-04-23 16:57:25
Started by user : oraagent
Last Reload : (none)
Last successful upload : 2014-04-23 17:19:44
Last attempted upload : 2014-04-23 17:19:44
Total Megabytes of XML files uploaded so far : 20.45
Number of XML files pending upload : 0
Size of XML files pending upload(MB) : 0
Available disk space on upload filesystem : 23.92%
Collection Status : Collections enabled
Heartbeat Status : Ok
Last attempted heartbeat to OMS : 2014-04-23 17:19:38
Last successful heartbeat to OMS : 2014-04-23 17:19:38
Next scheduled heartbeat to OMS : 2014-04-23 17:20:38

---------------------------------------------------------------
Agent is Running and Ready

026. Done! In OMS console agent also displayed as available.

026. Manually Upload Metric Data

If not, try to upload data from oms console (Setup -> Manage Cloud Control -> Agents). Click on agent and select button “Upload Metric Data”.

.

.