An EBS Hang Problem

2014-04-02 deargentle
Category: Oracle

OS:AIX5.3     EBS:11.5.10.2

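EBS runs on two nodes: one runs APPS and the other runs the DB. Two systems are installed across the two nodes (a PROD environment and a UAT environment). Today the customer reported that EBS had suddenly stopped responding: after entering the URL in IE, the page would not open for a long time. After restarting Apache, the web interface was still unreachable, and so were Forms. The other system on the same two nodes, UAT, was working normally, which ruled out network and HA problems. The DB accepted connections normally and the alert log contained no errors. File system space on both nodes and the database tablespaces were all normal.
After I restarted the entire application, the system returned to normal, but the fault recurred shortly afterwards. At that point I did not know what was causing it, so I opened a TAR on MetaLink. The troubleshooting went roughly as follows: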
 
Excerpt from the Apache log:
LOG FILE
----------
[Wed Feb 13 15:09:29 2008] [error] [client 192.168.22.202] File does not exist: /app/applprod/prodcomn/java/oracle/ewt/alert/resource/AlertBundle_zh_US.properties
[Wed Feb 13 15:09:40 2008] [notice] SIGHUP received. Attempting to restart
[Wed Feb 13 15:09:40 2008] [warn] OPM: ADM: Process 459270 killed using SIGKILL
[Wed Feb 13 15:09:40 2008] [warn] OPM: ADM: Process 430978 killed using SIGKILL
[Wed Feb 13 15:09:40 2008] [warn] OPM: ADM: Process 405748 killed using SIGKILL
[Wed Feb 13 15:09:40 2008] [notice] FastCGI: process manager initialized (pid 238512)
[Wed Feb 13 15:24:43 2008] [error] OPM: EW: Fail to start process with mod=JServ and grp=DiscoGroup, it's possible that your configuration file is not correct.
[Wed Feb 13 15:24:43 2008] [error] OPM: EW: Fail to start process with mod=JServ and grp=OACoreGroup, it's possible that your configuration file is not correct.
[Wed Feb 13 15:24:43 2008] [error] OPM: EW: Fail to start process with mod=JServ and grp=XmlSvcsGrp, it's possible that your configuration file is not correct.
[Wed Feb 13 15:46:50 2008] [crit] (67)The socket name is already in use.: make_sock: could not bind to port 8100
[Wed Feb 13 16:12:18 2008] [warn] pid file /app/applprod/prodora/iAS/Apache/Apache/logs/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
[Wed Feb 13 16:12:18 2008] [notice] FastCGI: process manager initialized (pid 323890)
[Wed Feb 13 16:12:19 2008] [notice] Oracle HTTP Server Powered by Apache/1.3.19 configured -- resuming normal operations
[Wed Feb 13 16:12:19 2008] [error] [client 192.168.22.160] client denied by server configuration: /app/applprod/prodcomn/java/oracle/ewt
[Wed Feb 13 16:12:19 2008] [error] [client 192.168.22.171] client denied by server configuration: /app/applprod/prodcomn/java/oracle/ewt
[Wed Feb 13 16:12:19 2008] [error] [client 192.168.4.173] client denied by server configuration: /app/applprod/prodcomn/java/oracle/ewt
[Wed Feb 13 16:12:19 2008] [error] OPM:Can not find one alive process
[Wed Feb 13 16:12:19 2008] [error] [client 192.168.2.97] File does not exist:
[Wed Feb 13 16:12:20 2008] [error] OPM:Can not find one alive process
[Wed Feb 13 16:12:20 2008] [error] [client 192.168.2.97] File does not exist:
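
To see how often the OPM errors in an excerpt like this recur, the log can be tallied with a small shell function. This is only a sketch: the function name `opm_summary` is made up for illustration, the log path is whatever your instance uses, and it assumes each log entry sits on a single line (as in the real error log, not a wrapped copy).

```shell
#!/bin/sh
# opm_summary (hypothetical helper): tally OPM errors in an Apache error log.
# Usage: opm_summary /path/to/error_log
opm_summary() {
    awk '
        # JServ groups that OPM failed to start; count per group name
        /OPM: EW: Fail to start process/ {
            if (match($0, /grp=[A-Za-z]+/)) {
                failed[substr($0, RSTART + 4, RLENGTH - 4)]++
            }
        }
        # Requests that found no live JVM process
        /OPM:Can not find one alive process/ { dead++ }
        END {
            for (g in failed) printf "start failures %s: %d\n", g, failed[g]
            printf "no alive process: %d\n", dead + 0
        }
    ' "$1" | LC_ALL=C sort
}
```

A rising "no alive process" count between restarts would be consistent with the JVMs being unable to keep up with the load.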
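Support's assessment was that the system was overloaded and the JVM could not allocate a new process for newly logged-in users. They also gave me two articles for reference. The original reply follows: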
Hi Jianhui,

I have gone through the log file uploaded. Seems like whenever the system load reaches high and crosses the load that can be handled, jvm is unable to allocate new process for the newly logged in user.

Please see the below the following blogs in order to configure jvms according to the system load.

http://blogs.oracle.com/schan/2006/08/01#a494

http://blogs.oracle.com/schan/2006/10/19#a844 -- jvm out of memory errors
 
The articles give a rough estimate of the number of JVMs for each of these groups:
OACoreGroup
DiscoGroup
FormsGroup
XmlSvcsGrp

Of course, the number of JVMs is also limited by system resources. In addition to this, Oracle generally recommends no more than 2 JVMs per CPU. You also need to confirm there are enough operating system resources (e.g. physical memory) to cope with any additional JVMs.

Here are a couple of quick-and-dirty tools that might be useful in sizing your JVMs.

Script to determine "active users" for OACoreGroup

REM
REM SQL to count number of Apps 11i users
REM Run as APPS user
REM
select 'Number of user sessions : ' || count( distinct session_id) How_many_user_sessions
from icx_sessions icx
where disabled_flag != 'Y'
and PSEUDO_FLAG = 'N'
and (last_connect + decode(FND_PROFILE.VALUE('ICX_SESSION_TIMEOUT'), NULL,limit_time, 0,limit_time,FND_PROFILE.VALUE('ICX_SESSION_TIMEOUT')/60)/24) > sysdate   
and counter < limit_connects;
REM
REM END OF SQL
REM

How to determine "active forms users" for FormsGroup

Check the number of f60webmx processes on the Middle Tier server.  For example:

ps -ef | grep f60webmx | grep -v grep | wc -l
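
Once an active-user count is available, it can be combined with the "no more than 2 JVMs per CPU" guideline quoted earlier. The sketch below is illustrative only: `jvms_needed` is a made-up helper, and the users-per-JVM ratio is a parameter you must take from the referenced articles for the group in question, since this post does not preserve those figures.

```shell
#!/bin/sh
# jvms_needed (hypothetical helper): estimate a JVM count for one group.
# Usage: jvms_needed <active_users> <users_per_jvm> <cpu_count>
#   users_per_jvm is an assumption supplied by the caller, not a value from this post.
jvms_needed() {
    users="$1"
    per_jvm="$2"
    cpus="$3"
    # Round up: a partly filled JVM still counts as one JVM
    wanted=$(( (users + per_jvm - 1) / per_jvm ))
    # Apply the guideline of at most 2 JVMs per CPU
    max=$(( cpus * 2 ))
    if [ "$wanted" -gt "$max" ]; then
        echo "$max"
    else
        echo "$wanted"
    fi
}
```

For example, `jvms_needed 250 100 4` asks for enough JVMs to cover 250 users at an assumed 100 users per JVM on a 4-CPU host. When the result is clamped by the CPU limit, that is a hint the host itself may be undersized; physical memory still has to be checked separately.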

Conclusion

Based on the number of users, I increased the OACoreGroup setting. The system has been running stably ever since.
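
To check what each group is currently set to, a read-only helper like the one below may be useful. It rests on an assumption that should be verified against your own instance: that the JServ configuration file defines groups with lines of the form `ApJServGroup <group> <number of JVMs> <weight> <properties file>`. The function name `show_jserv_groups` is made up, and the path to the configuration file is left to the caller.

```shell
#!/bin/sh
# show_jserv_groups (hypothetical helper): list the JVM count per JServ group.
# Assumed line format (verify on your system):
#   ApJServGroup <group> <nprocs> <weight> <properties file>
# Usage: show_jserv_groups /path/to/jserv.conf
show_jserv_groups() {
    awk '$1 == "ApJServGroup" { printf "%s: %s JVM(s)\n", $2, $3 }' "$1"
}
```

The helper only reads the file. If the configuration is generated by a tool, change the value through that tool rather than editing the file by hand.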