昨天机房搞网络灾备切换演练,核心数据库在网络中断恢复之后,session数量突然增高,一般数据库的sessoin在475左右,突然接近600个会话,
因为网络中断之后,会话没有及时释放资源,
导致会话数量居高不下,下面是解决过程:
(注:因为昨天在处理的的时候,来不及整理现场操作的资料,现在虚拟机上实验证明一下,思路是一样的)
1.测试用户xiangyang,开启4个会话连接数据库(真实环境一般是用JDBC_Thin_Client程序连接数据库):
D:\Documents and Settings\Administrator>sqlplus /nolog
SQL*Plus: Release 9.2.0.8.0 - Production on 星期日 3月 13 21:35:31 2011
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
SQL> conn xiangyang/xiangyang
已连接。
SQL>
D:\Documents and Settings\Administrator>sqlplus /nolog
已连接。
SQL>
D:\Documents and Settings\Administrator>sqlplus /nolog
SQL*Plus: Release 9.2.0.8.0 - Production on 星期日 3月 13 21:35:31 2011
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
SQL> conn xiangyang/xiangyang
已连接。
SQL>
D:\Documents and Settings\Administrator>sqlplus /nolog
已连接。
SQL>
D:\Documents and Settings\Administrator>sqlplus /nolog
SQL*Plus: Release 9.2.0.8.0 - Production on 星期日 3月 13 21:35:31 2011
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
SQL> conn xiangyang/xiangyang
已连接。
SQL>
D:\Documents and Settings\Administrator>sqlplus /nolog
已连接。
SQL>
D:\Documents and Settings\Administrator>sqlplus /nolog
SQL*Plus: Release 9.2.0.8.0 - Production on 星期日 3月 13 21:35:31 2011
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
SQL> conn xiangyang/xiangyang
已连接。
SQL>
已连接。
SQL>
2.查看该用户在数据库的连接数量:
select username,program,terminal,machine,count(1) from v$session where username='XIANGYANG'
group by username,program,terminal,machine order by 5 desc
group by username,program,terminal,machine order by 5 desc
| USERNAME | PROGRAM | TERMINAL | MACHINE | COUNT(1) |
| XIANGYANG | sqlplus.exe | SUN-001 | WORKGROUP\SUN-001 | 4 |
用户名为xiangyang连在数据库的会话数量是4个.
3.查看xiangyang用户的会话数量的详细信息:
select b.*, 'alter system kill session' || '''' || b.sid || ',' || b.SERIAL# || '''' || ';'
from v$session b
where b.type not like 'BACKGROUND'
and b.username='XIANGYANG'
from v$session b
where b.type not like 'BACKGROUND'
and b.username='XIANGYANG'
SID SERIAL# PADDR USERNAME STATUS OSUSER PROCESS MACHINE TERMINAL PROGRAM TYPE
19 78 2A243054 XIANGYANG INACTIVE SUN-001\Administrator WORKGROUP\SUN-001 SUN-001 sqlplus.exe 2011-3-13 21:35 891 alter system kill session'19,78';
20 38 2A24343C XIANGYANG INACTIVE SUN-001\Administrator WORKGROUP\SUN-001 SUN-001 sqlplus.exe 2011-3-13 21:35 873 alter system kill session'20,38';
21 30 2A243824 XIANGYANG INACTIVE SUN-001\Administrator WORKGROUP\SUN-001 SUN-001 sqlplus.exe 2011-3-13 21:36 858 alter system kill session'21,30';
23 35 2A243FF4 XIANGYANG INACTIVE SUN-001\Administrator WORKGROUP\SUN-001 SUN-001 sqlplus.exe 2011-3-13 21:36 837 alter system kill session'23,35';
由于会话数量过多,kill掉会话:
19 78 2A243054 XIANGYANG INACTIVE SUN-001\Administrator WORKGROUP\SUN-001 SUN-001 sqlplus.exe 2011-3-13 21:35 891 alter system kill session'19,78';
20 38 2A24343C XIANGYANG INACTIVE SUN-001\Administrator WORKGROUP\SUN-001 SUN-001 sqlplus.exe 2011-3-13 21:35 873 alter system kill session'20,38';
21 30 2A243824 XIANGYANG INACTIVE SUN-001\Administrator WORKGROUP\SUN-001 SUN-001 sqlplus.exe 2011-3-13 21:36 858 alter system kill session'21,30';
23 35 2A243FF4 XIANGYANG INACTIVE SUN-001\Administrator WORKGROUP\SUN-001 SUN-001 sqlplus.exe 2011-3-13 21:36 837 alter system kill session'23,35';
由于会话数量过多,kill掉会话:
alter system kill session'19,78';
alter system kill session'20,38';
alter system kill session'21,30';
alter system kill session'23,35';
4.查看会话状态:
select sid,serial#,paddr,username,status from v$session where status='KILLED';
SID SERIAL# PADDR USERNAME STATUS
19 78 2A263E30 XIANGYANG KILLED
20 38 2A263E30 XIANGYANG KILLED
21 30 2A263E30 XIANGYANG KILLED
23 35 2A263E30 XIANGYANG KILLED
19 78 2A263E30 XIANGYANG KILLED
20 38 2A263E30 XIANGYANG KILLED
21 30 2A263E30 XIANGYANG KILLED
23 35 2A263E30 XIANGYANG KILLED
5.但是会话所占用的资源并没有释放,也就是还独占了连接数:
select username,program,terminal,machine,count(1) from v$session where username='XIANGYANG'
group by username,program,terminal,machine order by 5 desc
group by username,program,terminal,machine order by 5 desc
| USERNAME | PROGRAM | TERMINAL | MACHINE | COUNT(1) |
| XIANGYANG | sqlplus.exe | SUN-001 | WORKGROUP\SUN-001 | 4 |
用户XIANGYANG持有的会话数仍然是4个,资源并没有释放.
需要kill掉会话所在的操作
系统进程,使用下面的SQL找出会话的进程:
select spid from v$process where addr in(select paddr from v$session where status='KILLED');
SPID
-------
没有任何显示,说明状态为KILL的会话的进程找不到了,通过v$process.addr=v$session
.paddr的方式.什么原因?
6.查看一下原因:
先看视图v$session
select s.username,s.sid,s.serial#,s.paddr from v$session s where username='XIANGYANG' and s.status='KILLED';
| USERNAME | SID | SERIAL# | PADDR |
| XIANGYANG | 19 | 78 | 2A263E30 |
| XIANGYANG | 20 | 38 | 2A263E30 |
| XIANGYANG | 21 | 30 | 2A263E30 |
| XIANGYANG | 23 | 35 | 2A263E30 |
v$session.paddr为2A263E30,状态为KILLED的,用户名为XIANGYANG的4个会话的进程地址为同样的
PADDR:2A263E30,说明是同时指向了另一个地址,也许是虚拟地址,现在没有考证.但是可以证明一点就是
4个session的进程地址通过v$session.paddr的方式找不到了.
再看视图v$process记录的会话进程地址:
select p.addr from v$process p where pid <> 1
ADDR
2A23F9A4
2A23FD8C
2A240174
2A24055C
2A240944
2A240D2C
2A241114
2A2414FC
2A2418E4
2A241CCC
2A2420B4
2A24249C
2A242884
2A242C6C
2A243054
2A24343C
2A243824
2A243C0C
2A243FF4
2A2443DC
2A23FD8C
2A240174
2A24055C
2A240944
2A240D2C
2A241114
2A2414FC
2A2418E4
2A241CCC
2A2420B4
2A24249C
2A242884
2A242C6C
2A243054
2A24343C
2A243824
2A243C0C
2A243FF4
2A2443DC
20 rows selected
这20个会话的进程地址中间肯定有状态为KILLED会话的进程地址
可以通过一条SQL,把状态为KILLED的会话进程找出来,这里要使用minus这个SQL连接符号.
select p.addr from v$process p where pid <> 1
minus
select s.paddr from v$session s;
minus
select s.paddr from v$session s;
ADDR
2A243054
2A24343C
2A243824
2A243FF4
2A24343C
2A243824
2A243FF4
以上这4个进程地址就是状态为KILLED会话的进程地址.
通过minus把v$process拥有的addr,但是v$session没有的paddr找出来了.因为在v$session的paddr
有4个相同的paddr,其实就是替换了上面4个真实的进程地址.
7.现在就可以kill掉状态为KILLED的会话的进程,真正的释放资源了.
通过下面的脚本找出状态为KILLED会话的spid:
select 'kill -9 ' || spid||';',addr
from v$process
where --addr in (select paddr from v$session where username='XIANGYANG');
addr in (select p.addr
from v$process p
where pid <> 1
minus
select s.paddr from v$session s);
from v$process
where --addr in (select paddr from v$session where username='XIANGYANG');
addr in (select p.addr
from v$process p
where pid <> 1
minus
select s.paddr from v$session s);
UNIX/LINUX系统:
killed ADDR
kill -9 4960; 2A243054
kill -9 3732; 2A24343C
kill -9 5036; 2A243824
kill -9 3732; 2A24343C
kill -9 5036; 2A243824
kill -9 4952; 2A243FF4
kill -9 4952; 2A243FF4
windows系统:
orakill 实例sid spid
orakill westdb 4960
D:\Documents and Settings\Administrator>orakill westdb 4960
Kill of thread id 4960 in instance westdb successfully signalled.
D:\Documents and Settings\Administrator>orakill westdb 3732
Kill of thread id 3732 in instance westdb successfully signalled.
D:\Documents and Settings\Administrator>orakill westdb 5036
Kill of thread id 5036 in instance westdb successfully signalled.
D:\Documents and Settings\Administrator>orakill westdb 4952
Kill of thread id 4952 in instance westdb successfully signalled.
9.资源得到释放,问题得到解决.
select username,program,terminal,machine,count(1) from v$session where username='XIANGYANG'
group by username,program,terminal,machine order by 5 desc
| USERNAME | PROGRAM | TERMINAL | MACHINE | COUNT(1) |
-----------------------------------------------------
没有任何显示,说明资源已经释放,
至此问题解决.