os kill SMON MMON 加快rollback 进度

🕗 发布于 2024-07-19 00:07 oracle

Then identify the pid of MMON and kill it using:

kill -9 <pid of mmon>

Note that killing mmon will not terminate the instance and the new mmon process which starts will use _smu_debug_mode=134217728.

A large transaction was killed.

Transaction Recovery is stuck and the database can appear to be hanging or hung:

select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo",
decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
from v$fast_start_transactions;

      USN STATE                 Total       Done       ToDo Estimated time to complete
---------- ---------------- ---------- ---------- ---------- -----------------------------
       90 RECOVERED             15669      15669          0 01-OCT-2012 05:52:35
       15 RECOVERING           174954       8137     166817 17-OCT-2012 12:32:07 <<<<<<<<<<
GNTOUN35>/

      USN STATE                 Total       Done       ToDo Estimated time to complete
---------- ---------------- ---------- ---------- ---------- -----------------------------
       90 RECOVERED             15669      15669          0 01-OCT-2012 05:52:39
       15 RECOVERING           174954       8137     166817 17-OCT-2012 12:33:33 <<<<<<<<<<<<<<<
GNTOUN35>/

      USN STATE                 Total       Done       ToDo Estimated time to complete
---------- ---------------- ---------- ---------- ---------- -----------------------------
       90 RECOVERED             15669      15669          0 01-OCT-2012 05:52:40
       15 RECOVERING           174954       8137     166817 17-OCT-2012 12:33:54   <<<<< see no movement for this

SMON trace file shows:

service name: SYS$BACKGROUND
   Current Wait Stack:
    0: waiting for 'row cache lock'
       cache id=0x3, mode=0x0, request=0x5
       wait_id=44059 seq_num=44060 snap_id=1
       wait times: snap=4 min 31 sec, exc=4 min 31 sec, total=4 min 31 sec
       wait times: max=infinite, heur=4 min 31 sec
       wait counts: calls=92 os=92
       in_wait=1 iflags=0x15a2
   There is at least one session blocking this session.
     Dumping 1 direct blocker(s):
       inst: 1, sid: 1, ser: 1
     Dumping final blocker:
       inst: 1, sid: 1, ser: 1                     <=======
   Wait State:
     fixed_waits=0 flags=0x2a boundary=0x35cd32608/0
   Session Wait History:
       elapsed time of 0.000121 sec since current wait

Hanganalyze shows:

Chain 1:
-------------------------------------------------------------------------------
   Oracle session identified by:
   {
               instance: 1 X1
                  os id: 26192
             process id: 14, oracle@XXXX (SMON)
             session id: 1345
       session serial #: 1
   }
   is waiting for 'row cache lock' with wait info:
   {
                     p1: 'cache id'=0x3
                     p2: 'mode'=0x0
                     p3: 'request'=0x5
           time in wait: 51 min 58 sec
          timeout after: never
                wait id: 44188
               blocking: 0 sessions
            current sql: <none>
            short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-semtimedop()+10
                         <-skgpwwait()+156<-ksliwat()+1821<-kslwaitctx()+162<-kqrget()+851<-kqrLockAndPinPo()+630
                         <-kqrpre1()+765<-kqrpre()+19<-kturRecoverActiveTxns()+3677<-ktprbeg()+313<-ktmmon()+13593
                       <-ktmSmonMain()+206<-ksbrdp()+923<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266
                       <-ssthrdmain()+214<-main()+201<-__libc_start_main()+244<-_start()+36
           wait history:
             * time between current wait and wait #1: 0.000064 sec
             1.       event: 'db file sequential read'
                time waited: 0.001194 sec
                    wait id: 44187           p1: 'file#'=0x299
                                             p2: 'block#'=0xef544
                                             p3: 'blocks'=0x1
             * time between wait #1 and #2: 0.000138 sec
             2.       event: 'db file sequential read'
                time waited: 0.000609 sec
                    wait id: 44186           p1: 'file#'=0x299
                                             p2: 'block#'=0xef547
                                             p3: 'blocks'=0x1
             * time between wait #2 and #3: 0.000164 sec
             3.       event: 'db file sequential read'
                time waited: 0.003346 sec
                    wait id: 44185           p1: 'file#'=0x299
                                             p2: 'block#'=0xef54a
                                             p3: 'blocks'=0x1
   }
   and is blocked by
=> Oracle session identified by:
   {
               instance: 1 (X1)
                  os id: 26196
             process id: 16, oracle@XXXX (MMON)
             session id: 1                     <===============
       session serial #: 1                     <===============

CHANGES

CAUSE

Transaction recovery was blocked by MMON process and the hanganalyze trace file shows the call stack details:

ktucloGetTxTableUndoRec()+879<-ktucloGetMinPrepStartTime()+453<-ktucloUsMinScn()

If ktucloGetTxTableUndoRec is present in the call stack then it indicates that cleanout optimization feature is enabled and going on

SOLUTION

Identify the current value of _smu_debug_mode.
Note it down.

Disable the cleanout optimization feature by setting the following parameter:

_smu_debug_mode=134217728 (which is 0x8000000)

Dynamically you can do this via:

alter system set "_smu_debug_mode"=134217728;

Then identify the pid of MMON and kill it using:

kill -9 <pid of mmon>

Note that killing mmon will not terminate the instance and the new mmon process which starts will use _smu_debug_mode=134217728.

Once the transaction recovery is finished, return _smu_debug_mode to the previous value.

-------method 2------------

SYMPTOMS

Parallel Transaction Recovery is taking too long.

You can use V$TRANSACTION USED_UBLK to estimate how long the rollback is going
to take but there is no formula for this. If you shutdown the database after
rollback has started, it will begin where it left off.

You can also look at V$FAST_START_TRANSACTIONS for UNDOBLOCKSDONE
versus UNDOBLOCKSTOTAL.

CHANGES

A large transaction got killed or rolled back.

CAUSE

There are cases where parallel transaction recovery is not as fast as serial
transaction recovery, because the pq slaves are interfering with each other.
This depends mainly on the type of changes that need to be made during rollback
and usually may happen when rolling back INDEX Updates in parallel.

SOLUTION

NOTE: In the images and/or the document content below, the user information and data used represents fictitious data from the Oracle sample schema(s) or Public Documentation delivered with an Oracle database product. Any similarity to actual persons, living or dead, is purely coincidental and not intended in any manner.

Dynamically switch from parallel recovery to serial. If you are in a clustered
environment you will need to do this on all instances at the same time:

1. Find SMON's Oracle PID:

Example:

SQL> select pid, program from v$process where program like '%SMON%';

PID PROGRAM
---------- ------------------------------------------------
6 oracle@test1 (SMON)

2. Disable SMON transaction cleanup:

SQL> oradebug setorapid 'SMON's Oracle PID';
SQL> oradebug event 10513 trace name context forever, level 2

3. Kill the PQ slaves that are doing parallel transaction recovery.
You can check V$FAST_START_SERVERS to find these.

e.g.

select SPID from V$PROCESS where PID in (select PID from V$FAST_START_SERVERS);

then kill these processes from the OS level with : kill -9 n

where n is the SPID's returned from the previous statement .

4. Turn off fast_start_parallel_rollback:

alter system set fast_start_parallel_rollback=false;

5. Re-enable transaction recovery as it was disabled in step 2.

SQL> oradebug setorapid 'SMON's Oracle PID';
SQL> oradebug event 10513 trace name context off

--------------并行回滚不如串行回滚，尤其是涉及到index更新相关的回滚（index split？）

SYMPTOMS

Sometimes Parallel Rollback of Large Transaction may become very slow. After killing a large running transaction (either by killing the shadow process or aborting the database) then database seems to hang, or smon and parallel query servers taking all the available cpu. After killing shadow process or aborting the database the v$transaction entry is lost, so you cannot estimate by examining v$transaction.used_ublk how the rollback procedure proceeds. In parallel rollback checking v$lock you see SMON having one TX lock and x (where x is the init.ora -> parallel_max_servers value) PS locks (parallel secondary synchronization lock). Also each Parallel Query process has also 1 PS lock. And a lot of redo log activity is happening. In order to identify if this is your case check the following views: V$fast_start_servers -> provides information about all the recovery secondary performing parallel transaction recovery. This contains one row for each parallel query secondary. Column STATE shows the state of the server being IDLE or RECOVERING. Column UNDOBLOCKSDONE shows the percentage of the assigned work done so far. V$fast_start_transactions -> contains one row for each one of the transactions that Oracle is recovering in Parallel. Important columns here are: USN -> the undo segment number of the transaction (join with v$rollstat.usn and v$rollname.usn to find the rollback segment used, v$rollstat.xacts would show a value larger than 0). UNDOBLOCKSDONE -> the number of undo blocks completed in this transaction UNDOBLOCKSTOTAL -> the total number of undo blocks that need recovery CPUTIME -> the time for which recovery has progressed in seconds You may notice that UNDOBLOCKSDONE is not increasing or increases very slowly.

CHANGES

No changes

CAUSE

In fast-start parallel rollback, the background process Smon acts as a coordinator and rolls back a set of transactions in parallel using multiple server processes.

SOLUTION

If you have fall in the above case you can use

  alter system set fast_start_parallel_rollback = false;

in order to disable parallel rollback.

If it hangs, shutdown database and define it inside init.ora file:

  fast_start_parallel_rollback = false

FAST_START_PARALLEL_ROLLBACK shows the maximum number of processes which may
exist for performing parallel rollback. If the value is false, parallel 
rollback is disabled. If the value is LOW (default), 2 * CPU_COUNT number of
processes may be used. If the value is HIGH, at most 4 * CPU_COUNT number of
rollback servers are used for parallel rollback. 
  

Explanation
-----------

In fast-start parallel rollback, the background process Smon acts as a
coordinator and rolls back a set of transactions in parallel using multiple
server processes.

Fast start parallel rollback is mainly useful when a system has transactions
that run a long time before commiting, especially parallel Inserts, Updates,
Deletes operations.  When SMON discovers that the amount of recovery work is 
above a certain threshold, it automatically begins parallel rollback by 
dispersing the work among several parallel processes.

One special form of fast-start parallel rollback is intra-transaction 
recovery.  Intra-transaction parallelism kicks in only when more than 4 
secondary (parallel_max_servers) are available.

(see bug:819005)[This section is not visible to customers.]

There are cases where parallel transaction recovery is not as fast as serial 
transaction recovery, because the pq secondary are interfering with each other. 
It looks like the changes made by this transaction cannot be recovered in 
parallel without causing a performance problem. The parallel rollback secondary 
processes are most likely contending for the same resource, which results in 
even worse rollback performance compared to a serial rollback.  This depends 
mainly on the type of changes that need to be made during rollback and usually
may happen when rolling back INDEX Updates in parallel.

(see bug:1512513 and bug:1737765)[This section is not visible to customers.]

One way to check this is by dumping some blocks of the online redo logs or 
archive logs produced during the parallel rollback by command:

   alter database dump logfile 'log1SID.dbf' 
   rba min <seq#> . 1 
   rba max <seq#> . 100  -> for 100 blocks of redo log file with sequence <seq#>

If you check the Redo Records you may notice that they are doing actions
with Opcodes 10.x (Index actions) and 5.x (rollback actions).

(see Note:29733.1) [This section is not visible to customers.]

原文地址：https://blog.csdn.net/jnrjian/article/details/140416471

免责声明：本站文章内容转载自网络资源，如本站内容侵犯了原著者的合法权益，可联系本站删除。更多内容请关注自学内容网（zxcms.com）！

机器学习-梯度下降实验一
【代码】机器学习-梯度下降实验一。
阅读更多2024-09-20
Java——Static与final修饰的变量与方法(总结)
Java语法学过一遍之后，我相信大多数和我一样脑瓜子嗡嗡的，甚至有点乱了，这时候应该自己把之前的能总结的，或者不熟悉的都要总结一遍，以便于后期的学习！！
阅读更多2024-09-20
ffmpeg 拉流
保存为视频# 保存图片查看解码器：
阅读更多2024-09-20
sourceTree保姆级教程7：（合并某次提交）
在日常开发过程中，大家有时候并非都是在同一个分支进行开发，可能存在多人的情况下开发。切换到master分支，可以看到master分支落后 develop分支好多版本了，test2.txt文件对于mas
阅读更多2024-09-20
事业环境因素和组织过程资产的区别
在项目管理中，事业环境因素和组织过程资产是两个重要的概念，它们对项目的成功有着重要的影响。虽然它们都对项目执行产生影响，但它们存在本质区别。
阅读更多2024-09-20
MATLAB统计和机器学习工具箱：数据分析与建模的利器
MATLAB的统计和机器学习工具箱（Statistics and Machine Learning Toolbox）是一个功能强大的数据分析和建模工具，它为数据科学家、工程师和研究人员提供了一系列的函
阅读更多2024-09-20
配网缺陷检测无人机航拍图像数据集（不规范绑扎，螺栓销钉缺失）数据集总共3000张左右，标注为voc格式
配网缺陷检测无人机航拍图像数据集，（不规范绑扎，螺栓销钉缺失）数据集总共3000张左右，标注为voc格式
阅读更多2024-09-20
基于＂WT2605C的智能血压计：AI对话引领个性化健康管理新时代，健康守护随时在线
在当今快节奏的生活中，健康管理已成为我们日常不可或缺的一部分。随着科技的进步，智能设备正逐步融入我们的日常生活，为健康管理带来前所未有的便捷与智能化。今天，让我们共同探索WT2605C AI在线方案如
阅读更多2024-09-20
基于大语言模型（LLM）的人工智能能否引领新的工业革命？
如果我们的时代是下一个工业革命，正如许多人所声称的那样，人工智能无疑是其驱动力之一。” – 李飞飞《纽约时代周刊》专栏文章距离 OpenAI 开发的划时代人工智能产品 [ChatGPT] 于 2022
阅读更多2024-09-20
qt七个按钮进行互斥
qt七个按钮进行互斥
阅读更多2024-09-20

os kill SMON MMON 加快rollback 进度

CHANGES

CAUSE

SOLUTION

SYMPTOMS

CHANGES

CAUSE

SOLUTION

SYMPTOMS

CHANGES

CAUSE

SOLUTION

相关文章