自学内容网 自学内容网

os kill SMON MMON 加快rollback 进度

Then identify the pid of MMON and kill it using:

  kill -9 <pid of mmon>


Note that killing mmon will not terminate the instance and the new mmon process which starts will use _smu_debug_mode=134217728.

 

A large transaction was killed.

Transaction Recovery is stuck and the database can appear to be hanging or hung:

select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo",
decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
from v$fast_start_transactions;

      USN STATE                 Total       Done       ToDo Estimated time to complete            
---------- ---------------- ---------- ---------- ---------- -----------------------------         
       90 RECOVERED             15669      15669          0 01-OCT-2012 05:52:35                                               
       15 RECOVERING           174954       8137     166817 17-OCT-2012 12:32:07 <<<<<<<<<<                                                                                                             
GNTOUN35>/

      USN STATE                 Total       Done       ToDo Estimated time to complete         
---------- ---------------- ---------- ---------- ---------- ----------------------------- 
       90 RECOVERED             15669      15669          0 01-OCT-2012 05:52:39                                                   
       15 RECOVERING           174954       8137     166817 17-OCT-2012 12:33:33 <<<<<<<<<<<<<<<                      
GNTOUN35>/

      USN STATE                 Total       Done       ToDo Estimated time to complete    
---------- ---------------- ---------- ---------- ---------- -----------------------------             
       90 RECOVERED             15669      15669          0 01-OCT-2012 05:52:40   
       15 RECOVERING           174954       8137     166817 17-OCT-2012 12:33:54   <<<<< see no movement for this

 
SMON trace file shows:

service name: SYS$BACKGROUND
   Current Wait Stack:
    0: waiting for 'row cache lock'
       cache id=0x3, mode=0x0, request=0x5
       wait_id=44059 seq_num=44060 snap_id=1
       wait times: snap=4 min 31 sec, exc=4 min 31 sec, total=4 min 31 sec
       wait times: max=infinite, heur=4 min 31 sec
       wait counts: calls=92 os=92
       in_wait=1 iflags=0x15a2
   There is at least one session blocking this session.
     Dumping 1 direct blocker(s):
       inst: 1, sid: 1, ser: 1
     Dumping final blocker:
       inst: 1, sid: 1, ser: 1                     <=======
   Wait State:
     fixed_waits=0 flags=0x2a boundary=0x35cd32608/0
   Session Wait History:
       elapsed time of 0.000121 sec since current wait

 

Hanganalyze shows:

Chain 1:
-------------------------------------------------------------------------------
   Oracle session identified by:
   {
               instance: 1 X1
                  os id: 26192
             process id: 14, oracle@XXXX (SMON)
             session id: 1345
       session serial #: 1
   }
   is waiting for 'row cache lock' with wait info:
   {
                     p1: 'cache id'=0x3
                     p2: 'mode'=0x0
                     p3: 'request'=0x5
           time in wait: 51 min 58 sec
          timeout after: never
                wait id: 44188
               blocking: 0 sessions
            current sql: <none>
            short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-semtimedop()+10
                         <-skgpwwait()+156<-ksliwat()+1821<-kslwaitctx()+162<-kqrget()+851<-kqrLockAndPinPo()+630
                         <-kqrpre1()+765<-kqrpre()+19<-kturRecoverActiveTxns()+3677<-ktprbeg()+313<-ktmmon()+13593
                         <-ktmSmonMain()+206<-ksbrdp()+923<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266
                         <-ssthrdmain()+214<-main()+201<-__libc_start_main()+244<-_start()+36
           wait history:
             * time between current wait and wait #1: 0.000064 sec
             1.       event: 'db file sequential read'
                time waited: 0.001194 sec
                    wait id: 44187           p1: 'file#'=0x299
                                             p2: 'block#'=0xef544
                                             p3: 'blocks'=0x1
             * time between wait #1 and #2: 0.000138 sec
             2.       event: 'db file sequential read'
                time waited: 0.000609 sec
                    wait id: 44186           p1: 'file#'=0x299
                                             p2: 'block#'=0xef547
                                             p3: 'blocks'=0x1
             * time between wait #2 and #3: 0.000164 sec
             3.       event: 'db file sequential read'
                time waited: 0.003346 sec
                    wait id: 44185           p1: 'file#'=0x299
                                             p2: 'block#'=0xef54a
                                             p3: 'blocks'=0x1
   }
   and is blocked by
=> Oracle session identified by:
   {
               instance: 1 (X1)
                  os id: 26196
             process id: 16, oracle@XXXX (MMON)
             session id: 1                     <===============
       session serial #: 1                     <===============

CHANGES

CAUSE

Transaction recovery was blocked by MMON process and the hanganalyze trace file shows the call stack details:

ktucloGetTxTableUndoRec()+879<-ktucloGetMinPrepStartTime()+453<-ktucloUsMinScn()


If ktucloGetTxTableUndoRec is present in the call stack then it indicates that cleanout optimization feature is enabled and going on
 

SOLUTION

Identify the current value of _smu_debug_mode.
Note it down.


Disable the cleanout optimization feature by setting the following parameter:

  _smu_debug_mode=134217728    (which is 0x8000000)


Dynamically you can do this via:

  alter system set "_smu_debug_mode"=134217728;


Then identify the pid of MMON and kill it using:

  kill -9 <pid of mmon>


Note that killing mmon will not terminate the instance and the new mmon process which starts will use _smu_debug_mode=134217728.


Once the transaction recovery is finished, return _smu_debug_mode to the previous value.

-------method 2------------

SYMPTOMS



Parallel Transaction Recovery is taking too long.

You can use V$TRANSACTION USED_UBLK to estimate how long the rollback is going
to take but there is no formula for this.  If you shutdown the database after
rollback has started, it will begin where it left off.  

You can also look at V$FAST_START_TRANSACTIONS for UNDOBLOCKSDONE
versus UNDOBLOCKSTOTAL.

CHANGES

 A large transaction got killed or rolled back.

CAUSE

There are cases where parallel transaction recovery is not as fast as serial
transaction recovery, because the pq slaves are interfering with each other.  
This depends mainly on the type of changes that need to be made during rollback
and usually may happen when rolling back INDEX Updates in parallel.  

SOLUTION

NOTE: In the images and/or the document content below, the user information and data used represents fictitious data from the Oracle sample schema(s) or Public Documentation delivered with an Oracle database product.  Any similarity to actual persons, living or dead, is purely coincidental and not intended in any manner. 

Dynamically switch from parallel recovery to serial.  If you are in a clustered
environment you will need to do this on all instances at the same time:

1. Find SMON's Oracle PID:

Example:

SQL> select pid, program from v$process where program like '%SMON%';

       PID PROGRAM
---------- ------------------------------------------------
         6 oracle@test1 (SMON)


2. Disable SMON transaction cleanup:

SQL> oradebug setorapid 'SMON's Oracle PID';
SQL> oradebug event 10513 trace name context forever, level 2


3. Kill the PQ slaves that are doing parallel transaction recovery.
You can check V$FAST_START_SERVERS to find these.

e.g.

select SPID from V$PROCESS where PID in (select PID from V$FAST_START_SERVERS);

then kill these processes from the OS level with : kill -9 n

where n is the SPID's returned from the previous statement .


4. Turn off fast_start_parallel_rollback:

alter system set fast_start_parallel_rollback=false;

5. Re-enable transaction recovery as it was disabled in step 2.

SQL> oradebug setorapid 'SMON's Oracle PID';
SQL> oradebug event 10513 trace name context off

--------------并行回滚不如串行回滚,尤其是涉及到index更新相关的回滚(index split?)

SYMPTOMS

Sometimes Parallel Rollback of Large Transaction may become very slow. After killing a large running transaction (either by killing the shadow process or aborting the database) then database seems to hang, or smon and parallel query servers taking all the available cpu. After killing shadow process or aborting the database the v$transaction entry is lost, so you cannot estimate by examining v$transaction.used_ublk how the rollback procedure proceeds. In parallel rollback checking v$lock you see SMON having one TX lock and x (where x is the init.ora -> parallel_max_servers value) PS locks (parallel secondary synchronization lock). Also each Parallel Query process has also 1 PS lock. And a lot of redo log activity is happening. In order to identify if this is your case check the following views: V$fast_start_servers -> provides information about all the recovery secondary performing parallel transaction recovery. This contains one row for each parallel query secondary. Column STATE shows the state of the server being IDLE or RECOVERING. Column UNDOBLOCKSDONE shows the percentage of the assigned work done so far. V$fast_start_transactions -> contains one row for each one of the transactions that Oracle is recovering in Parallel. Important columns here are: USN -> the undo segment number of the transaction (join with v$rollstat.usn and v$rollname.usn to find the rollback segment used, v$rollstat.xacts would show a value larger than 0). UNDOBLOCKSDONE -> the number of undo blocks completed in this transaction UNDOBLOCKSTOTAL -> the total number of undo blocks that need recovery CPUTIME -> the time for which recovery has progressed in seconds You may notice that UNDOBLOCKSDONE is not increasing or increases very slowly.

CHANGES

No changes

CAUSE

In fast-start parallel rollback, the background process Smon acts as a coordinator and rolls back a set of transactions in parallel using multiple server processes.

SOLUTION

If you have fall in the above case you can use

  alter system set fast_start_parallel_rollback = false;

in order to disable parallel rollback.

If it hangs, shutdown database and define it inside init.ora file:

  fast_start_parallel_rollback = false

FAST_START_PARALLEL_ROLLBACK shows the maximum number of processes which may
exist for performing parallel rollback. If the value is false, parallel 
rollback is disabled. If the value is LOW (default), 2 * CPU_COUNT number of
processes may be used. If the value is HIGH, at most 4 * CPU_COUNT number of
rollback servers are used for parallel rollback. 
  

Explanation
-----------

In fast-start parallel rollback, the background process Smon acts as a
coordinator and rolls back a set of transactions in parallel using multiple
server processes.

Fast start parallel rollback is mainly useful when a system has transactions
that run a long time before commiting, especially parallel Inserts, Updates,
Deletes operations.  When SMON discovers that the amount of recovery work is 
above a certain threshold, it automatically begins parallel rollback by 
dispersing the work among several parallel processes.

One special form of fast-start parallel rollback is intra-transaction 
recovery.  Intra-transaction parallelism kicks in only when more than 4 
secondary (parallel_max_servers) are available.
(see bug:819005)[This section is not visible to customers.]
There are cases where parallel transaction recovery is not as fast as serial 
transaction recovery, because the pq secondary are interfering with each other. 
It looks like the changes made by this transaction cannot be recovered in 
parallel without causing a performance problem. The parallel rollback secondary 
processes are most likely contending for the same resource, which results in 
even worse rollback performance compared to a serial rollback.  This depends 
mainly on the type of changes that need to be made during rollback and usually
may happen when rolling back INDEX Updates in parallel.
(see bug:1512513 and bug:1737765)[This section is not visible to customers.]
One way to check this is by dumping some blocks of the online redo logs or 
archive logs produced during the parallel rollback by command:

   alter database dump logfile 'log1SID.dbf' 
   rba min <seq#> . 1 
   rba max <seq#> . 100  -> for 100 blocks of redo log file with sequence <seq#>

If you check the Redo Records you may notice that they are doing actions
with Opcodes 10.x (Index actions) and 5.x (rollback actions).

(see Note:29733.1) [This section is not visible to customers.]


原文地址:https://blog.csdn.net/jnrjian/article/details/140416471

免责声明:本站文章内容转载自网络资源,如本站内容侵犯了原著者的合法权益,可联系本站删除。更多内容请关注自学内容网(zxcms.com)!