Logging

Introduction

Log is used to record the history of database changes securely.

The transaction is the unit of execution of database operations.

In the typical embedded SQL system, transactions begin as soon as operations on the database are executed and end with an explicit COMMIT or ROLLBACK (“abort”) command.

How transactions interact with the database? There are 3 address spaces that interact in important ways:

  1. The space of disk blocks holding the database elements.
  2. The virtual or main memory address space that is managed by the buffer manager.
  3. The local address space of the transaction.

事务的定义:

  • 一个数据库操作序列
  • 一个不可分割的工作单位(atomic)
  • 恢复和并发控制的基本单位

The SQL statement COMMIT causes a transaction to complete, database modifications are now permanent in the database.

The SQL statement ROLLBACK also causes the transaction to end, but by aborting. Having no effects on the database. It caused by failures like division by 0 or a constraint violation.

ACID

ACID transactions are:

  • Atomic: Either the whole or the none of the transaction is done.
  • Consistent: Database constraints reserved. For example, a + b = 10, if a changed, b also changes.
  • Isolated: It appears to the user that only one process is executing at one time.
  • Durable: Effects of a process survive a crash.

Correctness Principle

If a transaction executes in the absence of any other transactions or system errors, and it starts with the database in a consistent state, then the database is also in a consistent state when the transaction ends.

State: a DB has state means a value for each of its elements.

Consistent state: certain state satisfies all constraints.

Another description of correctness principle:

  1. a transaction is atomic, and the complete execution of a single transaction is from one consistent state to another consistent state. That is to say, if only part of a transaction executes, then db state may not be consisitent.
  2. multiple transactions execute at the same time may lead to inconsistency unless using concurrecy control.

So, a transaction is a collection of actions that preserve consistency.

If T starts with consistent state and executes in isolation, then T leaves with consistent state.

Three address spaces:

  1. Local address space of transaction (Memory)
  2. Buffer (Memory)
  3. Space of disk

INPUT(X): block containing X disk to buffer/memory

OUTPUT(X): block containing X buffer to disk

READ(X, t): do input(X) if necessary; read value of x in buffer to t in local address space of transaction.

WRITE(X, t): do output(X) if necessary; write t in local address space of transaction to the value of x in buffer.

Undo Logging

: indicates that transaction T has begun

: transaction T has completed successfully and will make no more changes to the database elements

: transcation T could not complete successfully

only update record:

  • transaction T has changed database element X and its former value was v

The change reflected by an update record normally occurs in memory, not disk.

  • the log record is a response to a WRITE action, not an OUTPUT action

undo log must be written to disk in the following order:

  1. The log records that indicating changed database elements.
  2. The changed database elements themselves.
  3. The COMMIT log record.

So we got 2 rules:

U1:

U2:

In order to force log records to disk:

  • The log manager needs a flush-log command that tells the buffer manager to copy to disk any log blocks that have not previously been copied to disk.
  • FLUSH LOG

When a transaction begins, a is written to the log memory; after a series of actions like INPUT or READ (or probably no actions at all), a WRITE command comes, and a is written to the log memory. After the last WRITE operation, it’s time to FLUSH LOG. After flushing log, and are written to the disk. That is to say, before flushing log, the previous logs are still in the log memory.

Then, execute OUTPUT(A), OUTPUT(B), … Only after the value of A, B, … is written to disk, a log can be written to the log memory.

Then FLUSH LOG (after execute OUTPUT B, let’s assume that B is the last one) and is written to disk.

Recovery Rules

Incomplete transactions must be undone!
Scan the undo log from the end:

  • Let S = set if transactions with in log but no or record in log

  • For each in log, in reverse order (latest to earliest) do:

    If Ti in S then WRITE(X, v), OUTPUT(X)

  • For each Ti in S do: write to log

Simple checkpointing

In a simple checkpoint,

  1. Stop accepting new transactions.

  2. Wait until all currently active transactions commit or abort and have written a COMMIT or ABORT record on the log.

  3. Flush the log to disk.

  4. Write a log record , and flush the log again.

  5. Resume accepting transactions.

Problem: must shut the system while the ckpt is being made

Nonquiescent checkpointing

Nonquiescent checkpointing

  1. Write a log record

    and flush the log.

  2. Wait until all of T1, …, TK commit or abort, but do not prohibit other transactions from starting

  3. When all of T1, …, TK have completed, write a log record and flush the log

Two cases:

  1. If we first meet an , then we know that all incomplete transactions began after the previous . Scan backwards as far as

  2. If we first meet an , then the crash occurred during the checkpoint

A general rule:

Once an has been written to disk, we can delete the log prior to the previous .

Redo Logging

Problem for undo logging: First write all its changed data to disk then commit a transaction — too many disk I/Os.

Save disk I/Os: Let changed data reside only in main memory for a while.

Redo logging requires the COMMIT record appear on disk before any changed values reach disk.

Redo logging rules

R1: Before modifying any database element X on disk, it is necessary that all log records pertaining to this modification of X, including both the update record and the record, must appear on disk.

  1. The log records indicating changed db elements.
  2. The COMMIT log record.
  3. The changed db elements themselves.

First COMMIT then OUTPUT.

To recover, using a redo log, after a system crash:

  1. Identify the committed transactions.

  2. Scan the log forward from the beginning. For each log record <T, X, v> encountered:

    a) If T is not a committed transaction, do nothing.

    b) If T is committed, write value v for database element X. (That’s because no COMMIT no values on disk)

  3. For each incomplete transaction T, write an record to the log and flush the log.

The steps to perform a nonquiescent checkpoint of redo log:

  1. Write a log record , where T1, …, Tk are all the active (uncommitted) transactions, and flush the log

  2. Write to disk all database elements that were written to buffers but not yet to disk by transactions that had already committed when the START CKPT record was written to the log

  3. Write an record to the log and flush the log

Undo/redo Logging

The update log record:

  • the former value is v, the new value is w

UR1: Before modifying any db element X on disk because of changes made by some transaction T, it is necessary that the update record appear on disk.

UR2: A record must be flushed to disk as soon as it appears in the log.

The undo/redo recovery policy is:

  1. Redo all the commited transactions in the order earliest-first.
  2. Undo all the incomplete transactions in the order latest-first.

系统故障造成数据库不一致状态的原因

未完成事务对数据库的更新已写入数据库

已提交事务对数据库的更新还留在缓冲区没来得及写入数据库

恢复方法

\1. Undo 故障发生时未完成的事务

\2. Redo 已完成的事务

系统故障的恢复由系统在重新启动时自动完成,不需要用户干预

系统故障的恢复步骤

  1. 正向扫描日志文件(即从头扫描日志文件)

重做(REDO) 队列: 在故障发生前已经提交的事务

这些事务既有BEGIN TRANSACTION记录,也有COMMIT记录

撤销 (Undo)队列:故障发生时尚未完成的事务

这些事务只有BEGIN TRANSACTION记录,无相应的COMMIT记录

  1. 对撤销(Undo)队列事务进行撤销(UNDO)处理

反向扫描日志文件,对每个UNDO事务的更新操作执行逆操作

即将日志记录中“更新前的值”写入数据库

  1. 对重做(Redo)队列事务进行重做(REDO)处理

正向扫描日志文件,对每个REDO事务重新执行登记的操作

即将日志记录中“更新后的值”写入数据库

A nonquiescent checkpoint is somewhat simpler for undo/redo logging than for other logging methods

  1. Write a record to the log, where T1, …, Tk are all the active transactions, and flush the log

  2. Write to disk all the buffers that are dirty; i.e., they contain one or more changed database elements. Unlike redo logging, we flush all dirty buffers, not just those written by committed transactions

  3. Write an record to the log and flush the log

Archiving

Maintaining a copy of the db separate from the db itself.

  • shut down the db for a while
  • make a backup copy on some storage medium
  • Store the copy in some remote secure location

Two levels of archiving:

  1. a full dump, in which the entire db is copied
  2. an incremental dump, in which only those db elements changed since the previous full or incremental dump are copied

Concurrency Control

The process of assuring that transactions preserve consistency when executing simultaneously.

The scheduler takes read/write requests from transactions and executes them in buffers or delay them.

Serial and Serializable Schedules

A schedule is serial if its actions consist of all the actions of one transaction, then all the actions of another transaction, and so on.

The correctness principle tells us that every serial schedule will preserve consistency of the db state.

A schedule is serializable if there is a serial schedule S’ such that for every initial db state, the effects of S and S’ are the same.

Conflicts: A pair of consecutive actions in a schedule such that, if their order is interchanged then the behaviour of at least one of the transactions involved can change.

Author

preccrep

Posted on

2021-06-24

Updated on

2021-06-25

Licensed under

You need to set install_url to use ShareThis. Please set it in _config.yml.
You forgot to set the business or currency_code for Paypal. Please set it in _config.yml.

Comments

You forgot to set the shortname for Disqus. Please set it in _config.yml.