A Simple Syntax-Directed Translator

Definition of Grammars

A context-free grammar composed of:

  • A set of terminal symbols referred to as “tokens”.
  • A set of nonterminals called “syntactic variables”.
  • A set of productions, where each production consists of a nonterminal, called the head or left side of the production, an arrow, and a sequence of terminals and/or nonterminals, called the body or right side of the production.
  • A designation of one of the nonterminals as the start symbol.
Read more

Lexical Analysis

动漫歌曲罗马音

ヴァイオレット・エヴァーガーデン

Sincerely

avatar

shi ra na i kotoba wo oboe te i ku ta bi

知 ら な い 言葉 を 覚え て い く た び

o mo ka ge no na ka de wo no ba su no

お も か げ の な か 手 を 伸 ば す の

Read more

Data Mining Concepts and Techniques

数据仓库技术是为了有效地把操作型数据集成到统一的环境中以提供决策型数据访问的各种技术和模型的总称。

数据处理分为:联机事务处理(on-line transaction processing, OLTP)联机分析处理(on-line analytical processing, OLAP).

OLTP是传统的操作型数据库系统的主要应用,主要是一些基本的日常事务处理,如银行柜台存取款、股票交易和商场POS系统等。

OLAP是数据仓库系统的主要应用,支持复杂的分析操作,侧重决策支持,并且提供直观易懂的查询结果。

Read more

NLP入门

Python中有一个自然语言处理工具包(Natural Language Toolkit,简称NLTK)的开源库。NLTK包含大量软件、数据和文档,所有这些都可以从官网免费下载。

语言处理任务与相应的NLTK模块:

语言处理任务 NLTK模块 功能描述
获取和处理语料库 nltk.corpus 语料库和词典的标准化接口
字符串处理 nltk.tokenize, nltk.stem 分词,句子分解提取主干
搭配发现 nltk.collocations t-检验,卡方,点互信息 PMI
词性标识符 nltk.tag n-gram,backoff,Brill,HMM,TnT
分类 nltk.classify, nltk.cluster 决策树,最大熵,贝叶斯,EM,k-means
分块 nltk.chunk 正则表达式,n-gram ,命名实体
解析 nl tk.parse 图表,基于特征,一致性,概率,依赖
语义解释 nltk.sem, nltk.inference λ演算,一阶逻辑,模型检验
指标评测 nltk.metrics 精度,召回率,协议系数
概率与估计 nltk.probability 频率分布,平滑概率分布
应用 nltk.app, nltk.chat 图形化的关键词排序,分析器,WordNet 查看器,聊天机器人
语言学领域的工作 nltk.toolbox 处理 SIL 工具箱格式的数据

编译原理

词法分析

词法分析的主要任务:

从左向右逐行扫描源程序的字符,识别出各个单词,确定单词的类型,并将识别出的单词转换成统一的机内表示——词法单元token形式。

token: <种别码,属性值>

e.g. 词法分析后得到的token序列:

输入:while(value != 100) {num++;}

输出:

  1. while < WHILE, - >
  2. ( < SLP, - >
  3. value < IDN, value >
  4. != < NE, - >
  5. 100 < CONST, 100 >
  6. ) < SRP, - >
  7. { < LP, - >
  8. num < IDN, num >
  9. ++ < INC, - >
  10. ; < SEMI, - >
  11. } < RP, - >
Read more

网络攻防

网络的漏洞和攻击在描述每一协议层的漏洞和攻击的架构时可分为4类:

  • 基于头部的漏洞和攻击:修改了协议头部,或使头部无效。
  • 基于协议的漏洞和攻击:数据包是有效的,但不能直接使用。
  • 基于验证的漏洞和攻击:修改了发送方和接收方的识别标志。
  • 基于流量的漏洞和攻击:借助流量实施攻击。

定义

  • 应用:指允许用户连接网络并执行某项任务的计算机程序。
  • 攻击者:指利用网络攻击计算机系统、网络或其他连接在互联网中的设备的个人或群体。
  • 黑客:即攻击者。
  • 主机:连接到网络上的计算机。
  • 互联网:互连众多网络设备的全球网络的集合。
  • 网络:一组互连的设备并可以相互通信。
  • 网络设备:连接到网络上的设备,泛指包括主机或计算机在内的所有设备,这些设备使网络可以运行起来。
  • 目标:黑客试图攻击的设备、主机、用户或目标。
  • 用户:利用网络执行计算机程序的个人,或一般的计算机用户。
Read more

Swift开发学习

创建一个新的APP

在Xode的欢迎界面中,点击 Create a new Xcode project 创建新项目。模板选择器可以帮助你利用预设模板创建新项目,在 macOS 选项卡中你可以看到四种不同的核心macOS app类型:

  • App: 基于窗口UI的macOS桌面应用,Cocoa是所有macOS应用程序的框架。
  • Document App:
  • Game: 基于苹果SpriteKit或SceneKit框架的游戏程序。
  • Command Line Tool: 基于命令行文本UI,在shell中运行的实用工具。

选择 App 并点击 Next,在下一界面中设置产品名称为 HelloWorld,其他保持默认。

最后再点击 Next,并选择你要保存项目的位置。

Swift 的 playground 就像是一个可交互的文档,它是用来练手学swift的,写一句代码出一行结果(右侧),可以实时查看代码结果,是学习swift语言的利器!

Swift语言

基本类型

使用 let 创建一个常量,var 创建一个变量。

1
2
3
var myVar = 100
myVar = 50
let PI = 3.1415926

每个常量和变量在Swift中都有一个类型,但是不需要显式地把类型写出来。当你声明一个常量或变量的时候,提供给这个常量或变量一个值,让编译器去推断它的类型。

如果初始值提供不了足够的信息(或没有初始值),那么就需要在变量名之后为其指定类型,中间通过冒号分隔。

1
2
3
let implicitIntrger = 70
let implicitDouble = 70.0
let explicitDouble: Double = 70

如果你需要进行类型转换,就显式声明:

1
2
3
let label = "The width is "
let width = 94
let widthLabel = label + String(width)

还有一种更简单的方法:

1
2
3
4
let apple = 3
let pear = 5
let appleSum = "I have \(apple) apples."
let fruitSum = "I have \(apple + pear) pieces of fruit."

使用三引号(“””)表示占用多行字符串:

1
2
3
4
let quotation = """
I said "I have \(apple) apples."
And then I said "I have \(apple + pear) pieces of fruit."
"""

使用 [] 创建数组和字典:

1
2
3
4
5
6
7
8
9
10
11
var list = ["A", "B", "C"] //字符也要用"",因为是作为String类型的
list[1] = "P"
// then list is ["A", "P", "C"]
// index starts from 0

var dict = [
"Malcolm": "Captain"
"Kaylee": "Mechanic"
]
dict["preccrep"] = "programmer"
print(dict)

向数组中添加元素时,数组自动增长:

1
2
list.append("D")
print(list)

创建空数组或空字典:

1
2
let empArr: [String] = []
let empDict: [String: Float] = [:]

如果数据类型可以被推断,只需简单地写成:

1
2
empArr = []
empDict = [:]

条件控制

使用 ifswitch 来创建条件语句,使用 for-in, whilerepeat-while 创建循环。条件加不加括号都可以,但是后面的内容必须用花括号括起来。

1
2
3
4
5
6
7
8
9
10
let scores = [75, 43, 103, 87, 12]
var teamScore = 0
for score in scores {
if score > 50 {
teamScore += 3
} else {
teamScore += 1
}
}
print(teamScore)

这里注意,条件必须是显式的布尔表达式,像C/C++里的 if(score) 是行不通的,而是要写为 if score != 0

在if语句中使用可选绑定来检查可选类型是否有值:

1
2
3
4
5
6
7
8
9
10
11
12
13
var optionalName: String? = "John Appleseed"
var greeting = "Hello!"
if let name = optionalName {
greeting = "Hello, \(name)"
}
// print: Hello, John Appleseed

var optionalName: String? = nil
var greeting = "Hello!"
if let name = optionalName {
greeting = "Hello, \(name)"
}
// print: Hello!

实际上,当optionalName = nil时,if的条件为false。其实很好理解,就是先让name等于optionalName,然后判断name是不是nil。An optional value either contains a value or contains nil to indicate that a value is missing. Write a question mark (?) after the type of a value to mark the value as optional.

Another way to handle optional values is to provide a default value using the ?? operator. If the optional value is missing, the default value is used instead.

1
2
3
let nickname: String? = nil
let fullname: String = "John Appleseed"
let informalGreeting = "Hi \(nickname ?? fullname)"

Switches support any kind of data and a wide variety of comparison operations—they aren’t limited to integers and tests for equality.

1
2
3
4
5
6
7
8
9
10
11
let vegetable = "red pepper"
switch vegetable {
case "celery":
print("Add some raisins and make ants on a log.")
case "cucumber", "watercress":
print("That would make a good tea sandwich.")
case let x where x.hasSuffix("pepper"):
print("Is it a spicy \(x)?")
default:
print("Everything tastes good in soup.")
}

After executing the code inside the switch case that matched, the program exits from the switch statement. Execution doesn’t continue to the next case, so you don’t need to explicitly break out of the switch at the end of each case’s code.

You use for-in to iterate over items in a dictionary by providing a pair of names to use for each key-value pair. Dictionaries are an unordered collection, so their keys and values are iterated over in an arbitrary order.

1
2
3
4
5
6
7
8
9
10
11
12
13
let interestingNumbers = [
"Prime": [2,3,5,7,11,13],
"Fibonacci": [1,1,2,3,5,8],
"Square": [1,4,9,16,25],
]
var largest = 0
for (_, numbers) in interestingNumbers {
for numebr in numbers {
if number > largest {
largest = number
}
}
}

Use while to repeat a block of code until a condition changes. The condition of a loop can be at the end instead, ensuring that the loop is run at least once.

1
2
3
4
5
6
7
8
9
10
11
var n = 2
while n < 100 {
n *= 2
}
print(n)

var m = 2
repeat {
m *= 2
} while m < 100
print(m)

You can keep an index in a loop by using ..< to make a range of indexes.

1
2
3
4
5
var total = 0
for i in 1..<4 {
total += i
}
print(total) // 6

Use ..< to make a range that omits its upper value, and use ... to make a range that includes both values.

1
2
3
4
5
var total = 0
for i in 1...4 {
total += i
}
print(total) // 10

函数和方法

函数声明可以包含0个或者多个参数,格式如name:Type。这些参数是额外的信息,当你去调用方法的时候,你必须把这些额外的信息传递进去。另外,函数有可选的返回类型,格式如:->返回类型,这表明这个函数返回的结果。函数的视线写在大括号({})中。

1
2
3
4
func greet(person: String, day: String) -> String {
return "Hello \(person), today is \(day)."
}
greet(person: "Bob", day: "Tuesday")

By default, functions use their parameter names as labels for their arguments. Write a custom argument label before the parameter name, or write _ to use no argument label.

1
2
3
4
func greet(_ person: String, on day: String) -> String {
return "Hello \(person), today is \(day)."
}
greet("John", on: "Wednesday")

Use a tuple to make a compound value—for example, to return multiple values from a function. The elements of a tuple can be referred to either by name or by number.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
func calc(scores: [Int]) -> (min: Int, max: Int, sum: Int) {
var min = scores[0]
var max = scores[0]
var sum = 0

for score in scores {
if score > max {
max = score
} else if score < min {
min = score
}
sum += score
}
return (min, max, sum)
}
let statistics = calc(scores: [5,3,100,3,9])
print(statistics.sum) // 120
print(statistics.2) // 120

Functions can be nested. Nested functions have access to variables that were declared in the outer function. You can use nested functions to organize the code in a function that’s long or complex.

1
2
3
4
5
6
7
8
9
func returnFifteen() -> Int {
var y = 10
func add() {
y += 5
}
add()
return y
}
returnFifteen()

Functions are a first-class type. This means that a function can return another function as its value.

1
2
3
4
5
6
7
8
func makeIncrementer -> ((Int) -> Int) {
func addOne(number: Int) -> Int {
return 1 + number
}
return addOne
}
var increment = makeIncrementer()
increment(7)

A function can take another function as one of its arguments.

1
2
3
4
5
6
7
8
9
10
11
12
func hasAnyMatches(list: [Int], condition: (Int) -> Bool) -> Bool {
for item in list {
if condition(item) {
return true
}
}
return false
}
func lessThanTen(number: Int) -> Bool {
return number < 10
}
hasAnyMatches(list: [20,19,7,12], condition: lessThanTen)

Functions are actually a special case of closures: blocks of code that can be called later. The code in a closure has access to things like variables and functions that were available in the scope where the closure was created, even if the closure is in a different scope when it’s executed—you saw an example of this already with nested functions. You can write a closure without a name by surrounding code with braces ({}). Use in to separate the arguments and return type from the body.

1
2
3
4
5
6
var numbers = [20, 19, 7, 12]
print(numbers.map({ (number: Int) -> Int in
let res = number * 3
return res
}))
// [60, 57, 21, 36]
map: Returns an array containing the results of mapping the given closure over the sequence’s elements.
Declarationfunc map<T>(_ transform: (Int) throws -> T) rethrows -> [T]
DiscussionIn this example, map is used first to convert the names in the array to lowercase strings and then to count their characters.let cast = ["Vivien", "Marlon", "Kim", "Karl"] let lowercaseNames = cast.map { $0.lowercased() } // 'lowercaseNames' == ["vivien", "marlon", "kim", "karl"] let letterCounts = cast.map { $0.count } // 'letterCounts' == [6, 6, 3, 4]
ParameterstransformA mapping closure. transform accepts an element of this sequence as its parameter and returns a transformed value of the same or of a different type.-No description.
ReturnsAn array containing the transformed elements of this sequence.

You have several options for writing closures more concisely. When a closure’s type is already known, such as the callback for a delegate, you can omit the type of its parameters, its return type, or both. Single statement closures implicitly return the value of their only statement.

1
2
3
let mappedNumbers = numbers.map({ number in 3*number })
print(mappedNumbers)
// [60, 57, 21, 36]

You can refer to parameters by number instead of by name—this approach is especially useful in very short closures. A closure passed as the last argument to a function can appear immediately after the parentheses. When a closure is the only argument to a function, you can omit the parentheses entirely.

1
2
let sortedNumbers = numbers.sorted { $0 > $1 }
print(sortedNumbers) // [20, 19, 12, 7]

对象与类

Use class followed by the class’s name to create a class. A property declaration in a class is written the same way as a constant or variable declaration, except that it’s in the context of a class. Likewise, method and function declarations are written the same way.

1
2
3
4
5
6
7
8
9
10
class Shape {
var numberOfSides = 0
let constSide = 10
func simpleDescription() -> String {
return "A shape with \(numberOfSides) sides."
}
func returnSides() -> Int {
return constSide + 10
}
}

Create an instance of a class by putting parentheses after the class name. Use dot syntax to access the properties and methods of the instance.

1
2
3
var shape = Shape()
shape.numberOfSides = 7
var shapeDescription = shape.simpleDescription()

Use init:

1
2
3
4
5
6
7
8
9
10
11
12
class NamedShape {
var numberOfSides = 0
var name: String

init(name: String) {
self.name = name
}

func simpleDescription() -> String {
return "A shape with \(numberOfSides) sides."
}
}

Override:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
class Square: NamedShape {
var sideLength: Double

init(sideLength: Double, name: String) {
self.sideLength = sideLength
super.init(name: name)
numberOfSides = 4
}

func area() -> Double {
return sideLength * sideLength
}

override func simpleDescription() -> String {
return "A square with sides of length \(sideLength)."
}
}
let test = Square(sideLength: 5.2, name: "my test square")
test.area()
test.simpleDescription()
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
class Circle: NamedShape {
var radius: Double
let PI = 3.1415926

init(radius: Double, name: String) {
self.radius = radius
super.init(name: name)
}

func getArea() -> Double {
return PI * radius * radius
}

override func simpleDescription() -> String {
return "This is a circle with name \(name), radius \(radius) and area \(getArea())."
}
}
var c1 = Circle(radius: 1.23, name: "MyCircle")
print(c1.simpleDescription())

getter and a setter:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
class EquilateralTriangle: NamedShape {
var sideLength: Double = 0.0
init(sideLength: Double, name: String) {
self.sideLength = sideLength
super.init(name: name)
numberOfSides = 3
}
var perimeter: Double {
get {
return 3.0 * sideLength
}
set {
sideLength = newValue / 3.0
}
}
override func simpleDescription() -> String {
return "An equilateral triangle with sides of length \(sideLength)."
}
}
var triangle = EquilateralTriangle(sideLength: 3.1, name: "a triangle")
print(triangle.perimeter)
// Prints "9.3"
triangle.perimeter = 9.9
print(triangle.sideLength)
// Prints "3.3000000000000003"

If you don’t need to compute the property but still need to provide code that’s run before and after setting a new value, use willSet and didSet. The code you provide is run any time the value changes outside of an initializer. For example, the class below ensures that the side length of its triangle is always the same as the side length of its square.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
class TriangleAndSquare {
var triangle: EquilateralTriangle {
willSet {
square.sideLength = newValue.sideLength
}
}
var square: Square {
willSet {
triangle.sideLength = newValue.sideLength
}
}
init(size: Double, name: String) {
square = Square(sideLength: size, name: name)
triangle = EquilateralTriangle(sideLength: size, name: name)
}
}
var triangleAndSquare = TriangleAndSquare(size: 10, name: "another test shape")
print(triangleAndSquare.square.sideLength)
print(triangleAndSquare.triangle.sideLength)
triangleAndSquare.square = Square(sideLength: 50, name: "larger square")
print(triangleAndSquare.triangle.sideLength)

Optional value:

1
2
let optionalSquare: Square? = Square(sideLength: 2.5, name: "optional square")
let sideLength = optionalSquare?.sideLength

枚举和结构体

Use enum to create an enumeration. Like classes and all other named types, enumerations can have methods associated with them.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
enum Rank: Int {
case ace = 10
case two, three, four, five, six, seven, eight, nine, ten
case jack, queen, king
func simpleDescription() -> String {
switch self {
case .ace:
return "ace"
case .jack:
return "jack"
case .queen:
return "queen"
case .king:
return "king"
default:
return String(self.rawValue)
}
}
}
let ace = Rank.ace
let aceRawValue = ace.rawValue // 10

By default, Swift assigns the raw values starting at zero and incrementing by one each time, but you can change this behavior by explicitly specifying values. In the example above, ace is explicitly given a raw value of 1, and the rest of the raw values are assigned in order. You can also use strings or floating-point numbers as the raw type of an enumeration. Use the rawValue property to access the raw value of an enumeration case.

Use the init?(rawValue:) initializer to make an instance of an enumeration from a raw value. It returns either the enumeration case matching the raw value or nil if there’s no matching Rank.

1
2
3
if let convertedRank = Rank(rawValue: 3) {
let threeDescription = convertedRank.simpleDescription()
}
Summary: A half-open interval from a lower bound up to, but not including, an upper bound.
Declarationstruct Range<Bound> where Bound : Comparable
DiscussionYou create a Range instance by using the half-open range operator (..<).let underFive = 0.0..<5.0You can use a Range instance to quickly check if a value is contained in a particular range of values. For example:underFive.contains(3.14) // true underFive.contains(6.28) // false underFive.contains(5.0) // false``Range instances can represent an empty interval, unlike ClosedRange.let empty = 0.0..<0.0 empty.contains(0.0) // false empty.isEmpty // trueUsing a Range as a Collection of Consecutive ValuesWhen a range uses integers as its lower and upper bounds, or any other type that conforms to the Strideable protocol with an integer stride, you can use that range in a for-in loop or with any sequence or collection method. The elements of the range are the consecutive values from its lower bound up to, but not including, its upper bound.for n in 3..<5 { print(n) } // Prints "3" // Prints "4"Because floating-point types such as Float and Double are their own Stride types, they cannot be used as the bounds of a countable range. If you need to iterate over consecutive floating-point values, see the stride(from:to:by:) function.

Most app UIs are very visual experiences, so most accessibility work focuses on VoiceOver — a screen reader that lets people with low to no vision use Apple devices without needing to see their screens. VoiceOver reads out information to users about your app’s UI elements. It’s up to you to make sure this information helps users interact efficiently with your app.

协议和扩展

使用 protocol 声明一个协议:

1
2
3
4
protocol ExampleProtocol {
var simpleDescription: String { get }
mutating func adjust()
}

类、枚举和结构都可以采用协议:

1
2
3
4
5
6
7
8
class SimpleClass: ExampleProtocol {
var simpleDescription: String = "A very simple class."
var anotherProperty: Int = 69105
func adjust() {
simpleDescription += " Now 100% adjusted."
}
}

Logging

Introduction

Log is used to record the history of database changes securely.

The transaction is the unit of execution of database operations.

In the typical embedded SQL system, transactions begin as soon as operations on the database are executed and end with an explicit COMMIT or ROLLBACK (“abort”) command.

How transactions interact with the database? There are 3 address spaces that interact in important ways:

  1. The space of disk blocks holding the database elements.
  2. The virtual or main memory address space that is managed by the buffer manager.
  3. The local address space of the transaction.

事务的定义:

  • 一个数据库操作序列
  • 一个不可分割的工作单位(atomic)
  • 恢复和并发控制的基本单位

The SQL statement COMMIT causes a transaction to complete, database modifications are now permanent in the database.

The SQL statement ROLLBACK also causes the transaction to end, but by aborting. Having no effects on the database. It caused by failures like division by 0 or a constraint violation.

ACID

ACID transactions are:

  • Atomic: Either the whole or the none of the transaction is done.
  • Consistent: Database constraints reserved. For example, a + b = 10, if a changed, b also changes.
  • Isolated: It appears to the user that only one process is executing at one time.
  • Durable: Effects of a process survive a crash.

Correctness Principle

If a transaction executes in the absence of any other transactions or system errors, and it starts with the database in a consistent state, then the database is also in a consistent state when the transaction ends.

State: a DB has state means a value for each of its elements.

Consistent state: certain state satisfies all constraints.

Another description of correctness principle:

  1. a transaction is atomic, and the complete execution of a single transaction is from one consistent state to another consistent state. That is to say, if only part of a transaction executes, then db state may not be consisitent.
  2. multiple transactions execute at the same time may lead to inconsistency unless using concurrecy control.

So, a transaction is a collection of actions that preserve consistency.

If T starts with consistent state and executes in isolation, then T leaves with consistent state.

Three address spaces:

  1. Local address space of transaction (Memory)
  2. Buffer (Memory)
  3. Space of disk

INPUT(X): block containing X disk to buffer/memory

OUTPUT(X): block containing X buffer to disk

READ(X, t): do input(X) if necessary; read value of x in buffer to t in local address space of transaction.

WRITE(X, t): do output(X) if necessary; write t in local address space of transaction to the value of x in buffer.

Undo Logging

: indicates that transaction T has begun

: transaction T has completed successfully and will make no more changes to the database elements

: transcation T could not complete successfully

only update record:

  • transaction T has changed database element X and its former value was v

The change reflected by an update record normally occurs in memory, not disk.

  • the log record is a response to a WRITE action, not an OUTPUT action

undo log must be written to disk in the following order:

  1. The log records that indicating changed database elements.
  2. The changed database elements themselves.
  3. The COMMIT log record.

So we got 2 rules:

U1:

U2:

In order to force log records to disk:

  • The log manager needs a flush-log command that tells the buffer manager to copy to disk any log blocks that have not previously been copied to disk.
  • FLUSH LOG

When a transaction begins, a is written to the log memory; after a series of actions like INPUT or READ (or probably no actions at all), a WRITE command comes, and a is written to the log memory. After the last WRITE operation, it’s time to FLUSH LOG. After flushing log, and are written to the disk. That is to say, before flushing log, the previous logs are still in the log memory.

Then, execute OUTPUT(A), OUTPUT(B), … Only after the value of A, B, … is written to disk, a log can be written to the log memory.

Then FLUSH LOG (after execute OUTPUT B, let’s assume that B is the last one) and is written to disk.

Recovery Rules

Incomplete transactions must be undone!
Scan the undo log from the end:

  • Let S = set if transactions with in log but no or record in log

  • For each in log, in reverse order (latest to earliest) do:

    If Ti in S then WRITE(X, v), OUTPUT(X)

  • For each Ti in S do: write to log

Simple checkpointing

In a simple checkpoint,

  1. Stop accepting new transactions.

  2. Wait until all currently active transactions commit or abort and have written a COMMIT or ABORT record on the log.

  3. Flush the log to disk.

  4. Write a log record , and flush the log again.

  5. Resume accepting transactions.

Problem: must shut the system while the ckpt is being made

Nonquiescent checkpointing

Nonquiescent checkpointing

  1. Write a log record

    and flush the log.

  2. Wait until all of T1, …, TK commit or abort, but do not prohibit other transactions from starting

  3. When all of T1, …, TK have completed, write a log record and flush the log

Two cases:

  1. If we first meet an , then we know that all incomplete transactions began after the previous . Scan backwards as far as

  2. If we first meet an , then the crash occurred during the checkpoint

A general rule:

Once an has been written to disk, we can delete the log prior to the previous .

Redo Logging

Problem for undo logging: First write all its changed data to disk then commit a transaction — too many disk I/Os.

Save disk I/Os: Let changed data reside only in main memory for a while.

Redo logging requires the COMMIT record appear on disk before any changed values reach disk.

Redo logging rules

R1: Before modifying any database element X on disk, it is necessary that all log records pertaining to this modification of X, including both the update record and the record, must appear on disk.

  1. The log records indicating changed db elements.
  2. The COMMIT log record.
  3. The changed db elements themselves.

First COMMIT then OUTPUT.

To recover, using a redo log, after a system crash:

  1. Identify the committed transactions.

  2. Scan the log forward from the beginning. For each log record <T, X, v> encountered:

    a) If T is not a committed transaction, do nothing.

    b) If T is committed, write value v for database element X. (That’s because no COMMIT no values on disk)

  3. For each incomplete transaction T, write an record to the log and flush the log.

The steps to perform a nonquiescent checkpoint of redo log:

  1. Write a log record , where T1, …, Tk are all the active (uncommitted) transactions, and flush the log

  2. Write to disk all database elements that were written to buffers but not yet to disk by transactions that had already committed when the START CKPT record was written to the log

  3. Write an record to the log and flush the log

Undo/redo Logging

The update log record:

  • the former value is v, the new value is w

UR1: Before modifying any db element X on disk because of changes made by some transaction T, it is necessary that the update record appear on disk.

UR2: A record must be flushed to disk as soon as it appears in the log.

The undo/redo recovery policy is:

  1. Redo all the commited transactions in the order earliest-first.
  2. Undo all the incomplete transactions in the order latest-first.

系统故障造成数据库不一致状态的原因

未完成事务对数据库的更新已写入数据库

已提交事务对数据库的更新还留在缓冲区没来得及写入数据库

恢复方法

\1. Undo 故障发生时未完成的事务

\2. Redo 已完成的事务

系统故障的恢复由系统在重新启动时自动完成,不需要用户干预

系统故障的恢复步骤

  1. 正向扫描日志文件(即从头扫描日志文件)

重做(REDO) 队列: 在故障发生前已经提交的事务

这些事务既有BEGIN TRANSACTION记录,也有COMMIT记录

撤销 (Undo)队列:故障发生时尚未完成的事务

这些事务只有BEGIN TRANSACTION记录,无相应的COMMIT记录

  1. 对撤销(Undo)队列事务进行撤销(UNDO)处理

反向扫描日志文件,对每个UNDO事务的更新操作执行逆操作

即将日志记录中“更新前的值”写入数据库

  1. 对重做(Redo)队列事务进行重做(REDO)处理

正向扫描日志文件,对每个REDO事务重新执行登记的操作

即将日志记录中“更新后的值”写入数据库

A nonquiescent checkpoint is somewhat simpler for undo/redo logging than for other logging methods

  1. Write a record to the log, where T1, …, Tk are all the active transactions, and flush the log

  2. Write to disk all the buffers that are dirty; i.e., they contain one or more changed database elements. Unlike redo logging, we flush all dirty buffers, not just those written by committed transactions

  3. Write an record to the log and flush the log

Archiving

Maintaining a copy of the db separate from the db itself.

  • shut down the db for a while
  • make a backup copy on some storage medium
  • Store the copy in some remote secure location

Two levels of archiving:

  1. a full dump, in which the entire db is copied
  2. an incremental dump, in which only those db elements changed since the previous full or incremental dump are copied

Concurrency Control

The process of assuring that transactions preserve consistency when executing simultaneously.

The scheduler takes read/write requests from transactions and executes them in buffers or delay them.

Serial and Serializable Schedules

A schedule is serial if its actions consist of all the actions of one transaction, then all the actions of another transaction, and so on.

The correctness principle tells us that every serial schedule will preserve consistency of the db state.

A schedule is serializable if there is a serial schedule S’ such that for every initial db state, the effects of S and S’ are the same.

Conflicts: A pair of consecutive actions in a schedule such that, if their order is interchanged then the behaviour of at least one of the transactions involved can change.

MIPS汇编语言编程

MIPS下各个寄存器编号及描述

寄存器编号 寄存器名 寄存器用途
0 zero 永远返回0
1 $at 汇编保留寄存器(不可用作其他用途)
2-3 $v0-$v1 (value的简写)存储表达式或者函数的返回值
4-7 $a0-$a3 (Argument简写)存储子程序的前4个参数,在子程序调用过程中释放
8-15 $t0-$t7 (temp简写)临时变量,同上调用时不保存
16-23 $s0-$s7 (Save or Static简写?)静态变量?调用时保存
24-25 $t8-$t9 (Temp简写)算是前面$0-$7的继续,属性同$t0-$t7
26-27 $k0-$k1 (break off简写?)中断函数返回值,不可做其他用途
28 gp\ (GlobalPointer简写)指向64k(gp\ (GlobalPointer简写)指向64k(2^{16}$)大小的静态数据块的中间地址
29 $sp (Stack Pointer简写)栈指针,指向栈顶
30 $s8/$fp (Save / Frame Pointer)帧指针
31 $ra 返回地址,目测不可用作其他用途

程序结构

数据声明 + 普通文本 + 程序编码(文件后缀是 .s 或 .asm)

数据声明在代码段之后(在之前也没什么问题)

数据

  • 数据段以 .data 为开始标志
  • 声明变量后,即在主存中分配空间

代码

  • 代码段以 .text 为开始标志
  • 各项指令操作
  • 程序入口标志:main:
  • 程序结束标志

注释

同C系语言。

基本模板:

1
2
3
4
5
6
7
8
9
10
11
12
13
# Comment giving name of program and description of function
# ...
# Template.s
.data # variable declarations follow this line
# 数据变量声明
# ...
.text # instructions follow this line
# 代码段
main: # indicates start of code (first instruction to execute)
# 主程序
# ...

# End of program, leave a blank line

数据声明

声明的格式:

1
变量名: 数据类型 变量值

冒号不可缺少。

通常给变量赋一个初始值,对于 .space,要指明需要多少大小的空间(bytes)。

加载/保存 指令集

  • 如果要访问内存,只能使用 load/store 指令
  • 其他的只能是寄存器操作

load

lw

lw register_dest, RAM_src

从内存中复制RAM_src的内容到对应的寄存器中(w即word,一个字长,4个字节,因此该数据大小为4个字节)

lb

lb register_dest, RAM_src

同上,lb为load byte.

store

sw

sw register_src, RAM_dest

指将指定寄存器中的数据写入到特定的内存中。

sb

sb register_src, RAM_dest

同上,但数据大小为1字节。

load imm

li

li register_dest, value

加载立即数。

立即与间接寻址

la

Load address 直接给地址

例如,la $t0, var1 将 var1 (表示的是内存地址)放到 $t0 中。

如果变量var1不是内存地址,就成 li 指令了。

间接寻址

lw: 寄存器中存储的是内存地址,需要访问该内存地址得到数据再放到目的寄存器中。

lw $t2,($t0) load word at RAM address contained in $t0 into $t2
sw $t2,($t0) store word in register $t2 into RAM at address contained in $t0

加偏移量

lw $t2,4($t0) load word at RAM address ($t0+4) into register $t2, ”4” gives offset from address in register $t0
sw $t2,-12($t0) store word in register $t2 into RAM at address ($t0 - 12),negative offsets are fine

算术指令集

  • 最多3个操作数
  • 操作数只能是寄存器,不允许出现地址
  • 所有指令统一是32位
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
add $t0,$t1,$t2 # $t0 = $t1 + $t2; add as signed (2's complement) integers
sub $t2,$t3,$t4 # $t2 = $t3 - $t4
addi $t2,$t3, 5 # $t2 = $t3 + 5; "add immediate" (no sub immediate)
addu $t1,$t6,$t7 # $t1 = $t6 + $t7; add as unsigned integers
subu $t1,$t6,$t7 # $t1 = $t6 - $t7; subtract as unsigned integers
mult $t3,$t4 # multiply 32-bit quantities in $t3 and $t4, and store 64-bit
# result in special registers Lo and Hi: (Hi,Lo) = $t3 * $t4
            #             运算结果存储在hi,lo(hi高位数据, lo低位数据)
div $t5,$t6 # Lo = $t5 / $t6 (integer quotient)
# Hi = $t5 mod $t6 (remainder)
            #             商数存放在 lo, 余数存放在 hi
mfhi $t0 # move quantity in special register Hi to $t0: $t0 = Hi
        #                 不能直接获取 hi 或 lo中的值, 需要mfhi, mflo指令传值给寄存器
mflo $t1 # move quantity in special register Lo to $t1: $t1 = Lo
# used to get at result of product or quotient

move $t2,$t3 # $t2 = $t3

控制流

分支(if else系列)

1
2
3
4
5
6
7
b   target      #  unconditional branch to program label target
beq $t0,$t1,target # branch to target if $t0 = $t1
blt $t0,$t1,target # branch to target if $t0 < $t1
ble $t0,$t1,target # branch to target if $t0 <= $t1
bgt $t0,$t1,target # branch to target if $t0 > $t1
bge $t0,$t1,target # branch to target if $t0 >= $t1
bne $t0,$t1,target # branch to target if $t0 <> $t1

跳转(while, for, goto系列)

1
2
3
4
j   target      #  unconditional jump to program label target
# 看到就跳, 不用考虑任何条件
jr $t3 # jump to address contained in $t3 ("jump register")
      # 类似相对寻址,跳到该寄存器给出的地址处

子程序调用

1
jal sun_label   # "jump and link"

将当前程序计数器保存在 $ra 中,通过上面保存在 $ra 中的计数器返回到调用前。

如果调用的子程序中有调用了其他的子程序,如此往复,则返回地址的标记就用栈来存储。

系统调用与输入/输出(主要针对SPIM模拟器)

  • 使用syscall,以下指令应该是通用的。
  • 参数使用的寄存器:$v0, $a0, $a1.
  • 返回值使用:$v0.
Service Code in $v0 对应功能的调用码 Arguemnts 所需参数 Results 返回值
打印一个整型 $v0=1 将要打印的整型赋值给$a0
打印一个浮点数 $v0=2 将要打印的浮点数赋值给$f12
打印双精度浮点数 $v0=3 将要打印的双精度浮点数赋值给$f12
打印字符串 $v0=4 将要打印的字符串的地址赋值给$a0
读取一个整型 $v0=5 将读取的整型赋值给$v0
读取浮点数 $v0=6 将读取的浮点数赋值给$v0
读取双精度浮点数 $v0=7 将读取的双精度浮点数赋值给$v0
读取字符串 $v0=8 将读取字符串地址赋值给$a0,将读取字符串的长度赋值给$a1
sbrk(应该同C中的sbrk()函数一样) 动态分配内存 $v0=9 $a0=amount 需要分配的空间大小,单位目测是字节 将分配好的空间首地址给$a0
退出 \v0=10 退出
  • 打印的字符串应该有一个终止符(‘\0’),声明字符串为 .asciiz 类型即可。
  • 对于读取整型,浮点数,双精度浮点数等数据操作,系统会读取一整行(以’\n’为结束)
  • 读取字符串时,输入过长就截短,短了不补,最后会加上终止符
  • The sbrk service returns the address to a block of memory containing n additional bytes. This would be used for dynamic memory allocation.

在MIPS程序中,指令一般从地址0x00400000开始存储。MIPS存储器地址是字节寻址,所以32位(4字节)指令地址每次增加4字节而不是1字节。

机器代码就是指令。汇编代码是汇编代码。

处理器从存储器中顺序地取出指令。数字电路硬件解码和执行这些指令。当前指令的地址存储在程序计数器(一个32位寄存器)中。

操作系统将PC的值设为地址0x00400000。处理器将位于这个地址的指令读出并执行机器代码。然后,处理器将PC增加4,变为0x00400000,接着取出该地址的指令并执行,以此类推。

微处理器的体系结构状态(architectural state)保存程序的状态。对于MIPS,体系结构状态由寄存器文件和PC组成。

分支

为了顺序执行指令,程序计数器执行一条指令后增4。分支(branch)指令改变程序计数器的值,跳过某段代码或返回到执行先前的代码。