Translator's order
Brief introduction of translator
Preface to publication
Chapter 1 Overview of dbms system
1. 1 development of database system
1. 1. 1 early database management system
1. 1.2 relational database system
1. 1.3 Smaller and smaller systems
1. 1.4 Larger and larger systems
1. 1.5 information integration
1.2 Overview of Database Management System
1.2. 1 data definition language command
1.2.2 Overview of query processing
1.2.3 main memory and buffer manager
1.2.4 transaction processing
1.2.5 query processor
1.3 book overview
1.4 database model and language review
1.4. 1 Review of relational model
1.4.2sql review
1.5 citations
The first part is the realization of database system.
Chapter II Auxiliary Warehouse Management
2. 1 memory hierarchy
2. 1. 1 memory hierarchy
2. 1.2 transferring data between memory levels
2. 1.3 volatile and nonvolatile memory
2. 1.4 virtual memory
2. 1.5 movement
2.2 disk
2.2. 1 disk structure
Disk controller
2.2.3 disk access characteristics
practise
2.3 Accelerate access to auxiliary memory
2.3. 1 i/o model for calculation
2.3.2 Organize data by cylinder.
Use multiple disks
Disk mirroring
2.3.5 disk scheduling and elevator algorithm
2.3.6 prefetching and large-scale buffering
practise
2.4 Disk failure
2.4. 1 intermittent fault
Checksum
Stable storage
2.4.4 Error handling ability of stable storage
2.4.5 Recovery from Disk Crash
2.4.6 as a mirror image of redundant technology
Parity check block
2.4.8 An improvement: raid 5
2.4.9 Handling when Multiple Disks Crash
2.4. 10 exercise
2.5 organize the data on the disk
2.5. 1 fixed length record
2.5.2 Placement of Fixed Length Records in Blocks
practise
2.6 Representation of Block and Record Address
2.6. 1 address in client-server system
Logical address and structure address
Pointer mixed writing
2.6.4 Block Return to Disk
2.6.5 pinned records and blocks
practise
2.7 Variable length data and records
2.7. 1 records with variable length fields
2.7.2 Records with duplicate fields
Variable format record
2.7.4 Records in the block cannot be loaded.
2.7.5blob
Chromatographic column storage
practise
2.8 Modification of records
2.8. 1 insert
delete
modify
practise
2.9 Summary
2. 10 references
Chapter III Exponential Structure
3. 1 indicator structure basis
3. 1. 1 sequential file
3. 1.2 Density index
3. 1.3 sparsity index
3. 1.4 multilevel index
3. 1.5 auxiliary index
3. 1.6 Application of auxiliary indicators
Indirect in 3. 1.7 auxiliary index
3. 1.8 document retrieval and inverted index
3. 1.9 movement
3.2b tree
3.2. 1b tree structure
3.2.2 Application of B-B Tree
3.2.3b- tree search
3.2.4 Range query
3.2.5b- tree insertion
3.2.6b- tree deletion
3.2.7b- Christmas Tree Efficiency
take exercise
3.3 Hash table
3.3. 1 secondary hash table
3.3.2 Insert hash table
3.3.3 Delete Hashtable
3.3.4 Efficiency of Hash Table Index
3.3.5 Extensible Hash Table
3.3.6 Insertion of Extensible Hash Table
3.3.7 linear hash table
3.3.8 Insert a linear hash table
take exercise
3.4 Multidimensional Index
3.4. 1 multidimensional indexing application
3.4.2 Performing Range Query with Traditional Index
3.4.3 Using traditional index to execute nearest neighbor query
3.4.4 Overview of Multidimensional Index Structure
3.5 Hash structure of multidimensional data
3.5. 1 grid file
Search grid file
Insertion of grid file
Performance of grid files
3.5.5 Piecewise Hash Function
3.5.6 Comparison between Grid File and Segment Hash
take exercise
3.6 Tree structure of multidimensional data
3.6. 1 multi-key index
3.6.2 Multi-key index performance
3.6.3kd tree
3.6.4kd tree operation
3.6.5 Make kd- tree suitable for auxiliary storage.
quadtree
R- tree
3.6.8r- Operation of Christmas Tree
take exercise
3.7 Bitmap Index
3.7. 1 bitmap index motivation
3.7.2 compressed bitmap
3.7.3 Operation of Segment Length Encoding Bit Vector
3.7.4 Bitmap Index Management
take exercise
3.8 Summary
3.9 References
Chapter 4 Query Execution
4. 1 Introduction of physical query plan operator
4. 1. 1 scan table
4. 1.2 Sorting when scanning tables
4. 1.3 physical operator calculation model
4. 1.4 parameters to measure the cost
4. I/o cost of1.5 scan operator
4. 1.6 Implementing iterator of physical operators
4.2 Single stroke algorithm
4.2. 1 One-pass algorithm for once cell group operation
4.2.2 One-step algorithm for unary operation of global relations
4.2.3 One-stroke algorithm for binary operation
take exercise
4.3 Nested loop connection
4.3. 1 nested cyclic join based on tuples
4.3.2 Nested Loop Connection Iterator Based on Tuples
4.3.3 Block-based Nested Cyclic Join Algorithm
4.3.4 Nested loop connection analysis
4.3.5 Summary of algorithms so far
take exercise
4.4 Two-pass algorithm based on sorting
4.4. 1 two-level multiple merge sorting
4.4.2 Use sorting to eliminate duplication
4.4.3 Grouping and aggregation by sorting
4.4.4 Joint algorithm based on sorting
4.4.5 intersection algorithm based on sorting
4.4.6 Simple connection algorithm based on sorting
4.4.7 Simple Sorting Connection Analysis
4.4.8 More effective connection based on sorting
4.4.9 Overview of algorithms based on sorting
4.4. 10 exercise
4.5 Hash-based two-pass algorithm
4.5. 1 Dividing relationships by hashing
4.5.2 Hash-based deduplication algorithm
4.5.3 Hash-based grouping and aggregation algorithm
4.5.4 Algorithm of Union, Intersection and Difference Based on Hash
Hash join algorithm
4.5.6 Save some disk I/O.
4.5.7 Overview of Hash-based Algorithms
take exercise
4.6 Index-based algorithm
4.6. 1 clustered index and nonclustered index
4.6.2 Index-based Selection
Use indexed connections.
4.6.4 Connections using ordered indexes
take exercise
4.7 Buffer management
4.7. 1 buffer management structure
Buffer management strategy
4.7.3 Relationship between physical operator selection and buffer management
take exercise
4.8 Use the algorithm more than twice.
4.8. 1 Multi-pass algorithm based on sorting
4.8.2 Performance of multi-pass algorithm based on sorting
4.8.3 Multi-pass algorithm based on hash
4.8.4 Performance of multi-pass algorithm based on hash
take exercise
4.9 Summary
4. 10 references
Chapter 5 Query Compiler
5. 1 syntax analysis and preprocessing
5. 1. 1 Parsing and Parsing Tree
5. Syntax of simple subset of1.2 SQL
5. 1.3 preprocessor
5. 1.4 preprocessing queries involving views
5. 1.5 movement
5.2 Improve the Algebraic Rule of Query Planning
5.2. 1 commutative law and associative law
5.2.2 Law on Choice
Push down selection
5.2.4 Laws on Projection
5.2.5 Laws on Connections and Products
5.2.6 Laws on eliminating duplication
5.2.7 Law on grouping and aggregation
take exercise
5.3 From Parsing Tree to Logical Query Plan
5.3. 1 into relational algebra
5.3.2 Delete the subquery from the condition.
5.3.3 Improvement of Logical Query Plan
5.3.4 Grouping of Combinable/Assignable Operators
take exercise
5.4 Operating cost estimation
5.4. 1 estimation of intermediate relation size
5.4.2 Estimated operation scale
5.4.3 Selection of operation scale estimation
5.4.4 Estimation of connection operation scale
5.4.5 Natural connection of multi-connection attributes
5.4.6 Connection of Multiple Relationships
5.4.7 Estimation of other operation scale
take exercise
5.5 Introduction of cost-based planning options
5.5. 1 Obtain the estimated value of the size parameter.
5.5.2 Calculation of statistics
5.5.3 Heuristic estimation to reduce the cost of logical query planning
5.5.4 Methods of enumerating physical plans
take exercise
5.6 Selection of connection sequence
5.6. The meaning of1left and right parameter connection
5.6.2 Connection Tree
5.6.3 Left deep junction tree
5.6.4 Select the connection sequence and grouping through dynamic planning.
5.6.5 Dynamic Programming with More Specific Cost Function
5.6.6 Greedy algorithm for selecting connection order
take exercise
5.7 Complete the physical query scheme selection.
5.7. 1 Select the selection method.
Select connection method
5.7.3 Pipelining and concretization
5.7.4 unary pipeline operation
5.7.5 Pipeline operation of binary operation
5.7.6 Symbol of physical query plan
Classification of physical operations
take exercise
5.8 Summary
5.9 References
Chapter VI System Fault Countermeasures
6. 1 problems and models that can be resumed.
6. 1. 1 failure mode
6. Further discussion on1.2 transaction
6. 1.3 Correct execution of the transaction
6. Original operation of1.4 transaction
6. 1.5 movement
6.2 Undo log
6.2. 1 logging
6. 2. 2 Undo the log rule
6.2.3 Recovering with Undo Log
checking point
Nonstationary checkpoint
practise
6.3 Redo Log
6. 3. 1 Redo Log Rules
6.3.2 Use redo logs for recovery.
6. 3. 3 Redo checkpoint of redo log
6.3.4 Use the redo log with checkpoint for recovery.
practise
6.4 Undo/Redo Log
Undo/Redo Rule
6.4.2 Recovery by using Undo/Redo Log
6.4.3 checkpoint of undo/redo log
take exercise
6.5 Media fault protection
6.5. 1 backup
Nonstationary dump
6.5.3 Use backup and log for recovery.
practise
6.6 summary
6.7 references
Chapter 7 Concurrency Control
7. 1 serial scheduling and serializable scheduling
7. 1. 1 scheduling
7. 1.2 serial scheduling
7. 1.3 serializable scheduling
7. 1.4 Influence of transaction semantics
7. 1.5 A symbol for transaction and scheduling
7. 1.6 movement
7.2 conflict serializability
6.5438+0 conflict
7.2.2 priority map and conflict serializability judgment
7.2.3 Reasons why priority graph test plays a role
take exercise
7.3 Serializable implementation using locks
7.3. 1 lock
7.3.2 Blocking scheduler
7.3.3 Two-stage blockade
7.3.4 Reasons for the effectiveness of the two-stage blockade
practise
7.4 Locking system with multiple locking modes
7.4.1* * Enjoy locks and exclusive locks.
Compatibility matrix
Lock upgrade
Update lock
Incremental lock
practise
7.5 architecture of blocking scheduler
7.5. 1 Dispatcher performs insertion locking action.
Lock watch
take exercise
7.6 Hierarchy of database elements
7.6. 1 Multi-granularity lock
Warning lock
7.6.3 Correct treatment of illusion and insertion
practise
7.7 Tree Protocol
7.7. 1 tree blockade motivation
7.7.2 Rules for accessing tree structure data
7.7.3 The reason why the tree protocol works.
practise
7.8 concurrency control using timestamps
7.8. 1 timestamp
7.8.2 Behavior that cannot be realized in fact.
7.8.3 Dirty data problem
7.8.4 Scheduling Rules Based on Timestamp
Multi-version timestamp
Timestamp and blocking
practise
7.9 Concurrency Control Using Validity Confirmation
Structure of scheduler based on validity confirmation
Validity confirmation rule
7.9.3 Comparison of Three Concurrency Control Mechanisms
practise
7. 10 summary
7. 1 1 reference
Chapter VIII Re-discussion on Affairs Management
8. 1 continuity and recoverability
8. 1. 1 Dirty data problem
8. 1.2 cascade rollback
8. 1.3 can resume scheduling.
8. 1.4 scheduling to avoid cascading rollback
8. 1.5 Rollback Management Based on Lock
8. 1.6 Group Submission
8. 1.7 logical log
8. 1.8 Recovering from the logical log
8. 1.9 movement
8.2 deadlock
8.2. 1 timeout deadlock detection
8.2.2 Waiting for the chart
8.2.3 Preventing Deadlock by Ordering Elements
8.2.4 Deadlock Detection by Timestamp
8.2.5 Comparison of Deadlock Management Methods
practise
8.3 Long-term business
8.3. 1 Long Transaction Problem
8.3.2saga (continuous recording)
8.3.3 Compensation affairs
8.3.4 Reasons why the salary problem plays a role
practise
8.4 Summary
8.5 references
Chapter 9 Parallel and Distributed Databases
9. Parallel algorithm of1
9. 1. 1 parallel mode
9. 1.2 Parallel operation, one tuple at a time.
9. 1.3 Parallel Algorithm for Total Relation Operation
Performance of 9. 1.4 parallel algorithm
9. 1.5 movement
9.2 map? Reduce parallel architecture
9.2. 1 storage mode
Mapping function
9.2.3 Reduction function
take exercise
9.3 Distributed Database
data distribution
Distributed transaction
9.3.3 Data Replication
take exercise
9.4 Distributed Query Processing
9.4. 1 distributed connection operation problem
Semi-connected simplification
9.4.3 Connection of Multiple Relationships
9.4.4 acyclic hypergraph
9.4.5 Complete Simplification of Acyclic Hypergraphs
9.4.6 Why is the completely simplified algorithm effective?
practise
9.5 distributed submission
9.5. 1 supports distributed atomicity.
9.5.2 Two-stage submission
9.5.3 Recovery of Distributed Transactions
take exercise
9.6 Distributed Blockade
9.6. 1 centralized block system
9.6.2 Cost Model of Distributed Blocking Algorithm
9.6.3 Blocking elements with multiple copies
9.6.4 Block the master copy
9.6.5 Global lock composed of local locks
practise
9.7 Point-to-Point Distributed Search
9.7. 1 Peer-to-Peer Network
9.7.2 Distributed Hash Problem
9.7.3 Centralized solution of distributed hash
9.7.4 Circle with rope
9.7.5 Link on chord circle
9.7.6 Look it up with a finger table.
9.7.7 Add a new node
9.7.8 When the terminal leaves the network,
9.7.9 When one end collapses,
9.7. 10 exercise
9.8 Summary
9.9 references
The second part is the theme of modern database system.
Chapter 10 Information Integration
10. 1 Introduction to Information Integration
10. 1. 1 Why do you want to integrate information?
10. 1.2 Heterogeneity
10.2 information integration mode
10.2. 1 federated database system
10.2.2 data warehouse
10. 2. 3 intermediary
10.2.4 exercise
10.3 Wrapper in Mediation-based System
10.3. 1 query pattern template
10.3.2 packaging generator
10.3.3 filter
10.3.4 Other operations on the package
10.3.5 exercise
10.4 capacity-based optimization
10.4. 1 limited data source capacity
10.4.2 symbol describing the function of data source.
10.4.3 Query Plan Selection Based on Ability
10.4.4 Add cost-based optimization.
10.4.5 exercise
10.5 optimize the intermediary query
10.5. 1 Simplified modifier
10.5.2 get the answer of the sub-goal.
10. 5. 3 chain algorithm
10.5.4 combine and view mediations.
10.5.5 exercise
10.6 uses the local as the intermediary of the view.
10.6. 1lav mediator's motivation
10. 6. 2 legal mediator terminology
10.6.3 extended solution
10.6.4 contains a federated query.
10.6.5 Why is the inclusion mapping test valid?
10.6.6 Discover the solution of mediator query.
10.6.7 Why does the lmss theorem hold?
10.6.8 exercise
10.7 entity analysis
10.7. 1 Determines whether the record represents an * * * entity.
10.7.2 merge similar records.
Similarity and Useful Properties of 10.7.3 Merging Function
10.7.4icar recorded by icar? Swoosh algorithm
10.7.5 Why R? Swoosh algorithm will be effective.
10.7.6 Other methods of entity resolution
10.7.7 exercise
10.8 summary
10.9 references
Chapter 1 1 Data Mining
1 1. 1 frequent itemsets mining
11.1.1market-shopping basket model
Basic definition of 1 1. 1.2
1 1. 1.3 association rules
1 1. 1.4 calculation model of frequent itemsets
1 1. 1.5 motion
1 1.2 Algorithm for Finding Frequent Itemsets
1 1.2. 1 the distribution of frequent itemsets
1 1.2.2 Naive algorithm for finding frequent itemsets
1 1.2.3a? apriori algorithm
1 1.2.4a? Realization of prior algorithm
1 1.2.5 Make better use of main memory
1 1.2.6 When to use pcy algorithm?
1 1.2.7 multilevel algorithm
1 1.2.8 motion
1 1.3 Find similar products.
Jaccard measure of 1 1.3. 1 similarity
11.3.2 Application of Jaccard Similarity
1 1.3.3 Minimum hash
1 1.3.4 Similarity between minimum hash and jaccard
1 1.3.5 Why can the minimum hash be used to estimate the similarity?
1 1.3.6 Implementation of Minimum Hash
1 1.3.7 movement
1 1.4 local sensitive hash
1 1.4. 1lsh Example: Entity Resolution
Local sensitive hash of 1 1.4.2 tag
1 1.4.3 combination of minimum hash method and local sensitive hash.
1 1.4.4 motion
1 1.5 large-scale data clustering
The application of 1 1.5. 1 clustering
1 1.5.2 Definition of distance
1 1.5.3 Cohesive clustering
1 1.5.4k? Mean value algorithm
Massive data 1 1.5.5 K? Means and methods
1 1.5.6 Processing after memory is full.
1 1.5.7 motion
1 1.6 summary
1 1.7 references
12 Chapter Database System and Internet
12. 1 search engine architecture
12. 1. 1 search engine composition
12. 1.2 web crawler
12. 1.3 query processing in search engines
12. 1.4 ranking pages.
Pagerank used to identify important web pages.
12.2. 1pagerank's intuitive thinking
The Recursive Formula of 12.2.2pagerank —— A Preliminary Attempt
12.2.3 crawler traps and dead ends
12.2.4 pagerank considering reptile traps and dead ends
12.2.5 exercise
12.3 pagerank for specific topics
12.3. 1 "Long-distance Mobile" Collection
12.3.2 Calculate the pagerank related to the topic.
12.3.3 link cheating
12.3.4 pagerank and link cheating related to the topic
12.3.5 exercise
12.4 data stream
12.4. 1 data flow management system
12.4.2 data stream application
12.4.3 data flow data model
12.4.4 data stream conversion to relationship
12.4.5 relationship into data stream.
12.4.6 exercise
12.5 data stream mining
12.5. 1 motivation
12.5.2 statistics of binary numbers
12.5.3 Calculate the number of different elements.
12.5.4 exercise
12.6 summary
12.7 references