Current Technology: Database Design,Transaction management and concurrency control

The below post is notes prepared by me by studying the book "Database Systems Design, Implementation and Management" by Peter Rob and Carlos Coronel

Content, examples and diagrams are taken from that book.

Unit-III Chapter –III Database Design

Q: What is an information system?

Ans: A complete information system is composed of people, hardware, software, the databases, application programs and procedures.

The process of creating an information system is known as system development.

Q: The system development life cycle(SDLC)

The SDLC is an iterative rather than a sequential process.

1. Planning:

The SDLC planning phase yields a general overview of the company and its objectives.

An initial assessment of the information flow-and-extent requirements must be made to answer questions like

Should the existing system be continued?

Should the existing system be modified?

Should the existing system be replaced?

If it is decided that a new system is necessary, then it is checked whether the new system is feasible or not. The feasibility study includes

Technical feasibility: Can the development of the new system be done with current equipment, existing software technology, and available personnel? Does it require new technology?
Economic feasibility: Can we afford it? Is it a million dollar solution for a thousand dollar problem?
Operational feasibility: Does the company possess the human, technical and financial resources to keep the system operational? Will there be resistance from users?

2.Analysis:

A thorough audit of user requirements and

understanding of system’s functional areas, actual and potential problems and opportunities.

The logical design must specify the appropriate conceptual data model, inputs, processes and expected output requirements using tools such as DFDs,ER diagrams etc..

All data transformations (processes) are described and documented using such system analysis tools.

3.Detailed System design

The design includes all the necessary technical specifications for the screens, menus, reports and other devices that might be used to help make the system more efficient information generator.

4. Implementation

The hardware, DBMS software and application programs are installed and the database design is implemented.

During the intial stages of implementation phase, the system enters into a cycle of coding, testing and debugging until it is ready to be delivered.

The system will be in full operation by the end of this phase but will be continuously evaluated and fine-tuned.

5. Maintenance

Maintenance includes all the activity after the installation of software that is performed to keep the system operational.

Major forms of maintenance activities are

fixing of errors fall under corrective maintenance.

Adaptive maintenance due to changes in the business environment.

Perfective maintenance to enhance the system.

UNIT-IV Chapter-I

Transaction management and concurrency control

What is a transaction?

A transaction is any action that reads from and / or writes to a database.

A transaction is a single, indivisible, logical unit of work.

All transaction are controlled and executed by the DBMS to guarantee database integrity.

Q: Transaction properties or ACID test

Each individual transaction must display Atomicity, Consistency, Isolation and Durability.

Atomicity: requires all operations (SQL requests) of a transaction be completed if not the transaction is aborted.

If a transaction T1 has four SQL requests, all four requests must be successfully completed otherwise the entire transaction is aborted.

Consistency: A transaction takes a database from one consistent state to another.

Isolation: means that the data used during the execution of a transaction cannot be used by a second transaction until the first one is completed.

If transaction T1 is being executed and is using data item X, that data item cannot be accessed by any other transction until T1 ends.

Durability:ensures that once transaction changes are done(committed), they cannot be undone or lost, even in the event of a system failure. COMMITED TRANSACTIONS ARE NOT ROLLED BACK

When executing multiple transactions the DBMS must schedule the concurrent execution of the transaction’s operations. The schedule of such transaction’s operations must exhibit the property of serializability.

Serializability ensures that the schedule for the concurrent execution of the transactions yields consistent results.

Q: Transaction management with SQL

Transaction support is provided by two SQL statements : COMMIT and ROLLBACK.

Transaction sequence must continue through all succeeding SQL statements until one of the following four events occurs.

A COMMIT statement is reached, in which case all changes are permanently recorded within the database.

The COMMIT statement automatically ends the SQL transaction.

Ex: UPDATE product SET qoh = qoh-2 WHERE pno=’P01’;

UPDATE customer SET baldue = baldue+20 where cno = 201;

COMMIT;

A ROLLBACK is reached, in which case all changes are aborted and the database is rolled back to its previous consistent state.
The end of a program is successfully reached, in which case all changes are permanently recorded within the database. This action is equivalent to COMMIT

A program is abnormally terminated, in which case changes made in the database are aborted and the database is rolled back to its previous consistent state. This action is equivalent to ROLLBACK.

A transaction begins implicitly when the first SQL statement is encountered.

SQL SERVER uses transaction management statement such as BEGIN TRANSACTION; to indicate beginning of a new transaction.

The Oracle RDBMS uses the SET TRANSACTION statement to declare a new transaction start and its properties.

Q: The Transaction Log:

A DBMS uses a transactional log to keep track of all transactions that update the database.

The information stored in this log is used by the DBMS for a recovery requirement

The transaction log stores:

A record for the beginning of the transaction.
For each transaction component (SQL statement ):

The type of operation being performed (update, insert, delete)

The names of the objects affected by the transaction ( the name of the table)

The before and after values for the fields being updated.

Pointer to the previous and next transaction log entries for the same transaction.

The ending (COMMIT) of the transaction.

A transaction log

TRL_ID	TRX_NUM	Prev PTR	Next PTR	Operation	Table	RowID	Attribute	Before Value	After Value
341	101	Null	352	START	**Start Transaction
352	101	341	363	UPDATE	PRODUCT	P01	qoh	20	18
363	101	352	365	UPDATE	CUSTOMER	201	baldue	100	120
365	101	363	Null	COMMIT	**End of Transaction

Concurrency Control

The Coordination of the simultaneous execution of transactions in a multi user database system is known as concurrency control.

Problems due to concurrent transaction due to lack of concurrency control are:

Lost updates
Uncommited data and
Inconsistent retrievals

Lost updates occurs when two concurrent transactions, T1 and T2 are updating the same data element and one of the upadates is lost (overwritten by the other transaction)

Ex: consider two concurrent transactions to update qoh

Transaction	Computation
T1 :Purchase of 100 units	qoh = qoh + 100
T2: Sell 30 units	Qoh = qoh - 30

Serial execution of above transactions

Time	Transaction	Step	Value
1	T1	Read qoh	35
2	T1	qoh = 35+ 100
3	T1	Write qoh	135
4	T2	Read qoh	135
5	T2	qoh = 135-30
6	T2	Write qoh	105

The below table shows how a Lost Updates problem can arise

Time	Transaction	Step	Value
1	T1	Read qoh	35
2	T2	Read qoh	35
3	T1	qoh = 35+ 100
4	T2	qoh = 35-30
5	T1	Write qoh (lost update)	135
6	T2	Write qoh	5

Uncommited data or dirty read occurs when two transactions T1 and T2 are executed concurrently and the first transaction (T1) is rolled back after the first transaction (T1) is rolled back after the second transaction (T2) has already accessed the uncommitted data - thus violating the isolation property of transactions.

Time	Transaction	Step	Value
1	T1	Read qoh	35
2	T1	qoh = 35+ 100
3	T1	Write qoh	135
4	T1	*ROLL BACK	35
5	T2	Read qoh	35
6	T2	qoh = 135-30
7	T2	Write qoh	5

Time	Transaction	Step	Value
1	T1	Read qoh	35
2	T1	qoh = 35+ 100
3	T1	Write qoh	135
4	T2	Read qoh	135
5	T2	qoh = 135-30
6	T1	*ROLLBACK*	35
7	T2	Write qoh	105

Inconsistent retrievals occur when a transaction accesses data before and after another transaction finish working with such data.

For ex: T1 calculates summary( aggregate) function over a set of data while another transaction T2 is updating same data

Transaction T1	Transaction T2
SELECT SUM(qoh) FROM product WHERE pno < ‘P04’;	UPDATE product SET qoh = qoh + 10 WHERE pno = ‘P01’;

Time	Transaction	Action	Value	Total
1	T1	Read qoh for pno = ‘P01’	10	10
2	T2	Read qoh for pno = ‘P01’	10
3	T2	qoh = 10 + 10
4	T2	Write qoh for pno = ‘P01’	20
4	T1	Read qoh for pno = ‘P02’	3	13
5	T2	*COMMIT*
6	T2	Read qoh for pno = ‘P03’	5000	5013

The computed answer of 5013 is wrong as the correct answer is 5023

Unless the DBMS exercises concurrency control, a multi user database environment can create havoc within the information system.

The Scheduler

If two transactions access unrelated data then

There will be no conflict

And order of execution is irrelevant.

If the transactions operate on related data then

Conflict possible and

The selection of one execution order over another might have undesirable consequences.

The correct order is determined by built - in scheduler.

Q: What is a scheduler?

The scheduler is a special DBMS process that establishes the order in which the operations within concurrent transactions are executed.

The scheduler bases its actions on concurrency control algorithms, such as locking or time stamping methods.

NOT ALL TRANSACTIONS ARE SERIALIZALE

If the transactions are not serializable then the transaction are executed on first-come, first - served basis by the DBMS.

If transactions are executed in serial order one after another then

CPU time will be wasted and

Yields unacceptable response times within the multi user DBMS environment.

Q: What is Serializable Schedule?

Ans: A serializable schedule is a schedule of transaction’s operations in which the interleaved execution of the transactions (T1, T2, T3 etc.) yields the same result as if the transaction were executed in serial order (one after another).

	Transaction
Operations	T1	T2	Result
	Read	Read	No Conflict
	Read	Write	Conflict
	Write	Read	Conflict
	Write	Write	Conflict

Q: What are conflicting database operations?

Ans: Scheduler facilitates data isolation to ensure two transactions do not update the same data element at the same time.

Two operations are in conflict when they access the same data and atleast one of them is a WRITE operation.

The figure shows Conflicting database operations

Concurrency control with locking methods

A lock guarantees exclusive use of a data item to a current transaction.

A transaction acquires lock prior to data access; the lock is released when the transaction is completed so that another transaction can lock the data item for its exclusive use.

The database might be in a temporary inconsistent state when several updates are executed. Therefore, locks are required to prevent another transaction from reading inconsistent data.

Most multi user DBMSs automatically initiate and enforce locking procedures.

All lock information is managed by a lock manager.

Q: What is lock granularity? Explain?

Ans: Lock granularity indicates the level of lock use.

Locking can take place at the following levels: database, table, page, row or even field.

Database level

In a database level lock, the entire database is locked.

This level of locking is good for batch processes, but it is unsuitable for multiuser DBMSs.

Table Level

In a table level lock, the entire table is locked.

It is unsuitable for multiuser DBMSs.

Page Level

In page level lock, the DBMS will lock an entire diskpage.

Page is equivalent to disk block. A block is the smallest unit of data transfer between the hard disk and the processor.

Row level

The DBMS allows concurrent transactions to access different rows of the same table.

The row - level locking approach

Improves the availability of data but

Its management requires high overhead because a lock exists for each row.

Modern DBMS automatically escalate a lock from row level to page level lock when the application session requests multiple locks on the same page.

Field Level

The DBMS allows concurrent transactions to access different fields within a row.

Field level locking

Yields the most flexible multi user data access but it is

Rarely implemented in DBMS

LOCK TYPES : DBMS may use different lock types: Binary locks and Shared/Exclusive locks

Binary Locks:

A binary lock has only two states: locked (1) or unlocked (0).

If an object is locked by transaction, no other transaction can use that object.

Shared/Exclusive Locks :A shared lock is issued when a transaction wants to read data from the database.

An exclusive lock is issued when a transaction want to update (write) a data item.

Using Shared/Exclusive Locks concept, a lock can have three states: unlocked, shared (read) and exclusive (write).

Lock-compatibility matrix

Any number of transactions can hold shared locks on an item,

but if any transaction holds an exclusive on the item no other transaction may hold any lock on the item.

If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other transactions have been released. The lock is then granted.

Although the use of shared locks renders data access more efficiently, a Shared/Exclusive Lock schema increases the lock manager’s overhead for several reasons:

The type of lock must be known before a lock can be granted
Three lock operations exist:

READ_LOCK (to check the type of lock)

WRITE_LOCK (to issue the lock) and

UNLOCK (to release the lock)

The schema has been enhanced to allow a lock upgrade (from shared to exclusive) and a lock downgrade (from exclusive to shared)

Locks prevent serious data inconsistencies but they lead to two major problems:

Serializability
Deadlock

Serializability is guaranteed through a locking protocol known as two-phase locking.

TWO-PHASE LOCKING (2PL):defines how transaction acquire and release locks

ENSURES SERIALIZABILITY
Does not prevent deadlocks

The two phases are:

Growing Phase:

Acquires all locks
Doesnot unlock any data

Once all locks have been acquired, the transaction is in its locked state.

Shrinking Phase:

Releases all locks

Cannot obtain any lock

The two phase locking protocol is governed by the following rules

Two transaction cannot have conflicting locks
No unlock operation can precede a lock operation in the same transaction
No data are unaffected until all locks are obtained-- that is until the transaction is in its locked point.

The transaction acquires all of the locks it needs until it reaches its locked point.

When the locked point is reached, the data are modified.

Finally the transaction is completed as it releases all of the locks it acquired in the first phase.

DeadLocks

A deadlock occurs when two transactions wait indefinitely for each other to unlock data.

For example: T1 has locked data item X and waiting for data item Y which is held by T2 and

T2 has locked data item Y and waiting for data item X which is held by T1.

T1 and T2 wait for each other to unlock the required data item. Such a deadlock is also known as deadly embrace.

The figure demonstrates how a deadlock is created.

Time	Transaction	Reply	Lock status
0			Data X	Data Y
1	T1: LOCK(X)	OK	Unlocked	Unlocked
2	T2:LOCK(Y)	OK	Locked	Unlocked
3	T1:LOCK(Y)	WAIT	Locked	Locked
4	T2:LOCK(X)	WAIT	Locked	Locked
3	T1:LOCK(Y)	WAIT	Locked	Locked
4	T2:LOCK(X)	WAIT	Locked	Locked

			Deadlock

The three basic techniques to control deadlocks are:

Deadlock prevention:

A transaction requesting a new lock is aborted when there is the possibility that a deadlock can occur.

If a transaction is aborted, all changes made by this transaction are rolled back and all locks obtained are released.

The transaction is rescheduled for execution.

Deadlock prevention works because it avoids the conditions that lead to deadlocking.

Deadlock detection:

The DBMS periodically tests the database for deadlocks. If a deadlock is found, one of the transaction (“the victim”) is aborted (rolled back and restarted ) and the other transaction continues.

Deadlock avoidance: The transaction must obtain all locks it needs before it can be executed. This technique avoids rollback of conflicting transactions.

The choice of deadlock control method to use depends on the database environment.

For example, if the probability of deadlock is

Low, deadlock detection is recommended.

High, deadlock prevention is recommended.

If the response time is not high on the system’s priority list, deadlock avoidance is employed.

DBMS use a blend of prevention and avoidance for other types of data such as XML data or data warehouses

All current DBMSs support deadlock detection in transactional databases

Concurrency control with time stamping methods

The time stamping approach assigns a global, unique time stamp to each transaction.

Time stamps must have two properties: uniqueness and monotonicity.

Uniqueness ensures that no equal timestamp values can exists

Monotonicity ensures that timestamp values always increase.

All database operations (READ and WRITE) within the same transaction must have the same timestamp.

The DBMS executes conflicting operations in timestamp order.

If two transactions conflict, one is stopped, rolled back, rescheduled and assigned a new timestamp value.

For each data Q two timestamp values have to be maintained. They are

W-timestamp(Q) is the largest timestamp of any transaction that executed write(Q) successfully.

R- timestamp(Q) is the largest timestamp of any transaction that executed read(Q) successfully.

Thus timestamping increases memory needs and database processing overhead and uses a lot of system resources.

WAIT / DIE AND WOUND/WAIT SCHEMES

Assume that you have two conflicting transactions T1 and T2 each with a unique timestamp.

Suppose T1 has a timestamp of 115 and

T2 has a timestamp of 195

That is T1 is older transaction and T2 is newer transaction.

Transaction Requesting Lock	Transaction Owning lock	Wait/ Die Scheme	Wound/Wait Scheme
T1(115)	T2(195)	T1 waits until T2 is completed and T2 releases its lock	T1 preempts (rollbacks) T2 T2 is rescheduled using the same timestamp
T2(195)	T1(115)	T2 dies (rollback) T2 is rescheduled using the same timestamp	T2 waits until T1 is completed and T1 releases its lock

For A transaction that requests multiple locks.How long does a transaction have to wait for each lock request?

To prevent that type of deadlock, each lock request has an associate timeout value.

If the lock is not granted before the timeout expires, the transaction is rolled back.

Concurrency control with optimistic methods

The optimistic approach is based on the assumption that the majority of the database operations do not conflict.

This has three phases they are:

Read phase: the transaction T reads the data items from the database into its private workspace.

All the updates of the transaction can only change the local copies of the data in a private workspace.

Validate phase:

Checking is performed to confirm the read values have changed during the time transaction was updating the local values. This is performed by comparing the current database values to the values that were read in the private workspace.

Incase the values have changed, the local copies are thrown away and the transaction aborts.

Write phase: The changes are permanently applied to the database.

Database recovery management

Transaction recovery reverses all the changes that the transaction made to the database before the transaction was aborted and to recover the system after some type of critical error has occurred.

Examples of critical events are:

1. Hardware/software failures: Failure of this type could be harddisk media failure, bad capacitor on motherboard, or a failing memory bank, application program or O.S errors that cause the data to be overwritten, deleted or lost.

2. Human- caused incidents are two types: unintentional and intentional

Unintentional failure caused by carelessness by end users. Such errors include deleting the wrong rows from a table or shutting down the database server by accident.
Intentional events are security threats caused by hackers and virus attacks.

3. Natural disasters include fires,earthquakes,floods and power failures.

A critical error can render the database in an inconsistent state.

Transaction recovery

Database transaction recovery uses data in transaction log to bring a database to a consistent state after a failure.

Four important concepts that affect the recovery process.

The write-ahead-log protocol ensures that transaction logs are always written before any database data are actually updated. Recovery is done using the data in transaction log incase of failure.
Redundant transaction logs ensure that a physical disk failure will not affect the recovery.
When a transaction updates data, it actually updates the copy of the data in buffer(temporary storage area in primary memory). This process is much faster than accessing the physical disk every time. Later on, all buffers are written to physical disk during a single operation.
Database checkpoints are operations in which the DBMS writes all of its updated buffers to disk.
While this is happening, the DBMS does not execute any other requests.
Checkpoints are automatically scheduled by the DBMS several times per hour.
Checkpoint operation is also registered in the transaction log.

Q: Write about different types of recovery techniques?

Ans: Transaction recovery procedures generally make use of

Deferred-write also called Deferred Update and

Write-through also called Immediate Update techniques.

In Deferred- write technique, the transaction operations do not immediately update the physical database. Instead, only the transaction log is updated.

The database is physically updated only after the transaction is committed

The recovery process using Deferred-write follows these steps:

Identify the last checkpoint in the transaction log.
(This is the last time transaction data was physically saved to disk.)
For a transaction that started and committed before the last checkpoint, nothing needs to be done
For a transaction that performed a commit operation after last checkpoint, the DBMS redoes the transaction using the after values in transaction log.
The changes are made in ascending order from oldest to newest.
For a transaction that had a ROLLBACK operation after the last checkpoint or that was left active (with neither a COMMIT nor a ROLLBACK) before failure , nothing is done because no changes were written to disk

In Write-through technique, the database is immediately updated by transaction operations during the transaction’s execution, even before the transaction reaches its commit point.

If a transaction aborts, a undo operation need to be done.

The recovery process using write-through follows these steps:

first 3 points same as above.
For a transaction that had a ROLLBACK operation after the last checkpoint or that was left active (with neither a COMMIT nor a ROLLBACK) before failure, the DBMS undo the transaction using the before values in transaction log.

TRL ID	TRX NUM	Prev PTR	Next PTR	Operation	Table ID	Row Value	Attribute	Before	After Value
341	101	Null	352	START	**Start Transaction
352	101	341	363	UPDATE	PRODUCT	P01	QOH	20	18
363	101	352	365	UPDATE	CUSTOMER	201	baldue	100	120
365	101	363	Null	COMMIT	**End of transaction
397	106	Null	405	START	**Start Transaction
405	106	397	415	INSERT	INVOICE	305			305
415	106	405	419	INSERT	LINE	305,L01			305,L01,P05
419	106	415	427	UPDATE	PRODUCT	P05	QOH	120	119
423				CHECK POINT
427	106	419	431	UPDATE	CUSTOMER	202	baldue	500	1050
431	106	427	Null	COMMIT	**End of transaction
521	155	Null	525	START	**Start Transaction
525	155	521	528	UPDATE	PRODUCT	P08	QOH	100	80
528	155	525	Null	COMMIT	**End of transaction
**CRASH**

Transaction 101 :

UPDATE product SET qoh=qoh-2 WHERE pno=’P01’;

UPDATE customer SET baldue =baldue+20 WHERE cno=201;

COMMIT;

Transaction 106:

INSERT INTO Invoice VALUES(305,202,SYSDATE());

INSERT INTO line VALUES(305,’L01’,’P05’,1,550);

UPDATE product SET qoh=qoh-1 WHERE pno=’P05’;

UPDATE customer SET baldue =baldue+550 WHERE cno=202;

COMMIT;

Transaction 155:

UPDATE product SET qoh=qoh-20 WHERE pno=’P08’;

COMMIT;

Database recovery process for a DBMS using deferred update method is as follows

Identify the last checkpoint. In this case it is TRL ID 423. This was the last time database buffers were physically written to disk.
Transaction 101 committed before the last check point. All changes were already written to disk and so no action to be taken
For each transaction committed after the last checkpoint, the DBMS does
for example: for transaction 106:
        1.Find COMMIT (TRL ID 457)
        2.Use the previous pointer values to locate the start of the transaction (TRL ID 397)
        3.Use the next pointer values to locate each DML statement and apply the changes to disk, using the
           after values (Start with TRL ID 405, then 415,419,427 and 431)
        4.Repeat the process for transaction 155
Transactions that ended with ROLLBACK and that were active at the time of crash, nothing is done because no changes were written to disk

Wednesday, 21 May 2014

Database Design,Transaction management and concurrency control

No comments:

Post a Comment

Recent

Comments