Important Terminologies
Database: Database is a collection of inter-related data which helps in efficient retrieval, insertion and deletion of data from database and organizes the data in the form of tables, views, schemas, reports etc. For Example, university database organizes the data about students, faculty, and admin staff etc. which helps in efficient retrieval, insertion and deletion of data from it.Paradigm Shift from File System to DBMS
A File Management system is a DBMS that allows access to single files or tables at a time. In a File System, data is directly stored in a set of files. It contains flat files that have no relation to other files (when only one table is stored in a single file, then this file is known as flat file). File System manages data using files in the hard disk. Users are allowed to create, delete, and update the files according to their requirements. Let us consider the example of a file-based University Management System. Data of students is available to their respective Departments, Academics Section, Result Section, Accounts Section, Hostel Office, etc. Some of the data is common for all sections like Roll No, Name, Father Name, Address and Phone number of students but some data is available to a particular section only like Hostel allotment number which is a part of hostel office. Let us discuss the issues with this system:Relational DBMS
Relational database means the data is stored as well as retrieved in the form of relations (tables). The table below shows the relational database with only one relation called STUDENT which stores ROLL_NO, NAME, ADDRESS, PHONE and AGE of students.ROLL_NO | NAME | ADDRESS | PHONE | AGE |
1 | RAM | DELHI | 9455123451 | 18 |
2 | RAMESH | GURGAON | 9652431543 | 18 |
3 | SUJIT | ROHTAK | 9156253131 | 20 |
4 | SURESH | DELHI | 9156768971 | 18 |
NoSQL
A NoSQL originally referring to non SQL or non-relational is a database that provides a mechanism for storage and retrieval of data. This data is modelled in means other than the tabular relations used in relational databases. NoSQL databases are used in real-time web applications and big data and their use are increasing over time. NoSQL systems are also sometimes called Not only SQL to emphasize the fact that they may support SQL-like query languages. A NoSQL database includes simplicity of design, simpler horizontal scaling to clusters of machines and finer control over availability. The data structures used by NoSQL databases are different from those used by default in relational databases which makes some operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem it should solve. Data structures used by NoSQL databases are sometimes also viewed as more flexible than relational database tables. Types of NoSQL databases and the name of the databases system that falls in that category are:Advantages of DBMS
DBMS helps in efficient organization of data in database which has following advantages over typical file system.Two Level Architecture
Two level architecture is similar to a basic client-server model. The application at the client end directly communicates with the database at the server side. API's like ODBC,JDBC are used for this interaction. The server side is responsible for providing query processing and transaction management functionalities. On the client side, the user interfaces and application programs are run. The application on the client side establishes a connection with the server side in order to communicate with the DBMS.DBMS 3-tier Architecture
DBMS 3-tier architecture divides the complete system into three inter-related but independent modules as shown in Figure below.Data Independence
Data independence means a change of data at one level should not affect another level. Two types of data independence are present in this architecture:Phases of database design
Database designing for a real world application starts from capturing the requirements to physical implementation using DBMS software which consists of following steps shown in Figure.Entity, Entity Type, Entity Set
An Entity may be an object with a physical existence - a particular person, car, house, or employee - or it may be an object with a conceptual existence - a company, a job, or a university course.
An Entity is an object of Entity Type and set of all entities is called as Entity Set. e.g.; E1 is an entity having Entity Type Student and set of all students is called Entity Set. In ER diagram, Entity Type is represented as:
Attribute(s)
Attributes are the properties which define the entity type. For example, Roll_No, Name, DOB, Age, Address, Mobile_No are the attributes which defines entity type Student. In ER diagram, attribute is represented by an oval.
Relational Model was proposed by E.F. Codd to model data in the form of relations or tables. After designing the conceptual model of Database using the ER diagram, we need to convert the conceptual model in the relational model which can be implemented using any RDBMS languages like Oracle SQL, MySQL, etc. So we will see what the Relational Model is.
What is Relational Model?
Relational Model represents how data is stored in Relational Databases. A relational database stores data in the form of relations (tables). Consider a relation STUDENT with attributes ROLL_NO, NAME, ADDRESS, PHONE and AGE shown in Table 1.STUDENT
ROLL_NO | NAME | ADDRESS | PHONE | AGE |
1 | RAM | DELHI | 9455123451 | 18 |
2 | RAMESH | GURGAON | 9652431543 | 18 |
3 | SUJIT | ROHTAK | 9156253131 | 20 |
4 | SURESH | DELHI | 18 |
IMPORTANT TERMINOLOGIES
1 | RAM | DELHI | 9455123451 | 18 |
ROLL_NO |
1 |
2 |
3 |
4 |
Constraints in Relational Model
While designing Relational Model, we define some conditions which must hold for data present in database are called Constraints. These constraints are checked before performing any operation (insertion, deletion and updation) in database. If there is a violation in any of constrains, operation will fail.STUDENT
ROLL_NO | NAME | ADDRESS | PHONE | AGE | BRANCH_CODE |
1 | RAM | DELHI | 9455123451 | 18 | CS |
2 | RAMESH | GURGAON | 9652431543 | 18 | CS |
3 | SUJIT | ROHTAK | 9156253131 | 20 | ECE |
4 | SURESH | DELHI | 18 | IT |
BRANCH
BRANCH_CODE | BRANCH_NAME |
CS | COMPUTER SCIENCE |
IT | INFORMATION TECHNOLOGY |
ECE | ELECTRONICS AND COMMUNICATION ENGINEERING |
CV | CIVIL ENGINEERING |
ANOMALIES
An anomaly is an irregularity or something which deviates from the expected or normal state. When designing databases, we identify three types of anomalies: Insert, Update and Delete.CREATE TABLE Order(
Order_id INTEGER Primary Key,
Order_Date DATE,
Cost INTEGER,
Customer_ID REFERENCES Customer(Customer_ID)
)
Functional Dependency
Functional Dependency is a constraint between two sets of attributes in a relation from a database. A functional dependency is denoted by an arrow (→). If an attribute A functionally determines B or B is functionally dependent on A, then it is written as A → B.What does functionally dependent mean?
A function dependency A → B means for all instances of a particular value of A, there is the same value of B.A B
------
1 3
2 3
4 0
1 3
4 0
Why do we study Functional Dependency?
We study functional dependency so that we can carve out a few attributes to another table so that the data redundancy can be reduced to a minimum. Let's look at the following table:ABC → AB
ABC → A
ABC → ABC
Id → Name,
Name → DOB
AB → BC,
AD → DC
1. First Normal Form
If a relation contains a composite or multi-valued attribute, it violates first normal form or relation is in first normal form if it does not contain any composite or multi-valued attribute. A relation is in first normal form if every attribute in that relation is singled valued attribute.ID Name CoursesIn the above table, Course is a multi-valued attribute so it is not in 1NF.
------------------
1 A c1, c2
2 E c3
3 M C2, c3
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c1
3 M c2
2. Second Normal Form
To be in second normal form, a relation must be in first normal form and relation must not contain any partial dependency. A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime attribute (attributes which are not part of any candidate key) is dependent on any proper subset of any candidate key of the table.FD set: {COURSE_NO->COURSE_NAME}In FD COURSE_NO->COURSE_NAME, COURSE_NO (proper subset of candidate key) is determining COURSE_NAME (non-prime attribute). Hence, it is partial dependency and relation is not in second normal form.
Candidate Key: {STUD_NO, COURSE_NO}
STUDENT_COURSE (STUD_NO, COURSE_NO)Note - This decomposition will be lossless join decomposition as well as dependency preserving.
COURSE (COURSE_NO, COURSE_NAME)
AB -> C [A and B together determine C]In the above relation, AB is the only candidate key and there is no partial dependency, i.e., any proper subset of AB doesn't determine any non-prime attribute.
BC -> D [B and C together determine D]
3. Third Normal Form
Def 1: It follows two rules:4. Boyce-Codd Normal Form (BCNF)
A relation R is in BCNF if R is in Third Normal Form and for every FD, LHS is super key. A relation is in BCNF if in every non-trivial functional dependency X --> Y, X is a superkey.BC->D,Step 1. As we can see, (AC)+ ={A,C,B,E,D} but none of its subset can determine all attribute of relation, So AC will be candidate key. A or C can’t be derived from any other attribute of the relation, so there will be only 1 candidate key {AC}.
AC->BE,
B->E
ABC --> D
CD --> AE
Concurrency Control deals with interleaved execution of more than one transaction. In the next article, we will see what is serializability and how to find whether a schedule is serializable or not.
What is Transaction?A set of logically related operations is known as transaction. The main operations of a transaction are:
Read(A): Read operations Read(A) or R(A) reads the value of A from the database and stores it in a buffer in main memory.
Write (A): Write operation Write(A) or W(A) writes the value back to the database from buffer.
Let us take a debit transaction from an account which consists of following operations:
Assume A’s value before starting of transaction is 5000.
But it may also be possible that transaction may fail after executing some of its operations. The failure can be because of hardware, software or power etc. For example, if debit transaction discussed above fails after executing operation 2, the value of A will remain 5000 in the database which is not acceptable by the bank. To avoid this, Database has two important operations:
Commit: After all instructions of a transaction are successfully executed, the changes made by transaction are made permanent in the database.
Rollback: If a transaction is not able to execute all operations successfully, all the changes made by transaction are undone.
Properties of a transaction
What is a Schedule?
A schedule is a series of operations from one or more transactions. Schedules are used to resolve conflicts between the transactions.Types of Schedules?
Lets discuss various types of schedules.T1 | T2 |
---|---|
R(A) | |
W(A) | |
R(B) | |
W(B) | |
R(A) | |
R(B) |
S1: R1(x), W1(x), R2(x), R1(y), R2(y),Given schedule follows order of Ti->Tj => C1->C2. Transaction T1 is executed before T2 hence there is no chances of conflict occur. R1(x) appears before W1(x) and transaction T1 is committed before T2 i.e. completion of first transaction performed first update on data item x, hence given schedule is recoverable.
W2(x), W1(y), C1, C2;
T1 | T2 |
---|---|
R(A) | |
W(A) | |
W(A) | |
R(A) | |
commit | |
commit |
T1 | T2 |
---|---|
R(A) | |
W(A) | |
W(A) | |
commit | |
R(A) | |
commit |
T1 | T2 |
---|---|
R(A) | |
W(A) | |
R(A) | |
W(A) | |
abort | |
abort |
T1 | T2 |
---|---|
R(A) | |
R(A) | |
W(A) | |
commit | |
W(A) | |
R(A) | |
commit |
Conflict Serializable: A schedule is called conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting operations.
Conflicting operations: Two operations are said to be conflicting if all conditions satisfy:Example: -
S1: R1(A), W1(A), R2(A), W2(A), R1(B), W1(B), R2(B), W2(B)If Oi and Oj are two operations in a transaction and Oi< Oj (Oi is executed before Oj), same order will follow in schedule as well. Using this property, we can get two transactions of schedule S1 as:
T1: R1(A), W1(A), R1(B), W1(B)Possible Serial Schedules are: T1->T2 or T2->T1
T2: R2(A), W2(A), R2(B), W2(B)
S11: R1(A), W1(A), R1(B), W2(A), R2(A), W1(B), R2(B), W2(B)-> Similarly, swapping non-conflicting operations W2(A) and W1(B) in S11, the schedule becomes,
S12: R1(A), W1(A), R1(B), W1(B), R2(A), W2(A), R2(B), W2(B)
S12 is a serial schedule in which all operations of T1 are performed before starting any operation of T2. Since S has been transformed into a serial schedule S12 by swapping non-conflicting operations of S1, S1 is conflict serializable.
S2: R2(A), W2(A), R1(A), W1(A), R1(B), W1(B), R2(B), W2(B)Two transactions will be:
T1: R1(A), W1(A), R1(B), W1(B)Possible Serial Schedules are: T1->T2 or T2->T1
T2: R2(A), W2(A), R2(B), W2(B)
S2: R2(A), W2(A), R1(A), W1(A), R1(B), W1(B), R2(B), W2(B)Swapping non-conflicting operations R1(A) and R2(B) in S2, the schedule becomes,
S21: R2(A), W2(A), R2(B), W1(A), R1(B), W1(B), R1(A), W2(B)Similarly, swapping non-conflicting operations W1(A) and W2(B) in S21, the schedule becomes,
S22: R2(A), W2(A), R2(B), W2(B), R1(B), W1(B), R1(A), W1(A)
In schedule S22, all operations of T2 are performed first, but operations of T1 are not in order (order should be R1(A), W1(A), R1(B), W1(B)). So S2 is not conflict serializable.
Conflict Equivalent: Two schedules are said to be conflict equivalent when one can be transformed to another by swapping non-conflicting operations. In the example discussed above, S11 is conflict equivalent to S1 (S1 can be converted to S11 by swapping non-conflicting operations). Similarly, S11 is conflict equivalent to S12 and so on.
Note 1: Although S2 is not conflict serializable, but still it is conflict equivalent to S21 and S21 because S2 can be converted to S21 and S22 by swapping non-conflicting operations.
Note 2: The schedule which is conflict serializable is always conflict equivalent to one of the serial schedule. S1 schedule discussed above (which is conflict serializable) is equivalent to serial schedule (T1->T2).
[GATE 2007]
Solution: Two transactions of given schedules are:T1: R1(X) R1(Y) W1(X)Let us first check serializability of S1:
T2: R2(X) R2(Y) W2(Y)
S1: R1(X) R1(Y) R2(X) R2(Y) W2(Y) W1(X)To convert it to a serial schedule, we have to swap non-conflicting operations so that S1 becomes equivalent to serial schedule T1->T2 or T2->T1. In this case, to convert it to a serial schedule, we must have to swap R2(X) and W1(X) but they are conflicting. So S1 can’t be converted to a serial schedule.
S2: R1(X) R2(X) R2(Y) W2(Y) R1(Y) W1(X)Swapping non conflicting operations R1(X) and R2(X) of S2, we get
S2’: R2(X) R1(X) R2(Y) W2(Y) R1(Y) W1(X)Again, swapping non conflicting operations R1(X) and R2(Y) of S2’, we get
S2’’: R2(X) R2(Y) R1(X) W2(Y) R1(Y) W1(X)Again, swapping non conflicting operations R1(X) and W2(Y) of S2’’, we get
S2’’’: R2(X) R2(Y) W2(Y) R1(X) R1(Y) W1(X)which is equivalent to a serial schedule T2->T1.
Read(X)
X = X - 10
Write(X)
Read(Y)
Y = Y + 10
Write(Y)
Read(Y)
Y = Y - 10
Write(Y)
Read(Z)
Z = Z + 10
Write(Z)
Read(Y)For Y, this is valid as in both the cases Y is read by T2 first. We don't need to check for X and Z as they are not shared by both the transactions.
Y = Y - 10
Write(Y)
Read(Z)
Z = Z + 10
Write(Z)
Read(X)
X = X - 10
Write(X)
Read(Y)
Y = Y + 10
Write(Y)
S1: R1(x), W1(x), R2(x), R1(y), R2(y),Given schedule follows order of Ti->Tj => C1->C2. Transaction T1 is executed before T2 hence there is no chances of conflict occur. R1(x) appears before W1(x) and transaction T1 is committed before T2 i.e. completion of first transaction performed first update on data item x, hence given schedule is recoverable.
W2(x), W1(y), C1, C2;
S2: R1(x), R2(x), R1(z), R3(x), R3(y), W1(x),Ti->Tj => C2->C3 but W3(y) executed before W2(y) which leads to conflicts thus it must be committed before T2 transaction. So given schedule is unrecoverable. if Ti->Tj => C3->C2 is given in schedule then it will become recoverable schedule.
W3(y), R2(y), W2(z), W2(y), C1, C2, C3;
S3: R1(x), R2(z), R3(x), R1(z), R2(y), R3(y), W1(x), C1,In this schedule W3(y) and W2(y) overwrite conflicts and there is no read, therefore given schedule is cascadeless schedule.
W2(z), W3(y), W2(y), C3, C2;
S4: R1(x), R2(x), R1(z), R3(x), R3(y),In this schedule no read-write or write-write conflict arises before commit hence its strict schedule:
W1(x), C1, W3(y), C3, R2(y), W2(z), W2(y), C2;
What is a Rollback?
Types of Rollback?
What is a Dirty Read Problem?
Reading the data written by an uncommitted transaction is called as dirty read.
What does Dirty Read Problem result into?
How to avoid Dirty Read Problem?
Schedule: T1: Lock-X1(A) T2: Lock-X2(B) T1: Lock-X1(B) T2: Lock-X2(A)In this T1: Lock-X1(B) has to wait for T2 and T2: Lock-X2(A) has to wait for T1.