Entity-Relationship Model - BunksAllowed

BunksAllowed is an effort to facilitate Self Learning process through the provision of quality tutorials.

Community

Entity-Relationship Model

Share This
The Entity-Relationship (E-R) data model, a semantic data model, views the real world as entities and relationships, aiding database design by specifying an enterprise schema. It is useful in mapping real-world enterprise meanings onto conceptual schemas, influencing many database-design tools.

The E-R data model consists of three fundamental concepts: entity sets, relationship sets, and attributes.

Entities are distinguishable objects in the real world, such as people or loans. They have a set of properties that can uniquely identify them. An entity set is a set of entities of the same type that share the same properties or attributes. For example, the set of all persons who are customers at a given bank can be defined as the entity set customer. The individual entities that constitute a set are called the extension of the entity set.
Ex.: Employees Entity Set

Attributes are descriptive properties possessed by each member of an entity set. Each entity has a value for each attribute, and the domain of each attribute is the set of permitted values. In the above Figure, ssn, name and lot are the attributes of Employees entity  set.

For each attribute associated with an entity set, we must identify a domain of possible values. For example, the domain associated with the attribute name of Employees might be the set of 20-character strings. As another example, if the company rates employees on a scale of 1 to 10 and stores ratings in a field called rating, the associated domain consists of integers 1 through 10.

Further, for each entity set, we choose a key. A key is a minimal set of attributes whose values uniquely identify an entity in the set. There could be more than one candidate key; if so, we designate one of them as the primary key.

A database includes a collection of entity sets, each containing any number of entities of the same type. An attribute of an entity set is a function that maps from the entity set into a domain. Each entity can be described by a set of (attribute, data value) pairs, one pair for each attribute of the entity set..


A relationship is an association among two or more entities.  Note that several relationship sets might involve the same entity sets. For example, we could also have a Manages relationship set involving Employees and Departments. A relationship can also have descriptive attributes. Descriptive attributes are used to record information about the relationship, rather than about any one of the participating entities; for example, we may wish to record that Mr. X works in the pharmacy department as of January 1991.
Ex. Works_In Relationship between Employees and Departments Entities.

An instance of a relationship set is a set of relationships. Intuitively, an instance can be thought of as a `snapshot' of the relationship set at some instant in time. Each Employees entity is denoted by its ssn, and each Departments entity is denoted by its did, for simplicity. The since value is shown beside each relationship.

As another example of an ER diagram, suppose that each department has offices in several locations and we want to record the locations at which each employee works. This relationship is ternary because we must record an association between an employee, a department, and a location. The ER diagram for this variant of Works_In, which we call Works_In2.

The entity sets that participate in a relationship set need not be distinct; sometimes a relationship might involve two entities in the same entity set. Since employees report to other employees, every relationship in Reports To is of the form (emp1; emp2), where both emp1 and emp2 are entities in Employees. However, they play different roles: emp1 reports to the managing employee emp2, which is reflected in the role indicators supervisor and subordinate. For example, the Reports_To relationship set has attributes corresponding to the ssn of the supervisor and the ssn of the subordinate, and the names of these attributes are supervisor_ssn and subordinate_ssn.


Participation Constraints

The key constraint on Manages tells us that a department has at most one manager. Let us say that every department is required to have a manager. This requirement is an example of a participation constraint; the participation of the entity set Departments in the relationship set Manages is said to be total. A participation that is not total is said to be partial. As an example, the participation of the entity set Employees in Manages is partial, since not every employee gets to manage a department.

Revisiting the Works_In relationship set, it is natural to expect that each employee works in at least one department and that each department has at least one employee. This means that the participation of both Employees and Departments in Works_In is total. 
 
The ER diagram shows both the Manages and Works_In relationship sets and all the given constraints. If the participation of an entity set in a relationship set is total, the two are connected by a thick line; independently, the presence of an arrow indicates a key constraint. The instances of Works_In and Manages satisfy all the constraints.


Weak Entities


Thus far, we have assumed that the attributes associated with an entity set include a key. This assumption does not always hold. For example, suppose that employees can purchase insurance policies to cover their dependents. We wish to record information about policies, including who is covered by each policy, but this information is really our only interest in the dependents of an employee. If an employee quits, any policy owned by the employee is terminated and we want to delete all the relevant policy and dependent information from the database.

We might choose to identify a dependent by name alone in this situation, since it is reasonable to expect that the dependents of a given employee have different names. Thus the attributes of the Dependents entity set might be pname and age. The attribute pname does not identify a dependent uniquely. Recall that the key for Employees is ssn; thus we might have two employees called Mr. X, and each might have a son called Joe.

Dependents is an example of a weak entity set. A weak entity can be identifi ed uniquely only by considering some of its attributes in conjunction with the primary key of another entity, which is called the identifying owner.

The following restrictions must hold: 
  • The owner entity set and the weak entity set must participate in a one-to-many relationship set (one owner entity is associated with one or more weak entities, but each weak entity has a single owner). This relationship set is called the identifying relationship set of the weak entity set. 
  • The weak entity set must have total participation in the identifying relationship set.

For example, a Dependents entity can be identified uniquely only if we take the key of the owning Employees entity and the pname of the Dependents entity. The set of attributes of a weak entity set that uniquely identify a weak entity for a given owner entity is called a partial key of the weak entity set. In our example pname is a partial key for Dependents.

The Dependents weak entity set and its relationship to Employees is shown in following Figure. The total participation of Dependents in Policy is indicated by linking them with a dark line. The arrow from Dependents to Policy indicates that each Dependents entity appears in at most one (indeed, exactly one, because of the participation constraint) Policy relationship. 
 
To underscore the fact that Dependents is a weak entity and Policy is its identifying relationship, we draw both with dark lines. To indicate that pname is a partial key for Dependents, we underline it using a broken line. This means that there may well be two dependents with the same pname value.

The E-R model integrates abstract schema with actual enterprise data, focusing on attribute values. Attributes can be simple, composite, single-valued, derived, or null. 
  • Simple attributes are not divided into sub-parts, while composite attributes can be divided into sub-parts. Composite attributes group related attributes together, making modeling cleaner.
  • Single-valued attributes have a single value for a particular entity, while multi-valued attributes have a set of values for a specific entity. Derived attributes can be derived from other related attributes or entities, such as loans-held or age. Null values indicate that an attribute does not exist or are unknown.

A database for a banking enterprise may include multiple entity sets, such as accounts, branch entities, or customer entities. Each entity set may be described by attributes such as branch-name, branch-city, and assets.

An attribute can be null when an entity does not have a value, indicating "not applicable" or unknown values. For example, a customer's name may be null, while an apartment number may be missing or unknown.

In a banking enterprise, database management may include multiple entity sets, such as accounts, branches, and accounts. Attributes play a crucial role in ensuring accurate and efficient data management.


Class Hierarchies


Entity sets can be classified into subclasses, such as Hourly_Emps and Contract_Emps, to distinguish the basis on which they are paid. Each entity in one of these sets is an Employees entity, and as such, must have all attributes of Employees defined. The attributes for an Hourly_Emps entity are the attributes for Employees plus Hourly_Emps.

The attributes for the entity set Employees are inherited by the entity set Hourly_Emps, and Hourly_Emps ISA (read is a) Employees. In contrast to class hierarchies in programming languages like C++, queries over instances of these entity sets must consider all Hourly_Emps and Contract_Emps entities.

Employees can also be classified using different criteria, such as identifying a subset of employees as Senior_Emps. To reflect this change, a second ISA node can be added as a child of Employees and make Senior_Emps a child of this node. Each of these entity sets can be further classified, creating a multilevel ISA hierarchy.

A class hierarchy can be viewed in one of two ways:

Employees is specialized into subclasses, which are subsets of an entity set (the superclass) that share some distinguishing characteristic. Specialization is the process of identifying subsets of an entity set (the superclass) that share some distinguishing characteristics. The superclass is defined first, followed by the subclasses, and subclass-specific attributes and relationship sets are added.

Generalization consists of identifying common characteristics of a collection of entity sets and creating a new entity set that contains these characteristics. Typically, subclasses are defined first, followed by the superclass, and any relationship sets that involve the superclass are then defined. Two types of constraints can be specified with respect to ISA hierarchies: overlap and covering constraints. Overlap constraints determine whether two subclasses are allowed to contain the same entity, while covering constraints determine whether the entities in the subclasses collectively include all entities in the superclass.

There are two basic reasons for identifying subclasses: 1) to add descriptive attributes that make sense only for the entities in a subclass, such as hourly wages, or 2) to identify the set of entities that participate in some relationship. For example, the Manages relationship may define the Participating entity sets as Senior Emps and Departments, ensuring that only senior employees can be managers.

Aggregation is another important aspect of ISA hierarchies, as it allows for the organization of entities into subclasses and relationships. This helps in defining the structure and functionality of a system, ensuring that entities are organized and managed effectively.


Aggregation


A relationship set is an association between entity sets, such as Projects and Sponsors. In the example given, Monitors should be a relationship set that associates a Sponsors relationship with an Employees entity. However, relationships are defined to associate two or more entities. To define a relationship set like Monitors, a new feature of the ER model, called aggregation, is introduced. Aggregation allows us to indicate that a relationship set (identified through a dashed box) participates in another relationship set. This allows us to treat Sponsors as an entity set for the purposes of defining the Monitors relationship set.

Aggregation is used to express relationships among relationships, but it can also be used to express relationships involving other relationships. In the example given, we can make Sponsors a ternary relationship, as there are two distinct relationships, Sponsors and Monitors, each possibly with its own attributes. The use of aggregation versus a ternary relationship may also be guided by certain integrity constraints.


Developing an ER diagram presents several choices, including the following:

Should a concept be modeled as an entity or an attribute? Should a concept be modeled as an entity or a relationship? What are the relationship sets and their participating entity sets? Should we use binary or ternary relationships? Should we use aggregation?

Entity vs. Attribute

When identifying attributes in an entity set, it's often unclear whether a property should be modeled as an attribute or an entity set. For instance, adding address information to the Employees entity set can be done using an attribute address, which records only one address per employee. Another option is to create an entity set called Addresses and record associations between employees and addresses using a relationship (Has Address). This is necessary in situations where multiple addresses are needed or to capture the structure of an address in an ER diagram. For example, breaking down an address into city, state, country, and zip code can support queries like "Find all employees with an address in Madison, WI." Another example is the relationship set (Works_In2), which can be modeled as an entity set instead of an attribute.

The Works In relationship set differs in that it has attributes from and to instead of since. It records the interval during which an employee works for a department. However, the ER diagram's semantics rule out the possibility of multiple periods for an employee. To address this, an entity set called Duration with attributes from and to can be introduced, as shown in the following Figure.

The ER model allows attributes to take on sets as values, allowing us to model Duration as an attribute of Works In, rather than an entity set. This approach is more intuitive than modeling Duration as an entity set. However, when these set-valued attributes are translated into the relational model, the resulting relational schema is similar to the entity set model.

Entity vs. Relationship


The relationship set Manages, renamed Manages2, is used when each department manager is given a discretionary budget, as shown in following Figure.

The current system stores the starting date and discretionary budget for each manager-department pair, assuming a manager receives a separate budget for each department. However, if the discretionary budget is a sum covering all departments managed by the same employee, each Manages2 relationship involving a given employee will have the same value in the budget field. This can lead to significant redundancy and potential problems. To address these issues, a new approach is proposed, associating the budget with the appointment of the employee as manager of a group of departments. This involves modeling the appointment as an entity set and using a ternary relationship, Manages3, to relate a manager, an appointment, and a department. The details of an appointment, such as the discretionary budget, are not repeated for each department, but one Manages3 relationship instance per department. Each department has at most one manager due to the key constraint.

Binary vs. Ternary Relationships


Consider the ER diagram shown below. It models a situation in which an employee can own several policies, each policy can be owned by several employees, and each dependent can be covered by several policies. Suppose that we have the following additional requirements:

  • A policy cannot be owned jointly by two or more employees. 
  • Every policy must be owned by some employee. 
  • Dependents is a weak entity set, and each dependent entity is uniquely identified by taking pname in conjunction with the policyid of a policy entity (which, intuitively, covers the given dependent).

The text suggests imposing a key constraint on policies regarding covers, which may result in a policy covering only one dependent. A total participation constraint is acceptable if each policy covers at least one dependent. The third requirement requires a binary identifying relationship, which is not the case in all versions of ER diagrams. Despite these limitations, the text remains relevant.

The text discusses the relationship between entity sets Parts, Suppliers, and Departments, and a relationship set Contracts. It argues that a contract specifies a supplier's supply of a part to a department, which cannot be adequately captured by binary relationships without aggregation. The text also suggests that a combination of these relationships cannot adequately express the meaning of a contract for two reasons.

  • The facts that supplier S can supply part P, that department D needs part P, and that D will buy from S do not necessarily imply that department D indeed buys part P from supplier S! 
  • We cannot represent the qty attribute of a contract cleanly.

Aggregation vs. Ternary Relationships


The choice between aggregation or a ternary relationship is primarily determined by the existence of a relationship relating a relationship set to an entity set or second relationship set. Integrity constraints may also guide the choice. For example, in an ER diagram, a project can be sponsored by multiple departments, each sponsored by multiple employees. If no record is needed for the until attribute of Monitors, a ternary relationship, such as Sponsors2, may be suitable.

The constraint that each department's sponsorship of a project be monitored by at most one employee cannot be expressed in terms of the Sponsors2 relationship set. However, it can be easily expressed by drawing an arrow from the aggregated relationship Sponsors to the relationship Monitors, thus indicating the use of aggregation over a ternary relationship set.


Happy Exploring!

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.