Performance Criteria for Measuring Database Optimality (1997)

Bernard W Bennetto and Matthew C O Todd

Abstract

There is a need for both better database designs and ultimately the definition and generation of what might be termed an optimal database design. Poor designs are still prevalent despite the widespread adoption of life cycle methodologies, structured and object oriented design and CASE tools. Design guidelines for current approaches over-emphasise objectives such as full normalisation of the data structures and rapid online response times, while neglecting the many other important criteria by which database performance might be judged. This paper seeks to define those other criteria and the basis for the mechanisms which will be needed to determine satisfactory trade-offs between them.

Introduction

There is a need for both better database designs and ultimately the definition and generation of what might be termed an optimal database design. A process is needed which both establishes criteria for directing the database design exercise and defines their relative priorities, so that the competing options can be traded off during design. The process must establish the principal objectives of the implemented database from a list of competing criteria such as economy, future flexibility, rapid online response, availability and recoverability, security, meeting overnight batch processing windows, load- and unload-ability, and software and hardware compatibility. It must also formalise and make explicit the trade-offs which have been made during design.

Performance Evaluation Criteria

The list of criteria, discussed below, has been built up from experience gained during a variety of design exercises. The list is by no means intended to be exhaustive. It is expected to grow with time and it is not intended to be universally applicable. Each successive hardware and software development can devalue some criteria and cause others to assume greater prominence. Each installation has been found to be unique and to reveal additional criteria which have been added at the time of each exercise. Moreover, each installation has its own unique blend of priorities which must be taken into account so that the database designer can arrive at an implemented database which truly satisfies the requirements of that site and its end users.

They are not introduced in any particular order other than the `default' order implied by existing design approaches and the degree to which they might be formally quantified. Design approaches such as SSADM (Downs, Clare and Coe 1992) emphasise the achievement of a `normalised' structure in the first instance and then meeting the requirement for criteria such as rapid online response and availability. The rank importance of the criteria must be determined by each design exercise, and any criteria which have been omitted from the list will need to be identified by the designer. This rank order is established so as to provide a basis for trading off the implications of the various design options.

The Defined Criteria

The first criterion is maintainability, or future flexibility. This is defined as the ability to amend the database by adding, changing or deleting structures with minimal impact upon the existing programs. It is conventionally addressed by the formal application of Codd's normalisation steps [Codd (1970), (1971) and (1972)].

The second criterion is rapid online response. This needs to be defined as the achievement of a particular host response time, say 3 seconds, for key end user functions, as measured by online query select, insert, update and delete response times during the online day (say 9.00 - 17.00 hours).

The criteria of availability and recoverability are defined as the desirability of reaching a particular percentage level of database availability (say 99%) for all or part of the structure and the ability to recover key parts of the structure within a particular time frame (say 25 minutes).

The next criteria define requirements to meet particular `batch processing windows'. Overnight processability defines the requirement to complete all overnight batch processing within a window which will not delay the commencement of the online day (say 17.00 through to 9.00 hours). Loadability and unloadability define stipulations that the database can be loaded, off-loaded and re-loaded for the purposes of initial load, data re-organisation and re-structuring within a particular time frame, e.g. a weekend.

The economy criterion defines the obligation of the designer to meet the budgeted procurement and maintenance costs of the hardware and software required to run the database, measured by the cost of processing, memory and storage.

The above criteria can be explicitly quantified in a formalised fashion. However, there are further criteria representing technical requirements which can only be measured in terms of their indirect effects on the above criteria. They include:

  • programmability and end user accessibility - the need for simplifying the task of the application programmers by insulating them from details of the data structure and the importance of providing end user access to the database
  • privacy and security - the statutory demands to prevent unauthorised access to the data for reasons of privacy or security
  • software and hardware compatibility - the constraint of working within the framework of a particular hardware and software environment.

Each of these performance criteria is discussed in greater detail below.

Maintainability and Future Flexibility

Codd's normalisation techniques provide for the definition of what is termed a normalised structure. This can be mapped onto a relational, hierarchical or network structure and is then regarded as providing the most flexible structure for the future. The secondary considerations are then seen as meeting design objectives which seek to improve the performance of the database so that it supports such criteria as fast online response, high levels of availability, avoidance of queueing due to conflicting processing requirements and meeting the overnight or periodic processing schedules.

The construction of a normalised structure meets this performance evaluation criterion by defining an ideal conceptual data structure for future flexibility. However, while accepting that this is an extremely important, if not over-riding, design objective in certain circumstances, there is a need to define more clearly the requirement for future flexibility. A normalised structure, by generating additional I/O and instruction overheads, will limit the extent to which currently defined requirements can be supported efficiently.

Techniques are needed which systematically ask a series of questions about each entity/relation so as to determine the need for 'novel' structures to meet unspecified but anticipated future requirements. The questions define the limits of the flexibility which must be provided. They ask whether the definition of the entity/relation will change in the future due to legislative or policy considerations, so that the attributes will be changed by the redefinition or removal of existing attributes, by the addition of new attributes, by splitting it into more than one entity/relation or by subsuming it into another entity/relation. The answers provide a basis for classifying entities/relations as unstable, probably stable or stable, and one can then consider how such changes might be accommodated.

In designing a vehicle database, it is of little value to provide separate physical structures to hold a history of the individual keeper and corporate keeper entities if one already anticipates that there will be NO major new processing requirements which will require a keeper history or the administration of the distinction between an individual and a corporate keeper. In such circumstances denormalisation is appropriate (Bennetto and Todd 1993).
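
As an illustration of how such a checklist might be mechanised, the following is a minimal sketch in Python; the questions, the two-answer threshold and the 'keeper' example are illustrative assumptions rather than part of the original technique.

    # Hypothetical sketch: classifying entities/relations by anticipated stability.
    # The questions and the two-answer threshold are illustrative assumptions.

    STABILITY_QUESTIONS = [
        "Will legislation or policy redefine or remove existing attributes?",
        "Will new attributes need to be added?",
        "Might the entity be split into more than one entity/relation?",
        "Might the entity be subsumed into another entity/relation?",
    ]

    def classify_stability(answers):
        """answers: one boolean per question (True = change anticipated)."""
        anticipated_changes = sum(answers)
        if anticipated_changes >= 2:
            return "unstable"
        if anticipated_changes == 1:
            return "probably stable"
        return "stable"

    # A 'keeper' entity where only the addition of new attributes is anticipated.
    print(classify_stability([False, True, False, False]))   # -> "probably stable"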

The techniques do not provide a 'crystal ball' for determining how such future, unknown requirements will be met. If data and functions remain undefined and their characteristics unspecified then there is no way in which they can be incorporated into the construction of the data or function models. Nevertheless, they do attempt to identify those parts of the structure which are liable to be unstable and volatile so that in the future one is able to add other 'novel' structures to meet such unspecified but anticipated requirements.

Rapid Online Response

The requirement for rapid online response is often seen as one of the key design criteria for many database systems and database software is provided with a variety of mechanisms to enable this criterion to be met. Unlike maintainability, it is a quantifiable criterion, but the subjective term `rapid' does require formal definition and this is often neglected.

End users must be guided as to what might constitute a 'rapid', 'good', 'reasonable' or 'poor' response time. The formal definition must be quantified in terms of both average and maximum response times, i.e. that 50% of the selected transactions will be satisfied within, say, one and a half seconds and that 95% will be satisfied within three seconds. The requirement should also clearly stipulate that this is a criterion for the host component of the response time and not the network component. A distinction should also be drawn between retrievals which involve query (SELECT) commands and those involving update commands (INSERT, DELETE and UPDATE), and between the particular retrieval transactions which will require a rapid online response.
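
A percentile target of this kind is easy to check against measured host response times. The following is a minimal sketch in Python, assuming a simple nearest-rank percentile and illustrative sample timings.

    # Check a percentile-based response time target: 50% within 1.5 s, 95% within 3 s.
    # Host-only timings in seconds; nearest-rank percentiles are a simplification.

    def meets_target(host_response_times, median_limit=1.5, p95_limit=3.0):
        times = sorted(host_response_times)
        median = times[int(0.50 * (len(times) - 1))]
        p95 = times[int(0.95 * (len(times) - 1))]
        return median <= median_limit and p95 <= p95_limit

    sample = [0.8, 1.1, 1.4, 2.0, 2.6, 2.9, 1.2, 0.9, 3.4, 1.0]   # illustrative figures
    print(meets_target(sample))   # -> True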

Availability and Recoverability

Again these are quantifiable criteria requiring formal definition. Availability defines the desirability of reducing the possibility of a database failure or service interruption and of isolating a failure in one part of the structure from other related parts. Recoverability defines the need for an ability to recover the entire database structure, or key parts of it, within a particular time frame. The two criteria are inter-related in end user terms, since both are measured by the requirement to reach a particular percentage level of database availability.

The level of database availability is determined by the mean time between failures (MTBF) and the mean time to recover. The MTBF is normally defined by the guarantees provided by the computer hardware supplier and there is a range of measures which one can take to increase this value or reduce its implications. An online system guaranteed a service level of 99% availability could suffer no more than an average downtime of around 5 minutes during an online day which ran from 9.00 to 17.00 hours. One failure per week with a mean time to recover of 25 minutes would broadly meet this criterion.
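
The arithmetic behind these figures can be checked directly; a minimal sketch in Python, assuming a 9.00-17.00 online day and a five-day week:

    # Availability arithmetic for a 9.00-17.00 online day (480 minutes), 5-day week.
    ONLINE_MINUTES_PER_DAY = 8 * 60            # 480
    target_availability = 0.99                 # "say 99%"

    allowed_downtime_per_day = ONLINE_MINUTES_PER_DAY * (1 - target_availability)
    print(f"Allowed average downtime per day: {allowed_downtime_per_day:.1f} minutes")   # ~4.8

    # One failure per week with a 25 minute mean time to recover:
    weekly_downtime = 25
    weekly_online_minutes = ONLINE_MINUTES_PER_DAY * 5
    achieved = 1 - weekly_downtime / weekly_online_minutes
    print(f"Achieved availability: {achieved:.1%}")   # ~99.0%, broadly meeting the target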

The availability level will need to be formally stated so that approaches can be adopted to meet the defined level. The approaches include structural options which either reduce the number of physical linkages across the total database structure or involve logical partitioning so that a failure only affects a limited number of occurrences. Other approaches involve either duplicating the entire database or key parts of it, or the more conventional approach of creating a tape or disk file of 'looks' or 'images' of the status of amended records both before and after amendment. The latter approaches permit relatively rapid recovery of the data in the event of failure but add to the instruction workload and can double the read/write overheads of a process, which will negatively impact response time targets.

Overnight Processability

The designer might well have stayed within the budgetary provisions made for the database system, but there may also be a requirement to complete all overnight batch processing within a window which will not delay the commencement of the online day.

The designer will, therefore, need to make estimates and carry out benchmarks to ensure that the functions which are to be implemented as overnight batch processes will run within the overnight schedule. This involves making allowances for any necessary dumping of files, file recovery and re-running following a failure. It also means taking account of the logical dependencies which define when one function can be initiated given the completion of another, and which functions can be run simultaneously. Once again the tools of the capacity planner and performance analyst will enable the designer to make such predictions.
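
One way of making such a check is to compute the critical path of the overnight job schedule from its dependencies and durations and compare it with the 17.00-9.00 window. The sketch below, in Python, uses entirely hypothetical job names and durations.

    # Does the critical path of the overnight batch schedule fit the 17.00-9.00 window?
    # Job names, durations (hours) and dependencies are illustrative assumptions.
    from functools import lru_cache

    WINDOW_HOURS = 16    # 17.00 through to 9.00
    durations = {"extract": 3, "update": 5, "dump": 2, "reports": 4}
    depends_on = {"extract": [], "update": ["extract"], "dump": ["update"], "reports": ["update"]}

    @lru_cache(maxsize=None)
    def finish_time(job):
        """Earliest finish, assuming a job starts when its predecessors complete
        and independent jobs run simultaneously."""
        start = max((finish_time(d) for d in depends_on[job]), default=0)
        return start + durations[job]

    critical_path = max(finish_time(j) for j in durations)
    print(f"Critical path: {critical_path} hours; fits window: {critical_path <= WINDOW_HOURS}")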

Designs have overlooked and failed to consider criteria such as batch processing support, and have neglected the critical path of the overnight batch processes and the need to fit within the overnight processing window. Failure to meet this criterion will delay the start of online processing unless special provision is made for running batch updates during the online day. This involves specialised recovery strategies which will enable the database to be rapidly restored and both batch and online update processes to be re-started following a failure. The degree to which this criterion can be met can be clearly quantified.

Loadability and Unloadability

Following initial load, a database will need to have the data within its files periodically re-organised to re-build index structures and to remove structures from file overflow areas so as to optimise the use of particular access techniques. The database will also require periodic re-structuring to correct structural inadequacies and to introduce new structures on top of existing structures. This requires the data to be off-loaded and re-loaded.

A site will often make the stipulation that the database must be off-loaded and re-loaded for the purposes of data re-organisation and re-structuring within a particular time frame such as a weekend or a holiday weekend. The designer should therefore be confident that this can be achieved within that time frame. The designer cannot ignore the fact that a bank holiday weekend provides only some 80 hours of processing time for re-organisation and re-structuring of the database, and should be able to predict that while an initial design might take many days to load, an alternative design might take a tenth of that time. If the entire structure cannot be loaded, off-loaded and re-loaded within the defined time frame then the designer needs to develop a structure which allows this to be done progressively, on parts of the total structure, over a number of weekends. Again the degree to which this criterion can be met can be clearly quantified.
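
A rough feasibility estimate is straightforward once unload/re-load throughput has been benchmarked. The following Python sketch uses purely illustrative row counts, per-row costs and window lengths.

    # Rough check of whether an off-load/re-load fits a weekend window,
    # and how many weekend slots a progressive approach would need.
    # All figures below are illustrative assumptions.
    import math

    rows = 150_000_000
    seconds_per_row = 0.004        # assumed combined unload + reload cost per row
    weekend_hours = 60             # e.g. Friday evening to Monday morning

    total_hours = rows * seconds_per_row / 3600
    print(f"Estimated elapsed time: {total_hours:.1f} hours")
    if total_hours > weekend_hours:
        weekends = math.ceil(total_hours / weekend_hours)
        print(f"Re-organise progressively, in roughly {weekends} weekend slots")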

Economy

Since performance optimisation over a wide range of design criteria is rarely a paramount concern, many machine upgrades have been prompted by inadequate initial designs. This objective must state that the database must run on specified hardware and software and that it should not need substantially greater processing and storage capacity than was originally anticipated. It should remain within its original cost justification and the budgetary constraints which were originally specified. The designer needs access to capacity planning tools which progressively predict what the ultimate hardware and software requirements will be, so as to report formally on how and why the original targets might be breached, or on how the system must be modified or re-engineered to meet them.

Observance of this criterion requires the designer to calculate the additional costs incurred by the introduction of the various structural design options, the various recovery strategies, and the various physical placement and location mode options which increase either mass storage or processing costs. These costs can be represented in monetary terms or in terms of the additional megabytes, MIPS and processing times that are required. In any event they can be translated into directly quantifiable terms.
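
Such a translation can be reduced to a simple cost model; the sketch below, in Python, assumes purely illustrative unit costs for storage, processing capacity and CPU time.

    # Translate the extra storage, MIPS and processing time implied by a design
    # option into a single monetary figure. Unit costs are illustrative assumptions.
    def option_cost(extra_megabytes, extra_mips, extra_cpu_hours,
                    cost_per_mb=0.5, cost_per_mip=2_000.0, cost_per_cpu_hour=50.0):
        return (extra_megabytes * cost_per_mb
                + extra_mips * cost_per_mip
                + extra_cpu_hours * cost_per_cpu_hour)

    # e.g. an index adding 400 MB of storage, 2 MIPS of peak demand and 10 CPU hours a month
    print(f"Additional cost: {option_cost(400, 2, 10):,.2f} per month")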

Programmability and End User Accessibility

Amending programs and providing direct end user access can be both risky and costly undertakings, and while notional costs can be attached, such criteria cannot be directly quantified. They invoke facilities or tactics which will enhance this aspect of the design but which will have impacts on other criteria. Flexibility considerations attempt to avoid or anticipate changes in data structure as a means of avoiding costly re-organisation and re-structuring in the future. While denormalised data structures which result in repetition, duplication or repeating groups do lead to greater programming complexity, other structures which avoid or resolve optional relationships remove from the programmer the processing overhead of maintaining the linkages.

The use of relational database software rather than pointer-based software provides a more effective means of insulating the application programmers from the data structure, since the optimizer determines the optimal access path rather than requiring the programmer to navigate the structure. It provides a layer of software which assumes responsibility for navigating the data structure so that the programmer is unaware of the presence of physical tables or linkages. The programmer calls for data from a logical view and the interface software does the mapping to the physical data. Such software can be complicated to maintain and can impose additional processing overheads.

The end user accessibility criterion defines the potential requirement to permit end users access to the database through query languages such as QBE or SQL. It is valuable in those circumstances where the total set of end user enquiry requirements cannot be met by a set of pre-programmed transactions or batch reports. Quite frequently, policy and managerial questions will arise which cannot be satisfied by any existing program and which will best be satisfied by a query language such as SQL.

Their use poses certain problems for the designer, since such queries cannot be anticipated and structures cannot therefore be built in advance to support and optimise them. As a result they can impose a significant workload on the future database by needing to navigate the entire database structure or substantial parts of it. They can also lock out, or be locked out by, conflicting update processes.

As much information as possible should be collected about the types of query which might need to be supported so that options can be considered to meet this objective specifically. Where such information is not forthcoming, a separate stand-alone database structure should be considered which is 'rolled off' from the main structure on a periodic basis. The time to navigate the structure is then reduced and conflicts with the principal day-to-day processes are avoided. Moreover, the establishment of a monitor which records the frequency and type of query can permit support for the most popular paths to be incorporated into the main database structure.

Privacy and Security

All organisations have both a statutory and a commercial duty to prevent unauthorised access to the data for reasons of privacy or security. The relative importance of this criterion will require the designer to consider whether additional security options need to be provided over and above those provided by the application system. If the designer regards this as necessary then he can consider the placing of passwords on particular entities or groups of entity occurrences, the encrypting of the data which the entity represents (with the additional overheads of encryption and decryption) and the physical isolation of particularly sensitive data in separate physical structures. The designer will also need to determine whether these facilities are provided by standard software or are purpose built. The criterion cannot be directly quantified but each facility will have quantifiable implications for other criteria.

Software and Hardware Compatibility

The site is also likely to impose the limitation that the designer must work within the framework defined by its particular hardware and software environment. This has the implication that the designer might be restricted in the choice of available database technology, i.e. Codasyl, hierarchical or relational, and therefore in the available structural options, which will depend on the database technology adopted.

Basically, the older Codasyl and the hierarchical technologies will rely on the implementation of relationships through physical pointers between records while the relational approach will exploit the use of logical foreign keys. Because of the widespread use of physical pointers by the proven Codasyl and hierarchical technologies, there are serious limitations in terms of both extendibility and loadability. Relational structures offer far greater possibilities to extend the existing data structures and can be loaded more efficiently. They are more easily navigable and offer a standard query language.

Tactical Impacts and Trade-Offs

As indicated by the discussion above, a decision with regard to one particular criterion will have impacts on related criteria. Table 1 is an attempt to indicate the degree and extent of such impacts and how they might be measured; it is intended to be indicative rather than exhaustively correct. It shows, for example, that tactics to meet the maintainability criterion will increase the read/write and instruction overheads for both online and batch processes, since the maximum number of tables will be defined. Tactics to improve online response times will increase overnight and periodic batch processing times and lead to increased hardware costs, while tactics to improve batch processing times are likely to have negative impacts on online response times and similarly increase hardware costs.

Table 1: Tactical Impacts and Trade-Offs

Design Criterion                  Abbrev.   Defn (measure)
Maintainability & Flexibility     M&F       Normalisation
Rapid Online Response             ROR       Seconds
Availability & Recoverability     A&R       %
Overnight Processability          OVP       Hours
Loadability & Unloadability       L&U       Hours
Economy & Cost                    E&C       £ or $
Programmability & EU Access       P&E       ---
Privacy & Security                P&S       ---
HW & SW Compatibility             H&S       ---

(In the original matrix an X against the columns M&F, ROR, A&R, OVP, L&U, E&C, P&E, P&S and H&S marks an impact of the row criterion upon the column criterion; the principal impacts are summarised in the text above.)

The Survey

Because of the variation in the priorities which might be assigned to the above criteria, a user survey of some 7 separate organisations within the Suffolk area known to make use of relational databases was conducted by Todd (1996). While the results presented below are therefore representative of only a very small sample and as such cannot be seen to be statistically significant, they do reflect similar anecdotal evidence gleaned from conversations with several database administrators and consultants in the field of data analysis. They are also reflected in personal experience of user requirements in database systems.

The sizes of the databases were variable, ranging from 12 tables with an average of just over 20,000 rows to 150 tables with an average of 150,000 rows. Even the smallest, however, is clearly not trivial. In terms of size they would normally be classified as either medium or large scale, of a size where normalisation is likely to have been attempted and where performance improvement is most likely to be of benefit.

Indexing was a favoured performance enhancing technique, though no respondent was able to say to what extent performance had been improved, and denormalisation had been used by just one respondent. This suggests that despite being available as a technique for some 15 years, very few database designers have actually taken the opportunity to use it. This could be for various reasons, not least its controversial nature and the reluctance of many organisations to try techniques which do not have widespread acceptance. It could also mean that many organisations have not found sufficient benefit to justify the cost of the performance improvement, although there is no evidence available to support or deny this hypothesis.

The list of criteria was varied from the list presented above by avoiding category names which were not easy to define in the context of a questionnaire and by sub-dividing criteria which were felt to be too broad into a number of sub-criteria. Each was given a ranking based on its average positional score and the ordered, weighted list is presented in Table 2. The most important criterion to most users of the database was on-line query response time (1). Possibly surprising is the relatively low ranking of factors such as on-line insert (18.5), delete (18.5) and update (9.5) response times, which were much less important than on-line query time. Also of note are the relatively low rankings of future flexibility (9) and maintainability (10.5) and the high rankings achieved by factors such as loadability (3.5) and availability (6).

In any event, the survey would indicate that a much wider range of design criteria are relevant for any database design exercise.

Table 2: Weighted Priorities from User Survey

Criterion                        Rating
On-line query response time      1
Loadability                      3.5
Weekend processing window        5
Batch processing support         5.5
Availability                     6
Overnight batch window           6.5
Recovery time                    9
Future flexibility               9
On-line update response time     9.5
Maintainability                  10.5
Software compatibility           10.5
Re-organisability                11
Cost of processing               11.5
Programmability                  11.5
Hardware compatibility           13.5
Cost of storage                  14.5
Privacy and security             14.5
On-line delete response time     18.5
On-line insert response time     18.5

Knapsack Problems

The intention of this paper is not only to identify that a much wider range of design criteria is relevant to the database design process, but also to seek mechanisms which will establish satisfactory trade-offs between these many criteria and ultimately an optimal database design. Knapsack problems are a category of problem well known to those involved in optimisation and might provide an appropriate model for achieving the goal of an optimal database design [Hamilton (1994)]. They take their name from the general problem outlined below [Kolman and Beck (1980)].

Suppose a hiker can carry a maximum weight of equipment in their pack (say c kilos). They have a collection of n items which they consider taking on their trip. To each item they assign a relative value p_j, based on its importance to their trip; the more important items have a higher relative value. Let w_j be the weight of the jth item. The hiker's problem is to decide which of the n items to carry so as to maximise the total relative value of the items in their pack. To construct the mathematical model we let x_j = 1 if the jth item is chosen, and x_j = 0 if it is not.

The task is then to maximise

z = \sum_{j=1}^{n} p_j x_j

subject to the constraint that

\sum_{j=1}^{n} w_j x_j \le c

Such problems are sometimes known as the zero-one knapsack problem.

An important generalisation of this problem is known as the zero-one multiple knapsack problem, arising when m containers of given capacities c_i are available. If we now define x_{ij} to be 1 if item j is chosen for container i, and 0 otherwise, then the problem is to maximise

z = \sum_{i=1}^{m} \sum_{j=1}^{n} p_j x_{ij}

subject to

\sum_{j=1}^{n} w_j x_{ij} \le c_i,    i = 1, ..., m

\sum_{i=1}^{m} x_{ij} \le 1,    j = 1, ..., n

In the more general case, p_j and w_j will vary as the container used varies (i.e. the relative value of each item, p_{ij}, and its weight, w_{ij}, depend on the container, i, used).

The problem then becomes one of maximising:

z = \sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij} x_{ij}

subject to

\sum_{j=1}^{n} w_{ij} x_{ij} \le c_i,    i = 1, ..., m

\sum_{i=1}^{m} x_{ij} \le 1,    j = 1, ..., n

x_{ij} \in \{0, 1\}

This is often referred to as the generalised assignment problem.

Martello and Toth (1990) present several algorithms for solving such problems. However, as is pointed out in Syslo et al. (1983), such problems are NP-hard, and obtaining a precise solution when there is a large number of possible options can take a long time.

Why Optimising Database Design is a Knapsack Problem

Suppose that there is a set of m criteria which are of interest in a particular database system. Suppose also that there are n possible discrete design combinations for the proposed database. Suppose also that it is possible to numerically evaluate the performance of a particular design option against a given criterion, and that this is denoted by p_{ij}, i.e. the set of performances is P = {p_{11}, p_{12}, ..., p_{mn}}. Suppose also that each criterion has a weighting assigned to it to denote its importance to the database design. The weightings are denoted by w_i, i.e. the set of weightings is W = {w_1, w_2, ..., w_m}. If a given design j is chosen for use then let x_j = 1, otherwise let x_j = 0. The set X = {x_1, x_2, ..., x_n} is a so-called `type 1 special ordered set', in which only one member is non-zero. The task of optimising the design can now be formulated as: maximise

z = \sum_{i=1}^{m} \sum_{j=1}^{n} w_i p_{ij} x_j

subject to

\sum_{j=1}^{n} x_j = 1

x_j \in \{0, 1\}

As has already been mentioned, algorithms to solve such problems do exist. However, because the above formulation assumes that the performance of a given design against a particular criterion can be measured numerically, only a limited optimisation can take place using this approach.
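
Because the constraint forces exactly one design to be chosen, the formulation reduces in practice to scoring each candidate design against the weighted criteria and selecting the highest score. The Python sketch below assumes illustrative weightings w_i and performance scores p_ij.

    # Pick the design j maximising sum_i w_i * p_ij, given that exactly one design is chosen.
    # Weightings and scores are illustrative assumptions.
    criteria_weights = {"online response": 5, "loadability": 4, "flexibility": 2}   # w_i

    design_scores = {                                 # p_ij, one row per candidate design
        "fully normalised": {"online response": 2, "loadability": 3, "flexibility": 5},
        "denormalised":     {"online response": 5, "loadability": 4, "flexibility": 2},
    }

    def weighted_score(scores):
        return sum(criteria_weights[c] * scores[c] for c in criteria_weights)

    best = max(design_scores, key=lambda d: weighted_score(design_scores[d]))
    print(best, weighted_score(design_scores[best]))   # -> denormalised 45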

A Heuristic Approach To Optimising Database Design

Kiviat graphs [Kolence and Kiviat (1973)] were first introduced in 1973 for use in the performance evaluation of computer systems. Kiviat proposed plotting semi-axes radiating from a point, called a pole. The performance of various aspects of a computer system could then be plotted along the axes using predetermined scales, and the points joined together to form a polygon. The number of semi-axes used was completely arbitrary and they were generally selected according to the completeness of the data required on the system under assessment. A common version involved plotting an even number of variables, half of which would ideally be maximised and half of which would ideally be minimised, and alternating the axes for those seen as positive with those seen as negative; a well balanced system would then produce a star-shaped graph. Systems tended to produce characteristic shapes [Ferrari, et al. (1983)] and the skilled system manager could recognise these and apply appropriate corrections. A new Kiviat graph could then be drawn to see the `before' and `after' effect (see Figure 1).

Figure 1: Example Kiviat graphs of (i) a well-utilised system and (ii) an un-tuned system

As stated above, the number of axes, the scales used, and what data is plotted on them are entirely arbitrary. It is therefore possible to use them to plot the criteria associated with a particular database design. This allows the designer to make proposals about possible design options, based on the known potential effects of such options and the usage analysis which needs to be carried out. Moreover, by producing a Kiviat graph first it may be possible to rule out particular performance enhancement techniques in those situations where they would have a detrimental effect, and to propose alternative designs which may not previously have been considered worthwhile. Before this can be fully accomplished, the designer will need to build up sufficient knowledge of the interpretation of the shapes of Kiviat graphs to understand their full meaning in any given database implementation. As such the approach is by nature heuristic and imprecise.

The process which is proposed is firstly to construct a Kiviat graph based on the relative importance of a range of criteria which can be applied to the database design. These criteria would come from the list previously discussed and should be as complete as possible. Other criteria may be added in any particular situation, but care should be taken in interpretation if this is done. This graph will be known as the `user choice graph'.

Having plotted the user choice graph, the designer then needs to turn his/her attention to each possible design approach. For each enhancement option, performance estimates of those criteria which can be easily quantified such as query response time should be plotted on a second Kiviat graph, along with `quantity estimates' of the performance of non-quantifiable criteria, such as security. The scale chosen for each individual criterion should be such that the maximum expected value for each one is equidistant from the origin. This second Kiviat graph is known as the `performance estimate' graph.
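
Plotting the two graphs is straightforward with any charting library. The following is a minimal sketch using Python and matplotlib; the criteria chosen, the user weightings and the performance estimates are illustrative assumptions, scaled so that the maximum on each axis is equidistant from the pole.

    # Plot a 'user choice' graph and a 'performance estimate' graph as Kiviat (radar) diagrams.
    # Criteria, weightings and estimates below are illustrative assumptions on a 0..1 scale.
    import numpy as np
    import matplotlib.pyplot as plt

    criteria = ["query response", "loadability", "availability",
                "flexibility", "batch window", "security"]
    user_choice = [0.9, 0.7, 0.6, 0.3, 0.5, 0.4]     # relative importance to the user
    performance = [0.8, 0.8, 0.5, 0.4, 0.6, 0.4]     # estimated performance of one design option

    angles = np.linspace(0, 2 * np.pi, len(criteria), endpoint=False)
    angles = np.concatenate([angles, angles[:1]])    # close the polygons
    user = user_choice + user_choice[:1]
    perf = performance + performance[:1]

    ax = plt.subplot(polar=True)
    ax.plot(angles, user, label="user choice")
    ax.plot(angles, perf, label="performance estimate")
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(criteria)
    ax.legend()
    plt.show()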

An analysis of the shape of the `user choice graph' will reveal whether the user seeks to maximise those criteria which would generally be positively affected by a particular performance enhancement technique such as indexing or denormalisation, and whether those criteria which would usually be negatively affected by such a technique carry less weighting. The shape could potentially also reveal whether some types of denormalisation, e.g. table consolidation, are more appropriate than others, e.g. table separation. If the shape of the Kiviat graph suggests that some form of performance enhancement technique may produce an overall improvement in performance then other steps should follow.

Figure 2: Kiviat graphs of user criteria: (i) a shape indicating that the maximum amount of denormalisation should be possible; (ii) a shape indicating that care should be taken to avoid adversely affecting those criteria causing the bulge at the top of the graph

It is then possible to compare the shape of the original user choice Kiviat graph to that produced by each design option. The intention is to try to get the two graphs to be similar shapes, maximising those criteria the user wants maximised, and minimising those criteria he/she wants minimised.

If, on the performance estimate graph, the value of one or more of the criteria to be maximised falls outside the corresponding value on the user choice graph, or the value of a criterion to be minimised lies inside it, then the expected performance of the design option in question tends to exceed that required by the user and the option can be chosen. If, on the other hand, the values of the `maximise' criteria lie inside the user choice values, or the `minimise' criteria lie outside them, then the design in question does not perform as well as required and an alternative design should be chosen.
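
Expressed over the plotted values, this comparison is a simple axis-by-axis test; a minimal sketch in Python, assuming each axis is flagged as one to be maximised or minimised:

    # An option is acceptable if its estimate falls outside the user choice value on every
    # 'maximise' axis and inside it on every 'minimise' axis. Values are illustrative.
    def option_acceptable(user_choice, estimate, maximise):
        """user_choice, estimate: one value per axis; maximise: True/False per axis."""
        return all(e >= u if m else e <= u
                   for u, e, m in zip(user_choice, estimate, maximise))

    print(option_acceptable([0.9, 0.3], [0.95, 0.2], [True, False]))   # -> True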

Analysing Appropriate Criteria and Assigning Weightings

The criteria used to produce the Kiviat diagrams in Figure 2 were taken from the results of the user survey. They are plotted in such a way that those criteria likely to be positively affected by a particular design tactic, such as denormalisation, alternate with those which are likely to be adversely affected. Clearly this choice of performance enhancement technique is itself arbitrary, and alternative orderings of the criteria will produce different shaped graphs. The key issue, however, is that a database designer who consistently uses the same ordering will be able, over a period of time, to build up a collection of graph shapes sufficient to allow heuristic design decisions to be made.

The list of criteria plotted on the axes in Figure 2 is described in Table 3.

Table 3: Mapping of Kiviat Axes to Design Criteria

Axis   Criterion
1      Batch processing support
2      Re-organisability
3      On-line query response time
4      On-line update response time
5      Loadability
6      Maintainability
7      Availability
8      On-line delete response time
9      Overnight batch processing window
10     Cost of storage
11     Privacy/security
12     On-line insert response time
13     Cost of processing
14     Programmability
15     Recovery time
16     Future flexibility
17     Weekend processing window
18     Software compatibility

The task of assigning weightings to a given set of criteria can be performed by questioning the user and by determining the relative frequency of particular types of transaction. However, the measurement (or estimation) of performance against any given criterion, e.g. programmability, may be more difficult. Some criteria, such as on-line response time, recovery time, etc., are easily quantified, and it is therefore possible to assess this subset of criteria using the mathematical model described. The criteria used in any given situation can therefore vary, depending on which ones the designer chooses to quantify.

References

1.   E Downs, P Clare & I Coe (1992)

`Structured Systems Analysis and Design Method' 2nd Edition. Prentice-Hall

2.   E. F. Codd (1970)

`A Relational Model for Large Shared Data Banks' CACM 13 No 6 (June 1970)

3.   E. F. Codd (1971)

`Normalized Data Base Structure: A Brief Tutorial.'  Proc 1971 ACM SIGFIDET Workshop on Data Description, Access and Control San Diego, Calif. (November 1971)

4.   E. F. Codd (1972)

`Further Normalization of the Data Base Relational Model' in `Data Base Systems' Courant Computer Science Symposia Series Vol 6 Englewood Cliffs, N.J. Prentice Hall (1972)

5.   B W Bennetto & M C O Todd (1993)  

`Denormalisation - An Optimal Database Design Tactic' Procs of the Annual Conference of the United Kingdom Computer Measurement Group, Birmingham, June 1993

6.   M C O Todd (1996)

`Improving Relational Database Performance Through Denormalised Design' Unpublished Open University MSc Dissertation

7.   R J Hamilton (1994)

Personal Communications, June 1994

8.   B Kolman & RE Beck (1980)

`Elementary Linear Programming with Applications' Academic Press, New York

9.   S Martello & P Toth (1990)

`Knapsack Problems: Algorithms and Computer Implementations' Wiley, Chichester, UK

10.   M M Syslo, N Deo & J S Kowalik (1983)

`Discrete Optimization Algorithms' Prentice-Hall, Englewood Cliffs, NJ

11.   K W Kolence & P J Kiviat (1973)

`Software Unit Profiles and Kiviat Figures' ACM SIGMETRICS Performance Evaluation Review 2(3), September 1973, pp. 2-12

12.   D Ferrari, G Serazzi & A Zeigner (1983)

`Measurement and Tuning of Computer Systems' Prentice-Hall, Englewood Cliffs, NJ

This paper was first presented in Coventry in May 1997.

Copyright © rests with Bernard W Bennetto,  Matthew C O Todd & Technology Translation Limited
