MSc-IT Study Material
January 2011 Edition

Computer Science Department, University of Cape Town
| MIT Notes Home | Edition Home |

An example in constructing a data-flow diagram

Just as for logical data structures, to make a data-flow diagram we must analyse the requirements and describe the system in terms of the components of data-flow diagrams. Several attempts will be needed before a final and complete model of the system can be produced.

Unfortunately, there is no straightforward way to progress through developing a data-flow diagram. We are aiming therefore to build a skeleton model on paper which we can then work with and develop more fully. Each stage will be illustrated with examples from the following Estate Agent case study (repeated from Chapter 4, An Introduction to Analysis and Design).

Estate Agency case study

Clients wishing to put their property on the market visit the estate agent, who will take details of their house, flat or bungalow and enter them on a card which is filed according to the area, price range and type of property.

Potential buyers complete a similar type of card which is filed by buyer name in an A4 binder.

Weekly, the estate agent matches the potential buyer's requirements with the available properties and sends them the details of selected properties.

When a sale is completed, the buyer confirms that the contracts have been exchanged, client details are removed from the property file, and an invoice is sent to the client. The client receives the top copy of a three part set, with the other two copies being filed.

On receipt of the payment the invoice copies are stamped and archived. Invoices are checked on a monthly basis and for those accounts not settled within two months a reminder (the third copy of the invoice) is sent to the client.

Identify the system boundaries

The easiest place to making a data-flow model of a system is to identify what the external entities of the system are and what inputs and outputs they provide. These give you the boundary between the system and the rest of the world.

External entities must provide inputs or receive outputs. There are usually one or two which stand out as obviously interacting with the system but not being part of the system. In the Estate Agent system, Client and Buyer stand out as good candidates for external entities. Others may be harder to spot, but once again consider nouns in the case study and add them to a list of possible external entities.

It may be tempting to add Estate Agent as an external entity as it obviously interacts with the system. However, the estate agent is in fact part of the system in that he or she manipulates the data within the system. Another way to think about it is that the estate agent will actually be replaced by the new software system and so does not need to appear in the data-flow diagram.

From the list of candidates of external entities, determine what inputs they provide and what outputs they receive. If a candidate entity does not seem to provide data into the system or receive data from the system then it is not an external entity and can be discounted (for now).

An external entity stands for the type of thing interacting with the system so all clients and all buyers are represented by the Client and Buyer external entities.

Having identified the external entities there are two ways of progressing from here. Both are equally sensible approaches and are covered in the next two subsections.

Follow inputs

Each input to the system must be received by a process. This gives us a natural way to start building up the model.

First, take one of the more significant external entities and one of the main inputs it provides. In our case, a Client providing Property Details is a good place to start. Draw an oval for the entity, a data-flow for the input and a process which receives the input. From the case study there should something that suggest what happens when this data comes in and this will be the name of the process. For Property Details, the case study says that the estate agent enters the details on a card and files them. So the process name should be either Record Details or Receive Details.

Every process must have at least one output, so, for the process in hand, consider what the outputs must be and put labelled arrows on the diagram for the outputs. The data must be changed by a process and so should have a different name from the input data. Property Details are taken and recorded as a property on the file so the output could be just something like Recorded Property or more simply, Property.

Now start again only using this output as a new input. It must either go to another process, to a data store or to an external entity. It should be clear from the case study what happens.

If a new process is needed then do the same again. Find a sensible name for the process using the case study, determine and label the outputs and then follow the outputs.

If the data is stored then add a data store to the diagram, name it sensibly from the case study and draw the output arrow going into the data store. This is what happens in Property and so we add the data store Properties.

If the output is an output from the system then simply add the external entity which receives the output.

When the data is finally output or comes to rest in a data store, go back and follow any of the other outputs which may have been defined on the way. When they are exhausted, choose a new input and follow that through in exactly the same way.

Follow events

Another way to approach building up a data-flow model is to consider what happens in the system. The case study will outline a number of events. There must be processes in the system which respond to these events or even make them happen. Identify these processes and then add the data inputs which are used by the process and determine the outputs.

For example, in the estate agent example, there is the phrase, When a sale is completed.... This is an event: a sale is completed. From the case study we see that lots of things then happen: the buyer confirms exchange of contracts so this is an input to some process; the client details are removed from the file and invoice is sent out. This is the process. A sensible name might be Record Sale or possibly Receive Sale Confirmation. The data needed is the input from the Buyer and Client details which are on file. This must mean there is a data store somewhere on the diagram holding this information. If there is not one there already then add it. And the output must be an invoice to the Client.

From here on the approach is the same as following inputs. For any new outputs, work out where those outputs must go and if it is to a process follow them as if they were inputs to the new process.

Most processes can be found in the case study using either technique of following inputs or following events. However, some processes are related to temporal events and so can only be found by following events.

As the name suggests, temporal events are events which occur at specific times. They are not prompted to happen by the arrival of new data, but rather because a certain time has been reached. These events often appear in case studies beginning with phrases such as, Once a month... or, At the end of every day,.... However, once these have been identified, producing the model by following this event is exactly the same as for any other event.

In the estate agent system, there are two temporal events: there is a weekly matching of potential buyers with properties; invoices and reminders are sent out on a monthly basis.

Though time is the trigger the processes carrying out temporal events, time is generally not shown on the data-flow diagram. This is because the time aspect is often just a practical implementation rather than rigid necessity. For example, the matching of buyers and properties at the estate agents need not be weekly. It is probably done weekly so that it always gets done, and also so that it does not interrupt the other daily business. With an automated system it may be possible to match buyers with properties as soon as any new details on either arrive.

Where time is crucial to a process, say accounting done at the end of a financial year, then this can be reflected in the name of the process. For example, Calculate end of year profits.

Fill in gaps

After building a model that handles each input or each event, it is worth going over the processes defined so far.

For each process, ask the question, Does this process have all the information it needs to perform its task? For instance, if a process sends out invoices, does it have all the details of the invoice and the address of where the invoice should go? If the answer is No, then add a data-flow into the process which consists of the data needed by the process. If there are several, clearly distinct items of data needed, then you may need an arrow for each item. Now try to identify the source of the data.

First, see if the data can be found already inside the system, either on a data store or as a result of a process. If not, it may be that the data can be obtained by processing some of the existing data in, which case add a new process that takes the existing data and makes the data you require. Or, the data may be available but from the case study it is clear that there is a time-lag between the process that produces the data and the process which uses it. Simply add a data store where the data can reside till it is needed.

If there is still no source for the data then it could be from an external entity. In which case, this is a new input to the system. It may not be explicitly mentioned in the case study, but if it is necessary then it should be added. Having added the new input from the appropriate entity, go back and correct the context diagram.

This is an important task. If there is not enough data to support a task then the system will not function properly. Of course, to be on the safe side, you could have all the data going to all the processes! But this is not really a solution because with a large system this would be impractical.

Having examined all the processes, check that all the outputs have been generated. All of the inputs should have been covered already, but this does not mean that all the outputs have been produced. If there is still an output which does not appear on the diagram, see if there is a process where it could come from. If there is no sensible candidate, add a process and begin to work backwards. What inputs does the new process need? Where do these inputs come from? This task is almost the same as the one just described.

Any left over outputs must have come from a process. Outputs cannot come from data stores or external entities. If there is no sensible way to fit the output into the diagram then it may be that it is not a sensible output for the system you are currently considering. Use the case study to confirm this.

Finally, check the data stores. Data must enter a data store somehow and generally data on a data store is read. For each data store, identify when the store is either written or read by considering the processes which may use the data. Also, use the case study to see that you have not missed any arrows to or from a data store.


By this stage, you will have considered all the inputs, all the outputs and produced a first draft of the data-flow model of the system.

Review the case study, looking for functionality described which is not performed by the model. In particular, look for temporal events as these are sometimes hidden implicitly in the text.

Where necessary, add new processes that perform the omitted functions and use the method of following events to work out their inputs and outputs. Fill in the gaps of the model in exactly the same way as was done to produce the first attempt.

The model can be declared finished when you have considered every word in the case study and decided that it is not relevant or that it is incorporated in some way into the model.