Web Servers


There are lots of different types of servers: FTP servers, gopher servers and, of course, Web servers. In this section we will concentrate on Web servers.

What do Web servers need to do? Their functionality can be summarised as follows:

They need to provide HTML pages (with an appropriate MIME type header).
They need to provide other types of documents (also with an appropriate MIME type header).
They may need to process information from the user. For instance, if the user submits information to the site, the Web server must either process and store that information, or pass it on to another programme which can do so.
They may need to supply dynamic data (for example, in response to user-supplied information).
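
The first two points can be illustrated with a minimal sketch of how a server might attach a MIME type header to a document before sending it. The function name `build_response` is hypothetical, and the response is simplified to a status line and two headers; a real server does considerably more.

```python
import mimetypes


def build_response(filename: str, body: bytes) -> bytes:
    """Build a simplified HTTP response whose Content-Type matches the file.

    Hypothetical helper for illustration only.
    """
    # Guess the MIME type from the file extension, e.g. ".html" -> "text/html".
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        # Fall back to a generic binary type when the extension is unknown.
        mime_type = "application/octet-stream"
    header = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {mime_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return header.encode("ascii") + body
```

Calling `build_response("index.html", b"<html></html>")` yields a response containing the header `Content-Type: text/html`, while a file such as `logo.png` would be labelled `image/png`.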

Processing user information and supplying dynamic data is complex. Many servers do not provide this facility. While complex to implement, it does make the server more dynamic and useful.

User information can be processed on the server using server-side programmes called CGI (Common Gateway Interface) scripts. Many other server-side technologies also exist, e.g. Java Servlets and PHP.

The server passes the user's data to the CGI programme, which then processes it. This programme may dynamically create an HTML page to be sent back to the client, just as a standard HTML file stored on the server would be. (Note, this will be discussed further in the units on JavaScript.)
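
As a rough sketch of what such a programme does, the function below takes the user's data as a URL-encoded query string and returns a dynamically created HTML page preceded by a MIME type header. The function name `handle_form` and the form field `name` are assumptions made for this illustration; in a real CGI set-up the server would supply the query string through the environment and read the programme's output.

```python
import html
from urllib.parse import parse_qs


def handle_form(query_string: str) -> str:
    """Return a CGI-style response (header, blank line, HTML body).

    Hypothetical handler: it greets the user by the submitted "name" field.
    """
    # Decode "name=Ada&x=1" into {"name": ["Ada"], "x": ["1"]}.
    fields = parse_qs(query_string)
    name = fields.get("name", ["stranger"])[0]
    # Escape the user's input so it cannot inject markup into the page.
    body = f"<html><body><p>Hello, {html.escape(name)}!</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body
```

For example, `handle_form("name=Ada")` produces a page containing `Hello, Ada!`, and an empty query string produces `Hello, stranger!`.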

Distributed Processing

Client-server computing is concerned with distributing the load of information and processing. Until about 20 years ago, most information was stored on one computer — the same computer on which all the processing was done. The only reason an extra copy of the data might have been kept on another machine was for security or backup purposes. If many people needed either the data or the processing, they would get another computer and copy the data.

With client-server computing, a given machine can act both as a client and as a server; that is, it can run both a Web server and a browser client. It can also run processes (i.e. programmes) on other machines. Network technology has enabled this distribution of processing and data.

The goal of distributing processing is to reduce the overall time that is needed to process some information. For example, consider this: one machine (named A) is connected to two other machines, B and C. If there are three processes to run, they can all run on A. If each requires 10 seconds of processor time in order to complete, then it will take a total of 30 seconds of user time to run the processes on one machine. But if B and C are each asked to run a process as well (so that now three machines are being used), then the total processing time has been distributed, and while it still takes 30 seconds of processor time to complete the work, it only takes 10 seconds of user time. It is therefore three times faster.
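
The arithmetic above can be sketched in a few lines. The function name `elapsed_time` is hypothetical, and the sketch assumes that processes are dealt out to machines in turn, that each machine runs its processes one after another, and that communication between machines costs nothing.

```python
def elapsed_time(run_times, machine_count):
    """User (wall-clock) time when processes are shared among machines.

    Assumes round-robin assignment and no communication cost.
    """
    loads = [0] * machine_count
    for index, seconds in enumerate(run_times):
        # Deal the processes out to the machines in turn.
        loads[index % machine_count] += seconds
    # The machines work in parallel, so the busiest one sets the user time.
    return max(loads)


run_times = [10, 10, 10]                    # three processes, 10 seconds each
processor_time = sum(run_times)             # 30 seconds of processor time
one_machine = elapsed_time(run_times, 1)    # 30 seconds of user time
three_machines = elapsed_time(run_times, 3) # 10 seconds of user time
```

The processor time stays at 30 seconds in both cases; only the user time changes.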

However, there is an additional cost that was overlooked in the above paragraph. If A has to ask B to run a process, some communication time between the machines is required. For instance, just sending a message takes a certain amount of time, and this assumes that computer B already has the necessary data and programmes to run the process. If not, A may have to send the data and possibly the programme as well. Additionally, time is also required for B to send the results of the processing back to A. (The same is true for Machine C as well.)

For simplicity's sake, let us say that sending the data and the results each takes one second. In the first second, A sends the data to both B and C, and A starts processing. In the following second, B and C begin processing. At the tenth second, A finishes its processing. At second 11, both B and C finish processing their data and send their responses to A. In second 12, A receives the data and everything is completed. The total time to run the three processes is 12 seconds.
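
The same timeline can be written as a small calculation. The function name `elapsed_with_transfer` is hypothetical, and the sketch assumes, as the paragraph above does, that A can send the data to every remote machine during the same second and starts its own process immediately.

```python
def elapsed_with_transfer(run_time, send_time, return_time, remote_count):
    """User time when one local machine shares work with remote machines.

    Assumes the local machine starts at once and that all remote machines
    receive their data in parallel.
    """
    local_finish = run_time
    if remote_count == 0:
        return local_finish
    # A remote machine must receive the data, run, then return the result.
    remote_finish = send_time + run_time + return_time
    return max(local_finish, remote_finish)


total = elapsed_with_transfer(run_time=10, send_time=1, return_time=1,
                              remote_count=2)   # 12 seconds
speed_up = 30 / total                           # 2.5 times faster, not 3
```

Once communication is counted, the speed-up over a single machine drops from 3 to 2.5.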

Now try some process balancing in the following exercise.

Exercise 6

Below is a list of processes (P1-P6) and the computers (A-E) on which their data currently resides. Each process will output some result after a given amount of execution time has passed (as listed below). A process can only execute on a computer which contains all of its data. The amount of data each process requires (in megabytes) is also given. Note that in some cases the data is already present on multiple computers. This data may be transferred to other machines at the rate of 1 MB per second. After the data has been transferred, the process may then run on that machine. The computed results may also be transferred to another computer, taking one second of time. All the machines are directly connected to each other and are otherwise identical. Each computer can run only one process at a time, but after a process completes it may execute another.

Of the five computers, computer A wants the results from four of the processes: P1, P2, P3 and P4. Computer B wants the results from P5 and P6, and computers C, D, and E are essentially idle, wanting no results from any of the processes.

Programme	Run Time	Location of Data	Size of Data
P1	5 seconds	A	8 MB
P2	6 seconds	D and C	4 MB
P3	7 seconds	A and C	5 MB
P4	8 seconds	C	12 MB
P5	9 seconds	A and E	2 MB
P6	10 seconds	B	3 MB

Come up with five different ways of distributing the processing, and work out the total user time for each. For example:
- machine A runs P1 (5 seconds)
- machine B runs P6 (10 seconds)
- machine C runs P3 (7 seconds plus one second to transfer the answer back to A, for a total of 8 seconds)
- machine D runs P4 (12 seconds to send the data from machine C, 8 seconds for processing, and 1 second to send the answer for a total of 21 seconds)
- machine E runs P5 (9 seconds to run and 1 second to transfer the answer to B for a total of 10 seconds)
- After running P3, machine C also runs P2 (6 seconds plus one for transfer for a total of 7), bringing the amount of time that machine C is occupied for up to 15 seconds.
- The longest time taken is for process P4, which takes 21 seconds to complete. This means that the total time for obtaining all the required results is only 21 seconds.
What is the least amount of time required to execute all six processes and send their results to the machines which want them? Is it possible to complete all of this work in less than 12 seconds?
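
The example distribution above can be checked mechanically, and the same sketch can be used to try out your own distributions. The names `PROCESSES`, `task_time` and `schedule_time` are hypothetical. The sketch assumes that a machine is occupied while it receives data and while it sends a result, which is how the example counts machine C's 15 seconds.

```python
# name: (run time in seconds, machines holding the data, data size in MB,
#        machine that wants the result)
PROCESSES = {
    "P1": (5, {"A"}, 8, "A"),
    "P2": (6, {"C", "D"}, 4, "A"),
    "P3": (7, {"A", "C"}, 5, "A"),
    "P4": (8, {"C"}, 12, "A"),
    "P5": (9, {"A", "E"}, 2, "B"),
    "P6": (10, {"B"}, 3, "B"),
}


def task_time(process, machine):
    """Seconds a machine is occupied by one process, transfers included."""
    run, holders, size_mb, wanted_by = PROCESSES[process]
    fetch = 0 if machine in holders else size_mb   # 1 MB per second
    deliver = 0 if machine == wanted_by else 1     # 1 second per result
    return fetch + run + deliver


def schedule_time(schedule):
    """Total user time; schedule maps each machine to its ordered processes."""
    return max(sum(task_time(p, m) for p in procs)
               for m, procs in schedule.items())


EXAMPLE = {"A": ["P1"], "B": ["P6"], "C": ["P3", "P2"],
           "D": ["P4"], "E": ["P5"]}
example_total = schedule_time(EXAMPLE)   # 21 seconds, set by P4 on machine D
```

Changing the lists in `EXAMPLE` gives the total user time for a different distribution under the same assumptions.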

Discussions and answers can be found at the end of the chapter.

Review

Do Review Questions 5 and 6.
