Apache Web Server
• It is stable, flexible, and secure.
• It is used, backed, and supported by several major sites and organizations.
• The entire program and related components are open source.
• It works on most variants of Linux/UNIX and Microsoft Windows.
We’ll walk through the process of installing and configuring the Apache HTTP server on a Linux server in this chapter. But before we get into the steps necessary to configure Apache, let’s review some of the fundamentals of HTTP as well as some of the internals of Apache, such as its process ownership model. This information will help you understand why Apache is set up to work the way it does.
HTTP traffic makes up a significant portion of the world’s Internet traffic, and Apache is a server implementation of HTTP. Applications such as Firefox, Chrome, Opera, Curl, wget, Edge, Safari, and Internet Explorer are client implementations of HTTP.
As of this writing, HTTP is at version 2, but HTTP version 1.1 is still by far the most widely used version of the protocol out there. HTTP/1.1 is documented in RFCs 7230 through 7235.
When a web client connects to a web server, the client’s default method of making this connection is to contact the server’s TCP port 80. Once connected, the web server says nothing; it’s up to the client to issue HTTP-compliant commands (also called verbs or methods) for its requests to the server. Along with each command comes a request header that includes information about the client. For example, when using the Firefox browser (the client) on a Linux box, a web server might receive the following information from the client:
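A minimal sketch of such a request, built by hand in the shell. The function name build_request and the header values (the User-Agent string and the Accept list) are illustrative assumptions, not what Firefox actually sends:

```shell
# Emit a minimal HTTP/1.1 GET request; every header line ends in CRLF.
# Header values below are illustrative, not what a real browser sends.
build_request() {
  host=$1
  path=$2
  printf 'GET %s HTTP/1.1\r\n' "$path"      # request line: method, path, version
  printf 'Host: %s\r\n' "$host"             # names the site being requested
  printf 'User-Agent: example-client/1.0\r\n'
  printf 'Accept: text/html\r\n'
  printf 'Connection: close\r\n'
  printf '\r\n'                             # blank line ends the request header
}

build_request localhost /
```

To send the request to a running server, the output can be piped to a tool such as nc, for example build_request localhost / | nc localhost 80 (this assumes nc is installed and a server is listening on port 80).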
The first line contains the HTTP GET command, which asks the server to fetch a file. The remainder of the information makes up the header, which tells the server about the client, the kind of file formats the client will accept, and so forth. Many servers use this information to determine what can and cannot be sent to the client, as well as for logging purposes.
Along with the request header, additional headers may be sent. For example, when a client follows a hyperlink to reach the server site, a Referer entry showing the address of the page that contained the link will also appear in the header.
When it receives a blank line, the server knows a request header is complete. Once the request header is received, the server responds with the actual requested content, prefixed by a server header. The server header provides the client with information about the server, the amount of data the client is about to receive, the type of data coming in, and other information. For example, a request like the one just described typically results in a response header that begins with a status line such as HTTP/1.1 200 OK, followed by fields such as Date, Server, Content-Length, and Content-Type.
A blank line and then the actual content of the transmission follow the response header.
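The role of the blank line can be sketched with a small filter that prints everything up to it. The function name response_headers and the sample response are made up for illustration:

```shell
# Print only the header part of an HTTP response read from stdin.
# Carriage returns are stripped first so the blank line is truly empty.
response_headers() {
  tr -d '\r' | sed -n -e '/^$/q' -e 'p'
}

# Sample response (made up for illustration) piped through the function.
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 13\r\n\r\n<p>Hello</p>\n' | response_headers
```

Against a live server, something like curl -si http://localhost/ | response_headers should show the real response header, assuming curl is installed and a web server is running locally.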
The default port for HTTP requests is port 80, but you can also configure a web server to use a different (arbitrarily chosen) port that is not in use by another service. This is one of the mechanisms for running multiple web servers or sites on the same host, with each server or site on a different port. Some sites also use this arrangement for multiple configurations of their web servers to support various types of client requests.
When a site runs a web server on a nonstandard port, you can see that port number in the site’s URL. For example, the web address www.example.com with the default port number (80) implicitly and explicitly displayed would read http://www.example.com and http://www.example.com:80, respectively. But serving the same site on a nonstandard port such as port 8080 will require the port number to be explicitly stated, as in http://www.example.com:8080.
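The implicit-versus-explicit port rule can be sketched as a small helper. The name url_port is hypothetical, and the sketch is deliberately simplified: it handles only http:// URLs and does not cope with bracketed IPv6 literals.

```shell
# Print the TCP port an http:// URL refers to (80 when none is given).
# Simplified sketch: does not handle https:// or bracketed IPv6 literals.
url_port() {
  hostport=${1#http://}       # drop the scheme
  hostport=${hostport%%/*}    # drop any path
  case $hostport in
    *:*) echo "${hostport##*:}" ;;   # explicit port after the last colon
    *)   echo 80 ;;                  # implicit default
  esac
}

url_port http://www.example.com          # prints 80
url_port http://www.example.com:80       # prints 80
url_port http://www.example.com:8080     # prints 8080
```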
Process Ownership and Security
Running a web server on a Linux platform follows the traditional Linux/UNIX permissions and ownership model. In terms of permissions, that means each process has an owner, and that owner has limited rights on the system.
Whenever a program (process) is started, it inherits the permissions of its parent process. For example, if you’re logged in as root, the shell in which you’re doing all your work has all the same rights as the root user. In addition, any process you start from this shell will inherit all of your permissions. Processes may give up rights, but they cannot gain rights.
NOTE There is an exception to the Linux inheritance principle. Programs configured with the SetUID bit do not inherit rights from their parent process, but rather start with the rights specified by the owner of the file itself. For example, the file containing the program su (/bin/su) is owned by root and has the SetUID bit set. If the user yyang runs the program su, that program doesn’t inherit the rights of yyang but instead will start with the rights of the superuser (root). To learn more about SetUID, see Chapter 6.
How Apache Processes Ownership
To carry out initial network-related functions, the Apache HTTP server must start with root permissions. Specifically, it needs to bind itself to port 80 so that it can listen for requests and accept connections. Once it does this, Apache can give up its rights and run as a non-root user (unprivileged user), as specified in its configuration files. Different Linux distributions may have varying defaults for this user, but it is usually one of the following: nobody, www, apache, wwwrun, www-data, or daemon.
Remember that when running as an unprivileged user, Apache can read only the files that the user has permissions to read.
Security is especially important for sites that use executable scripts such as Common Gateway Interface (CGI), PHP, or Python scripts. By limiting the permissions of the web server, you decrease the likelihood that someone can send a malicious and executable request to the server. The server processes and corresponding scripts can damage only what they can access. As the user nobody, the scripts and processes don’t have access to the same key files that the root user can access. (Remember that, by default, root can access everything, no matter what the permissions.)
Installing the Apache HTTP Server
Most Linux distributions have the Apache HTTP server software prepackaged as RPM, .deb, or other binaries, so installing the software is usually as simple as using the package management tool on the system. This section walks you through the process of obtaining and installing the program via RPM and Advanced Packaging Tool (APT). Mention is also made of installing the software from source code, if you choose to go that route. The actual configuration of the server covered in later sections applies to both classes of installation (from source or from a binary package).
On Fedora/RHEL/CentOS systems, the package that provides the Apache HTTP server is semi-intuitively named httpd-*.rpm. We’ll use a package manager (dnf, or yum on older releases) on our sample Fedora server to obtain and install the program.
To use dnf to install the program, run dnf -y install httpd as root.
To confirm that the software is installed, run rpm -q httpd.
For a Debian-based Linux distribution such as Ubuntu, the package that provides the Apache HTTP server (version 2) is more intuitively named “apache2.” You can use APT to install Apache by running apt-get -y install apache2 as root.
The web server daemon is automatically started after you install using apt-get on Ubuntu systems.
And that’s it! You now have Apache installed.
Part of what makes Apache so powerful and flexible is that its design allows for it to be extended through modules. Apache comes with many modules by default and automatically includes them in the default installation.
If you can imagine “it,” you can be almost certain that somebody has already written a module for it for the Apache web server. The Apache module application programming interface (API) is well documented, and if you are so inclined (and know how), you can probably write your own module for Apache to provide any functionality you want.
To give you some idea of what kinds of things people are doing with modules, visit http://modules.apache.org. There you will find information on how to extend Apache’s capabilities using modules. Here are some common Apache modules:
• mod_wsgi Provides a Web Server Gateway Interface (WSGI)–compliant interface for hosting Python-based web applications
• mod_authnz_ldap Provides support for authenticating users of the Apache HTTP server against a Lightweight Directory Access Protocol (LDAP) database
• mod_ssl Provides strong cryptography for the Apache web server via the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols
• mod_userdir Allows user content to be served from user-specific directories on the web server via HTTP
• mod_proxy Implements an extensible proxy/gateway/load-balancing interface for Apache when used in conjunction with some other modules (such as mod_proxy_ftp, mod_proxy_balancer, mod_proxy_http, and so on)
If you know the name of a particular module you want (and if the module is popular enough), you might find that the module has already been packaged in an RPM format, so you can install it using the usual RPM methods. For example, if you want to include the SSL module (mod_ssl) in your web server setup, on a Fedora/RHEL/CentOS system, you can have dnf download and install the module for you automatically by running dnf -y install mod_ssl as root.
Alternatively, you can go to the Apache modules project web site and search for, download, compile, and install the module that you want.
TIP Make sure the run-as user is there! If you build Apache from source, the sample configuration file (httpd.conf) expects that the web server will run as the user named daemon. Although that user exists on almost all Linux distributions, you may want to double-check the local user database (/etc/passwd) to make sure the user daemon does indeed exist.
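A quick way to make that check from the shell is sketched below. The helper name user_exists is hypothetical; it relies on id, which consults the same user database as /etc/passwd (plus any network sources the system is configured to use).

```shell
# Succeeds if the named account is known to the system.
user_exists() {
  id -u "$1" >/dev/null 2>&1
}

if user_exists daemon; then
  echo "user daemon exists"
else
  echo "user daemon is missing; create it or change the run-as user"
fi
```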
Starting Up and Shutting Down Apache
Starting up and shutting down Apache on most Linux distributions is easy.
To start Apache on any distro that refers to Apache as httpd and also uses the service utility, run service httpd start as root.
On modern Linux distributions running systemd, you can alternatively start the httpd daemon using the systemctl command: systemctl start httpd.
Debian-like systems like Ubuntu refer to the Apache binary as apache2, so you would instead start Apache on such distros by running systemctl start apache2.
To shut down Apache on an RPM-based distro like Fedora, run systemctl stop httpd.
On Ubuntu or Debian, you should instead run systemctl stop apache2.
After making a configuration change to the web server that requires you to restart Apache on an RPM-based distro like Fedora, run systemctl restart httpd.
TIP On a system running openSUSE or SLE (SUSE Linux Enterprise), the service is named apache2, so the commands to start and stop the web server, respectively, are systemctl start apache2 and systemctl stop apache2.
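Because the start, stop, and restart commands differ between distros only in the service name, a script can pick the name first. The following is a sketch under one assumption: systems that call the service apache2 keep their configuration under /etc/apache2, and all others use httpd. The function name apache_unit is hypothetical.

```shell
# Print the service name Apache uses on this system.
# Assumption: systems that name the service apache2 keep their
# configuration under /etc/apache2; all others are treated as httpd.
apache_unit() {
  etc_dir=${1:-/etc}          # optional alternate directory, handy for testing
  if [ -d "$etc_dir/apache2" ]; then
    echo apache2
  else
    echo httpd
  fi
}

unit=$(apache_unit)
echo "Apache service name on this host: $unit"
# With root privileges on a systemd host, the name can then be used as:
#   systemctl start   "$unit"
#   systemctl stop    "$unit"
#   systemctl restart "$unit"
```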
Starting Apache at Boot Time
After installing the web server, it’s reasonable to assume that you want the web service to be available at all times to your users; you will therefore need to configure the system to automatically start the service after each system reboot. Use the chkconfig utility on legacy Linux distros to configure the automatic startup of the web server service (for example, via chkconfig httpd on). Change httpd to apache2 on Debian-like systems in the following examples.
On systemd-enabled Linux distros, you can check if httpd (or apache2) is enabled for automatic startup by running systemctl is-enabled httpd.
If the output shows that the unit file for the service is disabled, you can make the httpd (or apache2) daemon automatically start up with system reboots by issuing the systemctl enable httpd command.
In older/legacy Ubuntu distros, you can use the update-rc.d utility to manage the runlevels in which Apache starts up, for example update-rc.d apache2 defaults.
Testing Your Installation
You can perform a quick test of your Apache installation by trying to browse or visit the default basic web site or home page that’s often bundled with Apache for testing purposes.
Use the status option with the systemctl command on systemd-aware systems to view a nice synopsis (cgroup information, child processes, and so on) of the Apache server status, for example systemctl status httpd.
On our sample Fedora system, Apache comes with a default page that gets served to visitors in the absence of a custom default home page (for example, index.html or index.htm). The file displayed to visitors when there is no default home page is /usr/share/httpd/noindex/index.html and is controlled by the /etc/httpd/conf.d/welcome.conf configuration file.
To find out if your Apache installation went smoothly, start a web browser and point it to the web site on your machine. To do this from the same system running the web server, simply type http://localhost (or the IPv6 equivalent, http://[::1]/) in the address bar of your web browser. You should see a simple Demo/Sample page showing the web server is working. If you don’t see this, retrace your Apache installation steps and make sure you didn’t encounter any errors in the process. Another thing to check if you can’t see the default web page is to make sure you don’t have any host-based firewall such as Netfilter (for example, managed through ufw; see Chapter 14) blocking access to the web server.
To quickly open up the HTTP port on a Fedora server, run firewall-cmd --add-service=http as root; add the --permanent option (followed by firewall-cmd --reload) to keep the rule across reboots.
Apache supports a rich set of configuration options that are sensible and easy to follow. This makes it a simple task to set up the web server in various configurations.
This section walks through a basic configuration. The default configuration is actually quite good and often works right out of the box, so if the default is acceptable to you, simply start creating your HTML documents! Apache allows several common customizations. After we step through creating a simple web page, you’ll see how to make those common customizations in the Apache configuration files.
Creating a Simple Root-Level Page
If you like, you can start adding files to Apache right away in the /var/www/html directory for top-level pages. Just remember to make sure that any files or directories placed in that directory are world-readable.
As mentioned earlier, Apache’s default web page is index.html. Let’s create the default home page so that it reads “Welcome to webserver.example.org.” One way is to write that text to the file from a root shell, for example echo "Welcome to webserver.example.org" > /var/www/html/index.html.
You could also use an editor such as emacs to edit the index.html file and make it more interesting.
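The page creation and the world-readable requirement can be combined in one small helper. This is a sketch: the function name make_homepage is hypothetical, and it is demonstrated against a scratch directory so that it can be tried without root privileges.

```shell
# Write a one-line home page into the given document root and make it
# world-readable so the unprivileged web server user can serve it.
make_homepage() {
  docroot=$1
  echo "Welcome to webserver.example.org" > "$docroot/index.html"
  chmod 644 "$docroot/index.html"
}

# Demonstrated against a scratch directory; on a real server the argument
# would be the document root (for example /var/www/html) and need root rights.
scratch=$(mktemp -d)
make_homepage "$scratch"
cat "$scratch/index.html"
```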
Apache Configuration Files
The configuration files for Apache are located in the /etc/httpd/conf/ directory on RPM-based distros like Fedora. The main configuration file is named httpd.conf on such systems.
On Debian-like systems, the main configuration file for Apache is instead named /etc/apache2/apache2.conf.
One good method to familiarize yourself with and learn more about the configuration files is to read the httpd.conf file. The default configuration file is heavily commented, explaining each entry, its role, and the parameters you can set.
Common Configuration Options
The default configuration settings work just fine right out of the box, and for basic needs, they may require no further modification. Nevertheless, site administrators may need to customize their web server and/or web sites further.
This section discusses some of the common directives or options used in Apache’s configuration file.
The ServerRoot directive specifies the base configuration directory for the web server. On Fedora, RHEL, and CentOS distributions, this value, by default, is the /etc/httpd/ directory. The default value for this directive in Ubuntu, openSUSE, and Debian Linux distributions is /etc/apache2/.
The Listen directive sets the port(s) on which the server listens for connection requests, typically the venerable port 80 (HTTP) for which everything good and bad on the web is so well known!
The Listen directive can also be used to specify the particular IP addresses over which the web server accepts connections. The default value for this directive is 80 for nonsecure web communications.
For example, to set Apache to listen on all its IPv4 and IPv6 interfaces on port 80, you would set the Listen directive to read Listen 80.
To set Apache to listen on a specific IPv6 interface (such as 2002:c0a8:1::) on port 8080, you would set the Listen directive to read Listen [2002:c0a8:1::]:8080.
For Debian-like systems such as Ubuntu, set this directive outside of the main configuration file. The value is usually set in the /etc/apache2/ports.conf file.
The ServerName directive defines the hostname and port that the server uses to identify itself. At many sites, a single (normally underutilized) server may fulfill multiple purposes and host several other services. For example, an intranet web server that isn’t getting heavy usage may also double up as an FTP server to serve the same files. In such a situation, a computer name such as “www” (with a corresponding fully qualified domain name [FQDN] of www.example.org) wouldn’t be a good choice because it suggests that the machine has only one purpose.
It’s better to give a server a neutral name and then establish Domain Name System (DNS) Canonical Name (CNAME) entries or multiple hostname entries in the /etc/hosts file. In other words, from the users’ perspective, you can define several names for accessing the server or services.
Consider a server whose real hostname is dioxin.eng.example.org. This server also doubles as a web server. You might be thinking of giving it the hostname alias www.sales.example.org. However, since dioxin will know itself only as dioxin, users who visit www.sales.example.org might be confused by seeing in their browsers that the server’s real name is dioxin.
Apache provides a way to get around this through the use of the ServerName directive. This works by allowing you to specify what you want Apache to return as the hostname of the web server to web clients or visitors.
The ServerAdmin directive sets the e-mail address that the server includes in error messages sent to the client.
It’s often a good idea, for a couple of reasons, to use an e-mail alias for a web site’s administrator(s). First, there might be more than one administrator. By using an alias, it’s possible for the alias to expand out to a list of other e-mail addresses. Second, if the administrator changes, only the alias needs updating, rather than every web page that might have the administrator’s e-mail address (incorrectly) hard-coded in it. The syntax is ServerAdmin followed by the address, for example ServerAdmin webmaster@example.org.
The DocumentRoot directive defines the primary directory on the web server from which HTML files will be served to requesting clients. On Linux distros such as CentOS, RHEL, Fedora, Ubuntu, and Debian, the default value for this directive is /var/www/html/. On openSUSE and SLE distributions, the default value for this directive is /srv/www/htdocs.
TIP On a web server that is expected to host plenty of web content, the file system pointed to by the DocumentRoot directive should have a lot of free space to house any current and future [anticipated] web content.
The MaxRequestWorkers directive (called MaxClients before Apache 2.4) sets a limit on the number of simultaneous requests that the web server will service.
The LoadModule directive is used for loading or adding other modules into Apache’s running configuration. It adds the specified module to the list of active modules.
The User directive specifies the user ID with which the web server will answer requests. The server process will initially start off as the root user but will later downgrade its privileges to those of the user specified here. The user should have only just enough privileges to access files and directories that are intended to be visible to the outside world via the web server. Also, the user should not be able to execute code that is not HTTP or web related.
On a Fedora system, the value for this directive is automatically set to the user named apache. In openSUSE Linux, the value is set to the user wwwrun. In a Debian-like system such as Ubuntu, the value is set to the user www-data (set via the $APACHE_RUN_USER environment variable).
The Group directive specifies the group name of the Apache HTTP server process. It is the group with which the server will respond to requests. The default value under the Fedora, CentOS, and RHEL flavors of Linux is apache. In openSUSE Linux, the value is set to the group www. In Ubuntu, the default value is www-data (set via the $APACHE_RUN_GROUP environment variable).
The Include directive allows Apache to specify and include other configuration files at runtime. It is mostly useful for organization purposes; you can, for example, elect to store all the configuration directives for different virtual domains in appropriately named files, and Apache will automatically know to include them at runtime.
Many of the mainstream Linux distros rely quite heavily on the use of the Include directive to organize site-specific configuration files and directives for the web server. Often, this file and directory organization is the sole distinguishing factor between Apache installation/setup among the different distros.
The UserDir directive defines the subdirectory name within each user’s home directory, where users can place personal content that they want to make accessible via the web server. This directory is usually named public_html and is often stored under each user’s home directory. This option is, of course, dependent on the availability of the mod_userdir module in the web server setup.
Here’s a sample usage of this option in the httpd.conf file: UserDir public_html
The ErrorLog directive defines the location where errors from the web server will be logged.
Example: ErrorLog /var/log/httpd/error_log
The LogLevel option sets the level of verbosity for the messages sent to the error logs. Acceptable log levels are emerg, alert, crit, error, warn, notice, info, and debug. The default log level is warn.
The Alias directive allows documents (web content) to be stored in any other location on the file system that is different from the location specified by the DocumentRoot directive. It also allows you to create abbreviations (or aliases) for path names that might otherwise be quite long.
The ScriptAlias option specifies a target directory or file as containing CGI scripts that are meant to be processed by the CGI module (mod_cgi).
Example: ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"
One of the most-used features of Apache is its ability to support virtual hosts. This makes it possible for a single web server to host multiple web sites as if each site had its own dedicated hardware. It works by allowing the web server to provide different, autonomous content, based on the hostname, port number, or IP address that is being requested by the client. This is accomplished by the HTTP protocol, which specifies the desired site in the HTTP header rather than relying on the server to learn what site to fetch from its IP address.
The VirtualHost directive is actually made up of two tags: an opening <VirtualHost> tag and a closing </VirtualHost> tag. It is used to specify the options that pertain to a particular virtual host. Most of the directives that we discussed previously are valid here, too.
Suppose, for example, that we wanted to set up a virtual host configuration for a host named www.another-example.org. To do this, we can create a VirtualHost entry in the httpd.conf file (or use the Include directive to specify a separate file). A minimal entry opens with <VirtualHost *:80>, sets ServerName to www.another-example.org along with directives such as DocumentRoot and ErrorLog, and closes with </VirtualHost>.
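One way to produce such an entry is sketched below as a shell helper that writes the definition to a file of your choosing. The function name write_vhost is hypothetical, and the ServerAdmin address, DocumentRoot, and ErrorLog paths are placeholder assumptions rather than values from any stock configuration.

```shell
# Write a minimal name-based virtual host definition to the given file.
# ServerAdmin, DocumentRoot, and ErrorLog values are hypothetical placeholders.
write_vhost() {
  cat > "$1" <<'EOF'
<VirtualHost *:80>
    ServerName www.another-example.org
    ServerAdmin webmaster@another-example.org
    DocumentRoot /var/www/another-example.org
    ErrorLog /var/log/httpd/another-example.org-error_log
</VirtualHost>
EOF
}

# Demonstrated against a scratch file so it can be tried without root rights.
conf=$(mktemp)
write_vhost "$conf"
cat "$conf"
```

On a real server the target would be a file that the main configuration pulls in through Include, and running apachectl configtest afterward is a reasonable way to check the syntax before reloading.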
On Debian-like distros, you can use another set of utilities (a2ensite and a2dissite) to enable or disable virtual hosts and web sites quickly under Apache.
For example, assuming while on an Ubuntu server we created the previous configuration file named www.another-example.org for the virtual web site and stored the file under the /etc/apache2/sites-available/ directory, we can enable the virtual web site by running a2ensite www.another-example.org. (On current Debian and Ubuntu packages of Apache 2.4, the file itself must carry a .conf suffix, as in www.another-example.org.conf.)
Similarly, to disable the virtual site, we can run a2dissite www.another-example.org.
After running either of the previous commands (a2ensite or a2dissite), you should make Apache reload its configuration files by running systemctl reload apache2.
Finally, don’t forget that it is not enough to configure a virtual host using Apache’s VirtualHost directive; the value of the ServerName option in the VirtualHost container must be a name that is resolvable via DNS (or any other means) to the web server machine.
NOTE Apache’s options/directives are too numerous to all be covered in this section. However, the software comes with its own extensive online manual, which is written in HTML so that you can access it in a browser. If you installed the software via RPM, you might find that documentation for Apache has been packaged into a separate RPM binary, and as a result, you will need to install the proper package (for example, httpd-manual) to have access to it. If you downloaded and built the software from source code, you will find the documentation in the manual directory of your installation prefix (for example, /usr/local/httpd/manual). Depending on the Apache version, the documentation is available online at the project’s web site at http://httpd.apache.org/docs/.
The process of changing various configuration options (or even the initial installation) sometimes may not work as smoothly as you’d like. Thankfully—and with other things being equal—Apache does an excellent job at reporting in its error log file why it failed or what is failing.
The error log file is located in your logs directory. If you are running a stock Fedora or RHEL-type installation, this is in the /var/log/httpd/ directory. If you are running Apache on a stock Debian- or Ubuntu-type distro, this is in the /var/log/apache2/ directory.
The access_log file is simply that—a log of which files have been accessed by people visiting your web site(s). It contains information about whether the transfer completed successfully, where the request originated (IP address), how much data was transferred, and what time the transfer occurred. This is a powerful way of determining the usage of your site.
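As a sketch of this kind of usage analysis, the following helper counts requests per client. It assumes the common or combined log format, in which the first field of each line is the client address. The function name top_clients and the sample log lines are made up for illustration.

```shell
# List the clients with the most requests in an access log.
# Assumes the common/combined log format, where the first field of each
# line is the client address. Second argument: how many clients to show.
top_clients() {
  awk '{ print $1 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Made-up sample lines, written to a scratch file for demonstration.
log=$(mktemp)
printf '%s\n' \
  '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 45' \
  '192.0.2.10 - - [10/Oct/2023:13:55:40 +0000] "GET /a.html HTTP/1.1" 404 196' \
  '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET / HTTP/1.1" 200 45' > "$log"
top_clients "$log"
```

On a real server the argument would be the access log itself, for example /var/log/httpd/access_log on a Fedora-type system.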
The error_log file contains all the errors that occur in Apache. Note that not all errors that occur are fatal; some are simply problems with a client connection from which Apache can automatically recover and continue operation. However, if you started Apache but still cannot visit your web site, take a look at this log file to see why Apache might not be responding. The easiest way to see the most recent error messages is by using the tail command, for example tail -n 10 /var/log/httpd/error_log.
If you need to see more log information than that, simply change the number 10 to the number of lines you need to see. If you would like to view the errors or logs in real time as they are being generated, you can use the -f option for the tail command. This provides a valuable debugging tool, because you can try things out with the server (such as requesting web pages or restarting Apache) and view the results of your experiments in a separate virtual terminal window. The tail command with the -f switch looks like this: tail -f /var/log/httpd/error_log.
This command will constantly tail the logs until you terminate the program.
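When the log is busy, it can help to filter for the more serious entries only. The sketch below assumes the stock error log format, in which the level appears in brackets either as [module:level] (Apache 2.4) or as [level] (older releases). The function name serious_errors and the sample lines are made up for illustration.

```shell
# Print only error-log lines logged at level error or worse.
# Matches both "[core:error]" (Apache 2.4 style) and "[error]" (older style).
serious_errors() {
  grep -E '\[([a-z0-9_]+:)?(error|crit|alert|emerg)\]' "$1"
}

# Made-up sample lines, written to a scratch file for demonstration.
log=$(mktemp)
printf '%s\n' \
  '[Tue Oct 10 13:55:36.123456 2023] [core:notice] [pid 100] sample notice message' \
  '[Tue Oct 10 13:56:02.654321 2023] [core:error] [pid 101] sample error message' > "$log"
serious_errors "$log"
```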
On a Fedora/CentOS/RHEL system running the systemd-journald service, you can alternatively use the journalctl utility to view the latest messages from the httpd.service unit by running journalctl -e -u httpd.service.
TIP When SELinux is enabled, watch out for its interference when troubleshooting your web server on Red Hat–like distros. You might find httpd not working as you’d expect and, even more alarming, with nothing in the Apache logs to help you troubleshoot! SELinux denials are recorded in the audit log (/var/log/audit/audit.log) instead. To confirm that SELinux is the cause, you might need to temporarily switch it to permissive mode (setenforce 0) in this case.
On Debian-based distros like Ubuntu running the systemd-journald service, you can use the journalctl utility to view the latest messages from the apache2.service unit by running journalctl -e -u apache2.service.
This chapter covered the process of setting up your own web server using Apache (aka httpd) from the ground up. This chapter by itself is enough to get you going with a top-level page and a basic configuration. At a minimum, the material covered here will help you get your web server on the interwebs, or internets—whichever one you prefer!
It is highly recommended that you take some time to page through the relevant and official Apache manual/documentation (http://httpd.apache.org/docs/). It is well written and concise, and Apache itself is flexible enough that you can set up just about any configuration imaginable.