Chapter 19 Apache Web Server – Linux Administration: A Beginner's Guide, Eighth Edition

CHAPTER 19: Apache Web Server

Apache is a popular open source Hypertext Transfer Protocol (HTTP) server released under the Apache License. It remains one of the most widely deployed web servers and has long enjoyed acceptance and respect from the Internet community. Apache offers the following benefits and advantages:

•   It is stable, flexible, and secure.

•   It is used, backed, and supported by several major sites and organizations.

•   The entire program and related components are open source.

•   It works on most variants of Linux/UNIX and Microsoft Windows.

We’ll walk through the process of installing and configuring the Apache HTTP server on a Linux server in this chapter. But before we get into the steps necessary to configure Apache, let’s review some of the fundamentals of HTTP as well as some of the internals of Apache, such as its process ownership model. This information will help you understand why Apache is set up to work the way it does.

Understanding HTTP

HTTP traffic makes up a significant portion of the world’s Internet traffic, and Apache is a server implementation of HTTP. Applications such as Firefox, Chrome, Opera, curl, wget, Edge, Safari, and Internet Explorer are client implementations of HTTP.

As of this writing, HTTP is at version 2, but HTTP version 1.1 is still by far the most widely used version of the protocol. HTTP/1.1 was documented in RFCs 7230 through 7235, which have since been superseded by RFCs 9110 through 9112.

Headers

When a web client connects to a web server, the client’s default method of making this connection is to contact the server’s TCP port 80. Once connected, the web server says nothing; it’s up to the client to issue HTTP-compliant commands (also called verbs or methods) for its requests to the server. Along with each command comes a request header that includes information about the client. For example, when using the Firefox browser (the client) on a Linux box, a web server might receive the following information from the client:
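The listing below is a minimal sketch of such a request rather than a capture from a real browser. The build_request helper, the User-Agent string, and the header values are illustrative assumptions. It shows the shape of a request: a request line, header lines that each end in CRLF, and a closing blank line.

```shell
# Hypothetical helper: print a minimal HTTP/1.1 GET request.
# $1 = host name, $2 = path to fetch
build_request() {
  printf 'GET %s HTTP/1.1\r\n' "$2"            # request line: method, path, version
  printf 'Host: %s\r\n' "$1"                   # names the site being requested
  printf 'User-Agent: example-client/1.0\r\n'  # made-up client identifier
  printf 'Accept: text/html\r\n'               # formats the client will accept
  printf 'Connection: close\r\n'
  printf '\r\n'                                # blank line ends the header
}

build_request localhost /
```

Piping the output to a tool such as nc (for example, build_request localhost / | nc localhost 80) would send it to a local server. Running curl -v http://localhost/ prints the request and response headers that a real client exchanges.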

The first line contains the HTTP GET command, which asks the server to fetch a file. The remainder of the information makes up the header, which tells the server about the client, the kind of file formats the client will accept, and so forth. Many servers use this information to determine what can and cannot be sent to the client, as well as for logging purposes.

Along with the request header, additional headers may be sent. For example, when a client follows a hyperlink to get to the server site, a Referer entry showing the address of the page that contained the link will also appear in the header.

When it receives a blank line, the server knows a request header is complete. Once the request header is received, it responds with the actual requested content, prefixed by a server header. The server header provides the client with information about the server, the amount of data the client is about to receive, the type of data coming in, and other information. For example, the request header just shown, when sent to an HTTP server, results in the following server response header:

A blank line and then the actual content of the transmission follow the response header.
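To make the split between header and content concrete, the sketch below works on a hand-written response, not output captured from a real Apache server. The Server value, body text, and variable names are assumptions chosen for illustration. It pulls out the status code, then separates header from body at the first blank line.

```shell
# Hand-written sample response (not real server output); header lines end in CRLF.
response=$(printf 'HTTP/1.1 200 OK\r\nServer: Apache\r\nContent-Type: text/html\r\nContent-Length: 13\r\n\r\nHello, world!')

# The status code is the second field of the first line.
status=$(printf '%s\n' "$response" | head -n 1 | tr -d '\r' | cut -d ' ' -f 2)

# Everything before the first blank line is the response header...
header=$(printf '%s\n' "$response" | tr -d '\r' | sed '/^$/,$d')

# ...and everything after it is the content.
body=$(printf '%s\n' "$response" | tr -d '\r' | sed '1,/^$/d')

echo "status: $status"
echo "body:   $body"
```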

Ports

The default port for HTTP requests is port 80, but you can also configure a web server to use a different (arbitrarily chosen) port that is not in use by another service. This is one of the mechanisms for running multiple web servers or sites on the same host, with each server or site on a different port. Some sites also use this arrangement for multiple configurations of their web servers to support various types of client requests.

When a site runs a web server on a nonstandard port, you can see that port number in the site’s URL. For example, the web address www.example.com with the default port number (80) implicitly and explicitly displayed would read http://www.example.com and http://www.example.com:80, respectively. But serving the same site on a nonstandard port such as port 8080 will require the port number to be explicitly stated, as in http://www.example.com:8080.

Process Ownership and Security

Running a web server on a Linux platform follows the traditional Linux/UNIX permissions and ownership model. In terms of permissions, that means each process has an owner, and that owner has limited rights on the system.

Whenever a program (process) is started, it inherits the permissions of its parent process. For example, if you’re logged in as root, the shell in which you’re doing all your work has all the same rights as the root user. In addition, any process you start from this shell will inherit all of your permissions. Processes may give up rights, but they cannot gain rights.

NOTE  There is an exception to the Linux inheritance principle. Programs configured with the SetUID bit do not inherit rights from their parent process, but rather start with the rights specified by the owner of the file itself. For example, the file containing the program su (/bin/su) is owned by root and has the SetUID bit set. If the user yyang runs the program su, that program doesn’t inherit the rights of yyang but instead will start with the rights of the superuser (root). To learn more about SetUID, see Chapter 6.

How Apache Processes Ownership

To carry out initial network-related functions, the Apache HTTP server must start with root permissions. Specifically, it needs to bind itself to port 80 so that it can listen for requests and accept connections. Once it does this, Apache can give up its rights and run as a non-root user (unprivileged user), as specified in its configuration files. Different Linux distributions may have varying defaults for this user, but it is usually one of the following: nobody, www, apache, wwwrun, www-data, or daemon.

Remember that when running as an unprivileged user, Apache can read only the files that the user has permissions to read.
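One way to observe this privilege drop on a running server is to list the Apache processes along with their owners. This is a sketch that assumes the processes are named httpd or apache2. Typically the parent process shows up as root and the worker processes as the unprivileged user.

```shell
# Show owner, PID, parent PID, and command name for Apache processes.
ps -eo user,pid,ppid,comm | grep -E 'httpd|apache2'
```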

Security is especially important for sites that use executable scripts such as Common Gateway Interface (CGI), PHP, or Python scripts. By limiting the permissions of the web server, you limit the damage that a malicious request can do if it manages to run code on the server. The server processes and corresponding scripts can damage only what they can access. As the user nobody, the scripts and processes don’t have access to the same key files that the root user can access. (Remember that, by default, root can access everything, no matter what the permissions.)

Installing the Apache HTTP Server

Most Linux distributions have the Apache HTTP server software prepackaged as RPM, .deb, or other binaries, so installing the software is usually as simple as using the package management tool on the system. This section walks you through the process of obtaining and installing the program via RPM and Advanced Packaging Tool (APT). Mention is also made of installing the software from source code, if you choose to go that route. The actual configuration of the server covered in later sections applies to both classes of installation (from source or from a binary package).

On Fedora/RHEL/CentOS systems, the package that provides the Apache HTTP server is semi-intuitively named httpd-*.rpm. We’ll use a package manager (dnf or yum) on our sample Fedora server to obtain and install the program.

To use dnf to install the program, type the following:
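A typical invocation, assuming you have sudo rights (drop sudo if you are already root):

```shell
# Install the Apache HTTP server package and its dependencies.
sudo dnf -y install httpd
```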

To confirm that the software is installed, type the following:
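One way to check, assuming the package name httpd used above, is to query the RPM database:

```shell
# Prints the installed package name and version, or reports that it is not installed.
rpm -q httpd
```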

For a Debian-based Linux distribution such as Ubuntu, the package that provides the Apache HTTP server (version 2) is more intuitively named “apache2.” You can use APT to install Apache by running the following:
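A sketch of the APT command, again assuming sudo rights:

```shell
# Install the apache2 package and its dependencies.
sudo apt-get -y install apache2
```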

The web server daemon is automatically started after you install using apt-get on Ubuntu systems.

And that’s it! You now have Apache installed.

Apache Modules

Part of what makes Apache so powerful and flexible is that its design allows for it to be extended through modules. Apache comes with many modules by default and automatically includes them in the default installation.

If you can imagine “it,” you can be almost certain that somebody has already written a module for it for the Apache web server. The Apache module application programming interface (API) is well documented, and if you are so inclined (and know how), you can probably write your own module for Apache to provide any functionality you want.

To give you some idea of what kinds of things people are doing with modules, visit http://modules.apache.org. There you will find information on how to extend Apache’s capabilities using modules. Here are some common Apache modules:

•   mod_wsgi   Provides a Web Server Gateway Interface (WSGI)–compliant interface for hosting Python-based web applications

•   mod_authnz_ldap   Provides support for authenticating users of the Apache HTTP server against a Lightweight Directory Access Protocol (LDAP) database

•   mod_ssl   Provides strong cryptography for the Apache web server via the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols

•   mod_userdir   Allows user content to be served from user-specific directories on the web server via HTTP

•   mod_proxy   Implements an extensible proxy/gateway/load-balancing interface for Apache when used in conjunction with some other modules (such as mod_proxy_ftp, mod_proxy_balancer, mod_proxy_http, and so on)

If you know the name of a particular module you want (and if the module is popular enough), you might find that the module has already been packaged in an RPM format, so you can install it using the usual RPM methods. For example, if you want to include the SSL module (mod_ssl) in your web server setup, on a Fedora/RHEL/CentOS system, you can issue this dnf command to download and install the module for you automatically:
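Assuming the module is packaged under the name mod_ssl, as it is on Fedora, the command would look like this:

```shell
# Download and install the SSL/TLS module for Apache.
sudo dnf -y install mod_ssl
```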

Alternatively, you can go to the Apache modules project web site and search for, download, compile, and install the module that you want.

TIP  Make sure the run-as user is there! If you build Apache from source, the sample configuration file (httpd.conf) expects that the web server will run as the user named daemon. Although that user exists on almost all Linux distributions, you may want to double-check the local user database (/etc/passwd) to make sure the user daemon does indeed exist.

Starting Up and Shutting Down Apache

Starting up and shutting down Apache on most Linux distributions is easy.

To start Apache on any distro that refers to Apache as httpd and also uses the service utility, use this command:
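With the service utility, the usual form is the service name followed by the action:

```shell
# Start the web server through the service wrapper.
sudo service httpd start
```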

On modern Linux distributions running systemd, you can alternatively start the httpd daemon using the systemctl command like so:
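The systemctl equivalent, assuming the unit is named httpd:

```shell
# Start the httpd unit under systemd.
sudo systemctl start httpd
```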

Debian-like systems like Ubuntu refer to the Apache binary as apache2, so you would instead start Apache on such distros by running this:
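On a systemd-based Debian or Ubuntu release, that would be:

```shell
# Start the apache2 unit under systemd.
sudo systemctl start apache2
```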

To shut down Apache on an RPM-based distro like Fedora, enter this command:
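A sketch using systemctl on a systemd-based release:

```shell
# Stop the httpd unit.
sudo systemctl stop httpd
```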

On Ubuntu or Debian, you should instead run the following command:
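Only the unit name changes:

```shell
# Stop the apache2 unit.
sudo systemctl stop apache2
```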

After making a configuration change to the web server that requires you to restart Apache on an RPM-based distro like Fedora, type this:
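On a systemd-based system, a restart looks like this (substitute apache2 for httpd on Debian-like systems):

```shell
# Stop and then start the web server so it rereads its configuration.
sudo systemctl restart httpd
```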

TIP  On a system running openSUSE or SLE (SUSE Linux Enterprise), the commands to start and stop the web server, respectively, are
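A sketch of those two commands, assuming a systemd-based openSUSE or SLE release on which the unit is named apache2:

```shell
# Start the web server.
sudo systemctl start apache2

# Stop the web server.
sudo systemctl stop apache2
```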

Starting Apache at Boot Time

After installing the web server, you will probably want the web service to be available to your users at all times; you will therefore need to configure the system to start the service automatically at every boot. On legacy Linux distros, use the chkconfig utility to configure automatic startup of the web server (for example, chkconfig httpd on). In the examples that follow, change httpd to apache2 on Debian-like systems.

On systemd-enabled Linux distros, you can check if httpd (or apache2) is enabled for automatic startup by running this command:
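The is-enabled subcommand answers with a single word such as enabled or disabled:

```shell
# Report whether the unit is set to start at boot.
systemctl is-enabled httpd
```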

If the previous output shows that the unit file for the service is disabled, you can make the httpd (or apache2) daemon automatically start up with system reboots by issuing the systemctl command, like so:
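Enabling the unit, assuming sudo rights:

```shell
# Arrange for the unit to start automatically at boot.
sudo systemctl enable httpd
```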

In older/legacy Ubuntu distros, you can use either the sysv-rc-conf or update-rc.d utility to manage the runlevels in which Apache starts up, like so:
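A minimal sketch using update-rc.d, which installs the start and stop links for the init script's default runlevels. It assumes the init script is named apache2.

```shell
# Register the apache2 init script in its default runlevels.
sudo update-rc.d apache2 defaults
```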

Testing Your Installation

You can perform a quick test of your Apache installation by trying to browse or visit the default basic web site or home page that’s often bundled with Apache for testing purposes.

Use the status option with the systemctl command on systemd-aware systems to view a nice synopsis (cgroup information, child processes, and so on) of the Apache server status, like so:
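For the httpd unit (use apache2 on Debian-like systems):

```shell
# Show whether the service is active, its main PID, child processes, and recent log lines.
systemctl status httpd
```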

On our sample Fedora system, Apache comes with a default page that gets served to visitors in the absence of a custom default home page (for example, index.html or index.htm). The file displayed to visitors when there is no default home page is /usr/share/httpd/noindex/index.html and is controlled by the /etc/httpd/conf.d/welcome.conf configuration file.

To find out if your Apache installation went smoothly, start a web browser and point it to the web site on your machine. To do this from the same system running the web server, simply type http://localhost (or the IPv6 equivalent, http://[::1]/) in the address bar of your web browser. You should see a simple Demo/Sample page showing the web server is working. If you don’t see this, retrace your Apache installation steps and make sure you didn’t encounter any errors in the process. Another thing to check if you can’t see the default web page is to make sure you don’t have any host-based firewall such as Netfilter (iptables, nftables, or ufw; see Chapter 14) blocking access to the web server.

To quickly open up the HTTP port on a Fedora server, type the following:
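The sketch below assumes the server uses firewalld, the default firewall front end on Fedora. It adds the predefined http service to the permanent configuration and then reloads the firewall so the change takes effect.

```shell
# Allow inbound HTTP (port 80) persistently.
sudo firewall-cmd --permanent --add-service=http

# Apply the permanent configuration to the running firewall.
sudo firewall-cmd --reload
```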

Configuring Apache

Apache supports a rich set of configuration options that are sensible and easy to follow. This makes it a simple task to set up the web server in various configurations.

This section walks through a basic configuration. The default configuration is actually quite good and often works right out of the box, so if the default is acceptable to you, simply start creating your HTML documents! Apache allows several common customizations. After we step through creating a simple web page, you’ll see how to make those common customizations in the Apache configuration files.

Creating a Simple Root-Level Page

If you like, you can start adding files to Apache right away in the /var/www/html directory for top-level pages. Just remember to make sure that any files or directories placed in that directory are world-readable.

As mentioned earlier, Apache’s default web page is index.html. Let’s create and change the default home page so that it reads “Welcome to webserver.example.org.” Here are the commands:
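The sketch below writes to a scratch directory by default so that it is safe to try. The DOCROOT variable is an assumption introduced for this example; set it to /var/www/html (and run as root) to change the live site.

```shell
# Use a scratch directory unless DOCROOT is already set,
# for example DOCROOT=/var/www/html on a Fedora server.
DOCROOT="${DOCROOT:-$(mktemp -d)}"

# Write the new home page, replacing any existing index.html.
echo "Welcome to webserver.example.org" > "$DOCROOT/index.html"

# Make the file world-readable so the unprivileged Apache user can serve it.
chmod 644 "$DOCROOT/index.html"

cat "$DOCROOT/index.html"
```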

You could also use an editor such as vi, pico, or emacs to edit the index.html file and make it more interesting.

Apache Configuration Files

The configuration files for Apache are located in the /etc/httpd/conf/ directory on RPM-based distros like Fedora. The main configuration file is named httpd.conf on such systems.

On Debian-like systems, the main configuration file for Apache is instead named /etc/apache2/apache2.conf.

One good method to familiarize yourself with and learn more about the configuration files is to read the httpd.conf file. The default configuration file is heavily commented, explaining each entry, its role, and the parameters you can set.

Common Configuration Options

The default configuration settings work just fine right out of the box, and for basic needs, they may require no further modification. Nevertheless, site administrators may need to customize their web server and/or web sites further.

This section discusses some of the common directives or options used in Apache’s configuration file.

ServerRoot

This specifies the base configuration directory for the web server. On Fedora, RHEL, and CentOS distributions, this value, by default, is the /etc/httpd/ directory. The default value for this directive in Ubuntu, openSUSE, and Debian Linux distributions is /etc/apache2/.

Listen

This sets the port(s) on which the server listens for connection requests. The usual choice is the venerable port 80 (HTTP), for which everything good and bad on the web is so well known!

The Listen directive can also be used to specify the particular IP addresses over which the web server accepts connections. The default value for this directive is 80 for nonsecure web communications.

For example, to set Apache to listen on all its IPv4 and IPv6 interfaces on port 80, you would set the Listen directive to read as follows:

To set Apache to listen on a specific IPv6 interface (such as 2002:c0a8:1::) on port 8080, you would set the Listen directive to read as follows:
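Both forms of the directive are sketched below. The shell wrapper only writes the lines to a scratch file for inspection; on a real server they belong in httpd.conf (or ports.conf on Debian-like systems). Note that an IPv6 address must be enclosed in square brackets.

```shell
# Write the two Listen forms to a scratch file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# All IPv4 and IPv6 interfaces, port 80
Listen 80
# One specific IPv6 address, port 8080
Listen [2002:c0a8:1::]:8080
EOF

cat "$conf"
```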

For Debian-like systems such as Ubuntu, set this directive outside of the main configuration file. The value is usually set in the /etc/apache2/ports.conf file.

ServerName

This directive defines the hostname and port that the server uses to identify itself. At many sites, a single (normally underutilized) server may fulfill multiple purposes and host several other services. For example, an intranet web server that isn’t getting heavy usage may also double up as an FTP server to serve the same files. In such a situation, a computer name such as “www” (with a corresponding fully qualified domain name [FQDN] of www.example.org) wouldn’t be a good choice because it suggests that the machine has only one purpose.

It’s better to give a server a neutral name and then establish Domain Name System (DNS) Canonical Name (CNAME) entries or multiple hostname entries in the /etc/hosts file. In other words, from the users’ perspective, you can define several names for accessing the server or services.

Consider a server whose real hostname is dioxin.eng.example.org. This server also doubles as a web server. You might be thinking of giving it the hostname alias www.sales.example.org. However, since dioxin will know itself only as dioxin, users who visit www.sales.example.org might be confused by seeing in their browsers that the server’s real name is dioxin.

Apache provides a way to get around this through the use of the ServerName directive. This works by allowing you to specify what you want Apache to return as the hostname of the web server to web clients or visitors.

ServerAdmin

This is the e-mail address that the server includes in error messages sent to the client.

It’s often a good idea, for a couple of reasons, to use an e-mail alias for a web site’s administrator(s). First, there might be more than one administrator. By using an alias, it’s possible for the alias to expand out to a list of other e-mail addresses. Second, it’ll be easier and quicker to perform mass updates/edits to a bunch of web pages that might have the site administrator e-mail address (incorrectly) hard-coded in them. Here is the syntax:
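As a sketch, the ServerName and ServerAdmin directives might be set together as follows. The webmaster alias is a hypothetical address, and the shell wrapper simply writes the lines to a scratch file rather than to the live configuration.

```shell
# Write sample ServerName and ServerAdmin settings to a scratch file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Hostname (and optional port) the server uses to identify itself
ServerName www.sales.example.org:80
# E-mail alias included in server-generated error messages
ServerAdmin webmaster@example.org
EOF

cat "$conf"
```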

DocumentRoot

This defines the primary directory on the web server from which HTML files will be served to requesting clients. On Linux distros such as CentOS, RHEL, Fedora, Ubuntu, and Debian, the default value for this directive is /var/www/html/. On openSUSE and SLE distributions, the default value for this directive is /srv/www/htdocs.

TIP  On a web server that is expected to host plenty of web content, the file system pointed to by the DocumentRoot directive should have a lot of free space to house any current and anticipated web content.

MaxRequestWorkers

This sets a limit on the number of simultaneous requests that the web server will service.

LoadModule

This is used for loading or adding other modules into Apache’s running configuration. It adds the specified module to the list of active modules.

User

This specifies the user ID with which the web server will answer requests. The server process will initially start off as the root user but will later downgrade its privileges to those of the user specified here. The user should have only just enough privileges to access files and directories that are intended to be visible to the outside world via the web server. Also, the user should not be able to execute code that is not HTTP or web related.

On a Fedora system, the value for this directive is automatically set to the user named apache. In openSUSE Linux, the value is set to the user wwwrun. In a Debian-like system such as Ubuntu, the value is set to the user www-data (set via the $APACHE_RUN_USER environment variable).

Group

This specifies the group name of the Apache HTTP server process. It is the group with which the server will respond to requests. The default value under the Fedora, CentOS, and RHEL flavors of Linux is apache. In openSUSE Linux, the value is set to the group www. In Ubuntu, the default value is www-data (set via the $APACHE_RUN_GROUP environment variable).

Include

This directive tells Apache to read in other configuration files when it loads its configuration. It is mostly useful for organization purposes; you can, for example, elect to store all the configuration directives for different virtual domains in appropriately named files, and Apache will automatically include them.

Many of the mainstream Linux distros rely quite heavily on the use of the Include directive to organize site-specific configuration files and directives for the web server. Often, this file and directory organization is the sole distinguishing factor between Apache installation/setup among the different distros.

UserDir

This directive defines the subdirectory name within each user’s home directory where users can place personal content that they want to make accessible via the web server. This directory is usually named public_html. This option is, of course, dependent on the availability of the mod_userdir module in the web server setup.

Here’s a sample usage of this option in the httpd.conf file:
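A minimal sketch, written to a scratch file so it can be reviewed before being copied into the real configuration. Wrapping the directive in an IfModule block is an assumption added here so that the line is ignored when mod_userdir is not loaded.

```shell
# Write a sample UserDir stanza to a scratch file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<IfModule mod_userdir.c>
    # Requests for /~user/ are served from that user's public_html directory
    UserDir public_html
</IfModule>
EOF

cat "$conf"
```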

ErrorLog

This defines the location where errors from the web server will be logged.

Example: ErrorLog /var/log/httpd/error_log

LogLevel

This option sets the level of verbosity for the messages sent to the error logs. Acceptable log levels are emerg, alert, crit, error, warn, notice, info, and debug; Apache 2.4 adds trace1 through trace8 for even finer detail. The default log level is warn.

Alias

The Alias directive allows documents (web content) to be stored in any other location on the file system that is different from the location specified by the DocumentRoot directive. It also allows you to create abbreviations (or aliases) for path names that might otherwise be quite long.

ScriptAlias

The ScriptAlias option specifies a target directory or file as containing CGI scripts that are meant to be processed by the CGI module (mod_cgi).

Example: ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"

VirtualHost

One of the most-used features of Apache is its ability to support virtual hosts. This makes it possible for a single web server to host multiple web sites as if each site had its own dedicated hardware. It works by allowing the web server to provide different, autonomous content, based on the hostname, port number, or IP address that is being requested by the client. Name-based virtual hosting is possible because the client names the desired site in the Host header of its HTTP request, so the server does not have to infer the site from the IP address alone.

This directive is actually made up of two tags: an opening <VirtualHost> tag and a closing </VirtualHost> tag. It is used to specify the options that pertain to a particular virtual host. Most of the directives that we discussed previously are valid here, too.

Suppose, for example, that we wanted to set up a virtual host configuration for a host named www.another-example.org. To do this, we can create a VirtualHost entry in the httpd.conf file (or use the Include directive to specify a separate file), like this one:
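The sketch below writes a sample entry to a scratch file. The document root, log file locations, and webmaster address are hypothetical values using Fedora-style paths (use /var/log/apache2 on Debian-like systems), and the combined log format is assumed to be defined in the main configuration, as it is in the stock files.

```shell
# Write a sample name-based virtual host to a scratch file.
site=$(mktemp)
cat > "$site" <<'EOF'
<VirtualHost *:80>
    ServerName www.another-example.org
    ServerAdmin webmaster@another-example.org
    DocumentRoot /var/www/another-example.org/html
    ErrorLog /var/log/httpd/another-example.org-error_log
    CustomLog /var/log/httpd/another-example.org-access_log combined
</VirtualHost>
EOF

cat "$site"
```

Keeping separate log files per virtual host is a design choice that makes it easier to troubleshoot one site without wading through entries from the others.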

On Debian-like distros, you can use another set of utilities (a2ensite and a2dissite) to enable or disable virtual hosts and web sites quickly under Apache.

For example, suppose that on an Ubuntu server we saved the previous configuration in a file named www.another-example.org.conf (Apache 2.4 packages on Debian-like distros expect site files to end in .conf) and stored the file under the /etc/apache2/sites-available/ directory. We can then enable the virtual web site using the following command:
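A sketch of the command, assuming the site file carries the .conf suffix that Apache 2.4 packages on Debian-like distros expect (the suffix may be left off the argument):

```shell
# Link the site from sites-available/ into sites-enabled/.
sudo a2ensite www.another-example.org
```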

Similarly, to disable the virtual site, we can run this command:
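The counterpart removes the link again:

```shell
# Remove the site's link from sites-enabled/.
sudo a2dissite www.another-example.org
```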

After running any of the previous commands (a2ensite or a2dissite), you should make Apache reload its configuration files by running the following:
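On a systemd-based release, a reload can be requested like this:

```shell
# Ask Apache to reread its configuration without a full restart.
sudo systemctl reload apache2
```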

Finally, don’t forget that it is not enough to configure a virtual host using Apache’s VirtualHost directive—the value of the ServerName option in the VirtualHost container must be a name that is resolvable via DNS (or any other means) to the web server machine.

NOTE  Apache’s options/directives are too numerous to all be covered in this section. However, the software comes with its own extensive manual, which is written in HTML so that you can read it in a browser. If you installed the software via RPM, you might find that documentation for Apache has been packaged into a separate RPM binary, and as a result, you will need to install the proper package (for example, httpd-manual) to have access to it. If you downloaded and built the software from source code, you will find the documentation in the manual directory of your installation prefix (for example, /usr/local/httpd/manual). The documentation for each Apache version is also available online at the project’s web site at http://httpd.apache.org/docs/.

Troubleshooting Apache

The process of changing various configuration options (or even the initial installation) sometimes may not work as smoothly as you’d like. Thankfully—and with other things being equal—Apache does an excellent job at reporting in its error log file why it failed or what is failing.

The error log file is located in your logs directory. If you are running a stock Fedora or RHEL-type installation, this is in the /var/log/httpd/ directory. If you are running Apache on a stock Debian- or Ubuntu-type distro, this is in the /var/log/apache2/ directory.

The access_log file is simply that—a log of which files have been accessed by people visiting your web site(s). It contains information about whether the transfer completed successfully, where the request originated (IP address), how much data was transferred, and what time the transfer occurred. This is a powerful way of determining the usage of your site.

The error_log file contains all the errors that occur in Apache. Note that not all errors that occur are fatal—some are simply problems with a client connection from which Apache can automatically recover and continue operation. However, if you started Apache but still cannot visit your web site, take a look at this log file to see why Apache might not be responding. The easiest way to see the most recent error messages is by using the tail command, like so:
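Assuming the Fedora log location given earlier (use /var/log/apache2/error.log on Debian-like systems):

```shell
# Show the last 10 lines of the Apache error log.
sudo tail -n 10 /var/log/httpd/error_log
```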

If you need to see more log information than that, simply change the number 10 to the number of lines you need to see. If you would like to view the errors or logs in real time as they are being generated, you can use the -f option for the tail command. This provides a valuable debugging tool, because you can try things out with the server (such as requesting web pages or restarting Apache) and view the results of your experiments in a separate virtual terminal window. The tail command with the -f switch is shown here:
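Again assuming the Fedora log location:

```shell
# Follow the error log, printing new entries as they are written; press Ctrl-C to stop.
sudo tail -f /var/log/httpd/error_log
```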

This command will constantly tail the logs until you terminate the program.

On a Fedora/CentOS/RHEL system running the systemd-journald service, you can alternatively use the journalctl utility to view the latest messages from the httpd.service unit by running the following:
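A sketch that shows the 20 most recent entries; the count is an arbitrary choice, and adding -f would follow new messages as they arrive:

```shell
# Show the latest journal entries for the httpd unit.
sudo journalctl -u httpd.service -n 20
```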

TIP  When SELinux is enabled on Red Hat–like distros, it can block httpd in ways that leave nothing in Apache’s own logs, so the server may not work as you’d expect with no obvious explanation. Check the audit log (/var/log/audit/audit.log) for denials. If you need to rule SELinux out while troubleshooting, temporarily switch it to permissive mode (setenforce 0) rather than disabling it outright.

On Debian-based distros like Ubuntu running the systemd-journald service, you can use the journalctl utility to view the latest messages from the apache2.service unit, like so:
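Only the unit name differs from the Fedora form; the line count of 20 is again an arbitrary choice:

```shell
# Show the latest journal entries for the apache2 unit.
sudo journalctl -u apache2.service -n 20
```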

Summary

This chapter covered the process of setting up your own web server using Apache (aka httpd) from the ground up. This chapter by itself is enough to get you going with a top-level page and a basic configuration. At a minimum, the material covered here will help you get your web server on the interwebs, or internets—whichever one you prefer!

It is highly recommended that you take some time to page through the official Apache documentation (http://httpd.apache.org/docs/). It is well written and concise, and it covers enough ground that you can set up just about any configuration imaginable.