Apache HTTP Server Version 2.0

How to configure Apache to listen on specific IP addresses and ports.
When Apache starts, it begins listening for incoming requests on certain ports and addresses of the machine it is running on. However, if you want Apache to listen only on specific ports, only on specific addresses, or on a combination of both, you must say so explicitly. This can also be combined with virtual hosts, a feature that lets one Apache server respond to requests on different IP addresses, different hostnames, and different ports.
The Listen directive tells the server to accept incoming requests only on the specified ports or address-and-port combinations. If only a port number is given in the Listen directive, the server listens on that port on all of the machine's network interfaces. If an IP address and a port are given, the server listens only on the network interface that the IP address belongs to, and only on the given port. Multiple Listen directives may be used to specify several addresses and ports to listen on. The server responds to requests on any of the listed addresses and ports.
For example, to make the server accept connections on both port 80 and port 8000, use:
Listen 80
Listen 8000
To make the server accept connections on two specific network interfaces and ports, use:
Listen 192.170.2.1:80
Listen 192.170.2.5:8000
IPv6 addresses must be enclosed in square brackets, as in the following example:
Listen [2001:db8::a00:20ff:fea7:ccea]:80
A growing number of platforms implement IPv6, and APR supports IPv6 on most of them, allowing Apache to use IPv6 sockets and handle requests sent over IPv6.
One complicating factor for Apache administrators is whether an IPv6 socket can handle both IPv4 and IPv6 connections. Handling IPv4 connections with an IPv6 socket relies on IPv4-mapped IPv6 addresses, which are allowed by default on most platforms but are disallowed by default on FreeBSD, NetBSD, and OpenBSD in order to match the system-wide policy on those platforms. Even on systems where they are disallowed by default, a special configure parameter can change this behavior for Apache.
If you want Apache to handle IPv4 and IPv6 connections with a minimum of sockets, which requires using IPv4-mapped IPv6 addresses, specify the --enable-v4-mapped configure option and use generic Listen directives like the following:
Listen 80
With --enable-v4-mapped, the Listen directives in the default configuration file created by Apache use this form. --enable-v4-mapped is the default on all platforms except FreeBSD, NetBSD, and OpenBSD, so this is probably how your Apache was built.
If you want Apache to handle IPv4 connections only, regardless of what your platform and APR support, specify an IPv4 address on all Listen directives, as in these examples:
Listen 0.0.0.0:80
Listen 192.170.2.1:80
If you want Apache to handle IPv4 and IPv6 connections on separate sockets (that is, to disable IPv4-mapped addresses), specify the --disable-v4-mapped configure option and use specific Listen directives like the following:
Listen [::]:80
Listen 0.0.0.0:80
With --disable-v4-mapped, the Listen directives in the default configuration file created by Apache use this form. --disable-v4-mapped is the default on FreeBSD, NetBSD, and OpenBSD.
Listen does not implement virtual hosts. It only tells the main server which addresses and ports to listen on. If no <VirtualHost> directives are used, the server behaves the same way for all accepted requests. However, <VirtualHost> can be used to specify different behavior for one or more of the addresses and ports. To implement a virtual host, the server must first be told to listen on the address and port to be used. Then a <VirtualHost> section should be created for that specific address and port to set the behavior of the virtual host. Note that if a <VirtualHost> section is set for an address and port that the server is not listening on, that virtual host cannot be accessed.
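The relationship between Listen and <VirtualHost> described above can be sketched as follows. This is only an illustration: the addresses are the ones used earlier in this document, while the server names and document roots are hypothetical.

```apache
# Tell the server which addresses and ports to listen on.
Listen 192.170.2.1:80
Listen 192.170.2.5:8000

# Each virtual host is bound to an address and port that is
# also named in a Listen directive; otherwise it is unreachable.
<VirtualHost 192.170.2.1:80>
    ServerName www.example.com
    DocumentRoot /www/example
</VirtualHost>

<VirtualHost 192.170.2.5:8000>
    ServerName internal.example.com
    DocumentRoot /www/internal
</VirtualHost>
```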
Apache HTTP Server Version 2.0

This document describes the files used to configure the Apache HTTP server.
Apache is configured by placing directives in plain text
configuration files. The main configuration file is usually called
httpd.conf. The location of this file is set at
compile-time, but may be overridden with the -f
command line flag. In addition, other configuration files may be
added using the Include
directive, and wildcards can be used to include many configuration
files. Any directive may be placed in any of these configuration
files. Changes to the main configuration files are only
recognized by Apache when it is started or restarted.
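As a small sketch of the Include directive, assuming a hypothetical layout in which extra configuration files live under the conf directory (relative paths are resolved against the ServerRoot):

```apache
# Include a single additional configuration file.
Include conf/ssl.conf

# Include every file in a directory that matches a wildcard pattern.
Include conf/vhosts/*.conf
```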
The server also reads a file containing mime document types;
the filename is set by the TypesConfig directive,
and is mime.types by default.
Apache configuration files contain one directive per line. The back-slash "\" may be used as the last character on a line to indicate that the directive continues onto the next line. There must be no other characters or white space between the back-slash and the end of the line.
Directives in the configuration files are case-insensitive, but arguments to directives are often case sensitive. Lines that begin with the hash character "#" are considered comments, and are ignored. Comments may not be included on a line after a configuration directive. Blank lines and white space occurring before a directive are ignored, so you may indent directives for clarity.
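The syntax rules above might look like this in practice. The directive values are hypothetical and only serve to illustrate comments, indentation, line continuation, and case handling.

```apache
# A comment occupies a whole line and starts with "#".
ServerAdmin webmaster@example.com

    # Leading white space is ignored, so directives may be indented.
    ServerName www.example.com

# A trailing backslash continues the directive on the next line.
# Nothing, not even a space, may follow the backslash.
AddType text/html \
    .html .htm

# Directive names are case-insensitive, but arguments often are not:
# "documentroot" is accepted, while the path itself keeps its case.
documentroot /www/Example
```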
You can check your configuration files for syntax errors
without starting the server by using apachectl
configtest or the -t command line
option.
Apache is a modular server. This implies that only the most
basic functionality is included in the core server. Extended
features are available through modules which can be loaded
into Apache. By default, a base set of modules is
included in the server at compile-time. If the server is
compiled to use dynamically loaded
modules, then modules can be compiled separately and added at
any time using the LoadModule
directive.
Otherwise, Apache must be recompiled to add or remove modules.
Configuration directives may be included conditionally on the
presence of a particular module by enclosing them in an <IfModule> block.
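For instance, a dynamically loaded module and directives that depend on it could be configured roughly as below. The module path is an assumption; it depends on how and where the server was built.

```apache
# Load mod_status as a dynamically loaded module.
LoadModule status_module modules/mod_status.so

# These directives are only processed if mod_status is present.
<IfModule mod_status.c>
    <Location /server-status>
        SetHandler server-status
    </Location>
</IfModule>
```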
To see which modules are currently compiled into the server,
you can use the -l command line option.
Directives placed in the main configuration files apply to
the entire server. If you wish to change the configuration for
only a part of the server, you can scope your directives by
placing them in <Directory>, <DirectoryMatch>, <Files>, <FilesMatch>, <Location>, and <LocationMatch>
sections. These sections limit the application of the
directives which they enclose to particular filesystem
locations or URLs. They can also be nested, allowing for very
fine grained configuration.
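A minimal sketch of scoped and nested sections, using hypothetical filesystem paths and URLs:

```apache
# Applies to one filesystem directory and everything below it.
<Directory /www/example/private>
    Options -Indexes

    # Nested section: applies only to backup files in that directory tree.
    <Files "*.bak">
        Order allow,deny
        Deny from all
    </Files>
</Directory>

# Applies to a URL rather than a filesystem location.
<Location /server-status>
    SetHandler server-status
</Location>
```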
Apache has the capability to serve many different websites
simultaneously. This is called Virtual
Hosting. Directives can also be scoped by placing them
inside <VirtualHost>
sections, so that they will only apply to requests for a
particular website.
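As an illustration, two name-based sites served from the same address might be set up as follows. The host names and paths are hypothetical.

```apache
NameVirtualHost *:80

# Directives inside each section apply only to requests for that site.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /www/example
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example.org
    DocumentRoot /www/example-org
</VirtualHost>
```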
Although most directives can be placed in any of these sections, some directives do not make sense in some contexts. For example, directives controlling process creation can only be placed in the main server context. To find which directives can be placed in which sections, check the Context of the directive. For further information, we provide details on How Directory, Location and Files sections work.
Apache allows for decentralized management of configuration
via special files placed inside the web tree. The special files
are usually called .htaccess, but any name can be
specified in the AccessFileName
directive. Directives placed in .htaccess files
apply to the directory where you place the file, and all
sub-directories. The .htaccess files follow the
same syntax as the main configuration files. Since
.htaccess files are read on every request, changes
made in these files take immediate effect.
To find which directives can be placed in
.htaccess files, check the Context of the
directive. The server administrator further controls what
directives may be placed in .htaccess files by
configuring the AllowOverride
directive in the main configuration files.
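A sketch of how the main configuration can control .htaccess files, assuming hypothetical directory paths:

```apache
# Name of the per-directory configuration file (this is the usual name).
AccessFileName .htaccess

# Allow .htaccess files under this tree to set authentication
# and directory-indexing directives only.
<Directory /www/example>
    AllowOverride AuthConfig Indexes
</Directory>

# Ignore .htaccess files entirely under this subtree.
<Directory /www/example/static>
    AllowOverride None
</Directory>
```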
For more information on .htaccess files, see
the .htaccess tutorial.
Apache HTTP Server Version 2.0

Apache supports content negotiation as described in the HTTP/1.1 specification. It can choose the best representation of a resource based on the browser-supplied preferences for media type, languages, character set and encoding. It also implements a couple of features to give more intelligent handling of requests from browsers that send incomplete negotiation information.
Content negotiation is provided by the
mod_negotiation module, which is compiled in
by default.
A resource may be available in several different representations. For example, it might be available in different languages or different media types, or a combination. One way of selecting the most appropriate choice is to give the user an index page, and let them select. However it is often possible for the server to choose automatically. This works because browsers can send, as part of each request, information about what representations they prefer. For example, a browser could indicate that it would like to see information in French, if possible, else English will do. Browsers indicate their preferences by headers in the request. To request only French representations, the browser would send
Accept-Language: fr
Note that this preference will only be applied when there is a choice of representations and they vary by language.
As an example of a more complex request, this browser has been configured to accept French and English, but prefer French, and to accept various media types, preferring HTML over plain text or other text types, and preferring GIF or JPEG over other media types, but also allowing any other media type as a last resort:
Accept-Language: fr; q=1.0, en; q=0.5
Accept: text/html; q=1.0, text/*; q=0.8, image/gif; q=0.6, image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1
Apache supports 'server driven' content negotiation, as
defined in the HTTP/1.1 specification. It fully supports the
Accept, Accept-Language,
Accept-Charset and Accept-Encoding
request headers. Apache also supports 'transparent'
content negotiation, which is an experimental negotiation
protocol defined in RFC 2295 and RFC 2296. It does not offer
support for 'feature negotiation' as defined in these RFCs.
A resource is a conceptual entity identified by a URI (RFC 2396). An HTTP server like Apache provides access to representations of the resource(s) within its namespace, with each representation in the form of a sequence of bytes with a defined media type, character set, encoding, etc. Each resource may be associated with zero, one, or more than one representation at any given time. If multiple representations are available, the resource is referred to as negotiable and each of its representations is termed a variant. The ways in which the variants for a negotiable resource vary are called the dimensions of negotiation.
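In order to negotiate a resource, the server needs to be given information about each of the variants. This is done in one of two ways:
- Using a type map (i.e., a *.var file) which names the files containing the variants explicitly, or
- Using a MultiViews search, where the server matches filenames in the directory implicitly and chooses from among the results.
A type map is a document which is associated with the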
handler named type-map (or, for
backwards-compatibility with older Apache configurations, the
MIME type application/x-type-map). Note that to
use this feature, you must have a handler set in the
configuration that defines a file suffix as
type-map; this is best done with
AddHandler type-map .var
in the server configuration file.
Type map files should have the same name as the resource
which they are describing, and have an entry for each available
variant; these entries consist of contiguous HTTP-format header
lines. Entries for different variants are separated by blank
lines. Blank lines are illegal within an entry. It is
conventional to begin a map file with an entry for the combined
entity as a whole (although this is not required, and if
present will be ignored). An example map file is shown below.
This file would be named foo.var, as it describes
a resource named foo.
URI: foo

URI: foo.en.html
Content-type: text/html
Content-language: en

URI: foo.fr.de.html
Content-type: text/html;charset=iso-8859-2
Content-language: fr, de
Note also that a typemap file will take precedence over the filename's extension, even when Multiviews is on.
If the variants have different source qualities, that may be indicated by the "qs" parameter to the media type, as in this picture (available as JPEG, GIF, or ASCII-art):
URI: foo

URI: foo.jpeg
Content-type: image/jpeg; qs=0.8

URI: foo.gif
Content-type: image/gif; qs=0.5

URI: foo.txt
Content-type: text/plain; qs=0.01
qs values can vary in the range 0.000 to 1.000. Note that any variant with a qs value of 0.000 will never be chosen. Variants with no 'qs' parameter value are given a qs factor of 1.0. The qs parameter indicates the relative 'quality' of this variant compared to the other available variants, independent of the client's capabilities. For example, a JPEG file is usually of higher source quality than an ASCII file if it is attempting to represent a photograph. However, if the resource being represented is an original ASCII art, then an ASCII representation would have a higher source quality than a JPEG representation. A qs value is therefore specific to a given variant depending on the nature of the resource it represents.
The full list of headers recognized is available in the mod_negotiation typemap documentation.
MultiViews is a per-directory option, meaning it
can be set with an Options
directive within a <Directory>, <Location> or <Files> section in
httpd.conf, or (if AllowOverride is properly set) in
.htaccess files. Note that Options All
does not set MultiViews; you have to ask for it by
name.
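A minimal sketch of enabling MultiViews for one directory, assuming a hypothetical path:

```apache
<Directory /www/example/docs>
    # MultiViews must be named explicitly; "Options All" does not include it.
    Options +MultiViews
</Directory>
```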
The effect of MultiViews is as follows: if the
server receives a request for /some/dir/foo, if
/some/dir has MultiViews enabled, and
/some/dir/foo does not exist, then the
server reads the directory looking for files named foo.*, and
effectively fakes up a type map which names all those files,
assigning them the same media types and content-encodings it
would have if the client had asked for one of them by name. It
then chooses the best match to the client's requirements.
MultiViews may also apply to searches for the file
named by the DirectoryIndex directive, if the
server is trying to index a directory. If the configuration files
specify
DirectoryIndex index
then the server will arbitrate between index.html
and index.html3 if both are present. If neither
is present, and index.cgi is there, the server
will run it.
If one of the files found when reading the directory does not
have an extension recognized by mod_mime to designate
its Charset, Content-Type, Language, or Encoding, then the result
depends on the setting of the MultiViewsMatch directive. This
directive determines whether handlers, filters, and other
extension types can participate in MultiViews negotiation.
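As a hedged illustration, the following lets files whose extensions map to handlers or filters take part in MultiViews matching. The directory path is hypothetical.

```apache
<Directory /www/example/docs>
    Options +MultiViews
    # Allow extensions associated with handlers and filters
    # to participate in MultiViews negotiation.
    MultiViewsMatch Handlers Filters
</Directory>
```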
After Apache has obtained a list of the variants for a given resource, either from a type-map file or from the filenames in the directory, it invokes one of two methods to decide on the 'best' variant to return, if any. It is not necessary to know any of the details of how negotiation actually takes place in order to use Apache's content negotiation features. However the rest of this document explains the methods used for those interested.
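There are two negotiation methods: server-driven negotiation with the Apache algorithm, and transparent content negotiation as defined in RFC 2295.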
| Dimension | Notes |
|---|---|
| Media Type | Browser indicates preferences with the Accept header field. Each item can have an associated quality factor. Variant description can also have a quality factor (the "qs" parameter). |
| Language | Browser indicates preferences with the Accept-Language header field. Each item can have a quality factor. Variants can be associated with none, one or more than one language. |
| Encoding | Browser indicates preference with the Accept-Encoding header field. Each item can have a quality factor. |
| Charset | Browser indicates preference with the Accept-Charset header field. Each item can have a quality factor. Variants can indicate a charset as a parameter of the media type. |
Apache can use the following algorithm to select the 'best' variant (if any) to return to the browser. This algorithm is not further configurable. Its steps include the following:
- Multiply the quality factor from the Accept header with the quality-of-source factor for this variant's media type, and select the variants with the highest value.
- Select the variants with the best language match, using either the order of languages in the Accept-Language header (if present), or else the order of languages in the LanguagePriority directive (if present).
- Select the variants with the best charset, as given on the Accept-Charset header line. Charset ISO-8859-1 is acceptable unless explicitly excluded. Variants with a text/* media type but not explicitly associated with a particular charset are assumed to be in ISO-8859-1.
- Once one 'best' variant has been selected, return it as the response. The HTTP response header Vary is set to indicate the dimensions of negotiation (browsers and caches can use this information when caching the resource). End.
- If no variant was selected because none is acceptable to the browser, return an error response, and also set the Vary header to indicate the dimensions of variance.
Apache sometimes changes the quality values from what would
be expected by a strict interpretation of the Apache
negotiation algorithm above. This is to get a better result
from the algorithm for browsers which do not send full or
accurate information. Some of the most popular browsers send
Accept header information which would otherwise
result in the selection of the wrong variant in many cases. If a
browser sends full and correct information these fiddles will not
be applied.
The Accept: request header indicates preferences
for media types. It can also include 'wildcard' media types, such
as "image/*" or "*/*" where the * matches any string. So a request
including:
Accept: image/*, */*
would indicate that any type starting "image/" is acceptable, as is any other type. Some browsers routinely send wildcards in addition to explicit types they can handle. For example:
Accept: text/html, text/plain, image/gif, image/jpeg, */*
The intention of this is to indicate that the explicitly listed types are preferred, but if a different representation is available, that is ok too. Using explicit quality values, what the browser really wants is something like:
Accept: text/html, text/plain, image/gif, image/jpeg, */*; q=0.01
The explicit types have no quality factor, so they default to a preference of 1.0 (the highest). The wildcard */* is given a low preference of 0.01, so other types will only be returned if no variant matches an explicitly listed type.
If the Accept: header contains no q
factors at all, Apache sets the q value of "*/*", if present, to
0.01 to emulate the desired behavior. It also sets the q value of
wildcards of the format "type/*" to 0.02 (so these are preferred
over matches against "*/*"). If any media type on the
Accept: header contains a q factor, these special
values are not applied, so requests from browsers which
send the explicit information to start with work as expected.
New in Apache 2.0, some exceptions have been added to the negotiation algorithm to allow graceful fallback when language negotiation fails to find a match.
When a client requests a page on your server, but the server
cannot find a single page that matches the
Accept-Language sent by
the browser, the server will return either a "No Acceptable
Variant" or "Multiple Choices" response to the client. To avoid
these error messages, it is possible to configure Apache to ignore
the Accept-Language in these cases and provide a
document that does not explicitly match the client's request. The
ForceLanguagePriority
directive can be used to override one or both of these error
messages and substitute the server's judgement in the form of the
LanguagePriority
directive.
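A sketch of such a configuration, with a hypothetical language order. Prefer covers the "Multiple Choices" case and Fallback covers the "No Acceptable Variant" case.

```apache
# Server-side preference order, used when the client's
# Accept-Language header does not settle the choice.
LanguagePriority en fr de

# Serve a document according to LanguagePriority instead of
# returning "Multiple Choices" or "No Acceptable Variant".
ForceLanguagePriority Prefer Fallback
```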
The server will also attempt to match language-subsets when no
other match can be found. For example, if a client requests
documents with the language en-GB for British
English, the server is not normally allowed by the HTTP/1.1
standard to match that against a document that is marked as simply
en. (Note that it is almost surely a configuration
error to include en-GB and not en in the
Accept-Language header, since it is very unlikely
that a reader understands British English, but doesn't understand
English in general. Unfortunately, many current clients have
default configurations that resemble this.) However, if no other
language match is possible and the server is about to return a "No
Acceptable Variants" error or fall back to the LanguagePriority, the server
will ignore the subset specification and match en-GB
against en documents. Implicitly, Apache will add
the parent language to the client's acceptable language list with
a very low quality value. But note that if the client requests
"en-GB; q=0.9, fr; q=0.8", and the server has documents
designated "en" and "fr", then the "fr" document will be returned.
This is necessary to maintain compliance with the HTTP/1.1
specification and to work effectively with properly configured
clients.
In order to support advanced techniques (such as cookies or
special URL-paths) to determine the user's preferred language,
since Apache 2.0.47 mod_negotiation recognizes
the environment variable
prefer-language. If it exists and contains an
appropriate language tag, mod_negotiation will
try to select a matching variant. If there's no such variant,
the normal negotiation process applies.
SetEnvIf Cookie "language=en" prefer-language=en
SetEnvIf Cookie "language=fr" prefer-language=fr
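The same variable could also be derived from a special URL-path. The following is only a sketch under the assumption that language-specific content is requested under hypothetical /en/ and /fr/ prefixes:

```apache
# Pick the preferred language from the first path segment of the request.
SetEnvIf Request_URI "^/en/" prefer-language=en
SetEnvIf Request_URI "^/fr/" prefer-language=fr
```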
Apache extends the transparent content negotiation protocol (RFC
2295) as follows. A new {encoding ..} element is used in
variant lists to label variants which are available with a specific
content-encoding only. The implementation of the RVSA/1.0 algorithm
(RFC 2296) is extended to recognize encoded variants in the list, and
to use them as candidate variants whenever their encodings are
acceptable according to the Accept-Encoding request
header. The RVSA/1.0 implementation does not round computed quality
factors to 5 decimal places before choosing the best variant.
If you are using language negotiation you can choose between different naming conventions, because files can have more than one extension, and the order of the extensions is normally irrelevant (see the mod_mime documentation for details).
A typical file has a MIME-type extension (e.g.,
html), maybe an encoding extension (e.g.,
gz), and of course a language extension
(e.g., en) when we have different
language variants of this file.
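The extensions mentioned above are mapped to their meanings by mod_mime directives. A minimal sketch, assuming the conventional extension names:

```apache
# MIME-type extension
AddType text/html .html

# Encoding extension
AddEncoding x-gzip .gz

# Language extensions
AddLanguage en .en
AddLanguage fr .fr
```

With these mappings, a file named foo.html.en.gz would be treated as gzip-encoded English HTML.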
Examples: foo.en.html, foo.html.en, foo.en.html.gz
Here are some more examples of filenames together with valid and invalid hyperlinks:
| Filename | Valid hyperlink | Invalid hyperlink |
|---|---|---|
| foo.html.en | foo, foo.html | - |
| foo.en.html | foo | foo.html |
| foo.html.en.gz | foo, foo.html | foo.gz, foo.html.gz |
| foo.en.html.gz | foo | foo.html, foo.html.gz, foo.gz |
| foo.gz.html.en | foo, foo.gz, foo.gz.html | foo.html |
| foo.html.gz.en | foo, foo.html, foo.html.gz | foo.gz |
Looking at the table above, you will notice that it is always
possible to use the name without any extensions in a hyperlink
(e.g., foo). The advantage is that you
can hide the actual type of a document or file and can change
it later, e.g., from html to
shtml or cgi without changing any
hyperlink references.
If you want to continue to use a MIME-type in your
hyperlinks (e.g. foo.html) the language
extension (including an encoding extension if there is one)
must be on the right hand side of the MIME-type extension
(e.g., foo.html.en).
When a cache stores a representation, it associates it with the request URL. The next time that URL is requested, the cache can use the stored representation. But, if the resource is negotiable at the server, this might result in only the first requested variant being cached and subsequent cache hits might return the wrong response. To prevent this, Apache normally marks all responses that are returned after content negotiation as non-cacheable by HTTP/1.0 clients. Apache also supports the HTTP/1.1 protocol features to allow caching of negotiated responses.
For requests which come from a HTTP/1.0 compliant client
(either a browser or a cache), the directive CacheNegotiatedDocs can be
used to allow caching of responses which were subject to
negotiation. This directive can be given in the server config or
virtual host, and takes no arguments. It has no effect on requests
from HTTP/1.1 clients.
For more information about content negotiation, see Alan J. Flavell's Language Negotiation Notes. But note that this document may not be updated to include changes in Apache 2.0.
Apache HTTP Server Version 2.0

Apache lets webmasters configure the responses that the server sends when certain errors or problems occur.
Customized responses can be defined to be activated when the server detects an error or problem.
If a script terminates abnormally and produces a "500 Server Error" response, that response can be replaced with other text of your choosing or with a redirection to another URL (local or external).
NCSA httpd 1.3 returned dated error messages that often meant nothing to the user, and it logged no information that hinted at the cause of the problem.
The server can be configured to respond in one of the following ways: display custom text, or redirect to a local or external URL.
Redirecting to another URL can be useful, but only if some information can also be passed along to explain the error or problem and/or log it more clearly.
To achieve this, Apache now defines environment variables similar to those of CGI:
REDIRECT_HTTP_ACCEPT=*/*, image/gif, image/x-xbitmap,
image/jpeg
REDIRECT_HTTP_USER_AGENT=Mozilla/1.1b2 (X11; I; HP-UX A.09.05
9000/712)
REDIRECT_PATH=.:/bin:/usr/local/bin:/etc
REDIRECT_QUERY_STRING=
REDIRECT_REMOTE_ADDR=121.345.78.123
REDIRECT_REMOTE_HOST=ooh.ahhh.com
REDIRECT_SERVER_NAME=crash.bang.edu
REDIRECT_SERVER_PORT=80
REDIRECT_SERVER_SOFTWARE=Apache/0.8.15
REDIRECT_URL=/cgi-bin/buggy.pl
Note the REDIRECT_ prefix.
At least REDIRECT_URL and REDIRECT_QUERY_STRING will be passed to the new URL (assuming it is a cgi-script or a cgi-include). The other variables will exist only if they existed before the error or problem occurred. None of these variables will be set if the ErrorDocument directive specifies an external redirect (anything starting with a scheme name like http:, even if it refers to the same server).
Use of ErrorDocument is enabled for .htaccess files when AllowOverride is set accordingly.
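As a sketch of that setup, assuming hypothetical paths: ErrorDocument belongs to the FileInfo override class, so the main configuration has to allow that class before the directive can appear in a .htaccess file.

```apache
# In httpd.conf: allow FileInfo directives in .htaccess files
# under this directory.
<Directory /www/example>
    AllowOverride FileInfo
</Directory>

# In /www/example/.htaccess: a custom response for this subtree.
ErrorDocument 404 /errors/not_found.html
```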
Here are some more examples:
ErrorDocument 500 /cgi-bin/crash-recover
ErrorDocument 500 "Sorry, our script crashed. Oh dear"
ErrorDocument 500 http://xxx/
ErrorDocument 404 /Lame_excuses/not_found.html
ErrorDocument 401 /Subscription/how_to_subscribe.html
The syntax is
ErrorDocument <3-digit-code> <action>
where the action can be text to display, a local URL to redirect to, or an external URL to redirect to.
Apache's behavior for redirections has been changed so that more environment variables are available to the script or server-include.
Previously, the standard CGI variables were made available to the script that was redirected to. No indication of where the redirection came from was provided.
Now, a new batch of environment variables is initialized for use by the script that was redirected to. Each new variable has the prefix REDIRECT_.
The REDIRECT_ environment variables are created from the CGI environment variables that existed before the redirection; they are renamed by adding the REDIRECT_ prefix, so that, for example, HTTP_USER_AGENT becomes REDIRECT_HTTP_USER_AGENT. In addition to these new variables, Apache defines REDIRECT_URL and REDIRECT_STATUS to help the script trace its origin. Both the original URL and the URL being redirected to can be logged in the access log.
If ErrorDocument specifies a local redirect to a CGI script, the script should include a "Status:" header field in its output to ensure that the error condition that caused it to be invoked is propagated all the way back to the client. For example, a Perl script for use with ErrorDocument might include the following:
...
print "Content-type: text/html\n";
printf "Status: %s Condition Intercepted\n", $ENV{"REDIRECT_STATUS"};
...
If the script is dedicated to handling a particular error condition, such as 404 Not Found, it can use the specific error code and text instead.
Note that the script must emit an appropriate Status: header (such as 302 Found) if the response contains a Location: header (in order to issue a redirect that is interpreted by the client). Otherwise, the Location: header may have no effect.
Apache HTTP Server Version 2.0

This document has not been updated to take into account changes made in the 2.0 version of the Apache HTTP Server. Some of the information may still be relevant, but please use it with care.
These are some notes on the Apache API and the data structures you have to deal with, etc. They are not yet nearly complete, but hopefully, they will help you get your bearings. Keep in mind that the API is still subject to change as we gain experience with it. (See the TODO file for what might be coming). However, it will be easy to adapt modules to any changes that are made. (We have more modules to adapt than you do).
A few notes on general pedagogical style here. In the interest of conciseness, all structure declarations here are incomplete -- the real ones have more slots that I'm not telling you about. For the most part, these are reserved to one component of the server core or another, and should be altered by modules with caution. However, in some cases, they really are things I just haven't gotten around to yet. Welcome to the bleeding edge.
Finally, here's an outline, to give you some bare idea of what's coming up, and in what order:
We begin with an overview of the basic concepts behind the API, and how they are manifested in the code.
Apache breaks down request handling into a series of steps, more or less the same way the Netscape server API does (although this API has a few more stages than NetSite does, as hooks for stuff I thought might be useful in the future). These are:
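- URI -> filename translation
- Auth ID checking (is the user who they say they are?)
- Auth access checking (is the user authorized here?)
- Access checking other than auth
- Determining the MIME type of the object requested
- `Fixups' -- there aren't any of these yet, but the phase is intended as a hook for possible extensions like SetEnv, which don't really fit well elsewhere.
- Actually sending a response back to the client
- Logging the request
These phases are handled by looking at each of a succession of modules, looking to see if each of them has a handler for the phase, and attempting to invoke it if so. The handler can typically do one of three things:
- Handle the request, and indicate that it has done so by returning the constant OK.
- Decline to handle the request, by returning the constant DECLINED. In this case, the server behaves in all respects as if the handler simply hadn't been there.
- Signal an error, by returning one of the HTTP error codes.
Most phases are terminated by the first module that handles them;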
however, for logging, `fixups', and non-access authentication checking,
all handlers always run (barring an error). Also, the response phase is
unique in that modules may declare multiple handlers for it, via a
dispatch table keyed on the MIME type of the requested object. Modules may
declare a response-phase handler which can handle any request,
by giving it the key */* (i.e., a wildcard MIME type
specification). However, wildcard handlers are only invoked if the server
has already tried and failed to find a more specific response handler for
the MIME type of the requested object (either none existed, or they all
declined).
The handlers themselves are functions of one argument (a
request_rec structure, described below), which return an integer,
as above.
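The way a phase is walked through the modules can be sketched in a few lines of C. This is only an illustration under stated assumptions: `toy_request`, `toy_handler` and `toy_run_phase` are made-up names, the values of `OK` and `DECLINED` are placeholders, and only the common case is modelled, where the first handler that does not decline ends the phase. Phases in which every handler runs are not covered.

```c
#include <stddef.h>

/* Placeholder values for this sketch; the real constants come from the
 * server headers. */
#define OK        0
#define DECLINED -1

/* Hypothetical, heavily reduced stand-in for request_rec. */
typedef struct {
    const char *uri;
    const char *filename;
} toy_request;

/* A handler is a function of one argument which returns an integer. */
typedef int (*toy_handler)(toy_request *);

/* Run one phase: ask each module's handler in turn, skip modules that have
 * no handler for the phase, and stop at the first handler that does not
 * decline. */
int toy_run_phase(toy_handler *handlers, size_t n, toy_request *r)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int rc;

        if (handlers[i] == NULL)
            continue;           /* this module has no handler here */
        rc = handlers[i](r);
        if (rc != DECLINED)
            return rc;          /* OK or an error code ends the phase */
    }
    return DECLINED;            /* nobody handled the phase */
}

/* Two example name-translation handlers for the sketch. */
int toy_decline(toy_request *r)
{
    (void)r;
    return DECLINED;
}

int toy_translate(toy_request *r)
{
    r->filename = "/www/index.html";
    return OK;
}
```

Calling `toy_run_phase` with the handlers `{ NULL, toy_decline, toy_translate }` skips the first two entries and lets the third fill in the filename.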
At this point, we need to explain the structure of a module. Our
candidate will be one of the messier ones, the CGI module -- this handles
both CGI scripts and the ScriptAlias config file command. It's actually a great deal
more complicated than most modules, but if we're going to have only one
example, it might as well be the one with its fingers in every place.
Let's begin with handlers. In order to handle the CGI scripts, the
module declares a response handler for them. Because of ScriptAlias, it also has handlers for the
name translation phase (to recognize ScriptAliased URIs) and the type-checking phase (any
ScriptAliased request is typed
as a CGI script).
The module needs to maintain some per (virtual) server information,
namely, the ScriptAliases in
effect; the module structure therefore contains pointers to a function
which builds these structures, and to another which combines two of them
(in case the main server and a virtual server both have ScriptAliases declared).
Finally, this module contains code to handle the ScriptAlias command itself. This particular
module only declares one command, but there could be more, so modules have
command tables which declare their commands, and describe where
they are permitted, and how they are to be invoked.
A final note on the declared types of the arguments of some of these
commands: a pool is a pointer to a resource pool
structure; these are used by the server to keep track of the memory which
has been allocated, files opened, etc., either to service a
particular request, or to handle the process of configuring itself. That
way, when the request is over (or, for the configuration pool, when the
server is restarting), the memory can be freed, and the files closed,
en masse, without anyone having to write explicit code to track
them all down and dispose of them. Also, a cmd_parms
structure contains various information about the config file being read,
and other status information, which is sometimes of use to the function
which processes a config-file command (such as ScriptAlias). With no further ado, the
module itself:
/* Declarations of handlers. */
int translate_scriptalias (request_rec *);
int type_scriptalias (request_rec *);
int cgi_handler (request_rec *);
/* Subsidiary dispatch table for response-phase
* handlers, by MIME type */
handler_rec cgi_handlers[] = {
{ "application/x-httpd-cgi", cgi_handler },
{ NULL }
};
/* Declarations of routines to manipulate the
* module's configuration info. Note that these are
* returned, and passed in, as void *'s; the server
* core keeps track of them, but it doesn't, and can't,
* know their internal structure.
*/
void *make_cgi_server_config (pool *);
void *merge_cgi_server_config (pool *, void *, void *);
/* Declarations of routines to handle config-file commands */
extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake,
char *real);
command_rec cgi_cmds[] = {
{ "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2,
"a fakename and a realname"},
{ NULL }
};
module cgi_module = {
    STANDARD_MODULE_STUFF,
    NULL,                     /* initializer */
    NULL,                     /* dir config creator */
    NULL,                     /* dir merger */
    make_cgi_server_config,   /* server config */
    merge_cgi_server_config,  /* merge server config */
    cgi_cmds,                 /* command table */
    cgi_handlers,             /* handlers */
    translate_scriptalias,    /* filename translation */
    NULL,                     /* check_user_id */
    NULL,                     /* check auth */
    NULL,                     /* check access */
    type_scriptalias,         /* type_checker */
    NULL,                     /* fixups */
    NULL,                     /* logger */
    NULL                      /* header parser */
};
The sole argument to handlers is a request_rec structure.
This structure describes a particular request which has been made to the
server, on behalf of a client. In most cases, each connection to the
client generates only one request_rec structure.
The request_rec contains pointers to a resource pool
which will be cleared when the server is finished handling the request;
to structures containing per-server and per-connection information, and
most importantly, information on the request itself.
The most important such information is a small set of character strings describing attributes of the object being requested, including its URI, filename, content-type and content-encoding (these being filled in by the translation and type-check handlers which handle the request, respectively).
Other commonly used data items are tables giving the MIME headers on
the client's original request, MIME headers to be sent back with the
response (which modules can add to at will), and environment variables for
any subprocesses which are spawned off in the course of servicing the
request. These tables are manipulated using the ap_table_get
and ap_table_set routines.
Note that the Content-type header value cannot
be set by module content-handlers using the ap_table_*()
routines. Rather, it is set by pointing the content_type
field in the request_rec structure to an appropriate
string. e.g.,
r->content_type = "text/html";
Finally, there are pointers to two data structures which, in turn,
point to per-module configuration structures. Specifically, these hold
pointers to the data structures which the module has built to describe
the way it has been configured to operate in a given directory (via
.htaccess files or <Directory> sections), and to private data it has built in the
course of servicing the request (so a module's handlers for one phase can
pass `notes' to its handlers for other phases). There is another such
configuration vector in the server_rec data structure pointed
to by the request_rec, which contains per (virtual) server
configuration data.
Here is an abridged declaration, giving the fields most commonly used:
struct request_rec {
    pool *pool;
    conn_rec *connection;
    server_rec *server;
    /* What object is being requested */
    char *uri;
    char *filename;
    char *path_info;
    char *args;               /* QUERY_ARGS, if any */
    struct stat finfo;        /* Set by server core;
                               * st_mode set to zero if no such file */
    char *content_type;
    char *content_encoding;
    /* MIME header environments, in and out. Also,
     * an array containing environment variables to
     * be passed to subprocesses, so people can write
     * modules to add to that environment.
     *
     * The difference between headers_out and
     * err_headers_out is that the latter are printed
     * even on error, and persist across internal
     * redirects (so the headers printed for
     * ErrorDocument handlers will have them).
     */
    table *headers_in;
    table *headers_out;
    table *err_headers_out;
    table *subprocess_env;
    /* Info about the request itself... */
    int header_only;          /* HEAD request, as opposed to GET */
    char *protocol;           /* Protocol, as given to us, or HTTP/0.9 */
    char *method;             /* GET, HEAD, POST, etc. */
    int method_number;        /* M_GET, M_POST, etc. */
    /* Info for logging */
    char *the_request;
    int bytes_sent;
    /* A flag which modules can set, to indicate that
     * the data being returned is volatile, and clients
     * should be told not to cache it.
     */
    int no_cache;
    /* Various other config info which may change
     * with .htaccess files
     * These are config vectors, with one void*
     * pointer for each module (the thing pointed
     * to being the module's business).
     */
    void *per_dir_config;     /* Options set in config files, etc. */
    void *request_config;     /* Notes on *this* request */
};
Most request_rec structures are built by reading an HTTP
request from a client, and filling in the fields. However, there are a
few exceptions:
- If the request is to an imagemap, a type map (i.e., a *.var file), or a CGI script which returned a local `Location:', then the resource which the user requested is going to be ultimately located by some URI other than what the client originally supplied. In this case, the server does an internal redirect, constructing a new request_rec for the new URI, and processing it almost exactly as if the client had requested the new URI directly.
- If some handler signaled an error, and an ErrorDocument is in scope, the same internal redirect machinery comes into play.
- Finally, a handler occasionally needs to investigate `what would happen if' some other request were run. For instance, the directory indexing module needs to know what MIME type would be assigned to a request for each directory entry, in order to figure out what icon to use.
Such handlers can construct a sub-request, using the
functions ap_sub_req_lookup_file,
ap_sub_req_lookup_uri, and ap_sub_req_method_uri;
these construct a new request_rec structure and process it
as you would expect, up to but not including the point of actually sending
a response. (These functions skip over the access checks if the
sub-request is for a file in the same directory as the original
request).
(Server-side includes work by building sub-requests and then actually
invoking the response handler for them, via the function
ap_run_sub_req).
As discussed above, each handler, when invoked to handle a particular
request_rec, has to return an int to indicate
what happened. That can be one of:
- OK -- the request was handled successfully. This may or may not terminate the phase.
- DECLINED -- no erroneous condition exists, but the module declines to handle the phase; the server tries to find another.
- an HTTP error code, which aborts handling of the request.
Note that if the error code returned is REDIRECT, then
the module should put a Location in the request's
headers_out, to indicate where the client should be
redirected to.
Handlers for most phases do their work by simply setting a few fields
in the request_rec structure (or, in the case of access
checkers, simply by returning the correct error code). However, response
handlers have to actually send a response back to the client.
They should begin by sending an HTTP response header, using the
function ap_send_http_header. (You don't have to do anything
special to skip sending the header for HTTP/0.9 requests; the function
figures out on its own that it shouldn't do anything). If the request is
marked header_only, that's all they should do; they should
return after that, without attempting any further output.
Otherwise, they should produce a response body which responds to the
client as appropriate. The primitives for this are ap_rputc
and ap_rprintf, for internally generated output, and
ap_send_fd, to copy the contents of some FILE *
straight to the client.
At this point, you should more or less understand the following piece
of code, which is the handler which handles GET requests
which have no more specific handler; it also shows how conditional
GETs can be handled, if it's desirable to do so in a
particular response handler -- ap_set_last_modified checks
against the If-modified-since value supplied by the client,
if any, and returns an appropriate code (which will, if nonzero, be
USE_LOCAL_COPY). No similar considerations apply for
ap_set_content_length, but it returns an error code for
symmetry.
int default_handler (request_rec *r)
{
    int errstatus;
    FILE *f;
    /* Only plain GET requests are handled here. */
    if (r->method_number != M_GET) return DECLINED;
    if (r->finfo.st_mode == 0) return NOT_FOUND;
    if ((errstatus = ap_set_content_length (r, r->finfo.st_size))
        ||
        (errstatus = ap_set_last_modified (r, r->finfo.st_mtime)))
        return errstatus;
    f = fopen (r->filename, "r");
    if (f == NULL) {
        log_reason("file permissions deny server access", r->filename, r);
        return FORBIDDEN;
    }
    register_timeout ("send", r);
    ap_send_http_header (r);
    /* A HEAD request gets the header only. */
    if (!r->header_only) ap_send_fd (f, r);
    ap_pfclose (r->pool, f);
    return OK;
}
Finally, if all of this is too much of a challenge, there are a few
ways out of it. First off, as shown above, a response handler which has
not yet produced any output can simply return an error code, in which
case the server will automatically produce an error response. Secondly,
it can punt to some other handler by invoking
ap_internal_redirect, which is how the internal redirection
machinery discussed above is invoked. A response handler which has
internally redirected should always return OK.
(Invoking ap_internal_redirect from handlers which are
not response handlers will lead to serious confusion).
Stuff that should be discussed here in detail:
- The accessors ap_auth_type, ap_auth_name, and ap_requires.
- The basic authentication routines ap_get_basic_auth_pw, which sets the connection->user structure field automatically, and ap_note_basic_auth_failure, which arranges for the proper WWW-Authenticate: header to be sent back.
When a request has internally redirected, there is the question of
what to log. Apache handles this by bundling the entire chain of redirects
into a list of request_rec structures which are threaded
through the r->prev and r->next pointers.
The request_rec which is passed to the logging handlers in
such cases is the one which was originally built for the initial request
from the client; note that the bytes_sent field will only be
correct in the last request in the chain (the one for which a response was
actually sent).
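A logging handler that wants the real byte count therefore has to follow the chain to its end. The sketch below shows the idea on a reduced, hypothetical `toy_request` structure that carries only the `prev`, `next` and `bytes_sent` fields mentioned above; the helper names are illustrative and are not part of the Apache API.

```c
#include <stddef.h>

/* Hypothetical cut-down request record: only the fields this sketch needs. */
typedef struct toy_request {
    const char *uri;
    long bytes_sent;
    struct toy_request *prev;   /* request we were redirected from */
    struct toy_request *next;   /* request we were redirected to */
} toy_request;

/* Follow r->next to the request for which a response was actually sent. */
const toy_request *toy_last_in_chain(const toy_request *r)
{
    while (r->next != NULL)
        r = r->next;
    return r;
}

/* Byte count a logger would report when handed the original request. */
long toy_logged_bytes(const toy_request *original)
{
    return toy_last_in_chain(original)->bytes_sent;
}
```

For a request that was never redirected, `next` is `NULL` and the function simply returns that request's own count.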
One of the problems of writing and designing a server-pool server is that of preventing leakage, that is, allocating resources (memory, open files, etc.), without subsequently releasing them. The resource pool machinery is designed to make it easy to prevent this from happening, by allowing resources to be allocated in such a way that they are automatically released when the server is done with them.
The way this works is as follows: the memory which is allocated, files opened, etc., to deal with a particular request are tied to a resource pool which is allocated for the request. The pool is a data structure which itself tracks the resources in question.
When the request has been processed, the pool is cleared. At that point, all the memory associated with it is released for reuse, all files associated with it are closed, and any other clean-up functions which are associated with the pool are run. When this is over, we can be confident that all the resources tied to the pool have been released, and that none of them have leaked.
Server restarts, and allocation of memory and resources for per-server configuration, are handled in a similar way. There is a configuration pool, which keeps track of resources which were allocated while reading the server configuration files, and handling the commands therein (for instance, the memory that was allocated for per-server module configuration, log files and other files that were opened, and so forth). When the server restarts, and has to reread the configuration files, the configuration pool is cleared, and so the memory and file descriptors which were taken up by reading them the last time are made available for reuse.
It should be noted that use of the pool machinery isn't generally
obligatory, except for situations like logging handlers, where you really
need to register cleanups to make sure that the log file gets closed when
the server restarts (this is most easily done by using the function ap_pfopen, which also arranges for the
underlying file descriptor to be closed before any child processes, such as
for CGI scripts, are execed), or in case you are using the
timeout machinery (which isn't yet even documented here). However, there are
two benefits to using it: resources allocated to a pool never leak (even if
you allocate a scratch string, and just forget about it); also, for memory
allocation, ap_palloc is generally faster than
malloc.
We begin here by describing how memory is allocated to pools, and then discuss how other resources are tracked by the resource pool machinery.
Memory is allocated to pools by calling the function
ap_palloc, which takes two arguments, one being a pointer to
a resource pool structure, and the other being the amount of memory to
allocate (in chars). Within handlers for handling requests,
the most common way of getting a resource pool structure is by looking at
the pool slot of the relevant request_rec; hence
the repeated appearance of the following idiom in module code:
int my_handler(request_rec *r)
{
    struct my_structure *foo;
    ...
    /* Allocated from the request pool; freed when the pool is cleared. */
    foo = (struct my_structure *)ap_palloc (r->pool, sizeof(struct my_structure));
}
Note that there is no ap_pfree --
ap_palloced memory is freed only when the associated resource
pool is cleared. This means that ap_palloc does not have to
do as much accounting as malloc(); all it does in the typical
case is to round up the size, bump a pointer, and do a range check.
(It also raises the possibility that heavy use of
ap_palloc could cause a server process to grow excessively
large. There are two ways to deal with this, which are dealt with below;
briefly, you can use malloc, and try to be sure that all of
the memory gets explicitly freed, or you can allocate a
sub-pool of the main pool, allocate your memory in the sub-pool, and clear
it out periodically. The latter technique is discussed in the section
on sub-pools below, and is used in the directory-indexing code, in order
to avoid excessive storage allocation when listing directories with
thousands of files).
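The "round up the size, bump a pointer, and do a range check" description can be made concrete with a toy allocator. This is a sketch under simplifying assumptions, not the server's implementation: `toy_pool`, `toy_palloc` and `toy_clear_pool` are hypothetical names, the pool is backed by a single caller-supplied buffer, and an 8-byte rounding unit is assumed. A real pool would obtain another block when the current one is full instead of returning `NULL`.

```c
#include <stddef.h>

#define TOY_ALIGN 8   /* assumed rounding unit for this sketch */

/* Hypothetical pool backed by one fixed block of memory. */
typedef struct {
    char  *base;   /* start of the block */
    size_t used;   /* bytes handed out so far */
    size_t size;   /* total bytes in the block */
} toy_pool;

void toy_pool_init(toy_pool *p, void *buf, size_t size)
{
    p->base = buf;
    p->used = 0;
    p->size = size;
}

/* Round up the size, do a range check, and bump the pointer. */
void *toy_palloc(toy_pool *p, size_t n)
{
    size_t rounded = (n + TOY_ALIGN - 1) & ~(size_t)(TOY_ALIGN - 1);
    void *mem;

    if (rounded < n || rounded > p->size - p->used)
        return NULL;            /* overflow, or no room left in the block */
    mem = p->base + p->used;
    p->used += rounded;
    return mem;
}

/* There is no per-allocation free: clearing releases everything at once. */
void toy_clear_pool(toy_pool *p)
{
    p->used = 0;
}
```

Because nothing is tracked per allocation, freeing is a single assignment, which is also why individual allocations cannot be returned early.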
There are functions which allocate initialized memory, and are
frequently useful. The function ap_pcalloc has the same
interface as ap_palloc, but clears out the memory it
allocates before it returns it. The function ap_pstrdup
takes a resource pool and a char * as arguments, and
allocates memory for a copy of the string the pointer points to, returning
a pointer to the copy. Finally ap_pstrcat is a varargs-style
function, which takes a pointer to a resource pool, and at least two
char * arguments, the last of which must be
NULL. It allocates enough memory to fit copies of each of
the strings, as a unit; for instance:
ap_pstrcat (r->pool, "foo", "/", "bar", NULL);
returns a pointer to 8 bytes worth of memory, initialized to
"foo/bar".
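The NULL-terminated varargs convention is easy to get wrong, so here is a small stand-alone model of it. It is an assumption-laden sketch: `toy_strcat_all` is a hypothetical function that allocates with `malloc` instead of taking a resource pool, so the caller has to `free` the result, which is exactly the bookkeeping the pool version avoids.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate a NULL-terminated list of strings into one new buffer.
 * Hypothetical malloc-based analogue of the pool routine described above. */
char *toy_strcat_all(const char *first, ...)
{
    va_list ap;
    const char *s;
    size_t len = 0;
    char *result;
    char *out;

    /* Pass 1: add up the lengths of all the strings. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        len += strlen(s);
    va_end(ap);

    result = malloc(len + 1);   /* one extra byte for the terminator */
    if (result == NULL)
        return NULL;

    /* Pass 2: copy each string into place, as a unit. */
    out = result;
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *)) {
        size_t n = strlen(s);
        memcpy(out, s, n);
        out += n;
    }
    va_end(ap);
    *out = '\0';
    return result;
}
```

A call such as `toy_strcat_all("foo", "/", "bar", (char *)NULL)` needs 7 characters plus the terminator, matching the 8 bytes mentioned above. The cast on the final `NULL` matters in a varargs call, since the compiler cannot convert it to a pointer type on its own.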
A pool is really defined by its lifetime more than anything else. There are some static pools in http_main which are passed to various non-http_main functions as arguments at opportune times. Here they are:
- permanent_pool
- pconf
- ptemp
- pchild
- ptrans
- r->pool
For almost everything folks do, r->pool is the pool to
use. But you can see how other lifetimes, such as pchild, are useful to
some modules... such as modules that need to open a database connection
once per child, and wish to clean it up when the child dies.
You can also see how some bugs have manifested themselves, such as
setting connection->user to a value from
r->pool -- in this case connection exists for the
lifetime of ptrans, which is longer than
r->pool (especially if r->pool is a
subrequest!). So the correct thing to do is to allocate from
connection->pool.
And there was another interesting bug in mod_include
/ mod_cgi. You'll see in those that they do this test
to decide if they should use r->pool or
r->main->pool. In this case the resource that they are
registering for cleanup is a child process. If it were registered in
r->pool, then the code would wait() for the
child when the subrequest finishes. With mod_include this
could be any old #include, and the delay can be up to 3
seconds... and happened quite frequently. Instead the subprocess is
registered in r->main->pool which causes it to be
cleaned up when the entire request is done -- i.e., after the
output has been sent to the client and logging has happened.
As indicated above, resource pools are also used to track other sorts
of resources besides memory. The most common are open files. The routine
which is typically used for this is ap_pfopen, which takes a
resource pool and two strings as arguments; the strings are the same as
the typical arguments to fopen, e.g.,
...
FILE *f = ap_pfopen (r->pool, r->filename, "r");
if (f == NULL) { ... } else { ... }
There is also an ap_popenf routine, which parallels the
lower-level open system call. Both of these routines arrange
for the file to be closed when the resource pool in question is
cleared.
Unlike the case for memory, there are functions to close files
allocated with ap_pfopen, and ap_popenf, namely
ap_pfclose and ap_pclosef. (This is because, on
many systems, the number of files which a single process can have open is
quite limited). It is important to use these functions to close files
allocated with ap_pfopen and ap_popenf, since to
do otherwise could cause fatal errors on systems such as Linux, which
react badly if the same FILE* is closed more than once.
(Using the close functions is not mandatory, since the
file will eventually be closed regardless, but you should consider it in
cases where your module is opening, or could open, a lot of files).
More text goes here. Describe the cleanup primitives in terms of
which the file stuff is implemented; also, spawn_process.
Pool cleanups live until clear_pool() is called:
clear_pool(a) recursively calls destroy_pool()
on all subpools of a; then calls all the cleanups for
a; then releases all the memory for a.
destroy_pool(a) calls clear_pool(a) and then
releases the pool structure itself. i.e.,
clear_pool(a) doesn't delete a, it just frees
up all the resources and you can start using it again immediately.
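The ordering described here (sub-pools first, then cleanups, with destroy being clear plus releasing the pool structure) can be modelled in a short sketch. All names below are hypothetical, the pools are allocated with `calloc` purely for illustration, and memory tracking is left out so that only cleanups and sub-pools are shown.

```c
#include <stdlib.h>

/* One registered cleanup: a function and the data to hand to it. */
typedef struct toy_cleanup {
    void (*fn)(void *);
    void *data;
    struct toy_cleanup *next;
} toy_cleanup;

/* Hypothetical pool that tracks only cleanups and sub-pools. */
typedef struct toy_pool {
    toy_cleanup *cleanups;
    struct toy_pool *first_child;
    struct toy_pool *sibling;
    struct toy_pool *parent;
} toy_pool;

/* Create a pool; with a non-NULL parent it becomes a sub-pool of it. */
toy_pool *toy_make_sub_pool(toy_pool *parent)
{
    toy_pool *p = calloc(1, sizeof *p);

    if (p != NULL && parent != NULL) {
        p->parent = parent;
        p->sibling = parent->first_child;
        parent->first_child = p;
    }
    return p;
}

/* Returns 0 on success, -1 if the bookkeeping record cannot be allocated. */
int toy_register_cleanup(toy_pool *p, void (*fn)(void *), void *data)
{
    toy_cleanup *c = malloc(sizeof *c);

    if (c == NULL)
        return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
    return 0;
}

void toy_destroy_pool(toy_pool *p);

/* Destroy all sub-pools, then run this pool's cleanups. The pool itself
 * survives and can be reused immediately. */
void toy_clear_pool(toy_pool *p)
{
    while (p->first_child != NULL)
        toy_destroy_pool(p->first_child);   /* unlinks itself from p */
    while (p->cleanups != NULL) {
        toy_cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
}

/* Clear the pool, unlink it from its parent, and release the structure. */
void toy_destroy_pool(toy_pool *p)
{
    toy_clear_pool(p);
    if (p->parent != NULL) {
        toy_pool **link = &p->parent->first_child;
        while (*link != p)
            link = &(*link)->sibling;
        *link = p->sibling;
    }
    free(p);
}

/* Example cleanup for the sketch: counts how many times it has run. */
void toy_count_cleanup(void *data)
{
    ++*(int *)data;
}
```

Clearing a root pool with one sub-pool, each holding one cleanup, runs both cleanups and leaves the root empty but usable.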
On rare occasions, too-free use of ap_palloc() and the
associated primitives may result in undesirably profligate resource
allocation. You can deal with such a case by creating a sub-pool,
allocating within the sub-pool rather than the main pool, and clearing or
destroying the sub-pool, which releases the resources which were
associated with it. (This really is a rare situation; the only
case in which it comes up in the standard module set is in case of listing
directories, and then only with very large directories.
Unnecessary use of the primitives discussed here can hair up your code
quite a bit, with very little gain).
The primitive for creating a sub-pool is ap_make_sub_pool,
which takes another pool (the parent pool) as an argument. When the main
pool is cleared, the sub-pool will be destroyed. The sub-pool may also be
cleared or destroyed at any time, by calling the functions
ap_clear_pool and ap_destroy_pool, respectively.
(The difference is that ap_clear_pool frees resources
associated with the pool, while ap_destroy_pool also
deallocates the pool itself. In the former case, you can allocate new
resources within the pool, and clear it again, and so forth; in the
latter case, it is simply gone).
One final note -- sub-requests have their own resource pools, which are
sub-pools of the resource pool for the main request. The polite way to
reclaim the resources associated with a sub request which you have
allocated (using the ap_sub_req_... functions) is
ap_destroy_sub_req, which frees the resource pool. Before
calling this function, be sure to copy anything that you care about which
might be allocated in the sub-request's resource pool into someplace a
little less volatile (for instance, the filename in its
request_rec structure).
(Again, under most circumstances, you shouldn't feel obliged to call
this function; only 2K of memory or so are allocated for a typical sub
request, and it will be freed anyway when the main request pool is
cleared. It is only when you are allocating many, many sub-requests for a
single main request that you should seriously consider the
ap_destroy_... functions).
One of the design goals for this server was to maintain external compatibility with the NCSA 1.3 server --- that is, to read the same configuration files, to process all the directives therein correctly, and in general to be a drop-in replacement for NCSA. On the other hand, another design goal was to move as much of the server's functionality into modules which have as little as possible to do with the monolithic server core. The only way to reconcile these goals is to move the handling of most commands from the central server into the modules.
However, just giving the modules command tables is not enough to divorce
them completely from the server core. The server has to remember the
commands in order to act on them later. That involves maintaining data which
is private to the modules, and which can be either per-server, or
per-directory. Most things are per-directory, including in particular access
control and authorization information, but also information on how to
determine file types from suffixes, which can be modified by
AddType and DefaultType directives, and so forth. In general,
the governing philosophy is that anything which can be made
configurable by directory should be; per-server information is generally
used in the standard set of modules for information like
Aliases and Redirects which come into play before the
request is tied to a particular place in the underlying file system.
Another requirement for emulating the NCSA server is being able to handle
the per-directory configuration files, generally called
.htaccess files, though even in the NCSA server they can
contain directives which have nothing at all to do with access control.
Accordingly, after URI -> filename translation, but before performing any
other phase, the server walks down the directory hierarchy of the underlying
filesystem, following the translated pathname, to read any
.htaccess files which might be present. The information which
is read in then has to be merged with the applicable information
from the server's own config files (either from the <Directory> sections in
access.conf, or from defaults in srm.conf, which
actually behaves for most purposes almost exactly like <Directory
/>).
Finally, after having served a request which involved reading
.htaccess files, we need to discard the storage allocated for
handling them. That is solved the same way it is solved wherever else
similar problems come up, by tying those structures to the per-transaction
resource pool.
Let's look at how all of this plays out in mod_mime.c,
which defines the file typing handler which emulates the NCSA server's
behavior of determining file types from suffixes. What we'll be looking
at, here, is the code which implements the AddType and AddEncoding commands. These commands can appear in
.htaccess files, so they must be handled in the module's
private per-directory data, which in fact, consists of two separate
tables for MIME types and encoding information, and is declared as
follows:
typedef struct {
    table *forced_types;      /* Additional AddTyped stuff */
    table *encoding_types;    /* Added with AddEncoding... */
} mime_dir_config;
When the server is reading a configuration file, or <Directory> section, which includes
one of the MIME module's commands, it needs to create a
mime_dir_config structure, so those commands have something
to act on. It does this by invoking the function it finds in the module's
`create per-dir config' slot, with two arguments: the name of the
directory to which this configuration information applies (or
NULL for srm.conf), and a pointer to a
resource pool in which the allocation should happen.
(If we are reading a .htaccess file, that resource pool
is the per-request resource pool for the request; otherwise it is a
resource pool which is used for configuration data, and cleared on
restarts. Either way, it is important for the structure being created to
vanish when the pool is cleared, by registering a cleanup on the pool if
necessary).
For the MIME module, the per-dir config creation function just
ap_pallocs the structure above, and creates a couple of
tables to fill it. That looks like this:
void *create_mime_dir_config (pool *p, char *dummy)
{
    mime_dir_config *new =
        (mime_dir_config *) ap_palloc (p, sizeof(mime_dir_config));
    new->forced_types = ap_make_table (p, 4);
    new->encoding_types = ap_make_table (p, 4);
    return new;
}
Now, suppose we've just read in a .htaccess file. We
already have the per-directory configuration structure for the next
directory up in the hierarchy. If the .htaccess file we just
read in didn't have any AddType
or AddEncoding commands, its
per-directory config structure for the MIME module is still valid, and we
can just use it. Otherwise, we need to merge the two structures
somehow.
To do that, the server invokes the module's per-directory config merge function, if one is present. That function takes three arguments: the two structures being merged, and a resource pool in which to allocate the result. For the MIME module, all that needs to be done is overlay the tables from the new per-directory config structure with those from the parent:
void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv)
{
    mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv;
    mime_dir_config *subdir = (mime_dir_config *)subdirv;
    mime_dir_config *new =
        (mime_dir_config *)ap_palloc (p, sizeof(mime_dir_config));
    new->forced_types = ap_overlay_tables (p, subdir->forced_types,
                                           parent_dir->forced_types);
    new->encoding_types = ap_overlay_tables (p, subdir->encoding_types,
                                             parent_dir->encoding_types);
    return new;
}
As a note -- if there is no per-directory merge function present, the
server will just use the subdirectory's configuration info, and ignore
the parent's. For some modules, that works just fine (e.g., for
the includes module, whose per-directory configuration information
consists solely of the state of the XBITHACK), and for those
modules, you can just not declare one, and leave the corresponding
structure slot in the module itself NULL.
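To see what overlaying buys you, consider a toy model in which a lookup consults the subdirectory's table first and falls back to the parent's. This sketch rests on assumptions: the `toy_` types and functions are hypothetical, the merged configuration is modelled as a view over two tables instead of a newly built table, and keys are compared exactly. It is meant to show the intended precedence only, not how the server's table routines are implemented.

```c
#include <stddef.h>
#include <string.h>

/* One key/value pair, e.g. a file suffix and its MIME type. */
typedef struct {
    const char *key;
    const char *val;
} toy_entry;

/* Hypothetical read-only table. */
typedef struct {
    const toy_entry *entries;
    size_t count;
} toy_table;

/* Merged view: the subdirectory's entries overlay the parent's. */
typedef struct {
    const toy_table *subdir;
    const toy_table *parent;
} toy_overlay;

/* Return the first value stored under key, or NULL if there is none. */
const char *toy_table_get(const toy_table *t, const char *key)
{
    size_t i;

    for (i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].key, key) == 0)
            return t->entries[i].val;
    return NULL;
}

/* Look in the subdirectory's table first, then fall back to the parent. */
const char *toy_overlay_get(const toy_overlay *o, const char *key)
{
    const char *v = toy_table_get(o->subdir, key);

    return v != NULL ? v : toy_table_get(o->parent, key);
}
```

With a parent mapping `txt` to `text/plain` and a subdirectory mapping `txt` to something else, the subdirectory's value wins, while suffixes only the parent knows about are still found.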
Now that we have these structures, we need to be able to figure out how
to fill them. That involves processing the actual AddType and AddEncoding commands. To find commands, the server looks in
the module's command table. That table contains information on how many
arguments the commands take, and in what formats, where they are permitted,
and so forth. That information is sufficient to allow the server to invoke
most command-handling functions with pre-parsed arguments. Without further
ado, let's look at the AddType
command handler, which looks like this (the AddEncoding command looks basically the same, and won't be
shown here):
char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext)
{
    if (*ext == '.') ++ext;
    ap_table_set (m->forced_types, ext, ct);
    return NULL;
}
This command handler is unusually simple. As you can see, it takes
four arguments, two of which are pre-parsed arguments, the third being the
per-directory configuration structure for the module in question, and the
fourth being a pointer to a cmd_parms structure. That
structure contains a bunch of arguments which are frequently of use to
some, but not all, commands, including a resource pool (from which memory
can be allocated, and to which cleanups should be tied), and the (virtual)
server being configured, from which the module's per-server configuration
data can be obtained if required.
Another way in which this particular command handler is unusually
simple is that there are no error conditions which it can encounter. If
there were, it could return an error message instead of NULL;
this causes an error to be printed out on the server's
stderr, followed by a quick exit, if it is in the main config
files; for a .htaccess file, the syntax error is logged in
the server error log (along with an indication of where it came from), and
the request is bounced with a server error response (HTTP error status,
code 500).
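To make the error-return convention concrete, here is a minimal standalone sketch. It does not use the Apache headers: `limit_conf` and `set_limit` are hypothetical names invented for illustration, and only the convention itself (return NULL on success, an error message otherwise) is taken from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-directory config, standing in for a real module's struct. */
typedef struct {
    long limit;
} limit_conf;

/*
 * Hypothetical handler for a command taking one pre-parsed argument.
 * Returns NULL on success, or a static error message on failure,
 * mirroring the convention described above.
 */
static const char *set_limit(limit_conf *conf, const char *arg)
{
    char *end;
    long value;

    assert(conf != NULL && arg != NULL);
    value = strtol(arg, &end, 10);

    if (end == arg || *end != '\0')
        return "limit must be an integer";
    if (value < 0)
        return "limit must not be negative";

    conf->limit = value;
    return NULL;
}
```

In this sketch the configuration is only modified once the argument has been validated, so a failed command leaves the previous value in place.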
The MIME module's command table has entries for these commands, which look like this:
command_rec mime_cmds[] = {
{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2,
"a mime type followed by a file extension" },
{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2,
"an encoding (e.g., gzip), followed by a file extension" },
{ NULL }
};
The entries in these tables are:
- The name of the command.
- The function which handles it.
- A (void *) pointer, which is passed in the cmd_parms structure to the command handler; this is useful in case many similar commands are handled by the same function.
- A bit mask indicating where the command may appear. There are mask bits corresponding to each AllowOverride option, and an additional mask bit, RSRC_CONF, indicating that the command may appear in the server's own config files, but not in any .htaccess file.
- A flag indicating how many arguments the command handler wants pre-parsed, and how they should be passed in. TAKE2 indicates two pre-parsed arguments. Other options are TAKE1, which indicates one pre-parsed argument; FLAG, which indicates that the argument should be On or Off, and is passed in as a boolean flag; and RAW_ARGS, which causes the server to give the command the raw, unparsed arguments (everything but the command name itself). There is also ITERATE, which means that the handler looks the same as TAKE1, but that if multiple arguments are present, it should be called multiple times, and finally ITERATE2, which indicates that the command handler looks like a TAKE2, but if more arguments are present, then it should be called multiple times, holding the first argument constant.
- A string describing the arguments that should be present (this may be left NULL).
Finally, having set this all up, we have to use it. This is ultimately
done in the module's handlers, specifically for its file-typing handler,
which looks more or less like this; note that the per-directory
configuration structure is extracted from the request_rec's
per-directory configuration vector by using the
ap_get_module_config function.
int find_ct(request_rec *r)
{
int i;
char *fn = ap_pstrdup (r->pool, r->filename);
mime_dir_config *conf = (mime_dir_config *)
ap_get_module_config(r->per_dir_config, &mime_module);
char *type;
if (S_ISDIR(r->finfo.st_mode)) {
r->content_type = DIR_MAGIC_TYPE;
return OK;
}
if((i=ap_rind(fn,'.')) < 0) return DECLINED;
++i;
if ((type = ap_table_get (conf->encoding_types, &fn[i])))
{
r->content_encoding = type;
/* go back to previous extension to try to use it as a type */
fn[i-1] = '\0';
if((i=ap_rind(fn,'.')) < 0) return OK;
++i;
}
if ((type = ap_table_get (conf->forced_types, &fn[i])))
{
r->content_type = type;
}
return OK;
}
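The two-step lookup in find_ct (try the last extension as an encoding, then fall back to the previous extension for the type) can be sketched without any server types. Everything here is an assumption made for illustration: the `ext_map` tables stand in for the module's forced_types and encoding_types tables, and `classify` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed tables standing in for the module's config tables. */
typedef struct { const char *ext; const char *value; } ext_map;

static const ext_map type_map[] = {
    { "html", "text/html" }, { "txt", "text/plain" }, { NULL, NULL }
};
static const ext_map encoding_map[] = {
    { "gz", "x-gzip" }, { NULL, NULL }
};

/* Look up an extension of the given length; NULL if it is not listed. */
static const char *lookup(const ext_map *map, const char *ext, size_t len)
{
    for (; map->ext != NULL; ++map)
        if (strlen(map->ext) == len && strncmp(map->ext, ext, len) == 0)
            return map->value;
    return NULL;
}

/* Index of the last '.' in s[0..len), or -1 if there is none. */
static int last_dot(const char *s, size_t len)
{
    while (len > 0)
        if (s[--len] == '.')
            return (int)len;
    return -1;
}

/* Fill in *type and *encoding (either may end up NULL). */
static void classify(const char *fn, const char **type, const char **encoding)
{
    size_t len;
    int dot;

    assert(fn != NULL && type != NULL && encoding != NULL);
    *type = *encoding = NULL;
    len = strlen(fn);

    if ((dot = last_dot(fn, len)) < 0)
        return;
    *encoding = lookup(encoding_map, fn + dot + 1, len - (size_t)dot - 1);
    if (*encoding != NULL) {
        /* Drop the encoding suffix and retry with the previous extension. */
        len = (size_t)dot;
        if ((dot = last_dot(fn, len)) < 0)
            return;
    }
    *type = lookup(type_map, fn + dot + 1, len - (size_t)dot - 1);
}
```

Unlike find_ct, the sketch works on lengths instead of copying and truncating the filename, which is a design choice of the example and not something the module does.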
The ideas behind per-server module configuration are essentially the same as those for per-directory configuration: there is a creation function and a merge function, the latter being invoked where a virtual server has partially overridden the base server configuration and a combined structure must be computed. (As with per-directory configuration, if no merge function is specified and a module is configured in some virtual server, the default is that the base configuration is simply ignored.)
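As a rough illustration of what such a merge computes, here is a standalone sketch. It deliberately ignores the real function signatures (which deal in pools and untyped pointers); `demo_server_conf` and `merge_server_conf` are hypothetical, and the "NULL or -1 means unset" convention is an assumption of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-server config; NULL and -1 mean "not set here". */
typedef struct {
    const char *doc_prefix;
    int timeout;
} demo_server_conf;

/* Values set in the virtual server win; anything unset is inherited. */
static demo_server_conf merge_server_conf(const demo_server_conf *base,
                                          const demo_server_conf *virt)
{
    demo_server_conf merged;

    assert(base != NULL && virt != NULL);
    merged.doc_prefix = virt->doc_prefix ? virt->doc_prefix : base->doc_prefix;
    merged.timeout = virt->timeout >= 0 ? virt->timeout : base->timeout;
    return merged;
}
```

Without a merge step of this kind, the virtual server's structure would be used on its own and the base server's settings would be lost, which is the default behaviour described above.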
The only substantial difference is that when a command needs to
configure the per-server private module data, it needs to go to the
cmd_parms data to get at it. Here's an example, from the
alias module, which also indicates how a syntax error can be returned
(note that the per-directory configuration argument to the command
handler is declared as a dummy, since the module doesn't actually have
per-directory config data):
char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url)
{
    server_rec *s = cmd->server;
    alias_server_conf *conf = (alias_server_conf *)
        ap_get_module_config(s->module_config,&alias_module);
    alias_entry *new = ap_push_array (conf->redirects);
    if (!ap_is_url (url)) return "Redirect to non-URL";
    new->fake = f; new->real = url;
    return NULL;
}
Apache HTTP Server Version 2.0

The allocation mechanisms within APR have a number of debugging modes that can be used to assist in finding memory problems. This document describes the modes available and gives instructions on activating them.
free()d memory and other such
nonsense.
The theory is simple. The FILL_BYTE (0xa5)
is written over all malloc'd memory as we receive it, and
is written over everything that we free up during a
clear_pool. We check that blocks on the free list always
have the FILL_BYTE in them, and we check during
palloc() that the bytes still have FILL_BYTE
in them. If you ever see garbage URLs or whatnot containing lots
of 0xa5s then you know something used data that's been
freed or uninitialized.
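The fill-byte idea can be shown in a few lines of standalone C. This is only a sketch of the technique, not the APR code: the single static `block`, `BLOCK_SIZE`, and the three helper functions are hypothetical and exist only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILL_BYTE  0xa5   /* same value the text mentions */
#define BLOCK_SIZE 64     /* hypothetical block size for the sketch */

/* A single reusable block standing in for an entry on the free list. */
static unsigned char block[BLOCK_SIZE];

/* Mark the block as free by overwriting it with FILL_BYTE. */
static void block_release(unsigned char *p)
{
    assert(p == block);
    memset(p, FILL_BYTE, BLOCK_SIZE);
}

/* Return 1 if every byte still holds FILL_BYTE, 0 if something wrote to it. */
static int block_is_untouched(void)
{
    size_t i;

    for (i = 0; i < BLOCK_SIZE; ++i)
        if (block[i] != FILL_BYTE)
            return 0;
    return 1;
}

/* Hand the block out again, or return NULL if it was modified while free. */
static unsigned char *block_acquire(void)
{
    return block_is_untouched() ? block : NULL;
}
```

A write through a stale pointer after `block_release` changes at least one byte away from 0xa5, so the next `block_acquire` notices it; reads of freed memory show up instead as runs of 0xa5 in the data.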
malloc() and free()d appropriately at the
end.
This is intended to be used with something like Electric
Fence or Purify to help detect memory problems. Note that if
you're using efence then you should also add in ALLOC_DEBUG.
But don't add in ALLOC_DEBUG if you're using Purify because
ALLOC_DEBUG would hide all the uninitialized read errors
that Purify can diagnose.
In particular, it causes the table_{set,add,merge}n
routines to check that their arguments are safe for the
apr_table_t they're being placed in. It currently only works
with the unix multiprocess model, but could be extended to others.
This requires a recent gcc which supports
__builtin_return_address(). The error_log output will be a
message such as:
table_push: apr_table_t created by 0x804d874 hit limit of 10
Use l *0x804d874 (in gdb) to find the source line that
address corresponds to. It indicates that an apr_table_t
allocated by a call at that address has possibly too small an
initial apr_table_t size guess.
This requires a bit of an understanding of how alloc.c works.
Not all the options outlined above can be activated at the same time. The following table gives more information.
| | ALLOC_DEBUG | ALLOC_USE_MALLOC | POOL_DEBUG | MAKE_TABLE_PROFILE | ALLOC_STATS |
|---|---|---|---|---|---|
| ALLOC_DEBUG | - | No | Yes | Yes | Yes |
| ALLOC_USE_MALLOC | No | - | No | No | No |
| POOL_DEBUG | Yes | No | - | Yes | Yes |
| MAKE_TABLE_PROFILE | Yes | No | Yes | - | Yes |
| ALLOC_STATS | Yes | No | Yes | Yes | - |
Additionally, the debugging options are not suitable for multi-threaded versions of the server. When trying to debug with these options, the server should be started in single process mode.
The various options for debugging memory are now enabled in
the apr_general.h header file in APR. The various options are
enabled by uncommenting the define for the option you wish to
use. The section of the code currently looks like this
(contained in srclib/apr/include/apr_pools.h)
/*
#define ALLOC_DEBUG
#define POOL_DEBUG
#define ALLOC_USE_MALLOC
#define MAKE_TABLE_PROFILE
#define ALLOC_STATS
*/
typedef struct ap_pool_t {
union block_hdr *first;
union block_hdr *last;
struct cleanup *cleanups;
struct process_chain *subprocesses;
struct ap_pool_t *sub_pools;
struct ap_pool_t *sub_next;
struct ap_pool_t *sub_prev;
struct ap_pool_t *parent;
char *free_first_avail;
#ifdef ALLOC_USE_MALLOC
void *allocation_list;
#endif
#ifdef POOL_DEBUG
struct ap_pool_t *joined;
#endif
int (*apr_abort)(int retcode);
struct datastruct *prog_data;
} ap_pool_t;
To enable allocation debugging simply move the #define
ALLOC_DEBUG above the start of the comments block and rebuild
the server.
In order to use the various options the server must be rebuilt after editing the header file.

Apache 2.0 uses Doxygen to document the APIs and global variables in the code. This will explain the basics of how to document using Doxygen.
To start a documentation block, use /**
To end a documentation block, use */
In the middle of the block, there are multiple tags we can use:
Description of this function's purpose
@param parameter_name description
@return description
@deffunc signature of the function
The deffunc is not always necessary. DoxyGen does not
have a full parser in it, so any prototype that uses a macro in the
return type declaration is too complex for scandoc. Those functions
require a deffunc. An example:
/**
* return the final element of the pathname
* @param pathname The path to get the final element of
* @return the final element of the path
* @tip Examples:
* <pre>
* "/foo/bar/gum" -> "gum"
* "/foo/bar/gum/" -> ""
* "gum" -> "gum"
* "wi\\n32\\stuff" -> "stuff"
* </pre>
* @deffunc const char * ap_filename_of_pathname(const char *pathname)
*/
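For comparison, here is the same style of documentation block attached to a small, complete function. `final_element` is a hypothetical helper written for this example; it only understands '/' separators and is not the server's ap_filename_of_pathname.

```c
#include <assert.h>
#include <string.h>

/**
 * Return the final element of a '/'-separated pathname.
 * @param pathname The path to get the final element of
 * @return A pointer into pathname at the start of the final element;
 *         an empty string if the path ends in '/'
 */
static const char *final_element(const char *pathname)
{
    const char *slash;

    assert(pathname != NULL);
    slash = strrchr(pathname, '/');
    return slash ? slash + 1 : pathname;
}
```

No deffunc tag is needed here, because the prototype has a plain return type with no macro in it.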
At the top of the header file, always include:
/**
* @package Name of library header
*/
Doxygen uses a new HTML file for each package. The HTML files are named {Name_of_library_header}.html, so try to be concise with your names.
For a further discussion of the possibilities please refer to the Doxygen site.

This is a cut 'n paste job from an email (<022501c1c529$f63a9550$7f00000a@KOJ>) and only reformatted for better readability. It's not up to date but may be a good start for further research.
There are three basic filter types (each of these is actually broken down into two categories, but that comes later).
- CONNECTION (AP_FTYPE_CONNECTION, AP_FTYPE_NETWORK)
- PROTOCOL (AP_FTYPE_PROTOCOL, AP_FTYPE_TRANSCODE)
- RESOURCE: [...] PROTOCOL, but internal redirects and sub-requests can change the content without ending the request. (AP_FTYPE_RESOURCE, AP_FTYPE_CONTENT_SET)
It is important to make the distinction between a protocol and a resource filter. A resource filter is tied to a specific resource; it may also be tied to header information, but the main binding is to a resource. If you are writing a filter and you want to know if it is resource or protocol, the correct question to ask is: "Can this filter be removed if the request is redirected to a different resource?" If the answer is yes, then it is a resource filter. If it is no, then it is most likely a protocol or connection filter. I won't go into connection filters, because they seem to be well understood. With this definition, a few examples might help:
The further breakdown of each category into two more filter types is
strictly for ordering. We could remove it, and only allow for one
filter type, but the order would tend to be wrong, and we would need to
hack things to make it work. Currently, the RESOURCE filters
only have one filter type, but that should change.
This is actually rather simple in theory, but the code is
complex. First of all, it is important that everybody realize that
there are three filter lists for each request, but they are all
concatenated together. So, the first list is
r->output_filters, then r->proto_output_filters,
and finally r->connection->output_filters. These correspond
to the RESOURCE, PROTOCOL, and
CONNECTION filters respectively. Previously, the problem was
that we used a singly linked list to create the filter stack, and we
started from the "correct" location. This means that if I had a
RESOURCE filter on the stack, and I added a
CONNECTION filter, the CONNECTION filter would
be ignored. This should make sense, because we would insert the connection
filter at the top of the c->output_filters list, but the end
of r->output_filters pointed to the filter that used to be
at the front of c->output_filters. This is obviously wrong.
The new insertion code uses a doubly linked list. This has the advantage
that we never lose a filter that has been inserted. Unfortunately, it comes
with a separate set of headaches.
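A small standalone sketch may help picture insertion into a doubly linked, type-ordered chain. It is a simplification made up for this note: the real code keeps three list heads, while the sketch uses one chain, and the `filter` struct, the `FT_*` values, and `filter_insert` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ordering values; lower numbers run earlier. */
enum { FT_RESOURCE = 10, FT_PROTOCOL = 30, FT_CONNECTION = 40 };

typedef struct filter {
    const char *name;
    int ftype;
    struct filter *next;
    struct filter *prev;
} filter;

/* Insert f into the chain at *head, keeping the chain sorted by ftype.
 * A new filter goes in front of existing filters of the same type. */
static void filter_insert(filter **head, filter *f)
{
    filter *cur, *last = NULL;

    assert(head != NULL && f != NULL);
    for (cur = *head; cur != NULL && cur->ftype < f->ftype; cur = cur->next)
        last = cur;

    f->prev = last;
    f->next = cur;
    if (cur != NULL)
        cur->prev = f;
    if (last != NULL)
        last->next = f;
    else
        *head = f;
}
```

Because every node knows both neighbours, a CONNECTION filter added after a RESOURCE filter is linked in behind it instead of being dropped, which is the failure mode of the old singly linked version.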
The problem is that we have two different cases where we use subrequests. The first is to insert more data into a response. The second is to replace the existing response with an internal redirect. These are two different cases and need to be treated as such.
In the first case, we are creating the subrequest from within a handler
or filter. This means that the next filter should be passed to the
make_sub_request function, and the last resource filter in the
sub-request will point to the next filter in the main request. This
makes sense, because the sub-request's data needs to flow through the
same set of filters as the main request. A graphical representation
might help:
Default_handler --> includes_filter --> byterange --> ...
If the includes filter creates a sub request, then we don't want the data from that sub-request to go through the includes filter, because it might not be SSI data. So, the subrequest adds the following:
Default_handler --> includes_filter -/-> byterange --> ...
                                    /
Default_handler --> sub_request_core
What happens if the subrequest is SSI data? Well, that's easy, the
includes_filter is a resource filter, so it will be added to
the sub request in between the Default_handler and the
sub_request_core filter.
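The picture above can be modelled with two singly linked chains that share a tail. This is only a toy model for illustration: `node`, `join_sub_request`, and `chain_contains` are hypothetical names, and the real filter structures carry far more state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct node {
    const char *name;
    struct node *next;
} node;

/* Point the tail of the sub-request's chain at the filter that follows
 * the creating filter in the main request. */
static void join_sub_request(node *sub_tail, node *next_in_main)
{
    assert(sub_tail != NULL);
    sub_tail->next = next_in_main;
}

/* Report whether a filter called `name` is reachable from start. */
static int chain_contains(const node *start, const char *name)
{
    assert(name != NULL);
    for (; start != NULL; start = start->next)
        if (strcmp(start->name, name) == 0)
            return 1;
    return 0;
}
```

Joining sub_request_core to the filter after includes_filter means the sub-request's output reaches byterange and everything behind it, but never passes through includes_filter itself.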
The second case for sub-requests is when one sub-request is going to
become the real request. This happens whenever a sub-request is created
outside of a handler or filter, and NULL is passed as the next filter to
the make_sub_request function.
In this case, the resource filters no longer make sense for the new request, because the resource has changed. So, instead of starting from scratch, we simply point the front of the resource filters for the sub-request to the front of the protocol filters for the old request. This means that we won't lose any of the protocol filters, neither will we try to send this data through a filter that shouldn't see it.
The problem is that we are using a doubly-linked list for our filter stacks now. But, you should notice that it is possible for two lists to intersect in this model. So, how do you handle the previous pointer? This is a very difficult question to answer, because there is no "right" answer; either method is equally valid. I looked at why we use the previous pointer. The only reason for it is to allow for easier addition of new servers. With that being said, the solution I chose was to make the previous pointer always stay on the original request.
This causes some more complex logic, but it works for all cases. My concern in having it move to the sub-request, is that for the more common case (where a sub-request is used to add data to a response), the main filter chain would be wrong. That didn't seem like a good idea to me.
The final topic. :-) Mod_Asis is a bit of a hack, but the
handler needs to remove all filters except for connection filters, and
send the data. If you are using mod_asis, all other
bets are off.
The absolutely last point is that the reason this code was so hard to
get right, was because we had hacked so much to force it to work. I
wrote most of the hacks originally, so I am very much to blame.
However, now that the code is right, I have started to remove some
hacks. Most people should have seen that the reset_filters
and add_required_filters functions are gone. Those inserted
protocol level filters for error conditions, in fact, both functions did
the same thing, one after the other, it was really strange. Because we
don't lose protocol filters for error cases any more, those hacks went away.
The HTTP_HEADER, Content-length, and
Byterange filters are all added in the
insert_filters phase, because if they were added earlier, we
had some interesting interactions. Now, those could all be moved to be
inserted with the HTTP_IN, CORE, and
CORE_IN filters. That would make the code easier to
follow.

This document is still in development and may be partially out of date.
In general, a hook function is one that Apache will call at some point during the processing of a request. Modules can provide functions that are called, and specify when they get called in comparison to other modules.
In order to create a new hook, four things need to be done:
Use the AP_DECLARE_HOOK macro, which needs to be given
the return type of the hook function, the name of the hook, and the
arguments. For example, if the hook returns an int and
takes a request_rec * and an int and is
called do_something, then declare it like this:
AP_DECLARE_HOOK(int, do_something, (request_rec *r, int n))
This should go in a header which modules will include if they want to use the hook.
Each source file that exports a hook has a private structure which is used to record the module functions that use the hook. This is declared as follows:
APR_HOOK_STRUCT(
APR_HOOK_LINK(do_something)
...
)
The source file that exports the hook has to implement a
function that will call the hook. There are currently three
possible ways to do this. In all cases, the calling function is
called ap_run_hookname().
If the return value of a hook is void, then all the
hooks are called, and the caller is implemented like this:
AP_IMPLEMENT_HOOK_VOID(do_something, (request_rec *r, int n), (r, n))
The second and third arguments are the dummy argument declaration and the dummy arguments as they will be used when calling the hook. In other words, this macro expands to something like this:
void ap_run_do_something(request_rec *r, int n)
{
...
do_something(r, n);
}
If the hook returns a value, then it can either be run until the first hook that does something interesting, like so:
AP_IMPLEMENT_HOOK_RUN_FIRST(int, do_something, (request_rec *r, int n), (r, n), DECLINED)
The first hook that does not return DECLINED
stops the loop and its return value is returned from the hook
caller. Note that DECLINED is the traditional Apache
hook return meaning "I didn't do anything", but it can be
whatever suits you.
Alternatively, all hooks can be run until an error occurs. This boils down to permitting two return values, one of which means "I did something, and it was OK" and the other meaning "I did nothing". The first function that returns a value other than one of those two stops the loop, and its return is the return value. Declare these like so:
AP_IMPLEMENT_HOOK_RUN_ALL(int, do_something, (request_rec *r, int n), (r, n), OK, DECLINED)
Again, OK and DECLINED are the traditional
values. You can use what you want.
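The difference between the two value-returning styles is easiest to see with plain function pointers. The sketch below is not what the macros expand to; `run_first`, `run_all`, and the three sample hooks are hypothetical, and the numeric values given to OK and DECLINED are assumptions made so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

#define OK        0    /* assumed value for the sketch */
#define DECLINED -1    /* assumed value for the sketch */

typedef int (*hook_fn)(int n);

/* Run hooks in order until one returns something other than `decline`. */
static int run_first(const hook_fn *hooks, size_t count, int n, int decline)
{
    size_t i;

    assert(hooks != NULL || count == 0);
    for (i = 0; i < count; ++i) {
        int rv = hooks[i](n);
        if (rv != decline)
            return rv;
    }
    return decline;
}

/* Run every hook; stop early only on a value that is neither ok nor decline. */
static int run_all(const hook_fn *hooks, size_t count, int n, int ok, int decline)
{
    size_t i;

    assert(hooks != NULL || count == 0);
    for (i = 0; i < count; ++i) {
        int rv = hooks[i](n);
        if (rv != ok && rv != decline)
            return rv;
    }
    return ok;
}

/* Sample hooks; `calls` counts how many were actually invoked. */
static int calls;
static int hook_decline(int n) { (void)n; ++calls; return DECLINED; }
static int hook_ok(int n)      { (void)n; ++calls; return OK; }
static int hook_error(int n)   { (void)n; ++calls; return 500; }
```

With the hooks registered in the order decline, ok, error, `run_first` stops at the second hook, while `run_all` continues to the third and reports its error value.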
At appropriate moments in the code, call the hook caller, like so:
int n, ret;
request_rec *r;
ret=ap_run_do_something(r, n);
A module that wants a hook to be called needs to do two things.
Include the appropriate header, and define a static function of the correct type:
static int my_something_doer(request_rec *r, int n)
{
...
return OK;
}
During initialisation, Apache will call each module's hook registering function, which is included in the module structure:
static void my_register_hooks()
{
    ap_hook_do_something(my_something_doer, NULL, NULL, HOOK_MIDDLE);
}
module MODULE_VAR_EXPORT my_module =
{
    ...
    my_register_hooks       /* register hooks */
};
In the example above, we didn't use the three arguments in
the hook registration function that control calling order.
There are two mechanisms for doing this. The first, rather
crude, method, allows us to specify roughly where the hook is
run relative to other modules. The final argument controls this.
There are three possible values: HOOK_FIRST,
HOOK_MIDDLE and HOOK_LAST.
All modules using any particular value may be run in any
order relative to each other, but, of course, all modules using
HOOK_FIRST will be run before HOOK_MIDDLE
which are before HOOK_LAST. Modules that don't care
when they are run should use HOOK_MIDDLE. (I spaced
these out so people could do stuff like HOOK_FIRST-2
to get in slightly earlier, but is this wise? - Ben)
Note that there are two more values,
HOOK_REALLY_FIRST and HOOK_REALLY_LAST. These
should only be used by the hook exporter.
The other method allows finer control. When a module knows that it must be run before (or after) some other modules, it can specify them by name. The second (third) argument is a NULL-terminated array of strings consisting of the names of modules that must be run before (after) the current module. For example, suppose we want "mod_xyz.c" and "mod_abc.c" to run before we do, then we'd hook as follows:
static void register_hooks()
{
static const char * const aszPre[] = { "mod_xyz.c", "mod_abc.c", NULL };
ap_hook_do_something(my_something_doer, aszPre, NULL, HOOK_MIDDLE);
}
Note that the sort used to achieve this is stable, so
ordering set by HOOK_ORDER is preserved, as far
as is possible.
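Why stability matters can be shown with a tiny sort over registration records. This sketch covers only the coarse order value, not the predecessor and successor lists; `hook_entry`, `sort_hooks`, and the numeric values given to the HOOK_* names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative order values, spaced out as the text describes. */
enum { HOOK_FIRST = 0, HOOK_MIDDLE = 10, HOOK_LAST = 20 };

typedef struct {
    const char *module;
    int order;
} hook_entry;

/* Insertion sort: stable, so entries with equal order keep their
 * registration order. */
static void sort_hooks(hook_entry *entries, size_t count)
{
    size_t i, j;

    assert(entries != NULL || count == 0);
    for (i = 1; i < count; ++i) {
        hook_entry key = entries[i];
        for (j = i; j > 0 && entries[j - 1].order > key.order; --j)
            entries[j] = entries[j - 1];
        entries[j] = key;
    }
}
```

Two modules that both register with HOOK_MIDDLE come out in the order they registered, and a module using HOOK_FIRST-2 sorts ahead of one using HOOK_FIRST.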
Ben Laurie, 15th August 1999

Many of the documents on these Developer pages are lifted from Apache 1.3's documentation. While they are all being updated to Apache 2.0, they are in different stages of progress. Please be patient, and point out any discrepancies or errors on the developer/ pages directly to the dev@httpd.apache.org mailing list.

This is a first attempt at writing the lessons I learned
when trying to convert the mod_mmap_static module to Apache
2.0. It's by no means definitive and probably won't even be
correct in some ways, but it's a start.
These now need to be of type apr_status_t and return a
value of that type. Normally the return value will be
APR_SUCCESS unless there is some need to signal an error in
the cleanup. Be aware that even though you signal an error not all code
yet checks and acts upon the error.
These should now be renamed to better signify where they sit
in the overall process. So the name gets a small change from
mmap_init to mmap_post_config. The arguments
passed have undergone a radical change and now look like
- apr_pool_t *p
- apr_pool_t *plog
- apr_pool_t *ptemp
- server_rec *s
A lot of the data types have been moved into the APR. This means that some have had a name change, such as the one shown above. The following is a brief list of some of the changes that you are likely to have to make.
- pool becomes apr_pool_t
- table becomes apr_table_t
The new architecture uses a series of hooks to provide for
calling your functions. These you'll need to add to your module
by way of a new function, static void register_hooks(void).
The function is really reasonably straightforward once you
understand what needs to be done. Each function that needs
calling at some stage in the processing of a request needs to
be registered; handlers do not. There are a number of phases
where functions can be added, and for each you can specify with
a high degree of control the relative order that the function
will be called in.
This is the code that was added to mod_mmap_static:
static void register_hooks(void)
{
    static const char * const aszPre[]={ "http_core.c",NULL };
    ap_hook_post_config(mmap_post_config,NULL,NULL,HOOK_MIDDLE);
    ap_hook_translate_name(mmap_static_xlat,aszPre,NULL,HOOK_LAST);
}
This registers 2 functions that need to be called, one in
the post_config stage (virtually every module will need this
one) and one for the translate_name phase. Note that while
there are different function names the format of each is
identical. So what is the format?
ap_hook_phase_name(function_name,
predecessors, successors, position);
There are 3 hook positions defined...
- HOOK_FIRST
- HOOK_MIDDLE
- HOOK_LAST
To define the position you use the position and then modify it with the predecessors and successors. Each of the modifiers can be a list of functions that should be called, either before the function is run (predecessors) or after the function has run (successors).
In the mod_mmap_static case I didn't care about the
post_config stage, but the mmap_static_xlat
must be called after the core module had done its name
translation, hence the use of the aszPre to define a modifier to the
position HOOK_LAST.
There are now a lot fewer stages to worry about when creating your module definition. The old definition looked like
module MODULE_VAR_EXPORT module_name_module =
{
STANDARD_MODULE_STUFF,
/* initializer */
/* dir config creater */
/* dir merger --- default is to override */
/* server config */
/* merge server config */
/* command handlers */
/* handlers */
/* filename translation */
/* check_user_id */
/* check auth */
/* check access */
/* type_checker */
/* fixups */
/* logger */
/* header parser */
/* child_init */
/* child_exit */
/* post read-request */
};
The new structure is a great deal simpler...
module MODULE_VAR_EXPORT module_name_module =
{
STANDARD20_MODULE_STUFF,
/* create per-directory config structures */
/* merge per-directory config structures */
/* create per-server config structures */
/* merge per-server config structures */
/* command handlers */
/* handlers */
/* register hooks */
};
Some of these read directly across, some don't. I'll try to summarise what should be done below.
The stages that read directly across:
- /* dir config creater */ becomes /* create per-directory config structures */
- /* server config */ becomes /* create per-server config structures */
- /* dir merger */ becomes /* merge per-directory config structures */
- /* merge server config */ becomes /* merge per-server config structures */
- /* command table */ becomes /* command apr_table_t */
- /* handlers */ becomes /* handlers */
The remainder of the old functions should be registered as hooks. There are the following hook stages defined so far...
- ap_hook_post_config: [...] _init routines get registered
- ap_hook_http_method
- ap_hook_open_logs
- ap_hook_auth_checker
- ap_hook_access_checker
- ap_hook_check_user_id
- ap_hook_default_port
- ap_hook_pre_connection
- ap_hook_process_connection
- ap_hook_child_init
- ap_hook_create_request
- ap_hook_fixups
- ap_hook_handler
- ap_hook_header_parser: [...] post_read_request for this
- ap_hook_insert_filter
- ap_hook_log_transaction
- ap_hook_optional_fn_retrieve
- ap_hook_post_read_request
- ap_hook_quick_handler
- ap_hook_translate_name
- ap_hook_type_checker

Warning - this is a first (fast) draft that needs further revision!
Several changes in Apache 2.0 affect the internal request processing mechanics. Module authors need to be aware of these changes so they may take advantage of the optimizations and security enhancements.
The first major change is to the subrequest and redirect
mechanisms. There were a number of different code paths in
Apache 1.3 to attempt to optimize subrequest or redirect
behavior. As patches were introduced to 2.0, these
optimizations (and the server behavior) were quickly broken due
to this duplication of code. All duplicate code has been folded
back into ap_process_request_internal() to prevent
the code from falling out of sync again.
This means that much of the existing code was 'unoptimized'. It is the Apache HTTP Project's first goal to create a robust and correct implementation of the HTTP server RFC. Additional goals include security, scalability and optimization. New methods were sought to optimize the server (beyond the performance of Apache 1.3) without introducing fragile or insecure code.
All requests pass through ap_process_request_internal()
in request.c, including subrequests and redirects. If a module
doesn't pass generated requests through this code, the author is cautioned
that the module may be broken by future changes to request
processing.
To streamline requests, the module author can take advantage of the hooks offered to drop out of the request cycle early, or to bypass core Apache hooks which are irrelevant (and costly in terms of CPU).
The request's parsed_uri path is unescaped, once and only
once, at the beginning of internal request processing.
This step is bypassed if the proxyreq flag is set, or the
parsed_uri.path element is unset. The module has no further
control of this one-time unescape operation; either failing to
unescape or unescaping the URL more than once leads to security
repercussions.
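The danger of unescaping twice is easy to demonstrate with a toy percent-decoder. This is a hedged sketch written for this note: `unescape_once` and `hex_value` are hypothetical helpers, and the code makes no attempt to reproduce whatever additional validation the server's own unescape routine performs.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of a hex digit; the caller guarantees c is a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Decode %XX escapes in place, exactly once. Returns 0 on success,
 * -1 on a malformed escape (the buffer is then left unterminated at
 * the write position and should be discarded). */
static int unescape_once(char *s)
{
    char *out = s;

    assert(s != NULL);
    while (*s != '\0') {
        if (*s == '%') {
            if (!isxdigit((unsigned char)s[1]) || !isxdigit((unsigned char)s[2]))
                return -1;
            *out++ = (char)(hex_value(s[1]) * 16 + hex_value(s[2]));
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
    return 0;
}
```

Given the input /a%252e%252e/b, one pass yields /a%2e%2e/b, which contains no dot segments. A second pass turns that into /a../b style dots that any earlier path check never saw, which is why the unescape must happen exactly once.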
All /../ and /./ elements are
removed by ap_getparents(). This helps to ensure
the path is (nearly) absolute before the request processing
continues.
This step cannot be bypassed.
Every request is subject to an
ap_location_walk() call. This ensures that
<Location> sections
are consistently enforced for all requests. If the request is an internal
redirect or a sub-request, it may borrow some or all of the processing
from the previous or parent request's ap_location_walk, so this step
is generally very efficient after processing the main request.
Modules can determine the file name, or alter the given URI
in this step. For example, mod_vhost_alias will
translate the URI's path into the configured virtual host,
mod_alias will translate the path to an alias path,
and if the request falls back on the core, the DocumentRoot is prepended to the request resource.
If all modules DECLINE this phase, an error 500 is
returned to the browser, and a "couldn't translate name" error is logged
automatically.
After the file or correct URI was determined, the
appropriate per-dir configurations are merged together. For
example, mod_proxy compares and merges the appropriate
<Proxy> sections.
If the URI is nothing more than a local (non-proxy) TRACE
request, the core handles the request and returns DONE.
If no module answers this hook with OK or DONE,
the core will run the request filename against the <Directory> and <Files> sections. If the request
'filename' isn't an absolute, legal filename, a note is set for
later termination.
Every request is hardened by a second
ap_location_walk() call. This ensures that a
translated request is still subjected to the configured
<Location> sections.
The request again borrows some or all of the processing from its previous
location_walk above, so this step is almost always very
efficient unless the translated URI mapped to a substantially different
path or Virtual Host.
The main request then parses the client's headers. This prepares the remaining request processing steps to better serve the client's request.
Needs Documentation. Code is:
switch (ap_satisfies(r)) {
case SATISFY_ALL:
case SATISFY_NOSPEC:
if ((access_status = ap_run_access_checker(r)) != 0) {
return decl_die(access_status, "check access", r);
}
if (ap_some_auth_required(r)) {
if (((access_status = ap_run_check_user_id(r)) != 0)
|| !ap_auth_type(r)) {
return decl_die(access_status, ap_auth_type(r)
? "check user. No user file?"
: "perform authentication. AuthType not set!",
r);
}
if (((access_status = ap_run_auth_checker(r)) != 0)
|| !ap_auth_type(r)) {
return decl_die(access_status, ap_auth_type(r)
? "check access. No groups file?"
: "perform authentication. AuthType not set!",
r);
}
}
break;
case SATISFY_ANY:
if (((access_status = ap_run_access_checker(r)) != 0)) {
if (!ap_some_auth_required(r)) {
return decl_die(access_status, "check access", r);
}
if (((access_status = ap_run_check_user_id(r)) != 0)
|| !ap_auth_type(r)) {
return decl_die(access_status, ap_auth_type(r)
? "check user. No user file?"
: "perform authentication. AuthType not set!",
r);
}
if (((access_status = ap_run_auth_checker(r)) != 0)
|| !ap_auth_type(r)) {
return decl_die(access_status, ap_auth_type(r)
? "check access. No groups file?"
: "perform authentication. AuthType not set!",
r);
}
}
break;
}
The modules have an opportunity to test the URI or filename
against the target resource, and set mime information for the
request. Both mod_mime and
mod_mime_magic use this phase to compare the file
name or contents against the administrator's configuration and set the
content type, language, character set and request handler. Some modules
may set up their filters or other request handling parameters at this
time.
If all modules DECLINE this phase, an error 500 is
returned to the browser, and a "couldn't find types" error is logged
automatically.
Many modules are 'trounced' by some phase above. The fixups phase is used by modules to 'reassert' their ownership or force the request's fields to their appropriate values. It isn't always the cleanest mechanism, but occasionally it's the only option.
This phase is not part of the processing in
ap_process_request_internal(). Many
modules prepare one or more subrequests prior to creating any
content at all. After the core, or a module calls
ap_process_request_internal() it then calls
ap_invoke_handler() to generate the request.
Modules that transform the content in some way can insert their values and override existing filters, such that if the user configured a more advanced filter out-of-order, then the module can move its order as need be. There is no result code, so actions in this hook had better be trusted to always succeed.
The module finally has a chance to serve the request in its
handler hook. Note that not every prepared request is sent to
the handler hook. Many modules, such as mod_autoindex,
will create subrequests for a given URI, and then never serve the
subrequest, but simply list it for the user. Remember not to
put required teardown from the hooks above into this module,
but register pool cleanups against the request pool to free
resources as required.
Apache HTTP Server Version 2.0

When using any of the threaded MPMs in Apache 2.0 it is important that every function called from Apache be thread safe. When linking in 3rd party extensions it can be difficult to determine whether the resulting server will be thread safe. Casual testing generally won't tell you this either, as thread safety problems can lead to subtle race conditions that may only show up in certain conditions under heavy load.
When writing your module or when trying to determine if a module or 3rd party library is thread safe there are some common things to keep in mind.
First, you need to recognize that in a threaded model each individual thread has its own program counter, stack and registers. Local variables live on the stack, so those are fine. You need to watch out for any static or global variables. This doesn't mean that you are absolutely not allowed to use static or global variables. There are times when you actually want something to affect all threads, but generally you need to avoid using them if you want your code to be thread safe.
In the case where you have a global variable that needs to be global and accessed by all threads, be very careful when you update it. If, for example, it is an incrementing counter, you need to atomically increment it to avoid race conditions with other threads. You do this using a mutex (mutual exclusion). Lock the mutex, read the current value, increment it and write it back and then unlock the mutex. Any other thread that wants to modify the value has to first check the mutex and block until it is cleared.
If you are using APR, have a look
at the apr_atomic_* functions and the
apr_thread_mutex_* functions.
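The lock, read, increment, write back, unlock sequence described above can be sketched as follows. This is only a minimal illustration that uses POSIX threads instead of APR (an assumption made to keep the example self-contained); `hit_counter`, `counter_increment()` and `counter_read()` are hypothetical names and are not part of Apache or APR.

```c
#include <pthread.h>

/* Hypothetical shared counter: stands in for a global that every
 * worker thread in the server would update. */
static long hit_counter = 0;
static pthread_mutex_t hit_counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Increment the counter under the mutex and return the new value. */
long counter_increment(void)
{
    long updated;

    pthread_mutex_lock(&hit_counter_lock);   /* block until we own the lock */
    updated = hit_counter + 1;               /* read the current value */
    hit_counter = updated;                   /* write the new value back */
    pthread_mutex_unlock(&hit_counter_lock); /* let the next thread in */
    return updated;
}

/* Read the counter under the same mutex so the value is consistent. */
long counter_read(void)
{
    long snapshot;

    pthread_mutex_lock(&hit_counter_lock);
    snapshot = hit_counter;
    pthread_mutex_unlock(&hit_counter_lock);
    return snapshot;
}
```

Each worker thread would call `counter_increment()` instead of touching `hit_counter` directly. For a plain counter, an atomic increment would likely be cheaper than a mutex; the mutex form is shown because it matches the steps in the text.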
errno is a common global variable that holds the error number of the
last error that occurred. If one thread calls a low-level function that
sets errno and then another thread checks it, error numbers bleed
from one thread into another. To solve this, make sure your
module or library defines _REENTRANT or is compiled with
-D_REENTRANT. This will make errno a per-thread variable
and should be transparent to the code. It does this by doing
something like this:
#define errno (*(__errno_location()))
which means that accessing errno will call
__errno_location() which is provided by the libc. Setting
_REENTRANT also forces redefinition of some other functions
to their *_r equivalents and sometimes changes
the common getc/putc macros into safer function
calls. Check your libc documentation for specifics. Instead of, or in
addition to _REENTRANT the symbols that may affect this are
_POSIX_C_SOURCE, _THREAD_SAFE,
_SVID_SOURCE, and _BSD_SOURCE.
Not only do things have to be thread safe, but they also have to be
reentrant. strtok() is an obvious one. You call it the first
time with your delimiter which it then remembers and on each subsequent
call it returns the next token. Obviously if multiple threads are
calling it you will have a problem. Most systems have a reentrant version
of the function called strtok_r() where you pass in an
extra argument which contains an allocated char * which the
function will use instead of its own static storage for maintaining
the tokenizing state. If you are using APR you can use apr_strtok().
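As a hedged illustration of why the extra state argument matters, the sketch below runs two tokenizers at the same time, one nested inside the other, which strtok() cannot do because it has only one hidden state. The function name `count_fields()` and the `key=value;key=value` input format are assumptions made for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* ask libc to declare strtok_r() */
#endif
#include <stdlib.h>
#include <string.h>

/* Count the fields in a string such as "a=1;b=2": records are split
 * on ';' and each record is split again on '='.
 * Returns the number of fields, or -1 if memory cannot be allocated. */
int count_fields(const char *input)
{
    char *outer_state;  /* tokenizer state for the ';' pass */
    char *inner_state;  /* independent state for the '=' pass */
    char *record;
    char *field;
    int fields = 0;

    /* strtok_r() writes into its argument, so work on a private copy. */
    char *copy = malloc(strlen(input) + 1);
    if (copy == NULL) {
        return -1;
    }
    strcpy(copy, input);

    for (record = strtok_r(copy, ";", &outer_state);
         record != NULL;
         record = strtok_r(NULL, ";", &outer_state)) {
        for (field = strtok_r(record, "=", &inner_state);
             field != NULL;
             field = strtok_r(NULL, "=", &inner_state)) {
            fields++;
        }
    }

    free(copy);
    return fields;
}
```

Because each pass keeps its state in a caller-owned variable, the same function can also be called from several threads at once without the calls interfering with each other.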
crypt() is another function that tends to not be reentrant,
so if you run across calls to that function in a library, watch out. On
some systems it is reentrant though, so it is not always a problem. If
your system has crypt_r() chances are you should be using
that, or if possible simply avoid the whole mess by using MD5 instead.
The following is a list of common libraries that are used by 3rd party
Apache modules. You can check to see if your module is using a potentially
unsafe library by using tools such as ldd(1) and
nm(1). For PHP, for example,
try this:
% ldd libphp4.so
libsablot.so.0 => /usr/local/lib/libsablot.so.0 (0x401f6000)
libexpat.so.0 => /usr/lib/libexpat.so.0 (0x402da000)
libsnmp.so.0 => /usr/lib/libsnmp.so.0 (0x402f9000)
libpdf.so.1 => /usr/local/lib/libpdf.so.1 (0x40353000)
libz.so.1 => /usr/lib/libz.so.1 (0x403e2000)
libpng.so.2 => /usr/lib/libpng.so.2 (0x403f0000)
libmysqlclient.so.11 => /usr/lib/libmysqlclient.so.11 (0x40411000)
libming.so => /usr/lib/libming.so (0x40449000)
libm.so.6 => /lib/libm.so.6 (0x40487000)
libfreetype.so.6 => /usr/lib/libfreetype.so.6 (0x404a8000)
libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0x404e7000)
libcrypt.so.1 => /lib/libcrypt.so.1 (0x40505000)
libssl.so.2 => /lib/libssl.so.2 (0x40532000)
libcrypto.so.2 => /lib/libcrypto.so.2 (0x40560000)
libresolv.so.2 => /lib/libresolv.so.2 (0x40624000)
libdl.so.2 => /lib/libdl.so.2 (0x40634000)
libnsl.so.1 => /lib/libnsl.so.1 (0x40637000)
libc.so.6 => /lib/libc.so.6 (0x4064b000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x80000000)
In addition to these libraries you will need to have a look at any
libraries linked statically into the module. You can use nm(1)
to look for individual symbols in the module.
Please drop a note to dev@httpd.apache.org if you have additions or corrections to this list.
| Library | Version | Thread Safe? | Notes |
|---|---|---|---|
| ASpell/PSpell | | ? | |
| Berkeley DB | 3.x, 4.x | Yes | Be careful about sharing a connection across threads. |
| bzip2 | | Yes | Both low-level and high-level APIs are thread-safe. However, high-level API requires thread-safe access to errno. |
| cdb | | ? | |
| C-Client | | Perhaps | c-client uses strtok() and gethostbyname() which are not thread-safe on most C library implementations. c-client's static data is meant to be shared across threads. If strtok() and gethostbyname() are thread-safe on your OS, c-client may be thread-safe. |
| cpdflib | | ? | |
| libcrypt | | ? | |
| Expat | | Yes | Need a separate parser instance per thread |
| FreeTDS | | ? | |
| FreeType | | ? | |
| GD 1.8.x | | ? | |
| GD 2.0.x | | ? | |
| gdbm | | No | Errors returned via a static gdbm_error variable |
| ImageMagick | 5.2.2 | Yes | ImageMagick docs claim it is thread safe since version 5.2.2 (see Change log). |
| Imlib2 | | ? | |
| libjpeg | v6b | ? | |
| libmysqlclient | | Yes | Use mysqlclient_r library variant to ensure thread-safety. For more information, please read http://www.mysql.com/doc/en/Threaded_clients.html. |
| Ming | 0.2a | ? | |
| Net-SNMP | 5.0.x | ? | |
| OpenLDAP | 2.1.x | Yes | Use ldap_r library variant to ensure thread-safety. |
| OpenSSL | 0.9.6g | Yes | Requires proper usage of CRYPTO_num_locks, CRYPTO_set_locking_callback, CRYPTO_set_id_callback |
| liboci8 (Oracle 8+) | 8.x, 9.x | ? | |
| pdflib | 5.0.x | Yes | PDFLib docs claim it is thread safe; changes.txt indicates it has been partially thread-safe since V1.91: http://www.pdflib.com/products/pdflib/index.html. |
| libpng | 1.0.x | ? | |
| libpng | 1.2.x | ? | |
| libpq (PostgreSQL) | 7.x | Yes | Don't share connections across threads and watch out for crypt() calls |
| Sablotron | 0.95 | ? | |
| zlib | 1.1.4 | Yes | Relies upon thread-safe zalloc and zfree functions. Default is to use libc's calloc/free, which are thread-safe. |

This document can be summarized in one sentence: do not configure Apache in such a way that it relies on DNS resolution to parse the configuration files. If Apache needs DNS resolution to parse the configuration files, your server may have reliability problems (for example, it might not start), or may be exposed to denial of service and theft of service attacks (including one site being able to "steal" requests meant for another site).
<VirtualHost www.abc.dom>
ServerAdmin webgirl@abc.dom
DocumentRoot /www/abc
</VirtualHost>
For Apache to function properly, it absolutely needs two pieces of information about each virtual host: the value of the ServerName directive and at least one IP address on which the server will listen and respond to requests. The example above does not include the IP address, so Apache must use a DNS lookup to find the address of www.abc.dom. If for some reason DNS is not available when your server is parsing its configuration file, this virtual host will not be configured and will not be able to respond to any requests made to it (in Apache versions prior to 1.2 the server would not even start).
Suppose that www.abc.dom has the IP address 10.0.0.1. Consider the following configuration:
<VirtualHost 10.0.0.1>
ServerAdmin webgirl@abc.dom
DocumentRoot /www/abc
</VirtualHost>
Now Apache needs to do a reverse DNS lookup to find the ServerName for this virtual host. If that reverse lookup fails, the virtual host will be partially disabled (in Apache versions prior to 1.2 the server would not even start). If the virtual host is name-based, it will be completely disabled, but if it is IP-based it will work for most purposes. However, if Apache ever has to generate a full URL that includes the server name, it will not be able to generate a valid URL.
Here is a configuration that avoids both problems:
<VirtualHost 10.0.0.1>
ServerName www.abc.dom
ServerAdmin webgirl@abc.dom
DocumentRoot /www/abc
</VirtualHost>
There are (at least) two ways in which a denial of service can occur. If you are running a version of Apache prior to 1.2, your server will not start if one of the two DNS lookups mentioned above fails for any of your virtual hosts. In some cases these DNS lookups may not even be under your control; for example, if abc.dom is one of your customers and they control their own DNS, they can force your (pre-1.2) server to fail at startup simply by deleting the www.abc.dom record.
Other forms can be considerably more subtle. Consider this configuration:
<VirtualHost www.abc.dom>
ServerAdmin webgirl@abc.dom
DocumentRoot /www/abc
</VirtualHost>
<VirtualHost www.def.com>
ServerAdmin webguy@def.com
DocumentRoot /www/def
</VirtualHost>
Suppose that you have assigned the address 10.0.0.1 to www.abc.dom and 10.0.0.2 to www.def.com. Furthermore, suppose that def.com controls its own DNS. With this configuration you have put def.com in a position where it can steal all traffic destined for abc.dom. To do so, all they have to do is point www.def.com at 10.0.0.1. Since they control their own DNS, you cannot stop them from pointing the www.def.com record wherever they wish.
Requests sent to 10.0.0.1 (including those where users type URLs of the form http://www.abc.dom/whatever) will all be served by the def.com virtual host. Understanding why this happens requires a more in-depth discussion of how Apache matches incoming requests to the virtual hosts that will serve them. A separate document covers this topic.
Because Apache has supported name-based virtual hosting since version 1.1, the server needs to know the IP address (or addresses) of the host that httpd is running on. To get this address it uses either the global ServerName directive (if present) or calls the C function gethostname (which should return the same result as typing "hostname" at the command line). A DNS lookup is then performed on that name. At present there is no way to avoid this lookup.
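The two steps described above (ask the system for the host name, then resolve it) can be approximated in a few lines of C. This is a sketch under stated assumptions, not Apache's actual startup code: `local_hostname()` and `resolve_name()` are hypothetical helpers, and getaddrinfo() is used here as a modern stand-in for whatever resolver call the server makes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* ask libc to declare gethostname() and getaddrinfo() */
#endif
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the local host name into buf.
 * Returns 0 on success and -1 on failure. */
int local_hostname(char *buf, size_t len)
{
    if (len == 0 || gethostname(buf, len) != 0) {
        return -1;
    }
    buf[len - 1] = '\0';  /* gethostname() may not terminate a truncated name */
    return 0;
}

/* Hypothetical helper: look the name up through the system resolver.
 * Returns 0 if at least one address was found, otherwise the non-zero
 * getaddrinfo() error code. This is the step that can fail or stall
 * when DNS is unavailable. */
int resolve_name(const char *name)
{
    struct addrinfo hints;
    struct addrinfo *results = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(name, NULL, &hints, &results);
    if (rc == 0) {
        freeaddrinfo(results);
    }
    return rc;
}
```

Calling `resolve_name()` on the result of `local_hostname()` on the server machine is a quick way to check whether this lookup would succeed before starting httpd.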
If you fear that this lookup might fail because your DNS server is down, you can insert the host name in /etc/hosts (where you probably already have it so that the machine can boot properly). Then make sure that your machine is configured to use /etc/hosts in the event that the DNS lookup fails. Depending on the operating system you use, this might be done by editing /etc/resolv.conf, or perhaps /etc/nsswitch.conf.
If your server does not have to perform DNS lookups for any other reason, consider running Apache with the HOSTRESORDER environment variable set to "local". This all depends on the operating system and the resolver libraries you are using. It also affects CGIs unless you use mod_env to control the environment. Please consult the man pages or the FAQ for your operating system.
Use IP addresses in VirtualHost.
Use IP addresses in Listen.
Make sure that all virtual hosts have an explicit ServerName.
Create a <VirtualHost _default_:*> server that has no pages to serve.
The current situation regarding DNS lookups is far from desirable. In Apache 1.2 an attempt was made to have the server at least start even if the DNS lookup fails, but that may not be the best solution. In any event, requiring explicit IP addresses in configuration files is far from desirable in today's Internet, where renumbering is a necessity.
A possible workaround for the theft of service attack described above would be to perform a reverse DNS lookup on the IP address returned by the forward lookup and compare the two names; in the event of a mismatch, the virtual host would be disabled. This would require reverse DNS to be configured properly (a task that most system administrators are familiar with).
In any event, it does not seem possible to reliably start a virtually hosted web server when DNS has failed unless IP addresses are used. Partial solutions such as disabling portions of the configuration might be even worse than not starting the server at all, depending on what the web server is expected to do.
As HTTP/1.1 is deployed and browsers and proxy servers start to send the Host header, it will become possible to avoid IP-based virtual hosting entirely. In that case, a web server has no need to do DNS lookups during configuration. However, as of March 1997 these features had not been deployed widely enough to be used on web servers performing critical tasks.

The Apache HTTP Server is a modular program in which the administrator can choose which functionality to include by selecting a set of modules. Modules can be statically compiled into the httpd binary. Alternatively, modules can be compiled as Dynamic Shared Objects (DSOs) that exist separately from the httpd binary file. Modules that are to be used as DSOs can be compiled at the same time as the server, or they can be compiled and added later using the Apache Extension Tool (apxs).
This document describes how to use DSO modules as well as the theory behind how they work.
| Related Modules | Related Directives |
|---|---|
Loading individual Apache modules as Dynamic Shared Objects (DSOs) is made possible by a module named mod_so, which must be statically compiled into the Apache core. It is the only module besides core that cannot itself be used as a DSO. Practically all other modules distributed with Apache can be used as DSOs by individually enabling the DSO build for them with the configure option --enable-module=shared, as explained in the installation documentation. Once a module has been compiled as a DSO with a name such as mod_foo.so, you can load it at server startup or restart with mod_so's LoadModule command in the httpd.conf file.
To simplify the creation of DSO files for Apache modules (especially third-party modules), a new support program named apxs (APache eXtenSion) is available. You can use it to build modules as DSOs without having to build them at the same time as your Apache server. The idea is simple: when Apache is installed, the make install procedure of configure installs the Apache C header files and puts the platform-dependent compiler and linker flags for building DSO files into the apxs program. This way the user can use apxs to compile Apache module sources independently, without having to worry about the platform-dependent compiler and linker flags for DSO support.
To give you an overview of the DSO features of Apache 2.0, here is a short and concise summary:
Build and install a distributed Apache module, say mod_foo.c, as a DSO named mod_foo.so:
$ ./configure --prefix=/path/to/install --enable-foo=shared
$ make install
Build and install a third-party Apache module, say mod_foo.c, as a DSO named mod_foo.so:
$ ./configure --add-module=module_type:/path/to/3rdparty/mod_foo.c --enable-foo=shared
$ make install
Configure Apache for later installation of shared modules:
$ ./configure --enable-so
$ make install
Build and install a third-party Apache module, say mod_foo.c, as a DSO named mod_foo.so outside of the Apache source tree using apxs:
$ cd /path/to/3rdparty
$ apxs -c mod_foo.c
$ apxs -i -a -n foo mod_foo.la
In all cases, once the DSO has been compiled, you must use a LoadModule directive in httpd.conf to activate the module.
On modern Unix derivatives there is a particularly useful mechanism usually called dynamic linking/loading of Dynamic Shared Objects (DSO). It provides a way to build pieces of program code in a special format so that they can be loaded at run time into the address space of an executable program.
This loading can be done in two ways: automatically, by a system program called ld.so when an executable program is started, or manually, from within the running program through a programmatic interface to the Unix loader, the system calls dlopen()/dlsym().
In the first way, the DSOs are usually called shared libraries or DSO libraries and are named libfoo.so or libfoo.so.1.2. They reside in a system directory (usually /usr/lib) and the link to the executable program is established at build time by specifying -lfoo to the linker command. This hard-codes the library references into the executable program so that at start time the Unix loader is able to locate libfoo.so in /usr/lib, in paths hard-coded via linker options such as -R, or in paths configured via the environment variable LD_LIBRARY_PATH. It then resolves any (yet unresolved) symbols in the executable program that are available in the DSO.
Symbols in the executable program are usually not referenced by the DSO (because it is a reusable library of general-purpose code), so no further resolving has to be done. The executable program does not have to do anything on its own to use the symbols from the DSO, because all of the resolving is done by the Unix loader. (In fact, the code that invokes ld.so is part of the run-time startup code that is linked into every executable program that has been built non-statically.) The advantage of dynamically loading common library code is obvious: the library code needs to be stored only once, in a system library such as libc.so, which saves disk space.
In the second way, the DSOs are usually called shared objects or DSO files and can be named with an arbitrary extension (although the canonical name is foo.so). These files usually stay inside a program-specific directory, and no link to the executable programs that use them is established automatically. Instead, the executable program manually loads the DSO at run time into its address space with dlopen(). At this point no symbols from the DSO are resolved for the executable program. Instead, the Unix loader automatically resolves any (yet unresolved) symbols in the DSO from the set of symbols exported by the executable program and by the DSO libraries it has already loaded (especially all symbols from the ubiquitous libc.so). This way the DSO gets to know the executable program's symbol set as if it had been statically linked with it in the first place.
Finally, to take advantage of the DSO's API, the executable program has to resolve particular symbols from the DSO with dlsym() for later use inside dispatch tables and the like. In other words, the executable program has to manually resolve every symbol it needs before it can use it. The advantage of this mechanism is that optional parts of the program need not be loaded (and thus do not consume memory) until the program in question needs them. When required, these parts can be loaded dynamically to extend the base functionality of the program.
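The manual path described above (open a handle with dlopen(), then resolve each symbol by name with dlsym() before calling it) can be sketched in C as follows. To stay runnable without building a separate shared object, this example passes NULL to dlopen(), which returns a handle for the running program and the libraries already loaded with it; a real module loader would pass a file name such as mod_foo.so instead. The helper name `call_int_parser()` is hypothetical. On some older systems the program must be linked with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Signature we expect the looked-up symbol to have. */
typedef int (*parse_int_fn)(const char *);

/* Hypothetical helper: resolve `symbol` at run time and call it with
 * `arg`. Returns `fallback` if the handle or the symbol is unavailable. */
int call_int_parser(const char *symbol, const char *arg, int fallback)
{
    parse_int_fn fn = NULL;
    int result = fallback;

    /* NULL requests a handle for the main program and its loaded
     * libraries, so no separate .so file is needed for this sketch. */
    void *handle = dlopen(NULL, RTLD_NOW);
    if (handle == NULL) {
        return fallback;
    }

    /* dlsym() returns void *; this is the idiom POSIX suggests for
     * storing that result in a function pointer. */
    *(void **)(&fn) = dlsym(handle, symbol);
    if (fn != NULL) {
        result = fn(arg);  /* call through the resolved pointer */
    }

    dlclose(handle);
    return result;
}
```

For example, `call_int_parser("atoi", "42", -1)` looks up the C library's atoi() by name and calls it, in the same way that a loader would look up a module's entry point and store it in a dispatch table.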
Although this DSO mechanism sounds straightforward, there is at least one difficult step: resolving the symbols of the executable program for the DSO when a DSO is used to extend a program (the second way). Why? Because "reverse resolving" DSO symbols from the executable program's symbol set goes against the library design (where the library has no knowledge of the programs that use it), and it is neither available on all platforms nor standardized. In practice the executable program's global symbols are often not re-exported and thus are not available for use in a DSO. The main problem to solve when using DSOs to extend a program at run time is finding a way to force the linker to export all global symbols.
The shared library approach is the typical one, because it is what the DSO mechanism was designed for, and so it is used for nearly all types of libraries that the operating system provides. On the other hand, not many programs use shared objects to extend their functionality.
As of 1998, only a few software packages used the DSO mechanism to extend their functionality at run time: Perl 5 (via its XS mechanism and the DynaLoader module), Netscape Server, and so on. Starting with version 1.3, Apache joined this group. Apache already uses a module concept to extend its functionality and internally uses a dispatch-list-based approach to link external modules into the server's core functionality, so it can use the DSO mechanism to load its modules at run time.
The DSO-based features described above have the following advantages:
The server package is more flexible at run time, because the server process can be assembled with LoadModule directives in httpd.conf instead of with configure options at build time. For example, this makes it possible to run different server instances (standard and SSL, minimal and fully loaded [mod_perl, PHP3], etc.) from a single Apache installation.
With DSO and apxs you can work outside the Apache source tree, and you only need the apxs -i command followed by apachectl restart to try the new version of the module you are developing.
DSO has the following disadvantages:
Because DSO modules cannot be linked against other DSO-based libraries (ld -lfoo) on all platforms (for example, a.out-based platforms usually do not provide this functionality while ELF-based platforms do), the DSO mechanism cannot be used for all types of modules. In other words, modules compiled as DSO files can only use symbols from the Apache core, from the C library (libc) and all other dynamic or static libraries used by the Apache core, or from static library archives (libfoo.a) that contain position-independent code. The only ways to use other code are to make sure the Apache core already contains a reference to it, or to load the code yourself with dlopen().

The Apache HTTP Server provides a mechanism for storing information in named variables that are called environment variables. This information can be used to control various operations such as logging or access control. Environment variables are also used as a mechanism to communicate with external programs such as CGI scripts. This document explains the different ways to use and manipulate these variables.
Although these variables are called environment variables, they are not the same as the environment variables controlled by the operating system of the machine Apache is running on. Apache's environment variables are stored and manipulated in an internal Apache structure. They only become real operating system environment variables when they are passed to CGI scripts or Server Side Include scripts. If you want to manipulate the operating system environment under which Apache itself runs, you must use the standard mechanisms provided by your operating system.
| Related Modules | Related Directives |
|---|---|
The most basic way to set an environment variable in Apache is the unconditional SetEnv directive. Variables can also be passed from the shell that started Apache using the PassEnv directive.
For more flexibility, the directives provided by mod_setenvif allow environment variables to be set conditionally, based on characteristics of the request being processed. For example, a variable can be set only when the request comes from a specific browser, or only when the request contains certain information in its headers. Even more flexibility is available through mod_rewrite's RewriteRule directive, which has the [E=...] option for setting environment variables.
Finally, mod_unique_id sets the environment variable UNIQUE_ID for each request. This value is guaranteed to be unique across all requests under very specific conditions.
In addition to all environment variables set in the Apache configuration and passed from the shell, CGI scripts and SSI pages are provided with a set of environment variables containing meta-information about the request, as required by the CGI specification.
When suexec is used to launch CGI scripts, the environment is cleaned down to a set of safe variables before the scripts are launched. The list of safe variables is defined at compile time in suexec.c.
| Related Modules | Related Directives |
|---|---|
One of the primary uses of environment variables is to pass information to CGI scripts. As explained above, the environment passed to CGI scripts includes standard meta-information about the request in addition to any variables set in the Apache configuration. For more details, see the CGI tutorial.
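From the point of view of a CGI program, these values are ordinary process environment variables. The sketch below shows one way a CGI program written in C might read them. REQUEST_METHOD is one of the standard CGI variables; SITE_NAME is a hypothetical variable assumed to have been set with SetEnv, and the helper names are likewise illustrative.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return the value of an environment variable,
 * or `fallback` if the variable is unset or empty. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return (value != NULL && value[0] != '\0') ? value : fallback;
}

/* Write a one-line description of the request into `out`.
 * Returns what snprintf() returns: the length of the full line,
 * even if `out` was too small to hold all of it. */
int describe_request(char *out, size_t len)
{
    return snprintf(out, len, "method=%s site=%s",
                    env_or_default("REQUEST_METHOD", "unknown"),
                    env_or_default("SITE_NAME", "unset"));
}
```

A complete CGI program would print a Content-type header and a blank line before writing this text to standard output.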
Documents processed by the server with mod_include's INCLUDES filter can print environment variables using the echo element, and can use environment variables in flow control elements to make parts of a page conditional on characteristics of the request. Apache also provides SSI pages with the standard CGI variables, as explained above. For more details, see the SSI tutorial.
Access to the server can be controlled based on the value of environment variables using the allow from env= and deny from env= directives. In combination with the SetEnvIf directive, this allows flexible control of access to the server based on characteristics of the client. For example, you can use these directives to deny access when the client uses a particular browser.
The values of environment variables can be recorded in the access log using the LogFormat directive with the %e option. In addition, the decision about which requests are logged can be made based on the value of environment variables, using the conditional form of the CustomLog directive. In combination with SetEnvIf, this allows flexible control of which requests are logged. For example, you can choose not to log requests for files whose names end in gif, or you can choose to log only requests from clients outside your own network.
The Header directive can use the presence or absence of an environment variable to determine whether a certain HTTP header is included in the response to the client. This allows, for example, a certain response header to be sent only if it was also present in the client's request.
External filters configured by mod_ext_filter
using the ExtFilterDefine directive can
be activated conditionally on an environment variable using the
disableenv= and enableenv= options.
The %{ENV:...} form of TestString in a RewriteCond directive allows mod_rewrite's rewrite engine to make decisions based on the value of environment variables. Note that the variables accessible in mod_rewrite without the ENV: prefix are not actually environment variables. Rather, they are variables special to mod_rewrite that cannot be accessed from other modules.
Interoperability problems have led to the introduction of mechanisms to modify the way Apache behaves when talking to particular clients. To make these mechanisms as flexible as possible, they are invoked by defining environment variables, typically with the BrowserMatch directive, though the SetEnv and PassEnv directives can also be used, for example.
Fuerza que la peticin sea tratada como una peticin HTTP/1.0 incluso si viene en una especificacin posterior.
Hace que cualquier campo Vary se elimine de la
cabecera de la respuesta antes de ser enviada al
cliente. Algunos clientes no interpretan este campo
correctamente (consulte la seccin sobre problemas conocidos con
clientes); usar esta variable puede evitar esos
problemas. Usar esta variable implica tambin el uso de
force-response-1.0.
Fuerza que la respuesta a una peticin HTTP/1.0 se haga tambin segn la especificacin HTTP/1.0. Esto se implement originalmente como resultado de un problema con los proxies de AOL. Algunos clientes HTTP/1.0 no se comportan correctamente cuando se les enva una respuesta HTTP/1.1, y este mecanismo hace que se pueda interactuar con ellos.
Cuando tiene valor "1", esta variable desactiva el filtro
de salida DEFLATE de mod_deflate para
contenidos de tipo diferentes de text/html.
Cuando se especifica, se desactiva el filtro
DEFLATE de mod_deflate.
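In the mod_deflate documentation, the two variables just described are named gzip-only-text/html and no-gzip. The sketch below, assuming mod_deflate and mod_setenvif are loaded, shows how they are typically set for old Netscape 4.x clients. The browser patterns are illustrative and may need adjusting for a real site.

```apache
# Compress all responses by default.
SetOutputFilter DEFLATE

# Netscape 4.x: compress only text/html.
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08: no compression at all.
BrowserMatch ^Mozilla/4\.0[678] no-gzip
```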
Disables KeepAlive.
Influences the behavior of
mod_negotiation. If it contains a language
tag (such as en, ja or
x-klingon), mod_negotiation
tries to deliver a variant in that language. If no such
variant is available, the normal negotiation process
applies.
Forces the server to be especially careful when sending a redirect to the client. This is typically used when a client has a known problem handling redirects. It was originally implemented because of Microsoft's WebFolders software, which had problems interpreting redirects issued when accessing resources served via DAV.
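The variable described in the last paragraph is named redirect-carefully in the Apache documentation. As an illustration, the default configuration file shipped with Apache 2.0 sets it for the WebFolders client with a line like the following (the User-Agent string is the one that client sends):

```apache
# Be careful with redirects sent to Microsoft WebFolders (DAV client).
BrowserMatch "Microsoft Data Access Internet Publishing Provider" redirect-carefully
```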
Available in Apache versions 2.0.40 and later
When Apache issues a redirect in response to a client request, the response includes some text to be displayed in case the client cannot (or does not) automatically follow the redirection. Apache ordinarily labels this text with the ISO-8859-1 character set.
However, if the redirection is to a page that uses a different character set, some broken browser versions will try to use the character set of the redirection text rather than that of the page they were redirected to. As a result, a page in Greek, for instance, may be rendered incorrectly.
Setting this environment variable causes Apache to omit the character set from the redirection text, so that these broken browsers display the destination page correctly.
We recommend that the following lines be included in httpd.conf to deal with known client problems.
#
# The following directives modify normal HTTP response behavior.
# The first directive disables keepalive for Netscape 2.x and browsers that
# spoof it. There are known problems with these browser implementations.
# The second directive is for Microsoft Internet Explorer 4.0b2
# which has a broken HTTP/1.1 implementation and does not properly
# support keepalive when it is used on 301 or 302 (redirect) responses.
#
BrowserMatch "Mozilla/2" nokeepalive
BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
#
# The following directives disable HTTP/1.1 responses to browsers which
# are in violation of the HTTP/1.0 spec by not being able to understand a
# basic 1.1 response.
#
BrowserMatch "RealPlayer 4\.0" force-response-1.0
BrowserMatch "Java/1\.0" force-response-1.0
BrowserMatch "JDK/1\.0" force-response-1.0
This example keeps requests for images from appearing in the access log. It can easily be modified to prevent logging of requests for particular directories, or of requests coming from particular clients.
SetEnvIf Request_URI \.gif image-request
SetEnvIf Request_URI \.jpg image-request
SetEnvIf Request_URI \.png image-request
CustomLog logs/access_log common env=!image-request
This example shows how to keep other web sites from using the images on your server in their pages. This configuration is not recommended, but it can work in limited circumstances. We assume that all your images are in a directory called /web/images.
SetEnvIf Referer "^http://www.example.com/" local_referal
# Allow browsers that do not send Referer info
SetEnvIf Referer "^$" local_referal
<Directory /web/images>
Order Deny,Allow
Deny from all
Allow from env=local_referal
</Directory>
For more information about this technique, see the ApacheToday tutorial "Keeping Your Images from Adorning Other Sites".
Apache HTTP Server Version 2.0

The latest version of this FAQ is always available from the main Apache web site, at <http://httpd.apache.org/docs/2.0/faq/>.
Since Apache 2.0 is quite new, we don't yet know what the Frequently Asked Questions will be. While this section fills up, you should also consult the Apache 1.3 FAQ to see if your question is answered there.
If you are having trouble with your Apache server software, you should take the following steps:
/usr/local/apache2/logs/error_log, but see the ErrorLog directive in your config files for the
location on your server.
Apache has an active community of users who are willing to share their knowledge. Participating in this community is usually the best and fastest way to get answers to your questions and problems.
USENET newsgroups:
If you've gone through those steps above that are appropriate and have obtained no relief, then please do let the httpd developers know about the problem by logging a bug report.
If your problem involves the server crashing and generating a core dump, please include a backtrace (if possible). As an example,
# cd ServerRoot
# dbx httpd core
(dbx) where
(Substitute the appropriate locations for your ServerRoot
and your httpd and core files. You may have to use
gdb instead of dbx.)
With several million users and fewer than forty volunteer developers, we cannot provide personal support for Apache. For free support, we suggest participating in a user forum.
Professional, commercial support for Apache is available from a number of companies.
Apache uses the sendfile syscall on platforms
where it is available in order to speed sending of responses.
Unfortunately, on some systems, Apache will detect the presence of
sendfile at compile-time, even when it does not work
properly. This happens most frequently when using network or
other non-standard file-systems.
Symptoms of this problem include the above message in the error
log and zero-length responses to non-zero-sized files. The
problem generally occurs only for static files, since dynamic
content usually does not make use of sendfile.
To fix this problem, simply use the EnableSendfile directive to disable
sendfile for all or part of your server. Also see
the EnableMMAP directive, which can
help with similar problems.
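A minimal sketch of such a fix, assuming the affected files live on a network mount. The directory path /mnt/nfs/htdocs is hypothetical.

```apache
# Hypothetical NFS-mounted document tree.
<Directory "/mnt/nfs/htdocs">
    # Do not use the sendfile syscall for files served from here.
    EnableSendfile Off
    # Do not memory-map files from here either.
    EnableMMAP Off
</Directory>
```

Placing both directives at the top level of httpd.conf instead would turn the features off for the whole server.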
If you get error messages related to the AcceptEx syscall
on win32, see the Win32DisableAcceptEx
directive.
Most problems with CGI scripts result in this message written in the
error log together with an Internal Server Error delivered
to the browser. A guide to helping debug this type of problem is
available in the CGI
tutorial.
Apache HTTP Server Version 2.0

This document describes the use of filters in Apache.
| Related Modules | Related Directives |
|---|---|
A filter is a process that is applied to data that is sent or received by the server. Data sent by clients to the server is processed by input filters, while data sent by the server is processed by output filters. Multiple filters can be applied to the data, and the order in which they are applied can be specified explicitly.
Filters are used internally by Apache to perform
functions such as chunking and byte-range request
handling. In addition, modules can provide filters that
are selectable using run-time configuration
directives. The set of filters that apply to
data can be manipulated with the SetInputFilter, SetOutputFilter, AddInputFilter, AddOutputFilter, RemoveInputFilter, and RemoveOutputFilter directives.
The following user-selectable filters are currently provided with the Apache distribution.
mod_include
mod_deflate
In addition, the module mod_ext_filter
allows external programs to be defined as filters.
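As a sketch of how these directives fit together, assuming mod_include and mod_deflate are loaded. The directory path is hypothetical; INCLUDES and DEFLATE are the filter names those two modules register.

```apache
# Run files ending in .shtml through the Server Side Includes filter.
AddOutputFilter INCLUDES .shtml

# Hypothetical directory in which every response is processed for
# SSI and then compressed. Filters are separated by semicolons.
<Directory "/usr/local/apache2/htdocs/compressed">
    Options +Includes
    SetOutputFilter INCLUDES;DEFLATE
</Directory>
```

RemoveOutputFilter can undo an AddOutputFilter association for a given extension in a subdirectory where the filter is not wanted.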
Apache HTTP Server Version 2.0

This glossary defines some of the common terminology related to Apache in particular, and web serving in general. More information on each concept is provided in the links.
INCLUDES processes documents for Server Side Includes.
www is a host name, example.com is a domain name, and www.example.com is a fully-qualified domain name.
cgi-script designates files to be processed as CGI scripts.
/usr/local/apache2/conf/httpd.conf, but it may be moved using configuration options at compile time or when starting Apache.
GET, POST, and PUT.
text/html, image/gif, and application/octet-stream. In HTTP, the MIME type is transmitted in the Content-Type header.
/images/.*(jpg|gif)$". Apache uses Perl Compatible Regular Expressions provided by the PCRE library.
tar. Apache distributions are stored in archives compressed with tar or with pkzip.
http or https, a host name, and a path. A URL for this page is http://httpd.apache.org/docs/2.0/glossary.html.
Apache HTTP Server Version 2.0

This document describes the use of Apache's Handlers.
| Related Modules | Related Directives |
|---|---|
A "handler" is an internal Apache representation of the action to be performed when a file is called. Generally, files have implicit handlers, based on the file type. Normally, all files are simply served by the server, but certain file types are handled separately.
Apache 1.1 adds the ability to use handlers explicitly. Based on either the filename extension or the location of the file, handlers can be specified without regard to file type. This is advantageous for two reasons. First, it is a more elegant solution. Second, it allows both a type and a handler to be associated with a file. (See also Files with Multiple Extensions.)
Handlers can either be built into the server,
included in a module, or added with the
Action directive. The
built-in handlers in the standard Apache
distribution are:
default_handler(), which is the handler
used by default to handle static
content. (core)
Further built-in handlers are provided by mod_asis, mod_cgi, mod_imap, mod_info, mod_status and mod_negotiation.
The following directives cause requests
for files with the
html extension to trigger the CGI script
footer.pl.
Action add-footer /cgi-bin/footer.pl
AddHandler add-footer .html
In this case, the CGI script is responsible for sending the
originally requested document (pointed to by the
PATH_TRANSLATED environment variable) and for making whatever
modifications or additions are desired.
The following directives enable the
send-as-is handler, which is used for files that contain
their own HTTP headers. All files in the
/web/htdocs/asis/ directory will be processed by the
send-as-is handler, regardless of their filename
extension.
<Directory /web/htdocs/asis>
SetHandler send-as-is
</Directory>
In order to implement the handler features, an
addition has been made to the Apache
API that you may wish to make use of. Specifically,
a new record has been added to the
request_rec structure:
char *handler
If you wish to have your module engage a handler, you need only
set r->handler to the name of the handler
at any time prior to the invoke_handler stage
of the request. Handlers are implemented as they
were before, albeit using the handler name instead of a
content type. While it is not required, the
naming convention for handlers is to use
dash-separated words, with no slashes, so as not to
invade the media type name-space.
Apache HTTP Server Version 2.0

Authentication is any process by which you verify that someone is who they claim to be. Authorization is any process by which someone is allowed to be where they want to go, or to have the information that they want to have.
| Related Modules | Related Directives |
|---|---|
If you have information on your web site that is sensitive or intended for only a small group of people, the techniques in this article will help you make sure that the people who see those pages are the people you want to see them.
This article covers the "standard" way of protecting parts of your web site that most of you are going to use.
The directives discussed in this article will need
to go either in your main server configuration file
(typically in a
<Directory> section),
or in per-directory configuration files
(.htaccess files).
If you plan to use .htaccess files, you will need
a server configuration that permits putting authentication
directives in these files. This is done with the
AllowOverride directive,
which specifies which directives, if any, may
be put in per-directory configuration files.
Since we are talking here about authentication, you will need
an AllowOverride directive like
the following:
AllowOverride AuthConfig
Or, if you are just going to put the directives directly in your main server configuration file, you will of course need write permission to that file.
You will also need to know a little about the directory structure of your server, in order to know where some files are kept. This should not be very difficult, and I will try to make it clear when we come to that point.
Here are the basics of password protecting a directory on your server.
You will need to create a password file. This
file should be placed somewhere not accessible
from the web. For example, if your documents are served out of
/usr/local/apache/htdocs you might want to put
the password file(s) in
/usr/local/apache/passwd.
To create a password file, use the
htpasswd utility that comes with Apache.
This utility is located in the bin directory
of wherever you installed Apache. To create the
file, type:
htpasswd -c /usr/local/apache/passwd/passwords rbowen
htpasswd will ask you for the password, and then
ask you to type it again to confirm it:
# htpasswd -c /usr/local/apache/passwd/passwords rbowen
New password: mypassword
Re-type new password: mypassword
Adding password for user rbowen
If htpasswd is not in your path, you will of course
have to type the full path to the file to run it.
On my server, it is located at
/usr/local/apache/bin/htpasswd
The next step is to configure the server to request a
password and to tell the server which users are
allowed access. You can do this either by editing the
httpd.conf file or by using an .htaccess file.
For example, if you wish to protect the directory
/usr/local/apache/htdocs/secret, you can use the following
directives, either placed in the file
/usr/local/apache/htdocs/secret/.htaccess,
or placed in httpd.conf inside a <Directory
/usr/local/apache/htdocs/secret> section.
AuthType Basic
AuthName "Restricted Files"
AuthUserFile /usr/local/apache/passwd/passwords
Require user rbowen
Let's examine each of those directives individually. The
AuthType directive selects
the method that is used to authenticate the user. The
most common method is Basic, and this
method is implemented by mod_auth. It is important
to be aware, however, that Basic authentication
sends the password from the client to the server
unencrypted. This method should therefore not be used
for highly sensitive data. Apache supports one other
authentication method: AuthType Digest. This method
is implemented by mod_auth_digest and is much more
secure. Only the most recent versions of clients support
Digest authentication.
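For comparison, here is a minimal sketch of the same protected directory using Digest authentication, assuming mod_auth_digest is loaded. The file name /usr/local/apache/passwd/digest is hypothetical. Digest password files are created with the htdigest utility rather than htpasswd, and the realm given to htdigest has to match the AuthName value.

```apache
<Directory "/usr/local/apache/htdocs/secret">
    AuthType Digest
    AuthName "Restricted Files"
    # URI prefix that belongs to this protection space.
    AuthDigestDomain /secret/
    # Hypothetical file created with htdigest.
    AuthDigestFile /usr/local/apache/passwd/digest
    Require user rbowen
</Directory>
```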
The AuthName directive sets
the Realm to be used in the
authentication. The realm serves
two major functions. First, the client often presents
this information to the user as part of the password dialog
box. Second, it is used by the client to determine
what password to send for a given authenticated area.
So, for example, once a client has authenticated in
the "Restricted Files" area,
it will automatically retry the same
password for any area on the same server that is
marked with the "Restricted Files" Realm. Therefore,
you can prevent a user from being prompted more
than once for a password by letting multiple restricted areas
share the same realm. Of course, for
security reasons, the client will always need to ask again
for the password whenever the hostname of the
server changes.
The AuthUserFile directive
sets the path to the password file that we just created
with htpasswd. If you have a large number of users,
it can be quite slow to search through a plain text file
to authenticate the user on each request. Apache also has
the ability to store user information in
fast database files. The mod_auth_dbm module
provides the AuthDBMUserFile directive. These files can
be created and manipulated with the
dbmmanage program. Many other types
of authentication options are available from third-party
modules in the Apache Modules
Database.
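A minimal sketch of the DBM variant, assuming mod_auth_dbm is loaded and that a database file has already been created with dbmmanage. The file name passwords.dbm is hypothetical.

```apache
<Directory "/usr/local/apache/htdocs/secret">
    AuthType Basic
    AuthName "Restricted Files"
    # Hypothetical DBM file; replaces AuthUserFile.
    AuthDBMUserFile /usr/local/apache/passwd/passwords.dbm
    Require valid-user
</Directory>
```

The only change from the flat-file setup is the directive that names the user database; AuthType and AuthName stay the same.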
Finally, the Require directive
provides the authorization part of the process by setting
the user that is allowed to access this region of the server.
In the next section, we discuss various ways to use the
Require directive.
The directives above only let one person
(specifically someone with a username of
rbowen) into the directory. In most
cases, you will want to let more than one person in.
This is where the AuthGroupFile directive comes in.
If you want to let more than one person in, you will need to create a group file that associates group names with a list of users in that group. The format of this file is very simple, and you can create it with your favorite editor. The contents of the file will look like this:
GroupName: rbowen dpitts sungo rshersey
That is just a list of the members of the group, written on one line and separated by spaces.
To add a user to your already existing password file, type:
htpasswd /usr/local/apache/passwd/passwords dpitts
You will get the same response as before, but the new user will be appended
to the existing file, rather than creating a new file.
(It is the -c option that makes it create a new
password file.)
Now, you need to modify your .htaccess file to
look like the following:
AuthType Basic
AuthName "By Invitation Only"
AuthUserFile /usr/local/apache/passwd/passwords
AuthGroupFile /usr/local/apache/passwd/groups
Require group GroupName
Now, anyone who is listed in the group GroupName,
and has an entry in the password file, will be let
in, if they type the correct password.
There is another way to let multiple users in that is less specific. Rather than creating a group file, you can just use the following directive:
Require valid-user
Using that rather than the Require user rbowen line
will allow anyone in who is listed in the
password file and who has correctly entered their
password. You can even emulate the group behavior
here, by just keeping a separate password file for
each group. The advantage of this approach is that Apache only
has to check one file, rather than two. The disadvantage is that
you have to maintain a set of password files, and
remember to reference the right one in the AuthUserFile directive.
Because of the way that Basic authentication is specified, your username and password must be verified every time you request a document from the server. This is so even if you are reloading the same page, and for every image on the page (if they come from a protected directory). As you can imagine, this slows things down a little. The delay is proportional to the size of the password file, because the server has to open that file and go down the list of users until it gets to your name. And it has to do this every time a page is loaded.
A consequence of this is that there is a practical limit to how many users you can put in one password file. This limit will vary depending on the performance of your particular server machine, but you can expect to see slowdowns once you get beyond a few hundred entries, and may wish to consider a different authentication method at that time.
Authentication by username and password is only part of the story. Frequently you want to let people in based on something other than who they are. Something such as where they are coming from.
The Allow and
Deny directives let you allow
and deny access based on the host name, or host address, of the
machine requesting a document. The Order directive goes hand-in-hand with these two, and
tells Apache in which order to apply the filters.
The usage of these directives is:
Allow from address
where address is an IP address (or a partial IP address) or a fully qualified domain name (or a partial domain name); you may provide multiple addresses or domain names, if desired.
For example, if you have someone spamming your message board, and you want to keep them out, you could do the following:
Deny from 205.252.46.165
Visitors coming from that address will not be able to see the content covered by this directive. If, instead, you have a machine name rather than an IP address, you can use that.
Deny from host.example.com
And, if you would like to block access from an entire domain, you can specify just part of an address or domain name:
Deny from 192.101.205
Deny from cyberthugs.com moreidiots.com
Deny from ke
Using Order will let you
be sure that you are actually restricting access
to the group that you want to let in, by combining a
Deny and an Allow directive:
Order deny,allow
Deny from all
Allow from dev.example.com
Listing just the Allow directive would not do what you want, because
it would let people from that host in, in
addition to letting everyone else in. What you want is to let
only those people in.
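The two access patterns discussed above can be sketched side by side as follows, assuming mod_access is loaded. Both directory paths are hypothetical.

```apache
# Hypothetical internal area: closed to everyone except one host.
<Directory "/usr/local/apache/htdocs/internal">
    Order deny,allow
    Deny from all
    Allow from dev.example.com
</Directory>

# Hypothetical forum area: open to everyone except one address.
# With "Order allow,deny" the Deny line is evaluated last and wins.
<Directory "/usr/local/apache/htdocs/forum">
    Order allow,deny
    Allow from all
    Deny from 205.252.46.165
</Directory>
```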
You should also read the documentation for
mod_auth and mod_access, which
contains more information about how this all works.
Apache HTTP Server Version 2.0

| Related Modules | Related Directives |
|---|---|
The CGI (Common Gateway Interface) defines a way for a web server to interact with external content-generating programs, which are often referred to as CGI programs or CGI scripts. It is the simplest, and most common, way to put dynamic content on your web site. This document will be an introduction to setting up CGI on your Apache web server, and getting started writing CGI programs.
In order to get your CGI programs to work properly, you'll need to have Apache configured to permit CGI execution. There are several ways to do this.
The
ScriptAlias
directive tells Apache that a particular directory is set
aside for CGI programs. Apache will assume that every file in
this directory is a CGI program, and will attempt to execute
it, when that particular resource is requested by a
client.
The ScriptAlias
directive looks like:
ScriptAlias /cgi-bin/ /usr/local/apache2/cgi-bin/
The example shown is from your default httpd.conf
configuration file, if you installed Apache in the default
location. The ScriptAlias
directive is much like the Alias directive, which defines a URL prefix that
is to be mapped to a particular directory. Alias
and ScriptAlias are usually used for
directories that are outside of the DocumentRoot directory. The difference between
Alias and ScriptAlias
is that ScriptAlias has the added meaning
that everything under that URL prefix will be considered a CGI
program. So, the example above tells Apache that any request for a
resource beginning with /cgi-bin/ should be served from
the directory /usr/local/apache2/cgi-bin/, and should be
treated as a CGI program.
For example, if the URL
http://www.example.com/cgi-bin/test.pl
is requested, Apache will attempt to execute the file
/usr/local/apache2/cgi-bin/test.pl
and return the output. Of course, the file will have to
exist, and be executable, and return output in a particular
way, or Apache will return an error message.
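To make the difference from Alias concrete, the following sketch shows a roughly comparable configuration built from Alias plus the cgi-script handler, assuming mod_alias and mod_cgi are loaded. It is meant as an illustration of what ScriptAlias adds, not as a replacement for it.

```apache
# Map the URL prefix to the directory, as ScriptAlias would.
Alias /cgi-bin/ /usr/local/apache2/cgi-bin/

# Then mark everything in that directory as a CGI program.
<Directory "/usr/local/apache2/cgi-bin">
    Options +ExecCGI
    SetHandler cgi-script
</Directory>
```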
CGI programs are often restricted to ScriptAlias'ed directories for security reasons.
In this way, administrators can tightly control who is allowed to
use CGI programs. However, if the proper security precautions are
taken, there is no reason why CGI programs cannot be run from
arbitrary directories. For example, you may wish to let users
have web content in their home directories with the
UserDir directive.
If they want to have their own CGI programs, but don't have access to
the main cgi-bin directory, they will need to be able to
run CGI programs elsewhere.
There are two steps to allowing CGI execution in an arbitrary
directory. First, the cgi-script handler must be
activated using the AddHandler or SetHandler directive. Second,
ExecCGI must be specified in the Options directive.
You could explicitly use the Options directive, inside your main server configuration
file, to specify that CGI execution was permitted in a particular
directory:
<Directory /usr/local/apache2/htdocs/somedir>
Options +ExecCGI
</Directory>
The above directive tells Apache to permit the execution
of CGI files. You will also need to tell the server what
files are CGI files. The following AddHandler directive tells the server to treat all
files with the cgi or pl extension as CGI
programs:
AddHandler cgi-script .cgi .pl
The .htaccess tutorial
shows how to activate CGI programs if you do not have
access to httpd.conf.
To allow CGI program execution for any file ending in
.cgi in users' directories, you can use the
following configuration.
<Directory /home/*/public_html>
Options +ExecCGI
AddHandler cgi-script .cgi
</Directory>
If you wish to designate a cgi-bin subdirectory of
a user's directory where everything will be treated as a CGI
program, you can use the following.
<Directory /home/*/public_html/cgi-bin>
Options ExecCGI
SetHandler cgi-script
</Directory>
There are two main differences between "regular" programming and CGI programming.
First, all output from your CGI program must be preceded by a MIME-type header. This is an HTTP header that tells the client what sort of content it is receiving. Most of the time, this will look like:
Content-type: text/html
Secondly, your output needs to be in HTML, or some other format that a browser will be able to display. Most of the time, this will be HTML, but occasionally you might write a CGI program that outputs a gif image, or other non-HTML content.
Apart from those two things, writing a CGI program will look a lot like any other program that you might write.
The following is an example CGI program that prints one
line to your browser. Type in the following, save it to a
file called first.pl, and put it in your
cgi-bin directory.
#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "Hello, World.";
Even if you are not familiar with Perl, you should be able
to see what is happening here. The first line tells Apache
(or whatever shell you happen to be running under) that this
program can be executed by feeding the file to the
interpreter found at the location /usr/bin/perl.
The second line prints the content-type declaration we
talked about, followed by two newline characters.
This puts a blank line after the header, to indicate the end
of the HTTP headers, and the beginning of the body. The third
line prints the string "Hello, World.". And that's the end
of it.
If you open your favorite browser and tell it to get the address
http://www.example.com/cgi-bin/first.pl
or wherever you put your file, you will see the one line
Hello, World. appear in your browser window.
It's not very exciting, but once you get that working, you'll
have a good chance of getting just about anything working.
There are four basic things that you may see in your browser when you try to access your CGI program from the web:
Content-Type set in your CGI program.
Remember that the server does not run as you. That is,
when the server starts up, it is running with the permissions
of an unprivileged user - usually nobody, or
www - and so it will need extra permissions to
execute files that are owned by you. Usually, the way to give
a file sufficient permissions to be executed by nobody
is to give everyone execute permission on the file:
chmod a+x first.pl
Also, if your program reads from, or writes to, any other files, those files will need to have the correct permissions to permit this.
When you run a program from your command line, you have
certain information that is passed to the shell without you
thinking about it. For example, you have a PATH,
which tells the shell where it can look for files that you
reference.
When a program runs through the web server as a CGI program,
it may not have the same PATH. Any programs that you
invoke in your CGI program (like sendmail, for
example) will need to be specified by a full path, so that the
shell can find them when it attempts to execute your CGI
program.
A common manifestation of this is the path to the script
interpreter (often perl) indicated in the first
line of your CGI program, which will look something like:
#!/usr/bin/perl
Make sure that this is in fact the path to the interpreter.
In addition, if your CGI program depends on other environment variables, you will need to ensure that those variables are passed by Apache.
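A minimal sketch of how variables can be handed to CGI programs, assuming mod_env is loaded. The variable APP_CONFIG and its value are hypothetical.

```apache
# Pass a variable through from the environment httpd was started in.
PassEnv LD_LIBRARY_PATH

# Set a variable explicitly (hypothetical name and value).
SetEnv APP_CONFIG /usr/local/apache2/conf/app.conf
```

PassEnv only works if the variable exists in the environment of the shell that starts the server, whereas SetEnv defines the value in the configuration itself.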
Most of the time when a CGI program fails, it's because of a problem with the program itself. This is particularly true once you get the hang of this CGI stuff, and no longer make the above two mistakes. The first thing to do is to make sure that your program runs from the command line before testing it via the web server. For example, try:
cd /usr/local/apache2/cgi-bin
./first.pl
(Do not call the perl interpreter. The shell
and Apache should find the interpreter using the path information on the first line of
the script.)
The first thing you see written by your program should be
a set of HTTP headers, including the Content-Type,
followed by a blank line. If you see anything else, Apache will
return the Premature end of script headers error if
you try to run it through the server. See Writing a CGI program above for more
details.
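The rule just described (headers first, including Content-Type, then a blank line) can be checked mechanically on whatever your program prints at the command line. Here is a rough sketch in Python; it is not Apache's actual header parser, and the function name starts_with_cgi_headers is invented for this example.

```python
def starts_with_cgi_headers(output):
    """Check that output begins with "Name: value" header lines,
    including Content-Type, followed by a blank line.

    A rough illustrative check, not Apache's real parser.
    """
    # Normalize line endings so both "\n" and "\r\n" are accepted.
    text = output.replace("\r\n", "\n")
    head, sep, _body = text.partition("\n\n")
    if not sep:
        return False  # no blank line ends the header block
    names = []
    for line in head.split("\n"):
        name, colon, _value = line.partition(":")
        if not colon or not name.strip() or " " in name.strip():
            return False  # not a "Name: value" header line
        names.append(name.strip().lower())
    return "content-type" in names
```

If a warning or debugging message is printed before the headers, the first line fails the "Name: value" test and the function returns False, which corresponds to the case where the server reports Premature end of script headers.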
The error logs are your friend. Anything that goes wrong generates a message in the error log. You should always look there first. If the place where you are hosting your web site does not permit you access to the error log, you should probably host your site somewhere else. Learn to read the error logs, and you'll find that almost all of your problems are quickly identified, and quickly solved.
The suexec support program
allows CGI programs to be run under different user permissions,
depending on which virtual host or user home directory they are
located in. Suexec has very strict permission checking, and any
failure in that checking will result in your CGI programs
failing with Premature end of script headers.
To check if you are using suexec, run apachectl
-V and check for the location of SUEXEC_BIN.
If Apache finds an suexec binary there on startup,
suexec will be activated.
Unless you fully understand suexec, you should not be using it.
To disable suexec, simply remove (or rename) the suexec
binary pointed to by SUEXEC_BIN and then restart the
server. If, after reading about suexec,
you still wish to use it, then run suexec -V to find
the location of the suexec log file, and use that log file to
find what policy you are violating.
As you become more advanced in CGI programming, it will become useful to understand more about what's happening behind the scenes, specifically how the browser and server communicate with one another. Although it's all very well to write a program that prints "Hello, World.", it's not particularly useful.
Environment variables are values that float around you as
you use your computer. They are useful things like your path
(where the computer searches for the actual file
implementing a command when you type it), your username, your
terminal type, and so on. For a full list of your normal,
every day environment variables, type
env at a command prompt.
During the CGI transaction, the server and the browser also set environment variables, so that they can communicate with one another. These are things like the browser type (Netscape, IE, Lynx), the server type (Apache, IIS, WebSite), the name of the CGI program that is being run, and so on.
These variables are available to the CGI programmer, and are half of the story of the client-server communication. The complete list of required variables is at http://hoohoo.ncsa.uiuc.edu/cgi/env.html.
This simple Perl CGI program will display all of the
environment variables that are being passed around. Two
similar programs are included in the
cgi-bin
directory of the Apache distribution. Note that some
variables are required, while others are optional, so you may
see some variables listed that were not in the official list.
In addition, Apache provides many different ways for you to
add your own environment variables
to the basic ones provided by default.
#!/usr/bin/perl
print "Content-type: text/html\n\n";
foreach $key (keys %ENV) {
print "$key --> $ENV{$key}<br>";
}
Other communication between the server and the client
happens over standard input (STDIN) and standard
output (STDOUT). In normal everyday context,
STDIN means the keyboard, or a file that a
program is given to act on, and STDOUT
usually means the console or screen.
When you POST a web form to a CGI program,
the data in that form is bundled up into a special format
and gets delivered to your CGI program over STDIN.
The program can then process that data as though it were
coming in from the keyboard, or from a file.
The "special format" is very simple. A field name and its value are joined together with an equals (=) sign, and pairs of values are joined together with an ampersand (&). Inconvenient characters like spaces, ampersands, and equals signs, are converted into their hex equivalent so that they don't gum up the works. The whole data string might look something like:
name=Rich%20Bowen&city=Lexington&state=KY&sidekick=Squirrel%20Monkey
You'll sometimes also see this type of string appended to
a URL. When that is done, the server puts that string
into the environment variable called
QUERY_STRING. That's called a GET
request. Your HTML form specifies whether a GET
or a POST is used to deliver the data, by setting the
METHOD attribute in the FORM tag.
Your program is then responsible for splitting that string up into useful information. Fortunately, there are libraries and modules available to help you process this data, as well as handle other aspects of your CGI program.
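To make the format concrete, here is a small sketch that decodes the sample string shown earlier. It is written in Python and uses the standard library's urllib.parse.parse_qs; the helper name decode_form_data is made up for this example. In a real CGI program the raw string would come from QUERY_STRING for a GET, or from STDIN for a POST.

```python
from urllib.parse import parse_qs


def decode_form_data(raw):
    """Split URL-encoded form data into a dict of field name -> first value."""
    # parse_qs returns a list of values per field, since a field may repeat.
    fields = parse_qs(raw, keep_blank_values=True)
    return {name: values[0] for name, values in fields.items()}


sample = "name=Rich%20Bowen&city=Lexington&state=KY&sidekick=Squirrel%20Monkey"
print(decode_form_data(sample))
```

Keeping only the first value per field is a simplification for readability; a form with repeated field names would need the full lists.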
When you write CGI programs, you should consider using a code library, or module, to do most of the grunt work for you. This leads to fewer errors, and faster development.
If you're writing CGI programs in Perl, modules are
available on CPAN. The most
popular module for this purpose is CGI.pm. You might
also consider CGI::Lite, which implements a minimal
set of functionality, which is all you need in most programs.
If you're writing CGI programs in C, there are a variety of
options. One of these is the CGIC library, from
http://www.boutell.com/cgic/.
There are a large number of CGI resources on the web. You can discuss CGI problems with other users on the Usenet group comp.infosystems.www.authoring.cgi. And the hwg-servers mailing list from the HTML Writers Guild is a great source of answers to your questions. You can find out more at http://www.hwg.org/lists/hwg-servers/.
And, of course, you should probably read the CGI specification, which has all the details on the operation of CGI programs. You can find the original version at the NCSA and there is an updated draft at the Common Gateway Interface RFC project.
When you post a question about a CGI problem that you're having, whether to a mailing list or to a newsgroup, make sure you provide enough information: what happened, what you expected to happen, how the two differed, what server you're running, what language your CGI program is written in, and, if possible, the offending code. This will make finding your problem much simpler.
Note that questions about CGI problems should never be posted to the Apache bug database unless you are sure you have found a problem in the Apache source code.

.htaccess files provide a way to make configuration
changes on a per-directory basis.
.htaccess files (or "distributed configuration files")
provide a way to make configuration changes on a per-directory basis. A
file, containing one or more configuration directives, is placed in a
particular document directory, and the directives apply to that
directory, and all subdirectories thereof.
If you want to call your .htaccess file something
else, you can change the name of the file using the AccessFileName directive. For example,
if you would rather call the file .config then you
can put the following in your server configuration file:
AccessFileName .config
In general, .htaccess files use the same syntax as
the main configuration
files. What you can put in these files is determined by the
AllowOverride directive. This
directive specifies, in categories, what directives will be
honored if they are found in a .htaccess file. If a
directive is permitted in a .htaccess file, the
documentation for that directive will contain an Override section,
specifying what value must be in AllowOverride in order for that
directive to be permitted.
For example, if you look at the documentation for the AddDefaultCharset
directive, you will find that it is permitted in .htaccess
files. (See the Context line in the directive summary.) The Override line reads
FileInfo. Thus, you must have at least
AllowOverride FileInfo in order for this directive to be
honored in .htaccess files.
If you are unsure whether a particular directive is permitted in a
.htaccess file, look at the documentation for that
directive, and check the Context line for ".htaccess".
In general, you should never use .htaccess files unless
you don't have access to the main server configuration file. There is,
for example, a prevailing misconception that user authentication should
always be done in .htaccess files. This is simply not the
case. You can put user authentication configurations in the main server
configuration, and this is, in fact, the preferred way to do
things.
.htaccess files should be used in a case where the
content providers need to make configuration changes to the server on a
per-directory basis, but do not have root access on the server system.
In the event that the server administrator is not willing to make
frequent configuration changes, it might be desirable to permit
individual users to make these changes in .htaccess files
for themselves. This is particularly true, for example, in cases where
ISPs are hosting multiple user sites on a single machine, and want
their users to be able to alter their configuration.
However, in general, use of .htaccess files should be
avoided when possible. Any configuration that you would consider
putting in a .htaccess file, can just as effectively be
made in a <Directory> section in your main server
configuration file.
There are two main reasons to avoid the use of
.htaccess files.
The first of these is performance. When AllowOverride
is set to allow the use of .htaccess files, Apache will
look in every directory for .htaccess files. Thus,
permitting .htaccess files causes a performance hit,
whether or not you actually even use them! Also, the
.htaccess file is loaded every time a document is
requested.
Further note that Apache must look for .htaccess files
in all higher-level directories, in order to have a full complement of
directives that it must apply. (See section on how
directives are applied.) Thus, if a file is requested out of a
directory /www/htdocs/example, Apache must look for the
following files:
/.htaccess
/www/.htaccess
/www/htdocs/.htaccess
/www/htdocs/example/.htaccess
And so, for each file access out of that directory, there are 4
additional file-system accesses, even if none of those files are
present. (Note that this would only be the case if
.htaccess files were enabled for /, which
is not usually the case.)
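The list of files above can be generated for any directory. The following Python sketch does just that; like the example in the text, it assumes overrides are enabled all the way up from /, and the function name htaccess_candidates is invented for illustration.

```python
from pathlib import PurePosixPath


def htaccess_candidates(directory, filename=".htaccess"):
    """List the per-directory config files checked for a directory,
    from the filesystem root down to the directory itself.

    Assumes overrides are enabled all the way from "/".
    """
    path = PurePosixPath(directory)
    # parents runs from the nearest parent up to "/", so reverse it.
    chain = list(path.parents)[::-1] + [path]
    return [str(d / filename) for d in chain]


for candidate in htaccess_candidates("/www/htdocs/example"):
    print(candidate)
```

The number of candidates grows with the depth of the directory, which is why deeply nested document trees feel the cost more.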
The second consideration is one of security. You are permitting
users to modify server configuration, which may result in changes over
which you have no control. Carefully consider whether you want to give
your users this privilege. Note also that giving users fewer
privileges than they need will lead to additional technical support
requests. Make sure you clearly tell your users what level of
privileges you have given them. Specifying exactly what you have set
AllowOverride to, and pointing them
to the relevant documentation, will save you a lot of confusion
later.
Note that it is completely equivalent to put a .htaccess
file in a directory /www/htdocs/example containing a
directive, and to put that same directive in a Directory section
<Directory /www/htdocs/example> in your main server
configuration:
.htaccess file in /www/htdocs/example:
AddType text/example .exm
httpd.conf file:
<Directory /www/htdocs/example>
AddType text/example .exm
</Directory>
However, putting this configuration in your server configuration file will result in less of a performance hit, as the configuration is loaded once when Apache starts, rather than every time a file is requested.
The use of .htaccess files can be disabled completely
by setting the AllowOverride
directive to none:
AllowOverride None
The configuration directives found in a .htaccess file
are applied to the directory in which the .htaccess file
is found, and to all subdirectories thereof. However, it is important
to also remember that there may have been .htaccess files
in directories higher up. Directives are applied in the order that they
are found. Therefore, a .htaccess file in a particular
directory may override directives found in .htaccess files
found higher up in the directory tree. And those, in turn, may have
overridden directives found yet higher up, or in the main server
configuration file itself.
Example:
In the directory /www/htdocs/example1 we have a
.htaccess file containing the following:
Options +ExecCGI
(Note: you must have "AllowOverride Options" in effect
to permit the use of the "Options" directive in
.htaccess files.)
In the directory /www/htdocs/example1/example2 we have
a .htaccess file containing:
Options Includes
Because of this second .htaccess file, in the directory
/www/htdocs/example1/example2, CGI execution is not
permitted, as only Options Includes is in effect, which
completely overrides any earlier setting that may have been in
place.
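The difference between the two Options lines in this example comes down to the + prefix. A minimal Python model of that merging rule might look like the following. It is a simplification: the function name apply_options is made up, special names such as All and None are not handled, and the arguments of one directive are assumed to be either all prefixed or all unprefixed.

```python
def apply_options(current, args):
    """Return the option set after one Options directive is applied.

    current: set of option names already in effect.
    args: the directive's arguments, e.g. ["+ExecCGI"] or ["Includes"].
    Simplification: arguments are assumed to be either all prefixed
    with + or -, or all unprefixed.
    """
    if all(a[0] in "+-" for a in args):
        result = set(current)
        for a in args:
            if a[0] == "+":
                result.add(a[1:])
            else:
                result.discard(a[1:])
        return result
    # Unprefixed options replace whatever was in effect before.
    return set(args)
```

Starting from whatever is in effect, apply_options(..., ["+ExecCGI"]) adds ExecCGI, and a later apply_options(..., ["Includes"]) leaves only Includes, which is the behavior described for example2.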
As discussed in the documentation on Configuration Sections,
.htaccess files can override the <Directory> sections for
the corresponding directory, but will be overridden by other types
of configuration sections from the main configuration files. This
fact can be used to enforce certain configurations, even in the
presence of a liberal AllowOverride setting. For example, to
prevent script execution while allowing anything else to be set in
.htaccess you can use:
<Directory />
AllowOverride All
</Directory>
<Location />
Options +IncludesNoExec -ExecCGI
</Location>
If you jumped directly to this part of the document to find out how
to do authentication, it is important to note one thing. There is a
common misconception that you are required to use
.htaccess files in order to implement password
authentication. This is not the case. Putting authentication directives
in a <Directory>
section, in your main server configuration file, is the preferred way
to implement this, and .htaccess files should be used only
if you don't have access to the main server configuration file. See above for a discussion of when you should and should
not use .htaccess files.
Having said that, if you still think you need to use a
.htaccess file, you may find that a configuration such as
what follows may work for you.
You must have "AllowOverride AuthConfig" in effect for
these directives to be honored.
.htaccess file contents:
AuthType Basic
AuthName "Password Required"
AuthUserFile /www/passwords/password.file
AuthGroupFile /www/passwords/group.file
Require group admins
Note that AllowOverride AuthConfig must be in effect
for these directives to have any effect.
Please see the authentication tutorial for a more complete discussion of authentication and authorization.
Another common use of .htaccess files is to enable
Server Side Includes for a particular directory. This may be done with
the following configuration directives, placed in a
.htaccess file in the desired directory:
Options +Includes
AddType text/html shtml
AddHandler server-parsed shtml
Note that AllowOverride Options and AllowOverride
FileInfo must both be in effect for these directives to have any
effect.
Please see the SSI tutorial for a more complete discussion of server-side includes.
Finally, you may wish to use a .htaccess file to permit
the execution of CGI programs in a particular directory. This may be
implemented with the following configuration:
Options +ExecCGI
AddHandler cgi-script cgi pl
Alternately, if you wish to have all files in the given directory be considered to be CGI programs, this may be done with the following configuration:
Options +ExecCGI
SetHandler cgi-script
Note that AllowOverride Options and AllowOverride
FileInfo must both be in effect for these directives to have any
effect.
Please see the CGI tutorial for a more complete discussion of CGI programming and configuration.
When you put configuration directives in a .htaccess
file, and you don't get the desired effect, there are a number of
things that may be going wrong.
Most commonly, the problem is that AllowOverride is not
set such that your configuration directives are being honored. Make
sure that you don't have an AllowOverride None in effect
for the file scope in question. A good test for this is to put garbage
in your .htaccess file and reload. If a server error is
not generated, then you almost certainly have AllowOverride
None in effect.
If, on the other hand, you are getting server errors when trying to
access documents, check your Apache error log. It will likely tell you
that the directive used in your .htaccess file is not
permitted. Alternately, it may tell you that you had a syntax error,
which you will then need to fix.

Authentication is any process by which you verify that someone is who they claim they are. Authorization is any process by which someone is allowed to be where they want to go, or to have information that they want to have.
The CGI (Common Gateway Interface) defines a way for a web server to interact with external content-generating programs, which are often referred to as CGI programs or CGI scripts. It is the simplest, and most common, way to put dynamic content on your web site. This document will be an introduction to setting up CGI on your Apache web server, and getting started writing CGI programs.
See: CGI: Dynamic Content
.htaccess files
.htaccess files provide a way to make configuration
changes on a per-directory basis. A file, containing one or more
configuration directives, is placed in a particular document directory,
and the directives apply to that directory, and all subdirectories thereof.
See: .htaccess files
SSI (Server Side Includes) are directives that are placed in HTML pages, and evaluated on the server while the pages are being served. They let you add dynamically generated content to an existing HTML page, without having to serve the entire page via a CGI program, or other dynamic technology.
On systems with multiple users, each user can be permitted to have a
web site in their home directory using the UserDir directive. Visitors
to a URL http://example.com/~username/ will get content
out of the home directory of the user "username", out of
the subdirectory specified by the UserDir directive.

On systems with multiple users, each user can be permitted to have a
web site in their home directory using the UserDir directive. Visitors
to a URL http://example.com/~username/ will get content
out of the home directory of the user "username", out of
the subdirectory specified by the UserDir directive.
Per-user web directories
Setting the file path with UserDir
Restricting what users are permitted to use this feature
Enabling a cgi directory for each user
Allowing users to alter configuration
The UserDir
directive specifies a directory out of which per-user
content is loaded. This directive may take several different forms.
If a path is given which does not start with a leading slash, it is assumed to be a directory path relative to the home directory of the specified user. Given this configuration:
UserDir public_html
the URL http://example.com/~rbowen/file.html will be
translated to the file path
/home/rbowen/public_html/file.html
If a path is given starting with a slash, a directory path will be constructed using that path, plus the username specified. Given this configuration:
UserDir /var/html
the URL http://example.com/~rbowen/file.html will be
translated to the file path /var/html/rbowen/file.html
If a path is provided which contains an asterisk (*), a path is used in which the asterisk is replaced with the username. Given this configuration:
UserDir /var/www/*/docs
the URL http://example.com/~rbowen/file.html will be
translated to the file path
/var/www/rbowen/docs/file.html
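The three forms can be summarized in a few lines of code. The Python sketch below only mirrors the examples given here. The function name translate_userdir and the home_root parameter are assumptions for illustration; the real server finds a user's home directory from the system's user database rather than assuming /home.

```python
def translate_userdir(userdir, username, rest, home_root="/home"):
    """Map the part of a URL after /~username/ to a file path.

    userdir:   the UserDir argument, e.g. "public_html"
    rest:      the remainder of the URL path, e.g. "file.html"
    home_root: assumed parent of home directories (hypothetical; the
               server really uses the home directory from the
               system's user database)
    """
    if "*" in userdir:
        base = userdir.replace("*", username)
    elif userdir.startswith("/"):
        base = f"{userdir}/{username}"
    else:
        base = f"{home_root}/{username}/{userdir}"
    return f"{base}/{rest}"
```

With these assumptions, translate_userdir("public_html", "rbowen", "file.html") gives /home/rbowen/public_html/file.html, matching the first example.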
Using the syntax shown in the UserDir documentation, you can restrict what users are permitted to use this functionality:
UserDir enabled
UserDir disabled root jro fish
The configuration above will enable the feature for all users
except for those listed in the disabled statement.
You can, likewise, disable the feature for all but a few users by
using a configuration like the following:
UserDir disabled
UserDir enabled rbowen krietz
See UserDir
documentation for additional examples.
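One way to read the two configurations above is as a pair of name lists plus a global switch. The following Python sketch expresses that reading; it is my interpretation of the behavior described here, not code taken from the server, and the function and parameter names are made up.

```python
def userdir_allowed(user, globally_disabled, enabled, disabled):
    """Decide whether ~user URLs are translated.

    globally_disabled: True if a bare "UserDir disabled" is present.
    enabled / disabled: names listed after "UserDir enabled" and
    "UserDir disabled".
    """
    if user in disabled:
        return False  # explicitly disabled names are never translated
    if globally_disabled and user not in enabled:
        return False  # only the listed names are allowed through
    return True
```

For the first configuration you would call it with globally_disabled=False and disabled={"root", "jro", "fish"}; for the second, with globally_disabled=True and enabled={"rbowen", "krietz"}.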
In order to give each user their own cgi-bin directory, you can use
a <Directory>
directive to make a particular subdirectory of a user's home directory
cgi-enabled.
<Directory /home/*/public_html/cgi-bin/>
Options ExecCGI
SetHandler cgi-script
</Directory>
Then, presuming that UserDir is set to
public_html, a cgi program example.cgi
could be loaded from that directory as:
http://example.com/~rbowen/cgi-bin/example.cgi
If you want to allow users to modify the server configuration in
their web space, they will need to use .htaccess files to
make these changes. Ensure that you have set AllowOverride to a
value sufficient for the directives that you want to permit the users
to modify. See the .htaccess tutorial for
additional details on how this works.

Server-side includes provide a means to add dynamic content to existing HTML documents.
This article deals with Server Side Includes, usually called simply SSI. In this article, I'll talk about configuring your server to permit SSI, and introduce some basic SSI techniques for adding dynamic content to your existing HTML pages.
In the latter part of the article, we'll talk about some of the somewhat more advanced things that can be done with SSI, such as conditional statements in your SSI directives.
SSI (Server Side Includes) are directives that are placed in HTML pages, and evaluated on the server while the pages are being served. They let you add dynamically generated content to an existing HTML page, without having to serve the entire page via a CGI program, or other dynamic technology.
The decision of when to use SSI, and when to have your page entirely generated by some program, is usually a matter of how much of the page is static, and how much needs to be recalculated every time the page is served. SSI is a great way to add small pieces of information, such as the current time. But if a majority of your page is being generated at the time that it is served, you need to look for some other solution.
To permit SSI on your server, you must have the following
directive either in your httpd.conf file, or in a
.htaccess file:
Options +Includes
This tells Apache that you want to permit files to be parsed
for SSI directives. Note that most configurations contain
multiple Options directives
that can override each other. You will probably need to apply the
Options to the specific directory where you want SSI
enabled in order to ensure that it gets evaluated last.
Not just any file is parsed for SSI directives. You have to
tell Apache which files should be parsed. There are two ways to
do this. You can tell Apache to parse any file with a
particular file extension, such as .shtml, with
the following directives:
AddType text/html .shtml
AddOutputFilter INCLUDES .shtml
One disadvantage to this approach is that if you wanted to
add SSI directives to an existing page, you would have to
change the name of that page, and all links to that page, in
order to give it a .shtml extension, so that those
directives would be executed.
The other method is to use the XBitHack directive:
XBitHack on
XBitHack
tells Apache to parse files for SSI
directives if they have the execute bit set. So, to add SSI
directives to an existing page, rather than having to change
the file name, you would just need to make the file executable
using chmod.
chmod +x pagename.html
A brief comment about what not to do. You'll occasionally
see people recommending that you just tell Apache to parse all
.html files for SSI, so that you don't have to
mess with .shtml file names. These folks have
perhaps not heard about XBitHack. The thing to
keep in mind is that, by doing this, you're requiring that
Apache read through every single file that it sends out to
clients, even if they don't contain any SSI directives. This
can slow things down quite a bit, and is not a good idea.
Of course, on Windows, there is no such thing as an execute bit to set, so that limits your options a little.
In its default configuration, Apache does not send the last modified date or content length HTTP headers on SSI pages, because these values are difficult to calculate for dynamic content. This can prevent your document from being cached, and result in slower perceived client performance. There are two ways to solve this:
Use the XBitHack Full configuration. This
tells Apache to determine the last modified date by looking
only at the date of the originally requested file, ignoring
the modification date of any included files.
Use mod_expires to set an explicit expiration
time on your files, thereby letting browsers and proxies
know that it is acceptable to cache them.
SSI directives have the following syntax:
<!--#element attribute=value attribute=value ... -->
It is formatted like an HTML comment, so if you don't have SSI correctly enabled, the browser will ignore it, but it will still be visible in the HTML source. If you have SSI correctly configured, the directive will be replaced with its results.
The element can be one of a number of things, and we'll talk some more about most of these in the next installment of this series. For now, here are some examples of what you can do with SSI:
<!--#echo var="DATE_LOCAL" -->
The echo element just spits out the value of a
variable. There are a number of standard variables, which
include the whole set of environment variables that are
available to CGI programs. Also, you can define your own
variables with the set element.
If you don't like the format in which the date gets printed,
you can use the config element, with a
timefmt attribute, to modify that formatting.
<!--#config timefmt="%A %B %d, %Y" -->
Today is <!--#echo var="DATE_LOCAL" -->
This document last modified <!--#flastmod file="index.html" -->
This element is also subject to timefmt format
configurations.
This is one of the more common uses of SSI - to output the results of a CGI program, such as everybody's favorite, a "hit counter."
<!--#include virtual="/cgi-bin/counter.pl" -->
Following are some specific examples of things you can do in your HTML documents with SSI.
Earlier, we mentioned that you could use SSI to inform the user when the document was most recently modified. However, the actual method for doing that was left somewhat in question. The following code, placed in your HTML document, will put such a time stamp on your page. Of course, you will have to have SSI correctly enabled, as discussed above.
<!--#config timefmt="%A %B %d, %Y" -->
This file last modified <!--#flastmod file="ssi.shtml" -->
Of course, you will need to replace the
ssi.shtml with the actual name of the file that
you're referring to. This can be inconvenient if you're just
looking for a generic piece of code that you can paste into any
file, so you probably want to use the
LAST_MODIFIED variable instead:
<!--#config timefmt="%D" -->
This file last modified <!--#echo var="LAST_MODIFIED" -->
For more details on the timefmt format, go to
your favorite search site and look for strftime. The
syntax is the same.
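If you want to try a format string before putting it in a page, most languages expose strftime directly. Here is a short Python sketch using the first timefmt pattern from this section; the function name format_stamp and the sample date are only for illustration, and the day and month names depend on the locale in effect.

```python
from datetime import date


def format_stamp(day, timefmt="%A %B %d, %Y"):
    """Format a date with an strftime-style pattern, as timefmt does."""
    return day.strftime(timefmt)


# Prints "Friday April 21, 2006" in the default (C/English) locale.
print(format_stamp(date(2006, 4, 21)))
```

Passing a different pattern as the second argument lets you compare formats side by side before settling on one.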
If you are managing any site that is more than a few pages, you may find that making changes to all those pages can be a real pain, particularly if you are trying to maintain some kind of standard look across all those pages.
Using an include file for a header and/or a footer can
reduce the burden of these updates. You just have to make one
footer file, and then include it into each page with the
include SSI command. The include
element can determine what file to include with either the
file attribute, or the virtual
attribute. The file attribute is a file path,
relative to the current directory. That means that it
cannot be an absolute file path (starting with /), nor can it
contain ../ as part of that path. The virtual
attribute is probably more useful, and should specify a URL
relative to the document being served. It can start with a /,
but must be on the same server as the file being served.
<!--#include virtual="/footer.html" -->
I'll frequently combine the last two things, putting a
LAST_MODIFIED directive inside a footer file to be
included. SSI directives can be contained in the included file,
and includes can be nested - that is, the included file can
include another file, and so on.
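The restrictions on the file attribute described above are easy to express as a check. This Python sketch covers only the two rules stated here (no absolute paths, no ../ components); mod_include performs its own checks, and the function name valid_include_file is invented for this example.

```python
def valid_include_file(path):
    """Apply the restrictions described for the include file attribute:
    the path must be relative and must not contain "../".

    Illustrative only; mod_include performs its own checks.
    """
    if path.startswith("/"):
        return False
    return ".." not in path.split("/")
```

A path such as inc/footer.html passes, while /footer.html and ../footer.html do not, which is why the virtual attribute is usually the more convenient choice.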
In addition to being able to config the time
format, you can also config two other things.
Usually, when something goes wrong with your SSI directive, you get the message
[an error occurred while processing this directive]
If you want to change that message to something else, you
can do so with the errmsg attribute to the
config element:
<!--#config errmsg="[It appears that you don't know how to use SSI]" -->
Hopefully, end users will never see this message, because you will have resolved all the problems with your SSI directives before your site goes live. (Right?)
And you can config the format in which file
sizes are returned with the sizefmt attribute. You
can specify bytes for a full count in bytes, or
abbrev for an abbreviated number in Kb or Mb, as
appropriate.
I expect that I'll have an article some time in the coming
months about using SSI with small CGI programs. For now, here's
something else that you can do with the exec
element. You can actually have SSI execute a command using the
shell (/bin/sh, to be precise - or the DOS shell,
if you're on Win32). The following, for example, will give you
a directory listing.
<pre>
<!--#exec cmd="ls" -->
</pre>
or, on Windows
<pre>
<!--#exec cmd="dir" -->
</pre>
You might notice some strange formatting with this directive
on Windows, because the output from dir contains
the string "<dir>" in it, which confuses
browsers.
Note that this feature is exceedingly dangerous, as it will
execute whatever code happens to be embedded in the
exec tag. If you have any situation where users
can edit content on your web pages, such as with a
"guestbook", for example, make sure that you have this
feature disabled. You can allow SSI, but not the
exec feature, with the IncludesNOEXEC
argument to the Options directive.
In addition to spitting out content, Apache SSI gives you the option of setting variables, and using those variables in comparisons and conditionals.
Most of the features discussed in this article are only available to you if you are running Apache 1.2 or later. Of course, if you are not running Apache 1.2 or later, you need to upgrade immediately, if not sooner. Go on. Do it now. We'll wait.
Using the set directive, you can set variables
for later use. We'll need this later in the discussion, so
we'll talk about it here. The syntax of this is as follows:
<!--#set var="name" value="Rich" -->
In addition to merely setting values literally like that, you
can use any other variable, including environment variables or the variables
discussed above (like LAST_MODIFIED, for example) to
give values to your variables. You will specify that something is
a variable, rather than a literal string, by using the dollar sign
($) before the name of the variable.
<!--#set var="modified" value="$LAST_MODIFIED" -->
To put a literal dollar sign into the value of your variable, you need to escape the dollar sign with a backslash.
<!--#set var="cost" value="\$100" -->
Finally, if you want to put a variable in the midst of a longer string, and the characters that follow it could be mistaken for part of the variable's name, you can place the name of the variable in braces to remove the ambiguity. (It's hard to come up with a really good example of this, but hopefully you'll get the point.)
<!--#set var="date" value="${DATE_LOCAL}_${DATE_GMT}" -->
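To see these pieces working together, here is a short sketch that sets two variables and then prints one of them with the echo element. The variable names visitor and label are made up for illustration. The braces in the second line matter: without them, the parser would look for a variable called visitors rather than visitor followed by a literal "s".
<pre>
<!--#set var="visitor" value="Rich" -->
<!--#set var="label" value="${visitor}s page" -->
Welcome to <!--#echo var="label" -->
</pre>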
Now that we have variables, and are able to set and compare
their values, we can use them to express conditionals. This
lets SSI be a tiny programming language of sorts.
mod_include provides an if,
elif, else, endif
structure for building conditional statements. This allows you
to effectively generate multiple logical pages out of one
actual page.
The structure of this conditional construct is:
<!--#if expr="test_condition" -->
<!--#elif expr="test_condition" -->
<!--#else -->
<!--#endif -->
A test_condition can be any sort of logical
comparison - either comparing values to one another, or testing
the ``truth'' of a particular value. (A given string is true if
it is nonempty.) For a full list of the comparison operators
available to you, see the mod_include
documentation. Here are some examples of how one might use this
construct.
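As a first sketch, the following fragment shows both kinds of test in one conditional: a comparison of a value against a literal string, using the = operator described in the mod_include documentation, and a simple truth test on a value. The variable name visitor is hypothetical, and it is set directly here only to keep the example self-contained.
<pre>
<!--#set var="visitor" value="Rich" -->
<!--#if expr="\"$visitor\" = \"Rich\"" -->
Welcome back, Rich.
<!--#elif expr="$visitor" -->
Hello, <!--#echo var="visitor" -->.
<!--#else -->
Hello, whoever you are.
<!--#endif -->
</pre>
The first branch is taken when the value matches exactly, the second when the variable holds any other nonempty string, and the last when it is empty or unset.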
In your configuration file, you could put the following line:
BrowserMatchNoCase macintosh Mac
BrowserMatchNoCase MSIE InternetExplorer
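These lines set the environment variables ``Mac'' and ``InternetExplorer'' independently of one another: ``Mac'' if the browser identifies itself with a string containing ``macintosh'', and ``InternetExplorer'' if that string contains ``MSIE''. Both will be set if the client is running Internet Explorer on a Macintosh.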
Then, in your SSI-enabled document, you might do the following:
<!--#if expr="${Mac} && ${InternetExplorer}" -->
Apologetic text goes here
<!--#else -->
Cool JavaScript code goes here
<!--#endif -->
Not that I have anything against IE on Macs - I just struggled for a few hours last week trying to get some JavaScript working on IE on a Mac, when it was working everywhere else. The above was the interim workaround.
Any other variable (either ones that you define, or normal
environment variables) can be used in conditional statements.
With Apache's ability to set environment variables with the
SetEnvIf directives, and other related directives,
this functionality can let you do some pretty involved dynamic
stuff without ever resorting to CGI.
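For instance, here is a rough sketch that shows different content to clients on a private network. The variable name internal_client and the address range are assumptions chosen for illustration. In your configuration file:
<pre>
# Set internal_client for requests from a (hypothetical) private address range
SetEnvIf Remote_Addr "^192\.168\." internal_client
</pre>
Then, in your SSI-enabled document:
<pre>
<!--#if expr="$internal_client" -->
Links for internal users go here
<!--#else -->
Public content goes here
<!--#endif -->
</pre>
Because SetEnvIf gives the variable a nonempty value when the pattern matches, the simple truth test is enough here.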
SSI is certainly not a replacement for CGI, or other technologies used for generating dynamic web pages. But it is a great way to add small amounts of dynamic content to pages, without doing a lot of extra work.