HOW-TO Write a CGI Program in C/C++

C++ CGI Information and Variable Wrapper Libraries


HOW-TO Write a CGI in C/C++
Table of Contents
1. Introduction to CGI Principles
2. Differences Between a CGI and Other Applications
3. Basic Input and Output
4. First Simple CGI
5. Dealing With User Input
6. More Advanced Topics

Copyright Information and Errata
This document is authored by and © Copyright 2004-5 David Cutting. All rights reserved. This document or sections thereof may only be reproduced with the express permission of the author. Please report any errors or errata along with any requests for reproduction rights by email to webmaster@purplepixie.org. Reproduction rights will normally be granted free of all charge and restrictions to any non-profit making or public sector organisation.

1. Introduction to CGI Principles

The Common Gateway Interface (CGI) is a standardised method of passing data to and from a web server. CGI allows web pages to include dynamic content rather than consisting merely of a static HTML file. The CGI framework defines standards that allow a web server to call a second, separate application, pass data to it and receive data back from it.

Data is produced by the CGI program and output to a virtual 'screen'; this screen is then echoed by the web server back to the requesting client. This simple, standardised output method, combined with a number of powerful yet easy-to-use input methods detailed later, allows relatively easy creation of powerful web-based applications. Developing a CGI rather than a standard application gives you a multi-user environment with little effort and removes a vast amount of I/O work from the developer.

CGI programming is no more or less complex than other forms of programming, and the added benefits that it provides may be outweighed by the lack of native real-time control in the user interface. This document is intended to give programmers a feel for CGI techniques and to discuss some best practice for the specific design and development methods required.

Although the examples throughout are in C, the basic I/O system applies equally to any language capable of CGI (which is most, but probably not Visual Basic).

2. Differences Between a CGI and Other Applications

Unlike a traditional application, a CGI technically runs in a non-interactive environment. A CGI is usually run as a single-use program (there are always advanced cases, such as self-referential code, to spoil every example). When it is called by the web server, all the input data has already been assigned and packaged according to the input method. Your program deals, in this execution at least, only with this data or with other automatically gathered data; it receives no further user input before it produces its output and finishes execution. There are ways of persisting data, such as cookies, explained in the More Advanced Topics section.

This means considerable thought has to go into the one-pass design of the system, and a traditional interactive system may not transfer easily into a CGI environment. It is possible, however, by sticking to a functional design, to build a CGI application that differs little from a traditional system in overall design, implementation and interface. Leaving aside the specifics of data I/O, which are covered later on, let us consider a simple application.

Example: A booking system for something or other. A standard set of data is gathered for all customers. Dependent upon their requirements certain other data will be gathered before the booking is confirmed and a range of output options chosen.

In a traditional application we would most likely use a series of GUI-based forms or data input screens. We would order the input in such a way as to determine which questions should be asked dependent on previous information. It is likely that we would have some form of function to determine which questions needed to still be asked dependent on a data structure that was constantly queried and updated.

The limitations we must consider for deploying this application via CGI are primarily two-fold:

All data must be manually persisted
In a traditional application we would hold the data we gathered in a central dataset that was updated as the program ran. With CGI, our program is executed for a single page of output and then exits, so this model cannot be used. Rather, the program must save any data and pass this, along with any user input from HTML forms and the like, to another run of the CGI, which must load the data again. We can accomplish this in one of two ways: either establish some form of individual session identification for the client and save the settings somewhere on the server to be re-loaded, or output all the data in every link or form so that it is carried to the next execution. There are also ways to make the client persist some data using cookies, explained further in the More Advanced Topics section.
CGI cannot do real-time validation
Your CGI application has finished running by the time the HTML is displayed to the user. Unlike a traditional interactive application, the CGI framework does not allow you to validate the user's input in real time (e.g. as soon as a user leaves an input field). This is only possible with client-side programming techniques (such as JavaScript, discussed in More Advanced Topics). You can validate the data on the next run of the CGI and re-display the form with the data included (for user convenience) and an error message, but this is still a limitation of CGI applications.

In our example, the CGI application would have to dump all the available data each time for the next execution, which would decide which questions to ask before outputting those questions along, again, with all the current data.

As well as a design consideration, there is an execution overhead caused by the fact that the code is executed afresh on every request rather than running interactively. This primarily means that more complex applications involving a high degree of object orientation can be cumbersomely slow, as they are initialised, loaded with data, executed and then destroyed on every execution.

3. Basic Input and Output

Output from your CGI is sent via stdout to the web server, which has in effect system'd your program. Other than the header lines, this data can be in binary format (for example your CGI can generate a binary image file). More complicated header topics such as cookies and redirections are dealt with later in this document; for basic output all you require is the Content-type header. This is a requirement for web servers and clients and part of the CGI standard. This line needs to be followed by TWO newlines in order to have effect: the first ends the line itself and the second produces the blank line that ends the header stage of the output. The normal Content-type header is:

Content-type: text/html

Which, bizarre as it may seem, denotes textual HTML output from this CGI. Following this line and the blank line after it you may start your normal HTML document and print this to stdout using your preferred method.

Input to your CGI comes from two sources: environment variables and standard input (stdin). Lots of information will always be provided by the web server as environment variables (a list of some of the more common ones is provided later in this document), accessible with getenv(). The majority of these variables contain connection or environment related data, such as the remote IP address or the URL being requested; in addition they can be used to pass dynamic data from the user to the CGI.

As mentioned previously, CGIs are called using a method, as are all documents from a web server. The methods of interest here are GET, POST and PUT. GET is the most common and is a simple request to GET a document, with any data encoded in the URL. POST sends data in the body of the request, for example form data to pass to a CGI. PUT is used to upload a file to a given URL and is not documented here.

Using the GET method, any data passed to a CGI is encoded in the URL; you may have seen this before with URLs such as /cgi-bin/cgi?Variable=value. The question mark in a URL marks the beginning of the data, known as the query string. This is recovered from the environment variable QUERY_STRING.

The POST method requires the web server to pass data to your CGI via stdin. This can be recovered using standard file read functions on the stdin stream.

All data passed to a CGI is urlencoded. Encoding replaces any characters that can be problematic, such as spaces or question marks (which denote the start of a query string), with a % sign followed by two hexadecimal digits giving the character's ASCII code. Spaces are a special case and are denoted sometimes by %20 (hex 20 is decimal 32, the ASCII space) and sometimes by a plus sign, dependent on the whim of the browser and the like. Dealing with this data is explained in further detail later in this document.

4. First Simple CGI

Ok... The "Goodbye Cruel World" program. Very very simple:

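#include <stdio.h>

int main(void)
{
    /* Header line, then a blank line to end the headers */
    printf("Content-type: text/html\n\n");
    /* The HTML document itself */
    printf("<html><head><title>Hello</title></head><body>\n");
    printf("Goodbye Cruel World\n");
    printf("</body></html>\n");
    return 0;   /* exit status 0 means success */
}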

And there you go, compile with something like:

gcc first.c -o first.cgi


Then copy the compiled program into your web server's CGI directory. You should now be able to run your CGI via your web server and see the output in your web browser.

This is a nice simple test and should work. If it has, drink a celebratory Irn-Bru, move to the top of the class and skip ahead to the next section. If you are having problems, however, read on.

If you are a Windoze user you may have taken the compilation command literally. You will need some form of Windoze compiler (I recommend Borland's BCC, as it's easier to set up under W32 than GCC). You will also need to output the file as something.exe because Windows claims not to rely upon the three-letter extension but, oh well, I shan't carry on about it here.

A 500 Internal Server Error isn't good. Is the CGI correctly compiled for your web server? Can you run the CGI from the command line OK and get the output you should? Are there TWO NEWLINES AFTER THE CONTENT-TYPE HEADER??

If you get prompted to download the program then the webserver is not executing it as a CGI and you need to enable this on your webserver.

5. Dealing With User Input

As stated user input arrives via the GET or POST methods (server variables are available as environment variables regardless of the method). The data is always passed in a URL encoded string either tacked onto the URL with GET or received from stdin with POST.

A URL encoded string is one in which certain special characters are encoded as a percent sign (%) followed by the two-digit hexadecimal ASCII code (spaces are also sometimes represented by the + sign). Each variable is given as its name followed by an equals sign (=) and the value. Variables are separated with the ampersand (&) character. With GET, this string is tacked onto the URL after a question mark (?) and forms a URL like:
http://server/cgi?Variable=Value+Goes+Here&Variable2=Another+Value

When GET is used to request a CGI the query string is available from the environment variable QUERY_STRING (or you can get the URL with the query string attached from the environment variable REQUEST_URI). POST uses stdin to pass the URL encoded string, which can be read using any standard file input function such as fgets (or even fscanf, shudder). The method used is provided by the server as the environment variable REQUEST_METHOD.

I normally parse this input into a linked list or some such wizardry. For your convenience I have made available my CGI Variable Wrapper for C++, which provides IMHO a simple and easy way of getting hold of user input. The code to decode URL strings is provided as part of this package and I shan't bother to list it separately here (it's in the StringLoad() method in the cgi_interface.cpp file).

6. More Advanced Topics

Redirection
With CGI it is possible to pass the browser a re-direct as part of the header (as an alternative to using meta http-equiv methods). This will automatically redirect the browser to the alternative page and display nothing of the source page (most browsers provide the option to prompt when a redirect is encountered but most have this disabled by default). To cause the browser to redirect simply send the header:

Location: http://www.fullurl.here

And then exit your program. If you want the users to see a message first (for example to inform them the URL they're using is out of date), a delayed http-equiv refresh is a much better option.

URL Encoding a String
URL Encoding a string is quite simple - you must replace certain special characters with their hex codes (or optionally a + sign for space). The hex codes are prefixed by a % sign. Special characters would be those that could confuse the parsing of the input URL. I ensure I encode the following (god knows if it's a full list): ampersand (&), space, plus (+), quotation marks (") and question marks (?). So:

hello "there" world would URL encode into:
hello+%22there%22+world

Please note that I provide, free of charge, my most cantankerous and Heath Robinson inspired CGI Wrapper, which will handle this and much more for you, details of which can be found here. A full list of ASCII characters and their decimal and hex codes is provided in the reference section of this site.

Real Time User Interaction/Validation
If you want the ability to deal with the user in real-time (for example show a popup warning of a range violation when a user tabs out of an input field or to force an input format), you need to rely on some form of client executed system such as JavaScript. Using client executed code allows you to do all sorts of funky things which I shan't go into detail on here.

You have basically four choices: a full blown Java Applet, JavaScript, JScript (Microsoft's bastard interpretation of JavaScript) or VBScript (Microsoft's bastard implementation of a wannabe JavaScript). My preference would be to use JavaScript (even modern MSIE browsers fully support it).

Cookies
Cookies are a method of storing persistent data with the client. The system must be supported by both the client (and enabled; there is much hype over a few relatively minor security considerations about cookies) and the web server (most do). Much information on exactly how this works with different browsers and so on can be found on the Internet. For the purposes of this brief document you only need to know the following:
Cookies are set with a header (before the Content) of the format:

Set-Cookie: name=value; expires=date; path=path; domain=domain; secure

Only the name=value part is required, and it can contain anything other than semi-colons or whitespace. Expires is obvious; path is the path on the server the cookie relates to (is provided to), e.g. /cgi-bin/; and domain is the domain the cookie relates to (is provided to), e.g. www.company.com, or .company.com to provide it to both www.company.com and www2.company.com.

Cookies are read via the HTTP_COOKIE environment variable and come in one long string of the format:

var1=value; var2=value; var3=value

If a cookie is set more than once (this can happen if, for example, the same cookie name is set for both .company.com and www.company.com; both are seen as separate cookies by the browser but both are valid and provided to a request to www.company.com) it will appear twice (or however many times) in the list. By default (Firefox and IE with Apache) the cookie is valid for the FQDN of the web server, for the directory of the script that set it, and for the duration of the session (usually until the browser closes).

As of 2.06 onwards my C++ CGI Wrapper has some rudimentary cookie support.
