
FTPWebLog 1.0.2

What is FTPWebLog?

FTPWebLog 1.0.2 is a freeware integrated WWW and FTP log reporting tool. Its primary inspiration was the wwwstat program written by Roy Fielding.

wwwstat is a good program, but as released it has some design flaws that make it unsuited for use by large sites: reports are difficult to reconfigure, characters that should be escaped are handled badly, support for additional log formats is hard to add, support for multiple servers is poor, and graphic reports were retro-fitted to it rather 'after the fact'.

My experience using and heavily customizing wwwstat led me to conclude that I needed a new program written from the ground up for flexibility: FTPWebLog was the result.

wwwstat still does some things that FTPWebLog does not, most notably filtering reports by date. On the flip side, FTPWebLog does several things that wwwstat does not and is much easier to customize to match a site's particular needs.

What does an FTPWebLog report look like?

I have an example of a report online. It is a full report with all report sections and graphs activated; the text section is about 230 Kbytes. Each major section can be selectively disabled, and reordering the sections is simply a matter of changing the order of the half dozen lines that call them.

For example, a 'stats lite' version of the same report above is easily generated by extracting the needed information from the full report. It is only 24 Kbytes.

How can I get it?

You can download the current version distribution, ftpweblog-102a.tar.gz, right from this web page.

If you want to do graphical reports, you will need some additional support:

Follow the directions given with each of those packages to install them. Once the required graphics support is in place, configuration of 'ftpweblog' is easy.

Configuration

Almost all the options are explained directly in the source for 'ftpweblog' and 'graphftpweblog'. Here is a short general guide that should get you up and running.

Identify where your access_log is stored. Change $LogFile in the 'ftpweblog' program to point to it.

If using 'graphftpweblog', set $GraphFTPWebLogURL in the 'ftpweblog' program to the URL where you intend to put the graphic report HTML file generated by 'graphftpweblog'.

Make any directories that will be used by 'graphftpweblog' to store the gif files it generates.

Run 'ftpweblog' - directing its output to a file:

ftpweblog > stats.html

If using graphftpweblog, run it - also directing its output to a file.

graphftpweblog > graphs.html

You should now have a report. It's that easy. By fine-tuning the report options, you can make it as short or as in-depth as you like.
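The steps above can be collected into one small shell function. This is a minimal sketch, assuming the scripts sit in the current directory and that $LogFile (and $GraphFTPWebLogURL, if used) have already been edited in 'ftpweblog'. The function name run_reports, its arguments, and the FTPWEBLOG / GRAPHFTPWEBLOG override variables are hypothetical conveniences, not part of the distribution. It hands the new report to graphftpweblog through its documented filename argument and -P option; the -U base URL is left to whatever is configured inside the script.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the configuration steps described above.
# FTPWEBLOG and GRAPHFTPWEBLOG may be overridden (e.g. with full paths);
# by default the scripts are assumed to be in the current directory.
FTPWEBLOG=${FTPWEBLOG:-./ftpweblog}
GRAPHFTPWEBLOG=${GRAPHFTPWEBLOG:-./graphftpweblog}

# Usage: run_reports OUTDIR GIFDIR
#   OUTDIR - where stats.html and graphs.html are written
#   GIFDIR - directory graphftpweblog stores its GIF files in
run_reports() {
    local outdir=$1 gifdir=$2

    # Make the directories the reports and graphs will be written into.
    mkdir -p "$outdir" "$gifdir" || return 1

    # Run ftpweblog, directing its output to a file.
    "$FTPWEBLOG" > "$outdir/stats.html" || return 1

    # Run graphftpweblog on the report just produced, again directing
    # its HTML output to a file.
    "$GRAPHFTPWEBLOG" -P "$gifdir" "$outdir/stats.html" > "$outdir/graphs.html"
}
```

For example, run_reports /usr/local/lib/httpd/htdocs/statistics /usr/local/lib/httpd/htdocs/statistics/gifs (both paths are placeholders).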

The Command Line Options for FTPWebLog

Nearly every report option that can be set from inside the script can be set using command line options:

ftpweblog [-h] [-i pathname] [-t www|ftp] [-g URL] [-x perlregex] [-X perlregex] [-r perlregex] [-R perlregex] [-A 0|1] [-H 0|1] [-f N] [-d N] [-S 0|1] [-D 0|1] [-F 0|1] [-L 0|1] [-N systemname] [-T perlregex] [-B perlregex] [-Q quota] [-q quotarate] [logfile ...] [logfile.gz ...] [logfile.Z ...]

Display Options

-h
Just display the usage help message and quit.

Input Options

-i pathname
Include the file 'pathname' (assumed to be prior ftpweblog output) in the report. Only one preexisting report can be included per run right now.
[logfile ...] [logfile.gz ...] [logfile.Z ...]
Process the listed sequence of logfiles.
-t www|ftp
Select whether the log files to be processed are in FTP log format (ftp) or NCSA Common Log Format (www).
-g URL
The URL of the GraphFTPWebLog output HTML file (if using GraphFTPWebLog).
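With -t www the input is NCSA Common Log Format, where each line reads host ident authuser [date] "request" status bytes. The domain filters (-x/-X) concern the host field, and the file filters (-r/-R) concern the path inside the quoted request. The awk-based function below is a hypothetical illustration of where those two fields sit in a line; it is not how ftpweblog itself parses logs, and the sample host name is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "HOST PATH" for each NCSA Common Log Format
# line read on stdin.  A CLF line looks like:
#   host ident authuser [dd/Mon/yyyy:hh:mm:ss zone] "METHOD /path PROTO" status bytes
clf_host_and_path() {
    awk '{
        host = $1                      # first field is the client host
        split($0, parts, "\"")         # parts[2] is the quoted request
        split(parts[2], req, " ")      # req[1]=method req[2]=path req[3]=protocol
        print host, req[2]
    }'
}
```

For example, feeding it ppp12.example.com - - [10/Oct/1996:13:55:36 -0700] "GET /~johndoe/index.html HTTP/1.0" 200 2326 prints ppp12.example.com /~johndoe/index.html.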

Log Search Options

-x regex
Only include domain names matching the perl regex in the report
-X regex
Do not include any domain name matching the perl regex
-r regex
Only include refs to files matching the perl regex
-R regex
Do not include refs to files matching the perl regex
-A 0|1
Print Daily stats (0=do not, 1=do)
-H 0|1
Print Hourly stats (0=do not, 1=do)
-f N
Print Top N Files (0=do not)
-d N
Print Top N Domains (0=do not)
-S 0|1
Print summary report (0=do not, 1=do)
-F 0|1
Print full file listing (0=do not, 1=do)
-D 0|1
Print full domain listing (0=do not, 1=do)
-L 0|1
Print top level domain report (0=do not, 1=do)
-N name
Name for report
-T regex
Filter top N file list to exclude files matching the regex
-B regex
Blank this pattern in filenames. Useful for stripping extra path from cache defeating CGI scripts.
-Q quota
Volume Quota in bytes (0=no quota). An extremely basic accounting feature that lets you automatically charge for excessive volume.
-q quotarate
Quota Rate in meg/day over volume quota. Assumed to be in dollars.
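The include and exclude options come in pairs (-x/-X for domains, -r/-R for files). One plausible reading of how a pair combines, which is an assumption and not something stated above, is that an entry is kept only if it matches the include pattern (when one is given) and does not match the exclude pattern. The hypothetical function below applies that rule to host names with grep -E. Note that grep -E uses POSIX extended regular expressions, which agree with Perl regexes for simple patterns like these but not in general.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of include/exclude filtering, ASSUMING:
#   keep = (no include pattern OR name matches include) AND
#          (no exclude pattern OR name does not match exclude)
# Usage: filter_names INCLUDE EXCLUDE < names   (either pattern may be '')
filter_names() {
    local include=$1 exclude=$2 name
    while IFS= read -r name; do
        if [ -n "$include" ] && ! printf '%s\n' "$name" | grep -Eq -- "$include"; then
            continue    # fails the include (-x style) test
        fi
        if [ -n "$exclude" ] && printf '%s\n' "$name" | grep -Eq -- "$exclude"; then
            continue    # matches the exclude (-X style) test
        fi
        printf '%s\n' "$name"
    done
}
```

Under that assumption, filter_names 'someplace\.com$' '^ppp' behaves like -x 'someplace\.com$' -X '^ppp': it keeps www.someplace.com but drops ppp1.someplace.com and host.other.org.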

The Command Line Options for GraphFTPWebLog

graphftpweblog [-h] [-A 0|1] [-B regex] [-D 0|1] [-d N] [-f N] [-H 0|1] [-N name] [-P directory] [-U URL] [-R regex] [-r regex] [-X regex] [-x regex] [filename]

GraphFTPWebLog processes an FTPWebLog report and produces graphs of the information in it. An HTML page linking the graphs together is sent to STDOUT.

Display Options

-h
Just display the usage help message and quit.

Common Options

-P directory
Directory where the graph files are to be stored.
-U URL
Base URL where the graph files can be accessed
-A 0|1
Graph Daily stats (0=do not, 1=do)
-B regex
Blank out partial URLs matching the regex. This can be used to 'defragment' URLs that use extended paths (such as cache defeating CGI programs).
-D 0|1
Graph top level domains (0=do not, 1=do)
-d N
Graph Top N Domains (0=do not)
-f N
Graph Top N Files (0=do not)
-H 0|1
Graph Hourly stats (0=do not, 1=do)
-N name
System name for report. It is inserted into the title and an h1 header for the report.
-R regex
Filter out URLs matching regex from the top N files graph
-r regex
Include only files matching the perl regex
-X regex
Filter domains matching regex in top N domains graph
-x regex
Include only domains matching the perl regex
filename
The file where an already generated FTPWebLog report has been stored.
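-P and -U have to describe the same place: graphs are written into the -P directory and linked from the HTML page using the -U base URL, so that directory must be the one your web server publishes under that URL. One quick check after a run is to print the URL each graph should be reachable at. This is a hypothetical helper; it assumes graph files end in Stats.gif, which is the pattern the chmod line in the example script below relies on.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the URL each generated graph should have,
# given the -P directory and the -U base URL passed to graphftpweblog.
# ASSUMES graph files are named *Stats.gif (as in the example script).
# Usage: graph_urls DIRECTORY BASEURL
graph_urls() {
    local dir=${1%/} base=${2%/} f    # strip one trailing '/' from each
    for f in "$dir"/*Stats.gif; do
        [ -e "$f" ] || continue       # no matches: the glob stays literal
        printf '%s/%s\n' "$base" "${f##*/}"
    done
}
```

For example, graph_urls /usr/local/lib/httpd/htdocs/statistics /statistics lists one /statistics/...Stats.gif URL per graph; fetching each in a browser confirms the two options agree.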

Putting it all together

Here is an example of a script to analyze a log and generate both a full report, and a 'lite' report - both linked to a graphic report.

         #!/bin/bash
         
         cd /home/users/snowhare/bin/stats || exit 1  # Where I keep the FTPWebLog scripts
         
         # Directory where I am going to keep all my stats
         basestatsdir="/usr/local/lib/httpd/htdocs/statistics"
         
         # Location of my access_log
         sourcelog="/usr/local/lib/httpd/logs/access_log"
         
         # Name of my server
         name="www.someplace.com"
         
         # Type of log I am processing (www or ftp)
         type="www"
         
         # Name of the full stats report
         statsfile="$basestatsdir/httpstats.html"
         
         # Generate a FULL stats report, all reports.
         
         ./ftpweblog -t "$type" -N "Web Log Report for $name" \
                 -d 40 -D 1 -L 1 -f 40 -F 1 -S 1 -A 1 -H 1 \
                 -g "/statistics/graph.html" \
                 "$sourcelog" > "${statsfile}.$$"
         mv "${statsfile}.$$" "${statsfile}"  # Doing the two step to keep the time
                                              # when there are NO stats to a minimum
         
         # Generate a stats lite
         # Only the Summary, Daily, Hourly and Top Level domains.
         
         litestatsfile="$basestatsdir/httpstats-lite.html"
         ./ftpweblog -t "$type" \
                 -N "Lite Web Log Report for $name" -i "$statsfile" \
                 -d 0 -D 0 -L 1 -F 0 -f 0 -S 1 -A 1 -H 1 \
                 -g "/statistics/graph.html" \
                 /dev/null > "${litestatsfile}.$$"
         mv "${litestatsfile}.$$" "${litestatsfile}"  # Doing the two step to keep the time
                                                      # when there are NO stats to a minimum
         
         # Make the graphical log report.
         
         ./graphftpweblog -N "Graphical Web Log Report for $name" \
                         -U "/statistics" \
                         -P "$basestatsdir" \
                         -A 1 -D 1 -d 40 -f 40 -H 1 \
                         "$statsfile" > "$basestatsdir/graph.html"
         
         # Just to be sure file permissions are correct
         chmod 644 "$litestatsfile" "$statsfile" "$basestatsdir/graph.html"
         chmod 644 "$basestatsdir"/*Stats.gif
         

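The script above moves each temporary file into place even if ftpweblog fails, which can replace a good report with an empty one. A small variant of the same two-step technique renames the temporary file only when the command succeeds and otherwise leaves the previous report alone. The function name write_report is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, write its stdout to TARGET.$$ first,
# and rename it over TARGET only if the command succeeded.  mv within one
# filesystem is a rename, so readers see either the old or the new report.
# Usage: write_report TARGET command [args...]
write_report() {
    local target=$1 tmp status=0
    shift
    tmp="$target.$$"
    "$@" > "$tmp" || status=$?     # capture failure without tripping set -e
    if [ "$status" -eq 0 ]; then
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"               # keep the previous report untouched
        return "$status"
    fi
}
```

In the script, the full report step would then read write_report "$statsfile" ./ftpweblog -t "$type" -N "Web Log Report for $name" ... "$sourcelog", with the same options as before.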
Getting sophisticated

A number of sites are now running multiple servers. By taking advantage of the command line options you can tailor the reports for each server; in fact you can even make separate reports for different sections of a single server. When doing that, I recommend making one 'with the works' report with all reports turned on, and then using the ability to read old reports to efficiently extract special interest reports. This is much faster than generating new reports from the original access_log.
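Producing the 'with the works' report for each of several servers is a loop over their logs. This is a minimal sketch: the name:type:path entry format, the function name full_reports and the FTPWEBLOG variable are hypothetical, and the server names and log paths you pass are placeholders. The report options are the ones used for the full report in the example script above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one full ("with the works") report per server.
# Each entry is "servername:type:/path/to/log", where type is www or ftp.
# Reports go to OUTDIR/<servername>.html via a temp file and mv.
FTPWEBLOG=${FTPWEBLOG:-./ftpweblog}

full_reports() {
    local outdir=$1 entry server rest type log tmp
    shift
    for entry in "$@"; do
        server=${entry%%:*}    # text before the first ':'
        rest=${entry#*:}
        type=${rest%%:*}       # www or ftp
        log=${rest#*:}         # everything after the second ':'
        tmp="$outdir/$server.html.$$"
        if "$FTPWEBLOG" -t "$type" -N "Log Report for $server" \
                -d 40 -D 1 -L 1 -f 40 -F 1 -S 1 -A 1 -H 1 \
                "$log" > "$tmp"; then
            mv -f "$tmp" "$outdir/$server.html"
        else
            rm -f "$tmp"
            echo "report for $server failed" >&2
        fi
    done
}
```

For example: full_reports /usr/local/lib/httpd/htdocs/statistics www.someplace.com:www:/usr/local/lib/httpd/logs/access_log ftp.someplace.com:ftp:/var/log/xferlog. Special interest reports can then be extracted from each of these with -i.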

Note: You can't extract domains and meaningfully associate them with file sections from an old log report. You have to do that particular trick using the original access_log. You can extract domains from an old log report for analysis OR extract file names from an old report and have them mean something, but not both.

An example

Let's say you have a user named 'johndoe' on your server. You could get a report on *just* his pages by using:

ftpweblog -t www -N 'Web Pages for John Doe' -D 0 -d 0 -L 0 -i fullreport.html -r '^/~johndoe' /dev/null > johndoe.html

Breaking it down:

-t www
Specifies this report as being about a WWW server. Not strictly needed since we aren't actually reading a log file.
-N 'Web Pages for John Doe'
This sets the title of the report to 'Web Pages for John Doe'
-D 0
Suppress the full domain report because it would be meaningless
-d 0
Suppress the top 40 domains report because it would be meaningless
-L 0
Suppress the top level domains report, again because it would be meaningless
-i fullreport.html
Specifies to read the file 'fullreport.html' for an already created FTPWebLog report
-r '^/~johndoe'
Only include files that have paths that start with /~johndoe. This is an extremely powerful feature: you can use it to extract reports on graphic files, individual users, and archive sections.

/dev/null
Read the current 'log' from '/dev/null'. Just an easy trick to let you focus on the preprocessed report you already made without having to process a real access_log.
> johndoe.html
Put this extracted report in the file 'johndoe.html'
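Because the extraction only reads the already generated report, it is cheap to repeat for many users. A minimal sketch, assuming an existing full report and a hypothetical list of user names; it uses the same options as the johndoe example, except that the -r pattern is anchored with (/|$) so that a user 'john' does not also pick up '/~johnny'. The function name and the FTPWEBLOG variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: extract one report per user from an existing full
# report, writing USER.html in the current directory.
FTPWEBLOG=${FTPWEBLOG:-./ftpweblog}

# Usage: user_reports FULLREPORT user [user...]
user_reports() {
    local fullreport=$1 user
    shift
    for user in "$@"; do
        # User names go into a Perl regex, so only accept plain names.
        case $user in
            ''|*[!A-Za-z0-9_-]*) echo "skipping '$user'" >&2; continue ;;
        esac
        # (/|$) anchors the match at the end of the user directory name.
        "$FTPWEBLOG" -t www -N "Web Pages for $user" -D 0 -d 0 -L 0 \
            -i "$fullreport" -r "^/~$user(/|\$)" /dev/null > "$user.html"
    done
}
```

For example, user_reports fullreport.html johndoe janedoe writes johndoe.html and janedoe.html.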

This is the final version of 1.0.2.

You will also find in this distribution a 'ftpweblog-103a1' file. This is an experimental version of FTPWebLog that supports Apache's mod_log_config module and improves FTPWebLog's memory management (you should save TONS of memory now if you turn off the domain-related reports). You should be able to copy your 'LogFormat' directive value directly into the appropriate line and have the program parse your custom log format. It is nowhere near complete, but it does work.
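To find the value to copy, look up the LogFormat line whose nickname matches the one on the CustomLog line for the log you are analyzing. The hypothetical helper below prints that format string from an Apache configuration file. It assumes the directive sits on one line in the form LogFormat "format" nickname, and that the nickname contains no regex metacharacters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the quoted format string of the LogFormat
# directive with the given nickname.
# ASSUMES one-line directives of the form:  LogFormat "format" nickname
# Usage: logformat_value /path/to/httpd.conf nickname
logformat_value() {
    local conf=$1 nick=$2
    # Greedy (.*) captures everything between "LogFormat" and the final
    # whitespace-separated nickname, including the surrounding quotes.
    sed -nE 's/^[[:space:]]*LogFormat[[:space:]]+(.*)[[:space:]]+'"$nick"'[[:space:]]*$/\1/p' "$conf"
}
```

For the standard Common Log Format definition, logformat_value httpd.conf common prints "%h %l %u %t \"%r\" %>s %b".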