Archive for the ‘Perl’ Category

Apache access log url type counter

We cannot always configure our monitoring system to detect every error type that may occur on a large web application platform.
From time to time we need to get our hands dirty, open the good old console and take a look inside the Apache log files to see what’s going wrong.

Every real boss will ask you from time to time what the hell those increasing 404 status codes on the platform are. So I wrote a very simple Perl script which counts, for a pre-defined url depth, how often each type of url occurs in the log file.

#!/usr/bin/perl
use strict;
use warnings;

my %countHash;
# feed every log line from STDIN to main()
while (<STDIN>) {
  main($_);
}
# print the url classes, sorted by hit count (lowest first)
foreach my $key (sort hashValueAscendingNum (keys(%countHash))) {
   print "\t$countHash{$key}\t$key\n";
}

sub main {
  my $logLine = shift;
  #(my $domain, my $rfc931, my $authuser, my $TimeDate, my $Request, my $Status, my $Bytes, my $Referrer, my $Agent) = $logLine =~ /^(\S+) (\S+) (\S+) (\[[^\]\[]+\]) \"([^"]*)\" (\S+) (\S+) \"?([^"]*)\"? \"([^"]*)\"/o;
  # get the url out of the apache logs
  (my $Request) = $logLine =~ /^\S+ \S+ \S+ \[[^\]\[]+\] \"([^"]*)\" \S+ \S+ \"?[^"]*\"? \"[^"]*\"/o;
  return unless defined $Request;   # skip lines that do not look like a log entry
  # cut off GET / POST .... at the beginning of the string and HTTP/1.1 .... at the end
  (my $cleanUrl) = $Request =~ /^[^\s]*\s*(\S+)\s/;
  return unless defined $cleanUrl;
  # count similar url classes: here 2. level
  # add '[\/]?[^\/]*' inside the parentheses to differentiate deeper levels
  (my $cmpStr) = $cleanUrl =~ /^(\/[^\/]*[\/]?[^\/]*)/;
  if (defined $cmpStr && $cmpStr ne "") {
    if (!$countHash{$cmpStr}) {
      $countHash{$cmpStr} = 1;
    } else {
      $countHash{$cmpStr}++;
    }
  }
}

# sort helpers: compare two url classes by their hit count
sub hashValueDescendingNum {
   $countHash{$b} <=> $countHash{$a};
}
sub hashValueAscendingNum {
   $countHash{$a} <=> $countHash{$b};
}

The script also contains, as a comment, the ‘official’ or, more importantly, the ‘working’ regular expression that cuts a standard Apache log line into its pieces.
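As a rough sketch of how the full expression from the script could be put to work, the snippet below splits one log line into its nine fields. The `parse_line` helper name and the sample line are made up for this illustration; only the regular expression is taken from the script.

```perl
use strict;
use warnings;

# Split one log line into its nine fields and return them as a hash reference.
# parse_line is a hypothetical helper name; the regex is the one from the script.
sub parse_line {
  my $logLine = shift;
  my @fields = $logLine =~ /^(\S+) (\S+) (\S+) (\[[^\]\[]+\]) \"([^"]*)\" (\S+) (\S+) \"?([^"]*)\"? \"([^"]*)\"/;
  return unless @fields;   # not a log line this regex understands
  my %entry;
  @entry{qw(domain rfc931 authuser TimeDate Request Status Bytes Referrer Agent)} = @fields;
  return \%entry;
}

# made-up sample line, only for demonstration
my $sample = '192.0.2.1 - - [10/Oct/2010:13:55:36 +0200] "GET /shop/item/42 HTTP/1.1" 404 512 "-" "Mozilla/5.0"';
my $entry = parse_line($sample);
print "$entry->{Status} $entry->{Request}\n";
```

With all fields at hand you could, for example, filter on the status code inside the script instead of in front of it.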


You have to pipe the log file into this script:

cat /var/log/httpd/access.log|./
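To give an idea of what ends up grouped together in the output, here is a small sketch of the url-class step with the depth as a parameter. The `url_class` helper and the sample paths are assumptions for this illustration; the script itself hard-codes the second level.

```perl
use strict;
use warnings;

# Reduce a url path to its first $depth levels.
# url_class is a hypothetical helper; depth 2 corresponds to the regex in the script.
sub url_class {
  my ($cleanUrl, $depth) = @_;
  my $extra = '[\/]?[^\/]*' x ($depth - 1);   # one extra group per deeper level
  my ($cmpStr) = $cleanUrl =~ /^(\/[^\/]*$extra)/;
  return $cmpStr;
}

# made-up sample paths
my %countHash;
$countHash{ url_class($_, 2) }++ for qw(/shop/item/42 /shop/item/43 /shop/cart /about);

# same output format as the script: count, then url class
print "\t$countHash{$_}\t$_\n" for sort { $countHash{$a} <=> $countHash{$b} } keys %countHash;
```

Here both item urls fall into the class /shop/item, so that class is counted twice and printed last.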

More importantly, thanks to the holy ‘grep’ you can create lists for specific error types.
To get a list of the most frequently hit 404 pages:

cat /var/log/httpd/access.log|grep '" 404 '|./

You can even find out how many bad 500 pages were delivered to everybody’s best friend, Mr. Google:

cat /var/log/httpd/access.log|grep '" 500 '|grep -i 'googlebot'|./
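If you would rather not chain greps, the same filtering can be sketched in Perl. The `line_matches` helper and the two sample lines are hypothetical and not part of the original script; treat this as one possible way to do it.

```perl
use strict;
use warnings;

# True when a log line carries the given status code and, optionally,
# contains the given agent string (case-insensitive).
# line_matches is a hypothetical helper name.
sub line_matches {
  my ($logLine, $status, $agent) = @_;
  return 0 unless index($logLine, "\" $status ") >= 0;   # same test as grep '" 404 '
  return 1 unless defined $agent;
  return $logLine =~ /\Q$agent\E/i ? 1 : 0;              # same test as grep -i
}

# made-up sample lines
my @lines = (
  '192.0.2.1 - - [10/Oct/2010:13:55:36 +0200] "GET /a HTTP/1.1" 500 0 "-" "Googlebot/2.1"',
  '192.0.2.2 - - [10/Oct/2010:13:55:37 +0200] "GET /b HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
);
print "$_\n" for grep { line_matches($_, 500, 'googlebot') } @lines;
```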

stdout, stdin

How to redirect the standard output and standard error of a shell command started from Perl:

To capture a command’s STDERR and STDOUT together:

    $output = `cmd 2>&1`;

To capture a command’s STDOUT but discard its STDERR:

    $output = `cmd 2>/dev/null`;

To capture a command’s STDERR but discard its STDOUT (ordering is important here):

    $output = `cmd 2>&1 1>/dev/null`;

To exchange a command’s STDOUT and STDERR in order to capture the STDERR but leave its STDOUT to come out the old STDERR:

    $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
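The redirections above can be tried out with a tiny stand-in command. This is only a sketch: it assumes a POSIX shell, and the subshell in `$cmd` is a made-up replacement for `cmd` that writes one line to each stream.

```perl
use strict;
use warnings;

# hypothetical command: writes "out" to STDOUT and "err" to STDERR
my $cmd = '(echo out; echo err 1>&2)';

my $both     = `$cmd 2>&1`;                # STDOUT and STDERR together
my $out_only = `$cmd 2>/dev/null`;         # STDOUT only, STDERR discarded
my $err_only = `$cmd 2>&1 1>/dev/null`;    # STDERR only, STDOUT discarded
my $swapped  = `$cmd 3>&1 1>&2 2>&3 3>&-`; # STDERR captured, "out" goes to our STDERR

print "both:     $both";
print "out only: $out_only";
print "err only: $err_only";
print "swapped:  $swapped";
```

In the third variant the order matters because `2>&1` first points STDERR at the pipe Perl reads from, and only afterwards is STDOUT sent to /dev/null.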

To read both a command’s STDOUT and its STDERR separately, it’s easiest to redirect them separately to files, and then read from those files when the program is done:

    system("program args 1>program.stdout 2>program.stderr");
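Reading the two files back could look roughly like this. The `slurp` helper, the temporary directory and the stand-in command are assumptions for the sake of a self-contained example; a POSIX shell is assumed as well.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Read a whole file into one string (hypothetical helper).
sub slurp {
  my $path = shift;
  open(my $fh, '<', $path) or die "cannot read $path: $!";
  local $/;                      # no record separator: read everything at once
  my $content = <$fh>;
  close($fh);
  return defined $content ? $content : '';
}

my $dir = tempdir(CLEANUP => 1);
# made-up command standing in for "program args"
system("(echo out; echo err 1>&2) 1>$dir/program.stdout 2>$dir/program.stderr") == 0
  or die "command failed: $?";

my $stdout = slurp("$dir/program.stdout");
my $stderr = slurp("$dir/program.stderr");
print "STDOUT was: $stdout";
print "STDERR was: $stderr";
```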

File Tests

If you want to test whether you can write to a file or whether a directory exists,
the file test operators help a little when writing clean scripts with appropriate error messages.

So here is a little collection of the most common file test operators in Perl:

File Test Operators
Test 	Meaning
-r 	File or directory is readable by this (effective) user or group
-w 	File or directory is writable by this (effective) user or group
-x 	File or directory is executable by this (effective) user or group
-o 	File or directory is owned by this (effective) user
-R 	File or directory is readable by this real user or group
-W 	File or directory is writable by this real user or group
-X 	File or directory is executable by this real user or group
-O 	File or directory is owned by this real user
-e 	File or directory name exists
-z 	File exists and has zero size (always false for directories)
-s 	File or directory exists and has nonzero size (the value is the size in bytes)
-f 	Entry is a plain file
-d 	Entry is a directory
-l 	Entry is a symbolic link
-S 	Entry is a socket
-p 	Entry is a named pipe (a “fifo”)
-b 	Entry is a block-special file (like a mountable disk)
-c 	Entry is a character-special file (like an I/O device)
-u 	File or directory is setuid
-g 	File or directory is setgid
-k 	File or directory has the sticky bit set
-t 	The filehandle is a TTY (as reported by the isatty() system function; filenames can’t be tested by this test)
-T 	File looks like a “text” file
-B 	File looks like a “binary” file
-M 	Modification age (measured in days)
-A 	Access age (measured in days)
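A few of these operators in action, as a minimal sketch. The `check_output_file` helper and the file name report.txt are invented for this example; the operators themselves are the ones from the table.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return an error message when $path is not usable as a fresh output file,
# or undef when nothing is wrong. check_output_file is a hypothetical helper.
sub check_output_file {
  my $path = shift;
  return "$path is a directory"        if -d $path;
  return "$path is not writable"       if -e $path and not -w $path;
  return "$path already contains data" if -s $path;
  return;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/report.txt";

# create an empty file: it exists, but has zero size
open(my $fh, '>', $file) or die "cannot write $file: $!";
close($fh);
print "$file is empty\n" if -z $file;

# append one line: -s now returns the size in bytes
open($fh, '>>', $file) or die "cannot append to $file: $!";
print $fh "hello\n";
close($fh);
my $size = -s $file;
print "$file has $size bytes\n";

my $problem = check_output_file($dir);
print "$problem\n" if defined $problem;
```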