Saturday, 24 January 2015

Perl, printf and qw to the rescue!

When dealing with fixed-width or padded strings, nothing beats the printf family of functions, in my opinion.
However, printf has a couple of problems when formatting complex data, especially compared to pack().
The first problem is that the format string can become very hard to read; consider, for instance, the following one:

qq(%-4s%1s%09d%1s%-50s%-50s%1s%08d%-4s%-16s%-100s)

The second problem is that it does not handle errors in field types gracefully, and such errors are common when looping through a file and formatting each line according to a fixed format string. Consider again the format string above: what happens if the third field is not a valid number on some line of the file you are processing? Perl simply complains, or rather printf() does: with warnings enabled it reports that the argument is not numeric and formats the value as 0.
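To make the complaint concrete, here is a minimal sketch, assuming warnings are enabled; the value 'abc' is invented for the example. It traps the warning so that it can be inspected instead of being lost on STDERR.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be inspected.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# 'abc' is not a number, so %09d formats it as 0 and Perl warns.
my $out = sprintf '%09d', 'abc';

print "formatted: $out\n";
print "warning:   $_" for @warnings;
```

The record is still written, just with a zero in place of the bad value, which is why this kind of error is easy to miss in a long file.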

One solution I found that helps with both problems is to build the format string dynamically from an array of single atoms. For instance, I specify the format string above as follows:

$format_specs = [ qw(%-4s %1s %09d %1s %-50s %-50s %1s %08d %-4s %-16s %-100s) ];

and then later I use something like:

printf join( '', @{ $format_specs } ), @fields;
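As a self-contained illustration, the sketch below uses a shorter, made-up record layout; the field meanings in the comments and the sample values are assumptions, not part of any real file format. It uses sprintf so that the result can be inspected, and it puts parentheses around join() so that the field values are not swallowed into the joined list.

```perl
use strict;
use warnings;

# One atom per field; the field meanings in the comments are hypothetical.
my $format_specs = [
    '%-4s',     # record type
    '%1s',      # flag
    '%09d',     # numeric identifier, zero padded
    '%-20s',    # description, left aligned
];

# Sample values, invented for this example.
my @fields = ( 'HDR', 'X', 42, 'hello' );

# Parentheses around join() keep @fields out of the joined list.
my $line = sprintf join( '', @{ $format_specs } ), @fields;

print "[$line]\n";
```

The atoms are written here as a plain quoted list rather than with qw, because qw treats a # as part of a word and so leaves no room for comments.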

Why is this better than a single pre-built string?
Well, first of all, splitting the format into an array of atoms improves readability (I can even add a comment to each atom to remember what it means, provided the array is written as a plain list rather than with qw, which does not allow comments). Second, and most important, I can check each field read from the input file and see whether it complies with its formatting atom. For instance, to check for a number:

for my $index ( 0 .. $#{ $format_specs } ) {
  # $format_specs is an array reference, so it must be dereferenced with ->
  warn "Error on field $index, expected $format_specs->[$index]\n"
     if ( $format_specs->[ $index ] =~ /d$/ && $fields[ $index ] !~ /^-?\d+$/ );
}

Of course it is possible to build more robust checks around each field, but an array of formatting atoms allows quick, iterative checking of each field's type, as well as ad-hoc error reporting.
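One possible shape for such a more robust check is sketched below. The names %validator_for and check_fields are hypothetical, and the rules (integers for d, anything for s) are assumptions that a real record layout would probably need to refine.

```perl
use strict;
use warnings;

# Hypothetical validators, keyed by the conversion letter ending each atom.
my %validator_for = (
    d => qr/^\s*[-+]?\d+\s*$/,    # integers only
    s => qr/^.*$/s,               # any string is acceptable
);

# Return a list of error messages; an empty list means the record is fine.
sub check_fields {
    my ( $format_specs, $fields ) = @_;
    my @errors;
    for my $index ( 0 .. $#{ $format_specs } ) {
        my $atom  = $format_specs->[ $index ];
        my $value = $fields->[ $index ];
        my ( $type ) = $atom =~ /([a-zA-Z])$/;

        if ( !defined $value ) {
            push @errors, "field $index is missing, expected $atom";
        }
        elsif ( defined $type
            && $validator_for{ $type }
            && $value !~ $validator_for{ $type } )
        {
            push @errors, "field $index ('$value') does not match $atom";
        }
    }
    return @errors;
}

# Sample record, invented for this example: the second field is not a number.
my $format_specs = [ qw(%-4s %09d %-10s) ];
print "$_\n" for check_fields( $format_specs, [ 'ABCD', '12x', 'ok' ] );
```

Returning the messages instead of warning directly leaves the caller free to skip the line, log it, or abort the whole run.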
