9. Files and Directories

Created Wednesday 23 April 2014

BASIC FILE HANDLING

Opening and Reading a File

open FILEHANDLE, MODE, FILENAME
open FILEHANDLE, FILENAME # two-argument form: discouraged, prefer the three-argument form

Reading Files

my $filename = 'chapter_9/targets.txt';
open my $spies_to_espy, '<', $filename
    or die "Cannot open '$filename': $!";

With autodie, a failed open dies with a descriptive message, so no explicit check is needed:
use autodie;
my $filename = 'chapter_9/targets.txt';
open my $filehandle, '<', $filename;

Given a file:
James|007|Spy
Number 6|6|Ex-spy
Agent 99|99|Spy with unknown name
Napoleon Solo|11|Uncle spy
# This guy is only rumored to exist. Not everyone believes it.
Unknown|666|Maybe a spy
Example of parsing this file:
use strict;
use warnings;
use diagnostics;
my $filename = 'chapter_9/targets.txt';
open my $spies_to_espy, '<', $filename
    or die "Cannot open '$filename': $!";
while (my $line = <$spies_to_espy>) {
    next if $line =~ /^\s*#/; # skip comments
    chomp($line);
    my ($name, $case_number, $description) = split /\|/, $line;
    print "$name ($case_number): $description\n";
}
close $spies_to_espy or die "Could not close '$filename': $!";

Writing Files

To add Maxwell Smart as a new target in targets.txt:
open my $fh, '>>', $filename
    or die "Cannot open '$filename' for appending: $!";
print $fh "Maxwell Smart|86|Definitely a spy\n";
Rewriting the file:
my $filename = 'chapter_9/targets.txt';
open my $fh, '<', $filename
    or die "Cannot open '$filename' for reading: $!";
# each element in @lines gets one line from the file
my @lines = sort grep {!/^\s*#/} <$fh>;
close $fh or die "Cannot close '$filename': $!";
open $fh, '>', $filename
    or die "Cannot open '$filename' for writing: $!";
print $fh @lines;
close $fh or die "Cannot close '$filename': $!";

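Rewriting the file through a single read-write ('+<') filehandle:
my $filename = 'chapter_9/targets.txt';
open my $fh, '+<', $filename
    or die "Cannot open '$filename' in read-write mode: $!";
my @lines = sort grep {!/^\s*#/} <$fh>;
# rewind to the start of the file before writing
seek $fh, 0, 0
    or die "Cannot seek '$filename', 0, 0: $!";
print $fh @lines;
# cut off anything left over from the old, longer contents
truncate $fh, tell($fh)
    or die "Cannot truncate '$filename': $!";
close $fh or die "Cannot close '$filename': $!";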

File Test Operators

Expensive (each test makes its own stat call):
my $filename = 'somefile';
if (-e $filename && -f $filename) { ... }
Less expensive (the special _ filehandle reuses the results of the previous stat):
if (-e $filename && -f _ && -r _) { ... }
In Perl 5.9.1 or later you may stack the operators:
if (-e -f -r $filename) { ... }

The Diamond Operator

use strict;
use warnings;
while (<>) {
    next unless /\S/;
    print;
}
When called as perl myprog.pl file1.txt file2.txt file3.txt, this prints every nonblank line from each of those files. With no arguments, it reads from STDIN.

Temporary Files

use File::Temp 'tempfile';
my $fh = tempfile();
# or, if you also need the name:
my ($fh, $filename) = tempfile();
# If you need a particular suffix for the file:
my ($fh, $filename) = tempfile(SUFFIX => '.yaml');
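A minimal sketch of a full round trip through a temporary file, assuming the file should outlive its handle so that it can be reopened by name. The helper names write_temp and slurp_lines are hypothetical and not part of File::Temp:

```perl
use strict;
use warnings;
use File::Temp 'tempfile';

# Write the given lines to a fresh temporary file and return its name.
# UNLINK => 0 keeps the file on disk, so the caller is responsible for cleanup.
sub write_temp {
    my @lines = @_;
    my ($fh, $filename) = tempfile(SUFFIX => '.txt', UNLINK => 0);
    print {$fh} @lines;
    close $fh or die "Cannot close '$filename': $!";
    return $filename;
}

# Read every line back from the named file.
sub slurp_lines {
    my ($filename) = @_;
    open my $fh, '<', $filename
        or die "Cannot open '$filename' for reading: $!";
    my @lines = <$fh>;
    close $fh or die "Cannot close '$filename': $!";
    return @lines;
}

my $temp_name = write_temp("James|007|Spy\n", "Number 6|6|Ex-spy\n");
print slurp_lines($temp_name);
unlink $temp_name or die "Cannot unlink '$temp_name': $!";
```

Returning the name rather than the handle is a design choice for this sketch; when only the handle is needed, the one-value form of tempfile() lets the file be cleaned up automatically.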

DATA as a File

use strict;
use warnings;
use diagnostics;
use Data::Dumper;
my %config;
while (<DATA>) {
    next if /^\s*#/; # skip comments
    next unless /(\w+)\s*=\s*(\w+)/; # key = value
    my ($key, $value) = ($1, $2);
    if (exists $config{$key}) {
        # we've already seen this key, so convert the value to an array reference
        # Does $config{$key} currently store a scalar or an aref?
        if (! ref $config{$key}) {
            $config{$key} = [$config{$key}];
        }
        push @{$config{$key}} => $value;
    } else {
        $config{$key} = $value;
    }
}
print Dumper(\%config);
__DATA__
# max_tries = 3
max_tries = 2
timeout = 30
# only these people are OK
user = Ovid
user = Sally
user = Bob

binmode

my $image = 'really_cool.jpg';
open my $fh, '<', $image
    or die "Cannot open '$image' for reading: $!";
binmode $fh; # treat it as a binary file

DIRECTORIES

opendir(my $dh, $directory)
    or die "Cannot open '$directory' for reading: $!";
# get all entries not starting with a dot
my @entries = grep {!/^\./} readdir($dh);
closedir $dh
    or die "Cannot close '$directory': $!";

Globbing

Three ways to list all directory entries with a .txt extension:
1: using opendir():
use strict;
use warnings;
use autodie;
my $dir = 'drafts/';
opendir(my $dh, $dir);
my @txt = grep {/\.txt$/} readdir($dh);
print join "\n", @txt;
closedir $dh;
2: using glob():
use strict;
use warnings;
use autodie;
my $dir = 'drafts';
my @txt = glob("$dir/*.txt");
print join "\n", @txt;
3: using <>:
use strict;
use warnings;
use autodie;
my $dir = 'drafts';
my @txt = <$dir/*.txt>; # no quotes!
print join "\n", @txt;

UNICODE

Decoding '7bit-jis' to Perl's internal format on a string-by-string basis:
use Encode qw(encode decode);
my $string = decode('7bit-jis', $byte_string);
Decoding at the source using Perl's I/O layers with binmode():
open my $fh, '<', $some_file or die $!;
binmode $fh, ':encoding(7bit-jis)';
Or with the mode:
open my $fh, '<:encoding(7bit-jis)', $some_file or die $!;

Encoding the string:
use Encode qw(encode decode);
my $encoded = encode('7bit-jis', $string);
Using I/O layers:
open my $fh, '>:encoding(7bit-jis)', $some_file or die $!;

Outputting UTF-8:
use utf8;
my $string = 'שלום';
my $length = length($string);
binmode STDOUT, ':encoding(UTF-8)';
print "$string has $length characters\n";
If you don't want to apply an encoding layer to all of STDOUT, encode the string yourself:
use utf8;
use Encode qw(encode decode);
my $string = 'שלום';
my $length = length($string);
$string = encode('UTF-8', $string);
print "$string has $length characters\n";
Decoding UTF-8 input (here, the first command-line argument):
use utf8;
use Encode qw(encode decode);
my $string = decode('UTF-8', shift);
my $length = length($string);
$string = encode('UTF-8', $string);
print "$string has $length characters\n";

Case Folding
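A minimal sketch of case-insensitive comparison with fc(), assuming Perl 5.16 or later (where the fc feature is available). The helper same_ignoring_case is a hypothetical name, not from the original notes. It folds both strings before comparing, which handles cases that lc() misses, such as the German sharp s:

```perl
use strict;
use warnings;
use utf8;           # this source file contains UTF-8 string literals
use feature 'fc';   # fc() requires Perl 5.16 or later

# Compare two character strings without regard to case,
# using full Unicode case folding rather than lc().
sub same_ignoring_case {
    my ($left, $right) = @_;
    return fc($left) eq fc($right);
}

binmode STDOUT, ':encoding(UTF-8)';
if (same_ignoring_case('TSCHÜSS', 'tschüß')) {
    print "fc() treats the two spellings as equal\n";
}
if (lc('TSCHÜSS') ne lc('tschüß')) {
    print "lc() does not\n";
}
```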

Converting Between Encodings

my $string = decode('UTF-16', $utf16_data);
my $latin1 = encode('iso-8859-1', $string);
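The two-step conversion above can be sketched as a small self-contained program. The transcode helper and the sample data are assumptions for illustration; only decode() and encode() come from the Encode module:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Convert a byte string from one encoding to another by decoding it
# to Perl characters first, then encoding those characters to the target.
sub transcode {
    my ($bytes, $from, $to) = @_;
    my $string = decode($from, $bytes);
    return encode($to, $string);
}

# "caf\x{e9}" is the character string "cafe" with an acute accent on the e.
# Build some UTF-16 sample input from it so the example has data to convert.
my $utf16_data = encode('UTF-16', "caf\x{e9}");
my $latin1     = transcode($utf16_data, 'UTF-16', 'iso-8859-1');

printf "%d bytes of UTF-16 became %d bytes of Latin-1\n",
    length($utf16_data), length($latin1);
```

Going through Perl's character strings as the intermediate form means any pair of encodings supported by Encode can be converted with the same two calls.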

Assuming Everything is UTF-8

is_utf8()

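use Encode 'is_utf8';
if (is_utf8($string)) {
    # wrong! this only reports Perl's internal UTF8 flag,
    # not whether $string holds valid UTF-8 data
}
OR
if (utf8::is_utf8($string)) {
    # wrong! same flag, same problem
}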
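A sketch of one way to check whether a byte string is well-formed UTF-8, in contrast to the flag test. The helper looks_like_utf8 is a hypothetical name; it relies on decode() dying on malformed input when given Encode::FB_CROAK:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Return 1 if $bytes is a well-formed UTF-8 byte sequence, 0 otherwise.
sub looks_like_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input. Decode a copy,
    # because a CHECK argument allows decode() to modify its input in place.
    my $copy = $bytes;
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

my $valid   = "\xd7\xa9";   # the UTF-8 bytes for the Hebrew letter shin
my $invalid = "\xff\xfe";   # not a valid UTF-8 sequence

# The internal flag is off for both byte strings, so it cannot tell them apart.
printf "valid bytes:   flag=%d, well-formed=%d\n",
    (utf8::is_utf8($valid) ? 1 : 0), looks_like_utf8($valid);
printf "invalid bytes: flag=%d, well-formed=%d\n",
    (utf8::is_utf8($invalid) ? 1 : 0), looks_like_utf8($invalid);
```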

A UTF-8 Shortcut

Printing Unicode

use utf8::all;
use charnames ':short';
# double-quoted strings are required
print "\N{greek:Sigma} is an upper-case sigma.\n";
Using full names:
use utf8::all;
use charnames ':full';
print "\N{GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI}\n";
Using the code point:
use utf8::all;
print "\N{U+263A}\n";
Or use the chr() function:
print chr(0x263a);

Unicode Character Properties and Regexes

use utf8::all;
my $character = 'ἧ';
if ($character =~ /\p{Lowercase}/) {
    print "$character is lower case\n";
}
if ($character =~ /\p{Uppercase}/) {
    print "$character is upper case\n";
}

Further Reading

USEFUL MODULES

File::Find

One way to delete all empty text files in a directory and its subdirectories:
use File::Find;
find(\&wanted, 'some_directory/');
sub wanted {
    if (/\.txt$/ && -f $_ && -z _) {
        # only delete empty text files
        unlink $_ or die "Could not unlink '$File::Find::name': $!";
    }
}
Collect the names for later use:
my @html_docs; # declare before calling find() so the callback can fill it
find(\&html_documents, @directories);
sub html_documents {
    push @html_docs, $File::Find::name if /\.html?$/;
}
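A sketch of the same collection wrapped in a function with an anonymous callback, so the array is declared before find() runs and no file-scoped variable is needed. The name find_html_documents is hypothetical, and the added -f test (to skip directories whose names end in .html) is an assumption about the intent:

```perl
use strict;
use warnings;
use File::Find;

# Return the sorted paths of all .html/.htm files under the given directories.
sub find_html_documents {
    my @directories = @_;
    my @html_docs;
    find(
        sub {
            # Inside the callback, $_ is the bare file name and
            # $File::Find::name is the path including the directory.
            push @html_docs, $File::Find::name if -f $_ && /\.html?$/;
        },
        @directories
    );
    my @sorted = sort @html_docs;
    return @sorted;
}

# Search the current directory when run as a script.
print "$_\n" for find_html_documents('.');
```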

File::Path

use autodie ':all';
use File::Path qw(make_path remove_tree);
make_path('path/to/create/', 'another/path/to/create');
remove_tree('path/to/remove');

File::Find::Rule

Find HTML documents:
my @html_docs = File::Find::Rule
    ->file
    ->name(qr/\.html?$/)
    ->in(@directories);
Find empty files:
my @empty = File::Find::Rule->file->empty->in(@directories);
Converting a project from Subversion to git, deleting all of Subversion's annoying .svn dirs:
use File::Path 'remove_tree';
use File::Find::Rule;
my @svn_dirs = File::Find::Rule->directory->name('.svn')->in($dir);
foreach my $svn_dir (@svn_dirs) {
    remove_tree($svn_dir)
        or die "Cannot rmdir($svn_dir): $!";
}
Or write it like this:
File::Find::Rule->directory->name('.svn')->exec(sub {
    my ($short_name, $directory, $fullname) = @_;
    # $fullname is the path to the matched .svn directory
    remove_tree($fullname)
        or die "Cannot rmdir($fullname): $!";
})->in(@directories);
Using iterators:
my $find = File::Find::Rule
    ->file
    ->name(qr/\.html?$/)
    ->start(@directories);
while (defined(my $html_document = $find->match)) {
    # do something with $html_document
}
Print all files greater than a half meg:
File::Find::Rule
    ->file
    ->size('>.5M')
    ->exec(sub {
        my ($short_name, $directory, $fullname) = @_;
        print "$fullname\n";
    })->in(@ARGV);

Recursively Printing a Directory Structure

use strict;
use warnings;
use autodie ':all';
use File::Spec::Functions qw(catdir splitdir);
# The starting dir will be passed on the command line, otherwise, use the current dir
my $dir = @ARGV ? $ARGV[0] : '.';
unless (-d $dir) {
    die "($dir) is not a directory";
}
print_entries($dir, 0);
exit 0;
sub print_entries {
    my ($dir, $depth) = @_;
    my @directories = grep {$_} splitdir($dir);
    my $short_name = $directories[-1];
    my $prefix = '| ' x $depth;
    print "$prefix$short_name/\n";
    opendir(my $dh, $dir);
    # grab everything that doesn't start with a .
    my @entries = sort grep {!/^\./} readdir($dh);
    foreach my $entry (@entries) {
        my $path = catdir($dir, $entry);
        if (-f $path) {
            print "$prefix|--$entry\n";
        } elsif (-d _) {
            print_entries($path, $depth + 1);
        } else {
            # skip anything not a file or directory
        }
    }
}


