Archive for category Programming
Using Perl and ExifTool to Access EXIF Data in Digital Images
Posted by Spencer in Programming on September 2nd, 2009
Overview
EXIF data in digital images is a fairly complete snapshot of all camera and flash settings at the time of exposure. The information can be extremely useful in many ways, such as automatic image organization; metadata repositories for advanced searching; and unique file renaming, to name a few. In fact, Nikon NEF files have a full-size Basic JPG image stored in the EXIF information, along with a thumbnail-sized preview. Because of this, I never shoot RAW+JPG, since I get that JPG for free with every RAW file anyway — I just have to extract it. This article documents how to access this and other data using Perl and the ExifTool package, written by Phil Harvey (documented on his website, http://www.sno.phy.queensu.ca/~phil/exiftool/). Both are available as free downloads for several platforms, including Linux and Windows XP. I have used both packages on both platforms.
Reading Data
In order to use the package, it must be included at the top of the perl file:
#!/usr/bin/perl -w
use Image::ExifTool;
Reading data is fairly simple once a general process is established. Information is organized for the most part in a “key/value” structure, like a hash in perl (and in fact, data returned by the exiftool object is represented by a hash structure). The following perl code reads the Shutter Speed associated with a particular image:
my $srcfile = "/home/spencerkellis/dsc0001.jpg";
my $srcfileExif = new Image::ExifTool;
my @srcfileTagList = ('ShutterSpeed');
my $srcfileInfo = $srcfileExif->ImageInfo($srcfile,@srcfileTagList);
print "Shutterspeed: ".$$srcfileInfo{'ShutterSpeed'};
The code is straightforward enough; line 1 establishes a file (note the absolute path); line 2 instantiates an Image::ExifTool object; line 3 specifies which tags to read; line 4 actually reads the information; finally, line 5 prints the returned value.
A Simple Example: Renaming Files Based on EXIF Date and Time
One of my principal uses of perl and exiftool is to rename files automatically to a unique filename based on date and time, in the following format:
YYYYMMDD-HHMMSS-II.FFF
‘I’ stands for an “index”, to allow for cases in which multiple pictures were taken in the same second. ‘F’ represents the file format extension (i.e., JPG or NEF for Nikon RAW files). The process of renaming files, tedious by hand, is simple and efficient to automate with perl.
#!/usr/bin/perl -w
use File::Spec;
use Image::ExifTool;
my $p = shift;
#extract date from EXIF
my @ioTagList = ('DateTimeOriginal');
my $exifTool = new Image::ExifTool;
$exifTool->Options(DateFormat => "%Y%m%d-%H%M%S-");
my $info = $exifTool->ImageInfo($p, @ioTagList);
#create new filename
my $name = sprintf("%s00",$$info{'DateTimeOriginal'});
my ($pathvol,$pathdir,$pathfile) = File::Spec->splitpath($p);
#catpath is the inverse of splitpath, so relative paths stay relative
my $newfile = File::Spec->catpath($pathvol,$pathdir,$name.".JPG");
#write files
if( -f $newfile )
{
    print "skipping $p: already exists at $newfile\n";
    exit; #use return instead when this code lives inside a sub
}
if($p ne $newfile)
{
    rename($p,$newfile) or die "Error: could not rename $p to $newfile: $!\n";
}
Again, a straightforward example with a few extra lines to take note of. In this case, to simplify my perl code, I used an exiftool construct that allows options governing the output format to be set:
$exifTool->Options(DateFormat => "%Y%m%d-%H%M%S-");
This option instructs exiftool to output the date in a specific format corresponding to the filename format I described above. As a simple example, I did not include my method for handling multiple files with the same date and time; the code will simply skip renaming files where the target filename already exists. The sprintf line constructs the string holding the filename, and the File::Spec package is used to split apart and reconstruct paths. The code requires an argument (the path to a file); in my full script, this code is a sub, and passed the filename for each entry in a directory.
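The step left out above, choosing the two-digit index when several shots share the same second, could look something like the following sketch. It is written in Python purely for illustration and is not the method from my full script; `unique_name` and its arguments are hypothetical names, and the set of existing names stands in for a directory listing.

```python
from datetime import datetime

def unique_name(taken, ext, existing):
    """Return the first free YYYYMMDD-HHMMSS-II.EXT name not in `existing`.

    taken    -- datetime of the exposure (the DateTimeOriginal value)
    ext      -- file extension without the dot, e.g. "JPG" or "NEF"
    existing -- set of filenames already present in the target directory
    """
    stamp = taken.strftime("%Y%m%d-%H%M%S")
    for index in range(100):          # two digits allow indices 00..99
        candidate = f"{stamp}-{index:02d}.{ext}"
        if candidate not in existing:
            return candidate
    raise RuntimeError(f"no free index left for {stamp}")

taken = datetime(2005, 10, 8, 13, 36, 1)
print(unique_name(taken, "JPG", set()))
print(unique_name(taken, "JPG", {"20051008-133601-00.JPG"}))
```

The first call yields index 00; the second finds 00 taken and moves on to 01.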
Command Line Alternative
After spending quite a lot of time getting to know the exiftool package in perl, I started to think about ways to do it with simpler code. As it turns out, using the command line version of exiftool can result in much cleaner code. Consider the following, which is virtually the same perl script as above, except that it runs the exiftool executable instead of inline perl code:
use File::Spec;
my $p = shift;
#extract date from EXIF
my $date = `exiftool -d %Y%m%d-%H%M%S- -DateTimeOriginal -S -s $p`;
chomp $date;
$date .= "00";
my ($pathvol,$pathdir,$pathfile) = File::Spec->splitpath($p);
#catpath is the inverse of splitpath, so relative paths stay relative
my $newfile = File::Spec->catpath($pathvol,$pathdir,$date.".JPG");
#write files
if( -f $newfile )
{
    print "skipping $p: already exists at $newfile\n";
    exit; #use return instead when this code lives inside a sub
}
if($p ne $newfile)
{
    rename($p,$newfile) or die "Error: could not rename $p to $newfile: $!\n";
}
Notice that the entire chunk of code instantiating the exiftool object, etc. has been replaced by one line calling the executable in backticks. I haven’t done any testing to analyze which is more efficient, but it does look better, and it’s easier to conceptualize.
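One caveat with this approach: backticks hand the command string to a shell, so a path containing spaces or shell metacharacters needs quoting. A rough sketch of the same call in Python, passing the arguments as a list so no shell is involved, is shown below. The function names are hypothetical, the flags are copied from the command above, and `read_date_prefix` assumes exiftool is installed and on the PATH.

```python
import subprocess

def exiftool_date_command(path):
    """Build the argument list for the exiftool call shown above."""
    return [
        "exiftool",
        "-d", "%Y%m%d-%H%M%S-",   # same date format as the perl version
        "-DateTimeOriginal",
        "-S", "-s",               # same output flags as the backtick call
        path,                     # passed as one argument, spaces and all
    ]

def read_date_prefix(path):
    """Run exiftool (assumed to be on PATH) and return the formatted date."""
    result = subprocess.run(
        exiftool_date_command(path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Only builds the command; nothing is executed here.
print(exiftool_date_command("/photos/my trip/dsc0001.jpg"))
```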
EDIT 3 Sept 2009: Updated Command Line Alternative
Thanks so much to Phil Harvey, the author of ExifTool, for reading and commenting on this article. He suggested a much cleaner command-line alternative, so disregard the perl script above!
exiftool -d %Y%m%d-%H%M%S-%%.2c.%%e "-FileName<DateTimeOriginal" FILE
Writing Data
Let’s pose an issue based on a problem I faced some time back. Consider an automated process that copies NEF files into a source directory (all uniquely renamed), and creates a smaller JPG file specifically sized for the web (about 600×400). The process of copying and resizing, however, does not preserve the EXIF data, and I would like to restore the information for possible inclusion into my website’s database.
In order to do this, we need (1) the original EXIF information from the NEF source file; and (2) the ability to write or copy that EXIF data into the destination JPG file. The following perl accomplishes this task. The code below uses an image in the form of a “blob,” in this case what has been returned from the ImageMagick function “ImageToBlob()” which will be discussed in a different article soon.
#!/usr/bin/perl -w
use Image::ExifTool;
use Image::Magick;
my $srcfile = "/home/spencerkellis/20051008-133601-00.nef";
#webfile should not exist yet!
my $webfile = "/home/spencerkellis/20051008-133601-00.jpg";
#process image as needed
my $IM = Image::Magick->new(magick=>'jpg');
$IM->Read($srcfile);
# ... resize here ... #
#create blob
my $final_blob = $IM->ImageToBlob();
#copy exif
my $exifTool = new Image::ExifTool;
$exifTool->SetNewValuesFromFile($srcfile);
$exifTool->WriteInfo(\$final_blob,$webfile);
All of the Image::Magick section will be discussed in a different article. Basically, it’s a package to perform image manipulation in perl (and other languages) the same as if you had opened the image in GIMP or PhotoShop. The last two lines are where the magic happens; after instantiating the exiftool object, the second-to-last line retrieves the EXIF data from the NEF source file, and the last line creates a new file using the image information in $final_blob (the backslash preceding the variable creates a reference to the variable) and the EXIF data already stored in the $exifTool object. $webfile now has the same EXIF data as $srcfile!
Extracting Basic JPG from NEF
Accounting for about 700KB of a NEF’s size (usually around 5MB) is a full-sized Basic JPG file. Incorporating the ability to extract this file into an automated post-processing phase means never shooting RAW+JPG, which further means more space on a compact flash card! Not to mention potential space savings on a hard drive, and potential nightmares keeping track of which files have both NEF and JPG vs. NEF only or JPG only, vs. what has been edited for print or for web… the list goes on, and it’s a battle every digital photographer tackles at some point.
Using Perl and ExifTool, I wrote a script which will extract these basic JPGs automatically. The following code is incorporated into a larger script, but it shows the basic idea. I also want to note that there are simpler ways of running batch jobs to get JPGs out of NEF files; for instance, I have used and enjoyed Udi Fuchs’s UFRaw package on the command-line in batch mode, and it runs fairly quickly with pretty good output.
#!/usr/bin/perl -w
use Image::Magick;
use Image::ExifTool;
my $srcfile = "/home/spencerkellis/dsc0001.nef";
my $webfile = "/home/spencerkellis/dsc0001.jpg";
my @ioTagList = ('JpgFromRaw');
my $exifTool = new Image::ExifTool;
$exifTool->Options(Binary=>1);
my $info = $exifTool->ImageInfo($srcfile, @ioTagList);
#manipulation
$IM = Image::Magick->new(magick=>'jpg');
$IM->BlobToImage(${$$info{'JpgFromRaw'}});
# ... do manipulation here ... #
$final_blob = $IM->ImageToBlob();
#restore exif
$exifTool->SetNewValuesFromFile($srcfile);
$exifTool->WriteInfo(\$final_blob,$webfile);
undef $IM;
undef $final_blob;
There are a few things to note here. First, in order to extract binary data (i.e., a JPG image), we have to set the Binary option in the ExifTool object to ‘1’ (versus the default ‘0’). Since in my script I’m manipulating the image using the ImageMagick package, I instantiate the extracted JPG as a new ImageMagick object, do some manipulation, then export it as a blob. Then, since EXIF data has not been preserved, I restore it using ExifTool. The extracted JPG is finally written using ExifTool to the path specified as $webfile.
P.S. This article is a transfer of most of the content of an article on my old website.
Merging Google Syntax Highlighter with TinyMCE
Posted by Spencer in Programming on September 2nd, 2009
Syntax Highlighting in TinyMCE
TinyMCE is a powerful in-browser WYSIWYG editor. It’s used in well-known platforms such as WordPress to allow users the ability to edit blog posts right in the browser. I’ve been using TinyMCE for several years now, but until today I hadn’t found a decent solution for adding code snippets with syntax highlighting inside TinyMCE.
Google SyntaxHighlighter
Enter Google SyntaxHighlighter. You may have noticed the signature appearance on several web-design related websites: nettuts.com, davidwalsh.name, and scriptandstyle.com, just to name a few. It integrates several attractive features – decent syntax highlighting, cross-browser (all javascript/css), and the option to view plain text.
SyntaxHighlighter can be configured to use either ‘pre’ or ‘textarea’ elements (see this discussion for more details on the choice). In either case, add two attributes to the element and you’re set:
<pre name=code class=javascript></pre>
Unfortunately, working with textareas in TinyMCE is awkward at best (consider – what other use could there possibly be for textareas inside a WYSIWYG editor?). Okay, no textareas – no problem, just switch to the pre element! Here’s where TinyMCE’s powerful featureset gets in the way: the ‘name’ attribute isn’t technically supported for the pre tag, and TinyMCE will strip it from your HTML if you try and add it by viewing the code.
The cleanest way to get around this problem is to add an extended_valid_elements option to your tinyMCE init, and include pre[name] in the element list. TinyMCE will merge the extended_valid_elements with the default valid_elements to allow the name attribute along with already-allowed attributes.
tinyMCE.init({
mode : "textareas",
theme : "advanced",
extended_valid_elements : "pre[name]"
});
Be aware that caching can make it seem like your changes aren’t making any difference! If in doubt, clear your cache.
RichGuk’s syntaxhl TinyMCE plugin
There’s an easier way than editing HTML every time you want to add a code snippet. I found a great plugin – syntaxhl by RichGuk (installation instructions are included with the download).
By default, his plugin uses textareas but changing it to use pre tags is simple. Edit syntaxhl/js/dialog.js and replace all instances of the textarea tag with a pre tag (there are only two instances, opening and closing tags). The final version is shown below:
f.syntaxhl_code.value = f.syntaxhl_code.value.replace(/</g,'&lt;');
f.syntaxhl_code.value = f.syntaxhl_code.value.replace(/>/g,'&gt;');
textarea_output = '<pre name="code" ';
textarea_output += 'class="' + f.syntaxhl_language.value + options + '" cols="50" rows="15">';
textarea_output += f.syntaxhl_code.value;
textarea_output += '</pre> '; /* note space at the end, had a bug it was inserting twice? */
tinyMCEPopup.editor.execCommand('mceInsertContent', false, textarea_output);
tinyMCEPopup.close();
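The two replace calls at the top of the snippet are there to turn `<` and `>` into the entities `&lt;` and `&gt;`. The code is inserted as raw HTML, so any unescaped angle bracket in it would be parsed as markup. The same idea sketched in Python, as an illustration only and not part of the plugin (`build_pre` is a hypothetical helper):

```python
import html

def build_pre(code, language):
    """Wrap `code` in a SyntaxHighlighter-style pre element."""
    # quote=False escapes only &, < and >; quotes are safe inside element text.
    escaped = html.escape(code, quote=False)
    return f'<pre name="code" class="{language}">{escaped}</pre>'

print(build_pre("if (a < b) { swap(a, b); }", "javascript"))
```

Note that `html.escape` also escapes `&`, which the JavaScript above does not. Escaping it is the safer choice when a snippet may itself contain entities.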
You’ll need to get the newline and br options set up correctly to preserve whitespace in your code snippets.
Fully Integrated Syntax Highlighting
With syntaxhl integrated and working, I have single-button access to UI-level syntax highlighting. Writing tutorials is infinitely easier with a simple solution for sharing code. If you’re interested, I found a few alternatives along the way that might be better suited to your needs:
- Chili
- GeSHi
- ColourCode
- The Definitive Guide on WordPress Syntax Highlighter Plugins
- Alternatives listed in the SyntaxHighlighter Wiki
- vim
I hope this article has been useful. Let me know in the comments if you have suggestions or questions!
P.S. This article is a transfer of most of the content of an article on my old website.
MATLAB Tips and Tricks
Posted by Spencer in Programming on September 2nd, 2009
Over the course of my graduate work, I’ve spent a fair amount of time with MATLAB. These are a few small tricks I wish I had known from the beginning.
Preallocation
I am a believer in preallocation. For a particular application, I read in about 13GB of data from a file into a 4-D matrix (this was running on a machine with 32GB of memory). Before preallocation, I let the process run for about 10 hours, and it still hadn’t finished. With preallocation the process finished in about 20 minutes. That’s at least 30 times faster!
Permute
The permute function can be quite handy – it can shuffle around the dimensions in a matrix with a single function call. Consider the following matrix:
m_one = rand([2 5 4000]);
The size of this matrix is reported as 2×5×4000 in MATLAB. Now, we can shuffle the dimensions. Let’s make it a 4000×2×5 array:
m_one = permute(m_one, [3 1 2]);
The permute() function takes the array to shuffle around, and the new order of dimensions. In this case, the 3rd dimension moved to become first, 1st dimension second, and 2nd dimension last so that a 2×5×4000 matrix becomes a 4000×2×5 matrix.
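To make the order argument concrete, here is a small pure-Python sketch of the same bookkeeping. It is only an illustration of the index mapping, not how MATLAB implements permute; `permute_shape` and `permute3` are hypothetical helpers, and `order` is kept 1-based to match MATLAB.

```python
def permute_shape(shape, order):
    """Shape after a MATLAB-style permute; `order` is 1-based like MATLAB."""
    return tuple(shape[d - 1] for d in order)

def permute3(a, order):
    """Permute a 3-D nested list the way permute(a, order) would."""
    old = (len(a), len(a[0]), len(a[0][0]))
    new = permute_shape(old, order)
    out = [[[None] * new[2] for _ in range(new[1])] for _ in range(new[0])]
    for i in range(old[0]):
        for j in range(old[1]):
            for k in range(old[2]):
                src = (i, j, k)
                # New dimension n takes its index from old dimension order[n].
                x, y, z = (src[d - 1] for d in order)
                out[x][y][z] = a[i][j][k]
    return out

print(permute_shape((2, 5, 4000), (3, 1, 2)))   # (4000, 2, 5)

# A 2x3x4 test array whose values encode their own indices.
small = [[[100 * i + 10 * j + k for k in range(4)] for j in range(3)] for i in range(2)]
moved = permute3(small, (3, 1, 2))
print(moved[3][1][2] == small[1][2][3])         # True
```

With the order [3 1 2], element (i, j, k) of the original ends up at (k, i, j) in the result.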
Vector notation
Be careful, though: as handy as permute can be, it’s easy to use it inefficiently. Remember that 13GB 4-D matrix? I ran permute on that, and memory usage immediately doubled. In general, I recommend creating the data the right way first! It will save a lot of headache (and RAM) down the road.
If you desperately need only a subset of dimensions, an alternative solution is to use MATLAB’s built-in, efficient vector notation. For example, to extract the first and third dimensions for a single 2nd-dimension element, just use
m_two = m_one(:,1,:);
The one downside here is that you’ll end up with an annoying singleton dimension that can frustrate other builtin functions like plot. The squeeze function will rescue us.
Squeeze
Squeeze is cool. After running the previous code, size(m_two) shows us that m_two is a 4000×1×5 matrix. We could use indexing to access all these elements, but squeeze will make life much easier – it will remove the singleton dimension in the middle.
m_two = squeeze(m_two);
Now, size(m_two) tells us we’ve got a 4000×5 matrix and using the matrix just got that much simpler.
Removing elements from vectors and matrices
There are times when you want to discard elements from a vector or matrix. I used to do this by creating a new variable to hold just the elements I wanted to keep. Obviously, there’s a better way. Let’s remove all the elements of a matrix that are less than 0.5. It’s insanely easy:
m_three = rand(1,1000);
m_three( m_three < 0.5 ) = [];
Now, size(m_three) gives roughly 1x500; the exact count varies from run to run because the values are random.
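Since rand draws uniformly between 0 and 1, about half of the 1000 values survive the mask. The same logical-mask deletion sketched in Python, for illustration only (the fixed seed is an assumption added to make the run repeatable):

```python
import random

random.seed(0)                       # fixed seed so the run is repeatable
m_three = [random.random() for _ in range(1000)]

# Equivalent of m_three(m_three < 0.5) = [] : keep what the mask rejects.
kept = [x for x in m_three if not (x < 0.5)]

print(len(kept))                     # roughly half of the original 1000
print(all(x >= 0.5 for x in kept))
```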
Conclusions
If you haven't noticed yet, MATLAB is all about the matrices. Understanding how to efficiently operate on subsets of matrices will give you huge returns in performance. Learn how and when to use permute, squeeze, and vector notation and you'll be well on your way. Anything else you think should be on this page? Let me know in the comments!
P.S. This article is a transfer of most of the content of an article on my old website.










