Keeping your ebooks regular: how to use regular expressions
If you’ve never heard of regular expressions, or regex as boffins call them, and you’re about to jump into the world of ebook formatting, you might like to become familiar with the concept.
Regular expressions “provide a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters” (Wikipedia).
I think they are the coolest thing when it comes to ebook formatting; they also come in handy for tidying up messy Word imports into apps such as InDesign (CS3 onward).
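To give a feel for what "patterns of characters" means in practice, here is a minimal sketch in Python of the kind of tidy-up you might run on text imported from Word. The function name `tidy_import` and the three clean-up rules are my own assumptions for illustration; your own imports will have their own quirks.

```python
import re

def tidy_import(text):
    """Apply a few illustrative clean-up rules to imported text (hypothetical helper)."""
    # Collapse runs of two or more spaces or tabs into a single space.
    text = re.sub(r'[ \t]{2,}', ' ', text)
    # Remove spaces left in front of punctuation marks.
    text = re.sub(r' +([,.;:!?])', r'\1', text)
    # Reduce three or more consecutive line breaks to a single blank line.
    text = re.sub(r'\n{3,}', '\n\n', text)
    return text

if __name__ == "__main__":
    print(tidy_import("Hello   world ,  this is  messy .\n\n\n\nNext"))
```

Each rule is one short pattern, which is the appeal: a plain find and replace would need a separate pass for every possible number of spaces or line breaks.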
Boning up on regular expressions is a must if you’re about to step into digital publishing through the likes of Amazon’s Digital Text Platform, unless of course you outsource the code conversion as we do.
They will save you massive amounts of time in presenting nice clean code to Amazon — or the Sony Ebookstore, or Kobo, or the iBookstore …
An example: suppose your source code contains a lot of URLs without the HTML tags that turn them into hyperlinks:
www.redhillpublishing.com
This is missing the anchor tags and should look like:
<a href="http://www.redhillpublishing.com">Red Hill Publishing</a>
A regular expression can find every bare URL like the first so that a find and replace can turn it into the second form. Big deal? It is if you need to convert a few hundred links, as I recently did. Here’s the pattern I used:
(\b([\d\w\.\/\+\-\?\:]*)((ht|f)tp(s|)\:\/\/|[\d\d\d|\d\d]\.[\d\d\d|\d\d]\.|www\.|\.tv|\.ac|\.com|\.edu|\.gov|\.int|\.mil|\.net|\.org|\.biz|\.info|\.name|\.pro|\.museum|\.co)([\d\w\.\/\%\+\-\=\&\?\:\\\"\'\,\|\~\;]*)\b)
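A pattern like that only finds the URLs; the wrapping happens in the replacement step. As a rough sketch of the whole round trip in Python, the example below uses a much simpler pattern of my own (an assumption, not the one above) that only looks for bare addresses starting with `www.`. The names `BARE_URL` and `link_bare_urls` are hypothetical. Note that a regex cannot know a site's name, so this sketch uses the URL itself as the link text rather than something like "Red Hill Publishing".

```python
import re

# Simplified pattern (an assumption for this sketch): bare "www." addresses only.
# The lookbehind skips URLs preceded by a word character, "/" or ".", so an
# address already written as http://www... inside an href is left alone.
BARE_URL = re.compile(
    r'(?<![\w/.])'                  # not in the middle of a longer URL or word
    r'www\.[\w-]+(?:\.[\w-]+)+'     # www.name.tld, with any number of dots
    r'(?:/[\w./%+=&?~-]*[\w/])?'    # optional path that does not end in punctuation
)

def link_bare_urls(html):
    """Wrap each bare www. URL in anchor tags, using the URL as the link text."""
    def wrap(match):
        url = match.group(0)
        return f'<a href="http://{url}">{url}</a>'
    return BARE_URL.sub(wrap, html)

if __name__ == "__main__":
    print(link_bare_urls("Visit www.redhillpublishing.com."))
```

This sketch would still re-wrap a URL that appears as the visible text of an existing link, so it is best run on source that has no anchor tags yet.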
But regular expressions can be used for more mundane tasks such as applying anchor links between the start of chapters and a table of contents, or performing global changes on variable text ranges where a simple find and replace won’t work.
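The chapter and table of contents case might look something like this in Python. It is only a sketch under a big assumption: that every chapter title in your source sits in a plain `<h2>` tag with no attributes. The function name `add_chapter_anchors` and the `ch1`, `ch2` id scheme are made up for illustration.

```python
import re

# Assumption: chapter titles are marked up as <h2>Title</h2> with no attributes.
CHAPTER_HEADING = re.compile(r'<h2>(.*?)</h2>')

def add_chapter_anchors(html):
    """Give each chapter heading an id and return matching table of contents links."""
    toc = []

    def tag(match):
        title = match.group(1)
        anchor = f'ch{len(toc) + 1}'   # ch1, ch2, ... in order of appearance
        toc.append(f'<a href="#{anchor}">{title}</a>')
        return f'<h2 id="{anchor}">{title}</h2>'

    body = CHAPTER_HEADING.sub(tag, html)
    return body, toc

if __name__ == "__main__":
    body, toc = add_chapter_anchors('<h2>Chapter One</h2><p>Text</p><h2>Chapter Two</h2>')
    print(body)
    print("\n".join(toc))
```

The title is different in every heading, which is exactly the "variable text" situation where a simple find and replace runs out of road.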
Bing “regular expression” and you’ll find a huge array of resources.