Building a business case for COPE (part one): organisational agility

In my last post (Create Once, Publish Everywhere: what does it all mean?) we looked at the key principles of COPE and how we are planning to implement them (at least in broad terms) at CIPD.

Over the coming weeks I’ll be following up on that post by rehearsing some of the arguments for COPE at CIPD: what benefits exactly will it bring? Although specific to one particular business – the CIPD – our case for COPE will undoubtedly strike a chord with other content-rich organisations looking to persuade their leadership teams to invest in a modern content management capability.

Why it’s difficult to build a business case for content

We’re currently putting together a business case for Project Athena: a significant investment for CIPD in terms of money, time, resource and organisational change. But building a business case for content isn’t straightforward. The difficulty we face is that content is notoriously hard to cost. There are some measurable production savings, for example DTP (page layout) and print. But – in common with most organisations – time spent creating, managing, reviewing, searching for and updating content is not tracked at CIPD. Therefore establishing precise cost savings and return on investment is tricky.

We still need to demonstrate the tangible cost savings and potential revenue opportunities, but to build a more nuanced argument for COPE we must look beyond the balance sheet. Substitute ‘COPE’ for ‘content strategy’ in Kristina Halvorson’s famous quote, and you get the picture.

Content strategy defines how you’re going to use content to meet your business goals and satisfy your users’ needs.

Kristina Halvorson, Content Strategy for the Web

So what we’ll be looking to do is build a business case that demonstrates how COPE (Project Athena) aligns with (and helps to deliver) CIPD’s strategic imperatives and satisfies our customers’ needs. But beyond even that, how COPE will help CIPD not only to cope with, but to thrive in, these challenging economic times. And that’s what we’ll be looking at in this first post on building a business case for COPE.

Agility: building a business case that aligns with our corporate values

Agility is one of CIPD’s core organisational values. As an Institute and as employees, we try to be Purposeful, Agile, Collaborative and Expert in everything we do. So by couching COPE in terms of ‘agility’ I’m hoping to align our content strategy with our corporate values. But it’s not for nothing that agility is one of our core values. To remain viable and relevant in the 21st century, businesses must become agile.

First we’ll look at the economic factors at play that are driving this need for organisational agility. We’ll go on to look at what I believe are the three crucial ingredients needed to develop organisational agility with a particular focus on the need for technical agility. And in the next post we’ll look specifically at how COPE builds business agility. Because let’s face it – it really is crazy out there!

Why do organisations need to be agile? Or: ‘Hey it’s crazy out there!’

Volatility is the new reality

According to the Economist Intelligence Unit’s 2009 report, the seismic events of 2008 introduced a new phase of globalisation, ‘one in which volatility is likely to remain a constant’. The recession may be lifting in some markets, but

… underlying fluctuations in energy, commodity and currency rates, the emergence of new and non-traditional competitors, and rising customer demands will continue to roil traditional business and operating models for some time to come.

Economist Intelligence Unit, Organisational agility: how business can survive and thrive in turbulent times (2009)

The end of competitive advantage

In her book The End of Competitive Advantage, Professor Rita Gunther McGrath argues that the very concept of sustainable competitive advantage is no longer relevant. In this VUCA (volatile, uncertain, complex and ambiguous) world of ours, business behemoths that once appeared invincible to the vagaries of the market adapt too slowly to changes in technology and consumer taste, and disappear. Kodak is the example most often cited, crashing from market leader to yesterday’s news due to its inability to adapt to the digital revolution.

The rise of freeconomics

If you’re not using free, you will be competing with free

Freemium – a term coined by venture capitalist Fred Wilson, combining the words ‘free’ and ‘premium’ – is a business model most closely associated with internet start-ups, best exemplified by services such as Skype, Dropbox, Flickr and Evernote. The core product is given away for free to a large user base; a small group of users (often under 10%) pay for the premium service.

With the majority of content and services given away for free, freemium inevitably poses a significant threat to ‘traditional’ businesses. And it’s not going away any time soon: Pew Research Center’s Digital Life in 2025 predicts the continued ‘disruption of business models established in the 20th century (most notably impacting … publishers of all sorts and education).’

Organisational agility: strategies for volatile times

Increasingly, top executives understand that it is not simply by battening down the hatches and tightening control of operating costs that they will weather the economic storms. This storm is only going to get stormier; some additional strategies are required.

According to that report by the EIU, ‘organisational agility is a core differentiator in today’s rapidly changing business environment. Nearly 90% of executives surveyed … believe that organisational agility is critical for business success.’

In fact agile businesses can grow revenues up to 37% faster than their non-agile competitors (Peter Weill, MIT).

So what makes an organisation agile?

Businesses must be constantly vigilant to threats and opportunities, and prepared, agile if you will, to respond as they emerge.

It seems to me that the three crucial ingredients of organisational agility are an agile mindset, agile working practices and technical agility.

The agile mindset

This agile mindset is defined very nicely in CIPD’s Shaping the Future research (2011) as the ‘ability to stay open to new directions and be continually proactive, helping to assess the limits or indeed risks of existing approaches and ensuring that leaders and followers have an agile and change-ready mindset to enable them and ultimately the organisation to keep moving, changing, adapting’.

The agile mindset acknowledges that change, adaptation and re-calibration are constant, and that, rather than being a threat, this offers opportunities for learning, growth and new ventures at an organisational as well as an individual level.

The agile workplace

According to CIPD’s Getting Smart about Agile Working report (2014) agile working is

A set of practices that allow businesses to establish an optimal workforce and provide the benefits of a greater match between the resources and the demand for services, increased productivity, and improved talent attraction and retention.

This means flexible working arrangements in terms of when people work, where they work and what they do.

Added to this is the percolation of some of the theories of agile methodology from software development teams through to the more general working environment.

Agile teams rely on self-organisation, iterations, customer centricity, knowledge sharing and collaboration, and mutual trust.

CIPD Getting Smart about Agile Working (2014)

This is something we’re seeing introduced at CIPD: for example we’re working out loud, across silos, with projects broken up into 90-day sprints.

Building technical agility

The third and final component of organisational agility is, inevitably I suppose, the investment in and optimisation of a business’s information technology.

Businesses must look to invest in technologies that ‘improve decision making and convert information into insight’ (EIU, 2009); systems that support knowledge gathering and analysis (e.g. customer and market insights and business intelligence), as well as knowledge sharing and collaboration tools.

Of equal importance is the technology that supports an organisation’s core business, its ‘true engine of growth’ (EIU, 2009). Here it plays two roles: firstly, the optimisation of production processes (the ‘whittling away at inefficiency’); and secondly, the ‘re-grouping around what is truly core to the business’ (EIU, 2009).

CIPD is a knowledge business; our content is a key strategic asset. We need to be re-grouping around content, investing in technology that supports the efficient production, management and dissemination of content.

The agility paradox

There is higher agility in firms with more digitized and standardized business process and platform.

Enterprise Architecture as Strategy: Creating a Foundation for Business Execution, Ross, Weill & Robertson, Harvard Business School Press, June 2006

Ironically, it is through standardisation that a business builds agility – Peter Weill’s ‘agility paradox’. Funnily enough, this echoes Rachel Lovinger’s observation that ‘ironically, it’s more structure that makes content nimble and sets it free’ (Nimble Report). We’ll be looking at this more when we explore content agility, another piece of the business case argument coming soon.

The four categories of agility (three out of four ain’t bad)

In the Agility Paradox, Peter Weill (quoting Jeanne Ross, MIT CISR Research Workshop May 2006) talks about the four categories of business agility. COPE satisfies three out of the four, which let’s face it, ain’t half bad!

1. Business Efficiency Agility

As we’ll see in future posts, COPE introduces a number of process efficiencies to the content workflow – creation, management, translation, transformation, publication and distribution. It is also a scalable solution, at least in IT terms – the people side of the equation, resourcing and skills, will need more careful consideration.

2. Business Model Agility

With its excellent translation and localisation capabilities (which we’ll explore more in a future post looking at how COPE aligns to our strategic imperatives) the implementation of COPE at CIPD will support new business models, particularly the ability to enter new global markets. Additionally ‘intelligent’ content – atomised, free of system- and presentation- formatting, semantically rich and structurally logical – will enable us to deliver content to new platforms, channels and partners as they emerge.

3. New Product Agility

As we shall see in the next post (COPE and Business Agility), COPE supports business innovation, with its seemingly infinite capacity to combine and re-combine atoms of content and transform them into new customer propositions. It would seem that the necessity of process efficiency is the mother of invention.

Open standards – the ultimate technical agility

And lastly, but by no means least-ly, that technical agility needs to be built upon open standards and open platforms. (There’ll be a post on this in a few weeks’ time.) It is only when the systems you invest in can connect to and easily communicate with one another that businesses can claim to be truly technically agile.


Create Once, Publish Everywhere: what does it all mean?

Create Once, Publish Everywhere. COPE: a great acronym and a lofty ambition. But what does it actually mean? And what does it mean for the organisation that I work for – CIPD?

The name we’ve given the initiative to introduce COPE at CIPD is ‘Project Athena’. (We like to use Greek Gods as our project ‘code names’ at CIPD! We already have Caerus and Aurora in the pantheon.) Athena is the goddess of wisdom and the goddess of war, which seems appropriate, as we’re going to war on the chaos of content.

I’m currently working on the business case for COPE at CIPD. We’re looking for a large investment: not just in terms of money, but also time, resource and organisational change. Which means it had better be a convincing business case!

In this first blog post, I’ll explore the philosophy behind COPE, and in subsequent posts I’ll be rehearsing some of the arguments I’ll be using in the business case. Many of these are arguments and scenarios that we’ve been using for a while to help colleagues better understand the concept and how it will benefit our organisation.

What is COPE?

The term Create Once, Publish Everywhere was devised back in 2009 to describe the content management strategy Daniel Jacobson and his team were developing at US National Public Radio (NPR). (See Jacobson’s blog post for a full account of COPE at NPR.)

COPE is a philosophy rather than a blueprint; it is, as Jacobson says ‘agnostic as to the build or buy/integrate decision’. COPE is a single source, multi-channel content management approach based on the following principles:

  1. Content should be encoded with semantic meaning, but be free of display and system-specific formatting.
  2. Content should be broken into, stored and managed as modular units.
  3. Content should be managed via a CMS (content management system) rather than from web-publishing tools.
  4. Content is published via an API (Application Programming Interface) distribution layer.

COPE has become the benchmark for content-rich organisations that wish to originate and manage content from a single source for distribution in many combinations, across multiple platforms, devices and channels.

Diagnosing the problem

Infobesity: a cute name for a debilitating condition (Alex Ewerlöf)

CIPD is an organisation with a lot of content: a veritable chaos of content! We’re suffering from an ‘infobesity crisis’, corporate cognitive overload.

Our corporate knowledge assets are isolated ‘content islands’, stranded from one another, managed in many versions, in different formats on shared drives. We have no way of interrogating our content, often resulting in re-commissioning of content that already exists, and making it difficult to speak with a consistent voice, confusing customers and damaging the brand.

We need our content to do a lot of things for us, to educate, inspire, inform. We have a diverse portfolio of content – learning, research, informational, textbooks. And a diverse range of content outputs, formats, channels and platforms that must satisfy a broad range of customers with increasingly sophisticated digital habits and expectations.

It is increasingly difficult for our current content management workflows to support CIPD’s strategic objectives.

At CIPD, we’ve been working with Content Strategy consultancy Mekon to come up with a COPE framework that will work for our business.

How will we COPE at CIPD?

The four foundational principles of COPE at CIPD are as follows:

  1. DITA XML
  2. Global metadata and taxonomy
  3. Content models
  4. CCMS and API layer

These map to NPR’s COPE principles in the following way:

  • DITA XML and the global metadata framework encodes content with structural and semantic meaning, but leaves it free of system-specific formatting.
  • Content models allow for content to be broken into, stored and managed as modular units.
  • The proposed tools and technology will manage the primary content in a CCMS rather than a web-publishing tool, or e-learning platform, etc.
  • An API layer will handle distribution to onward channels: website, Virtual Learning Environment, etc.


DITA XML

A much healthier looking avian: the DITA birdie.

Darwin Information Typing Architecture (DITA) is an open XML standard for marking up and publishing content. It takes the name ‘Darwin’ because it has been built so that it can evolve to support a broad range of information types, from technical communication (where it originated) to books, journals and educational materials.

DITA supports the re-use of content. It treats content as discrete ‘chunks’ (topics), so that a product might be built by pulling the relevant topics together by means of a ‘map’. Topics can then be included (re-used) in many maps (documents).
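
As a toy illustration of topics and maps – not valid DITA (real topics carry DOCTYPE declarations and a much richer element set) – the assembly-by-reference idea can be sketched in Python:

```python
import xml.etree.ElementTree as ET

# Two re-usable topics, each stored once.
topics = {
    "agility.dita": "<topic id='agility'><title>Agility</title>"
                    "<body><p>Volatility is the new reality.</p></body></topic>",
    "cope.dita": "<topic id='cope'><title>COPE</title>"
                 "<body><p>Create once, publish everywhere.</p></body></topic>",
}

# A map assembles a 'document' by reference; the same topics can
# appear in many maps without ever being copied.
ditamap = ("<map><title>Business case</title>"
           "<topicref href='agility.dita'/>"
           "<topicref href='cope.dita'/></map>")

def resolve(map_xml):
    """Pull the referenced topic titles together, in map order."""
    root = ET.fromstring(map_xml)
    return [ET.fromstring(topics[ref.get("href")]).findtext("title")
            for ref in root.iter("topicref")]

print(resolve(ditamap))  # ['Agility', 'COPE']
```

Because the map only points at topics, updating `agility.dita` updates every document that references it.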

Compared to other XML publishing standards, such as DocBook or NLM (for journals), DITA is more agile and more suited to the modularisation required of COPE and multi-channel publishing.

The other significant advantages of DITA XML are that it supports the automated updating of re-used content, cascading changes to wherever the topics are referenced, and audience profiling, allowing for the filtering, flagging and display of content according to certain audience attributes (e.g. persona, qualification level, job type).
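
Audience profiling works along the lines of a DITAVAL filter file: at publish time, elements profiled for a different audience are stripped out. A toy sketch (simplified attribute handling, not the DITA-OT implementation):

```python
import xml.etree.ElementTree as ET

topic = ET.fromstring(
    "<topic id='cpd'><title>CPD guidance</title><body>"
    "<p>General guidance for all readers.</p>"
    "<p audience='member'>Extended guidance for CIPD members.</p>"
    "</body></topic>"
)

def exclude(root, audience):
    """Remove elements profiled for a different audience (DITAVAL-style)."""
    for parent in list(root.iter()):
        for child in list(parent):
            aud = child.get("audience")
            if aud is not None and aud != audience:
                parent.remove(child)
    return root

# Build the public (non-member) output from the same single source.
public = exclude(topic, "non-member")
print(ET.tostring(public, encoding="unicode"))
```

The same source topic, filtered differently, yields a member and a non-member output.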

Global metadata framework

Metadata is the encoded knowledge of your organisation.

Metadata is the glue … that allows computers to be ‘smart’. It’s the stuff that makes ‘intelligent content’ intelligent.

Ann Rockley

Metadata provides the additional context that tells software programs how to handle content. XML itself is not concerned with display; it is the presentation layer (CSS or XSL-FO) that uses the instructions encoded in the content – the metadata – to deliver it in a specific way according to the device and platform it is being accessed on. Metadata encodes structural, descriptive and administrative information (for the CCMS and workflow tools) so that organisations can retrieve, track and report on their content. It enables customisation, targeting specific chunks of content at particular devices or audience segments. And it helps internal users to find content, and external users to find what they are looking for, chase elusive ideas and make connections.
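
In DITA, this descriptive and administrative metadata lives in a topic’s prolog. A toy sketch of reading it back out for tracking and reporting (the element names follow DITA’s prolog model, but the example is heavily simplified):

```python
import xml.etree.ElementTree as ET

topic = ET.fromstring(
    "<topic id='fs01'><title>Flexible working</title>"
    "<prolog><author>J. Bloggs</author>"
    "<critdates><created date='2014-06-01'/></critdates>"
    "<metadata><audience type='practitioner'/>"
    "<keywords><keyword>flexible working</keyword>"
    "<keyword>agile working</keyword></keywords></metadata>"
    "</prolog></topic>"
)

def report(t):
    """Administrative metadata lets the CCMS retrieve, track and report."""
    return {
        "id": t.get("id"),
        "author": t.findtext("prolog/author"),
        "created": t.find("prolog/critdates/created").get("date"),
        "keywords": [k.text for k in t.iter("keyword")],
    }

print(report(topic))
```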

CIPD already had a metadata framework in place before we began our content strategy initiative. But metadata was applied to the web pages in the Web CMS rather than the source content. Only a small subset of CIPD’s content is managed in the WCMS, so it is impossible to interrogate the corporate content assets as a whole.

The implementation of COPE at CIPD includes a global metadata framework that can be applied across all content, systems and enterprise documents, and a new corporate taxonomy.

Corporate taxonomy

We’ve been working with information strategy consultancy Metataxis on our global metadata framework. As part of that work they’ve been developing a new corporate taxonomy.

A taxonomy is, in its simplest sense, the words we use to describe ourselves and the areas in which we operate. Our ‘domain space’ if you will. Our taxonomy will be fundamental to our ability to search our content within and across all of our internal and customer-facing content platforms. Coupled with a smart CRM strategy (which we’re also currently embarking on at CIPD), taxonomy will enable us to make connections between customers’ experiences and preferences and the content we deliver to them.

The taxonomy we’re currently building is of course a snapshot of CIPD in 2014. So we’ll be investing in a ‘Thesaurus’ metadata management tool, and setting up workflows, governance and QA processes to build the agility to flex and adapt the taxonomy as our domain space – CIPD and the world of work – changes.
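
Underneath, a taxonomy is a controlled vocabulary with broader/narrower relationships between terms. A minimal sketch (the terms are invented for illustration, not drawn from CIPD’s actual taxonomy):

```python
# Each term maps to its broader (parent) term; None marks a top concept.
broader = {
    "People management": None,
    "Learning and development": "People management",
    "Coaching": "Learning and development",
    "Mentoring": "Learning and development",
    "Employment law": "People management",
}

def narrower(term):
    """All terms underneath `term` – useful for query expansion in search."""
    children = [t for t, b in broader.items() if b == term]
    return children + [d for c in children for d in narrower(c)]

# A search for 'Learning and development' can be expanded to also match
# content tagged 'Coaching' or 'Mentoring'.
print(narrower("Learning and development"))  # ['Coaching', 'Mentoring']
```

A thesaurus management tool adds synonyms, scope notes and governance on top of exactly this kind of structure.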

Content models

One of the founding principles of COPE is that content is broken into, stored and managed as modular units. DITA works particularly well in this regard, based as it is on the precept that content is managed as topics. A content model describes the framework for how to architect and ‘chunk’ the content into intelligent components, so that it might be managed, re-used and re-assembled.

Content models should always be the output of a dialogue between content technologists and the business, and involve an audit of existing content and a judgement as to what an organisation might want to do with its content going forward. Content modelling must ask the questions:

  • Where and when might we want to re-use these chunks of content?
  • What are the potential destinations and audiences?

Content models also act as authoring templates to ensure content types are consistent and ‘on brand’.
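
A content model can also be made machine-checkable, so drafts that don’t fit the template are caught before they enter the CCMS. A minimal sketch, using an invented ‘factsheet’ content type:

```python
# A content model: the components a 'factsheet' must (and may) contain.
FACTSHEET_MODEL = {
    "required": ["title", "summary", "body", "author"],
    "optional": ["related_links", "audience"],
}

def validate(draft, model):
    """Return the problems that would stop this draft entering the CCMS."""
    allowed = set(model["required"]) | set(model["optional"])
    problems = [f"missing: {f}" for f in model["required"] if f not in draft]
    problems += [f"unknown: {f}" for f in draft if f not in allowed]
    return problems

draft = {"title": "Flexible working", "body": "...", "colour": "blue"}
print(validate(draft, FACTSHEET_MODEL))
# ['missing: summary', 'missing: author', 'unknown: colour']
```

In a DITA toolchain this role is played by the DTD or schema, but the principle is the same: the model polices consistency.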

CCMS and API layer

We’re currently reviewing candidate Component Content Management Systems (I’ll be blogging on this process in a few weeks’ time). But whichever platform we end up investing in, the principles are the same.

  • Content will be authored in a simple tool – authors will be writing in DITA XML, in chunks, but they won’t see any code.
  • Within this authoring environment our Subject Matter Experts – the people writing the content – can add the right taxonomical labels (the terms that describe what it is that they are creating). And assuming our onward channel delivery platforms are interoperable, based on open standards, that structure and meaning will flow through the content value chain, on its journey to our customers, however they find or consume it. (See my forthcoming post on Open standards and open platforms.)
  • The CCMS will handle version control, workflow, updates, and re-use. And when we say create once, ‘publish everywhere’ we mean publish everywhere. We will be able to efficiently manage translation and localisation workflows, supporting CIPD’s international ambitions.
  • The XML is sent to a transformation engine (the DITA Open Toolkit) for conversion to different outputs, such as e-book and PDF. And from PDF we can print.
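
The heavy lifting here is done by the DITA Open Toolkit, but the underlying principle – walk the semantic XML once, emit presentation markup per output – can be sketched in a few lines (a toy stand-in, not how DITA-OT actually works):

```python
import xml.etree.ElementTree as ET

topic = ET.fromstring(
    "<topic id='t1'><title>Agile working</title><body>"
    "<p>Work out loud.</p><p>Break projects into sprints.</p></body></topic>"
)

def to_html(t):
    """One output pipeline: map semantic elements to HTML tags."""
    paras = "".join(f"<p>{p.text}</p>" for p in t.find("body"))
    return f"<h1>{t.findtext('title')}</h1>{paras}"

def to_plain(t):
    """Another pipeline from the same source, e.g. as a step towards print."""
    lines = [t.findtext("title").upper()]
    lines += [p.text for p in t.find("body")]
    return "\n".join(lines)

print(to_html(topic))
print(to_plain(topic))
```

Two outputs, one source: the content itself is never duplicated or reformatted by hand.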

But to truly exploit the potential of the CCMS – what I like to think of as the ‘foundational’ content technology – we will need to invest in an API layer, some kind of content server that can handle delivery to the various platforms and destination channels. This is what’s known as dynamic publishing, real-time delivery of targeted, personalised content to customers based on their individual interests and preferences.
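
What such an API layer hands to each channel might look like this minimal sketch (the endpoints, fields and IDs are invented for illustration; NPR’s actual API differs):

```python
import json

# The CCMS holds one canonical copy of each content component.
STORE = {
    "a1": {"title": "Agility", "summary": "Why agility matters.",
           "tags": ["agility", "strategy"]},
    "a2": {"title": "COPE", "summary": "Create once, publish everywhere.",
           "tags": ["content", "strategy"]},
}

def api_get(tag=None):
    """One endpoint, many consumers: the website, VLE and apps all call
    the same API and apply their own presentation to the JSON returned."""
    items = [dict(id=k, **v) for k, v in STORE.items()
             if tag is None or tag in v["tags"]]
    return json.dumps(items)

print(api_get(tag="content"))
```

Because the API returns structure rather than presentation, a new channel is just another consumer – no re-authoring required.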

Come along for the ride

So this is the journey we’re embarking on at CIPD. I’ll be blogging about it as we navigate the highs and the lows (hopefully not too many lows). I hope these posts will be of use to you if you’re interested in content strategy, or if you’re trying to implement something similar in your own organisation. If so: good luck and bonne chance ;-).

Open access and content visibility

I work for CIPD, a membership organisation for HR and Learning and Development (L&D) professionals. And as a membership organisation we provide certain benefits to our members. One of which is privileged access to some content, whereas other content is free to all.

But there’s a big issue with gated content: it’s invisible. Gated content can’t be found and so isn’t shared. Content whose purpose is to build brand awareness and authority should be free to access.

Search engine ‘spiders’ (the programs that ‘crawl’ and index web pages) cannot reach gated content. Spiders aren’t CIPD members and they don’t have logins! (Registration walls are just as much of a barrier to spiders in this respect.) At CIPD we try to get around this by including a short summary, a sample quote and a brief contents list in HTML on the landing page (i.e. in front of the barrier).

However, this offers insufficient keyword density for optimal page ranking.
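
‘Keyword density’ is simply the share of a page’s words that match the target term, which is why a thin teaser struggles against a full-text page. A quick sketch of the comparison (the example text is invented):

```python
def keyword_density(text, keyword):
    """Occurrences of `keyword` as a fraction of all words on the page."""
    words = text.lower().split()
    return words.count(keyword.lower()) / len(words)

teaser = "A short summary of our absence management report and its contents"
full = ("Absence management is a core HR discipline. This absence "
        "management report surveys absence levels, absence costs and "
        "management responses across UK organisations.")

print(round(keyword_density(teaser, "absence"), 3))
print(round(keyword_density(full, "absence"), 3))
```

Real search engines weigh far more than raw density (placement, links, freshness), but the gap illustrates what a teaser page gives away.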

By gating content we prevent access to a huge potential audience (and potential membership base). Thought leaders in the business world who aren’t members can’t access (or share) gated content. Nor can international HR professionals who aren’t members, which negatively impacts our international reach.

Gated content is not disseminated by the news media. The pinnacle of success is when a journalist includes a link to your content. But they are unlikely to do this if the content is for members only.

When non-members look elsewhere for knowledge that they could source from us, but that sits behind a wall, there’s a real risk that another provider will emerge.

Gated content is rarely shared on social media. No one will publish a link to a login screen. By restricting access to content, you reduce its value to virtually nil.
Mathewson et al., Audience, Relevance and Search

Research by our in-house SEO team confirmed this: open-access content has the most shares on social media by some margin.

Fewer links to content harm its PageRank. ‘PageRank is an algorithm used by Google Search to rank websites in their search engine results’ (Wikipedia). Google’s search algorithm takes two primary factors into account when assessing a page’s relevance to a user’s search term: the first is relevance (related to how keywords appear on a page); the second is its PageRank.

On the web, the value of content is directly proportional to how many links point to it.

Audience, Relevance and Search, Mathewson et al.

Google uses external links to a page as its main contextual cue when determining PageRank. Content that is not shared, particularly by other authoritative sites (i.e. those with high PageRanks of their own), will not rank well (if at all) on Google’s results pages. On the internet, such content is invisible.
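
The intuition behind PageRank can be demonstrated on a tiny link graph with the classic power iteration (a simplified model – Google’s production ranking involves many more signals):

```python
# Who links to whom: page -> pages it links out to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pages = list(links)
damping = 0.85  # the standard damping factor from the original paper

# Start with rank spread evenly, then iterate: each page passes its
# rank to the pages it links to, split across its outbound links.
rank = {p: 1 / len(pages) for p in pages}
for _ in range(50):
    rank = {
        p: (1 - damping) / len(pages)
           + damping * sum(rank[q] / len(links[q])
                           for q in pages if p in links[q])
        for p in pages
    }

# C, with the most inbound links, ends up with the highest rank;
# D, which nothing links to, ends up with the lowest.
print(sorted(rank, key=rank.get, reverse=True))
```

Note that D still links out but receives nothing – exactly the position of a gated page that no one can link to.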

So, consider carefully what content (if any) you lock behind digital bars. Content behind a paywall or other kind of gateway needs sufficiently rich ‘teaser’ information in front of the wall to entice people in and to make it visible on the web. And to truly maximise relevance and reach, set your content free.

DITA and the structured writing controversy

Some time last year my boss sent me a link to this blog post by Tom Johnson, blogger on technical writing. Possibly to provoke me (kidding! ;-)) but more probably to get me to pause momentarily and check all this ‘COPE’ and ‘DITA’ stuff was in fact the right direction for our organisation to be taking. After all, implementing COPE successfully is a significant undertaking. (Understatement of the year!)

In fact Tom Johnson’s post generated a lot of controversy in the technical writing community, and he has since gone back to more clearly articulate his original ideas. But his post echoed an argument that I’ve subsequently heard again and again from my colleagues: if we’re going to be investing in revamping our website, why can’t we just use the WCMS to manage all our content?

Tom Johnson’s point was that structured writing, particularly DITA, isn’t particularly suitable for the web; in fact it is overkill if your output is simply or primarily a single website. And on that point, I would probably agree. However, for CIPD, an ‘actual’ website (‘like the one you’re reading now’ Johnson says), is only one of many outputs.

We can’t manage all of our content in the WCMS, or in a VLE (virtual learning environment), because they are onward channel platforms. They are designed to manage and publish content in a way that is specific to the web, or to eLearning.

And at CIPD we have a uniquely diverse publishing portfolio – textbooks, digital learning, face-to-face training, informational web content, research papers, and so on.

In terms of formats, we’ll need HTML for mobile apps, e-books, web, VLE. We’ll need PDF for print. But we also need to target our content to different audiences, devices and destinations. We’ll need snippets of content to take flight on social media. We want to re-use and re-mix content we’ve already developed (say an excerpt from a research paper) in new products and offerings (say in a training course).

The number of outputs, formats and uses will only increase. And customers will expect to be able to consume content in the way that is most convenient for them, be it print, e-book or mobile-optimised web, on the platform of their choice: the VLE, a responsive website, a CPD platform – the list goes on… and who knows what’s coming around the corner. The ‘Internet of Things’ is almost upon us. We need our content to be ‘future proof’.

Johnson is partly right when he says that ‘transforming structured XML is a burden’. But so is any transformation of content into publication-ready formats. Even ‘traditional’ typesetting and printing are both highly complex processes. Just because they are usually managed out of house doesn’t make them any less ‘burdensome’. And there are tools out there that minimise the burden of digital transformations considerably, assuming you understand the underlying principles and the tool’s capabilities.

Johnson says that organisations working in DITA (or other structured formats) need to hire a ‘dedicated publishing engineer’ to handle the transforms – or contract someone ‘at high cost’. But ideally by developing XML expertise in house we can directly control the multiplicity of outputs and get content to market quickly, without compromising the interoperability/reusability of the original content asset.

Structured writing isn’t going away. It’s only going to become more important as the amount of content that businesses generate and the number of devices and channels explode. DITA is still quite new outside of the technical world, and in many ways what we are proposing at CIPD is quite radical. But my prediction is that more and more content-rich organisations will be looking to the discipline of content strategy (and by extension structured content) as a way to manage their content in the digital multiverse.

Search Engine Optimisation: building SEO into content strategy

One of the (many) questions I’ve been grappling with lately is how we make the content we produce at CIPD more ‘impactful’. How do we broadcast our message more successfully? One of those strategies is, of course, SEO.

There are millions of websites vying for our attention like a cacophony of raised voices in which few are heard. If you hope to rise above the noise so your voice can be heard, you need to communicate more intelligently.

Building Findable Websites, Aarron Walter

Search engine companies such as Google make their billions from their exclusive search algorithms. They are of course very protective of the details about how they rank pages. The methods and algorithms by which they crawl the web are continually evolving as they battle the dishonest (‘black hat’) tactics of the spammers. But SEO experts continuously experiment to provide insight into what the search engines are doing.

Without knowing precisely what Google is looking for, we can still draw on a handful of insights, and optimise our content accordingly:

Google tries to match a user’s search term to ‘keywords’ that appear on a web page. To drive high levels of traffic, site owners must determine what those keywords are, and distribute them across the content where readers and search engines can most easily find them.

Search engines mimic human behaviour on the web. They look for relevant search terms in the places where humans most prominently ‘skim and scan’ (headings, emphasised words, etc.).

Google uses the structure of the web itself to determine a site’s credibility. If other credible sites are linking to yours, there’s a good chance your site offers an authoritative perspective on a subject and your PageRank will rise accordingly.

Introduction to SEO terminology

What is SEO?

Search engine optimisation makes web pages, and the content contained within them, findable.

Increasing the findability of content includes adding the appropriate metadata to the content but also following certain practices and techniques that help search engine spiders find your content. Many of the techniques relate closely to (or often are the same as) good web writing and accessibility guidelines.

Ultimately it is about ensuring that the words on the web page match the words that web users are using to find your (or similar) content.

What are keywords?

Keywords are the strings of characters people enter into search fields. ‘Keywords’ may be a bit of a misnomer because often users enter phrases or word combinations rather than single words, including ‘long-tail keywords’: longer and more complex search strings are often more likely to deliver relevant results.

What are crawlers/spiders?

Search engine crawlers (also known as ‘spiders’) ‘crawl’ the web to find content that might be relevant to the user, based on the search term (keyword) the user has input into the search box.

Crawlers are programs that scan web pages and return information about those pages to Google (or Bing, etc.). Based on this information, Google lists the pages in an index, ranking them according to their relevance to the search term.

Skimming and scanning

Search crawlers scan pages in much the same way that readers do. Google has designed its crawlers to mimic how web users read pages. Just as readers ‘skim and scan’ titles, headings and links for key terms, the crawler scans pages for keywords – pages with the optimal use and placement of keywords are ranked more highly.

Many of the strategies that optimise pages for web reading (e.g. descriptive, meaningful links) have the same positive effect on search engine spiders. Similarly, many accessibility strategies also have a positive effect on search engine results.

Long-tail keywords

Wired magazine’s editor Chris Anderson coined the term ‘The Long Tail’ in an article in 2004. The Long Tail is a marketing concept in which companies sell small volumes of hard-to-find items to large numbers of diverse customers.

The concept has been extended to keywords. Long-tail keywords are like hard-to-find items: longer key phrases that users take the trouble to type in when they want to find something quite specific (James Matthewson et al., Audience, Relevance and Search).

By using different word combinations and grammatical constructions related to the same subject, you make it more likely that users will find precisely what they are searching for. As web users become more search savvy, they are increasingly using long-tail keywords in their search terms.

Writing for long-tail keywords mostly follows naturally from writing good web copy. Don’t use the same constructions over and over, but vary them, using synonyms and different grammatical combinations to communicate related concepts.

James Matthewson et al. Audience, Relevance and Search

Use synonyms and other related words to increase the chances of capturing long tail keyword search.

The 4% keyword density rule

Ideally, keywords should represent between 2% and 4% of a page’s body copy. In the print world the general rule is to use alternative words rather than repeat the same word in a short piece of text. That rule doesn’t apply to keywords on the web – but beware falling foul of the 4% rule.

Search engines on the look out for ‘black hat’ (i.e. dishonest) practices will penalise pages where keywords appear too often, assuming that the copy has been stuffed with terms in a dishonest effort to improve ranking.

Search engines use a simple formula to determine whether a page has been deliberately stuffed:

Keyword frequency ÷ total number of words in a web page = keyword density

By studying written language, researchers found that keyword density is typically no higher than 4%. So a page with a keyword density of 10% looks suspicious and could receive a penalty in search engine rankings.
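The density calculation can be sketched in a few lines of Python (the word-splitting regex and sample copy are illustrative simplifications – real search engines tokenise text far more elaborately):

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Return the keyword's share of all words in the text, as a fraction."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return hits / len(words)

copy = ("Espresso machines for the office: our commercial espresso "
        "machines deliver barista-quality espresso at your desk.")
# 3 occurrences of 'espresso' in 16 words – well over the 4% guideline,
# so this copy would look 'stuffed' to a search engine
print(keyword_density(copy, "espresso"))
```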

You can check the frequency of all terms on a given page at

Keyword tactics

To generate the highest search engine ranking, and consequently the most traffic, aim to insert targeted keywords in the following places:

  • title tag
  • heading tags
  • strong and em
  • link labels
  • file names
  • alt attributes on images
  • table elements
  • abbr
  • first line of the first paragraph of a page
  • meta description
  • within a URL

title tag

The title tag is the text that appears at the top of the browser window.


From CIPD’s Secondment factsheet, viewed in Safari

It also appears on the search engine’s results page.


HTML and results page for PESTLE Factsheet

  • Make sure the title is written for humans first, search engines second.
  • Keep it concise (fewer than 12 words).
  • The title should succinctly describe the page’s purpose.
  • It should include relevant and targeted keywords.
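Applied to the PESTLE factsheet mentioned above, a title following these guidelines might look something like this (the exact wording and structure are illustrative, not CIPD’s actual markup):

```html
<head>
  <title>PESTLE analysis factsheet | CIPD</title>
</head>
```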

Heading tags

Search engines crawl for relevant keywords in heading tags, especially h1, h2 and h3.

  • Only use a top-level heading (h1) once on the page.
  • Do not use clever puns and double meanings in headings. Writers often use these strategies to make their headlines more interesting – this works for print magazines and newspapers, but if a heading doesn’t explicitly describe the content below, it will not reach the target audience.

<strong> and <em>

Do not use i (italic) and b (bold). These formatting tags are redundant markup: they communicate nothing about the hierarchy or semantics of the content.

Instead, indicate the presence of a keyword by using strong and em. These semantic tags also improve a web page’s accessibility, as text-to-speech rendering engines will stress the words – which is also a reason not to overuse them.
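For example (the sentence is illustrative):

```html
<!-- Semantic markup: crawlers and screen readers register the stressed terms -->
<p>A <strong>secondment</strong> is the temporary loan of an <em>employee</em>
to another organisation.</p>

<!-- Presentational markup: conveys nothing about meaning or hierarchy -->
<p>A <b>secondment</b> is the temporary loan of an <i>employee</i>
to another organisation.</p>
```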

Link labels

Links are important. Internal links improve a page’s PageRank. Although not as valuable as external links, search engines credit pages that receive links from highly relevant pages elsewhere on the site.

Make sure link text matches the title/heading/keywords of the page it links to – a disconnect will lead to confusion and readers will lose trust in the site. The closer the match between the link and the page it links to, the greater the value search engines will attribute to the link. Don’t be afraid to use long links – the more text in a link, the more likely it is to include the words the reader has in mind.
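For instance (the URL and labels are illustrative):

```html
<!-- Descriptive label: matches the target page's title and keywords -->
<a href="/factsheets/pestle-analysis">PESTLE analysis factsheet</a>

<!-- Unhelpful label: carries none of the reader's keywords -->
<a href="/factsheets/pestle-analysis">Click here</a>
```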

File names

Search engines look for keywords in file names, so use the naming of images and other files (e.g. audio files, Word, etc.) as an opportunity to include relevant keywords.

The alt attribute on the <img> tag

The text encoded in the alt attribute on the image (img) tag is read to users of screen readers; it is also displayed when the image is unavailable for any reason. (WCAG 1.1.1)

Search engines actively index alt text, factoring in this content when evaluating keyword density (i.e. relevancy) of a page.
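For instance (the file name and alt text are illustrative), a chart image might be marked up as:

```html
<img src="net-employment-intentions-chart.png"
     alt="Chart showing net employment intentions for the next three months, by industry">
```

Note that the hyphenated file name gives the search engine a second chance to pick up the same keywords.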

Table elements

Table elements such as th, caption, and the summary attribute offer additional opportunities for search-engine-friendly content that can add to a page’s keyword density.

<table summary="Top-selling espresso machines">
<caption>Espresso Impresso’s top-selling commercial espresso machines</caption>

The <th> (table header) tag also communicates an elevated information hierarchy to search engines, and can be a good place to position keywords where relevant.


The <abbr> tag

Using the <abbr> tag (for abbreviations and acronyms) indicates to a search engine the presence of a keyword.

Use the full (spelled out) form of acronyms and abbreviations wherever possible. If the page only includes the acronyms or abbreviations and not the full phrase, search engines may not direct users to the page, even though it may include exactly what they are searching for.

Use of <abbr>, with a title attribute to provide the definition or expanded form of the abbreviation or acronym, conforms to WCAG 2.0 accessibility criterion 3.1.4.
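For example (the sentence is illustrative):

```html
<p>The Chartered Institute of Personnel and Development
(<abbr title="Chartered Institute of Personnel and Development">CIPD</abbr>)
publishes factsheets on employment topics.</p>
```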

First line of the first paragraph of a page

Include keywords in the first line of the first paragraph of a page. This is good web writing practice – it ensures the reader, when skimming the text, can tell at a glance that they’ve landed on the information that they are looking for.

Meta description

Search for the term ‘PESTLE’ on Google and it comes up with this search result.


What’s displayed on the search engine’s results page is taken from the <title> tag, the URL and the first 150 characters of the content attribute of the <meta name="description"> tag.




Make sure this meta text encapsulates the page’s purpose and includes prominent keywords.

Long descriptions are truncated, so try to keep them to around 150 characters.
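Putting this together, the head of the PESTLE factsheet page might include something like the following (the wording is illustrative):

```html
<title>PESTLE analysis | Factsheets | CIPD</title>
<meta name="description" content="What a PESTLE analysis is and how to use it
to track the political, economic, social, technological, legal and
environmental factors affecting your organisation.">
```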

Within URLs

Search engines also search for keywords in the URLs, which are generally composed of the Content Management System’s folders and file names.

Generally across a website, try to make URLs predictable. Define an easy-to-understand system, e.g. name directories and files after the corresponding navigation labels.

  • Use relevant keywords in folder and file names.
  • Separate keywords in file and folder names with a hyphen rather than an underscore, so search engines can read each word individually rather than as one long word.
  • Keep folder and file names brief. Shorter URLs are more convenient and will encourage inbound links and citations in printed materials.



Where accessibility and SEO intersects

the wealthiest, most influential blind users on the web are the indexing spiders. … Search engine spiders cannot see content within images… and poorly constructed pages just like many screen-reader browsers used by blind users.

Building Findable Websites, Aarron Walter

Accessible content is findable content

Building Findable Websites, Aarron Walter

Many accessibility strategies introduce more content into a page (and more opportunities for keywords) that can be seen and indexed by search engines. For example:

  • Use alt attributes on images (<img>) to provide text that screen readers read aloud to visually impaired users
  • Provide text transcripts for audio and video files (the content of which is otherwise completely hidden from search engines)
  • Use the full (spelled out) form of acronyms and abbreviations wherever possible (see the section on the <abbr> tag above)

Create great content

Develop ‘link bait’ to maximise your ‘link juice’

Rather than rely on metadata (which was widely misused by black hatters), Google uses the structure of the web itself – that is, the links to a page – to determine the page’s context and relevancy. This is called PageRank.

PageRank doesn’t just help Google determine the contextual relevance of a page. It also helps Google evaluate its credibility. A link to a page is often described as a “vote of confidence for that page”. Google users don’t care only about which pages are most linguistically relevant to their search query. They also want the most authoritative information on a topic. 

James Matthewson et al. Audience, Relevance and Search

Google doesn’t treat all links to a page the same: the quality of the links – that is, whether they come from an authoritative site – is important.

As website owners become more savvy about other SEO techniques (e.g. keyword optimisation), building or encouraging links into your pages is increasingly important. Links into your site are known as ‘link equity’ or ‘link juice’.

The aim is to create ‘link bait’ – great content that other sites want to link to.

…link juice is like citations in a print journal. The more citations an article has in the other writers’ bibliographies, the more credibility it has. A citation is very similar to a link in the sense that the writer who cites another is giving the other writer a vote of confidence. 

James Matthewson et al. Audience, Relevance and Search

And minimise ‘bounce’

The aim is to attract people to your site. But to engage them and keep them on the site (rather than have them bounce right off without clicking any links) you need to present them with relevant content.

Build plenty of internal links

Although not quite as valuable as external links, internal links generate link equity (or link juice). The closer the link text matches the keywords on the target page, the greater the value search engines will attribute to the link.

Tactics: Other SEO techniques


  • Check for broken links. Use the W3C Link Checker to find them; the tool provides a detailed report of the problems and how to fix them.
  • Use semantic, standards-compliant code.
  • Separate page structure (HTML) from formatting (CSS) and behaviour (JavaScript) – place each in a separate file. This also speeds up load times (the CSS and JavaScript cache in the browser and only need to be loaded once).
  • Publish a robots.txt file for those pages you don’t want spiders to index (e.g. dynamic search results pages that may display improperly without user input, 404 pages, image directories, login pages). [Note: CIPD does this]
  • Publish a sitemap.xml file and notify the major search engines of it. (Sitemaps are a standard XML format that allows websites to communicate their structure to search engines, providing ‘hints for web crawlers to do a better job of crawling your site’.)
  • Create an HTML sitemap page – as well as being useful for users, it helps search engines crawl the site.
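A minimal sitemap.xml, following the standard sitemaps.org format, looks like this (the URL and values are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.org/factsheets/pestle-analysis</loc>
    <lastmod>2012-06-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```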


  • Avoid PDFs. Search engines do index PDF files, but HTML delivery offers better SEO opportunities (James Matthewson et al., Audience, Relevance and Search)
  • Do not restrict access to content behind a firewall or login screen. No one will publish a link to a login page. By restricting access to content, you reduce its value to virtually nil (James Matthewson et al., Audience, Relevance and Search)
  • Have a clear site architecture. …The Google crawler mimics human web navigation, following links in its path through your content. If you design a clear site architecture, you make it easier on the crawler and help ensure that the crawler will find your content and pass its information to the search engine. (James Matthewson et al. Audience, Relevance and Search)
  • Develop robust web governance practice. ‘[Organisations need] a collaborative governance system, [otherwise] they end up building parallel experiences and competing with one another for the same audience.’ (James Matthewson et al., Audience, Relevance and Search.) Not to mention competing for the same keywords!
  • Don’t optimise your content for keywords after you’ve written it. The problem here is that there is often a disconnect between the content and the keywords that have been shoehorned in. (Google is also wise to such tactics and may remove the page from its index.) Instead do the keyword research up front and develop content that meets the terms that users are searching for.

(Finally, there’s lots of information on keyword research and using keywords to develop ‘search-first’ site architecture, in Audience, Relevance and Search, Matthewson et al.)