Open access and content visibility

I work for CIPD, a membership organisation for HR and Learning and Development (L&D) professionals. As a membership organisation, we provide certain benefits to our members, one of which is privileged access to some content; other content is free to all.

Gated content is invisible

But there’s a big issue with gated content: it’s invisible. Gated content can’t be found and so isn’t shared. Content whose purpose is to build brand awareness and authority should be free to access.

Search engine ‘spiders’ (the programs that ‘crawl’ and index web pages) cannot reach gated content. Spiders aren’t CIPD members and they don’t have logins! (Registration walls are just as much of a barrier to spiders in this respect.) At CIPD we try to get around this by including a short summary, a sample quote and a brief contents list in HTML on the landing page (i.e. in front of the barrier).

However, this offers insufficient keyword density for optimal page ranking.

By gating content we prevent access to a huge potential audience (and potential membership base). Thought leaders in the business world who aren’t members can’t access (or share) gated content. Nor can international HR professionals who aren’t members, which limits our international reach.

Gated content is not disseminated by the news media. The pinnacle of success is when a journalist includes a link to your content. But they are unlikely to do this if the content is for members only.

When non-members look elsewhere for knowledge that they could source from us, but that sits behind a wall, there’s a real risk that another provider will emerge.

Gated content is rarely shared

Gated content is rarely shared on social media.

No one will publish a link to a logon screen page. By restricting access to content, you reduce its value to virtually nil.

Audience, Relevance and Search, James Mathewson et al.

Research by our in-house SEO team confirmed this: open-access content has the most shares on social media by some margin.

Fewer links to content harm its PageRank. ‘PageRank is an algorithm used by Google Search to rank websites in their search engine results’ (Wikipedia). Google’s search algorithm takes two primary factors into account when ranking a page for a user’s search term. The first is relevance (related to how keywords appear on the page); the second is the page’s PageRank.

On the web, the value of content is directly proportional to how many links point to it.

Audience, Relevance and Search, James Mathewson et al.

Google uses external links to a page as the main input when calculating its PageRank. Content that is not linked to, particularly by other authoritative sites (i.e. sites with high PageRanks of their own), will not rank well (if at all) on Google’s results pages. On the Internet, such content is invisible.
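The original PageRank idea, as published by Google's founders, can be sketched as an iterative calculation over a link graph: each page shares its score among the pages it links to, so pages with many inbound links from well-linked pages score highest. The minimal sketch below assumes a hypothetical four-page site and the commonly cited damping factor of 0.85; Google's live ranking combines PageRank with many other signals and is not public.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a baseline share from the 'random jump' term.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its current score evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score across all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site: three pages link to 'factsheet'; nothing links to 'partner'.
web = {
    "home": ["factsheet", "blog"],
    "blog": ["factsheet"],
    "partner": ["factsheet"],
    "factsheet": ["home"],
}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:10} {score:.3f}")
```

In this toy graph ‘partner’ receives no inbound links, so it keeps only the minimum baseline score, which is the position gated content is in.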

So, consider carefully what content (if any) you lock behind digital bars. Content behind a paywall or other kind of gateway needs sufficiently rich ‘teaser’ information in front of the wall to entice people in and to make it visible on the web. And to truly maximise relevance and reach, set your content free.


Search Engine Optimisation: building SEO into content strategy

One of the (many) questions I’ve been grappling with lately is how we can make the content we produce at CIPD more ‘impactful’. How do we broadcast our message more successfully? One strategy, of course, is SEO.

There are millions of websites vying for our attention like a cacophony of raised voices in which few are heard. If you hope to rise above the noise so your voice can be heard, you need to communicate more intelligently.

Building Findable Websites, Aarron Walter

Search engine companies such as Google make their billions from their proprietary search algorithms, so they are, of course, very protective of the details of how they rank pages. The methods and algorithms by which they crawl the web are continually evolving as they battle the dishonest (‘black hat’) tactics of spammers. But SEO experts continuously experiment to gain insight into what the search engines are doing.

Without knowing precisely what Google is looking for, we can still draw a handful of insights and optimise our content accordingly:

Google tries to match a user’s search term to ‘keywords’ that appear on a web page. To drive high levels of traffic, site owners must determine what those keywords are and distribute them across the content where readers and search engines can most easily find them.

Google’s search crawlers mimic human behaviour on the web. They look for relevant search terms in the places where humans most prominently ‘skim and scan’ (headings, emphasised words, etc.).

Google uses the structure of the web itself to determine a site’s credibility. If other credible sites are linking to yours, there’s a good chance your site offers an authoritative perspective on a subject and your PageRank will rise accordingly.

Introduction to SEO terminology

What is SEO?

Search engine optimisation makes web pages, and the content contained within them, findable.

Increasing the findability of content involves adding appropriate metadata to it, but also following certain practices and techniques that help search engine spiders find it. Many of these techniques relate closely to (and are often the same as) good web writing and accessibility guidelines.

Ultimately it is about ensuring that the words on the web page match the words that web users are using to find your (or similar) content.

What are keywords?

Keywords are the strings of characters people enter into search fields. ‘Keywords’ is a bit of a misnomer, because users often enter phrases or word combinations rather than single words. These include ‘long-tail keywords’: longer, more complex search strings that are often more likely to deliver relevant results.

What are crawlers/spiders?

Search engine crawlers (also known as ‘spiders’) ‘crawl’ the web, following links from page to page to discover content.

Crawlers are programs that scan web pages and send information about those pages back to Google (or Bing, etc.). Google stores this information in an index. When a user types a search term (keyword) into the search box, Google looks up matching pages in its index and ranks them according to their relevance to that term.
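A toy model may make the crawl, index and search steps clearer. In this sketch the in-memory WEB dictionary stands in for real web pages, and the page paths and function names are hypothetical; real crawlers fetch pages over HTTP, obey robots.txt and rank results rather than simply matching them.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory 'web': each path maps to (page text, outbound links).
WEB = {
    "/": ("CIPD home page", ["/factsheets", "/login"]),
    "/factsheets": ("HR factsheets: PESTLE analysis and secondment", ["/factsheets/pestle"]),
    "/factsheets/pestle": ("PESTLE analysis factsheet for HR professionals", ["/"]),
    "/login": ("Members log in here", []),
    "/members/report": ("Gated research report on employment", []),  # no page links here
}


def crawl(start):
    """Follow links breadth-first from start and return every page reached."""
    seen, queue = {start}, deque([start])
    while queue:
        _, links = WEB[queue.popleft()]
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def build_index(paths):
    """Build an inverted index mapping each word to the pages containing it."""
    index = defaultdict(set)
    for path in paths:
        text, _ = WEB[path]
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    matches = [index.get(w, set()) for w in re.findall(r"[a-z0-9]+", query.lower())]
    return set.intersection(*matches) if matches else set()


index = build_index(crawl("/"))
print(search(index, "PESTLE analysis"))   # both factsheet pages
print(search(index, "research report"))   # empty: the crawler never reached the report
```

The gated report never appears in results, because no crawlable link leads to it.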

Skimming and scanning

Search crawlers scan pages in much the same way that readers do: Google has designed its crawlers to mimic how web users read pages. Just as readers ‘skim and scan’ titles, headings and links for key terms, the crawler scans pages for keywords, and pages with the optimal use and placement of keywords are ranked more highly.

Many of the strategies that optimise pages for web reading (e.g. descriptive, meaningful links) have the same positive effect on search engine spiders. Similarly, many accessibility strategies also improve search engine results.

Long-tail keywords

Wired magazine’s editor Chris Anderson coined the term ‘The Long Tail’ (http://www.thelongtail.com/about.html) in an article in 2004. The Long Tail is a marketing concept in which companies sell small volumes of hard-to-find items to large numbers of diverse customers.

The concept has been extended to keywords. Long-tail keywords are like hard-to-find items: longer key phrases that users take the trouble to type in when they want to find something quite specific (James Mathewson et al., Audience, Relevance and Search).

When content uses different word combinations and grammatical constructions for the same subject, users are more likely to find precisely what they are searching for. As web users become more search-savvy, they are increasingly using long-tail keywords in their search terms.

Writing for long-tail keywords mostly follows naturally from writing good web copy. Don’t use the same constructions over and over, but vary them, using synonyms and different grammatical combinations to communicate related concepts.

Audience, Relevance and Search, James Mathewson et al.

Use synonyms and other related words to increase the chances of capturing long-tail keyword searches.
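To illustrate why varied wording helps, the sketch below checks which long-tail queries a page could match. It assumes a deliberately simple rule (a page matches a query only if every query word appears on it) and hypothetical queries and copy; real search engines also handle stemming, synonyms and word order.

```python
import re


def words(text):
    """Lower-case a string and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def covered_queries(page_text, queries):
    """Return the queries whose every word appears somewhere on the page."""
    page_words = words(page_text)
    return [q for q in queries if words(q) <= page_words]


# Hypothetical long-tail queries a reader might type.
queries = [
    "how to second an employee",
    "staff secondment policy",
    "temporary transfer of staff to another employer",
]

repetitive = "Secondment policy. A secondment policy sets out how a secondment works."
varied = (
    "Secondment policy. A staff secondment is a temporary transfer of an employee "
    "to another part of the organisation or to another employer. Here is how to "
    "second an employee."
)

print(covered_queries(repetitive, queries))  # []
print(covered_queries(varied, queries))      # all three queries
```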

The 4% keyword density rule

Ideally, keywords should make up between 2% and 4% of a page’s body copy. In print, the general rule is to use alternative words rather than repeat the same word in a short piece of text. This rule doesn’t apply to keywords on the web. But beware of falling foul of the 4% rule.

Search engines on the lookout for ‘black hat’ (i.e. dishonest) practices will penalise pages where keywords appear too often, on the assumption that the copy has been stuffed with terms in a dishonest effort to improve ranking.

Search engines use a simple formula to determine whether a page has been deliberately stuffed.

(Keyword frequency ÷ total number of words on the page) × 100 = keyword density (%)

By studying written language, researchers discovered that keyword density is typically no higher than 4%. So a page with a keyword density of 10% might look suspicious and could receive a penalty in search engine rankings.
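The keyword density formula translates directly into code. This sketch counts single-word keywords only (multi-word phrases would need a different count) and uses the 4% threshold from this section; the function names and the synthetic 100-word samples are illustrative.

```python
import re


def keyword_density(text, keyword):
    """Return keyword frequency divided by total words, as a percentage."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    frequency = words.count(keyword.lower())
    return 100.0 * frequency / len(words)


def looks_stuffed(text, keyword, threshold=4.0):
    """Flag copy whose keyword density is above the threshold (4% by default)."""
    return keyword_density(text, keyword) > threshold


# Synthetic 100-word samples with 3 and 10 occurrences of the keyword.
normal = "secondment " * 3 + "other " * 97
stuffed = "secondment " * 10 + "other " * 90

print(keyword_density(normal, "secondment"))  # 3.0
print(looks_stuffed(stuffed, "Secondment"))   # True
```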

You can check the frequency of all terms on a given page at http://www.ranks.nl/cgi-bin/ranksnl/spider/spider.cgi

Keyword tactics

To generate the highest search engine ranking, and consequently the most traffic, aim to insert targeted keywords in the following places:

  • title tag
  • heading tags
  • strong and em
  • link labels
  • file names
  • alt attributes on images
  • table elements
  • abbr
  • first line of the first paragraph of a page
  • meta description
  • within a URL
  • tables
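A page can be audited against part of this list with Python's standard html.parser module. The KeywordAudit class and the sample page below are hypothetical: the class checks only a subset of the locations (title, headings, strong/em, links, abbr, table captions and headers, alt text, meta description and URLs) and uses simple case-insensitive substring matching.

```python
from html.parser import HTMLParser

# Elements whose text is checked (a subset of the locations listed above).
TEXT_ELEMENTS = {"title", "h1", "h2", "h3", "strong", "em", "a", "abbr", "caption", "th"}


class KeywordAudit(HTMLParser):
    """Record which on-page locations contain a keyword (case-insensitive)."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.open_elements = []  # tracked elements that are currently open
        self.found = set()

    def has_keyword(self, value):
        return self.keyword in (value or "").lower()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and self.has_keyword(attrs.get("alt")):
            self.found.add("img alt")
        if tag == "meta" and attrs.get("name") == "description" and self.has_keyword(attrs.get("content")):
            self.found.add("meta description")
        for attr in ("href", "src"):  # URLs and file names
            if self.has_keyword(attrs.get(attr)):
                self.found.add(f"{tag} {attr}")
        if tag in TEXT_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_elements:
            # Close the most recent matching element and anything left open inside it.
            while self.open_elements.pop() != tag:
                pass

    def handle_data(self, data):
        if self.has_keyword(data):
            self.found.update(self.open_elements)


page = """
<html><head>
<title>PESTLE analysis | Factsheets | CIPD</title>
<meta name="description" content="Explains PESTLE analysis and how HR can use it.">
</head><body>
<h1>PESTLE analysis</h1>
<p>A <strong>PESTLE</strong> analysis examines the external environment.</p>
<img src="/images/pestle-diagram.png" alt="PESTLE diagram">
<p>See also our <a href="/factsheets/swot">SWOT factsheet</a>.</p>
</body></html>
"""
audit = KeywordAudit("pestle")
audit.feed(page)
print(sorted(audit.found))  # ['h1', 'img alt', 'img src', 'meta description', 'strong', 'title']
```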

title tag

The title tag is the text that appears at the top of the browser window (in tabbed browsers, on the tab).

[Screenshot: CIPD’s Secondment factsheet, viewed in Safari]

It also appears on search engine results pages.

[Screenshot: HTML and results page for the PESTLE factsheet]

  • Make sure the title is written for humans first, search engines second.
  • Keep it concise (fewer than 12 words).
  • The title should succinctly describe the page’s purpose.
  • It should include relevant, targeted keywords.
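Two of these guidelines (length and keywords) can be checked mechanically; whether a title is written for humans first still needs an editor. In this sketch title_issues is a hypothetical helper, and the 12-word limit comes from the list above.

```python
def title_issues(title, keywords, max_words=12):
    """Return a list of problems with a page title (an empty list means none found)."""
    issues = []
    # Ignore separators such as '|' when counting words.
    words = [w for w in title.split() if any(c.isalnum() for c in w)]
    if len(words) >= max_words:
        issues.append(f"title has {len(words)} words; aim for fewer than {max_words}")
    missing = [k for k in keywords if k.lower() not in title.lower()]
    if missing:
        issues.append("title is missing keywords: " + ", ".join(missing))
    return issues


good_title = "PESTLE analysis | Factsheets | CIPD"
long_title = ("Everything you ever wanted to know about analysing the external "
              "business environment but were afraid to ask")

print(title_issues(good_title, ["PESTLE"]))  # []
print(title_issues(long_title, ["PESTLE"]))  # a word-count issue and a missing-keyword issue
```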

Heading tags

Search engines look for relevant keywords in heading tags, especially h1, h2 and h3.

  • Only use a top-level heading (h1) once on the page.
  • Do not use clever puns and double meanings in headings. Writers often use these strategies to make their headlines more interesting – this works for print magazines and newspapers, but if a heading doesn’t explicitly describe the content below, it will not reach the target audience.
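Counting h1 elements is easy to automate, for example with Python's html.parser. This sketch (check_h1 and HeadingCounter are hypothetical names) also treats a page with no h1 as a problem, which is an assumption beyond the guideline above; judging puns and double meanings still needs a human.

```python
from html.parser import HTMLParser


class HeadingCounter(HTMLParser):
    """Count heading elements (h1 to h6) in a page."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.counts[tag] = self.counts.get(tag, 0) + 1


def check_h1(html_text):
    """Return a warning if the page does not have exactly one h1, otherwise None."""
    counter = HeadingCounter()
    counter.feed(html_text)
    n = counter.counts.get("h1", 0)
    if n != 1:
        return f"expected exactly one h1, found {n}"
    return None


print(check_h1("<h1>PESTLE analysis</h1><h2>Political factors</h2>"))  # None
print(check_h1("<h1>CIPD</h1><h1>PESTLE analysis</h1>"))  # expected exactly one h1, found 2
```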

<strong> and <em>

Do not use i (italic) and b (bold). These are presentational tags: they communicate nothing about the hierarchy or semantics of the content.

Instead, indicate the presence of a keyword by using strong and em. These semantic tags can also improve a web page’s accessibility, as text-to-speech engines may stress the words, which is another reason not to overuse them.

Link labels

Links are important. Internal links improve a page’s PageRank: although they are not as important as external links, search engines credit pages that are linked to from highly relevant pages elsewhere on the site.

Make sure link text matches the title, heading and keywords of the page it links to; a disconnect will confuse readers, and they will lose trust in the site. The closer the match between the link and the page it links to, the greater the value search engines will attribute to the link. Don’t be afraid to use long links: the more text in a link, the more likely it is to include the words the reader has in mind.
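One rough way to check a link label against its destination is to measure how many of the target page title's words the label contains. Search engines do not publish how they weigh link text, so this overlap score and its small stop-word list are hypothetical editorial heuristics, not a model of any engine.

```python
import re

# A small, illustrative list of words that carry little meaning for matching.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "on", "our", "see"}


def content_words(text):
    """Return the lower-cased words in text, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def label_match(link_text, target_title):
    """Share (0 to 1) of the target title's content words that also appear in the link text."""
    target = content_words(target_title)
    if not target:
        return 0.0
    return len(content_words(link_text) & target) / len(target)


title = "PESTLE analysis factsheet"
print(label_match("click here", title))                     # 0.0
print(label_match("our PESTLE analysis factsheet", title))  # 1.0
```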

File names

Search engines look for keywords in file names, so use the names of images and other files (e.g. audio files and Word documents) as an opportunity to include relevant keywords.

alt attribute on the <img> tag

The text encoded in the alt attribute on the image (img) tag is read to users of screen readers; it is also displayed when the image is unavailable for any reason. (WCAG 1.1.1)

Search engines actively index alt text, factoring in this content when evaluating keyword density (i.e. relevancy) of a page.
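Missing alt attributes are easy to find automatically. This sketch uses Python's html.parser to list images without an alt attribute (MissingAltFinder is a hypothetical name). It deliberately accepts an empty alt="", the conventional marker for purely decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            if "alt" not in attrs:
                self.missing.append(attrs.get("src", "(no src)"))


finder = MissingAltFinder()
finder.feed(
    '<img src="pestle-diagram.png" alt="PESTLE analysis diagram">'
    '<img src="divider.png" alt="">'  # decorative image: empty alt is intentional
    '<img src="secondment-chart.png">'
)
print(finder.missing)  # ['secondment-chart.png']
```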

Table elements

Table elements such as th and caption, and the table’s summary attribute, offer additional opportunities for search-engine-friendly content that can add to a page’s keyword density. (Note that the summary attribute is obsolete in HTML5; caption remains valid.)

<table summary="Top-selling espresso machines">
<caption>Espresso Impresso’s top-selling commercial espresso machines</caption>

The <th> (table header) tag also communicates an elevated information hierarchy to search engines, and can be a good place to position keywords where relevant.

<abbr>

Using the <abbr> tag (for abbreviations and acronyms) indicates to a search engine the presence of a keyword.

Use the full (spelled out) form of acronyms and abbreviations wherever possible. If the page only includes the acronyms or abbreviations and not the full phrase, search engines may not direct users to the page, even though it may include exactly what they are searching for.

Using <abbr> with a title attribute to provide the definition or expanded form of the abbreviation or acronym conforms to WCAG 2.0 success criterion 3.1.4 (Abbreviations).
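Adding <abbr> markup can be partly automated if you keep a list of expansions. This sketch (mark_abbreviations is a hypothetical helper) wraps only the first occurrence of each abbreviation, a common convention and an assumption here, and assumes the input is HTML text with no existing <abbr> markup.

```python
import html
import re


def mark_abbreviations(text, expansions):
    """Wrap the first occurrence of each abbreviation in an <abbr> element.

    expansions maps each abbreviation to its spelled-out form.
    """
    for abbr, full in expansions.items():
        pattern = r"\b" + re.escape(abbr) + r"\b"
        markup = f'<abbr title="{html.escape(full, quote=True)}">{html.escape(abbr)}</abbr>'
        # A function replacement stops re.sub from interpreting backslashes in markup.
        text = re.sub(pattern, lambda match: markup, text, count=1)
    return text


expansions = {
    "CIPD": "Chartered Institute of Personnel and Development",
    "HR": "human resources",
}
marked = mark_abbreviations("CIPD supports HR professionals. HR teams use CIPD factsheets.", expansions)
print(marked)
```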

First line of the first paragraph of a page

Include keywords in the first line of the first paragraph of a page. This is good web writing practice – it ensures the reader, when skimming the text, can tell at a glance that they’ve landed on the information that they are looking for.

Meta description

Search the term ‘PESTLE’ on Google and it comes up with this search result.

[Screenshot: Google search result for ‘PESTLE’]

What’s displayed on the search engine results page is taken from the <title> tag, the URL and roughly the first 150 characters of the <meta name="description" content="..."> tag.


Make sure this meta text encapsulates the page’s purpose and includes prominent keywords.

Long descriptions are truncated, so try to keep yours to 150 characters or thereabouts.
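A description can be trimmed to roughly 150 characters at a word boundary before it goes into the tag. In this sketch meta_description is a hypothetical helper, the 150-character limit is the approximate figure from this section rather than a published search-engine limit, and html.escape keeps quotes in the text from breaking the attribute.

```python
import html


def meta_description(text, limit=150):
    """Build a meta description tag, trimming text to about `limit` characters."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    if len(text) > limit:
        # Cut at the last space before the limit so no word is split.
        text = text[:limit].rsplit(" ", 1)[0].rstrip(",;:") + "…"
    return f'<meta name="description" content="{html.escape(text, quote=True)}">'


short = "Explains PESTLE analysis and how HR professionals can use it to scan the external environment."
print(meta_description(short))
```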

Within URLs

Search engines also look for keywords in URLs, which are generally composed of the Content Management System’s folder and file names.

Across a website, try to make URLs predictable. Define an easy-to-understand system, e.g. give directories and files the same names as the navigation labels.

  • Use relevant keywords in folder and file names.
  • Separate keywords in file and folder names with a hyphen rather than an underscore so search engines can read each word individually rather than as one large word.
  • Try to be brief for folder and file names. Shorter URLs are more convenient and will encourage inbound links/citations in printed materials.
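This sketch turns a navigation label into a short, hyphen-separated folder or file name, following the rules above (the same approach suits image and document file names). slugify is a hypothetical helper, and the /factsheets/ path is illustrative rather than CIPD's actual URL scheme.

```python
import re
import unicodedata


def slugify(label):
    """Turn a navigation label into a short, lower-case, hyphen-separated URL segment."""
    # Strip accents so that, for example, 'Café' becomes 'Cafe'.
    ascii_text = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    # Replace every run of spaces, underscores or punctuation with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    return slug.strip("-")


print(slugify("PESTLE analysis"))            # pestle-analysis
print(slugify("Net employment_intentions"))  # net-employment-intentions
print("/factsheets/" + slugify("Secondment") + "/")  # /factsheets/secondment/
```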

Tables

The <table> summary attribute and <caption> tag present additional opportunities to present keywords to search engine spiders.

<table summary="net employment intentions">
<caption>Net employment intentions for the next three months, by industry</caption>
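Rather than hand-writing table markup, it can be generated with a caption and header cells already in place. This sketch is hypothetical: it uses <th scope> for column and row headers and omits the summary attribute, which is obsolete in HTML5. The cell values are placeholders, not survey results.

```python
import html


def accessible_table(caption, headers, rows):
    """Return table markup with a <caption> and <th scope> header cells."""
    esc = html.escape
    lines = ["<table>", f"  <caption>{esc(caption)}</caption>", "  <thead>", "    <tr>"]
    lines += [f'      <th scope="col">{esc(h)}</th>' for h in headers]
    lines += ["    </tr>", "  </thead>", "  <tbody>"]
    for first, *rest in rows:
        # The first cell of each row labels that row, so it becomes a row header.
        cells = [f'<th scope="row">{esc(str(first))}</th>']
        cells += [f"<td>{esc(str(value))}</td>" for value in rest]
        lines.append("    <tr>" + "".join(cells) + "</tr>")
    lines += ["  </tbody>", "</table>"]
    return "\n".join(lines)


# Placeholder cells for illustration only; these are not survey results.
table_html = accessible_table(
    "Net employment intentions for the next three months, by industry",
    ["Industry", "Net employment intentions"],
    [("Sector A", "[value]"), ("Sector B", "[value]")],
)
print(table_html)
```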

Where accessibility and SEO intersect

the wealthiest, most influential blind users on the web are the indexing spiders. … Search engine spiders cannot see content within images… and poorly constructed pages just like many screen-reader browsers used by blind users.

Building Findable Websites, Aarron Walter

Accessible content is findable content

Building Findable Websites, Aarron Walter

Many accessibility strategies introduce more content into a page (and more opportunities for keywords) that can be seen and indexed by search engines. For example:

  • Use alt attributes on images (<img>) to provide text that screen readers read out to print-impaired and visually impaired users
  • Provide text transcripts for audio and video files (the content of which is otherwise totally hidden from search engines)
  • Use the full (spelled out) form of acronyms and abbreviations wherever possible (see http://www.w3.org/TR/UNDERSTANDING-WCAG20/meaning-located.html).

Create great content

Develop ‘link bait’ to maximise your ‘link juice’

Rather than rely on metadata (which was misused by black hatters), Google uses the structure of the web itself, that is, the links to a page, to determine the page’s context and relevance. This is called PageRank.

PageRank doesn’t just help Google determine the contextual relevance of a page. It also helps Google evaluate its credibility. A link to a page is often described as a “vote of confidence for that page”. Google users don’t care only about which pages are most linguistically relevant to their search query. They also want the most authoritative information on a topic. 

Audience, Relevance and Search, James Mathewson et al.

Google doesn’t treat all links to a page equally: the quality of the links (that is, whether they come from authoritative sites) matters.

As more and more websites become savvy to other SEO techniques (e.g. keyword optimisation), building or encouraging links into your pages is increasingly important. The value that inbound links pass to your site is known as ‘link equity’ or ‘link juice’.

The aim is to create ‘link bait’ – great content that other sites want to link to.

…link juice is like citations in a print journal. The more citations an article has in the other writers’ bibliographies, the more credibility it has. A citation is very similar to a link in the sense that the writer who cites another is giving the other writer a vote of confidence. 

Audience, Relevance and Search, James Mathewson et al.

And minimise ‘bounce’

The aim is to attract people to your site. But to engage and keep them on the site (rather than bouncing right off without clicking any links) you need to present them with relevant content.

Build plenty of internal links

Although not quite as valuable as external links, internal links generate link equity (or link juice). The closer the link text matches the keywords on the target page, the greater the value search engines will attribute to the link.

Tactics: Other SEO techniques

Technical

  • Check for broken links. The W3C Link Checker (http://validator.w3.org/checklink) provides a detailed report of the problems and how to fix them.
  • Use semantic, standards-compliant code.
  • Separate page structure (HTML) from formatting (CSS) and behaviour (JavaScript) – place each in a separate file. This also speeds up load times (the CSS and JavaScript cache in the browser and only need to be loaded once).
  • Publish robots.txt to tell spiders which pages not to crawl, e.g. dynamic search result pages that may display improperly without user input, 404 pages, image directories and login pages. [Note: CIPD does this]
  • Publish sitemap.xml and notify major search engines of your sitemap file. (See http://www.sitemaps.org for the standard XML sitemap format, which allows websites to communicate their structure to search engines, providing ‘hints for web crawlers to do a better job of crawling your site’.)
  • Create an HTML sitemap page – as well as being useful for users, it helps search engines crawl the site.
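Python's standard library can be used to test robots.txt rules and to generate a basic sitemap. In this sketch the example.org domain, the disallowed paths and the build_sitemap helper are hypothetical; urllib.robotparser checks whether a given crawler may fetch a URL, and the sitemap follows the minimal urlset/url/loc structure described at sitemaps.org. Note that robots.txt controls crawling, not indexing: a blocked URL can still be listed if other sites link to it.

```python
import urllib.robotparser
import xml.etree.ElementTree as ET

# Hypothetical robots.txt for an example domain.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /login
Sitemap: https://www.example.org/sitemap.xml
"""

robots = urllib.robotparser.RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())
print(robots.can_fetch("Googlebot", "https://www.example.org/factsheets/pestle"))  # True
print(robots.can_fetch("Googlebot", "https://www.example.org/search?q=pestle"))    # False


def build_sitemap(urls):
    """Return a minimal sitemap.xml document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap(["https://www.example.org/", "https://www.example.org/factsheets/pestle"]))
```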

Organisational

  • Avoid PDFs. Search engines do index PDF files, but HTML delivery offers better SEO opportunities (James Mathewson et al., Audience, Relevance and Search).
  • Do not restrict access to content behind a firewall or login screens. No one will publish a link to a logon screen page. By restricting access to content, you reduce its value to virtually nil (James Mathewson et al., Audience, Relevance and Search).
  • Have a clear site architecture. …The Google crawler mimics human web navigation, following links in its path through your content. If you design a clear site architecture, you make it easier on the crawler and help ensure that the crawler will find your content and pass its information to the search engine. (James Mathewson et al., Audience, Relevance and Search)
  • Develop robust web governance practices. ‘[Organisations need] a collaborative governance system, [otherwise] they end up building parallel experiences and competing with one another for the same audience.’ (James Mathewson et al., Audience, Relevance and Search.) Not to mention competing for the same keywords!
  • Don’t optimise your content for keywords after you’ve written it. The problem here is that there is often a disconnect between the content and the keywords that have been shoehorned in. (Google is also wise to such tactics and may remove the page from its index.) Instead do the keyword research up front and develop content that meets the terms that users are searching for.

(Finally, there’s lots of information on keyword research and on using keywords to develop a ‘search-first’ site architecture in Audience, Relevance and Search, Mathewson et al.)