Language, Linguistics, Semantics & Search

Search engines primarily work with words in text content… By definition, they know a lot about the semantics of natural language, about meanings and relationships between words. Various patented techniques analyse the text content across your website in ever more complex ways to assess:

  • what your website is about
  • its relevance to various search queries
  • its respective value compared to other websites

Many people know enough to research what keyword search phrases are in current use. They know which of those are short-tailed or long-tailed phrases, and how to target those exact-match terms across Titles, Descriptions, headings and paragraphs within their website.

However, that’s only part of the equation. Related words and phrases also contribute additional weight and relevancy to the site and adding those into the mix can make a discernible difference. It is certainly a lot more sensible than keyword stuffing! Knowledge is knowing that a tomato is a fruit; wisdom is not putting it in the fruit salad.

Semantics is the branch of linguistics and logic concerned with meaning. The two primary facets are:

  1. Logical Semantics: sense and reference, presupposition and implication
  2. Lexical Semantics: the analysis of word meanings and relations between them

Latent semantic analysis (LSA) was patented in 1988. In the context of its application to information retrieval, it is sometimes called Latent Semantic Indexing (LSI). Google has a number of patents involving Synonyms, along with “Reasonable Surfer” and the “Phrase-Based indexing” patents. (*1)

There is a raft of new SEO buzzwords emerging: Co-citation and Co-occurrence, along with algorithms for Entity Disambiguation, Lexical Co-Occurrence, Re-ranking, Phrase-based Indexing and Synonymy, mixed in with Topical PageRank, Ontological Taxonomies and Entity Recognition. (*2)
One could be forgiven for suspecting that before long, Google will have made its algorithms so complex that SEO practitioners will require a PhD in Linguistics just to interpret the new jargon, acronyms and patents!

Several years ago, there was a lot of fuss and bother about what – if anything – Google does with LSI. People obsessed about whether it was only ever a part of AdSense, or applied to organic search results – or not used at all anymore.

Let’s skirt around that particular debate, and apply some common sense to the concept of using related words within the content of a page / website to reinforce the primary keyword phrase being targeted.

Probabilities

Logic suggests that a website selling insurance policies could reasonably be expected to use quite a broad range of related words within the text. Aside from the singular and plural (policy, policies), there may be industry acronyms & jargon, plus related verbs, adjectives and nouns: insure, insured, insurer, premium, premiums, property, contents, cover, coverage, plan, plans, protect, protection, protected, existing, pre-existing, application, applications, pricing, rates, specify, specified… on and on it goes!

For any given website that contains very detailed information about a particular genre, product, service or industry, there is a high probability that it would also include a predictable range of related words, phrases and terms. For example, we might see:

  • word derivations
  • synonyms (*2)
  • hypernyms & hyponyms
  • nouns, verbs and adjective variations
  • jargon & acronyms

Search engine algorithms assess all the words within the site. These algorithms may be bereft of direct human interpretation but are based on mathematics, knowledge, experience and intelligence. They deliver very accurate relevance analysis. Using related words or variations within your website is one good way of reinforcing the primary keyword phrase you wish to rank for, without over-use of exact-match keywords and phrases.

By using synonyms, and a range of relevant nouns, verbs and adjectives, you can eliminate excessive repetition and describe your topic or theme more accurately while, at the same time, increasing the range of word associations your website will rank for. For example, for the term “Real Estate”, the following options appear:

  • Variations or derived words: realtor, land agent, home for sale, house for sale, land for sale, house & land packages
  • Synonyms: immovable, real property, realty
  • Hypernyms: belongings, holding, property
  • Hyponyms: acres, estate, land, landed estate
  • Definition: property consisting of houses and land
  • Adjectives: private, secluded, view, vista, cosy, compact, spacious
  • Related Words: section, plot, land, house, home, buildings, garage, carport, outbuildings, mortgage, finance, home loans, codes, zones, residential, commercial, industrial and many more…
  • Jargon: Listing, 4B2B, closing costs, contingency, fixture, title costs
  • Acronyms: Cvac, Egdo, DOM, MLS, AVB, AVL, ASSG

The purpose of the example is to show that there are usually many ways to expand the description of an activity, product or service. In doing so, you provide additional supporting indicators / descriptors that reinforce the content’s intent and purpose.
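As a rough illustration of the idea, the sketch below counts how many of the related terms from the “Real Estate” example actually appear in a piece of page text. The grouping of terms into a dictionary, the function name related_term_coverage and the sample sentence are all assumptions made for this example; it says nothing about how any search engine really scores a page.

```python
import re

# Term groups borrowed from the "Real Estate" list above. The dict layout
# and the function name are illustrative assumptions, not a real SEO API.
RELATED_TERMS = {
    "synonyms": ["real property", "realty"],
    "hypernyms": ["property", "holding"],
    "hyponyms": ["land", "estate", "acres"],
    "related": ["mortgage", "home loans", "residential", "garage"],
}


def related_term_coverage(text, term_groups):
    """Count whole-word (or whole-phrase) occurrences of each term, per group."""
    lowered = text.lower()
    coverage = {}
    for group, terms in term_groups.items():
        found = {}
        for term in terms:
            # \b keeps "land" from matching inside "landed" or "island".
            pattern = r"\b" + re.escape(term) + r"\b"
            hits = len(re.findall(pattern, lowered))
            if hits:
                found[term] = hits
        coverage[group] = found
    return coverage


# Hypothetical page copy, written only for this example.
sample = (
    "Our realty team lists residential property across the region. "
    "Each home comes with land, a garage and advice on home loans."
)

report = related_term_coverage(sample, RELATED_TERMS)
for group, found in report.items():
    print(group, found)
```

A group that comes back empty is simply a prompt to ask whether a synonym or related word would read naturally somewhere on the page; it is not a target to hit.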

The English Language

A sad fact of life is that education systems around the world seem intent on reducing the English language to its lowest common denominator. Language was once celebrated and its history interpreted in schools to show the evolution of word creation and usage. These days, some schools allow cryptic phonetic spelling (as per SMS/TXT messages and Chat programs) to be used in examinations! As a wordsmith, I find that to be truly appalling! Such butchery of the language is found on websites as well, something that’s both aesthetically unpleasant and sure to minimise your rankings.

I’m not a language snob by any means and often found grammar a struggle at school. I still don’t fully understand (or obsess about) what a “past participle” is, but I do love words for the way you can play on them and with them.

Alternative word usage between countries, spelling differences (colour vs. color) and word choices (footpath vs. sidewalk) are important aspects to understand when you are working on a website. You need to know your target audience, and what they expect to see.

In optimising pages for search engines, one of the most useful skills one could possess has always been a solid grasp of the English language, and an instinctive ability to accurately and concisely explain complex concepts within 65- and 160-character constraints! Understanding the language allows you to substitute appropriate but shorter words to fit within the constraints of Titles, Descriptions, Headings, etc.
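The character limits mentioned above are easy to check mechanically. This is a minimal sketch that assumes the 65-character figure applies to Titles and the 160-character figure to Descriptions; the function name check_length and the sample strings are hypothetical.

```python
# Limits taken from the paragraph above; mapping 65 to Titles and 160 to
# Descriptions is an assumption, as are the names used here.
TITLE_LIMIT = 65
DESCRIPTION_LIMIT = 160


def check_length(label, text, limit):
    """Return a one-line report on whether `text` fits within `limit`."""
    length = len(text.strip())
    status = "OK" if length <= limit else f"over by {length - limit}"
    return f"{label}: {length}/{limit} characters ({status})"


# Hypothetical Title and Description for a real estate page.
title = "Real Estate for Sale | House & Land Packages"
description = (
    "Browse homes, land and house & land packages, "
    "with advice on mortgages and home loans."
)

print(check_length("Title", title, TITLE_LIMIT))
print(check_length("Description", description, DESCRIPTION_LIMIT))
```

When a line comes back as over the limit, that is the cue to reach for a shorter synonym rather than to chop the sentence mid-thought.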

Spelling & Punctuation

The ability to apply correct spelling is crucial and punctuation is important as it conveys meaning. That said, misspellings can be both a bad thing AND a good thing.

Bad Spelling:

Google et al are quite good at guessing what you meant as distinct from what you typed in a search. However, you should make every effort to check and correct basic (silly) spelling mistakes, simply because they reflect badly on the image of you and your business… You never get a second chance to create a first impression…

Commonly Misspelled Words:

There are some words which are misspelled on a regular basis and one of those is “accommodation.” Variations such as “accomodation, accommation, accomadation, accommodations” are not uncommon and for a bed & breakfast or lodge website, it can be useful to include the various misspellings within the site. The volume of misspelled search phrases may vary across countries, but not everyone understands that there’s traffic to be gained from common errors.
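One way to spot such variants in a list of search phrases is to compare each phrase against the correct spelling. The sketch below uses difflib from Python’s standard library; the function name likely_misspellings, the 0.8 similarity cutoff and the sample queries are assumptions chosen for this example, not values drawn from any keyword tool.

```python
import difflib


def likely_misspellings(target, candidates, cutoff=0.8):
    """Return candidates that closely resemble `target` without matching it."""
    close = difflib.get_close_matches(
        target,
        candidates,
        n=max(1, len(candidates)),  # n must be positive, even for an empty list
        cutoff=cutoff,
    )
    return [word for word in close if word != target]


# Hypothetical search terms, e.g. exported from keyword research.
queries = ["accomodation", "accomadation", "accommodation", "holiday", "accommation"]

print(likely_misspellings("accommodation", queries))
```

The cutoff is a judgement call: set it too low and unrelated words creep in, too high and genuine misspellings are missed.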

If a misspelled word is clearly showing as high-volume in your keyword research, include it within the visible page text. Spell checkers are an integral part of most word processing applications – it makes sense to use them, but they are not without their problems – being correct is not always equal to being right!

I have a spelling checker
It came with my PC
It highlights for my revue
Mistakes I cannot sea.

I ran this poem thru it
I’m sure your pleased to no
Its letter perfect in it’s weigh
My checker told me sew.

– by Mark Eckman

Applying Language Variables

In Your Website

Thoughtfully include variations, derivations, synonyms, hyponyms and hypernyms of your primary keywords and phrases. You will more effectively convince the search engines of your content’s theme than you would by stuffing multiple iterations of the same primary keywords into those pages.

The proviso is that you must write your textual content in a natural style, for your reading audience, and include additional words only where it makes sense in the context of the sentence / paragraph / page. Use the variations above in the same way you’d use condiments on a meal – sparingly, to add flavour and enhance tastes…

In Your Link Anchor Text

The same logic applies in terms of building links. In this day and age, it is a serious error of judgement to overdo the use of exact-match high-volume keywords in your link-building efforts! Instead, understand that it is safer and more effective to build additional links that include “Brand + A Keyword + Synonyms/Related Words” in the anchor text, and probably safer still to just use the domain name, business and/or brand name. It appears that the not too distant future will include greater emphasis on references to you rather than links to you, as per the co-occurrence theories that are gaining credence! (*3)
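To see how heavily a link profile leans on exact-match anchors, the anchor texts can be sorted into rough categories. The sketch below is only that: the category names, the brand “Acme Realty” and the list of anchors are all invented for illustration, and the matching is plain substring comparison.

```python
from collections import Counter


def classify_anchor(anchor, brand, keyword):
    """Label one anchor text by how it uses the brand and the keyword."""
    text = anchor.lower().strip()
    brand, keyword = brand.lower(), keyword.lower()
    if text == keyword:
        return "exact-match"
    has_brand = brand in text
    has_keyword = keyword in text
    if has_brand and has_keyword:
        return "brand + keyword"
    if has_brand:
        return "brand"
    if has_keyword:
        return "partial-match"
    return "other"


def anchor_profile(anchors, brand, keyword):
    """Return each category's share of the anchors, as a fraction of the total."""
    labels = Counter(classify_anchor(a, brand, keyword) for a in anchors)
    total = sum(labels.values())
    return {label: count / total for label, count in labels.items()}


# Hypothetical backlink anchors for a hypothetical brand.
anchors = [
    "real estate",
    "Acme Realty",
    "Acme Realty real estate listings",
    "acmerealty.example",
    "real estate",
    "homes and land from Acme Realty",
]

profile = anchor_profile(anchors, brand="Acme Realty", keyword="real estate")
for label, share in profile.items():
    print(f"{label}: {share:.0%}")
```

What counts as “too much” exact-match is not something the code can tell you; it only makes the mix visible.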

Word Tools

Even those with a professional grasp of the English language and its usage will need access to research tools to identify appropriate words relevant to the topic they are working on. A couple of big dictionaries can be useful, but let’s not make the job any more difficult than it need be. There are both downloadable and online tools available:

  • The Sage – free download dictionary & thesaurus with synonyms, hyponyms, hypernyms and more!
  • OneLook.com – online dictionary search
  • Thesaurus.com  – online thesaurus search

The lack of affordable, cutting-edge tools is the main reason I decided not to switch professions and become a computer hacker. I’ve always been a Leatherman user, but my Micra version lacks the requisite heft… But, I digress…

The Challenges

The problem of deductive interpretation of the English language is compounded by the in-built subtleties and nuances. History clearly shows that Google is continually filing new patents across a broad range of knowledge-based disciplines, and employing an ever-growing army of bright young things to develop more… My expectation for 2013 is that this will continue unabated.

Development of new predictive language models that can identify, translate and deduce implied meaning will not be easy. Implementing those new models into future search algorithms will be equally difficult, and delivering consistently reliable results will be an ongoing challenge, not least because language continually evolves.

Added to that is the even more troublesome aspect of how to apply semantic analysis and interpretive reasoning to distil meaning from euphemisms, comedy, innuendo, sarcasm, satire, slang, parables, parody and wit within the framework of semantic search. The decline in educational standards reported in the US, India, Pakistan and other countries in the past year perhaps explains why so few people can now detect and interpret whimsy, for example. Human interpretive skills are being eroded by failure in global education systems, making it harder to find people who can model semantic concepts in the digital world…

Summary

Applied semantics will make search engines increasingly smarter. (*4) There are evolving theories on the concept of Lexical Co-Occurrence and Entity Disambiguation which indicate that Google may be putting in place an infrastructure that will enable greater reliance on these as document ranking factors. Conversely, the current reliance on links – domain authority and anchor text – may gradually fade in importance.

Advances in Personalised Search technology mean that Googling can already reveal the answers to questions of grave concern, as this quoted example shows:

“Google will tell you that the number 1 sign of alcoholism is drinking alone. I feel that the number 1 sign of alcoholism is having to Google ‘number 1 sign of alcoholism.’ ” – Dan Frigolette.

That all this and much more may one day be deduced by matching up your Profile’s “likes & dislikes” on social media databases with predictive search is truly amazing!

In my opinion, trying to keep abreast of the changes in search through 2012 and into 2013 is already contributing to a future statistical spike in applied alcoholism, a view clearly supported by strong growth in Jack Daniel’s sales world-wide. On March 6 2013, Reuters reported that “Worldwide demand for its Jack Daniel’s whiskey helped U.S. distiller Brown-Forman Corp to beat Wall Street profit estimates for a third consecutive quarter, and the company said full-year sales would rise.”

There’s no knowing where all of this will take us, but it’s bound to make for another interesting year…

Well, it’s been a funny old day and I have a wee headache from all those EBWs… (Exceedingly Big Words). Life was so much simpler back in 1997, and being an early adopter of SEO, one thing today’s research showed with chilling clarity is that I’m getting ever-closer to my use-by date! I also admit to muted alarm when I learned about the potential dangers of drinking alone. That said, the house has gone quiet, and my two companions are obviously thirsty. Best I end this article now and pour three liquid ambers – for me, myself and I…

References:

1 – http://www.seobythesea.com/2012/11/not-all-anchor-text-is-equal-other-co-citation-observations/

2 – http://www.seobythesea.com/2011/02/more-ways-a-search-engine-might-identify-synonyms-to-expand-queries-with/

3 – http://www.iacquire.com/blog/its-not-co-citation-but-its-still-awesome/

4 – http://searchenginewatch.com/article/2200995/Is-Google-Afraid-of-the-Big-Bad-Wolfram

5 – http://www.seomoz.org/blog/semantic-web-and-link-building-without-links-the-future-for-seo