Semantic Search | TF-IDF | Language & Linguistics

Keywords, TF-IDF and semantic search

Semantic search matters because search engines work primarily with the keywords in your text content, yet they also know a great deal about the semantics of natural language: the meanings of words and the relationships between them. Using synonyms, where appropriate, can help guard against keyword stuffing. Various patented algorithms and techniques analyse the text content across your website in increasingly complex ways to assess:

  • What your website is about, and what your topics mean.
  • Relevance to various search queries.
  • Its value compared with other websites, through E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) assessments.

Keyword Research…

The best webmasters know enough about SEO to do basic research on which keyword phrases are currently being searched for. The best SEO agencies will also know which of those are short-tail or long-tail phrases, which are transactional or informational, and how to target those terms across the Titles, Descriptions, headings and paragraphs of a website. Taken too far, however, this targeting becomes over-optimisation, which can do more harm than good.

However, that’s only part of the equation. Related words and phrases can also contribute additional weight and relevancy to the site and adding those into the mix can make a discernible difference. It is certainly a lot more sensible than keyword stuffing! The crucial factor in 2023 is knowing the intent and purpose of the Helpful Content algorithm and how to optimise your pages for it.

TF-IDF

The evolution of keyword use has encompassed Latent Semantic Indexing, semantic search and keyword density calculations. It has now arrived at TF-IDF:

TF-IDF, short for term frequency–inverse document frequency, is a measure of the importance of a word to a document in a collection or corpus, adjusted for the fact that some words appear more frequently in general.

Wikipedia – Tf-idf

So, for a given page and a given set of targeted keyword phrases, we can calculate which words are likely to deliver the best outcomes by comparing our page with the competing pages that rank for those phrases.
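As a rough illustration of that comparison, the sketch below computes TF-IDF weights for our page and two competitor pages, then lists the terms that carry weight on the competitor pages but never appear on ours. It is a minimal sketch under stated assumptions: the three example pages, the naive tokenizer and the `term_gaps` helper are hypothetical, and it uses the plain tf × log(N/df) weighting rather than any particular SEO tool's formula.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase, keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term / total terms in that document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def term_gaps(weights: list[dict[str, float]]) -> list[str]:
    """Hypothetical helper: terms weighted on competitor pages (weights[1:])
    that never appear on our page (weights[0]), highest total weight first."""
    ours = weights[0]
    totals: Counter = Counter()
    for competitor in weights[1:]:
        for term, w in competitor.items():
            if term not in ours and w > 0:
                totals[term] += w
    return [term for term, _ in totals.most_common()]


pages = [
    "home insurance policy cover for contents and property",  # our page
    "insurance premiums and policy cover for your home",      # competitor A
    "compare insurance rates and protection plans",           # competitor B
]
weights = tf_idf(pages)
print(weights[0]["insurance"])  # 0.0: the term appears on every page
print(term_gaps(weights))
```

Terms that appear on every page (here "insurance" and "and") get an IDF of log(1) = 0, which is how TF-IDF discounts words that are common everywhere and highlights the distinguishing ones.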

Semantics is the branch of linguistics and logic concerned with meaning.

Knowledge is knowing that a tomato is a fruit; wisdom is not putting it in the fruit salad.

Miles Kington

The two primary facets of semantics are:

  1. Logical Semantics: sense and reference, presupposition and implication
  2. Lexical Semantics: the analysis of word meanings and relations between them

Latent semantic analysis (LSA) was patented in 1988. In the context of its application to information retrieval, it is sometimes called Latent Semantic Indexing (LSI). Google has a number of patents involving synonyms, along with the “Reasonable Surfer” and “Phrase-Based Indexing” patents. (*1)

There is a raft of new SEO buzzwords emerging: Co-citation and Co-occurrence, along with algorithms for Entity Disambiguation, Lexical Co-Occurrence, re-ranking, Phrase-based Indexing and Synonymy, mixed in with Topical PageRank, Ontological Taxonomies and Entity Recognition. (*2)
One could be forgiven for suspecting that before long, Google will have made its algorithms so complex that SEO practitioners will require a PhD in Linguistics just to interpret the new jargon, acronyms and patents!

Several years ago, there was a lot of fuss and bother about what, if anything, Google does with LSI. People obsessed over whether it was only ever part of AdSense, was applied to organic search results, or was no longer used at all.

Let’s skirt around that particular debate, and apply an SEO expert’s common sense to the concept of using related words within the content of a page/website to reinforce the primary keyword phrase being targeted.

    Logic suggests that a website selling insurance policies could reasonably be expected to use quite a broad range of related words within the text. Aside from the singular and plural forms (policy, policies), there may be industry acronyms & jargon, plus related nouns, verbs and adjectives: insure, insured, insurer, premium, premiums, property, contents, cover, coverage, plan, plans, protect, protection, protected, existing, pre-existing, application, applications, pricing, rates, specify, specified… on and on it goes!

    For any given website that contains very detailed information about a particular genre, product, service or industry, there is a high probability that it would also include a predictable range of related words, phrases and terms. For example, we might see:

    • word derivations
    • synonyms (*2)
    • hypernyms & hyponyms
    • nouns, verbs and adjective variations
    • jargon & acronyms

    Search engine algorithms

    Search engine algorithms assess all the words within a site. These algorithms may lack direct human interpretation, but they are built on mathematics, knowledge, experience and intelligence, and they deliver a very accurate relevance analysis. Using related words and variations within your website is one good way of reinforcing the primary keyword phrase you wish to rank for, without overusing exact-match keywords and phrases.

    By using synonyms and a range of relevant nouns, verbs and adjectives, you may eliminate excessive repetition, describe your topic or theme more accurately and, at the same time, increase the range of word associations your website will rank for. Considering the term “Real Estate”, for example, the following options appear:

    • Variations or derived words; realtor, land agent, home for sale, house for sale, land for sale, house & land packages
    • Synonyms: Immovable, real property, realty
    • Hypernym: Belongings, holding, property
    • Hyponym: Acres, estate, land, landed estate
    • Definition: Property consisting of houses and land
    • Adjectives: private, secluded, view, vista, cosy, compact, spacious
    • Related Words: section, plot, land, house, home, buildings, garage, carport, outbuildings, mortgage, finance, home loans, codes, zones, residential, commercial, industrial and many more…
    • Jargon: Listing, 4B2B, closing costs, contingency, fixture, title costs
    • Acronyms: Cvac, Egdo, DOM, MLS, AVB, AVL, ASSG

    The purpose of the example is to show that there are usually many ways to expand the description of an activity, product or service. In doing so, you provide additional supporting indicators/descriptors that reinforce the content’s intent and purpose. These are components of all my website SEO services & packages.

    The English Language


    A sad fact of life is that the world’s educational systems seem intent on reducing the English language to its lowest common denominator. The language was once celebrated, and its history was taught in schools to show the evolution of word creation and usage. Some schools now allow cryptic phonetic spelling (as used in SMS/TXT messages and chat programs) in examinations! As a wordsmith, I find that truly appalling! Such butchery of the language is found on websites as well, where it is both aesthetically unpleasant and sure to hurt your rankings.

    I’m not a language snob by any means and often found grammar a struggle at school. I still don’t fully understand (or obsess about) what a “past participle” is, but I do love words for the way you can play on them and with them.

    Alternative word usage between countries, spelling differences (colour vs. color) and word choices (footpath vs. sidewalk) are important aspects to understand when you are working on a website. You need to know your target audience, and what they expect to see.

    In optimising pages for search engines, one of the most useful skills has always been a solid grasp of the English language, and an instinctive ability to explain complex concepts accurately and concisely within 65- and 160-character limits! Understanding the language allows you to substitute appropriate, shorter words to fit within the constraints of Titles, Descriptions, Headings etc.

    Spelling & Punctuation

    The ability to apply correct spelling is crucial and punctuation is important as it conveys meaning. That said, misspellings can be both a bad thing AND a good thing.

    Bad Spelling:

    Google et al. are quite good at guessing what you meant, as distinct from what you typed, in a search. However, you should make every effort to check and correct basic (silly) spelling mistakes, simply because they reflect negatively on you and your business… You never get a second chance to make a first impression…

    Commonly Misspelled Words:

    There are some words which are misspelled on a regular basis, and one of those is “accommodation”. Variations such as “accomodation, accommation, accomadation, accommodations” are common, and for a bed & breakfast or lodge website it can be useful to include the various misspellings within the site. The volume of misspelt search phrases may vary across countries, but not everyone understands that there’s traffic to be gained from common errors.

    If a misspelt word clearly shows as high-volume in your keyword research, include it within the visible page text. Spell checkers are an integral part of most word processing applications, and it makes sense to use them, but they are not without their problems: being correct is not always the same as being right!
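Misspellings like those above sit only one or two single-character edits away from the correct word, which is one reason they are easy to match. The sketch below measures that closeness with Levenshtein edit distance, a common string-similarity technique. It is an illustrative assumption, not a description of how Google or any other search engine actually corrects queries, and the `suggest` helper and its word list are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def suggest(word: str, vocabulary: list[str]) -> str:
    """Hypothetical helper: return the vocabulary word closest to `word`."""
    return min(vocabulary, key=lambda v: levenshtein(word, v))


for typo in ["accomodation", "accommation", "accomadation"]:
    print(typo, levenshtein(typo, "accommodation"))
# accomodation 1
# accommation 2
# accomadation 2

print(suggest("accomadation", ["accommodation", "accumulation", "recommendation"]))
```

The small distances explain why a searcher typing "accomadation" can still be matched to "accommodation"; the code cannot tell you, however, whether a misspelling has enough search volume to be worth including on the page.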

    I have a spelling checker
    It came with my PC
    It highlights for my revue
    Mistakes I cannot sea.

    I ran this poem thru it
    I’m sure your pleased to no
    Its letter perfect in it’s weigh
    My checker told me sew.

    – by Mark Eckman

    Applying Language Variables in Your Website

    You might thoughtfully include variations, derivations, synonyms, hyponyms and hypernyms of your primary keywords and phrases as/if appropriate. You may then more effectively convince the search engines of your content’s theme than you would by stuffing multiple iterations of the same primary keywords into those page/s.

    The proviso is that you must write your textual content in a natural style, for your reading audience, and include additional words only where it makes sense in the context of the sentence/paragraph/page. Use the variations above in the same way you’d use condiments on a meal – sparingly, to add flavour and enhance tastes…

    The same logic applies in terms of building links. In this day and age, it is a serious error of judgment to overdo the use of exact-match high-volume keywords in your link-building efforts! Instead, understand that it is safer and more effective to build additional links that include “Brand + A Keyword + Synonyms/Related Words” in the anchor text, and probably safer still to just use the domain name, business and/or brand name. It appears that the not-too-distant future will include greater emphasis on references to you rather than links to you, as per the co-occurrence theories that are gaining credence! (*3)

    Word Tools

    Even those with a professional grasp of the English language and its usage will need access to research tools to identify appropriate words relevant to the topic they are working on. A couple of big dictionaries can be useful, but let us not make the job any more difficult than it needs to be. There are both downloadable and online tools available:

    • The Sage – free download dictionary & thesaurus with synonyms, hyponyms, hypernyms and more!
    • OneLook.com – online dictionary search
    • Thesaurus.com – online thesaurus search

    The lack of affordable, cutting-edge tools is the main reason I decided not to switch professions and become a computer hacker. I’ve always been a Leatherman user, but my Micra version lacks the requisite heft… But I digress…

    The Challenges in Semantic Search

    The problem of deductive interpretation of the English language is compounded by the in-built subtleties and nuances. History clearly shows that Google is continually filing new patents across a broad range of knowledge-based disciplines. They employ an ever-growing army of bright young things to develop more… My expectation for 2013 is that this will continue unabated.

    Development of new predictive language models that can identify, translate and deduce implied meaning will not be easy. Implementing those new models into future search algorithms will be equally difficult. Delivering consistently reliable results will be an ongoing challenge, not least because language continually evolves.

    Added to that is the even more troublesome question of how to apply semantic analysis and interpretive reasoning to distil meaning from euphemisms, comedy, innuendo, sarcasm, satire, slang, parables, parody and wit within the framework of semantic search. A decline in educational standards has been reported in the US, India, Pakistan and other countries in the past year, which perhaps explains why so few people can now detect and interpret whimsy, for example. Human interpretive skills are being eroded by failures in global education systems, and this makes it harder to find people who can model semantic concepts in the digital world…

    Applied Semantics

    Applied semantics will make search engines ever smarter. (*4) There are evolving theories on the concepts of Lexical Co-Occurrence and Entity Disambiguation, which suggest that Google may be putting in place an infrastructure that will enable greater reliance on these as document ranking factors. Conversely, the current reliance on links (domain authority and anchor text) may gradually fade in importance.
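To make the idea of lexical co-occurrence a little more concrete, here is a minimal sketch that counts how often words appear within a few tokens of each other. The window size, the tiny stop-word list and the example text are hypothetical choices for illustration only; how Google might weigh co-occurrence is not public, and this is not its method.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"a", "and", "before", "for", "the", "you", "your"}


def cooccurrence(text: str, window: int = 3) -> Counter:
    """Count unordered pairs of words occurring within `window` tokens of each
    other, after dropping stop words. For simplicity the window is allowed to
    cross sentence boundaries."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]
    pairs: Counter = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


def neighbours(pairs: Counter, term: str) -> Counter:
    """Collapse pair counts into how often each word appears near `term`."""
    near: Counter = Counter()
    for (a, b), n in pairs.items():
        if a == term:
            near[b] += n
        elif b == term:
            near[a] += n
    return near


text = (
    "Home insurance protects your home. Contents insurance covers your "
    "belongings. Compare insurance premiums before you buy cover."
)
near = neighbours(cooccurrence(text), "insurance")
print(near.most_common(3))
```

Over a large body of text, counts like these show which words habitually travel together (here "home" sits closest to "insurance"), which is the intuition behind treating co-occurring terms as signals of topical relevance.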

    Advances in Personalised Search technology mean that Googling can already reveal the answers to questions of grave concern, as this quoted example shows:

    “Google will tell you that the number 1 sign of alcoholism is drinking alone. I feel that the number 1 sign of alcoholism is having to Google ‘number 1 sign of alcoholism.’” – Dan Frigolette

    That all this and much more may one day be deduced by matching up your Profile’s “likes & dislikes” on social media databases with predictive search is truly amazing!

    In my opinion, trying to keep abreast of the changes in search through 2012 and into 2013 is already contributing to a future statistical spike in applied alcoholism. This is a view clearly supported by strong growth in Jack Daniel’s sales worldwide. On March 6, 2013, Reuters reported that “Worldwide demand for its Jack Daniel’s whiskey helped U.S. distiller Brown-Forman Corp to beat Wall Street profit estimates for a third consecutive quarter, and the company said full-year sales would rise.”

    LSI & Mathematical Probabilities

    The mathematical probability that a site that is genuinely about a specific topic ALSO includes multiple iterations of related words, phrases and terms is high, e.g.:

    • nouns, verbs and adjective variations
    • word derivations
    • synonyms
    • hypernyms
    • hyponyms

    The major search engine algorithms now apply “latent semantic indexing”, taking into account all word relationships within the site. Whilst an algorithm might lack “intelligence”, the mathematical model is quite robust and delivers extremely accurate relevancy assessments. To apply the concept of LSI to your site, it’s a matter of NOT overloading your page with primary keywords (spamming). Instead, use variations to describe your topic or theme more accurately. For example, for the term “SEARCH” the following options appear:

    Variations or derived words: searcher, searched, searching

    Synonyms: query, queried, querying, seeking, looking, finding

    Hypernym: activity, examination, examine, higher cognitive process, investigate, investigating, investigation, look into, operation, scrutiny, see.
    > Synonym: explore, hunt, hunting, look, look for, lookup, research, seek.
    > Hyponym: angle, beat about, browse, cast about, cast around, comb, cruise, divine, drag, dredge, exploration, feel, finger, fish, forage, foraging, frisk, frisking, fumble, gather, go, go after, grope, grub, hunt, leave no stone unturned, looking, looking for, manhunt, nose, poke, prospect, pry, pursuance, pursue, pursuit, quest, quest after, quest for, raid, ransack, ransacking, re-explore, rifle, rummage, scan, scour, scouring, seek out, seeking, shakedown, shop, strip-search, surf, want.
    > Derived: searcher

    Noun

    > Hypernym: activity, examination, higher cognitive process, investigating, investigation, operation, scrutiny.
    > Synonym: hunt, hunting, lookup.
    > Hyponym: exploration, forage, foraging, frisk, frisking, hunt, looking, looking for, manhunt, pursuance, pursuit, quest, ransacking, rummage, scouring, seeking, shakedown.

    Verb

    > Derived: searcher.
    > Synonym: explore, look, look for, research, seek.
    > Hypernym: examine, investigate, look into, see.
    > Hyponym: angle, beat about, browse, cast about, cast around, comb, cruise, divine, drag, dredge, feel, finger, fish, frisk, fumble, gather, go, go after, grope, grub, hunt, leave no stone unturned, nose, poke, prospect, pry, pursue, quest after, quest for, raid, ransack, re-explore, rifle, rummage, scan, scour, seek out, shop, strip-search, surf, want.

    The purpose of the example is to show that there are many ways to describe the same activity, product or service.
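For readers curious about the mathematics behind LSI, the textbook formulation of latent semantic analysis builds a term-document matrix and keeps only the strongest dimensions of its singular value decomposition (SVD). The sketch below assumes NumPy and six made-up mini-documents; it shows how two documents that share no words can still land on the same latent "concept" because other documents link their vocabularies. It illustrates the general technique only and makes no claim about how any search engine implements it.

```python
import numpy as np

docs = [
    "query lookup",           # 0: search-themed, shares no words with doc 1
    "seek explore",           # 1: search-themed, shares no words with doc 0
    "search query seek",      # 2: links the vocabularies of docs 0 and 1
    "search lookup explore",  # 3: links the vocabularies of docs 0 and 1
    "house land mortgage",    # 4: real estate
    "house realtor listing",  # 5: real estate
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-document matrix: rows are terms, columns are documents (raw counts).
A = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        A[index[word], j] += 1


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Truncated SVD: keep only the k strongest latent dimensions ("concepts").
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document

print(cosine(A[:, 0], A[:, 1]))                # 0.0: no shared words
print(cosine(doc_vectors[0], doc_vectors[1]))  # close to 1: same latent concept
print(cosine(doc_vectors[0], doc_vectors[4]))  # close to 0: different concept
```

Raw word matching scores documents 0 and 1 as unrelated, while the two-dimensional latent space places them together; that is the intuition behind describing a topic with related words rather than repeating one exact-match phrase.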

    Application of Latent Semantic Indexing

    Implement LSI on your site by thoughtfully including variations, derivations, synonyms, hyponyms and hypernyms of your primary keywords and phrases. You will more effectively convince the Search Engines of your content theme than you would by stuffing multiple iterations of the same primary keywords into those page/s!

    There’s no knowing where all of this will take us, but it’s bound to make for another interesting decade…

    Well, it’s been a funny old day and I have a wee headache from all those EBWs (Exceedingly Big Words). Life was so much simpler back in 1997, when I was an early adopter of SEO. One thing today’s research showed with chilling clarity is that I’m getting ever closer to my use-by date! I also admit to muted alarm when I learned about the potential dangers of drinking alone. That said, the house has gone quiet, and my two companions are obviously thirsty. Best I end this article now and pour three liquid ambers: for me, myself and I.

    References:

    1. Link-assistant – semantic search optimization
    2. Ignite Visibility – semantic search
    3. Social Media Today – Elon Musk says semantic search coming soon to X

    Page last updated on Thursday, October 12, 2023 by the author Ben Kemp