
Leveraging AI Embeddings for Related Content Classification

Explores using AI embeddings and cosine similarity to improve related content recommendations in Hugo, enhancing semantic relevance, user navigation, and AI discoverability efficiently.


Mid-last year, I transitioned my website to Hugo and since then have been exploring AI-driven content classification. A common feature I have always appreciated is the “related content” recommendation, suggesting to readers what’s next or what else might be of interest. Although Hugo’s built-in related content functionality is perfectly serviceable, relying on parameters like tags, keywords, and headings, I believed there was room for something more sophisticated.

In general, my focus for the site has shifted towards Generative Experience Optimisation (GEO), which aims to improve the reading experience for both humans and generative AI agents. Traditional Search Engine Optimisation (SEO) often prioritises keyword density, sometimes at the expense of readability. GEO instead optimises for clarity and semantic relevance, which benefits human understanding and AI comprehension alike. With this shift, Hugo’s standard method, though efficient for basic needs, felt insufficient.

The challenge was clear: Hugo’s built-in related content system, based on static parameters, lacked semantic understanding. It didn’t recognise deeper contextual relationships between articles beyond shared tags or keywords.

I initially considered leveraging my existing classification capabilities, but the computation involved would have been excessive: approximately 2.56 million API calls for my content catalogue, which would have been both expensive and slow.
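The 2.56 million figure is consistent with comparing every item against every other item. A quick sketch of that arithmetic, assuming a catalogue of roughly 1,600 items (the count given later in this post) and one API call per comparison:

```powershell
# Assumption: roughly 1,600 content items and one classification call per comparison.
$itemCount = 1600

# Every item compared against every item, in both directions.
$orderedPairs = $itemCount * $itemCount

# If A-vs-B and B-vs-A count as one comparison and self-comparisons are skipped.
$uniquePairs = $itemCount * ($itemCount - 1) / 2

"{0:N0} ordered pairs, {1:N0} unique pairs" -f $orderedPairs, $uniquePairs
```

Even the de-duplicated count is well over a million calls, whereas an embeddings approach needs only one call per item.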

Creating an Embeddings Repository

Instead, I opted for a more fun, computationally efficient method: OpenAI Embeddings. Embeddings convert textual content into numerical vectors, capturing semantic meaning and enabling sophisticated comparisons.

The power of embeddings lies in their semantic understanding. Rather than purely lexical comparisons, embeddings identify relationships in meaning. This makes them perfect for establishing truly relevant connections between articles.


The first step was to generate embeddings for each piece of content using OpenAI’s Embeddings API. The cost-efficiency of this method was striking; I generated embeddings for around 1,600 content pieces (about 3,876 requests, including debugging) at a minimal total cost of $0.66.

function Get-OpenAIEmbedding {
    param (
        # Text to embed
        [Parameter(Mandatory)]
        [string]$Content,

        # Embedding model name
        [string]$Model = "text-embedding-3-large",

        # OpenAI API key; defaults to the OPENAI_API_KEY environment variable
        [string]$OPEN_AI_KEY = $env:OPENAI_API_KEY
    )

    # Build the JSON request body
    $body = @{
        input = $Content
        model = $Model
    } | ConvertTo-Json

    $response = Invoke-RestMethod `
        -Uri "https://api.openai.com/v1/embeddings" `
        -Method Post `
        -Headers @{ "Authorization" = "Bearer $OPEN_AI_KEY" } `
        -ContentType "application/json" `
        -Body $body

    # One input was sent, so the first entry holds its embedding vector
    return $response.data[0].embedding
}

To manage these embeddings efficiently, I stored them locally and synced them to Azure Blob Storage, creating a reusable, easily accessible repository. Although the cost was minimal, the runtime to generate embeddings for 1,600 items was non-trivial, so syncing them to cloud storage significantly reduces processing time for future operations.
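One way to avoid regenerating embeddings is to key each cached file on a hash of the content, so that a new embedding is only requested when the text changes. This is a minimal sketch rather than the site's actual implementation: `Get-CachedEmbedding`, the one-file-per-hash layout of `-CacheFolder`, and the `-Generator` script block are hypothetical names introduced for illustration.

```powershell
function Get-CachedEmbedding {
    param (
        [Parameter(Mandatory)]
        [string]$Content,

        # Hypothetical local folder holding one JSON file per content hash
        [Parameter(Mandatory)]
        [string]$CacheFolder,

        # Script block that produces the embedding on a cache miss,
        # e.g. { param($text) Get-OpenAIEmbedding -Content $text }
        [Parameter(Mandatory)]
        [scriptblock]$Generator
    )

    # Hash the content so that a changed article gets a new cache entry
    $sha = [System.Security.Cryptography.SHA256]::Create()
    try {
        $bytes = [System.Text.Encoding]::UTF8.GetBytes($Content)
        $hash = [System.BitConverter]::ToString($sha.ComputeHash($bytes)) -replace '-', ''
    }
    finally {
        $sha.Dispose()
    }

    $cacheFile = Join-Path $CacheFolder "$hash.json"

    if (Test-Path $cacheFile) {
        # Cache hit: reuse the stored vector without calling the generator
        return [double[]](Get-Content $cacheFile -Raw | ConvertFrom-Json)
    }

    # Cache miss: generate the embedding and store it for next time
    $embedding = [double[]](& $Generator $Content)
    New-Item -ItemType Directory -Path $CacheFolder -Force | Out-Null
    ConvertTo-Json -InputObject $embedding -Compress | Set-Content -Path $cacheFile -Encoding UTF8
    return $embedding
}
```

Passing the generator as a script block keeps the caching logic independent of the API call, which also makes it easy to exercise without network access.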


AzCopy is your friend here, as it minimises upload and download time.
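As a sketch, the round trip can be a pair of `azcopy sync` calls, which only transfer files that differ between source and destination. The local folder, the `<account>` and `<container>` placeholders, and the `EMBEDDINGS_SAS` environment variable holding a SAS token are all assumptions for illustration, not the values used on this site.

```powershell
# Placeholders: substitute your own folder, storage account, container, and SAS token.
$localFolder  = "./.data/embeddings"
$containerUrl = "https://<account>.blob.core.windows.net/<container>?$($env:EMBEDDINGS_SAS)"

# Download: pull down any embeddings that are missing or outdated locally
azcopy sync $containerUrl $localFolder --recursive

# ... generate embeddings for new or changed content here ...

# Upload: push newly generated embeddings back to Blob Storage
azcopy sync $localFolder $containerUrl --recursive
```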

Calculating Cosine Similarity

With the embeddings in place, the next task was to calculate the semantic similarity between content items using cosine similarity. This measure is the cosine of the angle between two embedding vectors, giving a score that ranges from -1 (pointing in opposite directions) to 1 (pointing in the same direction).

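function Get-EmbeddingCosineSimilarity {
    param (
        # Doubles keep the full precision of the parsed JSON values
        [double[]]$VectorA,
        [double[]]$VectorB
    )

    # Vectors of different dimensions cannot be compared
    if ($VectorA.Length -ne $VectorB.Length) {
        throw "Vectors must have the same length."
    }

    $dotProduct = 0.0
    $magnitudeA = 0.0
    $magnitudeB = 0.0

    # Accumulate the dot product and the squared magnitudes in one pass
    for ($i = 0; $i -lt $VectorA.Length; $i++) {
        $dotProduct += $VectorA[$i] * $VectorB[$i]
        $magnitudeA += [Math]::Pow($VectorA[$i], 2)
        $magnitudeB += [Math]::Pow($VectorB[$i], 2)
    }

    # A zero vector has no direction, so treat it as unrelated
    if ($magnitudeA -eq 0 -or $magnitudeB -eq 0) {
        return 0
    }

    return $dotProduct / ([Math]::Sqrt($magnitudeA) * [Math]::Sqrt($magnitudeB))
}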

This step was computationally intensive due to comparing each content item against all others. To manage this, I cached similarity scores above a threshold (0.5 or higher), significantly reducing future computations.
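A sketch of that pairwise pass follows, with two assumptions of mine: because cosine similarity is symmetric, each pair is scored once and recorded for both items, and anything below the threshold is discarded. `Get-RelatedBySimilarity`, the `Id` and `Vector` property names, and the compact `Get-Cosine` helper are hypothetical, included only to keep the example self-contained.

```powershell
# Compact cosine helper so the sketch runs on its own
function Get-Cosine {
    param ([double[]]$A, [double[]]$B)
    $dot = 0.0; $magA = 0.0; $magB = 0.0
    for ($i = 0; $i -lt $A.Length; $i++) {
        $dot  += $A[$i] * $B[$i]
        $magA += $A[$i] * $A[$i]
        $magB += $B[$i] * $B[$i]
    }
    if ($magA -eq 0 -or $magB -eq 0) { return 0.0 }
    return $dot / ([Math]::Sqrt($magA) * [Math]::Sqrt($magB))
}

function Get-RelatedBySimilarity {
    param (
        # Items with an Id and an embedding Vector (hypothetical shape)
        [Parameter(Mandatory)]
        [object[]]$Items,

        [double]$Threshold = 0.5
    )

    # One list of related entries per item Id
    $related = @{}
    foreach ($item in $Items) {
        $related[$item.Id] = [System.Collections.Generic.List[object]]::new()
    }

    # Score each unordered pair once (j always starts after i)
    for ($i = 0; $i -lt $Items.Count; $i++) {
        for ($j = $i + 1; $j -lt $Items.Count; $j++) {
            $left = $Items[$i]
            $right = $Items[$j]
            $score = Get-Cosine -A $left.Vector -B $right.Vector
            if ($score -ge $Threshold) {
                # Record the score for both items, since similarity is symmetric
                $related[$left.Id].Add([pscustomobject]@{ Id = $right.Id; Similarity = $score })
                $related[$right.Id].Add([pscustomobject]@{ Id = $left.Id; Similarity = $score })
            }
        }
    }
    return $related
}
```

The function returns a hashtable keyed by item Id, so `$result["some-id"]` gives the list of items that cleared the threshold for that item.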

{
  "calculatedAt": "2025-05-27T18:44:01.8272803Z",
  "related": [
    {
      "Title": "Mastering Azure DevOps Migration: Navigating Challenges, Solutions, and Best Practices",
      "Slug": "mastering-azure-devops-migration-navigating-challenges-solutions-and-best-practices",
      "Reference": "resources/videos/youtube/_rJoehoYIVA",
      "ResourceType": "videos",
      "ResourceId": "_rJoehoYIVA",
      "Similarity": 0.6808168645512461
    },
    {
      "Title": "Mastering Azure DevOps Migration: A Step-by-Step Guide for Seamless Project Transfers",
      "Slug": "mastering-azure-devops-migration-a-step-by-step-guide-for-seamless-project-transfers",
      "Reference": "resources/videos/youtube/Qt1Ywu_KLrc",
      "ResourceType": "videos",
      "ResourceId": "Qt1Ywu_KLrc",
      "Similarity": 0.6715379090446947
    },
    {
      "Title": "Navigating the TFS to Azure DevOps Migration: Overcoming Compatibility Concerns with Confidence",
      "Slug": "navigating-the-tfs-to-azure-devops-migration-overcoming-compatibility-concerns-with-confidence",
      "Reference": "resources/videos/youtube/qpo4Ru1VVZE",
      "ResourceType": "videos",
      "ResourceId": "qpo4Ru1VVZE",
      "Similarity": 0.6701177401809448
    }
  ]
}

This result is then stored alongside each content item and used to look up its related content.
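Producing a file in that shape can be sketched as follows. The `calculatedAt` and `related` keys mirror the sample above, while `Save-RelatedContent`, its parameters, and the output path are hypothetical; the candidate objects are assumed to already carry a `Similarity` value.

```powershell
function Save-RelatedContent {
    param (
        # Candidate related items, each with a Similarity property (assumed shape)
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]]$Candidates,

        # Hypothetical output path, e.g. a data file next to the content item
        [Parameter(Mandatory)]
        [string]$Path,

        [double]$Threshold = 0.5
    )

    # Keep only scores at or above the threshold, highest first
    $related = @($Candidates |
        Where-Object { $_.Similarity -ge $Threshold } |
        Sort-Object -Property Similarity -Descending)

    $payload = [ordered]@{
        # Round-trip ("o") format gives an ISO 8601 UTC timestamp
        calculatedAt = (Get-Date).ToUniversalTime().ToString("o")
        related      = $related
    }

    # -Depth 5 leaves headroom for nested properties on the items
    $payload | ConvertTo-Json -Depth 5 | Set-Content -Path $Path -Encoding UTF8
}
```

Sorting before writing means a consumer can take the first few entries directly without re-ranking them.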

Integration into Hugo Layout

With similarity scores computed and cached, integrating them into Hugo was straightforward. I updated the Hugo layout to dynamically load the cached similarity scores at build-time, displaying the top 3 related content items for each article. This provided a more meaningful user experience without excessive runtime overhead.


The Outcomes

The impact of all this remains to be seen. Beyond the usability improvements for human readers, I expect it to improve how AI agents analyse the content: these links make genuinely related content easier to discover, which should encourage AI search algorithms to promote it more.

The implementation was remarkably successful:

As I mentioned, the impact remains to be seen, and I have no idea how often AI crawls the content for updates. However, this will be live on my site by the time you read this.

Future Improvements and Extensions

If this experiment proves to be successful, there are a number of other ideas that I have for its use:

Conclusion

Using embeddings to enhance related content has provided a practical and scalable improvement to my site’s content classification. It aligns perfectly with my goal of optimising the experience for both human readers and AI systems. While the true impact of this approach will become clearer over time, the initial implementation already shows significant promise for improving relevance and efficiency.
