When Google’s algorithm decides your site isn’t worthy anymore, it happens fast. One day you’re ranking for your content, the next you’ve dropped from 400 indexed pages to just 10. Zero clicks, zero impressions.
This is how my blog got deindexed by Google in October 2024, and what I did to fix it. If you’re facing similar issues, maybe this helps.
How I Noticed
Mid-October 2024, Google Search Console showed something was very wrong:
The indexed page count had dropped from around 400 to just 10. And it got worse:
Traffic had dried up completely. My blog had practically vanished from Google’s index.
As Google Gemini humorously put it when I asked for a rant about this situation:
The Verdict: You are currently the only person on Earth who can explain Maven Reactor Modules and Rotary Valve maintenance in the same breath, yet you’re being "ghosted" by a robot that thinks your archive pages are spam.
That pretty much sums up the absurdity of it all.
What Happened
Google rolled out several algorithm updates in October 2024. These weren’t minor tweaks — they fundamentally changed how Google evaluates content quality and site authority.
The major updates included:
The Helpful Content Update got refined further — rewarding original content written for humans.
Core Algorithm Update with stronger focus on E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness).
Technical SEO signals like Core Web Vitals became even more important.
Each of these changes individually would have hurt my rankings. Together, they were devastating.
My blog got hit hard. Looking back, the reasons are clear: technical issues everywhere, no semantic structure, and zero signals establishing authority. I’d been sailing close to the wind for years, and October 2024 was when Google finally drew the line.
But there’s more to this story.
The JBake Migration (March 2022)
In March 2022, I migrated my blog from WordPress to JBake, a static site generator. I wanted the control and simplicity of static sites — no database, no PHP, just generated HTML.
I ported the WordPress Author theme to JBake and released it as open source. The port was substantial work: I removed all jQuery dependencies and modernized the JavaScript. Clean, dependency-free code.
But I made a critical mistake: I didn’t add semantic structure back. No <main>, <article>, or <section> tags. No schema.org markup. No JSON-LD. Nothing.
The theme looked the same, but the HTML structure was hollow. Search engines lost the semantic signals that help them understand content.
For two years, Google tolerated this. Rankings weren’t great, but the blog was indexed. Then October 2024 happened — algorithm updates that prioritized semantic structure and E-E-A-T signals. My blog didn’t just lose rankings. It got deindexed.
That’s the real lesson here: you can get away with missing structure for a while, but eventually Google raises the bar. When that happens, you either meet the new standards or disappear from search results.
The Root Causes
After digging through everything, I found several categories of problems.
Missing Domain Authority
My main domain bmarwell.de was nothing but a redirect to the blog. Think about that: someone visits my primary domain and immediately gets sent elsewhere. No landing page, no information about who I am or what I do. No structured data, no author bio, nothing.
From Google’s perspective, this looks like I don’t care about my own identity. How can Google establish authority and trust when the primary domain doesn’t even have content? That’s a critical mistake for E-E-A-T signals — and E-E-A-T is everything in 2024’s algorithm updates.
Technical Performance Issues
PageSpeed Insights painted a grim picture. Every metric that matters for Core Web Vitals was in the red.
Performance was bad:
CSS and JavaScript blocked initial render → poor LCP scores.
Featured images served at full social media resolution (1200×630px) even when displayed at 412×216px.
No responsive images, no <picture> elements, no srcset.
JavaScript causing forced reflows and layout thrashing.
No WebP support, no proper lazy loading.
These aren’t cosmetic issues. Large images waste bandwidth and delay rendering. Blocking resources prevent the page from painting. Each problem compounds the others, and Google’s Core Web Vitals update makes all of this directly impact rankings.
I wasn’t just slow — I was serving 3x more data than necessary and blocking the critical rendering path while doing it.
Semantic and Accessibility Issues
The HTML structure was technically valid but semantically meaningless. Browsers could render it, but search engines couldn’t properly understand it.
HTML structure was a mess:
Multiple <h1> tags on some pages.
No schema.org structured data for articles or author info.
Inconsistent heading hierarchy.
Search engines rely on semantic structure to understand content relationships and hierarchy. When you have multiple H1 tags, they can’t tell what the page is actually about. When there’s no schema.org markup, they have to guess at the article structure, author, and publication dates.
I’d stripped away all the signals that help search engines understand content, and expected them to figure it out anyway.
SEO and Content Issues
Beyond the technical problems, there were content and indexing issues that made things worse. Some of these actively harmed my rankings by creating duplicate content or exposing my work to scrapers.
Missing meta descriptions on many pages.
RSS feed exposed full articles → content scrapers could publish before Google crawled mine.
Tag and archive pages indexed unnecessarily → duplicate content.
Sitemap lacked proper priority values.
The RSS feed issue particularly stung. By serving full articles, I was essentially inviting scrapers to republish my content immediately. If they got indexed first, Google might think I’m the copycat.
Tag and archive pages create duplicate content — the same article appears on multiple URLs. Without proper noindex tags, Google wastes crawl budget and potentially penalizes the site for duplication.
The Fix
Between late December 2024 and early January 2025, I fixed everything. Here’s what I did.
1. Domain Authority and Identity
The first priority was establishing who I am and why anyone should trust what I write. I completely rebuilt bmarwell.de as a proper landing page.
The new landing page includes:
Clean card-based design showing my work.
Full schema.org JSON-LD markup (Person, Organization, WebSite).
Links to Apache Software Foundation, GitHub, Stack Overflow, conference talks, social profiles with rel="me".
Published PGP keys with microdata.
This gave Google clear signals about who I am and why I have expertise. The structured data explicitly states my role at Apache, my conference talks, my open source contributions. All the things that establish E-E-A-T but were previously invisible to search engines.
The site is open source on GitHub, so anyone can see exactly how I implemented the structured data.
2. Performance Improvements
Critical Rendering Path
The goal here was simple: get something on screen fast, then load the rest. Users should see content within 200-300ms, not wait for fonts and non-critical CSS.
Changes made:
Async CSS loading for non-critical styles using media="print" with a JavaScript swap.
font-display: swap for Lato and Rokkitt fonts.
Deferred JavaScript loading.
rel="preconnect"for Google Fonts.
The media="print" trick is clever: browsers don’t block rendering for print stylesheets. Once the stylesheet has loaded, a small JavaScript handler swaps the media attribute to all and the styles apply. Meanwhile, the page has already painted with system fonts.
Fonts using font-display: swap render immediately with fallback fonts, then swap to custom fonts once loaded. No invisible text, no layout shift timing.
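A minimal sketch of both patterns together (file names and paths are placeholders, not the actual theme files):

```html
<!-- Non-critical CSS: print stylesheets don't block rendering.
     After load, swap the media type so the styles actually apply. -->
<link rel="stylesheet" href="/css/non-critical.css"
      media="print" onload="this.media='all'">
<noscript>
  <link rel="stylesheet" href="/css/non-critical.css">
</noscript>

<!-- Connect early to the font host so the download starts sooner. -->
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>

<!-- Fonts paint immediately with a fallback, then swap in. -->
<style>
  @font-face {
    font-family: "Lato";
    src: url("/fonts/lato.woff2") format("woff2");
    font-display: swap; /* show fallback text right away */
  }
</style>
```

The noscript fallback matters: without JavaScript, the onload swap never fires, so the plain stylesheet link keeps the page styled.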
Images
This needed a complete build pipeline overhaul. Images were the single biggest performance bottleneck, and fixing them required both tooling and template changes. The gains here were substantial — both in terms of file sizes and user experience.
Implementation:
Node.js script using Sharp generates responsive sizes (400w, 800w, 1200w) for images ≥800px.
WebP conversion with PNG/JPEG fallbacks.
<picture> elements for featured images and large body images.
First image on index pages: eager loading with fetchpriority="high"; the rest lazy.
SVG CC license badge with PNG fallback.
The Sharp-based pipeline runs during Maven’s package phase. It detects images over 800px, generates three responsive sizes, converts to WebP, and keeps original format as fallback. The HTML processor then transforms <img> tags to <picture> elements with proper srcset attributes.
Here’s what the compression achieved in practice. Each row shows an actual image from the blog, with its original size and the WebP variants generated at different widths. The percentage shows how much smaller each WebP version is compared to the original:
| Image File | Original | 400w WebP | 800w WebP | 1200w WebP |
|---|---|---|---|---|
mail-regex…png (PNG) | 238 KB | 8.6 KB (96%) | 19 KB (92%) | 28 KB (88%) |
python_connect.png (PNG) | 73 KB | 13 KB (82%) | 37 KB (49%) | 53 KB (27%) |
webtrees-jsonld.png (PNG) | 26 KB | 7.6 KB (71%) | 24 KB (8%) | 44 KB (-69%)1 |
domain-5243252.jpg (JPG) | 226 KB | 15 KB (93%) | 44 KB (81%) | 83 KB (63%) |
brass-band-018.jpg (JPG) | 417 KB | 26 KB (94%) | 135 KB (68%) | 333 KB (20%) |
brass-band-023.jpg (JPG) | 329 KB | 23 KB (93%) | 103 KB (69%) | 224 KB (32%) |
christmas-tree.jpg (JPG) | 47 KB | 4.3 KB (91%) | 9.6 KB (80%) | 16 KB (66%) |
1 Small PNG images may increase in size at 1200w due to WebP encoding overhead, but the 400w and 800w versions still provide significant savings for mobile users.
The results show dramatic differences between PNG and JPEG compression to WebP. PNG images with large solid color areas (like the mail-regex featured image) achieve 88-96% compression ratios. JPEG images compress moderately but still meaningfully, typically 60-80% at the 1200w size and 90%+ at mobile sizes.
The real performance win comes from responsive sizing. A mobile user loading the 400w version uses only 4-8 KB for images that were originally 200+ KB. Desktop users on 2K displays get the 1200w version, which still saves 20-88% depending on the source format.
Here’s what the build script generates from a simple AsciiDoc image:: directive:
<picture>
<source type="image/webp"
srcset="mail-regex-regrets-validation-1200x630-400w.webp 400w,
mail-regex-regrets-validation-1200x630-800w.webp 800w,
mail-regex-regrets-validation-1200x630-1200w.webp 1200w"
sizes="(max-width: 549px) 100vw, (max-width: 949px) 50vw, 412px">
<source type="image/png"
srcset="mail-regex-regrets-validation-1200x630-400w.png 400w,
mail-regex-regrets-validation-1200x630-800w.png 800w,
mail-regex-regrets-validation-1200x630-1200w.png 1200w"
sizes="(max-width: 549px) 100vw, (max-width: 949px) 50vw, 412px">
<img src="mail-regex-regrets-validation-1200x630-800w.png"
width="1200" height="630"
alt="Featured image of Regex vs. Email Addresses"
fetchpriority="high" decoding="async">
</picture>

The browser automatically selects the best format and size. WebP-capable browsers get WebP; older browsers fall back to PNG. Mobile devices get smaller files, desktops get higher resolution.
JavaScript
The JavaScript was triggering forced reflows: reading layout properties right after a DOM write forces the browser to recalculate layout mid-execution. This kills performance because layout calculation is expensive, and it drags down the Core Web Vitals scores that Google factors into rankings.
The following fixes to the jbake-author-template resolved these issues:
Early batched reads and writes to the document object model (DOM) mostly eliminate forced reflows.
The sticky sidebar scroll behavior was altered in a way that it does not cause reflows.
SimpleLightbox now works with <picture> elements and retries initialization if the library loads late.
The main idea: batch all DOM reads together, then batch all DOM writes. Interleaved reads and writes are what force the browser to recalculate layout, so separating them eliminates most reflows.
The core idea:
- Read → write → read → write causes multiple reflows.
- Read → read → read → write → write → write causes one reflow.
The sidebar scroll fix was particularly tricky — it needed to track scroll position without constantly triggering reflows during scroll events.
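The pattern can be sketched as a tiny scheduler. This is a simplified, hypothetical version of the idea, not the actual template code; libraries like fastdom implement it more thoroughly:

```javascript
// Minimal read/write batcher: queue measurements and mutations,
// then flush all reads before all writes.
const reads = [];
const writes = [];
let scheduled = false;

// In the browser this would be requestAnimationFrame;
// fall back to a timer so the sketch also runs in Node.
const schedule =
  typeof requestAnimationFrame === "function"
    ? requestAnimationFrame
    : (fn) => setTimeout(fn, 0);

function flush() {
  scheduled = false;
  reads.splice(0).forEach((fn) => fn());  // all reads: layout computed once
  writes.splice(0).forEach((fn) => fn()); // all writes: nothing reads stale layout
}

function measure(fn) {
  reads.push(fn);
  if (!scheduled) { scheduled = true; schedule(flush); }
}

function mutate(fn) {
  writes.push(fn);
  if (!scheduled) { scheduled = true; schedule(flush); }
}
```

Usage: anything touching offsetHeight or getBoundingClientRect() goes through measure(), style and class changes go through mutate(). Within one frame, every measurement then runs against a single layout pass.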
Build Pipeline
Because the blog is based on JBake, I »naturally« started with Apache Maven as the build tool. However, it did not take long to see that an enormous amount of post-processing was needed: minification and compression of HTML and images, responsive image generation, and so on. Maven alone is not well suited for these tasks, but it works well as an orchestrator for npm/yarn/node-based tasks.
The build consists of several steps:
Apache Maven with the jbake plugin to generate the static site.
Frontend Maven Plugin + Yarn for image optimization.
CSS/JS minification during build.
Post-processing transforms HTML and generates responsive images.
For the bmarwell.de landing page specifically, I also implemented pre-compression at build time — generating .br, .gz, and .zst files to avoid on-the-fly compression overhead. The results were significant and impressive: HTML compressed from 26KB to 8KB with Brotli (69% reduction), and the avatar image went from 20KB JPG to 13KB WebP (35% reduction).
The blog uses my custom JBake Author theme, available as a template repository.
The build is reproducible and deterministic — same input always produces same output. No manual steps, no "works on my machine" issues.
3. Semantic HTML and Structured Data
Document Structure
Getting the basics right: one H1, proper hierarchy, semantic elements. This seems obvious, but it’s easy to break when templates get complex. All those little details added up to the following changes:
One <h1> per page.
Proper heading hierarchy.
Semantic HTML5 elements (<article>, <section>, <nav>).
The single H1 rule is critical — it tells search engines what the page is about. Heading hierarchy (H1 > H2 > H3) creates a document outline that screen readers and search engines can parse. Semantic elements like <article> and <section> add meaning beyond just visual structure.
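Put together, a page skeleton along these lines is what the templates now produce (contents here are placeholders):

```html
<body>
  <nav><!-- site navigation --></nav>
  <main>
    <article>
      <h1>The one and only page title</h1>
      <section>
        <h2>First topic</h2>
        <p>…</p>
      </section>
      <section>
        <h2>Second topic</h2>
        <h3>A subtopic</h3>
        <p>…</p>
      </section>
    </article>
  </main>
  <footer><!-- author info, license --></footer>
</body>
```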
Understanding JSON-LD and Structured Data
JSON-LD (JavaScript Object Notation for Linked Data) is structured data that can be embedded in web pages to describe their content in a machine-readable format. Unlike HTML, which is targeted at humans, JSON-LD is designed to be easily parseable for bots or tools. It is a formal JSON-based language that search engines, social media platforms, and other automated systems commonly understand.
I have actually worked with JSON-LD before — back in 2015, I wrote a plugin that added JSON-LD markup to the webtrees genealogy software. That early experience taught me the value of structured data, but somewhere along the way, I forgot to apply those lessons to my own blog.
JSON-LD is an addition to semantic HTML and schema.org microdata — they’re complementary, not competing. Think of semantic HTML as document structure, HTML microdata as inline annotations, and JSON-LD as explicit metadata that doesn’t clutter your markup.
The three layers work together: semantic elements provide structure, microdata annotates specific content, and JSON-LD delivers the complete picture in one clean block.
Google checks for any of these formats. I do not know which one it ultimately uses, but all of them are understood by Google. My best guess is that Google favours sites providing multiple formats, as this shows effort and dedication to structured data.
Schema.org Types Added
I love the technical side of this, and I love seeing the results show up in the rich snippet testing tool. Apart from that, it really does aid Google and other search engine bots in understanding what a site is about. No guessing, no interpretation: structured data that machines can parse reliably.
| Schema Type | Purpose | Key Information |
|---|---|---|
ProfilePage | Establishes my professional identity on bmarwell.de | Name, job title, expertise areas, Apache affiliation, verified profiles (GitHub, Stack Overflow) |
BlogPosting | Describes each article on the blog | Author, publication date, headline, featured image, article body, publisher |
Organization | Represents the blog itself | Logo, name, same-as links |
WebSite | Defines site-wide information | Site name, search action endpoint, potential actions |
BreadcrumbList | Shows navigation hierarchy | Page position in site structure, parent pages |
Here’s a condensed example from my new landing page bmarwell.de showing how JSON-LD describes my professional identity:
{
"@context": "https://schema.org",
"@type": "ProfilePage",
"mainEntity": {
"@type": "Person",
"name": "Benjamin Marwell",
"url": "https://bmarwell.de/",
"jobTitle": "DevSecOps Engineer | Apache Maven PMC Member",
"description": "Apache Maven PMC member and DevSecOps Engineer...",
"sameAs": [
"https://github.com/bmarwell",
"https://blog.bmarwell.de",
"https://stackoverflow.com/users/1549977/benjamin-marwell"
],
"knowsAbout": [
"Apache Maven",
"Java",
"DevSecOps"
],
"affiliation": {
"@type": "Organization",
"name": "Apache Software Foundation"
}
}
}

Let me break down what this does:

@context tells parsers we’re using the schema.org vocabulary, the standard for structured web data.
@type: "ProfilePage" declares this is a profile page, not a blog post or product page.
mainEntity describes the Person (me), including name, job title, and professional description.
sameAs lists verified profiles across the web (GitHub, Stack Overflow, LinkedIn). This builds authority by connecting identities.
knowsAbout explicitly states my areas of expertise, relevant for E-E-A-T signals.
affiliation shows my connection to the Apache Software Foundation, a recognized authority in open source.
Search engines extract this data and use it to:
Build knowledge panels in search results.
Understand author expertise and authority.
Connect related content across the web.
Display rich snippets with structured information.
Evaluate E-E-A-T signals for content ranking.
Why Separate Identity from Content
The Person JSON-LD lives on bmarwell.de (main domain), not blog.bmarwell.de (subdomain). This separation is intentional: the main domain establishes who I am, the blog shows what I write.
Google treats the main domain as the canonical identity source. The blog links back with rel="author" and rel="me", creating verified connections between identity and content. Each domain has clear focus — one for authority signals, one for articles.
Rich Snippets
I would really love to see rich snippets for my articles in Google search results. Be it a definition, a person card or article details — rich snippets make results stand out and are really helpful for users. They are generated from structured data like JSON-LD and schema.org markup we discussed earlier.
In my (this) blog’s case, the structured data enables:
Author information appears next to articles in search results (my name, potentially my photo).
Article metadata like publication date and last updated timestamp.
Breadcrumb navigation showing the page’s position in the site hierarchy.
Site search box directly in Google results (from WebSite schema with search action).
Organization logo associated with the domain.
These rich results aren’t just cosmetic — they increase click-through rates significantly. A search result with author info and a clear breadcrumb looks more trustworthy than plain blue links.
The connection is this: JSON-LD provides the raw data, schema.org defines the vocabulary, and Google transforms that into rich snippets. Without the structured data, Google has to guess at this information (and often guesses wrong, or shows nothing at all). With it, you can influence what appears in search results.
4. SEO Improvements
Meta Data
Every page needs metadata — for search engines, for social media, for users. Missing metadata means search engines have to guess, and they often guess wrong.
Additions:
Meta descriptions for all pages (including paginated index pages).
Open Graph tags for social sharing.
Twitter Card metadata with large images.
Meta descriptions appear in search results — they’re your pitch to potential visitors. Open Graph tags control how links look when shared on Facebook, LinkedIn, etc. Twitter Cards do the same for Twitter.
Without these, your content might be great but look terrible when shared.
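For one article, the head section now carries roughly this set of tags (all values here are placeholders):

```html
<meta name="description" content="One or two sentences summarizing this page.">

<!-- Open Graph: controls the preview card on Facebook, LinkedIn, etc. -->
<meta property="og:type" content="article">
<meta property="og:title" content="Regex vs. Email Addresses">
<meta property="og:description" content="One or two sentences summarizing this page.">
<meta property="og:image" content="https://blog.example.com/images/featured-1200x630.png">
<meta property="og:url" content="https://blog.example.com/posts/regex-vs-email/">

<!-- Twitter Card with a large image -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Regex vs. Email Addresses">
<meta name="twitter:image" content="https://blog.example.com/images/featured-1200x630.png">
```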
Content Protection
Full-text RSS feeds are an invitation to content scrapers: they can republish your articles instantly, sometimes with better markup, SEO, and performance than the original, and potentially get indexed before you do.
So I made the following changes to my feed:
RSS/Atom feed now shows excerpts only, not full content.
Configurable via feed.showFullContent in jbake.properties.
Excerpts give readers enough context to decide if they want to read more, while forcing them to visit your site for the full article. This protects your content while still providing a useful feed.
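In jbake.properties this became a one-line switch (the property name comes from my theme, not from stock JBake):

```properties
# Show only excerpts in the RSS/Atom feed; full text stays on the site.
feed.showFullContent=false
```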
Robots and Indexing
Not every page should be in Google’s index. Tag and archive pages create duplicate content — the same article appearing on multiple URLs.
Implementation:
Tag pages: noindex, follow.
Archive pages: noindex, follow.
Proper canonical URLs and next/prev links for pagination.
noindex, follow tells Google "don’t index this page, but do follow the links on it." This keeps tag pages out of search results while still allowing Google to discover articles linked from them.
Canonical URLs and next/prev links help Google understand pagination — that /page/2/ is part of a series, not duplicate content.
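In the head section, the relevant tags look roughly like this (URLs are placeholders):

```html
<!-- On tag/archive pages: stay out of the index, but let crawlers
     follow links to the articles listed there. -->
<meta name="robots" content="noindex, follow">

<!-- On paginated index pages: canonical plus prev/next hints. -->
<link rel="canonical" href="https://blog.example.com/page/2/">
<link rel="prev" href="https://blog.example.com/">
<link rel="next" href="https://blog.example.com/page/3/">
```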
Sitemap
Sitemap priorities should reflect actual importance and update frequency, which was not something I had configured before. The sitemap was just a collection of links without priorities or change frequencies. Setting everything to 0.8, or nothing at all, tells Google nothing.
For me, I think these are realistic priorities and change frequencies:
Homepage: 1.0, weekly, contains all the recent posts.
Subsequent pages (paginated index): 0.8, monthly
Posts from the last 180 days: 0.9, yearly (because they do not change after publishing)
Older posts: 0.8, yearly
Pages and other static content: 0.7, yearly
Archives: 0.5, weekly
Feed: 0.3, daily
Posts are high priority because they’re the main content. More recent posts get slightly higher priority since they’re more relevant. Since I do not publish new content frequently, weekly for the homepage is probably a reasonable change frequency. Archives are lower because they are merely navigational. The feed changes more frequently but shouldn’t compete with actual content for crawl budget.
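Two sitemap entries following this scheme look like this (URLs and dates are placeholders):

```xml
<url>
  <loc>https://blog.example.com/</loc>
  <changefreq>weekly</changefreq>
  <priority>1.0</priority>
</url>
<url>
  <loc>https://blog.example.com/posts/some-recent-post/</loc>
  <lastmod>2024-12-01</lastmod>
  <changefreq>yearly</changefreq>
  <priority>0.9</priority>
</url>
```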
These values guide Google’s crawler toward the content that matters most.
5. Content Improvements
Content is king, but structured, accessible content is even better. I added a few new pages and improved existing ones with a better structure and more information, targeted at humans, not bots.
New Content
I modified and added pages that establish expertise and personality:
Badges page: Credly certifications, including IBM Champion 2022.
Orchestra/music section: detailed background for each orchestra and band I played in, with images.
Enhanced about pages.
These aren’t just random additions. This content establishes who I am beyond just technical articles — a musician, a certified professional, someone with interests outside coding. Google wants to see real people with diverse backgrounds, not content farms.
Translation Consistency
For multilingual content, structure matters as much as translation quality:
German and English versions have a parallel structure, i.e. mostly the same headings and content.
I prefer manual and natural translations, not literal ones done by AI.
The parallel structure of translated pages helps Google understand that these are translations, not duplicate content. Any human-made natural translation sounds better and reads better than literal word-for-word translations.
Writing Guidelines
I established standards for maintainable content and created an internal copilot-instructions.md file with guidelines:
One sentence per line in AsciiDoc (better Git diffs).
German: »guillemets« for quotation marks.
All images link to themselves (SimpleLightbox compatibility).
These guidelines make future content consistent and easier to manage in version control. They also let me use Copilot to check for deviations from the standards, while corrections are still done manually.
6. Code Quality
Clean code isn’t just for developers — it affects maintainability, which affects content quality long-term. Technical debt in templates means bugs in production pages.
DRY
Eliminated code duplication that made maintenance difficult:
I created an »image path« macro in FreeMarker (before, the functionality was duplicated everywhere).
The template structure was unified.
When image path logic existed in five different templates, fixing a bug meant five separate edits. Extracting a single macro means future bugs need only one fix, which makes the codebase more maintainable.
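A hypothetical FreeMarker macro along those lines (names and the rootpath variable are illustrative, not the exact theme code):

```freemarker
<#-- One place that knows how image paths are built. -->
<#macro imagePath src>${content.rootpath!}images/${src}</#macro>

<#-- Every template calls the macro instead of rebuilding the path: -->
<img src="<@imagePath src=post.featuredImage/>" alt="${post.title}">
```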
Documentation
As stated before, I created a file .github/copilot-instructions.md with comprehensive guidelines:
Build commands and timeouts
Code formatting standards
Content guidelines
Technical SEO checklist
Performance notes
The documentation captures decisions and rationale that would otherwise be lost. Future contributors (including future me) can understand why things work the way they do.
Code Standards
While I usually have tools and strict coding standards configured in Apache Maven (or in my IDE, or just in my head), I had not applied them to this blog. Here is what I added as code standards for this blog project:
No trailing whitespace: Strict enforcement across all file types.
Unix line endings (LF): Consistent across the codebase.
UTF-8 encoding: Universal character encoding.
Clear over clever: Prioritize code clarity over terse "clever" solutions.
JSDoc comments: Function documentation explaining purpose, usage, and implementation rationale.
These standards prevent issues like inconsistent line endings breaking shell scripts, or trailing whitespace causing Git noise. They’re not about aesthetics — they’re about reliability.
Commit History
The recovery involved 36+ commits over three days (December 26-29, 2024), all following the same principles:
UTF-8, Unix LF, no trailing whitespace.
Prefer clarity to clever hacks.
JSDoc comments explaining why and how to use it and when to call it, not just what the function does (that’s what the function name is for, imo).
Key Changes Summary
All those fixes can be divided into multiple categories. Here’s a before/after breakdown with comments on the effect of each change:
| Category | Before | After | Effect |
|---|---|---|---|
HTML Structure | Multiple H1 tags, no semantic elements | Single H1, proper hierarchy, semantic elements | Better content understanding
Structured Data | None | JSON-LD for BlogPosting, Person, Organization, WebSite, BreadcrumbList | Clear authority signals
Images | Full-size featured images (1200×630px), no responsive variants | WebP with 400w/800w/1200w variants, <picture> elements | Faster LCP, less bandwidth
CSS Loading | Blocking render | Async with media="print" swap | Faster initial paint
JavaScript | Blocking, forced reflows | Deferred, batched DOM operations | Better performance scores
Fonts | Default loading | font-display: swap | No layout shift
RSS Feed | Full article content | Excerpts only (configurable) | Content protection
Duplicate Content | Tag/archive pages indexed | noindex, follow on tag/archive pages | Cleaner index
Sitemap | Generic priorities | Realistic values (posts 0.8, archives 0.5) | Better crawl focus
Meta Descriptions | Missing on many pages | Added to all pages including paginated index | Better search snippets
Domain Authority | Redirect-only main domain | Full landing page with schema.org Person/Org markup | E-E-A-T signals
Cache Headers | 2 days for all assets | 1 year for versioned CSS/JS/images, 1 week for HTML | Better browser caching
Each fix targeted specific issues found in Search Console or PageSpeed Insights. The commits (36+ over three days) systematically addressed every category.
I also used various free online SEO tools for on-page checks and addressed these issues. When I did not know how to fix an issue, I had a proposal created by Copilot and then adjusted it manually.
The Role of GitHub Copilot
I will be honest: many of the technical changes were made using GitHub Copilot. And I think that’s worth talking about, because AI tools are changing how we work.
Copilot helped me:
Spot refactoring patterns.
Implement the image optimization pipeline.
Write JSDoc explaining implementation choices.
Catch semantic HTML issues.
Generate documentation and standards.
But here’s what Copilot didn’t do: it didn’t identify the problems. It didn’t create the strategy. It didn’t decide what mattered and what could wait.
The strategy, analysis, and decisions were mine. Copilot sped up implementation, but knowing what to fix and why required human judgment. I had to understand SEO, performance optimization, semantic HTML, and user experience to give Copilot the right instructions.
Am I relying on AI? Certainly!
But more importantly, I think (and I hope) that I am an informed user who directs Copilot. In my (current) opinion, this is the key to using AI tools efficiently and effectively (as of now).
And to be honest: it was fun to work with Copilot again after a long time of not using it. And it implemented many of the changes faster than I could have done it manually. I could spend more time reviewing the changes and making it »perfect« for my needs.
Let me know in the comments if you think this was the right move!
Results and Next Steps
As of writing, the Google index recovery is still ongoing. Google takes time to reassess a site after major changes — sometimes weeks, sometimes months. The algorithm doesn’t just reindex immediately when you fix things.
Based on experiences and SEO research, here is what I expect:
Weeks 1-2: Google recrawls the site and discovers the changes. Index coverage may fluctuate.
Weeks 3-4: Initial recovery as Google re-evaluates the site. Expecting 50% index recovery.
Weeks 5-8: Continued improvement as trust rebuilds. Targeting full recovery to pre-deindexing levels.
I will be monitoring a few things closely:
Index coverage in Search Console
Core Web Vitals (LCP, INP, CLS)
Crawl stats
PageSpeed scores (targeting 90+)
Click-through rates
The foundation is laid out now. Fast loading, proper semantic structure, clear authority signals. PageSpeed scores are in the green, Core Web Vitals are passing, structured data validates.
Now it’s about consistent content and waiting for Google’s algorithms to catch up. The technical problems are solved — the rest is patience and continued quality content.
Lessons Learned
Don’t Neglect Your Main Domain
A redirect-only main domain is a wasted opportunity. Build a proper landing page that establishes who you are and why you have expertise. This isn’t vanity — it’s critical for E-E-A-T signals that Google uses to evaluate authority.
Technical SEO Matters
Page speed and Core Web Vitals directly impact rankings now. This isn’t a nice-to-have anymore — it’s foundational. If your LCP is over 2.5 seconds, you’re losing rankings regardless of content quality.
Semantic HTML Helps Search Engines
Proper markup helps Google understand your structure without guessing. One H1 tag, proper heading hierarchy, semantic elements like <article> and <section>. Schema.org structured data is especially important for context and authority signals.
Protect Your Content
Use excerpts in feeds, not full articles. Don’t make it easy for scrapers to republish your content before Google indexes yours. This seems minor, but content theft can directly harm your rankings.
The downside: RSS feed users will only see excerpts. But I think that’s a reasonable trade-off to protect original content.
To be honest, maybe scrapers were not the problem at all; I cannot prove this was one of the things that hurt my ranking. But I am not going to take any chances here. Why would I risk it?
Small Issues Compound
I didn’t have just one big problem. I had dozens of small issues that together screamed »low quality« to Google’s algorithm. Each issue alone might not have killed the site, but together they were lethal.
Fixing them required systematic work — methodically going through Search Console warnings, PageSpeed Insights recommendations, and structured data validation.
Document Everything
Write guidelines for yourself. Document builds, code standards, content rules. Future you will thank present you when you need to make changes six months later.
This also makes recovery faster if something breaks — you know what the standards are and can verify against them.
Conclusion
Getting deindexed by Google is scary, but fixable. For me, it required fixing performance, semantic structure, content quality, and domain authority all at once.
The key takeaway: systematic analysis using Search Console and PageSpeed Insights, then methodically fix what you find. No silver bullets here — improvement comes from fixing many small things.
If you’re in the same boat, I hope this helps you see what to look for. Good content deserves good rankings, but that requires both quality writing and solid technical foundations.
Now back to writing instead of fixing infrastructure. Though I’ll keep watching those Search Console metrics…
