I work on a lot of large websites. Most of the time when I am brought in to work on one of these sites, whether a content site or a marketplace (similar to ecommerce), one of the biggest changes I can make is fixing their site’s information architecture.

That’s what we are talking about today.

Before I go any further, I should be clear that this post is not about a flat URL structure (e.g. site.com/important-page). These two terms, flat site architecture and flat URL structure, often get confused by junior SEOs, so it’s necessary to separate them.

URL structure is how your URLs are formatted, such as with a subfolder or with hyphens instead of underscores.

Site architecture refers to how your pages are interlinked and organized on your site so that users can find them as they navigate.

Site structure examples

A website’s structure can best be thought of as a tree. The homepage is at the top and from there everything trickles down, just like the top of a tree all the way down to its roots.

When done correctly, the site can be diagrammed this way:

But I see a lot of websites that are structured thus:

Instead of a tree that gradually fans out to gather more of the nutrients from the soil, it goes deep into the bedrock where it receives zero nutrients and doesn’t help the tree grow.

Same with the pages on your site. When they are hidden far down in your website’s architecture, all of this happens:

  1. They are crawled by the search engines less than other pages on the site that are closer to the homepage;
  2. They are more difficult for your users to discover and use to navigate through the site;
  3. They can take crawl attention away from other pages on your site, and may even give you duplicate content problems if the tags are generated by users.

Is this just a large site issue?

Absolutely not. Over the years I have audited many sites, big and small, that have this problem.

Back in the day, many SEOs wrote posts about “categories vs tags” especially around content management platforms like WordPress. Some writers would use just a few categories within which all of their posts were contained, and then each post would have multiple tags.

The architecture would end up looking like this, with the major problem highlighted within the green box:

This site has three categories, one subcategory, ten posts, and six tags. Can you spot the problem?

There are only 21 pages on this site, yet we already have pages that are five levels deep into the site architecture away from the homepage. The homepage, of course, is almost always the strongest page on the website, and the further a page is away from the homepage the harder it is to rank.
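To make "levels deep" concrete, here is a minimal sketch that measures it as the fewest clicks from the homepage, using a breadth-first search over an internal link graph. The `links` dictionary and its URLs are hypothetical stand-ins for what a crawler would export. They are not taken from the site described above.

```python
from collections import deque

def click_depths(links, home="/"):
    """Return the minimum number of clicks from `home` to each reachable URL."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical link graph: each key links to the pages in its list.
links = {
    "/": ["/category-a/"],
    "/category-a/": ["/category-a/sub/"],
    "/category-a/sub/": ["/post-1/"],
    "/post-1/": ["/tag/widgets/"],
    "/tag/widgets/": ["/post-2/"],
}

depths = click_depths(links)
# Pages only reachable through a tag page end up five clicks from the homepage.
deep_pages = sorted(url for url, depth in depths.items() if depth >= 5)
print(deep_pages)
```

In this toy graph the only route to `/post-2/` runs through a tag page, which is exactly how content gets buried.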

Definitely a problem, and one that will only get worse over time, especially if they use multiple tags per post, do not keep to a purposefully pre-populated set of tags, and keep adding new posts to the site. Things get buried quickly.

There is a way forward, however. Today I’ll show you how to do that.

How to build (or fix) a site’s architecture

You either have a new site or you’re working on a site that is long established. You want to set up the new site correctly for long term SEO success, or you need to fix the site’s architecture to capitalize on the site’s strength and kickstart the site’s organic traffic again. Unfortunately, SEOs are usually brought in for the latter and not to build it from the start.

Each requires a different approach so let’s talk about both.

How to build a new site’s architecture

For an SEO who geeks out on site architecture and technical SEO like I do, this is a dream. As I’ve built Credo, thinking through the site architecture and building avenues to get everything indexed has been incredibly fun. It’s still not perfect and is a constantly changing challenge as the site gets bigger, but all in all it’s been done well (though if anyone wants to audit it and tell me what they’d change, that would be fun).

So how do you do it?

First, you have to identify the highest volume keywords that you want to target. These become your top level category pages that are linked most often on your site.

Second, identify the longer queries that nest nicely under these top level category pages. For example, Credo has a top level page for New York Marketing Agencies and, nested under it, a page for New York SEO Agencies. This taxonomy is slowly being expanded.

Finally, you have all of your content, whether that is posts, profiles, SKUs or anything else you can have on your site. These are contained within your categories and your subcategories and link back up where possible.
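The three layers above can be sketched as a simple parent map, where every page knows the page directly above it. This is only an illustration: the category names come from the Credo example, while "Example Agency profile" and the `PARENT` structure itself are hypothetical.

```python
# Each page maps to its parent; None marks the homepage.
# "Example Agency profile" is a made-up content page for illustration.
PARENT = {
    "Home": None,
    "New York Marketing Agencies": "Home",                    # top level category
    "New York SEO Agencies": "New York Marketing Agencies",   # subcategory
    "Example Agency profile": "New York SEO Agencies",        # content
}

def breadcrumb(page, parent=PARENT):
    """Return the trail from the homepage down to `page`."""
    trail = []
    while page is not None:
        trail.append(page)
        page = parent[page]
    return trail[::-1]

trail = breadcrumb("Example Agency profile")
links_up = trail[:-1]      # pages this profile should link back up to
depth = len(trail) - 1     # clicks from the homepage

print(" > ".join(trail))
print(depth)
```

Keeping the taxonomy in one structure like this makes it easy to generate breadcrumbs and "link back up" modules consistently, and to check that no content page sits deeper than you intend.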

How to fix an old site’s architecture

Fixing a site’s architecture can be much more challenging, especially when the site has millions of pages. I follow this strategy:

  1. Do a full crawl of the website or section by section to find all of the pages and their levels deep in the architecture (start with ScreamingFrog and if that bogs down then you may need to upgrade to DeepCrawl or Botify).
  2. Download as much of the Google Search Console keyword data as you can. Run these queries through Google’s Keyword Tool/Moz’s Explorer/the keyword tool of your choice, then:
    1. Identify the search terms with the highest potential search traffic where you are receiving very little;
    2. Match these against the URLs ranking for those terms;
    3. Prioritize by levels deep within the site. I always start with pages that are at least 5 levels deep.
  3. Decide if the page is optimized enough to rank (with a good URL, on-page SEO elements, unique content, page speed, etc.):
    1. If it is well optimized (they rarely are, by the way), then you use one set of strategies.
    2. If it is not well optimized, then you are often better off building out a new URL route and structure and redirecting these deep pages to it.
  4. Identify scalable ways (everything on a large site has to be scalable) to bring those pages higher in the architecture.
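Steps 1 and 2 boil down to joining three exports: crawl depth per URL, clicks per query and URL, and search volume per query. Here is a rough sketch of that join. The field names, sample data, and the `max_share` threshold are all assumptions for illustration (only the 5-levels cutoff comes from the list above), and it assumes clicks and volume cover the same time period.

```python
def prioritize(crawl_depth, gsc_rows, volumes, min_depth=5, max_share=0.05):
    """Rank deep URLs that capture little of their query's search volume."""
    candidates = []
    for row in gsc_rows:
        depth = crawl_depth.get(row["url"])
        volume = volumes.get(row["query"], 0)
        # Skip URLs missing from the crawl, shallow URLs, and unknown queries.
        if depth is None or depth < min_depth or volume == 0:
            continue
        share = row["clicks"] / volume
        if share <= max_share:
            candidates.append(
                {"url": row["url"], "query": row["query"],
                 "depth": depth, "volume": volume, "share": share}
            )
    # Biggest opportunity first; deeper pages break ties.
    candidates.sort(key=lambda c: (-c["volume"], -c["depth"]))
    return candidates

# Hypothetical exports from a crawler, Search Console, and a keyword tool.
crawl_depth = {"/tag/blue-widgets/": 6, "/widgets/": 1, "/tag/red-widgets/": 5}
gsc_rows = [
    {"query": "blue widgets", "url": "/tag/blue-widgets/", "clicks": 40},
    {"query": "widgets", "url": "/widgets/", "clicks": 900},
    {"query": "red widgets", "url": "/tag/red-widgets/", "clicks": 10},
]
volumes = {"blue widgets": 8000, "widgets": 20000, "red widgets": 3000}

todo = prioritize(crawl_depth, gsc_rows, volumes)
for item in todo:
    print(item["url"], item["query"], item["depth"], item["volume"])
```

The output is the worklist for step 3: each URL still needs a manual look to decide whether it is worth keeping or should be redirected to a new route.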

There are a few common ways to bring pages higher in the architecture, and a successful site re-architecture campaign usually involves many of them:

  1. Creating HTML sitemaps linked off the homepage that then link to all of the categories/subcategories by taxonomy;
  2. Re-configuring URLs to be better optimized for search and navigation, then redirecting old URLs to new;
  3. Segmenting XML sitemaps by taxonomy/type to see where indexation is failing and where you can further optimize.
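As an illustration of the third item, here is a minimal sketch that splits a URL list into one XML sitemap per section. Grouping by the first path segment is an assumption about how the taxonomy shows up in the URLs, and the example.com URLs are placeholders. A real site may need a lookup against its own category data instead.

```python
from collections import defaultdict
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def segment_urls(urls):
    """Group URLs by their first path segment (assumed to reflect the taxonomy)."""
    groups = defaultdict(list)
    for url in urls:
        path = urlparse(url).path.strip("/")
        segment = path.split("/")[0] if path else "home"
        groups[segment].append(url)
    return dict(groups)

def build_sitemap(urls):
    """Serialize a list of URLs as a sitemap <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Placeholder URLs for illustration only.
urls = [
    "https://example.com/",
    "https://example.com/hotels/chicago/",
    "https://example.com/hotels/chicago/boutique/",
    "https://example.com/tag/cheap/",
]

groups = segment_urls(urls)
sitemaps = {name: build_sitemap(group) for name, group in groups.items()}
for name, xml in sitemaps.items():
    print(f"sitemap-{name}.xml", len(groups[name]), "URLs")
```

With one file per section, you can compare submitted versus indexed counts for each sitemap and see which part of the site is struggling rather than looking at a single blended number.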

The analysis via ScreamingFrog and Excel is usually fairly quick, depending on how you slice up the site. The most time-consuming parts are the keyword research, mapping it to a new site architecture, building out the technical URL routes, and handling the redirects.

It’s a lot of work, but it’s worth it.


I’ve worked on multiple sites that have fixed their site architecture or begun that work with a test to see how it works for them (spoiler: it always helps).

Here are two examples.

First, a travel website that I worked on in 2012/2013 when I worked as a consultant with Distilled in New York City. We fixed their architecture by:

  1. Creating subcategories to target keyword terms that were slightly longer than the head terms (e.g. Chicago Boutique Hotels, under Chicago Hotels);
  2. Implementing internal linking from categories to these new subcategories;
  3. Implementing internal linking from individual listings back to both categories and subcategories with relevant anchor text.

Here is a screenshot from a presentation I gave on the topic:

Here’s the full presentation:

Another site I have been working on more recently is much larger than the first. It currently has approximately 1.8M pages in Google’s index and for years had the structure pointed out above, with:

  • Few categories
  • Few subcategories
  • Many products (in the hundreds of thousands)
  • Hundreds of thousands of user generated tags

We ran a “test” on one category to expand the number of subcategory pages, only by about 100 pages in total. Here is a snippet of the results we saw, where the lines annotated with arrows are where the ranking URL switched from a tag page to a subcategory page which is linked high in the architecture (no more than 3 levels deep ever):

And traffic? Yeah, that’s done well too for this section of the site:

  • +74% organic sessions to this section of the site;
  • +41% pages per session
  • +148% new users

Everything is heading in the right direction.

This stuff works, and I’d love to hear your stories about it as well!
