RankDisco Classification Taxonomy

This document is the definitive reference for RankDisco's domain and URL classification system. It covers the complete taxonomy hierarchy, decision rules, mutual exclusivity constraints, and confidence thresholds.


Table of Contents

  1. Overview
  2. Tier-1 Archetypes
  3. Domain Types (Tier-2)
  4. Structural Types
  5. Page Types
  6. Decision Trees
  7. Mutual Exclusivity Rules
  8. Confidence Thresholds
  9. Quality Tiers
  10. Spam Tiers

Overview

RankDisco uses a hierarchical classification system with two primary axes:

  1. Domain Classification - Classifies the entire domain/website

    • Tier-1 (Archetype): Universal, immutable category
    • Tier-2 (Domain Type): Specific type within the archetype
  2. URL Classification - Classifies individual pages

    • Structural Type: What the page DOES functionally
    • Page Type: Specific page category within structural type

Key Principles

  • Mutual Exclusivity: Every domain has exactly ONE tier1_type, every page has exactly ONE structural_type
  • Exhaustive Coverage: Every domain and page maps to exactly one category at each level
  • Orthogonal Dimensions: Quality, trust, freshness, and monetization are separate scoring dimensions, NOT archetypes

Tier-1 Archetypes

Tier-1 answers ONE question: "Where does the user extract primary value?"

These are immutable, mutually exclusive archetypes. A domain has exactly ONE tier1_type.

Tier-1 | Value Extraction Mode | User Action | Example Domains
platform | USE (interactive tooling) | Log in, do work | Slack, Notion, Zoom, GitHub, NordVPN
marketplace | BROWSE + TRANSACT (multi-party) | Search, buy/sell | Amazon, eBay, Yelp, Zillow, Indeed, G2
commerce | TRANSACT (direct purchase) | Add to cart, pay | Nike.com, Apple Store, Warby Parker
service | CONTACT/BOOK (conversion) | Get quote, book | Law firms, agencies, dentists, banks
information | READ (consumptive) | Read, learn | NYTimes, TechCrunch, Wikipedia, Wirecutter
community | PARTICIPATE (UGC-driven) | Post, discuss | Reddit, Discord, Stack Overflow, Quora
institutional | TRUST (authority-backed) | Verify, reference | .gov sites, universities, nonprofits, WHO
unknown | UNDETERMINED | - | Insufficient signals or confidence < 0.35

Tier-1 Decision Rules

Evaluate in priority order. Pick the FIRST match:

  1. PLATFORM - Users log in and USE a tool/app/dashboard
  2. MARKETPLACE - Lists items/businesses/people NOT owned by the domain
  3. COMMERCE - Sells products directly (has cart/checkout)
  4. SERVICE - Sells services (conversion is contact/quote/booking)
  5. INFORMATION - Primary value is content/articles/guides (read-only)
  6. COMMUNITY - User-generated content dominates (participation > consumption)
  7. INSTITUTIONAL - Government, education, nonprofit, religious authority
  8. UNKNOWN - Cannot determine (use sparingly, confidence < 0.35)
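
The first-match rule above can be sketched as an ordered list of predicates. This is a minimal illustration, not RankDisco's implementation: the signal names (`has_login_tool`, `lists_third_party_items`, and so on) are hypothetical booleans invented here, and how such signals would be detected is not covered by this document.

```python
from typing import Callable

# Hypothetical boolean signals for a domain; names are illustrative only.
Signals = dict[str, bool]

# Ordered (tier1_type, predicate) pairs mirroring the priority list above.
TIER1_RULES: list[tuple[str, Callable[[Signals], bool]]] = [
    ("platform", lambda s: s.get("has_login_tool", False)),
    ("marketplace", lambda s: s.get("lists_third_party_items", False)),
    ("commerce", lambda s: s.get("has_cart_checkout", False)),
    ("service", lambda s: s.get("sells_services", False)),
    ("information", lambda s: s.get("content_is_primary_value", False)),
    ("community", lambda s: s.get("ugc_dominates", False)),
    ("institutional", lambda s: s.get("is_gov_edu_nonprofit", False)),
]


def classify_tier1(signals: Signals) -> str:
    """Return the first archetype whose predicate matches, else 'unknown'."""
    for tier1_type, predicate in TIER1_RULES:
        if predicate(signals):
            return tier1_type
    return "unknown"
```

Because evaluation stops at the first match, a domain that both sells products and publishes articles resolves to `commerce`, which keeps the result mutually exclusive.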

Tier-1 Invariants (Never Violate)

  • Tier-1 is MUTUALLY EXCLUSIVE: a domain has exactly ONE tier1_type
  • Tier-1 is EXHAUSTIVE: every domain maps to exactly one archetype
  • Tier-1 does NOT encode: quality, freshness, trust, monetization, SEO abuse
  • Those are ORTHOGONAL AXES, not archetypes

Known Pressure Points

The information archetype is intentionally broad (news, blogs, wikis, affiliate sites). This is correct at Tier-1; differentiation happens at Tier-2. Do NOT split information at Tier-1.


Domain Types (Tier-2)

Tier-2 answers: "What specific kind of [Tier-1] is this?"

Every domain_type has exactly ONE parent tier1_type. A domain_type CANNOT belong to multiple tier1 categories.

Platform Domain Types

Domain Type | Description | Examples
saas_product | Business software, productivity | Notion, Slack, Asana
code_repository | Source code hosting | GitHub, GitLab, Bitbucket
app_platform | App distribution | App Store listing pages
documentation_portal | Hosted documentation | GitBook, ReadTheDocs, Docusaurus
messaging_platform | Real-time communication | Slack, Discord, Teams
social_network | Social networking | LinkedIn, Twitter/X, Facebook
audio_platform | Audio streaming | Spotify, Apple Music, SoundCloud
video_platform | Video streaming | Netflix, YouTube (as platform)

Marketplace Domain Types

Domain Type | Description | Examples
ecommerce_marketplace | Multi-seller retail | Amazon, eBay, Etsy
ticket_marketplace | Event tickets | Ticketmaster, StubHub, SeatGeek
real_estate_marketplace | Property listings | Zillow, Realtor, Redfin
job_marketplace | Job listings | Indeed, LinkedIn Jobs, Glassdoor
service_marketplace | Freelance/gig work | Upwork, Fiverr, TaskRabbit
app_marketplace | Software listings | Chrome Web Store, WordPress plugins
review_marketplace | Business reviews + leads | G2, Capterra, TrustRadius
directory_citation | Business directories | Yellow Pages, Yelp listings

Commerce Domain Types

Domain Type | Description | Examples
ecommerce_store | D2C retail | Nike.com, Warby Parker, Allbirds
travel_booking | Travel purchases | Airlines, hotels, car rental
subscription_commerce | Recurring product delivery | Dollar Shave Club
product_manufacturer | Brand/manufacturer sites | Apple, Samsung, Ford

Service Domain Types

Domain Type | Description | Examples
agency_provider | Marketing, PR, design agencies | Digital agencies, dev shops
pr_distribution | Press release wires | PR Newswire, Business Wire
professional_service | Professional services | Accounting, consulting firms
healthcare_provider | Healthcare services | Clinics, telehealth, dentists
financial_service | Financial services | Fintech, insurance brokers
legal_service | Legal services | Law firms, legal tech

Information Domain Types

Domain Type | Description | Examples
news_publisher | Journalism organizations | NYTimes, BBC, Reuters
magazine_publisher | Magazines/periodicals | Wired, The Atlantic
blog_publisher | Blog platforms | Medium, Substack, personal blogs
content_publisher | Generic content sites | How-to sites, recipe blogs, guides
review_site | Editorial reviews | Wirecutter, CNET reviews
affiliate_review_site | Affiliate-driven reviews | NerdWallet, affiliate blogs
reference_wiki | Reference/encyclopedia | Wikipedia, Fandom wikis

Community Domain Types

Domain Type | Description | Examples
forum_community | Discussion forums | Reddit, Discourse, phpBB
gaming_community | Gaming forums/fan sites | IGN forums, GameFAQs
sports_community | Sports fan communities | Team forums, fantasy sports
qna_platform | Q&A sites | Stack Overflow, Quora
ugc_video | User-generated video | YouTube (as UGC platform)

Institutional Domain Types

Domain Type | Description | Examples
government_site | Government agencies | .gov sites, municipal sites
education_academic | Educational institutions | Universities, schools
nonprofit_org | Nonprofits/NGOs | Charities, foundations
healthcare_institution | Healthcare organizations | Hospital systems, CDC, WHO
financial_institution | Financial institutions | Federal Reserve, banks
legal_institution | Legal institutions | Courts, bar associations
trade_association | Industry associations | IEEE, ACM, trade groups

Unknown/Risk Domain Types

Domain Type | Description | Risk Level
pbn_suspected | Suspected private blog network | High
spam_low_quality | Known spam or very low quality | Critical
unknown_other | Genuinely cannot classify | N/A

Structural Types

Structural type answers: "What is this page DOING functionally?"

This is the parent layer for page_type. Classify structural_type FIRST, then page_type is constrained to valid children.

Structural Type | Description | Primary Purpose
article | Long-form written content | Read and learn
detail | Shows one specific thing | View product/profile/video/event
listing | Shows multiple items | Browse and compare
thread | Conversational/sequential UGC | Discuss, Q&A
utility | Functional pages | Login, checkout, settings
corporate | Company info pages | About, contact, legal, homepage
reference | Documentation/help content | Docs, wikis, FAQs, support
spam | Low quality/malicious | TERMINAL OVERRIDE - stop classifying
unknown | Could not determine | Insufficient signals

Structural Type Examples

Structural Type | URL Pattern Examples
article | /blog/how-to-..., /news/2024/..., /guide/...
detail | /products/widget, /user/john, /watch?v=...
listing | /category/electronics, /search?q=..., /r/subreddit
thread | /questions/12345, /t/topic/123, /comments/...
utility | /login, /checkout, /settings, /download
corporate | /about, /contact, /careers, /privacy
reference | /docs/api, /wiki/Topic, /faq, /help/article
spam | Parked domains, PBN content, malware pages

Page Types

Page types are organized by category. Each page_type belongs to exactly ONE structural_type parent.

Content Page Types (structural_type: article)

Page Type | Description | URL Pattern Examples
article | Generic article | /article/topic-name
blog_post | Blog post | /blog/post-title
feature_article | In-depth feature | /features/deep-dive
guide | Comprehensive guide | /guide/complete-guide-to
howto_article | How-to instructions | /how-to/do-something
listicle | List-based article | /10-best-tools-for
news_article | News coverage | /news/2024/01/story
opinion_article | Opinion/editorial | /opinion/my-take-on
press_release | Press release | /press-releases/announcement
recipe_page | Recipe content | /recipes/chocolate-cake
research_article | Research/study | /research/new-findings
ugc_article | User-generated article | Medium posts, Substack
buying_guide | Purchase guide | /buying-guide/best-laptops
review_page | Product/service review | /reviews/product-name
case_study_page | Case study | /case-studies/client-success

Commerce Page Types (mixed structural_types)

Page Type | Structural Type | Description
product_page | detail | Individual product
app_page | detail | App listing
auction_page | detail | Auction listing
coupon_page | detail | Coupon/discount
deal_page | detail | Deal/offer
category_page | listing | Product category
comparison_page | listing | Product comparison
store_locator | listing | Store finder
wishlist_page | listing | Saved items
auto_generated_comparison | listing | Programmatic comparison
booking_page | utility | Reservation page
checkout_page | utility | Cart/checkout
landing_page | utility | Marketing landing
pricing_page | utility | Pricing plans
sales_page | utility | Sales conversion

Community Page Types (mixed structural_types)

Page Type | Structural Type | Description
profile_page | detail | User profile
video_page | detail | Video content
channel_page | detail | Content channel
podcast_episode | detail | Podcast episode
event_page | detail | Event details
group_page | detail | Group/community
live_stream_page | detail | Live streaming
gallery_page | listing | Image/media gallery
playlist_page | listing | Content playlist
events_index | listing | Events list
subreddit_index | listing | Subreddit listing
forum_thread | thread | Forum discussion
qna_page | thread | Q&A thread
discussion_thread | thread | General discussion
comment_thread | thread | Comment section
social_post | thread | Social media post
poll_page | utility | Poll/survey
invite_page | utility | Invitation page

Company Page Types (structural_type: corporate)

Page Type | Description | URL Pattern Examples
homepage | Site homepage | /, /en/
about_page | About us | /about, /company
contact_page | Contact info | /contact, /get-in-touch
careers_page | Job listings | /careers, /jobs
team_page | Team members | /team, /leadership
press_page | Press/media | /press, /newsroom
partners_page | Partner info | /partners
portfolio_page | Work portfolio | /portfolio, /work
brand_page | Brand page | /brand, /brand-assets
features_page | Product features | /features
service_page | Service details | /services/consulting
legal_page | Legal content | /legal
legal_privacy_page | Privacy policy | /privacy, /privacy-policy
legal_terms_page | Terms of service | /terms, /tos
demo_page | Demo request | /demo, /request-demo
login_page | User login | /login, /signin
signup_page | User registration | /signup, /register
settings_page | Account settings | /settings, /account

Index Page Types (structural_type: listing)

Page Type | Description | URL Pattern Examples
category_index_page | Category listing | /blog, /news, /docs
archive_page | Content archive | /archive, /tag/topic
author_index | Author listing | /authors, /contributors
directory_listing | Directory entry | /directory/business
search_results_page | Search results | /search?q=term
location_page | Location info | /locations/city-name
redirect_page | Redirect target | N/A

Reference Page Types (structural_type: reference)

Page Type | Description | URL Pattern Examples
documentation_page | Technical docs | /docs/getting-started
api_reference_page | API documentation | /api/endpoints
wiki_page | Wiki content | /wiki/Topic
faq_page | FAQ content | /faq, /help/faq
support_article | Help article | /help/how-to-reset
pdf_document | PDF file | *.pdf
course_page | Course content | /courses/intro-to-python
repository_page | Code repository | /owner/repo
tool_page | Interactive tool | /tools/calculator
download_page | Download page | /download, /downloads

Spam/Risk Page Types (structural_type: spam)

Page Type | Description | Risk Level
parked_domain_page | Parked domain | High
pbn_article_page | PBN content | Critical
comment_spam_page | Comment spam | Critical
malicious_page | Malware/phishing | Critical

Decision Trees

Domain Classification Decision Tree

START: Evaluate domain
|
v
1. Does user LOG IN to USE a tool/app/dashboard?
YES --> tier1: platform
NO --> continue
|
v
2. Does domain LIST items/businesses/people NOT owned by the domain?
YES --> tier1: marketplace
NO --> continue
|
v
3. Does domain have CART/CHECKOUT (sells products directly)?
YES --> tier1: commerce
NO --> continue
|
v
4. Does domain SELL SERVICES (contact/quote/booking conversion)?
YES --> tier1: service
NO --> continue
|
v
5. Is PRIMARY VALUE content consumption (articles/guides/news)?
YES --> tier1: information
NO --> continue
|
v
6. Does USER-GENERATED CONTENT dominate?
YES --> tier1: community
NO --> continue
|
v
7. Is this a GOVERNMENT, EDUCATION, or NONPROFIT entity?
YES --> tier1: institutional
NO --> continue
|
v
8. Confidence < 0.35 or insufficient signals?
YES --> tier1: unknown

URL Classification Decision Tree

START: Evaluate URL
|
v
1. Check for SPAM signals (PBN, malware, parked)
DETECTED --> structural_type: spam (TERMINAL - stop classifying)
NOT DETECTED --> continue
|
v
2. Is this ROOT PATH (/, /en/, etc.)?
YES --> structural_type: corporate, page_type: homepage
NO --> continue
|
v
3. Match against PLATFORM-SPECIFIC patterns
MATCH --> Use platform pattern (e.g., GitHub repo, YouTube video)
NO MATCH --> continue
|
v
4. Match against GENERIC PATH patterns
MATCH --> Use generic pattern confidence
NO MATCH --> continue
|
v
5. Apply semantic analysis
- Corporate keywords (about, contact, careers) --> corporate
- Article patterns (blog, news, guide) --> article
- Detail patterns (product, user, video) --> detail
- Listing patterns (category, search, archive) --> listing
- Thread patterns (thread, questions, comments) --> thread
- Reference patterns (docs, wiki, faq) --> reference
- Utility patterns (login, checkout, settings) --> utility
|
v
6. Insufficient signals?
YES --> structural_type: unknown
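
The URL tree above can be sketched as a short pipeline. This is an illustrative simplification under stated assumptions: spam detection is passed in as a precomputed flag, the root-path check treats any two-letter first segment as a locale, steps 3 and 4 (platform and generic regex patterns) are left out, and the prefix-based keyword test is a stand-in for whatever semantic analysis is actually used. The function name and parameters are hypothetical.

```python
import re
from urllib.parse import urlparse

# Keyword groups from step 5, kept in the order the tree lists them.
SEMANTIC_KEYWORDS = [
    ("corporate", ("about", "contact", "careers")),
    ("article", ("blog", "news", "guide")),
    ("detail", ("product", "user", "video")),
    ("listing", ("category", "search", "archive")),
    ("thread", ("thread", "questions", "comments")),
    ("reference", ("docs", "wiki", "faq")),
    ("utility", ("login", "checkout", "settings")),
]


def classify_structural_type(url: str, spam_detected: bool = False) -> str:
    # Step 1: spam is terminal, so nothing else is evaluated.
    if spam_detected:
        return "spam"
    path = urlparse(url).path
    # Step 2: root path such as "", "/", or "/en/" (two-letter locale assumed).
    if re.fullmatch(r"/?([a-z]{2}/?)?", path):
        return "corporate"
    # Steps 3-4 (platform-specific and generic regex patterns) omitted here.
    # Step 5: crude keyword match on path segments.
    segments = [seg for seg in path.lower().split("/") if seg]
    for structural_type, keywords in SEMANTIC_KEYWORDS:
        if any(seg.startswith(keywords) for seg in segments):
            return structural_type
    # Step 6: nothing matched.
    return "unknown"
```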

Mutual Exclusivity Rules

Domain Level

Rule | Description
One tier1 per domain | A domain has exactly ONE tier1_type. Never classify as both platform AND information.
tier1 constrains domain_type | domain_type must be valid for the assigned tier1_type. A news_publisher cannot have tier1=platform.
No cross-tier leakage | Quality/trust signals do NOT determine tier1. Low-quality content is still information, not spam.
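
One way to enforce the "tier1 constrains domain_type" rule is to store a single parent per domain_type and validate pairs against it. The sketch below uses only a handful of rows from the Tier-2 tables; the dictionary and function names are hypothetical.

```python
# Partial child -> parent map drawn from the Tier-2 tables; a complete
# version would list every domain_type.
DOMAIN_TYPE_PARENT = {
    "saas_product": "platform",
    "ecommerce_marketplace": "marketplace",
    "ecommerce_store": "commerce",
    "legal_service": "service",
    "news_publisher": "information",
    "forum_community": "community",
    "government_site": "institutional",
}


def is_valid_pair(tier1_type: str, domain_type: str) -> bool:
    """True only when domain_type's single parent is the given tier1_type."""
    return DOMAIN_TYPE_PARENT.get(domain_type) == tier1_type
```

Keying the map by domain_type means a type cannot be registered under two archetypes, so the one-parent constraint holds by construction.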

URL Level

Rule | Description
One structural_type per URL | A URL has exactly ONE structural_type.
structural_type constrains page_type | page_type must be valid for the assigned structural_type.
spam is terminal | Once structural_type: spam is assigned, stop classifying. Structure is irrelevant for spam.

Prohibited Operations

  1. Cross-tier leakage: Using quality/trust signals to determine Tier-1

    • WRONG: "This is low-quality content, so it's spam"
    • RIGHT: "This is information/blog_publisher with low quality_score"
  2. Splitting information at Tier-1: Creating separate Tier-1 for news/blogs

    • WRONG: Adding "media" or "editorial" as Tier-1 types
    • RIGHT: Use Tier-2 domain_types under information
  3. Industry in Tier-1/2: Using vertical (sports, gaming) as archetype

    • WRONG: "gaming_site" as a Tier-1 or Tier-2 type
    • RIGHT: "community/gaming_community" or "information/blog_publisher" + industry=gaming
  4. Optional Tier-2: Allowing domains without domain_type

    • WRONG: Storing only tier1_type without domain_type
    • RIGHT: Every classified domain has BOTH tier1_type AND domain_type

Confidence Thresholds

Classification Source Priority

Source | Priority | Description
manual | 1 (highest) | Human-curated classification
rule | 2 | Rules engine pattern match
vector | 3 | Vectorize/embedding similarity
llm | 4 | LLM-based classification
merged | 5 (lowest) | Merged from multiple sources
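
When several sources have classified the same domain, the priority table suggests picking the result from the highest-priority source. A minimal sketch, assuming each candidate is a dict with `source` and `label` keys (a hypothetical shape not defined in this document):

```python
# Lower number means higher priority, as in the table above.
SOURCE_PRIORITY = {"manual": 1, "rule": 2, "vector": 3, "llm": 4, "merged": 5}


def pick_classification(candidates: list[dict]) -> dict:
    """Return the candidate whose source has the highest priority."""
    return min(candidates, key=lambda c: SOURCE_PRIORITY[c["source"]])
```

For example, given one `llm` candidate and one `rule` candidate, the `rule` candidate is returned regardless of list order.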

Confidence Score Thresholds

Confidence | Interpretation | Action
95-100 | Very High | Use directly
85-94 | High | Use with standard validation
70-84 | Moderate | Use with additional signals
50-69 | Low | Consider LLM fallback
35-49 | Very Low | LLM classification required
0-34 | Insufficient | Classify as unknown
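
The bands above translate directly into a lookup. The sketch below assumes scores on the 0-100 scale used in this table (the Tier-1 sections appear to express the same cutoff as 0.35) and assigns fractional scores to the band whose lower bound they meet, which the table does not specify. The function name is hypothetical.

```python
# (lower bound, interpretation, action) rows from the table, highest first.
CONFIDENCE_BANDS = [
    (95, "Very High", "Use directly"),
    (85, "High", "Use with standard validation"),
    (70, "Moderate", "Use with additional signals"),
    (50, "Low", "Consider LLM fallback"),
    (35, "Very Low", "LLM classification required"),
    (0, "Insufficient", "Classify as unknown"),
]


def interpret_confidence(confidence: float) -> tuple[str, str]:
    """Map a 0-100 confidence score to (interpretation, action)."""
    if not 0 <= confidence <= 100:
        raise ValueError(f"confidence out of range: {confidence}")
    for lower_bound, interpretation, action in CONFIDENCE_BANDS:
        if confidence >= lower_bound:
            return interpretation, action
    raise AssertionError("unreachable: the last band starts at 0")
```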

Source-Specific Thresholds

Source | Trust Threshold | Override Behavior
Manual curation | Always trusted | Overrides all other sources
Known domain database | 95+ confidence | Trusted for domain classification
TLD rules (.gov, .edu) | 90+ confidence | Strong domain_type signal
Platform URL patterns | 85+ confidence | Trusted for page_type
Generic URL patterns | 60-85 confidence | May need LLM confirmation
DataForSEO platform type | Variable | Useful for some types, garbage bucket for others

DataForSEO Platform Type Mapping

DFS Platform Type | Maps To | Notes
news | news_publisher | Reliable
blogs | blog_publisher | Reliable
ecommerce | ecommerce_store | Reliable
message-boards | forum_community | Reliable
wikis | reference_wiki | Reliable
social | social_network | Reliable
educational | education_academic | Reliable
governmental | government_site | Reliable
directory | directory_citation | Reliable
organization | - | Garbage bucket, needs classifier
unknown | - | Garbage bucket, needs classifier
cms | - | Garbage bucket, needs classifier
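
The mapping above fits a plain dictionary lookup, with the garbage-bucket types falling through to a classifier. A minimal sketch; returning `None` to signal "needs classifier" is a convention chosen here, not something the document prescribes.

```python
from typing import Optional

# Reliable DFS platform types and their domain_type, from the table above.
DFS_TO_DOMAIN_TYPE = {
    "news": "news_publisher",
    "blogs": "blog_publisher",
    "ecommerce": "ecommerce_store",
    "message-boards": "forum_community",
    "wikis": "reference_wiki",
    "social": "social_network",
    "educational": "education_academic",
    "governmental": "government_site",
    "directory": "directory_citation",
}


def map_dfs_platform_type(dfs_type: str) -> Optional[str]:
    """Return the mapped domain_type, or None when a classifier must decide.

    None covers the garbage buckets (organization, unknown, cms) as well as
    any value not listed in the table.
    """
    return DFS_TO_DOMAIN_TYPE.get(dfs_type)
```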

Quality Tiers

Quality tiers measure publisher authority, separate from domain type classification.

Tier | Domain Rank (DFS) | Description | Examples
tier_1 | 800-1000 | Top 500 DR sites, household names | NYT, Forbes, WSJ
tier_2 | 600-799 | DR 60-80, recognized in niche | TechCrunch, Ars Technica
tier_3 | 400-599 | DR 40-60, decent but not authoritative | Mid-tier publications
tier_4 | 200-399 | DR 20-40, low authority | Small blogs
tier_5 | 0-199 | DR < 20, minimal authority | Personal sites
unrated | N/A | No domain rank data available | -

Tier Overrides

Some domains have manual tier overrides regardless of domain rank:

Tier 1 Overrides: nytimes.com, wsj.com, washingtonpost.com, forbes.com, bbc.com, cnn.com, reuters.com, bloomberg.com, theguardian.com, businessinsider.com

Tier 2 Overrides: techcrunch.com, theverge.com, wired.com, arstechnica.com, venturebeat.com, mashable.com, entrepreneur.com, inc.com, fastcompany.com
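
Tier assignment with overrides might look like the sketch below: check the override list first, then fall back to the domain rank bands. Only a few override domains are included for brevity, and treating a missing or negative rank as `unrated` is an assumption made here.

```python
from typing import Optional

# A few entries from the override lists above; extend with the full lists.
TIER_OVERRIDES = {
    "nytimes.com": "tier_1",
    "wsj.com": "tier_1",
    "techcrunch.com": "tier_2",
    "wired.com": "tier_2",
}

# (minimum domain rank, tier), highest first, from the quality tier table.
RANK_FLOORS = [(800, "tier_1"), (600, "tier_2"), (400, "tier_3"),
               (200, "tier_4"), (0, "tier_5")]


def quality_tier(domain: str, domain_rank: Optional[int]) -> str:
    """Return the quality tier, letting manual overrides win over rank."""
    override = TIER_OVERRIDES.get(domain.lower())
    if override is not None:
        return override
    if domain_rank is None:
        return "unrated"
    for floor, tier in RANK_FLOORS:
        if domain_rank >= floor:
            return tier
    return "unrated"  # negative rank: treated as missing data (assumption)
```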


Spam Tiers

Spam tiers are based on backlink_spam_score (0-100) from DataForSEO.

Tier | Score Range | Description | Recommended Action
toxic | 80-100 | Critical spam signals | Should be disavowed
high_risk | 60-79 | Significant spam signals | Likely problematic
moderate | 40-59 | Some spam signals | Review recommended
low_risk | 20-39 | Minor spam signals | Generally safe
clean | 0-19 | No spam signals detected | Safe
unknown | N/A | No spam score available | -

URL Pattern Examples

Generic Path Patterns (Platform-Agnostic)

Pattern Name | Regex | Page Type | Confidence
homepage_root | ^https?://[^/]+/?$ | homepage | 99
pdf_document | \.pdf(\?.*)?$ | pdf_document | 99
profile_at | /@[^/]+/?$ | profile_page | 90
blog_post | /(blog|blogs)/[^/]+/?$ | blog_post | 75
news_post | /(news)/[^/]+/?$ | news_article | 75
date_article_full | /\d{4}/\d{2}/\d{2}/[^/]+ | news_article | 85
product_page | /(products?)/[^/]+/?$ | product_page | 80
pricing_page | /(pricing|plans|packages)/?$ | pricing_page | 90
about_page | /(about|about-us|company)/?$ | about_page | 85
privacy_policy | /(privacy|privacy-policy) | legal_privacy_page | 95
forum_thread | /(thread|topic|discussion)/[^/]+ | forum_thread | 80
docs_page | /(docs?|documentation)/[^/]+ | documentation_page | 80
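
A few of the generic patterns above can be exercised with Python's `re` module. This sketch searches each regex against the full URL and, when several patterns match, keeps the one with the highest confidence; that tie-breaking policy is an assumption, since the document does not say how overlapping matches are resolved.

```python
import re
from typing import Optional

# (pattern name, regex, page_type, confidence) rows copied from the table.
GENERIC_PATTERNS = [
    ("homepage_root", r"^https?://[^/]+/?$", "homepage", 99),
    ("pdf_document", r"\.pdf(\?.*)?$", "pdf_document", 99),
    ("privacy_policy", r"/(privacy|privacy-policy)", "legal_privacy_page", 95),
    ("pricing_page", r"/(pricing|plans|packages)/?$", "pricing_page", 90),
    ("date_article_full", r"/\d{4}/\d{2}/\d{2}/[^/]+", "news_article", 85),
    ("blog_post", r"/(blog|blogs)/[^/]+/?$", "blog_post", 75),
]


def match_generic(url: str) -> Optional[tuple[str, int]]:
    """Return (page_type, confidence) for the best match, or None."""
    hits = [
        (confidence, page_type)
        for _name, regex, page_type, confidence in GENERIC_PATTERNS
        if re.search(regex, url)
    ]
    if not hits:
        return None
    confidence, page_type = max(hits)  # assumed policy: highest confidence wins
    return page_type, confidence
```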

Platform-Specific Patterns

Platform | Domain Match | Pattern | Page Type
Medium | medium.com | /@[^/]+/?$ | profile_page
Medium | medium.com | /@[^/]+/[^/]+ | blog_post
Substack | *.substack.com | /p/[^/]+ | blog_post
GitHub | github.com | /[^/]+/[^/]+/?$ | repository_page
GitHub | github.com | /[^/]+/[^/]+/issues/\d+ | forum_thread
YouTube | youtube.com | /watch\? | video_page
Stack Overflow | stackoverflow.com | /questions/\d+ | qna_page
Amazon | amazon.com | /dp/[A-Z0-9]{10} | product_page
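
Platform-specific matching needs a host check before the path pattern is tried. The sketch below covers some of the rows above and makes several assumptions not stated in the document: a leading `www.` is ignored, the domain column is treated as a glob, and patterns are anchored at the start of the path (plus query string), which keeps the GitHub repository pattern from matching issue URLs.

```python
import re
from fnmatch import fnmatch
from typing import Optional
from urllib.parse import urlparse

# (domain glob, path regex, page_type) rows from the table above.
PLATFORM_PATTERNS = [
    ("medium.com", r"/@[^/]+/?$", "profile_page"),
    ("medium.com", r"/@[^/]+/[^/]+", "blog_post"),
    ("*.substack.com", r"/p/[^/]+", "blog_post"),
    ("github.com", r"/[^/]+/[^/]+/?$", "repository_page"),
    ("github.com", r"/[^/]+/[^/]+/issues/\d+", "forum_thread"),
    ("youtube.com", r"/watch\?", "video_page"),
]


def match_platform(url: str) -> Optional[str]:
    """Return the page_type of the first platform pattern that matches."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").removeprefix("www.")
    # Include the query string so patterns such as /watch\? can match.
    target = parsed.path + (f"?{parsed.query}" if parsed.query else "")
    for domain_glob, regex, page_type in PLATFORM_PATTERNS:
        if fnmatch(host, domain_glob) and re.match(regex, target):
            return page_type
    return None
```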

Appendix: Legacy Type Mappings

Some legacy domain types are aliased to current canonical types:

Legacy Type | Canonical Type
ugc_forum_community | forum_community
ugc_qna | qna_platform
ugc_video_platform | video_platform
marketplace_platform | ecommerce_marketplace
product_manufacturer_brand | product_manufacturer
business_corporate_site | professional_service
agency_service_provider | agency_provider
service_business | professional_service
hospital_system | healthcare_institution
guest_post_network | spam_low_quality
link_insertion_site | spam_low_quality
niche_edit_network | spam_low_quality
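
Resolving legacy aliases can be done at read time with a small lookup that passes current types through unchanged. A minimal sketch with a hypothetical function name:

```python
# Legacy -> canonical aliases from the table above.
LEGACY_ALIASES = {
    "ugc_forum_community": "forum_community",
    "ugc_qna": "qna_platform",
    "ugc_video_platform": "video_platform",
    "marketplace_platform": "ecommerce_marketplace",
    "product_manufacturer_brand": "product_manufacturer",
    "business_corporate_site": "professional_service",
    "agency_service_provider": "agency_provider",
    "service_business": "professional_service",
    "hospital_system": "healthcare_institution",
    "guest_post_network": "spam_low_quality",
    "link_insertion_site": "spam_low_quality",
    "niche_edit_network": "spam_low_quality",
}


def canonical_domain_type(domain_type: str) -> str:
    """Map a legacy type to its canonical name; other values pass through."""
    return LEGACY_ALIASES.get(domain_type, domain_type)
```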

Version History

Version | Date | Changes
3.0 | 2024 | Formal tier1/tier2 specification, structural types, spam tiers
2.0 | 2024 | V2 domain database (7,000+ domains), URL pattern engine
1.0 | 2023 | Initial taxonomy with property types