How the Rating Works

An independent, community-driven ranking — not affiliated with any official organization

What is AgilityDogsWorld Rankings?

AgilityDogsWorld Rankings is an independent rating system for dog agility teams worldwide. It ranks teams based on their competition results, providing a transparent and objective measure of performance across international events.

The system is not official and is not affiliated with FCI or any other governing body. It is a community-driven project, open about its methodology so that anyone can understand how ratings are calculated.

How Ratings Are Calculated

The Plackett-Luce Model

Ratings use the Plackett-Luce model, a well-established statistical method for ranking based on ordered outcomes. In simple terms, each run is treated as a series of head-to-head comparisons. Beat strong teams and your rating goes up more; lose to weaker teams and it goes down more.

Think of it like this: finishing 5th at a World Championship against the best teams in the world tells us more than finishing 1st at a small local trial. The model accounts for who you competed against, not just where you placed.
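
To make the idea concrete, here is a minimal, illustrative Python sketch of the Plackett-Luce probability of an observed finishing order. This is not the production implementation, and the skill numbers are made up:

```python
import math

def order_probability(skills_in_finish_order):
    """Plackett-Luce: P(order) = product over places of
    exp(skill of the team placed there) / sum of exp(skill) over teams not yet placed."""
    strengths = [math.exp(s) for s in skills_in_finish_order]
    prob = 1.0
    for i in range(len(strengths)):
        prob *= strengths[i] / sum(strengths[i:])
    return prob

# The same 1st place is less surprising against weak opposition than against
# strong opposition, so it carries less information about the winner's skill.
print(order_probability([1.2, 0.4, -0.3]))  # win over weaker teams: higher probability
print(order_probability([0.1, 1.0, 1.5]))   # win over stronger teams: lower probability
```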

✓ What affects your rating

  • Your placement in each run
  • Strength of competitors in that run
  • Competition tier (Major events count more)

✗ What does NOT affect your rating

  • Time or faults directly (only placement matters)
  • Breed of dog
  • Country of origin

Rating Scale

Ratings are centered around 1500, with higher values indicating better performance. Approximately 68% of teams fall between 1350 and 1650. Teams need at least 5 runs to appear on the leaderboard, and teams with few runs are marked as provisional. Teams are classified into tiers based on their rating:

  • Elite — the top performers, consistently beating strong competition
  • Champion — strong competitors with proven results
  • Expert — experienced teams with solid performances
  • Competitor — active participants building their record
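
As a reading aid, the figures above imply a display standard deviation of roughly 150 points (about 68% of teams within one standard deviation of 1500). A small sketch of how a rating maps back to a z-score, with the constants taken from the text:

```python
MEAN, SD = 1500, 150   # SD implied by "about 68% of teams between 1350 and 1650"

def z_score(rating: float) -> float:
    """How many standard deviations a display rating sits above or below average."""
    return (rating - MEAN) / SD

print(z_score(1650))   # 1.0, i.e. one standard deviation above average
print(z_score(1425))   # -0.5, i.e. half a standard deviation below average
```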

Size Categories

Dogs compete in four size categories: S (Small), M (Medium), I (Intermediate), and L (Large). Each dog competes in one category based on their height. When dogs from different categories meet in the same run, they are all rated against each other. Display ratings are then normalized per category so they are comparable across S, M, I, and L.

What Keeps the Rankings Fair

The Plackett-Luce model is the foundation, but raw results alone can be misleading. These mechanisms ensure the rating reflects genuine competitive strength.

1. Plackett-Luce Model

Bayesian skill estimation treats each run as a series of head-to-head comparisons. Beat strong teams and your rating rises more; lose to weaker teams and it drops more.

The core engine — every other mechanism builds on this foundation.

2. Major Event Weighting

Tier 1 events (AWC, EO, JOAWC, SOAWC) get a 1.2× weight multiplier. A podium at Worlds against the best teams should mean more than winning a smaller competition.

Major events attract the strongest fields — the rating should reward competing at the top level.

3. Field-Size Weighting

Runs with fewer finishers carry proportionally less weight, scaled against a baseline of 20 competitors.

Beating 5 teams tells us less about your skill than beating 25. Larger fields provide more information.

4. Minimum Field Size

Runs with fewer than 6 teams are excluded from rating entirely. Too few competitors means too little information for a meaningful comparison.

Prevents tiny fields from introducing noise into the rating.
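
A minimal sketch combining the three rules above (tier weighting, field-size weighting, and the minimum field size). The 1.2x multiplier, the baseline of 20 finishers, and the cutoff of 6 come from the text; the exact functional form, for example whether the field-size factor is capped at 1.0 for large fields, is an assumption:

```python
TIER1_EVENTS = {"AWC", "EO", "JOAWC", "SOAWC"}
TIER1_MULTIPLIER = 1.2     # Tier 1 weight multiplier from the text
BASELINE_FIELD = 20        # field-size baseline from the text
MIN_FIELD = 6              # runs smaller than this are not rated

def run_weight(event: str, num_finishers: int) -> float:
    """Weight a run carries in the rating; 0.0 means the run is excluded."""
    if num_finishers < MIN_FIELD:
        return 0.0
    weight = min(1.0, num_finishers / BASELINE_FIELD)  # cap at 1.0 is an assumption
    if event in TIER1_EVENTS:
        weight *= TIER1_MULTIPLIER
    return weight

print(run_weight("AWC", 30))        # 1.2  (major event, full-size field)
print(run_weight("LocalOpen", 10))  # 0.5  (half the baseline field)
print(run_weight("LocalOpen", 4))   # 0.0  (too few finishers to rate)
```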

5. Elimination Handling

Eliminated teams are excluded from the run comparison entirely — they receive no rating update for that run. Only non-eliminated teams with a valid placement are compared against each other.

Without this, finishers would be treated as “beating” all eliminated teams, massively inflating their ratings in runs with high elimination rates.
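
In code, the rule amounts to dropping eliminated entries before the run is turned into a comparison. A hypothetical sketch with illustrative field names:

```python
def rated_order(results):
    """Return only non-eliminated teams, in placement order, for the comparison."""
    finishers = [r for r in results if not r["eliminated"]]
    return sorted(finishers, key=lambda r: r["placement"])

run = [
    {"team": "A", "placement": 1, "eliminated": False},
    {"team": "B", "placement": None, "eliminated": True},   # no rating update at all
    {"team": "C", "placement": 2, "eliminated": False},
]
print([r["team"] for r in rated_order(run)])  # ['A', 'C']; B is simply left out
```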

6. Newcomer Confidence

Teams with few competitions get a more cautious rating estimate via an adaptive sigma penalty that decreases as the team gains experience.

A few lucky results shouldn’t catapult a new team to the top. More data means more confidence.
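
The exact penalty curve is not published here, so the sketch below only illustrates the idea: inflate a new team's uncertainty and let the inflation fade as runs accumulate. The constants and the decay shape are assumptions:

```python
def effective_sigma(base_sigma: float, num_runs: int,
                    extra: float = 0.5, halflife: float = 10.0) -> float:
    """Uncertainty penalty that shrinks as the team accumulates runs (illustrative)."""
    return base_sigma * (1.0 + extra * 0.5 ** (num_runs / halflife))

for runs in (1, 5, 20, 50):
    print(runs, round(effective_sigma(1.0, runs), 3))  # penalty fades with experience
```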

7. Podium Boost

Teams are rewarded for finishing in the top 3. Competing often but rarely reaching the podium produces a lower rating than fewer runs with consistent top-3 finishes. Podium placements only count in runs with at least 15 competitors.

Without this, a team that enters 80 competitions and always finishes mid-pack could outrank one that enters 20 and wins half. The rating reflects excellence, not just participation.
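
A small sketch of which results would qualify for the boost; the 15-competitor threshold and top-3 rule come from the text, while how the count feeds back into the rating is not spelled out and is left out here:

```python
PODIUM_MIN_FIELD = 15   # podiums only count in runs with at least 15 competitors

def qualifying_podiums(results):
    """Count top-3 finishes in runs large enough to qualify for the boost."""
    return sum(1 for r in results
               if r["placement"] <= 3 and r["field_size"] >= PODIUM_MIN_FIELD)

season = [{"placement": 1, "field_size": 40},
          {"placement": 3, "field_size": 12},   # field too small: no boost
          {"placement": 2, "field_size": 25}]
print(qualifying_podiums(season))  # 2
```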

8. Uncertainty Stabilization

After each run, the uncertainty (sigma) in a team’s rating decays slightly, but it never drops below a minimum floor. This prevents overconfidence in any single team’s rating.

Balances responsiveness (ratings should update) with stability (ratings shouldn’t swing wildly).
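
A minimal sketch of this update; the decay factor and floor value are assumptions, only the mechanism (multiplicative decay with a hard floor) follows the description above:

```python
SIGMA_FLOOR = 0.5   # assumed minimum uncertainty
DECAY = 0.98        # assumed per-run decay factor

def updated_sigma(sigma: float) -> float:
    """Shrink uncertainty slightly after each run, but never below the floor."""
    return max(SIGMA_FLOOR, sigma * DECAY)

print(updated_sigma(0.9))   # 0.882: still decaying
print(updated_sigma(0.5))   # 0.5: already at the floor
```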

9. Cross-Size Normalization

Ratings are z-score normalized within each size category (S/M/I/L) so that 1500 always means “average” and the scale is consistent. A rating of 1650 means “one standard deviation above the mean” in any category.

Large has more competitors than Small. Without normalization, the wider spread in Large would make ratings incomparable across categories.
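
A hypothetical sketch of the per-category step: group raw skill estimates by size category, z-score them within each group, and map onto the 1500-centered display scale. The raw values and team names are made up:

```python
import statistics

def normalize(raw_by_category):
    """Z-score raw skills within each category, then map to the 1500/150 display scale."""
    display = {}
    for category, skills in raw_by_category.items():
        mu = statistics.mean(skills.values())
        sd = statistics.stdev(skills.values())
        display[category] = {team: 1500 + 150 * (s - mu) / sd
                             for team, s in skills.items()}
    return display

raw = {"L": {"team1": 2.1, "team2": 0.3, "team3": -1.0},
       "S": {"team4": 0.9, "team5": 0.1, "team6": -0.6}}
print(normalize(raw))  # 1500 means "average" within each category
```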

10. 2-Year Rolling Window

Only runs from the last 730 days count toward the rating. Older results are dropped, so the rating always reflects current form.

A team that dominated three years ago but hasn’t competed since shouldn’t hold a top spot forever.
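
In code, the window is a simple date filter; the 730-day constant comes from the text, and the field names are illustrative:

```python
from datetime import date, timedelta

WINDOW = timedelta(days=730)   # 2-year rolling window

def runs_in_window(runs, today):
    """Keep only runs whose date falls within the last 730 days."""
    cutoff = today - WINDOW
    return [r for r in runs if r["date"] >= cutoff]

runs = [{"event": "run A", "date": date(2023, 10, 7)},
        {"event": "run B", "date": date(2021, 8, 1)}]
print(runs_in_window(runs, date(2024, 6, 1)))   # only the 2023 run remains
```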

Data Sources

We import results from international FCI-rules agility competitions spanning the last 2–3 years:

What We Import

  • Tier 1 — AWC, EO, JOAWC, SOAWC (world and European championships)
  • Tier 2 — international open competitions running under FCI rules (~20 events per year)

Sources

Results are imported from publicly available sources including agilitynow.eu, kacr.info, flowagility.com, smarteragility.com, sport.enci.it, devent.no, and others.

What We Don’t Import Yet

Non-FCI competitions (AKC, USDAA, UKI, IFCS, WAO, KC/Crufts) are not included. These organizations use different rules, jump heights, and size categories, making direct comparison with FCI results unreliable. Expanding to non-FCI competitions is a future goal.

Transparency note: not all competitions are included. Results data is imported from publicly available sources. We are continually expanding our coverage to include more events.

Size Categories

All competitions use FCI size categories based on the dog’s height at the withers:

  • S (Small): height under 35 cm
  • M (Medium): 35–43 cm
  • I (Intermediate): 43–48 cm
  • L (Large): over 48 cm

Each dog is assigned to one category based on its most recent run. An admin override is available if a dog's category changes.
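
A small lookup sketch matching the table above; how exact boundary heights (35, 43, 48 cm) are treated is an assumption, since the table leaves the boundaries open:

```python
def size_category(height_cm: float) -> str:
    """Assign an FCI size category from height at the withers (boundary handling assumed)."""
    if height_cm < 35:
        return "S"
    if height_cm < 43:
        return "M"
    if height_cm < 48:
        return "I"
    return "L"

print(size_category(34.5), size_category(47.0), size_category(52.0))  # S I L
```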

Cross-Category Runs

Some competitions mix dogs from different size categories in the same run (for example, WAO 500 includes both Intermediate and Large dogs). When this happens, all competitors are rated against each other — the placement is real, since they ran the same course. Display ratings are then normalized independently per category, so each leaderboard remains fair.

Limitations and Disclaimers

We believe in transparency. Here are the known limitations of the rating system:

  • AgilityDogsWorld Rankings is an independent project. It is not endorsed by or affiliated with FCI or any other organization.
  • Only FCI-rules competitions are currently included. Teams that primarily compete in non-FCI events (AKC, USDAA, UKI, IFCS, WAO) will not have a rating.
  • Ratings are only as good as the data. We rely on publicly available competition results. Errors in source data may affect ratings.
  • Not all competitions are included. A team's true strength may not be fully reflected if they primarily compete at events not yet in our database.
  • Rating is based on placement, not absolute performance. A clean run at a weak competition may count less than a faulted run at a strong one.
  • Categories that rarely meet in cross-category runs develop partially independent rating pools. Display ratings are normalized per category, so each leaderboard is internally consistent, but the underlying skill estimates may not be perfectly calibrated between categories.