The confidence level algorithm produces a 0-100 score indicating the likelihood that an IP address is associated with suspicious activity. It combines evidence from two independent sources: sensor evidence from our honeypot network and contributor reports from the security community.
The algorithm is designed to be evidence-weighted: more signals and stronger signals produce higher scores. Evidence stacks without artificial caps until final normalization. There is no time decay: the score represents total evidence strength. Recency information is available via timestamps for users who need it.
Confidence levels map to threat levels that indicate appropriate response actions:
| Score | Level | Interpretation |
|---|---|---|
| 90-100 | Very High | Strong evidence across multiple signals. Highest confidence of suspicious activity. |
| 70-89 | High | Significant evidence from sensors and/or community reports. |
| 40-69 | Medium | Suspicious activity detected. Monitor closely or challenge. |
| 10-39 | Low | Minimal signals. May be scanning or probing activity. |
| 0-9 | None | No significant evidence of suspicious activity. |
The saturation curve 100 × (1 - e^(-raw/70)) naturally compresses scores into this range:
| Raw Evidence | Final Score |
|---|---|
| 35 | ~39 |
| 70 | ~63 |
| 140 | ~86 |
| 200 | ~94 |
| 300 | ~99 |
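As a sketch, the normalization step can be implemented directly (the function name `saturate` is illustrative, not from the actual codebase):

```javascript
// Compress unbounded raw evidence into a 0-100 score using the
// saturation curve 100 × (1 - e^(-raw/70)).
function saturate(rawEvidence) {
  return 100 * (1 - Math.exp(-rawEvidence / 70));
}
```

Because the curve is asymptotic, doubling raw evidence from 140 to 280 moves the score only from roughly 86 to roughly 98: strong evidence saturates rather than overflowing the scale.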
Sensor evidence comes from our honeypot network and consists of four components: behaviors, primitives, volume, and protocol diversity.
Behaviors are analyst-classified attack patterns. Each behavior has a severity level that determines its weight. Count scaling uses sqrt(count) with a ceiling of 6 to reward repeated activity without runaway scores.
| Severity | Weight | Examples |
|---|---|---|
| very_high | 55 | Patterns consistent with exploitation, ransomware |
| high | 35 | Credential stuffing, lateral movement |
| medium | 20 | Privilege escalation attempts, data exfiltration |
| low | 8 | Reconnaissance, service enumeration |
| info | 3 | Banner grabbing, version detection |
A diversity bonus of +6 points is added for each additional distinct behavior beyond the first, recognizing that multiple attack patterns indicate a more sophisticated threat.
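The behavior rules above can be sketched as follows; the function shape and input structure are assumptions, but the weights, the min(6, √count) scaling, and the +6 diversity bonus come from this section:

```javascript
// Severity weights from the table above.
const SEVERITY_WEIGHTS = { very_high: 55, high: 35, medium: 20, low: 8, info: 3 };

// behaviors: one entry per distinct behavior, e.g. { severity: "high", count: 10 }
function scoreBehaviors(behaviors) {
  let points = 0;
  for (const b of behaviors) {
    // Weight × sqrt(count), capped at 6 to reward repetition without runaway scores.
    points += SEVERITY_WEIGHTS[b.severity] * Math.min(6, Math.sqrt(b.count));
  }
  // +6 for each distinct behavior beyond the first.
  const diversityBonus = Math.max(0, behaviors.length - 1) * 6;
  return points + diversityBonus;
}
```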
Primitives are atomic suspicious indicators (e.g., specific commands, payloads, or patterns) that haven't been classified into behaviors. They always contribute to the score, but are discounted to 40% when behaviors are also present to avoid double-counting.
Each primitive contributes independently using logarithmic scaling, plus a diversity bonus for multiple distinct primitives:
| Context | Formula |
|---|---|
| No behaviors | Σ(2 × ln(1 + count_i)) + 2 × ln(1 + distinct) |
| With behaviors | (Σ(2 × ln(1 + count_i)) + 2 × ln(1 + distinct)) × 0.4 |
Where count_i is the session count for each individual primitive, and distinct is the number of unique primitives matched.
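A minimal sketch of this formula (function and parameter names are illustrative):

```javascript
// counts: per-primitive session counts, one entry per distinct primitive.
// hasBehaviors: whether classified behaviors are also present for this IP.
function scorePrimitives(counts, hasBehaviors) {
  // Each primitive contributes 2 × ln(1 + count) independently.
  const base = counts.reduce((sum, c) => sum + 2 * Math.log(1 + c), 0);
  // Diversity bonus for multiple distinct primitives.
  const diversity = 2 * Math.log(1 + counts.length);
  const raw = base + diversity;
  // Discount to 40% when behaviors exist, to avoid double-counting.
  return hasBehaviors ? raw * 0.4 : raw;
}
```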
Volume scoring uses per-day velocity rather than raw totals. This distinguishes between an IP that generated 1,000 sessions over a year versus one that did so in a week.
| Metric | Weight | Description |
|---|---|---|
| Sessions/day | 10 | Rate of connection attempts |
| Events/day | 8 | Rate of actions taken |
| Burst ratio | 5 | Events per session (intensity) |
Activity across multiple protocols indicates broader capability. Each protocol adds +2 points, up to a maximum of 6 protocols (+12 points).
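Assuming ln(1 + x) scaling, which is consistent with the worked volume example later in this section, volume and protocol scoring might look like:

```javascript
// Velocity-based volume scoring: per-day rates, not raw totals.
function scoreVolume(sessionsPerDay, eventsPerDay, burstRatio) {
  return (
    10 * Math.log(1 + sessionsPerDay) + // rate of connection attempts
    8 * Math.log(1 + eventsPerDay) +    // rate of actions taken
    5 * Math.log(1 + burstRatio)        // events per session (intensity)
  );
}

// +2 points per distinct protocol, capped at 6 protocols (+12 max).
function protocolBonus(distinctProtocols) {
  return 2 * Math.min(distinctProtocols, 6);
}
```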
```
2 high behaviors (counts: 10, 5)
1 medium behavior (count: 3)

= 35×min(6,√10) + 35×min(6,√5) + 20×min(6,√3)
= 35×3.16 + 35×2.24 + 20×1.73
= 110.6 + 78.4 + 34.6
= 223.6
+ diversity: 6 × 2 = 12
= 235.6 behavior points
```
```
5,000 sessions over 10 days
80,000 events total

sessions/day = 500
events/day = 8,000
burst = 16 events/session

= 10×ln(501) + 8×ln(8001) + 5×ln(17)
= 10×6.22 + 8×8.99 + 5×2.83
= 62.2 + 71.9 + 14.2
= 148.3 volume points
```
Contributor evidence comes from reports submitted by the security community. It provides independent validation of suspicious activity.
Multiple independent reporters significantly increase confidence. The formula weighs unique reporters more heavily than total reports:

credibility = 7 × ln(1 + unique_reporters) + 4 × ln(1 + total_reports)
Report categories are weighted by threat severity:
| Weight | Categories |
|---|---|
| 8 | DDoS Attack, Web Exploit, SQL Injection, Exploited Host, Malware Distribution |
| 5 | Brute Force, Phishing, DNS Abuse, IoT Targeting, Spoofing, Fraud |
| 3 | Open Proxy |
| 1.5 | Port Scan, Spam, Bad Bot, Other |
Reports across multiple protocols indicate broader suspicious activity:

protocols = 2 × ln(1 + distinct_protocols)
```
8 reports from 5 unique reporters
Categories: Brute Force (5), DDoS (3)
Protocols: SSH, HTTP

credibility = 7×ln(6) + 4×ln(9)
            = 7×1.79 + 4×2.20 = 21.3
categories  = 5×ln(6) + 8×ln(4)
            = 5×1.79 + 8×1.39 = 20.1
protocols   = 2×ln(3) = 2.2

= 43.6 contributor points
```
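Putting the contributor components together, a sketch reconstructed from the worked example above (the category map is a subset of the weight table, and all names are illustrative):

```javascript
// Subset of the category weight table above, for illustration.
const CATEGORY_WEIGHTS = {
  "DDoS Attack": 8,
  "Brute Force": 5,
  "Open Proxy": 3,
  "Port Scan": 1.5,
};

function scoreContributors(uniqueReporters, totalReports, categoryCounts, distinctProtocols) {
  // Unique reporters weighted more heavily than total reports.
  const credibility = 7 * Math.log(1 + uniqueReporters) + 4 * Math.log(1 + totalReports);
  // Each category contributes weight × ln(1 + report count).
  let categories = 0;
  for (const [name, count] of Object.entries(categoryCounts)) {
    categories += CATEGORY_WEIGHTS[name] * Math.log(1 + count);
  }
  // Protocol spread bonus.
  const protocols = 2 * Math.log(1 + distinctProtocols);
  return credibility + categories + protocols;
}
```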
The confidence level represents total evidence strength without time decay. This provides a stable, consistent score that reflects the complete history of suspicious activity from an IP address.
For users who need recency information, timestamps are provided:
| Field | Description |
|---|---|
| firstSeen | First time this IP was observed by sensors (epoch ms) |
| lastSeen | Most recent activity from this IP (epoch ms) |
You can implement your own recency logic based on these timestamps. For example, you might choose to ignore IPs not seen in the last 30 days, or apply your own decay multiplier based on lastSeen.
Time decay adds complexity and can produce surprising results. An IP with extensive suspicious history shouldn't suddenly appear "clean" just because it hasn't been seen recently.
By separating the score (evidence strength) from recency (timestamps), users can make informed decisions based on their own risk tolerance and use case requirements.
```javascript
// Only block if seen in last 30 days
const thirtyDaysAgo = Date.now() - 30*24*60*60*1000;
if (ip.confidenceLevel >= 70 && ip.lastSeen > thirtyDaysAgo) {
  block(ip);
}
```
When both sensor and contributor data exist for an IP, a corroboration multiplier is applied. Independent sources agreeing dramatically increases confidence.
The multiplier ranges from 1.15x to 1.25x based on signal strength:

multiplier = 1.15 + 0.10 × ln(1 + min_signals) / ln(7)

Where min_signals (capped at 6) is the minimum of:

- the number of distinct sensor signals (in the worked example below, 3 behaviors plus primitives count as 4 signals)
- the number of unique contributor reporters

This ensures that both sources must have meaningful signals for the bonus to apply fully.
| Min Signals | Multiplier |
|---|---|
| 1 | 1.19x |
| 2 | 1.21x |
| 3 | 1.22x |
| 5 | 1.24x |
| 6+ | 1.25x |
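A sketch of the multiplier, consistent with the table above (the guard for a missing source is an assumption; only the 1.15 + 0.10 × ln(1 + min_signals)/ln(7) curve comes from this section):

```javascript
// Corroboration bonus when sensor and contributor evidence agree.
function corroborationMultiplier(sensorSignals, contributorSignals) {
  const minSignals = Math.min(sensorSignals, contributorSignals, 6); // cap at 6
  if (minSignals < 1) return 1.0; // assumption: no bonus unless both sources have signals
  return 1.15 + 0.10 * (Math.log(1 + minSignals) / Math.log(7));
}
```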
```
Sensor: 3 behaviors, primitives present (4 signals)
Contributor: 6 unique reporters
Combined evidence: 180 points

min_signals = min(4, 6) = 4
multiplier  = 1.15 + 0.10 × (ln(5)/ln(7))
            = 1.15 + 0.10 × 0.83 = 1.23

final = 180 × 1.23 = 221.4
score = 100 × (1 - e^(-221.4/70))
      = 96
```
If any behavior with very_high severity is present, a floor of 75 is enforced.
This ensures that strong exploitation signals always result in a high confidence level, even if other evidence is sparse.
| Condition | Effect |
|---|---|
| Very high severity behavior present | Score = max(calculated, 75) |
| No very high severity behavior | Score unchanged |
An IP exhibiting patterns consistent with ransomware or exploitation should never score below "High Confidence", even if it only appeared in a single session.
The lastSeen timestamp is available for users who want to apply their own recency filtering to critical threats.
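The floor is a simple clamp applied after normalization; a minimal sketch (names are illustrative):

```javascript
// Enforce the 75-point floor when any very_high severity behavior is present.
function applySeverityFloor(score, hasVeryHighBehavior) {
  return hasVeryHighBehavior ? Math.max(score, 75) : score;
}
```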
Known-benign scanners (Googlebot, Bingbot, Censys, Shodan, etc.) routinely probe internet-facing services and can accumulate high confidence levels. To prevent users from inadvertently blocking legitimate services, a whitelist discount is applied to IPs that fall within published scanner IP ranges.
Each whitelist source has a configurable discount multiplier (0.0 to 1.0). When an IP matches a whitelisted range, the final score is:

confidenceLevel = floor(raw_score × discount)
Both the raw and discounted scores are stored. The API returns the discounted score by default, so your blacklists automatically exclude known scanners.
If you want the raw, undiscounted score (for example, to block all scanners regardless of origin), pass ignoreWhitelist=true on the IP Check or Blacklist endpoints.
IP ranges are automatically synced from published provider lists. Currently tracked sources include:
| Provider | Type | Discount |
|---|---|---|
| Googlebot | Search engine crawler | 0.15 |
| Bingbot | Search engine crawler | 0.15 |
| Censys | Security research scanner | 0.30 |
| Shodan | Security research scanner | 0.30 |
| Cloudflare | CDN / security provider | 0.10 |
Discount values are approximate and may be adjusted as we calibrate the system. Lower values mean more aggressive discounting (0.0 = score reduced to zero, 1.0 = no discount).
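The discount application itself reduces to a multiply-and-floor; a sketch assuming range matching has already happened (names are illustrative):

```javascript
// Apply a whitelist discount multiplier (0.0-1.0) to a raw score.
// 0.0 reduces the score to zero; 1.0 leaves it unchanged.
function discountedScore(rawScore, discountMultiplier) {
  return Math.floor(rawScore * discountMultiplier);
}
```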
```
Raw score: 82 (from scanning activity)
Discount: 0.15 (search engine crawler)

confidenceLevel = floor(82 × 0.15)
                = floor(12.3)
                = 12

With ignoreWhitelist=true:
confidenceLevel = 82
```
```
Raw score: 65 (port scanning activity)
Discount: 0.30 (security researcher)

confidenceLevel = floor(65 × 0.30)
                = floor(19.5)
                = 19

Falls below typical scoreMinimum of 50,
so excluded from default blacklists.
```
Unlike algorithms with aggressive intermediate caps, evidence accumulates until final normalization. An IP with 10 behaviors doesn't score the same as one with 3 just because both hit a cap.
Using sqrt and ln instead of log2 provides stronger growth for higher counts. Going from 10 to 1000 events meaningfully increases the score.
Sessions-per-day and events-per-day matter more than raw totals. 1,000 sessions in a day is far more concerning than 1,000 sessions over a year.
No time decay is applied. The score represents total evidence strength, providing a stable metric. Recency is exposed via timestamps for users who need it.
Two independent sources agreeing is a strong signal. The corroboration multiplier rewards this rather than simply adding the scores together.
Known-benign scanners are discounted by default, but both the raw and discounted scores are available. Users who want to block all scanners regardless can opt out with a single query parameter.
Mixed classified and unclassified activity should score higher than pure classified activity. Primitives are discounted when behaviors exist, not discarded.
All weights and thresholds are configurable. The values documented here are calibrated against our threat data but can be adjusted based on: