API Reference

domain-scout can be used as a Python library for programmatic domain discovery.

Basic usage

from domain_scout import Scout

result = Scout().discover(
    company_name="Palo Alto Networks",
    seed_domain="paloaltonetworks.com",
)

for domain in result.domains:
    print(f"{domain.domain:40s}  {domain.confidence:.2f}  {domain.sources}")

Async usage

import asyncio
from domain_scout import Scout, EntityInput

async def main():
    scout = Scout()
    result = await scout.discover_async(EntityInput(
        company_name="Palo Alto Networks",
        seed_domain=["paloaltonetworks.com"],
    ))
    return result

result = asyncio.run(main())

Configuration

from domain_scout import Scout
from domain_scout.config import ScoutConfig

config = ScoutConfig(
    total_timeout=180,          # seconds
    deep_mode=True,             # enable GeoDNS
    discovery_mode="fingerprint",  # "default" or "fingerprint"
    fp_candidate_limit=200,     # max candidates to fingerprint-verify
    dns_timeout=5.0,            # per-query DNS timeout
    org_match_threshold=0.65,   # fuzzy match threshold
    inclusion_threshold=0.6,    # minimum confidence to include
    geodns_concurrency=3,       # concurrent GeoDNS requests
    geodns_delay=0.5,           # delay between GeoDNS requests
)

scout = Scout(config=config)
result = scout.discover(company_name="Acme Corp", seed_domain="acme.com")

Discovery profiles

Profiles provide preset threshold configurations for different use cases:

from domain_scout.config import ScoutConfig

config = ScoutConfig.from_profile("broad")    # lower thresholds, includes non-resolving
config = ScoutConfig.from_profile("balanced")  # defaults
config = ScoutConfig.from_profile("strict")    # higher thresholds

# Profiles accept overrides
config = ScoutConfig.from_profile("broad", total_timeout=200)

Or via CLI: domain-scout --name "Acme" --seed acme.com --profile strict

Response models

ScoutResult

class ScoutResult(BaseModel):
    entity: EntityInput                                  # the input
    domains: list[DiscoveredDomain]                      # discovered domains, sorted by confidence
    seed_domain_assessment: dict[str, str]               # seed -> "confirmed" | "suspicious" | "invalid" | "timeout"
    seed_cross_verification: dict[str, list[str]]        # seed -> list of co-hosted seeds
    run_metadata: RunMetadata                             # audit trail and reproducibility metadata

DiscoveredDomain

class DiscoveredDomain(BaseModel):
    domain: str                          # e.g. "samsclub.com"
    confidence: float                    # 0.0 to 1.0
    sources: list[str]                   # e.g. ["ct_org_match", "ct_san_expansion:walmart.com"]
    evidence: list[EvidenceRecord]       # structured attribution evidence
    cert_org_names: list[str]            # organization names from certificates
    first_seen: datetime | None          # earliest cert notBefore
    last_seen: datetime | None           # latest cert notAfter
    resolves: bool                       # DNS resolution status
    is_seed: bool                        # True if this is a seed domain
    seed_sources: list[str]              # which seeds contributed to discovering this domain
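
Because each entry carries both a confidence score and a DNS status, callers often post-filter `result.domains`. The snippet below is a minimal sketch of that idea, not part of domain-scout: `select_domains` is a hypothetical helper, and `SimpleNamespace` objects with made-up values stand in for `DiscoveredDomain` instances so the example runs without the package installed.

```python
from types import SimpleNamespace


def select_domains(domains, min_confidence=0.8, require_resolving=True):
    """Return domains at or above min_confidence, highest confidence first.

    Hypothetical helper: it relies only on the `confidence` and `resolves`
    attributes documented for DiscoveredDomain.
    """
    kept = [
        d for d in domains
        if d.confidence >= min_confidence and (d.resolves or not require_resolving)
    ]
    return sorted(kept, key=lambda d: d.confidence, reverse=True)


# Stand-ins for DiscoveredDomain objects (illustrative values only).
sample = [
    SimpleNamespace(domain="acme.com", confidence=0.95, resolves=True),
    SimpleNamespace(domain="acme-old.net", confidence=0.90, resolves=False),
    SimpleNamespace(domain="acme-labs.io", confidence=0.62, resolves=True),
]

for d in select_domains(sample):
    print(f"{d.domain:40s}  {d.confidence:.2f}")
```

With a real scan, `sample` would be replaced by `result.domains`.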

EvidenceRecord

class EvidenceRecord(BaseModel):
    source_type: str                     # e.g. "ct_org_match", "ct_san_expansion", "dns_guess"
    description: str                     # human-readable explanation
    seed_domain: str | None = None       # which seed produced this evidence
    cert_id: int | None = None           # crt.sh certificate ID (links to https://crt.sh/?id=N)
    cert_org: str | None = None          # O= field from the certificate
    similarity_score: float | None = None  # org-name similarity 0.0-1.0
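
The `cert_id` comment above gives the crt.sh URL pattern, so evidence can be turned into clickable links for manual review. The following is a small sketch under that assumption; `crtsh_links` is a hypothetical helper, and the sample records are stand-ins with invented IDs rather than real `EvidenceRecord` objects.

```python
from types import SimpleNamespace


def crtsh_links(evidence):
    """Return one crt.sh URL per distinct cert_id, preserving first-seen order."""
    seen = set()
    links = []
    for record in evidence:
        # Records such as DNS guesses may have no certificate attached.
        if record.cert_id is None or record.cert_id in seen:
            continue
        seen.add(record.cert_id)
        links.append(f"https://crt.sh/?id={record.cert_id}")
    return links


# Stand-ins for EvidenceRecord objects (cert IDs are made up).
sample_evidence = [
    SimpleNamespace(source_type="ct_org_match", cert_id=123456),
    SimpleNamespace(source_type="dns_guess", cert_id=None),
    SimpleNamespace(source_type="ct_san_expansion", cert_id=123456),
    SimpleNamespace(source_type="ct_org_match", cert_id=789),
]

for url in crtsh_links(sample_evidence):
    print(url)
```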

RunMetadata

class RunMetadata(BaseModel):
    schema_version: str = "1.0"          # output schema version
    tool_version: str                    # domain-scout package version
    timestamp: datetime                  # UTC timestamp of the run
    elapsed_seconds: float               # wall-clock duration
    domains_found: int                   # number of domains in output
    timed_out: bool = False              # whether any phase timed out
    seed_count: int = 0                  # number of seed domains used
    errors: list[str]                    # warnings and errors encountered
    config: dict[str, object]            # snapshot of ScoutConfig used
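
Since `timed_out` and `errors` record whether a run was cut short, it can be worth checking them before treating a result as complete. This is one possible way to do so, sketched with a hypothetical `run_problems` helper and a stand-in object in place of a real `RunMetadata`.

```python
from types import SimpleNamespace


def run_problems(meta):
    """Collect reasons a run may be incomplete, based on RunMetadata fields."""
    problems = []
    if meta.timed_out:
        problems.append("at least one phase timed out")
    problems.extend(meta.errors)
    return problems


# Stand-in for result.run_metadata (illustrative values only).
sample_meta = SimpleNamespace(timed_out=True, errors=["example warning text"])

for problem in run_problems(sample_meta):
    print(f"warning: {problem}")
```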

EntityInput

class EntityInput(BaseModel):
    company_name: str                        # required
    location: str | None = None              # optional
    seed_domain: list[str] = []              # optional, repeatable
    industry: str | None = None              # optional

Delta reporting

Compare two scan results to see what changed:

from domain_scout import compute_delta, Scout

baseline = Scout().discover(company_name="Acme Corp", seed_domain="acme.com")
# ... time passes ...
current = Scout().discover(company_name="Acme Corp", seed_domain="acme.com")

report = compute_delta(baseline, current)
print(f"Added: {report.summary.added}, Removed: {report.summary.removed}")
for d in report.added:
    print(f"  + {d.domain}")
for d in report.removed:
    print(f"  - {d.domain}")
for c in report.changed:
    print(f"  ~ {c.domain}: {[ch.field for ch in c.changes]}")

Or via CLI:

domain-scout diff baseline.json current.json            # table output
domain-scout diff baseline.json current.json -o json    # JSON output
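
To give a feel for what a diff reports, here is a rough, name-level illustration using only the standard library. It is not how `compute_delta` is implemented and it ignores field-level changes: it only assumes that serialized scan output has a top-level `domains` list whose entries have a `domain` key, as the `ScoutResult` model suggests. The function names and sample JSON are hypothetical.

```python
import json


def domain_names(scan_json):
    """Extract the set of domain names from serialized scan output."""
    return {entry["domain"] for entry in json.loads(scan_json)["domains"]}


def name_level_diff(baseline_json, current_json):
    """Compare two scans by domain name only, using set differences."""
    base = domain_names(baseline_json)
    curr = domain_names(current_json)
    return {
        "added": sorted(curr - base),      # in current but not baseline
        "removed": sorted(base - curr),    # in baseline but not current
        "in_both": sorted(base & curr),    # candidates for field-level comparison
    }


# Trimmed, made-up scan outputs: real output carries many more fields.
baseline_json = '{"domains": [{"domain": "acme.com"}, {"domain": "acme-old.net"}]}'
current_json = '{"domains": [{"domain": "acme.com"}, {"domain": "acme-labs.io"}]}'

print(name_level_diff(baseline_json, current_json))
```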

DeltaReport

class DeltaReport(BaseModel):
    added: list[DiscoveredDomain]        # domains in current but not baseline
    removed: list[DiscoveredDomain]      # domains in baseline but not current
    changed: list[ChangedDomain]         # domains in both with meaningful differences
    summary: DeltaSummary                # aggregate counts
    warnings: list[DeltaWarning]         # context warnings (different seeds, config, etc.)
    baseline_metadata: RunMetadata       # metadata from the baseline scan
    current_metadata: RunMetadata        # metadata from the current scan

ChangedDomain

class ChangedDomain(BaseModel):
    domain: str                          # e.g. "samsclub.com"
    changes: list[DomainChange]          # field-level changes
    baseline_confidence: float           # confidence in baseline
    current_confidence: float            # confidence in current

DomainChange

class DomainChange(BaseModel):
    field: str                           # "confidence", "resolves", "sources", or "rdap_org"
    old: float | bool | str | list[str] | None
    new: float | bool | str | list[str] | None

DeltaSummary

class DeltaSummary(BaseModel):
    added: int
    removed: int
    changed: int
    unchanged: int
    baseline_total: int
    current_total: int
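
Read literally, the `DeltaReport` field descriptions imply two relations among these counts: every baseline domain is removed, changed, or unchanged, and every current domain is added, changed, or unchanged. That is an inference from the descriptions, not documented behaviour, so treat the check below as a sanity-test sketch. `Summary` is a stand-in dataclass mirroring the `DeltaSummary` fields.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Stand-in with the same fields as DeltaSummary."""
    added: int
    removed: int
    changed: int
    unchanged: int
    baseline_total: int
    current_total: int


def summary_is_consistent(s):
    """Check the inferred relations between the per-category counts and totals."""
    return (
        s.baseline_total == s.removed + s.changed + s.unchanged
        and s.current_total == s.added + s.changed + s.unchanged
    )


# Illustrative counts only.
example = Summary(added=2, removed=1, changed=3, unchanged=10,
                  baseline_total=14, current_total=15)
print(summary_is_consistent(example))
```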

DeltaWarning

class DeltaWarning(BaseModel):
    code: str                            # e.g. "seeds_changed", "config_changed"
    message: str                         # human-readable explanation

REST API

Start the server:

domain-scout serve --port 8080
domain-scout serve --port 8080 --api-key YOUR_KEY  # require authentication

Server environment variables

Configure server-wide defaults so clients don't need to pass paths per request:

| Variable | Description | Default |
|---|---|---|
| `DOMAIN_SCOUT_WAREHOUSE_PATH` | Path to parquet warehouse directory | None |
| `DOMAIN_SCOUT_SUBSIDIARIES_PATH` | Path to subsidiaries CSV file | None |
| `DOMAIN_SCOUT_LOCAL_MODE` | `disabled`, `local_only`, or `local_first` | `disabled` (auto-enables `local_first` if warehouse path is set and no explicit mode is given) |
| `DOMAIN_SCOUT_API_KEY` | Require this key on authenticated endpoints | None |
| `DOMAIN_SCOUT_CACHE` | Enable DuckDB cache (`true`/`false`) | `true` |
| `DOMAIN_SCOUT_CACHE_DIR` | DuckDB cache directory | System default |
| `DOMAIN_SCOUT_MAX_CONCURRENT` | Max concurrent scans | `3` |

Example deployment with warehouse:

export DOMAIN_SCOUT_WAREHOUSE_PATH=/opt/ct-warehouse
export DOMAIN_SCOUT_API_KEY=secret
domain-scout serve --port 8080
# Server auto-enables local_first mode — clients just POST to /scan

Endpoints

| Method | Path | Description |
|---|---|---|
| `POST` | `/scan` | Run a domain discovery scan |
| `POST` | `/diff` | Compare two scan results |
| `GET` | `/health` | Health check (returns version + status) |
| `GET` | `/ready` | Readiness probe (checks crt.sh connectivity) |
| `GET` | `/cache/stats` | Cache statistics |
| `POST` | `/cache/clear` | Clear all cached entries |
| `GET` | `/metrics` | Prometheus metrics |

Authenticated endpoints (/scan, /diff, /cache/*) require an X-API-Key header when an API key is configured via --api-key or DOMAIN_SCOUT_API_KEY.

POST /scan

{
  "entity": {
    "company_name": "Shelter Insurance",
    "seed_domain": ["shelterinsurance.com"]
  },
  "profile": "balanced",
  "timeout": 120,
  "deep": false,
  "local_mode": null,
  "warehouse_path": null,
  "subsidiaries_path": null
}

Only entity.company_name is required. All other fields are optional:

| Field | Type | Description |
|---|---|---|
| `profile` | `broad` \| `balanced` \| `strict` | Threshold preset |
| `timeout` | `5-300` | Override total timeout (seconds) |
| `deep` | `bool` | Enable GeoDNS deep mode |
| `local_mode` | `disabled` \| `local_only` \| `local_first` \| `null` | Override server default |
| `warehouse_path` | `string` \| `null` | Override server default warehouse path |
| `subsidiaries_path` | `string` \| `null` | Override server default subsidiaries path |

When local_mode, warehouse_path, or subsidiaries_path is null or omitted, the server falls back to the default from the corresponding environment variable.
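
The fallback order for `local_mode` can be sketched as follows. This is an interpretation of the rules stated in this document (request value first, then `DOMAIN_SCOUT_LOCAL_MODE`, then `local_first` if a warehouse path is configured, else `disabled`), not the server's actual code. `effective_local_mode` is a hypothetical function, and the sketch assumes the auto-enable rule looks at the server-side `DOMAIN_SCOUT_WAREHOUSE_PATH`.

```python
import os


def effective_local_mode(request_value, env=None):
    """Resolve local_mode using the precedence described in this document."""
    env = os.environ if env is None else env
    # 1. An explicit value in the request body overrides the server default.
    if request_value is not None:
        return request_value
    # 2. Otherwise use an explicitly configured server-wide mode.
    explicit = env.get("DOMAIN_SCOUT_LOCAL_MODE")
    if explicit:
        return explicit
    # 3. No explicit mode: a configured warehouse path auto-enables local_first.
    if env.get("DOMAIN_SCOUT_WAREHOUSE_PATH"):
        return "local_first"
    # 4. Documented default.
    return "disabled"


print(effective_local_mode(None, {"DOMAIN_SCOUT_WAREHOUSE_PATH": "/opt/ct-warehouse"}))
```

Passing the environment as a plain mapping keeps the function easy to test without touching the real process environment.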

Returns a ScoutResult JSON object.
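
A client call can be assembled with the standard library alone. The sketch below builds the request described above, including the X-API-Key header, but does not send it, since that needs a running server. `build_scan_request` is a hypothetical helper, and the base URL assumes the server was started locally on port 8080.

```python
import json
import urllib.request


def build_scan_request(base_url, company_name, seed_domains=(), api_key=None, profile=None):
    """Build a POST /scan request; only entity.company_name is mandatory."""
    payload = {"entity": {"company_name": company_name, "seed_domain": list(seed_domains)}}
    if profile is not None:
        payload["profile"] = profile
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/scan",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_scan_request(
    "http://localhost:8080",
    "Shelter Insurance",
    ["shelterinsurance.com"],
    api_key="YOUR_KEY",
)
print(request.get_method(), request.full_url)

# Sending is left commented out because it requires a live server:
# with urllib.request.urlopen(request, timeout=300) as response:
#     result = json.load(response)  # a ScoutResult-shaped dict
```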

JSON output

Use model_dump_json() for serialization:

from domain_scout import Scout

result = Scout().discover(company_name="Acme Corp")
print(result.model_dump_json(indent=2))