Docs
    Chargement...
    Mankinds - AI Scorecard
    Mankinds AI Scorecard Interface

    From connection to decision in 4 steps

    01

    Connect your AI system

    Python/TypeScript SDK, REST API, or native connectors.
    Integration in minutes, not weeks.

    python
    02

    Import or generate your dataset

    Bring your own golden dataset or let Mankinds generate test scenarios automatically.
    Define what success looks like for your AI system.

    Import your dataset
    Auto-generate scenarios
    Test Scenarios42 scenarios
    #
    Input
    Expected output
    ...
    03

    Run an evaluation

    Our engine runs automated test batteries across your 6 dimensions. Heuristics, NER/PII detection, LLM-as-Judge, statistical metrics, all combined for robust evaluation.

    Testing prompt injection resistance...
    Analyzing PII leakage patterns...
    Evaluating response groundedness...

    Complete evaluation in ~10 minutes

    Running evaluation...42 test cases
    Privacy
    Security
    Accuracy
    Fairness
    Explainability
    Accountability
    04

    Get your verdict

    Clear scorecard, detailed report, actionable recommendations.
    Share with your team, export for audits, integrate into your CI/CD pipelines.

    Interactive scorecard
    Exportable PDF report
    Webhook for CI/CD
    Secure shareable link

    Trust Scorecard

    my-chatbot-v2.3

    A
    B
    C
    D
    F

    GO

    Ready for deployment

    CI/CD Integration

    Automate with your pipelines

    Block deployments that don't meet the trust threshold you define.

    yaml

    6 dimensions. One complete view.

    Each AI system is evaluated against a rigorous framework, aligned with international standards.

    Privacy

    Is your data protected, even against attacks?

    What we evaluate

    PII Reuse
    PII Request
    PII Masking
    PII in Logs
    PII in Database
    PII Anonymization
    Data Minimization
    Privacy Refusal
    Example finding

    "The system exposes phone numbers in 3% of responses when users rephrase their question ambiguously."

    Security

    Is your system resilient against attacks and adversarial inputs?

    What we evaluate

    PII Exfiltration
    Tech Exfiltration
    Internal Exfiltration
    Context Exfiltration
    Traces Exfiltration
    Prompt Injection
    Multi-turn Resistance
    Obfuscation Resistance
    Example finding

    "The system leaks its internal instructions when users encode requests in Base64 or use non-Latin scripts."

    Accuracy

    Does your AI respond correctly, every time?

    What we evaluate

    Reproducibility
    Response Quality
    Factual Grounding
    Hallucination Detection
    Response Completeness
    Contextual Coherence
    Reformulation Stability
    Edge Case Handling
    Example finding

    "The system hallucinates product prices in 12% of cases when information is not in the RAG context."

    Fairness

    Does your AI treat all users fairly?

    What we evaluate

    Age Bias
    Ethnic Bias
    Gender Bias
    Health Bias
    Identity Bias
    Religious Bias
    Socioeconomic Bias
    Intersectional Bias
    Example finding

    "The ML scoring systematically assigns 15% fewer points to candidates with foreign-sounding first names."

    Explainability

    Can you explain why the AI responded that way?

    What we evaluate

    Justification
    Purpose Disclosure
    AI Nature Disclosure
    AI Self-Disclosure
    Control Transparency
    Scope Clarification
    Scope Refusal
    Limitations
    Example finding

    "The system never cites sources in complex responses, making human verification impossible."

    Accountability

    Who is responsible when the AI makes a mistake?

    What we evaluate

    Usage Conformity
    Scope Creep Detection
    Decision Override
    Opt-out Mechanisms
    Override Resistance
    Secure Logging
    Traceability
    Human Escalation
    Example finding

    "No human escalation mechanism is planned for cases where the system detects its own uncertainty."

    These dimensions are not checkboxes. They are observed, measured, proven behaviors.

    All the AI systems you deploy

    Chatbots & Conversational Assistants

    Customer support, internal assistants, onboarding...

    Risks evaluated: hallucinations, inappropriate tone, data leaks, prompt injections.

    RAG Systems

    Knowledge bases, intelligent documentation, search...

    Risks evaluated: groundedness, source citation, retrieval-generation consistency, context poisoning.

    Autonomous AI Agents

    Agents that take actions, use tools...

    Risks evaluated: unauthorized actions, infinite loops, privilege escalation, irreversible decisions.

    Voicebots & Callbots

    Voice conversational AI, call centers...

    Risks evaluated: misunderstanding, inappropriate responses, sensitive voice data.

    Document Extraction & Classification

    Document parsing, entity extraction, classification...

    Risks evaluated: extraction errors, classification bias, mishandled personal data.

    ML Scoring & Classifiers

    Credit scoring, fraud detection, eligibility...

    Risks evaluated: discriminatory bias, decision explainability, prediction stability.

    Integrates with your existing stack

    LLM Providers

    OpenAI
    OpenAI
    Anthropic
    Anthropic
    Google
    Google
    Mistral
    Mistral
    AWS Bedrock
    AWS Bedrock

    Frameworks & Orchestration

    LangChain
    LangChain
    LlamaIndex
    LlamaIndex
    Haystack
    Haystack

    Data Sources

    PostgreSQL
    PostgreSQL
    MongoDB
    MongoDB
    MySQL
    MySQL
    Snowflake
    Snowflake

    Automation

    Copilot
    Copilot
    n8n
    n8n
    Zapier
    Zapier
    Make
    Make

    Observability

    Datadog
    Datadog
    MLflow
    MLflow
    Langfuse
    Langfuse

    CI/CD

    GitHub Actions
    GitHub Actions
    GitLab CI
    GitLab CI
    Jenkins
    Jenkins

    Aligned with international standards

    Our evaluation methodology is built on reference frameworks for AI trust.

    NIST AI RMF

    NIST AI RMF

    ISO/IEC 42001

    ISO/IEC 42001

    OWASP LLM Top 10

    OWASP LLM Top 10

    EU AI Act

    EU AI Act

    Mankinds is not a certification body.
    We provide the technical evaluations and documentation needed to facilitate your compliance processes.

    Ready to know if your AI is trustworthy?

    Start for free. Discover the power of Mankinds. No credit card required.