Default model
Supported models
Any model available through the HuggingFace Inference API:meta-llama/Llama-4-Scout-17B-16E-Instruct(default)meta-llama/Llama-4-Maverick-17B-128E-Instructmistralai/Mistral-Small-24B-Instruct-2501google/gemma-2-27b-it- Any model hosted on HuggingFace with an inference endpoint
Authentication
HuggingFace accepts the API key via theHUGGINGFACE_API_KEY env var:
Environment variable:
Some models on HuggingFace work without authentication, but rate limits are strict. Always use a token for reliable results.
Example usage
Tool-use support
No. The HuggingFace provider does not support tool-use attacks. Single-turn and multi-turn attacks work.Notes
- The provider uses the
huggingface_hubSDK’sInferenceClient - Max output tokens per request: 4,096
- System prompts are passed as a
systemrole message - Retry with exponential backoff is automatic on API failures (3 attempts)
- Token usage tracking is available when the HuggingFace API returns usage metadata
- Model availability depends on the HuggingFace Inference API - not all models have active endpoints
- Use the full model ID format:
org/model-name(e.g.meta-llama/Llama-4-Scout-17B-16E-Instruct)