Concierge Security

Concierge includes a multi-layer security system designed to protect your site, control costs, and prevent misuse of the AI. This page explains what each layer does and why it is there.

Prompt Injection Detection

Prompt injection is an attempt by a visitor to manipulate the assistant into ignoring its instructions — for example, by typing "ignore all previous instructions" or "pretend you are a different AI". Left unchecked, this kind of input can cause an assistant to behave unpredictably or reveal information it should not.

Concierge checks every incoming message against 14 known injection patterns before sending anything to the AI. If a match is found, the message is blocked immediately — no API call is made, so no usage cost is incurred — and the visitor receives a polite, neutral response redirecting them to genuine enquiries. The blocked message is logged for your records.

Rate Limiting

Each visitor session is limited to five messages per minute. This prevents automated tools from sending large volumes of requests in a short period, which could drive up API costs or attempt to extract information through repetitive querying.

If a session exceeds this limit, the assistant responds with a short message asking the visitor to wait a moment. Normal conversation resumes automatically once the minute window resets.

Conversation Length Cap

Each conversation is capped at 20 exchanges. When this limit is reached, the assistant closes the session gracefully and directs the visitor to contact your team. This limit prevents unusually long sessions from consuming disproportionate API usage and closes off a common method of attempting to wear down an assistant's instructions through persistence.

Input Length Limit

Individual messages are limited to 1,500 characters. Messages longer than this are blocked before reaching the API. The visitor is asked to break their question into shorter parts. This prevents attempts to overwhelm the assistant with very large inputs designed to push the system prompt out of the model's attention.

Response Token Cap

All responses from the AI are capped at 600 tokens — approximately 400 to 450 words. This cap is applied at the API level and cannot be bypassed. It keeps responses at a natural conversational length and prevents runaway responses that could suggest something unexpected is happening.

Response Scanning

Before a response is delivered to the visitor, Concierge scans it for signs of prompt leakage — phrases that suggest the assistant may be revealing the contents of its system prompt or behaving anomalously. If a suspicious pattern is detected, the response is replaced with a safe, neutral message. The visitor never sees the raw output.

Hardened System Prompt

Every conversation includes a set of security rules appended automatically to your system prompt. These rules instruct the assistant never to reveal its instructions, never to adopt a different persona, and to redirect any attempt to manipulate it. These rules are added by Concierge at the point of sending and cannot be removed or overridden by visitor messages.

Blocked Request Log

Every blocked request — whether stopped by injection detection, rate limiting, the turn cap, or the input length limit — is recorded in a rolling log. The log stores the session identifier, the reason for the block, a short excerpt of the blocked message, and the domain it came from. The log holds the 100 most recent entries.

This log is accessible to White Media for diagnostic purposes. If you notice unusual patterns in your widget behaviour, it can help identify the source.

API Key & Secret Storage

Your Anthropic API key (Option A) and your proxy shared secret (Option B) are stored encrypted in your WordPress database using AES-256 encryption. The encryption key is derived from your site's unique authentication key, which means the stored values are tied to your specific WordPress installation and cannot be decrypted if moved to another server without the original keys.