INIT Environment loaded — running: 01_categorize_llm.php
S02E01 — categorize agent
MODEL google/gemini-2.5-flash
SYSTEM You are a cargo classification prompt engineer.
Your goal is to design a short text prompt that instructs a simple internal model
to classify cargo items as DNG (dangerous) or NEU (neutral).
Important business rule: reactor and nuclear items belong to our company and are
legally pre-cleared for transport — they must always be classified as NEU,
regardless of their physical nature.
Workflow:
1. Reset the balance before each new prompt attempt.
2. Fetch the item list to understand what you are classifying.
3. Test your prompt against each item one by one using send_prompt.
Compose the full prompt for each item yourself — substitute the actual item
code and description into the prompt text before calling send_prompt.
4. Read the API feedback carefully: it tells you the model output, whether the
classification was correct, how many tokens were used, and your remaining balance.
5. If a classification is wrong or you run out of budget, adjust your prompt and
start a new attempt (reset first).
6. A wrong classification automatically zeroes your balance, so you must reset
the balance and start over.
7. When all 10 items are classified correctly the API will return a flag in the
format {FLG:...} — report it and stop.
Keep your prompts as short as possible. The internal model has a limited context
window — watch the token counts in the API responses.
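The workflow in the system prompt above can be sketched as a single attempt loop. This is a minimal sketch and not the script's actual code: `reset`, `fetch_items`, `send_prompt`, the feedback keys `correct` and `message`, and the `{code}` / `{description}` placeholders are all hypothetical names chosen for illustration.

```python
import re
from typing import Callable, Dict, Iterable, Optional, Tuple

# The system prompt says the flag has the form {FLG:...}.
FLAG_RE = re.compile(r"\{FLG:[^}]*\}")


def build_prompt(template: str, code: str, description: str) -> str:
    """Substitute the actual item code and description into the template."""
    return template.format(code=code, description=description)


def run_attempt(
    template: str,
    reset: Callable[[], None],
    fetch_items: Callable[[], Iterable[Tuple[str, str]]],
    send_prompt: Callable[[str], Dict],
) -> Optional[str]:
    """Run one attempt; return the flag, or None if the attempt failed."""
    reset()  # step 1: reset the balance before each new attempt
    for code, description in fetch_items():  # step 2: fetch the item list
        # step 3: test the prompt against each item, one by one
        feedback = send_prompt(build_prompt(template, code, description))
        # step 7: a flag in the feedback means every item was correct
        match = FLAG_RE.search(feedback.get("message", ""))
        if match:
            return match.group(0)
        # steps 5 and 6: a wrong answer ends the attempt; the caller
        # adjusts the template and calls run_attempt again
        if not feedback.get("correct", False):
            return None
    return None
```

A caller would wrap `run_attempt` in an outer loop that revises `template` after each `None` result, keeping it short because the feedback reports token usage and remaining balance.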
USER Please find a prompt that correctly classifies all 10 cargo items.
Iteration 1 / 50
ERROR LLM call failed: HTTP 401 —
ERROR LLM returned null — aborting.
WARN Iteration limit (50) reached without completion.
STATS Iterations used: 1 / 50
DONE Finished.
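In the run above, the LLM call fails with HTTP 401 (the Unauthorized status, which usually points at a missing or invalid API credential) and the agent aborts after iteration 1, yet the log also warns that the iteration limit was reached. The script's source is not shown, so the cause is unknown; the following is only a sketch, with hypothetical names (`Outcome`, `run_agent`, `step`), of how an agent loop could report an abort separately from an exhausted iteration budget.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Outcome(Enum):
    COMPLETED = "completed"          # the step reported success
    ABORTED = "aborted"              # the LLM call failed (returned None)
    LIMIT_REACHED = "limit_reached"  # every iteration ran without success


def run_agent(
    step: Callable[[int], Optional[str]],
    max_iterations: int = 50,
) -> Tuple[Outcome, int]:
    """Run step(i) for i = 1..max_iterations.

    step is a hypothetical callback: it returns "done" on success,
    any other string to continue, or None when the LLM call failed.
    Returns the outcome and the number of iterations used.
    """
    for i in range(1, max_iterations + 1):
        result = step(i)
        if result is None:
            # Abort immediately; this is not an exhausted budget.
            return Outcome.ABORTED, i
        if result == "done":
            return Outcome.COMPLETED, i
    return Outcome.LIMIT_REACHED, max_iterations
```

With this shape, the iteration-limit warning would be printed only for `Outcome.LIMIT_REACHED`, and a failed call on the first iteration would be reported as `(Outcome.ABORTED, 1)`.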