Troubleshooting
Symptom → likely cause → check → fix for dashboard, models, skills, MCP, channels, and resource issues.
Dashboard and startup
Each entry below follows the same four-part shape: symptom (what you observe), likely cause (why it usually happens), check (a command or inspection that confirms it), fix (the concrete action). Scan the symptoms until one matches.
Dashboard does not load on port 8080
Likely cause: the container is not running or the port was never published.
docker ps --filter name=golemcore-bot
docker logs --tail=50 golemcore-bot
Fix: if the container is stopped or was created without the port mapping, run it again with -p 8080:8080. If the container is running but the dashboard still does not load, check that the startup sequence completed without errors: a missing volume or corrupted preferences file will abort startup before the HTTP port is opened.
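The two likely causes can be separated by looking at the Ports column that docker ps reports. The following is a minimal sketch, assuming the container name golemcore-bot used above; the helper name port_published is illustrative and not part of the bot.

```shell
# port_published: succeeds if a `docker ps` Ports string maps host port 8080.
# A published port looks like "0.0.0.0:8080->8080/tcp"; a port that is only
# exposed (not published) shows up as plain "8080/tcp".
port_published() {
  case "$1" in
    *":8080->"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (needs Docker):
#   ports=$(docker ps --filter name=golemcore-bot --format '{{ .Ports }}')
#   port_published "$ports" || echo "container not running or port 8080 not published"
```

An empty Ports string means docker ps found no running container with that name, so the helper fails in that case as well.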
Settings disappear after every restart
Likely cause: the workspace volume is not mounted, so preferences are written to the ephemeral container filesystem and discarded on restart.
docker inspect golemcore-bot --format '{{ range .Mounts }}{{ .Destination }} {{ end }}'
Fix: restart the container with -v golemcore-bot-data:/app/workspace. The Configuration page has the full command.
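The mount check above can be turned into a pass/fail test. This is a sketch only: has_workspace_mount is a hypothetical helper name, and it looks for a mount at exactly /app/workspace, the path used in the fix.

```shell
# has_workspace_mount: reads mount destinations (space- or newline-separated)
# on stdin and succeeds only if /app/workspace is one of them.
has_workspace_mount() {
  tr ' ' '\n' | grep -x '/app/workspace' >/dev/null
}

# Usage (needs Docker):
#   docker inspect golemcore-bot --format '{{ range .Mounts }}{{ .Destination }} {{ end }}' \
#     | has_workspace_mount || echo "workspace volume is not mounted"
```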
Initial admin password not in the logs
Likely cause: the container has already saved an admin credential on the volume, so no bootstrap password is generated on subsequent starts.
docker exec golemcore-bot cat /app/workspace/preferences/admin-credentials.json
Fix: reset the password through Settings → Security. If you are locked out entirely, set BOT_DASHBOARD_ADMIN_PASSWORD at container start; the bot accepts it as an override on the first boot where the credentials file is absent.
Models and routing
Rate limit exceeded from the provider
Likely cause: request burst or shared key across multiple bots.
docker logs golemcore-bot | grep -Ei "rate.?limit|429"
Fix: raise your plan's limit with the provider, lower llmRequestsPerMinute in preferences/rate-limit.json, or route the affected tier to a different provider in the model router.
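To judge whether you are seeing a short burst or sustained pressure, it can help to count the matching log lines over a fixed window instead of reading them. A minimal sketch, reusing the same pattern as the check above; the helper name count_rate_limit_hits and the ten-minute window are illustrative assumptions.

```shell
# count_rate_limit_hits: prints how many lines on stdin mention a rate limit
# or the number 429. `grep -c` exits non-zero when the count is 0, so the
# trailing `|| true` keeps the function's status clean in that case.
count_rate_limit_hits() {
  grep -Eic 'rate.?limit|429' || true
}

# Usage (needs Docker); 2>&1 also captures what the container wrote to stderr:
#   docker logs --since 10m golemcore-bot 2>&1 | count_rate_limit_hits
```

Because the pattern is a plain text match, a stray 429 inside an unrelated line is counted too; treat the number as a rough signal.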
Model router falls back to balanced on unknown tier
Likely cause: a skill references a tier name that is not configured in preferences/model-router.json.
docker exec golemcore-bot cat /app/workspace/preferences/model-router.json
Fix: either add the missing tier mapping, or change the skill to use a tier you actually have configured. balanced must always be present as the ultimate fallback.
Context length exceeded on a long session
Likely cause: auto-compaction cannot fit the session into the model's context window.
Check: look at the failing turn in Sessions. If auto-compaction ran but still overflowed, the message shows it explicitly.
Fix: start a new session (cleanest), let dynamic escalation pick a larger model on retry, or edit the affected tier in preferences/model-router.json to point at a model with a wider context window before resuming.
Skills and MCP
Skill does not appear in /skills list
Likely cause: the SKILL.md file is missing or has invalid YAML frontmatter.
docker exec golemcore-bot ls /app/workspace/skills/
docker exec golemcore-bot head -n 10 /app/workspace/skills/<name>/SKILL.md
Fix: confirm the directory name matches the skill's name field and that the frontmatter block is delimited by --- on its own line at both ends.
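The two conditions in the fix (delimiters and matching name) can be checked mechanically. The sketch below is not the bot's own validator: check_skill is a hypothetical helper, it assumes the name value is unquoted and sits at the start of a line, and it does not parse the rest of the YAML.

```shell
# check_skill: validates <dir>/SKILL.md for a skill directory <dir>.
# Succeeds only if line 1 is '---', a later '---' line closes the block,
# and the 'name:' field inside the block equals the directory name.
check_skill() {
  dir=$1
  file="$dir/SKILL.md"
  [ -f "$file" ] || { echo "missing $file" >&2; return 1; }
  awk -v expected="$(basename "$dir")" '
    NR == 1 && $0 != "---" { bad = 1; exit }   # block must open on line 1
    NR == 1 { next }
    $0 == "---" { closed = 1; exit }           # closing delimiter found
    /^name:/ { sub(/^name:[ \t]*/, ""); sub(/[ \t\r]+$/, ""); name = $0 }
    END { if (bad || !closed || name != expected) exit 1 }
  ' "$file"
}

# Usage: copy the skills out of the container, then check one directory:
#   docker cp golemcore-bot:/app/workspace/skills ./skills
#   check_skill ./skills/<name> || echo "frontmatter or name mismatch"
```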
MCP handshake timeout on first activation
Likely cause: the MCP server takes longer than startup_timeout to launch — common on first npx use because the package is still downloading.
docker logs golemcore-bot | grep -Ei "mcp|handshake|timeout"
Fix: raise startup_timeout in the skill's mcp: block (60s is a safe default for first runs), or pre-download the MCP server package into the image.
Tools from the MCP server never register
Likely cause: the handshake never completed, or the skill was deactivated before the handshake finished.
Check: tail docker logs for a line about the initialize and tools/list handshake completing for your skill. The bot only exposes MCP tools after both steps finish.
Fix: trigger the skill again in a fresh message. If the handshake still does not complete, raise startup_timeout and inspect the MCP server's own stderr for a launch error.
Channels and webhooks
Webhook returns status: "error"
Likely cause: bad auth, unknown hook name, or an internal handler failure. The webhook endpoints are synchronous at the HTTP layer — they return accepted or error immediately, not a queued status.
docker logs golemcore-bot | grep -Ei "webhook|hook"
Fix: if errorMessage in the response mentions auth, verify the bearer token or HMAC signature matches preferences/webhooks.json. If it mentions the hook name, confirm the custom hook is defined. For everything else, inspect the Sessions page using the chatId from the response body.
Telegram bot does not reply
Likely cause: token misconfigured, user not in the allow-list, or Telegram disabled in preferences.
docker exec golemcore-bot cat /app/workspace/preferences/telegram.json
Fix: verify the token with @BotFather, add your user ID to allowedUsers, and confirm enabled is true.
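Besides asking @BotFather, a token can also be tested directly against the Telegram Bot API's getMe method, which answers with "ok": true for a valid token. This is a sketch that bypasses the bot entirely; getme_ok is a hypothetical helper and TELEGRAM_TOKEN is a placeholder for your own token.

```shell
# getme_ok: succeeds if a getMe JSON response on stdin reports "ok": true.
getme_ok() {
  grep -E '"ok"[[:space:]]*:[[:space:]]*true' >/dev/null
}

# Usage (needs curl and network access):
#   curl -fsS "https://api.telegram.org/bot${TELEGRAM_TOKEN}/getMe" | getme_ok \
#     || echo "Telegram rejected the token"
```

A passing result only proves the token itself is valid; the allowedUsers and enabled settings still need to be checked in the preferences file.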
Tools and resources
Browser plugin crashes on page load
Likely cause: missing Chromium shared memory and sandbox flags. Chromium silently crashes inside Docker when /dev/shm is the default 64 MiB, and its user-namespace sandbox needs an extra capability.
Check: run docker inspect golemcore-bot and look for ShmSize and CapAdd.
Fix: restart the container with --shm-size=256m --cap-add=SYS_ADMIN. Both flags are the officially documented Playwright-in-Docker configuration; neither alone is enough.
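Docker reports both values under HostConfig, with ShmSize given in bytes, so the check can be narrowed to the two fields that matter. A minimal sketch; shm_ok is a hypothetical helper, and the 256 MiB threshold simply mirrors the flag in the fix above.

```shell
# shm_ok: succeeds if a ShmSize value in bytes is at least 256 MiB.
# Docker's default of 64 MiB is reported as 67108864.
shm_ok() {
  [ "$1" -ge $((256 * 1024 * 1024)) ]
}

# Usage (needs Docker):
#   shm_ok "$(docker inspect golemcore-bot --format '{{ .HostConfig.ShmSize }}')" \
#     || echo "/dev/shm is smaller than 256 MiB"
#   docker inspect golemcore-bot --format '{{ .HostConfig.CapAdd }}'   # expect SYS_ADMIN in the list
```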
Traces fill the disk
Likely cause: the per-session trace budget is larger than your disk can absorb for the number of sessions you run.
docker exec golemcore-bot du -sh /app/workspace/traces
docker exec golemcore-bot cat /app/workspace/preferences/tracing.json
Fix: lower sessionTraceBudgetMb (default 128) and maxTracesPerSession (default 100) in preferences/tracing.json. Tracing is size- and count-capped, not time-capped, so old sessions are never aged out on their own; you can also prune the directory manually with the container stopped.
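A quick way to pick a budget is to work out the worst case: every session filling its whole budget. The sketch below assumes sessionTraceBudgetMb acts as a hard per-session ceiling and uses a hypothetical figure of 200 retained sessions; trace_worst_case_mb is an illustrative helper, not a bot command.

```shell
# trace_worst_case_mb: upper bound on trace disk use in MB,
# assuming every session uses its entire budget.
trace_worst_case_mb() {
  budget_mb=$1
  sessions=$2
  echo $((budget_mb * sessions))
}

trace_worst_case_mb 128 200   # default budget, 200 sessions: prints 25600
trace_worst_case_mb 32 200    # lowered budget, same sessions: prints 6400
```

Compare the result with the free space on the volume that holds /app/workspace; if the worst case does not fit, lower the budget or prune old sessions.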
Still stuck
Open an issue on GitHub Issues with the symptom, the check output, and the logs around the failure. The project lives on reproducible reports.
Related pages
Reference
FAQ
Answers to the operator questions that actually come up: restart behavior, key rotation, backups, shared keys, shared workspaces, disk requirements.
User Guide
Deployment
Three ways to run the bot — Docker, Docker Compose, or a plain JAR — and what to check before you call it production-ready.
User Guide
Configuration
How settings work, why you need a persistent volume, and where to find each setting in the dashboard.