7 min read·Updated 2026-04-13

Troubleshooting

Symptom → likely cause → check → fix for dashboard, models, skills, MCP, channels, and resource issues.

Dashboard and startup

Each entry below follows the same four-part shape: symptom (what you observe), likely cause (why it usually happens), check (a command or inspection that confirms it), fix (the concrete action). Scan the symptoms until one matches.

Dashboard does not load on port 8080

Likely cause: the container is not running or the port was never published.

Check

bash

docker ps --filter name=golemcore-bot
docker logs --tail=50 golemcore-bot

Fix: if the container is stopped, start it again. If it was created without -p 8080:8080, recreate it with that flag, because published ports cannot be changed on an existing container. If the container is running but the dashboard still does not load, check that the startup sequence completed without errors: a missing volume or corrupted preferences file will abort startup before the HTTP port is opened.
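
The "port was never published" case can be confirmed from the Ports column of docker ps. The helper below is a minimal sketch, not part of the project: port_is_published is a hypothetical name, and it assumes the usual Docker output where a published port reads like 0.0.0.0:8080->8080/tcp and a merely exposed one reads 8080/tcp.

```bash
# Hypothetical helper: succeed if the given container port is published,
# judging by a `docker ps` Ports column.
# $1 = Ports column text, $2 = container port (for example 8080)
port_is_published() {
  printf '%s\n' "$1" | grep -Eq -- "->$2/tcp"
}

# Usage against the real container (requires Docker):
#   ports=$(docker ps --filter name=golemcore-bot --format '{{.Ports}}')
#   port_is_published "$ports" 8080 || echo "port 8080 is not published"
```

An empty Ports value for a running container means nothing is published, which matches the recreate-with-the-flag fix above.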

Settings disappear after every restart

Likely cause: the workspace volume is not mounted, so preferences are written to the ephemeral container filesystem and discarded on restart.

Check

bash

docker inspect golemcore-bot --format '{{ range .Mounts }}{{ .Destination }} {{ end }}'

Fix: recreate the container with -v golemcore-bot-data:/app/workspace (a volume cannot be attached to a container that already exists). The Configuration page has the full command.
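
The inspect command in the check prints every mount destination on one line. A small wrapper can turn that into a yes/no answer. This is a sketch under the assumption that the workspace path is exactly /app/workspace, as in the fix; workspace_is_mounted is a hypothetical name.

```bash
# Hypothetical helper: succeed if /app/workspace appears in a
# space-separated list of mount destinations.
# $1 = output of the docker inspect --format command from the check
workspace_is_mounted() {
  case " $1 " in
    *" /app/workspace "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage against the real container (requires Docker):
#   mounts=$(docker inspect golemcore-bot \
#     --format '{{ range .Mounts }}{{ .Destination }} {{ end }}')
#   workspace_is_mounted "$mounts" || echo "workspace volume is not mounted"
```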

Initial admin password not in the logs

Likely cause: the container has already saved an admin credential on the volume, so no bootstrap password is generated on subsequent starts.

Check

bash

docker exec golemcore-bot cat /app/workspace/preferences/admin-credentials.json

Fix: reset the password through Settings → Security. If you are locked out entirely, set BOT_DASHBOARD_ADMIN_PASSWORD at container start; the bot accepts it as an override on the first boot where the credentials file is absent.

Models and routing

Rate limit exceeded from the provider

Likely cause: a burst of requests, or one provider API key shared across multiple bots.

Check

bash

# 2>&1 folds the container's stderr stream into the pipe so grep sees it too
docker logs golemcore-bot 2>&1 | grep -Ei "rate.?limit|429"

Fix: raise your plan's limit with the provider, lower llmRequestsPerMinute in preferences/rate-limit.json, or route the affected tier to a different provider in the model router.
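
To tell a short burst from sustained pressure before choosing a fix, it can help to count matches per minute. The sketch below assumes log lines prefixed with the RFC 3339 timestamps that docker logs --timestamps adds; rate_limit_hits_per_minute is a hypothetical name, and the match pattern is simply the one from the check, applied to the message only so that digits inside a timestamp cannot match 429.

```bash
# Hypothetical helper: read timestamped log lines on stdin and print
# "<YYYY-MM-DDTHH:MM> <count>" for each minute with rate-limit messages.
rate_limit_hits_per_minute() {
  awk '{
    minute = substr($1, 1, 16)   # timestamp truncated to the minute
    $1 = ""                      # drop the timestamp before matching
    if (tolower($0) ~ /rate.?limit|429/) count[minute]++
  }
  END { for (m in count) print m, count[m] }' | sort
}

# Usage against the real container (requires Docker):
#   docker logs --timestamps golemcore-bot 2>&1 | rate_limit_hits_per_minute
```

A single minute with a high count points at a burst; a steady count across many minutes suggests the key is shared or the configured limit is too high for the plan.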

Model router falls back to balanced on unknown tier

Likely cause: a skill references a tier name that is not configured in preferences/model-router.json.

Check

bash

docker exec golemcore-bot cat /app/workspace/preferences/model-router.json

Fix: either add the missing tier mapping, or change the skill to use a tier you actually have configured. balanced must always be present as the ultimate fallback.

Context length exceeded on a long session

Likely cause: auto-compaction cannot fit the session into the model's context window.

Check: look at the failing turn in Sessions. If auto-compaction ran but still overflowed, the message shows it explicitly.

Fix: start a new session (cleanest), let dynamic escalation pick a larger model on retry, or edit the affected tier in preferences/model-router.json to point at a model with a wider context window before resuming.

Skills and MCP

Skill does not appear in /skills list

Likely cause: the SKILL.md file is missing or has invalid YAML frontmatter.

Check

bash

docker exec golemcore-bot ls /app/workspace/skills/
docker exec golemcore-bot head -n 10 /app/workspace/skills/<name>/SKILL.md

Fix: confirm the directory name matches the skill's name field and that the frontmatter block is delimited by --- on its own line at both ends.
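
Both conditions in the fix can be checked mechanically. The following is a minimal sketch, assuming the name field is written as an unquoted name: value on a single top-level line; frontmatter_name is a hypothetical helper, not a project command.

```bash
# Hypothetical helper: print the frontmatter "name" of a SKILL.md file.
# Fails if the file does not open with ---, the block is never closed
# by a second ---, or no top-level name field is present.
frontmatter_name() {
  awk '
    NR == 1 && $0 != "---" { exit 1 }   # must open with ---
    NR == 1 { next }
    $0 == "---" { closed = 1; exit }    # closing delimiter found
    /^name:[[:space:]]*/ { sub(/^name:[[:space:]]*/, ""); name = $0 }
    END { if (!closed || name == "") exit 1; print name }
  ' "$1"
}

# Usage with a copy of the file taken from the container (requires Docker):
#   docker exec golemcore-bot cat /app/workspace/skills/<name>/SKILL.md > /tmp/SKILL.md
#   [ "$(frontmatter_name /tmp/SKILL.md)" = "<name>" ] || echo "name mismatch"
```

Files with Windows line endings or a quoted name value would need extra handling that this sketch leaves out.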

MCP handshake timeout on first activation

Likely cause: the MCP server takes longer than startup_timeout to launch — common on first npx use because the package is still downloading.

Check

bash

# 2>&1 folds the container's stderr stream into the pipe so grep sees it too
docker logs golemcore-bot 2>&1 | grep -Ei "mcp|handshake|timeout"

Fix: raise startup_timeout in the skill's mcp: block (60s is a safe default for first runs), or pre-download the MCP server package into the image.

Tools from the MCP server never register

Likely cause: the handshake never completed, or the skill was deactivated before the handshake finished.

Check: tail docker logs for a line about the initialize and tools/list handshake completing for your skill. The bot only exposes MCP tools after both steps finish.

Fix: trigger the skill again in a fresh message. If the handshake still does not complete, raise startup_timeout and inspect the MCP server's own stderr for a launch error.

Channels and webhooks

Webhook returns `status: "error"`

Likely cause: bad auth, unknown hook name, or an internal handler failure. The webhook endpoints are synchronous at the HTTP layer — they return accepted or error immediately, not a queued status.

Check

bash

# 2>&1 folds the container's stderr stream into the pipe so grep sees it too
docker logs golemcore-bot 2>&1 | grep -Ei "webhook|hook"

Fix: if errorMessage in the response mentions auth, verify the bearer token or HMAC signature matches preferences/webhooks.json. If it mentions the hook name, confirm the custom hook is defined. For everything else, inspect the Sessions page using the chatId from the response body.
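
When the error points at the HMAC signature, recomputing it locally shows whether the sender or the stored secret is wrong. This page does not state the signing scheme, so the sketch below assumes a hex-encoded HMAC-SHA256 over the raw request body; that scheme and the name hmac_sha256_hex are assumptions to check against your webhook configuration.

```bash
# Hypothetical helper: hex HMAC-SHA256 of a request body.
# ASSUMPTION: the bot signs the raw body with HMAC-SHA256; confirm the
# actual algorithm and encoding before relying on this.
# $1 = raw request body, $2 = shared secret
# Note: the secret is visible in the process list while openssl runs.
hmac_sha256_hex() {
  printf '%s' "$1" | openssl dgst -sha256 -hmac "$2" | awk '{ print $NF }'
}

# Usage: compare the result with the signature your sender attaches.
#   hmac_sha256_hex "$body" "$secret"
```

If the locally computed value differs from what the sender transmits, check for a trailing newline or re-serialized JSON in the body, since any byte difference changes the digest.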

Telegram bot does not reply

Likely cause: token misconfigured, user not in the allow-list, or Telegram disabled in preferences.

Check

bash

docker exec golemcore-bot cat /app/workspace/preferences/telegram.json

Fix: verify the token with @BotFather, add your user ID to allowedUsers, and confirm enabled is true.
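
As a complement to @BotFather, the Telegram Bot API's getMe method gives a quick yes/no on a token from the command line. This is a sketch: telegram_getme_url is a hypothetical helper and TELEGRAM_TOKEN is a placeholder variable for the token stored in telegram.json.

```bash
# Hypothetical helper: build the Bot API getMe URL for a token.
# $1 = bot token as issued by @BotFather
telegram_getme_url() {
  printf 'https://api.telegram.org/bot%s/getMe\n' "$1"
}

# Usage (requires network access and curl):
#   curl -sS "$(telegram_getme_url "$TELEGRAM_TOKEN")"
# A valid token returns JSON with "ok":true and the bot's username;
# an invalid one returns "ok":false with error_code 401.
```

A token that passes this check but still produces no replies points back at allowedUsers or the enabled flag.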

Tools and resources

Browser plugin crashes on page load

Likely cause: missing Chromium shared memory and sandbox flags. Chromium silently crashes inside Docker when /dev/shm is the default 64 MiB, and its user-namespace sandbox needs an extra capability.

Check: run docker inspect golemcore-bot and look for ShmSize and CapAdd.

Fix: recreate the container with --shm-size=256m --cap-add=SYS_ADMIN, since neither setting can be changed on an existing container. Both flags are the officially documented Playwright-in-Docker configuration; neither alone is enough.
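
The two values named in the check live under HostConfig in the docker inspect output, with ShmSize reported in bytes. The helpers below are a sketch with hypothetical names, using the 256 MiB threshold from the fix.

```bash
# Hypothetical helper: succeed if a ShmSize value (bytes) is at least 256 MiB.
shm_is_sufficient() {
  [ "$1" -ge $((256 * 1024 * 1024)) ]
}

# Hypothetical helper: succeed if a CapAdd listing includes SYS_ADMIN.
has_sys_admin() {
  case "$1" in
    *SYS_ADMIN*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage against the real container (requires Docker):
#   shm=$(docker inspect --format '{{ .HostConfig.ShmSize }}' golemcore-bot)
#   caps=$(docker inspect --format '{{ .HostConfig.CapAdd }}' golemcore-bot)
#   shm_is_sufficient "$shm" || echo "shared memory below 256 MiB"
#   has_sys_admin "$caps"    || echo "SYS_ADMIN capability missing"
```

Docker's default of 64 MiB shows up as 67108864, which fails the first check.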

Traces fill the disk

Likely cause: the per-session trace budget is larger than your disk can absorb for the number of sessions you run.

Check

bash

docker exec golemcore-bot du -sh /app/workspace/traces
docker exec golemcore-bot cat /app/workspace/preferences/tracing.json

Fix: lower sessionTraceBudgetMb (default 128) and maxTracesPerSession (default 100) in preferences/tracing.json. Tracing is size- and count-capped, not time-capped — you can also prune the directory manually with the container stopped.
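
For the manual prune, listing candidates before deleting anything is the safer order. The sketch below picks files by age, which is a choice made here for illustration and not something the bot does itself; the layout under traces is not described on this page, so every regular file is treated as a candidate. list_old_traces is a hypothetical name.

```bash
# Hypothetical helper: list regular files under a traces directory that
# were last modified more than N days ago.
# $1 = traces directory, $2 = age threshold in days
list_old_traces() {
  find "$1" -type f -mtime +"$2"
}

# Usage with the container stopped, via a throwaway container that mounts
# the same volume (alpine is an arbitrary small image choice):
#   docker run --rm -v golemcore-bot-data:/app/workspace alpine \
#     find /app/workspace/traces -type f -mtime +7
# Review the list first; append -delete to the find command only once
# you are sure the files are no longer needed.
```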

Still stuck

Open an issue on GitHub with the symptom, the check output, and the logs around the failure. The project lives on reproducible reports.

