cap_im_platform — IM platform integration
Entry point: cap_im_platform.c · header: cap_im_platform.h · Skill: SKILL.md
cap_im_platform is the unified source component for ESP-Claw IM integrations. It bundles the Feishu, QQ, Telegram, and WeChat backends and the shared attachment helpers into one ESP-IDF component, while keeping the runtime surface split by platform.
This means the build dependency is unified, but the existing runtime group ids and tool names remain stable:
| Runtime group | Event source | Text | Image | File |
|---|---|---|---|---|
| cap_im_feishu | feishu_gateway | feishu_send_message | feishu_send_image | feishu_send_file |
| cap_im_qq | qq_gateway | qq_send_message | qq_send_image | qq_send_file |
| cap_im_tg | tg_gateway | tg_send_message | tg_send_image | tg_send_file |
| cap_im_wechat | wechat_gateway | wechat_send_message | wechat_send_image | Not supported |
Component layout
The platform component keeps each backend in its own source file so protocol-specific logic stays isolated:
| Source | Responsibility |
|---|---|
| cap_im_platform.c | Registers all enabled IM runtime groups. |
| cap_im_feishu.c | Feishu WebSocket/Event API ingress, rich text flattening, and sends. |
| cap_im_qq.c | QQ Bot WebSocket ingress, token handling, and sends. |
| cap_im_tg.c | Telegram long-poll ingress, attachment download queue, and sends. |
| cap_im_wechat.c | WeChat ClawBot polling, QR login state, and sends. |
| cap_im_attachment.c | Shared local attachment path helpers. |
Runtime model
Each backend follows the same split:
- Event source: receive messages from the IM platform, normalize them, and publish claw_event_router events.
- Callable tools: expose platform-specific send functions so the Agent, Console, or automation can send text or media.
- Attachment handling: save inbound media under the configured inbox root and publish attachment_saved events for downstream rules.
Startup and visibility
Application startup prepares credentials and attachment settings per enabled platform, then registers the matching runtime groups. The edge_agent app binds outbound Event Router channels such as qq, feishu, telegram, and wechat to the corresponding send tools.
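The channel-to-tool binding can be pictured as a small lookup table. The following is a minimal sketch, not the edge_agent code: the struct `im_channel_binding_t` and the function `im_find_binding` are hypothetical names introduced for illustration, while the channel names and tool names are taken from this page.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one outbound channel and the send tools bound to it. */
typedef struct {
    const char *channel;    /* Event Router channel name */
    const char *send_text;  /* tool used for text */
    const char *send_image; /* tool used for images */
    const char *send_file;  /* NULL when the platform has no file tool */
} im_channel_binding_t;

/* Tool names as listed in the runtime group table. */
const im_channel_binding_t k_im_bindings[] = {
    { "feishu",   "feishu_send_message", "feishu_send_image", "feishu_send_file" },
    { "qq",       "qq_send_message",     "qq_send_image",     "qq_send_file" },
    { "telegram", "tg_send_message",     "tg_send_image",     "tg_send_file" },
    { "wechat",   "wechat_send_message", "wechat_send_image", NULL },
};

/* Returns the binding for a channel name, or NULL if the channel is unknown. */
const im_channel_binding_t *im_find_binding(const char *channel)
{
    if (channel == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < sizeof(k_im_bindings) / sizeof(k_im_bindings[0]); i++) {
        if (strcmp(k_im_bindings[i].channel, channel) == 0) {
            return &k_im_bindings[i];
        }
    }
    return NULL;
}
```

A caller would check `send_file` for NULL before attempting a generic file send, which mirrors the "Not supported" entry for WeChat.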
The unified Skill declares all four IM runtime groups in cap_groups. Activating the Skill gives the model the platform-specific tools together with the guidance for choosing the current channel and avoiding duplicate replies.
Platform differences
| Platform | Inbound model | Chat target | Notes |
|---|---|---|---|
| Feishu | WebSocket/Event API | Feishu chat_id, or user open_id beginning with ou_ | Text sends prefer Markdown-capable interactive cards with plain-text fallback. Media captions are sent as follow-up text. |
| QQ | QQ Bot WebSocket API | c2c:<openid> or group:<group_openid> | File delivery depends on QQ platform support; image and generic file paths are separate tool calls. |
| Telegram | Bot API long polling | Numeric chat id such as 123456789 or -100... | Long text is chunked and files are uploaded with multipart streaming. |
| WeChat | ClawBot polling API | Concrete room id or contact id | Text and image sends are supported; generic non-image file send is not available. |
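The chat target formats in the table lend themselves to simple prefix checks. The sketch below is an illustration under assumptions: `qq_parse_target`, `qq_target_kind_t`, and `feishu_target_is_open_id` are hypothetical helpers, not functions exported by cap_im_platform, and only the prefixes (`c2c:`, `group:`, `ou_`) come from this page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    QQ_TARGET_INVALID,
    QQ_TARGET_C2C,   /* "c2c:<openid>" */
    QQ_TARGET_GROUP  /* "group:<group_openid>" */
} qq_target_kind_t;

/* Classifies a QQ target string. On success *id points at the id part
 * inside `target` (no copy is made). An empty id is rejected. */
qq_target_kind_t qq_parse_target(const char *target, const char **id)
{
    static const char c2c[] = "c2c:";
    static const char grp[] = "group:";

    if (target == NULL || id == NULL) {
        return QQ_TARGET_INVALID;
    }
    if (strncmp(target, c2c, sizeof(c2c) - 1) == 0 && target[sizeof(c2c) - 1] != '\0') {
        *id = target + sizeof(c2c) - 1;
        return QQ_TARGET_C2C;
    }
    if (strncmp(target, grp, sizeof(grp) - 1) == 0 && target[sizeof(grp) - 1] != '\0') {
        *id = target + sizeof(grp) - 1;
        return QQ_TARGET_GROUP;
    }
    return QQ_TARGET_INVALID;
}

/* Feishu: a target beginning with "ou_" is a user open_id;
 * this sketch treats anything else as a chat_id. */
bool feishu_target_is_open_id(const char *target)
{
    return target != NULL && strncmp(target, "ou_", 3) == 0;
}
```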
Telegram as a reference backend
Telegram remains a useful representative implementation because it shows the full pattern in a compact backend: long-poll ingress, deduplication, async attachment downloads, and callable text/media sends.
Event source: long polling
The Telegram backend starts two FreeRTOS tasks from the cap_im_tg group start hook.
tg_poll_task calls getUpdates with a 20 s long-poll timeout, parses each update, and publishes the resulting events.
claw_event_router then routes the event to claw_core for the Agent or to automation actions.
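The request side of the polling loop can be sketched as follows. This is not the cap_im_tg source: `tg_build_get_updates_url` and `tg_next_offset` are hypothetical helpers. The URL shape and the `timeout` and `offset` parameters follow the public Telegram Bot API, where polling with an offset one greater than the highest handled `update_id` confirms earlier updates.

```c
#include <stdint.h>
#include <stdio.h>

#define TG_POLL_TIMEOUT_S 20 /* long-poll timeout stated in the text above */

/* Builds the getUpdates URL for one poll. Returns 0 on success,
 * -1 if the output buffer is too small. */
int tg_build_get_updates_url(char *out, size_t out_len,
                             const char *token, int64_t offset)
{
    int n = snprintf(out, out_len,
                     "https://api.telegram.org/bot%s/getUpdates?timeout=%d&offset=%lld",
                     token, TG_POLL_TIMEOUT_S, (long long)offset);
    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}

/* After handling an update, the next poll should start at update_id + 1.
 * The offset never moves backwards if updates arrive out of order. */
int64_t tg_next_offset(int64_t current_offset, int64_t update_id)
{
    int64_t candidate = update_id + 1;
    return candidate > current_offset ? candidate : current_offset;
}
```

On the device, the built URL would be handed to the HTTP client with a socket timeout somewhat longer than the 20 s long-poll window, so an idle poll is not mistaken for a network failure.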
Dedup cache
Network jitter can replay updates, so cap_im_tg keeps a ring of FNV-1a 64-bit hashes to ensure the same message is not handled twice.
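A minimal sketch of such a ring is shown here. The hash function is standard FNV-1a 64-bit; the ring capacity, the `dedup_ring_t` type, the function names, and the choice of key (for example a chat id combined with a message id) are assumptions, since this page does not specify them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEDUP_SLOTS 32 /* assumed capacity; the real value is not documented here */

typedef struct {
    uint64_t hashes[DEDUP_SLOTS];
    size_t   next;  /* slot that will be overwritten next */
    size_t   count; /* number of valid slots */
} dedup_ring_t;

/* Standard FNV-1a 64-bit: XOR each byte in, then multiply by the FNV prime. */
uint64_t fnv1a64(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint64_t h = 0xcbf29ce484222325ULL; /* offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;          /* FNV prime */
    }
    return h;
}

/* Returns true if the key was seen recently. Otherwise records its hash,
 * overwriting the oldest entry once the ring is full, and returns false. */
bool dedup_seen_or_add(dedup_ring_t *ring, const char *key, size_t key_len)
{
    uint64_t h = fnv1a64(key, key_len);
    for (size_t i = 0; i < ring->count; i++) {
        if (ring->hashes[i] == h) {
            return true;
        }
    }
    ring->hashes[ring->next] = h;
    ring->next = (ring->next + 1) % DEDUP_SLOTS;
    if (ring->count < DEDUP_SLOTS) {
        ring->count++;
    }
    return false;
}
```

Storing only 64-bit hashes keeps the cache at a fixed, small size, at the cost of a very small chance that two different messages collide and the second is dropped.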
Attachments
Media download is slow, so Telegram handles it asynchronously:
- tg_poll_task enqueues cap_im_tg_attachment_job_t items into a queue.
- tg_attachment_task consumes jobs, calls getFile, and streams the payload into FATFS.
- On completion it publishes attachment_saved with local path, MIME, size, and platform metadata.
Downstream rules can listen for attachment_saved and chain cap_llm_inspect, file operations, or custom automation.
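The handoff between the two tasks can be illustrated with a small bounded FIFO. This is a portable sketch only: on the device a FreeRTOS queue would normally play this role, and the `attachment_job_t` fields, the queue depth, and the function names are hypothetical (the fields of the real cap_im_tg_attachment_job_t are not shown on this page). The sketch is not thread-safe by itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define JOB_QUEUE_DEPTH 4 /* assumed depth */

/* Hypothetical job record: what the download task needs to call getFile. */
typedef struct {
    char file_id[64]; /* Telegram file_id to resolve via getFile */
    char chat_id[24]; /* chat the attachment arrived in */
} attachment_job_t;

typedef struct {
    attachment_job_t items[JOB_QUEUE_DEPTH];
    size_t head;  /* index of the oldest job */
    size_t count; /* number of queued jobs */
} job_queue_t;

/* Producer side (poll task). Returns false when the queue is full,
 * so the caller can drop or retry instead of stalling the poll loop. */
bool job_queue_push(job_queue_t *q, const char *file_id, const char *chat_id)
{
    if (q->count == JOB_QUEUE_DEPTH) {
        return false;
    }
    attachment_job_t *slot = &q->items[(q->head + q->count) % JOB_QUEUE_DEPTH];
    snprintf(slot->file_id, sizeof(slot->file_id), "%s", file_id);
    snprintf(slot->chat_id, sizeof(slot->chat_id), "%s", chat_id);
    q->count++;
    return true;
}

/* Consumer side (attachment task). Returns false when no job is waiting. */
bool job_queue_pop(job_queue_t *q, attachment_job_t *out)
{
    if (q->count == 0) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % JOB_QUEUE_DEPTH;
    q->count--;
    return true;
}
```

Keeping the job record small and copying it by value means the poll task can return to getUpdates immediately while the slow download proceeds in the other task.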
Callable tools
The cap_im_tg runtime group registers four descriptors:
| Tool ID | Description | Kind |
|---|---|---|
| tg_gateway | Poll gateway (event source) | EVENT_SOURCE |
| tg_send_message | Send text to a chat_id | CALLABLE |
| tg_send_image | Send a local image file | CALLABLE |
| tg_send_file | Send a local arbitrary file | CALLABLE |
For tg_send_message, chat_id falls back to the chat of the current call context when the argument is omitted.
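The fallback rule can be sketched as a small resolver. The `call_ctx_t` type and `resolve_chat_id` function are hypothetical stand-ins for whatever call context the runtime actually passes to the tool; treating an empty string the same as an omitted argument is also an assumption.

```c
#include <stddef.h>

/* Hypothetical call context: carries the chat the triggering event came from. */
typedef struct {
    const char *chat_id;
} call_ctx_t;

/* Prefers the explicit argument; otherwise uses the context's chat_id.
 * Returns NULL when neither is available, so the tool can report an error. */
const char *resolve_chat_id(const char *arg_chat_id, const call_ctx_t *ctx)
{
    if (arg_chat_id != NULL && arg_chat_id[0] != '\0') {
        return arg_chat_id;
    }
    if (ctx != NULL && ctx->chat_id != NULL && ctx->chat_id[0] != '\0') {
        return ctx->chat_id;
    }
    return NULL;
}
```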
Long text is chunked to fit Telegram’s message limits. tg_send_image and tg_send_file upload via multipart/form-data, using stat() for exact Content-Length and streaming parts through esp_http_client_open instead of buffering the whole file in RAM.
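To make the Content-Length arithmetic concrete, here is a minimal sketch of how the total could be computed before any file bytes are read. The boundary string, the helper names, and the fixed `application/octet-stream` part type are assumptions, and the sketch does not escape quotes in the filename.

```c
#include <stdio.h>
#include <sys/stat.h>

#define TG_BOUNDARY "----capImTgBoundary" /* assumed boundary string */

/* Closing delimiter that follows the file bytes. */
const char k_tg_multipart_tail[] = "\r\n--" TG_BOUNDARY "--\r\n";

/* Writes the part headers that precede the file bytes: a chat_id field,
 * then the opening of the file part. Returns the length written, or -1
 * if the buffer is too small. */
int tg_multipart_head(char *out, size_t out_len, const char *chat_id,
                      const char *field, const char *filename)
{
    int n = snprintf(out, out_len,
                     "--" TG_BOUNDARY "\r\n"
                     "Content-Disposition: form-data; name=\"chat_id\"\r\n\r\n"
                     "%s\r\n"
                     "--" TG_BOUNDARY "\r\n"
                     "Content-Disposition: form-data; name=\"%s\"; filename=\"%s\"\r\n"
                     "Content-Type: application/octet-stream\r\n\r\n",
                     chat_id, field, filename);
    return (n > 0 && (size_t)n < out_len) ? n : -1;
}

/* Exact request body length: head + file size from stat() + tail.
 * Returns -1 if the head is invalid or the file cannot be stat'ed. */
long long tg_multipart_content_length(const char *path, int head_len)
{
    struct stat st;
    if (head_len < 0 || stat(path, &st) != 0) {
        return -1;
    }
    return (long long)head_len + (long long)st.st_size +
           (long long)(sizeof(k_tg_multipart_tail) - 1);
}
```

With the total known up front, the body can be sent in three steps after `esp_http_client_open(client, total_len)`: write the head, write the file in fixed-size chunks read from storage, then write the tail, each via `esp_http_client_write`. Only one chunk buffer is resident in RAM at a time.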
Configuration API
Application code configures the Telegram backend through the cap_im_tg_* API exported by cap_im_platform.
The same architectural roles are used by the other backends, with platform-specific authentication, message formats, and media APIs.