
cap_im_platform — IM platform integration

Entry point: cap_im_platform.c · header: cap_im_platform.h · Skill: SKILL.md

cap_im_platform is the unified source component for ESP-Claw IM integrations. It bundles the Feishu, QQ, Telegram, and WeChat backends and the shared attachment helpers into one ESP-IDF component, while keeping the runtime surface split by platform.

This means the build dependency is unified, but the existing runtime group ids and tool names remain stable:

| Runtime group | Event source | Text | Image | File |
| --- | --- | --- | --- | --- |
| cap_im_feishu | feishu_gateway | feishu_send_message | feishu_send_image | feishu_send_file |
| cap_im_qq | qq_gateway | qq_send_message | qq_send_image | qq_send_file |
| cap_im_tg | tg_gateway | tg_send_message | tg_send_image | tg_send_file |
| cap_im_wechat | wechat_gateway | wechat_send_message | wechat_send_image | Not supported |

The platform component keeps each backend in its own source file so protocol-specific logic stays isolated:

| Source | Responsibility |
| --- | --- |
| cap_im_platform.c | Registers all enabled IM runtime groups. |
| cap_im_feishu.c | Feishu WebSocket/Event API ingress, rich text flattening, and sends. |
| cap_im_qq.c | QQ Bot WebSocket ingress, token handling, and sends. |
| cap_im_tg.c | Telegram long-poll ingress, attachment download queue, and sends. |
| cap_im_wechat.c | WeChat ClawBot polling, QR login state, and sends. |
| cap_im_attachment.c | Shared local attachment path helpers. |

Each backend follows the same split:

  1. Event source: receive messages from the IM platform, normalize them, and publish claw_event_router events.
  2. Callable tools: expose platform-specific send functions so the Agent, Console, or automation can send text or media.
  3. Attachment handling: save inbound media under the configured inbox root and publish attachment_saved events for downstream rules.

Application startup prepares credentials and attachment settings per enabled platform, then registers the matching runtime groups. The edge_agent app binds outbound Event Router channels such as qq, feishu, telegram, and wechat to the corresponding send tools.
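
The channel-to-tool binding described above can be pictured as a small lookup table. This is only a sketch: `channel_binding_t`, `k_bindings`, and `lookup_send_tool` are hypothetical names used for illustration, not the edge_agent API. The channel names and tool ids are the ones listed on this page. Note that the `telegram` channel maps to a `tg_`-prefixed tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical binding record: an outbound channel and its text-send tool. */
typedef struct {
    const char *channel;
    const char *send_tool;
} channel_binding_t;

static const channel_binding_t k_bindings[] = {
    { "qq",       "qq_send_message"     },
    { "feishu",   "feishu_send_message" },
    { "telegram", "tg_send_message"     },
    { "wechat",   "wechat_send_message" },
};

/* Returns the text-send tool bound to a channel, or NULL if the channel
 * is unknown. Precondition: channel is non-NULL. */
static const char *lookup_send_tool(const char *channel)
{
    assert(channel != NULL);
    for (size_t i = 0; i < sizeof(k_bindings) / sizeof(k_bindings[0]); i++) {
        if (strcmp(k_bindings[i].channel, channel) == 0) {
            return k_bindings[i].send_tool;
        }
    }
    return NULL;
}
```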

The unified Skill declares all four IM runtime groups in cap_groups. Activating the Skill gives the model the platform-specific tools together with the guidance for choosing the current channel and avoiding duplicate replies.

| Platform | Inbound model | Chat target | Notes |
| --- | --- | --- | --- |
| Feishu | WebSocket/Event API | Feishu chat_id, or user open_id beginning with ou_ | Text sends prefer Markdown-capable interactive cards with plain-text fallback. Media captions are sent as follow-up text. |
| QQ | QQ Bot WebSocket API | c2c:<openid> or group:<group_openid> | File delivery depends on QQ platform support; image and generic file paths are separate tool calls. |
| Telegram | Bot API long polling | Numeric chat id such as 123456789 or -100... | Long text is chunked and files are uploaded with multipart streaming. |
| WeChat | ClawBot polling API | Concrete room id or contact id | Text and image sends are supported; generic non-image file send is not available. |
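
To make the chat target formats concrete, here is a minimal sketch of how a caller might classify QQ and Feishu targets before invoking a send tool. The enum and both function names are hypothetical and are not taken from cap_im_qq.c or cap_im_feishu.c. Only the `c2c:`, `group:`, and `ou_` prefixes come from the table above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of QQ chat target strings. */
typedef enum {
    TARGET_QQ_C2C,    /* "c2c:<openid>" */
    TARGET_QQ_GROUP,  /* "group:<group_openid>" */
    TARGET_INVALID,
} qq_target_kind_t;

/* Splits a QQ target into its kind and the id after the prefix.
 * On success *id points into the original string (no copy is made);
 * on failure *id is NULL. */
static qq_target_kind_t parse_qq_target(const char *target, const char **id)
{
    static const char c2c_prefix[] = "c2c:";
    static const char group_prefix[] = "group:";
    const size_t c2c_len = sizeof(c2c_prefix) - 1;
    const size_t group_len = sizeof(group_prefix) - 1;

    assert(target != NULL && id != NULL);
    *id = NULL;

    if (strncmp(target, c2c_prefix, c2c_len) == 0 && target[c2c_len] != '\0') {
        *id = target + c2c_len;
        return TARGET_QQ_C2C;
    }
    if (strncmp(target, group_prefix, group_len) == 0 && target[group_len] != '\0') {
        *id = target + group_len;
        return TARGET_QQ_GROUP;
    }
    return TARGET_INVALID;
}

/* Feishu: a target beginning with "ou_" is a user open_id; anything
 * else is treated here as a chat_id. */
static bool feishu_target_is_open_id(const char *target)
{
    assert(target != NULL);
    return strncmp(target, "ou_", 3) == 0;
}
```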

Telegram remains a useful representative implementation because it shows the full pattern in a compact backend: long-poll ingress, deduplication, async attachment downloads, and callable text/media sends.


The Telegram backend starts two FreeRTOS tasks from the cap_im_tg group start hook.

tg_poll_task calls getUpdates with a 20 s long-poll timeout, parses each update, and publishes events:

// Text -> standard message event
claw_event_router_publish_message(
    "tg_gateway",   // source_cap
    "telegram",     // source_channel
    chat_id,        // chat id
    text,           // body
    sender_id,      // sender id
    message_id      // message id
);

claw_event_router then routes the event to claw_core for the Agent or to automation actions.
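
For orientation, the long-poll request itself can be sketched as follows. The URL shape and the `timeout` and `offset` parameters are part of the public Telegram Bot API, where `offset` is one greater than the highest update id already processed. The function name, the macro, and the buffer handling are assumptions made for this example and are not taken from cap_im_tg.c.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TG_LONG_POLL_TIMEOUT_S 20  /* matches the 20 s timeout described above */

/* Hypothetical helper: formats the getUpdates URL for the next poll.
 * Returns 0 on success, -1 if the token is empty or buf is too small. */
static int tg_build_get_updates_url(char *buf, size_t buf_len,
                                    const char *token, int64_t last_update_id)
{
    assert(buf != NULL && token != NULL);
    if (strlen(token) == 0) {
        return -1;
    }
    /* offset = last handled update id + 1 confirms all earlier updates. */
    int n = snprintf(buf, buf_len,
                     "https://api.telegram.org/bot%s/getUpdates?timeout=%d&offset=%" PRId64,
                     token, TG_LONG_POLL_TIMEOUT_S, last_update_id + 1);
    return (n < 0 || (size_t)n >= buf_len) ? -1 : 0;
}
```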

Network jitter can replay updates; cap_im_tg keeps a ring of FNV-1a 64-bit hashes so the same message is not handled twice:

#define CAP_IM_TG_DEDUP_CACHE_SIZE 64

static bool cap_im_tg_dedup_check_and_record(const char *update_key)
{
    uint64_t key = cap_im_tg_fnv1a64(update_key);
    for (size_t i = 0; i < CAP_IM_TG_DEDUP_CACHE_SIZE; i++) {
        if (s_tg.seen_update_keys[i] == key) return true; // seen
    }
    s_tg.seen_update_keys[s_tg.seen_update_idx] = key;
    s_tg.seen_update_idx = (s_tg.seen_update_idx + 1) % CAP_IM_TG_DEDUP_CACHE_SIZE;
    return false;
}

Media download is slow, so Telegram handles it asynchronously:

  1. tg_poll_task enqueues cap_im_tg_attachment_job_t items into a queue.
  2. tg_attachment_task consumes jobs, calls getFile, and streams the payload into FATFS.
  3. On completion it publishes attachment_saved with local path, MIME, size, and platform metadata.

Example attachment_saved payload_json:

{
  "platform": "telegram",
  "attachment_kind": "photo",
  "saved_path": "/fatfs/inbox/telegram/-123456/789/photo.jpg",
  "saved_dir": "/fatfs/inbox/telegram/-123456/789",
  "saved_name": "photo.jpg",
  "mime": "image/jpeg",
  "caption": "Look at this",
  "platform_file_id": "AgACAgIAAxkBAAI...",
  "size_bytes": 45231,
  "saved_at_ms": 1714000000000
}

Downstream rules can listen for attachment_saved and chain cap_llm_inspect, file operations, or custom automation.
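
The saved_path in the example payload suggests a layout of root, platform, chat id, message id, then file name. Under that assumption, a path helper could be sketched like this. `build_saved_path` is a hypothetical name, and the actual helpers in cap_im_attachment.c are not shown on this page and may sanitize names differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: joins <root>/<platform>/<chat_id>/<message_id>/<name>.
 * Returns 0 on success, -1 if the name is unsafe or the buffer is too small. */
static int build_saved_path(char *out, size_t out_len,
                            const char *root, const char *platform,
                            const char *chat_id, const char *message_id,
                            const char *name)
{
    assert(out && root && platform && chat_id && message_id && name);

    /* Reject names that could escape the per-message directory. */
    if (name[0] == '\0' || strchr(name, '/') != NULL || strcmp(name, "..") == 0) {
        return -1;
    }

    int n = snprintf(out, out_len, "%s/%s/%s/%s/%s",
                     root, platform, chat_id, message_id, name);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```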

The cap_im_tg runtime group registers four descriptors:

| Tool ID | Description | Kind |
| --- | --- | --- |
| tg_gateway | Poll gateway (event source) | EVENT_SOURCE |
| tg_send_message | Send text to a chat_id | CALLABLE |
| tg_send_image | Send a local image file | CALLABLE |
| tg_send_file | Send a local arbitrary file | CALLABLE |

For tg_send_message, chat_id falls back to the current call context when omitted:

// chat_id precedence: JSON arg > call context
if (cJSON_IsString(chat_id_json) && chat_id_json->valuestring[0]) {
    chat_id = chat_id_json->valuestring;
} else if (ctx && ctx->chat_id && ctx->chat_id[0]) {
    chat_id = ctx->chat_id;  // inherit session context
}

Long text is chunked to fit Telegram’s message limits. tg_send_image and tg_send_file upload via multipart/form-data, using stat() for exact Content-Length and streaming parts through esp_http_client_open instead of buffering the whole file in RAM.
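
The chunking step can be sketched as follows. The Telegram Bot API limits a text message to 4096 characters. This sketch uses a 4096-byte budget, which is stricter, and it never splits a multi-byte UTF-8 sequence. The function name and macro are hypothetical, and the chunking logic in cap_im_tg.c may choose split points differently (for example at line breaks).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TG_MAX_CHUNK_BYTES 4096  /* assumed per-message byte budget */

/* Returns the byte length of the next chunk of `text`, at most max_bytes,
 * backing up so a multi-byte UTF-8 sequence is never split. */
static size_t next_chunk_len(const char *text, size_t max_bytes)
{
    assert(text != NULL && max_bytes >= 4);

    size_t len = strlen(text);
    if (len <= max_bytes) {
        return len;
    }

    size_t cut = max_bytes;
    /* Continuation bytes look like 10xxxxxx. If the byte right after the
     * cut is one, step back to the start of that sequence. */
    while (cut > 0 && ((unsigned char)text[cut] & 0xC0) == 0x80) {
        cut--;
    }
    /* Malformed input (only continuation bytes): fall back to a hard cut
     * so the caller always makes progress. */
    return cut > 0 ? cut : max_bytes;
}
```

A caller would loop, sending `next_chunk_len(p, TG_MAX_CHUNK_BYTES)` bytes at a time and advancing `p` until it reaches the terminating NUL.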

Application code configures the Telegram backend through the cap_im_tg_* API exported by cap_im_platform:

// Bot token (must be set before start)
cap_im_tg_set_token("YOUR_BOT_TOKEN");

// Optional inbound attachment policy
cap_im_tg_set_attachment_config(&(cap_im_tg_attachment_config_t){
    .storage_root_dir         = "/fatfs/inbox",
    .max_inbound_file_bytes   = 2 * 1024 * 1024,  // max 2 MB
    .enable_inbound_attachments = true,
});

// Manual start (normally via claw_cap_start_group)
cap_im_tg_start();

The same architectural roles are used by the other backends, with platform-specific authentication, message formats, and media APIs.