Integration Guide — Your Data Stories

Overview

The Marketing Science Gateway is a stateless HTTP API. There is no SDK to install and no session state to manage. Every request is fully independent — send data in, get results back.

The API uses MCP (Model Context Protocol) — a lightweight JSON-RPC 2.0 envelope over HTTP. You do not need an MCP library to use it. All examples on this page use plain HTTP.

A typical integration follows a two-step pattern:

Discover — describe what you need in plain English; get back the best-matching algorithms and their required inputs.
Execute — call the algorithm with your data; get back structured results.

You can skip discovery entirely if you already know which algorithm you want — go straight to Execute.

The endpoint

All requests go to a single URL provided when your API key is issued. The base URL takes the form:

endpoint

POST https://api.yourdatastories.com/mcp

Property	Value
Method	POST — all requests use POST
Content-Type	`application/json`
Accept	`application/json, text/event-stream`
Response format	Server-Sent Events (SSE) — one `data:` line containing JSON
State	Stateless — each request is independent; no persistent session required
Protocol	JSON-RPC 2.0 (MCP Streamable HTTP transport)

Authentication

All requests require an API key passed as a bearer token in the Authorization header:

http header

Authorization: Bearer YOUR_API_KEY

Your API key and endpoint URL are issued together. To request access, email admin@yourdatastories.com.

Keep your key private. Do not include it in client-side code or public repositories. Pass it as an environment variable or use your platform's secret management.
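
As a minimal sketch of the environment-variable approach, the helper below reads both values at startup and fails fast if either is missing. The variable names YDS_ENDPOINT and YDS_API_KEY match the Python quickstart on this page; the function name build_headers is hypothetical and not part of the gateway.

```python
import os


def build_headers(env=os.environ):
    """Return (endpoint, headers) read from the environment.

    Raises RuntimeError naming any missing variable, so a missing key
    surfaces at startup rather than on the first request.
    """
    missing = [name for name in ("YDS_ENDPOINT", "YDS_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
        "Authorization": f"Bearer {env['YDS_API_KEY']}",
    }
    return env["YDS_ENDPOINT"], headers
```

Passing a plain dict as env makes the helper easy to exercise in tests without touching real credentials.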

Quickstart

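A Bayesian A/B test in two requests: an MCP initialize call that returns an Mcp-Session-Id response header, then a tools/call request that passes that header and runs the algorithm. Replace YOUR_API_KEY and YOUR_ENDPOINT with the values from your API key confirmation.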

curl

bash

# Step 1 — open a session, capture the session ID
SESSION_ID=$(curl -s -D - -o /dev/null \
  -X POST YOUR_ENDPOINT \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{"jsonrpc":"2.0","method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"my-client","version":"1.0"}},"id":0}' \
  | grep -i 'mcp-session-id' | awk '{print $2}' | tr -d '\r')

# Step 2 — call the algorithm
curl -s -X POST YOUR_ENDPOINT \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H "Mcp-Session-Id: $SESSION_ID" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "bayesian_ab_testing",
      "arguments": {
        "control_successes": 1200,
        "control_trials":    8400,
        "test_successes":    1380,
        "test_trials":       8400
      }
    },
    "id": 1
  }'

Python

python

import json
import os

import requests

ENDPOINT = os.environ["YDS_ENDPOINT"]   # e.g. https://api.yourdatastories.com/mcp
API_KEY  = os.environ["YDS_API_KEY"]

HEADERS = {
    "Content-Type":  "application/json",
    "Accept":         "application/json, text/event-stream",
    "Authorization":  f"Bearer {API_KEY}",
}

def get_session_id():
    # MCP initialize handshake; the session ID comes back in a response header
    r = requests.post(ENDPOINT, headers=HEADERS, json={
        "jsonrpc": "2.0", "method": "initialize",
        "params": {"protocolVersion": "2024-11-05", "capabilities": {},
                   "clientInfo": {"name": "my-client", "version": "1.0"}},
        "id": 0,
    }, timeout=8)
    r.raise_for_status()  # surface HTTP-level failures (bad key, 5xx) early
    # requests matches header names case-insensitively
    return r.headers.get("Mcp-Session-Id")

def call_algorithm(session_id, tool_name, arguments):
    headers = {**HEADERS, "Mcp-Session-Id": session_id}
    r = requests.post(ENDPOINT, headers=headers, json={
        "jsonrpc": "2.0", "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
        "id": 1,
    }, timeout=30)
    r.raise_for_status()
    # Parse the SSE envelope: take the JSON payload from the first data: line
    payload = r.text.strip()
    for line in payload.splitlines():
        if line.strip().startswith("data:"):
            payload = line.strip()[5:].strip()
            break
    rpc = json.loads(payload)
    # The algorithm output is a JSON string nested in result.content[0].text
    return json.loads(rpc["result"]["content"][0]["text"])

sid    = get_session_id()
result = call_algorithm(sid, "bayesian_ab_testing", {
    "control_successes": 1200, "control_trials": 8400,
    "test_successes":    1380, "test_trials":    8400,
})
print(json.dumps(result, indent=2))

Discovering algorithms

find_marketing_algorithm is the gateway's built-in search tool. Describe what you want to analyse in plain English — it returns the top matching algorithms with their names, purpose, and required inputs.

Use it when you do not know which algorithm to call, or when you want to route a user's natural language question automatically.

python

matches = call_algorithm(sid, "find_marketing_algorithm", {
    "query": "segment my customers by purchase behaviour"
})

# matches["matched_algorithms"] holds up to 5 ranked results, for example:
# [{"tool_name": "rfm_segmentation", "purpose": "...", "required_inputs": [...], "score": 0.87}]
for algo in matches["matched_algorithms"]:
    print(algo["tool_name"], "—", algo["required_inputs"])

Discovery uses semantic search (sentence-transformers) with a keyword fallback. Plain synonyms and marketing domain terminology both work — you do not need to know the exact algorithm name.
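
For automatic routing, one possible approach is to accept the top match only when its score clears a threshold and otherwise ask the user to rephrase. The sketch below assumes the response shape shown in the example above (matched_algorithms entries with tool_name and score). The function name pick_algorithm and the 0.5 threshold are illustrative assumptions, not gateway defaults.

```python
def pick_algorithm(matches, min_score=0.5):
    """Return the best match at or above min_score, or None.

    `matches` is the parsed find_marketing_algorithm output, read here as
    {"matched_algorithms": [{"tool_name": ..., "score": ..., ...}, ...]}.
    The 0.5 default is an arbitrary illustration; tune it on your own queries.
    """
    candidates = [
        m for m in matches.get("matched_algorithms", [])
        if m.get("score", 0.0) >= min_score
    ]
    if not candidates:
        return None
    # Pick the highest score explicitly rather than relying on list order.
    return max(candidates, key=lambda m: m["score"])


example = {"matched_algorithms": [
    {"tool_name": "rfm_segmentation", "purpose": "...", "required_inputs": [], "score": 0.87},
]}
best = pick_algorithm(example)
print(best["tool_name"])  # rfm_segmentation
```

Returning None for weak matches lets the caller decide whether to fall back to a clarifying question instead of running a poorly matched algorithm.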

Calling an algorithm

Once you know the algorithm name and its required inputs, call it directly. Browse all 143 algorithms and their input schemas in the algorithm library.

Request structure

json

{
  "jsonrpc": "2.0",
  "method":  "tools/call",
  "params": {
    "name":      "algorithm_name",
    "arguments": { /* algorithm inputs */ }
  },
  "id": 1
}

Example — RFM segmentation

python

result = call_algorithm(sid, "rfm_segmentation", {
    "records": [
        {"customer_id": "C001", "order_date": "2024-11-15", "revenue": 250},
        {"customer_id": "C002", "order_date": "2025-01-03", "revenue": 85},
        {"customer_id": "C001", "order_date": "2025-02-20", "revenue": 190},
    ],
    "customer_id_col": "customer_id",
    "date_col":        "order_date",
    "revenue_col":     "revenue",
})

Data preparation

All data is passed as inline JSON in the request body. No file uploads or external storage is required.

Property	Detail
Format	Array of row objects — `[{"col": value, ...}, ...]`
Row limit	~2,000–5,000 rows per request. Above this, response times increase. Contact us if you need to process larger datasets.
Numbers	Pass as numeric types, not strings. Strip currency symbols and thousands separators before sending — `$3,200` → `3200`
Dates	ISO 8601 strings: `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`
Column names	Match exactly what `required_inputs` specifies for the algorithm

Larger datasets: Signed URL (CSV/parquet via GCS) and BigQuery integration for production-scale workloads are on the roadmap. Email us if this is a blocker.
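
The preparation rules above can be sketched as a small cleaning pass run before each request. This is an illustration only: the helper names (clean_number, clean_date, prepare_rows) are hypothetical, MAX_ROWS takes the upper end of the row guidance, and the number cleaning assumes US-style formatting such as $3,200.

```python
import re
from datetime import date, datetime

MAX_ROWS = 5000  # upper end of the row guidance; adjust to your payloads


def clean_number(value):
    """Convert '$3,200' style strings to a numeric type; pass numbers through."""
    if isinstance(value, (int, float)):
        return value
    stripped = re.sub(r"[^\d.\-]", "", str(value))  # drop symbols and separators
    number = float(stripped)
    return int(number) if number.is_integer() else number


def clean_date(value):
    """Return an ISO 8601 string (YYYY-MM-DD or YYYY-MM-DD HH:MM:SS)."""
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        return value.strftime("%Y-%m-%d %H:%M:%S")
    if isinstance(value, date):
        return value.isoformat()
    return str(value)


def prepare_rows(rows, numeric_cols=(), date_cols=()):
    """Return cleaned copies of rows, refusing payloads above MAX_ROWS."""
    if len(rows) > MAX_ROWS:
        raise ValueError(f"{len(rows)} rows exceeds the {MAX_ROWS}-row guidance")
    cleaned = []
    for row in rows:
        out = dict(row)
        for col in numeric_cols:
            out[col] = clean_number(out[col])
        for col in date_cols:
            out[col] = clean_date(out[col])
        cleaned.append(out)
    return cleaned
```

For example, prepare_rows(records, numeric_cols=["revenue"], date_cols=["order_date"]) would normalise the records list from the RFM example before it is sent.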

Parsing the response

Responses use Server-Sent Events (SSE) format. The body looks like this:

raw response

event: message
data: {"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"{...output...}"}]}}

The algorithm output is a JSON string nested inside result.content[0].text. Extract it like this:

python

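# response is the requests.Response returned by the tools/call request
payload = response.text.strip()
for line in payload.splitlines():
    if line.strip().startswith("data:"):
        payload = line.strip()[5:].strip()  # drop the "data:" prefix
        break
rpc    = json.loads(payload)
output = json.loads(rpc["result"]["content"][0]["text"])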

Output types

Type in algorithm	Serialised as
DataFrame	`{"columns": [...], "data": [...]}` (split orient)
numpy array	JSON array
matplotlib Figure	Base64-encoded PNG string
numpy scalar	Python primitive (int or float)
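
A short sketch of turning two of these serialised types back into something convenient on the client side, using only the standard library. The helper names are hypothetical, and the sample frame (including the segment column and its values) is invented for illustration rather than taken from real algorithm output.

```python
import base64


def split_to_records(frame):
    """Turn {"columns": [...], "data": [[...], ...]} into a list of row dicts."""
    columns = frame["columns"]
    return [dict(zip(columns, row)) for row in frame["data"]]


def decode_png(b64_string):
    """Decode a base64-encoded PNG string into raw bytes."""
    return base64.b64decode(b64_string)


# Illustrative split-orient payload, not real output
frame = {
    "columns": ["customer_id", "segment"],
    "data": [["C001", "Champions"], ["C002", "At risk"]],
}
records = split_to_records(frame)
print(records[0])  # {'customer_id': 'C001', 'segment': 'Champions'}
```

The bytes returned by decode_png can be written to a file opened in binary mode to save the chart.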

Error handling

Input validation errors

When required inputs are missing or the wrong type, the outer JSON envelope is valid but content[0].text contains a plain-text error message rather than JSON. Always wrap the inner parse in a try/except:

python

text = rpc["result"]["content"][0]["text"]
try:
    output = json.loads(text)
except json.JSONDecodeError:
    raise ValueError(f"Validation error: {text}")
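
The SSE parsing and the validation check can be combined into one helper, sketched below. It also checks for a top-level "error" member, which is where JSON-RPC 2.0 reports protocol-level failures; whether and when the gateway uses that member is an assumption here. GatewayError and parse_response are hypothetical names.

```python
import json


class GatewayError(Exception):
    """Raised when the response carries a protocol or validation error."""


def parse_response(body):
    """Extract the algorithm output from an SSE response body."""
    payload = body.strip()
    for line in payload.splitlines():
        if line.strip().startswith("data:"):
            payload = line.strip()[5:].strip()
            break
    rpc = json.loads(payload)
    # JSON-RPC 2.0 reports protocol-level failures under "error".
    if "error" in rpc:
        raise GatewayError(f"JSON-RPC error: {rpc['error']}")
    text = rpc["result"]["content"][0]["text"]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Plain text instead of JSON means the inputs failed validation.
        raise GatewayError(f"Validation error: {text}") from None
```

Raising a single exception type keeps calling code simple: one except clause covers both failure modes, and the message says which one occurred.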

Timeouts

Most algorithms return in under 3 seconds. Model-training algorithms (churn_scoring, propensity_model, customer_lifetime_value_predictive) can take 10–20 seconds on larger payloads. Set your HTTP client timeout to at least 30 seconds.

Retries

The API is stateless, so all requests are safe to retry. On a timeout or 5xx response, retry with exponential backoff (1s, 2s, 4s). Do not retry 4xx validation errors; fix the input first.
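
One way to implement this retry policy is sketched below. The function name call_with_backoff is hypothetical. It assumes send is a zero-argument callable returning an object with a status_code attribute (for example a requests.Response), and that timeouts surface as one of the exception types in retry_on. With requests, you would pass retry_on=(requests.exceptions.Timeout,).

```python
import time


def call_with_backoff(send, delays=(1, 2, 4), retry_on=(TimeoutError,), sleep=time.sleep):
    """Call send(), retrying on timeout or 5xx with the given delays.

    Responses below 500 are returned as-is: either a success, or a 4xx
    that the caller must fix rather than retry.
    """
    last_error = None
    for attempt in range(len(delays) + 1):
        try:
            response = send()
        except retry_on as exc:
            last_error = exc
        else:
            if response.status_code < 500:
                return response
            last_error = RuntimeError(f"server error {response.status_code}")
        if attempt < len(delays):
            sleep(delays[attempt])  # 1s, 2s, 4s by default
    raise last_error
```

Injecting sleep as a parameter keeps the policy testable without real waiting; in production the default time.sleep applies.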

Agent framework integration

The gateway is a standards-compliant MCP server. Any MCP-compatible framework can connect to it directly — all 143 algorithms register as callable tools automatically.

Generic MCP server config

json

{
  "mcpServers": {
    "marketing-science": {
      "url":       "YOUR_ENDPOINT",
      "transport": "http",
      "headers": {
        "Authorization": "Bearer YOUR_API_KEY"
      }
    }
  }
}

OpenAI Agents SDK

python

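import asyncio
import os

from agents import Agent, Runner
from agents.mcp import MCPServerStreamableHttp


async def main():
    # Connect over Streamable HTTP; the context manager opens and closes the connection
    async with MCPServerStreamableHttp(
        name="marketing-science",
        params={
            "url": os.environ["YDS_ENDPOINT"],
            "headers": {"Authorization": f"Bearer {os.environ['YDS_API_KEY']}"},
        },
    ) as gateway:
        agent = Agent(
            name="Marketing Analyst",
            instructions="Use the marketing science tools to analyse data and return results.",
            mcp_servers=[gateway],
        )
        result = await Runner.run(agent, "Run RFM segmentation on this customer data: ...")
        print(result.final_output)


asyncio.run(main())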

LangChain

python

import os

from langchain_mcp_adapters.client import MultiServerMCPClient

# Written for langchain-mcp-adapters 0.1.0 and later, where the client is
# constructed directly and get_tools() is awaited. Run inside an async function.
client = MultiServerMCPClient({
    "marketing-science": {
        "url":       os.environ["YDS_ENDPOINT"],
        "transport": "streamable_http",
        "headers":   {"Authorization": f"Bearer {os.environ['YDS_API_KEY']}"},
    }
})
tools = await client.get_tools()
# pass tools to your LangChain agent as normal

Recommended: Use find_marketing_algorithm as a routing tool in your agent rather than exposing all 143 algorithms directly. It keeps token counts low and lets the model focus on a small set of relevant candidates — the same approach used in our own reference agent.

Algorithm reference

The full library — 143 algorithms across 13 sections — is at:

www.yourdatastories.com/algorithms

Each entry shows the algorithm name (the exact value to pass as "name" in tools/call), its dependencies, and what it computes. For required input keys, call find_marketing_algorithm with a plain English description — the response includes required_inputs for each match.
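
Before executing, it can help to check your arguments against the required_inputs returned by discovery. The sketch below assumes required_inputs is a flat list of key names, which this page does not confirm; the function name missing_inputs and the sample key names are hypothetical.

```python
def missing_inputs(required_inputs, arguments):
    """Return the required keys that are absent from arguments, in order."""
    return [key for key in required_inputs if key not in arguments]


# Illustrative values only; real key names come from find_marketing_algorithm
required = ["records", "customer_id_col", "date_col", "revenue_col"]
arguments = {"records": [], "customer_id_col": "customer_id"}
print(missing_inputs(required, arguments))  # ['date_col', 'revenue_col']
```

An empty list means every required key is present, so the call can proceed without a round trip that would only return a validation error.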