Responses API (Recommended)

Creates a model response. The endpoint accepts text and image input and generates text or JSON output. It supports Function Calling (tool calls), streaming responses, and multi-turn conversations.

We recommend the Responses API for new projects. It is OpenAI's newer-generation API and has the following advantages over Chat Completions:

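Native Prompt Caching: instructions and input are separate, so the system instructions automatically become the cache prefix. In multi-turn conversations the unchanged prefix gets a high cache hit rate, which can cut input token costs by up to 50% and also reduces latency.
Structured item model: input and output formats are explicit, with native support for tool-call flows.
Richer streaming events: fine-grained SSE event types make real-time UI rendering easier.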

Endpoint


POST https://api.ofox.io/v1/responses

Request parameters

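Parameter	Type	Required	Description
`model`	string	Yes	Model identifier (for example, `openai/gpt-5.4-mini`)
`input`	string \| array	Yes	Input content: a plain-text string or a structured message array
`instructions`	string	No	System instructions (separate from input; Prompt Caching is applied automatically)
`stream`	boolean	No	Whether to enable SSE streaming responses; defaults to `false`
`max_output_tokens`	number	No	Maximum number of tokens to generate
`temperature`	number	No	Sampling temperature, 0 to 2; defaults to 1
`top_p`	number	No	Nucleus sampling parameter
`tools`	array	No	Available tool definitions (Function Calling)
`tool_choice`	string \| object	No	Tool selection strategy: `auto`, `none`, or a specific tool
`truncation`	string	No	Truncation strategy: `auto` truncates automatically; `disabled` returns an error when the limit is exceeded (default)
`text`	object	No	Text generation format settings
`store`	boolean	No	Whether to store the response (defaults to `true`)
`metadata`	object	No	Custom metadata key-value pairs
`provider`	object	No	OfoxAI extension: routing and fallback settings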
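
As a rough illustration of how the parameters in the table fit together, the following sketch assembles a request body and leaves out optional fields that were not set. The helper name `build_request` is hypothetical and not part of any SDK; only a handful of the parameters are covered, and the temperature check simply mirrors the 0 to 2 range listed above.

```python
from typing import Any, Dict, Optional, Union


def build_request(
    model: str,
    user_input: Union[str, list],
    *,
    instructions: Optional[str] = None,
    stream: bool = False,
    max_output_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Assemble a request body for POST /v1/responses (hypothetical helper)."""
    # The parameter table lists 0-2 as the valid sampling temperature range.
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    # model and input are the only required fields.
    body: Dict[str, Any] = {"model": model, "input": user_input}

    # Optional fields are included only when the caller set them.
    optional = {
        "instructions": instructions,
        "max_output_tokens": max_output_tokens,
        "temperature": temperature,
    }
    body.update({key: value for key, value in optional.items() if value is not None})

    # stream defaults to false, so it is sent only when enabled.
    if stream:
        body["stream"] = True
    return body
```

Because the keys match the parameter names, the resulting dictionary can be serialized as the JSON body of the request, or unpacked into an SDK call that accepts the same keyword arguments.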

Input format

input supports two formats:

1. Plain string: pass the text directly


{
  "input": "Hello, please introduce yourself"
}

2. Structured message array: for multi-turn conversations and multimodal input


interface InputItem {
  type: 'message'
  role: 'user' | 'assistant'
  content: ContentPart[]
  id?: string               // required for assistant messages
  status?: 'completed'      // required for assistant messages
}
 
type ContentPart =
  | { type: 'input_text'; text: string }           // user text input
  | { type: 'input_image'; image_url: string }     // image input
  | { type: 'output_text'; text: string; annotations?: any[] }  // assistant text output

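When you include assistant-role messages in a multi-turn conversation, the id and status fields are required. The Responses API is stateless by design, so each request must carry the complete conversation history.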
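
Since every request has to carry the whole history, a client needs some way to turn the previous response's output back into input items. The following is a minimal sketch of that bookkeeping using plain dictionaries shaped like the InputItem interface above. The helper names (`user_message`, `assistant_message`, `next_input`) are hypothetical.

```python
from typing import Any, Dict, List, Optional

Item = Dict[str, Any]


def user_message(text: str, image_url: Optional[str] = None) -> Item:
    """Build a user input item, optionally with one image attached."""
    content: List[Item] = [{"type": "input_text", "text": text}]
    if image_url is not None:
        content.append({"type": "input_image", "image_url": image_url})
    return {"type": "message", "role": "user", "content": content}


def assistant_message(item: Item) -> Item:
    """Copy an assistant output item into input form, keeping id and status."""
    for field in ("id", "status"):
        if field not in item:
            raise ValueError(f"assistant message is missing required field: {field}")
    return {
        "type": "message",
        "role": "assistant",
        "id": item["id"],
        "status": item["status"],
        "content": item["content"],
    }


def next_input(history: List[Item], output: List[Item], follow_up: str) -> List[Item]:
    """Return the full input array for the next turn without mutating history."""
    replies = [assistant_message(i) for i in output if i.get("type") == "message"]
    return history + replies + [user_message(follow_up)]
```

The returned list would be passed as input on the next request, and then becomes the history for the turn after that.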

Request examples

cURL

Terminal


curl https://api.ofox.io/v1/responses \
  -H "Authorization: Bearer $OFOX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4-mini",
    "input": "Explain what an API Gateway is",
    "instructions": "You are a helpful technical assistant. Please answer in Japanese.",
    "max_output_tokens": 1024
  }'

Python

responses.py


from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.ofox.io/v1",
    api_key="<your OFOXAI_API_KEY>"
)
 
response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input="Explain what an API Gateway is",
    instructions="You are a helpful technical assistant. Please answer in Japanese.",
    max_output_tokens=1024
)
 
print(response.output_text)

TypeScript

responses.ts


import OpenAI from 'openai'
 
const client = new OpenAI({
  baseURL: 'https://api.ofox.io/v1',
  apiKey: '<your OFOXAI_API_KEY>'
})
 
const response = await client.responses.create({
  model: 'openai/gpt-5.4-mini',
  input: 'Explain what an API Gateway is',
  instructions: 'You are a helpful technical assistant. Please answer in Japanese.',
  max_output_tokens: 1024
})
 
console.log(response.output_text)

Response format


{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1703123456,
  "model": "openai/gpt-5.4-mini",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_def456",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "API Gateway（APIゲートウェイ）とは...",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 25,
    "output_tokens": 150,
    "total_tokens": 175
  }
}

Response fields

Field	Type	Description
`id`	string	Unique identifier of the response; starts with `resp_`
`object`	string	Always `"response"`
`created_at`	number	Creation timestamp (Unix seconds)
`model`	string	ID of the model actually used
`status`	string	Response status: `completed`, `failed`, `in_progress`, or `cancelled`
`output`	array	Array of output items, including messages and tool calls
`usage`	object	Token usage statistics
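
When working with the raw JSON rather than an SDK object, the generated text has to be collected from the output array. The sketch below shows one way to do that. The function name `extract_text` is hypothetical, and the logic assumes the response has the shape documented above (message items containing output_text parts).

```python
from typing import Any, Dict, List


def extract_text(response: Dict[str, Any]) -> str:
    """Concatenate every output_text part found in message output items."""
    parts: List[str] = []
    for item in response.get("output", []):
        # Tool calls and other item types carry no assistant text.
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)
```

This serves the same purpose as the output_text convenience property used in the SDK examples, which is why those examples can print the text without walking the array.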

Structured message input

Use a structured message array to implement a multi-turn conversation:

Python

multi_turn.py


# Reuses the `client` created in responses.py
response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What is the capital of France?"}
            ]
        },
        {
            "type": "message",
            "role": "assistant",
            "id": "msg_abc123",
            "status": "completed",
            "content": [
                {"type": "output_text", "text": "The capital of France is Paris.", "annotations": []}
            ]
        },
        {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": "How many people live there?"}
            ]
        }
    ]
)
 
print(response.output_text)

Streaming responses

Set stream: true to enable SSE streaming responses:

Python

stream.py


# Reuses the `client` created in responses.py
stream = client.responses.create(
    model="openai/gpt-5.4-mini",
    input="Tell me a programming joke",
    stream=True
)
 
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

Streaming event types

A streaming response sends the following events over SSE:


data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress"}}

data: {"type":"response.output_item.added","output_index":0,"item":{"type":"message","id":"msg_def456","role":"assistant","status":"in_progress","content":[]}}

data: {"type":"response.content_part.added","output_index":0,"content_index":0,"part":{"type":"output_text","text":""}}

data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"Hel"}

data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"lo"}

data: {"type":"response.output_item.done","output_index":0,"item":{"type":"message","id":"msg_def456","role":"assistant","status":"completed","content":[{"type":"output_text","text":"Hello..."}]}}

data: {"type":"response.completed","response":{"id":"resp_abc123","object":"response","status":"completed","usage":{"input_tokens":12,"output_tokens":45,"total_tokens":57}}}

data: [DONE]

Event type	Description
`response.created`	The response object was created
`response.output_item.added`	A new output item was added
`response.content_part.added`	A new content part was added
`response.output_text.delta`	Incremental text (token-by-token output)
`response.output_item.done`	The output item is complete
`response.completed`	The entire response is complete
`response.function_call_arguments.delta`	Incremental function-call arguments
`response.function_call_arguments.done`	The function-call arguments are complete
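
For clients that read the SSE stream directly instead of going through an SDK, the event sequence above can be reduced to the final text by accumulating the delta events. The following sketch assumes each event arrives as a single `data:` line, as in the sample, and that the stream ends with the `[DONE]` sentinel shown there. The function name `collect_text` is hypothetical.

```python
import json
from typing import Iterable, List


def collect_text(sse_lines: Iterable[str]) -> str:
    """Join the delta fields of response.output_text.delta events."""
    chunks: List[str] = []
    for raw in sse_lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The sample stream closes with a literal [DONE] marker.
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if event.get("type") == "response.output_text.delta":
            chunks.append(event["delta"])
    return "".join(chunks)
```

A UI would typically render each delta as it arrives rather than joining at the end; the accumulation here only shows which event type carries the text.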

Function Calling

The Responses API supports tool calls natively:

Python

tools.py


# Reuses the `client` created in responses.py
response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input="What's the weather like in Tokyo today?",
    tools=[
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get the current weather for the specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name (for example, Tokyo)"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    ],
    tool_choice="auto"
)
 
# Handle tool calls
for item in response.output:
    if item.type == "function_call":
        print(f"Function call: {item.name}")
        print(f"Arguments: {item.arguments}")

Tool call response format

When the model calls a tool, output contains an item of type function_call:


{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "type": "function_call",
      "id": "fc_abc123",
      "call_id": "call_xyz789",
      "name": "get_weather",
      "arguments": "{\"location\":\"Tokyo\",\"unit\":\"celsius\"}"
    }
  ],
  "usage": {
    "input_tokens": 45,
    "output_tokens": 25,
    "total_tokens": 70
  }
}

Sending tool results

To return a tool's result to the model, include the complete call chain in input:


# Second request: send the tool result (reuses the `client` created in responses.py)
response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "What's the weather like in Tokyo today?"}]
        },
        {
            "type": "function_call",
            "id": "fc_abc123",
            "call_id": "call_xyz789",
            "name": "get_weather",
            "arguments": "{\"location\":\"Tokyo\",\"unit\":\"celsius\"}"
        },
        {
            "type": "function_call_output",
            "id": "fco_abc123",
            "call_id": "call_xyz789",
            "output": "{\"temperature\":\"22°C\",\"condition\":\"sunny\"}"
        }
    ]
)
 
print(response.output_text)
# => "It is sunny in Tokyo today, with a temperature of 22°C. Perfect for outdoor activities."
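
The call chain in the second request can also be built programmatically from the first response. The sketch below runs each function_call item against a local registry and appends the call together with its function_call_output. `get_weather` here is a hypothetical stub returning fixed data, `with_tool_results` is a hypothetical helper name, and deriving the output item's id from the call's id is an assumption that only mirrors the fc_/fco_ pairing in the example.

```python
import json
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


def get_weather(location: str, unit: str = "celsius") -> Dict[str, str]:
    """Hypothetical stand-in for a real weather lookup; returns fixed data."""
    return {"temperature": "22°C", "condition": "sunny"}


def with_tool_results(
    history: List[Item],
    output: List[Item],
    registry: Dict[str, Callable[..., Any]],
) -> List[Item]:
    """Return history plus each function_call and its function_call_output."""
    chain = list(history)
    for item in output:
        if item.get("type") != "function_call":
            continue
        # arguments is a JSON-encoded string, so decode it before calling.
        handler = registry[item["name"]]
        result = handler(**json.loads(item["arguments"]))
        chain.append(item)
        chain.append({
            "type": "function_call_output",
            # Assumption: reuse the suffix of the call id, as in the example.
            "id": "fco_" + item["id"].split("_", 1)[-1],
            # call_id links this result to the call it answers.
            "call_id": item["call_id"],
            "output": json.dumps(result, ensure_ascii=False),
        })
    return chain
```

The returned list would be sent as input in the follow-up request, so the model sees the question, its own call, and the result in order.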

Tool Choice options

Value	Description
`"auto"`	The model decides whether to call a tool (default)
`"none"`	Tool calls are disabled
`{"type": "function", "name": "tool_name"}`	Forces a call to the specified tool

Comparison with Chat Completions

Feature	Chat Completions	Responses API
Endpoint	`/v1/chat/completions`	`/v1/responses`
Input format	`messages` array	`input` string or structured item array
System instructions	`role: "system"` message	`instructions` parameter (cached independently)
Prompt Caching	System instructions are mixed into messages, so the cache prefix is unstable	`instructions` is passed separately and cached automatically, giving a higher hit rate
Output format	`choices[0].message.content`	`output[0].content[0].text` or `output_text`
Tool calls	`tool_calls` inside the message	Standalone `function_call` output items
Tool results	`role: "tool"` message	`function_call_output` input item
Streaming events	`chat.completion.chunk`	Structured event types (`response.*`)
Token fields	`prompt_tokens` / `completion_tokens`	`input_tokens` / `output_tokens`

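Both APIs are available for production use. If you have already built on Chat Completions, you do not need to migrate. We recommend the Responses API for new projects, especially when you need complex tool-call flows or expect high-frequency calls, where caching can reduce costs significantly. See the Function Calling guide for details.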
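
For teams that do decide to move existing code over, the input-format and system-instruction rows of the comparison can be sketched as a small conversion function. This is an illustration under assumptions, not an official migration tool: `chat_to_responses` is a hypothetical name, only plain-string content is handled, tool messages are rejected, and the generated assistant ids are placeholders chosen here because the id field is required.

```python
from typing import Any, Dict, List


def chat_to_responses(messages: List[Dict[str, str]]) -> Dict[str, Any]:
    """Map a Chat Completions messages array to instructions plus input items."""
    system_parts: List[str] = []
    items: List[Dict[str, Any]] = []
    for index, message in enumerate(messages):
        role, text = message["role"], message["content"]
        if role == "system":
            # System messages move out of the array into instructions.
            system_parts.append(text)
        elif role == "user":
            items.append({
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": text}],
            })
        elif role == "assistant":
            items.append({
                "type": "message",
                "role": "assistant",
                # Placeholder id (assumption); real ids come from prior responses.
                "id": f"msg_{index}",
                "status": "completed",
                "content": [{"type": "output_text", "text": text, "annotations": []}],
            })
        else:
            raise ValueError(f"unsupported role: {role}")

    params: Dict[str, Any] = {"input": items}
    if system_parts:
        params["instructions"] = "\n\n".join(system_parts)
    return params
```

Keeping the system text in instructions rather than in the item array is what gives the stable cache prefix described in the table.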