Responses API (Recommended)

Generates a model response. It accepts text and image input and produces text or JSON output, with support for function calling (tool calling), streaming responses, and multi-turn conversations.

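We recommend the Responses API for new projects. It is the next-generation API released by OpenAI and offers the following advantages over Chat Completions:

Native Prompt Caching: instructions and input are separate, so the system instructions automatically act as the cache prefix, and the unchanging prefix in a multi-turn conversation gets a higher cache hit rate. This can cut input token costs by up to 50% and also reduces latency.
Structured item model: the input and output formats are clearer, and tool-calling flows are supported natively.
Richer streaming events: fine-grained SSE event types make real-time UI rendering easier.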

Endpoint


POST https://api.ofox.io/v1/responses

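Request parameters

Parameter	Type	Required	Description
`model`	string	✅	Model identifier, e.g. `openai/gpt-5.4-mini`
`input`	string | array	✅	Input content: a plain text string or a structured message array
`instructions`	string	-	System instructions (independent of input; Prompt Caching is applied automatically)
`stream`	boolean	-	Whether to enable SSE streaming responses; default `false`
`max_output_tokens`	number	-	Maximum number of tokens to generate
`temperature`	number	-	Sampling temperature, 0 to 2; default 1
`top_p`	number	-	Nucleus sampling parameter
`tools`	array	-	Definitions of the available tools (Function Calling)
`tool_choice`	string | object	-	Tool selection strategy: `auto`, `none`, or a specific tool
`truncation`	string	-	Truncation strategy: `auto` truncates automatically; `disabled` returns an error when the limit is exceeded (default)
`text`	object	-	Text generation format settings
`store`	boolean	-	Whether to store the response (default `true`)
`metadata`	object	-	Custom metadata key-value pairs
`provider`	object	-	OfoxAI extension: routing and fallback settings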
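As a rough illustration of how these parameters combine, the sketch below assembles a request body as a plain dictionary without sending it. The helper name `build_request` and the metadata key and value are hypothetical; only the parameter names are taken from the table above.

```python
import json

# Optional parameter names copied from the request parameter table.
ALLOWED_OPTIONS = {
    "stream", "max_output_tokens", "temperature", "top_p", "tools",
    "tool_choice", "truncation", "text", "store", "metadata", "provider",
}


def build_request(model, user_input, instructions=None, **options):
    """Assemble a /v1/responses request body (hypothetical helper)."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")

    body = {"model": model, "input": user_input}
    if instructions is not None:
        body["instructions"] = instructions
    # Include only what the caller set, so server-side defaults still apply.
    body.update(options)
    return body


payload = build_request(
    "openai/gpt-5.4-mini",
    "Explain what an API Gateway is",
    instructions="You are a helpful technical assistant.",
    max_output_tokens=1024,
    temperature=0.2,
    truncation="auto",
    metadata={"request_source": "docs-example"},  # hypothetical key and value
)
print(json.dumps(payload, indent=2))
```

Leaving unset options out of the body, rather than sending explicit defaults, keeps the request short and lets the gateway apply its own defaults.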

Input format

input supports two formats:

1. Plain string: pass the text directly


{
  "input": "Hello, please introduce yourself"
}

2. Structured message array: for multi-turn conversations and multimodal input


interface InputItem {
  type: 'message'
  role: 'user' | 'assistant'
  content: ContentPart[]
  id?: string               // required on assistant messages
  status?: 'completed'      // required on assistant messages
}
 
type ContentPart =
  | { type: 'input_text'; text: string }           // user text input
  | { type: 'input_image'; image_url: string }     // image input
  | { type: 'output_text'; text: string; annotations?: any[] }  // assistant text output

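When a multi-turn conversation includes a message with the assistant role, the id and status fields are required. The Responses API is stateless by design, so you must send the full conversation history with every request.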
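Because every request has to carry the whole history, it can help to build the items with small helpers. The following is a minimal sketch, assuming text-only messages; the function names `user_message` and `assistant_message` and the generated fallback id are hypothetical, while the item shapes follow the InputItem interface above.

```python
import uuid


def user_message(text):
    """Build a user message item with a single input_text part."""
    return {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": text}],
    }


def assistant_message(text, message_id=None):
    """Build an assistant message item; id and status are required for this role."""
    return {
        "type": "message",
        "role": "assistant",
        # Assumption: reuse the id returned by the API when you have it;
        # the generated fallback below is only a placeholder.
        "id": message_id or f"msg_{uuid.uuid4().hex}",
        "status": "completed",
        "content": [{"type": "output_text", "text": text, "annotations": []}],
    }


history = [
    user_message("What is the capital of France?"),
    assistant_message("The capital of France is Paris.", message_id="msg_abc123"),
    user_message("What is its population?"),
]
# `history` is what you would pass as `input` on the next request.
print([item["role"] for item in history])
```

After each response, append the assistant's reply and the next user turn to the same list before sending it again.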

Request examples

cURL

Terminal


curl https://api.ofox.io/v1/responses \
  -H "Authorization: Bearer $OFOX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4-mini",
    "input": "Explain what an API Gateway is",
    "instructions": "You are a helpful technical assistant. Answer in English.",
    "max_output_tokens": 1024
  }'

Python

responses.py


from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.ofox.io/v1",
    api_key="<OFOXAI_API_KEY>"
)
 
response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input="Explain what an API Gateway is",
    instructions="You are a helpful technical assistant. Answer in English.",
    max_output_tokens=1024
)
 
print(response.output_text)

TypeScript

responses.ts


import OpenAI from 'openai'
 
const client = new OpenAI({
  baseURL: 'https://api.ofox.io/v1',
  apiKey: '<OFOXAI_API_KEY>'
})
 
const response = await client.responses.create({
  model: 'openai/gpt-5.4-mini',
  input: 'Explain what an API Gateway is',
  instructions: 'You are a helpful technical assistant. Answer in English.',
  max_output_tokens: 1024
})
 
console.log(response.output_text)

Response format


{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1703123456,
  "model": "openai/gpt-5.4-mini",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_def456",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "An API Gateway is...",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 25,
    "output_tokens": 150,
    "total_tokens": 175
  }
}

Response fields

Field	Type	Description
`id`	string	Unique response identifier, prefixed with `resp_`
`object`	string	Always `"response"`
`created_at`	number	Creation timestamp (Unix seconds)
`model`	string	ID of the model actually used
`status`	string	Response status: `completed`, `failed`, `in_progress`, `cancelled`
`output`	array	Array of output items, including messages and tool calls
`usage`	object	Token usage statistics
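When working with the raw JSON rather than an SDK object, the text has to be collected from the output array by hand. The sketch below shows one way to do that, assuming the response shape shown above; `collect_output_text` is a hypothetical helper name, not part of any SDK.

```python
def collect_output_text(response):
    """Concatenate every output_text part found in the response's message items."""
    chunks = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip function_call and other non-message items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                chunks.append(part["text"])
    return "".join(chunks)


# Trimmed copy of the sample response from this page.
sample = {
    "id": "resp_abc123",
    "object": "response",
    "status": "completed",
    "output": [
        {
            "type": "message",
            "id": "msg_def456",
            "status": "completed",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "An API Gateway is...", "annotations": []}
            ],
        }
    ],
    "usage": {"input_tokens": 25, "output_tokens": 150, "total_tokens": 175},
}

if sample["status"] == "completed":
    print(collect_output_text(sample))
    print("total tokens:", sample["usage"]["total_tokens"])
```

Checking status before reading output avoids treating a failed or still-running response as an empty answer.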

Structured message input

Use a structured message array to implement a multi-turn conversation:

Python

multi_turn.py


# Reuses the `client` created in responses.py.
response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What is the capital of France?"}
            ]
        },
        {
            "type": "message",
            "role": "assistant",
            "id": "msg_abc123",
            "status": "completed",
            "content": [
                {"type": "output_text", "text": "The capital of France is Paris.", "annotations": []}
            ]
        },
        {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What is its population?"}
            ]
        }
    ]
)

print(response.output_text)

Streaming responses

Set stream: true to enable SSE streaming responses:

Python

stream.py


# Reuses the `client` created in responses.py.
stream = client.responses.create(
    model="openai/gpt-5.4-mini",
    input="Tell me a joke about programming",
    stream=True
)
 
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

Streaming event types

A streaming response sends the following events over SSE:


data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress"}}

data: {"type":"response.output_item.added","output_index":0,"item":{"type":"message","id":"msg_def456","role":"assistant","status":"in_progress","content":[]}}

data: {"type":"response.content_part.added","output_index":0,"content_index":0,"part":{"type":"output_text","text":""}}

data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"Hel"}

data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"lo"}

data: {"type":"response.output_item.done","output_index":0,"item":{"type":"message","id":"msg_def456","role":"assistant","status":"completed","content":[{"type":"output_text","text":"Hello..."}]}}

data: {"type":"response.completed","response":{"id":"resp_abc123","object":"response","status":"completed","usage":{"input_tokens":12,"output_tokens":45,"total_tokens":57}}}

data: [DONE]

Event type	Description
`response.created`	Response object created
`response.output_item.added`	New output item added
`response.content_part.added`	New content part added
`response.output_text.delta`	Text delta (emitted token by token)
`response.output_item.done`	Output item completed
`response.completed`	Entire response completed
`response.function_call_arguments.delta`	Function call arguments delta
`response.function_call_arguments.done`	Function call arguments completed
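For clients that read the SSE stream directly instead of using an SDK, the events above can be handled with a small line parser. This is a minimal sketch, assuming each event arrives as a single `data:` line as in the sample; `accumulate_text` is a hypothetical helper and the sample lines are abbreviated copies of the ones shown earlier.

```python
import json


def accumulate_text(sse_lines):
    """Join the text deltas from raw SSE lines and capture the final usage."""
    text = []
    usage = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(data)
        if event["type"] == "response.output_text.delta":
            text.append(event["delta"])
        elif event["type"] == "response.completed":
            usage = event["response"].get("usage")
    return "".join(text), usage


sample_lines = [
    'data: {"type":"response.created","response":{"id":"resp_abc123","status":"in_progress"}}',
    "",
    'data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"Hel"}',
    "",
    'data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"lo"}',
    "",
    'data: {"type":"response.completed","response":{"id":"resp_abc123","status":"completed",'
    '"usage":{"input_tokens":12,"output_tokens":45,"total_tokens":57}}}',
    "",
    "data: [DONE]",
]

full_text, usage = accumulate_text(sample_lines)
print(full_text, usage)
```

Event types the parser does not recognize are ignored, so it keeps working if the stream also carries function call argument events.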

Function Calling

The Responses API supports tool calling natively:

Python

tools.py


# Reuses the `client` created in responses.py.
response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input="What is the weather in Seoul today?",
    tools=[
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get the current weather for the given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. Seoul"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    ],
    tool_choice="auto"
)
 
# Handle tool calls
for item in response.output:
    if item.type == "function_call":
        print(f"Function called: {item.name}")
        print(f"Arguments: {item.arguments}")

Tool call response format

When the model calls a tool, output contains an item of type function_call:


{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "type": "function_call",
      "id": "fc_abc123",
      "call_id": "call_xyz789",
      "name": "get_weather",
      "arguments": "{\"location\":\"Seoul\",\"unit\":\"celsius\"}"
    }
  ],
  "usage": {
    "input_tokens": 45,
    "output_tokens": 25,
    "total_tokens": 70
  }
}
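Since arguments arrives as a JSON-encoded string, it has to be decoded before the local function can run. The sketch below shows one possible dispatch step over raw response JSON; `TOOL_REGISTRY`, `run_tool_calls`, and the local `get_weather` body with its fixed return values are all hypothetical.

```python
import json


def get_weather(location, unit="celsius"):
    """Hypothetical local tool; a real one would query a weather service."""
    return {"temperature": "22°C", "condition": "sunny"}  # made-up values


# Maps tool names declared in `tools` to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(output_items):
    """Execute each function_call item and return JSON results keyed by call_id."""
    results = {}
    for item in output_items:
        if item.get("type") != "function_call":
            continue  # plain message items need no tool execution
        handler = TOOL_REGISTRY.get(item["name"])
        if handler is None:
            raise KeyError(f"no handler registered for {item['name']}")
        kwargs = json.loads(item["arguments"])  # arguments is a JSON string
        results[item["call_id"]] = json.dumps(handler(**kwargs), ensure_ascii=False)
    return results


output_items = [
    {
        "type": "function_call",
        "id": "fc_abc123",
        "call_id": "call_xyz789",
        "name": "get_weather",
        "arguments": "{\"location\":\"Seoul\",\"unit\":\"celsius\"}",
    }
]
tool_results = run_tool_calls(output_items)
print(tool_results)
```

Keying results by call_id makes it straightforward to pair each result with the call it answers when there are several calls in one response.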

Submitting tool results

Send the tool's result back to the model, including the full call chain in input:


# Second request: submit the tool result
response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "What is the weather in Seoul today?"}]
        },
        {
            "type": "function_call",
            "id": "fc_abc123",
            "call_id": "call_xyz789",
            "name": "get_weather",
            "arguments": "{\"location\":\"Seoul\",\"unit\":\"celsius\"}"
        },
        {
            "type": "function_call_output",
            "id": "fco_abc123",
            "call_id": "call_xyz789",
            "output": "{\"temperature\":\"22°C\",\"condition\":\"sunny\"}"
        }
    ]
)
 
print(response.output_text)
# => "It is sunny in Seoul today with a temperature of 22°C, great for outdoor activities."
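Rather than writing the call chain by hand, the second request's input can be derived from the first request's input and output. The following is a minimal sketch under that assumption; `build_followup_input` is a hypothetical helper, and the generated `fco_` ids only mirror the pattern in the example above (whether the id field is required is an assumption).

```python
def build_followup_input(previous_input, output_items, tool_results):
    """Return the input for the second request: history, calls, then their outputs."""
    followup = list(previous_input)  # copy so the caller's list is not mutated
    for index, item in enumerate(output_items):
        if item.get("type") != "function_call":
            continue
        followup.append(item)  # echo the model's function_call item unchanged
        followup.append({
            "type": "function_call_output",
            "id": f"fco_{index}",  # assumption: mirrors the fco_ id pattern above
            "call_id": item["call_id"],  # must match the call being answered
            "output": tool_results[item["call_id"]],
        })
    return followup


previous_input = [
    {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": "What is the weather in Seoul today?"}],
    }
]
output_items = [
    {
        "type": "function_call",
        "id": "fc_abc123",
        "call_id": "call_xyz789",
        "name": "get_weather",
        "arguments": "{\"location\":\"Seoul\",\"unit\":\"celsius\"}",
    }
]
tool_results = {"call_xyz789": "{\"temperature\":\"22°C\",\"condition\":\"sunny\"}"}

next_input = build_followup_input(previous_input, output_items, tool_results)
print([item["type"] for item in next_input])
```

The resulting list is what you would pass as `input` in the second `client.responses.create` call.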

Tool Choice options

Value	Description
`"auto"`	The model decides on its own whether to call a tool (default)
`"none"`	Tool calls are disabled
`{"type": "function", "name": "tool_name"}`	Forces a call to the specified tool

Comparison with Chat Completions

Feature	Chat Completions	Responses API
Endpoint	`/v1/chat/completions`	`/v1/responses`
Input format	`messages` array	`input` string or structured item array
System instructions	`role: "system"` message	`instructions` parameter (cached independently)
Prompt Caching	System instructions are mixed into messages, so the cache prefix is unstable	`instructions` is passed separately and cached automatically, giving a higher hit rate
Output format	`choices[0].message.content`	`output[0].content[0].text` or `output_text`
Tool calls	`tool_calls` inside the message	Standalone `function_call` output item
Tool results	`role: "tool"` message	`function_call_output` input item
Streaming events	`chat.completion.chunk`	Structured event types (`response.*`)
Token fields	`prompt_tokens` / `completion_tokens`	`input_tokens` / `output_tokens`
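To make the input-format rows of the table concrete, the sketch below converts a Chat Completions style messages list into an instructions string plus a Responses input array. It is only an illustration under simplifying assumptions: message content is plain text, tool messages are not handled, and the assistant ids are generated placeholders. `chat_to_responses` is a hypothetical helper.

```python
def chat_to_responses(messages):
    """Split Chat Completions messages into (instructions, input items)."""
    instructions = []
    items = []
    for index, message in enumerate(messages):
        role, text = message["role"], message["content"]
        if role == "system":
            instructions.append(text)  # system messages become `instructions`
        elif role == "user":
            items.append({
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": text}],
            })
        elif role == "assistant":
            items.append({
                "type": "message",
                "role": "assistant",
                "id": f"msg_{index}",  # assumption: placeholder id
                "status": "completed",
                "content": [{"type": "output_text", "text": text, "annotations": []}],
            })
        else:
            raise ValueError(f"unsupported role: {role}")
    return "\n\n".join(instructions) or None, items


chat_messages = [
    {"role": "system", "content": "You are a helpful technical assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"},
]
instructions, input_items = chat_to_responses(chat_messages)
print(instructions)
print([item["role"] for item in input_items])
```

Keeping the system text in `instructions` and out of `input` is what lets it serve as a stable cache prefix across turns.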

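Both APIs are suitable for production use. If you have already integrated Chat Completions, you do not need to migrate. We recommend the Responses API for new projects, especially for scenarios with complex tool-calling flows or high call volumes, where caching can be used to reduce cost. For details, see the function calling guide.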