# Building a Custom OpenAI-Compatible API Server with Kotlin, Spring Boot, LangChain4j

### Overview

* Since **OpenAI** released **ChatGPT** in November 2022, **OpenAI**'s **API** has become the de facto standard interface for **LLM**s. Many open-source and commercial **LLM** solutions expose **OpenAI Compatible APIs** that accept the same requests and return the same responses as **OpenAI**'s **API**. Companies can therefore build and operate their own **OpenAI Compatible Servers** tailored to their internal security environments and use cases.
    
* An **LLM Proxy** serves as an intermediary layer between client applications and various **LLM** providers. It standardizes the interaction interface while adding essential enterprise features such as authentication, monitoring, and failover capabilities. This approach allows organizations to maintain control over their **AI** operations while leveraging different **LLM** services through a unified interface.
    
* In this post, we'll outline how to create an **OpenAI Compatible Server** using **Kotlin**, **Spring Boot**, and **LangChain4j**, backed by **Azure OpenAI** and **Amazon Bedrock Claude**.
    

### Why Should You Run Your Own OpenAI-Compatible API Server?

* Integration with internal authentication systems (**SSO**, **OAuth**, etc.) enables permission management and usage limits at the department or team-member level. It also allows for detailed usage monitoring and audit log management.
    
* Sensitive corporate data can be securely processed using internal **LLM**s only, and prompt filtering can be implemented when necessary to prevent data leakage.
    
* Multiple **LLM** services such as **Azure OpenAI** and **Amazon Bedrock** can be flexibly selected and used according to specific situations.
    
* Automatic failover to alternative **LLM**s is possible when a specific **LLM** experiences an outage.
    
* With all of these advantages in place, popular **LLM** integration tools such as **LangChain** and **Aider** can use the server right away as an **OpenAI-compatible API**. Migrating existing **OpenAI**-based applications is also straightforward.
    

### OpenAI Compatible Server Specification

* The core of an **OpenAI Compatible Server** is to accurately emulate the operation of the **OpenAI Chat Completion API**. The server should be able to handle client requests like the following and perform **LLM** operations:
    

```bash
$ curl -X POST "http://localhost:8080/v1/openai/chat/completions" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer {YOUR_API_KEY}" \
      -d '{
            "model": "gpt-4o",
            "messages": [
              {
                "role": "user",
                "content": "Hello, how are you?"
              }
            ],
            "max_completion_tokens": 4096,
            "temperature": 0.1,
            "stream": true
          }'
```
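For reference, the same request can be built from Kotlin with the JDK's built-in `java.net.http` API. This is only a sketch: the `buildChatRequest` helper name, the base URL, and the API key value are illustrative assumptions rather than part of the server code.

```kotlin
import java.net.URI
import java.net.http.HttpRequest

// Hypothetical helper: builds the same POST request as the curl example.
// The base URL and API key are placeholders supplied by the caller.
fun buildChatRequest(baseUrl: String, apiKey: String, prompt: String): HttpRequest {
    // Escape the characters that would break a JSON string literal.
    val escaped = prompt.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n")
    val body = """{"model":"gpt-4o","messages":[{"role":"user","content":"$escaped"}],"stream":false}"""
    return HttpRequest.newBuilder()
        .uri(URI.create("$baseUrl/v1/openai/chat/completions"))
        .header("Content-Type", "application/json")
        .header("Authorization", "Bearer $apiKey")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()
}
```

The resulting request can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` once the server is running.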

* For streaming responses, the server should be able to send each response **Chunk** to the client using **Server-Sent Events** as follows:
    

```json
{
  "id": "unique-emitter-id",
  "object": "chat.completion.chunk",
  "created": 1633024800,
  "model": "gpt-4o",
  "choices": [
    {
      "delta": {
        "content": "Hello"
      }
    }
  ]
}
```

* When the streaming response is complete, the server should be able to send a completion message to the client using **Server-Sent Events** as follows:
    

```bash
[DONE]
```
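On the wire, each Server-Sent Event is a `data:` line followed by a blank line, so every chunk and the final `[DONE]` sentinel arrive as separate frames. The sketch below illustrates that framing and how a client might recover the payloads. `toSseFrame` and `parseSseData` are hypothetical helper names; in the server itself, Spring's `SseEmitter` takes care of the framing.

```kotlin
// Frame a single-line payload as a Server-Sent Event:
// a "data:" line followed by a blank line.
fun toSseFrame(payload: String): String = "data: $payload\n\n"

// Recover the payloads from a raw SSE stream, stopping at the [DONE] sentinel.
fun parseSseData(stream: String): List<String> =
    stream.lineSequence()
        .filter { it.startsWith("data:") }
        .map { it.removePrefix("data:").trim() }
        .takeWhile { it != "[DONE]" }
        .toList()
```

A client would typically parse each recovered payload as a `chat.completion.chunk` object and append `choices[0].delta.content` to the text received so far.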

### Project Creation

* Install the **Spring Boot CLI** (here via **SDKMAN!**) and use it to generate a new project from `Spring Initializr` as follows:
    

```bash
$ sdk install springboot
$ spring init --type gradle-project-kotlin --language kotlin --java-version 21 --dependencies=web openai-comp-demo
$ cd openai-comp-demo
```

### build.gradle.kts

* Add the `LangChain4j` library dependency to the `build.gradle.kts` file in the project root as follows:
    

```kotlin
val langChain4jVersion = "0.35.0"
val awsSdkVersion = "2.29.6"

dependencies {
    implementation("dev.langchain4j:langchain4j-core:$langChain4jVersion")
    implementation("dev.langchain4j:langchain4j-azure-open-ai:$langChain4jVersion")
    implementation("software.amazon.awssdk:bedrockruntime:$awsSdkVersion")
    implementation("software.amazon.awssdk:apache-client:$awsSdkVersion")
    implementation("software.amazon.awssdk:netty-nio-client:$awsSdkVersion")
}
```

### Creating JsonConfig

* Create an `ObjectMapper` bean that serializes and deserializes the **REST API** request and response **DTO**s.
    

```kotlin
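import com.fasterxml.jackson.annotation.JsonInclude
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.SerializationFeature
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule
import com.fasterxml.jackson.module.kotlin.kotlinModule
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.context.annotation.Primary
import org.springframework.http.converter.json.Jackson2ObjectMapperBuilder

@Configuration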
class JsonConfig {

    @Bean("objectMapper")
    @Primary
    fun objectMapper(): ObjectMapper {

        return Jackson2ObjectMapperBuilder
            .json()
            .serializationInclusion(JsonInclude.Include.ALWAYS)
            .failOnEmptyBeans(false)
            .failOnUnknownProperties(false)
            .featuresToDisable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS)
            .modulesToInstall(kotlinModule(), JavaTimeModule())
            .build()
    }
}
```

### Creating OpenAiCompatibleChatCompletionDTO

* Create **DTO**s that comply with the **OpenAI-compatible API** as follows:

```kotlin
import com.fasterxml.jackson.annotation.JsonProperty
import com.fasterxml.jackson.core.JsonGenerator
import com.fasterxml.jackson.core.JsonParser
import com.fasterxml.jackson.core.JsonToken
import com.fasterxml.jackson.core.type.TypeReference
import com.fasterxml.jackson.databind.DeserializationContext
import com.fasterxml.jackson.databind.JsonDeserializer
import com.fasterxml.jackson.databind.JsonSerializer
import com.fasterxml.jackson.databind.SerializerProvider
import com.fasterxml.jackson.databind.annotation.JsonDeserialize
import com.fasterxml.jackson.databind.annotation.JsonSerialize

/**
 * Represents a chat completion request in OpenAI-compatible format.
 * @property model The model identifier to use for completion
 * @property messages The conversation history as a list of messages
 * @property maxCompletionTokens Maximum tokens to generate in the response
 * @property temperature Controls randomness in the response (0.0 = deterministic, 1.0 = creative)
 * @property stream Whether to stream the response or return it all at once
 */
data class OpenAiCompatibleChatCompletionRequest(
    val model: String = "gpt-4o",
    val messages: List<OpenAiCompatibleChatMessage>,
    // OpenAI clients send this field in snake_case
    @JsonProperty("max_completion_tokens")
    val maxCompletionTokens: Int = 16384,
    val temperature: Float = 0.0f,
    val stream: Boolean = false
)

/**
 * Represents a chat message in OpenAI-compatible format.
 * @property role The role of the message sender (e.g., "system", "user", "assistant")
 * @property content List of content items that can include text and images
 */
data class OpenAiCompatibleChatMessage(
    val role: String = "user",
    @JsonDeserialize(using = ContentDeserializer::class)
    @JsonSerialize(using = ContentSerializer::class)
    val content: List<OpenAiCompatibleContentItem>? = null
)

/**
 * Represents a single content item in a chat message.
 * @property type Content type identifier ("text" or "image_url")
 * @property text The text content if type is "text"
 * @property imageUrl The image URL details if type is "image_url"
 */
data class OpenAiCompatibleContentItem(
    val type: String = "text",
    val text: String? = null,
    @JsonProperty("image_url")
    val imageUrl: ImageUrl? = null
)

/**
 * Contains image URL information for image content items.
 * @property url The actual URL of the image (can be http(s) or base64 data URI)
 * @property detail The desired detail level for image analysis
 */
data class ImageUrl(
    val url: String,
    val detail: String? = "auto"
)

/**
 * Represents a complete response from the chat completion API.
 * @property id Unique identifier for the completion
 * @property object Type identifier for the response
 * @property created Timestamp of when the completion was created
 * @property model The model used for completion
 * @property choices List of completion choices/responses
 * @property usage Token usage statistics for the request
 */
data class OpenAiCompatibleChatCompletionResponse(
    val id: String,
    val `object`: String,
    val created: Long,
    val model: String,
    val choices: List<OpenAiCompatibleChoice>,
    val usage: OpenAiCompatibleUsage? = null
)

/**
 * Represents a single completion choice in the response.
 * @property message The generated message content
 * @property finishReason Why the completion stopped (e.g., "stop", "length")
 */
data class OpenAiCompatibleChoice(
    val message: OpenAiCompatibleChatMessage,
    @JsonProperty("finish_reason")
    val finishReason: String? = null
)

/**
 * Represents a chunk of the streaming response.
 * Used when stream=true in the request.
 */
data class OpenAiCompatibleChatCompletionChunk(
    val id: String,
    val `object`: String,
    val created: Long,
    val model: String,
    val choices: List<OpenAiCompatibleChunkChoice>
)

/**
 * Represents a choice within a streaming response chunk.
 */
data class OpenAiCompatibleChunkChoice(
    val delta: OpenAiCompatibleDelta,
    @JsonProperty("finish_reason")
    val finishReason: String? = null
)

/**
 * Represents the incremental changes in a streaming response.
 */
data class OpenAiCompatibleDelta(
    val content: String? = null,
    val role: String? = null
)

/**
 * Contains token usage statistics for the request.
 * @property promptTokens Number of tokens in the input prompt
 * @property completionTokens Number of tokens in the generated completion
 * @property totalTokens Total tokens used (prompt + completion)
 */
data class OpenAiCompatibleUsage(
    @JsonProperty("prompt_tokens")
    val promptTokens: Int,
    @JsonProperty("completion_tokens")
    val completionTokens: Int,
    @JsonProperty("total_tokens")
    val totalTokens: Int
)

/**
 * Custom serializer for chat message content.
 * Converts structured content arrays to string format for compatibility with litellm.
 */
class ContentSerializer : JsonSerializer<List<OpenAiCompatibleContentItem>>() {

    override fun serialize(
        value: List<OpenAiCompatibleContentItem>?,
        gen: JsonGenerator,
        serializers: SerializerProvider
    ) {
        when {
            value == null -> gen.writeNull()
            value.isEmpty() -> gen.writeString("")
            else -> {
                // Combine all text content into a single string
                val combinedText = value.mapNotNull { item ->
                    when (item.type) {
                        "text" -> item.text
                        else -> null
                    }
                }.joinToString("\n")
                gen.writeString(combinedText)
            }
        }
    }
}

/**
 * Custom deserializer for chat message content.
 * Handles both string-only content and structured content arrays.
 * Converts legacy string content to the new structured format for compatibility.
 */
class ContentDeserializer : JsonDeserializer<List<OpenAiCompatibleContentItem>>() {

    override fun deserialize(p: JsonParser, ctxt: DeserializationContext): List<OpenAiCompatibleContentItem> {
        return when (p.currentToken) {
            JsonToken.VALUE_STRING -> {
                // Convert legacy string content to structured format
                listOf(OpenAiCompatibleContentItem(type = "text", text = p.valueAsString))
            }

            JsonToken.START_ARRAY -> {
                // Parse structured content array
                val typeRef = object : TypeReference<List<OpenAiCompatibleContentItem>>() {}
                p.codec.readValue(p, typeRef)
            }

            JsonToken.VALUE_NULL -> {
                emptyList()
            }

            else -> {
                throw ctxt.weirdStringException(p.text, List::class.java, "Unexpected JSON token")
            }
        }
    }
}
```

### Creating OpenAiCompatibleService

* Before creating the actual implementation service class that performs the role of an **LLM Proxy**, we create an interface to accommodate various **LLMs**.

```kotlin
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter

interface OpenAiCompatibleService {
    fun createChatCompletion(request: OpenAiCompatibleChatCompletionRequest): OpenAiCompatibleChatCompletionResponse
    fun createStreamingChatCompletion(request: OpenAiCompatibleChatCompletionRequest): SseEmitter
}
```
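Because every provider sits behind the same interface, failover can be layered on top as a thin wrapper. The following is a minimal sketch, assuming an ordered list of named delegates. `withFailover` is a hypothetical helper, and each lambda stands in for a call such as `createChatCompletion` on one implementation.

```kotlin
// Hypothetical helper: try each provider in order and return the first success.
// Each pair is (provider name, call); the call stands in for a service method.
fun <T> withFailover(providers: List<Pair<String, () -> T>>): T {
    require(providers.isNotEmpty()) { "At least one provider is required" }
    val failures = mutableListOf<String>()
    for ((name, call) in providers) {
        try {
            return call()
        } catch (e: Exception) {
            // Record the failure and fall through to the next provider.
            failures += "$name: ${e.message}"
        }
    }
    throw IllegalStateException("All providers failed: ${failures.joinToString("; ")}")
}
```

This works naturally for non-streaming calls. For streaming, failing over after some chunks have already been sent would need extra care, since the client has seen partial output.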

### Creating OpenAiCompatibleAzureOpenAiServiceImpl

* Create an **OpenAiCompatibleAzureOpenAiServiceImpl** bean that supports both streaming and non-streaming methods:
    

```kotlin
import com.fasterxml.jackson.databind.ObjectMapper
import dev.langchain4j.data.message.AiMessage
import dev.langchain4j.data.message.UserMessage
import dev.langchain4j.model.StreamingResponseHandler
import dev.langchain4j.model.azure.AzureOpenAiChatModel
import dev.langchain4j.model.azure.AzureOpenAiStreamingChatModel
import dev.langchain4j.model.output.Response
import org.springframework.http.MediaType
import org.springframework.stereotype.Service
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter
import java.io.IOException
import java.time.Instant
import java.util.*
import java.util.concurrent.ConcurrentHashMap

@Service
class OpenAiCompatibleAzureOpenAiServiceImpl(
    private val objectMapper: ObjectMapper
) : OpenAiCompatibleService {
    private val emitters = ConcurrentHashMap<String, SseEmitter>()

    override fun createChatCompletion(request: OpenAiCompatibleChatCompletionRequest): OpenAiCompatibleChatCompletionResponse {

        val chatLanguageModel = AzureOpenAiChatModel.builder()
            .apiKey("{your-azure-openai-api-key}")
            .endpoint("{your-azure-openai-endpoint}")
            .deploymentName("{your-azure-openai-deployment-name}")
            .temperature(request.temperature.toDouble())
            .maxTokens(request.maxCompletionTokens)
            .topP(0.3)
            .logRequestsAndResponses(true)
            .build()


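        // Map each OpenAI role to the matching LangChain4j message type
        val messages: List<dev.langchain4j.data.message.ChatMessage> = request.messages.map { msg ->
            val content = msg.content?.joinToString("\n") { item ->
                when (item.type) {
                    "text" -> item.text ?: ""
                    else -> ""
                }
            } ?: ""
            when (msg.role) {
                "system" -> dev.langchain4j.data.message.SystemMessage.from(content)
                "assistant" -> AiMessage.from(content)
                else -> UserMessage.from(content)
            }
        }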
        val response = chatLanguageModel.generate(messages.toList())

        return OpenAiCompatibleChatCompletionResponse(
            id = UUID.randomUUID().toString(),
            `object` = "chat.completion",
            created = Instant.now().epochSecond,
            model = request.model,
            choices = listOf(
                OpenAiCompatibleChoice(
                    OpenAiCompatibleChatMessage(
                        role = "assistant",
                        content = listOf(OpenAiCompatibleContentItem(type = "text", text = response.content().text()))
                    )
                )
            )
        )
    }

    override fun createStreamingChatCompletion(request: OpenAiCompatibleChatCompletionRequest): SseEmitter {

        val streamingChatLanguageModel = AzureOpenAiStreamingChatModel.builder()
            .apiKey("{your-azure-openai-api-key}")
            .endpoint("{your-azure-openai-endpoint}")
            .deploymentName("{your-azure-openai-deployment-name}")
            .temperature(request.temperature.toDouble())
            .maxTokens(request.maxCompletionTokens)
            .logRequestsAndResponses(true)
            .build()

        val emitter = SseEmitter()
        val emitterId = UUID.randomUUID().toString()
        emitters[emitterId] = emitter

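        // Map each OpenAI role to the matching LangChain4j message type
        val messages: List<dev.langchain4j.data.message.ChatMessage> = request.messages.map { msg ->
            val content = msg.content?.joinToString("\n") { item ->
                when (item.type) {
                    "text" -> item.text ?: ""
                    else -> ""
                }
            } ?: ""
            when (msg.role) {
                "system" -> dev.langchain4j.data.message.SystemMessage.from(content)
                "assistant" -> AiMessage.from(content)
                else -> UserMessage.from(content)
            }
        }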

        streamingChatLanguageModel.generate(messages.toList(), object : StreamingResponseHandler<AiMessage> {
            override fun onNext(token: String) {
                val chunk = OpenAiCompatibleChatCompletionChunk(
                    id = emitterId,
                    `object` = "chat.completion.chunk",
                    created = Instant.now().epochSecond,
                    model = request.model,
                    choices = listOf(OpenAiCompatibleChunkChoice(OpenAiCompatibleDelta(content = token)))
                )
                try {
                    emitter.send(
                        SseEmitter.event()
                            .data(objectMapper.writeValueAsString(chunk), MediaType.APPLICATION_JSON)
                    )
                } catch (e: IOException) {
                    // The client most likely disconnected; stop streaming to this emitter
                    emitter.completeWithError(e)
                    emitters.remove(emitterId)
                }
            }

            override fun onComplete(response: Response<AiMessage>) {
                try {
                    emitter.send(SseEmitter.event().data("[DONE]"))
                    emitter.complete()
                    emitters.remove(emitterId)
                } catch (e: IOException) {
                    emitter.completeWithError(e)
                    emitters.remove(emitterId)
                }
            }

            override fun onError(error: Throwable) {
                emitter.completeWithError(error)
                emitters.remove(emitterId)
            }
        })

        return emitter
    }
}
```
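The builders above take placeholder strings for the API key, endpoint, and deployment name. One way to keep real values out of source control is to read them from the environment at startup. The sketch below assumes this approach. The `AzureOpenAiSettings` class and the environment variable names are illustrative and are not defined by LangChain4j or Azure.

```kotlin
// Hypothetical settings holder for the three values the Azure builders need.
data class AzureOpenAiSettings(
    val apiKey: String,
    val endpoint: String,
    val deploymentName: String
)

// Reads the settings from an environment map; the variable names are illustrative.
fun loadAzureOpenAiSettings(env: Map<String, String> = System.getenv()): AzureOpenAiSettings {
    // Fail fast with a clear message when a variable is missing or blank.
    fun required(name: String): String =
        env[name]?.takeIf { it.isNotBlank() }
            ?: throw IllegalStateException("Missing environment variable: $name")

    return AzureOpenAiSettings(
        apiKey = required("AZURE_OPENAI_API_KEY"),
        endpoint = required("AZURE_OPENAI_ENDPOINT"),
        deploymentName = required("AZURE_OPENAI_DEPLOYMENT_NAME")
    )
}
```

The loaded values would then be passed to `.apiKey(...)`, `.endpoint(...)`, and `.deploymentName(...)`. In a Spring Boot application, `@Value` or `@ConfigurationProperties` can serve the same purpose.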

### Creating OpenAiCompatibleAmazonBedrockClaudeServiceImpl

* Create an **OpenAiCompatibleAmazonBedrockClaudeServiceImpl** bean that supports both streaming and non-streaming methods:

```kotlin
import com.fasterxml.jackson.databind.ObjectMapper
import org.springframework.http.MediaType
import org.springframework.stereotype.Service
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider
import software.amazon.awssdk.core.SdkBytes
import software.amazon.awssdk.http.apache.ApacheHttpClient
import software.amazon.awssdk.http.nio.netty.ProxyConfiguration
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.bedrockruntime.BedrockRuntimeAsyncClient
import software.amazon.awssdk.services.bedrockruntime.BedrockRuntimeClient
import software.amazon.awssdk.services.bedrockruntime.model.*
import java.net.HttpURLConnection
import java.time.Duration
import java.time.Instant
import java.util.*
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ExecutionException
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

/**
 * Implementation of OpenAI-compatible API using Amazon Bedrock Claude model.
 * Provides both streaming and non-streaming chat completions with OpenAI-compatible interface.
 */
@Service
class OpenAiCompatibleAmazonBedrockClaudeServiceImpl(
    private val objectMapper: ObjectMapper
) : OpenAiCompatibleService {

    companion object {
        // Maximum time to wait for model response before timing out
        private const val TIMEOUT_SECONDS = 180L

        // Claude model identifier - latest stable version as of 2024
        private const val MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"
    }

    /**
     * Synchronous Bedrock client for non-streaming requests.
     * Configured with appropriate timeouts and AWS credentials.
     */
    private val bedrockRuntimeClient: BedrockRuntimeClient by lazy {
        val httpClient = ApacheHttpClient.builder()
            .connectionTimeout(Duration.ofSeconds(TIMEOUT_SECONDS))
            .socketTimeout(Duration.ofSeconds(TIMEOUT_SECONDS))
            .build()

        BedrockRuntimeClient.builder()
            .region(Region.US_WEST_2)
            .credentialsProvider(DefaultCredentialsProvider.create())
            .httpClient(httpClient)
            .build()
    }

    /**
     * Asynchronous Bedrock client optimized for streaming responses.
     * Configured with proxy settings to bypass corporate proxies for AWS services,
     * appropriate timeouts, and AWS credentials.
     */
    private val bedrockRuntimeAsyncClient: BedrockRuntimeAsyncClient by lazy {
        System.setProperty("http.nonProxyHosts", "*.amazonaws.com|*.amazon.com")

        val asyncHttpClient = software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient.builder()
            .connectionTimeout(Duration.ofSeconds(TIMEOUT_SECONDS))
            .readTimeout(Duration.ofSeconds(TIMEOUT_SECONDS))
            .proxyConfiguration(
                ProxyConfiguration.builder()
                    .nonProxyHosts(setOf("*.amazonaws.com", "*.amazon.com"))
                    .useSystemPropertyValues(true)
                    .build()
            )
            .build()

        BedrockRuntimeAsyncClient.builder()
            .region(Region.US_WEST_2)
            .credentialsProvider(DefaultCredentialsProvider.create())
            .httpClient(asyncHttpClient)
            .build()
    }

    /**
     * Creates a non-streaming chat completion using Claude model.
     * Handles the asynchronous request-response cycle with Amazon Bedrock,
     * maintaining OpenAI API compatibility for seamless integration.
     */
    override fun createChatCompletion(request: OpenAiCompatibleChatCompletionRequest): OpenAiCompatibleChatCompletionResponse {

        try {
            // Normalize and validate message sequence
            val normalizedMessages = normalizeMessages(request.messages)
            validateMessages(normalizedMessages)

            // Set up CompletableFuture for async response handling
            val future = CompletableFuture<OpenAiCompatibleChatCompletionResponse>()

            // Invoke Bedrock's Claude model asynchronously
            bedrockRuntimeAsyncClient.converse { params ->
                params.modelId(MODEL_ID)
                    .messages(normalizedMessages)
                    .inferenceConfig { config ->
                        config.maxTokens(request.maxCompletionTokens)
                            .temperature(request.temperature)
                    }
            }.whenComplete { response, error ->
                if (error != null) {
                    future.completeExceptionally(error)
                } else {
                    val inputText = normalizedMessages.joinToString("\n") { msg ->
                        msg.content().joinToString("\n") { item ->
                            when (item.type()) {
                                ContentBlock.Type.TEXT -> item.text()
                                else -> ""
                            }
                        }
                    }
                    val outputText = response.output().message().content()[0].text()
                    val usage = response.usage()

                    println("===== Input text: $inputText")
                    println("===== Output text: $outputText")
                    println("===== Input tokens: ${usage.inputTokens()}")
                    println("===== Output tokens: ${usage.outputTokens()}")
                    println("===== Total tokens: ${usage.totalTokens()}")

                    val compatibleResponse = OpenAiCompatibleChatCompletionResponse(
                        id = UUID.randomUUID().toString(),
                        `object` = "chat.completion",
                        created = Instant.now().epochSecond,
                        model = request.model,
                        choices = listOf(
                            OpenAiCompatibleChoice(
                                OpenAiCompatibleChatMessage(
                                    role = "assistant",
                                    content = listOf(OpenAiCompatibleContentItem(type = "text", text = outputText))
                                )
                            )
                        )
                    )
                    future.complete(compatibleResponse)
                }
            }

            return future.get(TIMEOUT_SECONDS, TimeUnit.SECONDS)

        } catch (e: Exception) {
            when (e) {
                is TimeoutException -> throw RuntimeException("Request timed out after $TIMEOUT_SECONDS seconds", e)
                is ExecutionException -> throw RuntimeException("Bedrock API Error: ${e.cause?.message}", e)
                else -> throw RuntimeException("Unexpected error: ${e.message}", e)
            }
        }
    }

    /**
     * Creates a streaming chat completion using Claude model.
     * Uses Server-Sent Events (SSE) to stream responses in OpenAI-compatible format.
     */
    override fun createStreamingChatCompletion(request: OpenAiCompatibleChatCompletionRequest): SseEmitter {

        // Initialize SSE emitter with timeout
        val emitter = SseEmitter(TIMEOUT_SECONDS * 1000)
        val emitterId = UUID.randomUUID().toString()

        // StringBuilder to accumulate response text
        val responseBuilder = StringBuilder()
        val inputText = request.messages.joinToString("\n") { msg ->
            msg.content?.joinToString("\n") { item ->
                when (item.type) {
                    "text" -> item.text ?: ""
                    else -> ""
                }
            } ?: ""
        }

        // Variable to track token usage
        var lastTokenUsage: TokenUsage? = null

        try {
            val normalizedMessages = normalizeMessages(request.messages)
            validateMessages(normalizedMessages)

            val responseStreamHandler = ConverseStreamResponseHandler.builder()
                .subscriber(
                    ConverseStreamResponseHandler.Visitor.builder()
                        .onContentBlockDelta { chunk ->
                            val deltaContent = chunk.delta().text()
                            responseBuilder.append(deltaContent)

                            val compatibleChunk = OpenAiCompatibleChatCompletionChunk(
                                id = emitterId,
                                `object` = "chat.completion.chunk",
                                created = Instant.now().epochSecond,
                                model = request.model,
                                choices = listOf(
                                    OpenAiCompatibleChunkChoice(
                                        delta = OpenAiCompatibleDelta(content = deltaContent)
                                    )
                                )
                            )

                            emitter.send(
                                SseEmitter.event()
                                    .data(objectMapper.writeValueAsString(compatibleChunk), MediaType.APPLICATION_JSON)
                            )
                        }
                        .onMetadata { metadata ->
                            // Update token usage metrics from metadata
                            lastTokenUsage = metadata.usage()
                        }
                        .build()
                )
                .onError { err ->
                    emitter.completeWithError(RuntimeException("Bedrock API Error: ${err.message}"))
                }
                .build()

            bedrockRuntimeAsyncClient.converseStream(
                { builder ->
                    builder.modelId(MODEL_ID)
                        .messages(normalizedMessages)
                        .inferenceConfig { config ->
                            config.maxTokens(request.maxCompletionTokens)
                                .temperature(request.temperature)
                        }
                },
                responseStreamHandler
            ).whenComplete { _, error ->
                if (error != null) {
                    emitter.completeWithError(error)
                } else {
                    println("===== Input text: $inputText")
                    println("===== Output text: $responseBuilder")
                    lastTokenUsage?.let { usage ->
                        println("===== Input tokens: ${usage.inputTokens()}")
                        println("===== Output tokens: ${usage.outputTokens()}")
                        println("===== Total tokens: ${usage.totalTokens()}")
                    }

                    emitter.send(SseEmitter.event().data("[DONE]"))
                    emitter.complete()
                }
            }

        } catch (e: Exception) {
            emitter.completeWithError(e)
        }

        return emitter
    }

    /**
     * Converts OpenAI message format to Claude's expected format.
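     * Handles:
     * - Prepending a default instruction message when the first message is not a system message
     * - Converting message roles; system and unknown roles are sent with the user role,
     *   because Converse messages accept only the user and assistant roles
     * - Processing text and image content
     * - Merging consecutive messages from the same role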
     *
     * @param messages List of OpenAI-formatted messages
     * @return List of Claude-formatted messages
     */
    private fun normalizeMessages(messages: List<OpenAiCompatibleChatMessage>): List<Message> {
        val defaultSystemMessage = Message.builder()
            .content(ContentBlock.fromText("You are a helpful assistant."))
            .role(ConversationRole.USER)
            .build()

        val convertedMessages = messages.mapIndexed { index, msg ->
            val contentBlocks = mutableListOf<ContentBlock>()
            msg.content?.forEach { item ->
                when (item.type) {
                    "text" -> item.text?.let { text ->
                        contentBlocks.add(ContentBlock.fromText(text))
                    }

                    "image_url" -> item.imageUrl?.let { imageUrl ->
                        val sdkBytes = when {
                            imageUrl.url.startsWith("data:") -> {
                                val base64Data = imageUrl.url.substringAfter("base64,")
                                val decodedBytes = Base64.getDecoder().decode(base64Data)
                                SdkBytes.fromByteArray(decodedBytes)
                            }

                            imageUrl.url.startsWith("http://") || imageUrl.url.startsWith("https://") -> {
                                val connection =
                                    java.net.URI(imageUrl.url).toURL().openConnection() as HttpURLConnection
                                connection.connectTimeout = 10000
                                connection.readTimeout = 10000
                                connection.inputStream.use { inputStream ->
                                    SdkBytes.fromInputStream(inputStream)
                                }
                            }

                            else -> throw IllegalArgumentException("Unsupported image URL format: ${imageUrl.url}")
                        }

                        contentBlocks.add(
                            ContentBlock.fromImage(
                                ImageBlock.builder()
                                    .source(ImageSource.builder().bytes(sdkBytes).build())
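                                    // The format is fixed to PNG here; JPEG, GIF, or WebP input
                                    // would need the matching ImageFormat value instead
                                    .format(ImageFormat.PNG)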
                                    .build()
                            )
                        )
                    }
                }
            }

            Message.builder()
                .content(contentBlocks)
                .role(
                    when {
                        index == 0 && msg.role == "system" -> ConversationRole.USER
                        msg.role == "user" -> ConversationRole.USER
                        msg.role == "assistant" -> ConversationRole.ASSISTANT
                        else -> ConversationRole.USER
                    }
                )
                .build()
        }

        // Prepend default system message if needed
        val initialMessages = if (messages.firstOrNull()?.role != "system") {
            listOf(defaultSystemMessage) + convertedMessages
        } else {
            convertedMessages
        }

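        // Merge consecutive messages from the same role.
        // Note: only TEXT blocks are carried over; non-text blocks (e.g. images) are dropped when two messages are merged.
        return initialMessages.fold(mutableListOf<Message>()) { acc, message ->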
            if (acc.isEmpty() || acc.last().role() != message.role()) {
                acc.add(message)
            } else {
                val lastMessage = acc.last()
                acc[acc.lastIndex] = Message.builder()
                    .content(
                        ContentBlock.fromText(
                            buildString {
                                lastMessage.content().forEach { block ->
                                    if (block.type() == ContentBlock.Type.TEXT) {
                                        append(block.text())
                                        append("\n")
                                    }
                                }
                                message.content().forEach { block ->
                                    if (block.type() == ContentBlock.Type.TEXT) {
                                        append(block.text())
                                        append("\n")
                                    }
                                }
                            }.trimEnd()
                        )
                    )
                    .role(lastMessage.role())
                    .build()
            }
            acc
        }
    }

    /**
     * Validates message sequence according to Claude model requirements.
     * Ensures:
     * - Messages list is not empty
     * - Proper role alternation between user and assistant
     *
     * @param messages List of normalized messages to validate
     * @throws IllegalArgumentException if validation fails
     */
    private fun validateMessages(messages: List<Message>) {

        if (messages.isEmpty()) {
            throw IllegalArgumentException("Messages cannot be empty")
        }

        messages.windowed(2).forEach { (prev, current) ->
            if (prev.role() == current.role()) {
                throw IllegalArgumentException("Messages must alternate between user and assistant roles")
            }
        }
    }
}
```
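
* The merge and validation steps at the end of the class can be tried in isolation. The following is a minimal sketch, not the service code itself: `SimpleMessage`, `mergeConsecutive`, and `validateAlternation` are hypothetical names that stand in for the Bedrock SDK's `Message` type and the private functions above, so it runs without any AWS dependency.

```kotlin
// Hypothetical stand-in for the Bedrock SDK's Message type, used only for illustration.
data class SimpleMessage(val role: String, val text: String)

// Merge consecutive messages that share a role, joining their text with a newline.
fun mergeConsecutive(messages: List<SimpleMessage>): List<SimpleMessage> =
    messages.fold(mutableListOf<SimpleMessage>()) { acc, message ->
        val last = acc.lastOrNull()
        if (last == null || last.role != message.role) {
            acc.add(message)
        } else {
            acc[acc.lastIndex] = last.copy(text = last.text + "\n" + message.text)
        }
        acc
    }

// Reject an empty list or two adjacent messages with the same role.
fun validateAlternation(messages: List<SimpleMessage>) {
    require(messages.isNotEmpty()) { "Messages cannot be empty" }
    messages.zipWithNext().forEach { (prev, current) ->
        require(prev.role != current.role) {
            "Messages must alternate between user and assistant roles"
        }
    }
}

fun main() {
    // A system prompt mapped to the user role, followed by a real user turn.
    val input = listOf(
        SimpleMessage("user", "You are a helpful assistant."),
        SimpleMessage("user", "Hello"),
        SimpleMessage("assistant", "Hi there")
    )
    val merged = mergeConsecutive(input)
    validateAlternation(merged) // passes: roles now alternate
    println(merged.size) // prints 2
}
```

* Because merging runs before validation, a system prompt that was mapped to the user role is folded into the first user turn instead of triggering the alternation error.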

### Creating OpenAiCompatibleController

* Finally, create the **OpenAiCompatibleController** bean:
    

```kotlin
import org.springframework.beans.factory.annotation.Qualifier
import org.springframework.http.MediaType
import org.springframework.web.bind.annotation.*

@RestController
@RequestMapping("/v1/openai")
class OpenAiCompatibleController(
    // Specify the implementation for [Azure OpenAI] or [Amazon Bedrock Claude]
    @Qualifier("openAiCompatibleAmazonBedrockClaudeServiceImpl") private val openAiCompatibleService: OpenAiCompatibleService
) {
    @PostMapping("/chat/completions", produces = [MediaType.APPLICATION_JSON_VALUE])
    fun chatCompletions(
        @RequestHeader("Authorization") authHeader: String?,
        @RequestBody request: OpenAiCompatibleChatCompletionRequest
    ): Any {

        val apiKey = authHeader?.removePrefix("Bearer ")
        // Custom authentication can be applied using the obtained API_KEY

        return if (request.stream) {
            openAiCompatibleService.createStreamingChatCompletion(request)
        } else {
            openAiCompatibleService.createChatCompletion(request)
        }
    }
}
```
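
* The controller leaves authentication open. One possible way to act on the extracted key is sketched below, under stated assumptions: `extractBearerToken` and `ApiKeyValidator` are hypothetical helpers, and the in-memory key set stands in for whatever internal system (SSO, a database, a secrets manager) would actually hold the keys.

```kotlin
// Extracts the token from an "Authorization: Bearer <token>" header value.
// Returns null when the header is missing, uses another scheme, or carries an empty token.
fun extractBearerToken(authHeader: String?): String? {
    val prefix = "Bearer "
    if (authHeader == null || !authHeader.startsWith(prefix, ignoreCase = true)) return null
    return authHeader.substring(prefix.length).trim().ifEmpty { null }
}

// Hypothetical in-memory key store; a real deployment would query an internal auth system instead.
class ApiKeyValidator(private val allowedKeys: Set<String>) {
    fun isAllowed(authHeader: String?): Boolean {
        val token = extractBearerToken(authHeader) ?: return false
        return token in allowedKeys
    }
}

fun main() {
    val validator = ApiKeyValidator(setOf("team-a-key"))
    println(validator.isAllowed("Bearer team-a-key")) // prints true
    println(validator.isAllowed("Bearer wrong-key"))  // prints false
    println(validator.isAllowed(null))                // prints false
}
```

* Inside `chatCompletions`, a failed check could be turned into an HTTP 401 response, for example by throwing Spring's `ResponseStatusException` with `HttpStatus.UNAUTHORIZED`, before the request reaches the service.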

### Testing the OpenAI compatible API

* The **OpenAI Compatible Server** is now complete. To verify that it works, run the server and point **Aider**, a popular **AI** coding assistant, at it through environment variables.
    

```bash
# Run the project
$ ./gradlew bootRun

# Point Aider at the running server through its environment variables
$ export OPENAI_API_BASE=http://localhost:8080/v1/openai/
$ export OPENAI_API_KEY={YOUR_API_KEY}

# Override the token-related model metadata when using the Amazon Bedrock Claude implementation
$ nano ~/.aider.model.metadata.json
{
    "openai/gpt-4o": {
        "max_tokens": 8192,
        "max_input_tokens": 200000,
        "max_output_tokens": 8192,
        "input_cost_per_token": 0.000003,
        "output_cost_per_token": 0.000015,
        "litellm_provider": "openai",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_vision": true,
        "tool_use_system_prompt_tokens": 159,
        "supports_assistant_prefill": true
    }
}

# Run Aider
$ aider --model openai/gpt-4o
Aider v0.63.1
Model: openai/custom with whole edit format, infinite output
Git repo: .git with 22 files
Repo-map: disabled
Use /help <question> for help, run "aider --help" to see cmd line args
> Hello, how are you?

Hello! I'm doing well, thank you. How can I assist you with your project today? If you have any specific changes or questions, feel
free to let me know!
```
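
* The endpoint can also be exercised without **Aider**. The sketch below uses the JDK's built-in `java.net.http.HttpClient` and assumes the server listens at the base URL exported above. `jsonString`, `buildChatRequest`, and `sendChat` are hypothetical helper names, and the `gpt-4o` model name is only a placeholder for whatever your service implementation expects.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Minimal JSON string escaping for this sketch; production code would use a JSON library such as Jackson.
fun jsonString(value: String): String =
    "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\""

// Builds a non-streaming chat completion request against the controller's /chat/completions route.
fun buildChatRequest(baseUrl: String, apiKey: String, userMessage: String): HttpRequest {
    val body =
        """{"model":"gpt-4o","stream":false,"messages":[{"role":"user","content":${jsonString(userMessage)}}]}"""
    return HttpRequest.newBuilder(URI.create(baseUrl.trimEnd('/') + "/chat/completions"))
        .header("Content-Type", "application/json")
        .header("Authorization", "Bearer $apiKey")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()
}

// Sends the request and returns the raw response body; requires the server to be running.
fun sendChat(request: HttpRequest): String =
    HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).body()
```

* With the server started through `./gradlew bootRun`, calling `sendChat(buildChatRequest("http://localhost:8080/v1/openai", "{YOUR_API_KEY}", "Hello"))` should return the JSON chat completion produced by the configured service.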

### References and Further Reading
* [How to build an OpenAI-compatible API](https://towardsdatascience.com/how-to-build-an-openai-compatible-api-87c8edea2f06)
* [AWS - Invoke Anthropic Claude on Amazon Bedrock using Bedrock's Converse API with a response stream](https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-runtime_example_bedrock-runtime_ConverseStream_AnthropicClaude_section.html)
* [How to Install Aider - AI Coding Assistant Chatbot](https://jsonobject.hashnode.dev/how-to-install-aider-ai-coding-assistant-chatbot)