Next-Generation Large Model Paradigm: Inner Tools

Community Article Published January 9, 2026

Author: peirongyan | Company: Tencent

Let’s get straight to the point. This article proposes a new large model paradigm called Inner Tools. The core idea is to implement common tools that do not require network calls directly inside the model serving layer, so the model can invoke them during generation. This is expected to address the limited long-context understanding of large models, as well as the high resource consumption and long latency seen in practical applications.

Background

In practice, when using AI coding tools like CodeBuddy, one observes that after a query, the large model invokes code retrieval tools step by step to fetch code blocks (transmitting only code blocks is primarily an engineering technique to reduce context size, especially in large repositories). When the code logic is complex, the model often needs multiple such calls to fully understand the implementation, leading to multiple network round trips between the agent layer and the model layer. Each round trip requires the agent layer to re-invoke the large model, and the input tokens of every interaction are charged again, consuming resources and increasing latency. The same issue exists in knowledge base tools such as IMA.

Solution

In fact, the capability to retrieve code blocks can be supported directly at the large model layer. Alongside the standard model input, an additional parameter passes the code repository to the model as internal knowledge (inner_knowledge). As the model outputs tokens, the serving layer checks for tags indicating a retrieval from the code base. When such a tag is detected, token prediction is interrupted and a local retrieval tool searches the provided inner_knowledge directly. The retrieval results are appended to the context, and token prediction resumes. This process iterates until the final output is generated.
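
As an illustration of the decode-time flow (the tag format below is hypothetical; the concrete proposal later in this article expresses the same interaction through standard function calls), a single generation pass might look like this:

Model output so far:   ... the login flow is implemented in <inner_search query="handle_login">
Serving layer:         pauses decoding and searches the provided inner_knowledge locally
Appended to context:   <inner_search_result> def handle_login(request): ... </inner_search_result>
Decoding resumes:      the model continues its answer using the retrieved code block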

This capability is not limited to code repositories; it applies equally to ultra-long texts such as documents, reports, and legal contracts, and can be supported uniformly as an internal knowledge retrieval tool (inner_search). Extending the concept further, other tool calls that do not require network requests can be converted into internal tool calls (Inner Tools), such as mathematical operations, certain kinds of code execution, image generation, and more. (In a sense, with this implementation, the large model's performance on mathematical problems could arguably approach its upper limit, albeit in a non-conventional manner.)
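
To make the dispatching idea concrete, here is a minimal sketch of a registry that maps Inner Tool names to local Python handlers. The calculator tool, the naive substring search, and all names below are illustrative assumptions, not part of any existing framework:

import math

def inner_calculator(expression: str) -> str:
    # Evaluate a restricted arithmetic expression locally.
    # A production implementation would use a proper, sandboxed expression parser.
    allowed = {"sqrt": math.sqrt, "pow": pow}
    return str(eval(expression, {"__builtins__": {}}, allowed))

def inner_search(query: str, knowledge: str) -> str:
    # Naive local retrieval: return lines of the knowledge that mention the query.
    hits = [line for line in knowledge.splitlines() if query.lower() in line.lower()]
    return "\n".join(hits[:20]) or "No matching content found."

# The serving layer can dispatch to any of these without a network call.
INNER_TOOLS = {
    "inner_tool_calculator": inner_calculator,
    "inner_tool_inner_search": inner_search,
}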

Specific Technical Solution

Here, we propose a minimum viable implementation conceived by the author; other, more suitable technical solutions may well exist. The core of this scheme is to treat Inner Tools as standard function-call capabilities, distinguished only by adding the prefix identifier "inner_tool_" to the function name. For example, the function name for inner_search would be inner_tool_inner_search.

Using inner_search as a detailed example:

  1. First, extend the large model invocation protocol to include the parameters required by the inner_search tool (other tools may not require this step). The corresponding structure is:
"inner_search_paras": {
    "desc": "Description of the knowledge base, e.g., 'This is the code repository for the service.'",
    "knowledge": "Content of the knowledge base, e.g., code files from the repository."
}
  2. Upon receiving user input, the large model layer adds the provided list of Inner Tools to the input's tools list, prefixing each function name with the identifier "inner_tool_". For inner_search, the function description also needs to be filled in from the description (desc) in the parameters. Example:
messages = [
    {"role": "system", "content": "You are a programming assistant."},    # System prompt
    {"role": "user", "content": "Hello"},                                  # Historical query
    {"role": "assistant", "content": "Hello, how can I help you?"},        # Historical response
    {"role": "user", "content": "What features does this service have?"},  # Latest query
]
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",    # User-provided tool
        "description": "Get current weather information"
    }
}, {
    "type": "function",
    "function": {
        "name": "inner_tool_inner_search",    # Added Inner Tool
        "description": "Internal knowledge retrieval tool. Knowledge description: {inner_search_paras.desc}"  # Fill the function description with the knowledge description from the input parameters
    }
}]
  3. The preparation is now complete. Next, during real-time model generation, output tokens are monitored to detect whether they form a tool call. If so, the function name is checked to determine whether it is an Inner Tool. If it is, model generation is interrupted and the tool is invoked locally. Because everything executes locally, this step is theoretically very fast.

  4. After the tool call completes, the tool's result (here, a list of code blocks) is appended to the context, and the model continues prediction until the final output is complete. Other Inner Tools can be implemented following the same approach; a minimal sketch of this serving-side loop is given below.
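
Putting steps 1-4 together, the following is a minimal, self-contained sketch of the serving-side loop. It assumes a generic generate_until_stop(messages, tools) helper that decodes until the model either finishes or emits a parsed tool call; that helper, the message format used to append tool results, and the naive substring search are assumptions for illustration, not the API of any particular inference framework:

import json

INNER_TOOL_PREFIX = "inner_tool_"

def local_inner_search(query: str, knowledge: str) -> str:
    # Placeholder local retrieval over inner_knowledge (a real system would search code blocks).
    hits = [line for line in knowledge.splitlines() if query.lower() in line.lower()]
    return "\n".join(hits[:20]) or "No matching content found."

def run_with_inner_tools(messages, tools, inner_search_paras, generate_until_stop):
    # Step 2: register the Inner Tool alongside the user-provided tools,
    # filling its description from inner_search_paras["desc"].
    tools = tools + [{
        "type": "function",
        "function": {
            "name": "inner_tool_inner_search",
            "description": "Internal knowledge retrieval tool. Knowledge description: "
                           + inner_search_paras["desc"],
        },
    }]

    while True:
        # Steps 3-4: decode until the model finishes or requests a tool.
        result = generate_until_stop(messages, tools)

        if result["type"] != "tool_call":
            return result["content"]        # Final answer, nothing left to resolve.

        if not result["name"].startswith(INNER_TOOL_PREFIX):
            return result                   # Ordinary tool call: hand it back to the agent layer.

        # Inner Tool: resolve it locally, append the result to the context, and resume decoding.
        args = json.loads(result["arguments"])
        tool_output = local_inner_search(args["query"], inner_search_paras["knowledge"])
        messages = messages + [
            {"role": "assistant", "content": result["content"]},  # Partial output containing the call
            {"role": "tool", "content": tool_output},
        ]

From the caller's perspective this still looks like a single model invocation, which is where the savings in round trips and repeated input tokens come from.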

Advantages

  1. The Inner Tools approach can significantly reduce the number of large model invocations and associated latency without compromising performance.

  2. The implementation of the inner_search tool can handle ultra-long contexts. The inner_knowledge can be very large, does not consume context window length, and can be flexibly retrieved and incorporated into the context at the model layer.

  3. The proposed implementation does not require modifications to or retraining of the large model itself. Changes are needed only at the model deployment layer, enabling rapid adaptation.

Summary and Outlook

The above describes the rationale behind the Inner Tools paradigm and a specific implementation proposal. The author will subsequently provide a simple demo based on DeepSeek's open-source code, attached to this article.
