Structuring Prompts for JSON and Other Structured Data

Beyond Plain Text

One of the most powerful applications for LLMs in development and data processing is their ability to structure information. You can use AI to extract data from unstructured text and format it as JSON, CSV, XML, or another structured format. The key is giving the model an unambiguous template to follow.

The Core Technique: Provide a Template

The single most effective way to get structured data is to show the AI exactly what you want. A template with no worked examples is zero-shot prompting with an explicit format instruction; adding one or more sample input/output pairs turns it into few-shot prompting.

Example 1: Generating JSON from Scratch

Let's say you want to generate example data for a user profile.

Prompt:

Generate a list of 3 fictional user profiles in a JSON array format. Each object in the array should adhere to the following structure and use realistic but fake data. Do not include any explanation or commentary outside of the JSON block itself.

      JSON Structure:
      {
        "id": "integer",
        "username": "string",
        "email": "string (email format)",
        "isActive": "boolean",
        "tags": ["string", "string", ...]
      }

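By providing the structure with data types, you give the model a clear blueprint. Note that the template is a description rather than valid JSON: the type names and the "..." are placeholders the model is expected to replace with real values. The instruction "Do not include any explanation" is also key to preventing conversational filler around the code block.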
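
A prompt can only request a structure, not enforce it, so it can help to check the reply in code before using it. Below is a minimal Python sketch, assuming the model's reply is already available as a string. The names `PROFILE_TYPES`, `validate_profiles`, and `sample_reply` are hypothetical and introduced here for illustration, and the sample reply is hand-written rather than real model output.

```python
import json

# Expected Python type for each key in the template above.
PROFILE_TYPES = {
    "id": int,
    "username": str,
    "email": str,
    "isActive": bool,
    "tags": list,
}


def validate_profiles(raw: str) -> list[dict]:
    """Parse a model reply and check each profile against the template."""
    profiles = json.loads(raw)  # raises json.JSONDecodeError on non-JSON text
    if not isinstance(profiles, list):
        raise ValueError("expected a JSON array of profiles")
    for i, profile in enumerate(profiles):
        if not isinstance(profile, dict):
            raise ValueError(f"profile {i}: expected a JSON object")
        for key, expected in PROFILE_TYPES.items():
            if key not in profile:
                raise ValueError(f"profile {i}: missing key {key!r}")
            value = profile[key]
            # bool is a subclass of int in Python, so reject true/false for "id".
            if expected is int and isinstance(value, bool):
                raise ValueError(f"profile {i}: {key!r} should be an integer")
            if not isinstance(value, expected):
                raise ValueError(f"profile {i}: {key!r} should be {expected.__name__}")
        if not all(isinstance(tag, str) for tag in profile["tags"]):
            raise ValueError(f"profile {i}: every tag should be a string")
    return profiles


# Hand-written stand-in for a model reply (an assumption, not real output).
sample_reply = (
    '[{"id": 1, "username": "jdoe", "email": "jdoe@example.com", '
    '"isActive": true, "tags": ["beta"]}]'
)
profiles = validate_profiles(sample_reply)
```

This sketch checks only key presence and basic types. It does not verify that the email is well formed or that exactly 3 profiles were returned; those checks would be added the same way if the task needs them.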

Example 2: Extracting Data into JSON

This is a more common use case: pulling structured information from a block of unstructured text.

Prompt:

Act as a data extraction expert. Read the following product review and extract the specified information into a JSON object. If a piece of information is not present, use a value of null.

      Review Text: "I bought the new X-Pro camera last week. I love the image quality and the 4K video, but the battery life is pretty disappointing, it only lasted about 2 hours on my first shoot. The product arrived on March 15, 2024."

      Extract the following information and format it according to the JSON structure below.

      {
        "productName": "string",
        "positiveMentions": ["string"],
        "negativeMentions": ["string"],
        "deliveryDate": "string (YYYY-MM-DD format)"
      }

For this review, a capable model will typically output:

      {
        "productName": "X-Pro camera",
        "positiveMentions": ["image quality", "4K video"],
        "negativeMentions": ["battery life"],
        "deliveryDate": "2024-03-15"
      }
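
On the consuming side, the extracted object still needs to be parsed and its null handling respected. The following Python sketch shows one possible approach, assuming the reply is a string containing only the JSON object. `EXPECTED_KEYS`, `parse_review_extraction`, and `reply` are hypothetical names used for illustration, and treating an omitted key the same as an explicit null is a design choice made here, not something the prompt guarantees.

```python
import json
from datetime import date, datetime

EXPECTED_KEYS = (
    "productName",
    "positiveMentions",
    "negativeMentions",
    "deliveryDate",
)


def parse_review_extraction(raw: str) -> dict:
    """Parse the extraction reply and normalise missing or null fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a single JSON object")
    # dict.get returns None for absent keys, matching the prompt's null rule.
    result = {key: data.get(key) for key in EXPECTED_KEYS}
    delivery = result["deliveryDate"]
    if delivery is not None:
        if not isinstance(delivery, str):
            raise ValueError("deliveryDate should be a string or null")
        # strptime raises ValueError if the text is not a year-month-day date.
        result["deliveryDate"] = datetime.strptime(delivery, "%Y-%m-%d").date()
    return result


# The output shown above, reused here as the input string.
reply = """
{
  "productName": "X-Pro camera",
  "positiveMentions": ["image quality", "4K video"],
  "negativeMentions": ["battery life"],
  "deliveryDate": "2024-03-15"
}
"""
extracted = parse_review_extraction(reply)
```

Converting the date string into a `date` object early means a badly formatted value fails at the parsing step instead of surfacing later in downstream code.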

Tips for Reliable Structured Output

Be Explicit: Always state the desired format (e.g., "in a JSON array," "as a CSV with headers").
Provide a Schema/Template: Show the exact keys, and if possible, the expected data types.
Handle Missing Data: Tell the model what to do if a field is missing (e.g., "use null," "use an empty string").
Forbid Extra Text: Instruct the model to "only output the code block" or "do not provide any explanation." This is crucial for programmatic use, where stray text around the JSON breaks a strict parser.
Use Tools with Structured Output Features: Some newer models and APIs (like Genkit's `output: { schema: ... }` feature) have built-in support for enforcing JSON output, which is even more reliable than prompting alone.
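
Even with these tips applied, a reply may occasionally arrive wrapped in a Markdown code fence or a sentence of commentary. As a defensive fallback, the Python sketch below tries to recover the JSON anyway. `extract_json` is a hypothetical helper written for this illustration, and it relies only on the standard library's `json` and `re` modules; it is one possible approach, not a replacement for a structured output feature.

```python
import json
import re

FENCE_MARK = "`" * 3  # the three-backtick Markdown fence marker
FENCE = re.compile(rf"{FENCE_MARK}(?:json)?\s*(.*?){FENCE_MARK}", re.DOTALL)


def extract_json(reply: str):
    """Return the first JSON object or array found in a model reply."""
    # Prefer the contents of a fenced block if the reply contains one.
    match = FENCE.search(reply)
    candidate = (match.group(1) if match else reply).strip()
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        pass
    # Otherwise try decoding from each "{" or "[" until one succeeds.
    decoder = json.JSONDecoder()
    for index, char in enumerate(candidate):
        if char in "{[":
            try:
                value, _end = decoder.raw_decode(candidate, index)
                return value
            except json.JSONDecodeError:
                continue
    raise ValueError("no JSON object or array found in reply")


fenced_reply = (
    "Here you go:\n" + FENCE_MARK + 'json\n{"a": 1}\n' + FENCE_MARK + "\nHope that helps!"
)
chatty_reply = 'Sure! {"a": [1, 2]} Let me know if you need more.'
from_fence = extract_json(fenced_reply)
from_prose = extract_json(chatty_reply)
```

A recovery step like this hides prompt problems as easily as it fixes them, so it is worth logging how often the fallback path is taken.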

Conclusion

Getting reliable structured data from an LLM is a superpower for developers and data analysts. By moving beyond simple text requests and providing clear, templated instructions, you can automate data extraction and generation tasks that were previously tedious and time-consuming.