Generating Schema for Securities Trading

What you’ll build

This tutorial walks you through generating a database schema for a financial analytics application. For example, a RAG system that answers questions from investment analysts, financial advisers, or securities traders. It generates signals by correlating securities price movements with facts from SEC filings:

10-K — Annual reports
10-Q — Quarterly reports
8-K — Current reports

Implementation

step 1/5. Get API key

Request early access to receive your API key.

step 2/5. Prepare specification and data samples

Fill in the specification template

Access patterns and downstream usage are the biggest drivers of architectural decisions. Use the specification template to outline your business goals and functional requirements. Xfrukt uses these inputs to tailor the architecture and schema to your use case.


# Industry
 
financial markets
 
# Users
 
securities traders
our CFO who gambles on stocks of competitors using company's budgets
 
# Key Responsibilities and Tasks of the Users
 
market analysis
trading strategy
keeping us profitable
 
 
# Functionality Of The Future Application
 
It a combination of a data analytics console with text form to ask all sorts of questions. 
The sofware should assist in answering questions like these (but not limited too):
Which companies have 10-Q revenue acceleration + rising gross margins + low relative valuation vs peers, but stock hasn't moved yet?
Which companies reported lower-than-expected bad debt expense / allowances in 10-Q despite rising DSO?
Which companies have CFOs who sold >50% of their holdings within 30 days of filing 10-Q, where the subsequent 10-Q revealed deteriorating working capital ratios?
 
 
 
# Data sources
SEC reports
End-of-day trading pricing 
Companies announcements 
 
 
# Other Requirements and Notes
none

Note: You don’t need to be 100% precise with terminology or work out the architecture upfront. That’s exactly what Xfrukt provides:

standardization of business terminology,
best practices for vertical and operational tasks,
battle-tested data architectures suited to the type of system you’re building.

You just define initial guidelines on the overall direction.

Save the completed specification to a file and note its path; you’ll reference it in the next step.


files_metas = [
    {
        "filename": "spec_securities_trader.md",
        "file_path": "/tmp/spec_securities_trader.md",
        "purpose": "spec",
        "content_type": "text/markdown"
    }
]

Prepare data samples

Include representative samples of each filing type: 10-K, 10-Q, and 8-K.

https://www.sec.gov/submit-filings/forms-index

https://edgartools.readthedocs.io/en/latest/guides/working-with-filing/


import requests
import edgar
 
dirname = "/tmp"
 
TICKER = "NVDA"
 
 
 
# Downloading reports
 
edgar.set_identity("Moly moly@email.com")
 
edgar.use_local_storage(
    path_or_enable=dirname,
    use_local=True
    )
 
 
company = edgar.Company(cik_or_ticker=TICKER)
 
forms = ["10-K", "10-Q", "8-K"]
 
for form in forms:
 
    filing_meta = company.latest(form)
    primary = filing_meta.document.text()
 
    if form == "8-K":
        filing_date = str(filing_meta.report_date).replace("-", "")
    else:
        filing_date = filing_meta.period_of_report
 
    filename = f"{form}_{TICKER}_{filing_date}.txt"
    filename = filename.lower().replace("-", "_")
    filepath = os.path.join(dirname, filename)
    with open(os.path.join(dirname, filename), "w", encoding="utf-8") as txt_file:
        txt_file.write(primary)
 
    files_metas.append(
        {
            "filename": filename,
            "filepath": filepath,
            "purpose": "data", # how to treat this file
            "content_type": "text/plain"
        }
    )
 
# Downloading trading data
 
try:
    API_KEY = os.environ["API_KEY_ALPHAVANTAGE"]
    TICKER = "NVDA"
except:
    API_KEY = "demo"
    TICKER = "IBM"
 
# outputsize=compact --> the latest 100 data points
url = f"https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol={TICKER}&outputsize=compact&apikey={API_KEY}"
r = requests.get(url)
data = r.json()
 
filename = f"ohlcv_{TICKER}_latest.json".lower()
filepath = os.path.join(dirname, filename)
with open(filepath, 'w') as f:
    json.dump(data, f, indent=2, ensure_ascii=False)
 
files_metas.append(
    {
        "filename": filename,
        "file_path": str(filepath),
        "purpose": "data",
        "content_type": "application/json"
    }
)

Once prepared, you should have the following metadata for each file


[
    {
        "filename": "spec_securities_trader.md",
        "file_path": "/tmp/spec_securities_trader.md",
        "purpose": "spec",
        "content_type": "text/markdown"
    },
    {
        "filename": "ohlcv_nvda_latest.json",
        "file_path": "/tmp/ohlcv_nvda_latest.json",
        "purpose": "data",
        "content_type": "application/json"
    },
    {
        "filename": "10_k_nvda_2025_01_26.txt",
        "file_path": "/tmp/10_k_nvda_2025_01_26.txt",
        "purpose": "data",
        "content_type": "text/plain"
    },
    {
        "filename": "10_q_nvda_2025_10_26.txt",
        "file_path": "/tmp/10_q_nvda_2025_10_26.txt",
        "purpose": "data",
        "content_type": "text/plain"
    },
    {
        "filename": "8_k_nvda_2026_01_20.txt",
        "file_path": "/tmp/8_k_nvda_2026_01_20.txt",
        "purpose": "data",
        "content_type": "text/plain"
    },
]

where

`content_type`
`text/plain`
`text/markdown`
`application/json`

`purpose`	Description
`spec`	Your description of the future application and business context
`data`	one sample of a data

step 3/5. Upload data samples and specification

Upload each file individually.


import requests
 
XFRUKT_API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.xfrukt.com"
 
headers = {
    "X-Api-Key": XFRUKT_API_KEY,
}
 
 
for ix, file in enumerate(files_metas): 
    with open(file["file_path"], "rb") as f:
        file = {
            "file": (file["filename"], f, "text/markdown"),
        }
 
        response = requests.post(
            f"{BASE_URL}/v1/files",
            headers=headers,
            files=file,
        )
 
    response_json = response.json()  
    file_id = response_json["file_id"]
 
    # 🔑 will be used in the next step
    files_metas[ix]["file_id"] = response_json["file_id"]

Save every file_id and filename returned by the API. You’ll pass the file_id values into the next request to create the schema.

Response

Each successful upload returns a JSON body like

Success — `200 OK`


{
  "file_id": "a68d2bf16a44efc0efbc35c5564d5230a4d0480e900d495e787688c8ef353dfa",
  "object": "file",
  "filename": "8_k_nvda_2026_01_20.txt",
  "status": "uploaded",
  "content_type": "text/plain",
  "content_size": 9479,
  "created_at": 1776836274
}

step 4/5. Initiate generation of a schema

Reference all file_id from step 3.


response = requests.post(
        f"{BASE_URL}/v1/xfrukt_schema", 
        json=files_metas, 
        headers=headers
    )
response_json = response.json()
 
assert response_json["status"] == "RUNNING"
 
# 🔑 will be used in the next step
request_id = response_json["request_id"]

step 5/5. Download the schema

Poll the job status. Once it reports FINISHED, download the result.


response = requests.get(
    f"{BASE_URL}/v1/xfrukt_schema?request_id={request_id}", 
    headers=headers
)
response_json = response_get.json()
 
assert response_json.get("status", "") == "FINISHED"

Success — `200 OK`


{
    "request_id": "293db4a4-8a5e-43ec-a530-b187a221f804",
    "status": "FINISHED",
    "results": [
                {
                    "file_id": "d7ff9ada-7f1d-4682-8885-b4d993de0701",
                    "filename": "securities_trading_schema.jsonl",
                    "content": "...",
                    "format": "application/jsonl",
                    "error": ""
                }
            ]
}

Download the schema


 
import tempfile
 
file_ids = [r["file_id"] for r in response_json["results"]]
data = {"file_ids": file_ids}
response = requests.post(
        f"{BASE_URL}/v1/download", 
        json=data, 
        headers=headers
    )
 
with tempfile.NamedTemporaryFile(mode='wb', delete=False, suffix='.jsonl') as temp_jsonl:
    jsonl_path = pathlib.Path(temp_jsonl.name)
    for chunk in response.iter_bytes(chunk_size=8192):
        temp_jsonl.write(chunk)
 
all_files = []
line_number = 0
with open(jsonl_path, "r", encoding="utf-8") as jsonl_file:
    for line in jsonl_file:
        line_number += 1
        out = {}
 
        line = line.strip()
        if not line:
            continue
 
        try:
            row = json.loads(line)
        except json.JSONDecodeError as err:
            # log but continue processing remaining records
            print(f"Warning: Malformed JSON at line {line_number}: {err}")
            out["error"] = err
            all_files.append(out)
            continue
 
 
        file_id = row["file_id"]
        filename = row["filename"]
        content = row["content"]
 
        out = {
                "file_id": file_id,
                "filename": filename,
            }
 
        all_files.append(out)

Resulting schema

Loading flow…

Schemas are returned as JSON Structure notation by default. Contact us to get another format like SQL DDL.


{
  "$schema": "https://json-structure.org/meta/core/v0/#",
  "$id": "https://xfrukt.com/schemas/917c26c8-b43d-460b-ad0c-29c25151e777",
  "name": "securities_trading",
  "definitions": {
    "types": {
      "entity_type_enum": {
        "name": "entity_type_enum",
        "type": "string",
        "enum": [
          "CORPORATION",
          "LLC",
          ...
        ]
      },
      "period_type_enum": {
        "name": "period_type_enum",
        "type": "string",
        "enum": [
          "FY",
          "Q1",
          ...
        ]
      }
    },
    "tables": {
      "entities": {
        "type": "object",
        "properties": {
          "entity_id": {
            "name": "entity_id",
            "type": "uuid",
            "description": "Primary key. May align with external identifiers such as CIK, LEI, or internally generated."
          },
          "legal_name": {
            "name": "legal_name",
            "type": "string",
            "maxLength": 512
          },
          "names": {
            "name": "names",
            "type": "array",
            "description": "All known aliases, trading names, former names.",
            "items": {
              "type": "string"
            }
          },
          "entity_type": {
            "name": "entity_type",
            "type": {
              "$ref": "#/definitions/types/entity_type_enum"
            }
          },
          "tickers": {
            "name": "tickers",
            "type": "array",
            "items": {
              "type": "string",
              "maxLength": 10
            }
          },
          "lei": {
            "name": "lei",
            "type": "string",
            "description": "ISO 17442 Legal Entity Identifier.",
            "minLength": 20,
            "maxLength": 20
          },
          "metadata": {
            "name": "metadata",
            "type": "map",
            "description": "Extensible key-value store for jurisdiction, SIC code, incorporation date, parent entity, and other domain-specific attributes.",
            "values": {
              "type": "any"
            }
          }
        },
        "required": [
          "entity_id"
        ],
        "identity": [
          "entity_id"
        ],
        "description": "Universal legal or organizational entity. Synonyms: issuer, company, counterparty, filer, fund, SPV."
      },
      "publications": {
        "type": "object",
        "properties": {
          "publication_id": {
            "name": "publication_id",
            "type": "uuid"
          },
          "entity_id": {
            "name": "entity_id",
            "type": "uuid",
            "description": "FK to entities. The filing or reporting entity."
          },
          "document_ids": {
            "name": "document_ids",
            "type": "array",
            "description": "External identifiers for constituent documents and attachments.",
            "items": {
              "type": "string"
            }
          },
          "title": {
            "name": "title",
            "type": "string",
            "description": "Human-readable title of the publication."
          },
          "publication_types": {
            "name": "publication_types",
            "type": "array",
            "description": "All applicable classification labels. Includes SEC form types (10-Q, 8-K, DEF 14A), press releases, etc. Values from dei:DocumentType or 'Other'.",
            "items": {
              "type": "string"
            }
          },
          "fiscal_year": {
            "name": "fiscal_year",
            "type": "int16",
            "description": "Fiscal year of the reporting period."
          }
        },
        "required": [
          "publication_id",
          "entity_id"
        ],
        "identity": [
          "publication_id"
        ],
        "relations": {
          "entity_id": {
            "name": "entity_id",
            "cardinality": "single",
            "targettype": {
              "$ref": "#/definitions/tables/entities"
            },
            "description": "The entity that authored or filed this publication."
          }
        }
      },
      "sections": {
        "type": "object",
        "properties": {
          "section_id": {
            "name": "section_id",
            "type": "uuid",
          },
          "publication_id": {
            "name": "publication_id",
            "type": "uuid",
          },
          "seq": {
            "name": "seq",
            "type": "int32",
            "description": "Ordinal position within the publication across all documents."
          },
          "text": {
            "name": "text",
            "type": "string",
            "description": "Raw or normalized textual content of the section."
          },
          "metadata": {
            "name": "metadata",
            "type": "map",
            "description": "Extensible attributes: page number, language, extraction confidence, etc.",
            "values": {
              "type": "any"
            }
          }
        },
        "required": [
          "section_id",
          "publication_id",
          "seq",
          "text",
          "metadata"
        ],
        "identity": [
          "section_id"
        ],
        "description": "Together all sections reconstitute the full document.",
        "relations": {
          "publication_id": {
            "name": "publication_id",
            "cardinality": "single",
            "targettype": {
              "$ref": "#/definitions/tables/publications"
            }
          }
        }
      },
      "securities": {
        "type": "object",
        "properties": {
          "security_id": {
            "name": "security_id",
            "type": "uuid"
          },
          "entity_id": {
            "name": "entity_id",
            "type": "uuid"
          },
          "isin": {
            "name": "isin",
            "type": "string",
            "minLength": 12,
            "maxLength": 12
          },
          "metadata": {
            "name": "metadata",
            "type": "map",
            "description": "Extensible attributes: currency, maturity date, coupon rate, strike price, underlying_security_id, contract size, settlement type, etc.",
            "values": {
              "type": "any"
            }
          }
        },
        "required": [
          "security_id",
          "entity_id",
          "isin"
          "metadata"
        ],
        "identity": [
          "security_id"
        ],
        "description": "Tradable financial instruments. Covers equities, fixed income, derivatives, funds, and structured products. One security per distinct instrument listing.",
        "relations": {
          "entity_id": {
            "name": "entity_id",
            "cardinality": "single",
            "targettype": {
              "$ref": "#/definitions/tables/entities"
            }
          }
        }
      },
      "price_history": {
        "type": "object",
        "properties": {
          "price_id": {
            "name": "price_id",
            "type": "uuid"
          },
          "security_id": {
            "name": "security_id",
            "type": "uuid"
          },
          "price_date": {
            "name": "price_date",
            "type": "date"
          },
          "open_price": {
            "name": "open_price",
            "type": "decimal"
            "precision": 18,
            "scale": 6
          },
          "high_price": {
            "name": "high_price",
            "type": "decimal"
            "precision": 18,
            "scale": 6
          },
          "low_price": {
            "name": "low_price",
            "type": "decimal"
            "precision": 18,
            "scale": 6
          },
          "close_price": {
            "name": "close_price",
            "type": "decimal",
            "precision": 18,
            "scale": 6
          },
          "adjusted_close": {
            "name": "adjusted_close",
            "type": "decimal",
            "precision": 18,
            "scale": 6
          },
          "volume": {
            "name": "volume",
            "type": "int64"
          },
          "vwap": {
            "name": "vwap",
            "type": "decimal",
            "precision": 18,
            "scale": 6
          },
          "trade_count": {
            "name": "trade_count",
            "type": "int32",
            "description": "Number of discrete trades in the session."
          },
          "source_name": {
            "name": "source_name",
            "type": "string",
            "description": "Data vendor or exchange feed identifier.",
            "maxLength": 50
          }
        },
        "required": [
          "price_id",
          "security_id",
          "price_date",
          "open_price",
          "high_price",
          "low_price",
          "close_price",
          "adjusted_close",
          "volume",
          "vwap",
          "trade_count",
          "source_name"
        ],
        "identity": [
          "price_id"
        ],
        "description": "End-of-day and intraday OHLCV price records per security per date. Source of truth for market price time series.",
        "relations": {
          "security_id": {
            "name": "security_id",
            "cardinality": "single",
            "targettype": {
              "$ref": "#/definitions/tables/securities"
            }
          }
        }
      }
    }
  }
}

Generating Schema for Securities Trading

What you’ll build

Implementation

step 1/5. Get API key

step 2/5. Prepare specification and data samples

Fill in the specification template

Prepare data samples

step 3/5. Upload data samples and specification

Response

Success — 200 OK

step 4/5. Initiate generation of a schema

step 5/5. Download the schema

Success — 200 OK

Resulting schema

Request Early Access

Success — `200 OK`

Success — `200 OK`