Generating Schema for Securities Trading
TD;DR Jump to the resulting schema
What you’ll build
This tutorial walks you through generating a database schema for a financial analytics application. For example, a RAG system that answers questions from investment analysts, financial advisers, or securities traders. It generates signals by correlating securities price movements with facts from SEC filings:
- 10-K — Annual reports
- 10-Q — Quarterly reports
- 8-K — Current reports
Implementation
step 1/5. Get API key
Request early access to receive your API key.
step 2/5. Prepare specification and data samples
Fill in the specification template
Access patterns and downstream usage are the biggest drivers of architectural decisions. Use the specification template to outline your business goals and functional requirements. Xfrukt uses these inputs to tailor the architecture and schema to your use case.
# Industry
financial markets
# Users
securities traders
our CFO who gambles on stocks of competitors using company's budgets
# Key Responsibilities and Tasks of the Users
market analysis
trading strategy
keeping us profitable
# Functionality Of The Future Application
It a combination of a data analytics console with text form to ask all sorts of questions.
The sofware should assist in answering questions like these (but not limited too):
Which companies have 10-Q revenue acceleration + rising gross margins + low relative valuation vs peers, but stock hasn't moved yet?
Which companies reported lower-than-expected bad debt expense / allowances in 10-Q despite rising DSO?
Which companies have CFOs who sold >50% of their holdings within 30 days of filing 10-Q, where the subsequent 10-Q revealed deteriorating working capital ratios?
# Data sources
SEC reports
End-of-day trading pricing
Companies announcements
# Other Requirements and Notes
none
Note: You don’t need to be 100% precise with terminology or work out the architecture upfront. That’s exactly what Xfrukt provides:
- standardization of business terminology,
- best practices for vertical and operational tasks,
- battle-tested data architectures suited to the type of system you’re building.
You just define initial guidelines on the overall direction.
Save the completed specification to a file and note its path; you’ll reference it in the next step.
files_metas = [
{
"filename": "spec_securities_trader.md",
"file_path": "/tmp/spec_securities_trader.md",
"purpose": "spec",
"content_type": "text/markdown"
}
]Prepare data samples
Include representative samples of each filing type: 10-K, 10-Q, and 8-K.
https://www.sec.gov/submit-filings/forms-indexÂ
https://edgartools.readthedocs.io/en/latest/guides/working-with-filing/Â
import requests
import edgar
dirname = "/tmp"
TICKER = "NVDA"
# Downloading reports
edgar.set_identity("Moly moly@email.com")
edgar.use_local_storage(
path_or_enable=dirname,
use_local=True
)
company = edgar.Company(cik_or_ticker=TICKER)
forms = ["10-K", "10-Q", "8-K"]
for form in forms:
filing_meta = company.latest(form)
primary = filing_meta.document.text()
if form == "8-K":
filing_date = str(filing_meta.report_date).replace("-", "")
else:
filing_date = filing_meta.period_of_report
filename = f"{form}_{TICKER}_{filing_date}.txt"
filename = filename.lower().replace("-", "_")
filepath = os.path.join(dirname, filename)
with open(os.path.join(dirname, filename), "w", encoding="utf-8") as txt_file:
txt_file.write(primary)
files_metas.append(
{
"filename": filename,
"filepath": filepath,
"purpose": "data", # how to treat this file
"content_type": "text/plain"
}
)
# Downloading trading data
try:
API_KEY = os.environ["API_KEY_ALPHAVANTAGE"]
TICKER = "NVDA"
except:
API_KEY = "demo"
TICKER = "IBM"
# outputsize=compact --> the latest 100 data points
url = f"https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol={TICKER}&outputsize=compact&apikey={API_KEY}"
r = requests.get(url)
data = r.json()
filename = f"ohlcv_{TICKER}_latest.json".lower()
filepath = os.path.join(dirname, filename)
with open(filepath, 'w') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
files_metas.append(
{
"filename": filename,
"file_path": str(filepath),
"purpose": "data",
"content_type": "application/json"
}
)Once prepared, you should have the following metadata for each file
[
{
"filename": "spec_securities_trader.md",
"file_path": "/tmp/spec_securities_trader.md",
"purpose": "spec",
"content_type": "text/markdown"
},
{
"filename": "ohlcv_nvda_latest.json",
"file_path": "/tmp/ohlcv_nvda_latest.json",
"purpose": "data",
"content_type": "application/json"
},
{
"filename": "10_k_nvda_2025_01_26.txt",
"file_path": "/tmp/10_k_nvda_2025_01_26.txt",
"purpose": "data",
"content_type": "text/plain"
},
{
"filename": "10_q_nvda_2025_10_26.txt",
"file_path": "/tmp/10_q_nvda_2025_10_26.txt",
"purpose": "data",
"content_type": "text/plain"
},
{
"filename": "8_k_nvda_2026_01_20.txt",
"file_path": "/tmp/8_k_nvda_2026_01_20.txt",
"purpose": "data",
"content_type": "text/plain"
},
]where
content_type |
|---|
text/plain |
text/markdown |
application/json |
purpose | Description |
|---|---|
spec | Your description of the future application and business context |
data | one sample of a data |
step 3/5. Upload data samples and specification
Upload each file individually.
import requests
XFRUKT_API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.xfrukt.com"
headers = {
"X-Api-Key": XFRUKT_API_KEY,
}
for ix, file in enumerate(files_metas):
with open(file["file_path"], "rb") as f:
file = {
"file": (file["filename"], f, "text/markdown"),
}
response = requests.post(
f"{BASE_URL}/v1/files",
headers=headers,
files=file,
)
response_json = response.json()
file_id = response_json["file_id"]
# 🔑 will be used in the next step
files_metas[ix]["file_id"] = response_json["file_id"]Save every file_id and filename returned by the API. You’ll pass the file_id values into the next request to create the schema.
Response
Each successful upload returns a JSON body like
Success — 200 OK
{
"file_id": "a68d2bf16a44efc0efbc35c5564d5230a4d0480e900d495e787688c8ef353dfa",
"object": "file",
"filename": "8_k_nvda_2026_01_20.txt",
"status": "uploaded",
"content_type": "text/plain",
"content_size": 9479,
"created_at": 1776836274
}step 4/5. Initiate generation of a schema
Reference all file_id from step 3.
response = requests.post(
f"{BASE_URL}/v1/xfrukt_schema",
json=files_metas,
headers=headers
)
response_json = response.json()
assert response_json["status"] == "RUNNING"
# 🔑 will be used in the next step
request_id = response_json["request_id"]step 5/5. Download the schema
Poll the job status. Once it reports FINISHED, download the result.
response = requests.get(
f"{BASE_URL}/v1/xfrukt_schema?request_id={request_id}",
headers=headers
)
response_json = response_get.json()
assert response_json.get("status", "") == "FINISHED"Success — 200 OK
{
"request_id": "293db4a4-8a5e-43ec-a530-b187a221f804",
"status": "FINISHED",
"results": [
{
"file_id": "d7ff9ada-7f1d-4682-8885-b4d993de0701",
"filename": "securities_trading_schema.jsonl",
"content": "...",
"format": "application/jsonl",
"error": ""
}
]
}Download the schema
import tempfile
file_ids = [r["file_id"] for r in response_json["results"]]
data = {"file_ids": file_ids}
response = requests.post(
f"{BASE_URL}/v1/download",
json=data,
headers=headers
)
with tempfile.NamedTemporaryFile(mode='wb', delete=False, suffix='.jsonl') as temp_jsonl:
jsonl_path = pathlib.Path(temp_jsonl.name)
for chunk in response.iter_bytes(chunk_size=8192):
temp_jsonl.write(chunk)
all_files = []
line_number = 0
with open(jsonl_path, "r", encoding="utf-8") as jsonl_file:
for line in jsonl_file:
line_number += 1
out = {}
line = line.strip()
if not line:
continue
try:
row = json.loads(line)
except json.JSONDecodeError as err:
# log but continue processing remaining records
print(f"Warning: Malformed JSON at line {line_number}: {err}")
out["error"] = err
all_files.append(out)
continue
file_id = row["file_id"]
filename = row["filename"]
content = row["content"]
out = {
"file_id": file_id,
"filename": filename,
}
all_files.append(out)Resulting schema
Loading flow…
Schemas are returned as JSON Structure notation by default. Contact us to get another format like SQL DDL.
{
"$schema": "https://json-structure.org/meta/core/v0/#",
"$id": "https://xfrukt.com/schemas/917c26c8-b43d-460b-ad0c-29c25151e777",
"name": "securities_trading",
"definitions": {
"types": {
"entity_type_enum": {
"name": "entity_type_enum",
"type": "string",
"enum": [
"CORPORATION",
"LLC",
...
]
},
"period_type_enum": {
"name": "period_type_enum",
"type": "string",
"enum": [
"FY",
"Q1",
...
]
}
},
"tables": {
"entities": {
"type": "object",
"properties": {
"entity_id": {
"name": "entity_id",
"type": "uuid",
"description": "Primary key. May align with external identifiers such as CIK, LEI, or internally generated."
},
"legal_name": {
"name": "legal_name",
"type": "string",
"maxLength": 512
},
"names": {
"name": "names",
"type": "array",
"description": "All known aliases, trading names, former names.",
"items": {
"type": "string"
}
},
"entity_type": {
"name": "entity_type",
"type": {
"$ref": "#/definitions/types/entity_type_enum"
}
},
"tickers": {
"name": "tickers",
"type": "array",
"items": {
"type": "string",
"maxLength": 10
}
},
"lei": {
"name": "lei",
"type": "string",
"description": "ISO 17442 Legal Entity Identifier.",
"minLength": 20,
"maxLength": 20
},
"metadata": {
"name": "metadata",
"type": "map",
"description": "Extensible key-value store for jurisdiction, SIC code, incorporation date, parent entity, and other domain-specific attributes.",
"values": {
"type": "any"
}
}
},
"required": [
"entity_id"
],
"identity": [
"entity_id"
],
"description": "Universal legal or organizational entity. Synonyms: issuer, company, counterparty, filer, fund, SPV."
},
"publications": {
"type": "object",
"properties": {
"publication_id": {
"name": "publication_id",
"type": "uuid"
},
"entity_id": {
"name": "entity_id",
"type": "uuid",
"description": "FK to entities. The filing or reporting entity."
},
"document_ids": {
"name": "document_ids",
"type": "array",
"description": "External identifiers for constituent documents and attachments.",
"items": {
"type": "string"
}
},
"title": {
"name": "title",
"type": "string",
"description": "Human-readable title of the publication."
},
"publication_types": {
"name": "publication_types",
"type": "array",
"description": "All applicable classification labels. Includes SEC form types (10-Q, 8-K, DEF 14A), press releases, etc. Values from dei:DocumentType or 'Other'.",
"items": {
"type": "string"
}
},
"fiscal_year": {
"name": "fiscal_year",
"type": "int16",
"description": "Fiscal year of the reporting period."
}
},
"required": [
"publication_id",
"entity_id"
],
"identity": [
"publication_id"
],
"relations": {
"entity_id": {
"name": "entity_id",
"cardinality": "single",
"targettype": {
"$ref": "#/definitions/tables/entities"
},
"description": "The entity that authored or filed this publication."
}
}
},
"sections": {
"type": "object",
"properties": {
"section_id": {
"name": "section_id",
"type": "uuid",
},
"publication_id": {
"name": "publication_id",
"type": "uuid",
},
"seq": {
"name": "seq",
"type": "int32",
"description": "Ordinal position within the publication across all documents."
},
"text": {
"name": "text",
"type": "string",
"description": "Raw or normalized textual content of the section."
},
"metadata": {
"name": "metadata",
"type": "map",
"description": "Extensible attributes: page number, language, extraction confidence, etc.",
"values": {
"type": "any"
}
}
},
"required": [
"section_id",
"publication_id",
"seq",
"text",
"metadata"
],
"identity": [
"section_id"
],
"description": "Together all sections reconstitute the full document.",
"relations": {
"publication_id": {
"name": "publication_id",
"cardinality": "single",
"targettype": {
"$ref": "#/definitions/tables/publications"
}
}
}
},
"securities": {
"type": "object",
"properties": {
"security_id": {
"name": "security_id",
"type": "uuid"
},
"entity_id": {
"name": "entity_id",
"type": "uuid"
},
"isin": {
"name": "isin",
"type": "string",
"minLength": 12,
"maxLength": 12
},
"metadata": {
"name": "metadata",
"type": "map",
"description": "Extensible attributes: currency, maturity date, coupon rate, strike price, underlying_security_id, contract size, settlement type, etc.",
"values": {
"type": "any"
}
}
},
"required": [
"security_id",
"entity_id",
"isin"
"metadata"
],
"identity": [
"security_id"
],
"description": "Tradable financial instruments. Covers equities, fixed income, derivatives, funds, and structured products. One security per distinct instrument listing.",
"relations": {
"entity_id": {
"name": "entity_id",
"cardinality": "single",
"targettype": {
"$ref": "#/definitions/tables/entities"
}
}
}
},
"price_history": {
"type": "object",
"properties": {
"price_id": {
"name": "price_id",
"type": "uuid"
},
"security_id": {
"name": "security_id",
"type": "uuid"
},
"price_date": {
"name": "price_date",
"type": "date"
},
"open_price": {
"name": "open_price",
"type": "decimal"
"precision": 18,
"scale": 6
},
"high_price": {
"name": "high_price",
"type": "decimal"
"precision": 18,
"scale": 6
},
"low_price": {
"name": "low_price",
"type": "decimal"
"precision": 18,
"scale": 6
},
"close_price": {
"name": "close_price",
"type": "decimal",
"precision": 18,
"scale": 6
},
"adjusted_close": {
"name": "adjusted_close",
"type": "decimal",
"precision": 18,
"scale": 6
},
"volume": {
"name": "volume",
"type": "int64"
},
"vwap": {
"name": "vwap",
"type": "decimal",
"precision": 18,
"scale": 6
},
"trade_count": {
"name": "trade_count",
"type": "int32",
"description": "Number of discrete trades in the session."
},
"source_name": {
"name": "source_name",
"type": "string",
"description": "Data vendor or exchange feed identifier.",
"maxLength": 50
}
},
"required": [
"price_id",
"security_id",
"price_date",
"open_price",
"high_price",
"low_price",
"close_price",
"adjusted_close",
"volume",
"vwap",
"trade_count",
"source_name"
],
"identity": [
"price_id"
],
"description": "End-of-day and intraday OHLCV price records per security per date. Source of truth for market price time series.",
"relations": {
"security_id": {
"name": "security_id",
"cardinality": "single",
"targettype": {
"$ref": "#/definitions/tables/securities"
}
}
}
}
}
}
}