v0.5.2 stable

DataEngineX: One config file.
Complete stack, built in.

Production-ready Data + ML + AI engineering, unified. Works standalone or alongside Airflow, MLflow, and LangChain.

pip install dataenginex — or: uv add dataenginex

dex.yaml

data:
  source: s3://my-bucket/raw/
  format: parquet
  quality:
    null_threshold: 0.05

ml:
  backend: mlflow
  training:
    model: xgboost
    target: revenue

ai:
  provider: openai
  retrieval: hybrid
  agents:
    - name: analyst
      tools: [sql, search]

observability:
  metrics: prometheus
  tracing: otel

Not just orchestration. Not just tracking.
The complete stack.

1 config file

0 vendor lock-in

∞ swappable backends

Airflow for orchestration. MLflow for tracking. LangChain for agents. FastAPI wired together by hand. Prometheus bolted on. Each tool brings its own config format, auth system, failure mode, and on-call rotation. Stop building glue. Start shipping products.

Everything included

Six domains. One framework. No assembly required.

Data

Connectors, transforms, and quality checks from a single config. DuckDB and Spark backends built in.
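To make the quality-check idea concrete, here is a minimal sketch of what a rule like null_threshold: 0.05 could mean in practice. It assumes the threshold is a per-column maximum share of missing values; the helper name columns_over_null_threshold is hypothetical and is not part of the DataEngineX API.

```python
from typing import Any


def columns_over_null_threshold(
    rows: list[dict[str, Any]], null_threshold: float
) -> dict[str, float]:
    """Return {column: null_fraction} for every column above the threshold.

    Hypothetical helper: assumes the threshold is a per-column maximum
    fraction of missing (None) values.
    """
    if not rows:
        return {}
    columns = {key for row in rows for key in row}
    failures: dict[str, float] = {}
    for column in sorted(columns):
        # A key that is absent from a row counts as missing, same as None.
        nulls = sum(1 for row in rows if row.get(column) is None)
        fraction = nulls / len(rows)
        if fraction > null_threshold:
            failures[column] = fraction
    return failures


rows = [
    {"clicks": 10, "region": "eu"},
    {"clicks": None, "region": "us"},
    {"clicks": 7, "region": "us"},
    {"clicks": 3, "region": "eu"},
]
failures = columns_over_null_threshold(rows, null_threshold=0.05)
```

In this sample one of four clicks values is missing, so clicks is reported and region is not.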

ML Lifecycle

Experiment tracking, training, serving, and drift detection built in. MLflow, W&B, or the built-in backend — your call.
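The page does not say which drift metric is used. As one common option, the sketch below computes a Population Stability Index (PSI) between a reference sample and a live sample of a single feature. The function name, the bin count, and the choice of PSI itself are assumptions for illustration.

```python
import math


def population_stability_index(
    expected: list[float], actual: list[float], bins: int = 10
) -> float:
    """PSI between a reference sample and a live sample of one feature."""
    lo = min(expected)
    hi = max(expected)
    # Equal-width bins over the reference range; fall back to 1.0 if the
    # reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            index = min(max(index, 0), bins - 1)
            counts[index] += 1
        # Floor each share at a small value to avoid log(0).
        return [max(count / len(values), 1e-6) for count in counts]

    expected_shares = bucket_shares(expected)
    actual_shares = bucket_shares(actual)
    return sum(
        (a - e) * math.log(a / e)
        for a, e in zip(actual_shares, expected_shares)
    )


reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

Identical samples give a PSI of zero, and the value grows as the live distribution moves away from the reference. An often-cited rule of thumb treats values above roughly 0.25 as a significant shift, but the right cutoff depends on the feature.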

AI Agents

LLM providers, hybrid BM25+dense retrieval, and LangGraph agent runtime — swappable, not locked in.
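How the BM25 and dense results are combined is not described here. One widely used approach is reciprocal rank fusion (RRF), sketched below under that assumption. The function name, the constant k = 60, and the document IDs are illustrative and not taken from DataEngineX.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in; ties are
    broken alphabetically so the output is deterministic.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25) search and a dense vector search.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Documents that rank well in both lists (doc_b and doc_a here) rise to the top, while documents found by only one retriever still appear further down.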

DEX Studio

Self-hosted web UI — pipelines, warehouse, ML experiments, AI agents, and SQL console. FastAPI/Jinja2, port 7860.

Observability

structlog structured logging, Prometheus metrics, and OpenTelemetry tracing — wired up from config, not code.

Deploy

K3s, Helm, and Terraform via infradex. Go from dev to a production Kubernetes cluster without writing manifests by hand.

One file. Everything configured.

dex.yaml is the single source of truth for your entire platform. Sources, transforms, quality rules, model config, agent definitions, API settings, and observability — all in one place.

No more hunting across twelve repos to find why a pipeline broke. No more "it works in dev" surprises: dev and prod share the same config schema.

Validate config with dex validate dex.yaml
Swap backends without changing application code
Strict Pydantic validation — config errors caught before runtime
Same config format from laptop to production cluster
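The points above about strict validation can be illustrated with a small sketch. DataEngineX uses Pydantic for this; to stay dependency-free, the example below uses standard-library dataclasses as a stand-in. The class names, the ConfigError type, and the loader functions are hypothetical, and only the field names come from the dex.yaml excerpt.

```python
from dataclasses import dataclass, fields
from typing import Any


class ConfigError(ValueError):
    """Raised when a config section fails validation."""


@dataclass(frozen=True)
class QualityConfig:
    null_threshold: float = 0.0

    def __post_init__(self) -> None:
        if not 0.0 <= self.null_threshold <= 1.0:
            raise ConfigError("data.quality.null_threshold must be between 0 and 1")


@dataclass(frozen=True)
class DataConfig:
    source: str
    format: str
    quality: QualityConfig


def strict_load(cls: type, section: str, raw: dict[str, Any]) -> Any:
    """Build cls from raw, rejecting unknown keys and missing fields."""
    allowed = {f.name for f in fields(cls)}
    unknown = set(raw) - allowed
    if unknown:
        raise ConfigError(f"{section}: unknown keys {sorted(unknown)}")
    try:
        return cls(**raw)
    except TypeError as exc:  # e.g. a required field is missing
        raise ConfigError(f"{section}: {exc}") from exc


def load_data_config(raw: dict[str, Any]) -> DataConfig:
    """Validate the 'data' section of an already-parsed config."""
    raw = dict(raw)  # do not mutate the caller's dict
    quality = strict_load(QualityConfig, "data.quality", raw.pop("quality", {}))
    return strict_load(DataConfig, "data", {**raw, "quality": quality})


config = load_data_config(
    {
        "source": "s3://my-bucket/raw/",
        "format": "parquet",
        "quality": {"null_threshold": 0.05},
    }
)
```

A misspelled key such as formt, or a threshold of 1.5, raises ConfigError at load time instead of surfacing later in the middle of a pipeline run.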

Read the docs

dex.yaml — full example

# DataEngineX — full stack config

data:
  source: s3://my-bucket/raw/
  format: parquet
  backend: duckdb        # or spark
  quality:
    null_threshold: 0.05
    schema_enforcement: strict
    audit_table: quality.audit

ml:
  backend: mlflow
  tracking_uri: http://mlflow:5000
  training:
    model: xgboost
    target: revenue
    features: [clicks, sessions, region]
  serving:
    endpoint: /api/v1/predict
    drift_detection: true

ai:
  provider: openai
  model: gpt-4o-mini
  retrieval: hybrid   # BM25 + dense
  agents:
    - name: analyst
      tools: [sql, search, python]

observability:
  metrics: prometheus
  tracing: otel
  log_level: info

One core. Three components.

Each component is independently useful. Together they cover data, ML, AI, the web UI, and deployment.

v0.5.2

dataenginex

pip install dataenginex

Core framework — config system, backend registry, CLI, ML lifecycle, AI agents, DuckDB lakehouse. Pure Python library, no server bundled.
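A backend registry is what lets a config value such as backend: duckdb select an implementation without changes to application code. The sketch below shows the general pattern only; the names register_backend, resolve_backend, and the placeholder classes are assumptions, not the actual dataenginex interfaces.

```python
from typing import Callable, Protocol


class DataBackend(Protocol):
    """Minimal interface every backend in this sketch implements."""

    def describe(self) -> str: ...


BackendFactory = Callable[[], DataBackend]

# Maps the string used in the config file to a factory for the backend.
_BACKENDS: dict[str, BackendFactory] = {}


def register_backend(name: str) -> Callable[[BackendFactory], BackendFactory]:
    """Decorator that registers a backend class under a config name."""

    def decorator(factory: BackendFactory) -> BackendFactory:
        _BACKENDS[name] = factory
        return factory

    return decorator


@register_backend("duckdb")
class DuckDBBackend:
    def describe(self) -> str:
        return "duckdb backend (placeholder)"


@register_backend("spark")
class SparkBackend:
    def describe(self) -> str:
        return "spark backend (placeholder)"


def resolve_backend(name: str) -> DataBackend:
    """Look up and instantiate the backend named in the config."""
    try:
        factory = _BACKENDS[name]
    except KeyError:
        known = sorted(_BACKENDS)
        raise ValueError(f"unknown backend {name!r}; known: {known}") from None
    return factory()


# Application code only sees the config value, never a concrete class.
backend = resolve_backend("duckdb")
```

Switching the config value from duckdb to spark changes which class is instantiated, while the calling code stays the same.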

View on GitHub →

FastAPI

dex-studio

Port 7860

B2B web UI — FastAPI + Jinja2 + HTMX. Monitor pipelines, browse data, inspect ML experiments, chat with AI agents, query via SQL console.

View on GitHub →

K3s + Helm

infradex

Terraform + Helm

K3s cluster config, Helm charts for Authentik, Langfuse, Qdrant, Prometheus, Grafana, ArgoCD. From blank VPS to production Kubernetes — no manual YAML.

View on GitHub →

Ready to unify your stack?

Install the base package or pick the extras you need.

Base

pip install dataenginex
# or
uv add dataenginex

Extras

pip install "dataenginex[cloud]"  # S3 · GCS · BigQuery
pip install "dataenginex[qdrant]" # Qdrant vector store
pip install 'litellm>=1.83.3' --no-deps # 100+ LLM providers

Getting Started Guide · View dex-studio on GitHub
