v1.1.2 stable

DataEngineX: One config file.
Complete stack, built in.

Production-ready Data + ML + AI engineering, unified. Works standalone or alongside Airflow, MLflow, and LangChain.

pip install dataenginex
# or
uv add dataenginex

dex.yaml
data:
  source: s3://my-bucket/raw/
  format: parquet
  quality:
    null_threshold: 0.05

ml:
  backend: mlflow
  training:
    model: xgboost
    target: revenue

ai:
  provider: openai
  retrieval: hybrid
  agents:
    - name: analyst
      tools: [sql, search]

observability:
  metrics: prometheus
  tracing: otel

Not just orchestration. Not just tracking.
The complete stack.

1 config file
0 vendor lock-in
swappable backends

Airflow for orchestration. MLflow for tracking. LangChain for agents. FastAPI wired together by hand. Prometheus bolted on. Each tool brings its own config format, auth system, failure mode, and on-call rotation. Stop building glue. Start shipping products.

Everything included

Six domains. One framework. No assembly required.

Data

Connectors, transforms, and quality checks from a single config. DuckDB and Spark backends built in.
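
To make a rule like `null_threshold: 0.05` concrete, here is a rough sketch in plain Python. It is not DataEngineX code: `null_fractions` and `failing_columns` are hypothetical names, and reading the threshold as a maximum per-column fraction of missing values is an assumption.

```python
from typing import Any


def null_fractions(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Return the fraction of None/missing values for each column seen in rows."""
    if not rows:
        return {}
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    return {
        col: sum(1 for row in rows if row.get(col) is None) / total
        for col in columns
    }


def failing_columns(rows: list[dict[str, Any]], null_threshold: float) -> list[str]:
    """List columns whose null fraction exceeds the threshold (assumed semantics)."""
    return [
        col
        for col, fraction in null_fractions(rows).items()
        if fraction > null_threshold
    ]


# Illustrative sample data, not taken from any real dataset.
rows = [
    {"clicks": 10, "region": "eu"},
    {"clicks": None, "region": "us"},
    {"clicks": 7, "region": "us"},
    {"clicks": 3, "region": "eu"},
]
print(failing_columns(rows, null_threshold=0.05))
```

With this sample, one of four `clicks` values is missing, so `clicks` is the only column reported at a 0.05 threshold.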

ML Lifecycle

Experiment tracking, training, serving, and drift detection built in. MLflow, W&B, or the built-in backend — your call.
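
The page does not say which drift metric is used. As one common option, the sketch below computes a Population Stability Index (PSI) between a reference sample and a current sample of a single numeric feature. The function name `psi`, the bin count, and the epsilon floor are all assumptions made for illustration.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index of `actual` against the `expected` reference sample."""
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 if the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            # Values outside the reference range land in the first or last bin.
            counts[min(max(index, 0), bins - 1)] += 1
        # Floor empty bins at a small epsilon so the logarithm stays defined.
        return [max(count / len(values), 1e-6) for count in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(100)]
shifted = [value + 0.5 for value in reference]
print(psi(reference, reference), psi(reference, shifted))
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, though the right cutoff depends on the feature and the bin layout.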

AI Agents

LLM providers, hybrid BM25+dense retrieval, and LangGraph agent runtime — swappable, not locked in.
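
Hybrid retrieval needs some way to merge the BM25 ranking with the dense-vector ranking. The page does not name the fusion method, so the sketch below uses reciprocal rank fusion (RRF) as one widely used choice. The function name, the document IDs, and the constant `k = 60` are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one list, best first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); higher ranks contribute more.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["doc_a", "doc_b", "doc_c"]   # hypothetical keyword results
dense_ranking = ["doc_c", "doc_a", "doc_d"]  # hypothetical embedding results
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

RRF only looks at ranks, so it avoids having to normalize BM25 scores and cosine similarities onto a shared scale.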

DEX Studio

Self-hosted web UI — pipelines, warehouse, ML experiments, AI agents, and SQL console. FastAPI/Jinja2, port 7860.

Observability

structlog structured logging, Prometheus metrics, and OpenTelemetry tracing — wired up from config, not code.
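
"Wired up from config" can be pictured as a small function that turns the `observability` section into a configured logger. The sketch below uses only the standard library `logging` module as a stand-in for structlog; `JsonFormatter`, `configure_logging`, and the logger name are hypothetical, and metrics and tracing are left out.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname.lower(),
                "logger": record.name,
                "event": record.getMessage(),
            }
        )


def configure_logging(observability: dict, stream) -> logging.Logger:
    """Build a JSON logger from an `observability` config section (assumed shape)."""
    logger = logging.getLogger("dex_sketch")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    # `log_level: info` in the config maps to the stdlib level name "INFO".
    logger.setLevel(observability.get("log_level", "info").upper())
    logger.propagate = False
    return logger


buffer = io.StringIO()
log = configure_logging({"log_level": "info"}, buffer)
log.debug("not emitted at info level")
log.info("pipeline started")
records = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(records)
```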

Deploy

K3s, Helm, and Terraform via infradex. From dev to production Kubernetes cluster without writing manifests by hand.

One file. Everything configured.

dex.yaml is the single source of truth for your entire platform. Sources, transforms, quality rules, model config, agent definitions, API settings, and observability — all in one place.

No more hunting across twelve repos to find why a pipeline broke. No more "it works in dev" surprises: dev and prod share the same config schema.

  • Validate config with dex validate dex.yaml
  • Swap backends without changing application code
  • Strict Pydantic validation — config errors caught before runtime
  • Same config format from laptop to production cluster
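
The idea behind strict validation is that a typo or an out-of-range value fails when the config loads, not halfway through a pipeline run. The sketch below shows that behavior with standard library dataclasses as a stand-in for Pydantic. `QualityConfig`, `ConfigError`, and `load_section` are hypothetical names, and the 0 to 1 range for `null_threshold` is an assumption, not the actual DataEngineX schema.

```python
from dataclasses import dataclass, fields


class ConfigError(ValueError):
    """Raised when a config section has unknown keys or invalid values."""


@dataclass(frozen=True)
class QualityConfig:
    null_threshold: float = 0.0

    def __post_init__(self) -> None:
        value = self.null_threshold
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ConfigError("null_threshold must be a number")
        if not 0.0 <= value <= 1.0:
            raise ConfigError("null_threshold must be between 0 and 1")


def load_section(cls, raw: dict):
    """Build a config object, rejecting keys the schema does not define."""
    allowed = {f.name for f in fields(cls)}
    unknown = sorted(set(raw) - allowed)
    if unknown:
        raise ConfigError(f"unknown keys: {unknown}")
    return cls(**raw)


quality = load_section(QualityConfig, {"null_threshold": 0.05})
print(quality)

try:
    load_section(QualityConfig, {"null_treshold": 0.05})  # misspelled key
except ConfigError as error:
    print("rejected:", error)
```

Rejecting unknown keys is what catches the misspelled `null_treshold`; a lenient loader would silently fall back to the default.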
Read the docs
dex.yaml — full example
# DataEngineX — full stack config

data:
  source: s3://my-bucket/raw/
  format: parquet
  backend: duckdb        # or spark
  quality:
    null_threshold: 0.05
    schema_enforcement: strict
    audit_table: quality.audit

ml:
  backend: mlflow
  tracking_uri: http://mlflow:5000
  training:
    model: xgboost
    target: revenue
    features: [clicks, sessions, region]
  serving:
    endpoint: /api/v1/predict
    drift_detection: true

ai:
  provider: openai
  model: gpt-4o-mini
  retrieval: hybrid   # BM25 + dense
  agents:
    - name: analyst
      tools: [sql, search, python]

observability:
  metrics: prometheus
  tracing: otel
  log_level: info
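
The `backend: duckdb  # or spark` line in the example suggests a registry keyed by backend name. Here is a minimal sketch of that pattern, assuming nothing about DataEngineX internals: `BACKENDS`, `register`, and `run_pipeline` are hypothetical names, and the backend functions only return a description string instead of calling DuckDB or Spark.

```python
from typing import Callable

# Maps a backend name from the config to the function that implements it.
BACKENDS: dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that adds a backend function to the registry under `name`."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        BACKENDS[name] = fn
        return fn
    return decorator


@register("duckdb")
def run_duckdb(source: str) -> str:
    return f"duckdb scanning {source}"  # placeholder for real DuckDB work


@register("spark")
def run_spark(source: str) -> str:
    return f"spark reading {source}"  # placeholder for real Spark work


def run_pipeline(config: dict) -> str:
    """Application code: looks the backend up by name and never imports it directly."""
    data = config["data"]
    name = data.get("backend")
    if name not in BACKENDS:
        raise ValueError(f"unknown backend {name!r}; choose from {sorted(BACKENDS)}")
    return BACKENDS[name](data["source"])


config = {"data": {"source": "s3://my-bucket/raw/", "backend": "duckdb"}}
print(run_pipeline(config))
config["data"]["backend"] = "spark"  # only the config value changes
print(run_pipeline(config))
```

Because `run_pipeline` depends only on the registry, switching from DuckDB to Spark is a one-line config change in this sketch.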

One core. Four components.

Each component is independently useful. Together they cover data, ML, AI, and careers.

v1.1.2

dataenginex

pip install dataenginex

Core framework — config system, backend registry, CLI, ML lifecycle, AI agents, DuckDB lakehouse. Pure Python library, no server bundled.

View on GitHub →

FastAPI

dex-studio

Port 7860

B2B web UI — FastAPI + Jinja2 + HTMX. Monitor pipelines, browse data, inspect ML experiments, chat with AI agents, query via SQL console.

View on GitHub →

Python

careerdex

Port 7870

B2C career AI — job matching, resume analysis, ATS scanning, interview prep, and application tracking. Powered by the same dex core.

View on GitHub →

K3s + Helm

infradex

Terraform + Helm

K3s cluster config, Helm charts for Authentik, Langfuse, Qdrant, Prometheus, Grafana, ArgoCD. From blank VPS to production Kubernetes — no manual YAML.

View on GitHub →

Ready to unify your stack?

Install the base package or pick the extras you need.

Base
pip install dataenginex
# or
uv add dataenginex
Extras
pip install "dataenginex[cloud]"         # S3 · GCS · BigQuery
pip install "dataenginex[observability]" # Langfuse LLM tracing
pip install 'litellm>=1.83.3' --no-deps  # 100+ LLM providers