DataHub Python Builds

These prebuilt wheel files can be used to install our Python packages as of a specific commit.

Build context

Built at 2025-10-28T03:14:11.000047+00:00.

{
  "timestamp": "2025-10-28T03:14:11.000047+00:00",
  "branch": "feat-make-pyspark-optional",
  "commit": {
    "hash": "432e0ce48c69f118bdc46e77d6c68adc58c5fbdb",
    "message": "feat(ingestion): Make PySpark optional for S3, ABS, and Unity Catalog sources\n\nPySpark and PyDeequ have been required dependencies for S3, ABS, and Unity Catalog\nsources, even when profiling is disabled. This creates unnecessary installation\noverhead (~500MB) and potential dependency conflicts for users who don't need\nprofiling capabilities.\n\n**PySpark Detection Framework**\n- Added `pyspark_utils.py` with centralized availability detection\n- Graceful fallback when PySpark/PyDeequ unavailable\n- Clear error messages guiding users to install dependencies when needed\n\n**Modular Installation Options**\n- S3/ABS/GCS sources now work without PySpark when profiling is disabled\n- New `data-lake-profiling` extra for modular PySpark installation\n- Convenience extras: `s3-profiling`, `gcs-profiling`, `abs-profiling`\n- Unity Catalog gracefully falls back to sqlglot when PySpark unavailable\n\n**Config Validation**\n- Added validators to S3/ABS configs to check PySpark availability at config time\n- Validates profiling dependencies before attempting to use them\n- Provides actionable error messages with installation instructions\n\n**Installation Examples**\n```bash\npip install 'acryl-datahub[s3]'\n\npip install 'acryl-datahub[s3,data-lake-profiling]'\n\npip install 'acryl-datahub[s3-profiling]'\n```\n\n**Dependencies**\n- PySpark ~=3.5.6 (in `data-lake-profiling` extra)\n- PyDeequ >=1.1.0 (data quality validation)\n\n**Benefits**\n- Reduced footprint: Base installs ~500MB smaller without PySpark\n- Faster installs: No PySpark compilation for non-profiling users\n- Better errors: Clear messages when profiling needs PySpark\n- Flexibility: Users choose profiling support level\n- Backward compatible: Existing installations continue working\n\n**Testing**\n- Added 46+ unit tests validating optional PySpark functionality\n- Tests cover availability detection, config validation, and graceful fallbacks\n- All existing tests continue to pass\n\nSee docs/PYSPARK.md for detailed installation and usage guide."
  },
  "pr": {
    "number": 15123,
    "title": "feat(ingestion): Make PySpark optional for S3, ABS, and Unity Catalog sources",
    "url": "https://github.com/datahub-project/datahub/pull/15123"
  }
}
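The commit above describes a centralized availability-detection framework (`pyspark_utils.py`) that lets sources run without PySpark and fail with actionable errors only when profiling actually needs it. A minimal sketch of that pattern, assuming hypothetical helper names (`is_pyspark_available`, `require_pyspark` are illustrative, not the actual DataHub API):

```python
# Sketch of optional-dependency detection with graceful fallback.
# Names and messages are illustrative assumptions, not DataHub's real API.
import importlib.util


def is_pyspark_available() -> bool:
    """Check whether PySpark can be imported, without actually importing it."""
    return importlib.util.find_spec("pyspark") is not None


def require_pyspark(feature: str) -> None:
    """Raise a clear, actionable error when a feature needs PySpark."""
    if not is_pyspark_available():
        raise ImportError(
            f"{feature} requires PySpark, which is not installed. "
            "Install it with: pip install 'acryl-datahub[data-lake-profiling]'"
        )


# A config validator would call require_pyspark("S3 profiling") at config
# time, so the error surfaces before ingestion starts rather than mid-run.
```

Using `find_spec` instead of a bare `import pyspark` keeps the check cheap: it consults the import machinery without paying PySpark's import cost on every config validation.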

Usage

Current base URL: unknown

| Package | Size | Install command |
| --- | --- | --- |
| acryl-datahub | 2.414 MB | `uv pip install 'acryl-datahub @ <base-url>/artifacts/wheels/acryl_datahub-0.0.0.dev1-py3-none-any.whl'` |
| acryl-datahub-actions | 0.101 MB | `uv pip install 'acryl-datahub-actions @ <base-url>/artifacts/wheels/acryl_datahub_actions-0.0.0.dev1-py3-none-any.whl'` |
| acryl-datahub-airflow-plugin | 0.039 MB | `uv pip install 'acryl-datahub-airflow-plugin @ <base-url>/artifacts/wheels/acryl_datahub_airflow_plugin-0.0.0.dev1-py3-none-any.whl'` |
| acryl-datahub-dagster-plugin | 0.019 MB | `uv pip install 'acryl-datahub-dagster-plugin @ <base-url>/artifacts/wheels/acryl_datahub_dagster_plugin-0.0.0.dev1-py3-none-any.whl'` |
| acryl-datahub-gx-plugin | 0.010 MB | `uv pip install 'acryl-datahub-gx-plugin @ <base-url>/artifacts/wheels/acryl_datahub_gx_plugin-0.0.0.dev1-py3-none-any.whl'` |
| prefect-datahub | 0.011 MB | `uv pip install 'prefect-datahub @ <base-url>/artifacts/wheels/prefect_datahub-0.0.0.dev1-py3-none-any.whl'` |
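Direct-URL installs can still request extras from the PR above. A sketch, assuming the extra names from the commit message (`<base-url>` remains a placeholder to fill in with this build's actual URL):

```bash
# Install the S3 source plus the optional PySpark profiling extra
# directly from this build's wheel (not a runnable command as-is:
# <base-url> must be replaced with the real base URL).
uv pip install 'acryl-datahub[s3,data-lake-profiling] @ <base-url>/artifacts/wheels/acryl_datahub-0.0.0.dev1-py3-none-any.whl'
```

The `name[extras] @ url` form is standard PEP 508 direct-reference syntax, so extras resolve against the wheel's metadata exactly as they would for a PyPI install.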