Data science projects are notorious for losing weeks to repetitive tasks: loading messy CSV files, cleaning data, generating exploratory plots, and iterating through model experiments. This repository flips the script by deploying specialized AI agents that handle each workflow step autonomously, from initial data inspection to MLflow model tracking.
The flagship AI Pipeline Studio creates a visual workspace where you orchestrate both manual and AI-driven steps, with full lineage tracking and reproducible scripts. Each agent is purpose-built: the Data Wrangling Agent handles feature engineering, the EDA Agent generates insightful visualizations, and the Modeling Agent experiments with H2O AutoML. The pipeline approach means you can jump between datasets, merge workflows, and save projects with metadata-only footprints for easy sharing.
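To make the lineage idea concrete, here is a minimal, standard-library-only sketch of a pipeline that mixes manual and AI-tagged steps and records metadata about each one. It is not the repository's API: `Step`, `Pipeline`, `fingerprint`, and the `kind` label are hypothetical names invented for illustration, and the "AI" step is just an ordinary function standing in for an agent call.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable

Rows = list[dict]  # a tiny stand-in for a DataFrame


def fingerprint(rows: Rows) -> str:
    """Short, deterministic hash of the data, so lineage can reference it without storing it."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class Step:
    name: str
    func: Callable[[Rows], Rows]
    kind: str = "manual"  # hypothetical label: "manual" or "ai"


@dataclass
class Pipeline:
    steps: list[Step] = field(default_factory=list)
    lineage: list[dict] = field(default_factory=list)

    def add(self, name: str, func: Callable[[Rows], Rows], kind: str = "manual") -> "Pipeline":
        self.steps.append(Step(name, func, kind))
        return self

    def run(self, rows: Rows) -> Rows:
        """Apply each step in order and record one lineage entry per step."""
        self.lineage = []
        for step in self.steps:
            out = step.func(rows)
            self.lineage.append({
                "step": step.name,
                "kind": step.kind,
                "rows_in": len(rows),
                "rows_out": len(out),
                "output_hash": fingerprint(out),
            })
            rows = out
        return rows

    def to_metadata(self) -> str:
        """Serialize only the lineage records, not the data itself."""
        return json.dumps(self.lineage, indent=2)


raw = [{"price": "10"}, {"price": None}, {"price": "30"}]

pipeline = (
    Pipeline()
    .add("drop_missing", lambda rows: [r for r in rows if r["price"] is not None])
    # Stand-in for an agent-generated transformation.
    .add("cast_price", lambda rows: [{**r, "price": float(r["price"])} for r in rows], kind="ai")
)

clean = pipeline.run(raw)
print(pipeline.to_metadata())
```

Because `to_metadata()` serializes step names, row counts, and hashes rather than the rows themselves, the saved record stays small, which is the general idea behind a metadata-only project footprint.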
At 4.6k stars and actively maintained by Business Science, the project hits the sweet spot for data teams that want to 10x their productivity without losing control. The Python library works standalone, but the Streamlit app showcases the full agent orchestration in action. If you’ve ever wished for a competent data science intern who never sleeps, this is remarkably close.
⭐ Stars: 4634
💻 Language: Python
🔗 Repository: business-science/ai-data-science-team