The applied-ml repository is a curated collection of papers, technical articles, and case-study blog posts on how major companies apply machine learning (ML) and data-driven systems in production. Rather than focusing on theoretical ML research, it highlights industry-scale challenges: data collection, data quality, infrastructure, feature stores, model serving, monitoring, scalability, and how ML is embedded in product workflows. It acts as a living library for practitioners who want to learn from real-world successes and failures, with insight into how large organizations structure their data pipelines, how they manage the ML lifecycle at scale, and what architectural or operational tradeoffs they made. For someone designing or planning to build a production ML system, the repo provides patterns, precedents, and lessons learned from firms that operate at large scale.
Features
- Curated collection of papers, blog posts, and case studies about ML in production
- Coverage of multiple ML-system aspects: data ingestion, data quality, feature engineering, storage, serving, and monitoring
- Real-world, company-scale examples of what large firms have built and operate
- Open source under the MIT license, enabling easy adoption, contribution, and adaptation
- Broad scope covering different domains (recommendation, data engineering, forecasting, classification, etc.)
- Useful as a reference library for designing, auditing, or improving ML workflows in production