Import public NYC taxi and for-hire vehicle (Uber, Lyft)
An AI-powered data science team of agents
Links to everything you'd ever want to learn about data engineering
Simple tools for data cleaning in R
An end-to-end Data Scientist
Analytics for developers: set up analytics in 30 seconds
Basic-to-intermediate Python data science guide
ExtractThinker is a Document Intelligence library for LLMs
CSV Lint plug-in for Notepad++ for syntax highlighting
The open source mesh processing system
Clean Jupyter notebooks of outputs, metadata, and empty cells
FDUPES is a program for identifying or deleting duplicate files
Data and tools for generating and inspecting OLMo pre-training data
Miller is like awk, sed, cut, join, and sort for name-indexed data
PandasAI is a Python library that integrates generative AI
Converts books written in Markdown to HTML, LaTeX/PDF and EPUB
Scan and remove junk files, caches, logs, and more
Java dataframe and visualization library
Automated Tool for Optimized Modelling
Scalable data preprocessing and curation toolkit for LLMs
Large Model Application Development Practice 1
A natural language interface for computers
Cleans HTML to avoid XSS attacks
Jupyter notebooks that walk you through the fundamentals of ML
Haskell code prettifier