Top recent DuckDB news, how-tos and comparisons

Exploring UK Environment Agency Data with DuckDB and Rill
2025-02-28
This article details the author's experience working with Environment Agency flood and river level data using DuckDB for rapid ingest and prototyping and Rill for visualization. The author found that increasing the `maximum_object_size` parameter was necessary to pull in more complete data from the API.
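As a rough illustration of the setting mentioned above, the sketch below builds a DuckDB `read_json` query with a larger `maximum_object_size` (the limit is given in bytes). The helper names, the 128 MiB default, and any URL passed in are placeholders for illustration and are not taken from the article.

```python
def read_json_sql(url: str, max_object_mib: int = 128) -> str:
    """Build a DuckDB query that reads JSON with a raised per-object size limit."""
    max_bytes = max_object_mib * 1024 * 1024
    safe_url = url.replace("'", "''")  # escape single quotes for the SQL literal
    return (
        f"SELECT * FROM read_json('{safe_url}', "
        f"maximum_object_size = {max_bytes})"
    )


def load(url: str, max_object_mib: int = 128):
    """Run the query; needs the third-party `duckdb` package (pip install duckdb)."""
    import duckdb  # imported lazily so the SQL builder works without it

    return duckdb.sql(read_json_sql(url, max_object_mib)).fetchall()
```

Calling `load(...)` with the API endpoint of interest would return the parsed rows. If DuckDB still reports that an object exceeds the limit, the value can be raised further.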
A Beginner’s Guide to Geospatial with DuckDB Spatial and MotherDuck
2025-02-26
This guide introduces geospatial analysis with DuckDB Spatial and MotherDuck. It shows how the two can be used to process and visualize location data efficiently in applications ranging from store locators to delivery services.
Reading and Writing Google Sheets in DuckDB
2025-02-26
This article discusses the use of DuckDB and its Google Sheets extension for data analysis workflows. It highlights how to automate interactions with Google Sheets using persistent secrets and provides examples for integrating the GSheets extension into GitHub Actions pipelines. The piece also outlines future potential features and encourages community contributions.
Vector Search with DuckDB
2025-02-26
This documentation provides a detailed guide on how to perform vector search on the Hugging Face Hub using DuckDB as an in-memory database. The process involves creating embeddings for datasets and storing them back to the Hub. It covers both approaches: performing vector search without an index (slower but more precise) and with an index (faster but less precise). Additionally, it includes setup instructions for installing necessary dependencies, creating embeddings, and querying the dataset using DuckDB.
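To make the index-free approach concrete, here is a minimal pure-Python sketch of exact (brute-force) vector search: every stored embedding is compared against the query, which is why it is precise but slow on large datasets. This is an illustration only, not the code from the guide. The function names and the toy data layout are assumptions. In DuckDB itself, the comparison would typically be done with a built-in such as `array_cosine_similarity`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_search(query, rows, k=3):
    """Score every (text, embedding) row against the query and return the top k.

    No index is used: the cost grows linearly with the number of rows,
    but the result is the true nearest neighbours.
    """
    scored = [(cosine_similarity(query, emb), text) for text, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

An index-based search trades this full scan for an approximate lookup, which is where the speed gain and the loss of precision both come from.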
DuckDB and PostgreSQL Make a Great Pair for Analytical Processing
2025-02-25
EthicalAds uses DuckDB and PostgreSQL together to handle analytical processing efficiently. PostgreSQL handles transactional workloads, while DuckDB runs the expensive aggregation queries that need faster response times for reporting. Parquet files are stored in cloud storage and queried with DuckDB. Joining the aggregated data from DuckDB with the Postgres database allows for rich reports without overloading the production database. Challenges include performance issues during cross-database joins and a desire for direct integration of pg_parquet with Azure Managed PostgreSQL to optimize data processing.
DuckDB: Prefix aliases make SQL more readable
2025-02-25
DuckDB introduces a new feature in version 1.2.0 allowing SQL aliases to be defined before the expression they reference using a colon (:) syntax. This change aims to make aliases easier to read and find within complex queries.
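The difference is easiest to see side by side. The sketch below builds the same select list in the traditional `expr AS alias` form and in the prefix `alias: expr` form. The helper names and the example columns (`price`, `quantity`, `name`) are hypothetical. The prefix form is only expected to parse on DuckDB 1.2.0 or later.

```python
def suffix_select(columns):
    """Traditional form: the alias follows the expression (expr AS alias)."""
    return ",\n    ".join(f"{expr} AS {alias}" for alias, expr in columns)


def prefix_select(columns):
    """Prefix form: the alias comes first (alias: expr)."""
    return ",\n    ".join(f"{alias}: {expr}" for alias, expr in columns)


# Hypothetical columns, used only to show the two spellings.
COLUMNS = [
    ("total", "price * quantity"),
    ("label", "upper(name)"),
]

SOURCE = "FROM (VALUES ('widget', 2.5, 4)) AS items(name, price, quantity)"

SUFFIX_QUERY = f"SELECT\n    {suffix_select(COLUMNS)}\n{SOURCE}"
PREFIX_QUERY = f"SELECT\n    {prefix_select(COLUMNS)}\n{SOURCE}"


def run(con, query=PREFIX_QUERY):
    """Execute against a DuckDB connection, e.g. run(duckdb.connect())."""
    return con.execute(query).fetchall()
```

With the prefix form, the aliases line up at the start of each line, so a reader scanning a long select list can find a column name without reading to the end of each expression.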
Why We Moved from SQLite to DuckDB: 5x Faster Queries, ~80% Less Storage
2025-02-24
This case study outlines the transition from SQLite to DuckDB in Trace, a macOS time tracking application. The move was motivated by performance and storage efficiency improvements.
Faster health data analysis with MotherDuck and Preswald
2025-02-14
This guide shows how to use MotherDuck and Preswald to quickly turn large datasets into interactive data dashboards. It includes step-by-step instructions, code snippets, and practical examples for building a dashboard on cholesterol estimates. It also highlights the benefits of combining MotherDuck's speed and scalability with Preswald's ease of use for real-time exploration and sharing.
DuckDB is the best TPC data generator
2025-02-11
This blog post explains how to use DuckDB to generate TPC test data and export it as Parquet files for loading. It covers the installation of DuckDB, running DuckDB to generate the data, and exporting the generated data using DuckDB's native `EXPORT` SQL command.
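A minimal sketch of that workflow, assuming the TPC-H generator from DuckDB's `tpch` extension: it builds the statements to generate the data and export it as Parquet. The helper names, the scale factor, and the target directory are illustrative choices, not values from the blog post.

```python
def tpch_export_statements(scale_factor=1, target_dir="tpch_parquet"):
    """Return the DuckDB statements that generate TPC-H data and export it."""
    safe_dir = target_dir.replace("'", "''")  # escape quotes for the SQL literal
    return [
        "INSTALL tpch",                       # fetch the extension if missing
        "LOAD tpch",                          # make dbgen() available
        f"CALL dbgen(sf = {scale_factor})",   # generate the TPC-H tables
        f"EXPORT DATABASE '{safe_dir}' (FORMAT parquet)",  # one Parquet file per table
    ]


def generate(scale_factor=1, target_dir="tpch_parquet"):
    """Run the statements; needs the third-party `duckdb` package."""
    import duckdb  # imported lazily so the statement builder works without it

    con = duckdb.connect()  # in-memory database
    for statement in tpch_export_statements(scale_factor, target_dir):
        con.execute(statement)
```

For TPC-DS, the `tpcds` extension and its `dsdgen` function play the same role. Small scale factors are a sensible starting point, since the generated data grows with the scale factor.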
DuckDB vs ClickHouse performance comparison for structured data serialization and in-memory TPC-DS queries execution
2025-02-11
This blog post discusses the use of Azure Functions and related tools for data processing tasks. The author shares his experience with using Azure Functions over runbooks and Pandas for CSV to Parquet transformations.
Announcing DuckDB 1.2.0
2025-02-05
DuckDB 1.2.0 release includes various improvements such as CLI safe mode, friendly SQL features like prefix aliases and RENAME clause in SELECT, optimizations in the optimizer, a new C API for extensions, support for musl libraries, and many other enhancements. These updates improve DuckDB's functionality and usability while maintaining compatibility with different platforms.
Running SQL on files with the esProc is very convenient, on par with duckDB
2025-01-27
Judy discusses the convenience and power of esProc for handling complex data processing tasks. She highlights how SPL (Structured Process Language) can simplify queries on various file types such as CSV, JSON, and Excel. The discussion includes step-by-step examples of SQL-like commands and of SPL's own syntax. Judy also mentions that while esProc's SQL is limited to a subset of SQL92, it excels in order-related calculations. She notes that esProc can be integrated into applications as an embedded database using its JDBC driver.
A Duck Takes Flight
2025-01-25
This article discusses how Mike Ritchie at Definite implemented a solution using DuckDB and Arrow Flight for streaming data in near real-time analytics. The solution addresses the concurrency limitations of DuckDB by leveraging Arrow Flight to allow multiple writers and readers simultaneously.
How to improve DuckDB performance on Parquet?
2025-01-22
pg_mooncake outperforms DuckDB in handling Parquet files by leveraging Postgres for metadata management and optimizing query execution through detailed column statistics and caching mechanisms. While the system routes queries to DuckDB, it significantly reduces overhead associated with external catalog querying and I/O.
Access Databricks UnityCatalog from duckdb
2025-01-20
This article shows how to evaluate machine learning models quickly and efficiently. It explains the use of quality gates to automate the evaluation and offers practical examples.