Top recent DuckDB news, how-tos and comparisons
2025-02-28
This article details the author's experience working with Environment Agency flood and river level data using DuckDB for rapid ingest and prototyping and Rill for visualization. The author found that increasing the `maximum_object_size` parameter was necessary to pull in more complete data from the API.
2025-02-26
This article explores geospatial analysis with DuckDB and MotherDuck. It shows how the two can be used for efficient data processing and visualization in applications ranging from store locators to delivery services.
2025-02-26
This article discusses the use of DuckDB and its Google Sheets extension for data analysis workflows. It highlights how to automate interactions with Google Sheets using persistent secrets and provides examples for integrating the GSheets extension into GitHub Actions pipelines. The piece also outlines future potential features and encourages community contributions.
2025-02-26
This documentation provides a detailed guide on how to perform vector search on the Hugging Face Hub using DuckDB as an in-memory database. The process involves creating embeddings for datasets and storing them back to the Hub. It covers both approaches: performing vector search without an index (slower but more precise) and with an index (faster but less precise). Additionally, it includes setup instructions for installing necessary dependencies, creating embeddings, and querying the dataset using DuckDB.
2025-02-25
EthicalAds uses DuckDB and PostgreSQL together for analytical processing. PostgreSQL handles transactional workloads, while DuckDB runs the expensive aggregation queries that need faster response times for reporting. Parquet files stored in cloud storage are queried with DuckDB, and joins between the aggregated DuckDB data and the Postgres database allow for rich reports without overloading the production database. Challenges include performance issues during cross-database joins and a desire for direct integration of pg_parquet with Azure Managed PostgreSQL to optimize data processing.
2025-02-25
DuckDB introduces a new feature in version 1.2.0 allowing SQL aliases to be defined before the expression they reference using a colon (:) syntax. This change aims to make aliases easier to read and find within complex queries.
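A minimal sketch of the two alias styles, assuming the `duckdb` Python package (version 1.2.0 or later) is installed. The query text, the column names `answer` and `name`, and the helper `run` are illustrative and do not come from the linked post.

```python
# Suffix form: the alias follows the expression (standard SQL).
SUFFIX_QUERY = "SELECT 6 * 7 AS answer, upper('duck') AS name"

# Prefix form (DuckDB 1.2.0 and later): alias, colon, then the expression.
PREFIX_QUERY = "SELECT answer: 6 * 7, name: upper('duck')"


def run(query):
    """Run a query on an in-memory DuckDB database; return (column names, rows)."""
    # Third-party dependency, imported lazily: pip install duckdb
    import duckdb

    con = duckdb.connect()  # hypothetical helper; opens a fresh in-memory database
    try:
        cursor = con.execute(query)
        columns = [col[0] for col in cursor.description]
        return columns, cursor.fetchall()
    finally:
        con.close()
```

With a suitable DuckDB version, `run(SUFFIX_QUERY)` and `run(PREFIX_QUERY)` should yield the same column names and rows; only the position of the alias in the query text differs.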
2025-02-24
This case study outlines the transition from SQLite to DuckDB in Trace, a macOS time tracking application. The move was motivated by performance and storage efficiency improvements.
2025-02-14
This guide shows how to use MotherDuck and Preswald to quickly turn large datasets into interactive data dashboards. It includes step-by-step instructions, code snippets, and practical examples for building a dashboard on cholesterol estimates. The guide also highlights the benefits of combining MotherDuck's speed and scalability with Preswald's ease of use for real-time exploration and sharing.
2025-02-11
This blog post explains how to use DuckDB to generate TPC test data and export it as Parquet files for loading. It covers the installation of DuckDB, running DuckDB to generate the data, and exporting the generated data using DuckDB's native `EXPORT` SQL command.
2025-02-11
This blog post discusses the use of Azure Functions and related tools for data processing tasks. The author shares his experience with using Azure Functions over runbooks and Pandas for CSV to Parquet transformations.
2025-02-05
The DuckDB 1.2.0 release includes a CLI safe mode, friendly SQL features such as prefix aliases and a RENAME clause in SELECT, optimizer improvements, a new C API for extensions, support for musl libc, and many other enhancements. These updates improve DuckDB's functionality and usability while maintaining compatibility across platforms.
2025-01-27
Judy discusses the convenience and power of esProc for handling complex data processing tasks. She highlights how SPL (Structured Process Language) can simplify queries on various file types such as CSV, JSON, Excel, and more. The discussion includes examples of step-by-step SQL-like commands and the use of SPL's unique syntax. Judy also mentions that while esProc's SQL is limited to a subset of SQL92, it excels in order-related calculations. She notes that esProc can be integrated into applications as an embedded database using its JDBC driver.
2025-01-25
This article discusses how Mike Ritchie at Definite used DuckDB and Arrow Flight to stream data for near real-time analytics. The solution addresses DuckDB's concurrency limitations by using Arrow Flight to allow multiple writers and readers simultaneously.
2025-01-22
pg_mooncake outperforms DuckDB in handling Parquet files by leveraging Postgres for metadata management and optimizing query execution through detailed column statistics and caching mechanisms. While the system routes queries to DuckDB, it significantly reduces overhead associated with external catalog querying and I/O.
2025-01-20
This article shows how to evaluate machine learning models quickly and efficiently. It explains how to use quality gates to automate the evaluation and offers practical examples.