Latest Articles

Migrating to Apache Iceberg: Strategies for Every Source System

4/29/2026

<!-- Meta Description: Migrate to Iceberg from Hive, data warehouses, or raw files using in-place migration, full rewrite, or the zero-downtime view s...

Hands-On with Apache Iceberg Using Dremio Cloud

4/29/2026

<!-- Meta Description: A practical walkthrough of creating, querying, and optimizing Iceberg tables on Dremio Cloud, from account setup to AI-powered ...

Approaches to Streaming Data into Apache Iceberg Tables

4/29/2026

<!-- Meta Description: Stream data into Iceberg with Spark Structured Streaming, Flink, or Kafka Connect. Here is how each works and the trade-offs be...

Using Apache Iceberg with Python and MPP Query Engines

4/29/2026

<!-- Meta Description: Access Iceberg tables from Python with PyIceberg, DuckDB, and Polars, or through MPP engines like Dremio, Spark, and Trino. Her...

Apache Iceberg Metadata Tables: Querying the Internals

4/29/2026

<!-- Meta Description: Iceberg metadata tables let you query snapshots, files, manifests, and partitions using SQL. Here is every metadata table and h...

Maintaining Apache Iceberg Tables: Compaction, Expiry, and Cleanup

4/29/2026

<!-- Meta Description: Keep Iceberg tables fast with compaction, snapshot expiry, orphan cleanup, and manifest rewriting. Here is when and how to run ...

Concurrency, Isolation, and MVCC: How Engines Handle Contention

4/29/2026

<!-- Meta Description: Databases handle concurrent access using locks, MVCC, or optimistic concurrency control. Here is how each approach works and wh...

How Data Lake Table Storage Degrades Over Time

4/29/2026

<!-- Meta Description: Iceberg tables degrade through small files, orphan files, metadata bloat, sort order decay, and partition skew. Here is how to ...

Hash, Sort-Merge, Broadcast: How Distributed Joins Work

4/29/2026

<!-- Meta Description: Distributed joins move data across the network using shuffle, broadcast, or co-location strategies. Here is how each works and ...

When Catalogs Are Embedded in Storage

4/29/2026

<!-- Meta Description: S3 Tables and MinIO AI Stor embed the Iceberg catalog directly in the storage layer. Here is when embedded catalogs make sense ...

Partitioning, Sharding, and Data Distribution Strategies

4/29/2026

<!-- Meta Description: Hash partitioning distributes data evenly. Range partitioning enables fast range scans. Both create tradeoffs. Here is how data...

What Are Lakehouse Catalogs? The Role of Catalogs in Apache Iceberg

4/29/2026

<!-- Meta Description: Lakehouse catalogs store metadata pointers, manage namespaces, and enforce access control. Here is the complete catalog landsca...

Buffer Pools, Caches, and the Memory Hierarchy

4/29/2026

<!-- Meta Description: Databases use buffer pools, column caches, and result caches to keep hot data in RAM. Here is how each caching strategy works a...

Writing to an Apache Iceberg Table: How Commits and ACID Actually Work

4/29/2026

<!-- Meta Description: Here is exactly how an engine writes to an Iceberg table, step by step, from data files through the atomic commit that makes AC...

Volcano, Vectorized, Compiled: How Engines Execute Your Query

4/29/2026

<!-- Meta Description: The Volcano model processes one row at a time. Vectorized execution processes batches with SIMD. Code generation fuses operator...

Hidden Partitioning: How Iceberg Eliminates Accidental Full Table Scans

4/29/2026

<!-- Meta Description: Iceberg's hidden partitioning separates physical layout from user queries using transform functions. Here is how it works and w...

Inside the Query Optimizer: How Engines Pick a Plan

4/29/2026

<!-- Meta Description: Query optimizers transform SQL into execution plans using rule-based rewrites, cost-based search, and adaptive runtime adjustme...

Partition Evolution: Change Your Partitioning Without Rewriting Data

4/29/2026

<!-- Meta Description: Iceberg lets you change partition schemes without rewriting data. Here is how partition evolution works internally and why Hive...

B-Trees, LSM Trees, and the Indexing Tradeoff Spectrum

4/29/2026

<!-- Meta Description: B-trees balance reads and writes for OLTP. LSM trees maximize write throughput. Bitmap indexes accelerate OLAP filtering. Here ...

Performance and Apache Iceberg's Metadata

4/29/2026

<!-- Meta Description: Iceberg's three-layer metadata tree eliminates directory listing and enables multi-level data skipping. Here is how scan planni...

How Databases Organize Data on Disk: Pages, Blocks, and File Formats

4/29/2026

<!-- Meta Description: Databases structure data on disk as heap files, sorted files, or LSM trees, then wrap it in formats like Parquet with metadata ...

The Metadata Structure of Modern Table Formats

4/29/2026

<!-- Meta Description: Iceberg uses a metadata tree, Delta Lake uses a transaction log, Hudi uses a timeline. Here is exactly how each format organize...

Row vs. Column: How Storage Layout Shapes Everything

4/29/2026

<!-- Meta Description: Row stores keep records together for fast transactions. Column stores keep field values together for fast analytics. Here is ho...

What Are Table Formats and Why Were They Needed?

4/29/2026

<!-- Meta Description: Table formats like Apache Iceberg solved the ACID, schema, and performance problems that turned data lakes into data swamps. He...

How Query Engines Think: The Tradeoffs Behind Every Data System

4/29/2026

<!-- Meta Description: Every database is a collection of engineering tradeoffs. Learn the 9 design decisions that shape how query engines store, index...

Agentic Analytics on the Apache Lakehouse

4/13/2026

*Read the complete Open Source and the Lakehouse series:* * [Part 1: Apache Software Foundation: History, Purpose, and Process](/blog/2026-04-apache-s...

What is Apache Iceberg? The Table Format Revolution

4/13/2026

*Read the complete Open Source and the Lakehouse series:* * [Part 1: Apache Software Foundation](/blog/2026-04-apache-software-foundation) * [Part 2: ...

What is Apache Arrow? Erasing the Serialization Tax

4/13/2026

*Read the complete Open Source and the Lakehouse series:* * [Part 1: Apache Software Foundation: History, Purpose, and Process](/blog/2026-04-apache-s...

What is Apache Parquet? Columns, Encoding, and Performance

4/13/2026

*Read the complete Open Source and the Lakehouse series:* * [Part 1: Apache Software Foundation: History, Purpose, and Process](/blog/2026-04-apache-s...

What is Apache Polaris? Unifying the Iceberg Ecosystem

4/13/2026

*Read the complete Open Source and the Lakehouse series:* * [Part 1: Apache Software Foundation: History, Purpose, and Process](/blog/2026-04-apache-s...

Assembling the Apache Lakehouse: The Modular Architecture

4/13/2026

*Read the complete Open Source and the Lakehouse series:* * [Part 1: Apache Software Foundation: History, Purpose, and Process](/blog/2026-04-apache-s...

Apache Software Foundation: History, Purpose, and Process

4/13/2026

*Read the complete Open Source and the Lakehouse series:* * [Part 1: Apache Software Foundation](/blog/2026-04-apache-software-foundation) * [Part 2: ...

The Model Context Protocol (MCP) Explained: A Complete Guide to How Every Major AI Tool Connects to External Data

3/7/2026

The Model Context Protocol (MCP) has become the universal standard for connecting AI models to external tools, data sources, and services. Originally ...

Context Management Strategies for VS Code with LLM Plugins: A Complete Guide to Building Your Own AI-Powered IDE

3/7/2026

Visual Studio Code is the most widely used code editor in the world, and its extensibility means you can integrate AI capabilities through a growing e...

Context Management Strategies for T3 Chat: A Complete Guide to the Unified Multi-Model AI Interface

3/7/2026

T3 Chat is a modern web-based AI chat interface that gives you access to multiple AI models through a single unified platform. Its primary value propo...

Context Management Strategies for Zed: A Complete Guide to the High-Performance AI Code Editor

3/7/2026

Zed is a high-performance code editor built in Rust that prioritizes speed, simplicity, and real-time collaboration. Its AI integration is designed to...

Context Management Strategies for Windsurf: A Complete Guide to the AI Flow IDE

3/7/2026

Windsurf is an AI-powered IDE built on the VS Code foundation that introduces the concept of "Flows," a paradigm where the AI maintains deep awareness...

Context Management Strategies for Perplexity AI: A Complete Guide to Research-First AI Conversations

3/7/2026

Perplexity AI occupies a unique position in the AI landscape: it is a research-first tool that combines conversational AI with real-time web search to...

Context Management Strategies for Cursor: A Complete Guide to the AI-Native Code Editor

3/7/2026

Cursor is an AI-native code editor built on the VS Code foundation that integrates AI deeply into every aspect of the development workflow. Its contex...

Context Management Strategies for OpenWork: A Complete Guide to the Desktop AI Agent Framework

3/7/2026

OpenWork is a desktop-native AI agent framework designed for local, multi-step task execution on your computer. Unlike browser-based AI tools or termi...

Context Management Strategies for OpenCode: A Complete Guide to the Open-Source Terminal AI Agent

3/7/2026

OpenCode is an open-source terminal-based AI coding agent that prioritizes privacy, local-first operation, and broad model provider support. Built as ...

Context Management Strategies for Google Antigravity: A Complete Guide to the Agent-First IDE

3/7/2026

Google Antigravity is an agent-first IDE built by Google DeepMind's Advanced Agentic Coding team. It approaches context management differently from ot...

Context Management Strategies for Gemini CLI: A Complete Guide to Terminal-Native AI Development

3/7/2026

Gemini CLI is an open-source terminal agent powered by Gemini models that operates directly in your command line. It brings Google's AI capabilities i...

Context Management Strategies for Gemini Web and NotebookLM: A Complete Guide to Google's AI Knowledge Ecosystem

3/7/2026

Google's AI ecosystem for knowledge work consists of two deeply integrated tools: Gemini (the conversational AI at gemini.google.com) and NotebookLM (...

Context Management Strategies for Claude Code: A Complete Guide for Developers

3/7/2026

Claude Code is a terminal-native agentic coding assistant that lives in your command line and operates directly on your codebase. Unlike chat-based in...

Context Management Strategies for Claude CoWork: A Complete Guide for Knowledge Workers

3/7/2026

Claude CoWork represents a fundamentally different approach to AI context management. Unlike chat interfaces where you send messages and receive respo...

Context Management Strategies for Claude Desktop: A Complete Guide to MCP, Computer Use, and Local File Access

3/7/2026

Claude Desktop takes everything available in Claude Web and adds three capabilities that fundamentally change how you manage context: MCP server conne...

Context Management Strategies for Claude Web: A Complete Guide to Projects, Artifacts, and Intelligent Context

3/7/2026

Claude's web interface at claude.ai combines one of the largest context windows in the industry with a structured Project system that makes it genuine...

Context Management Strategies for OpenAI Codex: A Complete Guide Across Browser, CLI, and App

3/7/2026

OpenAI Codex is not a chatbot. It is an autonomous software engineering agent that runs tasks in isolated cloud sandboxes, operates across a browser i...

Context Management Strategies for ChatGPT: A Complete Guide to Getting Better Results

3/7/2026

Getting consistently useful results from ChatGPT requires more than writing good prompts. The real differentiator is how you manage context: the backg...

How to Use Dremio with OpenWork: Connect, Query, and Build Data Apps

3/5/2026

OpenWork is an open-source desktop AI agent built on the OpenCode engine. It runs entirely on your machine with your own API keys, giving you full con...

How to Use Dremio with OpenCode: Connect, Query, and Build Data Apps

3/5/2026

OpenCode is an open-source, terminal-based AI coding agent released under the MIT license. It provides a TUI with split panes, uses the Language Serve...

How to Use Dremio with Zed: Connect, Query, and Build Data Apps

3/5/2026

Zed is an open-source, GPU-accelerated code editor written in Rust. It is designed for speed and collaboration, with a built-in AI assistant that supp...

How to Use Dremio with OpenAI Codex CLI: Connect, Query, and Build Data Apps

3/5/2026

OpenAI Codex CLI is a terminal-based coding agent built in Rust. It reads your codebase, writes files, executes commands, and supports MCP for connect...

How to Use Dremio with Amazon Kiro: Connect, Query, and Build Data Apps

3/5/2026

Amazon Kiro is an agentic AI IDE from AWS that introduces spec-driven development to the coding workflow. Instead of jumping straight to code, Kiro he...

How to Use Dremio with JetBrains AI Assistant: Connect, Query, and Build Data Apps

3/5/2026

JetBrains AI Assistant is built into IntelliJ IDEA, PyCharm, DataGrip, and every JetBrains IDE. It provides AI chat, inline code generation, multi-fil...

How to Use Dremio with Gemini CLI: Connect, Query, and Build Data Apps

3/5/2026

Gemini CLI is Google's open-source terminal-based AI agent. It runs directly in your terminal, powered by Gemini models with a 1-million token context...

How to Use Dremio with Google Antigravity: Connect, Query, and Build Data Apps

3/5/2026

Google Antigravity is an agent-first IDE built by Google DeepMind. Its autonomous agents plan multi-step tasks, write code, browse documentation, and ...

How to Use Dremio with Windsurf: Connect, Query, and Build Data Apps

3/5/2026

Windsurf is an AI-native code editor built as a fork of VS Code. Its standout feature is Cascade, an agentic AI system that plans and executes multi-s...

How to Use Dremio with GitHub Copilot: Connect, Query, and Build Data Apps

3/5/2026

GitHub Copilot is the most widely adopted AI coding assistant, integrated into VS Code, JetBrains IDEs, and the GitHub platform. Its agent mode allows...

How to Use Dremio with Claude CoWork: Connect, Query, and Build Data Apps

3/5/2026

Claude CoWork is Anthropic's desktop agentic assistant. Unlike Claude Code (a terminal coding agent), CoWork operates as a general-purpose autonomous ...

How to Use Dremio with Claude Code: Connect, Query, and Build Data Apps

3/5/2026

Claude Code is Anthropic's terminal-based coding agent. It reads your files, writes code, runs commands, and maintains context across a session. Dremi...

How to Use Dremio with Cursor: Connect, Query, and Build Data Apps

3/5/2026

Cursor is an AI-native code editor built as a fork of VS Code. It integrates AI directly into the editing experience with features like Chat, Composer...

The 2025 State of the Apache Iceberg Ecosystem Results

3/1/2026

![2025 Survey](https://imgur.com/eSwOYfd.png) **Raw Results at Bottom of Post** **Apache Iceberg Literature from Alex Merced and/or Andrew Madsen:**...

Connect Dremio Software to Dremio Cloud: Hybrid Federation Across Deployments

3/1/2026

Dremio Cloud can connect to Dremio Software (self-managed) instances as a federated data source. This creates a hybrid deployment where Dremio Cloud s...

Dremio's Built-in Open Catalog: Your Zero-Configuration Apache Iceberg Lakehouse

3/1/2026

Every Dremio Cloud account starts with a built-in Open Catalog — a fully managed Apache Iceberg catalog with integrated storage. When you create a Dre...

Connect Any Iceberg REST Catalog to Dremio Cloud: Universal Lakehouse Access

3/1/2026

The Apache Iceberg REST Catalog specification defines a standard HTTP API for managing Iceberg table metadata. Any catalog implementation that conform...

Connect Databricks Unity Catalog to Dremio Cloud: Query Delta Lake Tables with Federation and AI

3/1/2026

Databricks Unity Catalog is Databricks' governance layer for data and AI assets. It manages Delta Lake tables, machine learning models, feature stores...

Connect Snowflake Open Catalog to Dremio Cloud: Multi-Engine Iceberg Analytics

3/1/2026

Snowflake Open Catalog is Snowflake's managed implementation of the Apache Iceberg REST catalog specification, based on the open-source Apache Polaris...

Connect AWS Glue Data Catalog to Dremio Cloud: Query and Manage Your AWS Iceberg Tables

3/1/2026

AWS Glue Data Catalog is AWS's managed metadata service for data lakes. It stores table definitions, schemas, partition information, and statistics fo...

Connect Apache Druid to Dremio Cloud: Add SQL Joins, AI, and Governance to Your Real-Time Analytics

3/1/2026

Apache Druid is a real-time analytics database designed for sub-second queries on high-ingestion-rate event data. Clickstream analytics, application m...

Connect MongoDB to Dremio Cloud: SQL Analytics on Document Data

3/1/2026

MongoDB is the most popular NoSQL document database. It stores data in flexible JSON-like documents, making it ideal for applications with evolving sc...

Connect Vertica to Dremio Cloud: Federation for Analytics-Optimized Data

3/1/2026

Vertica is a columnar analytics database engineered for fast aggregate queries on large datasets. It was built from the ground up for analytical workl...

Connect Azure Synapse Analytics to Dremio Cloud: Multi-Cloud Data Warehouse Federation

3/1/2026

Microsoft Azure Synapse Analytics combines big data analytics and enterprise data warehousing into a single Azure-integrated platform. If your organiz...

Connect Snowflake to Dremio Cloud: Federate, Govern, and Accelerate Beyond Snowflake

3/1/2026

Snowflake is a popular cloud data warehouse known for its separation of storage and compute, near-zero maintenance, and broad ecosystem. Many organiza...

Connect Google BigQuery to Dremio Cloud: Cross-Cloud Analytics Without Data Movement

3/1/2026

Google BigQuery is Google Cloud's serverless data warehouse. If your organization uses Google Cloud Platform, BigQuery is where your analytics data, m...

Connect Amazon Redshift to Dremio Cloud: Extend Your Warehouse with Federation and AI Analytics

3/1/2026

Amazon Redshift is AWS's managed data warehouse, designed for petabyte-scale analytics. If your organization chose Redshift for analytical workloads, ...

Connect Azure Storage to Dremio Cloud: Query Your Microsoft Data Lake with SQL and AI

3/1/2026

Azure Storage is Microsoft's cloud storage platform, spanning Blob Storage, Azure Data Lake Storage Gen2 (ADLS Gen2), and Azure Files. If your organiz...

Connect Amazon S3 to Dremio Cloud: Query Your Data Lake with SQL, Federation, and AI

3/1/2026

Amazon S3 is the default landing zone for data in the cloud. Log files, Parquet datasets, CSV exports, JSON events, IoT telemetry, and raw data dumps ...

Connect SAP HANA to Dremio Cloud: Unlock Analytics Beyond the SAP Ecosystem

3/1/2026

SAP HANA is the in-memory database platform that powers SAP S/4HANA, SAP BW/4HANA, and custom enterprise applications across finance, manufacturing, l...

Connect IBM Db2 to Dremio Cloud: Modernize Mainframe Analytics with Federation and AI

3/1/2026

IBM Db2 is the relational database that powers critical applications across banking, insurance, government, healthcare, and manufacturing. For organiz...

Connect Microsoft SQL Server to Dremio Cloud: Federate Enterprise Data Without ETL

3/1/2026

Microsoft SQL Server is one of the most widely deployed enterprise databases in the world. ERP systems, CRM platforms, financial applications, and cus...

Connect Oracle Database to Dremio Cloud: Enterprise Analytics Without Data Movement

3/1/2026

Oracle Database runs the most critical enterprise applications in the world — ERP systems, financial ledgers, supply chain management, and HR platform...

Connect MySQL to Dremio Cloud: Federated Analytics Without ETL

3/1/2026

MySQL runs more web applications, SaaS platforms, and e-commerce backends than any other database. It's fast for transactional reads and writes, but i...

Connect PostgreSQL to Dremio Cloud: Query, Federate, and Accelerate Your Data

3/1/2026

PostgreSQL powers more production applications than almost any other open-source database. It's where your customer records, transaction logs, product...

Extract Structured Data from Text with Dremio's AI_GENERATE Function

3/1/2026

Unstructured text is the most underused data in most organizations. Customer emails sit in inboxes. Contract notes live in text fields. Meeting summar...

Generate Summaries and Insights with Dremio's AI_COMPLETE Function

3/1/2026

Every data team has a version of this problem: a table full of raw data that needs human-readable summaries, translations, or narrative descriptions. ...

Classify Your Data with SQL: A Hands-On Guide to Dremio's AI_CLASSIFY Function

3/1/2026

Most classification workflows require exporting data to Python, running a model, and importing results back into your warehouse. Dremio's `AI_CLASSIFY...

Semantic Layer Best Practices: 7 Mistakes to Avoid

2/18/2026

![Semantic layer best practices checklist — checks and mistakes](/images/blog/semantic-layer/best-practices.png) Semantic layers don't fail because t...

How a Self-Documenting Semantic Layer Reduces Data Team Toil

2/18/2026

![Self-documenting semantic layer — AI generating descriptions and labels automatically](/images/blog/semantic-layer/self-documenting.png) Every data...

Headless BI: How a Universal Semantic Layer Replaces Tool-Specific Models

2/18/2026

![Headless BI — one semantic layer serving all consumers](/images/blog/semantic-layer/headless-bi.png) Your organization uses Tableau for executive d...

Data Virtualization and the Semantic Layer: Query Without Copying

2/18/2026

![Data virtualization — connecting sources to a unified semantic layer without copying](/images/blog/semantic-layer/data-virtualization.png) Every da...

The Role of the Semantic Layer in Data Governance

2/18/2026

![Data governance through a semantic layer — centralized policies and documentation](/images/blog/semantic-layer/governance-semantic.png) Most organi...

Why Your AI Initiatives Fail Without a Semantic Layer

2/18/2026

![AI with vs without a semantic layer — failure modes and fixes](/images/blog/semantic-layer/ai-semantic-layer.png) Your team builds an AI agent. It ...

Semantic Layer vs. Data Catalog: Complementary, Not Competing

2/18/2026

![Data catalog and semantic layer — complementary systems bridged together](/images/blog/semantic-layer/catalog-vs-semantic.png) "We already have a d...

Semantic Layer vs. Metrics Layer: What's the Difference?

2/18/2026

![Semantic layer vs metrics layer — the metrics layer is a subset](/images/blog/semantic-layer/semantic-vs-metrics.png) Both terms appear in every mo...

How to Build a Semantic Layer: A Step-by-Step Guide

2/18/2026

![Building a semantic layer — Bronze, Silver, and Gold tiers](/images/blog/semantic-layer/build-semantic-layer.png) Most teams start building a seman...

What Is a Semantic Layer? A Complete Guide

2/18/2026

![Semantic layer concept — translating raw data into business terms](/images/blog/semantic-layer/semantic-layer-concept.png) Ask three teams in your ...

Data Engineering Best Practices: The Complete Checklist

2/18/2026

![Comprehensive data engineering checklist organized by categories with status indicators](/images/blog/debp/de-checklist.png) Best practices documen...

Pipeline Observability: Know When Things Break

2/18/2026

![Pipeline observability dashboard showing metrics, logs, and data lineage](/images/blog/debp/observability-dashboard.png) An analyst messages you on...

Testing Data Pipelines: What to Validate and When

2/18/2026

![Data pipeline testing pyramid with schema tests at the base, contract tests in the middle, and regression tests at the top](/images/blog/debp/testin...

Partition and Organize Data for Performance

2/18/2026

![Table data split into partitions by date with query scanning only the relevant partition](/images/blog/debp/partition-overview.png) A table with 50...

Batch vs. Streaming: Choose the Right Processing Model

2/18/2026

![Batch processing in scheduled groups vs streaming in continuous flow](/images/blog/debp/batch-vs-streaming.png) "We need real-time data." This is o...

Schema Evolution Without Breaking Consumers

2/18/2026

![Schema as a contract between producers and consumers with version tracking](/images/blog/debp/schema-contract.png) A source team renames a column f...

Idempotent Pipelines: Build Once, Run Safely Forever

2/18/2026

![Pipeline running multiple times and converging to the same result](/images/blog/debp/idempotent-pipeline.png) A pipeline runs, processes 100,000 re...

Data Quality Is a Pipeline Problem, Not a Dashboard Problem

2/18/2026

![Data quality checks enforced at the pipeline validation stage before data reaches consumers](/images/blog/debp/data-quality-pipeline.png) When an a...

How to Design Reliable Data Pipelines

2/18/2026

![Data pipeline architecture with four layers flowing from ingestion through staging, transformation, and serving](/images/blog/debp/pipeline-architec...

How to Think Like a Data Engineer

2/18/2026

![Data flowing through a system of interconnected pipeline stages from sources to consumers](/images/blog/debp/data-engineer-mindset.png) The median ...

Data Modeling Best Practices: 7 Mistakes to Avoid

2/18/2026

![Checklist of data modeling quality markers with warning symbols on common mistakes](/images/blog/data-modeling/best-practices-checklist.png) A bad ...

Data Vault Modeling: Hubs, Links, and Satellites

2/18/2026

![Data Vault model showing Hubs, Links, and Satellites as interconnected components](/images/blog/data-modeling/data-vault-overview.png) Dimensional ...

Denormalization: When and Why to Flatten Your Data

2/18/2026

![Normalized model with many interconnected tables vs. denormalized wide flat table](/images/blog/data-modeling/denormalization-overview.png) Normali...

Data Modeling for Analytics: Optimize for Queries, Not Transactions

2/18/2026

![OLTP normalized model vs. OLAP denormalized model side by side](/images/blog/data-modeling/analytics-data-modeling.png) The data model that runs yo...

Slowly Changing Dimensions: Types 1-3 with Examples

2/18/2026

![Dimension timeline showing attribute values changing across time periods](/images/blog/data-modeling/slowly-changing-dimensions.png) Dimensions cha...

Dimensional Modeling: Facts, Dimensions, and Grains

2/18/2026

![Dimensional model showing a central fact table connected to surrounding dimension tables](/images/blog/data-modeling/dimensional-modeling.png) Dime...

Data Modeling for the Lakehouse: What Changes

2/18/2026

![Traditional data warehouse model vs. open lakehouse model with flexible schema and views](/images/blog/data-modeling/lakehouse-data-modeling.png) T...

Star Schema vs. Snowflake Schema: When to Use Each

2/18/2026

![Star schema with central fact table surrounded by denormalized dimension tables](/images/blog/data-modeling/star-vs-snowflake.png) Both star schema...

Conceptual, Logical, and Physical Data Models Explained

2/18/2026

![Three layers of data modeling from business concepts to database implementation](/images/blog/data-modeling/types-of-data-models.png) Most data tea...

What Is Data Modeling? A Complete Guide

2/18/2026

![Data entities connected by relationship lines forming a structured data model](/images/blog/data-modeling/data-modeling-overview.png) Every databas...

A 2026 Introduction to Apache Iceberg

2/13/2026

Apache Iceberg is an open-source table format for large analytic datasets. It defines how data files stored on object storage (S3, ADLS, GCS) are orga...

A Practical Guide to AI-Assisted Coding Tools

1/15/2026

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](ht...

What Are Recursive Language Models?

1/10/2026

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

RAG Isn’t a Modeling Problem. It’s a Data Engineering Problem.

1/6/2026

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

Building Pangolin - My Holiday Break, an AI IDE, and a Lakehouse Catalog for the Curious

1/2/2026

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

2025 Year in Review Apache Iceberg, Polaris, Parquet, and Arrow

12/29/2025

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

dremioframe & iceberg - Pythonic interfaces for Dremio and Apache Iceberg

12/5/2025

Modern data teams want simple tools to work with Iceberg tables and Dremio. Two new Python libraries now make that work easier. The first is DremioFra...

Introducing dremioframe - A Pythonic DataFrame Interface for Dremio

11/29/2025

If you're a data analyst or Python developer who prefers chaining expressive `.select()` and `.mutate()` calls over writing raw SQL, you're going to l...

Comprehensive Hands-on Walk Through of Dremio Cloud Next Gen (Hands-on with Free Trial)

11/12/2025

[Video Playlist of this Walkthrough](https://www.youtube.com/playlist?list=PL-gIUf9e9CCvY0bcRBGu2SzFFR-yJGIB6) On November 13, at the [Subsurface Lake...

2025-2026 Guide to Learning about Apache Iceberg, Data Lakehouse & Agentic AI

10/23/2025

The data world is evolving fast. Just a few years ago, building a modern analytics stack meant stitching together tools, ETL pipelines, and compromise...

An Exploration of the Commercial Iceberg Catalog Ecosystem

10/21/2025

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

Building a Universal Lakehouse Catalog - Beyond Iceberg Tables

10/17/2025

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

Intro to Apache Iceberg with Apache Polaris and Apache Spark

10/16/2025

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

The State of Apache Iceberg v4 - October 2025 Edition

10/14/2025

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

The Ultimate Guide to Open Table Formats - Iceberg, Delta Lake, Hudi, Paimon, and DuckLake

9/24/2025

**Get Data Lakehouse Books:** - [Apache Iceberg: The Definitive Guide](https://drmevn.fyi/tableformatblog) - [Apache Polaris: The Definitive Guide](htt...

The 2025 & 2026 Ultimate Guide to the Data Lakehouse and the Data Lakehouse Ecosystem

9/23/2025

- [Join the Data Lakehouse Community](https://www.datalakehousehub.com) - [Data Lakehouse Blog Listings](https://lakehouseblogs.com) *Year-end 2025 r...

Composable Analytics with Agents - Leveraging Virtual Datasets and the Semantic Layer

9/17/2025

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

The Endgame — Building an Autonomous Optimization Pipeline for Apache Iceberg

9/16/2025

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Managing Large-Scale Optimizations — Parallelism, Checkpointing, and Fail Recovery

9/9/2025

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Unlocking the Power of Agentic AI with Apache Iceberg and Dremio

9/5/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Hidden Pitfalls — Compaction and Partition Evolution in Apache Iceberg

9/2/2025

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Using Iceberg Metadata Tables to Determine When Compaction Is Needed

8/26/2025

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Designing the Ideal Cadence for Compaction and Snapshot Expiration

8/19/2025

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Avoiding Metadata Bloat with Snapshot Expiration and Rewriting Manifests

8/12/2025

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Smarter Data Layout — Sorting and Clustering Iceberg Tables

8/5/2025

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

Optimizing Compaction for Streaming Workloads in Apache Iceberg

7/29/2025

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

The Basics of Compaction — Bin Packing Your Data for Efficiency

7/22/2025

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

The Cost of Neglect — How Apache Iceberg Tables Degrade Without Optimization

7/15/2025

- **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_external_blog&utm_me...

How to Discover or Organize Lakehouse & Apache Iceberg Meetups

7/3/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

What is an API? And Why Data Architecture Depends on Them

6/23/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Decoding AWS EC2 Instance Type Names

6/18/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | What is Data Engineering?

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | Understanding Data Sources and Ingestion

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | ETL vs ELT – Understanding Data Pipelines

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | Batch Processing Fundamentals

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | Streaming Data Fundamentals

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | Data Modeling Basics

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | Data Warehousing Fundamentals

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | Data Lakes Explained

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | Storage Formats and Compression

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | Data Quality and Validation

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | Metadata, Lineage, and Governance

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | Scheduling and Workflow Orchestration

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | Building Scalable Pipelines

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | Cloud Data Platforms and the Modern Stack

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | DevOps for Data Engineering

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | Data Lakehouse Architecture Explained

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | Apache Iceberg, Arrow, and Polaris

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Introduction to Data Engineering Concepts | The Power of Dremio in the Modern Lakehouse

5/2/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

A Journey from AI to LLMs and MCP - 10 - Sampling and Prompts in MCP — Making Agent Workflows Smarter and Safer

4/14/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

A Journey from AI to LLMs and MCP - 9 - Tools in MCP — Giving LLMs the Power to Act

4/13/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

A Journey from AI to LLMs and MCP - 8 - Resources in MCP — Serving Relevant Data Securely to LLMs

4/12/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

A Journey from AI to LLMs and MCP - 7 - Under the Hood — The Architecture of MCP and Its Core Components

4/11/2025

## Free Resources - **[Free Apache Icebe...

Journey from AI to LLMs and MCP - 6 - Enter the Model Context Protocol (MCP) — The Interoperability Layer for AI Agents

4/10/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

A Journey from AI to LLMs and MCP - 5 - AI Agent Frameworks — Benefits and Limitations

4/9/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

A Journey from AI to LLMs and MCP - 4 - What Are AI Agents — And Why They're the Future of LLM Applications

4/8/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

A Journey from AI to LLMs and MCP - 3 - Boosting LLM Performance — Fine-Tuning, Prompt Engineering, and RAG

4/7/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

A Journey from AI to LLMs and MCP - 2 - How LLMs Work — Embeddings, Vectors, and Context Windows

4/6/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

A Journey from AI to LLMs and MCP - 1 - What Is AI and How It Evolved Into LLMs

4/5/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Building a Basic MCP Server with Python

4/4/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Using Helm with Kubernetes - A Guide to Helm Charts and Their Implementation

2/19/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

Crash Course on Developing AI Applications with LangChain

2/1/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

The Data Lakehouse - The Benefits and Enhancing Implementation

1/31/2025

## Free Resources - **[Free Apache Iceberg Course](https://hello.dremio.com/webcast-an-apache-iceberg-lakehouse-crash-course-reg.html?utm_source=ev_...

2025 Comprehensive Guide to Apache Iceberg

1/20/2025

- [Free Apache Iceberg Crash Course](https://university.dremio.com/?utm_source=ev_external_blog&utm_medium=influencer&utm_campaign=2025-iceberg-comp-g...

When to use Apache Xtable or Delta Lake Uniform for Data Lakehouse Interoperability

1/7/2025

- [Blog: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-lakehouse-and-a-table-fo...

2025 Guide to Architecting an Iceberg Lakehouse

12/9/2024

- [Blog: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-lakehouse-and-a-table-fo...

10 Future Apache Iceberg Developments to Look forward to in 2025

11/25/2024

- [Blog: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-lakehouse-and-a-table-fo...

Deep Dive into Dremio's File-based Auto Ingestion into Apache Iceberg Tables

11/15/2024

- [Blog: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-lakehouse-and-a-table-fo...

Intro to SQL using Apache Iceberg and Dremio

11/8/2024

- [Blog: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-lakehouse-and-a-table-fo...

Dremio, Apache Iceberg and their role in AI-Ready Data

11/5/2024

- [Blog: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-lakehouse-and-a-table-fo...

Introduction to Cargo and cargo.toml

11/5/2024

When working with Rust, Cargo is your go-to tool for managing dependencies, building, and running your projects. Acting as Rust's package manager and ...

Leveraging Python's Pattern Matching and Comprehensions for Data Analytics

11/1/2024

- [Blog: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-lakehouse-and-a-table-fo...

Hands-on with Apache Iceberg & Dremio on Your Laptop within 10 Minutes

10/31/2024

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_b...

Data Modeling - Entities and Events

10/30/2024

Structuring data thoughtfully is critical for both operational efficiency and analytical value. Data modeling helps us define the relationships, const...

All About Parquet Part 01 - An Introduction

10/21/2024

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

All About Parquet Part 02 - Parquet's Columnar Storage Model

10/21/2024

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

All About Parquet Part 03 - Parquet File Structure | Pages, Row Groups, and Columns

10/21/2024

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

All About Parquet Part 04 - Schema Evolution in Parquet

10/21/2024

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

All About Parquet Part 05 - Compression Techniques in Parquet

10/21/2024

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

All About Parquet Part 06 - Encoding in Parquet | Optimizing for Storage

10/21/2024

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

All About Parquet Part 07 - Metadata in Parquet | Improving Data Efficiency

10/21/2024

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

All About Parquet Part 08 - Reading and Writing Parquet Files in Python

10/21/2024

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

All About Parquet Part 09 - Parquet in Data Lake Architectures

10/21/2024

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

All About Parquet Part 10 - Performance Tuning and Best Practices with Parquet

10/21/2024

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

Orchestrating Airflow DAGs with GitHub Actions - A Lightweight Approach to Data Curation Across Spark, Dremio, and Snowflake

10/19/2024

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&ut...

A Deep Dive Into GitHub Actions From Software Development to Data Engineering

10/19/2024

- [Free Copy of Apache Iceberg the Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_content=alexmerced&u...

A Guide to dbt Macros - Purpose, Benefits, and Usage

10/18/2024

- [Apache Iceberg 101](https://www.dremio.com/lakehouse-deep-dives/apache-iceberg-101/?utm_source=ev_external_blog&utm_medium=influencer&utm_campaign=...

Data Lakehouse Roundup 1 - News and Insights on the Lakehouse

10/16/2024

I’m excited to kick off a new series called "Data Lakehouse Roundup," where I’ll cover the latest developments in the data lakehouse space, approximat...

Getting Started with Data Analytics Using PyArrow in Python

10/15/2024

- [Apache Iceberg Crash Course: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-l...

What is Three-Tier Data (Bronze, Silver, Gold) and How Dremio Simplifies It

10/9/2024

- [Apache Iceberg 101](https://www.dremio.com/lakehouse-deep-dives/apache-iceberg-101/?utm_source=ev_external_blog&utm_medium=influencer&utm_campaign=...

A Brief Guide to the Governance of Apache Iceberg Tables

10/7/2024

- [Apache Iceberg Crash Course: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-l...

Exploring Data Operations with PySpark, Pandas, DuckDB, Polars, and DataFusion in a Python Notebook

10/7/2024

- [Apache Iceberg Crash Course: What is a Data Lakehouse and a Table Format?](https://www.dremio.com/blog/apache-iceberg-crash-course-what-is-a-data-l...

Ultimate Directory of Apache Iceberg Resources

10/5/2024

This article is a comprehensive directory of Apache Iceberg resources, including educational materials, tutorials, and hands-on exercises. Whether you...

Change Data Capture (CDC) when there is no CDC

10/4/2024

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&u...

Virtualization + Lakehouse + Mesh = Data At Scale

9/25/2024

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_...

Deep Dive into Data Apps with Streamlit

9/22/2024

# Introduction The ability to quickly develop and deploy interactive applications is invaluable. **Streamlit** is a powerful tool that enables data s...

A Deep Dive into Docker Compose

9/21/2024

## Understanding the Docker Compose File Structure Docker Compose uses a YAML file (`docker-compose.yml`) to define services, networks, and volumes t...

Hands-on with Apache Iceberg on Your Laptop - Deep Dive with Apache Spark, Nessie, Minio, Dremio, Polars and Seaborn

9/12/2024

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_...

Why Data Analysts, Engineers, Architects and Scientists Should Care about Dremio and Apache Iceberg

9/10/2024

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_...

5 Trends in the Data Lakehouse Space

9/1/2024

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_...

Using the alexmerced/datanotebook Docker Image

8/30/2024

- [Watch My Intro to Data Playlist](https://www.youtube.com/watch?v=nq8ETrTgT7o&list=PLsLAVBjQJO0p_4Nqz99tIjeoDYE97L0xY&pp=iAQB) - [Download Free Copy...

Understanding Apache Iceberg Delete Files

8/29/2024

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_...

Understanding the Apache Iceberg Manifest

8/27/2024

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_...

Understanding the Apache Iceberg Manifest List (Snapshot)

8/25/2024

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=ev_external_...

Understanding Apache Iceberg's Metadata.json

8/21/2024

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&u...

What Apache Iceberg REST Catalog is and isn't

8/18/2024

- [Free Copy of Apache Iceberg: The Definitive Guide](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html?utm_source=alexmerced&u...

ACID Guarantees and Apache Iceberg - Turning Any Storage into a Data Warehouse

8/15/2024

Apache Iceberg has become a prominent name in the data world, with numerous platforms integrating support for Iceberg tables as part of the growing op...

Data Lakehouse 101 - The Who, What and Why of Data Lakehouses

8/5/2024

- [Sign-up for this free Apache Iceberg Crash Course](https://bit.ly/am-2024-iceberg-live-crash-course-1) - [Get a free copy of Apache Iceberg the Def...

Understanding the Polaris Iceberg Catalog and Its Architecture

7/31/2024

NOTE: I am working on a hands-on tutorial for Polaris, so please watch for the [Dremio Blog](https://www.dremio.com/blog) in the coming days. Also, ch...

Apache Iceberg Reliability

7/26/2024

- [Get a Free Copy of "Apache Iceberg: The Definitive Guide"](https://bit.ly/am-iceberg-book) - [Sign Up for the Free Apache Iceberg Crash Course](htt...

Upcoming Data Talks from Alex Merced (And how to follow)

7/20/2024

In this article, I will provide you with a list of events I'm currently scheduled to speak at. New events are regularly being added, so here are a cou...

Databases Deconstructed - The Value of Data Lakehouses and Table Formats

7/12/2024

- [Check out my Apache Iceberg Crash Course](https://bit.ly/am-2024-iceberg-live-crash-course-1) - [Get a free copy of Apache Iceberg the Definitiv...

Video Course - Basics of Lakehouse Engineering - Apache Iceberg, Nessie, Dremio

6/26/2024

[Get a Free Copy of "Apache Iceberg: The Definitive Guide"](https://bit.ly/am-iceberg-book) ## #1 - Intro - Basics of Lakehouse Engineering - Apache ...

Partitioning with Apache Iceberg - A Deep Dive

5/29/2024

- [Apache Iceberg 101](https://www.dremio.com/blog/apache-iceberg-101-your-guide-to-learning-apache-iceberg-concepts-and-practices/) - [Get Hands-on W...

3 Reasons Data Engineers Should Embrace Apache Iceberg

5/15/2024

Data engineers are constantly seeking ways to streamline workflows and enhance data management efficiency. [Apache Iceberg, a high-performance table f...

Running SQL on your Excel Files From Your Laptop with Dremio

5/3/2024

Being able to quickly analyze and gain insights from your data is crucial. Excel is widely used for data storage, but when it comes to complex queries...

Understanding the Future of Apache Iceberg Catalogs

4/4/2024

[Apache Iceberg](https://www.dremio.com/blog/apache-iceberg-101-your-guide-to-learning-apache-iceberg-concepts-and-practices/) is revolutionizing the ...

A Deep Intro to Apache Iceberg and Resources for Learning More

4/4/2024

For a long time, siloed data systems such as databases and data warehouses were sufficient. These systems provided convenient abstractions for various...

End-to-End Basic Data Engineering Tutorial (Spark, Dremio, Superset)

4/1/2024

Data engineering aims to make data accessible and usable for data analytics and data science purposes. This involves several key aspects: - Transferr...

5 Open Source Data Projects You Should Be Following

3/19/2024

[Follow Me On Social](https://bio.alexmerced.com/data) [Subscribe to my SubStack](https://amdatalakehouse.substack.com) Open source technology signif...

5 Reasons Dremio is the Ideal Apache Iceberg Lakehouse Platform

3/9/2024

[The Apache Iceberg table format](https://www.dremio.com/blog/apache-iceberg-101-your-guide-to-learning-apache-iceberg-concepts-and-practices/) has se...

The Apache Iceberg Lakehouse - The Great Data Equalizer

3/6/2024

> [Get a Free Copy of "Apache Iceberg: The Definitive Guide"](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html) > [Build an I...

10 Reasons to Make Apache Iceberg and Dremio Part of Your Data Lakehouse Strategy

3/1/2024

> [Get a Free Copy of "Apache Iceberg: The Definitive Guide"](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html) > [Build an I...

A deep dive into the concept and world of Apache Iceberg Catalogs

3/1/2024

> [Get a Free Copy of "Apache Iceberg: The Definitive Guide"](https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html) > [Build an I...

Introduction to ANSI SQL - Understanding the Syntax and Concepts

2/24/2024

[Subscribe to my Data Youtube Channel and Podcasts, Links Here](https://bio.alexmerced.com/data) [Subscribe to my web development youtube channel and...

The Role of Ontologies in Data Management

2/24/2024

The concept of ontologies plays a pivotal role in organizing and making sense of the vast information available. In data management, ontologies are cr...

What is the Data Lakehouse and the Role of Apache Iceberg, Nessie and Dremio?

2/21/2024

Organizations are constantly seeking more efficient, scalable, and flexible solutions to manage their ever-growing data assets. This quest has led to ...

Partitioning Practices in Apache Hive and Apache Iceberg

2/12/2024

## Introduction The efficiency of query execution is paramount. One of the key strategies ...

Columnar vs. Row-based Data Structures in OLTP and OLAP Systems

2/3/2024

[Follow my Data Youtube Channel](https://www.youtube.com/@alexmerceddata) The decision between using columnar and row-based data structures can signi...

Introduction to Data Vault Modeling

2/2/2024

[Subscribe to my Data Youtube Channel and Podcasts, Links Here](https://bio.alexmerced.com/data) Data Vault modeling is an approach to data warehouse...

Table Format FUD - Thinking Through the Table Format Conversion (Apache Iceberg, Apache Hudi, Delta Lake)

2/2/2024

## Context This article is meant to be a sober reflection on the data lakehouse table format conversation I have had as a participant over the last t...

Embracing the Future of Data Management - Why Choose Lakehouse, Iceberg, and Dremio?

1/25/2024

Data is not just an asset but the cornerstone of business strategy. The way we manage, store, and process this invaluable resource has evolved dramati...

Open Lakehouse Engineering/Apache Iceberg Lakehouse Engineering - A Directory of Resources

1/19/2024

The concept of the **Open Lakehouse** has emerged as a beacon of flexibility and innovation. An Open Lakehouse represents a specialized form of data lake...

Nessie - An Alternative to Hive & JDBC for Self-Managed Apache Iceberg Catalogs

1/8/2024

Unlike traditional table formats, Apache Iceberg provides a comprehensive solution for handling big data's complexity, volume, and diversity. It's des...

Apache Iceberg, Git-Like Catalog Versioning and Data Lakehouse Management - Pillars of a Robust Data Lakehouse Platform

1/3/2024

Managing vast amounts of data efficiently and effectively is crucial for any organization aiming to leverage its data for strategic decisions. The key...