Schema enabled lakehouses are now supported on Spark 3.5

Photo by Luca Bravo on Unsplash

Microsoft Fabric Runtime 1.3 is the latest GA runtime version and incorporates the following components and upgrades designed to enhance your data processing capabilities:

  • Apache Spark 3.5

  • Operating System: Mariner 2.0

  • Java: 11

  • Scala: 2.12.17

  • Python: 3.11

  • Delta Lake: 3.2

  • R: 4.4.1

In the latest version, there’s a major upgrade in compatibility for structured streaming, and some exciting enhancements for PySpark and SQL users:

SQL Improvements:

  • SQL identifier clause

  • Named arguments in SQL function calls

  • New SQL functions for HyperLogLog approximate aggregations
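
As a rough illustration of the three SQL items above, here is a minimal PySpark sketch. It assumes a Spark 3.5 session is passed in; the helper name `run_sql_examples`, the `:tbl` parameter name, and the sample values are made up for this example and are not taken from the release notes.

```python
# SQL text only; nothing runs until a SparkSession is passed to run_sql_examples().

# IDENTIFIER clause: the table name is supplied as a named parameter (:tbl)
# instead of being concatenated into the SQL string.
IDENTIFIER_QUERY = "SELECT count(*) AS n FROM IDENTIFIER(:tbl)"

# Named arguments: optional parameters of the built-in mask() function are
# passed by name with the => syntax.
NAMED_ARGS_QUERY = (
    "SELECT mask('AbCD123-@$#', lowerChar => 'q', upperChar => 'Q') AS masked"
)

# HyperLogLog: build a sketch over a column, then estimate its distinct count.
HLL_QUERY = """
SELECT hll_sketch_estimate(hll_sketch_agg(col)) AS approx_distinct
FROM VALUES (1), (1), (2), (2), (3) AS tab(col)
"""


def run_sql_examples(spark, table_name):
    """Run the three example queries and return the resulting DataFrames.

    `spark` is assumed to be a Spark 3.5 SparkSession and `table_name`
    an existing table (both are assumptions of this sketch).
    """
    return {
        "identifier": spark.sql(IDENTIFIER_QUERY, args={"tbl": table_name}),
        "named_args": spark.sql(NAMED_ARGS_QUERY),
        "hll": spark.sql(HLL_QUERY),
    }
```

In a notebook this could be called as `run_sql_examples(spark, "my_table")`, where `my_table` stands in for any table you already have.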

Python Enhancements:

  • Support for user-defined table functions
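
To make the user-defined table function item concrete, here is a small sketch. The class `SquareNumbers`, the helper `register_square_numbers`, and the SQL name `square_numbers` are hypothetical names chosen for illustration; the sketch assumes a Spark 3.5 session where `pyspark.sql.functions.udtf` is available.

```python
class SquareNumbers:
    """Table function body: emits one (num, squared) row per integer in [start, end]."""

    def eval(self, start: int, end: int):
        for num in range(start, end + 1):
            yield (num, num * num)


def register_square_numbers(spark):
    """Wrap the class as a UDTF and expose it to SQL as `square_numbers`."""
    # Imported here so the class above can be exercised without a Spark install.
    from pyspark.sql.functions import udtf

    square_udtf = udtf(SquareNumbers, returnType="num: int, squared: int")
    spark.udtf.register("square_numbers", square_udtf)
    return square_udtf
```

After calling `register_square_numbers(spark)`, a query such as `SELECT * FROM square_numbers(1, 3)` should return one row per number, unlike a regular UDF, which returns a single value per call.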

Distributed Training:

  • Simplified with DeepSpeed

Structured Streaming:

  • Features like watermark propagation and the new dropDuplicatesWithinWatermark operation
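
A minimal sketch of how the new deduplication operation might be used, assuming a streaming DataFrame with an event-time column. The helper name `dedupe_within_watermark`, the column name `event_time`, and the 10-minute delay are assumptions made for this example.

```python
def dedupe_within_watermark(events, key_cols, event_time_col="event_time",
                            delay="10 minutes"):
    """Drop duplicate events that arrive within the watermark delay.

    `events` is assumed to be a streaming DataFrame on Spark 3.5.
    Rows sharing the same `key_cols` are treated as duplicates as long as
    their event times are less than `delay` apart.
    """
    return (
        events
        # A watermark must be defined before dropDuplicatesWithinWatermark.
        .withWatermark(event_time_col, delay)
        .dropDuplicatesWithinWatermark(key_cols)
    )
```

Compared with `dropDuplicates`, the event-time column does not have to be part of the key, so the same event redelivered with a slightly different timestamp can still be caught, and the deduplication state can be cleared once the watermark has passed.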

Curious about all the details? Check out the full list of changes here: Spark Release 3.5.0 | Apache Spark


Lakehouse schemas (Preview) - Microsoft Fabric | Microsoft Learn

But are you testing out lakehouse schemas?

Screenshot showing the new lakehouse dialog.

When this post was first published, schema enabled lakehouses weren't supported on Spark 3.5 yet (see the update at the end of this post).

At that time, you needed to roll back and use Runtime 1.2 (Spark 3.4, Delta 2.4).

Public preview limitations

The following features and functionalities are currently unsupported in the latest public preview release. But don’t worry—they'll be addressed in upcoming updates before General Availability.

| Unsupported Features/Functionality | Notes |
|---|---|
| Non-Delta, Managed table schema | Getting schema for managed, non-Delta formatted tables (for example, CSV) isn't supported. Expanding these tables in lakehouse explorer doesn't show any schema information in the UX. |
| External Spark tables | External Spark table operations (for example, discovery, getting schema, etc.) aren't supported. These tables are unidentified in the UX. |
| Public API | Public APIs (List tables, Load table, exposing defaultSchema extended property, etc.) aren't supported for schema enabled Lakehouse. Existing public APIs called on a schema enabled Lakehouse result in an error. |
| Table maintenance | Not supported. |
| Update table properties | Not supported. |
| Spark 3.5 | Spark 3.5 runtime isn't supported. |
| Workspace name containing special characters | Workspace with special characters (for example, space, slashes) isn't supported. A user error is shown. |
| Spark views | Not supported. |
| Hive specific features | Not supported. |
| USE <schemaName> | Doesn't work cross workspaces, but supported within same workspace. |
| Migration | Migration of existing non-schema Lakehouses to schema-based Lakehouses isn't supported. |

Update

As of today, October 1, 2024, schema enabled lakehouses are now supported on Spark 3.5.