


DynamoDB can be used as a locking mechanism for Glue: for every commit, GlueCatalog first obtains a lock using a helper DynamoDB table and then tries to safely modify the Glue table. This feature requires the following lock-related catalog properties: set lock-impl as org.apache.iceberg.aws.glue.DynamoLockManager (the lock table name itself is configured through the lock.table property, as in the Spark example below).
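Put together, the lock-related catalog properties look like this (the table name myGlueLockTable is just an example value):

```
lock-impl=org.apache.iceberg.aws.glue.DynamoLockManager
lock.table=myGlueLockTable
```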
Glue does not have a strong guarantee over concurrent updates to a table. Although it throws a ConcurrentModificationException when it detects two processes updating a table at the same time, there is no guarantee that one update will not clobber the other.
There are multiple different options that users can choose from to build an Iceberg catalog with AWS. Iceberg enables the use of AWS Glue as the Catalog implementation. When used, an Iceberg namespace is stored as a Glue Database, an Iceberg table is stored as a Glue Table, and every Iceberg table version is stored as a Glue TableVersion. You can start using the Glue catalog by specifying catalog-impl as org.apache.iceberg.aws.glue.GlueCatalog, as described in the enabling AWS integration section. More details about loading the catalog can be found in the individual engine pages, such as Spark and Flink.

There is a unique Glue metastore in each AWS account and each AWS region. By default, GlueCatalog chooses the Glue metastore to use based on the user's default AWS client credential and region setup. You can specify the Glue catalog ID through the glue.id catalog property to point to a Glue catalog in a different AWS account; the Glue catalog ID is your numeric AWS account ID. If the Glue catalog is in a different region, you should configure your AWS client to point to the correct region. See more details in AWS client customization.

For Hive, suppose you have an Iceberg table database_a.table_a created by GlueCatalog; you can point Hive at it with:

    STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
    TBLPROPERTIES ('iceberg.catalog' = 'glue')

You can also preload the catalog by setting the configurations above in hive-site.xml.

Skip Archive

By default, Glue stores all the table versions created, and a user can roll back a table to any historical version if needed. However, if you are streaming data to Iceberg, this will easily create a lot of Glue table versions. Therefore, it is recommended to turn off the archive feature in Glue by setting glue.skip-archive to true. For more details, please read Glue Quotas and the UpdateTable API.
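Collected as catalog properties, the Glue settings discussed above might look like this (the glue.id value is a placeholder account ID):

```
glue.id=123456789012
glue.skip-archive=true
```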
The following shell snippet starts the Spark SQL client with the Iceberg runtime and the AWS dependencies:

    # add Iceberg dependency
    ICEBERG_VERSION=0.13.1
    DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"

    # add AWS dependency
    AWS_SDK_VERSION=2.15.40
    AWS_MAVEN_GROUP=software.amazon.awssdk
    AWS_PACKAGES=("bundle" "url-connection-client")
    for pkg in "${AWS_PACKAGES[@]}"; do
        DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
    done

    # start Spark SQL client shell
    spark-sql --packages $DEPENDENCIES \
        --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
        --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
        --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
        --conf spark.sql.catalog.my_catalog.lock-impl=org.apache.iceberg.aws.glue.DynamoLockManager \
        --conf spark.sql.catalog.my_catalog.lock.table=myGlueLockTable

As you can see, in the shell command we use --packages to specify the additional AWS bundle and HTTP client dependencies, with their version as 2.15.40. To use the AWS module with Flink, you can similarly download the necessary dependencies and specify them when starting the Flink SQL client.
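The Flink command itself did not survive in this text. As a hedged sketch (the repository URL, artifact paths, and versions are assumptions mirroring the Spark example above), the needed jars could be located on Maven Central like this:

```shell
# Construct Maven Central download URLs for the Iceberg Flink runtime
# and the AWS SDK jars. All versions and paths here are illustrative
# assumptions, not taken from the original text.
ICEBERG_VERSION=0.13.1
AWS_SDK_VERSION=2.15.40
MAVEN_URL=https://repo1.maven.org/maven2

FLINK_RUNTIME_JAR=$MAVEN_URL/org/apache/iceberg/iceberg-flink-runtime/$ICEBERG_VERSION/iceberg-flink-runtime-$ICEBERG_VERSION.jar
AWS_BUNDLE_JAR=$MAVEN_URL/software/amazon/awssdk/bundle/$AWS_SDK_VERSION/bundle-$AWS_SDK_VERSION.jar
HTTP_CLIENT_JAR=$MAVEN_URL/software/amazon/awssdk/url-connection-client/$AWS_SDK_VERSION/url-connection-client-$AWS_SDK_VERSION.jar

# print the URLs so they can be passed to a downloader
echo "$FLINK_RUNTIME_JAR"
echo "$AWS_BUNDLE_JAR"
echo "$HTTP_CLIENT_JAR"
```

These URLs could then be fetched (for example with wget) and the resulting jars put on the Flink SQL client's classpath; treat the exact client invocation as an assumption as well.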
The iceberg-aws module is bundled with Spark and Flink engine runtimes for all versions from 0.11.0 onwards. However, the AWS clients are not bundled, so that you can use the same client version as your application. You will need to provide the AWS v2 SDK, because that is what Iceberg depends on. You can choose to use the AWS SDK bundle, or individual AWS client packages (Glue, S3, DynamoDB, KMS, STS) if you would like a minimal dependency footprint. All the default AWS clients use the URL Connection HTTP Client; this dependency is not part of the AWS SDK bundle and needs to be added separately. To choose a different HTTP client library, such as Apache HTTP Client, see the client customization section for more details. All the AWS module features can be loaded through custom catalog properties; you can go to the documentation of each engine to see how to load a custom catalog. For example, to use AWS features with Spark 3.0 and AWS clients version 2.15.40, you can start the Spark SQL shell as shown in the example above.
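The minimal-dependency option mentioned above (individual AWS client packages instead of the full SDK bundle) can be sketched like this; the exact artifact list is an assumption, not from the original text:

```shell
# Build a --packages string using individual AWS client artifacts
# (glue, s3, dynamodb, kms, sts) plus the URL Connection HTTP client,
# instead of the full software.amazon.awssdk:bundle artifact.
ICEBERG_VERSION=0.13.1
AWS_SDK_VERSION=2.15.40
AWS_MAVEN_GROUP=software.amazon.awssdk

DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"
for pkg in glue s3 dynamodb kms sts url-connection-client; do
    DEPENDENCIES="$DEPENDENCIES,$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
done

# the resulting string can be passed to spark-sql --packages
echo "$DEPENDENCIES"
```

Whether this smaller footprint works for your workload depends on which AWS features you actually use, so verify the client list against the services your catalog touches.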

Iceberg provides integration with different AWS services through the iceberg-aws module. This section describes how to use Iceberg with AWS.
