Managing data consistency

Amazon S3 provides eventual consistency for some operations, so new data might not be available immediately after an upload. This can result in an incomplete data load or in loading stale data. All uploads to buckets in the US Standard Region are eventually consistent. All other regions provide read-after-write consistency for uploads of new objects with unique object keys. For more information about data consistency, see "Amazon S3 Data Consistency Model" in the Amazon Simple Storage Service Developer Guide.

To ensure that your application loads the correct data, we recommend the following practices:

Create new object keys.

Amazon S3 provides eventual consistency in all regions for overwrite operations. Creating new file names, or object keys, in Amazon S3 for each data load operation provides strong consistency in all regions except US Standard.

Use a manifest file with your COPY operation.

The manifest explicitly names the files to be loaded. Using a manifest file enforces strong consistency, so it is especially important for buckets in the US Standard Region, but it is a good practice in all regions.

Use a named endpoint.

If your cluster is in the US East (N. Virginia) Region, you can improve data consistency and reduce latency by using a named endpoint when you create your Amazon S3 bucket.
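
The first two practices above can be combined in a small helper. The following is a minimal sketch, not a definitive implementation: it generates a fresh object key for each load, writes a manifest that names exactly those keys, and assembles a COPY statement that uses the MANIFEST option. The bucket name, key prefix, table name, and IAM role ARN are hypothetical placeholders, and authenticating with IAM_ROLE is an assumption, since the text does not say how the cluster is authorized. The sketch only builds strings; uploading the files and running the statement are left to your own tooling.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import List


def new_object_key(prefix: str, filename: str) -> str:
    """Return a key that has not been used before, so the upload creates
    a new object instead of overwriting an existing one."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}/{stamp}-{uuid.uuid4().hex}/{filename}"


def build_manifest(bucket: str, keys: List[str]) -> str:
    """Serialize a COPY manifest that explicitly names every file to load.

    Marking each entry as mandatory makes COPY fail if a file is not
    found, rather than silently loading an incomplete set.
    """
    entries = [
        {"url": f"s3://{bucket}/{key}", "mandatory": True} for key in keys
    ]
    return json.dumps({"entries": entries}, indent=2)


def build_copy_statement(table: str, manifest_url: str, iam_role: str) -> str:
    """Assemble a COPY statement that reads its file list from a manifest."""
    return (
        f"COPY {table} FROM '{manifest_url}' "
        f"IAM_ROLE '{iam_role}' MANIFEST;"
    )


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    bucket = "example-bucket"
    keys = [
        new_object_key("loads/sales", name)
        for name in ("part-0000.gz", "part-0001.gz")
    ]
    print(build_manifest(bucket, keys))
    print(
        build_copy_statement(
            "sales",
            f"s3://{bucket}/{new_object_key('manifests', 'sales.manifest')}",
            "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        )
    )
```

Because the manifest itself is also written under a new key, no step in this flow overwrites an existing object.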

