Amazon S3 Practice Tips

1. An object can be copied from one bucket to another, and this operation can be used to implement cross-region replication.
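For example, a minimal server-side copy sketch, assuming an AmazonS3 client named s3; the bucket and key names are placeholders:

// Server-side copy from one bucket to another; no bytes pass through the client.
s3.copyObject(new CopyObjectRequest("source-bucket", "my-key", "destination-bucket", "my-key"));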
2. Object keys (objectName) can use "/" as a separator; the AWS console then renders a folder-style hierarchy, which makes browsing much easier.
3. Versioning is a bucket-level setting (it can be turned on when the bucket is created, or enabled later). While it is off, uploading to the same key simply overwrites the content with the latest, and no versions are produced. Once it is on, each upload produces a new version, and the key points to the latest ("current") version.
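A sketch of enabling versioning through the SDK, assuming an AmazonS3 client named s3 and a placeholder bucket name:

s3.setBucketVersioningConfiguration(new SetBucketVersioningConfigurationRequest(
        "my-bucket",
        new BucketVersioningConfiguration(BucketVersioningConfiguration.ENABLED)));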
4. When creating a bucket you can specify its region, e.g. US East (N. Virginia) us-east-1, so it is best to create buckets near where they will be used (other regions can still access the bucket, as long as the bucket name is correct).
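A sketch of creating a bucket in a specific region by building the client against that region (the bucket name is a placeholder; bucket names are globally unique):

AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withRegion(Regions.US_EAST_1) // the bucket is created in the client's region
        .build();
s3.createBucket("my-globally-unique-bucket-name");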
5. You can specify a storage class at upload time, and you can also change it later. For example, Intelligent-Tiering and other classes for less frequently accessed data are cheaper than Standard, at the cost of slightly lower availability (three 9s). A sketch follows the FAQ excerpt below.
More storage classes are described here:
https://aws.amazon.com/cn/s3/storage-classes/
How to set it:
com.amazonaws.services.s3.model.AbstractPutObjectRequest#setStorageClass(com.amazonaws.services.s3.model.StorageClass)

BTW: different objects within the same bucket can use different storage classes.
Q: Can I have a bucket that has different objects in different storage classes?
Yes, you can have an S3 bucket that has different objects stored in S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, S3 One Zone-IA, S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, and S3 Glacier Deep Archive.
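The sketch mentioned above: setting the class at upload time, then changing the class of an existing object via a self-copy. It assumes an AmazonS3 client named s3; bucket, key, and file path are placeholders:

PutObjectRequest put = new PutObjectRequest("my-bucket", "my-key", new File("/tmp/data.txt"));
put.setStorageClass(StorageClass.IntelligentTiering); // set at upload time
s3.putObject(put);

// Change the class of an existing object by copying it onto itself:
s3.copyObject(new CopyObjectRequest("my-bucket", "my-key", "my-bucket", "my-key")
        .withStorageClass(StorageClass.OneZoneInfrequentAccess));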
6. Limits: an Amazon S3 object can range in size from 0 bytes up to 5 TB. The largest object that can be uploaded in a single PUT is 5 GB. For objects larger than 100 MB, customers should consider using multipart upload.
7. You can create lifecycle rules to manage versions, e.g. transition versions older than 30 days to a cheaper storage class, or delete them outright. Note that this configuration has quite a few restrictions; for example, the number of days must be at least 30 (a sketch follows the list below).
Some common rule actions:
Move current versions of objects between storage classes
Move noncurrent versions of objects between storage classes
Expire current versions of objects
Permanently delete noncurrent versions of objects
Delete expired object delete markers or incomplete multipart uploads
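The sketch mentioned above: a rule that moves noncurrent versions to Standard-IA after 30 days. It assumes an AmazonS3 client named s3; the rule id, prefix, and bucket name are placeholders (classes come from com.amazonaws.services.s3.model and com.amazonaws.services.s3.model.lifecycle):

BucketLifecycleConfiguration.Rule rule = new BucketLifecycleConfiguration.Rule()
        .withId("archive-old-versions")
        .withFilter(new LifecycleFilter(new LifecyclePrefixPredicate("logs/")))
        .withNoncurrentVersionTransitions(Arrays.asList(
                new BucketLifecycleConfiguration.NoncurrentVersionTransition()
                        .withDays(30) // must be at least 30 for Standard-IA
                        .withStorageClass(StorageClass.StandardInfrequentAccess)))
        .withStatus(BucketLifecycleConfiguration.ENABLED);
s3.setBucketLifecycleConfiguration("my-bucket",
        new BucketLifecycleConfiguration().withRules(Arrays.asList(rule)));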
8. About deletion:
When versioning is enabled, a simple DELETE cannot permanently delete an object. Instead, Amazon S3 inserts a delete marker in the bucket, and that marker becomes the current version of the object with a new ID.
When you try to GET an object whose current version is a delete marker, Amazon S3 treats the object as deleted (even though it has not been erased) and returns a 404 error.
To permanently delete a versioned object, you must use DELETE Object versionId.
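A sketch of both kinds of delete, assuming an AmazonS3 client named s3 and the placeholder names used earlier:

// Simple DELETE on a versioned bucket: inserts a delete marker; the object "disappears".
s3.deleteObject("my-bucket", "my-key");

// Permanent delete of one specific version (a real version id would come from listVersions;
// the id below is a placeholder).
s3.deleteVersion("my-bucket", "my-key", "3/L4kqtJlcpXroDTDmJ-placeholder");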
9. Multipart upload and parallel multipart download
When transferring a large file, consider this approach so that a failure partway through does not force you to restart from zero (a TransferManager sketch follows the links below).
https://aws.amazon.com/cn/blogs/developer/parallelizing-large-downloads-for-optimal-speed/
The following passage is still to be verified:
"An object can be uploaded to S3 in multiple parts. You can retrieve a part of an object from S3 by specifying the part number in GetObjectRequest. TransferManager uses this logic to download all parts of an object asynchronously and writes them to individual, temporary files. The temporary files are then merged into the destination file provided by the user. For more information, see the implementation here.
We hope you'll try the new parallel download feature supported by TransferManager. Feel free to leave your feedback in the comments."

https://blog.duyidong.com/2021/08/07/speed-up-s3-download/
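The sketch mentioned above. This assumes the parallel-download support in recent SDK v1 releases (the withDisableParallelDownloads flag); bucket, key, and file path are placeholders:

AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
TransferManager tm = TransferManagerBuilder.standard()
        .withS3Client(s3)
        .withDisableParallelDownloads(false) // keep part-level parallel downloads on
        .build();
Download download = tm.download("my-bucket", "big-file.bin", new File("/tmp/big-file.bin"));
download.waitForCompletion(); // blocks until all part files are merged into the destination
tm.shutdownNow();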

In fact, parts are subject to size limits:

Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Your proposed upload is smaller than the minimum allowed size (Service: Amazon S3; Status Code: 400; Error Code: EntityTooSmall; Request ID: 0X4N796MSHXAWKX1; S3 Extended Request ID: ncnrYdqGPST5uqPK22cTTne9deGZuPqC7IouxoJpiTSr4CBQZhhQQfc1AOKEMgmwOtz0ivYFxFA=; Proxy: null), S3 Extended Request ID: ncnrYdqGPST5uqPK22cTTne9deGZuPqC7IouxoJpiTSr4CBQZhhQQfc1AOKEMgmwOtz0ivYFxFA=

Part numbers can be any number from 1 to 10,000, inclusive. A part number uniquely identifies a part and also defines its position within the object being created. If you upload a new part using the same part number that was used with a previous part, the previously uploaded part is overwritten. Each part must be at least 5 MB in size, except the last part. There is no size limit on the last part of your multipart upload.
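To make the 5 MB minimum and the 1..10,000 part numbers concrete, a low-level multipart upload sketch, assuming an AmazonS3 client named s3 (bucket, key, and file path are placeholders):

InitiateMultipartUploadResult init = s3.initiateMultipartUpload(
        new InitiateMultipartUploadRequest("my-bucket", "my-key"));
File file = new File("/tmp/big-file.bin");
List<PartETag> partETags = new ArrayList<>();
long partSize = 5L * 1024 * 1024; // 5 MB: the minimum for every part except the last
long position = 0;
for (int partNumber = 1; position < file.length(); partNumber++) { // part numbers start at 1
    long size = Math.min(partSize, file.length() - position);
    UploadPartResult result = s3.uploadPart(new UploadPartRequest()
            .withBucketName("my-bucket").withKey("my-key")
            .withUploadId(init.getUploadId())
            .withPartNumber(partNumber)
            .withFile(file).withFileOffset(position)
            .withPartSize(size));
    partETags.add(result.getPartETag());
    position += size;
}
s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
        "my-bucket", "my-key", init.getUploadId(), partETags));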

final GetObjectMetadataRequest getObjectMetadataRequest = new GetObjectMetadataRequest("mybucket", "mykey");
getObjectMetadataRequest.setPartNumber(1); // critical: without a part number, you cannot get the part count
final ObjectMetadata objectMetadata1 = s3.getObjectMetadata(getObjectMetadataRequest);
final Integer partCount2 = objectMetadata1.getPartCount();

Compare the metadata returned for a multipart-uploaded object, without and with a part number:
{Accept-Ranges=bytes, Content-Length=7319922, Content-Type=text/plain, ETag=8a7bb2190ab64abafe382bb463ea42cc-2, Last-Modified=Tue Dec 07 14:42:16 CST 2021, x-amz-server-side-encryption=AES256}
// after adding getObjectMetadataRequest.setPartNumber(1):
{Accept-Ranges=bytes, Content-Length=5242880, Content-Range=bytes 0-5242879/7319922, Content-Type=text/plain, ETag=8a7bb2190ab64abafe382bb463ea42cc-2, Last-Modified=Tue Dec 07 14:42:16 CST 2021, x-amz-mp-parts-count=2, x-amz-server-side-encryption=AES256}
The part count is reported in x-amz-mp-parts-count.

Querying a part that does not exist, e.g. requesting the metadata of part "2" for a non-multipart object, throws an exception:
Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Requested Range Not Satisfiable (Service: Amazon S3; Status Code: 416; Error Code: 416 Requested Range Not Satisfiable; Request ID: 3VX9R81T5FA4PRN5; S3 Extended Request ID: SLi99kXksJ1E3a6ENj/j+N+qf3XjL6/6+vw7LWxtQCNqjAjDJw5SpxZ7Nw/Gq3hmygb2UFJY2NA=; Proxy: null), S3 Extended Request ID: SLi99kXksJ1E3a6ENj/j+N+qf3XjL6/6+vw7LWxtQCNqjAjDJw5SpxZ7Nw/Gq3hmygb2UFJY2NA=

For a non-multipart object, the part number can still be set to "1", because part 1 always exists.

Implementation:

@Override
public Long call() throws Exception {
    RandomAccessFile randomAccessFile = new RandomAccessFile(destinationFile, "rw"); // destination file
    FileChannel channel = randomAccessFile.getChannel();
    channel.position(position); // where this part starts within the destination file
    S3ObjectInputStream objectContent = null;
    long filePosition;

    try {
        S3Object object = serviceCall.call();

        objectContent = object.getObjectContent();

        final byte[] buffer = new byte[BUFFER_SIZE];
        int bytesRead;

        final ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
        while ((bytesRead = objectContent.read(buffer)) > -1) {
            byteBuffer.limit(bytesRead);

            while (byteBuffer.hasRemaining()) {
                channel.write(byteBuffer);
            }
            byteBuffer.clear();
        }

        filePosition = channel.position();
    } finally {
        IOUtils.closeQuietly(objectContent, LOG);
        IOUtils.closeQuietly(randomAccessFile, LOG);
        IOUtils.closeQuietly(channel, LOG);
    }
    return filePosition;
}

10. Why use Transfer Acceleration?

You have customers all over the world who upload to a centralized bucket.

You regularly transfer gigabytes to terabytes of data across continents.

You are unable to use all of your available bandwidth over the Internet when uploading to Amazon S3.
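A sketch of turning acceleration on and pointing a client at the accelerate endpoint, assuming an AmazonS3 client named s3 and a placeholder bucket name:

s3.setBucketAccelerateConfiguration(new SetBucketAccelerateConfigurationRequest(
        "my-bucket",
        new BucketAccelerateConfiguration(BucketAccelerateStatus.Enabled)));

// A client that talks to the <bucket>.s3-accelerate.amazonaws.com endpoint:
AmazonS3 accelerated = AmazonS3ClientBuilder.standard()
        .withRegion(Regions.US_EAST_1)
        .withAccelerateModeEnabled(true)
        .build();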

11. Download part of an object: Range

Usage:
com.amazonaws.services.s3.model.GetObjectRequest#setRange(long, long)
com.amazonaws.services.s3.model.GetObjectRequest#setRange(long) // internally calls setRange(start, Long.MAX_VALUE - 1)
The request then goes out with a header such as:
Range: bytes=5-7
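A sketch of a ranged GET, assuming an AmazonS3 client named s3 (names are placeholders; IOUtils here is com.amazonaws.util.IOUtils):

GetObjectRequest request = new GetObjectRequest("my-bucket", "my-key");
request.setRange(5, 7); // inclusive on both ends: downloads bytes 5, 6, and 7
S3Object object = s3.getObject(request);
byte[] threeBytes = IOUtils.toByteArray(object.getObjectContent());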

12. Object metadata
You can define custom (user) metadata, but there are limits:
"The PUT request header is limited to 8 KB in size. Within the PUT request header, the user-defined metadata is limited to 2 KB in size. The size of user-defined metadata is measured by taking the sum of the number of bytes in the UTF-8 encoding of each key and value."

com.amazonaws.services.s3.model.ObjectMetadata#setUserMetadata
com.amazonaws.services.s3.model.PutObjectRequest#withMetadata
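A sketch of attaching user-defined metadata at upload time, assuming an AmazonS3 client named s3 (the key/value pair is an arbitrary example):

ObjectMetadata metadata = new ObjectMetadata();
Map<String, String> userMetadata = new HashMap<>();
userMetadata.put("origin", "practice-tips"); // sent and stored as x-amz-meta-origin
metadata.setUserMetadata(userMetadata);
s3.putObject(new PutObjectRequest("my-bucket", "my-key", new File("/tmp/data.txt"))
        .withMetadata(metadata));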

13. On whether to set the x-amz-server-side-encryption: AES256 header to encrypt objects at rest (SSE-S3):
If the bucket already has SSE-S3 enabled as its default encryption, the request header makes no difference; if the bucket does not, you must set the header yourself for the object to be encrypted.
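Setting the header through the SDK, a sketch assuming an AmazonS3 client named s3 and placeholder names:

ObjectMetadata metadata = new ObjectMetadata();
// Adds the x-amz-server-side-encryption: AES256 request header (SSE-S3).
metadata.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION);
s3.putObject(new PutObjectRequest("my-bucket", "my-key", new File("/tmp/data.txt"))
        .withMetadata(metadata));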

14. How the endpoint region relates to the bucket's region:
When copying, if the destination bucket is not in the region that the endpoint points to, the copy is not allowed.

15. When copying, note that some properties are not carried over automatically, e.g. encryption and storage class:
/**
 * The optional Amazon S3 storage class to use when storing the newly copied
 * object. If not specified, the default, standard storage class will be used.
 *
 * For more information on Amazon S3 storage classes and available values,
 * see the {@link StorageClass} enumeration.
 */
private String storageClass;
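So re-specify whatever should carry over on the copy request, a sketch assuming an AmazonS3 client named s3 and placeholder names:

CopyObjectRequest copy = new CopyObjectRequest("src-bucket", "src-key", "dst-bucket", "dst-key")
        .withStorageClass(StorageClass.StandardInfrequentAccess); // otherwise the copy falls back to Standard
ObjectMetadata newMetadata = new ObjectMetadata();
newMetadata.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION); // re-request SSE-S3
copy.setNewObjectMetadata(newMetadata);
s3.copyObject(copy);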

16. Gotcha
By default, you can create up to 100 buckets in each AWS account. If you need more, you can raise the account's bucket limit to a maximum of 1,000 by submitting a Service Quotas limit increase request. There is no performance difference whether you use many buckets or only a few.
