Set up Amazon S3 cloud storage
You can connect your Amazon S3 bucket to Label Studio to retrieve labeling tasks or store completed annotations.
- Source storage lets you import data from your cloud environment. See Source storage.
- Target storage lets you automatically export annotations. See Target storage.
You can create one or both connection types.
Configure access to your S3 bucket
Before you begin, you must configure access and permissions for your data.
These steps assume that you’re using the same AWS role to manage both source and target storage with Label Studio. If you only use S3 for source storage, Label Studio does not need PUT access to the bucket.
1. Enable programmatic access to your bucket
See the Amazon Boto3 configuration documentation for more on how to set up access to your S3 bucket.
note
A session token is required only when you use temporary security credentials. See the AWS Identity and Access Management documentation on Requesting temporary security credentials.
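As a minimal sketch of checking programmatic access with Boto3, the following assumes credentials are passed explicitly rather than read from the environment. The helper names `session_kwargs` and `check_bucket_access` are hypothetical and not part of Label Studio. The session token is included only when one is supplied.

```python
from typing import Optional


def session_kwargs(access_key_id: str, secret_access_key: str,
                   session_token: Optional[str] = None,
                   region_name: Optional[str] = None) -> dict:
    """Build keyword arguments for boto3.Session, omitting unset optional values."""
    kwargs = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    if session_token:
        # Only temporary security credentials come with a session token.
        kwargs["aws_session_token"] = session_token
    if region_name:
        kwargs["region_name"] = region_name
    return kwargs


def check_bucket_access(bucket: str, **kwargs) -> bool:
    """Return True if the credentials can reach the bucket.

    Requires the boto3 package and network access to AWS.
    """
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.Session(**kwargs).client("s3")
    try:
        s3.head_bucket(Bucket=bucket)
        return True
    except ClientError:
        return False
```

For example, `check_bucket_access("my-example-bucket", **session_kwargs(key_id, secret))` would report whether long-term credentials can reach a bucket with that (placeholder) name.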
2. Assign a role policy
Assign the following role policy to an account you set up to retrieve source tasks and store annotations in S3, replacing <your_bucket_name> with your bucket name:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": [
"arn:aws:s3:::<your_bucket_name>",
"arn:aws:s3:::<your_bucket_name>/*"
]
}
]
}
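The policy above, combined with the permissions note that follows it, can be sketched as a small generator that includes only the actions a given setup needs. The function name `build_role_policy` and its flags are hypothetical and used here for illustration only.

```python
import json


def build_role_policy(bucket: str, target_storage: bool = True,
                      allow_delete: bool = True) -> dict:
    """Return the role policy as a dict, trimmed to the required actions."""
    actions = ["s3:ListBucket", "s3:GetObject"]
    if target_storage:
        # PutObject is only needed when Label Studio writes annotations.
        actions.append("s3:PutObject")
        if allow_delete:
            # DeleteObject is only needed to mirror deleted annotations.
            actions.append("s3:DeleteObject")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": actions,
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
        }],
    }


if __name__ == "__main__":
    # Source-only setup: read access without PutObject or DeleteObject.
    policy = build_role_policy("my-example-bucket", target_storage=False)
    print(json.dumps(policy, indent=2))
```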
note
"s3:PutObject" is only needed for target storage connections, and "s3:DeleteObject" is only needed for target storage connections where you want to allow deleted annotations in Label Studio to also be deleted in the target S3 bucket.
3. Configure CORS
Set up cross-origin resource sharing (CORS) access to your bucket, using a policy that allows GET access from the same host name as your Label Studio deployment. See Configuring cross-origin resource sharing (CORS) in the Amazon S3 User Guide.
note
This is only required if you are using pre-signed URLs. If you are using proxying, you do not have to configure CORS. For more information, see Pre-signed URLs vs Storage proxies.
Use or modify the following example:
[
{
"AllowedHeaders": [
"*"
],
"AllowedMethods": [
"GET"
],
"AllowedOrigins": [
"*"
],
"ExposeHeaders": [
"x-amz-server-side-encryption",
"x-amz-request-id",
"x-amz-id-2"
],
"MaxAgeSeconds": 3000
}
]
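The example above allows any origin. As a sketch of narrowing it to the host of your Label Studio deployment, the following builds the same rule with a single allowed origin and applies it with Boto3's `put_bucket_cors`. The helper names and the origin `https://label-studio.example.com` are placeholders, not values from this guide.

```python
def build_cors_rules(label_studio_origin: str) -> list:
    """Return CORS rules that allow GET requests from one origin only."""
    return [{
        "AllowedHeaders": ["*"],
        "AllowedMethods": ["GET"],
        "AllowedOrigins": [label_studio_origin],
        "ExposeHeaders": [
            "x-amz-server-side-encryption",
            "x-amz-request-id",
            "x-amz-id-2",
        ],
        "MaxAgeSeconds": 3000,
    }]


def apply_cors(bucket: str, rules: list) -> None:
    """Apply the rules to the bucket (requires boto3 and AWS credentials)."""
    import boto3

    boto3.client("s3").put_bucket_cors(
        Bucket=bucket,
        CORSConfiguration={"CORSRules": rules},
    )


if __name__ == "__main__":
    # Placeholder origin; replace with the URL of your deployment.
    print(build_cors_rules("https://label-studio.example.com"))
```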
Amazon S3
Before you begin:
- Review the information in Cloud storage for projects and Secure access to cloud storage.
- Configure access to your S3 bucket.
- Obtain your AWS access keys.
Create a source storage connection
From Label Studio, open your project and select Settings > Cloud Storage > Add Source Storage.
Select Amazon S3 and click Next.
Configure Connection
Complete the following fields and then click Test connection:
| Field | Description |
|---|---|
| Storage Title | Enter a name to identify the storage connection. |
| Bucket Name | Enter the name of your S3 bucket. |
| Region Name | Enter the AWS region name. For example, us-east-1. |
| S3 Endpoint | Optionally, enter an S3 endpoint. This is useful if you want to override the default URL created by S3 to access your bucket. |
| Access Key ID | Enter the access key ID of the temporary security credentials for an AWS account with access to your S3 bucket. |
| Secret Access Key | Enter the secret key of the temporary security credentials for an AWS account with access to your S3 bucket. |
| Session Token | Optionally, enter a session token of the temporary security credentials for an AWS account with access to your S3 bucket. |
| Use pre-signed URLs (On) / Proxy through the platform (Off) | This determines how data from your bucket is loaded. For more information, see Pre-signed URLs vs Storage proxies. |
| Expire pre-signed URLs (minutes) | Control how long pre-signed URLs remain valid. |
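To illustrate what the expiry setting controls, here is a sketch of generating a time-limited pre-signed GET URL with Boto3. It is not Label Studio's internal code. The helper names are hypothetical, and the only assumption about the UI field is that it is expressed in minutes, while Boto3's `ExpiresIn` takes seconds.

```python
def presign_ttl_seconds(minutes: int) -> int:
    """Convert an expiry given in minutes to the seconds Boto3 expects."""
    if minutes <= 0:
        raise ValueError("expiry must be a positive number of minutes")
    return minutes * 60


def presign_get_url(bucket: str, key: str, minutes: int) -> str:
    """Return a pre-signed GET URL (requires boto3 and AWS credentials)."""
    import boto3

    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=presign_ttl_seconds(minutes),
    )
```

After the expiry passes, requests to the URL are rejected by S3, so a shorter value limits how long a leaked link stays usable.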
Import Settings & Preview
Complete the following fields and then click Load preview to ensure you are syncing the correct data:
| Field | Description |
|---|---|
| Bucket Prefix | Optionally, enter the directory name within your bucket that you would like to use. For example, data-set-1 or data-set-1/subfolder-2. |
| Import Method | Select whether you want to create a task for each file in your bucket or use a JSON/JSONL/Parquet file to define the data for each task. |
| File Name Filter | Specify a regular expression to filter bucket objects. Use .* to collect all objects. |
| Scan all sub-folders | Enable this option to perform a recursive scan across subfolders within your bucket. |
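The way the prefix, file name filter, and sub-folder option combine can be approximated as follows. This is an illustrative model, not Label Studio's implementation. It assumes the regular expression is matched against the full object key from its start, and the function name `select_keys` is hypothetical.

```python
import re
from typing import Iterable, List


def select_keys(keys: Iterable[str], prefix: str = "",
                regex_filter: str = ".*",
                recursive: bool = False) -> List[str]:
    """Return the object keys that an import with these settings would pick up."""
    base = prefix.rstrip("/") + "/" if prefix else ""
    pattern = re.compile(regex_filter)
    selected = []
    for key in keys:
        if not key.startswith(base):
            continue  # outside the bucket prefix
        remainder = key[len(base):]
        if not remainder or remainder.endswith("/"):
            continue  # folder placeholder, not a file
        if not recursive and "/" in remainder:
            continue  # file lives in a sub-folder
        if not pattern.match(key):
            continue  # rejected by the file name filter
        selected.append(key)
    return selected


if __name__ == "__main__":
    keys = [
        "data-set-1/a.jpg",
        "data-set-1/b.txt",
        "data-set-1/subfolder-2/c.jpg",
        "other/d.jpg",
    ]
    print(select_keys(keys, "data-set-1", r".*\.jpg"))
    print(select_keys(keys, "data-set-1", r".*\.jpg", recursive=True))
```

Use Load preview to confirm what your actual settings select, since the real matching rules may differ from this model.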
Review & Confirm
If everything looks correct, click Save & Sync to sync immediately, or click Save to save your settings and sync later.
Tip
You can also use the API to sync import storage.
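A sketch of triggering an import sync over HTTP, using only the Python standard library. The endpoint path `/api/storages/s3/{id}/sync` and the `Authorization: Token ...` header are assumptions here, so confirm them against the API documentation for your Label Studio version. The base URL, token, and storage ID are placeholders.

```python
import urllib.request


def build_import_sync_request(base_url: str, token: str,
                              storage_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks for an import sync."""
    # Assumed endpoint path; verify it in the API documentation.
    url = f"{base_url.rstrip('/')}/api/storages/s3/{storage_id}/sync"
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": f"Token {token}"},
    )


if __name__ == "__main__":
    request = build_import_sync_request(
        "https://label-studio.example.com", "YOUR_API_TOKEN", 7
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request)
```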
Create a target storage connection
From Label Studio, open your project and select Settings > Cloud Storage > Add Target Storage.
Select Amazon S3 and click Next.
Complete the following fields:
| Field | Description |
|---|---|
| Storage Title | Enter a name to identify the storage connection. |
| Bucket Name | Enter the name of your S3 bucket. |
| Bucket Prefix | Optionally, enter the directory name within your bucket that you would like to use. For example, data-set-1 or data-set-1/subfolder-2. |
| Region Name | Enter the AWS region name. For example, us-east-1. |
| S3 Endpoint | Optionally, enter an S3 endpoint. This is useful if you want to override the default URL created by S3 to access your bucket. |
| Access Key ID | Enter the access key ID of the temporary security credentials for an AWS account with access to your S3 bucket. |
| Secret Access Key | Enter the secret key of the temporary security credentials for an AWS account with access to your S3 bucket. |
| Session Token | Optionally, enter a session token of the temporary security credentials for an AWS account with access to your S3 bucket. |
| Can delete objects from storage | Enable this option if you want to delete annotations stored in the S3 bucket when they are deleted in Label Studio. The storage credentials associated with the bucket must include the ability to delete bucket objects. |
After adding the storage, click Sync.
Tip
You can also use the API to sync export storage.
Amazon S3 with IAM role
In Label Studio Enterprise, you can use an IAM role configured with an external ID to access S3 bucket contents securely. An ‘external ID’ is a unique identifier that enhances security by ensuring that only trusted entities can assume the role, reducing the risk of unauthorized access. See Amazon S3 with IAM role in the Enterprise documentation.
Add storage with the Label Studio API
You can also use the API to programmatically create connections. See our API documentation.
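As a rough sketch of creating a source connection programmatically, the following builds a request with the Python standard library. The endpoint path `/api/storages/s3` and the payload field names (`project`, `title`, `bucket`, `region_name`, `prefix`) are assumptions modeled on the UI fields above, so check them against the API documentation before use. All values shown are placeholders.

```python
import json
import urllib.request
from typing import Optional


def build_create_s3_storage_request(
        base_url: str, token: str, project_id: int, title: str, bucket: str,
        region_name: Optional[str] = None,
        prefix: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates an S3 source storage."""
    # Assumed field names; verify them in the API documentation.
    payload = {"project": project_id, "title": title, "bucket": bucket}
    if region_name:
        payload["region_name"] = region_name
    if prefix:
        payload["prefix"] = prefix
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/storages/s3",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_create_s3_storage_request(
        "https://label-studio.example.com", "YOUR_API_TOKEN",
        project_id=1, title="Example source", bucket="my-example-bucket",
        region_name="us-east-1", prefix="data-set-1",
    )
    print(request.full_url)
    print(request.data.decode("utf-8"))
```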
IP filtering and VPN for enhanced security for S3 storage
To maximize security and data isolation behind a VPC, set IP restrictions on your storage so that only the Label Studio backend and users on your internal network can access it. This ensures that only trusted networks can perform task synchronization and generate pre-signed URLs. Additionally, establish a secure connection between storage and users’ browsers by configuring a VPC private endpoint or limiting storage access to specific IPs or VPCs.
Read more about Source storage behind your VPC.
Bucket Policy Example for S3 storage
warning
These example bucket policies explicitly deny access to any requests outside the allowed IP addresses. Even the user that entered the bucket policy can be denied access to the bucket if the user doesn't meet the conditions. Therefore, make sure to review the bucket policy carefully before saving it. If you get accidentally locked out, see How to regain access to an Amazon S3 bucket.
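In the AWS Management Console, go to your S3 bucket and select Permissions > Bucket Policy. Add the following policy, replacing YOUR_BUCKET_NAME, the x.x.x.x/32 placeholders, and YOUR_VPN_SUBNET with your own values. The lines that begin with //// are annotations, not valid JSON, so remove them before saving the policy: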
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyAccessUnlessFromSaaSIPsForListAndGet",
"Effect": "Deny",
"Principal": {
"AWS": "arn:aws:iam::490065312183:role/label-studio-app-production"
},
"Action": [
"s3:ListBucket",
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::YOUR_BUCKET_NAME",
"arn:aws:s3:::YOUR_BUCKET_NAME/*"
],
"Condition": {
"NotIpAddress": {
"aws:SourceIp": [
//// IP ranges for app.humansignal.com from the documentation
"x.x.x.x/32",
"x.x.x.x/32",
"x.x.x.x/32"
]
}
}
},
//// Optional
{
"Sid": "DenyAccessUnlessFromVPNForGetObject",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*",
"Condition": {
"NotIpAddress": {
"aws:SourceIp": "YOUR_VPN_SUBNET/32"
}
}
}
]
}
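Because the annotated example above is not valid JSON as written, it can help to generate the policy instead. The sketch below builds the same two statements from a list of CIDR ranges and validates each range with the standard `ipaddress` module before emitting JSON. The function name `build_ip_deny_policy` is hypothetical, and the addresses in the usage example are reserved documentation ranges, not real Label Studio IPs.

```python
import ipaddress
import json
from typing import List, Optional


def build_ip_deny_policy(bucket: str, role_arn: str,
                         allowed_cidrs: List[str],
                         vpn_cidr: Optional[str] = None) -> dict:
    """Return a bucket policy that denies access from outside the given ranges."""
    for cidr in allowed_cidrs + ([vpn_cidr] if vpn_cidr else []):
        # Raises ValueError for malformed ranges or ranges with host bits set.
        ipaddress.ip_network(cidr)

    statements = [{
        "Sid": "DenyAccessUnlessFromSaaSIPsForListAndGet",
        "Effect": "Deny",
        "Principal": {"AWS": role_arn},
        "Action": ["s3:ListBucket", "s3:GetObject"],
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "Condition": {"NotIpAddress": {"aws:SourceIp": allowed_cidrs}},
    }]
    if vpn_cidr:
        # Optional statement restricting object downloads to the VPN range.
        statements.append({
            "Sid": "DenyAccessUnlessFromVPNForGetObject",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {"NotIpAddress": {"aws:SourceIp": vpn_cidr}},
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    policy = build_ip_deny_policy(
        "my-example-bucket",
        "arn:aws:iam::490065312183:role/label-studio-app-production",
        ["192.0.2.10/32", "192.0.2.11/32"],  # placeholder ranges
        vpn_cidr="198.51.100.0/24",          # placeholder VPN subnet
    )
    print(json.dumps(policy, indent=2))
```

Review the generated policy as carefully as a hand-written one, since a wrong range can still lock you out of the bucket.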