Set up Google Cloud Storage
Dynamically import tasks and export annotations to Google Cloud Storage (GCS) buckets in Label Studio. For details about how Label Studio secures access to cloud storage, see Secure access to cloud storage.
Configure access to your Google Cloud Storage bucket
First, review the information in Cloud storage for projects and Secure access to cloud storage.
Then you will need to complete the following prerequisites:
1. Enable programmatic access to your bucket
See Cloud Storage Client Libraries in the Google Cloud Storage documentation for how to set up access to your GCS bucket.
2. Set up authentication to your bucket
Your account must have the Service Account Token Creator and Storage Object Viewer roles and storage.buckets.get access permission. See Setting up authentication and IAM permissions for Cloud Storage in the Google Cloud Storage documentation.
3. Configure CORS
Set up cross-origin resource sharing (CORS) access to your bucket, using a policy that allows GET access from the same host name as your Label Studio deployment. See Configuring cross-origin resource sharing (CORS) in the Google Cloud User Guide.
Note
CORS configuration is only required if you are using pre-signed URLs. If you are proxying data through the platform, you do not have to configure CORS. For more information, see Pre-signed URLs vs Storage proxies.
Use or modify the following example:
echo '[
{
"origin": ["*"],
"method": ["GET"],
"responseHeader": ["Content-Type","Access-Control-Allow-Origin"],
"maxAgeSeconds": 3600
}
]' > cors-config.json
Replace YOUR_BUCKET_NAME with your actual bucket name in the following command to update CORS for your bucket:
gsutil cors set cors-config.json gs://YOUR_BUCKET_NAME
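The example above allows every origin (`"*"`), while the prerequisite asks for a policy scoped to the host of your Label Studio deployment. The following is a minimal sketch of how such a scoped `cors-config.json` could be generated. The function names and the example URL are hypothetical and only for illustration; the policy keys are the ones used in the example above.

```python
import json
from urllib.parse import urlsplit


def build_cors_policy(label_studio_url, max_age_seconds=3600):
    """Return a CORS policy that allows GET from a single origin.

    label_studio_url is the address users type to reach Label Studio
    (hypothetical example: https://label-studio.example.com).
    """
    parts = urlsplit(label_studio_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an http(s) URL, got {label_studio_url!r}")
    # An origin is scheme + host (+ port); any path is dropped.
    origin = f"{parts.scheme}://{parts.netloc}"
    return [
        {
            "origin": [origin],
            "method": ["GET"],
            "responseHeader": ["Content-Type", "Access-Control-Allow-Origin"],
            "maxAgeSeconds": max_age_seconds,
        }
    ]


def write_cors_config(label_studio_url, path="cors-config.json"):
    """Write the policy to a file that `gsutil cors set` can read."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_cors_policy(label_studio_url), fh, indent=2)
    return path
```

After calling `write_cors_config(...)` with your own deployment URL, the resulting file can be applied with the same `gsutil cors set` command shown above.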
Google Cloud Storage
Before you begin:
- Review the information in Cloud storage for projects and Secure access to cloud storage.
- Configure access to your bucket.
Google Application Credentials
You will need to provide Google Application Credentials. These take the form of a JSON key file that you supply when setting up your storage. To create one:
- From the Google Cloud Console, go to IAM & Admin > Service Accounts.
- Select the specific service account you need credentials for. If you don’t have one, create a new one.
- In the service account details, go to the Keys tab and click Add Key > Create new key.
- Select the JSON key type and click Create. The JSON file will be generated and automatically downloaded to your computer.
Note
If you're using a service account to authorize access to the Google Cloud Platform, make sure to activate it. See gcloud auth activate-service-account.
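Before pasting the downloaded key into Label Studio, it can help to confirm that the file is a service account key and not some other credential type. The sketch below is a hedged sanity check, not an official validator: the helper name is hypothetical, and the list of required fields is an assumption based on the fields that service account key files normally contain.

```python
import json

# Fields a service account key file is expected to contain (assumed subset).
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found in the key file (empty if none)."""
    with open(path, encoding="utf-8") as fh:
        try:
            data = json.load(fh)
        except json.JSONDecodeError as exc:
            return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_KEY_FIELDS
        if not data.get(name)
    ]
    key_type = data.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type!r}")
    return problems
```

An empty list means the file has the expected shape; it does not prove that the key is still active or that the account has the required roles.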
Create a source storage connection
From Label Studio, open your project and select Settings > Cloud Storage > Add Source Storage.
Select Google Cloud Storage and click Next.
Configure Connection
Complete the following fields and then click Test connection:
| Field | Description |
|---|---|
| Storage Title | Enter a name to identify the storage connection. |
| Bucket Name | Enter the name of your GCS bucket. |
| Google Application Credentials | Enter the JSON file with the GCS credentials you created to manage authentication for your bucket. On-prem users: Alternatively, you can use the GOOGLE_APPLICATION_CREDENTIALS environment variable and/or set up Application Default Credentials, so that users do not need to configure credentials manually. See Application Default Credentials for enhanced security below. |
| Google Project ID | Enter the ID of your Google project in which the bucket is located (for example, my-label-studio-project). If you're unsure, you can find this in Google Cloud Console under IAM & Admin > Settings. |
| Use pre-signed URLs (On) / Proxy through the platform (Off) | Determines how data from your bucket is loaded: through pre-signed URLs when on, or proxied through the platform when off. For more information, see Pre-signed URLs vs Storage proxies. |
| Expire pre-signed URLs (minutes) | Control how long pre-signed URLs remain valid. |
Import Settings & Preview
Complete the following fields and then click Load preview to ensure you are syncing the correct data:
| Field | Description |
|---|---|
| Bucket Prefix | Optionally, enter the directory name within your bucket that you would like to use. For example, data-set-1 or data-set-1/subfolder-2. |
| Import Method | Select whether you want to create a task for each file in your bucket or whether you would like to use a JSON/JSONL/Parquet file to define the data for each task. |
| File Name Filter | Specify a regular expression to filter bucket objects. Use .* to collect all objects. |
| Scan all sub-folders | Enable this option to perform a recursive scan across subfolders within your bucket. |
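To preview how the Bucket Prefix, File Name Filter, and Scan all sub-folders settings might interact, the sketch below applies them to a plain list of object names. It is an approximation for reasoning about your settings, not Label Studio's implementation: the function name is hypothetical, and it assumes the regular expression is matched from the start of the full object name.

```python
import re


def select_objects(keys, prefix="", regex_filter=".*", recursive=True):
    """Return the object names that the given settings would keep."""
    pattern = re.compile(regex_filter)
    # Treat the prefix as a folder so "data-set-1" does not match "data-set-10/".
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        relative = key[len(prefix):]
        if not relative or relative.endswith("/"):
            continue  # skip folder placeholder objects
        if not recursive and "/" in relative:
            continue  # object sits in a sub-folder
        if pattern.match(key):  # assumption: match from the start of the name
            selected.append(key)
    return selected
```

For example, with the prefix `data-set-1` and the filter `.*\.(jpe?g|png)$`, only image files under that folder are kept; turning off the recursive scan additionally drops anything in `data-set-1/subfolder-2`.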
Review & Confirm
If everything looks correct, click Save & Sync to sync immediately, or click Save to save your settings and sync later.
Tip
You can also use the API to sync import storage.
Create a target storage connection
From Label Studio, open your project and select Settings > Cloud Storage > Add Target Storage.
Select Google Cloud Storage and click Next.
Complete the following fields:
| Field | Description |
|---|---|
| Storage Title | Enter a name to identify the storage connection. |
| Bucket Name | Enter the name of your GCS bucket. |
| Bucket Prefix | Optionally, enter the directory name within your bucket that you would like to use. For example, data-set-1 or data-set-1/subfolder-2. |
| Google Application Credentials | Enter the JSON file with the GCS credentials you created to manage authentication for your bucket. On-prem users: Alternatively, you can use the GOOGLE_APPLICATION_CREDENTIALS environment variable and/or set up Application Default Credentials, so that users do not need to configure credentials manually. See Application Default Credentials for enhanced security below. |
| Google Project ID | Enter the ID of your Google project in which the bucket is located (for example, my-label-studio-project). If you're unsure, you can find this in Google Cloud Console under IAM & Admin > Settings. |
| Can delete objects from storage | Enable this option if you want to delete annotations stored in the bucket when they are deleted in Label Studio. Your credentials must include the ability to delete bucket objects. |
After adding the storage, click Sync.
Tip
You can also use the API to sync export storage.
Application Default Credentials for enhanced security for GCS
If you use Label Studio on-premises with Google Cloud Storage, you can set up Application Default Credentials to provide cloud storage authentication globally for all projects, so users do not need to configure credentials manually.
The recommended way to do this is by using the GOOGLE_APPLICATION_CREDENTIALS environment variable. For example:
export GOOGLE_APPLICATION_CREDENTIALS=json-file-with-GCP-creds-23441-8f8sd99vsd115a.json
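Because the example above uses a relative file name, the path is resolved against the working directory of whichever process reads the variable. The sketch below shows one way to check, from the environment in which Label Studio will run, that the variable points to an existing file. The helper name is hypothetical.

```python
import os


def resolve_credentials_path(environ=None):
    """Return the absolute path named by GOOGLE_APPLICATION_CREDENTIALS.

    Returns None when the variable is unset or empty, and raises
    FileNotFoundError when it points to a file that does not exist.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        return None
    # Relative paths are resolved against the current working directory.
    path = os.path.abspath(os.path.expanduser(raw))
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {path}"
        )
    return path
```

Setting the variable to an absolute path avoids depending on the working directory at all.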
Google Cloud Storage with Workload Identity Federation (WIF)
In Label Studio Enterprise, you can use Workload Identity Federation (WIF) pools with Google Cloud Storage.
Unlike with application credentials, WIF allows you to use temporary credentials. Each time you make a request to GCS, Label Studio connects to your identity pool to request temporary credentials.
For more information, see Google Cloud Storage with Workload Identity Federation (WIF) in our Enterprise documentation.
Add storage with the Label Studio API
You can also use the API to programmatically create connections. See our API documentation.
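As a rough illustration of what such a request could look like, the sketch below builds (but does not send) an HTTP request that creates a GCS source storage connection. Everything specific here is an assumption to verify against the API documentation for your version: the `/api/storages/gcs` path, the payload field names, and the `Token` authorization scheme. The function name and example values are hypothetical.

```python
import json
import urllib.request


def build_gcs_import_storage_request(
    base_url, api_token, project_id, bucket, *, title,
    prefix="", regex_filter=".*", presign=True, presign_ttl=15,
):
    """Build a POST request for creating a GCS source storage connection."""
    payload = {
        "project": project_id,
        "title": title,
        "bucket": bucket,
        "prefix": prefix,
        "regex_filter": regex_filter,
        "presign": presign,          # assumed field name
        "presign_ttl": presign_ttl,  # assumed field name, in minutes
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/storages/gcs",  # assumed endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes it easy to inspect the URL and body first.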
IP filtering for enhanced security for GCS
Google Cloud Storage offers bucket IP filtering as a powerful security mechanism to restrict access to your data based on source IP addresses. This feature helps prevent unauthorized access and provides fine-grained control over who can interact with your storage buckets.
Read more about Source storage behind your VPC.
Common Use Cases:
- Restrict bucket access to only your organization’s IP ranges
- Allow access only from specific VPC networks in your infrastructure
- Secure sensitive data by limiting access to known IP addresses
- Control access for third-party integrations by whitelisting their IPs
How to Set Up IP Filtering
- First, create your GCS bucket through the console or CLI
- Create a JSON configuration file to define IP filtering rules. You have two options:
For public IP ranges (replace each placeholder with one of your IP addresses or with an IP range in CIDR notation):
{
"mode": "Enabled",
"publicNetworkSource": {
"allowedIpCidrRanges": [
"xxx.xxx.xxx.xxx",
"xxx.xxx.xxx.xxx",
"xxx.xxx.xxx.xxx/xx"
]
}
}
For VPC network sources:
{
"mode": "Enabled",
"vpcNetworkSources": [
{
"network": "projects/PROJECT_ID/global/networks/NETWORK_NAME",
"allowedIpCidrRanges": [
"RANGE_CIDR"
]
}
]
}
Apply the IP filtering rules to your bucket using the following command:
gcloud alpha storage buckets update gs://BUCKET_NAME --ip-filter-file=IP_FILTER_CONFIG_FILE
To remove IP filtering rules when no longer needed:
gcloud alpha storage buckets update gs://BUCKET_NAME --clear-ip-filter
Limitations to Consider
- Maximum of 200 IP CIDR blocks across all rules
- Maximum of 25 VPC networks in the IP filter rules
- Not supported for dual-regional buckets
- May affect access from certain Google Cloud services
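A malformed range in the configuration file is easy to miss by eye. The sketch below shows one way the public IP range configuration could be generated and checked before it is applied. The function name is hypothetical, the JSON keys are the ones from the public IP example in this section, and the limit check covers only the ranges passed to this function, while the 200-block limit listed above applies across all rules.

```python
import ipaddress
import json

MAX_CIDR_BLOCKS = 200  # limit listed in this section


def build_public_ip_filter(cidr_ranges):
    """Return an IP filter config dict for public IP ranges.

    Raises ValueError for malformed ranges, for ranges with host bits
    set (such as 10.0.0.1/8), and when too many ranges are supplied.
    """
    normalized = []
    for item in cidr_ranges:
        # A bare address such as 203.0.113.7 becomes 203.0.113.7/32.
        network = ipaddress.ip_network(item, strict=True)
        normalized.append(str(network))
    if len(normalized) > MAX_CIDR_BLOCKS:
        raise ValueError(
            f"{len(normalized)} ranges exceed the limit of {MAX_CIDR_BLOCKS}"
        )
    return {
        "mode": "Enabled",
        "publicNetworkSource": {"allowedIpCidrRanges": normalized},
    }


def write_ip_filter(cidr_ranges, path="ip-filter.json"):
    """Write the config to a file for use with --ip-filter-file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_public_ip_filter(cidr_ranges), fh, indent=2)
    return path
```

The resulting file can then be passed as IP_FILTER_CONFIG_FILE in the update command shown earlier.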