Terraform Infrastructure as Code in Practice
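Preface
Infrastructure as Code (IaC) is one of the core practices of DevOps. Terraform is one of the most widely used IaC tools and manages infrastructure across multiple clouds through provider plugins. This article walks from HCL syntax basics to production-grade best practices.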
```mermaid
graph TB
    subgraph Workflow["Terraform workflow"]
        Write["Write .tf files"] --> Init["terraform init<br>Initialize providers"]
        Init --> Plan["terraform plan<br>Generate execution plan"]
        Plan --> Apply["terraform apply<br>Apply changes"]
        Apply --> State["terraform.tfstate<br>State file"]
    end
    subgraph Providers["Providers"]
        AWS["AWS Provider"]
        GCP["GCP Provider"]
        Azure["Azure Provider"]
        K8s["Kubernetes Provider"]
    end
    Init --> Providers
    Apply --> |"API calls"| Cloud["Cloud Resources"]
    State --> |"Records mapping"| Cloud
```
Core Concepts
```mermaid
graph LR
    Config[".tf configuration files"] --> |"terraform plan"| Plan["Execution plan<br>(+create, ~update, -destroy)"]
    Plan --> |"terraform apply"| Resources["Cloud resources"]
    Resources --> |"Records state"| State["State file"]
    State --> |"Compared during terraform plan"| Config
```
HCL Syntax Basics
Provider Configuration
```hcl
# versions.tf - pin Terraform and provider versions
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.25"
    }
  }

  # Remote state storage
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "ap-northeast-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

provider "aws" {
  region = var.aws_region

  # Tags applied to every resource this provider creates
  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
      Project     = var.project_name
    }
  }
}
```
Variables and Outputs
```hcl
# variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "ap-northeast-1"
}

variable "environment" {
  description = "Environment name"
  type        = string

  # Reject any value outside the known environments
  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be one of: dev, staging, production."
  }
}

variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}

variable "private_subnets" {
  description = "Private subnet CIDRs"
  type        = list(string)
  default     = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}

variable "cluster_config" {
  description = "EKS cluster configuration"
  type = object({
    name           = string
    version        = string
    node_count     = number
    instance_types = list(string)
    enable_logging = bool
  })
  default = {
    name           = "main"
    version        = "1.29"
    node_count     = 3
    instance_types = ["m6i.large"]
    enable_logging = true
  }
}

# outputs.tf
output "vpc_id" {
  description = "VPC ID"
  value       = aws_vpc.main.id
}

output "cluster_endpoint" {
  description = "EKS cluster endpoint"
  value       = aws_eks_cluster.main.endpoint
  sensitive   = true
}

output "subnet_ids" {
  description = "Private subnet IDs"
  value       = aws_subnet.private[*].id
}
```
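Because `environment` has no default, a value must be supplied at plan time. One common way is a `terraform.tfvars` file, which Terraform loads automatically. The sketch below is only an illustration: the values are hypothetical, and since `cluster_config` is declared as a plain object type, all five attributes have to be present when it is overridden.

```hcl
# terraform.tfvars (hypothetical values for illustration)
environment = "staging" # must pass the validation rule in variables.tf
aws_region  = "ap-northeast-1"

# Overrides the default object; every attribute of the type is required
cluster_config = {
  name           = "main"
  version        = "1.29"
  node_count     = 2
  instance_types = ["m6i.large"]
  enable_logging = false
}
```

A value such as `environment = "qa"` would be rejected during `terraform plan` with the error message defined in the `validation` block.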
Resource Definitions
```hcl
# vpc.tf - VPC network infrastructure
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project_name}-${var.environment}-vpc"
  }
}

# Create multiple subnets with count
resource "aws_subnet" "private" {
  count = length(var.private_subnets)

  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name                              = "${var.project_name}-private-${count.index + 1}"
    "kubernetes.io/role/internal-elb" = "1"
  }
}

# Create security group rules with a dynamic block and for_each
resource "aws_security_group" "app" {
  name_prefix = "${var.project_name}-app-"
  vpc_id      = aws_vpc.main.id

  dynamic "ingress" {
    for_each = var.app_ports
    content {
      from_port   = ingress.value.port
      to_port     = ingress.value.port
      protocol    = ingress.value.protocol
      cidr_blocks = ingress.value.cidr_blocks
      description = ingress.value.description
    }
  }

  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  lifecycle {
    create_before_destroy = true
  }
}

# NAT Gateway (aws_eip.nat, aws_subnet.public and aws_internet_gateway.main
# are defined elsewhere and not shown here)
resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id

  tags = {
    Name = "${var.project_name}-nat"
  }

  depends_on = [aws_internet_gateway.main]
}
```
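The configuration above references `var.project_name` and `var.app_ports`, which are not declared in the `variables.tf` shown earlier. A minimal sketch of what those declarations might look like follows. The attribute names are inferred from how the `dynamic "ingress"` block reads `ingress.value`; the descriptions and the default rule are assumptions added for illustration.

```hcl
# Hypothetical declarations for variables referenced but not shown
variable "project_name" {
  description = "Project name used in resource names and tags"
  type        = string
}

variable "app_ports" {
  description = "Ingress rules for the application security group"
  type = list(object({
    port        = number
    protocol    = string
    cidr_blocks = list(string)
    description = string
  }))

  # Example default: HTTPS from inside the VPC only
  default = [
    {
      port        = 443
      protocol    = "tcp"
      cidr_blocks = ["10.0.0.0/16"]
      description = "HTTPS from inside the VPC"
    },
  ]
}
```

Each element of the list produces one `ingress` block, so adding a port is a matter of appending one object rather than editing the resource.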
Data Sources
```hcl
# Look up existing resources
data "aws_availability_zones" "available" {
  state = "available"
}

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

data "aws_caller_identity" "current" {}

data "aws_eks_cluster_auth" "main" {
  name = aws_eks_cluster.main.name
}
```
Modular Design
Module Structure
```
infrastructure/
├── modules/
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── README.md
│   ├── eks/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── iam.tf
│   └── rds/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   │   ├── main.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   └── production/
│       ├── main.tf
│       ├── terraform.tfvars
│       └── backend.tf
└── modules.tf
```
```mermaid
graph TB
    subgraph Environments["Environment configurations"]
        Dev["dev/main.tf"]
        Staging["staging/main.tf"]
        Prod["production/main.tf"]
    end
    subgraph Modules["Reusable modules"]
        VPC["modules/vpc"]
        EKS["modules/eks"]
        RDS["modules/rds"]
    end
    Dev --> VPC
    Dev --> EKS
    Staging --> VPC
    Staging --> EKS
    Staging --> RDS
    Prod --> VPC
    Prod --> EKS
    Prod --> RDS
```
Module Definition
```hcl
# modules/vpc/main.tf
resource "aws_vpc" "this" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(var.tags, {
    Name = "${var.name}-vpc"
  })
}

resource "aws_subnet" "private" {
  # Key each subnet by its list index so that each.key can select the matching AZ
  for_each = { for idx, cidr in var.private_subnet_cidrs : idx => cidr }

  vpc_id            = aws_vpc.this.id
  cidr_block        = each.value
  availability_zone = var.azs[each.key]

  tags = merge(var.tags, {
    Name = "${var.name}-private-${each.key}"
    Tier = "private"
  })
}

# modules/vpc/variables.tf
variable "name" {
  type = string
}

variable "vpc_cidr" {
  type = string
}

variable "private_subnet_cidrs" {
  type = list(string)
}

variable "azs" {
  type = list(string)
}

variable "tags" {
  type    = map(string)
  default = {}
}

# modules/vpc/outputs.tf
output "vpc_id" {
  value = aws_vpc.this.id
}

output "private_subnet_ids" {
  value = [for s in aws_subnet.private : s.id]
}
```
Calling Modules
```hcl
# environments/production/main.tf
module "vpc" {
  source = "../../modules/vpc"

  name                 = "production"
  vpc_cidr             = "10.0.0.0/16"
  private_subnet_cidrs = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  azs                  = ["ap-northeast-1a", "ap-northeast-1c", "ap-northeast-1d"]

  tags = local.common_tags
}

module "eks" {
  source = "../../modules/eks"

  cluster_name    = "production"
  cluster_version = "1.29"
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnet_ids
  node_count      = 5
  instance_types  = ["m6i.xlarge"]

  tags = local.common_tags
}

module "rds" {
  source = "../../modules/rds"

  name              = "production"
  engine_version    = "15.4"
  instance_class    = "db.r6g.xlarge"
  allocated_storage = 100
  vpc_id            = module.vpc.vpc_id
  subnet_ids        = module.vpc.private_subnet_ids
  multi_az          = true

  tags = local.common_tags
}

locals {
  common_tags = {
    Environment = "production"
    Project     = "myproject"
    ManagedBy   = "terraform"
  }
}
```
State Management
Remote State Storage
```hcl
# Infrastructure for the S3 backend
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"

  lifecycle {
    prevent_destroy = true
  }
}

# Keep every revision of the state file so earlier versions can be recovered
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# DynamoDB table used for state locking
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```
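State files can contain resource attributes in plain text, including secrets, so the bucket that holds them is usually locked down beyond encryption. The original configuration does not include this step. The sketch below is one possible addition, assuming the same `aws_s3_bucket.terraform_state` resource, that blocks all public access to the bucket.

```hcl
# Assumed addition: block every form of public access to the state bucket
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```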
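State Operations
```bash
# List all resources tracked in the state
terraform state list

# Show the attributes of one resource
terraform state show aws_vpc.main

# Rename a resource address in the state (no infrastructure change)
terraform state mv aws_vpc.main aws_vpc.primary

# Import an existing resource into the state
terraform import aws_vpc.main vpc-12345678

# Stop tracking a resource without destroying it
terraform state rm aws_vpc.legacy

# Back up the current state to a local file
terraform state pull > state_backup.json

# Sync the state with real infrastructure
# (deprecated; `terraform apply -refresh-only` is the recommended replacement)
terraform refresh
```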
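Two of the commands above have declarative counterparts that go through the normal plan and review cycle instead of editing state directly. The sketch below reuses the same addresses and VPC ID as the commands. It assumes Terraform 1.5 or later for `import` blocks and 1.7 or later for `removed` blocks, which is newer than the `>= 1.6.0` constraint used earlier in this article.

```hcl
# Equivalent of `terraform import aws_vpc.main vpc-12345678` (Terraform 1.5+).
# A matching resource "aws_vpc" "main" block must exist in the configuration.
import {
  to = aws_vpc.main
  id = "vpc-12345678"
}

# Equivalent of `terraform state rm aws_vpc.legacy` (Terraform 1.7+).
# The resource "aws_vpc" "legacy" block itself must be deleted from the configuration.
removed {
  from = aws_vpc.legacy

  lifecycle {
    destroy = false # forget the resource, but leave the real VPC in place
  }
}
```

Both blocks show up in `terraform plan` output, so the import or removal can be reviewed before it touches the state.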
```mermaid
sequenceDiagram
    participant Dev as Developer A
    participant Lock as DynamoDB Lock
    participant State as S3 State
    participant Cloud as AWS
    Dev->>Lock: Acquire state lock
    Lock-->>Dev: Lock acquired
    Dev->>State: Read current state
    State-->>Dev: terraform.tfstate
    Dev->>Cloud: API calls (create/update/delete)
    Cloud-->>Dev: Operation results
    Dev->>State: Update state file
    Dev->>Lock: Release lock
```
Workspaces
```bash
# Create workspaces (new also switches to the workspace it creates)
terraform workspace new staging
terraform workspace new production

# Switch to a workspace
terraform workspace select production

# List workspaces
terraform workspace list
```
```hcl
# Per-workspace configuration differences
locals {
  env_config = {
    dev = {
      instance_type = "t3.small"
      node_count    = 2
      multi_az      = false
    }
    staging = {
      instance_type = "t3.medium"
      node_count    = 3
      multi_az      = false
    }
    production = {
      instance_type = "m6i.large"
      node_count    = 5
      multi_az      = true
    }
  }

  # Select the settings for the currently selected workspace
  config = local.env_config[terraform.workspace]
}

resource "aws_instance" "app" {
  count = local.config.node_count

  instance_type = local.config.instance_type
  ami           = data.aws_ami.ubuntu.id

  tags = {
    Name        = "app-${terraform.workspace}-${count.index}"
    Environment = terraform.workspace
  }
}
```
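One caveat with the lookup `local.env_config[terraform.workspace]`: every configuration starts in a workspace named `default`, which has no entry in the map, so planning there fails with an invalid index error. Whether that failure is desirable depends on the team. The sketch below shows one possible alternative that falls back to the `dev` settings; the local name `config_with_fallback` is hypothetical.

```hcl
locals {
  # Fall back to the dev settings for any workspace without its own entry,
  # including the built-in "default" workspace.
  config_with_fallback = lookup(
    local.env_config,
    terraform.workspace,
    local.env_config["dev"],
  )
}
```

Keeping the strict index lookup instead is a reasonable choice when an unknown workspace should stop the run rather than silently create dev-sized resources.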
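Advanced Features
Lifecycle Rules
```hcl
# The four rules are shown together for illustration. In practice,
# prevent_destroy also rejects any plan that would replace the resource.
resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "m6i.large"

  lifecycle {
    # Create the replacement before destroying the old resource
    create_before_destroy = true

    # Prevent destruction (remove this rule before running destroy)
    prevent_destroy = true

    # Ignore changes made outside Terraform
    ignore_changes = [
      tags["LastModified"],
      user_data,
    ]

    # Replace this resource whenever the referenced attribute changes
    replace_triggered_by = [
      aws_security_group.app.id,
    ]
  }
}
```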
Moved Blocks (Refactoring)
```hcl
# When renaming a resource, use a moved block to avoid destroy + create
moved {
  from = aws_instance.web
  to   = aws_instance.app
}

# Migrating from count to for_each
moved {
  from = aws_subnet.private[0]
  to   = aws_subnet.private["az-a"]
}
```
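For the second `moved` block to apply cleanly, the resource must already be rewritten to use `for_each` with a key that matches the new address. The sketch below shows roughly what that definition could look like. The variable `private_subnets_by_az` and its attribute names are hypothetical and chosen only so that `"az-a"` is a valid key.

```hcl
# Hypothetical input: subnets keyed by a stable name such as "az-a"
variable "private_subnets_by_az" {
  type = map(object({
    cidr = string
    az   = string
  }))
}

resource "aws_subnet" "private" {
  for_each = var.private_subnets_by_az

  vpc_id            = aws_vpc.main.id
  cidr_block        = each.value.cidr
  availability_zone = each.value.az
}
```

One `moved` block is needed per existing instance (`[0]`, `[1]`, and so on). Any index without one is planned as a destroy, and its new key as a create.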
Conditional Expressions and Loops
```hcl
# Conditional creation
resource "aws_cloudwatch_log_group" "app" {
  count = var.enable_logging ? 1 : 0
  name  = "/app/${var.environment}"
}

# for expressions
locals {
  # Transform a list
  upper_names = [for name in var.names : upper(name)]

  # Transform a map
  tag_map = { for k, v in var.raw_tags : lower(k) => v }

  # Filter
  production_instances = [
    for instance in aws_instance.app : instance.id
    if instance.tags["Environment"] == "production"
  ]
}

# Iterate with for_each
resource "aws_iam_user" "users" {
  for_each = toset(var.user_names)
  name     = each.value
}
```
CI/CD Integration
```yaml
name: Terraform

on:
  pull_request:
    paths: ['infrastructure/**']
  push:
    branches: [main]
    paths: ['infrastructure/**']

# Cloud credential setup is omitted from this example
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.0"

      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure/environments/production

      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -out=tfplan
        working-directory: infrastructure/environments/production

      # Post the plan output as a comment on the pull request
      - name: Comment PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const plan = `${{ steps.plan.outputs.stdout }}`;
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Terraform Plan\n\`\`\`\n${plan}\n\`\`\``
            });

  apply:
    needs: plan
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      # Pin the same Terraform version as the plan job
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.0"

      - name: Terraform Apply
        run: |
          terraform init
          terraform apply -auto-approve
        working-directory: infrastructure/environments/production
```
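Best Practices
Never commit state files to Git: use a remote backend (S3, GCS, or Azure Blob).
Enable state locking: it prevents concurrent operations from corrupting the state.
Use modular design: follow the DRY principle and reuse shared modules.
Pin provider versions: this avoids breaking changes from unexpected upgrades.
Review plans: run plan for every change and apply only after review.
Manage sensitive data: mark sensitive outputs with sensitive = true and do not store secrets in tfvars files.
Standardize tags: manage resource tags centrally through default_tags.
Make small changes: avoid large one-shot changes to reduce risk.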
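The sensitive-data practice can be sketched as follows. The variable name, output name, and connection string are hypothetical. The idea is that the secret is declared without a default and supplied through an environment variable such as `TF_VAR_db_password`, not through a committed tfvars file.

```hcl
# Hypothetical secret: no default, so it must come from outside the repository,
# for example: export TF_VAR_db_password=...
variable "db_password" {
  description = "Database master password"
  type        = string
  sensitive   = true
}

# Any output derived from a sensitive value must also be marked sensitive,
# otherwise Terraform rejects the configuration.
output "db_url" {
  description = "Connection string (hypothetical host and database name)"
  value       = "postgres://app:${var.db_password}@db.example.internal:5432/app"
  sensitive   = true
}
```

Marking a value as sensitive only redacts it from CLI output. It is still written to the state file in plain text, which is one more reason to keep state in an encrypted, access-controlled remote backend.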
```mermaid
graph LR
    A["Write code"] --> B["terraform plan"]
    B --> C["Code review"]
    C --> D["terraform apply"]
    D --> E["Verify resources"]
    E --> F["Commit code"]
    style C fill:#FF9800,color:#fff
```
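Summary
Terraform manages infrastructure declaratively. Combined with modular design and remote state management, it makes infrastructure versioned, auditable, and reusable. For team collaboration, pairing a CI/CD pipeline with a strict plan-review-apply workflow allows complex multi-cloud infrastructure to be managed safely and efficiently.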