canada-kaktus/README.md
Julian Haseleu 4805faf9db
chore(): added docs
2025-10-07 09:38:38 +00:00


# Canada Kaktus Documentation
[![Build Status](https://git.uploadfilter24.eu/covidnetes/canada-kaktus/actions/workflows/main.yaml/badge.svg?branch=main)](https://git.uploadfilter24.eu/covidnetes/canada-kaktus/actions)
## Overview
Canada Kaktus is a Kubernetes controller that automatically manages Cilium LoadBalancer IP pools by synchronizing them with Hetzner Cloud server instances. Every 15 minutes it queries the Hetzner Cloud servers matching a specific label selector and updates a `CiliumLoadBalancerIPPool` custom resource so that the IP pool used for load-balancing services stays up to date.
## Purpose
The application serves as a bridge between Hetzner Cloud infrastructure and Kubernetes/Cilium networking, ensuring that load balancer IP pools always reflect the current set of available server instances. This automation eliminates the need for manual IP pool management when servers are added to or removed from the cluster.
## Architecture
### Components
1. **Configuration Management** (`config.go`)
   - Environment-based configuration with default values
   - Supports JSON configuration files with auto-reload capability
   - Manages Hetzner Cloud API tokens and label selectors
2. **Health Monitoring** (`health.go`)
   - HTTP health endpoint on port 8080
   - Thread-safe health state management
   - RESTful health checks for Kubernetes probes
3. **Hetzner Cloud Integration** (`hetzner.go`)
   - Interacts with Hetzner Cloud API
   - Discovers servers based on label selectors
   - Extracts public IPv4 addresses from server instances
4. **Kubernetes Integration** (`k8s.go`)
   - Manages Cilium LoadBalancer IP Pool CRDs
   - In-cluster Kubernetes client configuration
   - Template-based CRD generation and updates
5. **Logging** (`utils/logging.go`)
   - Structured JSON logging with configurable levels
   - Contextual logging with caller information
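
The Hetzner Cloud integration essentially turns a list of servers into a list of public IPv4 addresses. The following is a minimal sketch of that step, not the code in `hetzner.go`: the `server` struct is a hypothetical stand-in for the SDK's server type, and the function name `extractIPv4` is an assumption made for illustration. It uses only the standard library.

```go
package main

import (
	"fmt"
	"net/netip"
)

// server is a hypothetical stand-in for the Hetzner SDK's server type.
type server struct {
	Name       string
	PublicIPv4 string // empty when the server has no public IPv4 address
}

// extractIPv4 returns the public IPv4 addresses of the given servers.
// It fails on the first missing or malformed address so that the caller
// can mark the whole cycle as failed.
func extractIPv4(servers []server) ([]netip.Addr, error) {
	ips := make([]netip.Addr, 0, len(servers))
	for _, s := range servers {
		addr, err := netip.ParseAddr(s.PublicIPv4)
		if err != nil {
			return nil, fmt.Errorf("server %q: %w", s.Name, err)
		}
		if !addr.Is4() {
			return nil, fmt.Errorf("server %q: %s is not an IPv4 address", s.Name, addr)
		}
		ips = append(ips, addr)
	}
	return ips, nil
}

func main() {
	ips, err := extractIPv4([]server{
		{Name: "node-1", PublicIPv4: "192.168.1.100"},
		{Name: "node-2", PublicIPv4: "192.168.1.101"},
	})
	fmt.Println(ips, err)
}
```

Failing the whole extraction on one bad address matches the "IPs Valid?" branch of the processing flow; skipping invalid entries instead would be an equally reasonable choice.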
## Processing Flow
```mermaid
graph TD
A[Application Start] --> B[Load Configuration]
B --> C[Configure Logger]
C --> D[Start Health Server]
D --> E[Enter Main Loop]
E --> F[Query Hetzner Cloud API]
F --> G{Servers Found?}
G -->|No| H[Log Error & Set Unhealthy]
G -->|Yes| I[Extract IP Addresses]
I --> J{IPs Valid?}
J -->|No| K[Log Error & Set Unhealthy]
J -->|Yes| L[Get Current CRD Resource Version]
L --> M[Generate IP Pool Template]
M --> N[Update Kubernetes CRD]
N --> O{Update Successful?}
O -->|No| P[Log Error & Set Unhealthy]
O -->|Yes| Q[Log Success & Set Healthy]
H --> R[Wait 15 minutes]
K --> R
P --> R
Q --> R
R --> E
subgraph "Health Endpoint"
S[HTTP GET /health] --> T[Return Health Status]
end
subgraph "Hetzner Cloud"
U[Server Instances] --> V[Label Selector Filter]
V --> W[Public IPv4 Addresses]
end
subgraph "Kubernetes"
X[CiliumLoadBalancerIPPool CRD] --> Y[IP Pool Configuration]
Y --> Z[Load Balancer Services]
end
```
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `CANADA_KAKTUS_LOGLEVEL` | `Info` | Logging level (Debug, Info, Warn, Error) |
| `CANADA_KAKTUS_LABELSELECTOR` | `kops.k8s.io/instance-role=Node` | Label selector for Hetzner Cloud servers |
| `CANADA_KAKTUS_HCLOUD_TOKEN` | *(required)* | Hetzner Cloud API token |
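
To make the defaults in the table concrete, here is a minimal sketch of environment-based loading using only the standard library. The project itself lists `github.com/jinzhu/configor` for this purpose, so the `config` struct and the `loadConfig` function below are illustrative assumptions rather than the actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// config mirrors the three settings from the table above.
type config struct {
	LogLevel      string
	LabelSelector string
	HcloudToken   string
}

// loadConfig reads the settings through getenv (os.Getenv in production),
// applies the documented defaults, and rejects a missing token.
func loadConfig(getenv func(string) string) (config, error) {
	orDefault := func(key, def string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return def
	}
	cfg := config{
		LogLevel:      orDefault("CANADA_KAKTUS_LOGLEVEL", "Info"),
		LabelSelector: orDefault("CANADA_KAKTUS_LABELSELECTOR", "kops.k8s.io/instance-role=Node"),
		HcloudToken:   getenv("CANADA_KAKTUS_HCLOUD_TOKEN"),
	}
	if cfg.HcloudToken == "" {
		return config{}, errors.New("CANADA_KAKTUS_HCLOUD_TOKEN is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("log level %s, label selector %s\n", cfg.LogLevel, cfg.LabelSelector)
}
```

Passing the lookup function as a parameter keeps the loader testable without touching the process environment.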
### Configuration File
Optionally, a `config.json` file can be used with auto-reload capability:
```json
{
  "LogLevel": "Info",
  "LabelSelector": "kops.k8s.io/instance-role=Node",
  "HcloudToken": "your-hetzner-token-here"
}
```
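
As a rough illustration of how such a file maps onto a Go struct, the sketch below decodes the document with `encoding/json`. The `fileConfig` type and `parseConfig` function are hypothetical names, and auto-reload is not covered here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fileConfig matches the keys of the config.json example above.
type fileConfig struct {
	LogLevel      string
	LabelSelector string
	HcloudToken   string
}

// parseConfig decodes a config.json document and rejects unknown keys,
// so that a misspelled setting is reported instead of silently ignored.
func parseConfig(raw string) (fileConfig, error) {
	var cfg fileConfig
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&cfg); err != nil {
		return fileConfig{}, fmt.Errorf("parse config: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig(`{"LogLevel": "Debug", "LabelSelector": "role=worker", "HcloudToken": "example"}`)
	fmt.Println(cfg.LogLevel, cfg.LabelSelector, err)
}
```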
## Deployment
### Prerequisites
- Kubernetes cluster with Cilium CNI
- Hetzner Cloud API token with read access to servers
- Proper RBAC permissions for CRD management
### Required Kubernetes Permissions
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: canada-kaktus
rules:
  - apiGroups: ["cilium.io"]
    resources: ["ciliumloadbalancerippools"]
    verbs: ["get", "create", "update", "patch"]
```
### Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: canada-kaktus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: canada-kaktus
  template:
    metadata:
      labels:
        app: canada-kaktus
    spec:
      containers:
        - name: canada-kaktus
          image: your-registry/canada-kaktus:latest
          env:
            - name: CANADA_KAKTUS_HCLOUD_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hetzner-credentials
                  key: token
          ports:
            - containerPort: 8080
              name: health
          livenessProbe:
            httpGet:
              path: /health
              port: health
            initialDelaySeconds: 30
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: health
            initialDelaySeconds: 5
            periodSeconds: 10
```
## API Endpoints
### Health Check
- **URL**: `GET /health`
- **Port**: `8080`
- **Response Codes**:
  - `200 OK`: The most recent processing cycle completed successfully
  - `503 Service Unavailable`: The most recent processing cycle failed
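
The endpoint's behavior can be sketched with a small thread-safe handler. This is an illustration under stated assumptions, not the code in `health.go`: the `health` type, its `Set` method, and the `probe` helper are hypothetical names, and reporting `503` until the first successful cycle is an assumption the source does not confirm.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// health holds the outcome of the most recent processing cycle.
// The zero value is "unhealthy" (an assumption, see the text above).
type health struct {
	ok atomic.Bool
}

// Set records whether the last cycle succeeded. It is safe to call
// from the main loop while probes are being served concurrently.
func (h *health) Set(ok bool) { h.ok.Store(ok) }

// ServeHTTP answers 200 after a successful cycle and 503 otherwise.
func (h *health) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	if h.ok.Load() {
		w.WriteHeader(http.StatusOK)
		return
	}
	w.WriteHeader(http.StatusServiceUnavailable)
}

// probe issues an in-memory GET /health and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	return rec.Code
}

func main() {
	h := &health{}
	fmt.Println("before first cycle:", probe(h))
	h.Set(true)
	fmt.Println("after successful cycle:", probe(h))
}
```

In a running service the handler would be registered on a router and served on port 8080, for example with `http.Handle("/health", h)` followed by `http.ListenAndServe(":8080", nil)`.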
## Generated Resources
### Cilium LoadBalancer IP Pool CRD
The application generates and maintains a `CiliumLoadBalancerIPPool` resource:
```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: covidnetes-pool
  annotations:
    argocd.argoproj.io/tracking-id: "cilium-lb:cilium.io/CiliumLoadBalancerIPPool:kube-system/covidnetes-pool"
    managed-by: "canada-kaktus"
spec:
  blocks:
    - cidr: "192.168.1.100/32"
    - cidr: "192.168.1.101/32"
  disabled: false
```
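
Each discovered server address becomes a single-host `/32` block in the pool. A minimal sketch of template-based generation with `text/template` follows. The template text, the `pool` struct, and the placement of `resourceVersion` under `metadata` are assumptions made for illustration (annotations are omitted for brevity); the project's actual template may differ.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// poolTemplate is a hypothetical template; each IP is rendered as a /32 block.
const poolTemplate = `apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: {{ .Name }}
  resourceVersion: "{{ .ResourceVersion }}"
spec:
  blocks:
{{- range .IPs }}
    - cidr: "{{ . }}/32"
{{- end }}
  disabled: false
`

// pool carries the values substituted into the template.
type pool struct {
	Name            string
	ResourceVersion string
	IPs             []string
}

var tmpl = template.Must(template.New("pool").Parse(poolTemplate))

// renderPool returns the YAML manifest for the given pool.
func renderPool(p pool) (string, error) {
	var sb strings.Builder
	if err := tmpl.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderPool(pool{
		Name:            "covidnetes-pool",
		ResourceVersion: "42",
		IPs:             []string{"192.168.1.100", "192.168.1.101"},
	})
	if err != nil {
		fmt.Println("render error:", err)
		return
	}
	fmt.Print(out)
}
```

Carrying the current resource version in the manifest lets the Kubernetes API server reject an update that is based on a stale copy of the object.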
## Operation Details
### Main Loop Behavior
1. **Interval**: Runs every 15 minutes
2. **Error Handling**: Non-fatal errors are logged and health status is updated
3. **Resilience**: Continues operation despite temporary failures
4. **State Management**: Maintains health status for monitoring systems
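
The loop behavior described above can be sketched as follows. The `steps` struct and the `runCycle` and `loop` functions are hypothetical names, and the Hetzner and Kubernetes calls are represented by plain function values so that the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// interval is the pause between two cycles.
const interval = 15 * time.Minute

// steps bundles the external operations of one cycle as function values
// (hypothetical stand-ins for the Hetzner and Kubernetes integrations).
type steps struct {
	listIPs    func() ([]string, error)
	updatePool func(ips []string) error
}

// runCycle executes one iteration and returns the first error it hits.
func runCycle(s steps) error {
	ips, err := s.listIPs()
	if err != nil {
		return fmt.Errorf("query hetzner: %w", err)
	}
	if len(ips) == 0 {
		return errors.New("no servers matched the label selector")
	}
	if err := s.updatePool(ips); err != nil {
		return fmt.Errorf("update ip pool: %w", err)
	}
	return nil
}

// loop runs a cycle, records the outcome via setHealthy, and then waits
// for the next tick. Errors are logged but never terminate the loop;
// only closing stop does.
func loop(s steps, setHealthy func(bool), stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		err := runCycle(s)
		if err != nil {
			fmt.Println("cycle failed:", err)
		}
		setHealthy(err == nil)
		select {
		case <-ticker.C:
		case <-stop:
			return
		}
	}
}

func main() {
	s := steps{
		listIPs:    func() ([]string, error) { return []string{"192.168.1.100"}, nil },
		updatePool: func(ips []string) error { return nil },
	}
	stop := make(chan struct{})
	close(stop) // stop after the first cycle in this demo
	loop(s, func(ok bool) { fmt.Println("healthy:", ok) }, stop)
}
```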
### Error Scenarios
- **Hetzner API Failures**: Network issues, authentication problems, rate limiting
- **Kubernetes API Failures**: RBAC issues, CRD not found, API server unavailable
- **Configuration Issues**: Invalid tokens, missing permissions, malformed templates
### Logging
All operations are logged in structured JSON format, including:
- Timestamp
- Log level
- Caller information
- Contextual details
- Error messages
Example log entry:
```json
{
  "Caller": "Main",
  "level": "info",
  "msg": "Successfully recreated IP Pool CRD",
  "time": "2025-10-07T10:30:00Z"
}
```
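
An entry of this shape can be reproduced with the standard library alone. The sketch below is a stand-in for illustration only: the project lists `github.com/sirupsen/logrus` for logging, and the `entry` type and `formatEntry` function are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// entry mirrors the fields of the example log line above.
type entry struct {
	Caller string `json:"Caller"`
	Level  string `json:"level"`
	Msg    string `json:"msg"`
	Time   string `json:"time"`
}

// formatEntry renders one log line as JSON with an RFC 3339 UTC timestamp
// derived from the given Unix time in seconds.
func formatEntry(caller, level, msg string, unixSeconds int64) (string, error) {
	b, err := json.Marshal(entry{
		Caller: caller,
		Level:  level,
		Msg:    msg,
		Time:   time.Unix(unixSeconds, 0).UTC().Format(time.RFC3339),
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	line, err := formatEntry("Main", "info", "Successfully recreated IP Pool CRD", time.Now().Unix())
	fmt.Println(line, err)
}
```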
## Dependencies
### Go Modules
- **Hetzner Cloud SDK**: `github.com/hetznercloud/hcloud-go` - Hetzner Cloud API client
- **Kubernetes Client**: `k8s.io/client-go` - Kubernetes API interactions
- **Configuration**: `github.com/jinzhu/configor` - Environment and file-based config
- **Logging**: `github.com/sirupsen/logrus` - Structured logging
- **HTTP Router**: `github.com/gorilla/mux` - Health endpoint routing
### External Services
- **Hetzner Cloud API**: Server discovery and metadata retrieval
- **Kubernetes API**: CRD management and cluster integration
- **Cilium**: LoadBalancer IP pool consumption
## Monitoring and Observability
### Health Monitoring
- HTTP health endpoint for liveness/readiness probes
- Health status reflects the success of the last operation cycle
- Automatic health status updates on errors
### Logging
- Configurable log levels (Debug, Info, Warn, Error)
- Structured JSON output for log aggregation
- Contextual information for debugging
### Metrics
Currently, the application exposes only its health status through the HTTP endpoint and does not export metrics. For production deployments, consider adding:
- Prometheus metrics for operation success/failure rates
- Timing metrics for API calls
- Counter metrics for IP pool updates
## Troubleshooting
### Common Issues
1. **Authentication Failures**
   - Verify Hetzner Cloud token is valid and has necessary permissions
   - Check token is correctly set in environment variable
2. **No Servers Found**
   - Verify label selector matches your server configuration
   - Check servers exist in the configured Hetzner project
3. **Kubernetes Permission Errors**
   - Ensure proper RBAC permissions for CRD access
   - Verify service account has necessary cluster roles
4. **Health Endpoint Unavailable**
   - Check port 8080 is accessible
   - Verify no port conflicts in the cluster
### Debug Mode
Enable debug logging by setting:
```bash
export CANADA_KAKTUS_LOGLEVEL=Debug
```
This provides detailed information about:
- Server discovery process
- IP address extraction
- CRD template generation
- Kubernetes API interactions
## Development
### Building
```bash
go mod download
go build -o canada-kaktus ./cmd/main.go
```
### Testing
```bash
go test ./internal/...
```
### Local Development
For local testing, ensure you have:
- Valid Hetzner Cloud token
- Kubernetes cluster access (can use kind/minikube)
- Cilium installed in the cluster
Set environment variables and run:
```bash
export CANADA_KAKTUS_HCLOUD_TOKEN="your-token"
go run ./cmd/main.go
```