Getting Started

Warning: PaaSTA is an opinionated way to integrate a collection of open source components in a holistic way to build a PaaS. It is not optimized to be simple to deploy for operators. It is optimized to not reinvent the wheel and utilizes existing solutions to problems where possible.

PaaSTA has many dependencies. This document provides documentation on installing some of these dependencies, but some of them are left as an exercise to the reader.

PaaSTA is used in production at Yelp, but it was never designed to be easy to deploy or installable from a single command (curl paasta.sh | sudo bash). We don’t install things that way at Yelp, and we don’t expect others to install things like that either. At Yelp we happen to use Puppet to deploy PaaSTA and the related components. Currently all of the Puppet code is not open source, but we hope to eventually have a fully working example deployment.

paasta_tools

The paasta_tools package contains the PaaSTA CLI and other extra integration code that interacts with the other components. Binary packages of paasta_tools are currently not available, so one must build them and install them manually:

git clone git@github.com:Yelp/paasta.git
# The make targets must be run from inside the checkout
cd paasta
# Assuming you are on Ubuntu Jammy or Noble
make itest_jammy
# or
make itest_noble
sudo dpkg -i dist/paasta-tools*.deb

This package must be installed anywhere the PaaSTA CLI is needed and on the Kubernetes nodes.

Once installed, paasta_tools reads global configuration from /etc/paasta/. This configuration is in key/value form encoded as JSON. All files in /etc/paasta are merged together to make it easy to deploy files with configuration management.

For example, one essential piece of configuration that must be deployed to servers that are members of a particular cluster is the cluster setting:

# /etc/paasta/cluster.json
{
  "cluster": "test-cluster"
}

It is not necessary to define this config option for servers that only require the PaaSTA CLI tools (as they may not technically be part of any particular PaaSTA cluster).
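The merging behavior described above can be pictured with a minimal Python sketch. This is not the paasta_tools implementation: the function name load_merged_config is hypothetical, and the shallow merge in sorted filename order is an assumption made only for illustration.

```python
import json
from pathlib import Path


def load_merged_config(config_dir="/etc/paasta"):
    """Merge the top-level keys of every *.json file in config_dir.

    Hypothetical helper for illustration only. Files are read in sorted
    filename order and merged shallowly, so a later file overrides an
    earlier one on a key collision (an assumption, not documented behavior).
    """
    merged = {}
    for path in sorted(Path(config_dir).glob("*.json")):
        with path.open() as f:
            merged.update(json.load(f))
    return merged
```

With one file holding the cluster key and another holding a different key, the result is a single dictionary containing both, which is why the individual filenames do not matter.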

See the system paasta configs documentation for more details.

soa-configs

soa-configs are the shared configuration storage that PaaSTA uses to hold the description and configuration of what services exist and how they should be deployed and monitored.

This directory needs to be deployed globally in the same location to every server that runs any PaaSTA component. See the dedicated documentation on how to build your own soa-configs.

soa-configs also transport the deployments.json file for each service. Each of these files contains a mapping of which SHAs should be deployed where, and is generated by the generate_all_deployments command. This method allows PaaSTA to inspect the deployments for each service once and distribute that information in soa-configs, as opposed to having each cluster inspect git directly.

Docker and a Docker Registry

PaaSTA uses Docker to build and distribute code for each service. PaaSTA assumes that a single registry is available and that the associated components (Docker commands, unix users, Kubernetes Nodes, etc) have the correct credentials to use it.

The Docker registry needs to be defined in a config file in /etc/paasta/. PaaSTA merges all JSON files in /etc/paasta/ together, so the actual filename is irrelevant, but here is an example /etc/paasta/docker.json:

{
  "docker_registry": "private-docker-registry.example.com:443"
}

There are many registries available to use, or you can host your own.

Kubernetes

PaaSTA uses Kubernetes to manage and orchestrate its containerized services. See the PaaSTA documentation for how to define PaaSTA services in Kubernetes.

Once PaaSTA services are defined in soa-configs, there are a few tools provided by PaaSTA that interact with the Kubernetes API:

setup_kubernetes_job: Does the initial sync between soa-configs and the Kubernetes API. This is the tool that handles “bouncing” to new versions of code, and resizing Kubernetes deployments when autoscaling is enabled. It is idempotent and is run periodically on a box with a deployments.json file in the /nail/etc/services directory, updating or creating the Kubernetes Deployment object representing the modified service instance.
cleanup_kubernetes_jobs: Cleans up lost or abandoned services. This tool looks for Kubernetes instances that are not defined in soa-configs and removes them.
check_kubernetes_services_replication: Iterates over all Kubernetes services and inspects their health. This tool integrates with the monitoring infrastructure and will alert the team responsible for the service if it becomes unhealthy to the point where manual intervention is required.
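The cleanup step in the list above is, conceptually, a set difference between what is running and what soa-configs defines. The Python sketch below illustrates only that idea. It is not the code of cleanup_kubernetes_jobs; the function name find_abandoned and the (service, instance) tuple representation are hypothetical.

```python
def find_abandoned(running, defined):
    """Return (service, instance) pairs that are running but not defined.

    Hypothetical illustration: `running` stands in for what the Kubernetes
    API reports, and `defined` for what soa-configs declares.
    """
    return sorted(set(running) - set(defined))


running = [("my_service", "main"), ("old_service", "main")]
defined = [("my_service", "main")]
abandoned = find_abandoned(running, defined)
```

Here abandoned contains only ("old_service", "main"), the instance a cleanup pass would remove. Running the same comparison again after removal yields an empty list, which is what makes a periodic cleanup safe to repeat.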

SmartStack and Hacheck

SmartStack is a dynamic service discovery system that allows clients to find and route to healthy Kubernetes Pods for a particular service. SmartStack consists of two agents: nerve and synapse. Nerve is responsible for health-checking services and registering them in ZooKeeper. Synapse then reads that data from ZooKeeper and configures an HAProxy instance.

To manage the configuration of nerve (detecting which services are running on a node, what ports they are using, etc.), we have a package called nerve-tools. This repo builds a .deb package, which should be installed on all nodes. Each node should run configure_nerve periodically. We recommend running it quite frequently (we run it every 5s), since Kubernetes Pods created by PaaSTA are not available to clients until nerve is reconfigured.

Similarly, to manage the configuration of synapse, we have a package called synapse-tools. Each node should have this installed, and should run configure_synapse periodically. configure_synapse can run less frequently than configure_nerve, because its interval only limits how quickly a new service, a new service instance, or an HAProxy option change in smartstack.yaml takes effect.
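Because a 5s interval is shorter than cron's one-minute granularity, the periodic runs described above need some kind of loop or supervisor. The following Python sketch shows one possible shape for such a loop. The run_periodically helper, its parameters, and the idea of invoking configure_nerve with no arguments are all assumptions for illustration; this is not how Yelp schedules these tools.

```python
import subprocess
import time


def run_periodically(command, interval_s, iterations=None,
                     run=subprocess.call, sleep=time.sleep):
    """Run `command` every `interval_s` seconds (hypothetical helper).

    `iterations=None` loops forever. `run` and `sleep` are injectable so
    the loop can be exercised without executing anything or waiting.
    Returns the number of times the command was run.
    """
    count = 0
    while iterations is None or count < iterations:
        started = time.monotonic()
        run(list(command))
        count += 1
        # Subtract the time the command itself took so runs stay evenly spaced.
        elapsed = time.monotonic() - started
        sleep(max(0.0, interval_s - elapsed))
    return count
```

A call such as run_periodically(["configure_nerve"], 5.0) would loop indefinitely, and a longer interval could be used for configure_synapse.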

Alongside SmartStack, we run hacheck. Hacheck is a small HTTP service that handles health checks for services. nerve-tools and synapse-tools configure nerve and HAProxy, respectively, to send their health check requests through hacheck. Hacheck provides several behaviors that are useful for PaaSTA:

It caches health check results for a short period of time (1 second, by default). This avoids overloading services if many health check requests arrive in a short period of time.

It can preemptively return error codes for health checks, allowing us to remove a task from load balancers before shutting it down. (This is implemented in the HacheckDrainMethod.)
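The two behaviors above can be sketched together in a few lines of Python. This is a conceptual illustration, not hacheck's implementation: the class CachingHealthChecker, its method names, and the use of 200 and 503 as status codes are hypothetical choices made for the example.

```python
import time


class CachingHealthChecker:
    """Cache health check results briefly and allow a service to be forced down."""

    def __init__(self, check, ttl_s=1.0, clock=time.monotonic):
        self._check = check      # callable: service name -> status code
        self._ttl_s = ttl_s      # how long a cached result stays valid
        self._clock = clock      # injectable clock, useful for testing
        self._cache = {}         # service -> (expires_at, status)
        self._draining = set()   # services forced to report unhealthy

    def mark_down(self, service):
        """Start failing health checks for `service` without contacting it."""
        self._draining.add(service)

    def mark_up(self, service):
        """Resume normal health checking for `service`."""
        self._draining.discard(service)

    def status(self, service):
        if service in self._draining:
            return 503  # preemptive failure so load balancers drop the task
        now = self._clock()
        cached = self._cache.get(service)
        if cached is not None and now < cached[0]:
            return cached[1]  # still fresh: do not hit the service again
        result = self._check(service)
        self._cache[service] = (now + self._ttl_s, result)
        return result
```

In this sketch, many requests arriving within the TTL trigger only one real check against the service, and calling mark_down before a shutdown makes every subsequent check fail immediately.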

Sensu

Sensu is a flexible and scalable monitoring system that allows clients to send alerts for arbitrary events. PaaSTA uses Sensu to allow individual teams to get alerts for their services.

The official documentation has instructions on how to set it up.

Out of the box, Sensu doesn’t understand team-centric routing, and must be combined with handlers that are “team aware” if it is installed in a multi-tenant environment. To do that, we have written some custom Sensu handlers.

Sensu is an optional but highly recommended component.

Jenkins / Build Orchestration

Jenkins is the suggested method for orchestrating build pipelines for services, but it is not a hard requirement. The actual method that Yelp uses to integrate Jenkins with PaaSTA is not open source.

In practice, each organization will have to decide how they want to actually run the paasta cli tool to kick off the building and deploying of images. This may be something as simple as a bash script:

#!/bin/bash
# Stop at the first failing step so a broken build is never marked for deployment
set -euo pipefail
service=my_service
sha=$(git rev-parse HEAD)
paasta itest --service "$service" --commit "$sha"
paasta push-to-registry --service "$service" --commit "$sha"
paasta mark-for-deployment --git-url "$(git config --get remote.origin.url)" --commit "$sha" --deploy-group prod.main --service "$service"

PaaSTA can integrate with any existing orchestration tool that can execute commands like this.

Logging

PaaSTA can use one of several backends to centrally log events about what is happening in the infrastructure and to power paasta logs. The available backends are listed in the system config docs under log_writer and log_reader.

At Yelp, we use Scribe for log writing, so we use the scribe log writer. For reading logs, we have some in-house tools that are unfortunately not open source. The code that reads from these in-house tools is the scribereader log_reader driver, but it relies on code that is not open source, so we do not expect that logging via Scribe will work outside of Yelp.

The file log writer driver may be useful for getting log data into your logging system, but files are not generally aggregated across the whole cluster in a way that is useful for paasta logs. We are in need of an alternate log reader driver, so please file an issue (or better yet, a pull request).