Getting Started

Warning: PaaSTA is an opinionated way to integrate a collection of open source components in a holistic way to build a PaaS. It is not optimized to be simple to deploy for operators. It is optimized to not reinvent the wheel and utilizes existing solutions to problems where possible.

PaaSTA has many dependencies. This document describes how to install some of them; the rest are left as an exercise for the reader.

PaaSTA is used in production at Yelp, and was never designed to be easy to deploy or installable from a single command (curl paasta.sh | sudo bash). We don’t install things that way at Yelp, and we don’t expect others to install things like that either. At Yelp we happen to use Puppet to deploy PaaSTA and the related components. Currently all of the Puppet code is not open source, but we hope to eventually have a fully working example deployment.

We do have an example cluster which uses docker-compose to create containers running the necessary components of a PaaSTA cluster. However, it is not a recommended production configuration.

paasta_tools

The paasta_tools package contains the PaaSTA CLI and other extra integration code that interacts with the other components. Binary packages of paasta_tools are currently not available, so one must build them and install them manually:

git clone git@github.com:Yelp/paasta.git
cd paasta
# Assuming you are on Ubuntu Trusty
make itest_trusty
sudo dpkg -i dist/paasta_tools*.deb

This package must be installed anywhere the PaaSTA CLI is used and on the Mesos/Marathon masters. If you are using SmartStack for service discovery, then the package must also be installed on the Mesos slaves so they can query the local API.

Once installed, paasta_tools reads global configuration from /etc/paasta/. This configuration is in key/value form encoded as JSON. All files in /etc/paasta are merged together to make it easy to deploy files with configuration management.

For example, one essential piece of configuration that must be deployed to every server that is a member of a particular cluster is the cluster setting:

# /etc/paasta/cluster.json
{
  "cluster": "test-cluster"
}

It is not necessary to define this config option for servers that only require the PaaSTA CLI tools (as they may not technically be part of any particular PaaSTA cluster).
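
The merging of files in /etc/paasta/ can be pictured with a short Python sketch. This is not the paasta_tools implementation: the function name load_merged_config, the sorted file order, and the shallow "later file wins" rule are all assumptions made for illustration.

```python
import json
from pathlib import Path


def load_merged_config(config_dir):
    """Merge every *.json file in config_dir into a single dict.

    Assumption (not taken from paasta_tools): files are read in sorted
    filename order, and a top-level key in a later file overrides the
    same key from an earlier file.
    """
    merged = {}
    for path in sorted(Path(config_dir).glob("*.json")):
        with path.open() as f:
            merged.update(json.load(f))
    return merged
```

Under these assumptions, a directory holding the cluster.json above plus a second JSON file with other keys yields one dictionary containing the keys from both, which is why configuration management can drop in one small file per setting.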

See the system paasta configs documentation for more details.

soa-configs

soa-configs are the shared configuration storage that PaaSTA uses to hold the description and configuration of what services exist and how they should be deployed and monitored.

This directory needs to be deployed globally in the same location to every server that runs any PaaSTA component. See the dedicated documentation on how to build your own soa-configs.

soa-configs also transport the deployments.json file for each service. This file contains a mapping of which SHAs should be deployed where. These files are generated by the generate_all_deployments command. This approach allows PaaSTA to inspect the deployments for each service once and distribute that information in soa-configs, as opposed to having each cluster inspect git directly.

Docker and a Docker Registry

PaaSTA uses Docker to build and distribute code for each service. PaaSTA assumes that a single registry is available and that the associated components (Docker commands, unix users, mesos slaves, etc) have the correct credentials to use it.

The Docker registry needs to be defined in a config file in /etc/paasta/. PaaSTA merges all JSON files in /etc/paasta/ together, so the actual filename is irrelevant. Here is an example /etc/paasta/docker.json:

{
  "docker_registry": "private-docker-registry.example.com:443"
}

There are many registries available to use, or you can host your own.

Mesos

PaaSTA uses Mesos to do the heavy lifting of running the actual services on pools of machines. See the official documentation on how to get started with Mesos.

Marathon

PaaSTA uses Marathon for supervising long-running services running in Mesos. See the official documentation for how to get started with Marathon. Then, see the PaaSTA documentation for how to define Marathon jobs.

Once Marathon jobs are defined in soa-configs, there are a few tools provided by PaaSTA that interact with the Marathon API:

  • deploy_marathon_services: Does the initial sync between soa-configs and the Marathon API. This is the tool that handles “bouncing” to a new version of code, and resizing Marathon applications when autoscaling is enabled. This is idempotent, and should be run periodically on a box with a marathon.json file in the system paasta config directory (usually /etc/paasta). We recommend running this frequently, because delays between runs of this command limit how quickly new versions of services or changes to soa-configs are picked up.
  • cleanup_marathon_jobs: Cleans up lost or abandoned services. This tool looks for Marathon jobs that are not defined in soa-configs and removes them.
  • check_marathon_services_replication: Iterates over all Marathon services and inspects their health. This tool integrates with the monitoring infrastructure and will alert the team responsible for the service if it becomes unhealthy to the point where manual intervention is required.
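
The relationship between the deploy and cleanup tools above can be sketched as two set comparisons between what is defined in soa-configs and what is running in Marathon. The Python below only illustrates that idea; the function names and the job ID strings are hypothetical, and the real tools talk to the Marathon API and read soa-configs rather than taking plain lists.

```python
def find_missing_jobs(defined_job_ids, running_job_ids):
    """Jobs defined in configuration but not running yet (deploy candidates)."""
    return sorted(set(defined_job_ids) - set(running_job_ids))


def find_abandoned_jobs(defined_job_ids, running_job_ids):
    """Jobs running but no longer defined in configuration (cleanup candidates)."""
    return sorted(set(running_job_ids) - set(defined_job_ids))


# Hypothetical job IDs, for illustration only.
defined = ["service_a.main", "service_b.main"]
running = ["service_b.main", "service_c.canary"]

to_deploy = find_missing_jobs(defined, running)
to_clean_up = find_abandoned_jobs(defined, running)
```

Because both functions depend only on the current defined and running sets, calling them again after the differences have been resolved returns empty lists, which is the property that makes periodic runs safe.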

Chronos

Chronos is a Mesos framework for running scheduled tasks. See the official documentation for how to get started with Chronos. Then, see the PaaSTA documentation for how to define Chronos jobs.

PaaSTA has tools for synchronizing jobs with the Chronos API:

  • deploy_chronos_jobs: This tool does the bouncing and initial setup of Chronos jobs that are defined in soa-configs. This is idempotent, and should be run periodically on a box with a chronos.json file in the system paasta config directory (usually /etc/paasta). We recommend running this frequently, because delays between runs of this command limit how quickly new versions of services or changes to soa-configs are picked up.
  • cleanup_chronos_jobs: Cleans up lost or abandoned Chronos jobs.
  • check_chronos_jobs: Iterates over the current status of the Chronos jobs associated with a service and alerts the team responsible when they start to fail.
  • list_chronos_jobs: Lists all the Chronos jobs in a cluster.

SmartStack and Hacheck

SmartStack is a dynamic service discovery system that allows clients to find and route to healthy Mesos tasks for a particular service. SmartStack consists of two agents: nerve and synapse. Nerve is responsible for health-checking services and registering them in ZooKeeper. Synapse then reads that data from ZooKeeper and configures an HAProxy instance.

To manage the configuration of nerve (detecting which services are running on a node, what port they are using, etc.), we have a package called nerve-tools. This repo builds a .deb package, which should be installed on all slaves. Each slave should run configure_nerve periodically. We recommend running it quite frequently (we run it every 5s), since Marathon tasks created by PaaSTA are not available to clients until nerve is reconfigured.

Similarly, to manage the configuration of synapse, we have a package called synapse-tools. Each slave should have this installed, and should run configure_synapse periodically. configure_synapse can run less frequently than configure_nerve, since it only limits how quickly a new service, a new service instance, or a changed HAProxy option in smartstack.yaml takes effect.

Alongside SmartStack, we run hacheck. Hacheck is a small HTTP service that handles health checks for services. nerve-tools and synapse-tools configure nerve and HAProxy, respectively, to send their health check requests through hacheck. Hacheck provides several behaviors that are useful for PaaSTA:

  • It caches health check results for a short period of time (1 second, by default). This avoids overloading services if many health check requests arrive in a short period of time.
  • It can preemptively return error codes for health checks, allowing us to remove a task from load balancers before shutting it down. (This is implemented in the HacheckDrainMethod.)
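
The two behaviors listed above can be illustrated with a small Python sketch. This is not hacheck's code: the class name CachedHealthCheck, its methods, and the injectable clock are hypothetical, and the real service works over HTTP rather than with in-process callables.

```python
import time


class CachedHealthCheck:
    """Wrap a health check callable with a short cache and a 'down' override."""

    def __init__(self, check, ttl_seconds=1.0, clock=time.monotonic):
        self._check = check        # callable: key -> bool
        self._ttl = ttl_seconds    # how long a result is reused
        self._clock = clock        # injectable so the cache can be tested
        self._cache = {}           # key -> (expires_at, result)
        self._down = set()         # keys forced to report unhealthy

    def mark_down(self, key):
        """Report key as unhealthy from now on, without asking the service."""
        self._down.add(key)

    def get(self, key):
        if key in self._down:
            return False
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]        # still fresh: reuse the cached result
        result = self._check(key)
        self._cache[key] = (now + self._ttl, result)
        return result
```

With a one second TTL, a burst of requests for the same key within that second reaches the underlying service only once, and mark_down lets a task be reported unhealthy before it is actually stopped.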

Packages for nerve-tools and synapse-tools are available in our bintray repo.

Sensu

Sensu is a flexible and scalable monitoring system that allows clients to send alerts for arbitrary events. PaaSTA uses Sensu to allow individual teams to get alerts for their services.

The official documentation has instructions on how to set it up.

Out of the box, Sensu doesn’t understand team-centric routing, and must be combined with handlers that are “team aware” if it is installed in a multi-tenant environment. To do that, we have written some custom Sensu handlers.

Sensu is an optional but highly recommended component.

Jenkins / Build Orchestration

Jenkins is the suggested method for orchestrating build pipelines for services, but it is not a hard requirement. The actual method that Yelp uses to integrate Jenkins with PaaSTA is not open source.

In practice, each organization will have to decide how they want to actually run the paasta cli tool to kick off the building and deploying of images. This may be something as simple as a bash script:

#!/bin/bash
# Stop at the first failing step so a failed itest is never pushed or deployed.
set -euo pipefail
service=my_service
sha=$(git rev-parse HEAD)
paasta itest --service "$service" --commit "$sha"
paasta push-to-registry --service "$service" --commit "$sha"
paasta mark-for-deployment --git-url "$(git config --get remote.origin.url)" --commit "$sha" --deploy-group prod.main --service "$service"

PaaSTA can integrate with any existing orchestration tool that can execute commands like this.
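
For an orchestration tool that is driven from Python rather than bash, the same three steps might look like the sketch below. It reuses only the paasta subcommands and flags from the script above; the function names build_pipeline_commands and run_pipeline and the runner parameter are hypothetical and exist only for this illustration.

```python
import subprocess


def build_pipeline_commands(service, sha, git_url, deploy_group):
    """Return the paasta invocations, in order, as argv lists."""
    return [
        ["paasta", "itest", "--service", service, "--commit", sha],
        ["paasta", "push-to-registry", "--service", service, "--commit", sha],
        [
            "paasta", "mark-for-deployment",
            "--git-url", git_url,
            "--commit", sha,
            "--deploy-group", deploy_group,
            "--service", service,
        ],
    ]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order; check=True raises on the first failure."""
    for argv in commands:
        runner(argv, check=True)
```

Passing the result of build_pipeline_commands to run_pipeline executes the steps with subprocess.run, and a non-zero exit status from any step raises subprocess.CalledProcessError so that later steps do not run.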

Logging

PaaSTA can use one of several backends to centrally log events about what is happening in the infrastructure and to power paasta logs. The available backends are listed in the system config docs under log_writer and log_reader.

At Yelp, we use Scribe for log writing, so we use the scribe log writer. For reading logs, we have some in-house tools that are unfortunately not open source. The code that reads from these in-house tools is the scribereader log_reader driver, but it relies on code that is not open source, so we do not expect that logging via Scribe will work outside of Yelp.

The file log writer driver may be useful for getting log data into your logging system, but files are not generally aggregated across the whole cluster in a way that is useful for paasta logs. We are in need of alternate log reader drivers, so please file an issue (or better yet, a pull request).