Getting Started
===============

**Warning**: PaaSTA is an opinionated way to integrate a collection of open
source components into a holistic PaaS. It is not optimized to be
simple for operators to deploy. It is optimized to avoid reinventing the wheel,
reusing existing solutions to problems where possible.

PaaSTA has many dependencies. This document describes how to install some of
these dependencies; others are left as an exercise for the reader.

PaaSTA is used *in production* at Yelp, and has never been designed to be
easy to deploy or installable from a single command (``curl paasta.sh | sudo bash``).
We don't install things that way at Yelp, and we don't expect others
to install things like that either. At Yelp we happen to use Puppet to deploy
PaaSTA and the related components. Our Puppet code is not currently
open source, but we hope to eventually have a fully working example deployment.

paasta_tools
------------

The ``paasta_tools`` package contains the PaaSTA CLI and other integration
code that interacts with the other components. Binary packages of ``paasta_tools``
are currently not available, so you must build and install them manually::

  git clone git@github.com:Yelp/paasta.git
  cd paasta
  # Build the package for your Ubuntu release (Jammy or Noble):
  make itest_jammy
  # or
  make itest_noble
  sudo dpkg -i dist/paasta-tools*.deb

This package must be installed anywhere the PaaSTA CLI is needed, as well as on every Kubernetes node.

Once installed, ``paasta_tools`` reads global configuration from ``/etc/paasta/``.
This configuration is in key/value form encoded as JSON. All files in ``/etc/paasta``
are merged together to make it easy to deploy files with configuration management.
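
Because every JSON file in ``/etc/paasta`` is merged, configuration management can own
one small file per concern. The following is a minimal sketch of installing such a
fragment; the ``install_paasta_fragment`` helper is hypothetical (it is not part of
``paasta_tools``), and the target directory is a parameter so the sketch can be tried
outside ``/etc/paasta``::

  #!/usr/bin/env bash
  # Hypothetical helper, not shipped with paasta_tools: write one JSON
  # fragment as <config_dir>/<name>.json (config_dir is normally /etc/paasta).
  install_paasta_fragment() {
    local config_dir=$1 name=$2 json=$3
    local tmp
    mkdir -p "$config_dir" || return 1
    # Write to a hidden temporary file first, then rename it into place.
    # A rename within one filesystem is atomic, so a reader sees either the
    # old file or the complete new one, never a partial write.
    tmp=$(mktemp "$config_dir/.$name.XXXXXX") || return 1
    if ! printf '%s\n' "$json" > "$tmp"; then
      rm -f "$tmp"
      return 1
    fi
    # mktemp creates files as 0600; make the fragment world-readable.
    chmod 0644 "$tmp"
    mv -f "$tmp" "$config_dir/$name.json"
  }

For example, ``install_paasta_fragment /etc/paasta cluster '{"cluster": "test-cluster"}'``
would write a one-line equivalent of the ``cluster.json`` file shown below. The helper
does not validate the JSON; that is left to whatever tool generates it.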

For example, one essential piece of configuration that must be deployed to servers
that are members of a particular cluster is the ``cluster`` setting::

  # /etc/paasta/cluster.json
  {
    "cluster": "test-cluster"
  }

It is not necessary to define this config option for servers that only require the
PaaSTA CLI tools (as they may not technically be part of any particular PaaSTA cluster).

See the `documentation for system PaaSTA configs <../system_configs.html>`_ for more details.

soa-configs
-----------

soa-configs are the shared configuration storage that PaaSTA uses to hold the
description and configuration of what services exist and how they should be
deployed and monitored.

This directory needs to be deployed globally in the same location to every
server that runs any PaaSTA component. See the
`dedicated documentation <../soa_configs.html>`_ on how to build your own ``soa-configs``.

``soa-configs`` also carry the ``deployments.json`` file for each service.
This file records which git SHAs should be deployed where. These files
are generated by the ``generate_all_deployments`` command. This approach lets PaaSTA
inspect the deployments for each service once and distribute that information through
soa-configs, instead of having each cluster inspect git directly.

Docker and a Docker Registry
----------------------------

PaaSTA uses `Docker <https://www.docker.com/>`_ to build and distribute code for each service. PaaSTA
assumes that a single registry is available and that the associated components
(Docker commands, Unix users, Kubernetes nodes, etc.) have the correct credentials
to use it.

The docker registry needs to be defined in a config file in ``/etc/paasta/``.
PaaSTA merges all JSON files in ``/etc/paasta/`` together, so the actual
filename is irrelevant. Here is an example
``/etc/paasta/docker.json``::

  {
    "docker_registry": "private-docker-registry.example.com:443"
  }

There are many registries available to use, or you can
`host your own <https://docs.docker.com/registry/>`_.

Kubernetes
----------

PaaSTA uses `Kubernetes <https://kubernetes.io/>`_ to manage and orchestrate its containerized services.
See the `PaaSTA documentation <../yelpsoa_configs.html#kubernetes-clustername-yaml>`_ for how to define PaaSTA
services in Kubernetes.

Once PaaSTA services are defined in soa-configs, there are a few tools provided by PaaSTA
that interact with the Kubernetes API:

* ``setup_kubernetes_job``: Does the initial sync between soa-configs and the Kubernetes API.
  This is the tool that handles "bouncing" to a new version of the code, and resizing Kubernetes deployments when autoscaling
  is enabled.
  It is idempotent and is run periodically on a box with a ``deployments.json`` file in the
  ``/nail/etc/services`` directory, updating or creating the Kubernetes Deployment object representing the modified service instance.
* ``cleanup_kubernetes_jobs``: Cleans up lost or abandoned services. This tool
  looks for Kubernetes instances that are *not* defined in soa-configs and removes them.
* ``check_kubernetes_services_replication``: Iterates over all Kubernetes services
  and inspects their health. This tool integrates with the monitoring infrastructure
  and will alert the team responsible for the service if it becomes unhealthy to
  the point where manual intervention is required.

SmartStack and Hacheck
----------------------

`SmartStack <http://nerds.airbnb.com/smartstack-service-discovery-cloud/>`_ is
a dynamic service discovery system that allows clients to find and route to
healthy Kubernetes Pods for a particular service.
SmartStack consists of two agents: `nerve <https://github.com/airbnb/nerve>`_ and `synapse <https://github.com/airbnb/synapse>`_.
Nerve is responsible for health-checking services and registering them in ZooKeeper.
Synapse then reads that data from ZooKeeper and configures an HAProxy instance.

To manage the configuration of nerve (detecting which services are running on a node, which ports they use, etc.),
we have a package called `nerve-tools <https://github.com/Yelp/nerve-tools>`_.
This repository builds a .deb package, which should be installed on all Kubernetes nodes.
Each node should run ``configure_nerve`` periodically.
We recommend running it frequently (we run it every 5s), since Kubernetes Pods created by PaaSTA are not available
to clients until nerve is reconfigured.

Similarly, to manage the configuration of synapse, we have a package called `synapse-tools <https://github.com/Yelp/synapse-tools>`_.
Each node should have this installed and should run ``configure_synapse`` periodically.
``configure_synapse`` can run less frequently than ``configure_nerve``:
its interval only limits how quickly new services, new service instances, or HAProxy option changes in
`smartstack.yaml <../yelpsoa_configs.html#smartstack-yaml>`_ take effect.
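
Both ``configure_nerve`` and ``configure_synapse`` need to run on a schedule. A plain
loop is one minimal way to do that, assuming no existing scheduler is available;
cron's one-minute granularity is too coarse for a 5-second ``configure_nerve``
cadence, so a systemd timer or process supervisor would be the usual production
choice. The ``run_periodically`` helper below is hypothetical::

  #!/usr/bin/env bash
  # Hypothetical helper: run "$@" repeatedly, sleeping <interval> seconds
  # between runs. max_runs=0 means run forever (the normal case on a node);
  # a positive value stops after that many runs, which is useful for testing.
  run_periodically() {
    local interval=$1 max_runs=$2 runs=0
    shift 2
    while (( max_runs == 0 || runs < max_runs )); do
      # Report a failed run but keep looping, so one transient error does
      # not leave the nerve or synapse configuration permanently stale.
      "$@" || echo "run_periodically: '$*' exited with status $?" >&2
      runs=$((runs + 1))
      sleep "$interval"
    done
  }

On a node, ``run_periodically 5 0 configure_nerve`` would follow the 5-second cadence
described above, while ``configure_synapse`` could use a longer interval (60 seconds,
say; this value is an arbitrary illustration, not a recommendation).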

Alongside SmartStack, we run `hacheck <https://github.com/Yelp/hacheck>`_.
Hacheck is a small HTTP service that handles health checks for services.
nerve-tools and synapse-tools configure nerve and HAProxy, respectively, to send their health check requests through
hacheck.
Hacheck provides several behaviors that are useful for PaaSTA:

  * It caches health check results for a short period of time (1 second, by default).
    This avoids overloading services if many health check requests arrive in a short period of time.

  * It can preemptively return error codes for health checks, allowing us to remove a task from load balancers before
    shutting it down.
    (This is implemented in the
    `HacheckDrainMethod <../generated/paasta_tools.drain_lib.html#paasta_tools.drain_lib.HacheckDrainMethod>`_.)
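
The caching behavior in the first bullet is an ordinary time-to-live cache. The sketch
below illustrates the idea in bash; it is not how hacheck is implemented (hacheck is a
separate HTTP service), and the ``cached_health_check`` helper, its cache-file argument,
and the use of exit statuses in place of HTTP responses are assumptions made for this
example::

  #!/usr/bin/env bash
  # Illustration only, not hacheck's code: remember a health check's exit
  # status in <cache_file> for <ttl> seconds, so repeated checks within that
  # window reuse the stored result instead of hitting the service again.
  # Timestamps come from `date +%s`, so the TTL has whole-second granularity.
  cached_health_check() {
    local cache_file=$1 ttl=$2
    shift 2
    local now stamp rc
    now=$(date +%s)
    # Reuse the stored status if it was recorded less than ttl seconds ago.
    if [[ -f $cache_file ]] && read -r stamp rc < "$cache_file" \
        && (( now - stamp < ttl )); then
      return "$rc"
    fi
    # Otherwise run the real check and record when it ran and how it exited.
    rc=0
    "$@" || rc=$?
    printf '%s %s\n' "$now" "$rc" > "$cache_file"
    return "$rc"
  }

For example, ``cached_health_check /tmp/my_service.hc 1 curl -fsS http://localhost:8080/status``
(the cache path and status URL are hypothetical) would contact the service at most once
per wall-clock second when called sequentially, no matter how often it is invoked.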

Sensu
-----

`Sensu <https://sensu.io/>`_ is a flexible and scalable monitoring system
that allows clients to send alerts for arbitrary events. PaaSTA uses Sensu to
allow individual teams to get alerts for their services.

The `official documentation <https://docs.sensu.io/sensu-go/latest/>`_ has
instructions on how to set it up.

Out of the box, Sensu doesn't understand team-centric routing, so in a multi-tenant
environment it must be combined with handlers that are "team aware".
To do that, we have written some `custom Sensu handlers <https://github.com/Yelp/sensu_handlers>`_.

Sensu is an optional but highly recommended component.

Jenkins / Build Orchestration
-----------------------------

Jenkins is the suggested method for orchestrating build pipelines for services,
but it is not a hard requirement. The actual method that Yelp uses to integrate
Jenkins with PaaSTA is not open source.

In practice, each organization will have to decide how they want to actually
run the ``paasta`` CLI tool to kick off the building and deploying of images.
This may be something as simple as a bash script::

  #!/bin/bash
  # Stop at the first failing step, so an image that fails itest is never
  # pushed or marked for deployment.
  set -euo pipefail
  service=my_service
  sha=$(git rev-parse HEAD)
  paasta itest --service "$service" --commit "$sha"
  paasta push-to-registry --service "$service" --commit "$sha"
  paasta mark-for-deployment --git-url "$(git config --get remote.origin.url)" \
    --commit "$sha" --deploy-group prod.main --service "$service"

PaaSTA can integrate with any existing orchestration tool that can execute
commands like this.

Logging
-------

PaaSTA can use one of several backends to centrally log events about what is happening in the infrastructure and to
power ``paasta logs``.
The backends that are available are listed in the `system config docs <../system_configs.html>`_ under ``log_writer``
and ``log_reader``.

At Yelp, we use `Scribe <https://github.com/facebookarchive/scribe>`_ for log writing, so we use the ``scribe`` log
writer.
For reading logs, we have some in-house tools that are unfortunately not open source.
The ``scribereader`` log_reader driver reads from these in-house tools, but it relies on some
code that is not open source, so we do not expect that logging via Scribe will work outside of Yelp.

The ``file`` log writer driver may be useful for getting log data into your logging system, but files are not generally
aggregated across the whole cluster in a way that is useful for ``paasta logs``.
We need alternative log reader drivers, so please file an issue (or, better yet, a pull request).