Architecture overview

This page describes Perforator's architecture.

Intro

Perforator consists of four kinds of components:

  1. The Perforator agent, which runs on each node of the profiled cluster.
  2. A few microservices to collect, store, analyze, and symbolize profiles & executable files from the agents. We run our microservices inside the same cluster Perforator profiles.
  3. A web-based user interface.
  4. A set of databases: ClickHouse to store profile metadata, PostgreSQL to store binary metadata, and an S3-compatible object store for raw profiles and binaries.

This design has proven effective enough to scale to tens of thousands of nodes and petabytes of profiles.

High-level schema

[Diagram: on each server, a Perforator agent collects profiles from the local containers and pushes them to the storage pods. Storage pods store profile metadata in ClickHouse, binary metadata in PostgreSQL, and raw binaries and profiles in the S3-compatible object store. To serve a "get service profile" request, symbolizer pods select profiles matching {selector} in ClickHouse, find binary metadata in PostgreSQL, and download the matched profiles and binaries from the object store.]

Components overview

Agent

The Perforator agent is the core of our profiling infrastructure. The agent runs on each node in the cluster, collects profiles using eBPF, then aggregates, compresses, and sends them to storage over gRPC in a pprof-compatible format.

The agent connects to the kubelet to identify the pods running on a node. It also tracks all processes on the node and analyzes the running executables. To profile, the agent uses the Linux perf_events API to trigger an eBPF program on each perf event, such as "1M CPU cycles" or "1K major page faults". The eBPF program collects information about the thread it was invoked on: thread and process IDs, thread and process names, user-space and kernel-space call stacks, and so on. The program sends the collected samples to the user-space part of the agent via the eBPF perfbuf API.

[Diagram: in user space, the Perforator agent sets up the eBPF program in the Linux kernel. The CPU triggers an interrupt every N cycles, the eBPF program reports a sample to the agent, and the agent exports the profile over the network.]

The agent accumulates samples from the eBPF program in memory and periodically sends them to the storage over gRPC. By default, an agent sends one profile per pod every minute. We call this profile, consisting of the samples of one workload over one minute, an atomic profile.
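The grouping described above can be illustrated with a minimal sketch. The Sample shape, the field names, and the build_atomic_profiles helper are assumptions made for illustration and are not taken from the agent's source; only the one-minute, per-pod grouping comes from the text.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

FLUSH_INTERVAL_SECONDS = 60  # the one-minute default described above


@dataclass(frozen=True)
class Sample:
    """One stack sample as it might arrive from the eBPF program (hypothetical shape)."""
    pod: str
    timestamp: float  # seconds since some epoch
    stack: tuple      # call stack, outermost frame first
    value: int = 1    # weight of the sample, e.g. number of perf events


def build_atomic_profiles(samples):
    """Group samples into atomic profiles keyed by (pod, start of minute window)."""
    profiles = defaultdict(Counter)
    for s in samples:
        window = int(s.timestamp // FLUSH_INTERVAL_SECONDS) * FLUSH_INTERVAL_SECONDS
        # Identical stacks inside one window are aggregated into a single counter.
        profiles[(s.pod, window)][s.stack] += s.value
    return dict(profiles)


samples = [
    Sample("pod-a", 0.5, ("main", "serve", "parse")),
    Sample("pod-a", 30.0, ("main", "serve", "parse")),
    Sample("pod-a", 61.0, ("main", "serve", "parse")),
    Sample("pod-b", 10.0, ("main", "gc")),
]
atomic = build_atomic_profiles(samples)
```

Here the two pod-a samples from the first minute collapse into one atomic profile, while the third sample starts a new one.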

To generate a human-readable profile, addresses of executable instructions must be mapped to source code lines. That process, called symbolization, is compute-intensive. If the same code runs on thousands of nodes and symbolization happens in the agent, every node repeats the same work, which is very expensive. So we took another approach.
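To make the idea of symbolization concrete, here is a deliberately simplified sketch that resolves an instruction address to a function name by binary search over a sorted symbol table. The table contents and the symbolize helper are hypothetical. Real symbolization also resolves source lines and inlined frames from debug information, which is where most of the cost comes from.

```python
import bisect

# Hypothetical symbol table: (start_address, size, name), sorted by start address.
SYMBOLS = [
    (0x1000, 0x40, "main"),
    (0x1040, 0x80, "parse_request"),
    (0x1100, 0x20, "send_reply"),
]
STARTS = [start for start, _, _ in SYMBOLS]


def symbolize(address):
    """Return the name of the function containing address, or None if unknown."""
    # Find the last symbol whose start address is <= address.
    i = bisect.bisect_right(STARTS, address) - 1
    if i < 0:
        return None
    start, size, name = SYMBOLS[i]
    # The address may fall into a gap between two symbols.
    if address >= start + size:
        return None
    return name
```

A call stack is symbolized by applying such a lookup to every frame, so the cost grows with both the number of samples and the size of the debug data.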

In addition to profiles, the agent uploads executable files found on the profiled nodes. This may sound scary, but careful synchronization guarantees that each binary file is uploaded only once. The binaries are uploaded through the storage microservice to S3 and PostgreSQL, and can be post-processed later to generate symbolized profiles. Binary files are identified by build-id: a unique identifier of the compiled binary, often computed as a hash of meaningful ELF sections. Some executables do not contain a build-id, so for them we compute a so-called pseudo build-id: a hash based on a few random portions of the executable text.
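A pseudo build-id can be sketched as a hash over a few portions of the executable text. The sketch below is an assumption-laden illustration, not the agent's algorithm: the function name, the chunk count and size, the evenly spaced offsets, and the choice of SHA-256 are all hypothetical. The only firm requirement is that the sampled offsets are reproducible, so that every node computes the same identifier for the same binary.

```python
import hashlib


def pseudo_build_id(text: bytes, chunks: int = 4, chunk_size: int = 4096) -> str:
    """Hash a few fixed portions of the executable text (illustrative only)."""
    h = hashlib.sha256()
    # Mixing in the length distinguishes binaries that share sampled chunks.
    h.update(len(text).to_bytes(8, "little"))
    if len(text) <= chunks * chunk_size:
        # Small binaries are cheap enough to hash completely.
        h.update(text)
    else:
        # Evenly spaced offsets stand in for the "random" portions; they depend
        # only on the text length, so the result is the same on every node.
        step = (len(text) - chunk_size) // max(chunks - 1, 1)
        for i in range(chunks):
            offset = i * step
            h.update(text[offset:offset + chunk_size])
    return h.hexdigest()[:40]


text = bytes(range(256)) * 100  # 25600 bytes of stand-in "executable text"
build_id = pseudo_build_id(text)
```

Sampling keeps the computation cheap for large binaries, at the price that two binaries differing only in unsampled bytes would receive the same identifier.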

Note

While in theory we can support external artifact registries, this feature is not available yet. Feel free to discuss and contribute.

Databases

Perforator uses three different storage systems. All the other microservices are stateless.

  • Raw atomic profiles, binaries, and some post-processed artifacts of binaries (GSYM data) are stored in the S3-compatible storage. S3 is cheap and scales well.
  • For each atomic profile we write a row to the ClickHouse cluster. This allows us to quickly find interesting profiles using complex selectors, because ClickHouse is blazing fast. See ClickHouse table structure for details.
  • For each executable file we write a row to the PostgreSQL cluster to synchronize uploads and track binary TTL. We have found that there are not too many executable files, so they can easily be stored in PostgreSQL. However, it is quite easy to use your own SQL database instead (probably a few hundred lines of Golang). See the source code for details.

For more details about database structure see database reference.
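The role of the SQL row in synchronizing uploads can be sketched as follows. This is a minimal illustration, assuming a hypothetical binaries table with a unique build_id column; the real schema is described in the database reference. SQLite from the Python standard library stands in for PostgreSQL so the example is self-contained.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL cluster (assumption for the sketch).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE binaries (build_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
)


def try_claim_upload(conn, build_id):
    """Return True if the caller won the right to upload this binary."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO binaries (build_id, status) VALUES (?, 'uploading')",
                (build_id,),
            )
        return True
    except sqlite3.IntegrityError:
        # Another uploader already inserted a row for this build-id.
        return False


first = try_claim_upload(db, "abc123")   # first agent to see the binary
second = try_claim_upload(db, "abc123")  # a later agent with the same binary
other = try_claim_upload(db, "def456")   # a different binary
```

The uniqueness constraint does the synchronization: only the first insert for a given build-id succeeds, so only one uploader proceeds to send the binary to the object store.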

Storage proxy

Storage is a simple gRPC server that proxies upload requests from agents to ClickHouse, PostgreSQL and S3. Storage proxy is stateless, despite the name, and can (and should!) safely be replicated. We run hundreds of storage pods in our largest cluster. Storage can optionally provide agent authorization using mTLS.

Symbolizer

Symbolizer (called proxy in the source code) is the main user-facing service. It lets users list profiles and services, build merged profiles that span multiple atomic profiles matching one selector, and so on. Symbolizer consumes a noticeable amount of resources because symbolization is heavy. We are working to optimize this. Luckily, the symbolizer itself often runs in the same cluster we profile, so it can easily be self-profiled.
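Building a merged profile can be pictured as summing the samples of every atomic profile that matches a selector. The sketch below is a simplification under stated assumptions: the selector is reduced to exact-match labels, and the matches and merge_profiles helpers, the label names, and the in-memory profile representation are all hypothetical.

```python
from collections import Counter


def matches(labels, selector):
    """Exact-match selector: every selector key must equal the profile label."""
    return all(labels.get(key) == value for key, value in selector.items())


def merge_profiles(atomic_profiles, selector):
    """Sum sample counts per call stack over all matching atomic profiles."""
    merged = Counter()
    for labels, stacks in atomic_profiles:
        if matches(labels, selector):
            merged.update(stacks)  # adds counts for stacks seen before
    return merged


atomic_profiles = [
    ({"service": "api", "pod": "api-1"}, {("main", "handle"): 3, ("main", "gc"): 1}),
    ({"service": "api", "pod": "api-2"}, {("main", "handle"): 2}),
    ({"service": "db", "pod": "db-1"}, {("main", "query"): 7}),
]
merged = merge_profiles(atomic_profiles, {"service": "api"})
```

In this example the two api profiles are combined into one profile for the whole service, and the db profile is left out because it does not match the selector.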

Symbolizer provides two variants of the same API: a raw gRPC interface and an HTTP RPC interface built with the beautiful grpc-gateway. The gRPC interface can be used by CLIs and automated services such as profile quality monitors, while the HTTP API is used by our web interface.