Postgres Postmaster does not scale

(recall.ai)

73 points | by davidgu 14 hours ago

10 comments

  • kayson 1 hour ago
    > sudo echo $NUM_PAGES > /proc/sys/vm/nr_hugepages

    This won't work :) echo will run as root, but the redirection still runs as the unprivileged user. It needs to be run from a privileged shell, or by doing something like sudo sh -c "echo $NUM_PAGES > /proc/sys/vm/nr_hugepages"

    The point gets across, though, technicality notwithstanding.

    • timetoogo 28 minutes ago
      Great point, I was running as root so I didn't pick this up. Corrected, thank you!
    • thayne 55 minutes ago
      Or

          echo $NUM_PAGES | tee /proc/sys/vm/nr_hugepages 
      
      
      I've always found it odd that there isn't a standard command that writes stdin to a file without also writing it to stdout, or that tee doesn't have an option to suppress writing to stdout.
      • axiolite 45 minutes ago
        You forgot the "sudo" before "tee"

        > write stdin to a file that doesn't also write it to stdout

        You mean like "dd of=/path/file" ?
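
        i.e. any of these should land the value as root (off the top of my head, not re-tested just now):

            sudo sh -c "echo $NUM_PAGES > /proc/sys/vm/nr_hugepages"            # redirection happens inside the root shell
            echo $NUM_PAGES | sudo tee /proc/sys/vm/nr_hugepages > /dev/null    # tee writes as root, stdout discarded
            echo $NUM_PAGES | sudo dd of=/proc/sys/vm/nr_hugepages status=none  # dd copies stdin to the file (GNU dd)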

  • haki 1 hour ago
    Sounds like a prime example of a service that naturally peaks at round hours.

    We have a habit of never scheduling long-running processes at round hours, usually because they tend to be busier.

    https://hakibenita.com/sql-tricks-application-dba#dont-sched...

    • abrookewood 1 hour ago
      I wish more applications would adopt the "H" option that Jenkins uses in its cron notation - essentially it is a randomiser, based on some sort of deterministic hashing function. So you say you want this job to run hourly and it will always run at the same minute past the hour, but you don't know (or care) what that minute is. Designed to prevent the thundering herd problem with scheduled work.
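
      Rolling your own is roughly this (untested sketch; the job name and the use of cksum as the hash are just placeholders):

          # Derive a stable minute slot (0-59) from a hash of the job name,
          # so each job gets a fixed but effectively random minute past the hour.
          JOB_NAME="nightly-report"
          MINUTE=$(( $(printf '%s' "$JOB_NAME" | cksum | cut -d' ' -f1) % 60 ))
          echo "$MINUTE * * * * /usr/local/bin/$JOB_NAME"   # e.g. "37 * * * * ..."
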
  • mannyv 1 hour ago
    Note that they were running Postgres on a 32-CPU box with 256 GB of RAM.

    I'm actually surprised that it handled that many connections. The data implies that they have 4000 new connections/sec...but is it 4000 connections handled/sec?

  • atherton94027 1 hour ago
    I'm a bit confused here: do they have a single database they're writing to? Wouldn't it be easier and more reliable to shard the data per customer?
    • hinkley 32 minutes ago
      When one customer is 50 times bigger than your average customer, sharding doesn't do much.
      • BatteryMountain 14 minutes ago
        Combination of partitioning + sharding perhaps? Oftentimes it is only a handful of tables that grow large, and even fewer for a single large customer, so sharding that customer out and then partitioning the data by a common/natural boundary should get you 90% there. The majority of data can be partitioned, and it doesn't have to be by date; it pays dividends to sit with the data and reflect on what is being stored, its read/write pattern and its overall shape, to determine where to slice the partitions best. Sometimes splitting a wide table into two or three smaller tables can work if your joins aren't too frequent or complex. It can also help to determine which rows are hot or cold, so you can move the colder/hotter data to separate tables to speed up reads/writes. There are always opportunities for storage optimization in large datasets, but it does take time and careful attention to get it right.
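
        A rough sketch of the shape I mean, with invented table/column names (run via psql, so adjust the connection string):

            psql "$DATABASE_URL" <<'SQL'
            -- List-partition by tenant, then sub-partition the outsized tenant by time.
            CREATE TABLE events (
                customer_id bigint      NOT NULL,
                created_at  timestamptz NOT NULL,
                payload     jsonb
            ) PARTITION BY LIST (customer_id);

            -- The one big customer gets its own partition, further split by month.
            CREATE TABLE events_big_customer PARTITION OF events
                FOR VALUES IN (42)
                PARTITION BY RANGE (created_at);

            CREATE TABLE events_big_customer_2024_06 PARTITION OF events_big_customer
                FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');

            -- Everyone else lands here.
            CREATE TABLE events_default PARTITION OF events DEFAULT;
            SQL
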
    • atsjie 1 hour ago
      I wouldn't call that "easier" per se.
  • vel0city 2 hours ago
    Isn't this kind of the reason why teams tend to put database proxies in front of their postgres instances, to handle massive sudden influxes of potentially short-lived connections?

    This sounds exactly like the problem tools like pgbouncer were designed to solve. If you're on AWS, you could look at RDS Proxy.
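
    For reference, a minimal transaction-pooling setup looks something like this (paths, port and pool sizes invented; check the pgbouncer docs before copying):

        cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
        [databases]
        app = host=127.0.0.1 port=5432 dbname=app

        [pgbouncer]
        listen_addr = 0.0.0.0
        listen_port = 6432
        auth_type = md5
        auth_file = /etc/pgbouncer/userlist.txt
        ; thousands of client connections fan in to a small, steady set of server connections
        pool_mode = transaction
        max_client_conn = 5000
        default_pool_size = 50
        EOF
        pgbouncer -d /etc/pgbouncer/pgbouncer.ini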

    • pmontra 1 hour ago
      The article is very well written but is somewhat lacking at the end.

      The conclusion lists pgbouncer as one of the solutions but it does not explain it clearly.

      > Many pieces of wisdom in the engineering zeitgeist are well preached but poorly understood. Postgres connection pooling falls neatly into this category. In this expedition we found one of the underlying reasons that connection pooling is so widely deployed on postgres systems running at scale. [...] an artificial constraint that has warped the shape of the developer ecosystem (RDS Proxy, pgbouncer, pgcat, etc) around it.

      The artificial constraint is the single-core nature of the postmaster.

      Other points at the end of the article that can be improved:

      > we can mechanically reason about a solution.

      Mechanically as in letting an AI find a solution, or as in reasoning like a mechanic, or? Furthermore:

      > * Implementing jitter in our fleet of EC2 instances reduced the peak connection rate

      How? Did they wait a random number of milliseconds before sending queries to the db?

      > * Eliminating bursts of parallel queries from our API servers

      How?

      • tbrownaw 1 hour ago
        > Mechanically as in letting an AI find a solution, or as in reasoning like a mechanic, or?

        As in it's fully characterized, so you can use only math and logic rather than relying on experience and guesswork.

    • evanelias 2 hours ago
      Also check out ProxySQL [1][2]; it's an extremely powerful and battle-tested proxy. Originally it was only for MySQL/MariaDB, where it is very widely used at scale, despite MySQL already having excellent built-in scalable threaded connection management. ProxySQL added Postgres support in 2024, and that has become a major focus.

      [1] https://proxysql.com/

      [2] https://github.com/sysown/proxysql

      • sroussey 1 hour ago
        And lets you rewrite queries on the fly. :)
  • levkk 1 hour ago
    One of the many problems PgDog will solve for you!
    • eatonphil 1 hour ago
      The article addresses this, sort of. I don't understand how you can run multiple postmasters.

      > Most online resources chalk this up to connection churn, citing fork rates and the pid-per-backend yada, yada. This is all true but in my opinion misses the forest from the trees. The real bottleneck is the single-threaded main loop in the postmaster. Every operation requiring postmaster involvement is pulling from a fixed pool, the size of a single CPU core. A rudimentary experiment shows that we can linearly increase connection throughput by adding additional postmasters on the same host.
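
      If I had to guess, "additional postmasters on the same host" just means extra clusters, each with its own data directory and port, something like (paths and port invented):

          # each initdb/pg_ctl pair is a separate cluster with its own postmaster
          initdb -D /var/lib/pg/cluster2
          pg_ctl -D /var/lib/pg/cluster2 -o "-p 5433" start

      But those are separate databases with separate storage, so I still don't see how that maps onto a single production database.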

      • btown 1 hour ago
        You don't need multiple postmasters to spawn connection processes if you have a set of Postgres proxies, each maintaining a fixed pool of long-standing connections and parceling them out to application servers on request. When your proxies use up all their allocated connections, they throttle the application servers rather than overwhelming Postgres itself (either the postmaster or the query-serving systems).

        That said, proxies aren't perfect. https://jpcamara.com/2023/04/12/pgbouncer-is-useful.html outlines some dangers of using them (particularly when you might need session-level variables). My understanding is that PgDog does more tracking that mitigates some of these issues, but some of these are fundamental to the model. They're not a drop-in component the way other "proxies" might be.
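
        The classic example: under transaction pooling, session-level state can silently evaporate between statements, roughly like this (port and setting here are just for illustration):

            # Each autocommit statement is its own transaction and may be routed to a
            # different server backend, so the SET may not be in effect for the SHOW.
            psql "host=127.0.0.1 port=6432 dbname=app" <<'SQL'
            SET statement_timeout = '5s';
            SHOW statement_timeout;
            SQL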

  • vivzkestrel 1 hour ago
    very stupid question: similar to how we had a GIL replacement in python, can't we replace the postmaster with something better?
    • lfittl 1 hour ago
      Specifically on the cost of forking a process for each connection (vs using threads), there are active efforts to make Postgres multi-threaded.

      Since Postgres is a mature project, this is a non-trivial effort. See the Postgres wiki for some context: https://wiki.postgresql.org/wiki/Multithreading

      But I'm hopeful that 2-3 years from now we'll see this come to fruition. The recent asynchronous read I/O improvements in Postgres 18 show that Postgres can evolve; one just needs to be patient, potentially help contribute, and find workarounds (connection pooling, in this case).

  • moomoo11 1 hour ago
    maybe this is silly, but these days cloud resources are so cheap. just loading up instances and putting this stuff into memory and processing it is so fast and scalable. even if you have billions of things to process daily you can just split it up if needed.

    you can keep things synced across databases easily and keep it super duper simple.

    • sgt 21 minutes ago
      It's not really my experience that cloud resources are very cheap.
  • parentheses 1 hour ago
    I think this is the kind of investigation that AI can really accelerate. I imagine it did. I would love to see someone walk through a challenging investigation assisted by AI.