Josh Berkus
A very excisting time for PostgreSQL, since the final commitfest for PG 9.0 development.
8.5 had gone by the by because the feature set is going to be SO DAMN HUGE. 64 bit windows, exclusion constraints, JSON/XML EXPLAIN output, host standby, sync replication.
What’s a commitfest? It’s a way of dealing with scarce committer and reviewers time; review patches faster, sooner, every patch, and train reviewers. People were getting patches knocked back or forgotten because of a lack of reviewer time, which was making people unhappy.
Every other month during the development period, we clear the queue of patches; this also makes it transparent as to what’s happening in the development world. It lets you help with failing patches if you care about them, too.
Four commitfests, followed by cleanup, beta testing, and then a final release. Version 9 should be released mid-way next year.
What’s in 9.0?
Hot Standby
This is a much-anticipated feature, by Simon. He’s the top contributer of features in the last wee while.
8.0 got point in time recovery from crash logs; Simon realised he could recover from other locations, not merely DR in the same location. Take a snapshot of the server; ship the transaction logs; replay them against the snapshot when the primary fails.
Unfortunately it’s not a great failover mechanism, since it can take some time to replay the logs.
This lead to warm standby in 8.3; you apply the logs to the standby as they arrive from the primary. This cuts the time down from hours to seconds or minutes. This does have the shortcoming that you’re wasting the standby hardware, in as much as it won’t run queries or do anything useful.
Alvaro then started looking at this problem. We have a problem with transactions; everything in PostgreSQL is transactional, not just for writes, but also reads recieve a transaction ID to keep track of what’s visible when. This prevents a standby, from working, and high-read-rate DBs can run out of TIDs.
Thus were born “ghost TIDs” that can be served for read-only queries.
Some configuration may be required, because we didn’t have enough config files already.
Config logs to be shipped to the remote, and then use the recovery.conf to specify a recover tool to import the logs. Now you have a hot standby, lo, as if by magic.
By default there’s a window between the server where commits may take time to flow to the slave.
One problem is that masters may perform options that can upset the slave; vaccuum is a great example; the vacuum is a GC operation and could cause conflicts with queries running on the slave. We have a maxstandbydelay which allows the slave to flag it has long-running queries of this sort, put replication on hold. Once you exceed that delay, however, the logs will be applied and queries will fail with a particular error.
Streaming Replication
NTT have some different problems; they use PG extensively, and have firm requirements: zero data loss, seconds of downtime.
Fujii Masao and Itagaki Takahiro started working on this in the form of synchronous Log-Shipping Replication. While a useful solution, it was very specific to the NTT use cases and data structure. Reworking some of that created streaming replication.
Create standby snapshot; then the slave creates a walreciever, and the master uses a walsender to ship the logs in realtime instead of chunks. Logs can also be retained in case the slave gets so out-of-date that it’s better to bulk-update it rather than to rely on streaming replication.
It’s another set of configs in recovery.log.
One nice thing about this approach is that there are no limits with this approach; you can propogate DDL and DML equally with no special functionality.
EXPLAIN ANALYZE enhancements
Robert Haas and Greg Sabino-Mullaine have made some improvements. The current format is decently human-readable, but hopeless for programs to read; this had been enhanced with some new formats, wrapping it in YAML, XML and JSON. YAML is if anything clearer for a human than the default output.
Incidentally, patches must generally be accompanied by documentation, or they’ll be rejected.
Aside: Why is there only one contributer from New Zealand? PG needs developers, but it also needs testers for 9.0, performance testing and the like. pgsql-hackers, pgsql-testers.
PostgreSQL is also moving to git, incidentally.
QUESTIONS
Can you work off a commonly-mounted clustered filesystem instead of shipping logs for hot standby, or block-level replication in your SAN? You can used them for the initial snapshotting, but the streaming assumes a network shunt.
How does streaming replication compare with DRBD? They have some similarity, but there’s no scaling, just recovery. If there’s no load balancing requirement, then it’s simpler, but higher overhead on the network.
News on the collation front? Nobody has been working on that this year, it’s a big todo. There’s a lot of work for per-column stuff. If you have C skills or money, your love is welcome.
Is there a heartbeat? Not in the release code. It was specific to NTT. May add monitoring hooks.
How do you handle multiple slaves and slave promotion? Streaming is async, so it’s easy (for 9.1 there will be a syn option), errors are thrown on the slave. There’s a trigger file or command that will allow you to promote a slave to a master.