Alexander Schwartzman has written a good article summarizing the lessons learned from using Schematron and DTDs together over multiple years for a non-trivial DTD.
JATS Subset and Schematron: Achieving the Right Balance, from the Journal Article Tag Suite Conference 2017, is now online.
Alexander is mainly concerned with whether you should subset a standard DTD or instead use Schematron rules, as a second layer, to point out deprecated elements. I think his thoughts apply just as much to RELAX NG and XSD.
He gives many examples where Schematron is clearly the better approach, and otherwise comes down in favour of using DTDs (grammars) for quasi-static constraints and Schematron for quasi-dynamic constraints: you upgrade the DTD rarely and with attention to ramifications, while you upgrade the Schematron as often as you find something new to check. This seems a very workable approach, and is probably at heart an application of Conway’s Law. (But if his quasi-static versus quasi-dynamic demarcation holds water, does that mean that XSD 1.1 style assertions miss the mark, since they are appropriate for quasi-static constraints only?)
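To make the layering idea concrete, here is a minimal sketch in Python of a second validation layer that flags deprecated elements. It is not Schematron itself, only a stand-in for the kind of rule a Schematron schema would hold. The element names and advice strings in the rule table are hypothetical and are not taken from the article or from JATS, and DTD validation is assumed to have been done by a separate grammar-based step.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table for the quasi-dynamic layer: element name -> advice.
# These names are made up for illustration; they are not real JATS elements.
DEPRECATED = {
    "old-note": "use <new-note> instead",
    "legacy-ref": "use <new-ref> instead",
}


def check_deprecated(xml_text, rules=DEPRECATED):
    """Return one warning string per deprecated element found.

    Only checks the rule table; grammar (DTD) validation is assumed
    to have happened in an earlier, separate step.
    """
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    warnings = []
    for elem in root.iter():  # document order, root included
        advice = rules.get(elem.tag)
        if advice is not None:
            warnings.append(f"<{elem.tag}> is deprecated: {advice}")
    return warnings


# A tiny sample document containing one deprecated element.
SAMPLE = "<article><body><p>Text<old-note>n</old-note></p></body></article>"

if __name__ == "__main__":
    for warning in check_deprecated(SAMPLE):
        print(warning)
```

The point of the split is that adding a new check means adding one entry to the rule table, while the grammar stays untouched until there is a considered reason to change it.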
Alexander also makes a strong point that subsetting the DTD to only the elements that you actually need can reduce the number and complexity of the Schematron rules too.
What is perhaps the most interesting aspect of the article is that it is, in a sense, a follow-up to one presented seven years earlier at the same conference, in 2010: Superset Me—Not: Why the Journal Publishing Tag Set Is Sufficient if You Use Appropriate Layer Validation.
So we have a good history of the publisher's experience: first the issues they found by extending a DTD and realizing that a validation layer would have been better, and then the finding that the validation layer works even better when the DTD is subsetted further, to avoid maintenance effort.
Of course, every project has a unique story. But we ignore lessons learned at our (projects’) peril.