41bafc4ff3
Accept additional user-defined syntax classes in fenced code blocks Part of #79483. This is a re-opening of https://github.com/rust-lang/rust/pull/79454 after a big update/cleanup. I also converted the syntax to pandoc as suggested by `@notriddle:` the idea is to be as compatible as possible with the existing instead of having our own syntax. ## Motivation From the original issue: https://github.com/rust-lang/rust/issues/78917 > The technique used by `inline-c-rs` can be ported to other languages. It's just super fun to see C code inside Rust documentation that is also tested by `cargo doc`. I'm sure this technique can be used by other languages in the future. Having custom CSS classes for syntax highlighting will allow tools like `highlight.js` to be used in order to provide highlighting for languages other than Rust while not increasing technical burden on rustdoc. ## What is the feature about? In short, this PR changes two things, both related to codeblocks in doc comments in Rust documentation: * Allow to disable generation of `language-*` CSS classes with the `custom` attribute. * Add your own CSS classes to a code block so that you can use other tools to highlight them. #### The `custom` attribute Let's start with the new `custom` attribute: it will disable the generation of the `language-*` CSS class on the generated HTML code block. For example: ```rust /// ```custom,c /// int main(void) { /// return 0; /// } /// ``` ``` The generated HTML code block will not have `class="language-c"` because the `custom` attribute has been set. The `custom` attribute becomes especially useful with the other thing added by this feature: adding your own CSS classes. #### Adding your own CSS classes The second part of this feature is to allow users to add CSS classes themselves so that they can then add a JS library which will do it (like `highlight.js` or `prism.js`), allowing to support highlighting for other languages than Rust without increasing burden on rustdoc. To disable the automatic `language-*` CSS class generation, you need to use the `custom` attribute as well. This allow users to write the following: ```rust /// Some code block with `{class=language-c}` as the language string. /// /// ```custom,{class=language-c} /// int main(void) { /// return 0; /// } /// ``` fn main() {} ``` This will notably produce the following HTML: ```html <pre class="language-c"> int main(void) { return 0; }</pre> ``` Instead of: ```html <pre class="rust rust-example-rendered"> <span class="ident">int</span> <span class="ident">main</span>(<span class="ident">void</span>) { <span class="kw">return</span> <span class="number">0</span>; } </pre> ``` To be noted, we could have written `{.language-c}` to achieve the same result. `.` and `class=` have the same effect. One last syntax point: content between parens (`(like this)`) is now considered as comment and is not taken into account at all. In addition to this, I added an `unknown` field into `LangString` (the parsed code block "attribute") because of cases like this: ```rust /// ```custom,class:language-c /// main; /// ``` pub fn foo() {} ``` Without this `unknown` field, it would generate in the DOM: `<pre class="language-class:language-c language-c">`, which is quite bad. So instead, it now stores all unknown tags into the `unknown` field and use the first one as "language". So in this case, since there is no unknown tag, it'll simply generate `<pre class="language-c">`. I added tests to cover this. Finally, I added a parser for the codeblock attributes to make it much easier to maintain. It'll be pretty easy to extend. As to why this syntax for adding attributes was picked: it's [Pandoc's syntax](https://pandoc.org/MANUAL.html#extension-fenced_code_attributes). Even if it seems clunkier in some cases, it's extensible, and most third-party Markdown renderers are smart enough to ignore Pandoc's brace-delimited attributes (from [this comment](https://github.com/rust-lang/rust/pull/110800#issuecomment-1522044456)). ## Raised concerns #### It's not obvious when the `language-*` attribute generation will be added or not. It is added by default. If you want to disable it, you will need to use the `custom` attribute. #### Why not using HTML in markdown directly then? Code examples in most languages are likely to contain `<`, `>`, `&` and `"` characters. These characters [require escaping](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/pre) when written inside the `<pre>` element. Using the \`\`\` code blocks allows rustdoc to take care of escaping, which means doc authors can paste code samples directly without manually converting them to HTML. cc `@poliorcetics` r? `@notriddle`