Lemme just get this out in the open: my RegEx knowledge is dire. When I need it, I tend to find solutions through Google, test them in a helper and hope for the best.
So, I have a site that uses a markdown-based CMS; it was formerly Netlify CMS and is now Decap CMS. In the config.yml files I can create custom widgets to add custom "Blocks" to the CMS editor for teammates and myself. I've created several of these widgets, which extend the CMS to our needs. They work absolutely fine for the most part; we use Eleventy to process the markdown generated by the CMS.
The issue I have is related to my poor RegEx skills. When a user uses one of the custom widgets I created, what appears visually is a grey block, with the appropriate number of inputs for that particular widget. As an example, for our accordion widget there are 3 inputs:
- The first allows a user to select the heading level of the accordion, between 2 and 6; this is a number
- The second is for the accordion title (the text that will be in the button); this is a string
- The third is for the contents, which will be Markdown that Eleventy later processes into HTML; it can contain code blocks, text formatting, images and other standard stuff
That visible grey block is important for non-devs, as it shows it is a block and they can't accidentally break the HTML. When a user saves a post and revisits later, in the editor what we see is the raw HTML, as opposed to the grey block with the inputs.
The reason this happens is that the CMS uses React. When a user revisits a post, React parses the contents of the editor and maps them back to widget blocks (there are default widgets such as image and code block). The default widgets get the block, but my widgets get the raw HTML, because the parser needs my RegEx string to detect the specific HTML and present it as a block after a reload.
The HTML is very basic. The accordions are progressively enhanced, so don't worry about how they appear here; they work perfectly in the browser. The HTML is as follows:
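For anyone unfamiliar with how that detection is wired up, here is a rough sketch of the shape such a component takes, assuming Decap's `CMS.registerEditorComponent` API with its `pattern`, `fromBlock` and `toBlock` options. The field names (`level`, `title`, `content`), the `renderAccordion` helper and the pattern itself are hypothetical, just one possible fit for the accordion markup in this post rather than a tested solution.

```javascript
// Hypothetical helper: builds the accordion HTML that gets saved to markdown.
function renderAccordion(data) {
  return (
    `<h${data.level} class="accordion">${data.title}</h${data.level}>\n` +
    `<div class="accordion__panel">\n${data.content}\n</div>`
  );
}

// Sketch of an editor component; field names are assumptions.
const accordionComponent = {
  id: "accordion",
  label: "Accordion",
  fields: [
    { name: "level", label: "Heading level", widget: "number", min: 2, max: 6 },
    { name: "title", label: "Title", widget: "string" },
    { name: "content", label: "Content", widget: "markdown" },
  ],
  // On load, the editor tests saved text against this pattern; a match is
  // shown as a block instead of raw HTML. \1 forces the closing heading tag
  // to use the same level as the opening one.
  pattern: /^<h([2-6]) class="accordion">(.*?)<\/h\1>\n<div class="accordion__panel">\n(.*?)\n<\/div>$/ms,
  // Turns the regex match back into field values for the block's inputs.
  fromBlock(match) {
    return { level: Number(match[1]), title: match[2], content: match[3] };
  },
  // Turns field values into the text that is written to the markdown file.
  toBlock(data) {
    return renderAccordion(data);
  },
  // Preview markup; here it simply reuses the saved HTML (markdown left raw).
  toPreview(data) {
    return renderAccordion(data);
  },
};

// Only register when the CMS global exists (i.e. in the browser admin page).
if (typeof CMS !== "undefined") {
  CMS.registerEditorComponent(accordionComponent);
}
```

The key point is that `toBlock` and `pattern` have to agree: whatever `toBlock` writes must be matched by `pattern`, otherwise the block comes back as raw HTML on reload.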
<h2 class="accordion"></h2>
<div class="accordion__panel">
</div>
- That heading level can be any HTML heading other than 1, so 2-6
- The heading of any level must have a class of accordion
- Within the heading tags, allow the string of text
Then:
- A div is present and that div must have the class of accordion__panel
- Then there is a new line
- Then allow any content
- Then a new line
- Then a closing div tag
What I have thus far, from winging it with various helpers and whatnot, is the following:
\<h[2-6].*class="accordion".*\>.*\</h[2-6]\>\n\<div class="accordion__panel">\n.*\<\/div\>/ms
This appears to be working OK in RegEx testing tools, but I'm not confident I've successfully winged my way through this without issue, and wondered if any kind folks here could help me improve it or confirm it's good?
Thanks
when you probably just want to match whitespace. It's also problematic that you mandate line breaks in HTML. But the bigger problem is that HTML is not a regular language and can't be fully parsed with regex. You might need something more high-level, like JavaScript or Python, to deal with the DOM. But I don't understand enough about what you want to do with the regex to know if that's an option.
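To make the whitespace and greediness points concrete, here is a small comparison, offered as a sketch rather than a definitive fix. `posted` is the pattern from the question (with the `/` characters escaped so it is a valid literal, and `g` added so matches can be counted); `tightened` is a hypothetical alternative that uses `\s*` instead of mandatory line breaks, lazy `.*?` instead of greedy `.*`, and a backreference so the closing heading level matches the opening one.

```javascript
// Two accordions in one saved post.
const post = [
  '<h2 class="accordion">First</h2>',
  '<div class="accordion__panel">',
  'Panel one',
  '</div>',
  '',
  '<h3 class="accordion">Second</h3>',
  '<div class="accordion__panel">',
  'Panel two',
  '</div>',
].join("\n");

// The pattern from the question. With the s flag, each greedy .* can run
// across line breaks, so it grabs as much text as it can.
const posted = /\<h[2-6].*class="accordion".*\>.*\<\/h[2-6]\>\n\<div class="accordion__panel">\n.*\<\/div\>/gms;

// Hypothetical tightened variant: optional whitespace, lazy captures,
// and \1 to tie the closing heading tag to the opening one.
const tightened = /<h([2-6])\s+class="accordion"\s*>(.*?)<\/h\1>\s*<div\s+class="accordion__panel"\s*>\s*(.*?)\s*<\/div>/gs;

const postedMatches = [...post.matchAll(posted)];
const tightenedMatches = [...post.matchAll(tightened)];

// The posted pattern swallows both accordions as a single match.
console.log(postedMatches.length);    // 1
// The tightened pattern finds each accordion separately.
console.log(tightenedMatches.length); // 2
console.log(tightenedMatches[0].slice(1)); // [ '2', 'First', 'Panel one' ]
```

The tightened version still has the limitation described above: the lazy capture stops at the first `</div>` it sees, so a panel whose content contains its own `<div>...</div>` would be cut short. That is the "HTML is not a regular language" problem in practice.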