
Let me just get this out in the open: my RegEx knowledge is dire. When I need it, I tend to find solutions through Google, test them in a helper and hope for the best.

So, I have a site that uses a Markdown-based CMS; the CMS was formerly Netlify CMS and is now Decap CMS. In the config.yml files I can create custom widgets to add custom "Blocks" to the CMS editor for teammates and myself. I've created several of these widgets, which extend the CMS to our needs. They work absolutely fine for the most part; we use Eleventy, and the Markdown generated by the CMS is processed.

The issue I have is related to my poor RegEx skills. When a user uses one of the custom widgets I created, what appears visually is a grey block, with the appropriate number of inputs for that particular widget. As an example, for our accordion widget there are 3 inputs:

  • The first allows a user to select the heading level of the accordion, between 2 and 6; this is a number
  • The second is for the accordion title (the text that will be in the button); this is a string
  • The third is for the contents, which will be Markdown that is later processed by Eleventy into HTML; it can contain code blocks, text formatting, images and other standard stuff

That visible grey block is important for non-devs, as it shows it is a block and they can't accidentally break the HTML. However, when a user saves a post and revisits it later, what we see in the editor is the raw HTML, as opposed to the grey block with the inputs.

The reason this happens is that the CMS uses React. When a user revisits a post, React parses the contents of the editor and maps them back to widgets (there are default widgets such as image and code block). The default widgets get the block, but my widgets get the raw HTML, because the parser needs my RegEx string to detect the specific HTML and present it as a block after a reload.

The HTML is very basic. The accordions are progressively enhanced, so don't worry about how they appear here; they work perfectly in the browser. The HTML is as follows:

<h2 class="accordion"></h2>
<div class="accordion__panel">

</div>

  • That heading level can be any HTML heading other than 1, so 2-6
  • The heading of any level must have a class of accordion
  • Within the heading tags, allow the string of text

Then:

  • A div is present and that div must have the class of accordion__panel
  • Then there is a new line
  • Then allow any content
  • Then a new line
  • Then a closing div tag
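
For illustration, the rules above can be read as a pattern with three capture groups, one per input. This is only a sketch: the pattern, the sample values and the helper bodies are my own assumptions, and the pattern/fromBlock/toBlock naming just mirrors the editor component example in the Decap docs. It assumes the heading carries only the accordion class and that the title sits on a single line.

```javascript
// Hypothetical pattern derived from the bullet rules; not taken from the Decap docs.
// Group 1: heading level (2-6), reused via \1 so the closing tag must match.
// Group 2: title text on one line. Group 3: panel contents (may span lines).
const accordionPattern =
  /^<h([2-6]) class="accordion">(.*?)<\/h\1>\n<div class="accordion__panel">\n([\s\S]*?)\n<\/div>$/m;

// Turn a regex match into the three field values.
function fromBlock(match) {
  return { level: Number(match[1]), title: match[2], contents: match[3] };
}

// Turn the field values back into the saved HTML.
function toBlock(data) {
  return (
    `<h${data.level} class="accordion">${data.title}</h${data.level}>\n` +
    `<div class="accordion__panel">\n${data.contents}\n</div>`
  );
}

// Round trip: what gets saved should be detected again on reload.
const saved = toBlock({ level: 3, title: "Shipping", contents: "Some **Markdown** here." });
const roundTrip = fromBlock(saved.match(accordionPattern));
console.log(roundTrip);
```

The lazy `[\s\S]*?` stops at the first line that consists of `</div>`, so contents that themselves contain such a line would be cut short.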

What I have thus far, from winging it with various helpers and whatnot, is the following:

/\<h[2-6].*class="accordion".*\>.*\<\/h[2-6]\>\n\<div class="accordion__panel">\n.*\<\/div\>/ms

This appears to be working OK in RegEx testing tools, but I'm not confident I have successfully winged my way through this without issue, and wondered if any kind folks here could help me improve it or confirm it's good?
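
One way to check a pattern like this beyond an online tester is to run it against a few hand-written inputs. The sketch below uses the pattern from the question as a JavaScript literal (with the `/` in the closing heading tag escaped so the literal parses); the sample strings are made up for illustration.

```javascript
// The pattern from the question, with the closing-heading slash escaped.
const original =
  /\<h[2-6].*class="accordion".*\>.*\<\/h[2-6]\>\n\<div class="accordion__panel">\n.*\<\/div\>/ms;

const panel = '\n<div class="accordion__panel">\nBody\n</div>';

// 1. The intended shape.
const intended = '<h2 class="accordion">Title</h2>' + panel;

// 2. Opening and closing heading levels disagree.
const mismatched = '<h2 class="accordion">Title</h5>' + panel;

// 3. An unrelated heading appears before the accordion.
const noisy = "<h2>Intro</h2>\n<p>text</p>\n" + '<h3 class="accordion">Title</h3>' + panel;

console.log(original.test(intended));   // true
console.log(original.test(mismatched)); // true: levels are not tied together
console.log(noisy.match(original).index); // 0: the match starts at the unrelated <h2>
```

The last case happens because the `s` flag lets `.*` cross line breaks, so the first `h[2-6]` can pair with a `class="accordion"` several lines later.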

Thanks

  • .* is problematic because it matches . (any character) when you probably just want to match whitespace. It is also problematic that you mandate line breaks in HTML. But the bigger problem is that HTML is not a regular language and can't be fully parsed with regex. You might need something more high-level like JavaScript or Python to deal with the DOM. But I don't understand well enough what you want to do with the regex to know if that's an option.
    – julaine
    Commented Jan 17 at 11:06
  • especially the part "then allow any content" ... "then a closing div tag" is not possible to do with regex. "Any content" might include a closing "div"-tag, and a regex cannot understand which closing "div"-tag it should match.
    – julaine
    Commented Jan 17 at 11:08
  • see here for a more detailed explanation of limitations when using regex on html stackoverflow.com/questions/6751105/…
    – julaine
    Commented Jan 17 at 11:09
  • Sorry, I tried to be as explanatory as possible. This page in the official docs may demo what I am trying to achieve: decapcms.org/docs/custom-widgets. In the docs they are using a summary and details component, using a similar method to what I have, only I tried to change my RegEx to allow for the class and variable heading level, as well as the closing div (which would be the last div)
    – Daz Lee
    Commented Jan 17 at 11:22
  • That approach suffers from the same unsolvable problem of trying to parse HTML with regex. You will not find a bullet-proof solution because it cannot even theoretically exist. But if "works for simple cases" is good enough and your regex works for the inputs you tried, then go ahead and do it; it might be the best, most pragmatic solution.
    – julaine
    Commented Jan 17 at 12:52
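
The point about "any content" followed by "a closing div tag" can be seen in a small experiment. This is a sketch with made-up input: neither a greedy nor a lazy quantifier recovers the real panel contents once the contents include a div of their own.

```javascript
// Hypothetical editor content: a panel containing a nested div,
// followed by an unrelated div further down the page.
const html = [
  '<div class="accordion__panel">',
  "<div>nested</div>",
  "more text",
  "</div>",
  '<div class="other">unrelated</div>',
].join("\n");

// Greedy: runs to the LAST </div> in the input.
const greedy = /<div class="accordion__panel">\n(.*)<\/div>/s;
// Lazy: stops at the FIRST </div> it meets.
const lazy = /<div class="accordion__panel">\n(.*?)<\/div>/s;

const greedyBody = html.match(greedy)[1];
const lazyBody = html.match(lazy)[1];

console.log(JSON.stringify(greedyBody)); // swallows the unrelated div as well
console.log(JSON.stringify(lazyBody));   // "<div>nested"
```

If the panel contents never contain a div and each post holds a single accordion, both variants give the same result, which is the "works for simple cases" situation described above.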
