Edit

My question was very badly written, but the new title reflects the actual question. Thanks to 3 very friendly and dedicated users (@harsh3466 @tuna @learnbyexample) I was able to find a solution for my files, so thank you guys !!!

For anyone who comes across this post, here are 3 possible ways to achieve the desired result.

Solution 1 (https://lemmy.ml/post/25346014/16383487)

#! /bin/bash
files="/home/USER/projects/test.md"

mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
mdlinks2="$(grep -Po '#.*' <<<"$mdlinks")"

while IFS= read -r line; do
	#Converts 1.2 to 1-2 (a third-level heading such as 1.2.3 needs an additional [0-9] group in the pattern)
	dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
	sed -i "s/$line/${dashlink}/" "$files"

	#Puts everything to lowercase after a hashtag
	lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
	sed -i "s/$dashlink/${lowercaselink}/" "$files"

	#Replaces encoded spaces (%20) with dashes in markdown links after a hashtag
	spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
	sed -i "s/$lowercaselink/${spacelink}/" "$files"

done <<<"$mdlinks2"

Solution 2 (https://lemmy.ml/post/25346014/16453351)

sed -E ':l;s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;Te;H;g;s/\n//;s/\n.*//;x;s/.*\n//;/^https?:/!{:h;s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;th;s/(#[^)]*\))/\L\1/;};tl;:e;H;z;x;s/\n//;'

Solution 3 (https://lemmy.ml/post/25346014/16453161)

perl -pe 's/\[[^]]+\]\((?!https?)[^#]*#\K[^)]+(?=\))/lc $&=~s:%20|\d\K\.(?=\d):-:gr/ge'

Relevant links

https://mike.bailey.net.au/notes/software/apps/obsidian/issues/markdown-heading-anchors/#background


Hi everyone !

I need some help with string manipulation using sed and regex. I spent a whole day on trial and error and searching the web for a solution, but it’s way beyond my capabilities; maybe there are some sed/regex gurus here who are willing to give me a helping hand !

From everything I gathered around the web, it seems to be a rather complicated regex and sed substitution, so here we go !

What am I trying to achieve?

I have a lot of markdown guides I want to host on a self-hosted, Forgejo-based git instance. However, the classic markdown links are not the same as the ones on GitHub/Forgejo…

Convert the following string:

[Some text](#Header%20Linking%20MARKDOWN.md)

Into

[Some text](#header-linking-markdown.md)

As you can see, these are the requirements:

  • Pattern: [Some text](#link%20to%20header.md)
  • Only edit what’s between parentheses
  • Replace space (%20) with -
  • Everything as lowercase
  • Links are sometimes in nested parentheses
    • e.g. (look here [Some text](#link%20to%20header.md))
  • Do not change a line that begins with https (external links)

While the whole thing is probably a bit complex, the trickiest part is probably the nested parentheses :/

What I tried

The furthest I got was the following:

sed -Ei 's|\(([^\)]+)\)|\L&|g' test3.md #make everything between parentheses lowercase

sed -i '/https/ ! s/%20/-/g' test3.md #change every %20 occurrence to -

These sed/regex substitutions are what I put together while roaming the web, but they have a lot of flaws and don’t work with nested parentheses. The second command also changes every %20 on any line that doesn’t contain https, not just the ones inside links.
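
One way to keep these two substitutions from touching unrelated text is to anchor them on the `](#` that opens an in-page link. This is only a minimal sketch: it assumes GNU sed (for `\L`), it handles just the `(#...)` pattern from the requirements above, and the function name fix_anchor_links is made up for the example.

```shell
#!/bin/sh
# Rewrites only link targets that start with "#", i.e. text following "](#".
fix_anchor_links() {
    sed -E \
        -e ':a' \
        -e 's/(\]\(#[^)]*)%20/\1-/' \
        -e 'ta' \
        -e 's/(\]\()(#[^)]*\))/\1\L\2/g' "$@"
    # :a ... ta   replaces one %20 per pass and repeats until none is left
    # last s///   lowercases everything from "#" up to the closing ")"
}

printf '%s\n' \
    'See (Some Note) and (look here [Some text](#Link%20To%20Header.md))' \
    '[External](https://example.com/A%20B)' |
    fix_anchor_links
```

Because a match has to start at `](#`, ordinary parenthesised text and external links are left alone, and since `[^)]*` stops at the first closing parenthesis, the outer `)` of a nested pair is never consumed.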

The closest solution I found on stackoverflow looks similar, but I wasn’t able to make it fit my needs. My lack of regex/sed understanding makes it impossible for me to adapt it to my requirements.


I would appreciate any help, even if a change of tool is needed. However, I’m more interested in the learning process, so a script or CLI solution would be very welcome :) Actually, any help is appreciated :D !

Thanks in advance.

  • N0x0n@lemmy.ml (OP) · 3 days ago

    Hello :) Sorry for the very late response !

    Your regex is indeed very close for a one-liner, I’m pretty impressed ! :0 However, I forgot to mention something in my post (I only thought about it after working on it with another user in the comments…). There are 2 things missing in your beautiful and complex regex:

    1. Numbering with dots also needs a dash in between (actually, I think every special character, such as a space or a dot, is converted to a dash)
    FROM
    ---------------
    [Link with numbers](readme.md#1.3%20this%20is%20another%20test)
    
    TO
    ---------------
    [Link with numbers](readme.md#1-3-this-is-another-test)
    
    2. The part before the hashtag needs to keep its original form (it links to a real file)
    FROM
    ---------------
    [Link with numbers](Another%20file%20to%20readme.md#1.3%20this%20is%20another%20test.md)
    
    TO
    ---------------
    [Link with numbers](Another%20file%20to%20readme.md#1-3-this-is-another-test.md)
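
The two extra rules can be sketched on a single link target: split at the first `#`, leave the file part alone, and rewrite only the fragment. The sketch uses plain POSIX shell with sed and tr. convert_target is a made-up helper name, and treating a dot as a separator only when it sits between two digits is an assumption drawn from the FROM/TO examples above, not an official GitHub rule.

```shell
#!/bin/sh
# Hypothetical helper: takes one link target and prints the converted target.
convert_target() {
    target=$1
    case $target in
        http://*|https://*) printf '%s\n' "$target"; return ;;  # external link: keep as-is
        *'#'*) ;;                                                # has a fragment: convert below
        *) printf '%s\n' "$target"; return ;;                   # no fragment: keep as-is
    esac
    file=${target%%#*}      # everything before the first '#', kept untouched
    fragment=${target#*#}   # everything after the first '#'
    # %20 becomes "-", a dot between two digits becomes "-" (looped so that
    # 1.2.3 is fully handled), then the whole fragment is lowercased.
    fragment=$(printf '%s\n' "$fragment" |
        sed -e 's/%20/-/g' -e ':a' -e 's/\([0-9]\)\.\([0-9]\)/\1-\2/' -e 'ta' |
        tr '[:upper:]' '[:lower:]')
    printf '%s#%s\n' "$file" "$fragment"
}

convert_target 'Another%20file%20to%20readme.md#1.3%20this%20is%20another%20test.md'
```

Keeping the file part out of the pipeline entirely is what guarantees that `Another%20file%20to%20readme.md` still points at the real file afterwards.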
    

    Sorry for the trouble, I wasn’t aware of all the GitHub-Flavored Markdown syntax :/. I got a very cool script working perfectly with the help of another user, but if you want to modify your regex and try to solve the issue in pure regex, feel free :) I’m very curious what it could look like (god, regex is so obscure and at the same time it has some beauty in it !)

    #! /bin/bash
    
    files="/home/USER/projects/test.md"
    
    mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
    mdlinks2="$(grep -Po '#.*' <<<"$mdlinks")"
    
    while IFS= read -r line; do
    	#Converts 1.2 to 1-2 (a third-level heading such as 1.2.3 needs an additional [0-9] group in the pattern)
    	dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
    	sed -i "s/$line/${dashlink}/" "$files"
    
    	#Puts everything to lowercase after a hashtag
    	lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
    	sed -i "s/$dashlink/${lowercaselink}/" "$files"
    
    	#Replaces encoded spaces (%20) with dashes in markdown links after a hashtag
    	spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
    	sed -i "s/$lowercaselink/${spacelink}/" "$files"
    
    done <<<"$mdlinks2"
    
    • tuna@discuss.tchncs.de · 2 days ago (edited)

      I did it!! It also handles the case where an external link and internal link are on the same line :D

      sed -E ':l;s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;Te;H;g;s/\n//;s/\n.*//;x;s/.*\n//;/^https?:/!{:h;s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;th;s/(#[^)]*\))/\L\1/;};tl;:e;H;z;x;s/\n//;'
      

      Here is my annotated file

      # Begin loop
      :l;
      
      # Bisect first link in pattern space into pattern space and append to hold space
      # Example: `text [label](file#fragment)'
      #   Pattern space: `file#fragment)'
      #   Hold space: `text [label]('
      # Steps:
      #   1. Strategically insert \n
      #       1a. If this fails, branch out
      #   2. Append to hold space (this creates two \n's. It feels weird for the
      #      first iteration, but that's ok)
      #   3. Copy hold space to pattern space, remove first \n, then trim off
      #      everything past the second \n
      #   4. Swap pattern/hold, and trim off everything up to and incl the last \n
      s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;
      Te;
      H;
      g; s/\n//; s/\n.*//;
      x; s/.*\n//;
      
      # Modify only if it is an internal link
      /^https?:/! {
          # Add hyphens
          :h;
          s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;
          th;
          # Make lowercase
          s/(#[^)]*\))/\L\1/;
      };
      
      # "conditional" branch so it checks the next conditional again
      tl;
      
      # Exit: join pattern space to hold space, then move to pattern space.
      # Since the loop uses H instead of h, have to make sure hold space is empty
      :e;
      H;
      z;
      x; s/\n//;
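
The bisect-and-rejoin idea may be easier to follow in a stripped-down form that handles only the first link on a line, so a plain `h` can replace the `H` bookkeeping. This is a rough sketch, not the script above: it assumes GNU sed (for `T`, `\n` in the replacement and `\L`), it only lowercases the fragment, and lower_first_fragment is just an illustrative name.

```shell
#!/bin/sh
lower_first_fragment() {
    sed -E \
        -e 's/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/' \
        -e 'T' \
        -e 'h' \
        -e 's/\n.*//' \
        -e 'x' \
        -e 's/.*\n//' \
        -e 's/#[^)]*\)/\L&/' \
        -e 'H' \
        -e 'x' \
        -e 's/\n//' "$@"
    # 1. insert \n right after "[label](" ; T skips lines without such a link
    # 2. h + s/\n.*//  keeps the head "text [label](" in the pattern space
    # 3. x + s/.*\n//  stores the head in hold space, leaves the tail to edit
    # 4. edit the tail, then H + x + s/\n// glues head and tail back together
}

printf '%s\n' 'text [Label](File.md#Some%20Frag) tail' | lower_first_fragment
```

The full script needs `H` and the extra trimming because it repeats these steps for every link on the line and has to accumulate the already-processed text in the hold space.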
      
      • N0x0n@lemmy.ml (OP) · 2 days ago

        Wow ! Thank you ! I did a quick test on a test-file.md:

        [Just a test](#just-a-test)
        [Just a link](https://mylink/%20with%20space.com)
        [External link](readme.md#just-a-test)
        [Link with numbers](readme.md#1-3-this-is-another-test)
        [Link with numbers](Another%20file%20to%20readme.md#1-3-this-is-another-test)
        

        Great job ! Thank you very much !!! I’m really impressed by what someone with proper knowledge can do ! However, I really do not want to mess around with your regex… That would only lead to disaster xD ! I will carefully keep your regex and annotated file in my knowledge base; I’m sure some time in the future I will come back to it and try to break it down as a learning process.

        Thank you very much !!! 👍