I decided to work with your example data. I think it will be very difficult to achieve what you want directly, and there is probably an easier method: break the process into 2 steps. My solution edits the files somewhat, so it should be done on a copy of the html files. I hope I have understood enough from your original post to provide some useful information; if not, you will need to elaborate (such as whether the example data was a “good” set or a “bad” set).

The first step, done with the “Find in Files” function, is to remove all line feeds except the ones starting a group, which I believe starts with the “meta property” tag. This lets us limit the search to 1 line. If we went with your idea of crossing multiple lines, the regex would “attempt” to find a match and would therefore try to expand its search into the next group.

The second step is to compare the 2 https references on each line, to locate lines where they differ. My assumption (from your example data) is that there are ONLY 2 https references for each group.

Find What: (?-s)(. ?

The Replace With field is empty, as we are only using the “Find All” button to search the files. Set the Filters and folder (and the sub-folder option) as appropriate. After clicking the “Find All” button, the search window will show which files have what you seek: mismatching https references for any group. The line number is also shown, although it won’t match the original file, because the first step removed line feeds.

If you require a description of what my second regex is doing, please ask. I haven’t provided one, as you seem to have good regex knowledge.

Following up on “Regex: Find those files that doesn't contain the same link in 2 different html tags”, where you said “I should find the files that doesn’t contain the same link up and down”: I may have found a way to search the html files without having to edit them.
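As a rough illustration of searching without editing the files, the sketch below uses awk rather than Notepad++; this is a swapped-in tool, not the method from the posts. It assumes that each group starts with a “meta property” line and holds exactly 2 https references. The file name sample.html and its contents are hypothetical.

```shell
# Hypothetical sample: two groups, each starting at a "meta property" line.
# The second group has mismatching https references.
cat > sample.html <<'EOF'
<meta property="og:url" content="https://example.com/a"/>
<link rel="canonical" href="https://example.com/a"/>
<meta property="og:url" content="https://example.com/b"/>
<link rel="canonical" href="https://example.com/c"/>
EOF

mismatches=$(awk '
# Report the finished group if its two references differ, then reset.
function report() {
    if (count == 2 && first != second)
        print FILENAME ": " first " != " second
    count = 0
}
# A "meta property" line starts a new group (an assumption from the text).
/<meta property/ { report() }
{
    line = $0
    # Collect every https reference on this line, left to right.
    while (match(line, /https:\/\/[^" ]+/)) {
        count++
        if (count == 1) first = substr(line, RSTART, RLENGTH)
        else if (count == 2) second = substr(line, RSTART, RLENGTH)
        line = substr(line, RSTART + RLENGTH)
    }
}
END { report() }
' sample.html)

# Print one line per group whose two references differ.
printf '%s\n' "$mismatches"
```

Because the script only reads the file, there is no need to work on a copy. Groups with fewer or more than 2 references are silently skipped, which may or may not be what you want.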
I looked at your example data, and it doesn’t appear to match what your regex is looking for (you mentioned “link rel=canonical”). This can find the second link from …

By default, GNU grep matches in BRE (basic regular expression) mode. That is to say, if we don’t set an option, it only supports BRE syntax. For example, we can match a line containing either “awesome” or “powerful”:

$ grep 'awesome\|powerful' input.txt

As we’ve seen in the command above, we’ve escaped the ‘|’ character to give it special meaning.
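To make the escaping rule concrete, here is a small sketch that compares the escaped BRE form with the -E (extended regex) form. It assumes GNU grep, since `\|` alternation in BRE is a GNU extension, and the contents written to input.txt are made up for the demonstration.

```shell
# Hypothetical input: two lines that should match and one that should not.
printf '%s\n' 'grep is awesome' 'sed is powerful' 'plain line' > input.txt

# BRE (default mode): the pipe must be escaped to mean "or" (GNU extension).
bre_count=$(grep -c 'awesome\|powerful' input.txt)

# ERE (-E): the pipe means "or" without escaping.
ere_count=$(grep -E -c 'awesome|powerful' input.txt)

# BRE with an unescaped pipe: the pipe is a literal character, so nothing matches.
# grep exits non-zero when there are no matches, hence the "|| true".
literal_count=$(grep -c 'awesome|powerful' input.txt || true)

echo "BRE: $bre_count, ERE: $ere_count, literal: $literal_count"
```

If the pattern is meant to run on other grep implementations too, the -E form is probably the safer choice.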