Search text file for pattern
-
I'd like to search a text file for a pattern and get a list of every occurrence that matches it. Not every matching line, but every piece of text that matches the pattern. Can I do this with grep?
The pattern is
- always separated by characters like "><.:; or whitespace
- and it starts with zero or one character that is A to Z
- then T4
- and then any number of characters after that of the type A to Z or 0 to 9
If it can't be done with grep, please suggest other solutions.
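A minimal sketch of how the rules listed above might map onto a single grep call. The sample text and the temporary file are invented for illustration. Using -w is an assumption about what "separated" means: it accepts any non-word character (or the start or end of the line) as a separator, which is broader than the characters listed.

```shell
# Hypothetical sample text; only the T4 tokens should be reported.
sample=$(mktemp)
printf '%s\n' '<AT4B12> some text "T4X9" more' 'nothing here' 'id:BT4C3;' > "$sample"

# [A-Z]?     zero or one leading letter
# T4         the literal marker
# [A-Z0-9]*  any number of trailing letters or digits
# -o prints each match on its own line; -w requires a non-word character
# (or a line boundary) on both sides of the match; -E selects extended syntax.
tokens=$(grep -owE '[A-Z]?T4[A-Z0-9]*' "$sample")
printf '%s\n' "$tokens"

rm -f "$sample"
```

If the separators really must be limited to the listed characters, -w is too permissive and the delimiters have to be written into the expression itself.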
-
@Pete-S: grep can use regular expressions to search using the PCRE syntax.
If you could provide a sample of the text that you want matched, we may be able to help.
Otherwise, use regexr.com and it'll help you build the expression.
-
@manxam said in Search text file for pattern:
Otherwise, use regexr.com and it'll help you build the expression.
Nice tool. I will have to remember it.
-
Alright, I have the search expression down and regexr.com was a great interactive tool.
/[<\.:;"]([A-Z]*T4[A-Z,0-9]+)[>\.:;"]+/g
However, how do I get grep to deliver the match (capturing group) and not the complete lines?
-
@Pete-S said in Search text file for pattern:
Alright, I have the search expression down and regexr.com was a great interactive tool.
/[<\.:;"]([A-Z]*T4[A-Z,0-9]+)[>\.:;"]+/g
However, how do I get grep to deliver the match (capturing group) and not the complete lines?
Don’t use grep? I have no idea what tool you actually want here.
-
@JaredBusch said in Search text file for pattern:
Don’t use grep? I have no idea what tool you actually want here.
Me neither. I just want to search a file and get a list of what matches.
-
@Pete-S said in Search text file for pattern:
Me neither. I just want to search a file and get a list of what matches.
grep should be able to do that natively now. We used to
cat filename | grep
-
@travisdh1 said in Search text file for pattern:
grep should be able to do that natively now. We used to
cat filename | grep
Grep returns the whole line where the string was found. He does not want that. He only wants the results of the regex.
-
@JaredBusch said in Search text file for pattern:
Grep returns the whole line where the string was found. He does not want that. He only wants the results of the regex.
OK, I think I have it.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
So
grep pattern file -o
should do the trick and it seems like it from my first test.
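To illustrate what -o does with the expression posted earlier in the thread, here is a small sketch. The sample text is made up, and the backslashes before the dot are dropped because a dot inside a bracket expression is already literal in POSIX syntax. The options are placed before the pattern, which is the conventional order.

```shell
# Hypothetical sample text written to a temporary file.
sample=$(mktemp)
printf '%s\n' '<AT4B12> some text "T4X9" more' 'nothing here' 'id:BT4C3;' > "$sample"

# -o prints every match on its own line instead of the whole matching line.
# -E selects extended syntax so + works as a quantifier.
matches=$(grep -oE '[<.:;"][A-Z]*T4[A-Z,0-9]+[>.:;"]+' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

With this sample the three lines printed are `<AT4B12>`, `"T4X9"` and `:BT4C3;`. The delimiters are still attached, because -o prints the whole match and not just the part inside the parentheses. The comma inside `[A-Z,0-9]` is also matched literally.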
-
@Pete-S said in Search text file for pattern:
OK, I think I have it.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
So
grep pattern file -o
should do the trick and it seems like it from my first test.
OK, so grep can match an expression and deliver the match with the -o option, but not part of the match, aka a capturing group. Grep doesn't have the full capabilities of regular expressions, only parts of it.
But pcregrep does.
So in my case I put the search expression in pattern.txt because I had some odd characters in it. pcregrep delivered the goods with:
pcregrep -o1 -f pattern.txt file_to_search
So the take-home message is that pcregrep is like grep on steroids.
Most of the options and usage are the same as grep's, so it's a drop-in replacement in most cases.
PS. I tried this on Debian, where you need the pcregrep package, which you can install with
apt install pcregrep
I don't know if it's installed by default on Fedora/CentOS.
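A sketch of the whole pcregrep round trip, under a few assumptions: the sample text and temporary files are invented, and the expression is the one from earlier in the thread with the backslashes before the dot left out (a dot is literal inside a character class either way). The block checks for pcregrep first, since it may not be installed.

```shell
# Hypothetical sample text and pattern file, both temporary.
sample=$(mktemp)
pattern_file=$(mktemp)
printf '%s\n' '<AT4B12> some text "T4X9" more' 'nothing here' 'id:BT4C3;' > "$sample"

# The quoted 'EOF' stops the shell from interpreting anything in the pattern,
# which sidesteps quoting problems with characters such as " and <.
cat > "$pattern_file" <<'EOF'
[<.:;"]([A-Z]*T4[A-Z,0-9]+)[>.:;"]+
EOF

codes=""
if command -v pcregrep >/dev/null 2>&1; then
    # -o1 prints only capturing group 1 of each match; -f reads the pattern from a file.
    codes=$(pcregrep -o1 -f "$pattern_file" "$sample")
    printf '%s\n' "$codes"
else
    echo "pcregrep is not installed; skipping the example" >&2
fi

rm -f "$sample" "$pattern_file"
```

Where pcregrep is available, this should list the bare tokens (AT4B12, T4X9, BT4C3) without the surrounding delimiters.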
-
@Pete-S said in Search text file for pattern:
Grep doesn't have the full capabilities of regular expressions, only parts of it.
Not sure what this means. RE is a kind of thing; there is no such thing as a full set of RE capabilities. Some systems are more powerful than others, but there is no standard RE. It's not like relational databases, where what an RDB is and what is required to be fully relational were defined before the first one was created. RE is just a concept of pattern matching, nothing more. Anything that does pattern matching is doing RE.
-
@scottalanmiller said in Search text file for pattern:
Not sure what this means. RE is a kind of thing; there is no such thing as a full set of RE capabilities. Some systems are more powerful than others, but there is no standard RE. It's not like relational databases, where what an RDB is and what is required to be fully relational were defined before the first one was created. RE is just a concept of pattern matching, nothing more. Anything that does pattern matching is doing RE.
OK, you are right.
The regular expressions used in JavaScript, PHP and others are Perl compatible. PCRE stands for Perl Compatible Regular Expressions.
pcregrep supports these but grep does not.
According to Wikipedia:
PCRE's syntax is much more powerful and flexible than either of the POSIX regular expression flavors and than that of many other regular-expression libraries.
-
@Pete-S said in Search text file for pattern:
OK, you are right.
The regular expressions used in JavaScript, PHP and others are Perl compatible. PCRE stands for Perl Compatible Regular Expressions.
pcregrep supports these but grep does not.
Yes, Perl made a non-standard RE system long after grep and other mainline RE tools were commonplace. Perl is the weird one here, and later systems copied it. But Perl went its own route long after grep and others were standard.
-
@scottalanmiller said in Search text file for pattern:
Yes, Perl made a non-standard RE system long after grep and other mainline RE tools were commonplace. Perl is the weird one here, and later systems copied it. But Perl went its own route long after grep and others were standard.
You know, before this thread I never noticed that there was an older standard in place and that some of the tools used it. I just thought everything regex was the same, maybe with some minor implementation-specific changes.
The reason for that is of course that every time I had to do some more advanced regex, at least in the last 10 years or so, it has been in mod_rewrite, PHP, JavaScript or something like that. And as it turns out, they are all using PCRE.
-
@Pete-S said in Search text file for pattern:
You know, before this thread I never noticed that there was an older standard in place and that some of the tools used it. I just thought everything regex was the same, maybe with some minor implementation-specific changes.
The reason for that is of course that every time I had to do some more advanced regex, at least in the last 10 years or so, it has been in mod_rewrite, PHP, JavaScript or something like that. And as it turns out, they are all using PCRE.
Yeah, I come from the pre-Perl grep era, so Perl was this annoying upstart that broke all of our de facto standards.
-
I should have noted that grep likely wouldn't like complicated RE patterns. sed, however, handles capturing groups and is installed by default on pretty much every Linux variant; I'm not sure if pcregrep is.
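For systems without pcregrep, a sketch of the sed route might look like the following. The sample text is invented, and the -E option (extended syntax) is assumed to be available, as it is in GNU and BSD sed. Because sed's s command substitutes within each line as a whole, grep -o is used first to put every match on its own line, and sed then keeps only the first capturing group.

```shell
# Hypothetical sample text written to a temporary file.
sample=$(mktemp)
printf '%s\n' '<AT4B12> some text "T4X9" more' 'nothing here' 'id:BT4C3;' > "$sample"

# Step 1: grep -oE isolates each delimited match, one per line.
# Step 2: sed -E replaces the whole line with capturing group 1 (\1),
#         which drops the leading and trailing delimiters.
codes=$(grep -oE '[<.:;"][A-Z]*T4[A-Z,0-9]+[>.:;"]+' "$sample" |
        sed -E 's/^[<.:;"]([A-Z]*T4[A-Z,0-9]+)[>.:;"]+$/\1/')
printf '%s\n' "$codes"

rm -f "$sample"
```

With this sample the output is AT4B12, T4X9 and BT4C3, one per line.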
It seems everyone likes the link I posted for the RE builder so I'll paste two others that I use:
https://regex101.com/
http://leaverou.github.io/regexplained/
Cheers!