Regular expression help

Hello, I am trying to create a regular expression to extract some data from my filenames.

I have file names such as:

blue_MAX_P1_W1_Sa

red_MAX_P1_W2_Sb

RGB_MAX_P1_W16_Sd

…where P is the plate, W is the well and S is the site.

I just want to extract the P, W, and S but I am completely new to these regular expressions and cannot make even this simple task work!

Here’s an example of what I assume I have to do (although it is certainly very naive and wrong!)
^P(?P[0-9])W(?P[0-16])S(?P[a-d])

Can someone help, and more importantly, can someone point me in the direction of some information that will help me understand how to generate these regular expressions?

Thanks in advance

Hello Freemano,
Here’s a very helpful page, regex101: it’s a regex editor/tester with tons of explanations…

As for your current needs, a very simple way would be the following:
.*MAX_(?P<Plate>P[0-9]*)_(?P<Well>W[0-9]*)_(?P<Site>S[a-d])

This simple regex does the following:
.*MAX_ will skip the beginning of the name, up to and including MAX_

(?P<Plate>P[0-9]*)_ will extract the plate name: the letter P followed by any number of digits, up to the next “_”.

(?P<Well>W[0-9]*)_ will extract the well name: the letter W followed by any number of digits, up to the next “_”.
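You were suggesting [0-16] for the well, so I assume you have 16 wells? Square brackets in a regular expression do not describe a numeric range: they list the characters allowed at one position, so [0-16] means a single character that is “0”, “1” or “6”. If you have 16 wells, the well number is 1 or 2 characters long, and each character can be any digit between 0 and 9.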
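To make the point about [0-16] concrete, here is a small Python sketch (Python's re module understands the same (?P<name>...) syntax used in this thread). The test strings are only illustrations, and the {1,2} quantifier is my own addition, not part of the pattern above:

```python
import re

# [0-16] is a character class: ONE character that is in the range 0-1, or is 6.
char_class = re.compile(r"[0-16]")

# Which single digits does the class accept?
accepted = [d for d in "0123456789" if char_class.fullmatch(d)]
print(accepted)  # ['0', '1', '6']

# A single class can never match a two-character string such as "16" in full.
print(char_class.fullmatch("16"))  # None

# One or two digits, each between 0 and 9, covers well numbers 1 to 16.
well_digits = re.compile(r"[0-9]{1,2}")
print(bool(well_digits.fullmatch("16")))  # True
```

Note that [0-9]{1,2} also accepts values such as 17 or 99; a regex checks the shape of the text, not whether the number is a valid well.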

(?P<Site>S[a-d]) will extract the site name: the letter S followed by one of “a”, “b”, “c” or “d”.

So, the name
blue_MAX_P1_W1_Sa
will therefore be extracted as
Plate: P1
Well: W1
Site: Sa
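If you want to check this outside of regex101, a minimal Python sketch could look like the following. It assumes the three file names from the question; the helper name extract is hypothetical and only there for illustration:

```python
import re

# Pattern from the post above; each group value keeps its P/W/S prefix.
pattern = re.compile(
    r".*MAX_(?P<Plate>P[0-9]*)_(?P<Well>W[0-9]*)_(?P<Site>S[a-d])"
)

# Example file names taken from the question.
filenames = ["blue_MAX_P1_W1_Sa", "red_MAX_P1_W2_Sb", "RGB_MAX_P1_W16_Sd"]

def extract(name):
    """Return the named groups as a dict, or None if the name does not match."""
    m = pattern.match(name)
    return m.groupdict() if m else None

for name in filenames:
    print(name, extract(name))
```

For blue_MAX_P1_W1_Sa this gives Plate P1, Well W1 and Site Sa, as listed above; a name without MAX_ in it returns None.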

If instead you use
.*MAX_P(?P<Plate>[0-9]*)_W(?P<Well>[0-9]*)_S(?P<Site>[a-d])
the name
blue_MAX_P1_W1_Sa
will therefore be extracted as
Plate: 1
Well: 1
Site: a
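One possible use of this second form, sketched in Python: because the groups hold only the digits, they can be converted to integers, which lets W2 sort before W16. The function name parse is hypothetical, and I have swapped * for + in the digit groups (my change, not part of the pattern above) so that an empty plate or well is rejected instead of failing in int():

```python
import re

# Same idea as the second pattern above, but with + instead of * so that
# the Plate and Well groups always contain at least one digit.
pattern = re.compile(
    r".*MAX_P(?P<Plate>[0-9]+)_W(?P<Well>[0-9]+)_S(?P<Site>[a-d])"
)

def parse(name):
    """Return (plate, well, site) with numeric plate/well, or None if no match."""
    m = pattern.match(name)
    if m is None:
        return None
    return int(m.group("Plate")), int(m.group("Well")), m.group("Site")

names = ["RGB_MAX_P1_W16_Sd", "red_MAX_P1_W2_Sb", "blue_MAX_P1_W1_Sa"]

# Sort by numeric plate and well, then by site letter.
for name in sorted(names, key=parse):
    print(name, parse(name))
```

Sorting on the raw text would put W16 before W2; sorting on the parsed tuple keeps the wells in numeric order.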

I hope it helps!
Good luck,
Fabien

Thanks Fabien, this is great info. Thanks especially for the link to regex101, I really appreciate it :)