In this case we have parsed the files and obtained two types of records, which must be parsed further to extract the order number and the client name.

Examples of records:

Order - ID 123456 (Client Name 1)

Order completion report #12345 (Client category > Client Name 2)

1. To get the order number, we will use:

\d{5,6}

This means that the parser will extract a run of 5 or 6 consecutive digits.
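As a minimal sketch, the same pattern can be tried on the two sample records with Python's built-in `re` module. The names `records`, `ORDER_NUMBER`, and `order_number` are illustrative and not part of the original setup.

```python
import re

# The two sample records from the text.
records = [
    "Order - ID 123456 (Client Name 1)",
    "Order completion report #12345 (Client category > Client Name 2)",
]

# A run of 5 or 6 consecutive digits.
ORDER_NUMBER = re.compile(r"\d{5,6}")


def order_number(record):
    """Return the first run of 5 or 6 digits in the record, or None."""
    match = ORDER_NUMBER.search(record)
    return match.group(0) if match else None


for record in records:
    print(order_number(record))
# prints:
# 123456
# 12345
```

Note that the pattern is not anchored, so a longer number such as 1234567 would yield only its first six digits. For the records shown here that case does not arise.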


2. To extract the client name, we will use a conditional regular expression. The syntax of a conditional regular expression looks like this:

 (?(condition)true|false)

The reason for using a conditional regular expression is that the client name cannot be captured simply by extracting everything between the parentheses: in the second example, the client category would be extracted as well, which is unacceptable. So in the first case we need the text between the parentheses, and in the second case we need the text after the ">" sign and before the closing parenthesis.
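The problem with the naive approach can be seen in a short sketch, again assuming Python's `re` module. The pattern `\((.*)\)` is an illustration of "everything between the parentheses" and does not come from the original text.

```python
import re

record = "Order completion report #12345 (Client category > Client Name 2)"

# Naive approach: capture everything between the parentheses.
naive = re.search(r"\((.*)\)", record)

print(naive.group(1))
# prints: Client category > Client Name 2
```

The category is captured together with the name, which is exactly what we want to avoid.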

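To distinguish the two types of records, let's use the difference in the order numbers: the first one has a 6-digit number. We begin by selecting the "true" case with a lookbehind condition: the current position must be preceded by text such as "123456 (", that is, six digits, a whitespace character, and an opening parenthesis.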

(?(?<=\d{6}\s\()true|false)

Then, in the "true" case, we extract everything before ")":

(?(?<=\d{6}\s\()((.*)(?=\)))|false)

For the "false" case, we will extract the text after "> ". The expression for it looks like this: (?<=\>\s)(.*)(?=\)). The resulting expression is:

(?(?<=\d{6}\s\()((.*)(?=\)))|(?<=\>\s)(.*)(?=\)))
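A conditional with a lookaround as its condition is available in flavors such as PCRE, but Python's built-in `re` module accepts only a group reference as the condition. The sketch below therefore swaps in a different technique: an alternation of the same two lookbehinds. It is not the conditional expression itself, and it should behave the same way only on records shaped like the two samples. The names `CLIENT_NAME` and `client_name` are hypothetical.

```python
import re

# Assumption: Python's re module, which does not support a lookaround as the
# condition of (?(...)yes|no). The two branches are written as an alternation:
# either the position follows "<6 digits> (" or it follows "> ".
CLIENT_NAME = re.compile(r"(?:(?<=\d{6}\s\()|(?<=>\s))(.*)(?=\))")


def client_name(record):
    """Return the client name from a record, or None if nothing matches."""
    match = CLIENT_NAME.search(record)
    return match.group(1) if match else None


print(client_name("Order - ID 123456 (Client Name 1)"))
# prints: Client Name 1
print(client_name("Order completion report #12345 (Client category > Client Name 2)"))
# prints: Client Name 2
```

In the second record the first lookbehind fails because the number has only five digits preceded by "#", so the match starts after "> " instead.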
