  • Atul Kasana Mar-11-2019 07:14:46 AM ( 1 week ago )

    I want to find and separate words in a title that has no spaces.

    Before

    ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]"Test"'Test'

    After

    This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'


    I'm looking for a Regex rule that can do the following.

    My idea is to treat each uppercase letter as the start of a new word.

    But ALL UPPERCASE words must be preserved, so that they are not spaced into A L L U P P E R C A S E.

    Additional rules:

    • Space a letter if it touches a number: Hello2019World -> Hello 2019 World
    • Do not space initials that contain periods, hyphens, or underscores: T.E.S.T.
    • Do not add spaces inside brackets, parentheses, or quotes: [Test] (Test) "Test" 'Test'
    • Preserve hyphens: Hello-World

    C#

     

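    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    // Title without spaces
    string title = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'";
    
    // Detect where to space words
    string[] split = Regex.Split(title, "(?<!^)(?=(?<![\\-'\"([{])(?<![A-Z])[A-Z][\\d+?]?)");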
    
    // Trim each word of extra spaces before joining
    split = (from e in split
             select e.Trim()).ToArray();
    
    // Join into new title
    string newtitle = string.Join(" ", split);
    
    // Display
    Console.WriteLine(newtitle);

    Regex

    I'm having trouble with spacing before the numbers, brackets, parentheses, and quotes.

    (?<!^)(?=(?<![\-'"([{])(?<![A-Z])[A-Z][\d+?]?)

    (?<!^)              // negative look behind
    (?=                 // positive look ahead
      (?<![\-'"([{])    // ignore if starts with punctuation
      (?<![A-Z])        // ignore if starts with double Uppercase letter
      [A-Z]             // space after each Uppercase letter
      [\d+]?            // space after number
    )

     

  • Deepak Parmar Mar-11-2019 07:17:19 AM ( 1 week ago )

    You can shorten the regular expression by interpreting the requirements differently. For example, the first requirement is the same as saying: put a space before a capital letter unless it is preceded by a punctuation mark or another capital letter.

    The following regex covers almost all of the requirements mentioned and can be extended to include or exclude other situations:

    (?<![A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}

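    Use the Regex.Replace() method with a space followed by $0 (the whole match) as the substitution string, then trim the leading space that is added before the first letter.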
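
    A minimal sketch of how this could be wired up, assuming the pattern reads (?<![A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P} and the substitution string is a space followed by $0. The names TitleSpacer, AddSpaces, and spaceDigits are made up for illustration. The WithDigits pattern is not part of the answer; it is one possible extension that also splits where letters and digits touch.

```csharp
using System;
using System.Text.RegularExpressions;

static class TitleSpacer
{
    // Assumed reading of the answer's pattern: a capital letter not preceded by
    // a capital or punctuation, or a punctuation mark that follows another one.
    const string Basic = @"(?<![A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}";

    // Hypothetical extension (not from the answer): also match a digit that
    // follows a letter, and a lowercase letter that follows a digit.
    const string WithDigits = Basic + @"|(?<=\p{L})\d|(?<=\d)\p{Ll}";

    // Prefix every match with a space, then trim the space added at index 0.
    public static string AddSpaces(string title, bool spaceDigits = false) =>
        Regex.Replace(title, spaceDigits ? WithDigits : Basic, " $0").Trim();
}

static class Program
{
    static void Main()
    {
        string title = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'";

        // Answer's pattern only: no space is inserted before 2019.
        Console.WriteLine(TitleSpacer.AddSpaces(title));

        // With the digit extension enabled.
        Console.WriteLine(TitleSpacer.AddSpaces(title, spaceDigits: true));
    }
}
```

    With the answer's pattern alone, HELLO-WORLD2019 stays joined because digits are never matched; that seems to be the gap behind "almost all". With the extension switched on, the sample title should come out as the "After" line in the question.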
