regex: August 2015

Tuesday, August 4, 2015

htaccess redirect for http but exclude https

I am changing the domain of a website; however, due to hosting restrictions, the SSL connection has to use the old domain (for the time being at least). I am therefore trying to use .htaccess to redirect from one domain to the other for HTTP connections, but not for HTTPS ones.

I have the following, which isn't working:

RewriteEngine on
RewriteCond %{HTTPS} !=on
RewriteCond {HTTP_HOST} ^olddomain.com$ [OR]
RewriteCond {HTTP_HOST} ^www.olddomain.com$ 

RewriteRule (.*)$ http://newdomain.com/$1 [R=301,L]

Any help would be gratefully received. Thanks.

Python: Scraping the Internet using re and pandas

Hi, I am trying to scrape data from the PGA website to build a CSV of information on golf courses. I tried something new and used the re module and pandas instead of Beautiful Soup to access the data. I am having a problem writing the CSV file. I tried using a pandas DataFrame, but I kept getting an attribute error. With my current approach, I get an attribute error when encoding to UTF-8, and I was wondering whether I should wrap my scrapers in try/except blocks. Lastly, how can I create a progress bar for my sanity while waiting for the scraping to finish? Ideas and thoughts will be greatly appreciated.

Code cited below:

import re
import requests
import pandas as pd
import csv


L = []
for i in range(1):      # Number of pages plus one 
    url = "http://ift.tt/1TSyPTR".format(i)
    r = requests.get(url)

    name     = re.findall('(?<=<div class="views-field-title"><span class="field-content">)([^<]+)', r.text)
    print (name)

    address1 = re.findall('(?<=<div class="views-field-address"><span class="field-content">)([^<]+)', r.text)

    address2 = re.findall('(?<=<div class="views-field-city-state-zip"><span class="field-content">)([^<]+)', r.text)

    ownership = re.findall('(?<=<div class="views-field-course-type"><span class="field-content">)([^<]+)',r.text)

    website   = re.findall('(?<=<div class="class":"views-field-website"><span class="field-content">)([^<]+)',r.text)

    phone     = re.findall('(?<=<div class="class":"views-field-work-phone"><span class="field-content">)([^<]+)',r.text)

    #L.extend(zip(name,address1,address2,ownership,website,phone))

    course=[name,address1,address2,ownership,website,phone]
    L.append(course)

    with open ('Testing.csv','a') as file:
        writer=csv.writer(file)
        for row in L:
            writer.writerow([s.encode("utf-8") for s in row])
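The `AttributeError` most likely comes from the last line: every `row` in `L` is a list of six lists, so `.encode` ends up being called on a list. A minimal Python 3 sketch, assuming the page markup matches the patterns in the question (the helper names `extract_field`, `build_rows` and `rows_to_csv` are illustrative), pairs the columns with `zip` and leaves the UTF-8 encoding to the file object instead of encoding each cell. The website and phone patterns in the question contain `class="class":...`, which looks like a copy error, so they are left out here.

```python
import csv
import io
import re

# CSS classes taken from the question; the real markup is assumed to match them.
FIELDS = ["views-field-title", "views-field-address",
          "views-field-city-state-zip", "views-field-course-type"]

def extract_field(html, css_class):
    """Text of every <span class="field-content"> directly inside the given div class."""
    pattern = (r'<div class="' + re.escape(css_class) +
               r'"><span class="field-content">([^<]+)')
    return re.findall(pattern, html)

def build_rows(html):
    """One tuple per course. Note: zip() stops at the shortest column."""
    columns = [extract_field(html, css_class) for css_class in FIELDS]
    return list(zip(*columns))

def rows_to_csv(rows):
    """Serialise rows (plus a header line) as CSV text, i.e. str rather than bytes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "address1", "address2", "ownership"])
    writer.writerows(rows)
    return buffer.getvalue()
```

To save it, something like `with open("Testing.csv", "w", newline="", encoding="utf-8") as fh: fh.write(rows_to_csv(build_rows(r.text)))`. If one course lacks a field, `zip` silently misaligns or drops entries, which try/except would not catch. For a crude progress indicator, `print(i, end="\r")` inside the page loop is often enough.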

Regex to detect end of line(\n) that has double slash(//)

I really need a regex for this. Example:

    //This is a comment and I need this \n position
    String notwanted ="//I do not need this end of line position";

Any help is appreciated. Thank you for your time
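A minimal Python sketch, assuming the comments of interest are whole-line comments (optional whitespace, then `//`), so a `//` inside a string literal later in a line is never considered. With `re.MULTILINE`, `$` matches just before each newline, so `m.end()` is the index of that newline (or the end of the text on the last line):

```python
import re

# A line whose first non-blank characters are "//"; "." does not cross line breaks.
COMMENT_LINE = re.compile(r"^[ \t]*//.*$", re.MULTILINE)

def comment_newline_positions(source):
    """Index of the line break that ends each whole-line // comment."""
    return [m.end() for m in COMMENT_LINE.finditer(source)]
```

Trailing comments such as `x = 1; // note` are not handled; doing that reliably needs something that understands string literals, such as a small tokenizer.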

Java: I want use Regex to count phone numbers

I am new to regex. I am building a messaging app with WindowBuilder in Eclipse that handles a user entering a series of phone numbers, and I want to use regex to accomplish the following:

Get input from the user in a TextArea and count the phone numbers
Each phone number has 11 digits in total (no spaces inside a number)
Only a single space or a comma is allowed between phone numbers

I'm still learning regex and haven't gotten used to it yet, so I would appreciate it if someone could provide a solution. Thanks
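A sketch of the pattern in Python, assuming the whole text area must consist of 11-digit numbers separated by exactly one space or one comma. The same pattern should work with Java's `String.matches`, which also requires the entire string to match (backslashes doubled in the Java string literal):

```python
import re

# 11 digits, then any number of (one space or one comma + 11 digits).
VALID_LIST = re.compile(r"\d{11}(?:[ ,]\d{11})*")

def count_phone_numbers(text):
    """Number of phone numbers, or None if the input breaks the format."""
    text = text.strip()
    if not VALID_LIST.fullmatch(text):
        return None
    return len(re.findall(r"\d{11}", text))
```

Validating first and counting second keeps the error handling simple: any input that is not exactly "numbers and single separators" is rejected as a whole.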

Express using a patternless dynamic base URL to render different pages

I am interested in having a route that responds to a request with a file, e.g. with Express's res.sendFile(), based on the URL's base parameter, i.e. http://ift.tt/1P4lZhD. The problem is that the URLs are completely user-generated and dynamic. Similar to GitHub, http://ift.tt/1M3etU0 could render a user's profile or http://ift.tt/1P4lZxT could render a project, but they are both strings without a pattern, and the machine has no way of knowing that http://ift.tt/1M3etU0 refers to a user view unless it does some kind of check.

app.all('/*', function(req, res) {
    res.sendfile('index.html', { root: config.server.distFolder });
});

GitHub responds to server requests with different views based on the parameter, even though there is no predefined pattern.

i.e. it would be easy to know that http://ift.tt/1M3euYh is a user route, and the server can respond with a user view (the pattern to match would be http://ift.tt/1P4lZxY). But when the string is completely dynamic it becomes more difficult. (How does Express know whether it should respond with a user or a project view for a URL like example.com/cococola?)

I believe you would somehow be able to check the URL parameter, understand that it refers to (in this case) either a project or a user's page, and then render that view. How do you do this without making a synchronous call and forcing the user to wait for the server to check what view-type the parameter string refers to before responding?

I'm using Angular. Are there other ways to respond to server requests with different pages based on URLs that have no predeterminable matching pattern? The reason is that I would like to separate my site into several different apps: http://ift.tt/1gZKM9B might require a user's profile SPA, whereas http://ift.tt/1P4lZy2 might require a user's project SPA, but as these are user-defined, there is no way to respond based on a matching pattern. I would like to keep the URL as minimal as possible :-)

Any help is appreciated. Thanks :-)

Split UNIX String with DOT delimiter and store in variable

I have a file abc.123.234.345 and I want to store the value 123.234.345 in variable1 using a UNIX one-liner. Then I have to rename a file named def to def.123.234.345. I tried sed and awk but was not able to produce the split string.

I have to search for abc.*.*.* because the string is different every time.

Any suggestions?

Here is a sample I tried, which does not work:

FILE=abc.*.*.* | mv /loc/def /loc/def.${FILE#*.}

How Can I Use a Regex on the Reverse of a string?

I want to use a regex on the reverse of a string.

I can do the following but all my sub_matches are reversed:

string foo("lorem ipsum");
match_results<string::reverse_iterator> sm;

if (regex_match(foo.rbegin(), foo.rend(), sm, regex("(\\w+)\\s+(\\w+)"))) {
    cout << sm[1] << ' ' << sm[2] << endl;
}
else {
    cout << "bad\n";
}

What I want is to get out:

ipsum lorem

Is there any provision for this beyond reversing the strings after they're matched like this:

string first(sm[1]);
string second(sm[2]);

reverse(first.begin(), first.end());
reverse(second.begin(), second.end());

cout << first << ' ' << second << endl;
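One option is to hide the reverse-then-unreverse steps in a helper so the rest of the code only ever sees forward-order groups. A Python sketch of that idea (reverse the input, match, reverse each captured group back); this is a workaround, not a feature of C++ `<regex>`:

```python
import re

def match_reversed(text, pattern):
    """Full-match `pattern` against text[::-1]; return groups in forward character order."""
    m = re.fullmatch(pattern, text[::-1])
    if m is None:
        return None
    # Optional groups that did not participate stay None.
    return [g[::-1] if g is not None else None for g in m.groups()]
```

`re.fullmatch` mirrors `std::regex_match`, which also requires the whole range to match.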

Google Analytics: View Filter by Request URI - does this work?

I think this should be pretty straightforward, but if I have URLs with request URIs such as:

/en/my-hometown/92-winston
/en/my-hometown/92-winston/backyard

and I want to set up a GA view which only includes this page and any subpages, then does the following Filter work in Analytics?

Will that basically filter against any URI that contains 92-winston, or do I need to wrap it in a fancy regex? I'm not that great with regex.

Thanks! Apologies in advance if this is ridiculously easy.
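If the filter pattern is treated as a regular expression (as it is for custom Include filters, assumed here), a plain `92-winston` also matches URIs such as `/92-winstonville`. Requiring a `/`, a `?` or the end of the URI right after the name avoids that. A Python sketch for checking a candidate pattern against sample URIs; the extra URIs in the check are made up for illustration:

```python
import re

# "/92-winston", then a subpath, a query string, or the end of the URI.
WINSTON = re.compile(r"/92-winston(?:[/?]|$)")

def in_view(request_uri):
    return WINSTON.search(request_uri) is not None
```

The pattern `/92-winston([/?]|$)` is what would go into the filter field under that assumption.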

Getting "error: invalid regular expression"

So I have this code where I'd like to replace every single backslash with two backslashes, for example \ ---> \\. I tried to do this with the following code:

string = string.replace(new RegExp("\\", "g"), "\\\\");

But apparently this doesn't work because I get the following error:

Uncaught SyntaxError: Invalid regular expression: //: \ at end of pattern

Any idea why?
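In a JavaScript string literal `"\\"` is a single backslash, so `new RegExp` receives the pattern `\`, an escape with nothing after it. The string needs four backslashes (`new RegExp("\\\\", "g")`), or the regex literal can be used directly: `string.replace(/\\/g, "\\\\")`. Python rejects a lone-backslash pattern for the same reason; this sketch shows the failure and a working replacement:

```python
import re

def lone_backslash_pattern_fails():
    # "\\" in Python source is ONE backslash; as a pattern it is an unfinished escape.
    try:
        re.compile("\\")
    except re.error:
        return True
    return False

def double_backslashes(text):
    # Raw strings: pattern \\ matches one backslash; replacement \\\\ writes two.
    return re.sub(r"\\", r"\\\\", text)
```

For a plain character substitution like this, `text.replace("\\", "\\\\")` does the same without a regex.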

regex pattern for a reference point

Using regex in python, I want to find a regex pattern and make the pattern a reference point for the text that follows. Then when the pattern is found again, it begins another record with the second-found-pattern as a reference point for everything that follows until the next pattern and its records.

I have been studying search, findall, and match, and group(), but I can't seem to find out how to do this. I know this isn't a perfect question, but I'm a noob. Thanks for any direction.

Here is an example of desired output:

<document>
Pattern stuff stuff stuff 
      Stuff thing 
Stuff 

Stuff stuff stuff stuff thing 

Pattern stuff stuff stuff stuff stuff stuff 
Pattern stuff stuff thing  stuff 

Pattern stuf thing 

<\document>

Desired output:

Pattern[0] stuff ...
Pattern[1] stuff ...
Pattern[2] stuff...

Another desired output:

Pattern[0] thing ...
Pattern[1] thing ...
Pattern[2] thing...
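A Python sketch, assuming "Pattern" stands for a literal marker word: collect the start offset of every marker with `re.finditer`, then slice the text from one marker to the next. `records[0]`, `records[1]`, ... then correspond to `Pattern[0]`, `Pattern[1]`, ...; any text before the first marker is ignored:

```python
import re

def split_records(text, marker="Pattern"):
    """Split text into records that each begin at an occurrence of `marker`."""
    starts = [m.start() for m in re.finditer(r"\b" + re.escape(marker) + r"\b", text)]
    ends = starts[1:] + [len(text)]
    return [text[start:end].strip() for start, end in zip(starts, ends)]
```

Each record can then be searched on its own (for example for `thing`) without matches leaking into the next record.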

Regex using notepad++ to delete and replace text

Hey, so I'm trying to use regex to replace and delete some text in existing code.

Here is sample of my text

01;TEST;Delete;Delete;Delete;keep||

So far I was able to replace the number and TEST with new text, but I would like to delete the three fields after them and keep the fourth. So far I got ^\d+\; to work. Any help will be appreciated, thanks :)
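Assuming every line has the shape `number;word;field;field;field;rest`, one approach is to capture the part to keep and consume the next three `;`-terminated fields. In Notepad++ that would be a search for `^(\d+;[^;\n]*;)(?:[^;\n]*;){3}` with replacement `\1`; the same substitution in Python:

```python
import re

# Keep "number;first-field;", drop the next three ";"-terminated fields on the same line.
DROP_THREE = re.compile(r"^(\d+;[^;\n]*;)(?:[^;\n]*;){3}", re.MULTILINE)

def drop_fields(text):
    return DROP_THREE.sub(r"\1", text)
```

Excluding `\n` from the field class stops a short line from borrowing fields from the next one.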

How to modify C++ code from user-input

I am currently writing a program that sits on top of a C++ interpreter. The user inputs C++ commands at runtime, which are then passed into the interpreter. For certain patterns, I want to replace the command given with a modified form, so that I can provide additional functionality.

I want to replace anything of the form

A->Draw(B1, B2)

with

MyFunc(A, B1, B2).

My first thought was regular expressions, but that would be rather error-prone, as any of A, B1, or B2 could be arbitrary C++ expressions. As these expressions could themselves contain quoted strings or parentheses, it would be quite difficult to match all cases with a regular expression. In addition, there may be multiple, nested forms of this expression.

My next thought was to call clang as a subprocess, use "-dump-ast" to get the abstract syntax tree, modify that, then rebuild it into a command to be passed to the C++ interpreter. However, this would require keeping track of any environment changes, such as include files and forward declarations, in order to give clang enough information to parse the expression. As the interpreter does not expose this information, this seems infeasible as well.

The third thought was to use the C++ interpreter's own internal parsing to convert to an abstract syntax tree, then build from there. However, this interpreter does not expose the ast in any way that I was able to find.

Are there any suggestions as to how to proceed, either along one of the stated routes, or along a different route entirely?

parsing website with domdocument

I wrote this script to extract external links (hrefs without http) from a website. For parsing I use DOMDocument, because it is recommended over regex, and I don't know whether the script is well written. Here it is:

<?php

// It may take a while to spider a website ...
set_time_limit(10000);

// Include the phpcrawl main class
include_once('../PHPCrawl_083/PHPCrawl_083/libs/PHPCrawler.class.php');
//include ('2.php');

// Extend the class and override the handleDocumentInfo() method
class MyCrawler extends PHPCrawler
{
    function handleDocumentInfo(PHPCrawlerDocumentInfo $DocInfo) {

        if (PHP_SAPI == "cli") $lb = "\n";
        else {
            $lb = "<br />";

            $home_url = parse_url($DocInfo->url, PHP_URL_HOST);

            $dom = new DOMDocument();
            $dom->loadHTML($DocInfo->url);

            // all links in document
            $dom->strictErrorChecking = FALSE;

            // Get all the links
            $links = $dom->getElementsByTagName("a");
            foreach ($links as $link) {
                $href = $link->getAttribute("href");

                if (strpos($home_url['host'], $href) == -1) {
                    echo $link;
                }
            }
        }
    }
}

$crawler = new MyCrawler();
$crawler->setURL("http://tunisie-web.org");

$crawler->addURLFilterRule("#\.(jpg|gif|png|pdf|jpeg|css|js)$#i");
$crawler->setWorkingDirectory("C:/Users/mayss/Documents/travailcrawl/");
$crawler->go();

?>
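A few things in the script look suspicious: `loadHTML()` is given the URL string rather than the page source, `parse_url(..., PHP_URL_HOST)` already returns a string so `$home_url['host']` does not index what you expect, `strpos()` returns `false` (not `-1`) when there is no match, and all the link handling sits inside the `else` branch, so nothing runs from the command line. The intended logic is sketched below in Python with the standard-library HTML parser, assuming "external" means "resolves to a different host than the crawled page" (the names `LinkCollector` and `external_links` are illustrative):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def external_links(html, page_url):
    """Absolute http(s) links whose host differs from the page's host."""
    parser = LinkCollector()
    parser.feed(html)
    home_host = urlparse(page_url).netloc.lower()
    result = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolves relative links like "/about"
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() != home_host:
            result.append(absolute)
    return result
```

Under this rule `www.tunisie-web.org` and `tunisie-web.org` count as different hosts; normalising a leading `www.` would be a separate decision.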

Replace all occurrences of PHP function name with SED

I have a PHP file with multiple calls to the function my_function() and I want to replace them with blah_blah(), i.e. just rename the function everywhere. The file also contains another_my_function(), which I don't want to be replaced.

Here I've created the regular expression:

[^\w\*]my_function\(

This means that the prefix should be some special character, followed by the function name and the opening round bracket. I run the replacement over a folder, so here we go:

find my_folder_name -type f -exec sed -i 's/[^\w\*]my_function(/blah_blah(/g' {} \;

The round bracket is not a special character for sed, so we don't need to escape it. The problem, as I'm sure you noticed, is that it also replaces the prefix character matched by [^\w\*] along with the function name.

How can I replace only the name of function?
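The usual fix is to stop consuming the prefix: either capture it and write it back (`sed 's/\([^[:alnum:]_*]\)my_function(/\1blah_blah(/g'`), or use a zero-width check. Note also that in a POSIX bracket expression a backslash is an ordinary character, so `[^\w\*]` excludes `\`, `w` and `*` rather than word characters. A Python sketch of the zero-width version with a negative lookbehind, which also handles a call at the very start of a line, where the original pattern needs a preceding character:

```python
import re

# "my_function(" not preceded by a word character or "*",
# so another_my_function( is left alone.
CALL = re.compile(r"(?<![\w*])my_function\(")

def rename_calls(source, new_name="blah_blah"):
    return CALL.sub(new_name + "(", source)
```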

How do I select one exact substring but not another from a string in mysql?

I have the following string stored in a field in mysql:

a:11:{i:0;s:6:"BFA.AD";i:1;s:11:"BFA.AD.COPY";i:2;s:6:"BFA.AE";i:3;s:6:"BFA.TC";i:4;s:6:"BFA.FA";i:5;s:6:"BFA.GC";i:6;s:6:"BFA.IL";i:7;s:6:"BFA.IN";i:8;s:6:"BFA.PH";i:9;s:6:"BFA.PR";i:10;s:6:"BFA.TR";}

I am trying to write a case when statement that matches BFA.AD but not BFA.AD.COPY.

I have tried:

case when (select m.meta_value from `ccsadm_frm_item_metas` m where m.item_id = i.id and m.field_id = 1308) like '%BFA.AD%' then 1 else 0 end as 'BFA.AD',

which marks both BFA.AD and BFA.AD.COPY as 1. I have tried using the regular expression ["]\bBFA.AD\b["], but that does not seem to work in MySQL.
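Because the value is a PHP-serialized array, every element is wrapped in double quotes, so including the quotes in the search separates the two: `LIKE '%"BFA.AD"%'` cannot match `"BFA.AD.COPY"`. (MySQL versions before 8.0 use `[[:<:]]` and `[[:>:]]` rather than `\b` for word boundaries, which may be why the regex attempt failed; even so, a word boundary alone would still match inside `BFA.AD.COPY`.) The same check against the stored string, sketched in Python:

```python
def has_code(meta_value, code):
    """True if `code` is a complete quoted element of the serialized array."""
    # Mirrors LIKE '%"BFA.AD"%': the surrounding quotes rule out BFA.AD.COPY.
    return '"' + code + '"' in meta_value
```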

Can I get the length of the next line after match with sed?

I have a file of sequences, formatted as a line of information followed by the sequence, e.g.:

someinformation length=50
JJJIJJJJJJJJJIJGIJJJJJJIJJIJJJJJIJJJJHHHHHFFFFFCCC
someotherinformation length=50
GEFE?BEDHCBBACEBHAFEBFEBFHFFDDDFD@@@
[...]

I want to replace length=50 (which could be a different number) with the actual length of the next line (without the newline character). So something like this:

sed -i "s/length=[0-9]+/length=length_next_line/" infile

Is it possible in sed to get the length of the next line?
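sed has no built-in string length or arithmetic, so awk (`length($0)`) is the more common tool for this. The logic is sketched here in Python, assuming strict two-line records (header, then sequence) as in the example. As an aside, in POSIX basic regular expressions `+` is an ordinary character, so `[0-9]+` in the `sed` command would need `-E` (or GNU's `\+`).

```python
import re

def fix_lengths(text):
    """Rewrite length=N on each header line with the length of the following line."""
    lines = text.splitlines()
    for i in range(0, len(lines) - 1, 2):   # headers at even indices
        seq_len = len(lines[i + 1])         # splitlines() already removed "\n"
        lines[i] = re.sub(r"length=\d+", "length=%d" % seq_len, lines[i])
    return "\n".join(lines) + "\n"
```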

Trouble matching a regex part of an XML with java

Hello, I have a problem using regex in Java.

I'm trying to parse this :

*whatever string*
<AttributeDesignator AttributeId="MyIDToParse"
DataType="http://ift.tt/QHQMAU"
Category="theCategoryIWantToParse"
MustBePresent="false"
/>
*whatever string that may contain the same regular expression*

using this code (Pattern + Matcher)

Pattern regex = Pattern.compile("AttributeDesignator +AttributeId=\"(.+)\" +.*Category=\"(.+)", Pattern.DOTALL);
Matcher matcher = regex.matcher(xml);
while (matcher.find()) {
    String ID = matcher.group(1);
    String Category = matcher.group(2);
}

The output is the following :

group 1 :

MyIDToParse"
    DataType="http://ift.tt/QHQMAU"
    Category="theCategoryIWantToParse"
    MustBePresent="false"
    />
    *whatever string that may contain the same regular expression*

group2 :

theCategoryIWantToParse"
    MustBePresent="false"
    />
    *whatever string that may contain the same regular expression*

I feel like it's something simple, but I can't find what I'm doing wrong. When I test the regex on a website, it works correctly and highlights the right expression in my XML input.
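With `DOTALL`, `.+` can run across line breaks and quote characters, and because it is greedy it grabs as much as possible before backtracking, so each group ends up extending far past the closing quote. Restricting the captured text to `[^"]+` keeps each group inside one attribute value, and `[^>]*?` keeps the search for `Category` inside the same tag (it still spans newlines, so `DOTALL` is no longer needed). The same pattern sketched in Python; in Java the quotes would be written as `\"` inside the string literal:

```python
import re

DESIGNATOR = re.compile(
    r'AttributeDesignator\s+AttributeId="([^"]+)"'  # the id, stopping at its closing quote
    r'[^>]*?\sCategory="([^"]+)"'                   # Category within the same tag
)

def designators(xml):
    """(AttributeId, Category) pairs, one per AttributeDesignator element."""
    return DESIGNATOR.findall(xml)
```

An XML parser would be more robust if attribute order or quoting can vary.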

Angularjs filter with regex

I have the following regex to match a URL:

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

I need to apply this to a filter so it only matches and grabs the URL from a value:

I want to add it to this part in particular:

return arr[i].text;

Filter

app.filter('getFirstCommentFrom', function() {
  return function(arr, user) {
    for (var i = 0; i < arr.length; i++) {
      if (arr[i].from.username == user)
        return arr[i].text;
    }
    return '';
  };
});



{{p.comments.data|getFirstCommentFrom:'machinas_test'}}

Not sure how to add this.
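The `^` and `$` anchors make the pattern validate a whole string; to pull a URL out of a longer comment, the anchors have to go and a search is used instead (in the filter, something like `var m = arr[i].text.match(re); return m ? m[0] : '';`). The nested `([\/\w \.-]*)*` can also backtrack badly on long non-matching input. A Python sketch of the filter logic with a deliberately simpler, unanchored pattern (an assumption, not a full URL grammar):

```python
import re

# "http(s)://" or "www.", then everything up to the next whitespace.
URL = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

def first_url(text):
    match = URL.search(text)
    return match.group(0) if match else ""

def first_comment_url(comments, user):
    """Like getFirstCommentFrom, but returns only the URL in the comment text."""
    for comment in comments:
        if comment["from"]["username"] == user:
            return first_url(comment["text"])
    return ""
```

Trailing punctuation right after a URL (`example.com.`) is included by `\S+`; trimming it would need an extra rule.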

How to have multiple regex for same element

I have a text box which has a regular expression which is something like below

^AB[a-zA-Z0-9]{20}$

which allows the characters AB followed by exactly 20 letters or digits. For example, say the validation error for not following this regex is Some Test Error.

I have a scenario where the user enters AB1234 and tabs out of the text box, and the error Some Test Error shows. However, I have a requirement not to show the same message Some Test Error if the user is trying to follow the format but has not satisfied the entire regex.

Scenario 1: User enters CD12345675438976524381. I need to show Some Test Error.

Scenario 2: User enters AB12345. I need to show Different Test Error, because the user started the value with AB.

How can I achieve this? Is there a way of specifying multiple regexes?
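One approach is two patterns checked in order: the full pattern decides validity, and a looser prefix pattern decides which message to show. A Python sketch using the placeholder messages from the question:

```python
import re

FULL = re.compile(r"AB[a-zA-Z0-9]{20}")
PREFIX = re.compile(r"AB")  # the user has at least started with the right prefix

def validation_message(value):
    """None if valid, otherwise the message to display."""
    if FULL.fullmatch(value):
        return None
    if PREFIX.match(value):
        return "Different Test Error"
    return "Some Test Error"
```

Most validation frameworks let a field carry several rules with separate messages, so the two patterns can usually be registered as two rules rather than merged into one regex.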

Ignore file with similar name using regex

I have a problem using regex. I have a folder that contains 3 files:

 Test.log 
 Test.log.1
 Test.log.2

I want to catch only the Test.log file. I tried a lot of combinations, like:

Test.log
^Test.log
Test.log[^0-3]

Etc... What can I do?
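Two things are going on: `.` matches any character, so it should be escaped, and without an end anchor `Test.log` also matches the beginning of `Test.log.1`. `Test.log[^0-3]` has the opposite problem: it requires one more character after `log`, so it matches the numbered files and not `Test.log`. Anchoring both ends (`^Test\.log$`) or using a full-match call solves it; in Python:

```python
import re

LOG_FILE = re.compile(r"Test\.log")

def current_log_only(filenames):
    # fullmatch requires the whole name to match, like ^Test\.log$
    return [name for name in filenames if LOG_FILE.fullmatch(name)]
```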

What does this mean in Perl? $testModule =~ s@/@::@ig;

I am new to Perl and I'm reading some Perl code. I found the line below, which I couldn't understand. Can anyone tell me what " s@/@::@ig" means? I know '=~' applies a regular expression. Usually I see code like 's///gi', so I was a little confused by the following code. Can anyone elaborate? Thanks.

$testModule =~ s@/@::@ig;
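`s@/@::@ig` is the ordinary `s///` substitution with `@` used as the delimiter instead of `/`, which avoids having to escape the slash being searched for. It replaces every `/` in `$testModule` with `::` (the `g` flag); the `i` flag has no effect, because a slash has no case. A typical use is turning a path such as `Foo/Bar/Baz` into a module name `Foo::Bar::Baz`. The Python equivalent:

```python
import re

def path_to_module(path):
    # Same effect as Perl's  $path =~ s@/@::@g
    return re.sub("/", "::", path)
```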

How can I match a string with parenthesis in JavaScript?

I want to match a string with its parentheses using Regex.

This is the string I'm trying to match: (CA)

This is my Regex: (/$([A-Z]{2})$/)

I've tried using new RegExp("\$([A-Z]{2})\$") as well.

No matter what I try I always end up with the Unrecognized expression: (CA) error message in my console.

What am I doing wrong?
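In a regex `$` is an end-of-input anchor, not a parenthesis, and literal parentheses have to be escaped because unescaped ones create a group: `/^\(([A-Z]{2})\)$/`. The "Unrecognized expression" wording looks like jQuery's selector error rather than a `RegExp` error, which would suggest the string is being passed to `$()` somewhere. The corrected pattern, sketched in Python:

```python
import re

STATE = re.compile(r"\(([A-Z]{2})\)")  # \( and \) are literal parentheses

def state_code(text):
    match = STATE.fullmatch(text)
    return match.group(1) if match else None
```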

Matching regexes

I have this code:

library(stringr)

full_patterns <- c("I2-EX-I3-EX-I2-IEX-I3-I2-EX-I2-I2-II3-I2-III2-I2-I3-INR-FA-NR-I3-INR-IEX-QU-I3-NR-FA-EX-QU-NR-I2-I2-I2-NR-TR-II2-I3-NR-IIEX-NR-NR-INR-NR-I3-I2-NR-IQU-QU-ITR-QU-NR-NR-QU-TR-NR-ITR-IFA-II2-QU-TR-FA-EX-QU-QU-QU-NR-QU-ITR-FA-QU-FA-FA-TR-FA-QU-EX-QU-IQU-QU-FA-FA-QU-QU-FA-FA-I3-NR-FA-II2-FA-QU-FA-I2-FA-NR-INR-TR-NR-EX-NR-NR-EX-TR-I3-INR-NR-FA-ITR-EX-NR-NR-IINR-INR-EX-EX-EX-NR-NR-NR-FA-FA", "FA-I2-I2-I2-EX-I2-I3-FA-II2-TR-II2-FA-I3-IFA-FA-NR-I3-I2-TR-II2-II2-FA-I2-II3-FA-QU-II2-I2-I2-NR-I2-I2-NR-II2-INR-I3-QU-I2-I3-QU-NR-I2-INR-QU-QU-I2-IEX", "FA-FA-ITR-IIFA,TR-FA-I2-I2-FA-EX-IFA,IEX,I2-I2-INR-I2-I3-I1,TR-NR-I2-I3-EX-IQU-TR-I3-NR-EX-I3-EX,I2-EX-IIIII2-II3-I2-EX,FA-IEX-EX-TR-EX-TR-I3-INR-I2-FA-FA-TR-I2-IIIIIFA-I2-FA-TR-III3-NR-FA-III3-TR-I2-I2,I2-I2-EX,TR-TR-I2-FA-I2-I3-IIIFA-ITR-FA-IFA-INR-NR-II2-I3-I2-FA-II2-EX-FA,I3-I3-TR-I3-FA-NR-II2-II3-TR-TR-EX,I3-TR-NR-TR-QU-EX-NR-TR-I2-EX-III3-INR-INR-IFA,TR-I3-I2-I3-NR-NR-I1,IIFA-FA-IFA-FA-NR-II3-NR-I2-FA-FA-IFA-NR-FA,IFA-FA-NR-NR-I2-NR-IIIFA-EX,II2-II2-I2-QU-TR-FA-QU-I3-EX-ITR-IFA-FA-NR-INR-FA-FA-EX-II2-NR-I3,I3-FA-I2-I2-FA-I2-FA-I2,I2-INR-I2-NR-II3-TR-FA-I2-I3,I3-NR-EX-TR-IEX,II2-FA-I2-INR-I2-I3-IIEX-FA,IEX-EX-EX-EX-EX-EX-EX-TR-TR-I2-NR-NR-EX-NR-I3-FA-NR-NR-NR-EX-NR-II2-IIFA-FA-ITR-NR-I2-I3-I2-NR-FA-NR-I1")

literal_strings <- c("FA-QU-II2-I2-I2-NR-I2-I2-NR-II2-INR-", "QU-I2-", "QU-NR-I2-INR-QU-QU-I2-IEX-", "FA-", "QU-EX-NR-", "NR-EX-", "NR-EX-TR-", "QU-")

regex_list <- list()
for (i in 1:length(literal_strings)){
  regex_list[i] <- paste0("(?<=", literal_strings[i], "?)(?:I\\d-?)*I3(?:-?I\\d)*")
}

IVs_identified <- list()
DVs_identified <- list()

for (i in 1:length(regex_list)){
  DVs_identified[[i]] <- lapply(full_patterns, str_extract_all, regex_list[[i]])
  IVs_identified[[i]] <- lapply(full_patterns, str_extract_all, literal_strings[[i]])
}

data.frame(unlist(DVs_identified), unlist(IVs_identified))

length(unlist(DVs_identified))
length(unlist(IVs_identified))

The point of the code is to generate a data.frame with two columns. The first column should contain the first part of the regex match (contained in literal_strings). The second column should have the second part of the regex match (i.e. (?:I\\d-?)*I3(?:-?I\\d)*, but only if it is followed by the appropriate literal string).

As you can see, the code doesn't work because the two lists that are generated are of different lengths - they need to be of the exact same length. How can I accomplish this?
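The lengths drift apart because the two lists come from separate searches: a literal can occur where no `I` run follows it, and vice versa. Extracting both parts from the same match keeps them aligned. In R that would be one pattern with two capture groups passed to `stringr::str_match_all()`; the idea is sketched in Python below, assuming the literal must sit directly before the `(?:I\d-?)*I3(?:-?I\d)*` part (the lookbehind in the question also makes the literal's last character optional, which is left out here):

```python
import re

IV_PART = r"((?:I\d-?)*I3(?:-?I\d)*)"

def pair_matches(full_patterns, literal_strings):
    """(literal, I-run) tuples; each row comes from ONE match, so the columns line up."""
    rows = []
    for literal in literal_strings:
        pattern = re.compile("(" + re.escape(literal) + ")" + IV_PART)
        for text in full_patterns:
            rows.extend(pattern.findall(text))  # two groups -> list of 2-tuples
    return rows
```

The rows can then become a two-column table directly, with no need to reconcile list lengths.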

Trying to find regex performance tweaks for large pattern against binary matches

This is a snippet of a very large regex I am working on - for lack of a better description it is basically a tree of branches from the starting characters which I found to increase performance (radically: million passes 3.6s cut to 0.875s).

For some reason using ?: to skip back-references did not accelerate anything but I discovered ?> sliced 200ms off a million passes without errors.

But I am at a dead-end for performance tweaks - I still see patterns in there but I do not understand enough about advanced regex to re-order it further.

Are there any obvious tweaks that I am missing?

      /\x87(?>                   
              \xa6\xF0\x9F\x87[\xa8-\xac\xae\xb1\xb2\xb4\xb7\xb9\xba\xbc\xbf]
            | \xa7\xF0\x9F\x87[\xa6\xa7\xa9-\xaf\xb2-\xb4\xb7-\xb9\xbc\xbe\xbf]
            | \xa8\xF0\x9F\x87[\xa6\xa9\xab-\xae\xb1-\xb4\xb7\xba\xbb\xbe\xbf]
            | \xa9\xF0\x9F\x87[\xaa\xaf\xb0\xb2\xb4\xbf]
            | \xaa\xF0\x9F\x87[\xa8\xaa\xac\xad\xb7-\xb9]
            | \xab\xF0\x9F\x87[\xae-\xb0\xb2\xb4\xb7]
            | \xac\xF0\x9F\x87[\xa6\xa7\xa9\xaa\xad\xae\xb1-\xb3\xb6\xb7\xb9\xba\xbc\xbe]
            | \xad\xF0\x9F\x87[\xb0\xb3\xb7\xb9\xba]
            | \xae\xF0\x9F\x87[\xa9\xaa\xb1\xb3\xb6-\xb9]
            | \xaf\xF0\x9F\x87[\xaa\xb2\xb4\xb5]
            | \xb0\xF0\x9F\x87[\xaa\xac-\xae\xb2\xb3\xb5\xb7\xbc\xbe\xbf]
            | \xb1\xF0\x9F\x87[\xa6-\xa8\xae\xb0\xb7\xb8\xb9\xba\xbb\xbe]
            | \xb2\xF0\x9F\x87[\xa6\xa8-\xaa\xac\xad\xb0-\xb4\xb7\xb8\xb9\xba-\xbf]
            | \xb3\xF0\x9F\x87[\xa6\xa8\xaa\xac\xae\xb1\xb4\xb5\xb7\xba\xbf]
            | \xb4\xF0\x9F\x87[\xb2]
            | \xb5\xF0\x9F\x87[\xa6\xaa-\xad\xb0\xb1\xb7-\xb9\xbc\xbe]
            | \xb6\xF0\x9F\x87[\xa6]
            | \xb7\xF0\x9F\x87[\xb4\xb8\xba\xbc]
            | \xb8\xF0\x9F\x87[\xa6-\xaa\xac-\xae\xb0-\xb4\xb7\xb9\xbb\xbe\xbf]
            | \xb9\xF0\x9F\x87[\xa9\xac\xad\xaf\xb1-\xb4\xb7\xb9\xbb\xbc\xbf]
            | \xba\xF0\x9F\x87[\xa6\xac\xb8\xbe\xbf]
            | \xbb\xF0\x9F\x87[\xa6\xa8\xaa\xae\xb3\xba]
            | \xbc\xF0\x9F\x87[\xab\xb8]
            | \xbd\xF0\x9F\x87[\xb0]
            | \xbe\xF0\x9F\x87[\xaa]
            | \xbf\xF0\x9F\x87[\xa6\xb2\xbc]
            | [\xa6-\xbf]   
      )/xS

Validate ID value by Regular expression or xml.etree.ElementTree

Problem Statement:

I have to validate the ID value of every element in the HTML content. The rule for ID values is:

ID value must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

Code by Regular expression

>>> content
'<div id="11"> <p id="34"></p><div id="div2"> </div></div>'
>>> all_ids = re.findall("id=\"([^\"]*)\"", content)
>>> id_validation = re.compile("[A-Za-z][\-A-Za-z0-9_:\.]*$")
>>> invalid_ids = [i for i in all_ids if not bool(id_validation.match(i))]
>>> invalid_ids
['11', '34']

Code by xml.etree.ElementTree parser:

>>> import xml.etree.ElementTree as PARSER
>>> root = PARSER.fromstring(content)
>>> all_ids = [i.attrib["id"] for i in root.iter() if "id" in i.attrib]
>>> all_ids
['11', '34', 'div2']
>>> id_validation = re.compile("[A-Za-z][\-A-Za-z0-9_:\.]*$")
>>> [i for i in all_ids if not bool(id_validation.match(i))]
['11', '34']
>>>

It is also a one-liner with lxml, but the existing code does not use the lxml library for other reasons.

>>> from lxml import etree
>>> root = etree.fromstring(content)
>>> root.xpath("//*/@id")
['11', '34', 'div2']

The input content contains 100,000 tags, so which of the above approaches performs best?
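The reliable answer is to time both on representative input. A rough Python harness under that assumption: it builds a synthetic document with a configurable number of tags (the generator `make_document` is hypothetical), checks that both approaches agree, and times them with `timeit`. Two caveats: a bare `id="` search also matches attributes such as `data-id="..."`, hence the lookbehind below, and ElementTree needs well-formed XML, which real-world HTML often is not.

```python
import re
import timeit
import xml.etree.ElementTree as ET

ID_ATTR = re.compile(r'(?<![\w-])id="([^"]*)"')  # not part of data-id etc.
VALID_ID = re.compile(r"[A-Za-z][-A-Za-z0-9_:.]*")

def make_document(n_tags):
    """Hypothetical test input: alternating invalid (numeric) and valid ids."""
    items = "".join('<p id="%d"></p><p id="p%d"></p>' % (i, i) for i in range(n_tags // 2))
    return "<div>" + items + "</div>"

def invalid_ids_regex(content):
    return [i for i in ID_ATTR.findall(content) if not VALID_ID.fullmatch(i)]

def invalid_ids_etree(content):
    root = ET.fromstring(content)
    ids = [el.attrib["id"] for el in root.iter() if "id" in el.attrib]
    return [i for i in ids if not VALID_ID.fullmatch(i)]

def compare(n_tags=100000, repeat=3):
    """Return (regex_seconds, etree_seconds) for `repeat` runs each."""
    content = make_document(n_tags)
    if invalid_ids_regex(content) != invalid_ids_etree(content):
        raise ValueError("approaches disagree on this input")
    t_regex = timeit.timeit(lambda: invalid_ids_regex(content), number=repeat)
    t_etree = timeit.timeit(lambda: invalid_ids_etree(content), number=repeat)
    return t_regex, t_etree
```

The regex scan skips building a tree, so it tends to be cheaper, but that is worth confirming with `compare()` on the real data.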

Best way to get URL without subdomains in JavaScript

I have some strings in JavaScript:

Example: http://ift.tt/1dTOlMV

I'm looking for a regex (or similar) expression that can remove the subdomains from the string that leaves: http://domain.com
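A sketch using URL parsing rather than a regex, assuming the domain to keep is always the last two labels of the host. That assumption breaks for suffixes such as `co.uk`, which would need the Public Suffix List. In JavaScript the equivalent starting point would be `new URL(str).hostname`.

```python
from urllib.parse import urlsplit

def strip_subdomains(url):
    """scheme://<last two host labels>; path, query and port are dropped."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    base = ".".join(host.split(".")[-2:])
    return "%s://%s" % (parts.scheme, base)
```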

Complex regex to capture digits and words containing comma and spaces

I have an input: 'BOOM', 'GRE,M', 'H SE', 1,2

Now, I want to split the string on the conditions:

If a word in the string is surrounded by single quotes, remove the quotes, e.g. 'BOOM' ==> BOOM, 'GRE,M' ==> GRE,M, 'H SE' ==> H SE. As the 'H SE' example shows, if the word contains a space, only the single quotes are removed and the space is preserved.
If it is a word without single quotes, do nothing, e.g. 1 ==> 1, 2 ==> 2

I have the regex '(.*)', but it only handles words within single quotes. With it I can easily get BOOM, GRE,M, H SE, but NOT 1 and 2. What regex will do this?
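An alternation can handle both cases in one pass: the first branch takes a quoted item and captures what is inside the quotes, and the second takes an unquoted run of characters that are not commas, quotes or whitespace. Whichever group matched is the value. A Python sketch (the pattern itself is not Python-specific):

```python
import re

ITEM = re.compile(r"'([^']*)'|([^',\s]+)")

def split_items(text):
    # Each match fills exactly one of the two groups; the other is "".
    return [quoted or bare for quoted, bare in ITEM.findall(text)]
```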

Regular Expression to match compound words using only the first word

I am trying to create a regular expression in JS that matches the occurrences of box and returns the full compound word.

Using the string:

the box which is contained within a box-wrap has a box-button

I would like to get:

[box, box-wrap, box-button]

Is this possible to match these words only using the string box?
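One way is to match `box` as a whole word, followed by any number of `-word` continuations. In JavaScript that would be `str.match(/\bbox\b(?:-\w+)*/g)`; the same pattern in Python:

```python
import re

def compound_words(text, head="box"):
    # \b...\b keeps "boxes" out; (?:-\w+)* appends "-wrap", "-button", ...
    pattern = r"\b" + re.escape(head) + r"\b(?:-\w+)*"
    return re.findall(pattern, text)
```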

Parse line of text after multiline regex pattern

I am attempting to parse fields from a pdf file converted to txt via pdfbox. Here is an example of a field I need to extract, "BUYER NAME AND ADDRESS:". These documents often contain translations, and the ":" colon appears a variable number of characters after BUYER NAME AND ADDRESS. Example below.

Txt file..
BUYER NAME AND ADDRESS / NOMBRE Y
DIRECCIÓN DEL COMPRADOR:
Name of buyer here
Txt continues..

Here is my attempted pattern / scanning code.

Scanner sc = new Scanner(txtFile);
Pattern p = Pattern.compile("BUYER NAME AND ADDRESS:.*", Pattern.MULTILINE);
sc.findWithinHorizon(p, 0);
String buyer = sc.nextLine();
buyer = sc.nextLine();
System.out.println("Buyer Name: "+buyer);

This works when the text file is english only e.g. BUYER NAME AND ADDRESS: but if there are additional characters or line returns, it fails. How can I fix the pattern?
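The pattern requires the colon to follow `ADDRESS` immediately. Allowing any non-colon characters in between (`[^:]*`, which also spans line breaks) covers the translated variant, and a capture group can take the line after the colon directly. A Python sketch, assuming the header text itself never contains a colon; the same pattern string should work with Java's `Pattern.compile` and `Matcher.find()`:

```python
import re

# Header, anything up to the first colon, rest of that line, then capture the next line.
BUYER = re.compile(r"BUYER NAME AND ADDRESS[^:]*:[^\n]*\n([^\n]*)")

def buyer_name(text):
    match = BUYER.search(text)
    return match.group(1).strip() if match else None
```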

Extract part of a url using pattern matching in python

I want to extract part of a url using pattern matching in python from a list of links

Examples:

http://ift.tt/1KNZ7nE  
http://ift.tt/1g4Fr1i

This is my regex:

re.match(r'(http?|ftp)(://[a-zA-Z0-9+&/@#%?=~_|!:,.;]*)(.\b[a-z]{1,3}\b)(/about[a-zA-Z-_]*/?)', str(href), re.IGNORECASE)

I want to get only the links ending with /about or /about/, but my regex selects every link with the word "about" in it.
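Two details in the pattern: `http?` means "htt" followed by an optional "p", so it never matches `https` (`https?` was probably intended), and nothing anchors `/about` to the end of the URL. If the goal is only to keep links whose path ends in `/about` or `/about/`, checking the end is enough. A Python sketch, assuming the links are already absolute URL strings:

```python
import re

ABOUT_AT_END = re.compile(r"/about/?$", re.IGNORECASE)

def about_links(links):
    return [link for link in links if ABOUT_AT_END.search(link)]
```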

Regex for number range from 0 to 31 excluding preceding zeros

I have written a regex for numbers from 0 to 31. It must not allow leading zeros.

[0-2]\\d|/3[0-2]

But it also allows leading zeros:

01 invalid
02 invalid

Can someone tell me how to fix this?
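Splitting the range by number of digits avoids leading zeros: a single digit 0 to 9, two digits starting with 1 or 2, or 30 and 31. (The `3[0-2]` branch in the question would also accept 32, and the `/` before it looks like a stray character.) A Python sketch that requires the whole input to match, as Java's `String.matches` also does:

```python
import re

ZERO_TO_31 = re.compile(r"(?:[0-9]|[12][0-9]|3[01])")

def is_valid_number(text):
    return ZERO_TO_31.fullmatch(text) is not None
```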

RewriteRule for query string to static URL

I have a tag-based search button linking to http://ift.tt/1Dq9dby that I want to rewrite so the browser displays http://ift.tt/1Dq9bAF (but still pulls the content from the original URL).

It's a purely aesthetic change. I have tried:

RewriteEngine On
RewriteCond %{QUERY_STRING} ^tag=tanks
RewriteRule ^product/search/$ /tanks/ [L,QSA,NC]

It doesn't work for me. Any suggestions?

Difficulties with a regex pattern in Java

I am facing a problem creating a regex pattern that gets all the required tokens. The string the regex is applied to looks like this:

Value:

"DB_TABLE_LUX.field_8='bbb \' `\" dsd' and DB_TABLE_FRA.field_1 = ' bbb dsd' and DB_TABLE_FRA.fieldName = ' bbb dsd ' or DB_TABLE_GER.field_3= 125 "

Required result: I want to have a list of Strings having those values

List {

"DB_TABLE_LUX.field_8='bbb \' `\" dsd'",

"DB_TABLE_FRA.field_1 = ' bbb dsd'",

"DB_TABLE_FRA.fieldName = ' bbb dsd '",

"DB_TABLE_GER.field_3= 125" }

Following the regex used :

"DB_TABLE_[a-zA-Z]{3}\.\w+\s*\=\s*([0-9]+|(\'(\s*\w+\s*)+\'))"

The regex above does not extract all the data: the first value is missing. Below is the result:

List{

"DB_TABLE_FRA.field_1 = ' bbb dsd'",

"DB_TABLE_FRA.fieldName = ' bbb dsd '",

"DB_TABLE_GER.field_3= 125" }

I also want it to match this value:

DB_TABLE_LUX.field_8='bbb \' `\" dsd'

Thanks.
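The quoted-value branch only allows word characters and spaces between the quotes, so the first condition, which contains escaped quotes and a backtick, is skipped. A common way to match a quoted string with backslash escapes is `'(?:\\.|[^'\\])*'`: either a backslash plus any character, or any character that is neither a quote nor a backslash. A Python sketch, assuming the backslashes are literally present in the string (rather than being Java source-code escapes):

```python
import re

CONDITION = re.compile(
    r"DB_TABLE_[A-Za-z]{3}\.\w+\s*=\s*"   # TABLE.column =
    r"(?:\d+|'(?:\\.|[^'\\])*')"          # a number, or a quoted string with \-escapes
)

def extract_conditions(text):
    return [m.group(0) for m in CONDITION.finditer(text)]
```

In a Java string literal every backslash in the pattern has to be doubled.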

How to match any characters except spaces without adding unneeded keywords to the array

How can I match all characters except spaces, and speed up my script by not adding LTEXT, PUSHBUTTON, etc. to the array?

My pattern:

$pattern = '/(LTEXT|PUSHBUTTON|CAPTION|GROUPBOX|RTEXT|MENUITEM|[0-9]+|IDS_.+|STRING_.+) .*"(.+)".*\R?/';

To be more specific: the match for characters other than spaces has to happen in this part of the pattern: "(.+)"

Need a regular expression in Scala

This is my response:

   scala > val a= """{"string":"{\"data\":{\"id\":\"2c91809f4ef7678b014ef86ee28511c2\",\"unitName\":\"gatlir1\",\"owner\":\"gatlir1\",\"description\":\"gatlir1\",\"nofChairs\":0,\"nofBeds\":0,\"nofApptStartWithInHour\":0,\"nofApptDischargeWithInHour\":0,\"modifiedDateTime\":\"Aug 4, 2015 4:18:13 AM\"},\"status\":\"SUCCESS\",\"message\":\"unit_save\"}"}"""

I need to fetch the id value from that response in Scala. I have stored the response in one variable, written a regular expression for it, and stored that in another variable.

Here is the problem: I am getting an error. The regex:

     scala> val b= """{\"id\":\"(\w+)\""""

Assigning the regex to b gives no error, but matching fails with an error

when I write the expression like this:

          scala> a.matches(b)

error:

  java.util.regex.PatternSyntaxException: Illegal repetition
   {\"id\":\"(\w+)\"
   at java.util.regex.Pattern.error(Pattern.java:1955)
   at java.util.regex.Pattern.closure(Pattern.java:3157)
   at java.util.regex.Pattern.sequence(Pattern.java:2134)
   at java.util.regex.Pattern.expr(Pattern.java:1996)
   at java.util.regex.Pattern.compile(Pattern.java:1696)
   at java.util.regex.Pattern.<init>(Pattern.java:1351)
   at java.util.regex.Pattern.compile(Pattern.java:1028)
   at java.util.regex.Pattern.matches(Pattern.java:1133)
   at java.lang.String.matches(String.java:2109)

Could anyone help me with this?
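`{` starts a repetition count such as `{2,3}` in Java's regex syntax, so a pattern beginning with `{` is rejected as an illegal repetition; it needs escaping as `\{` (or can simply be dropped). Also, `String.matches` only succeeds when the whole string matches, so a search such as Scala's `Regex.findFirstMatchIn` fits better. Since the value of `"string"` is itself JSON, decoding twice avoids the regex altogether. Both approaches, sketched in Python with a shortened copy of the response:

```python
import json
import re

RESPONSE = r'''{"string":"{\"data\":{\"id\":\"2c91809f4ef7678b014ef86ee28511c2\",\"unitName\":\"gatlir1\"},\"status\":\"SUCCESS\",\"message\":\"unit_save\"}"}'''

def id_via_json(response):
    inner = json.loads(json.loads(response)["string"])  # outer JSON holds a JSON string
    return inner["data"]["id"]

# In the raw text each quote around "id" is preceded by a backslash.
ID_PATTERN = re.compile(r'\\"id\\":\\"(\w+)\\"')

def id_via_regex(response):
    match = ID_PATTERN.search(response)
    return match.group(1) if match else None
```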

Regex to match repeating groups only capturing one group

I want to match the string

6 cakes 5 donuts 12 muffins

into three groups viz. 6 cakes, 5 donuts, 12 muffins. To achieve this I've used the regex

([\d]{1}[\s]{1}[\w]*)

But the problem is that it only matches the first group, 6 cakes, and ignores the rest. How can I change it to make the group repeat?
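A single match only ever yields one `6 cakes`-style group; to get all of them, the pattern has to be applied repeatedly (`findall` in Python, the `g` flag in JavaScript, a `while (matcher.find())` loop in Java). `{1}` is redundant, and `[\d]{1}` allows only one digit, so `12 muffins` would come back as `2 muffins`. A Python sketch:

```python
import re

ITEM = re.compile(r"\d+\s\w+")

def quantity_items(text):
    return ITEM.findall(text)
```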

Regex - Repeating negative character classes?

To put it simply, I have the regular expression: /[^0-9]+/gi

It is not stored as a string, but as a JavaScript regular expression. In other words, without quotation marks. My intention is to return an array of character classes consisting of characters that are not digits.

I expect to return this array when given the code: /[^0-9]+/gi.exec("rgb(123, 124, 125);");

However I only receive an array of length: 1 with index [0] being "rgb(". Why am I not getting an array consisting of the other non-digits like ); or ,?
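`exec` returns one match per call, even with the `g` flag; with `g` it remembers `lastIndex`, so repeated calls walk through the string. `"rgb(123, 124, 125);".match(/[^0-9]+/g)` returns all matches at once, and the `i` flag changes nothing here, since digits have no case. Python's `re.findall` behaves like `match` with `g`:

```python
import re

NON_DIGITS = re.compile(r"[^0-9]+")

def non_digit_runs(text):
    return NON_DIGITS.findall(text)
```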

Regular expression in Oracle to get a range of IP addresses from 100.64.*.* to 100.127.*.*?

I need a regular expression in Oracle to get a range of IP addresses from 100.64.*.* to 100.127.*.*.

for example if i have

1) 100.63.22.55 
2) 100.64.102.558 
3) 100.123.22.12 
4) 100.127.22.55 
5) 100.128.221.55

regular expression should pick the below ip address

2) 100.64.102.558 
3) 100.123.22.12 
4) 100.127.22.55
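The second octet has to be 64 to 127, which a regex can only express digit by digit: `6[4-9]`, `[7-9][0-9]`, `1[01][0-9]` and `12[0-7]`. In Oracle that would be something like `WHERE REGEXP_LIKE(ip, '^100\.(6[4-9]|[7-9][0-9]|1[01][0-9]|12[0-7])\.')`. The pattern only looks at the first two octets, so it does not reject malformed addresses such as `100.64.102.558` (558 is not a valid octet). The same pattern checked in Python:

```python
import re

IN_RANGE = re.compile(r"^100\.(6[4-9]|[7-9][0-9]|1[01][0-9]|12[0-7])\.")

def in_range(ip):
    return IN_RANGE.match(ip) is not None
```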

Unexpected behavior when validating password using abide validation

I'm using Zurb Foundation and trying to write a custom password validator with Abide.

I need to make sure the password has at least 1 uppercase letter, 1 lowercase letter, 1 number and 1 special character, and this is the regex I'm using: /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,15}$/, which I got from this accepted answer.

But I can't submit the form, even though my password meets all of these conditions.

Here is a jsfiddle link. Although I couldn't add foundation.js properly as an External Resource, it shows the same behavior I get on my local machine.
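One way to narrow this down is to test the pattern on its own, outside Foundation. The sketch below runs the same lookahead pattern through Python's `re`; lookaheads, `\d` and `{8,15}` behave the same way in JavaScript for this pattern. If it accepts a password that Abide rejects, the problem is more likely in how the pattern is wired into Abide than in the regex. The sample passwords are made up. Note the `{8,15}` cap: a password longer than 15 characters is rejected even if it meets every other rule.

```python
import re

# Same pattern as in the question: at least one lowercase, one uppercase,
# one digit, one character that is none of those, and 8-15 characters total.
PASSWORD = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,15}$")

for candidate in ["Passw0rd!", "password", "Passw0rd", "Pa1!", "Passw0rd!Passw0rd!"]:
    print(candidate, bool(PASSWORD.match(candidate)))
```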

How to extract latitude and longitude greedily in Pandas?

I have a dataframe in Pandas like this:

        id          loc
 40     100005090   -38.229889,-72.326819   
 188    100020985   ut: -33.442101,-70.650327   
 249    10002732    ut: -33.437478,-70.614637   
 361    100039605   ut: 10.646041,-71.619039    \N
 440    100048229   4.666439,-74.071554

I need to extract the GPS points. I first check which cells contain a match for a certain regex (found here on SO, see below) to select all cells that have a "valid" lat/long value. However, I also need to extract these numbers and either put them in a series of their own (and then split on the comma) or put them in two new pandas series. I have tried the following for the extraction part:

ids_with_latlong["loc"].str.extract("[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)$")

but judging from the output, the regex does not seem to be matching greedily; I get something like this:

    0   1            2      3   4           5   6       7    8
    40  38.229889   .229889 NaN 72.326819   NaN 72  NaN 72  .326819
    188 33.442101   .442101 NaN 70.650327   NaN 70  NaN 70  .650327

Obviously it's matching more than I want (I would just need cols 0, 1, and 4), but simply dropping the other columns feels like too much of a hack. Notice that the extract function also got rid of the +/- signs at the beginning. If anyone has a solution, I'd really appreciate it.
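`Series.str.extract` returns one column per capturing group, so every `(...)` in the pattern, including the ones that only group alternatives, becomes a column. A sketch of the same pattern with the inner groups made non-capturing `(?:...)`, two named groups, and the optional sign moved inside the groups so it is kept. It drops the trailing `$`, assuming rows such as `ut: 10.646041,-71.619039    \N` should still match. It uses plain `re` so it runs without pandas.

```python
import re

# Only the two named groups capture; (?:...) groups just structure alternatives.
LATLON = re.compile(
    r"(?P<lat>[-+]?(?:[1-8]?\d(?:\.\d+)?|90(?:\.0+)?)),\s*"
    r"(?P<lon>[-+]?(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?))"
)

rows = [
    "-38.229889,-72.326819",
    "ut: -33.442101,-70.650327",
    "ut: 10.646041,-71.619039    \\N",
    "4.666439,-74.071554",
]

for loc in rows:
    m = LATLON.search(loc)
    print(m.group("lat"), m.group("lon"))
```

Passed to `ids_with_latlong["loc"].str.extract(...)`, the same pattern should yield just two columns named `lat` and `lon`, which `pd.to_numeric` can then convert.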

Search for 100th occurrence in a string using regular expression in Notepad++

I have a very large file with 10,000 lines, and PQR occurs many times across many of those lines.

I want to find the 100th occurrence and its line number so that I can change that line accordingly.

Please help me with the regular expression; I tried many that I found online, but none of them worked.

Thanks.
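If stepping outside Notepad++ is an option, a short Python script can count the matches and report where the nth one falls. A sketch, assuming the target is the literal text `PQR`; `nth_occurrence_line` is a hypothetical helper, not an existing tool.

```python
import re

def nth_occurrence_line(text, needle, n):
    """Return the 1-based line number of the nth occurrence of needle, or None."""
    for count, match in enumerate(re.finditer(re.escape(needle), text), start=1):
        if count == n:
            # Line number = newlines before the match's start offset, plus one.
            return text.count("\n", 0, match.start()) + 1
    return None

sample = "PQR\nabc PQR PQR\nPQR"
print(nth_occurrence_line(sample, "PQR", 3))  # 2
```

For the real file, reading it into a string (e.g. `open("bigfile.txt", encoding="utf-8").read()`, with a hypothetical file name) and calling the helper with `n=100` should be fine at 10,000 lines.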

PHP Regex: number x number

I'm looking for a regex for this pattern:

This is what I've come up with, but it's not working properly.

/([0-9]x[0-9])\w+/g
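The pattern above allows only one digit on each side of the `x` (`[0-9]`), and the trailing `\w+` demands further word characters after the second digit. Assuming inputs like `1920x1080` (the example in the question did not survive), a sketch that takes one or more digits on each side. It is tested with Python's `re`, but this syntax is the same in PHP's PCRE-based `preg_match`.

```python
import re

# \b keeps the match from starting or ending in the middle of a longer word.
DIMENSIONS = re.compile(r"\b\d+x\d+\b")

found = DIMENSIONS.findall("sizes 1920x1080 and 800x600")
print(found)  # ['1920x1080', '800x600']

# The question's pattern captures only one digit on each side of the x.
partial = re.findall(r"([0-9]x[0-9])\w+", "sizes 1920x1080 and 800x600")
print(partial)  # ['0x1', '0x6']
```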

Regex uppercase words with condition

I'm new to regex and can't figure out how to do this:

Hello this is JURASSIC WORLD shut up Ok

[REVIEW] The movie BATMAN is awesome lol

What I need is the title of the movie, which is written in uppercase. There will be only one per sentence. I have to ignore the words between [], as they will never be the title of the movie.

I thought of this:

^\w([A-Z]{2,})+

Any help would be welcome.

Thanks.
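The pattern `^\w([A-Z]{2,})+` is anchored to the start of the line, so it can only match when the title comes first. A sketch in Python that first removes any `[ ... ]` tag and then takes the first run of all-caps words. It assumes a title is one or more words of at least two capital letters each; a title containing digits or punctuation would be cut short.

```python
import re

BRACKETED = re.compile(r"\[[^\]]*\]")
# Words of two or more capitals, optionally several such words in a row.
CAPS_RUN = re.compile(r"\b[A-Z]{2,}(?:\s+[A-Z]{2,})*\b")

def movie_title(sentence):
    # Drop [ ... ] tags first so their contents can never be picked up.
    cleaned = BRACKETED.sub("", sentence)
    match = CAPS_RUN.search(cleaned)
    return match.group(0) if match else None

print(movie_title("Hello this is JURASSIC WORLD shut up Ok"))   # JURASSIC WORLD
print(movie_title("[REVIEW] The movie BATMAN is awesome lol"))  # BATMAN
```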

Does allowing special characters in a regex create a security hole?

My regex looks like this:

    if (preg_match("/^[a-zA-Z0-9~@#$^*()_+=[\]{}|\\,.?: -]*$/", $text) == FALSE) {
      echo 'Wrong!';
    }

I want to allow the ' and " characters too. What is the best way to implement this? And would allowing them be a security risk for a MySQL database?
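Whether `'` and `"` are dangerous depends on how the text reaches the database rather than on the validation regex: when the value is bound as a query parameter (prepared statements, for example via PDO or mysqli in PHP), quotes are stored as plain data and never parsed as SQL. A sketch of that idea using Python's built-in `sqlite3` as a stand-in for MySQL, since MySQL is not available in a self-contained example; the table and value are made up.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")

# Quotes plus an injection attempt inside the user-supplied text.
text = 'It\'s a "quoted" value\'); DROP TABLE comments; --'

# The ? placeholder sends the value separately from the SQL text,
# so the quotes in `text` are stored literally.
conn.execute("INSERT INTO comments (body) VALUES (?)", (text,))

stored = conn.execute("SELECT body FROM comments").fetchone()[0]
print(stored == text)  # True
```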

Show English characters and simple special characters only

The problem I need to fix: if I enter characters outside the English range in the description textarea, the counter shows 1 character used for each character typed, so you can fill the whole 2000-character textarea.

The head.php script counts each of these characters (ā, ē, ī, ū, ') as 1 typed symbol, but the server-side code treats each of them as several ASCII characters. So when I submit a full form containing these characters (āēīū', and others), the IF statement in submit.php shows the error "Description should be between 500 and 2000 characters long!".

Part of the reason is escaping: a \ is inserted before the ' symbol, so the count comes out wrong. I would like to allow only English characters, plus some special characters.

For example, the ' character is needed, but it should count as 1 character, not 2 because of the \ added for security reasons.

head.php

    <script>
      $(document).ready(function() {
        var text_max = 2000;
        $('#textarea_feedback').html(text_max + ' characters');

        $('#textarea').keyup(function() {
          var text_length = $('#textarea').val().length;
          var text_remaining = text_max - text_length;

          $('#textarea_feedback').html(text_remaining + ' characters');
        });
      });
    </script>

function.php

    function sanitize($data) {
      return htmlentities(strip_tags(mysql_real_escape_string($data)));
    }

display.php

    <?php

    $sql = mysql_query("SELECT * FROM `table` ");

      while($row2 = mysql_fetch_array($sql)) {

        $description = $row2['description'];

        if (strlen($description) >= 1000) { echo $description = substr($description, 0, 1000) . '...'; ?>
      <a href="description.php?page=<?php echo $uid;?>">Read more</a>
    <?php } else { echo $description; }
      }
    ?>

submit.php

    <?php $description = sanitize($_POST['description']); 

      $sql2 = "INSERT INTO `table` (description)
        VALUES ('$description')";
    ?>

    <form action="submit.php" method="post" enctype='multipart/form-data'>
      <span id="insert">Description</span><div id="textarea_feedback"></div><br>
      <textarea maxlength="2000" id="textarea" name="description" ><?php echo stripslashes($description); ?></textarea><br>
      <input type="submit" value="Submit" id="submit" name="submit" />
    </form>

    <?php
      if (isset($_POST['submit'])) {
        $con = mysql_connect("localhost","root","");
        if (!$con) {
          exit("Can not connect: " . mysql_error());
        }
        mysql_select_db("database",$con);

        if ((strlen($description) < 500) or (strlen($description) > 2000)) {
          echo "<h1 class='top'>Description should be between 500 and 2000 characters long!</h1>";
        } else {
          mysql_query($sql2,$con);
        }
      }
    ?>
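The mismatch comes from two different ways of measuring length. The textarea counter uses JavaScript's `.length`, which counts UTF-16 code units (one per character for these letters), while PHP's `strlen` counts bytes, and the check in submit.php runs on the already escaped value. The sketch below reproduces the three numbers in Python, assuming the form is submitted as UTF-8; `str.replace` stands in for the backslash escaping that `mysql_real_escape_string` applies to quotes.

```python
text = "āēīū'"

# Character count, which is what the textarea counter shows.
print(len(text))                   # 5

# Byte count of the UTF-8 encoding, which is what a byte-based length
# such as PHP's strlen sees: ā, ē, ī, ū take 2 bytes each, ' takes 1.
print(len(text.encode("utf-8")))   # 9

# Backslash-escaping the quote adds one more character before the check.
escaped = text.replace("'", "\\'")
print(len(escaped))                # 6
```

This suggests measuring the raw `$_POST['description']` before escaping, and with `mb_strlen($value, 'UTF-8')` instead of `strlen`, so the server counts characters the same way the browser does.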

Finding and removing the thousand separator in Notepad++

I have numbers formatted like 1.100.00 and would like to turn them into 1100.00.

I couldn't find anything similar that solves this.
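A sketch, assuming the last dot is always the decimal point and every earlier dot is a separator followed by a three-digit group: remove each dot that is followed by three digits and then another dot. Tested here in Python; in Notepad++ the idea would be to search for `\.(?=\d{3}\.)` in Regular expression mode and replace with nothing, which I believe works since its regex engine supports lookaheads, though I have not tried it there.

```python
import re

# A dot followed by three digits and then another dot is a thousands separator.
SEPARATOR = re.compile(r"\.(?=\d{3}\.)")

print(SEPARATOR.sub("", "1.100.00"))      # 1100.00
print(SEPARATOR.sub("", "1.234.567.89"))  # 1234567.89
```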

Java regex match group empty even though it matches

Let's assume I have this line:

|125148 Schalter f GLE GÜ 90/80Z nei PL 80 16AJ

and I want to match the following two parts:

125148
Schalter f GLE GÜ 90/80Z nei PL 80

16AJ can be used as a "break point": everything from the first letter after the number up to the break point should be matched in one group.

I got this partly working with the regex ^\|([0-9])+(.)+(?=\s+16AJ), but my first group contains 8 characters and my second group contains nothing.

See this demo

What am I missing out here? Why is my second group empty?
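A repeated capturing group such as `([0-9])+` keeps only its last iteration (here the single digit `8`), and `(.)+` likewise keeps one character. Java and Python both behave this way. Moving the quantifier inside the group captures the whole run. A sketch in Python, with a lazy `(.+?)` so the second group stops right before the whitespace preceding `16AJ`:

```python
import re

line = "|125148 Schalter f GLE GÜ 90/80Z nei PL 80 16AJ"

# Quantifiers inside the groups capture whole runs, not single characters.
m = re.match(r"^\|([0-9]+)\s+(.+?)(?=\s+16AJ)", line)
print(m.group(1))  # 125148
print(m.group(2))  # Schalter f GLE GÜ 90/80Z nei PL 80
```

In Java the same pattern should work as `Pattern.compile("^\\|([0-9]+)\\s+(.+?)(?=\\s+16AJ)")`.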

phone number validation regex in rails

Hi, I need to validate phone numbers in these formats: +91 9425456971, (123) 986-5470 and 123-569-8740. I have written the following regex:

validates :phone, format: { with: /\A\(?\d{3}\)?[- ]?\d{3}[- ]?\d{4}\z/,
                              message: I18n.t('global.errors.phone_format')}

It works for (123) 986-5470 and 123-569-8740, but not for +91 9425456971. Please guide me on how to solve this. Thanks in advance.
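The existing pattern has no room for a leading `+91 `, and without a separator `9425456971` is exactly ten digits, which the rest of the pattern already accepts. A sketch that adds an optional country-code prefix, assuming one to three digits after the `+`. It also requires the parentheses to appear as a pair. Tested in Python, where `fullmatch` plays the role of Ruby's `\A ... \z` anchors.

```python
import re

# Optional "+<1-3 digit country code>" and space, then "(123) " or "123-"/"123 "/"123",
# then 3 digits, an optional separator, and 4 digits.
PHONE = re.compile(r"(?:\+\d{1,3}\s?)?(?:\(\d{3}\)\s?|\d{3}[- ]?)\d{3}[- ]?\d{4}")

for number in ["+91 9425456971", "(123) 986-5470", "123-569-8740", "12-345"]:
    print(number, PHONE.fullmatch(number) is not None)
```

In the Rails validation this should translate directly to `/\A(?:\+\d{1,3}\s?)?(?:\(\d{3}\)\s?|\d{3}[- ]?)\d{3}[- ]?\d{4}\z/`.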

HTML ID value validation by RE

Problem Statement:

ID value must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

I did it with a regular expression.

Code:

>>> import re
>>> id_value1 =  "custom-title1"
>>> id_value2 =  "1-custom-title"
>>> pattern = "[A-Za-z][\-A-Za-z0-9_:\.]*"

Code for valid ID value

>>> flag = False
>>> try:
...     if re.findall(pattern, id_value1)[0]==id_value1:
...         flag=True
... except IndexError:
...     pass
... 
>>> print flag
True

Code for invalid ID value:

>>> flag = False
>>> try:
...     if re.findall(pattern, id_value2)[0]==id_value2:
...         flag=True
... except IndexError:
...     pass
... 
>>> print flag
False

Code for IndexError

>>> try:
...     if re.findall(pattern, "")[0]=="":
...         print "In "
... except IndexError:
...    print "Exception Index Error"
... 
Exception Index Error
>>>

I will move the above code into one function. This function will be called more than 1000 times, so can anyone optimize the above code?
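A sketch of one possible optimization, assuming Python 3.4+ for `re.fullmatch` (the session above is Python 2, where `ID_PATTERN.match(value)` with `\Z` appended to the pattern does the same job). Python's `re` module already caches recently used patterns, so compiling once mostly saves a cache lookup; the larger gain is replacing `findall`, indexing and `try`/`except` with a single anchored match. `is_valid_id` is a hypothetical name.

```python
import re

# Compiled once; a leading hyphen and a dot need no escaping inside [].
ID_PATTERN = re.compile(r"[A-Za-z][-A-Za-z0-9_:.]*")

def is_valid_id(value):
    # fullmatch anchors to the whole string, so empty or partially matching
    # input simply returns None instead of raising IndexError.
    return ID_PATTERN.fullmatch(value) is not None

print(is_valid_id("custom-title1"))   # True
print(is_valid_id("1-custom-title"))  # False
print(is_valid_id(""))                # False
```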

regex pattern for a reference point

I have been studying search, findall, match and group(), but I can't figure out how to do this. I know this isn't a perfect question, but I'm a noob. Thanks for any direction.