Tuesday, 4 August 2015

How to *extract* latitude and longitude greedily in Pandas?

I have a dataframe in Pandas like this:

        id          loc
 40     100005090   -38.229889,-72.326819   
 188    100020985   ut: -33.442101,-70.650327   
 249    10002732    ut: -33.437478,-70.614637   
 361    100039605   ut: 10.646041,-71.619039    \N
 440    100048229   4.666439,-74.071554

I need to extract the GPS points. I first check whether each cell contains a match for a certain regex (found here on SO, see below), so that I keep only the cells that have a "valid" lat/long value. However, I also need to extract these numbers and either put them in a series of their own (and then call split on the comma) or put them in two new pandas series. I have tried the following for the extraction part:

ids_with_latlong["loc"].str.extract(r"[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)$")

but judging by the output, the regex does not seem to be matching greedily, because I get something like this:

    0   1            2      3   4           5   6       7    8
    40  38.229889   .229889 NaN 72.326819   NaN 72  NaN 72  .326819
    188 33.442101   .442101 NaN 70.650327   NaN 70  NaN 70  .650327

Obviously it's returning more than I want (I would just need cols 0, 1, and 4), but simply dropping the others is too much of a hack for me. Notice that the extract function also got rid of the +/- signs at the beginning. If anyone has a solution, I'd really appreciate it.
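
One possible direction, offered as a sketch and not a definitive fix: `str.extract` returns one column per capturing group, and the pattern above has nine of them, which would explain the extra columns without greediness being involved. The signs are lost because `[-+]?` sits outside the parentheses. The version below turns the inner groups into non-capturing `(?:...)` groups and wraps each coordinate, sign included, in a single named group. The names `df`, `LATLONG`, `lat`, `lon` and `coords` are made up for this illustration, and the sample frame copies the rows shown above, leaving out the trailing `\N` on row 361.

```python
import re

import pandas as pd

# Sample data copied from the question (trailing "\N" on row 361 left out).
df = pd.DataFrame(
    {
        "id": [100005090, 100020985, 10002732, 100039605, 100048229],
        "loc": [
            "-38.229889,-72.326819",
            "ut: -33.442101,-70.650327",
            "ut: -33.437478,-70.614637",
            "ut: 10.646041,-71.619039",
            "4.666439,-74.071554",
        ],
    },
    index=[40, 188, 249, 361, 440],
)

# The pattern from the question: every "(" opens a capturing group.
ORIGINAL = (
    r"[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*"
    r"[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)$"
)

# Same ranges, but only two (named) capturing groups, each including its sign.
LATLONG = (
    r"(?P<lat>[-+]?(?:[1-8]?\d(?:\.\d+)?|90(?:\.0+)?))"
    r",\s*"
    r"(?P<lon>[-+]?(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?))"
    r"\s*$"
)

# Number of capturing groups in each pattern.
print(re.compile(ORIGINAL).groups, re.compile(LATLONG).groups)

# One column per named group; rows without a match would come back as NaN.
coords = df["loc"].str.extract(LATLONG, expand=True).astype(float)
print(coords)
```

Named groups become the column names of the result, so the two series can be read as `coords["lat"]` and `coords["lon"]` without dropping anything afterwards.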
