I have a dataframe in Pandas like this:
id loc
40 100005090 -38.229889,-72.326819
188 100020985 ut: -33.442101,-70.650327
249 10002732 ut: -33.437478,-70.614637
361 100039605 ut: 10.646041,-71.619039 \N
440 100048229 4.666439,-74.071554
I need to extract the gps points. I first ask for a contain of a certain regex (found here in SO, see below) to match all cells that have a "valid" lat/long value. However, I also need to extract these numbers and either put them on a series of their own (and then call split on the comma) or put them in two new pandas series. I have tried the following for the extraction part:
ids_with_latlong["loc"].str.extract("[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)$")
but it looks, because of the output, that the reg exp is not doing the matching greedily, because I get something like this:
0 1 2 3 4 5 6 7 8
40 38.229889 .229889 NaN 72.326819 NaN 72 NaN 72 .326819
188 33.442101 .442101 NaN 70.650327 NaN 70 NaN 70 .650327
Obviously it's matching more than I want (I would just need cols 0, 1, and 4), but simply dropping them is too much of a hack for me to do. Notice that the extract function also got rid of the +/- signs at the beginning. If anyone has a solution, I'd really appreciate.
Aucun commentaire:
Enregistrer un commentaire