I have this code:
library(stringr)
full_patterns <- c("I2-EX-I3-EX-I2-IEX-I3-I2-EX-I2-I2-II3-I2-III2-I2-I3-INR-FA-NR-I3-INR-IEX-QU-I3-NR-FA-EX-QU-NR-I2-I2-I2-NR-TR-II2-I3-NR-IIEX-NR-NR-INR-NR-I3-I2-NR-IQU-QU-ITR-QU-NR-NR-QU-TR-NR-ITR-IFA-II2-QU-TR-FA-EX-QU-QU-QU-NR-QU-ITR-FA-QU-FA-FA-TR-FA-QU-EX-QU-IQU-QU-FA-FA-QU-QU-FA-FA-I3-NR-FA-II2-FA-QU-FA-I2-FA-NR-INR-TR-NR-EX-NR-NR-EX-TR-I3-INR-NR-FA-ITR-EX-NR-NR-IINR-INR-EX-EX-EX-NR-NR-NR-FA-FA", "FA-I2-I2-I2-EX-I2-I3-FA-II2-TR-II2-FA-I3-IFA-FA-NR-I3-I2-TR-II2-II2-FA-I2-II3-FA-QU-II2-I2-I2-NR-I2-I2-NR-II2-INR-I3-QU-I2-I3-QU-NR-I2-INR-QU-QU-I2-IEX", "FA-FA-ITR-IIFA,TR-FA-I2-I2-FA-EX-IFA,IEX,I2-I2-INR-I2-I3-I1,TR-NR-I2-I3-EX-IQU-TR-I3-NR-EX-I3-EX,I2-EX-IIIII2-II3-I2-EX,FA-IEX-EX-TR-EX-TR-I3-INR-I2-FA-FA-TR-I2-IIIIIFA-I2-FA-TR-III3-NR-FA-III3-TR-I2-I2,I2-I2-EX,TR-TR-I2-FA-I2-I3-IIIFA-ITR-FA-IFA-INR-NR-II2-I3-I2-FA-II2-EX-FA,I3-I3-TR-I3-FA-NR-II2-II3-TR-TR-EX,I3-TR-NR-TR-QU-EX-NR-TR-I2-EX-III3-INR-INR-IFA,TR-I3-I2-I3-NR-NR-I1,IIFA-FA-IFA-FA-NR-II3-NR-I2-FA-FA-IFA-NR-FA,IFA-FA-NR-NR-I2-NR-IIIFA-EX,II2-II2-I2-QU-TR-FA-QU-I3-EX-ITR-IFA-FA-NR-INR-FA-FA-EX-II2-NR-I3,I3-FA-I2-I2-FA-I2-FA-I2,I2-INR-I2-NR-II3-TR-FA-I2-I3,I3-NR-EX-TR-IEX,II2-FA-I2-INR-I2-I3-IIEX-FA,IEX-EX-EX-EX-EX-EX-EX-TR-TR-I2-NR-NR-EX-NR-I3-FA-NR-NR-NR-EX-NR-II2-IIFA-FA-ITR-NR-I2-I3-I2-NR-FA-NR-I1")
literal_strings <- c("FA-QU-II2-I2-I2-NR-I2-I2-NR-II2-INR-", "QU-I2-", "QU-NR-I2-INR-QU-QU-I2-IEX-", "FA-", "QU-EX-NR-", "NR-EX-", "NR-EX-TR-", "QU-")
regex_list <- list()
for (i in seq_along(literal_strings)) {
  regex_list[[i]] <- paste0("(?<=", literal_strings[i], "?)(?:I\\d-?)*I3(?:-?I\\d)*")
}
IVs_identified <- list()
DVs_identified <- list()
for (i in seq_along(regex_list)) {
  DVs_identified[[i]] <- lapply(full_patterns, str_extract_all, regex_list[[i]])
  IVs_identified[[i]] <- lapply(full_patterns, str_extract_all, literal_strings[[i]])
}
data.frame(unlist(DVs_identified), unlist(IVs_identified))
length(unlist(DVs_identified))
length(unlist(IVs_identified))
The point of the code is to generate a data.frame with two columns. The first column should contain the first part of the regex match, i.e. the literal prefix taken from literal_strings. The second column should contain the second part of the regex match, i.e. whatever (?:I\\d-?)*I3(?:-?I\\d)* matches, but only where that part directly follows the corresponding literal string (which is what the lookbehind is meant to enforce).
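As you can see, the code doesn't work: data.frame() fails because unlist(DVs_identified) and unlist(IVs_identified) have different lengths. They need to be exactly the same length, with each extracted value lined up with the literal string it belongs to. How can I accomplish this?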
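One possible direction, offered as a sketch rather than a definitive fix: instead of extracting the two parts in separate passes, capture both in a single pattern with two groups and read them out with str_match_all, so every row is a pair by construction. The data below are shortened stand-ins for full_patterns and literal_strings, and dv_regex and pairs are names made up for this illustration. It also assumes the literal strings contain no regex metacharacters, and it drops the trailing "?" from the original lookbehind, so the prefix has to match in full. Because the prefix is consumed here rather than looked behind, overlapping matches may come out differently than with the lookbehind version.

```r
library(stringr)

# Shortened stand-ins for full_patterns and literal_strings (illustrative only)
full_patterns <- c("FA-I3-I2-TR-QU-I2-I3-QU-NR", "NR-EX-I3-EX-FA-I2")
literal_strings <- c("FA-", "QU-", "NR-EX-")

# The same "run of I<digit> tokens containing an I3" sub-pattern as in the question
dv_regex <- "(?:I\\d-?)*I3(?:-?I\\d)*"

pairs <- do.call(rbind, lapply(literal_strings, function(lit) {
  # Group 1 captures the literal prefix, group 2 the I3 run right after it
  pattern <- paste0("(", lit, ")(", dv_regex, ")")
  # str_match_all returns one matrix per input string:
  # column 1 = whole match, column 2 = group 1, column 3 = group 2
  m <- do.call(rbind, str_match_all(full_patterns, pattern))
  data.frame(IV = m[, 2], DV = m[, 3], stringsAsFactors = FALSE)
}))

pairs
```

Since each row of the match matrix holds one prefix together with the run that follows it, the IV and DV columns cannot end up with different lengths; inputs without a match simply contribute zero rows.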