### Plant materials

Non-GM *B. napus* “Deza oil No. 18” (AACC, 2n = 38) was used as the pollen donor in this study. This cultivar carries recessive genetic male sterility, is a double-cross variety, and has a growth period of approximately 224 days. The pollen recipient was the open-pollinated variety (Nongxing 80-day) of *B. rapa* (AA, 2n = 20), which is mainly used as a green manure crop in Taiwan. To ensure flowering in Taiwan, *B. napus* seedlings were vernalized at 4 °C for at least 30 days. After vernalization of *B. napus*, the *B. napus* and *B. rapa* seedlings were planted in 128-well plastic trays in a greenhouse. The seedlings were transplanted to the field once they reached the five-leaf stage.

### Experiment design

The pollen dispersal experiments were conducted from the fall of 2013 to the spring of 2017 at the Taiwan Agricultural Research Institute (TARI), Council of Agriculture (COA), Executive Yuan (24.03° N, 120.69° E), and at the Agricultural Experiment Station (AES), College of Agriculture and Natural Resources, National Chung Hsing University (24.07° N, 120.71° E). The experiments were replicated eight times, four times at each site. The total area of the two experimental sites was approximately 0.054 ha (36 m × 15 m; Fig. 1; Hong et al. 2016; Su 2015; Wang 2017; Yang 2018).

The two pollen recipient plots were located next to the pollen donor plot to simulate adjacent field arrangements in Taiwan (Nieh et al. 2014). At the TARI site, the two recipient plots were located on the north and south sides of the source plot. At the AES site, the two recipient plots were set up on the west and east sides of the source plot. Each experimental field had 12 furrows, and each furrow had two rows. There were 696 *B. napus* plants and 1776 *B. rapa* plants in each field. Early flowers were cut to synchronize flowering between the donor and recipient plants.

Meteorological information was recorded by a weather station located at TARI. The most frequent wind direction on a given day was taken as the prevailing wind direction of that day. The proportion of each wind direction over the flowering period was then used to define the field prevailing wind direction for that period.

The recipient plants were sampled in both rows of each furrow (except the first and last furrows) at different distances from the donor plot. The sampling distances ranged from 0.35 to 12.95 m at 0.7-m intervals. One or two flower stalks were cut from each plant. Mature pods were dried, threshed, and stored for screening of hybrid progenies.
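The sampling grid implied by this range and interval can be enumerated with a short sketch. The function name and the rounding to two decimals are illustrative choices, not part of the original protocol.

```python
# Sampling distances stated in the text: 0.35 m to 12.95 m in 0.7 m steps.
START_M = 0.35
STEP_M = 0.7
END_M = 12.95


def sampling_distances(start=START_M, step=STEP_M, end=END_M):
    """Return the list of sampling distances (m), rounded to 2 decimals."""
    n_intervals = round((end - start) / step)  # number of 0.7 m intervals
    return [round(start + k * step, 2) for k in range(n_intervals + 1)]


distances = sampling_distances()
print(len(distances), distances[0], distances[-1])
```

Under these values the grid contains 19 sampling distances on each side of the donor plot.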

### Hybrid progeny screening

Previous studies showed that the hybrids of *B. rapa* × *B. napus* can be distinguished from their parents by morphology (Jørgensen and Andersen 1994; Lu et al. 2001; Tu et al. 2020). The morphological characteristics of *B. napus*, *B. rapa*, and *B. rapa* × *B. napus* (F1) were described in Tu et al. (2020). The F1 hybrids also differ from their parents in genome size and molecular markers (Tu et al. 2020). In this study, leaf characteristics were used to differentiate between hybrid and nonhybrid progenies at the two-leaf stage. The hybrid leaves were circular and dark green, with trichomes and strong dentation at the margin (Fig. 2a, b). By contrast, the nonhybrid leaves were narrowly oval and light green (Fig. 2c, d).

For each sample, 384 seeds were sown in plastic trays, and the number of hybrid progenies was counted. The CP rate (%) was calculated from the number of hybrid progenies in each seedling sample, as follows (Eq. 1):

$$ {\text{CP }}\left( {\text{\% }} \right) = \frac{{n_{c} }}{N} \times 100\% $$

(1)

where *n*_{c} is the number of hybrid progenies, and *N* is the total number of seedlings in the sample. To meet the requirements of model fitting, the CP rates were converted to count data by multiplying them by 384 and rounding to the nearest integer.
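As a worked illustration of Eq. 1 and of the conversion to counts, the following minimal sketch may help. It assumes that the percentage is first expressed as a proportion before being multiplied by 384; the function names and the sample values are hypothetical.

```python
SEEDS_PER_SAMPLE = 384  # seeds sown per sample, as stated in the text


def cp_rate_percent(n_hybrid, n_seedlings):
    """Eq. 1: CP (%) = n_c / N * 100."""
    if n_seedlings <= 0:
        raise ValueError("n_seedlings must be positive")
    return n_hybrid / n_seedlings * 100.0


def cp_count(cp_percent, seeds=SEEDS_PER_SAMPLE):
    """Convert a CP rate (%) to an integer count out of `seeds`.

    Assumption: the rate is rescaled to a proportion (divided by 100)
    before multiplication, so the count lies between 0 and `seeds`.
    """
    return round(cp_percent / 100.0 * seeds)


# Hypothetical sample: 12 hybrids among 360 emerged seedlings.
rate = cp_rate_percent(n_hybrid=12, n_seedlings=360)
print(f"CP rate = {rate:.2f}%, count = {cp_count(rate)}")
```

Rescaling every sample to a common denominator of 384 keeps the counts comparable when the number of emerged seedlings differs between samples.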

### Zero-inflated model

According to previous studies, the CP rate decreases with increasing distance (Beckie et al. 2003; Damgaard and Kjellsson 2005). This may result in a relatively large number of zero values in the CP data. Most models typically show poor predictive performance when fitted to data with excess zero values (Rodriguez 2013). The zero-inflated model has been proposed to address the problem of excess zero counts (Greene 1994; Lambert 1992).

The zero-inflated Poisson (ZIP) model is a mixture of a point mass at zero and a Poisson distribution. It raises the probability of zero values to accommodate excess zero counts. Let *π*_{i} be the probability of an excess zero, and let the response variable *Y*_{i}, *i* = 1, 2, 3, …, *n*, be a count variable with the following probability mass function (Eq. 2):

$$ {\text{P}}\left( {Y_{i} = y_{i} ;\mu_{i} , \pi_{i} } \right) = \left\{ {\begin{array}{*{20}l} {\pi_{i} + \left( {1 - \pi_{i} } \right)e^{{ - \mu_{i} }} , \quad y_{i} = 0} \\ {\left( {1 - \pi_{i} } \right)\frac{{\mu_{i}^{{y_{i} }} }}{{y_{i} !}}e^{{ - \mu_{i} }} , \qquad y_{i} > 0} \\ \end{array} } \right. $$

(2)

where *μ*_{i} is the mean parameter of the Poisson distribution. The parameter *μ*_{i} is modeled through a log link (Eq. 3), and its predictor is defined as *Q* × *r(x, y)*. The parameter *Q* and the dispersal kernel function *r(x, y)* were introduced in a previous study (Bullock et al. 2017). The dispersal kernel functions include log-sech, exponential power, power law, logistic, 2Dt, gamma, WALD, Weibull, exponential, log-normal, and Gaussian. Variables *x* and *y* are the two-dimensional coordinates. The parameter *π*_{i} is modeled through a logit link (Eq. 4), with the same predictor as that for *μ*_{i}.

$$ \mu_{i} = \exp \left( {Q \times r\left( {x,y} \right)} \right) $$

(3)

$$ \pi_{i} = \frac{{\mu_{i} }}{{1 + \mu_{i} }} $$

(4)
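Equations 2 to 4 can be sketched numerically as below. This is a minimal illustration, not the fitting code used in the study: the value of the kernel *r(x, y)* is passed in as a plain number, and the values of *Q* and *r* are hypothetical.

```python
import math


def zip_parameters(q, r_xy):
    """Eqs. 3-4: mu = exp(Q * r(x, y)) and pi = mu / (1 + mu)."""
    mu = math.exp(q * r_xy)
    pi = mu / (1.0 + mu)
    return mu, pi


def zip_pmf(y, mu, pi):
    """Eq. 2: probability of observing the count y under the ZIP model."""
    # Poisson probability computed on the log scale for numerical stability.
    poisson = math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))
    if y == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Hypothetical values of Q and of the kernel evaluated at one plant position.
mu, pi = zip_parameters(q=-2.0, r_xy=0.5)
total = sum(zip_pmf(y, mu, pi) for y in range(100))
print(f"mu = {mu:.3f}, pi = {pi:.3f}, sum of probabilities = {total:.6f}")
```

Because both parameters share one predictor, Eq. 4 is simply the inverse logit of *Q* × *r(x, y)* written in terms of *μ*_{i}.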

Parameter estimates may remain biased when the ZIP model is fitted to overdispersed data. Therefore, another zero-inflated model, the zero-inflated negative binomial (ZINB) model, has been suggested to solve this problem. The concept of the ZINB model is similar to that of the ZIP model. Because the ZINB model includes an additional parameter that describes the dispersion of the data, it is more suitable for overdispersed data. The probability mass function of the ZINB model is analogous to that of the ZIP model (Eq. 5).

$$ {\text{P}}\left( {Y_{i} = y_{i} ; \mu_{i} , \pi_{i} } \right) = \left\{ {\begin{array}{*{20}l} {\pi_{i} + \left( {1 - \pi_{i} } \right) \cdot g\left( {y_{i} } \right), \quad y_{i} = 0} \\ {\left( {1 - \pi_{i} } \right) \cdot g\left( {y_{i} } \right), \qquad y_{i} > 0} \\ \end{array} } \right. $$

(5)

$$ g\left( {y_{i} } \right) = \frac{{\varGamma \left( {y_{i} + \alpha^{ - 1} } \right)}}{{\varGamma \left( {\alpha^{ - 1} } \right)\varGamma \left( {y_{i} + 1} \right)}}\left( {\frac{1}{{1 + \alpha \mu_{i} }}} \right)^{{\alpha^{ - 1} }} \left( {\frac{{\alpha \mu_{i} }}{{1 + \alpha \mu_{i} }}} \right)^{{y_{i} }} $$

(6)

The function *g*(*y*_{i}) is the probability mass function of the negative binomial distribution, where Γ is the gamma function, and *α* is the dispersion parameter. The definitions of *μ*_{i} and *π*_{i} in the ZINB model are the same as those in the ZIP model (Eqs. 3 and 4).
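Equations 5 and 6 can be evaluated with the sketch below, which works on the log scale to avoid overflow in the gamma functions. The parameter values are hypothetical and the function names are illustrative.

```python
import math


def nb_pmf(y, mu, alpha):
    """Eq. 6: negative binomial probability g(y) with mean mu and parameter alpha."""
    inv_a = 1.0 / alpha
    log_g = (
        math.lgamma(y + inv_a) - math.lgamma(inv_a) - math.lgamma(y + 1)
        + inv_a * math.log(1.0 / (1.0 + alpha * mu))
        + y * math.log(alpha * mu / (1.0 + alpha * mu))
    )
    return math.exp(log_g)


def zinb_pmf(y, mu, pi, alpha):
    """Eq. 5: zero-inflated negative binomial probability of the count y."""
    g = nb_pmf(y, mu, alpha)
    if y == 0:
        return pi + (1.0 - pi) * g
    return (1.0 - pi) * g


# Hypothetical parameter values for illustration only.
mu, pi, alpha = 2.0, 0.3, 0.5
probs = [zinb_pmf(y, mu, pi, alpha) for y in range(500)]
mean = sum(y * p for y, p in enumerate(probs))
print(f"sum = {sum(probs):.6f}, mean = {mean:.4f}")
```

In this parameterization the negative binomial variance is *μ* + *αμ*², so the distribution approaches the Poisson as *α* tends to zero.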

To apply the two-dimensional function *r*(*x, y*), the distance between individual plants is calculated using Eq. 7. The experimental field is considered a two-dimensional coordinate plane where plant positions are defined by a coordinate point. In Eq. 7, coordinate points (*x, y*) and (*x’, y’*) define the positions of the recipient and donor plants, respectively.

$$ {\text{distance}} = \sqrt {\left( {x - x^{\prime}} \right)^{2} + \left( {y - y^{\prime}} \right)^{2} } $$

(7)
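The Euclidean distance between a recipient plant and each donor plant can be computed as in the following sketch. The coordinates are hypothetical, and the helper names are illustrative.

```python
import math


def plant_distance(recipient, donor):
    """Euclidean distance (m) between a recipient (x, y) and a donor (x', y')."""
    (x, y), (x_d, y_d) = recipient, donor
    return math.hypot(x - x_d, y - y_d)


def distances_to_donors(recipient, donors):
    """Distances from one recipient plant to every donor plant."""
    return [plant_distance(recipient, donor) for donor in donors]


# Hypothetical plant positions on the field coordinate plane (m).
recipient = (3.0, 4.0)
donors = [(0.0, 0.0), (3.0, 0.0), (6.0, 8.0)]
print(distances_to_donors(recipient, donors))
```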

### Statistical analysis

We expected that wind would not influence the number of CP events, where a CP event was defined as the occurrence of CP at a sampling point. To evaluate the wind effect, we compared the number of CP events in the two recipient plots by using a z-test. In addition, an ANOVA was conducted to test the effect of wind on the variation in CP rate. Excess zero values were examined by counting the frequency of zero values in the data and comparing it with the zero frequency predicted by the Poisson distribution; zero values were considered excessive if the observed number of zeros exceeded the expected number. Overdispersion was examined against the Poisson assumption that the variance equals the mean: a variance higher than the mean indicated that the data might be overdispersed. In addition, we fitted the data with the Poisson distribution, calculated the deviance, and computed the ratio of the deviance to the degrees of freedom (d.f.). A dataset with a ratio of > 1 is considered overdispersed (McCullagh and Nelder 1989).
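The zero-excess and overdispersion checks can be sketched as follows. This is a simplified illustration that assumes an intercept-only Poisson fit (a single common mean), which the text does not specify; the counts are hypothetical.

```python
import math


def poisson_diagnostics(counts):
    """Compare counts with an intercept-only Poisson fit (an assumption here)."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    observed_zeros = sum(1 for c in counts if c == 0)
    expected_zeros = n * math.exp(-mean)  # Poisson P(Y = 0) = exp(-mean)
    # Poisson deviance: 2 * sum[y * log(y / mean) - (y - mean)], with 0 * log(0) = 0.
    deviance = 2.0 * sum(
        (c * math.log(c / mean) if c > 0 else 0.0) - (c - mean) for c in counts
    )
    df = n - 1  # one estimated parameter (the mean)
    return {
        "mean": mean,
        "variance": variance,
        "observed_zeros": observed_zeros,
        "expected_zeros": expected_zeros,
        "excess_zeros": observed_zeros > expected_zeros,
        "deviance_ratio": deviance / df,
    }


# Hypothetical CP counts with many zeros.
counts = [0] * 14 + [1, 2, 3, 5, 8, 13]
result = poisson_diagnostics(counts)
print(result)
```

With covariates in the Poisson model, the fitted mean of each observation would replace the common mean, and the degrees of freedom would be reduced by the number of estimated parameters.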

The data from all years and sites were combined, and 70% of the total data were randomly selected to train the model. The remaining 30% of the data formed the validation dataset. The performance of model fitting was evaluated based on the root mean square error (RMSE), adjusted coefficient of determination (adj. R^{2}), Akaike’s information criterion (AIC), and Schwarz’s Bayesian information criterion (BIC; Akaike 1974; Schwarz 1978). Models with small values of RMSE, AIC, and BIC were preferred, and a large adj. R^{2} value indicated a good model fit. The predictive capability of each model was assessed on the validation dataset based on the mean squared prediction error (MSPR), and the model with the smallest MSPR value was selected as the best model (Jung and Hu 2015). Models identified by these criteria as having good predictive ability were recommended for application. The conservative isolation distance at various thresholds was estimated through 500 bootstrap simulations. The 95th percentile of the distances generated through the simulations was taken as the conservative isolation distance. All statistical analyses were performed using SAS 9.4 (SAS Institute Inc., Cary, NC, USA) and R v 3.4.0 (R Development Core Team 2017) software.
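The data split, the selection criteria, and the percentile bootstrap can be sketched as below. The analyses in this study were run in SAS and R; this Python sketch only illustrates the standard definitions. The resampling scheme and the `estimator` callback, which stands in for whatever procedure turns a dataset into an isolation distance, are assumptions that the text does not describe.

```python
import math
import random


def train_validation_split(records, train_fraction=0.7, seed=0):
    """Randomly split records into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def mspr(observed, predicted):
    """Mean squared prediction error on the validation data."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error of the fitted values."""
    return math.sqrt(mspr(observed, predicted))


def aic(log_lik, n_params):
    """Akaike's information criterion."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik, n_params, n_obs):
    """Schwarz's Bayesian information criterion."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


def bootstrap_percentile(records, estimator, n_boot=500, q=0.95, seed=0):
    """q-th percentile of `estimator` over n_boot resamples with replacement.

    `estimator` is a hypothetical callback mapping a resampled dataset
    to a single number (for example, an estimated isolation distance).
    """
    rng = random.Random(seed)
    estimates = sorted(
        estimator([rng.choice(records) for _ in records]) for _ in range(n_boot)
    )
    index = min(n_boot - 1, math.ceil(q * n_boot) - 1)
    return estimates[index]


# Hypothetical usage with placeholder data and a placeholder estimator.
data = list(range(1, 11))
train, valid = train_validation_split(data)
upper = bootstrap_percentile(data, estimator=lambda s: sum(s) / len(s))
print(len(train), len(valid), upper)
```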