Abstract:
Flow in porous media involves many multi-scale, multi-variable, and multi-physics coupled nonlinear seepage problems, which makes it difficult both to characterize the complex flow mechanisms and to solve the mathematical models analytically. Complex mathematical models capture the key mechanics of fluid flow in porous media, but solving them requires a trade-off between computational cost and accuracy. In recent years, seepage proxy models built on various types of oilfield data have offered possible alternatives for efficiently solving multi-variable nonlinear flow problems. However, their application in oilfields is limited by small sample sizes caused by incomplete records and improper operation. This paper proposes a data-driven proxy model to predict cumulative oil production from multi-variable, small-sample oilfield data. A database for forecasting oil production is built through a series of data preprocessing steps, including filling in missing values, one-hot encoding of categorical data, and data standardization. The database is then randomly split into training and test data, and ten-fold cross-validation is applied to evaluate the error and accuracy of three data-driven models: Random Forest, Extreme Gradient Boosting, and artificial neural networks.
The results show that the coefficients of determination of the three data-driven models all exceed 0.8 and that the predictions agree well with the actual data. In addition, for small-sample multivariate oilfield data, the data preprocessing methods have a significant impact on the accuracy of the cumulative oil production prediction. Moreover, after data standardization, the Random Forest algorithm performs best (mean squared error of 0.12, coefficient of determination of 0.87), which makes it better suited to small-sample multivariate production forecasting.
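
The workflow summarized above (imputation, one-hot encoding, standardization, ten-fold cross-validation, and scoring by mean squared error and coefficient of determination) can be illustrated with a minimal, standard-library-only sketch. It is not the implementation used in this paper: all function names are hypothetical, mean imputation is assumed as the fill strategy, and the three models themselves are not implemented. Any regressor can be passed in through the assumed `fit` callback.

```python
import random
from statistics import mean, pstdev


def impute_mean(column):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def one_hot(column):
    """Encode a categorical column as 0/1 indicators, one per sorted category."""
    categories = sorted(set(column))
    return [[1.0 if v == c else 0.0 for c in categories] for v in column]


def standardize(column):
    """Z-score a numeric column using the population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma if sigma else 0.0 for v in column]


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and yield (train_idx, val_idx) pairs for k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination; undefined if y_true is constant."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validate(X, y, fit, k=10, seed=0):
    """Return fold-averaged (MSE, R^2); fit(X_train, y_train) must return a predict function."""
    scores = []
    for train, val in k_fold_indices(len(y), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        y_val = [y[i] for i in val]
        y_hat = [predict(X[i]) for i in val]
        scores.append((mse(y_val, y_hat), r2(y_val, y_hat)))
    return mean(s[0] for s in scores), mean(s[1] for s in scores)
```

With very small samples, a validation fold may contain only one well or identical targets, in which case the per-fold coefficient of determination is undefined; pooling predictions across folds before scoring is one way to avoid this.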