Semi-automatic Data Fusion Method for Spatial Datasets

Jong-chan Yoon, Han-joon Kim

Abstract


With the development of big data-related technologies, it has become possible to process vast amounts of data that could not be processed before. Accordingly, establishing an automated data selection and fusion process has become a necessity rather than an option for realizing big data-based services. In this paper, we propose an automation technique that creates meaningful new information by fusing datasets containing spatial information. First, the given datasets are embedded using the Node2Vec model and the keywords of each dataset. Then, the semantic similarity between every pair of datasets is obtained by calculating the cosine similarity of their embedding vectors. Next, a person intervenes to select candidate datasets that have one or more spatial identifiers from among the dataset pairs with relatively high similarity, and the selected pairs are fused and visualized. Through this semi-automatic data fusion process, we show that significant fused information that cannot be obtained from a single dataset can be generated.
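
The similarity and fusion steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the embedding vectors are made-up stand-ins for Node2Vec output, the dataset names, the `district` spatial identifier, and the helper functions `rank_dataset_pairs` and `fuse_on_spatial_id` are all hypothetical, and the human selection step is reduced to simply taking the top-ranked pair.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_dataset_pairs(embeddings):
    """Return (name_a, name_b, similarity) for every dataset pair, most similar first."""
    scored = [
        (a, b, cosine_similarity(embeddings[a], embeddings[b]))
        for a, b in combinations(sorted(embeddings), 2)
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored


def fuse_on_spatial_id(left_rows, right_rows, key):
    """Inner-join two lists of records on a shared spatial identifier column."""
    right_by_key = {row[key]: row for row in right_rows}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left_rows
        if row[key] in right_by_key
    ]


# Assumption: toy 3-dimensional vectors standing in for Node2Vec dataset embeddings.
embeddings = {
    "bus_stops": [0.9, 0.1, 0.0],
    "subway_stations": [0.8, 0.2, 0.1],
    "air_quality": [0.0, 0.2, 0.9],
}
ranked = rank_dataset_pairs(embeddings)

# Assumption: both candidate datasets carry a hypothetical "district" identifier.
bus_rows = [{"district": "A", "bus_stops": 12}, {"district": "B", "bus_stops": 7}]
subway_rows = [{"district": "A", "stations": 3}]
fused = fuse_on_spatial_id(bus_rows, subway_rows, key="district")
```

In this toy input, the two transit datasets rank as the most similar pair, and the join keeps only the district present in both. In the process the abstract describes, a person would review the ranked list and decide which high-similarity pairs sharing a spatial identifier are worth fusing and visualizing.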




