Academic Conference Categorization According to Subjects Using Topical Information Extraction from Conference Websites

Sue Kyoung Lee, Kwanho Kim

Abstract


Recently, the amount of academic conference information on the Internet has rapidly increased, and automatically classifying this information by research subject enables researchers to find relevant conferences efficiently. Information provided by most conference listing services is limited to title, date, location, and website URL. However, among these features, the title is the only one containing topical words, which causes an information insufficiency problem. Therefore, we propose methods that aim to resolve the information insufficiency problem by utilizing web contents. Specifically, the proposed methods extract the main contents from an HTML document collected by using a website URL. Based on the similarity between the title of a conference and its main contents, topical keywords are selected to reinforce the important keywords among the main contents. Experimental results on a real-world dataset showed that using additional information extracted from the conference websites improves conference classification performance. We plan to further improve the accuracy of conference classification by considering the structure of websites.
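As a rough illustration of title-guided keyword selection, the following Python sketch weights each term in a page's main content by how similar its sentence is to the conference title. This is not the authors' procedure, which the abstract does not specify: the sentence-level cosine weighting, the small stopword list, and the names `tokenize`, `cosine`, and `topical_keywords` are all assumptions made for this example. The main content is assumed to have been extracted from the HTML already and is passed in as plain text.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "on", "in", "for", "to",
             "is", "are", "with", "at", "will", "be", "by"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def topical_keywords(title, main_content, k=5):
    """Return up to k content terms, ranked by title-similarity-weighted score.

    Each sentence is compared with the title; every term in the sentence
    accumulates that sentence's similarity. Terms that only occur in
    sentences unrelated to the title score zero and are dropped.
    """
    title_vec = Counter(tokenize(title))
    scores = Counter()
    for sentence in re.split(r"[.!?\n]+", main_content):
        tokens = tokenize(sentence)
        if not tokens:
            continue
        sim = cosine(title_vec, Counter(tokens))
        for tok in tokens:
            scores[tok] += sim
    return [term for term, score in scores.most_common(k) if score > 0]


if __name__ == "__main__":
    title = "International Conference on Machine Learning"
    content = ("The conference covers machine learning and neural networks. "
               "Registration opens in May. "
               "Hotel rooms are available near the venue.")
    print(topical_keywords(title, content, k=10))
```

In this toy input, terms such as "neural" and "networks" are kept because they share a sentence with title words, while logistics terms such as "hotel" and "registration" are dropped. The selected keywords could then be appended to the title features before training a text classifier.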




