    An Improved Density-Based Approach to Spatio-Textual Clustering on Social Media

    Name:
    08658072.pdf
    Size:
    2.031Mb
    Format:
    PDF
    Description:
    Published version
    Type
    Article
    Authors
    Nguyen, Minh D.
    Shin, Won-Yong
    KAUST Department
    Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
    Date
    2019-03-04
    Online Publication Date
    2019-03-04
    Print Publication Date
    2019
    Permanent link to this record
    http://hdl.handle.net/10754/631755
    
    Abstract
    Density-based spatial clustering of applications with noise (DBSCAN) is the most commonly used density-based clustering algorithm, but it may not be sufficient when the input data are heterogeneous in terms of textual description. When we aim to discover clusters of geo-tagged records relevant to a particular point of interest (POI) on social media, examining only one type of input data (e.g., the tweets relevant to a POI) may draw an incomplete picture of the clusters due to noisy regions. To overcome this problem, we introduce DBSTexC, a newly defined density-based clustering algorithm that uses spatio-textual information on social media (e.g., Twitter). We first characterize the POI-relevant and POI-irrelevant geo-tagged tweets as the texts that include and do not include a POI name or its semantically coherent variations, respectively. By leveraging the proportion of POI-relevant and POI-irrelevant tweets, the proposed algorithm demonstrates much higher clustering performance than DBSCAN in terms of F1 score and its variants. While DBSTexC performs identically to DBSCAN on textually homogeneous inputs, it far outperforms DBSCAN on textually heterogeneous inputs. To further improve the clustering quality by fully capturing the geographic distribution of geo-tagged points, we present fuzzy DBSTexC (F-DBSTexC), an extension of DBSTexC that incorporates the notion of fuzzy clustering. We then demonstrate the consistent superiority of F-DBSTexC over the original DBSTexC via intensive experiments. The computational complexity of our algorithms is also shown analytically and numerically.
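The idea in the abstract of splitting tweets into POI-relevant and POI-irrelevant sets and using both in the density test can be illustrated with a small sketch. This is not the authors' implementation: the function names, the substring-based relevance check, the planar Euclidean distance, and the core-point rule (at least `min_relevant` relevant neighbors and at most `max_irrelevant` irrelevant neighbors within `eps`) are all assumptions made here for illustration only.

```python
import math


def is_poi_relevant(text, poi_variants):
    """Assumed relevance rule: the text mentions any POI name variant."""
    lowered = text.lower()
    return any(variant.lower() in lowered for variant in poi_variants)


def neighbor_counts(center, points, relevant_flags, eps):
    """Count relevant and irrelevant points within distance eps of center.

    Uses planar Euclidean distance (an assumption; real geo-tagged data
    would more likely call for a great-circle distance).
    """
    n_relevant = 0
    n_irrelevant = 0
    for (x, y), relevant in zip(points, relevant_flags):
        if math.hypot(x - center[0], y - center[1]) <= eps:
            if relevant:
                n_relevant += 1
            else:
                n_irrelevant += 1
    return n_relevant, n_irrelevant


def is_core(center, points, relevant_flags, eps, min_relevant, max_irrelevant):
    """Hypothetical core-point test combining both tweet types."""
    n_relevant, n_irrelevant = neighbor_counts(center, points, relevant_flags, eps)
    return n_relevant >= min_relevant and n_irrelevant <= max_irrelevant


# Toy data: (coordinates, tweet text). Entirely made up for the example.
tweets = [
    ((0.0, 0.0), "Sunset at the Eiffel Tower"),
    ((0.1, 0.0), "eiffel tower selfie"),
    ((0.0, 0.1), "Tour Eiffel at night"),
    ((0.05, 0.05), "stuck in traffic"),
    ((5.0, 5.0), "Eiffel Tower postcard"),
]
poi_variants = ["Eiffel Tower", "Tour Eiffel"]

points = [coords for coords, _ in tweets]
flags = [is_poi_relevant(text, poi_variants) for _, text in tweets]

dense_and_clean = is_core((0.0, 0.0), points, flags, 0.2, 3, 1)
too_noisy = is_core((0.0, 0.0), points, flags, 0.2, 3, 0)
isolated = is_core((5.0, 5.0), points, flags, 0.2, 3, 1)
print(flags, dense_and_clean, too_noisy, isolated)
```

Under this assumed rule, an input with no POI-irrelevant tweets always satisfies the `max_irrelevant` bound, so the test reduces to the usual DBSCAN minimum-points check, which is consistent with the abstract's remark about textually homogeneous inputs.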
    Citation
    Nguyen MD, Shin W-Y (2019) An Improved Density-Based Approach to Spatio-Textual Clustering on Social Media. IEEE Access 7: 27217–27230. Available: http://dx.doi.org/10.1109/access.2019.2896934.
    Sponsors
    This paper was presented in part at the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Sydney, Australia, July/August 2017. This paper has been significantly extended based on the prior work [41].
    Publisher
    Institute of Electrical and Electronics Engineers (IEEE)
    Journal
    IEEE Access
    DOI
    10.1109/access.2019.2896934
    Additional Links
    https://ieeexplore.ieee.org/document/8658072
    Collections
    Articles; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
