A machine learning approach to the classification of phishing bot accounts within Twitter

Christopher Brake

Abstract


Social network bots are an increasing threat to online users. Most studies have focused on bots that post large volumes of tweets, known as spam, as these are very common. In recent years, research into bots on Twitter has used machine learning to find patterns in these accounts that aid detection. However, little research has focused on a subset of Twitter bots involved in phishing campaigns, which tweet very little in order to avoid detection. In this project, an application was developed that combines a variety of commercial tools with machine learning theory, allowing a user to collect public Twitter data and analyse it with a neural network. The project aims to find patterns in the properties of these phishing bots and to use the collected data to train a neural network to recognise these patterns and detect bots. A Twitter crawler was developed that harvests data from the Twitter API and stores it in a graph database. A pre-processor module then formats and normalises the data before it is fed into the neural network. The neural network evaluates the data and makes predictions based on what it has previously learnt; these predictions are then displayed as a graph in the browser. Experimental results show a pattern in account properties; in particular, tests showed a correlation in the friend-to-follower ratio of bot accounts. Using this pattern together with other account properties, a neural network was trained to detect bot accounts, and tests showed it making predictions for an account with an accuracy of 92%. Although these results are still experimental, the project has shown that it is possible to detect bots on Twitter using only the properties of an account.
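The pre-processing and classification steps described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. The feature set (friend-to-follower ratio, tweet count, follower count), the record keys `friends_count`, `followers_count` and `statuses_count` (modelled on the Twitter user object), min-max scaling as the normalisation method, and a single sigmoid neuron (logistic regression) standing in for the project's neural network are all assumptions chosen for illustration. The toy accounts are invented and have no bearing on the reported 92% figure.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch only: feature choice, scaling method and model are assumptions,
# not the project's actual pipeline.
Bounds = List[Tuple[float, float]]


def friend_follower_ratio(friends: int, followers: int) -> float:
    """Accounts followed divided by followers; max(..., 1) avoids division by zero."""
    return friends / max(followers, 1)


def extract_features(account: Dict[str, int]) -> List[float]:
    """Map a raw account record (keys modelled on the Twitter user object) to features."""
    return [
        friend_follower_ratio(account["friends_count"], account["followers_count"]),
        float(account["statuses_count"]),
        float(account["followers_count"]),
    ]


def fit_bounds(rows: Sequence[Sequence[float]]) -> Bounds:
    """Record the (min, max) of each feature column in the training data."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row: Sequence[float], bounds: Bounds) -> List[float]:
    """Min-max normalise one row using training bounds (constant columns map to 0)."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Output of a single sigmoid neuron: estimated probability that x is a bot."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(rows, labels, epochs: int = 2000, lr: float = 0.5):
    """Fit the neuron with batch gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = score(weights, bias, x) - y  # d(log-loss)/dz for a sigmoid output
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x, threshold: float = 0.5) -> int:
    """Return 1 (bot) if the neuron's output exceeds the threshold, else 0 (human)."""
    return int(score(weights, bias, x) > threshold)


def accuracy(weights, bias, rows, labels) -> float:
    """Fraction of rows whose prediction matches the label."""
    correct = sum(predict(weights, bias, x) == y for x, y in zip(rows, labels))
    return correct / len(labels)


# Invented toy accounts: bots follow many accounts, have few followers and tweet rarely.
accounts = [
    ({"friends_count": 2000, "followers_count": 10, "statuses_count": 3}, 1),
    ({"friends_count": 1500, "followers_count": 15, "statuses_count": 5}, 1),
    ({"friends_count": 1800, "followers_count": 12, "statuses_count": 1}, 1),
    ({"friends_count": 300, "followers_count": 400, "statuses_count": 2500}, 0),
    ({"friends_count": 150, "followers_count": 200, "statuses_count": 1500}, 0),
    ({"friends_count": 500, "followers_count": 450, "statuses_count": 4000}, 0),
]
raw = [extract_features(a) for a, _ in accounts]
labels = [label for _, label in accounts]
bounds = fit_bounds(raw)
rows = [scale(r, bounds) for r in raw]
weights, bias = train(rows, labels)
print("training accuracy:", accuracy(weights, bias, rows, labels))
```

Fitting the scaling bounds on the training data and reusing them through `scale` for unseen accounts keeps new inputs on the same scale the neuron was trained on. A fuller evaluation would hold out a separate test set before reporting an accuracy figure such as the one quoted above.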


This work is licensed under a Creative Commons Attribution 3.0 License

ISSN 1754-2383 [Online] ©University of Plymouth