Demonstration on Extending The Pageview Feature to Page Section Based: Towards Identifying Reading Patterns of Users
Demonstration on extending_the_pageview_feature_to_page_section_based_presentation from Fajar Purnama
Fajar Purnama, Alvin Fungai, Tsuyoshi Usagawa.
The invention of the Internet had greatly change how people access, share, and store information. Before, people have to take a long walk to the library to read books, and attend seminars to learn from others. Today many people shares information on the Internet which is practically accessible almost anywhere at anytime with all each person needed was a computer device connected to the Internet. Not long afterward, online analytics or another name web analytics was introduced where we can see what visitors do on our webpage. One of an online analytic is pageview that can show the number, the duration, and variation of people reading the webpage. Conventionally, it can be seen how popular a book is by surveying how many copies were sold and how many have read them, but through online there is no longer restriction to place and time, and with the help of computers it is possible to get data that could determine the behavior of the readers.
The field of education was also influenced by these technologies that sparks popular terms such as e-learning, online course, and learning analytics. Our colleges implemented e-learning on their respective universities and their research   was able to determine whether the lectures and students were ready to use it, and their data can suggest the next course of action in order for the university to fully utilize e-learning. Another research in  studies about online discussion forums where they state that the design of the online course determines how active the students will be. Their data shows that assignments and quizzes decreases idleness and lurkers, and increases more active students. A research in  distinguished the learning patterns of those who fails and passes an online course, for example what materials students read, what exercises they attempted, and how often they perform online discussion before attempting an online exam.
Their research contributes to the quality of education, but there are still limits of how far their data can envision since they consist only of contents viewed, discussion posted, assignments submitted, quizzes attempted, and scores which cannot determine their detailed reading patterns. When reading a person can either read everything in detail, skim through the page, or semi-skim reading only the headlines then read in detail if that person finds it interesting. A common a example is when researcher browse to publish manuscripts where they first only reads the title, then the abstract if the title is eye catching, next skim through the manuscript by reading the introduction, figures, and conclusion if the abstract is interesting, finally they read in detail if they find it important to them. Simply the answer of what, when, and where the contents are viewed is in hand, but how the contents are viewed still remains a question. To answer this question it is necessary to extend the pageview feature alike by extending its monitoring to as far as each section of the page/contents.
One of the very basic that all users on the Internet know is browser history which hold records of the date and time of websites that they visited. Software developers developed more advance tools. A Chrome Browser plugin called Timestats  shows statistical and timeline analysis of the websites users visited showing their browsing behaviors such as how long they have been browsing for the day and which website was frequently visited presented in graph and pie chart alike. Relic Browser  was able to record user clicks and scrolls, and if desired, typings on keyboard which all of them can show how users responded to a webpage. Another application called CA App Real Browser Monitoring  which can playback some browsing based on the events recorded, this is very close to what this work wants to achieve.
On another side there are applications developed by researchers from learning analytics field. One of our colleges   developed an open textbook analytic tool that can record studentsâ actions for example page flipping, bookmarks, links clicked, notes created, and time spent on chapters and pages,  is also quite related but on e-magazines stating that it is able to show reading habits. On our perspective they desired to achieve a section based monitoring similar to this work but they used pages like on electronic textbook to divide contents into sections, however this work intends to monitor deeper than that. The closest and might be better to this work is Finger Trail Learning System (FTLS)  that records the mouse cursors trails. However it forces the users to trail on each characters on a text in order for it to appear and be highlighted, while this work does not force the users to do such but just browse normally. The authors made an introductory work on  but only highlight the main idea, on this work a detailed discussion will be made.
TThe web application architecture can be seen on Fig. 1 which consists of representation interface, web application programming interface (API), and a database. The representation interface tracks the reading pattern of a user on this case the date and duration on a section which is state of the art of this work. The web API is quite common that retrieves the data recorded by the representation interface, stores it on the database, and may process the data to show the statistics. Together they form a complete web application which the structure is quite standard and simple but may require additional feature depending on the implementation. Next subsections explains in details the method of the representation interface and the web API but the full source code can only be viewed on the link .
Listing I Concept of Tracker Code
Listing II Tracker Code Insertion Algorithm
On the introductory work  was to insert the tracker code manually as on Listing I which will work on any webpage, while this work introduces a method to insert the tracking code dynamically using document object model (DOM) HTML with the algorithm shown on Listing II, though it is only tested on a simple HTML page which contains no other than âhtmlâ, âheadâ, âbodyâ, âh1â, âpâ, and âbrâ tags. Most HTML today is more complicated and may need extra method for it to work.
The tracking code is all on Listing I except for line 9 and the goal is to insert them automatically. For a simple HTML the body usually contains nodes/tags âh1â for heading, âpâ for paragraph, âdivâ for sections, and etc. First, one of them have to be chosen as a section identifier called a section node. Second create a tracker code as many as the existing section node (on the algorithm is one additional because the starting node is usually not a section node). Third put the respective section node and its following nodes into their respective tracker code (the first section node and its following until the second section node into the first tracking code, then the second section node and its following until the third section node into the second tracking code, and so on).
The web APIâs job is to retrieve the tracking data (date accessed and time spent) of a section by the representation interface and store it into the database. Since capturing the tracking data uses a client side programming language, it is still stored on the client side, and therefore the client must handover the data to the server. The âformâ tag Listing I is to submit the data to the server, and the server must be prepared with a server side programming language which on this case Java is used. The first thing the program needs to do is to be able to read the parameter values âdateâ and âdurationâ from Listing I line 6 and line 7, and store it in a variable. The second thing is to make a connection to a database server, create a database if it is not created already, create a table, and store the values into the table. For the database server uses MySQL on this work. Afterwards another web API can be created to mine this data, present it like the ones on  which is beyond the scope of this work.
The page will submit the six data based on the six boxes everytime the mouse cursor leaves the section to a web API. The web API stores it on a database, and another simple web API can be created to show the data. An example output can seen on Fig. 4 where a raw output is shown from the databaseâs table. From the data can be said that the user starts reading from the first section of the context for 6 seconds, then moves to the next section and read there for 10 seconds, then moves on again. Normally when the duration is short, the user is skimming through the page, but when the duration is long, the user is focused on that section. Compared to using page view this data can show more information about userâs reading patterns.
Conclusion and Future Work
The web application was able to demonstrate recording the date accessed and time spent on a section of an HTML page by a user which can be used as an extension to pageview feature. The data obtained from the web application was able to show more reading pattern information and other researchers on related fields may use this on their analysis and may find new answers.
Though the concepts were shown on this work but were not implemented yet on a real situation/webpage which leaves this for a future work. Further evaluation issues such as appearance, compatibility, and resource consumption issues still needs to be addressed.
Part of this work was supported by JSPS KAKENHI Grant-in-Aid for Scientific Research 25280124 and 15H02795.