Rsync and Rdiff Implementation on Moodle's Backup and Restore Feature for Course Synchronization over The Network
Rdiff and Rsync Implementation on Moodle's Backup and Restore Feature of Course Synchronization over The Network Presentation from Fajar Purnama
E-learning has been widely implemented in education systems. Most higher education institutions have adopted Learning Management Systems (LMSs) to manage their online courses, with Moodle among the most favored. However, creating a well-designed and well-written course remains difficult for teachers, which is why the community encourages them to share their courses for others to reuse. Authors then continuously revise their courses, which forces subscribers to download the whole course again and soon leads to excessive network usage. To cope with this issue, a synchronization model for course backup files is proposed that retrieves only the differential updates. This paper proposes synchronization built on Moodle's existing backup and restore features: file synchronization is performed between course backup files based on the rsync algorithm. The experiment was conducted on a virtual machine, a local network, and a public network. The results showed lower network traffic than the conventional sharing method, just like our previous synchronization method. Unlike the previous one, however, this method has two additional advantages: flexibility to control the synchronization content and compatibility with all versions of Moodle.
It is very common today to deliver education using electronic devices, referred to as e-learning. Advanced application systems for managing e-learning, known as LMSs, are widely used in higher education. Modular Object-Oriented Dynamic Learning Environment (Moodle) is one of the most popular and preferred LMSs for delivering courses online. Many higher education institutions in the home country of one of the authors have implemented Moodle as their LMS  and discussed the problems faced by the country's students. The authors of  investigated the readiness of e-learning implementation at Sam Ratulangi University. The implementation of mobile learning on a GPRS network was assessed in . With so much research on e-learning under way, more universities are likely to implement e-learning soon. Students benefit from the added flexibility: with just a computer and an Internet connection they can take these online courses without being limited by place and time. It is also flexible for teachers, who can prepare their courses beforehand and give feedback to students in their own time.
However, designing and writing good content is not easy. It takes experience and time to produce a well-designed and well-written course, and some specialized content can only be written correctly by professors. For this reason Moodle encourages course sharing, as stated in . Many other sites also provide course backups deployable on Moodle. Over time another problem emerged: constant revision inevitably occurs while a course is being perfected. In addition, with today's multimedia technologies a course creator might add videos to a course, so very large course backup files are common. The problem becomes more serious because the survey in  of 10 universities in Indonesia shows that Internet connectivity is one of the major obstacles to implementing e-learning.
To overcome the problems of constant course revision and limited Internet connectivity, the work in  proposed course content synchronization. With this method there is no need to re-download the whole course whenever it is revised; only the revised part is retrieved. That application was created for Moodle version 1.9, so another one compatible with later versions of Moodle had to be developed as the follow-up work in . Those previous methods convert the course's database and directories into blocks and calculate the difference remotely between the outdated and the latest course. In other words, the previous applications also handle the export and import of courses. This leads to an issue: a new application must be created every time the structure of Moodle changes.
Moodle already has a course backup and restore feature, so it is better to let Moodle handle that part and to focus only on synchronization. This leads to an application compatible with all versions of Moodle. The existing feature also provides more flexibility in choosing which contents are synchronized. This paper therefore proposes file synchronization between course backup archives based on the rsync algorithm, which can calculate the difference between files remotely. Figure 1 shows the general framework of the proposed method, in which only a reference of the outdated backup archive needs to be sent and is then used to create a patch. The objective of this research is thus to develop a course synchronization application that is compatible with all versions of Moodle.
The introduction of the term massive open online course (MOOC) marked the point at which many online courses became open on the web and allowed unlimited participants. In Moodle's case this is the Teaching with Moodle MOOC  run by Moodle HQ. Thousands of educators from around the globe have taken this MOOC and been introduced to Moodle both as users and as course creators. It still runs periodically today. Participants are encouraged to share their courses on . On that website visitors can try online courses or download them in .mbz format, which is the output of Moodle's course backup and restore feature. It is not the only website that offers online course sharing.
The authors of  wanted to implement a distributed LMS for higher education institutions in Indonesia, but distributing courses with their proposed method was not entirely feasible because of band-limited, low-capacity Internet connections. Meeting an educational curriculum requires continuous and countless revisions of online courses. Each revision forces the courses to be redistributed, which heavily burdens the network capacity.
In the general framework of the previous synchronization method, both the master and the slave LMS hold a Moodle table and a synchronization table, the latter being a conversion of the Moodle table into blocks containing sets of ID, hash, and version information. Synchronization occurs between these two synchronization tables. First, version matching takes place. If the slave side is outdated, block matching follows. If new information exists on the master LMS, it is added to the slave LMS and the instruction is marked as "append". If information on the slave LMS does not exist on the master LMS, it is deleted and the instruction is marked as "delete". Finally, if information exists on both sides but with a different mapping, the instruction is marked as "update". Overall the synchronization has three main steps. The same applies to the course's directory as well as to the database. Based on that algorithm a standalone application compatible with Moodle version 1.9 was written in PHP. The experiment was conducted between Institut Teknologi Sepuluh Nopember (ITS) Surabaya, Indonesia, and Kumamoto University, Kyushu, Japan, and showed low network traffic usage.
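The version matching and block matching described above can be sketched as follows. This is only an illustration in Python, not the previous PHP application: the function name `plan_sync`, the dictionary layout (block ID mapped to hash), and the sample IDs and hashes are all hypothetical.

```python
def plan_sync(master, slave, master_version, slave_version):
    """Return (instruction, block_id) pairs that bring the slave table up to date.

    `master` and `slave` are hypothetical synchronization tables: block ID -> hash.
    """
    # Version matching: an up-to-date slave needs no block matching at all.
    if slave_version >= master_version:
        return []
    instructions = []
    for block_id, block_hash in master.items():
        if block_id not in slave:
            instructions.append(("append", block_id))   # new on the master
        elif slave[block_id] != block_hash:
            instructions.append(("update", block_id))   # present on both, content differs
    for block_id in slave:
        if block_id not in master:
            instructions.append(("delete", block_id))   # no longer on the master
    return instructions


master_table = {"quiz1": "a1", "forum1": "b2", "lesson1": "c9"}
slave_table = {"quiz1": "a1", "forum1": "b0", "page1": "d4"}
print(plan_sync(master_table, slave_table, master_version=2, slave_version=1))
# [('update', 'forum1'), ('append', 'lesson1'), ('delete', 'page1')]
```

Only the blocks named in the instructions would then need to cross the network, which is where the traffic reduction comes from.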
The courses are shared as backup archives in .mbz format, and our method applies remote file synchronization to the transmission process by utilizing the rsync algorithm. A common file patching system needs both files, i.e. the unrevised file and the revised file, on the same system in order to create a patch for the previous version. Rsync, uniquely, can perform this remotely. Suppose that there are two LMSs, one on the master side and the other on the slave side. The master side has the latest backup file α while the slave side has the outdated backup file β. Based on  it is possible to update β to the latest revision α with the following steps: (1) the slave side splits β into a series of non-overlapping fixed-size blocks, where the last block may be smaller than the others, (2) two checksums, a weak “rolling” 32-bit checksum and a strong 128-bit MD4 checksum, are calculated for every block in β, (3) the checksums are sent to the master side, (4) the master side searches α to find all blocks at any offset that have the same weak and strong checksums as one of the blocks of β, and (5) the master side sends the slave side a sequence of instructions for constructing a copy of α, each of which is either a reference to a block of β or data from α that does not match any block of β.
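Steps (1) to (5) can be illustrated with a small in-memory sketch. It is not the rsync or librsync implementation, and it makes several simplifying assumptions: MD5 from Python's standard library stands in for MD4, the weak checksum is an Adler-style sum recomputed at every offset instead of being rolled, and the function names and the tiny block size are chosen only for illustration.

```python
import hashlib


def weak_checksum(block: bytes) -> int:
    # Adler-style 32-bit checksum; recomputed from scratch here for clarity,
    # whereas a real implementation rolls it from one offset to the next.
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) & 0xFFFF
    return (b << 16) | a


def strong_checksum(block: bytes) -> bytes:
    return hashlib.md5(block).digest()  # assumption: MD5 as a stand-in for MD4


def make_signature(beta: bytes, size: int) -> dict:
    # Steps 1-2 (slave): split beta into fixed-size blocks and checksum each one.
    table = {}
    for start in range(0, len(beta), size):
        block = beta[start:start + size]
        entry = (strong_checksum(block), start // size)
        table.setdefault(weak_checksum(block), []).append(entry)
    return table


def make_delta(alpha: bytes, table: dict, size: int) -> list:
    # Steps 4-5 (master): search alpha at every offset for blocks known to the slave.
    ops, literal, pos = [], bytearray(), 0
    while pos < len(alpha):
        window = alpha[pos:pos + size]
        candidates = table.get(weak_checksum(window), [])
        strong = strong_checksum(window) if candidates else None
        match = next((i for s, i in candidates if s == strong), None)
        if match is None:
            literal.append(alpha[pos])      # unmatched byte is sent as data
            pos += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))     # reference to a block of beta
            pos += len(window)
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(beta: bytes, ops: list, size: int) -> bytes:
    # Slave: rebuild alpha from its own blocks plus the data sent by the master.
    out = bytearray()
    for kind, value in ops:
        out += beta[value * size:(value + 1) * size] if kind == "copy" else value
    return bytes(out)


beta = b"The quick brown fox jumps"          # outdated file on the slave
alpha = b"The quick red fox jumps over"      # latest file on the master
ops = make_delta(alpha, make_signature(beta, 4), 4)
assert apply_delta(beta, ops, 4) == alpha
```

Only the signature table travels from slave to master and only `ops` travels back, so unchanged blocks are never retransmitted.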
Rsync is also the name of an application already installed in most Linux distributions. Its manual page  describes it as a fast, extraordinarily versatile file copying tool that can replace conventional copying because it sends not the whole file but only the differences from the existing file. In this paper, though, we use rdiff, an application that generates the difference between two binary files based on the rsync algorithm. It is essentially an rsync implementation that gives more control than the existing rsync application. Rdiff is part of the librsync package . Another application used is rdiffdir, since the course's backup file is an archive. Rdiffdir is the directory synchronization version of rdiff and is included in the duplicity package .
Backup and Restore Feature
Moodle has a course backup and restore feature that backs up a course into .mbz format. Privileged users are given almost full control over what to back up from the course, ranging from whether to include users, anonymized users, or no users at all, to backing up the full content or only certain parts of it. This is shown in the menu screenshot in Figure 2 and in Figure 6, which is also our course design and shows the ability to choose certain sections to back up. The restore feature provides the same menu. According to Moodle's documentation  it is also possible to alter the backup file for advanced use.
As stated in the previous section, the experiments use rdiff rather than rsync directly because sharing course backups over an rsync daemon or SSH is still uncommon, whereas sharing over the Hypertext Transfer Protocol (HTTP) is very common. The slave side generates a signature file of its course backup archive and sends it to the master. The master side uses the received signature file and its own course backup archive to compute the delta file, which serves as a patch file for the slave side's course backup archive. The master side returns the delta file to the slave side, and the slave side applies it to generate the latest version of the course backup archive. The overall process is illustrated in Figure 3.
Two kinds of synchronization are demonstrated. One synchronizes the backup archive directly using rdiff, and the other synchronizes each file inside the backup archive recursively using rdiffdir. Unlike the first, which is purely binary file synchronization between the master's and the slave's course backup archives, the second is closer to course synchronization. The contents of the course's backup archive can be seen in Figure 4. The "activity" folder contains forums, lessons, quizzes, and the like. The "course" folder contains mostly the course's settings. The "files" folder contains materials uploaded for the course, and the "section" folder defines each section of the course. Rdiffdir recursively performs rdiff on those files. The result of rdiffdir is shown in Figure 5, where the differences for each file reside in the "diffs" folder, files newly added on the master side in the "snapshots" folder, and instructions to delete files that were deleted on the master side in the "deleted" folder.
The experiment uses a course developed by the main author in Moodle version 3.0 as material; it has three large sections (topics), as seen in Figure 6. We also made the course available on , accessible by logging in with username "teacher" and password "teacher". The experiment has seven scenarios, of which scenario 1 runs without synchronization and the others with synchronization: (1) retrieving the whole course backup file (conventional sharing), (2) large content addition on the master side (the slave side has only 1 section), (3) medium content addition on the master side (the slave side has 2 sections), (4) small content addition on the master side (adding a URL module), (5) small change on the master side (modifying text in one of the course outline modules), (6) section order change on the master side (section 2 shifts to section 1, section 3 shifts to section 2, and section 1 shifts to section 3), and (7) no change on the master side. The scenarios are conducted in 3 situations: (a) local machine and virtual machine, (b) local area network (LAN), and (c) public network on . The local machine acts as the slave side while the other machine acts as the master side. Very simple PHP scripts were written to perform the synchronization as illustrated in Figure 3. The total sent and received traffic is then measured using the packet capture tool Wireshark, as discussed in the next section.
The first subsection, Demonstration, shows that the developed application utilizes the output of Moodle's course backup and restore feature. Unlike the previous applications in  and , it is not responsible for exporting and importing courses but relies on Moodle's internal feature. This makes this paper's synchronization application compatible with existing and upcoming versions of Moodle. The second subsection, Measurement Result, shows that the application functions as a synchronizer like the previous applications in  and  by demonstrating network efficiency during transmission.
We made the PHP scripts available on . The first draft gives users on both the master and the slave side a feature to upload their own course backup archive in .mbz format. What information the backup archive contains depends on the options chosen in Moodle's backup and restore feature. We use a common PHP file upload script of the kind found in many tutorials on the web, except that for this experiment the file is automatically renamed to "backup.mbz". The demonstration shown in this section is for scenario 2. Figure 7 shows the console on both the master and the slave LMS used to initially upload their course backups. As seen on the slave side, the outdated "backup.mbz" file has a size of around 16 MB and contains only the first section of the course in Figure 6 (a).
The next step is to click the update button. The update button triggers instructions to generate a "backup.mbz.sig" signature file from the "backup.mbz" archive using the rdiff command and then to send "backup.mbz.sig" to the master LMS URL stated in the script, which is written with PHP's cURL functions. The script that accepts the file on the master LMS (the same common PHP upload script) then runs an extra instruction to generate a delta (patch) file with "backup.mbz.sig" and the master side's "backup.mbz" as inputs. The next step is to send the generated patch file "backup.mbz.delta" to the slave LMS. For that we invoke a script on the slave LMS, also written with PHP's cURL functions, that downloads "backup.mbz.delta". That script also contains instructions to back up the previous "backup.mbz" as "backup.mbz.backup" and to apply the patch with the rdiff command, updating "backup.mbz" with "backup.mbz.delta" as input. Finally, Figure 8 shows the updated "backup.mbz", which has a new file size of 30 MB and includes all contents seen in Figure 6. It also shows that "backup.mbz.sig" has a size of around 16 kB and "backup.mbz.delta" a size of around 23 MB. The overall process is then repeated for each scenario.
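The three rdiff invocations behind the update button can be sketched as below. This is a Python illustration, not the PHP scripts used in the experiment; the helper names are hypothetical, the HTTP transfer of the signature and delta files is omitted, and `update_slave` assumes that rdiff is installed and that the delta file has already been downloaded.

```python
import shutil
import subprocess


def signature_cmd(basis: str) -> list:
    # Slave side: rdiff signature <basis-file> <signature-file>
    return ["rdiff", "signature", basis, basis + ".sig"]


def delta_cmd(signature: str, latest: str, delta: str) -> list:
    # Master side: rdiff delta <signature-file> <new-file> <delta-file>
    return ["rdiff", "delta", signature, latest, delta]


def patch_cmd(basis: str, delta: str, output: str) -> list:
    # Slave side: rdiff patch <basis-file> <delta-file> <new-file>
    return ["rdiff", "patch", basis, delta, output]


def update_slave(archive: str = "backup.mbz") -> None:
    """Keep the outdated archive as <archive>.backup, then rebuild <archive> from it."""
    old = archive + ".backup"
    shutil.move(archive, old)
    subprocess.run(patch_cmd(old, archive + ".delta", archive), check=True)


print(" ".join(signature_cmd("backup.mbz")))
print(" ".join(delta_cmd("backup.mbz.sig", "backup.mbz", "backup.mbz.delta")))
print(" ".join(patch_cmd("backup.mbz.backup", "backup.mbz.delta", "backup.mbz")))
```

The first and third commands run on the slave and the second on the master, so the only files crossing the network are the signature and the delta.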
The second draft is similar to the first except that it implements rdiffdir. It shows a signature file of around 1.5 MB and a delta file of around 16 MB for scenario one. During the synchronization process the "backup.mbz" archives on both the master and the slave side are extracted into a folder named "backup". Starting on the slave side, rdiffdir recursively generates signatures for each file in "backup" and stores them as an archive "backup.sig". The "backup.sig" is then sent to the master side and used as a reference to recursively produce deltas for each file in the master side's "backup" folder, which are stored in an archive "backup.delta". Next, "backup.delta" is sent to the slave side and used to patch the "backup" folder, which is finally recompressed into an archive "backup.mbz".
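The directory-level variant follows the same three-step pattern. The sketch below only assembles the rdiffdir command lines in Python; the helper name `rdiffdir_steps` is hypothetical, the argument order reflects our reading of the rdiffdir manual page and should be checked against the installed duplicity version, and extracting and recompressing "backup.mbz" as well as the HTTP transfers are left out.

```python
def rdiffdir_steps(folder: str = "backup") -> dict:
    """Command lines for one synchronization round on an extracted backup folder."""
    sig = folder + ".sig"
    delta = folder + ".delta"
    return {
        # Slave: signatures for every file under the folder, stored as one archive.
        "slave_signature": ["rdiffdir", "signature", folder, sig],
        # Master: deltas of its own folder relative to the slave's signatures.
        "master_delta": ["rdiffdir", "delta", sig, folder, delta],
        # Slave: patch the extracted folder in place.
        "slave_patch": ["rdiffdir", "patch", folder, delta],
    }


for step, command in rdiffdir_steps().items():
    print(step, "->", " ".join(command))
```

Each entry can be passed to `subprocess.run(command, check=True)` on the side named in its key.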
The experiment was conducted by sending the signature file, which determines the outgoing network traffic, and retrieving the delta file, which determines the incoming network traffic.
The first experiment synchronizes the course's backup archive directly with rdiff (Figure 9), and the second experiment synchronizes each file contained within the course's backup archive with rdiffdir (Figure 10). The signature file produced was roughly 200 kB and the delta file around 20 MB. The first scenario (without synchronization) downloaded the whole course backup file, which had a size of around 30 MB, and the other scenarios (with synchronization) downloaded only the difference generated by rdiff. The overall result shows that the proposed method is more efficient than the conventional way (scenario 1). In this case the slave side consumes around 30 MB of traffic in total without synchronization and around 20 MB with synchronization, so the proposed method saves about 10 MB of network traffic. For scenarios 2 and 3 the outdated courses differ considerably from the latest course, and the results show that the method is very beneficial in this case. For scenarios 4, 5, and 6 the outdated courses differ very little from the latest course, yet the result shows around 20 MB of network consumption, which is very high for this case. This is because the synchronization is performed while both archives are still compressed.
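The saving quoted above can be checked with a few lines of arithmetic. The sketch uses only the approximate sizes stated in the text (30 MB full archive, roughly 0.2 MB signature, roughly 20 MB delta) and ignores protocol overhead; the function name `traffic_saving` is hypothetical.

```python
def traffic_saving(full_mb: float, signature_mb: float, delta_mb: float):
    """Return (MB saved, percent saved) of synchronization versus a full download."""
    synced_mb = signature_mb + delta_mb       # outgoing signature + incoming delta
    saved_mb = full_mb - synced_mb
    return saved_mb, 100.0 * saved_mb / full_mb


saved, percent = traffic_saving(full_mb=30.0, signature_mb=0.2, delta_mb=20.0)
print(f"saved about {saved:.1f} MB ({percent:.0f}% of the full download)")
# saved about 9.8 MB (33% of the full download)
```

The signature upload is small next to the delta, so the delta size dominates the total in this experiment.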
The second experiment, on the other hand, decompresses both archives and synchronizes each file within them, which is more accurate for course synchronization. Scenarios 4, 5, and 6 make only small changes to the course's contents, so the incoming network consumption is also small, around 1.5 MB. This is far more efficient than the first synchronization experiment, although the outgoing traffic increases because of the larger number of file signatures. Either way, both experiments give better results than transferring without synchronization. The last scenario shows very low traffic because the course's backup file on the slave side is already up to date with the master side, so no update is required. Since the measurement is based on outgoing and incoming traffic, it is reasonable that the public network shows slightly higher traffic than the virtual machine and local area network situations.
Conclusion and Future Work
Like the previous course synchronization method, the proposed method of applying rdiff and rsync to the backup archives on both the master and slave sides reduced the network consumption of course sharing with Moodle, and it has two further merits over the previous method. The first is the flexibility to configure which course contents are synchronized, and the second is time efficiency, since the application of the proposed method needs no adaptation when the version of Moodle changes. However, neither merit was fully demonstrated in this paper. Therefore, in the future we will further develop its compatibility and demonstrate it on all versions of Moodle and on other LMSs. The method also opens the possibility of developing partial course synchronization.
Part of this work was supported by JSPS KAKENHI Grant-in-Aid for Scientific Research 25280124 and 15H02795.