A Bayesian Network Approach to Identify Factors Affecting Learning of Additional Mathematics

Additional Mathematics is an elective subject in the Sijil Pelajaran Malaysia (SPM) examination. However, it is treated as a core subject for almost all science stream students. Many students who have performed well in Modern Mathematics since primary school cannot master Additional Mathematics because they fail to understand its concepts. This study seeks to identify the factors that affect students in mastering Additional Mathematics at five schools in an urban area. A Bayesian network is used to analyze the data and to identify the relationships between the factors, as it represents the variables as nodes and the relationships as directed arcs. Constraint-based and score-based algorithms are used to generate several networks, which are compared to identify the strong relationships among the factors that affect the students' learning of the subject. It is concluded that the new symbols and signs learned in Additional Mathematics affect the students in mastering the subject.


INTRODUCTION

BACKGROUND OF THE NATIONAL TYPE SECONDARY SCHOOL IN MALAYSIA
National Type Secondary Schools in Malaysia, also known as 'Sekolah Menengah Jenis Kebangsaan (SMJK)', are categorized as semi-government secondary schools. These schools use the same syllabus and system as the ordinary National Secondary Schools, known as 'Sekolah Menengah Kebangsaan (SMK)'. They are called semi-government schools because the government does not have full authority over them. Most National Type Secondary Schools in Malaysia have their own board of governors, which owns the school land, buildings, and facilities (Florence 2009). The government provides the teachers for the schools and textbook loans for the students. Other than that, most of the physical matters of the school are handled by the board.

ADDITIONAL MATHEMATICS IN SECONDARY SCHOOL
Additional Mathematics is taught to form four and form five students at the secondary school level. It is an elective subject, but most schools in Malaysia encourage their science stream students to take it. Some schools also offer Additional Mathematics to Accounting class students in the arts stream. Modern Mathematics is the core subject that must be taken by all students who sit for the SPM examination. Compared with Modern Mathematics, the syllabus and content of Additional Mathematics are much more complicated and demanding. There are many complaints from students, especially those who have just started form four and begun to learn the subject. Since the solutions to Additional Mathematics questions are set out in a longer form, the students conclude that it is a more difficult subject (Tan 2009).

BAYESIAN NETWORK
Bayesian networks belong to the family of probabilistic graphical models. These graphical structures are used to represent knowledge about an uncertain domain. Each node in the graph represents a random variable, and the edges between the nodes represent probabilistic dependencies among the corresponding random variables. These conditional dependencies are often estimated using known statistical and computational methods. Bayesian networks combine graph theory, probability theory, computer science and statistics (Scutari 2010). They have been used in many fields, from Online Analytical Processing (OLAP) performance enhancement (Margaritis 2003) to medical service performance analysis (Acid et al. 2004), gene analysis (Friedman 2000), and breast cancer prognosis and epidemiology (Holmes & Jain 2008).
A Bayesian network enables an effective representation and computation of the joint probability distribution over a set of random variables (Scutari 2010). Its structure is defined by two sets: the set of nodes (vertices) and the set of directed edges. The nodes represent random variables and are drawn as circles labeled with the names of the variables. The edges represent direct dependence among the variables and are drawn as arrows between nodes.
In this study, we explored the relationships between the factors affecting students in mastering Additional Mathematics at the secondary school level. The study focused on students in National Type Secondary Schools in an urban area.
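To make the joint probability representation concrete: a Bayesian network writes the joint distribution as the product of each node's conditional distribution given its parents. The minimal Python sketch below evaluates that product for a toy network. The node labels Q10, Q6 and Q11 are borrowed from the questionnaire purely for illustration; the arcs, the binary coding and every probability value are invented assumptions, not estimates from the survey data.

```python
from itertools import product

# Hypothetical structure: Q10 -> Q6 and Q10 -> Q11 (binary variables for brevity).
parents = {"Q10": [], "Q6": ["Q10"], "Q11": ["Q10"]}

# Made-up conditional probability tables: cpt[node][parent values][node value].
cpt = {
    "Q10": {(): {0: 0.6, 1: 0.4}},
    "Q6": {(0,): {0: 0.8, 1: 0.2}, (1,): {0: 0.3, 1: 0.7}},
    "Q11": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.4, 1: 0.6}},
}


def joint_probability(assignment):
    """Multiply P(node | parents) over all nodes for one full assignment."""
    p = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[x] for x in pa)
        p *= cpt[node][pa_values][assignment[node]]
    return p


# The factorized joint distribution sums to one over all 2**3 assignments.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product([0, 1], repeat=len(parents))
)
```

Storing three small tables instead of one table over all joint configurations is what makes the representation compact; the saving grows quickly with the number of variables.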

THE OBJECTIVES OF THE STUDY
This study was conducted to:
1. Identify the factors affecting students in mastering Additional Mathematics at secondary school level.
2. Use Bayesian network to visualize the variables and identify the connections between the factors.
3. Identify the most significant factors that affect students in mastering Additional Mathematics using Bayesian network.

METHODOLOGY

DATA FOR THE STUDY
This study was conducted among form four and form five students in five National Type Secondary Schools in a big town, because students from National Type Secondary Schools generally perform well in mathematics. Even though they are good at mathematics, these students also face difficulties in mastering Additional Mathematics when they reach form four and form five.
The study was explained to the students who take Additional Mathematics, and the survey was carried out using a questionnaire. We chose 200 students from each school, giving a total sample of 1000 students, and we were able to collect data from all 1000 respondents. The students who take Additional Mathematics were asked to assemble in the school hall to complete the survey; in some schools, the survey was conducted in the classroom. The questionnaire contains 15 items, each linked to one of the fifteen variables shown in Table 1. The items were prepared in both English and Bahasa Malaysia and were simple and straightforward. All fifteen items were answered on a five-point Likert scale. The items were designed to reveal the causal relationships between the variables, all of which relate to the factors affecting the students in mastering Additional Mathematics.

THE 'BNLEARN' PACKAGE
The bnlearn package for the R programming language provides a free implementation of several structure learning algorithms, along with the conditional independence tests and network scores used to construct a Bayesian network. Both discrete and continuous data are supported.
Fast-IAMB contains two phases, called the growing phase and the shrinking phase. In this respect it is similar to Grow-Shrink (GS) and Incremental Association Markov Blanket (IAMB). During the growing phase of each iteration, it sorts the attributes that are candidates for admission from the most to the least conditionally dependent, according to a heuristic function. Fast-IAMB reduces the number of conditional independence tests by adding not one but a number of attributes at a time after each reordering of the remaining attributes (Yaramakala & Margaritis 2005).

Grow-Shrink (GS)
Grow-Shrink (GS) consists of two phases: a grow phase and a shrink phase. The GS algorithm was proposed by Margaritis (2003). In GS, the growing phase for a variable X proceeds by trying to add each variable Y to the current set of hypothesized neighbours of X.
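The two phases can be sketched in a few lines of Python. This is an illustrative sketch only, not the bnlearn implementation: the conditional independence test is replaced by a simple threshold on the empirical conditional mutual information, and the variables A, B, C, D and the data are hypothetical. The data are built from integer weights so that they follow a chain A -> B -> C exactly, with D independent of everything.

```python
from collections import Counter
from itertools import product
from math import log


def cond_mutual_info(rows, a, b, cond):
    """Empirical conditional mutual information I(a; b | cond), in nats."""
    n_abc, n_ac, n_bc, n_c = Counter(), Counter(), Counter(), Counter()
    for r in rows:
        c = tuple(r[v] for v in cond)
        n_abc[(r[a], r[b], c)] += 1
        n_ac[(r[a], c)] += 1
        n_bc[(r[b], c)] += 1
        n_c[c] += 1
    n = len(rows)
    return sum(
        k / n * log(k * n_c[c] / (n_ac[(x, c)] * n_bc[(y, c)]))
        for (x, y, c), k in n_abc.items()
    )


def grow_shrink(rows, target, candidates, eps=1e-9):
    """Estimate the neighbourhood of `target` with a grow and a shrink phase."""
    blanket = []
    changed = True
    while changed:  # grow: admit any variable still dependent on the target
        changed = False
        for y in candidates:
            if y not in blanket and cond_mutual_info(rows, target, y, blanket) > eps:
                blanket.append(y)
                changed = True
    for y in list(blanket):  # shrink: drop variables made redundant by the rest
        rest = [v for v in blanket if v != y]
        if cond_mutual_info(rows, target, y, rest) <= eps:
            blanket.remove(y)
    return blanket


# Hypothetical data: B agrees with A, and C with B, in 3 out of 4 rows.
rows = []
for a, b, c, d in product([0, 1], repeat=4):
    weight = (3 if b == a else 1) * (3 if c == b else 1)
    rows.extend({"A": a, "B": b, "C": c, "D": d} for _ in range(weight))

neighbours_of_a = grow_shrink(rows, "A", ["C", "B", "D"])
```

With the candidates visited in the order C, B, D, the grow phase first admits C (it is marginally dependent on A) and then B; the shrink phase removes C again because C is independent of A once B is known. A real analysis would use a statistical test with a significance level instead of the fixed threshold `eps`.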

Hill-Climbing (HC)
Hill-Climbing (HC) is a common score-based learning algorithm on the space of directed graphs and is widely used in practice (Kojima et al. 2010). Kojima et al. also noted that Hill-Climbing finds local optima and that upgraded versions of the algorithm improve the score and structure of the results.
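A greedy search of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the bnlearn routine: it starts from the empty graph, uses a BIC-style score in which higher is better, considers single-arc additions, deletions and reversals, and stops at the first local optimum. The variables A, B, C, D and the data are hypothetical (an exact chain A -> B -> C with D independent).

```python
from collections import Counter
from itertools import permutations, product
from math import log


def family_bic(rows, node, parents):
    """BIC contribution of one node given its parents (higher is better)."""
    n_joint, n_pa = Counter(), Counter()
    for r in rows:
        pa = tuple(r[p] for p in parents)
        n_joint[(r[node], pa)] += 1
        n_pa[pa] += 1
    loglik = sum(k * log(k / n_pa[pa]) for (_, pa), k in n_joint.items())
    n_params = len({r[node] for r in rows}) - 1
    for p in parents:
        n_params *= len({r[p] for r in rows})
    return loglik - 0.5 * log(len(rows)) * n_params


def score(rows, nodes, arcs):
    """Decomposable network score: sum of the per-node contributions."""
    return sum(family_bic(rows, v, sorted(a for a, b in arcs if b == v)) for v in nodes)


def is_acyclic(nodes, arcs):
    """Repeatedly peel off nodes with no incoming arc from the remaining nodes."""
    remaining = set(nodes)
    while remaining:
        roots = {v for v in remaining
                 if not any(b == v and a in remaining for a, b in arcs)}
        if not roots:
            return False
        remaining -= roots
    return True


def neighbours(nodes, arcs):
    """All graphs one arc addition, deletion or reversal away from `arcs`."""
    for u, v in permutations(nodes, 2):
        if (u, v) in arcs:
            yield arcs - {(u, v)}
            yield (arcs - {(u, v)}) | {(v, u)}
        elif (v, u) not in arcs:
            yield arcs | {(u, v)}


def hill_climb(rows, nodes):
    arcs = frozenset()
    best = score(rows, nodes, arcs)
    while True:
        candidates = [(score(rows, nodes, c), sorted(c))
                      for c in neighbours(nodes, arcs) if is_acyclic(nodes, c)]
        top_score, top_arcs = max(candidates)
        if top_score <= best + 1e-9:  # no strict improvement: local optimum
            return arcs, best
        arcs, best = frozenset(top_arcs), top_score


# Hypothetical data: B agrees with A, and C with B, in 3 out of 4 rows.
rows = []
for a, b, c, d in product([0, 1], repeat=4):
    weight = (3 if b == a else 1) * (3 if c == b else 1)
    rows.extend({"A": a, "B": b, "C": c, "D": d} for _ in range(weight))

learned_arcs, learned_score = hill_climb(rows, ["A", "B", "C", "D"])
```

On these data the search recovers the links A-B and B-C and leaves D isolated. The arc directions are not determined, because graphs that differ only by reversing an arc without creating a new common child receive the same score; this is one reason the direction of some arcs has to be chosen separately.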

Incremental Association Markov Blanket (IAMB)
Incremental Association Markov Blanket (IAMB) consists of two phases, a forward phase and a backward phase. The forward phase builds a candidate Markov blanket for a variable of interest T, denoted as MB(T). Tsamardinos, Aliferis and Statnikov (2003) defined MB(T) as a minimal set of variables conditioned on which all other variables are probabilistically independent of the target T.

Max-Min Parents and Children (MMPC)
This is a forward selection technique for neighborhood detection based on the maximization of the minimum association measure observed with any subset of the nodes selected in the previous iterations (Tsamardinos et al. 2006).

The arcs between the nodes show the direct dependence relationships between the connected variables. The nodes represent the variables, which in this case are the factors affecting the students in their failure to master Additional Mathematics. Conditional independence relationships are indicated by the absence of arcs between nodes. These networks also represent the logical cause and effect between the variables.
Across the nine diagrams produced by the various structure learning algorithms, we found some common arcs. Most of the networks have arcs between the same nodes, as shown in Figure 1. After running all the algorithms in R for the first time, we obtained the number of links and the common arcs among the nine networks, as shown in Table 2, and selected several common relationships between the nodes, as in Figure 1. We then ran the data again with these common links whitelisted, which ensures that they remain in the final network. During the whitelisting process, we ran the algorithms again and chose a direction for the arcs that had none. The p-value indicates the better direction to choose for each pair of nodes.
After whitelisting and choosing the directions, we obtained a second set of nine networks. Based on this second set, we calculated the network scores to find the network that best fits the data. The scores for the second set of networks from all nine algorithms are shown in Table 3. The scores used are the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Bayesian Dirichlet equivalent (BDe), K2, and log-likelihood (loglik).
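Three of these scores are simple to compute by hand for discrete data, which helps when reading a score table. The Python sketch below computes the log-likelihood, AIC and BIC of a given structure, using the convention in which AIC and BIC are the log-likelihood minus a penalty, so that higher values indicate a better fit (this matches the "highest score is best" reading of Table 3, but it is an assumption about the exact scaling). BDe and K2 are omitted. The variables and data are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log


def log_likelihood_and_params(rows, parents):
    """Return (log-likelihood, number of free parameters) of a discrete network."""
    loglik, n_params = 0.0, 0
    for node, pa in parents.items():
        n_joint, n_pa = Counter(), Counter()
        for r in rows:
            key = tuple(r[p] for p in pa)
            n_joint[(r[node], key)] += 1
            n_pa[key] += 1
        # Maximum-likelihood estimate of P(node | parents) is k / n_pa[key].
        loglik += sum(k * log(k / n_pa[key]) for (_, key), k in n_joint.items())
        free = len({r[node] for r in rows}) - 1
        for p in pa:
            free *= len({r[p] for r in rows})
        n_params += free
    return loglik, n_params


def network_scores(rows, parents):
    """Log-likelihood, AIC and BIC; all three are 'higher is better' here."""
    loglik, k = log_likelihood_and_params(rows, parents)
    return {
        "loglik": loglik,
        "aic": loglik - k,
        "bic": loglik - 0.5 * log(len(rows)) * k,
        "n_params": k,
    }


# Hypothetical data following the chain A -> B -> C, with D independent.
rows = []
for a, b, c, d in product([0, 1], repeat=4):
    weight = (3 if b == a else 1) * (3 if c == b else 1)
    rows.extend({"A": a, "B": b, "C": c, "D": d} for _ in range(weight))

empty = network_scores(rows, {"A": [], "B": [], "C": [], "D": []})
chain = network_scores(rows, {"A": [], "B": ["A"], "C": ["B"], "D": []})
```

Here the chain has more parameters than the empty graph, yet it scores higher on all three criteria because the gain in log-likelihood outweighs the penalty. Comparing candidate networks on the same data in this way is the idea behind selecting the best of the nine networks.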
Based on the results shown in Table 3, the highest scores are highlighted. Hill-Climbing and Tabu, which gave the best fit, also produced identical scores and networks. Figure 2 shows the best network.
From this network, arc strength was then used to evaluate all the edges. Arc strength measures the strength of the probabilistic relationship expressed by an arc of a Bayesian network (Scutari 2010). We can then identify the most significant and strongest arcs in the network, which helps us determine the strongest dependencies between the factors in the study.
Table 4 shows the arc strengths between the nodes in the network from the Hill-Climbing algorithm. In the table, we highlighted the strong relationships between the nodes. Figure 3 shows the arc strengths in the Hill-Climbing network.
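One common score-based way to quantify arc strength is the change in network score caused by removing the arc: the more the score drops, the stronger the arc. The text does not state which criterion was used for Table 4, so the Python sketch below is only an illustration of that one definition, with a BIC-style score, hypothetical variables and invented data. Because the score is a sum of per-node terms, only the child's term has to be recomputed.

```python
from collections import Counter
from itertools import product
from math import log


def family_bic(rows, node, parents):
    """BIC contribution of one node given its parents (higher is better)."""
    n_joint, n_pa = Counter(), Counter()
    for r in rows:
        pa = tuple(r[p] for p in parents)
        n_joint[(r[node], pa)] += 1
        n_pa[pa] += 1
    loglik = sum(k * log(k / n_pa[pa]) for (_, pa), k in n_joint.items())
    n_params = len({r[node] for r in rows}) - 1
    for p in parents:
        n_params *= len({r[p] for r in rows})
    return loglik - 0.5 * log(len(rows)) * n_params


def arc_strength(rows, parents, parent, child):
    """Score without the arc minus score with it; more negative means stronger."""
    with_arc = family_bic(rows, child, parents[child])
    without_arc = family_bic(rows, child, [p for p in parents[child] if p != parent])
    return without_arc - with_arc


# Hypothetical data following the chain A -> B -> C, with D independent.
rows = []
for a, b, c, d in product([0, 1], repeat=4):
    weight = (3 if b == a else 1) * (3 if c == b else 1)
    rows.extend({"A": a, "B": b, "C": c, "D": d} for _ in range(weight))

# Network under evaluation: the true chain plus a deliberately spurious arc D -> C.
parents = {"A": [], "B": ["A"], "C": ["B", "D"], "D": []}
strengths = {
    (p, child): arc_strength(rows, parents, p, child)
    for child, pa in parents.items()
    for p in pa
}
ranked = sorted(strengths, key=strengths.get)  # strongest arc first
```

In this toy network the arcs A -> B and B -> C receive negative strengths (removing them lowers the score), while the spurious arc D -> C receives a positive value, meaning the network would score better without it. Ranking arcs this way mirrors how the strongest relationships are read off an arc strength table.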

DISCUSSION
The arc strengths between the nodes in the Hill-Climbing network, which was selected as the best network for this study, are shown in Table 4 and Figure 3. The arc from node Q10 to node Q6 has the greatest arc strength, which shows that node Q6 depends most strongly on Q10. Q6 is the variable linked to the students' confusion over the many new symbols used in Additional Mathematics. Q10 is the variable concerning unattractive and uninteresting Additional Mathematics lessons. The relationship between the two variables is clear: students feel confused by the new things introduced to them. Moreover, when students learn something totally new, with unfamiliar symbols and facts, they need to adapt to these before they can understand. They easily get bored if the teaching and learning process is conducted in an uninteresting way, and they may lose interest in a boring lesson, especially when they are required to deal with new things that they have never seen or used before. Thus, we can clearly see that variable Q6 is strongly dependent on variable Q10.
The next strongest arc is from Q13 to Q11. Q11 concerns the visual-based questions in Additional Mathematics, and Q13 concerns the insufficient time to teach Additional Mathematics. Most schools allocate about four periods of teaching and learning for Additional Mathematics. In teaching the subject, teachers need to spend more time explaining each step of the solution to the students. When it comes to visual-based questions, teachers also need time to explain the diagram and show the proper procedure to solve the question, and students need sufficient time to discuss and find the solution. Since the time to teach Additional Mathematics in schools is not sufficient, most teachers rush to finish the syllabus and content. We can therefore conclude that the students have problems solving the visual-based questions in Additional Mathematics because of the time constraint.
The next strongest arc is from node Q4 to node Q3. Node Q3 concerns choosing the correct formula to find the solution to a question, and Q4 concerns understanding what the question requires. We can conclude that node Q3 is dependent on Q4. Most students fail to master Additional Mathematics because they fail to understand what the question requires; being unsure of this, they are unable to choose the correct formula to solve the question. Most questions in Additional Mathematics depend on formulas, so students taking the subject must be good at understanding and using the correct formula. Students cannot choose the correct formula for a particular question if they cannot identify what the question requires. They must know the objective of the question before they choose which formula to use.
The next strongest arc is from node Q10 to node Q11. Q11 concerns the visual-based questions in Additional Mathematics, and Q10 concerns boring lessons in class. Q11 is dependent on Q10 because questions related to diagrams in Additional Mathematics must be taught with interesting methods and effective teaching aids. The teacher must use proper teaching aids to teach the concepts behind particular visual-based questions. An interesting lesson with interesting teaching aids will attract the students' interest, thus helping them to understand the concept of the diagram and the question. Therefore, visual-based questions need to be taught as interestingly as possible with teaching aids.
The next strongest arc is from node Q7 to node Q2. Node Q2 concerns the careless and lazy attitude of the students in learning the subject, and Q7 concerns poor skill in choosing the method of solution for the questions. Node Q2 is clearly dependent on node Q7. If a student is not able to identify the correct method of solution for a question, the student will be unable to proceed further with the question.
Finally, Q2 is dependent on Q1. Q2 concerns the students' attitude, and Q1 concerns their mindset towards the subject. People's beliefs and mentality about something affect their involvement in it. If students' mindset is influenced by others saying that Additional Mathematics is a difficult subject, they will hold on to the belief that they cannot succeed in the subject. They do not feel confident in their own ability. The negative mindset makes them doubt that they can master the subject, and as a result they become lazy and careless when learning it.
From the results, we can conclude that the students were confused by the many symbols used in Additional Mathematics because of the unattractive teaching and learning process. The students were not able to understand or manipulate the symbols in Additional Mathematics because they could not correctly grasp the usage, function, and purpose of the symbols. According to Kinzel (1999), students have difficulties in understanding and interpreting the symbolic notation used in algebra. Capraro and Joffrion (2006) said that school students often demonstrate much stronger skills in solving mathematics problems that require algebraic reasoning than in symbolizing and solving equations. According to Pimm (1987), the problem is that the symbols themselves are taken as the objects of mathematics rather than the ideas and processes which they represent. Pupils fail to interpret or understand the meaning of certain mathematical symbols because of the way they are taught to read those symbols. The general consensus is that the introduction of mathematical symbols presents difficulties and challenges beyond those presented by words alone (Kuster 2010; Lee 2004). Earle (1977) argues that the problem lies in how symbols are used and perceived by the students. If a student cannot recognize and pronounce a symbol correctly, then he or she will have difficulties in using it. Symbols are the components of the mathematics language that make it possible for a person to communicate, manipulate, and reflect upon abstract mathematical concepts. However, the symbolic language is often a cause of great confusion for students (Rubenstein & Thompson 2001). Expert mathematicians and mathematics teachers are able to work with and see the mathematics through its symbolic representations, but students often struggle in this endeavor, as they may need to be told what to see and how to reason with mathematical symbols (Bakker et al. 2003; Kinzel 1999; Stacey & MacGregor 1999).
Learning duration for the subject is another strong factor affecting the students. According to Jane (1996), the issue of time in the mathematics classroom needs to be addressed because radical changes have taken place over the last decade in the field of mathematics education. Since Additional Mathematics is a complicated subject, students need to be exposed to its content and methods more frequently. Most schools have problems finishing the Additional Mathematics syllabus by the end of the year, especially for the form five students. They are forced to rush through the syllabus, which makes the situation worse. In several schools, extra classes are conducted for Additional Mathematics, especially for the form five students. The students need sufficient time to understand and learn the subject. Extra time will also help them become familiar with the visual-based questions in Additional Mathematics. In education, time is an indispensable asset and an educational resource. According to Agabi (2010), time is an educational resource that is highly limited in supply and critical, but often taken for granted by the providers of education. It is so important and useful that each school activity is regulated by it. Maduagwu and Nwogu (2006) posited that different tasks need to be allotted time and emphasized the need for proper time management. The time frame for each activity of any day, week, or year should be structured in the form of a timetable.
At the same time, we found that the ability to choose the correct formula to solve a question in Additional Mathematics is another factor in this study. In addition, the students are not able to understand what the question requires, which in turn affects their choice of the correct formula. Zakaria (2002) indicates that most students have difficulty in learning mathematics: the analysis revealed that more than half of the students could not understand the questions and did not know how to plan and implement strategies towards the solutions.

CONCLUSION
This study was conducted to examine the relationships between the factors that affect students in mastering Additional Mathematics in National Type Secondary Schools. The Bayesian network was employed to interpret the collected data in terms of the probabilistic relationships and dependencies between the variables, which are the factors in the study. The data collected from the students of five National Type Secondary Schools in a big town were adequate for the study because no preference was applied in selecting the students. We obtained responses from students at five schools who come from different ethnic and cultural backgrounds.
Nine structure learning algorithms were used, and a network was produced for each algorithm. Based on the network structures learned by the selected algorithms, we identified the common arcs. We then ran the programme again, whitelisting these arcs and choosing their directions.
From this second set of networks, we computed the scores for all the algorithms. Based on the scores, the best network was identified from the results of the nine algorithms. In this study, we chose the network from the Tabu and Hill-Climbing algorithms as the best network.

FIGURE 1. Common arcs from all the learned networks

FIGURE 2. The final results of the score (Hill-Climbing algorithm)

FIGURE 3. Arc strengths in the Hill-Climbing network

TABLE 4. The strongest relationship between the nodes and the arc strengths