Friday, 11 May 2012

Identifying Socio-Technical Trends for Rural water supply schemes using Case-based Reasoning (CBR)


- Jack Barrie -

This post introduces the fundamental concepts of Case-Based Reasoning (CBR) and explains why we have chosen this method to help identify complex trends in the social and technical factors that lead to the failure of rural water supply schemes.

At first CBR appears to be quite complex, but actually it is effectively the same methodology that humans use to solve problems – by experience and logic. I will ease you gently into the world of CBR by explaining how I began to work with it and how it can be applied to our research here.

Background to CBR
I first encountered CBR while studying how to determine the significance of a wide range of factors that could influence the time it takes to complete a specific building task. These factors could include the number of people working on the project, the level of access for deliveries of materials, the level of skill required, the volume of concrete required and so on. CBR was successfully applied to help determine the significance of each factor, and to help project managers predict the time needed to complete future projects by comparing these factors in a future project with those of similar previous projects.

When working in Cambodia with Angus McBride, analysing the wide-ranging project styles and technologies used for supplying water to rural communities, it became apparent that CBR could be used in a similar way to the example above. It could not only help to identify the significance of specific socio-technical community characteristics for water supply projects, but also help governments and NGOs better predict the impact of future projects.

I therefore co-developed a CBR model with Angus for my MEng thesis. We designed the model to be used as follows:

  1. Locate the new rural village in which you plan to initiate a water project.
  2. Collect specific social and technical characteristics of the village.
  3. Enter these details into the CBR model, which compares the characteristics with those of all the past projects.
  4. The model produces the percentage likelihood of success for different project methods and technologies, based on the outcomes of the most similar cases.
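
As a rough illustration of steps 3 and 4, the sketch below compares a new village with past projects and reports the share of successes among the most similar cases for each project method. It is not the model from the thesis: the feature values, the method names, the use of Euclidean distance and the choice of k are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical past projects. Each has village characteristics scaled to 0..1,
# the project method used, and whether the project succeeded.
PAST_PROJECTS = [
    {"features": [0.20, 0.80, 0.50], "method": "hand pump", "success": True},
    {"features": [0.30, 0.70, 0.40], "method": "hand pump", "success": False},
    {"features": [0.25, 0.75, 0.60], "method": "hand pump", "success": True},
    {"features": [0.90, 0.10, 0.90], "method": "hand pump", "success": False},
    {"features": [0.20, 0.90, 0.50], "method": "rainwater tank", "success": False},
    {"features": [0.80, 0.20, 0.80], "method": "rainwater tank", "success": True},
    {"features": [0.70, 0.30, 0.90], "method": "rainwater tank", "success": True},
]


def distance(a, b):
    """Euclidean distance between two lists of characteristics."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def success_likelihood(new_village, past_projects, k=2):
    """Percentage of successes among the k most similar cases, per method."""
    by_method = defaultdict(list)
    for case in past_projects:
        by_method[case["method"]].append(case)

    result = {}
    for method, cases in by_method.items():
        nearest = sorted(cases, key=lambda c: distance(new_village, c["features"]))[:k]
        successes = sum(c["success"] for c in nearest)
        result[method] = 100.0 * successes / len(nearest)
    return result


likelihoods = success_likelihood([0.25, 0.80, 0.50], PAST_PROJECTS, k=2)
print(likelihoods)  # {'hand pump': 100.0, 'rainwater tank': 50.0}
```

With so few cases the percentages are very coarse. A real case base would need many more projects per method before the numbers mean much.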


After rigorous testing, this model was able to predict the outcome of previous water supply projects (treating each one as if it were a new case) with an accuracy of 74%.
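
Testing a model by pretending each known project is a new case is commonly called leave-one-out testing. The sketch below shows the idea under simplifying assumptions: the cases are made up, and the prediction is simply the outcome of the single most similar remaining case, which is not necessarily how the thesis model made its predictions.

```python
# Hypothetical cases: (characteristics scaled to 0..1, project succeeded?)
CASES = [
    ([0.1, 0.2], False),
    ([0.2, 0.1], False),
    ([0.9, 0.8], True),
    ([0.8, 0.9], True),
    ([0.6, 0.6], False),
]


def nearest_outcome(query, case_base):
    """Return the outcome of the case closest to the query."""
    def squared_distance(case):
        return sum((q - f) ** 2 for q, f in zip(query, case[0]))
    return min(case_base, key=squared_distance)[1]


def leave_one_out_accuracy(case_base):
    """Hold out each case in turn and predict it from all the others."""
    correct = 0
    for i, (features, outcome) in enumerate(case_base):
        rest = case_base[:i] + case_base[i + 1:]
        if nearest_outcome(features, rest) == outcome:
            correct += 1
    return correct / len(case_base)


print(leave_one_out_accuracy(CASES))  # 0.8
```

Here four of the five cases are predicted correctly. The last case sits closest to two successful projects but actually failed, so it is the one the predictor gets wrong.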

What is CBR?

Case-based reasoning (CBR) is a form of artificial intelligence which attempts to replicate human learning by using past experience to solve complex problems. It has been successfully applied in a wide range of fields including medicine, law and engineering.
There are generally four main components to CBR:
1. retrieve and build a database of similar cases (water projects),
2. reuse the cases to solve a new problem,
3. revise the solution, and
4. retain experience to solve a future problem (learn).
Each case contains dozens of factors, such as the type of hand pump, the number of years since installation, hand pump productivity, water quality, the number of users and the effectiveness of the pump user committee.
Each case also contains one evaluation indicator that identifies whether the case in question has failed.
A new case, for which a solution is sought, is tested by comparing its similarity with other 'known' cases from the system's case base to determine the possible outcome of the case.
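
To make the comparison step more concrete, here is a minimal sketch of a weighted similarity measure between a new case and the known cases. The factor names, the weights and the values (all scaled to 0..1) are hypothetical, and the similarity formula is just one common choice, not necessarily the one used in our model.

```python
# Hypothetical weights expressing how much each factor matters.
WEIGHTS = {
    "years_since_install": 1.0,
    "water_quality": 3.0,
    "committee_effectiveness": 2.0,
}

# Hypothetical known cases, each with its evaluation indicator "failed".
CASE_BASE = [
    {"years_since_install": 0.2, "water_quality": 0.90,
     "committee_effectiveness": 0.8, "failed": False},
    {"years_since_install": 0.8, "water_quality": 0.30,
     "committee_effectiveness": 0.2, "failed": True},
    {"years_since_install": 0.5, "water_quality": 0.40,
     "committee_effectiveness": 0.5, "failed": True},
]


def similarity(new_case, known_case, weights):
    """Weighted similarity between 0 (very different) and 1 (identical)."""
    total_weight = sum(weights.values())
    weighted_diff = sum(
        w * abs(new_case[factor] - known_case[factor])
        for factor, w in weights.items()
    )
    return 1.0 - weighted_diff / total_weight


def most_similar(new_case, case_base, weights):
    """Retrieve the known case that best matches the new case."""
    return max(case_base, key=lambda c: similarity(new_case, c, weights))


new_well = {"years_since_install": 0.6, "water_quality": 0.35,
            "committee_effectiveness": 0.3}
match = most_similar(new_well, CASE_BASE, WEIGHTS)
print(match["failed"])  # True: the closest known case was a failure
```

The weights are what make this more than a plain distance: a small difference in a heavily weighted factor counts for more than a large difference in a minor one.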


The use of Genetic Algorithms in CBR
The CBR approach described here uses genetic algorithms (GA) to quantitatively determine the significance and interdependencies of the factors affecting the outcome of a process. It does this by processing hundreds of thousands of case evaluations, thereby replicating the experience of an expert.
The main advantage of GA is that it can determine the significance of each factor through an iterative process. A second advantage is that the genetic algorithm can mutate (slightly alter) past cases to help better predict unique cases in the future.
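
For readers who have not met genetic algorithms before, the sketch below shows one simple way a GA can search for factor weights. It is an assumption-laden toy, not our model: the cases are invented (factor 0 drives the outcome, factor 1 is noise), fitness is leave-one-out accuracy of a nearest-case predictor, and the population size, mutation size and seed are arbitrary. It only covers the weight search, not the mutation of past cases mentioned above.

```python
import random

# Hypothetical cases: ([factor 0, factor 1], succeeded?)
CASES = [
    ([0.1, 0.0], False), ([0.2, 0.5], False), ([0.3, 1.0], False),
    ([0.7, 1.0], True), ([0.8, 0.5], True), ([0.9, 0.0], True),
]


def predict(query, case_base, weights):
    """Outcome of the nearest case under a weighted squared distance."""
    def dist(case):
        return sum(w * (q - f) ** 2 for w, q, f in zip(weights, query, case[0]))
    return min(case_base, key=dist)[1]


def fitness(weights, case_base):
    """Leave-one-out accuracy obtained with this set of weights."""
    hits = 0
    for i, (features, outcome) in enumerate(case_base):
        rest = case_base[:i] + case_base[i + 1:]
        hits += predict(features, rest, weights) == outcome
    return hits / len(case_base)


def evolve_weights(case_base, n_factors, generations=30, pop_size=12, seed=0):
    """Evolve factor weights by selection, crossover and mutation."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_factors)] for _ in range(pop_size)]
    population[0] = [1.0] * n_factors  # include the equal-weight baseline
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, case_base), reverse=True)
        parents = population[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_factors)  # single-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_factors)       # slightly alter one weight
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: fitness(w, case_base))


best = evolve_weights(CASES, n_factors=2)
print(best, fitness(best, CASES))
```

With equal weights this toy predictor gets 4 of the 6 cases right, while ignoring the noise factor entirely gets all 6. Because the fitter half of each generation is carried over unchanged, the evolved weights can never score worse than the equal-weight baseline.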
How well the GA can ascertain the significance of each factor depends on the size of the case base, or 'experience'. As the number of cases (water projects) in the case base increases, the prediction accuracy of the GA increases. The advantage of GA is its ability to gain a much larger 'experience' than any single expert, and the influence of bias is much reduced because the outcome is based purely on quantitative data. Furthermore, it can assess the interdependencies between all factors included in the case base.
For example, if the user selects 100 generations for a case size of 20 factors, the model will analyse 2,000 possible weighting scenarios per case. A case base of 151 cases then amounts to the equivalent of 302,000 individual case evaluations.
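
The arithmetic behind these figures can be checked in a few lines. The variable names are only illustrative.

```python
generations = 100  # generations selected by the user
factors = 20       # factors per case
cases = 151        # cases in the case base

scenarios_per_case = generations * factors
total_evaluations = scenarios_per_case * cases

print(scenarios_per_case)  # 2000
print(total_evaluations)   # 302000
```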

How will it be applied to post-conflict water supply schemes in rural Sierra Leone?
I plan to develop the model to carry out similar functions to the Cambodia CBR model developed in my Master's thesis. It should be able to suggest the significance of specific factors affecting the failure of a well. For instance, hypothetically, it may identify that a salty taste in the water is three times more likely to influence the failure of the well than the location of the well within the village.
Furthermore, it could be used by stakeholders involved in installing wells in Sierra Leone to better plan for future problems by running community scenarios through the model.
To simplify, the model would work like this:
1. A donor/implementer carries out a rapid, widespread survey of the targeted villages, collecting detailed well observations and community survey data.
2. Each well surveyed makes up an individual case in the case base, built from both the social and technological data collected. The database would therefore consist of thousands of individual cases (collected in the survey).
3. The collection of cases would then be analysed using genetic algorithms to determine the influence of each factor collected in the survey.
4. The characteristics of any future village to be supplied with a well can be input to the model and compared with the database of cases already collected to determine (a) whether any specific factors may cause problems (the important factors determined in step 3) and (b) the likelihood of success of the project.
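
Step 2 is essentially a data-structuring job, so here is one possible way of turning survey rows into cases. Everything in it is a placeholder: the `WellCase` structure, the field names and the sample rows are invented for illustration, since the actual survey has not been designed yet.

```python
from dataclasses import dataclass


@dataclass
class WellCase:
    well_id: str
    technical: dict  # e.g. pump type, years since installation
    social: dict     # e.g. committee effectiveness, number of users
    failed: bool     # the single evaluation indicator for the case


def build_case_base(survey_rows, technical_fields, social_fields):
    """Turn flat survey rows into cases with technical and social factors."""
    cases = []
    for row in survey_rows:
        cases.append(WellCase(
            well_id=row["well_id"],
            technical={f: row[f] for f in technical_fields},
            social={f: row[f] for f in social_fields},
            failed=bool(row["failed"]),
        ))
    return cases


# Hypothetical survey rows for two wells.
rows = [
    {"well_id": "W001", "pump_type": "hand pump", "years_since_install": 4,
     "users": 250, "committee_active": True, "failed": False},
    {"well_id": "W002", "pump_type": "hand pump", "years_since_install": 9,
     "users": 600, "committee_active": False, "failed": True},
]

case_base = build_case_base(
    rows,
    technical_fields=["pump_type", "years_since_install"],
    social_fields=["users", "committee_active"],
)
print(len(case_base), case_base[1].failed)  # 2 True
```

Keeping the social and technical factors in separate groups makes it easier to see later whether the important factors found in step 3 are mostly social, mostly technical, or a mix of both.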

In summary, I believe CBR offers the opportunity to investigate the complexity of water supply projects in developing countries in a more concise manner, and to help us learn from past mistakes. I hope that helps clarify what I am aiming to do with this research.

Jack