Identifying Socio-Technical Trends for Rural Water Supply Schemes using Case-Based Reasoning (CBR)
- Jack Barrie -
This blog will introduce the fundamental
concepts of Case-Based Reasoning (CBR) and why we have chosen to use this
method to help identify complex trends in both the social and technical factors
that lead to the failure of rural water supply schemes.
At first CBR appears to be quite complex, but
actually it is effectively the same methodology that humans use to solve
problems – by experience and logic. I will ease you gently into the world of
CBR by explaining how I began to work with it and how it can be applied to our
research here.
Background to CBR
I first encountered CBR when studying the problem of determining the
significance of a wide range of factors that could influence the time it takes
to complete a specific building task. These could include the number of people
working on the project, the level of access for deliveries of materials, the
level of skill required, the volume of concrete required and so on. Case-Based
Reasoning was successfully applied to help determine the level of significance
of each factor, and to help project managers predict the time needed to
complete future projects by comparing these factors in the future project with
those in previous projects of a similar style.
When working in Cambodia with Angus McBride, analysing the wide-ranging project
styles and technologies used for supplying water to rural communities, it
became apparent that CBR could be used in a similar way to the example I gave
above. It could not only help to identify the significance of specific
socio-technical community characteristics for water supply projects, but also
help governments and NGOs better predict the impact of future projects.
Therefore I co-developed a CBR model with Angus for my MEng Thesis. We designed
the model to be used as follows:
- Locate the new rural village in which you plan to initiate a water project
- Collect specific social and technical characteristics of the village
- Enter these details into the CBR model, which compares the characteristics to all the past projects
- The model produces the % likelihood of success for various project methods and technologies, based on the outcomes of the most similar cases.
After rigorous testing, this model was able to predict the outcome of previous
water supply projects (treating each one as if it were a new case) with an
accuracy of 74%.
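The testing idea described above, where each past project is treated as a new case and predicted from the rest, is commonly called leave-one-out validation. Below is a minimal sketch of that idea only. It assumes each case is a list of factor values scaled to 0..1 plus a success flag, and it uses a simple unweighted nearest-case prediction; the function names and the five example cases are hypothetical and are not taken from the thesis model.

```python
from typing import List, Tuple

# Assumed case layout: (factor values scaled to 0..1, did the project succeed?)
Case = Tuple[List[float], bool]


def similarity(a: List[float], b: List[float]) -> float:
    """Unweighted similarity: 1 minus the mean absolute difference."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def predict(factors: List[float], case_base: List[Case]) -> bool:
    """Predict the outcome of the single most similar past case."""
    best = max(case_base, key=lambda case: similarity(factors, case[0]))
    return best[1]


def leave_one_out_accuracy(case_base: List[Case]) -> float:
    """Hold out each case in turn and predict it from all the others."""
    correct = 0
    for i, (factors, outcome) in enumerate(case_base):
        others = case_base[:i] + case_base[i + 1:]
        if predict(factors, others) == outcome:
            correct += 1
    return correct / len(case_base)


# Made-up cases for illustration only (not real project data).
cases: List[Case] = [
    ([0.9, 0.8], True),
    ([0.8, 0.9], True),
    ([0.1, 0.2], False),
    ([0.2, 0.1], False),
    ([0.7, 0.6], False),  # a failure that looks like the successes
]

accuracy = leave_one_out_accuracy(cases)
print(f"{accuracy:.0%}")  # 80%
```

The last made-up case is deliberately misleading, which is why the sketch scores four out of five rather than a perfect result.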
What is CBR?
Case-based reasoning (CBR) is a form of artificial intelligence which
attempts to replicate human learning by using past experience to solve complex
problems. It has been successfully applied to solve complex problems in a wide
range of holistic fields including medicine, law and engineering.
There are generally four main components to CBR:
1. Retrieve and build a database of similar cases (water projects),
2. reuse the cases to solve a new problem,
3. revise the solution, and
4. retain experience to solve a future problem (learn)
Each case contains dozens of factors, such as the type of hand pump, number of
years since installation, hand pump productivity, water quality, number of
users and effectiveness of the pump user committee.
Each case also contains one evaluation indicator that identifies whether the
case in question has failed.
A new case, for which a solution is sought, is tested by comparing its
similarity with other ‘known’ cases from the system's case base to determine
the possible outcome of the case.
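To make the comparison step concrete, here is a minimal sketch of how a case and a similarity-based retrieval might look. It is an illustration under assumptions, not the actual model: the factor names, the weights, the 0..1 scaling, the choice of the three most similar cases and all the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Case:
    factors: Dict[str, float]  # every factor scaled to the range 0..1 (assumption)
    failed: bool               # the single evaluation indicator


def similarity(a: Dict[str, float], b: Dict[str, float],
               weights: Dict[str, float]) -> float:
    """Weighted similarity in 0..1 (1.0 means identical on every weighted factor)."""
    total_weight = sum(weights.values())
    distance = sum(w * abs(a[name] - b[name]) for name, w in weights.items())
    return 1.0 - distance / total_weight


def retrieve(new_factors: Dict[str, float], case_base: List[Case],
             weights: Dict[str, float], k: int = 3) -> List[Case]:
    """Return the k known cases most similar to the new case."""
    ranked = sorted(case_base,
                    key=lambda c: similarity(new_factors, c.factors, weights),
                    reverse=True)
    return ranked[:k]


def failure_likelihood(new_factors: Dict[str, float], case_base: List[Case],
                       weights: Dict[str, float], k: int = 3) -> float:
    """Share of the k most similar cases that failed."""
    nearest = retrieve(new_factors, case_base, weights, k)
    return sum(c.failed for c in nearest) / len(nearest)


def make(years: float, quality: float, committee: float, failed: bool) -> Case:
    """Helper for building the made-up example cases below."""
    return Case({"years_since_installation": years,
                 "water_quality": quality,
                 "committee_effectiveness": committee}, failed)


# Hypothetical case base and weights, for illustration only.
case_base = [
    make(0.2, 0.9, 0.8, False),
    make(0.3, 0.8, 0.9, False),
    make(0.9, 0.3, 0.2, True),
    make(0.8, 0.2, 0.1, True),
    make(0.7, 0.4, 0.3, False),
]
weights = {"years_since_installation": 1.0,
           "water_quality": 2.0,
           "committee_effectiveness": 3.0}

new_village = {"years_since_installation": 0.75,
               "water_quality": 0.3,
               "committee_effectiveness": 0.2}
likelihood = failure_likelihood(new_village, case_base, weights, k=3)
print(f"{likelihood:.0%}")  # 67%
```

In this sketch the weights are simply given by hand. How much each factor should count is exactly the question the weighting step of a CBR model has to answer.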
The use of Genetic Algorithms in CBR
Our CBR approach uses genetic algorithms (GA) to quantitatively determine the
significance and level of interdependence of the factors affecting the outcome
of a process. It does this by processing hundreds of thousands of case
evaluations, thereby replicating the experience of an expert.
Therefore the main advantage of GA is that you can determine the
significance of each factor through a complex iterative process. The second
advantage is that the genetic algorithms can actually mutate (slightly alter)
past cases to help better predict unique cases in the future.
The GA's ability to ascertain the significance of each factor depends on the
size of the case base, or 'experience'. Therefore, as the number of cases
(water projects) in the case base increases, the prediction accuracy of the GA
tends to increase. A further advantage of GA is that it can gain a much larger
'experience' than any single expert, and the influence of bias is much reduced
because the outcome is based purely on quantitative data. Furthermore, it can
assess the interdependencies between all factors included in the case base.
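As a rough illustration of how a genetic algorithm can search for factor weights, here is a minimal sketch. It makes several assumptions that are not from the thesis model: cases are lists of 0..1 factor values plus a failure flag, fitness is leave-one-out accuracy of a nearest-case prediction, selection keeps the better half of the population, and only the weights are mutated (mutating past cases, as mentioned above, is not shown). All names and parameter values are hypothetical, and at least two factors are required.

```python
import random
from typing import List, Tuple

# Assumed case layout: (factor values scaled to 0..1, did the project fail?)
Case = Tuple[List[float], bool]


def weighted_similarity(a: List[float], b: List[float], weights: List[float]) -> float:
    total = sum(weights) or 1.0  # guard against an all-zero weight vector
    return 1.0 - sum(w * abs(x - y) for w, x, y in zip(weights, a, b)) / total


def fitness(weights: List[float], cases: List[Case]) -> float:
    """Leave-one-out accuracy of a nearest-case prediction under these weights."""
    correct = 0
    for i, (factors, failed) in enumerate(cases):
        others = cases[:i] + cases[i + 1:]
        nearest = max(others, key=lambda c: weighted_similarity(factors, c[0], weights))
        correct += nearest[1] == failed
    return correct / len(cases)


def evolve_weights(cases: List[Case], n_factors: int, generations: int = 100,
                   pop_size: int = 20, mutation_rate: float = 0.2,
                   seed: int = 0) -> List[float]:
    """Evolve one weight per factor; returns the fittest weight vector found."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_factors)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged (this also preserves the best).
        ranked = sorted(population, key=lambda w: fitness(w, cases), reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_factors)  # single-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: nudge some weights slightly, clamped to 0..1.
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: fitness(w, cases))


# Synthetic data: only the first factor actually drives failure.
data_rng = random.Random(1)
cases: List[Case] = []
for _ in range(30):
    informative, noise = data_rng.random(), data_rng.random()
    cases.append(([informative, noise], informative > 0.5))

best = evolve_weights(cases, n_factors=2, generations=20)
print("weights:", [round(w, 2) for w in best], "accuracy:", fitness(best, cases))
```

The relative sizes of the evolved weights are what would be read as the significance of each factor. On real survey data the result would depend heavily on the size and quality of the case base.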
For example, if the user selects 100 generations for a case size of 20
factors, the model will analyse 2,000 possible weighting scenarios per case.
Hence, a case base of 151 cases will amount to the equivalent of 302,000
individual case evaluations.
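The arithmetic behind these figures can be checked in a few lines. This sketch assumes, reading from the numbers quoted above, that the scenarios per case are the number of generations multiplied by the number of factors; the variable names are illustrative only.

```python
generations = 100        # chosen by the user
factors_per_case = 20    # size of each case
cases_in_base = 151      # size of the case base

# Assumed relationship: one weighting scenario per factor per generation.
scenarios_per_case = generations * factors_per_case
total_evaluations = scenarios_per_case * cases_in_base

print(scenarios_per_case)  # 2000
print(total_evaluations)   # 302000
```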
How will it be applied to post-conflict water supply schemes in rural Sierra Leone?
I plan to develop the model to carry out similar functions to the Cambodia CBR
model developed in my Master's Thesis. It should be able to suggest the
significance of specific factors affecting the level of failure of a well. For
instance, hypothetically, it may identify that the salty taste of the water is
three times more likely to influence the failure of the well than the location
of the well within the village.
Furthermore, the model could be used by stakeholders involved in installing
wells in Sierra Leone to better plan for future problems by running community
scenarios through it.
To simplify, the model would work like this:
1. A donor/implementer carries out a rapid, widespread survey of the targeted
villages, collecting a detailed well observation and community survey for each.
2. Each individual well surveyed makes up an individual case in the case base,
built up of both the social and technological data collected. The database
would therefore consist of thousands of individual cases (collected in the
survey).
3. The collection of these cases would then be analysed using Genetic
Algorithms to determine the influence of each factor collected in the survey.
4. The characteristics of any future villages to be supplied with a well can be
input to the model and compared to the vast database of cases already collected
to determine (a) whether there are any specific factors that may cause problems
(the important factors determined in step 3) and (b) the likelihood of success
of the project.
In summary, I believe CBR offers the opportunity to investigate the complexity
of water supply projects in developing countries in a more concise manner, and
to help us learn from the mistakes made in the past. I hope that helps clarify
what I am aiming to do with this research.
Jack