Wednesday, August 14, 2019

Project Report, GSoC, 2019


INTRODUCTION

Hi all!

This project "Validation of Spatial Systems Biology Models in Java(TM)" was done as a part of the Google Summer of Code program, with the organization N.R.N.B.

My mentors for this project were Nicolas Rodriguez and Thomas M. Hamm, along with support and valuable suggestions from Lucian P. Smith, Jun.-Prof. Dr Andreas Draeger and Prof. Akira Funahashi.

The basic overview of the program and the project can be found in my introductory blog: 

For in-depth background on the project and a detailed understanding of how the offline validator works, readers are encouraged to go through my proposal for the Google Summer of Code, 2019: 

WORK DETAILS

My GitHub fork for this project is https://github.com/bhavyejain/jsbml.

Over the course of this project, I implemented constraint classes for the spatial package and fixed issues with the existing spatial classes, bringing most of them up to the latest specification. My work in this repository is reflected in the following extension: https://github.com/bhavyejain/jsbml/tree/master/extensions/spatial/src/org/sbml/jsbml.

I made a pull request after implementing 2-3 constraint classes at a time so that the code could be reviewed regularly. The test files of many classes also required corrections. The modified test files were uploaded to a master Google Drive folder and the corrections were listed in a document for future reference. All merged contributions were compiled and tested before being merged.

All of my commits can be viewed here:

Except for the new rules under construction, all my pull requests have been merged into the master branch of the project, and can be viewed here:

The ongoing work is reflected in the open pull requests, which will be merged as new rules are added successfully along with their test files. The open pull requests authored by me (if any are active at the moment) can be viewed here:

DELIVERABLES

All validation and consistency rules provided in the Spatial Processes SBML Level 3 Package Specification have been implemented and tested.
Existing source code has been updated to meet the latest requirements of the validator and the specification.
New code has been tested successfully on various test files.

CURRENT STATUS AND FURTHER DEVELOPMENT

Currently, work is being done to manually add more validation rules. Additional rules for around 13 classes have been identified and finalised, with more to be proposed in the coming days. The procedure to add such rules has been decided upon and work is expected to proceed at a steady pace. After the final evaluation, I shall work on completing the validation for the spatial package and include as many syntactic (and possibly semantic) error checks as possible to make the validator robust.

Tuesday, August 13, 2019

#10 Final Phase of Google Summer of Code, 2019!

Greetings everyone!

In the last week, the offline validator for spatial was completed as per the existing specification! The conflicts and errors (both semantic and syntactic) have been resolved to the best of my knowledge, and the system seems to work as it should for the time being! In the process, I was able to bring many classes up to the latest specification and added (and fixed) some code in the SpatialParser as well.

The SpatialPoints class now extends AbstractSpatialNamedSBase to include the 'id' and 'name' attributes.

The issue with the SpatialParser where it was adding attributes to the UNKNOWN_XML multiple times (duplicates) has been resolved. A processEndDocument() method has been added to handle a missing SpatialReactionPlugin extension for the Reaction element.

Work has started on adding more rules by hand. In fact, progress is good, and I am already implementing some additional rules. I worked out some rules by going through the specification and shared a Google Doc with my mentors. They helped me finalize the rules and phrase them properly. We started the numbering of the new rules from 50, for example 1220650 (in general, 12XXX50, 12XXX51 and so on). This was done to accommodate any future changes in the specification and the rules that the automatic generator would then produce. As of now, we have new rules for around 12 classes, with more to be added with time. Hopefully, the rules can be reflected in the upcoming draft of the spatial specification.


Since most of the work is already done, the updates from here on could be really slow. Further work requires some discussion and finalization before implementation can start. Nevertheless, all the work can be seen on the GitHub repository itself.



I'll be posting the final report soon! :-D



Till next time,

Cheers!

Tuesday, July 23, 2019

#9 Second Evaluation Approaches

Greetings everyone!

Since not much has transpired in the last two weeks, I decided to write a combined blog post for both. Last week, I finished implementing the rules provided in the Spatial package specification. By sheer coincidence, I crossed 100 commits and 10k lines of code in the process too!

While implementing constraints for MixedGeometry, I had to make some changes in the SpatialParser. The parser had no case to handle elements with ContextObject as MixedGeometry. So I added the block to read listOfOrdinalMappings and listOfGeometryDefinitions. Also, the parser did not have the code to read the OrdinalMapping element, so that was added too.

The implementation of the rules for this class was pretty straightforward, but I encountered a small problem while testing some of the rules. Earlier, the parser recognised geometry definitions as children of the listOfGeometryDefinitions belonging to the main Geometry element. But MixedGeometry also has a listOfGeometryDefinitions. The geometry definitions in this list were being wrongly stored as children of the Geometry element. So I created a new addGeometryDefinition() method which adds the GeometryDefinition to the correct parent by checking the parent ContextObject.

These corrections and additions in the SpatialParser also fix many of the problems that were encountered while reading and rewriting the test files using StAX. Implementing rules for OrdinalMapping and SampledField was straightforward too, although I had to edit some test files along the way.

Now I enter the last phase of GSoC, where I'll try to figure out some more rules from the text of the specification, and once approved by my mentors, I shall implement them. I shall resume discussing the different constraint rules in my next blog. 

Till next time,
Cheers!

Friday, July 12, 2019

#8 Last Week Of My Summer Break

Greetings Everyone!

In the previous week, I worked on constraint classes for the ParametricObject class and the SpatialPoints class.
I did not encounter any major problems while implementing the two classes but did have to ask for some clarifications
from my mentors along the way.

The ParametricObject element has an attribute called pointIndex, whose values are written as text content between the opening and closing XML tags rather than inside the opening tag:

<spatial:parametricObject spatial:compression="uncompressed" spatial:dataType="double" spatial:domainType="domainType_1" spatial:id="parametricObject_1" spatial:name="someString" spatial:pointIndexLength="0" spatial:polygonType="triangle">
    0 2 5; 0 6 2; 0 5 6; 2 6 5
</spatial:parametricObject>

I did not know how these values were being read. It turned out that there is a method processCharactersOf() in SpatialParser which reads these
values and calls the append() method of the ParametricObject class. The method appends the values to a string variable pointIndex one by one.

After this, the check on the values was fairly simple. I used StringTokenizer with an alternate constructor, passing a string of delimiters
" ;" as an argument. This argument helps the StringTokenizer to separate the string "0 2 5; 0 6 2; 0 5 6; 2 6 5" whenever it encounters a
space or a semicolon, thus returning the individual values.
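
The tokenizing step can be sketched as follows. This is a minimal, self-contained illustration and not the code from the constraint class: the class name PointIndexTokens and the integer check are hypothetical, and the input is assumed to have had surrounding whitespace and line breaks stripped already.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class PointIndexTokens {

  // Splits the pointIndex text into individual values.
  // Every character of the delimiter string " ;" acts as a separator.
  static List<String> tokenize(String pointIndex) {
    List<String> values = new ArrayList<>();
    StringTokenizer tokenizer = new StringTokenizer(pointIndex, " ;");
    while (tokenizer.hasMoreTokens()) {
      values.add(tokenizer.nextToken());
    }
    return values;
  }

  // Hypothetical check: every value must parse as a non-negative integer.
  static boolean allNonNegativeIntegers(String pointIndex) {
    for (String token : tokenize(pointIndex)) {
      try {
        if (Integer.parseInt(token) < 0) {
          return false;
        }
      } catch (NumberFormatException e) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    String pointIndex = "0 2 5; 0 6 2; 0 5 6; 2 6 5";
    System.out.println(tokenize(pointIndex));
    System.out.println(allNonNegativeIntegers(pointIndex));
  }
}
```

Because consecutive delimiters are skipped, the "; " between two triples does not produce empty tokens.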

For the SpatialPoints class, I still need to verify with a mentor of mine if the 'id' and 'name' attributes are a part of the new specification.
After that, necessary changes will be made either to the test files, or the existing class to incorporate the final decision.

Today I shall discuss checks on invalid and unknown attributes on an element.

1)
The value of the attribute <attribute_name> of a <spatial_element> object must be an array of values of type <datatype>.
OR
The value of the attribute <attribute_name> of a <spatial_element> object must conform to the syntax of SBML data type 
<datakind_class> and may only take on the allowed values of <datakind_class> defined in SBML; that is, the value must 
be one of the following: “value_1” or “value_2”.

For such rules, the first step is to modify the source class to handle invalid attributes whenever they are encountered.
In the readAttribute() method of the class, look for a branch condition on attributeName.equals() for the concerned attribute.
There must be an enclosed try-catch block inside the branched block of code. The try block sets the value of the attribute,
and the setting method throws an exception if the value is not syntactically correct. This is caught by the catch block, and
that is where we need to add the following line of code:

AbstractReaderWriter.processInvalidAttribute(attributeName, null, value, prefix, this);

This call handles the invalid attribute and adds it to the INVALID_XML object of the class.
Now the constraint is implemented by a single line as:

func = new InvalidAttributeValidationFunction<spatial_element>(SpatialConstants.<attribute_name>);
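
The try-catch pattern described above can be illustrated without JSBML on the classpath. The sketch below is only a stand-in under stated assumptions: Element, PolygonKind, invalidAttributes and check() are hypothetical names that imitate the roles of the spatial class, its data type, the INVALID_XML store and the validation function. They are not the JSBML API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class InvalidAttributeSketch {

  // Hypothetical data type with a fixed set of allowed values.
  enum PolygonKind { TRIANGLE, QUADRILATERAL }

  // Hypothetical stand-in for a spatial element.
  static class Element {
    PolygonKind polygonType;
    // Plays the role of the INVALID_XML store.
    final Map<String, String> invalidAttributes = new LinkedHashMap<>();

    boolean readAttribute(String attributeName, String prefix, String value) {
      if (attributeName.equals("polygonType")) {
        try {
          // valueOf() throws IllegalArgumentException for an unknown value.
          polygonType = PolygonKind.valueOf(value.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
          // Record the bad value instead of silently dropping it.
          invalidAttributes.put(attributeName, value);
        }
        return true;
      }
      return false;
    }
  }

  // The rule passes only if the attribute was not recorded as invalid.
  static boolean check(Element element, String attributeName) {
    return !element.invalidAttributes.containsKey(attributeName);
  }

  public static void main(String[] args) {
    Element ok = new Element();
    ok.readAttribute("polygonType", "spatial", "triangle");
    Element bad = new Element();
    bad.readAttribute("polygonType", "spatial", "hexagon");
    System.out.println(check(ok, "polygonType"));
    System.out.println(check(bad, "polygonType"));
  }
}
```

The point of the pattern is that the parser keeps the offending value around, so the validator can report it later instead of the value being lost at read time.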

2)
A <spatial_element> object must have the required attributes <attribute_1> and <attribute_2>, and may have the optional 
attributes <attribute_3>, <attribute_4> and <attribute_5>. No other attributes from the SBML Level 3 Spatial Processes 
namespaces are permitted on a <spatial_element> object.

The helper class for such a rule is UnknownPackageAttributeValidationFunction. We need to override the check() method
to add a test for the required attributes. After checking the required attributes, we call the pre-implemented check()
method through a super call.

func = new UnknownPackageAttributeValidationFunction<SpatialElement>(SpatialConstants.shortLabel) {
     @Override
     public boolean check(ValidationContext ctx, SpatialElement obj) {
          // The rule fails if any required attribute is missing.
          if (!obj.isSetAttribute1()) {
               return false;
          }
          if (!obj.isSetAttribute2()) {
               return false;
          }
          // Let the helper class check for unknown package attributes.
          return super.check(ctx, obj);
     }
};

Saturday, July 6, 2019

#7 Coding Continues

Greetings people!

The results of the first evaluation were released last week, and I'm glad that I have passed to the next phase of the program, with a positive response from my mentors.

I was not able to do a lot of work in the past week, but I managed to implement constraints for 2 more classes, namely CSGPrimitive and CSGSetOperator. I did face a bit of a problem with CSGSetOperator initially, as the XML was not being read properly. I traced it to a problem in the SpatialParser.

The parent of the listOfCsgNodes element was being read as CSGObject, but it should have been CSGSetOperator. CSGObject is usually the parent of CSGSetOperator. Thus the parser was accessing the parent of the parent, and hence the problem. Simply deleting the second getParent() call fixed it.

As of now, only 6 classes remain to be implemented. I hope to complete the preliminary work by the end of the next week. After that, I'll start working on adding constraints by hand.

Thursday, June 27, 2019

#6 Fingers Crossed For The First Eval!

Currently, the first evaluation phase is underway and ends tomorrow, on the 28th of June. Before actually opening the evaluation form, I was expecting a long questionnaire. But it turned out to be a rather small one!

The last week went pretty smoothly, so to speak. Thankfully, I did not encounter any new issues in the code. But I had to do one tedious job alongside coding the constraints. For the set of constraints I am currently working with, the test files are faulty and do not actually contain the elements that I need to test. As a result, I need to edit all the test files to include the appropriate XML elements and attributes. And this is, needless to say, a very tedious job! But I eventually started enjoying it. I reduced the job to copy and paste by making a template and then making minor changes so that each file fails (or succeeds).

As I said in the last post, I will start discussing some of the broad types of constraints that I am implementing in my project. In this post, I'll discuss the constraints regarding CORE attributes and elements.

1)
" A <some_spatial_element> object may have the optional SBML Level 3 Core attributes metaid and sboTerm. No other attributes from the SBML Level 3 Core namespaces are permitted on a <some_spatial_element>. "

Let us take an element as an example. In XML, a Domain element would be typically written as:

<spatial:domain spatial:domainType="domainType_1" spatial:id="domain_1" spatial:name="someString">

The prefix spatial indicates that the element/attribute belongs to the spatial namespace. If an element/attribute does not have a prefix, it is considered to belong to the core package of JSBML. The rule says that this element (Domain) can have 'optional' attributes 'metaid' and 'sboTerm' from the core. This means that these are not compulsory but can be added. Thus the following XML examples are valid:

<spatial:domain metaid="someString" spatial:domainType="domainType_1" spatial:id="domain_1" spatial:name="someString">

<spatial:domain sboTerm="SBO:0000001" spatial:domainType="domainType_1" spatial:id="domain_1" spatial:name="someString">

The following XML, on the other hand, is invalid, as it contains an attribute 'foo' which is not recognized by core.

<spatial:domain foo="someString" spatial:domainType="domainType_1" spatial:id="domain_1" spatial:name="someString">

To implement this check, we simply return the helper function for unknown core attributes as:

func = new UnknownCoreAttributeValidationFunction<Domain>();
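
To make the rule itself concrete, here is a small sketch of the condition it expresses. It is not how UnknownCoreAttributeValidationFunction is implemented in JSBML. The class and method names are hypothetical, and the sketch simply assumes that any attribute name without a prefix counts as a core attribute, as described above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class CoreAttributeRuleSketch {

  // The only core attributes the rule allows on a spatial element.
  static final Set<String> ALLOWED_CORE =
      new HashSet<>(Arrays.asList("metaid", "sboTerm"));

  // attributes maps qualified names (e.g. "spatial:id" or "metaid") to values.
  static boolean hasOnlyAllowedCoreAttributes(Map<String, String> attributes) {
    for (String qualifiedName : attributes.keySet()) {
      // Assumption: a name without a prefix is treated as a core attribute.
      boolean isCore = !qualifiedName.contains(":");
      if (isCore && !ALLOWED_CORE.contains(qualifiedName)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> valid = new LinkedHashMap<>();
    valid.put("metaid", "someString");
    valid.put("spatial:id", "domain_1");

    Map<String, String> invalid = new LinkedHashMap<>(valid);
    invalid.put("foo", "someString");

    System.out.println(hasOnlyAllowedCoreAttributes(valid));
    System.out.println(hasOnlyAllowedCoreAttributes(invalid));
  }
}
```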


2)
" A <some_spatial_element> object may have the optional SBML Level 3 Core subobjects for notes and annotations. No other elements from the SBML Level 3 Core namespaces are permitted on a <some_spatial_element>. "

Continuing with the same example, an XML element can have multiple children elements. The domain element typically has listOfInteriorPoints as its child. But it can have the <notes/> or <annotation/> element from core as its child too. Thus some valid XMLs look like:

<spatial:domain spatial:domainType="domainType_1" spatial:id="domain_1" spatial:name="someString">
     <notes/>
</spatial:domain>

<spatial:domain spatial:domainType="domainType_1" spatial:id="domain_1" spatial:name="someString">
     <annotation/>
</spatial:domain>


The following example is an invalid XML because the element <foo/> does not exist in the core.

<spatial:domain spatial:domainType="domainType_1" spatial:id="domain_1" spatial:name="someString">
     <foo/>
</spatial:domain>


To implement this check, we again simply make use of a helper validation function for unknown core elements:

func = new UnknownCoreElementValidationFunction<Domain>();



In the next blog post, I will continue with the discussion on constraint implementations, and give further updates on my project.

Till next time,
Cheers! :D

Friday, June 21, 2019

#5 First Evaluation Approaches!

The first phase of coding is nearing its end, and the first evaluation is due on June 24th.

The first 4 weeks of GSoC have been immensely rewarding in terms of learning and working experience. Since the existing code was quite old, I encountered quite a few problems any time I worked on a new type of constraint. With the help of my mentor, I was able to fix pretty much all of those issues, save for one or two odd ones.

In this blog post, I'll discuss some more issues that I encountered along the way and their solutions in short.

1.
Once I started implementing checks for core attributes on spatial elements, I noticed that they were sometimes not being stored in the UNKNOWN_XML. The attributes on a spatial element are read by the readAttribute() method of the respective spatial class. The problem was due to a statement like:

   boolean isAttributeRead = (super.readAttribute(attributeName, prefix, value))
                          && (SpatialConstants.shortLabel == prefix);

The problem here is the condition that prefix is the same as the shortLabel, which is nothing but the package name. Core attributes do not have any prefix and thus they are finally considered not processed as isAttributeRead becomes false.
Just removing the second condition does the fix and the core attributes are read properly.
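
The effect of the extra condition can be reproduced in isolation. In the sketch below, parentReads() is a hypothetical stand-in for super.readAttribute(), and the two methods compare the flag before and after the fix. The sketch uses equals() for the string comparison, so it isolates the prefix condition only.

```java
class PrefixConditionSketch {

  static final String SHORT_LABEL = "spatial";

  // Hypothetical stand-in for super.readAttribute(): the parent class
  // recognises the core attributes metaid and sboTerm.
  static boolean parentReads(String attributeName) {
    return attributeName.equals("metaid") || attributeName.equals("sboTerm");
  }

  // Before the fix: the result also depends on the spatial prefix.
  static boolean isAttributeReadOld(String attributeName, String prefix) {
    return parentReads(attributeName) && SHORT_LABEL.equals(prefix);
  }

  // After the fix: only the parent's answer counts.
  static boolean isAttributeReadFixed(String attributeName, String prefix) {
    return parentReads(attributeName);
  }

  public static void main(String[] args) {
    // Core attributes arrive without a prefix.
    System.out.println(isAttributeReadOld("metaid", ""));
    System.out.println(isAttributeReadFixed("metaid", ""));
  }
}
```

With an empty prefix, the old condition reports the core attribute as unread even though the parent class has already handled it.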

2.
A major problem, which caused issues for some constraints as I mentioned in the previous blog, concerned the ListOf<> elements.
In classes such as Domain.java, there are ListOf<?> elements which are children of these classes in the XML. The checks for such ListOf<?> elements included checking core elements and attributes. 
The problem was that the unknown core elements or attributes were being added to the UNKNOWN_XML of a ListOf context object, but the ListOf was being duplicated during the flow of the program because of a faulty isSetListOf() method:

   public boolean isSetListOfInteriorPoints() {
      if ((listOfInteriorPoints == null) || listOfInteriorPoints.isEmpty()) {
         return false;
      }
      return true; 
   }


This method, apart from checking if the container is null, also checks if it is empty. As a result, new containers were being initialized in the code whenever the container was not populated. So the UNKNOWN_XML belonging to the previous container was lost and the checks failed.

Again, simply removing the condition listOfInteriorPoints.isEmpty() does the job, and the tests start working properly.
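
The duplication can be reproduced with a small stand-alone sketch. Everything in it is hypothetical: Container and DomainLike only imitate a ListOf and its parent, and the lazy getter that creates a container whenever isSetListOfInteriorPoints() returns false is an assumption about how the new containers came to be initialized.

```java
import java.util.ArrayList;
import java.util.List;

class IsSetListOfSketch {

  // Stand-in for a ListOf that carries unknown XML collected while parsing.
  static class Container {
    final List<String> items = new ArrayList<>();
    final List<String> unknownXml = new ArrayList<>();
  }

  static class DomainLike {
    private Container listOfInteriorPoints;
    private final boolean treatEmptyAsUnset;

    DomainLike(boolean treatEmptyAsUnset) {
      this.treatEmptyAsUnset = treatEmptyAsUnset;
    }

    boolean isSetListOfInteriorPoints() {
      if (listOfInteriorPoints == null) {
        return false;
      }
      // The faulty variant also reports an empty container as unset.
      return !(treatEmptyAsUnset && listOfInteriorPoints.items.isEmpty());
    }

    // Assumed lazy getter: creates a container when none is reported as set.
    Container getListOfInteriorPoints() {
      if (!isSetListOfInteriorPoints()) {
        listOfInteriorPoints = new Container();
      }
      return listOfInteriorPoints;
    }
  }

  public static void main(String[] args) {
    for (boolean faulty : new boolean[] {true, false}) {
      DomainLike domain = new DomainLike(faulty);
      // The parser stores an unknown core element on the (still empty) list.
      domain.getListOfInteriorPoints().unknownXml.add("foo");
      // A later access, for example by the validator.
      System.out.println(faulty + " -> " + domain.getListOfInteriorPoints().unknownXml);
    }
  }
}
```

In the faulty variant the second access replaces the empty container, so the stored "foo" is gone. With the null-only check the same container is returned both times.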


In forthcoming blogs, I will start discussing the various categories (broadly) of constraint rules, and their implementation.

Till next time,
Cheers! :-D