Wednesday, August 14, 2019

Project Report, GSoC, 2019


INTRODUCTION

Hi all!

This project, "Validation of Spatial Systems Biology Models in Java(TM)", was carried out as part of the Google Summer of Code program with the organization NRNB.

My mentors for this project were Nicolas Rodriguez and Thomas M. Hamm, along with support and valuable suggestions from Lucian P. Smith, Jun.-Prof. Dr Andreas Draeger and Prof. Akira Funahashi.

A basic overview of the program and the project can be found in my introductory blog post: 

For an in-depth background of the project and a detailed understanding of how the offline validator works, readers are encouraged to go through my proposal for Google Summer of Code 2019: 

WORK DETAILS

My GitHub fork for this project is https://github.com/bhavyejain/jsbml.

Over the course of this project, I implemented constraint classes for the spatial package and fixed issues with the existing spatial classes, bringing most of them up to date with the latest specification. My work is reflected in the spatial extension of the repository: https://github.com/bhavyejain/jsbml/tree/master/extensions/spatial/src/org/sbml/jsbml.

I opened a pull request after implementing every 2-3 constraint classes so that the code could be reviewed incrementally. The test files of many classes also required corrections. The modified test files were uploaded to a shared Google Drive folder, and the corrections were listed in a document for future reference. All merged contributions were compiled and tested before being merged.

All of my commits can be viewed here:

Except for the new rules under construction, all my pull requests have been merged into the master branch of the project and can be viewed here:

Ongoing work is reflected in the open pull requests, which will be merged as new rules are added together with their test files. Any open pull requests authored by me can be viewed here:

DELIVERABLES

All validation and consistency rules provided in the Spatial Processes SBML Level 3 Package Specification have been implemented and tested.
Existing source code has been updated to meet the latest requirements of the validator and the specification.
New code has been tested successfully on various test files.

CURRENT STATUS AND FURTHER DEVELOPMENT

Currently, work is being done to manually add more validation rules. Additional rules for around 13 classes have been identified and finalised, with more to be proposed in the coming days. The procedure for adding such rules has been agreed upon, and work is expected to proceed at a steady pace. After the final evaluation, I shall work on completing the validation for the spatial package and include as many syntactic (and possibly semantic) error checks as possible to make the validator robust. 

Tuesday, August 13, 2019

#10 Final Phase of Google Summer of Code, 2019!

Greetings everyone!

In the last week, the offline validator for spatial was completed as per the existing specification! The conflicts and errors (both semantic and syntactic) have been resolved to the best of my knowledge, and the system seems to work as it should for the time being! Along the way, I was able to bring many classes up to the latest specification and added (and fixed) some code in the SpatialParser as well.

The SpatialPoints class now extends AbstractSpatialNamedSBase to include the 'id' and 'name' attributes.

The issue with the SpatialParser adding attributes to the UNKNOWN_XML multiple times (duplicates) has been resolved. A processEndDocument() method has been added to handle a missing SpatialReactionPlugin extension on Reaction elements.

Work has started on adding more rules by hand. In fact, progress is good, and I am already implementing some additional rules. I identified some rules by going through the specification and shared them in a Google Doc with my mentors, who helped me finalize them and phrase them properly. We started numbering the new rules from 50, for example 1220650 (in general, 12XXX50, 12XXX51, and so on). This leaves room for future changes in the specification and for the rules that the automatic generator would then produce. As of now, we have new rules for around 12 classes, with more to be added over time. Hopefully, these rules will be reflected in the upcoming draft of the spatial specification. 
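To make the numbering concrete, here is a small Java sketch that builds such IDs. It assumes the 7-digit layout implied by the example above: the prefix 12, a three-digit code (206 in 1220650) identifying where the rule belongs, and a two-digit suffix starting at 50. The class and method names (ManualRuleIds, ruleId) are hypothetical and not part of JSBML.

```java
final class ManualRuleIds {
    private static final int PACKAGE_PREFIX = 12;      // assumed: leading "12" of the rule id
    private static final int FIRST_MANUAL_SUFFIX = 50; // hand-written rules start at 50

    private ManualRuleIds() { }

    /** Returns the id of the n-th (0-based) hand-written rule for a three-digit code. */
    static int ruleId(int classCode, int n) {
        if (classCode < 0 || classCode > 999) {
            throw new IllegalArgumentException("class code must fit in three digits: " + classCode);
        }
        int suffix = FIRST_MANUAL_SUFFIX + n;
        if (n < 0 || suffix > 99) {
            throw new IllegalArgumentException("rule index out of range: " + n);
        }
        // 12 | XXX | 5Y  ->  12 * 100000 + XXX * 100 + 5Y
        return PACKAGE_PREFIX * 100_000 + classCode * 100 + suffix;
    }
}
```

For example, ruleId(206, 0) yields 1220650 and ruleId(206, 1) yields 1220651, matching the 12XXX50, 12XXX51 pattern.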


Since most of the work is already done, updates from here on could be really slow. Further work requires some discussion and finalization before implementation can start. Nevertheless, all the work can be seen in the GitHub repository itself. 



I'll be posting the final report soon! :-D



Till next time,

Cheers!

Tuesday, July 23, 2019

#9 Second Evaluation Approaches

Greetings everyone!

Since not much has happened in the last two weeks, I decided to write a combined blog post for both. Last week, I finished implementing the rules provided in the Spatial package specification. By sheer coincidence, I also crossed 100 commits and 10k lines of code in the process!

While implementing constraints for MixedGeometry, I had to make some changes to the SpatialParser. The parser had no case to handle elements whose context object is a MixedGeometry, so I added a block to read listOfOrdinalMappings and listOfGeometryDefinitions. The parser also had no code to read the OrdinalMapping element, so that was added too.

The implementation of the rules for this class was pretty straightforward, but I ran into a small problem while testing some of them. Previously, the parser treated every geometry definition as a child of the listOfGeometryDefinitions belonging to the main Geometry element. But MixedGeometry also has a listOfGeometryDefinitions, and the geometry definitions in that list were wrongly being stored as children of the Geometry element. So I created a new addGeometryDefinition() method, which adds the GeometryDefinition to the correct parent by checking the parent context object.
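The idea behind the fix can be sketched in plain Java. The classes below are minimal, hypothetical stand-ins (not the real JSBML Geometry, MixedGeometry or GeometryDefinition), and the router simply inspects which object owns the list currently being parsed before adding the definition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the JSBML classes; not the real API.
class GeometryDefinition {
    final String id;
    GeometryDefinition(String id) { this.id = id; }
}

class MixedGeometry extends GeometryDefinition {
    final List<GeometryDefinition> children = new ArrayList<>(); // its own listOfGeometryDefinitions
    MixedGeometry(String id) { super(id); }
}

class Geometry {
    final List<GeometryDefinition> definitions = new ArrayList<>(); // the top-level list
}

final class GeometryDefinitionRouter {
    private GeometryDefinitionRouter() { }

    /** Adds def to whichever object owns the list being parsed, not always to the Geometry. */
    static void addGeometryDefinition(Object contextParent, GeometryDefinition def) {
        if (contextParent instanceof MixedGeometry) {
            ((MixedGeometry) contextParent).children.add(def);
        } else if (contextParent instanceof Geometry) {
            ((Geometry) contextParent).definitions.add(def);
        } else {
            throw new IllegalArgumentException("unexpected parent: " + contextParent);
        }
    }
}
```

A definition read inside a MixedGeometry now ends up in that MixedGeometry's list, while top-level definitions still go to the Geometry.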

These corrections and additions to the SpatialParser also fix many of the problems encountered while reading and rewriting the test files using StAX. Implementing rules for OrdinalMapping and SampledField was straightforward too, although I had to edit some test files along the way.

Now I enter the last phase of GSoC, where I'll try to figure out some more rules from the text of the specification, and once approved by my mentors, I shall implement them. I shall resume discussing the different constraint rules in my next blog. 

Till next time,
Cheers!

Friday, July 12, 2019

#8 Last Week Of My Summer Break

Greetings Everyone!

Over the past week, I have been working on constraint classes for the ParametricObject and SpatialPoints classes.
I did not encounter any major problems while implementing the two classes, but I did have to ask my mentors
for some clarifications along the way.

The ParametricObject element has an attribute called pointIndex, whose values are written as the text content of the element rather than inside the XML tag:

<spatial:parametricObject spatial:compression="uncompressed" spatial:dataType="double" spatial:domainType="domainType_1" spatial:id="parametricObject_1" spatial:name="someString" spatial:pointIndexLength="0" spatial:polygonType="triangle">
    0 2 5; 0 6 2; 0 5 6; 2 6 5
</spatial:parametricObject>

At first, I did not know how these values were being read. It turned out that the processCharactersOf() method in SpatialParser reads these
values and calls the append() method of the ParametricObject class, which appends them one by one to the string variable pointIndex.

After this, checking the values was fairly simple. I used the two-argument constructor of StringTokenizer, passing the delimiter string
" ;" as the second argument. This makes the StringTokenizer split the string "0 2 5; 0 6 2; 0 5 6; 2 6 5" whenever it encounters a
space or a semicolon, returning the individual values.
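A minimal sketch of this tokenizing step, assuming the pointIndex text has already been collected into a single string by append(). Besides the space and the semicolon, the sketch also treats tabs and line breaks as delimiters, since the text content in the XML above spans several lines. PointIndexParser is a hypothetical class, not part of JSBML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class PointIndexParser {
    private PointIndexParser() { }

    /**
     * Splits pointIndex text such as "0 2 5; 0 6 2" into integers.
     * Spaces and semicolons separate values; tabs and line breaks are added
     * because the XML text content may span several lines.
     * Throws NumberFormatException if a token is not an integer.
     */
    static List<Integer> parse(String pointIndex) {
        List<Integer> indices = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(pointIndex, " ;\t\r\n");
        while (tokenizer.hasMoreTokens()) {
            indices.add(Integer.parseInt(tokenizer.nextToken()));
        }
        return indices;
    }
}
```

For the example above, parse("0 2 5; 0 6 2; 0 5 6; 2 6 5") returns 12 values. StringTokenizer is kept here to match the post, although its Javadoc recommends String.split or java.util.regex for new code.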

For the SpatialPoints class, I still need to verify with one of my mentors whether the 'id' and 'name' attributes are part of the new specification.
After that, the necessary changes will be made either to the test files or to the existing class to reflect the final decision.

Today I shall discuss checks on invalid and unknown attributes on an element.

1)
The value of the attribute <attribute_name> of a <spatial_element> object must be an array of values of type <datatype>.
OR
The value of the attribute <attribute_name> of a <spatial_element> object must conform to the syntax of SBML data type 
<datakind_class> and may only take on the allowed values of <datakind_class> defined in SBML; that is, the value must 
be one of the following: “value_1” or “value_2”.

For such rules, the first step is to modify the source class so that it handles invalid attributes whenever they are encountered.
In the readAttribute() method of the class, find the branch that tests attributeName.equals() for the attribute in question.
Inside that branch there should be a try-catch block. The try block sets the value of the attribute,
and the setter throws an exception if the value is not syntactically correct. The exception is caught by the catch block, and
that is where we need to add the following line of code:

AbstractReaderWriter.processInvalidAttribute(attributeName, null, value, prefix, this);

This call handles the invalid attribute and adds it to the INVALID_XML object of the class.
Now the constraint is implemented by a single line:

func = new InvalidAttributeValidationFunction<spatial_element>(SpatialConstants.<attribute_name>);
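The try/catch pattern can be illustrated with a self-contained sketch. SketchElement and its 'length' attribute are hypothetical, and the map of invalid attributes only stands in for the INVALID_XML object that processInvalidAttribute() fills in JSBML; the real method signatures are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical element with one double-valued attribute, mimicking readAttribute(). */
class SketchElement {
    private Double length;                                               // parsed attribute value
    final Map<String, String> invalidAttributes = new LinkedHashMap<>(); // stands in for INVALID_XML

    boolean readAttribute(String attributeName, String prefix, String value) {
        if (attributeName.equals("length")) {
            try {
                length = Double.valueOf(value); // the "setter": throws on bad syntax
            } catch (NumberFormatException e) {
                // Plays the role of AbstractReaderWriter.processInvalidAttribute(...)
                invalidAttributes.put(prefix + ":" + attributeName, value);
            }
            return true; // the attribute name was recognised, valid or not
        }
        return false;
    }

    /** What an invalid-attribute check would ask: was this attribute invalid? */
    boolean hasInvalidAttribute(String qualifiedName) {
        return invalidAttributes.containsKey(qualifiedName);
    }

    Double getLength() { return length; }
}
```

Recording the raw value instead of silently dropping it is what lets a later validation function report the invalid attribute.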

2)
A <spatial_element> object must have the required attributes <attribute_1> and <attribute_2>, and may have the optional 
attributes <attribute_3>, <attribute_4> and <attribute_5>. No other attributes from the SBML Level 3 Spatial Processes 
namespaces are permitted on a <spatial_element> object.

The helper class for such a rule is UnknownPackageAttributeValidationFunction. We override its check() method
to also test for the required attributes. Once the required attributes have been checked, we call the inherited
check() method with a super call.

func = new UnknownPackageAttributeValidationFunction<SpatialElement>(SpatialConstants.shortLabel) {
     @Override
     public boolean check(ValidationContext ctx, SpatialElement obj) {
          // Both required attributes must be set.
          if (!obj.isSetAttribute1()) {
               return false;
          }
          if (!obj.isSetAttribute2()) {
               return false;
          }
          // Let the helper report any unknown spatial attributes.
          return super.check(ctx, obj);
     }
};

Saturday, July 6, 2019

#7 Coding Continues

Greetings people!

The results of the first evaluation were released last week, and I'm glad that I have passed to the next phase of the program, with a positive response from my mentors.

I was not able to do a lot of work in the past week, but I managed to implement constraints for 2 more classes, namely CSGPrimitive and CSGSetOperator. I did face a bit of a problem with CSGSetOperator initially, as the XML was not being read properly. I traced it to a problem in the SpatialParser.

The parent of the listOfCsgNodes element was being read as CSGObject, but it should have been CSGSetOperator. CSGObject is usually the parent of CSGSetOperator, so the parser was accessing the parent of the parent, hence the problem. Simply deleting the second getParent() call fixed it.

As of now, only 6 classes remain to be implemented. I hope to complete the preliminary work by the end of the next week. After that, I'll start working on adding constraints by hand.

Thursday, June 27, 2019

#6 Fingers Crossed For The First Eval!

Currently, the first evaluation phase is underway and ends tomorrow, on the 28th of June. Before actually opening the evaluation form, I was expecting a long questionnaire, but it turned out to be a rather short one!

The last week went pretty smoothly, so to speak. Thankfully, I did not encounter any new issues in the code. But I had to do one tedious job alongside coding the constraints. For the set of constraints I am currently working on, the test files are faulty and do not actually contain the elements that I need to test. As a result, I need to edit all the test files to include appropriate XML elements and attributes, which is, needless to say, a very tedious job! But I eventually started enjoying it. I reduced the job to copy-and-paste by making a template and then making minor changes to it so that each file fails (or succeeds) as intended.

As I said in the last post, I will start discussing some of the broad types of constraints that I am implementing in my project. In this post, I'll discuss the constraints regarding CORE attributes and elements.

1)
" A <some_spatial_element> object may have the optional SBML Level 3 Core attributes metaid and sboTerm. No other attributes from the SBML Level 3 Core namespaces are permitted on a <some_spatial_element>. "

Let us take an element as an example. In XML, a Domain element would typically be written as:

<spatial:domain spatial:domainType="domainType_1" spatial:id="domain_1" spatial:name="someString">

The prefix spatial indicates that the element/attribute belongs to the spatial namespace. If an element/attribute does not have a prefix, it is considered to belong to SBML Level 3 Core. The rule says that this element (Domain) can have the 'optional' attributes 'metaid' and 'sboTerm' from Core. This means that these are not compulsory but can be added. Thus the following XML examples are valid:

<spatial:domain metaid="someString" spatial:domainType="domainType_1" spatial:id="domain_1" spatial:name="someString">

<spatial:domain sboTerm="SBO:0000001" spatial:domainType="domainType_1" spatial:id="domain_1" spatial:name="someString">

The following XML, however, is invalid, because it contains an attribute 'foo' that is not recognized by Core.

<spatial:domain foo="someString" spatial:domainType="domainType_1" spatial:id="domain_1" spatial:name="someString">

To implement this check, we simply use the helper validation function for unknown core attributes:

func = new UnknownCoreAttributeValidationFunction<Domain>();
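Conceptually, the rule boils down to the following check, sketched under the simplification described above that an attribute without a prefix belongs to Core. CoreAttributeCheck is a hypothetical class for illustration; in JSBML the work is done by UnknownCoreAttributeValidationFunction together with the parser.

```java
import java.util.Map;
import java.util.Set;

final class CoreAttributeCheck {
    // Core attributes that the rule permits on a spatial element.
    private static final Set<String> ALLOWED_CORE_ATTRIBUTES = Set.of("metaid", "sboTerm");

    private CoreAttributeCheck() { }

    /**
     * attributes maps qualified names such as "spatial:id" or "metaid" to values.
     * Returns false if an unprefixed (Core) attribute other than metaid or sboTerm is present.
     */
    static boolean check(Map<String, String> attributes) {
        for (String qualifiedName : attributes.keySet()) {
            boolean isCore = qualifiedName.indexOf(':') < 0;
            if (isCore && !ALLOWED_CORE_ATTRIBUTES.contains(qualifiedName)) {
                return false;
            }
        }
        return true;
    }
}
```

The three Domain snippets above map to: metaid passes, sboTerm passes, and foo fails.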


2)
" A <some_spatial_element> object may have the optional SBML Level 3 Core subobjects for notes and annotations. No other elements from the SBML Level 3 Core namespaces are permitted on a <some_spatial_element>. "

Continuing with the same example, an XML element can have multiple child elements. The domain element typically has listOfInteriorPoints as its child, but it can also have the <notes/> or <annotation/> element from Core as a child. Thus some valid XML snippets look like:

<spatial:domain spatial:domainType="domainType_1" spatial:id="domain_1" spatial:name="someString">
     <notes/>
</spatial:domain>

<spatial:domain spatial:domainType="domainType_1" spatial:id="domain_1" spatial:name="someString">
     <annotation/>
</spatial:domain>


The following example is an invalid XML because the element <foo/> does not exist in the core.

<spatial:domain spatial:domainType="domainType_1" spatial:id="domain_1" spatial:name="someString">
     <foo/>
</spatial:domain>


To implement this check, we again simply make use of a helper validation function for unknown core elements:

func = new UnknownCoreElementValidationFunction<Domain>();
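The same idea for child elements can be sketched with the JDK's StAX API (javax.xml.stream), which reports the prefix of each element as it is read. The sketch assumes the spatial prefix is declared with an xmlns:spatial attribute (the snippets above omit it) and, as before, treats unprefixed elements as Core. CoreElementCheck is hypothetical and is not how JSBML implements the check.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class CoreElementCheck {
    // Core child elements that the rule permits on a spatial element.
    private static final Set<String> ALLOWED_CORE_ELEMENTS = Set.of("notes", "annotation");

    private CoreElementCheck() { }

    /** Returns false if a direct child of the root is an unprefixed element other than notes/annotation. */
    static boolean check(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                return hasOnlyAllowedCoreChildren(reader);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    private static boolean hasOnlyAllowedCoreChildren(XMLStreamReader reader) throws XMLStreamException {
        int depth = 0; // 1 = the spatial element itself, 2 = its direct children
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                String prefix = reader.getPrefix();
                boolean isCore = prefix == null || prefix.isEmpty();
                if (depth == 2 && isCore && !ALLOWED_CORE_ELEMENTS.contains(reader.getLocalName())) {
                    return false;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
        return true;
    }
}
```

Prefixed children such as spatial:listOfInteriorPoints are ignored here, since they fall under the spatial-package checks instead.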



In the next blog post, I will continue with the discussion on constraint implementations, and give further updates on my project.

Till next time,
Cheers! :D

Friday, June 21, 2019

#5 First Evaluation Approaches!

The first phase of coding is nearing its end, and the first evaluation is due on June 24th.

The first 4 weeks of GSoC have been immensely rewarding in terms of learning and working experience. Since the existing code was quite old, I ran into quite a few problems whenever I worked on a new type of constraint. With the help of my mentor, I was able to fix pretty much all of those issues, barring one or two odd ones.

In this blog post, I'll discuss some more issues that I encountered along the way and their solutions in short.

1.
Once I started implementing checks for core attributes on spatial elements, I noticed that sometimes they were not being stored in the UNKNOWN_XML. The attributes on a spatial element are read by the readAttribute() method of the respective spatial class. The problem was due to a statement like:

   boolean isAttributeRead = (super.readAttribute(attributeName, prefix, value))
                          && (SpatialConstants.shortLabel == prefix);

The problem here is the condition that prefix must equal shortLabel, which is nothing but the package name. Core attributes do not have any prefix, so isAttributeRead becomes false and they end up being treated as not processed.
Simply removing the second condition fixes this, and the core attributes are read properly.
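A stripped-down Java sketch of the two conditions. coreReadAttribute() is a hypothetical stand-in for super.readAttribute(), and SHORT_LABEL stands in for SpatialConstants.shortLabel. The sketch compares strings with equals(), since == only compares references in Java and is unreliable for string contents.

```java
final class AttributeReadSketch {
    // Stands in for SpatialConstants.shortLabel.
    static final String SHORT_LABEL = "spatial";

    private AttributeReadSketch() { }

    /** Hypothetical stand-in for super.readAttribute(): recognises two Core attributes. */
    static boolean coreReadAttribute(String attributeName, String prefix) {
        return attributeName.equals("metaid") || attributeName.equals("sboTerm");
    }

    /** Original condition: Core attributes have an empty prefix, so this is false for them. */
    static boolean buggyIsAttributeRead(String attributeName, String prefix) {
        return coreReadAttribute(attributeName, prefix) && SHORT_LABEL.equals(prefix);
    }

    /** Fixed condition: the prefix test is removed. */
    static boolean fixedIsAttributeRead(String attributeName, String prefix) {
        return coreReadAttribute(attributeName, prefix);
    }
}
```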

2.
A major problem, which I mentioned in the previous blog post as the cause of some failing constraints, concerned the ListOf<> elements.
In classes such as Domain.java, there are ListOf<?> elements which are children of these classes in the XML. The checks for such ListOf<?> elements include checking core elements and attributes. 
The problem was that the unknown core elements or attributes were being added to the UNKNOWN_XML of a ListOf context object, but the ListOf was being re-created during the flow of the program because of a faulty isSetListOf() method:

   public boolean isSetListOfInteriorPoints() {
      if ((listOfInteriorPoints == null) || listOfInteriorPoints.isEmpty()) {
         return false;
      }
      return true; 
   }


This method, apart from checking whether the container is null, also checks whether it is empty. As a result, whenever the container existed but had not been populated yet, a new container was initialized in its place. The UNKNOWN_XML belonging to the previous container was therefore lost, and the checks failed. 

Again, just simply removing the condition listOfInteriorPoints.isEmpty() does the job and the tests start working properly.
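The effect is easiest to see with the usual lazy-initialising getter. The two classes below are hypothetical stand-ins for Domain and its ListOf container; the getter is assumed to create a new container whenever isSetListOfInteriorPoints() returns false, which is what made the isEmpty() condition harmful.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical container that can carry unknown XML, like a JSBML ListOf. */
class SketchListOf {
    final List<String> items = new ArrayList<>();
    final List<String> unknownXml = new ArrayList<>(); // stands in for UNKNOWN_XML
    boolean isEmpty() { return items.isEmpty(); }
}

/** Hypothetical owner using the lazy-initialisation pattern. */
class SketchDomain {
    private SketchListOf listOfInteriorPoints;
    private final boolean checkEmpty; // true reproduces the faulty isSet method

    SketchDomain(boolean checkEmpty) { this.checkEmpty = checkEmpty; }

    boolean isSetListOfInteriorPoints() {
        if (listOfInteriorPoints == null) {
            return false;
        }
        return !(checkEmpty && listOfInteriorPoints.isEmpty());
    }

    SketchListOf getListOfInteriorPoints() {
        if (!isSetListOfInteriorPoints()) {
            listOfInteriorPoints = new SketchListOf(); // silently replaces an existing, empty container
        }
        return listOfInteriorPoints;
    }
}
```

With checkEmpty set to true, unknown XML stored on an empty list disappears on the next getter call; with the null-only check, the same container (and its unknown XML) is returned.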


In forthcoming blogs, I will start discussing the various categories (broadly) of constraint rules, and their implementation.

Till next time,
Cheers! :-D

Wednesday, June 12, 2019

#4 Writing My First Constraints

The last week was quite interesting as I got a hang of writing my own constraints, and things started moving ahead smoothly, for most of it!

To make my work easier, I created a template of a Constraint Class, with all the documentation and the constructs common to all constraint classes. I pretty much just need to insert the class name at 2-3 odd places to make it ready for the actual checks.

JSBML (core) has a bunch of 'helper classes' which contain some frequently used validations. Most other validations are extensions of these helper functions, and sometimes, I use them directly.

I'll describe some of these classes briefly here:

1) DuplicatedElementValidationFunction
    This class is used to check that a child XML element was not present more than once.

2) InvalidAttributeValidationFunction
    This class is used to check if any invalid XML attributes were found in an element. The argument for the constructor of this class is the attribute name.

3) UnknownPackageAttributeValidationFunction
    This class is used to check if any unknown XML attributes from a specified package/namespace were found. The constructor for this class takes the package name as an argument. For spatial, we pass SpatialConstants.shortLabel (SpatialConstants is the class containing various constant strings related to the spatial package; the shortLabel constant contains the package's short name, 'spatial').

4) UnknownPackageElementValidationFunction
    This class is used to check if any unknown XML elements from a specified package/namespace were found. The argument for the constructor is the same as the previous class.

5) UnknownCoreAttributeValidationFunction
    This class is used to check if any unknown XML attributes from SBML core were found.

6) UnknownCoreElementValidationFunction
    This class is used to check if any unknown XML elements from SBML core were found.

I had studied the various packages for which constraints had already been implemented, and that helped me figure out in advance how to use these classes.

Now I'll come to the modifications that were required in the existing code of spatial.

The first major change had to be made to SpatialParser.java. In the processStartElement method, every time the contextObject matched as an instanceof some class, I needed to add the element being read to a list that later validations, by classes such as DuplicatedElementValidationFunction, would use. Those classes then only need to get the data from this list. This is done by calling the storeElementsOrder() method of the AbstractReaderWriter class. Also, if no match was found for an element after the previous check, I had to invoke the processUnknownElement() method of AbstractReaderWriter to handle the unknown element.
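The bookkeeping idea can be sketched as follows. ElementOrderLog is hypothetical and does not reproduce the signature of storeElementsOrder(); it only shows how recording child element names per parent, in reading order, lets a duplicate check work from that list alone. An IdentityHashMap is used so that two distinct parent objects are never merged even if they compare equal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class ElementOrderLog {
    // Child element names per parent object, in the order they were read.
    private final Map<Object, List<String>> order = new IdentityHashMap<>();

    /** Called by the (hypothetical) parser for every child element it reads. */
    void store(Object parent, String elementName) {
        order.computeIfAbsent(parent, p -> new ArrayList<>()).add(elementName);
    }

    /** True if elementName occurs at most once under parent. */
    boolean isNotDuplicated(Object parent, String elementName) {
        List<String> names = order.getOrDefault(parent, Collections.emptyList());
        return Collections.frequency(names, elementName) <= 1;
    }
}
```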

The next change was in StringTools.java from JSBML Core. I had to add a method to parse a string as double and throw an exception if it was not able to. The existing method did not throw an exception, and only registered a warning in the logger.
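A sketch of such a strict parser, assuming values follow the XML Schema double syntax used by SBML, where infinity is written as INF. The name parseDoubleStrict is hypothetical; the method actually added to StringTools may differ in name and in the inputs it accepts.

```java
final class StrictParsing {
    private StrictParsing() { }

    /** Parses value as a double and throws instead of only logging a warning. */
    static double parseDoubleStrict(String value) {
        if (value == null) {
            throw new NumberFormatException("null value");
        }
        String trimmed = value.trim();
        // XML Schema writes infinity as "INF"/"-INF"; Double.parseDouble expects "Infinity".
        if (trimmed.equals("INF")) {
            return Double.POSITIVE_INFINITY;
        }
        if (trimmed.equals("-INF")) {
            return Double.NEGATIVE_INFINITY;
        }
        return Double.parseDouble(trimmed); // throws NumberFormatException on bad input
    }
}
```

Throwing lets the caller's catch block route the value to processInvalidAttribute(), which a logged warning alone cannot do.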

For every constraint class, there is also one change I have to make to its source class (the class for which I am writing the constraints). In the readAttribute() method of the class, I need to add AbstractReaderWriter.processInvalidAttribute() wherever there is a try/catch construct that assigns a value to an attribute. This must be done to handle invalid attributes as we encounter them.

This was pretty much what I explored and worked on in the past week. I have implemented constraints for around 13 classes successfully by now!
I am experiencing some issues in 1 or 2 odd classes, which I shall discuss once I solve them!

Till next time,
Cheers! :-D

Wednesday, June 5, 2019

#3 First Week Of Coding

The coding period for GSoC'19 officially began on May 27th (Monday).

The first week was mostly spent properly setting up the tests for the constraint classes. I tried setting up the project with the help of my mentors several times during our Hangout meetings.

At first, OfflineValidatorTests.java was not able to locate classes of the spatial package. But things only got worse later, when the class itself refused to run! 3 failed attempts later, we decided that for the time being I could create a bigjar of the entire jsbml project and run the tests from the command line as:

~$ ant bigjar
~$ java -classpath dist/jsbml-1.5-SNAPSHOT/jsbml-1.5-SNAPSHOT-with-dependencies.jar org.sbml.jsbml.test.OfflineValidatorTests /home/bhavye/GSOC/syntactic-cases-2017-11-20 spatial-20201 xml

I tried to set up the project myself afterwards and got it running finally. In the following text, I will describe the process I followed to get the tests up and running.

  1. Create a new workspace in eclipse.
  2. From the 'File' drop-down menu, select 'Open Projects from File System'.
  3. Locate the jsbml (cloned) directory and select it.
  4. Select 'Finish' with default options selected. 
  5. In the package explorer, enter core -> test -> org.sbml.jsbml.test
  6. Right-click on OfflineValidatorTests.java -> Run As -> Java Application
  7. Now right-click on OfflineValidatorTests.java again, go to 'Run As' and select 'Run Configurations'.
  8. Enter the 'Dependencies' tab, and select 'Classpath Entries'.
  9. Select 'Add Projects...' and check the tick box of spatial.
  10. Enter the 'Source' tab and select 'Add...'.
  11. Select 'Java Project' and check spatial.
The arguments for the configuration would be of the format:
<test files directory> <error codes (separated by colon if range)> xml

e.g.:
/home/bhavye/GSOC/spatial-test-files spatial-21305 xml
/home/bhavye/GSOC/spatial-test-files spatial-21301:spatial-21305 xml
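To show how the second argument is interpreted, here is a hypothetical parser for a single code or a colon-separated range. It only illustrates the argument format and is not the code used by OfflineValidatorTests.

```java
final class ErrorCodeRange {
    final String packageName;
    final int first;
    final int last;

    private ErrorCodeRange(String packageName, int first, int last) {
        this.packageName = packageName;
        this.first = first;
        this.last = last;
    }

    /** Parses "spatial-21305" or a range such as "spatial-21301:spatial-21305". */
    static ErrorCodeRange parse(String arg) {
        String[] bounds = arg.split(":");
        if (bounds.length < 1 || bounds.length > 2) {
            throw new IllegalArgumentException("expected <code> or <code>:<code>: " + arg);
        }
        String[] start = splitCode(bounds[0]);
        String[] end = bounds.length == 2 ? splitCode(bounds[1]) : start;
        if (!start[0].equals(end[0])) {
            throw new IllegalArgumentException("range spans two packages: " + arg);
        }
        int first = Integer.parseInt(start[1]);
        int last = Integer.parseInt(end[1]);
        if (first > last) {
            throw new IllegalArgumentException("range is reversed: " + arg);
        }
        return new ErrorCodeRange(start[0], first, last);
    }

    /** Splits "spatial-21305" into {"spatial", "21305"}. */
    private static String[] splitCode(String code) {
        int dash = code.lastIndexOf('-');
        if (dash <= 0 || dash == code.length() - 1) {
            throw new IllegalArgumentException("expected <package>-<number>: " + code);
        }
        return new String[] { code.substring(0, dash), code.substring(dash + 1) };
    }

    boolean contains(int code) {
        return code >= first && code <= last;
    }
}
```

With this reading, spatial-21301:spatial-21305 selects the five rules 21301 to 21305, and a single code selects just that rule.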

The alternative, and neater, way of setting up the project would be to select 'Create a New Java Project' at the beginning and then clear unwanted files from the build path. But that created unforeseen errors. That approach generates a bin folder to keep all the class files, so using the jsbml/bin folder as the default output folder for every project should have worked in the first place.

So finally having my project running, I started creating the actual constraint classes and tested them along the way. I will discuss the modifications in the existing code and the new problems in the forthcoming blogs!
 


Monday, May 27, 2019

#2 First Hangout Meeting

Last Tuesday, I had my first Hangout meeting with my mentors. Here's the recap:
  • We completed the student-mentor agreement for GSoC, 2019, which was to be mailed to the org admin.
  • Meanwhile, the mentors had a bit of a hard time pronouncing my name, as one can imagine, with the strong H sounds in Indian names!
  • My mentors helped me set up the OfflineValidatorTests class, which would be required to test the tests I write!
    • The run configuration of the Java class was updated with the directory path to the test files, the range of the test codes, and the filter 'xml'.
    • The program was executed successfully for the layout package, but when we tried to run it with the configuration for spatial, it faltered and ran checks for core. 
    • The problem was figured out by my mentor and was fixed the very next day.
  • In the end, it was decided that we'll be meeting every Tuesday evening (IST). I also resolved a couple of doubts regarding the tests and with that, we concluded our meeting.
For the remaining time in the community bonding period, I decided to go through the implementations of constraints from other packages and try to resolve any issues I encounter along the way.

Till next time,
Cheers!

Monday, May 20, 2019

Starting Google Summer of Code, 2019

Hi all!

I am Bhavye Jain, and I am glad to share that I was selected to participate in Google Summer of Code (GSoC) this year, with the organisation National Resource for Network Biology (NRNB)! Many thanks to my mentors Nicolas Rodriguez, Thomas M. Hamm and Dr. Andreas Dräger for the guidance during this journey!

My proposal for this year's GSoC was titled "Validation of Spatial Systems Biology Models in Java" and the project can be viewed here. I intend to use this blog to share my progress on the project. The official coding period starts on 27 May 2019, and the community bonding period is currently underway. I'll try to make use of this period by interacting with my mentors and discussing the workflow of the project.

The Organisation

The aim of the National Resource for Network Biology (NRNB) is to advance the new science of Biological Networks through analytic tools, visualizations, databases and computing resources. Biomedical research is increasingly dependent on knowledge of biological networks of multiple types and scales, including gene, protein and drug interactions, cell-cell and cell-host communication, and vast social networks. Our technologies enable researchers to assemble and analyze these networks and to use them to better understand biological systems and, in particular, how they fail in disease. You can learn more about it by visiting http://nrnb.org/

The Project


My project is based on the GitHub issue #120 and I will be working on this GitHub repository
SBML is a machine-readable representation format, based on XML, for representing biological models. SBML can encode models consisting of entities (species) acted upon by processes (reactions). JSBML (JavaTM SBML) is a community-driven project to create a free, open-source, pure Java library for reading, writing, and manipulating SBML files and data streams. An extension of SBML, called spatial processes (spatial), supports describing processes that involve a spatial component, as well as the geometries involved.

In this project, the Java implementation of the spatial modeling package for SBML will be updated and a full validation function will be implemented for this package. A well-built, predefined system for offline validation of SBML files already exists, and this project will enhance its capabilities. 

I shall be working in a phased manner to write the various ConstraintDeclaration classes for the validation rules provided in the latest specification of the spatial package. The skeleton code for writing a ConstraintDeclaration is provided in the documentation of JSBML, and several helper functions are available for use while creating the new classes. The details of the implementation can be viewed in my project proposal. During the last phase of the project, I will try to add new rules manually, which have not been explicitly listed in the specification, but can be extrapolated from the textual description of the classes in spatial package.

My mentors say that if I complete my project early, I might get a chance to work on something even more interesting! Let's hope I get through with my work timely enough to get to that part!

Till next time,
Cheers! :D