Author Topic: SBXML Extension Module

support

  • Administrator
  • *****
  • Posts: 844
    • Script BASIC Open Source Project
SBXML Extension Module
« on: November 20, 2012, 02:47:16 PM »
I have been working on an XML parser (reader) written in ScriptBasic. The Gnome XML2 extension module is undocumented and doesn't seem to work. Armando's mini-XML example is just that, an example, and needs to be finished. I even looked at expat as an option, but all of these are memory hogs and require a callback interface (or at least an emulated one). ScriptBasic, with its flexible array usage (element, associative, or a combination of both), typeless variables, and outstanding memory management/garbage collection, is a natural fit for XML parsing. I'm using the mini-XML user interface as a guide but will add extra functions such as SBXML::DumpData() and SBXML::DumpTags() for when all you're interested in is the data or you want to view the XML definition tags in the document tree.

Code: [Select]
IMPORT sbxml.bas

doc = SBXML::LoadDoc("simple.xml")
node = SBXML::GetNode(doc,"/breakfast_menu/food/name")
PRINT SBXML::GetNodeValue(node),"\n"

jrs@laptop:~/sb/test$ scriba testxml.sb
Belgian Waffles
jrs@laptop:~/sb/test$

simple.xml
Code: [Select]
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Edited by XMLSpy® -->
<breakfast_menu>
<food>
<name>Belgian Waffles</name>
<price>$5.95</price>
<description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
<calories>650</calories>
</food>
<food>
<name>Strawberry Belgian Waffles</name>
<price>$7.95</price>
<description>light Belgian waffles covered with strawberries and whipped cream</description>
<calories>900</calories>
</food>
<food>
<name>Berry-Berry Belgian Waffles</name>
<price>$8.95</price>
<description>light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
<calories>900</calories>
</food>
<food>
<name>French Toast</name>
<price>$4.50</price>
<description>thick slices made from our homemade sourdough bread</description>
<calories>600</calories>
</food>
<food>
<name>Homestyle Breakfast</name>
<price>$6.95</price>
<description>two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
<calories>950</calories>
</food>
</breakfast_menu>

This is what the XML file looks like in a ScriptBasic array.

node[idx]{class} = value

Code: [Select]
0 - HDR - ?xml version="1.0" encoding="ISO-8859-1"?
1 - REM - !-- Edited by XMLSpy® --
2 - NS - breakfast_menu
3 - NS - food
4 - NS - name
5 - DATA - Belgian Waffles
6 - NE - name
7 - NS - price
8 - DATA - $5.95
9 - NE - price
10 - NS - description
11 - DATA - two of our famous Belgian Waffles with plenty of real maple syrup
12 - NE - description
13 - NS - calories
14 - DATA - 650
15 - NE - calories
16 - NE - food
17 - NS - food
18 - NS - name
19 - DATA - Strawberry Belgian Waffles
20 - NE - name
21 - NS - price
22 - DATA - $7.95
23 - NE - price
24 - NS - description
25 - DATA - light Belgian waffles covered with strawberries and whipped cream
26 - NE - description
27 - NS - calories
28 - DATA - 900
29 - NE - calories
30 - NE - food
31 - NS - food
32 - NS - name
33 - DATA - Berry-Berry Belgian Waffles
34 - NE - name
35 - NS - price
36 - DATA - $8.95
37 - NE - price
38 - NS - description
39 - DATA - light Belgian waffles covered with an assortment of fresh berries and whipped cream
40 - NE - description
41 - NS - calories
42 - DATA - 900
43 - NE - calories
44 - NE - food
45 - NS - food
46 - NS - name
47 - DATA - French Toast
48 - NE - name
49 - NS - price
50 - DATA - $4.50
51 - NE - price
52 - NS - description
53 - DATA - thick slices made from our homemade sourdough bread
54 - NE - description
55 - NS - calories
56 - DATA - 600
57 - NE - calories
58 - NE - food
59 - NS - food
60 - NS - name
61 - DATA - Homestyle Breakfast
62 - NE - name
63 - NS - price
64 - DATA - $6.95
65 - NE - price
66 - NS - description
67 - DATA - two eggs, bacon or sausage, toast, and our ever-popular hash browns
68 - NE - description
69 - NS - calories
70 - DATA - 950
71 - NE - calories
72 - NE - food
73 - NE - breakfast_menu
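
For readers who want to experiment with this flat layout, here is a minimal sketch of how such a listing could be produced. It is written in Python rather than ScriptBasic so it runs standalone, and it is not the SBXML source. It reads HDR, REM, NS, NE and DATA as header, comment, node start, node end and text data, which is an interpretation of the listing above. The names `TOKEN` and `flatten` are hypothetical, and the regex ignores attributes, CDATA, self-closing tags and comments that contain angle brackets.

```python
import re

# Either a tag body between < and >, or a run of text between tags.
TOKEN = re.compile(r"<([^<>]+)>|([^<>]+)")

def flatten(xml_text):
    """Split XML text into (class, value) entries like the listing above."""
    entries = []
    for tag, text in TOKEN.findall(xml_text):
        if tag:
            if tag.startswith("?"):
                entries.append(("HDR", tag))      # XML declaration
            elif tag.startswith("!--"):
                entries.append(("REM", tag))      # comment
            elif tag.startswith("/"):
                entries.append(("NE", tag[1:]))   # node end
            else:
                entries.append(("NS", tag))       # node start
        elif text.strip():
            entries.append(("DATA", text.strip()))  # skip whitespace-only text
    return entries

sample = """<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Edited by XMLSpy -->
<breakfast_menu>
<food>
<name>Belgian Waffles</name>
<price>$5.95</price>
</food>
</breakfast_menu>"""

for idx, (cls, value) in enumerate(flatten(sample)):
    print(idx, "-", cls, "-", value)
```

Because every entry is a plain pair in document order, dumping only the data or only the tags comes down to filtering on the class field.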
« Last Edit: November 20, 2012, 05:46:31 PM by support »
Script BASIC Project Manager

support

SBXML Extension Module
« Reply #1 on: November 23, 2012, 10:59:13 PM »
Here is the Mini-XML SB example working with my SBXML extension module written in ScriptBasic. It still needs work, but the proof-of-concept phase is complete.

Code: [Select]
IMPORT "/home/jrs/sb/modules/sbxml.bas"

doc = SBXML::LoadDoc("simple.xml")

node = SBXML::GetNode(doc,"/breakfast_menu")
child = SBXML::GetChild(node)

REPEAT

  node = SBXML::GetNode(child,"name")
  IF node THEN PRINT "name = ",SBXML::GetNodeValue(node),"\n"
  node = SBXML::GetNode(child,"price")
  IF node THEN PRINT "price = ",SBXML::GetNodeValue(node),"\n"
  node = SBXML::GetNode(child,"description")
  IF node THEN PRINT "description = ",SBXML::GetNodeValue(node),"\n"
  node = SBXML::GetNode(child,"calories")
  IF node THEN PRINT "calories = ",SBXML::GetNodeValue(node),"\n"
  child = SBXML::GetNext(child)

UNTIL child = undef

jrs@laptop:~/sb/test$ scriba simple_xml.sb
name = Belgian Waffles
price = $5.95
description = two of our famous Belgian Waffles with plenty of real maple syrup
calories = 650
name = Strawberry Belgian Waffles
price = $7.95
description = light Belgian waffles covered with strawberries and whipped cream
calories = 900
name = Berry-Berry Belgian Waffles
price = $8.95
description = light Belgian waffles covered with an assortment of fresh berries and whipped cream
calories = 900
name = French Toast
price = $4.50
description = thick slices made from our homemade sourdough bread
calories = 600
name = Homestyle Breakfast
price = $6.95
description = two eggs, bacon or sausage, toast, and our ever-popular hash browns
calories = 950
jrs@laptop:~/sb/test$
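
One way a path lookup such as "/breakfast_menu/food/name" could be resolved against a flat list of entries is to keep a stack of the currently open tags. The sketch below is Python, not ScriptBasic, and is an assumption about the general technique rather than how SBXML::GetNode is implemented. `ENTRIES` and `values_at` are hypothetical names, and the data is a shortened copy of the menu.

```python
# Hypothetical flat (class, value) entries shaped like the SBXML listing.
ENTRIES = [
    ("NS", "breakfast_menu"),
    ("NS", "food"),
    ("NS", "name"), ("DATA", "Belgian Waffles"), ("NE", "name"),
    ("NS", "price"), ("DATA", "$5.95"), ("NE", "price"),
    ("NE", "food"),
    ("NS", "food"),
    ("NS", "name"), ("DATA", "French Toast"), ("NE", "name"),
    ("NS", "price"), ("DATA", "$4.50"), ("NE", "price"),
    ("NE", "food"),
    ("NE", "breakfast_menu"),
]

def values_at(entries, path):
    """Return every DATA value whose enclosing tags spell out `path`."""
    wanted = path.strip("/").split("/")
    stack, found = [], []
    for cls, value in entries:
        if cls == "NS":
            stack.append(value)      # entering an element
        elif cls == "NE":
            stack.pop()              # leaving the innermost element
        elif cls == "DATA" and stack == wanted:
            found.append(value)
    return found

print(values_at(ENTRIES, "/breakfast_menu/food/name"))
print(values_at(ENTRIES, "/breakfast_menu/food/price"))
```

A single pass returns the value from every matching sibling, so repeated food elements need no special handling. The sketch assumes well-formed input; an unmatched NE entry would make `stack.pop()` fail.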

Here are the execution time differences between using the Mini-XML C library extension module and my SBXML version written in ScriptBasic.

SBXML ScriptBasic extension module

real   0m0.022s
user   0m0.020s
sys   0m0.000s


Mini-XML C extension module

real   0m0.010s
user   0m0.008s
sys   0m0.000s

« Last Edit: November 23, 2012, 11:56:13 PM by support »

support

Re: SBXML Extension Module
« Reply #2 on: November 25, 2012, 09:16:36 PM »
I took a short break from coding and did a bit of research on the XML DOM and how browsers parse XML data with JavaScript. My current SBXML array structure doesn't really represent the document tree as it should, and it relies on unique tag names for node access. I'm in the process of adding a tree level element to my structure definition.

node[index][level]{class} = value

I expect this approach to offer the following advantages:

  • Faster node searches
  • Access by Mini-XML like path or by node tree level
  • Easier access to complex and non-well-formed XML documents
  • Informative node list functions and tree structure
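
To make the level idea concrete, here is a small Python sketch (not ScriptBasic and not the SBXML code) that attaches a depth to each flat entry and then picks out nodes by level instead of by unique tag name. The function `add_levels` and the sample `entries` are hypothetical, and putting text one level below its enclosing tag is a choice made for this sketch.

```python
def add_levels(entries):
    """Attach a tree depth to each (class, value) entry."""
    leveled, depth = [], 0
    for cls, value in entries:
        if cls == "NE":
            depth -= 1      # a closing tag sits at its opener's depth
        leveled.append((depth, cls, value))
        if cls == "NS":
            depth += 1      # the element's content is one level deeper
    return leveled

entries = [
    ("NS", "breakfast_menu"),
    ("NS", "food"),
    ("NS", "name"), ("DATA", "Belgian Waffles"), ("NE", "name"),
    ("NE", "food"),
    ("NS", "food"),
    ("NS", "name"), ("DATA", "French Toast"), ("NE", "name"),
    ("NE", "food"),
    ("NE", "breakfast_menu"),
]

leveled = add_levels(entries)

# Direct children of the root are the node starts at level 1.
children = [i for i, (lvl, cls, _) in enumerate(leveled)
            if lvl == 1 and cls == "NS"]
print(children)
```

With the depth stored alongside each entry, the two food elements are told apart by index and level even though they share a tag name.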


The plan is for the first release to be an XML document reader / parser only. Editing and saving XML documents will come in a later release if enough interest in the module materializes.

I'm getting pretty close to the Mini-XML C extension module execution times with the new addition of the level element in my XML array structure.

real   0m0.017s
user   0m0.016s
sys   0m0.000s



 
« Last Edit: November 28, 2012, 09:45:39 PM by support »

support

Re: SBXML Extension Module
« Reply #3 on: November 25, 2012, 09:50:47 PM »
What is the XML DOM?

  • A standard object model for XML
  • A standard programming interface for XML
  • Platform- and language-independent
  • A W3C standard



DOM Nodes

  • The entire document is a document node
  • Every XML element is an element node
  • The text inside an XML element is held in text nodes
  • Every attribute is an attribute node
  • Comments are comment nodes



Node Parents, Children, and Siblings

  • In a node tree, the top node is called the root
  • Every node, except the root, has exactly one parent node
  • A node can have any number of children
  • A leaf is a node with no children
  • Siblings are nodes with the same parent
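
The node types and relationships listed above can be inspected with any DOM implementation. The sketch below uses Python's standard `xml.dom.minidom` purely as an illustration; it is unrelated to SBXML, and the tiny document and its `id` attribute are made up for the example.

```python
from xml.dom import minidom, Node

doc = minidom.parseString(
    '<breakfast_menu><!-- sample --><food id="1">'
    "<name>French Toast</name><price>$4.50</price>"
    "</food></breakfast_menu>"
)

root = doc.documentElement            # the <breakfast_menu> element
comment = root.firstChild             # the comment is the root's first child
food = root.getElementsByTagName("food")[0]
name = food.firstChild                # the <name> element
text = name.firstChild                # the text inside <name>

# Node types
print(doc.nodeType == Node.DOCUMENT_NODE)
print(food.nodeType == Node.ELEMENT_NODE)
print(text.nodeType == Node.TEXT_NODE)
print(food.getAttributeNode("id").nodeType == Node.ATTRIBUTE_NODE)
print(comment.nodeType == Node.COMMENT_NODE)

# Parents, children, siblings, leaves
print(name.parentNode is food)        # each node has exactly one parent
print(len(food.childNodes))           # <food> has two children
print(name.nextSibling.tagName)       # the sibling of <name> is <price>
print(text.hasChildNodes())           # a text node is a leaf
```

Note that in this API the element's text is a separate child node of the element, not a property of the element itself.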