tracker-extract-oasis misparses .odt files with nested <text> tags
See the example file from #109
The file contains this XML content:
<text:p text:style-name="P3">A wonderful serenity has taken possession of my entire soul, like these sweet mornings of spring which I enjoy with my whole heart. I am alone, and feel the charm of existence in this spot, which was created for the bliss of souls like <text:span text:style-name="T2">Wandabuz</text:span>. I am so happy, my dear friend, so absorb<text:span text:style-name="T2">issimmo</text:span> in the exquisite sense of mere tranquil existence, that I neglect my talents. I should be incapable of drawing a single stroke at the present moment; and yet I feel that I never was a greater artist than now. When, while the lovely valley teems with vapour around me, and the meridian sun strikes the upper surface of the impenetrable foliage of my trees, and but a few stray gleams steal into the inner sanctuary, I throw myself down among the tall grass by the trickling stream; and, as I lie close to the earth, a thousand unknown plants are noticed by me: when I hear the buzz of the little world among the stalks, and grow familiar with the countless indescribable forms of the insect<text:span text:style-name="T1">odoits</text:span> and flies, then I feel the presence of the Almighty, who formed us in his own image, and the breath.</text:p>
The extraction produces this output:
nie:plainTextContent "A wonderful serenity has taken possession of my entire soul, like these sweet mornings of spring which I enjoy with my whole heart. I am alone, and feel the charm of existence in this spot, which was created for the bliss of souls like Wandabuz issimmo odoits "
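The issue is that the <text:span> tags are nested inside the <text:p> tag. As the output above shows, once the first span is reached only the text inside the spans is extracted, and the paragraph text between and after the spans is dropped. The rather simplistic parser in tracker-extract-oasis.c doesn't handle this case correctly.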
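To illustrate the expected behaviour, here is a minimal sketch of an event-based parser that tracks nesting depth, so that character data is collected from the paragraph itself as well as from any element nested inside it. This is not the code in tracker-extract-oasis.c: it uses Python's xml.parsers.expat for brevity, and the names extract_plain_text and SAMPLE are hypothetical. Appending a single space after each paragraph is likewise only an assumption made for the example.

```python
import xml.parsers.expat


def extract_plain_text(xml_fragment: str) -> str:
    """Collect all character data found inside <text:p>, at any nesting depth.

    Illustrative sketch only; not the actual Tracker implementation.
    """
    chunks = []
    depth = 0  # number of open elements at or below <text:p>; 0 means outside

    def start_element(name, attrs):
        nonlocal depth
        # Entering a paragraph, or any element nested inside one.
        if name == "text:p" or depth > 0:
            depth += 1

    def end_element(name):
        nonlocal depth
        if depth > 0:
            depth -= 1
            if depth == 0:
                # Assumption: paragraphs are separated by a single space.
                chunks.append(" ")

    def character_data(data):
        # Keep text at every depth inside the paragraph, so text that
        # follows a closed <text:span> is not lost.
        if depth > 0:
            chunks.append(data)

    # Without a namespace separator, expat reports names as written ("text:p").
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = character_data
    parser.Parse(xml_fragment, True)
    return "".join(chunks)


# Shortened version of the paragraph from the example file.
SAMPLE = (
    '<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    'text:style-name="P3">souls like '
    '<text:span text:style-name="T2">Wandabuz</text:span>. '
    'I am so happy, so absorb'
    '<text:span text:style-name="T2">issimmo</text:span> in the sense'
    '</text:p>'
)

if __name__ == "__main__":
    print(repr(extract_plain_text(SAMPLE)))
```

The key point is that the decision to keep text depends on whether a paragraph is open at all, not on which element was opened or closed most recently. The same depth-counter idea should carry over to the start-element, end-element and text callbacks of a C markup parser.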