Currently I find it easy to read manga or text, but text with occasional pictures often throws me off my rhythm.
This is probably more what I'm used to than an innate thing, though?
Those pictures specifically: I think I wouldn't mind them, although the text is indistinct and bad. I think you could just remove it? Like, the whole point of pictures in a picture book is to contrast with and complement words.
I would highly suggest not using a spoiler, as that kills the aesthetic. If you're going to use images, use them to make your work stronger for those who do like images, and don't compromise to please people who don't like images.
I would double-check that whatever images you use look good, and look good with the text, in both phone mode and website mode, since people will inevitably use both.