It's rather difficult to implement F4 in Maple and have it run fast. The problem is that Maple lacks an appropriate data structure for sparse linear algebra. This is how I would do it in C. It helps to think of the algorithm as running over a finite field so that all the coefficients are the same size and there is no coefficient blowup.
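To see why the finite field helps: with a prime that fits in a machine word, every coefficient occupies one word and the arithmetic never grows. A minimal sketch in C (the prime 2^31-1 and the helper names are my own choices, not part of any particular F4 implementation):

```c
#include <stdint.h>

#define P 2147483647u   /* the Mersenne prime 2^31-1: products fit in 64 bits */

/* addition mod P: a, b < P, so the 32-bit sum cannot overflow */
static uint32_t add_mod(uint32_t a, uint32_t b) {
    uint32_t s = a + b;
    return s >= P ? s - P : s;
}

/* multiplication mod P via a 64-bit intermediate product */
static uint32_t mul_mod(uint32_t a, uint32_t b) {
    return (uint32_t)((uint64_t)a * b % P);
}
```

Every coefficient stays a single `uint32_t` no matter how long the computation runs, which is exactly the "no coefficient blowup" point.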
In each iteration, F4 computes a batch of S-polynomials and reduces them. It doesn't particularly care about the remainders of these polynomials; it's just trying to find new leading monomials. So imagine you wrote down the S-polynomials and kept their terms sorted in a monomial order. Eliminate the largest monomial(s) from the polynomials in any way you want. Discard zeros. Then add these polynomials to the basis, compute all the S-polynomials, and repeat. It helps to think of these polynomials as rows in a matrix.
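To make the rows-in-a-matrix picture concrete, here is a toy sketch in C of the elimination step: rows are dense coefficient vectors, columns are monomials sorted largest first, and a plain row reduction mod a small prime knocks out the leading monomials. Real F4 works with sparse rows and a much more careful pivot selection; the sizes, the prime 101, and the function names here are all arbitrary choices of mine:

```c
#define P 101       /* small prime field */
#define NROWS 3     /* rows = S-polynomials */
#define NCOLS 5     /* columns = monomials, sorted largest first */

/* modular inverse by Fermat's little theorem: a^(P-2) mod P */
static int inv_mod(int a) {
    int r = 1, b = a % P, e = P - 2;
    while (e) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
        e >>= 1;
    }
    return r;
}

/* eliminate the leading (largest) monomials: row echelon form mod P */
static void eliminate(int m[NROWS][NCOLS]) {
    int pivot_row = 0;
    for (int col = 0; col < NCOLS && pivot_row < NROWS; col++) {
        int r = pivot_row;
        while (r < NROWS && m[r][col] == 0) r++;
        if (r == NROWS) continue;            /* no pivot in this column */
        for (int j = 0; j < NCOLS; j++) {    /* swap the pivot row up */
            int t = m[pivot_row][j]; m[pivot_row][j] = m[r][j]; m[r][j] = t;
        }
        int s = inv_mod(m[pivot_row][col]);  /* scale the pivot to 1 */
        for (int j = col; j < NCOLS; j++)
            m[pivot_row][j] = m[pivot_row][j] * s % P;
        for (int i = 0; i < NROWS; i++) {    /* clear the column */
            if (i == pivot_row || m[i][col] == 0) continue;
            int c = m[i][col];
            for (int j = col; j < NCOLS; j++)
                m[i][j] = ((m[i][j] - c * m[pivot_row][j]) % P + P) % P;
        }
        pivot_row++;
    }
}
```

After `eliminate`, rows that survive with a nonzero entry have a (possibly new) leading monomial, and rows that reduce to zero are the ones you discard.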
You'll notice that eliminating the largest monomial sounds an awful lot like polynomial division. This is no coincidence: polynomial division is equivalent to a linear algebra problem. In F4 you have much more flexibility for eliminating terms because you have a set of polynomials that you're reducing and a set of polynomials you're dividing by, so there are many different strategies you could use. Algorithms such as SlimGB exploit the sparsity of the polynomials and the size of the coefficients.
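Here is the equivalence in miniature, under a dense-row convention of my own choosing (columns are monomials in decreasing order, coefficients mod 101): one step of polynomial division is exactly the row operation f -= c*g, once the divisor has been multiplied by the right monomial so that its leading column lines up with f's:

```c
#define P 101   /* small prime field */
#define N 4     /* columns = monomials in decreasing order */

/* modular inverse by Fermat's little theorem: a^(P-2) mod P */
static int inv_mod(int a) {
    int r = 1, b = a % P, e = P - 2;
    while (e) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
        e >>= 1;
    }
    return r;
}

/* position of the leading monomial: first nonzero entry, or N if f = 0 */
static int lead(const int f[N]) {
    for (int i = 0; i < N; i++) if (f[i]) return i;
    return N;
}

/* one division step as a row operation: f -= c*g with
   c = lc(f)/lc(g), assuming g has been aligned so lead(g) == lead(f) */
static void reduce_step(int f[N], const int g[N]) {
    int k = lead(f);
    int c = f[k] * inv_mod(g[k]) % P;
    for (int i = k; i < N; i++)
        f[i] = ((f[i] - c * g[i]) % P + P) % P;
}
```

Iterating this step against a single divisor is the classical division algorithm; letting each step pick from a whole set of aligned divisor rows is the extra freedom F4 exploits.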
It is very desirable to implement the algorithm without storing the entire matrix in memory or operating on entire rows at once; more working storage means a slower algorithm. Ideally you should only operate on the leading elements, and construct the rest of the polynomials term by term, on the fly, as you need them. For an idea of how to do this, see Stephen C. Johnson's "Sparse Polynomial Arithmetic" and note the algorithms that use a heap. You may also be interested in my papers at http://www.cecm.sfu.ca/~rpearcea/
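To give the flavor of the heap idea from Johnson's paper, here is a toy C version for univariate sparse polynomials over the integers: the product a*b is formed by merging the na streams a[i]*b with a max-heap keyed on exponent, so terms come out in sorted order with only O(na) live heap entries rather than O(na*nb) intermediate terms. All the names and the fixed heap capacity are my simplifications:

```c
/* sparse univariate polynomial: terms in decreasing exponent order */
typedef struct { int exp, cof; } term;

typedef struct { int i, j, key; } entry;   /* key = a[i].exp + b[j].exp */

/* binary max-heap on key */
static void heap_push(entry *h, int *n, entry e) {
    int k = (*n)++;
    h[k] = e;
    while (k > 0 && h[(k-1)/2].key < h[k].key) {
        entry t = h[k]; h[k] = h[(k-1)/2]; h[(k-1)/2] = t;
        k = (k-1)/2;
    }
}
static entry heap_pop(entry *h, int *n) {
    entry top = h[0];
    h[0] = h[--(*n)];
    int k = 0;
    for (;;) {
        int l = 2*k+1, r = l+1, m = k;
        if (l < *n && h[l].key > h[m].key) m = l;
        if (r < *n && h[r].key > h[m].key) m = r;
        if (m == k) break;
        entry t = h[k]; h[k] = h[m]; h[m] = t;
        k = m;
    }
    return top;
}

/* Johnson-style multiplication: each stream a[i]*b contributes at most
   one live heap entry, so working storage is O(na), and the product
   terms are produced already sorted.  Returns the number of terms. */
static int mul_heap(const term *a, int na, const term *b, int nb, term *out) {
    entry heap[64];                    /* sketch: assumes na <= 64 */
    int n = 0, nout = 0;
    for (int i = 0; i < na; i++)
        heap_push(heap, &n, (entry){i, 0, a[i].exp + b[0].exp});
    while (n > 0) {
        entry e = heap_pop(heap, &n);
        int c = a[e.i].cof * b[e.j].cof;
        if (nout > 0 && out[nout-1].exp == e.key)
            out[nout-1].cof += c;      /* combine like terms as they emerge */
        else
            out[nout++] = (term){e.key, c};
        if (e.j + 1 < nb)              /* advance this stream */
            heap_push(heap, &n, (entry){e.i, e.j+1, a[e.i].exp + b[e.j+1].exp});
    }
    int k = 0;                         /* drop terms that cancelled to zero */
    for (int t = 0; t < nout; t++) if (out[t].cof) out[k++] = out[t];
    return k;
}
```

The same merge-with-a-heap structure applies to division and to the F4 reduction: the reducee's remaining terms are generated on demand instead of being stored as a full row.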